Monday, March 30, 2009

The Unreasonable Effectiveness of Data

Following David Weinberger's post on his excellent blog, I carefully read the article written by several Google scientists. I must admit, it is fascinating.

First, its title, "The Unreasonable Effectiveness of Data", is a fully intentional and admitted reference to “The Unreasonable Effectiveness of Mathematics in the Natural Sciences” by Eugene Wigner. It is a nice play on titles, but a bit misleading: the role of mathematics in natural science is just the opposite of the role of pure data in human knowledge. I could elaborate on this at length, but what strikes me most is another thing.

It seems that the authors dismiss the message of Semantic Web advocates, Tim Berners-Lee among them, for reasons that are not very clear.

Let me cite: “(...) But even if we have a formal Semantic Web “Company Name” attribute, we can’t expect to have an ontology for every possible value of this attribute. For example, we can’t know for sure what company the string “Joe’s Pizza” refers to because hundreds of businesses have that name and new ones are being added all the time. We also can’t always tell which business is meant by the string HP.(...)”

Well, in all Semantic Web proposals we do not care what "Joe's Pizza" or "HP" means!

We care about one thing - that "Joe's Pizza" is The Company Name.

We do not need an ontology for the name itself; we need it for the different potential "Company Name" concepts!!!
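
To make the point concrete, here is a minimal sketch in Python using plain tuples as subject/property/value triples. Everything prefixed with `ex:` (the property `ex:companyName`, `ex:phone`, the two business identifiers) and the phone number are made-up assumptions for illustration, not part of any real vocabulary or of the article. The sketch only shows that the string "Joe's Pizza" can stay ambiguous as a string while the data still says unambiguously that it is a company name, and of which business.

```python
# Hypothetical triples: (subject, property, value).
# All "ex:" identifiers are illustrative assumptions, not a real vocabulary.
triples = [
    ("ex:business/1", "ex:companyName", "Joe's Pizza"),
    ("ex:business/2", "ex:companyName", "Joe's Pizza"),
    ("ex:business/2", "ex:phone", "555-0100"),
]


def values_of(prop, data):
    """Return every value attached to the given property."""
    return [value for _subject, p, value in data if p == prop]


def subjects_named(name, data):
    """Return every subject whose ex:companyName equals the given name."""
    return [s for s, p, value in data if p == "ex:companyName" and value == name]


# We know each of these strings is a company name without interpreting it.
names = values_of("ex:companyName", triples)

# The same string belongs to two distinct businesses; the subject
# identifier tells them apart, not an ontology of the name itself.
owners = subjects_named("Joe's Pizza", triples)

print(names)
print(owners)
```

The design choice here is that meaning is attached to the property and the subject identifier, so nothing has to be known about the value beyond the fact that it is a name.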

Not everything in this article is plainly bad, though.

What I liked was the call "So, follow the data". In some vague sense they reaffirmed Tim Berners-Lee's Principle of Least Power. I must also admit that the distinction between “Semantic Web” and “Semantic Interpretation” is very convincing, and it is another good part of the article.

Finally, I often think that Google would be The One who could push the Semantic Web forward. And for some reason they don't.

They could simply cry out loudly: "Hi webmasters around the world - use RDF or microformats to mark up your contact/author data and we will use it in our search engine!"

Apart from conspiracy theories, there is something in this article, written by Google researchers, that justifies their unwillingness to start the ball rolling...

This post was first published as a comment on David Weinberger's and Seb Schmoller's blogs.


  1. I agree. The Google scientists clearly have a bias against the semantic web standards. The “Semantic Web” vs “Semantic Interpretation” dichotomy is misleading because the semantic web includes relational-database-style functionality over unique data that cannot be inferred statistically.

    Please also see Stefano's comments:

  2. Rick,

    Thanks for pointing to Stefano's comments about the article.

    He expressed it even more strongly - it is hypocritical at best, and toxic at worst.

    Yes, the dichotomy of Semantic Web and Semantic Interpretation is misleading.
    Trying to contradict them is - toxic :-)

    Nonetheless, the article has some virtue in stressing the possibility of extracting meaningful information from unstructured data.
    As I said in my post - that was the original rationale behind Tim Berners-Lee's Principle of Least Power...

    BTW, the hyperlink to your blog is broken.

  3. This study by Google throws further light on the issues.


  4. Seb,

    Yes, it sheds light on the status quo of markup elements and attributes in today's web.

    But does it also say something about Google's agenda for or against the SW, which we are discussing here? Maybe I missed something?