Web 3.0 and its discontents
January 31st, 2008
For my first post to our new blog, I thought I would jump into an area that is of great and timely interest: the emerging “Semantic Web” and the technologies and solutions proposed to enable it.
There has been a lot of “Web 3.0” buzz in the last year. See, for example, this MIT “Technology Review” article, Business 2.0’s piece on Radar Networks, the New York Times’ Metaweb article, and John Markoff’s original Web 3.0 article from the NY Times in late 2006. The reaction in the blogosphere has been equally interesting. It appears to be a mix of believers and advocates, Web 2.0 players who are annoyed that their hype is being stolen, and outright skeptics. If I were to put myself in a camp, I’d have to say I’m an “optimistic skeptic”.
I believe something like this vision of Web 3.0 will play out, but it might take the market six or seven “attempters” before we find a Google of Web 3.0.
Whoever eventually gets it right must overcome at least three critical issues to make the Web 3.0 vision reality. I’ll lay them out here.
The W3C vision of the Semantic Web is a Dead End
Semantics = Metadata + Reasoning was and is a bad idea in the context of bridging human communication and machines.
Karen Sparck Jones sums it up nicely here. As she explains rather eloquently, we really have to look at which Semantic Web we are talking about. Most parties in the space sell the value of the “high end” Semantic Web (inferential reasoning from advanced world models using a uniform lexicon with known derivation rules) but really only have technology for the “low end” Semantic Web (human tagging/machine entity extraction + crude resolution procedures for mapping/structuring like elements into classes). The truth is that building the sub-domain ontology hooks is just a way to replace business logic with a derivative of XML. The marginal gains in flexibility from this approach are a costly tradeoff for the complexity, bloat, and performance implications of pushing around such an overly expressive and poor representation of knowledge.
Most vendors in the space seem to think they can execute on a simple ontology around some patterns of activity, such as a limited ontology around people, places, and particular electronic modes of communication. This is really just re-engineering the integration of Outlook/Exchange with social networking, or developing mapping rules from certain descriptive strings in Wikipedia along a priori detectable paths (such as “X is a Location and is tied to this person’s entry”). While there is little doubt that this does enrich the content (à la MarkLogic’s enterprise offerings), it really isn’t the Semantic Web. That isn’t to say the approach lacks value (it has real value), but it isn’t the promise of the Semantic Web.
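To make the “low end” concrete, here is a minimal Python sketch of the kind of string-to-class mapping described above. The gazetteer, its entries, and the `extract_triples` helper are hypothetical illustrations, not any vendor’s actual pipeline. Because each surface string maps to exactly one fixed class, the sketch has no way to tell a stock exchange from an Outlook/Exchange server.

```python
# Hypothetical gazetteer: surface string -> ontology class.
# A "low end" pipeline maps each string to exactly one fixed class,
# regardless of the context the string appears in.
GAZETTEER = {
    "Paris": "Location",
    "Exchange": "Software",  # as in Outlook/Exchange
    "Karen Sparck Jones": "Person",
}


def extract_triples(doc_id: str, text: str) -> list[tuple[str, str, str]]:
    """Tag every known string found in `text` and emit
    (subject, predicate, object) triples for the document and the entity."""
    triples = []
    for surface, cls in GAZETTEER.items():
        if surface in text:  # naive substring match, no disambiguation
            triples.append((doc_id, "mentions", surface))
            triples.append((surface, "rdf:type", cls))
    return triples
```

Calling `extract_triples("doc1", "The Exchange opened lower today.")` returns `[("doc1", "mentions", "Exchange"), ("Exchange", "rdf:type", "Software")]`: the content has been “enriched,” but with the wrong sense.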
Scalability and Complexity
No one has demonstrated deep semantic web infrastructure of any scale. This doesn’t mean that such infrastructure is impossible, just that no one has shown it working at Web-scale.
There has been talk of powerful triple- and n-tuple stores that search over billions of tuples in milliseconds. Given what GigaSpaces and other tuple-space architectures have accomplished, this isn’t as big an issue as people think. Of course, most of these numbers come from the funded companies themselves and are not necessarily indicative of real-world environments.
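For a sense of why raw tuple storage is the comparatively easy part, here is a toy in-memory triple store in Python. It assumes a simple design with one hash index per position (subject, predicate, object); the `TripleStore` class is a hypothetical sketch, not the API of GigaSpaces or any other product. Lookups on bound terms are cheap, but nothing in the structure helps decide what a term means.

```python
from collections import defaultdict
from typing import Optional

Triple = tuple[str, str, str]


class TripleStore:
    """Hypothetical toy triple store with one index per position."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()
        # index[pos][value] -> set of triples that have `value` at position `pos`
        self.index = [defaultdict(set) for _ in range(3)]

    def add(self, s: str, p: str, o: str) -> None:
        t = (s, p, o)
        self.triples.add(t)
        for pos, value in enumerate(t):
            self.index[pos][value].add(t)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> set[Triple]:
        """Return all triples matching the pattern; None acts as a wildcard."""
        bound = [(pos, v) for pos, v in enumerate((s, p, o)) if v is not None]
        if not bound:
            return set(self.triples)
        # Start from the smallest index bucket, then intersect with the others.
        buckets = sorted((self.index[pos].get(v, set()) for pos, v in bound), key=len)
        result = set(buckets[0])
        for bucket in buckets[1:]:
            result &= bucket
        return result
```

For example, after adding `("Alice", "rdf:type", "Person")`, `("Springfield", "rdf:type", "Location")`, and `("Alice", "livesIn", "Springfield")` (made-up data), `match(p="rdf:type", o="Person")` returns only the first triple. Indexing like this scales well; deciding which “Alice” or which sense of “Springfield” is meant is the part the structure cannot do.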
The problem isn’t the scale; it’s the ambiguity of the state space. Language and the agents that use it are utterly magical in how easily they deal with ambiguity that defies simple traversal of a limited graph. This is one reason many bright people have speculated that the brain must have some quantum properties in how it makes inferences across such a large number of potential states.
To see how hard the combinatorics are, compare this with work on n-gram models: as of a few years ago, people were still receiving PhDs for dissertations on 5-gram models. While some may argue that the fixed semantics of a rule base or ontology don’t lead to anything like this kind of state space, I’d challenge them to deal with a large, dynamic lexicon and more than trivial top-level ontological classes.
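The arithmetic behind that comparison is easy to sketch. The vocabulary size, sentence length, and senses-per-word figures here are illustrative assumptions, not measurements: a 100,000-word lexicon already admits 100,000^5 = 10^25 possible 5-grams, and a ten-word sentence in which every word has just three senses has 3^10 = 59,049 joint readings before any ontology is consulted.

```python
import math


def ngram_space(vocab_size: int, n: int) -> int:
    """Number of distinct n-grams possible over a vocabulary of `vocab_size` words."""
    return vocab_size ** n


def sense_readings(senses_per_word: list[int]) -> int:
    """Number of joint sense assignments for a sentence, given how many
    senses each word has (the product of the per-word counts)."""
    return math.prod(senses_per_word)


# Illustrative (assumed) figures, not data from any real lexicon.
print(ngram_space(100_000, 5) == 10**25)  # True
print(sense_readings([3] * 10))           # 59049
```

Both counts grow exponentially with length, which is why a rigid graph traversal that enumerates readings does not survive contact with a large, dynamic lexicon.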
The bottom line is that a schematic representation is likely only going to be able to handle traversal along very rigid paths that are mapped to very specific use cases. In other words, if you try to make business logic that is even remotely as dynamic as human semantics in language, you will have trouble representing it correctly. Worse yet, current representations are inherently unscalable.
There is promising work in this area that treats semantics as a superposition of states that collapses into the appropriate sense at runtime (see Maya Design). Most systems that try to do this the “old-fashioned way” take around a second to process a sentence on modest hardware, and they are still limited to modest global semantics.
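The “superposition and collapse” idea can be illustrated with a deliberately simple stand-in: keep every candidate sense of a word alive until context arrives, then collapse to the sense whose cue words best overlap that context. This is a simplified Lesk-style overlap heuristic swapped in for illustration; it is not Maya Design’s method, and the `SENSES` inventory and `collapse` function are hypothetical.

```python
# Hypothetical sense inventory: each candidate sense lists cue words that suggest it.
SENSES = {
    "exchange": {
        "exchange/stock-market": {"shares", "trading", "opened", "stocks"},
        "exchange/email-server": {"outlook", "mailbox", "server", "email"},
    },
}


def collapse(word: str, context: list[str]) -> str:
    """Hold all candidate senses of `word` until context is available, then
    pick the one whose cue words overlap most with the context words
    (simplified Lesk-style heuristic; ties go to the first listed sense).
    Raises KeyError for words missing from the hypothetical inventory."""
    candidates = SENSES[word.lower()]
    ctx = {w.lower().strip(".,") for w in context}
    return max(candidates, key=lambda sense: len(candidates[sense] & ctx))
```

Here `collapse("Exchange", "The Exchange opened lower as trading slowed".split())` picks `exchange/stock-market`, while a sentence about an Outlook mailbox losing its server connection picks `exchange/email-server`. A hand-built cue list like this is exactly the kind of small, controlled mechanism that demos well and strains on an open lexicon.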
Here’s the key point: Without proof of scale, it’s just a cute demo.
Magic is a lot easier in a controlled environment under unrealistically small data constraints; it’s just a looser form of overfitting. The US Intelligence Community has already been down this road, having invested a whole lot more money than Sand Hill Road in trying to make it work. When he looked into this a while back, former NSA Chief Scientist Eric Haseltine was decidedly unimpressed.
Thankfully, some people are doing a good job of setting expectations honestly for what could be done right now (see Nova’s comments in the Business 2.0 article mentioned at the beginning of this post). Those people have a far better prospect of creating value and loyalty in a future user base.
We don’t need a Web 3.0 version of Google: Just Say NO to Semantic Silos!
The Web 3.0 vision should not be realized through any one Web site. Instead, we should realize it through diverse software that runs everywhere: a distributed hybrid of the current grid architecture of the Amazons, Suns, and Googles and the BitTorrent model of distributed tracking and swarming, applied to intelligence.
The alternative, applying the current hosted-data model to semantics, would be very dangerous. And yet that is the espoused goal of the current market leader.
What could be more Orwellian than having to go to an outside server to determine the accurate sense of the word “tax” or “war”? That prospect should scare you. Semantic silos have far larger consequences than our current lock-in to a given social network or video host. Should such silos develop, it will be tantamount to auctioning off the truth.
The moral answer to this is to build the Semantic Web using a new type of software, not the same old centralized uber-service. And that is the answer we are pursuing and look forward to discussing with you.
Please contact us with your thoughts and subscribe to this blog to join the discussion.