Archive for the ‘Semantic Web’ Category

For Niall, another approach to Web 3.0

Monday, June 11th, 2007

Following a post in which I had commented on Google’s and VortexDNA’s approaches to Web 3.0, a reader wrote in asking what other solutions are out there.

Fortunately, ZDNet’s David Berlind interviewed Sir Tim Berners-Lee last Wednesday at the MITX (Massachusetts Innovation and Technology Exchange) Technology Awards, and then obligingly posted the video on his blog. Berners-Lee gives a convincing explanation, in relatively plain English (he does use the term ‘data bus’), of the Semantic Web that has been his pet project. I found it worth the 11 minutes I spent watching it.

Tim Berners-Lee at MITX

One of the many things he touched on was the difference between the Semantic Web and mash-ups of APIs. With mash-ups, he explained, people have to intentionally take two different databases and decide to put them together. An example might be someone taking a database of maps and a database of coffeehouses and slapping one on the other so you get maps of coffeehouses.

By contrast, Berners-Lee’s vision of the Semantic Web is that all of the information is just data, accessed through the standard query language SPARQL. Once the work has been done to convert documents to standardized data, anyone can pull up anything or combine it in any fashion to see relationships.
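To make the contrast a little more concrete, here is a minimal sketch in plain Python of the "everything is just data" idea. It is not real RDF or SPARQL: the triples, the identifiers such as `cafe:1`, and the helper names `match` and `cafes_with_streets` are all made up for illustration. The point is only that once both sources are expressed as subject/predicate/object statements, combining them is a matter of pattern matching rather than a purpose-built mash-up.

```python
Triple = tuple[str, str, str]

# Hypothetical data: a coffeehouse listing and a map source, both already
# converted to (subject, predicate, object) statements.
TRIPLES: list[Triple] = [
    ("cafe:1", "name", "Bean There"),
    ("cafe:1", "locatedAt", "place:12"),
    ("cafe:2", "name", "Daily Grind"),
    ("cafe:2", "locatedAt", "place:40"),
    ("place:12", "street", "Main St"),
    ("place:40", "street", "Harbour Rd"),
]


def match(pattern, triples=TRIPLES):
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if all(want is None or want == have for want, have in zip(pattern, t))
    ]


def cafes_with_streets(triples=TRIPLES):
    """Join the two sources by following the locatedAt links."""
    results = []
    for cafe, _, place in match((None, "locatedAt", None), triples):
        for _, _, name in match((cafe, "name", None), triples):
            for _, _, street in match((place, "street", None), triples):
                results.append((name, street))
    return sorted(results)


print(cafes_with_streets())
```

Nobody had to decide in advance that coffeehouses and maps belong together; any other combination of statements could be queried the same way.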

In his writeup of the interview, though, Berlind raised doubts about whether web companies would benefit from adopting the data standards Berners-Lee is proposing.

Given the popularity of API-driven access, in the back of my mind, I couldn’t help wonder if there wasn’t a bit of a race going on. On one side, there’s the W3C with the work its doing on the Semantic Web (based very much on something known as RDF or the Resource Description Framework).

On the other, a lot of big Internet companies would probably prefer developers go the non-standard API route because of the way API-dependencies can result in developer loyalty (ok, “lock-in”). After all, once code is written and reliant on APIs (and it works), API extrication (in favor of using SparQL against RDF) will invariably entail a rewrite. That is unless developers are anticipating the Semantic Web and modularizing their code in such a way that they have query modules that abstract query specifics. In that case, so long as the module returns the same information, it’s only the guts of the module that have to be fixed (trust me, it’s much more complicated that I’m making it seem).
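Berlind's idea of a query module that hides the query specifics might look something like the sketch below. Everything in it is an assumption made for illustration: the class names, the `find` method, the `ex:` vocabulary and the injected `fetch` and `run_query` callables stand in for whatever vendor API client or SPARQL endpoint a real project would use.

```python
from typing import Callable, Protocol


class CoffeehouseQuery(Protocol):
    """What the rest of the application depends on."""

    def find(self, city: str) -> list[dict]: ...


class VendorApiQuery:
    """Backed by a hypothetical proprietary API client."""

    def __init__(self, fetch: Callable[[str], list[dict]]):
        self._fetch = fetch

    def find(self, city: str) -> list[dict]:
        # Translate the vendor's field names into the module's own shape.
        return [{"name": r["title"], "city": city} for r in self._fetch(city)]


class SparqlQuery:
    """Backed by a hypothetical function that runs SPARQL against RDF data."""

    def __init__(self, run_query: Callable[[str], list[dict]]):
        self._run = run_query

    def find(self, city: str) -> list[dict]:
        # Illustrative only: real code should escape the city value.
        query = (
            "PREFIX ex: <http://example.org/> "
            f'SELECT ?name WHERE {{ ?c ex:city "{city}" ; ex:name ?name }}'
        )
        return [{"name": row["name"], "city": city} for row in self._run(query)]


def list_names(source: CoffeehouseQuery, city: str) -> list[str]:
    """Application code: unaware of which backend answers the query."""
    return sorted(item["name"] for item in source.find(city))
```

As long as both modules return the same shape of result, swapping one for the other leaves `list_names` untouched, which is the "only the guts of the module" case Berlind describes.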

Berlind isn’t the first to raise these questions, which cut to the heart of the Semantic Web vs. Semantic Search debate. It reminds me of the early days of Mac against PC. Yes, in that case the clear winning strategy was to open the standards to everyone, but back then the standards hadn’t yet been defined or entrenched. To achieve Berners-Lee’s proposition, billions of web contributors would have to shift the way they operate. It will take a powerful tipping point to get there.

Do you think it will be possible to establish universal standards, like RDF, to make web data infinitely accessible? Or is the smarter bet on semantic tools to process non-semantic data?

Solve for semantics at the search engine level

Tuesday, May 29th, 2007

I’ve put up a few posts about the controversial semantic web or ‘Web 3.0’. Most people have a gut reaction that the concept is buzzword-heavy and lacking in practicality, or even a clear definition. Dr Riza C. Berkan summed up the issues today with intellectual rigor in a ReadWriteWeb post:

The two basic views of a semantic search are identified by the location of the semantic resources to be implanted. The first view is to embed the semantic resources in the Web pages themselves. It is called the “Semantic Web”. Why not compose Web pages in a structure that is semantics friendly?

…The “Semantic Web” approach has been around for a long time now. Unfortunately, it is based on an unrealistic assumption that every Web author will abide by the complex rules of semantics - not to mention the education it requires - and place content in the correct buckets of mysteriously unified standards. Another form of this approach may be to design Web factories that crank out refined Web pages once fed by ordinary Web pages. Of course if there is more than one factory, you have the standards issue again. In this day and age of fast content production, the Semantic Web seems to be more idealism than realism.

Dr Berkan goes on to discuss the pros of focusing efforts to understand the user at the search stage:

Without relying on statistics, long-tail queries can be analyzed by semantic algorithms on the fly, and bring search results with the accurate context… a semantic approach is very effective in handling dynamic content and can unleash its full power the second the content is born.

The argument, highly valid, is that it is easier to make one search engine intelligent than billions of web pages.

Dr Berkan’s company, hakia, offers a semantic search engine, as do Cognition Search and Lexxe. Powerset is working on theirs.

VortexDNA shares Dr Berkan’s view—in fact, we’re taking one step further away from the content. The idea behind MyWebDNA is not to create a new search engine, but a universal measure of relevance that can be overlaid onto any search engine.

Our tactics are different: the means of determining relevance can be through context, meaning, or, in our case, the purpose and values of the user. But our fundamental approach is the same: create the right lens, and the results will come into focus.

Forget about keywords—focus on the individual for relevance

Friday, May 25th, 2007

In an article aptly titled The Mind Blowing Evolution of the Social Web, Solomon Rothman of WebProNews had this to say about personalization:

Web 3.0 will see all the social, user generated, and independent content conglomerated, analyzed and spit out in ways that can be quickly and efficiently customized to what’s important to you. Relevancy will no longer be determined at the keyword level, but on the individual level. Smart services will actually understand what you like and will evolve as do your likings and importance. It won’t be artificial intelligence yet, but it will have enough data from enough places to be able to quickly learn about your habits through smart “agents.”

I say that the article is aptly titled because I agree that the power of the web is nothing short of mind-blowing. And Solomon’s take on what constitutes personalization fits hand-in-glove with the VortexDNA view of the world, particularly that one sentence, which I think bears repeating:

Relevancy will no longer be determined at the keyword level, but on the individual level.

Later in the paragraph, he says that the web will have data to learn about your habits. MyWebDNA, of course, is taking a different approach, operating on the premise that we can get as good or better results from focusing on who you are, rather than on your habits. But we’re all going in the same direction here, following our hypotheses towards greater relevance.

This is the natural direction of the web. It can’t be more content—we’ve already got content coming out of our ears. It can’t be more participation—YouTube, Flickr and Wikipedia have effectively ensured that the web is now thoroughly powered by its community.

No, the direction of the web has to be towards understanding. Google’s mission is ‘to organize the world’s information and make it universally accessible and useful’. But we all know that ‘the world’s information’ isn’t necessarily useful to me. If I devote the rest of my life to absorbing information, I’m still only ever going to access a tiny subset of all the information that’s out there. So in order for Google to fulfil its mission of making information universally accessible, they have to make it individually relevant.

This, I suspect, is the driver behind Google’s personalization push. I’d love to know what you think.

Yahoo seeks to understand the people behind the technology

Sunday, May 13th, 2007

In a move they’re comparing to Bell Labs, Yahoo has just added researchers in economics and sociology to their team. Elinor Mills got this interesting quote for her ZDNet piece from Prabhakar Raghavan, head of Yahoo Research:

Having researchers who aren’t focused on computer science will not only help Yahoo improve its product and service development, but could lead to advances in the development of technologies underlying the Internet. The viewpoint and way of thinking (of researchers) is different from people like myself who come from a computer science background.

Brav-o, Yahoo! I thoroughly applaud that move. Essentially, what they’re saying is this:

If you want to be able to serve people, first you must seek to understand them.

Not only must you seek to understand them, but you must seek to understand them for who they really are, genuinely, independent of their direct connection to your product or service. That’s how you ensure that your decisions are driven by what people want and need, rather than trying to tailor people’s wants and needs to your decisions. That’s why it’s so smart for Yahoo to set up the researchers as a distinct department, not as a subset of the marketing department.

All of our previous discussion about Web 3.0 notwithstanding, I believe this is also an opportunity to reflect on the different stages of a maturing marketplace. You could say this: In Web 1.0, we were surprised that we could even post text, and everything seemed impressive. In Web 2.0, we started to test the boundaries of the technology for technology’s sake, and marveled at our newfound power. I propose that Web 3.0 will be the merger of rapidly evolving technology and continuous adaptation to the wants and needs of the audience that technology is meant to serve: us.

Tell me how you feel—would you prefer Yahoo and others spend time to understand what makes you tick, or do you think they’re straying into territory that shouldn’t concern them?

Hunting and gathering on Amazon

Sunday, May 6th, 2007

Last week, in an article entitled Discovery: The Anti-Search, David Berkowitz described the trend towards greater integration of discovery in a user’s search experience:

Through discovery, when you read your favorite newspaper online, you’re presented with a wealth of links from around the Web that should be of interest to you, including other articles, related books or products, or video clips, whether or not you’d expect them to be directly relevant. Amazon.com does this regularly, such as when it told me that customers who bought the Black & Decker 3.4 PS550B Handsaw also bought a 5-pound bag of Haribo Gummi Bears and the movie “Borat.”

Berkowitz rightly notes that discovery can’t replace search (they’re more effective together, like hunting and gathering) but that it absolutely can enhance search, in his word, ‘serendipitously’.

What a glorious word, ‘serendipitously’. It fairly rolls off the tongue. What’s so beautiful about it is how it niftily combines an element of happenstance with a portion of positive fortune, and that’s exactly what Berkowitz is pointing at here: you shouldn’t just stumble on random sites, but on sites that happen to be specifically interesting to you.

In the early days of the Internet, everything was so new that it all seemed serendipitous, like the old saying that any sufficiently advanced technology is indistinguishable from magic. As we continue to grow in our experience, though, we need ever greater depth if we want to retain that wide-eyed amazement. The original Godzilla is fun to watch now, but if we want to believe in the special effects, we need Peter Jackson.

In the case of Amazon’s handsaw/Gummi Bear/Borat combination, the algorithms are working purely on historical statistics of other users. Surely, they reason, if one person bought our handsaw and then our mockumentary, someone else will be interested in the same combination. And, like those early movies, the initial results have been impressive. If Amazon doesn’t get it right, you give a giggle or ignore it and move on. If they nail it, though, you can’t believe it: “How did they know I love Gummi Bears? They must really care about me!”

They’re bound to nail it sometimes, because it’s not unusual for people to make similar purchase combinations. Surely, though, you know someone who shares your love for hand tools but not much else.

This is where companies like VortexDNA come in, allowing serendipity to occur not based on a single instance of external behavior, but rather on an expression of who you are. Maybe 100 people who bought the saw also bought the Gummi Bears, but only two of them share your core purpose and values. At the same time, 40 people who are aligned with who you really are bought a Donna Summer CD. VortexDNA suggests that Amazon is more likely to score a sale by suggesting Donna than by pushing the Bears.
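The arithmetic behind that example can be sketched in a few lines of Python. This is only an illustration using the numbers above: the yes/no "aligned" flag and the `rank_items` helper are assumptions, not a description of how Amazon or VortexDNA actually score anything.

```python
from collections import Counter


def rank_items(purchases, weight):
    """Rank items by summing a weight for each purchase.

    purchases: iterable of (item, buyer_is_aligned) pairs.
    weight: function mapping the aligned flag to that buyer's vote.
    """
    scores = Counter()
    for item, aligned in purchases:
        scores[item] += weight(aligned)
    return [item for item, _ in scores.most_common()]


# The post's numbers: 100 Gummi Bear buyers, of whom 2 are aligned with you,
# and 40 aligned buyers of the Donna Summer CD.
purchases = (
    [("Gummi Bears", True)] * 2
    + [("Gummi Bears", False)] * 98
    + [("Donna Summer CD", True)] * 40
)

# Every purchase counts equally: raw co-purchase popularity.
by_popularity = rank_items(purchases, lambda aligned: 1.0)

# Only purchases by aligned buyers count.
by_alignment = rank_items(purchases, lambda aligned: 1.0 if aligned else 0.0)

print(by_popularity)
print(by_alignment)
```

Counting every buyer equally puts the Bears on top, 100 to 40. Counting only the buyers who resemble you flips the order, 40 to 2.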

Serendipity in search is what continues to maintain the Internet as an exciting and vibrant place of discovery. Caring about who the user is will keep it that way.

Google VP evangelizes for relevance

Wednesday, April 4th, 2007

New Zealand recently had the honor of hosting Internet legend Vint Cerf, currently Vice President and Chief Internet Evangelist of Google. While he was here, he had a chat with my friend Steve Ballantyne, who in turn obligingly published large chunks of it in the National Business Review.

As the guy who led the development of TCP/IP protocols, and someone without whom the Internet as we know it would not exist, Cerf tends to have a few wise comments about the direction of the web. Here’s one of them:

Now I think Google is very definitely the place to be. Google swept its competitors aside very quickly – I was a big user of AltaVista until the Google page-rank algorithm showed up.

There’s still a problem with finding the information that is relevant. My guess is that the next step in search will require making things more relevant, which may require things like the semantic web that Tim Berners-Lee has been working on.

That was great to hear. It’s great that he’s got such a clear understanding of how the Internet is not yet satisfying all of the needs of its users. Obviously, it’s also great that Cerf is saying the same thing that we’ve been saying here. VortexDNA’s purpose in life is to be a Universal Measure of Relevance (and, yes, fellow Lynne Truss devotees, that title deserves proper noun status).

But it’s particularly great because Cerf is a Google guy, and what he’s really saying is this:

“Google’s page-rank algorithm is good, but it’s not enough.”

Assuming you know a bit about Google’s PageRank algorithm, which ranks a page largely by the links pointing to it from other pages, you could also interpret his comments as follows:

‘Popularity’ does not necessarily equal ‘relevance’.

Ranking pages based on popularity was a phenomenal start. Google set the bar for search—and, in doing so, provided a launch platform to the moon. Now the technology is available to combine Google’s immensely powerful statistical algorithms with a means of ensuring those results are more relevant to the particular individual conducting the query. This collaboration will take us a step closer to Cerf’s vision of relevant search.
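As a rough sketch of what "combining" could mean, the snippet below blends a popularity score with a per-person topic weight. The scores, the topics, the `blend` parameter and the `rerank` function are all invented for illustration; this is not Google's algorithm or VortexDNA's.

```python
def rerank(results, profile, blend=0.5):
    """Order results by a mix of popularity and personal relevance.

    results: list of (title, popularity, topic), popularity in [0, 1].
    profile: dict mapping topic to how much this person cares, in [0, 1].
    blend: 0.0 means popularity only, 1.0 means personal relevance only.
    """
    def score(result):
        _, popularity, topic = result
        return (1 - blend) * popularity + blend * profile.get(topic, 0.0)

    return [title for title, _, _ in sorted(results, key=score, reverse=True)]


# Hypothetical results for the query "jaguar".
results = [
    ("Jaguar cars review", 0.9, "cars"),
    ("Jaguar habitat guide", 0.6, "wildlife"),
]
wildlife_fan = {"wildlife": 1.0}

print(rerank(results, wildlife_fan, blend=0.0))  # popularity alone
print(rerank(results, wildlife_fan, blend=0.5))  # blended with the profile
```

With popularity alone the car review comes first. Once the wildlife fan's profile is blended in, the habitat guide overtakes it, even though fewer pages point to it.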

We don’t need to reinvent the wheel, but there’s nothing wrong with improving it.