Archive for the ‘SEO’ Category

Relevance challenges for search engines

Monday, July 23rd, 2007

Hamlet Batista put up an excellent piece on SEOMoz a few days ago, entitled 7 Reasons Why Search Engines Don’t Return Relevant Results 100% of the Time. In it, he describes, funnily enough, the seven reasons why search engines don’t return relevant results:

  1. Relevance is subjective
  2. Natural language searches
  3. Poor queries
  4. Synonymy
  5. Polysemy
  6. Imperfect performance
  7. Spam

Batista really does a superb job of exploring each of these reasons; I’m just going to additionally touch on a few of them here.

1. Relevance is subjective
Let’s start with the first one, ‘relevance is subjective’. Batista describes it this way:

You can do a search for ‘coffee’ in Canada and find Tim Horton’s website as the most relevant. Makes sense, as that’s the most popular coffee chain in Canada, but for somebody in Seattle, Starbucks might be the most relevant result. You can do a search for the ‘49ers’ and be looking for the football team, but a historian may be looking for research material on California. And you might even do a search today for ‘bones’ trying to find where to buy your dog a treat, but tomorrow you do that same search looking for an episode of the TV series ‘Bones’ that you missed the night before.

…So far the best approaches the search engines have come up with are the use of human quality raters and personalized search. The better the search engines profile the searcher, the higher the chances of producing relevant results. This method obviously raises a lot of privacy concerns.

At VortexDNA, of course, we take the concept of subjective relevance a step further than location or job description; we suggest, and have shown, that it is profoundly affected by the user’s core purpose and values.

He also suggests that personalization inevitably leads to privacy concerns, which is only true for methods that rely on tracking history and demographics. When values are used to calculate relevance, there’s no need to track search history or clickstream.

2. Natural language searches
Next on Batista’s list is the use of natural language in search queries:

A search engine, on the other hand, receives ‘who has smith as last name in chicago’ or ‘smith last name chicago’. The query is in natural language — our language.

Is it, though? When was the last time you spoke with a person and said, ‘Smith last name chicago’? I submit to you that we are far more demanding of our search engines than we are of any human being. Look at Batista’s examples under the previous point about ‘bones’ and ‘coffee’. Would you go to an information desk and ask, ‘Bones?’ When they’re put forth as search examples, though, we don’t question them; in fact, it’s a highly plausible scenario for us to throw a word or two at a search engine and then be disappointed when it can’t disambiguate our queries.

That’s not natural language; it’s unreasonable expectations. It also leads into Batista’s next point:

3. Poor queries
His description of poor queries includes colloquialisms (like ‘sucker’ for vacuum) and misspellings. As I stated above, I think poor queries also include minimalist terms and odd syntax. We couldn’t expect a human being to know what we were after from those words, but we still expect a machine to guess our intent.

6. Imperfect performance
As I said, I’m only going to touch on some of the seven, so we’ll skip synonymy and polysemy and go straight to imperfect performance. Batista says that the two criteria that define search performance are precision and recall:

Precision is a measure of how efficient the search engine is in returning only the relevant results for the search. The more irrelevant results, the lower the precision. Recall, on the other hand, measures how good the search engine is in returning all the relevant results. (Of course, this assumes the researcher knows how many relevant results there are.) The more relevant results missing from the search, the lower the recall.
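To make Batista’s two definitions concrete, here is a minimal Python sketch. The document IDs and relevance judgments are hypothetical, and, as his parenthetical notes, it assumes the full set of relevant documents is already known.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: sequence of document IDs the engine returned
    relevant:  set of document IDs judged relevant (assumed known in full)
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)  # relevant documents actually returned
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: the engine returns 4 documents, 3 of them relevant,
# out of 6 relevant documents in the whole collection.
retrieved = ["d1", "d2", "d3", "d7"]
relevant = {"d1", "d2", "d3", "d4", "d5", "d6"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.50
```

Returning the irrelevant "d7" is what costs precision; missing "d4" through "d6" is what costs recall.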

Ideally, a search engine should identify all relevant documents without returning any irrelevant ones (100% precision and 100% recall). In practice, this has been proven to be impossible, as precision and recall are inversely proportional.

It sounds like Heisenberg’s Uncertainty Principle, which states that

…it is impossible to perfectly measure a particle’s position and velocity at the same time. The more accurately you measure a particle’s position, the more inaccurate your measure of its velocity, and vice versa.

It may sound strange that I’m citing quantum physics when we’re talking about search engine performance, but I call parallels where I see ’em, thank you very much.

The point here is not only that it appears to be impossible to achieve perfect precision and perfect recall simultaneously, but also that the aim should be to find the optimum tension or balance between the two. At what point does declining recall produce diminishing returns for incremental increases in precision, and vice versa?
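One way to explore that balance is to sweep the cutoff down a ranked result list and watch both measures move. The sketch below is hypothetical: the ranking, the relevance labels, and the total of five relevant documents are invented for illustration. It uses the F1 score (the harmonic mean of precision and recall), a common IR measure that Batista doesn’t mention, as just one possible way to pick a balance point; a different weighting would pick a different cutoff.

```python
# Hypothetical ranked results: (doc_id, is_relevant) in the order the engine scored them.
ranking = [("d1", True), ("d2", True), ("d3", False), ("d4", True),
           ("d5", False), ("d6", True), ("d7", False), ("d8", False)]
TOTAL_RELEVANT = 5  # assumed: one relevant document is never retrieved at all

def sweep(ranking, total_relevant):
    """Yield (k, precision, recall, f1) when the engine returns only the top k results."""
    hits = 0
    for k, (_, is_relevant) in enumerate(ranking, start=1):
        hits += int(is_relevant)
        precision = hits / k
        recall = hits / total_relevant
        # F1 is the harmonic mean of precision and recall (0 if nothing relevant yet)
        f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
        yield k, precision, recall, f1

rows = list(sweep(ranking, TOTAL_RELEVANT))
for k, p, r, f in rows:
    print(f"top {k}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")

best_k = max(rows, key=lambda row: row[3])[0]
print("best cutoff by F1:", best_k)
```

In this toy ranking, precision falls from 1.00 at the top result to 0.50 at eight results while recall climbs from 0.20 to 0.80, though not in a straight line; F1 peaks at a cutoff of six results (about 0.73).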

I have some additional questions about these measures; namely, how precision and recall can be defined when we already know that relevance is subjective (see point 1). They do, however, serve as valuable parameters for putting search improvement efforts in context.

I really appreciate Batista’s skill in describing these seven challenges, and I believe we’re only scratching the surface here. What do you see as the biggest challenge search engines face in delivering the results you want?

Personalized search and SEOs

Friday, July 6th, 2007

Aaron Goldman and Gord Hotchkiss of Search Insider have been having a bit of a back-and-forth about search personalization. Aaron summed up their respective positions last week:

Gord has been preaching that the biggest (and most important) opportunity for innovation in the search space is around personalization. I agree that we’ll see steady investment and advances in this area, but I’m less bullish than Gord on the prospects of personalized search to truly benefit the digital ecosystem.

These are two columnists who have a lot of respect for each other; Gord responded in true sportsman fashion yesterday:

It’s hard to find fault with his points. They’re all very real flaws in making personalization a credible evolution in search relevancy. Also, somewhere along the line, it appears that I’ve become the cheerleader for personalized search.

The problem for SEOs
In last week’s piece, Aaron went on to say that one situation where an individual doesn’t want to see results tailored to him or her is SEO practitioners who want to see what the general public sees atop the search rankings.

SEO practitioners, in fact, have been the most vocal segment to raise concerns about personalization. I suspect that this is primarily due to concern about revamping the business model and fear that they might be done out of a job; I would suggest that they will continue to play a vital role for online advertisers even though their tactics will have to be modified.

As in any profession, SEOs would be wise to keep looking ahead and adapt the value they provide to a changing market. People who spend their energy trying to preserve the status quo against the unstoppable force of market evolution will ultimately lose.

The new paradigm
The ones who do embrace what seems to be a pretty definite trend towards personalization will quickly realize the tremendous opportunity for online advertising. Forget about keywords! Serve up ads that are relevant to the user, not relevant to the words!

At a round table last month, Gord spoke about the necessary shift in focus for SEOs:

The thing about SEO in pre-personalization is that there are keywords and algorithms and everything revolves around keywords. But in personalization, it revolves around users: social pattern, search history, web history, and current tasks would revolve around this.

It’s very difficult for a marketer to look at an individual user. That becomes very granular. We’re going to look at buckets of behavior and work around themes. Themes that fall into common user themes are emphasized instead of keywords. Long tail optimization becomes very interesting. Optimizers will look at the long tail a little bit more where personalization may not be an impact right away. Personalization can really drive a much more presentation of universal search results. If you know more about the user, you’re more confident in providing different results to the user. Thus, understanding user behavior is vital. Knowing what people are looking for is critical. User-centric development will finally take hold. You would not believe how many sites are not user-centric. This will really push that.

(Note: I couldn’t tell from the post if this was an exact transcript or not; Gord, feel free to correct me if I got something wrong!)

His first sentence is what sums it up: the current SEO paradigm revolves around keywords; the personalized SEO paradigm revolves around users.

This is why the incremental shift towards personalization is inevitable: our world is moving away from outputs and towards outcomes. At its most extreme, the Internet is an output; our enhanced ability to connect as humans is an outcome.

VortexDNA is entirely people-focused, although the technology doesn’t rely on search history and so avoids many privacy and tracking concerns. The people in this company believe that the more people you can truly connect with on a profound level, the more successful you will be. I hope that SEOs are excited by the opportunity that search engine personalization will provide to connect client content with the people who will be most moved by it.