Coding Horror: Trouble In the House of Google: If these dime-store scrapers were doing so well and generating so much traffic on the back of our content – how was the rest of the web faring? My enduring faith in the gravitational constant of Google had been shaken.... I can't help noticing that we're not the only site to have serious problems with Google search results in the last few months.... Anecdotally, my personal search results have also been noticeably worse lately. As part of Christmas shopping for my wife, I searched for "iPhone 4 case" in Google. I had to give up completely on the first two pages of search results as utterly useless, and searched Amazon instead.
People whose opinions I respect have all been echoing the same sentiment -- Google, the once essential tool, is somehow losing its edge. The spammers, scrapers, and SEO'ed-to-the-hilt content farms are winning.
Like any sane person, I'm rooting for Google in this battle, and I'd love nothing more than for Google to tweak a few algorithmic knobs and make this entire blog entry moot. Still, this is the first time since 2000 that I can recall Google search quality ever declining, and it has inspired some rather heretical thoughts in me -- are we seeing the first signs that algorithmic search has failed as a strategy? Is the next generation of search destined to be less algorithmic and more social?
It's a scary thing to even entertain, but maybe gravity really is broken...
Why We Desperately Need a New (and Better) Google: This semester, my students at the School of Information at UC-Berkeley researched the VC system from the perspective of company founders. We prepared a detailed survey, randomly selected 500 companies from a venture database, and set out to contact the founders. Thanks to Reid Hoffman, we were able to get premium access to LinkedIn, which was very helpful and provided a wealth of information. But some of the founders didn’t have LinkedIn accounts, and others didn’t respond to our LinkedIn “inmails”. So I instructed my students to use Google searches to research each founder’s work history, year by year, and to track him or her down that way.
But it turns out that you can’t easily do such searches in Google any more. Google has become a jungle: a tropical paradise for spammers and marketers. Almost every search takes you to websites that want you to click on links that make them money, or to sponsored sites that make Google money. There’s no way to do a meaningful chronological search.
We ended up using a web-search tool called Blekko instead. It’s a new technology and is far from perfect, but it is innovative and helps fill the lack of real competition for Google (and Bing).
Blekko was founded in 2007 by Rich Skrenta, Tom Annau, Mike Markson, and a bunch of former Google and Yahoo engineers. Previously, Skrenta had built Topix and what has become Netscape’s Open Directory Project. For Blekko, his team has created a new distributed computing platform to crawl the web and create search indices. Blekko is backed by notable angels, including Ron Conway, Marc Andreessen, Jeff Clavier, and Mike Maples....
In addition to providing regular search capabilities like Google’s, Blekko allows you to define what it calls “slashtags” and filter the information you retrieve according to your own criteria. Slashtags are mostly human-curated sets of websites built around a specific topic, such as health, finance, sports, tech, and colleges. So if you are looking for information about swine flu, you can add “/health” to your query and search only the top 70 or so relevant health sites rather than tens of thousands of spam sites. Blekko crowdsources the editorial judgment for what should and should not be in a slashtag, as Wikipedia does. One Blekko user created a slashtag for 2,100 college websites, so anyone can do a targeted search for, say, all the schools offering courses in molecular biology. Most searches are like this: they can be restricted to a few thousand relevant sites. The results become much more relevant and trustworthy when you can filter out all the garbage.
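At its core, a slashtag as described here is an allowlist: keep only results whose site belongs to a curated set. The sketch below illustrates just that filtering step. It is not Blekko's code, and the `HEALTH_SLASHTAG` domains and the helper names are hypothetical, chosen for illustration only.

```python
from urllib.parse import urlparse

# Hypothetical curated "slashtag": a small set of health domains (illustrative, not Blekko's list).
HEALTH_SLASHTAG = {"cdc.gov", "who.int", "mayoclinic.org"}


def host_matches(host: str, domains: set[str]) -> bool:
    """True if host is one of the curated domains or a subdomain of one."""
    host = host.lower()
    return any(host == d or host.endswith("." + d) for d in domains)


def filter_by_slashtag(result_urls: list[str], domains: set[str]) -> list[str]:
    """Keep only the results whose host belongs to the curated site list."""
    return [u for u in result_urls if host_matches(urlparse(u).hostname or "", domains)]


results = [
    "https://www.cdc.gov/flu/swineflu/",
    "http://best-flu-cures-4u.example/click-here",
    "https://www.mayoclinic.org/diseases-conditions/flu",
]
print(filter_by_slashtag(results, HEALTH_SLASHTAG))
# ['https://www.cdc.gov/flu/swineflu/', 'https://www.mayoclinic.org/diseases-conditions/flu']
```

Matching on subdomains (`www.cdc.gov` counts as `cdc.gov`) keeps the curated list short, which fits the "70 or so sites" scale described above.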
The feature that I’ve found most useful is the ability to order search results. If you are doing searches by date, as my students were, Blekko allows you to add the slashtag “/date” to the end of your query and retrieve results in chronological order. Google does provide an option to search within a date range, but those are the dates when a website was indexed rather than created, which makes the results practically useless. Blekko makes an effort to index each page by the date on which it was actually created (by analyzing other information embedded in its HTML). So if I want to search for articles that mention my name, I can do a regular search, sort the results chronologically, limit them to tech blog sites or to any blog sites for a particular year, and perhaps narrow them further to references related to economics. Try doing any of this in Google or Bing.
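The passage says only that Blekko infers a page's creation date from information embedded in its HTML, without saying which information. As a rough sketch, assume one such signal is the Open Graph `article:published_time` meta tag. The code pulls that date out of each page and sorts results oldest-first, with undated pages placed last. The choice of tag and all function names are assumptions made for illustration.

```python
from datetime import date
from html.parser import HTMLParser


class PublishedDateParser(HTMLParser):
    """Records the first article:published_time meta value (assumed date signal)."""

    def __init__(self):
        super().__init__()
        self.published = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.published is not None:
            return
        a = dict(attrs)
        if a.get("property") == "article:published_time" and a.get("content"):
            self.published = a["content"]


def creation_date(html: str):
    """Return the page's creation date, or None if no usable date is embedded."""
    p = PublishedDateParser()
    p.feed(html)
    p.close()
    if p.published is None:
        return None
    try:
        # Keep only YYYY-MM-DD so timestamps with and without time zones compare cleanly.
        return date.fromisoformat(p.published[:10])
    except ValueError:
        return None


def sort_chronologically(pages: dict[str, str]) -> list[str]:
    """Order URLs by embedded creation date, oldest first; undated pages go last."""
    dated = [(creation_date(html), url) for url, html in pages.items()]
    with_date = sorted((d, u) for d, u in dated if d is not None)
    without = [u for d, u in dated if d is None]
    return [u for _, u in with_date] + without


pages = {
    "https://blog.example/b": '<meta property="article:published_time" content="2010-11-02T09:00:00Z">',
    "https://blog.example/a": '<meta property="article:published_time" content="2009-05-14">',
    "https://blog.example/c": "<p>no date here</p>",
}
print(sort_chronologically(pages))
# ['https://blog.example/a', 'https://blog.example/b', 'https://blog.example/c']
```

Truncating to the calendar date gives up intra-day ordering. In exchange, a mix of timestamp formats cannot break the sort.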
The problem is that content on the internet is growing exponentially, and the vast majority of this content is spam. It is created by unscrupulous companies that know how to manipulate Google’s page-ranking systems to get their websites listed at the top of your search results. When you visit these sites, they send you on to the websites of other companies that want to sell you their goods. (The spammers get paid for every click.) This is exactly what blogger Paul Kedrosky found when trying to buy a dishwasher. He wrote about how he began Googling for information…and Googling…and Googling. He couldn’t make head or tail of the results. Paul concluded that “the entire web is spam when it comes to major appliance reviews”.