Categories
Blog

Google search is not a programmer's best friend

I was playing around with Google this weekend. The original problem I wanted to solve was that last.fm returns strange release dates for albums, so I was writing a small script that would extract the correct release date from various sources. I was aiming at www.metal-archives.com and Wikipedia. Both of these sites have different search pages, and in general I’ve come to rely more on Google’s site:xxx functionality than on individual sites’ own search engines. So I thought, why not just use Google programmatically to search the sites? Seems easy enough.

Failure 1 (I’m feeling lucky):

Google has a very nice feature called “I’m feeling lucky” that will direct you to the first result. If I could specify my queries well enough, I could rely on that and not have to parse Google’s result page to get the URL. It’s very simple: you just add &btnI at the end of your query and Google will redirect. It works fine most of the time, but sadly sometimes it just fails to redirect you. I couldn’t find any pattern to this randomness, and a “works sometimes” solution is not a good one 😐
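For illustration, here is a minimal sketch of how such a query URL could be put together in Python. Only the &btnI switch and the site: operator come from the description above; the function name lucky_url, the https://www.google.com/search endpoint and the q parameter are my assumptions, and as noted the redirect itself can't be relied on.

```python
from urllib.parse import urlencode


def lucky_url(query, site):
    """Build an "I'm feeling lucky" search URL restricted to one site.

    Hypothetical helper: the endpoint and the q parameter are assumptions,
    only btnI and the site: operator are taken from the post.
    """
    params = {
        "q": f"site:{site} {query}",  # restrict the search to a single site
        "btnI": "",                   # the "I'm feeling lucky" switch
    }
    return "https://www.google.com/search?" + urlencode(params)


print(lucky_url("Master of Puppets", "wikipedia.org"))
```

Requesting that URL and looking at where you end up would then show whether the redirect happened this time or not.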

Failure 2 (Google AJAX):

I then found out that Google has a seemingly very nice API that lets you do queries and get JSON back. JSON is easy to work with, and it also lets you go through several results in case Google doesn’t return the right one first. After a bit of poking around I found out that the Google AJAX API randomly returns different results from the normal Google search. It’s like using Yahoo instead of Google. A bit more poking around turned up a two-year-old bug report about it. Furthermore, the TOS directly forbids using the API for this kind of activity. Oh well, it didn’t work anyway.

Failure 3 (parsing Google results directly):

After two bitter defeats I thought screw it, I’ll just parse the damn Google result pages, how hard can it be? At least I know that they give me the right results. So I did that, coded everything up and checked that things were working. Then I let it loose on my collection (2×275 requests), and around the middle it stopped working. I poked around a bit and found out that Google had identified my program as a bad boy and decided to spank it by returning a “Please identify yourself as a human” page instead of the normal Google result page.
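In hindsight, the script could at least have noticed the block page and stopped instead of ploughing on. A rough sketch of that idea, under loud assumptions: fetch_all, fetch and looks_blocked are hypothetical names, the actual downloading is left to whatever function you pass in, and the "BLOCKED" marker in the example stands in for whatever the real page contains. This does nothing to avoid getting blocked, it only makes the failure visible.

```python
import time


def fetch_all(queries, fetch, looks_blocked, delay=0.0):
    """Fetch each query in turn and stop at the first block page.

    fetch and looks_blocked are caller-supplied (hypothetical) callables.
    Returns (results, remaining) so the run can be resumed later.
    """
    results = {}
    for i, query in enumerate(queries):
        page = fetch(query)
        if looks_blocked(page):
            # Keep what we have and report which queries are still open.
            return results, queries[i:]
        results[query] = page
        time.sleep(delay)  # optional pause between requests
    return results, []


# Example with canned pages instead of real requests.
pages = {"a": "result a", "b": "BLOCKED", "c": "result c"}
done, left = fetch_all(["a", "b", "c"], pages.get, lambda p: "BLOCKED" in p)
print(done, left)
```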

As a side note, after three bitter defeats I was ready to jump ship and try Bing or Yahoo. That was a quick detour though, as neither of them was up to the challenge of returning good results.

Categories
funny

Gmail please fix your spam filter

Lately Gmail has become increasingly frustrating to use. When I check mail in the morning I have about 5 spam mails in my inbox. Even some from Viagra. Even worse, the number of false positives (ham classified as spam) has also been increasing, so that at least once a day I now have to go looking for legitimate mail in my spam folder. For a spam filter this is the worst-case scenario. Actually it seems like Gmail has trouble categorizing spam, as seen below. I hope it’s only a UI bug, but something deeper down is definitely rotten.

Categories
On the web

The right medium

Just finished reading Jeff Jarvis’s book What Would Google Do? It’s an interesting study of applying Google thinking to a wide range of other businesses. In the book he mentions the article Is Google Making Us Stupid?, which mulls over what happens if we can just search for everything: do we even need to remember anything anymore? And what happens if we shift our reading from books, to blog posts, to 140-character Twitter posts? Does that make us more stupid? Of course not! If anything, Twitter, blogs and Facebook are making us better at choosing the right medium to convey our message, that is to say, at communicating more efficiently. Some ideas are best presented in a book, some in a blog post and some on Twitter. Too often books could be cut in half (has anyone else noticed that the sweet spot for books seems to be about 200 pages?) and sometimes a blog post might as well have been a Twitter status update. It’s all a matter of choosing the right medium.

Luckily ideas can start out as a simple Twitter post or idle chat in the hallway and then turn into something bigger. We often play around with ideas at the IOLA office. Sometimes they end at the drawing board, either discarded or put into the ever-growing stack of fun ideas to try out when we have time, and sometimes they turn into something bigger, like Nemo or YayArt. The interesting part of course is always what happens when you show your ideas to the world. That is often the litmus test: will people take the idea and run with it, or was it dead before it even started?

Categories
On the web

Spank my monkey part 2

This is a list of scripts that I have collected over a long period of Greasemonkey use. The Firefox extension just keeps getting better and better.

Flickr

  • Allsizes+ – Now with nag screens galore, but still very handy for snatching higher-resolution pictures.
  • Cross recommendations – This really makes browsing random pictures on Flickr much more interesting.

Last.fm

IMDB

Google/Gmail