Saturday, June 13, 2009

Retrieve a URL from keywords in Python

I had an interesting problem at work this week. I was given a spreadsheet of 1400+ store names, but it did not include the corresponding website URLs, which I needed for a task. I could have waited for one of the writers to fill them in by hand, but the work is tedious and repetitive, and I was adamant about finding a programmatic way to do it.

The problem was solved in PHP with the help of the Google AJAX Search API. Since I am a big proponent of Python, here is the equivalent Python version:

import json, urllib

# Google AJAX Search API endpoint for web search (API version 1.0)
BASE = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&'
keywords = 'django'

# Build the query URL, fetch it and parse the JSON response
url = BASE + urllib.urlencode({'q': keywords})
response = json.load(urllib.urlopen(url))
results = response['responseData']['results']

# Show the url from the first search result
print results[0]['visibleUrl']
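
The snippet above looks up a single keyword. To cover the whole list of store names, the same lookup can be wrapped in a loop. What follows is only a rough sketch under a few assumptions: the store names have been exported to a one-column CSV, `search` is a hypothetical callable that takes a keyword string and returns the parsed JSON response (for example a function built from the code above), and a failed lookup may come back with an empty or missing `responseData`. The names `first_visible_url`, `lookup_all`, `read_names` and `fake_search` are made up for this illustration.

```python
import csv
import io


def first_visible_url(response):
    """Return the first result's visibleUrl, or None if there are no results."""
    data = response.get('responseData') or {}
    results = data.get('results') or []
    return results[0].get('visibleUrl') if results else None


def lookup_all(names, search):
    """Pair each store name with the first URL that search() finds for it."""
    return [(name, first_visible_url(search(name))) for name in names]


def read_names(csv_file):
    """Read non-empty store names from the first column of a CSV file object."""
    return [row[0].strip() for row in csv.reader(csv_file)
            if row and row[0].strip()]


def fake_search(keywords):
    """Stand-in for the real search call, with canned data for the demo."""
    canned = {
        'django': {'responseData': {'results': [
            {'visibleUrl': 'www.djangoproject.com'}]}},
    }
    return canned.get(keywords, {'responseData': None})


# In-memory stand-in for the exported spreadsheet column
names = read_names(io.StringIO(u"django\nno such store\n"))
pairs = lookup_all(names, fake_search)
```

Passing the search function in as an argument keeps the loop independent of the network, so it can be tried out on canned data before it is pointed at the real API.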

Not all of the search results gave me the exact URL for the store name, but 98% of the job was done automatically; the suspicious ones can be eyeballed and corrected.
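
One way to narrow down which results need eyeballing is a simple name check. This is a hypothetical heuristic, not what I actually used: it assumes a store's own site usually has the store name somewhere in its host name, and it flags everything else for manual review. The function names and the sample pairs below are invented for the illustration.

```python
import re


def normalize(text):
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r'[^a-z0-9]', '', text.lower())


def is_suspicious(store_name, visible_url):
    """Flag a result that is missing or whose host lacks the store name."""
    if not visible_url:
        return True
    return normalize(store_name) not in normalize(visible_url)


# Made-up (store name, visibleUrl) pairs, for illustration only
pairs = [
    ('Example Store', 'www.examplestore.com'),
    ('Other Shop', 'en.wikipedia.org'),
    ('Acme Hardware', None),
]
to_review = [name for name, url in pairs if is_suspicious(name, url)]
```

A check like this will also flag stores whose domain is an abbreviation of the name, so it only shortens the review list; it does not replace the manual pass.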
