Following from: "Reconstruct a Feed's History Using Google Reader"
"Google Reader is more than a feed reader: it's also a platform for feed caching and archiving. That means Google Reader stores all the posts from the subscribed feeds and they're available if you keep scrolling down in the interface."
But if you look carefully, it's more than that.
"...should return the latest 100 posts from this blog as an ATOM/XML file."
This works regardless of the original feed format. Universal Feed Parser mostly rocks, but if you want Google to do that work for you, it's pretty easy. You do need a Google Reader account; the script below takes the user name and password on the command line.
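To make the URL scheme concrete, here is a minimal sketch of how the Reader address for a given feed is put together. The two URL patterns are the ones used in the script below; the helper name `reader_atom_url` is made up for illustration, and like the script it does no percent-encoding of the feed address, which I'd guess matters for feeds with query strings.

```python
# URL patterns copied from the script in this post. Whether Reader still
# honors them is something only Google can confirm.
READER_ATOM = "http://www.google.com/reader/atom/feed/%s"
READER_ATOM_LIMITED = "http://www.google.com/reader/atom/feed/%s?r=n&n=%i"


def reader_atom_url(feed, count=None):
    """Return the Reader URL that serves `feed` as Atom (hypothetical helper)."""
    if count:
        # n=<count> limits how many entries come back
        return READER_ATOM_LIMITED % (feed, count)
    return READER_ATOM % feed


print(reader_atom_url("http://www.thenervousbreakdown.com/uogbuji/feed/", 10))
```

Fetching that URL still requires the authentication cookies described below.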
import sys
import urllib, urllib2

import httplib2 #easy_install httplib2
from amara import bindery


class feedconverter(object):
    gfeed = "http://www.google.com/reader/atom/feed/%s"
    gfeed_limited = "http://www.google.com/reader/atom/feed/%s?r=n&n=%i"
    reader_prep_uri = "http://www.google.com/reader/api/0/token"

    def __init__(self, user, passwd):
        '''
        user - Google e-mail including the "@gmail.com"
        passwd - password
        '''
        #FIXME: might as well use httplib2 here as well
        # get an AuthToken from Google accounts
        # http://code.google.com/apis/accounts/docs/AuthForInstalledApps.html#Parameters
        auth_uri = 'https://www.google.com/accounts/ClientLogin'
        authreq_data = urllib.urlencode({ "Email": user,
                                          "Passwd": passwd,
                                          "service": "reader",
                                          "source": "Amara demo",
                                          "accountType": "GOOGLE",
                                          #"continue": "http://www.google.com/",
                                        })
        auth_req = urllib2.Request(auth_uri, data=authreq_data)
        auth_resp = urllib2.urlopen(auth_req)
        auth_resp_body = auth_resp.read()
        # The body is a series of key=value lines; split only on the first "="
        # so that values containing "=" stay intact
        auth_resp_dict = dict(x.split("=", 1)
                              for x in auth_resp_body.split("\n") if x)
        self.auth = auth_resp_dict["Auth"].strip()
        self.sid = auth_resp_dict["SID"].strip()
        self.h = httplib2.Http()
        self.h.follow_all_redirects = True
        self._update_token()

    def _update_token(self):
        # Trade the SID cookie for a Reader API token
        headers = {'Cookie': 'SID='+self.sid}
        response, content = self.h.request(self.reader_prep_uri, 'GET', body=None, headers=headers)
        #print response, content
        # The token is the response body, not the headers object
        self.token = content

    def atomize(self, feed, count=None):
        '''
        feed - URL of the original feed, in any format
        count - optional limit on the number of entries returned
        '''
        headers = {'Cookie': 'SID=%s; T=%s'%(self.sid, self.token)}
        if count:
            response, content = self.h.request(self.gfeed_limited%(feed, count), 'GET', body=None, headers=headers)
        else:
            response, content = self.h.request(self.gfeed%(feed), 'GET', body=None, headers=headers)
        return content


feed = "http://www.thenervousbreakdown.com/uogbuji/feed/"
user = sys.argv[1]
passwd = sys.argv[2]
fc = feedconverter(user, passwd)
doc = bindery.parse(fc.atomize(feed, 10))
print doc.feed.title

I'd be grateful for any thoughts on improving this code. In particular, I tried to get it working with cookielib, without success. I think that would eliminate the external dependency on httplib2.
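One possible route, offered as an untested sketch rather than a fix: since the script already builds the Cookie header by hand, the same header could be attached to a plain urllib2 request, which would sidestep both cookielib and httplib2. This assumes Reader needs nothing beyond the SID and T cookies used above. The helper name `reader_request` and the placeholder values are made up, and the request is only built here, never sent.

```python
try:
    from urllib.request import Request  # Python 3
except ImportError:
    from urllib2 import Request  # Python 2, as in the script above


def reader_request(url, sid, token):
    """Build (but do not send) a request carrying Reader's cookies by hand."""
    cookie = 'SID=%s; T=%s' % (sid, token)
    return Request(url, headers={'Cookie': cookie})


# Placeholder values; real ones would come from ClientLogin and the token URI
req = reader_request(
    "http://www.google.com/reader/atom/feed/http://example.org/feed",
    "SID_VALUE", "TOKEN_VALUE")
print(req.get_header('Cookie'))
```

Passing the request to `urlopen` and calling `read()` on the result would then stand in for `self.h.request(...)`.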
And yes, there's a plug hidden in there for my new literary outlet, The Nervous Breakdown.
I'd never have figured out Google's crazy auth dance without help from this page, and especially the comment by pinowsky on Oct 04, 2008.
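The first step of that dance, reading the ClientLogin reply, can be tried out offline. ClientLogin answers with plain key=value lines (SID, LSID, Auth). The sketch below parses such a body the same way the script does; `parse_clientlogin` is a made-up name and the sample values are invented.

```python
def parse_clientlogin(body):
    """Turn ClientLogin's key=value lines into a dict."""
    # Split on the first "=" only, since a value may itself contain "="
    pairs = (line.split("=", 1) for line in body.splitlines() if "=" in line)
    return dict((key.strip(), value.strip()) for key, value in pairs)


sample = "SID=abc123\nLSID=def456\nAuth=ghi=789\n"  # made-up values
fields = parse_clientlogin(sample)
print(fields["Auth"])
```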
See also
http://wiki.python.org/moin/RssLibraries - summary of RSS parsers and tools for Python
http://www.intertwingly.net/blog/2008/09/27/Planet-Hopping - Sam Ruby with some rumination on FeedParser frailty
http://gdatatips.blogspot.com/2008/08/perform-clientlogin-using-curl.html
