Getting Google Reader to work with Livejournal's protected entries via CURL (note, hosting service required)

BLOT: (06 Sep 2010 - 10:18:59 PM)

I have been playing around with Google Reader today. I know that some of my friends use it a lot, and it offers me an alternative feed reader when I am on the road or at work, so it interests me. However, it had one big flaw that I could see. I have a small handful of friends, say about 10, who post semi-regular, protected updates to Livejournal, and Google Reader cannot see those. The obvious, and probably expected, answer was simply to check my Livejournal Friends page while on the road or at work, and that works; but part of me balked at the limitation and wanted to find a way around it. After an hour or two of tinkering, I came up with this idea (it shouldn't have taken this long, but I'll explain the process).

My first thought, once I discovered the issue, was to try to find some way to call the feed that allowed it to authenticate "on the fly", one of the old http://user:pass@address.com sort of solutions. That did not work. I looked around online for a solution, and the first thing I found was Yahoo! Pipes, which probably would work if it weren't so odd and strange a program. You can set up fairly complex inputs and outputs and connect them to various mash-up systems. Theoretically, I could set up an authenticated feed reader on Pipes, and then have it output the feed's contents as another feed, which could be read by Google Reader. My first and only attempt at working that through did not work and, for all I know, published my password to the internet.

This got me thinking, though: why not chew up the feed with another feed reader and then spit out a parsed xml file that I could do something with? I use Akregator, so I first went into its archives and found them in mk4 file format, which is a Metakit type. I was able to use the viewer to see the contents of the archive, and even read them if I wanted, but did not quite feel like compiling some sort of command line system that would directly interact with the mk4 files. It seemed possible, maybe even preferable, but eh. Not quite what I was looking to do. The next idea was a continuation of this: I thought about getting another feed reader. Something simple, maybe, like Rawdog, which had the added benefit of converting the feeds read into an HTML file via Python, which was about one step away from doing what I needed to do with them. Reading through the config files, though, it was missing a certain finesse. Not Rawdog itself, but the solution. Installing a new application just so that I could write something to fit on top of it that would do something else with its outputs? Finally, my brain went, "Oh, yeah, CURL [however you want to uppercase that]!" Well, it went "WGet!" and then I remembered that CURL was better at that sort of thing, and...anyway.

If you want to use my solution to this problem, you'll need the following bits:

cURL [I looked up the uppercasings like a good boy]
crontab
Some sort of hosting service that allows you to either execute scripts or to make uploads after the scripts are executed elsewhere
Google Reader
Livejournal account that is already friend approved (this isn't some magic trick to see otherwise protected beyond-your-reach journal entries, you have to have permission to see the entries in question)

Also note the following caveats, which I would rank of fair importance, considering.

You are going to be not so much side-stepping security as confirming your identity and then making your own copy of certain files. If you share these files, you can actually expose private journal entries. Be very careful about this and play nice. In scope, it is no worse than you printing out private entries or copy-and-pasting them, but it is possible that you can forget they were once private and then hit "share with note" or such on Google Reader without thinking about it.
You can set the system up to refresh about as often as you like, but I have it go off every hour or so. LJ's a grown server, and should be ok, but if you have a lot of feeds, they might look upon it suspiciously. There is no reason you couldn't set up some sort of system involving multiple crontab lines and multiple bash files, if you had more than you felt comfortable with running all at once.
At least in the method I am about to list below, it's going to expose a password and username to network sniffers and, if you put this on another server, possibly to that server's admins.

Ok, now that that is taken care of, this is how I set it up.

First, I created a bash script that had a series of cURL commands. Each one reads something like /usr/bin/curl --silent --digest -u username:password "http://lj-user.livejournal.com/data/atom?auth=digest" -o /path/to/hosted/file/lj-user.rss. One per line. Note the single dash on -u, and the quotes that keep the shell from trying to interpret the ? in the address. The --silent seemed to help, but may not be necessary. You'll need to toss the "--digest" in, or it doesn't seem to work (I have to admit I am not 100% sure what that means). The name of the file is whatever you want it to be, but keep in mind the first caveat, above. If you make it too obvious, then others might be able to access it and that's no good. Since it doesn't matter what the rss is named, you might could aim for something like a random sixteen-digit code that you change from time to time.
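For anyone who would rather see it laid out, here is a minimal sketch of what such a script might look like. It is not my exact file: the names LJ_USER, LJ_PASS, OUT_DIR, feed_url and fetch_feed are made up for illustration, the paths are the same placeholders as above, and it takes the journals as arguments instead of listing one cURL line per journal.

```shell
#!/bin/bash
# Sketch only: LJ_USER, LJ_PASS and OUT_DIR are placeholders to fill in,
# and feed_url / fetch_feed are hypothetical helper names.
LJ_USER="username"
LJ_PASS="password"
OUT_DIR="/path/to/hosted/file"

# Build the Atom feed address for one journal, asking LJ for digest auth.
feed_url() {
    printf 'http://%s.livejournal.com/data/atom?auth=digest\n' "$1"
}

# Download one journal's feed. $1 = journal name, $2 = output file name.
fetch_feed() {
    /usr/bin/curl --silent --digest -u "$LJ_USER:$LJ_PASS" \
        "$(feed_url "$1")" -o "$OUT_DIR/$2"
}

# Each argument is "journal:filename", for example lj-user:lj-user.rss.
# With no arguments the loop does nothing.
for pair in "$@"; do
    fetch_feed "${pair%%:*}" "${pair#*:}"
done
```

Saved as, say, fetch-feeds.sh (again, a made-up name), it would be called like ./fetch-feeds.sh lj-user:lj-user.rss another-user:some-hard-to-guess-name.rss. Keeping the output name separate from the journal name is what lets you follow the advice about unguessable file names.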

Once you get that file up and running (and don't, like me, forget the #!/bin/bash at the beginning) you'll stick that just about anywhere you please and then you'll want to crontab -e and make a line that reads something like 0 * * * * /path/to/script. If you are not doing this directly on the server, you'll then need to set up a second stage which automatically uploads the files to a server. I'll avoid going into that, but there are various options out there.
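If you would rather not edit the crontab by hand, something like the sketch below should do the same job. The helper names cron_entry and install_entry are invented for this example, /path/to/script is the same placeholder as above, and it assumes your crontab command accepts "-" to read the new table from standard input, which the common Linux versions do but which is worth checking on your system.

```shell
#!/bin/bash
# Sketch only: cron_entry and install_entry are hypothetical helper names.

# Print a crontab line that runs a script once an hour.
# $1 = minute of the hour, $2 = path to the script.
cron_entry() {
    printf '%s * * * * %s\n' "$1" "$2"
}

# Keep whatever is already in the crontab and add one new line.
# Assumes this crontab accepts "-" to read the table from stdin.
install_entry() {
    { crontab -l 2>/dev/null; cron_entry "$1" "$2"; } | crontab -
}

# Preview the line; nothing is installed until install_entry is called.
cron_entry 0 /path/to/script
```

Running install_entry 0 /path/to/script adds the hourly line. If you split your feeds across several scripts, as mentioned in the caveats, giving each one a different minute (0, 20, 40 and so on) keeps them from all hitting LJ at once.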

The next step, and essentially the last, is to go into Google Reader and then tell it the location of the files. Then you do all the various things you want to do like tag them or sort them. Google Reader seems to have its own little schedule dictating when it updates. While I get it to show entries this way, and it's kind of cool because it brings over the journal title and such, it takes a few minutes before it checks again. I bet there is some sort of Google algorithm that won't check sites that aren't super popular super often, so expect a *gasp* 10 or so minute lag time on the hour (or however often you have it set up). That's what I've been getting so far, but let me give it a day or two and I'll update to let you know if it is actually worse than this over a longer scale.

TAGS: Linux Tricks
