Google Reader to Librivox

From Tech
Jump to: navigation, search

This script goes to my shared google reader page and downloads the ogg files for any librivox audiobook on it. It keeps track of the downloaded audiobooks and makes sure to only download them once. If placed in cron, you can use google reader to share audiobooks that you are interested in, and they will be downloaded automatically for you the next time the script runs. If you want to run it for your google reader, replace the three instances of my url with your url. Note that it is in a regular expression and replace it accordingly.

For this script to work, you must use google reader, and you must be subscribed to the librivox rss feed: http://librivox.org/newcatalog/NewReleases.xml

#!/bin/bash
wget -q http://www.google.com/reader/shared/03682581869898758301 --output-document=feeds
toget=`grep "\"http://librivox.org/.*\"" feeds --only-matching`
nextpage=`grep "\"http://www.google.com/reader/shared/03682581869898758301.*\"" feeds --only-matching | sed -e 's/\"$//' | sed -e 's/^\"//'`
echo $nextpage
while [ -n "$nextpage" ]
do
for arg in $toget
do
url=`echo $arg | sed -e 's/\"$//' | sed -e 's/^\"//'`
downloaded=`grep $url downloaded.log`
if [ "$downloaded" == "" ]
then
echo "downloading $url"
wget -q $url --output-document=librivox.html
oggs=`grep "\"http://www.archive.org/download/.*ogg\"" librivox.html --only-matching`
title=`grep "<title>LibriVox   » .*</title>" librivox.html --only-matching | sed -e 's/<title>LibriVox   » //' | sed -e 's/<\/title>//'`
rm librivox.html
echo $title
mkdir "$title"
for oggfile in $oggs
do
oggurl=`echo $oggfile | sed -e 's/\"$//' | sed -e 's/^\"//'`
echo `basename $oggurl` >> "$title/$title.m3u"
wget $oggurl --output-document="$title/`basename $oggurl`"
done
echo $url >> downloaded.log
else
echo "skipping download of $url, already downloaded"
echo
fi
done
wget -q $nextpage --output-document=feeds
nextpage=`grep "\"http://www.google.com/reader/shared/03682581869898758301.*\"" feeds --only-matching | sed -e 's/\"$//' | sed -e 's/^\"//'`
toget=`grep "\"http://librivox.org/.*\"" feeds --only-matching`
done
Personal tools