[personal profile] kerravonsen posting in [community profile] perl
Those of us here who are fannish as well as geeky might be interested in this.

I have written my own fan-fiction downloader in Perl, which can be installed from CPAN as "WWW::FetchStory". There are probably Linux-isms in the code. (frown) For example, it uses the "wget" program to do the actual downloading.

But I would love other people to use the script! It has plugins (which I am calling "fetchers") for various fiction sites. Each fetcher knows how to download multi-chapter fics from its site, so you only have to give the table-of-contents URL for the fic and it will figure out the rest. Depending on the particular fetcher, it will get not just the title and author, but also the summary, the categories and the characters.
It also has an option to create an EPUB file rather than HTML files.
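
To give a rough idea of how a plugin scheme like this can choose a fetcher from a URL, here is a minimal sketch. It is not the actual WWW::FetchStory code: the table layout, the pick_fetcher function, the 'Default' fallback name and the URL patterns are all assumptions made for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical fetcher table. The names match fetchers listed in this post,
# but the URL patterns are illustrative guesses, not taken from the module.
my @fetchers = (
    { name => 'AO3',           pattern => qr{^https?://(?:www\.)?archiveofourown\.org/}i },
    { name => 'FanfictionNet', pattern => qr{^https?://(?:www\.)?fanfiction\.net/}i },
    { name => 'Teaspoon',      pattern => qr{^https?://(?:www\.)?whofic\.com/}i },
);

# Return the name of the first fetcher whose pattern matches the URL,
# or 'Default' (an assumed fallback) when no site-specific fetcher claims it.
sub pick_fetcher {
    my ($url) = @_;
    for my $fetcher (@fetchers) {
        return $fetcher->{name} if $url =~ $fetcher->{pattern};
    }
    return 'Default';
}

print pick_fetcher('http://archiveofourown.org/works/12345'), "\n";
print pick_fetcher('http://example.com/story.html'), "\n";
```

The first match wins, so more specific patterns would need to come before more general ones in the table.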

Currently, I have written fetchers for:

AO3: General fanfic archive.
Ashwinder: A Severus Snape/Hermione Granger HP fiction archive.
DigitalQuill: A Harry Potter fiction archive.
DracoAndGinny: A Draco Malfoy/Ginny Weasley HP fiction archive.
DreamWidth: Journalling site where some post their fiction.
FanfictionNet: Huge fan fiction archive.
FictionAlley: A Harry Potter fiction archive.
HPAdultFanfiction: An adult Harry Potter fiction archive.
LiveJournal: Journalling site where some people post their fiction.
Owl: A Harry Potter fiction archive.
PetulantPoetess: A Harry Potter fiction archive.
PotionsAndSnitches: A Severus Snape + Harry Potter gen fiction archive.
PotterPlace: A Harry Potter fiction archive.
SSHGExchange: Severus Snape/Hermione Granger fiction exchange comm.
TardisBigBang3: Round 3 of the TARDIS BigBang challenge.
Teaspoon: A Teaspoon And An Open Mind; a Doctor Who fiction archive.
TwistingHellmouth: Twisting The Hellmouth; Buffy The Vampire Slayer crossovers.

But every now and then, those sites change their code and the fetcher for that site breaks. (frown)

Also, for a number of those archives, you must be logged in if you want to download "adult" rated fic. The solution I devised for that is rather clumsy (and Linux-centric): it looks for a "cookies.txt" file in your home directory, which you need to export from your browser after logging in to the site.
If someone has a better solution, I would love to hear from you.
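
As a starting point for discussion, here is a small sketch of a less Linux-bound way to find and read that file, using only core Perl. It assumes the export is in the Netscape cookies.txt format (seven tab-separated fields per line); the home_dir and parse_cookie_lines helpers, and the USERPROFILE fallback, are made up for this example and are not part of WWW::FetchStory.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Spec;

# Guess the home directory without assuming a Unix-style $HOME.
# USERPROFILE is the usual Windows equivalent; fall back to the current dir.
sub home_dir {
    return $ENV{HOME} // $ENV{USERPROFILE} // File::Spec->curdir;
}

# Parse lines in the Netscape cookies.txt format: seven tab-separated
# fields (domain, subdomain flag, path, secure flag, expiry, name, value).
# Comment lines, blank lines and malformed lines are skipped.
sub parse_cookie_lines {
    my @lines = @_;
    my @cookies;
    for my $line (@lines) {
        $line =~ s/\r?\n\z//;
        next if $line =~ /^\s*(?:#|\z)/;
        my @fields = split /\t/, $line, 7;
        next unless @fields == 7;
        my %cookie;
        @cookie{qw(domain subdomains path secure expires name value)} = @fields;
        push @cookies, \%cookie;
    }
    return @cookies;
}

# Build the path portably instead of concatenating with "/".
my $file = File::Spec->catfile(home_dir(), 'cookies.txt');
if (open my $fh, '<', $file) {
    my @cookies = parse_cookie_lines(<$fh>);
    close $fh;
    printf "%d cookie(s) read from %s\n", scalar @cookies, $file;
}
else {
    print "No readable cookie file at $file\n";
}
```

If a non-core dependency is acceptable, HTTP::Cookies::Netscape (part of libwww-perl) reads the same file format and can be handed to LWP::UserAgent as its cookie_jar, which would keep the cookie file but drop the need for an external program.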

For the more geeky among you, the source is in my git repository at
I would LOVE people to contribute to it, whether that be fixing bugs, fixing documentation, improving fetchers, or writing new fetchers.

Date: 2011-08-02 01:42 am (UTC)
From: [personal profile] dreamatdrew
OK, I have not looked at your code yet, but there are ways to get around the external dependency on wget. And while said solutions would present the same problem (a separate file for the cookie jar), they would kill off some of the Linux-centrism. (He says while posting from one of 4 [in some way, shape or form] *nix boxes in his home)

I'll take a look at the code later and see what would work best, if you like?

Date: 2011-08-02 11:59 pm (UTC)
From: [personal profile] pauamma
Nitpick: Dreamwidth, not DreamWidth.
