Monument (marnanel) wrote in perl, 2010-01-22 11:57 am

Large amounts of data in CPAN packages

I have a CPAN package called Lingua::EN::Alphabet::Shaw which transliterates between the Latin and Shavian alphabets.  Currently it keeps the transliteration data, a few megabytes, in /usr, and it gets installed along with the package.  I was thinking that it would be useful to add the ability to download the data every so often from shavian.org.uk, but unless that involves me actually updating the CPAN package every week, it doesn't seem workable to keep the data in /usr, since the user generally doesn't have write permission there.  So either we keep one copy in /usr and any updates in /home, or we require the user to download the data on first use of the package rather than during installation.
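A minimal sketch of the "download every so often" idea, assuming the data is a single file at one URL (the $DATA_URL below is a made-up placeholder, not a real path on shavian.org.uk) and that the local copy lives somewhere the user can write, not /usr. It uses the core HTTP::Tiny module's mirror method, which only re-downloads when the server copy has changed, plus a hypothetical "$file.checked" stamp file so the server is asked at most about once a week.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical URL: the post does not say where on shavian.org.uk the
# data would live.
my $DATA_URL = 'http://shavian.org.uk/data.txt';
my $MAX_AGE  = 7 * 24 * 60 * 60;    # check at most once a week

# True if the stamp file is missing or older than $max_age seconds.
sub needs_refresh {
    my ($stamp, $max_age, $now) = @_;
    $now = time unless defined $now;
    my @st = stat $stamp;
    return 1 unless @st;                  # never checked before
    return $now - $st[9] > $max_age;      # $st[9] is the mtime
}

# Update $file (which must be somewhere writable, not /usr) from the
# server. mirror() sends If-Modified-Since, so an unchanged file is not
# downloaded again. The separate stamp records when we last asked,
# because mirror() may set $file's own mtime to the server's
# Last-Modified time.
sub refresh_data {
    my ($file) = @_;
    my $stamp = "$file.checked";
    return 1 unless needs_refresh($stamp, $MAX_AGE);

    my $res = HTTP::Tiny->new->mirror($DATA_URL, $file);
    unless ($res->{success}) {            # true for 2xx and for 304
        warn "could not update $file: $res->{status} $res->{reason}\n";
        return 0;                         # keep using the old copy
    }
    open my $fh, '>', $stamp or return 1; # rewriting the stamp updates its mtime
    close $fh;
    return 1;
}
```

If the download fails, the existing copy is left alone, so the module keeps working offline.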

I'm wondering what your opinion is of packages which do this.  I remember it was one of several annoyances I had with Lingua::Phoneme.

(Of course, the user might not even have much of a home directory, say if it was the Apache user, which makes things a bit more complicated.)


hatter, 2010-01-22 09:29 pm (UTC)
Install the default data in /usr from the package, plus a small script in /bin that grabs an updated data set, stores it in ~, and has it used in preference to the copy in /usr?
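The lookup order suggested here could look roughly like this sketch. The paths (/usr/share/shavian/data.txt and ~/.shavian/data.txt) and the function name find_data_file are hypothetical. The per-user copy is only considered when $HOME is set and points at a real directory, which also covers the Apache-user case from the original post: such users simply fall back to the system-wide copy.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical system-wide location for the data installed with the package.
my $SYSTEM_DATA = '/usr/share/shavian/data.txt';

# Return the data file to load: a per-user copy under $HOME (hypothetical
# path ~/.shavian/data.txt) wins over the system-wide default.
# Returns nothing if neither copy is readable.
sub find_data_file {
    my (%opt) = @_;
    my $home   = exists $opt{home}   ? $opt{home}   : $ENV{HOME};
    my $system = exists $opt{system} ? $opt{system} : $SYSTEM_DATA;

    my @candidates;
    # Users such as the Apache user may have no usable home directory;
    # in that case skip the per-user copy and use the system-wide one.
    if (defined $home && length $home && -d $home) {
        push @candidates, File::Spec->catfile($home, '.shavian', 'data.txt');
    }
    push @candidates, $system;

    for my $path (@candidates) {
        return $path if -r $path;
    }
    return;
}

# Usage:
# my $file = find_data_file() or die "no transliteration data found\n";
```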


the hatter

hatter, 2010-01-22 09:46 pm (UTC)
Oh noes, those precious megabytes! I doubt anyone would notice, even on devices that aren't full-size computers. Anyone that short of space could toast man/ and get a whole load more back much more easily.


the hatter

zorkian, 2010-01-23 08:57 am (UTC)
Agreed. Especially nowadays, unless you're on an embedded system, a few megabytes doesn't really matter. I'd rather the package ship with a default, up-to-date-ish set of data and then update a copy in ~ later.

At least that way it's always going to work. I'm picturing scenarios in which someone distributes the deb/rpm/whatever and installs the package on machines that don't necessarily have outbound Internet connectivity (production environments, f.ex.).

csjewell, 2010-03-17 07:55 am (UTC)
Wouldn't it be more logical to put the data in File::ShareDir::dist_dir('Lingua-EN-Alphabet-Shaw'), rather than assuming a /usr that does not exist on non-Unixen?
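A rough sketch of that suggestion, assuming the distribution installs its data into its share directory (for example with File::ShareDir::Install in Makefile.PL) under the hypothetical file name data.txt. File::ShareDir::dist_dir dies when it cannot find the directory, so the call is wrapped in eval; an updated per-user copy, if a directory for one is given, still takes precedence, as suggested earlier in the thread.

```perl
use strict;
use warnings;
use File::Spec;

# Directory holding the data installed with the distribution, or undef if
# File::ShareDir is unavailable or the share directory was not installed.
sub shipped_data_dir {
    my $dir = eval {
        require File::ShareDir;
        File::ShareDir::dist_dir('Lingua-EN-Alphabet-Shaw');
    };
    return $dir;
}

# Prefer an updated copy in $user_dir (a hypothetical per-user directory,
# e.g. somewhere under ~), otherwise fall back to the shipped copy.
# "data.txt" is a hypothetical file name.
sub data_file {
    my ($user_dir) = @_;
    if (defined $user_dir) {
        my $user = File::Spec->catfile($user_dir, 'data.txt');
        return $user if -r $user;
    }
    my $share = shipped_data_dir();
    return unless defined $share;
    my $shipped = File::Spec->catfile($share, 'data.txt');
    return $shipped if -r $shipped;
    return;
}
```

Because dist_dir resolves the path from wherever Perl installed the module, this works the same on Windows as on Unix.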