Sorry for the noise from Woodside NY.
After a most inspiring conversation with JMBC today, I decided to
make paper handles (papids) the keys in the plind JSON. I add the
relative file as a field 'F' in the data for each payload. Yes, that
duplicates the value, because all plods of a particular papid are
in the same relfi ... but ok. It's impure, but it seems to work.
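Here is a minimal sketch of that layout. It assumes the input arrives as hypothetical (papid, relfi, payload) triples. The papids, file names and the digest key 'D' are invented for illustration. The sketch groups payloads under their papid and copies the relfi into every payload as 'F', duplication included:

```python
import json
from collections import defaultdict


def build_plind(records):
    """Group payload records by papid, copying the relfi into each as 'F'.

    records: iterable of hypothetical (papid, relfi, payload_dict) triples.
    """
    plind = defaultdict(list)
    for papid, relfi, payload in records:
        entry = dict(payload)  # copy so the caller's dict is left untouched
        entry["F"] = relfi     # same value for every plod of this papid
        plind[papid].append(entry)
    return dict(plind)


# Hypothetical sample data; 'D' stands in for whatever payload data is stored.
records = [
    ("paper-a", "a/1.json", {"D": "d1"}),
    ("paper-a", "a/1.json", {"D": "d2"}),
    ("paper-b", "b/2.json", {"D": "d1"}),
]
plind = build_plind(records)
print(json.dumps(plind, indent=1))
```

Repeating 'F' in each payload means a consumer holding a single payload never has to look back up to its papid to find the file.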
--
Cheers,
Thomas Krichel http://openlib.org/home/krichel
skype:thomaskrichel
I just wrote a few lines of Python to summarize the plind:
archec@darni:$ plind_stats
1071893 papers
1447873 PDF payloads
1226498 plodis
The plodi is a payload digest, so basically this tells you how many
different payloads we have. My policy is to duplicate payloads if
they belong to different papers. While this wastes disk space,
anything else would make things harder for consumers of the data.
Having passed a million on all three figures is good; it should
make the funders happy.
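The script itself is not shown, but a summary like the one above could be computed along these lines. This is a sketch under stated assumptions. The plind is taken to map each papid to a list of payload dicts. Each payload dict is assumed to carry its plodi under a hypothetical key 'D'. SHA-1 is only a stand-in for whatever digest actually produces the plodis. The sample shows one PDF duplicated across two papers, giving three payloads but two plodis:

```python
import hashlib


def plodi(data: bytes) -> str:
    # Assumption: SHA-1 hex digest; the real plodi digest is not specified.
    return hashlib.sha1(data).hexdigest()


def plind_stats(plind):
    """Return (papers, payloads, distinct plodis) for a papid-keyed plind."""
    papers = len(plind)
    payloads = sum(len(plods) for plods in plind.values())
    plodis = len({plod["D"] for plods in plind.values() for plod in plods})
    return papers, payloads, plodis


pdf = b"%PDF-1.4 same file"    # one PDF that belongs to two papers
other = b"%PDF-1.4 other file"
plind = {
    "paper-a": [{"F": "a/1.json", "D": plodi(pdf)}],
    "paper-b": [{"F": "b/2.json", "D": plodi(pdf)},
                {"F": "b/2.json", "D": plodi(other)}],
}
papers, payloads, digests = plind_stats(plind)
print(f"{papers} papers")
print(f"{payloads} PDF payloads")
print(f"{digests} plodis")
```

This prints 2 papers, 3 PDF payloads and 2 plodis. The gap between payloads and plodis counts the duplicated copies.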