I just started to write a test WARC. Here is the start of a test file.

WARC/1.0^M
WARC-Type: warcinfo^M
WARC-Record-ID: <urn:uuid:1baaba9e-b976-11eb-aed6-901b0ef71694>^M
WARC-Date: 2021-05-20T14:17:28Z^M
Content-Type: application/warc-fields^M
Content-Length: 232^M
^M
operator: Thomas Krichel <krichel@openlib.org>
funder: Fondation Banque de France
project: Lebach, http://governance.repec.org/applications/lebach.docx
conformsTo: http://bibnum.bnf.fr/WARC/WARC_ISino_28500_version1_latestdraft.pdf
^M
^M
WARC/1.0^M
WARC-Type: resource^M
WARC-Target-URI: file:///RePEc/aah/aarhec/aarhec1988.rdf^M
WARC-Date: 2004-04-01T21:21:24Z^M
WARC-Record-ID: <urn:uuid:1baabcec-b976-11eb-aed6-901b0ef71694>^M
Content-Type: application/octet-stream^M
WARC-Block-Digest: sha1:4SQLBS5JEULWYJ7JUJEO5XXCFL5FS7XM^M
Content-Length: 4683^M
^M
Template-Type: ReDIF-Paper 1.0^M
Title: TWO PAPERS ON THE TEST OF LUCAS VARIABILITY HYPOTHESIS.^M
Author-Name: CHRISTENSEN, M.^M
Author-Name: PALDAM, M.^M
Keywords: tests ; supply ; economic theory ; demand^M

Overall this is looking good. Files can be stored as resource
records. The URI is the file path starting with RePEc. Sure, this is
not an absolute file name, but I don't think we need to be that
pedantic. The time on the resource is the time in the tarball that I
have. I will take care to also archive files with a ~ ending as if
they are versions of the file without the tilde. The UUID is the
same; I still have to find out why. I intend to add the tarball date
to the warcinfo fields.

The idea is to have one file per RePEc archive. Later, we will be
able to run this on a daily basis.

Comments on these choices are very welcome. A bad policy now will be
hard to undo!

I have written to Olaf and Jan about the need for me to have more
disk space. While I hope this project will save disk space, it's not
enough. The problem is that darni is 95% full.

--

Cheers,

Thomas Krichel
http://openlib.org/home/krichel
skype:thomaskrichel
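
For anyone who wants to comment on the record layout, here is a minimal Python sketch of how one resource record like the one above could be serialized. It is only an illustration under stated assumptions, not the script that produced the test file: the function name build_resource_record is hypothetical, and the choice of uuid.uuid4() for the record ID is an assumption. The two record IDs in the excerpt look like version-1 (time-based) UUIDs, which share everything except the low time bits when generated close together, so if that is what "the same" refers to, random UUIDs would avoid it.

```python
import base64
import hashlib
import uuid


def build_resource_record(target_uri: str, payload: bytes, warc_date: str) -> bytes:
    """Serialize one WARC/1.0 'resource' record: headers, blank line, payload.

    Hypothetical helper for illustration only.
    """
    # WARC-Block-Digest as in the excerpt: SHA-1 of the block, Base32-encoded.
    digest = base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {warc_date}",
        # Assumption: a random (version-4) UUID per record.
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: application/octet-stream",
        f"WARC-Block-Digest: sha1:{digest}",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines end in CRLF; an empty line separates headers from the block.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Each record is followed by two CRLFs.
    return head + payload + b"\r\n\r\n"


if __name__ == "__main__":
    record = build_resource_record(
        "file:///RePEc/aah/aarhec/aarhec1988.rdf",
        b"Template-Type: ReDIF-Paper 1.0\r\n",
        "2004-04-01T21:21:24Z",  # time of the file in the tarball
    )
    print(record.decode("utf-8"))
```

The date is passed in rather than taken from the clock, so the time stored in the tarball can be used for each file.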