Here is a status update on the issue. The update process in the update daemon ran until midnight server time but did not finish. By then it had processed 750 of 1356 archives. It was then interrupted by the nightly script job, which I had not disabled in crontab. The nightly script restarted the update daemon, which stopped the update process.

When I got up this morning, I requested an update again, with parameters to (hopefully) run quickly -- without thorough processing -- through the parts that were processed yesterday. My estimate is that the processing will take 20-28 hours from now to finish. I'll make sure the nightly script does not interfere this time.

Then I'll do some checks and probably some more selective updates via the update daemon, and then we will be ready to put the service back online.

-ivan

On Sat, Aug 20, 2011 at 8:43 PM, Ivan Kurmanov <duraley@gmail.com> wrote:
I started working on preparing (fixing) the Storable-serialized data of RAS for a proper (full) migration from nebka, and I was working with the live code and the live database. And I made a mistake. The mistake caused an important part of the data in the database -- the data column in the objects table -- to be overwritten with a value that was relevant to only one of these records. In other words, I put something that looks like proper document details into the description of a large number of other documents. I don't know how many records were affected, but I estimate probably at least several thousand.
When I realized what was going on, I aborted the operation and killed the mysql thread that was doing the job.
And before that, via the same mistake, I had also overwritten all institution details in the DB.
This corruption means that wrong data would be shown to users, specifically in research profile suggestions and in the institutions search.
With Thomas' help, I took RAS down and put the Service Temporarily Unavailable page online instead. At the same time I disabled most of the RAS-related cronjobs in the aras account.
And I started a full update of RePEc in the update daemon, which should overwrite the corrupted data with correct data taken from the files. But this update may take days to complete. That's why I disabled the cronjobs, to have as few concurrent jobs as possible. I don't have a better estimate now. I'm watching the update daemon log, but I don't expect it to finish soon anyway.
-ivan