The machine survived overnight. It passed all tests on the ACIS side. I am now progressively restoring services. The RI daemon is now running. I did a run of /home/aras/acis/bin/nightly >>/home/aras/nightly.log 2>&1, which is scheduled in crontab to run at 23:45, i.e. just after the last known instructions before the crashes. It worked well. I have not reestablished the following services:

  #
  # Report, backup, rotate, archive
  #
  54 23 * * * /home/aras/acis/bin/nightly >>/home/aras/nightly.log 2>&1
  #
  # Make RePEc:ras (RePEc:per) archive
  #
  */9 * * * * cd /home/aras/acis && /home/aras/acis/bin/make-repec-per.sh
  #
  # Clean up old ACIS user sessions
  #
  */26 * * * * /home/aras/acis/bin/clean-up >> /home/aras/acis/clean-up.log
  #
  # Update daemon database checkpoint
  #
  27 * * * * cd /home/aras/lib/bdb/bin ; ./db_checkpoint -1 -h /home/aras/acis/RI/data && ./db_archive -d -h /home/aras/acis/RI/data

I am holding off the rest for the moment. Should we revert the DNS record so that people can connect now?

On Mon, 28 Jan 2008, Ivan Kurmanov wrote:
Sounds hopeful.
There is also a job or two in crontab of user adrepec.
As root, run "crontab -lu adrepec".
ivan
On 28 Jan 2008, at 22:27, Christian Zimmermann wrote:
I looked everywhere in the logs, I see nothing wrong. There are some indications of corrupt mysql tables, but when I checked those used by RAS after the first crash, they were fine. Maybe there are corrupt tables elsewhere. I have not yet run the checks, I'll try this evening.
I commented out the crontab entries in the root and aras accounts with '#CZ'. Let's see whether the machine survives the night. If so, and nobody else sees a problem, we should gradually bring the service back. The first thing would be to get adrepec current. Then open the web server to users. Then get CitEc data back. Does this make sense?
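For reference, marking crontab entries with a tag such as '#CZ' and removing the tag later can be sketched as two small shell filters. This is only an illustration, not necessarily how the entries on nebka were edited: the function names cz_disable and cz_enable are hypothetical, and only the '#CZ' marker comes from the message above.

```sh
#!/bin/sh
# Hypothetical helpers; only the '#CZ' marker is taken from the thread.

# Prefix every active crontab line with '#CZ'.
# Blank lines and existing comment lines are passed through unchanged.
cz_disable() {
    sed -e '/^[[:space:]]*$/b' -e '/^[[:space:]]*#/b' -e 's/^/#CZ/'
}

# Remove the '#CZ' prefix again; ordinary comments are left alone.
cz_enable() {
    sed -e 's/^#CZ//'
}
```

Both functions read a crontab on standard input and write the result to standard output, so they can be tried on a saved copy first (crontab -l > crontab.bak) before piping the output back with "crontab -l | cz_disable | crontab -". Re-enabling only some jobs, as opposed to all of them, would still need hand editing.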
On Mon, 28 Jan 2008, Christian Zimmermann wrote:
First things I see: both crashes happened exactly at the same time:
Jan 17 23:09:01 nebka /USR/SBIN/CRON[14205]: (aras) CMD (cd /home/aras/acis && /home/aras/acis/bin/make-repec-per.sh )
Jan 17 23:10:01 nebka /USR/SBIN/CRON[14237]: (www-data) CMD ([ -x /usr/lib/cgi-bin/awstats.pl -a -f /etc/awstats/awstats.conf -a -r /var/log/apache/access.log ] && /usr/lib/cgi-bin/awstats.pl -config=awstats -update >/dev/null)
Jan 17 23:10:01 nebka /USR/SBIN/CRON[14238]: (root) CMD (test -x /usr/lib/atsar/atsa1 && /usr/lib/atsar/atsa1)
Jan 17 23:15:01 nebka /USR/SBIN/CRON[14474]: (root) CMD ([ -x /usr/lib/sysstat/sa1 ] && { [ -r "$DEFAULT" ] && . "$DEFAULT" ; [ "$ENABLED" = "true" ] && exec /usr/lib/sysstat/sa1 $SA1_OPTIONS 1 1 ; })
Jan 17 23:16:01 nebka /USR/SBIN/CRON[14476]: (aras) CMD (/home/aras/acis/bin/apu 7 >>/home/aras/apu-job.log 2>&1)
Jan 17 23:17:01 nebka /USR/SBIN/CRON[14489]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jan 17 23:18:01 nebka /USR/SBIN/CRON[14492]: (aras) CMD (cd /home/aras/acis && /home/aras/acis/bin/make-repec-per.sh )
Jan 17 23:20:01 nebka /USR/SBIN/CRON[14547]: (www-data) CMD ([ -x /usr/lib/cgi-bin/awstats.pl -a -f /etc/awstats/awstats.conf -a -r /var/log/apache/access.log ] && /usr/lib/cgi-bin/awstats.pl -config=awstats -update >/dev/null)
Jan 17 23:20:01 nebka /USR/SBIN/CRON[14548]: (root) CMD (test -x /usr/lib/atsar/atsa1 && /usr/lib/atsar/atsa1)
Jan 17 23:22:01 nebka /USR/SBIN/CRON[14703]: (root) CMD (du -cs /* > du_slash_`date -I`)
Jan 18 14:04:08 nebka syslogd 1.4.1#18: restart.
...
Jan 17 23:09:01 nebka /USR/SBIN/CRON[14205]: (aras) CMD (cd /home/aras/acis && /home/aras/acis/bin/make-repec-per.sh )
Jan 17 23:10:01 nebka /USR/SBIN/CRON[14237]: (www-data) CMD ([ -x /usr/lib/cgi-bin/awstats.pl -a -f /etc/awstats/awstats.conf -a -r /var/log/apache/access.log ] && /usr/lib/cgi-bin/awstats.pl -config=awstats -update >/dev/null)
Jan 17 23:10:01 nebka /USR/SBIN/CRON[14238]: (root) CMD (test -x /usr/lib/atsar/atsa1 && /usr/lib/atsar/atsa1)
Jan 17 23:15:01 nebka /USR/SBIN/CRON[14474]: (root) CMD ([ -x /usr/lib/sysstat/sa1 ] && { [ -r "$DEFAULT" ] && . "$DEFAULT" ; [ "$ENABLED" = "true" ] && exec /usr/lib/sysstat/sa1 $SA1_OPTIONS 1 1 ; })
Jan 17 23:16:01 nebka /USR/SBIN/CRON[14476]: (aras) CMD (/home/aras/acis/bin/apu 7 >>/home/aras/apu-job.log 2>&1)
Jan 17 23:17:01 nebka /USR/SBIN/CRON[14489]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jan 17 23:18:01 nebka /USR/SBIN/CRON[14492]: (aras) CMD (cd /home/aras/acis && /home/aras/acis/bin/make-repec-per.sh )
Jan 17 23:20:01 nebka /USR/SBIN/CRON[14547]: (www-data) CMD ([ -x /usr/lib/cgi-bin/awstats.pl -a -f /etc/awstats/awstats.conf -a -r /var/log/apache/access.log ] && /usr/lib/cgi-bin/awstats.pl -config=awstats -update >/dev/null)
Jan 17 23:20:01 nebka /USR/SBIN/CRON[14548]: (root) CMD (test -x /usr/lib/atsar/atsa1 && /usr/lib/atsar/atsa1)
Jan 17 23:22:01 nebka /USR/SBIN/CRON[14703]: (root) CMD (du -cs /* > du_slash_`date -I`)
Jan 18 14:04:08 nebka syslogd 1.4.1#18: restart.
du /* seems to be the tripping point.
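The same reading of the log (find the last entry written before syslogd came back up) can be sketched as a small filter. This is a rough illustration under assumptions: the function name last_before_restart is hypothetical, and the pattern only relies on the "syslogd ... restart" wording visible in the excerpts above.

```sh
#!/bin/sh
# Hypothetical helper: for every "syslogd ... restart" line on stdin,
# print the line that was logged immediately before it.
last_before_restart() {
    awk '/syslogd .*restart/ { if (prev != "") print prev }
         { prev = $0 }'
}
```

It would be fed the relevant log file on standard input, e.g. "last_before_restart < /var/log/syslog" (the path is an assumption about the system). Not every syslogd restart follows a crash, since log rotation can also restart it, so the timestamps still need a look: here the gap runs from 23:22 to 14:04 the next day.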
Christian Zimmermann                          FIGUGEGL!
Department of Economics
University of Connecticut
341 Mansfield Road, Unit 1063
Storrs, CT 06269-1063
http://ideas.repec.org/zimm/
christian.zimmermann@uconn.edu
http://ideas.repec.org/e/pzi1.html
On Mon, 28 Jan 2008, Christian Zimmermann wrote:
Tim seems to have put nebka back online, and it seems to be spewing out emails. I will comment out everything in crontab and kill whatever is running so that we can investigate the problems.
_______________________________________________ RAS-run mailing list RAS-run@lists.openlib.org http://lists.openlib.org/cgi-bin/mailman/listinfo/ras-run
-ivan