Christian Zimmermann writes
> I am back and stand ready to run to the server farm if necessary.
With the 11-hour time difference, I was in bed. I have been thinking a bit more.

I remember that when I had a similar problem with raneb, there were only 12 or 40 bad blocks, but they caused the disk to crash. Now that the offending disk has been replaced, it is all quiet on the raneb front. I would therefore suggest that the troubles come from the bad blocks.

The way I understand disks, decay is exponential. Most modern disks have some spare space that is hidden from the o/s. When bad blocks appear, the drive moves the data from the bad blocks to healthy ones, in a way that is transparent to the o/s. When there are too many bad blocks, the o/s starts seeing them, and that is when Linux gets rather merciless; it does not take hardware issues lightly. So even with 3 visible bad blocks, we need to get rid of the disk; software updates will not help.

e2fsck has a -c option that will scan for bad blocks and mark them as bad, so that they are not used by the o/s. When we run this on startup, with the root file system mounted read-only, it should mark the bad blocks. If we then immediately (so that no further bad blocks appear in the meantime) rsync the files from sda to sdb, make sdb bootable, and swap the disks to boot from sdb, we should be fine. I sketch the commands in a PS below.

I did such an operation locally and can give further instructions if you agree with the general course of action.

Cheers,

Thomas Krichel                    http://openlib.org/home/krichel
                                  RePEc:per:1965-06-05:thomas_krichel
phone: +7 383 330 6813            skype: thomaskrichel
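
PS: Here is a rough sketch of the commands I have in mind, to make the plan concrete. The device names (/dev/sda1 for the root filesystem, /dev/sdb and /dev/sdb1 for the spare disk) and GRUB as the bootloader are assumptions on my part; check them against the actual layout before running anything.

  # 1. At startup, with the root filesystem mounted read-only,
  #    scan for bad blocks. -c runs a non-destructive read-only
  #    badblocks test and adds anything it finds to the
  #    filesystem's bad block list, so the o/s avoids them.
  e2fsck -c /dev/sda1

  # 2. Give sdb the same partition table as sda, then create a
  #    fresh filesystem on it (ext3 here; adjust to taste).
  sfdisk -d /dev/sda | sfdisk /dev/sdb
  mke2fs -j /dev/sdb1

  # 3. Copy everything across, preserving permissions, ownership,
  #    timestamps and hard links, without crossing into other
  #    mounted filesystems.
  mount /dev/sdb1 /mnt
  rsync -axH --delete / /mnt/

  # 4. Make the new disk bootable (this is the GRUB legacy
  #    invocation; a lilo setup would differ).
  grub-install --root-directory=/mnt /dev/sdb

  # 5. Swap the disks physically. The clone then comes up as sda,
  #    so a device-name based /etc/fstab should need no change.

After step 1 it is worth checking dmesg for fresh I/O errors before starting the copy, since the whole point is to rsync before more blocks go bad.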