Christian Zimmermann writes
Tim finally got it to work. I must have done something in the RAID configuration utility that erased the tables on sdb1.
Oh great.
The current state of the system is: kernel 2.6, ext3 filesystem with dir_index feature, empty sdb1, boot and root on sda1.
Note: Tim also added the dir_index feature to sda1. This allows better handling of large filesystems, but works only with 2.6.
Tim is convinced, and I agree, that we do not have a hard drive problem. The problem is software related and has to do with the fact that there is an awful lot of disk I/O going on on this machine. We should assess all the rsyncs and such running and see whether they are necessary, and whether they are needed at the current frequency.
Yes, it is an i/o related software issue. Linux kernels don't handle hardware problems gracefully, but horribly. This also applies to bad disks. To solve the issue, you either rewrite the Linux kernel, or you get a new disk.
We should also be using the second drive to distribute the I/O load optimally across the two drives. Say, put only /home on sdb1, or only /home/aras.
My sense is that this strategy would also be valid on raneb, snefru, etc., which seem to have disk emergencies more often than usual...
No. There is no space on them; the close to 1 TB of disk space on raneb and snefru is used up by backups. But there is no backup of nebka because of your bedevilling of rsync. Snefru has had no disk problem. Raneb had it, it was bad blocks, changed disk, all clear. Chichek had it, it was bad blocks, changed disk, all clear. Fafner had it, it was bad blocks, changed disk, all clear. In the meantime, I keep backups. The fact that we were able to do the entire rsync after marking the bad blocks as bad demonstrates that when the system does not hit the bad blocks, it works. When the next bad block comes along, it will go belly up again. In my experience, the more bad blocks you have, the more bad blocks you get.
1) Put the RePEc Author Service back online. We have recently been getting 15-40 new authors a day signing up. We do not want to discourage new users.
2) Think hard about how to optimize disk load.
3) Only then implement a new strategy.
The first priority should be a complete backup, daily. More rsync, not less.

Cheers,

Thomas Krichel http://openlib.org/home/krichel
RePEc:per:1965-06-05:thomas_krichel
phone: +7 383 330 6813 skype: thomaskrichel