----- Forwarded message from Bob Parks <bparks@artsci.wustl.edu> -----
Delivery-date: Wed, 27 Feb 2008 23:57:15 +0600
From: Bob Parks <bparks@artsci.wustl.edu>
To: Thomas Krichel <krichel@openlib.org>
Subject: Re: [RAS] badblocks

Thomas Krichel wrote:
Bob Parks writes
My memory is different about raneb.
We had a bad disk. When we replaced it, it was fine.
All of the errors seem to melt away when some of the crons are disabled - such as Christian did with the ones involving du.
So what does this conclude? A software problem?
Yes, IMHO. As Christian wrote earlier about nebka, there are limits to directory sizes. He seemed to indicate that a cron job with du might have been the entire problem. We have had similar problems in the past.
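One way to check whether oversized directories are what makes the du crons so heavy is to count the entries in each directory. What follows is only a minimal sketch, assuming Python is available on the machine. The function name large_directories and the threshold of 10000 entries are illustrative assumptions, not values taken from nebka.

```python
import os


def large_directories(root, threshold=10000):
    """Return (path, entry_count) pairs for directories under root
    that hold more than `threshold` direct entries, largest first."""
    results = []
    # os.walk lists each directory once; unreadable directories are skipped.
    for dirpath, dirnames, filenames in os.walk(root):
        count = len(dirnames) + len(filenames)
        if count > threshold:
            results.append((dirpath, count))
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```

Calling large_directories("/some/path") on the tree that the du cron scans would list the directories worth looking at first. The path here is a placeholder.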
Get rid of THE disk does not compute. There are two disks, and in a configuration that is as fault tolerant as it gets.
But it will break at some stage. The badblocks show it's broken.
There are bad blocks on every disk. Bad blocks, unless there are a large number of them, do not show that the 'disk' is failing. And again, this is a mirrored disk, two disks, in Raid 1, with a hardware controller. Now that I think on it, it is not clear which badblocks on which disk are being reported by the Adaptec controller. Note that nearly identical hardware exists on Bill's RFE machine and has never shown an error. You have had problems on nebka and snefru (identical hardware) and on raneb (very different hardware). That alone leads me to suspect software.
Up to you and Christian but I believe this is not the solution. Bob
What is the solution?
As Christian has done, carefully bring the machine back to life without all the crons and add the crons sparingly. I have not heard of any more problems with nebka since he did that, and it is on the same Raid 1 two-disk mirror. If you do decide to make the 143 gig bootable, Christian should, after a time, boot and enter the Adaptec controller. Then break the 'container' which has the two 68 gig disks, and then you can have two 68 gig disks, check them individually, and gain 68 gig of space. In the end, it is your choice. Bob
Cheers,
Thomas Krichel http://openlib.org/home/krichel RePEc:per:1965-06-05:thomas_krichel phone: +7 383 330 6813 skype: thomaskrichel
----- End forwarded message -----