----- Forwarded message from Bob Parks <bparks@artsci.wustl.edu> -----
Date: Thu, 28 Feb 2008 00:47:49 +0600
From: Bob Parks <bparks@artsci.wustl.edu>
To: Thomas Krichel <krichel@openlib.org>
Subject: Re: [RAS] badblocks

Thomas Krichel wrote:
> Bob Parks writes
>> Yes, IMHO. As Christian wrote earlier about nebka, there are limits to directory sizes. He seemed to indicate that a cron job running du might have been the entire problem. We have had similar problems in the past.
> my theory: du puts stress on the disk, it hits the bad block, and bang!
Possible, very possible.
>> There are bad blocks on every disk. Bad blocks, unless there are a large number of them, do not show that the 'disk' is failing. And again, this is a mirrored disk: two disks in RAID 1 with a hardware controller. Now that I think about it, it is not clear which disk's bad blocks are being reported by the Adaptec controller.
> my theory: the disk is one disk to the o/s.

Yes it is, but a bad block is a physical-disk concept - but who knows what evil lurks in the depths.
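As an aside on where those reported bad blocks live: with a hardware RAID 1 controller the o/s only sees the logical volume, so checking the member disks usually means asking the controller itself or using SMART. A minimal sketch of the SMART side, assuming smartmontools is available (the device name is an assumption, and a hardware controller may need a vendor tool or a pass-through option instead):

```shell
#!/bin/sh
# Sketch: query SMART health for a physical disk. Behind a hardware
# RAID controller, /dev/sda is the logical volume, so this may need a
# controller-specific pass-through; the device name is an assumption.
DEVICE=${DEVICE:-/dev/sda}

# Reallocated_Sector_Ct in the output is the count of sectors the
# drive has remapped to spares; a climbing value means the disk is
# eating through its spare blocks.
SMART_CMD="smartctl -a $DEVICE"

echo "$SMART_CMD"   # printed rather than run, so the sketch is safe
```

The point of looking at the SMART attributes rather than at o/s-visible bad blocks is exactly the spare-block issue discussed below: the drive remaps silently until the spares run out.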
>> Note that nearly identical hardware exists on Bill's RFE machine and there has never been an error. You have had problems on nebka, snefru (identical hardware), and raneb (very different hardware). That alone leads me to suspect software.
> I don't remember a problem on snefru. The common file sets are the adrepec files (common on raneb, sahure, fafner, nebka, mutabor) and the citec files, common on mutabor, raneb, snefru, sahure, fafner (yes, I back up!). What I think is happening is what's written in section 27.2.4, "badblocks and e2fsck", of http://eduunix.ccut.edu.cn/index/html/linux/OReilly.LPI.Linux.Certification....
> They say: "When a disk is failing, it will usually get an exponential increase in bad blocks, and after a short while it will run out of spare blocks, whereupon you will get into trouble with your filesystems on that disk."
> It has already run out of spare blocks; that's why some bad blocks show up to the o/s.
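For reference, the remedy that LPI section describes boils down to having e2fsck run badblocks and record the hits. A minimal sketch, assuming an ext2/ext3 filesystem on a partition name that is purely illustrative, and which must be unmounted (or the box booted from rescue media) first:

```shell
#!/bin/sh
# Sketch: have e2fsck invoke badblocks(8) and add any bad blocks it
# finds to the filesystem's bad-block inode, so ext2/ext3 stops
# allocating them. DEVICE is an assumption; unmount it before running.
DEVICE=${DEVICE:-/dev/sda1}

# -c does a read-only badblocks scan; -cc does a slower,
# non-destructive read-write scan instead.
FSCK_CMD="e2fsck -c $DEVICE"

echo "$FSCK_CMD"   # printed rather than executed: running it on a
                   # mounted filesystem would be dangerous
```

This only papers over blocks the drive can no longer remap itself; once a disk is at that stage, copying off it is the safer move.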
Could very well be - the eduunix.ccut.edu page is very good, and I will go with your theory. I will be interested to know just how you rsync to the 143 gig disk and then make it bootable.

Bob
> Cheers,
>
> Thomas Krichel                  http://openlib.org/home/krichel
>                                 RePEc:per:1965-06-05:thomas_krichel
> phone: +7 383 330 6813          skype: thomaskrichel
_______________________________________________
RAS-run mailing list
RAS-run@lists.openlib.org
http://lists.openlib.org/cgi-bin/mailman/listinfo/ras-run