Pages

Monday, August 16, 2010

How to properly run fsck on (/) root or other partitions including LVM

As it is an important issue to deal with low level thing in the server archtecture. Being an GNU/Linux administrator/NOC/Ops one has to have the clear cut understanding what they are doing.Because handling the production box require lot of common sense and in depth knowlegde about the platform/OS.

So without much ado lets play with it or let me show you the simple tricks.

I do not issue any guarantee that this will work for you.

So the first question come into the mind why the hell you need to check the filesystem?? Specially the root(/) part of it...sound pretty dull and boring...huh..please don't ignore this.You know ignorance is a sin...so do not commit it.

Now filesystem can be corrupted in various ways..few common ways are :

1) Not properly shutdown the server(although most of the cases journaling will do the healing)

2) Sudden power cut left your system down with lot of processing going on

3)Somebody has done something special(bad sense) to corrupt the data on that particular partition.

It is a bad idea and not recommended to run fsck(yes,this is the inbuilt tool you need to use)the mounted partition or drive.So don't do that.

Now, running fsck on other partition like /home,/var,/usr ...

First and foremost thing to be done is get into a single user mode..how do you do that?

ok once you type init 1 at the terminal prompt you will be taken to the singe user mode.From there simply unmount the partions as show below:

root@bhaskar-laptop_08:16:36_Mon Aug 16:/home/bhaskar # init 1 ---> this will bring to the single user mode

root@bhaskar-laptop_08:16:36_Mon Aug 16:/home/bhaskar # umount /dev/sda2 ---> assuming this partion hold the /home section.

Now run the fsck:


root@bhaskar-laptop_08:18:02_Mon Aug 16:/home/bhaskar # fsck -yfv /dev/sda2


Ok..let me explain the flags or switch I passed with the fsck .

y------> it will try to detect and fix any filesystem related corruption without manual intervention.

f-----------> this will force check even the system check says it's clean.

v--------> It will provide you the verbose explanation what that comming going through on the terminal screen.

Now a major problem in our hand. That we find out that root(/)partition of the filesystem gor corrupted due to some reasons.So we need to fix that issue to get back the system as soon as possible on the track.

For this kind of problem..it significant that on a mounted system you just cannot run fsck...as I said earlier..becauase it will corrupt the data on it.So we need a installation cd/dvd for our rescue. The first cd/dvd will do the job for us or get a systemrescuecd to do that.

Once you boot with one of those cd/dvd and put the below text at the command prompt it presents:

#linux rescue nomount

Now once you fire that one you are on the prompt so you can begin work on that.First we need to do is fire a mknod command.Now ask me why need to do that???

Because we had passed the option nomount in the last section so it will not parse any file system or it will not initialize any filesystem or create any device to operate on.If you try to run fsck now it will fail.

So to run correctly the fsck to on a filesystem we need to create device file for that.For that we need to run mknod.But to use mknod we need to know the Major number and Minor number of the device.Lets get those number...wait before that I need to tell you few thing about what Major number and Minor number of a device and how it signifies.

What is Major Number and Minor number??

Traditionally, the major number identifies the driver associated with the device. For example, /dev/null and /dev/zero are both managed by driver 1, whereas virtual consoles and serial terminals are managed by driver 4; similarly, both vcs1 and vcsa1 devices are managed by driver 7. Modern Linux kernels allow multiple drivers to share major numbers, but most devices that you will see are still organized on the one-major-one-driver principle.

The minor number is used by the kernel to determine exactly which device is being referred to. Depending on how your driver is written, you can either get a direct pointer to your device from the kernel, or you can use the minor number yourself as an index into a local array of devices. Either way, the kernel itself knows almost nothing about minor numbers beyond the fact that they refer to devices implemented by your driver.

So it's clear?? right.lets move on we need to find out the major number and minor number of the device to run mknod:

root@bhaskar-laptop_08:42:30_Mon Aug 16:/home/bhaskar # ls -al /dev/sda
brw-rw---- 1 root disk 8, 0 Aug 16 07:15 /dev/sda

See it will look like this...as 4the and 5th column holds the major number and minor number.Now create the device file:

#mknod /dev/sda b 8 0

It will create the device file.Once it's done you are safe to run fsck on that particular partition holding your root(/) filesystem.

#fsck -yfv /dev/YourRootPartition(sda,hda,....)

Now lets have some fun with LVM.

We need few tools to manipulate that kind of partition which will provide the lvm package within the os or in come inbuilt with other rescue cd.

We need to find out physical disk,volume group and logical partition ..where we are going to run fsck..right?

pvscan :Physical scanning of particular disk

root@bhaskar-laptop_08:44:06_Mon Aug 16:/home/bhaskar # pvscan
PV /dev/sda8 VG bhaskarlaptop lvm2 [46.15 GiB / 21.15 GiB free]
Total: 1 [46.15 GiB] / in use: 1 [46.15 GiB] / in no VG: 0 [0 ]

vgscan :Volume group scanning

root@bhaskar-laptop_08:50:24_Mon Aug 16:/home/bhaskar # vgscan
Reading all physical volumes. This may take a while...
Found volume group "bhaskarlaptop" using metadata type lvm2

lvscan :Logical volume scanning

root@bhaskar-laptop_08:52:00_Mon Aug 16:/home/bhaskar # lvscan
ACTIVE '/dev/bhaskarlaptop/data' [25.00 GiB] inherit

Now it is not activates then you need to activate the specific logical volume like this:

#lvchange -ay "yourLogicalVolume"

The final step:

Run the fsck on logical volume:

#fsck -yfv /YourLogicalVolume


Hope this will help.

Cheers!
Bhaskar

2 comments:

  1. Thanks for the clear and concise instructions. I used the commands you specified for logical volume groups and the problem was fixed. The only difference is that I prefixed the commands with "lvm". So instead of "pvscan" I used "lvm pvscan". Don't know if there is a difference.

    Thanks again,

    Luffy

    ReplyDelete
  2. I use nfs root filesystem, which got corrupted. How can I run fsck on that?
    Can I directly run fsck on the directory /root_fs ?

    ReplyDelete