Teknikal's_Domain

#<NTA:NnT:SSrgS:H6.6-198:W200-90.72:CBWg>

RED ALERT: Emergency Laptop Brain Transplant

2019-09-26 4 min read Fails Hardware My stuff Maintenance Teknikal_Domain

Incident Report

Sept. 26, 2019: I was upgrading Mint from 19.1 to 19.2 when, midway through, the update failed with a read-only filesystem error. I was cleaning up the drive at the time (getting rid of some unneeded files) and hit the same error there. When Nemo refreshed, every icon on my desktop had a big “✖️” on it, signifying an access error: the entire laptop was actually read-only. Being a sensible person, I tried to reboot. Cinnamon didn’t come up. Instead, there was a terminal prompt: (initramfs).

Cause

I knew, as a matter of fact, that the /etc/fstab entry for my root filesystem (my only filesystem, no separate home partition) contained the option errors=remount-ro. Put simply: “If the filesystem hits a severe error that cannot be corrected, remount it read-only so nothing else gets written.” There’s only one problem: if a hard drive is failing badly enough that the OS sees it and takes action, it’s dead. Replace. NOW.
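For illustration, here is a minimal Python sketch that pulls the error policy out of an fstab-style line. The sample line (placeholder UUID, ext4) is made up and is not the real entry from this laptop, and the function name is hypothetical.

```python
def fstab_error_policy(line):
    """Return the value of the errors= mount option in an fstab line, or None."""
    fields = line.split()
    # fstab fields: device, mount point, fs type, options, dump, pass
    if len(fields) < 4 or fields[0].startswith("#"):
        return None
    for option in fields[3].split(","):
        if option.startswith("errors="):
            return option.split("=", 1)[1]
    return None

# Hypothetical root entry, in the style a Mint/Ubuntu installer writes
sample = "UUID=0000-hypothetical  /  ext4  errors=remount-ro  0  1"
print(fstab_error_policy(sample))
```

For ext4, the other accepted values of this option are continue and panic; remount-ro is the middle ground that stops further writes while leaving the files readable.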

Repair

Part 1: Booting

Repairing was a two-step process: first, I had to boot. That should have been as easy as exiting the initramfs shell… and it threw this at me:

TD-MDP: recovering journal
TD-MDP contains a file system with errors. Check forced.
Inodes that were part of a corrupted orphan linked list found.

TD-MDP: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
        (i.e., without -a or -p options)
fsck exited with status code 4
The root filesystem on /dev/sda1 requires a manual fsck


BusyBox v1.27.2 (Ubuntu 1:1.27.2-2ubuntu3.2) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs)

To put it in normal terms: “Something went seriously wrong with the filesystem, to the point where I don’t feel safe touching it. You need to be the one to authorize the repairs.” Luckily, that’s the easiest thing to do: fsck /dev/sda1. I won’t post the video, but it gave plenty of screens of orphaned inodes, corrupted lists, bad attributes, and deleted inodes missing data. Luckily… all of it was fixable with minimal data loss (I didn’t notice any). Exit again, and… Cinnamon has crashed and restarted in emergency mode. Well… crap.
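The “status code 4” in that output is not arbitrary: fsck's exit status is a set of bit flags, documented in the fsck(8) man page. A small Python sketch that decodes it (the function name is just for illustration):

```python
# Exit status bits documented in the fsck(8) man page
FSCK_EXIT_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}

def decode_fsck_status(status):
    """Translate an fsck exit status into its human-readable meanings."""
    if status == 0:
        return ["no errors"]
    # The status is a bitwise OR of flags, so several can be set at once
    return [meaning for bit, meaning in FSCK_EXIT_BITS.items() if status & bit]

print(decode_fsck_status(4))
```

A status of 4 means the automatic check found errors and left them uncorrected, which is why the boot stopped and asked for a manual run.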

Part 2: Fixing Cinnamon

Now this, I know exactly what’s wrong. The drive errored in the middle of an OS upgrade, so it’s probably in an inconsistent state. debconf was busy configuring new packages, and I know there’s a set of options (that I’ve somehow forgotten) that tells it to pick up where it left off. Run that, reboot again, fixed!
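The exact options aren't recorded here. On Debian-based systems like Mint, the usual way to resume an interrupted package configuration is dpkg --configure -a, often followed by apt-get install -f; treating that as what was needed in this case is an assumption. A minimal Python sketch that only prints the commands unless told otherwise (the function name is hypothetical):

```python
import subprocess

# Assumed recovery commands: standard dpkg/apt usage, not confirmed as what was run
RECOVERY_COMMANDS = [
    ["dpkg", "--configure", "-a"],  # configure every unpacked-but-unconfigured package
    ["apt-get", "install", "-f"],   # fix broken or missing dependencies
]

def resume_upgrade(dry_run=True):
    """Print the recovery commands; run them only when dry_run is False (needs root)."""
    handled = []
    for command in RECOVERY_COMMANDS:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
        handled.append(command)
    return handled

resume_upgrade()  # dry run: prints the commands without touching the system
```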

Now that my computer’s working, let’s do some digging.

Immediate Emergency Avoided, Commence Investigation

First off: one-off error or potential failure? S.M.A.R.T. statistics can answer that… though just opening up the Disks application reads “640 bad sectors” and “DISK IS LIKELY TO FAIL SOON!”

Power-On Hours: 8 months and 23 days
Reallocated Sector Count: 568 sectors
G-sense Error Rate: 61
Power-off Retract Count: 22
Current Pending Sector Count: 72 sectors
Uncorrectable Sector Count: 72 sectors

Yep… drive is dead. I know that my laptop is usually on close to 24/7, so Power-On Hours is a pretty good estimate of how long the drive has been installed, not just running. G-sense Error Rate is just a count of how many times the drive thinks it got moved around too much, which happens in a laptop. Power-off Retract Count is essentially the number of times it’s made an emergency (aka unplanned) retract (moving the head off the disk surface). As for the rest… well, let’s just say that they should stay at 0. And they have not. Brand new Seagate drive, which (because weirdness) I physically cannot warranty. Guess I’m buying another one, with one-day shipping, and making a backup while it ships… you can never be too careful with this, at least for me.
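The numbers line up if you do the arithmetic. The “640 bad sectors” figure is consistent with reallocated plus pending sectors (568 + 72); that Disks computes it this way is an assumption, as is rounding a month to 30 days. A quick Python sketch using the values listed above:

```python
# Raw values as reported in the list above
smart = {
    "Reallocated Sector Count": 568,
    "Current Pending Sector Count": 72,
    "Uncorrectable Sector Count": 72,
}

# Attributes that should stay at 0 on a healthy drive
failing = {name: value for name, value in smart.items() if value > 0}

# Assumption: the "bad sectors" headline is reallocated + pending sectors
bad_sectors = smart["Reallocated Sector Count"] + smart["Current Pending Sector Count"]

# 8 months and 23 days of power-on time, approximating a month as 30 days
power_on_hours = (8 * 30 + 23) * 24

print(bad_sectors, power_on_hours, sorted(failing))
```

All three sector counters are nonzero, and the drive reached that state in roughly 6,300 hours of use.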

Recovery Procedures

I have a spare 240 GB SSD in my room that works perfectly, and it has a backup of all the data. That was only a temporary measure, though: the new drive arrived the next day, and I put everything on that, forgot to write the bootloader, fixed that, and got it working with twice the disk space I had before. But that… that’s a story for another day.
