Goal
I am trying to figure out why my file system has become read-only so I can address any potential hardware or security issues (main concern) and maybe fix the issue without having to reinstall everything and migrate my files from backup (I might lose some data but probably not much).
According to the manual of btrfs check:
Do not use --repair unless you are advised to do so by a developer or an experienced user, and then only after having accepted that no fsck successfully repair all types of filesystem corruption. Eg. some other software or hardware bugs can fatally damage a volume.
I am thinking of trying the --repair option or btrfs scrub but want input from a more experienced user.
What I’ve tried
I first noticed a read-only file system when trying to update my system in the terminal. I was told:
Cannot open log file: (30) - Read-only file system [/var/log/dnf5.log]
I have run basic checks of my SSD (using at least 3 different programs) without finding anything obviously wrong. The SSD and everything else in my computer is about 6 and a half years old, so maybe something is failing. Here is the SMART Data section of the output from sudo smartctl -a /dev/nvme0n1:
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0x1)
Critical Warning: 0x00
Temperature: 31 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 1%
Data Units Read: 33,860,547 [17.3 TB]
Data Units Written: 31,419,841 [16.0 TB]
Host Read Commands: 365,150,063
Host Write Commands: 460,825,882
Controller Busy Time: 1,664
Power Cycles: 8,158
Power On Hours: 1,896
Unsafe Shutdowns: 407
Media and Data Integrity Errors: 0
Error Information Log Entries: 4,286
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 31 Celsius
Temperature Sensor 2: 30 Celsius
Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
Self-test status: No self-test in progress
No Self-tests Logged
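As a rough sanity check on the counters above, the sketch below derives a few ratios from them. The function name smart_ratios is made up for illustration, and the 6.5-year age is the estimate given in this post. It only does arithmetic and does not query the drive.

```shell
#!/bin/sh
# Hypothetical helper: derive rough usage ratios from SMART counters.
# Arguments: power-on hours, power cycles, unsafe shutdowns, drive age in years.
smart_ratios() {
    awk -v poh="$1" -v cycles="$2" -v unsafe="$3" -v years="$4" 'BEGIN {
        printf "hours_per_day=%.2f\n", poh / (years * 365)
        printf "minutes_per_cycle=%.1f\n", poh * 60 / cycles
        printf "unsafe_shutdown_pct=%.2f\n", unsafe * 100 / cycles
    }'
}

# Values copied from the smartctl output above (commas removed).
smart_ratios 1896 8158 407 6.5
```

With these values, the drive averages under one powered-on hour per day, and roughly one in twenty power cycles ended in an unsafe shutdown. Both figures may be worth mentioning when asking whether improper shutdowns are a plausible cause.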
I tried sudo mount -o remount,rw /mount/point (from a live disk, I think), but that produced an error along the lines of "cannot complete read-only system".
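Before retrying a remount, it may help to confirm how the kernel currently has each mount point flagged. The sketch below reads /proc/mounts-style lines and prints the mode for one mount point. The helper name mount_mode is hypothetical. If btrfs itself forced the filesystem read-only after an error, a remount to rw is generally refused until the underlying problem is dealt with.

```shell
# Hypothetical helper: report "ro" or "rw" for a mount point, reading
# /proc/mounts-style lines (device mountpoint fstype options ...) from stdin.
mount_mode() {
    awk -v mp="$1" '$2 == mp {
        n = split($4, opts, ",")
        mode = "unknown"
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro" || opts[i] == "rw") mode = opts[i]
    }
    END { print (mode == "" ? "not-mounted" : mode) }'
}

# Usage on a live system (reads the kernel's own mount table):
# mount_mode / < /proc/mounts
# mount_mode /home < /proc/mounts
```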
sudo btrfs device stats /home and sudo btrfs device stats / both output:
[/dev/mapper/luks-7215db73-54d1-437e-875d-f82fae508b5d].write_io_errs 0
[/dev/mapper/luks-7215db73-54d1-437e-875d-f82fae508b5d].read_io_errs 0
[/dev/mapper/luks-7215db73-54d1-437e-875d-f82fae508b5d].flush_io_errs 0
[/dev/mapper/luks-7215db73-54d1-437e-875d-f82fae508b5d].corruption_errs 14
[/dev/mapper/luks-7215db73-54d1-437e-875d-f82fae508b5d].generation_errs 0
This seems to suggest that corruption is only in the /home directory.
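One caveat when reading these numbers: btrfs device stats keeps its counters per device, not per directory, and both commands above report the same /dev/mapper device, so the output by itself may not say where the damaged blocks are. The sketch below just filters the output down to the non-zero counters. The helper name nonzero_stats is hypothetical.

```shell
# Hypothetical helper: print only the non-zero counters from
# `btrfs device stats` output supplied on stdin.
nonzero_stats() {
    awk '$NF ~ /^[0-9]+$/ && $NF > 0 { print $1, $NF }'
}

# Usage on a live system:
# sudo btrfs device stats / | nonzero_stats
```

Re-running this after a few reboots would show whether corruption_errs is still climbing or has stayed at 14.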
However, sudo btrfs check /dev/mapper/luks-7215db73-54d1-437e-875d-f82fae508b5d stops at [5/8] checking fs roots with the end of the output at the top of this image:
Some of these files may be in the / directory, but I’m not sure without looking into it further.
sudo btrfs fi usage / provides:
I think that Data,single, Metadata,DUP, and System,DUP might be saying I can repair the corruption if it’s only in metadata or system but not if it’s the actual file data. Might be something to explore more.
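To make that reading of the profiles concrete, here is a minimal sketch that labels each block-group line as redundant or not. The usage output itself is not reproduced in this post, so the line format (for example "Metadata,DUP: Size:...") is an assumption based on typical btrfs filesystem usage output, and the helper name profile_redundancy is hypothetical.

```shell
# Hypothetical helper: for each block-group line of `btrfs filesystem usage`
# output on stdin (e.g. "Metadata,DUP: Size:..."), say whether the profile
# keeps redundant information that a scrub could repair from.
profile_redundancy() {
    awk -F'[,:]' '/^(Data|Metadata|System),/ {
        note = ($2 == "single" || $2 == "RAID0") ? "not redundant" : "redundant"
        print $1 ": " $2 " (" note ")"
    }'
}

# Usage on a live system:
# sudo btrfs fi usage / | profile_redundancy
```

A scrub can only rewrite a bad block when the filesystem is writable. As far as I know, btrfs scrub start -r runs in read-only mode and only reports what it finds, which may be the safer first step on a filesystem that is already mounted read-only.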
Here is vi /etc/fstab:
sudo dmesg | grep -i "btrfs" states:
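A slightly wider filter than the one above may catch related messages from the NVMe driver or generic I/O errors as well. This is only a sketch: the helper name storage_log_lines and the choice of patterns are assumptions, not a complete list of what the kernel can log.

```shell
# Hypothetical helper: keep kernel-log lines that mention btrfs, the NVMe
# driver, or generic I/O errors. Reads log text on stdin.
storage_log_lines() {
    grep -iE 'btrfs|nvme|i/o error'
}

# Usage on a live system, current boot and then the previous boot:
# sudo dmesg | storage_log_lines
# sudo journalctl -k -b -1 --no-pager | storage_log_lines
```

The previous-boot form only returns anything if the journal is stored persistently. If it is, it could show the first btrfs error that triggered the switch to read-only.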
The file system is indeed unstable. Once, I wasn’t able to list any files in my /home directory, but I haven't run into this issue again across several reboots.
What I think might be causing this
I suspect that changing my username, hostname, and display name (shown on the login screen) recently may have caused problems because my file system became read-only about a week to a week and a half after doing so. I followed some tutorials online, but I noticed that many of my files still had the group and possibly user belonging to the old username. So I created a symbolic link at the top of my home directory pointing the old username to the new one, and it seemed like everything was fine until the read-only issue. There may have been more I did, but I don’t remember exactly as it’s been a few weeks now. I have a history of most or all of the commands I ran if it might be helpful.
I think it may be something hardware related, something I did, software bugs (maybe introduced by a recent update; I have a picture of the packages affected in my most recent dnf upgrade transaction, but I was unable to roll back or undo the upgrade because of the read-only file system), improper shutdowns (I may have done this while making changes to the username, hostname, and display name), or a security issue.
dmesg filtered for “btrfs” near the end of the same section. Filtering for “error” turned up nothing else relevant AFAIK. Looking at journalctl -xb now. From what I've read so far, btrfs check --repair looks like a long shot to me as well.
{} icon, or by adding a line containing three backticks before AND after the text. 2. You are almost certainly wrong about what you think is causing this. Changing the hostname etc. will not cause disk errors. Run smartctl -a on the SSD's real device node (not the /dev/mapper entry), and keep an eye out for the drive's age and lifetime and any FAILED/FAILING entries. Maybe something like smartctl -a /dev/sdi | awk '$1 ~ /^(9|202)/ || /FAIL(ING|ED)/' which works for my ancient Crucial MX300 drives.
smartctl -a doesn't show the attributes on an nvme like it does for a sata ssd - I assumed sata ssd was what you meant when you said "SSD" but didn't say "NVME". I only realised you were talking about an nvme when I saw the smartctl output. And, yeah, the info seems inconsistent and contradictory, and the power on hours seems completely wrong for a 6.5 year old drive. BTW, "Power on Hours" is exactly what the name implies - the count of hours where the drive has had power.