Official Raspberry Pi NVMe SSD drive (Biwin) ignores TRIM commands #6627

Closed
ned14 opened this issue Jan 23, 2025 · 13 comments

Comments

@ned14

ned14 commented Jan 23, 2025

Describe the bug

The official 512GB Raspberry Pi NVMe SSD (Biwin) ignores TRIM commands, despite claiming to support them. Other NVMe SSDs do not behave this way.

0000:01:00.0 Non-Volatile memory controller: Biwin Storage Technology Co., Ltd. KingSpec NX series NVMe SSD (DRAM-less) (rev 01) (prog-if 02 [NVM Express])
        Subsystem: Biwin Storage Technology Co., Ltd. KingSpec NX series NVMe SSD (DRAM-less)
NVME Identify Controller:
vid       : 0x1dee
ssvid     : 0x1dee
sn        : 2446143801757       
mn        : BIWIN CE430T5D100-512G                  
fr        : 1.4.7.67
rab       : 2
ieee      : 50c68e
cmic      : 0
mdts      : 5
cntlid    : 0
ver       : 0x10400
rtd3r     : 0x124f80
rtd3e     : 0x2191c0
oaes      : 0
ctratt    : 0
rrls      : 0
cntrltype : 1
fguid     : 00000000-0000-0000-0000-000000000000
crdt1     : 0
crdt2     : 0
crdt3     : 0
nvmsr     : 0
vwci      : 0
mec       : 0
oacs      : 0x17
acl       : 3
aerl      : 3
frmw      : 0x18
lpa       : 0x2
elpe      : 63
npss      : 4
avscc     : 0x1
apsta     : 0x1
wctemp    : 358
cctemp    : 360
mtfa      : 0
hmpre     : 16384
hmmin     : 3072
tnvmcap   : 512110190592
unvmcap   : 0
rpmbs     : 0
edstt     : 5
dsto      : 1
fwug      : 1
kas       : 0
hctma     : 0x1
mntmt     : 323
mxtmt     : 360
sanicap   : 0
hmminds   : 256
hmmaxd    : 64
nsetidmax : 0
endgidmax : 0
anatt     : 0
anacap    : 0
anagrpmax : 0
nanagrpid : 0
pels      : 0
domainid  : 0
megcap    : 0
sqes      : 0x66
cqes      : 0x44
maxcmd    : 0
nn        : 1
oncs      : 0x14
fuses     : 0
fna       : 0
vwc       : 0x7
awun      : 0
awupf     : 0
icsvscc   : 1
nwpc      : 0
acwu      : 0
ocfs      : 0
sgls      : 0x70001
mnan      : 0
maxdna    : 0
maxcna    : 0
oaqd      : 0
subnqn    : 
ioccsz    : 0
iorcsz    : 0
icdoff    : 0
fcatt     : 0
msdbd     : 0
ofcs      : 0
ps      0 : mp:3.00W operational enlat:100 exlat:600 rrt:0 rrl:0
            rwt:0 rwl:0 idle_power:- active_power:-
            active_power_workload:-
ps      1 : mp:2.80W operational enlat:150 exlat:700 rrt:1 rrl:1
            rwt:1 rwl:1 idle_power:- active_power:-
            active_power_workload:-
ps      2 : mp:2.70W operational enlat:200 exlat:1000 rrt:2 rrl:2
            rwt:2 rwl:2 idle_power:- active_power:-
            active_power_workload:-
ps      3 : mp:0.2100W non-operational enlat:1000 exlat:13000 rrt:3 rrl:3
            rwt:3 rwl:3 idle_power:- active_power:-
            active_power_workload:-
ps      4 : mp:0.0090W non-operational enlat:2000 exlat:19000 rrt:4 rrl:4
            rwt:4 rwl:4 idle_power:- active_power:-
            active_power_workload:-
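The oncs value in the dump above (0x14) is what advertises Deallocate: bit 2 of ONCS is Dataset Management in the NVMe spec. A minimal sketch decoding it (the value is copied from the dump above rather than re-read from the hardware):

```shell
# Decode bit 2 of ONCS (Dataset Management, which carries Deallocate/TRIM).
# oncs=0x14 is taken from the identify dump above.
oncs=0x14
if [ $(( oncs & 0x4 )) -ne 0 ]; then
  echo "Dataset Management (Deallocate/TRIM) advertised"
else
  echo "Dataset Management not advertised"
fi
```

So the drive does claim support; the question is what it actually does with the commands.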

Steps to reproduce the behaviour

lsblk -D
NAME        DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
loop0              0      128K       4G         0
nvme0n1            0      512B       2T         0
├─nvme0n1p1        0      512B       2T         0
├─nvme0n1p2        0      512B       2T         0
└─nvme0n1p3        0      512B       2T         0

Linux thinks it is supported.

nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            2446143801757        BIWIN CE430T5D100-512G                   0x1        512.11  GB / 512.11  GB    512   B +  0 B   1.4.7.67

Apparently all space is allocated (it is not: 1.52GB out of 457GB is actually in use). Manually force a trim:

zpool trim -w rpool

It takes about a minute, so it's doing something.

nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            2446143801757        BIWIN CE430T5D100-512G                   0x1        512.11  GB / 512.11  GB    512   B +  0 B   1.4.7.67

Absolutely no change. Let's make sure it's not ZFS:

fstrim -v -a
/boot/firmware: 319.5 MiB (334980608 bytes) trimmed on /dev/nvme0n1p1
nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            2446143801757        BIWIN CE430T5D100-512G                   0x1        512.11  GB / 512.11  GB    512   B +  0 B   1.4.7.67

Nope: fstrim on the FAT boot partition was also ignored.

I also have a Pi with a Samsung SM961 MZ-VPW2560 NVMe SSD, likewise running ZFS on root. There, TRIM from ZFS works perfectly.

I think either your kernel drivers misconfigure the Biwin SSD, or its firmware is buggy. If the latter, you are surely best placed to persuade Biwin to release fixed firmware for your SSD.

Device(s)

Raspberry Pi 5

System

Ubuntu 24.04 LTS

2025/01/14 00:16:48
Copyright (c) 2012 Broadcom
version 0451f142 (release) (embedded)

Linux europe7b 6.8.0-1017-raspi #19-Ubuntu SMP PREEMPT_DYNAMIC Fri Dec 6 20:45:12 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

Logs

dmesg is absolutely clean and shows no errors of any kind.

Additional context

No response

@P33M
Contributor

P33M commented Jan 23, 2025

Can you demonstrate a different NVMe drive producing different results? How are they different?

@ap-wtioit

@P33M TRIM should normally reduce the NVMe namespace usage:

user@host:~$ sudo nvme list
Node                  SN                   Model                                    Namespace Usage                      Format           FW Rev  
--------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1          S675NX0RA09...       SAMSUNG MZVL2512HCJQ-00B00               1         482.99  GB / 512.11  GB    512   B +  0 B   GXA7601Q
/dev/nvme1n1          S675NX0T400...       SAMSUNG MZVL2512HCJQ-00B00               1         483.10  GB / 512.11  GB    512   B +  0 B   GXA7601Q
user@host:~$ sudo fstrim -av
/boot: 0 B (0 bytes) trimmed on /dev/md1
/: 142.7 GiB (153169674240 bytes) trimmed on /dev/md2
user@host:~$ sudo nvme list
Node                  SN                   Model                                    Namespace Usage                      Format           FW Rev  
--------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1          S675NX0RA09...       SAMSUNG MZVL2512HCJQ-00B00               1         357.80  GB / 512.11  GB    512   B +  0 B   GXA7601Q
/dev/nvme1n1          S675NX0T400...       SAMSUNG MZVL2512HCJQ-00B00               1         357.92  GB / 512.11  GB    512   B +  0 B   GXA7601Q
user@host:~$ 

@ned14
Author

ned14 commented Jan 23, 2025

I have an identically configured Pi (Ubuntu 24.04 on ZFS on root) using a different SSD model:

nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            S34ENB0J809464       SAMSUNG MZVPW256HEGL-000H1               0x1         70.83  GB / 256.06  GB    512   B +  0 B   CXZ74H0Q

Here TRIM is working correctly (the used figure is much lower than the total). The kernel is identical; however, the Pi firmware is slightly older:

2023/12/06 18:29:25 
Copyright (c) 2012 Broadcom
version e02d33b3 (release) (embedded)

Apart from the firmware being slightly different, and the SSD models being different, these two Pis are configured absolutely identically using the same set of instructions.

@meistermeier

Exactly the same as the issue author (original 512GB model):
before

Node                  Generic               SN                   Model                                    Namespace Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            2446143806502        BIWIN CE430T5D100-512G                   1         512.11  GB / 512.11  GB    512   B +  0 B   1.4.7.67

trim it

sudo fstrim -av
/boot/firmware: 445.4 MiB (467087360 bytes) trimmed on /dev/nvme0n1p1
/: 698 MiB (731889664 bytes) trimmed on /dev/nvme0n1p2

after

Node                  Generic               SN                   Model                                    Namespace Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            2446143806502        BIWIN CE430T5D100-512G                   1         512.11  GB / 512.11  GB    512   B +  0 B   1.4.7.67

Posting here rather than just adding a ➕ 😄

@pelwell
Contributor

pelwell commented Jan 23, 2025

Does it claim to trim every time? And does the usage persist across reboots?

@bretmlw

bretmlw commented Jan 23, 2025

Does it claim to trim every time? And does the usage persist across reboots?

I have the same drive (512GB Biwin) and it doesn't claim to trim every time; on further attempts it says 0B trimmed. After a reboot it still shows 100% usage, and fstrim once again reports /: 466.9 GiB (501362577408 bytes) trimmed on /dev/nvme0n1p2

Same behaviour on a Silicon Power A60 256GB and a Crucial P3 Plus 500GB so far. The rest are in the office, sadly including the 256GB Samsung-based Pi one.

@pelwell
Contributor

pelwell commented Jan 23, 2025

it doesn't claim to trim every time, on further attempts it says 0B trimmed.

Interesting. Other than the output from fstrim -v -a, is there any evidence that the trim has not been successful?

@ned14
Author

ned14 commented Jan 23, 2025

Other than the output from fstrim -v -a, is there any evidence that the trim has not been successful?

There is nothing in the kernel logs. There is nothing in the NVMe logs. As far as the kernel and filesystem are aware, the TRIM is successful.

Additional data point: when you first power on the drive and run nvme list, it shows about 20MB used out of the total. So the Biwin firmware DOES have a concept of the space not all being allocated initially. Once every sector has been written, nvme list always shows all the space as used. Running blkdiscard on the whole drive does go away for a few seconds before returning, AND the drive contents have disappeared, but nvme list is unaffected.

I haven't tried it, but if one did a secure erase of the drive to completely reset it, it would be interesting to learn whether nvme list would then report no space used.

There is another possibility: TRIM is working fine, but the drive does not report it correctly to nvme list, so the bug is purely a reporting one. I suggest this because blkdiscard does reset the drive's contents to zero, so TRIM is 'working' by that measure.
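One way to separate "deallocation happens but isn't reported" from "TRIM is ignored" is a read-back check: after a discard, the discarded LBAs should read back as zeros on this drive. The sketch below is mine, not from the thread; the helper is pure shell, and the actual device commands are left as comments because they are destructive and the device name is only an example:

```shell
# all_zeros FILE: succeed if FILE contains only NUL bytes.
all_zeros() {
  ! tr -d '\0' < "$1" | grep -q .
}

# On real hardware (DESTRUCTIVE - wipes the device; device name is an example):
#   sudo blkdiscard /dev/nvme0n1
#   sudo dd if=/dev/nvme0n1 of=/tmp/sample bs=1M count=1
#   all_zeros /tmp/sample && echo "discarded range reads back as zeros"

# Safe demonstration of the helper itself:
printf '\0\0\0\0' > /tmp/zeros.bin
all_zeros /tmp/zeros.bin && echo "all zeros"
printf 'data' > /tmp/data.bin
all_zeros /tmp/data.bin || echo "non-zero data present"
```

If the range reads back as zeros but nvme list still shows full usage, that points at reporting rather than deallocation.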

In any case, I think the Pi Foundation ought to investigate. If TRIM isn't working, the longevity of these drives will be impacted, to say nothing of write performance once the drive is 'full'.

@P33M
Contributor

P33M commented Jan 24, 2025

nvme list uses the Identify Namespace command to extract this information. Depending on the THINP namespace feature, this value can be static and equal to the namespace size. NVMe specification 1.4c, s6.1.7:

When the THINP bit in the NSFEAT field is cleared to ‘0’, the controller:
• shall report a value in the Namespace Capacity field that is equal to the Namespace Size; and
• may report a value in the Namespace Utilization field that is always equal to the value in the Namespace Capacity field.

THINP is indeed 0:

nvme id-ns /dev/nvme0n1 | grep nsfeat
nsfeat  : 0

Deallocate support is mandatory, and as your testing shows, causes data to be rendered unrecoverable.
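Given that rule, whether the nvme list Usage column means anything can be predicted from NSFEAT bit 0 alone. A small sketch (nsfeat=0 is copied from the id-ns output above; on live hardware you would read it with nvme id-ns as shown there):

```shell
# THINP is bit 0 of NSFEAT. When it is clear, NUSE may legitimately always
# equal NCAP, so the Usage column of 'nvme list' tells you nothing.
nsfeat=0
if [ $(( nsfeat & 1 )) -eq 1 ]; then
  echo "THINP set: NUSE should shrink after Deallocate"
else
  echo "THINP clear: NUSE may always equal NCAP"
fi
```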

@P33M P33M closed this as completed Jan 24, 2025
@ned14
Author

ned14 commented Jan 25, 2025

THINP is indeed 0:

nvme id-ns /dev/nvme0n1 | grep nsfeat
nsfeat : 0
Deallocate support is mandatory, and as your testing shows, causes data to be rendered unrecoverable.

I appreciate your feedback that an NVMe device is permitted not to reduce its reported utilisation and still be "okay".

I don't think the THINP bit has much to do with anything except that this behaviour is permitted. On my desktop machine:

nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            PHM2908100PZ960CGN   INTEL SSDPE21D960GA                      0x1        960.20  GB / 960.20  GB    512   B +  0 B   E2010480
/dev/nvme1n1          /dev/ng1n1            S677NF0R551920       SAMSUNG MZVL22T0HBLB-00B00               0x1          1.14  TB /   2.05  TB    512   B +  0 B   GXB7801Q
/dev/nvme2n1          /dev/ng2n1            S677NF0R551887       SAMSUNG MZVL22T0HBLB-00B00               0x1          1.68  TB /   2.05  TB    512   B +  0 B   GXB7801Q
/dev/nvme3n1          /dev/ng3n1            S463NF0K939687P      Samsung SSD 970 PRO 512GB                0x1        432.33  GB / 512.11  GB    512   B +  0 B   1B2QEXP7

nsfeat is zero for all four, yet the Samsung drives still report reduced usage (the Intel drive is Optane; it doesn't need to care about flash write lifetimes).

I think my main concern and bugbear is: how can I tell whether these Biwin drives actually implement TRIM properly? Will they have longevity problems? Will there be write-speed issues?
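One indirect way to answer the longevity question is to watch the controller's own write counter against the writes you actually issue: nvme-cli's smart-log exposes data_units_written (units of 512,000 bytes per the NVMe spec). The helper below and its sample numbers are hypothetical; only the smart-log field name comes from nvme-cli:

```shell
# waf HOST_BYTES DUW_BEFORE DUW_AFTER
# Rough write-amplification estimate x100 (integer maths, no bc needed):
# NAND-side bytes = delta(data_units_written) * 512000.
waf() {
  host=$1; before=$2; after=$3
  nand=$(( (after - before) * 512000 ))
  echo $(( nand * 100 / host ))
}

# On live hardware (assumption: nvme-cli installed), sample the counter with:
#   sudo nvme smart-log /dev/nvme0n1 | grep data_units_written

waf 1000000 100 102   # hypothetical: 1 MB of host writes, 2 data units consumed
                      # prints 102, i.e. WAF of roughly 1.02
```

A WAF that climbs well above 100 over weeks of steady-state use would suggest TRIM is not reaching the flash translation layer.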

I feel a bit annoyed because early Raspberry Pi branded NVMe SSDs were actually rebadged Samsung PM991a's, and they're great drives on which you can see TRIM working correctly.

If I can't see TRIM working, that makes me anxious. The Samsung PM991a costs very similar money to the Raspberry Pi branded Biwin drive. Surely the advice to all Raspberry Pi owners must be "avoid the branded Biwin NVMe drive, get a Samsung PM991a instead"?

I don't expect an answer to that - it is a rhetorical question. All I can say is this type of thing tends to appear on HackerNews and people get worked up. So expect more comments here maybe.

Thanks for looking into this in a timely fashion.

@geerlingguy

@ned14 - One quick note: I remember asking Raspberry Pi about this (shipping different hardware underneath the sticker), and they said that's the main reason the specifications on the Raspberry Pi SSD page are more generic.

The specs are a minimum, and as long as those are met, the underlying chips are interchangeable.

I'm not judging whether that's a good or bad policy, but at least from their perspective, the Biwin controller meets the specifications on the tin. If you want a consistent experience with any SSD (whether Pi-branded, Pineboards, Cytron, etc.), I would stick to one of the flash manufacturers directly... but even there, with the same model, this part-swapping happens!

@ned14
Author

ned14 commented Jan 26, 2025

@geerlingguy I guess how much this matters to you depends on the use case. In my case these Pis are off to colocation in a faraway data centre, where they are expected to be operational for a decade. Getting physical access to them afterwards will be expensive and time-consuming. ZFS, being copy-on-write, does a LOT of writing (I have a Samsung 830 here with 100,000 hours powered on and a couple of petabytes written, still going strong).

Some of my concern is that there aren't many 2230-sized NVMe SSDs without compatibility issues with the Pi. This is why a stamp of approval from the Pi Foundation would matter.

With the benefit of hindsight, and for anybody else reading this later: if you need an ultra-reliable Pi, my advice would be:

  1. Fit only a truly full-sized 2280 NVMe SSD to the Pi. Why? Better heat spreading.
  2. Choose a model with plenty of DRAM for flash tables, as the Pi appears to be particularly hurt by DRAM-less SSDs.
  3. Choose a model where TRIM functioning and write amplification can be measured over time.
  4. Choose an SSD where both the controller and the flash are made by the same vendor, as most of the others tend to be rushed out the door rather than honed to perfection over time.

Which pretty much leaves only SSDs with genuine Samsung controllers (NOT Samsung-branded SSDs with non-Samsung controllers), SK hynix, Kioxia, or Micron. Most of the cheaper supply on eBay is used system pulls; they tend to have ~10k hours on them and a few dozen TB written. Those tend to be good value for money.

@geerlingguy

@ned14 - Yeah, I agree with you, especially wrt 2280 instead of 2230... the latter is generally only used in consumer or highly space-constrained devices. I think that's why many of the other entrants like Pineboards and Cytron (and now Argon40 as well, I think) have gone with 2280 and sometimes 2242 sizes.

7 participants