Below are some interface bandwidth tests on various systems/drives using `hdparm -tT`. With hdparm, my understanding is:

- cached reads: essentially the RAM bandwidth of the host CPU – this should be roughly the same for all drives on a given system
- buffered disk reads: the read bandwidth of the disk interface. Dividing the 4-lane Jetson NVMe speed by 4 gives about 388 MB/sec per lane, and we got 300 MB/sec on the single-lane i.MX8.
Note: the i.MX8 setup in this test had some hardware modifications to the PCIe signals that were not compliant with high-speed routing guidelines, so performance may be degraded here. I will update this once we have the next board rev.

Note: the numbers below are from a single run – results did vary somewhat from run to run.
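Because of that run-to-run variation, it can help to average a few runs per device. Here is a minimal sketch of how that could be scripted; the device paths are examples, and it assumes hdparm is installed and the script runs as root:

```python
#!/usr/bin/env python3
"""Average hdparm -tT buffered-read results over a few runs (rough sketch)."""
import re
import subprocess

DEVICES = ["/dev/nvme0n1", "/dev/mmcblk0", "/dev/mmcblk1"]  # example paths
RUNS = 3

# hdparm prints e.g. "Timing buffered disk reads: 900 MB in 3.00 seconds = 299.57 MB/sec"
BUFFERED_RE = re.compile(r"buffered disk reads:.*=\s*([\d.]+)\s*MB/sec")

for dev in DEVICES:
    readings = []
    for _ in range(RUNS):
        out = subprocess.run(
            ["hdparm", "-tT", dev], capture_output=True, text=True, check=True
        ).stdout
        match = BUFFERED_RE.search(out)
        if match:
            readings.append(float(match.group(1)))
    if readings:
        avg = sum(readings) / len(readings)
        print(f"{dev}: {avg:.0f} MB/sec (average of {len(readings)} runs)")
```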
Observations:

- Jetson NVMe read speed is almost as fast as the AMD workstation's (1,554 vs 1,847 MB/sec).
- NVMe read bandwidth is more than 5 times that of the SATA SSD (1,847 vs 332 MB/sec).
- The i.MX8 single-lane NVMe speed is less than 1/4 the speed of the 4-lane systems (see the quick arithmetic below).
- The eMMC on the i.MX8 is almost as fast as its NVMe (280 vs 300 MB/sec).
- SD is quite slow compared to eMMC (70/85 vs 280 MB/sec).
- SD speeds on the Jetson and i.MX8 were similar (70 vs 85 MB/sec).
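The ratios above fall straight out of the summary numbers; a quick sanity check using the buffered-read figures from the table below:

```python
# Quick sanity check of the ratios quoted above (buffered disk reads, MB/sec).
amd_nvme = 1847
amd_ssd = 332
orin_nvme_4lane = 1554
imx8_nvme_1lane = 300

print(f"NVMe vs SATA SSD on the workstation: {amd_nvme / amd_ssd:.1f}x")      # ~5.6x
print(f"Jetson NVMe per-lane estimate: {orin_nvme_4lane / 4:.0f} MB/sec")      # ~388
print(f"i.MX8 single lane vs that estimate: "
      f"{imx8_nvme_1lane / (orin_nvme_4lane / 4):.0%}")                        # ~77%
```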
Summary of test results:

| Platform | Disk | Disk Read Bandwidth |
| --- | --- | --- |
| AMD | NVMe | 1,847 MB/sec |
| AMD | SSD | 332 MB/sec |
| AMD | Aegis NVX (USB 2.0) | 38 MB/sec |
| AMD | Aegis NVX (USB 3.0) | 418 MB/sec |
| i.MX8 | SD | 85 MB/sec |
| i.MX8 | eMMC | 280 MB/sec |
| i.MX8 | NVMe | 300 MB/sec |
| Orin | SD | 70 MB/sec |
| Orin | NVMe | 1,554 MB/sec |
AMD Ryzen 3900X Workstation

NVMe Samsung SSD 970 EVO 500GB

```
/dev/nvme0n1:
 Timing cached reads: 32046 MB in 1.99 seconds = 16081.96 MB/sec
 Timing buffered disk reads: 5544 MB in 3.00 seconds = 1847.35 MB/sec
```

SSD

```
/dev/sda:
 Timing cached reads: 31542 MB in 1.99 seconds = 15834.95 MB/sec
 Timing buffered disk reads: 998 MB in 3.00 seconds = 332.61 MB/sec
```

i.MX8 QuadMax

eMMC

```
/dev/mmcblk0:
 Timing cached reads: 1592 MB in 2.00 seconds = 795.83 MB/sec
 Timing buffered disk reads: 842 MB in 3.01 seconds = 280.14 MB/sec
```

SD

```
/dev/mmcblk1:
 Timing cached reads: 1620 MB in 2.00 seconds = 810.12 MB/sec
 Timing buffered disk reads: 256 MB in 3.01 seconds = 85.06 MB/sec
```

NVMe (1 lane) Samsung 970 EVOPlus 1TB

```
/dev/nvme0n1:
 Timing cached reads: 1582 MB in 2.00 seconds = 790.66 MB/sec
 Timing buffered disk reads: 900 MB in 3.00 seconds = 299.57 MB/sec
```

Jetson Orin Nano

SD

```
/dev/mmcblk1:
 Timing cached reads: 6080 MB in 2.00 seconds = 3043.07 MB/sec
 Timing buffered disk reads: 212 MB in 3.03 seconds = 70.04 MB/sec
```

NVMe (4 lane) Samsung 970 EVOPlus 1TB

```
/dev/nvme0n1p1:
 Timing cached reads: 7836 MB in 2.00 seconds = 3922.64 MB/sec
 Timing buffered disk reads: 4664 MB in 3.00 seconds = 1553.92 MB/sec
```

```
[cbrake@ceres yoe-distro]$ sudo hdparm -tT /dev/nvme1n1

/dev/nvme1n1:
 Timing cached reads: 34618 MB in 1.99 seconds = 17378.62 MB/sec
 Timing buffered disk reads: 1546 MB in 3.00 seconds = 515.00 MB/sec
```
I’m not sure what accounts for the difference, as the PCIe adapter card should support 4 lanes. It could be the drive itself – I guess I need to try it with an EVO drive. It is still a little faster than the on-board M.2 SSD slot.
On the i.MX8 platform with a single PCIe lane, I did some tests transferring and writing large datasets using Syncthing. Syncthing was super easy to install – just wget the release and run it. Then I synchronized some large data collections to various devices:

- NVMe: 110 MiB/s (it appears an NVMe drive can pretty much keep up with data streaming in over Gigabit Ethernet)
- eMMC: the rate went up and down quite a bit, but probably averaged around 25 MiB/s
- SD: 15 MiB/s
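For context on the NVMe number, raw Gigabit Ethernet works out to roughly 119 MiB/s before TCP/IP and Syncthing protocol overhead, so 110 MiB/s to NVMe is close to line rate. A quick back-of-the-envelope check:

```python
# Back-of-the-envelope: raw Gigabit Ethernet rate in MiB/s, ignoring
# TCP/IP and Syncthing protocol overhead.
bits_per_second = 1_000_000_000
bytes_per_second = bits_per_second / 8            # 125,000,000 B/s
mib_per_second = bytes_per_second / (1024 ** 2)   # ~119.2 MiB/s

print(f"GbE raw rate: {mib_per_second:.1f} MiB/s")
print(f"Measured Syncthing-to-NVMe rate: 110 MiB/s "
      f"({110 / mib_per_second:.0%} of raw line rate)")
```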
NVMe is a good option in embedded systems if you need fast/large storage. Adding it to your board is not that hard – it requires a clock chip and two differential pairs for the PCIe Rx/Tx signals. PCIe is a very interesting signaling standard: the data clock is embedded in the data signals, so other than matching the two signals within each differential pair, there are no hard length-matching requirements between the clock and data lines.
Depending on the motherboard, it may not be running with four lanes as you might expect. I have seen slots closer to the bottom of the board (furthest from the CPU) run at x2 or even x1 even though they are physically x16 PCIe slots.
Thanks for the note on the differences in PCIe slots – that is good to know, and I’ll see if I can move it closer to the CPU next time I have the case apart.
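Before moving it, one way to check how many lanes a slot actually negotiated (as opposed to its physical size) is the LnkSta line in `lspci -vv`. A minimal sketch, assuming lspci is available and run with enough privileges to show link status:

```python
# Report the negotiated PCIe link status for NVMe controllers from lspci -vv.
# LnkSta shows the link as actually trained (e.g. "Speed 8GT/s, Width x4");
# comparing it against LnkCap reveals down-trained slots.
import subprocess

out = subprocess.run(["lspci", "-vv"], capture_output=True, text=True).stdout

device = None
for line in out.splitlines():
    if line and not line[0].isspace():
        device = line  # e.g. "01:00.0 Non-Volatile memory controller: Samsung ..."
    elif device and "Non-Volatile memory" in device and "LnkSta:" in line:
        print(device)
        print("   ", line.strip())
```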