A while ago, I had my Proliant N40L server running as my home NAS, on FreeNAS 0.7, which was working rather well as a file server. Didn't have much fun getting the DLNA streaming working properly, but the ZFS implementation was great, the speeds were rapid, and it was quite brilliant. Unfortunately I also used 4x 1TB Seagate Barracuda desktop drives in it, which resulted in one dying after a while with the vibration in the cage. Oops.
So, one rebuilt server later with 4x 3TB WD Red NAS drives, I was ready to build a new tank. Decided to have a go with FreeNAS 8.2, which worked quite well. However, don't be fooled by the out-of-box configuration and built-in tweaker - I had to drop in quite a few optimisations from my old 0.7 setup to get it running quickly. But it was running quite well, it has to be said. There's 8GB of RAM in this box, for context later.
However, after some thought, I decided I wanted to get the Plex Media Server running. Sadly, at the time of writing this, it doesn't support FreeBSD, nor can FreeBSD run it under the Linux emulation as it uses epoll, which isn't supported at the moment on the BSD side. So... I decided to have a go with Linux.
One thing I didn't want to lose from the FreeNAS implementation, which was one of the primary reasons for using it, was ZFS. A while ago I'd discovered ZFS on Linux, a full kernel module for Linux, which neatly sidesteps the licensing issue (ZFS is licenced under the old Sun CDDL, which is incompatible with GPL, which is why ZFS isn't happily in the Linux kernel and doing amazing things). So, installed a copy of Ubuntu 12.04 server onto a USB flash drive via a VM on my PC, booted the microserver up off the USB key, and added the repositories for ZFS on Linux. A couple of quick installs later, and lo and behold, ZFS on Linux. Surprisingly, the performance outdid both FreeNAS implementations, although that could be other factors at work at this point:
FreeNAS 0.7
freenas:/mnt/tank# dd if=/dev/zero of=./testfile bs=1M count=10K
10240+0 records in
10240+0 records out
10737418240 bytes transferred in 87.457035 secs (122773637 bytes/sec) - 117MB/sec write
freenas:/mnt/tank# dd if=./testfile of=/dev/null bs=1M
10240+0 records in
10240+0 records out
10737418240 bytes transferred in 60.337002 secs (177957437 bytes/sec) - 169MB/sec read
FreeNAS 8.2 (before tweaks)
[root@freenas] /mnt/storage# dd if=/dev/zero of=./testfile bs=1M count=10K && dd if=./testfile of=/dev/null bs=1M
10240+0 records in
10240+0 records out
10737418240 bytes transferred in 225.523565 secs (47611070 bytes/sec) - 45.4MB/sec write
10240+0 records in
10240+0 records out
10737418240 bytes transferred in 78.246473 secs (137225588 bytes/sec) - 131MB/sec read
FreeNAS 8.2 (after tweaks)
[root@freenas] /mnt/storage# dd if=/dev/zero of=./testfile bs=1M count=10K && dd if=./testfile of=/dev/null bs=1M && rm ./testfile
10240+0 records in
10240+0 records out
10737418240 bytes transferred in 62.405507 secs (172058825 bytes/sec) - 164MB/sec write
10240+0 records in
10240+0 records out
10737418240 bytes transferred in 53.998347 secs (198847165 bytes/sec) - 189MB/sec read
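A note on units, since it matters when comparing runs from different systems: the MB/sec annotations on the FreeNAS runs are the reported bytes/sec divided by 1048576. GNU dd, as shipped on Linux, prints decimal MB/s instead (1 MB = 1,000,000 bytes), so its figures read about 5% higher for the same real speed. A minimal sketch of both conversions, assuming awk is available; the function names are made up for illustration:

```shell
# Hypothetical helpers (names are illustrative, not part of any tool).

# Convert a raw bytes/sec figure (as printed by FreeBSD dd) to MiB/sec.
bps_to_mib() {
    awk -v bps="$1" 'BEGIN { printf "%.1f\n", bps / 1048576 }'
}

# Convert a decimal MB/s figure (as printed by GNU dd) to MiB/sec.
mb_to_mib() {
    awk -v mb="$1" 'BEGIN { printf "%.1f\n", mb * 1000000 / 1048576 }'
}

bps_to_mib 122773637   # the FreeNAS 0.7 write figure
mb_to_mib 159          # a GNU dd figure of 159 MB/s
```

Running everything through the same divisor keeps the comparison between systems fair.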
And finally, Ubuntu 12.04 with ZFS on Linux:
root@RickNAS:~# dd if=/dev/zero of=/storage/testfile bs=1M count=10K && dd if=/storage/testfile of=/dev/null bs=1M
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 67.544 s, 159 MB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 31.78 s, 338 MB/s
rick@RickNAS:~$ dd if=/dev/zero of=/storage/testfile bs=1M count=10K; dd if=/dev/zero of=/storage/testfile2 bs=1M count=10K; dd if=/storage/testfile of=/dev/null bs=1M; dd if=/storage/testfile2 of=/dev/null bs=1M
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 61.1301 s, 176 MB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 59.9313 s, 179 MB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 47.6692 s, 225 MB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 32.2975 s, 332 MB/s
Quite strange, the improvement, but I wasn't going to knock it. So, all looked good, until I tried doing file transfers over CIFS (Windows networking), FTP and SFTP. The results were... abysmal. CIFS and FTP started fast, and rapidly dropped to about 30MB/sec. SFTP was... well, 3MB/sec didn't impress at all. To ensure that ZFS wasn't the issue, I reformatted one of the partitions as ext4, and did another test:
root@RickNAS:/storagetest# dd if=/dev/zero of=./testfile bs=1M count=10K
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 66.944 s, 160 MB/s
root@RickNAS:/storagetest# dd if=./testfile of=/dev/zero bs=1M
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 68.2629 s, 157 MB/s
root@RickNAS:/storagetest#
Not as fast as the raidz was, but still reasonably respectable. Sadly, I got similar results from CIFS etc.
Deciding to stick with ext4 for the moment just to eliminate any extra possible issues, I started to wonder why the performance was so terrible. I ran iperf to my PC, and got pretty respectable speeds out of it, far exceeding the data speeds I was getting in transfer. I went to work trying to improve the Samba speed, to no avail. No matter what I tweaked, nothing seemed to have any effect.
Next step: attempt reinstall natively, after having written out the ISO to USB.
So, reinstalled, after using the PenDrive Linux installer. Got it installed natively this time. Made zero difference. SFTP speeds are pathetic, around the 10MB/sec mark. Across gigabit, that's a bit of a joke, and it's nowhere near maxing CPU, so it's not that as far as I can tell. Did read up online that the driver for the onboard NIC can sometimes be less than perfect and to upgrade to a newer kernel. Did that. Nothing happened. Well, time passed, that's about it. iperf figures look okay though, averaging around the 850Mbit/sec mark, which isn't too shabby. So there's some sort of bottleneck going on, and it's getting quite frustrating.
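To put the iperf figure next to the transfer speeds: network tools report megabits per second, file transfer tools report megabytes per second, so divide by 8. A rough sketch (the function name is hypothetical, and it ignores protocol overhead, so real transfers will land somewhat lower):

```shell
# Hypothetical helper: convert a link rate in Mbit/sec to decimal MB/sec.
# Ignores TCP/SMB/SSH protocol overhead, so treat the result as a ceiling.
mbit_to_mbyte() {
    awk -v mbit="$1" 'BEGIN { printf "%.1f\n", mbit / 8 }'
}

mbit_to_mbyte 850    # the iperf average quoted above
mbit_to_mbyte 1000   # theoretical gigabit line rate
```

So a link measured at 850Mbit/sec should be able to carry roughly 106MB/sec, around ten times what SFTP was managing.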
Next, I've tried to download /dev/zero via SFTP (in FileZilla on the Windows side) and it's settled for now at about 7.2MB/sec download speed. There's something quite clearly wrong. It's using 36% of one CPU for SSHD and 4-5% for sftp-server. It's clearly quite broken. I created a 4GB file in tmpfs in RAM for transfer tests, and vsftpd has slowed to 6.6MB/sec, which is very broken indeed. Make that 6.1MB/sec. Ouch. Rerunning the SFTP transfer resulted in SSHD taking up 96% of CPU at first (fair enough, maxed out) and doing about 40MB/sec, but it rapidly dwindled back to the 8MB/sec mark, using 26% CPU.
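A RAM-backed test file takes the server's disks out of the picture. Here's a sketch of how one might be set up. The /tmpfs mount point matches the paths used later in this post, but the mount options in the comment and the function name are assumptions, not a record of what I actually typed:

```shell
# Assumed setup, run once as root (size chosen to leave room for a 4GB file):
#   mkdir -p /tmpfs && mount -t tmpfs -o size=5g tmpfs /tmpfs

# Hypothetical helper: write a file of zeros, SIZE_MB mebibytes long, into DIR.
make_test_file() {
    dir=$1
    size_mb=$2
    dd if=/dev/zero of="$dir/testfile" bs=1M count="$size_mb" 2>/dev/null
}

# Example: make_test_file /tmpfs 4096
```

Bear in mind the file lives in RAM, so it has to fit alongside everything else the box is doing.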
Next step: trying local FTP. Hooked in via localhost, retrieved said ram-based file and wrote to disk. 4294967296 bytes received in 26.49 secs (158306.3 kB/s). So, about 26.5s to download the 4GB file, roughly 155MB/sec. Seems reasonable. The put was even faster, 13.58 secs for the 4GB. So clearly it's not the software, or at least, not when running locally.
Okay. Back to the basics. Ran a few more longer iperf tests, for 6 minutes at a time each way, speeds were around 820Mbit/sec on average. Nothing too spectacular, but nothing too unreasonable, and I was using the PC that they were connecting to at the time as well. Out of curiosity, I installed the CIFS utils and mounted up my Windows network share onto Linux, to copy some files across via the protocol to see what happened. This is where it got a little more confusing.
First, I ran dd to test it, using a 1.6GB mkv video file. The second run, after dumping the cache just to make sure, was a little slower than the first but still fairly respectable.
root@RickNAS:/mnt/win/Rick/Downloads# dd if=./testfile.mkv of=/tmpfs/testfile.mkv bs=1M
1548+1 records in
1548+1 records out
1623222074 bytes (1.6 GB) copied, 16.5486 s, 98.1 MB/s
98MB/sec. Reasonable enough, I think it came out at about 105-110 last time. The same timed operation with cp, after a reboot to flush caches, and a different file of similar size at 1.53GB (as I was struggling to kill caches):
root@RickNAS:/mnt/win/Rick/Downloads# time cp ./testfile.mkv /tmpfs/testfile.mkv
real 1m25.174s
user 0m0.096s
sys 0m6.992s
So, 85 seconds to copy 1.53GB, which equates to just under 18.5MB/sec. Seems a bit harsh. Dropped the cache, and ran it again to see what happened, after the Windows box had cached it:
real 0m17.701s
user 0m0.096s
sys 0m7.248s
So. No real problem copying there, at 88.5MB/sec. Similar to the above then, really, but a little slower. Still, acceptable enough. At this point, I have to eliminate the Windows box as a potential source of slowdown. Theoretically it shouldn't be, as it's on a RAID 0 with a pair of 500GB drives, but... one way to find out. Cue DD for Windows, and a /dev/zero device driver (translated: \\.\zero):
C:\temp>dd if=\\.\zero of=.\testfile bs=1M count=1K
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 11.5837 seconds, -0.0 MB/s
Well, that -0.0 MB/s is wrong (1GB in 11.6 seconds is really about 88MB/sec), but the theory is reasonable. Trying a 10GB file for good measure:
C:\temp>dd if=\\.\zero of=.\testfile bs=1M count=10K
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 812.172 seconds, -246489356887425550000[snip] MB/s
Erm... 812 seconds for 10GB works out at about 12.6MB/sec?! I know I did a bit of web browsing in the meantime... but there's something wrong there. I'm somewhat hoping it's the /dev/zero driver...
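Since this build of dd for Windows prints a garbage rate, the real figure has to be worked out from the byte count and elapsed time that it does report correctly. A small sketch (hypothetical function name; it uses 1MB = 1048576 bytes, like the annotations earlier in this post):

```shell
# Hypothetical helper: throughput in MiB/sec from a byte count and elapsed seconds.
rate_mib() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN {
        if (secs <= 0) { print "n/a"; exit }
        printf "%.1f\n", bytes / 1048576 / secs
    }'
}

rate_mib 1073741824 11.5837    # the 1GB run
rate_mib 10737418240 812.172   # the 10GB run
```

The divisor is seconds, not minutes, which is an easy slip to make when working it out by hand.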
Okay. So plan B. Let's try a dd from the Linux box to the Windows box, the other way around.
root@RickNAS:/mnt/win/Rick/Downloads# dd if=/tmpfs/testfile.mkv of=./testfile.mkv bs=1M
1576+1 records in
1576+1 records out
1653435119 bytes (1.7 GB) copied, 123.167 s, 13.4 MB/s
... This would be a facepalm sort of moment then.
So what the hell is going on with the Windows box?!
CrystalDiskMark didn't report anything too strange on sequential read/write. 178.8MB/s read, 102.8MB/s write.
After this, I was entirely unable to access any of the mounted files on the Windows side, which left me rather confused. Had to drop this fix into my Windows box, and then all was shiny again. Very odd. So, another try of the DD then...
dd if=/tmpfs/testfile.mkv of=./testfile.mkv bs=1M
1576+1 records in
1576+1 records out
1653435119 bytes (1.7 GB) copied, 60.3011 s, 27.4 MB/s
Slightly better, but still a bit rubbish, in all honesty. Resource monitor did show the write speed at 75MB/sec... briefly. Oh.
Again, for good measure:
1576+1 records in
1576+1 records out
1653435119 bytes (1.7 GB) copied, 18.7947 s, 88.0 MB/s
... I don't get it. Same again got 87.3MB/s. It shows it as writing. Okay. Let's try another FTP download shall we? Nope, still rubbish.
Okay, time to eliminate the hard disk as an issue. Time to download to a USB HDD and see how that goes.
... Not so well. It chugged along at about 11MB/sec. That's quite poor. But it was fairly consistent, at least. CrystalDiskMark gave it a whole 10MB/sec write too, so no shock. Okay. Throwing a pure RAM disk into Windows to take out the drives as culprits.
dd:
root@RickNAS:/mnt/win# dd if=/tmpfs/testfile.mkv of=./testfile.mkv bs=1M
1576+1 records in
1576+1 records out
1653435119 bytes (1.7 GB) copied, 16.2628 s, 102 MB/s
Looks far more like normality.
ftp:
File transfer successful, transferred 1,653,435,119 bytes in 16 seconds.
So, that'll be... 98.55MB/sec then. Oh good.
Out of curiosity, I ran sftp, just to see what the performance was like, although I was expecting "dire". I got what I expected, about 20MB/sec. Fair enough, that's CPU-bound. Okay, other way round, just to make sure:
root@RickNAS:/mnt/win# dd if=./testfile.mkv of=/tmpfs/testfile.mkv bs=1M
1576+1 records in
1576+1 records out
1653435119 bytes (1.7 GB) copied, 16.1864 s, 102 MB/s
Looks good. Trying cp commands both ways:
root@RickNAS:/mnt/win# time cp /tmpfs/testfile.mkv ./testfile.mkv
real 0m16.803s
user 0m0.064s
sys 0m7.208s
root@RickNAS:/mnt/win# time cp ./testfile.mkv /tmpfs/testfile.mkv
real 0m15.218s
user 0m0.060s
sys 0m7.432s
So. RAM to RAM is actually fine. I'm going to run Samba to test it out too, to make sure all is as it should be...
... Copying from: 100MB/sec. Copying back: 110MB/sec.
Oh.
Okay, so let's try Linux HDD to ramdisk, just to make sure all is shiny as it should be.
... About 110 each way.
So. After all that hunting... all that debugging... all that chasing ghosts, upgrading network drivers and kernels, tearing my hair out... there was nothing wrong with the Linux box at all... it turned out to be the local hard disk in my Windows PC. I just hope that this post saves someone else the hours of attempted fixes I tried. It's been a painful journey, but a lesson well learned.
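If I were doing this again, I'd measure each stage on its own before tuning anything, because a transfer can only go as fast as its slowest stage. A toy sketch of that reasoning, with a made-up function name and approximate figures from the tests above (iperf at about 820Mbit/sec is roughly 102MB/sec, the NAS disk wrote at about 160MB/sec, and the Windows disk took 812 seconds to write 10GB, which is roughly 12.6MB/sec):

```shell
# Hypothetical helper: given "name rate" pairs (rate in MB/sec),
# print the name of the stage with the lowest rate, i.e. the likely bottleneck.
slowest_stage() {
    printf '%s\n' "$@" | awk '
        NR == 1 || $2 + 0 < min { min = $2 + 0; name = $1 }
        END { print name }'
}

slowest_stage "network 102" "nas_disk 160" "windows_disk 12.6"
```

With those numbers it points straight at the Windows disk, which is where I should have looked first.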