Windows 10, NVMe SSDs, VROC

It sounded like an easy task: my main SSD was running out of space and I decided to upgrade to a 2TB Samsung 970 NVMe drive. It would usually be an easy task, but clearly I shouldn’t expect things to be easy with the way I use a computer, even 20 years after I started doing incredibly rare stuff with it.

It ended up with me reinstalling Windows 10 three times, testing the Acronis backup restore procedure, buying more adapters than I ever thought I would need, and cursing my laziness when I set up a bunch of stuff in the past.

Let’s start with a bit of setup information: I’m talking about the gamestation, which I bought after moving to London because my previous one was stolen by someone at one of the moving companies (AMC Removals in Ireland, or Simpsons Removals in London). It uses an MSI X299 SLI PLUS motherboard, and when I bought it, I also bought two 1TB Crucial M.2 SSDs — one dedicated to the operating system and applications, and the other to store the ever-expanding photos library.

At some point a year or so ago, the pictures I had taken crossed the 1TB mark, and I needed more space for the photos. Since NVMe SSDs had become more affordable, and you can pretty much turn any PCIe 3.0 x4 slot into an NVMe slot with a passive adapter, that’s what I did: I bought a Samsung 970 EVO Plus 1TB, copied the operating system to it, and merged the two older Crucial SSDs into a single “Dynamic Volume” to have more space for pictures.

At first I used a random passive adapter that I bought on Amazon, and while it worked perfectly well for connecting the drive, it had some trouble keeping the temperature in check: Samsung’s software reported between 68°C and 75°C, which it considers “too high”. I spent a lot of time trying to find a way around this, ended up replacing all the fans on the machine and adding a few more, and managed to bring it down to a constant 60°C or so. Phew.

A few months later, I found an advertisement for the ASUS Hyper M.2 card, which is a pretty much passive card that lets you use up to four NVMe SSDs in a PCI-E x16 slot, as long as your CPU supports “bifurcation” — which I checked that both my CPU and motherboard do. In addition to letting you add a ton of SSDs to a motherboard, the Hyper M.2 has a big aluminium heatsink and a fan, which make it interesting for keeping the SSD’s temperature under control. Although I’ll be honest and say I’m surprised that ASUS didn’t even bother adding PWM fan control: there’s an on/off switch that pokes out of the chassis and that’s about it.

Now fast forward a few more months: my main drive is also full, and Microsoft has deprecated Dynamic Volumes in favour of Storage Spaces. I decided I would buy a new, bigger SSD for the main drive, and use this chance to migrate the photos to a storage space bundling together all three of the remaining SSDs. Since I already had the Hyper M.2 and I knew my CPU supported bifurcation, I thought it wouldn’t be too difficult to have all four SSDs connected together…

Bifurcation and VROC

The first thing to know is that the Hyper M.2 card, when loaded with a single NVMe SSD, behaves pretty much the same way as a normal PCI-E-to-M.2 adapter: the single SSD gets the four lanes, and is seen as a normal PCI-E device by the firmware and operating system. If you connect two or more SSDs, now things are different, and you need bifurcation support.

PCI-E bifurcation allows splitting an x8 or x16 slot (8 or 16 PCI-E lanes) into two or four x4 slots, which is what NVMe drives need. It requires support from the CPU (because that’s where the PCI-E lanes terminate), from the BIOS (to configure the bifurcation), and from the operating system, for reasons that are not entirely clear to me, not being a PCI-E expert.

So the first problem I found while trying to get the second SSD to work on the Hyper M.2 is that I hadn’t realised how complicated the assignment of lanes to PCI-E slots is on modern motherboards. Some slots are connected to the chipset (PCH) rather than directly to the CPU, but you want the videocard and the NVMe drives to go to the CPU instead. The M.2 slots take some of the lanes away when in use, and which lanes they take depends on whether they run in SATA or NVMe mode. And how many lanes you have available in the first place depends on your CPU.

Pretty much, you will need to do some planning, and maybe a pen-and-paper diagram to follow along. In particular, you need to remember that how the lanes are distributed is statically chosen: even though you have a full x16 slot at the bottom of your motherboard, and you have 16 free lanes, that doesn’t mean the two are connected. Indeed, it turned out that the bottom slot runs at x8 at best with my CPU, and I needed to move the Hyper M.2 two slots up instead. Oops.

The next problem was that, despite Ubuntu Live being able to access both NVMe drives transparently, and the firmware being able to boot from them, Windows refused to boot, complaining about an inaccessible boot device. The answer to this one is to be found in VROC: Virtual RAID on CPU. It’s Intel’s way to implement bifurcation support for NVMe drives, and despite the name it’s not only relevant if you plan on using your drives in a RAID configuration. Although, let me warn here: from what I understand, bifurcation should work fine without VROC, but it looks like most firmware just enables the two together, so at least on my board you can’t use bifurcated slots without VROC enabled.

The problem with VROC is that while Ubuntu seems to handle it natively, Windows 10 doesn’t. Even 20H1 (which is the most recent release at the time of writing) doesn’t recognize SSDs sitting behind a bifurcated slot unless you provide it with a driver, which is why you end up with the inaccessible boot device. It’s the equivalent of building your own Linux kernel and forgetting the disk controller driver or the SCSI disk driver. I realized that when I tried doing a clean install (hey, I do have a backup for a reason!), and the installer didn’t see the drives at all.

This is probably the closest I’m getting to retrocomputing: it reminded me of installing Windows XP for a bunch of clients and friends back when AHCI became common, and having to provide a custom driver disk. Thankfully, Windows 10 can take that from USB, rather than having to fiddle around with installation media or CD swaps. And indeed, the Intel drivers for VROC include a VMD (Volume Management Device) driver that allows Windows 10 to see the drives and even boot from them!

A Compromising Solution

So after that I managed to get Windows 10 installed and set up — and one of my biggest worries went away: back when my computer was stolen and I reinstalled Windows 10, the license was still attached to the old machine and I had to call tech support to get it activated, so I wasn’t sure whether it would let me re-activate it this time; it did.

Now, the next step for me was to make sure that the SSD had the latest firmware, was genuine, and was correctly set up, so I installed Samsung’s Magician tool, and… it didn’t let me do any of that, because it reported Intel as the provider of the NVMe driver, despite Windows reporting the drive as being handled by its own NVMe driver. I guess what they mean is that the VROC driver interferes with direct access to the devices. But it means you lose access to all the SMART counters from Samsung’s own software (I expect other software might still be able to access them), with no genuineness check and, in particular, no temperature warning. Given I knew this had been an issue in the past, it worried me.

As far as I could tell, when using the Hyper M.2 you not only lose access to the SSD manufacturer’s tooling (like Magician), but I’m not even sure whether Windows can still access the TRIM facilities — I didn’t manage to confirm this for good: I got an error when I tried using it, but it might have been related to another issue that will become apparent later.

And to top it all off, if you do decide to move the drives out of the Hyper M.2 card, say to bring them back to the motherboard, you are back to square one with the boot device being inaccessible, because Windows will look for the VROC VMD, which will be gone.

At that point I pretty much decided that the Hyper M.2 card and the whole VROC feature wouldn’t work out for me: too many compromises. I decided to take a different approach, and instead of moving the NVMe drives away from the M.2 slots, I planned to move the SATA drives away from the M.2 slots.

You see, the M.2 slots can carry either NVMe drives, using PCI-E directly, or still-common SATA SSDs — the connector is keyed, although I’m not entirely sure why, as there’s nothing preventing you from connecting a SATA M.2 SSD to a connector that only supports NVMe (such as the Hyper M.2), but that’s a different topic that I don’t care to research myself. What matters is that you can buy passive adapters that convert an M.2 SATA SSD into a normal 2.5″ SATA one. You can find those on AliExpress, obviously, but I needed them quickly, so I ordered them from Amazon instead — I got Sabrent ones because they were available for immediate dispatch. Be careful though, because they sell both M.2 and mSATA converters: both carry the same SATA protocol, so all you need is the right passive adapter for your connector.

Storage Spaces and the return of the Hyper M.2

After reinstalling with the two Samsung SSDs in the motherboard’s M.2 slots, I finally managed to get Samsung Magician working, which confirmed not only that the drive is genuine, but also that it already has the latest firmware (good). Unfortunately, it also told me that the temperature of the SSD was “too high”, at around 65°C.

The reason for that is that the motherboard predates the more common NVMe drives, and unlike LGR’s, it doesn’t have full aluminium heatsinks to bolt on top of the SSDs to keep the temperature down. It came instead with a silly “shield” that might be worse than not having it, and it positioned the first M.2 slot… right underneath the videocard. Oops! Thankfully, I do have an adapter with a heatsink that allows me to connect a single SSD to a PCI-E slot without needing to use VROC… the Hyper M.2 card. So I settled for re-opening the computer, moving the 2TB SSD to the Hyper M.2, and being done with it. Easy peasy, and since I already had the card, it was probably worth it.

Honestly, if I didn’t have the card I would probably have gone for one of those “cards” that have both a passive NVMe adapter and a passive SATA adapter (needing the SATA data cable, but not the power), since at that point I would have been able to keep one SATA SSD on the motherboard (they don’t get as hot, it seems), but again, I worked with what I had at hand.

Then, as I said above, I also wanted to take this chance to migrate my Dynamic Volumes to the new Storage Spaces, which are supposed to be better supported and include more modern features for SSDs. So once I got everything reinstalled, I tried creating a new pool and setting it up… to no avail: the UI didn’t let me create the pool. Instead I ended up using the command line via PowerShell, and that worked fine.

Though do note that the commands on Windows 10 2004/20H1 are different from those on older Server versions, which makes looking for solutions on ServerFault and similar sites very difficult. Also, it turns out that between deleting Dynamic Volumes from two disks and adding them to a Storage Spaces pool, you need to reboot your computer. And the default way to select a disk (by its “Friendly Name”, as Windows 10 calls it) is to use the model number — which makes things interesting when you have two pairs of SSDs with the same name (Samsung doesn’t bother adding the size to the model name as reported by Windows).
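
For reference, this is roughly the PowerShell incantation. Take it as a minimal sketch rather than my exact commands; the pool name and labels are made up, and you’ll want to double-check the disk list (by serial number, if the model names clash) before creating anything:

# list the disks that are eligible for pooling; check serial numbers if
# two drives share the same model name
Get-PhysicalDisk -CanPool $true | Select-Object FriendlyName, SerialNumber, Size

# create the pool out of all poolable disks
$disks = Get-PhysicalDisk -CanPool $true
New-StoragePool -FriendlyName "Photos" -StorageSubSystemFriendlyName "Windows Storage*" `
    -PhysicalDisks $disks

# create a simple (non-redundant) space over the whole pool, then
# initialise, partition and format it
New-VirtualDisk -StoragePoolFriendlyName "Photos" -FriendlyName "PhotosSpace" `
    -ResiliencySettingName Simple -UseMaximumSize |
    Get-Disk | Initialize-Disk -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel "Photos"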

And then there’s the kicker, which honestly made me less angry than everything else that went on, but annoyed me more than I let on: Samsung Magician lost access to all the disks connected to the Storage Spaces pool! I assume this is because, the moment they are added to the pool, Windows 10 no longer shows them in the Disk Management interface either, and Magician has not been updated to identify disks at a lower level. It’s probably a temporary situation, but Storage Spaces are also fairly uncommon, so maybe they will never bother fixing it.

The worst part is that even the new SSD disappeared, probably for the reason noted above: it has the same name as a disk that is in the Storage Spaces pool. Which is what made me facepalm — I once again lost access to Samsung’s diagnostics, although by then I had confirmed that the temperature is fine, the firmware has not changed, and the drive is genuine. I guess VROC would have done just as well, had I confirmed the genuineness before going through multiple reinstalls.

Conclusion

Originally, I was going to say that the Hyper M.2 is a waste of time on Windows. The fact that you can’t actually monitor the device with the Samsung software is more than just annoying — I probably should have looked for alternative monitoring software to see if I could get at the SMART counters over VROC. On Linux, of course, the question doesn’t arise, given that Magician doesn’t exist there.

But if you’re going to install that many SSDs on Windows, you’re likely going to need Storage Spaces anyway — in which case the fact that Magician doesn’t work over VROC is moot, as it doesn’t work over Storage Spaces either. The only thing you need to do is make sure you have the drivers at hand to install Windows correctly in the first place. Using the Hyper M.2 – particularly on slightly older motherboards that don’t have good enough heatsinks for their M.2 slots – turns out to be fairly useful.

Also, Storage Spaces, despite being a major pain in the neck to set up on Windows 10, appear to do a fairly good job. Unlike Dynamic Volumes, they do appear to balance writes across the SSDs, they support TRIM, and there’s even support for preparing a disk to be removed from the pool, moving everything onto the remaining disks (assuming there’s enough space), and freeing up the drive.
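
That removal flow is also driven from PowerShell; here is a rough sketch, with the pool, space, and disk names being illustrative rather than my actual setup (and keep in mind that selecting by friendly name gets ambiguous when two drives share a model number):

# mark the outgoing disk as retired, so no new data lands on it
Set-PhysicalDisk -FriendlyName "Samsung SSD 970 EVO 1TB" -Usage Retired

# rebuild the space onto the remaining members and watch the repair job
Repair-VirtualDisk -FriendlyName "PhotosSpace"
Get-StorageJob

# once the repair has finished, drop the retired disk from the pool
$old = Get-PhysicalDisk -FriendlyName "Samsung SSD 970 EVO 1TB"
Remove-PhysicalDisk -PhysicalDisks $old -StoragePoolFriendlyName "Photos"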

If I’m not getting a new computer any time soon (and I would hope I won’t have to), I have a feeling I’ll go back to using the Hyper M.2 in VROC mode, even if it means reinstalling Windows again. Adding another 2TB or so of space for pictures wouldn’t be the cheapest idea, but it would allow expansion at a decent rate until whatever next technology arrives.

Updating firmware on a Crucial M4 drive, with Linux, no luck

When you think of SSD manufacturers, it might be obvious to think of them as Linux friendly, given they target power users, and Linux users are, for the most part, power users. It seems this is not quite true for Crucial. My main personal laptop has had, since last year, a 64GB Crucial M4 SSD – given I’ve not been using a desktop computer for a while, it does start to feel small, but that’s a different point – which has undergone a couple of firmware updates since then. In particular, there is a new release, 070H, that is supposed to fix some nasty power-saving issues.

Crucial only provides firmware update utilities for Windows 7 and 8 (two different versions of them), and then a utility for “Windows and Mac” — the latter is actually a ZIP file that contains an ISO file… well, I don’t have a CD to burn with me, so my first option was to run the Windows 7 file from my Windows 7 install, which resides on the external SATA harddrive. No luck with that: from what I read on the forums, what the upgrader does is simply set up an EFI application to boot, and then reboot. Unfortunately, in my case there are two EFI partitions, because of the two bootable drives, and that most likely messes up the upgrader.

Okay, strike one; let’s look at the ISO file. The ISO is very simple and very small: it is basically just an ISOLINUX tree that uses memdisk to launch a 2.88MB image file (2.88MB is a semi-standard floppy disk size that never became popular for real disks, but has been used by most virtual floppy images in bootable CD-ROMs for the extra space). Okay, what’s in the image file then? Nothing surprising: it’s a FreeDOS image, with Crucial’s own utility and the firmware.
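
The layout is the usual one for these bootable flashers; an illustrative isolinux.cfg (the label and image name here are made up, not taken from Crucial’s ISO) looks more or less like this:

# illustrative only: boot the virtual floppy image through memdisk
DEFAULT flash
LABEL flash
  KERNEL memdisk
  APPEND initrd=boot2880.img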

So, if you remember, I had some experience with trying to update a BIOS through FreeDOS images, and I have my trusty USB stick with FreeDOS arriving on Tuesday with most of my personal effects that were in Italy, waiting for me to find a final place to move my stuff to. But I wanted to see if I could boot the image file without needing the FreeDOS stick, so I checked. GRUB 2 does not include a direct way to load such an image — the reference to memdisk in its manual is about loading a standalone or rescue GRUB image, nothing to do with what we care about.

The most obvious way to run the image is through SYSLINUX’s memdisk loader. Debian even has a package that lets you just drop the images in /boot/images and adds them to the GRUB menu. Quick and easy, no? Well, no. The problem is that memdisk needs to be loaded with linux16 — and, well, GRUB 2 does not support that if you’re booting via EFI, as I am.
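
For the record, this is roughly the kind of entry that gets generated (paths and names are illustrative); linux16 and initrd16 only exist for BIOS boot, which is exactly why it falls apart under EFI:

menuentry "Crucial M4 firmware update (FreeDOS image)" {
        # memdisk maps the floppy image into memory and boots it;
        # linux16/initrd16 are BIOS-only commands
        linux16 /boot/memdisk
        initrd16 /boot/images/crucial-m4-070h.img
}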

I guess I’ll wait until Tuesday night, when my BIOS disk will be here, and I’ll just use it to update the SSD. It should solve the issue once and for all.

Protecting yourself from R-U-Dead-Yet attacks on Apache

Do you remember the infamous “slowloris” attack on HTTP webservers? Well, it turns out there is a new variant of the same technique that, rather than making the server wait for headers to arrive, makes the server wait for POST data before processing; it’s difficult to explain exactly how that works, so I’ll leave it to the expert explanation from ModSecurity.

Thankfully, since there was a lot of work done to mitigate the slowloris attack, there are easy protections to put in place, the first of which would be the use of mod_reqtimeout… unfortunately, it isn’t currently enabled by the Gentoo configuration of Apache – see bug #347227 – so the first step is to work around this limitation. Until the Gentoo Apache team appears again, you can do so simply by making use of the per-package environment hack, sort of what I’ve described in my previous nasty tricks post a few months ago.

# to be created as /etc/portage/env/www-servers/apache

export EXTRA_ECONF="${EXTRA_ECONF} --enable-reqtimeout=static"

*Do note that here I’m building it statically; this is because I’d suggest everybody build all the modules statically: the overhead of having them as plugins is usually quite a bit higher than what you’d pay for loading a module that you don’t care about.*

Now that you’ve got this set up, you should make sure to set a timeout for requests; the mod_reqtimeout documentation is quite brief, but shows a number of possible configurations. I’d say that in most cases, what you want is simply the one shown in the ModSecurity examples. Please note that they made a mistake there: it’s not RequestReadyTimeout but RequestReadTimeout.
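
If you want something a bit more complete than the one-liner in the snippet below, this is roughly the example from the Apache documentation, covering both headers and body; the values are just a reasonable starting point to tune for your own traffic:

# allow 20 to 40 seconds for the headers and 20 seconds for the body,
# extending the timeout as long as the client keeps sending at least
# 500 bytes per second
RequestReadTimeout header=20-40,MinRate=500 body=20,MinRate=500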

Additionally, when using ModSecurity you can stop the attack in its tracks after a few requests have timed out, by blacklisting the IP and dropping its connections, allowing slots to be freed for other requests to arrive; this can easily be configured through this snippet, taken directly from the above-linked post:

RequestReadTimeout body=30

SecRule RESPONSE_STATUS "@streq 408" "phase:5,t:none,nolog,pass, setvar:ip.slow_dos_counter=+1,expirevar:ip.slow_dos_counter=60"
SecRule IP:SLOW_DOS_COUNTER "@gt 5" "phase:1,t:none,log,drop, msg:'Client Connection Dropped due to high # of slow DoS alerts'"

This should let you cover yourself quite nicely, at least if you’re using hardened, with grsecurity enforcing per-user limits. But if you’re on hosting where you don’t get a say about the kernel – as is my case – there is one further problem: the init script for Apache does not respect the system limits at all — see bug #347301.

The problem here is that when Apache is started during the standard system init, there are no limits set for the session it is running from, and since it doesn’t use start-stop-daemon to launch the Apache process itself, no limits are applied at all. This makes for a quite easy DoS of the whole host, as it will happily exhaust the system’s memory.

As I posted on the bug, there is a quick and dirty way to fix the situation by editing the init script itself and changing the way Apache is started up:

# Replace the following:
        ${APACHE2} ${APACHE2_OPTS} -k start

# With this

        start-stop-daemon --start --pidfile "${PIDFILE}" ${APACHE2} -- ${APACHE2_OPTS} -k start

This way at least the generic system limits are applied properly. Though, please note that start-stop-daemon’s limitations will not allow you to set per-user limits this way.

On a different note, I’d like to spend a few words on why this particular vulnerability is interesting to me: the attack relies on long-winded POST requests that may use very little bandwidth, because just a few bytes are sent before the timeout is hit… it is not unlike the RTSP-in-HTTP tunnelling that I have designed and documented in feng over the past years.

This also means that application-level firewalls will sooner or later start filtering these long-winded requests, and that will likely put the final nail in the coffin of RTSP-in-HTTP tunnelling. I guess it’s definitely time for feng to move on and implement real HTTP-based pseudo-streaming instead.

Service limits

There is one thing that, as the PAM maintainer, I absolutely dread. No, it’s not cross-compilation, but rather the handling of pam_limits where services are concerned, and, in particular, start-stop-daemon.

I have said before that if you don’t properly run start-stop-daemon to start your services, pam_limits is not executed and thus no limits are applied at startup. That is, indeed, the case. Unfortunately, the situation is not as black and white as I hoped, or as most people expected.

Christian reported to me the other day that he was having trouble getting user-based limits properly respected by services; I had run into similar situations before, but never went as far as checking them out properly. So I went to check it out with his particular use case: dovecot.

Dovecot processes have their user and group set to specific limited users; on the other hand, the service has to be started as root to begin with: not only are the runtime directories writeable only by root, but it also fails to bind the standard IMAP ports as a non-root user (since they are lower than 1024), and it fails further on when starting user processes, most likely because those are partly run with the privileges of the user logging in.

So with the following configuration in /etc/security/limits.conf, what happens?

* hard nproc 50
root hard nproc unlimited
root soft nproc unlimited

dovecot hard nproc 300
dovecot soft nproc 100

The first problem is that, as I said, dovecot is started by root, not the dovecot user; and when the privileges are actually dropped, it happens directly within dovecot, and does not pass through pam_limits! So the obvious answer is that the processes are started with the limits of root, which, with the previous configuration, are unlimited. Unfortunately, as Christian reported, that was not the case: the nproc limit was set to 50 (and that was low enough that gradm killed it).

The first guess was that the user’s session limits are imposed after starting the service; but this is exactly what we’re trying to avoid by using start-stop-daemon. So, we’ve got a problem, at first glance. A quick check on OpenRC’s start-stop-daemon code shows what the problem is:

                if (changeuser != NULL)
                        pamr = pam_start("start-stop-daemon",
                            changeuser, &conv, &pamh);
                else
                        pamr = pam_start("start-stop-daemon",
                            "nobody", &conv, &pamh);

So here’s the catch: unless we’re using the --user parameter, start-stop-daemon applies the limits of user nobody, not root. Which means we’ve got a problem: we cannot easily configure per-user limits for the users that services drop their privileges to, and that is a bad thing. How can we solve it?

The first, obvious solution is adding something like --user foo --nochuid, which would make start-stop-daemon abide by the limits of the given user while performing no setgid() or setuid() call, leaving the privilege drop to the software itself. This is quick but partly hacky. The second option is not exclusive, and should probably be implemented anyway: set the proper file-based capabilities on the software, then have s-s-d run it directly as the user, maybe with a bit of help from pam_cap to grant the capabilities per-user rather than per-file.
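
To give an idea of what the second option would look like for the dovecot case, here is a rough sketch; the paths and options are illustrative, and the init script would clearly need more changes than this (the root-owned runtime directories, for a start):

# grant dovecot the one root capability it actually needs up front:
# binding ports below 1024
setcap cap_net_bind_service=+ep /usr/sbin/dovecot

# then start it through start-stop-daemon as its own user, so that
# pam_limits picks up the dovecot entries from limits.conf
start-stop-daemon --start --user dovecot \
        --exec /usr/sbin/dovecot --pidfile /var/run/dovecot/master.pid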

At any rate, this is one thing that we should be looking into. Sigh!

To finish it off, there is one nasty situation that I haven’t checked yet, and that actually worries me: if you set the “standard user” limits lower than those of nobody (since that is what services would get), users can probably work around their limits by using start-stop-daemon to start their own code; I’m not sure whether it works, but if it does, we’ve got a security issue in at least OpenRC on our hands!