MSI X299 SLI PLUS problems and solutions

Last year, I posted about an issue with missing BitLocker and PIN authentication with my replacement Gamestation build. While it does not look like this is a particularly popular post, I did confirm that at least a couple of people managed to get good use out of that blog post.

As usual, my Twitter feed contains spoilers of this blog post, as I have ranted, complained, and asked questions (mostly to Jo) trying to figure out my Windows problems. The reason I’m writing this down is as usual as a reference to myself, so I don’t repeat the same mistakes over and over again, and as a reference for others, since particularly one of the error codes I’m going to talk about appears to find almost exclusively scammy “PC fixing” websites. And yes I know that I’m repeating the word BIOS later while this is clearly an UEFI board, but MSI calls it as such, and to be honest for most non-technical folks the differences between the two terms don’t exist.

All long help threads should have a sticky globally-editable post at the top saying ‘DEAR PEOPLE FROM THE FUTURE: Here’s what we’ve figured out so far …’

First of all, as noted in the previous post, it looks like nearly all of the settings in the BIOS are lost at any upgrade of the firmware. This is particularly annoying when it looks like a lot of the updates are early boot microcode updates to cover the increasing complexity of mitigating Spectre-style vulnerabilities, and reasonably shouldn’t need to change the semantics or format of settings such as Secure Boot, TPM settings, or smart fan configuration.

So make sure to take good screenshots of all your settings before updating your firmware, as otherwise you’ll fight for hours trying to reconfigure it as you had it before.

Your computer is not resuming from sleep when you press the power button. This appears to be common, I’ve found a bunch of forums posts by people complaining about this behaviour on a number of MSI motherboards. Most of them appears to be in the form of DenverCoder9, although with a little more detail: people claiming they solved the issue by either downgrading or upgrading the motherboard’s BIOS. Not wanting to downgrade my BIOS and having just upgraded it, I wanted to find a better answer, and turns out I probably did find it. Here’s the solution: disable GO2BIOS feature.

Some more details, which can be useful for others in the future if they encounter similar issues and the solution I’m providing is not helping them. The GO2BIOS feature by MSI is a shortcut to enter the BIOS configuration screen without using the keyboard, and it’s particularly handy once you enable all the fast-boot options, as the keyboard might not respond at all. To force entering the BIOS configuration, then, you just need to keep pressed the power button for four seconds when you turn on the computer. That’s what clued me to the connection between the setting and the failure to resume, as they both related to the power button.

The reason why downgrading or upgrading the BIOS appeared to solve the issue is the one I noted above: all firmware updates on these boards appear to completely reset the settings to defaults, and the GO2BIOS feature is not enabled by default (and probably few people would consider re-enabling it in the hurry.)

Windows 10 bluescreens with WHEA_UNCORRECTABLE_ERROR. This is trickier, mostly because all of the search hits for this particular code appears to point at very dodgy websites, and the only hit I could find on the Microsoft website was for a forum post where it was suggested that the particular code I was saying was related to AMD CPUs. Since my machine is an i7, that made no sense whatsoever.

The WHEA in the name stands for Windows Hardware Error Architecture, which suggested that the cause of the bluescreen is caused by something like a Machine-Check Exception. This was particularly scary because it started happening right after I installed a new NVMe SSD, which appeared to get very warm, leading me to first install two more fans, and then replacing the original fans with PWM ones.

During this “ordeal” I also had been installing and updating quite a few pieces of software, related to CPU, motherboard, the Kraken cooler, and so on. And since I had just updated the BIOS I also had been tweaking a lot of parameters around, including tried re-enabling the auto-over-clock feature that, as I discussed previously, appears to be implemented mostly in firmware.

Eventually, I found that I solved the problem by uninstalling MSI’s Control Center software. I had already previously disabled the OC assistant, but even with that I kept receiving random blue screens when browsing websites, or just opening Lightroom. Since I uninstalled the Control Center software I have not experienced a single one for a few days. And that including a “torture test” with Prime95 that brought the CPU to 100C and to thermal throttling.

I’m not sure what the root cause for this is. I can only imagine that there’s some strange interaction between the firmware and the software that was not quite well tested. Or maybe there’s a new update on Windows 10 that caused Control Center to fight for resources. But whatever the reason it seems the right thing to do was to remove MSI’s software, which anyway does not really do anything you can’t do in the BIOS configuration screen.

I hope this post can find its way to those looking for answers for these (or similar enough) issues. And if you find that there are other possible causes for this, feel free to leave a comment on the post.

Windows 10: what to do if BitLocker and PIN stop working after update

I don’t really like the idea of having to write about proprietary software here, but I only found terrible alternative suggestions on the eb so I thought I would at least try to write down about it in the hope to avoid people falling for very bad advice.

The problem: after updating my BIOS, BitLocker asks for the key at boot, and PIN login for Windows 10 (Microsoft Account) fails, requiring to log in with the full Microsoft account password. Trying to re-enable the PIN fails with the error message “Sorry, there was a problem signing you in”.

The amount of misleading information I found on the Internet is astonishing, including a phrase from what appeared to be a Microsoft support person, stating «The operating system should always go with the BIOS. If the BIOS is freshly updated, then the OS has to be fresh as well.» Facepalms all over the place.

The solution (for me): go back in the BIOS and re-enable the TPM (“Security Module”).

Some background is needed. The 2017 Gamestation I’m using nowadays is built using a MSI X299 SLI PLUS with a plug-in TPM which is a requirement to use BitLocker (and if you think that makes it very safe, think again).

I had just updated the firmware of the motherboard (that at this point we all still call “BIOS” despite being clearly “UEFI” based), and it turns out that MSI just silently drop most of the customization to the BIOS settings after update. In particular this disabled a few required settings, including the TPM itself (and Secure Boot — I wonder if Matthew Garrett would have some words about the implementation of it in this particular board at this point).

I see reports on this for MSI and Gigabyte boards alike, so I can assume that Gigabyte does the same, and requires re-enabling the TPM in the settings when updating the BIOS version.

I would probably say that the MSI firmware engineering does not quite fully convince me. It’s not just the dropping all the settings on update (which I still find is a bad thing to do for users), but also the fact that one of the settings is explicitly marked as “crypto miner mode” — I’m sure looking forward for the time after the bubble bursts so that we don’t have to pay premium for decent hardware just because someone thinks they can make money from nothing. Oh well.

Motherboard review: ASUS vs MSI

You may remember last year I bought a gamestation to play games at home (and that means running Windows on it). Last month, I had to do a relatively big change: replace the motherboard altogether. And since I now managed to compare two motherboards of about the same generation, I thought I can give a bit of a comparative review of the two.

My original motherboard was an ASUS X99-S (which right now has an absolutely crazy price!) which I coupled with an Intel 5930K (which is not sold anymore). The motherboard on paper is great, SATA3, m.2 and so on, and it may actually be good if it’s not a broken one, but mine clearly was.

The first glitch I noticed, but not paid enough attention to, was related to the USB 3 ports. While all the ports worked fine, I never managed to install the ASMedia drivers, even though the ASMedia controller was meant to be backing some of the ports, and SysRescCD was actually seeing them fine. This bothered me for a while when I had performance issues on one of my devices, but otherwise it seemed ok.

The second problem was tricky to pin down exactly if it was always there or if it was an update causing it. When I bought the Gamestation, the memory was expensive so I only got 32GB of it. A few months later, I had some spare pocket money (well, I got some bonuses that I wanted to exchange for some gratification) and bought 32GB more. Stupidly, I don’t remember if I checked if it worked fine, just trusted it. A few months later, while trying to do some big processing in Lightroom, I came to notice that Windows only saw half of the RAM. I thought it was a bad bank or something like that, but any combination of shuffling the RAM around would only have Windows seeing 32GB of it. Even though CPU-Z would see all eight banks in.

At that point, Nikolaj suggested it could be an ME problem, so I went on and re-flashed the BIOS from scratch with an SPI flash adapter, but that didn’t help. Re-seating the CPU also didn’t help. I was appalled, but it was not enough to replace the board just yet, so I put the extra RAM to the side and soldiered on. I was wrong.

Last November, literally the day after my birthday, I came back home from a trip and wanted to download some dozens of GB of pictures I took… and my computer wouldn’t boot. The bootcode showed the system blocked in a CSM (compatibility system mode) failure. Trying all the permutation of things to change helped nothing, so it was either the motherboard or the CPU — I took a bet on the motherboard given the previous history, and ordered a MSI X99 SLI Plus while I was in the US — it was significantly cheaper than in Europe.

My hunch was right and indeed, the new motherboard solved the problem. The specs between the two are about the same, actually, there is the same ASMedia USB controller, though this time the drivers install correctly, all the RAM is actually seen by the system now, and of course the computer boots. But this is just the very superficial look at it. There is something else.

Both ASUS and MSI provide software utilities for overclocking, as it is expected for motherboards designed for the Haswell-E family of processors. But the approach the two take is significantly different. ASUS encodes most of the logic in the software itself it appears, with their “DIP5” core, while MSI appears to keep it in the firmware (that also appears to make the boot process a bit slower).

ASUS utility pack is called “AISuite”, and the major version is tied to the board’s generation, version 3 for the X99 motherboards. While there has been at least one update since the time I bought the card, the last version released for the suite was on 2015-07-28. In addition to the overclocking UI, the suite includes a handful of other board-specific tools: one to set the bulk transfer mode sizes (to provide higher performance on USB3 non-UAS devices, not needed on Linux as the kernel does the right thing by default), one to allow faster charge on iPhone devices, and so on so forth. Some of this is actually quite useful, for instance the faster USB transfer actually is useful, although it also has the side effect of stopping WD SmartWave tools from recognizing the drive, and so break your backups if you decided to use WD’s own tool rather than Microsoft’s.

On the other hand, a release for the DIP5 core was released on 2016-06-29, to support the new CPUs — their 2011-3 socket is full-pin, which allowed them to support a further generation of CPUs with only firmware updates. This is effectively an update for the various drivers needed for the underlying overclocking system, as well as a complete overhaul of the Suite UI — which is likely due to actually applying a newer-generation Suite to the motherboard.

Unfortunately, the new Suite UI does not come with a new set of add-ons for charger, USB, etc. This would be okay, except the add-ons ABI changed: the moment you open the Suite app, you have to press Enter so many times, as it tries to fetch icon files that do not exist. Copying the old PNG files into the new path makes it stop throwing up these errors, but the UI clearly shows the wrong icons.

Oh and by the way, starting AISuite with a different motherboard causes Windows 10 to blue-screen. I know because after booting my gamestation with the new motherboard I was welcome by the blue screen of death and I had a sagging feeling of dismay, expecting the CPU to be broken instead (turns out no, it was all the AISuite’s fault).

What about MSI’s app then? Well, their approach appears to be significantly different: first of all the overclocking app only has the overclocking function — they rely on ASMedia’s own tooling and drivers for the USB bulk transfer reconfiguration, and provide an optional tool for the charging options. In the spirit of not reimplementing stuff, they also don’t require any new Windows driver for this, requiring you to install the Intel ME drivers instead… which was fun because the copy I had installed from before the motherboard replacement was newer than the one MSI provides on their website.

And this makes the MSI utility more interesting: last update 2016-12-06, since they use the same exact package for all their boards, it includes no board-specific features and no drivers, so updating it is significantly simpler for them.

The end result is that I’m fairly happy. MSI does not have the tons of crapware that ASUS appears to provide for their boards. They do come with a “Live Update” tool, which I wouldn’t trust, even though I have not tested. Too many of those apps have forgot to implement HTTPS, certificate validation or pinning, making them extremely risky to run, which is unfortunate.

An aside, when you replace the motherboard of your computer, most systems that use computer authorization will consider it a new computer. Including Microsoft’s own Windows 10 license handling, as the Windows 10 license is tied to a EFI variable, for what I remember.

Of all those systems, Microsoft’s was the easiest to deal with, though. The system booted as unactivated, and they do try to point you towards buying a new license, burying the right interface behind “Troubleshooting”, but once you say “I changed hardware recently”, it allows you to just replace the previous computer authorization with the current one.

Both Google Play Music and iTunes require authorizing an additional computer, and that makes it a problem if you are close to the limit (because then you may have to unauthorize them all and then re-authorize them. Stupid DRMs.

BIOSing away

On my so-called dayjob (that more than often ends up keeping me up late at night), I often have to update motherboards’ BIOS data — it happened more than once that USB didn’t really work correctly with some device until the BIOS was updated. I have written about it before and fortunately since then, flashrom started supporting a huge number of motherboards, making it the first choice to update BIOS chips, when they are supported.

Now, when they aren’t, I’m not much of an expert in adding support for them; I have been able to send one patch (which hasn’t been applied still unfortunately) to simply list an extra motherboard, but since most of the boards I have at hand are not mine, too often I cannot go on and try — most of the time I don’t even have the time to do so.

The procedure I have written two and a half years ago would still produce a working bootable USB sticks, but it’s messy; googling around I found an (italian) tutorial (Update (2016-04-29): this has been replaced by an English tutorial) that based itself on installing FreeDOS itself on an USB stick through qemu. And that gave me a very nice idea to solve this cleanly without requiring strange software or dd commands.

For completeness sake, I’d like to add that SysRescueCD’s FreeDOS bootable image seems to crash with invalid instruction errors on almost all the systems I tried it on. No clue as to why.

You need an USB stick of at most 2GB; this ensures that it can be used with FAT16, for maximum compatibility, and the trick will only work properly on systems that can properly boot an USB strick in HDD compatibility mode, which seems to be any system capable of starting off an USB stick at all. In 2GB you can easily add quite a few firmware files, which is something I’m happy with, even if most of the time I don’t need them again after the first time.

For what concerns software, you only need qemu, a copy of FreeDOS (both the Full and Base versions would do), and some good old remembrance of DOS commands). Start up QEmu with the USB stick as the harddrive:

# qemu -hda /dev/sdm -cdrom fdbasecd.iso -boot d

Then use FDISK (the DOS variant) to create a single primary partition in the drive. Using the DOS FDISK command ensures that you will have a DOS-compatible partition without having to deal with sizes, alignments, types, and so on.

After re-booting FreeDOS from the CD with the newly-created partition, simply format it, from FreeDOS, and make it bootable:

A:> FORMAT /S C:

This copies the minimum required files to boot the disk up, which are well enough to actually update a BIOS. Contrarily to the tutorial I linked above, I don’t have any intention to install FreeDOS, which then would require editing the startup files to disable things. The straight boot is just what I need.

Also, a little funny note here. Today’s box to set up had a MSI motherboard; beside “International” being spelt wrong in the original 1.00 BIOS’s vendor strings, the specs page of the board on MSI’s site states very clearly that “for chipset limitations” neither Windows 98 nor ME can be used on the board… given it’s a Pentium4-era board, that went without saying for me. But on the BIOS update documentation, they still insist on telling you to use a Windows 98 or ME boot disk! Isn’t that lovely?

At any rate, now I have my BIOS update stick, and I’m pretty happy with it. Love FreeDOS.