MSI X299 SLI PLUS problems and solutions

Last year, I posted about an issue with missing BitLocker and PIN authentication with my replacement Gamestation build. While it does not look like this is a particularly popular post, I did confirm that at least a couple of people managed to get good use out of that blog post.

As usual, my Twitter feed contains spoilers of this blog post, as I have ranted, complained, and asked questions (mostly to Jo) trying to figure out my Windows problems. As usual, I’m writing this down as a reference for myself, so I don’t repeat the same mistakes over and over again, and as a reference for others, particularly since searching for one of the error codes I’m going to talk about appears to find almost exclusively scammy “PC fixing” websites. And yes, I know that I keep saying BIOS below while this is clearly a UEFI board, but MSI calls it that, and to be honest for most non-technical folks the difference between the two terms doesn’t exist.

All long help threads should have a sticky globally-editable post at the top saying ‘DEAR PEOPLE FROM THE FUTURE: Here’s what we’ve figured out so far …’

First of all, as noted in the previous post, it looks like nearly all of the settings in the BIOS are lost at any upgrade of the firmware. This is particularly annoying when it looks like a lot of the updates are early boot microcode updates to cover the increasing complexity of mitigating Spectre-style vulnerabilities, and reasonably shouldn’t need to change the semantics or format of settings such as Secure Boot, TPM settings, or smart fan configuration.

So make sure to take good screenshots of all your settings before updating your firmware, as otherwise you’ll fight for hours trying to reconfigure it as you had it before.
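If screenshots feel too fiddly, the same idea works with hand-written notes. Below is a minimal sketch, assuming you type the settings you care about into a dictionary before and after the update; the setting names and values are hypothetical placeholders, not a list read from the firmware.

```python
def diff_settings(before: dict, after: dict) -> dict:
    """Return {name: (old, new)} for every noted setting that changed or vanished."""
    changed = {}
    for name, old in before.items():
        new = after.get(name)  # None when the setting was not noted afterwards
        if new != old:
            changed[name] = (old, new)
    return changed


# Hypothetical notes taken by hand before and after a firmware update.
before = {
    "Secure Boot": "Enabled",
    "Security Device Support": "Enabled",
    "GO2BIOS": "Disabled",
    "Fan 1 mode": "PWM",
}
after = {
    "Secure Boot": "Disabled",
    "Security Device Support": "Disabled",
    "GO2BIOS": "Disabled",
    "Fan 1 mode": "PWM",
}

for name, (old, new) in diff_settings(before, after).items():
    print(f"{name}: was {old!r}, now {new!r}")
```

This only tells you what to go and fix by hand in the configuration screen; nothing here talks to the firmware.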

Your computer is not resuming from sleep when you press the power button. This appears to be common: I’ve found a bunch of forum posts by people complaining about this behaviour on a number of MSI motherboards. Most of them appear to be in the form of DenverCoder9, although with a little more detail: people claiming they solved the issue by either downgrading or upgrading the motherboard’s BIOS. Not wanting to downgrade my BIOS, and having just upgraded it, I wanted to find a better answer, and it turns out I probably did find it. Here’s the solution: disable the GO2BIOS feature.

Some more details, which can be useful for others in the future if they encounter similar issues and the solution I’m providing is not helping them. The GO2BIOS feature by MSI is a shortcut to enter the BIOS configuration screen without using the keyboard, and it’s particularly handy once you enable all the fast-boot options, as the keyboard might not respond at all. To force entering the BIOS configuration, then, you just need to keep the power button pressed for four seconds when you turn on the computer. That’s what clued me in to the connection between the setting and the failure to resume, as they both relate to the power button.

The reason why downgrading or upgrading the BIOS appeared to solve the issue is the one I noted above: all firmware updates on these boards appear to completely reset the settings to defaults, and the GO2BIOS feature is not enabled by default (and probably few people would consider re-enabling it in a hurry).

Windows 10 bluescreens with WHEA_UNCORRECTABLE_ERROR. This is trickier, mostly because all of the search hits for this particular code appear to point at very dodgy websites, and the only hit I could find on the Microsoft website was for a forum post where it was suggested that the particular code I was seeing was related to AMD CPUs. Since my machine is an i7, that made no sense whatsoever.

The WHEA in the name stands for Windows Hardware Error Architecture, which suggested that the bluescreen is caused by something like a Machine-Check Exception. This was particularly scary because it started happening right after I installed a new NVMe SSD, which appeared to get very warm, leading me to first install two more fans, and then replace the original fans with PWM ones.
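If you would rather look at what Windows itself recorded than trust search results, the event log is one place to check. The following is a minimal sketch, assuming the hardware errors are logged in the System log under the Microsoft-Windows-WHEA-Logger provider and that the stock wevtutil tool is available; the function names are mine, and I have not checked what the events look like on this particular board.

```python
import subprocess

# Assumption: hardware error events are written by this provider.
WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def whea_query_command(max_events: int = 10) -> list[str]:
    """Build a wevtutil command line listing the newest WHEA events as text."""
    xpath = f"*[System[Provider[@Name='{WHEA_PROVIDER}']]]"
    return [
        "wevtutil", "qe", "System",  # query events from the System log
        f"/q:{xpath}",               # keep only events from the WHEA provider
        f"/c:{max_events}",          # limit how many events are returned
        "/rd:true",                  # newest first
        "/f:text",                   # human-readable output
    ]


def recent_whea_events(max_events: int = 10) -> str:
    """Run the query and return its output. Only meaningful on Windows."""
    result = subprocess.run(
        whea_query_command(max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `print(recent_whea_events())` from an elevated prompt should dump whatever was logged around the time of the crash, which at least gives you something more specific to search for.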

During this “ordeal” I also had been installing and updating quite a few pieces of software, related to CPU, motherboard, the Kraken cooler, and so on. And since I had just updated the BIOS I also had been tweaking a lot of parameters, including trying to re-enable the auto-overclock feature that, as I discussed previously, appears to be implemented mostly in firmware.

Eventually, I found that I solved the problem by uninstalling MSI’s Control Center software. I had already previously disabled the OC assistant, but even with that I kept receiving random blue screens when browsing websites, or just opening Lightroom. Since I uninstalled the Control Center software I have not experienced a single one for a few days. And that includes a “torture test” with Prime95 that brought the CPU to 100°C and to thermal throttling.

I’m not sure what the root cause for this is. I can only imagine that there’s some strange interaction between the firmware and the software that was not quite well tested. Or maybe there’s a new update on Windows 10 that caused Control Center to fight for resources. But whatever the reason it seems the right thing to do was to remove MSI’s software, which anyway does not really do anything you can’t do in the BIOS configuration screen.

I hope this post can find its way to those looking for answers for these (or similar enough) issues. And if you find that there are other possible causes for this, feel free to leave a comment on the post.

Windows 10: what to do if BitLocker and PIN stop working after update

I don’t really like the idea of having to write about proprietary software here, but I only found terrible alternative suggestions on the web, so I thought I would at least try to write about it in the hope of keeping people from falling for very bad advice.

The problem: after updating my BIOS, BitLocker asks for the key at boot, and PIN login for Windows 10 (Microsoft Account) fails, requiring me to log in with the full Microsoft account password. Trying to re-enable the PIN fails with the error message “Sorry, there was a problem signing you in”.

The amount of misleading information I found on the Internet is astonishing, including a phrase from what appeared to be a Microsoft support person, stating «The operating system should always go with the BIOS. If the BIOS is freshly updated, then the OS has to be fresh as well.» Facepalms all over the place.

The solution (for me): go back in the BIOS and re-enable the TPM (“Security Module”).

Some background is needed. The 2017 Gamestation I’m using nowadays is built using an MSI X299 SLI PLUS with a plug-in TPM, which is a requirement to use BitLocker (and if you think that makes it very safe, think again).

I had just updated the firmware of the motherboard (that at this point we all still call “BIOS” despite being clearly “UEFI” based), and it turns out that MSI just silently drops most of the customization to the BIOS settings after an update. In particular this disabled a few required settings, including the TPM itself (and Secure Boot; I wonder if Matthew Garrett would have some words about the implementation of it in this particular board at this point).

I see reports on this for MSI and Gigabyte boards alike, so I can assume that Gigabyte does the same, and requires re-enabling the TPM in the settings when updating the BIOS version.

I would probably say that the MSI firmware engineering does not quite fully convince me. It’s not just the dropping of all the settings on update (which I still find is a bad thing to do for users), but also the fact that one of the settings is explicitly marked as “crypto miner mode”. I’m sure looking forward to the time after the bubble bursts, so that we don’t have to pay a premium for decent hardware just because someone thinks they can make money from nothing. Oh well.

Anyone working on motherboard RGB controllers?

I was contacted by email last week by a Linux user, who had probably noticed my recent patch for the gpio_it87 driver in the kernel. They were hoping my driver could be extended to the IT7236 chips that are used in a number of gaming motherboards for controlling RGB LEDs.

Having left the case modding world after my first and only ThermalTake chassis – my mother gave me hell for the fans noise, mostly due to the plexiglass window on the side of the case – I still don’t have any context whatsoever on what the current state of these boards is, whether someone has written generic tools to set the LEDs, or even UIs for them. But it was an interesting back and forth of looking for leads into figuring out what is needed.

The first problem, as those of you who already know a bit about electrical engineering and electronics will have guessed, is that the IT7236 chip is clearly not in the same series as the IT87xx chips that my driver supports. And since they are not the same series, they are unlikely to share the same functionality.

The IT87xx series chips are Super I/O controllers, which means they implement functionality such as floppy-disk controllers, serial and parallel ports and similar interfaces, generally via the LPC bus. You usually know these chip names because they need to be supported by the kernel for them to show up in sensors output. In addition to these standard devices, many controllers include at least a set of general purpose I/O (GPIO) lines. On most consumer motherboards these are not exposed in any way, but boards designed for industrial applications, or customized boards, tend to expose those lines easily.
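To give an idea of what “Super I/O access” means in practice, here is a minimal sketch of identifying such a chip through its index/data port pair. It runs against a simulated chip rather than real hardware, and the details are assumptions for illustration: the index port at 0x2E, the unlock sequence 0x87 0x01 0x55 0x55, and the chip ID in configuration registers 0x20 and 0x21. The `FakeSuperIO` class is entirely made up so that the example can run anywhere.

```python
class FakeSuperIO:
    """Simulated Super I/O chip behind an index/data port pair (illustration only)."""

    UNLOCK = [0x87, 0x01, 0x55, 0x55]  # assumed unlock sequence

    def __init__(self, chip_id: int, index_port: int = 0x2E):
        self.index_port = index_port
        self.data_port = index_port + 1
        # Assumed layout: chip ID high byte in register 0x20, low byte in 0x21.
        self.registers = {0x20: chip_id >> 8, 0x21: chip_id & 0xFF}
        self._recent = []
        self._unlocked = False
        self._index = 0

    def outb(self, value: int, port: int) -> None:
        if port != self.index_port:
            return  # data-port writes are ignored in this sketch
        if self._unlocked:
            self._index = value  # select a configuration register
        else:
            # Remember the last four bytes and unlock once they match.
            self._recent = (self._recent + [value])[-4:]
            self._unlocked = self._recent == self.UNLOCK

    def inb(self, port: int) -> int:
        if port == self.data_port and self._unlocked:
            return self.registers.get(self._index, 0xFF)
        return 0xFF  # a locked or absent chip reads back as all ones


def read_chip_id(bus, index_port: int = 0x2E) -> int:
    """Unlock configuration mode, then read the 16-bit chip ID."""
    for value in (0x87, 0x01, 0x55, 0x55):
        bus.outb(value, index_port)
    bus.outb(0x20, index_port)        # select the high byte register
    high = bus.inb(index_port + 1)
    bus.outb(0x21, index_port)        # select the low byte register
    low = bus.inb(index_port + 1)
    return (high << 8) | low


print(hex(read_chip_id(FakeSuperIO(0x8728))))
```

On a real machine the `outb` and `inb` calls would be raw port I/O, which needs root and care, and a polite driver also leaves configuration mode when it is done; both are left out here.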

Indeed, I wrote the gpio_it87 driver (well, actually adapted and extended it from a previous driver), because the board I was working on in Los Angeles had one of those chips, and we were interested in having access to the GPIO lines to drive some extra LEDs (and possibly in future versions more external interfaces, although I don’t know if anything was made of those). At the time, I did not manage to get the driver merged; a couple of years back, LaCie manufactured a NAS using a compatible chip, and two of their engineers got my original driver (further extended) merged into the Linux kernel. Since then I only submitted one other patch to add another ID for a compatible chip, because someone managed to send me a datasheet, and I could match it to the one I originally used to implement the driver as having the same behaviour.

Back to the original topic, the IT7236 chip is clearly not a Super I/O controller. It’s also not an Embedded Controller (EC) chip, as I know that series is actually IT85xx, which is what my old laptop had. Somewhat luckily though, a “Preliminary Specifications” datasheet for that exact chip is available online from a company that appears to distribute electronic components in the general sense. I’m not sure if that was intentional or not, but having the datasheet is always handy of course.

According to these specifications, the IT7236xFN chips are “Touch ASIC Cap Button Controllers”. And indeed, ITE lists them as such. Comparing this with a different model in the same series shows that LED driving was probably not their original target, but they came to be useful for that. These chips also include an MCU based on an 8051 core, similarly to their EC solution. This makes them, and in particular the datasheet I found earlier, a bit more interesting to me. Unfortunately the datasheet is clearly the abridged version, and does not include a programming interface description.

Up to this point this tells us exactly one thing only: my driver is completely useless for this chip, as it implements specifically the Super I/O bus access, and it’s unlikely to be extensible to this series of chips. So a new driver is needed and some reverse engineering is likely to be required. The user who wrote me also gave me two other ITE chip names found on the board they have: IT87920 and IT8686 (which appears to be a PWM fan controller; I couldn’t find it on the ITE website at all). Since the it87 (hwmon) driver is still developed out-of-kernel on GitHub, I checked and found an issue that appears to describe a common situation for gaming motherboards: the fans are not controlled with the usual Super I/O chip, but with a separate (more accurate?) one, and that suggests that the LEDs are indeed controlled by another separate chip, which makes sense. The user ran strings on the UEFI/BIOS image and did indeed find modules named after IT8790 and IT7236 (and IT8728 for whatever reason), to confirm this.
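The strings trick is easy to repeat on any firmware image. Here is a minimal sketch of the same search, assuming chip names look like “IT” followed by four or five digits and that they may be stored either as plain ASCII or as UTF-16LE; the module names in the sample blob are invented.

```python
import re

# Assumption: ITE chip names are "IT" followed by four or five digits.
ASCII_NAME = re.compile(rb"IT\d{4,5}")
# The same pattern when every character is followed by a NUL byte (UTF-16LE).
WIDE_NAME = re.compile(rb"I\x00T\x00(?:\d\x00){4,5}")


def find_ite_names(blob: bytes) -> set[str]:
    """Collect every ITE-looking chip name found in a binary blob."""
    names = {m.group().decode("ascii") for m in ASCII_NAME.finditer(blob)}
    names |= {m.group().decode("utf-16-le") for m in WIDE_NAME.finditer(blob)}
    return names


# Invented sample: one ASCII module name, one UTF-16LE one, plus padding.
sample = (
    b"\x00\x01IT8728SmmDxe\xff"
    + "IT7236Dxe".encode("utf-16-le")
    + b"\x00\x00padding"
)
print(sorted(find_ite_names(sample)))
```

For a real image you would pass the file contents instead, for example `find_ite_names(open(path, "rb").read())`. Expect false positives, since any “IT” followed by digits will match.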

None of this brings us any closer to supporting it though, so let’s take a look at the datasheet: we can see that the device has an I²C bus, instead of the LPC (or ISA) bus used by the Super I/O and the fan controller. Which meant looking at i2cdev and lsi2c. Unfortunately the output can only show that there are things connected to the bus, but not what they are.
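To make the limitation concrete, here is a minimal sketch that turns a bus scan into a list of responding addresses. It assumes output in the grid format printed by i2cdetect from i2c-tools, which may not be the exact tool used here, and the sample scan is invented rather than taken from the board in question.

```python
import re

# A data row starts with the two-digit hex row prefix followed by a colon.
ROW = re.compile(r"^([0-9a-f]{2}):(.*)$")


def responding_addresses(scan_output: str) -> list[int]:
    """Return the I2C addresses that answered, from an i2cdetect-style grid."""
    found = []
    for line in scan_output.splitlines():
        row = ROW.match(line.strip())
        if not row:
            continue  # the column header line has no row prefix
        for token in row.group(2).split():
            # "--" means no answer; "UU" (claimed by a kernel driver) is skipped too.
            if re.fullmatch(r"[0-9a-f]{2}", token):
                found.append(int(token, 16))
    return found


# Invented sample scan, not from the real board.
SAMPLE = """\
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- 46 -- -- -- -- -- -- -- -- --
50: 50 -- 52 -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --
"""

print([hex(address) for address in responding_addresses(SAMPLE)])
```

All you get back is a handful of numbers. Mapping an address to an actual chip still needs a datasheet, a driver, or a guess.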

This leaves us pretty much dry. Particularly me since I don’t have hardware access. So my suggestion has been to consider looking into the Windows driver and software (that I’m sure the motherboard manufacturer provides), and possibly figure out if they can run in a virtualized environment (qemu?) where I²C traffic can be inspected. But there may be simpler, more useful or more advanced tools to do most of this already, since I have not spent any time on this particular topic before. So if you know of any of them, feel free to leave a comment on the blog, and I’ll make sure to forward them to the concerned user (since I have not asked them if I can publish their name I’m not going to out them — they can, if they want, leave a comment with their name to be reached directly!).

Motherboard review: ASUS vs MSI

You may remember last year I bought a gamestation to play games at home (and that means running Windows on it). Last month, I had to make a relatively big change: replace the motherboard altogether. And since I have now managed to compare two motherboards of about the same generation, I thought I could give a bit of a comparative review of the two.

My original motherboard was an ASUS X99-S (which right now has an absolutely crazy price!) which I coupled with an Intel 5930K (which is not sold anymore). The motherboard on paper is great, SATA3, m.2 and so on, and it may actually be good if it’s not a broken one, but mine clearly was.

The first glitch I noticed, but did not pay enough attention to, was related to the USB 3 ports. While all the ports worked fine, I never managed to install the ASMedia drivers, even though the ASMedia controller was meant to be backing some of the ports, and SysRescCD was actually seeing them fine. This bothered me for a while when I had performance issues on one of my devices, but otherwise it seemed ok.

The second problem was tricky to pin down: I could not tell whether it was always there or whether an update caused it. When I bought the Gamestation, the memory was expensive so I only got 32GB of it. A few months later, I had some spare pocket money (well, I got some bonuses that I wanted to exchange for some gratification) and bought 32GB more. Stupidly, I don’t remember if I checked that it worked fine, I just trusted it. A few months later, while trying to do some big processing in Lightroom, I came to notice that Windows only saw half of the RAM. I thought it was a bad bank or something like that, but any combination of shuffling the RAM around would only have Windows seeing 32GB of it, even though CPU-Z would see all eight banks filled.

At that point, Nikolaj suggested it could be an ME problem, so I went on and re-flashed the BIOS from scratch with an SPI flash adapter, but that didn’t help. Re-seating the CPU also didn’t help. I was appalled, but it was not enough to replace the board just yet, so I put the extra RAM to the side and soldiered on. I was wrong.

Last November, literally the day after my birthday, I came back home from a trip and wanted to download some dozens of GB of pictures I took… and my computer wouldn’t boot. The bootcode showed the system blocked in a CSM (Compatibility Support Module) failure. Trying all the permutations of things to change helped nothing, so it was either the motherboard or the CPU. I took a bet on the motherboard given the previous history, and ordered an MSI X99 SLI Plus while I was in the US, where it was significantly cheaper than in Europe.

My hunch was right and indeed, the new motherboard solved the problem. The specs of the two are about the same. There is actually the same ASMedia USB controller, though this time the drivers install correctly, all the RAM is actually seen by the system now, and of course the computer boots. But this is just the very superficial look at it. There is something else.

Both ASUS and MSI provide software utilities for overclocking, as is expected for motherboards designed for the Haswell-E family of processors. But the approach the two take is significantly different. ASUS appears to encode most of the logic in the software itself, with their “DIP5” core, while MSI appears to keep it in the firmware (which also appears to make the boot process a bit slower).

The ASUS utility pack is called “AISuite”, and the major version is tied to the board’s generation, version 3 for the X99 motherboards. While there has been at least one update since the time I bought the board, the last version released for the suite was on 2015-07-28. In addition to the overclocking UI, the suite includes a handful of other board-specific tools: one to set the bulk transfer mode sizes (to provide higher performance on USB3 non-UAS devices, not needed on Linux as the kernel does the right thing by default), one to allow faster charging of iPhone devices, and so on. Some of this is actually quite useful, for instance the faster USB transfer, although it also has the side effect of stopping the WD SmartWare tools from recognizing the drive, and so breaks your backups if you decided to use WD’s own tool rather than Microsoft’s.

On the other hand, an update for the DIP5 core was released on 2016-06-29, to support the new CPUs. Their 2011-3 socket is full-pin, which allowed them to support a further generation of CPUs with only firmware updates. This is effectively an update for the various drivers needed for the underlying overclocking system, as well as a complete overhaul of the Suite UI, which is likely due to actually applying a newer-generation Suite to the motherboard.

Unfortunately, the new Suite UI does not come with a new set of add-ons for charger, USB, etc. This would be okay, except the add-ons ABI changed: the moment you open the Suite app, you have to press Enter so many times, as it tries to fetch icon files that do not exist. Copying the old PNG files into the new path makes it stop throwing up these errors, but the UI clearly shows the wrong icons.

Oh and by the way, starting AISuite with a different motherboard causes Windows 10 to blue-screen. I know because after booting my gamestation with the new motherboard I was welcomed by the blue screen of death and I had a sinking feeling of dismay, expecting the CPU to be broken instead (turns out no, it was all the AISuite’s fault).

What about MSI’s app then? Well, their approach appears to be significantly different: first of all the overclocking app only has the overclocking function — they rely on ASMedia’s own tooling and drivers for the USB bulk transfer reconfiguration, and provide an optional tool for the charging options. In the spirit of not reimplementing stuff, they also don’t require any new Windows driver for this, requiring you to install the Intel ME drivers instead… which was fun because the copy I had installed from before the motherboard replacement was newer than the one MSI provides on their website.

And this makes the MSI utility more interesting: its last update was on 2016-12-06, and since they use the exact same package for all their boards, with no board-specific features and no drivers, updating it is significantly simpler for them.

The end result is that I’m fairly happy. MSI does not have the tons of crapware that ASUS appears to provide for their boards. They do come with a “Live Update” tool, which I wouldn’t trust, even though I have not tested it. Too many of those apps have forgotten to implement HTTPS, certificate validation or pinning, making them extremely risky to run, which is unfortunate.
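For those who have not met the term, pinning just means that the updater refuses to talk to a server whose certificate is not the one it already expects, even if the certificate is otherwise valid. Here is a minimal sketch of the idea, assuming a pin expressed as the SHA-256 of the whole DER-encoded certificate; it says nothing about how MSI’s tool actually works, and all function names are mine.

```python
import hashlib
import hmac
import socket
import ssl


def fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(der_cert: bytes, pinned_sha256_hex: str) -> bool:
    """Compare the certificate fingerprint against the pinned value."""
    return hmac.compare_digest(fingerprint(der_cert), pinned_sha256_hex.lower())


def fetch_verified_cert(host: str, port: int = 443) -> bytes:
    """Connect over TLS with normal validation and return the server certificate."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    with socket.create_connection((host, port), timeout=10) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert(binary_form=True)
```

An updater would call `fetch_verified_cert()` for its download host and only proceed if `matches_pin()` returns true for a fingerprint shipped with the application. Pinning the whole certificate is the simplest variant; it breaks whenever the server renews its certificate, which is why pinning the public key is often preferred.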

An aside: when you replace the motherboard of your computer, most systems that use computer authorization will consider it a new computer. That includes Microsoft’s own Windows 10 license handling, as the Windows 10 license is tied to an EFI variable, from what I remember.

Of all those systems, Microsoft’s was the easiest to deal with, though. The system booted as unactivated, and they do try to point you towards buying a new license, burying the right interface behind “Troubleshooting”, but once you say “I changed hardware recently”, it allows you to just replace the previous computer authorization with the current one.

Both Google Play Music and iTunes require authorizing an additional computer, and that becomes a problem if you are close to the limit (because then you may have to unauthorize them all and then re-authorize them). Stupid DRMs.