This is a bit of a strange post for the blog, but follows some of my previous attempts at writing down what I think is the solution to problems that are hard to find on Google, in an attempt to avoid DenverCoder9 Syndrome. So ride along with me for some troubleshooting answers — if you’re here to find the cause, and possible solutions, jump to “The Root Cause”!
The Setup: Nest Thermostat E
A couple of years ago, I bought two Nest Thermostat E thermostats for my mother: her house uses a heating system with two separate zones for upstairs and downstairs, and it has always been difficult to manage, particularly as the thermostat sensors themselves are placed in particularly annoying locations. Sounds familiar? Also, since at the time the heating was upgraded (from a diesel/wood combo that hadn’t seen a drop of diesel in years, to an LPG system; don’t get me started on that) we were penny-pinching (seriously, don’t get me started), we didn’t install a timer-based thermostat, but one with a physical three-way switch “day, off, night.” Not that a timer-based approach would do much good to my mother, who has an even less fixed schedule than me, as I spoke of before.
The physical switch is an annoyance at best in most cases, but with my mother getting older, and having a harder time walking up and down the stairs, it means that for a while she didn’t turn on the heating upstairs until she actually went to bed, being welcomed by a humid and cold bedroom: not ideal. Thus, a smartphone-controlled thermostat was a good idea, in a similar way to the smart lights that came later: she can turn on the heating just before getting ready for bed, and by the time she’s in the room, she has the temperature she wants.
Now, the reasons for choosing the Nest Thermostat E were multiple: first, at the time I was still working at Google so I got a bit of a discount, but most importantly it had the features that we wanted: the ability to separate the control bit (which Nest calls Heatlink) from the sensing bit (the actual thermostat) meant that instead of turning the heating on or off based on a single corner of the upstairs hallway, she’s actually controlling the temperature of her room (please see this Technology Connections video on thermostats for this phrase to make sense). And because hot water is a different story altogether in Italy, the Nest Thermostat E does not come with a water heater relay at all.
Also, at the time the Nest system was decent enough that you wouldn’t even need the phone to control it: you could use the Nest website instead, which my mother found a lot easier to use than the phone. Unfortunately I think all of those features have gone now that the whole control of it is over Google Home.
Yes, this does mean that on a very theoretical level, my mother would enjoy Home Assistant better than most other smart home solutions. But we’re not where we should be with low maintenance and long term support, so no, I don’t think it’s a good idea. It would definitely have been different if I still lived with her, in which case I would probably have used ESPHome-based relays and CGG1 sensors. Like I’m doing at home right now!
The Problem: Thermostats Disconnected, Can’t Be Setup
Over this past summer, my mother complained that the thermostats were disconnecting and she couldn’t control them on the phone anymore. Each thermostat could reach its matching Heatlink, so the heating was still working, but she couldn’t get it working on the phone anymore. Not the biggest of deals at first, particularly while it’s not that cold outside, but something to get sorted sooner or later. Once it became quite obvious I wouldn’t be able to visit her this year, my sister went to try to help her out.
Unfortunately, after spending some time resetting one of them to factory default and trying to follow the instructions to reconnect it… it kept not working. And this was worse, because once you factory reset it, you cannot use the thermostat at all until it’s connected back to the network and your Nest account! That’s a freaking big deal because it means my mother was left without heating at that point. She called me up for help.
The problem was, documentation is scarce. Error TD007 that comes up on the app is considered a wireless network connection problem in the official documentation (I’m not linking to it because Google keeps moving it around and it’ll be a dead link soon), and it suggests checking your WiFi password and Internet connection. Both were obviously not the source of the problem: my sister knew the password, and we were talking with a Portal, so Internet was connected.
With a special shout out to Tailscale (and not because some of my ex-colleagues work there!), it was trivial to access my mother’s router (an H388X leased to her by TIM, her ISP) from my laptop, and I could see that indeed, the thermostat would connect to the WiFi just fine, and even get an IP address! So that wouldn’t be the problem now, would it?
The Root Cause: Dishonest DNS
Well, turns out that a number of pages down the Google search results, I found that someone complained about the DNS servers of their ISP causing the problem for them.
See, over the past ten years and more, it became common practice for ISPs to run caching DNS servers that do not just return the true answer for queries, but also fake answers for domains that are not registered, or at least have no DNS records, in the hope of having you see a landing page with ads and referral links. This is common enough that a number of applications and devices explicitly try to resolve non-existent hostnames to figure out how those are presented to the user.
This was even more egregious before HTTPS became prevalent, and it was not unheard of for providers to rewrite the Google hostnames’ addresses so that they could inject ads, or at least track searches. Among these was OpenDNS (nowadays owned by Cisco), which was one of the earliest public, open-to-all DNS services with the ability to avoid captive portals and other malware, but which injected its own reverse proxies every time you visited Google. That has to be the main reason Google Public DNS was made public; after all, it is widely known as Honest DNS.
Well, turns out that most ISPs in Italy do the same. Not just for non-existent hostnames, but sometimes also for a number of existing ones that the Italian authorities declared illicit, requiring the ISP to stop you from browsing those pages. Think The Pirate Bay and worse. And some ISPs, including my mother’s, use this latter point to make it hard if not impossible for you to change the DNS!
We did confirm that the problem only existed with my mother’s connection at that point: my sister set up a hotspot on her phone, and tried the setup through that: it worked! So at that point it was either DNS or an ISP-level block, but I was ready to bet on the DNS, because I knew TIM uses a “dishonest DNS.”
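The kind of probe that applications run can be sketched in a few lines of Python. This is only an illustration, not how any particular product does it: the function names and the `nx-` prefix are made up for the example, and the check goes through whatever resolver the operating system is configured with.

```python
import socket
import uuid


def random_probe_name(suffix="com"):
    """Return a hostname that is almost certainly not registered."""
    return f"nx-{uuid.uuid4().hex}.{suffix}"


def resolver_fakes_answers(resolve=socket.gethostbyname, probes=3):
    """Return True if any made-up hostname resolves to an address.

    `resolve` is injectable so the logic can be exercised without a network.
    Note that a machine with no connectivity at all also raises gaierror,
    so a False result means "no fake answers seen", not "resolver is honest".
    """
    for _ in range(probes):
        try:
            resolve(random_probe_name())
        except socket.gaierror:
            # The resolver reported that the name does not exist.
            continue
        # A name we just invented came back with an address.
        return True
    return False
```

Calling `resolver_fakes_answers()` with no arguments uses the system resolver, which is the one a DHCP-configured device would end up with.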
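If you would rather test the bet than take it, one option is to send the same query for a made-up name to the router and to a public resolver, and compare the response codes. The sketch below is a hand-rolled, minimal DNS client written for illustration (all function names are mine), it only reads the response header, and it assumes plain UDP on port 53 is not being intercepted along the way.

```python
import socket
import struct

RCODE_NOERROR = 0
RCODE_NXDOMAIN = 3


def build_query(name, query_id=0x1234):
    """Encode a minimal DNS query for the A record of `name`."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    # Root label terminator, then QTYPE=1 (A) and QCLASS=1 (IN).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def parse_header(response):
    """Return (rcode, answer_count) from a raw DNS response."""
    _id, flags, _qd, answers, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return flags & 0x000F, answers


def ask(server, name, timeout=3.0):
    """Send one UDP query to `server` and return (rcode, answer_count)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        response, _ = sock.recvfrom(512)
    return parse_header(response)


def looks_hijacked(rcode, answer_count):
    """A made-up name should yield NXDOMAIN, not NOERROR with answers."""
    return rcode == RCODE_NOERROR and answer_count > 0
```

For example, `looks_hijacked(*ask("192.168.1.1", "surely-not-registered-name.com"))` against the router, compared with the same call against `8.8.8.8`, would show whether the two disagree about a name that should not exist.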
So indeed, TIM’s H388X with firmware version 1.2.0 (the latest at time of writing) is configured so you cannot change the DNS server addresses it provides to DHCP clients. When you look online for ideas on how to address this, you get a bunch of useless suggestions to change the addresses on the computers themselves — useless because in this case Nest Thermostat E does not have a way to do that. You need your DHCP server to provide valid DNS servers!
But, if you can do that, change the DNS servers your router provides away from your ISP’s! I still use Google Public DNS for this because I know enough to trust there’s no sordid ulterior motive for Google to provide them, but if you think that’s not the case, you can always use Cloudflare’s (just don’t use WARP unless you understand what that is meant to be doing).
This, by the way, is why a number of IoT devices refuse to take their DNS servers from DHCP or Router Advertisement. It’s usually not a devious ploy for sidestepping your Pi-hole, or other custom resolution services: it’s just a way to make sure your devices don’t stop working because the ISP’s DNS servers are returning garbage, and particularly not because the ISP’s leased router was updated to a new version that no longer lets you change the configured DNS!
My Solution: dnsmasq as DHCP Server
As I said, TIM’s H388X refuses to let you change the DNS server it advertises in DHCP responses. In my opinion, this is malice from the ISP, simply because they can. It’s worse because my mother is actually paying for the router, and even if I did replace the router with something I can control, it would likely mean losing the landline number, as I don’t think anyone has published a reverse-engineered configuration of their VoIP stack.
Fortunately, at least for now, the router still allows you to disable the DHCP server that it includes! And even more luckily, I already had a small network device attached at my mother’s (in fact, that’s how I have been using Tailscale to set up her network before!) So a couple of commands later, I had a working DHCP server running with dnsmasq, which is still my go-to solution sixteen years later.
In this case, though, I do not want dnsmasq to be my actual DNS server: I just need it to act as a DHCP server. Most importantly, I need it to advertise Google Public DNS as the DNS servers in its DHCP responses, and a different IP than itself as the default router, since the TIM-provided router is still the one connected to the outside network.
The configuration file that appeared to work for me is:
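# Do not read upstream servers from /etc/resolv.conf; use Google Public DNS.
no-resolv
server=8.8.8.8
server=8.8.4.4
# Only serve the wired interface.
interface=eth0
bind-interfaces
# Hand out .3 to .254 with a 12 hour lease.
dhcp-range=192.168.1.3,192.168.1.254,12h
# Default router is the TIM box, not this host.
dhcp-option=option:router,192.168.1.1
# DHCP option 6: DNS servers advertised to clients.
dhcp-option=6,8.8.8.8,8.8.4.4
dhcp-authoritative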
This sets it to only run for eth0 (to not conflict with whatever systemd does nowadays), sets a range within 192.168.1.0/24 which is how the router is configured (as 192.168.1.1), leaving 192.168.1.2 for itself, and assigning the rest of the addresses dynamically. Then it sets the options to tell the clients to use Google Public DNS for name resolution, and 192.168.1.1 as the default router.
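The address plan described above is easy to get subtly wrong (a pool that overlaps the router, for instance), so here is a small sanity check using Python’s standard `ipaddress` module. The addresses come from the configuration; the constant and function names are just for this sketch.

```python
import ipaddress

# Addresses taken from the configuration described above.
NETWORK = ipaddress.ip_network("192.168.1.0/24")
ROUTER = ipaddress.ip_address("192.168.1.1")
DHCP_HOST = ipaddress.ip_address("192.168.1.2")
POOL_START = ipaddress.ip_address("192.168.1.3")
POOL_END = ipaddress.ip_address("192.168.1.254")


def in_pool(address):
    """True if `address` falls inside the dynamic DHCP range."""
    return POOL_START <= address <= POOL_END


def plan_problems():
    """Return a list of human-readable problems; empty means the plan is sane."""
    problems = []
    for label, address in (("router", ROUTER), ("DHCP host", DHCP_HOST)):
        if address not in NETWORK:
            problems.append(f"{label} {address} is outside {NETWORK}")
        if in_pool(address):
            problems.append(f"{label} {address} overlaps the dynamic pool")
    for label, address in (("pool start", POOL_START), ("pool end", POOL_END)):
        if address not in NETWORK:
            problems.append(f"{label} {address} is outside {NETWORK}")
    if in_pool(NETWORK.broadcast_address):
        problems.append("the pool includes the broadcast address")
    return problems


# Number of addresses available for dynamic assignment.
POOL_SIZE = int(POOL_END) - int(POOL_START) + 1
```

With the values above, `plan_problems()` returns an empty list and `POOL_SIZE` is 252, which is plenty for a house with a handful of devices.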
Since I didn’t need the DNS server, I also changed the startup options of the process (in /etc/default/dnsmasq on the Ubuntu system I’m doing this on) to include -p0 that disables the DNS side altogether.
It’s easy to scoff at the fact that my mother’s heating depends on the Internet — but it really doesn’t. The setup depends on the Internet, and it’s indeed a bit awkward that there’s no way to say “Sorry I’ve got no working Internet right now, can you please just pair with your heatlink?” If we hadn’t factory reset the thermostat to attempt re-pairing them, they would have kept working just fine, just without mobile phone control, so no worse than the non-smart ones. Still even a bit better, because of the ability to move the thermostat where it’s most relevant.
Instead, this is an example of how messy the life of a firmware developer can be. You can choose to ignore the DNS you’re given and provide your own, but then you risk bad press when this is discovered by a geek with enough skills to find out, but not enough wisdom to know why that’s being done. Or you go with what DHCP tells you, and hope that the DNS is not being borderline fraudulent.
Leasing routers without the ability to change DNS servers is horrible, in my opinion (do I ever learn?). I can see how they would rationalize it: law enforcement makes them do it, malware changes servers leaving their customers with nasty ads, and so on. But it’s still a horrible thing to do to power users, and the fact that even customers rationalize this to “Well, I can change it on the Windows or macOS settings” makes me shake my head.
Also as it turns out, DoH (DNS-over-HTTPS) means that a number of browsers (Chrome, Edge, Firefox), and a number of devices (Android) will gladly ignore the DNS settings, either by default or if you ask them. And that hopefully means that ISPs will get to the point where it stops being profitable to run their own dishonest DNS servers.
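To make it concrete why DoH sidesteps the router’s settings: the lookup is just an HTTPS request to a resolver of the client’s choosing. The sketch below uses the JSON interface that Google Public DNS exposes at `https://dns.google/resolve`; browsers typically speak the binary wire format instead, so treat this as an illustration of the idea rather than what they actually send. The function names are my own.

```python
import json
import urllib.parse
import urllib.request

DOH_ENDPOINT = "https://dns.google/resolve"


def doh_url(name, record_type="A"):
    """Build the lookup URL for the JSON interface."""
    query = urllib.parse.urlencode({"name": name, "type": record_type})
    return f"{DOH_ENDPOINT}?{query}"


def addresses_from_json(payload):
    """Extract A-record addresses from a JSON response body."""
    reply = json.loads(payload)
    # Status 0 is NOERROR; anything else (such as 3, NXDOMAIN) has no answers.
    if reply.get("Status") != 0:
        return []
    # A records have numeric type 1; skip CNAMEs and other record types.
    return [a["data"] for a in reply.get("Answer", []) if a.get("type") == 1]


def resolve_over_https(name, timeout=5.0):
    """Resolve `name` without touching the DNS servers handed out by DHCP."""
    with urllib.request.urlopen(doh_url(name), timeout=timeout) as response:
        return addresses_from_json(response.read())
```

There is one catch worth noting: the hostname of the DoH endpoint itself still has to be resolved somehow, which is why clients often ship with the resolver’s IP addresses built in.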
I was intercepting device/browser DoH (port 5453) so I could use providers of my own choice. Suddenly last week this prevented most (but not all!) of my Nest devices (thermostats and doorbells) from connecting to the Nest service and thus being picked up by the app. Once I shut this off, connections worked again, but I wouldn’t have figured it out without your post. Thanks! It’s annoying that they basically do not document their devices in a complete and transparent way, and that they put DNS completely beyond user configuration 🙁