During my “staycation” week I decided to spend some time looking at my various Python projects, trying to make it easier, nicer and cleaner both for contributors and for myself. I also decided to spend some time to improve some of the modules I use, based on various lessons I learned in my own tools — and that got me to learn more about those tools.
First of all, thanks to Ben, glucometerutils has been fully re-formatted with black and isort already. And he set it up with pre-commit to make sure that new changes don’t break these formattings. This is awesome.
As I was recently discussing with Samuele on Twitter, it’s not that I always agree with the formatting choices in black. It’s that it takes away the subjective matter of agreement on formatting. Black does not always make the code look like the way I would make it look like, and I could argue that I can do a better job than black. But it’s also true that it makes everybody’s code look the same, which is actually a great way to fix it.
This is something I definitely picked up in my dayjob — I have been part of the Python Readability program (you can find more about it in the Building Secure and Reliable Systems book, downloadable for free), and through that I have reviewed literally hundreds of changes, coming from all different parts of Alphabet. While my early engagement had lots of comments on code formatting, the moment when Python formatting tools became more reliable and their usage became widespread, the “personal load” of doing those reviews went down significantly, as I could finally focus on pointing out overly complex functions, mismatching code and documentation, and so on. For everything else, my answer was unchanging: “let the formatter do its job, maybe just add a comma there as a suggestion to it.”
This is why I’m happy about black — not that I think its formatting is 100% what I would do. But because it gets close enough, and removes the whole argument or uncertainty around it. The same applies to isort, as well.
While applying pretty much the same set of presubmits and formatting to python-pcapng, I also found out flake8. This is another useful tool to reduce the work needed for reviews, and it also can be configured to run as part of the pre-commit hooks, making sure that violations are identified sooner rather than later. While the tool is designed to deal with styleguide violations, it also turned out to identify a few outright mistakes in glucometerutils. I’m now going to apply it throughout, whenever I can.
There’s more checks I would actually want to integrate — today I was going through all the source files in glucometerutils to update the type annotations, since I dropped Python 3.6 support. As I went to do that I realised that one of the files created in a pull request I approved recently was actually missing licensing information. I have now added both license and copyright annotations as suggested by Matija (who is an actual lawyer, unlike me) — but would love a pre-commit check that just ensured that all the files have a license, copyright, and that they have the expected license, for instance.
There’s a few more trivial checks available in pre-commit that I may actually enable throughout: checks for trailing whitespace, and missing newlines at end of files. All of those are easily fixed, and the fixers do exactly that, which is also a great way to make the tests easier on newcomers and contributors: you don’t just get told that “it’s wrong”, but also “let me fix that for you already”.
It’s not quite as encompassing as the bubble I’m used to, but it seems to be the closest I’m getting to it right now. Maybe I should just start building all the hooks that I feel I need, and see if someone else will adopt them afterwards. It seems to be a common thing to do after all.
Anyone who has written Gentoo ebuilds, by the way, have most likely recognized similarities with repoman, the tool used to validate ebuilds before submitting them to the tree. I think that’s possibly why I’m so interested in this. Because I do agree that tools like repoman are the way to go, and have insisted myself for repoman to be extended to cover more and more cases over time, as it would stop divegence.
I honestly hope to get to a point where there’s no argument made over a code change, on whether it complies to style or not — but rather leaving the enforcement (and the fixing) to computers, whenever it is possible. And that also means helping the computers making it possible by being less picky on things that can be overlooked.
I had an similarly interesting experience to this recently while I was trying to implement some code to broadcast state changes using condition variables in golang. I came across “Rethinking Classical Concurrency Patterns, Bryan C. Mills, August 28, 2018, https://github.com/golang/go/wiki/Go-Community-Slides” and once I’d grokked the pattern I found that I could intuatively think about if I needed events as data (slide 72, 102-103) or events as completions (slide 73, 104-105) — without having to worry about edge cases and if I had the synchronization primitives correct!
Some elements of style are easy to enforce by machine. Specifically, those that are decidable by “a machine”. If your style has things like “each Python function should declare and describe all its input arguments and describe its output (and type), and any exceptions it may raise; except for trivial and obvious functions”, and, yes, I have definitely seen a style guide where that (not exact phrasing) occurred. And had the inevitable “no, you are doing multiple assignments in this function, it is not trivial” argument.
But, these edge cases aside, machine-enforced typography (as it were) of code is a great boon, as it lessens the review burden as well as saves time (eventually…) for any reader of the uniformer code,.
There’s been a bit of a change in pace about that style guide. As a lot of documentation strings started to look like type annotations and, well, Python got type annotations already, so I started telling people to use that.
But yeah uniform code, and thus a presence of some convention (and validation of it) is definitely an improvement for everybody except those who say “but my code and I are special”…