I’m currently reading Secure Programming with Static Analysis, which is already proving to be quite an interesting read, and I’ve got some ideas about the previously noted static analysis work that I want to publish, so that somebody can tell me whether the same idea has already been implemented (and if not, somebody might actually pick it up and start working on it).
Basically, I’m sure that any kind of analysis (including static analysis and indeed the tinderbox itself) will end up producing a huge amount of output, and skimming through it is going to be quite hard, especially because a big part of it is likely to be false positives. I know this already because I’ve looked through Coverity’s output for xine before, and it was hellish to go through the code and find the real issues; to the point that I think I found more security weaknesses on my own than through Coverity.
What I’m thinking of is something similar to Coverity’s interface for marking a problem as “not a problem”, but with a twist: instead of having a single person decide that something is not a problem, let’s take a page out of the community-driven websites’ book. Rather than relying on a single moderator, most community-driven websites nowadays have a scoring system: registered users (and sometimes non-registered ones as well, but let’s not go there for now) can add or subtract points on posts, comments and the like, to promote or hide them. The site then lets each user set a minimum threshold for what gets displayed, so that the “worse” stuff does not appear.
Doing the same for the analysis results would probably be a good idea too: you make it possible for users to help the review by marking a particular warning (coming from whatever source, the compiler or a static analysis tool) as unimportant (-1) or risky (+1). Now of course there is another matter: not all users are very good at finding bugs, especially the security-related problems that static analysis is mostly going to look for, so they might mark as unimportant something that is really a problem.
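As a rough illustration of what I have in mind, here’s a minimal sketch in Python (all names and values are mine, purely hypothetical) of aggregating +1/-1 votes on a warning and filtering by a per-user display threshold:

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    """A single warning from a compiler or static analysis tool."""
    tool: str
    message: str
    votes: list[int] = field(default_factory=list)  # +1 risky, -1 unimportant

    @property
    def score(self) -> int:
        return sum(self.votes)

def visible_findings(findings, threshold):
    """Keep only findings whose community score meets the user's threshold."""
    return [f for f in findings if f.score >= threshold]

w1 = Finding("coverity", "possible NULL dereference")
w1.votes += [+1, +1, -1]   # two users say risky, one says unimportant
w2 = Finding("gcc", "unused variable")
w2.votes += [-1, -1]

# A user who only wants to see net-positive warnings:
print([f.message for f in visible_findings([w1, w2], threshold=1)])
```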
So at this point you need one further value: a coefficient of knowledge for the users of the system. Let’s say that somebody is registered in the system as a security expert (like Alex, who actually gave me the title of this post): his opinion about the importance of a warning is likely more interesting than that of a Joe user who has never done security work before. On the other hand, rating all the various developers by hand would be taxing for the admins, so I’d probably foresee some categorisation like this (a sketch of how the categories could weight votes follows the list):
- normal users: they know the language so they can find possible problems;
- apprentices: they know the language, and they have done some work but nothing major;
- experts: they know the language and have done non-trivial contributions to a number of projects;
- security experts: manually accredited developers who work in particular on security areas.
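To make the categories concrete, a vote could simply be multiplied by a per-class weight, along these lines (the coefficient values here are invented for illustration and would need tuning):

```python
# Hypothetical weights per user class; actual values would need tuning.
COEFFICIENTS = {
    "normal": 1.0,
    "apprentice": 2.0,
    "expert": 4.0,
    "security_expert": 8.0,
}

def weighted_score(votes):
    """votes: list of (user_class, vote) pairs, vote being +1 or -1."""
    return sum(COEFFICIENTS[cls] * vote for cls, vote in votes)

votes = [("normal", -1), ("normal", -1), ("security_expert", +1)]
print(weighted_score(votes))  # 6.0: one security expert outweighs two normal users
```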
To find out how many contributions a user has made, the easiest way would probably be to link the accounts to ohloh, which would finally make its contribution metrics quite useful.
There is then the need to gather skill data out of the running system. Taking security experts as the reference standard: if a user has been classifying warnings as valid or as false positives the same way the security experts did (that is, voting before the security experts expressed their opinion), their coefficient would increase, while making mistakes would decrease it, since their judgement would have proven unreliable.
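The feedback loop could be as simple as this sketch (the step size and bounds are arbitrary assumptions on my part): whenever a security expert finally rules on a warning, everyone who voted on it earlier gets their coefficient nudged up or down depending on whether they agreed.

```python
def update_coefficient(coefficient, user_vote, expert_vote,
                       step=0.1, lo=0.1, hi=8.0):
    """Nudge a user's coefficient after a security expert rules on a warning.

    Agreement with the expert raises the coefficient, disagreement lowers it;
    the result is clamped so nobody's weight grows or shrinks without bound.
    """
    if user_vote == expert_vote:
        coefficient += step
    else:
        coefficient -= step
    return max(lo, min(hi, coefficient))

c = 1.0
c = update_coefficient(c, user_vote=+1, expert_vote=+1)  # agreed: up to 1.1
c = update_coefficient(c, user_vote=-1, expert_vote=+1)  # missed a real bug: back to 1.0
print(c)
```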
Now this is probably a bit complex to implement, but if it doesn’t exist already and somebody feels like starting to work on it, I’d be happy to help gather the log data to feed it!
Besides a severity score, maybe a static analysis annotation should carry a tag that places it into a category of bugs or false positives. So for example, if a static analysis tool has a certain heuristic that is wrong, a human could annotate a potential bug not only with “-1 unimportant” but also with “bogus because this variable can never overflow”, a tag that can apply to many such sites. Real bugs could be tagged with “use after free”, “memory leak”, “lock inversion”, and so on. This would make scanning a list of potential bugs easier, might make verifying scores easier, and might be useful for improving the analysis tools themselves.
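Extending the earlier sketch, an annotation might then carry both a vote and an optional tag (the tag strings here are just the examples from above), so the classifications can be filtered and counted:

```python
from collections import Counter

# Hypothetical annotations: (vote, tag); tags are free-form but reusable.
annotations = [
    (-1, "bogus: variable can never overflow"),
    (+1, "use after free"),
    (+1, "memory leak"),
    (-1, "bogus: variable can never overflow"),
    (+1, "lock inversion"),
]

# Counting tags shows which heuristics misfire most often and which bug
# classes dominate, which is exactly the feedback a tool author wants.
print(Counter(tag for _, tag in annotations).most_common())
```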