This Time Self-Hosted

The frustration of debugging

I’m currently working on a replacement for bufferpool in lscube. Up to now the bufferpool library has provided two quite different functionalities to the lscube suite of software: an IPC method based on shared memory buffers, and a producer/consumer system for live or multicast streaming. Unfortunately, because of the way it was developed in the first place, it is really fragile, and it is probably one of the least understood parts of the code.

After trying to improve it, I decided that it’s easier to discard it and replace it altogether. Like many other parts, it has been written by many people at different times, and it shows. I think the one part that shows best that it was written with too little knowledge of programming details is that the actual payload area of the buffers is not only fixed-size, but of a size (2000 bytes) that does not fall into the usual aligned data copy. Additionally, while the name sounded quite generic, the implementation certainly wasn’t, since it kept some RTP-related details directly in the transparent object structures. Not good at all.
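To make the fixed-size payload complaint concrete, here is a minimal sketch of one common alternative: sizing each buffer to the data it carries with a C99 flexible array member. The names sketch_buffer and sketch_buffer_new are hypothetical, and this is not necessarily what the replacement code does.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type, not the lscube API: header and payload live in
 * one allocation, and the payload is exactly as large as needed. */
typedef struct {
    size_t size;           /* number of valid bytes in data[] */
    unsigned char data[];  /* C99 flexible array member */
} sketch_buffer;

/* Allocate a buffer holding a copy of size bytes from src.
 * Returns NULL if the allocation fails; release it with free(). */
sketch_buffer *sketch_buffer_new(const void *src, size_t size)
{
    sketch_buffer *b = malloc(sizeof *b + size);
    if (b == NULL)
        return NULL;
    b->size = size;
    if (size > 0)
        memcpy(b->data, src, size);
    return b;
}
```

Since the whole buffer is a single allocation, a plain free() releases both the header and the payload.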

At any rate, I decided it was time to replace the thing, so I started looking into it and designing a new interface. My idea was to build something generic enough to be reusable in other places, even in completely different software, but at the same time I didn’t feel like going all the way to the GObject model, since that was way too much for what we needed. I started thinking about a design with one, then many, asynchronous queues, and decided to try that road. But since I like thinking of myself as a decent developer, I started writing the documentation before the code. Writing down the documentation actually showed me that my approach would not have worked well at all; after a few iterations over just the documentation of how the thing was supposed to work, I got to one setup that looked promising, and started implementing it.
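For readers who have not met the pattern, an asynchronous queue between a producer and a consumer can be sketched roughly as follows. This is only an illustration built on plain POSIX threads, with hypothetical names (sketch_queue and friends); it is not the interface designed for lscube, and it covers a single queue rather than the one-then-many arrangement mentioned above.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names throughout: this is not the lscube API. */
typedef struct sketch_node {
    void *data;
    struct sketch_node *next;
} sketch_node;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
    sketch_node *head;  /* oldest element, next to be popped */
    sketch_node *tail;  /* newest element */
} sketch_queue;

void sketch_queue_init(sketch_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    q->head = q->tail = NULL;
}

/* Producer side: append an element and wake one waiting consumer.
 * Returns 0 on success, -1 if the allocation fails. */
int sketch_queue_push(sketch_queue *q, void *data)
{
    sketch_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->data = data;
    n->next = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Consumer side: block until an element is available, then remove it. */
void *sketch_queue_pop(sketch_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->head == NULL)
        pthread_cond_wait(&q->not_empty, &q->lock);
    sketch_node *n = q->head;
    q->head = n->next;
    if (q->head == NULL)
        q->tail = NULL;
    pthread_mutex_unlock(&q->lock);

    void *data = n->data;
    free(n);
    return data;
}
```

The producer thread only ever calls sketch_queue_push() and the consumer only sketch_queue_pop(); the while loop around pthread_cond_wait() guards against spurious wakeups.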

Unfortunately, right after implementing it and replacing the old code with the new one, the thing wasn’t working; I’m still not sure why, but I’ll go back to that in a moment. One other thing I would like to say, though, is that after writing the code, and deciding that I might have overlooked something in terms of implementation, I simply had to look again at the documentation I wrote, and search for “todo” markup inside the source code (thanks Doxygen!), to implement what I didn’t implement the first time around (but had decided beforehand was needed). So as a suggestion to everybody: keep documenting as you write code, it is good practice.
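As an illustration of the “todo” markup, Doxygen collects every @todo (or \todo) command into a generated to-do list, so unfinished work stays visible next to the documented behaviour. The function below is a made-up example, not code from lscube.

```c
#include <stddef.h>

/**
 * @brief Count the consumers currently attached (hypothetical helper).
 *
 * @param consumers Array of consumer handles; unused slots are NULL.
 * @param len       Number of entries in @p consumers.
 *
 * @return The number of non-NULL entries.
 *
 * @todo Take the lock protecting the array before walking it.
 */
size_t sketch_count_consumers(void *const *consumers, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        if (consumers[i] != NULL)
            count++;
    return count;
}
```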

But right now I’m still unsure of what the problem is. It would be quite easy to find if I could watch the code as it executes, but it seems like the GNU debugger (gdb) is not willing to cooperate today. I start feng inside it, set a breakpoint on the consumer hook-up, and launch it, but when it actually stops, info threads shows me nothing, although at that point there are, for sure, at least three threads: the producer (the file parser), the consumer (the rtsp session) and the main loop. The funny thing is that the problem is certainly in my system, because the exact same source code works fine for Luca. I’ll have to use his system to debug, or set up another system purely for debugging purposes.

This is the second big problem with gdb today; the first happened when I wanted to try gwibber (as provided by Jürgen). Somehow it’s making the gnome-keyring-daemon crash, but if I try to hook gdb to that and break on the abort() call (the problem is likely an assertion that fails), it’s gdb itself that crashes, preventing me from reading a backtrace.

I have to say, I’m not really happy with the debugging facilities on my system today, not at all. I could try valgrind, but last time I used it, it failed because my version of glib is using some SSE4 instruction it didn’t know about (for that reason I use a 9999 version of valgrind, and yet it doesn’t usually work either). I’m afraid that in the end I’ll have to rely on adding debug output directly to the bufferqueue code and hope to catch what the problem is.
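The debug-output fallback usually boils down to a small tracing helper. A minimal sketch follows, assuming nothing about the real bufferqueue code: sketch_debug_format and SKETCH_DEBUG are hypothetical names, and the macro simply tags each message with the file and line it came from before writing it to stderr.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: write "file:line: message" into buf.
 * Returns the length the full text needs, or a negative value on a
 * formatting error; output is truncated if buf is too small. */
int sketch_debug_format(char *buf, size_t len, const char *file, int line,
                        const char *fmt, ...)
{
    int used = snprintf(buf, len, "%s:%d: ", file, line);
    if (used < 0 || (size_t)used >= len)
        return used;

    va_list ap;
    va_start(ap, fmt);
    int rest = vsnprintf(buf + used, len - (size_t)used, fmt, ap);
    va_end(ap);
    return rest < 0 ? rest : used + rest;
}

/* Print a tagged message to stderr, for example:
 *     SKETCH_DEBUG("consumer %p attached", (void *)consumer); */
#define SKETCH_DEBUG(...)                                          \
    do {                                                           \
        char sketch_line_[256];                                    \
        sketch_debug_format(sketch_line_, sizeof sketch_line_,     \
                            __FILE__, __LINE__, __VA_ARGS__);      \
        fprintf(stderr, "%s\n", sketch_line_);                     \
    } while (0)
```

Sprinkling such calls around the producer and consumer paths gives a rough timeline of what each side did, which is often enough when the debugger refuses to show the threads.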

Current mood: frustrated.

Comments 2
  1. There is normally no need to break at abort(). GDB just stops when an assertion fails.

  2. Yes, upon receiving SIGABRT. Unfortunately I had some bad experiences with gdb and SIGABRT, and I’ve gotten used to just breaking on abort() to be able to inspect the status of the process at the moment of the call.

