Saturday 19 January 2008

kill 'em all


The last few weeks, we have been suffering from some instability in our beloved modest e-mail client. At specific times, as well as randomly, modest would decide to call it a day and prematurely return its pid to the rightful owner. Bug reports are not always enough to pinpoint the problem (which might be outside modest), and it's easy to be led in the wrong direction. We spent quite some time with the various tools at our disposal. Each has its specific strengths and weaknesses, so we tend to use them in combination. Let's discuss some of them.

gdb

First, there is the venerable gdb. Although armchair computer scientists snuff at the use of debuggers, down here in the trenches it's often our last hope. Using gdb effectively takes time, and various extra difficulties (scratchbox, the ARM architecture) can make it a frustrating activity. And for people used to graphical debuggers (like the one in Visual Studio), gdb might seem a bit spartan, even when using something like GdbMode in Emacs. But once you've become friends with gdb, it's an incredibly powerful tool, which even works on your N8x0.

As a small tip: OS2008/Chinook provides the maemo-debug-scripts package, which (among others) offers native-gdb. I'm not sure what's so 'native' about it, but it provides gdb 6.6, which works much better than the apparently 'non-native' gdb 6.4, especially with threaded code. It's not clear to me why the ancient 6.4 is shipped in the first place, but there's probably a good reason. Read more about it here; that page has a lot of very practical tips.

valgrind

Then we have valgrind (pronounced val-grinned). Compared to gdb, which is like a brain surgeon, valgrind resembles a tax auditor (hurray!), with bytes in your RAM as the currency. Valgrind runs your program in a virtual machine, which offers replacements for the normal memory-management functions (malloc, free and friends). When running under valgrind, the application uses valgrind's replacements. And unlike the normal ones, valgrind's versions carefully check where you get your memory from, what you do with it, and whether you free it when you're done with it.
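Valgrind's real machinery is far more involved than a malloc wrapper: memcheck runs the program on a synthetic CPU and keeps shadow state for every byte, so it also catches invalid reads and writes and uses of uninitialised values. The bookkeeping part of the "tax audit" on malloc and free can still be sketched in a few lines, though. This is a toy illustration only; `checked_malloc`, `checked_free` and `report_leaks` are hypothetical names, not part of valgrind or any library.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy bookkeeping for heap blocks handed out by checked_malloc(). */
#define MAX_BLOCKS 64

struct block {
    void  *ptr;
    size_t size;
};

static struct block live[MAX_BLOCKS];   /* currently allocated blocks */
static int n_live = 0;
static int n_errors = 0;                /* invalid frees seen so far */

static void *checked_malloc(size_t size)
{
    if (n_live == MAX_BLOCKS)
        return NULL;                    /* table full: keep the sketch simple */
    void *p = malloc(size);
    if (p != NULL) {
        live[n_live].ptr = p;
        live[n_live].size = size;
        n_live++;
    }
    return p;
}

static void checked_free(void *p)
{
    if (p == NULL)
        return;                         /* free(NULL) is a no-op, as in libc */
    for (int i = 0; i < n_live; i++) {
        if (live[i].ptr == p) {
            free(p);
            live[i] = live[--n_live];   /* drop entry: move the last one here */
            return;
        }
    }
    /* Not a live block from checked_malloc(): report instead of crashing. */
    fprintf(stderr, "invalid free: %p\n", p);
    n_errors++;
}

/* Print every block that is still live and return the total leaked size. */
static size_t report_leaks(void)
{
    size_t leaked = 0;
    for (int i = 0; i < n_live; i++) {
        fprintf(stderr, "leak: %zu bytes at %p\n", live[i].size, live[i].ptr);
        leaked += live[i].size;
    }
    return leaked;
}
```

Passing the address of a stack variable to `checked_free()` bumps `n_errors` instead of corrupting the heap, and any block that is never freed shows up in `report_leaks()`. Valgrind gets the same effect without any source changes, by substituting its own malloc and free when your program runs under it.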

It's an extremely useful tool for finding memory errors that occur at runtime. One weak point of valgrind is that it doesn't run on ARM; still, it's a great way to find memory corruption, leaks and so on, since those bugs show up on x86 as well. Any memory error found on x86 quite likely corresponds to a crash on ARM.

Note that I've only talked about the 'memcheck' tool inside valgrind; there are many more, such as cachegrind, massif and the new iogrind, which are great for profiling your code.

A relatively recent version of valgrind is available in the OS2008/Chinook repositories.

static checking

Apart from these runtime tools, there are others that work at the static, source-code level. There are things like Coverity and lint, but in my (limited) experience, they catch only a small number of problems that weren't also caught by gcc with -Wall -Werror plus valgrind/gdb. Still, it's quite attractive to use any kind of bug prevention you can get your hands on. And let's not forget one of the most important tools: careful code review. Find some critical part of the code and simply read it, letting the statements play out in your mind and imagining the interactions. The human mind is the greatest debugging tool of all (and a pretty good bug-introduction tool too...).
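Reading code pays off especially on error paths, which a normal test run hardly ever exercises, so valgrind never gets to see them. Here is a hypothetical illustration (the names `dup_str`, `release`, `make_message` and the `fail_second` switch are invented for this sketch, not taken from modest): a leak that only happens when the second allocation fails, plus the common single-exit `goto` cleanup pattern that avoids it.

```c
#include <stdlib.h>
#include <string.h>

static int live_buffers = 0;    /* number of buffers currently allocated */

/* Duplicate a string; `fail` simulates an out-of-memory condition so the
 * error path can actually be exercised. */
static char *dup_str(const char *s, int fail)
{
    if (fail)
        return NULL;
    char *p = malloc(strlen(s) + 1);
    if (p != NULL) {
        strcpy(p, s);
        live_buffers++;
    }
    return p;
}

static void release(char *p)
{
    if (p != NULL) {
        free(p);
        live_buffers--;
    }
}

/* The buggy shape that review catches and usage testing rarely does:
 *
 *     hdr = dup_str(h, 0);
 *     if (!hdr) return -1;
 *     body = dup_str(b, fail_second);
 *     if (!body) return -1;          (hdr leaks here)
 *
 * The version below funnels every exit through one cleanup label. */
static int make_message(const char *h, const char *b, int fail_second,
                        char **out_hdr, char **out_body)
{
    int ret = -1;
    char *hdr = NULL, *body = NULL;

    hdr = dup_str(h, 0);
    if (hdr == NULL)
        goto out;
    body = dup_str(b, fail_second);
    if (body == NULL)
        goto out;

    *out_hdr = hdr;             /* success: hand ownership to the caller */
    *out_body = body;
    hdr = body = NULL;
    ret = 0;
out:
    release(hdr);               /* no-op on success, frees on any error */
    release(body);
    return ret;
}
```

With `fail_second` set, the buggy shape would leave one buffer allocated; the cleanup-label version returns -1 with nothing left over.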

Now, I hope you can find a bright mind somewhere yourself... Regarding gcc: version 3.4.4 is available in OS2008/Chinook, but modest can also be compiled outside scratchbox with gcc 4.2, using Ubuntu Embedded or the rudimentary gnome-frontend. Compiling there, and on a 64-bit architecture, helped fix some issues as well. And the newer gcc detects considerably more problems with -Wall.
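Why would a 64-bit build find anything? On 32-bit targets like the N8x0, `int`, `long` and pointers are all 32 bits wide; on 64-bit Linux (LP64), pointers, `long` and `size_t` grow to 64 bits while `int` stays at 32. The sketch below is a hypothetical illustration of that class of bug, not the specific issues fixed in modest; the function names are invented for the example.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Code like this happens to work on a 32-bit target and breaks on LP64:
 *
 *     int handle = (int) ptr;         truncates the pointer; gcc warns
 *                                     "cast from pointer to integer of
 *                                     different size"
 *     printf("%d\n", strlen(s));     size_t passed where int is expected;
 *                                     -Wall (-Wformat) warns on 64-bit
 */

/* Portable round trip: uintptr_t can hold any object pointer
 * (optional in C99, but provided by glibc). */
static uintptr_t ptr_to_handle(void *p)
{
    return (uintptr_t) p;
}

static void *handle_to_ptr(uintptr_t h)
{
    return (void *) h;
}

/* %zu is the C99 conversion for size_t on every architecture. */
static int format_length(char *buf, size_t bufsize, const char *s)
{
    return snprintf(buf, bufsize, "%zu", strlen(s));
}
```

On a 32-bit build both commented-out lines compile quietly, which is exactly why building on x86-64 shakes them out.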

so far, so good... so what

Now, even with all these tools, I can promise that there are still some bugs left in modest. But I can also promise that there are quite a few fewer. So please get your latest update at the usual place (the application manager). If you run into any kind of instability, please file a bug at the usual place, and describe in as much detail as possible what you were doing. Thanks!

5 comments:

Anonymous said...

> the maemo-debug-scripts package, which (among others) offers native-gdb. I'm not sure what's so 'native' about it, but it provides gdb 6.6, which works much better than the apparently 'non-native' gdb 6.4

Native = the one installed to the Sbox target (as a maemo-debug-scripts dependency), non-native = the one provided by Sbox

maemo-debug-scripts is also available for OS2007, but with an older version of gdb, I think.

Misha said...

Great article, Dirk.

Anonymous said...

"Although armchair computer scientists snuff at the use of debuggers, down here in the trenches, it's often our last hope."

Computer Scientists often stress the importance of debugging tools. What are you talking about?

Riku Voipio said...

Great article! some minor points:

native-gdb is an ARM binary to debug ARM target binaries.

cross-gdb is a x86 binary that debugs arm target binaries.

cross-gdb is faster and doesn't require CPU transparency, but it doesn't support everything a native gdb does. It's the default merely for historic reasons, i.e. from when qemu did not work well enough to run gdb and people didn't want to bother setting up sbrsh just to debug.

As for static vs. runtime testing, valgrind/gdb will never catch bugs that happen not to be in the code paths exercised by normal usage testing. Since usage testing tends to exercise the same code paths over and over, the coverage of runtime testing might be surprisingly small. Bugs in error paths (where lots of bugs lurk!) are especially hard to notice using valgrind.

djcb said...

@anonymous: ah, thanks, that explains things.


@misha: thanks :)


@slippery-pete: in academia, some people really do put an emphasis on making sure the code is correct beforehand, so that debugging will be less of a necessity. Of course they're partially right, and debugging should be less necessary than it is today, using formal methods and such. But today, we still need gdb.

@riku: ah, thanks. Good point about static checking; indeed, many bugs hide in error paths where they are never tested. We always compile with -Wall -Werror for that reason. gcc 4.x is pretty good at that.