Wednesday, 24 December 2008

spitfall

Implementing GTK+-widgets and other GObjects in C requires quite a bit of boilerplate code - that's hardly news. One obvious way to deal with that is to use a different programming language. If you're into C++, I can recommend the excellent GtkMM C++-bindings for GTK+. Programming GtkMM feels very natural and follows the C++-idioms; it's easy to integrate with std:: and friends. Also, it's LGPL and pure C++.

Another option is Vala. If you haven't heard about it, Vala is a programming language in its own right, with similarities to C#, but specifically designed for use with GObject. One very interesting thing about Vala is that it compiles to plain C-with-GObjects (as an intermediate step). Thus, you write in Vala, no 'libvala' is needed, and the resulting code is just as fast as handwritten C. Vala also has bindings for many other libraries, which can make them easier to use than from plain C. Using Vala, writing GObject/GTK+-based applications becomes a lot easier. See the Vala Overview.

Finally, my truly low-tech solution is spuug. Spuug is a little GObject code-generator that I wrote in 2006 to learn some Ruby, and to save myself some time. And boy, has it saved me some time! Now, finally, there is a new version. The credit for this goes mostly to Viktor Nagy (many thanks!), who submitted some patches.

spuug usage is quite easy; for example:


$ spuug --class=FunkyFooBar --namespace=Funky --parent=GtkWidget

will generate funky-foobar.c and funky-foobar.h with 150 lines of boilerplate code, as a starting point for some FunkyFooBar-widget.

Of course, spuug works well for Maemo-code, and I know of a number of programs that are using it.

There are of course some disadvantages to using code-generators. But the advantage of spuug is that it doesn't require you to learn any new language. Also, after using it, you're not depending on spuug - the output is perfectly readable C code.

Tuesday, 2 December 2008

the song remains the same


So, after three years I finally made a new version of ttb, my teletekst viewer, which is especially interesting for Dutch-speakers and linguistically inclined people studying West-Germanic languages. The new version brings user-help and some cosmetic updates.

The program is listed as the 'official' client for Linux by the NOS (state television), and I'm getting quite a few e-mails about it -- but interestingly, not a single bug report in three years. To be honest, there is a bug remaining: there is too much bad news in the news section. I am working on that one, but it might take a while.

I am also preparing a Maemo-version. Interestingly, I had a version running on a 770 in early 2005 at LinuxTag, but I never got to packaging it. Anyway, the work has to wait until after my trip to a friend's wedding in the Eternal City of Rome, where I'll be flying.

As if all of that were not enough, I started a blog with tips for emacs-users; the idea is to have frequent small posts that show one useful trick: Emacs-Fu. Let's see if I succeed.

Wednesday, 26 November 2008

it's so easy


Sometimes, I like to use mathematical notation in webpages, either to impress people or simply for decoration. One way to do that is MathML, which is an XML-based markup language for mathematical notation. However, many browsers do not support MathML at all, or require you to download plugins and/or special fonts. Another problem with MathML is that XML is a really inconvenient format to edit by hand. Practically, you'll need some kind of formula editor.

tex vs mathml


As an old-schooler, I prefer to use the math-notation invented for TeX instead - it is short and sweet and powerful. Donald Knuth invented the whole TeX language because he was unhappy with the quality of typesetting of mathematics, and it is widely used in both computer science and mathematics. Anyway, I'm sure many people remember the 'abc-formula' to calculate the roots of a quadratic function:


In the TeX-sublanguage for math, one can specify the formula as follows:

-b \pm \sqrt{b^2 - 4ac} \over 2a

The corresponding MathML is no fewer than 20 lines; see the example in Wikipedia. Clearly, MathML is not designed for hand-editing. There are some editors available, but hand-editing TeX is much faster (at least for me); and, as mentioned, even if you have the MathML, many browsers will not show it correctly.

So what I'd like is a way to (i) use TeX-notation and (ii) have it display correctly in any (graphical) browser. One way to do that is to use LaTeX to process and render the formulae, and convert the result to a PNG-image. In 2004, I wrote a little tool called WebTeX to create small images from TeX-formulae. It was nothing too fancy; you enter an <img ...>-element with a description of some formula, and the little tool would turn it into an image, using LaTeX and ImageMagick. I don't maintain that old tool anymore - it was time for something new. Therefore...

texdrive


This weekend, I wrote a new maths-in-webpages tool using emacs-lisp. The emacs-integration makes adding formulae to html-pages really easy. For example, if I want to include the famous Bayes' Theorem, I simply type:

M-x texdrive-insert-formula
Formula: $P(A|B) = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\overline{A})P(\overline{A})}$
Title: bayes-theorem

Et voilà; the following is inserted:

<img src="bayes-theorem.png" title="bayes-theorem"
class="texdrive-formula" name="$P(A|B) = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\overline{A})P(\overline{A})}$"
border="0">

Now, all we need to do is texdrive-generate-images-from-html, and the corresponding image will be generated:


So, for immediate download: texdrive.el. It works pretty well for me; please let me know if you have any problems or are missing something. In some cases, the formulae are not as sharp as they could be; I hope I'll be able to improve it with some tweaking. Anyway, it's nice to see how one can solve problems by glueing together some existing open-source tools. Standing on the shoulders of giants...

Note that some wiki-software, notably Wikipedia's MediaWiki, uses a similar approach.

Tuesday, 11 November 2008

the test that stumped them all

Most of us are not Donald Knuth, and indeed need to test our software. That is even true for my hobby projects - when I offer software for use by others, it's a matter of craftsmanship to deliver the best software possible. It's very hard to foresee all the possible environments (architecture, compiler, library version, ...) where my software might be run. But at least, I can minimize the number of programming errors by testing things as much as possible.

The trouble with testing, however, is that it is dead boring. I hate doing boring things -- life is just too short. So, I want to do my testing in the least boring way possible -- I'd like to be able to simply run:


$ make test

and have that go through all my test cases, and report any failures. The idea is that if it is so easy to run tests, you might actually do so, and make sure your software is working according to plan. When doing a release, it is so easy to forget something really obvious, for which you get embarrassing bug reports... Running some automated tests gives some peace of mind when doing a release.

gtest

Since 2.16, the GLib library offers a unit-testing framework called GTest (note, this is not to be confused with Google Test, sometimes also called GTest). GTest is not much different from, say, check, but it's part of GLib and integrates nicely with it. I have started to use it for mu, and I am quite happy with it. Here, I will not go into the details of actually writing test cases, but talk about how to integrate GTest with your code. For the best results, you'd probably want to integrate it with your build system. I am using autotools.

The overall setup is that for all my directories with code, there is a subdirectory tests/ which contains the test code. Those test cases are unit-tests, which test one function or a couple of them combined. Now, of course it's a lot easier when your code is written in such a way that makes this easy[1]. In addition to the per-directory tests/, there is also a top-level tests/, which tests the whole software workflow. In the case of mu, this means that the tests will index some test messages, fill a database with that, and then run some test queries against this database. When all of that works correctly, I am quite confident that my software is not totally broken.

autotools

Now, let's discuss how you can integrate GTest with your code; this is inspired by the way GTK+ does it these days. First, here is gtest.mk, a file in the top of my source tree, that I include in all Makefile.ams that require GTest support:

TEST_PROGS=

test: all $(TEST_PROGS)
	@ test -z "$(TEST_PROGS)" || gtester -l --verbose $(TEST_PROGS); \
	test -z "$(SUBDIRS)" || \
	for subdir in $(SUBDIRS); do \
		test "$$subdir" = "." || \
		(cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $@ ) || exit $$? ; \
	done

.PHONY: test

This blob adds a test target to various Makefiles, which will run the gtester program (part of GTest) with your test programs.
In my configure.ac I have:

# g_test was introduced in glib 2.16
PKG_CHECK_MODULES(g_test,glib-2.0 >= 2.16,
[have_gtest=yes],[have_gtest=no])
AM_CONDITIONAL(MU_HAVE_GTEST, test "x$have_gtest" = "xyes")
if test "x$have_gtest" = "xno"; then
AC_MSG_WARN([You need GLIB version >= 2.16 to build the unit tests])
fi

With this, I make sure that my code also builds with older versions of GLib; the unit tests will only work with newer versions, of course. It also gives you a conditional MU_HAVE_GTEST that you can use in your Makefile.am; for example, in index/Makefile.am, I have:

include $(top_srcdir)/gtest.mk

SUBDIRS= .

if MU_HAVE_GTEST
SUBDIRS += tests
endif
[....]

As you can see, it includes gtest.mk mentioned above, and (conditionally) adds tests/ as a subdirectory to visit. The unit tests are in this subdirectory. Note that by explicitly setting SUBDIRS to '.' first, we ensure that we build the code in index before we go to tests/.

unit tests

Below is a simple example unit test program; it only uses a small subset of GTest. You can further organize your test cases (see GTestSuite and GTestCase) and use fixtures, which set up the testing environment. I don't use those, but they might be useful for others. Check out the GTest-documentation to find out more. Anyway, here are some simple test cases:

#include <glib.h>
#include "my-code-to-test.h"


static void
test_num_str (void)
{
    char *str;

    g_assert_cmpstr (str = my_num_str(1001),==,"one thousand and one");
    g_free (str);

    g_assert_cmpstr (str = my_num_str(-1),==,"minus one");
    g_free (str);
}


static void
test_warning (void)
{
    /* no complex roots: my_sqrt(-1) should
     * return MY_SQRT_ERROR and issue a g_warning; the
     * g_warning will trigger the process to fail,
     * which is what we're expecting */
    if (g_test_trap_fork (0, G_TEST_TRAP_SILENCE_STDERR))
        g_assert (my_sqrt (-1) == MY_SQRT_ERROR);

    g_test_trap_assert_failed ();
}



int
main (int argc, char *argv[])
{
    g_test_init (&argc, &argv, NULL);

    /* register each test function under its own test path */
    g_test_add_func ("/mytests/test-num-str", test_num_str);
    g_test_add_func ("/mytests/test-warning", test_warning);

    return g_test_run ();
}


Now, we can run our tests with:
$ make test

(Note that the test cases are fork()ed, and you can actually write a test case where it passes if an abort or even a segfault occurs.)
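
To illustrate the idea behind such 'expected crash' tests, here is a minimal sketch in plain POSIX C. It is not GTest's implementation, and the helper names (child_crashes, calls_abort, returns_fine) are made up for this example. The code under test runs in a forked child, and the parent only looks at how the child ended:

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of GLib): runs fn in a forked child.
 * Returns 1 if the child was killed by a signal (e.g. SIGABRT or
 * SIGSEGV), 0 if it exited normally, and -1 if fork/waitpid failed. */
static int
child_crashes (void (*fn) (void))
{
    int status;
    pid_t pid;

    assert (fn != NULL);

    pid = fork ();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        fn ();      /* child: run the code under test... */
        _exit (0);  /* ...and exit cleanly if it survived */
    }

    /* parent: wait for the child and inspect how it terminated */
    if (waitpid (pid, &status, 0) < 0)
        return -1;

    return WIFSIGNALED (status) ? 1 : 0;
}

/* Two sample functions: one aborts, one returns normally. */
static void calls_abort (void)  { abort (); }
static void returns_fine (void) { }
```

With this, child_crashes (calls_abort) reports a crash while the parent process keeps running, which is the property that lets a test "pass" on an abort.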

For mu-0.4 I get the following output:


[...]
make[1]: Entering directory `/home/djcb/src/mu-0.4/tests'
TEST: test-index-search... (pid=15553)
/all/test-query01: OK
/all/test-query02: OK
/all/test-query03: OK
/all/test-query04: OK
/all/test-query05: OK
/all/test-query06: OK
/all/test-query07: OK
/all/test-stats01: OK
PASS: test-index-search
make[1]: Leaving directory `/home/djcb/src/mu-0.4/tests'

Nice and easy; if you're less lucky, you might get something like:

make[1]: Entering directory `/home/djcb/src/mu-0.4/tests'
TEST: test-index-search... (pid=16024)
/all/test-query01: **
ERROR:test-index-search.c:117:query_01: assertion failed (mu_msg_sqlite_get_subject(row) == "this can't be right"): ("Re: What does 'run' do in cperl-mode?" == "this can't be right")
FAIL
GTester: last random seed: R02S2d24e3907b0c62e6a008e891f401fedf
/bin/bash: line 5: 16023 Terminated gtester --verbose test-index-search
make[1]: Leaving directory `/home/djcb/src/mu-0.4/tests'

With that, all we need to do is fix the bug and test again... rinse-lather-repeat. Using GTest, it's really easy to run test cases. In general I try to make sure my software passes the tests at the end of every programming session. Now, this does not work when I do big changes, but after stabilizing things again, I make sure all test cases pass, both old and new.

parting thoughts

One thing still missing from GTest is some way to see the code coverage, i.e. to see which parts of the code are covered by tests. I think it should be possible to do this using gcov, but it'd be nice if someone automated that a bit. Another issue is that for effective use, you will need something like the setup described here. One can hardly expect someone new to Unix-development to figure this out by themselves... but of course, we cannot really blame GTest for that.

Hopefully my setup helps a bit to set up non-boring testing (even though it might be a bit boring in itself...). There are real-life examples of this in both mu and GTK+. And finally, if you find any inaccuracies, please let me know -- there are no unit tests for blog entries to save me from mistakes...



[1] Now, a discussion of how to write easily testable functions deserves its own blog entry, but there are some general things to keep in mind. Keep your functions short, limit the number of parameters, avoid global variables, limit side-effects to only a few functions, etc. In other words, use the lessons learnt from functional programming languages. And as a nice side-effect (ha!), such functions tend to be much less error-prone in the first place.

Saturday, 1 November 2008

i dream in infra red


I released mu 0.4 (my e-mail indexing/search tool), and as always, I try to learn things from it.

One of the main problems with writing correct and maintainable software is complexity. I am not talking about computational (big-O) complexity here - I am talking about code complexity, as a subjective measure for readability. Some people write very elegant and readable code, while others write code that is very hard to understand. It would be nice to have some objective measure.

cyclomatic complexity

While certainly not perfect, I found McCabe's Cyclomatic Complexity a useful tool for this. Thomas J. McCabe describes his method in his classic paper from 1976 as a metric of the flow graph of the program. I won't go into the details of the exact calculation here (it's straightforward though, read the paper) -- the bottom line is that the higher the complexity, the harder the code is to understand and to test. Indeed, it's not just about readability for humans: the complexity has a direct relation with the number of code paths, and consequently, the testability of the function. If complexity is high, you'll have an unholy number of code paths, which are impossible to fully test, and software quality will suffer.

Making sure your code is not too complex (according to this measure) simply means ensuring that there are not too many code-paths (really: decisions); i.e., split your code into short functions that do one thing, and do it well.
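
To make the counting a bit more concrete, here is a small sketch. The functions are made up for illustration, and the numbers in the comments are hand-counted with the common rule of thumb "decision points plus one"; a tool may count slightly differently. The first function does everything itself; the second version splits the checks into small helpers, so that no single function carries all the decisions:

```c
#include <assert.h>
#include <stddef.h>

/* Everything in one function: 5 decision points, complexity 6. */
static int
count_valid_big (const int *vals, size_t n, int lo, int hi)
{
    size_t i;
    int count = 0;

    assert (vals != NULL);
    for (i = 0; i < n; i++) {    /* decision 1 */
        if (vals[i] < lo)        /* decision 2 */
            continue;
        if (vals[i] > hi)        /* decision 3 */
            continue;
        if (vals[i] == 0)        /* decision 4 */
            continue;
        if (vals[i] % 2 != 0)    /* decision 5 */
            continue;
        count++;
    }
    return count;
}

/* 2 decision points, complexity 3 */
static int
in_range (int v, int lo, int hi)
{
    if (v < lo)
        return 0;
    if (v > hi)
        return 0;
    return 1;
}

/* 2 decision points, complexity 3 */
static int
is_nonzero_even (int v)
{
    if (v == 0)
        return 0;
    if (v % 2 != 0)
        return 0;
    return 1;
}

/* Same result as count_valid_big, but only 3 decision points here:
 * complexity 4, with the details pushed into the helpers. */
static int
count_valid_split (const int *vals, size_t n, int lo, int hi)
{
    size_t i;
    int count = 0;

    assert (vals != NULL);
    for (i = 0; i < n; i++) {
        if (!in_range (vals[i], lo, hi))
            continue;
        if (!is_nonzero_even (vals[i]))
            continue;
        count++;
    }
    return count;
}
```

The total number of decisions does not shrink by splitting, but each function can now be read and tested on its own.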

pmccabe

Now, how do we get the numbers to identify overly complex functions? Thankfully, we don't need to calculate anything by hand. There is the pmccabe package (debian/ubuntu) which does the work for us, for example:

$ pmccabe -fv prime.c
Modified McCabe Cyclomatic Complexity
|   Traditional McCabe Cyclomatic Complexity
|       |    # Statements in function
|       |        |   First line of function
|       |        |       |   # lines in function
|       |        |       |       |  filename(definition line number):function
|       |        |       |       |           |
6       6       18       4      26      prime.c(5): main
6       6       19       1      30      prime.c

An interesting example of complexity is the __strptime_internal in evolution-data-server/trunk/libedataserver/e-time-utils.c, which has complexity of 196(!). I am glad I do not have to maintain that one...

recommendation

What the maximum recommended cyclomatic complexity for a function should be is debatable - but many coding guidelines suggest a value of 10. If you go much beyond that, it's easy to see that the function gets very complex.

As always we should use guidelines with care. I can imagine some inherently complex algorithms that you nevertheless wouldn't like to split precisely *because* you want to keep things as understandable as possible. But those will be rare exceptions.

practical

Obviously, limiting cyclomatic complexity is not sufficient to create maintainable software; there are still many other opportunities for making your code hard to understand. Still, it does not hurt to at least keep this one aspect under control, especially as experience suggests there is a high correlation between function complexity and error density. Fortunately, it's usually not too hard to reduce the complexity: split big functions (carefully!) into smaller ones; logical units that do one thing, and do one thing well.

I made sure the new mu follows the <=10-rule. I found some extra targets for Makefiles quite useful for that:


cc10:
	@pmccabe `find -name '*.c'` | sort -nr | awk '($$1 > 10)'

cc20:
	@pmccabe `find -name '*.c'` | sort -nr | awk '($$1 > 20)'

Now, I can simply type make cc10 or make cc20 to get all the functions that violate the rule CC <= 10 or CC <= 20, respectively. Mu version 0.3 still contained a handful of functions that broke the rule, but I have now simplified them - splitting big functions up. In my projects, I have usually followed the rule to some extent, intuitively, but I definitely could have written better code if I had paid attention to the number earlier. There is of course a risk in changing working code just because of 'some number'; but in the long run I think it will really pay off.

Wednesday, 29 October 2008

a kind of magic


Today just a short tip: if you are using emacs and git, I can recommend magit.

Magit is a git-mode for emacs, which makes using git convenient and easy. Magit was created by my running mate Marius. It's under heavy development, but I have been a happy user for a while. There is even a user manual, which you actually don't need very much, as things work very much as you would expect.

If you are not using emacs, this might be a good reason to start.

Wednesday, 22 October 2008

seek & destroy

In my last entry I wrote a bit about optimizing my little project. One other significant optimization I found was inode-sorting, from an idea I got from some old postings on the mutt mailing list.

The idea is as follows: some file systems, in particular ext3, support hashed b-trees to speed-up lookups in large directories (paper). That's nice for finding particular files. However, as a side-effect, when you scan full directories (as mu does when indexing), you might get the entries back in a rather chaotic order. If you then try to open the files in that order, you suffer from long seek times, and consequently, bad performance.

The solution is to sort the dir entries by their inode (in ascending order), and then open the corresponding files in that order. This is what mu (mu-index) does by default, starting with version 0.3. You can turn it off with --tune-sort-inodes=0, but there is usually little need for that, as the overhead of sorting is negligible.
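
As a rough sketch of the technique (this is not mu's actual code; the struct and function names are invented for this example), one can collect the directory entries with readdir(), sort them by their d_ino field with qsort(), and only then start opening files:

```c
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical record for one directory entry; not mu's actual type. */
struct dir_item {
    ino_t  ino;
    char  *name;
};

/* qsort comparator: ascending inode number */
static int
cmp_ino (const void *a, const void *b)
{
    const struct dir_item *x = a;
    const struct dir_item *y = b;

    if (x->ino < y->ino) return -1;
    if (x->ino > y->ino) return 1;
    return 0;
}

static void
free_items (struct dir_item *items, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        free (items[i].name);
    free (items);
}

/* Reads all entries of 'path' (except "." and ".."), sorted by
 * ascending inode. Returns the number of entries, or -1 on error;
 * the caller releases the array with free_items(). */
static long
read_dir_sorted (const char *path, struct dir_item **items_out)
{
    DIR *dir;
    struct dirent *de;
    struct dir_item *items = NULL;
    size_t n = 0, cap = 0;

    assert (path != NULL && items_out != NULL);

    dir = opendir (path);
    if (!dir)
        return -1;

    while ((de = readdir (dir)) != NULL) {
        size_t len;

        if (strcmp (de->d_name, ".") == 0 || strcmp (de->d_name, "..") == 0)
            continue;
        if (n == cap) {  /* grow the array when it is full */
            size_t ncap = cap ? cap * 2 : 64;
            struct dir_item *tmp = realloc (items, ncap * sizeof *tmp);
            if (!tmp)
                goto fail;
            items = tmp;
            cap = ncap;
        }
        len = strlen (de->d_name) + 1;
        items[n].name = malloc (len);
        if (!items[n].name)
            goto fail;
        memcpy (items[n].name, de->d_name, len);
        items[n].ino = de->d_ino;
        n++;
    }
    closedir (dir);

    if (n > 0)
        qsort (items, n, sizeof *items, cmp_ino);
    *items_out = items;
    return (long) n;

fail:
    closedir (dir);
    free_items (items, n);
    return -1;
}
```

The caller would then walk the returned array in order and open each file; the sort itself is cheap compared to the disk seeks it avoids.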

So, what difference does it make? Answer: it depends on how the files are laid out; if you already get your files back in their 'natural order', there won't be much difference - this is what happens on my main machine. But, on another (old) machine where the files are not in that order, the improvements are substantial: indexing 1500 messages took 25 seconds without inode-sorting and 15 seconds with it; a nice 40% improvement.

Note(1): this works for ext3 directories with dir_index enabled; there's a HOWTO. There are other file systems that have similar features, but I haven't tested those. Note(2): This optimization is not very useful for flash-based file systems, as they don't really care in what order you open files.

Saturday, 18 October 2008

chasing time


As discussed before, I am working on a little hobby project called mu, for indexing/searching e-mail messages in maildirs. As a true hobby project, it's about finding things out. I'll take notes as I go along.

indexing

One important part of indexing and searching is.... indexing. Indexing (in this context) is the operation of recursively going through a maildir, analyzing each message file, and storing the results in a database. In mu's case, there are actually two databases, one SQLite-database and one Xapian-database (a really interesting tool - to be discussed later).

Indexing may take a considerable amount of time; mu version 0.1 took 192 seconds (on average) to index 10000 messages in my testing corpus. And this version did not even support the Xapian database. Indexing involves reading from disk, querying the database to see if the message is already there, and if not, storing the message metadata. Because of this scheme, re-indexing of the same 10000 messages only takes about 5 seconds (with re-indexing, only modified/new messages need to be indexed).

The full indexing operation probably does not happen very often, for most people. Still, I think it's very worthwhile to try and make it faster. Nobody likes to wait for 192 seconds, even once - and during development, I need to do a full index rather often. Another important reason is that optimizing software is simply interesting - which is a main motivator for a hobby project.

So, let's see how we can make this a bit faster; here I'll only discuss some of the database-related optimizations.

transactions

As mentioned, mu stores the indexing data in two databases; one SQLite-database and one Xapian-database. Both of these databases know the concept of a transaction. By default, SQLite puts every query in a separate transaction. This is very safe, but also quite expensive. When indexing messages, there is no risk of data loss, so it's quite reasonable to increase the transaction size. And this makes things a lot faster. Between mu versions 0.1 and 0.2, I increased the default from one transaction per message (3 queries) to one transaction per 100 messages. This made indexing more than 2.5 times faster -- see the table below. This improvement is even more impressive when considering that I also added full-text search, indexing message bodies as well (this is what Xapian is for).

For Xapian transactions, the default value I chose is 1000 messages per transaction -- but the performance effects are much smaller. So, my 'optimal' values are 100 and 1000, respectively. I found that transactions bigger than that don't improve the performance very much, but of course still affect memory usage. You can tune these with --tune-sqlite-transaction-size and --tune-xapian-transaction-size. The defaults should be just fine for the normal desktop use case - still, if you need a less memory-hungry but slower version, that is possible too. See the mu-index(1) man page for details.

pragmatic

Another area for performance tuning is SQLite's PRAGMA-statements. Some useful ones are PRAGMA synchronous= (which you can influence with --tune-synchronous) and PRAGMA temp_store= (which you can tune with --tune-temp-store). Again, see the mu-index(1) man page for details.

It turns out that PRAGMA synchronous allows for some improvement. This setting determines whether SQLite does its writes in a synchronous way. Turning synchronous writes off is faster (and slightly less safe, but see the notes at the end of this blog entry). From the table below, it seems that PRAGMA temp_store does not make much difference in this case. This PRAGMA determines where SQLite stores temporary (non-committed) results. Some testing suggests this is because, when synchronous writing is not enabled (see above), even the 'file' temp_store never physically hits the disk, due to caching by the kernel.

results

Having optimization options tunable through command line options is really useful. Software optimization, especially from what you read online, seems to be a field full of myths, outdated 'facts' and placebo-effects. And even if the information is correct, it may not apply to your use case. The only thing you can do is measure it. And with command line-options I can easily do that, as well as see how various combinations of optimizations perform.

Here's a table with the results for indexing 10000 messages with version 0.3. Between all the runs, I used

# sync && echo 3 > /proc/sys/vm/drop_caches
to flush the caches. That's a critical step - the kernel caches a lot of data, which makes subsequent runs much faster if you don't flush the caches. And that is not what I wanted to measure.







msg/sqlite tx  msg/xapian tx  synchronous sqlite  temp store sqlite  time (s)  notes
1              1              full                file               1536
1              1              normal              default            182       similar to defaults for mu 0.1, but faster
100            1000           full                file               73
100            1000           no                  file               68
100            1000           no                  memory             68        default for mu 0.3
10000          10000          no                  memory             67

As an example, the default for mu version 0.3 is equivalent to:
./mu-index --tune-sqlite-transaction-size=100 --tune-xapian-transaction-size=1000  --tune-synchronous=0 --tune-temp-store=2 ~/data/testmaildir
Again, see the mu-index(1) manpage for details.

Note, these optimizations are a good strategy for indexing data, that is, generating data from data that is already safely stored somewhere else. If anything goes wrong, we can always restart the indexing later. However, if your database stores data that cannot easily be retrieved again afterwards (say, that one occurrence of the Higgs boson in your particle accelerator), you would want to be a bit more careful.


There are some more optimizations possible; some I have even implemented, such as inode-sorting, which is documented in the mu-index(1) man page. To be discussed some other time.

Sunday, 28 September 2008

it's all greek to me

It's been a while since my last blog entry... I haven't done much work on modest lately, but it is in safe hands. I did start a new little hobby project though; it's called mu, and it's a collection of command line tools to index / search e-mails stored in Maildirs. It doesn't run on N8x0 (yet), but I guess it wouldn't be very hard to port it. Of course, this kind of software has been written before - but for a hobby project, that does not really matter. It's all about trying things out.

I am taking notes about the things I learn as I go along... there's a lot of optimization stuff to discuss but unfortunately, it's too much to fit into this blog entry... will write about that later. I am off to Greece now -- to corrupt the youth of Athens. I hope I can understand the people; I taught myself a little bit, but rumours have it that the language has changed quite a bit in the last 2500 years...

And not to forget: happy birthday, GNU. 25 years... I may not always agree with RMS, but he deserves the greatest respect for his accomplishments. A George Bernard Shaw quote comes to mind:


"Reasonable people adapt themselves to the world. Unreasonable people attempt to adapt the world to themselves. All progress, therefore, depends on unreasonable people.".

Sunday, 18 May 2008

my name is nobody

It's great to see the improvements in the modest e-mail client. For most people, there should be little reason still to use the old email client. Great thanks to all involved -- my friends from Spain, Belgium, Germany and elsewhere; Vivek, Mox, and all users, contributors etc. It's good to mention contributors sometimes; I found the CNN Money-article about the N810 development team a bit off-balance in that respect - it would have been nice to include some people who write the software, too.

Anyway, back to modest. I'm sure that someone, somewhere is missing some feature that is essential to them. While usability and feature-richness are not necessarily conflicting, in practice they often are (I know, I'm a mutt-user!) But, no excuses -- in my (slightly biased) opinion, it's a nice little e-mail client and a great improvement. And with the code being open and free, there is nothing stopping people from firing up their favorite text editor and hacking on their missing pet feature.

My personal role in the modest-project will diminish a bit. I'll be slaying some new dragons - still Nokia, open source, yadayada; I'll write a bit more about that in the near future. I feel that modest will continue its life in trusted hands, and of course I'll keep an eye on that ;-)

Friday, 18 April 2008

the thing that should not be

Just a short note: due to an unfortunate regression, Modest (version: W16 release) does not work with SSL/TLS, breaking providers such as Gmail. See bug 3084. The reason was that the environment we tested with differs slightly from the Chinook environment, and so this one fell through the cracks. Mea culpa... Anyhow, the problem has been fixed. If you build things yourself, get the latest (tinymail and modest) and all will work fine. If you don't want to do that, you'll have to wait until Monday; unfortunately, we can't do anything before that.

Once more, apologies from the Modest team for the inconvenience.

Tuesday, 1 April 2008

images and words


These are interesting times... I just found out that the next revision of our internet tablet will have WiMAX-support. Rest assured - modest will support that as well.

Also, recently I have been studying the (very much recommended!) work of Edward Tufte, on the visualization of data, and how modern technology is great at obfuscating real meaning behind snappy graphs. Still, people are trying to generate meaningful (or sometimes just pretty) pictures out of masses of data. One such mass of data is email messages. Check out this great post on FlowingData, which shows many different visualizations of email data. For a more practical example, look at MailTrends, which analyzes the emails in your Gmail account for you.

et tu, emacs?


I was very happy to see prebuilt Emacs packages for Maemo. I wonder if my instructions are still valid, especially regarding key-bindings on the N810. Anyway, I'd be interested in the next steps in integrating Emacs with the platform. I'd like to connect the HW-zoom buttons to zooming the fonts in Emacs, and maybe marry the emacs-server setup with the application menu -- i.e., don't use new emacs instances for new files, but instead use new buffers in the existing instance. Now all I need is a little time.

Friday, 21 March 2008

recreation day


Photo from Evergrey-concert, yesterday 20.03. Excellent music from the Swedish rockers; I've known them for years, but this was the first time I saw them live. Great concert, very talented band, and they were nice enough to do an autograph session afterwards; I even got into a picture with the guys -- slightly embarrassing...

Time for an update... Our beloved modest e-mail client is humming along happily. We're putting a lot of energy into testing all kinds of use cases, as well as weird error conditions. Modest/tinymail contain quite a lot of code (in total around 240K lines), so there are a lot of things to test. Anyway, I was already quite happy with our first beta release, back in December. And modest has seen solid and consistent improvement since, every single week (with a few regressions thrown in to keep things interesting...).

Also, I have been quite happy with my emacs-on-N810. It has turned my N810 into a versatile PDA. I'm slowly capturing the power of org-mode in Emacs (see the 25 minute video), which is an amazing way to handle todo-lists, GTD and so on.

Then, there is so much happening in free software land, it's hard to keep track of it, even if just looking at the level of fundamental tools. Some things that I found quite interesting:


  • Dehydra/GCC is a plugin for gcc (JavaScript!) built within the context of the Mozilla project. Mozilla uses an object system called XPCOM, which is 'inspired' by Microsoft's COM. However, times have changed, and in many places in the huge Mozilla codebase, XPCOM is now seen as unnecessary bloat and complexity. For example, in COM-style, one uses the return value of a method for error checking; the 'real' return value comes as an outparam. However, in many cases (see DeCOMtamination), it's much better to use a normal return value, and use e.g. exceptions for error handling. Now, try to do that automatically, taking into account possible misuse of outparams -- sometimes, sed/awk/perl are just not enough. And that is where Dehydra comes in.
  • gold, the new & improved GNU linker. It's good to see that even classic tools like ld are still being improved -- and quite significantly in this case, esp. for speed. What we're still waiting for is link-time optimization, which can significantly speed up programs, e.g. by making sure the most used functions are in the same memory page.
  • quagmire (giggedigig!), finally an autotools-replacement project that actually seems capable of replacing them. The initial goal is to replace automake and libtool with a bunch of GNU make macros. By simply requiring GNU make instead of a 'normal' make, a lot of the autotools hackery disappears. Another nice thing is that it understands pkg-config, which simplifies another set of problems. Apparently, the longer term goal is to replace autoconf as well. And, given designer Tom Tromey's track record, the future looks bright.

Monday, 25 February 2008

come together


I've returned to Helsinki after visiting the FOSDEM-conference in Brussels. Before anything else, I'd like to thank and compliment the organizers for creating a great conference. And not just the organizers, but also all the volunteers that made it another great FOSDEM. The amount of work that goes into something like this cannot be overestimated, and it all went very smoothly. If anything, it was too successful, so many people...

Anyway, I had the chance to meet a great many old friends, as well as make new ones. It's fantastic to see all the free software projects that improve things all over the software stack. Kernel, console, X, web services, funky UI bling, end-user applications, embedded software,... So much combined brain power, pushing the envelope of free software.

I did a presentation (should be available soon) of our own little addition to that, the modest e-mail client. Although there was some delay (Murphy!), I was quite happy with my talk. And it was particularly interesting to talk to modest users - what do they like, what do they miss, and so on. Overall, we've been blessed with very helpful users, and with their assistance, we were able to kill quite a number of bugs which would have been very hard to fix otherwise. See our resolved buglist in our bugzilla for some great examples of that.

Anyway, overall FOSDEM made me quite happy -- such a gathering of smart people and great software, promising a lot of good things for the free software future.

Note, screenshot is of the unstable, proof-of-concept GNOME desktop-version of modest, courtesy of dape.

Thursday, 21 February 2008

heeding the call

This coming weekend, a big part of our multinational modest-team (Sergio, Berto, Philip and Dirk) will be at the FOSDEM conference in Brussels. There will even be a presentation about the modest e-mail client. So, if you have any questions, suggestions or even some constructive(!) criticism, this is your chance! You can use IM if you're looking for me (diggler[at]gmail.com).

Last year, I had a great time at FOSDEM, with many, many interesting people as well as the nice atmosphere (food and drinks) in Brussels. Hope to see many of you there!

Monday, 18 February 2008

waiting for 22


Achilles had Patroclus. Don Quixote had Sancho Panza. Michael Jackson has Bubbles. And I have emacs. On my N810. A while ago, I already wrote about it. I even showed a screenshot of emacs running in scratchbox. But I didn't take the final step - getting it to run on an actual N810. Recently, I tried to get that to work. Well, that was frustrating... Some hackish instructions follow. They may or may not work for you -- try at your own risk :)

  • First I tried to simply rebuild the Emacs23-packages in scratchbox; that failed, because the compilation somehow crashes QEMU;
  • Then I tried Emacs 22 instead... but the problem remained;
  • So, I decided that maybe I should try to compile the package on the N810 itself. Again, that failed. One of many problems: package building requires a real grep, and if you try to install it, it wants to remove the whole busybox environment; sigh. I fought the system - the system won...

But, all was not lost. There are prebuilt Debian packages of Emacs22 available; I took the armel-packages from there, and tried to install them. That almost worked. Almost, because the size of emacs is almost legendary. It did not fit on my root file system on the N810.

We're nearing the solution though; I copied the contents of the .debs to a directory emacs810 (with mc); then I copied this directory to the MMC-card of the N810 (/media/mmc2/). I set some symlinks, i.e.:


# ln -s /media/mmc2/emacs810/usr/share/emacs /usr/share/emacs
# ln -s /media/mmc2/emacs810/usr/share/emacs22 /usr/share/emacs22
# ln -s /media/mmc2/emacs810/usr/bin/emacs22-gtk /usr/bin/emacs22
# ln -s /media/mmc2/emacs810/usr/share/applications/emacs22.desktop /usr/share/applications/hildon/

Also, you'll need to install libungif4g. That should do the trick, and emacs should show up in your Extras-menu. And we can run emacs! Victory is mine!

Well, almost. In emacs, a very useful key is the Meta-key, usually mapped to the Alt-key of your keyboard. But of course, there's no alt-key on the N810-keyboard. Instead, I decided to remap the Chr-key. I'd like to remap it in my .emacs, but I haven't been able to do so.

Anyway, as a first start, I added these to /usr/share/X11/xkb/symbols/nokia_vndr/rx-44 (the xkb keyboard mapping):


key <SPCE> { [ space, space, Tab, space ] };
key <COMP> { [ Meta_L, Meta_L, Multi_key, Meta_L ] };

modifier_map Mod1 { Meta_L };

The first one will turn Fn-Spc into Tab, which is very useful for completion (for some reason, I couldn't get M-i working in the minibuffer). The second one will turn the Chr key into the M-key (obviously, you can't run emacs without that), with Fn-Chr giving the old Chr key. Not sure what it will break - it's black magic.

Ok, that's it. These steps should be cleaned up, pre-packaged and made single-click-available. Anyway, the steps above should hopefully get you a working handheld emacs. Happy hacking!

Sunday, 17 February 2008

be quick or be dead

Note, image has not much to do with this post, it's emacs22 running on my N810. More about that in my next post.


For a while, we've been doing semi-weekly updates to modest, usually on Fridays. And every week, we're fixing many issues, big and small. Modest has grown quite a bit, and modest + tinymail/camel has about 240K lines of code. That's quite a haystack for bugs to hide in, but I'm quite happy with the speed at which we've been able to squash them. The big fix of last week was adding support for maemo-launcher, due to heroic hacking efforts by dape. Maemo-launcher significantly improves startup speed. How does it do that? To answer that, let's look at what happens at application startup time.

still haven't found what i'm looking for


Few people have the time to write a UI-toolkit, or even printf(3), for every piece of software they develop. Thankfully, we can reuse libraries to do such things for us. The most common form is the dynamic library. With the ldd(1) utility, you can see which ones your application uses. For example, for modest:

[sbox-CHINOOK_X86: ~] > ldd /usr/local/bin/modest
linux-gate.so.1 => (0xffffe000)
libgtkhtml-3.8.so.15 => /usr/lib/libgtkhtml-3.8.so.15 (0xf7ef9000)
libtinymail-gnomevfs-1.0.so.0 => /usr/local/lib/libtinymail-gnomevfs-1.0.so.0 (0xf7ef6000)
libtinymail-maemo-1.0.so.0 => /usr/local/lib/libtinymail-maemo-1.0.so.0
.... (69 others) ....

Now, when we start modest, we must load these libraries. Suppose, somewhere in the code, we have:

magic_check = gtk_check_button_new_with_label ("Enable magic");

When starting the program, the dynamic linker will now have to figure out at what memory location gtk_check_button_new_with_label is to be found. And not just that function... if we look at modest, we can find the number of external functions (or more generally, symbols) with the nm(1) utility:

[sbox-CHINOOK_X86: ~] > nm -u /usr/local/bin/modest
(...)
U gtk_check_button_new_with_label
U gtk_check_menu_item_get_active
U gtk_check_menu_item_get_type
U gtk_check_menu_item_set_active
U gtk_clipboard_get
(...)

In total, there are almost 1300 external symbols in just modest; and this is only a fraction of the total, as GTK+ will have a lot of external symbols as well (think GLib, Pango, ...). Altogether, there will be many thousands. Without going into the details, it takes a significant amount of time to do the symbol lookup. Even on a fast desktop machine it can cause noticeable delays (esp. for C++), and more so on 770/N8x0.
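To make 'symbol lookup' a bit more concrete: a program can ask the dynamic linker to do the same name-to-address lookup by hand, with dlopen(3) and dlsym(3). The sketch below only illustrates the idea; the helper name symbol_is_resolvable is made up for this example, and it says nothing about how long the lookups take in modest. On older glibc versions you may have to link with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: ask the dynamic linker whether `name` can be
 * resolved in the running program or in the libraries loaded with it. */
bool symbol_is_resolvable(const char *name)
{
    assert(name != NULL);

    /* A NULL path gives a handle for the main program; lookups through
     * it also search the shared libraries loaded at program startup. */
    void *self = dlopen(NULL, RTLD_NOW);
    if (self == NULL)
        return false;

    /* This name-to-address lookup is the kind of work that has to be
     * done for the undefined ('U') symbols that nm -u lists. */
    void *address = dlsym(self, name);

    dlclose(self);
    return address != NULL;
}
```

On a typical Linux system symbol_is_resolvable("getpid") should be true, simply because libc is loaded at startup; a name that no loaded library exports gives false.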

Another factor that affects application startup significantly is initialization: in GTK+-based applications, e.g. gtk_init takes quite some time. In particular, applying the GTK-theme is slow on 770/N8x0, because the default theme file (gtkrc) is huge: the default theme on Ubuntu ("Human") contains 242 lines, but the default theme on my N810 has 7046 lines, and that on much slower hardware. And note that the theme is very picture-heavy, and many little images must be loaded. To get an idea of how much work must be done you can use strace(1) when starting an application... scary stuff.

And finally, another slowdown is the physical loading of all these libraries into memory. This will typically only happen the first time, as Linux will keep the data around as long as there is enough memory (note, for testing, you can force a flush with echo 3 > /proc/sys/vm/drop_caches).

i remember now, i remember how it started


What can we do about all this slowness? Enter maemo-launcher.

Maemo-launcher is a daemon that loads common libraries at startup and does a gtk_init for you, so the price for doing all this work (as explained above) is only paid once. Your actual application is compiled as a dynamic library. When your application is started, maemo-launcher forks, this dynamic library is loaded, and we jump straight to its main-function. This totally gets rid of the initializations, theme loading and such mentioned above, and saves quite some startup time.
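The fork-load-jump part can be sketched in a few lines of C. This is a much simplified illustration and not maemo-launcher's actual code: the real thing is a daemon that has already loaded the common libraries and run gtk_init before it forks, and it does not sit waiting for the application to finish. The function name launch_app and the library path you pass in are made up for this example, and the sketch assumes the application library exports a function called main. On older glibc versions, link with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*app_main_fn)(int, char **);

/* Hypothetical sketch: fork, load the application library in the child,
 * and call its "main". Returns the child's exit status, or -1 on error.
 * Exit status 127 means the library or its "main" could not be found. */
int launch_app(const char *lib_path, int argc, char **argv)
{
    assert(lib_path != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: everything the parent already loaded and initialized
         * is inherited here, so only the application itself is new. */
        void *handle = dlopen(lib_path, RTLD_NOW | RTLD_GLOBAL);
        if (handle == NULL) {
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
            _exit(127);
        }

        app_main_fn app_main = NULL;
        /* POSIX idiom for storing a dlsym result in a function pointer. */
        *(void **)(&app_main) = dlsym(handle, "main");
        if (app_main == NULL) {
            fprintf(stderr, "no main in %s\n", lib_path);
            _exit(127);
        }
        _exit(app_main(argc, argv));
    }

    /* Parent: wait here only to keep the sketch easy to follow. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The important design point is that the expensive work happens before the fork, in the long-running parent, so every child starts with it already done.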

How much improvement can you expect? Johannes Schmid did some testing, and found improvements of about 25% for small programs. Some non-scientific testing for modest shows that the startup-time is 1-2 seconds faster. Does that matter? Well, let's look at our 10 million ;-) modest users, all of them starting modest once a day. With a 1 second improvement, every day we save almost 4 months of time! Jokes aside, application startup times are very visible to users, and really determine whether they consider your software fast and snappy, or huge and slow; it's time well spent trying to improve that.
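For those who want to check the back-of-the-envelope number: 10 million seconds divided by 86400 seconds per day is about 116 days, so a bit under four months. As a tiny sketch (the function name is made up for this example, and the 10 million users are of course just as imaginary as above):

```c
#include <assert.h>

/* Total time saved per day, expressed in days, when `users` people each
 * start the application `starts_per_day` times and every start is
 * `seconds_saved` seconds faster. */
double days_saved_per_day(double users, double starts_per_day,
                          double seconds_saved)
{
    assert(users >= 0 && starts_per_day >= 0 && seconds_saved >= 0);

    const double seconds_per_day = 24.0 * 60.0 * 60.0; /* 86400 */
    return users * starts_per_day * seconds_saved / seconds_per_day;
}
```

With days_saved_per_day(1e7, 1.0, 1.0) this gives roughly 115.7 days.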

parting thoughts


For a very practical way of how to get maemo-launcher working with your program, see the Appendix of the MaemoMM-tutorial. It also shows how to add additional libraries to the set loaded and initialized by maemo-launcher. Recommended reading.

Note: nm as shown above does not work with stripped binaries (i.e. binaries from which the symbol table has been removed). And unless you compile everything yourself, most of the binaries on your system will be stripped. You can still get much of the information with objdump -R; however, it's not equivalent to nm -u, and will contain some non-external symbols.

Finally, this is the kind of post that could contain embarrassing factual errors :-) please check the comments for updates.

Saturday, 26 January 2008

battle against time


Like many people, I get a lot of e-mail every day, and I spend a very substantial part of my time processing it. In order to keep things under control, I'm following a number of practices, which I'll describe below. As with most good ideas, it's not my own idea -- it's inspired by a time-management method called Getting Things Done (GTD). More about that later.

inbox zero


So how do we regain control of our e-mail (instead of the other way around)?

The most important rule here is that we should always end the working day with an empty inbox. Fantastic as that may sound, it's really the key here. So, how do we do that?


  • At least once a day (and before leaving the office), review the messages in your inbox. For each message, decide what to do with it:

    1. If it requires some action:

      • if it takes less than two minutes: do it now;
      • if someone else can better handle it: delegate (forward) it;
      • otherwise, move it to a folder NextActions;

    2. If it might be interesting or important later, archive it;
    3. Otherwise, it's just crap - delete it.

  • After this review round, your inbox will be empty. No message stays behind - look at them, and decide where to move them.
  • Now, periodically check your NextActions, take the necessary actions, and move messages to your archive after completing them. (I've found that maybe only one-third of my incoming e-mail ends up in NextActions)
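For the programmers among us, the review round above boils down to a small decision function. The sketch below is just that, a sketch: the type and field names are invented for this example, and deciding whether something 'requires action' remains a job for the human.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    TRIAGE_DO_NOW,        /* takes less than two minutes        */
    TRIAGE_DELEGATE,      /* forward to someone better placed   */
    TRIAGE_NEXT_ACTIONS,  /* move to the NextActions folder     */
    TRIAGE_ARCHIVE,       /* might be interesting later         */
    TRIAGE_DELETE         /* everything else                    */
} triage_action;

/* What you decided about a message while reviewing it (hypothetical). */
typedef struct {
    bool requires_action;
    int  minutes_needed;
    bool better_handled_by_someone_else;
    bool maybe_useful_later;
} message_facts;

/* Apply the inbox-zero rules in the same order as the list above. */
triage_action triage(const message_facts *m)
{
    assert(m != NULL);

    if (m->requires_action) {
        if (m->minutes_needed < 2)
            return TRIAGE_DO_NOW;
        if (m->better_handled_by_someone_else)
            return TRIAGE_DELEGATE;
        return TRIAGE_NEXT_ACTIONS;
    }
    if (m->maybe_useful_later)
        return TRIAGE_ARCHIVE;
    return TRIAGE_DELETE;
}
```

Note that every path returns a destination other than 'leave it in the inbox', which is the whole point.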

That's all there is to it. Following these simple rules, it will be much easier to deal with lots of e-mail. If you have 250 e-mails in your inbox, some of which are weeks old, it's really hard not to forget something important; also, it costs a lot of time to scan those same messages again and again (I'd say handling a mailbox with n mails has at least O(n²) complexity :-).

Of course, this is just a starting point; you could add a WaitingFor-folder with a copy of mails you delegated (forwarded) for tracking. You could somehow add the calendar. You could add other to-do items. You could think of some smart way to manage those items in NextActions. And so on... All of those things are discussed in GTD.

gtd


Getting Things Done (GTD) is David Allen's method of time management, which deals with much more than just e-mail. The steps above are a sort-of GTD-light that works very well for my e-mail. Anyway, I can really recommend the GTD-book for anyone interested in time-management. It's a rare gem in the sea of 'personal-productivity' books. Read an overview on the 43 Folders-website, or their excellent article on inboxzero, which has lots of additional tips.

I've been using GTD (and especially these e-mail handling practices) for quite a while now, and it has really saved me a lot of time and frustration. I'm quite sure this will work for a lot of people; but of course, the only way to really find out is to try it out yourself. On the web, GTD enjoys an almost cult-like following - don't let that scare you away. Just be skeptical, and use what works for you.

epilogue: gtd on n8x0


You could implement quite a bit of this on your N8x0 and (surprise) modest, and I have done so quite successfully. What's still missing is some calendar integration, and an easy way to handle (search through) huge archives of messages. Anyway, being so mobile, the N810 with modest has become a great productivity tool for me.

One geeky alternative would be to use Emacs; people like Sacha Chua and others have written a lot about using Emacs for Getting Things Done. Now all we need is to port emacs to the N810 -- project maemacs to the rescue! Or maybe someone could make an N8x0 version of Chandler? (unfortunately, the book is still better than the program).

Saturday, 19 January 2008

kill 'em all


The last few weeks, we were suffering some instability in our beloved modest e-mail client. At specific times as well as randomly, modest would decide to call it a day, and prematurely return its pid to the rightful owner. Bug reports are not always enough to pinpoint the problem (which might be outside modest) and it's easy to be misled in some direction. We spent quite some time with the various tools at our disposal. All have their specific strengths and weaknesses, so we tend to use them combined. Let's discuss some of them.

gdb

First, there is the venerable gdb. Although armchair computer scientists sniff at the use of debuggers, down here in the trenches, it's often our last hope. Using gdb effectively takes time, and various extra difficulties (scratchbox, the ARM architecture) can make it a frustrating activity. And for people used to graphical debuggers (like the one in Visual Studio), gdb might seem a bit spartan, even when using something like GdbMode in Emacs. But once you've become friends with gdb, it's an incredibly powerful tool, which even works on your N8x0.

As a small tip, OS2008/Chinook provides the maemo-debug-scripts package, which (among others) offers native-gdb. I'm not sure what's so 'native' about it, but it provides gdb 6.6, which works much better than the apparently 'non-native' gdb 6.4, especially with threaded code. It's not clear to me why the ancient 6.4 is shipped in the first place, but there's probably a good reason. Read more about it here, which has a lot of very practical tips.

valgrind

Then we have valgrind (pronounced vel-grinned). Compared to gdb, which is like a brain surgeon, valgrind resembles a tax auditor (hurray!), with bytes in your RAM as the currency. Valgrind runs your program in a virtual machine, which offers replacements for the normal memory-management functions (malloc, free and friends). When running under valgrind, the application uses valgrind's replacements. And unlike the normal ones, valgrind's versions carefully check where you get your memory from, what you do with it, and whether you free it when you're done with it.
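The bookkeeping idea behind such replacement functions can be shown with a toy wrapper. To be clear, this is not how valgrind works internally: valgrind needs no changes to your source and tracks far more (invalid reads and writes, use of uninitialized memory, and so on). The names tracked_malloc, tracked_free and tracked_live_blocks are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of blocks handed out by tracked_malloc and not yet freed. */
static size_t live_blocks = 0;

/* Like malloc, but remembers that one more block is in use. */
void *tracked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        live_blocks++;
    return p;
}

/* Like free, but also updates the count; complains (via assert) when
 * asked to free more blocks than were ever handed out. */
void tracked_free(void *p)
{
    if (p == NULL)
        return;            /* free(NULL) is a harmless no-op */
    assert(live_blocks > 0);
    live_blocks--;
    free(p);
}

/* A non-zero value at program exit would indicate a leak. */
size_t tracked_live_blocks(void)
{
    return live_blocks;
}
```

A program that uses these wrappers everywhere and still has tracked_live_blocks() above zero when it exits has forgotten to free something; valgrind can tell you the same thing, plus where the block was allocated, without touching the code.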

It's an extremely useful tool for finding memory errors that occur during runtime. One weak point of valgrind is that it doesn't run on ARM; but still, it's a great way to find memory corruptions, leaks and so on, which will show up on X86 as well. Any kind of memory error found on X86 corresponds quite likely to a crash on ARM.

Note that I've only talked about the 'memcheck' tool inside valgrind; there are many more, such as cachegrind, massif and the new iogrind, which are great for profiling your code.

A relatively recent version of valgrind is available in the OS2008/Chinook repositories.

static checking

Apart from these runtime tools, there are some other ones, which work at the static, source-code level. There are things like Coverity and lint, but in my (limited) experience, they catch only a small number of problems that weren't also caught by gcc with -Wall -Werror + valgrind/gdb. Still, it's quite attractive to use any kind of bug prevention you can get your hands on. And let's not forget one of the most important tools: careful code review. Finding some critical part of code, and then simply reading it, letting the statements play out in your mind, and imagining the interactions. The human mind is the greatest debugging tool of all (and a pretty good bug-introduction tool too... )

Now, I hope you can find a bright mind somewhere yourself... Regarding gcc, version 3.4.4 is available in OS2008/Chinook, but modest can also be compiled (outside scratchbox) with gcc 4.2, using Ubuntu Embedded, or the rudimentary gnome-frontend. Compiling there, and on a 64-bit architecture, helped to fix some issues as well. And the newer gcc is much better at detecting problems with -Wall.

so far, so good... so what

Now, even with all these tools, I can promise that there are still some bugs left in modest. But also that there are quite a few fewer. So please get your latest update at the usual place (the application manager). If you find any kind of instability, please file bugs at the usual place, and please describe in as much detail as possible what you were doing -- thanks!

Sunday, 6 January 2008

keep on rockin' in the free world


Ok, I guess there's still time to wish people a happy new year. Apart from the obligatory sessions of eating, drinking and reconnecting with my inner-child, Christmas has been a great time to enrich the internal uranium, and I feel full of ²³⁵U again. I'm sure I'll need plenty of it in the new year. So, once more, best wishes to all, and let's make the world a better place in 2008.

One way to get there is of course to polish that raw diamond that we call modest. I've been quite happy with the progress we've made since our beta less than one month ago. We've closed quite a number of bugs, and made steady improvements in performance and the handling of specific emails and mail servers. And we've been making frequent releases, roughly every week. If you're using these weekly updates, you might not necessarily see so much difference between versions, depending of course on your particular use case. But believe me when I say that we are not sitting still :)

Anyhow, there are a couple of problems we're looking into now:


  • first, performance problems with really big folders (i.e., many thousands of mails). We're trying to come up with a solution, but it's not easy (as is the case with most interesting things in life). Please bear with us;
  • second, problems with specific servers. Here you can help us! We're testing modest with different POP/IMAP/SMTP servers. But, there are many more different ones in the world, with a wide variety of versions, configurations - a combinatorial explosion. If your server doesn't play nice with modest, please file bugs with all the details (server, version, configuration,...). Also, protocol traces or PCAP-files (tcpdump/wireshark) are very useful, as are test accounts. Remember, if we cannot reproduce it, we probably can't fix it. If there's information you don't want to share with bugs.maemo.org, you may also mail me directly. That does require you to trust me, though.
  • finally, we've seen some problems with rare emails not being shown correctly. Again, if you get such an email, please file a bug, and attach the email (after stripping it of any privacy-sensitive information of course).

Anyhow, for the large majority of users, modest seems to be working quite nicely; if you haven't tried it yet, I invite you to give modest a try, and tell us what you think.