Archives for: June 2006, 16

06/16/06

Regarding string duplication

This entry might bore most of you, upset some others, and could be useful for some. Sorry!!!

Last 3 days I've been reading (in my study pauses ;-)) on code optimalisation (both code speed and memory) and DSO pitfalls (pointers and some samples might follow later).

One of the things I read was a paper by Ulrich Drepper (Glibc guy), where one of the samples was about string duplication (strdup) optimalisation.

I looked into the Glib (no, thats not glibc) code to see how it was handled there, and found a simple, straight-forward, (unoptimized) implementation.
So obviously, I wanted to find out if this could be somewhat optimized. So I started my precious gvim, and made a testcase, which does some (well, a lot of) different duplications, both using g_strdup and my own little implementation.

An overview of what the test code does:

  • Do one simple strdup using g_strdup, and free the result (so the test is "fair", think of relocation time)
  • Copy, print and free a compile-time constant string, a variable string (argv[0]), and an empty string (strlen printed here) to make sure the code works fine
  • In a loop of 10000000 iterations, duplicate a constant string, a variable one (again, argv[0]) and an empty one, and free them

Why always a constant string, a variable one and an empty one? Simply because my implementation tries to handle these cases in the most efficient way ;-)

Here's the result:

nicolas@marslander ~/tmp $ gcc -o my_strdup_test -O2 `pkg-config --cflags --libs glib-2.0` g_strdup_test.c
g_strdup_test.c:9:10: warning: #warning "Using own implementation"
nicolas@marslander ~/tmp $ gcc -o g_strdup_test -O2 `pkg-config --cflags --libs glib-2.0` g_strdup_test.c -DUSE_G_STRDUP
g_strdup_test.c:5:10: warning: #warning "Using g_strdup"
nicolas@marslander ~/tmp $ ./g_strdup_test > /dev/null && echo -e "g_strdup_test:\n==============" && time ./g_strdup_test && ./my_strdup_test > /dev/null && echo -e "my_strdup_test:\n===============" && time ./my_strdup_test
g_strdup_test:
==============
foo = foo
./g_strdup_test = ./g_strdup_test
0 = 0

real    0m9.438s
user    0m8.968s
sys     0m0.064s
my_strdup_test:
===============
foo = foo
./my_strdup_test = ./my_strdup_test
0 = 0

real    0m5.906s
user    0m5.627s
sys     0m0.033s

I know 'time' is not a very good benchmark, but I guess the result is fairly obvious.
This result is +- the average, g_strdup version ranging between 9.4 and 9.6 seconds, mine between 5.8 and 6.1 seconds. This is a 30-40% speed gain, not wasting more memory.

Obviously, not all duplications would become faster. Actually, only the ones of strings which are constant at compile-time. In some applications this can be a major amount of g_strdup calls though.

I should note this "enhancement" is only possible when GCC is used as compiler as it uses some GCC-specific extensions. This is not an issue though, as on other compilers the code won't become slower :-)

The code I used can be found here. I added comments to make some things easier to understand (I hope), but the code is not "clean". A check for GCC should be added etc, but hey, this is just a proof of concept.

I really like these rather low-level optimisation things. They allow a lot of applications to become more efficient without changing the actual app, and low-level code is fun.
I'll read some more about this, and try to blog about interesting papers/techniques/... too, got some things in mind, but lack time now (yeah, those exams).

Luckily the end is in sight, and so is GUADEC :-) Booked my plane some days ago, still awaiting one email about lodging stuff. I'm so glad and thankful I'll be able to be there. Great start of holidays ;-)

Enough now, more algebraics and geometry to learn...

Permalink . Ikke . 07:30:53 pm . 645 Words . Linux, Coding Corner . . 969 views . 3 comments