Ikke's Blog

Post details: Regarding string duplication

Jun 16
Regarding string duplication

This entry might bore most of you, upset some others, and could be useful for some. Sorry!!!

Last 3 days I've been reading (in my study pauses ;-)) on code optimalisation (both code speed and memory) and DSO pitfalls (pointers and some samples might follow later).

One of the things I read was a paper by Ulrich Drepper (Glibc guy), where one of the samples was about string duplication (strdup) optimalisation.

I looked into the Glib (no, thats not glibc) code to see how it was handled there, and found a simple, straight-forward, (unoptimized) implementation.
So obviously, I wanted to find out if this could be somewhat optimized. So I started my precious gvim, and made a testcase, which does some (well, a lot of) different duplications, both using g_strdup and my own little implementation.

An overview of what the test code does:

  • Do one simple strdup using g_strdup, and free the result (so the test is "fair", think of relocation time)
  • Copy, print and free a compile-time constant string, a variable string (argv[0]), and an empty string (strlen printed here) to make sure the code works fine
  • In a loop of 10000000 iterations, duplicate a constant string, a variable one (again, argv[0]) and an empty one, and free them

Why always a constant string, a variable one and an empty one? Simply because my implementation tries to handle these cases in the most efficient way ;-)

Here's the result:

nicolas@marslander ~/tmp $ gcc -o my_strdup_test -O2 `pkg-config --cflags --libs glib-2.0` g_strdup_test.c
g_strdup_test.c:9:10: warning: #warning "Using own implementation"
nicolas@marslander ~/tmp $ gcc -o g_strdup_test -O2 `pkg-config --cflags --libs glib-2.0` g_strdup_test.c -DUSE_G_STRDUP
g_strdup_test.c:5:10: warning: #warning "Using g_strdup"
nicolas@marslander ~/tmp $ ./g_strdup_test > /dev/null && echo -e "g_strdup_test:\n==============" && time ./g_strdup_test && ./my_strdup_test > /dev/null && echo -e "my_strdup_test:\n===============" && time ./my_strdup_test
g_strdup_test:
==============
foo = foo
./g_strdup_test = ./g_strdup_test
0 = 0

real    0m9.438s
user    0m8.968s
sys     0m0.064s
my_strdup_test:
===============
foo = foo
./my_strdup_test = ./my_strdup_test
0 = 0

real    0m5.906s
user    0m5.627s
sys     0m0.033s

I know 'time' is not a very good benchmark, but I guess the result is fairly obvious.
This result is +- the average, g_strdup version ranging between 9.4 and 9.6 seconds, mine between 5.8 and 6.1 seconds. This is a 30-40% speed gain, not wasting more memory.

Obviously, not all duplications would become faster. Actually, only the ones of strings which are constant at compile-time. In some applications this can be a major amount of g_strdup calls though.

I should note this "enhancement" is only possible when GCC is used as compiler as it uses some GCC-specific extensions. This is not an issue though, as on other compilers the code won't become slower :-)

The code I used can be found here. I added comments to make some things easier to understand (I hope), but the code is not "clean". A check for GCC should be added etc, but hey, this is just a proof of concept.

I really like these rather low-level optimisation things. They allow a lot of applications to become more efficient without changing the actual app, and low-level code is fun.
I'll read some more about this, and try to blog about interesting papers/techniques/... too, got some things in mind, but lack time now (yeah, those exams).

Luckily the end is in sight, and so is GUADEC :-) Booked my plane some days ago, still awaiting one email about lodging stuff. I'm so glad and thankful I'll be able to be there. Great start of holidays ;-)

Enough now, more algebraics and geometry to learn...

Comments:

Comment from: Stu [Visitor] · http://insightdynamiks.com
Is it possible to post the link to that paper - Cheers
PermalinkPermalink 06/16/06 @ 20:08
Comment from: Sven [Visitor] · http://sven.gimp.org
Wouldn't it make sense to always test for the empty string and do the respective optimization, not only for constant strings?

Also, did you benchmark if optimizing for the empty string is actually an improvement?
PermalinkPermalink 06/17/06 @ 10:39
Comment from: anonymous [Visitor]
You should be posting your "optimization" code where it really belongs, on gtk-devel-list@gonme.org. Your code has a number of issues, a blog comment is not the proper place to discuss this type of thing though; but posting to the corresponding development forum will give you the badly needed review.
PermalinkPermalink 06/17/06 @ 13:22

Leave a comment:

Your email address will not be displayed on this site.
Your URL will be displayed.

Allowed XHTML tags: <p, ul, ol, li, dl, dt, dd, address, blockquote, ins, del, span, bdo, br, em, strong, dfn, code, samp, kdb, var, cite, abbr, acronym, q, sub, sup, tt, i, b, big, small>
(Line breaks become <br />)
(Set cookies for name, email and url)
(Allow users to contact you through a message form (your email will NOT be displayed.))

Categories

Who's Online?

  • Guest Users: 195

Misc

XML Feeds

What is RSS?