Ikke's Blog

Post details: Seam Carving: Content-aware image resizing

Sep 2
Seam Carving: Content-aware image resizing

Most people most likely saw the YouTube movie on content-aware image resizing which got blogged quite a lot lately. I read the corresponding paper, and wrote an implementation (not finished/perfect at all, but well) in Python. If it would ever become "production quality" a Gimp and/or GEGL plugin would be nice.

Here's a sample:
Original image

Resized using Gimp, cubic interpolation, 150px
Resized using Gimp

Resized image, 150px
Resized using seam carving

Overview of removed pixels
Removed pixels

This transformation is done in about 2 seconds (mainly because of some calculations in pure Python. For most calculations I use the Python Imaging Library and SciPy/NumPy, which are mainly C modules and much faster). As you can see the implementation still needs lots of love.

You can see another sample (image resized from 1000 to 250px in 8 seconds) here.

Git repository is here. Please email any patches!

The algorithm itself is surprisingly "simple" and easy to understand, great job by the researchers! More on that later. I should be studying mathematical analysis now, 2nd time I got to redo this exam, bloody university :-(

Using very expensive algorithm
Expensive algorithm
This image was generated by:

  • Loading the input image
  • 150 times:
    • Calculate energy and cost of current working picture
    • For every pixel in the top row, calculate the cost of the "best path" starting at this pixel
    • Figure out which path is the cheapest
    • Create an image which is the working image, minus this best path
    • Replace the working image with the image generated in the previous step

This took 273 seconds on my system, as the complexity is something like O(150*N*N*N*N*N*N*M) where M is the complexity of the gradient magnitude calculation.
Conclusion: not a workable solution :D
Do notice there are significant changes between this image and the one posted above. As I wrote this as a quick hack, I didn't include code to show which paths were removed from the original image.


Comment from: Daniele Medri [Visitor] Email · http://daniele.wordpress.com
Take a look: http://www.thegedanken.com/retarget/
PermalinkPermalink 09/02/07 @ 12:58
Comment from: Ikke [Member] · http://www.eikke.com
Major difference: closed source vs open source ;-)
PermalinkPermalink 09/02/07 @ 13:02
Comment from: John Drinkwater [Visitor] Email · http://johndrinkwater.name/
Broadway tower ftw :)
PermalinkPermalink 09/02/07 @ 13:28
Comment from: Ikke [Member] · http://www.eikke.com
Could be, got this from a "Creative Commons" search on Flickr :-) Thanks to the original author ;-)
PermalinkPermalink 09/02/07 @ 13:45
Comment from: dhon [Visitor] Email
Wonder what the hard lines are caused by?

Are you recalculating the colour gradient on each loop?

Are you using both horizontal and vertical gradients?
PermalinkPermalink 09/02/07 @ 13:52
Comment from: Ikke [Member] · http://www.eikke.com
Path choosing algorithm isn't very good yet. Colour gradients are only calculated once, both horizontal and vertical, and the gradient magnitude is calculated to do the edge detection (image of that is in the image I link to, the 2 towers). Gradients are calculated using convolution with the Sobel matrices.

Only, once gradients and costs are calculated, the mechanism to find out where to start a path just sucks (it's 2-line based now), this needs to be reworked.
PermalinkPermalink 09/02/07 @ 14:18
Comment from: dhon [Visitor] Email
Indeed running the program 200 times removing only 1 px each time over the castle image eliminates the hard lines in the grass/snow but takes chunks out of the castle.

I wonder if you could adjust the gradient of nearby pixels when you choose a path. This should be faster than recalculating the gradient on each iteration.

Eg if the gradient looks like
3 2 5 (choosing 2 as the path) then split the grad to the adjacent pixels gives
4 * 6
PermalinkPermalink 09/02/07 @ 14:59
Comment from: Ikke [Member] · http://www.eikke.com
Not gradient, gradient magnitude.
The problem is this magnitude is calculated using 9 pixels (the pixel itself and the 8 pixels around it) so I don't think you can just add it to the 2 next to it. I'll investigate this further later on, unless someone got great ideas/patches right now ;-)
PermalinkPermalink 09/02/07 @ 15:06
Comment from: Giacomo [Visitor] Email
I should be studying mathematical analysis now, 2nd time I got to redo this exam, bloody university :-(

D'oh. Same here (but third time I try this exam), then I can finally take my master of science in telecommunications engineering.
I hate non-applied mathematics.
PermalinkPermalink 09/02/07 @ 15:41
Comment from: Ikke [Member] · http://www.eikke.com
*Grin* 2nd time I have to redo it == 3rd time I do it ;-)

And yeah, non-applied maths suck most of the time. Like, I had a course on convolution and gradients once, and I never understood what it was all about. They used fluids in the examples etc.
Now I worked on this application thingy, got in contact with image manipulation, the energy functions (which is a pure application of gradients and convolution), and I finally understand what it's all about, and it's actually so easy to understand.
PermalinkPermalink 09/02/07 @ 15:44
Comment from: Giacomo [Visitor] Email
*Grin* 2nd time I have to redo it == 3rd time I do it ;-)

Ah, well... Same here, I understood convolutions and Fourier transform only after my Digital signals processing (mostly audio, 1-d signals) and Digital image processing courses. :)
With a more practical approach I arrived to "like" probability theory and Markov chains too, but my first contact with those things was a NIGHTMARE.
PermalinkPermalink 09/02/07 @ 15:54
Comment from: Ikke [Member] · http://www.eikke.com
Hehe, I did a "choice-subject" on sound, music and sound processing too this year, it was very understandable.
I did have classes on Fourier before too in math classes, which were kinda hard, but luckily I had some (practical) pre-knowledge as when I was 16-17 I used to write quite a lot of sound DSP code, including the use of the DFT(FFT). I didn't completely understand how it worked (although I understood the basics, thanks to a great article, "The DFT a pied", should be possible to find that somewhere on the net), but I knew what the result of the operation was, I could hear the result of playing with it (and doing an ifft afterwards, obviously) which made everything so much easier to understand.
PermalinkPermalink 09/02/07 @ 15:58
Comment from: Giacomo [Visitor] Email
That was a good thing, I had absolutely no previous knowledge of pretty much everything I studied at the university.

To tell the truth, I went to the Liceo Classico, which is a "humanistic" high school, I studied ancient greek and latin for 5 years, and I saw my first limits, derivatives and integrals in my first university course.

It was hard, really hard, until the first practical courses, but I had them only at the end of the second year.
PermalinkPermalink 09/02/07 @ 16:08
Comment from: Ikke [Member] · http://www.eikke.com
/me studied Latin 4 years long too, then switched to sciences - mathematics (8h of math each week), as I wanted to continue my studies in the sciences field. Luckily, in highschool maths an sciences were fun: easy to understand, easy to derive your own proofs when necessary, not necessary to learn them stupidly by heart, etc.
Somewhat different at uni :-/
PermalinkPermalink 09/02/07 @ 16:11
Comment from: Giacomo [Visitor] Email
Here in Italy you can't easily change (or at least you could not change easily, it seems there's a school reform every year so I'm not sure of how things are now): I started the Liceo Classico, I had to end it, so in my last year in the high school I had 2 hours a week of trigonometry, 3 hours a week of physics and stop.
I still don't understand how we could study physics without limits, derivatives and integrals.
PermalinkPermalink 09/02/07 @ 16:36
Comment from: tim [Visitor] Email
The DFT à Pied:
PermalinkPermalink 09/02/07 @ 19:42
Comment from: Ikke [Member] · http://www.eikke.com
That's the one. The author got some other very well written and interesting articles too. If you're interested in audio DSP, check them out!
PermalinkPermalink 09/02/07 @ 19:45
Comment from: Raphael Slinckx [Visitor] Email · http://raphael.slinckx.net
Dude ! That's amazing, i can't wait to be able to use this in gimp, you rock !
PermalinkPermalink 09/02/07 @ 21:58
Comment from: Ikke [Member] · http://www.eikke.com
Actually I just noticed there's a severe bug in the implementation, implementing 2 days after reading the paper is not a good idea :-)
PermalinkPermalink 09/02/07 @ 22:15
Comment from: qhartman [Visitor] Email
Keep up the good work!!! I would _love_ to see this implemented as a GIMP plugin!
PermalinkPermalink 09/02/07 @ 23:27
Comment from: CODE NOW [Visitor] Email
Show the code
PermalinkPermalink 09/03/07 @ 03:31
Comment from: Alexander Larsson [Visitor] Email
The obvious way to do this efficiently is to go backwards from the bottom and then keep all intermediate best-path and its cost for each pixel in the row.

Something like this:
1) For each pixel in the bottom row, its best path is empty and its cost is the energy of the pixel

2) For each row from the next-to-bottom and up to the top do: For each pixel in the row, the best path is the pixel plus the cheapest path of the three pixels below it.

3) When you're at the top you have the costs for all paths, pick the cheapest one and remove.

You never have to store more than two rows of costs and paths, and paths are not that large, as you just need 2 bits to store each pixel choice (since there are only three possible choices: sw, s, se).

Total time complexity per pixel column removed is:
O(w * h), total space complexity is O (h * w)
PermalinkPermalink 09/03/07 @ 08:07
Comment from: Calvin Spealman [Visitor] Email · http://www.ironfroggy.com/blog/
Still looking forward to getting involved, but boy does git suck! Hah, I kid. Sort of...

My wife is a photographer and is excited by using this in some of her digital work, when the weighting tool is implemented. That is what I'm mostly interested in. I'd probably end up branching off a bit, but what do you think about defining a format for the precalculated data?
PermalinkPermalink 09/03/07 @ 08:16
Comment from: Sean B. Palmer [Visitor] Email · http://inamidst.com/sbp/
I wrote (well, upgraded) a thing in C which does seam carving pretty fast and effectively, though it segfaults on some large images. Nice to see a Python implementation; hope you can optimise it somewhat!

Source here: http://inamidst.com/code/seamless.c
PermalinkPermalink 09/03/07 @ 09:18
Comment from: Roderik [Visitor] Email

I cannot seem to download the original paper from the authors website (above). Could you maybe email it to me?

Great work on the implementation. If I would be able to make a little more time I would have also tried implementing it. Keep up the good work.

BTW If you are interested in some other great implementations (this is 3D rendering related) look at this site: http://graphics.stanford.edu/~fedkiw/
PermalinkPermalink 09/03/07 @ 09:42
Comment from: Ikke [Member] · http://www.eikke.com
@CODE NOW: read before you yell

@Alexander: indeed, that's the original way I did it, although from top to bottom. Later changes changed this quite a lot (and not always "correct") as I forgot what I was actually doing.
Major issue is: once you delete one seam, everything needs to be recomputed.

@Calvin: once you're familiar with it it's ok :-)

@Sean: I use Python for prototyping, when the algorithm is working "reasonably well", writing a C version (non-segfaulting ;-)) is pretty straight-forward.

@Roderik: it's 20MB, kinda large to email... There are some torrents of it IIRC, Google is your friend :-)
PermalinkPermalink 09/03/07 @ 13:11
Comment from: Thomas [Visitor] Email
Fascinating algorithm indeed.

Got a partial implementation myself, written in C#/.NET. Currently does only carve vertical seams for downscaling in the horizontal direction. I have done no speed measurements, because downscaling happens in (as close as possible to) realtime, while you resize the window containing the image.

I think, if you want this to be efficient for resizing in both directions at the same time, for a useful GIMP plugin you'd have to switch to C. No idea how difficult to use the interface to GIMP is though.
PermalinkPermalink 09/03/07 @ 14:22
Comment from: Ikke [Member] · http://www.eikke.com
GIMP plugin will come when the python implementation is completely done :-) Shouldn't be that hard...
PermalinkPermalink 09/03/07 @ 15:03
Comment from: cwillu [Visitor] Email
Wonder how much effect psyco would have
PermalinkPermalink 09/03/07 @ 17:23
Comment from: Palmax [Visitor] Email
It can be a so good idea. Can you develop a GUI? I think that it will be so interesting for use this code.
PermalinkPermalink 09/03/07 @ 20:16
Comment from: Ray Strode [Visitor] Email
Another interesting idea (in addition to a gimp plugin) would be to offer a "Smart Scaling" option in the background tab of the appearance capplet.

Distros could pick default backgrounds that behave well with the algorithm, and then the whole widescreen versus 4:3 aspect ratio issue could be side stepped.
PermalinkPermalink 09/04/07 @ 03:45
Comment from: Alfred Milgrom [Visitor] Email
I think it is fascinating that in the first implementation the castle is shifted from a position in the original image of about 35% in from the left to a position 20% in from the left.

On the other hand the more expensive implementation moves the castle to about the middle of the image.

Maybe the algorithm needs to keep track of the position of the paths removed to try to maintain most of the original composition?

As well, rather than just removing the least-cost path in each iteration, maybe the pixels on either side of the removed path could be adjusted to take into consideration the removal of the missing path?
PermalinkPermalink 09/04/07 @ 08:35
Comment from: Ikke [Member] · http://www.eikke.com
@Ray: that might be easily possible when a library version (or GEGL) of this exist, I do not intend to work on that though.

@Alfred: It tries to maintain as much as possible automaticly. The problem with the first implementation is it doesn't take changes into account, ie: you calculate everything, then remove seams, whilst this is not correct as after removing one seam you got a new images and all calculations should be redone (as in the last example).
About pixels around the seam: in the first implementation I interpolated the colorspace when removing a seam, it'd be easy to add that to the later versions too, IIRC is't also done in the last version.
Current GIT uses the last algorithm (always recalculating before removing one seam) which is kinda slow, but Psyco actually does a good job.
PermalinkPermalink 09/04/07 @ 11:58
Comment from: Where is the source code? [Visitor] Email
You screamed at "code now" to read the page but I too do not see any source code.
PermalinkPermalink 09/04/07 @ 14:44
Comment from: Ikke [Member] · http://www.eikke.com
Zif: there's a link to the GIT repository containing all code (gitweb interface, clone URI is in there) in the post. I won't post source code in a blog post ;-)
PermalinkPermalink 09/04/07 @ 15:06
Comment from: Mariano [Visitor] Email
I have a C++ implementation of the algorithm which works out your image in less than 2 seconds.

A couple of suggestions:
1.- Use dynamic programing for the shortest path
2.- Don't compute the ENTIRE gradient every single time, you know what pixels you are changing, then change also the gradient!
3.- I'm using an idea for the energy function that preserves the global energy of the image as you treat it. Currently I'm testing with nice results 'retouching' the gradient by merging whatever energy was in the pixel to be eliminated into its two neighbours (50% each) give it a try.

I'm thinking on doing a plain C implementation to make a GIMP plugin out of it, since it might be faster than python.

If you want to see my code, mail me; it's working for shrinking in both dimensions (will start working on the expansion as soon as I get some free time).

PermalinkPermalink 09/05/07 @ 21:32
Comment from: carlobaldassi [Visitor] Email
I also have done an implementation, in C++ turned to C, and then tuned it into a GIMP plugin. It can be downloaded at the GIMP plugin registry http://registry.gimp.org under the name "Liquid rescale" in category "Transforms".

It installs under the Image menu. It scales in both directions. It uses the original algorithm as described in the paper, with the gradient from the 4 nearest neighbors. The code is rather complete, it mainily misses a good GUI, and some options to play with. And maybe an extension of the energy function to let it be more flexible.
PermalinkPermalink 09/06/07 @ 07:04
Comment from: Tim Wintle [Visitor] Email · http://www.timwintle.co.uk
I have been working on a python implementation too, and am suffering from the inefficiency of python too.

Once the seams are discovered it works really fast, though. I will post results on my blog when I have implemented a faster algorithm for seam creation.
PermalinkPermalink 09/08/07 @ 21:32
Comment from: Serge [Visitor] Email
I have created the experimental one, you may download it here: http://www.intuimage.com
PermalinkPermalink 09/11/07 @ 16:09
Comment from: Tim Wintle [Visitor] Email · http://www.timwintle.co.uk
Hi again, I am currently trying to get people who have written their own versions of this together to compare results on a standard set of images. If anyone reading is interested contact me via my blog.

PermalinkPermalink 09/13/07 @ 15:21
Comment from: Nick [Visitor] Email
Ditch Python. Seriously. It's like driving a Ferrari with the handbrake on.
PermalinkPermalink 09/18/07 @ 09:22
Comment from: Will [Visitor] Email · http://rsizr.com/
Check out rsizr.com for a Flash-based implementation of seam carving that lets you resize images, both in height and width simultaneously, in real time. (You can rescale and crop images too!) Can't beat the speed of a desktop app, but it can get the job done...
PermalinkPermalink 09/29/07 @ 06:31
Comment from: yaniv [Visitor] Email · http://yaniv.leviathanonline.com/
Check out http://yaniv.leviathanonline.com/blog/math/seam-carving/
PermalinkPermalink 10/25/07 @ 19:42
Comment from: Brain_ReCall [Visitor] Email · http://sourceforge.net/projects/c-a-i-r/
Unfortunetly, a really fast implimention needs a different language. I've been working on this in C++ with some pthread multi-threading goodness. I don't have exact times for determining each individual seam removal, however scaling an image down from 1024x678 to 800x600 takes about 15 seconds.

Some things that can be done to increase speed:
1. Don't recalculate the edge detection image every time. You can get away with only calculating it every so often and still get good results.
2. Resize the grayscale of the image along with everything else. Saves one extra step for edge detection.
3. Minimize the times when you need to do bounds-checking. Optimize your loops so areas that need bounds checking are handled separately.

Attempting to preserve unchanged portions of the energy map may be more costly than actually fully recalculating it. Depending on where a seam was removed, you may only be left with less than 1/3 of the energy map unchanged. Coding to repair the map might be more costly than just blindly recalculating.
PermalinkPermalink 10/29/07 @ 01:47
Comment from: yaniv [Visitor] · http://yaniv.leviathanonline.com
Cool post.
I also have a python implementation (many people posted interesting ideas on my post).
PermalinkPermalink 11/23/07 @ 17:07
Comment from: Lazar [Visitor] Email · http://www.inverudio.com
I was gonna ask you about source code, but then I read comments...
PermalinkPermalink 11/26/07 @ 01:30
Comment from: hou k [Visitor] Email
I am very interested in this algorithm. Could you send me a copy of the source code? Thanks a lot.
PermalinkPermalink 12/13/07 @ 09:12

This post has 4 feedbacks awaiting moderation...

Leave a comment:

Your email address will not be displayed on this site.
Your URL will be displayed.

Allowed XHTML tags: <p, ul, ol, li, dl, dt, dd, address, blockquote, ins, del, span, bdo, br, em, strong, dfn, code, samp, kdb, var, cite, abbr, acronym, q, sub, sup, tt, i, b, big, small>
(Line breaks become <br />)
(Set cookies for name, email and url)
(Allow users to contact you through a message form (your email will NOT be displayed.))


Who's Online?

  • Guest Users: 132


XML Feeds

What is RSS?