Most people most likely saw the YouTube movie on content-aware image resizing which got blogged quite a lot lately. I read the corresponding paper, and wrote an implementation (not finished/perfect at all, but well) in Python. If it would ever become "production quality" a Gimp and/or GEGL plugin would be nice.
Here's a sample:
Resized using Gimp, cubic interpolation, 150px
Resized image, 150px
Overview of removed pixels
This transformation is done in about 2 seconds (mainly because of some calculations in pure Python. For most calculations I use the Python Imaging Library and SciPy/NumPy, which are mainly C modules and much faster). As you can see the implementation still needs lots of love.
You can see another sample (image resized from 1000 to 250px in 8 seconds) here.
Git repository is here. Please email any patches!
The algorithm itself is surprisingly "simple" and easy to understand, great job by the researchers! More on that later. I should be studying mathematical analysis now, 2nd time I got to redo this exam, bloody university :-(
Using very expensive algorithm
This image was generated by:
- Loading the input image
- 150 times:
- Calculate energy and cost of current working picture
- For every pixel in the top row, calculate the cost of the "best path" starting at this pixel
- Figure out which path is the cheapest
- Create an image which is the working image, minus this best path
- Replace the working image with the image generated in the previous step
This took 273 seconds on my system, as the complexity is something like O(150*N*N*N*N*N*N*M) where M is the complexity of the gradient magnitude calculation.
Conclusion: not a workable solution :D
Do notice there are significant changes between this image and the one posted above. As I wrote this as a quick hack, I didn't include code to show which paths were removed from the original image.
Are you recalculating the colour gradient on each loop?
Are you using both horizontal and vertical gradients?
Only, once gradients and costs are calculated, the mechanism to find out where to start a path just sucks (it's 2-line based now), this needs to be reworked.
I wonder if you could adjust the gradient of nearby pixels when you choose a path. This should be faster than recalculating the gradient on each iteration.
Eg if the gradient looks like
3 2 5 (choosing 2 as the path) then split the grad to the adjacent pixels gives
4 * 6
The problem is this magnitude is calculated using 9 pixels (the pixel itself and the 8 pixels around it) so I don't think you can just add it to the 2 next to it. I'll investigate this further later on, unless someone got great ideas/patches right now ;-)
D'oh. Same here (but third time I try this exam), then I can finally take my master of science in telecommunications engineering.
I hate non-applied mathematics.
And yeah, non-applied maths suck most of the time. Like, I had a course on convolution and gradients once, and I never understood what it was all about. They used fluids in the examples etc.
Now I worked on this application thingy, got in contact with image manipulation, the energy functions (which is a pure application of gradients and convolution), and I finally understand what it's all about, and it's actually so easy to understand.
Ah, well... Same here, I understood convolutions and Fourier transform only after my Digital signals processing (mostly audio, 1-d signals) and Digital image processing courses. :)
With a more practical approach I arrived to "like" probability theory and Markov chains too, but my first contact with those things was a NIGHTMARE.
I did have classes on Fourier before too in math classes, which were kinda hard, but luckily I had some (practical) pre-knowledge as when I was 16-17 I used to write quite a lot of sound DSP code, including the use of the DFT(FFT). I didn't completely understand how it worked (although I understood the basics, thanks to a great article, "The DFT a pied", should be possible to find that somewhere on the net), but I knew what the result of the operation was, I could hear the result of playing with it (and doing an ifft afterwards, obviously) which made everything so much easier to understand.
To tell the truth, I went to the Liceo Classico, which is a "humanistic" high school, I studied ancient greek and latin for 5 years, and I saw my first limits, derivatives and integrals in my first university course.
It was hard, really hard, until the first practical courses, but I had them only at the end of the second year.
Somewhat different at uni :-/
I still don't understand how we could study physics without limits, derivatives and integrals.
Something like this:
1) For each pixel in the bottom row, its best path is empty and its cost is the energy of the pixel
2) For each row from the next-to-bottom and up to the top do: For each pixel in the row, the best path is the pixel plus the cheapest path of the three pixels below it.
3) When you're at the top you have the costs for all paths, pick the cheapest one and remove.
You never have to store more than two rows of costs and paths, and paths are not that large, as you just need 2 bits to store each pixel choice (since there are only three possible choices: sw, s, se).
Total time complexity per pixel column removed is:
O(w * h), total space complexity is O (h * w)
My wife is a photographer and is excited by using this in some of her digital work, when the weighting tool is implemented. That is what I'm mostly interested in. I'd probably end up branching off a bit, but what do you think about defining a format for the precalculated data?
Source here: http://inamidst.com/code/seamless.c
I cannot seem to download the original paper from the authors website (above). Could you maybe email it to me?
Great work on the implementation. If I would be able to make a little more time I would have also tried implementing it. Keep up the good work.
BTW If you are interested in some other great implementations (this is 3D rendering related) look at this site: http://graphics.stanford.edu/~fedkiw/
@Alexander: indeed, that's the original way I did it, although from top to bottom. Later changes changed this quite a lot (and not always "correct") as I forgot what I was actually doing.
Major issue is: once you delete one seam, everything needs to be recomputed.
@Calvin: once you're familiar with it it's ok :-)
@Sean: I use Python for prototyping, when the algorithm is working "reasonably well", writing a C version (non-segfaulting ;-)) is pretty straight-forward.
@Roderik: it's 20MB, kinda large to email... There are some torrents of it IIRC, Google is your friend :-)
Got a partial implementation myself, written in C#/.NET. Currently does only carve vertical seams for downscaling in the horizontal direction. I have done no speed measurements, because downscaling happens in (as close as possible to) realtime, while you resize the window containing the image.
I think, if you want this to be efficient for resizing in both directions at the same time, for a useful GIMP plugin you'd have to switch to C. No idea how difficult to use the interface to GIMP is though.
Distros could pick default backgrounds that behave well with the algorithm, and then the whole widescreen versus 4:3 aspect ratio issue could be side stepped.
On the other hand the more expensive implementation moves the castle to about the middle of the image.
Maybe the algorithm needs to keep track of the position of the paths removed to try to maintain most of the original composition?
As well, rather than just removing the least-cost path in each iteration, maybe the pixels on either side of the removed path could be adjusted to take into consideration the removal of the missing path?
@Alfred: It tries to maintain as much as possible automaticly. The problem with the first implementation is it doesn't take changes into account, ie: you calculate everything, then remove seams, whilst this is not correct as after removing one seam you got a new images and all calculations should be redone (as in the last example).
About pixels around the seam: in the first implementation I interpolated the colorspace when removing a seam, it'd be easy to add that to the later versions too, IIRC is't also done in the last version.
Current GIT uses the last algorithm (always recalculating before removing one seam) which is kinda slow, but Psyco actually does a good job.
A couple of suggestions:
1.- Use dynamic programing for the shortest path
2.- Don't compute the ENTIRE gradient every single time, you know what pixels you are changing, then change also the gradient!
3.- I'm using an idea for the energy function that preserves the global energy of the image as you treat it. Currently I'm testing with nice results 'retouching' the gradient by merging whatever energy was in the pixel to be eliminated into its two neighbours (50% each) give it a try.
I'm thinking on doing a plain C implementation to make a GIMP plugin out of it, since it might be faster than python.
If you want to see my code, mail me; it's working for shrinking in both dimensions (will start working on the expansion as soon as I get some free time).
It installs under the Image menu. It scales in both directions. It uses the original algorithm as described in the paper, with the gradient from the 4 nearest neighbors. The code is rather complete, it mainily misses a good GUI, and some options to play with. And maybe an extension of the energy function to let it be more flexible.
Once the seams are discovered it works really fast, though. I will post results on my blog when I have implemented a faster algorithm for seam creation.
Some things that can be done to increase speed:
1. Don't recalculate the edge detection image every time. You can get away with only calculating it every so often and still get good results.
2. Resize the grayscale of the image along with everything else. Saves one extra step for edge detection.
3. Minimize the times when you need to do bounds-checking. Optimize your loops so areas that need bounds checking are handled separately.
Attempting to preserve unchanged portions of the energy map may be more costly than actually fully recalculating it. Depending on where a seam was removed, you may only be left with less than 1/3 of the energy map unchanged. Coding to repair the map might be more costly than just blindly recalculating.
I also have a python implementation (many people posted interesting ideas on my post).
This post has 4 feedbacks awaiting moderation...