I don't think most people know about the improved seam carving algorithm (since only the original is taught in most CS curricula), and I thought you might find this cool!
Seam carving is really cool, not just for the typical application of resizing images, but also as a method of fingerprinting images (and, on the flip side, anonymizing them). There are some interesting papers on these use cases via Google Scholar [0].
There are a lot of potential businesses that could be built around image fingerprinting. Off the top of my head those include the obvious copyright/IP monitoring, and also more novel ideas like "meme tracking" to observe the propagation of memes across the Internet.
> Both papers introduce different ways to "calculate the energy of a pixel, which is a measure of its importance—the higher the energy, the less likely that the pixel will be included as part of a seam."
If I understand correctly, this function is basically how the algorithm decides which parts of the image to cut (or stretch). So it seems like it could be interesting to experiment with a range of different functions here.
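To make this concrete, the simplest energy function from the original paper is just the gradient magnitude at each pixel. Here's a minimal NumPy sketch (the function name is mine):

```python
import numpy as np

def gradient_energy(img):
    """Per-pixel sum of absolute x/y gradients -- the basic 'e1' energy
    from the original seam carving paper. img: 2-D grayscale array."""
    gy, gx = np.gradient(img.astype(float))
    return np.abs(gx) + np.abs(gy)
```

Swapping in a different function here (entropy, saliency, a learned map, etc.) changes which seams the algorithm considers cheap, without touching the rest of the pipeline.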
I'm unfamiliar with CNNs, but you can draw masks manually to protect/remove parts of an image, or use face recognition, motion detection, etc.
The biggest value of ML here would likely be to create a more useful energy function. This could be informed either by object detection data or hand drawn "energy" masks.
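Folding a mask (hand-drawn or produced by a detector) into the energy map is about one line; here's a sketch, where the function name and the `weight` constant are my own choices:

```python
import numpy as np

def masked_energy(energy, mask, weight=1e3):
    """Bias an energy map with a mask: mask=+1 protects pixels (raises
    their energy), mask=-1 marks pixels for removal (lowers it), and
    mask=0 leaves them alone. `weight` is just a large arbitrary constant."""
    return energy + weight * mask
```

The same mechanism works whether the mask is hand-drawn or is the output of an object detector, so a learned model only has to produce a per-pixel importance map.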
The other big change that could be useful would be to go with a non-greedy approach. Say you want to scale down a crowd of people - how do you remove, say, one person from the crowd instead of removing "low-energy" strips from multiple people. The people in the crosswalk image sort of demonstrates the difficulty of using basic energy metrics there.
One interesting idea would be to use GANs: come up with some parameterized seam removal function (with the number of seams as an input) such that the resulting image is still highly probable under the data distribution. That way you can't cut objects into weird shapes, since there aren't images like that in the dataset.
I would not dive into GANs right off the bat.
Ng's course is a good start; maybe also buy or borrow a copy of Russell and Norvig's "Artificial Intelligence: A Modern Approach" [0].
If you decide to look into deep learning, I've heard good things about Bengio's book (see also Hugo Larochelle's lectures on YouTube). And if you want to learn more about reinforcement learning, "Reinforcement Learning: An Introduction" by Sutton and Barto is frankly amazing (there's also a great course on YouTube by David Silver).
From reading your code, you're probably familiar with most of the stuff you need to learn ML, so I thought I would take a stab at explaining how you get from "no machine learning experience at all" to "surely GANs will solve all our problems".
Broadly speaking, a lot of machine learning is just deciding what your goal is (carve an image, play a game, classify objects, group similar data points, etc.), defining an objective function (e.g., number of sharp edges removed, player score, a classification loss function[1], distance over points in a cluster), and then maximizing or minimizing that objective function.
In classification, you want a function `f(.)` such that given a data point `x`, `f(x) ≈ y`, where `y` is the correct label for that data point.
Say you have a bunch of these data points in a matrix `X`, and a vector of correct labels `y`: how do you come up with a function that correctly classifies the data in `X`, and how can you do it so that data you might collect in the future would also be correctly classified?
You could do this with linear least squares (come up with weights `w` to minimize `|f(x) - y|`, with `f(x) = x^T w`), but unless your data is cleanly separable, that probably won't work so well.
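For a feel of what that looks like, here's a toy least-squares classifier on made-up 2-D data (labels in {-1, +1}, bias folded in as an extra column; the data is synthetic and well-separated, so least squares happens to do fine here):

```python
import numpy as np

# Two well-separated Gaussian clusters, labeled -1 and +1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
X = np.hstack([X, np.ones((100, 1))])        # bias column
y = np.array([-1.0] * 50 + [1.0] * 50)

# Solve min_w |Xw - y| in the least-squares sense, classify by sign.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
acc = np.mean(np.sign(X @ w) == y)
```

On messier, non-separable data this linear fit degrades quickly, which is the point being made above.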
There are a lot of possible techniques people have tried, but lately neural networks have become popular because they are powerful approximators, relatively easy to deploy, and pretty versatile.
But the training process[2] still boils down to tweaking weights until `|f(x) - y|` is as low as you can make it.
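That "tweak weights until the error is low" loop, in its simplest (linear, squared-error) form, is just gradient descent; a self-contained sketch on synthetic data:

```python
import numpy as np

# Synthetic regression problem with a known true weight vector.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)
lr = 0.05
for _ in range(500):
    err = X @ w - y                  # f(x) - y
    grad = 2 * X.T @ err / len(y)    # gradient of mean squared error
    w -= lr * grad                   # nudge weights downhill
```

Neural network training is the same loop with a much more complicated `f` and the gradient computed by backprop instead of a closed-form expression.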
Generative adversarial networks are kind of a tweak on this idea: you start with a bunch of images, and you train a network to generate images that are "similar" to them; call the initial images the "real" images and the generated ones "fakes".
You want your network to generate fakes that are similar to the real images.
How do you define similar?
Well, you have a second network whose objective is to distinguish between real and fake images.
You train both at the same time, using feedback from the second network to improve the first network, and using the images generated by the first network as training examples for the second.
Initially, the first network will generate splotchy garbage that the discriminator can easily tell apart from the real images.
Over time, the first network improves, creating increasingly realistic fakes, while the second network gets better at determining whether an image is real or generated.
After much training, you have two networks: one that is good at generating images that are similar to the seed images, and another network that is good at identifying if an image is real or fake.
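To show just the alternating-update structure described above, here's a deliberately tiny GAN on 1-D data: the "generator" is a linear map `G(z) = a*z + b`, the "discriminator" is a single logistic unit, and the gradients are derived by hand. All the names and constants are mine; a real GAN would use deep networks, autodiff, and images rather than scalars.

```python
import numpy as np

rng = np.random.default_rng(0)
sig = lambda t: 1.0 / (1.0 + np.exp(-t))

# Real data: samples from N(3, 0.5). Generator G(z) = a*z + b maps noise
# z ~ N(0,1) to fakes; discriminator D(x) = sigmoid(w*x + c) outputs the
# probability that x is real.
a, b = 1.0, 0.0   # generator parameters
w, c = 0.1, 0.0   # discriminator parameters
lr = 0.02

for _ in range(2000):
    real = rng.normal(3.0, 0.5, 64)
    z = rng.normal(0.0, 1.0, 64)
    fake = a * z + b

    # Discriminator step: ascend log D(real) + log(1 - D(fake)).
    dr, df = sig(w * real + c), sig(w * fake + c)
    w += lr * np.mean((1 - dr) * real - df * fake)
    c += lr * np.mean((1 - dr) - df)

    # Generator step: ascend log D(fake) (the "non-saturating" loss),
    # nudging fakes toward regions the discriminator labels "real".
    df = sig(w * fake + c)
    a += lr * np.mean((1 - df) * w * z)
    b += lr * np.mean((1 - df) * w)
```

Even at this scale you can see the tension: each discriminator update changes the landscape the generator is climbing, which is why GAN training is famously finicky.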
2. See here for a nice explanation of the backprop algorithm: http://neuralnetworksanddeeplearning.com/chap2.html. Or, if you want the OG (original Geoff) paper on the topic, it's "Learning representations by back-propagating errors", Rumelhart, Hinton, and Williams (1986).
You could use a small binary encoding (a bit vector, not strictly one-hot, since multiple bits can be set) to represent the desired resizing operation: [0, 0, 1] for horizontal shrink, [0, 1, 0] for vertical shrink, [0, 1, 1] for shrinking both, with the first bit set to 1 for expansion instead.
The input to the CNN would then be the image together with the encoded resize instruction, and the "label" would be the pixel map of the seam that seam carving chooses for that resize step.
The CNN output will be a softmax pixel map, optimized to reproduce the seam-carving result.
Then at runtime, you would just have to add a slight post-processing step that ensures the CNN’s predicted seam doesn’t violate any constraints for the intended output shape of the resize operation chosen.
You could use the existing seam carving algorithm to generate the training data, but you would need to be careful that the training set covers a huge variety of image statistics, to avoid overfitting the resizing to a particular subset of image features. (Conversely, you could try to learn "better"-than-seam-carving resizing specialized to subclasses of images, like faces, cityscapes, etc.)
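A sketch of what generating one such training example might look like, using the classical dynamic-programming seam as the label. The op encoding, function names, and choice of gradient energy are all my own assumptions, not an established pipeline:

```python
import numpy as np

def find_vertical_seam(energy):
    """Classic DP seam: one column index per row, minimal total energy."""
    h, w = energy.shape
    M = energy.astype(float).copy()
    for i in range(1, h):
        prev = M[i - 1]
        left = np.concatenate(([np.inf], prev[:-1]))
        right = np.concatenate((prev[1:], [np.inf]))
        M[i] += np.minimum(np.minimum(left, prev), right)
    # Backtrack from the cheapest bottom cell, moving at most one column.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(M[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(0, j - 1), min(w, j + 2)
        seam[i] = lo + int(np.argmin(M[i, lo:hi]))
    return seam

# Hypothetical 3-bit op encoding: [expand?, vertical?, horizontal?].
OPS = {"h_shrink": [0, 0, 1], "v_shrink": [0, 1, 0], "hv_shrink": [0, 1, 1]}

def training_example(img, op="h_shrink"):
    """Return an (image, op vector, seam mask) triple; the mask marks the
    classical algorithm's seam and would serve as the CNN's target."""
    gy, gx = np.gradient(img.astype(float))
    seam = find_vertical_seam(np.abs(gx) + np.abs(gy))
    mask = np.zeros(img.shape)
    mask[np.arange(img.shape[0]), seam] = 1.0
    return img, np.array(OPS[op]), mask
```

Running this over a large, varied image corpus (once per removed seam) would produce the supervised dataset described above.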
It takes 14 seconds to remove 200 seams from the 512-by-342 bench image, but I note potential optimizations in the notebook.
Right now, the seam_carve function in scikit-image accepts an energy matrix, which it then uses to generate the accumulated energy matrix. Forward energy calculates the accumulated energy matrix directly, so I calculate the energy matrix backward from it in order to use scikit-image's seam_carve function.
If I write my own seam removal function, or change scikit-image's, I think I could make the speed comparable to backward energy.
Furthermore, the accumulated energy matrix calculation in scikit-image uses a double for loop, which might be replaceable with NumPy operations for a speedup for both forward and backward energy.
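If staying in pure NumPy (without Cython), one common trick is to vectorize the inner column loop with shifted-array minimums, leaving only a loop over rows. A sketch, assuming the standard backward-energy recurrence `M[i, j] = e[i, j] + min(M[i-1, j-1], M[i-1, j], M[i-1, j+1])`:

```python
import numpy as np

def accumulate_naive(e):
    """Accumulated energy via a double for loop (the straightforward DP)."""
    M = e.astype(float).copy()
    h, w = M.shape
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(0, j - 1), min(w, j + 2)
            M[i, j] += M[i - 1, lo:hi].min()
    return M

def accumulate_vectorized(e):
    """Same recurrence, but the column loop becomes shifted-array minimums."""
    M = e.astype(float).copy()
    for i in range(1, len(M)):
        prev = M[i - 1]
        left = np.concatenate(([np.inf], prev[:-1]))   # M[i-1, j-1]
        right = np.concatenate((prev[1:], [np.inf]))   # M[i-1, j+1]
        M[i] += np.minimum(np.minimum(left, prev), right)
    return M
```

Both functions compute the same matrix; the vectorized one just pushes the per-column work into C-level NumPy calls, which usually matters a lot for wide images.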
I should probably raise this as an issue on GitHub... thanks!
After talking with some friends: basically, I could get this to be a lot faster if I just wrote it in Cython (the skimage.transform.seam_carve function is written in Cython). Also, I realize now that a double for loop in Cython is actually fast, which means the skimage implementation of the dynamic programming is already efficient. But I can still change it to accept the accumulated energy matrix!