I don't think most people know about the improved seam carving algorithm (since only the original is taught in most CS curricula), and I thought you might find this cool!
Seam carving is really cool, not just for the typical application of resizing images, but also as a method of fingerprinting images (and, on the flip side, anonymizing them). There are some interesting papers on these use cases via Google Scholar [0].
There are a lot of potential businesses that could be built around image fingerprinting. Off the top of my head those include the obvious copyright/IP monitoring, and also more novel ideas like "meme tracking" to observe the propagation of memes across the Internet.
> Both papers introduce different ways to "calculate the energy of a pixel, which is a measure of its importance—the higher the energy, the less likely that the pixel will be included as part of a seam."
If I understand correctly, this function is basically how the algorithm decides which parts of the image to cut (or stretch). So it seems like it could be interesting to experiment with a range of different functions here.
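To make this concrete, the simplest energy function from the original paper is just the gradient magnitude at each pixel. Here's a minimal NumPy sketch (the function name is mine):

```python
import numpy as np

def gradient_energy(img):
    """Per-pixel sum of absolute x/y gradients -- the basic 'e1' energy
    from the original seam carving paper. img: 2-D grayscale array."""
    gy, gx = np.gradient(img.astype(float))
    return np.abs(gx) + np.abs(gy)
```

Swapping in a different function here (entropy, saliency, a learned map, etc.) changes which seams the algorithm considers cheap, without touching the rest of the pipeline.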
I'm unfamiliar with CNNs, but you can draw masks manually to protect/remove parts of an image, or use face recognition, motion detection, etc.
The biggest value of ML here would likely be to create a more useful energy function. This could be informed either by object detection data or hand drawn "energy" masks.
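Folding a mask (hand-drawn or produced by a detector) into the energy map is about one line; here's a sketch, where the function name and the `weight` constant are my own choices:

```python
import numpy as np

def masked_energy(energy, mask, weight=1e3):
    """Bias an energy map with a mask: mask=+1 protects pixels (raises
    their energy), mask=-1 marks pixels for removal (lowers it), and
    mask=0 leaves them alone. `weight` is just a large arbitrary constant."""
    return energy + weight * mask
```

The same mechanism works whether the mask is hand-drawn or is the output of an object detector, so a learned model only has to produce a per-pixel importance map.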
The other big change that could be useful would be to go with a non-greedy approach. Say you want to scale down a crowd of people - how do you remove, say, one person from the crowd instead of removing "low-energy" strips from multiple people. The people in the crosswalk image sort of demonstrates the difficulty of using basic energy metrics there.
One interesting idea would be to use GANs: come up with some parameterized seam removal function (with the number of seams as an input) such that the resulting image is still highly probable under the data distribution. That way you can't cut objects into weird shapes, since there aren't images like that in the dataset.
I would not dive into GANs right off the bat.
Ng's course is a good start; maybe also buy or borrow a copy of Russell and Norvig's "Artificial Intelligence: A Modern Approach" [0].
If you decide to look into deep learning, I've heard good things about Bengio's book (see also Hugo Larochelle's lectures on YouTube). And if you want to learn more about reinforcement learning, "Reinforcement Learning: An Introduction" by Sutton and Barto is frankly amazing (there's also a great course on YouTube by David Silver).
From reading your code, you're probably familiar with most of the stuff you need to learn ML, so I thought I would take a stab at explaining how you get from "no machine learning experience at all" to "surely GANs will solve all our problems".
Broadly speaking, a lot of machine learning is just deciding what your goal is (carve an image, play a game, classify objects, group similar data points, etc.), defining an objective function (e.g., number of sharp edges removed, player score, a classification loss function[1], distance over points in a cluster), and then maximizing or minimizing that objective function.
In classification, you want a function `f(.)` such that given a data point `x`, `f(x) ≈ y`, where `y` is the correct label for that data point.
Say you have a bunch of these data points in a matrix `X`, and a vector of correct labels `y`: how do you come up with a function that correctly classifies the data in `X`, and how can you do it so that data you might collect in the future would also be correctly classified?
You could do this with linear least squares (come up with weights `w` to minimize `|f(x) - y|`, with `f(x) = x^T w`), but unless your data is cleanly separable, that probably won't work so well.
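For a feel of what that looks like, here's a toy least-squares classifier on made-up 2-D data (labels in {-1, +1}, bias folded in as an extra column; the data is synthetic and well-separated, so least squares happens to do fine here):

```python
import numpy as np

# Two well-separated Gaussian clusters, labeled -1 and +1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
X = np.hstack([X, np.ones((100, 1))])        # bias column
y = np.array([-1.0] * 50 + [1.0] * 50)

# Solve min_w |Xw - y| in the least-squares sense, classify by sign.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
acc = np.mean(np.sign(X @ w) == y)
```

On messier, non-separable data this linear fit degrades quickly, which is the point being made above.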
There are a lot of possible techniques people have tried, but lately neural networks have become popular because they are powerful approximators, relatively easy to deploy, and pretty versatile.
But the training process[2] still boils down to tweaking weights until `|f(x) - y|` is as low as you can make it.
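That "tweak weights until the error is low" loop, in its simplest (linear, squared-error) form, is just gradient descent; a self-contained sketch on synthetic data:

```python
import numpy as np

# Synthetic regression problem with a known true weight vector.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)
lr = 0.05
for _ in range(500):
    err = X @ w - y                  # f(x) - y
    grad = 2 * X.T @ err / len(y)    # gradient of mean squared error
    w -= lr * grad                   # nudge weights downhill
```

Neural network training is the same loop with a much more complicated `f` and the gradient computed by backprop instead of a closed-form expression.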
Generative adversarial networks are kind of a tweak on this idea: you start with a bunch of images, and you train a network to generate images that are "similar" to them; call the initial images the "real" images and the generated ones "fakes".
You want your network to generate fakes that are similar to the real images.
How do you define similar?
Well, you have a second network whose objective is to distinguish between real and fake images.
You train both at the same time, using feedback from the second network to improve the first network, and using the images generated by the first network as training examples for the second.
Initially, the first network will generate splotchy garbage that the discriminator can easily tell apart from the real images.
Over time, the first network improves, creating increasingly realistic fakes, while the second network gets better at determining whether an image is real or generated.
After much training, you have two networks: one that is good at generating images that are similar to the seed images, and another network that is good at identifying if an image is real or fake.
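To show just the alternating-update structure described above, here's a deliberately tiny GAN on 1-D data: the "generator" is a linear map `G(z) = a*z + b`, the "discriminator" is a single logistic unit, and the gradients are derived by hand. All the names and constants are mine; a real GAN would use deep networks, autodiff, and images rather than scalars.

```python
import numpy as np

rng = np.random.default_rng(0)
sig = lambda t: 1.0 / (1.0 + np.exp(-t))

# Real data: samples from N(3, 0.5). Generator G(z) = a*z + b maps noise
# z ~ N(0,1) to fakes; discriminator D(x) = sigmoid(w*x + c) outputs the
# probability that x is real.
a, b = 1.0, 0.0   # generator parameters
w, c = 0.1, 0.0   # discriminator parameters
lr = 0.02

for _ in range(2000):
    real = rng.normal(3.0, 0.5, 64)
    z = rng.normal(0.0, 1.0, 64)
    fake = a * z + b

    # Discriminator step: ascend log D(real) + log(1 - D(fake)).
    dr, df = sig(w * real + c), sig(w * fake + c)
    w += lr * np.mean((1 - dr) * real - df * fake)
    c += lr * np.mean((1 - dr) - df)

    # Generator step: ascend log D(fake) (the "non-saturating" loss),
    # nudging fakes toward regions the discriminator labels "real".
    df = sig(w * fake + c)
    a += lr * np.mean((1 - df) * w * z)
    b += lr * np.mean((1 - df) * w)
```

Even at this scale you can see the tension: each discriminator update changes the landscape the generator is climbing, which is why GAN training is famously finicky.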
2. See here for a nice explanation of the backprop algorithm: http://neuralnetworksanddeeplearning.com/chap2.html. Or, if you want the OG (original Geoff) paper on the topic, it's "Learning representations by back-propagating errors", Rumelhart, Hinton, and Williams (1986).
You could use a small binary encoding (a bit vector, not strictly one-hot, since multiple bits can be set) to represent the desired resizing operation: [0, 0, 1] for horizontal shrink, [0, 1, 0] for vertical shrink, [0, 1, 1] for shrinking both, with the first bit set to 1 for expansion instead.
The input to the CNN would then be the image together with the encoded resize instruction, and the "label" would be the pixel map of the seam that seam carving chooses for that resize step.
The CNN output will be a softmax pixel map, optimized to reproduce the seam-carving result.
Then at runtime, you would just have to add a slight post-processing step that ensures the CNN’s predicted seam doesn’t violate any constraints for the intended output shape of the resize operation chosen.
You could use the existing seam carving algorithm to generate the training data, but you would need to be careful that the training set covers a huge variety of image statistics, to avoid overfitting the resizing to a particular subset of image features. (Conversely, you could try to learn "better"-than-seam-carving resizing specialized to subclasses of images, like faces, cityscapes, etc.)
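A sketch of what generating one such training example might look like, using the classical dynamic-programming seam as the label. The op encoding, function names, and choice of gradient energy are all my own assumptions, not an established pipeline:

```python
import numpy as np

def find_vertical_seam(energy):
    """Classic DP seam: one column index per row, minimal total energy."""
    h, w = energy.shape
    M = energy.astype(float).copy()
    for i in range(1, h):
        prev = M[i - 1]
        left = np.concatenate(([np.inf], prev[:-1]))
        right = np.concatenate((prev[1:], [np.inf]))
        M[i] += np.minimum(np.minimum(left, prev), right)
    # Backtrack from the cheapest bottom cell, moving at most one column.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(M[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(0, j - 1), min(w, j + 2)
        seam[i] = lo + int(np.argmin(M[i, lo:hi]))
    return seam

# Hypothetical 3-bit op encoding: [expand?, vertical?, horizontal?].
OPS = {"h_shrink": [0, 0, 1], "v_shrink": [0, 1, 0], "hv_shrink": [0, 1, 1]}

def training_example(img, op="h_shrink"):
    """Return an (image, op vector, seam mask) triple; the mask marks the
    classical algorithm's seam and would serve as the CNN's target."""
    gy, gx = np.gradient(img.astype(float))
    seam = find_vertical_seam(np.abs(gx) + np.abs(gy))
    mask = np.zeros(img.shape)
    mask[np.arange(img.shape[0]), seam] = 1.0
    return img, np.array(OPS[op]), mask
```

Running this over a large, varied image corpus (once per removed seam) would produce the supervised dataset described above.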
It takes 14 seconds to remove 200 seams from the 512-by-342 bench image, but I note potential optimizations in the notebook.
Right now, the seam_carve function in scikit-image accepts an energy matrix, which it then uses to generate the accumulated energy matrix. Forward energy calculates the accumulated energy matrix directly, so I calculate the energy matrix backward from it in order to use scikit-image's seam_carve function.
If I write my own seam removal function, or change scikit-image's, I think I could make the speed comparable to backward energy.
Furthermore, the accumulated energy matrix calculation in scikit-image uses a double for loop, which might be replaceable with NumPy operations for a speedup for both forward and backward energy.
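If staying in pure NumPy (without Cython), one common trick is to vectorize the inner column loop with shifted-array minimums, leaving only a loop over rows. A sketch, assuming the standard backward-energy recurrence `M[i, j] = e[i, j] + min(M[i-1, j-1], M[i-1, j], M[i-1, j+1])`:

```python
import numpy as np

def accumulate_naive(e):
    """Accumulated energy via a double for loop (the straightforward DP)."""
    M = e.astype(float).copy()
    h, w = M.shape
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(0, j - 1), min(w, j + 2)
            M[i, j] += M[i - 1, lo:hi].min()
    return M

def accumulate_vectorized(e):
    """Same recurrence, but the column loop becomes shifted-array minimums."""
    M = e.astype(float).copy()
    for i in range(1, len(M)):
        prev = M[i - 1]
        left = np.concatenate(([np.inf], prev[:-1]))   # M[i-1, j-1]
        right = np.concatenate((prev[1:], [np.inf]))   # M[i-1, j+1]
        M[i] += np.minimum(np.minimum(left, prev), right)
    return M
```

Both functions compute the same matrix; the vectorized one just pushes the per-column work into C-level NumPy calls, which usually matters a lot for wide images.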
I should probably raise this as an issue on GitHub... thanks!
After talking with some friends: basically, I could get this to be a lot faster if I just wrote it in Cython (the skimage.transform.seam_carve function is written in Cython). Also, I realize now that a double for loop in Cython is actually fast, which means the skimage implementation of the dynamic programming is already efficient. But I can still change it to accept the accumulated energy matrix!