Terminology note: there is a difference between Style Transfer and Fast Style Transfer.
Fast Style Transfer takes a very long time to train a style model (and the output models can be somewhat large; the pre-trained models in the Dropbox are 20MB), but once trained, the styles can be applied quickly. Fast Style Transfer is the technique used by Prisma/Facebook. This repo is the first I've seen that uses TensorFlow instead of lua/Torch dependency shenanigans, and as a result it should be much easier to set up. (This code release also beats Google's TF code release for their implementation: https://research.googleblog.com/2016/10/supercharging-style-... )
EDIT: Playing around with this repository, I can stylize a 1200x1200px image on a dual-core CPU on a 2013 rMBP in about 30 seconds.
Normal Style Transfer runs an ad-hoc optimization between the style image and the initial image; it produces only one stylized image at a time, but a single run is still faster overall than training a model for Fast Style Transfer (that said, this per-image optimization is infeasible on mobile devices).
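For anyone curious, here's a rough sketch of that difference in TF 1.x-style code. The transform_net and loss_fn arguments are placeholders for the trained transformation network and the VGG-based content/style losses, not this repo's actual API:

    import numpy as np
    import tensorflow as tf

    # Fast Style Transfer: one feed-forward pass through an already-trained network.
    def stylize_fast(image, checkpoint_path, transform_net):
        # image: float32 array of shape (1, H, W, 3)
        with tf.Graph().as_default(), tf.Session() as sess:
            img_ph = tf.placeholder(tf.float32, shape=image.shape)
            output = transform_net(img_ph)             # trained transformation network
            tf.train.Saver().restore(sess, checkpoint_path)
            return sess.run(output, feed_dict={img_ph: image})

    # Normal (Gatys-style) transfer: optimize the pixels of a single image directly.
    def stylize_slow(content, style, loss_fn, steps=500):
        with tf.Graph().as_default(), tf.Session() as sess:
            img_var = tf.Variable(content.astype(np.float32))   # the image itself is the variable
            loss = loss_fn(img_var, content, style)             # content + style losses on VGG features
            train_op = tf.train.AdamOptimizer(10.0).minimize(loss, var_list=[img_var])
            sess.run(tf.global_variables_initializer())
            for _ in range(steps):
                sess.run(train_op)
            return sess.run(img_var)

The first function is cheap enough to run per photo once the checkpoint exists; the second has to re-run the whole optimization for every photo, which is why it doesn't fit on a phone.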
Playing around with the tool a bit, supersampling seems to work pretty well to avoid those artifacts. (Thanks to the speed of Fast Style Transfer, this is feasible.)
EDIT: There's a catch. The checkpointed models are calibrated for smaller images, so you'll get better styles with smaller images. Larger images tend to just have the style's patterns repeated: https://twitter.com/minimaxir/status/793161755992002560
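If you want to try the supersampling trick, it's just a resize-stylize-resize sandwich. Something like the below, where stylize() is a stand-in for however you call the trained network (not this repo's actual interface):

    from PIL import Image

    def stylize_supersampled(in_path, out_path, stylize, factor=2):
        img = Image.open(in_path)
        w, h = img.size
        big = img.resize((w * factor, h * factor), Image.LANCZOS)
        styled = stylize(big)                                # run the network at high resolution
        styled.resize((w, h), Image.LANCZOS).save(out_path)  # downsampling averages away the speckle

Given the size calibration issue above, a small factor on a modest source image is probably the sweet spot before the repeated-pattern problem kicks in.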
I hadn't seen that, thanks for the link - it looks useful! Right now the kernel size is not divisible by the stride - I'll try changing the transformation network.
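For reference, the divisibility rule is easy to see in code. A rough illustration (layer shapes made up, not the actual transformation network):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [1, 64, 64, 128])

    # Prone to checkerboard artifacts: kernel 3, stride 2 (3 % 2 != 0),
    # so the transposed conv's overlaps are uneven.
    bad = tf.layers.conv2d_transpose(x, filters=64, kernel_size=3, strides=2, padding='same')

    # Less prone: kernel 4, stride 2 (4 % 2 == 0), overlaps are uniform.
    good = tf.layers.conv2d_transpose(x, filters=64, kernel_size=4, strides=2, padding='same')

    # Alternative: nearest-neighbor resize followed by a plain convolution.
    up = tf.image.resize_nearest_neighbor(x, [128, 128])
    alt = tf.layers.conv2d(up, filters=64, kernel_size=3, strides=1, padding='same')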
Utterly amazing what people are achieving with neural nets. The idea that 'style transfer' can be fit into an algorithm is slightly blowing my mind right now.
The jumping fox video does look a bit 'off' though, I think because the animation is kept the same and so it ends up looking too realistic for that style. Still, these are early days!
With the "Great Wave" painting as the network's style input, the limitations of the technique become more apparent. It's clear that a human painter would never render the Chicago skyline in this way: there are incongruent little waves on buildings' edges and all over the sky.
The antennas on top of the tallest tower are particularly revealing. The neural network just sees an area of higher local contrast, and has continued the same pattern it applied in the sky at the top-right of the antennas, only with more contrast. This doesn't make any sense for what's supposed to be a painting.
There's no intelligence here, "just" pattern matching that can do a brilliant illusion of creative variance on the right kind of content. ("Just" in quotes because it's still a great achievement.)
Now we need some kind of object recognition-enhanced version of style transfer that learns constraints on what "makes sense" given "sensible" labeled/captioned training examples!
It has been done with manual segmentation [1], and the results are mind-blowing. There is also a lot of work being done on segmentation with neural nets [2, 3], so I wouldn't be surprised to see someone implement this idea in the near future.
There has been a lot of cool image manipulation/transformation/synthesis work coming out over the past couple of years using NNs and such. I'm curious if any of these techniques have started worming their way into products? Will new effects in Photoshop (or whatever) get better over point releases as companies train up better and better NNs?
It would be nice if this algorithm could be created in a more localised form, so that you could take a photo and apply the effect with different intensities as if you were using a brush.
Merely because sometimes it gets things wrong, and I think a normal human could correct things if they had more control over the process.
Can surely be done with the original neural style transfer algorithm. You'd just have to use a soft mask over the picture when calculating the content loss function.
Don't know if there's an analogous way to do it for fast style transfer.
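A minimal sketch of the soft-mask idea for the original algorithm, assuming generated_features and content_features are feature maps from the same VGG layer and mask is a float32 tensor of shape (h, w) with values in [0, 1], resized to that layer's spatial size (names are made up, nothing from this repo):

    import tensorflow as tf

    def masked_content_loss(generated_features, content_features, mask):
        # Where mask ~ 1 the output is pulled toward the original photo;
        # where mask ~ 0 the content loss vanishes and the style dominates.
        mask = tf.expand_dims(mask, axis=-1)          # broadcast the mask over channels
        diff = tf.square(generated_features - content_features)
        return tf.reduce_sum(mask * diff) / tf.cast(tf.size(diff), tf.float32)

For fast style transfer, a cruder but probably workable option is to just alpha-blend the stylized output with the original photo under the same soft mask after the fact.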