Show HN: Fast Style Transfer in TensorFlow (github.com/lengstrom)
184 points by logane on Oct 31, 2016 | 20 comments



Terminology note: there is a difference between Style Transfer and Fast Style Transfer.

Fast Style Transfer takes a very long time to train a style model (and the output models can be somewhat large; the pre-trained models in the Dropbox are 20MB), but once trained, a style can be applied quickly. Fast Style Transfer is the technique used by Prisma/Facebook. This repo is the first I've seen that uses TensorFlow instead of the usual lua/Torch dependency shenanigans, and as a result it should be much easier to set up. (This code release also beats Google's TF code release for their implementation: https://research.googleblog.com/2016/10/supercharging-style-... )

EDIT: Playing around with this repository, I can stylize a 1200x1200px image on a dual-core CPU on a 2013 rMBP in about 30 seconds.

Normal Style Transfer runs an ad-hoc optimization between the style image and the initial image; it can only produce one image at a time, but it is faster overall than training a model for Fast Style Transfer (however, it is still infeasible on mobile devices).
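
To make the distinction concrete, here's a toy TF 1.x-style sketch (not code from the repo; a plain L2 loss stands in for the real VGG-based content and style losses) showing what gets optimized in each case:

    import tensorflow as tf

    content = tf.constant(0.5, shape=[1, 64, 64, 3])    # stand-in for an input photo

    # -- Normal style transfer: optimize the pixels of one output image --
    image = tf.Variable(tf.random_uniform([1, 64, 64, 3]))
    slow_loss = tf.reduce_mean(tf.square(image - content))   # real version adds a VGG style term
    slow_step = tf.train.AdamOptimizer(0.01).minimize(slow_loss, var_list=[image])
    # hundreds of iterations, repeated from scratch for every new photo

    # -- Fast style transfer: train a feed-forward net once per style --
    def transform_net(x):
        return tf.layers.conv2d(x, 3, 3, padding='same')     # real net is much deeper

    stylized = transform_net(content)
    fast_loss = tf.reduce_mean(tf.square(stylized - content))  # real version adds the same style term
    fast_step = tf.train.AdamOptimizer(0.001).minimize(fast_loss)  # updates the net's weights
    # training is slow, but afterwards stylizing any image is one forward pass

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run([slow_step, fast_step])   # one training step of each, just to show both run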


Thanks for the details on this. One question: is your rMBP using CUDA/Nvidia for the 30-second render?


No, which is why I specified dual-core CPU.

The README cites ~100ms with a Titan X using CUDA, although speed depends on image size.


Nice work! Have you seen this? It could help reduce some of those checkerboard artifacts:

http://distill.pub/2016/deconv-checkerboard/


Playing around with the tool a bit, supersampling seems to work pretty well to avoid those artifacts. (Thanks to the speed of Fast Style Transfer, this is feasible.)

EDIT: There's a catch. The checkpointed models are calibrated for smaller images, so you'll get better results at smaller sizes. With larger images the style's patterns tend to just get repeated: https://twitter.com/minimaxir/status/793161755992002560
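
Roughly what I mean by supersampling, as a hedged sketch (the `stylize` callable here is hypothetical, standing in for however you invoke the trained model; it isn't something from the repo):

    from PIL import Image

    def supersampled_stylize(stylize, in_path, out_path, factor=2):
        img = Image.open(in_path).convert('RGB')
        # render at a larger size, then downsample; the checkerboard artifacts
        # shrink below visibility when averaged back down
        big = img.resize((img.width * factor, img.height * factor), Image.LANCZOS)
        styled = stylize(big)
        styled.resize(img.size, Image.LANCZOS).save(out_path)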


I hadn't seen that - thanks for the link, it looks useful! Right now the kernel size is not divisible by the stride; I'll try changing the transformation network.
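
For reference, the resize-then-conv alternative from that article looks roughly like this in TF (a sketch of the idea, not what's in the repo's transform network today):

    import tensorflow as tf

    def upsample_conv(x, filters, kernel_size=3):
        # resize-then-conv: nearest-neighbor upsample followed by a stride-1 conv,
        # so no kernel/stride overlap pattern can appear (assumes static spatial dims)
        h, w = x.get_shape().as_list()[1:3]
        x = tf.image.resize_nearest_neighbor(x, [h * 2, w * 2])
        return tf.layers.conv2d(x, filters, kernel_size, padding='same')

    # versus the usual strided deconv, where kernel_size % stride != 0 gives uneven overlap:
    # tf.layers.conv2d_transpose(x, filters, kernel_size=3, strides=2, padding='same')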


Utterly amazing what people are achieving with neural nets. The idea that 'style transfer' can be fit into an algorithm is slightly blowing my mind right now.

The jumping fox video does look a bit 'off' though, I think because the animation is kept the same and so it ends up looking too realistic for that style. Still, these are early days!


To get an idea of how style transfer works, it can be useful to look at some less successful examples. Here's one from the same git repo:

https://github.com/lengstrom/fast-style-transfer/blob/master...

(Original photo: https://github.com/lengstrom/fast-style-transfer/blob/master... )

With the "Great Wave" painting as the network's style input, the limitations of the technique become more apparent. It's clear that a human painter would never render the Chicago skyline in this way: there are incongruent little waves on buildings' edges and all over the sky.

The antennas on top of the tallest tower are particularly revealing. The neural network just sees an area of higher local contrast, so it continues the same pattern it applied in the sky at the top right of the antennas, only with more contrast. That doesn't make any sense for something that's supposed to be a painting.

There's no intelligence here, "just" pattern matching that can create a brilliant illusion of creative variance on the right kind of content. ("Just" in quotes because it's still a great achievement.)


Now we need some kind of object recognition-enhanced version of style transfer that learns constraints on what "makes sense" given "sensible" labeled/captioned training examples!


It has been done with manual segmentation [1], and the results are mind-blowing. There's also a lot of work on segmentation with neural nets [2, 3], so I wouldn't be surprised to see someone implement this idea in the near future.

[1] https://github.com/alexjc/neural-doodle
[2] https://arxiv.org/pdf/1605.06211.pdf
[3] http://mi.eng.cam.ac.uk/projects/segnet/


There has been a lot of cool image manipulation/transformation/synthesis work coming out over the past couple of years using NNs and such. I'm curious if any of these techniques have started worming their way into products? Will new effects in Photoshop (or whatever) get better over point releases as companies train up better and better NNs?


It would be nice if this algorithm could be offered in a more localised form, so that you could take a photo and apply the effect with different intensities, as if you were using a brush.

Mainly because it sometimes gets things wrong, and I think a normal human could correct those mistakes if they had more control over the process.


Can surely be done with the original neural style transfer algorithm. You'd just have to use a soft mask over the picture when calculating the content loss function.

I don't know if there's an analogous way to do it with fast style transfer.
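
The soft-mask idea for the original algorithm would look roughly like this (a rough sketch; names and shapes are mine, not from any particular implementation): weight the content loss per pixel, so regions where the mask is high stay close to the photo and regions where it's low are free to take on the style.

    import tensorflow as tf

    def masked_content_loss(generated_feats, content_feats, mask):
        # mask: [1, H, W, 1] with values in [0, 1], e.g. painted with a brush;
        # resize it to the feature map's spatial size and weight the loss per pixel
        h, w = generated_feats.get_shape().as_list()[1:3]
        m = tf.image.resize_bilinear(mask, [h, w])
        return tf.reduce_mean(m * tf.square(generated_feats - content_feats))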


As I mentioned in another thread, there is some work on this: https://github.com/alexjc/neural-doodle



So you can transfer style from image A to image B. What I want to know is, can you use style transfer to "amplify" image A's style to, say, 500%?


Presumably this could be done by enlarging image A by 500% and cropping a relevant 1/25th of it?
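
Something like this, perhaps (untested sketch; the file paths are made up):

    from PIL import Image

    style = Image.open('style.jpg')   # image A
    big = style.resize((style.width * 5, style.height * 5), Image.LANCZOS)
    left = (big.width - style.width) // 2
    top = (big.height - style.height) // 2
    # a central crop of the original size, i.e. 1/25th of the enlarged image at 5x scale
    crop = big.crop((left, top, left + style.width, top + style.height))
    crop.save('style_5x.jpg')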


Nice work. I'd be curious how this technique compares with other fast style transfer methods, if any exist.


Results look pretty nice. Congrats!


This is ridiculously cool.



