I played with style transfer on a bunch of paintings from Wikipedia's featured pictures[0]. The results, when it works well, are absolutely fantastic, but when it doesn't, it basically just applies noise. Even 'obnoxious' styles like Van Gogh need to be cherry-picked for the most 'featureful' images.
That being said, I am pretty hyped to get my 1080 so I can process more than 1 image an hour at 720p. Also, take a look at the featured paintings: they're all public domain and absolutely gorgeous.
I just re-implemented something similar for Dreamscope that gets very similar results. If you want to do it yourself, simply convert the original and style images to HSV and create a third image like this:
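The comment cuts off before the actual recipe, so the channel assignment below is my assumption, not the author's: take hue and saturation from the original image and value (brightness) from the style-transferred result, which preserves the original's colors. A minimal per-pixel sketch using only the standard library:

```python
import colorsys

def combine_hsv(original_px, styled_px):
    """Per-pixel recombination: hue and saturation from the original
    image, value (brightness) from the style-transferred image.
    Both inputs are same-length lists of (r, g, b) floats in [0, 1]."""
    out = []
    for (ro, go, bo), (rs, gs, bs) in zip(original_px, styled_px):
        h, s, _ = colorsys.rgb_to_hsv(ro, go, bo)   # color from original
        _, _, v = colorsys.rgb_to_hsv(rs, gs, bs)   # brightness from styled
        out.append(colorsys.hsv_to_rgb(h, s, v))
    return out
```

Real code would run this over NumPy arrays rather than Python lists, but the channel swap is the whole trick.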
This works for images because images can be decomposed into different scales: in classical image processing via image pyramids, and in this case via deep convolutional neural networks, which capture progressively more complex features at larger scales. So you decompose an image, keep the high-level features fixed, and modify the image (using gradient ascent) until the low-level features match those from a sample of the artistic style.
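The classical pyramid decomposition mentioned above is easy to illustrate. This toy sketch substitutes 2x2 box averaging for a proper Gaussian blur, just to show how each level sheds fine detail while keeping coarse form:

```python
def box_pyramid(img, levels):
    """Build a simple image pyramid from a grayscale image (a list of
    rows of floats). Each level halves the resolution by averaging
    2x2 blocks, so fine detail vanishes while coarse form remains."""
    pyramid = [img]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        h, w = len(prev) // 2, len(prev[0]) // 2
        pyramid.append([[(prev[2*y][2*x] + prev[2*y][2*x+1] +
                          prev[2*y+1][2*x] + prev[2*y+1][2*x+1]) / 4.0
                         for x in range(w)] for y in range(h)])
    return pyramid
```

Feed it a 4x4 checkerboard and the next level is uniform gray: the fine-scale pattern (the "style" scale) is gone, while the average intensity (the coarse scale) survives.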
So to use the same algorithm for music you would have to decompose audio in a similar meaningful way. There has also been a lot of success in speech recognition with CNNs lately, but I don't know what the situation is with modelling music.
> So to use the same algorithm for music you would have to decompose audio in a similar meaningful way.
Well, in music, scaling could be compared to increasing/decreasing frequency. We all know that a song which is transposed by e.g. an octave still sounds the same (albeit lower/higher). So I think the concept translates well from images.
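The octave claim is just frequency doubling; in equal temperament, each semitone multiplies frequency by 2^(1/12), so twelve semitones exactly doubles it:

```python
def transpose(freq_hz, semitones):
    """Equal-temperament transposition: each semitone multiplies the
    frequency by 2**(1/12), so 12 semitones (an octave) doubles it."""
    return freq_hz * 2 ** (semitones / 12)
```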
That's not quite what I meant. The point of image pyramids is that they separate fine detail (e.g. style) from coarse detail (form). A note transposed down an octave is still exactly the same kind of object; it's no more abstract, so frequency is not an analogue of scale in images.
What you want is a progression of abstractions, e.g. note, chord, melody... but one where 1) each level is largely orthogonal to the others, so they can be separated and recombined with a lot of flexibility (definitely not true of notes and melodies), 2) the decomposition can be computed straightforwardly, preferably by a differentiable function, and 3) it separates style from form.
Yes, and I'll try to make it more explicit for people saying that sound is also two-dimensional. Before any analysis, sound is one-dimensional, unless you count stereo channels as an additional dimension; but then it makes equal sense to count color channels as an additional dimension in images. Frequencies only appear once you do a Fourier transform, and that transform is equally applicable to images as it is to sound, so images stay at least one dimension ahead of sound.
So it appears that dimensionality is just not a good way of explaining the difficulty here. I'd say the neural style technique was developed specifically for images, and it's becoming apparent that we can't simply apply it to sound and get good results. Getting there will probably mean working out something similar from the ground up.
Sound just has an amplitude at every sample, but that is too low-level to be very useful. Instead you would probably feed the net the same kind of features a human extracts: a list of frequencies at each point in time, each with an amplitude.
(In humans, oscillating hair cells perform an analog frequency extraction, while with computers you'd use a Fourier transform to do it mathematically.)
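That feature extraction (an amplitude per frequency bin, per time window) can be sketched with a naive DFT. Real code would use an FFT library; this is just the math spelled out:

```python
import cmath

def dft_magnitudes(frame):
    """Naive discrete Fourier transform of one frame of samples.
    Returns the magnitude of each frequency bin, i.e. the
    'amplitude per frequency' features described above."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]
```

A pure sine wave that completes two cycles in an 8-sample frame shows up as energy concentrated in bin 2 (and its mirror, bin 6), with the other bins near zero, which is exactly the frequency/amplitude representation a net would consume.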
[0] https://en.wikipedia.org/wiki/Wikipedia:Featured_pictures/Ar...