Notice how the transferred images have none of the power of the originals.
Take the first set of images, in the style of “Head of a Clown” by Georges Rouault. Not only do they fail to recreate the brush strokes, but the content of the restyled images doesn't fit the artist's style.
In the case of Munch's The Scream, and Starry Night, and every Monet, the subject matter is as important as the image. Maybe you'll be able to mask your Facebook profile in a combination of Murakami and Pissarro, but that's not the extent of their genius.
Similar things are attempted in music. While we can formulate pop music fairly programmatically, deeper meaning eludes machines. Bob Dylan didn't just write great lyrics; he tapped into the zeitgeist of the era in which he wrote. When Kendrick Lamar or Beyonce write about black struggle in America, that's in the context of a larger societal conversation.
I don't believe that neural networks will ever be artists because they work on copying existing patterns. Until there is an element of randomness fueled by context, they aren't creating anything new. That's where genius lies.
> Notice how the transferred images have none of the power of the originals.
They're not meant to. This is the development of base, mechanical technique, not of composition and theory. Lots of artists paint fairly mediocre pieces when developing technique and very often copy other works of art in the process.
> I don't believe that neural networks will ever be artists because they work on copying existing patterns. Until there is an element of randomness fueled by context, they aren't creating anything new. That's where genius lies.
What you ask for is actually a fairly simple architecture; it's just presently computationally intractable. So we're seeing if we can optimize the approaches to run on modern(ish) hardware.
Years ago a man made a computer program that could compose music. He kept getting told things like what you're saying: that it's soulless, that computers can never make real art, that it has no emotion, that it's just regurgitating patterns, etc.
Eventually an orchestra agreed to play one of the works. People liked it. Until they found out it was composed by a computer, and then they hated it. People are insanely biased against computers.
That said, the algorithm was not a neural network. NNs do have problems producing music: they forget long-term patterns quickly, so the works are easily distinguishable from human music. But it's possible this could be solved in the future with new techniques for training NNs with longer-term memory. I don't think it will be long before the world is full of AI-produced music. And you won't be able to tell.
I agree with the first half of this comment, but I don't follow you here: "Until there is an element of randomness fueled by context, they aren't creating anything new. That's where genius lies."
I suspect reframing of problems, seeing things in new light, paradigm shifts, new ontologies, whatever you want to call it are not quite so simple as context-dependent randomness! I don't think we understand this process very well right now.
Finally, I think there is an artfulness in copying existing patterns, because the way in which you "abduce"[1] an observation-explaining theory out of the infinite space of possibilities is a creative and aesthetic process.
Lil Wayne has a wonderful line in his song "6 Foot 7 Foot": "real G's move in silence like lasagna". Unbelievable.
Let's extrapolate that pattern:
"Real H's move in silence like phonebooks".
Not really the same ring to it, huh? It's not just pulling out a silent letter to emphasize the silence; there's also the context of Lil Wayne being a rapper and self-proclaimed G (gangster).
Is our extrapolation "new"? Sure, in that no one has (likely) ever said that. But while it mimics Weezy's style, it doesn't understand its context. Similarly, if Katy Perry sang the same line as Lil Wayne, the context wouldn't make sense (Katy Perry is no gangster...)
EDIT
A better example, specific to art, would be Warhol. He explicitly copied real-world objects as art, to create something new. But the newness wasn't that he made a clear copy, it was that his copies reflected the shift in materialism that came with mass-production. Warhol mass-producing "art" was a social comment that resonated _at that time_.
His art was more than the process, it was the context in which it was made and what that said more broadly about society. That's why it resonated.
> Lil Wayne has a wonderful line in his song 6'7', "real G's move in silence like lasagna". Unbelievable.
Sorry, I'm too thick. I mentally thought of a gangster moving laterally, splayed out and wondered how it would be quiet. I had to google this to understand that he meant silence like the letter 'G' in 'lasagna'.
Wow, this is more complex than I thought. It looks like in Italian, you have a single consonant ɲ. In English, ɲ doesn't exist. We say nj, a two consonant cluster, and most people can't properly distinguish the two sounds.
The end result here is that in English the g causes a sound change after the n, and basically qualifies it as a silent letter. This is a subtly different sound from the Italian version, where it merges with the n and does not qualify as a silent letter.
Extrapolation in text space is not the same as more abstract movement - even simple things like word2vec can capture a lot of meaning through context. Probably not enough to construct wordplay like this yet, but it is not outside the realm of possibility. At least a few journalists were fooled by a walk through latent space in a more complex model here.
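As a toy illustration of the kind of structure word vectors capture: relational meaning shows up as vector offsets. The four-word "vocabulary" below is made up for illustration; real word2vec vectors are learned from context over huge corpora and live in hundreds of dimensions.

```python
import numpy as np

# Hypothetical 3-d word vectors; real ones are learned from context.
vecs = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.1, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "queen": np.array([0.9, 0.8, 0.9]),
}

def closest(target, exclude):
    """Nearest word by cosine similarity, skipping the query words."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vecs if w not in exclude),
               key=lambda w: cos(vecs[w], target))

# The classic offset trick: king - man + woman lands near queen.
print(closest(vecs["king"] - vecs["man"] + vecs["woman"],
              exclude={"king", "man", "woman"}))  # -> queen
```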
This incarnation of neural networks will never be artists, true, but some future iteration of combinations of algorithms (some of which will be informed by current neural networks) likely will, in turn, produce many great artists.
That said, when people express skepticism about current AI-produced art, what they are really trying to express is that current AI lacks a theory of mind in its productions. Artists are not simply sampling from a generative model; they're generating a construction informed by complex inner states while also predicting how arrangements of states will affect a perceiver. Great artists, especially writers and poets, are able to operate at above-average levels of intentionality.
Until we have AI with basic common sense, the good news is that, for an undefined period of time, AI will foreseeably consist of tools which augment (human) artists and raise the floor for what counts as creative work. Even within a couple of years, work from talented teams that would impress today will be run of the mill thanks to intelligent tools, and one can hardly imagine what future talent enabled by semi-autonomous tools will create. This help could hardly be more timely, given rising production costs, falling profits, and new content-intensive modalities such as VR.
Billions of humans have all of the hardware to become great artists and it's still extremely unusual.
Art comes from life. AIs won't experience life until they have bodies. Until then the best they can be is a psychotic imitation machine chained to a human-selected data point.
That can have quite a lot of value, when paired with humans who can select data points intelligently. But we still don't even have a whiff of an understanding of how to create autonomous intelligence.
As a recent devotee of Prisma, I would agree that the highest-level discretionary function is (as yet) totally missing from the computational side...
...BUT I would also say that this technology is yet-another-one which can act as an effective 'force multiplier' in the sense that if you already _have_ a reasonable 'eye' you can use this as a very very powerful tool.
One of the benefits of that is that if, like me, you have only a 'decent' eye and no real visual genius, an app like Prisma plus a modest amount of straightforward, judicious additional editing/cropping can lead unnervingly quickly to results which I find _very_ satisfying.
To put that another way, I find that I can now in-phone produce images which for my own interests and tastes are quite arresting.
They are not fine art, or innovative... but they represent a rendering into pixels of notions I am not otherwise able to execute, and frankly am not able to even foresee as effective.
I find then that I am often in a very specific aesthetic mode using them: discrimination and rejection of all but the strongest candidates.
Feels like I am executing a GA selection function, essentially, and then just doing a little post production...
We don't have to have "deeper meaning". We just have to pass enough double-blind trials of whether an expert can distinguish "deep meaning" from "AI-generated meaning". I'm not a big fan of the "humans are special and machines can't capture magical human nuances" hypothesis.
In the context you're talking about, I agree, but non-contextual/pure-consumption art is a different story. Think about game art/textures, Instagram/imgur/reddit images that people look at for one second, like/upvote, and then move on from, YouTube background music, etc.
We might not get world-changing, paradigm-changing, transcendental, strong social commentary-inflected art, but we are getting some cool stuff people like looking at. Not to mention meme material ("Dog or Bagel" and friends) that uses stuff like this as part of its input (which, I would argue, is its own kind of art.)
> Until there is an element of randomness fueled by context, they aren't creating anything new.
A neural net has plenty of randomness (which shouldn't be confused with nondeterminism) inside it, right from the start with its randomly initialized weights & biases; no two trained style networks will be exactly alike and produce exactly the same output unless one takes major trouble to do so.
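To see how much the initialization alone matters, here's a minimal PyTorch sketch (mine, not from any of the papers): two networks with identical architecture but different seeds already produce different outputs before any training has happened.

```python
import torch
import torch.nn as nn

def make_net(seed):
    # Same architecture, different random initialization
    torch.manual_seed(seed)
    return nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(8, 3, 3, padding=1))

x = torch.rand(1, 3, 32, 32)           # one fixed input image
a, b = make_net(0)(x), make_net(1)(x)  # two untrained networks
print((a - b).abs().mean())            # nonzero: outputs already differ
```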
Any artist (or writer) has had the experience of people finding meaning and depth in their creations which they had no idea about, and may have thought among their weaker works. This can reflect unconscious work, of course, but it also reflects that the artistic process is as much about throwing random things at the wall to see what sticks and triggers an intuitive "hey, that's good!" reaction in other people. One may fail hundreds of times before getting a hit. Rarely does one sit down and think "I am going to write an intricate story about dying in Venice with such and such symbolism, a buildup to the ending, and references to these specific past authors", write it out in full in one pass without revision, decide it's good, and have everyone else agree. Art is as much about editing, curation, and recognition as it is about planning. A neural net could create great works of art without any planning or intention of doing so, if the right person sees it and selects it out of the vast stream of images the net will generate.
I totally agree that the transferred images don't have the power of the originals. I know nothing about art (well, painted art), but it's striking how the transferred images seem like cheap imitations in the style of the originals. And yet, I'm still impressed by the technology.
> Notice how the transferred images have none of the power of the originals.
Partly because they are like 200x200 pixel images. There is no power in the original either at that size. And I've seen a ton of these posts like "look at these beautiful images", and the images are absolutely tiny. It's really ridiculous and annoying.
> In the case of Munch's The Scream, and Starry Night, and every Monet, the subject matter is as important as the image.
Not to mention the transformations the artists made that the software cannot: turning pinpoint stars in the sky into huge pinwheels and fireballs; turning a human face and hands into an inhuman creature.
Yeah, to achieve that the software will have to understand the semantics of the image (at least crudely), which is already possible, and how the depiction is a distorted representation of real objects (harder, but it is already clear that this is possible).
Capturing the semantic meaning of the distortion would be another increase in complexity, but perhaps not necessary, as the software doesn't have to understand why we selected "The Scream" as the style we want (although it would certainly be cool to be able to specify "use the style of 'The Scream', but expressing melancholy instead of horror").
While I agree with everything you're essentially saying...
> Notice how the transferred images have none of the power of the originals.
I disagree. Maybe that's because I've been described as emotionless, but the original art doesn't have any "power" to me at all. I honestly see all of them as basically the same as far as impact goes.
Now I'm likely in the minority here, but it shows that art is going to be very personal and there aren't going to be very objective ways of measuring "power" or "emotional impact" except through aggregate polling.
This instagram account - https://www.instagram.com/prisma/ - was posted below and contains selected images produced by the Prisma app. They might not compare with the masters, but they are definitely art in my opinion.
Deeper meaning is inferred by the viewer/listener every bit as much as it is implied by the creator. I think the future is in empowering artists with better computational tools to iterate, create, and explore.
That said, I think saying never is a bit strong. People like an amazing variety of things, and given enough design, statistical modeling, luck, and randomness something interesting enough for a listener to infer meaning can pop out. You should check out this article about David Cope's work [0] and thoughts on creativity.
I think it's a lot more reasonable to think of algorithms like this as new media for artists rather than a replacement for them. Obviously the choice of subject matter is important for art, but the subject matter in these examples was chosen by humans, was it not? Humans took the original pictures that the algorithms were applied to. Their choice of subject wasn't made for artistic impact, though; it was made to give them a base to demonstrate the techniques with. Artists do this with all types of media while they're learning and trying things out. There are already some artists out there doing really cool things by incorporating neural nets into their work, and I expect there will be many more as the tools become more mainstream and available. Why do you feel the need to be so dismissive of new ideas in art?
There is an element of randomness fueled by context, at least in the technical sense of there being random variables with distributions conditioned on the inputs.
This has bugged me for some time now, but how does an artist like Beyonce know about the "black struggle" in America? She's indeed black, she is an American, and she lives there, but judging from half a world away, it seems that she lives in a parallel universe compared to what poor black people experience day-to-day in the States.
Surely those aren't good ones to compare against due to bias. A more fair way to judge would be to have people rate unknown artists' work as well as the machine output. Then see how that pans out over time.
Judging from some of the stuff on display in galleries right now, I'd be shocked if machines can't outperform or match. And they'll only get better, right?
After being infatuated with a lot of art forms and media, I recently found myself perplexed by painting (and some forms of illustration). There's some magic beauty there that I often feel is lacking in, say, movies, and that can't be transferred. Something about re-expressing our biological perception and the "soul"-level ones, intertwined.
Something that could help is feeding the NN more paintings in the same style as the style base. Otherwise the NN seems to get "too fixated" on trying to make a painting that's similar to the original (see the suns appearing when transferring from Starry Night, buildings disappearing, etc.)
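Concretely, one way this could work (my sketch, not anything from the paper) is to make the style target an average of Gram matrices over several paintings instead of the Gram matrix of a single one. In practice the feature maps would come from a pretrained network such as VGG; random arrays stand in for them here.

```python
import numpy as np

def gram(features):
    """Gram matrix of a (channels, height, width) feature map."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)

# Stand-ins for VGG feature maps of, say, five Monet paintings
rng = np.random.default_rng(0)
style_features = [rng.standard_normal((64, 32, 32)) for _ in range(5)]

# Style target = average Gram matrix across the whole set,
# rather than the Gram matrix of one specific painting.
style_target = np.mean([gram(f) for f in style_features], axis=0)
```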
> Until there is an element of randomness fueled by context, they aren't creating anything new.
That argument could be made about any innovation.
The Google paper is an iteration on neural style transfer; just because it doesn't mimic the "soul" of art doesn't mean the field is not worth pursuing.
You're correct. It's imitating style, not creating substance.
This is effectively the 2d image version of those "random text generators" that ape a given style. It's really interesting and cool but it's not "Hollywood AI" in any sense.
Why should we be disappointed if it's not actually artistic-genius-in-a-box? Here is an interesting new technique that's just been invented. Its artistic possibilities have barely been touched.
Real-time neural style transfer is not new; in the past year there have been several academic papers [1-4] on this topic and several open-source code releases.
The novelty of this work is a clever way for training a single network that can apply many different styles; existing methods for real-time style transfer train separate networks per style. Their method also allows for real-time style blending, which is very cool and to my knowledge has not been done before.
(Disclaimer: I'm the author of [2])
[1] Ulyanov et al, "Texture Networks: Feed-forward Synthesis of Textures and Stylized Images", ICML 2016
[2] Johnson et al, "Perceptual Losses for Real-Time Style Transfer and Super-Resolution", ECCV 2016
[3] Li and Wand, "Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks", ECCV 2016
[4] Ulyanov et al, "Instance Normalization: The Missing Ingredient for Fast Stylization", arXiv 2016
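If I understand their approach correctly, the core of the many-styles trick is conditional instance normalization: the conv weights are shared across styles, each style owns only a per-channel scale and shift, and blending two styles is just interpolating those parameters. A rough PyTorch sketch of the idea (mine, not their code):

```python
import torch

def conditional_instance_norm(x, gamma, beta, eps=1e-5):
    """x: (N, C, H, W); gamma, beta: (C,) parameters owned by one style."""
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True)
    x_hat = (x - mu) / (sigma + eps)
    return gamma.view(1, -1, 1, 1) * x_hat + beta.view(1, -1, 1, 1)

# Each trained style owns its own (gamma, beta); everything else is shared.
C = 64
g1, b1 = torch.rand(C), torch.rand(C)   # style 1 parameters
g2, b2 = torch.rand(C), torch.rand(C)   # style 2 parameters

x = torch.rand(1, C, 32, 32)            # some intermediate feature map
alpha = 0.5                             # blend weight between the two styles
blended = conditional_instance_norm(
    x, alpha * g1 + (1 - alpha) * g2, alpha * b1 + (1 - alpha) * b2)
```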
Yes, I think that is a likely explanation. Also note that Vincent Dumoulin is an author of both the deconv-checkerboard blog post and the new paper from Google, and that the new Google paper uses the upsample+convolution technique suggested by the deconv-checkerboard blog post.
I've noticed that your fast-neural-style project uses instance normalization instead of batch normalization.
Batch normalization has the benefit that you can merge the gamma & beta into a convolutional layer on the forward pass, which makes it a lot faster by allowing you to skip a step when building the styled images using a trained model.
Can the same be done with instance normalization? I didn't see a formula in the paper but I would think so, since they are fairly closely related.
I've found that instance normalization usually gives better results so I prefer it over batch normalization.
With batch norm you learn four scalars per convolutional feature map: mu (mean), sigma (stddev), alpha (scale) and beta (shift). During training, mu and sigma are estimated from data statistics; during testing they are constants, either estimated from the entire training set or computed as a running mean during training. At test time the batch norm operation is then alpha * (x - mu) / sigma + beta, which is a linear operation since everything but x is constant; since it is linear it can be merged into a convolutional layer.
With instance norm, mu and sigma are estimated from data statistics during both training and testing; this means that the test-time forward pass is nonlinear, so it cannot be merged into a convolution (which is linear).
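For concreteness, the fold itself is just per-output-channel algebra; here's a minimal numpy sketch of the bookkeeping (my own, assuming the usual conv-then-batch-norm ordering, not code from any paper):

```python
import numpy as np

rng = np.random.default_rng(0)
out_c = 8
W = rng.standard_normal((out_c, 3, 3, 3))   # conv weights (out, in, kH, kW)
b = rng.standard_normal(out_c)              # conv bias

# Test-time batch-norm constants, one per output channel
mu, sigma = rng.standard_normal(out_c), rng.uniform(0.5, 2.0, out_c)
alpha, beta = rng.standard_normal(out_c), rng.standard_normal(out_c)

# alpha*(conv(x) + b - mu)/sigma + beta  ==  conv'(x) + b'
scale = alpha / sigma
W_folded = W * scale[:, None, None, None]   # scale each output filter
b_folded = scale * (b - mu) + beta

# Sanity check on per-channel pre-activations (z stands in for conv(x))
z = rng.standard_normal(out_c)
assert np.allclose(alpha * (z + b - mu) / sigma + beta,
                   scale * z + b_folded)
```

With instance norm, `mu` and `sigma` above would be functions of the input rather than constants, so `scale` would change per image and there would be nothing constant to fold into the weights.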
Don't forget everyone other than Prisma doing the same thing, sometimes better.
Here's mine. It runs on Mac & Windows, runs locally on all its styles unlike Prisma, runs HD images, and can process unlimited video: http://macdaddy.io/Style/
So I've been playing with style transfer for about six months. I even combined it with another technique to actually get something that I liked enough to have printed on canvas (https://github.com/j2kun/art-21-logo)
And as much as I love the math, the computer science, and the tech behind it, I have to admit that the novelty wore off quick. Not only does a style-transferred image not have the power of the original, as another commenter said (we can debate about why), but the longer you look at it the more you see the ugly artefacts. I understand it's not perfect, and that there will be improvements, but to me that's what's primarily holding back style transfer. Not runtime or memory constraints, but whatever first principles need to change to get a global unity of style in the image.
Of course, if you introduce the human hand to fix the artefacts (which I think is great, and I know people are working on this), then style transfer can still be super useful. I can definitely see this becoming a new photoshop/illustrator integration.
Did you try putting more structure (concentric outlines or a gradient or something) in the input image? Seems like style transfer would fall flat on a binary image.
Gradients in the source image tend to yield poor results. They often awkwardly reappear in the result, looking like an artifact that doesn't fit with the style image at all.
Source images in a posterized style, with large flat surfaces in unified colors, tend to yield great results with few artifacts. With that in mind, I find the raw "21" logo to be a rather good source image. Two colors is minimalist, but it's a choice. I'm sure GP went for simplicity quite deliberately.
These are thrilling developments, but every time I see examples they're relatively "low-res" images. I would love to be able to zoom in and see what's going on, but I suspect that we're seeing the output exactly as it was rendered by the ML processing. Is that the case?
Actual paintings have brush strokes and even surface effects that aren't visible until you're looking very close up or at a high-resolution image. This is a key feature of the enjoyment of paintings that often gets lost on people.
I guess the ultimate "Turing test" for this stuff would be to have the system literally paint these images. That is, with actual paint and a paint brush, NOT with pixels. This would give a really fresh and exciting twist of meaning to the title of the old essay by Walter Benjamin, "The Work of Art in the Age of Mechanical Reproduction."
Imagine this used to create a film noir style transfer style network. Throw it onto Daydream. You could have a game where you're Sam Spade, walking around 1930's San Francisco.
Or maybe it's a gloomy day on Ocean Beach. Make a sunny day filter, put it on Daydream, and you're ready for a picnic!
Interesting. Your comment gets at the question of what constitutes art.
I think there are many forms and purposes of art, but the kind we most often consider authentic usually seems in many ways crude compared to what preceded it:
- Jazz began as something that sounded cacophonous compared to the more obvious order of hymns and classical music. Jazz transformed what had been brief moments of dissonance and arrhythmia into elaborate interleavings of tension and resolution, structure and chaos.
- Rap began by taking the repetition and rhyme of poetry and the rhythms of African music and creating something many would not classify as music when they first heard it.
- In order to be deemed fresh and worthy of recognition, fiction must not feel like a rehash of the great classics. In most cases the best new fiction smashes conventions in at least one narrow way. I'd argue that an unattributed work from a great author would not likely be publishable today because of that. Quality is not just subjective, it is relative.
- In fashion, the moment a style is obvious it becomes outdated and means something quite different. In some cases the latter can be marketed successfully, but it is hard to think of it as art by that point. Think of the way iconic things felt when they were new (and ugly to most people at that moment) and someone was making them iconic.
So while these algorithms create believable surface features mimicking established instances of "art" (they might persuade me that a short story was written by Fitzgerald when in fact it had been written by a grad student and run through the algorithm), they won't necessarily make it a good short story.
But in another year researchers will likely understand what makes humans consider things stylistically novel and then all the above points will be irrelevant.
Let's take "survive" to mean survive financially based on one's ability to produce art with commercial value during their lifetimes.
As others do, I think art is much more than the piece itself, through whatever senses we use to perceive it. It is also the story of its creator and the wider context of its creation, as well as the history of its movement through the world, including the stories of its various owners.
I think this leaves tremendous room for human artists today and far into the future. Further, I believe some artists, and there are certainly those who have tried already, will direct machine-learning-based art, adapting it to be more likely to succeed based on the factors I indicated.
Musicians have always been about the performance more than the creation of the music. A ton of musicians do not write their own music, for instance, but are still really popular.
While these results are certainly impressive, it's hard to say that they're aesthetically or artistically "good."
I played around with Photoshop filters a bunch when I was younger -- there are (or used to be) a good handful of "artistic" filters, imitating watercolor, pointillism, ink drawing, etc. These style transfer results are reminiscent of nothing so much as those effects: they look superficially appealing, but have the same blind, dead quality.
Maybe this is an indictment of "style" as it's typically understood? A certain brush technique does not an artwork make.
The moderately impressive part of these algorithms is the fact that they can "learn" a style from just one image.
However, in reality they are not nearly as universal as these papers make them seem. Pretty much all my attempts to use these algorithms resulted in garbage with ugly artefacts. Which, actually, was pretty informative, since I saw exactly what kinds of features they pick up. I am disappointed that these blog posts don't include any "failed" images of that sort.
More than likely you're just biased because you know ahead of time it was done by a computer. We could probably take one of the images, present it in the right context, like in an art gallery where we pretend that it was created by some fictional up-and-coming artist, and people would love it instead.
People will love all sorts of stupid shit if it's put in a gallery. (Commentary on that phenomenon is, in fact, an entire art movement -- readymades.) There are two questions we should be asking here:
1) Does a positive gallery reception indicate artistic value? Presumably not.
2) If we assume that these style transfers have artistic value, is it similar to the value of the original paintings? This is a more difficult question, but the answer does seem to be "no."
Prisma was a land grab because the original authors didn't make an app. Hell, I started working on my own version at the time, since it was such an obvious app.
The original authors finally got around to making their DeepArt app, so we have at least two players in this space atm.
I think either of these two players will incorporate this research, but there's no clear incentive for anyone else to try to get in on this market now. It's not clear that there's a whole lot of money to be made here, but who knows, maybe usage will explode. I mostly expect an acquisition.
Plug for my mostly ignored version, that runs on Mac & Windows. It processes high res images unlike Prisma, and runs entirely locally. And can process videos of arbitrary length & resolution. http://macdaddy.io/Style/
I imagine we will see many clones when Google open sources their implementation too (as mentioned in the article, along with others mentioned in this thread as well). I wasn't aware of DeepArt. I'll check it out.
Prisma never had "ownership" of this effect; the papers of Neural Style transfer have been out for months, with this Google method being another iteration.
Prisma was just the first polished app released using style transfer. (However, since the technique is public, it will make an acquisition of Prisma less worthwhile)
Did I miss the part of the article where they create a Monet style from multiple Monet works and apply it, rather than using the style of an individual painting? I thought that was the point of this. They sort of do it in the last image, I guess. Seems to pick up so much actual color, though! (as opposed to perhaps the quality of the color or how it's used, which is just as characteristic of a particular style as the color itself)
There has been other recent work (Preserving Color in Neural Artistic Style Transfer) that allowed control, separate from the "style", over how much of the original painting's color palette to use.
It isn't too hard to extrapolate from this to using separate images as sources for the style and palette, and from there to transforming the "content" image's color palette to resemble the palette of the "style" image in terms of saturation, value, contrast, etc. (but not hue) by manipulating histograms, and then using that result instead of either the "content" image's or the "style" image's palettes.
Because if you have a photo of a bright red cardinal and a desaturated pastel drawing of a pale blue boat, the result you are likely to be most pleased with would be a pastel cardinal that is, well, pink, and not a pale blue cardinal.
Of course, the harder version of this would need to recognize the semantics of both the "content" and the "style" image in order to construct a more meaningful palette to be applied (which is what you're getting at), and that same semantic information will also help with transferring the style more meaningfully (eg. crude palette-knife houses in the background but small brushed strokes depicting the people in the foreground).
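To make the "saturation and value, but not hue" idea concrete, here's one crude way it could be sketched (my own toy version, using rank-based histogram matching in HSV space; not from the paper):

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb

def match_channel(src, ref):
    """Rank-based histogram matching of one channel onto another."""
    order = np.argsort(src.ravel())
    matched = np.empty(src.size)
    ref_sorted = np.sort(ref.ravel())
    matched[order] = ref_sorted[np.linspace(0, ref.size - 1,
                                            src.size).astype(int)]
    return matched.reshape(src.shape)

def transfer_palette(content_rgb, style_rgb):
    """Borrow the style's saturation/value distribution, keep content hues."""
    c, s = rgb_to_hsv(content_rgb), rgb_to_hsv(style_rgb)
    out = c.copy()
    for ch in (1, 2):                      # S and V only; channel 0 (hue) kept
        out[..., ch] = match_channel(c[..., ch], s[..., ch])
    return hsv_to_rgb(out)

# Toy usage: a saturated "cardinal" photo meets a pale pastel "boat" drawing
content = np.random.rand(64, 64, 3)
style = np.random.rand(48, 48, 3) * 0.3 + 0.6   # desaturated, bright
result = transfer_palette(content, style)       # pink-ish, not blue-ish
```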
I agree. While this is super impressive stuff, it seems to match palette far more than style. Van Gogh has a distinct style even though not all of his paintings have the same color palette.
This is cool for my Instagram account, and I do really like Prisma, but... Google and others do hard work in neural nets. I feel like there must be less obvious applications here than what's discussed in the blog post. Anyone have ideas?
Does this go far beyond photos? Will I eventually be able to translate my ideas and thoughts into anyone's style programmatically? Could each user view the same content interpreted in a different style that they like best? Might I be able to extrapolate how a user might use my website from their clickstream style on many sites? etc.
I'll be the first one to admit I know next to nothing about art, but some of those generated paintings are quite beautiful. The "Starry Night" driven ones in particular. Very cool research.
There was a twitter bot that would do this for you. Unfortunately it got too many copyright complaints and was shut down. Here's one it did for me: https://twitter.com/sep332/status/705882039720013824 (scroll down for the result)