IDK why this is being downvoted, it was indeed published two years ago; they just apparently repackaged this as an easier-to-use tool:
https://arxiv.org/abs/1903.07291
Before, there was a dense scientific paper. Now they've released an incredibly simple tool that lets anyone draw photorealistic images with simple strokes.
Saying that this is repackaging something into an easier-to-use tool seems like quite a stretch. They didn't put a GUI on curl or something.
I hate the "download" button at the top that points to "#something" on the page that centers on an image that fills the whole screen (on a 4k laptop) so that you can't see the real download button below it.
Ah, that's kind of reassuring for Canvas, because I was really disappointed when I tried to play a bit with it. I was like: “meh, how is that even worth a press release?”
It would be really interesting/fun to run this on a frame-by-frame output of classic 8-bit video games and see what it does. I know it wouldn't make the game look real within its own concept, but the ordered/familiar inputs to the AI might generate some interesting video outputs.
I'm not entirely sure how this would be achieved.
When you draw in the app, you use a brush and have to pick materials from a palette (like "sky" or "ground" or "stone wall", etc.). It doesn't seem to have any sort of "import an image" feature, because what would that even mean in their model?
The approach I would probably consider is to use a modified ROM so that the different sprites in the game are different solid colors. Then I'd write some kind of mouse automation to use those captured images and draw the frame in the app, clicking on the various palette options based on color.
The next challenge is that the Canvas app doesn't let you set individual pixels; the smallest brush is ~10px across on its ~550px canvas. Maybe I'd have to settle for picking a Z-ordering and just drawing everything approximately, or maybe you could attempt some sort of path-routing algorithm to draw along the edges of the shapes and fill in the centers.
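Something along these lines might work as a starting point: a rough sketch of the color-to-palette automation, assuming PyAutoGUI for the mouse clicks; the palette colors, button positions, and canvas coordinates are all made up for illustration, not taken from the app.

```python
# Rough sketch of the mouse-automation idea. PALETTE maps the solid
# sprite colors from a modified ROM to on-screen palette-button
# coordinates; all coordinates and colors here are hypothetical.
from PIL import Image
import pyautogui

PALETTE = {
    (0, 0, 255): (50, 100),      # "sky" button (made-up position)
    (0, 255, 0): (50, 150),      # "grass"
    (128, 128, 128): (50, 200),  # "stone wall"
}
CANVAS_ORIGIN = (300, 100)  # top-left of the drawing area (made up)
SCALE = 2                   # game-frame pixels -> canvas pixels

frame = Image.open("frame.png").convert("RGB")
for color, button in PALETTE.items():
    pyautogui.click(*button)  # select the material for this color
    for y in range(frame.height):
        for x in range(frame.width):
            if frame.getpixel((x, y)) == color:
                pyautogui.click(CANVAS_ORIGIN[0] + x * SCALE,
                                CANVAS_ORIGIN[1] + y * SCALE)
```

Given the ~10px minimum brush size, you'd probably step x and y by the brush width rather than clicking every single pixel.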
I remember when downloading 50KB was a serious commitment, over a phone line, of course. It took long enough that inevitably someone else in the house would try to use the phone and your download would get disconnected.
Indeed. I remember downloading a 3 MB mp3 file and it would take numerous tries and quite some time. Winamp would play the partial though so you could put it on repeat and get a few more seconds of the song on each run. This was back when you could find websites that just offered a collection of mp3s for direct download! The internet was a wild lawless place of free information back then.
When I got the first smartphone for my wife (then girlfriend), we set it up while we were in a hotel (without wifi). The 10 MB of mobile data used for setting up email etc. ended up costing more than the phone :(
Was anyone really training ML models on Mac to begin with? Pretty sure most people run Jupyter on a Linux server somewhere and develop in their browser.
I wonder how difficult it would be to make something similar that generated 3D models. Most of the examples look like they'd make good video game levels.
I wondered the same. There is some solid competition in this area right now, even without AI-assisted asset generation.
Unreal 5 has a new, free, 3d model library integrated as Quixel Bridge. [1]
Kitbash 3D, a company selling modular 3D sets used regularly in Beeple's 2D work, provides mid-res, theme-based sets for customized use.
Neither takes into account the idea of fully featured 3D objects being built from basic primitives using ML.
It makes sense that it will go in this direction though, because it means designers can get unique 3D assets customized to the size and dimensions they need with less work.
Couple this with Apple's photogrammetry in iOS 15 and it seems the pool of original 3D assets available as training data will swell greatly.
Dungeon Alchemist seems really cool (I'm a backer), but I'm not entirely sure that it is related. DA is basically procedurally generated furnishing (with a few params), but it doesn't create 3D models from what I understand, it "just" shuffles around furniture.
Well, I think there is enough interesting research in place to put things together. Not in a single model, but we have:
0. This neural thing, of course, to create landscape-like 2D projections of a plausible scene.
1. Wave-function collapse models that synthesize domain data quite nicely when parametrized with artistic care - this is a "simpler" example of the concept. https://github.com/mxgmn/WaveFunctionCollapse
2. A fairly good understanding of how to synthesize terrain. Terragen is a good example of this (although not public research, the images drive the point home nicely). https://planetside.co.uk/
So, we could use the source image from this as a 2D projection of an intended landscape as a seed to a wave-function collapse model that would use known terrain parametrization schemes to synthesize something usable (so basically create a Terragen equivalent model).
I think that's plausible, more or less (a toy sketch of the first step of that data flow is below). But it's still a "research"-level problem, I think, not something one can cook up by chaining the data flow of a few open-source libraries together.
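As a toy illustration of the "segmentation map as a seed for terrain synthesis" idea: a minimal sketch assuming NumPy/Pillow/SciPy, with a made-up label-to-elevation table (the colors, file names, and elevations are illustrative, not anything from GauGAN or Terragen).

```python
# Toy sketch: map each material label in a GauGAN-style segmentation
# image to a base elevation, smooth it, add a little noise, and save the
# result as a heightmap that downstream terrain tools could refine.
import numpy as np
from PIL import Image
from scipy.ndimage import gaussian_filter

BASE_ELEVATION = {          # hypothetical material colors -> heights
    (0, 0, 255): 0.0,       # water
    (0, 255, 0): 0.2,       # grass
    (128, 128, 128): 0.8,   # rock / mountain
}

seg = np.array(Image.open("segmentation.png").convert("RGB"))
height = np.zeros(seg.shape[:2], dtype=float)
for color, h in BASE_ELEVATION.items():
    mask = np.all(seg == color, axis=-1)
    height[mask] = h

# Smooth the hard label boundaries and add some rough detail.
height = gaussian_filter(height, sigma=8)
height += 0.05 * np.random.default_rng(0).standard_normal(height.shape)
out = (255 * np.clip(height, 0, 1)).astype(np.uint8)
Image.fromarray(out).save("heightmap.png")
```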
I think the theory's all there; it just needs reference material on the one hand and the work to be put in on the other. With the new Unreal 5 engine, I think there is a lot of room for technology where an artist sketches out a rock and tools come in to generate the small details, much like there are tools like SpeedTree and co. nowadays to procedurally generate content.
I believe the current understanding of GAN copyright is that the "minimum degree of creativity" happens when a human chooses the inputs/outputs and copyright is assigned to the human at that point. Drawing the input image for GauGAN probably suffices.
Fully automated outputs (like pulling an image at random from thispersondoesnotexist.com) would be public domain since non-humans cannot hold copyrights and no creativity was applied.
This is analogous to the "creativity" of a photo being the settings and framing done by the person who set up the shot and is why the famous "monkey selfie" fell under public domain[1].
But what if the pen draws by itself? You just say "draw a dog", and it does.
I do think you should always be the copyright owner, unless it's clearly stated in their terms that any image created using their tool is owned by nVidia.
Interestingly, if I employ an artist to produce a work (e.g. software code), usually the employment contract would say the copyright belongs to me and not that artist.
But it's still just a tool that acts on your input. Does an IDE that inserts a lot of boilerplate and autocompletions get the copyright to your codebase? Nope.
I'd just like to point out that this line of inquiry is not some unanswered philosophical question. All of capitalism is focused on this question of ownership. Who owns the picture? The answer is always whoever the parties involved agreed would own it. Both options can exist and they'll have different prices.
This same question often comes up with self-driving cars and "fault", and it seems to regress into the same trap. Ownership of _risk_ is one of the primary concerns of capitalism. The question is not, "who should be at fault?", it is instead "what is the cost of this risk?" and then we buy and sell that risk like everything else (which is also how we determine that cost). If the self-driving advocates are right and self-driving is safer, then the risk will likely cost less than your current insurance.
Of course, it's not always clear. If the parties can't agree who owns a thing, they often use some legal mechanism to resolve their dispute.
Because actually the user isn't. The AI is. AIs don't have a right to copyright. You drawing a few lines and the AI making the actual image does not make you the creator of the image.
While the analogy is correct, the binaries generated by compilers do involve integration of creative work beyond that in the code compiled. The binary as such is a 'derivative work' generated from the creativity of the authors of the source code, compiler, and standard libraries. What happens is that the copyright licenses coming along with compilers and standard libraries explicitly grant generous permissions to the users of the compilers.
For algorithmic art, likewise the developers of the software typically provide permissive licenses to the users of the software.
AI makes this harder because the works are massively derivative works, which, AFAIK, do not have much precedent in law. The question is not easy to answer unless the author (Nvidia in this case) owned the copyright over all the training data.
Would you mind elaborating a bit more on what you mean? It's very fashionable to be concerned about model bias at the moment, but it's not clear to me what the issue you're describing would be? Something like: trees would end up looking too much like the same tree?
Right, that was the assumption I was alluding to at the end of my comment. That said, it still doesn't fully resolve the question and unfortunately leaves the statement in hand-waving territory. "Boring" isn't really a measurement we can take and discuss super effectively, but we do have actual metrics across visual datasets that span basically all of what you might see as a human.
By chance, are you aware of any research on this topic?
I would say it leaves the statement in hypothesis territory, not hand-waving territory. If we can't discuss or measure "boring" effectively, that is a problem with our instruments and vocabulary, not the assertion.
"Boring" is an absolutely crucial thing to be worried about when it comes to anything remotely artistic.
It's not a hypothesis though; there's nothing in it that we can test. As presented, it's an assertion that has zero evidence to support it. That's the part I was hoping to get some actual clarification and precision on. When discussing machine learning, bias is a term that has a quantifiable definition in its different contexts... "boring" doesn't. It's just FUD to snipe this kind of work with some kind of 'bias' fear without any additional evidence or thought.
You can easily generate a landscape and then do a paintover. People already do that with SketchUp 3D models for backgrounds. I don’t think any professional would just literally copy and paste the thing.
Given its deep integration with their RTX APIs, I imagine even if the source code were open, the only way to get at the RTX ML-specific stuff is via their Windows driver.
Hmm, slightly disappointed. I thought I'd make a mashup video game short with generated art.
The output resolution is locked at 512x512.
The "target style images" seem to be locked to that handful that come with the application.
The brush materials don't include anything man-made.
I am wondering whether this Nvidia Canvas and the Apple Object Capture [1][2] will make graphics or 3D modelling cheaper or much less time-consuming. Instead of using tools on the computer, which were never really good enough for human creation, people could now create photos, paintings, or models in the real world and scan them into a computer for further editing.
Did they take a bunch of reference pictures where they said "this part here is water, this part here is rocks, this part here is grass, etc...", and somehow train a model from that?
Q: How does the AI in Canvas work?
NVIDIA Canvas uses a GAN (Generative Adversarial Network) to turn a rough painting of a segmentation map into a realistic landscape image. 5 million photographs of landscapes were used to train the network on an NVIDIA DGX.
Q: Is Canvas related to GauGAN?
Canvas is built on the same research core that NVIDIA showed in GauGAN.
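For context, the GauGAN/SPADE paper's core trick is spatially-adaptive normalization: the segmentation map is injected at every scale of the generator by predicting per-pixel scale and shift parameters from it. A minimal sketch of that layer in PyTorch (simplified sizes and names, not NVIDIA's actual code) might look like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    """Sketch of spatially-adaptive (de)normalization from the SPADE paper.
    Normalizes the features, then modulates them with per-pixel scale and
    shift maps predicted from the one-hot segmentation map."""
    def __init__(self, num_features, num_classes, hidden=128):
        super().__init__()
        self.norm = nn.BatchNorm2d(num_features, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(num_classes, hidden, kernel_size=3, padding=1),
            nn.ReLU())
        self.gamma = nn.Conv2d(hidden, num_features, kernel_size=3, padding=1)
        self.beta = nn.Conv2d(hidden, num_features, kernel_size=3, padding=1)

    def forward(self, x, segmap):
        # Resize the segmentation map to the current feature resolution.
        segmap = F.interpolate(segmap, size=x.shape[2:], mode='nearest')
        actv = self.shared(segmap)
        # Per-pixel, per-channel modulation conditioned on the segmentation.
        return self.norm(x) * (1 + self.gamma(actv)) + self.beta(actv)
```

The generator stacks residual blocks built from layers like this, and the adversarial training against those 5 million landscape photos is what turns the brush strokes into photorealistic output.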
I find the demo video terrible, though. Of course, it gets the general idea across but it's too fast-cut to see any details which - for something visual like painting - is kind of the whole point.
This is Windows only, probably due to the lack of support for the relevant GFX stuff on Mac?
Incidentally, does anyone know of a straightforward and quick Windows-in-the-cloud solution? A bit like GeForce NOW, but giving you an entire VM without setup and all that?
I'm using an Nvidia RTX 3090 on Linux with the proprietary drivers for machine learning and man it fucking sucks as a desktop lately.
So my best guess is nobody at Nvidia uses the Linux desktop as a workstation.
1) My HDMI screen hasn't been able to wake from sleep for over a year now; the only way to make it wake is to switch to a text tty and then back to X11.
2) Wayland still isn't supported. The default Ubuntu 18.04 gdm doesn't even work so on first boot with the proprietary driver everything seems broken.
3) Since Firefox 89 switched to accelerated rendering by default, windows randomly disappear and various video players have lock contention, drop frames at 60fps, and downscale video on a fucking $1600 video card.
4) HDMI audio crackles and pops with a 2 second delay after a few hours and I have to restart pulseaudio on the command line.
5) I file support tickets on Nvidia's website and the company never responds, they don't even dupe them with some other old ticket.
I mean, using Ubuntu 18.04 means using ~4-5-year-old software which only gets "security updates" (not even patch updates; e.g. they use a Qt LTS from 2017 and don't even update the patch version, it's still 5.9.5 while Qt's is 5.9.9). Why would you expect things to work correctly with a 1-year-old graphics card? On Arch Linux, Wayland with an Nvidia card works pretty much fine.
This has been broken since 2019. I’m running Ubuntu 20.04 with the 5.11 kernel and the 460/465 drivers and all these problems are still happening.
And also, yes, I expect a 5 year old operating system to still work. Windows 10 does and it came out in 2015. These are professional tools for my fucking job.
> And also, yes, I expect a 5 year old operating system to still work. Windows 10 does and it came out in 2015.
but the Windows 10 you run in 2021 is super different from the Windows 10 you installed in 2015; there are a ton of (sometimes fairly breaking) updates:
running an up-to-date win10 is basically equivalent to updating to every ubuntu release, LTS or not. Kernel is different, libc is different, system APIs implementations are different, everything is updated every few months - even the start menu pretty much changes all the time.
That is true, but I would like to point out that Windows still has issues on my bog-standard Intel/Nvidia rig - e.g. Linux can't sleep properly, but Windows either fails to resume properly or randomly wakes me up at night by turning back on and revving the fans.
Similarly, my new iPad pro is great until you need to do something apple haven't approved of (e.g. I can't watch a bunch of movies I have had copies of for years due to apple not letting VLC ship certain codecs)
Yeah you’re right, even low-power Intel GPUs can render an X11 desktop with audio wayyyyy faster and with fewer artifacts than the proprietary Nvidia driver.
The software-focused teams all use Linux workstations AFAIK; look at their job boards and Blind. Their embedded systems (robotics / AV) are all Linux as well.
I simply do not believe that given how bad their drivers are.
I would not be surprised if most or all of their Linux engineers ssh into Linux from a Windows machine given how stable their command line stuff is in comparison to the graphics (once you figure out the correct permutation of userland/kernel pieces to get CUDA+cudnn+TF working anyways).
Their recommended method of installing cuda includes a 64-bit version, but not a 32-bit version. Nvidia's cuda packages are marked as incompatible with debian's nvidia-driver-* packages, so installing it uninstalls the 32-bit version. As a result, I need to choose between steam (which uses the 32-bit graphics library) and an updated cuda version (since Ubuntu 20.04's repo is pinned at 10.2).
That happened to me yesterday on my work laptop. System 76's help documentation said to chroot in from rescue media, uninstall the drivers, then reinstall, and that worked fine, so it's now running 465 perfectly well. No idea why the straight upgrade path doesn't work.
But that's completely an Ubuntu problem, not NVIDIA. Like a (currently) higher up comment says, NVIDIA on Linux works fine as long as you're running the latest version of everything. My main desktop was built last April and I've been running Arch with RTX 2070 and the latest NVIDIA drivers ever since first boot and it has never given me any trouble, video or audio. My display is a 50 inch OLED connected via HDMI and audio a 5-channel soundbar with external subwoofer using eARC from the display. Everything is fine using GNOME defaults.
NVIDIA provides the nvidia-xconfig tool to autogenerate the X configuration, but you don't need it; it runs fine with no config. Wayland has worked for over a year, too. You can go look at the PKGBUILD file for Arch's PulseAudio package and it isn't doing anything special either, just applying the suggested default from PulseAudio's documentation, making the default ALSA module pulse.
The only reason NVIDIA on Linux gives people so many problems is they're trying to run old versions of everything on enterprise-oriented Linux distros or "long-term support" without purchasing support. If you want the latest hardware, use the latest software.
Data center revenue at $6.7 billion. Gaming at $7.7 billion. But data center grew 124%, gaming 41%. If that keeps up, data center passes gaming this year.
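Back-of-the-envelope, assuming those year-over-year growth rates simply held for another year (a big assumption):

```python
# Projected revenue in billions of dollars if the stated growth rates held.
data_center = 6.7 * 2.24   # 124% growth -> ~15.0
gaming      = 7.7 * 1.41   # 41% growth  -> ~10.9
print(data_center > gaming)  # True: data center would overtake gaming
```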
That doesn't make any sense to me. Human ML/AI researchers are also users and NVIDIA clearly intentionally targets them as a market segment. They don't only care about pleasing gamers running Windows.
Wow. This is amazing! I'm not holding my breath for macOS support, as Apple isn't very fond of Nvidia. I'm sure there will be clones for macOS/iOS, Linux, and Android in the future.
Using realistic scenarios forces you to use realistic assets everywhere. If this tool only does backgrounds, it would raise costs for indie developers. Hence my previous comment.
I suspect they don't really need the GPU to render it. It is usually training that requires a lot of GPU, not inference. So the Nvidia requirement is only there to sell more cards.
I think the FLOPS comparison you’ve presented is not fair: for Nvidia it is “tensor” FLOPS, not generic float multiplication (which is about 10 times smaller), while for Intel it is any float multiplication.
So for the i9 the number would be higher if FMA operations were used, no?
It doesn’t make sense. Why is it fair to compare matrix multiplication with generic float operations? It should be either a comparison of matrix multiplication to matrix multiplication, or generic float to generic float.
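To make the apples-to-apples point concrete, here is the usual peak-throughput back-of-the-envelope; the core counts and clocks below are purely illustrative assumptions, not vendor specs.

```python
# Peak FLOPS = cores x clock x FLOPs per core per cycle.
def peak_gflops(cores, clock_ghz, flops_per_core_per_cycle):
    return cores * clock_ghz * flops_per_core_per_cycle

# CPU with AVX-512 FMA: 16 FP32 lanes x 2 (an FMA counts as 2 FLOPs)
# x 2 FMA units = 64 FLOPs per core per cycle.
cpu_fp32 = peak_gflops(8, 3.0, 64)      # ~1,536 GFLOPS (illustrative)
# GPU FP32 on CUDA cores: one FMA (2 FLOPs) per core per cycle.
gpu_fp32 = peak_gflops(10000, 1.7, 2)   # ~34,000 GFLOPS (illustrative)
# Tensor-core figures are quoted separately and are several times higher
# still, but they only apply to (mixed-precision) matrix multiplies.
```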
I've often desperately wanted to put certain landscapes from my dreams into art, but I suck at drawing.
There are some dreams that I remember years later because of how beautiful they were, and how they made me feel. This would be a godsend if it works as well as the demo pictures show.
I could see tech like this being a big hit for illustrations for low-budget self-published books.
Stock photos are all good but sometimes you really need a visual of Illiyana the dragon vampire arriving at the three-towered mountain citadel with two moons overhead, on a budget of $10 or less.
Bro, they are letting you literally doodle children’s art and create solid photo manipulations. This kind of stuff took at least some creativity by hobbyist photoshoppers.
Believe it or not, it took some effort to take random scenery and create a solid composition. Take my job sure, but Jesus, not my hobby too. Now these people will have to compete against AI scrubs.
The problem is demand for stock images. I'm not sure the quality here is good enough, but there's no reason why image-generating ANNs won't keep getting better.
But will it? My very limited experience with animation has been characterized by control: it's storytelling and creation where the creator is responsible for every fraction of a second. The value of this type of AI is in ceding control to an algorithm and letting it deal with the hard parts. My limited understanding points to a difference in goals between the two projects: one is about control, the other about ease. And I don't think ease has a very stable place in animation.
This still looks dang hard for my cursed paws. I’m pretty sure it’s not easy for most people, and it still can’t beat Google image search considering the sheer number of images out there.
Am I the only one who thinks this development is a bad idea?
Just extrapolate the obvious into the future. When everyone can create good art, despite being actually completely unskilled and untalented, then good art ceases to exist.
When everyone's an artist no one's an artist. It doesn't matter if we're not there yet, we will get there eventually and at that point it's too late.
That is not what tools like this enable though? Will it not still require at least a bit of artistic sense to get something decent out of it? It just makes the technical aspect much easier. Some will benefit. Not everyone. Unless you're convinced there's a hidden artist in all of us?
It's just like with the introduction of small portable cameras decades ago (film/photo, doesn't matter), especially as they got a lot better in the past decade: did we suddenly see great films/pictures being taken all over the place? No. We mainly saw a ton of crap, bad shots, bad home movies, you name it. And then there was a rather small fraction of people who earlier did not have the means to get quality material, or were restricted in other ways, who got their hands on it and were able to deploy/discover their inherent talent. Which they could perhaps have done in other ways, but not as easily.
The same used to be said about photography. What is most important in true art is not being skilled with a paintbrush or Photoshop but the ability to evoke different emotions and thoughts.
[1] https://knowyourmeme.com/memes/how-to-draw-an-owl