If I've been reading it correctly, the power of ChatGPT is in the training and data, not necessarily the algorithm.
And I'm not sure if it's technically possible for one AI to train another AI with the same algorithm and have better performance. Although I could be wrong about any and everything. :-)
An LLM by itself could generate data and code and iterate on its training process, so it could create another LLM from scratch. There is a path to improving LLMs without organic text: connect them to real systems and let them learn from feedback on their actions. The system could be as simple as a Python execution environment, a game, a simulator, or other chatbots, or something more complex like real-world tests.
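A minimal sketch of that kind of loop, assuming nothing more than a generic llm(prompt) -> str callable (hypothetical, not any particular vendor's API): the model writes a program, a sandbox runs it, and the execution output is fed back as the signal.

    import subprocess
    import tempfile

    def run_snippet(code: str) -> str:
        """Run a candidate program in a subprocess and capture stdout/stderr."""
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run(["python", path], capture_output=True, text=True, timeout=10)
        return result.stdout + result.stderr

    def feedback_loop(llm, task: str, rounds: int = 3) -> str:
        """llm is a hypothetical prompt -> completion callable. The execution
        result is the non-human feedback; it can be used in-context (as here)
        or logged as training data for a later fine-tuning run."""
        code = llm(f"Write a Python program that {task}.")
        for _ in range(rounds):
            output = run_snippet(code)
            code = llm(
                f"Task: {task}\nYour program:\n{code}\n"
                f"It produced:\n{output}\nImprove the program."
            )
        return code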
I know that NVidia is using AI that is running on NVidia chips to create new chips that they then run AI on.
All that's left to do is apply AI to the process of training AI, kind of like how building a lathe by hand makes a so-so lathe, but that so-so lathe can then be used to build a better and more accurate lathe.
I actually love this analogy. People tend to not appreciate just how precise modern manufacturing equipment is.
All of that modern machinery was essentially bootstrapped off a couple of relatively flat rocks. It's going to be interesting to see where this LLM stuff goes when the feedback loop is this quick and so much brainpower is focused on it.
One of my sneaking suspicions is that Facebook/Google/Amazon/Microsoft/etc. would have been better off keeping employees on the books, if for no other reason than to keep thousands of skilled developers occupied, rather than cutting loose thousands of people who now have an axe to grind during a time of rapid technological progress.
It is a nice analogy because you can really extend it to the whole history of technological progress. Tools help make tools - all the way back to obsidian daggers and sticks.
NeRFs are a form of inverse renderer; this paper uses Score Jacobian Chaining[0] instead. Model reconstruction from NeRFs is also an active area of research. Check out the "Model Reconstruction" section of Awesome NeRF[1].
From the SJC paper:
> We introduce a method that converts a pretrained 2D diffusion generative model on images into a 3D generative model of radiance fields, without requiring access to any 3D data. The key insight is to interpret diffusion models as function f with parameters θ, i.e., x = f (θ). Applying the chain rule through the Jacobian ∂x/∂θ converts a gradient on image x into a gradient on the parameter θ.
> Our method uses differentiable rendering to aggregate 2D image gradients over multiple viewpoints into a 3D asset gradient, and lifts a generative model from 2D to 3D. We parameterize a 3D asset θ as a radiance field stored on voxels and choose f to be the volume rendering function.
Interpretation: they parameterize the 3D asset as a voxel radiance field, render it from multiple viewpoints with a differentiable renderer (the volume rendering function), and use the pretrained 2D diffusion model to supply gradients on those rendered images; the chain rule then pushes the image gradients back onto the voxel parameters, so no 3D data or input photos are needed.
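A toy PyTorch sketch of that chain-rule update, with stand-ins for both the renderer and the diffusion score (so this shows the shape of the idea, not the paper's actual implementation):

    import torch

    def render(theta: torch.Tensor) -> torch.Tensor:
        """Stand-in differentiable 'volume renderer' x = f(theta): just blends the
        voxel colors along one axis. Real SJC ray-marches the voxels from a
        randomly sampled camera each step."""
        weights = torch.softmax(theta[..., 3], dim=-1).unsqueeze(-1)  # toy opacity
        return (theta[..., :3] * weights).sum(dim=2)                  # (H, W, 3) image

    def image_score(x: torch.Tensor) -> torch.Tensor:
        """Stand-in for the gradient the pretrained 2D diffusion model assigns to
        the rendered image; SJC derives this from the denoiser's score."""
        return x - 0.5

    theta = torch.zeros(64, 64, 64, 4, requires_grad=True)   # voxel radiance field
    opt = torch.optim.Adam([theta], lr=1e-2)

    for step in range(200):
        x = render(theta)                 # x = f(theta), differentiable
        grad_x = image_score(x)           # gradient on the 2D image
        opt.zero_grad()
        x.backward(gradient=grad_x)       # chain rule: dx/dtheta pushes it onto theta
        opt.step()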
I won't lie... ZBrush is brutally hard. I got a subscription for work and only used it for one paid job, ever. But it's super satisfying if you just want to spend Sunday night making a clay elephant or rhinoceros, and drop $20 to have the file printed out and shipped to you by Thursday.
I've fed lots of my sculpture renderings to DALL-E and gotten some pretty cool 2D results... but nothing nearly as cool as the little asymmetrical epoxy sculptures I can line up on the bookshelf...
People are definitely building at a high pace, but for what it's worth, this isn't the first work to tackle this problem, as you can see from the references. The results are impressive though!
Image classification is still a difficult task, especially if there are only a few examples. Training a high-resolution, 1,000-class ImageNet classifier on 1M+ images from scratch is a drag involving hundreds or thousands of GPU hours. You can train low-resolution classifiers more easily, but they're less accurate.
There are tricks to do it faster, but they all involve using other vision models that themselves took just as long to train.
But can't something like GPT help here? For example you show it a picture of a cat, then you say "this is a cat; cats are furry creatures with claws, etc." and then you show it another image and ask if it is also a cat.
You are humanizing token prediction. The multimodal text-vision models were all established using a scaffold of architectures that unify text-token and vision-token similarity, e.g. BLIP-2 [1]. It's possible that a model using unified representations could establish that the set of visual tokens you are searching for corresponds to some set of text tokens, but only if the pretrained weights of the vision encoder are able to extract the features corresponding to the object you are describing to it.
And the pretrained vision encoder will at some point have been trained with a contrastive objective that aligns text and visual embeddings (maximizing cosine similarity for matched pairs, minimizing it for mismatched ones) on some training set, so it really depends on what exactly that training set had in it.
This paper https://cv.cs.columbia.edu/sachit/classviadescr/ (funnily enough, from the same lab as the main post) does something along those lines with GPT. It shows that for things that are easy to describe, like Wordle ("tiled letters, some are yellow and green"), you can recognize them with zero training. For things that are harder to describe we'll probably need new approaches, but it's an interesting direction.
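A rough sketch of the descriptor idea, assuming OpenAI's clip package, a placeholder image path, and hand-written descriptors standing in for GPT's output (not the paper's exact pipeline): score the image against each class's descriptors and take the best average.

    import torch
    import clip
    from PIL import Image

    # Hand-written stand-ins for the per-class descriptors GPT would generate.
    descriptors = {
        "wordle": ["a grid of tiled letters", "yellow letter tiles", "green letter tiles"],
        "cat": ["a furry animal", "pointed ears", "whiskers", "claws"],
    }

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)
    image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)  # placeholder path

    with torch.no_grad():
        img = model.encode_image(image)
        img = img / img.norm(dim=-1, keepdim=True)
        scores = {}
        for label, descs in descriptors.items():
            txt = model.encode_text(clip.tokenize(descs).to(device))
            txt = txt / txt.norm(dim=-1, keepdim=True)
            scores[label] = (img @ txt.T).mean().item()  # avg similarity to descriptors

    print(max(scores, key=scores.get))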
If you have a few examples you can use an already trained encoder (like the CLIP image encoder) and train an SVM on the embeddings; no need to train a neural network.
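For example (a sketch with OpenAI's clip package and scikit-learn; the file names are placeholders): embed a handful of labeled images with the frozen CLIP encoder and fit a linear SVM on top.

    import torch
    import clip
    from PIL import Image
    from sklearn.svm import LinearSVC

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    def embed(paths):
        """Frozen CLIP image embeddings, L2-normalized."""
        imgs = torch.stack([preprocess(Image.open(p)) for p in paths]).to(device)
        with torch.no_grad():
            feats = model.encode_image(imgs)
        return (feats / feats.norm(dim=-1, keepdim=True)).cpu().numpy()

    # A few labeled examples per class is often enough on frozen embeddings.
    train_paths = ["cat1.jpg", "cat2.jpg", "dog1.jpg", "dog2.jpg"]
    train_labels = ["cat", "cat", "dog", "dog"]

    classifier = LinearSVC().fit(embed(train_paths), train_labels)
    print(classifier.predict(embed(["mystery.jpg"])))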
In the last week, a lot of the ideas I've read about in HN comments have then shown up as full-blown projects on the front page.
As if people are building at an insane speed from idea to launch/release.