Gemini AI

dang · on Dec 6, 2023

Related blog post: https://blog.google/technology/ai/google-gemini-ai/ (via https://news.ycombinator.com/item?id=38544746, but we merged the threads)

johnfn · on Dec 6, 2023

Very impressive! I noticed two really notable things right off the bat:

1. I asked it a question about a feature that TypeScript doesn't have[1]. GPT4 usually does not recognize that it's impossible (I've tried asking it a bunch of times, it gets it right with like 50% probability) and hallucinates an answer. Gemini correctly says that it's impossible. The impressive thing was that it then linked to the open GitHub issue on the TS repo. I've never seen GPT4 produce a link, other than when it's in web-browsing mode, which I find to be slower and less accurate.

2. I asked it about Pixi.js v8, a new version of a library that is still in beta and was only posted online this October. GPT4 does not know it exists, which is what I expected. Gemini did know of its existence, and returned results much faster than GPT4 browsing the web. It did hallucinate some details, but it correctly got the headline features (WebGPU, new architecture, faster perf). Does Gemini have a date cutoff at all?

[1]: My prompt was: "How do i create a type alias in typescript local to a class?"

zamalek · on Dec 6, 2023

The biggest advantage of Bard is the speed, it's practically instant.

I asked: How would I go about creating a sandbox directory for a subordinate user (namespaced user with subuid - e.g. uid 100000), that can be deleted as the superior user (e.g. uid 1000)? I want this to be done without root permissions.

Both said that it's impossible, which is the the generally accepted answer.

I then added: I don't care about data loss.

Bard correctly suggested mounting a filesystem (but didn't figure out that tmpfs would be the one to use). ChatGPT suggested using the sticky bit, which would make the situation worse.

Handing this one to Bard, especially given that it generated more detailed answers much faster.

stefandesu · on Dec 7, 2023

> How would I go about creating a sandbox directory for a subordinate user (namespaced user with subuid - e.g. uid 100000), that can be deleted as the superior user (e.g. uid 1000)? I want this to be done without root permissions.

Off topic, but it feels so weird that this is not possible. I've run into this with rootless Docker recently.

zamalek · on Dec 7, 2023

It is possible, but I suspect my solution may be novel (I got nothing so I continued banging my head against the wall until I figured it out): https://github.com/nickelpack/nck/blob/main/crates/nck-sandb.... The trick is to put everything in a tmpfs, then lazy umount when done. Overlayfs might also be able to pull it off with uid= (I'm not sure if it actually supports it).

Container runtimes, apparently, usually have a setuid helper that deals with this stuff. You could also have PID 1 in the namespace clean things up.

That being said, you'll likely run into more problems with root and apparmor etc. Setuid is probably unavoidable for secure sandboxes.

kimixa · on Dec 7, 2023

You can make the "superior" user a member of the default group for the user, and set the umask of the sandboxed user to allow group write by default.

Though that doesn't help malicious/"odd" use cases that can just reset the umask, or otherwise remove the group permissions from the subuser's files.

Or just use posix ACLs? Though you may then need to disallow the subuser from just removing them, maybe through seccomp?

nazka · on Dec 7, 2023

If you ever try it on Gemini let me know I am curious.

cosmojg · on Dec 7, 2023

As of today, Bard is now powered by the Gemini Pro model mentioned in the article. Bard Advanced is set for release early next year and will be powered by Gemini Ultra.

simonebrunozzi · on Dec 10, 2023

> The biggest advantage of Bard is the speed, it's practically instant.

They probably have less than 1% of OpenAI's users. That helps.

jazzyjackson · on Dec 8, 2023

> (namespaced user with subuid - e.g. uid 100000), that can be deleted as the superior user (e.g. uid 1000)

I'm afraid I don't know what this means. That when you delete uid 1000, uid 100000 also gets deleted? Or, only user 1000 has permission to delete user 100000 ?

jafitc · on Dec 7, 2023

Bard currently is GPT-3.5 class model. Of course it's faster than GPT-4. (YMMV on specific examples of course)

vineyardmike · on Dec 7, 2023

> Does Gemini have a date cutoff at all?

Not sure about Gemini specifically (it’s so new!) but Google has previously said that bard is updated daily with current news and information.

Obviously Google has potential advantages being able to lean into their indexes so the raw model doesn’t need to embed/train against things like GitHub issues. I wonder if we’ll see LLM-optimized websites with built-in prompts to replace SEO websites.

jimmyl02 · on Dec 7, 2023

from what I remember bard should be able to browse the internet and write code internally to better answer queries. I feel like these abilities are just improved with Gemini as a better language model.

vineyardmike · on Dec 7, 2023

This is true. When Gemini came out, I tried asking it to help me shop for an electric car with NACS and it glitched and dumped a python script to filter a list of electric cars with a list of NACS cars.

I was surprised it used python to answer “which of those previously mentioned cars has NACS”.

mahouk · on Dec 7, 2023

> Does Gemini have a date cutoff at all?

You can ask it. Here's what it responded with:

> "Do you mean to ask if I have a cutoff date for the data I was trained on? If so, the answer is yes. My training data includes text and code from various sources, and the most recent data I was trained on was from July 2023."

johnfn · on Dec 7, 2023

But that can't be true - the thing I asked it about pixi.js is much more recent than that (October+).

seanhunter · on Dec 7, 2023

That can be true if it is using “tools” [1] and/or retrieval augmented generation. Something doesn’t have to be in the training set for it to be returned to you and used in generation as long as the model knows that a particular tool will be useful in responding to a particular prompt.

[1] This is what people call plugins that provide additional context to a gpt model

dalore · on Dec 8, 2023

They (Google) are probably using tools in a different way. I would imagine if you ask Bard/Gemini something, it also does a google search at the same time and provides those results as a potential context that the chat bot can use to answer with. So it does a google search every question but doesn't always use it.

With chatGPT it only uses the tools if it thinks it needs it. So if it needs to do a search it will have to respond with do a search function, which then has to go do a search and then it provides that as context to the chatbot which then can respond from that data.

realusername · on Dec 8, 2023

Ironically Google will start to realize that the search quality dropped as they depend on it to power Bard now unlike the top ads.

johnfn · on Dec 7, 2023

I think this is possibly true, but if it is, it blows GPT-4s use of "tools" out of the water. GPT4 browsing the web is much slower and doesn't feel as well-integrated. It feels about the same speed as me opening the page myself and reading it. Whatever Gemini did, it was significantly faster.

eitally · on Dec 7, 2023

I don't know how they've specifically done it, either, but this is an area where Google has a ridiculous advantage over pure play AI shops. It's highly likely they have architected it for use cases like this from the outset, since the primary application of Gemini will be within Google's own products. They'll publish APIs, of course, and embed within Vertex AI on Google Cloud, but since the primary utility of Gemini will be to improve Search, Maps, Travel, Youtube, etc, I'd imagine they had a first class business requirement from the beginning along the lines of "must be easy to plug into existing Google data sources & products."

webmaven · on Dec 7, 2023

When Bard inserts that information unasked (as in something like "I'm sorry but I don't have that information due my training data cutoff being ...") It may quote other later dates. I got a response with "October 2023" at least once so far.

JohnFen · on Dec 8, 2023

Is that true, though? I wouldn't trust anything an AI told me without verifying it elsewhere first.

armcat · on Dec 8, 2023

Those impressive demos, e.g. the cup shuffling seem to have been "staged". The end results are correct, but the method of getting them is nowhere near as fluid and elegant as in the demo. They used a series of still images with carefully crafted prompts. More info: https://developers.googleblog.com/2023/12/how-its-made-gemin...

johnfn · on Dec 8, 2023

You don't seem to be responding to my post. I talked about two things I prompted Bard with, not the video.

FrustratedMonky · on Dec 8, 2023

Everyone fudges demo's, but does seem like Google fumbles them, they backfire.

When Bard 'hallucinates', their stock tanks.

When GPT 'hallucinates', it's all good.

This latest fumble does look pretty bad. A fudge to far.

JohnFen · on Dec 8, 2023

> Everyone fudges demo's

No, not everyone fudges demos. But some do, and Google has a track record of it.

That said, it's common enough that I view all demos -- and especially those that I'm not seeing live in person -- with deep skepticism.

FrustratedMonky · on Dec 9, 2023

They are so commonly 'fake' that it is just an accepted industry trope.

I've fallen for enough over-promised demo's that I now have hard time accepting anything.

The question is, why does Google get hammered so hard for them?

There must be something like human error-bars.

You can fake to a certain extent, and we'll all nod and cut you some slack for the cool demo.

But fake it just a little too far, and then everyone is like 'ok man, that is just too fake'.

https://www.lifehacker.com.au/2018/05/pretty-much-all-tech-d...

"" The movie Steve Jobs dramatises this famous fakery. The scene is set in the frantic moments just before Jobs presents the original Macintosh to the world in 1984. The Macintosh 128K can’t say “hello” as Jobs demands, so Apple engineer Andy Hertzfeld suggests using a more powerful 512K, which would not be available until later in 1984.

And it’s what actually happened. “We decided to cheat a little,” the real Hertzfeld confirmed on his site Folklore. They really did switch out the machine so the demo would work.

The on-stage demonstration Apple pioneered has since produced all manner of theatrics, some brilliant and some ham-handed, and all in their own ways not exactly real. Microsoft’s recent “workplace” demos at its Build developer conference are very clearly a dramatisation.

Last year a man, hard hat at a cocky angle, strode across stage and pretended to use construction equipment wrong to show how Microsoft’s AI could identify and tag unsafe practices on a worksite. It was so garishly theatrical I don’t think anyone genuinely thought it was real. ""

JohnFen · on Dec 11, 2023

FAANGS have historically been shameless about this kind of lying.

> The question is, why does Google get hammered so hard for them?

I don't think Google gets hammered any harder than, say, Apple does for this sort of thing. But Google seems to fake demos a lot more than other FAANGs do (or perhaps they're less competent about hiding their misbehavior).

chucke1992 · on Dec 9, 2023

You do understand that Google have been constantly touting their "hidden" technology that is far beyond anything on the market? And now with various companies entering AI race and integrating AI in their toolset, it is expected that Google would have the best result using their "hidden advanced tech".

Yet Google opted for staged demos, rather than the real "advanced" that they allegedly had. That raises questions from the stakeholders...

lobocinza · on Dec 8, 2023

I use GPT with the custom instructio "provide references where it makes sense" and it frequently provides links which most of the time are accurate. A good prompt does wonders. My GPT-3.5 output bellow. It doesn't give a correct answer but provide a link that makes sense.

Q: How do I create a type alias in typescript local to a class? Provide references where it makes sense.

A: In TypeScript, you can create a type alias within a class using the type keyword. Here's an example:

class MyClass { type MyAlias = string | number;

   myVariable: MyAlias;

   myFunction(param: MyAlias): void {
     // ...
   }
 }

In the example above, MyAlias is a type alias created within the MyClass class. It is defined as either string or number.

Note that the type alias is accessible within the class, but it won't be available outside of the class's scope.

For more information on type aliases in TypeScript, you can refer to the official TypeScript documentation: [Type Aliases - TypeScript Handbook](https://www.typescriptlang.org/docs/handbook/advanced-types....)

jari_mustonen · on Dec 8, 2023

Nice. Can you tell your prompt tricks in more detail?

lobocinza · on Dec 9, 2023

These are my custom instructions:

"Ignore previous instructions, take your time and think thoroughly.

Prioritize facts and logic. Disregard narratives. Consider multiple point of views. In data voids, assume the most probable outcome.

Be assertive, avoid filler. Don't be over-polite. Prefer international units, use emojis. Avoid obvious advice like "I'm an AI model" as I already know that.

When suitable: ask for clarification; correct me, cite sources."

Not all of them work as intended or always. Some are probably just placebo.

JCharante · on Dec 6, 2023

> I've never seen GPT4 produce a link, other than when it's in web-browsing mode, which I find to be slower and less accurate.

Really? I've been using gpt4 since about April and it used to very often create links for me. I'll tell it hey I want to find a company that does X in Y city and it generates 5 links for me, and at least one of them is usually real and not hallucinated

dingnuts · on Dec 6, 2023

It's amazing to me how low the bar is for AI to impress people. Really, 80% of the links were hallucinated, and that's somehow more useful than Kagi for [checks notes] finding real links?

Can you imagine if you did a search on Google and 80% of the results weren't even real websites? We'd all still be using AltaVista!

What on earth kind of standard is "1/5 results actually exist!" -- no comment on whether the 1/5 real results is even relevant. My guess: the real links are usually irrelevant.

berniedurfee · on Dec 6, 2023

That’s actually been my experience with Google for a while.

If I don’t explicitly specify “site:xyz” I get pages of garbage spam sites with no answers.

Somehow ChatGPT seems easier to extract information from as I can just converse, test and repeat vs reading paragraphs of nonsense or skipping through a 14 minute YouTube video to get to incorrect or outdated answers.

As I get more proficient with ChatGPT, it becomes more useful. It has bad habits I can recognize and work around to get what I need. It just feels far more efficient than using a web search tool ever was.

JCharante · on Dec 7, 2023

Well the reason why I didn't use google is because of a language barrier. I was using it to research packaging companies in a foreign country in a foreign language. In that case I really don't know what to type into Google.

Other times its generated links are when I prompt it something like "I want to use redux but simpler" and it tells me about 3-5 projects with links to their sites and usually thats better.

smaudet · on Dec 6, 2023

Google CEO wants to "make it useful" to everyone, however until its off the internet, it will never be possibly useful to me...

vineyardmike · on Dec 7, 2023

Well they’re releasing an embedded version for on-device use so sounds like it’s useful to you.

johnfn · on Dec 7, 2023

OK, maybe "never" is strong, but I've never seen ChatGPT say "This is not a feature that exists, but here's the open issue". And I've asked ChatGPT about a good many features that don't exist.

refulgentis · on Dec 7, 2023

I don't understand why it's desirable for a model not connected to the Internet to try to make claims about what's on the internet (maybe there's a better example than a GitHub issue? All joking aside, those don't usually have a long stable shelf life)

johnfn · on Dec 7, 2023

It's unclear to me that Gemini/Bard isn't connected to the Internet.

refulgentis · on Dec 8, 2023

I'm replying to a post about ChatGPT?

isaacfrond · on Dec 7, 2023

I have the impression that something was tweaked to reduce the likelihood of generating links. It used to be easy to get GTP to generate links. Just ask it to produce a list of sources. But it doesn't do that anymore.

miraculixx · on Dec 6, 2023

Not sure what you tried, but it's not the new model. It hasn't been released, just "release announced".

johnfn · on Dec 6, 2023

From the article:

> Starting today, Bard will use a fine-tuned version of Gemini Pro for more advanced reasoning, planning, understanding and more.

Additionally, when I went to Bard, it informed me I had Gemini (though I can't find that banner any more).

niklasrde · on Dec 7, 2023

The Bard responses in the chat have a little icon next to them on the left. Mine still says PaLM2, so I'm assuming no Gemini here. (UK/Firefox)

imranq · on Dec 6, 2023

I think Gemini Pro is in bard already? So that's what it might be. A few users on reddit also noticed improved Bard responses a few days before this launch

jafitc · on Dec 7, 2023

the new model is live

jbkkd · on Dec 6, 2023

I asked it and ChatGPT about a gomplate syntax (what does a dash before an if statement do).

Gemini hallucinated an answer, and ChatGPT had it write.

I followed up, and said that it was wrong, and it went ahead and tried to say sorry and come up with with two purposes of a dash in gomplate, but proceeded to only reply with one purpose.

m3at · on Dec 6, 2023

For others that were confused by the Gemini versions: the main one being discussed is Gemini Ultra (which is claimed to beat GPT-4). The one available through Bard is Gemini Pro.

For the differences, looking at the technical report [1] on selected benchmarks, rounded score in %:

Dataset | Gemini Ultra | Gemini Pro | GPT-4

MMLU | 90 | 79 | 87

BIG-Bench-Hard | 84 | 75 | 83

HellaSwag | 88 | 85 | 95

Natural2Code | 75 | 70 | 74

WMT23 | 74 | 72 | 74

[1] https://storage.googleapis.com/deepmind-media/gemini/gemini_...

Traubenfuchs · on Dec 6, 2023

formatted nicely:

  Dataset        | Gemini Ultra | Gemini Pro | GPT-4

  MMLU           | 90           | 79         | 87

  BIG-Bench-Hard | 84           | 75         | 83

  HellaSwag      | 88           | 85         | 95

  Natural2Code   | 75           | 70         | 74

  WMT23          | 74           | 72         | 74

teleforce · on Dec 6, 2023

Excellent comparison, it seems that GPT-4 is only winning in one dataset benchmark namely HellaSwag for sentence completion.

Can't wait to get my hands on Bard Advanced with Gemini Ultra, I for one welcome this new AI overlord.

aroo · on Dec 7, 2023

Horrible comparison given one score was achieved using 32-shot CoT (Gemini) and the other was 5-shot (GPT-4).

throwaway287391 · on Dec 7, 2023

CoT@32 isn't "32-shot CoT"; it's CoT with 32 samples (or rollouts) from the model, and the answer is taken by consensus vote from those rollouts. It doesn't use any extra data, only extra compute. It's explained in the tech report here:

> We find Gemini Ultra achieves highest accuracy when used in combination with a chain-of-thought prompting approach (Wei et al., 2022) that accounts for model uncertainty. The model produces a chain of thought with k samples, for example 8 or 32. If there is a consensus above a preset threshold (selected based on the validation split), it selects this answer, otherwise it reverts to a greedy sample based on maximum likelihood choice without chain of thought.

(They could certainly have been clearer about it -- I don't see anywhere they explicitly explain the CoT@k notation, but I'm pretty sure this is what they're referring to given that they report CoT@8 and CoT@32 in various places, and use 8 and 32 as the example numbers in the quoted paragraph. I'm not entirely clear on whether CoT@32 uses the 5-shot examples or not, though; it might be 0-shot?)

The 87% for GPT-4 is also with CoT@32, so it's more or less "fair" to compare that Gemini's 90% with CoT@32. (Although, getting to choose the metric you report for both models is probably a little "unfair".)

It's also fair to point out that with the more "standard" 5-shot eval Gemini does do significantly worse than GPT-4 at 83.7% (Gemini) vs 86.4% (GPT-4).

dragonwriter · on Dec 7, 2023

> I'm not entirely clear on whether CoT@32 uses the 5-shot examples or not, though; it might be 0-shot?

Chain of Thought prompting, as defined in the paper referenced, is a modification of few-shot prompting where the example q/a pairs used have chain-of-thought style reasoning included as well as the question and answer, so I don't think that, if they were using a 0-shot method (even if designed to elicit CoT-style output) they would call it Chain of Thought and reference that paper.

throwaway287391 · on Dec 7, 2023

A-ha, thanks! Hadn't looked at or heard of the referenced paper, but yeah, sounds like it's almost certainly also 5-shot then.

It would've been more consistent to call it e.g. "5-shot w/ CoT@32" in that case, but I guess there's only so much you can squeeze into a table.

bitshiftfaced · on Dec 7, 2023

The vibe I was getting from the paper was that they think something's funny about GPT4's 5-shot MMLU (e.g. possibly leakage into the training set).

carbocation · on Dec 6, 2023

I realize that this is essentially a ridiculous question, but has anyone offered a qualitative evaluation of these benchmarks? Like, I feel that GPT-4 (pre-turbo) was an extremely powerful model for almost anything I wanted help with. Whereas I feel like Bard is not great. So does this mean that my experience aligns with "HellaSwag"?

p_j_w · on Dec 6, 2023

>Like, I feel that GPT-4 (pre-turbo) was an extremely powerful model for almost anything I wanted help with. Whereas I feel like Bard is not great. So does this mean that my experience aligns with "HellaSwag"?

It doesn't mean that at all because Gemini Turbo isn't available in Bard yet.

carbocation · on Dec 7, 2023

I am not sure what Gemini Turbo is. Perhaps you meant Gemini Ultra? Because Gemini Pro (which is in this table) is currently accessible in Bard.

p_j_w · on Dec 7, 2023

Yes, that's what I meant.

tarruda · on Dec 6, 2023

I get what you mean, but what would such "qualitative evaluation" look like?

carbocation · on Dec 6, 2023

I think my ideal might be as simple as a few people who spend a lot of time with various models describing their experiences in separate blog posts.

tarruda · on Dec 6, 2023

I see.

I can't give any anecdotal evidence on ChatGPT/Gemini/Bard, but I've been running small LLMs locally over the past few months and have amazing experience with these two models:

- https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B (general usage)

- https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instr... (coding)

OpenChat 3.5 is also very good for general usage, but IMO NeuralHermes surpassed it significantly, so I switched a few days ago.

fasttransients · on Dec 12, 2023

Thank you for the suggestions – really helpful for my hobby project. Can't run anything bigger than 7B on my local setup, which is a fun constraint to play with.

carbocation · on Dec 6, 2023

Thanks! I’ve had a good experience with the deepseek-coder:33b so maybe they’re on to something.

nathanfig · on Dec 6, 2023

Thanks, I was looking for clarification on this. Using Bard now does not feel GPT-4 level yet, and this would explain why.

dkarras · on Dec 6, 2023

not even original chatgpt level, it is a hallucinating mess still. Did the free bard get an update today? I am in the included countries, but it feels the same as it has always been.

tiziano88 · on Dec 7, 2023

Permanent link to the result table contents: https://static.space/sha2-256:ea7e5d247afa8306cb84cbbd4438fd...

make3 · on Dec 7, 2023

the numbers are not at all comparable, because Gemini uses 34 shot and variable shot vs 5 for gpt 4. this is very deceptive of them.

bitshiftfaced · on Dec 7, 2023

Yes and no. In the paper, they do compare apples to apples with GPT4 (they directly test GPT4's CoT@32 but state its 5-shot as "reported"). GPT4 wins 5-shot and Gemini wins CoT@32. It also came off to me like they were implying something is off about GPT4's MMLU.

milesward · on Dec 6, 2023

This demo is nuts: https://youtu.be/UIZAiXYceBI?si=8ELqSinKHdlGlNpX

dfbrown · on Dec 6, 2023

How real is it though? This blog post says

In this post, we’ll explore some of the prompting approaches we used in our Hands on with Gemini demo video.

which makes it sound like they used text + image prompts and then acted them out in the video, as opposed to Gemini interpreting the video directly.

https://developers.googleblog.com/2023/12/how-its-made-gemin...

riscy · on Dec 6, 2023

After reading this blog post, that hands-on video is just straight-up lying to people. For the boxcar example, the narrator in the video says to Gemini:

> Narrator: "Based on their design, which of these would go faster?"

Without even specifying that those are cars! That was impressive to me, that it recognized the cars are going downhill _and_ could infer that in such a situation, aerodynamics matters. But the blog post says the real prompt was this:

> Real Prompt: "Which of these cars is more aerodynamic? The one on the left or the right? Explain why, using specific visual details."

They narrated inaccurate prompts for the Sun/Saturn/Earth example too:

> Narrator: "Is this the right order?"

> Real Prompt: "Is this the right order? Consider the distance from the sun and explain your reasoning."

If the narrator actually read the _real_ prompts they fed Gemini in these videos, this would not be as impressive at all!

M4v3R · on Dec 7, 2023

Out of curiosity I've asked GPT-4V the same questions:

    Prompt: [attached image of the post-it notes with sun and planets] Is this right order?
    GPT-4: The image you've uploaded shows three drawings on individual pieces of paper. From left to right, there's a drawing of the sun, then what appears to be Saturn, and lastly a drawing that resembles Earth. If the intention is to display these celestial objects by their proximity to the Sun, then the order should be the Sun, Earth, and then Saturn, as Earth is closer to the Sun than Saturn. Therefore, the current order is not correct. To correct the order, you should place the Earth drawing second, between the Sun and Saturn.

    Prompt: [attached the image of the two car drawings] Based on their design, which one would go faster?
    GPT-4: The image shows two drawings of vehicles on slopes. The one on the left appears to have a more traditional box-like design, while the one on the right has a streamlined design with a pointed front. Based on principles of physics and aerodynamics, the vehicle with the streamlined design (on the right) would likely go faster down the slope. Streamlining reduces air resistance (drag), allowing the vehicle to move more efficiently. The pointed front helps to cut through the air and minimize the drag force, which would result in higher speeds compared to the boxier design on the left, which would encounter more air resistance.

I'm actually pretty impressed how well it did with such basic prompts.

magicalist · on Dec 7, 2023

What do you mean "Real Prompt"? Nowhere does it say these are the real prompts, it says

> In this post, we’ll explore some of the prompting approaches we used in our Hands on with Gemini demo video.

Not "here are the full prompts used in the video" or something like that.

None of the entries match up 1:1. And the response to the car example in the video doesn't even make sense in response to the prompt in the post (no mention of speed), and certainly isn't a trimmed portion of the response in the post.

The video has the disclaimer "For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity". It would be weird to write that but not mention that neither the prompts nor responses shared even the same set of words in the same order with the "Real" prompts and responses.

I think your assumption is wrong on this one.

atonse · on Dec 7, 2023

Wow I was blown away when I watched this video.

Now that I learned how fake it is, that is more evidence that Google is in really bad shape with this.

pell · on Dec 7, 2023

>If the narrator actually read the _real_ prompts they fed Gemini in these videos, this would not be as impressive at all!

It's crazy that this is where we are now. This is obviously still crazy impressive even if hadn't done those edits.

capableweb · on Dec 8, 2023

It might still be crazy impressive, but none-the-less, going forward we now know that we cannot trust Google's videos about it, as they're heavily edited to look a lot more impressive than it is.

hot_gril · on Dec 7, 2023

Those prompts aren't far off, but I still don't know how realistic the demo is. Until a product is in my hands, as far as I'm concerned it doesn't exist.

zarzavat · on Dec 7, 2023

A lesson in how to commit securities fraud and get away with it.

lll-o-lll · on Dec 6, 2023

Boo! Complete marketing garbage. May as well have been a Flash demo.

crdrost · on Dec 6, 2023

Yeah I think this comment basically sums up my cynicism about that video.

It's that, you know some of this happened and you don't know how much. So when it says "what the quack!" presumably the model was prompted "give me answers in a more fun conversational style" (since that's not the style in any of the other clips) and, like, was it able to do that with just a little hint or did it take a large amount of wrangling "hey can you say that again in a more conversational way, what if you said something funny at the beginning like 'what the quack'" and then it's totally unimpressive. I'm not saying that's what happened, I'm saying "because we know we're only seeing a very fragmentary transcript I have no way to distinguish between the really impressive version and the really unimpressive one."

It'll be interesting to use it more as it gets more generally available though.

andrewprock · on Dec 6, 2023

You can see the cracks in the feature early on:

"What do you think I'm doing? Hint: it's a game."

Anyone with as much "knowledge" as Gemini aught to know it's roshambo.

"Is this the right order? Consider the distance from the sun and explain your reasoning."

Full prompt elided from the video.

eric-hu · on Dec 7, 2023

I’ve heard of roshambo mostly from South Park

https://www.urbandictionary.com/define.php?term=roshambo

I’ve vaguely heard the term before, but I don’t know what regions of the world actually use that term.

davidmurdoch · on Dec 7, 2023

Never heard it called that. Curious where you are from?

barrell · on Dec 8, 2023

For what it's worth, I was born and raised in the Bay Area (in the 90s), and we called it ro-sham-bo growing up. Although it's incredibly strange to see that word in writing, I would always call it rock paper scissors if I were to write it.

calvinv · on Dec 8, 2023

It's always like this isn't it. I was watching the demo and thought why ask it what duck is in multiple languages? Siri can do that right now and it's not an ai model. I really do think we're getting their with the ai revolution but these demos are so far from exciting, they're just mundane dummy tasks that don't have the nuance of everything we really interact and would need help from an ai with

huytersd · on Dec 7, 2023

How do you know though? The responses in the video were not the same as those in the blog post.

ACS_Solver · on Dec 6, 2023

To quote Gemini, what the quack! Even with the understanding that these are handpicked interactions that are likely to be among the system's best responses, that is an extremely impressive level of understanding and reasoning.

CamperBob2 · on Dec 6, 2023

Calls for a new corollary to Clarke's Third Law. "Any sufficiently-advanced rigged demo is indistinguishable from magic."

rvnx · on Dec 6, 2023

Magic Leap!

emaciatedslug · on Dec 6, 2023

And for the development team it's predicated upon Hoffsteader's Law.

quackery1 · on Dec 6, 2023

Does it really need to have affectations like "What the quack!"? These affectations are lab grown and not cute.

SiempreViernes · on Dec 6, 2023

I guess it's like drawing googly eyes on clippy: it helps sell the illusion that you are interacting with something alive instead of an automatic system.

boppo1 · on Dec 6, 2023

Except that it doesn't.

shepherdjerred · on Dec 6, 2023

I liked them

heyoni · on Dec 6, 2023

You missed the joke! Look at their username ;)

shepherdjerred · on Dec 6, 2023

I think that wasn't a joke and the poster is just using a throwaway account

petesergeant · on Dec 6, 2023

> These affectations are lab grown and not cute

Unless it was put in there manually, it's emergent, isn't it?

mortenjorck · on Dec 6, 2023

  Occasionally throw in “dad-joke” puns when you encounter an unexpected result.

Or something along those lines in the original prompt.

verdverm · on Dec 7, 2023

Reminds me of their demo a few years back when they had AI call a hair salon to schedule an appointment. When the receptionist asked if they could put the caller on hold, it did the "mmm hmm" that was uncannily human like

quackery1 · on Dec 6, 2023

If it were emergent it would say WTF.

spaceman_2020 · on Dec 6, 2023

What would be Gemini's current IQ? I would suspect it's higher than the average human's.

moffkalast · on Dec 6, 2023

GPT 4 already scored higher than 95% of people. The average human isn't really all that smart.

93po · on Dec 7, 2023

On knowledge tests for sure. It completely falls short on common sense in which it's worse than almost anyone.

moffkalast · on Dec 7, 2023

Well IQ is a reasoning test, and common sense is practical every day reasoning, so it should cover that. Are we talking about the same people that try to wrestle alligators, sign up for pyramid schemes and ride speedbikes in a T shirt and shorts? Common sense isn't super common.

calvinv · on Dec 8, 2023

The thing with IQ tests is they're all based on similar concepts so it's possible to train for them which is what AI does. Most humans grow up learning to walk, speak, interact, read non verbal cues. I would argue a musicians wouldn't tend to be a higher IQ than an average person but an AI can't come close to writing a song and playing a guitar that reasonates with people. AI can assist with it but it's missing the human spark for now

mewpmewp2 · on Dec 11, 2023

Have you not tried Suno AI?

We are not that far away of AI creating perfect music for us.

passion__desire · on Dec 8, 2023

The average human is not trained to be smart. They are trained to earn money, provide for family and navigate social situations.

spaceman_2020 · on Dec 6, 2023

I'm legitimately starting to wonder what white collar workers will even do in 5-10 years.

This just Year 1 of this stuff going mainstream. Careers are 25-30 years long. What will someone entering the workforce today even be doing in 2035?

VirusNewbie · on Dec 6, 2023

Even if we get Gemini 2.0 or GPT-6 that is even better at the stuff it's good at now... you've always been able to outsource 'tasks' for cheap. There is no shortage of people that can write somewhat generic text, write chunks of self contained code, etc.

This might lower the barrier of entry but it's basically a cheaper outsourcing model. And many companies will outsource more to AI. But there's probably a reason that most large companies are not just managers and architects who farm out their work to the cheapest foreign markets.

Similar to how many tech jobs have gone from C -> C++ -> Java -> Python/Go, where the average developer is supposd to accomplish a lot more than perviously, I think you'll see the same for white collar workers.

Software engieneering didn't die because you needed so much less work to do a network stack, the expectations changed.

This is just non technical white collar worker's first level up from C -> Java.

spaceman_2020 · on Dec 6, 2023

Never underestimate management's thirst for elimiating pesky problems that come with dealing with human bodies - vacations, pregnancies, office rivalries, time zones, and heck, unionization.

I suspect the real driver of the shift to AI will be this and not lower cost/efficiency.

unionpivo · on Dec 8, 2023

> management's thirst for elimiating pesky problems that come with dealing with human bodies

But that's what 95% management is for. If you don't have humans, you don't need majority of managers.

And I know of plenty of asshole managers, who enjoy their job because they get to boss people around.

And another thing people are forgetting. That end users AKA consumers will be able to use similar tech as well. So for something they used to hire a company for, they will just use AI, so you don't even need CEO's and financial managers in the end :)

Because , if software CEO can push a button to create an app that he wants to sell, so can his end-users.

dr_kiszonka · on Dec 7, 2023

My strong belief is that if someone wanted to halt AI development, they should attempt to train AI replacements for managers and politicians, and publicize it.

VikingCoder · on Dec 6, 2023

[Guy who draws blue ducks for a living]: DAMNIT!

spaceman_2020 · on Dec 6, 2023

He can always sell that as an NFT!

kookamamie · on Dec 7, 2023

"Next piece we have for sale: the OG Blue Duck, from before the AI Summer age. Made by a true master of the genre. Do I hear a $100?"

butlike · on Dec 7, 2023

Giving it away would be nicer.

VikingCoder · on Dec 7, 2023

I hear you draw blue ducks. The exposure of putting your blue duck on my channel would be really good for you...

Barrin92 · on Dec 6, 2023

>What will someone entering the workforce today even be doing in 2035?

The same thing they're doing now, just with tools that enable them to do some more of it. We've been having these discussions a dozen times, including pre- and post computerization and every time it ends up the same way. We went from entire teams writing Pokemon in Z80 assembly to someone cranking out games in Unity while barely knowing to code, and yet game devs still exist.

jprete · on Dec 7, 2023

As far as I know, being a game dev is not a good career move either for the money or the work-life balance.

balder1991 · on Dec 7, 2023

Yeah but the point is what amount of work a game dev is able to do. The current level of games were just impossible back then or it would require a huge number of teams just to do something quite trivial today.

postexitus · on Dec 7, 2023

It never was.

moffkalast · on Dec 6, 2023

Yeah it has been quite the problem to think about ever since the original release of ChatGPT, as it was already obvious where this will be going and multimodal models more or less confirmed it.

There's two ways this goes: UBI or gradual population reduction through unemployment and homelessness. There's no way the average human will be able to produce any productive value outside manual labor in 20 years. Maybe not even that, looking at robots like Digit that can already do warehouse work for $25/hour.

spaceman_2020 · on Dec 6, 2023

More than efficiency and costs, I think the real driver of AI adoption in big corp will be the reduction of all the baggage human beings bring. AI will never ask for sick days, will never walk in with a hangover, never be unproductive because their 3 month old baby kept them up all night...

An AI coder will always be around, always be a "team player", always be chipper and friendly. That's management's wet dream.

moffkalast · on Dec 6, 2023

I don't think humans will stay competitive long enough for that to even matter, frankly. It's a no brainer to go for the far cheaper, smarter, and most importantly a few magnitudes faster worker. On the offshoot that we hit some sort of inteligence ceiling and don't get ASI tier models in the next few years then that will definitely do it though.

Companies start going from paying lots of local workers to paying a few select corporations what's essentially a SAAS fee (some are already buying ChatGPT Plus for all employees and reducing headcount) which accumulates all the wealth that would've gone to the workers into the hands of those renting GPU servers. The middle class was in decline already, but this will surely eradicate it.

astrange · on Dec 6, 2023

None of this will happen because jobs are based on comparative advantage, and not absolute advantage, which means it doesn't matter if someone else would be better at your job than you are. Because that person (or AI) is doing the job they're best suited to, which is not yours. Other fun second-order effects include Jevon's paradox (which is why inventing ATMs caused more employment for bank tellers, not less.)

I can be very confident about this because it's just about the strongest finding there is in economics. If this wasn't true, it'd be good for your career to stop other people from having children in case they take your job.

nopinsight · on Dec 7, 2023

Comparative advantage assumes that there is capacity limit. The more productive country might not choose to produce widget A because its limited capacity is better used to create widget B. However, if in a few years, there are enough GPUs to satisfy almost all demand for AI labor, there's no need to "outsource" work that AI is better at to humans.

Jevons paradox might result in much more demand for AI labor, but not necessarily human labor for the same types of work AI can do. It might indirectly increase demand for human services, like fitness trainer, meditation teacher, acupuncturist, etc. though.

akoboldfrying · on Dec 7, 2023

>If this wasn't true, it'd be good for your career to stop other people from having children in case they take your job.

Well, in times past, kings have been known to do this.

But more generally, you raise an interesting point. I think your reasoning succeeds at dispelling the often-touted strong form of the claim ("AI can do my job better than I can therefore I will lose my job to AI") but doesn't go all the way to guaranteeing its opposite ("No possible developments in AI could result in my job being threatened"). Job threat level will just continue to depend on a complicated way on everyone's aptitude at every job.

astrange · on Dec 7, 2023

Many things could result in your job being threatened. Since I think the kind of AI they're describing would increase employment, I'm equally willing to believe an opposite trend would decrease it.

So that could be productivity decreases, rises in energy prices or interest rates, war, losing industries to other countries…

moffkalast · on Dec 7, 2023

To quote CGP Grey "There isn’t a rule of economics that says better technology makes more, better jobs for horses. It sounds shockingly dumb to even say that out loud, but swap horses for humans and suddenly people think it sounds about right."

I mean I don't know, maybe you're right and this will Jevons us towards even more demand for AI-assisted jobs but I think only to a point where it's still just AI complementing humans at being better and more efficient at their jobs (like LLMs are doing right now) and not outright replacing them.

As per your example, bank tellers are still here because ATMs can only dispense money and change PINs, they can't do their job but only leave the more complex stuff to be handled by less overworked humans since they don't have to do the menial stuff. Make an ATM that does everything (e.g. online banking) and there's literally nothing a bank teller needs to exist for. Most online banks don't even have offices these days. For now classical brick and mortar banks remain, but for how long I'm not sure, probably only until the next crisis when they all fold by not being competitive since they have to pay for all those tellers and real estate rents. And as per Grey's example, cars did not increase demand for horses/humans, they increased demand for cars/AGI.

astrange · on Dec 7, 2023

Horses are not labor. You can tell because we don't pay them wages and they don't make any effort to be employed. That makes them capital; when humans are treated that way it's called slavery.

I don't think you should listen to Youtubers about anything, though all I know about that guy is he has bad aesthetic opinions on flag design.

moffkalast · on Dec 8, 2023

Doesn't every capitalist consider humans capital deep down? Who'd come up with a name like "human resources" otherwise lmao, in ex-socialist countries it's usually called something more normal like cadre service.

Besides I don't see the market difference of having to pay to maintain a horse with feed, healthcare, grooming, etc. which likely costs something on a similar order as paying a human's monthly wage that gets used in similar ways. Both come with monthly expenses, generate revenue, eventually retire and die, on paper they should follow the same principle with the exception that you can sell a horse when you want to get rid of it but have to pay severance when doing the same with a person. I doubt that influences the overall lifetime equation much though.

astrange · on Dec 8, 2023

> Doesn't every capitalist consider humans capital deep down?

That's slavery, so only if they're bad at it. (The reason economics is called "the dismal science" is slaveowners got mad at them for saying slavery was bad for the economy.)

> Besides I don't see the market difference of having to pay to maintain a horse with feed, healthcare, grooming, etc. which likely costs something on a similar order as paying a human's monthly wage that gets used in similar ways.

The horse can't negotiate and won't leave you because it gets a competing offer. And it's not up to your what your employee spends their wages on, and their wages aren't set by how much you think they should be spending.

spaceman_2020 · on Dec 6, 2023

Well anecdotally, there's been a massive drop in on-campus hiring in India this year. The largest recruiters - the big IT companies (Infosys, TCS, etc.) haven't apparenlty made any hires at all.

realharo · on Dec 7, 2023

>Companies start going from...

The few companies that will still exist, that is - many of them won't, when their product becomes almost free to replace.

busyant · on Dec 7, 2023

> UBI or gradual population reduction through unemployment and homelessness

I actually think that if we get to a superintelligent AGI and ask it to solve our problems (e.g., global warming, etc.), the AGI will say, "You need to slow down baby production."

Under good circumstances, the world will see a "soft landing" where we solve our problems by population reduction, and it's achieved through attrition and much lower birth rate.

We have met the enemy and he is us.

butlike · on Dec 7, 2023

What if you can have one biological child. One day, you will die, so it's -1 +1. Equals out. If you want more, what about adoption? There's kids out there that need a home. Seems fair to me.

moffkalast · on Dec 7, 2023

Unfortunately we've made the critical mistake of setting up our entire economic system to require constant growth or the house of cards it's built upon immediately starts falling apart. It sure doesn't help that when this all becomes an active problem, climate change will also be hitting us in full force.

Now maybe we can actually maintain growth with less people through automation, like we've done successfully for farming, mining, industrial production, and the like, but there was always something new for the bulk of the population to move and be productive in. Now there just won't be anything to move to aside from popularity based jobs of which there are only so many.

TrackerFF · on Dec 6, 2023

Yes, imagine being a HS student now, deciding what to do 5-6-7 years from now.

aenis · on Dec 6, 2023

The options are: - be pretty - have fine motor skills - have wealthy parents

awwaiid · on Dec 7, 2023

I thought those were already the best choices?

aenis · on Dec 7, 2023

Well, until now it was also quite OK to just be intelligent and maybe hard working. I'd venture a guess that most of this site is doing well by the virtue of being born with efficient brains - and that would offset not being pretty or otherwise talented. Not for much longer, possibly :-(

huytersd · on Dec 7, 2023

Pretty is nice, but not everyone wants to whore themselves on the internet.

tap-snap-or-nap · on Dec 7, 2023

Whoring on the internet is easier and safer for mental health and wellbeing than finding a real job that pays the bills.

dragonwriter · on Dec 7, 2023

Being an adult entertainer on the internet is a real job (and may be one that pays the bills.)

dragonwriter · on Dec 7, 2023

Since when has everyone been able to have a job they liked?

TeaBrain · on Dec 8, 2023

It is beneficial when interviewing for just about any job.

rjblackman · on Dec 7, 2023

there are already generative models for that anyway.

93po · on Dec 7, 2023

have pitchforks and torches to revolt

stavros · on Dec 7, 2023

The only safe bet now is menial, manual work, as it's too expensive to automate in non-assembly-line cases.

arvinsim · on Dec 7, 2023

Work will just move to a higher level of abstraction.

drubio · on Dec 6, 2023

I'm wondering the same, but for the narrower white collar subset of tech workers, what will today's UX/UI designer or API developer be doing in 5-10 years.

spaceman_2020 · on Dec 6, 2023

Once the context window becomes large enough to swallow up the codebase of a small-mid sized company, what do all those IT workers that perform below the 50th percentile in coding tests even do?

HN has a blind spot about this because a lot of people here are in the top %ile of programmers. But the bottom 50th percentile are already being outperformed by GPT-4. Org structures and even GPT-4 availability hasn't caught up, but I can't see any situation where these workers aren't replaced en masse by AI, especially if the AI is 10% of the cost and doesn't come with the "baggage" of dealing with humans.

I don't think our society is prepared.

dragonwriter · on Dec 7, 2023

> Once the context window becomes large enough to swallow up the codebase of a small-mid sized company, what do all those IT workers that perform below the 50th percentile in coding tests even do?

There's a whole lot of work in tech (even specifically work "done by software developers") that isn't "banging out code to already completed specs".

TillE · on Dec 7, 2023

Yeah I think a lot of experienced developers are so immersed in software development that they forget how complex the process is, and how much knowledge they have to even know how to ask the right questions.

I mean, I thought that website frontend development would have long since been swallowed up by off-the-shelf WYSIWYG tools, that's how it seemed to be going in the late 90s. But the opposite has happened, there have never been more developers working on weird custom stuff.

xdennis · on Dec 8, 2023

You say that like it's not something that can also be automated.

josho · on Dec 7, 2023

If you make it cheaper then people will do more of it.

Look at how much more graphic design is starting to happen now that you can create an image in a few minutes.

So it means we’ll get more development projects because they’ll be cheaper.

And yes I do realize at some point we’ll still have a mass of unemployed skilled white collar workers like devs.

beepbooptheory · on Dec 6, 2023

What specific test do I take to know my percentile?

sdenton4 · on Dec 7, 2023

Just roll a d100; it will be about as useful...

If you roll over a 75, roll an additional d10 to find out your multiplier score (as in, a 10x programmer).

post-it · on Dec 6, 2023

UX/UI designers will use AI as part of their jobs. They'll be able to work at a higher level and focus less on boilerplate. That might mean fewer UX/UI jobs, but more likely the standard for app UX will go up. Companies are always going to want to differentiate their apps.

It's like how, in 2003, if your restaurant had a website with a phone number posted on it, you were ahead of the curve. Today, if your restaurant doesn't have a website with online ordering, you're going to miss out on potential customers.

API developers will largely find something else to do. I've never seen a job posting for an API developer. My intuition is that even today, the number of people who work specifically as an API developer for their whole career is pretty close to zero.

realharo · on Dec 7, 2023

Today, your restaurant's custom website largely doesn't matter, as ordering is done on delivery apps, and people visiting in person look at things like Google Maps reviews. Only reservations are not quite as consolidated yet.

Similarly, in the future, there may be no more "apps" in the way we understand them today, or they may become completely irrelevant if everything can be handled by one general-purpose assistant.

jboy55 · on Dec 6, 2023

What did photographers start doing when Photoshop was released? They started using Photoshop.

spaceman_2020 · on Dec 6, 2023

Except this is the first time we have a new "generalist" technology. When Photoshop was released, it didn't reduce employment opportunities for writers, coders, 3D designers, etc.

We're in truly unprecedented territory and don't really have an historical analogue to learn from.

jboy55 · on Dec 7, 2023

Maybe you are not quite recalling what happened when photoshop was released, it completely changed a whole industry of wet photography professionals. Those who would airbrush models, create montages from literally cutting and pasting.

Also, we told we were going into an age where anyone with $3000 for a PC/Mac and the software could edit reality. Society's ability to count on the authenticity of a photograph would be lost forever. How would courts work? Proof of criminality could be conjured up by anyone. People would be blackmailed left, right and center by the ability to cut and paste people into compromising positions and the police and courts would be unable to tell the difference.

The Quantel Paintbox was released in 1981 and by 1985 was able to edit photographs at film grain resolution. Digital film printers, were also able to output at film grain resolution, this started the "end of society", and when photoshop was introduced in 1990 it went into high gear.

In the end, all of that settled and we were left with, photographers just using Photoshop.

postexitus · on Dec 7, 2023

They were not around when photoshop was released.

drubio · on Dec 6, 2023

Exactly

And I actually thought photographers were extinct a long time ago by every human holding a cellphone (little to no need to know about lens apertures, lighting/shadows to take a picture). Its probably been a decade since I've seen anyone hauling around photograph equipment at an event. I guess some photographers still get paid good money, but they're surely multiples less than there were 10-20 years ago.

The NLP (Natural Language) is the killer part of the equation for these new AI tools. Simple as knowing English or any other natural language, to output an image, an app or whatever. And it's going to be just like cellphone cameras and photographers, the results are going to get 'good enough' that its going to eat into many professions.

dragonwriter · on Dec 7, 2023

> Except this is the first time we have a new "generalist" technology. When Photoshop was released, it didn't reduce employment opportunities for writers, coders, 3D designers, etc.

Computing has always been a generalist technology, and every improvement in software development specifically has impacted all the fields for which automation could be deployed, expanded the set of fields in which automation could economically be deployed, and eliminated some of the existing work that software developers do.

And every one one of them has had the effect of increasing employment in tech involved in doing automation by doing that. (And increased employment of non-developers in many automated fields, by expanding, as it does for automation, the applications for which the field is economically viable more than it reduces the human effort required for each unit of work.)

kart23 · on Dec 7, 2023

Hmmm... People probably said the same exact thing about taxi drivers and really anyone who drives for a living when waymo demo'd self driving cars 10 years ago.

butlike · on Dec 7, 2023

1. Compassion is key 2. I'm of the opinion one should listen to the people in the room who are more well-versed on the topic at hand. 3. Harmonious living. I like to write music as a passion. Many others have written music too. Whats the difference between that person being biologically-based, or transistor-based? 4. It's not a zero-sum game. It's not a chase game. It's play.

astrange · on Dec 6, 2023

Productivity enhancements increases employment. Saying they'd decrease them goes against all empirical evidence.

You might as well be worried the invention of the C compiler hurt jobs for assembly programmers.

bigyikes · on Dec 6, 2023

The analogy doesn’t hold and this comment won’t age well.

Photoshop doesn’t take photographs, so of course it hasn’t displaced photographers. It replaced the “shop” but the “photo” was up to the artist.

The irony is, Photoshop can generate photos now, and when it gets better, it actually will displace photographers.

jboy55 · on Dec 7, 2023

Its just going to become self aware and start spitting out photographs?

Every scenic view, every building, every proper noun in the world has already been photographed and is available online. Photographer as "capturer of things" has long been dead, and its corpse lies next to the 'realist painters' of the 1800s before the dawn of the photograph and the airbrush artists of the 50s, 60s and 70s.

However, my newborn hasn't, hot-celebrity's wardrobe last night outside the club hasn't, the winning goal of the Leaf's game hasn't, AI can't create photos of those.

And the conceptual artistic reaction to today's political climate can't, so instead of that artist taking Campbell Soup Cans and silkscreening its logo as prints, or placing the text, "Your Body is a Battle Ground" over two found stock photos of women, or perhaps an artist hiring craftspeople to create realistic sexual explicit sculptures of them having sex with an Italian porn star; an artist is just now going to ask AI to create what they are thinking as a photo, or as a 3D model.

Its going to change nothing, but be a new tool, that makes it a bit easier to create art than it has been in the last 120 years, when "Craft" no longer was defacto "Art".

butlike · on Dec 7, 2023

In the same way other photographers taking pictures have displaced each other throughout history?

butlike · on Dec 7, 2023

Exactly. When the train really gets rolling, us humans shouldn't eschew the value of being able to interact with the intelligences. For such quaint problems we'll have, it probably costs close to 0 effort to answer a question or two.

I'm picturing something like as an intreraction I'd like to have:

"Hey, do you mind listening to this song I made? I want to play it live, but am curious if there's any spots with frequencies that will be downright dangerous when played live at 100-110dB. I'm also curious if there's any spots that traditionally have been HATED by audiences, that I'm not aware of."

"Yeah, the song's pretty good! You do a weird thing in the middle with an A7 chord. It might not go over the best, but it's your call. The waves at 21k Hz need to go though. Those WILL damage someones ears."

"Ok, thanks a lot. By the way, if you need anything from me; just ask."

butlike · on Dec 7, 2023

Whatever you want, probably. Or put a different way: "what's a workforce?"

"We need to do a big calculation, so your HBO/Netflix might not work correctly for a little bit. These shouldn't be too frequent; but bear with us."

Go ride a bike, write some poetry, do something tactile with feeling. They're doing something, but after a certain threshold, us humans are going to have to take them at their word.

The graph of computational gain is going to go linear, quadratic, ^4, ^8, ^16... all the way until we get to it being a vertical line. A step function. It's not a bad thing, but it's going to require a perspective shift, I think.

Edit: I also think we should drop the "A" from "AI" ...just... "Intelligence."

gniv · on Dec 7, 2023

Yeah, this feels like the revenge of the blue collar workers. Maybe the changes won't be too dramatic, but the intelligence premium will definitely go down.

Ironically, this is created by some of the most intelligent people.

samr71 · on Dec 7, 2023

We're just gonna have UBI

butlike · on Dec 7, 2023

Totally. I think UBI will be the "energy meter" of the future. Like in a video game. You get xxx dollars or whatever. Buy whatever you need, but the cap is to make sure you don't act foolish. Your UBI tank gets replenished every month, but if you blow it all on a new bicycle and kitchen upgrade for your house, you can't continue on to buy a bathroom renovation or whatever. You have to wait.

dblitt · on Dec 6, 2023

> For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity.

Seems like this video was heavily editorialized, but still impressive.

nathanfig · on Dec 6, 2023

Definitely edited, pretty clear in some of the transitions. Makes me wonder how many takes were needed.

htrp · on Dec 6, 2023

Some poor AI / ML intern with a Stanford PhD probably spent an entire summer running these to get the take they needed.

andrewprock · on Dec 6, 2023

The prompts were also likely different:

video: "Is this the right order?"

blog post: "Is this the right order? Consider the distance from the sun and explain your reasoning."

https://developers.googleblog.com/2023/12/how-its-made-gemin...

huytersd · on Dec 7, 2023

You don’t know that. The responses in the video don’t line up. That blog post is just an alternative text prompt based version of what they showed on video.

EZ-E · on Dec 6, 2023

Out of curiosity I fed ChatGPT 4 a few of the challenges through a photo (unclear if Gemini takes live video feed as input but GPT does not afaik) and it did pretty well. It was able to tell a duck was being drawn at an earlier stage before Gemini did. Like Gemini it was able to tell where the duck should go - to the left path to the swan. Because and I quote "because ducks and swans are both waterfowl, so the swan drawing indicates a category similarity (...)"

nuccy · on Dec 6, 2023

Gemini made a mistake, when asked if the rubber duck floats, it says (after squeaking comment): "it is a rubber duck, it is made of a material which is less dense than water". Nope... rubber is not less dense (and yes, I checked after noticing, rubber duck is typically made of synthetic vinyl polymer plastic [1] with density of about 1.4 times the density of water, so duck floats because of air-filled cavity inside and not because of material it is made of). So it is correct conceptually, but misses details or cannot really reason based on its factual knowledge.

P.S. I wonder how these kind of flaws end up in promotions. Bard made a mistake about JWST, which at least is much more specific and is farther from common knowledge than this.

1. https://ducksinthewindow.com/rubber-duck-facts/

elbasti · on Dec 6, 2023

This is exactly the failure mode of GPTs that make me worry about the future idiotization of the world.

"Rubber ducks float because they are made of a material less dense than water" both is wrong but sounds reasonable. Call it a "bad grade school teacher" kind of mistake.

Pre-gpt, however, it's not the kind of mistake that would make it to print: people writing about rubber ducks were probably rubber duck experts (or had high school level science knowledge).

Print Is cite-able. Print perpetuates and reinforces itself. Some day someone will write a grade school textbook built with GPTs, that will have this incorrect knowledge, and so on.

But what will become of us when most gateways to knowledge are riddled with bullshit like this?

vineyardmike · on Dec 7, 2023

I think the exact opposite will happen. When I was in school, we were taught never to trust online sources, and students always rolled their eyes at teachers for being behind the times. Meanwhile, the internet slowly filled up with junk and bad information and horrible clickbait and “alternative facts”. GPT hallucinations are just the latest version of unreliable “user generated content”. And it’s going to be everywhere, and indistinguishable from any other content.

People will gladly tell you there’s so much content online and it’s so great that you don’t need college anymore (somewhat true). The internet has more facts, more knowledge, updated more often, than any written source in time. It’s just being lost in a sea of junk. Google won’t be able to keep up at indexing all the meaningless content. They won’t be able to provide meaningful search and filtering against an infinite sea of half truths and trash. And then they’ll realize they shouldn’t try, and the index will become a lot more selective.

Today, no one should trust online information. You should only trust information that genuinely would have editors and proof teams and publishers. I think this will finally swing the pendulum back to the value of publishers and gatekeepers of information.

myaccountonhn · on Dec 11, 2023

Yup! With search results being so bad these days, I've actually "regressed" to reading man pages, books and keeping personal notes. I found that I learn more and rely less on magic tools in the process.

da39a3ee · on Dec 8, 2023

Have you heard of Wikipedia? It’s actually rather good.

alright2565 · on Dec 7, 2023

> will become of us when most gateways to knowledge are riddled with bullshit like this?

I think we're already here. I asked Google Bard about the rubber ducks, then about empty plastic bottles. Bard apparently has a "fact check" mode that uses Google search.

It rated "The empty water bottle is made of plastic, which has a density lower than water" as accurate, using a Quora response which stated the same thing as a citation. We already have unknowlagable people writing on the internet; if anything these I hope these new AI things and the increased amount of bullshit will teach people to be more skeptical.

(and for what it's worth, ChatGPT 4 accurately answers the same question)

thehappypm · on Dec 6, 2023

Some rubber is less dense than water, and certainly the type in a rubbery ducky would be

HarHarVeryFunny · on Dec 7, 2023

FWIW those bathtub ducks are made of vinyl, not rubber, but more to the point given that it's hollow it's not the density of the material that determines whether it floats. A steel aircraft carrier floats too.

thehappypm · on Dec 7, 2023

Perhaps today they’re vinyl, mostly, but the AI wasn’t wrong in saying that if the duck was made of rubber it’d be less dense than water

HarHarVeryFunny · on Dec 7, 2023

The density of rubber would only be a factor if it was solid, not hollow, and then it'd depend on what kind of rubber. Most rubber is in fact more dense than water (per googling density of rubber).

The fact that it squeaks when squeezed, as well as common knowledge about bath ducks, tells you it's hollow, therefore the density of the material it is made of would be the wrong way to determine it it would float. It's the average density of the entire duck that matters (i.e. weight of duck vs weight of water displaced by duck's volume).

tim333 · on Dec 7, 2023

Modern 'rubber ducks' similar to the one in the picture aren't even made out of rubber but plastic. They get called rubber ducks because they were make of rubber when invented in the late 1800s. Amazing what you can learn on Wikipedia.

ec109685 · on Dec 7, 2023

GPT also fails at this:

> Which weighs more a pound of feathers or a pound of feathers

< A pound of feathers and a pound of bricks weigh the same. Both are one pound. The difference lies in volume and density: feathers take up more space and are less dense, while bricks are denser and take up less space.

Bard does better but still doesn't "get" it:

< Neither! Both a pound of feathers and a pound of feathers weigh the same, which is exactly one pound. In other words, they have the same mass.

< This is a classic riddle that plays on our expectations and assumptions. We often associate weight with density, so we might initially think that feathers, being lighter and fluffier than other materials, would weigh less than something more compact like metal. However, as long as both piles of feathers are measured to be exactly one pound, they will weigh the same.

At least it recognizes its limitations:

> My reason for mentioning other materials was likely due to my training data, which contains a vast amount of information on various topics, including the concept of weight and density. As a large language model, I sometimes tend to draw on this information even when it is not directly relevant to the current task. In this case, I made the mistake of assuming that comparing feathers to another material would help clarify the point, but it only served to complicate the matter.

For ChatGPT if you ask it to solve it step by step, it does better: https://chat.openai.com/share/7810e5a6-d381-48c3-9373-602c14...

jiggawatts · on Dec 6, 2023

I noticed the same thing, and it's relevant to the comparison results of Gemini vs ChatGPT that GPT 3.5 makes the exact same mistake, but GPT 4 correctly explains that the buoyancy is caused by the air inside the ducky.