He can make the same criticism of Internet searches as he does of GPT: you shoul...

anotherpaulg · on April 3, 2023

Gish looks really nice. I'm going to give it a try.

It seems like you've been using similar workflows to what I've been trying for coding with gpt?

https://github.com/paul-gauthier/easy-chat#created-by-chatgp...

dror · on April 3, 2023

Pretty much, except that I'm automating everything as much as I can, so that I just give the instructions and GPT does the rest. Here's an example:

-----

#import ~/work/gish/tasks/coding.txt

Change the following so that it looks for the open AI key in the following fashion:

1. env variable

2. os.home()/.openai

3. Throws an exception telling the user to put it in one of the above, and then exits

#diff ~/work/gish/src/LLM.ts

-----

Puts me in vimdiff comparing the old code with the generated code letting me review and cherry pick the changes.

anotherpaulg · on April 3, 2023

Ya, that's pretty much my workflow as well. Though for my little web app, I could give it the whole ball of html/css/js each time.

I haven't seen anyone else describing this workflow. Feed it the existing code, ask it to modify/improve/fix the code and output a new version of all the input code, review diffs.

It has downsides, because you can easily run out of context window of chatgpt-3.5-turbo. But I am getting much better code out of it versus other approaches I've tried. And it's a very efficient and natural workflow -- we're used to getting and reviewing diffs/PRs from human collaborators.

jazzyjackson · on April 3, 2023

fabulous

tornato7 · on April 4, 2023

> The cost is based on the assumption that you're using GPT3.5 at $0.02 per 1000 tokens.

It's actually $0.002/1k, FYI

meeka · on April 3, 2023

Speaking of AI generated pages, I wonder how OpenAI filter these low quality web pages out of their training set as they continue to training.

Also, I wonder how they decide what code is worth training on. Because a lot of code is written in poor style/has technical debt, it might be the case that these LLMs in the long run lead to an increase in the technical debt in our society. Plus, eventually, and this might already be happening, the LLM are going to end up training on their own outputs, so that could lead to self immolation by the model. I am not certain RLHF completely resolves this issue.

wwweston · on April 5, 2023

> I wonder how OpenAI filter these low quality web pages out of their training set as they continue to training.

This. The value proposition is very clearly tied to the quality of the training data, and if there's secret sauce for automatically determining information quality that's obviously huge. Google was built in part on such insights. I suspect they do have something. I'd be utterly astonished if quality sorting were an emergent property of LLMs (especially given it's iffy in humans).

The problem, of course, is that if they do have a way of privileging data for training, that information is going to be the center of the usual arms race for attention and thinking. It can't be truly public or it's dead.

jazzyjackson · on April 3, 2023

yea i'm kind of shocked none of these models implement any kind of fingerprinting, something encoded in zero width spaces or other invisible unicode. It would be trivial to delete it but for the vast majority of cases, it would allow content to be flagged as model output-do not ingest

lupire · on April 3, 2023

If they aren't using Bing as a quality filter, they are crazy or stupid.

swader999 · on April 3, 2023

Google and others would be wise to add a date filter of "before summer 2023". Maybe a bit longer, but not much time left till AI spam really takes over.

fomine3 · on April 3, 2023

Spammers will set article date to 2021

flangola7 · on April 3, 2023

You can't fake a domain registration date

fomine3 · on April 3, 2023

I should invest to used domain market

lupire · on April 3, 2023

A lot of people owned domains before 2021.

rimliu · on April 3, 2023

I imagine ChatGPT will go from the "future changing" to "glorified spam generator" quite quickly.