Ideally, new versions of GPT (and other LLMs) will also be trained on this newly added, mostly hallucinated data when Wikipedia is used as part of the training set, baking those past hallucinations into the model. Other users will then get additional hallucinated facts that fit the existing fictitious data, slowly creating a new "GPT cinematic universe" for, e.g., small towns in Nebraska.
And if people fact-check it with Bing, there is a good chance it will say "Yeah, that checks out, here's Wikipedia and <some Wikipedia mirror> as a source". A creative derivation of XKCD's citogenesis[1] :D
I got ChatGPT to hallucinate actual sources that I can readily copy-paste into Wikipedia:
“Give me 3 citations about New East Lake, Michigan, that I can use on wikipedia, in mediawiki syntax. Use actual book names and URLs instead of examples.”
Note that this town doesn’t exist. But now I can pretend it does.
Note, I don’t want to do this, of course. But it shows that you really need to check sources on Wikipedia, especially nowadays, when it’s so easy to generate realistic-looking nonsense.
The only thing GPT changes here is the speed at which you can do that, and if you start mass-spamming such articles, that will actually make them easier to spot.
Certainly cool, but it's getting to be a lot, seeing just how good GPT is at generating utterly useless cheap content. Making content is a moving target, and GPT by nature produces uncannily generic output.
Please ship Firefox versions from the start. You are only playing into the Chrome monopoly by treating Firefox users as second-class citizens and then complaining there is no "demand".
I've been following XP1 and I don't think they have nefarious goals with this. The extension is even open source (though not sure if the proxy is). Check it out here: https://github.com/dust-tt/dust/tree/main/xp1
I'm working on a somewhat related product (except bringing this assistant capability to all apps on your computer, all browsers, and using mostly on-device ML...waitlist in my profile in case you're curious)
What we've discussed internally is having two modes for the cases where we do need network connection:
1) A turn-key, use-our-OpenAI/HuggingFace/whatever proxy that doesn't store anything, just adds our token and pays for it on your behalf
2) Bring-your-own key for each service
The fact is that most users who just want to use these kinds of productivity tools might not have their own OpenAI/Azure/etc account, so offering option 1 and even defaulting to it is right for most end-users.
I think XP1 is making the right call here with this default, though offering #2 would be nice!
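To make the split concrete, here is a rough sketch of how the two modes might look from the extension's side. Everything here is hypothetical (the names, URLs, and proxy endpoint are stand-ins), not XP1's or our actual implementation:

```
// Hypothetical sketch of the two key-handling modes (illustrative only).
type KeyMode =
  | { kind: "proxy" }                  // mode 1: vendor proxy injects its own key
  | { kind: "byok"; apiKey: string };  // mode 2: user brings their own key

function completionRequest(mode: KeyMode): { url: string; headers: Record<string, string> } {
  if (mode.kind === "byok") {
    // Talk to OpenAI directly; the vendor never sees the request.
    return {
      url: "https://api.openai.com/v1/chat/completions",
      headers: { Authorization: `Bearer ${mode.apiKey}` },
    };
  }
  // The proxy adds the vendor's key server-side and bills the vendor.
  return { url: "https://proxy.example.com/v1/chat/completions", headers: {} };
}
```

A nice property of mode 2 is that requests never have to touch the vendor's servers at all.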
Does the extension being open-source actually help here? I don't have time to look through the whole thing, but essentially it sounds like I can verify for myself that all of my requests are being sent to dust for them to... store for debugging purposes and not make any money while paying everyone's OpenAI fees? Doesn't seem believable to me.
Well, you can verify whether it is sending all data all the time, or only when you invoke it for a request.
I believe most people are uncomfortable with all data being sent all the time, but more okay with sending some data in the exact cases they choose, since they have control.
So I'd argue that the extension being open source helps a lot.
You're right that it doesn't guarantee anything for how the server is behaving in the case that you do invoke it though. For that we'd need either transparency into that source code and server operations, or, more likely, a strong privacy policy and maybe SOC2 or other certifications.
I believe XP1 is subsidizing this right now to grow their user base and attract investors while they develop their LLM platform, with plans to probably charge business users down the road, but they don't seem to state that intention as clearly as they could.
I mean, you can verify the requests your browser is sending off whether the extension is open-source or not (by using developer tools in your browser or proxying the requests). So I don't think being open-source helps all that much really.
I still don't think it's appropriate for them to be using responding to emails as an example in their docs, especially without a warning. If someone went around sharing my private conversations with another person without telling me, I'd lose trust in that person. They'll get away with it because some people don't see sharing data with a software company as the same thing, and it's tough to know when it's happening, but nevertheless, it's sketchy.
I installed the extension and asked it to reply to your comment, and thought the response was impressive. I myself am concerned about the privacy implications.
"Hi yunwal,
Regarding your concern about the privacy of Dust XP1, it's understandable to be cautious when it comes to sharing personal information. However, as mentioned in the Discord message shared by spolu, Dust XP1 only sends your requests to OpenAI and stores them for debugging purposes. They do not fetch or store anything else than what is required to process your requests. Additionally, Dust XP1 is open source, which means you can look at the code if needed. If you have any further questions, feel free to ask."
So, they do store requests for "debugging" purposes, which means they have a record of everything you're doing with the extension and probably the page you're on while you're doing it.
I'm not sure why this is necessary. Surely, they're just scraping the page and using it as context in a ChatGPT prompt. They don't have to proxy it through their servers to do that.
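To illustrate, a bare-bones client-side version could look something like this. This is a hypothetical sketch, not XP1's code; `userApiKey` and `userQuestion` stand in for values the extension would collect and keep locally:

```
// Hypothetical sketch: scrape the page and call OpenAI directly with a
// user-supplied key; no intermediary server involved.
async function askAboutPage(userApiKey: string, userQuestion: string): Promise<string> {
  const pageText = document.body.innerText.slice(0, 12000); // crude context cap

  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${userApiKey}`, // stored locally, never proxied
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [
        { role: "system", content: "Answer using the provided page text as context." },
        { role: "user", content: `${pageText}\n\nQuestion: ${userQuestion}` },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```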
Further down the thread someone associated with the company, u/spolu, clarified that yes, XP1 does currently store requests for debugging purposes, but that a formal privacy policy is forthcoming.
Hopefully that policy can assuage my concerns, since I'd love to use this! It looks similar to the Edge AI features Microsoft teased at the new Bing release, and I keep thinking how handy it would be to have that toolbox available while I'm browsing for work.
Only the text content of the tabs you select and submit are sent through our servers to OpenAI's API. Cookies, tab list, or non-submitted tab content are never sent.
A bit less than what the model came up with.
Also, are statements made by an LLM about statements made on Discord legally binding?
Are you genuinely curious, or are you asking because you're implying that the people who would use such an app are somehow not understanding something or not intelligent, or don't know something that you do? Like you have to prove your case too, as you're not immediately "right" in your statement. Sure there is some level of "risk" in doing this, but there is risk in a lot of things. It's like me asking people this:
"Why would anyone trust people with 4 weeks driving classes and a test with their lives on a road driving 80mph inside 2-tonne metal cages? Seems insane."
> "Why would anyone trust people with 4 weeks driving classes and a test with their lives on a road driving 80mph inside 2-tonne metal cages? Seems insane."
We've run the experiment for n decades with billions of subjects and the results indicate it is actually not that bad.
We have -not- really even discussed, much less considered, the implications of having a central system with machine intelligence designed to extract features, patterns, emotions, assumed motives, ..., having access to the entire digital lives of societies.
Is it insane to repeat the same mistakes? Not sure but it's somewhere close in the neighborhood. We could run it by a k-nearest algorithm and see what that suggests as a better category than 'insane'.
> We have -not- really even discussed, much less considered, the implications of having a central system with machine intelligence designed to extract features, patterns, emotions, assumed motives, ..., having access to the entire digital lives of societies.
Actually we've pretty much been watching that play out for years now, even if the technology hasn't been in its final form for the duration. Results have already been a wee bit society-destroying.
There is no business model here that doesn't include "We sell your most private and sensitive data to the highest bidder". Because it's free, they can't make money any other way. And while for other browser extensions/software, you can at least audit the requests being made by the extension, this thing is sending away all your data because it has to in order to work.
I had a good experience testing out "gpt-3.5-turbo". I made a bot for the Matrix server we use at work so that co-workers can query and work with the AI assistant collaboratively and in any group chat.
It works pretty well so far and is fairly inexpensive. The markdown sent by the AI gets rendered nicely in Element and Cinny. I want to open-source it soon.
Both work. I have an SQLite DB where I save "rooms", "sessions" and "messages", so multiple users can contribute to the same chat session, and I pass the whole conversation history to the AI as context. There are also commands to switch and delete sessions. When the bot is mentioned in a message, it gets the answer from the API in a different thread and shows a writing indicator.
It also plays super nicely with the Element Matrix client, which supports tables and code blocks. The OpenAI API sends back markdown, which I can parse into HTML and send as such.
I want to clean up my code a bit and then open source the bot.
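In the meantime, here is a rough sketch of the core loop so you can see the shape of it. This is illustrative only, not the actual bot code; it assumes matrix-js-sdk, better-sqlite3 and marked, and collapses everything into a single "default" session per room to keep it short:

```
// Illustrative sketch of the bot's core loop (not the real, unreleased code).
import * as sdk from "matrix-js-sdk";
import Database from "better-sqlite3";
import { marked } from "marked";

const db = new Database("bot.db");
db.exec(`CREATE TABLE IF NOT EXISTS messages
         (room_id TEXT, session_id TEXT, role TEXT, content TEXT)`);

const client = sdk.createClient({
  baseUrl: "https://matrix.example.org", // placeholder homeserver
  accessToken: process.env.MATRIX_TOKEN!,
  userId: "@aibot:example.org",
});

client.on(sdk.RoomEvent.Timeline, async (event, room) => {
  if (!room || event.getType() !== "m.room.message") return;
  if (event.getSender() === client.getUserId()) return; // ignore own messages
  const body: string = event.getContent().body ?? "";
  if (!body.includes("@aibot")) return; // only answer when mentioned

  // Store the user's turn, then replay the whole session as context.
  db.prepare("INSERT INTO messages VALUES (?, ?, ?, ?)")
    .run(room.roomId, "default", "user", body);
  const history = db
    .prepare("SELECT role, content FROM messages WHERE room_id = ? AND session_id = ?")
    .all(room.roomId, "default");

  client.sendTyping(room.roomId, true, 30000); // the "writing" indicator

  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: "gpt-3.5-turbo", messages: history }),
  });
  const answer = (await res.json()).choices[0].message.content;

  db.prepare("INSERT INTO messages VALUES (?, ?, ?, ?)")
    .run(room.roomId, "default", "assistant", answer);
  client.sendTyping(room.roomId, false, 0);
  // Element renders the HTML body, so the markdown converts cleanly.
  client.sendHtmlMessage(room.roomId, answer, marked.parse(answer) as string);
});

client.startClient();
```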
Tried it on this page with the following prompt: "read the comments and write a summary of what people think of this product"
GPT: Users are impressed by the capabilities of the product and its ability to generate content quickly. However, some users are concerned about the privacy implications of the product and are hesitant to use it due to the fact that it sends personal information to a third-party. The fact that the extension is open source helps to alleviate some of these concerns, but others remain skeptical about the intentions of the developers. Overall, there is a mix of excitement and caution surrounding this product.
Information needs to start having a "Free of AI-generated content" seal of approval, akin to the GMO battle being fought elsewhere.
I mean, we already process high-fructose digitally-altered content, that ship has sailed - but we really ought to start having an "AI / not-an-AI" tag we can put on things...
(The moment I detect you've sent me an AI-created email about our project, I will replace you with a shell script.)
/disclaimer: I may have generated this comment without any intelligence, whatsoever.
Unfortunately, Dust.tt doesn't use a modern email validation method and so fails on "modern" TLDs like .media, making it impossible to sign up for an activation key with such an address.
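For what it's worth, a deliberately permissive check sidesteps this whole class of bug (a sketch, not Dust.tt's actual code):

```
// Permissive on purpose: TLD lists go stale, so don't allowlist them.
// The only real validation is whether the activation mail actually arrives.
const looksLikeEmail = (addr: string): boolean =>
  /^[^\s@]+@[^\s@]+\.[^\s@]{2,}$/.test(addr);

looksLikeEmail("signup@example.media"); // true: no TLD allowlist to fall behind
```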
Not sure about their policy/internals myself, but since GPT-3.5-turbo is 1/10th the cost of the previous model, it's really hard now to run up a significant (to a funded startup) bill via legitimate usage, especially since at this stage any legit usage may be worth its weight in lessons about how the tool can be used. It may be worth adding rate limiting / DoS protection, but even then, they could just cap their OpenAI spending and spend the time to patch it if/when abuse happens (e.g. by importing @upstash/ratelimit or adding a quick Cloudflare rule).
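For example, a minimal per-user cap with @upstash/ratelimit could look like this (a sketch; the quota and window values are made up):

```
// Sketch of a per-user request cap using @upstash/ratelimit.
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(), // reads UPSTASH_REDIS_REST_URL / _TOKEN from env
  limiter: Ratelimit.slidingWindow(20, "1 m"), // 20 requests per user per minute
});

export async function guard(userId: string): Promise<void> {
  const { success } = await ratelimit.limit(userId);
  if (!success) throw new Error("Rate limit exceeded, try again shortly.");
}
```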
Which raises a concern about privacy, since they're obviously going to be spying on your usage, and the tool has access to potentially sensitive information in your browser tabs.
Hi, this is a fair concern. We're super early and working on a proper privacy policy as we speak. But we also provided some color about how we handle your data on our Discord. Copying it here:
```
The privacy section on the landing README remains true. We just send your requests to OpenAI and store them for debugging purposes, but we don't fetch or store anything other than what is required to process your requests, of course. XP1 being open source, you can also look at the code if needed, but we're happy to answer any questions.
In short:
- Requests (including the text dump of tabs you select) go to the Dust main platform
- They are processed as part of a Dust app whose Run object is stored
- The LLM query is sent to OpenAI (retention policy 30 days, not used for training)
- The response is stored as part of Dust's Run object
- The response is streamed back to the client
```
This project looks great, but it's going to be a no from me until your formal policy clarifies this point. Do those requests still "include the text dump of tabs you select" when stored? It's not that I don't trust you folks, it's that I can't trust the entire wider world not to eventually break into or subpoena your debugging repository.
Further up in the thread, people asked your extension about privacy concerns, and at least one person assumed the response's remark about storing requests for debugging purposes must have been an AI hallucination.
"store them for debugging purposes" is a bit concerning if they then become available if law enforcement requests data, or if you guys are hacked and everything leaks.
But they're still sent through your server for pretty much no reason?