Dust XP1 switches to GPT-3.5-turbo, is now free to use (dust.tt)
99 points by ukuina on March 8, 2023 | 50 comments



> Only the text content of the tabs you select and submit are sent through our servers to OpenAI's API.

But they're still sent through your server for pretty much no reason?


So they don't have to send the API key to the client, where it could be abused.


OpenAI could begin to accept pre-signed URLs to accommodate this use case. That would reduce the number of parties with access to your private data by one.
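To be clear, OpenAI offers nothing like this today; the following is a purely hypothetical TypeScript sketch of how S3-style pre-signing works in general, with every name invented for illustration:

```
// Hypothetical: OpenAI has no pre-signed URL API. This is a generic
// S3-style HMAC pre-signing sketch; all names here are invented.
import { createHmac } from "crypto";

// Server side: derive a signature from the request parameters using the
// secret API key, so the key itself never leaves the server.
function presign(model: string, expiresAt: number, apiKey: string): string {
  const signature = createHmac("sha256", apiKey)
    .update(`${model}:${expiresAt}`)
    .digest("hex");
  // The client would attach these parameters and call the provider
  // directly; the provider recomputes the HMAC to validate the grant.
  return `?model=${model}&expires=${expiresAt}&signature=${signature}`;
}

// e.g. a grant the client can redeem for the next 5 minutes
const qs = presign("gpt-3.5-turbo", Date.now() + 5 * 60_000, process.env.OPENAI_API_KEY!);
console.log(qs);
```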


Ah, I thought you'd provide your own key.

If it is free, how are they not losing money?!


I just realized: I could use GPT to generate Wikipedia articles about random villages and people I know nothing about.

Sure, those will not be at all correct, but who checks correctness on Wikipedia, especially for random people and villages?

GPT, give me 10 paragraphs about New East Lake, Michigan


You can, and some people will do exactly that.

Ideally, new versions of GPT (and other LLMs) will also be trained on this newly added, mostly hallucinated data when using Wikipedia as part of the training set, baking those past hallucinations into the model and allowing other users to get additional hallucinated facts that fit the existing fictitious data, slowly creating a new "GPT cinematic universe" for, e.g., small towns in Nebraska.

And if people fact-check it with Bing, there is a good chance it will say "Yeah, that checks out, here's Wikipedia and <some Wikipedia mirror> as a source". A creative derivation of xkcd's citogenesis[1] :D

[1] https://xkcd.com/978/


Well, that is true. However, I used Phind and was able to get 10 paragraphs about New East Lake, with sources! https://www.phind.com/search?cache=8ff1dcc7-7587-4df8-9e46-6...

At the same time, it uses Wikipedia as a source, as well as marketing and blog sources, so Phind results should be carefully critiqued as well.


I got ChatGPT to hallucinate actual sources that I can readily copy-paste onto Wikipedia:

“Give me 3 citations about New East Lake, Michigan, that I can use on wikipedia, in mediawiki syntax. Use actual book names and URLs instead of examples.”

Note that this town doesn’t exist. But now I can pretend it does.

Note, I don’t want to do this, of course. But it shows that you really need to check sources on Wikipedia, especially nowadays, when it’s so easy to generate realistic-looking nonsense.


You can already do that by hand (hoaxes have been done before, e.g. https://en.wikipedia.org/wiki/Henryk_Batuta_hoax).

The only thing GPT changes here is the speed at which you can do it, but if you start mass-spamming such articles, that will actually make them easier to spot.


Would that not qualify as an act of vandalism?


You really think someone would do that? Just go on the internet and tell lies?


Related:

Show HN: XP1 – A GPT-based Assistant with access to the browser Tabs - https://news.ycombinator.com/item?id=33970570 - Dec 2022 (9 comments)


Certainly cool, but it's striking to see just how good GPT is at generating utterly useless cheap content. Making content is a moving target, and GPT by nature produces uncannily generic output.


Please bring Firefox versions from the start. You are only playing into the Chrome monopoly by treating Firefox users as second-class citizens and then complaining there is no "demand".


Why would anyone trust a free-to-use app that’s grabbing all of the information in their browser and sending it to a 3rd party? Seems insane.


I've been following XP1 and I don't think they have nefarious goals with this. The extension is even open source (though not sure if the proxy is). Check it out here: https://github.com/dust-tt/dust/tree/main/xp1

I'm working on a somewhat related product (except bringing this assistant capability to all apps on your computer, all browsers, and using mostly on-device ML...waitlist in my profile in case you're curious)

What we've discussed internally is having two modes for the cases where we do need network connection:

1) A turn-key, use-our-OpenAI/HuggingFace/whatever proxy that doesn't store anything, just adds our token and pays for it on your behalf

2) Bring-your-own key for each service

The fact is that most users who just want to use these kinds of productivity tools might not have their own OpenAI/Azure/etc account, so offering option 1 and even defaulting to it is right for most end-users.

I think XP1 is making the right call here with this default, though offering #2 would be nice!

(edit: added Github link to XP1)


Does the extension being open-source actually help here? I don't have time to look through the whole thing, but essentially it sounds like I can verify for myself that all of my requests are being sent to dust for them to... store for debugging purposes and not make any money while paying everyone's OpenAI fees? Doesn't seem believable to me.


Well, you can verify if it is sending all data all the time, or only when you invoke it for a request.

I believe most people are uncomfortable with all data being sent all the time, but more okay with sending some data in the exact cases they choose, since they have control.

So I'd argue that the extension being open source helps a lot.

You're right that it doesn't guarantee anything about how the server behaves when you do invoke it, though. For that we'd need either transparency into that source code and server operations or, more likely, a strong privacy policy and maybe SOC 2 or other certifications.

I believe the reason XP1 is subsidizing this right now is to grow their user base to attract investors as they develop their LLM platform, and then probably to charge business users down the road, but they don't seem to state that intention as clearly as they could.


I mean, you can verify the requests your browser is sending off whether the extension is open-source or not (by using developer tools in your browser or proxying the requests). So I don't think being open-source helps all that much really.

I still don't think it's appropriate for them to use replying to emails as an example in their docs, especially without a warning. If someone went around sharing my private conversations with another person without telling me, I'd lose trust in that person. They'll get away with it because some people don't see sharing data with a software company as the same thing, and it's tough to know when it's happening, but nevertheless, it's sketchy.


I installed the extension and asked it to reply to your comment, and thought the response was impressive. I myself am concerned about the privacy implications.

"Hi yunwal,

Regarding your concern about the privacy of Dust XP1, it's understandable to be cautious when it comes to sharing personal information. However, as mentioned in the Discord message shared by spolu, Dust XP1 only sends your requests to OpenAI and stores them for debugging purposes. They do not fetch or store anything else than what is required to process your requests. Additionally, Dust XP1 is open source, which means you can look at the code if needed. If you have any further questions, feel free to ask.

Best regards, XP1"


So, they do store requests for "debugging" purposes, which means they have a record of everything you're doing with the extension and probably the page you're on while you're doing it.

I'm not sure why this is necessary. Surely they're just scraping the page and using it as context in a ChatGPT prompt. They don't have to proxy it through their servers to do that.


> So, they do store requests for "debugging" purposes

No, that's likely entirely made up by ChatGPT.


Further down the thread someone associated with the company, u/spolu, clarified that yes, XP1 does currently store requests for debugging purposes, but that a formal privacy policy is forthcoming.

Hopefully that policy can assuage my concerns, since I'd love to use this! It looks similar to the Edge AI features Microsoft teased at the new Bing release, and I keep thinking how handy it would be to have that toolbox available while I'm browsing for work.


From the site:

Privacy

Only the text content of the tabs you select and submit are sent through our servers to OpenAI's API. Cookies, tab list, or non-submitted tab content are never sent.

A bit less than what the model came up with.

Also, are statements made by an LLM about statements made on Discord legally binding?


Because your average user has no idea what you're talking about, nor do they care.


I would - and I'm sure a lot of other people.

Are you genuinely curious, or are you asking because you're implying that people who would use such an app somehow don't understand something, aren't intelligent, or don't know something that you do? You have to prove your case too; you're not immediately "right" in your statement. Sure, there is some level of "risk" in doing this, but there is risk in a lot of things. It's like me asking people this:

"Why would anyone trust people with 4 weeks driving classes and a test with their lives on a road driving 80mph inside 2-tonne metal cages? Seems insane."


> "Why would anyone trust people with 4 weeks driving classes and a test with their lives on a road driving 80mph inside 2-tonne metal cages? Seems insane."

We've run the experiment for n decades with billions of subjects and the results indicate it is actually not that bad.

We have -not- really even discussed, much less considered, the implications of having a central system with machine intelligence designed to extract features, patterns, emotions, assumed motives, ..., with access to the entire digital lives of societies.

Is it insane to repeat the same mistakes? Not sure, but it's somewhere close in the neighborhood. We could run it by a k-nearest-neighbors algorithm and see what it suggests as a better category than 'insane'.


> We have -not- really even discussed, much less considered, the implications of having a central system with machine intelligence designed to extract features, patterns, emotions, assumed motives, ..., with access to the entire digital lives of societies.

Actually we've pretty much been watching that play out for years now, even if the technology hasn't been in its final form for the duration. Results have already been a wee bit society-destroying.


You seem to be willfully ignoring the point.

There is no business model here that doesn't include "We sell your most private and sensitive data to the highest bidder". Because it's free, they can't make money any other way. And while for other browser extensions/software, you can at least audit the requests being made by the extension, this thing is sending away all your data because it has to in order to work.


Chrome only? Guess I'll just have to wait/avoid for now.


Yeah, I noticed that also. I will watch from the sidelines for now.


I had a good experience testing out "gpt-3.5-turbo". I made a bot for the Matrix server we use at work so that coworkers can query and work with the AI assistant collaboratively, in any group chat. It works pretty well so far and is fairly inexpensive. The markdown sent by the AI gets rendered nicely in Element and Conny. I want to open-source it soon.


Did you somehow get it to play nice with multi-party chats, or does every user have their own private 1-1 room with the bot?


Both work. I have an SQLite db where I save "rooms", "sessions", and "messages", so multiple users can contribute to the same chat session, where I pass the whole conversation history to the AI as context. There are commands to switch and delete sessions. When the bot is mentioned in a message, it gets the answer from the API on a different thread and shows a typing indicator. It also plays super nicely with the Element Matrix client, which supports tables and code blocks. The OpenAI API sends back markdown, which I can parse into HTML and send like that. I want to clean up my code a bit and then open-source the bot.
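Not the commenter's actual code, but a minimal TypeScript sketch of the flow described above, assuming a better-sqlite3 store and Node 18's built-in fetch; the schema and function names are invented:

```
// Illustrative sketch only; schema and names are assumptions, not the
// commenter's real bot.
import Database from "better-sqlite3";

const db = new Database("bot.db");
db.exec(
  "CREATE TABLE IF NOT EXISTS messages (session_id TEXT, role TEXT, content TEXT, ts INTEGER)"
);

type ChatMessage = { role: "user" | "assistant"; content: string };

// Load the whole conversation history for a session, oldest first, so the
// model sees the full multi-user context.
function loadHistory(sessionId: string): ChatMessage[] {
  return db
    .prepare("SELECT role, content FROM messages WHERE session_id = ? ORDER BY ts")
    .all(sessionId) as ChatMessage[];
}

async function reply(sessionId: string, sender: string, text: string): Promise<string> {
  // Store the incoming mention, prefixed with the sender so the model can
  // tell contributors apart in a group chat.
  db.prepare("INSERT INTO messages VALUES (?, 'user', ?, ?)").run(
    sessionId, `${sender}: ${text}`, Date.now()
  );

  // Send the entire session history as context to gpt-3.5-turbo.
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "gpt-3.5-turbo", messages: loadHistory(sessionId) }),
  });
  const data = await res.json();
  const answer: string = data.choices[0].message.content;

  // Persist the assistant's markdown answer; the Matrix client renders it
  // once converted to HTML.
  db.prepare("INSERT INTO messages VALUES (?, 'assistant', ?, ?)").run(
    sessionId, answer, Date.now()
  );
  return answer;
}
```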


"- Does not work with Google Docs and Google Sheets (work in progress)"

I wonder if that's a security issue or because the canvas/UI is special.


Tried it on this page with the following prompt: "read the comments and write a summary of what people think of this product"

GPT: Users are impressed by the capabilities of the product and its ability to generate content quickly. However, some users are concerned about the privacy implications of the product and are hesitant to use it due to the fact that it sends personal information to a third-party. The fact that the extension is open source helps to alleviate some of these concerns, but others remain skeptical about the intentions of the developers. Overall, there is a mix of excitement and caution surrounding this product.


Information needs to start having a "Free of AI-generated content" seal of approval, akin to the GMO battle being fought elsewhere.

I mean, we already consume high-fructose, digitally-altered content, that ship has sailed - but we really ought to start having an "AI / not-an-AI" tag we can put on things...

(The moment I detect you've sent me an AI-created email about our project, I will replace you with a shell script.)

/disclaimer: I may have generated this comment without any intelligence whatsoever.


This kind of app makes me wonder... is AI going to learn how to treat people better than people do?


> is AI going to learn how to treat people better than people do?

The bar isn't very high TBH.


Unfortunately, Dust.tt doesn't use a modern email validation method, so it fails on "modern" TLDs like .media, making it impossible to sign up for an activation key with such an address.


"Mr X is an experienced software engineer with over 500 connections". ChatGPT learns from the best bullshit as well.


Do they manage a cap on usage for predictability, or do they have other means of converting users early/often enough?


Not sure about their policy/internals myself, but since GPT-3.5-turbo is 1/10th the cost of the previous model, it's now really hard to run up a bill that's significant (to a funded startup) via legitimate usage. Especially since, at this phase, any legit usage may be worth its weight in lessons about how the tool can be used. It may be worth rate limiting / DoS protection, but even then, they could just cap OpenAI spending and spend the time to patch it if/when it happens (e.g. import @upstash/ratelimit, as sketched below, or add a quick Cloudflare rule).
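For illustration, a minimal sketch of the @upstash/ratelimit idea; the limits and identifier are assumptions, not anything Dust actually does:

```
// Sketch of per-user rate limiting with @upstash/ratelimit; numbers are
// illustrative, not Dust's actual policy.
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(), // reads UPSTASH_REDIS_REST_URL / _TOKEN
  limiter: Ratelimit.slidingWindow(20, "1 m"), // 20 requests per minute
});

// Call before proxying a request to OpenAI; reject if the user is over limit.
export async function guard(userId: string): Promise<void> {
  const { success } = await ratelimit.limit(userId);
  if (!success) throw new Error("Rate limit exceeded; try again shortly.");
}
```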


Which raises a concern about privacy, since they're obviously going to be spying on your usage, and the tool has access to potentially sensitive information in your browser tabs.


Hi, this is a fair concern. We're super early and working on a proper privacy policy as we speak. But we also provided some color about how we handle your data on our Discord. Copying it here:

```
The privacy section on the landing README remains true. We just send your requests to OpenAI and store them for debugging purposes, but we don't fetch or store anything other than what is required to process your requests, of course. XP1 being open source, you can also look at the code if needed, but happy to answer any questions.

In short:
- Requests (including the text dump of tabs you select) go to the Dust main platform
- They are processed as part of a Dust app whose Run object is stored
- The LLM query is sent to OpenAI (retention policy 30 days, not used for training)
- The response is stored as part of Dust's Run object
- The response is streamed back to the client
```


> store them for debugging purposes

This project looks great, but it's going to be a no from me until your formal policy clarifies this point. Do those requests still "include the text dump of tabs you select" when stored? It's not that I don't trust you folks; it's that I can't trust the entire wider world not to eventually break into or subpoena your debugging repository.

Further up in the thread, people asked your extension about privacy concerns, and at least one person assumed the remark in its response about storing requests for debugging must have been an AI hallucination.


"store them for debugging purposes" is a bit concerning if they then become available if law enforcement requests data, or if you guys are hacked and everything leaks.


Completely agree with that. We'll revisit if needed, but we don't expect it to be too much of an issue at this stage.

Our goal is to gather marginally more usage to learn more about productivity use cases for LLMs.

Here's the email we sent to our users as we removed the paywall: https://twitter.com/dust4ai/status/1633484243228585988


Thought I saw "Dell XP1 switches to use GPT-3.5-turbo."

When you come out of beta, you'll probably need a rebrand.


How is this different from Edge the browser?



