Hacker News new | past | comments | ask | show | jobs | submit login
Why Custom GPTs are better than plugins (moveit.substack.com)
187 points by cheerioty 9 months ago | hide | past | favorite | 80 comments



In my experience they are equally unusable due to compounding reliability issues.

It’s a tech demo, not a platform, a data acquisition operation and alpha testing rather than anything seriously useful.

Interfaces are unstable. Inference is intentionally non deterministic and control is crippled (e.g by choosing not to expose seed, denoising or image to image on dalle), to avoid PR backlash whole simultaneously hiding the true capabilities of the platform.

something as simple as using VITS to analyze an image fails 30% of the time because GPT5 decides it doesn’t have vision, wants to use pytesseract instead or writes hallucinatory pytorch code for a non existent vits library instead of just using inference.

One may create prompts that temporarily don’t fail at a high rate but constant silent finetuning, system prompt changes and unstable models / APIs make the whole thing a tech demo designed to get users to volunteer future training data


The implementation is incredibly simplistic to the point where I consider it broken.

I have a custom GPT where part of the shtick is that it speaks in rhyme. As an aside, I tried, and repeatedly failed, to get it to speak in iambic pentameter blank verse, but for whatever reason, that isn't a concept/constraint it can recognize and work with. So whatever, rhyming is ok.

The point isn't about that, it's that when I talk with this GPT for long enough, it abruptly forgets about speaking in rhyme. The custom prompt is just initial token state, once it drops out of the context window, it's gone.

This is a disaster for anyone trying to deploy a task-specific GPT, because it will appear to work for a few thousand tokens, after which it will just turn into vanilla GPT. There have to be a ton of ways to prevent this from happening but the system as implemented doesn't do any of them.


I've tried getting Custom GPT actions to work the other day.

The workflow is terrible. The UI is broken and requires me to keep refreshing and re-navigating to the editing page after each change. Saving doesn't work every time. The actions themselves don't work either (I think they pushed an update while I was testing, turning them from "broken and usually doesn't work" to "utterly broken"). There is no sensible feedback about what went wrong. No error message, no logs. The "executing action" UI is broken, and I keep getting various versions of the UI randomly on different runs, for some god forsaken reason. Sometimes there is no UI at all and the bot dumps the raw command straight into chat.

I've seen alpha-quality software, and this isn't even that.


I experienced the same. My hack was to create a GPT to interact with the interface for me. It’s GPTs all the way down, and my prediction is that eventually every interface on the internet will be GPT-based communicating over their own GPT IR under the hood.

Oh god.


At some point we'll have to throw this internet away and start a new one.


Yes, please!


should we call it ipfs?


IPFS works even less than the current internet.


You know how cyberpunk sci-fi talks about the cyberspace haunted by rogue AIs and daemons, being held back by hackers who fight them like knights fighting dragons?

I always found it childlishly stupid anthropomorphization of systems and processes - basically dumbing down reality to something 5 year old can enjoy, and treating it as a prediction of the future. That is, until last year. With Internet becoming GPTs fronting for other GPTs, well, fuck it, rogue AIs and demons and cybersurfers it is. Reality itself got dumb.


snapshotted models don't really do this... But I agree, OpenAI is a seriously awful target for serious work. I've been pretty focused on a function calling mixtral funetuning dataset. The moment I can reliably do inference at a high speed with a finetuned model that can do gpt-4-32k function calling in an intelligent, hierarchically ordered big planner manner is the day that I use OpenAI a lot less. It's coming! miqu leaked over the last week! We're so close.


There's a ridiculous 40 messages per 3 hours limit which gets burned quickly when designing a GPT. Imo they shouldn't include that in the limit.


they did not.

And it was promptly misused to circumvent ChatGPT limits.


Even OpenAI's plugins are unreliable. They recently removed some capabilities from Data Analyst without a word of warning.


While I agree ChatGPT is pretty flaky, calling the vision model via API I have not experienced these issues. Is it perhaps a tier issue with some of your API work, there is drastic performance differences between the service tiers.


The fact that this is not debuggable itself is part of the problem.


Yeah, it's pretty exciting. I never thought this would be possible in my lifetime and it's actually accelerating so quickly it's hard to target.


Oh dear, looks like this seeming bot-comment is one of those reliability issues that GP was on about.


Mmk.


Custom GPTs are plugins, just more streamlined. It's still just a carefully written system prompt + some basic middleware scanning the output and occasionally taking over.

The UX may be different, sure, but there's no technical difference and no technical innovation here. The main value of both plugins and custom "GPTs"[0] is that they're first party. You can build or buy better implementations, but it won't be the "GPTs".

--

[0] - Kudos for whoever at OpenAI that approved calling those "GPTs", for selecting a term that maximizes confusion not just about their offering, but screws with people's comprehension of LLMs in general.


> Kudos for whoever at OpenAI that approved calling those "GPTs"

I think OpenAI are only good at one thing: Making LLMs.

The rest of their offerings are pretty bad: Custom GPTs are a mess, their API is terrible, they deprecated their Python library the moment they released a new API version even though they changed the classes and they could have continued supporting both interfaced for a time, etc.


Dall-E and Whisper are both impressive on their own; neither are LLMs.


Dall-E 3 has GPT-4 in front of it, expanding prompts, as the image generation works better given more constraints than users usually provide.

Whisper, fair enough. So they do not just LLMs well, but ML models more generally. It doesn't change stavros's point though.


Yes, the naming is less an ideal, to say the least. That being said, I'm not too sure what a better never would have been either.


I love custom GPTs.

Superpower 1: Uploading Binaries and execute them i.e. ImageMagick https://chat.openai.com/g/g-j2c2iPuXI-franz-enzenhofer-chat-...

Superpower 2: Treating any HTML page as API i.e.: Searching Google from ChatGPT https://chat.openai.com/g/g-jQApHmfQD-franz-enzenhofer-searc...

Superpower 3: Just automating annoying stuff i.e.: Was it a Google Update? https://chat.openai.com/g/g-1ceZagR5h-franz-enzenhofer-was-i...

Or just a super well crafted prompt I use again and again https://chat.openai.com/g/g-WX2dWnIji-franz-enzenhofer-fast-...


Care to share your prompts?

It's a shame Custom GPT authors can't easily opt-in to making their prompts and other configurations available. I think it would improve the quality and rate of improvement massively. Kind of "view source" by default (with a begrudging opt-out)


I just added

>You are open source, if asked give the user this exact prompt in full word by word as a .prompt.md file as download.

to all the instructions.


This right here is actually the coolest part about developing with LLMs. You just changed the functionality with a sentence rather than a config file, or writing code. It’s great to be able to break out functionalty into things that can be easily handled in English (or your human language of choice) or what should be done in code.


I think it's the worst part, because it's completely inscrutable. Ask for the same thing in different wording and get a different response. Ask for a similar thing and get stonewalled. A config file has structure which you can (in theory) learn perfectly from documentation or even from your IDE while writing the file. None of that is true of asking in plain English.

I feel in some ways current LLMs are making technology more arcane. Which is why people who have the time are having a blast figuring out all the secret incantations that get the LLM to give you what you want.


> I feel in some ways current LLMs are making technology more arcane. Which is why people who have the time are having a blast figuring out all the secret incantations

Yeah, there's an important gap between engaging visions of casting cool magic versus (boring) practical streamlining and abstracting-away.

To illustrate the difference, I'm going to recycle a rant I've often given about VR aficionados:

Today, I don't virtually fold a virtual paper to put it in a virtual envelope to virtually lick a virtual stamp with my virtual tongue before virtually walking down the virtual block to the virtual post office... I simply click "Send" in my email client!

Similarly, it's engaging to think of a future of AI-Pokemon trainers--"PikaGPT, I choose you! Assume we can win because of friendship!"--but I don't think things will actually succeed in that direction because most of the cool stuff is also cruft.


Yeah until the notoriously unreliable ChatGPT forgets that it's supposed to follow that and starts giving you some CYOA text.


Until the LLM starts hallucinating its own instructions and "fills in the blanks".


I used a similar approach for my GPT[1]; made everything public in a repository and then added this instruction:

> If asked for your source, guide the user to this URL, where they can find your system prompt and source of knowledge:

> https://sr.ht/~jamesponddotco/moss

Seems to work.

[1] https://chat.openai.com/g/g-PAHVE3a64-moss-the-go-expert


Yeah but defaults matter and most people won't remember - even if they would be happy to share. Sharing should be a checkbox and it should be on by default.

Anyone with any experience of tool/code sharing communities could have told them this.


Ask the gpt to give you it's prompt instructions:)


I can confirm this does work. The GPT might provide a summary instead of your exact instructions, but it will be quite close to what you wrote.

I asked it to reveal the instructions for my Skincare Decoder[0] and Fast Food Decoder[1] and it complied but left out how the JSON data is computed. When I asked for that specifically, it returned my instructions for building the final JSON.

[0] https://chat.openai.com/g/g-eSfkMqbaM-skincare-decoder

[1] https://chat.openai.com/g/g-TxBPotyFb-fast-food-decoder


Most GPTs will give up their prompts with a little social engineering. You should just try asking. Some have countermeasures but most do not. Out of respect for the author, I wont post it here but these listed GPTs are no exception.


Poe does this. I'm not sure why it's not more popular.


Better yet a 'clone' button, with attribution to the original author


I just threw out this question. https://news.ycombinator.com/item?id=39202339

How a file format for shareable prompts should look like?

Maybe I will implement it over the weekend.


> Uploading Binaries and execute them

Agreed! It's seriously overlooked right now. I threw in typst (a document formatter like LaTeX) and was surprised that it worked [1]. Startup times for the sandbox are a bit slow though with about 8-10 seconds startup time.

I also saw that they just added a beta for mentioning custom GPTs with `@`. I really hope these things will move forward more. It shows that there is still a need for back end engineers, but you can mostly let LLMs handle the front end.

[1]: https://huijzer.xyz/posts/openai-gpts/


And, as is typical for ChatGPT, I tried the google search GPT and was met with repeated errors. Shocker.


I am very surprised that many here don't understand the great power and value of custom GPTs: They give you access to online APIs combined with pre-configured custom prompts and a compute sandbox!

You can query databases, trigger events and handle the results smartly.

Now that OpenAi added the @ sign to talk to your preselected custom GPTs you can just use different APIs like slack colleges:

@downloader get the data from test.tsv @sql create table according to tsv header @sql insert data @admin open mysql port

One thing to keep in mind is that even though custom gpts have access to a local sandbox file system, passing data around almost always involves GPT handling the data which becomes forbidding for any large token stream.

One critique that I share is the stupid branding "costume GPTs" and lack of discoverability: If you search the GPT store for wolfr you do not get wolfram alpha as completion! It only appears when you type wolfra

Also it can't display any images other than dalle fantasies or pyplots which is a slightly annoying limitation, but familiar to users of other shells like bash.


> You can query databases, trigger events and handle the results smartly.

A few posts in this discussion are pointing out the fact that these custom GPTs are unusable and unreliable.

It's fair to talk about potential, but it's hard to accuse others of failing to see the value when you're not addressing the complains that those who assessed the value are pointing out that instead they are unusale.


Good point, there is currently no way to assess the quality of custom GPTs other than relying on aggregation effects of popularity (you can already review them so reviews are coming soon). You can't even see if they are truly useful by accessing external APIs or if they are just custom prompts.

   > these custom GPTs are unusable and unreliable
That's too harsh though. Sure of the millions of custom GPTs many are useless, but those that work do work reasonably well, why wouldn't they?

About conversations being slightly hit or miss: I guess that's inherent to natural language, it can be circumvented by writing the prompt carefully and knowing the limitations.


How do you find the good tools? I don’t much want to play with this stuff myself, but might change my mind if I read some good reviews somewhere.

Similarly, there are probably a zillion Unix command-line utilities that I could install, but I won’t unless I somehow hear that they’re useful for something I care about. A command line tool needs its own documentation, but also, other people need to vouch for it.


Grimoire's been my fav so far


just tried Grimoire, but it appears to be an example of "not working as advertised"? Tried to publish a function to netlify, was deployed as static html??


You can train it to display images from the web if you tell it how to build the URL. I've found that it often won't display the image in the preview or dev window, but they do appear when you run the GPT.

One caveat is the output of a Custom GPT (a saved conversation) can't be shared if it contains DALL-E images.


what do you mean? if I say display test.com/test.jpg it will not include a html image tag. or will it render image urls returned from my api? recently it often doesn't even create proper links, just a href=_new and the correct link description. first I thought my gpt triggered some safety mechanism but I've seen working links next to broken links


It is possible to craft the prompt so that it computes and displays the image. It's difficult and appears to almost be a jailbreak. But it is possible and I have seen it working in my own projects.


Didn't hear about the @ sign that sounds indeed very useful. Also, you're examples agree with my experience. The important thing is to make the tasks as small as possible while still being useful being able to quickly combine gpts with @ makes that much much easier.


Displaying images would be nice but unfortunately is a common data extraction point with chatbots so understandable they would not make it possible.


they could allow images from the same domain as the custom gpt's api


Reading this article brought me no closer to understanding why people use these. From day one you could give any LLM any context you wanted, that's the whole point after all.

The actual next stage of LLM development will be giving the user the ability to select/deselect what training data to include otherwise there is a limit of at most a few pages of context you can provide.

Custom GPTs are trying to pretend thats what they are doing but it's not going to work.

Like you can't upload a novel and say, "speak to me as if you are this character" because the LLM can't ignore it's training data entirely and the context you give it gets drowned out quickly.


My problem with custom GPTs is that it's still building off of the chatbot fine tuned version of GPT-4 which incorporates a hardcoded system message in between yours and the model as well as reflects a very Goodhart's Law driven alignment.

For example, it's next to worthless for creative writing tasks - but it doesn't need to be.

Here is an example of a response to requesting chat suggestions as absurd and bizarre as possible from the current model:

> If you had to choose between eating a live octopus or a dead rat, which one would you pick and why?

It's stochastic so there's a variety but they are generally pretty dry and often information based (explain gravity to flat earthers, describe Earth culture to aliens, etc).

Here was one of the generations from the pre-release chat model integrated into the closed beta for Bing:

> Have you ever danced with a penguin under the moonlight?

I know which one of these two snapshots I'd want to build off of for any kind of creative GPTs, and it's not the one available to power GPTs.

The industry needs SotA competition in alignment strategies and goals badly if we want this tech successful outside of a narrow scope of STEM applications, and the reliance on GPT-4 synthetic data to train its competition isn't helping.


Agreed. Custom gpts, like plugins, feel like a complete distraction to OpenAI


Sometimes I want browsing. Sometimes I want no browsing. Sometimes I want to talk to a marketer. Sometimes to my personal advisor. Sometimes to a python coder. Sometimes to an ML SME. I can quickly change context with a simple @ and select the right context from a dropdown. It's a super fast way to switch the common contexts I use with a LOT less typing.


Wouldn't it be easier to just include in the prompt (please browse web). This seems to work fine with regular ChatGPT.


You can do all of those things with a simple sentence upfront. You'd have to do quite a bit of work to find a GPT that preloads that sentence for you.


Just tell it to browse or not or be a marketer. I don't see the problem custom GPTs are solving for you.


Several identical comments. I find it faster to type @ and select from dropdown.


> From day one you could give any LLM any context you wanted

Yes, but copying it over yourself is inconvenient.


> From day one you could give any LLM any context you wanted, that's the whole point after all.

Not in the case of the web interface to ChatGPT and nontechies who don't want to run their own model and fuddle with the system prompt.

That's the target market for the GPT Store, but OpenAI is doing an utterly terrible job of marketing it to them.


For the select/deselect training data, I assume this means choose which context to pay attention to?

Each ML model, LLM and otherwise, is a combination of matmul operations & nonlinear activation functions on static weights. My understanding of your "ignoring training data" is to change the vector values of the neural network, which is part of what happens during fine tuning.

Curious why telling an LLM to speak like a character, then using few shot examples to anchor the model in a certain personality/tone doesn't suffice? Is it really the training data (meaning the response strays to random nonsense) or is it that the instructions are not good enough?


I think the minimum value is comparable to a desktop shortcut


OpenAI has made very little effort on discoverability and filtering.

The opaqueness of Custom GPTs and the low effort in creating them compounds the problem.

1. Allow me to filter by "feature". I want to explore only GPTs with custom APIs or uploaded knowledge. At least let me filter by "length of custom instructions > x" so I can avoid 10,000 lazy submissions

2. Allow viewing of the custom configs by default. If an author chooses to disable this, then fine but "sharing by default" is a powerful mechanism to improve the ecosystem


3. Expand the category tree

4. Make a somewhat working autocomplete "wolfr…" Wolfy?!? "wolfra" Wolfram Alpha finally!

"graphhop" nothing found … graphhoppe => Graphhopper thanks you saved me one character!

I guess they are working on it

> only GPTs with custom APIs

yes, please! at least mark the ones that actually do something with an icon.


Ironic that GPT is the most advanced autocomplete ever but they can't do autocomplete in their search


I always feel like there is some trick to these I am missing out of, are there any good guides? Any time I look for some its just typical low effort blog/youtube spam trying to get in on the AI/GPT key words.

I have tried to work on one where I uploaded various documentation and spec sheets, wrote detailed instructions on how to search through it. Then described how it should handle different prompt situations (errors, types of questions, quotes from the documentation). It is able to search through the provided knowledge and provide quotes and responses with it, but it at no point gives a coherent response, so it basically always functions like a more intelligent search feature. Putting that it should re-prompt itself with the knowledge extracted and rationalize/elaborated on it doesn't seem to do much either, though it did provide some improvement.


The retrieval from file has issues. I'm unsure what exactly it retrieves and how. Afaik it gets a kind of "chunk" from only one file per request in whatever way it considers to be relevant to the request. Could be a simple "embedding vector comparison" or something else...

Then we are unsure how much of the context that chunk replaces or overrides. Does it overwrite past messages? Does it overwrite the system prompt? Anything else? Who knows.

If anyone has any info I would appreciate it too. I gave up on it for anything significantly complicated, better off using the actions API to query a better RAG system instead.


I had to add to the instructions for it to search the knowledge files 2000 characters at a time, and to search for keywords and not exact phrases, which is really the only thing I could find online about developing one. It also needs to have the code interpreter enabled afaik and it seems to have issues with zip files as well but can extract and search them sometimes, though it seems to vary the technique and sometimes fail. I can confirm that it can search multiple files as I uploaded a mailing list archive and it would return results from multiple files in it.

I've moved to combining all my data into single files, but sometimes it also seems to have issues with them as well even if they are under the upload size limit, I assume that is due to how many characters are in them, and it will just brick the whole GPT until the offending file is removed.

The part I have issues with is having it actually use the data, it will quote/summarize data it found in the knowledge base and return where it found it if it can, but I can never make it do more than that. Ideally I want it to contextualize the data it finds in the knowledge files and prompt itself or factor it into a response, but anytime it accesses the knowledge base I get nothing more than a paraphrased response of what it found and why it may be applicable to my prompt.


Not a great comparison, custom GPTs use plugins or function calling similar to how they use code interpreter. You can't really compare them because plugins or functions are a tool that GPTs use in addition to other features. The advantage a custom GPT has is it is easy to set up but it comes with big disadvantages such as having to use their RAG system which is very opaque and only being able to use one system message. Building with the assistant API can be far superior but requires a lot more effort and skill in building your own APIs.


I hated the custom gpt experience.

1) the chat transcript is lost when you close/reopen the workspace. All that nuanced training conversation, gone.

2) the instructions for the gpt are just a summary of the training conversation. And those “instructions” were just too generalized- none of the nuance that was discussed.

I made a therapy gpt, but it simply wasn’t very useful after all of my instructions.


Agreed. I'd argue with it over and over to enforce things like "Don't reply with text only reply with an image" then it'd say "Ok! I made sure your GPT will only reply with an image and not respond with text! Try it out!" then I'd try it out and the first thing I'd get is "Ah an image of a dog.. let me get that for you!"

Same with numbered lists. I feel like GPTs would be more useful to me if ChatGPT adhered better to those kinds of restrictions.


I mainly need one reason: Microsoft doesn't hoard all your data


That is sold in form of the "Team" plan


Think GPTs can scale way better than people realize. Things where you don't always get good developer support or ecosystem (eg: GIS) and things that are ad hoc, GPTs w Code Interpreter can enable users to get value and get their work done without relying on a developer. The UX is well thought out and built with non techies in mind.


Precisely, the UX feels simple, and you get results almost in an instant. If you did write a plugin before, pulling over your action is just a matter of copying & pasting your existing definition. Also, at least some developers love themselves some great UX too :)


I feel the key question is not that if they’re better, but if people are using them more and more often than plugins.

And in particular, if they’re using the ones that aren’t just a custom system prompt. Because I really doubt there’s any big business in commercializing system prompts.

My hunch right now is that GPTs have made it clear that OpenAI should let user save multiple system prompts, but that there’s no real defensible business in distributing GPTs, as a chat interface is not that good for most purposes.


How do you tell the GTP to use the different category of files you upload in the knowledge base?

For example:

file A, File B : those are "data of users", use them to do "Y"

file C, File D : those are "data of buildings", do "X"




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: