In my experience they are equally unusable due to compounding reliability issues.
It’s a tech demo, not a platform: a data acquisition operation and alpha test rather than anything seriously useful.
Interfaces are unstable.
Inference is intentionally non-deterministic and control is crippled (e.g. by choosing not to expose seed, denoising, or image-to-image on DALL-E), to avoid PR backlash while simultaneously hiding the true capabilities of the platform.
Something as simple as using VITS to analyze an image fails 30% of the time because GPT-4 decides it doesn’t have vision, wants to use pytesseract instead, or writes hallucinatory PyTorch code for a nonexistent VITS library instead of just running inference.
One may create prompts that temporarily don’t fail at a high rate, but constant silent fine-tuning, system prompt changes, and unstable models/APIs make the whole thing a tech demo designed to get users to volunteer future training data.
The implementation is incredibly simplistic to the point where I consider it broken.
I have a custom GPT where part of the shtick is that it speaks in rhyme. As an aside, I tried, and repeatedly failed, to get it to speak in iambic pentameter blank verse, but for whatever reason, that isn't a concept/constraint it can recognize and work with. So whatever, rhyming is ok.
The point isn't about that; it's that when I talk with this GPT for long enough, it abruptly forgets about speaking in rhyme. The custom prompt is just initial token state; once it drops out of the context window, it's gone.
This is a disaster for anyone trying to deploy a task-specific GPT, because it will appear to work for a few thousand tokens, after which it will just turn into vanilla GPT. There have to be a ton of ways to prevent this from happening but the system as implemented doesn't do any of them.
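One obvious mitigation is a wrapper that re-pins the custom instructions on every turn and only trims the older chat turns. A minimal sketch, assuming the openai v1 Python client; the model name, instructions, and trimming policy are placeholders:

```python
from openai import OpenAI

client = OpenAI()
CUSTOM_INSTRUCTIONS = "You are a poet. Always answer in rhyme."

def chat(history, user_message, max_history_turns=20):
    """Pin the instructions as the system message; trim only older user/assistant turns."""
    history.append({"role": "user", "content": user_message})
    trimmed = history[-max_history_turns:]  # the instructions themselves never fall out of context
    messages = [{"role": "system", "content": CUSTOM_INSTRUCTIONS}] + trimmed
    reply = client.chat.completions.create(model="gpt-4", messages=messages)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer
```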
I tried getting Custom GPT actions to work the other day.
The workflow is terrible. The UI is broken and requires me to keep refreshing and re-navigating to the editing page after each change. Saving doesn't work every time. The actions themselves don't work either (I think they pushed an update while I was testing, turning them from "broken and usually don't work" to "utterly broken"). There is no sensible feedback about what went wrong. No error message, no logs. The "executing action" UI is broken, and I keep getting various versions of the UI randomly on different runs, for some godforsaken reason. Sometimes there is no UI at all and the bot dumps the raw command straight into chat.
I've seen alpha-quality software, and this isn't even that.
I experienced the same. My hack was to create a GPT to interact with the interface for me. It’s GPTs all the way down, and my prediction is that eventually every interface on the internet will be GPT-based communicating over their own GPT IR under the hood.
You know how cyberpunk sci-fi talks about the cyberspace haunted by rogue AIs and daemons, being held back by hackers who fight them like knights fighting dragons?
I always found it a childishly stupid anthropomorphization of systems and processes, basically dumbing down reality to something a 5-year-old can enjoy, and treating it as a prediction of the future. That is, until last year. With the Internet becoming GPTs fronting for other GPTs, well, fuck it, rogue AIs and demons and cybersurfers it is. Reality itself got dumb.
Snapshotted models don't really do this... But I agree, OpenAI is a seriously awful target for serious work. I've been pretty focused on a function-calling Mixtral fine-tuning dataset. The moment I can reliably do inference at high speed with a fine-tuned model that can do gpt-4-32k-level function calling in an intelligent, hierarchically ordered big-planner manner is the day I use OpenAI a lot less. It's coming! miqu leaked over the last week! We're so close.
While I agree ChatGPT is pretty flaky, I have not experienced these issues calling the vision model via the API. Is it perhaps a tier issue with some of your API work? There are drastic performance differences between the service tiers.
Custom GPTs are plugins, just more streamlined. It's still just a carefully written system prompt + some basic middleware scanning the output and occasionally taking over.
The UX may be different, sure, but there's no technical difference and no technical innovation here. The main value of both plugins and custom "GPTs"[0] is that they're first party. You can build or buy better implementations, but it won't be the "GPTs".
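To make "basic middleware scanning the output" concrete, here's a rough sketch of the kind of wrapper meant; the action tag, the registry, and the weather handler are all invented for illustration, and this is not OpenAI's actual implementation:

```python
import json
import re

# Hypothetical registry of actions the wrapper knows how to run.
ACTIONS = {
    "get_weather": lambda city: f"22C and sunny in {city}",
}

# The model would be instructed (via the system prompt) to wrap tool requests in <action> tags.
ACTION_RE = re.compile(r"<action>(.*?)</action>", re.DOTALL)

def postprocess(model_output: str) -> str:
    """Scan the model's reply; if it contains an action block, take over and run it."""
    match = ACTION_RE.search(model_output)
    if not match:
        return model_output  # plain text, pass through untouched
    call = json.loads(match.group(1))  # e.g. {"name": "get_weather", "argument": "Berlin"}
    handler = ACTIONS.get(call["name"])
    if handler is None:
        return model_output  # unknown action, leave the reply alone
    result = handler(call["argument"])
    return ACTION_RE.sub(f"[ran {call['name']}: {result}]", model_output)
```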
--
[0] - Kudos to whoever at OpenAI approved calling these "GPTs", for selecting a term that not only maximizes confusion about their offering, but screws with people's comprehension of LLMs in general.
> Kudos to whoever at OpenAI approved calling these "GPTs"
I think OpenAI are only good at one thing: Making LLMs.
The rest of their offerings are pretty bad: Custom GPTs are a mess, their API is terrible, they deprecated their Python library the moment they released a new API version even though they changed the classes and could have continued supporting both interfaces for a time, etc.
It's a shame Custom GPT authors can't easily opt-in to making their prompts and other configurations available. I think it would improve the quality and rate of improvement massively. Kind of "view source" by default (with a begrudging opt-out)
This right here is actually the coolest part about developing with LLMs. You just changed the functionality with a sentence rather than a config file or code. It's great to be able to decide which functionality can be easily handled in English (or your human language of choice) and what should be done in code.
I think it's the worst part, because it's completely inscrutable. Ask for the same thing in different wording and get a different response. Ask for a similar thing and get stonewalled. A config file has structure which you can (in theory) learn perfectly from documentation or even from your IDE while writing the file. None of that is true of asking in plain English.
I feel in some ways current LLMs are making technology more arcane. Which is why people who have the time are having a blast figuring out all the secret incantations that get the LLM to give you what you want.
> I feel in some ways current LLMs are making technology more arcane. Which is why people who have the time are having a blast figuring out all the secret incantations
Yeah, there's an important gap between engaging visions of casting cool magic versus (boring) practical streamlining and abstracting-away.
To illustrate the difference, I'm going to recycle a rant I've often given about VR aficionados:
Today, I don't virtually fold a virtual paper to put it in a virtual envelope to virtually lick a virtual stamp with my virtual tongue before virtually walking down the virtual block to the virtual post office... I simply click "Send" in my email client!
Similarly, it's engaging to think of a future of AI-Pokemon trainers--"PikaGPT, I choose you! Assume we can win because of friendship!"--but I don't think things will actually succeed in that direction because most of the cool stuff is also cruft.
Yeah but defaults matter and most people won't remember - even if they would be happy to share. Sharing should be a checkbox and it should be on by default.
Anyone with any experience of tool/code sharing communities could have told them this.
I can confirm this does work. The GPT might provide a summary instead of your exact instructions, but it will be quite close to what you wrote.
I asked it to reveal the instructions for my Skincare Decoder[0] and Fast Food Decoder[1] and it complied but left out how the JSON data is computed. When I asked for that specifically, it returned my instructions for building the final JSON.
Most GPTs will give up their prompts with a little social engineering. You should just try asking. Some have countermeasures but most do not. Out of respect for the author, I won't post it here, but these listed GPTs are no exception.
Agreed! It's seriously overlooked right now. I threw in typst (a document formatter like LaTeX) and was surprised that it worked [1]. Startup times for the sandbox are a bit slow though, around 8-10 seconds.
I also saw that they just added a beta for mentioning custom GPTs with `@`. I really hope these things keep moving forward. It shows that there is still a need for backend engineers, but you can mostly let LLMs handle the frontend.
I am very surprised that many here don't understand the great power and value of custom GPTs: They give you access to online APIs combined with pre-configured custom prompts and a compute sandbox!
You can query databases, trigger events and handle the results smartly.
Now that OpenAI has added the @ sign to talk to your preselected custom GPTs, you can just use different APIs like Slack colleagues:
@downloader get the data from test.tsv
@sql create table according to tsv header
@sql insert data
@admin open mysql port
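Behind the scenes, each of those @-GPTs would expose its API through an action, which is defined with an OpenAPI schema in the GPT editor. A minimal hypothetical schema for something like the @downloader above; the server URL, path, and parameters are made up:

```python
import json

# Hypothetical OpenAPI definition for a "@downloader"-style action.
# Paste the resulting JSON into the GPT editor's Actions section (it expects OpenAPI 3).
downloader_action = {
    "openapi": "3.1.0",
    "info": {"title": "File downloader", "version": "1.0.0"},
    "servers": [{"url": "https://example.com/api"}],
    "paths": {
        "/fetch": {
            "get": {
                "operationId": "fetchFile",
                "summary": "Download a file and return its contents as text",
                "parameters": [{
                    "name": "url",
                    "in": "query",
                    "required": True,
                    "schema": {"type": "string"},
                }],
                "responses": {"200": {"description": "File contents"}},
            }
        }
    },
}

print(json.dumps(downloader_action, indent=2))
```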
One thing to keep in mind is that even though custom GPTs have access to a local sandbox file system, passing data around almost always involves the GPT handling the data itself, which becomes prohibitive for any large token stream.
One critique that I share is the stupid "custom GPTs" branding and the lack of discoverability: if you search the GPT store for "wolfr" you do not get Wolfram Alpha as a completion! It only appears when you type "wolfra".
Also, it can't display any images other than DALL-E fantasies or pyplot charts, which is a slightly annoying limitation, but familiar to users of other shells like bash.
> You can query databases, trigger events and handle the results smartly.
A few posts in this discussion are pointing out the fact that these custom GPTs are unusable and unreliable.
It's fair to talk about potential, but it's hard to accuse others of failing to see the value when you're not addressing the complaints of those who did assess the value and found these things unusable.
Good point, there is currently no way to assess the quality of custom GPTs other than relying on the aggregation effects of popularity (you can already leave reviews, so visible reviews are presumably coming soon). You can't even see whether they are truly useful by accessing external APIs or whether they are just custom prompts.
> these custom GPTs are unusable and unreliable
That's too harsh though. Sure, of the millions of custom GPTs many are useless, but those that work do work reasonably well; why wouldn't they?
About conversations being slightly hit or miss: I guess that's inherent to natural language; it can be circumvented by writing the prompt carefully and knowing the limitations.
How do you find the good tools? I don’t much want to play with this stuff myself, but might change my mind if I read some good reviews somewhere.
Similarly, there are probably a zillion Unix command-line utilities that I could install, but I won’t unless I somehow hear that they’re useful for something I care about. A command line tool needs its own documentation, but also, other people need to vouch for it.
Just tried Grimoire, but it appears to be an example of "not working as advertised"? I tried to publish a function to Netlify, and it was deployed as static HTML??
You can train it to display images from the web if you tell it how to build the URL. I've found that it often won't display the image in the preview or dev window, but the images do appear when you run the GPT.
One caveat is the output of a Custom GPT (a saved conversation) can't be shared if it contains DALL-E images.
What do you mean? If I say "display test.com/test.jpg" it will not include an HTML image tag. Or will it render image URLs returned from my API? Recently it often doesn't even create proper links, just an href="_new" and the correct link description. At first I thought my GPT triggered some safety mechanism, but I've seen working links next to broken links.
It is possible to craft the prompt so that it computes and displays the image. It's difficult and appears to almost be a jailbreak. But it is possible and I have seen it working in my own projects.
I hadn't heard about the @ sign; that sounds very useful indeed. Also, your examples agree with my experience. The important thing is to make the tasks as small as possible while still being useful, and being able to quickly combine GPTs with @ makes that much, much easier.
Reading this article brought me no closer to understanding why people use these. From day one you could give any LLM any context you wanted, that's the whole point after all.
The actual next stage of LLM development will be giving the user the ability to select/deselect what training data to include otherwise there is a limit of at most a few pages of context you can provide.
Custom GPTs are trying to pretend that's what they are doing, but it's not going to work.
Like, you can't upload a novel and say, "speak to me as if you are this character", because the LLM can't ignore its training data entirely and the context you give it gets drowned out quickly.
My problem with custom GPTs is that they're still building off the chatbot fine-tuned version of GPT-4, which inserts a hardcoded system message between yours and the model, and reflects a very Goodhart's-Law-driven alignment.
For example, it's next to worthless for creative writing tasks - but it doesn't need to be.
Here is an example of a response to requesting chat suggestions as absurd and bizarre as possible from the current model:
> If you had to choose between eating a live octopus or a dead rat, which one would you pick and why?
It's stochastic so there's variety, but the suggestions are generally pretty dry and often information-based (explain gravity to flat earthers, describe Earth culture to aliens, etc.).
Here was one of the generations from the pre-release chat model integrated into the closed beta for Bing:
> Have you ever danced with a penguin under the moonlight?
I know which one of these two snapshots I'd want to build off of for any kind of creative GPTs, and it's not the one available to power GPTs.
The industry badly needs SotA competition in alignment strategies and goals if we want this tech to succeed outside a narrow scope of STEM applications, and the reliance on GPT-4 synthetic data to train its competitors isn't helping.
Sometimes I want browsing. Sometimes I want no browsing. Sometimes I want to talk to a marketer. Sometimes to my personal advisor. Sometimes to a python coder. Sometimes to an ML SME. I can quickly change context with a simple @ and select the right context from a dropdown. It's a super fast way to switch the common contexts I use with a LOT less typing.
For the select/deselect training data, I assume this means choose which context to pay attention to?
Each ML model, LLM or otherwise, is a combination of matmul operations and nonlinear activation functions over static weights. My understanding of your "ignoring training data" is changing the vector values of the neural network, which is part of what happens during fine-tuning.
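A toy illustration of that point, with random weights in plain numpy (nothing like GPT-4's actual architecture, just the matmul-plus-nonlinearity shape of it):

```python
import numpy as np

# Two-layer toy network: matmuls over static weights plus a nonlinearity.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(4)  # weights are fixed after training
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)

def forward(x):
    h = np.maximum(x @ W1 + b1, 0.0)  # matmul + ReLU activation
    return h @ W2 + b2                # matmul again; nothing changes at inference time

print(forward(rng.normal(size=8)))
```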
Curious why telling an LLM to speak like a character and then using few-shot examples to anchor the model in a certain personality/tone doesn't suffice? Is it really the training data (meaning the response strays to random nonsense), or is it that the instructions are not good enough?
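By few-shot anchoring I mean something like the following sketch, using the openai Python client; the character, example lines, and model name are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# A short instruction plus a couple of in-character example exchanges,
# followed by the real question. The examples anchor tone and diction.
messages = [
    {"role": "system", "content": "You are Long John Silver. Stay in character."},
    {"role": "user", "content": "Where are we headed?"},
    {"role": "assistant", "content": "Arr, wherever the map with the red X takes us, lad."},
    {"role": "user", "content": "What do you think of parrots?"},
    {"role": "assistant", "content": "Finest shipmates a sailor could ask for, says I."},
    {"role": "user", "content": "Explain what a context window is."},
]

reply = client.chat.completions.create(model="gpt-4", messages=messages)
print(reply.choices[0].message.content)
```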
OpenAI has made very little effort on discoverability and filtering.
The opaqueness of Custom GPTs and the low effort in creating them compounds the problem.
1. Allow me to filter by "feature". I want to explore only GPTs with custom APIs or uploaded knowledge. At least let me filter by "length of custom instructions > x" so I can avoid 10,000 lazy submissions
2. Allow viewing of the custom configs by default. If an author chooses to disable this, then fine but "sharing by default" is a powerful mechanism to improve the ecosystem
I always feel like there is some trick to these that I am missing out on; are there any good guides? Any time I look for some, it's just typical low-effort blog/YouTube spam trying to get in on the AI/GPT keywords.
I have tried to work on one where I uploaded various documentation and spec sheets and wrote detailed instructions on how to search through it. Then I described how it should handle different prompt situations (errors, types of questions, quotes from the documentation). It is able to search through the provided knowledge and respond with quotes from it, but at no point does it give a coherent response, so it basically always functions like a more intelligent search feature. Instructing it to re-prompt itself with the extracted knowledge and rationalize/elaborate on it doesn't seem to do much either, though it did provide some improvement.
The retrieval from files has issues. I'm unsure what exactly it retrieves and how. AFAIK it gets a kind of "chunk" from only one file per request, in whatever way it considers relevant to the request. It could be a simple embedding vector comparison or something else...
Then we are unsure how much of the context that chunk replaces or overrides. Does it overwrite past messages? Does it overwrite the system prompt? Anything else? Who knows.
If anyone has any info I would appreciate it too. I gave up on it for anything significantly complicated; you're better off using the actions API to query a better RAG system instead.
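If it is just an embedding vector comparison, it would look roughly like the sketch below if you rolled it yourself behind an action. This is a guess at the mechanism, not what OpenAI actually does; the chunk size and embedding model are arbitrary choices:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def top_chunks(query, document, chunk_size=2000, k=3):
    # Naive fixed-size chunking; real systems usually split on document structure.
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    doc_vecs = embed(chunks)
    q_vec = embed([query])[0]
    # Cosine similarity between the query and every chunk, highest first.
    sims = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]
```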
I had to add to the instructions that it should search the knowledge files 2000 characters at a time, and search for keywords rather than exact phrases, which is really the only advice I could find online about developing one. It also needs the code interpreter enabled, AFAIK, and it seems to have issues with zip files as well, though it can sometimes extract and search them; the technique varies and sometimes fails. I can confirm that it can search multiple files, as I uploaded a mailing list archive and it would return results from several files in it.
I've moved to combining all my data into single files, but it sometimes has issues with those too, even when they are under the upload size limit. I assume that is due to how many characters they contain, and it will just brick the whole GPT until the offending file is removed.
The part I have issues with is getting it to actually use the data: it will quote/summarize what it found in the knowledge base and return where it found it if it can, but I can never make it do more than that. Ideally I want it to contextualize the data it finds in the knowledge files and prompt itself or factor it into a response, but any time it accesses the knowledge base I get nothing more than a paraphrase of what it found and why it may be applicable to my prompt.
Not a great comparison; custom GPTs use plugins or function calling similarly to how they use Code Interpreter. You can't really compare them, because plugins and functions are tools that GPTs use in addition to other features. The advantage of a custom GPT is that it is easy to set up, but it comes with big disadvantages, such as having to use their very opaque RAG system and only being able to use one system message. Building with the Assistants API can be far superior but requires a lot more effort and skill in building your own APIs.
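For contrast, the function-calling side of the API looks roughly like this with the plain chat completions endpoint (the Assistants API accepts the same tool definitions); the tool name and schema here are invented for illustration:

```python
import json
from openai import OpenAI

client = OpenAI()

# Describe the tool; the model decides when to call it, your code executes it.
tools = [{
    "type": "function",
    "function": {
        "name": "lookup_order",
        "description": "Look up an order by its ID",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the status of order 42?"}],
    tools=tools,
)

# Assuming the model chose to call the tool on this turn:
call = resp.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```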
1) the chat transcript is lost when you close/reopen the workspace. All that nuanced training conversation, gone.
2) the instructions for the gpt are just a summary of the training conversation. And those “instructions” were just too generalized- none of the nuance that was discussed.
I made a therapy gpt, but it simply wasn’t very useful after all of my instructions.
Agreed. I'd argue with it over and over to enforce things like "Don't reply with text, only reply with an image." Then it'd say "Ok! I made sure your GPT will only reply with an image and not respond with text! Try it out!", I'd try it out, and the first thing I'd get is "Ah, an image of a dog... let me get that for you!"
Same with numbered lists. I feel like GPTs would be more useful to me if ChatGPT adhered better to those kinds of restrictions.
Think GPTs can scale way better than people realize. For things where you don't always get good developer support or a healthy ecosystem (e.g. GIS), and for ad hoc tasks, GPTs with Code Interpreter can enable users to get value and get their work done without relying on a developer. The UX is well thought out and built with non-techies in mind.
Precisely, the UX feels simple, and you get results almost in an instant. If you did write a plugin before, pulling over your action is just a matter of copying & pasting your existing definition. Also, at least some developers love themselves some great UX too :)
I feel the key question is not whether they're better, but whether people are using them more and more often than plugins.
And in particular, whether they're using the ones that aren't just a custom system prompt. Because I really doubt there's any big business in commercializing system prompts.
My hunch right now is that GPTs have made it clear that OpenAI should let users save multiple system prompts, but that there's no real defensible business in distributing GPTs, as a chat interface is not that good for most purposes.