
Most of the products announced (and the price cuts) appear to be more about increasing lock-in to the OpenAI API platform, which is not surprising given increased competition in the space. The GPTs/GPT Agents and Assistants demos in particular showed that they are a black box within a black box within a black box that you can't port anywhere else.

I'm mixed on the presentation and will need to read the fine print on the API docs on all of these things, which have been updated just now: https://platform.openai.com/docs/api-reference

The pricing page has now updated as well: https://openai.com/pricing

Notably, the DALL-E 3 API is $0.04 per image which is an order of magnitude above everyone else in the space.

EDIT: One interesting observation with the new OpenAI pricing structure not mentioned during the keynote: finetuned ChatGPT 3.5 is now 3x the cost of base ChatGPT 3.5, down from 8x the cost. That makes finetuning a more compelling option.




It's a good strategy. For me, avoiding the moat means either a big drop in quality and just ending up in somebody else's moat, or a big drop in quality and a lot more money spent. I've looked into it, and maybe the most practical end-to-end system for owning my own LLM is to run a couple of 3090s on a consumer motherboard, with substantial running costs to keep them up 24/7, and that's not powerful enough to cut it while being rather expensive at the same time. For a bit more expense, you can get more quality and lower running costs from a 128GB/192GB Apple Silicon setup, but that's much, much slower than the "Turbo" services that OpenAI offers.

I think the biggest thing pushing me away from OpenAI was that they were subsidizing the chat experience much more than the API, and this seems to reconcile that quite a bit. Quite simply, OpenAI is sweetening the pot here too much for me to ignore; this is a massively subsidized service. I honestly don't feel the switching costs in the future will outweigh the benefits I'm getting now.


Everybody's got their own calculus about how competitive their space is and what this tech can do for them, but some might be best off dancing around lock-in by being careful about what they use from OpenAI and how tightly they integrate with it.

This is very early in the maturity cycle for this tech. The options that will be available for private inference and fine tuning, for cloud-GPU/timeshare inference and fine tuning, and for competing hosted solutions are going to be vastly different as months go by. What looks like squeezing value out of OpenAI today might look a lot like technical debt and frustrating lock-in a year from now.

That's what they're hoping you chase after, and if your product is defined by this technology, maybe that's what you have to do. But if you're just thinking about feature opportunities for a more robust product, judiciousness could pay off better than rushing. For now.


"What looks like squeezing value out of OpenAI today might look a lot like technical debt and frustrating lock-in a year from now."

Just wanted to highlight this as such a great, concise way to look at the Buy vs Build with pretty much any cloud service, thanks!


Seems right to me. That's why it's good to build with extendability in mind to allow switching easily in the future if needed.


While everything you say might be true, it's also irrelevant.

Time to market is more important: build users, get an edge, and you can swap models later on (as long as you own the data).


Almost nobody sharecropping their way to an MVP has a product except at the sufferance of the company doing the work for them - regardless of which one they switch to later.

There's very little there there in most of these folks.

(For the few where there is, though, I agree with you.)


The Phind CEO talked in an interview about how their own model is already out ahead of ChatGPT 4 for the target use case of their search engine in some cases, and increasingly matching it on general searches: https://www.latent.space/p/phind#details

I use it instead of Bing Chat now for cases where I really need a search engine and Google is useless. Mainly because it's faster, but I also like not having to open another browser.


Do you build on AWS AI services then? Or any other cloud provider? The outcome is the same, right? Technical lock in, cost risks, integration maintenance, etc.


The key line in my comment, for emphasis:

> This is very early in the maturity cycle for this tech.

Think about what value you get out of the services and what migration might look like. If you are making simple completion or chat calls with a clever prompt, then migration will probably be trivial when the time comes. Those features are the commodity that everyone will be offering and you'll be able to shop around for the ideal solution as alternatives become competitive.

Alternately, if you're handing OpenAI a ton of data for them to opaquely digest for fine tuning with no egress tools, or having them accumulate lots of other critical data with no egress, you're obviously getting yourself locked in.

The more features you use, the more idiosyncratic those features are, the more non-transferable those features are, and the more deeply you integrate those features, the more risk you're taking. So you want to consider whether the reward is worth that risk.

Different projects will legitimately have different answers for that.


While I agree with you, as a happy GPT4 plus customer, I'm worried about the inevitable enshittification downhill roll that will eventually ensue.

Once marketing gets in charge of product, it's doomed. And I can't think of a product startup that it hasn't happened to. Particularly with this type of growth, at some point the suits start to outnumber the techies 10:1.

This is why openness and healthy competition are essential.


It's not marketing, it's economics.

If you set money on fire -- eventually there's a time when you need to stop doing that.


I think the parent is talking more about the other common situation, where organizations start focusing on maximizing profits rather than just working towards basic profitability. E.g., Google Maps API pricing comes to mind.

Yes, OpenAI might be burning through their $5B capital/Azure credits now (we don't know how much), but I think the `turbo` models are starting to address this as well. And $20/month from a large user base can also add up pretty quickly.


You do see VCs bloat up company sizes for what could have been a very profitable small-to-medium-sized private business without enshittification, had they not hired dozens more people.


ChatGPT only costs a few dollars, but I'm also "paying" for the service by contributing training data to OpenAI.

Getting access to this type of interaction data with (mostly) humans must be quite a valuable asset.


For me personally, being able to fine-tune local LLMs at a much higher rank and train more layers is very useful for (somewhat unreliably) embedding information. AFAIK the OpenAI fine-tuning is more geared towards formatting the output.
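For reference, a minimal sketch of what "higher rank / more layers" looks like with a local LoRA fine-tune via the peft library (the exact module names depend on the base model; this is just an illustration):

    # Illustrative only: a LoRA config with a much higher rank and more
    # targeted layers than the usual defaults (r=8, attention-only).
    from peft import LoraConfig

    config = LoraConfig(
        r=64,                    # higher rank = more capacity to absorb new information
        lora_alpha=128,
        target_modules=[         # also target the MLP layers, not just attention
            "q_proj", "k_proj", "v_proj", "o_proj",
            "gate_proj", "up_proj", "down_proj",
        ],
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )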


As I understand it, fine tuning is never really about adding content. RAG and related techniques are likely cheaper/better if that’s what you want.


Yup, the way I understand it, fine tuning is more about adding context and the bigger picture, while RAG is more about adding actual content. A good system probably needs both in the long run.


This isn't subsidized. OpenAI makes money on their API and ChatGPT pricing.


A good strategy for who? Society? Customers? The future? Or just for making money for the owners?


Obviously it is a good strategy, surely created by GPT.


I don't understand the lock-in argument here. Yes, if a competitor comes in there will be switching costs as everything is re-learned. However, from a code perspective, it is a function of the key and a relatively small API. New regulations notwithstanding, what is stopping someone from moving from OpenAI to Anthropic (for example) other than the cost of learning how to effectively utilize Anthropic for your use case?

OpenAI doesn't have some sort of egress feed for your database.


> OpenAI doesn't have some sort of egress feed for your database.

That's what they're trying to incentivize, especially with being able to upload files for their own implementation of RAG. You're not getting the vector representation of those files back, and switching to another provider will require rebuilding and testing that infrastructure.


That's exactly what I thought. Smart strategy on OpenAI's part, given that it's extremely easy (and free) to do RAG with pgvector.
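A minimal sketch of the do-it-yourself side (Python + psycopg2; the schema and embedding model here are just placeholders):

    # Assumes: CREATE EXTENSION vector;
    #          CREATE TABLE docs (id serial, body text, embedding vector(1536));
    # and that the embedding column is already populated by your own pipeline.
    import psycopg2
    from openai import OpenAI

    client = OpenAI()  # any embedding provider works; this is just an example

    def embed(text):
        # Swapping this one call for another embedding model is the whole
        # "migration" -- the vectors and the database stay yours.
        resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
        return resp.data[0].embedding

    def top_k(question, k=5):
        vec = "[" + ",".join(str(x) for x in embed(question)) + "]"
        with psycopg2.connect("dbname=rag") as conn, conn.cursor() as cur:
            cur.execute(
                "SELECT body FROM docs ORDER BY embedding <=> %s::vector LIMIT %s",
                (vec, k),
            )
            return [row[0] for row in cur.fetchall()]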


It's neither free nor performant.

The developer experience is lacking vs. other vector database providers and the performance doesn't match those that prioritize performance rather than devex. You're also spending time writing plumbing around postgres that isn't really transferrable work.

For some people already in the ecosystem it will make sense.


> However, from a code perspective, it is a function of the key and a relatively small API.

You're thinking of traditional apps and APIs.

In an AI application, most of the work is in prompt engineering, not wiring up the API to your app. Prompts that work well for one model will fail horribly for another. People spend months refining their prompts before they're safe to share with users, and switching platforms will require doing most of that refinement over again.


I’d be more worried about this if OpenAI had a track record of increasing prices, but the opposite happens. I get more for the same price basically every 6 months.


Sure, they are decreasing the prices right now. But once it comes time for them to become profitable they can easily reverse course.


I'd imagine that will start to switch back the other way at some point. Decrease prices to gain market share and get you locked in, then increase prices to earn more money to keep VC's happy


Still, moving between models is less arduous than switching cloud providers, depending on use case and price difference of course. Most models hold GPT4 as the benchmark they aspire to and should converge to its capabilities.


It's not a question of converging on its capabilities, it's a question of responding equivalently to the nuances of the prompt you've crafted. Two models can be equally capable, but one might interpret a phrase in your prompt slightly differently.


Switching from one API to another generally requires refactoring. I’ve not had many problems moving between LLMs (OpenAI to Anthropic).


Then you’re either not testing your prompts or doing something trivial.

Remember: a good model with a good prompt will generate bad outputs sometimes.

A bad model with a bad prompt will generate a good output sometimes.

That is simply a fact with these non-deterministic models.

You have to do many iterations for each prompt to verify they are working correctly.

> I’ve not had much problems moving between LLMs…

If you want to move your prompts to a different model, you’re effectively replacing one:

f(prompt + seed) => output

With different black box implementation.

Unless you’re measuring the output over multiple iterations of (seed) and verifying your prompt still does the right thing, it’s actually very likely that what you’ve done is take an application with a known output space and convert it to an application with an unknown output space…

that partially overlaps the original output space!

So it looks like it’s the same.

…but it isn’t, and the “isn’t” is in weird edge cases.

Unless you’re measuring that, you simply now have an app that does “eh, who knows?”

So yes. Porting is trivial if you don’t care if you have the same functionality.

…but reliably porting is much harder (or longer).
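Concretely, something along these lines (a rough sketch; check() and the provider wrappers are placeholders for your own evaluation harness):

    # Rough sketch: estimate whether a ported prompt still does the right
    # thing by measuring a pass rate over many samples, instead of
    # eyeballing one lucky completion.
    from openai import OpenAI

    client = OpenAI()

    def check(output):
        # Placeholder: your domain-specific assertion, e.g. "parses as JSON
        # and contains the field we need".
        return "expected_marker" in output

    def complete_openai(prompt, i):
        resp = client.chat.completions.create(
            model="gpt-4-1106-preview",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,   # the settings you actually run in production
            seed=i,            # vary the seed across iterations
        )
        return resp.choices[0].message.content

    def pass_rate(complete, prompt, n=30):
        return sum(check(complete(prompt, i)) for i in range(n)) / n

    # complete_anthropic would be the same wrapper around the other SDK.
    # Only ship the port if the new pass rate holds up across your prompt suite:
    # old = pass_rate(complete_openai, my_prompt)
    # new = pass_rate(complete_anthropic, my_prompt)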


How many people are even writing tests for these things?


Very few. Many deployed apps don't have a good quantitative grasp of the quality of their LLMs. Some are doing testing or evaluation, through things like unit tests, A/B testing different prompts, collecting user feedback.

I think we're exiting the phase where people can launch an AI app and have people use it just because of the initial "wow factor" and moving into the phase where users will start churning and businesses will need to make sure that their AI agent is performing and that they understand how well it's performing.


> many iterations of each prompt

BTW it's much faster and cheaper to arrive at a good prompt if you sample the model in deterministic mode (i.e. temperature=0).

By default you have to guess if the difference is due to the prompt change or due to the dice roll, as you’ve noticed, but you don’t need to!
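For example (a rough sketch; the model name is arbitrary, and note that temperature=0 on hosted models still isn't perfectly deterministic):

    # Pin sampling down while iterating, so a change in output reflects the
    # prompt edit rather than the dice roll.
    from openai import OpenAI

    client = OpenAI()

    def try_prompt(prompt):
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,   # greedy-ish decoding
            seed=1234,       # the newly announced seed parameter helps too
        )
        return resp.choices[0].message.content

    print(try_prompt("Summarize the following release notes: ..."))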


You should have a read of https://huggingface.co/blog/how-to-generate (the section on sampling, with regard to setting temperature to zero).

This is degenerate (greedy) behaviour, and not representative of how the prompt will behave at a higher temperature.

(At least, that’s my understanding; it’s a complex topic, but broadly speaking there’s no specific reason, as far as I’m aware, to expect that a particular combination of params/prompt is representative of any other combination of params/prompt for the same model; it may be, but it may not. Certainly on models like GPT4 it is not, for reasons that are not clear to anyone. So take care with your prompt testing: setting temperature to 0 is basically meaningless unless you expect to use a temperature of 0 in production. The results you get from your prompts at temp 0 are not generally reflective of the results you will get at temp > 0.)


[flagged]


> Please don't post insinuations about astroturfing, shilling, brigading, foreign agents, and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data.

https://news.ycombinator.com/newsguidelines.html


Likewise, stop posting these references to the guidelines; if you feel the guidelines are being broken, report it to the moderators and move on.

This type of post is actually more disruptive than the post you're replying to.


> This type of post is actually more disruptive than the post you're replying to.

As is yours. The author of that comment has contact information in their profile, why not unicast them?


https://en.wikipedia.org/wiki/Paradox_of_tolerance

> The paradox of tolerance states that if a society's practice of tolerance is inclusive of the intolerant, intolerance will ultimately dominate, eliminating the tolerant and the practice of tolerance with them. Karl Popper described it as the seemingly self-contradictory idea that, in order to maintain a tolerant society, the society must retain the right to be intolerant of intolerance.

To answer your question more succinctly, because the poster isn't the only person who will read these comments.


> To answer your question more succinctly, because the poster isn't the only person who will read these comments.

I would bet the poster you replied to had the same intent.


I'll always defend your right to say whatever you want but that never implies it's valid.


I most definitely am not paid by OpenAI and am very confused how my original (critical) comment could be seen as astroturfing.


I have no financial vested interest.

To prove it, I'll post Dr. Emily Bender's fantastic podcast about the problematic issues behind the current LLM wave. https://peertube.dair-institute.org/w/qpKuiNLTuHHMnvWGjnA2D8

(I think they are behind the most cogent critiques and worth knowing.)


> Assistants demos in particular showed that they are a black box within a black box within a black box that you can't port anywhere else.

I'd argue the opposite. The new "Threads" interface in the OpenAI admin section lets you see exactly how it's interpreting input/output specifically to address the black box effect.

Source: https://platform.openai.com/docs/api-reference/runs/listRunS... tells you exactly how it's stepping through the chain. Even more visibility than there used to be.


I agree that some parts of the process now seem more “open”, but there is definitely a lot more magic in the new processing. Namely, threads can have an arbitrary length, and OpenAI automatically handles context window management for you. Their API now also handles retrieval of information from raw files, so you don’t need to worry about embeddings.

Lastly, you don’t even need any sort of database to keep track of threads and messages. The API is now stateful!

I think that most of these changes are exciting and make it a lot easier for people to get started. There is no doubt in my mind though that the API is now an even bigger blackbox, and lock-in is slightly increased depending on how you integrate with it.
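A minimal sketch of that stateful flow (method names as documented at launch; it's a beta, so details may shift):

    # All state -- the thread, its messages, and uploaded files -- lives on
    # OpenAI's side, which is exactly where the lock-in concern comes from.
    from openai import OpenAI

    client = OpenAI()

    assistant = client.beta.assistants.create(
        model="gpt-4-1106-preview",
        instructions="Answer questions using the attached files.",
        tools=[{"type": "retrieval"}],   # OpenAI handles chunking/embeddings for you
    )

    thread = client.beta.threads.create()
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content="What does section 3 say?"
    )

    run = client.beta.threads.runs.create(
        thread_id=thread.id, assistant_id=assistant.id
    )

    # In practice you poll run.status until it completes; the run steps
    # endpoint then exposes how the assistant stepped through the chain
    # (tool calls, retrievals, message creation).
    steps = client.beta.threads.runs.steps.list(thread_id=thread.id, run_id=run.id)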


I wouldn't say the black box issue is unique to OpenAI. I suspect nobody could explain certain behaviors, including them.

As for lock in, agreed completely.


Indeterminate context, unknown/hidden values and a stateful API are usually reasons for me to look elsewhere for a solution.


Mistral + 2 weeks of work from the community. Not as good, but private and free. It will trail OpenAI by 6-12 months in capabilities.


OpenAI offering 128k context is very appealing, however.

I tried some Mistral variants with larger context windows, and had very poor results… the model would often offer either an empty completion or a nonsensical completion, even though the content fit comfortably within the context window, and I was placing a direct question either at the beginning or end, and either with or without an explanation of the task and the content. Large contexts just felt broken. There are so many ways that we are more than “two weeks” from the open source solutions matching what OpenAI offers.

And that’s to say nothing of how far behind these smaller models are in terms of accuracy or instruction following.

For now, 6-12 months behind also isn’t good enough. In the uncertain case that this stays true, then a year from now the open models could be perfectly adequate for many use cases… but it’s very hard to predict the progression of these technologies.


Open researchers are trying to shrink and speed up 128K-context models, e.g. YaRN https://github.com/jquesnelle/yarn

It's very compelling and opens up a lot of use cases, so I've been keeping an eye out for advancements. However, you'd be targeting inference on 4xA100s today with YaRN at 128K to get a reasonable token rate on their version of Mistral.


Comparing a 7B parameter model to a 1.8T parameter model is kind of silly. Of course it's behind on accuracy, but it also takes 1% of the resources.


The person I replied to had decided to compare Mistral to what was launched, so I went along with their comparison and showed how I have been unsatisfied with it. But, these open models can certainly be fun to play with.

Regardless, where did you find 1.8T for GPT-4 Turbo? The Turbo model is the one with the 128K context size, and the Turbo models tend to have a much lower parameter count from what people can tell. Nobody outside of OpenAI even knows how many parameters regular GPT-4 has. 1.8T is one of several guesses I have seen people make, but the guesses vary significantly.

I’m also not convinced that parameter counts are everything, as your comment clearly implies, or that chinchilla scaling is fully understood. More research seems required to find the right balance: https://espadrine.github.io/blog/posts/chinchilla-s-death.ht...


Nah, it's training quality and context saturation.

Grab an 8K context model, tweak some internals and try to pass 32K context into it - it's still an 8K model and will go glitchy beyond 8K unless it's trained at higher context lengths.

Anthropic for example talk about the model's ability to spot words in the entire Great Gatsby novel loaded into context. It's a hint to how the model is trained.

Parameter counts are a unified metric; what seems to be important is embedding dimensionality to transfer information through the layers, and the layers themselves to both store and process the nuance of information.


It's an order of magnitude comparison.

Let's just agree it's 100x-300x more parameters, and let's assume the open ai folks are pretty smart and have a sense for the optimal number of tokens to train on.


This definitely. Andrej Karpathy himself mentions tuned weight initialisation in one of his lectures. The TinyGPT code he wrote goes through it.

Additionally, explanations for the raw mathematics of log likelihoods and their loss ballparks.

Interesting low-level stuff. These researchers are the best of the best working for the company that can afford them working on the best models available.


Usually the 7B model is fine-tuned with "enriched" data, "textbook quality" generations from the 1.8T model. Riding on its coat tails.


That's my takeaway from limited attempts to get Code Llama 2 Instruct to implement a moderately complex spec as well, using the special INST and SYS tokens, or just pasting some spec text into a 12k context, when Code Llama 2 supposedly can honor up to 100k tokens. And I don't even know how to combine code infilling with an elaborate spec text exceeding the volume of what normally goes into code comments. Is ChatGPT 4 really any better?


Their products are incredible though. I’ve tried the alternatives and even Claude is not nearly as good as even ChatGPT. Claude gives an ethics lecture with every second reply, which costs me money each time and makes their product very difficult to (want to) embed.


Honestly the companies that completely ignore ethics are the only ones who are going to scoop up any market share outside of OpenAI.

Getting a chiding lecture every time you ask an AI to do something does absolutely nothing for the end user other than waste their time. "AI Safety" academics are memeing themselves out of the future of this tech and leaving the gate wide open for "unsafe" AI to flourish with this farcical behavior.


What are you using it for? I want to know what people actually use these things for, damn it!


Summarizing large documents. Finding relationships between two (or more) documents. Building a set of points bridging the gap between the documents. Correcting malformed text data.

Not everything is just data in a database or some structured format. Sometimes you have blobs of text from a user, or maybe you ran whisper on an audio/video file and now you just have a transcript blob… it’s never been easier to automate all of this stuff and get accurate results.

You can even have humans in the loop still to protect against hallucinations, or use one model to validate another (ask GPT to correct or flag issues with a whisper transcript)
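For example, a quick sketch of the "one model checks another" idea (file name and model choices are just placeholders):

    # Transcribe with Whisper, then ask a chat model to flag likely
    # mis-transcriptions instead of trusting the transcript blindly.
    from openai import OpenAI

    client = OpenAI()

    with open("meeting.mp3", "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

    review = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[{
            "role": "user",
            "content": "Flag any words in this transcript that look like "
                       "mis-transcriptions and suggest corrections:\n\n" + transcript.text,
        }],
    )
    print(review.choices[0].message.content)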


Retrieving the non-metadata titles of 45,000 various PDF, docx, etc. without a bunch of rules/regexs that would fail half the time.

“Derp derp, hallucinations”.

Eh, no, not in practice, not when the entire context and document is provided and the tools are used correctly.


I'd built a bot to use ChatGPT from Telegram (this was before the ChatGPT API), and currently building a tool to help make writing easier (https://www.penpersona.com). This is the API.

Apart from that, it's pretty much replaced 80% of my search engine usage, I can ask it to collate reviews for a product from reddit and other sites, get the critical reception of a book, etc. You don't have to go and read long posts and articles, have GPT do it for you. There's many other use cases like this. For the second part, I'm using a UI called Typing Mind (which also works with the API).


> currently building a tool to help make writing easier

That's cool!

> it's pretty much replaced 80% of my search engine usage

That's not cool. That's how you end up relying on nonexistent sources or other hallucinations.


As opposed to raw information surfaced by the search engine, which we all know is perfectly reliable, unbiased, and up to date?

That aside, this particular admonishment was worn out a couple of months after ChatGPT was released. It does not need to be repeated every time someone mentions doing something interesting with an LLM.


> That's not cool. That's how you end up relying on nonexisting sources or other hallucinations.

I have integrated a search engine plugin and a web browsing plugin, which means I don't have to do the search. For example, I can ask it to compare the battery life of 3 phones; it'll do 3 searches, might open a couple of reddit threads too, then give me the info that I need. It's miles ahead of the current experience with search engines.


Also, DALL·E 3 "HD" is double the price at $0.08. I'm curious to play around with it once the API changes go live later today.

The docs say:

> By default, images are generated at standard quality, but when using DALL·E 3 you can set quality: "hd" for enhanced detail. Square, standard quality images are the fastest to generate.

https://platform.openai.com/docs/guides/images/usage
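For reference, a minimal sketch of the new option based on those docs (the prompt is just an example):

    from openai import OpenAI

    client = OpenAI()

    img = client.images.generate(
        model="dall-e-3",
        prompt="A watercolor map of a fictional harbor town",
        size="1024x1024",   # square, standard-quality images are the fastest
        quality="hd",       # $0.08/image vs $0.04 for standard
    )
    print(img.data[0].url)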


This is great, but DALL·E still has a long way to go before reaching Midjourney standards, and I'm curious to see how they can pull that off.


> most of the products announced (and the price cuts) appear to be more about increasing lock-in to the OpenAI API platform

OpenAI is currently refusing far more enterprises than these products could lock in, even with 100% stickiness.

Makes it unlikely this is about lock-in or fighting churn when arguably, the best advertisement for GPT-4 is comparing its raw results to any other LLM.

If you said their goal was fomenting FOMO, I'd buy it. Curious, though, when they'll let the FOMO fulfillment rate go up by accepting revenue for servicing that demand.


>The GPTs/GPT Agents and Assistants demos in particular showed that they are a black box within a black box within a black box that you can't port anywhere else.

This just rings hollow to me. We lost the fights for database portability, cloud portability, payments/billing portability, and other individual SaaS lock-in. I don't see why it'll be different this time around.


> We lost the fights for database portability, cloud portability, payments/billing portability, and other individual SaaS lock-in.

No we didn’t. There are viable on-prem alternatives or cross cloud alternatives for everything popular on the cloud.

Many companies did choose to hand their destiny over to cloud providers but lots didn’t.


I think it's more about finding places to add value than "lock in" per se. It seems they're adding value with improved developer experience and cost/performance rather than on the models themselves. Not necessarily nefarious attempts to lock in customers, but it may have the same outcome :)


What do you mean "orders of magnitude above" for DALL-E? As far as I can see, Midjourney is $0.05 per image, and that's if you don't forget you have a subscription. I've ended up paying $10 per image.


A friend of mine is building Zep (https://www.getzep.com/), which seems to offer a lot of the Assistant + Retrieval functionality in a self-hostable and model-agnostic way. That type of project may be the way around lock-in.


Anything open about OpenAI starts and ends with the name



