Hacker News | dmd's comments

> old-school analog awesomeness of galactic empires that seem to entirely lack integrated circuits

Reminds me of Harry Turtledove's The Road Not Taken.

https://en.wikipedia.org/wiki/The_Road_Not_Taken_(short_stor...


I would LOVE to use Opus 4.5, but it means I (a merely Pro peon) can work for maybe 30 minutes a day, instead of 60-90.

folk can return a 502 error, apparently.

I consistently have exactly the opposite experience. ChatGPT seems extremely willing to do a huge number of searches, think about them, and then kick off more searches after that thinking, think about it, etc., etc. whereas it seems like Gemini is extremely reluctant to do more than a couple of searches. ChatGPT also is willing to open up PDFs, screenshot them, OCR them and use that as input, whereas Gemini just ignores them.

I will say that it is wild, if not somewhat problematic, that two users have such disparate views of seemingly the same product. I say that, but then I remember my own experience from just a few days ago. I don't pay for Gemini, but I do have a paid ChatGPT sub. I tested both on the same product search with seemingly the same prompt, and subscribed ChatGPT subjectively beat Gemini in terms of scope, options, and links to current decent deals.

It seems (only seems, because I have not gotten around to testing it in any systematic way) that variables like context and what the model knows about you may actually influence the quality (or lack thereof) of the response.


> I will say that it is wild, if not somewhat problematic that two users have such disparate views of seemingly the same product.

This happens all the time on HN. Before opening this thread, I was expecting that the top comment would be 100% positive about the product or its competitor, and one of the top replies would be exactly the opposite, and sure enough...

I don't know why it is. It's honestly a bit disappointing that the most upvoted comments often have the least nuance.


How much nuance can one person's experience have? If the top two most visible things are detailed, contrary experiences of the same product, that seems a pretty good outcome?

Also, why introduce nuance for the sake of nuance? For every single use case, Gemini (and Claude) has performed better. I can't give ChatGPT even the slightest credit when it doesn't deserve any.

Replace "on HN" with "in the course of human events" and we may have a generally true statement ;)

ChatGPT is not one model! Unless you manually specify a particular model, your question can be routed to different models depending on what it guesses would be most appropriate for your question.

Isn’t that just standard MoE behavior? And isn’t the only choice you have from the UI between “Instant” and “Thinking”?

MoE is a single model thing, model routing happens earlier.
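The distinction above can be sketched in a few lines. This is a toy illustration, not how any vendor actually routes: the model names and the heuristic are made up. The point is that MoE mixes experts inside a single model's forward pass, while routing picks a whole model per request, before any inference runs.

```python
# Toy sketch of request-level model routing (distinct from MoE, which
# selects experts inside one model during the forward pass).
# Model names and the heuristic are invented for illustration.
def route(prompt: str) -> str:
    """Pick a backend model with a crude heuristic over the prompt."""
    if len(prompt) > 500 or "step by step" in prompt.lower():
        return "thinking-model"   # hypothetical slower, deliberate model
    return "instant-model"        # hypothetical fast default

print(route("hi"))  # instant-model
print(route("explain step by step why the sky is blue"))  # thinking-model
```

A real router would presumably classify with a small model rather than string heuristics, but the shape is the same: the choice happens once per request, upstream of decoding.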

Yes, but then what does the grandparent mean by "unless you specify a specific model"? Do they mean "if you select Auto, it automatically decides between Instant and Thinking"?

That’s… hardly something worth mentioning.


Because neither product has any consistency in its results, no predictable behaviour. One day it performs well; another it hallucinates nonexistent facts and libraries. These are stochastic machines.

I see that hyperbole is the point, but surely what these machines do is literally predict? The entire prompt-engineering endeavour is to get them to predict better and more precisely. Of course, these are not perfect solutions; they are stochastic after all, just not unpredictably so.

Prompt engineering is voodoo. There's no sure way to determine how well these models will respond to a question. Of course, giving additional information may be helpful, but even that is not guaranteed.

Also every model update changes how you have to prompt them to get the answers you want. Setting up pre-prompts can help, but with each new version, you have to figure out through trial and error how to get it to respond to your type of queries.

I can't wait to see how bad my finally sort-of-working ChatGPT 5.1 pre-prompts work with 5.2.

Edit: How to talk to these models is actually documented, but you have to read through huge documents: https://cdn.openai.com/gpt-5-system-card.pdf


It definitely isn’t voodoo, it’s more like forecasting weather. Some forecasts are easier to make, some are harder (it’ll be cold when it’s winter vs the exact location and wind speed of a tornado for an extreme example). The difference is you can try to mix things up in the prompt to maximize the likelihood of getting what you want out and there are feasibility thresholds for use cases, e.g. if you get a good answer 95% of the time it’s qualitatively different than 55%.

No, it's not. Nowadays we know how to predict the weather with great confidence. Prompting may get you different results each time. Moreover, LLMs depend on the context of your prompts (because of their memory), so a single prompt may be close to useless and two different people can get vastly different results.

> we know how to predict the weather with great confidence

Some weather, sometimes. We're not good at predicting the exact paths of tornadoes.

> so a single prompt may be close to useless and two different people can get vastly different results

Of course, but it can be wrong 50% of the time, or 5%, or 0.5%, and each of those thresholds unlocks possibilities.
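A toy calculation shows why those thresholds matter more than they look, especially once you chain model calls. `chain_success` is a made-up helper, and treating each call as independent with a fixed success rate is a simplification.

```python
def chain_success(per_call: float, n_calls: int) -> float:
    """Probability an n-step chain succeeds, assuming independent steps."""
    return per_call ** n_calls

# Ten chained calls at various per-call success rates:
for p in (0.55, 0.95, 0.995):
    print(f"per-call {p:.3f} -> 10-step chain {chain_success(p, 10):.1%}")
# roughly 0.3%, 59.9%, and 95.1% respectively
```

So a per-call rate of 95% vs. 55% isn't just "better"; it's the difference between a workflow that mostly works and one that essentially never does.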


And I’d really like for Gemini to be as good or better, since I get it for free with my Workspace account, whereas I pay for chatgpt. But every time I try both on a query I’m just blown away by how vastly better chatgpt is, at least for the heavy-on-searching-for-stuff kinds of queries I typically do.

Gemini has tons of people using it for free via AI Studio.

I can't help but feel that Google gives free requests the absolute lowest priority, the greatest quantization, the cheapest thinking budget, etc.

I pay for gemini and chatGPT and have been pretty hooked on Gemini 3 since launch.


It’s like having 3 coins and users preferring one or another when tossing them, because one coin consistently gives more heads (or tails) than the others.

What is better is to build a good set of rules, stick to one model, and then refine those rules over time as you gain experience with the tool, or as the tool evolves and diverges from the results you expect.


<< What is better is to build a good set of rules and

But, unless you are on a local model you control, you literally can't. Otherwise, good rules will work only as long as the next update allows. I will admit that makes me consider some other options, but those probably shouldn't require 'set and iterate' each time something changes.


What I had in mind when I added that comment was coding, with the use of .md files. For the web chat version, I agree there is little control over how to tailor the agent's behaviour, unless you give an initial "setup" prompt.

I can use GPT one day and the next get a different experience with the same problem space. Same with Gemini.

This is by design, given a non-deterministic application?

Sure, but it may be more than that: possibly variable operating parameters on the servers, and current load.
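A minimal sketch of why sampling-based decoding varies run to run even with identical inputs. The logits here are invented, and real serving stacks add further variance (batching, floating-point non-determinism) beyond this toy picture.

```python
import math
import random

def sample_token(logits, temperature=1.0, rng=random):
    """Sample an index from logits softened by temperature."""
    if temperature == 0:  # greedy decoding: always the argmax
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max before exp for numerical stability
    weights = [math.exp(l - m) for l in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.9, 0.5]  # two near-tied candidate "tokens"
draws = {sample_token(logits) for _ in range(200)}
print(draws)                                 # typically more than one index
print(sample_token(logits, temperature=0))   # always 0, the argmax
```

With temperature 0 the output is fixed; with temperature above 0, near-tied candidates get picked interchangeably, which is one honest source of the "different answer each day" experience.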

On the whole, if I compare my AI assistant to a human worker, I get more variance than I would from a human office worker.


That's because you don't 'own' the LLM compute. If you instead bought your office workers by the question, I'm sure the variability would increase.

They're not really capable of producing varying answers based on load.

But they are capable of producing different answers because they feel like behaving differently if the current date is a holiday, and things like that. They're basically just little guys.


I guess LLMs have a mood too

Vibes

Tesla FSD has been more or less the same experience. Some people drive 100s of miles without disengaging while others pull the plug within half a mile from their house. A lot of it depends on what the customer is willing to tolerate.

We've been having trouble telling whether people are using the same product ever since ChatGPT first got popular. They had a free model and a paid model; that was it, no other competitors or naming schemes to worry about, and discussions were still full of people talking about current capabilities without saying which model they were using.

For me, "gemini" currently means using this model in the llm.datasette.io CLI tool:

openrouter/google/gemini-3-pro-preview

As for what anyone else means, whether those are equivalent, or whether Google does something different when you use "Gemini 3" in their browser app vs. their CLI app vs. plans vs. API users vs. third-party API users? No idea on any of the above.

I hate naming in the llm space.


FWIW i’m always using 5.1 Thinking.

Could also be a language thing ...

Same, I use chatgpt plus (the entry-level paid option) extensively for personal research projects and coding, and it seems miles ahead of whatever "Gemini Pro" is that I have through work. Twice yesterday, gemini repeated verbatim a previous response as if I hadn't asked another question and told it why the previous response was bad. Gemini feels like chatGPT from two years ago.

Are you uploading PDFs that already have a text layer?

I don't currently subscribe to Gemini, but on AI Studio's free offering, when I upload a non-OCR PDF of around 20 pages, the software environment's OCR feeds it to the model with greater accuracy than I've seen from any other source.


I’m not uploading PDFs at all. I’m talking about PDFs it finds while searching, then extracts data from for the conversation.

I'm surprised to hear anyone finds these models trustworthy for research.

Just today I asked Claude what the year-over-year inflation was, and it gave me 2023 to 2024.

I also thought some sites ban AI crawling, so if they have the best source on a topic, you won't get it.


Anytime you use LLMs you should be keenly aware of their knowledge cutoff. Like any other tool, the more you understand it, the better it works.

I'm sorry, but I don't see what "knowledge cutoff" has to do with what we were talking about, which is using an LLM to find PDFs and other sources for research.

I agree with you. To me, Gemini has much worse search results. Then again, I use Kagi for search and I cannot stand Google's search results anymore. And it's clear that Gemini uses those.

In contrast, ChatGPT has built its own search engine, which performs better in my experience. Except for coding; there I opt for Claude Opus 4.5.


Perplexity Pro with any thinking model blows both out of the water in a fraction of the time, in my experience

I’ve done this as well. I’m only making twice what I was 30 years ago, at 18. (Though to be fair, I was making a ridiculous amount of money for an 18-year-old; my first job was in the Qwest network operations center.)

Hacker? Straight to jail.

No no, spend three months in an untraceable maze of ICE holding facilities, then to jail, then deported to a 'shithole' country that you didn't come from in the first place.

You only get deported to a third country if you refuse to be deported to your home country.

The information you would need to be able to state this categorically is not publicly available.

I think deporting you to Switzerland is no fun and won't teach you any valuable lesson.

Works fine in Chrome on both my W11 and MacOS 15.7.2 machines.

As a very small (like, two digit spend a month) AWS user, I still have gotten a human to help me when I've needed one.

Amazon is amazing to be a customer of. Just not an employee of (I'm not one, but I know many).


Exactly. I give services like this - generally coded as someone's first "wow I know PHP now!" or the modern equivalent - approximately 5 years shelf life, at best.

Whereas I have notes-to-future-me on my calendar that I put there 30 years ago.


What calendar system have you been using for 30 years, that's survived that long?

I think I sent one of those "mails to the future" in the '90s, asking 2002-me how I was. I don't think it ever arrived; maybe the free email domain I was using ceased operating.

Sheesh, anyone old enough to remember the services offering a free email address with a choice of maybe 50 domains in a dropdown?


> Sheesh, anyone old enough to remember the services offering a free email address with a choice of maybe 50 domains in a dropdown?

Mail.com is still around and offers a lot of domains, though I think the number of domains has shrunk over the years.


Not the same one all that time, but I've always exported/imported when I have changed.

For the same reason, I have every email I've ever sent or received, going back to my 1988 FIDOnet account.


The absolute best teaching of the Fourier transform I've ever encountered is the extremely bizarre book "Who is Fourier?"

https://www.amazon.com/Who-Fourier-Mathematical-Transnationa...


This was my first exposure to the Fourier transform. I also highly recommend this book; it was recommended to me by the head of the math department at university.
