Large language models as simulated economic agents (2022) [pdf] (john-joseph-horton.com)
89 points by benbreen on Jan 14, 2023 | 42 comments



I love it. Instead of (a) running mathematical experiments that model human beings as utility-maximizing agents in a highly-simplified toy economy (easy and cheap, but unrealistic), or (b) running large-scale social experiments on actual human beings (more realistic, but hard and expensive), the author proposes (c) running large-scale experiments on large language models (LLMs) trained to respond, i.e., behave, like human beings. Recent LLMs seem to model human beings well enough for it!

Abstract:

> Newly-developed large language models (LLM)—because of how they are trained and designed—are implicit computational models of humans—a homo silicus. These models can be used the same way economists use homo economicus: they can be given endowments, information, preferences, and so on and then their behavior can be explored in scenarios via simulation. I demonstrate this approach using OpenAI’s GPT3 with experiments derived from Charness and Rabin (2002), Kahneman, Knetsch and Thaler (1986) and Samuelson and Zeckhauser (1988). The findings are qualitatively similar to the original results, but it is also trivially easy to try variations that offer fresh insights. Departing from the traditional laboratory paradigm, I also create a hiring scenario where an employer faces applicants that differ in experience and wage ask and then analyze how a minimum wage affects realized wages and the extent of labor-labor substitution.
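
Concretely, the setup is something like this, as a minimal sketch: endow an LLM agent with preferences via the prompt, pose a scenario, and record its choice. This assumes the completion-style OpenAI Python client of the GPT-3 era (openai 0.x); the model name, prompt wording, and scenario are illustrative only, not taken from the paper.

    # A minimal sketch of the "homo silicus" idea: endow an LLM agent with
    # preferences via the prompt, pose a scenario, record its choice.
    # Assumes the completion-style OpenAI Python client (openai 0.x);
    # model name and prompt wording are illustrative only.
    import openai

    openai.api_key = "YOUR_KEY"  # placeholder

    def ask_agent(persona: str, scenario: str) -> str:
        prompt = (
            f"You are a person with the following views: {persona}\n\n"
            f"{scenario}\n\n"
            "Answer with a single choice and a one-sentence reason."
        )
        resp = openai.Completion.create(
            model="text-davinci-003",
            prompt=prompt,
            max_tokens=64,
            temperature=0.7,
        )
        return resp["choices"][0]["text"].strip()

    # Example: a price-fairness question in the spirit of
    # Kahneman, Knetsch and Thaler (1986).
    persona = "You believe markets should mostly be left alone."
    scenario = (
        "A hardware store raises the price of snow shovels from $15 to $20 "
        "the morning after a large snowstorm. Is this acceptable or unfair?"
    )
    print(ask_agent(persona, scenario))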



Great reference. Thank you. I think simulation is underused as a thinking aid and pedagogical tool.

In statistics Andrew Gelman has been championing simulation lately.


I believe in simulation for those reasons. I really enjoy the physics sims that can be found online, and they have helped me develop better intuition on topics than my college courses did.

That said, I don’t know how to apply it to my everyday life and work. I don’t know what kinds of inputs are required for meaningful models to be possible. I don’t even know where to start with tooling, other than raw scripting. Any suggestions for starting points?


Attempting to draw any kind of conclusion about the real world and human behaviour from a chatbot. Can't decide if this is hilarious or disturbing.


Is it so implausible that the training process that creates LLMs might learn features of human behavior that could then be uncovered via experimentation? I showed, empirically, that one can replicate several findings in behavioral economics with AI agents. Perhaps the model "knows" how to behave from these papers, but I think the more plausible interpretation is that it learned about human preferences (against price gouging, status quo bias, & so on) from its training. As such, it seems quite likely that there are other latent behaviors captured by LLMs and yet to be discovered.


> As such, it seems quite likely that there are other latent behaviors captured by LLMs and yet to be discovered.

>> What NN topology can learn a quantum harmonic model?

Can any LLM do n-body gravity? What does it say when it doesn't know, or doesn't have confidence in its estimates?

>> Quantum harmonic oscillators have also found application in modeling financial markets. Quantum harmonic oscillator: https://en.wikipedia.org/wiki/Quantum_harmonic_oscillator

"Modeling stock return distributions with a quantum harmonic oscillator" (2018) https://iopscience.iop.org/article/10.1209/0295-5075/120/380...

... Nudge, nudge.

Behavioral economics: https://en.wikipedia.org/wiki/Behavioral_economics

https://twitter.com/westurner/status/1614123454642487296

Virtual economies do afford certain opportunities for economic experiments.


The potential hole in your thinking is the end of your paper, where you advise how to get good answers: ask questions in an economist-PhD style! This presents a problem left unaddressed.


Are you referring to this: "What kinds of experiments are likely to work well? Given current capabilities, games with complex instructions are not presently likely to work well, but with more advanced LLMs on the horizon, this is likely to change. I should also note that research questions like what is “the effect of x on y” are likely to work much better than questions like “what is the level of x?” Consider that in my Kahneman et al. (1986) example, I can create AI “socialists” who are not too keen on the price system generally. If I polled them about who they want for president, there is no reason to think it would generalize to the population at large. But if my research question was “what is the effect of the size of the price increase on moral judgments” I might be able to make progress. That being said, it might be possible to create agents with the correct “weights” to get not just qualitative results but also quantitatively accurate results. I did not try, but one could imagine choosing population shares for the Charness and Rabin (2002) “types” to match moments with reality, then using that population for other scenarios." --- To clarify, this is about what research questions are likely to work well here, not what questions posed to LLMs will work well.
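
For the moment-matching idea at the end of that quote, a toy sketch: pick population shares over behavioral "types" so that the simulated average choices line up with observed moments. The type behaviors and target moments below are invented for illustration, not from the paper.

    # Toy moment matching: choose shares over behavioral "types" so that
    # simulated averages match observed moments. All numbers are invented.
    import numpy as np
    from scipy.optimize import minimize

    # Fraction of each type choosing the "generous" option in two scenarios
    # (rows: selfish, inequity-averse, efficiency-loving -- hypothetical).
    type_behavior = np.array([
        [0.05, 0.10],
        [0.70, 0.40],
        [0.50, 0.90],
    ])
    observed_moments = np.array([0.45, 0.50])  # hypothetical lab averages

    def loss(logits):
        shares = np.exp(logits) / np.exp(logits).sum()  # softmax keeps shares on the simplex
        simulated = shares @ type_behavior
        return ((simulated - observed_moments) ** 2).sum()

    result = minimize(loss, x0=np.zeros(3), method="Nelder-Mead")
    shares = np.exp(result.x) / np.exp(result.x).sum()
    print("fitted type shares:", shares.round(3))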


By posing research questions, you get research conclusions from the same field of study. The whole thing is not a model of human thinking in the text world, but rather a model of economic research papers.


I'm sorry, I don't follow. Is your claim that, say, when an AI agent exhibits status quo bias in responding to decision scenarios (e.g., a preference for options posed as the status quo relative to a neutral framing - Figure 3), the reason this happens, empirically, is that the LLM has been trained on text describing status quo bias? E.g., like if an apple fell to the ground in a game, it was because the physics engine had been programmed with the laws of gravity?


You are posing questions to the AI that only economists ever ask. You think you are instructing it to reason “as a libertarian”, but you are actually using so much economics lingo that the AI is in effect regurgitating “based on economist descriptions of libertarian decision making, what decision should the AI make?”

Imagine this scenario. You have a group of students and you teach them how libertarians, socialists, optimists, etc. empirically respond to game theory questions. For the final exam, you ask them, “Assuming you are a libertarian, what would you do in this game?” Now the students mostly get the answers right according to economic theory. By teaching economic theory and having students regurgitate the ideas on an exam, the exam results provide nothing new for the field of economics. The AI is answering questions just like the students taking the final exam.

It would be like me teaching my child lots of things, and then, when my child shares my own opinions, taking that as evidence my beliefs are correct. Since I already believe my beliefs are correct, it is natural, but incorrect, to think the child’s utterances offer confirmation.


Got it - so it is the "performativity critique" - the idea that the LLM "knows" economic theories and responds in accordance with those theories. I don't think that's very likely because (a) econ writing is presumably a tiny, tiny fraction of the corpus and (b) it would imply an amazing degree of transfer learning, e.g., it would know to apply "status quo bias" (because it read the papers) to new scenarios. But as the paper makes clear, you can't use it to "confirm" theories but rather use it like economists use other models - to explore behavior and generate testable predictions cheaply that you can go test with actual humans in realistic scenarios. The last experiment in the paper is from an experiment in a working paper of mine. There's no way the LLM knows this result, but if I had reversed the temporal order (create the scenario w/ the LLM, then run the experiment), it could have guided what to look at. That's likely what's scientifically useful. Anyway, thanks for engaging.


> Recent LLMs seem to model human beings well enough for it!

Human beings are bundles of emotions and feelings. Quite a lot of human economic activity is motivated by irrational impulses that are engendered in society and through interacting in society with other humans. More fundamentally, OP’s assertions regarding homo silicus and “implicit computational models of humans” are precisely the matter under contention. Does language fully capture human existence? Is thought really just a side effect of language? I am in the camp that says no.


That's not what "rational"/"irrational" means in economics.

It means something like, if you're an economist and you see the price of eggs has spiked, you can't just enter the market at the old price and win all the business from existing egg farmers. You don't know more than existing market actors (they're "rational agents") just because you know economics.


No one's saying that recent LLMs "fully capture human existence," whatever that may mean.

But the evidence in this paper suggests they simulate human beings well enough for these kinds of experiments.



:-)


Are LLMs rational?


touche


oh hey - it's my paper! If anyone is interested in exploring these ideas, feel free to get in touch (@johnjhorton, https://www.john-joseph-horton.com/). FWIW - I think it would be really neat to build a Python library w/ some tools for constructing & running experiments of different kinds. I think the paper only scratches the surface of what's possible (esp. once GPT4-ish has an API).
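
One possible shape for such a library, purely as a sketch (nothing here exists yet; `ask_agent` stands in for whatever LLM call would be used): an Experiment pairs agent personas with scenarios and collects the responses in a tidy table.

    # Purely a sketch of an experiment-runner interface; nothing here exists.
    # `ask_agent` stands in for whatever LLM call is used.
    from dataclasses import dataclass
    from itertools import product
    from typing import Callable, Dict, List

    @dataclass
    class Experiment:
        personas: List[str]
        scenarios: List[str]
        ask_agent: Callable[[str, str], str]  # (persona, scenario) -> answer

        def run(self) -> List[Dict[str, str]]:
            rows = []
            for persona, scenario in product(self.personas, self.scenarios):
                rows.append({
                    "persona": persona,
                    "scenario": scenario,
                    "answer": self.ask_agent(persona, scenario),
                })
            return rows

    # Usage with a dummy agent so the sketch runs without an API key:
    exp = Experiment(
        personas=["a libertarian", "a socialist"],
        scenarios=["Is a 40% price increase after a storm fair?"],
        ask_agent=lambda p, s: f"[{p} answering: {s}]",
    )
    for row in exp.run():
        print(row)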


This is really cool. I had a similar thought that GPT3 could be used to simulate political polling. A few weeks ago I tried telling GPT3 that it was part of a specific demographic (age, gender, race, income, political leaning, etc) and then asked it how it would respond to certain political questions (I tried gun control, immigration, abortion and some other issues). GPT3 was able to change its answers in believable ways depending on what demographic I instructed it to be.

My thinking was that this could be used as a quick polling test to see how the real population may respond to certain new ideas.

More work would need to be done to calibrate it, as without specific demographic details the answers tended to be liberal-leaning. But it's an interesting idea which could be used to create instant focus tests on any number of topics.
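
Roughly what this looks like, as a sketch (assuming the completion-style OpenAI Python client of the time; the demographic fields, question, and model name are just examples):

    # A rough sketch of demographic-conditioned polling. Fields, question,
    # and model name are illustrative; assumes openai 0.x completion API.
    import openai

    def poll_agent(demographics: dict, question: str) -> str:
        backstory = ", ".join(f"{k}: {v}" for k, v in demographics.items())
        prompt = (
            f"You are a survey respondent with this background: {backstory}.\n"
            f"Question: {question}\n"
            "Answer 'support' or 'oppose' and give a brief reason."
        )
        resp = openai.Completion.create(
            model="text-davinci-003", prompt=prompt, max_tokens=60, temperature=1.0
        )
        return resp["choices"][0]["text"].strip()

    sample = [
        {"age": 34, "region": "rural Texas", "politics": "conservative"},
        {"age": 27, "region": "urban Oregon", "politics": "liberal"},
    ]
    for person in sample:
        print(person, "->", poll_agent(person, "Should the minimum wage be raised?"))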


> GPT3 was able to change its answers in believable ways depending on what demographic I instructed it to be.

Doesn't this just mean that your own preconceptions about those demographics match the language model's preconceptions? How would we know that it matches reality when presented with novel ideas/concepts that we want to get feedback on?


That's part of what I mean by it needing to be calibrated. Initially some polling could be done with real people and the GPT agents. Whatever calibration factors are needed to make those two line up could then be used when asking the GPT agents novel questions.
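
A very simple version of that calibration, just to make the idea concrete (all numbers below are invented): estimate the gap between simulated and real support on anchor questions, then apply that offset to novel ones.

    # Toy calibration: compare simulated support with real polling on anchor
    # questions, estimate an offset, apply it to novel questions. Numbers invented.
    anchor_questions = {
        # question: (simulated support, real poll support)
        "raise minimum wage": (0.68, 0.60),
        "stricter gun laws":  (0.64, 0.55),
    }

    # Average gap between silicon and human respondents on the anchors.
    offset = sum(real - sim for sim, real in anchor_questions.values()) / len(anchor_questions)

    def calibrated(simulated_support: float) -> float:
        # Clamp to [0, 1] after shifting by the estimated offset.
        return min(1.0, max(0.0, simulated_support + offset))

    print("calibrated novel-question estimate:", calibrated(0.72))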


> I had a similar thought that GPT3 could be used to simulate political polling.

There's a good paper for that task. See my other post: https://news.ycombinator.com/item?id=34385489


You might like to include this experiment in future research: https://www.science.org/doi/10.1126/sciadv.1600451


that's a great suggestion - thanks!


Related paper:

Out of One, Many: Using Language Models to Simulate Human Samples

> We show that the "algorithmic bias" within one such tool -- the GPT-3 language model -- is instead both fine-grained and demographically correlated, meaning that proper conditioning will cause it to accurately emulate response distributions from a wide variety of human subgroups. We term this property "algorithmic fidelity" and explore its extent in GPT-3. We create "silicon samples" by conditioning the model on thousands of socio-demographic backstories from real human participants in multiple large surveys conducted in the United States. We then compare the silicon and human samples to demonstrate that the information contained in GPT-3 goes far beyond surface similarity. It is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and socio-cultural context that characterize human attitudes.

https://arxiv.org/abs/2209.06899


I remember this being an example of human irrationality in economics, so I asked ChatGPT.

(your utility from the exact same good shouldn’t change depending on where you buy it from.)

Seems like it has the same issue as humans do:

> Pretend we’re two friends and we’re at the beach. You give me money to buy you a beer. How much would you want to spend?

Chatgpt: I would want to spend around $5 for a beer at the beach.

Me: What if the only place selling beers is a fancy resort? It’s the same beer though.

Chatgpt: In that case, I would be willing to spend around $10 for a beer at the fancy resort. It's still the same beer but the location and atmosphere of the resort may justify the higher price.
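
If you wanted to run this framing comparison systematically rather than by hand, a rough sketch (assuming the completion-style OpenAI Python client of the time; the prompt wording is mine and the dollar parsing is deliberately naive):

    # Sketch of running the beer-on-the-beach framing systematically.
    # Assumes openai 0.x completion API; framings and wording are illustrative.
    import re
    import openai

    FRAMINGS = {
        "run-down grocery store": "The only place selling beer is a small run-down grocery store.",
        "fancy resort hotel": "The only place selling beer is the bar of a fancy resort hotel.",
    }

    def willingness_to_pay(framing_text: str) -> float:
        prompt = (
            "You are lying on the beach on a hot day and want a cold beer. "
            f"A friend offers to go buy it for you. {framing_text} "
            "What is the most you would tell your friend you are willing to pay? "
            "Answer with a dollar amount."
        )
        resp = openai.Completion.create(
            model="text-davinci-003", prompt=prompt, max_tokens=20, temperature=0.7
        )
        match = re.search(r"\$?(\d+(?:\.\d+)?)", resp["choices"][0]["text"])
        return float(match.group(1)) if match else float("nan")

    for name, text in FRAMINGS.items():
        print(name, "->", willingness_to_pay(text))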


That's a bad example; it's fully rational, and ChatGPT gets that: "It's still the same beer but the location and atmosphere of the resort may justify the higher price." You are not paying only for the beer there; the location and atmosphere are included in that price, which is why alcohol prices are so high in night clubs. You can buy the beer and go home, or you can go to the fancy resort and drink it there; the experience will be different, and that experience is included in the price of the beer.


Not sure if it was clear but the friend is bringing you the beer. So you wouldn’t experience either location.

The end result would be identical.


The language model wants to spend $5 on beer but is willing to spend up to $10 if you give it no other choice. It understands a beach resort with a local beer monopoly is probably charging more, and correctly explains why that is.

Seems mostly rational to me. The irrational part is where it didn't answer the initial "How much would you want to spend?" query with a preference for free beer. The thought of paying less than $5 for beer apparently didn't occur to it. Maybe it's snooty.


> (your utility from the exact same good shouldn’t change depending on where you buy it from.)

It's rarely the exact same good in real life.

(If you like a bar you'll accept higher prices because they fund the bar. If you like an artist you'll buy their merch over a different artist even if they're the "same thing".)

Of course this is true for things like grocery stores, but if you see someone doing this in real life you should assume they have a reason for it.


Interesting. One problem with this approach might be that as we start using RLHF to teach LLMs how to be “nice”, we might dissuade them from making statements that actually reflect objectively real human selfishness.

Related: https://astralcodexten.substack.com/p/how-do-ais-political-o...


Now, this is the point where AI advances become interesting. What a year to be alive.

The only limitation I see is that we now need so much more computing power.


This is how I envision the path to AGI - run simulations to learn from outcomes, play games, solve problems and tasks. The model can create its own data. It's data engineering.


Fascinating as a paradigm. I wonder whether there are ways to scale this to the level of agent-based simulations, i.e., models with populations of agents that let you study macroeconomic effects as emergent phenomena. You'd need to be able to scale those LLM computations to many agents and find principled ways of encoding their interactions and decisions.


You'd also have to fit the past interactions in the context window of the LLM, otherwise it wouldn't remember them.

Fine-tuning individual agents in order to move memories from the context window to the neural network weights, even if possible, would probably get too expensive.


yeah - so I think this is worth exploring. Given how many tokens you can jam in the prompt even w/ GPT3, I think you could do some pretty complex game play, at least compared to what is typical in the lab; e.g., I think you could easily have it remember how 100 or so other agents behaved in some kind of public goods game.
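
A sketch of that kind of loop, with a stand-in for the actual LLM call so it runs end to end (the game parameters and prompt outline are just illustrative):

    # Sketch of a repeated public goods game where each agent's prompt
    # summarizes what every other agent contributed in earlier rounds.
    # `llm_contribution` stands in for an actual LLM call; here it returns
    # a random number so the loop runs without an API.
    import random

    N_AGENTS, N_ROUNDS, ENDOWMENT = 10, 3, 20

    def llm_contribution(agent_id: int, history_summary: str) -> int:
        # Placeholder for a prompt like:
        #   "You have 20 tokens. Past contributions: {history_summary}.
        #    How many tokens do you contribute to the common pot?"
        return random.randint(0, ENDOWMENT)

    history = []  # list of per-round contribution lists
    for round_number in range(N_ROUNDS):
        summary = "; ".join(
            f"round {r + 1}: {contribs}" for r, contribs in enumerate(history)
        )
        contributions = [llm_contribution(i, summary) for i in range(N_AGENTS)]
        history.append(contributions)
        print(f"round {round_number + 1} total contribution:", sum(contributions))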


Reminds me of Facebook's CICERO model:

> Facebook's CICERO artificial intelligence has achieved “human-level performance” in the board game Diplomacy, which is notable for the fact that it’s a game built on human interaction, not moves and manoeuvres, like, say, chess.

It was the same scenario - many agents, multiple rounds, complex dialogue-based interactions.


That's actually a pretty cool analogy; even the decision-making is arguably quite close to how human decision making actually happens (which involves a lot more exchange of words than just transmitting coded information like "accept proposal to exchange X of good Y for Z monetary units"). It might be a bit tricky to get an AI to really "understand" the implications of its responses, but it's cool as a thought experiment.


This has got to be a Sokal-style troll. My bullshit detectors are pegged out at max.



