[flagged] I spent a year and $5,700 to see if ChatGPT can beat the market (S&P 500)
39 points by ishtiaqrahman 8 months ago | 42 comments
Exactly one year ago, on May 16th, 2023, I started investing in the stock market using LLM-powered autonomous agents, which I named The GPT Investor. These investments were real, made with actual cash in the NYSE and NASDAQ, not just paper trading.

My goal was to determine if autonomous agents powered by Large Language Models, such as GPT-4, could "beat the market."

"Beating the market" refers to achieving investment returns that exceed the performance of a benchmark index, such as the S&P 500. It implies that an investor's portfolio has generated a higher return compared to the average market return over a specified period. This concept is often used to evaluate the success of investment strategies or the skill of portfolio managers.

I wanted the GPT Investor to compete against the SPDR S&P 500 ETF Trust, also known as $SPY, an exchange-traded fund (ETF) that aims to track the performance of the S&P 500 Index. This index includes 500 of the largest publicly traded companies in the U.S., making $SPY a popular investment vehicle for those seeking broad exposure to the U.S. stock market.

$SPY is a formidable opponent because, over the last 10 years, fewer than 10% of active U.S. stock funds have managed to outperform index funds like $SPY.

If The GPT Investor could beat $SPY, it would mean it outperforms roughly 90% of professional fund managers. (For example, over the last year, Warren Buffett couldn't beat $SPY.)

Methodology:

I experimented with various methods and technology stacks to generate stock recommendations, meticulously documenting and reporting the results to our subscribers. The common thread among these methods was:

- Using OpenAI's LLMs (GPT-3.5 and GPT-4)

- Enabling the LLMs to autonomously search the web

- Generating stock recommendations for specific durations, such as three months

I utilized platforms like ChatGPT, Godmode, and BabyAGI UI to generate the stock recommendations. Each of these platforms can perform multi-step reasoning, a crucial attribute of autonomous agents. For example, based on a prompt, the agent can create its own to-do list and independently execute the steps to arrive at a result.

I conducted 19 experiments, investing CAD $300 in each, for a total of CAD $5,700 invested. I used a Wealthsimple brokerage account to execute the trades. Since each stock recommendation had a specific duration, I closed the positions at the end of each duration and compiled the returns as part of The GPT Investor portfolio.

For every experiment I ran, I published the entire methodology (tech stack, prompt, LLM, etc.) and results on this platform, The GPT Investor (www.gptinvestor.co).

Results:

- Total Invested: CAD $5,700

- Number of Experiments: 19

- Shortest Experiment Duration: 7 days

- Longest Experiment Duration: 1 year

- Number of Stocks Recommended by The GPT Investor: 31

Total Return:

- The GPT Investor: 11.54% (CAD $658.20)

- $SPY: 8.89% (CAD $507.10)

Overall, The GPT Investor portfolio's return was approximately 29.78% higher than $SPY's return in relative terms (a gap of 2.65 percentage points).
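As a quick sanity check on the comparison above, the relative figure can be recomputed from the reported dollar gains. This is a minimal Python sketch, not the author's actual calculation; the helper name `relative_outperformance` is made up for illustration. With the rounded inputs from the post it lands at roughly 29.8%, in line with the approximate number quoted.

```python
def relative_outperformance(portfolio_gain: float, benchmark_gain: float) -> float:
    """Return how much larger the portfolio gain is than the benchmark gain, in percent."""
    if benchmark_gain == 0:
        raise ValueError("benchmark gain must be non-zero")
    return (portfolio_gain - benchmark_gain) / abs(benchmark_gain) * 100.0


# Figures reported in the post (CAD).
invested = 5700.00
gpt_gain = 658.20
spy_gain = 507.10

# Absolute returns on the total amount invested, in percent.
gpt_return_pct = gpt_gain / invested * 100.0
spy_return_pct = spy_gain / invested * 100.0

# Relative edge of the portfolio over the benchmark, in percent.
edge_pct = relative_outperformance(gpt_gain, spy_gain)

print(f"GPT Investor return: {gpt_return_pct:.2f}%")
print(f"SPY return:          {spy_return_pct:.2f}%")
print(f"Relative edge:       {edge_pct:.2f}%")
```

Note that the relative edge and the percentage-point gap are different measures: the first divides the gap by the benchmark's return, the second is the plain difference between the two returns.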

Number of experiments by LLM:

- GPT-4: 11 experiments

- GPT-3.5: 8 experiments

The average return for the two LLMs used by the GPT Investor is as follows:

- GPT-4: 15.54% (with a corresponding average $SPY return of 9.74%)

- GPT-3.5: 6.05% (with a corresponding average $SPY return of 7.74%)

*GPT-4's average return was approximately 156.86% higher than GPT-3.5's in relative terms (a gap of 9.49 percentage points).
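The per-model averages can also be cross-checked against the overall total. Because every experiment used the same CAD $300 position size, the blended return should be the experiment-count-weighted average of the two model averages. A minimal sketch under that assumption (the variable names are illustrative, and the inputs are the rounded figures from the post):

```python
# Per-model figures as reported in the post (returns in percent).
experiments = {
    "GPT-4": {"count": 11, "avg_return": 15.54, "avg_spy": 9.74},
    "GPT-3.5": {"count": 8, "avg_return": 6.05, "avg_spy": 7.74},
}

total_count = sum(e["count"] for e in experiments.values())

# Equal position sizes mean each experiment carries the same weight,
# so the blended return is weighted by the number of experiments.
blended_gpt = sum(e["count"] * e["avg_return"] for e in experiments.values()) / total_count
blended_spy = sum(e["count"] * e["avg_spy"] for e in experiments.values()) / total_count

# Relative improvement of GPT-4's average return over GPT-3.5's, in percent.
gpt4 = experiments["GPT-4"]["avg_return"]
gpt35 = experiments["GPT-3.5"]["avg_return"]
improvement_pct = (gpt4 - gpt35) / gpt35 * 100.0

print(f"Experiments:       {total_count}")
print(f"Blended GPT:       {blended_gpt:.2f}%")
print(f"Blended SPY:       {blended_spy:.2f}%")
print(f"GPT-4 vs GPT-3.5:  {improvement_pct:.2f}% relative improvement")
```

The blended values come out close to the 11.54% and 8.89% totals reported above, which suggests the totals are simple averages over the 19 experiments rather than a time-weighted annual return.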

The results raise the exciting possibility that, as LLMs become more powerful, the returns of The GPT Investor could improve even further.

I publish the real-time status of The GPT Investor here: https://www.gptinvestor.co/the-gpt-investor/




It was easy to beat spy over this period by investing into, say, a Nasdaq index.

What were max drawdowns on spy and your experiment?

If yours was smaller, then it's interesting, because it means higher returns with lower volatility.

* Also, looking at SPY data it looks like its 1-year return from today was 26.05% rather than the 3x smaller number you gave. Even considering that you operate in CAD that's quite a discrepancy...
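For reference, max drawdown is the largest peak-to-trough decline over a period. A minimal Python sketch of how it could be computed for either series; the price list is made-up illustrative data, not SPY or GPT Investor data:

```python
from typing import Sequence


def max_drawdown(prices: Sequence[float]) -> float:
    """Largest peak-to-trough decline, as a positive fraction of the peak.

    Assumes all prices are positive and ordered from oldest to newest.
    """
    if not prices:
        raise ValueError("prices must not be empty")
    peak = prices[0]
    worst = 0.0
    for price in prices:
        # Track the running high, then measure the drop from it.
        peak = max(peak, price)
        worst = max(worst, (peak - price) / peak)
    return worst


# Hypothetical closing prices, for illustration only.
example_prices = [100.0, 110.0, 99.0, 105.0, 120.0, 90.0, 130.0]
print(f"Max drawdown: {max_drawdown(example_prices):.1%}")
```

Comparing this number for the portfolio and for the benchmark over the same dates is one simple way to see whether a higher return came with a rougher ride.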


I am comparing SPY's return with the GPT return for the exact timeframe during which a particular position was recommended by the GPT Investor. For example, some positions were held for only 7 days, while others were held for 3 months, etc. I generated 19 different sets of recommendations with varied timeframes and at different points in the last year. This link, where I track everything, should clarify any confusion: https://www.gptinvestor.co/the-gpt-investor/
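The period-matched comparison described above could be sketched like this. Each position's return is compared with SPY's return over the same entry and exit dates, and the results are averaged across equal-sized positions. The `Position` class and all prices are hypothetical placeholders, not figures from the tracker:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Position:
    """One recommendation and the SPY prices on the same entry and exit dates."""
    pick_entry: float
    pick_exit: float
    spy_entry: float
    spy_exit: float


def simple_return_pct(entry: float, exit_price: float) -> float:
    """Percentage return from entry to exit, ignoring fees and dividends."""
    return (exit_price - entry) / entry * 100.0


def excess_return_pct(p: Position) -> float:
    """Pick return minus SPY return over the identical holding window."""
    return simple_return_pct(p.pick_entry, p.pick_exit) - simple_return_pct(p.spy_entry, p.spy_exit)


def average_excess_pct(positions: List[Position]) -> float:
    """Equal-weighted average excess return (every position has the same size)."""
    return sum(excess_return_pct(p) for p in positions) / len(positions)


# Hypothetical positions, for illustration only.
positions = [
    Position(pick_entry=100.0, pick_exit=110.0, spy_entry=400.0, spy_exit=420.0),
    Position(pick_entry=50.0, pick_exit=51.0, spy_entry=400.0, spy_exit=404.0),
]
print(f"Average excess return: {average_excess_pct(positions):.2f} percentage points")
```

An average of short-window returns computed this way is not the same thing as SPY's return over the full year, which is why the benchmark number here can differ sharply from a 1-year chart.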


Very interesting that this works. And great to see experiments like this which open their methodology.

Looking at the status page, https://www.gptinvestor.co/the-gpt-investor/

- GPT version has been better than benchmark 8 times, while the benchmark won 11 times.

- GPT return was negative in 5 experiments and 0 times for benchmark.

- GPT invests in a quite small selection of 1 or 3 stocks (mostly tech).

- The GPT outperformance comes from one 100% bet on NVIDIA that is still ongoing (+140%).

I like my blood pressure a bit too steady to start with this method just yet.


Why am I not surprised that the AI is going all in on the company that produces the hardware it runs on?


I also noticed that. It recommended NVIDIA 5 times.


haha the thing is that I got better at prompting as I ran more and more experiments.

For example, in the beginning I was asking GPT to recommend me 3 stocks at a time, which was not a good idea. I revised the strategy to ask for only 1 stock recommendation at a time.

Also, GPT-4 was MUCH better at it than GPT-3.5 which makes sense and is also very exciting.


FYI. For those of you not on twitter, this guy buys a lot of ads with the above story

Not sure what his business model is


Seems that he’s trying to sell GPT investor as a service.

https://www.gptinvestor.co/#/portal/signup


I think you're confusing me with someone else. I do not buy any ads on Twitter. Maybe I should! What do you think?

I do charge a tiny subscription fee for full access to all my research to fund further research on this.


I think this is the main one I am thinking of:

"I gave GPT-4 a budget of $100 and told it to make as much money as possible"

article here:

https://mashable.com/article/gpt-4-hustlegpt-ai-blueprint-mo...

Apologies if this isn't you.


Nope that's not me. I think they abandoned their project.


Build a following, then down the track offer to sell picks on a private Discord or a course.


thanks for the suggestion. I am still new to the Discord game. TBH, if it continues to be this successful I'll start a hedge fund or something.

I appreciate your encouragement. I'll continue the effort!


Interesting experiment. Would like to know a bit more about the NVDA bet, and how removing that would have impacted the performance...

In any case, added to my long list of GPT achievements:

https://docs.google.com/spreadsheets/d/1O5KVQW1Hx5ZAkcg8AIRj...


From a quick glance at your website, the results only show performance and no volatility, which is only half the story.

Also, GPT seems to be mostly picking tech stocks; considering the fact that average performance of AAPL alone has been higher than SPY for the last 1/3/5/10 years, this result seems to be pretty much what one would expect …


Lol what more do you want? The experiment was to beat the S&P 500. You are dismissing the results because ... it chose stocks that would do that?


I think it's only fair to be critical since there is a product attached to this experiment.


You don't really need an LLM to beat SPY by choosing the largest four holdings of SPY. Plenty of passive funds already do something like that. It's a fun experiment, but don't misinterpret it to mean the LLM had a unique insight.

To frame it another way: When the best returns were from investing in the company that makes the GPUs that run the LLM that you're asking for stock picks from, did you really need the LLM?


What is your theory on why an LLM would be good at investing? What about it makes it better than a fund manager?


GREAT QUESTION!

My theory is that since LLMs (Large Language Models) are the ultimate generalists when it comes to knowledge, they "know" a decent amount about every single topic and concept known to the entire human race. I do not believe there is a single human alive today who possesses knowledge about so many different subjects. For example, a physicist may know 100 times more about physics and mathematics than an LLM, but the LLM probably knows a decent amount about 10,000 more disciplines (like plant biology) that the physicist has little to no understanding of.

I believe this multidisciplinary ability makes LLMs uniquely qualified to pick stocks based on the millions of variables that may impact stock prices.

For example, one of the most successful strategies I deployed was when I asked GPT-4 to come up with the attributes of a hypothetical "most investable stock on the NYSE and NASDAQ based on current market conditions." Once it generated the attributes, I used another instance of GPT-4 to find me a stock that matched these attributes. It came up with NVIDIA. The actual prompt I used was more detailed, but you get the idea.


So… an LLM doesn’t actually “know” anything. It’s not actually calculating any of the variables that affect stock prices.

Have you double-checked GPT-4's work? Was the stock that best matched the pre-defined attributes really Nvidia?


The 1-year return on $SPY is 26%, or 67% better than your GPT-4 return (https://www.google.com/finance/quote/SPY:NYSEARCA?window=1Y). I think you've been fooled by randomness over very short intervals.


You misunderstood my research; I think my wording wasn't clear enough. Let me clarify.

I am comparing SPY's return with the GPT return for the exact timeframe during which a particular position was recommended by the GPT Investor. For example, some positions were held for only 7 days, while others were held for 3 months, etc. I generated 19 different sets of recommendations with varied timeframes and at different points in the last year. This link, where I track everything, should clarify any confusion: https://www.gptinvestor.co/the-gpt-investor/


This is basically tea-leaf reading on your part - a dart board might give similar (or better!) results if tested in such a fashion, and it doesn't change the overall result that just putting your money in $SPY over the whole duration of the experiment was still a much better strategy.


For folks who don't know much about finance - this is as bullshit as it gets.

Leaving aside the complete lack of any valuable statistics and the far too short period of time to evaluate 'investments', it's very easy to generate these kinds of returns: just start 5 experiments like that, keep the 1 that 'works' best (thanks to pure random luck) and advertise. This is also sadly how some parts of finance work.
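The selection effect described above is easy to demonstrate with a small simulation. The sketch below assumes strategies with zero true edge whose returns are drawn from a normal distribution; the 10% standard deviation, the trial count, and the seed are arbitrary assumptions, not estimates from the experiment:

```python
import random
import statistics
from typing import Tuple


def simulate_selection_bias(
    n_trials: int = 10_000,
    n_experiments: int = 5,
    mean_return: float = 0.0,   # assumed true edge: none
    stdev_return: float = 10.0,  # assumed spread of outcomes, in percent
    seed: int = 42,
) -> Tuple[float, float]:
    """Compare an honestly reported experiment with the best of several.

    Returns (average return of a single experiment,
             average return of the best of n_experiments).
    """
    rng = random.Random(seed)
    single_results = []
    best_results = []
    for _ in range(n_trials):
        returns = [rng.gauss(mean_return, stdev_return) for _ in range(n_experiments)]
        single_results.append(returns[0])  # report one experiment regardless of outcome
        best_results.append(max(returns))  # report only the winner
    return statistics.mean(single_results), statistics.mean(best_results)


single_avg, best_avg = simulate_selection_bias()
print(f"Average single experiment: {single_avg:.2f}%")
print(f"Average best of five:      {best_avg:.2f}%")
```

For five independent draws, the expected maximum is roughly 1.16 standard deviations above the mean, so the "best of five" looks skilled even though no strategy has any edge.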


If ChatGPT or llama were trained to subtly invest in Microsoft and Meta stock respectively, expecting this use case, I'd actually respect it. :D


I think this would count as market manipulation if it can even be proved.


Now do this over a timeframe that makes sense, like at least 10 years, and write a blog post on how it lost to SPY.


I am starting to use longer timeframes, such as one year. I consider this a long-term experiment and will continue my research. Based on the improvement in returns I have seen from GPT-3.5 to GPT-4, my hypothesis is that it will continue to outperform $SPY as the models become smarter. For reference, GPT-4's return was approximately 156.86% better than GPT-3.5's return. GPT-3.5 didn't actually beat $SPY.


If I’m reading that correctly there’s some huge drawdowns!


Yes. GPT-3.5 made some terrible calls and didn't really beat $SPY. GPT-4 turned it around for team LLM.

The average return for the two LLMs used by the GPT Investor is as follows:

GPT-4: 15.54% (with a corresponding average $SPY return of 9.74%)

GPT-3.5: 6.05% (with a corresponding average $SPY return of 7.74%)


Sorry, but LLMs are not even good at math, so I don't understand how they can beat the index if they're a lot worse than the average STEM graduate.


Math isn't a requirement. There are plenty of examples of a dart-based portfolio (sometimes thrown by monkeys) beating or matching a bespoke, artisanal investment professional's stock picks.


What a great description of decisions driven by LLMs.


Even so, unless I misunderstand... LLMs are most likely to pick what they've seen the most, i.e. the most common recommendation seen during training, barring RLHF anyway.


I published all the methodologies and data; do take a look if you're curious.


Very cool!


thank you!


interesting


Thanks! I know a lot of people wondered if it was possible.


Now, please repeat the experiment using $25,000.


I am definitely going to continue the experiments. The ROI improvement from GPT-3.5 to GPT-4 is very encouraging. I also got better at prompting.



