Show HN: Turning a Live Video Stream into Real-Time Purchases With AI (cerebrium.ai)
3 points by AdamRomyn 47 days ago
Hi HN

We released a demo of a live shopping assistant: it monitors any live stream, picks up items in real time, and lets users search for and buy them. We wanted to showcase how you can extract data from live video and deliver value to the end user immediately.

We'd love feedback from the community on how we could improve the demo (beyond the ideas listed below) or extend it to be much better. More importantly, if this inspires you to build something better, please share it with us!

You can find more in-depth information here:

Live Demo - (https://live-stream-shopper.cerebrium.ai/) - this is very basic right now.

Tutorial Blog post - (https://www.cerebrium.ai/blog/building-a-real-time-shopping-...)

Code here - (https://github.com/CerebriumAI/examples/tree/master/27-ecomm...)

Basic Flow:

We use a variety of tools to achieve the end-to-end functionality.

- Daily: We use Daily to extract video frames from the live stream every 150 ms, which lets us run object detection and our search function on each sampled frame.
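Daily's SDK handles the actual frame extraction; as a rough, hypothetical sketch of just the sampling cadence, pulling one frame every 150 ms from a 30 fps stream means processing roughly every 4th or 5th frame:

```python
def frames_to_sample(fps: float, interval_ms: float, duration_s: float) -> list[int]:
    """Indices of the frames to process when sampling one frame
    every `interval_ms` from a stream running at `fps`."""
    step = max(1, round(fps * interval_ms / 1000))  # stream frames per sample
    total_frames = int(fps * duration_s)
    return list(range(0, total_frames, step))

# One second of 30 fps video sampled at 150 ms intervals.
indices = frames_to_sample(fps=30, interval_ms=150, duration_s=1)
```

At 30 fps this works out to roughly 7-8 detection passes per second, which keeps GPU load bounded regardless of the stream's native frame rate.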

- YOLOv8: We use a custom-trained YOLOv8 model to detect objects in the live stream that we should search for.

- Turso: We store our product catalog in Turso's embedding database and search it for matches against the detected objects.
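Before a detection is worth a catalog query, it needs filtering by class and confidence. A minimal sketch of that step (the class names and the 0.6 threshold are illustrative, not the values from our model):

```python
SEARCHABLE = {"book", "shoe", "watch", "cup"}  # classes the custom model was trained on

def filter_detections(detections, min_conf=0.6):
    """Keep only confident detections of classes worth searching.
    `detections` is a list of (class_name, confidence) pairs."""
    return [(cls, conf) for cls, conf in detections
            if cls in SEARCHABLE and conf >= min_conf]

# "person" isn't a searchable class; the cup is below the threshold.
hits = filter_detections([("book", 0.91), ("person", 0.88), ("cup", 0.42)])
```

Only the detections that survive this filter go on to the embedding search against the product catalog.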

- Supabase: We use Supabase to store detected images, and we write items found by our search function to a database table so our client application can update with the results in real time.

- SERP API (bonus): We use the SERP API to search Google for a product match using only the image of the detected item.

- Cerebrium: The entire solution is hosted on Cerebrium's serverless CPUs/GPUs, which handle all auto-scaling and orchestration.

Things to improve

- Instead of detecting each class only once and never again, I would store the embeddings of previously searched items; when a new detection passes the search criteria, compare it to those past embeddings. That way you could search multiple books, cups, etc.
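The embedding comparison described above could be sketched with plain cosine similarity; the 0.9 threshold and the list-based store are hypothetical choices for illustration:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def is_new_item(embedding, seen, threshold=0.9):
    """Treat a detection as new only if it isn't too similar to any
    embedding we've already searched; remember it if it is new."""
    if any(cosine(embedding, s) >= threshold for s in seen):
        return False
    seen.append(embedding)
    return True
```

In production you would likely run this comparison inside the vector database itself rather than in application code, but the logic is the same: near-duplicates are skipped, genuinely new items trigger a search.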

- Train a better model. We trained ours on 6 classes (book, shoe, watch, cup, etc.) and only about 40 images. Depending on your use case, I would train on a lot more data to get more accurate results.

- Facebook recently released their SAM 2 model, which can track items through a video. It would be very cool to incorporate it and show the user which items have already been searched as time progresses.

- If a user has a bad camera, the accuracy of the YOLO model drops considerably. I would recommend training on a diverse set of data captured with cameras of varying quality.
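One cheap way to get that diversity without collecting new footage is to synthetically degrade your existing training images. A toy sketch (a real pipeline would use OpenCV or albumentations; the downscale factor and noise level here are arbitrary):

```python
import random

def degrade(image, noise=0.1, seed=0):
    """Simulate a low-quality camera: halve resolution and add uniform
    pixel noise. `image` is a 2-D list of grayscale floats in [0, 1]."""
    rng = random.Random(seed)
    small = [row[::2] for row in image[::2]]  # naive 2x downscale
    return [[min(1.0, max(0.0, p + rng.uniform(-noise, noise)))
             for p in row] for row in small]
```

Training on a mix of originals and degraded copies pushes the model toward features that survive compression artifacts and low resolution.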
