Hi! I created a universal data API that uses headless browsers and GPT to extract any data from the web in JSON format. I started this project because I needed an API for data enrichment: pulling company data (headcount, investment rounds, etc.). Once I built the first version, I quickly realized there are many use cases for such a tool: data enrichment, web scraping, data validation, etc.
This is pretty cool: it can parse data out of a random pricing table somewhere on the page.
It does seem to just make up data if it isn't found on the page (probably expected with LLMs). I wonder if you can reduce that with some prompting, or maybe verify that the data is actually present? Something like the sketch below, maybe.
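A naive post-hoc check, assuming you already have the rendered page text and the model's extracted dict on hand:

```python
def drop_unverified(extracted: dict, page_text: str) -> dict:
    """Null out any string value that doesn't literally appear in the page,
    since it was likely made up by the model."""
    normalized = page_text.lower()
    checked = {}
    for key, value in extracted.items():
        if isinstance(value, str) and value.lower() not in normalized:
            checked[key] = None  # not found verbatim, treat as hallucinated
        else:
            checked[key] = value
    return checked
```

A substring match is crude (it misses reformatted numbers or dates), but it at least catches values that appear from nowhere.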
Your schema docs page is broken: https://singleapi.co/docs/schema
Hallucination is a pretty common issue that I still have to address; ideally, it should just return empty fields for any data it couldn't find on the page.
I also published a simplified version on GitHub, so you can try to self-host it. I'm really excited to see all the possible use cases for such a tool besides web scraping or data enrichment.
After retrieving all the text data from the webpage (using a headless browser), GPT is used to filter out the noise and extract the actual information described in the request schema. Say you request {"product_name": "string"}. GPT will pull that product name from the webpage and return correctly formatted JSON with the fields you requested.
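The service's internals aren't public, so here's only a minimal sketch of that flow, assuming Playwright for the headless browser and the OpenAI client for extraction; all names are illustrative:

```python
import json
from playwright.sync_api import sync_playwright
from openai import OpenAI

def extract(url: str, schema: dict) -> dict:
    # 1. Render the page in a headless browser and grab its visible text.
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        text = page.inner_text("body")
        browser.close()

    # 2. Ask the model to fill the schema from the page text only,
    #    returning null for anything it can't find.
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": (
                "Extract the fields described by this JSON schema from the "
                "page text. Use null for any field not present in the text. "
                f"Schema: {json.dumps(schema)}"
            )},
            {"role": "user", "content": text},
        ],
    )
    return json.loads(response.choices[0].message.content)

print(extract("https://example.com/products/42", {"product_name": "string"}))
```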
It works much like GraphQL: you define the schema you want, and the backend returns exactly the data you requested. In this case, though, the data comes from the webpage you provided.
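For illustration, a call might look something like this; the endpoint URL and payload shape here are assumptions, not the documented API:

```python
import requests

# Hypothetical endpoint and payload shape, for illustration only.
response = requests.post(
    "https://api.singleapi.co/extract",  # assumed URL
    json={
        "url": "https://example.com/products/42",
        "schema": {"product_name": "string", "price": "string"},
    },
)
print(response.json())
# Expected shape: {"product_name": "...", "price": "..."}
```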