
Doesn't most/all React data come from XHR? Can't you just figure out how the XHR works and simply parse that?

I did this with an investment website, where I was able to retrieve all the data using simple Python. It _should_ be more robust than parsing React components/HTML.
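
A rough sketch of what that kind of script can look like, using only the standard library. This is not the actual script: the function name and the injectable `opener` parameter are assumptions made for illustration, and the real endpoint URL would have to be read from the browser's network tab.

```python
import json
import urllib.request


def fetch_json(url, opener=urllib.request.urlopen):
    """Request the URL the page's XHR hits and decode the JSON body.

    `opener` is a hypothetical hook so the function can be exercised
    without touching the network; by default it is urllib's urlopen.
    """
    request = urllib.request.Request(url)
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

The appeal is that the JSON structure tends to change less often than the rendered markup, though that depends on the site.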




> Doesn't most/all React data come from XHR? Can't you just figure out how the XHR works and simply parse that?

Content-heavy websites using React often generate static versions of pages at build time (e.g. using https://nextjs.org/docs/advanced-features/automatic-static-o...). In those cases, there might not be a public API endpoint to fetch the data you want.
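
In that situation the data is often still in the HTML itself. Pages built with the Next.js pages router typically embed their props as JSON in a script tag with the id `__NEXT_DATA__`, so one option is to pull that out. A minimal sketch, assuming the tag is present on the page (the class and function names are made up for illustration):

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__">."""

    def __init__(self):
        super().__init__()
        self._in_target = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._in_target = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_target = False

    def handle_data(self, data):
        if self._in_target:
            self._chunks.append(data)

    def result(self):
        raw = "".join(self._chunks)
        return json.loads(raw) if raw else None


def extract_next_data(html):
    """Return the embedded JSON as a dict, or None if the tag is absent."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    return parser.result()
```

The layout of the resulting dict is site-specific, so you would still need to inspect it by hand to find the fields you want.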

For applications, though, it's definitely easier to just make an HTTP request if you can. However, you're more likely to run into issues like APIs blocking datacenter IPs, rate limiting, etc. than you would if it appeared you were just loading the website like a human.
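
For the rate limiting part, a common mitigation is to back off and retry when the server answers with HTTP 429. A minimal sketch, assuming the API signals rate limiting with a 429 status (not all do); `fetch_with_backoff`, its parameters, and the injectable `sleep` are hypothetical names for illustration:

```python
import time
import urllib.error


def fetch_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on HTTP 429, wait and retry with doubling delays.

    `fetch` is any zero-argument callable that performs the request.
    Errors other than 429, or a 429 on the final attempt, are re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

This does nothing about datacenter IP blocking, which is a separate problem.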


I'd add Postman to that workflow, especially if there are headers you need to know about that aren't obvious from the XHR URL. From the network tab of your browser's debugger, copy the network request as cURL, paste the cURL into Postman's import, and then click the "code" button to translate it to Python (or whatever else) code.
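
What comes out of that translation is essentially a URL plus a dictionary of headers. A hand-written equivalent with the standard library might look like the sketch below; it is not Postman's actual output, and the function name and the example header values are placeholders, since the real ones come from the copied request.

```python
import urllib.request


def build_request(url, headers):
    """Build a request that carries the headers copied from the browser."""
    return urllib.request.Request(url, headers=headers, method="GET")


# Placeholder values for illustration only.
request = build_request(
    "https://example.com/api/data",
    {"Accept": "application/json", "Referer": "https://example.com/"},
)
```

Passing `request` to `urllib.request.urlopen` would then send it with those headers attached.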



