
The hard thing is not scraping, but getting around the increasingly sophisticated blockers.

You'll need to constantly rotate (highly rated) residential proxies and make sure not to exhibit data-scraping patterns. Some supermarkets don't show the network requests in the network tab, so you can't just grab the API response.
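To illustrate the rotation part, here's a minimal Python sketch; the proxy endpoints, user-agent pool, and URL are placeholders I made up, not real services:

```python
# Minimal sketch of rotating proxies and user agents per request.
# The proxy endpoints below are hypothetical, not real providers.
import random

PROXIES = [
    "http://user:pass@proxy1.example.com:8000",  # hypothetical residential proxies
    "http://user:pass@proxy2.example.com:8000",
]

UA_POOL = [  # vary the browser fingerprint too, not just the IP
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
]

def request_kwargs() -> dict:
    """Pick a random proxy and user agent for a single request."""
    proxy = random.choice(PROXIES)
    return {
        "proxies": {"http": proxy, "https": proxy},
        "headers": {"User-Agent": random.choice(UA_POOL)},
        "timeout": 10,
    }

# usage with the requests library (placeholder URL):
# resp = requests.get("https://supermarket.example.com/api/prices", **request_kwargs())
```

In practice you'd also randomise request timing and ordering, since uniform intervals are themselves a scraping pattern.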

Even then, MITM'ing the mobile app (to see its network requests and data) will also get blocked without decent cover-ups.

I tried, but realised it isn't worth it due to the costs and the constant dev work required. In fact, some supermarket price comparison services just have (cheap-labour) people scrape it by hand.




I wonder if we could get some legislation in place to require that they publish pricing data via an API so we don't have to tangle with the blockers at all.


Perhaps in Europe. Anywhere else, forget about it.


I'd prefer that governments enact legislation that prevents discriminating against IP addresses, perhaps under net neutrality laws.

For anyone with some clout/money who would like to stop corporations like Akamai and Cloudflare from unilaterally blocking IP addresses: the way that works is you file a lawsuit against the corporation and seek an injunction to halt the practice (like IP blacklisting) while the legal proceedings run. IANAL, so please forgive me if my terminology isn't quite right here:

https://pro.bloomberglaw.com/insights/litigation/how-to-file...

https://www.law.cornell.edu/wex/injunctive_relief

Injunctions have been used with great success for a century or more to stop corporations from polluting or destroying ecosystems. The idea is that since anyone can file for an injunction, corporations have an incentive to follow the law or risk having their work halted for months or years as the case proceeds.

I'd argue that unilaterally blocking IP addresses at wide scale pollutes the ecosystem of the internet, and so shouldn't be allowed to continue.

Of course, corporations have thought of all this, so they've gone to great lengths to lobby governments and use regulatory capture to install politicians and judges who rule in their favor, paying back the campaign contributions received from those same corporations:

https://www.crowell.com/en/insights/client-alerts/supreme-co...

https://www.mcneeslaw.com/nlrb-injunction/

So now the pressures that corporations have applied on the legal system to protect their own interests at the cost of employees, taxpayers and the environment have started to affect other industries like ours in tech.

You'll tend to hear from the mainstream media and corporate PR departments that disruptive ideas like the ones I've discussed are bad for business, since they're protecting their own interests. That's why I feel the heart of hacker culture is in disrupting the status quo.


Thankfully I'm not there yet.

Since this is just a side project, if it starts demanding too much of my time too often I'll just stop it and open both the code and the data.

BTW, how could the network request not appear in the network tab?

For me, the hardest part is correlating and comparing products across supermarkets.


If they don't populate the page via Ajax or client-side network requests, i.e. it's rendered server side, then no requests for supermarket data will appear in the network tab.

See Next.js server-side rendering; I believe their docs mention this as a security benefit.

In terms of comparison, most names tend to be the same, so a similarity search, restricted to the same category, matches well enough.


And couldn't you use OCR and simply take an image of the product list? Not ideal, but difficult or impossible to detect, depending on your method.


You'll get blocked before even seeing the page most times.


Crowdsource it with a browser extension.



