Hacker News | mateuszbuda's comments

You can use a USB hub (example with 20 ports: https://www.sipolar.com/product/a-805p-20-ports-usb-2-0-hub/) and attach multiple USB dongles to it. This blog post describes a setup for web scraping: https://scrapingfish.com/blog/byo-mobile-proxy-for-web-scrap...
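As a rough illustration of how a scraper might spread requests across several dongles, here is a minimal round-robin sketch. It assumes each dongle is exposed as its own local HTTP proxy; the addresses and ports below are hypothetical placeholders, and the linked blog post may set things up differently.

```python
from itertools import cycle

# Hypothetical local proxy endpoints, one per USB dongle (assumed, not from the post).
PROXIES = [
    "http://127.0.0.1:8001",
    "http://127.0.0.1:8002",
    "http://127.0.0.1:8003",
]


def proxy_rotation(proxies):
    """Return an endless round-robin iterator over the proxy URLs."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    return cycle(proxies)


def next_proxy_config(rotation):
    """Build a proxies mapping (scheme -> proxy URL) for the next request."""
    url = next(rotation)
    return {"http": url, "https": url}


rotation = proxy_rotation(PROXIES)
first = next_proxy_config(rotation)
second = next_proxy_config(rotation)
```

The returned dictionary has the shape that the `requests` library accepts for its `proxies` argument, so each call to `next_proxy_config` would route the next request through a different dongle.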

Can I use this to run my own mobile network? Is there something like a blank SIM card which I could use for it? I don't need global coverage, but is it possible to create my own BTS on a PC (with some antenna connected to it), and then have my own SIM card which I can insert into a regular phone/device so that it connects to my BTS and, through it, to the Internet?


Yes and yes, although in most parts of the world it won't be legal when done via antennas. You can buy blank SIM cards from vendors like Sysmocom, which are preprogrammed but "writable" SIM cards. The important part is knowing the private key that is used to authenticate end user devices.

Then you'll just need a decent SDR and it actually works fairly well for small test setups.


Nitpick: in most places you need a license to do this with antennas (in some places temporary licenses aren't that hard to get for temporary/noncommercial use)


Do you have an idea what my external IP address would be? On my phone connected to a mobile network, I get assigned a mobile IP address, which is my external IP address. It's not attached to the SIM card because it changes when I reconnect. Is it handled by the BTS software? Do I get assigned an IP address, with the BTS communicating on my behalf using that address, which comes from the mobile network operator's pool?


The LTE core assigns an IP address to your SIM card/UE. In the case of srsRAN, there's a simple CSV file mapping the IMSI to an IP address, but there are no limits on how complicated it can get.
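To make the IMSI-to-IP idea concrete, here is a minimal sketch of a static lookup table loaded from CSV. The two-column layout, the IMSIs, and the addresses are made up for illustration and are not srsRAN's actual file format.

```python
import csv
import io
import ipaddress

# Hypothetical subscriber table: this two-column layout is an assumption for
# illustration only, not the real srsRAN user database format.
SUBSCRIBERS_CSV = """imsi,ip
001010000000001,172.16.0.2
001010000000002,172.16.0.3
"""


def load_ip_map(text):
    """Parse the CSV text into a dict mapping IMSI strings to IPv4 addresses."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        # IPv4Address raises ValueError on a malformed address.
        table[row["imsi"]] = ipaddress.IPv4Address(row["ip"])
    return table


def assign_ip(table, imsi):
    """Return the static address for a UE, or None if the IMSI is unknown."""
    return table.get(imsi)


ip_map = load_ip_map(SUBSCRIBERS_CSV)
```

In a home test setup these would typically be private addresses, so the address that websites see would usually be that of the machine's own Internet uplink rather than anything tied to the SIM.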


Yes, it's simple enough to get started if you have access to the required hardware. You might be able to operate in the n41 band (2.4GHz, like wifi) even without a license.

Phones are nitpicky about network configuration (ciphers, emergency calling and so on). I would recommend starting with a USB modem. Also set your network PLMN to 00101 (the testing one), as it usually gets preferential treatment in UEs.
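For reference, a PLMN is the mobile country code (MCC, 3 digits) followed by the mobile network code (MNC, 2 or 3 digits), so 00101 is MCC 001 with MNC 01. A small sketch of checking that a programmed SIM's IMSI matches the configured PLMN (the helper names and the sample IMSI are hypothetical):

```python
def split_plmn(plmn):
    """Split a PLMN string into (MCC, MNC): MCC is 3 digits, MNC is 2 or 3."""
    if not plmn.isdigit() or len(plmn) not in (5, 6):
        raise ValueError(f"invalid PLMN: {plmn!r}")
    return plmn[:3], plmn[3:]


def is_test_plmn(plmn):
    """True when the MCC is 001, the code reserved for test networks."""
    mcc, _ = split_plmn(plmn)
    return mcc == "001"


def imsi_matches_plmn(imsi, plmn):
    """An IMSI (at most 15 digits) begins with its home network's MCC and MNC."""
    return imsi.isdigit() and len(imsi) <= 15 and imsi.startswith(plmn)


mcc, mnc = split_plmn("00101")
```

A check like this can catch a mismatch between the SIM and the network configuration before debugging why a UE refuses to attach.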


N41 isn't 2.4GHz. It's licensed 2.5GHz. You're thinking of N46, which is unlicensed 5GHz.

LTE didn't allow for it to be used as primary carrier, but NR-SA I think does.


This site[1] details setting up an LTE network, including programming a blank SIM.[2]

[1] https://www.quantulum.co.uk/blog/private-lte-with-limesdr-an...

[2] https://www.quantulum.co.uk/blog/private-lte-with-limesdr-an...


The hard part will be legal. You will need permission to operate a radio, and that often means certification of the system along with the right to use the frequencies phones use.


What exactly doesn't work well? Did you consider playwright?


Content marketing - blog posts with useful content posted around the Internet. Most traffic from HN and Reddit.


Do you do much in terms of SEO on the post pages?


I don’t really know. I don’t write posts to optimize for SEO (include FAQ at the end or something like that) and hope it’s just good content people will share.

There are also SEO pages which do not have any useful content. I think I should have more of them because my competitors have only SEO pages but I don’t have time for it as I have to focus on the product and customer support. Probably a good mix between useful content blog posts (maybe with SEO filling) and strictly SEO pages is best to bring traffic.


I think that LLM costs, even GPT-4o, are probably lower compared to proxy costs usually required for web scraping at scale. The cost of residential/mobile proxies is a few $ per GB. If I were to process cleaned data obtained using 1GB of residential/mobile proxy transfer, I wouldn't pay more for LLM.


Here are some insights into how proxies are sourced: https://scrapingfish.com/how-ips-for-web-scraping-are-source...

There's also an option to build your own mobile proxy pool which gives you very good reputation IPs for web scraping and doesn't harm other people: https://scrapingfish.com/blog/byo-mobile-proxy-for-web-scrap...


Web scraping through a proxy to hide your own address is a shady business, by definition. So the "doesn't harm other people" bit is an oxymoron, IMO.


I agree that web scraping is a shady business in many cases but there is definitely a difference between setting up a few mobile proxies for yourself and using devices and networks which belong to other people without them even knowing this until they cannot access some websites because there was a bot detected in their network.


Some of the affected people do know that they are being used for that; some companies pay them for the number of requests they pass through. But they ignore the potentially harmful consequences for themselves, as well as the consequences for the scraped sites (from loss of performance to not being able to serve content) and for the intended users of those sites.


I would also include deceptive credit systems used by SaaS products with usage-based-like subscriptions. It’s a bait-and-switch variant. First, you think one call to the API is one credit, but it always turns out that you need calls which consume 20 or 50 credits instead, so you have to move to a more expensive plan and buy millions of credits every month. Second, unused credits do not roll over to the next month, so your effective cost per call is orders of magnitude larger than what you expected.
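A quick way to see the gap is to divide the monthly price by the calls you actually make instead of by the advertised credit count. The plan price, credit count, and call volume below are hypothetical; only the 50-credits-per-call multiplier comes from the comment above.

```python
def effective_cost_per_call(plan_price, plan_credits, credits_per_call, calls_made):
    """Monthly plan price divided by the number of calls actually made.

    Unused credits are assumed to expire at the end of the month.
    """
    if calls_made <= 0:
        raise ValueError("calls_made must be positive")
    if calls_made * credits_per_call > plan_credits:
        raise ValueError("plan does not include enough credits")
    return plan_price / calls_made


# Hypothetical plan: $100 per month for 1,000,000 credits.
plan_price = 100.0
plan_credits = 1_000_000

# What the pricing page suggests: one credit per call, every credit used.
nominal = plan_price / plan_credits

# What happens when each call costs 50 credits and only 10,000 calls are made.
effective = effective_cost_per_call(plan_price, plan_credits, 50, 10_000)
ratio = effective / nominal
```

With these made-up numbers the nominal price is $0.0001 per call while the effective price is $0.01 per call, a factor of about 100.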


I tried not buying food with added sugar but it’s surprisingly difficult. Here is an interesting analysis I did some time ago which shows that for half of the food items, sugar is the main ingredient: https://scrapingfish.com/blog/scraping-walmart


At https://scrapingfish.com/ we have both options: usage-based https://scrapingfish.com/buy and subscriptions (monthly unlimited requests plan) https://scrapingfish.com/unlimited. Despite subscriptions being the cheaper option per request, usage-based is far more popular. Fewer than 10% of our users have subscribed to the unlimited monthly plan. I guess usage-based plans give users more control over how much they spend, or maybe they simply don't want to subscribe to another service.


Residential IP bandwidth is a commodity priced per GB. The rest is available as open source for free. This isn't exactly a SaaS.


I’m still working on a web scraping API (https://scrapingfish.com/). For some people it’s an evil bot, but for others it’s an enabler of public data access. I think it’s useful.

