More

techn00 · 2025-04-12T11:52:01 1744458721

I have a small go binary that uses caddy and dns-sd on mac to have any kind of domain names on my local network (uses mdns) with https. Really nice for accessing websites from my phone.

https://github.com/DeluxeOwl/localhttps

techn00 · 2025-02-27T08:09:21 1740643761

I love single binary apps, easy to deploy.

hamdouni · 2025-02-27T09:50:27 1740649827

It still needs postgres and redis

techn00 · 2025-02-20T15:47:44 1740066464

My ideal k8s dev env (I wonder if any of the tools do this):

- local on my machine.

- ingress with https + subdomains integrated with mDNS (so I can access the services easily from my phone when developing mobile apps). mDNS also makes sure that other devs can set it up locally for themselves.

- easily swap what I'm working on, if I have 3 services A, B, C, while I'm working on A locally, I want B and C to run in the cluster and to be able to interact with them, same if I'm working on B, A and C should run in the cluster.

jcollins · 2025-02-20T16:14:20 1740068060

Tailscale Operator for Kubernetes sounds like it'd fit your second bullet point. It's has a really good experience. I've only used for my person homelab but I've been more than impressed by it.

techn00 · 2025-02-20T16:50:38 1740070238

It could however I specified mDNS so other developers won't be required to use tailscale (or to run a dns server)

jcollins · 2025-02-20T17:06:23 1740071183

Fair. Making https work with mDNS seamlessly sounds like work (i.e. local CA would be needed I think). It would make things nice though.

craftkiller · 2025-02-20T23:25:30 1740093930

Instead of mDNS, they could update a DNS record for a subdomain (techno00.dev.thecompany.com, preferably under a different domain than your real one) to their local IP address and then do the DNS-01 challenge on LetsEncrypt to get a valid TLS cert for the subdomain. Then the only problem is some routers block DNS responses with RFC-1918 IP addresses, but everyone is using DoT/DoH by now, right? ... right?

nucleardog · 2025-02-20T20:16:00 1740082560

I have tools I've developed that do some/most of this, but they're internal/proprietary so I can't share them directly. What I _can_ share is how it works. Maybe somebody with more time/energy/will to live than me can take a crack at the problem.

Every developer runs Rancher Desktop as a local k8s cluster.

There's a controller + nginx container in the cluster.

For any appropriately annotated ingress the controller's mutatingwebhook patches the ingress to be backed by its own proxy service. It then reaches in and reconfigures the cluster's CoreDNS to resolve the domain to its proxy service as well.

Then as pods are started/stopped, it tracks whether the service your ingress was supposed to point at has any running pods behind it. If it does, it configures nginx to forward the request to the local service. If it doesn't, it configures nginx to proxy it to the upstream URL.

That all comes together in three main ways:

1. We start chromium with, among other things, a flag to set host rules that rewrite all connections to 127.0.0.1. So going to `oursite.com` loads our site through your cluster. API requests the page makes to `service.oursite.com` get routed through the local cluster.

2. Any requests your containers make to other services can request `oursite.com` and because of the CoreDNS stuff they'll hit the local proxy and get routed appropriately.

3. ... And for anything else we just have a real `localdev.cloud` domain with a wildcard subdomain that resolves to 127.0.0.1 and include that host on all the ingresses as well. So Postman can hit a service at `service.localdev.cloud`.

This puts us in a good place to "easily swap". There's a pile of bash scripts calling themselves a justfile that manages most of this. Run `system start` to bring up the controller and proxy as well as deploy all of the ingresses and services (so it knows what it's rewriting/proxying). Then you just do `project whatever up` and `project whatever down` to create/destroy the deployment and other resources. Mounting the project code into the container is `project whatever mount`--this is a separate step so in a situation where, e.g., a FE guy wants to test a specific BE build he can just throw the container tag in a .env and start it up and keep working on what he was working on. (And the QA can just start up any build of anything without any extra fuss.)

As for SSL, we're mostly solving for "on the same machine". The controller generates a root certificate on first start (so every developer has their own and it couldn't be used to intercept anyone else's traffic), then uses that to issue + sign certificates for everything else. You could add that to any other devices if you wanted. What we do is just slip chromium an extra flag to tell it to treat our certificates as valid.

So I can run a just command to open up chromium with the novel-worth of extra flags, go to `https://oursite.com`, and everything works. If there's a specific BE service I need to poke at, I just `project my-service up` and go back to chromium and keep doing stuff. If I wanted to make some FE changes, `project my-fe up && project my-fe mount` and start editing code.

There's a lot more to all of this but this comment's already way too long (but feel free to ask and I can talk your ear off). End of the day, though, we went from it taking like 1-2 days to get people to a point where they could start editing code to (and I tested this when a computer died one day) 45 minutes--most of which was waiting for rancher to download and then starting it and waiting for it to download more stuff. Went from it taking a day to get some rarely-touched legacy service limping along enough that you could debug something to it being consistently a single command and like 30 seconds. Went from spending a bunch of time reconfiguring all the services for each combination you might try and run to... just not anymore. Bugs/issues/misalignments getting caught much earlier because turns out making it easy to actually run the software together means people will do it more.

I have most of what you're asking for and you're definitely on the right track--it's a way nicer way to live.

techn00 · 2025-01-05T16:33:53 1736094833

the implementations of getFileContext() and shouldStartNewGroup().

techn00 · 2024-12-21T05:50:39 1734760239

> Transactional custom mutators that allow complex logic and server-side specific behavior are coming in beta.

It seems right now it only supports CRUD-style logic? Suppose I have a Go backend service (or any other language, really). Creating and exposing some pg tables for the read only path is something reasonable if I want to integrate with zero.

But what about the write path in such an app? Where you have some business logic that doesn't involve simple CRUD, will (or does) zero support that with custom mutators?

edit: I didn't see this notion page which address this https://replicache.notion.site/Introducing-Zero-8ce1b1f184aa...

aboodman · 2024-12-21T10:57:40 1734778660

Yep - Notion doc answers your question.

techn00 · 2024-12-03T13:34:45 1733232885

> This is currently $19/mo plus search API cost (2.5c per search). Please reach out to support@kagi.com for invite.

Isn't 2.5c per search kind of expensive? serp api is 1 cent for the $150/month = 15k searches plan

cube2222 · 2024-12-03T13:51:50 1733233910

Isn't Serp using public Google search for free and effectively selling you "we will bypass the captchas that will inevitably come up"?

Kind of doesn't make sense to compare to them.

Kiro · 2024-12-03T14:20:09 1733235609

How is Serp not sued to oblivion?

crowcroft · 2024-12-03T14:42:09 1733236929

I suspect that Google finds value in the role they play in the market.

There is demand for search APIs, and companies like Kagi can build a business around that, grow and then compete more generally with Google over time. Serp makes that difficult.

For competitive reasons Google might not want to sell a search API directly (they might indirectly fuel a lot of competition against their main ad supported product). So letting Serp offer this service in a bit of a gray area makes it hard for competitors to form a beach head in search, while giving Google legal flexibility to shut down any service that tries to compete with them in any way through Serp's data.

kingstoned · 2024-12-03T14:49:18 1733237358

Sued for what? For scraping a website? That is Google's main business.

dooglius · 2024-12-03T14:57:43 1733237863

https://serpapi.com/blog/scraping-public-pages-legality/

nielsole · 2024-12-03T13:47:22 1733233642

Serp doesn't pay for any of the search infrastructure. So it's an unfair comparison

bambax · 2024-12-03T14:40:03 1733236803

It's completely fair to compare prices when you're on the buying side.

It's also strongly recommended when you're on the selling side, as you should be prepared to explain what added value justifies that your product is so much more expensive.

echoangle · 2024-12-03T15:20:18 1733239218

Why would I care about that as a customer though? If I compare options, I look at price and performance. If one is making it harder for themselves without any added value for me, why would I pay more money for that?

ziddoap · 2024-12-03T14:33:33 1733236413

From a customer perspective, it's completely fair to compare prices.

marginalia_nu · 2024-12-03T15:11:52 1733238712

Given the economics involved (likely a fixed cost per query), I think it makes business sense to try to get few customers to pay a lot to get a lot, much more so than having many customers that pay a little and get results on par Google's free offering.

High end "boutique" search offering a refined search experience (at high computational cost) is a niche Google search can't compete with, since they're offering their search for free, and they'd take massive losses if they drastically increased the amount of compute per query.

LorenDB · 2024-12-03T13:56:06 1733234166

Brave is even cheaper, at 0.3 cents/request with a monthly limit of 20M searches (for the base plan).

yawnxyz · 2024-12-03T13:57:23 1733234243

never used Brave or Kagi, but comparing their search results, Kagi seems way more on-point in the first 10 results

xnorswap · 2024-12-03T13:47:36 1733233656

It does seem expensive, but if it wasn't, then there'd be more temptation to simply white-label Kagi via it's own API while undercutting Kagi's own plans.

dotancohen · 2024-12-03T13:51:17 1733233877

That's actually not so bad. That company reselling Kagi search is also replacing e.g. Kagi support, so there is less pressure on Kagi staff.

And honestly, anything that gets more people into the idea of using not-Google for searching is good for Kagi - even if it is to an ostensibly competitor.

llm_trw · 2024-12-03T13:43:43 1733233423

Yes, prohibitively.

techn00 · 2024-12-03T08:30:09 1733214609

side question: what do you use to make the diagrams?

techn00 · 2024-12-01T17:04:08 1733072648

It's amazing what you can do with S3. It's one of the best things that AWS has to offer.

I wonder, is there a formal definition for a set of primitives that allow you to build an ACID database? Assume an API of some kind (in this case, S3) that you can interact with - and provides, I don't know, locks, % durability, etc.

What would make you say, 'Having those primitives, I CAN build an ACID database on top of it'?

justincormack · 2024-12-01T17:27:36 1733074056

You basically just need a log [1].

Sirupsen goes through the basics in [2] (video)

I did a talk about building on object storage with lots of examples [3] (video) [4] (slides with links)

[1] https://engineering.linkedin.com/distributed-systems/log-wha... [2] https://www.youtube.com/watch?v=RFmajOeUKnE [3] https://www.youtube.com/watch?v=ei0wwTy6_G4 [4] https://static.sched.com/hosted_files/kccncna2024/c8/Object%...

ComputerGuru · 2024-12-01T17:45:46 1733075146

Having a consistent log is sufficient with an atomic compare operation is sufficient for a distributed database but its performance will be extremely questionable. CAS is always the slow step, in this case, pathologically slow. The magic is to do whatever you can to avoid it until absolutely necessary. The availability of consistent, ordered, synchronized timestamps across all nodes is something most distributed databases require as a prerequisite. How you handle violations of that (and to what degree of accuracy you can rely on it) make a considerable difference.

Depending on how you structure the underlying pages, you’ll get to decide how availability at the log level translates to availability in your user/app-facing interface and whether you will end up sacrificing consistency, availability, or partition tolerance.

Basically, S3 with its recent consistency guarantees and all-new CAS support is sufffiicent in-and-of itself. But for anything other than the most basic (least amount of data, lowest frequency writes, etc) you’ll need a considerable amount of magic to make it useable.

The most straightforward approach would be to use the existing whole of another database but swap out the backend and then tweak the frontend accordingly. SQLite lets you use custom vfs providers (already used to provide fairly efficient SQLite over http without serving the entirety of the database, but previously not for writes) and with Postgres you can use foreign data wrappers. But in both cases you’ll basically have to take out a lock to support writes, either on a page or a row (either risk lots of contention or introduce a ton of locking and network latency overhead).

ethegwo · 2024-12-01T17:14:00 1733073240

We are building tonbo: https://github.com/tonbo-io/tonbo , An embedded KV database allows to use S3 as storage backend, and we are trying to implement SQLite virtual table on it: https://github.com/tonbo-io/sqlite-tonbo a real pay-as-you-go DB.

grapesodaaaaa · 2024-12-01T17:18:04 1733073484

That’s really cool! I’m personally really interested in serverless DB offerings. I’m not sure if yours scales well, but I always seem to hit the limits of a single RDBMS instance at some point as a product matures.

There are plenty of ways to scale out traditional RDBMS, but serverless offerings make it so easy to scale out.

ethegwo · 2024-12-02T12:44:02 1733143442

Thanks, easy-to-scale is the first thing we consider, also using S3 as a shared storage service makes architecture easy to achieve this.

arianvanp · 2024-12-01T17:23:17 1733073797

This blog post is great in explaining this: https://blog.mbrt.dev/posts/transactional-object-storage/

n00j · 2024-12-01T17:12:19 1733073139

I would guess something like Apache Iceberg would be something close to this? https://iceberg.apache.org/

Is a table format which can be use via trino, spark, flink, java apis, pything API?

rad_gruchalski · 2024-12-01T17:05:56 1733072756

Wake me up when s3 supports wrie at offset. Until then it’s all gimmicky. Writing small objects and retrieving them later is very inefficient and costly for large data volumes. One can do roll ups, sure, but with roll ups there’s no longer a way to search through the single rolled up file. One needs some compute to download the complete file and process it outside of s3.

ncruces · 2024-12-01T17:28:48 1733074128

S3 can at least do a multi-part upload, where any given part is a copy of a range of an existent object. Then you can finish the upload overwriting the previous object.

GCS, unfortunately, does not support copying a range. OTOH, it has long supported object append through composition.

The challenge with both offerings is that writes to a single object, and writes clustered around a prefix, are seriously rate limited, and consistency properties mostly apply to single objects.

rad_gruchalski · 2024-12-01T17:37:34 1733074654

Yeah, but you cannot multipart single chunk into a larger complete file. You need all chunks one way or another. Multipart upload starts and ends from all chunks. GCS and Azure support this too. S3 does a maximum of 1k objects,

GCS 32 objects, and Azure blob storage, afair 5k objects. Both can do an operation similar to what you described for S3 with various alternatives of read at offset + length and rolling those up.

In all cases, you end up always rolling up into a new key that isn’t available for read until roll up is done. It’s kinda useless for heavy write scenario.

Compare that to your normal fs operation. Write at offset to an existing file with size smaller that offset will just truncate the file to the offset, and continue writing.

ncruces · 2024-12-01T18:57:01 1733079421

You can?

You create a multi-part with 3 chunks: the 1st part is a copy range of the prefix, the 2nd part the bit you want to change, and the 3rd a copy range of the suffix?

And yes, all of this is useless for heavy (and esp. concurrent) writes.

rad_gruchalski · 2024-12-01T19:10:08 1733080208

We both said the same thing. You kinda can but cannot. Yes, you can replace some part of an existing object but you cannot resize it, not can you do anything parallel with that. So you kina can but cannot. And this trick will work in gcs and azure, here you have to move the new object to an old key yourself after the roll up. But why not while you’re already at it.

ncruces · 2024-12-01T20:58:40 1733086720

You can do it “in place” as the target can be the same as the source. And you can definitely resize it, both truncate it and extend it. The only restriction, really, is that all parts except for the last one need to be at least 5MiB.

GCS compose can also have target be one of the source objects, so you can append (and/or prepend) “in place.”

For GCS compose the suffix/prefix need to be separate visible objects (though you can put a lifecycle on them). For multipart, the parts are not really objects ever.

The performance isn't great because because updating the “index” is slow and rate limited, not because the APIs aren't there.

MadsRC · 2024-12-01T17:11:37 1733073097

It’s actually surprisingly efficient if you batch writes at the expense of some added latency. The WarSyream team found that batching into chunks of either 4MB of data or 250ms was optimal.

Downside is the 250ms latency. But then again, a fair amount of workloads can deal with 250ms of latency.

rad_gruchalski · 2024-12-01T17:38:57 1733074737

Kafka does batch reading anyway. Have a look at the reader, it’s a loop with read and read timeout. Usually a 100ms per single loop iteration.

ethegwo · 2024-12-01T17:11:27 1733073087

Random reads and sequential writes are enough to build a log-structured database, but S3 does not really support the latter.

rad_gruchalski · 2024-12-01T17:42:57 1733074977

One can do sequential writes by simply writing to a chunk objects at a key with the offset in the name. For example, sizes in MBs:

tmp/uploads/object-0, tmp/uploads/object-1024, tmp/uploads/object-2048

would be a rolled up object of size 2048MB + whatever is in the object-2048 file.

techn00 · 2024-11-09T09:01:47 1731142907

It doesn't, you cant find any reference to the captcha solving or anything like that in the repo. There's one line for the stealth plugin that is commented out:

// chromium.use(stealthPlugin());

imo this is the hardest part about scraping, evading bot detection and captchas.

edit: and keeping the scraping logic & rules up to date

krsma · 2024-11-09T14:37:49 1731163069

Hi, creator here.

So our open-source version does not provide cloudflare bypass or captcha support. It is impossible to have a robust system for the same completely FOSS. But we do have it available in our cloud version (which we launch soon, currently in testing). Our open-source version allows you to BYOP (Bring Your Own Proxy) to handle all the bypassing. Our OSS version is being used by users with small to medium scraping needs :)

All of this is mentioned in our README.md.

techn00 · 2024-10-18T07:16:12 1729235772

I expected cheese