krab's comments | Hacker News

I don't think so. This is true for people writing CPython itself. But if you implement a custom `__hash__` method, you should be fine.

Because -1 is not equal to -2. The two keys will fall into the same bucket of the hash map, but hash collisions are expected and must be accounted for.
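
A quick sketch of why this is harmless in practice (CPython-specific behavior; the class and values below are just for illustration):

    # In CPython, hash(-1) is -2 because -1 is reserved as an error sentinel
    # at the C level, so -1 and -2 collide.
    assert hash(-1) == hash(-2) == -2

    d = {-1: "minus one", -2: "minus two"}
    # Both keys land in the same bucket, but the dict resolves the collision
    # with an equality check, so lookups stay correct.
    assert d[-1] == "minus one"
    assert d[-2] == "minus two"

    class Key:
        def __init__(self, value):
            self.value = value

        def __hash__(self):
            # Even a deliberately terrible hash (everything collides)
            # only hurts performance, not correctness.
            return 1

        def __eq__(self, other):
            return isinstance(other, Key) and self.value == other.value

    lookup = {Key(1): "a", Key(2): "b"}
    assert lookup[Key(2)] == "b"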

Oh, this format was fun. You could see history unfold when parsing it. The messages I parsed were ISO-8583 with ~EBCDIC~ no, BCD. But one field contained XML. And the XML had JSON embedded in it. The inner format matched the fashion trend of the year when someone had to extend the message with extra data. :-)
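
A hypothetical sketch of what unwrapping one of those onion-layered fields looked like (the field layout, tag names, and values are made up for illustration, not from a real spec):

    import json
    import xml.etree.ElementTree as ET

    def bcd_to_int(data: bytes) -> int:
        # BCD packs two decimal digits per byte, e.g. 0x00 0x63 -> 63.
        return int(data.hex())

    # One era added an XML envelope; a later era stuffed JSON inside it.
    xml_body = '<ext><data>{"loyalty_id": "12345", "tier": "gold"}</data></ext>'
    raw_field = bytes.fromhex(f"{len(xml_body):04d}") + xml_body.encode()  # BCD length prefix + payload

    length = bcd_to_int(raw_field[:2])
    inner_xml = raw_field[2:2 + length].decode()
    payload = json.loads(ET.fromstring(inner_xml).find("data").text)
    print(payload["tier"])  # -> gold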


> The messages I parsed were ISO-8583 with ~EBCDIC~ no, BCD.

The "great" thing about most ISO 8583 implementations I've worked with (all mutually incompatible at the basic syntactic level!) is that they usually freely mix EBCDIC, ASCII, BCD, UTF-8, and hexadecimal encoding across fields.


Fascinating, I don't think I've ever seen an XML field! Do you remember which network that was for?


We were the issuer. So these were probably the payment processor's extensions. But we were issuing MasterCards.


In practice, it works the same.

Technically, there are two kinds of copyright - I'll translate loosely from Czech law, as I'm not sure about the exact English equivalents. There are "person" rights and "property" rights. You can never rescind your "person" rights, but that only means no one can claim you're not the original author. That's about it. You can transfer "property" rights via licensing as you wish. The license can be exclusive, and you can grant the licensee the right to further transfer or sublicense the work.

Also, every work is copyrighted by default. You're not allowed to use something you just found on the internet unless you're granted a license.


Just a shot in the dark without knowing your real needs - take this with a grain of salt.

Store some parsed representation that makes it easier for you to work with (probably normalized), and keep an archive of the raw data somewhere. That may be another column, a table, or even an S3 bucket. Don't worry too much about schema changes, but make sure you never lose the original data. There are some pitfalls to schema migrations, but the schema should be the representation that works for you _at the moment_, otherwise it'll slow you down.
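
A minimal sketch of the idea, assuming the incoming data is JSON and SQLite is the store (the table and field names are just for illustration):

    import json
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE events (
            id INTEGER PRIMARY KEY,
            customer_id TEXT,      -- parsed, normalized fields you actually query on
            amount_cents INTEGER,
            raw_payload TEXT       -- the untouched original; survives any schema change
        )
    """)

    def ingest(raw: str) -> None:
        parsed = json.loads(raw)
        conn.execute(
            "INSERT INTO events (customer_id, amount_cents, raw_payload) VALUES (?, ?, ?)",
            (parsed["customer"], round(parsed["amount"] * 100), raw),
        )

    ingest('{"customer": "c-42", "amount": 19.99}')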


Regarding Python+Django vs. Go, I have the opposite experience. Concurrency is handled much better in Go, but that's about it. The ORM, the web framework, pretty much any library is generally worse. On top of that, I miss my exceptions.


Yes. That's a risk assessment every company must make: what's the expected cost of downtime vs. the development slowdown and operating costs of a fully redundant infrastructure?

I worked for a payments company (think credit cards). We designed the system to maintain very high availability in the payment flow - multi-region, multi-AZ in AWS. But all other flows, such as user registration, customer care, or even bill settlement, had to stop during the one incident when our main datacenter lost power after a switchover test. The outage lasted three hours, and it happened exactly once in five years.

In that specific case, investing in higher availability by architecting in more redundancy would not have been worth it. We had more downtime caused by bad code and poorly thought-out deployments. But that risk equation will be different for everyone.


That's a pretty much orthogonal issue if you compare dedicated servers to, say, AWS EC2. There aren't that many tasks you have to perform on bare metal that you don't also have to do on EC2.

PaaS such as Heroku or Vercel, that's another story.


... and we were doing serverless and buildless before it was cool! It was called "shared PHP hosting". :-D


It was pointed out recently on HN that Lambda is basically CGI-as-a-service. (CGI being a more language-agnostic spec that operates similarly to PHP)


The real value-add from Lambda is integration with the rest of the AWS ecosystem. It doesn't even scale up without breaking the bank, but it does scale down to zero. And it can process events from EventBridge, DDB, S3, SQS, SNS, and every other AWS service under the sun. It's really good at that.
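
For example, a minimal sketch of the kind of glue Lambda handles well - reacting to an S3 "object created" notification (the handler name and the per-object work are illustrative):

    import urllib.parse

    def handler(event, context):
        # Standard shape of an S3 event notification delivered to Lambda.
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            # Per-object work goes here: parse it, index it, forward it to SQS, etc.
            print(f"new object: s3://{bucket}/{key}")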

If you want to run an API, slap a Go executable on an EC2 or ECS instance.


Lemme just login to cPanel and check that


A bit less terrible way in my opinion:

Find a dedicated server provider and rent the hardware. These companies rent part of a datacenter (or sometimes build their own). Bonus points if they offer KVM - as in a remote console, not the Linux hypervisor. Also ask if they do hardware monitoring and proactively replace failed parts. All of this is still way cheaper than cloud, usually with unmetered networking.

Way less hassle. They'll even take your existing stuff and put it into the same rack with the rented hardware.

The difference from cloud, apart from the price, is mainly that they have a sales rep instead of an API. And getting a server may take from a few hours to a few days. But in the end you get the same SSH login details you would get from a cloud provider.

Or, if you really want to just colocate your own boxes, these providers offer a "remote hands" service, so you can have geo-redundancy or just choose a better deal instead of one that's physically close to your place.


This. I used to colo lots of stuff, but now mostly use Hetzner. But there are many in this space, and some of them even offer an API. And some of them (like Hetzner) also offer at least basic cloud services, so you can mix and match (which allows for even steeper cost cuts - instead of loading your dedicated hardware to 60% or whatever you're comfortable with to have headroom, you can load it higher and scale into their cloud offering to handle spikes).

The boundary where colo and dedicated server offerings intersect in price tends to come down to land and power costs - Hetzner finally became the cheaper option for me as London land values skyrocketed relative to their locations in Germany, and colo prices with them. (We could have looked at colo somewhere remote, but the savings would've been too low to be worth it.)


One significant hurdle that many companies who have only ever known cloud hosting will face here: how do you find a reliable, trustworthy datacenter? One that actually monitors the hardware and has a real human available if your network access gets screwed up or you need a critical component swapped at 2 am on a Saturday.

I used to have a short list of trustworthy companies like this I'd recommend to clients ~20 years ago when doing consulting. I think 3/4 of them have been gobbled up by private equity chop shops or are just gone.

Nowadays no one gets fired for going with AWS, or with AWS resold at a 100% markup by a 'private enterprise cloud' provider.


> how do you find a reliable, trustworthy datacenter?

Drive to a few and shake some hands. In my experience, the difference between colos usually comes down to "actual SOC 2/ISO compliance" on one side and "there are no locked doors between the parking lot and my rack" on the other, with not much in between that isn't for some specialty (radio), and these things can only really be seen for yourself.


That’s unfortunate. I consider SOC 2 compliance a negative indicator of security (I’ve been on the vendor side of it, and have seen it have significant negative impacts on security and reliability in multiple organizations).

Ideally, there’d be locked doors, and the data center wouldn’t be subsidizing performative checkboxing.


I'm curious, can you tell us what the negative impacts you've seen are? Are there any audits that you can say are a positive indicator for security?


Spending all their security time/budget on box checking rather than actual security.

I'd rather see open ended red team pentest reports.


This is my complaint with "cyber insurance". Companies spending money on insurance premiums and checklists for the insurance company rather than spending money on security.


Yep. My experience as well. Once a place starts doing useless box-checking stuff like SOC 2, it’s time to find a new job or switch vendors.

Positive indicators would be talking to employees and getting an idea of organizational clue level. There are no shortcuts here I’ve ever found beyond doing this sort of old fashioned “know your vendor” style work.


I remember visiting a colo in the valley in the early aughts that had both of these: all the biometric man-trap security drama at the front door, and in the back, garage doors & loading docks wide open to the Santa Clara breeze...


I had a similar tour in the mid-'90s at a major provider in the Washington DC area (Ashburn) that had this exact problem: mantraps everywhere, a great facility for its generation. Then they took me out the back door to show me the basketball court in the back parking lot, and no alarm went off... Considering root DNS servers, Yahoo, Hotmail, and other majors of the day were running there, it was... sketchy at best.


I think if you want to host in Europe then Hetzner is the clear choice. They won't monitor your hardware for you, though; you need to let them know if something breaks, and they'll replace it very quickly.


You're right that you need to find a company you can trust.

And for a lot of startups it really makes sense to use AWS. But if you do something resource- or bandwidth-intensive (and I'm not even talking about Llama now), the costs add up quickly. In our case, switching to AWS would increase our costs by the equivalent of 4-8 devs' salaries. After AWS discounts. That's a hard sell in a 15-person team, even though half of our infra costs already are with AWS (S3).


I often recommend an "embrace, extend, extinguish" approach to AWS: starting there for simplicity is fine, then "wrap" anything bandwidth-intensive with caches elsewhere (every 1TB of egress from AWS will pay for a fleet of Hetzner instances with 5TB included, or one or more dedicated servers).

Gradually shift workloads, leaving anything requiring super-high durability for last (optionally keeping S3, or its competitors, as a backup storage option), as getting durability right is one of the more difficult things to gain confidence in and one of the most dangerous to get wrong.

Wrapping S3 with a write-through cache setup can often be the biggest cost win if your egress costs are high. Sometimes caching the entire dataset is worth it, sometimes just a small portion.
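
A rough sketch of the idea, assuming boto3 and a local disk as the cache tier (bucket names and paths are illustrative, and a production setup would add eviction and locking):

    import hashlib
    from pathlib import Path

    import boto3

    CACHE_DIR = Path("/var/cache/s3")
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    s3 = boto3.client("s3")

    def _cache_path(bucket: str, key: str) -> Path:
        return CACHE_DIR / hashlib.sha256(f"{bucket}/{key}".encode()).hexdigest()

    def get(bucket: str, key: str) -> bytes:
        path = _cache_path(bucket, key)
        if path.exists():
            return path.read_bytes()    # hit: no S3 egress charge
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        path.write_bytes(body)          # miss: pay egress once, then serve locally
        return body

    def put(bucket: str, key: str, body: bytes) -> None:
        # Write-through: S3 stays the source of truth, the cache is updated alongside it.
        s3.put_object(Bucket=bucket, Key=key, Body=body)
        _cache_path(bucket, key).write_bytes(body)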


> Wrapping S3 with a write-through cache setup can often be the biggest cost win if your egress costs are high. Sometimes caching the entire dataset is worth it, sometimes just a small portion.

Is there an off the shelf implementation of this?


I usually just use a suitable Nginx config unless there's a compelling reason not to. It means you "waste" the first read - you just let POST/PUT etc. hit S3 and only cache reads - but it's easier to get right. It's rare that this matters (if your reads are rare enough relative to writes that avoiding the cost of the first read matters, odds are the savings from this are going to be questionable anyway - the big benefit here comes when reads dominate massively)


Minio used to support this, but removed the feature a while back. It was called “gateway mode”. Sadly, I know that doesn’t help much now…

https://blog.min.io/deprecation-of-the-minio-gateway/amp/


Every time I contemplate increasing my use of S3 or similar cloud services, I get annoyed at the extremely sorry state of anything on-prem that replicates to S3.

Heck, even confirming that one has uploaded one's files correctly is difficult. You can extract MD5 (!) signatures from most object storage systems, but only if you don't use whatever that particular system calls a multipart upload. You can often get CRC32 (gee thanks). With AWS, but not most competing systems, you can do a single-part upload and opt into "object integrity" and get a real hash in the inventory. You cannot ask for a hash after the fact.
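
For what it's worth, here's a sketch of reproducing the S3-style multipart ETag locally for comparison against what the bucket reports - assuming you know the part size the upload used (the 8 MiB default below is just an assumption):

    import hashlib

    def multipart_etag(path: str, part_size: int = 8 * 1024 * 1024) -> str:
        # S3's multipart ETag is conventionally the MD5 of the concatenated
        # per-part MD5s, with a "-<part count>" suffix.
        part_md5s = []
        with open(path, "rb") as f:
            while chunk := f.read(part_size):
                part_md5s.append(hashlib.md5(chunk).digest())
        if not part_md5s:
            return hashlib.md5(b"").hexdigest()      # empty object
        if len(part_md5s) == 1:
            return part_md5s[0].hex()                # single-part upload: plain MD5
        combined = hashlib.md5(b"".join(part_md5s))
        return f"{combined.hexdigest()}-{len(part_md5s)}"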

I understand why computing a conventional cryptographically secure hash is challenging in a multipart upload (but it's actually not all that bad). But would it kill the providers to have first-class support for something like BLAKE3? BLAKE3 is a tree hash: one can separately hash multiple parts (with a priori known offsets, which is fine for most APIs, though maybe not Google's as-is), assemble them into a whole in logarithmic time and memory, and end up with a hash that actually matches what b3sum would have output on the whole file. And one could even store some metadata and thereby allow downloading part of a large file and proving that one got the right data. (And AWS could even charge for that!)

But no, verifying the contents of one's cloud object storage bucket actually sucks, and it's very hard to be resistant to errors that occur at upload time.


Well, S3 is hard to beat for our use case. We make heavy use of their various storage tiers; we store a fairly large amount of data, but only a minor part of it ever goes out.

The compute- and network-heavy stuff we do is still outside of AWS.


That's pretty much the one situation where they're competitive, so that sounds very reasonable. Some of their competitors (Cloudflare, Backblaze) might compete, but the biggest cost issue with S3 by far is the egress, so if not much goes out, it might still be best for you to stay there.

Sounds like (unlike most people who use AWS) you've done your homework. It's great to see. I've used AWS a lot, and will again, because it's often convenient, but so often I see people using it uncritically without modeling their costs, even as those costs skyrocket with scale.


S3 is a decent product with zero competition. You should keep S3; it’s a fair price.


S3 has plenty of competition. It can be a fair price if you rarely read and need its extreme durability, but that leaves plenty of uses it's totally wrong for.


As a technical person working at a datacenter who still handles technical support requests in some capacity, it's interesting to read this stuff from the customer's perspective. Good to know what customers consider important. Maybe the sales staff knows all of that too well, but for me it just brings a smile and some pride in the job I do :)


The absolute trust killer is when your customer finds out about some issue on their own - even though you were supposed to monitor that. :-)

More important than raw uptime.


After years of dealing with colocation, I would never deploy a server without KVM in a data center. The cost of truck rolls is just too damned high! Even with KVM, I have had to make an emergency trip to a data center that is a 3.5-hour drive away, due to a hardware erratum on circa-2010 Intel Xeon boards where the network port shared between the KVM and the host would lock up under certain rare circumstances. The second time that happened, I pulled the system from production.

If you do happen to have a system without on-board KVM, check out NanoKVM, which is a cheap (~$40) option for an add-on KVM. It's rather more affordable than PiKVM. https://github.com/sipeed/NanoKVM

