pornel's comments

They link to actual issues in their bug tracker, so if it was a joke, it was an impressive long con.

The joke is in the tone, not in the issues.

Rust can optionally be compiled in panic=abort mode, but by default panics are recoverable. From an implementation perspective, Rust panics are almost identical to C++ exceptions.

For servers that must not suddenly die, it's wise to use panic=unwind and catch_unwind at task/request boundaries (https://doc.rust-lang.org/stable/std/panic/fn.catch_unwind.h...)

In very early pre-1.0 prototypes Rust was meant to have isolated tasks that are killed on panic. As Rust became more low-level, it turned into terminating a whole OS thread on panic, and since Rust 1.9.0, it's basically just a try/catch with usage guidelines.


> Could such bugs be avoided in C using the right tools and strategies

"right tools and strategies" is very open-ended, almost tautological: if you didn't catch the bug, then obviously you didn't use the right tools and the right strategies! In reality, the tools and strategies have flaws and limitations that turn such problems into "yes, but actually no".

Static analysis of C code has fundamental limits, so there are bugs it can't find, and there are non-trivial bugs that it can't find without also producing false positives. False positives make developers needlessly tweak code that was correct, and lead to fatigue that makes them downplay and ignore the reports. The more reliable tools require catching problems at run time, but problems like double-free often happen only in rare code paths that are hard to test for, and fuzzers can't reach all code either.


Static analysis of arbitrary legacy code is limited. But I do not find it difficult to structure my code in a way that lets me reasonably exclude most errors. The discussion of false positives in C is interesting. In some sense, 99% of what the Rust compiler would complain about would be considered false positives in C. So if you want safety in C, you cannot approach it from this angle. But this relates to my point: if it is acceptable to structure code in specific ways to make the Rust compiler happy, but you do not accept that you may have to write code in specific ways to avoid false positives in C, then you are already not comparing apples to apples.

Even if you rewrite C code to the same "shape" as Rust, it won't become equally easy to statically analyze. The C language doesn't give the same guarantees, so it doesn't benefit from the same restrictions. For example, pointers don't guarantee their data is always initialized, `const` doesn't make the data behind it truly immutable, and there's no Send/Sync to describe thread safety. You have nothing to express that a piece of memory has a single owner. C's type system also loses information whenever you need to use void* instead of generics, unions with a DIY tag, and pointers with mixed provenance/ownership.

Rust checks that you adhere to the analyzable structure, and keeps you from accidentally breaking it at every step. In C you don't get any compiler help for that. I'm not aware of any tools that guide such structure for C beyond local tweaks. It'd be theoretically possible with enough non-standard C language extensions (add borrowing, exclusive ownership with move semantics), but that would get pretty close to rewriting the code in Rust, except using a bolted-on syntax, a dumb compiler that doesn't understand it, and none of the benefits of the rest of Rust's language, tooling, and ecosystem.


You do not get the same automatic guarantees when not using external tools. But you get almost the same results when using tools readily available and having a good strategy for organizing. I do not have problems with double frees, void type unsafety, or tagged unions in my code. I occasionally have memory leaks, which tooling then tends to find. I certainly have exploitable integer overflows, but those are easily and comprehensively mitigated by UBsan.

> If it is acceptable to structure the code in specific ways to make the Rust compiler happy

I think this is a misleading way of presenting the issue. In Rust, there are classes of bugs that no valid Rust code can have (except perhaps when using the "unsafe" keyword), while there is valid C code that has those bugs. And here's the difference: in Rust the compiler prevents some mistakes for you, while in C it is you who has to exercise discipline to make sure, every single time, that you structure the code in a way that makes those bugs unlikely. From this, it follows that Rust code will have fewer (memory-related) bugs.

It is not an apples to apples comparison because Rust is designed to be a memory safe fruit, while C isn't.


The point was that you cannot reject warnings as part of a solution for C because "they annoy programmers with false positives" while accepting Rust's borrow checker as a solution.

I think the general point of my comment still stands, as you and everyone else working on the project would need to be disciplined and only release binaries that were compiled with no warnings. And even using -Werror doesn't fully solve the problem in C, as not having warnings/errors is still not enough to get memory safety.

I don't really disagree with you, but I was also making a slightly different point. If you need absolute memory safety, you can use Rust (without ever using unsafe).

But you get 99% of the way there with C, a bit of discipline, and tooling. This also means maintaining a super clean code base with many warnings activated, even though they cause false positives. My point is that these false positives are not a valid argument for why this strategy does not work or is inferior.

Your claim is that it is inferior because you only get 99% of safety and not 100%. But one can question whether you actually get 100% in Rust in practice, in a project of relevant size and complexity, due to FFI and unsafe. One can also question whether 100% memory safety is all that important when there are also many other issues to look out for, but that is a different argument.


I don't think it's a common concern in Rust. It used to be a problem in Internet Explorer. It's a footgun in Swift, but Rust's exclusive ownership and immutability make cycles very difficult to create by accident.

If you wrap a Future in Arc, you won't be able to use it. Polling requires exclusive access, which Arc disables. Most combinators and spawn() require exclusive ownership of the bare Future type. This is verified at compile time.

Making a cycle with `Arc` is impossible unless two other criteria are met:

1. You have to have a recursive type. `Arc<Data>` can't be recursive unless `Data` already contains `Arc<Data>` inside it, or some abstract type that could contain `Arc<Data>` in it. Rust doesn't use dynamic types by default, and most data types can easily be shown to never allow such a cycle.

It's difficult to make a cycle with a closure too, because you need to have an instance of the closure before you can create an Arc, but your closure can't capture the Arc before it's created. It's a catch-22 that needs extra tricks to work around, which is not something that you can just do by accident.

2. Even if a type can be recursive, that's still not enough, because the default immutability of Arc allows only trees. To make a cycle, the recursive part of the type also needs to be in a wrapper type allowing interior mutability, so you can modify it later to form the cycle (or use the `Arc::new_cyclic` helper, which is an obvious red flag, and even then you still need to upgrade the weak reference to a strong one after construction).

It's common to have an Arc-wrapped Mutex. It's possible to have recursive types. But having both together is less common, and even then you still need to make the cycle yourself, and dodge all the ownership and borrow-checking issues required to poll a future in such a type.


Weights in neural networks don't always need to be precise. Not all weights are equally useful to the network. There seems to be a lot of redundancy that can be replaced with approximations.

This technique seems a bit similar to lossy image compression that replaces exact pixels with a combination of pre-defined patterns (DCT in JPEG), but here the patterns come not from a cosine function but from a pseudo-random one.

It may also beat simple quantization because the added noise acts as dithering, breaking up the banding created by combinations of quantized numbers.


In image compression, "lossless" is a term of art. What you're doing is a practically useful quality degradation, but it's not lossless.

I just tested it on a folder full of JPEGs, and I didn't even have to compare to see the artifacts from this kind of "lossless".

No, it's more like sharding of parameters. There's no understandable distinction between the experts.

I understand they're only optimizing for load distribution, but have people been trying to disentangle what the various experts learn?

Mixture of experts involves trained router components that route to specific experts depending on the input, but without any terms enforcing load distribution, this tends to collapse during training, with most information routed to just one or two experts.

Keep in mind that the "experts" are selected per layer, so it's not even a single expert selection you can correlate with a token, but an interplay of abstract features across many experts at many layers.

This is where Rust makes a massive difference. I can change iter() to par_iter() and have it work flawlessly on the first try.


The warp model in GPUs is great at hiding the DRAM latency. The GPU isn't idly waiting for DRAM.

All threads that need a memory access are in a hardware queue, and data coming from the DRAM immediately dequeues a thread and runs the work until the next memory access. So you compute at the full throughput of your RAM. Thread scheduling done in software can't have such granularity and low overhead, and hyperthreading has too few threads to hide the latency (2 vs 768).


Facebook and Google+ tried to do this with their realname policies. It doesn't work as well as one would expect:

• Toxic assholes are not deterred by their name being attached to what they're saying, because they think they're saying righteous things and/or fighting bad people who don't deserve any respect.

• People self-censor, because they don't want to risk upsetting some random violent stranger on the internet who can track them down.

• People who don't use their legal name publicly have trouble participating. This impacts transgender people, but also people using stage names/pen names, and stalking victims.


I think OP's point isn't to prevent toxic assholes from saying whatever righteous things and fighting whatever bad fight, but to limit bot/inorganic/foreign contributions from made up people - basically to make it "one person one voice".

I kind of like the idea of "one person one voice", but I have two problems with it, which I think will block me from accepting it.

One is that the cost of it seems much too high, even if you change it to allow the use of chosen aliases (I don't think it matters what a "one person one voice" system calls an authenticated member). I don't really trust everyone I'd have to give my ID details to, and this is just one more bit of stress for so little gain.

The second is that the benefits will never be realised. In an election, one person one vote doesn't work when half the population doesn't vote; you need almost everyone to show up, otherwise it's the strongest opinions, not the mainstream opinions, that dominate. And I'm quite sure we'll see the exact same thing here, but in spades, and faster. If you don't like the opinion, you just don't show up. Once the centre of the social media drifts sufficiently far from the centre of the community, there will be the sort of bullying and self-censorship you foresee, and it will spiral out of control.


There's no need for real names; what's needed is that you can't create multiple accounts. This can be done without linking identities, by using two unrelated parties: party A is the platform and party B is the authenticator. When creating an account on A, you are sent to B to authenticate your identity and get a token to finish your account creation on A. As long as A and B are separate, A never knows the identity of the user, and B doesn't know how the user represents themselves on A.

