If I give you a biased coin can you simulate a truly random coin flip with it? The answer turns out to be yes. Flip the biased coin twice: HT = heads, TH = tails, and HH/TT = flip twice again.
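
In code, the trick is something like this minimal sketch (modeling the biased coin with Python's random.random(); the function names are just for illustration):

    import random

    def biased_flip(p_heads):
        # One toss of a coin that comes up heads with probability p_heads.
        return 'H' if random.random() < p_heads else 'T'

    def fair_flip(p_heads):
        # von Neumann's trick: toss twice; HT means heads, TH means tails,
        # HH or TT means discard the pair and toss twice again.
        while True:
            first, second = biased_flip(p_heads), biased_flip(p_heads)
            if first != second:
                return first  # HT and TH each occur with probability p*(1-p)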

The general study of such things is called “randomness extractors”. The new Gödel prize goes to this paper which shows how to extract a nearly perfect random bit out of two sources with low min-entropy.


Yes, but - you need to replace "twice" there with "an unbounded number of times". If you apply this in an environment where the biased coin is coming from an external source, your system becomes susceptible to DoS attacks.


While I obviously think randomness extractors over adversarial sources are very interesting, I think talking about them specifically in this example complicates the point I'm trying to make, which is that it's incredible it can be done at all.


Note that adversarial is kind of a red herring, not sure why they mentioned that. The number of flips is unbounded regardless. Which is why it's not really incredible that it can be done: it can't, not as the problem was originally stated. What can be done is solving a different (but useful) problem than the one originally posed.

I realize this sounds like a minor detail to someone who finds this cool (and so do I), but I don't think it is. It's kind of frustrating to be told that your intuition is wrong by someone smarter than you, when your intuition is actually correct and the experts are moving the goalposts. IMO, it makes people lose respect for experts/authority.


So, the problem in its original framing is: can we simulate a fair coin flip with an unfair coin? As stated, I do actually think the von Neumann answer ("this is actually technically possible") is fair, in that if I wanted a solution in O(1), I think I should have to say so ahead of time.

I suppose we'll have to disagree about whether this is incredible. The response shows that (1) this can be done at all, and (2) that the probability of not yet having an answer decays exponentially, not just asymptotically but at every finite n. Incredible! You don't see finite-decay bounds very often! If you don't think that's incredible I invite you to ask a room full of people, even with the qualifications you deem appropriate, e.g., "solution does not need to be constant-time", or whatever.


The difference is deterministic termination:

There is no N such that your algorithm is guaranteed to terminate before N flips — even for a single bit. The complexity class in the worst case (100% T) is infinite; it’s only the average cases that have something reasonable.

To borrow your phrasing:

If you only have a stochastic algorithm, I think you should have to say so.


Kind of, kind of not. I think it's fair to say that a lot of theory people would argue that it is effectively guaranteed to terminate, for any reasonable definition of "guaranteed."

As I mention elsewhere, for a coin where P(H)=0.75, with 600 flips, the probability of not having received an answer is less than 1 divided by the number of atoms in the universe. While this is not 0, neither is the probability of the atoms of the coin lining up just right that it falls through the table, producing neither H nor T. Theoretically and practically we do generally consider these kinds of things immaterial to the problem at hand, perhaps papering over them with terms-of-art like "with high/extremely high/arbitrarily high probability". They all mean roughly the same thing: this algorithm is "essentially guaranteed" to terminate after n flips.


You can relax the constraints to bound the cost.

If I asked someone gambling whether it matters if this coin is biased 0.0001%, they will probably say no. And just like that, the algorithm is now guaranteed to terminate.

It's a bit unfair to talk about the case (100% T), because in that case you don't have a source of randomness anymore; we've dropped the assumption that you have a source of randomness, and as expected you can't make one from nothing.


You have a source of randomness — you just got exceptionally unlucky. A truly random source can produce the same value indefinitely.

My point is that you have stochastic termination, but not guaranteed termination. Those are different things.


A random source can produce the same value indefinitely, but it cannot produce the same value forever. It is impossible in the sense that the probability that it will happen is zero. That is, even the unbiased version of the algorithm will terminate with 100% probability.


> My point is that you have stochastic termination, but not guaranteed termination. Those are different things.

You missed their suggestion about that.

If you rephrase the algorithm as one that almost almost almost almost almost completely eliminates bias, you can guarantee termination by giving up after a certain number of repetitions.
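
Something like this sketch, say (the cap and the arbitrary fallback are my own illustration, not part of the original trick):

    import random

    def nearly_fair_flip(p_heads, max_rounds):
        # von Neumann rejection loop, but give up after max_rounds pairs.
        # The residual bias is at most (p^2 + (1-p)^2) ** max_rounds, so the
        # cap has to grow roughly in proportion to 1/min(p, 1-p) for that to
        # stay negligible.
        for _ in range(max_rounds):
            first = random.random() < p_heads
            second = random.random() < p_heads
            if first != second:
                return 'H' if first else 'T'
        return 'H'  # arbitrary fallback, reached with vanishing probability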


Sure — but now we’re back in the case the person above me was calling out and I was emphasizing by pointing out the stochastic nature: you moved the goal posts.

From their comment:

> Which is why it's not really incredible that it can be done: it can't, not as the problem was originally stated. What can be done is solving a different (but useful) problem than the one originally posed.

You can have an arbitrarily small bias, at the cost of increased runtime — but you can’t have a zero bias algorithm that always terminates. You have to move the goalposts (allowing some bias or exceedingly rare cases to not terminate).

I’m not sure why people have such a hard time admitting that.


I don't think the termination thing is moving the goalposts.

If you have infinite time, then you can wait for it to take more than a few hundred iterations.

If you don't have infinite time, it won't take more than a few hundred iterations.

Also "constant time" wasn't even part of the original promise! If a goalpost was moved, it was by the person that decided on that requirement after the fact.


> If you have infinite time, then you can wait for it to take more than a few hundred iterations.

To be clear, you're making this argument while also arguing "this wouldn't happen in the real world". [1] You can't have it both ways.

> Also "constant time" wasn't even part of the original promise!

We're not even asking for "constant time". We're literally only asking for "will finish in any bounded amount of time". Even an exponential time bound would've been better than unbounded!

A simulator that can't even promise to get to step #2 of the simulation in any bounded amount of time very much needs a giant proactive asterisk on its labeling, because that's not what people understand to be a simulator. Again, if you sold someone that in such a manner that obscured this fact, they would absolutely be justified in wanting their money back.

[1] https://news.ycombinator.com/item?id=44228568


> You can't have it both ways.

I addressed that in my next sentence! The whole point of making it two sentences was to split up the real world and not real world cases. Come on.

> We're not even asking for "constant time". We're literally only asking for "will finish in any bounded amount of time". Even an exponential time bound would've been better than this!

N is 1. Every bound is a constant bound.

> if you sold someone that in such a manner that obscured this fact, they would absolutely be justified in wanting their money back.

They would not be entitled to a penny back because it's impossible for them to hit the failure case.

Even if they could hit it, nobody is getting a refund for "it crashes once per googolplex runs".


> I addressed that in my next sentence! The whole point of making it two sentences was to split up the real world and not real world cases. Come on.

Your next sentence was just obviously flat-out wrong though. Not just because who says that's the case (maybe I don't have that much time?) but because it's trivial to find P(H) that makes it false for any duration of time. "A few hundred iterations" literally doesn't guarantee anything unless you make unstated assumptions about the biases your device works for.

I don't get why we're going in circles here. It seems we've hashed everything out.

> N is 1. Every bound is a constant bound.

Kind of a meaningless statement when you don't even say what your N is. I can imagine lots of N where that's not the case. But whatever you want to call it, my point entirely stands.

> They would not be entitled to a penny back because it's impossible for them to hit the failure case.

No, it is very possible. All they need to be given is a coin whose bias they don't know beforehand, whose bias is unfortunate. Or a coin whose bias they do know to be much worse than whatever you imagined a few hundred tosses would be enough for.


> because it's trivial to find P(H) that makes it false

As I already said in the comments you read, it's proportional to the bias. A few hundred multiplied by the ratio between heads and tails or vice versa will never be reached.

> when you don't even say what your N is

Number of outputs?

Look, if you want to invoke concepts like exponential time then you tell me what N is. Exponential of what?

> All they need to be given is a coin whose bias they don't know beforehand

See first answer.

If someone expects the bias to not matter, even a one in a million coin, they're the one with the problem, not the algorithm.

If they accept the bias affects speed, things are fine.


> the worst case (100% T)

You're no longer randomly flipping a coin to get heads or tails at that point. There's a good argument that this is not within the original scenario.


Yes you are — you just have exceptionally bad luck to generate that sequence.


Oh, I thought you were saying the coin bias was 100%. I misread how you were using complexity class.

Still, when a worst case is physically impossible I don't think it needs a mandatory disclaimer.


All infinite sequences are physically impossible, but they’re the basis of asymptotics; arbitrary failure and non-production of entropy are still physically possible.


Note that the solution dataflow objects to already operates in constant time on an average-case basis. There isn't room to make a complexity improvement.


What do you mean by "exponentially, not asymptotically, but for finite n"? Exponential is by definition asymptotic and continues infinitely, no?

And to be clear, I'm not disagreeing (or agreeing) with the result being inherently incredible. I'm just saying it's not an incredible example of simulating a fair coin, because it just... isn't doing that. As an analogy: communicating with the Voyager spacecraft might be incredible, but it's not an incredible example of infinitely fast communication... because it just isn't. Telling me to go ask a room full of people whether they find either of these incredible is missing the point.


> What do you mean by "exponentially, not asymptotically, but for finite n"? Exponential is by definition asymptotic and continues infinitely, no?

In statistics, we generally separate asymptotic bounds—which usually make guarantees only asymptotically—from finite-decay bounds like Chernoff, which decay exponentially not only in the limit, but also at each particular finite n. The second is much rarer and much more powerful.

This is an important distinction here: we are not talking about an asymptotic limit of coin flips. Each and every round of coin flips reduces the probability of not having an answer exponentially.

> I'm just saying it's not an incredible example of simulating a fair coin, because it just... isn't doing that. As an analogy: communicating with the Voyager spacecraft might be incredible, but it's not an incredible example of infinitely fast communication... because it just isn't.

Ok, no problem, it's up to you to decide if you want to use your own definition here.

But, FYI, in computability theory, it is definitely fair game and very common to "simulate" some computation with something that is either much more memory-intensive or compute-intensive, in either the deterministic or probabilistic case. For example, it's pretty common to add or remove memory and see whether the new "simulated" routine takes more or less compute power than the "standard" algorithm, and that is kind of what people are up to with the PSPACE stuff that is going on right now.

Using that lens—and this is entirely off the cuff, but after 10 seconds of thinking I believe it's mostly correct—this algorithm "simulates" a fair coin toss using 1 extra bit of memory and O(1/d) expected compute, where d = min(P(H), P(T)) measures how far the coin is from being deterministic. You can make 1/d arbitrarily large, but you can do that with a sorting algorithm's input too (and that is why both sorting and this algorithm have DoS implications). Is this the best we can do? Well, that's the problem we're studying with randomness extractors: given this imperfect source of randomness, can we extract perfect randomness out of it at all?
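
To put a rough number on that cost: each pair of flips ends the loop with probability 2p(1-p), so the expected number of flips is 1/(p(1-p)). A quick sketch (names illustrative):

    def expected_flips(p_heads):
        # Expected flips of the von Neumann loop: 1 / (2*p*(1-p)) rounds of
        # two flips each, i.e. 1 / (p*(1-p)) flips in expectation.
        return 1.0 / (p_heads * (1.0 - p_heads))

    # expected_flips(0.5)   -> 4.0 (the fair coin is the cheapest case)
    # expected_flips(0.999) -> about 1000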


> In statistics, we generally separate asymptotic bounds—which usually make guarantees only asymptotically—from finite-decay bounds like Chernoff, which decay exponentially not only in the limit, but also at each particular finite n. The second is much rarer and much more powerful.

I don't know if you're using different terminology than I understand here, but I'm reading "exponential in the limit" to mean "eventually exponential", or in other words "there is some finite n_0 where for n > n_0 the decay becomes exponential"? In which case it basically means that the first n_0 tosses would have to be discarded? It's nice that that's not the case, I guess. Somehow I don't find myself blown away by this, but perhaps I'm misunderstanding what it means to be "exponential but only in the limit but not for finite n".

> in computability theory, it is definitely fair game and very common to "simulate" some computation with something that is either much more memory-intensive or compute-intensive, in either the deterministic or probabilistic case.

"Much" more is one thing, "arbitrarily more if you're unlucky" is another. I'm not an expert in computability theory by any means, but whenever I've heard of simulation in that context, it's been with a hard bound (whether constant, polynomial, or whatever). I've never heard it called simulation in a context where the simulator can take arbitrarily long to simulate step #2. Even if this is normal to those in the field, don't think the average person thinks of it this way -- I very much think most people would want their money back if they were sold a "simulator" that took arbitrarily long to get to step #2.


What is our goal here? I'm willing to continue this discussion but after answering these questions and seeing your responses elsewhere it does not seem like your understanding is improving (e.g., still misunderstanding that the algorithm does not take "arbitrary" time), so I'm not sure you're getting the value here... If you want to continue maybe we should try you asking specific questions instead.


I have the exact opposite reaction: if someone told me the answer is "no" because it requires an unbounded number of coin flips, I'd think they were the ones trying to bullshit me. In antics's formulation, nothing is said about requiring a bounded number of flips.


"Simulate a truly random coin" implies it IMO. You're not simulating a truly random coin if you need unbounded time for a single flip. The truly random coin definitely doesn't need that. It just feels like a scam if someone sold me such a machine with that description - I'd want my money back. I don't expect everyone would feel the same, but I think a lot of people would.


I'm not sure we're on the same page about what this result practically means, so let me re-state it a few different ways, so that people can draw their own conclusions:

* The von Neumann approach will "appear to be O(1) (constant-time)" for any particular biased coin, but that constant might be big if the coin is very, VERY biased in one direction.

* How can this be true? Every flip reduces the probability you do not have an answer exponentially. The "concentration" around the mean is very sharp: e.g., at 275 coin tosses for P(H)=0.5 (the fair case), the probability of not having an answer is smaller than 1 divided by the number of atoms in the known universe. It is technically possible, but I think most people would say that it's "effectively constant time" in the sense that we'd expect the coin to phase-shift through the desk before flipping it 275 times in a row and not getting an answer. So it takes 275 flips; it's "constant" time! Interpret it how you like though.

* As you make the coin more and more biased, that "horizon" increases linearly, in that 0.99^1,000 is approximately the same thing as 0.999^10,000. So, an order of magnitude increase in probability requires roughly an order of magnitude increase in the number of flips. This is why it's not useful for the adversarial case, and why adversarial extractors are held apart from normal extractors.

Whether this is a "give me my money back" type thing is for you to decide. I think for most people the claim that you can simulate a fair coin from a biased coin in, effectively, O(1), and that the constant grows only linearly with the bias, is plainly incredible. :)
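
For anyone who wants to plug in their own numbers, the formula behind that concentration claim is simple (a sketch; the specific figures above are the parent's, not recomputed here):

    def prob_no_answer(p_heads, flips):
        # Probability the von Neumann loop still has no answer after `flips`
        # tosses: each pair of tosses is "wasted" (HH or TT) with probability
        # p^2 + (1-p)^2, so after flips // 2 pairs...
        tie = p_heads ** 2 + (1 - p_heads) ** 2
        return tie ** (flips // 2)

    # prob_no_answer(0.5, 1000) -> roughly 3e-151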


You don't need unbounded time for a single flip, that's all in your imagination. The worst-case time is unbounded, but you can't achieve the worst case.


It's very easy to achieve if someone hands you a "coin" that is made such that it never lands on tails.

Sorry for still being in the adversarial mindset, but this means that you essentially have to hardcode a maximum number of same-side flips after which you stop trusting the coin.


That's not a situation of "can't be done", that's just a consequence of casually describing the problem instead of exhaustively specifying it.

Yes the bias has to be finite / the entropy can't be zero, and the slowdown is related to how big the bias is / how low the entropy is.


> You don't need unbounded time for a single flip, that's all in your imagination. The worst-case time is unbounded, but you can't achieve the worst case.

There literally isn't a bound, it can be arbitrarily large. This isn't just in my head, it's a fact. If it's bounded in your mind then what is the bound?


It's not a fact of the real world, at least. "You can't achieve it" is true. A pretty small number of failures and you're looking at a trillion years to make it happen. And if you buffer some flips that number gets even smaller.


> A pretty small number of failures and you're looking at a trillion years to make it happen.

This depends on the bias of the original coin. P(H) can be arbitrarily close to 1, making P(HH) the likeliest possibility even for a trillion years. "This wouldn't happen in the real world" would be a sorry excuse for the deliberate refusal to clearly state the problem assumptions upfront.

IMO, if you really want to pleasantly surprise people, you need to be forthcoming and honest with them at the beginning about all your assumptions. There's really no good excuse to obfuscate the question and then move the goalposts when they (very predictably) fall into your trap.


> This depends on the bias of the original coin. P(H) can be arbitrarily close to 1

> There's really no good excuse to obfuscate the question and then move the goalposts when they (very predictably) fall into your trap.

Interesting. Because I see the guy pulling out the one-in-a-million coin and expecting it to run at a similar speed to be doing a gotcha on purpose, not falling into a trap and having the goalposts moved.

And I think "well if it's a million times less likely to give me a heads, then it takes a million times as many flips, but it's just as reliable" is an answer that preserves the impressiveness and the goalposts.

It's fast relative to the bias. Which seems like plenty to me when the original claim never even said it was fast.

(And if the coin never gives you a heads then I'd say it no longer qualifies as randomly flipping a coin.)


A real coin could repeatedly land on its side over and over indefinitely, every time you flipped it.


I don't think anyone would be surprised to hear that if a biased coin can only give one result, you can't extract randomness from it.

And if it can only give the second result one in a million times, you could be flipping it millions of times.


i read "flip twice" as recussion, so, given we're talking randomness, yes, that could go on forever. but i don't think you really need to replace "twice."


Speaking from complete ignorance, with apologies to those who that will annoy:

I'm sure it's possible to make a coin with what one might term "complex bias" where the bias extends over two events (thick gloop inside the coin, possibly heat activated or non-Newtonian).

This method sounds like the bias needs to be fixed ("simple bias" if you like)?

I guess that's just out of scope here.

Aside: There's a strong 'vibe' that HHHH HHHH with a single coin is somehow less random than HTHTHTHT HTHTHTHT with two coins when discarding HH and TT. I wonder what the origin of that feeling is - maybe just HT doesn't seem like it's a constant result simply due to nomenclature? If we call HT "type-A", then we just have HHHH HHHH vs. AAAA AAAA; and I _feel_ happier about the result!


I suspect this depends on where you draw the upper bound, since a really really complex biased coin is one that spies on your thesis and data and is committed to making you suffer.

Descartes' evil demon, in numismatic form.


Could the vibe be due to the fact that HHHH… seems like the coin could not just be biased - it could be completely broken and come up heads every time. There are two distinct possibilities here:

1) the coin is broken and is always H

2) the coin is random or possibly biased, and you got a string of H by chance

And observing the string of H… increases the probability of 1) in the Bayesian sense.

With the two coins you eliminate this possibility altogether - a broken coin can never produce HT or TH.


The legacy of the electric motor is not textile factories that are 30% more efficient because we point-replaced steam engines. It's the assembly line. The workforce "lost" the skills to operate the textile factories, but in turn, the assembly line made the workflow of goods production vastly more efficient. Industrial abstraction has been so successful that today, a small number of factories (e.g., TSMC) have become nearly-existential bottlenecks.

That is the aspiration of AI software tools, too. They are not coming to make us 30% more efficient, they are coming to completely change how the software engineering production line operates. If they are successful, we will write fewer tests, we will understand less about our stack, and we will develop tools and workflows to manage that complexity and risk.

Maybe AI succeeds at this stated objective, and maybe it does not. But let's at least not kid ourselves: this is how it has always been. We are in the business of abstracting things away so that we have to understand less to get things done. We grumbled when we switched from assembly to high-level languages. We grumbled when we switched from high-level languages to managed languages. We grumbled when we started programming enormous piles of JavaScript and traveled farther from the OS and the hardware. Now we're grumbling about AI, and you can be sure that we're going to grumble about whatever is next, too.

I understand this is going to ruffle a lot of feathers, but I don't think Thomas and the Fly team actually have missed any of the points discussed in this article. I think they fully understand that software production is going to change and expect that we will build systems to cope with abstracting more, and understanding less. And, honestly, I think they are probably right.


I think the intuition the authors are trying to capture is that they believe the models are omniscient, but also dim-witted. And the question they are collectively trying to ask is whether this will continue forever.

I've never seen this question quantified in a really compelling way, and while interesting, I'm not sure this PDF succeeds, at least not well-enough to silence dissent. I think AI maximalists will continue to think that the models are in fact getting less dim-witted, while the AI skeptics will continue to think these apparent gains are in fact entirely a byproduct of "increasing" "omniscience." The razor will have to be a lot sharper before people start moving between these groups.

But, anyway, it's still an important question to ask, because omniscient-yet-dim-witted models terminate at "superhumanly assistive" rather than "Artificial Superintelligence", which in turn economically means "another bite at the SaaS apple" instead of "phase shift in the economy." So I hope the authors will eventually succeed.


> I think the intuition the authors are trying to capture is that they believe the models are omniscient, but also dim-witted.

We keep assigning adjectives to this technology that anthropomorphize the neat tricks we've invented. There's nothing "omniscient" or "dim-witted" about these tools. They have no wit. They do not think or reason.

All Large "Reasoning" Models do is generate data that they use as context to generate the final answer. I.e. they do real-time tuning based on synthetic data.

This is a neat trick, but it doesn't solve the underlying problems that plague these models like hallucination. If the "reasoning" process contains garbage, gets stuck in loops, etc., the final answer will also be garbage. I've seen sessions where the model approximates the correct answer in the first "reasoning" step, but then sabotages it with senseless "But wait!" follow-up steps. The final answer ends up being a mangled mess of all the garbage it generated in the "reasoning" phase.

The only reason we keep anthropomorphizing these tools is because it makes us feel good. It's wishful thinking that markets well, gets investors buzzing, and grows the hype further. In reality, we're as close to artificial intelligence as we were a decade ago. What we do have are very good pattern matchers and probabilistic data generators that can leverage the enormous amount of compute we can throw at the problem. Which isn't to say that this can't be very useful, but ascribing human qualities to it only muddies the discussion.


> There's nothing "omniscient" or "dim-witted" about these tools. They have no wit. They do not think or reason.

> All Large "Reasoning" Models do is generate data that they use as context to generate the final answer. I.e. they do real-time tuning based on synthetic data.

I always wonder when people make comments like this if they struggle with analogies. Or if it's a lack of desire to discuss concepts at different levels of abstraction.

Clearly an LLM is not "omniscient". It doesn't require a post to refute that, OP obviously doesn't mean that literally. It's an analogy describing two semi (fairly?) independent axes. One on breadth of knowledge, one on something more similar to intelligence and being able to "reason" from smaller components of knowledge. The opposite of which is dim witted.

So at one extreme you'd have something completely unable to generalize or synthesize new results. Only able to correctly respond if it identically matches prior things it has seen, but has seen and stored a ton. At the other extreme would be something that only knows a very small set of general facts and concepts but is extremely good at reasoning from first principles on the fly. Both could "score" the same on an evaluation, but have very different projections for future growth.

It's a great analogy and way to think about the problem. And it took me multiple paragraphs to write what OP expressed in two sentences via a great analogy.

LLMs are a blend of the two skills, apparently leaning more towards the former but not completely.

> What we do have are very good pattern matchers and probabilistic data generators

This is an unhelpful description. An object is more than the sum of its parts, and higher-level behaviors emerge. The statement is factually correct, yet it's the equivalent of saying a computer is nothing more than a collection of gates and wires and so shouldn't be discussed at a higher level of abstraction.


Language matters. Using language that accurately describes concepts and processes is important. It might not matter to a language model which only sees patterns, but it matters to humans.

So when we label the technical processes and algorithms these tools use as something that implies a far greater level of capability, we're only doing a disservice to ourselves. Maybe not to those of us who are getting rich on the market hype that these labels fuel, but certainly to the general population that doesn't understand how the technology works. If we claim that these tools have super-human intelligence, yet they fail basic tasks, how do we explain this? More importantly, if we collectively establish a false sense of security and these tools are adopted in critical processes that human lives depend on, who is blamed when they fail?

> This statement is factually correct and yet the equivalent of describing a computer as nothing more than a collection of gates and wires so shouldn't be discussed at a higher level of abstraction.

No, because we have descriptive language to describe a collection of gates and wires by what it enables us to do: perform arbitrary computations, hence a "computer". These were the same tasks that humans used to do before machines took over, so the collection of gates and wires is just an implementation detail.

Pattern matching, prediction, data generation, etc. are the tasks that modern AI systems allow us to do, yet you want us to refer to this as "intelligence" for some reason? That makes no sense to me. Maybe we need new higher level language to describe these systems, but "intelligence", "thinking", "reasoning" and "wit" shouldn't be part of it.


>There's nothing "omniscient" or "dim-witted" about these tools

I disagree in that that seems quite a good way of describing them. All language is a bit inexact.

Also I don't buy that we are no closer to AI than ten years ago - there seems to be a lot going on. Just because LLMs are limited doesn't mean we can't find or add other algorithms - I mean look at alphaevolve for example https://www.technologyreview.com/2025/05/14/1116438/google-d...

>found a faster way to solve matrix multiplications—a fundamental problem in computer science—beating a record that had stood for more than 50 years

I figure it's hard to argue that that is not at least somewhat intelligent?


> I figure it's hard to argue that that is not at least somewhat intelligent?

The fact that this technology can be very useful doesn't imply that it's intelligent. My argument is about the language used to describe it, not about its abilities.

The breakthroughs we've had are because there is a lot of utility in finding patterns in data, which humans aren't very good at. Many of our problems can be boiled down to this task. So when we have vast amounts of data and compute at our disposal, we can be easily impressed by results that seem impossible for humans.

But this is not intelligence. The machine has no semantic understanding of what the data represents. The algorithm is optimized for generating specific permutations of tokens that match something it previously saw and was rewarded for. Again, very useful, but there's no thinking or reasoning there. The model doesn't have an understanding of why the wolf can't be close to the goat, or how a cabbage tastes. It's trained on enough data and algorithmic tricks that its responses can fool us into thinking it does, but this is just an illusion of intelligence. This is why we need to constantly feed it more tricks so that it doesn't fumble with basic questions like how many "R"s are in "strawberry", or that it doesn't generate racially diverse but historically inaccurate images.


>The machine has no semantic understanding of what the data represents.

How do you define "semantic understanding" in a way that doesn't ultimately boil down to saying they don't have phenomenal consciousness? Any functional concept of semantic understanding is captured to some degree by LLMs.

Typically when we attribute understanding to some entity, we recognize some substantial abilities in the entity in relation to that which is being understood. Specifically, the subject recognizes relevant entities and their relationships, various causal dependences, and so on. This ability goes beyond rote memorization, it has a counterfactual quality in that the subject can infer facts or descriptions in different but related cases beyond the subject's explicit knowledge. But LLMs excel at this.

>feed it more tricks so that it doesn't fumble with basic questions like how many "R"s are in "strawberry"

This failure mode has nothing to do with LLMs lacking intelligence and everything to do with how tokens are represented. They do not see individual characters, but sub-word chunks. It's like expecting a human to count the pixels in an image it sees on a computer screen. While not impossible, it's unnatural to how we process images and therefore error-prone.


You don't need phenomenal consciousness. You need consistency.

LLMs are not consistent. This is unarguable. They will produce a string of text that says they have solved a problem and/or done a thing when neither is true.

And sometimes they will do it over and over, even when corrected.

Your last paragraph admits this.

Tokenisation on its own simply cannot represent reality accurately and reliably. It can be tweaked so that specific problems can appear solved, but true AI would be based on a reliable general strategy which solves entire classes of problems without needing this kind of tweaking.

It's clear we're nowhere close to that.


Consistency is a strange criterion seeing as humans aren't very consistent either. Intelligent beings can make mistakes.


This is a common category of error people commit when talking about LLMs.

"True, LLMs can't do X, but a lot of people don't do X well either!"

The problem is, when you say humans have trouble with X, what you mean is that human brains are fully capable of X, but sometimes they do, indeed, make mistakes. Or that some humans haven't trained their faculties for X very well, or whatever.

But LLMs are fundamentally, completely, incapable of X. It is not something that can be a result of their processes.

These things are not comparable.

So, to your specific point: When an LLM is inconsistent, it is because it is, at its root, a statistical engine generating plausible next tokens, with no semantic understanding of the underlying data. When a human is inconsistent, it is because they got distracted, didn't learn enough about this particular subject, or otherwise made a mistake that they can, if their attention is drawn to it, recognize and correct.

LLMs cannot. They can only be told they made a mistake, which prompts them to try again (because that's the pattern that has been trained into them for what happens when told they made a mistake). But their next try won't have any better odds of being correct than their previous one.


>But LLMs are fundamentally, completely, incapable of X. It is not something that can be a result of their processes.

This is the very point of contention. You don't get to just assume it.

> it is because it is, at its root, a statistical engine generating plausible next tokens, with no semantic understanding of the underlying data.

Another highly contentious point you are just outright assuming. LLMs are modelling the world, not just "predicting the next token". Some examples here[1][2][3]. Anyone claiming otherwise at this point is not arguing in good faith. It's interesting how the people with the strongest opinions about LLMs don't seem to understand them.

[1] https://arxiv.org/abs/2405.15943

[2] https://x.com/OwainEvans_UK/status/1894436637054214509

[3] https://www.anthropic.com/research/tracing-thoughts-language...


OK, sure; there is some evidence potentially showing that LLMs are constructing a world model of some sort.

This is, however, a distraction from the point, which is that you were trying to make claims that the described lack of consistency in LLMs shouldn't be considered a problem because "humans aren't very consistent either."

Humans are perfectly capable of being consistent when they choose to be. Human variability and fallibility cannot be used to handwave away lack of fundamental ability in LLMs. Especially when that lack of fundamental ability is on empirical display.

I still hold that LLMs cannot be consistent, just as TheOtherHobbes describes, and you have done nothing to refute that.

Address the actual point, or it becomes clear that you are the one arguing in bad faith.


You are misrepresenting the point of contention. The question is whether LLMs lack of consistency undermines the claim that they "understand" in some relevant sense. But arguing that lack of consistency is a defeater for understanding is itself undermined by noting that humans are inconsistent but do in fact understand things. It's as simple as that.

If you want to alter the argument by saying humans can engage in focused effort to reach some requisite level of consistency for understanding, you have to actually make that argument. It's not at all obvious that focused effort is required for understanding or that a lack of focused effort undermines understanding.

You also need to contend with the fact that LLMs aren't really a single entity, but are a collection of personas, and what you get and its capabilities do depend on how you prompt it to a large degree. Even if the entity as a whole is inconsistent between prompts, the right subset might very well be reliably consistent. There's also the fact of the temperature setting that artificially injects randomness into the LLM's output. An LLM itself is entirely deterministic. It's not at all obvious how consistency relates to LLM understanding.

Feel free to do some conceptual work to make an argument; I'm happy to engage with it. What I'm tired of are these half-assed claims and incredulity that people don't take them as obviously true.


I imagine if you asked the LLM why the wolf can't be close to the goat it would give a reasonable answer. I realise it does it by using permutations of tokens, but I think you have to judge intelligence by the results rather than the mechanism, otherwise you could argue humans can't be intelligent because they are just a bunch of neurons that find patterns.


> you have to judge intelligence by the results rather than the mechanism

This would be the exact opposite conclusion of the Chinese room: https://en.wikipedia.org/wiki/Chinese_room

I think you'd need to offer a stronger counter argument than the one you presented here.


Actually I think the Chinese room fits my idea. It's a silly thought experiment that would never work in practice. If you tried to make one you would judge it unintelligent because it wouldn't work. Or at least in the way Searle implied - he basically proposed a look up table.


We have had programs that can give good answers to some hard questions for a very long time now. Watson won Jeopardy back in 2011, but it still wasn't very good at replacing humans.

So that isn't a good way to judge intelligence; computers are so fast and have so much data that you can make programs to answer just about anything pretty well, and an LLM is able to do that, just more automatically. But it still doesn't automate the logical parts yet, just the lookup of knowledge; we don't know how to train large logic models, just large language models.


LLMs are not the only model type though? There's a plethora of architectures and combinations being researched.. And even transformers start to be able to do cool sh1t on knowledge graphs, also interesting is progress on autoregressive physics PDE (partial differential equations) models.. and it can't be too long until some providers of actual biological neural nets show up on openrouter (probably a lot less energy- and capital-intensive to scale up brain goo in tanks compared to gigawatt GPU clusters).. combine that zoo of "AI" specimens using M2M, MCP etc. and the line between mock and "true" intelligence will blur, escalating our feeble species into ASI territory.. good luck to us.


> There's a plethora of architectures and combinations being researched

There were a plethora of architectures and combinations being researched before LLMs; it still took a very long time to find the LLM architecture.

> the line between mock and "true"intelligence will blur

Yes, I think this will happen at some point. The question is how long it will take, not if it will happen.

The only thing that can stop this is if intermediate AI is good enough to give every human a comfortable life but still isn't good enough to think on its own.

It's easy to imagine such an AI being developed: imagine a model that can learn to mimic humans at any task, but still cannot update itself without losing those skills and becoming worse. Such an AI could be trained to perform every job on earth as long as we don't care about progress.

If such an AI is developed, and we don't quickly solve the remaining problems to get an AI to be able to progress science on its own, it's likely our progress entirely stalls there, as humans will no longer have a reason to go to school to advance science.


>The only thing that can stop this is if intermediate AI is good enough...

Not going to happen due to competition. As soon as one company has a good one their rivals will develop a better one.


This approach to defining “true” intelligence seems flawed to me because of examples in biology where semantic understanding is in no way relevant to function. A slime mold solving a maze doesn’t even have a brain, yet it solves a problem to get food. There’s no knowing that it does that, no complex signal processing, no self-perception of purpose, but nevertheless it gets the food it needs. My response to that isn’t to say the slime mold has no intelligence, it’s to widen the definition of intelligence to include the mold. In other words, intelligence is something one does rather than has; it’s not the form but the function of the thing. Certainly LLMs lack anything in any way resembling human intelligence, they even lack brains, but they demonstrate a capacity to solve problems I don’t think is unreasonable to label intelligent behavior. You can put them in some mazes and LLMs will happen to solve them.


>LLMs lack anything in any way resembling human intelligence

I think intelligence has many aspects, from moulds solving mazes to chess etc. I find LLMs resemble very much human rapid language responses, where you say something without thinking about it first. They are not very good at thinking though. And hopeless if you were to, say, hook one to a robot and tell it to fix your plumbing.


While it's debatable whether slime molds showcase intelligence, there's a substantial difference between their behavior and that of modern AI systems. The organism was never trained to traverse a maze. It simply behaves in the same way as it would in its natural habitat, seeking out food in this case, which we interpret as "solving" a human-made problem. In order to get an AI system to do the same we would have to "train" it on large amounts of data that specifically included maze solving. This training wouldn't carry over to any other type of problem, each of which we would also need to specifically train it on.

When you consider how humans and other animals learn, knowledge is carried over. I.e. if we learn how to solve a maze on paper, we can carry this knowledge over to solve a hedge maze. It's a contrived example, but you get the idea. When we learn, we build out a web of ideas in our minds which we can later use while thinking to solve other types of problems, or the same problems in different ways. This is a sign of intelligence that modern AI systems simply don't have. They're showing an illusion of intelligence, which as I've said before, can still be very useful.


My alternative definition would be something like this. Intelligence is the capacity to solve problems, where a problem is defined contextually. This means that what is and is not intelligence is negotiable in situations where the problem itself is negotiable. If you have water solve a maze, then yes the water could be said to have intelligence, though that would be a silly way to put it. It’s more that intelligence is a material phenomenon, and things which seem like they should be incredibly stupid can demonstrate surprisingly intelligent behavior.

LLMs are leagues ahead of viruses or proteins or water. If you put an LLM into a code editor with access to error messages, it can solve a problem you create for it, much like water flowing through a maze. Does it learn or change? No, everything is already there in the structure of the LLM. Does it have agency? No, it’s a transparently deterministic mapping from input to output. Can it demonstrate intelligent behavior? Yes.


That's an interesting way of looking at it, though I do disagree. Mainly because, as you mention, it would be silly to claim that water is intelligent if it can be used to solve a problem. That would imply that any human-made tool is intelligent, which is borderline absurd.

This is why I think it's important that if we're going to call these tools intelligent, then they must follow the processes that humans do to showcase that intelligence. Scoring high on a benchmark is not a good indicator of this, in the same way that a human scoring high on a test isn't. It's just one convenient way we have of judging this, and a very flawed one at that.

Anyway, cheers for the discussion!


I keep on trying this wolf cabbage goat problem with various permutations, let's say just a wolf and a cabbage, no goat mentioned. At some step the goat materializes in the answer. I tell it there is no goat and yet it answers again and the goat is there.


2025 and we are still discussing if LLM's are intelligent or not, gee.


I am not sure we are on the same page: the point of my response is that this paper is not enough to prevent exactly the argument you just made.

In any event, if you want to take umbrage with this paper, I think we will need to back up a bit. The authors use a mostly-standardized definition of "reasoning", which is widely-accepted enough to support not just one, but several of their papers, in some of the best CS conferences in the world. I actually think you are right that it is reasonable to question this definition (and some people do), but I think it's going to be really hard for you to start that discussion here without (1) saying what your definition specifically is, and (2) justifying why its better than theirs. Or at the very least, borrowing one from a well-known critique like, e.g., Gebru's, Bender's, etc.


>They have no wit. They do not think or reason.

Computers can't think and submarines can't swim.


There’s 2 things going on here.

Output orientation - Is the output similar to what a human would create if they were to think?

Process orientation - Is the machine actually thinking, when we say it's thinking?

I met someone who once drew a circuit diagram from memory. However, they didn’t draw it from inputs, operations, to outputs. They started drawing from the upper left corner, and continued drawing to the lower right, adding lines, triangles and rectangles as need be.

Rote learning can help you pass exams. At some point, it’s a meaningless difference between the utility of “knowing” how engineering works, and being able to apply methods and provide a result.

This is very much the confusion at play here, so both points are true.

1) These tools do not “Think”, in any way that counts as human thinking

2) the output is often the same as what a human thinking, would create.

IF you are concerned with only the product, then what’s the difference? If you care about the process, then this isn’t thought.

To put it in a different context. If you are a consumer, do you care if the output was hand crafted by an artisan, or do you just need something that works.

If you are a producer in competition with others, you care if your competition is selling Knock offs at a lower price.


> IF you are concerned with only the product, then what’s the difference?

The difference is substantial. If the machine was actually thinking and it understood the meaning of its training data, it would be able to generate correct output based on logic, deduction, and association. We wouldn't need to feed it endless permutations of tokens so that it doesn't trip up when the input data changes slightly. This is the difference between a system with _actual_ knowledge, and a pattern matching system.

The same can somewhat be applied to humans as well. We can all either memorize the answers to specific questions so that we pass an exam, or we can actually do the hard work, study, build out the complex semantic web of ideas in our mind, and acquire actual knowledge. Passing the exam is simply a test of a particular permutation of that knowledge, but the real test is when we apply our thought process to that knowledge and generate results in the real world.

Modern machine learning optimizes for this memorization-like approach, simply because it's relatively easy to implement, and we now have the technical capability where vast amounts of data and compute can produce remarkable results that can fool us into thinking we're dealing with artificial intelligence. We still don't know how to model semantic knowledge that doesn't require extraordinary amounts of resources. I believe classical AI research in the 20th century leaned more towards this direction (knowledge-based / expert systems, etc.), but I'm not well versed in the history.


That sentence is from the perspective of someone who only cares about the output.

The people who care about the process, have a different take, which I have also explained.


But if you need a submarine that can swim as agilely as a fish, then we still aren't there yet; fish are far superior to submarines in many ways. So submarines might be faster than fish, but there are so many maneuvers that fish can do that the submarine can't. It's the same here with thinking.

So just like computers are better at humans at multiplying numbers, there are still many things we need human intelligence for even in todays era of LLM.


The point here (which is from a quote by Dijkstra) is that if the desired result is achieved (movement through water) it doesn't matter if it happens in a different way than we are used to.

So if an LLM generates working code, correct translations, valid points relating to complex matters and so on it doesn't matter if it does so by thinking or by some other mechanism.

I think that's an interesting point.


> if the desired result is achieved (movement through water) it doesn't matter if it happens in a different way than we are used to

But the point is that the desired result isn't achieved, we still need humans to think.

So we still need a word for what humans do that is different from what LLM does. If you are saying there is no difference then how do you explain the vast difference in capability between humans and LLM models?

Submarines and swimming are a great metaphor for this, since submarines clearly don't swim and thus have very different abilities in water; they're way better in some ways but way worse in others. So using that metaphor it's clear that LLM "thinking" cannot be described with the same words as human thinking, since it's so different.


>If you are saying there is no difference then how do you explain the vast difference in capability between humans and LLM models?

No I completely agree that they are different, like swimming and propulsion by propellers - my point is that the difference may be irrelevant in many cases.

Humans haven't been able to beat computers in chess since the 90s, long before LLM's became a thing. Chess engines from the 90s were not at all "thinking" in any sense of the word.

It turns out "thinking" is not required in order to win chess games. Whatever mechanism a chess engine uses gets better results than a thinking human does, so if you want to win a chess game, you bring a computer, not a human.

What if that also applies to other things, like translation of languages, summarizing complex texts, writing advanced algorithms, realizing implications from a bunch of seemingly unrelated scientific papers, and so on. Does it matter that there was no "thinking" going on, if it works?


>So if an LLM generates working code

It matters when code bases become hard to parse because the engineers throwing shit together with Cursor have made an ungrokkable ball of shit.


"Can't" is a pretty strong word, effectively entailing "never". Never is a long time to believe computers can't think.


> I think AI maximalists will continue to think that the models are in fact getting less dim-witted

I'm bullish (and scared) about AI progress precisely because I think they've only gotten a little less dim-witted in the last few years, but their practical capabilities have improved a lot thanks to better knowledge, taste, context, tooling etc.

What scares me is that I think there's a reasoning/agency capabilities overhang. ie. we're only one or two breakthroughs away from something which is both kinda omniscient (where we are today), and able to out-think you very quickly (if only through dint of applying parallelism to actually competent outcome-modelling and strategic decision making).

That combination is terrifying. I don't think enough people have really imagined what it would mean for an AI to be able to out-strategise humans in the same way that they can now — say — out-poetry humans (by being both decent in terms of quality and super fast). It's like when you're speaking to someone way smarter than you and you realise that they're 6 steps ahead, and actively shaping your thought process to guide you where they want you to end up. At scale. For everything.

This exact thing (better reasoning + agency) is also the top priority for all of the frontier researchers right now (because it's super useful), so I think a breakthrough might not be far away.

Another way to phrase it: I think today's LLMs are about as good at snap judgements in most areas as the best humans (probably much better at everything that rhymes with inferring vibes from text), but they kinda suck at:

1. Reasoning/strategising step-by-step for very long periods

2. Snap judgements about reasoning or taking strategic actions (in the way that expert strategic humans don't actually need to think through their actions step-by-step very often - they've built intuition which gets them straight to the best answer 90% of the time)

Getting good at the long range thinking might require more substantial architectural changes (eg. some sort of separate 'system 2' reasoning architecture to complement the already pretty great 'system 1' transformer models we have). OTOH, it might just require better training data and algorithms so that the models develop good enough strategic taste and agentic intuitions to get to a near-optimal solution quickly before they fall off a long-range reasoning performance cliff.

Of course, maybe the problem is really hard and there's no easy breakthrough (or it requires 100,000x more computing power than we have access to right now). There's no certainty to be found, but a scary breakthrough definitely seems possible to me.


I think you are right, and that the next step function can be achieved using the models we have, either by scaling the inference, or changing the way inference is done.


People are doing all manner of very sophisticated inferency stuff now - it just tends to be extremely expensive for now and... people are keeping it secret.


If it was good enough to replace people then it wouldn't be too expensive, they would have launched it and replaced a bunch of people and made trillions of dollars by now.

So at best their internal models are still just performance multipliers unless some breakthrough happened very recently, it might be a bigger multiplier but that still keeps humans with jobs etc and thus doesn't revolutionize much.


There is no reason that omniscient-yet-dim-witted has to plateau at human intelligence.


I am not sure if you mean this to refute something in what I've written but to be clear I am not arguing for or against what the authors think. I'm trying to state why I think there is a disconnect between them and more optimistic groups that work on AI.


I think that commenter was disagreeing with this line:

> because omniscient-yet-dim-witted models terminate at "superhumanly assistive"

It might be that with dim wits + enough brute force (knowledge, parallelism, trial-and-error, specialisation, speed) models could still substitute for humans and transform the economy in short order.


And we have a good example of a dimwitted, brute-force process creating intelligent designs - evolution.


Also corporations, governments etc. - they're capable of things that none of the individuals could do alone.


Sorry, I can't edit it any more, but what I was trying to say is that if the authors are correct, that this distinction is philosophically meaningful, then that is the conclusion. If they are not correct, then all their papers on this subject are basically meaningless.


This is only tangentially related, but one thing that I am still kind of hopeful about is that analytic combinatorics will free us from the tyranny of asymptotic analysis.

I suppose this is kind of a hot take, and in some ways less relevant than it was (say) 10 years ago, but I believe the theoretical CS community has spent a lot of time analyzing the performance of various algorithms over the years while rather pointedly not inventing algorithms that accomplish some result at all, even when the result is only known to be computable in a pathologically inefficient way. This means a lot of breakthroughs that could have been TCS breakthroughs have been invented, usually worse, by other communities.

I am not the only one who thinks so, here[1] is an obscure talk where Michael Mitzenmacher (a well-known CS theory prof at Harvard) makes more or less this specific complaint.

One of the nice things about Flajolet/Sedgewick[2] (mentioned in this post) is that it gives a nice set of combinators for (at least it seems to me, in theory) precisely enumerating how many "steps" an algorithm would take given some input. I optimistically imagine a world where we replace the relatively non-specific asymptotic bounds of an algorithm with a very specific quantification of the work an algorithm will take. And that this in turn would free TCS to focus on things that are more germane to the field as a whole.

With that all said I'm open to the argument that TCS has fundamentally changed and this is no longer such a big issue. But perusing the various TCS conferences I sense it isn't.

[1]: https://www.youtube.com/watch?v=p3anjSAnKBo

[2]: https://algo.inria.fr/flajolet/Publications/book.pdf


> One of the nice things about Flajolet/Sedgewick[2] (mentioned in this post) is that it gives a nice set of combinators for (at least it seems to me, in theory) precisely enumerating how many "steps" an algorithm would take given some input. I optimistically imagine a world where we replace the relatively non-specific asymptotic bounds of an algorithm with a very specific quantification of the work an algorithm will take. And that this in turn would free TCS to focus on things that are more germane to the field as a whole.

Check out the Art of Computer Programming, by Knuth. He writes all of the algorithms in a hypothetical machine code called MIX (MMIX in later editions). He does go through algorithmic analysis to the degree you describe, showing exactly how many cycles a particular routine may take to complete, based on different variables. This is actually fairly complex even for simple programs. Consider the simple case of computing the maximum in a list:

    fn max_of(array_of_numbers: &[i64]) -> i64 {
        // Start below every possible value so the loop logic matches the prose below.
        let mut current_max = i64::MIN;
        for &x in array_of_numbers {
            if x > current_max {
                current_max = x;
            }
        }
        current_max
    }

Anyone can tell you this runs in O(N) time. But exactly how many cycles would this take? Ignoring the architectural details, you would need to know how often the condition "x > current_max" is true. If the list of numbers is in ascending order, this will be true every time, if it is in descending order it will only be true once. The difference in the number of cycles between those two cases is significant. I believe Knuth works through this example, and shows that, for a randomly-ordered array, this would be true around log_2(n) times.


I actually have read a chunk of the relevant TAOCP, but didn't mention it because the comment was already kind of long. I remarked to a friend the other day that it's kind of funny that we've a little bit come full circle on asymptotic analysis in that we are now counting FLOPs, and it is not a little bit like what Knuth does here. :)


I think this needs the use of a CAS (Computer Algebra System, like Mathematica) to calculate the runtime of some code relative to different choices of computer architecture. Then it looks practical. Could even be added to IDEs. But without the help of a CAS, few people would attempt this, because it would be too tedious to do by hand.


Natural log of n, actually. That's because the expected number of times a new max is found is H_n, the nth harmonic number, where n is the size of the array. And H_n = ln n + gamma + o(1).
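
If you want to convince yourself of that numerically, here is a quick throwaway simulation of my own (not anything from Knuth) that counts how often the condition fires on random permutations and compares the average against ln n + gamma:

    import random
    from math import log

    GAMMA = 0.5772156649  # Euler-Mascheroni constant

    def count_max_updates(n):
        # How many times "x > current_max" is true on a random permutation,
        # counting the first element as an update.
        current_max = float("-inf")
        updates = 0
        for x in random.sample(range(n), n):
            if x > current_max:
                current_max = x
                updates += 1
        return updates

    n, trials = 10_000, 200
    avg = sum(count_max_updates(n) for _ in range(trials)) / trials
    print(avg, log(n) + GAMMA)  # both come out around 9.8 for n = 10_000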


I don't think it's controversial to say that asymptotic analysis has flaws: the conclusions you draw from it only hold in the limit of larger inputs, and sometimes "larger" means "larger than anything you'd be able to run it on." Perhaps as Moore's law dies we'll be increasingly able to talk more about specific problem sizes in a way that won't become obsolete immediately.

I suppose my question is why you think TCS people would do this analysis and development better than non-TCS people. Once you leave the warm cocoon of big-O, the actual practical value of an algorithm depends hugely on specific hardware details. Similarly, once you stop dealing with worst-case or naive average-case complexity, you have to try and define a data distribution relevant for specific real-world problems. My (relatively uninformed) sense is that the skill set required to, say, implement transformer attention customizing to the specific hierarchical memory layout of NVIDIA datacenter GPUs, or evaluate evolutionary optimization algorithms on a specific real-world problem domain, isn't necessarily something you gain in TCS itself.

When you can connect theory to the real world, it's fantastic, but my sense is that such connections are often desired and rarely found. At the very least, I'd expect that to often be a response to applied CS and not coming first from TCS: it's observed empirically that the simplex algorithm works well in practice, and then that encourages people to revisit the asymptotic analysis and refine it. I'd worry that TCS work trying to project onto applications from the blackboard would lead to less rigorous presentations and a lot of work that's only good on paper.


Average-case complexity can be a fickle beast. As you say, the simplex algorithm for LP is great in practice, so it's rarely problematic to use. But meanwhile, people also say, "Modern SAT/SMT solvers are great, they can solve huge problems!" Yet when I try using one, I usually run into exponential runtimes very quickly, especially for unsatisfiable instances.

Overall, it's annoying to tell whether an NP-hard problem is always really hard, or if ~all practical instances can be solved with a clever heuristic. It doesn't help that most problems receive little attention (e.g., to find solvable special cases) after being labeled NP-hard.
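As a concrete example of the kind of blow-up I mean: a plain pigeonhole encoding is tiny to state but is a classic hard case for resolution-based solvers, so (without symmetry breaking or cardinality reasoning) solvers tend to slow down dramatically as it grows. A rough sketch of mine using Z3's Python bindings:

    from z3 import Bool, Not, Or, Solver

    def pigeonhole(n):
        # Illustrative sketch: n + 1 pigeons, n holes, which is unsatisfiable.
        x = [[Bool(f"pigeon{i}_hole{j}") for j in range(n)] for i in range(n + 1)]
        s = Solver()
        for i in range(n + 1):
            s.add(Or(x[i]))                                  # every pigeon sits in some hole
        for j in range(n):
            for a in range(n + 1):
                for b in range(a + 1, n + 1):
                    s.add(Or(Not(x[a][j]), Not(x[b][j])))    # no hole holds two pigeons
        return s

    print(pigeonhole(8).check())  # "unsat", but runtime tends to climb steeply with n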


> most problems receive little attention (e.g., to find solvable special cases) after being labeled NP-hard.

Since I know quite a few people who work on exactly this kind of problem, namely finding special classes that can be solved in polynomial time, my impression is of course different.

But I think it can be said that if some NP-hard problem is very important in practice and there is no easy way to get around it (i.e. it will also be practically relevant in, say, 15 years), the problem will for sure get a lot more attention.

This is also the reason why some NP-hard problems are much more researched than others.


Yeah, perhaps I was a bit unfair, it's just that the problems that have gotten good results never seem to be the ones I need! C'est la vie, I suppose. (In particular, I've been working on recognizing small formal languages from samples, which is usually NP-hard, but has a surprising number of solvable cases. But my impression is that most modern work has gone into various forms of probabilistic grammars, which aren't really what I'm looking for.)

Sometimes it's also helpful to look into approximation algorithms, e.g., a good LLL implementation can work wonders for certain problems. But heaven forbid someone obtains an inapproximability result for your problem, then you're really in trouble.


> But heaven forbid someone obtains an inapproximability result for your problem, then you're really in trouble.

This is not (necessarily) true:

For example, there exists a great approximation algorithm (Goemans-Williamson algorithm) for MAXCUT in graphs with non-negative edge weights.

On the other hand, when negative weights do occur, one can show (unless P=NP) that there exists no polynomial-time approximation algorithm for which the approximation guarantee is a constant factor times the optimal solution.

But since the Goemans-Williamson algorithm is a great algorithm (if the Unique Games Conjecture holds and P != NP, it has the best approximation guarantee that any polynomial-time approximation algorithm for MAXCUT with non-negative weights can achieve), nobody "forbids" you from using it in situations where negative edge weights can also occur. You will lose the approximation guarantee, but in practice the algorithm nevertheless gives good results in this situation, just not certified good results.


I found SMT solvers were stymied by merely running into cubic runtimes.


I just meant that there are lots of problems where I think TCS could have contributed but didn't, because the relevant analysis was not acceptable at FOCS or SODA (or whatever). A good example is the original transformers paper, which is vastly over-complicated relative to (say) a later treatment[1] by Ruslan Salakhutdinov's lab, which shows that attention is "just" plain-old Nadaraya-Watson kernel smoothing—which if you don't know is a weighted average of points covered in elementary grad stats.
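
For anyone who hasn't seen the equivalence spelled out, here is a tiny NumPy sketch of my own (not from either paper): single-query attention is just a kernel-weighted average of the value vectors, where the "kernel" is a softmax over query-key similarities.

    import numpy as np

    def softmax(z):
        z = z - z.max()            # for numerical stability
        e = np.exp(z)
        return e / e.sum()

    def attention_as_kernel_smoothing(q, K, V, temperature):
        # Nadaraya-Watson: weight each value by a normalized similarity
        # between the query and the corresponding key, then average.
        weights = softmax(K @ q / temperature)   # shape (n,)
        return weights @ V                       # weighted average of the rows of V

    # Made-up shapes and data, purely for illustration.
    rng = np.random.default_rng(0)
    K = rng.normal(size=(5, 4))    # 5 keys, dimension 4
    V = rng.normal(size=(5, 3))    # 5 values, dimension 3
    q = rng.normal(size=4)         # one query
    print(attention_as_kernel_smoothing(q, K, V, temperature=np.sqrt(4)))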

I'm not saying that TCS people would have been "better" at discovering this or anything like that, I'm just saying that the opportunity for contribution from TCS people is very high, and because the fields are not well-integrated you often have to wait years to uncover what ML concepts "really" are. I think working together would benefit both ML and TCS!

[1]: https://arxiv.org/pdf/1908.11775


> I optimistically imagine a world where we replace the relatively non-specific asymptotic bounds of an algorithm with a very specific quantification of the work an algorithm will take.

I think you're getting this backwards. The (good) point you and Mitzenmacher are making is that our analysis standards are _too strict_, and hence we avoid good algorithms that are just too hard to analyze.

If we instead require exact combinatorics, e.g., using Flajolet/Sedgewick, that will be _even harder_ than if we _just_ try to do asymptotic analysis. Remember that asymptotics are a way to _approximate_ the exact number of steps. It's supposed to make it _easier_ for you.


Oh I think there is no real doubt that this is true, it's why analytic combinatorics did not succeed in replacing asymptotic analysis. I just wish it had. If it was tractable in a similar set of areas I think a lot of things would be quite a bit better.


In fact in these slides from a 2011/2012 talk, Sedgewick talks about exactly this, even going as far as to draw a contrast between "TCS" and "analysis of algorithms" (the "scientific method"): https://sedgewick.io/talks/#algorithms-for-the-masses = https://sedgewick.io/wp-content/uploads/2022/03/2011-12AlgsM... onwards.

(Also touched upon briefly in https://sedgewick.io/talks/#lasting-legacy-of-flajolet = https://sedgewick.io/wp-content/uploads/2022/03/2013-14Flajo...)

His point is exactly that, often researchers care only about worst-case analysis and asymptotics, which is not necessarily a useful scientific analysis of how long an algorithm is expected to take on a machine as a function of the input size (which was the original motivation for Knuth etc to develop the field).

The same authors (but ordered as Sedgewick and Flajolet!) also have a book called An Introduction to the Analysis of Algorithms; there's a course website at https://aofa.cs.princeton.edu/home/ and videos on Coursera (https://www.coursera.org/learn/analysis-of-algorithms) — it is advanced mathematics but less advanced than the Analytic Combinatorics book.


Coincidentally I just did an asymptotic complexity analysis yesterday (something I rarely do). It seemed right since it's by the book but I wasn't sure. After running the extensive benchmarks on the algorithm, I was so happy to find the results matched the asymptotic prediction. The lesson is verify, verify, and always verify.

[1] Algorithm, https://github.com/williamw520/toposort/blob/master/Algorith...

[2] Benchmarks, https://github.com/williamw520/toposort/blob/master/README.m...


I'm not a researcher, so start with the default assumption that I don't know what I'm talking about, but I had the same impression when studying those topics: asymptotic analysis is kind of broken.

For example, we assume that arbitrary arithmetic takes constant time. Good. Except that if you take that seriously, then P=PSPACE. So we might say that bit operations on fixed registers are constant time. But nobody does that because then arithmetic is O(lg n) and it's too much trouble to keep track of everything (and if you do, then hashtables aren't O(1) any longer!). If we are more sophisticated we might take a transdichotomous model, but that's really weird and we have results like sorting in less than n lg n time. Not to mention that the mathematical foundations get a bit shaky with more than one variable.

I think the hard problem underneath is that it's hard to define a computational model that (a) can be reasoned about and isn't extremely weird, (b) has solid mathematical foundations, (c) is reasonably predictive of real world performance, and (d) is not linked to a specific architecture. Asymptotic analysis is kind of a compromise. It can be formulated in very elementary terms, it makes mathematical sense and if you plot the performance of a sorting algorithm on a chart it kind of looks like the graph of n log n if you squint a bit.

Knuth does something similar to what you're suggesting IIRC, albeit without the analytic combinatorics machinery. For some programs he derives a precise expression as a function of the size of the input that describes how many operations of each kind the underlying machine will perform. In numerical analysis there is a similar concept, e.g. evaluating a polynomial with Horner has the same asymptotic complexity as the naive method, but we say it's better because it takes exactly the optimal number of floating point operations (and, in practice, because it uses fused multiply-adds, but that's another story...)
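
To make the Horner example concrete, here is a small sketch of mine (with made-up operation counters) showing that both methods are O(n) in arithmetic operations, but Horner does roughly half the multiplications:

    def eval_naive(coeffs, x):
        # coeffs = [c_0, c_1, ..., c_n]; computes sum(c_i * x**i) term by term.
        total, power, mults, adds = 0.0, 1.0, 0, 0
        for i, c in enumerate(coeffs):
            if i > 0:
                power *= x; mults += 1
            total += c * power; mults += 1; adds += 1
        return total, mults, adds              # ~2n multiplications, n+1 additions

    def eval_horner(coeffs, x):
        # (((c_n x + c_{n-1}) x + ...) x + c_0): exactly n of each operation.
        total, mults, adds = coeffs[-1], 0, 0
        for c in reversed(coeffs[:-1]):
            total = total * x + c; mults += 1; adds += 1
        return total, mults, adds

    coeffs = [5.0, -3.0, 0.0, 2.0, 1.0]        # x^4 + 2x^3 + 0x^2 - 3x + 5
    print(eval_naive(coeffs, 2.0))             # (31.0, 9, 5)
    print(eval_horner(coeffs, 2.0))            # (31.0, 4, 4)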

A research program along those lines would be certainly fascinating, but the combination of "needs to be reasonably predictive" and "needs to be portable" is a tough nut to crack.


Yep, I mentioned this elsewhere in the threads—Knuth does do this quite a bit in TAOCP. I think in a lot of ways we've ended up completely full circle, in that we are now counting FLOPs. This time though I think we have very little hope for something as nice and pretty as asymptotics. At least ones that are useful.


> For example, we assume that arbitrary arithmetic takes constant time.

Arithmetic on fixed-width number types is constant. It's very much not constant on arbitrary precision integers, something that underpins much of cryptography. Indeed, if you try to e.g. use Gaussian elimination (which uses O(n^3) arithmetic operations) to compute the determinant of a large integer matrix, you'll run into trouble, which is why there is another algorithm for this use case: https://en.m.wikipedia.org/wiki/Bareiss_algorithm
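
For anyone curious what that looks like, here is a rough fraction-free elimination sketch in the Bareiss style, a toy version of my own: it assumes the leading pivots are nonzero and skips the row swaps a real implementation needs.

    def bareiss_det(m):
        # m: square list-of-lists of Python ints; modified in place.
        # The divisions below are exact, so intermediate entries stay integers
        # of bounded size, unlike naive fraction-based Gaussian elimination.
        n = len(m)
        prev = 1
        for k in range(n - 1):
            if m[k][k] == 0:
                raise ValueError("zero pivot; a full version would swap rows")
            for i in range(k + 1, n):
                for j in range(k + 1, n):
                    m[i][j] = (m[i][j] * m[k][k] - m[i][k] * m[k][j]) // prev
            prev = m[k][k]
        return m[n - 1][n - 1]

    print(bareiss_det([[2, 3, 1], [4, 1, -1], [0, 5, 2]]))  # 10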

(Disclaimer: Also not a researcher)


My sentence there is a bit unclear -- I meant it as an implication.

Arithmetic on fixed-width registers being O(1) is equivalent to saying that bit operations are O(1), which intuitively makes the most sense; after all, adding two uint64s is not the same as adding two arbitrary-precision integers. But that's not always the model we pick; often we just assume that arithmetic is O(1) and sweep length considerations under the rug.

Also, what does fixed mean? Is it like the word size in an actual computer architecture, where registers are 64 bits and that's all you'll ever get, or does it depend on the size of the input? Because the latter is the transdichotomous model, yet another option. Unsurprisingly you can get different results with different computational models.

This is not to say that asymptotic analysis is wrong, obviously, but it's a bit of a mess and it's important to remember what computational model the analysis was performed under.


> But that's not always the model we pick, often we just assume that arithmetic is O(1) and sweep length considerations under the rug.

So, I only have rudimentary undergraduate knowledge about complexity theory, but my materials (e.g. in theoretical CS, cryptography and computational algebra) were always incredibly clear about the fact that arbitrary integer arithmetic isn't constant - this is why complexity classes such as weakly vs. strongly NP-complete exist. Do you have any examples of where people would treat arbitrary length integer arithmetic as constant? (I think most people just never have to use arbitrary precision arithmetic, so that's why they'd not talk about it explicitly.)

You're right that computational complexity depends on the model and that this makes it nontrivial to use in practice. The theoreticians would sweep this under the rug with "these models are all polynomially equivalent to Turing machines" (which lets us formulate a conjecture such as P=NP precisely), but of course that doesn't in practice help you distinguish between a constant, linear or quadratic algorithm.


> I optimistically imagine a world where we replace the relatively non-specific asymptotic bounds of an algorithm with a very specific quantification of the work an algorithm will take.

My understanding is that it's pretty much how it started, but proved to be impractical. Meaning, we absolutely can quantify the work any algorithm will take, but for that we need to define the architecture first, i.e., to specify what is the unit of work, what are operations available to us. And, sure, we can do that for any given task (and we always could), but it's usually too messy to include that in general discussions about algorithms, not specifically concerned with the architecture.


Yep, that's exactly why we moved to asymptotic analysis. I was just hoping analytic combinatorics would help us move back to something like this.


In cryptography it is common to get exact estimates for the number of bit operations required to break a cipher. Asymptotic analysis is great in some circumstances. It for example explains why quicksort is faster than bubble sort for large inputs. Beyond that you want some empirical results.


I am one of these people! I am one of a handful of people who speak my ancestral language, Kiksht. I am lucky to be uniquely well-suited to this work, as I am (as far as I know) the lone person from my tribe whose academic research background is in linguistics, NLP, and ML. (We have, e.g., linguists, but very few computational linguists.)

So far I have not had that much luck getting the models to learn the Kiksht grammar and morphology via in-context learning; I think the model will have to be trained on the corpus to actually work for it. I think this mostly makes sense, since they have functionally nothing in common with Western languages.

To illustrate the point a bit: the bulk of training data is still English, and in English, the semantics of a sentence are mainly derived from the specific order in which the words appear, mostly because it lost its cases some centuries ago. Its morphology is mainly "derivational" and mainly suffixal, meaning that words can be arbitrarily complicated by adding suffixes to them. Word order is so baked into English that we sometimes insert words into sentences simply to make the word order sensible. For example, when we say "it's raining outside", the "it's" refers to nothing at all—it is there entirely because the word order of English demands that it exist.

Kiksht, in contrast, is completely different. Its semantics are nearly entirely derived from the triple-prefixal structure of (in particular) verbs. Word ordering almost does not matter. There are, like, 12 tenses, and some of them require both a prefix and a reflexive suffix. Verbs are often 1 or 2 characters, and with the prefix structure a single verb can often be a complete sentence. And so on.

I will continue working on this because I think it will eventually be of help. But right now the deep learning that has been most helpful to me has been to do things like computational typology. For example, discovering the "vowel inventory" of a language is shockingly hard. Languages have somewhat consistent consonants, but discovering all the varieties of `a` that one can say in a language is very hard, and deep learning is strangely good at it.


Awesome. Good luck to you!

I am also working on low-resource languages (in Central America, but not my heritage). I see on Wikipedia [0] it seems it's a case of revival. Are you collecting resources/data or using existing? (I see some links on Wikipedia).

[0] https://en.wikipedia.org/wiki/Upper_Chinook_language


We are fortunate to have a (comparatively) large number of written and recorded language artifacts. Kiksht (and the Chinookan languages generally) were heavily studied in the early 1900s by linguists like Sapir.

Re: revival, the Wikipedia article is a little misleading: Gladys was the last person whose first language was Kiksht, not the last speaker. And, in any event, languages are constantly changing. If we had been left alone in 1804 it would be different now than it was then. We will mold the language to our current context just like any other people.


Super interesting, thank you very much for sharing your thoughts!

HN is still one of the few places on the internet to get such esoteric, expert and intellectually stimulating content. It's like an Island where the spirit of 'the old internet' still lives on.


I am applying for graduate school (after 20 years in the software industry) with the intent of studying computational linguistics; specifically, for the documentation and support of dying/dead languages.

While I am not indigenous, I hope to help alleviate this problem. I'd love to hear about your research!


I did research on entirely unrelated NLP, actually. I worked on search for a while (I am co-recipient of the SIGIR best paper '17) and a bit on space- and memory-efficient NLP tasks.


Still, that's cool. Is this the paper in question? https://dl.acm.org/doi/10.1145/3077136.3080789


That’s the one!


Wow, Kiksht sounds like a pretty cool language! Are there any resources you'd recommend for the language itself? I'm mostly curious about the whole "a verb with prefix structure can be a whole sentence" thing, that sounds like a pretty cool language feature!


> I'm mostly curious about the whole "a verb with prefix structure can be a whole sentence" thing, that sounds like a pretty cool language feature!

That's a fairly common language feature; such languages are generally called "agglutinating".

Prominent examples of agglutinating languages are the Eskimo languages, Turkic languages, and Finnish.

https://archive.is/QQnB6

There should be no shortage of resources available if you want to learn Turkish or Finnish.



So, bad news. Culturally, the Wasq'u consider Kiksht something that is for the community rather than outsiders. So unfortunately I think it will be extremely challenging to find someone to teach you, or resources to teach yourself.


How do you combine that feeling/observation together with what you're working on now, which I'm guessing you'll eventually want to publish/make available somehow? Or maybe I misunderstand what the "it will eventually be of help" part of your first comment.


I myself am very conflicted. I will always do what the elders say but opening things up greatly enhances the chance of survival.


Yeah, I understand, somewhat of a dilemma. Still, I wish you luck and hope that you manage to find a way that is acceptable to everyone involved.


Does that present philosophical questions on if an AI is part of the community or an outsider?


I'm not sure that the AI itself has many ethical complications. Many terms of service grant AI companies ownership of the data you supply to them, though, and that's very problematic for many tribes, which consider the language sacred and not to be shared.


I wonder why their language is dying. /s


Having a language that only your in-group can speak is a good survival strategy


Good luck, I wish you the best. I think you will almost certainly need to create a LoRA and fine-tune an existing model. Is there enough written material available? I think this would be a valuable effort for humanity, as I think the more languages we can model, the more powerful our models will become, because they embody different semantic structures with different strengths. (Beyond the obvious benefits of language preservation.)


There is more material than you'd normally find, but it is very hard to even fine-tune at that volume, unfortunately. I think we might be able to bootstrap something like that with a shared corpus of related languages, though.


You can always fine-tune with the corpus you have and then try in-context learning on top of the fine-tuning, even if it's insufficient. Then with that - and perhaps augmenting with RAG against the corpus - you might be able to build a context that's stable enough in the language that you can generate human-mediated synthetic corpora and other reinforcement to enrich the LoRA.


> I will continue working on this because I think it will eventually be of help.

They say language shapes thought. Having an LLM speak such a different language natively seems like it would uncover so much about how these models work, as a side effect of helping preserve Kiksht. What a cool place to be!


Almost sounds like Cebuano / Waray-Waray in that sense.


I was wondering about what the limitations were.

Lots of languages, even Indo-European languages, have very different word order from English or a much less significant word order.


According to Wikipedia there were 69 fluent speakers of Kiksht in 1990, and the last one passed away in 2012. How did you learn the language?

https://en.wikipedia.org/wiki/Upper_Chinook_language


I learned it from my grandmother and from Gladys's grandkid. Gladys was the last person whose first language was Kiksht, not the last person who speaks it.


FYI, the Wikipedia article currently states:

> The last fully fluent speaker of Kiksht, Gladys Thompson, died in July 2012

Which is from https://web.archive.org/web/20191010153203/http://www.opb.or...

Maybe there are some sources talking about fully fluent people still being alive? As currently the article gives the impression they were the last person to "fully" speak it.


I’m aware and I think people do not care much to correct it. The tribe has had consistently bad experiences with outsiders attempting to document this kind of thing and I sense that people are mostly wanting to be left alone at this point. There is some possibility that I would make a different decision, but it’s not for me to decide, it’s for the elders.


This is the way ;)


I really love this article. I made the transition in the opposite direction: I am a distributed systems engineer who found himself working on a collaborative text editor, and initially it was very unintuitive to me that text editors were not a completely solved problem.

One of the most confusing things about the space is that you only learn after using every high-level editing library (Lexical, etc.) that ProseMirror really is the right abstraction, a masterpiece of engineering and architecture. The reason this is unintuitive is because ProseMirror is essentially the same level of abstraction as a compiler middle-end, and it's surprising that there are no higher-level abstractions that seem to work better in practice. If you are doing something remotely non-trivial, you will need the customization that only ProseMirror provides, and you will learn that the hard way. You will need the decade of bug-fixes for various touch devices and old browsers, you'll need the careful work that has gone into performance, and so on.

For a long time the missing pieces were that (1) PM defaults are not what the vast, vast majority of users want, and (2) it is nearly impossible to get it to work cleanly with React. react-prosemirror, in my view, is the solution. Every day I am happy that I get to work with it, and I am not sure what we'd do without it.


>If you are doing something remotely non-trivial, you will need the customization that only ProseMirror provides, and you will learn that the hard way.

As far as I know, Meta's Lexical [1] (used by Facebook, WhatsApp, and Instagram) goes in a similar direction: customizable, but it takes time to get used to it.

[1] https://github.com/facebook/lexical


Thanks for this perspective.

Does ProseMirror handle large files well? One thing that impresses me hugely about browsers is that they all seem to render pages of seemingly unlimited size robustly and with speed (for example, whole novels on Gutenberg.org).

In my experience no similar editor, including Google Docs, can deal with documents longer than 10-12 pages, but as far as I know I haven't tried anything built on ProseMirror.


>Does ProseMirror handle large files well?

The official example page for ProseMirror includes the full text of Herman Melville's Moby Dick[0].

[0]https://handlewithcarecollective.github.io/react-prosemirror...


Holy shit! Thanks a ton


I’d love to hear what people are reading to actually understand this. At this point I will openly admit that I do not know if what’s happening in Argentina is good or even on purpose. I sense that everything I read is at best motivated reasoning and at worst propaganda, and I’m not skilled enough to determine what is correct and what is not.


One of the things I like about this paper is the pristine, crystal-clear English it uses to describe things I thought I knew extremely well:

> Processes are the software analog of processors. They are the units of software modularity, service, failure and repair. The operating system kernel running in each processor provides multiple processes, each with a one gigabyte virtual address space. Processes in a processor may share memory, but for fault containment, and to allow processes to migrate to other processors for load balancing, sharing of data among processes is frowned upon. Rather, processes communicate via messages. They only share code segments.

"Processes are the software analog of processors." Yes of course. How easy it is to never ponder such a thing.


> "Processes are the software analog of processors." Yes of course.

Maybe this analogy worked in 1986 when hardware was a lot less reliable, but I don't think it goes very far these days: processes die all the time (normal termination, crash, get killed). When was the last time a processor or core died on you? In fact, according to this analogy most of us are running MS-DOS equivalent of systems since if your processor or core dies, your machine dies. And I don't see this changing any time soon.


I think the point they're making is that a process is a software abstraction that provides programmers the illusion of having a processor all to themselves. So in that sense the process is a software "analog" to the processor.


>When was the last time a processor or core died on you?

Almost every day. You may never experience it if you only manage tens or hundreds of cores, but when you manage hundreds of thousands of cores, it happens.


I'm not sure if you're being serious, or delivering scathing sarcasm.


I am being serious. There is a lot of value in making complicated things trivially understandable. I think it's not an accident, for example, that many of the highest-cited academic papers in CS are survey papers.


Apologies from my cynical self & Poe's Law.


There is one other way in which you are not a failure, and I am sure that I am not alone in thinking about it: I have been reading your words for 15 years, from the time I was a baby CS major. That's really true, e.g., here[1] is a comment from 11 years ago where I mentioned that a lisp you wrote for the Apple //e informed a lisp for the Apple //e that I wrote.

Especially in my college years (2009-2013, ish), the world seemed smaller, and to me, a lot of what you wrote was like looking through a keyhole into the real world. Grown-ups can apparently write lisp at JPL[2]! Google was chaotic[3][4][5][6] to work in 2000, especially if you commuted from Burbank. A name change[7]. And, variously, surveillance, more lisp stuff, etc. Now I'm an adult and I still haven't professionally written lisp, but I'm glad to have read about it anyway.

Anyway, it might surprise you to learn that when I think of your writing, though, I think of your HN comments first. There is a kind of influence that you can only get with a steady, unthanked build-up of seemingly-small contributions over a long period of time. In the right place they compound. I probably cite you to other engineers once a month or so.

[1]: https://news.ycombinator.com/item?id=6199857#6200414 [2]: https://flownet.com/gat/jpl-lisp.html [3]: https://web.archive.org/web/20080212170839/https://xooglers.... [4]: https://web.archive.org/web/20090408003924/http://xooglers.b... [5]: https://web.archive.org/web/20090408013612/http://xooglers.b... [6]: https://web.archive.org/web/20090408003913/http://xooglers.b... [7]: https://flownet.com/ron/eg-rg-faq.html


I think of GitHub as a sad story. There are probably going to be at least 3 public companies that should have "just" been GitHub features: GitLab (GitHub Enterprise), Sourcegraph (search), and Linear (GitHub Issues). There are dozens of upstarts "unbundling" features of GitHub that are genuinely useful, but which are under-loved and under-invested in, and surely some of those will become successful too. It feels like GitHub continuously invents the future, and then waits for someone else to make it truly great.

It is so painful to watch because I love GitHub so much. I graduated college in 2013, which means I started programming right when they got started. I read their dev blog every single day, eagerly waiting for new features (which were released every couple days). I watched their team page grow, I looked carefully at what they did to deserve a spot at such a cool company. I scoured the projects they contributed back to for hints about what good code looked like (my favorite was Vicent Martí's contributions to libgit2). I eagerly taught my friends how to use git because university taught subversion. I wrote my own Rails tutorials as a way to learn the magic.

But it's been years now, and it's not an easy love. It's work! It's so painful to watch it get continuously lapped by upstarts. I always just thought the core offerings (Pull Requests and Issues) would get better eventually. And now, 14 years later, they finally are. But very slowly. I really want to believe, but after 16 years, it is starting to sink in that GitHub might just be a place to store files.

The magic is still in there. I think there are lots of people like me, who want to believe. But it will take real agency to do it, and that's really hard to muster at this stage in a company's life.


I like GitHub precisely because it didn't try to capture the entire market of issue trackers and code search too aggressively.

If GitHub sub-issues had existed even in an inferior form back in 2019, developer-targeted trackers like Linear and Shortcut would have had a hard time existing, and all of their ideas (some of which have informed the UX of today's GitHub sub-issues release!) would have been lost to time.

Now, perhaps this was not value-maximizing to Microsoft, or value-maximizing to companies who now need an extra license for Linear - but I would argue that for actual developer experience, GitHub fostering a spirit of innovation through its own stagnation has created a vibrant ecosystem better than anything any company could do themselves.


Yes this is one of the reasons this discussion is so tricky. I just believe that if I maintain a popular project on GitHub I should not be automatically consigned to worst-in-class experiences for Issues and code review. I understand this is mildly controversial but I have had maintainer status on a few top 0.01% GitHub repositories and I have seen tools that do not suck, and so my opinion is that a better world is possible.

Again, I say all of this entirely with love. I love GitHub. I have used GitHub since I started programming. I want them to win.


> automatically consigned to worst-in-class experiences

You said it perfectly. This is why there are a lot of people willing to create better experiences on top of GitHub’s API.

I created CodeApprove (https://codeapprove.com) to improve GitHub’s code review and there’s also Graphite, CodePeer, Reviewable, and others doing a great job.


Any of them support dependencies between issues? Without it, it's still all worst-in-class experience for the purpose of managing work. Yes, this is distinct from issue tracking, which is an input, and usually external, but people seem to have this weird habit of trying to manage software projects using GitHub or GitLab issues.


> There are probably going to be at least 3 public companies that should have "just" been GitHub features: GitLab (GitHub Enterprise), Sourcegraph (search), and Linear (GitHub Issues).

I don't agree, and I cannot understand what train of thought would lead one to conclude that each individual feature that's a crucial part of any developer's workflow should be broken out into external companies without any good reason at all.

Any developer's workflow consists of a) ticketing, b) revision control, c) auditing code and change history, d) CICD. Obviously a) b) and c) are tightly coupled and you cannot have one without the other, whereas d) invariably leads to a) b) and c) in any incident response scenario. There is no argument that justifies any alternative to a unified developer experience.


Sorry if this was not clear, but I am not making a normative statement that each feature should be broken out into its own tool. I'm saying that the revenue ramp on each these products is, empirically, good enough to be on track to IPO. So GitHub is apparently doing a bad enough job in each of those areas that three separate companies were able to break a feature out into a dedicated product, sell it on its own, and ramp revenue fast enough to support a venture class business. Absolutely wild stuff.


I feel like ticketing is something that has been pasted on for corporate reasons. Keeping track of the work to be done doesn't require a ticketing system; anyone remember using Story Cards? A whiteboard?


Well, if they wanted to implement keeping track of work to be done, they'd have long ago implemented subissues, sub-subissues, and sub^n-issues, because obviously work does not break down into a list, or a one-level-deep tree - the natural data structure for breaking down work is a directed graph (and if you really must do an 80/20 on it, then a tree).

Sadly, to this day, out of popular software packages used in software companies, only Microsoft Project and maybe Jira get this right. Unfortunately, they have plethora of unrelated issues that are big enough that they drive the industry towards silliness of managing work in tools like Trello or Github/Gitlab issues.

EDIT: I sometimes wonder how much of how Agile implementations look is a consequence of simplistic tooling. When your team's work planner is fundamentally unable to represent the basic idea that tasks can depend on other tasks, you can't really plan ahead much further than a sprint. So is the Agile cadence really about avoiding the Folly of the Waterfall, or is it just shitty tooling that makes it exponentially hard to think ahead for more than two weeks? Did we invent "epics" because we can't link together multiple tasks on a Kanban board?

And then, how many companies SCRUM the living hell out of their developers, while PMs secretly plan the whole thing ahead in MS Project to keep the project on track towards some goal, and then pray no one on the dev team spots a Gantt chart open on their screen?

(Personally, I worked under one once; they begrudgingly admitted to managing everything in MS Project once I started asking if I could get a license, because even Emacs Org Mode sucks at the job if the job is to break down a month's worth of work on an entire deliverable.)

EDIT2:

> Keeping track of the work to be done doesn't require a ticketing system, anyone remember using Story Cards? A whiteboard?

Anyone remembers a project plan? Work breakdown structure? I know PERT looks to people like a Borg Cube, but then the status quo in software dev looks like Pakled technology. "We are devs. We scrum. Trello make us go!"


I wouldn’t say Jira gets it right.

I find its sub-tasks a little half-baked, and I feel like portions of Jira weren't designed with sub-tasks in mind. We actually avoid them completely at my current work.

It’s understandable why a lot of systems don’t support sub-tasks. Hierarchical data structures are not the easiest to deal with and come with a litany of edge cases. It can also be quite overwhelming UX wise.


> I feel like portions of Jira weren’t designed with sub-tasks in mind

That's true, and I can see why you avoid them. Last time I touched Jira, I quickly learned that subtasks are annoying.

Dependency relationships between tasks are more general and better overall. Unfortunately,

> Hierarchical data structures are not the easiest to deal with and come with a litany of edge cases. It can also be quite overwhelming UX wise.

Don't know of any software that fully figured it out. Even MS Project, which does the relationship and scheduling parts a-ok, is a PITA to use in general, and has some pretty annoying bugs that made me stop using it. Being able to irrecoverably brick the auto-scheduler and have it suddenly crash on opening the project file is a kind of show-stopper.

Another tricky challenge here is scope. Planning, scheduling and doing are three different mindsets, and it's tough to balance them all in a single tool. Someone who figures out how to do it well stands to make a lot of money.


> I sometimes wonder how much of how Agile implementations look is a consequence of simplistic tooling. When your team's work planner is fundamentally unable to represent the basic idea that tasks can depend on other tasks

In the agile world it's accepted that a task that today looks like it depends on another task may not have that dependency next week. Planning involves scheduling what the stakeholders prioritize now that can be done, and whatever lands in the "we can't do that until we do other thing" doesn't even get scheduled.


> Planning involves scheduling what the stakeholders prioritize now that can be done, and whatever lands in the "we can't do that until we do other thing" doesn't even get scheduled.

Planning and scheduling/prioritizing are two distinct processes people often confuse. Planning is about identifying what needs to be done, and breaking it down into a network of tasks joined by dependencies. Scheduling is figuring out who's going to do what and when they're going to do it.

Planning is figuring out your service needs e-mail verification, password reset, and some transactional messages, to stuff like "design e-mail templates" "write templates for {verification, reset, account confirmation} e-mails", "set up infra for sending e-mails", "add {verification, reset} URL flows", "make it send {verification, reset, confirmation} e-mail on {{relevant action}}", etc. Scheduling is me assigning "design e-mail templates" to Joe in graphics dept. and giving him 3 days for it, assigning you the "set up infra" task and giving you a week, and knowing the "make it send ..." tasks won't start until next sprint, at which point we'll find someone to do them, etc.

In naive Agile reality, while <<whatever lands in the "we can't do that until we do other thing" doesn't even get scheduled>>, it gets recorded somewhere (e.g. product backlog), but it won't have its dependency relationship encoded. So every sprint, someone goes over a heap of such tasks, and has to manually reconstruct those relationships, to the degree it lets them plan out the next sprint or two. This both adds work and limits the ability to plan forward (you ain't going to be even thinking about stuff that's obviously beyond the horizon).

While it's true that stakeholder priorities change (as they should), and this affects scheduling, it affects planning much less, as the dependencies aren't about order in time, they're about order in execution. Dependencies aren't volatile - no matter what the stakeholders think this week, you won't be able to work on "make it send emails" until you can send e-mails (infra task) and have e-mails to send (design/write) and have them be useful (endpoints/routes for links). This is useful information to have encoded, particularly with more interdependent features - not only the system will help you do the scheduling correctly, it'll also let you quickly tell that unscheduling work on X now will delay work on Y down the line, etc.


Planning and scheduling/prioritizing are split into separate things by people who think less about shipping working code and more about counting beans.


Shipping working code is a means to an end, not end unto itself. Planning and scheduling/prioritizing are split into separate things by people who think about and then tell the people shipping working code what they should be writing and shipping.


I hate bureaucracy and will actively question the necessity of any imposition of bureaucracy, and resist its imposition if it doesn't seem justified.

Nonetheless, there is use to more than just sticky notes. It tends to come with scale, and trying to coordinate and align teams of teams of teams of teams, etc.

Additionally, sticky notes are impractical for remote teams (would there be some always-on webcam pointed at them?)


Don’t forget code review. I created CodeApprove, but there are so many other good alternatives that could have been GitHub features (Graphite, CodePeer, Reviewable, etc).


FWIW, this specific feature - what they are now calling sub-issues - is actually better described as a framework for modeling proper parent-child relationships in their system, which is something quite hard to get right, mainly because it has to work somehow with the existing issues feature set. People building this feature from scratch (e.g. Linear) find it trivial to solve, because they didn't have any backwards-compatibility issues to worry about. This is, of course, absolutely GitHub's fault for not designing it to accommodate such a thing easily in the first place, but the people who built this feature are probably not the same people who made that original decision.

They've been working on some version of this feature for several years now in various iterations. I believe this is either their third or fourth attempt to get it right - I was trialing a beta of some previous iteration of it a few years ago and it was incomplete and not fully thought out, which must be why they dropped it. I'd trust the feature to be at least decent now, because of how many attempts they've had at it.

But yeah, if I were a company like Zenhub I would probably be a bit worried by this announcement, since it is almost inevitable that this specific feature is going to be enough for people to no longer need third-party management of issues. I know that in a previous company I worked for, that specific feature (proper parent-child relationships) was the reason they used Zenhub, and the same goes for my current company using Linear.


> FWIW, this specific feature - what they are now calling sub-issues - is actually better described as a framework for modeling proper parent-child relationships in their system, which is something quite hard to get right

That's the wrong framework to use. A parent/child relationship is a special case of a dependency between tasks, which can be modeled more generally as start/finish relationships, i.e. one of Start-to-Start, Start-to-Finish, Finish-to-Start, and Finish-to-Finish; the parent-child relationship is Finish-to-Finish here. This distinction starts to matter if you want to plan out more work items than you can quickly eyeball on a Kanban board, and it's all well-trodden ground with nearly a century's worth of prior art, to which our entire industry is entirely oblivious.

Useful search terms: PERT, Gantt charts, critical path, PMBOK.

Popular software occasionally spotted in use in our industry (usually by PMs, in secret) that supports it: MS Project, maybe Jira, ... is there anything else?


Once you have sizing, the ability to make an issue dependent, estimated start / end, don’t you have everything needed to generate a PERT / Gantt chart?


You also need the proper dependency types[0] I mentioned, though I'd say you can get away with FS for dependencies and FF for subtasks.

Beyond merely generating a Gantt chart, when you have dependencies and estimated durations you can start estimating how long things will take and which tasks can take longer or be delayed without impacting overall project time; add the ability to provide constraints on start and end times, and you can automatically schedule work (and re-schedule as things happen).

Most of these things are in every system in some form or other, but dependencies seem to be missing almost everywhere - and dependencies are the one key component that enable major benefits and turn the system into a proper project planning tool.
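
To illustrate why dependencies are the key ingredient, here is a toy sketch of the underlying scheduling math (entirely made-up task names and durations, echoing the e-mail example upthread, and not any particular tool's model): a forward pass gives earliest finish times, a backward pass gives latest finish times, and the difference is the slack, i.e. which tasks can slip without moving the end date.

    # Finish-to-start dependencies only, listed so that every task
    # appears after the tasks it depends on.
    tasks = {                     # name: (duration_in_days, depends_on)
        "design":    (3, []),
        "infra":     (5, []),
        "templates": (1, ["design"]),
        "send":      (4, ["infra", "templates"]),
    }

    # Forward pass: earliest finish of each task.
    earliest = {}
    for name, (dur, deps) in tasks.items():
        earliest[name] = max((earliest[d] for d in deps), default=0) + dur
    project_end = max(earliest.values())

    # Backward pass: latest finish that doesn't delay the project.
    latest = {name: project_end for name in tasks}
    for name in reversed(list(tasks)):
        dur, deps = tasks[name]
        for d in deps:
            latest[d] = min(latest[d], latest[name] - dur)

    for name in tasks:
        print(name, "slack:", latest[name] - earliest[name])
    # design and templates have slack 1; infra and send are on the critical path.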

--

[0] - https://en.wikipedia.org/wiki/Dependency_(project_management...


Sorry, the goal here is not to trivialize GitHub, or suggest they suck at engineering, or even to say this is easy. I am just saying my own perspective, which is that I wish the core properties (pull requests, issues) had improved a long time ago. I have been a maintainer of a top 0.01% project, and I have seen it get in the way where other tools would have been tolerable. It hasn't always been easy to be a fan.


GitHub is a hugely successful company. They make a shit load of money. Your idea of success is not in congruence with actual success.


Ok, let's try from another angle. Close your eyes and picture the ideal, best, most successful product in this space, for whatever definitions those words mean to you.

Everyone is different, but I think most people would not picture a source control product where search, issues, and on-prem are each so terrible that the default solution for that feature is to use a completely different product from a totally different company which solves that specific issue (Sourcegraph, Linear, GitLab, respectively).


> I started programming right when they got started.

Then let me tell you about SourceForge. SourceForge, around the time that GitHub started, had a lot of these features (or at least, the ones that were standard practice at the time). It was the de facto home of open source software development at the time, but it was also a megalith that tried to do everything.

GitHub was a breath of fresh air precisely because it had just the right balance of minimalism and features. There were plenty of truly minimalist hosts out there, so we didn't need any more of those. But we also had lots of bloated and slow code hosts. And it deserves to be stressed that the bloated ones were just as unusable as the minimalist ones, even while being (again, for the time) as feature-complete as they come.

Bloated products can't pivot. New features often don't integrate well with existing ones, and they take a long time to develop. We're seeing this with GitHub too, as feature velocity has dropped. But the difference is that GitHub's feature set, particularly their core initial feature set, was exceptionally well designed. They've lost a bit of this edge over time, but overall I think they've done a well enough job of balancing the thoughtfulness of their build-out with doing things at a reasonable speed.

I just want to emphasize this point because I think it gets lost, especially if you're not familiar with what the competition of the time looked like. GitHub absolutely would not have made it to where it is today if they had rushed to add features. For a comparison of what it looks like to build features first and put design on the back burner, see GitLab. There's a reason why I still find their product difficult to navigate despite it being one of the main tools I use in my day-to-day work. And I fear that if anything, GitHub is succumbing to the lure of feature count and becoming its own bloated monolith over time, even with their slow pace.

