Hacker News new | past | comments | ask | show | jobs | submit login

> The distinction between code and data is very real, and dates back to at least the original Harvard Architecture machine in 1944. Things like W^X and stack canaries have been around for decades too.

You are right in some sense, but wrong in another:

You can easily write an interpreter in a Harvard Architecture machine. You can even do it accidentally for an ad-hoc 'language'. An interpreter naturally treats data as code.

See eg https://gwern.net/turing-complete#security-implications




An interpreter still models execution at a nested level of abstraction — data remains distinguishable from code.


In some sense, yes. In some other sense: in an interpreter data controls behaviour in a Turing complete way. That's functionally equivalent to code.


And in reality, it's the other way around: Harvard architecture is the interpreter written on top of the runtime of physics. Reality does not distinguish between code and data. Formal constructs we invent might, and it almost works fine in theory (just don't look too close at the boundary between "code" and "data") - but you can't instantiate such systems directly, you're building them inside reality that does not support code/data distinction.

(This is part of the reason why an attacker having physical access to target machine means the target is pwnd. You can enforce whatever constraints and concocted abstraction boundaries you like in your software and hardware; my electron gun doesn't care.)

In terms of practical systems:

- There is no reason to believe human minds internally distinguish between code and data - it would be a weirdly specific and unnatural thing to do;

- LLMs and deep neural models as they exist today do NOT support code/data distinction at the prompt / input level.

- Neither natural nor formal languages we use support code/data distinction. Not that we stick to the formal definitions in communication anyway.

- You could try building a generic AI with strict code/data separation at input level, but the moment you try to have it interact with the real world, even if just by text written by other people, you'll quickly discover that nothing in reality supports code/data distinction. It can't, because it's nonsense - it's a simplification we invented to make computer science more tractable by 80/20-ing the problem space.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: