It's part of the tokenizer. t contains data in the format [char 0][char 1][base-64 encoded token id …], where the id keeps going for as long as the characters are smaller than 'b'. So it's ('++', '#m'), ('--', '%am'), ('*@', 'R<^1c'), and so on. '@' is used as a wildcard. As you can see, this part parses operators.
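A rough sketch of how you could walk such a table (hypothetical code, not the actual OTCC source; the table excerpt and the base-64 folding are guesses on my part):

    #include <stdio.h>

    int main(void) {
        /* Illustrative excerpt only -- the real table in OTCC is longer
         * and the exact id encoding may differ. */
        const char *t = "++#m--%am*@R<^1c+@.B#d";
        const char *p = t;
        while (*p) {
            char op[3] = { p[0], p[1], 0 };  /* two-character operator, '@' acts as a wildcard */
            p += 2;
            int id = 0, c;
            do {                             /* id bytes keep coming while they are below 'b' */
                c = *p++;
                id = id * 64 + (c & 63);     /* assumed base-64 fold; the real mapping may differ */
            } while (c < 'b' && *p);
            printf("op %-2s -> id %d\n", op, id);
        }
        return 0;
    }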
First thought would be "opcodes", but on a closer look the operators in there seem to be relevant. Maybe this encodes their meaning/relevant opcode and precedence?
The unary operators ++ and -- have a higher precedence, so they come first. Then there are "*@R<^1c", "/@%[_[H3c", "%@%[_[H3c" (the same as for '/', though), "+@.B#d" and so on.
I did not have time to read the source (my compile finished a minute ago), but maybe you can now figure out the rest? :)
//edit: ha, edflsafoiewq and xashor were a bit faster and more thorough. But nice to see we agree on the overall idea.
True, although "the largest QR code" is substantially more dense than most people have ever encountered: https://en.m.wikipedia.org/wiki/File:Qr-code-ver-40.svg . People might not realize that QR codes scale up and down very gracefully; most QR codes that people have ever seen probably encode less than 80 bits.
Reminds me of a project a friend and I took part in, sending each other source code via postal mail. I ended up sending a QR code printed on a film negative [0] to pack about 1 KB of data. Bellard's code is way more "information-dense" of course! :-)
That's an incredibly kind comparison. Thank you! Do you have a reference where Feynman talks about this? I'm finding this [0]:
"Arline entered the nearby Albuquerque sanatorium, from where she wrote him letters in code — for the sheer fun of it, because she knew how he cherished puzzles, but the correspondence alarmed the military censors at the laboratory’s Intelligence Office. Tasked with abating any breaches to the secrecy of the operation, they cautioned Feynman that coded messages were against the rules and demanded that his wife include a key in each letter to help them decipher it. This only amplified Arline’s sense of fun — she began cutting holes into her letters, covering passages with ink, and even mail-ordered a jigsaw puzzle kit with which to cut up the pages and completely confound the agents."
In Montreal, the really really great people at Borealis [0] made it extremely easy. I sent them a 300 DPI file by email, and they printed negatives from those files in a few days. Each negative cost about $8 to print.
Finding a lab that did this online was pretty difficult at the time. If others have recommendations -- please chime in! Otherwise, I would strongly recommend considering Borealis (who I'm sure can ship the negatives by mail -- otherwise reach out and I can facilitate the mailing as I live here ha)
It was an art project -- called "Postal Codes" -- where an artist friend and I would correspond by creating sketches (typically using p5.js or three.js) and sending them by "snail" mail. There is a bit more information here [0], here [1] and here [2].
It started as that, but it has since turned into a completely usable C compiler that supports 32-bit and 64-bit x86 and ARM platforms. There are still developers adding to the project every day.
I absolutely adore TCC, fwiw. It’s the base project that this one is derived from.
Like all of bellard’s work, this is legit and also extraordinarily high quality. At one point before I got bored I basically had a game engine that took C source code as scripting inputs. Totally stupid, and worked great thanks to this.
Speaking of mad genius ideas, my mind was blown by https://bellard.org/tcc/tccboot.html, a boot loader able to compile and boot a Linux kernel directly from source code.
I wonder whether Fabrice hangs out on HN, Reddit, or some chat channels?
I'd very much like to know his views on topics like Rust, RISC-V, blockchain, etc. I'm curious about his workflow, dev machine, preferred distros, editors, and the like :D
An AMA or a technical interview done by someone knowledgeable in the field would be awesome!
He has his own Emacs clone named QEmacs[1], which I think he uses personally. He uses Subversion as his version control system (at least for developing QuickJS), according to this post[2] by him on the quickjs-devel mailing list.
Linus has his own port of the venerable MicroEmacs[1]. As does Walter Bright[2], by the way.
qemacs is actually quite advanced, and even includes a basic HTML renderer (for the help). I was half joking that with quickjs, one can now make a reasonably modern web browser out of bellard-ware…
If you find this level of density interesting, it may be illuminating to know that there are programmers who work with code that is as dense, if not more so, every day as part of their job, without even trying to be dense or win the IOCCC.
Of course, I am referring to the users of APL family languages:
When I look at Knuth's code snippets in CWEB I get the same vibes. He has a full implementation of garbage-collecting zero-suppressed binary decision diagrams that he used to count various things, and it's insanely elegant and minimal.
The way I approach programming in general is to put a ton of thought in up front about how to do the least amount of work possible. Is there a very simple elegant way to do this? Can I collapse it down even more? Could I do it in just one or a few source code files of reasonable size?
For some reason the thought that I will never be as productive as he is brings me peace. Huge respect. I remember looking at his entry in 2002 like it was yesterday; that was a pleasurable mind fk.
It's literally explained in the first line on the page, where he says he wrote it "in order to win the International Obfuscated C Code Contest (IOCCC) in 2002."
So it's not that it wouldn't work, but it obviously wouldn't have worked for that contest.
Without having looked at the code, I'm guessing he used unimplemented language features in the non-obfuscated version, then rewrote it to avoid using them in the obfuscated version.
I did, in my music/generative art app, where I compiled math expressions into native code using TCC as a backend. Actually, it was _so_ much easier than dynasm and other JIT libraries, and so little code, and yet it was already a great performance boost.
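The libtcc flow is tiny -- roughly this (a sketch from memory, so check libtcc.h for your version; for instance, older releases want a TCC_RELOCATE_AUTO argument to tcc_relocate while newer ones dropped it, and the "f(x)" wrapper here is just my own convention):

    #include <stdio.h>
    #include "libtcc.h"

    typedef double (*fn_t)(double);

    int main(void) {
        /* The expression would normally come from the app's UI;
         * it is hard-coded here for the example. */
        const char *src = "double f(double x) { return x * x + 2.0 * x + 1.0; }";

        TCCState *s = tcc_new();
        if (!s) return 1;
        tcc_set_output_type(s, TCC_OUTPUT_MEMORY);       /* JIT: compile into memory */
        if (tcc_compile_string(s, src) < 0) return 1;
        if (tcc_relocate(s, TCC_RELOCATE_AUTO) < 0) return 1;

        fn_t f = (fn_t)tcc_get_symbol(s, "f");           /* grab the generated code */
        if (!f) return 1;
        printf("f(3.0) = %f\n", f(3.0));

        tcc_delete(s);
        return 0;
    }

Link against libtcc (-ltcc) and that's the whole JIT.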
Suppose you want to defeat a "Reflections on Trusting Trust" attack and bootstrap a toolchain from source code you believe is clean into binaries that are clean. You need to compile llvm/clang (or gcc). You start with something small you can audit manually, use it to build binaries that can compile llvm, then use llvm to build itself to get an optimized compiler that doesn't have malicious code to recognize the password check inside the login binary and emit machine code that is not faithful to the source code.
Interesting! Still, I believe this doesn't prevent malfunction resulting from compromised silicon... like a sequence of instruction executions or memory accesses that results in a packet being sent over the network.
I wonder how one can prevent that? Even an open-source ISA could not be fully trusted, since one can't know what is implemented on the actual die at hand.
It's true that the silicon is hard to audit, but it's also pretty static, and there's a huge variety of pretty effective obfuscation that can live on top of the silicon.
You suggest that an adversary could use some compromised layer to send a packet over the network when something happens. For example, someone running `aesenclast xmm15, xmm10` could also secretly trigger a socket()/send()/close() syscall set that would transmit a packet over the network adapter. But what if the target code never calls the AES-NI instructions? A user could easily be using a library that processes the encryption in software; a simple ISA filter has an impossible task to determine which xor instructions are cryptographic and which are just ordinary compiler output. You could be running a virtual machine, in which you've loaded a browser Javascript engine, and using that Javascript engine to run a virtual machine (https://bellard.org/jslinux/), in which you're running a Python instance, which is finally performing the encryption.
Sure, if all those layers were efficient, they'd just pass it down the stack to eventually call `aesenc`, but modern software is anything but efficient. Yes, if your deeply nested Python calls "sum = num1 + num2", the x86 `add` instruction likely gets invoked somewhere, but to predict when some client code is handling a secret key or when some user input is actually a password seems really difficult.
tl;dr of a pamphlet I decided not to post: If you're [seriously] considering anything beyond "my firewall is strictly unable to report it", you're either building something like nuclear weapons or taking the first steps to sliding down the rabbit hole of paranoia, ending up in tinfoil-hat-wonderland ;-)
The topic is interesting from an academic point of view. Maybe something along the lines of "On trustworthy, verifiable execution on untrusted, adversarial hardware". Like homomorphic encryption, but I think having the user supply the input would make for a difficult programming+debugging cycle :P
Anyway, the goal of this compiler here isn't to prevent ANY kind of attack. As the page says, it is "only" a tiny C compiler, and an obfuscated one at that. The "tiny" part makes it much easier to audit than say GCC or clang (and you don't need to audit GNU binutils, either - and I've seen that code, so trust me when I say: You don't want to go that route).
//edit: Obviously I don't want to insult the fine folks investing their time into projects such as binutils, they deserve quite some praise - but as someone who is not familiar with the code base I found it to be very overwhelming.
If you wanted to solve for that, you would have to build a computer like the monster6502 [1] to run the compiler on, and visually inspect the memory contents.
The ultimate question is: How far do you need to go until you believe that the earth is indeed not flat? Do you trust your teachers? Stand on a tall building? Take a balloon ride? A plane? A spaceship? Then still, at which point can you rule out that the earth just appears to be a ball due to some yet-to-be-explained phenomenon (gravitational anomaly?) but is indeed flat? (The correct answer of course is "turtles all the way down").
For some definition of 'easy', yes. You still need to know a lot about how a CPU works. I think a project to bootstrap a PC with code generated from a completely verifiable platform would be really cool, but in practice it would be of limited usefulness above and beyond the secure boot we already have, because you're still trusting all the binary blobs you can't control (BIOS/UEFI, Intel ME, microcode, etc.).
At one point a variation of it could compile a Linux kernel (very fast, too!) and boot it from the same ISO. Of course GCC or LLVM produced better optimised code, but TCC, being so small, runs so much faster.
Another practical application is using tinycc as a backend compiler for Nim [1]. I set up my nim.cfg to default to this for a rapid edit-compile-test cycle (usually under 250 millisec), with a quick define switch to move to gcc-optimized code.
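The relevant bit of my nim.cfg looks roughly like this (a sketch of my setup; the `optspeed` define name is my own convention, not a Nim built-in):

    # default to tinycc for sub-second edit-compile-test cycles
    cc = tcc
    # `nim c -d:optspeed foo.nim` switches back to optimized gcc output
    @if optspeed:
      cc = gcc
      passC = "-O2"
    @end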
Something similar is likely possible for other programming languages that emit C.
One helpful feature for the full round-trip to an executable file is a built-in object file linker.
Also, libtcc can be used as a library to "compile a string" and then run it which is the JIT application mentioned elsewhere in this thread.
Yes, for winning IOCCC competitions! Which would normally be a great CV badge to demonstrate your skills to prospective employers, but likely of little consequence if you're Fabrice.
If you have the sort of brain that makes winning IOCCC entries, it is highly probable that a candidate employer who has not heard of it is not, in fact, prospective.
He also created QuickJS: server-side JavaScript, like Node.js but much faster at starting programs. Just type "qjs your-script.js". I use it on my server because of its low overhead to start and run my microservices. https://bellard.org/quickjs/
And around a week ago, his text compressor topped the enwik9 (Hutter Prize) benchmark. Not prize-eligible though -- that would require a single-core, non-GPU build to run in <100h. His nncp 2.1 benchmark took 140h to run on an RTX 3090.
I also wonder whether his employer might be the nicest do-whatever-you-want-in-your-free-time employer on earth.
Not only that, I wonder what kind of day-job role he has, since I assume there is significant cognitive load in his personal projects (we still think he's a human, not a superhuman, right?).
For context, my last employer was "Oh you're doing something in your free time? let me utterly destroy you." (and they did).
Founded in 2012. That's like yesterday. 80% of his stuff was done by then.
edit: Even then, he now probably has the world's nicest co-founder instead. Who else does cognitively demanding open-source software projects, made available for free online, while running a startup?
They also had a policy that everyone should spend at least 4 hours every week (outside of the 40 hour work-week) doing some personal project, or learning. The more the better. (it was a recommendation, not a requirement, and there was no monitoring).
Well, what I had in mind was a bit more ambitious than four hours. It did not infringe on their bottom line (I wasn't using their tech or competing with my day job; it was completely unrelated). But for me they were like "Oh, no, no ... we were just kidding. We really didn't mean that you should do something in your free time".
The encouragement to do something in your free time is a lie! ... They know most of the engineers don't do that. But if someone actually does something, then the lie gets exposed.
Try it yourself. Take them up on their offer and see what happens.
I'm not going to open a can of worms here but lemme suffice by saying this:
With success comes maturity, and a startup turns into a corrupt, org-chart, white-elephant company. 90% of the employees end up being cronies pushing paper. The remaining 10% are the real engine of the company (because the revenue stream has to be maintained somehow and real work still needs to be done). But those 10% need to be kept under the thumb at all costs, made to feel they're not good enough no matter what they do; otherwise they might start questioning wtf's going on in the rest of the company. Let's just say that I was one of those 10%. They did whatever they had to do to make an example out of me. I got my freedom, but I actually feel worse for the other hard workers who were left behind.
There have been many more ffmpeg developers since then. I think it was originally an MPEG-1/2/4 codec only, which is something one person can do. Although hardly anyone has done it with acceptable performance like he did; usually they like to get abstract and object-oriented for no actual benefit.
jslinux as well, it's more of a cool demo than anything useful but the "cool" factor is super high. And led to tinyemu (which underlies the current jslinux implementation). The current version can run Windows NT inside QEMU inside JSLinux. In your browser.
That reminds me, Edward Teller said "von Neumann would carry on a conversation with my 3-year-old son, and the two of them would talk as equals, and I sometimes wondered if he used the same principle when he talked to the rest of us."
* Small and easily embeddable: just a few C files, no external dependency, 210 KiB of x86 code for a simple hello world program.
* Fast interpreter with very low startup time: runs the 75000 tests of the ECMAScript Test Suite in about 100 seconds on a single core of a desktop PC. The complete life cycle of a runtime instance completes in less than 300 microseconds.
* Almost complete ES2020 support including modules, asynchronous generators and full Annex B support (legacy web compatibility).
* Passes nearly 100% of the ECMAScript Test Suite tests when selecting the ES2020 features. A summary is available at Test262 Report.
* Can compile Javascript sources to executables with no external dependency.
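To put the "small and easily embeddable" bullet in concrete terms, the whole embedding boils down to roughly this (a sketch based on quickjs.h and quickjs-libc.h as shipped; not official sample code, and details can shift between releases):

    #include <string.h>
    #include "quickjs.h"
    #include "quickjs-libc.h"   /* js_std_add_helpers(): console.log, print, scriptArgs */

    int main(int argc, char **argv) {
        JSRuntime *rt  = JS_NewRuntime();
        JSContext *ctx = JS_NewContext(rt);
        js_std_add_helpers(ctx, argc, argv);

        const char *code = "console.log('hello from embedded JS:', 6 * 7);";
        JSValue v = JS_Eval(ctx, code, strlen(code), "<embed>", JS_EVAL_TYPE_GLOBAL);
        if (JS_IsException(v))
            js_std_dump_error(ctx);      /* print the pending exception */
        JS_FreeValue(ctx, v);

        JS_FreeContext(ctx);
        JS_FreeRuntime(rt);
        return 0;
    }

Link it against libquickjs.a from the distribution (plus the usual -lm -ldl -lpthread, per its Makefile), and that's the whole embedding surface.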
I hadn't seen QuickJS before and it looks amazing. It's a small, fast implementation of JS plus wrappers for the C standard library. That standard library includes output, reading and writing files, and a lot more. It's both an interpreter and a compiler. So you can build small console apps in JS.
bellard is a programming genius and true C hacker. his works are so great, but how humble is his website! why doesn't he write blog posts? why does he have so little online presence? hmm. let us wish him a long and productive life.
>> why doesn't he write blog posts? why does he have so little online presence?
>> bellard is a programming genius and true C hacker. his works are so great ...
It's probably related. Obviously it's his and only his decision, but I wonder whether slowing down to share his knowledge and skills could have been more productive, in the sense that others could then do similar work.
Bellard is famous for starting a number of projects that have since been carried on by others.
Looking at his output some time ago, one of the main things that I found inspiring was how much good he did by sharing his work in a form that others could build on it.
Complaining that he's done too little to bring others along seems really unfair.
I wonder if we could regain the level of intellectual maturity of the 19th and 20th centuries and not criticize those who already give away so much for not working hard enough.
This is literally Animal Farm / team lead "multiplier" nonsense.
The funniest thing about the 'invert a binary tree' meme is that it genuinely is a simple, 5-liner problem that most programmers could solve in five minutes, it just sounds intimidating.
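For anyone who hasn't seen it, the whole thing is roughly this (a sketch in C; "inverting" a binary tree just means mirroring it, swapping left and right children at every node):

    struct node { int value; struct node *left, *right; };

    /* Recursively swap the children of every node and return the root. */
    struct node *invert(struct node *n) {
        if (!n) return n;
        struct node *tmp = n->left;
        n->left  = invert(n->right);
        n->right = invert(tmp);
        return n;
    }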
Also, in a saner timeline, Turing might well have received the Fields Medal.
His contributions to probability theory and morphogenesis are amazing, in addition to his being the father of computer science and a codebreaker. All of the work was done before he was 40.