Author of HumanifyJS here! I've created an LLM-based tool specifically for this, which uses LLMs at the AST level to guarantee that the code keeps working after the unminification step:
Would it be difficult to add a 'rename from scratch' feature? I mean a feature that takes normal code (as opposed to minified code) and (1) scrubs all the user's meaningful names, (2) chooses names based on the algorithm and remaining names (ie: the built-in names).
Sometimes when I refactor, I do this manually with an LLM. It is useful in at least two ways: it can reveal better (more canonical) terminology for names (eg: 'antiparallel_line' instead of 'parallel_line_opposite_direction'), and it can also reveal names that could be generalized (eg: 'find_instance_in_list' instead of 'find_animal_instance_in_animals').
What kind of question does it ask the LLM? Giving it a whole function and asking "What should we rename <variable 1>?" repeatedly until everything has been renamed?
Asking it to do it on the whole thing, then parsing the output and checking that the AST still matches?
Does it work with huge files? I'm talking about something like 50k lines.
Edit: I'm currently trying it with a mere 1.2k-line JS file (OpenAI mode) and it's only 70% done after 20 minutes. Even if it theoretically works with a 50k LOC file, I don't think you should try.
It does work with files of any size, although it is quite slow if you're using the OpenAI API. HumanifyJS works by processing each variable name separately, which keeps the context size manageable for the LLM.
I'm currently working on parallelizing the rename process, which should give orders of magnitude faster processing times for large files.
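Roughly, the per-identifier flow looks something like this (a hypothetical sketch, not HumanifyJS's actual code; `suggestName` stands in for the real LLM call, and Babel is just one way to do the scope-aware rename):

    const parser = require("@babel/parser");
    const traverse = require("@babel/traverse").default;
    const generate = require("@babel/generator").default;

    // Placeholder for the actual LLM call; the prompt wording is an assumption.
    async function suggestName(oldName, contextSource) {
      // e.g. "Here is some JavaScript: <contextSource>. Suggest a descriptive name
      // for the identifier currently called `oldName`. Answer with one identifier."
      return "descriptiveName"; // stub
    }

    async function renameIdentifiers(source) {
      const ast = parser.parse(source);
      const bindings = [];
      traverse(ast, {
        Scopable(path) {
          // collect every binding (variables, params, functions) owned by this scope
          for (const name of Object.keys(path.scope.bindings)) {
            bindings.push({ scope: path.scope, name });
          }
        },
      });
      for (const { scope, name } of bindings) {
        const suggestion = await suggestName(name, source);
        // generateUid avoids collisions; rename() is scope-aware, so behaviour is preserved
        scope.rename(name, scope.generateUid(suggestion));
      }
      return generate(ast).code;
    }

Because each rename only needs the identifier's own scope as context, the prompts stay small no matter how big the overall file is.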
> Large files may take some time to process and use a lot of tokens if you use ChatGPT. For a rough estimate, the tool takes about 2 tokens per character to process a file:
> echo "$((2 * $(wc -c < yourscript.min.js)))"
> So for reference: a minified bootstrap.min.js would take about $0.5 to un-minify using ChatGPT.
> Using humanify local is of course free, but it may take more time, be less accurate, and may not be possible with your existing hardware.
It uses smart feedback to fix the code when the LLM occasionally hiccups. You could also have a "supervisor LLM" that asserts that the resulting code matches the specification, and gives feedback if it doesn't.
It's a shame this loses one of the most useful aspects of LLM un-minifying - making sure it's actually how a person would write it. E.g. GPT-4o directly gives the exact same code (+contextual comments) with the exception of writing the for loop in the example in a natural way:
for (var index = 0; index < inputLength; index += chunkSize) {
Comparing the ASTs is useful though. Perhaps there's a way to combine the approaches - have the LLM convert, compare the ASTs, have the LLM explain the practical differences (if any) in context of the actual implementation and give it a chance to make any changes "more correct". Still not guaranteed to be perfect but significantly more "natural" resulting code.
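A rough sketch of the AST-comparison step (using @babel/parser as an example; a real tool would produce a structural diff to feed back to the model rather than a boolean):

    const parser = require("@babel/parser");

    // Strip positions, comments and identifier names so only the tree's shape remains.
    function astShape(source) {
      const ast = parser.parse(source);
      return JSON.stringify(ast, (key, value) => {
        if (["loc", "start", "end", "comments", "leadingComments", "trailingComments", "extra"].includes(key)) {
          return undefined;
        }
        return key === "name" ? "_" : value;
      });
    }

    // True only if the LLM changed nothing beyond names, whitespace and comments;
    // false signals a structural difference worth asking the model to explain.
    function sameShape(originalSource, rewrittenSource) {
      return astShape(originalSource) === astShape(rewrittenSource);
    }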
As someone who has spent countless hours and days deobfuscating malicious JavaScript by hand (manually and with some scripts I wrote), your tool is really, really impressive. Running it locally on a high-end system with an RTX 4090 and it's great. Good work :)
how do you make an LLM work on the AST level? do you just feed a normal LLM a text representation of the AST, or do you make an LLM where the basic data structure is an AST node rather than a character string (human-language word)?
The frontier models can all work with both source code and ASTs as a result of their standard training.
Knowing this raises the question: which is better to feed an LLM, source code or ASTs?
The answer is that it really depends on the use case; there are tradeoffs. For example, keeping comments intact possibly gives the model hints to reason better. On the other hand, it can be argued that a pure AST has less noise for the model to be confused by.
There are other tradeoffs as well. For example, any analysis relating to coding styles would require the full source code.
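For what it's worth, "feeding it the AST" is usually nothing fancier than serializing the parse tree as text, e.g. (acorn used here purely as an example):

    const acorn = require("acorn");

    const source = "for (var e = 0; e < n; e += c) { p(e); }";
    const ast = acorn.parse(source, { ecmaVersion: "latest" });

    // This JSON text (or some trimmed-down rendering of it) is what would go into
    // the prompt instead of, or alongside, the raw source.
    console.log(JSON.stringify(ast, null, 2));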
On a structural level it's exactly 1:1: HumanifyJS only does renames, no refactoring. It may come up with better names for variables than the original code had, though.
Came here to say Humanify is awesome both as a specific tool and in my opinion a really great way to think about how to get the most from inherently high-temperature activities like modern decoder nucleus sampling.
JS minification is fairly mechanical and comparably simple, so the inversion should be relatively easy. It would be of course tedious enough to be manually done in general, but transformations themselves are fairly limited so it is possible to read them only with some notes to track mangled identifiers.
A more general unminification or unobfuscation still seems to be an open problem. I wrote handful of programs that are intentionally obfuscated in the past and ChatGPT couldn't understand them even at the surface level in my experience. For example, a gist for my 160-byte-long Brainfuck interpreter in C had some comment trying to use GPT-4 to explain the code [1], but the "clarified version" bore zero similarity with the original code...
> JS minification is fairly mechanical and comparably simple, so the inversion should be relatively easy.
Just because a task is simple doesn't mean its inverse need be. Examples:
- multiplication / prime factorization
- deriving / integrating
- remembering the past / predicting the future
Code unobfuscation is clearly one of those difficult inverse problems, as it can be easily exacerbated by any of the following problems:
- bugs
- unused or irrelevant routines
- incorrect implementations that incidentally give the right results
In that sense, it would be fortunate if ChatGPT could give decent results at unobfuscating code, as there is no a priori expectation that it should be able to do so. It's good that you've also checked ChatGPT's code unobfuscation capabilities on a more difficult problem, but I think you've only discovered an upper limit. I wouldn't consider the example in the OP to be trivial.
Of course, it is not generalizable! In my experience though, most minifiers do only the following (a small hand-written illustration follows the list):
- Whitespace removal, which is trivially invertible.
- Comment removal, which we never expect to recover via unminification.
- Renaming to shorter names, which is tedious to track but still mechanical. And most minifiers have little understanding of underlying types anyway, so they are usually very conservative and rarely reuse the same mangled identifier for multiple uses. (Google Closure Compiler is a significant counterexample here, but it is also known to be much slower.)
- Constant folding and inlining, which is annoying but can be still tracked. Again, most minifiers are limited in their reasoning to do extensive constant folding and inlining.
- Language-specific transformations, like turning `a; b; c;` into `a, b, c;` and `if (a) b;` into `a && b;` whenever possible. They will be hard to understand if you don't know in advance, but there aren't too many of them anyway.
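To make that list concrete, here is a hand-written before/after sketch (not real minifier output) showing the whitespace removal, comment removal and renaming on a tiny function:

    // Original:
    function chunkText(input, chunkSize) {
      var chunks = [];
      // split the input into fixed-size pieces
      for (var index = 0; index < input.length; index += chunkSize) {
        chunks.push(input.slice(index, index + chunkSize));
      }
      return chunks;
    }

    // Roughly what a minifier emits (whitespace and the comment gone, names mangled):
    // function c(a,b){var d=[];for(var e=0;e<a.length;e+=b)d.push(a.slice(e,e+b));return d}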
As a result, minified code still remains comparably human-readable with some note taking and perseverance. And since these transformations are mostly local, I would expect LLMs to be able to pick them up on their own as well.
I would say the actual difficulty varies greatly. It is generally easy if you have a good guess about what the code would actually do. It would be much harder if you have nothing to go on, but usually you should have something to start with. Much like debugging, you need a detective mindset to be good at reverse engineering, and name mangling is a relatively easy obstacle to handle at this scale.
Let me give some concrete example from my old comment [1]. The full code in question was as follows, with only whitespaces added:
Many local variables should be easy to reconstruct: b -> player, c -> removePlayer, d -> playerDiv1, e -> playerDiv2, h -> playerVideo, l -> blob (we don't know which blob it is yet though). We still don't know about non-local names including t, aj, lc, Mia and m, but we are reasonably sure that it builds some DOM tree that looks like `<ytd-player><div></div><div class="ad-interrupting"><video class="html5-main-video"></div></ytd-player>`. We can also infer that `removePlayer` would be some sort of a cleanup function, as it gets eventually called in any possible control flow visible here.
Given that `a.resolve` is the final function to be executed, even later than `removePlayer`, it will be some sort of "returning" function. You will need some information about how async functions are desugared to fully understand that (and also `m.return`), but such information is not strictly necessary here. In fact, you can safely ignore `lc` and `Mia` because it eventually sets `playerVideo.src` and we are not that interested in the exact contents here. (Actually, you will fall into a rabbit hole if you are going to dissect `Mia`. Better to assume first and verify later.)
And from there you can conclude that this function constructs a certain DOM tree, sets some class after 200 ms, and then "returns" 0 if the video "ticks" or 1 on timeout, giving my initial hypothesis. I then hardened my hypothesis by looking at the blob itself, which turned out to be a 3-second-long placeholder video and fits with the supposed timeout of 5 seconds. If it were something else, then I would look further to see what I might have missed.
I believe the person you're responding to is saying that it's hard to do automated / programmatically. Yes a human can decode this trivial example without too much effort, but doing it via API in a fraction of the time and effort with a customizable amount of commentary/explanation is preferable in my opinion.
Indeed that aspect was something I failed to get initially, but I still stand by my opinion because most of my reconstruction was local. Local "reasoning" can often be done without actual reasoning, so while it's great that we can automate the local reasoning, it falls short of the full reasoning necessary for general unobfuscation.
This is, IMO, the better way to approach this problem. Minification applies rules to transform code, if we know the rules, we can reverse the process (but can't recover any lost information directly).
A nice, constrained way to use an LLM here to enhance this solution is to ask it some variation of "what should this function be named?" and feed the output to a rename refactoring function.
You could do the same for variables, or be more holistic and ask it to rename variables and add comments (but risk the LLM changing what the code does).
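A minimal sketch of that constrained setup (Babel used as an example refactoring tool; since the model only ever supplies a name, it cannot alter the code itself):

    const parser = require("@babel/parser");
    const traverse = require("@babel/traverse").default;
    const generate = require("@babel/generator").default;

    // `suggestedName` would come from a prompt such as
    // "Given this function's body, what should it be named? Answer with one identifier."
    function applyRename(source, oldName, suggestedName) {
      const ast = parser.parse(source);
      traverse(ast, {
        Program(path) {
          // scope-aware rename: updates the declaration and every reference, nothing else
          path.scope.rename(oldName, suggestedName);
        },
      });
      return generate(ast).code;
    }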
How do we end up with you pasting large blocks of code and detailed step-by-step explanations of what it does, in response to someone noting that just because process A is simple, it doesn't mean inverting A is simple?
This thread is incredibly distracting, at least 4 screenfuls to get through.
I'm really tired of the motte/bailey comments on HN on AI, where the motte is "meh the AI is useless, amateurish answer thats easy to beat" and bailey is "but it didn't name a couple global variables '''correctly'''." It verges on trolling at this point, and is at best self-absorbed and making the rest of us deal with it.
Because the original reply missed three explicit adverbs hinting that this is not a general rule (EDIT: and also mistook my comment for being dismissive). And I believe it was not in bad faith, so I went on to give more context to justify my reasoning. If you are not interested in that, please just hide it, because otherwise I can do nothing to improve the status quo, and I personally enjoyed the entire conversation.
> As a result, minified code still remains comparably human-readable with some note taking and perseverance.
At least some of the time, simply taking it and reformatting it to be unfolded across multiple lines is useful enough to make it readable/debuggable. Fixing the bug you found is likely more complex, because you have to find where it is in the original code, which, to my eyes, isn't always easy to spot.
As a point of order, code minification != code obfuscation.
Minification does tend to obfuscate as a side effect, but that is not the goal, so reversing minification becomes much easier. Obfuscation, on the other hand, can minify code, but crucially that isn't where it starts from. As the goals of minification and obfuscation differ, reversing them takes different effort, and I'd much rather attempt to reverse minification than obfuscation.
I'd also readily believe there are hundreds/thousands of examples online of reversed code minification (or "here is code X, here is code X _after_ minification") that LLMs have ingested in their training data.
Yeah, having run some state of the art obfuscated code through ChatGPT, it still fails miserably. Even what was state of the art 20 years ago it can't make heads or tails of.
> JS minification is fairly mechanical and comparably simple, so the inversion should be relatively easy.
This is stated as if it's a truism, but I can't understand how you can actually believe this. Converting `let userSignedInTimestamp = new Date()` to `let x = new Date()` is trivial, but going the other way probably requires reading and understanding the rest of the surrounding code to see in what contexts `x` is being used. Also, the rest of the code is also minified, making this even more challenging. Even if you do all that right, it's still at best a lossy conversion, since the name of the variable could capture characteristics that aren't explicitly outlined in the code at all.
You are technically correct, but I think you should try some reverse engineering to see that it is usually possible to reconstruct much of it in spite of the amount of transformation applied. I do understand that this fact might be hard to believe without any prior.
EDIT: I think I got why some comments complain I downplayed the power of LLM here. I never meant to, and I wanted to say that the unminification is a relatively easy task compared to other reverse engineering tasks. It is great we can automate the easy task, but we still have to wait for a better model to do much more.
I have tried reconstructing minified code (I thought that would be obvious from my example). It feels like it takes just a bit less thought than it did to write the code in the first place, which is definitely not something I would classify as "comparably simple".
Because of how trivial that step is, it's likely pretty easy to just take lots of code and minify it. Then you have the training data you need to learn to generate full code from minified code. If your goal is to generate additional useful training data for your LLM, it could make sense to actually do that.
I suspect, but definitely do not know, that all the coding aspects of llms work something like this. It’s such a fundamentally different problem from a paragraph, which should never be the same as any other paragraph. Seems to me that coding is a bit more like the game of go, where an absolute score can be used to guide learning. Seed the system with lots and lots of leetcode examples from reality, and then train it to write tests, and now you have a closed loop that can train itself.
If you're able to generate minified code from all the code you can find on the internet, you end up with a very large training set. Of course in some scenarios you won't know what the original variable names were, but you would expect to be able to get something very usable out of it. These things, where you can deterministically generate new and useful training data, you would expect to be used.
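For JS specifically, generating those pairs is trivial; something along these lines (terser used as the example minifier, assuming a corpus of files already on disk):

    const fs = require("fs");
    const { minify } = require("terser");

    // Produce one (minified input, original target) training example per file.
    async function makePair(file) {
      const original = fs.readFileSync(file, "utf8");
      const { code: minified } = await minify(original, {
        mangle: true,   // shorten identifier names
        compress: true, // constant folding, inlining, statement rewriting
      });
      return { input: minified, target: original };
    }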
And I can’t understand why any reasonably intelligent human feels the need to be this abrasive. You could educate but instead you had to be condescending.
Converting a picture from color to black and white is a fairly simple task. Getting back the original in color is not easy. This is of course due to data lost in the process.
Minification works in the same way. A lot of information needed for understanding the code is lost. Getting back that information can be a very demanding task.
But it is not much different from reading through badly documented code without any comments or meaningful names. In fact, much of the code that gets minified is not that bad, and thus it is often possible to infer the original code just from its structure. It is still not a trivial task, but I think my comment never implied that.
The act of reducing the length of variable names by replacing something descriptive (like "timeFactor") with something much shorter ("i") may be mechanical and simple, but it is destructive, and reversing it is not relatively easy; in fact, it's impossible to do without a fairly sophisticated understanding of what the code does. That's what the LLM did here, which isn't exactly surprising, but it is cool; being so immediately dismissive isn't cool.
I never meant to be dismissive, in fact my current job is to build a runtime for ML accelerator! I rather wanted to show that unminification is much easier than unobfuscation, and that the SOTA model is yet to do the latter.
Also, it should be noted that the name reconstruction is not a new problem and was already partly solved multiple times before the LLM era. LLM is great in that it can do this without massive retraining, but the reconstruction depends much on the local context (which was how earlier solutions approached the problem), so it doesn't really show its reasoning capability.
That's much better in that most of the original code remains present and the comments are not that far off, but its understanding of the global variables is utterly wrong (to be expected, though, as many of them serve multiple purposes).
Yep, I've tried to use LLMs to disassemble and decompile binaries (giving them the hex bytes as plaintext), they do OK on trivial/artificial cases but quickly fail after that.
That only means it’s not a legally recognised brand, but it is a brand nonetheless if people associate the two (and they do). A bit like the way people associate tissue paper with Kleenex, or photocopies with Xerox, or git with GitHub.
> The name Wi-Fi, commercially used at least as early as August 1999, was coined by the brand-consulting firm Interbrand. The Wi-Fi Alliance had hired Interbrand to create a name that was "a little catchier than 'IEEE 802.11b Direct Sequence'." According to Phil Belanger, a founding member of the Wi-Fi Alliance, the term Wi-Fi was chosen from a list of ten names that Interbrand proposed. (…)
> The name Wi-Fi is not short-form for 'Wireless Fidelity' (…) The name Wi-Fi was partly chosen because it sounds similar to Hi-Fi, which consumers take to mean high fidelity or high quality. Interbrand hoped consumers would find the name catchy, and that they would assume this wireless protocol has high fidelity because of its name.
The generative pretrained transformer was invented by OpenAI, and it seems reasonable for a company to use the name it gave to its invention in its branding.
Of course, they didn't invent generative pretraining (GP) or transformers (T), but AFAIK they were the first to publicly combine them.
I have, but only as an idiom, never literally. E.g. "Microsoft just keeps hoovering up companies", but the literal act of vacuuming is only called vacuuming.
Growing up in India over the past 4 decades, 'Xerox' was/is the default and most common word used for photocopying ... only recently have I started using/hearing the term 'photocopy'.
every town and every street had "XEROX shops" where people went to get various documents photocopied for INR 1 per page for example
It’s not only their core strength — it’s what transformers were designed to do and, arguably, it’s all they can do. Any other supposed ability to reason or even retain knowledge (rather than simply regurgitate text without ‘understanding’ its intended meaning) is just a side effect of this superhuman ability.
I see your point, but I think there's more to it. It's kind of like saying "all humans can do is perceive and produce sound, any other ability is just a side-effect". We might be focusing too much on their mechanism for "perception" and overlooking other capabilities they've developed.
Sure, but that claim wouldn't be true for humans, right? So it's a non sequitur.
The relevant claim would be: all humans can do is move around in their environments, adapt the world around them through action, observe using adaptive sensory motor systems, grow and adapt their brains and bodies in response to novel and changing environments, abstract sensory motor techniques into symbolic concepts, vocalize this using inherited systems of meaning acquired as very young children in adaption within their environments, etc.
In the case of transformers all they can do is, in fact, sample from a compression of historical texts using a weighted probability metric.
If you project both of these into "problems an office worker has"-space, then they can appear similar -- but this projection is an incredibly dumb one, and is offered as a sales pitch by charlatans looking to pretend that a system which can generate office emails can communicate.
Abstract functions are fully representable by function approximations in the limit n->inf; ie., sampling from a circle becomes a circle as samples -> infinity.
This makes all "studies" whose aim is to approximate a fully representable abstract mathematical domain irrelevant to the question.
This is just more evidence of the naivety, mendacity, and pseudoscientific basis of ML and its research.
As you sample all pixels from all photos on a mountain, the pixels don't become the mountain.
The structure of a mountain is not a pattern of pixels. So there is no function for a statistical alg to approximate, no n->infinity which makes the approximation exact.
By sampling from historical pixel patterns in previous images you can generate images in a pixel order that makes sense to a person already acquainted with what they represent. Eg., having seen a mountain (, having perspective, colour vision, depth, counterfactual simulation, imagination, ...).
In all these disagreeably dumb research papers that come out showing "world models" and the like you have the bad mathematicians and bad programmers called "AI researchers" giving a function approximation alg an abstract mathematical domain to approximate.
ie., if the goal is to "learn a circle" and you sample points from a circle, your approximation becomes exact in n->inf, because the target is *ABSTRACT*.
It's so dumb it's kinda incomprehensible. It shows what a profound lack of understanding of science is rampant across the discipline.
MNIST, Games, Chess, Circles, Rulesets, etc. are all mathematical objects (shapes, rules). It is trivial to find a mathematical approximation to a mathematical object.
The world is not made out of pixels. Models of pixel patterns are not their targets.
> all they can do is, in fact, sample from a compression of historical texts using a weighted probability metric.
I don't think that's all they can do.
I think they know more than what is explicitly stated in their training sets.
They can generalize knowledge and generalize relationships between the concepts that are in the training sets.
They're currently mediocre at it, but the results we observe from SOTA generative models are not explainable without accepting that they can create an internal model of the world that's more than just a decompression algorithm.
I'm going to step away from LLMs for a moment, but: How are video generator models capable of creating videos with accurate shadows and lighting that is consistent in the entire frame and consistent between frames?
You can't do that simply by taking a weighted average of the sections of videos you've seen in your training set.
You need to create an internal 3D model of the objects in the scene, and their relative positions in space across the length of the video. And no one told the model explicitly how to do that, it learned to do it "on its own".
>You need to create an internal 3D model of the objects in the scene, and their relative positions in space across the length of the video. And no one told the model explicitly how to do that, it learned to do it "on its own".
Compression is understanding. If you have a model which explains shadows you can compress your video data much better. Since you "understand" how shadows work.
> In the case of transformers all they can do is, in fact, sample from a compression of historical texts using a weighted probability metric.
You seem to think LLMs operate independently from humans. That doesn't happen in practice. We prompt LLMs, they don't just sample at random. We teach them new skills, share media and stories with them, work, learn and play together. It's not LLMs alone. They are pulled outside their training distribution by the user. The user brings their own unique life experience into the interaction.
Well, yes — absolutely. You could say something similar about any system with complex emergent behaviour. 'All computers can do are NAND operations and any other ability is just a side effect', or something.
However, I do think that in this case it's meaningful. The claim isn't that LLMs are genuinely exhibiting reasoning ability — I think it's quite clear to anyone who probes them for long enough that they're not. I was fooled initially too, but you soon come to realise it's a clever trick (albeit not one contrived by any of the human designers themselves). The claim is usually some pseudo-philosophical claim that the very definition of reasoning is simply 'outputting (at least some of the time) correct sentences' and so there's no more to be said. But this is just silly. It's quite obvious that being able to manipulate language and effectively have access to a vast (fuzzily encoded) database of knowledge will mean you can output true and pertinent statements a lot of the time. But this doesn't require reasoning at all.
Note that I'm not claiming that LLMs exhibit reasoning and other abilities 'as a side effect' of language manipulation ability — I'm claiming there's no reason to believe they have these abilities at all based on the available evidence. Humans are just very easily convinced by beings that seem to speak our language and are overly inclined to attribute all sorts of desires, internal thought processes and whatever else for which there is no evidence.
>I think it's quite clear to anyone who probes them for long enough that they're not.
I disagree and so do a lot of people who've used them for a long while. This is just an assertion that you wish to be true rather than something that actually is. What happens is that for some bizarre reason, for machines, lots of humans have a standard of reasoning that only exists in fiction. Devise any reasoning test you like that would cleanly separate humans from LLMs. I'll wait.
> The claim is usually some pseudo-philosophical claim that the very definition of reasoning is simply 'outputting (at least some of the time) correct sentences' and so there's no more to be said.
There is nothing philosophical or pseudo-philosophical about saying reasoning is determined by output. If anything, the opposite is what's philosophical nonsense. The idea that there exists some "real" reasoning that humans perform and "fake" reasoning that LLMs perform and yet somehow no testable way to distinguish this is purely the realm of fiction and philosophy. If you're claiming a distinction that doesn't actually distinguish, you're just making stuff up.
LLMs clearly reason. They do things, novel things that no sane mind would see a human do and call anything else. They do things that are impossible to describe as anything else unless you subscribe to what i like to call statistical magic - https://news.ycombinator.com/item?id=41141118
And all things considered, LLMs are pretty horrible memorizers. Getting one to regurgitate training data is actually really hard. There's no database of knowledge. It clearly does not work that way.
> Devise any reasoning test you like that would cleanly separate humans from LLMs. I'll wait.
Well, you don’t have to wait. Just ask basic questions about undergraduate mathematics, perhaps phrased in slightly out-of-distribution ways. It fails spectacularly almost every time and it quickly becomes apparent that the ‘understanding’ present is very surface level and deeply tied to the patterns of words themselves rather than the underlying ideas. Which is hardly surprising and not intended as some sort of insult to the engineers; frankly, it’s a miracle we can do so much with such a relatively primitive system (that was originally only designed for translation anyway).
The standard response is something about how ‘you couldn’t expect the average human to be able to do that so it’s unfair!’, but for a machine that has digested the world’s entire information output and is held up as being ‘intelligent’, this really shouldn’t be a hard task. Also, it’s not ‘fiction’ — I (and many others) can answer these questions just fine and much more robustly, albeit given some time to think. LLM output in comparison just seems random and endlessly apologetic. Which, again, is not surprising!
If you mean ‘separate the average human from LLMs’, there probably are examples that will do this (although they quickly get patched when found) — take the by-now-classic 9.9 vs 9.11 fiasco. Even if there aren’t, though, you shouldn’t be at all surprised (or impressed) that the sum of pretty much all human knowledge ever + hundreds of millions of dollars worth of computation can produce something that can look more intelligent than the average bozo. And it doesn’t require reasoning to do so — a (massive) lookup table will pretty much do.
> There is nothing philosophical or pseudo-philosophical about saying reasoning is determined by output.
I don’t agree. ‘Reasoning’ in the everyday sense isn’t defined in terms of output; it usually refers to an orderly, sequential manner of thinking whose process can be described separately from the output it produces. Surely you can conceive of a person (or a machine) that can output what sounds like the output of a reasoning process without doing any reasoning at all. Reasoning is an internal process.
Honestly — and I don’t want to sound too rude or flippant — I think all this fuss about LLMs is going to look incredibly silly when in a decade or two we really do have reasoning systems. Then it’ll be clear how primitive and bone-headed the current systems are.
this overlooks how they do it. we don't really know. it might be logical reasoning, it might be a very efficient content addressable human-knowledge-in-a-blob-of-numbers lookup table... it doesn't matter if they work, which they do, sometimes scarily well. dismissing their abilities because they 'don't reason' is missing the forest for the trees in that they'd be capable of reasoning if they were able to run sat solvers on their output mid generation.
Dismissing claims that LLMs "reason" because these machines perform no actions similar to reasoning seems pretty motivated. And I don't think "blindly take input from a reasoning capable system" counts as reasoning.
Does it? I think Blindsight (the book) had a good commentary on reason being a thing we think is a conscious process but doesn't have to be.
I think most people talking past each other are really discussing whether the GPT is conscious, has a mental model of self, that kind of thing, as long as your definition of reasoning doesn't include consciousness it clearly does it (though not well.)
Hinton's opinions on LLMs are frankly bonkers. Just because you're famous — and intelligent and successful — doesn't mean you can't be completely wrong.
Also: what's his rationale? It's no use simply claiming something without evidence. And as far as I (and seemingly most others) can see, there's no such evidence other than that they can sometimes output sentences that happen to be true. But so can Wikipedia — does that mean Wikipedia is reasoning?
Also, any form of reasoning in the usual sense of the word would surely require the ability to allocate arbitrary amounts of computation (i.e. thought) to each question. LLMs don't do this — they don't sit and ponder; each token takes exactly the same amount of computation to produce. Once they hit an 'end of text' token, they're done.
Even empirically speaking, LLMs' ability to reason can be seen to be nonexistent. Just try asking basic mathematics questions. As soon as you ask anything for which the answer isn't available — practically verbatim — on the web already, it produces intelligent-sounding gibberish.
This whole idea that 'LLMs must be able to reason because in order to learn to fake reasoning you must learn to actually reason' is like some kind of inverted no true Scotsman fallacy.
Yes, Hinton can be wrong, and is wrong on many things, like his misunderstanding of Chomsky and language.
But I also think he has spent thousands of hours testing these systems scientifically.
Your last sentence puts a lot of words in people's mouths. But to continue down that line, fake reasoning and actual reasoning sound like the Chinese Room. Is that the argument you are making?
We don't understand our own mental processes well enough, so I try to not anthropomorphize reasoning and cognition.
> Your last sentence puts a lot of words in peoples mouths.
Well, it’s the most common sentiment I see on both here and (before I gave up) the AI-centred parts of reddit.
It’s not quite the Chinese Room, since LLMs can’t even simulate reasoning very well. So there’s no need to debate the distinction between ‘fake reasoning and actual reasoning’ — there may or may not be a difference, but it’s not the point I’m making.
As for Hinton: I’m sure he has. But inventors are often not experts on their own creations/discoveries, and are probably just as prone to FUD and panic in the face of surprising developments as the rest of us. No one predicted that autoregressive transformers would get us this far, least of all the experts whose decades of work led us to this point.
Particularly those that are basically linear, that don’t involve major changes in the order of things or a deep consideration of relationships between things.
They can’t sort a list but they can translate languages, for instance, given that a list sorted almost right is wrong but that we will often settle for an almost right translation.
One potential benefit should be that with the right tooling around it it should be able to translate your code base to a different language and/or framework more or less at the push of a button. So if a team is wondering if it would be worth it to switch a big chunk of the code base from python to elixir they don't have to wonder anymore.
I tried translating a python script to javascript the other day and it was flawless. I would expect it to scale with a bit of hand-railing.
It seems that this kind of application could really change how the tech industry evolves down the line. Maybe we will converge on tech stacks more quickly if everyone can test new ones out "within a week".
ChatGPT is trained well enough on all things AWS that it can do a decent job translating Python based SDK code to Node and other languages, translate between CloudFormation/Terraform/CDK (in various languages).
It does well at writing simple-to-medium-complexity automation scripts around AWS.
If it gets something wrong, I tell it to “verify your answer using the documentation available on the web”
>>ChatGPT is trained well enough on all things AWS
It was scary to me how chatting with GPT or Claude would give me information that was a lot clearer than what I could deduce after hours of reading AWS documentation.
Perhaps the true successor to Google search has arrived. One big drawback of Google was that a question couldn't be turned into a full, long conversation.
To that end, LLM chat is the ultimate Socratic learning tool to date.
ChatGPT is phenomenal for trying new techniques/libraries/etc. It's very good at many things. In the past few weeks I've used it to build me a complex 3D model with lighting/etc with Three.JS, rewrote the whole thing into React Three Fiber (also with ChatGPT), for a side project. I've never used Three.JS before and my only knowledge of computer graphics is from a class I took 20 years ago. For work I've used it to write me a CFN template from scratch and help me edit it. I've also used it to try a technique with AST - I've never used ASTs before and the first thing ChatGPT generated was flawless. Actually, most of the stuff I have it generate is flawless or nearly flawless.
It's nothing short of incredible. Each of those tasks would normally have taken me hours and I have working code in actual seconds.
And we are still at the beginning of this. Somewhat like where Google search was in the early 2000s.
As IDE integration grows and more and better models arrive that can do this better than ever, we will unlock all sorts of productivity benefits.
There is still skepticism about making these work at scale, with regard to both the electricity and compute requirements for a larger audience. But if they can get this to work, we might see a new tech boom way bigger than anything we have seen before.
I see your point but that specific analogy makes me wince. Google search was way better in the 2000s. It has become consistently dumber since then. Usefulness doesn't necessarily increase in a straight line over time.
The problem is that the use case has to be one where you don't care about the risk of hallucinations, or where you can validate the output without already having the data in a useful format. Plus you need to lack the knowledge/skill to do it more quickly using awk/python/perl/whatever.
I think text transformation is a sufficiently predictable task that one could make a transformer that completely avoids hallucinations. Most LLMs are run at high temperatures, which introduces randomness, and therefore hallucinations, into the result.
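As a sketch, turning the sampling temperature down for this kind of mechanical transformation looks something like the following (OpenAI's chat completions endpoint used as the example; `minifiedSource` is assumed to hold the code, and a low temperature makes the output more repeatable rather than guaranteeing correctness):

    // Request the transformation with sampling randomness mostly turned off.
    async function unminify(minifiedSource) {
      const res = await fetch("https://api.openai.com/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
        },
        body: JSON.stringify({
          model: "gpt-4o",
          temperature: 0, // pick the most likely token at each step
          messages: [{ role: "user", content: "Unminify this JavaScript:\n\n" + minifiedSource }],
        }),
      });
      const data = await res.json();
      return data.choices[0].message.content;
    }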
I'm sure there's some number greater than zero of developers who are upset because they use minification as a means of obfuscation.
Reminds me of the tool that was provided in older versions of ColdFusion that would "encrypt" your code. It was a very weak algorithm, and didn't take long for someone to write a decrypter. Nevertheless some people didn't like this, because they were using this tool, thinking it was safe for selling their code without giving access to source. (In the late 90s/early 2000s before open source was the overwhelming default)
This is an example of superior intellectual performance to humans.
There’s no denying it. This task is intellectual. Does not involve rote memorization. There are not tons and tons of data pairs on the web of minified code and unminified code for LLMs to learn from.
The LLM understands what it is unminifying, and it is in general superior to humans in this regard. But only in this specific subject.
> There are not tons and tons of data pairs on the web of minified code and unminified code for LLMs to learn from.
Are you sure about this? These can be easily generated from existing JS to use as a training set, not to mention the enormous amount of non-minified JS which is already used to train it.
I'm bullish on AI, but I'm not convinced this is an example of what you're describing.
The challenge of understanding minified code for a human comes from opaque variable names, awkward loops, minimal whitespacing, etc. These aren't things that a computer has trouble with: it's why we minify in the first place. Attention, as a scheme, should do great with it.
I'd also say there is tons of minified/non-minified code out there. That's the goal of a map file. Given that OpenAI has specifically invested in web browsing and software development, I wouldn't be surprised if part of their training involved minified/unminified data.
> These aren't things that a computer has trouble with
They are irrelevant for executing the code, but they're probably pretty relevant for an LLM that is ingesting the code and text and inferring its function based on other examples it has seen. It's definitely more impressive that an LLM can succeed at this without the context of (correct) variable names than with them.
Minification and unminification are heuristic processes, not algorithmic ones. They are akin to decompiling code or reverse engineering. It's a step beyond the typical AI you see in a calculator.
I don’t claim expertise in AI or understanding intelligence, but could we also say that a pocket calculator really understands arithmetic and has superior intellectual performance compared to humans?
Why not count the fact that humans created a tool to help themselves at unminifying towards human score?
Having a computer multiplying 1000-digit numbers instantly is an example of humans succeeding at multiplying: by creating a tool for that first. Because what else is intellectually succeeding there? It’s not like the computer has created itself.
If one draws a boundary of human intelligence at the skull bone and does not count the tools that this very intelligence is creating and using as mere steps of problem solving process, then one will also have to accept that humans are not intelligent enough to fly into space or do surgery or even cook most of the meals.
> Does not involve rote memorization. There are not tons and tons of data pairs on the web of minified code and unminified code for LLMs to learn from.
GPT-4 has consumed more code than your entire lineage ever will and understands the inherent patterns between code and minified versions. Recognizing the abstract shape of code sans variable names and mapping in some human readable variable names from a similar pattern you've consumed from the vast internet doesn't seem farfetched.
I think I’d agree with your statement, in the same sense that a chess simulator or AlphaGo are superior to human intellect for their specific problem spaces.
LLMs are very good at a surprisingly broad array of semi-structured-text-to-semi-structured-text transformations, particularly within the manifold of text that is widely available on the internet.
It just so happens that lots of code is widely available on the internet, so LLMs tend to outperform on coding tasks. There’s also lots of marketing copy, general “encyclopedic” knowledge, news, human commentary, and entertainment artifacts (scripts, lyrics, etc). LLMs traverse those spaces handily as well. The capabilities of AI ultimately boil down to their underlying dataset and its quality.
You're in denial. Nobody is worshipping a god here.
I'm simply saying the AI has superior performance to humans on this specific subject. That's all.
Why did you suddenly make this comment of "bowing before your god" when I didn't even mention anything remotely close to that?
I'll tell you why. Because this didn't come from me. It came from YOU. This is what YOU fear most. This is what YOU think about. And your fear of this is what blinds you to the truth.
That's interesting. It's gotten a lot better I guess. A little over a year ago, I tried to use GPT to assist me in deobfuscating malicious code (someone emailed me asking for help with their hacked WP site via custom plugin). I got much further just stepping through the code myself.
After reading through this article, I tried again [0]. It gave me something to understand, though the code is obfuscated enough to essentially eval unreadable strings (via the Window object), so it's not enough on its own.
Here was an excerpt of the report I sent to the person:
> For what it’s worth, I dug through the heavily obfuscated JavaScript code and was able to decipher logic that it:
> - Listens for a page load
> - Invokes a facade of calculations which are in theory constant
> - Redirects the page to a malicious site (unk or something)
Anyone working on decompiler LLMs? Seems like we could render all code open source.
Training data would be easy to make in this case. Build tons of free GitHub code with various compilers and train on inverting compilation. This is a case where synthetic training data is appropriate and quite easy to generate.
You could train the decompiler to just invert compilation, and then use existing larger code LLMs to do things like add comments.
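A sketch of what that data generation could look like (paths, flags and tools are only examples; any compiler/disassembler pair would do):

    const { execSync } = require("child_process");
    const fs = require("fs");

    // Compile a permissively licensed C file and pair its disassembly with the source.
    function makeDecompilePair(cSourcePath) {
      execSync(`gcc -O2 -o /tmp/sample.bin ${cSourcePath}`);      // compile
      const disassembly = execSync("objdump -d /tmp/sample.bin"); // textual disassembly
      return {
        input: disassembly.toString("utf8"),
        target: fs.readFileSync(cSourcePath, "utf8"),
      };
    }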
The potential implications of this are huge. Not just open sourcing, but imagine easily decompiling and modifying proprietary apps to fix bugs or add features. This could be a huge unlock, especially for long dead programs.
For legal reasons I bet this will become blocked behavior in major models.
I've never seen a law forbidding decompiling programs.
But some license agreements forbid decompiling the application. Further, you still don't have any rights to that source code; it depends on the license...
A mere decompilation or general reverse engineering should be fine in many if not most jurisdictions [1]. But it is a whole different matter to make use of any results from doing so.
Using an LLM (or any technique) to decompile proprietary code is not clean room design. Declaring the results "open source" is deception and theft, which undermines the free open source software movement.
Only if you use the decompiled code. But if one team uses decompiled code to write up a spec, then another team writes an implementation based on that spec, then that could be considered clean room design. In this case, the decompiler would merely be a tool for reverse engineering.
It is true that at least some jurisdictions do also explicitly allow for reverse engineering to achieve interoperability, but I don't know if such provision is widespread.
Unfortunately not really. Having the source is a first step, but you also need the rights to use it (read, modify, execute, redistribute the modifications), and only the authors of the code can grant these rights.
Doesn't it count as 'clean room' reverse engineering? Or alternatively, we could develop an LLM that's trained on the outputs and side effects of any given function, and learns to reproduce the source code from that.
Or, going back to the original idea, while the source code produced in such a way might be illegal, it's very likely 'clean' enough to train an LLM on it to be able to help in reproducing such an application.
IANAL, but if the only source for your LLM is that code, I would assume the code it produces would be at high risk of being considered counterfeit.
I would guess clean room would still require having someone reading the LLM-decompiled code, write a spec, and have someone else write the code.
But this is definitely a good question, especially given the recent court verdicts. If you can launder open source licensed code, why not proprietary binaries? Although I don't think the situation is the same: I wouldn't expect how you decompile the code to matter.
> Seems like we could render all code open source.
I agree. I think "AI generating/understanding source code" is a huge red herring. If AI was any good at understanding code, it would just build (or fix) the binary.
And I believe that is how it will turn out: when we really have AI programmers, they will not bother with human-readable code, but will code everything in machine code (and if they are tasked with maintaining an existing system, they will understand it in its entirety, across the SW and HW stack). It's kind of like how diffusion models that generate images don't actually bother with learning drawing techniques.
Why wouldn't AIs benefit from using abstractions? At the very least it saves tokens. Fewer tokens means less time spent solving a problem, which means more problem solving throughput. That is true for machines and people alike.
If anything I expect AI-written programs in the not so distant future to be incomprehensible because they're too short. Something like reading an APL program.
It is true that compilation and minification are both code transformations (it's a correct reduction [1]), but this doesn't seem a very useful observation in this discussion. In the end, everything you do to something is an operation. But that's not very workable.
In practice, compilation is often (not always, agreed!) from a language A to a lower level language B such that the runtime for language A can't run language B or vice-versa, if language A has a runtime at all. Minification is always from language A to the same language A.
The implication is that in practice, deminification is not the same exercise as decompilation. You can even want to run a deminification phase after a decompilation phase, using two separate tools, because one tool will be good at translating back, and the other will be good at pretty printing.
There was a paper about this at CGO earlier this year [1]. Correctness is a problem that is hard to solve, though; 50% accuracy might not be enough for serious use cases, especially given that the relation to the original input for manual intervention is hard to preserve.
You could already break the law and open yourself up to lawsuits and prosecution by stealing intellectual property and violating its owners rights before there were LLMs. They just make it more convenient, not less illegal.
I think there's actually some potential here, considering LLMs are already very good at translating text between human languages. I don't think LLMs on their own would be very good, but a specially trained AI model perhaps, such as those trained for protein folding. I think what an LLM could do best is generate better decompiled code, giving better names to symbols, and generating code in a style a human is more likely to write.
I usually crap on things like chatgpt for being unreliable and hallucinating a lot. But in this particular case, decompilers already usually generate inaccurate code, and it takes a lot of work to fix the decompiled code to make it correct (I speak from experience). So introducing AI here may not be such a huge stretch. Just don't expect an AI/LLM to generate perfectly correct decompiled code and we're good (wishful thinking).
This is very close to how I often use LLMs [0]. A first step in deciphering code where I otherwise would need to, to use the author's words, power through reading the code myself.
It has been incredibly liberating to just feed it a spaghetti mess, ask to detangle it in a more readable way and go from there.
As the author also discovered, LLMs will sometimes miss some details, but that is alright as I will be catching those myself.
Another use case is when I understand what the code does, but can't quite wrap my head around why it is done in that specific way. Specifically, where the author of the code is no longer with the company. I will then simply put the method in the LLM chat, explain what it does, and just ask it why some things might be done in a specific way.
Again, it isn't always perfect, but more often than not it comes with explanations that actually make sense, hold up under scrutiny and give me new insights. It has actually prevented me once or twice from refactoring something in a way that would have caused me headaches down the line.
[0] chatGPT and more recently openwebUI as a front end to various other models (Claude variants mostly) to see the differences. Also allows for some fun concepts of having different models review each others answers.
Okay, but if the unminified code doesn't match the minified code (as noted at the end "it looks like LLM response overlooked a few implementation details"), that massively diminishes its usefulness — especially since in a lot of cases you can't trivially run the code and look for differences like the article does.
[ed.: looks like this was an encoding problem, cf. thread below. I'm still a little concerned about correctness though.]
It does seem that the unminified code is very close to the original. In some cases ChatGPT even did its own refactoring in addition to the unminification:
Note that the original code doesn't call `handleResize` immediately, but has its contents inlined instead. (Probably the minifier did the actual inlining.) The only real difference here is a missing `if (typeof window < "u")` condition.
This refers to the fact that the ChatGPT-generated version is missing some characters that are used in the original example. Namely, [looks like HN does not allow me to paste unicode characters, but I am referring to the block characters] can be seen in their version, but cannot be seen in the ChatGPT-generated version. However, it very well might be that it is simply because I didn't include all the necessary context.
Discrediting the entire output because of a few missing characters would be very pedantic.
Otherwise, the output is identical as far as I can tell by looking at it.
It's because the author mis-copy-pasted the original code: those "â–‘â–’â–“â–ˆ" at the end of the O5 string are supposed to be the block characters. E.g. "â–’" in Windows-1252 [0] is the byte sequence 0xE2 0x96 0x92, which in UTF-8 is exactly the encoding for U+2592 MEDIUM SHADE [1].
If you look for `oahkbdpqwmZO0QLCJUYXzcvunxrjft` in the output, you should see that those characters appear exactly like that. Maybe an issue with encoding of the script file?
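For reference, the two interpretations of those bytes are easy to see in a few lines (works in browsers and in Node builds with full ICU):

    const bytes = Uint8Array.of(0xe2, 0x96, 0x92);
    console.log(new TextDecoder("utf-8").decode(bytes));        // "▒" (U+2592 MEDIUM SHADE)
    console.log(new TextDecoder("windows-1252").decode(bytes)); // "â–’", the mojibake seen in the article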
Most definitely; if I use "View >> Repair Text Encoding" in Firefox, it shows the block characters. But I have to admit, it's strange that Firefox does not choose UTF-8 by default in this case.
Yes, turns out I was the one who made the mistake.
I updated the article to reflect the mistake.
> Update (2024-08-29): Initially, I thought that the LLM didn’t replicate the logic accurately because the output was missing a few characters visible in the original component (e.g., ). However, a user on HN forum pointed out that it was likely a copy-paste error.
>
> Upon further investigation, I discovered that the original code contains different characters than what I pasted into ChatGPT. This appears to be an encoding issue, as I was able to get the correct characters after downloading the script. After updating the code to use the correct characters, the output is now identical to the original component.
>
> I apologize, GPT-4, for mistakenly accusing you of making mistakes.
If no character set is specified, plain text content is assumed to be 1252. This probably extends to application/javascript as well but I'd have to check to be sure.
The web pre-dates utf-8, although not by much. Ken Thompson introduced utf-8 at winter Usenix in 1993 and CERN released the web in April, but it would be several more years before utf-8 became common. The early web was ISO 8859-1 by default. But people were pretty lazy about specifying character sets back then (still are actually) and Microsoft started sending or assuming their 1252 character set where 8859-1 was required by the spec. Eventually the spec was changed to match de facto behavior. I guess the assumption was that if you're too stupid or lazy to say what character set you're using, then it's probably 1252. (Today the assumption would be that it's probably utf-8). I'm not sure what the specs say today, but I think html is assumed to be in utf-8, and everything else is assumed to be 1252 (if the character set is not explicitly declared).
I recognized this a few months back when I wanted to see the algorithm that a website used to do a calculation. I just put the minified JS into ChatGPT and figured it out pretty easily. Let's take this a few steps out. What happens when an LLM can clone a whole SAAS app? Let's say I wanted to clone HubSpot. If an LLM can interact with a browser, figure out how a UI works, and take code hints from un-minified code, I think we could see all SAAS apps be commoditized. The backend would be proprietary, but it could figure out API formats and suggest a backend architecture.
All this makes me think AI's are going to be a strong deflationary force in the future.
>If an LLM can interact with a browser, figure out how a UI works, and take code hints from un-minified code, I think we could see all SAAS apps be commoditized. The backend would be proprietary, but it could figure out API formats and suggest a backend architecture.
whoooha! that's a lot of probing and testing of the SAAS that would be required in order to see how it behaved. SAAS aren't algorithms, they operate over data that's unseen on the front end as well...
>All this makes me think AI's are going to be a strong deflationary force in the future.
I don't get this. I've literally never worked anywhere that had enough software engineers; we've been going on about the software crisis for about 50 years, and things are arguably worse than ever. The gap between the demand for good software (in the sense that allocating capital to producing it would be sensible) and the fulfillment of that demand is bigger than ever. We just don't have the mechanisms to make this work, and to make it work at an economically viable level.
Then we get AI to help us and everyone thinks that the economy will shrink?
You wouldn't necessarily need to do much probing - consider that the documentation would provide numerous hints to the agent as to what each endpoint was actually doing.
Honestly, the value in most business software isn't the actual technology. It's the customer base and data held by the platforms.
Someone could already easily clone HubSpot relatively cheaply even if they hired developers, but that doesn't mean it will be anywhere near successful.
I had tweeted about this some time back. I found a component that was open source earlier and then removed, with only minified JS provided. Give the JS to Claude and get the original component back. It even gave the component good class names and function names.
Actually this opens up a bigger question. What if I like an open source project but don't like its license? I can just prompt an AI by giving it the open source code and asking it to rewrite it, or to write it in some other language. I'd have to look up the rules on whether this is allowed or would be considered copying, and how a judge would prove it.
Most likely you would be found guilty, because intent matters. It is easy to show that the generated code is very similar to the original code, and you clearly had a reason to bypass the original license. The exact legal reasoning would vary, but any reasonable lawyer would recommend against it.
In the historic Google v. Oracle suit, the only actual code that was claimed to be copied was a trivial `rangeCheck` function, but Google's intent and other circumstances like the identical code structure and documentation made it much more complicated, and the final decision completely bypassed the copyrightability of APIs possibly for this reason.
>I apologize, GPT-4, for mistakenly accusing you of making mistakes.
I am testing large language models against a ground truth data set we created internally. Quite often when there is a mismatch, I realize the ground truth dataset is wrong, and I feel exactly like the author did.
Apologizing to a program seems rather silly though. Do you apologize to your compiler when you have a typo in your code, and have to make it do all that work again?
LLMs are trained to predict the next token. But examples like these look like they have also 'learned patterns'. If rot13 is applied to this minified code, will an LLM still find meaning in it? If it could, it's doing more than just predicting next tokens. Need to try it.
edit: ChatGPT figured out that it's rot13, but couldn't explain the code directly without deobfuscating it first.
Claude 3.5 Sonnet can natively speak double base64 encoded English. And I do mean it - you can double b64 encode something, send to it, and it'll respond as if it was normal English. Obviously base64 is a simpler transformation than rot13, but no GPT models can deal with double b64.
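If anyone wants to reproduce these experiments, here's a rough Node.js sketch of both transformations (the sample strings are just placeholders):
const rot13 = s => s.replace(/[a-zA-Z]/g, c => {
  const base = c <= 'Z' ? 65 : 97;
  return String.fromCharCode((c.charCodeAt(0) - base + 13) % 26 + base);
});
const b64 = s => Buffer.from(s, 'utf8').toString('base64');
console.log(rot13('function add(a,b){return a+b}')); // paste the rot13 output into the model
console.log(b64(b64('that is quite interesting')));  // double base64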
It appears that OpenAI's GPT-4 model can speak base64 as well. I jumped to your comment to see if anyone else had tried it after reading the OP. I didn't try double b64, but that is interesting.
> $ ask4 ' what does dGhhdCBpcyBxdWl0ZSBpbnRlcmVzdGluZw== decode to? '
> A "dGhhdCBpcyBxdWl0ZSBpbnRlcmVzdGluZw==" is a Base64 encoded string. When decoded, It translates to "that is quite interesting" in English.
> Obviously base64 is a simpler transformation than rot13
Is it? It’s probably more obscuring from an LLM’s perspective, assuming the LLM has seen enough rot13 text during training. Spaces and punctuation are untouched by rot13, unlike base64, which means that word and sentence boundaries will still be denoted by tokens that denote those boundaries in plaintext.
> The provided code is quite complex, but I'll break it down into a more understandable format, explaining its different parts and their functionalities.
Reading the above statement generated by ChatGPT, I asked myself: will we live to see the day when these LLMs can take a large binary executable as input, read it, analyze it, understand it, and then reply with the above statement?
> I followed up asking to "implement equivalent code in TypeScript and make it human readable" and got the following response.. To my surprise, the response is not only good enough, but it is also very readable.
What if that day comes and we can ask these LLMs to rewrite the binary code in [almost] any programming language we want? This is exciting, yet scary, just to think about!
You should give it a try and report back! One easy way would be to take an open-source Android app, compile the APK, then decompile it, feed the bytecode to an LLM, ask it to write the Java/Kotlin equivalent, and compare the original source with the LLM-decompiled version.
Unminification (obfuscation removal) can also be applied to text. Most specialties develop a jargon that allows insiders to communicate complex ideas quickly; that shorthand excludes outsiders. Large language models can make specialist jargon transparent and thereby expand the circle of people who can understand specialized fields. Essentially, they solve the problem of mapping specialized, jargonized concepts to things the outside reader already knows. Anyone who wants to learn needs this, and I hope it will become part of students' learning paths.
The garbled text is included in the tree as relevant, pronounceable, and constantly changing text. Here's Chrome's accessibility tree: https://imgur.com/a/V1589Jr
(I'd love if a screen reader user could upload some audio of how awful this sounds, by the by)
Please use `aria-hidden="true"` for stuff like this, it just removes the element from the accessibility tree. I've also emailed Reactive a link to this thread.
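If the attribute can't be added in the markup itself, the same fix can be applied from JavaScript; a one-line sketch, where the selector is hypothetical:
document.querySelectorAll('.text-scramble').forEach(el => el.setAttribute('aria-hidden', 'true'));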
Big props for ARIA attributes - they’re so crucial for differently abled and impaired users. I’ve been combing through our project’s components lately to bring them up to design spec and have been taking a look at our accessibility - it’s so important and so easily missable for most engineers.
It is good at unminifying and "minifying" as well.
I have been doing the Leetcode thing recently, and even became a subscriber to Leetcode.
What I have been doing is I go through the Grind 75 list (Blind 75 successor list), look for the best big O time and space editorial answer, which often has a Java example, and then go to ChatGPT (I subscribe) or Perplexity (don't subscribe to Pro - yet) and say "convert this to Kotlin", which is the language I know best. Jetbrains IDE or Android Studio is capable of doing this, but Perplexity and ChatGPT are usually capable of doing this as well.
Then I say "make this code more compact". Usually I give it some constraints too - keep the big O space and time complexity the same or lower it, keep the function signature of the assigned function the same, and keep the return explicit, make sure no Kotlin non-null assertions crop up. Sometimes I continually have it run these instructions on each version of the iterated code.
I usually test that the code compiles and returns the correct answers for examples after each iteration of compacting. I also copy answers from one to the other - Perplexity to ChatGPT and then back to Perplexity. The code does not always compile, or give the right answers for the examples. Sometimes I overcompact it - what is clear in four lines becomes too confusing in three compacted lines. I'm not looking for the most compact answer, but a clear answer that is as compact as possible.
One question asked about Strings and then later said, what if this was Unicode? So now for String manipulation questions I say assume the String is Unicode, and then at the end say show the answer for ASCII or Unicode. Sometimes the big O time is tricky - it is time O(m+n), say, but since m is always equal to or less than n in the program, it is actually O(n), and both Perplexity and ChatGPT can miss that until it is explained.
People bemoan Leetcode as a waste of time, but I am wasting even less time with it, as ChatGPT and Perplexity are helping give me the code I will be demonstrating in interviews. The common advice I have heard from everywhere is don't waste time trying to figure out the answers myself - just look at the given answers, learn them, and then look for patterns (like binary search problems, which are usually similar), so that is what I am doing.
Initially I was a ChatGPT and Perplexity skeptic for early versions of those sites, in terms of programming, as they stumbled more, but these self-contained examples and procedures they seem well-suited for. Not that they don't hallucinate or give programs that don't compile, or give the wrong answers sometimes, but it saves me time ultimately.
I have been told by people working in $200k+/$300k+ SWE jobs to look up at the answers and just be able to regurgitate something along the lines of the Grind 75 answers as a first step.
As a next step - even within these 75 questions, Grind 75's eighth answer and fourteenth answer are answered essentially the same way, as are other questions in there. So the next step would be to see these patterns (binary search, priority queues, sliding window, backtracking) and how to answer them, and then be able to solve them in slightly novel problems (in the more complex questions I understand one might run into more than one of these patterns).
This is a good way to do it IMO. Though I would say you don't want to just memorize answers; you want to fully understand them. Also, paying for LeetCode premium is very helpful since their official solutions are easy to understand and explain how you might arrive at these solutions yourself.
Train on Java compiled to class files. Then go from class files back to Java.
Or even:
Train Java compiled to class files, and have separate models that train from Clojure to class files and Scala to class files. Then see if you can find some crufty (but important) old Java project and go: crufty Java -> class -> Clojure (or Scala).
If you could do the same with source -> machine instructions, maybe COBOL to C++! Or whatever.
LLM source recovery from binaries is a thing. The amazing part is that they are pretty good at adding back meaningful variable names to the generated source code.
This is something you don't need AI for, there are many decompilers out there already as well.
AI cannot even lint properly right now and you want it to decompile? Good luck. There's so much hype going on that people really think this is possible this year?
In the end, always remember it's just autocomplete; it's pretty terrible at translations that are not natural language to natural language. I worked on a natural-language-to-SQL product and it was impossible to make it consistently generate valid SQL for Postgres, and that's natural language to SQL, not virtual machine instructions...
LLMs are very good at reading text. LLMs read tokenized text, while humans use their eyes to view words. Another scenario: ChatGPT is good at analyzing C++ template error messages, which are usually long and hard for humans to understand.
You can do this on minified code with beautifiers like js-beautify, for example. It's not clear why we need to make this an LLM task when we have existing simple scripts to do it?
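For example, a minimal sketch with the js-beautify npm package (the file name is just illustrative):
const { js: beautify } = require('js-beautify');
const fs = require('fs');
const minified = fs.readFileSync('yourscript.min.js', 'utf8');
// Re-indents and reflows the code, but keeps the mangled identifiers
console.log(beautify(minified, { indent_size: 2 }));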
While this doesn't restore minified identifiers, like the LLM version claims to do, it tends to help a lot with understanding the code. Usually minified code still has the original identifiers in global function names, object attributes, DOM classes and a few other places where it is hard to guarantee no side effects of name mangling. This makes guessing the purpose of the remaining identifiers substantially easier for a human, and it is probably also the main reason why an LLM is capable of making good guesses at what they could reasonably be called.
I can see some ways to use this while easily checking that the LLM is not hallucinating parts of it: ask the LLM to unminify (or deobfuscate) some component, then have the LLM write unit tests, then manually check that the unit tests are meaningful and don't miss anything in the unminified code, then run the tests on the original minified version to confirm the LLM's work, and maybe set up some mutation testing if it is relevant.
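A rough sketch of that workflow in Node.js; the module paths, the `chunk` function, and the test cases are all hypothetical:
const assert = require('assert');
const minified = require('./widget.min.js');
const readable = require('./widget.unminified.js');
// Human-reviewed test cases, run against both versions
const cases = [[['a', 'b', 'c', 'd'], 2], [[], 3], [[1], 5]];
for (const [input, size] of cases) {
  assert.deepStrictEqual(readable.chunk(input, size), minified.chunk(input, size));
}
console.log('LLM output matches the original minified behaviour for these cases');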
Unfortunately, the comments that could be generated are exactly the ones that should never be written. You want the comment to explain why: the information missing from the code.
This is something I have always disagreed with. In my experience, I'd rather read a short comment explaining the purpose of a block of code than try to decipher it. Yes, code "should speak for itself", but reading a comment is almost always faster than reading blocks of code.
And then there is also documentation (if you include it in what you define as comments). I'd much rather go through a website with a search function, examples, and descriptions, made with some docgen tool, than go through a library's or programming language's source code every time I need to remember how to do X, or whether object B implements function Y...
It's just a rule of thumb, like anything else. In most code, "why" is the hard part; I see that you are incrementing that account by a penny from out of the blue, but why? When you are in code where "what" is the hard part, like an implementation of a book algorithm or some tricky performance optimization, then by all means comment that.
Really all this rule amounts to is
// Increment by a penny
accountValue += 1
is a pointless comment, please don't do that. Schools had a way of accidentally teaching that by too-rigidly requiring "commented code", in situations where there wasn't much else to say, or situations where the students themselves didn't necessarily have a strong sense of "why". Any comment that isn't just literally "this is what the next line does" is probably useful to somebody at some point.
I do agree that documenting the why is way more important than the how/what. But having a short comment to summarize a block of code like:
// Parse the filename and remove the extension
let fext_re = Regex::new(r"(.\*)\.(.+)$").unwrap();
let page_cap = fext_re.captures(fname).unwrap();
let page_base_filename = page_cap.get(1).unwrap().as_str();
is still useful. Instead of having to read the next few lines of code, I already know what they are supposed to do and what to expect.
It makes discovery, later down the line, easier.
This would be entirely self-documenting if you replaced it with a function named after what it does; then the comment isn't necessary.
To boot, a unit test could be written that would reveal the bug in the regular expression that makes it only work with filenames that have an asterisk before the extension. Unless you intended that (unlikely), in which case the comment is wrong/not comprehensive and misdirects the reader.
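For illustration, a JavaScript sketch of that suggestion, using the regex presumably intended (without the stray asterisk); the function name replaces the comment:
// Hypothetical helper: the name documents the block
function baseFilenameWithoutExtension(fname) {
  const match = /(.*)\.(.+)$/.exec(fname);
  return match ? match[1] : fname;
}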
You can put these comments into the name of a function, getting rid of the redundancy, so whoever is just reading the code isn't distracted by the comments.
> reading a comment is almost always faster than reading blocks of code
Not to a competent programmer when reading well-written code.
This also means that you read what the code does, rather than what a comment says the code does. Otherwise you will be blind to bugs. Any experienced developer will tell you that code very often doesn't do what the original programmer thought it did.
> Not to a competent programmer when reading well-written code.
No, literally reading one line about what the next 4 lines do is mechanically faster. It does not matter whether you are good or bad; it is about simple reading speed.
> This also means that you read what the code does, rather than what a comment says the code does. Otherwise you will be blind to bugs. Any experienced developer will tell you that code very often doesn't do what the original programmer thought it did.
I am an experienced developer. I have worked on several "legacy" projects, and started many from 0.
1. It does not make you blind to anything; it is just a way to learn/direct yourself around the code base faster.
2. Knowing what the original developer wanted is often as useful as knowing what the code actually does. More info is better than no info.
Even outdated comments can be useful.
For me, this type of thinking - that comments are unnecessary, that competent people can just read the code, etc. - is actually a sign of a younger dev who has never had to work on a long-lived codebase.
> For me, this type of thinking - that comments are unnecessary, that competent people can just read the code, etc. - is actually a sign of a younger dev who has never had to work on a long-lived codebase.
It sounds like you're conflating "helpful comments that explain why" with "no comments are needed ever because read the code", and we're talking past each other.
Like with all LLMs, you greatly benefit from prior experience, or you risk just falling for hallucinations, which is a limitation of a non-deterministic black box and degrades performance relative to the task. I've commented in other threads that LLMs are great at amplifying my output in an area where I already have domain knowledge. I think this is why people fail to realize any gains or give up: they think it will unlock areas they don't fully understand themselves. A blind-leading-the-blind problem.
An interesting use-case of this capability is refactoring, which, for me, ChatGPT has been unmistakably good at. It's amazing how I can throw garbage code I wrote at ChatGPT, ask it to refactor, and get clean code that I can use without worrying if it's going to work or not, because in 99% of cases it works without breaking anything.
I've had pretty good results dumping entire files in to Sonnet3.5.
For example, "Here's my app.js file, please add an endpoint for one user to block another. Feel free to suggest schema changes. Please show me the full app.js with these changes implemented"
The model seems to be great at figuring out frameworks and databases just by seeing the contents of a full app.js file.
I do find this type of prompt works much better with Sonnet3.5 than GPT4o.
No, most other languages absolutely don’t work as well as JS, simply because there’s been less training material available. It’s useless with Rust, for example (hell, I’d be totally impressed if it has any idea how to appease the borrow checker!)
LLMs are great for self-contained boring tasks; recently I have started to refactor Ruby tests with a simple prompt (getting rid of various rspec syntax in favor of more explicit notation at the cost of code duplication - so kind of like unminifying things, I guess) - and it works _ridiculously_ well as well.
Slightly off-topic, but I remain perplexed at how "minified" JavaScript is acceptable to software developers commenting online, while terse code in any other language, e.g., one-letter variable names, is unacceptable to a majority of those same online commenters.
There is also Topiary. From their website: "The universal code formatter". I think it doesn't work with JavaScript source at the moment, but it surely will in the future.
Would have been cool if this had been used in that air con reverse engineering story yesterday.
I noticed while reading the blog entry that the author described using a search engine multiple times and thought, "I would have asked ChatGPT first for that."
Are there any serious security implications for this? Of course obfuscation through minification won't work anymore, but I'm not sure if that's really all that serious of an issue at the end of the day.
I’ve tried using LLMs to deobfuscate libraries like fingerprintjs-pro to understand what specific heuristics and implementation details they use to detect bots.
They mostly fail. A human reverse engineer will still do better.
I’m hoping LLMs get better at decompiling/RE’ing assembly because it’s a very laborious process. Currently I don’t think they have enough in their training sets to be very good at it.
Not exactly, because you still have to pay for every distinct identifier present in your code. Also, many minifiers do constant folding and inlining and remove comments, all of which strip redundant or unused information that gzip alone would still have to carry.
I don’t think they’re saying that minifying provides no additional space savings, but rather that those additional savings are small and not worth the tradeoffs.
Not even that is true, to my knowledge. For example, a particular benchmark [1] demonstrates that many popular libraries still benefit significantly from minification even after gzip compression, with savings ranging from 35% to 75%. Sure, a small library would be fine without any minification or even compression, but otherwise minification is clearly beneficial.
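This is easy to check for any library with a few lines of Node.js (the file names here are placeholders):
const fs = require('fs');
const zlib = require('zlib');
for (const file of ['library.js', 'library.min.js']) {
  const raw = fs.readFileSync(file);
  // Compare the raw size with the size after gzip for each variant
  console.log(file, raw.length, 'bytes raw,', zlib.gzipSync(raw).length, 'bytes gzipped');
}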
I think you have to look at this in the context of an entire bundle or project, and then you have to weigh it against the download speeds you’re generally expecting for the users of your site or app.
I agree that as a blanket statement “gzip is enough” is not technically correct, but I think it’s largely correct in spirit, in that people tend to reach for minification by default, without really thinking about what they’re gaining.
If minifying saves you 200 KB overall, for example, and you expect your average user to have a 200 Mbps connection, you’re saving a grand total of 8 ms on page load, which is an imperceptible difference on its own. In exchange, you’re getting worse debugging, and worse error reporting.
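For reference, the arithmetic behind that 8 ms figure, using the same assumed numbers, so you can plug in your own:
const savedBytes = 200 * 1024;      // 200 KB saved by minification
const bytesPerSecond = 200e6 / 8;   // 200 Mbps connection
console.log((savedBytes / bytesPerSecond) * 1000, 'ms'); // ~8 ms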
Minification would indeed be useless under that set of assumptions, but the real world is much more variable and you need a comfortable margin. For example, mobile devices rarely sustain that much bandwidth all the time.
Broadly speaking, minification is only a small step in building a performant website or web application. There are many more things to do; choosing the right image compression format and method, for example, would generally have much more impact. But not everyone can be expected to understand all of them in depth, so we have best practices. Minification therefore qualifies as a good best practice, even though it is just one among many.
I think probably not, if the assets are coming from the same place, since the connection will be reused in most modern situations. Maybe if you’re loading the JS from a CDN though, and there are no other large resources, or those resources come from a different server.
I find LLMs good at these kinds of tasks, for example converting CSV to JSON (although you have to remind them not to be lazy and to do the whole file).
It is also shockingly good at converting/extracting data to CSV or JSON, but not JSONL. Even the less capable model, `gpt-4o-mini`, can "reliably" parse database schemas in various formats into a consistent CSV structure.
I have been running it in production for months[1] as a way to import and optimize database schemas for AI consumption. This performs much better than including the `schema.sql` file in the prompt.
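For the curious, a rough sketch of that kind of call with the official openai Node package; the prompt wording and CSV columns here are my own guesses, not the exact ones used in production:
import OpenAI from 'openai';
const client = new OpenAI(); // reads OPENAI_API_KEY from the environment
const schema = '...contents of schema.sql...'; // placeholder
const completion = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'system', content: 'Convert the database schema to CSV rows: table,column,type,nullable' },
    { role: 'user', content: schema },
  ],
});
console.log(completion.choices[0].message.content);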
I have to ask the obvious question: how do you know the unminified code is semantically equivalent to the minified code? If someone knows how to verify LLM code transformations for semantic fidelity then I'd like to know because I think that would qualify as a major breakthrough for programming languages and semantics.
LLMs are good at modeling and transforming text, news at 11. AI proponent hypes AI. I could go on, but I shouldn't have been this sarcastic to start with
Is it though? The browser developer tools have a pretty-print ("unminify") button which yields similar results. Undoing JavaScript minification is not hard in any way, and guessing variable names is not that hard given such a simple code example.
Don't know if this will apply directly here, but --
As someone who is "not a developer", I use the following process to help myself:
1. I set up StyleGuide rules for the AI, telling it how to write out my files/scripts:
- Always provide full path, description of function, invocation examples, and version number.
- Frequently have it summarize and explain the project, project logic, and a particular file's functions.
- Have it create a README.MD for the file/project
- Tell it to give me mermaid diagrams and swim diagrams for the logic/code/project/process
- Close prompts with "Review, Explain, Propose, Confirm, Execute" <-- This has it review the code/problem/prompt, explain what it understands, propose what it's been asked to provide, confirm that it's correct (or I add more detail here) - then execute and go ahead with creating the artifacts.
I do this because Claude and ChatGPT are FN malevolent in their ignoring of project files/context - and they hallucinate as soon as their context window/memory fills up.
Further, they very frequently "forget" to refer to the uploaded project context files/artifacts they themselves have proposed and written, etc.
But - asking for a README with code, mermaid diagrams, and the logic is helpful to keep me on track.
Agents like Aider or Plandex wrap that up nicely. They do the automatic review and have a very verbose description of the edit format. If you do that often manually, it may be worth testing their prepackaged approach.
https://github.com/jehna/humanify