FWIW, when I'm doing a code review, these are the exact kind of comments that I would tell a committer to remove.
That is, it's like it generates these kinds of comments:
// initializes the variable x and sets it to 5
let x = 5;
// adds 2 to the variable x and sets that to a new variable y
let y = x + 2;
That is, IMO the whole purpose of comments should be to tell you things that aren't readily apparent just by looking at the code, e.g. "this looks wonky but we had to do it specifically to work around a bug in library X".
Perhaps it could be useful for people learning to program, but otherwise people should learn how to read code as code, not "translate" it to a verbose English sentence in their head.
I don’t think the intent of this tool is to generate comments which you’d then embed into the code it describes. I think it’s meant to explain, in plain language, what the actual behavior is (for whatever confidence level you might assign to “actual” and “is”).
To your point about the utility of code comments describing the behavior this way, I agree it’s probably much more valuable for beginners. In fact when I’ve mentored early programmers, I sometimes ask them to write out essentially prose like this in comments before writing a single line of executable code.
Now, I’m far from a beginner. I’ve been considered a senior engineer long enough that friends discourage me from disclosing the amount of time, for fear of age discrimination. I can absolutely see the potential of this tool as part of my IDE. I’m on vacation now, but when I return to work I plan to take it for a spin as an aid for refactoring areas of code which clearly work as intended (well, for the most part) but whose actual behavior and intent are much less clear.
Here’s why I think it’ll be valuable for refactoring: it can help limit the amount of mental context switching necessary to build a mental model of what the code does. I often find myself trying to produce prose much like this for my own reference, but I end up losing context as fast as I acquire it as I follow references into their respective rabbit holes. Having the tool do that for me can help me stay in a single area of focus. It could also be a useful reference for adding and improving type definitions, maybe even regression tests.
The best part is that it doesn’t, from what I’ve seen, do anything besides populate ephemeral annotations. It doesn’t try to write code or automate anything other than producing a narrative. Like at least one other commenter, I’m skeptical about the reliability of that. But unlike that commenter, I’m willing to take the risk… probably because I’ve learned to be skeptical of my own reliability performing the same task. If my instinct is right that I can use this tool the way I hope, I’ll still scrutinize it for accuracy. But that’s potentially much better than having only one imperfect, meat-based computer doing the work.
To be clear, I'm not really arguing that the intent is to embed comments. But I'm just going by the "validateSignature" example they give in the video. All that does is generate a 1-1 mapping of each code line to an English comment. TBH I would expect a senior developer to just be faster reading the code.
Perhaps there are some other examples that explain more of a "structural understanding" of the code, but I'm skeptical without more evidence.
> TBH I would expect a senior developer to just be faster reading the code.
TBH, I would also expect this tool to be a lot more affordable than a senior developer for language X.
Plus, chances are good that your senior developer for language X will barely scrape by as an extremely junior developer for language Y that you also use in some capacity, and that there is a problem which involves stuff written in both X and Y. Also, many languages have horrid gotchas, where innocuous-looking code does something quite unexpected. Like the C++ classic log->debug("Timeout: %d", config["timeout"]). If some tool were to actually tell you "Writes a debug log with the current value of timeout, if it exists, or otherwise sets it to 0", well, that would be pretty useful for someone who has only superficial knowledge of C++.
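A rough illustration of the same class of gotcha, sketched in Python rather than C++ (an assumption made only to keep the example short): `collections.defaultdict` behaves like the lookup described above, in that merely reading a missing key inserts a default value. The `config` contents and the `log_timeout` helper are hypothetical.

```python
from collections import defaultdict


def log_timeout(config):
    """Format a debug line; this looks like a read-only operation."""
    # Indexing a defaultdict with a missing key inserts the default
    # value (0 for int) as a side effect of the lookup.
    return "Timeout: %d" % config["timeout"]


# Hypothetical config that never defines "timeout".
config = defaultdict(int, {"retries": 3})

before = "timeout" in config
message = log_timeout(config)
after = "timeout" in config

print(message)        # Timeout: 0
print(before, after)  # False True
```

A line-by-line explanation that mentions the insertion would flag exactly the part a newcomer is likely to miss.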
> TBH I would expect a senior developer to just be faster reading the code.
That specific code? Sure. Real world code that’s gotten more convoluted over a decade of maintenance?
I have a dozen pages of notes I’ve taken on the apparent behavior of a single function, all of its call sites, all of the known possible kinds of states it might encounter, all of the categories of implications those states might carry and the kinds of downstream effects its return value might have. It’s not even a particularly large function as far as those go. The notes aren’t even complete, if I had to guess they’re 1/3 there. To be complete they’ll span not just several modules but cross package and language boundaries and even repo migrations. Not because the function itself is that complex (although it’s much more complex than I’d prefer), but because the universe of its inputs and usage is enormous and the history which produced it is long and just as convoluted. And, importantly, because the space of very similar behaviors and functions I’ve discovered has grown each time I peel a layer of the onion off.
Now when I go back to work, given I get to continue untangling this thing, having a tool which helps even explain this universe would be a godsend for actually acting on it in a reasonably safe way. Especially if it does so with consistent language. I could literally shove it into a database and query it, or do all kinds of other analysis. I can’t do that if I’m spending all of my energy trying to just describe the thing in my own words with incomplete understanding. I mean, I can. The documentation I’m describing literally began with me wrapping up the previous day’s notes with “humans built this, a human can understand it, model the damn thing”. But on what timeline? And is it a good use of my time to do it when a machine could probably do it more quickly and hopefully just as reliably at scale?
I’d be happy to share the evidence if my instinct proves out.
Really? The thing it does in the video is almost identical to something I’ve spent days doing manually. Even if it's a small part of the task, it does it in about a second. Why do you think it doesn’t do what I think it does?
The problem with complex code is usually branches / conditionality.
Also complex indexing or expressions.
This stuff very rapidly stops being expressible in narrative form because the details can't be "compressed".
The explanation starts to look like legalese and you realise the code itself is the more compact and legible expression.
Ideally an autocommenter will be able to take a hand matmult and write "compute the surface normal" (this is an example where apparently high local detail can map to good conceptual compression) but actually that is the best possible case.
When it comes to important but somewhat arbitrary stuff like shipping or tax rules, English will not speed comprehension and is less good than a targeted DSL or well thought out tables.
The core problem is that "sometimes there is no simple explanation".
That’s all fine and good but the code exists already and even if I think its domain is an exceptionally good candidate for a DSL I can’t snap my fingers and make it morph into what I wish it was. And I can’t get there gradually without understanding what it is.
I’m well acquainted with its complexity but I need to articulate it coherently to justify changing any of it, not just to myself but to my team and stakeholders. I’m not looking for a simple explanation. I’m looking forward to having tools help me navigate a very complicated explanation.
I also don’t understand the motivation here or upthread to splain the problem I’m solving away as some nihilistic unsolvable thing when I’m saying I see real prospects of this helping me. I mean, you’re welcome to whatever nihilism you see fit in your course of action but I personally don’t benefit from being told things I find potentially useful are not useful for me, actually.
So a better "output" might be a kind of chatbot that will answer questions about that particular piece of code. Basically it understands every detail including branches and you can ask whether a specific assumption is correct.
I don't think we're there yet (because complicated side effects etc. are a thing) but I would rather have such a solution than some text to read, after all I'd be using such a tool for code I cannot comprehend just by reading it.
It's possible that if you changed from translating the code to prose by hand to translating it with the tool, you would find that the "value" of that translation would be reduced. Basically, I think reading the code and mentally analyzing it enough to write prose is doing most of the work, and when you read the prose, it's only a reminder of the deeper analysis you've done in your head.
Yeah, I find the signature example kind of unimpressive because it can just pick up the meaning of the code from the function name and variables, which an experienced human can do just as easily. If it could get "validates a signature with secret key" when all the variable names are obfuscated it might be useful in some code bases.
Why could you not create a comment block above the code section in question and fold it? All IDEs nowadays support folding, including comment block folding.
To see what a section of code does, you unfold the comment block above it, read it, then fold it back.
I guess if the AI is good enough, your point makes some sense.
It's a good programming habit to doc your code as you write it.
If you do this in a structured way and code folding is used, the comments should get out of the way of the main code.
Another way to look at it is that it doesn't (afaik) have external context, so there are severe limits on what it could possibly infer from some code. Like you say, if something unusual is being done because of a library, or business rules or something, "AI" cannot take this into account. There may be some cases where something non-obvious can be distilled out of the code, though I agree that the stuff you can infer without context is mostly self-evident from the code anyway.
Edit: just thinking, the canonical example would be something like the fast inverse square root from Quake. Is it going to summarize or is it going to tell you
i = 0x5f3759df - (i >> 1);
// Shift i right by one and subtract it from 0x5f...
(It would be cool if it does work here. But even if it does, when someone makes up a new trick like this, it couldn't possibly comprehend why it's being done.)
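For context, the quoted line sits in the middle of a routine along these lines. This is a Python sketch of the widely circulated Quake III routine, using `struct` to reinterpret the float's bits; the function name is an assumption, and Python's double-precision arithmetic in the refinement step differs slightly from the 32-bit original.

```python
import struct


def fast_inverse_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for positive x using the bit-level trick."""
    # Reinterpret the 32-bit float's bits as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # The "magic" line: a cheap first guess at the answer's bit pattern.
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer bits as a float again.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step to refine the guess.
    return y * (1.5 - 0.5 * x * y * y)


print(fast_inverse_sqrt(4.0))  # close to 0.5
```

A useful summary here would say "approximates 1/sqrt(x)", while a literal one would only restate the shift and the subtraction.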
Fortunately for me, at my current job I don't see a lot of code that makes me say "what the hell is this even trying to do?" The most useful comments are the ones explaining limitations and why this implementation was chosen, or business rules that wouldn't be apparent in the code.
Seeing as how its knowledge base is built from similar code pulled from the internet, I'd find it very likely to get the reference and interpret that as the Quake fast inverse square root. Same with general context/library stuff - though obviously that's conditional on the quality of its training data.
e.g. it would probably "understand" common devops/organization/OS/library stuff even if it's not part of the language it's presumably reading - just because other users probably left comments on those lines similarly - but it's not gonna necessarily understand your application-specific business logic beyond what's actually happening. Would need some very specific examples though (by definition lol). Even dumb stuff like interpreting "make the div spin around" from some CSS/JS would probably work, as someone somewhere probably coded that similarly.
I'm always amazed by what AI can do but have never been amazed by the actual output (amazed within the context that this is computer generated). Not just Copilot but any tool that has too much intelligence of its own (DALL-E, Midjourney, etc.) feels like this, because it reminds me of a person with great talent for compositing stuff who doesn't know what they are doing.
It's the situation of AI-generated papers getting published in prestigious journals, all over again. At a glance it looks amazing, but the machine definitely doesn't have any kind of intelligence and the output is actually worthless.
This AI stuff works really well when it does something very specific that is quickly inspectable by a human, for example generating interpolated frames in videos, extending a pattern, or detecting anomalies.
The moment it strays away from human control it fails amazingly well.
You are grossly over-simplifying what this is doing. Nowhere does it say anything as simple as setting x to y. In nearly each case it takes the context of the variables into account and states the meaning of the function calls, not the meaning of the syntax.
Yes, the examples I gave are gross over-simplifications. But look at the attached video. It literally just gives a 1-1 mapping of each code line to an English sentence, and that code is pretty trivial to understand in any case. I mean
I still argue that developers should be able to read the raw code like that faster than turning it into an English sentence. I'm not saying the text generation isn't extremely impressive, I just don't think it's that useful.
Even for experienced developers, it is useful when navigating a new codebase (where you don't know what each API call does) or even a new programming language (where a particular construct may be unfamiliar, like Rust's if let). That's supposing the tool is accurate, of course (incorrect results may make the tool useless or harmful).
But if you're discounting the usefulness for beginners I think this is a mistake. The experienced devs of tomorrow are the beginners of today.
The tweet and video don’t seem to imply this _should_ be a comment.
I have been “learning to program” for 20+ years and would absolutely find this useful as a quick way to get basic information about a chunk of code I’m unfamiliar with.
Not that learning to read code isn’t important, just not always necessarily worth the time (:
Agree. I find, what you usually want to understand is not the "what" or "how" but the "why", and that is quite a bit harder to automate than translating syntax into natural language statements.
Yeah, the 'why' often literally doesn't exist as information in the code anywhere, e.g. business rules, domain-specific knowledge, etc. If you're extremely lucky it might exist as 'why comments' :)
I do wonder if ML could be applied if you actually also train it on your particular (large) application, where it might be able to 'cross-reference' domain knowledge present in commented code to other similar code. (But then you might want to just remove the pseudo-duplication anyway.)
I caused a bit of an "incident" at a web dev company I worked at many many years ago by removing commit access from one of the "technical managers", whose hobby was removing every single comment he could find.
"But the code shouldn't need comments," he'd complain, "it should be obvious what it does otherwise it's just bad code!"
Yes, you dobber, it *is* obvious from the code, the *how* is obvious, but the why and the what might not be. The comments explain what it's doing stuff to and why, and in particular why you'd want that and why it's important. Disk space is free, so in a big long comment just write up what that particular bit of business logic is intended to achieve and what it expects as inputs and outputs.
Absolutely, and he could not be convinced that if the code and comments got out of sync then it was likely that the function name would no longer make sense either, and that's when we get real trouble.
There were horror stories with that codebase, like index.php was set as the default error handler, started by immediately returning 200 OK before going anywhere near URL dispatching, and then displaying a page that said 404 despite returning 200 if it couldn't find the URL.
The only situation where I see an AI generating this type of comment being useful would be deciphering an obfuscated C challenge program. Or maybe Perl.
Yep. Nothing particularly mind-blowing about this. It's just a word-by-word translation of code into English. Heck, you don't even need GPT-3 to do this, except for some variety and grammatical correctness.
What I like about it is that it could help me understand a new language. Sometimes it’s easy to follow what’s going on, but sometimes there are just odd language conventions that I’m not used to.
I just don't trust it, I've worked with GPT-3 before and it sure does a real good job of sounding convincing, but if you don't understand the code there's no way to know if what it's saying is accurate, or whether it's just regurgitating random nonsense that sounds plausible.
It knows how to create sentences that sound like something a human would write, and it's even good at understanding context. But that's it, it has no actual intelligence, it doesn't actually understand the code, and most importantly, it's not able to say "Sorry chief, I don't actually know what this is doing, look it up yourself."
The Underhanded C Contest is a great practical demonstration that even "biological intelligences" have a difficult time reading and summarizing code. I wouldn't trust this thing further than I do comments, but I could see it being equally as useful.
There's an example in this example. The line it translates as "we're getting the raw request body" doesn't work on multipart/form-data.
I can easily imagine the reason you're looking at an unfamiliar function in an unfamiliar language (hence needing such a line-by-line translation) is that there's some sort of bug and that edge case is exactly why. The tool would mislead you into thinking it's one of the other lines, because of how simple its translation is.
This is also the story of the past several millennia of "progress" in human translation between natural languages.
You trust a human translator because their ability has been partially verified by an authority (language certificate) and many people have employed their services with few complaints. Similarly, you trust a translation program because it was partially verified (hidden test cases) and many people use it with an acceptable amount of complaints (given its convenience).
False equivalence. It may not be any easier for you to verify a human translator's output, but that's a far cry from being structurally incentivized to produce output which appears fluent in the target language even when it is confidently wrong.
This is a prime example of the moving goalpost of what intelligence "actually" is - in previous eras, we would undoubtedly consider understanding context, putting together syntactically correct sentences and extracting the essence from texts as "intelligent"
Whether this thing is worthy of the label of "intelligent" or not is fairly uninteresting. What matters for something like this is its accuracy and if it can be trusted - that is what I think OP is getting at.
Have you ever read "A Canticle for Leibowitz"? A peripheral bit in the story has a monk develop a mathematical system to determine what word would come next in a manuscript whose edge has been lost. Walter M. Miller, writing that story in 1959, does not portray such a system as having or being perceived to have "actual intelligence", because he can easily imagine that a complex system could appear to work in that way without intelligence.
Does it do all that, or does it just pretend to understand context and extract the essence from texts? It looks as if it does because it follows the form you'd expect an answer to have if the person is intelligent. But when you look more closely, it often falls apart.
It reminds me of people who use "big words" without actually understanding them. If they don't overdo it or really miss the meaning of a term, they can seem much more educated than they are.
You're asserting here that it understands context, but you haven't provided any argument in support of that assertion.
I think you'll also need to define what you mean by "understanding" (because that term is loaded with anthropocentric connotations) and clearly state what "context" you think the model has.
How come general developer audiences aren't more acquainted with GPT-3 (and Codex in particular) capabilities? People in the twitter thread all seem completely mind blown over an app that basically just passes your code to an existing API and prints the result.
I don't want to sound negative, of course, and I expect many of these apps to come up until Codex stops being free (if they put it on the same pricing as the text DaVinci model, which Codex is a fine-tuned version of, it will cost about a cent per query). I'm just wondering how come the information about this type of app reaches most people way before the information about "the existence of Codex" reaches them.
For all the publicity around Codex recently (and especially on HN), it still seems like the general IT audience is completely unaware of the (IMHO) most important thing going on in the field.
And to anyone saying "all these examples are cherrypicked, Codex is stupid", I urge you to try Copilot and try to look at its output with the ~2019 perspective. I find it hard to believe that anything but amazement is a proper reaction. And still, more people are aware of the recent BTC price than of this.
Source: have been playing with the Codex API for the better part of every day for the last few weeks. Built an app that generates SQL for a custom schema, and have been using it in my daily work to boost my productivity as a data scientist/engineer/analyst a lot.
MS has been trying to get AI into intellisense for years now and I always turn it off.
The lack of control over it just makes it annoying. In many ways it's faster to just type out the algorithm than it is to lay the algorithm out and spend the time trying to understand what's there so I can successfully convert the code to what I need.
Then there's the lack of stability. Yesterday it did something different from what it's doing today, so I can't even use muscle memory to interact with it anymore.
Intellisense has _always_ had that annoyance factor of getting in your way sometimes, forcing you to write code in a certain way to minimize that. All this just makes it more annoying and I don't believe anyone who claims it truly makes them more productive.
FWIW I’ve been using Copilot for a while now and I have to do very little laying out. Usually, from just a name and context, it will give me 80% of what I want, and then it’s much quicker for me to edit it into the correct form if need be. My productivity has increased heavily because of the amount of rote boilerplate I can now completely obviate.
I think you should be careful to realize that though it may not fit for you intellisense is very helpful for a lot of people and that it may be your tastes as to what you find annoying that do not generalize. I for one don’t even notice the things you’re saying bother you because the mental overhead to me is very little. Just quick glance, tab to auto complete if it’s useful otherwise keep typing.
What you're describing in terms of boilerplate can be done via snippets.
Think of it like parenting.
I can either convince the little tyke to clean up his room or do it myself in half the time. Only, with babysitting I may take the time to teach them something; with things like autopilot I have no such motivation.
I don't even like autocompleting brackets and the like. It flat isn't consistent enough for me. It's ok-ish when writing fresh code, but completely gets in the way when editing code to the point that it's easier and faster for me to just disable it.
What I am talking about has nothing to do with Intellisense or your workflow. What I am saying is that, if someone in 2019 told you that there is a "thing" that is able to take a very complex sentence and with high accuracy (and awareness of the database details) generate 50 lines of SQL, using CTEs, complex JOINs, subqueries, string formatting, date manipulation, etc., you would have been amazed. That thing now exists, and it didn't exist before. It is a complete phase shift and it cannot simply be viewed as an incremental improvement. This is a whole different beast.
Using this beast as intellisense is just one application (called "Copilot") and it has all these annoyance factors sometimes. But I am not talking about that.
To me, this is like we found a way to transform iron to gold with low energy usage, and people are complaining that gold is not that useful, while most chemists haven't even heard the news. I'm constantly amazed by this, every single day, as I read threads like this one.
Thank god, finally found someone with the same issue. This is incredibly frustrating.
I'm also wondering why the news coverage of this is so abysmal. In the Netherlands there is very little if any awareness of this, in my bubble at least. (We all seem to be very aware of every goddamn bowel movement of every soccer player.)
Even my government seems to have only very recently become aware that using data and generally working in a structured manner is preferable to just winging everything all the time. Some departments are even starting to use basic statistics, which some even call AI. Nobody is quite sure what anything means or how to make sense of it, and all these high-level decision makers studied history, administration or something legal. Absolutely nobody with a clue about anything beyond '80s tech - if even that. It's downright disturbing to see this immense gap, and we are supposed to be somewhat advanced. But I digress...
I can only speak for myself, but it just hasn't been around long enough for me to properly trust any AI-driven tool to give me correct output for anything important.
I'll admit I haven't played with Copilot yet (since I don't think my employer would be happy for me to send off proprietary code to third-party servers, so I've effectively self-banned myself from using it at work*), but I'd feel that for anything non-trivial like your example of complex SQL queries I'd be reluctant to use the generated output without extra scrutiny (essentially a very fine-toothed code review, which is exhausting).
My opinion will probably change as the tools become more mature, but for now I'm treating them as toys primarily which limits the excitement.
Something like TLDR is less risky as it's not producing code, just summarising it, but I'd still feel wary to trust it since it's such a new field. Maybe this speaks more to my own paranoia than anything else!
EDIT: *and on this topic while I'm here: I'm actually a bit confused (and honestly... jealous?) on the topic of privacy for these kinds of external models. Is everyone who's using Copilot and tools like this working at non-Bigcos? Or just ignoring that it's sending off your source code to a third party server? Or am I missing something here?
It'd be against the rules to use external pastebins or other online tools that send off private source code to a server, so I'm kind of shocked how many devs are talking about how they use AI tools like this at work... is this just a case of "ask for forgiveness, not permission"?
Check out the SQL example I posted below. If you're interested, I'd be happy to post more. To me, this is not about "is the machine accurate enough already". Maybe it isn't and it needs to mature. But the door has been opened now, and it's only up to "technical details" now.
And I'm not saying this can replace developers, as it clearly isn't capable of building complete codebases and reasoning about the system as a whole. But writing self-contained code snippets seems like a solved problem to me, and I think that's the biggest thing to happen in our field in a long time.
Please point me to an "old" model that was able to do something similar to my example, and was general enough to do that not only for any custom schema, but for basically all often used languages as well (both natural languages as input, and programming languages as output).
It's possible to start with tests and then generate the code, using tests to ensure it is correct. Humans iterate many times on a piece of code: write, execute, compare with the desired output. AI should work step by step with executions and feedback too.
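A minimal sketch of that loop, assuming the "generator" is just a list of stand-in candidate functions (a real setup would ask a model for them, which is not shown here). All names are hypothetical.

```python
def passes_all(candidate, tests):
    """Return True only if the candidate gives the expected result for every test."""
    for arg, expected in tests:
        try:
            if candidate(arg) != expected:
                return False
        except Exception:
            # A crashing candidate counts as a failure, not as an error to propagate.
            return False
    return True


def first_passing(candidates, tests):
    """Execute candidates in order and return the first one that passes, or None."""
    for candidate in candidates:
        if passes_all(candidate, tests):
            return candidate
    return None


# Tests are written first; they specify "double the input".
tests = [(0, 0), (2, 4), (-3, -6)]

# Stand-ins for generated attempts: two plausible-looking but wrong, one right.
candidates = [lambda x: x + 2, lambda x: x * x, lambda x: x * 2]

chosen = first_passing(candidates, tests)
print(chosen(5))  # 10
```

The tests only filter out what they cover, so a candidate that passes is consistent with the tests, not proven correct.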
I’ve heard AI researchers describe this phenomenon before. As soon as something is discovered or invented it immediately becomes trivial and boring. The goalposts shift, and now they have to find the next amazing thing that will suffer the same fate.
Can you elaborate? To me (but I know very little about it) it seems like part of the incremental progress in the internet availability. What am I missing?
> Intellisense has _always_ had that annoyance factor of getting in your way sometimes, forcing you to write code in a certain way to minimize that. All this just makes it more annoying and I don't believe anyone who claims it truly makes them more productive.
I have the same issues with these tools, but the one situation I can imagine it being really useful is people who are good at reading and understanding code, but are slow typists. Or more particularly, people who have to think about typing, no matter what the speed is (though I think they're usually the slower ones). I believe it's only once you can type without thinking about typing, and have done it for a while, that these tools become an annoyance because you've gotten used to not interrupting your thoughts on the problem at hand.
I have no use for it, and don't expect ever having a use for it. 95% accurate, 99% accurate, and 99.9% accurate are all awful in this context.
It's something run repeatedly, so small chances will occur. Among its failure states are being very, very wrong in ways that are hard for a skilled human to detect without more work than writing from scratch.
And no one in the space is discussing ways to eliminate categories of bugs, only ways to reduce the frequency. Most of those solutions have the side effect of making the less frequent bugs harder to detect. On balance, that's worse.
And, less importantly, it's only useful for writing boring code that should probably be generalized to an API. Sure, I write plenty of that, but it's not an exciting area to follow in my spare time.
Assume it's per invocation, and each invocation generates a function of a few dozen lines. How many such functions do you write when you get in the groove? If you multiply it out, you'll probably end up expecting a few bugs a day at 95%, similar to what most humans might write.
Except you're pretty used to the sorts of bugs you write, and the AI isn't you. So these bugs will be harder to find.
So why is this better than writing by hand? Most of the hard work of programming is figuring out specs and debugging, not banging out well understood and specced implementations.
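Multiplying it out, as a sketch: the 40 invocations per day below is an assumed figure, not something from the thread, and each invocation is treated as independent, which is a simplification.

```python
def expected_bugs(accuracy, calls_per_day):
    """Expected number of wrong outputs per day at a given per-call accuracy."""
    return calls_per_day * (1.0 - accuracy)


def prob_at_least_one_bug(accuracy, calls_per_day):
    """Chance that at least one call in a day is wrong, assuming independence."""
    return 1.0 - accuracy ** calls_per_day


CALLS_PER_DAY = 40  # assumption for illustration only

for accuracy in (0.95, 0.99, 0.999):
    print(
        f"{accuracy:.1%} accurate: "
        f"{expected_bugs(accuracy, CALLS_PER_DAY):.2f} expected bugs/day, "
        f"{prob_at_least_one_bug(accuracy, CALLS_PER_DAY):.0%} chance of at least one"
    )
```

Under these assumptions 95% works out to about two bugs a day, which matches the "few bugs a day" estimate above.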
> It's something run repeatedly, so small chances will occur. Among its failure states are being very, very wrong in ways that are hard for a skilled human to detect without more work than writing from scratch.
Also makes for a great plot summary for the original Jurassic Park
This is what amazes me, since it seems like such big news, and people in the field are just not aware of it. Just for reference of what I am talking about, here is a piece of code that was generated, without any cherry picking at all (you just have to trust me on this, sorry) by allowing Codex to be aware of the database with some smart prompting (this is on a DB with music store data):
Needless to say, this query works and returns the data I wanted. Whether this is useful or not is up for discussion. But I cannot understand how it's not amazing.
Ha, do I feel stupid now! Just checked, and it seems I misconfigured my system (the part that builds the Codex prompt from the schema info) - the Invoice and InvoiceLine tables got confused. After fixing it, it now works as expected (using IL.Total).
This non-cherry-picked example is exactly one of the problems people are talking about though: It easily creates something that looks correct, and the users sees what they expect, so things get through that wouldn't have if they thought through the problem themselves.
Yeah, agree - building and verifying apps built on this models is very hard as they are hard to test extensively. But so are most other ML-based apps. The error in that example was on my side, not on the model. I mean, to me this seems simillar to saying that high-abstraction programming languages are a problem because their compilers are hard to develop and probably loaded with bugs.
so it was smart enough to know that the total for the invoice could be safely used as the total sales for the artist because an invoice will only ever contain a single line item? no double counting because of multiple line items or misattribution because of different artists. that's a pretty crazy level of reasoning.
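For anyone who wants to see the double-counting concern in action, here is a small sketch using Python's built-in sqlite3. The schema is a simplified, hypothetical one (the Artist column on InvoiceLine, the cent-valued columns, and all the data are assumptions for illustration; the real music store schema is not shown in the thread).

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory DB with one invoice that has three line items."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, TotalCents INTEGER);
        CREATE TABLE InvoiceLine (
            InvoiceLineId INTEGER PRIMARY KEY,
            InvoiceId INTEGER REFERENCES Invoice(InvoiceId),
            Artist TEXT,              -- hypothetical shortcut column
            UnitPriceCents INTEGER,
            Quantity INTEGER
        );
        INSERT INTO Invoice VALUES (1, 300);
        INSERT INTO InvoiceLine VALUES (1, 1, 'A', 100, 1);
        INSERT INTO InvoiceLine VALUES (2, 1, 'A', 100, 1);
        INSERT INTO InvoiceLine VALUES (3, 1, 'B', 100, 1);
    """)
    return conn


def sales_using_invoice_total(conn: sqlite3.Connection) -> dict:
    """Sums the invoice-level total once per joined line, so it over-counts."""
    rows = conn.execute("""
        SELECT il.Artist, SUM(i.TotalCents)
        FROM Invoice i JOIN InvoiceLine il ON il.InvoiceId = i.InvoiceId
        GROUP BY il.Artist
    """)
    return dict(rows.fetchall())


def sales_using_line_amounts(conn: sqlite3.Connection) -> dict:
    """Sums only the line items that belong to each artist."""
    rows = conn.execute("""
        SELECT Artist, SUM(UnitPriceCents * Quantity)
        FROM InvoiceLine
        GROUP BY Artist
    """)
    return dict(rows.fetchall())


if __name__ == "__main__":
    db = build_demo_db()
    print("invoice total:", sales_using_invoice_total(db))
    print("line amounts: ", sales_using_line_amounts(db))
```

In this toy data a single 300-cent invoice gets credited as 600 cents to artist A and 300 cents to artist B by the first query, while the line-level query gives 200 and 100. Both queries run without error, which is the point: the wrong one looks just as plausible.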
I'd love to try Codex if I could run it on a local GPU and finetune it for my own code. I'd even push to use it at work. But as we're writing in a niche language and our code is heavily dependent on the problem domain, I don't feel like making my workflow vulnerable to an external supplier, even aside from the IP concerns.
Call me when I can download and finetune the weights, like I can with Stable Diffusion.
The problem is that they don't understand the practical ways it can be used. Even tech savvy people don’t yet get it. Even my CEO, a kind of technical person, did not understand the full potential until I explained some use cases.
In one scenario, I took a long, slow-running MySQL query and rewrote it with Codex in 2 minutes.
But I think people have started to realize the potential now.
Pitch: My app https://Elephas.app brings GPT-3 and Codex to all applications on macOS. Many business professionals are using it.
The question is - can this actually explain the code which really needs explanation - or can it only explain code that should be easy and straight forward to read anyway?
And does having this reduce the discomfort that badly readable code creates, and thus make you less inclined to take care that the code is and stays easily readable?
TBF, that is actually a situation where the big pattern-matching trained system would probably easily find and regurgitate the correct answer, just from prior exposure to a very distinctive bit of code.
I'm assuming that if you throw enough training-data at it, it will have seen the same "equation" (or at least the constant) right after an explanatory code-comment.
Not only do I find the tool not useful, as it just states the obvious; in my opinion the code should already be very close to what the tool gives.
The code should be clear enough not to need such a tool. If you need it, you have a very different problem, my friend.
Yep. Not only does it generate useless comments, but for me it is actually easier to read the code itself than the generated comments in this case. I know neither the language nor the framework they use, yet it is still completely readable.
It's called TLDR, but you actually end up reading even more than the original. I could be wrong, and this might actually work and condense a big function, but if that is true, why showcase such an example?
It’s helpful if you can read English but the code is difficult to understand. Most explanations of code are more verbose than the code they’re explaining because code is usually pretty terse compared to natural language.
You can think of “too long” as referring to the time it might take someone to reason out a particularly terse, dense line of code versus the actual length of the code.
Something like this could be helpful if the stumbling block is the syntax. If the output consistently looks like the example, though, it's not going to be much help explaining the longer tail of straightforward code that simply implements hard-to-understand logic.
I can see something like this being useful to explain dense, ungooglable code, like regex, or maybe APL. That said, I couldn't really trust current-generation ML to actually produce a correct explanation instead of being confidently and wildly wrong.
In this particular example, at least for me, it is easier to read the original code, even though I don't even know the language they use. And I'm pretty sure this is the case for most developers with non-zero experience.
I know there's such thing as idiomatic code, but I can't help but think the code in the tweet would be much more readable - and no ai needed - if the variables/methods/args were better named.
I think a tool like this can make sense if you cannot read the code.
Probably this still gets confused like humans do when variables and functions are named less clearly or even plainly wrong.
I wonder if reading an explanation like this makes you more likely to believe the code is correct, even if some details are wrong.
In this signature example, you can read the wrong header, calculate the hash the wrong way, compare the hashes the wrong way, etc. There are so many tiny mistakes to make.
I don't know that I have much of a need for this and, although I'm hesitant to provide crutches to people especially when they're in the early stages of their learning, this might be helpful for more junior people who are ramping up, especially in a large project. Is there a way to use this or something similar today in PyCharm, etc?
I wrote something similar before. My friend had a nice technique for doing code analysis and removing everything but the critical path to the point in the code where you had your cursor. Then I fed that code path into GPT-3 to generate an explanation of that critical path.
Wound up being useful for explanations of long code paths across file boundaries in large code bases.
I know it's completely missing the point here, but: it's a good habit to verify signatures using a constant-time comparison rather than == to avoid timing side-channel attacks :)
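For reference, a minimal sketch of that habit in Python, using the standard library's hmac.compare_digest. The function names, the choice of HMAC-SHA256, and the hex encoding are assumptions for illustration; the code in the tweet may use a different scheme.

```python
import hashlib
import hmac


def sign(secret: bytes, payload: bytes) -> str:
    """Return a hex HMAC-SHA256 signature (assumed scheme, for illustration)."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, payload: bytes, received_signature: str) -> bool:
    """Compare signatures without short-circuiting on the first mismatch."""
    expected = sign(secret, payload)
    # compare_digest is designed so its running time does not reveal
    # where the first differing byte is, unlike a plain `==` comparison.
    return hmac.compare_digest(expected.encode(), received_signature.encode())


if __name__ == "__main__":
    signature = sign(b"demo-secret", b"hello")
    print(verify_signature(b"demo-secret", b"hello", signature))
    print(verify_signature(b"demo-secret", b"hello", "0" * 64))
```

Encoding both strings to bytes before comparing avoids the TypeError that compare_digest raises for non-ASCII str arguments.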
Looks great. It reminds me of our product, https://denigma.app, which explains the business logic of code and technical concepts, and which recently launched an extension for all IntelliJ-based IDEs (except Android Studio; there's a compatibility issue with it).
I use this and have pushed it to my project managers as a way of getting some insight into code and, more specifically, into code changes that we are making and need to explain to clients.
I bet it would absolutely shine for regex. I can’t think of a more obvious use case, where information density about the state space is so high and the implications are so tuned for computer interpretation instead of human readers. I say that as someone who quite likes regex but realizes most humans don’t.
A tool that could provide example inputs for a regex could be really useful, especially if it could also provide similar example inputs where it doesn't match, and evaluate edge cases.
Before clicking through I thought this was for this great command line tool [0]. I'm skeptical about GPT-3 generated comments, but I can recommend the other TLDR wholeheartedly!
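Generating the examples automatically is the hard part, but the checking half is easy to sketch. The helper below (a hypothetical name, with a hand-picked date-like pattern and hand-written candidates) just sorts candidate inputs into matching and non-matching piles using Python's re.fullmatch.

```python
import re


def classify_examples(pattern: str, candidates: list[str]) -> tuple[list[str], list[str]]:
    """Split candidate strings into those the whole pattern matches and the rest."""
    compiled = re.compile(pattern)
    matching: list[str] = []
    non_matching: list[str] = []
    for text in candidates:
        # fullmatch requires the entire string to match, not just a prefix.
        if compiled.fullmatch(text):
            matching.append(text)
        else:
            non_matching.append(text)
    return matching, non_matching


if __name__ == "__main__":
    date_like = r"\d{4}-\d{2}-\d{2}"
    hits, misses = classify_examples(
        date_like,
        ["2023-01-15", "2023-1-15", "2023-13-45", "2023-01-15 "],
    )
    print("matches:    ", hits)
    print("non-matches:", misses)
```

Note that "2023-13-45" lands in the matching pile: the pattern only checks the shape, not whether the date exists, which is exactly the sort of edge case such a tool would need to surface.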
Code can have ambiguous scope too, except the compiler has a decision making process that you might not fully understand. (Closures are famous for this problem).
In English you can recognize the ambiguous parse and reject it.
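A common illustration of the closure point, sketched in Python (the function names are made up for this example): the language's rule is perfectly well defined, but it is easy to read the first version and expect the second one's behavior.

```python
def make_getters_late_bound():
    """Each lambda looks up `i` when called, after the loop has finished."""
    return [lambda: i for i in range(3)]


def make_getters_early_bound():
    """A default argument captures the value of `i` at definition time."""
    return [lambda i=i: i for i in range(3)]


if __name__ == "__main__":
    print([getter() for getter in make_getters_late_bound()])
    print([getter() for getter in make_getters_early_bound()])
```

The first list comes out as three copies of the final loop value, while the second yields 0, 1, 2.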
The meaning of code is defined by rules. The only ambiguity exists in human understanding of code.
Natural language doesn't have such rules. My example above isn't an example of incorrect English, there isn't a correct interpretation. The grammar is correct and both interpretations are correct.
Indeed. And if well written, the majority of the code shouldn't need comments. For example, well-chosen descriptive identifier names can make code fairly self-documenting.
The comments should then be reserved for the cases where it is not obvious why or how something is done, or for assumptions that can't be expressed through code (type system or similar).
So instead of having the following
// set i equal to five
i = 5;
either skip the comment, or, since 5 might seem arbitrary, comment why 5:
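As a purely hypothetical illustration of such a "why" comment, written in Python (the variable name and the reason given are invented for this sketch, not taken from the thread):

```python
# Why 5: (hypothetical reason) the upstream service locks a client out after
# five rapid attempts, so retrying more often would only make things worse.
max_retries = 5
```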
nope
This is a variation of "don't write obvious comments in your code". Useful comments can include the "whys", links to design docs/decisions, and references to other parts of the source code. Here we see a case of commenting everything for the sake of commenting, without adding any additional knowledge or insight.
> X does something that is commonly done, in a different way than Y. This means X is useless
It's still not clear how this explanation is supposed to be sufficient to explain the conclusion.
If a comment is collapsed in an IDE and you spend 1 click/key combo to expand it, versus some key combination to pull up the TLDR, the difference is what?
Comments being embedded IN code could be a thing of the past. No PR necessary to maintain them. Sounds like an improvement to me.