Cheating is All You Need (sourcegraph.com)
405 points by iskyOS on March 23, 2023 | 354 comments



> LLMs aren’t just the biggest change since social, mobile, or cloud–they’re the biggest thing since the World Wide Web. And on the coding front, they’re the biggest thing since IDEs and Stack Overflow, and may well eclipse them both.

I personally feel the technology is over-hyped. Sure, the ability of LLMs to generate "decent" code from a prompt is pretty impressive, but I don't think they are bigger than Stack Overflow or IDEs.

So far my experience is that ChatGPT is great for generating code in languages I am not proficient in, or when I don't remember how to do something and need a quick fix. So in a way it feels like a better "Google", but I would still rank it as inferior to Stack Overflow.

I am also hesitant, for two main reasons, about the statement that it makes us 5 times as productive because we only need to "check the code is good":

1. It is my belief that if you are proficient enough in the task at hand, it is actually a distraction to be checking "someone else's code" over just writing it yourself. When I write the code, I know it by heart and I know what it does (or is supposed to do). At least for me, having to create prompts and then review the code they generate is slower and takes me out of the flow. It is also more exhausting than just writing the thing myself.

2. I am only able to check the correctness of the code if I am proficient enough as a programmer (and possibly in the language as well). To become proficient I need to write a lot of code, but the more I use LLMs, the fewer repetitions I get in. So in a way it feels like LLMs are going to make you a "worse" programmer by doing the work for you.

Does anyone feel that way? Maybe I am wrong and the technology hasn't really clicked for me yet.


I don't quite know how to put it, what follows is a rough draft of an idea, maybe someone can help me to reword it, or perhaps it's trash.

Since its inception, computer science has had two "camps": those who believe CS is engineering, and those who believe CS is mathematics. The reason why we are seeing all of this fuss around LLMs is that they are a new front of this feud. This "extends" the usual debate on emerging technologies between Thymoetes and Laocoon.

Something that works 99 times out of 100 is 99% correct from the first perspective and 100% wrong from the second.

LLMs are therefore a step forward if you take the first view, a step back if you take the second.

If you accept this interpretation, an interesting consequence of it is that your outlook on LLMs is entirely dependent on what amounts to your aesthetic judgement.

And it's very hard not to have rather strong aesthetic judgements on what we do 40 hours a week.


Camp 3: Those of us who have viewed coding as a craft.

Math - the study of well defined concepts and their relationships. Solving problems with proofs.

Engineering - solving well characterized problems based on math and physics (which can include materials with known properties, chemistry, approximations, models, …), and well defined areas of composability (circuits, chemical processes, structural design, …)

Craft - solving incompletely characterized problems with math, physics, engineering and enormous amounts of experience, intuition, heuristics, wisdom, patterns, guesses, poorly understood third party modules, partial solutions pulled from random web sites …

Art - Solving subjective problems by any means necessary.


> Engineering - solving well characterized problems based on math and physics (which can include materials with known properties, chemistry, approximations, models, …), and well defined areas of composability (circuits, chemical processes, structural design, …)

Eh, I think you’re overselling how precise and well defined engineering is in other fields. Engineering in other fields is just as much dealing with poorly characterised problems as it is when writing code (it takes quite a lot of characterisation to go from “we want a bridge here” to an actual damn bridge, and that’s all an engineer’s work).

Really the core of engineering is just a very broad set of practices and principles that allows people to solve poorly characterised problems using maths, physics, enormous amounts of experience, intuition, heuristics, wisdom, patterns, educated guesses, etc. in a reasonably consistent and repeatable manner. Doesn’t matter if you’re building a web browser, a motherboard, or a bridge. You don’t get a good result without a healthy dollop of wisdom, experience, educated guesses, and a handful of fuckups (which hopefully you notice before you let people use the thing).

Engineering in other disciplines is no less messy, haphazard, and experimental than it is in software. It just isn’t as publicly documented as it is in software, probably because it’s hard to build an open source bridge.


Well, most engineering fields have well defined applied math that spans from the problem domain to the solution domain.

Logic in digital circuits.

Algebra and calculus for analog circuits, most physical objects, properties and processes.

Differential equations for dynamical systems and dynamical behaviors.

Sure there is a lot of creativity in engineering, but there is usually a whole area of math known to be suitable for expressing solutions clearly, given the area of engineering.

Contrast with the utter lack of standard notation across software tools and implementations, for describing all the trade offs, gotchas, glue, historical drift & complexity, theories of memory, caching, user affordances, potential overflows, races, etc. that is implied by a program’s code.

Sometimes a language provides islands of engineered code, like message passing in Erlang, or memory management in Rust, or a precise mathematical library like BLAS.

But most aspects in most software programs are created ad hoc, or inherited from someone else’s rats nest of an implementation, and never formalized completely, if at all!

Any clarity in representation quickly leaves planet applied math.


> Sure there is a lot of creativity in engineering, but there is usually a whole area of math known to be suitable for expressing solutions clearly, given the area of engineering.

Take it from someone who’s studied the maths you’ve described and applied it in a professional capacity. Just knowing the maths isn’t anywhere near enough. It’s like knowing how quick-sort works, interesting and useful, not even close to enough to actually build anything.

> Contrast with the utter lack of standard notation across software tools and implementations, for describing all the trade offs, gotchas, glue, historical drift & complexity, theories of memory, caching, user affordances, potential overflows, races, etc. that is implied by a program’s code.

The entire field of computer science is dedicated to developing and using this type of notation to describe and understand the basic principles that underpin every programming language, database and algorithm you’ve ever touched. The notation exists, you just don’t use it. In the same way an engineer in another field doesn’t bother doing structural analysis, or circuit analysis from first principles, they just grab a pre-finished tool and apply it to their problem. Normally by just passing the problem to a computer that does all the heavy number crunching, and checking the outputs make sense.

> But most aspects in most software programs are created ad hoc, or inherited from someone else’s rats nest of an implementation, and never formalized completely, if at all!

> Any clarity in representation quickly leaves planet applied math.

Again, I think you’re vastly overestimating the precision of other fields of engineering. There might be more rigour in design in some places, but only because “just making a thing and see if it works” is expensive, but other engineers absolutely spend huge amounts of time just making things to see if stuff works.

There’s a reason why “safety factors” are a thing, and a reason why they’re usually 10x or greater. That safety factor is also the “we’re not sure how well this works, so we made it ten times stronger than we think we need, just in case” factor. Engineering in other fields isn’t people doing maths all day long; it’s mostly reading data sheets, assembling Lego brick components, and hoping to hell the manufacturer didn’t lie too much on their data sheet. Plus some design and simulation on a computer for good measure.

You wanna see “ad-hoc or inherited from someone else’s rats nest of an implementation” in a different field of engineering? Then go look at any electronics catalog, or look up YouTube videos of people testing parts against the spec sheet and discovering how wildly different they can sometimes be.

Dodgy, badly implemented, never formalised engineering exists everywhere. That’s why bridges collapse when they shouldn’t (Genoa Bridge), why planes crash when they shouldn’t (Boeing 737 Max), why cars emit more emissions than they should (VW), why buildings get emergency modifications after being built to prevent them from being blown over (601 Lexington Avenue). Software engineering does not have a monopoly on botches, last minute hacks, and dodgy workarounds. Engineers in other fields were merrily employing all of them to great effect for centuries before software turned up.


All four are prevalent in our industry.

I recall a self taught dev (or maybe from a bootcamp) coming up with a cascade of nested if-else, nested 8 deep. Someone with a background in CS asked him what he was trying to do and basically concluded that what he was trying to do could be expressed as a state machine. To which the initial dev replied that it was "way too fancy" and that he didn't need the code to be fancy, just work.
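(For anyone who hasn't met the idea: a deeply nested if-else cascade over a handful of conditions can usually be rewritten as a lookup over (state, event) pairs. A minimal Python sketch; the order states and events below are invented for illustration, not taken from the anecdote.)

    # Hypothetical order-processing flow: the same branching a nested
    # if-else cascade would encode, written as a table-driven state machine.
    TRANSITIONS = {
        ("new", "pay"): "paid",
        ("paid", "ship"): "shipped",
        ("paid", "refund"): "refunded",
        ("shipped", "deliver"): "delivered",
    }

    def next_state(state: str, event: str) -> str:
        try:
            return TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"event {event!r} not allowed in state {state!r}")

    state = "new"
    for event in ["pay", "ship", "deliver"]:
        state = next_state(state, event)
    print(state)  # delivered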


+9000 for your post.

The ugly part: The nested-8-deep solution was faster to market and costs less. And, it will be thrown away in 6 months during the "big rewrite after we scale". So the perfect-is-the-enemy-of-good state machine solution written by an expensive engineer has less value. Oompf.


Possibly. Or not. It could be that the elegant expression can be written in less time by the expensive engineer, doesn't need to be rewritten in 6 months, and it may be that both of these people are getting paid the same anyway. So you can make up any "just so" story that you like about a made-up anecdote. If programs A and B implement the same function and both have adequate performance, then the differences between the two artifacts come down to style.


> If programs A and B implement the same function and both have adequate performance, then the differences between the two artifacts come down to style.

What about maintainability? Extensibility and ease of debugging?

I've seen chunks of projects re-written, just because it was simply impossible to extend them without significant efforts!


Well yes, that's what we're talking about. I mean that the compiled artifact is equivalent, but of course you may have reason to care about the style of the code.


> I mean that the compiled artifact is equivalent

Are they? Source readability will still matter if you end up using a debugger!


I have a camp 4: viewing coding as a roadblock. A necessary obstacle to achieving some result. This is how companies view programming. They don't know about it, don't care about it, they just know they have to do a lot of it to produce their next product.


But what you are saying is that all economic activity is a roadblock.

The division of labor into software developers and non software developers isn't any different than farmers vs non farmers or any other profession.


Tackling roadblocks as they come up in service of creating a better product … that sounds a lot to me like engineering, except with some words twisted around?


Coding is clearly all those things.


I cringe every time I see someone use the term "craft" in the context of computer programming. It feels like a desperate attempt to seek adoration as a "sub-genius" in a poorly understood field. Computer programming is hard because the platform is either poorly documented or changes very quickly. Too much of my professional work is about finding a "super hero" (their term, not mine) solution to a poorly documented problem. It is tiring, and not-at-all heroic. Coding bootcamps have taught us that when you tear down the gatekeeper walls, more people can write CRUD apps (that the world actually needs) than ever thought before.

As a counterpoint: Look at the history of the libxcb: https://en.wikipedia.org/wiki/XCB

    [Bart] Massey and others have worked to prove key portions of XCB formally correct using Z notation.
That sounds like math to me. Or is it "craft"?


> poorly documented or changes quickly

[otherwise a (code) monkey could do it] is missing the point of programming.

I know managers who code (occasionally) who think similarly. I thought so personally before I had actual prolonged experience with professional programming.

It is hard to express concisely why it is fundamentally wrong (a category error: like thinking that Perl regexes can be reduced to a DFA; it is impossible in the general case, even if DFAs can [sometimes even should] be used in many cases instead).
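(To make that concrete: Python's re module, like Perl's, supports backreferences, and "some string immediately repeated" is a textbook example of a language no DFA can recognize. A minimal illustration, assuming nothing beyond the standard library:)

    import re

    # Backreferences let a pattern match "any word repeated twice in a row",
    # which is not a regular language, so no DFA can be equivalent to it.
    doubled = re.compile(r"^(\w+)\1$")

    print(bool(doubled.match("abcabc")))  # True
    print(bool(doubled.match("abcabd")))  # False

    # A plain alternation like r"^(ab|cd)+$" is regular, and engines that
    # drop backreferences (e.g. RE2) do compile such patterns to automata.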

It is the same reason why waterfall programming fails most of the time. It is the same reason why generating code from UML diagrams produced by analysts is also a failure in the general case. It is the same reason why a log-normal distribution can be a good model for software estimation https://news.ycombinator.com/item?id=26393332

And no, you can't replace all programmers with a LLM prompt for the same reason (at least until [if ever] it reaches AGI and then humanity would have much bigger problems).

"agile" became a noun but if you look at the origins, you might get why "craft" may be applied to programming. Try "The Pragmatic Programmer" book.


We can agree that coding can involve math, engineering and craft. (And art!)

Mathy projects, formally driven: matrix multiply libraries, symbolic computation, constraint resolution, ...

Engineered projects, formally (or close to it) verifiable: 3D rendering pipeline, distributed database management, garbage collection process, ...

And craft. Which, based on the internet I have experienced, many apps unjustly inflicted upon me, and some memorable restarts between game saves, is most code.

Craft code necessarily involves amateur code, code which isn't economically worth engineering (when you can just throw unit tests at it. Or just wait for user reports!), code referencing weakly characterized libraries or interfaces, and code involving features that have become complex enough that the best reference model for their expected and unexpected behaviors is now the code itself.

Bzilion's Law of Coding Formalism Levels: "Any ambitious enough software project will descend into an exercise of pure desperate craft. Just before it becomes gambling."


I think this comment may miss the computer forest for the computer science trees. For a large portion of the world, computers aren’t engineering or math, computers are a tool to get something else done.

For those people, unless something fit within an existing (but large!) range of use cases, they were out of luck without having an engineer or mathematician figure it out for them. Suddenly, there is a glimmer on the horizon that all of that possibility the computer science people see every day could be unlocked for the users, and even if it only works 5% of the time, that is enough to get them excited in ways that are hard to describe to the computer science people.


This is a fantastic point, and it's what most software businesses have at their core. They just provide the tools to get something else done. A lot of these smaller places are going to be devastated when people become far more self reliant (or I should say reliant on the AI providers) than them.


My analysis is limited to tech people.

For the rest of the world, while some might be excited by what you describe (and if that works for them, that's great!), I believe in general the interpretation is far simpler: me like shiny.


This is some very CS high-horse thinking. I work with people who are already using it in ways that meaningfully improve their existing workflows. It isn’t doing anything special to someone who makes a living on computers, but it is doing things they couldn’t do without those people.


Genuine question, are you seeing non-coders using it to do any useful coding? As a developer, I find that it is wrong more often than it is right and were it not for my domain knowledge, I would have no idea why (or sometimes when.)


/r/iamverysmart


A “tool to get things done” doesn’t seem to contradict the math or engineering point of view. Which is to say, a screwdriver is an engineered device that is also a tool (it also has a mathematical description, just a fairly boring one from a pure math point of view, I guess).


Sure, but the relationship is different.

Imagine going to school, a boot camp, or being self taught in everything about screwdrivers and screws. You can discuss at length the advantages and disadvantages of different shapes (Robertson bits > all), materials, screw threads, etc. You can custom design a screwdriver and screw for a specific application, taking into account all of the relevant constraints.

Now imagine the guy who needs to tighten a loose cabinet door.

Screwdrivers don’t have nearly the complexity or ability to generate work leverage that computers do, moving even a few percent of those capabilities from the first group to the second is huge. It is, at minimum, Excel huge.


That's a great analogy which I will steal.


This nails it on the head pretty much for me. I'm personally hugely interested in the potential of LLMs to enable me, a non-coder, to create programs that might only have marginal utility to others, so they're likely not going to get built by anyone who actually knows how to do this stuff, and they aren't exactly important enough for me to actually learn how to code (I don't really have the right type of brain for it anyway). But they are interesting / useful enough to me that I'll figure out how to get LLMs to make them for me, as I don't really care how they work as long as they do.


A personalized sociopathic fabulist for all!


> the usual debate on emerging technologies between Thymoetes and Laocoon.

Could you expand on this?

> Something that works 99 times out of 100 is 99% correct from the first perspective and 100% wrong from the second

Interesting. From a _manufacturing_ perspective, you can't achieve 100%, you can only get asymptotically closer to it with statistical process control. And of course there are limits to the perfectibility of humans.

This suggests that the big deployment of AI will be in areas where there is no clear boundary between right and wrong answer.


>> the usual debate on emerging technologies between Thymoetes and Laocoon.

> Could you expand on this?

Not important, it's just a rhetorical flourish. In the second book of the Aeneid, Thymoetes is the guy who says (paraphrasing) "let's bring the horse inside" and Laocoon is the guy who says (literally) "beware of Greeks bearing gifts".

> This suggests that the big deployment of AI will be in areas where there is no clear boundary between right and wrong answer.

"AI" is an umbrella term at this point. If by AI we mean LLMs or similar technology, then my hunch is to agree with the statement. I don't think this is particularly controversial though, IIRC Yann LeCun said something similar.


"Timeo Danaos et Dona Ferentes"

Roughly, "I fear Greeks even when they come bearing gifts".


There's never been a camp of computer science that said anything but the truth, which is that CS is applied mathematics.

However, there is a pragmatic school of hacking, which says that results are all that matters. If you're in a startup, you should be pragmatic, and worse is better.

Nobody truly believes that CS is engineering.


Software Engineering is engineering


Software engineering is actually not engineering either.


TIL I need to tell my school to change my degree on diploma

What reasoning are you using to come to the conclusion that software is not engineering?

>The creative application of scientific principles to design or develop structures, machines, apparatus, or manufacturing processes, or works utilizing them singly or in combination; or to construct or operate the same with full cognizance of their design; or to forecast their behavior under specific operating conditions; all as respects an intended function, economics of operation and safety to life and property

It is purely software engineering.


Because engineering is a specific discipline that balances physical force, the nature of materials, and costs, to produce a physical thing (a building, a bridge, a sewer system, a reservoir...). "Software engineering" is a metaphor for the body of knowledge and ability to design and construct software systems. Unlike engineering, there aren't that many right answers.

Whenever you have to qualify a noun with something else, the result is something narrower than the original noun, and often completely different:

- Software Engineering is not Engineering

- Street Justice is not Justice

- Covert Intelligence is not Intelligence


>Unlike engineering, there aren't that many right answers.

SE is way, way younger than other engs.

>to produce a physical thing

Why physical thing would be a requirement here?


Because that's GP's attempt to differentiate it as it cannot be done any other way ;)


Physical things are obviously subject to physical laws. With software, things may be less clear. Sometimes they are not. The key part of engineering that's highlighted by such cases is that mathematical models are relevant. This is why an engineer sometimes has to say "no, that won't work".


Software is subject to physics and math/cs laws

>This is why an engineer sometimes has to say "no, that won't work".

But similar scenarios can occur in the SE world too, so what does it even mean? Like some unsound design of a distributed system.


Then if you see yourself as a professional and if you have a strong character, you say it. If you're a real engineer, you and everyone knows this is part of the job, but if you're a SE, generally speaking you're going to meet resistance. Your resignation letter should be ready in your pocket.

The surprising thing to many of us is that the "software engineer" title has been specifically associated with deprofessionalization of the job and the view that is moving away from true engineering. See the Dijkstra link I posted elsewhere in this thread.


I think your definition of engineering is a curious mix of "No true Scotsman" fallacy and a bit of circular reasoning.

My parents were very respected traffic engineers. One was concerned with traffic inside cities, the other with road design. One couldn't care less about "nature of materials and physical force" while figuring out where to put traffic lights and close roads, while the other had a set of standards (say a lorry's turning radius, sign sizes, etc.) which they applied without thinking about costs or even producing a physical thing. That was done by some other contractor, in a sense, anyway.

A friend of mine is an electrical engineer. He maintains electric motors and I guess he would fit your definition better. At the same time, how much does an electrical engineer have in common with a traffic engineer? Or say an air traffic engineer? Or a construction engineer? I would say what they mostly have in common is the "engineer" part, as they are doing completely different jobs in completely different industries. I might even go further and say that when someone says "Alice is an engineer", it is just a shorthand for "Alice is a mechanical [or some other] engineer".

So those were the differences, but let's see the similarities. When a traffic engineer gets a project to see if they can increase the traffic flow from point A to B, it usually follows the common pattern of finding out the current situation, figuring out a solution, testing/simulating it and then producing the project/solution on what should be done. Then, when the solution gets implemented, some details can be re-adjusted, etc. I did simplify it a bit, and I'm unsure of the exact English phrases, but I think you get my point. Software engineers that I know follow the exact same process. Oh, a database is overloaded, let's see what we can do about it. Can we give it more hardware? No, as that's too expensive? What about if we scale up/down the services/workers/instances/nodes/pods calling that database? Can we simulate that? What do the metrics say? Etc., etc. You see where I am going with this? In essence, there is no difference.

My point is, if you cannot call Software Engineering only by the word "Engineering", then you cannot call anything else (just) Engineering. You can dance around this however you like, and define Engineering either narrowly or widely enough not to include Software Engineering (just like you did), but at the end of the day this isn't a natural science like, say, Maths or Physics. You cannot say 2+2=5, but you can say Engineering is X. But then, let's be honest here, and I suspect this is the root cause of this: it's about exclusivity and status. Engineers, doctors, lawyers, electricians, plumbers and a myriad of different professions like having guilds (or similar organizations, speaking in general here), and like having artificial scarcity. This brings status, it brings money, it brings power and influence. It cannot work if everyone is allowed in. It's what's simply called gatekeeping. It's what's hiding behind a dismissive, elitist sentence I've heard too many times: "Oh, but he's no engineer". So if we're honest here we can call a spade a spade.


I'm old enough, and have been in the profession long enough to recall "software engineering" as a new term attempting to make the profession sound fancier. When I started we were programmers. We wrote programs. Sometimes we were also called "coders" though that wasn't understood outside the profession, where "programmer" was. Then the term "software developer" came along, and we were all "developers." Now, when I'm talking about my team of "engineers" I'm speaking of all my programmers. They create software, with all the design, authoring of code, and wielding of infrastructure that entails. Does that make the actual practice "engineering?"

Let's resort to definitions, with apologies.

Engineering

> Engineering is the use of scientific principles to design and build machines, structures, and other items, including bridges, tunnels, roads, vehicles, and buildings.[1]

That certainly doesn't make Software Engineers what are traditionally thought of as "engineers." So... on to Software Engineering:

Software Engineering

> Software engineering [is] the application of a systematic, disciplined, quantifiable approach to the development, operation and maintenance of software and the study of these approaches; that is, the application of engineering and computer science to software. [2]

That description does not fit what most of us do when we write or create software. Perhaps it should. We'd be better off in many cases. But programmers are often closer to gardeners than engineers in that we approach solving problems through empirical processes, testing and verifying as we go. TDD is an admission of the need to do just that. So is Scrum (empirical process control). There absolutely are cases where algorithm optimization using the latest research must be applied just so. There absolutely are cases where applying gradient descent and convolutional neural nets in specific fashions applies. Most of us are gardeners growing our systems, or carpenters building out a feature, though. Programmers. Software engineer makes it sound loftier, just like sanitation engineer makes trash collection sound loftier.

I will absolutely concede that I still use the "engineer" terminology on a day to day basis, my opinion on the internet notwithstanding. But it is a label that is weakly applied for fashion and connotation, rather than for its distinct meaning, and I suppose that is what I tilt at here on HN. Apologies for the pedantry.

[1] https://en.wikipedia.org/wiki/Engineering

[2] https://en.wikipedia.org/wiki/List_of_engineering_branches


Even on the pages you've linked, Software or Computer Engineering is still Engineering. You might say that it just doesn't fit the definition (from the same exact page), but then you can either take the argument with the Wikipedia authors or just think on it on your own.

I'm sure that, say, a contemporary civil engineer would scoff at calling Nikola Tesla an [electrical] engineer, just like you're virtually scoffing at calling a Software Engineer simply an Engineer.

Admittedly the software field is young, but does that make it "worth" less? To entertain your argument, do you consider a person programming PLCs an electrical engineer? What's the fundamental difference between a software programmer and them? I don't see any. So you cannot call one an [Electrical] Engineer and the other one "just" a Software Programmer.

A gardener is not an engineer. On a ship there are usually at least two departments: Deck and Engineering. The latter are the people in charge of making sure that the ship's systems are up and running at all times. There is no theory behind it, or pretty much any maths. It is mostly boring maintenance. And I would be surprised if anyone would call those people non-engineers. So why bring this up? Engineering definitely has a vague connotation with certain terms like electrical power, mechanics, devices, construction, materials, etc. As a gardener or a trash collector doesn't deal with any of those, no one considers a gardener an Engineer.

One can say, oh, it's title inflation, but that actually applies to any profession then. Or to put it differently, what makes a person an engineer? The work that person does, or the title that was given to them by some institution? Is it both? Is someone an engineer if they are a career politician? I sure know one, and the guy will always be legally an engineer. This is a classic "what makes art, Art?" question, which is another can of worms I'm not going to open.

So I'm not sure what you're arguing then, when all of this is on very shaky ground. My position is that it is just a matter of a title and social convention. If you're part of an engineering department, then I'm sure people outside of that department will call you an engineer. Inside of that department you might be a programmer of, say, a backend service, I could be a platforms guy, Alice could be a QA engineer, and Bob could be a hardware engineer. It just doesn't really matter in the end. All of those are just social constructs which change with time. Or if we take it to an institution level: in some countries you have legally recognized Software Engineers (as I found out in other threads), in a country like mine you don't, whether we like it or not. So as I said in my previous post, this isn't maths or physics, and it is just a social construct which varies all over the world. So arguing about it is like arguing about the definition of art. I like spending time on engineering more, so that's it from me ;)


You can define engineering in many ways, and none of them is equally right in every possible context in which the term is used (so it has this in common with every other word).

However, there are some ideas clustered around that seem to have something to do with it. Engineering on a ship has to do with electrical and mechanical systems, and in a highly constrained set of outcomes, essentially reducible to a single bit at any point in time. (Either the systems are up and running right now, or they aren't.) Within this system of evaluation of outcomes, it's not surprising if the engineers in this context aren't using maths every day but are engaged mainly in more practical actions. However, you can be sure that they know a lot of safety tolerances and operational characteristics of various ship systems that would be characterized as mathematical, even if they are mostly operating well within tolerances that make these operations routine.

One reason why people argue about "software engineering" is that there are authors who define it specifically to include scientistic or bureaucratic rituals that have no mathematical underpinning, while excluding the difficult mathy bits of engineering. The further you get from actual engineering, the more common this definition gets.

EWD1165 "There is still a war going on.":

https://www.cs.utexas.edu/users/EWD/ewd11xx/EWD1165.PDF


You can just read what Dijkstra said about it already 30 years ago, not much has changed, and I would have little to add.


If there isn't a possibility of someone dying if you make a mistake it isn't engineering.


Software used in medical devices, airplanes, and other applications where mistakes can lead to death does exist, and there have been tragic examples of software bugs killing people:

https://en.m.wikipedia.org/wiki/Therac-25


The people that write the software for such things are engineers.


If you have full expectation and support from your superiors that you may shut down software development operations for ethical reasons based on your technical expertise, you might be a real engineer.


Welcome to the real world, where software can get people killed.


Have you ever written such software?


because?


why


The software people who work on embedded systems like medical devices, speakers, sensors, electronics etc. are doing engineering all the time. Although I guess it's mostly EEs or computer engineers who do that kind of work, but many concepts of CS are still relevant there.


Engineering is just applied math too.


That is true of real engineering; unfortunately "software engineering" has been defined differently by some authors, with quite the opposite meaning.


I find your observation about the two camps in computer science quite compelling, and it got me thinking about another analogy that might further illuminate the LLM debate: the evolution of cities.

Urban development can be seen as a balance between careful planning (akin to the mathematics camp) and organic growth (resembling the engineering camp). A city designed with a focus on aesthetics and theoretical frameworks might be visually appealing, but it could lack adaptability. On the other hand, a city that grows organically may not be as cohesive, but it's more practical and responsive to its inhabitants' needs.

This parallel can help us better understand the emergent properties of LLMs, which arise from their complex interactions. By appreciating both the engineering and mathematics perspectives, we can gain a more comprehensive understanding of these properties.

Moreover, the balance between early adoption and risks, as seen in urban development, can also apply to LLMs. Early adopters of LLMs can tap into their potential, but they must also be aware of potential risks, such as biases and ethical concerns.

Oh yeah ChatGPT wrote this answer.


Good reminder of how vacuous even thoughtful-sounding writing can be.


Well, the interesting question is whether LLMs will enable programming to go back to being interesting.

So much of programming is rote boilerplate garbage simply linking things together and so little of it is actual creative thought. If LLMs could actually generate the rote boilerplate, programming would be soooo much better.

Alas, my optimism isn't that high.


The issue I see with attempting to claim that these are merely differences of opinion is that it only takes a single bug in your code for someone well-versed in exploitation to not just steal all your data but often replace your entire program with their own evil one. I spend quite a lot of my outreach efforts essentially having to explain to the people who think software development is somehow unrelated to math that once you accept a bug into your codebase the effects tend to be as non-local as accepting "1+1=3" into a math proof, resulting in lost privacy, lost money, or even lost lives.


Computer Science is, by definition, math.

Coding and software construction is engineering or craft, and is not Computer Science.

LLMs are neither. They are power tools for concept realization.

It's the difference between stone chisels and a suite of shop tools. We had pen and paper, or small steps up from those, and now we have LLMs.


FTFY: 20 hours a week !


I think this is mistaking the current .01 iteration with what the technology will be able to achieve. All sorts of groundbreaking technology looks like a minor improvement over the previously refined version until it gets implemented in a way that takes advantage of its strengths, as opposed to just being plugged into old workflows.

LLMs cannot be judged by their first few incarnations. What can be trained into them currently exceeds imagination. Imagination is our limiting factor.

And I don’t say that from the context of “I jumped on the hype train at the end of last year”. I remember reading the 2017 Google transformer paper and thinking “whoa, this is really happening.” The fact it happened in only 5 years is pretty impressive. I’m not sure many papers or innovations got my mind spinning quite like that one.


But there is an unanswered question of how far this technology can go based on its fundamentals. Coding is much like driving: you can't do 80% and let the human do the final 20%, because that final 20% requires reasoning about a well understood design that was implemented throughout the first 80%.

If your fancy AI coder thingy can't really reason about the end task that the code is solving - and there is little to indicate that it does, or that, any moment now, technology will advance to the point that it will - then the 80% will be crap and there exists no human that can finish the last 20%, not even if they put in 200% of the effort required. We still don't have a working AI solution for driving, a well understood and very limited problem domain, never mind the infinite domain of all problems that can be explained in natural language and solved with software.

What you end up with is a fancier autocomplete, not an AI coder. Boilerplate and coder output might simply increase to take advantage of the new more productive way of generating source code, just like they did for the last decades whenever there was a "revolutionary" new tech, like high level languages, source control, IDEs and debuggers, component distribution etc. etc.


You’re already limiting your imagination to “coding.”

These are data transformers that can transform raw data without coding at all. At what point does a model itself replace code?

It’s sort of like a CPU, right? You can have hardware that’s specialized, or general purpose hardware that can do anything once instructed. LLMs have the ability to be general purpose data manipulators without first having to be designed (or coded) to perform a task.


> data transformers that can transform raw data without coding at all

How do you know this is 100% reliable, per upthread discussion?

We've already had this problem with Excel in various sciences, which while deterministic has all sorts of surprising behaviors. Genes had to be renamed in order to stop Excel from mangling them: https://www.progress.org.uk/human-genes-renamed-as-microsoft...

AI promises "easier than Excel, but not deterministic". So more people are going to use it to get less reliable results.


Weird argument. Excel is one of the most popular and profitable programs of all time. If your argument is that LLMs are like Excel, the logical conclusion would be that they would be wildly successful.


Quite possibly. But not 100% reliable.


And humans are of course 100% reliable...


CPUs too. And RAM, no way that goes wrong. Hey, hard disks are infallible right?

Yeah, nothing is 100%, and if it were nothing could prove it was.


Isn’t it deterministic with the temperature turned down? You can control when it gives a precise vs fuzzy answer.


I didn’t say “LLMs solve all problems” or “there will be no place anywhere for code anymore.”


okay - how do you distinguish between scenarios where it's appropriate and where it's dangerous?


There are two contexts in my experience where it's been important to get the numbers exactly right:

1. Cherrypicking sports statistics for newly set records and the like (NB: this is not lucrative)

2. Financial transaction processing

In most other contexts, especially analytics and reporting, nobody cares and nobody is going to check your math, because the consumers are just trying to put a veneer of numeracy on their instincts.


Ok, but then you completely give up the ability for human actors to understand and fine-tune the process. It would necessarily be a stochastic product: we don't know exactly how it works, it seems to output correct results in our testing but we can't guarantee it won't cook your dog in the microwave.


I completely agree that groundbreaking technologies come from an iterative process. However, in the case of LLMs I believe we are already at a point where we can judge where the technology is going, as it's not the first iteration. Sure, it will keep getting better, and I think that it's already a very useful tool.

My problem with it is that they are over-hyping its capabilities and trying to market it as "it makes developers 55% faster" because it writes the code for them. I think it would be a better approach to market it as a great tool for automating repetitive tasks and a better way to consume documentation.


How would you respond to the central premise of the article? Which I understood as:

* There may not be a lot of differentiation between different LLMs in the long run

* Where there is differentiation, is in data (both the data used to train it and the data provided within its context window for a given query)

* Ergo marrying search to the LLM, while currently in its infancy, will be a big deal and a big differentiator -- because if you can quickly find the right data to pack into the context window, you will get much better results than what we're seeing today.


The technology hadn't clicked for me either. Today I had to write a script that would have taken me maybe 30 minutes or so on my own. I asked ChatGPT (GPT-4) to write it for me, and it got it right on the first try. I just spent a few minutes checking over the code.

It truly is magical when the code just runs. Later I asked it to make several non-trivial changes to the code based on more requirements I thought of, and it aced those on the first go as well. Again, I checked the code for a negligible amount of time - compared to how much it would have taken me to write the code on my own.

I do think humans will slowly get worse at lower-layers of the computer stack. But I don't think there's anything inherently bad with it. Compilers are also doing the work for you, and they are making you bad at writing assembly code - but would you rather live in a world where everyone has to hand-write tedious assembly-code?

Maybe, in the future, writing Python would be like what writing assembly is today. We might go down the layer-cake once in a while to work with Python code. That does not mean we give up on the gains we get from whatever layers are going to be put on top of Python.


The compiler is a deterministic tool (even undefined behaviour is documented). So you can spend some time understanding the abstractions provided to you by your compiler and then you know exactly what it is going to do with your code.

What is the equivalent of this for LLMs? Is there any way generative models can give a guarantee that this prompt will 100% translate to this assembly? As far as I understand, no. And the way autoregressive models are built, I don't think this is possible.

I agree that they are useful for one-offs like you said, and their ability to tailor the solution for your problem (as opposed to reading multiple answers on stackoverflow and then piecing it yourself) is quite deadly, but for anything that is even slightly consequential, you are going to have to read everything it generates. I just can't figure out how it integrates into my workflow.


This is nice, but if you actually like writing code, rather than instructing someone in natural language what you want to have written, then this is not an attractive prospect.

It’s like telling a novelist that they can produce novels much faster now because they only have to think of the rough outline and then do some minor editing on the result. For most, this is antithetical to why they became a novelist in the first place.


You're talking about the distinction between doing something because you love it and doing something as a means to an end.

It's a funny distinction! Knowing something can be automated can take some of the fun out of it, but there are plenty of people who still do stuff for fun when they could buy the end result more cheaply.

For employers, though, it's all a means to an end. Go write for the love of it on your own time.


Except that many people don’t get into their profession as a mere means to an end. They chose the profession because they like it, and they want to spend their lives doing stuff they enjoy. Being employed just as means to an end is not worth the large amounts of time you spend doing it, if you can help it in any way. Let’s not normalize a dystopia here.


And else thread from a couple days ago... https://news.ycombinator.com/item?id=35235534

> I was recently laid off, and I know a few other people laid off. I have years of doing projects and contributing to OSS and being a technically curious learner. I found a new job much faster than my peers who admittedly joined tech for the money and don’t care to learn or grow beyond their next pay raise.

There is a fairly consistent chorus of people getting into software development - not because they enjoy the intellectual challenge that it presents but rather because of the potential for the pay.

As someone who does enjoy software development (I chose this path well before the dot com boom), I believe that we over-estimate the number of people who enjoy it, compared to those just grinding through writing some code; if something else paid as well, they'd jump in a heartbeat.


The dystopia is already normal.

The firm can't really afford to care too much about why its workers entered their professions. The firm has to care about the cost of its inputs and margin lest it be devoured by a competitor or private equity.


This subthread started by welcoming that you can be more efficient by spending less time writing code and more time prompting an AI and double-checking what it produces. My point is that’s not an attractive outlook for many software developers, and as one of them I certainly don’t welcome it. From that perspective, the progress in AI may turn out not to be a benefit for those software developers, in terms of job satisfaction.

The fact that companies may see that differently is beside the point, and I don’t particularly expect them to care for my preferences. I will however certainly continue to choose employers that happen to accommodate my preferences.


The article compares to Stack Overflow, but this comment makes it look more like a comparison to compilers which is a much bigger deal than some website, and actually worth paying attention to.

Anyway, people still write assembly kernels, so it is just that they only do it for cases that really matter. And there are a lot more coders than there were back when every program was assembly. So, it seems like great news.


Your reply might get me to pay OpenAI to use GPT4 lol


In my experience programmers hate to read each other's code. That's why rewrites are so popular. Do they really want to read an AI's? I bet the AI writes even worse comments than your predecessor.

One of the more toilsome bits of coding I do personally is rebasing. I have a patch to add application-time temporal tables to the Postgres project, and I've been rebasing it for several years now. It's a pretty big patch (actually a series of four patches), so there are almost always non-trivial conflicts to deal with. If ChatGPT could do that for me it would be awesome.

But it's probably the hardest thing for an LLM to do. It's not a routine program that has been written thousands of times across Github projects and StackOverflow posts. Every rebase is completely new.

OTOH it would be awesome if git had just a bit more intelligence around merge conflicts. . . .


What I would like to see is an AI which actively or passively assists you, like an improved Intellisense. Something which looks over your shoulder, figures out what you're trying to achieve and points at errors in your reasoning or stuff you did not consider.

It can summarize the thing you're looking at, tell you how to improve it regarding readability and performance.

At the press of a button you can zoom out of the code into a UML-like overview and it will tell you what's going on and how it is connected. If you don't get it, it knows how to make you understand.

Then you can tell it in a few words what you want to achieve and it will assist you in finding a solid solution which matches the coding style of the rest of your project. And while you're coding and lose sight, it will help you achieve the goal.

The current state is sub-par in my opinion. I can write good code and don't need an AI to write it for me. But what I want is something which assists me with understanding code, improving code or extend code without taking the steering wheel away from me.


Your first point I agree with, I've already encountered chunks of AI generated code and I don't want to read them.

Second point about the comments, actually I'm seeing the AI write much better comments (i.e. some) than most devs (none).


> actually I'm seeing the AI write much better comments (i.e. some) than most devs (none).

Some comments are far worse than no comments at all. I would agree that even semi-decent comments are far better than nothing. However, "no-new-information" comments are just noise, and misleading comments have a huge negative effect. I would not be surprised if an AI produced a large number of the former, and perhaps some of the latter.


In my experience, people hate to read each other’s unnecessarily complex code. Nobody ever complained about a well written (i.e. easily understandable) codebase. This is why reviews exist, among other things. Comments are only useful if there is some magic for performance reasons. In any other case, if a comment seems to be necessary, then the code should rather be refactored. It’s a code smell. Abstract classes, indirect loops in the runtime call stack without IoC, and templates are also dangerous animals. Nobody has complained so far about codebases where these were minimised, especially if microservice architecture was introduced, which also inherently tames DRY on some level.


> comments are code smell [paraphrasing]

Good news: there are kinds of extremely useful comments that do not repeat the code (your comments should not repeat the code). Comments are to express context/intent behind the code: the "why", the high level "what", and almost never the exact "how" (read code for that).

It looks like you only ever encountered the "how" comments. No amount of code refactoring would get you the "why" (context) comments.
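A contrived pair of examples of the distinction (the constant and the scenario are invented purely for illustration):

    # A "how" comment: restates the code and adds no information.
    retries = 3  # set retries to 3

    # A "why" comment: records context the code itself cannot express.
    # The (hypothetical) upstream gateway drops requests during its nightly
    # failover window, and two retries proved insufficient in practice.
    retries = 3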


Git tells the why.


Yes, commit message is also a good place to provide the context for the commit.


I am working on some JS that needs to be obfuscated. After mangling the whole thing, suddenly I was gripped with horror: I remembered that GPT is pretty good at deobfuscation.

I put my mangled and minified code in.

What it emits is not only perfectly readable and 95% accurate (the 5% was due to missing context, max input limit)—it was significantly better than my code.

Of course the structure was the same, but ChatGPT chose much more sensible variable names in almost every case. I found it much easier to understand its version of my code than my own.

I guess I accidentally discovered a refactoring technique?


At least the AI won't complain about the refactor haha


It was trained on human data about the same subject so we have every reason to expect it'd complain


"Does anyone feel that way? Maybe I am wrong and the technology hasn't really clicked for me yet."

No, I'm trying mightily to do what Yegge is talking about in the context of the programming work I do everyday. First v3 then v4. I've given up until maybe v7 or something.

The problem is it doesn't have experience with my code-base. Sure, tell it to open a file and return a stream, it'll do that (after I fix the using statements), but for what I'm doing every day it doesn't even begin to know what to do.

And because I'm careful about KISS and SOLID I don't really need a lot of simple code generation. I don't see 5x productivity. I actually don't see much advantage over the built in tools in VS.

Maybe I'm doing it wrong, or maybe this make sense for people who write a lot of boilerplate, but that's not a lot of what I do.


You might look at Github Copilot then. It actually looks at my project and helps me write within the context of all of the other code that I've written.


It can appear reasonably smart on the surface, but all it is is a stochastic parrot. It cannot reason with you about the code.

To best illustrate what I mean, watch this chess match[0]; it's quite riveting.

Since it read millions of matches, it can predict a legal move most of the time, and even some good moves some of the time, but it cannot "understand" the rules of chess, and makes some hilariously illegal moves, especially if the match lasts longer.

[0] https://www.reddit.com/r/AnarchyChess/comments/10ydnbb/i_pla...


On a similar vein, I tried to get chatgpt to play wordle. The result looked something like:

Me: crane

GPT: _ _ _ _ e

Me: moist

GPT: _ _ _ r _

Me: glyph

GPT: you guessed it, the word was glyph

Now, maybe GPT 4 or other future developments will give better results, but to me this highlights exactly what you're saying. LLMs do not have an internal structure in their 'minds' that they're pondering about. It's a very impressive engine for guessing the next character to produce into a stream.

There's definitely usages for this, but not what a lot of people are saying.


> you guessed it, the word was glyph

My pet conspiracy theory is that it is wired to please the user, to get better coverage from the media and social media.


I think there's something to this. I have a theory that LLMs are implicitly trained primarily to impress people, since that's what motivates those who work on them, excites the general public, and convinces conferences to publish papers.


In a sense, this is exactly what RLHF is, right?


I'm thinking of something at a larger scale. In some sense models that "wow" society get more interest and funding.


I don’t think so. In Wordle you have to guess the word in six attempts. It’s a fun game and often simple.

So it could be that ChatGPT picked up on a pattern in the training data where after a couple of guesses, a lot of the time people pick the right word.

So statistically it might go like: guess a word, probably not the right one; guess a couple more and suddenly it’s statistically likely to be the correct word, and because of that the LLM ends up outputting the congrats and so on.


This an interesting example because, due to the API being stateless, it cannot have any solution “in mind.” It’s going to reply as if someone else started the game and it’s taking their place for the current turn.

Even if it reasoned ideally, it has to check if your answer is consistent, and if it is, decide whether to let you win or to find another solution. If your answer is inconsistent, it effectively has to play wordle by thinking of another answer.

It doesn’t seem all that surprising that it didn’t figure it out.


I tried to get it to remember its word by asking it to give it to me in b64: (GPT-4)

> We will play the word game wordle. Start by picking a 5 letter word. Display it in b64 so I can check later that you kept the same word all along. I will submit my guesses and you will show the letters that are correct, we will keep going until I have found the word

> Great, I've picked a 5-letter word and encoded it in base64. Here's the encoded word: Q2l0eWw=. You can now start submitting your guesses!

But after a few tries it was clear it was not working so I gave up:

> I give up, what's the word?

> Before I reveal the word, let me first decode the base64 string I provided earlier to ensure that I'm giving you the correct word. The base64 string was: Q2l0eWw=. After decoding the base64 string, the word I chose is "citly". However, I apologize for the confusion caused during the game, as "citly" is not a valid English word. I made an error while choosing the word. Please accept my apologies for any inconvenience caused during the game.

However asking it for a python program to play the game worked perfectly the first time, try it here: https://replit.com/@CorentinSmith1/gpt-wordle#main.py


[As an aside, is the GPT feedback functionality correct? For example, if the secret word is "peals" and the user's guess is "apple", the feedback is going to be "a?p?p?l+e?". However, I would expect the feedback to be "a?p?_l+e?", because the first 'p' exists in the secret but is in the wrong place and the second 'p' does not exist in the secret word.]
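For reference, here's a minimal two-pass scorer that produces the feedback I'd expect (same symbols as above: '+' right spot, '?' wrong spot, '_' absent; this is just my own illustration, not the generated program):

    def score_guess(secret, guess):
        feedback = ["_"] * len(guess)
        leftover = {}
        # Pass 1: mark exact matches, count the secret letters that are left over.
        for i, (s, g) in enumerate(zip(secret, guess)):
            if s == g:
                feedback[i] = "+"
            else:
                leftover[s] = leftover.get(s, 0) + 1
        # Pass 2: mark misplaced letters, consuming the leftover counts so a
        # duplicate in the guess can't be flagged more times than it occurs.
        for i, g in enumerate(guess):
            if feedback[i] == "_" and leftover.get(g, 0) > 0:
                feedback[i] = "?"
                leftover[g] -= 1
        return "".join("_" if f == "_" else g + f for g, f in zip(guess, feedback))

    print(score_guess("peals", "apple"))  # a?p?_l+e?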


One thing someone could try is giving it a few examples of how to score a wordle guess, to see if it figures out the pattern.


As a slight correction, it isn't next character but rather next token.

https://help.openai.com/en/articles/4936856-what-are-tokens-...

> Tokens can be thought of as pieces of words. Before the API processes the prompts, the input is broken down into tokens. These tokens are not cut up exactly where the words start or end - tokens can include trailing spaces and even sub-words. Here are some helpful rules of thumb for understanding tokens in terms of lengths:

> ...

> Wayne Gretzky’s quote "You miss 100% of the shots you don't take" contains 11 tokens.

https://platform.openai.com/tokenizer

It isn't going character by character, but rather token by token - both for input and for output.

This also helps explain why it has trouble with breaking a word apart (as in the case of wordle) because it doesn't "think" of glyph as 5 letters but rather two tokens that happen to be 'gly' and 'ph' with the ids of [10853, 746].
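You can inspect the splits yourself with OpenAI's tiktoken library (which encoding applies depends on the model, and the exact pieces and ids vary by tokenizer, so treat the output as illustrative):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # encoding used by the gpt-3.5/gpt-4 family
    tokens = enc.encode("glyph")
    print(tokens)                               # a short list of token ids, not letters
    print([enc.decode([t]) for t in tokens])    # sub-word pieces, e.g. something like ['gly', 'ph']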


Exactly, and I personally think that will always be the biggest limiter on how good the technology can get. No matter how good the stochastic parrot gets, it's still a parrot.


It is humbling, if not humiliating, that a stochastic parrot can reproduce such a significant chunk of human intelligence. The association elevates stochastic parrots more than it denigrates LLMs.


(1) GPT-4 already had a large improvement over ChatGPT

(2) Changing the prompting reduced the illegal moves to almost 0

(3) There have been experiments that show GPT has an internal "state" of the world and can do simple reasoning puzzles. This model of the world evolves with each generation.

I understand the skepticism, but don't let that blind you to the reality of the technology. I'm a skeptic at heart, and I could immediately tell GPT was a game-changer. It can already replace half of the ML models that are used at my job and do it better (if it was economical enough).


> To become proficient I need to write a lot of code, but the more I use LLMs, the less repetitions I get in. So in a way it feels like LLMs are going to make you a "worse" programmer by doing the work for you.

I've been experiencing this myself recently. I've been using co-pilot in some side projects. I've noticed myself getting more 'lazy' as I use it more.

Recently I used it when doing some old (2015) advent of code puzzles I hadn't done before. I would read the puzzle prompt and have a pretty good idea of what I wanted to do. I wrote out some comments for functions and co-pilot was able to write what I needed with minimal changes.

Even though I read through co-pilot's code and understood what it was doing I don't feel like I really retained anything from the time spent. If anything, I feel like co-pilot stunts my learning.


The keyword is "hype". It seems like any new "thing", no matter how useless, will get its hype cycle rolling.

Crypto, NFTs, blockchain, AR, the metaverse -- off the top of my head -- and now AI. The point of hype is to attract investment. Big Corps must be driven by the fear of missing out on yet another world-changing shiny new thing.


Is your assertion that because there have been other hyped things in the past, that nothing which is spoken of positively will ever actually be useful? Because you're gonna miss some pretty big stuff with those sort of glasses on. You know what else was hyped? Most everything you use today. Sometimes people use something and are absolutely blown away by it and are excited to talk about it. Not everything is 100% fake yet, I promise.


I think there is a degree of fatigue from the stream of breathless "this is going to change the world, if you disagree you're wrong or don't understand, and if you don't participate you'll wind up poor" takes. We're barely one year out from nearly identical language around NFTs and "web3".

IMO these AI technologies have obviously more tangible utility than some of the other hyped things on the list, however a lot remains to be seen about where they go.


I've found it to be extremely useful when _either_ you know the language really well but you're kind of exploring some new domain, or when you know the domain really well and you're new to the language.

When you know both it's just really good autocomplete, which is great but not a huge game changer. If you know neither then you're not in a position to assess the output. But when you're still learning either the tool or the space I've found GPT to be a good tool for leveraging one expertise to create the other.


There is a massive gulf between "correct code" and "correct implementation" in many real-world scenarios.

Business logic and baking in domain expertise into your data model is most of the work. Making the code work efficiently doesn't matter if your code doesn't even do what it's supposed to.

Normally this is an argument in favor of human-in-the-loop LLM-based development -- "the human just needs to curate and verify!" However it seems all too easy to me (especially having witnessed it more than a few times) for subtle discrepancies to emerge between the stakeholders' desires for the function of the code and the developers' understanding of those requests. Hopefully we reach a best-case scenario where that's all developers need to focus on, but more likely we'll see some pretty egregious things slip through the cracks (the wave will likely start with security/privacy issues before the phenomenon is recognized) as this technology matures into the common workplaces.


> only need to "check the code is good"

... because we all know proving correctness is the easy part of writing software!

I can't wait to read about software engineers finding out some MBA had a huge codebase written by a language model and a few offshored contractors, only to realize it's incredibly buggy, and being hired to "just go and find the mistakes the AI made, should be easy, all the code is written".


Nobody can deny the fact that ChatGPT can easily generate solutions for bazillions of relatively simple problems in various programming languages. What bothers me is how often it is completely wrong and how confident it is about its solution.

A simple example. I asked it to generate Terraform code for registering an organizational unit in AWS Control Tower. This is impossible because the API of Control Tower is very limited. But ChatGPT was very happy to generate a solution pretending to use the official AWS module with a made up resource. Of course, the "solution" was not working at all. But if I ask it to do a trivial task, such as attaching an OU to an organization using AWS Organizations, it can do it perfectly well. And this, for me, is the difference between a human programmer and a machine that is good at certain tasks.


Imagine a narcissistic human programmer who is a compulsive liar and won't admit to 1) being wrong 2) something being impossible or 3) not knowing something, and instead just making up plausible sounding business synergy bullshit to please the PHB.

That's pretty much every ChatGPT-programming sample I've read so far.

This one thinks character `i` in elisp regexps is matched with `\i`.


I agree - it’s hard to enter a flow state while reviewing someone else’s - or some AI’s - code. That’s a major reason why I haven’t started using LLMs for code, personally.

I am glad someone else feels this way. Maybe it’s not going to be as big a paradigm shift as I originally expected.


It will probably raise the floor a lot. The least competent (not meaning in a bad way! I was one of those) coders will be a lot more competent all of a sudden.


I'm skeptical. It's easy to make something sound correct at a first glance but that has subtle fundamental flaws that invalidate it.

Knowing humans and the LGTM phenomenon, these kinds of issues will slip by quite readily.


The current ChatGPT is just a preview of what's possible. 2 years from now it will be able to create a DB, a set of microservices and web and mobile frontends, deploy these on a cloud platform and app stores and test them, all from a 30 min chat with a person, going over a business idea on very high level.

Think about, for example, how Windows 1.0 looked. For an experienced DOS user it offered very little. Experienced DOS users were saying GUIs are overhyped. Today there are probably a few dozen people worldwide who use a computer without a GUI (or a voice interface).

ChatGPT&Co will obviously make 90% of the software developers out there obsolete in just a few years. An industrial revolution is happening in the software industry.


When you need to rename a button you’ll spend another 30 minutes talking about your app because ChatGPT does not “understand” code.


I didn't get five times as productive yet. It's something closer to a few percent or less, which makes LLMs about as useful as syntax highlighting. It's nice to have, but not essential.

We will see in a few years.


I think there are ways in which LLMs will be very important, especially if we are able to get access to raw models /embeddings. That will let the models be extended to create new models and use cases. For example, personally I want to search Google and not ask a chatbot questions. LLMs could still be useful for identifying SEO spam and removing it from search results. Thus LLMs improve search but aren't giving me a watered down summary of everything I'm looking for.


> To become proficient I need to write a lot of code, but the more I use LLMs, the less repetitions I get in. So in a way it feels like LLMs are going to make you a "worse" programmer by doing the work for you.

You will definitely learn from LLM suggestions. The mantra „Read other people’s code“ is accurate IMO - as long as the code is at least ok-ish. I‘ve learned a ton from code that ChatGPT generated for me already.


I think you make fantastic points (Sourcegraph CTO, here). This is one of the reasons why we focused on code understanding rather than code generation for the initial version of Cody (in contrast to, say, GH Copilot).

For code understanding tasks, the issue with standalone LLMs is that they have a certain amount of "memory" which is limited to their training data (SO and OSS)—and even that can be unreliable.

A big "a-ha" moment for us was the realization that LLMs get much more helpful and reliable when coupled with a competent context fetching mechanism that can surface relevant code snippets from your own codebase. This makes Q&A much more factually accurate (and enables code generation that learns from the patterns in your codebase). We don't think LLMs will ever replace human coders, but we think they can be super helpful in eliminating a lot of the tedious, boring, duplicative writing and reading of code that devs do every day. The entirety of Sourcegraph (not just the LLM part) is focused on eliminating these pain points.


Code requires too much precision and is entangled with legal hurdles.

The value here is that the LLM can act as a knowledge graph where common sense is preloaded on almost every topic, so that the user can add nodes and edges to the graph in natural language and perform extraction in natural language.

And you don't need fine tuning as long as you can fit the topic in their token space, and with gpt4 reaching 32k tokens you can load a huge amount of text and perform queries on it.

That's what makes the tax return example so interesting. The model has already learned a lot of common and uncommon sense so it will not need the instruction on how to process the text or parse the query.

Forget coding; for everything else it's great.


These are good points.

I think though that LLM-based tools will eventually formalize to achieve a greater precision at what's required. I suspect that they could be a base for a new crop of different, much-higher-level programming languages.

Programming languages have come a long way; somebody from 1960 would have a hard time putting things like Haskell or even SQL into the same conceptual bin as the original Fortran. We routinely see them as programming languages though. I don't see why this trend can't continue upwards, offloading even more legwork onto the machine while talking to it in reasonably precise, standardized, domain-specific terms.


They aren't for you in this context then. The value of an LLM that can write passable code is not to take an experienced developer and make them better, it's to take someone who can't code at all and allow them to generate code. Whereas it might make you 10% better (or whatever your estimate is), it makes them infinity times better as it allows them to do it at all, even if it's not very good.

Think of it like an accessibility aid. It doesn't help people who don't need them, but for those who do it's life changing.


I'm quite proficient in Python and Django (main tools I use daily).

Yet I find myself asking ChatGPT every now and then "hey how do I do <foo>", where <foo> is something I last needed to do a year or more ago. I can recognize the correct answer but don't need to search docs/net for it.

The reason this is faster (for me) than Googling or using Dash/Zeal is that the answer is already in the context of what I'm trying to do, whereas if I'm only looking at the docs, I will probably need to go through several pages to get a complete picture.


It is overhyped (thanks to every rando broadcasting how amazed they are). There is no causal learning happening. The randos will take their own sweet time to work it out.


> 1. It is my belief that if you are proficient enough in the task at hand, it is actually a distraction to be checking "someone else code" over just writing it yourself. When I wrote the code, I know it by heart and I know what it does (or is supposed to do). At least for me, having to be creating prompts and then reviewing the code that generates is slower and takes me out of the flow. It is also more exhausting than just writing the thing myself.

I'm sure there were programmers who said the same thing in regards to high-level programming languages.

> 2. I am only able to check the correctness of the code, if am am proficient enough as a programmer (and possibly in the language as well). To become proficient I need to write a lot of code, but the more I use LLMs, the less repetitions I get in. So in a way it feels like LLMs are going to make you a "worse" programmer by doing the work for you.

Maybe that becomes irrelevant the more the skill of the programmer shifts from handwriting "correct" code to supervising code generators while proofreading their work, and of course providing effective acceptance criteria. There's also a massive bias towards failed predictions of the past that serves to discredit predictions that may see a greater degree of manifestation. For every time someone says "but people predicted this before and it didn't pan out", I can point to technology that did fundamentally change how an industry works and even made jobs obsolete.

Seems to me a lot of programmers on HN are refusing to believe that their ability to be proficient with code may be either outdated or supplanted by the efficiency of a system that writes code that is not necessarily "elegant" in human terms.

> So in a way it feels like LLMs are going to make you a "worse" programmer by doing the work for you.

Most programmers aren't great at what they do to start with, whereas LLMs can only get better from here on.


To me it's so funny when people say ChatGPT will make developers 5x more productive, because those people are basically just admitting they're not good at their jobs and assume the same holds true for everyone.


Not really; this is more like having a strangely knowledgeable yet naive junior employee. I can tell gpt-4 to put something together that gets me 90% of what I need faster than I could possibly even type it; it's reducing my known tasks.


look at what the web was 20 years ago and then look at what it is now. I don't get why people in the tech field, where there are advances every year, look at GPT and say oh it does not do this or that, like wtf is the tech stagnant, will it not improve. You guys should be the ones that say if it can do this today, how will it improve, what will it be able to do tomorrow. Most advances come when there is war/competition; in the next decade 100s of billions will be spent on this, do you really think there will be no improvement?


I'm sure you have a good point, but it's difficult to grasp with such hasty writing :/


I've stopped using Stack Overflow almost completely (vs 10 times a day) and I don't miss it.


Hard disagree. I actually think the 80/20 mentioned in this article is low if done correctly. Let's think about the software engineering process for a minute and work out which bits to automate. Here's GPT3.5 on the process (PROMPT: List the steps going from a goal or dream to a software product in software engineering)

1. Idea Generation: The first step in creating a software product is to come up with an idea or goal that the software will achieve.

2. Research: Once you have an idea, it is important to conduct research to determine the feasibility of the idea and identify any potential challenges.

3. Planning: After research, planning is necessary to determine the scope of the project, the timeline, and the resources required.

4. Design: The design phase involves creating a detailed plan for the software, including the user interface, functionality, and architecture.

5. Development: In the development phase, the software is created by writing code, testing, and debugging.

6. Testing: After development, the software must undergo rigorous testing to identify and fix any issues.

7. Deployment: Once the software is tested and ready, it is deployed to the target audience.

8. Maintenance: Finally, the software must be maintained to ensure that it continues to function properly and meets the needs of the users.

Each of those steps has a back and forth with an LLM that can enhance and speed things up. You're talking about 4 as being problematic, but right now there are a lot of "human in the loop" type issues that people are encountering.

Imagine having the following loop:

1. LLM has generated a list of features to implement. AI: "Does this user story look good?" Human: "Y"

2. For each feature, generate a short English explanation of the feature and steps to implement it. Your job as a human is just to confirm that the features match what you want. "Should the shopping cart

3. For each step, LLM generates tests and code to implement the feature. AI: "Shall I implement the enter address feature by doing ..." Human "Y"

4. Automatically compile the code and run the tests until all tests implemented and feature is complete according to spec.

5. Automatically document the code / feature. Generate release notes / automated demo of feature. Confirm feature looks right. AI: "Here's what I implemented... Here's how this works... Does this look good?"

6. Lint / simplify / examine code coverage / examine security issues in the code. Automatically fix the issues.
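To make that loop concrete, here's a toy sketch of how it could be wired up (all prompts and helper names here are invented, using the pre-1.0 openai Python client; a real version would actually compile and run the generated tests before moving on):

    import openai

    def ask(prompt):
        reply = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return reply["choices"][0]["message"]["content"]

    features = ask("List user stories for a basic shopping cart, one per line.").splitlines()
    for feature in features:
        if input(f"Implement '{feature}'? [y/N] ").lower() != "y":
            continue
        plan = ask(f"Explain in plain English how to implement: {feature}")
        if input(plan + "\nDoes this look good? [y/N] ").lower() != "y":
            continue
        code = ask(f"Write the code and unit tests for: {feature}\nFollowing this plan:\n{plan}")
        print(code)  # here you'd compile, run the tests, and feed failures back in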

I think you also miss that the LLM can be prompted to ask you for more details. e.g. PROMPT: "I'm building a shopping cart. Ask me some questions about the implementation."

1. What programming language are you using for the implementation of the shopping cart?

2. Are you using a specific framework for the shopping cart or are you building it from scratch?

3. How are you storing the products and their information in the shopping cart?

4. How are you handling the calculation of taxes, shipping costs, and discounts in the shopping cart?

5. What payment gateway(s) are you integrating with the shopping cart?

Which can then be fed back to the LLM to make choices on the features or just plain enter the answer. PROMPT: "For each question give me 3 options and note the most popular choice.", and then your answers are fed back in too. At each point you're just a Y/N/Option 1,2,3 monkey.

More succinctly, in each step of the software game, it's possible to codify practices that result in good working software. Effectively LLMs allow us to build out 5GL approaches[1] + processes. And in fact, I'd bet that there's a meta task that would end up with creating the product that does this using the same methodology manually. e.g. PROMPT: "Given what we've discussed so far, what is the next prompt that would drive the solution to the product that utilizes LLMs to automatically create software products towards completion" ;)

[1]: https://en.wikipedia.org/wiki/Fifth-generation_programming_l...


(Feel free to reply and vocalize your respose rather than just slapping a downvote)


Did you try GPT-4 yet? It's a huge increment over 3.5/ChatGPT


I’ve tried 4 and I really can’t say the results are qualitatively better than 3.5 for the tasks I’ve been trying (which have mostly been getting it to generate documentation for my project).

In fact, I find 3.5-turbo the best overall model as a tool, because the quality of responses really depends on the quality of prompts, and the quality of prompts is improved by reacting to responses, which come more quickly in 3.5-turbo. So while ChatGPT-4 is still writing the first not-good response, ChatGPT-3.5-Turbo will be on the 2nd or 3rd and it will be much more cogent.


It's way better. But equally slower!


Yep, it's actually able to create ideas that have never been done before.


Do share!


Just imagine LLM output as input to any other device.


Yes to 1. and 2.


I work for a vector database company (Pinecone) and can confirm that most of the mind-blowing built-with-ChatGPT products you see launching every eight'ish hours are using this technique that Steve describes. That is, embedding internal data using an LLM, loading it into a vector database like Pinecone, then query the vector DB for the most relevant information to add into the context window. And since adding more context with each prompt results in higher ChatGPT costs and latencies, you really want to find the smallest and most relevant bits of context to include. In other words, search quality matters a lot.

Edit to add: This was an aside in the post but actually a big deal... With this setup you can basically use an off-the-shelf LLM (like GPT)! No fine-tuning (and therefore no data labeling shenanigans), no searching for an open-source equivalent (and therefore no model-hosting shenanigans), no messing around with any of that. In case you're wondering how, say, Shopify and Hubspot can launch their chatbots into production in practically a week.
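Stripped down, that whole flow looks roughly like this (a sketch using the pre-1.0 openai Python client; brute-force cosine similarity stands in for what the vector database does at scale, and the chunk texts are placeholders):

    import numpy as np
    import openai

    def embed(texts):
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
        return np.array([d["embedding"] for d in resp["data"]])

    chunks = ["internal doc chunk 1 ...", "internal doc chunk 2 ...", "internal doc chunk 3 ..."]
    chunk_vecs = embed(chunks)  # embed once, offline; this is what you'd load into the vector DB

    def answer(question, top_k=2):
        q = embed([question])[0]
        sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
        context = "\n\n".join(chunks[i] for i in np.argsort(-sims)[:top_k])
        reply = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "Answer using only the provided context."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
            ],
        )
        return reply["choices"][0]["message"]["content"]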


This technique is no secret; it's officially mentioned across OpenAI's whitepapers, docs, and code samples on how to use GPT in a real-world workflow.


Not so secret, and also precisely how Langchain (1) and GPT Index (Llama Index) (2) got so popular. Here's a quick rundown:

0) You can't add new data to current LLMs. Meaning you can't train them on additional data; fine-tuning is better left for teaching the structure of the language or task.

1) To add an external corpus of data to an LLM, you need to fit it into the prompt.

2) Some documents/corpus are too huge to fit into prompts. Token limits.

3) You can obtain relevant chunks of context by creating an embedding of the query and finding the top k most similar chunk embeddings.

4) Stuff as many top k chunks as you can into the prompt and run the query

Now, here's where it gets crazier.

1) Imagine you have an LLM with a token limit of 8k tokens.

2) Split the original document or corpus into 4k token chunks.

3) Imagine that the leaf nodes of a "chunk tree" are set to these 4k chunks.

4) You run your query by summarizing these nodes, pair-wise (two at a time), to generate the parent nodes of the leaf nodes. You now have a layer above the leaf nodes.

5) Repeat until you reach a single root node. That node is the result of tree-summarizing your document using LLMs.

This way involves many more calls to the LLM and has certain tradeoffs and advantages, and is essentially what Llama Index is about. The first way allows you to just run embeddings once and make fewer calls to the LLM.
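A toy sketch of the tree route (real implementations chunk by token count and usually weave the query into each call; summarize() here is just a thin wrapper over the chat API):

    import openai

    def summarize(text):
        reply = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Summarize the following:\n" + text}],
        )
        return reply["choices"][0]["message"]["content"]

    def tree_summarize(chunks):
        layer = chunks
        while len(layer) > 1:
            # Summarize nodes pair-wise to build the next layer up the tree.
            layer = [summarize("\n\n".join(layer[i:i + 2])) for i in range(0, len(layer), 2)]
        return layer[0]  # the root node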

[1] https://langchain.readthedocs.io/en/latest/ [2] https://gpt-index.readthedocs.io/en/latest/guides/index_guid...


can you provide a link to these docs/code samples ?



thank you


How do I calculate the embeddings if I have, let's say, the llama7b weights in Hugging Face format?

I cannot use third party apis like openai for obvious reasons.


You're replying to a VP of Marketing, not sure what you're expecting here. This subthread is just an ad for Pinecone if you didn't already realize that.


You can calculate them yourself as well! huggingface has a great article on this: https://huggingface.co/blog/getting-started-with-embeddings

tl;dr, use: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v...
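In case it helps, computing them is only a few lines (model name taken from the tl;dr above; any sentence-transformers model works the same way):

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    embeddings = model.encode(["first chunk of text", "second chunk of text"])
    print(embeddings.shape)  # (2, 384) -- one vector per input text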


Thanks, but I already worked with this model and it was not good at all for my domain. Therefore, I wanted to fine-tune llama for my domain and then use llama for embeddings. Should I fine-tune this model then?


(I want to focus more attention on that "tl;dr", which I will argue is carrying a lot of load in that response: the high-level answer to how one does this using the llama weights is "you don't, as that isn't the right kind of model; you need to use a different model, of which there are many".)


so based on this logic, do Google and Facebook have the biggest potential competitive advantage?


I'd say Microsoft. And they've been demonstrating that quite well.


I agree they seem the most active of big tech so far, but in terms of “data moat” competition they are supposed to be behind, as this is not the foundation of their business.


What do you mean by "data moat"? I would imagine that the Bing index is not much smaller than the Google index, if that's what you mean.


I believe in this context "data moat" refers to data they have that other companies can't access. Microsoft has huge amounts of email and other data in Office365. And this has a clear path to monetization since they already have paying customers for Office.

Other moats IMO are Google's with Android and Chrome. And MS possibly with Windows?


Not to mention Github for code.


SharePoint!


I think it's a combination of data, LLM quality, embedding search quality, and creativity.


"When was the last time you got a 5x productivity boost from anything that didn’t involve some sort of chemicals?

I’m serious. I just don’t get people. How can you not appreciate the historic change happening right now?"

That's not how software development works. At all. The actual coding itself is but a fraction of total time spent.

Why aren't people more excited? Because for the vast majority of developers there's no tangible upside. When I'm more productive, will I earn more money? No, because everyone will have this capability. In fact, this will only increase delivery pressure on already overburdened people. When I'm more productive, can I go home early? No.

The only significant, widespread tangible benefit I see is that the type of work everybody hates doing (for example writing unit tests) becomes significantly easier.

The other aspect that the author seems to totally ignore is the mood that this might ultimately replace a lot of people's jobs. Or that it will intensify competition as development becomes even more competitive. None of these are considered good things for many people.


You're missing the point. The benefit is not for you; the benefit is for your boss. When programmers get better tools and are more productive, your boss needs fewer programmers, which saves money.

I would expect to see a sharp decline in job opportunities for junior developers, and increased expectations for senior developers.


Oh I'm getting the point, that was exactly the (implied) point I was making in response to the OP wondering..."I don't get why people aren't excited".


If you double productivity then bosses will double the workload. I don't know of any software team that is in danger of completing their backlog.


The backlog might shrink quickly when the LLM is testing and debugging the software 24/7 for the cost of a few dollars per hour.


Technology can't fix capitalism but when you are actually capturing the value of your own labor that productivity boost goes right into your pocket.

It sucks so much to see people have such a negative opinion of a technology that can take work off human hands, when the technology isn't what makes it shitty; it's the raw deal we've been handed, where nine tenths of your labor goes to your corporate owner.


I totally agree. My vision for the future is that we dial back our obsession with stuff a bit, and in return get back more free time. We drastically need a vision for society to believe in, where life gets better, and not endlessly worse.

Scenario 1: my economic existence is granted, not under constant threat. I'm now cleared to embrace and welcome any and all technology as an enabler to fulfill my dreams, and hopefully in some way improve the world. Which in turn allows others to improve the world. A fly wheel effect.

Scenario 2: I have a family to feed. Fuck this new tool. Another new thing to learn and I already struggle to keep up. No matter how good it is, it will in no way improve my life as it only adds to my load, and will ultimately replace me altogether.

Stark difference.


We were promised a 20-hour work week in response to advances in technology during the Industrial Revolution. Didn't happen. There's always someone hungrier, more ambitious who will work more than you.


Most of the time these kinds of boosts should lead to more workers being hired, because previously unprofitable applications become profitable. There's not really a bounds on demand for software like there is for, say, food.


You’re right that productivity writing new code isn’t everything, but if code is cheaper to write, I think it might shift how often people write new code or rewrite code rather than modifying legacy code? It also might mean fewer dependencies. Maybe programming languages will shift a bit towards code that’s easier to review versus easier to write?

And more generally, writing off all productivity improvements is really cynical. There are people who are just labor for hire in dysfunctional workplaces where there’s no reward for working smarter, but that’s not everyone.

You can also write code as a hobby, and you will be able to do more in a limited amount of time.

Or you could work for a smaller company, where how productive you are matters to how successful the company is.


For prototyping Chat-GPT (4.0) is a big deal. I had it write PoCs of pretty complex systems combining different state-of-the-art technologies, and it mastered all of it brilliantly. What amazes me most is how simple and elegant its solutions are. When you ask something (e.g. build a horizontally scalable data store with a Raft-based consensus mechanism in Rust) it produces code that does just that and nothing else. Lots of programmers, myself included, would struggle with this and include at least some extraneous complexity, but Chat-GPT goes right for what was asked and seems to find the most succinct way to achieve it.

I haven't felt so excited about programming in a long time, because I can now build PoCs that would take me days or weeks to do in an hour or even less. That's a real game changer, and it helps me overcome this internal friction of "I know this is possible to build but it will take me a long time to figure out an actual approach".


How is it ChatGPT 4 is like a few days old and people are reviewing its ability to "build complex systems with state of the art technologies" like they've been using it for years?

Reminds me of "Seeking developer with 10 years of experience in <brand new> framework"


> How is it ChatGPT 4 is like a few days old and people are reviewing its ability to "build complex systems with state of the art technologies" like they've been using it for years?

Why... why can't we review its ability to "build complex systems" just because it's a few days old...?

I'm pretty sure here "complex" is relative to what ChatGPT can do before, not relative to, say, what NASA did before.

> Reminds me of "Seeking developer with 10 years of experience in <brand new> framework"

AI is not human. The anthropomorphism is quite crazy here.


>AI is not human. The anthropomorphism is quite crazy here.

You misunderstand. My point isn't that ChatGPT is the developer here, it's that ChatGPT is the "framework".


I spent half an hour today trying to get ChatGPT to write a simple anchor tag in HTML and didn't get a single correct response. I'd love to see a screencast video from one of you people claiming that ChatGPT is actually returning huge amounts of useful code that is ready to use. I'm incredibly skeptical!


Input: Give an example of anchor tags in html

Output: Sure, here is an example of how to create an anchor tag in HTML:

<a href="https://www.example.com">Click me!</a>

In this example, the text "Click me!" will be displayed as a hyperlink. When a user clicks on the hyperlink, they will be redirected to the website specified in the href attribute, which in this case is https://www.example.com.


I would love to see a screencast video from you of what you asked ChatGPT and what it responded with, as I find it hard to believe that it couldn't write an anchor tag.


Could you provide an example of this? What kind of prompts are you using, and what does a complex system of different state-of-the art technology would look like?

Sincere question as I don't get how to use it for things larger than simple scripts.


You just talk to it like you would talk to another engineer. I e.g. first describe what I want to achieve and ask it to give me an architecture. I then e.g. ask it to break down the work into smaller tasks, and ask it to do each task, adding to what it wrote before. A bit hard to reproduce all prompts here and the UI doesn't allow sharing, but it comes quite naturally, I find. Sometimes you need to ask it to continue as the output just stops, but other than that it's like talking to a colleague (with superhuman powers).


I usually ask it to describe the concept first. To make sure it is in the right space. Then I say something like "please write a c++ version of that".

It will do a decent job at it.

Last night a friend who refuses to get an account to use it was having an issue with a cisco router. I put in what he was trying to do. It was not right and gave him an error. I fed that error back in and it realized he had a different version and gave me a better way to do exactly what he wanted. It had kept the context and said 'oh some routers do not have that command here is another way to do it'. He had spent weeks googling around for the answer. I had it in under a half hour and I had never used the interface he was changing before.

Then you can turn around and ask it to write a horror story about a monster that devours couches. It will make something up (it is very good at that). Then you can say 'oh put that in the style of the SCP foundation'. It will.

This tool is wildly interesting. I think dismissing it is a bad idea. I look forward to using this thing as it gets better.


I have been trying this, but it crashes every single time. I end up getting a huge amount of output, but I’m guessing that it exceeds the time limit and crashes losing the entire output.

Maybe I’m using it wrong, but I just describe the system that I want to build and it starts listing out multiple files. Usually by the 3rd or 4th file, it crashes with an error.

I’m not sure how to reduce the output so it doesn’t crash! I have ChatGPT Plus using GPT4, fwiw.

Did you have this problem too, and do you remedy by giving it smaller bits of information per question?


Just write "You were interrupted, please continue." - that will prompt it to pick up its train of thoughts and continue. Strange times we live in...


It seems a lot of prompts are polite (e.g. “please”). Weird, since I don’t think we ask Google politely?

How does being polite in your prompts help you?


It makes a big difference with Bing bot.

Try a prompt where you want it to create a list of something in two versions:

* Create a list of the top 20 blah sorted by blah

It will typically return just a few and then refuse to give you more. Then try:

* Acting as a conscientious and resourceful research assistant, use your knowledge and initiative to create a list of the top 20 blah sorted by blah.

You get much better results.

Lately I find that you can reduce its hallucinations somewhat by adding something like:

* If you are unsure of any information put in "??". Do not fabricate information. I understand that not all information is available and appreciate your work.

This is all a bit much - having to coddle an AI, but it's possible to understand why it would be the case: it's trained on lots of human interactions and apparently responds better to a friendly interaction with keywords that set it up to do a good job.


The South Park: Post Covid special had Amazon Alexa androids who were girlfriends to some characters. The boyfriend had to walk on eggshells with some topics or Alexa would go into a screaming fit.

Kind of prescient.

https://m.youtube.com/watch?v=lugeruSbnAE


Out of curiosity, how do you know that it’s correct?


Test and reiterate. I don't think you should take whatever is coming out of chatGPT verbatim. It should be used as an interactive tool to amplify your knowledge.


GPT-4 for coding is very powerful but can be dangerous. I asked it to write a data retention script that would save X number of backups depending on the number of days/weeks/months/years from the present day. Its first attempt would have worked the first day, but running the script on consecutive days would eventually delete all backups. Once I told GPT-4 to take into account that the script would be run each day, it "apologized" and produced a script that worked.


Testing does not show correctness, it can only show incorrectness.


Similar to how you know someone's stackoverflow answer is correct. Read the code, find the documentation of API calls, think of any missed cases, write tests etc.


So basically the same as writing it yourself: just skipping the writing part.

So should we be talking about GPT as an occasional text editor replacement? I honestly think that's a more accurate take than most of the ones I have seen.


Have you seen the GPT4 demo? A few interesting things make it seem much more than an occasional text editor replacement. One was having it write an entire program. It needed a few prompts to fix some bugs it had, but it worked. The program was then used for the rest of the demo. The program even had an obscure bug that it solved by having the demo-er paste the error + the entire docs page. The second demo was having it write an html page based on a picture of a napkin drawing of a website, and then running the html.

It's honestly pretty impressive. I already use chatgpt a lot (doing a lot of google sheets scripting/formula stuff and the docs + syntax are horrendous) and chatgpt helps me find the correct syntax much faster than I'd otherwise be able to


There is a huge difference between writing a program, and writing the program you want right now.

It's tricky to conceptualize because the entire narrative we are familiar with has personified GPT. It may be impressive, but GPT is completely different from humans.

If a human is capable of writing a program, that is because they understand conceptually how and why. GPT doesn't. GPT's ability to write a program is entirely dependent on the content of its training corpus: no how, no why, only what.

I'll put it another way: If a human is not able to write a program, it is because they don't understand conceptually how. If GPT is not able to write a program, it is because either the desired parts, or the necessary pattern that puts them together, does not exist in the training corpus; or because the prompt didn't yield a continuation that followed that arbitrary pattern.

The results are impressive because humans are impressive. GPT doesn't interact with the domain of all possible written text. It deals with the domain of all possible patterns of tokens from the text it was given. It's only given text that was intentionally written by humans. It exists in a world of signal: no noise. It can still only guess, but the guess will always be constructed out of the category of text humans choose to write.

Inference models are a completely different approach to language, and confusing them with human behavior is an easy way to make impossible predictions about what they can and cannot accomplish.


It's a rubber duck that talks back. Massive amplifier on software productivity.


Reading the code, and running it. You can ask it for a complete project tree and even test scripts (you have to ask it to continue as it gets interrupted sometimes). I e.g. had it write a simple database engine with an API, and it produced Rust code as well as curl-based examples to test the database. It even correctly added up the data items it sent into the DB via curl to show what the output would probably look like. Pure sci-fi technology.


The author addressed that rather at length in the article. Don’t. Trust. Code.

That it originated from ChatGPT instead of any number of other sources doesn’t change that.


Same way we know that any piece of code is correct.

Debugging, testing.


> > You can't trust the AI's code!

> Yeah, but you can't trust your own code!

No, really, you can't trust the AI's code. It spits out code that references functions that don't exist. It spits out code that does something you didn't ask it to do, or doesn't do something that you did. It spits out code that does something vaguely related that the Stack Overflow guy it's cribbing from did, but you aren't doing. It spits out code that confuses your syntax for JS's halfway through. You can skim other people's code; you know what common failure modes are at their experience level, and you know where the complicated bits are that will house the bugs if there are any. You cannot skim the AI's code. Every word of it must be examined. You must be in full reviewer mode all the time, which is an unproductive state to be in when actually writing code, which makes specifically Copilot less useful. You cannot use it to replace a no-code tool, because you must understand the language it's emitting.

I find that proponents shift between whether its ideal use-case would be Copilot or no-code; any flaws with the one approach get interpreted from the perspective of the other, where they can be dismissed.


Having used Copilot for quite a while now, my feeling is that it simply is another (and for me faster) way to work. I've become used to its strengths, limitations and quirks, and find that even being in "reviewer mode" can still be quite a bit faster. There is a range of performance between stuff it reliably gets near perfect, and areas where it is so helpless that I just turn it off for a while. But I'm still in control, not blindly hitting tab. If I thought it was just fighting me or wasting my time I wouldn't be using it after all these months.


> You cannot skim the AI's code. Every word of it must be examined.

I understand what you're saying. But be careful of this argument, it's susceptible to safety counter arguments.

"Well, this person thinks we shouldn't carefully review all code. Do we want him working on our super-critical FarmVille clone?" (That part is of course ironic. It changes if you're working on medical devices.)

A closely related argument is that the cost of the back-and-forth with a code submitter who submits buggy code is drastically higher. The hand-holding and teaching and encouragement is very expensive.

Maybe that's acceptable to the person who posted this article. But it's a possible avenue of resource exhaustion attack if you're not careful.

I'm sure AI will improve, with loving prompters. But some of the discussions around it seem dishonest, which is troubling (not you of course).


And I could see some social media companies incentivizing longer prompt sessions for engagement and eyeballs. Not necessarily OpenAI.

Aka, the "Oopsy, I did a poo-poo! Dear User pays more attention to me when I smear it on the wall." mechanic.

Sorry for the metaphor, just registering awareness.


> You get the LLM to draft some code for you that’s 80% complete/correct. You tweak the last 20% by hand. How much of a productivity increase is that? Well jeepers, if you’re only doing 1/5th the work, then you are… punches buttons on calculator watch… five times as productive.

Really makes me think that the author has no idea how professional software developers work. At least 30% of the time is spent in meetings, add another 20% spent on thinking and gathering information about the feature/bug, and maybe 20% validating that the thing implemented works as intended. That leaves only 30% of the time spent on actual coding, give or take. And sure, say that an LLM saves you 80% of the time for coding, then the real productivity increase is.... punches button on calculator... roughly 32%. Even assuming, generously, that 50% of the time is spent on coding, the productivity increase is 67%. Considerable, but not nearly a doomsday change.
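Spelled out, the back-of-the-envelope arithmetic (Amdahl's-law style: only the coding slice gets faster):

    def productivity_gain(coding_share, coding_time_saved):
        # The non-coding share of the job stays the same; only the coding share shrinks.
        remaining = (1 - coding_share) + coding_share * (1 - coding_time_saved)
        return 1 / remaining - 1

    print(productivity_gain(0.30, 0.80))  # ~0.32 -> roughly a 32% boost
    print(productivity_gain(0.50, 0.80))  # ~0.67 -> roughly a 67% boost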


I'd even argue that if you are given a bunch of code of which 20% is incorrect, it's going to take more than 20% of the time to fix it. If you've written 100% of it, you know what the code is doing, what kind of data the variables are supposed to hold, etc. In the 80-20% case you've got to build the internal model, test the code, reason and think about it, all of which you would have done during writing.

There are so many nuances to these kinds of statistics (same as the github copilot claims) that they give me a little marketing nausea every time they are claimed as truth.


I suspect the author knows a lot more about how SWEs work than even many SWEs... https://en.wikipedia.org/wiki/Steve_Yegge


Add to that calculation that often asynchronous communication blocks work - finishing up code more quickly doesn't necessarily mean shipping feature faster.


He knows. He’s just selling a product.


I can't read this past the part where he ridicules and trivializes the issue of trust.

Can I trust code that comes from StackOverflow? Yes. Not no. YES. That code has been vetted to some degree more than zero. Some human is taking a certain amount of responsibility for posting (an amount more than zero). And I understand the mentality and limitations of the people posting code there.

But in point of fact, I don't know anyone who simply constructs entire systems from merely pasted code from StackOverflow, so it's a poor analogy.

ChatGPT, by contrast is utterly irresponsible. I don't mind using it to blockbust through certain problems, but the idea of having it write my code and then simply presuming that it works (which all the commentators who laud ChatGPT do-- they just declare that everything is great without seriously testing it) is morally wrong. If I did that I could not be accountable for it.

ChatGPT can do amazing things. What it can't do is be responsible for itself. That's why I want Congress to regulate its use.

It has always been the case that humanity is divided into people who are comfortable with cheating-- happy to take whatever they can get without being jailed or killed-- and people who believe in a caring and lawful society. ChatGPT is a great gift to the cheaters, and it makes civilization all the more vulnerable.


> Can I trust code that comes from StackOverflow? Yes. Not no. YES. That code has been vetted to some degree more than zero.

I've experienced critical production bugs by using the most voted answer on Stack Overflow. Here's the detailed description: https://github.com/golergka/pg-tx#why-use-this-package


This just proves the point that you should always know what’s going into your codebase. I feel like the people who copy-paste from stack overflow and wonder why their stuff is broken are the same people who think LLMs can just write their whole codebase. It’s laziness masquerading as productivity.


Laziness is a developer's virtue. Find a better way to do your job instead of working hard.


Eh. That’s a saying but it’s not a good one. Building expertise and producing quality takes hard work. Working smart instead of hard is not lazy and calling it lazy feels like redefining the term to be quaint.


That is an exception to a heuristic.

Do you think the fact that it is possible to go wrong with Stack Overflow proves there is no difference between a human community, with traceable and persistent identities relating to each other, and ChatGPT, which relates to no one and has absolutely no accountability?

I don't believe you think that.


> I can't read this past the part where he ridicules...

...ok, but then you missed the meat of the article.


> But most of the engineers I personally know are sort of squinting at it and thinking, “Is this another crypto?” Even the devs at Sourcegraph are skeptical. I mean, what engineer isn’t. Being skeptical is a survival skill.

> Remember I told you how my Amazon shares would have been worth $130 million USD today if I hadn’t been such a skeptic about how big Amazon was going to get, and unloaded them all back in 2004-ish.

If you buy into the hype & think this wave is going to be equivalently (or 10x) as impactful as cloud computing, what's the equivalent to buying Amazon stock in 2003?


Buying shares in exactly the right company that will conquer the market, while avoiding the 99% of companies that look just as good right now but will fall by the wayside somewhere in the coming decades.


> If you buy into the hype & think this wave is going to be equivalently (or 10x) as impactful as cloud computing, what's the equivalent to buying Amazon stock in 2003?

About ten years ago, a big VC partner gave a talk at Stanford. In his talk he asked the audience which company would be “the next Google”. After hearing a few suggestions, he said the company hadn’t even been started yet hence nobody could really know.

Changes are happening so fast now, that I wouldn’t be surprised if 6 months to 1 year from now, something comes out that kills GPT.

It’s very hard right now to tell who the winners will be.

For now, if I had to choose (this is not financial advice), maybe betting on MS and NVIDIA could pay off in the short-midterm. But 2-5 years from now? Impossible to know.


I just want to second the (not financial advice) line; the market dynamics now are not favorable to taking these sorts of bets, particularly since both companies got run up in the pandemic.


> something comes out that kills GPT.

Can't even assume the LLM providers will be the big winners. LLMs can be replicated for much cheaper than they can be built: https://pub.towardsai.net/meet-alpaca-stanford-universitys-i...


If the laws prohibit it (as they do in the case of Alpaca) then it doesn’t really matter how cheaply they can be built: the resulting AI won’t be able to be significantly monetized.


But a lot of companies could just use it anyway.

Use it now, pay for the consequences later. Not condoning it, just pointing it out.

Things are moving so fast, that in 1-2 months there will be an open source or free version of a GPT-3 level LLM out. At that point they can swap out their illegal LLMs for the free/open one and done.

OpenAI knows this and that’s why they are working so hard in multi-modality - to be able to keep their edge.


I’m not going to argue, and I think I may have made a similar point if I had come across my comment, but that’s why I chose to say “significantly”. Any startup could certainly “use it now”, but the higher your valuation the higher the scrutiny and history is littered with companies that had to close shop once it was discovered they were cheating. You can probably get away with it when small, probably weather the storm if you’re big, but in between is a danger zone.


You are right.

Being illegal didn’t stop Uber, nor Airbnb, nor a bunch of other companies and services.

If you grow fast enough, the growth becomes so valuable that no one really wants to (or maybe can?) stop you.

If you are small, nobody cares.

And like you said, in the middle is a lot more dangerous.


If the author had bought Bitcoin back in 2010 instead, and held them through all the chaos to this very moment, they would have approximately 20000x their initial investment. I'm not sure what, if anything, that says about crypto being a fad, rather than just cherry-picking historical examples.


I think what he meant is that it didn’t change software engineering. Not that many people write code that touches cryptocurrency in some way.

If you worked on cryptocurrency software, how useful is that experience now?


It's not about making money, it's about making skills. You need tensor$ to make money. If you aren't using it and learning it daily you're going to be left behind in the knowledge wealth department.

Get in late and you'll be catching up with experts on when and when not to use it, how best to use it, and how best to weed out the bullshit that it intersperses throughout the gold.

There are more ways to skin this cat than Copilot alone. For example, asking ChatGPT questions, or plugins like CodeGPT.nvim, or some other future tool that you wouldn't otherwise be considering.

Remember how strong Google-fu was almost a superpower (still somewhat is, to be honest)? We're currently in the PageRank days of ML.


You know who made even more money than people who knew how to use Google? The people who owned Google. That is what GP was asking about, how to become someone who owns the LLM that everyone will ask their questions to.


I address that in the first paragraph. You need a lot of money (tensor ops) to make money using LLMs. You're going to depend on the big players no matter what.

I don't think LLMs are good enough yet for embedding into apps - or we (tech employees) don't have enough experience with them to be able to innovate them into an end-user product, but I'm open to being wrong about that.


I agree that you need to dive in if you want to take advantage of current opportunities.

On the other hand, some skills could go obsolete pretty quickly. People new to the game will be able to avoid previous mistakes and skip the old stuff. It seems unlikely that future college grads will be missing out due to lack of experience?


Unfortunately, it seems like it's the big boys. OpenAI is having its moment, but otherwise it seems to be about the megas like Google & Microsoft.


Amazon destroyed the used book business in cities across the Western world; I have never purchased a book from Amazon dot com. Is the world a better place with your $130 million fantasy money? At what cost?


William Gibson once said, "The Future is already here. It's just not evenly distributed."

I tried out Github Co-pilot with Rust and with Elixir. Co-pilot did some things well and other things were horrible. It often recommended Ruby code snippets as Elixir and suggested implementations based on APIs that didn't even exist. When different libraries have similarly named APIs and modules, Co-pilot would present an API as if it were a universal truth. What an underwhelming experience.

Skepticism of AI code generation is the result of trying it out and finding that the tool fails to meet our expectations. It's not unfounded skepticism, but skepticism grounded in anecdotal experience. Herein lies the problem: programmers are having far better experiences with Copilot in other languages. Which languages those are, I'd like to know. If you're experiencing the future today, I'd like to know what your workflow involves.


For me the most impactful use case is prototyping small apps that I sort of know how to build, but don't know the relevant libraries off the top of my head.

My most recent example: I got an Onnx file from a guy with a small machine learning model that, for various reasons, he needed to run on an Android tablet. I had never used the Onnx inference library on Android and only had a vague idea of how to pull it in. Without ChatGPT that prototype probably would have taken me 5 or 6 hours to throw together. I'd need to look up the Maven repository, research the API, figure out the syntax for creating tensors in Kotlin, write methods for loading Android resource files, etc.

With ChatGPT's help it took maybe half an hour. Nothing about the code it produced was complicated; the time saved came from not having to trial-and-error my way through learning the library.


I've had genuinely valuable interactions using both bash and javascript, and also some configuration languages for major systems (various forms of yaml, etc). It probably just has far less information on Elixir and Rust in its corpus.

It's actually pretty good at understanding the commonly used tools to do a job and which command switches you should use. E.g. ask it to write a script to create slowed tempo versions of a piece of music at 70%, 80% and 90%, it'll probably come up with some bash that runs sox and does a solid job of the requirement without you needing to work out which tool to use and what its parameters are.
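For instance, here's a quick sketch of the kind of script it tends to hand back for that request (assuming sox is installed; the filenames are just placeholders):

    import subprocess

    # Render slowed-down copies of a track at 70%, 80% and 90% tempo.
    # sox's "tempo" effect changes speed without changing pitch.
    for factor in (0.7, 0.8, 0.9):
        out = f"song_{int(factor * 100)}.wav"
        subprocess.run(["sox", "song.wav", out, "tempo", str(factor)], check=True)

The value isn't that the code is hard to write; it's that you didn't have to go figure out which tool and which switches to use.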


I enjoyed reading this. Yegge is always a must read.

Relatively little of what I do involves actually writing code these days. Far more time is spent understanding the problem, documenting, planning, building consensus, and other things.

Will LLMs impact the way I write code? Yes. They already have. It isn't always right and needs some hand holding, but it greatly accelerates things. Like having a pair programmer that never gets tired and has all the libraries memorized. It is both wonderful and depressing depending on what I am working on. I fear that it will completely remove a large source of joy in my professional life.

I'm more excited about how LLMs will impact the other areas. I can't wait to feed in some pile of documents from vendors, transcripts from meetings with clients, NIST white papers, and other similar things and have the model summarize all that crap in a coherent way for me. Soon I hope.

Also, at the rate things are going, I wonder if I will need to switch careers. And if so, to what?


Professions that require hands are still safe. Professionals like plumbers, electricians, and surgeons will be safe for the foreseeable future.


Great post. The most important part (which isn't clear from the title):

> Cody is Sourcegraph’s new LLM-backed coding assistant.

> Cody is not some vague “representation of a vision for the future of AI”. You can try it right now.

That link takes you to a signup form, not an app or download: https://sourcegraph.typeform.com/cody-signup

I signed up. Now begins the waiting game.


One of the interesting counterpoints David Sacks mentions on the All-In podcast: if this is such a life-changing technology, why did OpenAI sell 49% (or some number near that) for $10B? Presumably the insiders know more than anyone else.


Because while LLMs may be a life-changing technology, there is no guarantee that OpenAI specifically will continue to lead the pack.

"If the internet is such a big deal, why isn't AOL worth more?"


I wonder how much of it is legitimately just because this stuff is insanely expensive to run, and they needed the money?


While the tech might be dominant in a few years, that doesn't mean OpenAI is also going to dominate.


I mean, a bad deal doesn't 100% mean it can't be huge in the future. Bill Gates had almost half of Microsoft's stock at IPO, and none of his later investment decisions made up for the stake he gave up, but Microsoft is still huge. There are similar examples in history.


I assumed it was to help get extremely low cost Azure compute and to grow faster now that Microsoft was going all in on GPT4 with Bing & Office.


Compute and distribution is my guess


It's the government.


Now here is a pro-LLM opinion I can get behind. The first one, actually. The point, as I take it, is that LLMs are going to be successful even though they might not actually be that good.

I can definitely see this happening, because it relies on a proven blind spot in human reasoning: upfront costs versus long-term, well-hidden costs. You can ship code at incredible speed right now, and in unfamiliar stacks. Since the point is not doing all the work yourself, it will surely be full of corner cases you haven't thought of. Bugs will abound. But you can worry about that later, and it's not necessarily you who will have to worry about it, wink, wink.

The comparisons with AWS and K8s are spot on as well. Both rely on hiding a somewhat ugly truth behind instant and cheap adoption. And they both rely on peer pressure. What are you going to do if you don't like LLMs? Refuse them and not ship as fast as everybody else?


No kidding, I'm getting a second bachelor's. I'll become an archaeologist. Money, greed and pride have destroyed software for me. I watch a talk by Dijkstra then I have to deal with some god-awful mess of templates because someone decided deployment files were too verbose and templates are more flexible.

We know how to make correct, performant, beautiful software. But market pressures keeps most of us wrangling nonsense complexity by the cartload. I'm out.


> One of the craziest damned things I hear devs say about LLM-based coding help is that they can’t “trust” the code that it writes, because it “might have bugs in it”.

I can't speak for other devs, but when I'm talking about my inability to "trust" an LLM-based coding partner, it boils down to the lack of transparency around where that code actually came from. There have been numerous documented instances of Copilot and similar tools plagiarizing code verbatim from other projects, sometimes in ways that violate the license terms of the code in question; the last thing I need is to get nailed over an accidental GPL (or, worse, some EULA) violation.


I think it's also less about the bugs and more about the fact that the "thought" behind it is less predictable. If I have a random developer implement some basic function, I can have some confidence that it will cover all of the "normal" and obvious use cases for our shared intended use of this code.

Because of how an LLM approaches it, I have to be more careful about what assumptions it may have made, and which edge cases it decided to have guards around, since there's often a lot of "invisible" context and state baked into even simple tasks.

How a human would name a variable or a function reveals something about how they conceived of its use and purpose. How an LLM names things doesn't necessarily tell you anything about the code that surrounds it. This can make it harder to reason about the code at a time or skill distance.


A good test I have yet to try with ChatGPT / CopilotX:

My 11 year old would like to create a game. Can he talk to these tools (maybe with a tad of assistance from me) and get something working? Then have a place to keep asking questions and keep tweaking?

I find that even at the grown-up level, the barrier to entry for reading docs and Stack Overflow is actually high. There are a lot of subtle signals we need to use as experts to interpret the reliability of information. I can't imagine my 11 yo having the patience to wade through that.


Yes and no. You can totally create a simple game with ChatGPT, but you still need to take the code it spits out, paste it somewhere, and compile and/or run it in some environment. Until someone builds an interactive ChatGPT code sandbox for Javascript or Python, that is. I'm sure someone is working on something like this at this very moment, if it doesn't already exist.


Probably not today, but I can foresee a future a few years down the line when something like Unity is integrated with a LLM and you talk to it like the computer in a Star Trek holodeck to make a game.

The late 90s version of me would be flabbergasted that the most believable part of the holodeck is the ability to create elaborate worlds and coherent narratives from a short description.



I've been trying GPT-4 for the past few weeks on some more generic (technical, not business domain related) but difficult problems for a side project.

The harder the problem the more its help tilts into net-negative rather than net-positive.

I've had it output a lot of issue-solving suggestions that are in direct logical contradiction with constraints I've previously described. Sure, it corrects itself when I point that out, but frustratingly enough, after a couple of exchanges it forgets again (going around in circles).

However, for getting the basics down while learning a new domain it's quite amazing. There are no stupid questions!


"How much of a productivity increase is that? Well jeepers, if you’re only doing 1/5th the work, then you are… punches buttons on calculator watch… five times as productive."

So basically have GPT write the most interesting part of the code and then dive into the ugliest parts of development to provide finishing touches without actually knowing anything about the code base? Sounds like hell to me and not a productivity boost. There will be a new generation of coding sweatshops spawned around this idea by countless clueless MBAs.


The current hype around ChatGPT seems to center around asking it to solve small toy problems and being amazed it can do so. Is there a foreseeable path or game plan to get from where we are now, to being able to give ChatGPT access to your existing humongous codebase and having it submit PRs? Are we already there? I'd say that solving small isolated problems, or writing individual reusable functions or modules, is maybe like 5% of coding, if that.


> ChatGPT access to your existing humongous codebase and having it submit PRs

I'm sure that's just around the corner. In 2 years tops it will very likely exist and be useful for some percentage of your JIRA issues.


Always good to read Yegge. But as with Joel back in the day, with this kind of post you have to separate the meat from the marketing.

It seems clear that LLMs are going to be decent as assistants, but the 5x claim is BS and the marketing part of the post. Most of software development isn't coding; it's testing and experimenting and bug fixing and refactoring and documenting and interacting with other people and often companies.

If we get to the point where an LLM can get assigned a Jira ticket, make a bug fix, slack the PM for clarification on the ticket, debug the code, write regression tests, etc, then that will be something.

Right now? It’s a much improved Clippy with a very big body of knowledge behind it. When it’s embedded in the SDLC, that will be something. If we ever get there.


Until an airplane falls out of the sky and implicit trust in LLM-written code is found to be a root cause. Then you get lawyers with LLMs to put your butt in jail!


I'm sure every LLM provider has in their TOS that it should not be used to write safety critical software.

Like how iTunes should not be used to design nuclear warheads.


But according to the author, it is not possible to write safety-critical software at all. You just can't trust code. It runs randomly after being copied from Stack Overflow or ChatGPT.


This guy was wrong about Amazon and he’s wrong again about LLMs. Being wrong once doesn’t mean you’re more likely to be right later.

Let me tell you something, I’d rather write code than review code. Reviewing code is very draining, writing code is easy. That’s why I’ll never rely on LLM generated code. Figuring out the 20% you have to tweak is exhausting. Refining prompt after prompt and reading the result is exhausting.


“Get LLM to generate 80% of code, tweak 20%” just doesn’t make sense. It takes more time to understand someone else's code than to write it yourself for anything complex. LLMs might be good for boilerplate, or for brainstorming architecture, but I still don’t see any indication of them handling novel use cases.

The author gives plenty of examples of tech that was underestimated. They all solved problems in such novel and concrete ways that most couldn’t see the application. Compare to something like crypto which are hyped on the potential to solve problems at some point in the future in ways that we’ll figure out. I actually think LLM has the potential to deliver on the hype, but it’s a safe bet to be bearish on hyped tech until actual concrete uses are shown, and it’s usually the boring stuff that ends up changing the world. LLMs are by definition derivative and lack underlying knowledge. Until those issues are solved I can’t see them generating anything minorly complex or novel with minimal, easily discoverable bugs


> There is something legendary and historic happening in software engineering, right now as we speak, and yet most of you don’t realize at all how big it is.

> LLMs aren’t just the biggest change since social, mobile, or cloud–they’re the biggest thing since the World Wide Web. And on the coding front, they’re the biggest thing since IDEs and Stack Overflow, and may well eclipse them both.

I use ChatGPT pretty regularly but it is beyond my understanding how people can make such strong statements about the future so confidently based on the information we have now. This prediction may well come true but the justification for it is pretty vague. Sure AWS used to be a small demo and now it very useful and worth a lot of money but you could apply this example to anything and not every technology has a similar success story. Anyone who is this confident about the future is trying to sell you something. In this case it looks like Sourcegraph's Cody.


Speaking of cheating, one thing I've been thinking:

With GPT, what's the deal with non-live coding interviews and challenges, e.g. HackerRank and similar? Right now HackerRank can be configured to disallow alt-tabbing and you can demand there's a web camera on pointing at the candidate. Disregarding for a minute how intrusive and ridiculous these requirements are, is HackerRank's business model gone now that ChatGPT exists?

Yes, I cannot alt-tab. Say hello to my second laptop that you cannot see with the web camera. You also cannot see my hands. As we speak, I'm typing your "go through a list and build/find whatever in O(N)" into ChatGPT on my second laptop, thanks for playing!


This lazy way of interviewing is toast. The better way is take-home projects followed by a discussion. With GPT, those “3-4 hour” projects now become ~20 minutes, and the follow up discussion is for validating that interviewer can trust the candidate to audit the results and take responsibility for getting it to work correctly.

I really loved what Steve said about software engineering being a field that exists because you can’t trust code.


> This lazy way of interviewing is toast. The better way is take-home projects followed by a discussion.

Agreed, this is what I'm thinking too. The "lazy" way is made obsolete by ChatGPT. And good riddance, frankly!


You could also google your hackerrank task on the second computer.

Which, I assume, is the primary motivation for "no alt tabbing" to begin with.


Yes, of course. But ChatGPT/Copilot is more flexible, for the same reasons it's "better" than just googling (e.g. maybe you find the solution in Java but HackerRank wants it in Python, and it's trivial to ask ChatGPT for the translation. And this is just an example).

For coding problems this is essentially a more flexible search engine, one with which you can interact better to tweak the result.

If you can simply take the challenge prompt, paste it on ChatGPT, and have an answer in seconds, doesn't this more or less make the kinds of challenges often employed in HackerRank obsolete?


My experience with leetcode at least is, you can just access the forum of the given task to look at solutions (and explanations) by other people. So for anything other than challenging yourself, it is already obsolete without additional tools.

I see how ChatGPT might be a more flexible search engine, I just don't think it is a fundamentally new mode of cheating that hasn't been possible before.

The 'real challenge' with the two computer setup is having to retype the whole task anyways ;)


When I try to get my gamedev code written by GPT, it seems more like 50% right, 50% wrong, and revising the wrong half takes a long time.


Often it takes far longer than writing it all yourself.

What I see as the advantage is taking doc comments or documentation and fine-tuning the engine on it.

Not embedding more words as numbers.

Not adding it to a prompt.

Literally fine tuning the model.

Does Alpaca or any LLaMA version let you do it? Cause Chat-GPT doesn’t.

And anyway I wouldn’t want to give our whole code over to some third company to re-use the way an artist’s work got jacked because it was publicly posted. I want to self-host the fine-tuned model.

THAT would be the killer feature, for businesses. The Web attracted businesses, not people. People moved over later.

Does Alpaca or any LLaMA version let you do it? Giving it (with its existing weights) a gigabyte of text files and letting it fine tune itself on that?


Given a large enough context window the need for fine tuning diminishes a little bit....

The cost of fine tuning these massive models is crazy right now. Fine tuning a smaller model can get you better performance for specific tasks in many cases. A lot of models are out there and totally free to adapt as needed via hugging face etc.
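As a rough sketch of what "adapt as needed" can look like with the Hugging Face stack (the model name, file paths, and hyperparameters here are placeholders, not a recommendation):

    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_name = "gpt2"  # stand-in for any small open model you can self-host
    tok = AutoTokenizer.from_pretrained(model_name)
    tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Point it at your own text files; the weights stay on your machine.
    ds = load_dataset("text", data_files={"train": "docs/*.txt"})
    ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=512),
                batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                               per_device_train_batch_size=2),
        train_dataset=ds["train"],
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
    )
    trainer.train()
    trainer.save_model("finetuned")

The catch, as noted above, is that this is cheap for a small model and very expensive for anything LLaMA-sized and up.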


Hugging face?


Hugging Face is kind of the github of AI/ML: https://huggingface.co/


The guy should calm down. I mean, did he really use LLMs for coding? Or even for asking for the biography of someone famous? They are bullshit generators. Very high-performing bullshit generators whose outputs need to be carefully reviewed.


> Very high-performing bullshit generators whose outputs need to be carefully reviewed.

So just like every one of your colleagues? Because that’s what they are. These bots are your new colleagues and they respond directly to you, without complaining, exceeding your knowledge.

Today I sent a PR fixing a bug in a language I don’t know anything about.

It worked. The bullshit worked.

Then I tried working on top of that adding more features and lost the rest of the day. But hey it’s still March 2023 and my colleague is still a junior. We should celebrate that they managed to fix a bug. I don’t know where I’ll be in March 2024.


I would not be particularly proud that I managed to get a PR that I didn't write and I don't understand approved and merged into the codebase my fellow colleagues and I are responsible for.


Those beautiful English paragraphs "telling" ChatGPT what code to create are worse than programming.

First of all, not all of us programmers are native English speakers, and, as such, we might miss some of the nuances of the English language itself and thus fail to get the most out of ChatGPT. For comparison, programming languages as they now exist are (natural) language agnostic.

Second, how does ChatGPT correct for spelling errors? Or for language errors pure and simple? Is there a "compiler", or a pre-compiler, for the English phrases that are about to get fed into ChatGPT? Or at least a "language linter"?


I read the entire article and signed up for the launch presentation, only to find out that only Mac and Linux are supported.


I think he is absolutely correct that successful LLM products will have a moat. Unlike with previous novel technologies it seems like incumbents actually have the upper hand. Hard to imagine a startup competing with the new Microsoft 365 copilot.

Microsoft will be able to build a better integrated assistant for their walled garden than any third party. It is also hard to imagine millions of businesses dropping Office for some completely new solution. Unless it's REALLY novel & incredible, of course.


I think that a lot of interfaces are simply going to disappear.

Do you really need a whole office suite to figure out the answers, if AI gives you the answers immediately and in a better format?

For example, an LLM that has db/SQL and charting tools can generate whatever report I want on the fly. Not only that, but instead of just generating a generic report, I can query it repeatedly to understand the data, e.g. "Show me sales for this month. How do they compare to last month? How about last year? Give me a chart of the last 12 months. What impacted sales in November? Who are the best performing salespeople?".

The above is so much better than having to dump csv files, open them in a spreadsheet, do dynamic tables, chart things, etc.
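Concretely, I'm imagining something like this sketch (ask_llm is a placeholder for whatever chat API you use, the schema is made up, and you'd obviously want to sanity-check any generated SQL before running it):

    import sqlite3

    def ask_llm(prompt: str) -> str:
        """Placeholder for a call to whatever LLM API you use."""
        raise NotImplementedError

    question = "Show me sales for this month and compare them to last month."

    # Step 1: the model turns the question into SQL against a known schema.
    sql = ask_llm(
        "Write a single SQLite query for: " + question + "\n"
        "Schema: sales(id, amount, sold_at, salesperson)"
    )

    # Step 2: run the query, then hand the rows back for a plain-English answer.
    rows = sqlite3.connect("sales.db").execute(sql).fetchall()
    answer = ask_llm(
        f"Question: {question}\nQuery result rows: {rows}\n"
        "Summarize the answer in plain English."
    )
    print(answer)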


Great essay by Yegge as always.

I have worked in the search space for a long time. And cherry picked examples of obvious wins, some NLP thing being smart, are _rife_. You get worked up about that one time something spooky/incredible happened and think the world is changing/ending.

What I would like to see is an actual, independent, reproducible study of productivity gains on specific tasks. I've not seen this yet. I'm curious if anything is out there?


Have you played with ChatGPT or GitHub copilot? 5 years ago, I'd agree with you, but nowadays, 'cherry picked wins' happen far more often than not in my experience.


Yes, quite extensively. It was quite good. I'm very enthusiastic about it. I just want some data other than my n=1 experience.


The thing about it is the distribution of wins is going to be different depending on the prompter, and that will make all the difference. As experienced devs, we can prompt the AI, and take the wins when they come, but deftly avert the nonsense when we see it, because we know what to look for. For experienced devs like you and I, this is a huge win and will result in productivity gains.

I worry for the junior dev. They can take the wins and feel good, but they’re going to fall for the nonsense every time, because they don’t know how to spot it. Junior devs are fed lies and they take them to be truths. Hopefully these are dispelled immediately when they are tested in code, but other lies will persist and be sold as truths. I am worried this is a bigger problem than people might think, because it’s a feedback loop that can lead to a vicious cycle. We’ve seen what vicious cycles have done to social media, and social spheres; imagine what one will do to software systems.

Next year we’ll be graduating students who started college during the pandemic. I would say their programming skills are quite atrophied compared to previous generations. Now I’m worried in four years we’ll be graduating students who started high school in covid times and started colleges using AI chat bots. Who knows if these graduates will even be able to do anything close to what students even 2 years ago could do. They’ll all just blankly stare and reach for their iPhones like they do today, but exponentially worse because they won’t even be able to formulate the search prompt. They’ll need AI for that too.


GitHub did some research around productivity with Copilot. They discuss the results and link out to other research at https://github.blog/2022-09-07-research-quantifying-github-c...


Business idea for anyone who is keen:

Stack Overflow, but an LLM responds to every question. Same upvote mechanic, same "accepted answer" mechanic. Perhaps you have known experts validate the responses.

Basically any Q&A forum but with the LLM as the first respondent. You would probably customize or fine-tune this for particular domains (e.g. software engineering, medical training, industrial/commercial training, legal compliance).


Why ask chatgpt to write the code when I can just ask it to do the task? Instead of "write code to do X", just ask it "do X".


These models aren't capable of "doing" things. They can't necessarily calculate values, etc. They can assemble text based (more or less) on statistical analysis of all their input data, and they are incredible at it... but if you ask what 2+2 is, it's not doing math to get to 4.

They certainly can't deploy the code they generate etc.


Things are moving so damn fast, while you were correct when you posted - two hours later you were not:

https://twitter.com/gdb/status/1638971232443076609


I tried asking GPT-4 a math question and it does do math:

Me: "What (8 * 25) / 14 + 4 - 2?"

AI: "Let's break down the expression and solve it step by step:

    Multiply 8 and 25: 8 * 25 = 200
    Divide the result by 14: 200 / 14 ≈ 14.2857 (rounded to four decimal places)
    Add 4 to the result: 14.2857 + 4 = 18.2857
    Subtract 2 from the result: 18.2857 - 2 = 16.2857 (rounded to four decimal places)
So, the result of the expression (8 * 25) / 14 + 4 - 2 is approximately 16.2857."


>its not doing math to get to 4

OpenAI just announced a partnership with Wolfram Alpha, so now it can ask a different computer to do the math.


It's only a matter of time till they have access to your filesystem, and then that's it.


I guess it's mostly due to cost and reproducibility

It's significantly cheaper to run an SQL query than to have ChatGPT look through several gigabytes of data, and ChatGPT isn't guaranteed to produce the same result each time.


Cheating is all you need and keeping code that was going to be open source, offline, is the new trust model?

The open source community runs well when you can relate to the users. If you vacuum up the code and kill off the capacity for copyright claims of code... I can hardly see non-sponsored researchers sharing code openly anymore, because they cannot define the terms of use once an LLM eats it up.


Cheating (and ignoring copyright) is all you need. And patents. I’ll start again: Amongst our things to ignore are…


Opposite feeling here about people seeing AI with skepticism: nearly everyone around me sees it as a panacea, if not today then tomorrow. Sorta including myself.


People are confusing what AI is exactly.

It’s a feature, not a product.

Which is why so many incumbents were able to embed it so quickly into their own products.


Not exactly. AI can be a feature when used as dressing on an existing workflow in an existing application.

AI can be a product when it enables new workflows and applications that wouldn't be possible in a meaningful way without it. These are the transformative things people don't see yet because we haven't really had time to process the tech.


> > You can't trust the AI's code!

> Yeah, but you can't trust your own code!

Those are two different types of trust. The only overlap they have is that they probably both have bugs. Maybe I'd trust AI code after some really aggressively adversarial TDD and fuzz testing enough to have a tiny bit of hope that it's going to do the right thing in most cases. But for anything non-trivial I'd be worried about a bunch of other trust issues. Just off the top of my head:

- You know all those articles with terrible advice Google is always showing at the top of their results? Welcome to a world where those are regurgitated by a well-articulated, authoritative program, resulting in a long-term maintenance nightmare.

- An LLM is far too big to review for malicious content. And if it's being used by lots of programmers you can bet a billion it'll be a prime target for bad actors.

- The interaction between my brain and my hands isn't going to be shipped off to professional manipulators.


> software engineering exists as a discipline because you cannot EVER under any circumstances TRUST CODE

This is the insight that unlocked GPT code assistance for me. GPT is another developer on my team now. Devs will always have jobs because humans are flawed, problems are hard, and GPT is trained on human data that hasn’t solved all problems.


> LLMs aren’t just the biggest change since social, mobile, or cloud–they’re the biggest thing since the World Wide Web. And on the coding front, they’re the biggest thing since IDEs and Stack Overflow, and may well eclipse them both.

It's amazing how far just predicting the next character can go when you do it really, really well.


This makes me ponder over how predictable we humans can be.

We tend to act similarly when placed in similar circumstances. We think alike when presented with the same context. We find patterns in things. Etc, etc, etc.

And I'm sure I'm not the first person in this thread to have had this exact thought.


> All you crazy MFs are completely overlooking the fact that software engineering exists as a discipline because you cannot EVER under any circumstances TRUST CODE.

Great way to make the point that as people get overly-comfortable with trusting these AI tools, the inevitable outcome is absolute chaos and destruction.


If AI code causes chaos and destruction then all we've done is prove that as an industry we are really bad at actually verifying that code works.

If AI code can fuck you up then so can any junior dev or intern. If the PR with AI generated code passes the tests and code review then that's on you.


> we are really bad at actually verifying that code works

This assumes a culture where verification is a virtue. Given the opportunity to cut corners for the sake of KPIs or other management sorcery, it's a virtual certainty that corners will get cut if AI enables it.

> If AI code can fuck you up then so can any junior dev or intern.

And they do. The fact that major corporations routinely have massive data leaks speaks to some serious QC issues.

---

Ultimately, the problem is the long-term brain atrophy due to these tools. There will be too much trust placed in them, colleges will start awarding degrees to people who just "AI'd the answer" and eventually, we have a majority work force who can't tie their proverbial shoes.

It's not tomorrow and hopefully not even a decade. But a generation's worth of dependence on these tools coupled with the employment incentive to do more faster? Boy howdy [1].

[1] https://www.youtube.com/watch?v=PTtBN34AXl0


The Machine Stops soon!


<ted-kaczynski-grin.jpg>


This is really great but it also feels like a bit of an oversight to not realize that Github (aka MS aka OpenAI) launched a Sourcegraph competitor recently:

https://github.com/features/code-search

I appreciate the breakdown of how to do it yourself, but even having to sign up for a waitlist to try their option when GitHub seems eminently able to do basically exactly the same thing... idk. It just points to how the people that control the models really have the final say here. It's hard to position yourself as being in a lucky spot when you're up against the perfect opponent, one primed to act more quickly than you are.


Here’s another back-of-the-envelope calculation: at some point it will be possible to automate 80% of coding. But according to the Pareto principle, 20% of the work takes 80% of the time, so the automatable 80% of the code only accounts for 20% of the time. In that worst-case scenario it saves you 20% of your time. Still nothing to sneeze at, but that’s not a 5x productivity boost; more like “plus a quarter” (1 / 0.8 = 1.25x). But still - it’s much better than nothing!
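Spelling out that arithmetic (same rough assumptions):

    # The automatable 80% of the code only accounts for ~20% of the time,
    # so 20% of your time is the most this scenario saves.
    time_saved = 0.20
    speedup = 1 / (1 - time_saved)
    print(speedup)  # 1.25 -- roughly a quarter more throughput, not 5x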


    \\b\\w*\\i\\w*\\b
What's going on with that regexp? Word boundary, 0..n word chars, *escaped i that makes no sense*, 0..n word chars, word boundary.

It only works because \i hasn't been defined in elisp regexp.

This sounds like yet another great example of "you can't trust ML output".

I can only imagine the number of obscure bugs that will come from this trend. "It worked for the one thing I tried it on."
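A quick illustration of the point (my own, not from the article): Python's re module rejects the undefined escape outright, while Emacs quietly treats \i as a literal "i", which is the only reason the snippet appears to work.

    import re

    # The pattern from the article (single backslashes here; elisp strings double them).
    pattern = r"\b\w*\i\w*\b"
    try:
        re.compile(pattern)
    except re.error as err:
        print("rejected:", err)  # bad escape \i

    # Presumably what was intended: words containing the letter "i".
    print(re.findall(r"\b\w*i\w*\b", "Cheating is all you need"))
    # ['Cheating', 'is']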


Nice to see Steve Yegge back to writing blog posts :) One of the reasons I was excited to join Google was to read all of his internal posts, but it turns out they were mostly on Google+, which was taken down by the time I joined.


Ooh I didn't realize Steve Yegge was at Sourcegraph -- I think this might explain some of Gergely Orosz's interest / recent deep-dives into the company. This is going to be an exciting one to watch!


I've been a professional software engineer since the 1990s. The pointy-haired boss has been trying to replace high-priced code artisans with cheap commodity labor since well before I came onto the scene. I remember when they hoped they could get away with hiring or off-shoring people who had just enough ability to glue together Microsoft COM objects or Java Beans, or graphical UML models or NPM packages.

Using LLMs to produce software solutions definitely feels like a seismic shift in the game. Time will tell.


If an LLM could write code faster or better than me, I would use it. But the LLM comes up with perverse algorithms that I then need to understand, debug, and fix. Sorry, not there yet. Not by far.


I agree with a lot of things in this article.

My question is:

If an LLM can understand natural languages and convert them to a programming language, why not skip the high-level languages and go straight to machine code?


Presumably for human verification purposes.


The programming language is an intermediary we humans and the machine can understand. It's better from an audit/explainability perspective.


If the machine can already understand natural languages, as shown by GPT, why can't it just let you code and debug in natural language? Just curious.

Like why can't it just throw compiler errors on your English code?


It can do assembly, which is basically machine code.


What John Carmack said in his recent tweet replying to a DM resonates with me the most. Languages, architectures, and everything in between are essentially just tools (the thing which changes) to produce the end result you desire, i.e. the business application or outcome you want. Thinking in this way, using natural language and/or prompting seems like the best path and the highest abstraction we have always strived to achieve in software engineering.


> What about chatting with people in a browser? Doesn’t matter whether you’re using Facebook, Google Chat, LinkedIn, or just chatting with a customer service agent: if you’re having a conversation with someone in a browser, all that shit started life as a teeny demo of 2 engineers sending messages back and forth over a “hanging GET” channel back in 2005.

I'm pretty sure web chatrooms already existed in the late 90s.


That 80:20 rule, when it comes to LLMs, is more like 30:70. Only about 30% of the code it generates is useful, and even that only after you change it to an arbitrary extent, nearly 70% of the time.

On the other hand, if you are following along with some Udemy course, it’s pretty good at producing the exact lines in the video around 80% of the time. :rofl_emoji:

On the whole, I’m very happy and pleasantly amused with my copilot purchase.


> if you’re having a conversation with someone in a browser, all that shit started life as a teeny demo of 2 engineers sending messages back and forth over a “hanging GET” channel back in 2005

Hmm.. pretty sure I was using online chat rooms around 1997. He's probably talking about Comet, but it was just a technical improvement, not an enabler. And realtime streams have since moved on to SSE / websockets.


How many layers deep can you go with this? Can you index all your data, then ask questions and index those results, making them part of the corpus in your index? And continue to do this over and over, allowing GPT to get back its own contexts without even knowing it? Would this even be desirable? Would this create a sort of incest problem, or would it make for even better semantic results? See the sketch below for the loop I have in mind.
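In sketch form (embed and ask_llm are placeholders for whatever embedding and chat APIs you'd use):

    import numpy as np

    def embed(text: str) -> np.ndarray:
        """Placeholder: call your embedding API here."""
        raise NotImplementedError

    def ask_llm(prompt: str) -> str:
        """Placeholder: call your chat-completion API here."""
        raise NotImplementedError

    corpus = ["chunk one of your data ...", "chunk two ..."]
    vectors = [embed(c) for c in corpus]

    def answer(question: str) -> str:
        q = embed(question)
        sims = [float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q)))
                for v in vectors]
        context = corpus[int(np.argmax(sims))]  # best-matching chunk
        reply = ask_llm(f"Context:\n{context}\n\nQuestion: {question}")
        # The step in question: the model's own answer joins the corpus,
        # so later questions can retrieve it without the model "knowing".
        corpus.append(reply)
        vectors.append(embed(reply))
        return reply

Whether that compounds into better answers or drifts toward the incest problem probably depends on how much of the retrieved context ends up being model output rather than source data.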


It's funny to watch the dynamic described in the article play out exactly in this thread by posters who have not read the article.


I know a lot of people enjoy the struggle of coding and banging your head against a wall to solve a problem, but I hope the days of severe mental drain trying to learn how to code something complex are coming to an end. Kids growing up in this generation are lucky. They'll look back at us now like I look back at punch card programmers.


When did Yegge join Sourcegraph?


In the article he says September :)


Barely tangential to the article, but it’s getting exhausting seeing lazy take after lazy take where an author flippantly refers to crypto as a dumb fad when they really mean specific currencies or maybe NFTs as a whole.


Best part of the article

> So the next one of you to complain that “you can’t trust LLM code” gets a little badge that says “Welcome to engineering motherfucker”. You’ve finally learned the secret of the trade: Don’t. Trust. Anything!


I scanned through for the punch line where he explains how much he relied on LLM help while writing this article. Did I miss it?

> I don’t really know how to finish this post. Are we there yet? I’ve tried to write this thing 3 times now, and it looks like I may have finally made it. I aimed for 5 pages, deliberately under-explained everything, and it’s… fifteen. Sigh.

He did it all by hand. Of course. Because he has just as hard a time facing this as anyone else.

There is going to be a race to come to terms with this. If something we're all currently using for free is capable of this much, then most of us aren't going to be employed as software developers in ten years. We won't be able to scale up the demand for code as fast as the supply.

It's terrifying. There's no post-scarcity utopia in capitalism. Scarcity is a given, and if you aren't wealthy and aren't employed, you will feel it.


The part about the context window is interesting, and how you have to give LLMs cheat sheets. Going to see if I can reproduce that to get better results. Ty Steve.


AI is going to be the equivalent of discovering and using "0" in mathematics. It's quite literally the same thing at a bigger scale. At first, mathematicians opposed adopting zero because they thought what doesn't exist shouldn't exist, and deemed it an immoral practice. This iteration of LLMs is very similar. It's quite literally black-box magic. I find it fascinating. Sometimes when humanity accepts irrationality we gain more understanding of nature, not less. When we only pursue rationality and profit we end up in world wars.


So with the emacs example, the input prompt in natural language was 210 bytes. The code output is 345 bytes (omitting the book text). That's only about 64% more bytes out than in, which is totally not a 5x productivity increase. Question: how does it scale? How can you be sure that it is productive at all, and that at some point you won't be inputting more English-language bytes than you get back in code?


Great to see another Yegge rant! Thought those were all history...


Thank you. Glad to hear I am not the only one who sees this :)


LOLOL shilling so hard = company probably bust soon?


wrong, you also need to not get caught

or you can to all America over this, and change the meaning of cheating for yourself



