Years ago, over a decade ago now, I was a .Net developer. Microsoft introduced Entity Framework, their new way of handling data in .Net applications. Promises made, promises believed, we all used it. I was especially glad of Lazy Loading, where I didn't have to load data from the database into my memory structures; the system would do that automatically. I could write my code as if all my memory structures were populated and not worry about it. Except, it didn't work consistently. Every now and again a memory structure would not be populated, for no apparent reason. Digging deep into technet, I found a small note saying "if this happens, then you can check whether the data has been loaded by checking the value of this flag and manually loading it if necessary" [0]. So, in other words, I have to manually load all my data because I can't trust EF to do it for me. [1]
Long analogy short, this is where I think AI for coding is now. It gets things wrong enough that I have to manually check everything it does and correct it, to the point where I might as well just do it myself in the first place. This might not always be the case, but that's where I feel it is right now.
[0] Entity Framework has moved on a lot since then, and apparently now can be trusted to lazily load data. I don't know because...
[1] I spat the dummy, replaced Windows with Linux, and started learning Go. Which does exactly what it says it does, with no magic. Exactly what I needed, and I still love Go for this.
Pardon me for the tangent (just a general comment not directed to OP).
What I have learned over the years is that the only way to properly use ORM is as a fancy query tool. Build the query, fetch/update data, MOVE THE DATA to separate business objects. Don't leave ORM entities shared across the sea of objects!
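For example, with something like EF Core that pattern might look roughly like this (a minimal sketch; db, since, the Orders set and the OrderSummary record are all hypothetical names, not anything from this thread):

// Use the ORM purely to build and run the query...
var rows = await db.Orders
    .Where(o => o.CreatedAt >= since)
    .Select(o => new { o.Id, o.Total, CustomerName = o.Customer.Name })
    .ToListAsync();

// ...then immediately move the data into plain business objects that carry
// no ORM baggage (no change tracking, no lazy navigation properties).
var summaries = rows
    .Select(r => new OrderSummary(r.Id, r.Total, r.CustomerName))
    .ToList();

The entities never leak past the query; everything downstream only ever sees the plain objects.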
I wouldn't have believed you until I moved from ActiveRecord (Rails's ORM) to Ecto (Elixir/Phoenix's data mapping library which is decidedly not an ORM.) It's a million times better and I'm never going back.
Ecto is hands down my favorite part of the elixir ecosystem.
It’s so elegant and the Lego blocks (query, schema, change set, repo) can be mixed and matched in different ways.
I’ve even used schemas and change sets to validate API requests and trivially provide very nice, clean, and specific errors, while getting perfectly typed structs when things are validated.
same, I wish more libraries would go the ecto design route. my ecto queries map pretty close to 1:1 with the sql counterpart. no guessing what the output is going to look like. I spend my time debugging the query and not trying to get the orm to output the query I want.
Yes, same experience here. I felt (and still feel) that ActiveRecord is one of if not the best ORMs out there, but it was always a source of debugging and performance optimizations/pain, and the trick was basically taking hand-written SQL and trying to get ActiveRecord to generate that. After briefly moving to node.js full time I actually got very anti-ORM, although query building libraries all sucked too which left me unsure of what the best approach is.
Then I learned Ecto/Phoenix and this is truly the best way. Ecto is so close and translatable to raw SQL that there's little to no friction added by it, but it handles all the stuff you don't want to have to do by hand (like query building, parameterization, etc). Ecto is a real breath of fresh air and I find myself writing quick scripts that hit a database in Elixir just so I can use Ecto! I also love how easy Ecto makes it to model database tables that were originally defined by another language/framework or even by hand. Trying to do that with ActiveRecord or another ORM is usually a recipe for extreme pain, but with Ecto it's so easy.
Yeah, I hear some people say that they find Ecto.Query confusing, and I think it's because they never learned SQL properly. That's understandable because it's possible to use something like ActiveRecord for years without ever learning to write even a simple SQL query. But if you have a good grasp of SQL then Ecto.Query is trivial to learn - it's basically just SQL in Elixir syntax.
Being able to think directly in sql lets you perform optimal queries once you understand sql. and imho, this is much cleaner than the equivalent sql would be to write. it also takes care of input sanitization and bindings.
adding an off the shelf ORM layer creates so much more opacity and tech debt than writing queries that I don't understand why anyone would willingly put one into their stack. Sure, they're neat, although I don't even know if they save time. There's something very satisfying about well-crafted queries. And is it ever really well crafted if you can't tweak them to improve their execution plan? I've never had a client or boss who asked to use an ORM framework. I suspect it's something people think looks cool - treating SQL as OOP - until they run into a problem it can't solve.
[edit] for instance, I have a case where I use a few dozen custom queries on timers to trawl through massive live data and reduce it into a separate analytics DB. Using everything from window functions to cron timers to janky PHP code that just slams results from separate DBs together to provide relatively accurate real-time results. At the end from that drastically reduced set in the analytics DB... sure, I'm happy to let the client summarize whatever they want with Metabase. But those things just couldn't be done with an ORM, and why would I want to?
Yes, I would not put it just anywhere. But I have a few rules about ORMs:
- Proper DB design first. You should be able to remove the ORM and DB should still function as intended. This means application-side cascade operations or application-side inheritance is banned.
- No entities with magical collections pointing to each other. In other words, no n to n relations handled by ORM layer. Create in-between table, for gods sake. Otherwise it becomes incredibly confusing and barely maintainable.
- Prefer fetching data in a way that does not populate collections. In other words, fetch the most fine-grained entity and join related data. Best if you craft special record entities to fetch data into (easy with EF or Doctrine).
- Most ORMs allow you to inspect what kind of queries you create. Use it as query building tool. Inspect queries often, don't do insane join chains and other silly stuff.
I would use an ORM in one kind of app: where I would work with data that shares records that might need to be either inserted or updated, and there are several nesting levels of this kind of fun. You know, you need to either insert or update an entity; if it exists, you should update it and then assign related entities to it, and if it does not, you should insert it and assign related entities to the newly created id (sketched below). The ORM can easily deal with that, and on top of that it can do efficient batched queries, which would be really annoying and error-prone to hand-craft.
If the app does not require this kind of database with these kind of relations, I would not use ORM.
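A rough sketch of that insert-or-update case with EF Core (every entity and property name here is invented for illustration, and I'm assuming the Orders collection is initialized on the entity):

// Find the parent by its natural key, or create it if it doesn't exist yet.
var customer = await db.Customers
    .Include(c => c.Orders)
    .FirstOrDefaultAsync(c => c.ExternalId == dto.ExternalId);

if (customer is null)
{
    customer = new Customer { ExternalId = dto.ExternalId };
    db.Customers.Add(customer);
}
customer.Name = dto.Name;

// New children are attached to the tracked parent; EF fills in the foreign
// keys (existing or newly generated id) and batches the statements on save.
foreach (var item in dto.Items)
    customer.Orders.Add(new Order { Sku = item.Sku, Quantity = item.Quantity });

await db.SaveChangesAsync();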
> No entities with magical collections pointing to each other. In other words, no n to n relations handled by ORM layer. Create in-between table, for gods sake. Otherwise it becomes incredibly confusing and barely maintainable.
So, I have a database that looks like this. My method was to lay out the database myself, by hand, and then use EF's facility to generate EF code from an existing database. The bridge table was recognized as being nothing but the embodiment of a many-to-many relation and the EF code autogenerated the collections you don't like.
Is this a problem? If you do things the other way around, the ORM creates the same table and it's still there in the database. It isn't possible not to create the bridge table. Why is that case different?
This is more of a preference for the bridge to be visible in the application. Also, the bridge may seem simple at first, but it may later gain associated data, like created_at, order, etc.
> adding an off the shelf ORM layer creates so much more opacity and tech debt than writing queries that I don't understand why anyone would willingly put one into their stack.
Simple: because if I don't, I'm going to spend the rest of my career explaining why I didn't to people extremely skeptical of that decision. Meanwhile even people like me tend to just shrug and quietly go "oh, an ORM? Well, that's the price of doing the job."
Also, ORMs are an endless source of well-paid jobs for people who actually learned relational algebra at some point in their lives, and that's not a compliment to ORMs.
ORM is not for writing analytics queries. It's for your CRUD operations. Something like Django Admin would be impossible without an ORM. You create tables for your business logic and customer support or whoever can just browse and populate them.
I consider an ORM to be any SQL generating API, without which it would indeed be impossible to have a generic Admin class to make Admin views in Django.
What should I call a program that generates SQL, executes it, and stores the result in a tuple, object, or whatever data structure in the programming language that I'm using? Does it magically stop being an ORM the second I use a tuple instead of a class instance, or is it now an ORM plus another nameless type of program? Are tuples also objects?
Traditionally, though, SQL generation was known as query building. The query was executed via database engine or database driver, depending on the particulars. ORM, as the name originally implied, was the step that converted the relations into structured objects (and vice versa). So, yes, technically if you maintain your data as tuples end to end you are not utilizing ORM. Lastly, there once was what was known as the active record pattern that tried to combine all of these distinct features into some kind of unified feature set.
But we're in a new age. Tradition has gone out the window. Computing terms have no consistency to speak of, and not just when it comes to databases. Indeed, most people will call any kind of database-related code ORM these days. It's just funny that ORM no longer means object-relational mapping.
I think the core thing that ORMs do is create a 1:1 mapping between the data structures in the database (that are, or should be, optimised for storage) and the data structures in the application (that are, or should be, optimised for the application business logic).
ORMs create this false equivalence (and in this sense, so does Django's admin interface despite using tuples instead of classes). I can see the sense of this, vaguely, for an admin interface, but it's still a false equivalence.
I agree with you, but I do think there's a little fuzziness between full-blown ORM and a tuple-populating query builder in some cases. For example Ecto, which can have understanding of the table schema and populate a struct with the data. It's just a struct though, not an object. There's no functions or methods on it, it's basically just a tuple with a little more organization.
> It's just a struct though, not an object. There's no functions or methods on it
Object-relational mapping was originally coined in the Smalltalk world, so objects were in front of mind, but it was really about type conversion. I am not sure that functions or methods are significant. It may be reasonable to say that a struct is an object, for all intents and purposes.
A pedant might say that what flimsy definition Kay did give for object-oriented programming was just a laundry list of Smalltalk features, meaning that Smalltalk is (probably) the only object-oriented language out there, and therefore ORM can only exist within the Smalltalk ecosystem. But I'm not sure tradition ever latched onto that, perhaps in large part because Kay didn't do a good job of articulating himself.
Most queries are pretty trivial, ORMs are great for 90% of queries. As long as you don't try to bend the ORM query system to do very complicated queries it is fine. Most (all?) ORMs allow raw queries as well so you can mix both.
On top of that most ORMs have migrations, connection management, transaction management, schema management and type-generation built-in.
Some ORMs have inherently bad design choices though, like lazy loading or implicit transaction sharing between different parts of the code. Most modern ORMs don't really have that stuff anymore.
How do you map rows to objects? How do you insert into/update rows in your databases? These are the basic problems ORMs solve splendidly. They are for OLTP workloads, and have deliberate escape hatches to SQL (or some abstraction over it, like JPQL in java-land).
I just fail to see what else would you do, besides implementing a bug-ridden, half-ORM yourself.
Rows are tuples, not objects, and treated as such throughout the code. Only the needed data is selected in the form most appropriate to the task at hand, constructed in a hand-written sql query, maybe even tailored to the DB/task specifics. Inserts/updates are also specific to the task, appropriately grouped, and also performed using plain sql. Data pipelines are directly visible in the code, and all DB accesses are explicit.
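In C# that explicit style might look something like this (a sketch with plain ADO.NET via Microsoft.Data.SqlClient; the table, columns and variables are made up):

using Microsoft.Data.SqlClient;

// connectionString and since are assumed to come from elsewhere.
using var conn = new SqlConnection(connectionString);
await conn.OpenAsync();

// Hand-written SQL, selecting only what this particular task needs.
using var cmd = new SqlCommand(
    "SELECT id, name, total FROM orders WHERE created_at >= @since", conn);
cmd.Parameters.AddWithValue("@since", since);

// Rows land in tuples, not ORM entities; the data pipeline stays explicit.
var rows = new List<(int Id, string Name, decimal Total)>();
using var reader = await cmd.ExecuteReaderAsync();
while (await reader.ReadAsync())
    rows.Add((reader.GetInt32(0), reader.GetString(1), reader.GetDecimal(2)));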
Maybe we need to use a different acronym than ORM, because to me the thing we can all agree we need is code that emits SQL. If you can't agree that projects need generated SQL because SQL is dog water for composition, then we can't really agree on anything.
Probably so: I can't agree with that particular inference.
1. Very often we need generated SQL because writing SQL for primitive CRUD operations is hellishly tedious and error-prone (as is writing UI forms connected to these CRUD endpoints, so I prefer to generate those too).
2. Structured Query Language being very poorly structured is indeed a huge resource drain when developing and maintaining complex queries. PRQL and the like try to address this, but that's an entirely different level of abstraction.
3. Unfortunately, when efficiency matters we have to resort to writing hand-optimized SQL. And this usually happens exactly when we terribly need a well-composing query language.
I'd argue that "code that emits SQL" is never an inherent need but a possible development time-saver - we need code that emits SQL in those cases (and only those cases) where it saves a meaningful amount of development time compared to just writing the SQL.
That is exactly where ORMs help. The problem is all of the other stuff that comes with them, when most people just need a simple mapper - not something to build their SQL statements for them (which seems to be why most people pick one).
But that comes to the second problem. Most devs I meet seem to be deathly allergic to SQL. :)
On one project a dev came to me asking me to look at a bug. Having never seen that particular ORM before, I was still able to diagnose what was wrong, because MS ORMs have the same issues over and over (going back to the 90s). You better read those docs! Because whatever they did in this stack will be in their next one when they abandon it in place 3 years from now.
> These are the basic problems ORMs solve splendidly.
Depends on the ORM.
I have noticed that typically, 'unit of work' type ORMs (EFCore and Hibernate/NHibernate as examples) prevent you from being both 'true to the ORM' and 'efficient'.
i.e. Hibernate and EFCore (pre 7 or 8.0ish) cannot do a 'single pass update'. You have to first pull the entities in, and it does a per-entity-id update statement.
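For what it's worth, newer EF Core versions (7 and later, if I remember right) did add a set-based escape hatch for exactly this; roughly (entity and enum names invented):

// Unit-of-work style: load every entity, mutate it, SaveChanges,
// and you get one UPDATE statement per tracked entity.
// Set-based style: a single UPDATE, nothing materialized into memory.
await db.Orders
    .Where(o => o.Status == OrderStatus.Stale)
    .ExecuteUpdateAsync(set => set
        .SetProperty(o => o.Status, OrderStatus.Archived));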
> I just fail to see what else would you do, besides implementing a bug-ridden, half-ORM yourself.
Eh, you can do 'basic' active-record style builders on top of dapper as an afternoon kata, if you keep feature set simple, shouldn't have bugs.
That said, I prefer micro-ORMs that at most provide a DSL for the SQL layer. less surprises and more concise code.
For me the biggest reason is automated database initialization and migration. After defining or updating the ORM model, I don't have to worry about manually CREATing and ALTERing tables as the model evolves.
This is compatible with the OC suggestion of using ORMs as a "fancy query builder" and nothing more, which I strongly support.
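With EF Core, for example, the loop is roughly: change the C# model, then let the tooling generate and apply the schema change (a sketch with a hypothetical entity; the CLI commands are the standard ones):

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public decimal Price { get; set; }   // newly added property
}

// Then, instead of hand-writing the ALTER TABLE:
//   dotnet ef migrations add AddProductPrice
//   dotnet ef database update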
You always have to worry about your model changes if you run at any sort of scale. Some ORMs will get it right most of the time, but the few times they don’t will really bite you in the ass down the line. Especially with the more “magical” ORMs like EF where you might not necessarily know how it builds your tables unless you specifically designed them yourself.
This is where migrations also become sort of annoying, because if you use them, it is harder to fix mistakes: you can’t just change your DB without using the ORM, or you’ll typically break your migration stream, or at least run into a lot of trouble with it.
And what is the plus side of having a code-first DB really? You can fairly easily store those “alter table” changes as you go along and have the full history available in a very readable way that anyone can follow, including people not using C#, Java, or Python.
Which is the other issue with ORMs: if you have multiple consumers of your data, an ORM most likely won’t consider that as it alters your “models”.
For a lot of projects this is a non-issue, especially at first. Then 10 years down the line, it becomes a full blown nightmare and you eventually stop using the ORM. After spending a lot of resources cleaning up your technical debt.
> And what is the plus side of having a code-first DB really? You can fairly easily store those “alter table” changes as you go along and have the full history available in a very readable way that anyone can follow, including people not using C#, Java, or Python.
The benefits should be obvious if you've used ORMs. They are an object that represents your database data in code rather than in a table where you can't touch it. If you have code that brings data from a database into code, congratulations, you've implemented part of an ORM. Having the data model defined "in code" treats the code as first-class instead of the SQL, which makes sense from an ergonomics perspective, you will spend much more time with the code objects than you will the SQL schemas. Either way, you will have two versions: a SQL version and a code version. You might as well get both from writing one.
If you can read alter table in SQL, you can probably read migrations.AddField in Python, and whatever the equivalent is in the other languages. I still am waiting with bated breath for the problems with much maligned (by some) ORMs to arrive.
The only area of development where ORMs haven’t been the cause for at least some trouble in my career has been with relatively small and completely decoupled services. Even here I’ve had to replace countless ORMs with more efficient approaches as the service eventually needed to be built with C/C++. That being said, I don’t think any of these should have been built without the ORM. The rewrite would have been almost as much of a hassle if there hadn’t been an ORM after all.
I’m not really against ORMs as such. I’m not a fan of code-first databases for anything serious, but as far as CRUD operations go I don’t see why you wouldn’t use an ORM until it fails you, which it won’t in most cases, and in those cases where it does… well, similar to what I said earlier, you just wouldn’t have built it to scale from the beginning anyway, and if you had, and it turned out it didn’t need to scale, then you probably wasted a lot of developer resources to do so.
I'm not sure if you're talking about creating and altering model tables or if you mean ORMs provide safety in case underlying tables are modified. I'd argue that well-built queries should be resistant to alteration of the underlying tables, and that views and functions and stored procedures already exist to both red flag breaking changes and also to combine and reduce whatever you need without relying on third party code in another language layer to do the lifting.
Doesn't it also mean that any non-trivial migration (e.g. which requires data transformation or which needs to be structured to minimize locking) has to be defined elsewhere, thus leaving you with two different sources for migrations, plus some (ad-hoc) means to coordinate the two?
(I would say that it is conceptually perverse for a client of a system to have authority over it. Specifically, for a database client to define its schema.)
Agree completely, as does most of the Go community :) Newbie gophers are regularly told to learn some SQL and stop trying to rebuild ActiveRecord in Go ;)
But in .Net, EF is still the most common way of accessing data (I have heard, because I stopped using it over a decade ago).
That doesn't really help you with EF because there's plenty of stuff shared at context level. So depending on the order of queries in the context the same query can return different data.
Well, this month we had to debug an issue where EF was NOT populating fields on classes from the db, that it definitely should have been!
So it still seems flakey. I've never worked a single job that chose EF that didn't end up regretting it. Either from it being unreliable, migration hell or awful performance.
"It allows you to treat your database like an in-memory enumerable"
Then devs go and do exactly that and wonder why performance is so terrible...
We had an issue last week where we had an object like
public class Foo
{
    public List<Bar> Bars { get; set; }
}
We'd query for some Foos, like:
await _dbContext.Foos.ToListAsync();
and some amount of them would have Bars be an empty list where it should definitely be populated from the db. And it wasn't even consistent, sometimes it would populate thousands, sometimes it would populate a handful and then just stop populating Bars.
No errors, no exceptions, just empty lists where we'd expect data.
And so often we have to debug and see what SQL it's actually generating, then spend time trying to get it to generate reasonable sql, when if we were using sprocs we could just write the damn sql quicker.
Another issue we have is the __EFMigrationsHistory table.
Sometimes we will deploy and get a load of migration errors, as it tries to run migrations it's already run... So they all fail and then the API doesn't come back up... The fix? Turn it off and on again and it stops trying to re-run migrations it's already run!
navigation properties are not loaded automatically, because they can be expensive. you need to use `.Include(foo => foo.Bars)` to tell EF to retrieve them.
EF tries to be smart and will fix up the property in memory if the referenced entities are returned in separate queries. but if those queries don't return all records in `Foo.Bars`, `Foo.Bars` will only be partially populated.
this can be confusing and is one of the reasons i almost never use navigation properties when working with EF.
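roughly this shape, assuming the Foos/Bars classes from the comment above (Include and ToListAsync need using Microsoft.EntityFrameworkCore):

// eager-load the navigation property as part of the same query
var foos = await _dbContext.Foos
    .Include(foo => foo.Bars)
    .ToListAsync();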
We have those, and when I say inconsistent I mean inconsistent on the same query / exact same line of code on the same database.
e.g. stick a breakpoint, step over, see in the debugger that it was not populating everything it should. Then run it again, do the same and see different results. Exact same code, exact same db, different results.
5000 results back from the db, and anywhere between a handful and all 5000 were fully and correctly populated.
If that happens with the correct `.Include()`, you really should raise an issue with EF, trying to reproduce it. If it's not a random mistake in your code, that's a really big deal.
Like your parent said, the same line of code will or won't populate the navigation property depending on whether EF is already tracking the entity that belongs there (generally because some other earlier query loaded it). You get different behavior depending on the state of the system; you can't look at "one line of code" in isolation unless that line of code includes every necessary step to protect itself against context sensitivity.
I've been using Entity Framework for the last 5 years and have not encountered this issue, as long as I've got all my Includes specified correctly.
There is also the AutoIncludeAttribute that you can specify on entity fields directly to always include those fields for every query.
My main complaints with EF are that the scaffolding and migration commands for the CLI tool are nearly impossible to debug if they error during the run.
But when they run right, they save me a ton of time in managing schema changes. Honestly, I consider that part worth all the rest.
There can also be some difficulty getting queries right when there are cyclical references. Filtering "parent" entities based on "child" ones in a single query can also be difficult, and also can't be composed with Expression callbacks.
But in any difficult case, I can always fall back on ADO.NET with a manual query (there are also ways of injecting manual query bits into EF queries). Which is what we'd be doing without EF, so I don't get the complaints about EF "getting in the way".
Lazy loading was a mistake in EF. A lot of apps had awful performance due to lazy loading properties in a foreach loop creating N+1 queries to the database. It would be fine in dev with 50-100 rows and a localhost SQL and blow up in prod with 1000s of rows and a separate Azure SQL.
Also if you relied on lazy loading properties after the DbContext had been disposed (after the using() block) you were out of luck.
With old EF we would turn off lazy loading to make sure devs always got an exception if they hadn’t used .Include() to bring the related entities back in their initial query. Querying the database should always be explicit not lurking behind property getters.
Fortunately with EF core MS realized this and it’s off by default. EF with wise use of .Include and no lazy loading is a pretty good ORM!
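For reference, the switches involved look roughly like this (from memory, so treat it as a sketch; MyDbContext is a placeholder name):

// EF6: lazy loading is on by default; turn it off in the context constructor
// so related data is never silently fetched behind a property getter.
public MyDbContext()
{
    this.Configuration.LazyLoadingEnabled = false;
}

// EF Core: lazy loading is off unless you explicitly opt in, e.g. with
//   optionsBuilder.UseLazyLoadingProxies();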
> [0] Entity Framework has moved on a lot since then, and apparently now can be trusted to lazily load data
To some degree. If you're using it for anything serious you're still going to help it along a lot. It's rather easy to do so, however, and I certainly wouldn't consider writing your own code as fast or easy as simply telling EF how you want it to do certain things.
I'm not an overall fan of EF. I especially dislike how its model builder does not share interoperability with other .Net libraries which also use it. I also don't really like the magic .Net does behind the scenes. That said, EF as a whole has been one of the better ORMs in any language since .net core. I'd still personally much prefer something like Rust's Diesel, but whenever I have to work with C# I tend to also use EF.
You might want to try Linq2Db, it is much closer to Diesel in how it works (More SQL DSL with parameter+reader mapper, less Unit-of-work ORM).
FWIW, it can actually work 'on top' of an Existing EF Core context+mappings (i.e. if you have an existing project and want to migrate, or just need the better feature-set for a specific case.) or you can get pretty close to 'yolo by convention' depending on case. In general though it's a lot less ceremony to start messing around.
Even more important than the question of productivity is that this turns a joyous activity into a depressing probabilistic shitshow where you describe what you're trying to do and hope for the best. Instead of feeling engaged and challenged, you're just annoyed and frustrated. No thanks!
> this is where I think AI for coding is now. It gets things wrong enough that I have to manually check everything
This might be dependent on the programming language; some languages are way more popular and have way more questions on StackOverflow and Reddit and repos on GitHub, so the answers will be better.
When I use copilot for JS it's right 90% of the time.
And where it's 'wrong' it's usually just stuff it skipped over because it didn't have proper context.
When I'm not sure about someone's code, I have to double- or triple-check it to be sure that I understand it correctly and to verify that there are no hidden or missed steps or side effects.
> Long analogy short, this is where I think AI for coding is now. It gets things wrong enough that I have to manually check everything it does and correct it, to the point where I might as well just do it myself in the first place.
Even if that were true, reading code is reasonably faster than typing it out and then reading it again to check it.
I'm left wondering what AI everyone is using. I can prompt Copilot and it gives me exactly what I need. Sure, if I barf out a lazy, half-baked prompt, it yields a waste of time.
My problem is running into its limitations, mostly around resources. I have tried giving it larger tasks and it takes bloody forever.
"Given this unstructured data, create CSV output for all platforms, with each line containing the manual, and model, ignoring the text in parenthesis."
Works great except for God-awful performance and stopping half way through. I had to break out each section and paste it into the prompt and let it work on small pieces. We need to get to the next level with this, especially for paying customers.
More concerning is that I see a clear pattern in smaller companies of hiring seniors and turning them loose with AI assistants instead of hiring junior devs. The prospect is attractive to nearly every stakeholder and the propensity to put off hiring "until next quarter" in light of this is a constant siren song. There is a lot of gravity pulling in this direction with the short-term thinking and distractions that are thoroughly soaked into the business world these days. Supposedly, one third of Gen Z (20-25 yrs old) are sitting at home, up from 22% in 1990.
I'm one of those seniors happily putting off hiring, but I find the situation and its wider impact on the future very unnerving.
Well, having AI transform some data into a certain CSV format is orders of magnitude simpler and more straightforward of a programming task than what I try to use it for.
A lot of the discrepancy between people's experiences is simply due to the fact that there's a massive range of programming complexity/difficulty that people can be trying to apply AI to. If your programming is mostly lower-complexity stuff, non-critical stuff, or simply-defined stuff, it obviously works better.
I try to use AI when I get stuck on a hard problem/algorithm, hoping that it can provide an answer/solution to unblock me. But when I'm stuck the problem I'm facing is so complicated that there's no chance at all that AI is actually going to be able to help me with it. I see absolutely no point in using AI when I already know how to solve a problem, I just solve it. I only turn to it when I need help, and it can never help me.
>>It gets things wrong enough that I have to manually check everything it does and correct it, to the point where I might as well just do it myself in the first place.
I have had personal experience with this, and seen others tell me the same. These AI things often suggest wrong, or even buggy, code. If you begin your work by assuming the AI is suggesting correct code, you can go hours, even days, debugging things in the wrong place. Secondly, when you do arrive at the place where you find the bug in the AI-generated code, it can't seem to fix or even modify it, because it misses the context in which it generated that code in the first place. Thirdly, the AI itself can interpret your questions in a way you didn't mean.
As of now AI generated code is not for serious work of any kind.
My guess is a whole new paradigm of programming is needed, where you will more or less talk to the AI in a programming language itself, somewhat like Lisp. I mean a proper programming language at a very abstract level, which can be interpreted in only one possible meaning, and hence is not subject to interpretation.
"My guess a whole new paradigm of programming is needed, where you will more or less talk to AI in a programming language itself, some what like lisp. I mean a proper programming language at a very abstract level, which can be interpreted in only one possible meaning, and hence not subject to interpretation."
Code generation is quite old though, and also quite common, also outside the Lisp-family. When doing non-trivial systems development in Java you tend to use it a lot, especially with XML as an intermediary, abstracted language.
> I was especially glad of Lazy Loading, where I didn't have to load data from the database into my memory structures; the system would do that automatically.
oh god, I have used Java with Hibernate a lot and once I read "Lazy Loading" I didn't even need to finish reading the post.
I've always found that it's easier to code something from scratch, than to review and fix someone else's code, and that's been my experience with Copilot up to this point. I'm not sure if it's better than just writing code from scratch productivity wise, but it makes coding kind of unpleasant for myself.
One thing I've found about Copilot is that it introduces me to novel ways to solve problems and more obscure language features. It makes me a better coder because I'm constantly learning. But do I want to be spending my time learning or do I want to make that deadline that's coming up?
I feel that pre-November “dev day”, 90% of the time I could trust GPT4 output to just work, but post-downgrades there’s been an increased number of times I’ve copied and pasted, then seen the error and realized there’s unfinished placeholder stuff, parts straight up not done, or important previous code removed.
Just means I now spend a lot of time rewriting it which I could have just done in the first place but now I’ve wasted time asking GPT too.
A key difference between database mapping and interactive AI tools is the position of the user.
I would not be enthusiastic about a system where I receive database query results for review, before delivering them to an end user somewhere on this planet. However, I am more than happy to get some extra help in communicating code from my brain to a compiler.
From what I remember, lazy loading wasn't part of EF for a long while and even longer for navigational properties.
I am not even sure if it was part of EF4.
The amount of abstraction available in Go is just about right. It gives you higher level constructs, while still being reasonably straightforward to predict memory and CPU performance and behavior.
It is. Oh, and also, Go managed to screw up even the assembly, inventing a portable-but-actually-not dialect that uses ugly bits of AT&T syntax, custom operator precedence, and in practice is non-portable, forcing you to mix Go-only mnemonics (which might even collide with opcode names on certain platforms), supported opcodes of the target platform, and BYTE literals for opcodes it doesn't support, making a lot of your preliminary (N)ASM knowledge useless. Isn't that magnificent?
Gee, I wonder if there's a better way to do so that is not such a lazy job. But doing it properly, like .NET does, is supposedly too much effort!
So you are saying it's even worse than suing anyone over using the language, like a certain Java-related company, or laying off people from the core language team, like a certain Dart-related company?
I find GitHub Copilot close to useless for production code. The worst, most obscure bugs I've had to debug in the last year were all in Copilot-written code. It _looks_ plausible, but it makes extremely subtle mistakes. Occasionally, you have repetitive sections of code where it can copy&adapt lines from the context, but that's about it.
It's a different story for test code. Test code is often formulaic and "standardized" (given/when/then). For instance, I find myself writing the first test case and Copilot can come up with additional test cases. Or I might write the method name (FeatureUnderTest_Scenario_ExpectedOutcome) and Copilot provides the implementation.
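e.g. once one or two tests exist in this shape, Copilot is usually happy to fill in the next ones (an xUnit-flavoured sketch; AmountParser is a made-up class):

[Fact]
public void Parse_InputHasCurrencySymbol_ReturnsNumericValue()
{
    // given
    var input = "$42.50";

    // when
    var result = AmountParser.Parse(input);

    // then
    Assert.Equal(42.50m, result);
}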
Test code is code. It's as much of a burden as every other piece of code you are troubled with, so you must make it count. If you're finding it repetitive and formulaic, take that opportunity to identify the next refactoring.
Just churning out more near copies is not a good answer.
Absolutely this! I was very guilty of over-complicating test code to use abstractions and reduce boilerplate, but it certainly resulted in code where you could not always tell what was being tested. And you'd end up with nonsensical tests when the next developer added tests but didn't look deeply to see what the abstractions were doing.
I now find it is best to be very explicit in the individual test code about what the conditions are of that specific test.
> If you're finding it repetitive and formulaic, take that opportunity to identify the next refactoring.
It doesn't really matter how many helper functions you extract from your test code, in the end you have to string them together and then make assertions, and that part will always be repetitive and formulaic. If you've extracted a lot of shared code, then it might look something like "do this high-level business thing and then check that this other high-level business thing is true". But that is still going to need to be written a dozen times to cover all the test cases, and you're still going to want test names that match the test content.
There's a certain amount of repetition and formulaism that will never go away and that copilot is very good at.
LLMs are pretty good at anything that follows a pattern, even a really complex pattern. So unit tests often take a form similar to the n-shot testing we do with LLMs, a series of statements and their answers (or in the case of unit tests, a series of test names and their tests). It makes sense to me that LLMs would excel here and my own experience is that they are great at taking care of the low-hanging fruit when it comes to testing.
I agree. A very high impact change I made for an application my team is working on was allowing easy creation of test cases from production data. We deal with almost unknowable upstream data and cheaply testing something that was not working out has reduced the time to find bugs tremendously
I think automated tests are the one area that LLMs will truly improve productivity (and overall code quality). It’ll likely also lead to a lot of tests that actually tests nothing, but as a whole, it’ll hopefully be capable of both generating and updating tests if you give it some good inputs to do it on. Documentation is another area where I have high hopes. In the ideal world people update it as they change things. In reality, however, well…
Then there is the design side of things. I really feel bad for designers of icons now that you can get some really good ones really fast by tasking one of the image-generating AIs.
I’m not sure LLMs will ever really be capable helpers as far as programming goes. Well, I guess it’s two-part: they can help with trivial tasks, but they can’t help with anything related to the actual work of generating business value with code. It’s two-sided of course. They certainly allow a lot of people to write functioning, though really shitty, code. Which is a huge benefit for a lot of programming tasks where it doesn’t really matter that it’s inefficient and, well, terrible. We’ve already seen our more digitally inclined employees make great things with power apps, most of which are eventually replaced by more robust software as they scale. But we also see small Python programs helping out with tiny personal tasks around our offices, and while IT operations aren’t too happy, it’s generating a lot of individual value that wasn’t there before.
If the code isn't doing anything special, it spits out decent enough code (I am using the paid version of ChatGPT with the various customization).
As someone who spends 80% of his time in the backend, I find it great for JavaScript, whereas it's not so good for Django which I know pretty well. It can still be useful though and is often faster than looking up docs for specific things.
Yes, I just used ChatGPT to write me some code to iterate through a CSV and add each row to a system via its API.
It wrote a python app. It hard coded the API key and the CSV file. And then it told me to pass the file name as an argument. lol.
I just asked it to fix that and tested with a two line csv. Worked like a charm and saved me quite a bit of time trying to figure a few new things out.
But a proper programmer would have been slowed down by this, for sure.
A test that tests nothing is redundant and therefore is not a test. I have seen people make claims about "useless tests" when they are not able to reason about the coverage. You should be using a tool to gauge test coverage. Tests should be proving accuracy and precision. It's easy to conflate those or lose sight of one.
Copilot for me is very useful with all the scaffold boring code. It sometimes helps with problems, but I have to guide it, and be very precise with my request.
And even then it happens to ignore context or queries from the start or halfway through. I'd rather spend the time coding than trying to bruteforce it to give me the answer I need.
I've found Supermaven to be substantially better than Copilot. The latency is near instant and the results are mostly confined to a line or 2 where the success rate is higher. Meanwhile I agree that Copilot was less than useless for me. Actively hurt my workflow and made things harder.
If you’re not using a language that can properly support algebraic structures and randomized property-based testing you’re essentially getting no guarantees about your code from tests. You wrote the code, you wrote the tests, they’re equally likely to be incorrect.
Humble-brag about how he uses "a language that can properly support algebraic structures and randomized property-based testing" whatever the hell that is.
Personally I use python and solve real world problems.
My statement is clear and straightforward, I’m not sure how to put it any other way. LLM-generated tests don’t make sense as a concept because there are only roughly five properties you actually need to write tests for if you’re writing tests that actually provide any guarantees.
Apologies, but I understand the English words you're typing but I'm still not sure of the intent you're trying to convey to everyone. You're conversing in a very rigid style which isn't sympathetic to how people typically interact.
I could just leave the discussion I guess, but in the interest of discourse, I don't find your statement meaningful because we're not all working in languages that I think you refer to. Our unit tests are absolutely not perfect and don't offer perfect guarantees, as we're fallible and will write fallible code.
And as such, I just don't understand what point you're trying to make by saying that LLM generated tests are no good because they can't offer perfect guarantees.
Ah that makes sense to me, I see where I misunderstood you. When you say you don’t understand what you mean is that you do understand but you disagree with the point.
I’m on mobile so it’s hard to reference what I previously said but I’m assuming my statement needs to be weakened a bit to be correct. What I meant to say was that unit tests provide essentially no value because they can’t offer perfect guarantees, which is probably different than what I originally said. I’m assuming I just said “they offer no value” which is probably false in some cases for some people and some teams depending on their definition of value. My point was that unit tests do not make sense insofar as their purpose is to provide guarantees about the behavior of code because the information they provide does not meet the standard definition of “a guarantee”. For the above mentioned people/teams/situations/value definitions, they may make sense.
Hope that clarifies what I was trying to say.
Regarding languages, algebraic structures can be implemented in any Turing complete language. Likewise with property-based testing (with, eg randomized inputs across the domain). I’d be willing to guess it’s just a matter of education and/or desire keeping most developers from using it.
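A hand-rolled illustration of the idea in C#/xUnit (real property-based libraries add generators and shrinking of failing cases, but the shape is the same; the property tested here is just an example):

[Fact]
public void Reverse_AppliedTwice_IsIdentity_ForRandomInputs()
{
    var rng = new Random(12345); // fixed seed so a failure is reproducible

    for (var i = 0; i < 1000; i++)
    {
        // randomized input across the domain
        var xs = Enumerable.Range(0, rng.Next(0, 50))
                           .Select(_ => rng.Next())
                           .ToList();

        // the algebraic property under test: reverse is its own inverse
        var roundTripped = xs.AsEnumerable().Reverse().Reverse().ToList();

        Assert.Equal(xs, roundTripped);
    }
}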
A Copilot-like tool trialed at FB showed that 8% of code contributed by participants was sourced from its suggestions, but that study made no claims about coding velocity: https://arxiv.org/abs/2305.12050. In my own experience the time taken to review machine-generated suggestions often eats into developer time.
This is not a critique of LLMs in general — I’ve found ChatGPT really great for kicking off greenfield projects in well-known languages and frameworks.
Especially when renaming variables that are “immune” to normal refactoring. Copilot handles that pretty well and I don’t have to spend all that focus on such a menial task.
Sure, but doing the refactor the first time, without thinking about how to record a vim macro, then just hitting tab to have copilot do the same change over and over is a lower friction experience
Totally depends on what kind of refactoring you want to do; and how well vim's commands map to your language's syntax, too. (Or whether your vim has special support for your language's syntax.)
I would guess they mean stuff which isn’t just “rename symbol” or whatever, like changing code from one pattern to another slightly different one. For example, I’ve used LLMs to “change this if statement to a switch”, which I don’t think VS Code can do as an automatic refactor using the native tools.
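i.e. the kind of mechanical rewrite meant here (a trivial made-up example):

// before
if (status == "open") Handle(ticket);
else if (status == "closed") Archive(ticket);
else Ignore(ticket);

// after
switch (status)
{
    case "open":   Handle(ticket);  break;
    case "closed": Archive(ticket); break;
    default:       Ignore(ticket);  break;
}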
I find it far more tiring because I don't have "micro-breaks" where I'm slowly typing code. I just have to be in a serious "check the logic" mode for a longer period of time.
It also makes it a lot easier for juniors to chuck random code at seniors for review.
Yeah although the frustration of restarting after failed & failed attempts, or introducing bugs in parts of code you didn't want updated is also tiring.
I have a "top class" LLM based autocomplete provided by work.
At first it was a massive pain because I didn't realise it wasn't a "proper" autocomplete (intellisense is probably the king in that regard), and I got hit with a large number of hallucinated functions.
This was really hard for me, as I'm slightly dyslexic, which means spotting plausible but bullshit completions is very hard. (I suspect it's hard for everyone else too.) Worse still, at the time the linter/type inspector was/is very slow, so it only ran on save/execution.
However its both improved in the last year significantly, and I have got used to it. For me there are a few techniques that help me:
1) It has a recency bias. Which is great for when you're jumping about in code making changes
2) It rewards proper variable names
3) Your comments should say what, and why, you're doing what you are doing.
2 & 3 should be obvious and you should be doing them anyway. But it really reinforces that.
However I would really like some UI changes so that I can make _better_ use of the LLM plugin.
1) a completely different colour to indicate that it's an LLM suggestion (bonus points for giving a confidence as well)
2) a different keystroke to accept the suggestion. (bonus for partial selection)
Coding speed really is a horrible metric. I can code really quickly, but it doesn't mean I'm doing anything productive or correct.
I'd rather have slower coding speed, properly written, as it provides higher overall velocity. And velocity should take into account the refactoring that happens months or even years later. Crappy code can look really fancy, and even be bug-free, but if it's overengineered and hard to change, it can create long change times or even a full stop in development later in the product's lifecycle.
And that's on top of developers losing the understanding of how something actually works. If AI helps create the code that would have been written without AI, then great, but I don't observe that happening, and the code has never been better than what the dev could have done without it.
This last sentence is the most important. You can describe your DB schema in words and get Laravel migrations, models, controllers and, if you want, policies, form requests, etc., all near perfect. You can get a V1 in 10-20 minutes, then go about handling all the actual logic.
The "chance of hallucinations" is the tricky bit - if I have to manually check everything it does in case it's hallucinating, then it's not actually a solution. It's not saving me time (as TFA says).
Yeah the copilot doesn’t need to be integrated into every keystroke, just able to analyze the context and kickstart the code. Getting up to speed with new libraries is so much easier with AI instead of the bad old days of trial-and-error-and-marked-as-duplicate
It guaranteed can't have all of your code in the LLM context, or even all of your file (in the case of longer, though not even that long, files).
Though it could do stuff like go through your repository(ies), generate embeddings for sections of it, and then have a vector database + retrieval-augmented generation (RAG) system.
A lead from one of the GH Copilot-adjacent companies was interviewed recently, and that’s precisely what they are doing. They generate embedding of “local” code based on AST (up the stack and sideways, if you know what I mean), and take into account runtime and library versions when doing inference. Sounded like a very interesting challenge.
Yea, that's the way they're going based on the low-memory models they're building.
Basically all the LLM needs to do is translate human writing to some format that a "normal" service can use. That can then leverage the existing spotlight system that's pretty decent at searching stuff on the phone anyway.
Then it'll report it back to the LLM which translates whatever format back to something humans can process.
It seems to be hit and miss, at least with the current PyCharm integration. In some cases it can infer information using other files, in some cases it can't (even if they are open in other tabs).
As someone who switches between PyCharm and VSCode, I find that Copilot seems to work better in VSCode for some reason. Nothing major, but the suggestions I get just seem slightly more relevant to my code base, and are more often what I wanted.
Although I could be hallucinating the whole thing.
Whole-codebase understanding is beyond current coding assistants’ capabilities. To do that you need to add in traditional static analysis. But I’m sure people are working on it!
The last entity I would trust for trustworthy data on how much Copilot helps programmers is GitHub. They obviously skew the scenarios and metrics to find the best possible number to report.
I think the truth as revealed in your comment and many others is that it all depends on the context. For repetitive code or boilerplate, or starting new projects in well-known frameworks and languages, it probably does increase velocity. The other context factor is the programmer themselves. Their familiarity with the problem domain, the language, and the framework all matter, as does the personality and coding style of the individual.
It also depends on what tools a developer is already using as a productivity booster.
For example, I use vanilla web components which have some boilerplate for new components.
I already have a simple vscode snippet that does the job well, with no hallucinations [1]. I've experimented with llms doing the same thing with not great results.
It took me longer to explain what I wanted than it did for me to just write that snippet. Doing it repeatedly and waiting for the results definitely didn't increase my speed, though I was impressed that it was eventually able to figure it out (vanilla web components with lit-html renderer isn't a super common technique). Also, I prefer the pythonic approach of snake-case local variable names. Getting the llm to do that in a js project where it's not super common was another whole iteration.
We've had code generation tools around for decades to deal with repetitive boilerplate tasks. Maybe they aren't quite as capable as when the llm "gets it right" - but I wonder with these productivity claims how they are measured.
Are they starting from zero and a new developer? Or comparing against an experienced developer with proficiency with lots of workflow enhancers like templates, snippets, vim macros, etc...
So I am a scientist and I've generally resisted using llms for my work. My general critique, reading others' experience, is that it sounds like what is needed is better abstractions if all it helps people with is doing boilerplate that needs to be checked. Also, no need for "better abstractions" to be some ethereal thing; this just means better standardised libraries and frameworks. Maybe if the word "abstractions" sounds too bespoke, what is needed is simply better tools and tooling.
The standardised bit seems more difficult than it should be, as it's essentially a social problem with development in general, but that also is a problem AI does not address; it's merely a bandaid over the problem.
AI will get there eventually, but this current paradigm seems increasingly only useful for spam and shitty clip art. Even so, everyone is throwing absurd amounts of investment capital at it in the hopes that something useful will happen. It's a pretty clear depiction of the investor class being so detached from the technical reality of what they're investing in that they just sit around lighting billions on fire thinking that they're getting richer instead of poorer. Go ahead and buy more H100s though...
I've been finding this stuff genuinely useful for two years now, across Copilot and ChatGPT and Claude 3 Opus and similar tools.
Either I'm a dimwit, easily conned by hype and shiny tools to the point that I can imagine benefits for two years that simply aren't there... or there's something to them.
I don’t think you’re a dimwit but I read your post[1] with an example and I am curious to whether you feel you’re losing something by telling the LLM it’s wrong and to try again, rather than going through the exploratory/iterative learning process yourself. For example, would you have known to ask about GeoJSON if you had not come across and learned about it pre-LLM? More succinctly: do you feel you’re learning more or less or an equal amount when using LLMs?
Absolutely I could have learned more from that particular project if I'd spent more time with it rather than getting the LLM to do the work... but that's why I like it as an example: since it was effectively a distraction (a "side quest") the alternative wasn't learning more, it was not doing it at all (and learning nothing).
I'm able to get really great results out of LLMs because I have 20+ years of experience helping me know what questions to ask of them.
I do feel like my rate of learning has increased though, because I'm much more likely to try out a completely new technology when I know an LLM can flatten the learning curve for me a bit.
> I do feel like my rate of learning has increased
It's the same for me. Lots of experience knowing what to ask for. It does a better job in summarizing knowledge and getting me a relatively coherent explanation. Much faster than using Google Search to find and connect the dots from dozens of pages.
I just don't see much benefit in its reasoning and code assistance features besides basic stuff.
> I'm able to get really great results out of LLMs because I have 20+ years of experience helping me know what questions to ask of them.
While a direct answer is nice, I like an iterative/explorative process because of all the things I pick up alongside it. An example is when I was working on an epub reader for macOS (side project). I wanted a native layout engine instead of a webview and I decided to go with muPDF. This has led me to know more about text layout and rendering, and about embedding C inside Swift, than I would if I just had direct answers for every problem (if I'd even known the correct questions in the first place).
I accumulate side quests like that until I can do a nice experiment to learn as much as I can for a particular domain space.
You're talking right past them though: you were working on a side quest you had interest and bandwidth to iterate on.
I agree with the parent comment, my pipeline is already saturated with side quests, I'm already iterating on a bunch of random fun work, and side-project work and work work. So often times the most "LLM-heavy" projects of mine are things I straight up would not do if I didn't have something to get the ball rolling other than more of my own free time which is already in short supply.
Hopping from direct answer to direct answer isn't where I find wonder/fun in programming anyways, but sometimes you don't have bandwidth for the side quest to be fun or wonderous.
So, maximizing the usefulness of LLMs requires expert-level domain knowledge in a field? Sounds like a useful tool for highly trained experts. For the average Joe Developer, perhaps not.
I was trying to fetch key-value pairs out of a database using PHP+PDO the other day, and I knew there was a nice, easy way to do it but I couldn't remember how. Something about fetchAll, maybe PDO::FETCH_GROUP|PDO::FETCH_COLUMN.... what was it?
So I asked a couple LLMs. They wrote out loops for me to format the data how I wanted. I could have copy-and-pasted that in and it would probably have worked. But I felt there was something better yet, so back to Google I go.
It's `PDO::FETCH_KEY_PAIR`. It's built-in. But oddly kind of hard to find unless you know the right thing to search for, and "key pair" was not springing to my mind.
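For anyone who hasn't used it: FETCH_KEY_PAIR takes a two-column result set and hands you back an array keyed by the first column, with the second column as the value, no loop required. Here's a rough Python analog of the same "one call, no loop" idea, using sqlite3 from the standard library (the table and data are made up for illustration):

```python
import sqlite3

# Hypothetical two-column table, just for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (name TEXT, value TEXT)")
conn.executemany("INSERT INTO settings VALUES (?, ?)",
                 [("theme", "dark"), ("lang", "en")])

# One call, no hand-written loop: (key, value) rows go straight into a dict,
# which is roughly what fetchAll(PDO::FETCH_KEY_PAIR) gives you in PHP.
pairs = dict(conn.execute("SELECT name, value FROM settings").fetchall())
print(pairs)  # {'theme': 'dark', 'lang': 'en'}
```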
Point is, if you just let the LLMs do your work you won't even find these better ways of doing things. And I'm quite afraid of what the LLMs are going to do to Google, Stackoverflow and documentation in general. In a couple of years it'll be ungoogleable too.
As noted, LLMs can only give you what you ask for, but for a lot of problems what you ask for isn't what you need; it's two or three steps removed. And LLMs can't tell you that you're doing something wrong, unlike curmudgeonly users on SO or in various forums/channels.
My gut feeling is that we're going to enter into a 'dark age' of coding where a lot of previously available resources are going to be ransacked and made hard to find in favor of big corporation owned LLMs. It's already having an extremely bad effect on search in general; we're potentially only a few fights away from sites like SO having users leave en masse. That's why I think having a strong network of engineers to talk with will become more important than ever, almost a return to the IRC days.
Strong disagree. I have learned so many new things since I started using LLMs.
Some of them I'm actually embarrassed to admit, because I should have known about them a decade ago.
If you work in a small company and you are the most experienced developer, you don't often get feedback on how you can improve things.
The trick is, quite simply: just ask. I regularly dump some code I wrote in a language model and then ask what can be done better.
I would never do that in any online space, because, first, I don't want to wait for an answer that might come some day; I need an answer NOW. And second, I prefer to avoid being called a fool.
This is precisely the wrong way to engage with LLMs. If you are asking it 'what can be done better', it'll spit out something. That something isn't necessarily better or not because it has no concept of 'better' or 'worse'.
Ah, so that's why all my code has become more concise and efficient and I've learned countless new tricks that I did not know before and probably would never have found without LLMs.
Too bad I'm "engaging with them wrong"; I could have sworn it was helping me.
Seriously though, claiming LLMs don't have any higher-level understanding of right and wrong and then extrapolating that to "they cannot possibly be used to improve things" is a very stubborn refusal to accept that the most logical answer to the question "what can be improved here" is... actual improvements.
They do not have any higher-level understanding of right and wrong. You lead the model on by telling it to improve something, so it will rework the code in question and tell you that it's an improvement, regardless of whether it is. Figuring out improvements in coding is about 70% subjective and 30% objective, because the majority of improvements deal with business logic and things specific to your domain.
You seem to be writing from a lost world when people enjoyed coding because they enjoyed learning new things with every project. I think that's sadly something only enjoyed at work by a very small minority these days.
I probably wasn't clear I still live in that world, both professionally and in personal projects. But my perception is that the vast majority of devs and engineers do not.
You are creating a false dichotomy. Very few people are saying there are no benefits, but many have reasonable concerns both about the current efficacy of these tools as well as the expectation of continued exponential growth which is driving much of the current hype.
What if we have already passed the inflection point where the exponential growth transitions to an s-curve? That would mean that this technology on its own would only get marginally better than it is today. Maybe 2, 4 or even 10x better, but not 100x or 1000x. To break those barriers, we would need further innovations beyond just throwing more GPUs and training data at the problem.
I personally am short on LLMs because I believe it is much more likely that we have already crossed the inflection point or will soon. Again, they are impressive but ultimately I think that LLMs will be at best a footnote in history if they are even remembered at all in a few hundred years. But of course I could be wrong.
I've been generally staying away from the whole "imagine what this stuff could do next!" side of the discourse.
My personal opinion there is that if all research froze today it would still take us years to figure out all of the potential use-cases and applications for the models we have access to right now, and how best to apply them.
No I'm with you. I don't use copilot though for undirected autocomplete, I use tools that allow me to give instructions and diff the results, Cursor and Aider are my current defaults, both with Claude 3 now[0]. Always looking for alternatives though.
I think it's always going to be a YMMV experience with LLMs. I'm an extreme generalist with 23 years of experience, so it complements my strengths and weaknesses. I know how to program across 20-odd languages, but for most of them I need to look stuff up if I'm not using them frequently enough. Now, though, I don't really need to look stuff up.
For me there are two groups: those that want to use LLMs and are pushing them forward, and those that would prefer that LLMs not exist and want it all to be hype that goes away.
Having followed the hype cycles of VR/AR and crypto I can feel a difference. Both of those felt like a solution in search of a problem. The true believers wanting it to be something like Ready Player One with VR (fun story, Oculus/Facebook actually handed out copies of the novel at Oculus Connect 2 in 2015).
The hype around LLMs seems different; like applying a new solution to all the existing problems to see where it helps, and that's just with the current iteration.
[0]: I'd prefer to be using local/open equivalents but the capabilities are still lacking.
You are clearly not a dimwit, and you not only have way more experience than most people here, you also have some amazing projects under your belt.
However I can’t help but notice that the vast majority of blogposts, talks, tweets, and basically everything else you do now is around LLMs. Do you not think that’s indicative of this being “hyped” and “shiny tools”?
If hype just means people are excited about it and talking about it, it should be a positive signal about the merit of the thing. I think people get the wrong idea by measuring P(X is good | people are excited about X) and finding it to be low. But P(X is good | people are not excited about X) is vastly lower, and 'hype' as here used is not really discrediting.
> I think people get the wrong idea by measuring P(X is good | people are excited about X) and finding it to be low. But P(X is good | people are not excited about X) is vastly lower
Only if you're including things that don't exist. If you restrict yourself to things that people might conceivably talk about, that probability is actually very high, for the simple reason that bad things are exciting specifically because they are bad.
Here are some boring things:
- bread
- sunlight
- air
- water
- parents
Note that starvation, darkness, suffocation, dehydration, and being orphaned are all much more exciting than their opposites.
I'm writing a lot about LLMs at the moment because they're the focus of much of my work.
I don't see that as a hype thing - in the past I've had other topics I've focused on, not because of hype but because those were the topics I was spending the most time with.
I mean, ChatGPT alone has already made more cultural and historical impact per capita than the aforementioned gentlemen, simply because nobody cares about philosophy. But that was never my point. My point is: you can think of uncle Witt as kind of a "postmodern" twist on Socrates, i.e. they're both effectively talking about the same thing: determining the optimal form for computing language. Wittgenstein made considerable progress here, i.e. language games, and now we finally have the means to literally compute these. To me it's absolutely clear; this is the most important thing to happen in philosophy.
I basically say the same when people ask me if the AI bubble is going to burst. While I agree that the current push for LLMs should be taken with a grain of salt, I don't think it's a bubble either. I have been finding it useful too, and I don't think I will stop using it because it will look less shiny in the future.
Nah, it seems a lot of people are using them wrong? OP seems to also fall in that category. I sit next to many people in pair programming situations and it’s quite weird to see very smart programmers using gpt or copilot; they are the ones shouting ‘stochastic parrot’ on forums and yet expect some kind of mind reading magic when using these tools. When I show that you can put a comment for your function, like you should do anyway, and it writes the function, it usually clicks. Many still find it ‘faster and easier’ to write the function themselves and that’s fine, but, like OP, doing things like entering ???? and generally expecting a useful result will not be optimal.
You make an interesting point. I once worked with someone who was brilliant, but had gotten into the field via a math degree, and we were building an enterprise-software product.
We tried pair-programming, and he cracked within a few minutes. He couldn't articulate his thought processes verbally. Sounds similar to the scenario you're describing.
Your counting of 'cons' seems pretty idiosyncratic.
You know there have been lots of other technologies people have been working on in the meantime and concurrently? Some of them more questionable, some of them less.
Yeah; it's concerning how depressingly similar Sam's recent language is to, say, SBF's in his prime. Waxing poetic with faux-intellectualism about the sheer addressable market his business will capture (all of humanity) (and funded by the government) [1]. You can just listen to key players in the space, and how their language has changed over the past year, and recognize that we're probably very near a local maximum / plateau / AI winter.
Reality is: AI is startlingly expensive. Stupid, stupid expensive. Microsoft is building "Stargate", their $100B AI-focused supercomputer [2]. The money to build that isn't coming from selling AI services. It's coming from their old, boring, tried and true businesses: Windows, Office, Azure, and M365. Xbox is dying before our eyes. Their tried and true businesses make great money but aren't growing. It's up to AI to prove to investors that their P/E multiple is justified.
> It's a pretty clear depiction of the investor class being so detached from the technical reality
I think their ROI model is different and more subtle. A few decades back, everything ran on-premises and cost a reasonable amount... Then huge investment was shovelled into fancy frameworks and into pushing strange paradigms (microservices, distributed systems, monoliths, whatever flavour works for the tech giants), so now people just put together whatever, and it needs a massive _cloud_ to run basic things that used to run on few-MHz devices.
In the end, the ROI is NOT from these models; those are just gambles in case one actually becomes a kingmaker. The actual ROI is from the _cloud_, where innocent people will rent expensive hardware to try and utilise these models, funneling wealth to the actual investments (cloud operators, hardware vendors), which are already part of these investors' portfolios. Basically, $1MM invested in a random shiny startup will inspire other shiny startups to also spend similar amounts in the race to become a unicorn, all the while the cloud operators and GPU vendors are laughing all the way to the bank, preparing dividends for their shareholders (usually the same investors).
I think the nomenclature of calling generative AI an "AI" is just hype, and that leads to disappointment. After all, it's much more exciting for investors than calling it the world's most expensive textual/audio/visual autocomplete. But that's really what this generation currently is.
And it can feel like magic, because we're pattern-seeking and pattern-matching creatures, so something that seems to intuit the pattern we're looking for (even if imperfectly) can feel quite a bit like reasoning with another human. To be a bit less generous to humans, it's often the same as talking with another human, because we spend much of our lives writing boilerplate or making small talk or otherwise just kind of on autopilot and answering patterns with expected responses without seriously engaging our complex reasoning. (Do you have to really think about how to map-filter-reduce a dataset anymore? Or think seriously about how you really are when someone says "how are you"? We don't usually expect other people to, and most of the time we're satisfied by exchanging recognizable / somewhat coherent language patterns which we can count on each other to fill in the gaps of. But this is not the intelligence in human intelligence).
The corporate hype machine works overtime to hype it as a solution, but that's just what they do. That's all spin. We see through it now, and we're getting to the point where people are asking the only important question about this tech, which is, how much will you pay for a nerfed autocomplete to do X?
“how much will you pay for a nerfed autocomplete to do X?”
i think this is super important for anyone playing in the LLM space to calculate.
currently “AI” providers are selling electricity at a loss to demonstrate product “value”, so even while asking questions is “free” today, there’s actually a finite resource under the hood that needs to have the bill paid in the end.
an estimate, but from historic trends, free bills come due in around 7 years
Even _if_ we don't see our current 'AIs' become more intelligent, I am very certain that we will see the amount of electricity needed to produce 2024-state-of-the-art results drop dramatically over the next few years. People are only just figuring out how any of this works, and they are happy to get any reasonable results at all. They aren't really competing on costs like energy efficiency, yet.
---
However I don't really think that 'if' will come to pass: I expect that we will still see lots of advances in the 'intelligence' of these models, so people will compete on these, and worry about electricity usage afterwards.
> The corporate hype machine works overtime to hype it as a solution, but that's just what they do.
the CoPilot product has billed hundreds of millions of dollars, and is billing right now. Secondly, they copied all the GPL code and put it into a mixer, like a bitcoin mixer, in a legal way.
Sage words about "what they do" ring hollow while the only measure that counts is money. This situation needs legal action.
It's definitely useful, but in higher level chat use cases, not so much high-precision generation (yet). Rubber ducking, brainstorming, and search-like use cases are definitely a level above what we had.
It is honestly shocking how different people can be, even within programming. Been finding LLMs very useful for pair programming and low-effort spin-ups of projects I would just never do without the tools. Frankly, I'm waiting any day for an architecture to arise that takes lossy LLMs and has them error correct enough to produce 99.9999% reliable progress on simple tasks cheaply. This feels incredibly close, and if the unit of intelligence of each step of those is around GPT4 level? Jesus christ. Programming massively automated overnight.
Any programmer that is this dismissive of these tools frankly... hasn't been using them right, and has no imagination - or a massive ego. This stuff is still far too early to make such judgements.
AI art is better than most humans, including most artists.
You can take the outputs and immediately turn them into animation. Suddenly I, as a single individual, can easily make film content without figuring out set and lighting logistics or roping in dozens of people.
I've said it before and I'll say it again: The fact that you didn't think to include an example is telling. If you think AI 'art' is good, show me, and tell me why you think it's good.
Conflating "art" (the "snobbish" definition if you will) with realistic depictions of imagined worlds, stories and settings is not helpful here. There are so many talented, imaginative people out there that have previously had Zero ability to manifest their thoughts into a visual form. AI has enabled that on so many levels and we should be grateful for it.
Then why not show me? Why is getting AI 'art' advocates to actually post something they think is good and explain why so hard? Where are these talented imaginative people and what are they doing?
You also don't see the everyday people using it to generate pictures about their own experiences and lives, using it to populate their DnD worlds, their personal unpublishable fanfics, etc.
This part is important since I'm still yet to see evidence that the AI artist is capable of thinking more deeply than "(full_body:1.2) , best quality, (8k, RAW photo, best quality, masterpiece:1.2) , (realistic, photo-realistic:1.4) , ultra-detailed, (Kpop idol) , perfect detail, looking at the viewer, makeup, pretty South Korean lady wearing a bathing suit, wet skin, light reflections, angelic cute face, half body shot, short brownish black hair".
You've gotten me confused about what you're going for. I figure the goal is "this is what this would look like, if there were a person with a balloon for a head and we captured him with a video camera".
The linked video isn't there yet, but it seems obvious what would or wouldn't make it good, and the list of traits you describe is really all I want in an image generator. I don't care what the girl is thinking about; that doesn't show up in the image anyway.
I do want to be able to say "no, make the hair shorter", or "turn her to face more toward the left of the frame" or "have her pointing at the door in the background". If that command takes the form of a comma-delimited list of traits, so what?
The balloon is a different shape from shot to shot, and - much worse - in a few of the shots it appears to be hovering in front of the rest of the image rather than occupying the space where Air Head's neck should be.
Is it better than I could do? Yep.
Is it failing a bar that any human would pass? Yep.
Is it better than relevant artists? Nope. But it is cheaper.
Literally everyone in this space is working on controllability. The cherry picked issue you cited has hundreds of the best minds tackling it as we speak.
This is the worst it will ever look. It's only going to improve from here.
Compare that to the early days of silent film. The progress with Gen AI is astounding.
We're going to have Disney/Pixar and Scorsese outputs by the end of the decade. (I'd be willing to wager even sooner than that.)
> The cherry picked issue you cited has hundreds of the best minds tackling it as we speak.
Did you mean to respond to my other, nearby comment? The issue I cite above isn't cherry-picked in any sense; it's a big, glaring problem with what was cited as an example of "good work". If someone's head is a balloon, the balloon should occupy the same position in 3D space as the head would.
> Compare that to the early days of silent film.
This is an interesting comparison. I don't think it really works. Early silent films were aware of what could and couldn't be done in the medium; there aren't any that rely on a nonexistent soundtrack. Generative AI stuff doesn't seem to be very concerned with "what kinds of things can we do well?". Instead, they're attempting everything, almost none of it is being done particularly well, and there are theoretical arguments over whether it makes sense to try to improve on individual tasks.
I don't get it, show you what? You want me to google it and pick a few examples? What purpose would that even serve? It's not even my main point (that you ignored), and I sense you have some "angle" you're pushing because you think it's some sure-fire put-down of AI generated images.
And yet you didn't make an attempt to name a single one. If there are it shouldn't be this hard to provide an example. Could it be that they're all so forgettable that they're completely gone from your mind five seconds after you scroll past them?
My friends and I are making an animated film. Here's a fully controlled test shot from the aesthetic board we created earlier in the year (so it isn't indicative of our current progress, themes, or quality):
I was somewhat with you through the first sentence, then it got a bit breathless. As someone with some graphic design experience but who isn’t very good and can’t really draw, I used Photoshop and its genAI capability recently to put together a book cover for an ebook.
Would it win any awards? I’m pretty sure not but it’s more than adequate for its purpose and is almost certainly better than anything I could have come up with even if I used some CC artwork.
ADDED: I also had something of a vision for what I was looking for and (I think) enough of an eye to fiddle with things and get to the point where I went "This isn't half bad."
I enjoy working with interns, you can see them learn and they are always making new mistakes. I get a return on the effort of training them. They might even convert into full time employees and take some of my workload.
I don't get that feeling from the LLMs. They have about the same skill level of an intern, but they don't _learn_. I can't offload any work to them, and they take the same level of effort to manage.
I'm not in this job to manage interns. I'm in this job to solve problems. Training the intern is a payment by current me for future me.
This isn't like going from books to Internet search or StackOverflow. It doesn't provide an immediate benefit to me, and I don't benefit in the future from my contributions. I'm not seeing the share-alike vibe necessary for scale.
I want tools that make me faster and more efficient. That's how I keep increasing my wage. Maybe if I could _see_ the AI learn from my training? Maybe if I saw benefit from the effort? However, right now, I'm paying for the benefit of training the LLM.
I have a similar take. As yet I'm unconvinced by claims that using an LLM and correcting the output is faster than just writing the code. What happened to "reading code is harder than writing it"? With an intern or very junior developer you have hope that in a few months/years time they'll be reducing rather than adding to your workload. With an LLM, either you're using it for really rote or boilerplate tasks (fair enough) or you spend so much time looking for subtle errors that it's not worth it. Plus, much less fun than actually crafting something yourself.
The problem is that there’s always a communication cost to outsourcing something. And you can’t outsource a whole project to an LLM, like a human intern. You’re just outsourcing one micro-, sub-task after another. To use the human analogy, it’s more like you’re standing over their shoulder and telling them what function to write, one after the other. And they’re SUPER fast with small, pure functions, but they get confused with anything else.
Is that a faster way to program?
Maybe? If you can fluidly decompose things into small functions in your head and the problem can be solved that way?
I don’t know though, I find myself using chatGPT for bigger meta questions much more frequently than the Copilot autocomplete
From my experience, Github Copilot is useful for its seamless autocompletes. But Github Copilot chat, a new feature, noticeably lags behind GPT-4. I get far better performance copying and pasting code into ChatGPT (GPT-4) than using Github Copilot. Both definitely help productivity, especially when generating code using an unfamiliar framework or library. It's great for generating skeleton code for small programs using new libraries, but in its current form it can only get you around 80% of the way there. You will spend hours fighting with the prompt to try and get the remaining 20% complete when it's much easier to just finish it yourself.
And for those who say that it doesn't speed up development, maybe they're right, because the main benefit isn't speed. It reduces the dev's mental/cognitive load and passes it to the AI instead; we just need to check and review the result.
Even if it doesn't monumentally increase speed, it helps devs stay productive longer.
It's shorter, you're working directly with the code base you're already thinking about, and it's plugged into it rather than having to start a new mental task when you're looking at a pull request.
When I write code, I usually get a complete understanding of the problem and constraints. I have a picture in my mind of all the edge cases and branching that can happen, which most of the time allows me to write correct code without bugs.
When I review code, I am completely unable to get this mental picture, and I usually miss a lot of issues that I would otherwise have avoided.
When you review pull requests you're unable to get the mental picture. But it's different with Copilot, because it fits into your mental picture: you can compartmentalize what you're working on, and the Copilot output can come in the form of small pieces that are easily understood and slot into the prepared places where you need them.
I agree 100% that reviewing human code is a lot of work but not with copilot IMO because of the immediacy of it all.
For me to understand what you mean here, I'm going to need examples... for the stuff I work on, the things you are saying here are verifiably impossible.
There is no possible way to
1. get a complete picture
2. understand all the constraints
because we pull data from the real world, and the real world is constantly changing, and we still don't even have a single theory of everything in physics.
In some cases where I often use it, I find it easy. Usually I give it an ORM class and ask it to translate that into a MySQL/Postgres CREATE TABLE. Tedious tasks, and easy to review.
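To make that concrete, here is a minimal sketch of the kind of round trip I mean, with a made-up SQLAlchemy-style model (the class, the column names, and the DDL below are all just illustrative; the point is that the generated SQL is short enough to review line by line):

```python
from sqlalchemy import Column, DateTime, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Customer(Base):
    """Hypothetical ORM class I'd paste into the prompt."""
    __tablename__ = "customers"

    id = Column(Integer, primary_key=True)
    email = Column(String(255), nullable=False, unique=True)
    created_at = Column(DateTime)

# The rough shape of the output I'd ask for and then review by eye:
EXPECTED_POSTGRES_DDL = """
CREATE TABLE customers (
    id         SERIAL PRIMARY KEY,
    email      VARCHAR(255) NOT NULL UNIQUE,
    created_at TIMESTAMP
);
"""
```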
Another time I asked it to generate code to listen to an SMTP server using a library, and again it was easier to review because I was not familiar with the library. From that I could make my own adjustments.
This has been exactly my experience. It is great at autocomplete and boilerplate stuff. It's very hit and miss for StackOverflow type of stuff. And it's usually a complete miss when you ask it something where it needs to understand even basic things of what you're actually trying to do. It can also get very confused and circular when it hits those edges as well. I've found myself fighting with it, and so part of my use of it has been learning when to let it go.
I find ChatGPT 4 is the best place to ask a question, then I often use its answer to find more in docs or stack overflow. It's basically a better search engine for certain coding tasks.
I wasn't overly impressed with ChatGPT 3.5 as it gave me wrong answers enough of the time that it cancelled out the amount of time it saved me.
GPT-4 Turbo - the much cheaper version of 4 - was announced on November 6th. So there's a good chance they've switched to that at some point in the last 6 months.
That's been my experience as well. Doesn't help me that much in Django where I am pretty familiar. It helps me loads in the frontend, where I am less familiar.
I don't really get the value proposition: for every second you spend writing something, you spend another 10 verifying that it was done properly. Speeding up the process of writing is sort of a waste because of Amdahl's law.
How does copilot et al. speed up verification and review? For example, a C program is harder to verify than a Go program doing the same thing.
Suppose you use it to verify something in an enterprise context. Are you confident in the result? Personally I'm too paranoid for that and I think more people should be
Beware the hype and those pushing the hype, especially when certain entities have wares to sell. While you should keep an open mind within reason, trust your experience and basic reasoning before you throw everything out to buy the snakeoil.
This is where I stand. I think people fall into the trap of believing that it is reliable and useful because it can occasionally spit out boilerplate code that matches what boilerplate code is supposed to look like.
Which is a very dangerous place to be in, because the more you assume that the code generated is reliable the more likely hallucinations are to sneak into your codebase. And now you don't have a good mental model of what your code is doing because you didn't actually write any of it.
I have a feeling that codebases using and relying on copilot are, essentially, pushing disasters later down the line when bugs pop up and no one has the ability to actually debug the system.
I am so confused. When I write code, I think "Okay I want to write a for loop that does xyz." How is it hard to know if copilot did or didn't do what you wanted?
This is how I've used it:
"I want to write a function that reduces a map of customer data to a list of their phone numbers, from their primary addresses only, or their contact address if there is no primary address"
and then you look at the resulting flatmap/filter/reduce blob of AI-generated code and spend about a minute figuring out whether what it does is correct.
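For a concrete sense of what that minute of review looks like, here is a minimal Python sketch of the kind of result I mean (the data shape and field names are invented; an LLM would more likely hand you a chained filter/map version, but the logic you have to check is the same):

```python
# Made-up customer data keyed by customer id; field names are invented.
customers = {
    1: {"addresses": [{"kind": "primary", "phone": "555-0100"},
                      {"kind": "contact", "phone": "555-0101"}]},
    2: {"addresses": [{"kind": "contact", "phone": "555-0202"}]},
}

def primary_or_contact_phones(data):
    """Phone from each customer's primary address, else their contact address."""
    phones = []
    for customer in data.values():
        by_kind = {a["kind"]: a for a in customer["addresses"]}
        address = by_kind.get("primary") or by_kind.get("contact")
        if address:
            phones.append(address["phone"])
    return phones

print(primary_or_contact_phones(customers))  # ['555-0100', '555-0202']
```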
This was an illuminating comment because I think I finally understand why people have wildly varying experiences with CoPilot.
I think if you regularly use CoPilot to write entire functions or use its prompt mode, you will spend more time verifying its output is accurate than it would take to write the code manually. If instead you use it iteratively via autocomplete to write small fragments of code, a line or two at a time, it's trivial to verify its correctness, and it will save you a good 20-30 seconds at a time, adding up to large savings over time.
I exclusively do the latter, so I find it incredibly useful. The few times I've tried using the prompt mode, or comment-driven code generation, it's been very average or awful.
I use combinations of the two. Sometimes it really is worth it to write a comment on what one is trying to achieve and just let it autocomplete the entire thing. And sometimes just for autocompleting a 'for loop' - a small block of code that one already has in their minds.
I feel like this bit is important for this discussion: "...I do not use Copilot for my day job. I use it for my own projects only..."
My guess is the people that don't use Copilot because they don't trust it, or don't like the code it produces, are probably trying to use it in a professional scenario where they have strict coding standards. For hobbyist use or side-projects, Copilot is an incredible time saver. I use it with my Python side-projects, and it's truly amazing how much time it saves me. Some of the most common use-cases where it saves me time:
- Adding docstrings. Just type `"""` after a method name, pause for a second, and Copilot will generate a pretty decent docstring. And usually it will mimic the style you've used to write other docstrings in the file.
- Writing tests. This one can be a bit hit-and-miss, but I mostly get a good starting point, and in the best case it will suggest test scenarios that I might not have thought about.
- Creating basic functions/methods. I have terrible memory and usually need to refer to the docs for even fundamental Python code, e.g. reading or writing CSV files, looping over and modifying dictionaries, etc... But with Copilot, I get great results, often just by typing `def open_csv_file` and then pausing for a second to get the method code (see the sketch after this list). If that doesn't work, adding a comment with what you want the function to do before writing the function name helps a lot.
- Adding type hints to existing code. Just type ":" and pause for a second, and Copilot will almost always provide the correct type hint. And pressing a space after the closing parenthesis of a function will almost always produce the correct return type hint.
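To give a flavour of the `def open_csv_file` case mentioned above, the completion I typically get back looks roughly like this (a sketch, not a verbatim Copilot suggestion; it still gets a quick read before I accept it):

```python
import csv
from pathlib import Path

def open_csv_file(path: str | Path) -> list[dict[str, str]]:
    """Read a CSV file and return its rows as a list of dictionaries."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```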
I only use Copilot for my day job, because they paid for it. When I'm coding my personal projects on nights and weekends I find myself missing Copilot and I'm contemplating paying for it myself. I don't really trust it for everything, but sometimes it surprises me with how much it gets right.
I just can't justify paying $19/month personally for the "Business" plan so they don't use my proprietary code to train their AI.
> I just can't justify paying $19/month personally [...]
Why not? Is it a significant fraction of your income? I find that the cost of hardware depreciation, electricity and internet access probably amounts to a lot more than that per month for me already. (Not to mention the opportunity cost of my time.)
It's fine to be frugal, but it's also useful to compare your actual costs.
(Though that's not always appreciated. I got into some minor trouble at an employer when I pointed out that the 15 GiB of storage space they 'graciously' gave us should cost them less than the company toilet paper we use every day. And I said 'should' because that's internal storage space, and I used numbers from AWS. But if they are paying significantly more for storage than what AWS charges (with enormous margins for Amazon), we also have a big problem.)
I used ChatGPT 4 while doing some freelance projects, and considering the time it has saved me, it's easily worth the 20 bucks a month. It is by no means perfect, but I look at it more like a super powered stack overflow.
About all I really use Copilot for is auto-completing console.log statements. That's most of what I like it to do. I don't need an AI to write code for me; I already know the code I'm writing better than the AI does. Sometimes the AI guesses what I'm about to write and fills it in, and sometimes I let it, but based on my experience I don't trust it, so I have to read a paragraph's length of code to make sure it's right before I accept it, and it's usually wrong, so I've just wasted some time thanks to the AI. It gets things wrong more often than it gets anything complicated right, and often suggests buggy code. I can't really see paying for that myself. I don't think auto-completing console.log statements is worth $20/month.
Author here. I probably should have stated this more explicitly in the article, but my main concern with using it for day job is copyright. I don't want to accidentally commit copyrighted code into the codebase, and I work for an open source project.
I still haven't found any interest in this wave of AIs for coding, mostly because writing code hasn't been a bottleneck for me since I learned vim and started using it as my only tool.
Knowing what to write and how to organize the code is the really difficult part of our job, and I am unable to do this correctly when reviewing stuff written by others (or by an AI).
I have been wondering if people are actually using those to save time writing boilerplate code, in which case it would make sense.
Vim has greatly increased my productivity by allowing me to save time for all repetitive tasks, and it's consistent and reliable (but also super hard to learn).
Maybe it's just a completely different approach in resolving the same problems?
Same. Most of my time is spent thinking about the domain space, constraints and processing flows. Refreshing my memory about a method, a class or a specific how to is fast based on a combination of Dash (offline docs), PDF Manuals, and opening the libraries pages in advance. And with Vim, I can copy-paste code while having complete control over the code. And thanks to the buffer model, I can have everything I'm thinking about in front of me instead of one single file (VSCode Tab model). Boilerplate has never been a bottleneck.
And I like to have an idea of everything I have to manipulate, variables and property, functions and classes, modules and files. If something has not been written by me, I have some good reasons to trust it, either because I reviewed its source code or the library reputation (I'd probably read the source code anyway because some documentations are not good). IDEs have been able to provide the kind of completion that I like (auto-importation, symbols) and information I want (function signature and class properties).
I'd rather take a moment to read a library's documentation and code examples than have ChatGPT generate something that has a non-zero chance of being a hallucination.
I don't use it to build a lot of "features" but it really is great for eliminating a lot of tedium: stuff like boilerplate, building up a hash using values already in my code, etc. That's the stuff that is soul-draining.
A couple of days ago I was building up a Dockerfile for a new Rails project and I literally just let Copilot build it for me instead of look it up. It was probably 95% correct.
Writing fairly standard typescript APIs and React code, Copilot is immensely helpful and almost always correct. It definitely hits a wall when I am working on something sufficiently esoteric. As does ChatGPT, etc.
It is incredible that you can replace 90% of upwork/fiverr/juniors (and many seniors for that matter) with a $20/mo tool for doing the most common work, like API integrations and frontend work (both of which I find the most boring work outside of non-programming stuff like devops and, worse, fighting tools and versions because of useless updates).
> It definitely hits a wall when I am working on something sufficiently esoteric.
So do humans. Hiring humans to do that work well is hard and expensive, so it’s not strange that it is harder for LLMs as well.
I am not sure that finding things LLMs are shite at makes them less productive for most of the things programmers are doing during the day, which is quite boring stuff. In my reality, which before mid last year had 100s of people (across clients I work with, not my company) doing integrations and frontend work, most of those people have now been let go, and the focus has shifted to optimising the pipeline for Copilot, ChatGPT and Opus.
My friend who runs an outsourcing operation in India almost went out of business mid last year when he saw his EU clients cancelling all his contracts because his people could be replaced by OpenAI; he pivoted by having his people use AI, take on AI projects, and become more business-analyst heavy (everyone already was, but the work used to be heavier on the coding). He fired no one and is doing better than before now, but he doesn’t hire ‘programmers’ anymore.
I’d love to know the specifics on the ppl hiring the consultants and the work output. In my experience you need someone with experience to chain together any genAI output, as well as validate that it’s even correct. It’s not like it magically does everything for you.
Correct, but that was also the point; all these ‘consultants’ are senior programmers. Nothing is magically done; they use the AI to do the work they were mostly doing before with more people, or by themselves without AI; now they just focus more on the business side. Higher pay and harder to replace (for now).
My company does the same but in a niche; our specific tooling is easier to automate (which we did/are still doing) and we notice it is getting easier with the llms getting better. A year ago we still needed a human to evaluate even the basic scope of work, now llama3 + some tooling we built can automatically get us a locally generated report.
Ah, but those were not consultants or maybe you didn’t mean that. Most of these clients hired people to augment their in house EU teams so they don’t need to hire more programmers locally; after 6-12 months of using copilot/gpt they found out they now could do that work in house with only the in house people in the same time. So they let go of the outsourced team members. I see this happening everywhere. It’s mostly the people that do the ‘light’ work; crud, integrations, frontend, data cleaning that are getting removed; this is a massive portion work wise, but simple, repetitive and boring. Things like business logic, workflows, data science, ml, performance and devops are done in house or still outsourced.
I don’t know if Copilot has made me more productive. I think I get the same stuff done, but it has made getting the stuff done more enjoyable. At times with better quality, at times with worse, probably.
This made me think: the thing Microsoft has been good at for a very long time is developer tools. Think about how much better Visual Studio was than Xcode.
If Microsoft focuses copilot on simply making things more enjoyable rather than do the work for you, I think it will be an amazing thing.
Copilot certainly increased my productivity, at least in a very specific context: C programming and working on some small subset of LLVM internals. I am not a very experienced C or C++ programmer, nor do I know much about the LLVM API. Plus, Copilot is very good at autocompletions. Yes, the code it generates often comes out incorrect, but even my low level of expertise allows me to spot issues. And even if the code is incorrect, it gives me clues about where to dig. Even such a small detail as autocompleting debug printfs is a really big time-saver for me.
I also think that the Copilot VS Code integration extension is a big part of it. I don't think I would be as productive with a chat UI alone. I wonder if there is an alternative I can use with a locally hosted LLM that can give me full-block autocompletions similar to the Copilot extension. Everything I've seen so far was at best line-at-a-time autocompletion.
> At least in a very specific context: C programming and working on some small subset of LLVM internals. I am not a very experienced C or C++ programmer, nor do I know much about the LLVM API.
I'm not sure what you're working on, but as someone dependent on LLVM-based tools, reading this is kind of terrifying. I'm sure your intentions are good, but I hope you're at least making it clear that you're heavily reliant on the output of Copilot when making contributions to something. C and C++ in particular are not the kind of language where someone inexperienced can be trusted to review.
> Yes, the code it generates often comes out incorrect, but even my low level of expertise allows me to spot issues.
Can you give an example of some issues? C and C++ are compiled, so there will be obvious compilation errors, but there are also a lot of "foot guns" that are not obvious and will bite you at runtime.
I’ve configured Copilot to generate suggestions only when I explicitly use a keyboard shortcut and it’s been a game changer for me. I’ve developed a good intuition of what it’s good at and now I much prefer invoking it manually. When it automatically suggests garbage on every keypress, it gets frustrating pretty fast.
I used it for a few months and I think it made me less productive to be honest. I developed this strange "autocomplete disease" where I'd wait for autocomplete when I could've literally typed stuff out myself, it's like some weird tool dependent mental stutter.
And given that I've worked in the languages I write for a living, mostly C and Python for years it's not like it ever suggested anything I wouldn't have known how to write and especially in C it can produce some catastrophically shoddy code.
The one case where it's kind of nice is if you do a side project with a library that doesn't have good documentation and you don't know the API and it spits out some nice example, but honestly I don't consider that worth paying for.
It increased mine yesterday, by quite a bit. I was writing some JSON-to-array-of-arrays conversion code, and when I started to type it out -- dreading the inevitable off-by-one hunt -- CoPilot spat out the code, and it was correct! It helped that I've done this many times, and knew it was correct, but it saved me needing to think about it, or track down any bugs.
This conversion code was in service to using Plotly (the JS graphing library) for the first time. I was trying to get the axes of my surface plot to be labelled with the correct values (instead of just their index), and spent probably a half hour looking at the demos and the docs, and not finding what I needed. I asked CoPilot in the VSCode chat window about it, and it gave me exactly what I hadn't been able to find. As soon as I saw it, I understood the natural extension of the API call I was seeing in the demo, but I still hadn't found a second data point to even start extrapolating, if you take my meaning.
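In case it helps anyone hitting the same wall: the piece I'd been missing is that the surface trace accepts x and y alongside z, so the axes pick up real values instead of indices. Here's a minimal sketch of that shape, shown with Plotly's Python bindings rather than the JS I was actually using (the data below is made up):

```python
import plotly.graph_objects as go

# Made-up data: z is the grid of values; x and y are the real axis values
# I wanted on the labels instead of the 0..n-1 indices.
x = [1.5, 2.0, 2.5]
y = [10, 20, 30]
z = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6],
     [0.7, 0.8, 0.9]]

fig = go.Figure(go.Surface(x=x, y=y, z=z))
fig.update_layout(scene=dict(xaxis=dict(title="x value"),
                             yaxis=dict(title="y value")))
fig.show()
```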
Anyway, I'm paying the $10/mo out of my own pocket, and yesterday -- at least to me -- it paid for itself for the month in just those two examples. I find it delightfully surprising. I just need to find a way to make it stop suggesting comments. I can type my own comments, thank you very much.
I don’t know if this is possible to reconfigure or turn off, but I HATE copilot guessing at file paths & file names (as I write an import for example). Usually I find the file reference broken because copilot hallucinated 3 directories.
Autocomplete for files & paths worked great before copilot got involved.
I mainly use it as a smart rust auto-complete and I would say it saves me probably 30% of coding time, everything boilerplate/tests it gets super well. I am a slowish typer though. It also makes it more enjoyable, just my experience.
Anecdotally I can say it's improved my productivity, mainly because it's just a slightly better autocomplete than the PHPStorm/Jetbrains IDE provides. They are now pushing their own AI tools for an extra fee pretty heavily, but I'm unlikely to pay for them given Copilot pricing is pretty competitive.
There are occasions where I'm just not in the mood to work something complex out, so sticking a temporary comment in telling it what to do can sometimes yield a good enough result, albeit with some occasional minor tweaking.
Where it's really shined is with repetitive tasks. It's nothing close to perfect but it's certainly a welcomed little helper even if it does get things wrong sometimes.
In terms of the original question, I think it's the cognitive load that's being helped here more than anything; not having to waste brain power on the mundane bits is the big selling point for me.
I am a 20+ year polyglot developer. I just cancelled my Github Copilot sub.
Sometimes, it was magical, but more often it was mediocre and just got in the way.
I generally have better luck with a back-and-forth chat with an LLM on the web and then cutting/pasting the snippets into my IDE. I can do those for free.
+1 on being more productive with chat than Copilot.
I find copilot mostly to be a useful Google replacement, e.g. if I forget the syntax for a for loop in JavaScript I can just write a comment like `// For loop over X` and it will spit out the boilerplate I need without breaking my flow.
In either case it's really important not to get stuck "trying to make it do what I want". If an LLM can't solve your problem in 3 tries, you're probably asking too much of it.
I find ChatGPT4 to be substantially better than Copilot at writing Go code. I wish it easily integrated into an editor and I didn’t have to copy paste so much.
Github Copilot definitely improved my coding productivity. However, most of my time is spent thinking about the problem to be solved than the actual coding.
In my 2 months of experience, Copilot sometimes shows very good ideas, but usually it suggests rubbish. Copilot Chat tries to analyse code and answer, but it also needs to be controlled. The overall feeling is of having a very fast and intelligent but rather foolish junior developer by your side, who reads a lot of documentation but needs to be fully controlled.
Though I have never used Copilot myself, I have used ChatGPT for writing emails, adding docstrings, and helping with basic boilerplate like reading a file or generating JSON in a certain structure that the business needs. I can say it has helped, but I still don't really trust these models.
I've felt pretty productive using ChatGPT to generate snippets for either an API/SDK/language I'm not too familiar with, or moderately simple things that I'm too lazy to do myself. Sometimes that can be a huge time saver especially if the documentation is poorly written, other times it might actually be slower as I wrestle with getting a good prompt (but with lower cognitive load than were I to do it myself). I still don't do it frequently enough to say it has changed the way I work though.
I'm still not sure if this is a transformative way to code or not. I think in the future it might be able to do to programming languages what programming languages did to assembly... but it probably won't look like a chat UI or an autocomplete plugin.
> You might be surprised to learn that I actually think LLMs have the potential to be not only fun but genuinely useful. “Show me some bullshit that would be typical in this context” can be a genuinely helpful question to have answered, in code and in natural language — for brainstorming, for seeing common conventions in an unfamiliar context, for having something crappy to react to.
> Alas, that does not remotely resemble how people are pitching this technology.
This is exactly how I use it. It's interesting to see what the "expected" thing would be. Even if I don't use the generated code, the time spent reviewing it is not lost. I might learn about a new library, or convention, or way of expressing something syntactically which I wouldn't have thought of. Even when it gets things wrong, it can get things wrong in a way which lets me know that I've made an incorrect assumption elsewhere.
> Without it, I find myself getting grumpy a lot more often when I need to write boilerplate code - "Ugh, Copilot would have done it for me!", and now I have to type it all out myself
It seems to me a good IDE generates most of the boilerplate code I would need to write. For Java I use the community edition of IDEA; it's open source, it runs entirely locally, and there's no lag. No open source code gets eaten and no forests get burnt.
I spend a vanishingly small time writing boilerplate code. Or so I think. Am I just not noticing it, or is it rare and I work with high quality code?
My job subscribed to Copilot but I ended up disabling it after a few days, I just found it enormously distracting and providing little value. From what I saw the predictions were very accurate, but constantly having something pop up as I type just kept nudging me out of my concentration state. Oddly I don’t feel the same about typical IDE autocomplete, so I’m not sure where the cutoff is there.
For me it felt like having someone trying to finish every sentence I’m speaking: even if they’re right 100% of the time, it’s still very distracting and more than a little irritating.
Silly. Many big corporations trust Microsoft touching their code. There are enterprise policy controls for Copilot, which includes data privacy.
If you want to make this argument, then the realistic reason would be that Copilot hasn't yet completed the SOC 2 compliance process. That would be a valid reason to wait, for corporations that are listed on Nasdaq or work with very sensitive data. But that's far off from the comment I'm replying to.
ChatGPT 4 won't accept the moderately large files that I have given it. It was actually just an HTML file; I wanted to ask about the CSS in it. How have you managed to give it a "larger codebase"?
Not targeting anyone, but after clicking into the link, I (1) noticed it is mdbook, (2) looked at the author GitHub and noticed C and Rust, and (3) predicted that Copilot is gonna be unpredictable with random serious unnoticeable mistakes. Looks like I got it right.
It is surprisingly helpful on old legacy projects with multiple contributors. It will imitate their code style and variable naming, even if outdated or poorly constructed. Saves time looking through spaghetti code to figure out why some function wasn't abstracted or named in a consistent way.
The answer is yes for most people. You can type in pseudocode for simple things and out pops something that basically works, for a significant number of languages. It's absolutely terrible for really complex problems, but most code is not solving complex problems.
I see this claim a lot, and I have no idea whether or not it's true. I just struggle to imagine that there are millions of people employed, well compensated for doing pretty easy jobs.
In my professional work and my various unrelated personal projects and open source contributions, if you want to do anything useful or interesting, you get into the weeds very very quickly. The territory where LLMs are nearly useless. I'd say maybe 5% of my time is spent writing "easy" code without major complexities.
I really wish writers of and commenters on these articles (especially those less impressed) would clarify the version they have used and whether they have experimented significantly with state of the art.
Overwhelmingly in my experience of reading these back-and-forths, they are referring to 3.5 at best (as in this article).
Yes, of course I accept there are users of the paid version (4+) and class competitors like Opus that don't find they help their productivity but unfortunately any comments referring to significantly less capable engines than those - without at least clarifying that - risk just adding noise to the conversation.
I’ve found that early on in a brand new project, its output is very useful, but as the project grows more complex and interdependent, the suggestions become less so.
Or like GPS? While GPS gets me where I want to go, I find that using a map or no map at all gives me a better sense of place, so I need less time to get to the point where I don't need a GPS to get around.
100% to both. I have a hoard of PDFs. I use Dash as an offline doc browser. I save interesting links in a bookmark app and I still can find libraries documentation and source codes on web search (duckduckgo). So I just freshen up as I go (or on my free time). Memorize what I need/like and capture what I found interesting. I was doing Advent of Code in Common Lisp and the loop keyword page was always open (It's a mini language on its own). Later challenges became easier, as I learned new ways to iteratively express the solution. I could be faster with generated code, but having a bird view of the map will lead you to more interesting routes (more often speed is not the issue, correctness and maintainability is)
Even auto complete and language servers are bad for you. It causes api spread. No lsp and no copilot makes you write more maintainable, simpler code and you can "keep the program in your head."
> If you put your cursor at the position indicated by ????, you can pretty reliably expect Copilot to write the rest of the code for you.
Does the author mean that Copilot filled in the other `switch` cases? (Because that looks like "Here, let me plagiarize that for you. We'll call it 'AI' and 'boilerplate'.")