A project with a single 11,000-line code file (austinhenley.com)
498 points by todsacerdoti on April 3, 2022 | 329 comments



I remember many years ago coming across a reimplementation of the server side for a popular MMORPG of the time, reverse-engineered from the client (which was Flash) by what was likely a teenager --- it was over 100k lines in a single file, written in Visual Basic. Global variables everywhere, short names, and not even indentation. All the account data was stored in flat files, there was no actual DB. No "best practices" at all. Yet, not surprisingly, it worked pretty well and was actually not difficult to modify --- Ctrl+F would easily get you to the right place in the code.

I guess the moral is, never underestimate what determination and creativity can do, and always be skeptical when someone says there's only one best way to do something.


Ironically, this is a description of hacker news itself. https://github.com/shawwn/arc/blob/arc3.1/news.arc

(HN has indentation, though.)

It’s important to realize that this is good design. It’s hard to separate yourself from the time you live in, but the rewards are worthwhile.


This is a good point, but there's also a key difference.

There's a big difference between "code being in one file" and "code being in one function." It sounds like the OP had something reasonably close to "one function," whereas the HN code has a lot of (what appear to be) small well designed methods.


I'm not seeing the irony. The file you linked to is only ~2600 lines of code and seems to be well-modularized and fairly well commented. Assuming that's the whole site, I think that's pretty reasonable.


The only readability issue I have with that is the functions' expected arguments. Add some types and I’d be very happy to work on it. I believe Facebook uses a single directory of files now as best practice? With the file names including namespaces. That was an HN comment from ages ago so could be wrong or misinterpreted.


pg's new lisp, Bel, has something close to typed arguments:

  (def add1 (x|int)
    (+ x 1))
http://www.paulgraham.com/bel.html

I've been implementing it for a couple years now, though not seriously till the past couple months. There are some interesting (and overlooked) ideas in Bel.

Bel is sort of the limit case of generality. For example, you might expect the "type" above to be a separate kind of object, the way that types are separate kinds of things in TypeScript.

But in fact, it's simply a function that receives the argument and can throw an error. So for example, you can do something like:

  (def positive (x)
    (if (< x 0) (err 'negative) x))

  (def sqrt (x|positive)
     ...)

I just wish he'd solved keyword arguments as thoroughly as every other kind of argument. There are hints that it was always in the back of his mind. Though it's true he never needed them, so that's probably why he never added them.


It's nice to have invariants - I believe I've seen them in Racket's contracts but I'm a little away from that stuff in my Lisp journey :)


That doesn't really allow for type checking, which is a major purpose of types. It's more just some nice sugar for runtime assertions.


Well, you can interpret the predicate function body as a set of constraints on the type. Of course, such a predicate function would need to be restricted to what your type system can handle. Typed Racket does this by allowing you to implement type refinements[1]. As long as the predicate only uses operations listed there, it can be used for type checking. Idris also lets you write functions that operate on types and that are used for type checking.

[1] https://docs.racket-lang.org/ts-reference/Experimental_Featu...


> I believe Facebook uses a single directory of files now as best practice

Do you have any sources for this?


2600 lines is not 100k lines.


This is amazing. Are there syntax highlighting and linter kinds of things for the "Arc" language? If possible, I would like to try that. I love how Lisp looks.


Indeed there is. I've been using this vim plugin for over a decade: https://www.vim.org/scripts/script.php?script_id=2720

Which IDE do you like? I'll see about getting some highlighting for it.

As for actually running arc, it’s hard to run the original arc3.1 due to racket updates. I’ve made a few branches over the years that try to preserve the original spirit of arc (no significant changes) while making it easy to run. Try this one:

https://github.com/tensorfork/tlarc

I believe you can simply install racket, then run make && bin/arc and be dropped into a repl. From there you can follow the arc tutorial, whose link I’ll dig up after I’m finished driving home.

EDIT: That fork is actually a lot different from arc3.1 proper. I’ll try to locate a more faithful one.

EDIT 2: Unfortunately it's quite a lot of work to remove the mzscheme dependency from the old arc3.1 codebase. And I'm not sure it's even possible to install the mzscheme lib on the latest racket (e.g. `brew install racket` doesn't seem to have it).

So the above instructions are the best I can do for now.


I use Visual Studio Code; I couldn't find it in Extensions. I'm just asking out of curiosity: What do you develop with Arc usually?

I like how the "html" and "css" part was embedded in that "news.arc" file. Do you think that VIM script will highlight and lint the "css" part of an "arc" file?


> What do you develop with Arc usually?

I try to use Arc for as much as possible. We wrote our TPU monitoring software in it: http://tensorfork.com/tpus

Eventually I became frustrated with Racket's FFI. So I made my own arclike language called elflang: https://github.com/elflang/elf

... which itself is a fork of Lumen (https://github.com/sctb/lumen) by Scott Bell.

The performance is good enough to run a minecraft-style game engine: https://i.imgur.com/iyr0YrB.png which was satisfying.

Nowadays I've been trying to implement Bel, more for the challenge of it than for any practical reason.

> I like how the "html" and "css" part was embedded in that "news.arc" file. Do you think that VIM script will highlight and lint the "css" part of an "arc" file?

Nope. https://i.imgur.com/o9aUG6j.png

But it has one very important feature: it can properly highlight atstrings: https://i.imgur.com/wO4f742.png

It's probably hard to tell, but the "@(hexrep border-color*)" would normally be highlighted as if it were a string. Arc has a feature called atstrings, where you can use @foo to reference the enclosing variable "foo". It can also call functions, e.g. "The value of 1 plus 2 is @(+ 1 2)" will become "The value of 1 plus 2 is 3".


Looks totally fine to me. I'd say it is written very well.


Wait, hn is written in Lisp?


Funnily enough, I have recently had great success by reversing the "best practices" of a distributed "microservices" architecture application and collapsing it into a single big Java file.

The best practices were the usual suspects: DRY, IoC, SQL + NoSQL, separation of concerns, config files over code, composition over inheritance, unexplainable overlapping annotations, dozens of oversimplified components doing their own thing, and some $something_someone_read_on_a_medium_post.

The Single Java File was around 500 lines: no DB, lots of globals, a dozen or so classes and some interfaces, threads for simulating event-based concurrency, and generous use of Java queues and stacks, but I specifically made it static with zero dynamic hashmaps.

It actually runs in my IDE, I can understand what the hell the product is supposed to do and which component is doing more than it should, and, more valuably, I can predict what could break if I change that value in the Helm chart from 5.0 to 5.1.

It is quite useful and pleasing: I can actually reason about things, and I have a newfound use and appreciation for type systems and compile errors. And I can write tests that run in under 3 seconds.


Having the whole project actually in one place is critical. I think some of these “best practices” are actually very useful when applied with caution. But you sometimes need to break the rules. Everything should be optimized for developer convenience. Convenience in deployment. Convenience in debugging. Convenience in refactoring. Only do what HN and FAANG say is “right” when you need to.


This application must be really really simple ;), so no database?


You'd be surprised what garbage overhead brings into an app architecture...

These best practices really only make sense in large organizations, i.e. Conway's law.

After all, you can't really ask 100 developers to all add code to one file in a couple weeks - they will spend a month or so just resolving conflicts...

100-file repos are designed so that 100 developers can edit them (in theory) and have relatively few conflicts, not because it's better code.

As another anecdote, I find that whenever I code solo I can easily spin out thousands of lines of code within a week (including testing), whereas when coding on a project with many other devs my rate drops to maybe 200 a week, just because so much time is spent interlinking other code and finding and fixing bugs and tests strewn across many files...


No and no.


Presumably VB.NET? Because the VB6 IDE wouldn’t let you write more than 65534 lines [1]. Don’t ask how I discovered this.

[1] https://docs.microsoft.com/en-us/previous-versions/visualstu...


>Don’t ask how I discovered this.

Like most of the world at one point, it ran on Excel 2003

/s


I too learned this lesson the hard way.


Came here looking for this. <3


It always seemed surprising to me how some of the big Oblivion and Skyrim mods would get by with fairly few bugs, despite there being no way to have automated tests and some of them having 10k lines of scripting (or much more in some cases) spread across dozens or hundreds of quests. (Quests in the CE engine are not just the quests you see as a player, but also a huge number of invisible quests, because quest state machines and their associated scripting are how scripting works.)


I think it makes sense. Modders get a kind of obsessive testing from their communities (including themselves) that devs in most commercial contexts couldn’t dream of. Skyrim mods are, if anything, correcting for bugs in the game.


For a single modder, it may have to do with being a single developer working on it over a long period of time, and with it being a passion project.

For a while, I was the only developer working in a small module of a bigger project: I started the code base, discussed requirements with clients, implemented the needed features, tested the whole product end to end. I developed a very good instinct about it and about what any change would do, much better than any other project. My theory is that the code base matched my way of thinking, so thinking about it was pretty easy.


Fewer developers (in charge) make for a more stable product - even open source products fare better when they have a long-term maintainer versus when ownership changes frequently.

The other thing is style - when used well, state machines don't require testing. There's nothing to test - either your machine works or it doesn't, there is no point testing state transitions because that is the fundamental job of the state machine. You may as well test that addition works.

Ofc, they must be used well - problems may be difficult or overly complex to model as a state machine or even set of state machines - the pattern excels for small problems, less for large ones.
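
For concreteness, here's a minimal sketch (Python, with made-up quest-style states, not from any particular game) of the kind of table-driven machine I have in mind - the transitions are plain data, so the only thing left to "test" is the table itself:

    # Hypothetical quest-style state machine: the behaviour lives in a
    # transition table, not in branching code.
    TRANSITIONS = {
        ("not_started", "talk_to_npc"): "in_progress",
        ("in_progress", "collect_item"): "ready_to_turn_in",
        ("ready_to_turn_in", "talk_to_npc"): "completed",
    }

    def step(state, event):
        # Unknown events leave the state unchanged.
        return TRANSITIONS.get((state, event), state)

    state = "not_started"
    for event in ("talk_to_npc", "collect_item", "talk_to_npc"):
        state = step(state, event)
    print(state)  # -> completed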


This description has been my life for the last 4 years.


What are you working on?


I have observed that bugs tend to be introduced when you have more than one person working on a project. Single-person projects have very few bugs, especially if the coder is experienced and follows simplicity + structured programming.


You can get away with a lot on a single-developer project, and best practices aren’t in place solely to make code functional.

That application would likely fall apart if multiple developers with diverse backgrounds had to maintain it and add new features.


I don't know what you mean exactly by "diverse backgrounds" and it doesn't matter in this case either, because there were definitely multiple people working on it (although the initial version was the work of one.) They effectively used a forum thread as source control, and just attached their modified versions to the posts.


To be fair even with using best practices, code can still fall apart with multiple developers from diverse backgrounds.


The conclusion, of course, being that if you need multiple developers the code is less likely to fall apart if they have similar backgrounds ;)



Came to say this. I actually find QuickJS pretty easy to understand and modify, in part because everything is one file and it's easy to search, using vim's `#` and `*` keys, for example.

When working on projects by myself, I like putting everything in one big file too. Trying to find the "right" place for something is some unnecessary overhead, not to mention the navigation cost. It's a different story when a team is involved though.


Circa 2006, I was in college, and I got hired to write a webapp for a college department. I didn't know JavaScript could have classes and capture variables, so I made the app with entirely global variables and plain functions combined with `eval`. It's over 2000 lines, and nobody after me could understand it.


I suspect the game was Runescape. My brother used to be a fan of these custom servers.


A likely guess, although RuneScape isn't Flash based (Java & RuneScript).


Oh right.


I think the game is Dofus, a French MMO ;)


Ah, that takes me back. Commodore 64 freeware games written in Basic.

Ya, you could just go in there and mess with the code all over the place.


me when i look back at the code i pumped out as a 15yr old writing a java servlet web app that could admin a quake1 tf server game


Back in the day at Zynga, there was this ritual where new members of the STG (Shared Tech Group, which developed the game engine stack) had to try to refactor the road logic code.

Suffice it to say, it was a 28k LOC file so bad that it could even hold up in court as evidence that a South American company stole the code of Zynga's -ville games. We could reproduce each and every single bug and its effects 1:1 in their games, with all the crashing scenarios that were easy to reproduce, hard to debug, and almost impossible to fix.

Once you dig into the hole of depth sorting and being smart by "just slicing" everything into squared ground tiles on the fly, there's no way out of that spaghetti code anymore.

Fun times; it was always a joy seeing people give up on a single code file. The first step to enlightenment was always resignation :)


To be fair, the first step towards refactoring is understanding the existing code -- ideally, knowing everywhere it is used, all of its behaviors, and importantly, its history, so that you don't break anything, and so that you don't reintroduce bugs that have already been fixed over the years. Or, in lieu of all that, a robust automated test suite.

This cannot be done with a file containing 28k lines of code. That is an insurmountable task. They may as well have been asked to start from scratch and build a new engine.

I'm curious what the purpose of this ritual was. Was it just hazing, or was the thought that someone might actually be able to accomplish this?


It is possible to write tests in a single-file program. You could for example have a -test flag that runs all the tests when the program starts. It's never too late to introduce tests: the next bug you fix, start by writing a test that detects the bug, then fix the bug - and confirm that the bug is fixed by running the test. Then you never have to worry about a bug you already fixed showing up again.

The tests will build up to a decent test suite over time. The next step is to also write a new test for each new change or feature. Keep doing this and your program will soon have full test coverage.

The trick is to not test small units - instead write the tests as if a user was using the program, so that the actual tests cover a lot of code all over the place, not just the code you added. Mock key presses and button presses. Then when a user reports a bug, you can write the test so that it repeats the user's actions.
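As a rough sketch of that -test flag idea (Python, with a made-up toy "program" - not from any real project):

    import sys

    state = {"count": 0}  # the program's globals

    def press(key):
        # The (toy) program logic: handle a user key press.
        if key == "+":
            state["count"] += 1
        elif key == "-":
            state["count"] -= 1

    def test_user_can_count_up_and_down():
        state["count"] = 0
        for key in "++-":  # mock the user's key presses
            press(key)
        assert state["count"] == 1

    def run_tests():
        test_user_can_count_up_and_down()
        print("all tests passed")

    if __name__ == "__main__":
        if "-test" in sys.argv:
            run_tests()
        else:
            print(state["count"])  # the normal entry point would go here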

What makes testing hard is a lot of side effects, like the program writing to databases or calling external APIs - not the LOC of the source file. You might want to mock an automatic call to the fire department for a fire control program, but for API calls and databases, just have the test write to the prod environment, but include rollback/cleanup in the test. That way you don't need a separate testing environment.


> but for API calls and databases, just have the test write to the prod environment, but include rollback/cleanup in the test

I disagree with that. Automated testing should be done on a test database.

Holding a lock for too long could effectively block the entire production. This could happen while debugging through a test (e.g. by hitting a breakpoint or by just stepping line-by-line). Or by simply having a bug in the test causing a "transaction leak" - depending on the tech stack, this could keep the transaction and the associated locks alive until all tests have finished running, not just the one which had the bug.

Or you could commit instead of rollback by mistake.

Or you could simply put unexpected strain on the database, affecting the performance of real users.


It can be a lot of work creating an isolated dev/test environment if you already have a large app that communicates with a lot of services. But if you can, that is preferable, as the tests will otherwise create strange artifacts - but making sure those systems don't get corrupted is probably someone else's job ;)


I would expect people new to the code base to look at this gargantuan file and ask why it hasn’t been refactored, and this probably was seen as an opportunity to familiarise new hires with the code base, as well as get them on the same level as to the effort involved in a refactor.

I agree with you, a rewrite is probably how they should have tackled this one.


> Was it just hazing, or was the thought that someone might actually be able to accomplish this?

Actually no. Most devs that had just started familiarizing themselves with the codebase wanted to refactor the file and came up with the idea themselves. Usually they thought it was a crappy file and this must be an easy task to do, because they saw all the nested if/elseif/else statements in the code.

The problem, architecture-wise, was that the road logic was the glue code that integrated a lot of different parts, layers, and NPC behaviours from the rest of the codebase as it was changing the surrounding game world.

If there was a hospital placed with a non-squared ground tile next to it, if it was placed with a 1 offset (roads were 3x3 tiles), if it was placed with a 2 offset next to another road... It went as far as influencing the path heatmap that was necessary for the A* guessing algorithm to make the NPCs walk correctly on the sidewalks. The permutations of possible sidewalks alone were enough complexity on their own...

So in a lot of ways necessary features that historically had no place in the Entity/Component based engine at some point made it in there.

The next best thing (and also spaghetti code) was the Cursor Entity, which had to have line tracing algorithms to be able to select things that are visible under a donut-like shape when the user was hovering over the hole, or, say, a tree in the game world. Convex and concave shapes were integrated, and lots of edge cases in there, too, which are actually huge mathematical problems in terms of available performance once you dig more into it, so we ended up with binary height map sprites that helped both the slicing and the cursor at some point.

The important lessons learned from the road logic were very valuable for newbies, as it taught the practical problems of isometric game worlds.

So afterwards everyone was able to grasp why the complexity was added, and what was necessary to remove it (in future sprints).

At some point we decided a couple of things because of the road logic and cursor entity for new iterations of the engine, like:

- always use a 1x1 road tile

- always use square based tiles for all objects

- don't make sidewalks, use just road tiles

- don't make trees with holes in their leaves

- don't make trees higher than the buildings

- no artist can ever request crosswalks. Never ever.

...etc


A 28k LOC file can be modular.


It's impossible to refactor spaghetti code without a comprehensive test suite. But you can do it with a test suite - I've done it with large code bases.


You sometimes can. Maybe for any legacy code base someone could, but I have tried and failed on more than one occasion. Some people's thought process is just perversely different from mine, and I keep feeling, oh, this is the layer where that happens - but no, every time I have an aha moment I am disillusioned.


If you don't have a test suite, you can't know if you're making progress or making things worse.

Learned repeatedly from painful experience.


Developing a comprehensive test suite can sometimes be hard, especially for code that deals with, say, concurrency, multi-threaded code, locks, 2D/3D physics, video, analog, hardware-related code, procedurally generated content, or ML (meta-language) and the other ML (machine learning), etc.

A lot of edge cases and race conditions would easily slip through; also, a different set of edge cases or race conditions you never considered, and therefore never tested for in your first version, could pop up in your rewrite.


Of course. But dealing with that is why we get paid the big bucks.

I've dealt with concurrency issues. grep is a handy tool to find related synchronization code, then I try to replace it with an encapsulation. In general, I look for things I can replace with algorithms, and things I can encapsulate. And so on.
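
A tiny sketch of that kind of encapsulation (Python, hypothetical names - not from any particular codebase): instead of grepping for locking code scattered through a big file, the synchronization is pulled behind one small class you can reason about in isolation:

    import threading

    class Counter:
        """Owns its lock, so callers never touch synchronization directly."""

        def __init__(self):
            self._lock = threading.Lock()
            self._value = 0

        def increment(self):
            with self._lock:
                self._value += 1

        def value(self):
            with self._lock:
                return self._value

    # Callers just use the object; the locking lives in one place.
    counter = Counter()
    threads = [threading.Thread(target=counter.increment) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter.value())  # -> 8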


It might sometimes be hard but I have never seen a case where it was impossible (25+ years of experience dealing with undocumented legacy code more often than not).


If you work with code that isn't easily deterministic - typically but not always ML models, like say speech-to-text or face recognition or classification/recommendation systems, or network-performance-dependent applications like video conferencing - that wouldn't be that feasible.

Almost nothing is impossible to test, yes; however, knowing and being able to mock the data for each test case can be extremely hard, and at some point not worth the effort to even attempt.

The most I have seen these kinds of systems do is statistical testing with reference benchmark/sample data, and maybe monitoring real-world feedback, either telemetry or user complaints.


> If you work with code that isn't easily deterministic - typically but not always ML models, like say speech-to-text or face recognition or classification/recommendation systems, or network-performance-dependent applications like video conferencing - that wouldn't be that feasible

Nah, most ML systems (actually doing something in the world) are mostly just ordinary code, which can be tested like any other code (as you put it into functions, etc). The models themselves are pretty awkward, but you can normally freeze the model and just use that to ensure that things stay working while you refactor, and then re-run a few (10+) times to check coverage and intervals and stuff.

It tends to be more difficult, as many DS/data people are not software engineering focused, but it's not impossible.


I haven't heard of any method to test the model apart from statistical analysis of reference/training data.

The model is what gets continually updated and is the critical path that needs coverage. Testing interfaces is trivial, and at times not critical if they have already been running in production for a while (you probably have already caught most/all issues and know what to test or take care of in an interface rewrite).

It is not about being impossible; here is an example. Let's say you are working on an English speech-to-text model, and the next version works better on your set of benchmarks.

It could, for example, perform very poorly (compared to your previous model) for accented English or English mixed with other languages, for older people or in noisy environments like a car, or for specific subjects like medical/legal dictation, and so on - and since your benchmarks originally didn't cover these types of scenarios, you wouldn't know one way or another.

These were real cases, all added to speech-to-text models after user feedback, adequate demand being identified, and research effort being put in; now training/benchmark data includes them. There are plenty of scenarios not yet solved (mixing two languages is an active area of research), or not included because user feedback didn't capture them, or not yet worth solving.

Neural network testing is hard because by design they have millions (and these days billions) of parameters as inputs and you cannot feasibly test every possible outcome; you will not know all the things to check until people start using your app in ways you never thought of.

NN/ML is not a hard requirement; this is true for any complex system. Shazam-type fingerprinting, for example, is just spectrography and Fourier transforms; NN is just the newest tool devs use. All complex systems with thousands of parameters or more have the same problems.


Which means it's often difficult. The traits "has a comprehensive test suite" and "is spaghetti code" are rarely seen together. A poor code base has become poor because it's not refactored and cleaned up all the time - and that's often because there are no tests to help with that.

And if there is no test suite, there are often very few ways to add a test suite. Poor code has very few points where you can attach a test. If the code contains file databases or structured input of some sort (a web page), you can add some very high-level end-to-end tests. But not all code has easily verifiable endpoints like that. Perhaps my bad experiences come from "hard to test" domains (sound, drawing, ...), and not "given this input, this is written to the database and this is written to screen".


My experience as well. Also the main recommendation from the most excellent “Working Effectively with Legacy Code” book.


Man, now I want to test myself against your road logic code. Sounds like a worthy challenge.

Always tricky though when the hacks have both undefined features AND bugs.


If you're looking for obfuscated code to refactor, nroff[0][1] is pretty notorious.

[0] http://dtrace.org/blogs/eschrock/2004/07/01/real-life-obfusc...

[1] https://www.youtube.com/watch?v=l6XQUciI-Sc&t=1h28m35s


This could be like that "endless civ2 game save" where OP thought he was in a permanent stalemate but random internet civ2 veterans found it pretty easy to win.


So afraid to write bad, spaghetti code, I ended up writing no code at all.

This thread made me realize that it's better to have a working profitable project with bad code, than a perfect unfinished project, with meticulously chosen design patterns.

Afraid of being judged for bad code, I could not start until I had the right architecture.

I'm glad I read this.

This is developers therapy.


I've sadly come to realize (after witnessing on many projects) that there's a pattern that goes like this:

* Team A writes code quickly. Not bad code, really, but they take shortcuts everywhere they can. They don't have the strongest tests, they don't generalize for all the known use cases, etc. Their code goes to beta and gets users and makes progress.

* Team B deliberates and deliberates. They try to avoid taking shortcuts. But in the end, even their code doesn't have the strongest tests, doesn't generalize for all the known use cases, etc. Team B never gets users or gains momentum, and their code+architecture was probably no better than Team A -- they just took 3x the time to get there.


+1

I had a lot of trouble trying to explain this to juniors.

The most important thing is to have code that is easy to refactor when you know what you're doing (i.e. everything is working properly). Juniors I worked with had a nasty definition of pretty code: split into a hundred files, each no longer than a screen, and each function no longer than 5 lines. The onboarding of new devs onto such code was way worse than onto code that is 10k lines in one file but with a flat structure and less interdependency.


"Flat is better than nested" - The Zen of Python

I had an "everything should be broken into a hierarchy!" stage back when I was learning to code, and boy was I off track. In my defense, at the time (and this dates me) OOP was all the rage.


I find OOP spaghetti can be incredibly difficult to navigate. Example: class hierarchies 4 or 5 layers deep, some subclasses overriding the parent, others not. It can be very difficult to follow what's actually going on.

Procedural spaghetti is more manageable, though I once had to update a C app with a 3000 line case statement. Pure madness.


I remember working on a project with endless exception class hierarchies - that was a monumental pain when trying to diagnose what led to an exception.

In a subsequent project we banned subclassing of exceptions.


Inheritance hell would be more lasagna code.

Golang does a good job of addressing this one particular problem.


This depends a lot on the development environment. In Smalltalk years ago, we were encouraged to limit each method to few lines at the most. This makes sense within Smalltalk and the code browsing inherent in the system.

It makes debugging in Java or C# an exercise in face shredding frustration. Where each class is in a file, it's better to structure the class consistently with other classes in the project, and things like naming conventions become a lot more important.

You could argue that the C/C++/Java/C# languages are fundamentally broken because they don't encourage the succinct, small class methods that Smalltalk did, but you could also argue that those small methods don't necessarily work very well in a different class of languages from Smalltalk, and that neither approach is really more productive than the other - with the caveat that Smalltalk is largely dead and irrelevant to modern programmers other than as a curio.


> The most important thing is to have code that is easy to refactor

Very true.

I'll just add that another most important thing is to actually take time to refactor, even when things are busy.

I spend maybe 1/3 of my time refactoring, and that feels good.


I like to refactor once I have passing tests and before committing.

Once you know it works take a minute to clean up and make the changes fit your preferred style, extract repeated code into shared methods, comment the tricky bits, etc.


This Is The Way


> The most important thing is to have code that is easy to refactor

This whole post is about how refactoring doesn't matter because your project's development lifetime isn't long and wide enough for maintenance to matter.


Less interdependency is absolutely the key to everything.

But... isn't the easiest way to show that there is little interdependency to put them in separate files that don't import from each other?


People misjudge where to draw the lines. You will have an orchestration API call that does five things and each of those five things, not used anywhere else, will get its own class, interface, factory, and configuration, so to read thru the five things you have to open like twenty files. And to notice that despite all this engineering they have static credentials in the code itself, you have to be alert across so many lines of code. The whole thing can be one longer file that reads coherently and in fact lessens the cross class importing.


Be Team C.

Team C works like team A. However every time a feature ships, someone who knows that feature well immediately refactors the relevant code to remove the prototype scaffolding. When code becomes static, an expert adds good quality comments. When a bug is found, it is recreated in a regression test prior to being fixed for good.


The sad reality of tech companies is that there is little incentive or bonuses for improving the situation. You won't get a bonus for cleaning up the code or for rewriting rotten code.

Hack at the code for 4 years, collect your options and leave the mess to someone else.

To be honest, a hacky codebase written fast is not the worst codebase to deal with. The worst type is when someone had the time to overarchitect and overengineer things.

Following references across 200 different files, tracing calls through hundreds of microservices. Graphql servers with complex resolver logic.


> The worst type is when someone had the time to overarchitect and overengineer things.

I concur. Refactoring should be as much about removing unneeded abstraction and features as it is about adding same.

> there is little incentive or bonuses for improving the situation.

Yeah, I just can't seem to believe this in my soul. I just want to fix ugly code and can't stop myself. I get huge satisfaction from speeding up, tidying up or fixing up bad code.

It doesn't help when management wants to minimize time spent on such tidy-up, especially when it's hurting our productivity to maintain it without fixing it.


Just do it. Don’t ask for permission. You will end up more productive not less.


This - I've been doing this for years now and simpler code (both simpler logic and the removal of unused code) makes working on legacy codebases feel completely different.


Yep exactly my experience. That combined with solid tests => I spend almost 100% of my time adding new features instead of staring at code and fixing bugs.


I try to sneak in refactoring with other tasks.


I find the best method is:

1. Figure out what needs doing and what code you need to use

2. Refactor the code you're using until the new code or change is easy

3. Make the change

4. Tidy and document.

Repeat

I often also document during step 1, while I am trying to understand code and realise that comments are missing.


4 years is a long time to suffer through inscrutable code.


I don’t think this dichotomy is helpful. I’m presently working at a startup that’s trying to dig itself out of a hole created by the first CTO, who in doing things “quickly” created an MVP so buggy, inefficient, crash-prone, and unmaintainable that we can’t retain customers or engineers. As always, there’s a balance to be struck, and ways to operate quickly that don’t sacrifice quality too much.


I'm kind of surprised that you can't find engineers interested in creating a new implementation of an existing application that is actually used by people. I think that might be my dream role.


At the risk of crushing your dream role, re-implementations can be long slogs. I'm in the middle of one right now. The Product owners don't know what the thing does. The engineers who originally wrote it are gone, and their replacements are relatively new to the codebase. The bright side is customers are hugely interested in our progress to date and we've received positive feedback. The business wishes we could move faster.


> The Product owners don't know what the thing does.

This is the dumbest part. You’d think that someone documented something when they originally built it, but nooo. Don’t even know the requirements, just that it has to be the same as the previous one.


Business doesn’t want us to stop the bus to change the tires - or spend so much time changing tires that the bus never reaches the destination.


Makes me think of Lightning McQueen losing the first race in Cars because he refuses to pit to change his tires, then blows them out on the last lap.


That type of role typically requires senior engineers at a junior salary


If your dream job could be in Charleston, SC, let's chat :)


I think something that is lost in this conversation is that "quickly" maps to wildly different results in code quality depending on the programmer.

It sounds like your CTO did not just operate quickly, but also sloppily and chaotically. From what I am gathering from this thread, the best practice is to move quickly AND stay organized, such that refactoring is reasonable.


There's a YouTube video about beginner musicians vs intermediate vs advanced.

The beginner uses simple chords

The intermediate uses advanced chords, crazy fills and runs and riffs.

The advanced uses simple chords


"First there is a mountain, then there is no mountain, then there is."


That's not far from a similar saying in software: expert developers write code that looks like a beginner's, but simpler and with fewer bugs.


Beginner devs write code that is over-simplified, and needs hack on top of hack to do anything useful.

Intermediate devs write code that is over-engineered in places where it could be simple, while still needing a lot of extensive refactoring in the parts that deal with irreducible complexity.

Expert devs think deeply about the problem at hand, understand where the complexity is, and create a solution that is as simple as possible, but no simpler. After the fact, beginner and intermediate devs tend to think of this code as something trivial, that anyone could do.

This is why it may sometimes be tactically useful for the expert dev to sometimes take on projects that some beginner or intermediate team has been struggling with for some extended period of time, analyze it properly, and show how elegant the difficult parts can be solved.

Care is needed, though, as it can affect the morale of the other devs and even cause hostility. If the expert dev has a secure position in the organization and wants to keep those other devs around, keeping a low profile, as well as letting those devs take as much of the credit as possible, is advised.


"It took me a lifetime to paint like a child." -- Picasso


I kinda agree on this, but not for the implicit reason you're probably thinking of.

Just starting and doing it is just unreasonably effective because very few projects actually need novel solutions - most are just fine with off-the-shelf hacked together solutions.

Thinkers are required if the software is actually groundbreaking new work. Almost everyone's work on this forum probably isn't that however (mine included), which is why I agree with your sentiment


Plus... when someone from Team B initiates the code, with hundreds of files & massive boilerplate, and can't continue the work for some reason (e.g. sickness / resignation), the new dev who takes the position will require a super long time just to understand the code. Some of them will eventually write their own, instead of following the existing code standards, leading to multiple standards in one project - and resulting in hell if it runs into trouble..

Also, when someone from Team A initiates the code and Team B takes over, there have been a few times when Team B felt the code was damn no good and just massively refactored it into what they think is good (re: the boilerplate) without consulting others. Then when Team B leaves, it goes back to the 1st paragraph..

I think as a team we need to consider the learning curve of our own code, because we don't code just for ourselves.. And it's good to know other people's tolerance & acceptance of the 'structure' of the code..


Something this reminds me of, that I've been doing lately when stuck on a particular problem, is just coding something. Even if it's the shittiest, most inefficient and naïve solution. More often than not I either discover a more proper solution along the way or just realize my shitty solution actually wasn't all that bad to begin with.


Start with the simplest idea that might work.



Tests, ha


The Player controller from the game Celeste is a single 5600 line file that includes things like systems only used in the tutorial. I honestly don't think it's as bad as some of the criticism it got when the code was released makes it seem, but it certainly could be better looking code.

But ultimately, Maddy Thorson isn't selling a block of code. They're selling a game and it has extremely satisfying control of the character. And that's all that really matters for a player controller.

Maybe better organization and design patterns could have made it faster to develop? But I don't believe they would.

But also the type of product does matter for this. Celeste had 2 programmers so a lot of the things necessary for a team of 100 devs would just be harmful. If you're making a library/framework to be used and modded by others, architecture matters a lot more. If you're designing an enterprise application that you know will need business logic customizations for 25 business customers it matters more. It's all about knowing the scope of your project. But also until you start getting that many customers, maybe the unsustainable method is what will allow you to reach those first few sales more quickly to be able to stay in business long enough to be bit by the technical debt.


I remember sharing that Player.cs code from Celeste in the gamedev subreddit, and getting all kinds of weird novice comments about how the code doesn't adhere to 'OOP principles', or 'there aren't any unit tests', or 'you should split it into multiple files with 100 lines each', or 'you should use an ECS to make a real game'.

Later on, Noel Berry did give a response explaining the various design choices behind their code:

https://github.com/NoelFB/Celeste/tree/master/Source/Player

Anyways, kudos to the team sharing their code even if it's a bit messy.


I hadn't seen the new Readme update (even though I Googled the repo again to link the cs file lol). Thanks!


I am part of a small team that maintains a legacy point of sale system that is still used by thousands of stores around the world. It started life as a DOS application written in C with some ASM bits, and has since accumulated some C++ and C#. There are functions over 5000 lines long. Files over 50,000. Globals all over the place. It can be a challenge sometimes, but after almost 30 years, it still brings in millions of dollars a year in maintenance and enhancements, and still processes millions of transactions for those retailers.


The world runs on software like that. Hard to maintain “crappy” code that makes $ beats clean pure “perfect” code that makes zero $ every time.


Getting something working and out there is 90% of the battle, especially on small or single person teams. I wrote a saas php app with vanilla html and JS that ran without issue for 8.5 years for a fortune 1000 company. About twice a year I would return to it to add or modify a feature and I had no idea how a lot of it worked and even had duplicated or redundant files that I was too afraid to delete. It worked though and I got paid every month for a very long time. Sometimes delivering a product is all it takes and getting trapped in delivering 'clean' code is just a blocker. Not often, but sometimes :)


I want to hear more from you.


Been there. I worked in a company where we had a codebase like the one mentioned in the article and over the years we started developing microservices with 100% code coverage.

The new shiny services took much longer to identify bugs and add new features due to the complexity of the design and endless interfaces.


+100

I'm so afraid of creating programs in languages that don't enforce any structure at all, even though I know how to write everything from scratch and make it work.

If it's some framework, then it'll already be structured somewhat.

In the rare event that I do create something with no frameworks, I ensure that there aren't many global variables.


You sound like you should read this: [removed]

apparently jwz decided not to be linked from here :/

there's an archive.org link below.


https://www.dreamsongs.com/RiseOfWorseIsBetter.html

For future reference, a non-archive, non-jwz.org link. Straight from the source as that's the author's own site.


Heads up: jwz.org redirects Hacker News visitors (via "Referer") to an image of a slightly-hairy testicle sitting inside an egg cup. So maybe don't click the link at work.



Ok, So I wanted to see it. Quick hack of the link tag in sibling comment, and this is what you see:

NOT SFW! https://cdn.jwz.org/images/2016/hn.png

It's funny. Why does he hate us so?


That’s friggin’ hilarious. What a boss.


Uhh you should know that domain redirects traffic from hacker news to an offensive image.


Put it like this: www.jwz.org/doc/worse-is-better.html

Then just copy & paste it.


From the last time I saw that link on HN, opening it in a private browsing window avoids the redirection.


happy medium: write shitty code with strong API boundaries. at least the damage is localized and every so often you can go back and clean up or replace components independently. or more likely, just leave it that way and make more poorly implemented features that make money.


let me recommend one of my favourite programming blog posts: https://prog21.dadgum.com/21.html


> it's better to have a working profitable project with bad code, than a perfect unfinished project, with meticulously chosen design patterns.

A lot of businesses were built on PHP this way


Restaurant industry version:

    I was so afraid to cook in a dirty kitchen, I ended up not cooking at all.

    This thread made me realize that it's better to sell food prepared on dirty surfaces with unrefrigerated ingredients half-eaten by rodents and roaches that makes people sick, than fresh food prepared on clean surfaces with clean utensils.

    I'm glad I read this.

    This is a restaurant worker story.
Construction industry version:

    I was so afraid of not using the right construction materials and not building code-compliant structures, I ended up not building at all.

    This thread made me realize that it's better to sell houses with structural problems and low quality materials that will be unsafe to live in, than houses built according to code.

    I'm glad I read this.

    This is a builder story.
In any other industry, a person would go to jail for saying that. You won't, because luckily for you, software development is not a regulated activity, and people with your mindset can make a happy living outside of jail. But hopefully one day some types of neglect in software development become illegal.

"Better is the enemy of the worse" is no excuse to have spaghetti code, or 50,000 lines of code files. It means that good is sometimes more convenient than perfect. Spaghetti code is not good to begin with.


Your analogy is flawed. The CPU doesn't care at all what the code looks like: all the nice comments explaining what it does, nice naming conventions, whether it's easy to understand or not - they have zero impact on the final compiled code. The executable ends up as a spaghetti of machine instructions with countless gotos in a single large file.

Using bad ingredients in food, or poor quality materials in construction has tremendous impact on the final product.

>"Better is the enemy of the worse" is no excuse to have spaghetti code, or 50,000 lines of code files. It means that good is sometimes more convenient than perfect. Spaghetti code is not good to begin with.

Just calling code good or bad doesn't mean much - ultimately, results matter. If your code doesn't have tons of bugs, if your team can add features without any problems, if you can ship reasonably on time, if your product delivers value to the end user, etc., then you have succeeded. It doesn't matter what outsiders think about the code or what labels people give it. It's best to ignore them and continue doing good work.


[flagged]


Why do you think you can pressure me into accepting your opinions? I don't want to argue with you so how about this - you do it your way and I'll do it my way. But, thanks for your concern.


If you're writing code that controls radiation therapy machines or trains or autoclaves or something, yah, maybe there should be some regulation and potential jail time for negligence. But 95% of all software written (especially if it's the first version of new software) isn't life or death. If the next social media startup or saas-for-painters is spaghetti code it's not going to hurt anyone.

Failure is cheap in our industry. That's largely a very good thing.


No, that is not the restaurant industry version or the construction industry version. You can't possibly compare "good practices" in software with building codes and food safety regulations. Building codes and food safety regulations are based on facts, science, and decades of experience. Good practices are rooted in ideology and hype, and not based on facts. There are no studies showing that "having tests helps". There are no studies showing that "splitting code into multiple files means that you'll have a better end product". And experienced people (like in this thread) even say that there is often not that much correlation between how code "looks" and how well it works.


> Good practices are rooted in ideology, hype, and not based on facts.

That is a really sad statement that is predicated on the assumption that nothing can be objectively compared and therefore nothing can be ranked, which is also a way to kill arguments that lead to innovation and iterative improvement.

You can measure the complexity of an algorithm, you can measure the cyclomatic complexity of a function, you can measure code in terms of length, you can count references to external functions or modules, etc.

There are many ways in which you can compare code and make decisions about what style is more convenient for your team.

What is clearer for you to understand?

a) 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1

b) 12

If your argument was true, they would both be the same. We both know that a) is a waste of time.


Your example is oversimplifying the problem; I don't think you're proving anything here. I've used tools that calculate cyclomatic complexity, and sometimes they're a good indicator, and sometimes they aren't at all. You can measure the complexity of an algorithm, sure, but unless it comes with a real-life benchmark, it's only part of the answer. Same with the length of the code: you can measure the length, but it doesn't tell you much about how hard the code is to follow.

If you take any style guide, book about good practices, or stuff like that, you'll find that there are some good ideas, and there are some bad ideas. Even with something as simple as code formatting, we still don't know as an industry whether it's better to format everything the same, or to use formatting to convey information. The debates about OO, FP, and static/dynamic typing are endless, and there's no evidence about which is better. There is a rough idea that you need more organization when more people work on code (static types, microservices, more documentation, more isolated parts), but even that isn't really clear.

My assumption is not that "nothing can be objectively compared and therefore nothing can be ranked". It's not even an assumption, it's what I've seen in this industry: we lack data to give objective good practices that go beyond anything trivial ("try to make code easy to read and simple", "give meaningful names to your variables"). There is a wide gap between "common sense" and "cargo cults", which is the gap between easy and complex topics. I would really like it if in the last 50 years we had learned a lot about how to build software as an industry, as people. The reality is that we haven't.


We cannot objectively determine what a good sandwich length is, therefore we should eat 1000-mile sandwiches.


I think your examples are unfair. Getting stuck on needing the perfect architecture is closer to scrapping a building plan because it wouldn't hold up to a 2km asteroid strike than it is building something that will ultimately kill people.

Also, all of your examples have wildly different impacts than a dev "portfolio project". They all cause physical harm to people, which a poorly coded website/CLI tool/etc. almost certainly won't. Unless this person's hobby is writing code for MRI machines, in which case, go ahead and make sure everything is perfect, but that doesn't seem to be the case here.


Basic hygiene when cooking is not perfectionism, and is not only done at elite restaurants.

Dismissing basic development good practices as "perfectionism" is just gaslighting people into believing that any form of thinking is overengineering.


> Basic hygiene when cooking is not perfectionism, and is not only done at elite restaurants.

Ok, so let's continue this analogy on the other one.

It's less like basic hygiene, and more like refusing to cook outside a clean room.

> Dismissing basic development good practices as "perfectionism" is just gaslighting people into believing that any form of thinking is overengineering.

Basic development good practices are something you develop during the "portfolio building phase", not before.


There's being uncharitable and then there's this.

Your entire point rests on a baseless assumption. There's absolutely nothing in the parent's post that indicates that the programs he would create could have the potential to harm humans.


I learned that not all text editors go to the effort of loading file data carefully, with careful underlying data structures, when I tried to open a 67K LOC COBOL file on a 32-bit system a while back. (Sidenote: COBOL has a 999,999 LOC hard limit in the compiler spec.)

So very many editors just couldn't open it.

Some would use so much memory that the system would either freeze, or the OS would kill them.

Some would silently truncate at 65,535 lines.

Some would produce a load error.

Some would pop up with an error indicating the developer thought it was an unreachable state. e.g. "If you're seeing this error... Call me. And tell me how the fuck you broke it."

Others would manage to open it, but were completely unusable - moving the cursor would take literal minutes.

There were exactly three editors I found at the time that worked (none of which were graphical editors). And they worked without any increased latency, letting you know that the developers just thought through what they were doing: vim, emacs, nano.

(A few details because people are probably curious - the vast majority of that single file project was taxation formulae. It was National Mutual's repository of all tax calculations for every jurisdiction that they worked in, internationally, for the entire several hundred years of the company. They just transcribed all their tax calculation records into COBOL.)


Emacs is actually quite poor at opening large files, at least comparatively—depending on the machine, 65K lines may be enough. However, there's an addon, i.e. a ‘mode’, that implements editing of large files somehow.

Vim, on the other hand, does it splendidly: it keeps only a chunk of the text in memory, and iirc the ‘swap file’ that it creates for every opened file, keeps the changes in some kind of a sparse structure, so they can be tracked at various places in the original file. This ‘swap file’ also serves as a savepoint of the editing session, so the changes can still be recovered even if the machine crashes while the user never saves.

Alas, editors still tend to deal badly with very long lines (just in the low thousands characters). IIRC both Emacs and Vim drop into a big think if the user attempts to put the cursor further down that line.


> Emacs is actually quite poor at opening large files, at least comparatively—depending on the machine, 65K lines may be enough. However, there's an addon, i.e. a ‘mode’, that implements editing of large files somehow.

Yeah, essentially this (apparently) mostly occurs when the file has no newlines (like json often does). I think the hacks are around turning off font-lock mode and one or two other things (install long-lines-mode if this is a problem you're having).


Again alas, long lines is not the only large-file problem in Emacs. Though perhaps most of my woes pertain to Org-mode, but I had to look for a solution to edit large files in the past.

This is probably the most current implementation of a ‘view large files’ package: https://github.com/m00natic/vlfi


What editors ended up working?


> There were exactly three editors I found at the time that worked (none of which were graphical editors). And they worked without any increased latency, letting you know that the developers just thought through what they were doing: vim, emacs, nano.


No doubt dozens of devs will throw in their own 10k LOC story here, and yes it's painful to watch so many people having professional cramps over it.

But don't forget that society itself is governed by orders-of-magnitude larger bodies of text with no referential integrity, no machine to tell you if it's inconsistent, and no way to test anything, other than making humans write more text to each other and occasionally show up in court. The law itself, even parts of it like the tax code, and regulations on various areas, are a melange of text and cultural understandings between lawyers, judges and government. We collect the data for this machine in the form of contracts and receipts, and it piles up in mountains.

As with code, it's not just legal professionals who have to deal with law. It spills into everyone's life, and there's nothing to do about it other than either guess what to do or pay a pro to tell you what to do.


You are wrong to say there's no way to test anything. Imagine an enormous AI generating test cases for you constantly, in an adversarial fashion, with built-in rewards for advancing a more correct understanding of the text. Lawyers call this "testing", rightly so. If you are interested in efficiency / cost-effectiveness, it leaves a lot to be desired. But if you are interested in the internal integrity of the document etc., then this is better than almost anything developers have.

I hate these words as I type them but the law is also "agile" (ugh). It gets modified as it's used. It does not need high-assurance machine-verified "referential integrity". In my entire course of studying the law I don't think I've seen a single legal dispute over a problem of referential integrity. Mistakes, especially drafting mistakes, are corrected on the fly pretty much everywhere they appear, and then they disappear. For a dev, using the wrong variable name in a bad language could mean you introduce a huge security vulnerability and massive loss of trust. (Or if you write smart contracts, $100M down the drain.) For lawyers, referring to the wrong section has essentially zero consequences. Nobody cares. Maybe you get a funny look from a senior.

Finally re the 10k LOC tangent that this is supposed to be connected to, I'm not really sure what you're complaining about. You get "10kLOC" cases, but you also get well-organised practice guides & bench books. Laws in statute are typically very well organised, in my experience about 5-10x better than the average codebase. Laws are organising large swaths of the sum total of human endeavour, just as code does. I would say developers are behind overall, which makes sense for a discipline that's less than a century old.


The payroll check printer for my employer was once a couple thousand lines that generated raw PCL to be sent to a LaserJet that used magnetic toner to produce checks that had a working MICR number. It was rendered into spaghetti by multiple GOTOs that jumped to helpful labels like "666", and calls into other helper programs to generate more PCL that did things like change fonts and draw graphics. Of course none of it was commented, so you had to have a copy of the PCL spec on hand to know what any of it did. It was the product of a retired cowboy that had also written the rest of our custom payroll system over a number of years.

I attacked it by printing out and taping together each program into "scrolls" and tracing control flow with highlighters and sharpies. Had them all taped up on my office wall so I could refactor the whole thing from scratch, coworkers found that entertaining. Got a much more readable replacement working nicely. Then a couple years later HR bought a new system and we stopped printing our own checks. I was not sorry to see the whole thing go.


Reading your process, I have that stereotypical TV-series image in my mind of a person so deep into the subject matter that they plaster every wall with notes and pull strings all across the room at head height to hang up ever more notes, kinda like that one NCIS episode (S8 E6 "Cracked"): https://img.sharetv.com/shows/episodes/standard/616591.jpg (although that image is only a small part of that whole view).


A printout of a one-file spaghetti code with gotos is the only case where I can imagine that trope of the wall of connected strings actually being a useful tool.


We call that a Murder Board in my office.

Here's a TV Tropes article on the cliche: https://tvtropes.org/pmwiki/pmwiki.php/Main/StringTheory


I went through something similar during my PhD when my advisor printed out the main code of the program that we were going to work on. At first I thought he was kidding (he's a very chill guy) but... heck, after a few hours of "paper debugging" we discovered a lot of nasty issues, got new ideas, and found redundant and spaghetti code that we hadn't found when we debugged digitally (obviously, we were not a team of CS students/coders, just a bunch of chemists, kinda newbies to coding). It was a really useful and fun approach.


I worked on Word for years. Office has thousands of files over 10,000 lines with, uh, various degrees of test coverage and comprehensibility. After some time and experience, your mental model of the architecture ends up being way more important than simple metrics on source code organization.

IMO, organizing source code in files seems archaic. E.g. tracing the history of a function moved across files can be tedious even with the best tools. I’d like to see more discussion around different types of source storage abstraction.

There are benefits of large source files... When compiling hundreds of thousands of files (like Office), the overhead of spawning a compiler process, re-parsing or deserializing headers, and orchestrating the build is non-trivial. Splitting large files into smaller ones could add hours of just overhead to the full build time.


What's an alternative to files that doesn't just have all the same attributes of files anyway? If it involves breaking code into multiple chunks of related functions, and possibly having these chunks act as namespaces, that sounds like what a file does.


Maybe it's the editors that need updating. It would be pretty neat to have the chunks of functions / namespaces model as lots of tiny separate files on the disk, but a sort of view-layer so that you can view all the related ones in one virtual file. You could even then have multiple virtual files that include the same raw files.

For example in a video game your player, monsters, health potions, and attacks could all have the code for HealthComponent as a part of their virtual file. And updating the HealthComponent would affect the raw file so the virtual files would have the updates automatically. Yeah you can open dozens of editor tabs or always use jump-to-definition, but just being able to scroll around or ctrl+f within a restricted set of limited files would be nice.
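
Roughly, the idea as a toy Python sketch (everything here -- Fragment, the file names, the view composition -- is invented for illustration, not any real editor feature):

  # A toy "virtual file": stitch fragments of real files into one buffer
  # and map any line of that buffer back to its raw (path, line) origin.
  from dataclasses import dataclass

  @dataclass
  class Fragment:
      path: str
      start: int  # 1-based first line included
      end: int    # inclusive last line

  class VirtualFile:
      def __init__(self, fragments):
          self.fragments = fragments

      def render(self):
          """Concatenate all fragments into one scrollable text buffer."""
          chunks = []
          for frag in self.fragments:
              with open(frag.path) as f:
                  lines = f.readlines()[frag.start - 1:frag.end]
              chunks.append(f"# --- {frag.path}:{frag.start}-{frag.end} ---\n")
              chunks.extend(lines)
          return "".join(chunks)

      def resolve(self, virtual_line):
          """Map a 1-based line in the rendered buffer back to (path, real_line)."""
          offset = 0
          for frag in self.fragments:
              size = frag.end - frag.start + 2   # content lines + 1 header line
              if virtual_line <= offset + size:
                  return frag.path, frag.start + max(0, virtual_line - offset - 2)
              offset += size
          raise IndexError("line outside the virtual file")

  # The HealthComponent-sharing idea above would then just be a fragment list:
  player_view = VirtualFile([
      Fragment("player.py", 1, 40),
      Fragment("health_component.py", 1, 25),
  ])

Writing edits back from the rendered buffer to the raw files needs the reverse mapping too, which is where the hard part lives.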


Look up "class browser", in particular "smalltalk class browser" to see examples, current and historic, of editors supporting that kind of approach to navigating codebases.


Maybe I'm looking up the wrong thing but these basically list the classes and method names and hotlink to the code?

I mean something that would dump them all into one contiguous "file" from the editor's perspective. Included components (I don't want to say classes because it could work in a functional language, or maybe you just don't include full classes but certain methods) wouldn't have to be coupled to the main class you're editing. Like if you have a decoupled event system you could pull in just the events relevant to the idea you're working on. You could have different views depending on what idea you're working on and save them as their own file.

To use the gamedev analogy again you could have MonsterCombat.view MonsterAI.view MonsterAnimations.view which would all expose different subsets of the Monster class and various related methods from other classes/modules.


Something like this https://tibleiz.net/code-browser/ maybe?


I might be remembering wrong but I thought Visual C++ 6 might have had a class browser as an alternative to a file browser. Maybe modern Visual Studio has it too?


I don't really mean a class browser but something that completely abstracts the underlying code organization and lets you create a sort of meta-file that includes all the related code for various ideas.

Like you could have all of the code from class A except the debug related methods, a few methods from class B, just a few functions from a static MathUtils class, so on.

Maybe your ClassAUnitTest meta-file could include some of the ugliest methods from the class you're testing but not the entire class.


Give https://www.unison-lang.org/learn/tour/ a read. The Unison language stores code as hashed syntax trees.


I love how much this questions the status quo.



Someone please tell me this is transpiled from a separate project.


It was originally written in LISP, which is why it's a single C++ file. However, I believe that it's now being maintained in its C++ form.


If I remember correctly it was in fact written in Common Lisp; the output was originally that file but it may have been modified since. You can probably google the truth with those breadcrumbs :)


Python's main interpreter loop is a single 4k line function.

https://github.com/python/cpython/blob/main/Python/ceval.c#L...


I've refactored monolithic code several times in my career. It starts with a thorough going-over, making notes, identifying the state machines and drawing what was handled and what was not.

Then, reimplement as a simple state machine but this time, fill in all the transitions (event+state => new state + action)
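
A minimal sketch of what such a transition table can look like, in Python (the states, events and actions here are invented, not taken from any of the projects mentioned below):

  # Table-driven state machine: (state, event) -> (new state, action).
  # Filling in *every* cell is what drags the unhandled transitions into the open.
  def log(msg):
      def action(event):
          print(f"{msg} (on {event})")
      return action

  def ignore(event):
      pass

  TRANSITIONS = {
      ("idle",        "start"): ("configuring", log("begin configuration")),
      ("idle",        "stop"):  ("idle",        ignore),
      ("configuring", "ack"):   ("running",     log("configuration accepted")),
      ("configuring", "error"): ("idle",        log("configuration failed")),
      ("running",     "stop"):  ("idle",        log("shutting down")),
      ("running",     "error"): ("idle",        log("crashed, resetting")),
  }

  def step(state, event):
      try:
          new_state, action = TRANSITIONS[(state, event)]
      except KeyError:
          # The old monolith would silently fall through here; be loud instead.
          raise ValueError(f"unhandled transition: {state} + {event}")
      action(event)
      return new_state

  state = "idle"
  for event in ["start", "ack", "error", "start"]:
      state = step(state, event)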

One was an Infiniband code base from the vendor - a 'computer scientist' had written several layers to do what one or two could accomplish. Another was the Windows CE DHCP client (it went from seconds to choose an address to milliseconds). Then there was an HDLC modem protocol - by the time I was done, it had been sped up several times over and no longer crashed.

I can't understand them by just reading. I had to make a road-map of all the states, events, actions and interfaces. Design the new code. Then make sure every function of the 'old' code was represented in the new code - line by line - so nothing got dropped.

Satisfying. But more like turning the crank and making sausages than design or architecture.


I like your approach. Having a clear understanding that all software is just state machines is a great way to solve hard problems.


Better one well-organized file than hundreds of folders and subfolders and files and symlinks. I have worked on projects where even after 2 years I didn't grasp the folder structure and just used search to locate files.


People love to complain about things that are simple, fast, and easy to complain about, without regard to whether the complaint is insightful or useful. It's sort of the dark twin of bikeshedding.

If you divide the single 11k-line file into a thousand 11-line files, it may become objectively much harder to understand, but it'll also receive much less flak, guaranteed.

I suspect this is also why Architecture Astronaut-ery can be so successful within a company. If code is chock-full of superficial signs of order and craftsmanship, such as hierarchy, abstraction, and Design Patterns(TM), it takes a lot of mental effort to criticize it, and most people won't.


If you divide a single 11k-line file into 20 files averaging 550 lines per each, by semantics and levels of abstraction, your code will quite possibly be easier to read, maintain and add to. Maybe. Perhaps.


I mostly agree, but it's often not that big a deal, and some people and applications may favor bigger files.

I have a 4000-line script in a single file that has served me very well. It's perfectly organized and modular. I thought about breaking it into more files but it seemed pointless. It's very convenient for jumping through every mention of a variable, for example.


> If you divide the single 11k-line file into a thousand 11-line files, it may become objectively much harder to understand, but it'll also receive much less flak, guaranteed.

A thousand 11-line files? You definitely could not make that guarantee of the people I work with.


> dark twin of bikeshedding

is it different from regular bikeshedding? or are you saying that the dark twin is the evolutionary process of eg. architecture gaining complexity until it becomes difficult to criticize..


Bikeshedding is usually more about how something should be done than how it should not be done. But, yes, you could think of them as basically one thing.


I once inherited a mission-critical PHP project which had no version control, no tests, and no development environment (all edits were made directly on the server). It used a custom framework of the original author’s own devising which made extensive use of global variables and iframes and mostly lived in several enormous PHP files. I was able to clean it up somewhat, but there was one particularly important file that was so dependent on global variables and runtime state that I never dared touch it.

When I was finally able to retire the project several years later, I first replaced the home page with this picture: http://2.bp.blogspot.com/-6OWKZqvvPh8/UjBJ6xPxwjI/AAAAAAAAOv...


It wasn't mission critical but my very first production programming project (n.b. I'm not a programmer and never had any classical training or education as one) was an abomination. I'd like to think the realisation of how bad it was, despite it just about working, was a call to arms to up my game a little. I ended up learning a lot about data structures, writing understandable code and comment, when not to write code, all that OOP stuff and things like STI, Generics (still not sure what they are), testing (TDD AND BDD!!! Yea, Cucumber!) and a plethora of other useful things.

I'm still not a programmer.


Haha, same!

The first project I inherited was a PHP app that used a custom UI framework created by an agency that didn't work with us anymore.

One file had 7,000 LoC, and it would generate hundreds of lines of code, with JS sprinkled in, and send it to the browser on every click.

Debugging that thing was a nightmare.


My first thought when reading this description was that step one is to make a local copy and get a development environment set up where you can toy around and see how things fit together. The 'stupid'er the setup (like using plain old files instead of a database), the easier that actually gets (apt install apache2 php; rsync da:files /var/www). Wouldn't that have helped with that particularly important but untouchable file?


If I remember correctly, the file was processing global state from other parts of the system, and it was such a Byzantine bit of code that I had almost no hope of understanding what it was actually doing without being able to observe state in the production system as it was being used. Plus at the time I wasn’t a particularly competent programmer myself (this was my first programming job). In the end I figured it wasn’t worth risking breaking it when its replacement was on the way.


If well done, single file projects are not bad. They save a lot of boilerplate code. It is also easier to find things, since it is all in the same file.

EDIT: I'll go even further. Programmers who don't like long files are probably using the scrollbar to navigate around the file. Vim saves me from that bad habit.


What programming language requires 'a lot' of boilerplate code to use multiple files? That sounds awful. I don't think the argument for things being easier to find goes up either, with a tool like grep.


You don't need to go far. In C, function prototypes in header files are boilerplate ;)


Perl XS (the system used to interface with C) requires module == file, so if you have a particularly large module then it just has to live in a single file. Here's one:

  $ wc -l perl/lib/Sys/Guestfs.xs 
  11930 perl/lib/Sys/Guestfs.xs
Worse still, this expands to C which can be large and takes a noticeable time to compile:

  30019 perl/lib/Sys/Guestfs.c


I don't get the obsession with file length. What's the benefit of having 100 files with one 50-line function per file, over having a single 5,000-line file with 100 functions? Obviously not counting extreme cases where the file size would break some editors' buffers.


Usually (but not always) a single, huge file points towards missing structure, missing abstractions, missing boundaries that aid with understanding.

If it were a huge, single file, with very understandable modularity within that file, likely nobody would've bothered to write a blog post about it :-)


Personally I find it much more difficult to keep n places in one giant file in my head than I do n individual files.

We have a few multi-kloc legacy monsters where I work and I quite often completely lose my place when working on them (and, by association, my train of thought), even though they’re actually structured somewhat reasonably.


I had this problem until I found an editor that has outlining as its core design paradigm. Now, with the outline always visible, it's _really_ easy to navigate a file of any length.

Unfortunately, at one point I got so used to navigating with the outline that I ended up making a 1500 line function in C (I was an even worse C programmer then than I am now). Because of the outline, I could read and follow it easily, but anyone with a different editor was royally screwed :-(

If you're interested, the editor is LEO (http://leoeditor.com/) it's been mentioned on HN a few times


I think the problem in this case was that the entire file was the script that ran top to bottom. It’s not so much that the file was big, but that the function was huge and impossible to reason about.

I agree that obsessing over file length is its own kind of anti-pattern. I have had colleagues who insist on putting every little thing in a different file and that is its own special kind of hell.


Try debugging a single 10k loc file versus fifty small modules where each takes care of a distinct part of the logic.


As someone who did a bit of enterprise Java, I much prefer the former. Jumping around between lots of tiny files and not being able to see where the actual work happens because it's spread everywhere is a debugging nightmare.


I think you need a better IDE.


I'm not too bothered by the single 10k loc file (and I've seen plenty of files with thousands of lines). I would aim at files in the range of 200-300 LOC

If you split it, it's crucial that you're splitting the logic in the right way (if the modules are too small, they'll just waste your time) and that you're making sure references can be easily traced (eg. if you have modules with some DI system which prevents references from being recognised, as it happens frequently in certain node.js enterprise applications).


But what's the difference between file1.c ... file50.c vs cat file*.c > onefile.c?


I imagine the 50 files have meaningful names. Just kidding, no-one knows how to name anything.


It depends on your tooling

I just searched for the largest code files on my system and found a 100k file

I opened it on the online repo and gitlab did not want to show it at first. When I clicked on show anyways, Firefox broke trying to load the website and I had to restart it. (then I could not post on HN anymore due to the noprocrast setting there)

When I opened that file in an IDE, it was shown quickly without any issues. But there is a notable delay when typing, so 100k lines are too much.

Other IDEs might already fail with smaller files

Although when the IDE has a "search in the open file" and not a "search in all files of the project" feature, one file is much easier to use than multiple files


At one point - and perhaps still today - Java would refuse to JIT class files with more than a couple of thousand lines in them, falling back to interpreted mode. So in that case, you really, really wouldn't want the 5,000 line file.


* Editor support for perfect semantic navigation may not be taken for granted

* Compiler support for function-level incremental compilation may not be taken for granted

* Editor shows a nice file tree (although you can do that with symbols too)

* Working with git is easier

* Reading code on sites like GitHub is easier


Code could be well-modularized in one single file, of course. But we don't have the tools to write code like that (editors and languages basically).


I find IDEs work better with multiple files (i.e. navigation around if you want to have several windows open at the same time), but agree that’s not so well defined.


Ability to structure your code base hierarchically

Ability to search through your codebase by file name

Ability to hide irrelevant information and expose a higher level API through private functions


None of those are necessarily driven by file-level organization, though, except for the one about the file names themselves.

My mortgage is paid by a 50 kLoC C program with a single 11000-line function. I'm always blown away by how many so-called "code editors" can't give me a simple list of C functions in the file, the way BRIEF could in 1987.

Few things annoy me more than having to trudge through a codebase with hundreds of .c files, inevitably all with 8-character filenames. Any day when I have to break out Eclipse to navigate an unfamiliar project is an official Bad Day At Work.


Eventually a single file won't scale. Maybe not in the lifetime of your project, but you never know. If a day comes and you have a 500,000+ line file that must be split into separate files, that could be a nightmare. Why not just follow best practices from the beginning and separate your programming logic into different logical/related units?

Similarly having the discipline to separate your programming logic into different files will force you to think about the architecture(or lack thereof) of your program. This is a good thing. The Java "one file for one class" model is overkill IMO, but it does force programmers to discipline themselves by thinking in terms of namespaces/classes when they write code, which for beginners at least is not a terrible thing.

Obviously version control is another reason. Hard to get work done if everyone is working on the same file.

It also will make it easier for someone else to grok your program. When I git clone someone else's large project, I start trying to understand the project by writing down what each folder and file is designed to do before I go any further. I suppose if everything was in one file I would just have to do the same for functions, but what if there were thousands of them? Imagine if a large program like WordPress, Doom, the Linux Kernel etc was a single file?

TLDR: For small one person projects, no big deal. Otherwise, it's just a bad/unscalable practice.


A brilliant tool I once worked with is TetGen; it takes a hollow 3D shape and creates a volumetric, space-filling mesh of the inside using tetrahedra. Most of TetGen is in one giant C++ file, clocking in at 36,566 raw SLOC.

https://github.com/libigl/tetgen/blob/master/tetgen.cxx


But it’s a class with lots of small methods. Maybe not the same as the single large VB script the OP described.


The most strangely maintainable code I have ever seen - though I should probably put maintainable in scare quotes - was an astrophysics code that calculated the changes to a spectrum during interactions with background fields. That thing had two long nested loops: in the outer loop it calculated local backgrounds, and the inner loop was basically an Euler solver that used the backgrounds from the outer loop.

The outer loop was something like 4 kLOC, and consisted of blocks where there were first 20 lines of loadTable(filename) calls, then a call to calculateLosses( <all the tables just loaded> ), and then freeTable( <just loaded tables> ) calls. The inner loop was a little bit of setup and then a very long part where all those losses would be subtracted from the spectra.

The funny thing was that once you got the structure, the code was actually not that bad. However, I told my boss several times that the second something comes along that doesn't exactly fit into that pattern the entire thing will blow up, and I was always told that they had maintained that code for 15 years and it hadn't happened yet.


10 years back when I was first starting my company, we wanted to build a phone IVR system with Twilio to book tables at restaurants. It was fairly complex; it had to track many different aspects of call state, including the client being able to enter things like date/time/party size etc. with push-dial, and it had to call our internal APIs. I assigned a recent college grad to the task.

In a week, she came back and said "OK I've finished the prototype." I thought no way and I asked her to demo. Try X, try Y, try X + Y, etc. -- it all worked.

Then I looked at the code.

She had written the API handler as a one gigantic function, presumably because Twilio gives you a single API callback on an incoming call. It was a maze of nested if-statements going 10+ levels deep, subroutines relentlessly copy-pasted inline throughout the whole thing. Then she manually tested by dialing the phone 100s of times, putting in hacks throughout the if-tree.

Her prototype ended up being pretty easy to refactor and was ultimately the basis (at least logic-wise) of what we put into production.


I've never written a file with 11,000 lines of code, but I have often built Clojure projects like this, with everything in one file. I think I might have once had a file with 4,000 lines of code. Maybe 5,000? A complete system might be 5 apps, working together, each made of one large file. It does help with some things. Especially if I try to on-board another programmer who doesn't know Clojure very well, using one file means they don't ever get tripped up by namespaces; instead, they just open one file, and then they can load it into the REPL and start working. I would not recommend this style for every project, but it does offer a kind of simplicity for the projects I work on.


I used to work on 40-50k line files with one function and a bunch of gotos in Perl, at a multi-billion dollar company.

It's fine.

You just binary search your way into it: put print "AAAA" in the middle, see if it's printed, then put it in the half of the half and try again.

Emacs couldn't even find the bracket ending the if condition (not the block, the condition..) - have you ever seen if conditions (again, not the block) that span your whole screen?

It's not as bad as you think. It made me realize we take code very seriously, but it's actually OK - 10k line file, 100k line file, whatever.. it's all the same.


This is the right attitude, sometimes you really do need the duct tape approach.


The size of a code file doesn’t matter. What matters is the amount of state the code in the file manipulates. For example, a 100k line code file with 500 pure functions not using any global state is fine. It is simple. However a 100k line code file with 500 functions that all manipulate 1000+ global variables is extremely complex and hard to maintain because of the undocumented global state invariants and hidden side effects.
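
A tiny illustration of the difference, in Python (the names are invented):

  # Hidden global state: every caller has to know that `balance` and
  # `last_error` get mutated as a side effect somewhere in the file.
  balance = 0
  last_error = None

  def deposit_global(amount):
      global balance, last_error
      if amount <= 0:
          last_error = "invalid amount"
          return
      balance += amount

  # Pure version: all state goes in and comes out explicitly, so the function
  # can be read, tested and moved in isolation -- even in a 100k line file.
  def deposit_pure(balance, amount):
      if amount <= 0:
          raise ValueError("invalid amount")
      return balance + amount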


File length is a bit of a bike shed in my opinion. My main concern here would be separation of concerns and code quality.

I prefer many short files and folders structured hierarchically and grouped semantically. I have no proof this is better so I would probably just leave it to a vote with the team.

In the end I think that is how a lot of this should be viewed until we get proper research. How do you WANT to code? TDD? No tests? One giant file? It should be a team and executive decision.

If you don't like the style on your team, and nobody wants to change it, move on or adapt.

Technical debt is like a superfund site. It renders the real estate worthless and poisons the rest of the company.

It does matter. My current gig is hemorrhaging money because we can't keep devs even though the pay and benefits are great. We cannot execute on mission critical initiatives.

We cannot adapt our product to meet the needs of the market in an agile way.

This is due to people saying "a working product is more important blah blah.." for years. I would argue there is a balance to strike and you can do both with a good team and realistic planning. But there is always the nay-sayer who is willing to step in and say whatever product wants to hear.

It is so bad we cannot train people to use the software anymore. It is too poor quality and we can't on-board them before they decide to go elsewhere.

Everyone who knew anything has left and there is too much of it. So the remaining devs get overwhelmed, they leave... It is a vicious cycle.

The funny thing is the money machine works, but it is so frustrating to see all of the extra money we could be making and having to leave it on the table.


I can beat that.

One of my first jobs was as a maintenance programmer for a 100KLoC (or more) single-file FORTRAN IV (1970s vintage) application (a proto-email server).

Three-character variable names, no documentation, and having been stepped on by every junior programmer that went before me.

My best debugger was a Ouija board.

The original author was a ringer for Donald "Duck" Dunn. Interesting chap.

It taught me the value of writing good code.

It sucked.

It was great (because of the lesson learned).


Surprised nobody's mentioned the Telegram Android app yet. Their ChatActivity.java is 27,720 lines! https://raw.githubusercontent.com/DrKLO/Telegram/master/TMes...

Thing is, there's a great deal of knowledge of the Android OS in there, and the app works great when I use it. I think the OP blog post is correct: 'Users don't care about the technologies or code.'


Well, this is not necessarily a bad thing. If it was approachable by a non-IT person in HR, and business rules could be updated without contacting IT and waiting, then more power to them. I have seen this sort of thing developed as a coping mechanism because the official IT team could not be used, whether due to time, cost, priority, or whatever. Also, even being in IT, one multi-thousand-line file can be a lot more manageable to work in vs. dozens of smaller files where it's not clear where to look without being in an IDE.


At my first job out of college, there was a "utils" file with IIRC over 100k LoC. Nearly every file in the codebase imported it. This was in Perl. That single import statement would increase the time to start anything by upwards of three seconds. One of the best things I did for my efficiency was to factor out subsets of functionality that didn't need any of those utils, since those subsets would run unit tests in a tenth of a second instead of three to five seconds.

All of which is to say, by all means argue about whether colossal files are acceptable software engineering, but sometimes that fight takes a back seat to "a double-digit percentage of the company's CPU and memory is wasted on parsing and loading this file in literally every new process".


TypeScript's checker.ts is a 2.65 MB file containing 44,932 lines at the moment.

https://github.com/microsoft/TypeScript/blob/main/src/compil...

Does anyone know why and how they maintain it?


Unlike the OP's file, there's a rather substantial test suite and massive corpus of TypeScript code to work with, so at the very least, you'd have some grumpy people knocking on your door if you did something that negatively impacted the greater ecosystem.

Some documentation from Orta Therox on the checker:

https://github.com/microsoft/TypeScript-Compiler-Notes/blob/...


  const anon = "(anonymous)" as __String & string;

What does that even mean? It seems that TypeScript uses __String as an alias for string in the source, but then a bitwise operator with string?


Debugging a single file is much easier compared to debugging a tangled mess of interconnected .h, .c, .tcc files with include directives that only work in a specific sequence and with a specific compiler.

Fix your include/import systems before preaching for modularity.


I worked on a project like this once. If I remember correctly, it was even larger than 11,000 lines. ASP classic project with VB-script. We tried to get the company to do an overhaul, but it was more cost effective to try building it into the existing system.

After the better part of a week becoming acquainted with the code, I found a suitable integration point. Luckily for me, the new feature being requested didn't depend too much on the existing code so I didn't have to make too many modifications to the existing code. I added the entry point to the new section along with some comments describing how things worked and some ascii art of a dragon. In the end, the new feature worked great and the customer was very happy with the results.

Some years later, I was working for a different consulting firm and that project surfaced again. This time it was being re-written in ASP .NET after being passed around to a couple of different off-shore development teams. My coworker was working on it and asked me if I had written a specific piece of code in index.asp. I took a look and we both had a laugh, because my ASCII art had survived after all those years!


That's me when I use languages that don't support circular imports. Can't have circular imports if everything is in the same file. Taps head.


For me the limitation on circular imports forced me to drastically rethink how I architected my software (golang; I rewrote the whole MVP 3 times because in the first two I was either completely blocked by no-circular-imports or the structure felt so hacky that I didn't even wanna touch it… then I learned about interfaces!!)


Just in passing: generally you can break a circular import by isolating the coupling that triggers the cycle in a file dedicated to that. Bonus: each coupling use case gets to be explicit.
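
A small sketch of that with made-up Python module names: if orders.py and customers.py import each other only because they share one type, the cycle disappears once the shared type moves into its own module.

  # shared_types.py -- the new, dedicated home for the coupling
  from dataclasses import dataclass

  @dataclass
  class Address:
      street: str
      city: str

  # orders.py now does:     from shared_types import Address
  # customers.py now does:  from shared_types import Address
  # ...and neither module needs to import the other anymore.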


Python allows for circular imports as long as you don't directly import the things you use but instead import their modules, for example.

Is that the same for all languages?
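
In Python the pattern looks roughly like this (two made-up modules shown together in one block; the key is binding the module object, not its names, at import time):

  # a.py
  import b                  # fine: binds the module object only

  def helper():
      return "a.helper"

  def call_b():
      return b.process()    # attribute looked up at call time, after b is loaded

  # b.py
  import a                  # also fine
  # from a import helper    # this, executed at import time, is what tends to blow up

  def process():
      return a.helper()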


> Once I dared to clean this up and reuse the authentication response, but it broke everything. I never figured out why. To this day I sometimes lie in bed wondering what could have caused this.

Is it just me that gets piqued by the sound of it? I often spend my nights figuring out something intriguing and chasing that aha moment. I also like playing puzzle games, and to me, it's one way or another to spend the time. Then if I get to clean the mess up, usually that's another rewarding effort. However, I definitely understand the frustration if one's hands are tied or there is a deadline and you just want it to disappear.

I love to clean up my own mess as well. We all make messes, at different levels and from different perspectives. Just like playing a game, it is just a boss fight at a different level. Novices make messes, veterans make messes as well. Usually a novice's messes are easy messes.


That beats the 1000 lines inside a single if {} block that I once found.

(The conditional in that if {} always evaluated to true).


Lol, reminds me of the meme:

  var a = true;
  if(a == true) then return true;
  else if(a == false) then return false;

Or something like that.


I have seen that in real code. No kidding.


I once looked at 30 lines of code, analyzed them, and found out that they always computed true, never false. My background knowledge told me that true was indeed the correct value. But I did not dare to eliminate it in the main branch, only in a refactoring branch that I think never got merged. I am not proud.


I've seen this in production code:

    public static bool ConvertToBoolean(this int number)
    {
        var TrueOrFalse = false;

        if(number != 0)
           TrueOrFalse = true;

        return TrueOrFalse;
    }


I hate to say it, but I've definitely written code like that. Not any time recently, thankfully (at least I think).


The thing is it’s easy to imagine a situation that leads to this. It’s five o’clock on Friday, your partner is hassling you to get home because you have visitors, you are exhausted because you worked 60 hours this week and your boss is breathing down your neck because they want their pet feature finished right now.

This is why I’m always loath to criticise stuff I see on WTF.


I could totally conceive places where "you have to write at least N lines of code per day" and then this kind of thing explains itself.


else throw "this should never happen";


Yeah, but the function only had a single exit point, so it was following best practices.


I don’t remember if it did or didn’t have a single exit point.

But the thing as a whole didn’t follow any best practices I’ve ever heard of: the project also had what looked like a bizarre attempt to reimplement the concept of properties(!), in that the UI classes had fixed length arrays of all their subviews, which were accessed by constants. Each section of the UI had its own god class, where different views in that section were all the same class, called with a constructor that determined if a view ought to be created for any given constant.

There were also something like 20,000 blank comments. No idea why, the guy who added them didn’t even understand my question:

  int something = foo();
  //
  double baz = bar(something, 5);
That kind of thing.

(The project is no longer available and the business who made it has since closed, before anyone asks).


I was being tongue in cheek about people who religiously follow that "single entry and single exit to any function" rule and then get wrapped around the axle when a function needs to make several discrete sanity checks at the start before doing work.


Fair enough, it’s hard to convey a tongue-in-cheek tone in writing ^_^


One can develop* what serves as the edifice[1] of a $1B hedge fund that handles signals, orders, trades, positions, and p&l, among other things, using a system not unlike this one.[2]

[1] Merriam Webster definition of edifice: a large abstract structure

[2] Source: Me

* Or inherit and maintain


CORSIKA, currently at version 7, is a program to simulate extensive air showers, i.e. particle physics in the atmosphere.

Its main file is 88 kloc of Fortran 77, started in the eighties and still actively developed.

https://www.iap.kit.edu/corsika/index.php

Currently a rewrite in C++ is underway.


It depends on how you see it: I see a project that is quite successful, since it's running in production for mission-critical needs and the code is solid enough that even non-programmers can make improvements to it.


That's like running down the side of a very steep mountain and calling it a successful endeavor before you've reached the bottom


Depends on your measure of success. Given a steep enough slope, you are almost guaranteed to reach the bottom one way or another.


The scariest part of this isn't the 11K lines of code, it's the lack of automated tests. It IS impossible to make any sort of substantial change without breaking another part of the darn thing.

My favorite quote has got to be: "Unit tests aren't meant for you now, it's insurance against a future developer".


The number of lines in a file doesn't necessarily mean anything. SQLite is compiled as a single file: https://www.sqlite.org/amalgamation.html . It depends on the structure that is in that file.

If this app was factored into 300 different files, it would still be an impossible mess. The redundant and buggy logic would just be in different files.


SQLite concatenates its sources into a single file to ease distribution, but it is developed in many separate and well-organized .c and .h files.


This reminds me of an app we had at work for static content, written in JSP of course. It was made so designers and UI developers could make mostly static content pages.

Someone had a good idea to make a header.jsp template for common header stuff.

But it was hilarious. The file essentially became a giant if-else condition with a few hundred conditions like "if path == 'some_page'" followed by CSS and sometimes JavaScript for that page.

Absolutely horrendous.


Think of all your code as a single file with line separators, file separators and a special UI that considers both to present these in a text editor.

The existence of large files is mostly just a style issue.

Text editors with different - or let's say more semantic - interfaces to the code would not care about file size.

You could then happily have 1M loc in a single file.

You would care about it as much as you care about how the code was laid out on sectors and pages on your HDD.
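
As a toy sketch of that framing (purely illustrative, not how any real editor stores code): keep everything in one blob, mark boundaries with the ASCII file-separator character, and let the tooling present whatever slices it wants.

  # Toy "one giant file" storage: modules separated by the ASCII FS byte.
  FS = "\x1c"

  def pack(modules):
      """modules: dict of name -> source text, flattened into one string."""
      return FS.join(f"{name}\n{source}" for name, source in modules.items())

  def unpack(blob):
      """Rebuild the name -> source mapping; the UI never shows the raw blob."""
      out = {}
      for chunk in blob.split(FS):
          name, _, source = chunk.partition("\n")
          out[name] = source
      return out

  blob = pack({"player.py": "hp = 100\n", "monster.py": "hp = 30\n"})
  assert unpack(blob)["monster.py"] == "hp = 30\n"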


I once wrote a sql query that was (well, is, since it is still in use) 5,000 lines long. It still ran in less than a millisecond, but I was surprised how long it really was once it was finished and tested.


It really speaks to the state of software engineering when there are ample comments here defending this practice. This is virtually indefensible in my book, as it screams technical debt and strongly suggests that there are much deeper issues hiding in that codebase. Personally I would not be willing to work on it without first addressing those issues.


It speaks to the state of software engineering that people are complaining about file length, instead of something meaningful like cyclomatic complexity.

A single 10k-line program can be easier to understand and better organized than an overly abstracted mess strewn across multiple files that checks all the "best practice" checkboxes.
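
For what it's worth, getting a rough cyclomatic complexity number doesn't take much tooling. Here's an approximate counter for Python code using only the standard library (decision points + 1, which simplifies the real metric a bit):

  import ast

  # Rough cyclomatic complexity: 1 + number of decision points.
  # Real tools count a few more node types; this is just a ballpark figure.
  DECISIONS = (ast.If, ast.For, ast.While, ast.ExceptHandler,
               ast.BoolOp, ast.IfExp, ast.Assert, ast.comprehension)

  def complexity(source):
      tree = ast.parse(source)
      return 1 + sum(isinstance(node, DECISIONS) for node in ast.walk(tree))

  print(complexity("""
  def f(x):
      if x > 0:
          for i in range(x):
              if i % 2 == 0 and i > 2:
                  print(i)
      return x
  """))  # -> 5: one for, two ifs, one boolean operator, plus one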


Do you think the people who wrote this were measuring cyclomatic complexity? Obviously you can have >10k LOC file that is well-maintained and useful.. but there's ample evidence in the post that this was _not_ one of those, but a machine held together by duct tape, spit, and hope.


> Do you think the people who wrote this were measuring cyclomatic complexity?

As a first-order guess, I would guess that no code has its cyclomatic complexity measured. As a fraction, I'm probably not too far off.


I think the idea that optional extras are seen as required is more telling.

I suspect there are a lot of younger programmers with these views, who don't know elsewise.

Testing suites, having dedicated test/dev environments, all of these things are relatively new across most programming on the web. "Best practice" has changed more times than I can count in the last 20 years, and we've gone full circle from "monoliths are bad, split everything into microservices" to "maybe try combining them to reduce complexity".

I'm not saying this style of coding is the best, but automatically assuming the current practices are the definitive best ones in all cases is silly, and the idea you have to refactor because it doesn't meet those practices is - in my view - insane and wasteful.


This is more of a "rubber hits the road" forum than a theoretical one. There is a time and a place for stuff like this. It's not a reflection of the bigger picture of software.


And that's fine. Nobody is qualified to work on every piece of software.


Is it weird that I think that, armed with a modern editor, this would be basically easy and practically pleasant compared to the FooFactoryFactory “best practices” crap I’ve had to deal with?


I always start a new project with all code in a single file then only split it out when I feel I gain more than I lose. Judgment call, and a tad subjective.

One factor in favor of the mono-file is that it's often easier to navigate around fast and do search/replace in a single file than across many. Believe me, I understand all the benefits of multiple files too -- not my first rodeo -- but under the right conditions it's not always a sin to have a "fat" file.

Generally, after a project has left its prototype stage and become more of a stable thing, primarily under maintenance, it should certainly be split out into smaller files with sensible module boundaries. That's best in the long term, and plays better with version control and with large teams doing concurrent mutations on the source tree.


In 2012 I worked on a JSP web app project; we replaced another consultant, so the project was inherited. A monstrous JSP page, of 13k lines IIRC, once transpiled to pure Java, hit the max length for a class method: 65,535 bytes.


We've got many of these, including a 13k LoC file, in our OSS project. Yes, it isn't ideal, but sometimes for performance and practical reasons these things grow over time.


Obviously Sam from accounting did this before Excel was Turing complete.

https://www.felienne.com/archives/2974

https://www.microsoft.com/en-us/research/blog/lambda-the-ult...

No mention of a cell-u-lite version.


An early-2000s web analytics tool, awstats, was a 500 KB Perl file. It was surprisingly easy to modify and hard to break - I spent a lot of time adding SEO goodies to it.


In Pascal that is really common, since the files are the modules. You publish your library as one file, and the user can import it by the file name.

I ran wc on FreePascal to search for some. There are a few, but not as many as I expected.

9k file, data structures for the compiler itself: https://gitlab.com/freepascal.org/fpc/source/-/blob/main/./c...

30k file: Pascal parser/scope resolver: https://gitlab.com/freepascal.org/fpc/source/-/blob/main/pac...

And the record:

119k file, Sharepoint API (but it seems to be autogenerated): https://gitlab.com/freepascal.org/fpc/source/-/blob/main/pac...

As far as libraries go, this is one of my favorites:

23k file, regular expression library: https://github.com/BeRo1985/flre/blob/master/src/FLRE.pas

I searched my own files and found a 197k file to parse HTML entities. But that was an autogenerated trie (one switch/case for each letter)


> Once I dared to clean this up and reuse the authentication response, but it broke everything.

Yet! Those non-programming people somehow managed to add their little requirements over the years, without breaking the other forms?

There is probably more to the story. I suspect, for instance, that users who needed to produce a certain form had private, years-old copies of the program that they used, impervious to subsequent changes.


I'd wager they did break those other forms; the whole thing had its own long list of bug reports per the writeup. They just didn't break it for themselves.


My $DAYJOB task this week is beginning the process of splitting up a 14554 line C++ file. It's basically the implementation of a single class, but by way of comparison, the header file is only 743 lines, so there's a lot of code in some of those methods.

It'd be pretty unmanageable without Visual Assist (a plugin for Visual Studio that does fairly fast searching for symbols and files).


I once refactored a 15k plus line sql file. Fun times


I'm guessing / hoping there was a lot of data in there as INSERT statements.

My horror story was being asked to do maintenance on PHP sites written in the early days of PHP. Hundreds - maybe thousands - of PHP files with copy-pasted HTML and intermingled logic. As far as I can tell, the idea of instantiating a whole MVC framework from a single entrypoint file came after that particular site was created, so every possible page was its own entrypoint with its own boilerplate. Source control also seemed to postdate this project, so you had plenty of .old.php and .old.v2.php files.

Programming at webdev agencies is a challenging experience.


Nope. It was a crazy stored procedure. When all you know is SQL… you’ll solve some crazy problems using SQL.

Oof. That tbh sounds worse


I worked on untangling a 29,000 line single .c file MacOS app in the mid-1990's that featured a 14,000 line event loop function. Fun.


At my first company I quickly established myself as a good programmer. As a result I was rewarded with a tough project for a tough client. What I came across in the code base astounded me. It was a Java-based code base with everything happening in the constructor of a class. A 3,000-line-long constructor. I didn't stay with the company for long after that.


I worked at a large ecommerce company on their Android app. I was tasked with using the Sonar tool to analyze the source code of their app. A bad cyclomatic complexity is 30, at which point it is difficult for an experienced engineer to follow the code paths.

There was a file in that source that had a cyclomatic complexity of over 900.

The reason was that at that point Android had a limit on method count, called the dex limit: you could only have 64K methods, before multidex was introduced.

The engineers couldn't add more methods, so they had to jam code into existing methods. No DRY. No refactoring.

It was fucking impossible to understand at the code level. You had to just use the app for years to understand it.

That analysis got a bunch of us targeted by the senior developers because it made them look bad. They fired multiple people, including me, who pointed this out. We said it made it impossible for new engineers on the project to succeed, and the incumbents didn't want that to be known, lest they lose their bonuses.


An honorable mention goes to the 47,000-line file containing the entire garbage collector for the C# runtime. It was originally written in Lisp, then automatically translated to C++ by a script before .NET was released. It's been hand-maintained since, and the folks at Microsoft have rejected attempts to refactor the code.


In the 90s I was given the task of making some changes to a shell script which provided a complete systems-engineer configuration management tool for country-scale network management systems.

This shell script was 9,000 lines long. Each new "menu" after entering a number corresponding to your selection from a previous menu took 60-120 seconds to appear. Needless to say, configuring systems was a painfully slow process.

I quickly found it practically impossible to extend this tool to meet the new requirements, so I quietly rewrote it (also still in shell script, because of host limitations). The new version was 1,000 lines, and menu changes took < 2 seconds in most cases (thanks to appropriately placed caching).

Management was not happy that I decided to rewrite (since the manager had written it himself originally), but the users were ecstatic about the performance increase. I would not be surprised if that tool was still in use today.


I'll never forget the compiler bug we found once. We had a precompiled header file that was over 65,536 lines, and for some reason, it would just stop reading after that many lines. Of course we had some interesting compiler errors, but we never expected that it was a compiler bug.


Quote: "There was no test environment. If I made a change, I had to test it in "production"."

What year is this? The 70's and mainframes? Because there is no way in hell you cannot, in a large organization that had a "Jeff in marketing", since the 80's and PCs, duplicate the environment. Especially given this was used by almost everybody in the organization.

And once you're done duplicating production and have created a proper test environment, you can start actually refactoring and creating a beautiful app out of it instead of just "To this day I sometimes lie in bed wondering what could have caused this".

Conclusion - the article's author is a whiner instead of a solver, no better than the ones before him who "copy and pasted at some point then later diverged".


If the author was the only person editing the file, and had a reasonable expectation of owning it for at least 6 months, then yes, start building a second environment.

But one antipattern might be that the file is being modified by other people, even if on paper he is the only owner. So while you create a test, someone you don't even know exists modifies prod right under you.

Another antipattern is forever-changing ownership. You own it for 2 weeks. Then someone anonymous in some layer above you decides something else, and off it goes. Maybe they did bother to tell you you're not the owner anymore, maybe they only told the new owner. In those 2 weeks, you've got ownership of 3 new programs and lost it for 2 others.

I've seen it happen. There is no way to build something stable in these circumstances. Management will need to provide some stability before the underlings can do any work. If you're living there, run away, you can't save them.


"There is no way to build something stable in these circumstances"

Yes, there is. Since he clearly stated that it started as a job assigned to his team, he could've versioned it right there where it officially lived. Then it would've been very easy to see changes made to it by somebody else, even if who that somebody else was remained unknown to him.

"Then someone anonymous in some layer above you decides something else"

He also stated in the article that he had a lot of time on his hands and tried a refactoring. You don't have time to do a refactoring of a file with 11k lines of code in only 2 weeks, so he clearly had at least several months.

My conclusion still stands.


Most of the comments here seem to be focusing on Austin's reference to it as an 11k line of code file, but if you read the article it sounds a lot more like it's both an 11k line of code file and an 11k line of code procedure. That is, that there are no sub-procedures, just one straight execution through the whole thing. If you've only encountered large-ish source files like this but with reasonable sets of functions inside, that's a far cry from finding an 11k line of code procedure itself. The former is often justifiable (though perhaps a stretch), the latter is almost never justifiable. It's just garbage.


https://github.com/SheetJS/sheetjs/blob/master/xlsx.js with its 24,500 LOC - the author is cool.


One of the things I often tell people is that if you hear someone say "Yes! It broke!" that person is probably an engineer.

But the moral of the story in the article is that unfortunately some things break much less easily than we'd like.


I once rewrote one of these programs and translated tens of thousands of lines into a well unit-tested 500-line purging script. Got a spot award, and then the program was never used again… they chose to keep the “working original version”.


This was common with Delphi/Pascal because units were designated per file, and nobody wanted to type hundreds of "uses" statements to import every one of those. So, people leaned towards keeping code in as few files as possible. For example, TurboVision had units named "Objects", "Views". I'm sure they were thousands of lines long.

It was possible to use include files to split up a single unit, but that wasn't used much either because lack of cross-file navigation features in IDEs of the era made it really difficult to manage many files.


I encountered the QGSJET-II model, which we used for modeling cosmic ray showers in the atmosphere. At one point I was asked about finding a way to parallelize the code, but the 17k+ line Fortran file for the model, which I recall also included self-modifying code, was too deep for an undergrad to penetrate.

https://gitlab.iap.kit.edu/AirShowerPhysics/crmc/-/blob/mast...


Modularity applies at different levels and the purpose of modularity varies according to the level. Code can be modular at one level and not modular at another: functional, domain knowledge, deployability, organisational alignment (e.g. operational support and maintainability), etc.

I see lots of people getting confused these days about modularity and people missing the point altogether. What is it you're after? What's most important to you? That's where you need to focus your modularity efforts.


I've seen a complete high-level language compiler in a single file of around 250k lines. Granted, it was assembly language, but it was entirely hand written.


This one time, working at a country's state electricity provider, I was tasked with drawing all the power stations in the country on a map. I was stumped, I couldn't figure out why the markers weren't showing up. Then I zoomed out and saw them being drawn on the other side of the world. The latitudes and longitudes were the wrong way around. The data was from a government-maintained database. lol.


Had one like that, too. The previous programmer didn't believe in constants and the code was interpreting a binary protocol. Globals all over the place, too - I refactored a variable once and got a message that production was down; it was - the database was busy deleting customer history on all its connection pool threads. To this day I don't know how that refactor could've caused it. Fun times.


I worked one year with "accounting software". The single-file-VBS approach was almost all I ever saw. Anyone who knows anything about software development quits, and those who don't stay there and fiddle more with the incomprehensible 11,000-line VBS script to try to figure out why nobody born before 1981 got their salary paid out last month...


Code quality, internal modularity and clean interfaces between the parts don't have anything to do with how it is split into files.


Those are rookie numbers.

About 10 years ago I was briefly working on an app for bank tellers.

The app had been in production for 2-3 years at that point, with 4 people implementing new features.

The whole app was 4 Java files/classes.

2 files had about 1,000 LOC, 1 had about 15,000 LOC, and the last one had 80,000+ LOC.

The big one did it all: UI, DB calls, forms for printing, network calls.

Most of the variables were named something like “c12”, “bkf22”, etc.


Magboltz [1] is a single >95k line Fortran file (although I'm not sure if it's multiple files combined into one before upload).

[1] https://magboltz.web.cern.ch/magboltz/


Moral of the story: no engineer worth their salt wants to work on a "codebase" like this, so as an independent contractor, you can virtually name your price if you're willing to wade through the mess and solve acute problems.



A lot of what we call best practices are good practices for collaborating. When I do things just for me, 10,000 lines in a file isn't a problem. I can see the whole thing in a minimap and know every comma.


> What is the moral of the story?

Either figure out how to improve the situation, or leave.


When I learned to write FORTRAN IV for the CDC 6400 in 1974, I was told the standard compiler was written as a single routine. Didn't see the code myself, sadly.

My, how our knowledge of how to do things has changed!


My company has a REST client that is a single 9,000-line Go file that I have nightmares about. By my estimate it is really a ~300-line program written by someone who hated DRY.


More likely it was autogenerated from the API specs; if it hasn't been edited too much, you might be the hero who made regenerating it part of the build.
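
If so, wiring regeneration into the build is usually a small change. Here's a minimal sketch in Go, assuming the file came from an OpenAPI spec and using oapi-codegen as a stand-in generator - the spec path, package name, output file, and tool are all assumptions for illustration, not details from the parent comment:

  // doc.go (hypothetical) -- regenerate the client instead of hand-editing it.
  // The spec path and generator flags below are assumptions; substitute
  // whatever tool actually produced the original 9,000-line file.
  //
  //go:generate oapi-codegen -generate types,client -package restclient -o client_gen.go api/openapi.yaml
  package restclient

With something like that in place, running "go generate ./..." in CI rebuilds the generated file from the spec whenever it changes, so nobody ever has to edit it by hand again.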


That was my first thought too, but if there's a generator for this it has been lost to time and retirements.


Just so you know, awstats was a 25k+ line Perl script which I maintained as an internal fork for an employer once. It was epic, and as hard to work with as you'd expect.


> What is the moral of the story?

Countless non-software developers, ranging from IT support to business analysts, were able to implement business requirements. Good for business if you ask me.


All software projects should be a single, self-contained file.


And on a single line if you're a real pro!


Single-line programs are my expertise. :)


Reminds me of an article about the code that Neil Ferguson's Imperial College team used to model COVID. Though I think the Imperial code sounded worse.


Short and sweet article. Worth it just for the last line.


> what is the moral of the story

Your boss will still expect you to be able to fix bugs in an hour or two without introducing any new issues.


Those are rookie numbers; I have had PHP code doing mostly SQL migration work in PDO at twice that length in a single file :)


Guilty. I once wrote a Win32 app in a single 30,000-line C++ file.

It was an interesting exercise but I’d never do it again.


The horror of working at a place that treats IT staff as second-class citizens is an interesting one.


There are two types of people in the world:

1. Those who blog about how bad an 11,000-line project is

2. Those who shipped the 11,000-line project


You can fit the plan9 kernel into that multiple times (and it's not a microkernel).


Man, I really love the look and feel of that guy's website. It's perfectly simple.


I see people do this all the time with terraform and it's madness.


Terraform is declarative, so it's much easier to refactor.


> What is the moral of the story?

> I have no idea.

Sometimes successful software just grows.


Obligatory "Do not try to simplify this code" reference here https://github.com/kubernetes/kubernetes/blob/ec2e767e593953... (https://news.ycombinator.com/item?id=18772873) from Kubernetes persistent volume manager code.


Choose one:

Perfect code you never shipped.

A single 11,000-line file you shipped.


This is normal.


I don't understand why people complain about such things.

A seasoned programmer should have no problem navigating a multi-million-line codebase; that's just routine.

There isn't anything that special about a 10k-line file.


Writing long files is okay, but unless a language's module system gets in your way, separating different aspects of your code into different files does no harm.

There is no need — at all — to be evangelical about it one way or another.



