Having a proportion of the team act as triage for issues / alerts / questions / requests is a generally good pattern that I think is pretty common - especially when aligned with an on-call rotation. I've done it a few times by having a single person in a team of 6 or 7 do it. If you're having to devote 50% of your 4-person team to this sort of work, that suggests your ratios are a bit off imo.
The thing I found most surprising about this article was this phrasing:
> We instruct half the team (2 engineers) at a given point to work on long-running tasks in 2-4 week blocks. This could be refactors, big features, etc. During this time, they don’t have to deal with any support tickets or bugs. Their only job is to focus on getting their big PR out.
This suggests that this pair of people only release 1 big PR for that whole cycle - if that's the case this is an extremely late integration and I think you'd benefit from adopting a much more continuous integration and deployment process.
> This suggests that this pair of people only release 1 big PR for that whole cycle
I think that's an overly literal reading of the text.
The way I took it, it was meant to be more of a generalization.
Yes, sometimes it really does take weeks before one can get an initial PR out on a feature, especially when working on something that is new and complex, and especially if it requires some upfront system design and/or requirements gathering.
But other times, surely, one also has the ability to pump out small PRs on a more continuous basis, when the work is more straightforward. I don't think the two possibilities are mutually exclusive.
Yeah, but again you might be being too literal. You could get a half dozen "big PRs" out in a month or so, but you'd still want to be able to just focus on "getting your (current) big PR out", you know?
The important part is that you're not interrupted during your large-scale tasks, not the absolute length of those tasks.
> In any case, the nature of the product / features / refactorings usually dictates the minimum size of a PR.
Why not split the big tickets into smaller tickets that are delivered individually? There are cases where you literally can't, but in my experience those are the minority - or at least should be, assuming a decently designed system.
> I think in this sentence, there's a hidden assumption that most projects look like your project(s). That's likely false.
You left out the part of that quote where I explained my assumption very clearly: A decently designed system.
In my experience, if you can't split tasks down to under a week of work the vast majority of the time, then your code has massive land mines in it. The design may be too interconnected, too many assumptions baked in too deeply, not enough tests, or various other issues. You should address those land mines before you step on them rather than perpetually trying to walk around them. Then splitting projects down becomes much, much easier.
> You left out the part of that quote where I explained my assumption very clearly: A decently designed system.
That's one possible reason. Sometimes software is designed badly from the ground up; sometimes it accumulates a lot of accidental complexity over years or decades. Solving that problem is usually out of your control in those cases, and only sometimes is there a business driver to fix it.
But there are many other cases. You have software with millions of lines of code and decades of commit history. Even if the design is reasonable, there will be a significant amount of both accidental and essential complexity - past a certain size and age you simply won't find a pristine, perfectly clean project. Implementing a relatively simple feature might mean you need to learn about interacting features you've never dealt with so far, study documentation, and talk to people you've never met (no one has a complete understanding either). Your acceptance testing suite runs for 10 hours on a cluster of machines, and you might need several iterations to get things right. And there are projects where the trade-off between velocity and tolerance for risk is different from yours, and the processes designed around it are more strict and formal than you're used to.
That's only late if there are other big changes going in at the same time. The vast majority of operational/ticketing issues have few code changes.
I'm glad I had the experience of working on a literal waterfall software project in my life (i.e. plan out the next 2 years first, then we "execute" according to a very detailed plan for that entire time). Huge patches were common in this workflow, and they only caused chaos when many people were working in the same directory/area. Otherwise it was usually easier on testing/integration - only 1 patch to test.
That's also been my experience. It's part time work for a single on call engineer on a team of 6-8. If it's their full time work for a given sprint then we have an urgent retro item to discuss around bug rates, code quality and so on.
A question for both you and the parent:
If you are heavily performance and memory constrained, why are you using a language that gives you relatively little control over allocations?
In my case it was a choice made long ago, when the exact consequences weren’t fully apparent. I don’t think the people making those initial decisions understood how important low memory usage would turn out to be, or that Go would be an obstacle to it. And we’ve got so much code in Go at this point that it would be a huge lift to switch languages. The language does have some nice features that make things easier. It’s just a problem in certain portions of the code, in the really hot loops.
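To make that concrete, here's a minimal sketch (invented code, not from the poster's codebase) of the kind of hot-loop allocation problem that's easy to hit in Go, and the usual mitigation: hoist the allocation out of the loop and reuse one backing array, which also reduces GC pressure.

```go
package main

// Hypothetical hot loop: processAlloc allocates on every iteration;
// processReuse does one up-front allocation and reuses it.

func processAlloc(lines [][]byte) int {
	n := 0
	for _, l := range lines {
		tmp := append([]byte(nil), l...) // fresh allocation every iteration
		n += len(tmp)
	}
	return n
}

func processReuse(lines [][]byte) int {
	n := 0
	buf := make([]byte, 0, 4096) // one allocation, reused below
	for _, l := range lines {
		buf = append(buf[:0], l...) // reuses buf's backing array; grows
		n += len(buf)               // only if a line exceeds its capacity
	}
	return n
}

func main() {
	lines := [][]byte{[]byte("hot"), []byte("loop")}
	_ = processAlloc(lines)
	_ = processReuse(lines)
}
```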
Whenever I used channels in Go, I regretted it in the end. You always have to look up "what happens when I do x and the channel is closed" kind of stuff, and the code becomes weird with all the selects and shit.
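For reference, the rules that always have to be looked up boil down to a few lines:

```go
package main

import "fmt"

func main() {
	ch := make(chan int, 1)
	ch <- 1
	close(ch)

	// Receives from a closed channel drain any buffered values first...
	v, ok := <-ch
	fmt.Println(v, ok) // 1 true

	// ...then yield the zero value with ok == false, forever.
	v, ok = <-ch
	fmt.Println(v, ok) // 0 false

	// ch <- 2   // would panic: send on closed channel
	// close(ch) // would panic: close of closed channel
}
```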
Yeah, where I worked we rarely used channels directly in business logic. Most of our code looked like "do 5 different time-consuming operations, some of them optional, i.e. they can fail" and then combine them all appropriately depending on success/failure. So we simply made a BoundedWaitGroup primitive - a sync.WaitGroup plus a channel to enforce the boundedness - that gets used everywhere.
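Not their actual code, but a minimal sketch of how such a primitive typically looks (the name and exact API are assumptions): a sync.WaitGroup paired with a buffered channel acting as a semaphore, so at most n operations are in flight and Add blocks once the bound is reached.

```go
package main

import "sync"

// BoundedWaitGroup: a sync.WaitGroup plus a buffered channel used as
// a semaphore to cap the number of in-flight goroutines.
type BoundedWaitGroup struct {
	wg  sync.WaitGroup
	sem chan struct{}
}

func NewBoundedWaitGroup(n int) *BoundedWaitGroup {
	return &BoundedWaitGroup{sem: make(chan struct{}, n)}
}

func (b *BoundedWaitGroup) Add() {
	b.sem <- struct{}{} // blocks while n operations are in flight
	b.wg.Add(1)
}

func (b *BoundedWaitGroup) Done() {
	<-b.sem // free a slot
	b.wg.Done()
}

func (b *BoundedWaitGroup) Wait() {
	b.wg.Wait()
}

func main() {
	bwg := NewBoundedWaitGroup(2) // at most 2 operations at once
	for i := 0; i < 5; i++ {
		bwg.Add()
		go func() {
			defer bwg.Done()
			// ... time-consuming operation, possibly fallible ...
		}()
	}
	bwg.Wait()
}
```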
The key here is that in production it's almost always not safe to actually apply a down migration - so it's better to make that clear than to pretend there's some way to reverse an irreversible operation.
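A hypothetical illustration of why (table and column names invented): once the up migration runs, the data is gone, and no down migration can bring it back.

```go
package main

// Hypothetical migration pair. The up migration destroys data; the
// down migration can only recreate the shape of the column, not its
// contents, so "reversing" it silently leaves NULLs where real data
// used to be.

const up = `ALTER TABLE users DROP COLUMN legacy_token;`

const down = `ALTER TABLE users ADD COLUMN legacy_token text;`

func main() {}
```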
The Go tools for managing DB schema migrations have always felt lacking to me, and it seems like your tool ticks all of the boxes I had.
Except for one: lack of support for CREATE INDEX CONCURRENTLY (usually done by detecting that and skipping the transaction for that migration). How do you handle creating indexes without this?
Long-running index creation is a problem for pgmigrate and anyone else doing “on-app-startup” or “before-app-deploys” migrations.
Even at moderate scale (normal webapp stuff, not megaco size) building indexes can take a long time — especially for the tables where it’s most important to have indexes.
But if you’re doing long-running index building in your migrations step, you can’t deploy a new version of your app until the migration step finishes. (Big problem for lots of reasons.)
The way I’ve dealt with this in the past is as follows (there’s a code sketch of the whole flow after the list):
- the database connection used to perform migrations has a low statement timeout of 10 seconds.
- a long-running index creation statement gets its own migration file and is written as: “CREATE INDEX … IF NOT EXISTS”. This definition does not include the “CONCURRENTLY” directive. When migrations run on a local dev server or during tests, the table being indexed is small so this happens quickly.
- Manually, before merging the migration in and deploying so that it’s applied in production, you open a psql terminal to prod and run “CREATE INDEX … CONCURRENTLY”. This may take a long time; it can even fail and need to be retried after hours of waiting. Eventually, it’s complete.
- Merge your migration and deploy your app. The “CREATE INDEX … IF NOT EXISTS” migration runs and immediately succeeds because the index exists.
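For what it’s worth, here’s a rough sketch of that flow in plain database/sql + lib/pq (index and table names invented; pgmigrate’s real API will differ):

```go
package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/lib/pq" // Postgres driver
)

func main() {
	ctx := context.Background()
	db, err := sql.Open("postgres", "postgres://localhost/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}

	// Pin one connection so the timeout and the migration share a session.
	conn, err := db.Conn(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Step 1: low statement timeout on the migration connection.
	if _, err := conn.ExecContext(ctx, `SET statement_timeout = '10s'`); err != nil {
		log.Fatal(err)
	}

	// Step 2: the migration file omits CONCURRENTLY, so it runs quickly
	// (and transaction-safely) against small dev/test databases.
	const migration = `CREATE INDEX IF NOT EXISTS orders_user_id_idx ON orders (user_id);`

	// Step 3 happens by hand in psql against prod, before merging:
	//   CREATE INDEX CONCURRENTLY orders_user_id_idx ON orders (user_id);

	// Step 4: when this runs in prod the index already exists, so
	// IF NOT EXISTS makes it an immediate no-op, well inside 10s.
	if _, err := conn.ExecContext(ctx, migration); err != nil {
		log.Fatal(err)
	}
}
```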
I’m curious what you think about this answer. If you have any suggestions for how pgmigrate should handle this better, I’d seriously appreciate it!
I think that’s the safest approach, but it’s inconvenient for the common case of an index that’ll be quick enough in practice.
The approach I’ve seen flyway take is to allow detecting / opting out of transactions on specific migrations.
As long as you always apply migrations before deploying and abort the deploy if they time out or fail, then this approach is perfectly safe.
On the whole I think flyway does a decent job of making the easy things easy and the harder things possible - it just unfortunately comes with a bunch of JVM baggage - so a Go-based tool seems like a good alternative.
This post is pretty much a paraphrasing of the "Foundations: Why Britain Has Stagnated" essay, which was discussed a few days ago and apparently ultimately got flagged.
People have very different views on this and there are no rules, but in my view one of the major benefits of pairing all the time is NOT doing code review.
If you want people to adopt your system, you should aim for it to be easier to adopt than whatever their alternatives are.
So I would look into where they're losing time because they need to deal with repeated or inconsistent UI stuff, and make those the priority for the library to deal with first.
If you're forcing work on them for no benefit, then you're not going to get any uptake - so I would also budget plenty of time for the kit-owning team to do a lot of the initial rollout, until you can get enough of a critical mass.
Is that the only thing you talk to your friends about?
I mean yeah literally you could talk about your plans [i.e. by yourself or with the other real people] or what's going on in your life and have the bot chime in with thoughts too. You could talk about food, philosophy, politics, travel, complain about stuff, get advice for relationships or work, gossip. All in a group chat setting.
It's not hard to imagine a bot that's able to contribute here and there in a meaningful way on any of these subjects, and countless others. But perhaps you weren't open to it in the first place, in which case I'm not going to try to convince you further; if you don't really want to give it consideration, nobody is going to convince you otherwise.