Having done several TypeScript migrations on 100k+ line codebases, this rings entirely true for me.
An additional problem with partial migrations is that they leave people free to just not fix the file they’re working on, since half-converted files are the norm and they have more important things to do.
I’m currently going through this at work, and my experience even with smaller packages agrees with the author: after trying it the loose way and failing, I realised the best way to proceed is with maximum strictness from the start, converting the .js files to .ts in dependency order. When a file is converted from .js to .ts, it’s almost guaranteed to be 100% converted. There are probably some edge cases where refactoring necessitated by the conversion propagates to already-converted dependencies, but I would expect those to be minimal.
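A minimal sketch of working out that dependency order: a depth-first topological sort over the import graph, so leaves are converted first. It assumes you've already extracted a map from each file to the files it imports (`ModuleGraph` and `conversionOrder` are hypothetical names, not from any tool), and it throws on an import cycle, since no leaves-first order exists then.

```typescript
// Hypothetical shape: each file maps to the files it imports.
type ModuleGraph = Record<string, string[]>;

// Returns files ordered so that every file comes after all of its
// dependencies (leaves first). Throws if the graph contains a cycle.
function conversionOrder(graph: ModuleGraph): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (file: string): void => {
    const s = state.get(file);
    if (s === "done") return;
    // Seeing a file that is still on the current DFS path means a cycle.
    if (s === "visiting") throw new Error(`Import cycle through ${file}`);
    state.set(file, "visiting");
    for (const dep of graph[file] ?? []) visit(dep);
    state.set(file, "done");
    order.push(file); // pushed only after all of its dependencies
  };

  // Sorting the keys makes the output deterministic.
  for (const file of Object.keys(graph).sort()) visit(file);
  return order;
}
```

Imported files that are missing from the map are treated as leaves and included in the result, so filter out third-party modules before sorting.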
I wish mypy was the same. I’ve tried to add typing to our fairly large (100k+ LOC) Python libraries 3 or 4 times now and failed every time. One of the big issues is that strict mode isn’t well supported, and some libraries and frameworks just don’t play well with typing or have separate, half-baked stub libraries. I also firmly believe it should integrate seamlessly with e.g. pre-commit, and that hasn’t been my experience either: special workarounds are needed for the pre-commit hook to behave identically to running mypy in the locally installed venv.
So we just write new code with types by convention but have no actual linting or checks, because it’s too hard to actually add static typing to an existing repository.
1. Give up on pre-commit. It would be nice if it just worked, but it doesn't. Just make a type checking script and run it as the first thing in CI.
2. Use Pyright, not Mypy. It's much, much better.
3. You can start with a very loose config and gradually make it stricter.
4. Make liberal use of `type: ignore` (and add TODOs to make it clear they're not intentional).
5. You can also make an alias for Any and use that for types that you intend to figure out but haven't yet.
Typing Python that wasn't written with type hints in the first place is always going to be a complete nightmare though. People who write untyped Python tend to write it in highly dynamic ways, and Python's type hints often aren't expressive enough to describe them (e.g. they only very recently added support for properly typing kwargs).
> Typing Python that wasn't written with type hints in the first place is always going to be a complete nightmare though
Couldn’t agree more. I joined a new team and recently finished converting a 20K-line repo. It was a grueling grind to get mypy running in strict mode.
I’m continually finding that a huge benefit of typed python code isn’t autocomplete or catching bad argument passing, but preventing developers from creating patterns that are unmaintainable.
If your code is too complex to type correctly, it's probably too complex in general :)
> If your code is too complex to type correctly, it's probably too complex in general :)
Ha, I definitely agree with that. There are some exceptions where you want to do a reasonable thing that the types just aren't expressive enough for, but in general, if you can't explain it so the type checker understands, then your coworkers won't understand it either.
> 1. Give up on pre-commit. It would be nice if it just worked, but it doesn't. Just make a type checking script and run it as the first thing in CI.
I’ve been slowly reaching this conclusion after a few years of running into little issues. We run our pre-commit hooks as their own CI test via tox. I’m beginning to feel both tox and pre-commit are more trouble than they are worth, and most of that trouble comes from them trying to create and manage their own mini-environments, which are never going to match having explicit envs, state, and deps via Dockerfiles that can be granularly inspected and/or deployed to other contexts.
> 2. Use Pyright, not Mypy. It's much much better.
Does this work well with PyCharm? I think I looked into this a while ago and it seemed really tightly coupled to VSC. And I know some people love it, but I’ve found VSC’s config for linting/type checks/auto-formatting to be a complete, confusing mess, and that it is generally much slower and buggier than PyCharm (e.g. it will highlight errors that don’t exist, not update syntax highlighting instantly, etc.).
I know it’s entirely possible that it’s just me who is an idiot here, but I’ve set up VSC on like 3 separate machines now in the last few years and it invariably ends up in some weird broken state, especially regarding highlighting code performantly and correctly using the same rules we have in CI.
Nah, pre-commit is great for lots of quick checks (formatting, trailing whitespace, linting shell scripts, etc.). It's just that type checking Python requires the Python dependencies to be installed already, which means setting up a venv and installing stuff, and that's just a bit too complex.
> Does this work well with pycharm? I think I looked into this a while ago and it seemed really tightly coupled to VSC.
Possibly not. You can definitely run it from the command line without VSCode, since Pyright, the type checker, is open source. The VSCode Python language server, Pylance, is closed source though (it provides additional features beyond type checking, like completion and refactoring).
Haven't used PyCharm for decades, but I'd guess it has its own thing.
While I have never done a JS-to-TS conversion, trying to modularize a TS monolith into even some rudimentary packages was very difficult because of circular dependencies at the file level (which are entirely possible).
Ended up giving up at the time because of cost/benefit.
In one of our single-package repos, I added some ESLint rules to effectively forbid circular module dependencies. Not only are they a code smell, but when modules have side effects (which is really common), a cycle can break at an arbitrary point, depending on which module in it happens to be loaded first.
We’ve been gradually moving to a very small number of monorepos and can better achieve modularity by liberally creating small packages and forbidding package-level cyclic dependencies.
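For reporting, a check like this boils down to finding a cycle in the module graph and printing its path. A rough sketch, assuming the same kind of hypothetical file-to-imports map; in practice an existing rule such as `import/no-cycle` from eslint-plugin-import does this for you (the comment above doesn't say which rules were used).

```typescript
// Hypothetical shape: each file maps to the files it imports.
type ModuleGraph = Record<string, string[]>;

// Returns the first import cycle found as a path that starts and ends at
// the same file (e.g. ["a.ts", "b.ts", "a.ts"]), or null if there is none.
function findCycle(graph: ModuleGraph): string[] | null {
  const done = new Set<string>(); // fully explored, known cycle-free
  const stack: string[] = []; // current DFS path
  const onStack = new Set<string>();

  const visit = (file: string): string[] | null => {
    if (onStack.has(file)) {
      // Back edge: the cycle is the path from `file`'s first occurrence.
      return [...stack.slice(stack.indexOf(file)), file];
    }
    if (done.has(file)) return null;
    stack.push(file);
    onStack.add(file);
    for (const dep of graph[file] ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    stack.pop();
    onStack.delete(file);
    done.add(file);
    return null;
  };

  for (const file of Object.keys(graph).sort()) {
    const cycle = visit(file);
    if (cycle) return cycle;
  }
  return null;
}
```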
Bit of a shameless plug, but I wrote about how Etsy migrated to TypeScript a few years back. I agree with the author generally: TS strict mode is the way to go, and you may find yourself regretting the alternative, even if it sounds appealing. It takes a lot of work, but most of the effort ended up being:
1. writing types for the big, central libraries that everyone uses (and making them "smart enough" to infer as much as possible).
2. teaching everyone at the company a brand new language without them hating it, and without their team losing (much) velocity.
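As an illustration of a "smart enough" shared helper (the `pick` function here is hypothetical, not Etsy's code): generics let the compiler infer both the object type and the selected keys, so callers get precise types without writing any annotations.

```typescript
// T is inferred from `obj`, and K from the literal keys passed in.
function pick<T extends object, K extends keyof T>(obj: T, ...keys: K[]): Pick<T, K> {
  const result = {} as Pick<T, K>;
  for (const key of keys) {
    result[key] = obj[key];
  }
  return result;
}

const user = { id: 42, name: "Ada", email: "ada@example.com" };

// Inferred as { id: number; name: string } with no annotation at the call site.
const summary = pick(user, "id", "name");

// pick(user, "age"); // compile error: "age" is not a key of `user`
```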
I don't really like the idea of sprinkling 'any' around to silence noImplicitAny problems.
Either put in the correct type or leave it as is. Placing 'any' in the code just fossilizes badness. Implicit any is easier to accept, because when you finally have time to get serious, you can disable it and write the correct types. You don't have to hunt for 'any' throughout the code.
We’ve forbidden the use of any specifically because it’s unwieldy. You can develop using ‘unknown’ until you’re finally ready to type something correctly. Like any, it accepts any value, but it won’t let you use that value or assign it to a typed variable until you narrow it. In our opinion, any is one of the legacies of JavaScript that you should never, ever use.
It’s not like you can’t of course, it’s just that it doesn’t “encourage” your developers to type things as they build them, which will eventually lead to shortcuts for a lot of people (myself included). Even if it doesn’t, it’s usually better for people to think about their types earlier rather than later, especially the less senior they are, or at least that is our experience. It’s much easier to get new people onboard when they aren’t doing “bad habits” that they then have to fix before their code is allowed to pass through the production pipeline.
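To make the `any` vs `unknown` distinction concrete: `unknown` accepts any value, just like `any`, but the compiler rejects property access, calls, and assignment to a typed variable until the value has been narrowed. The `describe` function below is a hypothetical example.

```typescript
function describe(value: unknown): string {
  // value.toUpperCase();      // error: 'value' is of type 'unknown'
  // const s: string = value;  // error: 'unknown' is not assignable to 'string'
  if (typeof value === "string") {
    return value.toUpperCase(); // narrowed to string in this branch
  }
  if (typeof value === "number") {
    return value.toFixed(2); // narrowed to number in this branch
  }
  return "unsupported";
}
```

With `value: any`, the commented-out `value.toUpperCase()` would compile and then throw at runtime when passed a number.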
I think explicit any (with noImplicitAny enabled) can make it easier to keep track of the work that still needs to be done. I use them to monitor the percentage of progress.
typescript-eslint has a rule targeting explicit any annotations, and even several rules about assignments etc. that are inferred as any. If you really want to include any in a strict-gradual migration strategy (I’m skeptical, but could be persuaded depending on the baseline), you can configure those rules as warnings or whatever makes the most sense. It’s definitely statically analyzable without a ton of bespoke effort.
"rg -w any", with an optional --count if you want to count them rather than list their locations. It functions quite similarly to "unsafe" in Rust, and like "unsafe", I think ideally each "any" would come with a comment explaining why its use is justified in that case (which is hard to enforce if the "any"s are implicit).
I agree completely that each any should be justified. That's why I'm against sprinkling it to silence errors. noImplicitAny with each manual any clearly justified is the desired end state of the migration.
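One way to make each explicit `any` carry its justification, assuming typescript-eslint's `no-explicit-any` rule is enabled: silence it one line at a time and put the reason after `--` in the ESLint directive comment. The payload type and TODO text below are hypothetical.

```typescript
// The directive disables @typescript-eslint/no-explicit-any for the next
// line only; the text after "--" records why this `any` is allowed.
// eslint-disable-next-line @typescript-eslint/no-explicit-any -- TODO(types): legacy plugin payload not modelled yet
type LegacyPluginPayload = any;

function payloadKeys(payload: LegacyPluginPayload): string[] {
  return Object.keys(payload);
}
```

Every exception is then greppable, e.g. with `rg -n 'eslint-disable-next-line @typescript-eslint/no-explicit-any'`, and any new `any` without a directive fails lint.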
I routinely ask it to convert javascript to typescript.
It does a great job for the most part, in seconds, leaving me to clean up where it got it wrong and then fix the bugs that TypeScript exposed.
Surely this makes the task easier?
The only trap to avoid is that it takes a conscious decision to do no refactoring other than the TypeScript conversion. Don’t start “just fixing this other little thing while I’m here”.
The real benefit of a statically typed language, especially one like TypeScript with a rich type system, is that you get to express your application in terms of the type system, and that you tend to think a lot more about your data structures.
You can, for example, make invalid states impossible to represent. This can catch a huge number of bugs and prevent some types of bugs from ever appearing.
If you're using something like ChatGPT or some other automated system to add types to your code, you have adopted a type system while receiving very little of its benefit. You end up with many of the drawbacks and only a little of the reward.
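A small TypeScript illustration of "make invalid states impossible to represent", using a discriminated union (the request-state shape is a hypothetical example). A loose shape like `{ status: string; data?: string; error?: string }` happily allows `{ status: "success", error: "boom" }`; in the union below, each status carries only the fields valid for it, so contradictory combinations don't type-check and the compiler narrows the available fields in each branch.

```typescript
type RequestState =
  | { status: "loading" }
  | { status: "success"; data: string }
  | { status: "error"; error: string };

function render(state: RequestState): string {
  // The switch is exhaustive, so no fallback return is needed.
  switch (state.status) {
    case "loading":
      return "Loading...";
    case "success":
      return `Data: ${state.data}`; // `data` is guaranteed to exist here
    case "error":
      return `Error: ${state.error}`; // `error` is guaranteed to exist here
  }
}

// const bad: RequestState = { status: "success", error: "boom" }; // compile error
```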
> You can, for example, make invalid states impossible to represent. This can catch a huge number of bugs and prevent some types of bugs from ever appearing.
I've yet to see these claims realised not by a handful of type gurus with ad-hoc type-level DSLs, but in a more-or-less generic way applicable in everyday use.
You definitely have seen it work if you've worked in any statically typed language, for example:
// this will (hopefully) not compile in your favorite language
int i = "Hello World";
Using type systems to enforce business logic is applying the same concept, just at higher levels of abstraction.
Usually you will want (or need) to use the more advanced type features that your language offers, like generics and algebraic data types.
There certainly is a cost to learning these concepts, but the good thing is that many languages borrow these same concepts, so the knowledge easily transfers.
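One everyday step up from `int i = "Hello World"` is the "branded type" pattern, a community idiom rather than a built-in TypeScript feature. A generic `Brand` helper tags plain strings so that, say, a user ID can't be passed where an order ID is expected. All names here are hypothetical.

```typescript
// Structurally still a string, but tagged with a phantom property so
// different kinds of IDs are not interchangeable.
type Brand<T, B extends string> = T & { readonly __brand: B };

type UserId = Brand<string, "UserId">;
type OrderId = Brand<string, "OrderId">;

// The only places where a raw string becomes an ID.
const userId = (raw: string): UserId => raw as UserId;
const orderId = (raw: string): OrderId => raw as OrderId;

function cancelOrder(id: OrderId): string {
  return `cancelled ${id}`;
}

const order = orderId("o-123");
const user = userId("u-456");

cancelOrder(order); // OK
// cancelOrder(user);  // compile error: UserId is not assignable to OrderId
// cancelOrder("o-1"); // compile error: string is not assignable to OrderId
```

The brand exists only at the type level; at runtime both IDs are ordinary strings.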
The gulf between this and "make invalid states impossible to represent" is as vast as the void between two galaxies.
> Using type systems to enforce business logic is applying the same concept, just at higher levels of abstraction.
"Just"
> There certainly is a cost to learning these concepts, but the good thing is that many languages borrow these same concepts, so the knowledge easily transfers.
To repeat myself: I've yet to see these claims realised not by a handful of type gurus with ad-hoc type-level DSLs, but in a more-or-less generic way applicable in everyday use.
It's sort of postponing the actual work, though: there is indeed value in asserting that your code adheres to some domain model, but it's not entirely clear to me whether the value is outweighed by the friction when that arbitrary ChatGPT-generated domain model disagrees with your domain. There is dramatically more value in asserting that your code adheres to the correct domain model, and you can only know that by, well, modelling your domain.
This is true, but it’s far easier to get there once you have types in place in the first place, because that kind of domain modeling very often entails non-trivial refactoring work. Our guidance to folks when I was at LinkedIn was that, as much as possible, PRs adding types should not change the runtime code at all, only describe what was already there (even if it was really dumb!). That meant TS conversions were never going to be the cause of prod issues, only ever the fix for them; and it also meant that those PRs were much less likely to get reverted… which is super important, since a revert would have knock-on effects on any other modules that relied on the types from the reverted PR.
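A sketch of what such a type-only conversion might look like, using a hypothetical legacy `findUser` that returns `false` rather than `undefined` when nothing matches: the body is left as it was, and the signature simply describes that quirk, so callers are forced to handle the case the code always had.

```typescript
interface User {
  id: number;
  name: string;
}

// Body unchanged from the (hypothetical) JS original; the return type
// documents the odd `false` sentinel instead of "fixing" it in this PR.
function findUser(users: User[], id: number): User | false {
  for (const u of users) {
    if (u.id === id) return u;
  }
  return false;
}

// Callers must now narrow away `false` before touching `.name`.
const found = findUser([{ id: 1, name: "Ada" }], 2);
const label = found === false ? "not found" : found.name;
```

Changing the sentinel to `undefined` can then happen in a separate, behaviour-changing PR.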
Ooh that may have been the approach for my sister comment about circular deps. Thanks for the insight.
The solution for me would be to make new libraries (without cycles) and migrate old code to reference them, instead of pulling old code into the new libraries and bringing the circular deps along with it.
Is there any automated tool that can add types? I have Playwright E2E tests – it seems reasonable that the type each variable is used as could be recorded and the code annotated with that.
The specific thing you’re asking for there is possible but non-trivial if you try to do it at runtime with tools like Playwright. More importantly, it’s more overhead than you really need, and runtime types often aren’t exactly what you want to describe an API: you’ll be looking at various objects where you then have to decide how much of the prototype chain to walk, and lots of other decisions like that. Most tools in the space instead lean on TS’ own programmatic API. The two most interesting such tools I know of are Airbnb’s ts-migrate and LinkedIn’s RehearsalJS (full disclosure: I was loosely involved in design discussions for the latter, but did not contribute to it code-wise).
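To give a flavour of leaning on TS’ own programmatic API (this is not how ts-migrate or RehearsalJS are actually implemented), here is a small sketch that parses a file with the `typescript` npm package and lists parameters that have no type annotation. It's only a rough proxy for noImplicitAny errors: contextually typed callback parameters are reported too, even though the compiler can infer them.

```typescript
import * as ts from "typescript";

// Returns "file:line name" for every parameter without a type annotation.
function unannotatedParams(fileName: string, source: string): string[] {
  // `true` sets parent pointers, which getStart/getText rely on.
  const sf = ts.createSourceFile(fileName, source, ts.ScriptTarget.Latest, true);
  const found: string[] = [];

  const walk = (node: ts.Node): void => {
    if (ts.isParameter(node) && node.type === undefined) {
      const { line } = sf.getLineAndCharacterOfPosition(node.getStart(sf));
      found.push(`${fileName}:${line + 1} ${node.name.getText(sf)}`);
    }
    ts.forEachChild(node, walk); // recurse into child nodes
  };

  walk(sf);
  return found;
}

const src = "function add(a, b: number) {\n  return a + b;\n}\nconst f = (x) => x;\n";
console.log(unannotatedParams("legacy.ts", src));
```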
I wouldn’t recommend that. You want your types to have meaning for the business and you want to derive types from each other. It’s counterproductive to individually type every variable without context.
In Python at least, with strict mode, Any is useless because you can do almost nothing with it. You can print it, call 'dir', 'help', and other built-ins on it, and pass it to functions that accept Any.
That means that sprinkling any around your code isn't a great way to silence errors.
I don’t know about Python, but for converting JS to TS you start with allow-implicit-any turned on and add types to your business objects as you go. Eventually you turn it off and fix what remains.
Note that while a lot of people do it this way, my blog post (which this discussion is notionally about) expressly argues that that’s exactly what not to do!
The article only touches on this: when converting to TypeScript, `any` is useful, but in the end you don't want this type in your codebase, so don't forget to use typescript-eslint [0] and turn on the no-unsafe-* rules, which guard against `any` leaking into your code.
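For example, `JSON.parse` returns `any`, so `const cfg: Config = JSON.parse(text)` compiles no matter what the JSON contains; that is the kind of assignment `@typescript-eslint/no-unsafe-assignment` flags. A minimal sketch of the alternative, with a hypothetical `Config` shape: treat the parsed value as `unknown` and narrow it with a user-defined type guard.

```typescript
interface Config {
  port: number;
  host: string;
}

// User-defined type guard: returning true narrows `value` to Config.
function isConfig(value: unknown): value is Config {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.port === "number" && typeof v.host === "string";
}

function parseConfig(text: string): Config {
  const parsed: unknown = JSON.parse(text); // stop the `any` at the boundary
  if (!isConfig(parsed)) {
    throw new Error("invalid config");
  }
  return parsed; // narrowed to Config by the guard
}
```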
This seemed like the best approach, and the team I was last part of took it with reasonably positive results. One of the challenges, however, was less about which modules got converted before or after, and more about conflict resolution: we'd end up needing to create and reconcile the same type across sibling leaves and coordinate that on the fly, especially while learning about files we hadn't written and could only reason about at a high level before getting into the weeds.
I recently converted most of my team's code alongside two coworkers. While I think it probably shortened the overall time, trying to split that work meant we stepped on each other's toes a lot. Due to the usual time pressure and regression avoidance, we've ended up with more casting and non-null assertions than I'm comfortable with. Still I think it was worthwhile.
https://medium.com/airbnb-engineering/ts-migrate-a-tool-for-...
And also open sourced the tooling:
https://github.com/airbnb/ts-migrate