We have pondered capability-based security for Deno in the past. Our conclusion has always been that this is not possible to do securely in JS without freezing all prototypes and objects by default. The reasoning for this is that you need to make sure the capability token does not ever leak. For example, as a malicious user I could override `globalThis.fetch` to exfiltrate the capability token destined for `fetch` and use it myself later.
One could also override `Map.prototype.set` / `Map.prototype.get` to exfiltrate a token every time one is added to or read from a `Map` (people will want to store tokens in a `Map`).
One could also override `Array.prototype[Symbol.iterator]` to exfiltrate tokens stored in arrays if those arrays are destructured, spread, etc.
There are many more cases like this, where one can exfiltrate tokens because of the very dynamic nature of JavaScript.
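To make the shape of the attack concrete, here's a rough sketch; the token format is made up, but the monkey-patching is just ordinary JS:

```js
// Hypothetical capability tokens (the `capabilityToken` property is made up);
// the interception below is plain JavaScript that any dependency could run.
const stolen = [];

// Malicious code that happened to load earlier in the same realm:
const realSet = Map.prototype.set;
Map.prototype.set = function (key, value) {
  if (value && value.capabilityToken) stolen.push(value); // exfiltrate
  return realSet.call(this, key, value);
};

const realFetch = globalThis.fetch;
globalThis.fetch = function (input, init) {
  if (init && init.capabilityToken) stolen.push(init.capabilityToken);
  return realFetch.call(this, input, init);
};

// Innocent code elsewhere now leaks its tokens just by using the built-ins:
const tokens = new Map();
tokens.set("net", { capabilityToken: "cap:net:example.com" });
```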
It is unlikely that freezing all intrinsic prototypes and objects is even enough. People will find ways to exfiltrate tokens.
Yup, SES would address this. But SES also needs to bring with it a paradigm shift for JS:
a) Folks would have to load each bit of code they want separate permissions for in a separate compartment. This won't be easy.
b) Runtimes will need to provide an immutable global realm, which is not the case today.
As I said in a different comment, I think a lot of this can already be addressed by ShadowRealms. Deno will likely allow users to specify per-ShadowRealm permissions, which is probably as granular as most people will want to get.
There's Endo, which does the loading/importing and dependency resolution. It doesn't have a default user experience at this point, but it creates compartments for packages and runs them.
Getting permissions involved will require mapping them to packages in a policy file and there you go - an environment where you can use packages and they can't surprise you with data exfiltration etc.
There's LavaMoat, which enables using SES confinement around normal npm packages by creating a policy file for what can be imported/required by that module (and it can auto-generate a suggested policy file from what appears to be used, which fails toward greater restriction/security and can easily be expanded):
https://github.com/LavaMoat/LavaMoat
Hi there, TC-39 delegate and MetaMask co-founder here.
SES does address this, and strives to achieve "object capability security", wherein access to a function is equivalent to permission to use it.
One difference between an object capability approach and the capability-token approach described in the OP article is that in an ocap approach, you have no need to pass around a capability token just to hand it to the restricted methods: instead, you simply disallow importing modules by default, and pass any restricted methods into the modules that you want to have access to them. I find this approach greatly more ergonomic, and if you ever want to further restrict a function, you don't need a new token, you just write a closure with your own policy defined in it!
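For example, a rough sketch of what I mean (the names here are made up for illustration, not SES or Endo APIs):

```js
// Instead of handing modules a token, hand them (attenuated) functions.
function attenuateFetch(realFetch, allowedOrigin) {
  // The closure is the policy: only requests to one origin are permitted.
  return (url, init) => {
    if (new URL(url).origin !== allowedOrigin) {
      throw new Error(`fetch to ${url} is not permitted`);
    }
    return realFetch(url, init);
  };
}

// A module that needs the network receives only the capability it needs:
function makeLogShipper({ fetch }) {
  return (msg) =>
    fetch("https://logs.example.com/ingest", {
      method: "POST",
      body: JSON.stringify({ msg }),
    });
}

const shipLog = makeLogShipper({
  fetch: attenuateFetch(globalThis.fetch, "https://logs.example.com"),
});
```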
By the way, we've developed a tool called LavaMoat that allows applying SES security to existing npm modules, no token-passing needed, by restricting the environment of each module per a policy file.
https://github.com/LavaMoat/LavaMoat
> It is unlikely that freezing all intrinsic prototypes and objects is even enough. People will find ways to exfiltrate tokens.
This is probably true, but frozen intrinsics would make it a _lot_ harder. Right now it's not reasonable to ask a library to be defensive against capability exfiltration, since it means not using any built-ins, but I think with frozen intrinsics it would be reasonable to treat a library leaking its capabilities as a security bug. There would still absolutely be leaks - most significantly in libraries which export classes and don't freeze the class prototype - but things would no longer be completely insecure by default. It would make malicious code have to work a _lot_ harder.
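For example, a hypothetical library like this would still leak under frozen intrinsics unless it froze its own exports:

```js
// Hypothetical library code. Frozen intrinsics don't cover prototypes the
// library itself defines, so this class is still patchable by anything else
// loaded in the same realm.
export class DbClient {
  constructor(connectionString) {
    this.connectionString = connectionString; // the capability
  }
  query(sql) {
    /* ...connect using this.connectionString, run sql... */
  }
}

// Malicious code elsewhere in the dependency tree could do:
//   import { DbClient } from "some-db-lib";
//   const realQuery = DbClient.prototype.query;
//   DbClient.prototype.query = function (sql) {
//     exfiltrate(this.connectionString);
//     return realQuery.call(this, sql);
//   };

// Treating the leak as a security bug means the library has to do this itself:
Object.freeze(DbClient);
Object.freeze(DbClient.prototype);
```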
I think it's worth a shot. Deno removed the __proto__ getter/setter, and that did require a bunch of libraries to update, but it worked out OK.
Node already has --frozen-intrinsics, if anyone feels like experimenting with whether that would break your code.
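A quick way to try it; the flag is experimental, so exact behavior may vary between Node versions:

```js
// check-frozen.js, run with: node --frozen-intrinsics check-frozen.js
"use strict";

try {
  // Polyfill-style mutation of a built-in prototype is the most common
  // thing that breaks under frozen intrinsics:
  Array.prototype.last = function () {
    return this[this.length - 1];
  };
  console.log("intrinsics are mutable:", [1, 2, 3].last());
} catch (err) {
  // With the flag, this assignment throws a TypeError in strict mode.
  console.log("mutation blocked:", err.message);
}
```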
Yeah, I agree it is definitely worth trying! I think all the talk around SES will push JS as a whole further towards something that could support capability based permissions securely in the future.
These seem like features that could work a bit better in the context of WASM modules and/or components. AIUI, WASM was designed with the expectation that support for capabilities would be required.
Your input is appreciated. But to anyone not using JS on a daily basis, the above reads as a blur of detail, rather than a clear, actionable message about what to do about the overall situation.
So I'd be curious to know if you have a holistic response to the primary claim made by the post, which is:
The fundamental problem with npm is that any package you install has full access to do whatever it wants on your computer.
> The fundamental problem with npm is that any package you install has full access to do whatever it wants on your computer.
Let me try to answer more concisely at a higher level:
Due to JS being a very dynamic language, it is at this time not possible to give different packages inside of the same JS runtime a different set of permissions or capabilities securely.
Because of this, the best we can do for sandboxing right now is permissions / capabilities that are set per JS runtime, rather than per dependency / module.
This is what Deno does. It allows you to set capabilities for your code and all of its dependencies at once. If you want to run a certain module with lesser permissions, you need to move it into a separate JS runtime, either by running it on a separate thread with a web worker, or in a separate Deno process. Both web workers and subprocesses can have different (lesser) permissions than their parent.
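For example, something like this; the `deno.permissions` worker option is the mechanism, though its exact shape has changed across Deno versions, so treat it as a sketch:

```js
// main.js (Deno): giving a worker fewer permissions than its parent.
const worker = new Worker(new URL("./untrusted.js", import.meta.url).href, {
  type: "module",
  deno: {
    permissions: {
      net: false,         // no network access at all
      read: ["./data"],   // read-only access to a single directory
      write: false,
      run: false,
      env: false,
    },
  },
});

worker.postMessage({ task: "parse", file: "./data/input.json" });
```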
it's been fascinating watching the JS people reinvent the concept of an OS from scratch. Today it's the security model, yesterday it's the task scheduling, etc.
Interesting little bit of outsider art, "what if front-end people designed an OS" and all, and it's going about as well as you'd expect with that.
I mostly wouldn't mind, but there's certain thresholds that I think are noxious or dangerous as a user. WebGL strikes me as much more dangerous than most people anticipate, GPU drivers are not really hardened against hostile shaders very well, and it's quite likely that various escapes and data-leaks exist imo. Someone I know worked on implementing that and said that, of all the graphics vendors, none of them liked it, but the blue one's engineers had genuine fear in their eyes when the functionality was laid out to them.
I am also not really looking forward to when javascript realizes the need for persistent services... but thankfully I think web architecture mostly means that's somebody else's computer.
This comment displays a lot of ignorance and prejudice.
"JS people" have not been reinventing much at all. Operating Systems aren't even close to offering what "JS people" needs at the granularity they would benefit from. By that measure, the same complaint can be made about the Java or .NET virtual machine, for example.
If anything, what the "JS people" of today are trying to fix are all those "dangerous parts" that were added with abandon by browser makers and OS people and will probably never be fixed in our lifetime.
> I am also not really looking forward to when javascript realizes the need for persistent services
It might surprise you, but Node.js works in the backend from day zero, and Service Workers have been working in the frontend for several years now. You're more than a decade behind the times.
Exactly, I'm not sure what people are trying to fix here. They want code they import in their projects to be sandboxed so they can import untrusted code. That seems a bit f*cked up to me. Running untrusted apps is already pretty hard; running apps where some code is trusted and some not seems... a bad idea.
IMO since the problem is generic to all languages, we should have a generic solution, using standard sandboxing techniques (containers, VMs, jails, etc)
Makes me think of Plan 9 and its almost religious use of the 9P protocol to control what resources are available to a process. Or perhaps the Erlang VM with communication via messages. Composing systems in this way is very different than adding untrusted code to your application but seems like an interesting alternative approach.
I can't possibly come up with a reason WHY (Maybe Python's standard library is that much better than Node?), but Python dependency trees just tend to be so much simpler. If I "pip install $PACKAGE", it's usually only a couple additional dependencies that get included unless it's one of the huge ones like numpy. If I "npm install $PACKAGE", I can expect 50+ additional dependencies to get added.
For some reason, Node engineers are much more likely to "npm install leftpad" than write the 2 lines of code it takes to implement string padding.
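For reference, the couple of lines in question look something like this:

```js
// Give or take a line, the whole of left-pad:
function leftPad(str, len, ch = " ") {
  str = String(str);
  while (str.length < len) str = ch + str;
  return str;
}
// Or, since ES2017, just: String(str).padStart(len, ch)
```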
The problem in Javascript is localized. Lots of extremely important foundational packages like React, Vue, Typescript, and even ones like Prettier don't really have lots of dependencies.
The biggest culprits of the craziness are things like Babel and Webpack, which require tens or sometimes hundreds of packages for a basic install. Some of those aren't really needed in Python, for example. But in JS there are alternatives to them.
On the other hand, some foundational packages in languages like Python and Ruby have a lot of dependencies. I don't know about Python much, but Rails also has a large dependency graph, despite Ruby having a good standard library.
The JS standard library is very slim. I think this is both its greatest strength and weakness, and to your point, makes it likely that people just install some random package vs. solve the problem themselves. I'm not sure which is better philosophically, but personally, having done a lot of PHP dev in the last decade, I find the JS code I'm writing to be easier to read but much harder to understand and run from a package management perspective. I think the developers working on solutions in JS would probably behave very differently if they were given more built-in tools to work with.
the same is true of java - less so in newer versions, but older versions the stdlib was pretty slim and low-level and there was a massive amount of library code written to wrap around it.
it still doesn't devolve to the javascript phenomenon of one-file (or one-function) libraries and so on. Dozens of libraries, sure, hundreds, maybe, but nobody actually has thousands of dependencies like you do on node.
Apparently "number of repos maintained" is a KPI so there's some gamesmanship there. Maybe it's also being used as a caching thing, smaller files might be more cacheable if you don't minify?
More generally though this may be the result of that "enterprise culture" that is sometimes looked down on in other situations, that at least stuff is getting bundled into appropriate packages for distribution vs just chucking every single file into a node package.
Dynamic languages such as JS allow you to override almost anything. It’s essentially impossible to sandbox “part” of JS code, you must either sandbox the entire runtime (e.g. in browsers) or allow everything (e.g. in node.js).
Individual packages want to run arbitrary JS code to do “safe” things during installation. But it’s very hard to allow them to do anything meaningful like file access, even if it’s “safe”, without potentially allowing exploits.
Not the OP, but the approach sounds roughly similar to SE linux, but for node.
A major problem with a half-secure security solution is that it's not actually secure. You might do everything right, and still get owned. For the threat model the solution needs to be complete (or able to be complete, but turned down to less than complete by the user*).
Part of the way this manifests with SElinux is that to have a fully locked down box with SElinux you have to consider access controls for _everything_ on the box on top of regular unix permissions. And to actually be a full solution, you have to install kernel headers because anything user-space isn't good enough to guarantee full security.
For node to make a similar guarantee locking down everything means turning off a lot of the features that allow javascript to be so dynamic or making major changes to the run-time implementation to support being able to use those features without making the security swiss-cheese.
And then, even though the system is securable, there's the issue of turning it on for all the modules you use. And not just the modules you use, but the modules your modules use and on and on. You could delegate module-to-module permissioning to the module you import, but that means either granting overly broad permissions to the module and hoping they don't screw something up (and that any module they delegate to does the same) or not delegating anything and personally granting explicit permissions to every module, no matter how deep in your hierarchy of requirements. If that sounds exhausting, it kind of is.
In SElinux's case what that leads to instead usually is rearchitecting things such that you can use virtualization to sandbox things and limiting access across the sandboxes (ohai, it's the deno solution). There's still places where a VM or other sandbox isn't appropriate though and if you want to be on linux you have to use SElinux (or a competitor, but i've only touched selinux), and it's not uncommon to have an entire team whose only job is to configure SElinux and support other teams that interact with it (eg, coaching them on how SElinux interacts with their codebase and what they need to tune, auditing teams' SElinux configs, and keeping SElinux working for the base system as that gets upgraded). And if you screw up or get lazy with auditing permissions, you've just limited the effectiveness of SElinux, possibly rendering it useless.
A large part of the reason SElinux is so hard to use and use right (and that directly translates to js and node) is that it's attempting to bolt-on security to an existing system that wasn't designed with (that kind of) security in mind. That's a monumentally hard thing to do in a way that doesn't require rewriting everything that uses it. And not having to rewrite everything is a hard requirement, because if you're going to rewrite everything that uses it, it's usually cheaper and easier to just make something new from scratch (in SElinux's case a new OS, in js's case a new language).
So holistic options:
1) remove all the dynamism that makes javascript javascript. this (potentially) breaks all existing code. Call it rustscript and get that to ship in all the browsers, and get all the websites to use that instead of javascript, and then make a serverside environment for rustscript. Now you can do fine-grained module permissioning.
1.5) remove only the dynamism that breaks this sort of security access control as part of a new ECMAScript spec and add support for the security access control at the same time. This breaks existing code, but the old code can still run in a runtime for the earlier spec. New code can take advantage of the new spec features. Old code can be modified to work with the new features. Broken code can be rewritten to the new spec. This makes rustscript ESNext. It will be up to the various runtimes to support this new spec, so nodeNext will have support for it but it won't get backported. Browsers will require transpilation from ESNext to an earlier ES version as they do now, but eventually even they would drop support for the older js versions.
2) accept that module permissioning systems are easy enough to get around in JS that anything attempting to implement them is at best security theater. The deno solution isn't security theater, but that's because it makes much less stringent guarantees (ie only runtime granularity and not module granularity).
* why allow the user to make themselves insecure? In some cases, the user will choose to be less secure for some external reason or will be using the security solution as a part of a more holistic security solution, so some other part guarantees the security that is given up.
That's a pretty extreme way to say it. Python is in essentially the same boat, as is bash, Ruby, PHP, dart... Calling languages "broken" is to say that one hard problem makes them unsuitable for use, which is hardly true.
I think the main concern here is that it isn't that the language is too permissive -- but that the primary installer (and what many people cite as one of the greatest strengths of JS, largely responsible for ushering in its Golden Age) -- is structurally (and perhaps irreparably) insecure.
Whether this is really so (and in a significant way compared to its other competitors, out in dynamic programming land) -- is what people seem to be trying to suss out in this thread.
Javascript was designed and intended to be used within the context of a browser to manipulate the DOM and script webpages. It is unsuitable for any other use.
The fact that a package you include in your webpage/app pulls in a tree of hundreds of others, any one of which can 'dynamically' overwrite/redefine other pieces of the namespace with impunity, is why it's broken in the first place.
ergo it is insecure by design for all uses, and unsuitable for anything important relying on code you wrote doing what you expect.
It's fair to say it's broken in a world where what we do in webpages MATTERS. It was fine in the past where nothing truly important happened in web pages, but that ship has sailed.
Based on your use case, it may already be solved. Microservices, for example, are put in a container, which allows you to give read-only or full access to selected files. They also usually have some kind of mesh sidecar container that limits connectivity to whitelisted containers or public DNS. And since containers can limit memory and throttle CPU, you are more or less at the best you can get when using untrusted 3rd-party code.
Yup, SES is very interesting, and I think it can solve this problem in the future. As this post is about the here and now though, and SES is not yet ready for widespread adoption, I think my point stands that at this current time it is not possible to securely do capability based security in JS.
Hi, first thanks for your work both in TC39 and Deno.
But that's not the heart of the problem for node. The problem with the node ecosystem is a combination of a refusal from Core to provide a meaningful standard library (a set of packages people can trust, so they don't have to download random things from the internet just to parse the body of a multipart request, for instance) and the fact that NPM was badly architected: it fetches as many versions of the same package as it needs to resolve dependencies. Good package managers don't do that, period. Of course, given the dynamic nature of Javascript, anybody can monkeypatch anything. But the language itself isn't at fault, it REALLY IS both the politics of a paper-thin standard library and bad package management with stupid dependency resolution.
And frankly, Node.js should stop shipping with a package manager whose infrastructure is entirely controlled by Microsoft. It's so bizarre how nobody seems to mind that in the Node community... NPM's servers aren't open source.
The notion of 'securing part of a runtime' is essentially a fallacy.
'Sandboxing' anything really is quite hard.
If you want to make a firewall between blocks of untrusted code, they have to run completely isolated from one another, which creates a lot of overhead.
I don't see any way to mix untrusted with trusted code.
If some of your code is untrusted, it's all untrusted.
We probably need an operational solution to this, which is: you pay a fee to some org that literally sits there and pores through the code to look for hacks, and there's some verification & oversight as to who is submitting what, etc.
It may very well be that the days of regular open source are over, there might have to be some changes to it.
I can imagine, 10 years from now, all of you telling junior devs about the 'good old days' when people just randomly put some code up on a site, and you used it! They will hardly believe you. "That's crazy, a few lines of bad code could wipe out your company!".
Why not just freeze anyway? At the end of the day you don’t know what you don’t know. Deno is well positioned to make these sorts of restrictions, how often is someone doing something as mental as modifying prototypes in a nodejs environment anyway?
(Moment comes to mind actually, but does it really matter? That library is deprecated anyway)
Part of the draw of deno is that for better or worse it's javascript. If you start changing things about the language used in the deno runtime such that it's no longer compliant ecmascript, then you no longer get the benefits of it being ecmascript. The devs' mental model of the language isn't a drop-in, libraries and modules, including popular ones, are no longer guaranteed to work out of the box, etc.
You might say that all that is worth the benefit, but in that case why not just use another language that already gives you the feature you want or why stop there? Why not also fix other issues with javascript at the same time since we're no longer preserving compatibility? Something mental like the automatic type coercions? Or getting rid of var?
Deno has already made similar changes, like https://github.com/denoland/deno/pull/4341. That particular change happens to be allowed by the JS standard. The change discussed here isn't currently allowed, but I suspect TC39 would be open to making it allowed (though obviously it would not be allowed for browsers, in the same way the linked change to __proto__ is not allowed for browsers).
If you change something that most code isn't relying on, most code will still work. This change is plausible because it's very rare for code to be mutating built-ins. That's not true for most other possible "fixes". And most other changes would not have a benefit to consumers of the application (who cares if the library you're using has `var`s?), so they're much less well motivated.
I wouldn't consider that a similar change, because it's fully compliant with a newer js spec, which deprecated that feature. It sounds like, what deno has done is removed native support for older js specs and instead makes you transpile to an earlier spec. And by removing support for the earlier specs, they are able to drop support for deprecated features.
"allowed by the js standard" is key. As long as it's allowed by the standard they're still fully compliant with it. The compliance is necessary because no one wants to deal with "mostly compliant". Users want certainty, so "mostly compliant" becomes "fork their spec and make your own, so i can know what guarantees you make". That's why each change to the spec results in a new version. It's a self-fork of the previous spec.
If the change made it into the ts or js spec, I'm sure they'd add in support for freezing prototype chains, even if just as an option that can be toggled, but I doubt they'd ever want to break the spec just because they don't like parts of it. That opens the door to more changes because they don't like the spec, and eventually you have a new language, or worse, the original language changes out from under you to support something similar in a newer spec (eg typescript and namespaces/modules).
> That's why each change to the spec results in a new version. It's a self-fork of the previous spec.
That's not how it works, no. There is just the spec [1], which is updated frequently. I am editor of the specification. (There are annual editions as well, but no one should pay attention to these.)
> I'm sure they'd add in support for freezing protoype chains, even if just as an option that can be toggled, but i doubt they'd ever want to break the spec just because they don't like parts of it.
Well, like I said, if Deno's only concern is breaking with the spec here, I expect the spec could be updated to allow this behavior.
>>> It is the fourteenth edition of the ECMAScript Language Specification
Fine, then sed s/version/edition/g
From the point of view of a maintainer that makes sense, but for the users, each annual edition or feature moving into stage 4 is its own spec version, and unless the feature is absolutely groundbreaking, thinking in terms of annual editions makes it possible for users to grab various tools with confidence that things will work together smoothly.
Look at how Mozilla interacts with the spec [1]. They're not thinking in terms of the nightly version of the spec. They're looking at the annualized editions and making sure they support them as fully as possible. And then they communicate that support to their own users in terms of that annualized edition.
V8 consumes from nightly, using the up-to-date test suites, and they explain their reasoning here [2], but notably they're still conceptualizing things to their users through the lens of annualized editions even though they're still grabbing features when they're only at stage 3. They even tag blog posts about js with tags for the annualized editions that added the feature: [3].
As a user of the users of the spec, the annualized editions are super super helpful. I can only use features that have actually been implemented. And I have to make sure that each tool I use is only getting code that uses features it's implemented. Can you imagine if every tool had feature-by-feature implementation matrices? "OK, ESlint understands new features A, B and D, but babel only implements B, C, and D, but library X doesn't support feature D yet, so since we want to use that we have to use bluebird instead of the native feature D for now" and on and on. It'd be madness, and we'd end up picking a handful of tools we like enough and then transpiling everything to es5, because our own users don't actually care whether we transpiled down to es5 or shaved enough yaks to realize that our toolchain natively supports feature B but everything else must be transpiled out or have the native implementations replaced with our own in-code implementation. Instead each tool picks an annualized edition, and while slower tool release cycles can be annoying, I can actually turn that guarantee of standardized features into a toolchain with transpilation steps as necessary, which means as a dev I know I can safely use any feature in that annualized edition without worrying that some feature I haven't really used before is going to blow up somewhere in my toolchain because the implementer hasn't gotten around to implementing it yet.
So while I like looking at the draft proposals and consider it important to know how the language is evolving, I have to wait for implementations to permeate enough of my tools before I can use the shiny new feature, which means annualized editions of the spec.
Mozilla is absolutely thinking in terms of the nightly version of the spec. I agree that public messaging sometimes talks about annual editions, but this is mostly because it's a convenient way to talk about when features were added to the language, not because it reflects any underlying reality.
Anyway, that's not really the relevant thing. What I'm addressing is:
> deno has done is removed native support for older js specs and instead makes you transpile to an earlier spec. And by removing support for the earlier specs, they are able to drop support for deprecated features.
And that's just not a thing. That has no relationship to how the ES specification works. The __proto__ accessor was never mandatory; it was added in browsers a long time before it was specified, and then specified as optional (Annex B) when it was first added to the specification, and has been optional since then. This is true whether or not you think of there being a single specification or annual editions.
So, with that said, to address the specific topic of annual editions:
> "OK, ESlint understands new features A, B and D, but babel only implements B, C, and D, but library X doesn't support feature D yet, so since we want to use that we have to use bluebird instead of the native feature D for now" and on an on.
That's exactly how it works. eslint implements proposals at stage 4. Babel implements proposals as they come out and people contribute, but only adds them to preset-env at stage 3. The output for preset-env is based on what's actually supported in the browsers you're targeting. They both take a variable amount of time to land features once they hit the appropriate stage. Neither of them gates anything on annual editions. Neither do browsers. And browsers will frequently not have implemented features from multiple editions ago; for example, regex lookbehind was added in ES2018 and is still not implemented in Safari.
If it was 2000, or even 2010, I'd agree with you. But that ship has unfortunately sailed. People modify the prototype all the time. If we were to freeze all prototypes right now in JS, a lot of existing code would break.
Why not lock the core libs like Map.prototype and Array.prototype? Can't we have a sandbox-like environment (which is not the default but which can be enabled on a per-application basis)? Java Applets which ran inside the browser had this kind of sandbox restriction.
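I'm imagining something in this spirit; a crude sketch, and SES's lockdown() or Node's --frozen-intrinsics do a far more thorough version of the same walk:

```js
// Shallow-freeze a handful of core constructors and their prototypes.
// This misses a lot (iterator prototypes, getters, the globals, ...), which
// is why real implementations like lockdown() are much more exhaustive.
"use strict";

for (const intrinsic of [Object, Array, Map, Set, String, Function]) {
  Object.freeze(intrinsic);
  Object.freeze(intrinsic.prototype);
}

try {
  Map.prototype.set = function () { /* exfiltrate */ };
} catch (err) {
  console.log("patch rejected:", err.message); // TypeError in strict mode
}
```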
The impression I get from the companies I've worked at is to not trust packages that are mostly maintained by one person. Ironically, those packages are usually the ones that don't have outrageously large dependency trees and can usually be audited by a developer on a weekend, and the more "trustworthy" larger packages with hundreds of maintainers typically have monstrous dependency trees.
Not saying 1-man tools are inherently better, but in my experience, these tools seems to have a tighter focus and less chances of scope/dependency creep.
Node has a peculiar place in the pantheon for me where there are a few well known individuals who are maintaining a mountain of code that is still mostly theirs even with external contributions coming in.
TJ in particular, before he fucked off, and Sindre Sorhus, who has a little kingdom of tools in the Unix philosophy but more useful than leftpad.
I probably use code from others but those are the only two I am keenly aware of.
When I find an issue with one of the little packages Sorhus publishes, I often need to traverse 3-5 different GitHub repos to understand how the code works. Many times I find it better to copy paste the “base” code into my repo once I find it than continue to use the web of NPM packages.
>can usually be audited by a developer on a weekend
why are devs expected to do this on a weekend and not just as part of the work week? are we coming from the perspective of the dev working on a side project?
The big problem with one-person packages isn't so much security as it is support. I have been burned more than once by old applications where key features rely on random packages with one maintainer who disappeared years ago. At least with a group, you have options to keep things moving without having to fork the library yourself.
(Of course the root cause here is arguably too much reliance on third-party dependencies, but searchable dropdowns are _such_ a pain to make on your own, and it's so tempting...)
The Sangria GraphQL library in Scala ran into a version of this. The libraries were primarily maintained by one person, who wrote the vast majority of the code and was the only person with write privileges in the main repos. Sadly, he passed away unexpectedly, and it took months (maybe a year or so) before his colleagues and other contributors were able to get access to the GitHub org.
Well, for what it's worth, we have a lot of dependencies maintained by Microsoft of all companies, with lots of production-breaking bugs that they're not too interested in fixing or letting us fix. Even getting fully-functional PRs (with good test coverage and community support) looked at takes a lot of work and time, let alone getting fixes after reporting issues.
One of those packages is a JS package that is hosted by them, so we can't even fork it and host ourselves.
On the other hand, with simple packages that get abandoned, we just fork, publish ourselves with another name or namespaced, and it's solved.
Solo maintainer vs. organization is definitely an imperfect heuristic for long-term support. But it's a decent approximation for dependencies that are low ROI but potentially high impact if they break, like a UI widget that gets used everywhere in your app.
It's the problem with any third-party dependency (ask anyone who's used certain Google products). But then if you build everything in-house, a) it's expensive, and b) you end up with homegrown frameworks written by somebody who left the company five years ago and now everyone is afraid to touch it.
The laws of software thermodynamics come for all of us. Eventually, old systems decay, and you need to roll up your sleeves and do the work to keep them going.
> But it's a decent approximation for dependencies that are low ROI but potentially high impact if they break, like a UI widget that gets used everywhere in your app.
Not really, it's not decent at all. What is a great approximation, however, is the heuristic presented by the grandparent poster: projects that are easy to audit, easy to fork (if necessary) and don't have outrageously large dependency trees. Everything else is a liability.
1-person tools make it easier to audit the person involved, and like you said in general I also find that those tools tend to be smaller and have fewer dependencies, which makes a big difference for security. Limited scope is good when looking at packages/dependencies, and side-projects are kind of required to have limited scope just by virtue of not having a ton of resources.
However, 1-person tools also have less accountability and less bandwidth to respond to emergencies, and (particularly if there aren't a lot of eyes on them), you need to evaluate whether the developer is qualified to build the package -- ie, are they likely to inadvertently introduce a security vulnerability or abandon the package if it needs security updates? People (myself included) have an instinctive bias to assume that 3rd-party code is written by people who know what they're doing. Part of the evaluation process needs to be asking, "does this person actually have the skill to do what they're trying to do?"
I also try to look at how documented the project is -- in an emergency, could I fork the project myself? If a project is being run by only 1 person, then it's more likely that the codebase isn't massive and that it wouldn't require a full team to update. But it's also less likely to be well-documented or extensively tested. Again, balancing act.
I'm not sure there's a single correct answer, I think it depends a lot on the project and on what kinds of packages you're looking at.
Unfortunately, as recent incidents have shown, these many-maintainer, otherwise reliable projects often have dependency chains that include these one-person, one-breakdown-away projects as dependencies.
Would I ever include left-pad as a direct dependency? No. But as we found out, the person who provided a library that was used by react-dom might.
> The impression I get from the companies I've worked at is to not trust packages that are mostly maintained by one person.
That is hardly the point. Regardless of what you think of the solution presented, the author is utterly right in saying any solution has to involve not trusting any packages at all. How many people wrote the package is irrelevant.
Founder of Socket (https://socket.dev) here, a new tool built by npm maintainers to help solve JavaScript supply chain security.
I totally agree with the idea that we should assume all open source packages may be malicious. Socket.dev uses "deep package inspection" to characterize the behavior of an open source package. By actually analyzing the package code, Socket can detect when packages use security-relevant platform capabilities, such as the network, filesystem, or shell.
For instance, to detect if a package uses the network, Socket looks at whether fetch(), or Node's net, dgram, dns, http or https modules are used within the package or any of its dependencies.
This entails running static analysis (and soon, dynamic analysis) on a package – and all of its dependencies – to look for specific risk markers.
In this way, Socket can detect the tell-tale signs of a supply chain attack, including the introduction of install scripts, obfuscated code, high entropy strings, or usage of privileged APIs such as shell, filesystem, eval(), and environment variables.
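To give a flavor of what a "risk marker" looks like in practice, here's a deliberately simplified, regex-based sketch (the real analysis is far more involved), so treat it as illustration only:

```js
// toy-scan.mjs: walk a package directory and flag references to
// security-relevant Node APIs. Naive on purpose: it misses dynamic
// requires, aliased imports, bundled/obfuscated code, and more.
import { readdirSync, readFileSync } from "node:fs";
import { join, extname } from "node:path";

const RISK_MARKERS = {
  network: [/require\(["'](https?|net|dgram|dns)["']\)/, /\bfetch\s*\(/],
  filesystem: [/require\(["']fs["']\)/, /from ["']node:fs["']/],
  shell: [/require\(["']child_process["']\)/],
  eval: [/\beval\s*\(/, /new Function\s*\(/],
};

function scan(dir, findings = {}) {
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const path = join(dir, entry.name);
    if (entry.isDirectory()) {
      scan(path, findings);
    } else if ([".js", ".cjs", ".mjs"].includes(extname(entry.name))) {
      const src = readFileSync(path, "utf8");
      for (const [risk, patterns] of Object.entries(RISK_MARKERS)) {
        if (patterns.some((p) => p.test(src))) {
          (findings[risk] ??= []).push(path);
        }
      }
    }
  }
  return findings;
}

console.log(scan("node_modules/some-package"));
```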
We are taking an entirely new approach to one of the hardest problems in security in a stagnant part of the industry that has historically been obsessed with just reporting on known vulnerabilities.
This, in my opinion, is the right answer for the problem identified in the parent blogpost. Rather than trying to get every single package author to adopt some unified capability token scheme in their code, just statically analyze all dependencies from the outside and report the capabilities they actually use.
It would be even better if something like this could be integrated directly into the package management tool itself, so that you could run `npm update` and get back "New dangerous API usage in package X version a.b.c: filesystem access. Type package name to acknowledge and upgrade."
> It would be even better if something like this could be integrated directly into the package management tool itself
We're planning to build this. However right now, the primary way to consume Socket.dev data is through our GitHub app (https://socket.dev/integrations).
What prevents a malicious person from crafting their code until it evades your analysis? It's the same with antiviruses. They're not that useful because adversaries adapt their viruses to pass antivirus heuristics. And, as viruses show, you can make your heuristics however complex you want, and someone smart will find a way around them. Especially in that wild JavaScript environment.
This is a fair question. The answer is that most malware behaves in ways that are deterministically detectable. For example, 93% of malware uses install scripts, which must be declared in the package.json file and are not possible to hide from our analysis.
From recent research:
> We found 93.9% (3,412) of malicious packages had at least one install script, indicating that malicious attackers use install scripts frequently [1]
When malware authors adapt and start doing fancy dynamic stuff, we might not be able to figure out exactly what they're doing, but we can detect the usage of obfuscated code, dynamic requires, and other signals of compromise.
I think Go modules version resolution opting for lowest common release rather than the standard highest is a reasonable and sane option though not a total solution. It prevents users of your library from pulling unvetted versions of your dependencies just by pulling your library alone.
This won't prevent packages you're already giving capabilities to from later doing something evil in a small patch update.
So it's not enough - you should also not rely on "automatic security updates" through some sort of semver trust. Lock your dependencies up completely, choose packages with a small dependency tree or zero dependencies, and at the same time use the newly launched https://socket.dev to know what packages do. Also reading through the source of your dependencies should be on the list.
> This won't prevent packages you're already giving capabilities to - to later do something evil in a small patch update.
True - it's not perfect! But the principle of least privilege should help limit the blast radius. How many packages in an npm dependency tree need access to the filesystem? Or to the network? I bet it's a vanishingly small percentage of the packages in the average nodejs project. Being vulnerable to malicious code in 3 hand-selected packages is much, much better than being vulnerable to malicious code in any of the packages in your dependency tree.
And even then, we can be very specific about what those packages have direct access to. Right now the situation is "every package can read and write to any file on my computer". The fine grained permissions I'm proposing in this post would let us say "only package X can access the filesystem at all, and when it does it only has access to this subdirectory".
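Purely as a strawman for what I have in mind (npm has nothing like this today), the configuration could look something like this, written as an annotated JS object since package.json can't hold comments:

```js
// Hypothetical per-package permissions in a package.json-like manifest.
// Package names and the "permissions" field are made up for illustration.
const packageJson = {
  name: "my-app",
  dependencies: {
    "image-resizer": "^2.1.0",
    "left-pad": "^1.3.0",
  },
  permissions: {
    // Only this package (and nothing else in the tree) may touch the
    // filesystem, and only inside this one directory:
    "image-resizer": { fs: ["./uploads"], net: false },
    // No capabilities at all: pure computation only.
    "left-pad": {},
  },
};
```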
Thanks for sharing Socket.dev! Totally agree that a key part of any supply chain security strategy must be understanding what packages actually do when they run.
For example, see the package `angular-calendar` which is a calendar/date picker web component. When you look it up on Socket.dev [1], you'll see that it actually uses:
- Install scripts
- Telemetry to track you
- Network access
- Shell access
- Environment variable access
- File system access
Which is waaay more capabilities than you'd expect. All of these capabilities turn out to be caused by a single dependency which implements telemetry to track the package usage a la Google Analytics.
Wow.. Yeah that's a great example of exposing what's actually going on!
Btw, is there a specific reason you're not listing the "yellow issues" from a package's dependencies on its front page? For instance https://socket.dev/npm/package/mongoose doesn't really show anything on the front page, but if you go to "dependency issues" you get "uses network, eval, etc.". I think it'd be necessary to treat "dependency issues" the same as the package's own issues.
We're not happy with the noisiness of filesystem and network issues, so we mark them as a bit lower priority for the moment. The specific issue is that we currently only detect when 'fs', 'net', etc. are required and not whether they're actually used and which specific functions are used.
We're working on improving our analysis and are close to shipping a big update at which point we'll increase the severity of these issues.
Author here! I didn't even know that existed. Thanks for the link!
That looks like it has the same problem as Deno's solution, in that it's too coarse for my taste. I want to explicitly give permission to a library, not to the process as a whole. (Since I don't want some errant library deep in my dependency tree to nuke my production databases.)
I love the definitions of scope though - that looks like exactly the sort of thing that I want here.
> I want to explicitly give permission to a library, not to the process as a whole. (Since I don't want some errant library deep in my dependency tree to nuke my production databases.)
Doesn't Java have a SecurityManager feature that can do this? Perhaps we need a JS equivalent.
The trouble with SecurityManager is that it’s only as good as its widespread support.
If it worked, and was widely used, for example, nobody would have had to worry about the possibility of their logging library downloading code from an LDAP server and executing it.
SecurityManager was a bad idea. It might be useful to prevent accidental boundary trespassing. But it turned out too flaky to serve as a security foundation. Basically nobody uses it for security.
Good article. This is what I was excited about the first time I heard about Deno before I eventually learned what Deno's sandboxing model actually was. I'm not sure this is the exact proposal I would want, but I do want something vaguely like scopes or capabilities in Node, and even if it wasn't perfect I think it would go a long way towards mitigating at least some of the current risk in the ecosystem.
Also agreed that for all their use, it would have been better in the long run if install scripts had never existed. It's not just that they're a security vulnerability, they also get in the way of vendoring code, and can introduce additional non-JS dependencies and errors on other systems/platforms. Again, not to say that they don't have any use, I get why they're there. I just wonder if the benefits are worth the downsides.
I think that’s throwing the baby out with the bathwater. Wouldn’t it be logical that when people use snippets from Stack Overflow or classical algorithms from Wikipedia the pasted code was actually treated as a dependency so you can get warnings and updates in case errors or security issues are found? It also helps for proper licensing and attribution.
The problem is that code in npm should allow for trust. Either based on signatures, or code reviews by trusted parties, or something like that. Code signed by a long time trusted developer that’s been published for a month should not be treated the same as a 3 minute old commit to a repo by a mysterious developer. These verifications should be automated and npm could give a final ranking.
> I think that’s throwing the baby out with the bathwater. Wouldn’t it be logical that when people use snippets from Stack Overflow or classical algorithms from Wikipedia the pasted code was actually treated as a dependency so you can get warnings and updates in case errors or security issues are found? It also helps for proper licensing and attribution.
Why are you assuming devs don't just both use random snippets from Stack Overflow and also download packages with 900 transitive dependencies from NPM at the same time? It's not one or the other.
In the early days of node, the ability to have tiny nested modules without opening the gates to dependency hell was such a profoundly new and exciting capability that the community didn't need any convincing to (over-)embrace it. As with microservices today or CORBA in the 90s, moving your design's complexity from its nodes to its edges is a powerful way to convince yourself that you've made it simpler.
Could a path forward also be to unify the most depended upon small packages into one large dependency, managed by some trustworthy entity?
I guess some plug-in to npm could handle the resolution-mapping between the (9 line) strip-ansi package and the node-standard-library package?
Of course this doesn’t solve all problems, but if a create-react-app-style project could lower its number of dependencies by 80% or something, it would be easier to keep track of the remaining ones.
Because even if you have some kind of capability system, you are still pretty vulnerable to misbehaving packages. Even if the scope of badness is dramatically lower, chaos would ensue in many build pipelines if some of these “core” packages just started throwing exceptions / returning empty objects.
The tendency is to make a library for every function. Just look at the lib directory in the Slackware install tree. The functionality (of the distribution) has not changed too much, but the number of libraries needed is astonishing.
No packages or repositories deserve trust, at least not in the current state of affairs. There is no magic fix that will enable you to trust packages just by adding a new framework or anything similar, and what the linked article is addressing is only half the problem.
We also need a way to make packages auditable. Package signing by the publisher and the repository needs to be mandatory. Having an actual link between the package and the commit it was built on and a way to reproduce the build[1] also needs to be possible. This would allow for proper code reviews, not just of your own code but also audits of whatever extra components you are using.
Organizations need to define sensible thresholds for when you can use a package and when you implement the code yourself, to avoid adding a library only to use one function. They also need to define trust; what is needed to trust a package or a maintainer.
And we need a system and proper best practice to guide us on what to trust; do we really want to add a package with 100 direct or transitive dependencies, where some hasn't been updated for the past two years, some are maintained by solo developers and some are just implementing already existing functionality?
All of these are hard to implement and justify when the current situation seems to work for a lot of people.
TL;DR: There's nothing fundamentally less worthy of trust about node's supply chain than any other popular mainstream language ecosystem. They've all got some badness.
---
Here's a trope I'm tired of:
Take X general problem that affects a wide range of systems, attribute it to one narrow system Y. Usually because that system Y's general accessibility and success leads to a higher number of high profile incidents related to problem X.
One of the practical outcomes of this trope is people trying to solve this problem in narrow, ecosystem-specific, non-portable ways.
The issue is especially bad, and needs to be especially called out, for npm because of the insane dependency bloat.
You can write fully featured and useful python apps with just the standard packages that come with python. In JS you need a handful of third party packages just to tell if a number is odd.
A difference in magnitude becomes a difference in kind.
I think you have to appreciate the history with node to know why there are so many packages. Node’s growth coincided with GitHub’s and node’s community really adopted the “social programming” trope. You could really make a name for yourself with a popular node module. Javascript had more limitations to work around at the time, and then the desire to reuse code in node and the browser created even more need to abstract common tasks. The “modularize everything” philosophy resulted and it became a kind of game to make as many modules as you could think to make; after all, isn’t code sharing the joy of the FOSS revolution?
That era has since peaked and declined. Now I see people make way fewer modules because of the difficulty of managing them all. There’s much less cred to earn from a node package. People who did gain social capital from modules are now stuck as maintainers, gaining very little additional value — thus more conversation about paying maintainers with monetary capital, along with abandoned or ownership-transferred code. And, of course, we’re now suffering from the security issues.
It’s still an incredibly valuable corpus of modules, but it’s post-bubble. It wasn’t just “js programmers are too novice to know better.” It was people having fun, playing the social game, trying silly ideas, and chasing a meme-wisdom of programming (modules = good).
> In JS you need a handful of third party packages just to tell if a number is odd
But you absolutely don't.
I get that you're taking a worst-case example, but it also stands that if Node developers would actually take some time to write stuff themselves -- the leftpad situation was absolutely stupid, because `.padStart()` exists -- then the situation wouldn't be nearly as bad.
At my workplace, if you can write the functionality in (depending on the scale) a day to a week, then you're not allowed to use a 3rd party package for it. You can _look_ at what other people have done, but doing it in-house leads to less work overall in the future.
> the leftpad situation was absolutely stupid, because `.padStart()` exists
String.prototype.padStart hit Chrome in January 2017 and Firefox in June 2016. `left-pad` was published in March 2014.
A lot of these small packages that seem ridiculous now addressed (sometimes poorly!) missing aspects of the library specification for pretty good reasons. The JS specification has improved a lot over time--I just chuck `ESNext` into the library tsconfig.json for every Node project--but there is a lot of historical baggage with which this kind of dismissal doesn't adequately come to grips.
Node has horrific dependency bloat, but this is a strawman here because the right response is not "let's assume the dependency bloat is inherent/justified and come up with specific security mitigations" but rather "let's ask ourselves: WHY does Node have such terrible dependency bloat?"
I've no idea why, but I've some personal theories:
Similar to how PHP has historically been associated with bad code, not all of which can realistically be attributed to the spellings of its API identifiers, I think whenever you have a system that solves the ease-of-use/developer-accessibility problem well, you will end up with the standard of contributor to that system being less skilled, since it's easier for less experienced people to start using it. You see this throughout the NPM package ecosystem: packages developed by very inexperienced engineers being relied upon by big popular projects.
This is ultimately a "good problem". You strive to make your tools easy to use, and when you succeed, you end up with more people using them badly.
You can argue that tools should be both easy to use and also foolproof, but that's utopian. Let's work towards that but not expect it as a baseline.
> In JS you need a handful of third party packages just to tell if a number is odd
Maybe it's hyperbole, but you definitely don't "need" any third-party package to tell if a number is odd (`i % 2` does the trick). That there exists a package for it doesn't mean that the majority of users actually use it.
The standard library for JS is pretty small in general, but I don't think hyperbole is the right way of getting your point across, as I agree with you in general.
is-odd gets between 400k and 500k downloads every week[1]. Maybe you don't need it, but lots of JS developers have decided that they do. is-odd also depends on is-number, and is-even depends on is-odd.
I agree that these packages are trivially implemented by a first party instead of imported, but ~half a million JS developers every week choose to import it instead. _That is the problem_.
I'm fairly certain those download numbers include CI runs redownloading the not-cached package over and over, likely via a transitive dependency. I don't think it's fair to say approximately half a million unique JS developers are choosing to import it every week.
These numbers aren't too hard to rack up for well-known packages (even the meme ones like this). e.g. is-odd is a transitive dependency of stuff like handlebars-helpers which gets a lot of downloads and will pull in is-odd automatically.
People complain about these tiny packages they find on npm the whole time, but usually these packages come from people learning how to create their first npm packages, or creating or following tutorials. They aren't serious packages used by typical developers for production apps.
If you go to the isEven github repository you can even see "I created this in 2014, when I was learning how to program." If you hover over his 'organisation' you'll see the text "This is a joke. You'll only see this org if you are attempting to troll me about repositories I created when I was learning to program."
is-odd gets between 400k and 500k downloads every week[1].
The package was written as an exercise in learning how to create a small, useless package. But a huge portion of the JS development community chose to import and use the package anyway.
They're jokes or satire or learning repositories, like `install-is`: "Installing this package installs a bunch of useless packages" or Stalinsort or module-practice-january. The most serious looking dependents are by the same author.
This is absolutely not true, and I'm tired of seeing this.
is-odd, alongside a bunch of other microdependencies are almost all the work of one person, who made as many micropackages as possible and then PRd them into other more popular libraries. There are not 6 million people directly downloading `is-odd` a day. At all.
When this person could make one library to do something (like an ANSI-Colouring package), they would fractalise it into as many dependencies as possible, because that boosts their download count on NPM. I should note that this is just one person who has managed to nestle their way into some larger projects. I apologise for the spam, but this point really needs hammering home:
I think you're missing the point. These packages are stupid and too small, but they get millions of downloads a month. That's the problem with the JS community. Instead of rejecting a dependency for being silly, JS devs will happily import them to save two lines of code.
No. The JS community won't happily import stupid packages to save two lines of code, any more than devs in any other community. A very small minority of devs will do this, and publish their larger packages. Those millions of downloads a month are transitive.
This is the same kind of defense C programmers make regarding their beloved footguns.
That might be true in aggregate, but the exceptions are exceptionally bad, impact a lot of people, and then fools try to rationalize the footgun on HN.
The question is, are we seeing that bad behavior because there is something inherent about the platform that encourages it, or are we just seeing more of it with Node because that community is enormous?
Because people love to bring up is-even as a reason why Node & NPM suck. What exactly did NPM do to create the is-even situation? (other than making it super easy to publish). What should they do differently?
I'm old enough to remember the early days of Node and listening to a podcast (the name escapes me) with Isaac Schlueter explaining how node_modules isn't a hidden directory because you should vendor it.
The problem is that Node and NPM grew faster than people could be introduced to vendoring node_modules. Fast-forward a decade and it seems like people have forgotten all about vendoring and instead optimized for blindly shipping code from the Internet that's warrantied for no purpose.
The excuses why people don't vendor their packages are almost identical to the excuses people don't write tests for their code (i.e., time and velocity impact).
If I remember correctly, that library was a joke - which people immediately started extolling as "the right way to do it", also as a joke, and which, thanks to Poe's law, other people then genuinely understood as the right way to do it...
Those numbers are a little deceiving. It’s likely that those modules are upstream of a popular module or two. It’s not like they are installed by projects directly.
It's only directly depended on by 40 or so npm modules, and most of them look like beginner/joke modules like "is-ten-thousand" where they brag about being featured in a "worst npm libraries" article.
These sorts of conversations quickly turn into just making fun of beginners for using dumb packages. With the popularity of Node, it makes sense to me that you'd have beginners searching npm/google for how to tell if a number is odd, and for whatever reason they find is-odd or whatever.
And perhaps the idea of even collecting packages to do simple things is something fun for them. I remember having a weird maximalist attitude when I was a beginner using Rails. I'd install a Ruby gem and I could use it anywhere in my files without even importing it, usually a single line of abstraction. For some reason that appealed to me even though I could have done it myself. I think I had this idea that people writing libraries were doing things right, and I was right for tapping into them.
I think we should save our denigration for the serious, popular projects that use such packages, rather than for the fact that beginners use them. This sort of package has no business being a transitive dep of Express (it isn't) or of popular Express middleware, mainly because transitive deps are security issues.
Very good points, and I think you're right on the numbers here! I went dependent spelunking - almost half of it comes from is-even depending on it (naturally), but then I didn't notice any dependents of is-even that would explain the rest. I guess people do just google it and grab the module.
Mea culpa for the unsubstantiated assertion before.
JS has a weak stdlib so it's definitely more common to need to pull in some deps vs python. Of course, nothing is forcing you to install silly packages that are wrappers for one-liners.
> In JS you need a handful of third party packages just to tell if a number is odd.
I have no doubt some clueless interns have done this but there is no need to self-inflict this kind of unnecessary pain.
>Of course, nothing is forcing you to install silly packages that are wrappers for one-liners.
That's the thing... I am effectively forced to install a lot of silly packages, because I need some not-so-silly packages, and these in turn pull in all the silly packages as dependencies of their own (a few levels down the chain).
I never installed leftpad (or any package like that) myself, and yet, at one point it was present in basically every node project I ever did, because of indirect dependencies.
While this kind of dependency bloat could happen in any language ecosystem, in node/npm it is from my experience by far the worst. I think it's because the javascript and node standard libraries were/are so very limited combined with npm making it too easy to publish and consume packages, and being early enough in the game so supply chain attacks weren't yet on most people's mind.
I think, aside from node package maintainers being too nonchalant about pulling in basically silly dependencies, it's also a matter of a lot of package maintainers being very laissez-faire about cleaning up cruft and doing the tedious work of removing dependencies that are no longer needed.
An example of that - because it bugs me every time it shows up in my logs, package lock or node_modules - is the isarray package. It's another one-liner; Array.isArray has been part of JS for a long time (even IE 9 supports it, and IE 9 went EOL in 2016), the isarray package just uses the built-in when present (i.e. virtually everywhere), and the author recommends using the built-in Array.isArray directly. And yet it's still omnipresent, with almost 63 million weekly downloads and 858 direct dependents on npm (plus countless indirect dependents, and dependents that aren't published and therefore aren't tracked on npm). And the number of weekly downloads still goes up week by week, month by month.
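For anyone who still has it in their tree, the built-in covers it:

    // Built into the language since ES5 - no package needed
    Array.isArray([1, 2, 3]);      // true
    Array.isArray("not an array"); // false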
Hi! Author here. I write a lot more Rust than Javascript these days, and I agree - this is absolutely a problem I'd love to eventually see addressed in the rust ecosystem as well. (Although dependency madness hasn't kicked in anywhere near as much in the rust ecosystem).
But rust and javascript each have unique challenges. Per-library sandboxing is hard in javascript because of the language's dynamism. It's hard in rust because any code you pull in can always drop down to unsafe, and in unsafe land you can really do whatever you want. We could probably selectively ban eval() in modern server-side javascript pretty easily because it's just not used that much in modern code. But we can't ban unsafe in rust because it's sprinkled everywhere.
> One of the practical outcomes of this trope is people trying to solve this problem in narrow, ecosystem-specific, non-portable ways.
I think trying to solve this in every language at the same time would be an exercise in boiling the ocean. There's no need to implement something like this in every language all at once. Much better to start somewhere, experiment, and hopefully if the solution works well we can see how it might apply in other languages.
Talking about `unsafe` in this context seems like changing the discussion completely.
All of the concerning things done in the JS supply-chain attacks can be done perfectly well in safe Rust. You don't need `unsafe` to exfiltrate secrets or to encrypt or delete files.
Auditing the use of unsafe in dependencies is worthwhile for mostly unrelated reasons; it's not the same thing as sandboxing them, auditing them for malicious code, or assessing how much you trust them.
Right; I'm skipping ahead and imagining an alternate universe standard library for rust with capability support. Like, what if we did the thing I proposed in this blog post, but did it to rust instead of javascript? How would that work?
In that world, all the privileged operations (filesystem, OS, etc) in std would require an extra capability object to be passed at runtime. The capability would grant permission to the caller to perform the privileged action (like writing to a file).
That would stop rogue rust crates from making syscalls that they shouldn't be making. There might also be a way to enforce those permissions at compile time instead of at runtime.
But even if we were willing to do that, it wouldn't matter if any library could still use unsafe blocks. The reason is that there are dozens of ways to make syscalls from unsafe blocks without needing to go through std. For example, you could dynamically call functions in glibc / musl, use an asm! block, or (probably) manually compile a function into a byte array, transmute it into a function pointer, and then execute it.
If you changed std, and also banned 3rd party library code from using unsafe, you could probably make the system secure. But I'm worried that the rust community might not be willing to pay that cost for extra security.
> If you changed std, and also banned 3rd party library code from using unsafe, you could probably make the system secure. But I'm worried that the rust community might not be willing to pay that cost for extra security.
Anyone will be willing to pay a cost when the benefit is well-defined and exceeds that cost. "Make the system secure" doesn't have a well-defined meaning, because security isn't binary like that.
Banning `unsafe` from all non-std deps is pretty hard for most real-world Rust software unless std grows a lot larger. Even apart from domains with a lot of necessary unsafe, like embedded, you have crates like `bytes` or `tokio` that use unsafe for performance reasons.
A realistic discussion about this stuff always has to involve trust. I don't have much issue with tokio containing unsafe, given its track record and the quality of the code. OTOH, I don't allow actix in my dependency trees because its history of using unsafe unnecessarily and unsoundly means that I don't trust it.
To me, a more interesting approach for improving supply chain security in software builds on the issue of trust rather than things like technical capabilities. Rust has some interesting work in this area already, like cargo-crev for distributed code review, cargo-deny for applying rules to your dependency tree, cargo-geiger for seeing which dependencies are using unsafe, etc.
> There might also be a way to enforce those permissions at compile time instead of at runtime.
AIUI, a compile-time capability is just a custom unit type, perhaps using PhantomData to depend at compile time on some generic type or const generic parameter. Then ordinary type checking is enough to ensure that this gets "threaded" correctly throughout the code, as required. 'Narrowing' a capability is just a one-way type conversion, e.g. via .into(). Since you're doing this via a unit type that carries no information, everything should disappear at runtime, with no effects on the ABI. You'd effectively be using the type checker to prove things about what your code is allowed to do.
> It's hard in rust because any code you pull in can always drop down to unsafe, and in unsafe land you can really do whatever you want
There is a quasi-standard "cargo-geiger" tool that can report on unsafe usage in a Rust project's imported crates. (Of course, this should really be an officially provided feature in the first place.)
> I think trying to solve this in every language at the same time would be an exercise in boiling the ocean.
I actually think the opposite is true.
Capabilities are great but limited in their applicability within actual application code. Ultimately, your main application may require some "dangerous" APIs to do something benign, may be loading 3rd party code to wrap those APIs, and we need a way to trust that code. That's a general problem, and - while it is a very hard one to solve - coming up with novel and unique solutions in every ecosystem is going to reinvent a lot more wheels (& boil some seas at least) compared to looking at software composition holistically.
> It's hard in rust because any code you pull in can always drop down to unsafe, and in unsafe land you can really do whatever you want.
Not super familiar with Rust, so this may be a silly question: could Rust mitigate this by forbidding third-party packages from using unsafe code, except for ones you specifically allowlist? e.g. your Cargo.toml might look like this:
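To be clear, nothing like this exists in Cargo today - as a purely hypothetical sketch, piggybacking on Cargo's `[package.metadata]` extension point (which Cargo ignores and external tools can read), it might look like:

    [dependencies]
    tokio = "1"
    bytes = "1"

    # Hypothetical: only the listed crates may contain `unsafe` code;
    # an external tool (or a future Cargo feature) would enforce this.
    [package.metadata.unsafe-allowlist]
    allow = ["tokio", "bytes"]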
The proposed system itself should be fairly portable, and the idea reminds me of SELinux. The problem is SELinux knowledge isn't widespread, it's Linux-only (so no support for OS X and Windows, which are the majority of JS developer machines), and especially it operates at the syscall level - which is perfectly fine for limiting filesystem access (e.g. for all nodejs processes, limit file I/O to the parent directory of node_modules?), but barely usable for sandboxing network access.
Therefore it makes the most sense to define a standard for configuring capabilities/permission whitelists/grant requests/(however else you want to name it) and leave the implementation up to the platform/language.
Of course node is less trustworthy than other platforms. If it wasn't, we would see active exploits on other platforms as well, but they are either incredibly rare or non-existent.
But you are right in that it's not a difference in kind. Except for the inane idea of executing random code on library installation, node doesn't have any kind of vulnerability that isn't shared with every package manager out there. The important differences are social (in the community) and in exposure area, because JS programs tend to rely on one or two orders of magnitude more developers than other languages.
JS didn't need it either. A dev made the library, shared it, and some developers decided to use it. That's on them. Nothing about JS made it necessary.
Node has a more robust standard library than Rust, whose minimal standard library forces devs to download third-party libraries to compute a regular expression or generate a SHA-256 hash.
For now Node is a richer target due to its popularity, but the same issues will hit any language ecosystem that suffers from the same flaws, should it become popular.
> One of the practical outcomes of this trope is people trying to solve this problem in narrow, ecosystem-specific, non-portable ways.
In general I agree, but I disagree that conversations about dependency scopes are ecosystem-specific. Frankly, figuring out how to limit the capabilities of imports is a discussion that literally every language with a package manager should be having right now.
Sure, the implementation is always going to be platform/language-specific, but this should be a serious consideration any time someone designs a new package system; we should expect new package managers to have an answer for how they scope dependencies and handle dependency permissions.
1. The Node supply chain has serious problems when it comes to security: this is true
2. The strong implication (throughout the article, but most prominently in the title) is that these problems are either unique to Node or at least worse in Node than elsewhere: this is definitely not true.
---
The solutions cited in the article are good, and would definitely benefit the Node ecosystem: --ignore-scripts is a well-known option that would be great on by default, but currently tends only to be enforced in large corp CI/CD. The capabilities model is not too different from what Deno does (this article's idea is somewhat more fine-grained but much more complex, which could hamper adoption or lead to misuse).
Overall both would be of somewhat limited, but still very worthwhile benefit.
The hard problem imo is solving more generally for supply chain trust. We implemented --ignore-scripts enforcement last year and have only really seen it have a relatively small impact, despite it being a relatively new idea in the past few years (which motivates more exploits due to its novelty). In reality, most supply chain attacks either target prod envs or integrated build dependencies, rather than piggybacking on CI/CD repo install steps: the latter gives you the same env access and has no package-manager-specific mitigation, so it needs a more general solution. Capabilities also have limited use outside install scripts, because limiting them also limits the "benign" functionality of your normal application code.
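For anyone who hasn't set this up: the enforcement itself is just a one-line npm config per project or machine:

    ; .npmrc - never run package lifecycle (preinstall/install/postinstall) scripts
    ignore-scripts=true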
Debian has ~20,000-30,000 packages total. NPM has well over a million. Contribution frequency and overall contributor numbers are also much much higher.
NPM is a victim of ease of use and popularity. It's a bigger target.
But both systems would benefit from a holistic approach to supply chain security.
Sadly, in Debian every package which gets installed can run a script with root privileges, so this is even more dangerous. Although most packages don't need such privileges.
Nonsense. Package managers need to be able to run scripts as root to do the installation. And yet, in the last 25 or 30 years there's never been a case of a malicious contributor successfully inserting a backdoor in the installation script of any package in any major distribution.
Because there is a vetting process, nothing else.
[And yes, of course, it would be possible to sandbox each package installation to access some very specific paths but so far it's really unnecessary]
The way modern Linux distros work is that a number of volunteers who pay attention to what's going on upstream package software for users and developers. When something gets weird these volunteers change their behavior and prevent the users from being harmed (the recent Audacity mess is a great example of this.) I don't think people pushing eg cargo/node/snap appreciate how safe this has made their OS and languages that rely on distro package managers (such as C.)
You get the safety that you might expect from an app store without actually restricting anyone's freedom. It's very much like a church with elders preventing people from being captured by vices and whatnot. Yes it has issues but it works shockingly well, much better than many of the alternatives.
Also, most distros have the concept of stable releases. This provides a very valuable "focus point". It means that we don't just have to rely on the maintainers. Users can review packages for reasonable behavior too, and this isn't made futile by constantly changing packages. Users and maintainers can focus on just the stable release and at a reasonable cadence, and this focus point being the same for all users means that it has value to everyone else, too.
This is ultimately about scale though. It's easier for Linux because of the relative number of contributors to distro repos.
Ubuntu has 10s of thousands of packages. NPM has well over a million. The average update frequency is also much higher, as is the number of contributors per-package.
> Ubuntu has 10s of thousands of packages. NPM has well over a million. The average update frequency is also much higher, as is the number of contributors per-package.
Node's culture of tiny libraries (partly caused by Javascript's tiny standard lib) is a big part of the problem and increases the number of potential supply chain issues.
It's not that Ubuntu has fewer packages because it's 30x more efficient with how it packages software -- it has fewer libraries because it has less software and less developer attention. It's not at all uncommon for me in Debian systems to have to search out non-distro repositories to pull from. And that's even before we get into the issue that Ubuntu/Debian repositories aren't rolling release. I often find myself jumping outside of the official Ubuntu repos even for software that they provide, just because they're out of date; it's one of the biggest reasons why I eventually moved to Arch.
Yes, JS dependency chains are out of control. No, that's not the only reason why there are over a million packages on NPM. No, the solution to the scalability problem of human-curated package managers can't be, "well, we just won't scale."
Adding a bigger standard library to JS would not be enough to get rid of 970,000 npm packages.
What standard libs are you comparing? Node's to what other language? I've seen so many commenters say this, but still not sure what the magical thing that can't be achieved with Node built-ins is...
Developers don't write libs because you can't do it with built-ins; they write libs because developers like to write code and NPM is easy to use.
You need thousands of packages for a pretty standard React application (which requires hundreds of base packages). That's a cultural problem in Node's packaging community that injects risk into the packaging ecosystem.
This is literally EXACTLY how releases are supposed to work for companies using any package manager out there.
> A number of [employees] who pay attention to what's going on upstream package software for users. When something gets weird these [employees] change their behavior and prevent the users from being harmed.
The problem is - much like a church with elders (and linux distros - frankly) - quality varies dramatically.
Some of them prevent people from being captured by vices, some of them diddle the kids.
Same here: Some companies take the appropriate steps to lock down dependencies and only update after a thorough vetting. Some pull the latest packages on every push to master.
The problem is that as you get deeper into Linux, you become progressively more and more likely to install your own packages from source, and then all of that curation goes out the window.
I'm not 100% convinced that the number of volunteers for package managers like Arch are actually sufficient to catch malware even in its current form; I think they get a lot of benefit out of desktop Linux being a relatively low-value target. But I'm really not convinced that their approach would be scalable if they actually had to scale at the level of npm. From what I can tell, Arch only has in the neighborhood of 13,000 packages[0], and it doesn't easily allow installing arbitrary versions[1]. I have nothing but praise for Linux, but none of the main distros have anything close to the amount of developer activity that the npm ecosystem has.
And that's when curation breaks down: Arch solves the problem of not having a ton of packages in the main repos by allowing multiple upstreams, by supplying AUR, and by compiling packages from source. But once you drop down into AUR, it's a lot more dangerous and would be a lot easier for people to push malicious code. And if you're pulling Makefiles off of Github, all of that goes out the window -- and there are good Linux software packages that encourage that behavior.
Not to mention the number of Linux software packages that straight up just give you a shell script to run that configures and installs the rest of the program (looking at you, Calibre). If you're lucky and you're using Arch, then you might be able to just install Calibre from the main repo. But that's also kind of Arch-specific, on Debian systems it's much more likely that you jump out of the main repos because they're out of date and you want the most recent version; you either start pulling from a dev-controlled upstream or you start running the shell scripts to install that software.
----
Don't get me wrong, I actually think that from a curation perspective, the way Linux package managers work is the best available solution we have for human moderation for software. A single large curated list that fits everyone's needs is impossible, it does not scale[2]. The only scalable solution for curation is to have a lot of separate curated lists that people can subscribe to; and then to recursively have curated lists of lists.
However, curation is not a magical catch-all solution against malware, particularly when you get AUR and source compilation in the mix. Curated lists are one layer of security, and need to be combined with other sandboxing techniques, with user education, and with (when possible) minimizing the number of packages people need to install. It's not as simple as saying, "the volunteers won't let anything bad happen" -- and I definitely wouldn't say that Linux package security is a solved issue, I think a recognition of some of the weaknesses of that model is part of the reason we're seeing so much effort going into Flatpak[3].
[1]: Yes, you can roll back but it's not really something that's advised to do for specific packages. Generally, your system will run smoother if you keep everything up-to-date and don't pin specific versions.
[2]: We've seen this with both iOS and Android, you either make a limited list that doesn't meet everyone's needs, or you have bad curation. Sometimes both. Splitting up lists does a lot to help solve that problem.
[3]: Although in the spirit of having multiple curated lists, I wish we'd start to see more popular upstreams than just Flathub.
> The problem is that as you get deeper into Linux, you become progressively more and more likely to install your own packages from source, and then all of that curation goes out the window.
It needn't go out of the window. The key thing is to keep the set of packages on which you deviate small. Then you can curate the exceptions yourself, or a community can form that share the same needs and they can do it.
It's when you do throw the curation out of the window, or subscribe to an ecosystem that effectively requires it [that curation be thrown out the window], that the problem arises.
It doesn't make sense to build this 'capabilities' feature into the program itself. It feels a bit like taping your mouth shut in order to lose weight.
It doesn't make sense for a program to not trust its own code any more than it makes sense for a person to not trust their own thoughts.
There is no need to pollute your code like this. It should be implemented as an external tool which analyzes dependencies when executed on demand. A company could just run this tool as part of their CI pipeline before code is deployed or executed. It could be a default hook which runs automatically as part of npm install. It should not be part of the code itself. It's ugly and adds unnecessary overhead and complexity.
It's possible that an external tool executed at compile-time would not be able to verify modules which come with C/C++ bindings, but I think it would be difficult to stop these anyway (even at runtime). C/C++ bindings will always be less secure because it's harder to understand what's going on if you don't have access to the code. C/C++ is too powerful; you can do some crazy stuff with buffer overflows which would be difficult to detect anyway even at runtime. The solution is to try to stick to modules which rely only on native Node.js functionality and not on custom C/C++ bindings.
The short answer is nothing - it would be very easy to introduce a malicious Maven package.
The slightly longer answer is maven artefacts are immutable; the default behaviour is to pin precise versions; and norms among java programmers don't favour using libraries for one-liners like left-pad - meaning there are fewer people in a position to launch a supply chain attack.
The dam still has cracks in it, but there are fewer cracks and some have sticking plasters over them.
> and norms among java programmers don't favour using libraries for one-liners like left-pad - meaning there are fewer people in a position to launch a supply chain attack.
Javascript is the only ecosystem I've seen people doing stuff like that. I know this is going to sound elitist, but maybe the problem is that the bar for learning javascript is low, and the incentives for a javascript developer to improve themselves are also low. You can get away with lazy and bad practices for your entire career, even as a senior full-stack developer. Typescript kinda raises that bar a little bit, but not by much.
I don't think there's a solution to that particular problem, short of deprecating javascript once WASM reaches a point where it can fully replace it. But even in that scenario, we'll probably start to see JS interpreters ported to WASM anyways.
Maybe those "No Code" products are the solution? Replace all of those JS web/app developer positions with people trained on specific No Code platforms that require basically the same amount of programming knowledge, but outsource things like security and architectural decisions to the platform.
The bar to learning is low, the payout in the industry is high (everyone wants a web site, web service, or web app), and (key in this problem-space) the JavaScript standard library is basically a tiny raisin of functionality.
It's not so much "incentive to improve self is low" as "it doesn't make sense to rewrite something that exists," and since JS developers, to stereotype, tend to be extremely online, they will tend to solve problems by asking "Is this written yet?" instead of writing Yet Another YAML Parser.
Let's take a more charitable view: we know that most dev newcomers flock to JS and expect JS to have things like left-pad.
While being a newcomer is not a bad thing in itself, unfortunately there is a bunch of expectations and things learned in the JS ecosystem that are not aging well.
I don't think 'raising the bar' is the answer as much as going "No Code/Low Code" is. People running companies that depend on newcomers will get what they pay for.
Companies that hire people with experience won't notice a thing.
Since left-pad, a lot of people in the trenches of JS have been learning that adding any dependency might cost a lot, and now we see more and more of that.
I never remember hearing about this sort of behavior in the CPAN scene. There's something different at a cultural level with Node. With Node, there have been several instances of otherwise competent coders destroying their own work to make a statement.
> it would be very easy to introduce a malicious Maven package
From the perspective of a cybersecurity researcher, this is just not true. At a minimum, it's not true in the same way.
Node executes arbitrary code on install. The best an attacker can do in Java is execute arbitrary code at runtime, and even then only insofar as the SecurityManager policy the developer has configured permits it.
This is a massive difference that I believe people need to be at once more aware of, and more wary about. Don't believe the hype. Don't let people on the internet telling you there is no difference lull you into a false sense of security.
If you work on a machine, or on a project where security is important, check your dependencies people.
I'd wager 99% of Maven projects run unit tests on build, so I'm not sure the distinction between install-time and run-time is all that meaningful.
And the Security Manager might have been relevant back in the days of Java Applets and Web Start, but I've never seen it used outside of the OpenJDK test suite - and certainly not for protection against malicious code.
> I'd wager 99% of Maven projects run unit tests on build, so I'm not sure the distinction between install-time and run-time is all that meaningful.
Most Java projects don't build their dependencies from source though (unless it's a local project included via gradle/maven). So yes, unit tests run when dependencies are built, but nobody is building dependencies when their web app gets built.
But if a library is among your dependencies, I'd wager you're going to call some of its functions.
So you run a maven build, maven retrieves the library, maven runs your tests, your tests call functions from the library - and the library code you've just downloaded gets run.
There's a few reasons that NPM sees more attacks than other ecosystems.
First, the scale of the JavaScript ecosystem. JavaScript is so much larger than every other ecosystem, so even a very small probability event (somebody introducing malware into a package) can happen surprisingly often given the scale of the ecosystem. Supply chain attacks are a problem in all open source ecosystems – not just JS – but they are a bit rarer and don't affect as many people so fewer people take note.
Second, npm was one of the first package managers to solve the classic "dependency hell" problem. In Python, if you have two dependencies, A and B, which both depend on different versions of C, say C@1.0.0 and C@2.0.0 respectively, then you're in trouble. You have a broken project. Python can only install one version of C. So now you're in dependency hell.
Npm on the other hand just installs both versions of C and it gives A the version that it wants, C@1.0.0. And it gives B the version that it wants, C@2.0.0. Both packages are happy - problem solved.
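On disk that looks roughly like this (a simplified sketch; modern npm hoists packages where versions don't conflict):

    node_modules/
      A/
        node_modules/
          C/    <- 1.0.0, the version A asked for
      B/
        node_modules/
          C/    <- 2.0.0, the version B asked for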
This caused Python maintainers to think twice before adding a new dependency lest they cause "dependency hell" for their users. Much better to just copy paste these 50 lines of code rather than adding a dependency. So there was an intrinsic sort of resistance – some pain is involved in adding new dependencies.
Npm maintainers had no such constraints. In a way, npm’s better developer experience led to the whole module ecosystem scaling "too well". Thus, you end up needing to trust more total maintainers, increasing the risk of supply chain attacks.
- Bigger packages in java, developed by big organizations one can trust, which seldom depend on small ad-hoc packages.
- Installing a dependency with maven is downloading a jar file. In npm you often need to run arbitrary code as part of installation, making the attack surface far greater.
- Projects usually use a specific version of dependencies, so no updates unless explicitly wanted.
- In theory, a SecurityManager can also be used to give code from different libraries different permissions. I've not seen it used much in practice, though - only for plugin systems.
> In npm you often need to run arbitrary code as part of installation, making the attack surface far greater.
This really is the key. It makes it so you can't even really compare JS to java in an intellectually honest way. It's just not even close to the same. One downloads a Jar that will only ever execute at runtime, whereas node downloads arbitrary code that will execute on your machine immediately if the attacker so chooses. Not only that, but a user may legitimately not even know that node downloaded that module. The dependency tree is so ridiculous that the user would have to look through it with a fine-toothed comb to spot the unimaginably big security hole.
On the one hand, yes, the user should have looked through their dependency tree, familiarized themselves with what was in those dependencies code-wise, and known what they were doing. On the other, come on man. That's kind of like those 100-page EULAs that take away all your rights. I'm not sure it's reasonable to expect everyone to read those as carefully as you'd need to read them to avoid the problem?
The difference between code running at install-time and at runtime is not that big, all things considered. How often do you install a dependency without intending to run it almost immediately afterward?
It's an order of magnitude difference. Code running at install time is almost 100% guaranteed to run. So a transitive dependency 10 layers down is just as dangerous as any other.
At runtime, however, you need some code path to actually hit some part of that library / import it to be affected.
> The difference between code running at install-time and at runtime is not that big
It is in java, and rust, and other languages that have securitymanagers or make security guarantees. Node is running code not only in a context that the dev never intended code to be run in, but also a context the dev has no control over. In rust or java, (or a lot of languages actually), code only runs in a context controlled by the dev.
I mean, in the worst case, with node, you may not even get the opportunity to run the app you were trying to install. The module may just own you at the outset. The dev would be powerless to stop any malicious behavior in the library.
From a security perspective, these are huge differences.
Most packages you'd pull from maven are developed by large companies or foundations like apache. The dependencies you're pulling are simple jar files that get loaded at runtime and don't execute anything at build or install time.
I have forgotten how maven works, but in npm at least there's the post-install script that lets a package run anything after the dependency is downloaded. So that means package creators can run any code they want on your machine.
I don't really remember if maven has something similar, since it's been years since I did anything in the JVM ecosystem, but I think some package managers (like composer, if I remember correctly) don't give this opportunity.
But since node doesn't have a large standard library, it means people will reach for third-party packages for stuff that is a small task in most languages / runtimes.
Having a post-install step available for malicious use isn't really the core problem. As a programmer downloading a dependency, you're allowing the dependency full control of your system (in most languages; Deno seems to try to address this at least) at runtime (at least), so it could do whatever it wants as soon as you include the dependency in your application and run it once.
True, true, but at least you will have to check the API and start integrating the library yourself. The chances increase that you will see something weird about it the more you have to look at it.
Throw apt/snap/pacman/whatever into the mix and the answer is still "nothing". People act like package managers are somehow the be-all and end-all, but they're no more secure than going to a random official site and downloading some shit; they've just streamlined the process somewhat.
In fact the latter is probably more secure, since the more the packages depend between each other the worse it gets. One random dependency can be hijacked and will be autoinstalled everywhere. Or someone can delete it and break half the internet as we've seen time and time again.
You're kind of right, but there is a difference between the default repositories used by apt (Debian & Ubuntu) and pacman (Arch) and things like npm, in that they are indeed reviewed, and you have some guarantee that packages won't disappear overnight because of the organizations behind them. With npm, anyone can publish/unpublish without any sort of review, while packages in the default repositories are reviewed by others.
The difference is that non-malicious NPM package authors are trying to destroy you with saturation attacks (throw a huge mass of packages at you so you cannot possibly check all of them) so that malware can slip through more easily.
Currently working on this general problem for a large corp: java & js are our two main languages (alongside a lot of python & go, small amounts of swift, groovy, kotlin & c, and some very very old php). Trust me when I say Maven/Gradle etc. are orders of magnitude more painful to solve for than others.
Nothing at all unique to NPM about supply chain risk.
Fwiw, I find Composer to be one of the better of the lot.
How tractable is it to proxy the npm package sources?
Were I to try and solve this as an enterprise project, that's the first thing I'd try: have a team declare the specific subset of packages we have hand-vetted and host them off a corporate package manager. Our software builds from only those packages; if devs need more, they petition to get them vetted. If they need new versions, they petition to get them updated. Our team keeps an eye out for hotfixes and periodically might mandate an upgrade if a vulnerability comes around.
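Mechanically, pointing builds at the corporate registry is the easy part - a one-line .npmrc per repo (hostname here is just a placeholder):

    ; .npmrc at the repo root - all installs resolve against the vetted mirror
    registry=https://npm.corp.example.com/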
We do, but auditing the mirrored sources is still not straightforward. It's a trade-off between completely blocking all production builds on CVEs and fully transparent mirroring, one we're still trying to balance. It's also expensive - proprietary SaaS offerings in this space are not particularly competitive, and managing it in-house is intensive.
In terms of mandated hand-vetted packages, unless your hand-vetting team is inhumanly well-resourced, you're looking at a stifling corporate environment for engineers there, and/or a lot of attrition. Again, needs to be a balance between central control and autonomy.
Our current approach is simply auditing packages we know are deployed in production and assigning tickets to update/remove within a fixed time period, rather than blocking deployments completely. Probably looking to block select deployments based on criticality in the near future, but again distinguishing between a theoretical exploit and one our code triggers in practice is still pretty difficult to automate without significant noise. And the other issue here is differentiating newly discovered vulns (already in prod - blocking deployment doesn't help) vs newly introduced vulns (not yet deployed).
Author here. I totally hear the criticism. I had "Capabilities for nodejs packages" or something like it as a draft title for the post. But it's a boring title. And I believe this article would have had far less reach with a name like that.
I hear you. I don't believe the answer is to just sensationalize everything to 110%. But I personally absolutely can't stand boring writing. Like, holding my attention on anything which isn't at least a bit human and engaging feels like torture. My job is 70% researcher, and one of the worst parts of my job is reading academic papers - I just can't hold my attention on a boring paper for more than a few sentences. My old supervisor used to tease me about using 2 weeks in the lab to save 5 minutes in the library. I think he's more right than he knows.
So, I really don't have a good answer. What feels like sensationalist clickbait for you might seem like a catchy title to me, with a promise of engaging writing for someone else. And what sounds like an accurate title for someone else might turn me off completely because of its blandness.
I suspect there's no middleground here where everyone is happy, and at the end of the day when I write it's up to me and my judgement. And you, the audience, should complain more loudly if you feel like I'm taking it too far. I need that, because I only have my own judgement for calibration. But that's pretty unsatisfying!
We need one or a few well-maintained standard libraries for Node and/or browser JS.
Those must not depend on any other libraries. And then packages can reduce their dependencies vastly, by just referencing one standard library that provides a lot of features.
JS's small standard lib is the cause of some of these issues.
When I look at one of the JS projects I have worked on there is a non trivial amount of 'is-*' packages whose only job is to identify the type of the object.
Exactly. Somebody would need to put together a standard lib, maybe even by repacking some already de-facto standard libraries. But only one library per category (one „is“ library, one date library, …). It’s a very opinionated task, but it could really help.
The small stdlib is part of it. The fact that identifying an object's type is so difficult is another part of it. It's also more foundational to JavaScript.
I was exploring an actual implementation[0] of a capabilities feature in Node.js, utilising seccomp (via libseccomp), on Linux at least, to achieve a greater degree of security than might otherwise be possible by remaining in userland code. The idea is that you'd write your code, import whatever you like, and define your capabilities upfront at initialisation. The problem is there's quite a big disconnect between what you are doing in JavaScript and what's happening with system calls in V8, libuv and the other native parts, so it's difficult to predict what you need to block and what's actually going to happen. So I don't think my approach is really viable in a general sense, although capabilities in general would, I think, improve the situation if the wider community were to adopt the approach.
This article made me wonder about packages calling user code. Using express' example, what if I don't want to give it filesystem access but need it in one of my callbacks? Is the engine capable of distinguishing my (privileged) code from express' (unprivileged) code?
Your callback would have your capabilities in scope, so your user code would still be able to do whatever you want, even when called by an untrusted library.
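A minimal sketch of why that works (names like `fsCap` are illustrative, standing in for whatever capability object the proposal would hand your code; they're not a real API):

    // Your privileged code creates the handler and closes over the capability.
    function makeHandler(fsCap) {
      return function handler(req, res) {
        // express (unprivileged) merely calls this function; it never sees fsCap,
        // but the closure can still use it.
        const page = fsCap.readFileSync('./template.html', 'utf8');
        res.end(page);
      };
    }
    // app.get('/', makeHandler(fsCap));  // wiring it up, schematically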
This is similar to electronics supply chain. If you source from shady component distributors, you’re going to get bitten with something like a faulty capacitor with an annoying frequency in volume. The difference here is that a single vulnerability can take down your entire app or worse. It’s like sourcing components that can potentially do irreversible fire damage to your customers.
This is why they rely on trustworthy distributors and manufacturers. Without payment incentive and with high expectations for free stuff, this is not possible to solve.
Billion dollar companies building on people’s hobby projects. In any other industry, this would be unprofessional.
No, Go solves this problem and Deno explicitly copies its design. Just use Go. Seriously. The stupidest thing npm still does today is naively assume SemVer is an appropriate strategy for automatic updates.
There is no such thing as an appropriate strategy for automatic updates for developers. For end users? Sure. Because the assumption for them is that things are stable.
Developers need to check. Developers need to pin.
If you’re working in the Node.js ecosystem the solution isn’t to throw the baby out with the bath water and use Deno, it’s to `--save-exact`.
I’m sure as hell not throwing away all of the hours invested in now stable JavaScript software.
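For reference: npm's default save prefix is the caret, so a plain `npm install lodash` records something like `"lodash": "^4.17.21"` (any 4.x at or above that version), while `--save-exact` records `"lodash": "4.17.21"`. You can make exact pins the default per project with one line of npm config:

    ; .npmrc - record exact versions instead of ^ ranges
    save-exact=true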
It's hard to make it practical without linking a whole lot of your local environment in. Remember it's death by a thousand cuts - every time you need some new thing, you just add it without considering the consequences too much. Probably lots of people doing this with their whole home directory linked in read/write.
Recently I got a little concerned about this and made myself a basic safety harness with the bubblewrap[1] tool: rather than going all out, I just lock the mount namespace to readonly for everything except the directory I execute it in. Which is at least some protection against system mods or wide-spread home directory destruction.
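Roughly this shape (a sketch; exact flags depend on what the build needs):

    bwrap --ro-bind / / \
          --dev /dev --proc /proc \
          --tmpfs /tmp \
          --bind "$PWD" "$PWD" \
          --unshare-all --share-net \
          npm install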
What's a lot more of a problem is trying to protect truly vital files - i.e. SSH keys and the like - which are also things you're likely to have bound into your VM anyway. selinux is a much better solution there (but so hard to administer as to be almost useless, though I do really like Fedora's default scopes and have used them successfully).
> Is it uncommon to use dedicated VMs for development for this very reason?
... or on the production deployment's computer (which presumably is also "your" computer, and has a similar set of problems). (... or, if you go there, in production inside a VM, but in the same context and with access to everything from and all the memory and capabilities as the rest of the code you wrote, such as access to networking and the database or arbitrary CPU utilization.)
These days, for performance reasons most of the old "development VMs" (=Vagrant or worse) got replaced by Docker containers - orders of magnitude less effort.
And in any case a dedicated VM is not going to protect you against attacks on your network, unless you go the full route of using a VPN to provide internet connectivity to the VM, and let's be honest almost no developer is going to do that simply because how much effort and maintenance it requires.
When I was still a consultant I spun up new VMs for each client engagement -- both macOS and Windows -- using Parallels. When the engagement was over I would dump that to an external HDD. It meant over-spending on my MacBook Pro to have the SSD and RAM space to support multiple VMs but there was never any question of client work corrupting my system (or vice versa) or needing to wipe and reload my machine after a client engagement.
Codespaces is great in my experience. VSCode devcontainers is another very similar approach that will keep it on your machine but at least containerize the development environment raising the bar fairly significantly for a rogue package to do serious damage on your machine.
Seeing how there are dozens of package management toolchains out there, why is NPM so uniquely absolutely horrible?
The only behavioral difference I see is that with most other package tools, installing a dependency pins the specific version of the dependency you installed.
So why does NPM instead pin "any version at least this, or greater"?
Feels like a decision made when the tooling was young to help improve reliability, one that has since created this nightmare. Is Yarn better about it?
I don’t think it’s uniquely bad. For example, Python’s story is just as bad.
The Java world is the best I know of, but even relatively modern stacks like Rust are still making the same errors long-solved in the Java world.
For example, nothing stops me publishing a google-grpc crate today (first come first served on names and that name’s not taken and i don’t need to prove ownership of my domain like in the java world).
C# and nuget is a blind spot for me so I don’t know. Go gets it right arguably even better than Java. Deno borrows the same decentralised idea.
The whole situation is a house of cards right now though. I'm genuinely surprised we haven't seen more specific targeting of high-value maintainers. Github is arguably ahead of the curve here a little bit (e.g. the ability to flag and act on unsigned commits).
I guess so - there have certainly been enough examples of various attacks, from typosquatting and dependency confusion right through to crypto mining and exfiltration of users' data from their machines.
I'm not sure that it is. I think it's a numbers game. I've read* that npm is the largest repository in the world. More actors means more good and bad actors.
> JavaScript is so much larger than every other ecosystem, so even a very small probability event (somebody introducing malware into a package) can happen surprisingly often given the scale of the ecosystem. Supply chain attacks are a problem in all open source ecosystems – not just JS – but they are a bit rarer and don't affect as many people so fewer people take note.
There is a flag you can override to be less aggressive, and it's in our onboarding docs, but only because I added it after a conversation with the only other individual on our team who was using it.
It's the classical situation of the framework having defaults that end up being actively hostile to developer success. These Primrose Path situations get under my skin like few things do.
I think making the capabilities the responsibility of the app (i.e., having the app handle tokens and pass them around) won't work, for a couple reasons:
(A) it's tricky for the app to protect itself. The executing app code tells the system what the executing app code is allowed to do. It's tricky to get right -- you have to manage the trust boundaries internally yourself, and it will be easy to get wrong.
(B) capabilities tend to depend on the environment, not just the app. E.g., production vs. staging vs. dev environments may well want different capabilities.
(C) capabilities are probably best expressed declaratively, but putting tokens into APIs makes the app handle them procedurally. Once an app's capabilities need to go beyond the trivial, this will really explode the complexity, making it brittle and very difficult to handle correctly. E.g. suppose you use express... how would it know how to pass tokens to its dependencies? It would sort of need to understand your app's security model, which it can't do. What would actually happen is express would presume a certain model and pass the tokens it receives accordingly. The app dev would adjust the tokens they pass to express until it works. So that at least works, even if responsibilities are mixed up. But now express can't make internal changes to the way it calls its dependencies without risk of breaking apps that take the update.
Anyway, I'm thinking the capabilities need to be specified external to the app.
Say, in a config file (or files) of some sort. The app could only be allowed at most read access to the capabilities file(s); (A). There are some pretty common and straightforward ways to customize config for different environments; (B). A config file lends itself naturally to a declarative form; (C).
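As a purely hypothetical sketch (file name, keys and permission strings all invented for illustration):

    {
      "$comment": "capabilities.json - readable (at most) by the app, varied per environment",
      "capabilities": {
        "express": ["net.listen"],
        "body-parser": [],
        "my-logging-lib": ["fs.append:/var/log/myapp"]
      }
    }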
Author here. That could well be a better design - especially if we need to explicitly "bless" packages anyway.
One tricky part about that approach is it generates some weird semver problems. Let's say package A uses package B to interact with the filesystem. Package B has some problems, so the author of package A replaces B with B2 (a fork of B).
From a semver perspective, this is totally fine because the exposed API of package A hasn't changed. And this is also true with the capabilities system I explained. But how would we do it in package.json? If the root package needed to explicitly bless B2 instead of B, that means package A must have broken semver compatibility. Maybe each package expresses the permissions its direct dependencies have, and it sort of ripples out getting more specific in the dependency tree?
I think it's a good idea, and there probably is a solution here somewhere. But I'm not quite seeing it. Want to write up a sketch of how you imagine that working?
This is a noble proposal but it requires far too many systems to collaborate on adopting it.
A capabilities system based on optional package metadata seems easier to introduce (think about how typings have been able to progressively layer on top of existing packages)
Every time I read anything about JS, WAT comes to my mind, and that explains why everything in the web UI domain would require a complete bottom-up rewrite.
And of course that is why it will never happen. We just keep digging deeper into the mud, same as with the climate...
Of course this is a problem for all package repositories. Ruby, Python, Docker, Node, etc. But Node seems especially bad. Why? It seems to me that Node packages are mostly the work of a single person, and lack a community around them. You know you can trust Nokogiri because there are many developers working on it, and they're visible and accessible on Twitter et al. But many Node modules are just one person's passion project and/or resume padder. It's too easy for someone to have their npm publish keys stolen and not notice, whereas if there were more people on the project someone would probably notice sooner.
Many C libraries are maintained by a single person, too. But C/C++ programming has no equivalent to the Node culture of making trivial releases, so every single update to a C package gets quite a bit of attention from its users.
A capabilities system like pledge could be a way to more safely use _existing_ packages. However, I think it's not a very nice way to continue. Every application will end up doing its own capability pledging, and mistakes will be made. A lot.
Another approach could be to use an effect system like PureScript does. The main problem with Node.js packages is that any function you use can execute arbitrary code (such as wiping systems whose IP is from the Russian region). With an effect system in place, the library author has no means other than to come forward with the side effect, or the code won't compile.
Hmm, where to begin? This is an old idea. It has all been tried before in the JVM world and yet support for it is now being removed, which is in my view a pity given that Now Is The Time. But the problems encountered trying to make it work well were real and would need to be understood by anyone trying the same in the JS world.
Understand that Java had it relatively easy. Java was designed with a sandbox as part of the design from day one, the venerable SecurityManager. The language has carefully controlled dynamism and is relatively easy to statically and dynamically analyze, at least compared to JavaScript. The libraries were designed more or less with this in mind, and so on.
So what went wrong?
Firstly, the model whereby you start with a powerful "root" capability and then shave bits off doesn't have particularly good developer usability. It requires you to manually thread these little capabilities through the call stack and heap, which is a nightmare refactoring job even in a language like Java let alone something with sketchy refactoring tooling like JavaScript. Lots of APIs become awkward or impossible, something as basic as:
var lines = readFile("library-data.txt");
is now impossible because there's no capability there, yet, developers do expect to be able to write such code. Instead it would have to look like this:
function readFile(appDataPath) {
  // The capability (appDataPath) has to be threaded in explicitly.
  var file = appDataPath.resolve("library-data.txt");
  return file.readLines();
}

var lines = readFile(rootFileSystem.resolve("/app/data"));
Can you do it? Yes. Does it make code that was once concise and obvious verbose and non-obvious? Also yes.
Consider also the pain that occurs when you need a module that has higher privileges than the code calling it (e.g. a graphics library that needs to load native code, but you don't want to let the sandboxed code do that). In the pure caps model you end up needing a master process that "tunnels" powerful caps through to the lower layers of the system, breaking abstractions all over the place.
Secondly, this model means you can never add new permissions, change the permissions model or have different approaches because refining permissions == refactoring all your code, globally, which isn't feasible.
Thirdly, this model imposes cap management costs on everyone even if they don't care about security because they know the code is trustworthy e.g. because their colleagues wrote it, it came from a trustworthy vendor, or because it'll run in a process sandbox. Even if you know the code is good it doesn't matter, you still have to supply it with lots of capabilities, you still have to implement callbacks to give it the capabilities it needs on demand and so on.
These problems caused Java to adopt a mixed capability/ambient permissions model. In the SecurityManager approach you assigned permissions based on where code came from and stack walks were used to intersect all the sources on the stack. Java also allowed libraries to bundle data files within them, and granted libraries read access to their resources by default. That solved the above problems but introduced new ones, in particular, it lowered performance due to the stack walking, plus now library developers had to document what permissions they needed and actually test the code in a sandboxed context. They never did this. Also the approach was beaten from time to time by people finding clever ways to construct pseudo-interpreters out of highly dynamic code, such that malicious code could get run without the bad guy being on the stack at all.
Fourthly, it's dependent on everyone playing defense all the time. If your object might get passed in to malicious code, then it has to be designed with that in mind. A classic mistake:
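The code for the example didn't survive into this comment, but a JavaScript sketch of the kind of bug being described might look like this (the class and method names are made up for illustration):

class CommandRegistry {
  #commands = ['help', 'version'];

  // Intent: callers may read the list of commands, but not change it.
  getCommands() {
    return this.#commands;                        // oops: hands out the live internal array
    // return Object.freeze([...this.#commands]); // what was actually intended
  }
}

const registry = new CommandRegistry();
registry.getCommands().push('format-disk');       // sandboxed code just mutated internal state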
The author's intent was to make an object in which you can read the list of commands but not write them. But, they're returning the collection directly instead of using an immutable wrapper. Fine in normal code, but oops, in sandboxed code now you have a CVE. Bugs like this are non obvious and the tooling needed to find them isn't straightforward. These bugs are a drain on development.
Fifthly, Spectre attacks mean that a library that can get data to an attacker via any route can exfiltrate data from anywhere in the process. You may not care about this, and for many libraries there may be no plausible way they can exfiltrate data. But it's another sharp edge.
Finally, it all depends on the ecosystem having minimal native code dependencies. The moment you have native code in the mix, you can't do this kind of sandboxing at all.
Now. All these are challenges but they don't mean it's impossible. Sandboxing of libraries is clearly and obviously where we have to go as an industry. The Java approach didn't fail only due to the fundamental difficulties outlined above - the SecurityManager was poorly documented and not well tuned for the malicious libraries use case, because it was really meant for applets. After the industry gave the Java team so much shit over that, they just sort of gave up on the whole technology rather than continuing to iterate on it. It may be that a team with fresh eyes and fresh enthusiasm can figure out solutions for the above issues and make in-process sandboxing really happen. I wish them the best, but anyone who wants to work on that should start by spending time understanding the SecurityManager architecture and how it ended up the way it did.
Author here. Thank you so much for this summary of Java's approach. I learned Java when I was a kid in the 90s, and I remember seeing some SecurityManager stuff in the Java standard library and having no idea what it was or why I would want any of it. It's funny to think that decades later I would propose re-inventing it.
As for the code, surely something like this could work?
var lines = readFile("library-data.txt", capabilityToken);
But yeah, even in the example in my post the capability tokens are annoying and feel cumbersome.
Another poster in this thread suggested maybe expressing capabilities in your package.json file. Maybe when you pull in a dependency you can say "oh, and rather than inheriting all my capabilities, only give this library access to capability X". That would provide a nice ramp, but there's a whole new set of problems that way, since you'd need to be able to express something like "the capability you need for the redis client is network access to this specific IP address". And that specific capability needs to be passed all the way through the dependency tree to whatever finally opens the socket.
Expressing this in a granular way in code is easy, but noisy. But if we do it in package.json, maybe that's not going to be expressive enough.
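To make that concrete, a purely hypothetical package.json shape (none of these fields exist in npm today) might look something like:

{
  "dependencies": {
    "redis": "^4.0.0"
  },
  "capabilities": {
    "redis": { "net": ["10.0.0.5:6379"] }
  }
}

Even then, npm (or node) would have to thread that grant down through redis's own dependencies to whatever finally opens the socket, which is the hard part.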
Anyway, like you, I hope someone smart takes the time to have another stab at this. The security model where we trust all software engineers is obviously breaking down at this point. Short of a model like this, I'm not sure how we can really solve this problem at all. In any case, thank you for sharing your wisdom.
Re: your example. What is "capabilityToken" in this case? What does it grant you, precisely? Is it a directory? A file? Something else? The classical approach to using caps with files is you create a File type of some kind, which encapsulates the permissions and lets you derive from it e.g. sub-directories, files in that directory but not navigate up the tree. Or it's associated with some whitelist of files.
For that to work you need not only a carefully designed set of types but also they must be able to protect their internals. JavaScript historically hasn't had this, I don't know about modern versions, but the ability to restrict monkey-patching, reflection over private fields etc is a must.
> For that to work you need not only a carefully designed set of types but also they must be able to protect their internals. JavaScript historically hasn't had this, I don't know about modern versions, but the ability to restrict monkey-patching, reflection over private fields etc is a must.
At the bottom of the post I sketched out how we could make this work in practice in javascript. We can use a Symbol[1], and then have that be a key into a Map owned by the builtin capabilities library. That would make the token itself safe from being messed with.
But as long as the capabilities library uses that object as a key in a JS Map (with the value being the token's scope), we could just as easily use anonymous objects or something else as tokens.
I think the issue is more the code that uses the capability itself. Like, if I can just read the capability straight out of the object that owns it, or monkey-patch the definition of some other object it calls into so I can use its capabilities indirectly, then you still lose. That's what I meant by playing defense all the time. If you give a bit of sandboxed code a generic utility object, it can all go wrong.
The idea here is that there are two things: the token (a Symbol() or something) and the scope of capabilities which that token gives you. The capabilities themselves are stored in a Map that you don't control. JavaScript function scopes give us everything we need to hide that map and make sure nobody can modify it. The only methods which are exposed are things like getScopeForToken(), which reads from the map (and does a deep clone) then returns that scope object.
In privileged methods like fs.writeFile(), you don't pass the scope. You pass the token. And that method would explicitly go and check if that token has the scope that it needs to write to the passed path.
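As a rough sketch of what I mean (the module and function names here are invented, not an existing API):

const capabilities = (() => {
  // Hidden inside this closure; nothing outside the module can read or replace it.
  const scopesByToken = new Map();
  return {
    createToken(scope) {
      const token = Symbol('capability');
      scopesByToken.set(token, Object.freeze({ ...scope }));
      return token;
    },
    getScopeForToken(token) {
      const scope = scopesByToken.get(token);
      return scope ? { ...scope } : undefined;  // hand back a copy, never the stored object
    },
  };
})();

// A privileged method takes the token, not the scope, and does its own check.
function writeFile(path, data, token) {
  const scope = capabilities.getScopeForToken(token);
  if (!scope || !scope.write || !path.startsWith(scope.writeRoot)) {
    throw new Error('no capability to write ' + path);
  }
  // ...do the actual write here...
}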
But I do hear you about playing defense. I mentioned it in the post - there's probably a bunch of subtle ways you could use javascript to mess with things. Covering all of these cases would need some serious rigor.
It was trying to implement capabilities in JavaScript, but failed because JS was too dynamic at the time. It might be that newer language versions have made it possible but it'd be worth researching why they gave up on it.
Caja was designed by Google research scientist Mark S. Miller in 2008[3][4] as a JavaScript implementation for "virtual iframes" based on the principles of object-capabilities. It would take JavaScript (technically, ECMAScript 5 strict mode code), HTML, and CSS input and rewrite it into a safe subset of HTML and CSS, plus a single JavaScript function with no free variables. That means the only way such a function could modify an object, was if it was given a reference to the object by the host page. Instead of giving direct references to DOM objects, the host page typically gives references to wrappers that sanitize HTML, proxy URLs, and prevent redirecting the page; this allowed Caja to prevent certain phishing and cross-site scripting attacks, and prevent downloading malware. Also, since all rewritten programs ran in the same frame, the host page could allow one program to export an object reference to another program; then inter-frame communication was simply method invocation.
I spent some time with one of the Caja developers back in 2010 or so, before it was made public.
From memory, the problem they were trying to solve was a bit different. They wanted to be able to run potentially hostile user-supplied JavaScript code inside the JS VM purely using source-code-level validation. So for example, Caja needed to make sure the sandboxed code didn't access the global object (since then it could escape its sandbox). And because simple code like `(function () { return this })()` evaluates to the global object, they banned the keyword `this` in sandboxed code.
I'm hoping there's a way we can give untrusted code more or less full access to the JS environment, but just limit its access to the rest of the operating system. Javascript was first developed for web browsers, and to this day most javascript still has little to no need to access the rest of the operating system directly.
But JavaScript's obsessively granular modularity works in our favor here. If you look at a library like express, the core library makes vanishingly few calls to the nodejs environment. `app.listen()` is the only method I know about which wouldn't "just work" in this new world I'm proposing, and that's just a convenience wrapper around `require('http').createServer(app)` anyway. All the hard work happens in libraries like express.static, but that's trivially easy to swap out for another package that supports capabilities correctly, if we need to do that.
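For what it's worth, this is roughly what the Express docs themselves say `app.listen()` expands to, so only the application root ever needs to touch `http`:

const http = require('http');
const express = require('express');

const app = express();
// Equivalent to app.listen(3000): the top-level application holds the `http`
// module (the capability), and the express library itself never needs network access.
http.createServer(app).listen(3000);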
A bad library could always be buggy; we can't stop that. I mostly want to stop opportunistic developers from taking advantage of the machines their modules run on, so we can detect them doing nasty things (and stop them). But as a few people have mentioned, this approach might be stuck "always playing defense". The nice thing about Caja is that it was "complete": there were no weird edge cases left over in the language that the sandbox authors didn't consider. That's what I'm most worried about here.
> Lots of APIs become awkward or impossible, something as basic as[...]
I mean, wouldn't you use a `readFile()` function like that by passing in the file handle? So:
var lines = readFile(fs.open("library-data.txt"));
...where, if you're in a library somewhere, `fs` may be a capability to a directory that you've been passed rather than a global granting access to the entire filesystem. This doesn't feel much more awkward than your example of:
var lines = readFile("library-data.txt");
EDIT: I am assuming you have an `fs.open()` that returns a file handle here; Node's doesn't seem to and instead takes a callback as an argument. You get the idea though.
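(Node's promise-based API, `require('fs/promises')`, does return a handle object, for what it's worth. Something like this works today, though it still takes a path rather than a directory capability:)

const fs = require('fs/promises');

async function readLines(path) {
  const handle = await fs.open(path, 'r');   // resolves to a FileHandle, no callback
  try {
    const text = await handle.readFile('utf8');
    return text.split('\n');
  } finally {
    await handle.close();
  }
}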
That's pretty much what I said, no? It gets awkward: now your library can't just load some data table it needs from a file, it has to have either some sort of initialization step where you give it the capabilities it needs or it has to take them in the API call itself.
Now let's say you change the implementation such that it needs a new permission. You have to pass that in, which may well mean passing it in from the root of the app through a long call stack. Quite painful. Programmers like conveniences such as being able to give a string instead of a file handle.
I’m sure programmers do like that convenience, but if the consequence is that we’re giving every library access to everything the rest of the app has access to, I don’t think that’s tenable long-term.
> Now let's say you change the implementation such that it needs a new permission. You have to pass that in, which may well mean passing it in from the root of the app through a long call stack.
Sure, but put another way: you can’t change the implementation of your library to grant yourself more access to the system without the calling application being aware of it. Is this potentially inconvenient? Sure. But it does mean that the developer of the calling program knows pretty dang well what access they’re handing over to the library.
Vanilla JS is not something we should be afraid of. I wish we’d consider it more often in projects.
The latest generations, ES2020 and newer, are pretty useful and pleasant to use. If you leverage their features, it goes a long way. Implementing the odd missing function yourself along the way is perfectly doable if you embrace TDD.
My feeling is that people in the JS ecosystem tend to overestimate how much time they’d allegedly waste re-implementing stuff, and grossly underestimate the true cost of deeply-nested dependency trees.
Sort of a have-your-cake-and-eat-it-too mentality if you ask me. If you're not willing to build your own JS glue, then you should at least take the time to audit the glue you are pulling in from some thankless developer who probably started this as a hobby project in college and whose work is now the backbone of 10 Fortune 500 frontends.
I'm sorry that they can then do whatever the hell they want with that package but this ecosystem exists because companies want free stuff but don't want to provide back into it.
If, instead of loading library code directly into the same process and heap, each package were loaded and run in its own dedicated process with its own dedicated memory, and you communicated via message passing (a la actors), then this approach would really start to make sense.
But most node packages today are designed to run directly in the same process and memory space as the code that imports them.
Also, any package manager that runs arbitrary code should not be trusted.
I was once part of a startup named Intrinsic. We had built out a complex product that protected Node.js processes with extreme granularity. Not much information exists about the product after it was acquired but the blog posts are still up:
And that's the summary of TFA: sandbox everything!
Except there's always sandbox escapes. OK, those are harder, so sandboxing more is a start, for sure.
There's no panacea here. Dependencies are costly. Open coding is costly. Curating external source and packages is costly. It's all costly. We need to recognize a lot of these costs.
We are spreading trust too thinly.
Node packages should be grouped into superset packages with trust concentrated in dedicated maintainers. It makes no sense to upgrade a lot of small packages every time we run "npm update".
I remember reading articles like this since 2016 or 2017. How come nothing has changed in so many years? Is this issue not important enough? Or is the solution just too expensive/impractical to implement?
I think the machine has just gotten too big; it would require a huge effort. Even the original creator of Node.js, Ryan Dahl, is struggling to create sufficient displacement with Deno. Innovation over safety? Like cheap products, I guess: either you join them or get driven out of business, leaving only the cheap products anyway.
Also, if you want to spend a lot of time, effort or money to vastly improve something, would you put it into JS or would you perhaps focus on another language that has better foundations and hope it catches on, where there is the potential to outperform (in quality and consistency) the JS ecosystem? Perhaps those with more sense move to other ecosystems, such as C#, Go, Java, Elixir? I say this as a long-time user of JS who has recently enjoyed his foray into Elixir.
I made another post here, but on a separate note: some of the developers of the most used software in the Node.js ecosystem, as people, don't deserve your trust either.
For the past decade or more I’ve watched and personally interacted with the personalities of some of these developers and the last thing they seem to have on their mind is the stability of your software. They just do not care.
Many of them are totally willing to throw away years of work that you also built years of work on for you to chase after their new toy.
These are guys that built popular packages in their early to mid 20s. They weren’t thinking about software that lasts any meaningful duration of time.
I think this needs to be solved at the OS level, not the language level.
It's a problem for every language -- packages you download can write to any file or make network connections. It might be a little worse in NPM because of the culture, but the problem is pervasive.
Personally I'm interested in the direction of lightweight containers that behave like executables, and that have composable dependencies. Developing in Docker-like containers can work but there are a bunch of downsides to be mitigated.
What is the solution for people who want to use it for frontend only? Just importing JS libraries that only need to work in the browser, so there's no access to system resources beyond what Node.js or the browser exposes? Any ideas?
There are only two solutions: 1) Trust someone 2) Carefully read through their changes
node_modules should be banned from .gitignore
npm should stop moving folders around. If I have placed a module in node_modules/foo/bar it should stay there! Not get moved to node_modules/foo@1.0.0 etc
If the code is managed manually and not by package.json, I would consider putting such code in a "vendor" or "third_party" directory instead of putting into "node_modules".
You use "npm install" to place modules in node_modules, and "npm update" to conveniently update them. But you have to either 1) trust the maintainers of the modules or 2) review the code changes.
There is already a project that does most of this: Deno. There are of course other motivations that differentiate it from Node.js, but I think this should be part of a new runtime that was built with this in mind; Deno is the way forward. Of course, putting your projects in appropriate jails/zones/containers/VMs is also an option, but that has always been true.
It makes me deeply sad to see these sorts of interactions in open source [1].
> Hmm, I think it's a worthwhile fix. Where did you see malware here?
> I think the author of this repo is free to decide what code he publishes. Say thanks to that it's for free
An incredible number of people have dedicated sweat and tears and foreheads (from banging against the desk in frustration) to open source across the entire stack, from the contributors to OSs such as Linux to those working their arses off to create better frameworks, languages and runtimes, that we can all benefit from and use with a reasonable expectation of security, respect and privacy.
As a university student, I feel privileged to have been able to grow up in a world where so much work and knowledge is provided for free with no strings attached, regardless of demographic/location, I would not be where I am without it. A century ago this would not have been possible. To all of you who have tirelessly and selflessly worked on OSS for others, without expecting anything in return or imposing politics, ideologies, infringing on privacy, causing damage, collecting vast quantities of marketable personal information or monopolisation, I give you my heartfelt thanks for your efforts, you know who you are. You have created something that will have forever helped to improve our society and empower those that want to learn and create their own designs.
From my own personal experience, I want to give a shout-out to the smaller projects of Rust, Svelte and Elixir. I think it's incredible that the work and ideas of (often) a single person (Rich Harris, José Valim) can grow into larger, extremely welcoming and helpful communities with many more motivated contributors who are proud of being part of those projects and put in an extraordinary effort to try and do things better than before. I'm sure there are plenty of other worthy names I'm too young/ignorant to know.
Love it or hate it, Node.js has been very empowering for a large number of people to learn and publish their own full-stack applications. The JavaScript ecosystem has improved enormously since its beginnings, but has a tendency to change slowly due to its size, unless a disruptive technology comes along such as TypeScript. Websites are a great way to introduce people to the joy of programming with their visual feedback: you can make a small penguin move across the screen, then move on to play tic tac toe. Even as a younger developer, I admit that the days of FTP, no-build-step pages with a sprinkle of jQuery were easier to understand and actually safer for newcomers than introducing someone to a SPA stack (which can easily have thousands of transitive dependencies) nowadays.
The reason this keeps happening with NPM is because of absurd number of dependencies in the average node app. I have a tiny app I've been playing with using create-react-app. There are over 800 directories in node_modules. That absolutely dwarfs the number of any other language I've used. Even in a medium sized rails app, you likely have some awareness of what every dependency is. It's just impossible with npm.
This makes it easier for someone to inject their package into the ecosystem whether it's actually very useful or not (like the colors package).
One thought I've had to "reboot" the npm culture is to somehow curate packages that are proven to have minimal and safe dependencies, probably through manual review. Maybe it could be recursive, so that safe projects only rely on other safe projects.
> The reason this keeps happening with NPM is because of absurd number of dependencies in the average node app.
But why does that happen? There are now lots of languages that make it trivial to add dependencies. While I find projects in those other languages to also have too many dependencies, it's nowhere near what happens in JS apps. I'm thinking of projects I've recently worked on in Rust, PHP, and Java. Java projects seem to be a distant second place to JavaScript projects when it comes to willy-nilly dependencies.
It's not a rhetorical question: Why is the culture with JS so much worse about this?
I absolutely hate that I'm going to suggest this, but is it just because of the average skill level and experience of people working on JS projects?
Or is it because JS is such a bug-prone programming language that we're all afraid of actually authoring any more code than we absolutely have to, because we know we'll waste hours debugging things that should be relatively simple?
> You almost have to reach for some utility library or build your own ad-hoc one just to use the language.
Sure. But are hundreds of dependencies really required for this?
In Java you would use tools like Guava or Spring for general quality of life improvements, and there would be a few deps for them (under a dozen iirc).
The solution is for the “top tier” libraries and frameworks in the JS world to be designed with minimal dependencies. And where they do have a need for a dependency, they undertake serious consideration of the best option that minimises dependency hell.
And those "top tier" libraries do exist for some stuff- especially if we're talking about the anemic standard library. The famous lodash library doesn't have any (non-lodash-umbrella) dependencies AFAIK.
You are meant to pass an argument to [].sort; if you don't, it falls back to the most generic thing that makes sense -- which is turning everything into a string and sorting lexicographically.
Arrays in JS can have any type in them, such as [number, string, function]. That's obviously very unlikely, but the only thing in common between all types in JS is that they can all be explicitly turned into strings.
And yes, I agree that throwing an error here for no argument would be better here (a linter WILL enforce this), but this is hardly a critical shortcoming of JS's standard library, and you DEFINITELY do not need a utility library just to use the language (especially for your examples).
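For example, passing a comparator gives you the numeric ordering you'd expect:

[10, 1, 3, 2].sort((a, b) => a - b);  // [ 1, 2, 3, 10 ]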
I hear what you're saying, but it still seems pretty bonkers to me that if you try to sort an array of numbers it will cast them to strings and sort alphabetically (!):
> [1, 2, 10, 3].sort()
[ 1, 10, 2, 3 ]
> And yes, I agree that throwing an error here for no argument would be better here (a linter WILL enforce this), but this is hardly a critical shortcoming of JS's standard library
I suppose "critical" is debatable but this seems very fundamental and very unexpected to me.
> you DEFINITELY do not need a utility library just to use the language (especially for your examples)
I hadn't actually considered using a linter to avoid these types of standard library footguns... that's actually a pretty great idea!
Since arrays can contain a mix of anything, I don't think it's that bonkers that the default impl chose to canonicalize values into consistent comparables with String(). It's just not useful for numbers.
But for example, they probably decided that it was more useful defaulting to having a sort order for things that otherwise aren't comparable:
Since null < undefined and undefined < null are both false, then a simple `a < b` comparator wouldn't sort them at all. Same for objects.
If you have an array that's a huge mix of random values, from null to undefined to [] to {} and you sort() it, all of those values will now be grouped together by type.
There would be a significant performance impact if an argumentless `.sort()` had to traverse the list and check that all inputs were numbers. It's better to pass a sort function -- that is how the API is meant to be used.
I agree that without an argument it should just be a fatal error, but if it were to have any sort of functionality, it should convert all to strings.
This is consistent among JS's apis, and pretty much the origin of all the 'js wut' moments on the internet. Everything in JS can be converted into a string. If you do something stupid with disparate types, it will likely turn operands into strings and compare them that way.
I don't see that as the case. Modern JS can do a lot out of the box but the culture looks down on "vanilla" JavaScript. I've seen way too many libraries that are nothing but a thin wrapper around native functionality. When your first (and only) technique is to look for a library, this is where you end up.
I agree and am very pro vanilla JS FWIW. I just find myself reaching for something like lodash's `intersection`/`difference` functions when working with sets, `sortBy` to get more normal (and not in-place) sorting behavior, and `groupBy` to do group by.
What I think is happening: the JS ecosystem has been flooded by developers with minimal experience and/or education.
They learn engineering principles like "don't repeat yourself" and take that to mean installing an entire dependency to implement left pad is a good idea.
You are probably on to something with JavaScript being bug-prone as a factor in that.
JS as an ecosystem has a really big problem with developers not knowing the value of simplicity.
You can't have simplicity when every other week someone is shoving a new framework down your throat. And in fact you are made to look like a loser if you dare do things in vanilla JS.
This is a big problem in the tech industry, in general. Some weeks back I read a comment that described some behavior as "high intelligence, low wisdom". I believe that fits pretty well here.
People design new frameworks (presumably) because they see an array of problems with existing frameworks. In designing their new framework, they try to address the shortcomings that the existing frameworks have.
What they don't realize is that the problems in existing frameworks were _known tradeoffs_. Now, instead of the One True Perfect Framework, we have yet another framework with its own set of problems.
People think that every problem is solvable simultaneously, but that's simply not true. You can make tradeoffs. And this isn't just true in engineering, it's true in life generally.
Some tradeoffs make sense nearly always. Others only make sense in certain contexts.
An example here is the tradeoff between simplicity and high availability. It doesn't matter what you do -- the simplest high availability configuration for an app will ALWAYS be more complex than the simplest non-HA configuration. You're making a trade here. It's a trade that is absolutely sensible, but it's a trade nonetheless.
The lesson to be learned here is: stop thinking you can solve every problem at once. You can't.
One potentially contributing factor is that npm makes it very easy to avoid conflicts with duplicate dependencies, i.e. if one dependency has a transitive version on some other package, and another dependency also has a transitive dependency on that package, but on an incompatible version, that's not a problem in npm: it'll just store both on your disk. That might remove some pressure to minimise the number of dependencies on library authors.
And of course, there are just far more people working on it, I believe.
This reduces package-installing friction, but this is a good thing anyway -- there's no reason why you shouldn't be able to use foobar@1.0 and foobar@2.0 in the same project. For all intents and purposes they're different packages, viz. the version is just as important as the package name.
This may not be the whole story, but one of the reasons is that JavaScript does not have a standard library. Corollaries are: several different module systems, application frameworks and bundlers exist.
I think that's a big part. The standard library isn't great, and progress is tied to browsers. Additionally, tiny packages became the norm early on. Is even, is odd, is negative zero, left pad, etc.
I don't know all the reasons for that, but I think part of it is that developers create them in order to put it on their resume. "My package is downloaded 500k times a week"
Simply because the blast radius for Java is limited to a set of very high quality libraries -- in terms of code, not functionality. These libraries come from the Apache Foundation, Eclipse Foundation, Google, Facebook, Spring, etc. Literally every single Java application depends on something from Apache [OK, I understand stuff like Log4Shell can still happen].
The same is not true for JS. The most mature libraries depend on absurdly vague libraries that no one has ever reviewed.
I was going to ask the obvious question of why the Java ecosystem ended up differently than the JavaScript ecosystem, but I think I know the answer.
It's a giant pain in the ass to publish a Java library. That's already weeding out a ton of low-effort projects. By itself, I wouldn't exactly call that a good thing, but it seems to have a silver lining...
I’ve mostly worked with JS in the browser and my understanding of Node lacks nuance, but it seems like all of this would be mitigated drastically by building out the standard library. For comparison, here’s Python’s list of standard modules:
They don’t make third party libraries obsolete— for example, I tried using the built-in IMAP library the other day and it’s definitely too low-level to make sense for most quick projects that check email, so I used a third party Library— but all of those modules are vetted, stable, and require no external dependencies.
I believe the node maintainers have staunchly opposed such measures. I don’t know what their reasoning is so I don’t have an opinion on whether or not it’s worth it.
I'm also pretty new to JS but totally agree. Half my Python projects have zero dependencies (except for development tools which don't get packaged with the app) because everything I want is already in the standard library.
It feels like a quarter of the time I spend working on projects that use npm is spent debugging my toolchain because of excessive complexity or weird problems in random dependencies. Doesn't seem like it should be too much to ask that I can spend most of my development time working on actual code.
The process of uploading packages onto Maven Central is... not modern. npm (arguably?) makes this a lot easier than any other language, therefore npm developers do more of it.
The interaction between a language and its culture is really complicated, partially because it's an ongoing iterative process, and thus chaotic, in both the English and mathematical senses of the term.
Not having a standard library made people accept needing libraries for even very small things, where in most other languages developers would make at least some effort to use the standard library before reaching for something else.
I think another aspect is that the initial leadership of a language community sets the tone for a long time, but JS in a lot of ways didn't have that, not through any fault of any particular person but simply because Node grew so explosively at the beginning that the usage growth dominated the available leadership growth, and so there was a lot of very wild, woolly growth that got written into the earliest culture. This creates something a lot like a "seed crystal" that has outsized impacts on future paths for a long time.
Finally, I do think it is definitely an issue that JS is somewhere where you get a lot of people who are not "programmers" per se and they are making big decisions about code bases and libraries. They're young in the art. And while there's nothing intrinsically bad about that, people have to start somewhere, there's a lot of ways in which it's good that JS is relatively easy to get into, etc., it is also absolutely true that at scale, as the language and the libraries iterate on each other and seek out their stable points, that's going to affect the landscape. This is an "is", not an "ought". It is what it is. JS has also continued explosive growth, so even as someone who got into JS and programming for the first time in 2017 is now a 5-year "senior" (a crack about our industry terminology, not the dev here) developer who has learned and might be inclined to do things differently than they did 5 years ago, there's another 2.5 newbies "voting" in the community as well.
It's a hard problem. I salute the leaders in the JS and npm world working on it, I wish them the best, and I advocate giving them grace for working in a very hard situation. But I'm also glad not to be part of it.
I think the reason is JS doesn't have much of a standard library. Java, C#, Ruby, Python, Rust, Go, and many others come with a large library you can use to write non-trivial applications without ever needing to fetch an external dependency. JS, particularly outside of Node, doesn't have that. To get functionality most other languages/runtimes include out of the box, you need to write a bunch of code yourself or pull in a dependency to use someone else's implementation.
It may be more about the quality rather than the quantity, though. Rust's standard library isn't very big. People sometimes complain that we have to fetch dependencies that are so ubiquitous that they are effectively part of standard Rust: crates like rand, futures, bytes, etc.
But even JS's built in string stuff isn't so bad that it somehow justifies leftpad existing, so I don't know...
Can't find it now but I remember an interview or article by Ryan Dahl, describing his original vision of Node.js. He related it to childhood enjoyment of building and piecing things together, and used tinker toys as an analogy saying he wanted code to be the same.
In other words it has been an intentional design decision from the start.
My take is that the lack of experience for the average JavaScript developer is absolutely a factor here. I don't think it's the only factor though. Here are some of the other pieces of the puzzle.
JavaScript's standard library is so thin on the ground that there's already a culture of "reaching for a library" to accomplish tasks that many languages do out of the box.
The monoculture is wide enough that the language caters to lots of paradigms and schools of thought. If there's one library that uses classes and method chaining, you can be sure that another will pop-up to re-implement the same functionality in a pure functional style. One will focus on type safety and another will abuse the dynamic bits of the language to make the code you write as terse as possible.
The amount of code shipped has always been a more important metric for JS than for other languages, because the nature of the web means that users have to wait whilst the source code is downloaded before your page becomes interactive (for a huge class of applications). This encourages developers to favour smaller libraries that solve narrower problem domains.
It's become very trendy to write a smaller, faster, better, smarter version of existing libraries. The JavaScript community loves the process of picking a catchy name, registering a domain, designing a logo, and publishing packages as though they were businesses. This creates an abundance of packages that look great on paper, but with no users, patchy/non-existent tests and maintainers that haven't ever used the code in a professional context.
Finally, I think JavaScript is a deceptively simple language. It doesn't take very long before people (mistakenly) think they're close to mastering the language. By comparison, contributing to an open source project in a meaningful way is quite difficult, so these developers assume that other libraries must be written badly if they find it hard to contribute. Then they write their own, because they believe they can do a better job.
The ecosystem as a whole sees a lot of innovation, and pays for that with a lot of churn and a lot of dependencies. From a theoretical standpoint, it's a fascinating corner of modern programming. In a professional context, it horrifies me and I wish I could sanely cut npm out of the chain.
I think it's the sheer amount of programmers using JS. It's also a very approachable language, so it makes it easy to learn the language before learning good practices.
> While I find projects in those other languages to also have too many dependencies, it's no where near what happens in JS apps. I'm thinking of projects I've recently worked on in Rust, PHP, and Java.
My experience with these new languages is such that this feels a bit unfair. It's like insisting that a disaster with 1000 fatalities is "much worse" than one with "only" 200. It's ... true ... I guess, but there's something uncomfortable about making the comparison. Something has gone badly wrong if the comparison even needs to happen in the first place.
What I'm getting at is that e.g. Rust has an enormous problem in this area. It's not uncommon for me to see Node projects with over a thousand transitive dependencies, but on the other hand, I very frequently see Rust projects with over a hundred. And the Node projects tend to be more complicated than the Rust ones; they do more.
Take the last Rust program I tried to use, tealdeer. [1] If you don't know, tldr is a project that provides alternative simplified man pages for commonly used programs, consisting entirely of easy-to-understand examples. [2] What a tldr client needs to do is simply check a local cache for each lookup, and if necessary update the cache online. It's a trivial problem that can be, and has been [3], solved in a few hundred lines of shell (if you're being extremely verbose). How many recursive dependencies would you guess tealdeer uses? It depends on how you count, of course, but as of today the answer is ~133 deduplicated dependencies! For a program that's a glorified wrapper around curl!
Or another Rust program I looked at recently, rua [4]. In Arch Linux, the AUR is a repository of user maintained scripts for building and installing software as native Arch packages. Official tools for building and installing software already exist for Arch, but it is common for users to use a wrapper around these tools that makes fetching and updating the software from the AUR easier. It's a relatively simple task that (once again) can be done with shell scripts. rua is such a wrapper. As of today it uses 137 deduplicated dependencies!
These Rust programs are simple terminal tools to do tasks that are almost trivial in nature. And yet they require hundreds of constantly updating dependencies! The situation may well be better than what you'll find for Node, but it's undeniably disastrous compared to either simpler languages without a built in package manager (like C) or more complicated batteries-included languages where best practices continue to prevail (like Python).
Package curation exists in other languages, too. C++ has Boost, and Haskell has its Haskell Platform. It helps avoid the pitfalls of languages with large standard libraries (where stability guarantees make "batteries included" turn into obsolete and bitrotting "dead batteries").
This is an idea that every ecosystem eventually realises it needs. Once you've got enough versions of enough libraries that A and B both need C but at different versions, you start to need curation, although that need might not become pressing enough to do anything about for a while. But once you've got curation, cryptographically trusting that curation becomes viable in a way that cryptographic trust of the original packages often isn't.
Putting a layer of "distributions" over language ecosystems, in the same way that "distributions" solve the problem of putting enough mutually-compatible library versions together to get Linux to work, is, I think, inevitable.
> Once you've got enough versions of enough libraries that A and B both need C but at different versions, you start to need curation
Node specifically doesn’t have this constraint, though, as A’s C and B’s C can be loaded independently into the same VM, each one hermetic and only visible to its parent. (This is probably half of why Node’s ecosystem became the way it did, now that I think about it; every other ecosystem hits increasing numbers of constraint-resolution conflict problems as dependency hierarchy depth increases, and so limits itself in hierarchy depth to avoid this.)
> Node specifically doesn’t have this constraint, though, as A’s C and B’s C can be loaded independently into the same VM, each one hermetic and only visible to its parent.
Rust can do this as well, though most version variation is handled via semver rules wrt. compatibility. I think this exact requirement led to some controversy in the Go community at some point? Though they should now have a module system that allows for this?
Another aspect I think is, how much open source dependency do you actually need for your project. You can probably get by with fewer packages than you expect, which also makes it more feasible to run a curated package system.
For example, you can't submit any old Haskell library you put together to Hackage, or you couldn't at the last time I checked. It had to meet certain minimum standards to be considered.
I honestly don't know where you would start in terms of curating NPM, precisely because of the dependency on dependencies. You'd end up curating half of the ecosystem.
Then you have languages like Go, where it feels like the norm is relying on the standard lib, and any dependency really must be providing value to justify it.
My node projects typically have thousands of dependencies, my java and C++ projects have hundreds. My go projects typically have 8-12, and have never once exceeded 40.
Not to stan the Go language itself, I just wish this philosophy was the prevalent one in languages.
Cool, I'm glad there's precedent for it. I'm not opposed to adding guardrails as the post suggests, but it feels like a band-aid on much deeper problem.
> That absolutely dwarfs the number of any other language I've used.
Rust has this flaw too. Last time I wanted to compile a simple project using actix-web and probably a database library, I had 200 crates to compile. In both cases I think it's due partially to preferring small packages/crates.
Note that this separation is necessary if you want to achieve somewhat parallel compilation in Rust. Every proc_macro needs its own separate crate, so I end up providing two crates for macro-based stuff. At least, the stdlib seems to have enough batteries included for the dependency count to not be as high as NodeJS projects.
From my experience I would rate the count of dependencies in this order:
NodeJS > Rust > Python > C/C++
Though I cannot explain why this is ordered this way between {Node, Rust, Python}. Again, batteries included? Language popularity? Beginners-friendly programming language?
I would argue that it's ordered by a mix between popularity, need for those packages and ease of installing packages. JS and Python are around the same order of magnitude of popularity, installing packages is way easier in JS than in Python (NPM might not be great but pip is hell). Rust is less popular but it's very easy to have lots of packages, plus what you mentioned about procedural macros. And the standard library, like JS, is relatively small. If you add Go to that (relatively easy to add package, less popular than Python and JS but more than Rust, lots of stuff in the standard library) which would be higher than C/C++ and lower than Python, it does seem to fit.
I don't know anything about the C#/Java ecosystems, same with Ruby and Perl. I'd like to know if they fit this "model" too, that would be interesting. It could give some pointers on how to design/make a language evolve to avoid having lots of packages.
I'm not sure about the standard library of Rust being bigger. Node ships with an HTTP module, random number generation and regexes; Rust has none of those. Maybe more functions in each module?
> I cannot explain why this is ordered this way between {Node, Rust, Python}. Again, batteries included?
My vote is that its cultural. Both for "batteries included" python, and for nodejs.
It's easy to forget now, but npm was extremely innovative when it first came out. It absolutely went out of its way to make adding dependencies as easy as possible, and the culture (especially in the early days of nodejs) went bananas for this philosophy of programming. I had a chat with @isaacs on a bus one time (he was the maintainer of nodejs at the time). I asked him what he thought about package documentation, and what to do when a README isn't enough. He said that he thinks if a readme isn't enough documentation for your library, your library is probably too big and should be split up.
You still see this today with packages like "isobject" (a tiny function published as a package) which still gets 53M downloads / week[1].
Correct me if I'm wrong, but as I understand it python and ruby still don't support parallel dependencies with different versions in the build tree like npm does. If a python or ruby package transitively depends on foo@1.0 and foo@2.0 then my understanding is that ruby and python lose their minds. And this problem is almost impossible for the end user to solve. So libraries like rails sort of need to be designed as one big block of software.
Nodejs has no problem with this - if you do this in nodejs, npm will just quietly install both versions and node will happily wire everything up correctly. The only constraint is that no single package can have a direct dependency on both foo@1 and foo@2 at the same time. But that's not something that you ever really want to do in practice.
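The on-disk layout ends up looking roughly like this (the exact nesting/hoisting depends on your npm version):

node_modules/
  A/
  foo/                 <- foo@1.0.0, hoisted to the top level for A
  B/
    node_modules/
      foo/             <- foo@2.0.0, nested so it can't clash with A's copy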
The tooling supported this too. It was quite common to have a library with some optional features that not many people used. If you wanted to exclude them from your javascript bundle, until recently the main way to do that was by breaking your library up into small pieces so your library's consumer could pick and choose what they wanted.
Building a dependency tree out of thousands of tiny modules is exactly what nodejs is designed for. It should come as no surprise that that's what we got.
Mentioned in a sibling comment already, but Rust has these features as well, and I think Go ended up getting them too although there was some controversy about the need for them.
AIUI, rustc now has ad-hoc parallelization of compilation within a single crate. The defined compilation unit is still the crate though, as opposed to the single file in C/C++.
It's ultimately Rust's reliance on generic code that forces us to deal with so many packages at build time. In idiomatic C/C++ much of the reusable code you compile ultimately turns into reusable shared objects, that sit in binary form on your fs with their corresponding include files. This is only possible in rare cases with Rust, because the "library" equivalents (crates) don't know how their generic types and code will be instantiated downstream. Everything is the equivalent of a "header only" lib.
Yeah, in language design they might be on the cutting edge. But in software engineering they want to stay close to the cultish software design patterns of the 90s and 00s, the DRY principle, and so on.
> And more of that. But just saying "a lot of dependencies bad" is not useful.
I disagree. The primary problem is that proper dependency auditing is incredibly time-consuming, especially if you want to stay up to date. The reality is that most people simply don't do it at all.
In the below graph, create-react-app has 66 package dependencies and 88 links between packages. The purpose of the graph isn't to disagree with the statement of "800 directories" but to illustrate that part of the problem is depth. The maximum depth (in my hand count) is seven.