Understanding Concurrency, Parallelism and JavaScript (rugu.dev)
99 points by kugurerdem 4 days ago | 26 comments





The one missed distinction is that concurrent tasks can be executing in parallel; concurrency just doesn't imply that they are or aren't.

Basically, all parallel tasks are also concurrent, but there are concurrent tasks which are not executed in parallel.


I like to use "interleaved concurrency" for the latter category.

That reminds me of the JavaScript interpreter written in JavaScript (which is used for Scratch, I think), which supports stepping through multiple instances of the interpreter to achieve a kind of interleaved concurrency.

https://neil.fraser.name/software/JS-Interpreter/docs.html#t...

> JavaScript is single-threaded, but the JS-Interpreter allows one to run multiple threads at the same time. Creating two or more completely independent threads that run separately from each other is trivial: just create two or more instances of the Interpreter, each with its own code, and alternate calling each interpreter's step function. They may communicate indirectly with each other through any external APIs that are provided.

> A more complex case is where two or more threads should share the same global scope..
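Roughly, that alternating pattern looks something like this (a sketch only, assuming the global Interpreter constructor and step() method described in the linked docs, where step() returns false once a program has finished):

  // Sketch: assumes acorn.js and interpreter.js are loaded, exposing the
  // global Interpreter described in the JS-Interpreter docs.
  var threadA = new Interpreter('var x = 0; while (x < 3) { x++; }');
  var threadB = new Interpreter('var y = 9; while (y > 5) { y--; }');

  // Interleave the two programs: each step() call executes one small unit
  // of work, so neither instance blocks the other for long.
  var aRunning = true, bRunning = true;
  while (aRunning || bRunning) {
    if (aRunning) aRunning = threadA.step();
    if (bRunning) bRunning = threadB.step();
  }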


That is how I learned it at uni too. Unfortunately, I have found these definitions are all over the place in some literature.

Pragmatically it doesn't feel like there's a lot of difference to humans, because even single-threaded concurrent code feels like it's parallel because computers are just so, so much faster than humans that switching rapidly feels parallel. And you often are well advised to treat it as parallel anyhow, at least to some extent, for safety's sake. That is, you may be able to get away with less "locking" in concurrent-but-not-parallel code, but you still can't write it exactly the same as conventionally single-threaded code or you will still get into a lot of trouble very quickly.

So pragmatically I don't find a lot of value in distinguishing between "concurrent" and "parallel". Some. But not a lot.

There is a difference for sure. It just isn't useful in proportion to the way people like to jump up and correct people about the differences.


Think about it not in terms of timing but in terms of instructions. Parallelism requires that different instructions can execute with resource independence. In JavaScript that requires something like Web Workers or Node clusters. Concurrency merely means one task does not block another, which is less restrictive and more commonly available through the event loop.
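A rough sketch of the difference (the endpoints and worker.js file are made up for illustration): awaited fetches interleave on the single main thread, while a Web Worker runs its own script on a genuinely separate thread:

  // Concurrency on the event loop: both requests are in flight at the same
  // time, but all of the JS continuations run on the single main thread.
  async function loadBoth() {
    const [a, b] = await Promise.all([
      fetch('/api/a').then(r => r.json()),  // '/api/a' and '/api/b' are
      fetch('/api/b').then(r => r.json()),  // hypothetical endpoints
    ]);
    return { a, b };
  }

  // Parallelism: a Web Worker executes its own script on a separate thread,
  // so CPU-heavy work can run at the same time as the code above.
  const worker = new Worker('worker.js');   // hypothetical worker script
  worker.postMessage({ cmd: 'crunch' });
  worker.onmessage = (e) => console.log('worker result:', e.data);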

To sort through the "dialect of jargon" that a piece of educational CS/SWEng writing is using, it helps to put it in the context of its publication. Consider both when it was published, and what kind of work the author of the writing does.

Why "when it was published"?

Well, mainframes were single-core until the 70s, and then multi-core (really, massively multi-socket NUMA) thereafter. PCs were single-core until the late 90s, and then multi-core thereafter. Thus:

• Anyone writing about "concurrency" in some old Comp Sci paper, is probably thinking about "concurrency" as it pertains to their local single-core Minix mainframe, or on their single-core Sparc/NeXT/SGI university workstation — and so is inherently thinking and talking about some form of cooperative or pre-emptive multi-tasking through context-switching on a single core, with or without hardware-assisted process address-space isolation.

• Anyone writing about "concurrency" in some old "industry-sponsored" Software Engineering or (especially) Operational Research paper, was likely working with the early parallel batch-processing mainframes, or perhaps with HPC clusters — and so is much more loose/sloppy with their definitions. In their mental model, there is no context-switching — there is just a cluster-level workload scheduler (itself bound to a core) which assigns workloads to cores; where these workloads are essentially single-threaded or shared-nothing-multi-threaded virtual machines, which own the cores they run on until they finish. (Actually very similar to writing CUDA code for a GPU!) To them, the only kind of concurrency that exists is parallelism, so they just use the words interchangeably.

And why "what kind of work the author does"?

Well, anyone writing about "concurrency" today is writing in a context where everything — even the tiniest little microcontrollers — has both multiple cores and the inherent hardware capability to do pre-emptive multitasking (if not necessarily the memory for doing so to make any sense — unless your "tasks" can be measured in kilobytes); and yet everything does it in a slightly different way. Which in turn means that:

• If such a person is writing product software, not in control of the deploy environment for their software — then to them, "concurrency" and "parallelism" both just mean "using the multi-threading abstractions provided by the runtime, where these might become OS threads or green-threads, might pin their schedulers to a core during CPU work or not, might yield the core during IO or not, who knows." None of these things can be guaranteed — even core count can't be guaranteed — so their definition of what "concurrency" or even "parallelism" will do for them, has to be rather weak. To these people, "parallelism" is "nice if you can get it" — but not something they think about much, as they have to write code under the assumption that the software will inevitably get stuck running on a single core (e.g. on a heavily-overloaded system) at some point; and they must ensure it won't deadlock or livelock under those conditions.

• Meanwhile, if such a person comes from a background of writing SaaS software (i.e. software that runs in a big knowable environment), then anything they say about concurrency / parallelism is likely founded on the assumption of having large distributed clusters of big highly-multicore servers, where the chief concern isn't actually in achieving higher throughput through parallelism (as that part is "easy"), but in resolving distributed data races through write-linearization via artificial concurrency-bottlenecking abstractions like channels, message queues, or actors that hold their own linear inboxes. For these types, "parallelism" puts them in mind of the "multi-threaded with shared state behind semaphores" model that they want to avoid at all costs to keep their software scalable and ensure distributed fault-tolerance. So this type prefers to talk about designing architectures made of little actors that are individually intentionally concurrent-but-not-parallel; and then hand-wavingly introducing shared-nothing instances or pools of these trees of little actors, that can live and move within greater parallel distributed clusters.

• And if such a person comes from a background of writing embedded software (i.e. software that runs in a small knowable environment), then their assumption will likely be founded on a concern for achieving realtime dataflow semantics for at least some parts of the system — requiring some, but not all, of the cores to sit there bound to particular tasks and spin-waiting if they finish their processing step early; while other cores are free to be "application cores", executing arbitrarily-long instruction sequences. To these people, "concurrency" is mostly the frustrating low-level process of handing off data between the real-time and non-realtime worlds, using lockless abstractions like shared ring buffers; and "parallelism" is mostly just getting the single most expensive part of the application to schedule a pool of almost-identical expensive operations onto a pool of specialized identical cores. (This is the design perspective that made the PS3's Cell architecture seem like a good idea.)


If you prefer to learn by video, here's an excellent talk on the same subject by Rob Pike that I link people to all the time:

https://www.youtube.com/watch?v=oV9rvDllKEg


I had posted this as well before realizing you had beaten me to it!

For the uninitiated, this is the seminal and oft-cited Go talk "Concurrency is not Parallelism" by Rob Pike - well worth the time for anyone, even outside the Go community.

I'd encourage anyone who finds themselves in the comments here to check it out; it expands on the content of the article beautifully.

Another link, with some added context

https://go.dev/blog/waza-talk

As well as the slides for those who may not want to watch the video

https://go.dev/talks/2012/waza.slide#1


Good video.

(My paraphrase of) Rob Pike's tldr:

Concurrency is a characteristic of the design that allows multiple tasks to execute, coordinate and be scaled.

Parallelism is a characteristic of the runtime that allows simultaneous execution of those tasks.


I like to think of them as different levels. Concurrency is at a higher abstraction level: steps that can execute without needing to wait on each other. Parallelism is a bit lower and reflects the ability to actually execute the steps at the same time.

Sometimes you can have concurrent units like multiple threads, but a single CPU, so they won't execute in parallel. In an environment with multiple CPUs they might execute in parallel.


I just learned something! I realize now I was talking about parallelism in a recent interview question about concurrency. Oh well.

Concurrency often is about running your I/O routines in parallel, achieving higher bandwidth. For example, one computer handling 50 concurrent HTTP requests simultaneously.

No single HTTP request uses all the CPU power or even your Ethernet bandwidth. The bulk of the waiting comes down to latency. So while one task is waiting on Ethernet responses under the hood, the system should do something else.
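As a sketch (the endpoints are made up), firing all the requests before awaiting any of them lets the latencies overlap instead of adding up:

  // Sequential: total time is roughly the sum of 50 round-trip latencies.
  // Concurrent: total time is roughly the slowest single round trip.
  async function fetchAll(urls) {
    const responses = await Promise.all(urls.map(u => fetch(u)));
    return Promise.all(responses.map(r => r.json()));
  }

  // Hypothetical endpoints, purely for illustration.
  const urls = Array.from({ length: 50 }, (_, i) => `/api/items/${i}`);
  fetchAll(urls).then(items => console.log(items.length, 'responses'));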

Hard drives are another example: you can get random I/O bandwidth of 5MB/s or so, but every request takes about 4ms on average for a 7200 RPM drive (i.e. 120 rotations per second, or about 8 milliseconds for a complete rotation, so 4ms on average for any request to complete).

So while waiting for the HDD to respond, your OS can schedule other reads or writes along the head's path, which improves average performance (ex: if 8 requests all lie within the path of the head, you'll still wait 4ms on average for the first one, but maybe one will complete every 1ms after that).

----------

Parallelism is often about CPU-limited situations where you use a 2nd CPU (today called a core). For example, if one CPU core is too slow, you can use a 2nd, or 8, or even 128 cores simultaneously.
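In Node, for example, a sketch along these lines (the summation loop is just a stand-in for real CPU-bound work) spreads the work across cores with the worker_threads module:

  // CPU parallelism with Node's worker_threads: each Worker runs on its own
  // thread and can occupy its own core.
  const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

  if (isMainThread) {
    const jobs = [1e8, 1e8, 1e8, 1e8].map(n =>
      new Promise(resolve => {
        const w = new Worker(__filename, { workerData: n });
        w.on('message', resolve);
      })
    );
    Promise.all(jobs).then(results => console.log('partial sums:', results));
  } else {
    // Stand-in for real CPU-bound work: sum the first N integers.
    let sum = 0;
    for (let i = 0; i < workerData; i++) sum += i;
    parentPort.postMessage(sum);
  }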

------------

Hyperthreading is the CPU designers (Intel and AMD) applying the above concurrency technique to modern RAM, because a single RAM read takes something like 50ns, or 200 clock ticks. Any RAM-latency problem (ex: linked list traversals) benefits from the CPU core doing something else while waiting for RAM to respond.

-----

Different programming languages have different patterns to make these situations easier to program.


Thinking about this - is there a term for tasks which are partially parallel? That is to say, X starts at 0.1 and ends at 1, and Y starts at 0.2 and ends at 0.9 - X and Y are not parallel, but they are something I'm not sure of the technical term for. (This is assuming they are not executed concurrently either, of course.)

If there are two processes/threads, they are parallel; if there is just one, it is concurrent. For example, the event loop in JS is concurrent, while offloading computation to workers is parallel.

Sure, but "parallel" comes from the English word anyway, inheriting the meaning "occurring or existing at the same time or in a similar way; corresponding." Two processes that exist at the same time, all the time, are, to coin a phrase, classically parallel (by the original English usage), but two processes that exist at the same time only part of the time are not parallel in the original meaning, although they may be for CS. The question is whether there is a special technical term for this condition that anyone knows of.

> Two processes that exist at the same time, all the time, are, to coin a phrase, classically parallel

This is not physically realizable.

> but two processes that exist at the same time only part of the time are not parallel in the original meaning

That doesn't follow; both formally and colloquially one would still refer to them as "in parallel."

---

In a very loose way you're talking about structured vs unstructured parallelism, but it has nothing to do with time; you can have parallel processes that start at the same point but do not join at the same point, and it's still considered "structured."

Structured parallelism is where the distinction of where a process starts/joins is meaningful semantically, but there is not really a distinction between "these N things live for the same amount of time" and "N things have different lifetimes" in terms of vocabulary. They're all still parallel or concurrent processes.


Thanks, that was helpful. I suppose when you say "vocabulary" and "colloquially" you mean this in the context of computer science, since I think "parallel" outside CS has in some ways both laxer and stricter meanings (that is to say, a parallel process in colloquial English usage is something that runs for the same span of time).

That said I have trouble finding resources on structured and unstructured parallel processes, google just wants to tell me about structured and unstructured data, do you have some links?


I think there's a language gap here: in colloquial English, saying "two things in parallel" means they happen at the same time, but it does not imply anything about when the events start or end. A common trap for programmers is that "concurrent" and "parallel" are often synonyms in colloquial English, while they have more formal definitions in computer science.

> That said I have trouble finding resources on structured and unstructured parallel processes

That's because you're searching for the wrong thing; you should be looking up sources on parallelism in the context of structured programming, specifically structured concurrency. Googling for "structured concurrency" or "structured parallel programming" turns up plenty of results. In multithreaded code the data structure is sometimes called a "scoped thread."

To be perfectly clear: structured concurrency/scoped threads do not require that the start/end of tasks are equal. They only allow you to rely on the semantics of them starting/ending when structuring a program, which may include that constraint, but does not explicitly refer to it.


thanks!

Are you really trying to learn something here? Because it seems you can't accept that you are thinking about it wrongly and that your definition has no backing in CS (or outside it). Google doesn't just want to tell you about structured and unstructured data; what you are looking for doesn't exist. Parallelism (in CS or outside of it) doesn't mean things start and end at the same time, just that they happen in PARALLEL. That's really not such a difficult concept to grasp, but you need to have some humility to just say: oops, I might have a wrong understanding and might need to fix it. This is healthy and makes people smarter.

probably I should just drop it but

>Are you really trying to learn something here?

sure, which is why when user "duped" said

>>In a very loose way you're talking about structured vs unstructured parallelism

I replied

>>That said I have trouble finding resources on structured and unstructured parallel processes, google just wants to tell me about structured and unstructured data, do you have some links?

As I assumed "structured" and "unstructured" were referring to parallel processes - I can find "structured vs unstructured concurrency", but that of course goes back to the top-level discussion that concurrency and parallelism are not exactly the same.

>what you are looking for doesn't exist.

OK, because someone other than you (unless you are using two accounts) somewhat suggested that it did exist - that being the user named "duped" that I was replying to, and whose terms I used in my question. If you are "duped" (short for duplicated?), then why did you imply these concepts did exist in your earlier message?

>Parallelism (in CS or outside of it) doesn't mean things start and end at the same time just that they happen in PARALLEL.

Sorry, but the definition of Parallel "outside" of CS - when dealing with time - is "occurring or existing at the same time or in a similar way; corresponding."

>That's really not such a difficult concept to grasp but you need to have some humility to just say: oops, I might have a wrong understanding and might need to fix it. This is healthy and make people smarter.

OK, but I am pretty used to standard English usage and my understanding is that vernacular usage of parallel - when referring to time - is that they start and end at the same time (which is also how I would define "happens at the same time"; evidently you have a different definition of that, though). Whether or not your definition or mine is right, it seems to me that mine lines up with the Oxford Languages dictionary definition I quoted above.

However, given that two lines can be parallel even if they do not have the same length, perhaps I am too literal in my time-parallel usage - though every usage of "parallel" for time that I have ever seen seems to imply that running over the same time-span is the meaning.

on edit: changed distance to length as distance is meaningful in relation to parallel lines.


> OK, but I am pretty used to standard English usage and my understanding is that vernacular usage of parallel - when referring to time - is that they start and end at the same time

So what happens if one task is simpler and ends earlier? Does it need to continue running, waiting for the other task so that they end together, in order to remain parallel? Maybe I'm in the wrong here and would love to understand it, but to me it doesn't make sense (English isn't my native language, though).


After discussing it on English StackExchange (https://english.stackexchange.com/questions/625990/meaning-o...) I have to conclude that my definition is probably too strict (based on other people's opinions, as language is a shared construct), although I still think maybe I just noticed something about the usage that other people haven't. That is to say, my natural feeling for English usage is that parallel things start and end at the same time (on the same time scale - not assuming millisecond accuracy here); otherwise we would say that the first reign of Napoleon and the presidency of Thomas Jefferson were in parallel with each other, which to me at least seems absurd - they overlapped each other, or were contemporaneous, but not parallel.

This, however, just seems to be my own feeling that nobody else shares, which I find strange. I guess it's one of those "how do I know the color orange is the same for you as it is for me" things - only in this case it turns out the color orange totally isn't the same for the two of us.


When I say "start and end at the same time", this of course refers to the time scale used. For example, if I am talking in years and say Mr. Adams started his project in 1920 and finished in 1928, and then later I say Mr. Bronder worked on his project in parallel to Mr. Adams, I would expect that to mean Mr. Bronder started his project in 1920 and finished in 1928 - not that they started in the exact same minute - because the time scale used is years.

Learn Go and you will understand concurrency


