Common Multithreading Mistakes in C# – Unsafe Assumptions

exDM69 · on Feb 22, 2017

Nice writeup, and most of this isn't specific to C#, but applies to native code (or Java) as well. These are typical misconceptions of people who have not studied concurrecy.

What is specific to C# (applies to Java too) is having an implicit mutex/condition pair in each object. I think this is a terrible design mistake because it's not very practical and it's confusing to newbies (a big stumbling block for students when I studied concurrency at the uni).

It's not very practical because in most concurrency tasks I've dealt with (not using C# or Java), there's typically more conditions than there are mutexes (typically 2-4 + O(n) conditions per mutex). In Java/C# land, the typical solution would be to have a complex conditional expression, all the threads spinning on a .wait() there and then over-use of .notifyAll() instead of .notify() causing lots of spurious wakeups and wasting precious cpu cycles.

It's confusing because of the reason above, it's easy to go "I'll solve this by adding another mutex". Unfortunately this is seldom the correct solution (to any problem), and a much better result would be achieved adding more condition variables to wait on.

I wouldn't mind too much about a design mistake in a language I almost never use (Java/C#) if it wasn't the de facto language for learning about concurrency at universities. This has produced so many engineer with a twisted view on concurrency. I understand that "Java is easy" and "C is hard" but when we're already talking about memory models, multi-core, cache coherency and atomicity, C-the-language isn't the hard part and Java-the-language does very little to help with those parts.

pvg · on Feb 22, 2017

In Java/C# land, the typical solution would be to have a complex conditional expression, all the threads spinning on a .wait() there and then over-use of .notifyAll() instead of .notify() causing lots of spurious wakeups and wasting precious cpu cycles.

The typical solution is to use the higher-level constructs provided by the standard library that take care of these details for you. In Java, at least, explicit mutexes, notify, etc in application code are roughly the equivalent of typing 'AES'. I'm somewhat surprised this C# article happily tells you to just keep adding locks until you think things work. "I mean if you have a hunchback, just throw a little glitter on it, honey, and go dancing."

zzzcpan · on Feb 22, 2017

It's more complicated than that. Implicit synchronization and multithreading in libraries is error-prone, poorly performant, has a significant cognitive overhead. It's a liability if you are chasing performance and a liability if you aren't. So to be good citizens libraries have to assume that a user might need a complete control in choosing and using a concurrency model (of course unless they are the ones providing concurrency models themselves).

pvg · on Feb 22, 2017

I'm talking about libraries that provide higher-level concurrency constructs not implicit multithreading in a library that does, say, image manipulation. The whole point of the former is to reduce cognitive load and errors.

zzzcpan · on Feb 22, 2017

I assumed you were talking about things like concurrent queues, etc. They do implicit synchronization and enforce a specific concurrency model, so they should not be used by libraries.

m_fayer · on Feb 22, 2017

Yes, but I think he also meant things like the TPL, PLINQ, TPL Dataflow, RX, Akka.NET and so on. I have been happily using one or more of these to write tons of event driven and/or parallel client-side code and haven't touched a lock in years.

And libraries exposing APIs through observables is a great idea.

eropple · on Feb 22, 2017

I don't dislike TPL and PLINQ (though I don't much care for Akka.NET at all), but at the same time I pretty regularly write code with locks. There are problems for which the simplest solution is to treat it stupidly and imperatively and to run those threads side-by-side with well-defined critical sections.

m_fayer · on Feb 22, 2017

For dead-simple parallel execution of a handful of jobs, IMO nothing beats mapping data to tasks with Linq and then "await Task.WhenAll". And if they have to share data, use something in System.Collections.Concurrent.

emn13 · on Feb 22, 2017

For fun bits of API inconsistency there, would you think that...

"task.Wait()" is to "await task" as "Task.WaitAll(tasks)" is to "await Task.WhenAll(tasks)"?

Maybe it's just me, but I sure did.

bonzini · on Feb 22, 2017

WhenAll returns a task (a promise, essentially) that completes "when all" its arguments have completed.

emn13 · on Feb 22, 2017

I seem to have misremembered, the odd one out was WhenAny, and the issue is that it crashes when given the empty set, instead of waiting forever (which is fine if you're dealing with a specific number of tasks, but not fine if you're actually using it for a dynamic number of tasks).

I could have sworn it was WaitAll, but heck - clearly not! Sorry :-).

pvg · on Feb 22, 2017

Whether they should or should not be used by libraries is a completely different kettle of turtle wax, I did say 'application code' above.

vvanders · on Feb 22, 2017

Another downside of the "throw another mutex at it" is if you're caring about performance(which you probably are) ti's a great way to introduce overhead via death-by-a-thousand-cuts.

Each mutex you take has a not-trivial cost but doesn't show up well on a profiler since the individual call is small and spread across an invocation path.

sharpercoder · on Feb 22, 2017

The more threading code I have seen, the more convinced I become that it should be avoided at all cost. The pitfalls are nonobvious and potentially incurring datacorruption. Rather, they should be behind understandable abstractions (like await/async and Task/Task<T>, but those have their own pitfalls) where applicable. Also, I can see value in an "unsafe" block (e.g. a "threading {}" keyword) where the writer of code communicates to the reader "inspect this with special threading awareness"

ro_sharp · on Feb 22, 2017

I concur, with a caveat - feel free to use threading / async I/O if you think you really understand the runtime behaviour of your code.

I've seen so much code that just runs TaskFactory.StartNew some arbitrary number of times in a loop to run some CPU intensive task with no/effectively no I/O, presupposing a performance improvement as a result.

Understand runtime performance. Understand the different failure modes inherent in multi-threaded approaches, and lastly: test, profile, test and profile again!

flukus · on Feb 22, 2017

> and lastly: test, profile, test and profile again!

This, this and this. I see so much code where threading and caching are thrown around as a solution to a performance and no one ever tests to see if it's actually an improvement. I saw one the other day that used multiple threads to call a micro service which locked execution to a single caller and most of the bottleneck was serialization/deserialization. Making the microservice a library would have been much more performant.

thomasz · on Feb 22, 2017

Is there something wrong with

    for(int i = 0; i < Environment.ProcessorCount; i++)
        Task.Factory.StartNew(...);
?

quinnftw · on Feb 22, 2017

Potentially, yes. Assuming that you are correctly synchronizing the actions inside each Task as to avoid race conditions (a big assumption), there is no control on the number of threads here. What if your program is already using a bunch of threads for something else, and then you go ahead and spawn off 10 more? You will oversaturate the CPU and end up with virtual threads, thus incurring scheduling overhead. In my opinion when you are doing this type of thing you should almost always use a thread pool to ensure that you aren't creating an unreasonable number of threads.

mdpopescu · on Feb 22, 2017

For one thing, prefer Task.Run to Task.Factory.StartNew - see https://blogs.msdn.microsoft.com/pfxteam/2011/10/24/task-run...

prefect42 · on Feb 22, 2017

I've been doing concurrent programming for 17 years, it's been pretty much the same game, at least with Java/C/C#/C++ which is my experience. The abstractions are helpful, but clearly no panacea. The most common problems I see today are with Tasks w/lambdas and Parallel.ForEach, where synchronization is either completely missing or misused, or unnecessary (i.e. a better design would have been to remove all shared state to begin with). The next main problem I run into, folks tend to sprinkle in concurrent code in an ad-hoc fashion, even using the understandable abstractions. That works fine, until it doesn't, with the "not working" state being rather difficult to detect.

markdoubleyou · on Feb 22, 2017

Yeah, the industry is aching for a good solution. My favorite articles on the topic are:

- Edward Lee's "The Problem with Threads": https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-...

- The occasionally hilarious chapter on concurrency (especially the "Concurrentgate" section) from Andrei Alexandrescu's "The D Programming Language", available in its entirety here: http://www.informit.com/articles/article.aspx?p=1609144

I think MS made great progress with the TPL and kickstarted an industry-wide movement with async/await. But certain aspects of C# still drive me nuts and will allow a junior dev to blow a leg off (I'd give my left arm for C++-style const references and/or compiler-enforced immutability). At one point I went so far as to play around with a Rosyln analyzer to tackle the problem (https://github.com/markwaterman/CondensedDotNet/blob/master/...), but gave up after realizing anything more then a token effort would be a huge undertaking.

Other languages are nibbling away at the edge of the concurrency problem with language-level support for CSP (golang), actors, etc., but, outside of the functional world (Erlang), I don't seen anyone working to address concurrency from the ground up.

Scramblejams · on Feb 22, 2017

If you haven't, you might want to check out Pony.

btschaegg · on Feb 23, 2017

> The more threading code I have seen, the more convinced I become that it should be avoided at all cost.

My last boss had the same approach to threading. Although I understand the reasoning (usually it's based on some bad experiences with messy multithreaded code), I still strongly disagree with the premise.

Of course, the approach can be fine depending on the requirements (in which case, of course, it's preferrable), but generalizing the statement can be a huge mistake.

Most of the threading code I've seen got so horrible precisely because it was tucked on after the fact. If you start your codebase on the premise that everything is single-threaded, switching the important bits to something remotely concurrency-compatible often requires major rewrites. Thus, I consider it to be an important architectural decision one should make early on. You don't have to go through with it and start with a huge degree of concurrency, but while you're implementing the core of your application, you should be aware where it makes sense to design around the possibility of concurrency. Once the API is suited in that way, you avoided a major headache while trying to make your glacial code faster in a hurry.

Of course, I'm referring to concurrency on the architecture level, not the trivial cases in which individual operations can be sped up. Having those issues in the back of your head while designing is one of the things that makes a good software architect. Too often it's omissions like this that will be the cause for explanation like "I know it's a mess, but it has grown organically and we can't change it now." down the line if you're explaining your project to a newcomer.

Edit: As an addendum: The problem with switching to multithreaded code also doesn't only lie in the amount of work it entails and the amount of code you'll have to touch to do it, but also in the amount of code you forget to touch. It's very easy to introduce very subtle race conditions that only show in very specific edge cases. But when they occur, you'll have a major firefighting job on your hand. I feel those are easier to avoid if you try to design for concurrency up front instead of trying to remember everything that could potentially break after the fact.

sidlls · on Feb 22, 2017

I'm not sure why this is getting downvotes. I think some folks try to leverage this tool because they think it's "more performant" or "an optimization" or whatever and the result is some questionable threading code.

Concurrent programming is hard and complicated, even in languages with superior safety guarantees, especially for non-trivial, non-toy implementations. The harder it is to implement a solution, the more care and consideration for whether it's necessary or some less difficult or complex solution might suffice instead, in my opinion.

ibgib · on Feb 22, 2017

I enjoyed this article. I started concurrency with Delphi 5 using critical sections, mutexes, semaphores, etc. Then I started using C# which was a major upgrade, and I loved having the simplicity of things like the `BackgroundWorker` class and the `lock` statement which certainly beat the need to write my own multi-threaded class for every little thing in Delphi. I recall some of article's mentioned atomicity problems, and I would have to juggle when to use Interlocked vs locks, etc. For example, it mentions floats and hardware-specific issues with 64-bit ints, but from what I recall I treated booleans as atomic and not needing Interlocked or locks, but floats I did. That CPU caching detail level though sounds mighty insidious. I also forged ahead and used TPL, async/await, Rx (Observables are pretty awesome)...

But now I live on Erlang's BEAM (via Elixir) and I freaking love it. The real gain for me is that I found I didn't need mutexes, locks, critical sections, etc., because the super lightweight thread-like Erlang processes (not OS processes) themselves run in parallel but each one runs in a single-threaded manner. This effectively turns each process itself into its own critical section, and it's this aspect that I personally have found extremely valuable.

bonzini · on Feb 22, 2017

Do your Erlang processes never execute an asynchronous (await-like) operation? If they did, any shared data would need to be protected by a mutex just as like as with multiple OS threads.

ibgib · on Feb 22, 2017

> Do your Erlang processes never execute an asynchronous (await-like) operation? If they did, any shared data would need to be protected by a mutex just as like as with multiple OS threads.

I think perhaps what you are getting at is that there is still need for coordination for parallel processes, and that is totally correct. But each BEAM(!) process has its own non-shared memory and runs on a single thread. This is my point in that it doesn't need the mutex because the process' execution is its own critical section. I've structured my ibGib engine to be more functional and the need for the mutex/critical sections has disappeared for me.

For example, check out this SO post that I just googled: http://stackoverflow.com/questions/28554114/applying-a-mutex...

In the answer, he tells the OP (who thinks he needs a mutex) that what he can actually do is just do the work in the process. This kind of understanding of single-threaded "bounded context"-like execution, combined with ubiquitous immutability has been massively helpful for me with ibGib.

In C# for example, I was trying to apply `Task<ib>` all over the place (yes it is non-idiomatic lowercasing of the interface...totally aware of this sin in C#). And so even with async/await awesomeness, the code is just riddled with async/await, blah blah blah. The same goes for using Rx in C# for a "microservices" engine (I started it before that was a fad and I called them autonomous services lol) and my more recent POC with Rxjs in TypeScript/JavaScript. The point is the whole approach of concurrency and whatnot are "addons" in the form of mutexes, semaphores, critical sections, locks, barriers (I've used quite a few over the years). But with programming on the BEAM(!) with Elixir and Erlang, parallel coding is just a more natural experience, because it's just how it's done.

In my view, this largely stems from the aspect I mentioned: that they have parallel processes, each running its own memory and on a single thread, communicating with message passing (now called the Actor Model but they didn't know that at the time). On top of this fundamental design decision, they built up an awesome infrastructure with OTP, Supervision trees, and more. But anyway, I didn't mean to write that much - but it's just been a delightful experience, since my very first application in Delphi was a transcription app with a from-scratch multi-threaded realtime document checker (like a spell checker, but more pluggable and geared towards transcription with text expansion which is a totally different use case than existing off-the-shelf spell checkers).

achr2 · on Feb 22, 2017

It's odd to have a whole section on coherency and never mention the built-in `volatile` keyword who's entire purpose is to indicate that a variable requires fresh reads..

squeaky-clean · on Feb 22, 2017

> In this case, the correct solution is to mark field _A as volatile. If that’s done, the compiler shouldn’t reorder the reads of _A and _B, because the load of _A has load-acquire semantics. However, I should point out that the .NET Framework, through version 4, doesn’t handle this case correctly and, in fact, marking the _A field as volatile will not prevent the read reordering. This issue has been fixed in the .NET Framework version 4.5.

From the article linked where they mention volatile and the C# memory model. Seems like volatile may not actually have worked as intended for this case!

achr2 · on Feb 22, 2017

I'm not saying volatile is a good solution, but to not cover it in this article is an odd omission - especially since he shows an explicit memory barrier.

mathw · on Feb 22, 2017

I'd like some more explanation of why the author thinks it's better to use a lock statement than an Interlocked method most of the time.

This is really just fundamental concurrency stuff though. Something which is sadly in short supply in some people's skill sets, but then I guess not everyone spent four years working on massively multithreaded C++ software early in their career like I did.

I'd really prefer it if C# made you share memory explicitly - default shared memory concurrency is just asking for trouble, in this and many other languages, because you have to do extra to do things right, rather than extra to do things wrong.

chaostheory · on Feb 22, 2017

> Something which is sadly in short supply in some people's skill sets

Things may have changed, but when I last looked there are very few books on multithreading, especially when it comes to a particular language. A lot of it seems to just be learned on the job using reference docs

achr2 · on Feb 22, 2017

I agree about the use of Interlocked. Obviously if you already need a lock around additional code, there is no need to use Interlocked, but for any single statements where Interlocked is available, it is a simple performance gain. That said I have seen several incorrect and unnecessary `CompareExchange` usages that could have been avoided with a simple lock...

Sharlin · on Feb 22, 2017

> That said I have seen several incorrect and unnecessary `CompareExchange` usages that could have been avoided with a simple lock...

This is, obviously, the answer to the GP's question.

TillE · on Feb 22, 2017

That's no reason for avoiding atomics in simple use cases. "Use locks and then profile" is fairly bad advice. Forget the profiler and always use atomics in simple situations where you know exactly how they work. Just don't try anything complicated with them.

blahfuk · on Feb 22, 2017

However, interlocked is really the only way to create (most) lock-free data structures. Fortunately, there are a few libraries available for that now, but it wasn't always that way in C#.

gwbas1c · on Feb 22, 2017

Yeah, I tend to prefer various forms of Interlocked to avoid deadlocks that happen 0.01% of the time.

jaegerpicker · on Feb 22, 2017

It's a cool series. Mostly review for me but I certainly passed it to my team to review. Ultimately I say that 99% of the time the correct answer is to not use raw threads and never share memory. There are much better concurrency and parallel patterns out there. Akka(.net in this case), Coroutines and Channels, Erlang Actors, Clojure's core.async, .net's async/await even. Honestly if you are reaching for threads you probably should ask yourself if some other way would be better and safer.

cm2187 · on Feb 22, 2017

I learned something about Thread.MemoryBarrier().

I expected the following to be safe:

    Dim V(99) As Integer
    Parallel.For(0, 100, Sub(i)
                           V(i) = i
                         End Sub)
    Dim Result = V.Sum

If I understand correctly it looks like I need to flush the memory before accessing V:

    Dim V(99) As Integer
    Parallel.For(0, 100, Sub(i)
                           V(i) = i
                         End Sub)
    Threading.Thread.MemoryBarrier()
    Dim Result = V.Sum

taspeotis · on Feb 22, 2017

The CLR has some implicit memory barriers that make most things "just work."

Take a look at "[t]he following implicitly generate full fences" here: http://www.albahari.com/threading/part4.aspx.

Parallel.For would probably be covered by "[a]nything that relies on signaling." There's more info scattered about in some Stack Overflow answers. The content is useful, just unofficial.

E.g. http://stackoverflow.com/a/6932271/242520, http://stackoverflow.com/a/681872/242520

Matthias247 · on Feb 22, 2017

That's interesting, I had thought the first one to be safe since Parallel.For should already imply synchronization: The operation should only return once everything was done - and the waiting everything done part of it should act as a memory barrier.

gnur · on Feb 22, 2017

I know there is a website that has gamified certain race conditions. There was a interface that allowed you to step through code in 2 "threads" and the goal was to defeat the enemy army though deadlocking their code.

Does anyone know where it is? It was very informative..

ttrunck · on Feb 22, 2017

I know this one: https://deadlockempire.github.io/#menu

jsingleton · on Feb 22, 2017

Nice post. Simply written and easy to understand.

There are two others:

- http://benbowen.blog/post/cmmics_i/

- http://benbowen.blog/post/cmmics_ii/

It's well worth learning about the Interlocked class if you want to do parallel programming. If you get it wrong then you will see incorrect/corrupt results. It's also worth keeping in mind that for simple tasks it can be quicker to do them single threaded, due to the overheads involved. I demonstrated this in a simple benchmark app [0] that I wrote for a chapter of my book on ASP.NET Core [1].

[0]: https://github.com/PacktPublishing/ASP.NET-Core-1.0-High-Per...

[1]: https://unop.uk/book/

manigandham · on Feb 22, 2017

Always use Interlocked over a lock statement if you're just dealing with incrementing numbers. It's 1-line, faster and always safe. Author is mistaken for advocating a lock in these situations.

Only need lock() and the full Monitor classes if there's more to be done within the locked statement.

sidlls · on Feb 22, 2017

These are common mistakes novices to multithreaded programming make, presented and explained well. Very nice.

In my experience another mistake made by novices and journeymen alike (at least occasionally by the latter) is to reach for this tool without careful consideration of whether it's even necessary.

bbcbasic · on Feb 22, 2017

I'm such a novice I avoid reaching for the tool ;)

TazeTSchnitzel · on Feb 22, 2017

“Newbie” question: if reading a value without having a lock on it can give you a torn read, wouldn't it be possible to crash the CLR by getting a torn read on a reference? Or, does it account for this somehow?

Eyas · on Feb 22, 2017

A reference fits perfectly in a word-sized chunk. A pointer/reference is typically 32 or 64-bits wide, since it is simply pointing to a location in memory.

dsp1234 · on Feb 23, 2017

from the ECMA-335 standard (which is not the latest reference, but is easily accessible):

"A conforming CLI shall guarantee that read and write access to properly aligned memory locations no larger than the native word size (the size of type native int) is atomic (see §I.12.6.2) when all the write accesses to a location are the same size."

Since references are the native word size, the CLR will ensure that there is no tearing.

Note that it is possible to produce a torn read variables that are larger than a native-int, even of native types such as decimal[1].

[0] - http://www.ecma-international.org/publications/files/ECMA-ST... section I.12.6.6, page 102

[1] - http://stackoverflow.com/questions/23262513/reproduce-torn-r...

Pxtl · on Feb 22, 2017

The one that caught me recently was assuming that you could use a Dictionary as a concurrent cache.

You can't. Dictionary can be read concurrently, but once you start writing to it concurrently all bets are off. ConcurrentDictionary implements IDictionary and allows concurrent writes.

Also, Entity Framework. We had an icky bug coming from somebody storing the datacontext in a member variable. Don't do that.

tracker1 · on Feb 22, 2017

You want ConcurrentDictionary[1]... the worst one I saw was attaching information keyed off the thread id. Note: in ASP.Net lifecycle there are no guarantees that two events (load, prerender, render, etc) will be performed on the same thread from the pool.

Wound up seeing some REALLY weird issues... HttpContext.Current.Items is your friend there.

https://msdn.microsoft.com/en-us/library/dd287191(v=vs.110)....

cm2187 · on Feb 22, 2017

The one I am concerned about is iterators (for each). I have read somewhere that iterators are stateful and I should be careful when using them in a multi-threaded environment. Currently I am ignoring this advice and use them happily (on static lists and dictionaries) but I am not sure if this isn't going to bite me some day. But I don't understand how it would bite me.

jdmichal · on Feb 22, 2017

That is because your different threads are not using the same `Enumerator`. I'm guessing your threads look something like this, based on your mention of `foreach`:

    // In multiple threads:
    foreach(element in collection) { ... }

`collection` is not an `Enumerator` but rather an `Enumerable`. The `foreach` statement calls `collection.GetEnumerator`, so each thread gets its own iterator. You would only see inconsistencies if another thread modified the collection during iteration.

Try this instead, and you'll guaranteed see issues:

    // Before:
    IEnumerator enumerator = collection.getEnumerator();

    // In multiple threads:
    while(enumerator.Move()) {
        var element = enumerator.Current;
    }

Pigo · on Feb 22, 2017

I've often wondered what the real gains are for using async/await methods inside a web api call per say, especially when it's a single task being performed, like a query. I've tried to setup concurrent queries, and do whenall or parallel.foreach. But I think I always hit a road block because each query has to be from a separate context for that to work.

kaliku2 · on Feb 22, 2017

while the awaitable task is .. well.. waiting for the query, the thread servicing it is returned to the thread pool to be reused. This helps with thread pool starvation in the case when the process must service many requests.

m_fayer · on Feb 22, 2017

Exactly.

Here's another simple way to think about it: Handling a request usually invokes a couple of layers of your code until you hit some IO - database, file system, network, etc. If that IO is done in a blocking fashion, this request is consuming server resources THE WHOLE TIME, even while it's doing nothing but waiting for a network response. On the other hand, if the IO is non-blocking (async), then this request consumes no resources while it's waiting.

So imagine you have a request that takes 2000ms to service, and 1700ms of that time is spent waiting on the database to fetch some data. Without async, that request will consume 2000ms of CPU time. With async, only 300ms of CPU time is used.

tuwtuwtuw · on Feb 22, 2017

No, the request wont consume 2000ms of CPU time if it's blocking. It will prevent the thread from doing other work which may be bad, but the CPU itself does not have to spend all 1700ms on that thread. At least my CPU doesn't use 100% of a core while waiting for a network package from the database server.

m_fayer · on Feb 22, 2017

Should have phrased it as "the blocking wait will consume resources (keep a thread busy) for 1700ms whereas the async one will not." Hard to stay both accurate and extremely simple when talking about concurrency.

SideburnsOfDoom · on Feb 22, 2017

Right. Or you can phrase this as: using async ... await can't make you wait faster. Waiting will take just as long, but it will be more efficient. The thread is freed up to do other things during the wait, increasing the possible throughput.

jrs235 · on Feb 22, 2017

So, (is it safe to say?) using await/async will likely be slower for each given awaited operation since there is overhead in thread management and because when the awaited operation finishes it must wait for a thread to become available to continue onward.

lmm · on Feb 22, 2017

No. When implemented correctly, with async/await you run one thread per CPU core or thereabouts and that thread runs continuously, picking up async tasks when it can, whereas using threads you'll have say 100 threads and each core is picking up threads when it can. And switching threads on a CPU core is a much heavier operation than switching tasks on a thread (you have to switch the whole 4k stack, rather than just whatever's in the task).

It's very possible to stuff it up, to have a conventional thread pool end up competing with your async-handling threads, you can get priority inversions where the OS thread scheduling competes with the task scheduling, and plenty of other pitfalls. But done right async/await should be significantly faster than threading.

SideburnsOfDoom · on Feb 22, 2017

There is some overhead in thread management, yes. But in a server scenario (e.g. a web api) there is already thread management happening, as it all runs in threads from a thread pool.

I don't believe that you pay a significant additional cost per await - i.e. if you have one await in your code, then you might as well use as many as you can to chop the operation up into small parts so that operations can flow through better.

If you "must wait for a thread to become available" for any significant amount of time then you have run out of threads despite being async, and you have other problems.

This article talks about the mechanics of how the operation's completion bubbles back up to your code, in some detail: http://blog.stephencleary.com/2013/11/there-is-no-thread.htm...

See also, avoiding some basic mistakes: http://www.anthonysteele.co.uk/AsyncBasicMistakes

in UI code, there is a guideline that "operations that could take over 50ms should be async" (1) so as to not lock the UI thread. But it's clear that this does not apply to web server code where it's all thread pool threads already.

1) http://blog.stephencleary.com/2013/04/ui-guidelines-for-asyn...

kazagistar · on Feb 22, 2017

The thread overhead is a legitimate concern, but waiting for a thread to be free is usually much much better in the asyn then in the sync scenario. If you have a finite number of worker threads, then async will mean less are pointlessly idling at any time, and thus can handle new incoming requests much faster.

daxfohl · on Feb 22, 2017

My understanding is, not a lot. You free a little memory (amortized at that) by releasing a thread back to the threadpool during the query. Compared to the CPU-scorching work that the DB is doing to execute this query, the async-style method call isn't adding much value.

jimmaswell · on Feb 22, 2017

Not working very well in mobile: https://imgur.com/xPn8YLc

A simple table would have worked fine.

sundvor · on Feb 23, 2017

Great article. I learned a lot from reading this; being relatively new to C#, now I certainly know what to look for when coming across multithreading.

samfisher83 · on Feb 22, 2017

I think I learned this during learning Peterson's algorithm. The professor emphasized the flag had to be a single operation variable.

jorgeleo · on Feb 22, 2017

Interesting is also that these recommendations also apply to the use of the async/await pair

mihular · on Feb 22, 2017

Only when running code in different threads. async/await isn't necessarily multithreaded. Actually usually isn't.

jorgeleo · on Feb 23, 2017

Which makes it worse because it opens up the possibility of intermitent bugs, and also those "it works on my machine, I don't know why it does not work on the server"

So it is better to treat them as multithreaded always... which falls back on the original post.

novaleaf · on Feb 22, 2017

the biggest gotcha I've seen with C# multithreaded is the usage of the basic .NET Collections: Dictionary<K,V> and List<V>

they will mostly work (like multiple threads reading, or 1 writes while another retrieves), except when two threads try to simultaneously write, or when 1 writes while another enumerates.

When that happens, all bets are off, and you will silently get obtuse errors such as null return values or enumerating past the end of the collection, or a corrupt collection that throws exceptions from other (seemingly) random and unrelated areas.

I believe the new concurrent collections take care of this, but it is still so easy for beginners to shoot themselves with async programming. very much in stark contrast to how dev-friendly the rest of the CLR is.

dahauns · on Feb 22, 2017

New concurrent collections?

.NET 4 is over eight years old now.

And for all these years, the docs have explicitly informed about lack of thread safety in the non-concurrent versions and referenced the concurrent equivalents.

novaleaf · on Feb 23, 2017

yes, but it doens't stop novices from doing it, or perhaps if you have existing code you are trying to parallelize.

lmm · on Feb 22, 2017

That's pretty unfortunate by comparison with Java, where while ConcurrentModificationException is specified as unreliable you tend to hit it pretty quickly in practice (and then hopefully the docs encourage you to fix your code properly rather than catching it).

alkonaut · on Feb 22, 2017

You get that in C# as well, if you modify a collection that is being enumerated. That is simply thrown by e.g the List enumerator MoveNext() method if it discovers that the underlying list has been modified, which it detects since all mutating methods increments a version. You don't even need multiple threads for this to occur! You can just create a list enumerator, and then accidentally add/remove an item while iterating.

The more difficult problems to detect are those unrelated to collection versions + enumerators. For example accidentally reading a Dictionary<K, V> while someone is adding to it on another thread (this requires true concurrency). This one has no builtin detection for modification. Does the Java dictionary really check thread identifiers for who is modifying? Or use some kind of entry counter to detect concurrency?

I'd love to have that in debug builds in C#.

The result in C# is usually that it works 99% of the time and corrupts silently in the edge case that someone happened to read the hash table at the same time as an internal array was resized or similar. Typical consequence is that it reports a count of N and Contains(x) is true, but when listing the contents we see some other number of items, and no sign of x...

It's that customer report with a stack trace you say "that's impossible, foo is never null there". Or that unit test that only fails one build of 1000 and you blame cosmic rays first...

Matthias247 · on Feb 23, 2017

Checking Thread IDs alone won't help. It is perfectly valid that a collection or even if an iterator is passed between multiple threads and used by them, as long as there's external synchronization (which the collection implementation unfortunately can't see). So it's hard to check for the error at runtime - the only thing that can be done is invalidating old iterators if a modification happens, which can be detected later on (ConcurrentModificatoinException).

As you said that will not always reliably detect the error and show it the developer. But that's imho the biggest challenge in concurrent and multithreaded programming in general: Errors are mostly not visible - until they totally tear down the software in all imaginable ways. And the root cause if often very very hard to find. There's also no general solution for improving the situation. E.g. making all collections concurrent collections would mean they would not throw exceptions anymore if misused, but most likely the application behavior would be silently broken - and the user would take a performance hit. Imho the best solution that's currently in use is preventing shared mutable objects between threads at all - like Erlang (and also Javascript!) are doing. But for general purpose languages like C# and Java people would complain that this is too opinionated and disallows several optimizations.

alkonaut · on Feb 23, 2017

Exactly - the problem I'm usually facing with concurrent collection access in C# and Java is that the concurrent ccess wasn't even known or even intended. Having a mode or a special set of collections that guarantee that they are never accessed from two threads would be very useful to detect unintended sharing. It wouldn't help in scenarios where things are passed between threads sequentially but for that case at least the sharing is intended and can be reasoned about. The bugs are invariably in code no one even considered involved two threads at all. Having e.g a debug-only optional mode that verifies all calls to collection classes are from the same thread would provide a way of trapping many of these.

C# makes it very easy to avoid the normal threading issues by using Task<T> and other high level helpers - which is excellent. The problem I'm usually facing is isolating what the code within the task can "reach". Normally the main program code is a huge amount of code that must be single threaded. Tons of caches using vanilla dictionaries and so on. Now there is a small perf critical thing I need to do with Tasks. The failure mode is that the code from in the task somehow indirectly (e.g through some property on an object ending up in a cache backed by a dictionary) ends up accessing shared mutable state. Soparating the code used concurrently from the rest becomes a mental overhead. So anything that helps me do that would help.