Nonsense, it still works. Hardware prices are still falling. People aren't writing web apps without garbage collection. Performance/$ is still going to rise.
While I do understand what you mean about writing web apps in managed languages to optimize for programmer productivity rather than raw speed of code, garbage collection is not slow in general. In fact, garbage collection can be faster than manual memory management in some cases, for several reasons: a copying collector, for example, only does work proportional to the live working set, whereas manual management has to touch and free each piece of garbage explicitly.
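To make that concrete, here is a toy C++ sketch of the cost argument for a copying collector. It is only a back-of-the-envelope model of the bookkeeping work, not a real allocator, and the object counts are invented:

    // Toy cost model, not a benchmark: it only counts the reclamation work
    // each scheme has to do between collections. Numbers are made up.
    #include <cstdio>

    int main() {
        const long allocated = 10000000; // objects allocated since the last collection
        const long live      = 50000;    // objects still reachable when we collect
        const long dead      = allocated - live;

        // Manual management: every dead object must be freed explicitly,
        // so the work grows with the amount of garbage.
        const long manual_ops = dead;

        // A copying collector traverses and copies only the live objects;
        // the rest of the old space is reclaimed wholesale.
        const long copying_gc_ops = live;

        std::printf("manual frees:      %ld\n", manual_ops);
        std::printf("copying-GC copies: %ld\n", copying_gc_ops);
        return 0;
    }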
There are a lot of things that make GC performance an issue, including stop-the-world behaviour in some collectors making real-time operation impossible, as well as the tendency of programmers to produce a lot more garbage in their programs. In general, I think most people overstate the perceived inefficiencies of GC when it comes to throughput (predictable latency is the main culprit, IMHO).
The problem with garbage collection is not only the speed penalty. Garbage collection will also use a lot more memory than manual memory management or automatic reference counting. This is especially bad when running in a memory constrained environment and dealing with data intensive tasks. Think image or video manipulation on a mobile or embedded device.
Reference counting is very expensive time-wise and memory-wise if you have lots of small objects: memory-wise because of the additional count field needed per object, and time-wise because of the counter update on every gain and loss of a reference. These updates can also easily become huge concurrency bottlenecks, since they introduce false write sharing. Another issue with reference counting is, of course, that cyclic structures are hard to handle.
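A minimal C++ sketch of two of those problems, using std::shared_ptr as a stand-in for a reference-counted runtime (the Node types are made up for illustration):

    #include <cstdio>
    #include <memory>

    // Every shared_ptr-managed object drags along a control block with an
    // atomic reference count; that is the extra per-object memory cost.
    struct Node {
        std::shared_ptr<Node> next;   // strong reference: participates in the count
        ~Node() { std::puts("Node destroyed"); }
    };

    struct WNode {
        std::weak_ptr<WNode> prev;    // weak reference: does not keep the object alive
        std::shared_ptr<WNode> next;
        ~WNode() { std::puts("WNode destroyed"); }
    };

    int main() {
        {
            // A two-element cycle: a -> b -> a. When both locals go out of
            // scope, each count is still 1, so neither destructor runs and
            // the pair leaks.
            auto a = std::make_shared<Node>();
            auto b = std::make_shared<Node>();
            a->next = b;
            b->next = a;
        }   // nothing is printed here

        {
            // The same shape with the back edge made weak is reclaimed.
            auto a = std::make_shared<WNode>();
            auto b = std::make_shared<WNode>();
            a->next = b;
            b->prev = a;
        }   // both "WNode destroyed" lines are printed here

        // Every copy and destruction of a shared_ptr is an atomic update on
        // the shared control block, which is where the write-sharing cost
        // shows up under concurrency.
        return 0;
    }

Breaking the cycle with a weak reference is the usual workaround, but someone has to notice the cycle and do it by hand; a tracing collector reclaims the pair without any such annotation.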
It is true that GC works best when you have a generous amount of free memory relative to the amount of garbage produced. On any kind of memory-constrained (embedded) device I would naturally treat every memory allocation as a critical issue to handle, with clear memory-usage budgets.
I mean, this conversation has been around for easily 10 years. We live in a world where we already have to do this: look at all the progress in the last decade in the prevalence of futures, async I/O, and channels/queues/whatever. Memory bandwidth and latency have proven to be a far bigger performance drag, and they have improved far more slowly; the likely performance problems will be in areas other than single-core clock speed.
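As a small illustration of that style (sketched in C++ rather than any particular web stack; the function and variable names are placeholders):

    #include <cstddef>
    #include <cstdio>
    #include <functional>
    #include <future>
    #include <numeric>
    #include <vector>

    // Placeholder worker: sums one slice of the data.
    long sum_range(const std::vector<long>& v, std::size_t begin, std::size_t end) {
        return std::accumulate(v.begin() + begin, v.begin() + end, 0L);
    }

    int main() {
        std::vector<long> data(1000000, 1);
        const std::size_t mid = data.size() / 2;

        // The work is expressed as independent tasks whose results are
        // awaited later, instead of one long synchronous sequence on a
        // single core.
        auto lo = std::async(std::launch::async, sum_range, std::cref(data),
                             std::size_t{0}, mid);
        auto hi = std::async(std::launch::async, sum_range, std::cref(data),
                             mid, data.size());

        std::printf("total = %ld\n", lo.get() + hi.get());
        return 0;
    }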
As it turns out, Erlang already provides much better memory locality than your average language. Each process runs in its own contiguous memory area (heap + stack), which fits perfectly with a many-core world.