Don’t blindly prefer emplace_back to push_back (quuxplusone.github.io)
178 points by todsacerdoti on March 4, 2021 | 140 comments



I’ve personally never encountered the particular misunderstanding the author dispels here, but I’m sure the post is written from bitter experience. On the one hand, I shouldn’t be that surprised: Unfortunately, programming in C++ is really brutal for people who would rather learn by example than by carefully reading documentation.

But on the other hand, my instinctive reaction is: Come on, is this really something that confuses people? emplace_back is all about constructing something in-place, which is exactly the opposite of moving it there! How can any professional programmer be 180 degrees wrong about the purpose of a function they use regularly?

What I want to say here is that C++ is really hard, and that there are a million subtle things that one simply has to look up instead of making educated guesses. I don’t think people appreciate this difficulty enough.


The article did touch on a possible reason for the misunderstanding.

If you already know from previous versions that push_back makes a copy and that C++11 added move semantics and emplace_back, it's not a huge leap to connect the two. Especially if you don't notice push_back's rvalue overload (or understand rvalues).
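To make the distinction concrete, a quick sketch (Widget standing in for any type with a three-argument constructor):

    #include <utility>
    #include <vector>

    std::vector<Widget> widgets;
    Widget w(1, 2, 3);

    widgets.push_back(w);             // copy: binds to push_back(const T&)
    widgets.push_back(std::move(w));  // move: binds to push_back(T&&), new in C++11
    widgets.emplace_back(1, 2, 3);    // constructed in place: neither copy nor move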

And if you're new to C++ and are having all of this thrown at you at once it'd definitely be easy to get some crossed wires.


As someone new to C++, it seems endlessly confusing that you are supposed to know if a function call is making a copy vs a move by figuring out the signature of the function being called or knowing which one the compiler picks, rather than being explicit about it in the code (with std::move or somehow annotating the call with what you want).


You should always know the signature of a function you're calling. Please read the docs.

The part I agree with is the method selection semantics. A good rule of thumb (though don't rely on it) is that the "more specific" prototype generally wins.


I agree you should know what you are calling when writing code, but there is also a need for people to read the code and understand what the behavior will be. My impression is that C++ relies a lot on implicit knowledge.


All languages do.


Well, many languages try to keep this minimal. C++ doesn't.


And it's nothing new with move/copy semantics either. Even passing by reference is bad for hiding that. Every time I write C++ I'm tempted to take pointers as parameters in places where you'd normally take a reference just so the call site has an indication of what I'm doing.


Totally agree.


It's definitely a lot to process.

I found these two articles really helpful when trying to understand moves:

https://abseil.io/tips/55 https://abseil.io/tips/77

Though I'd also say don't worry about it too much, especially at first. If you're copying a lot of temporary objects, moving can get you some performance wins, but that's something profiling should be telling you.


Authors should also try to be aware that for some types there's not a meaningful difference between copy construction and move construction.


Well, ideally functions shouldn't be moving behind your back, i.e. if you write:

  foo(someValue);
someValue will never be implicitly moved. Either you need an explicit cast (i.e. std::move) or someValue is actually an expression returning a temporary or an rvalue.

Of course if foo takes someValue by non const reference it might still modify someValue as it wishes (and moving out of it is just one possible mutation). But this is not a specific issue with moves but with functions taking non const references in general, which should be used only sporadically and hopefully the name makes the mutation clear.
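A sketch of that at the call site (foo and someValue are illustrative):

    #include <string>
    #include <utility>

    void foo(std::string s);

    std::string someValue = "hello";
    foo(someValue);             // copy: someValue is untouched afterwards
    foo(std::move(someValue));  // explicit cast: someValue may be moved from
    foo(std::string("tmp"));    // temporary (rvalue): moved implicitly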


> How can any professional programmer be 180 degrees wrong about the purpose of a function they use regularly

If you have a strong understanding of move semantics, then I agree it would be pretty weird to misunderstand the relationship between move semantics and emplace_back. But how common is it to have a strong understanding of move semantics, really? Can we confidently assume that even 50% of professional C++ programmers have a strong understanding of move semantics? I've known competent C++ devs, who write template metaprograms for fun, who were still surprised to learn basic facts about move semantics like "the destructor of a moved-from object will still be called." I know for certain that I don't have a strong understanding of move semantics, but I wouldn't call myself a C++ programmer either, so at least I'm not dragging down the statistics :)
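As a sketch of that last fact (Tracer is a made-up type):

    #include <cstdio>
    #include <utility>

    struct Tracer {
        bool moved_from = false;
        Tracer() = default;
        Tracer(Tracer&& other) noexcept { other.moved_from = true; }
        ~Tracer() { std::printf("~Tracer, moved_from=%d\n", moved_from); }
    };

    int main() {
        Tracer a;
        Tracer b = std::move(a);
    }   // both destructors run: moving from 'a' does not end its lifetime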


Yes, I think his recommendation to prefer push_back is wrong for that reason. I can't imagine it's easier for the compiler to reason about std::move and destructors compared to inlining the constructor.


Nope, he's right. He maybe shouldn't be right, but certain convenience features are still missing from Standard Library components and the language that could someday make him wrong.

It is sometimes right to hold your nose and use std::piecewise_construct.

But usually it doesn't matter, and worrying about it will distract you from what does matter. So: push_back() unless you know you have a good reason not to. And you might never encounter one.

Good C++ and naïve C++ are not so different. Both are much better than fake-smart C++.


> C++ is really brutal for people who would rather learn by example than by carefully reading documentation

As if the documentation/compiler implementations are otherwise perfect. Quick, does `std::condition_variable::wait_for` use a monotonic clock under the hood?

Here's the docs: https://en.cppreference.com/w/cpp/thread/condition_variable/...

You might think so based on the part that says:

> Note that rel_time must be small enough not to overflow when added to std::chrono::steady_clock::now().

But I happen to know from actually using it, that not all implementations use a monotonic clock: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41861

This led to a rather dramatic time bomb bug in our code that only flared up when GPS (and therefore system time) was being spoofed to replay a specific scenario in the past.
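For reference, a sketch of the call in question; the standard specifies the relative timeout in terms of steady_clock, but the affected libstdc++ computed an absolute system_clock deadline instead:

    #include <chrono>
    #include <condition_variable>
    #include <mutex>

    std::mutex m;
    std::condition_variable cv;
    bool ready = false;

    void waiter() {
        std::unique_lock<std::mutex> lk(m);
        // Intent: wait ~5 seconds of elapsed time. With the buggy
        // implementation, stepping the system clock backwards (e.g. via
        // spoofed GPS time) stretched the wait arbitrarily.
        cv.wait_for(lk, std::chrono::seconds(5), [] { return ready; });
    }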


Wouldn't that be an implementation bug? Because waiting for a specific amount of time is not the same as waiting for a specific wall-clock time to come.


Yes, it is an implementation bug. My point being that C++ can be brutal even for people who carefully read the docs.


> What I want to say here is that C++ is really hard, and that there are a million subtle things that one simply has to look up instead of making educated guesses. I don’t think people appreciate this difficulty enough.

Very much so. I tell everyone not to bother learning C++, there are so many other better languages out there. Hell, I'm only still using it because of legacy projects


> Unfortunately, programming in C++ is really brutal for people who would rather learn by example than by carefully reading documentation.

This is a great observation! There's probably a ton of this that feeds the "$lang/api/framework is garbage"/"no it's not" schisms. "Well do you prefer documentation or a REPL?"


In some languages the documentation is available as essentially a property of the documented object, so pulling up docs in the repl is easy. It isn’t quite a binary choice.


"A manual is a human-operated type system."


I learn C++ by example by using the VSCode debugger. I prefer it over gdb with tui. It won’t go all the way (e.g. educational/smallish projects only that are single threaded) but by that time a programmer has seen some wonky things happening that they can hypothesize about. In my case:

- dangling pointers

- integer under/overflow

- macro expansion (not via debugger but VSCode helps with this)

- double values not being the same when you expect them to be the same

- Seeing if something is copied or moved [1]

[1] Not sure if this is actually true. But I think it can be true by simply stepping inside the function like emplace_back


If you're using Linux, run your test programs under valgrind. It'll help you immensely to find memory issues.


I mostly use address sanitizer these days rather than valgrind. It's fast enough that you can just put it in your default compile flags for debugging.


Address sanitizers have bitten me before. Valgrind has not, aside from the times it wasn't supported on new macos.

Just be careful.


It is faster, but IME valgrind is more thorough. I have an option to run unit tests with valgrind.


And it's available for Visual Studio, so you don't even need to be running Linux.


Don’t blindly prefer emplace_back to push_back*

*when using it incorrectly. The premise of emplace_back is that you use it for calling the constructor of the object in place. Obviously it won't help you if you call a copy constructor instead. I find this article a bit pointless. Clang's suggestion was spot-on and emplace_back would have (potentially) helped if the suggestion had actually been followed correctly.


Actually, I am not so happy with clang-tidy: its first advice was spot-on, yes. But it should have thrown the exact same advice after the student incorrectly applied the first advised patch (by not removing the 'Widget(...)' constructor call).


Wouldn't that have changed the semantics (since you're not calling the copy constructor anymore?)

The student accidentally changed the semantics by adding the extra call to the copy constructor, and therefore Clang didn't know anymore that it could be removed


In that case, push_back to emplace_back changed the semantics (as construct + push_back made a move construction, whereas emplace_back did not).


Good point, it doesn't seem to be any bigger of a change than the original suggestion. Perhaps they mistakenly thought nobody would call emplace_back in that way by accident.


The warning was not explicit enough to ask you to remove the constructor as well as changing the method name. If it called for "hey, you can remove the constructor and emplace_back would automagically call it for you in place!" and then checked that the constructor is not spelled out there, then there would not be such issue and the tool would be 100% helpful.


It did ask to remove the constructor as well as changing the method (note which parts are covered by tildes).

My question is, after having incorrectly only done one of those things, how could it have known that the user was only halfway through a two-part operation and didn't actually mean for it to be that way? Clang doesn't know how the code was in the past.


The article goes even further than that: When all else is equal, prefer push_back to emplace_back. It still shows the example of using emplace_back with a pre-existing object, which, as you say, isn't the point of emplace_back. It also argues that emplace_back is more work for the compiler: the template resolution seems more complex, and it's likely to bloat compile times. That's a more persuasive point (hard numbers are provided at the end of the article); I wonder if more work has been done on this.

The author seems to be an advanced C++ programmer, but doesn't seem to properly acknowledge that, as you say, the premise of emplace_back is that you use it for calling the constructor of the object in place. The only instance of proper use of emplace_back is in

    widgets.emplace_back(foo, bar, baz);
I was surprised to see no follow-up for this example:

    auto w = Widget(1,2,3);
    widgets.emplace_back(w);  // Fixed? Nope!
No mention of the obvious fix:

    widgets.emplace_back(1,2,3);
I don't mean to 'pile on', but, a pet peeve of mine: emplace_back is not magic C++11 pixie dust. This isn't saying anything. You could say garbage collection isn't magic, or error-correcting codes aren't magic, or sound type systems aren't magic. To say that something isn't magic is an empty statement, especially when it's something like emplace_back which introduces an important new ability.


> It also states that it's more work for the compiler, stating that the template resolution seems more complex, and suggesting it's likely to bloat compile times. That's a more persuasive point (hard numbers are provided at the end of the article)

I take the author's point about the cost of compiling a template, but the benchmark is odd: a thousand functions with a single call to emplace_back(). C++ programmers are (in)famously willing to trade compile-time performance for run-time performance. Complement the compile benchmark with a realistic program benchmark. In a typical program, I would expect emplace_back() to be called many more times than it's compiled.


Well, I'm willing to trade compile-time performance for run-time performance on the parts where it does matter. But these parts are quite rare; most of the code of an application is not a performance hot spot where using emplace_back instead of push_back will make a measurable difference. So I think the author's point about compile time is interesting.


When adding items to maps it becomes less obvious (when learning from examples at least) whether map.emplace({Key(args), Value(args)}) avoids copies for Key, Value, pair<Key, Value>, or all of them.
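For the record, a sketch of the copy-avoiding spellings (Key and Value are stand-ins with multi-argument constructors):

    #include <map>
    #include <tuple>
    #include <utility>

    std::map<Key, Value> m;

    // Constructs the pair<const Key, Value> in place, one argument list
    // per half, so neither Key nor Value nor the pair is copied:
    m.emplace(std::piecewise_construct,
              std::forward_as_tuple(1),
              std::forward_as_tuple(2, 3.0));

    // C++17: try_emplace forwards the trailing arguments to Value's
    // constructor, and does nothing if the key already exists.
    m.try_emplace(Key(1), 2, 3.0);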


> To say that something isn't magic is an empty statement

I think this is said to express "simply replacing that function call/method/... without changing your calls/methods/coding style is not going to magically solve your problems". Which is true in the push_back/emplace_back case (you need to adjust your call).

GC, on the other hand, is magic pixie dust, if my definition is correct ;)


> I think this is said to express "simply replacing that function call/method/... without changing your calls/methods/coding style is not going to magically solve your problems"

I agree, but you're doing all the work in conjuring this interpretation. Saying it's not magic doesn't say anything substantial.

If it were possible to mechanically replace all references to push_back by emplace_back for an easy performance boost, then we wouldn't need emplace_back to be its own member function, it would just be a compiler optimisation. You have to make shallow and localised changes to make proper use of emplace_back, but isn't that the best we can hope for? Far easier than parallelism, say, where sweeping architectural changes may be necessary.


> auto w = Widget(1,2,3);

Note that unlike in Java or C# this is not the right way to initialize a variable in C++.


Herb Sutter offers another viewpoint:

In particular: #5 on that page: "we see that the right-hand style is not only more robust and maintainable for the reasons already given, but also arguably cleaner and more regular with the type consistently on the right when it is mentioned"

[1] https://herbsutter.com/2013/08/12/gotw-94-solution-aaa-style...


I don't think Herb's points apply to an ordinary declare-an-object-and-immediately-initialize-using-the-constructor, do they?

He gives an example using aggregate initialization, but that's not the same thing:

    // Classic C++ declaration order     // Modern C++ style

    employee e{ empid };                 auto e = employee{ empid };
    widget w{ 12, 34 };                  auto w = widget{ 12, 34 };
He makes the case that it's nicer to use auto pretty much wherever we can, to avoid repeating ourselves regarding type, and to avoid unexpected conversions. The auto keyword doesn't apply in our case, there's no way to use auto to simplify a statement like:

    Widget w(1,2,3);


Actually, he does discuss this in section 4b. He doesn't come out as strongly in favor of it, "The jury is still out on whether to recommend this one wholesale," although he does list what he considers some advantages. He also acknowledges in section 6 that it doesn't work for non-moveable types.

Despite my tendencies towards almost-always auto, I can't say that I consistently use auto x = type{init}, but there is certainly nothing wrong with it.


> He also acknowledges in section 6 that it doesn't work for non-moveable types.

Actually, these days it does. Copy/move elision is mandated by the language in this case and no valid copy/move constructor is required. Similarly, it is now possible to return non-movable, non-copyable types from functions, which opens the door to interesting code patterns.
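A sketch of what that permits (Pinned is illustrative):

    #include <mutex>

    struct Pinned {
        std::mutex m;  // a std::mutex member makes the type non-copyable and non-movable
    };

    // C++17: the returned prvalue is constructed directly in the caller's
    // storage, so no copy or move constructor is required.
    Pinned make_pinned() { return Pinned{}; }

    int main() {
        auto p = make_pinned();  // OK since C++17, ill-formed before
        p.m.lock();
        p.m.unlock();
    }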


Exactly. auto is only useful in contexts where the type is expected to be inferred (by the compiler).


Great point, I'd missed that! It should be:

    Widget w(1,2,3);


It is a completely valid way of initializing a variable in modern C++. Modern compilers will not create a copy assignment, if that is what you are afraid of.


> Modern compilers will not create a copy assignment, if that is what you are afraid of.

Probably, but using the canonical syntax removes all doubt, following the C++ language specification. That's a small but real disadvantage in robustness when using the unnecessary assignment, and I'm not aware of any reason to favour using it.


Copy elision in this example is also required by the language specification (if you're using a modern version of C++).


Neat, I didn't know that.

I still don't like it though. I'd rather express what I want to happen than express something I don't want to happen in the knowledge that the standard requires my compiler to be sufficiently smart to repair my statement. Consistent behaviour across different language versions is another plus.

If you'll permit me a moment's snark:

Most OOP languages: You can't copy objects. You may copy references to objects. Copying objects is one road to madness. Move semantics is another.

Old C++: You can copy objects, and specify your own copy constructors (which are permitted to side-effect), but the compiler is permitted to optimise away copy operations under certain circumstances. You shouldn't assume your copy constructor will be invoked whenever you see what looks like a copy.

New C++: You can copy objects, and specify your own copy constructors (which are permitted to side-effect), but the compiler is permitted to optimise away copy operations under certain circumstances, and beyond that, is now required to optimise away copy operations in some specific circumstances. You shouldn't assume your copy constructor will be invoked whenever you see what looks like a copy. When you see a copy operation in certain specific contexts, you are permitted to assume your copy constructor will not be invoked.

As with so many discussions of C++, I'm reminded of two quotes:

> Within C++, there is a much smaller and cleaner language struggling to get out.

- Bjarne Stroustrup [0]

> Do you know how the Orcs first came into being? They were elves once, taken by the dark powers, tortured and mutilated. A ruined and terrible form of life. And now... perfected.

- Saruman

[0] https://en.wikiquote.org/wiki/Bjarne_Stroustrup


Well, that piece of code is a perfectly valid way to initialize a variable in C++.


It's not bad advice. I see many code reviews that use emplace_back because it's 'faster'.

Though I think the abseil article does a better job of explaining why: https://abseil.io/tips/112

[Googler; unaffiliated with Abseil(sadly)]


Hence “blindly”. The blog post was a detailed, less sanctimonious version of your comment.


Yeah but the title sounds like "you shouldn't just always use emplace_back instead of push_back" but the actual content is "you shouldn't use emplace_back completely incorrectly because you don't know what it is".

The only actual argument against blindly using emplace_back everywhere is that it has worse compile time in a deliberately pathological microbenchmark. Not convincing, though it is interesting.


But then you could write a blog post about pretty much every API misuse ever, which doesn't really seem very constructive to me.

But okay, maybe it's more obvious when having worked with pre-C++11 code, where you would have killed for something like emplace_back. If you are new to the language and only know modern C++, you probably don't think much about why emplace_back exists in addition to push_back, and why it can be faster.


Given the whole scanf/strlen thing going on, more blog posts about API pitfalls wouldn't necessarily be a bad thing.

And that is basically the premise of the Abseil Tip of the week series, which are an incredible resource for learning how to work with modern C++.


Given that some schools still teach C++ with Turbo C++ for MS-DOS, one cannot expect that everyone is up to date with CppCon C++Now talks.


Not to mention the very real risk of accidentally invoking the wrong constructor. If you change the type of a vector, push_back is a good deal safer than emplace_back.

Especially since many of the standard containers have many more constructors than you'd think - hello std::vector(size_type count).
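The classic instance of that trap, as a sketch:

    #include <vector>

    std::vector<std::vector<int>> vv;

    vv.push_back({3});   // appends the one-element vector {3}
    vv.emplace_back(3);  // calls vector<int>(size_type): a vector of three zeroes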


That’s one of my favorites as well. The fact that C++ will so happily auto-convert almost anything into an integer makes it even more likely to hit this accidentally.


And explicit constructors won't save you from this one.


Use Phantom Types. The compiler will help you.
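A minimal sketch of that idea (all names illustrative):

    // The Tag parameter is a phantom type: it never appears at run time,
    // it only keeps different kinds of integers apart at compile time.
    template <typename Tag>
    struct Strong {
        explicit Strong(int v) : value(v) {}
        int value;
    };

    using ElementCount = Strong<struct CountTag>;
    using ElementIndex = Strong<struct IndexTag>;

    void resize_to(ElementCount n);  // a raw int or an ElementIndex won't compile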


Random emplace_back hate: I really dislike how you can't use it for objects that you'd normally use aggregate initialization for.

See https://godbolt.org/z/6qW7q3

    struct SimpleData
    {
        int a;
        double b;
    };

    SimpleData someData = {1, 2.0}; // ok

    std::vector<SimpleData> data;

    // Why can I not do this?
    data.emplace_back(1, 2.0);


You can as of C++20 - aggregate initialization has been extended to support T(x, y) syntax.

See https://godbolt.org/z/Tj5baM and https://en.cppreference.com/w/cpp/language/aggregate_initial... (case 5).
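So with C++20 the original snippet compiles as written:

    #include <vector>

    struct SimpleData {
        int a;
        double b;
    };

    std::vector<SimpleData> data;
    data.emplace_back(1, 2.0);  // OK in C++20: SimpleData(1, 2.0) aggregate-initializes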


That's because of the epic fail that is initializer lists, which prevent aggregate initialization (otherwise awesome) from being used safely in generic contexts.


To be fair, push_back won’t help you here, either.


It does, actually. In fact, this is the most concise one (see godbolt link):

    data.push_back({1, 2.0});
That does not work with emplace_back (not that it would be useful) because the template arguments can't be deduced from the initializer list (whereas for push_back the type is known so you can initialize it with the initializer list).


You are correct, the problem is exclusively with emplace_back; incidentally, there is an "official" discussion of this at https://cplusplus.github.io/LWG/issue2089.


Thanks for the article; there should be a huge FAQ-style list of these classic C++ misconceptions.

One of my favorite is "you don't need a linked list".

Although I'm still genuinely interested when a linked list is best suited. I'm also curious why they were invented and why they're taught, maybe just for teaching purposes...


A few decades ago, pointer chasing was much cheaper because the difference between an arithmetic operation and a memory access was far lower. CPUs in those times didn't even have caches (because they weren't needed). That made linked lists a reasonable choice. But CPU and memory speeds have been diverging -- both are growing at an exponential pace, but with different exponents, so the gap is also growing exponentially! So now we have a huge memory hierarchy, random accesses into memory are really slow, and linked lists rarely make sense.


> A few decades ago, pointer chasing was much cheaper because the difference between an arithmetic operation and a memory access was far lower.

Also allocation was just bumping the brk.


I used one the other day because it had the interface I needed and the number of items was so small and so infrequently iterated over that performance didn't matter. (I guess that's not really what you meant though: you were probably interested in a use where it really was needed for performance reasons - splicing things around or the like.)

Specifically I needed the guarantee that iterators weren't invalidated by other parts of the list changing. I could've used a vector and worked around it but it would've taken more effort and more code, and like I said the performance difference wouldn't have made an overall difference to the program.


Linked lists become really powerful and performant when implemented as intrusive containers. Such containers do not need to live on the heap and can be embedded directly into other data structures, so an element can be linked into multiple containers at the same time. This gives great flexibility and compactness (especially when indices are used to form the links instead of pointers).
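A bare-bones sketch of the embedding (names illustrative, pointers rather than indices for brevity):

    // The links live inside the element itself, so linking a Task into a
    // list allocates nothing and works wherever the Task already lives.
    struct Task {
        Task* prev = nullptr;
        Task* next = nullptr;
        // ... payload ...
    };

    void push_front(Task*& head, Task* t) {
        t->prev = nullptr;
        t->next = head;
        if (head) head->prev = t;
        head = t;
    }

    void unlink(Task*& head, Task* t) {
        if (t->prev) t->prev->next = t->next;
        else head = t->next;
        if (t->next) t->next->prev = t->prev;
        t->prev = t->next = nullptr;
    }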


Similarly, some data structure and algorithmic choices that don't make much sense when operating in RAM, can suddenly become appropriate in different environments.

Eg when working on hard disks (eg think a file system or database with raw storage access). Or if you are working in a medium where you can only write once, and can't change what's already written, etc.


>> Although I'm still genuinely interested when a linked list is best suited. I'm also curious why they were invented and why they're taught, maybe just for teaching purposes...

It's not hard to come up with plenty of valid use cases for linked lists. Any time you need some algorithm that does a lot of non-predictable insertions/deletions, on a collection that is not regularly iterated in full or used to find elements (so pointer-chasing/cache effects are irrelevant). And/or when the elements are prohibitively expensive to reorder/move on insertions/deletions, or some other piece of code is known to keep direct references to the elements that should not be invalidated when some unrelated element is inserted/removed. Maintaining validity of iterators while an algorithm is operating on the list is another important benefit.

Some people will come up with all kinds of clever workarounds or more complicated datastructures/bookkeeping so you can achieve the same thing without a linked list, but in most cases I don't see the point of it. Especially not in cases that are not performance-sensitive at all. In such cases you're just increasing code complexity and obfuscating intent because of some ideological objection to using linked lists.


> Any time you need some algorithm that does a lot of non-predictable insertions/deletions, on a collection that is not regularly iterated in full or used to find elements (so pointer-chasing/cache effects are irrelevant).

You still should at least consider a vector here. If lookups are infrequent, you can just sort it and binary search it on demand (and just add new items at the end). There are cases when this is not good enough, but they really are surprisingly rare.

> And/or when the elements are prohibitively expensive to reorder/move on insertions/deletions, or some other piece of code is known to keep direct references to the elements that should not be invalidated

In that case, why not store the elements in a unique_ptr (and then everything in a vector)?

> Especially not in cases that are not performance-sensitive at all. In such cases you're just increasing code complexity and obfuscating intent because of some ideological objection to using linked lists.

Fully agreed - if you are sure that it cannot possibly become performance relevant and a list has the right interface, use it. For example, when the number of items is fundamentally tiny (e.g. one item per dimension of your 3D data) and the thing you'll do with the list item will be complex (or infrequent) anyway so that a few pointer indirections are negligible.
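A sketch of that suggestion (Widget again a stand-in):

    #include <memory>
    #include <vector>

    std::vector<std::unique_ptr<Widget>> items;
    items.push_back(std::make_unique<Widget>(1, 2, 3));

    // The pointee's address is stable even when the vector reallocates:
    // only the unique_ptrs move, the Widgets themselves stay put.
    Widget* stable = items.front().get();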


>> You still should at least consider a vector here. If lookups are infrequent, you can just sort it and binary search it on demand (and just add new items at the end). There are cases when this is not good enough, but they really are surprisingly rare.

Well most of the time linked lists are used for things that have to be kept in some order that cannot (easily) be determined by a sorting predicate, and/or when re-sorting is not allowed.

>> In that case, why not store the elements in a unique_ptr (and then everything in a vector)?

But why? What's the benefit of doing that? The objects will still be scattered around the heap, so iterating and dereferencing still thrashes the cache, you lose benefits like not invalidating iterators on inserts/removals, and you complicate the code.

I'm not saying linked lists should be your go-to data structure, but IMO the 'linked lists are bad, dont use them' meme is a little overused.


>> or some other piece of code is known to keep direct references to the elements that should not be invalidated

> In that case, why not store the elements in a unique_ptr (and then everything in a vector)?

I recently ended up using std::list<Object*> because of pointer invalidation on vector resize. I see now that using smart pointers would have worked just as well. Thanks.


One place where a (doubly, perhaps) linked list might be suitable is when you have a list that wants to provide the following features:

1. Multiple independent (not necessarily concurrent, but not coordinated) mutators.

2. Insertion and removal happen commonly and should be cheap.

3. Operations on the whole list typically happen in list order or close to it.

Bonus points if the size of the relevant data structures is often such that you blow out caches anyway (e.g. each thing in your list is larger than a cache line) and hence there are no strong locality benefits from an array. Though prefetching _can_ sometimes help with arrays there.

The above features _can_ be implemented on top of an array, but feature 1 means you can't use indices as long-lived handles to items, so you need either handles that get updated on mutations or some concept of identity and "find the current index of this thing" calls.

For context here, Gecko used to store the children of a DOM node as an array of pointers and switched to a doubly-linked list because the DOM does have the above properties and it was faster and mostly simpler to have a linked list in practice. There is a bit of complexity to make node.childNodes iteration (as opposed to .nextSibling/.previousSibling iteration) not be O(N^2), but overall the switch was a win.


Naively, I would assume that LLs are best used for long lists which are typically traversed incrementally, like a FILO-only job queue. That wouldn't perform better than a static array of statically-sized objects, but it could be better for eg. a list of structs with flexible array members (ie, variable-sized structs).

They might be good for traversed lists with the possibility of partway insertion too, or performing insertion sort. Hell they might even be good for traversed lists in the general case, but I'm not a low level dev so I don't know without testing it.

Fun exercise though!

I'm guessing they're taught because they're taught, in that special inertial way that education propagates. Originally, I bet they were introduced when implementing your own data structures rather than pulling a library was normal industry practice.

Of course there is tons of value in knowing about data structures and their characteristics, but probably not in knowing how to implement LLs.


They're also an example of a data structure which works pretty well in a concurrent setting - you can get away with a single atomic reference for an unbounded FILO queue implemented with a linked list. Adding and removing items is then dirt-cheap - provided you don't go through the queue quickly and job running time > iteration time.

But with all things in computer science, it really depends on your use-case.


Even if linked lists and doubly linked lists aren't all that universally useful by themselves, they teach all the basics about traversing data structures by references / pointers that more complex trees and graphs require. A linked list is just a degenerate tree with at most one child for each parent. So I think teaching that isn't a bad decision.


I couldn't have said it better :)


good point, they may still be useful as a pedagogical tool to introduce pointer-based structures.


Everyone would probably agree it's good to teach trees in a CS data structures class. Linked lists make a nice introduction to this: what is a linked list except a tree with max branching factor 1? So you teach LLs on day one, and introduce in the simplest useful way the idea of a recursive data structure.


Linked lists are great when you want a fully persistent data structure version of a stack. (https://en.wikipedia.org/wiki/Persistent_data_structure)

If you don't need persistence, you can often use a dynamic array to do basically everything a linked list can do for you, but most of it faster.

Also, if you have a relocating garbage collector, or otherwise a smart enough compiler (or interpreter), linked lists can become sufficiently efficient.


Use a linked list when you need to store iterators into a container while mutating it. E.g. if you are making a super naive LRU cache, you could use a map<K, list<pair<K,V>>::iterator> and a list<pair<K,V>>. This is something you can't do with vectors unless you can guarantee the capacity will never increase.

This is the most useful case of removing from the middle of a container, which linked lists are good at. Intrusive lists are common for this type of thing, instead of actually storing a list iterator.
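A sketch of that naive LRU along the lines described (hand-rolled, not from any library):

    #include <cstddef>
    #include <list>
    #include <unordered_map>
    #include <utility>

    template <typename K, typename V>
    class LruCache {
        std::list<std::pair<K, V>> items_;  // front = most recently used
        std::unordered_map<K, typename std::list<std::pair<K, V>>::iterator> index_;
        std::size_t capacity_;

    public:
        explicit LruCache(std::size_t cap) : capacity_(cap) {}

        void put(const K& k, V v) {
            if (auto it = index_.find(k); it != index_.end())
                items_.erase(it->second);
            items_.emplace_front(k, std::move(v));
            index_[k] = items_.begin();  // stored iterators stay valid later
            if (items_.size() > capacity_) {
                index_.erase(items_.back().first);
                items_.pop_back();
            }
        }

        V* get(const K& k) {
            auto it = index_.find(k);
            if (it == index_.end()) return nullptr;
            // splice relinks the node to the front; no iterator is invalidated
            items_.splice(items_.begin(), items_, it->second);
            return &it->second->second;
        }
    };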


> Although I'm still genuinely interested when a linked list is best suited

Maintaining an intrusive list of objects that you have around anyway is sometimes the simplest option, especially because objects can easily remove themselves from the list upon destruction.


Linked lists are pretty nice.

* Lazy linked lists are the same thing as iterators, a fact which is used pervasively in Haskell.

* Intrusive or otherwise inline linked lists are used everywhere, e.g. the DOM has a doubly linked list of children for each node, simple memory pools are often implemented as free lists, and tons of resources inside kernels are members of many linked lists.

* Linked lists are guaranteed O(1) append, but dynamic arrays are only amortized O(1) append. Intrusive linked lists also never need to allocate. That makes them useful inside resource-constrained places.
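For instance, a memory-pool free list threaded through the freed blocks themselves (a sketch):

    #include <cstddef>
    #include <new>

    // Freed blocks are reused as the list nodes, so the free list itself
    // never allocates and both push and pop are O(1).
    class Pool {
        union Node {
            Node* next;
            alignas(std::max_align_t) unsigned char block[64];
        };
        Node* free_head_ = nullptr;

    public:
        void* allocate() {
            if (free_head_) {
                Node* n = free_head_;
                free_head_ = n->next;
                return n;
            }
            return ::operator new(sizeof(Node));
        }

        void deallocate(void* p) {
            Node* n = static_cast<Node*>(p);
            n->next = free_head_;
            free_head_ = n;
        }
    };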


Agreed on linked lists in Haskell if you’re doing forward iteration. You can have lazy byte strings under the bonnet while providing a ‘pure’ linked list interface. No worries about explicit iteration.


I remember the linked list being useful to solve Leetcode-style questions involving LRU (least recently used) caches.


> Although I'm still genuinely interested when a linked list is best suited.

I can think of a few use-cases where I've opted for linked-lists over resizable arrays:

1. When you have enough memory for the set of all elements you want to store but not enough contiguous memory for the set of all elements you want to store (i.e. You have the available memory, but it's all fragmented).

2. When you have multiple threads adding to/removing from the container, a linked list lets each thread (in theory) insert/remove nodes at the same time. For example, one thread can be adding to the head of a queue while another thread is removing from the tail of the same queue; they don't need to both wait for the same lock to resize the container.

3. When you have large insertions/removals. For example, concatenating two lists into one is just reassigning two/four pointers in linked lists. With arrays or similar you have to have enough free memory to make a copy of the smaller list, at least.

   Another example is "zipping" two lists together - you simply go through the lists and reassign pointers. With arrays you have to allocate enough memory for the final list before copying elements over.

   Same goes for splitting lists into two: for example, applying a predicate function to remove and return all elements that satisfy the predicate takes no extra memory using lists, but with arrays you need the destination to be allocated first.
4. Finally, you may want to store cycles in your container (perhaps the traversal maintains state so you can use the cycle to make traversal of some nodes repeat, or maybe you're simply insane and like to confuse readers of your code :-).

Okay, fair enough, that last one is me stretching things a bit :-)


Re 1. (memory fragmentation): I'm sort of on board with this (although you probably make it worse for the next person if you use linked lists). But it'll be a very special case where the performance hit from working with a huge list will be acceptable.

Re 2. (threading): What happens when the list becomes empty? How do you keep track of the size to determine whether the list is empty? How do you know the size is still up-to-date when you go on to make a modification? I don't think this is as easy (even "in theory") as you make it sound.

Re 3. (merging overhead): You say "takes no extra memory", but you use significant extra memory already: you incur an overhead of two extra pointers (and possibly a separate allocation) per element, for the entire lifetime of the list, not just when you do some operations.

If your elements are small, then your list might have double memory consumption the entire time instead of only when merging. And if they are large, you could store unique_ptrs to those elements in a vector. In which case the total memory overhead of merging two vectors (2 x total number of elements x pointer size) is exactly the same as a linked list has the entire time. And all the operations then turn into shuffling around pointers too, but probably fewer: You just have to move the unique_ptrs around, which is one pointer each instead of constantly unlinking and relinking lists.

Sure, sometimes iterator invalidation is a thing, and maybe keeping pointers to (separately allocated) elements doesn't cut it. But even then you wouldn't want to use a list if performance is even remotely a concern.


> Re 2. (threading): What happens when the list becomes empty? How do you keep track of the size to determine whether the list is empty? How do you know the size is still up-to-date when you go on to make a modification? I don't think this is as easy (even "in theory") as you make it sound.

No, it isn't easy. I used a single lock to lock both head and tail pointers, and did the pointer-swapping and pointer-comparisons for the pathological cases in that lock[1]. However, that being said, locking for the comparison and head/tail reassignment is *still* preferable to locking while reallocating your array.

For counting items in the list, I post to a semaphore when an item is inserted, wait on the semaphore when an item is removed and simply return the semaphore count when a count of items is required. Because it is a linked list, there is no problem with having the count change in between retrieving it and adding/removing items - that'd only be a problem in array-like structures.

If you want to code-review my implementation then be my guest (it's on github, under my name, a repo called libcmq). I welcome criticism :-)

In any case, I've been considering reimplementing that library as a resizable array precisely because of the downsides of lists. I'm keeping it as it is (for now) because there are other things I want to do that aren't boring.

[1] Empty list and list of exactly one element are problematic.
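Not the libcmq code itself, but a sketch of the post/wait counting pattern described, using C++20's counting_semaphore:

    #include <mutex>
    #include <queue>
    #include <semaphore>

    template <typename T>
    class BlockingQueue {
        std::queue<T> q_;
        std::mutex m_;
        std::counting_semaphore<> items_{0};

    public:
        void push(T v) {
            { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(v)); }
            items_.release();  // "post": one more item is available
        }

        T pop() {
            items_.acquire();  // "wait": blocks while the count is zero
            std::lock_guard<std::mutex> lk(m_);
            T v = std::move(q_.front());
            q_.pop();
            return v;
        }
    };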


To fight fragmentation, a deque-like container is probably better anyway.

Very good point on 3.


Indeed. I recently looked at using std::deque because I was exactly trying to avoid allocating too much contiguous memory, fearing fragmentation.

Sadly, std::deque is pretty much useless. It uses something like page-sized blocks in one implementation, a few cachelines in another one (forgot which is gcc/clang), and on MSVC it allocates every single item separately (!!!) for element size > 1 word. And this will not be changed due to ABI compatibility. It might as well not exist in the standard.


yes, I never use std::deque for the same reason. I believe that the deque in boost these days has a compile time configurable block size.


> When you have enough memory for the set of all elements you want to store but not enough contiguous memory for the set of all elements you want to store

That's only relevant if you don't have an MMU tho. So not on any desktop or embedded system where ram is measured in gigabytes.

Otherwise it's up to the OS to handle that - only individual memory pages are contiguous, but not sets of pages.


If your memory is fragmented enough you can have many pages that are lightly used, at which point a MMU isn't going to help.


Lazy HN: Is emplace_back new? I was a C++ dev for over a decade and never came across it. But I got out of serious gamedev in 2014 or so (sadly).

(Is this what it feels like to become a dinosaur?)

There is some confusion on this issue: https://stackoverflow.com/questions/4303513/push-back-vs-emp...

The function void emplace_back(Type&& _Val) provided by MSVC10 is non-conforming and redundant, because as you noted it is strictly equivalent to push_back(Type&& _Val).

Looks like (as usual) MSVC's lackluster implementation drives real-world usage. Is there a rule of thumb?


It was added in C++11 as part of the perfect forwarding changes, so I would say that it is about a decade old.

As an aside, if you check cppreference.com it will tell you which standard added (or removed) a specific API.


This (https://abseil.io/tips/112) is a much better explanation of this advice.


My pet peeve with all emplace functions, make_unique etc. is that CLion won't let you command-click the calls to jump into the constructor being called. Are other IDEs better at this?


This would be very hard to do because the code isn't explicitly calling the constructor. It just happens to be passing arguments which get passed to a constructor. I guess with a fancy set of heuristics or special-casing std:: it could work.


  widgets.emplace_back(foo, bar, baz);
I honestly always thought that was the only way to use emplace_back.


I mean, if a copy constructor exists then the other version should obviously also work. But at the same time, it isn't quite how this was meant to be used.


Yup indeed. :)


Google's internal style guide basically had this same suggestion. Was it publicized already? If so, please don't downvote me.

Just a hint that this is a very sound suggestion.


Why is there no move_back(), I wonder?


Well, that's basically what emplace_back() or push_back() does when you pass them an rvalue reference.


Rust's push is a `move_back`. Curiously, it doesn't have any equivalent of emplace or placement new. It does memcpy on push, and relies on the optimizer to remove it.


Rust gets away with this because it doesn't have move constructors. Which is a pain if you want self referencing structures, but avoids quite a bit of complexity


> C++ is really brutal for people who would rather learn by example than by carefully reading documentation.

I've been at my current job for a little over two years. Before then I never touched C++; now it's the main (basically only) language used here.

I'm still getting trucked by this...


But, it seems like no harm done? So it's just a code style issue.


"emplace_back"? People actually looked at this and went "Yes, this is a good function name. I understand this perfectly."?


I quite like the name, not least because it accurately gives the impression of "something you will probably misunderstand".


This had me burst out laughing. Thanks for that perspective. I guess that's one way to solve the naming problem: just make all your names super obscure as a warning to any would-be students of the dark arts.


My personal rules of naming, in order of importance are:

1. Be unique.

2. Don't be misleading.

3. Be leading.

So in this case you are nailing 2 but missing out on 3, which I would consider a "pretty good" name. Sure, it would be nice if it was a bit more obvious (maybe construct_in_place_back?), but at least it doesn't sound like it does something else, so people know to look it up.


All the simple names and most of the syntax were taken in C++98. This is one of the problems with backwards compatibility: even if everyone migrates to the new standard on its first day, so you never actually have to target the old compiler, the simplest way to do things will often be the old way.

Also, in mitigation, it's not a special-purpose word for appending to vectors. They introduced "emplace" to other STL data structures at the same time, with the equivalent semantics, so it's a word you only have to learn once.


I mean, this is entirely a self-inflicted problem. If you'd just use a few more words in the name, none of this would be an issue.

There's nothing wrong with longer function names. They make code a lot easier to read.


Actually, for frequently used standard library functionality, I would very much object to introducing overhead for educational purposes at every single call site.

It's a tradeoff, of course, but if the user code becomes unreadable because the function names are verbose (so you constantly have to wrap lines and cut through the visual noise to see the actually meaningful part of the code), just to make sure people don't misunderstand basic functionality... that's not the tradeoff I would like.


You'd prefer something like vec.construct_then_push_back(foo, bar)? I think it's reasonable that people would want a pithier name than that, for something you're going to use constantly.


I think they mean `std::vector::construct_back_in_place()`. 200 character line widths will no longer be enough: 400 or 500 will be more realistic. Sacrifice readability for readability.


How many function calls do you put on a single line? Break those statements up!


"f8" would be even shorter, but we don't use that, because it is more important that code is easy to read than quick to write.


If you realise how it relates to "placement new" then it does kinda make sense. But yeah, not exactly beginner friendly!


Something like construct_back would've maybe been a little clearer. I imagine that would have its own issues as well though.


Or just `place_back`! Why do they have to be so whimsical?


Maybe they're fans of why's guide to Ruby :)

Though place_back still has the connotation of 'take this thing and put it over there' that emplace_back is actually the opposite of.

Naming things really is hard.


Naming things is very hard. Wish there was a way to learn how to do it better. Reading more to acquire new vocabulary? Learning Latin? I'm not really sure.


My stance: You start writing some piece of code and introduce e.g. a variable or a function. Then, half a minute later, you want to actually use it. You probably forgot which exact name you gave that thing. Instead of looking it up, try to come up with the name again (this time you're in the "user" perspective, not the "creator" one).

If you came up with the same name: Congratulations, you probably picked a good name!

If not: Think about how the two names you picked differ. What made you sway towards one or the other in the respective contexts? Is there some subtlety that you were not aware of earlier? Is the intention still the same? Maybe you even realize that what you wrote earlier doesn't fully do what you needed.

This also works on longer time scales and in reverse, of course. Occasionally I look at some older code I wrote and realize that, with a shifted frame of mind, I read/understand a function name the wrong way, even though it technically means the right thing too. Try to learn from these experiences as you make them and consider them when you're coming up with a name ("can this be misunderstood from the perspective of a user that is trying to do something that's only tangentially related?").


I use this all the time to improve the name I picked in the first place.

But there is something I noticed: changing to a new name on the spot is more work.

I have to find all the previous occurrences and rename them.

I can use my editor to help, some plugins for it, or I can get the code to some parsable step and then do a refactor rename.

Notice how I had to get sidetracked here while I am trying to keep my original context in mind.

I use vim, so maybe I am missing out on fancier ways to do this quick on the spot rename? I would love to optimize this problem away.


Wouldn't that have the inverse effect? The more I know about origin and etymology, the more clever the words I pick, and the harder it is for the ones without that knowledge. The more trivial name might be the better one.


Yeah just give everything a Latin name, your co-workers will thank you ;).


A lot of words have latin origins though. Especially scientific jargon. Greek too.


Don't insist on making names as short as humanly possible. Use more words. That's all it takes.


It means put into position. Seems perfectly reasonable


Weird article. In the first section the compiler (correctly, albeit unclearly) hints the student to write “widgets.emplace_back(foo, bar, baz);”.

The author then proceeds to compare widgets.push_back(std::move(w)); versus widgets.emplace_back(std::move(w));. Wat? Why is he comparing against the thing you aren’t supposed to write? That makes no sense.

The final section I can’t tell if he’s comparing compile-time or run-time. The comments make me think compile but I’m not sure?

Good topic for a blog post. But it feels like this post is sloppily mixing three different ideas. Very strange.


The author explains that people he's worked with, when seeing a message from the compiler telling them to use emplace_back() instead of push_back(), just replace "push_back" with "emplace_back" without changing the function arguments. And since the result makes the compiler happy, they don't realize that this is not what the message told them to do.



