> std::vector uses constructors and destructors to create and destroy objects which in some cases can be significantly slower than memcpy().
This is precisely what vector::emplace() solves, and std::move should be faster than swap and pop.
Modern C++ has changed a lot; this article ignores the massive improvements added in C++11, 14, and 17.
> This is precisely what vector::emplace() solves, and std::move should be faster than swap and pop.
The whole swap-and-pop section weirded me out. Maybe I just don't know enough about C++, but saying that assignment (a[i] = a[n-1]) will call the destructor seems false.
As far as I know, the compiler should generate an implicitly defined copy assignment operator for these fixed-size PODs, and it should be as performant as memcpy.
But again, I don't have years and years of in-depth C++ experience, so I would be grateful if an expert could shed more light on this.
In theory erase could return a move iterator, meaning that you could omit the call to std::move. That wouldn't be backwards compatible, though, so it's not going to happen.
There is no erase that takes an index, so I assume that n = a.end(). Also it is missing a dereference:
    a[i] = std::move(*a.erase(a.end()-1));
but erasing the one-before-the-end returns the (new) end iterator, which obviously is not referenceable. In general, after calling erase, it is too late to access the erased element.
You want something like:
    #include <utility>  // for std::move

    template<class Container, class Iter>
    auto erase_and_return(Container&& c, Iter pos)
    {
        auto x = std::move(*pos);  // grab the element before it is erased
        c.erase(pos);
        return x;
    }
Also in the general case it doesn't make sense for erase to return a move iterator.
You are correct. A trivial copy assignment operator makes a copy of the object representation as if by std::memmove. All data types compatible with the C language (POD types) are trivially copy-assignable.
I assume you mean aligned on boundaries? I picked that up from https://en.cppreference.com/w/cpp/language/copy_assignment and it also says that memmove falls back to std::memcpy when there is no overlap between source and destination.
The article is just doing a generic cargo cult warning there. Not bad as a general C++ gotcha warning, but definitely incorrect in this specific case.
As per the author's constraints these are "POD types that are trivially memcpy-copyable", so by definition the copy constructors will never do anything. Much less "allocate memory" as the author claims.
> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that."
My understanding is that the primary reason for game developers using custom libraries is not so much performance but a) historically, console compilers and especially standard libraries have been extremely buggy, and b) it is good to have a single implementation across platforms instead of having to deal with quirks and implementation divergence.
From what I've heard, there are two more major reasons to not use STL for gamedev.
- Debug build performance. Release builds of C++ code using the STL are generally pretty fast, but Debug builds suffer a lot (Visual Studio's std::vector implementation in particular is notoriously horrible in debug builds). Debug executable speed matters when you are debugging a game; you don't want to test your first-person shooter at 1 FPS!
- Build speed. Because of heavy use of templates and historical cruft, the STL slows down your build times a lot. The build-test cycle is very important when designing games; you don't want to wait a few hours after you've changed a few lines of code to tweak a new feature. Gigantic distributed build servers alleviate this problem a bit, but they are pretty cumbersome to set up nonetheless.
For MSVC, the debug checks are fairly customizable through judicious use of the appropriate debug macros. One can also enable optimizations with debug symbols, but the debugging experience can be jarring.
I'm not a game developer, but I have spent a decade doing C++ on Windows, and at a former employer we had several different debugging profiles depending on the severity/difficulty of reproducing/debugging an issue. Our "normal" debug profile had all of the debug checks in the std lib disabled, and we could only effectively debug our own code. Not sure if game developers don't do this, or if it's still not performant enough.
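As a rough sketch of what that looks like with MSVC: _ITERATOR_DEBUG_LEVEL is a real MSVC macro, but the exact configuration below is only illustrative. It has to be set before any standard header is included (ideally on the compiler command line) and must match across every binary you link.

    // Dial down MSVC's checked-iterator machinery in a debug build.
    // 2 is the debug default, 0 disables the per-access bookkeeping.
    #define _ITERATOR_DEBUG_LEVEL 0
    #include <cstddef>
    #include <vector>

    int main() {
        std::vector<int> v(1000);
        for (std::size_t i = 0; i < v.size(); ++i)
            v[i] = static_cast<int>(i);  // no iterator/bounds checks at level 0
    }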
One problem with using different debug macros in your debug build is that any libraries you link in must also be using the same flags. This is not necessarily possible for binary releases as they will assume certain standard library flags to exist in the debug builds (like iterator checking levels).
At work we don't use a debug build in the traditional sense, it's what you call a no-optimisations build where the code is compiled without most optimisations but otherwise the flags are the same as a release build. Some teams also go a step further and compile most of the code in release but some of their code with optimisations disabled.
> One problem with using different debug macros in your debug build is that any libraries you link in must also be using the same flags.
They don't have to be, but it certainly makes things a world easier. If the flags are not the same, you certainly have to be very careful about passing objects across DLL boundaries.
At the companies I've done C++ work at, we've always had the source for all non-C libs and compiled any C++ libs ourselves (except for Windows libs, but they also provide checked debug libs), so we could control the flags.
Are all those "best practices" valid for modern C++ ? I mean one statement says "Pass and return containers by reference instead of value.". This is in contradiction to modern C++ where you return containers by value and rely on copy-elison/RVO.
https://stackoverflow.com/questions/15704565/efficient-way-t...
The best way to tell is to try it the modern way and then look at the assembly code generated on something like godbolt.org. If it ends up being less efficient then you change it to accept a non-const reference to store the result in as a parameter instead.
Though if you'll be calling the same function repeatedly to accumulate content into a single container, it is far more efficient to have a function that takes an output reference rather than one that returns a new container. This results in fewer memory allocations, and you can also pre-allocate the size once before calling those functions.
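A minimal sketch of the two styles being compared (function names and bodies are just illustrative):

    #include <vector>

    // Return by value: relies on RVO/move; every call allocates its own buffer.
    std::vector<int> make_ids() {
        std::vector<int> out;
        out.push_back(42);  // ... fill out ...
        return out;         // NRVO or a cheap move, no deep copy
    }

    // Output parameter: the caller owns the buffer, can reserve once and
    // reuse it across many calls, so repeated accumulation allocates less.
    void append_ids(std::vector<int>& out) {
        out.push_back(42);  // ... push_back into out ...
    }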
On the tooling side, it might be nice if there were a way to annotate a function so that it creates a warning if the compiler cannot use copy elision for the return value. (To be honest, I haven't checked the documentation for this specific thing.)
The warning that I would want would trigger when someone changes the function and prevents or suppresses copy elision from happening. Like for example adding a check at the start of the function and returning a default container.
I'm not sure if C++14 or C++17 has fixed this, but if the object was not copy-constructible, the compiler would emit an error when it was returned by value, even if RVO/NRVO was meant to be used. I figure that's because semantically you still needed copy construction of the object to be possible.
IIRC it was changed in C++17. Now in the plain RVO case (returning a prvalue), copy elision is guaranteed, so no copy/move constructor is required (and in fact the compiler will not call one even if it exists). NRVO of a named local still needs an accessible copy or move constructor, though.
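A small sketch of what that guaranteed elision allows, assuming a C++17 compiler (the type and function names are made up for illustration):

    #include <mutex>

    struct Pinned {
        std::mutex m;  // makes the type non-copyable and non-movable
        Pinned() = default;
        Pinned(const Pinned&) = delete;
        Pinned& operator=(const Pinned&) = delete;
    };

    Pinned make_pinned() {
        return Pinned{};  // prvalue: constructed directly in the caller's storage
    }

    int main() {
        Pinned p = make_pinned();  // OK in C++17, ill-formed in C++14
        (void)p;
    }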
vector::emplace() still needs to construct the object; it just happens in place and avoids a redundant copy of an already constructed object. Same with std::move(). As such, the blog post is correct.
Using POD structs that can be zero-initialized and memcpy'd may indeed be faster, especially when these are bulk operations.
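For example, a bulk copy of a POD array can be a single memcpy rather than a per-element loop (a rough sketch; Vertex and snapshot are illustrative names):

    #include <cstring>
    #include <vector>

    struct Vertex { float pos[3]; float uv[2]; };

    // One bulk copy instead of per-element constructor/assignment calls.
    void snapshot(const std::vector<Vertex>& src, Vertex* dst) {
        std::memcpy(dst, src.data(), src.size() * sizeof(Vertex));
    }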
It's not clear from the article, but I suspect the author is talking about what happens when the vector is resized and has to move existing elements, which is a real problem.
That is correct, yet the compiler's definition of what is trivially copyable might be stricter than what you expect. For example, objects that are trivially relocatable can also be memcpy'd for reserve/realloc, but the compiler will not be able to figure that out on its own.
std::vector itself falls in this category: trivially relocatable, but definitely not trivially copyable. So a vector of vectors will not necessarily be able to use memcpy and will instead fall back to copy/move assignment. This is not very significant for performance with this type (a vector move being cheap), but it's a language gotcha nonetheless, as the move constructor will be called n times on every capacity change.
Since C++11, you can use template traits to determine if a type is trivially copyable, and even add static_asserts to your code to ensure future changes dont break expectations.
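Something along these lines, as a sketch (Particle is just an illustrative POD type):

    #include <cstdint>
    #include <type_traits>
    #include <vector>

    struct Particle { float pos[3]; float vel[3]; std::uint32_t flags; };

    // Fails to compile if a future change makes Particle non-trivially copyable.
    static_assert(std::is_trivially_copyable<Particle>::value,
                  "Particle must stay memcpy-able");

    // std::vector itself is not trivially copyable, as discussed above.
    static_assert(!std::is_trivially_copyable<std::vector<int>>::value,
                  "std::vector is not trivially copyable");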
Trivially copyable is a word of power (well, two words, I guess): its meaning is well defined and you can statically assert for it.
What unfortunately is not defined is (trivially) relocatable, as that's not a property that can safely be inferred, so it is not (yet) part of the standard. Some libraries still have this concept and require some sort of opt-in.
> So in general, if both push_back() and emplace_back() would work with the same arguments, you should prefer push_back(), and likewise for insert() vs. emplace().
That's an interesting point the tip makes. Is there guidance on how to use the C++17 form of emplace_back(), which returns a reference to the constructed element?
The reference-returning emplace_back() is used frequently in the code to construct a new struct element in place and then fill in its members, as opposed to creating a new struct and then calling push_back() to copy it in.
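A short sketch of that construct-then-fill pattern, assuming a C++17 compiler (the Enemy type is just for illustration):

    #include <string>
    #include <vector>

    struct Enemy { int hp = 0; std::string name; };

    int main() {
        std::vector<Enemy> enemies;
        Enemy& e = enemies.emplace_back();  // C++17: returns a reference to the new element
        e.hp = 100;
        e.name = "grunt";
    }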
No, the problem is that std::vector still calls the constructor and destructor of each and every object in the array at least once. This is a performance loss if they don't do anything - you have to rely on the compiler to inline the call and then remove the code. For POD data structures it can be significant, because those are usually the largest arrays in your application. This is why e.g. Facebook's Folly library detects POD types in its vector and doesn't call ctors and dtors at all.
Similarly, std::vector has to allocate more memory every time it has to grow and then copy all its contents, whereas for POD data types you can just use realloc, which can save copies.
These are all borderline microoptimisations, but they matter for realtime highly responsive software. Or just in general when you need to squeeze out every last bit of performance.
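For context, a very rough sketch of the realloc-based growth being described, only valid for trivially copyable element types (PodBuffer is an illustrative name, not a real library type; error handling and the formal object-lifetime rules are glossed over):

    #include <cstddef>
    #include <cstdlib>
    #include <type_traits>

    template <class T>
    struct PodBuffer {
        static_assert(std::is_trivially_copyable<T>::value,
                      "only trivially copyable types can be grown with realloc");

        T* data = nullptr;
        std::size_t size = 0;
        std::size_t capacity = 0;

        void push_back(const T& value) {
            if (size == capacity) {
                capacity = capacity ? capacity * 2 : 16;
                // realloc may extend the block in place; otherwise it copies the
                // bytes for us, with no per-element move/copy constructor calls.
                data = static_cast<T*>(std::realloc(data, capacity * sizeof(T)));
            }
            data[size++] = value;
        }

        ~PodBuffer() { std::free(data); }
    };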
std::move'ing a vector does not call the ctor/dtor of every element within the vector, but that might not be what you're referring to.
If you want an `A` struct/class, you'll call the ctor/dtor, that's true. But for POD types, if the ctor/dtor does nothing, they are trivial to inline and will incur no runtime overhead by any compiler nowadays.