"If you unwind the stack after the catch block, the lock_guard destructor inside of lock_and_throw() gets executed after the mutex has already been deleted. So either the mutex_wrapper constructor can't delete m_ptr from the catch block (and then where is it supposed to do it when it wants to re-throw the exception?) or lock_and_throw can't assume that an otherwise valid argument passed to it won't be deleted out from under its stack variables' destructors whenever any exception is thrown."
That sounds like the sort of memory management problem that smart pointers are supposed to solve. It looks like a smart pointer would be exactly the sort of thing you would want here: when and if the stack is ultimately unwound, the lock_guard object will be destroyed first, then the smart pointer (because unwinding a constructor will cause the destructors of member objects to be invoked; note that your code throws the exception out of the constructor, and so the stack would have to be unwound at a higher level catch). The problem is not with unwinding the stack at the end of the catch block (which would not even be reached in your example, because of the throw in the catch block); the problem is that you explicitly deleted the pointer before the stack would have been unwound.
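To make that concrete, here is a minimal sketch of the idea -- I'm guessing at the shape of the class from the quote, so treat the details as illustrative rather than as the original code:

#include <memory>
#include <mutex>
#include <stdexcept>

class mutex_wrapper {
    std::unique_ptr<std::mutex> m_ptr;
public:
    mutex_wrapper() : m_ptr(std::make_unique<std::mutex>()) {
        std::lock_guard<std::mutex> guard(*m_ptr);
        // When this throw unwinds, guard is destroyed first (unlocking a
        // still-valid mutex), then the already-constructed member m_ptr,
        // which frees the mutex -- no explicit delete in any catch block.
        throw std::runtime_error("simulated failure");
    }
};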
"And if you run the catch block and then go back and unwind the stack, it still leaves the question of what happens when a destructor throws during stack unwinding. Do you call the same catch block again (and then have to check for possible double free etc.), or do you let an exception that was surrounded by catch(…) escape out to the calling function just because it was already called once? Neither seems pleasant."
Really? It sounds like the second option would be what you would want: if the stack was unwound implicitly after the body of the catch block had executed, and unwinding the stack caused an exception to be thrown, then the exception was thrown out of the catch block. How is that unpleasant?
"The conclusion should be instead that global variables shouldn't be instantiated using constructors that throw exceptions unless program termination is the desired result when such an exception occurs -- which it actually is in most cases, because having uninitialized global variables floating around is unwise to say the least."
Except that a failure to initialize should at least be reported to the user. Maybe you could not get a network connection, or you could not allocate memory, or there is a missing file -- whatever it is, the user should know, and the thrower of the exception should not be responsible for telling the user. If you have restarts, you get something better -- you get the chance to try the operation again, which might be good if you have a long start-up process.
"Also, if it should ever become an actual problem in a specific instance, there is always the possibility of initializing the object with a different, non-throwing constructor and then having main reinitialize it on startup and catch any exceptions."
In other words, all classes should provide a non-throwing constructor and an initialization routine, because any object might be constructed in the global scope.
"There are obviously some applications that require arbitrary precision. But most applications never need to count as high as 2,305,843,009,213,693,952."
It is not just about counting high. If an application adds or multiplies two numbers, there is a chance of an overflow. If the application is reading one of the operands from the user, that overflow could be a security problem -- such problems are frequently reported.
"So why pay for it constantly if you so rarely need it?"
You don't have to pay for it constantly; you can have fixed-width types as something the programmer explicitly requests, or as something the compiler generates as an optimization. The real question is, why should the default type be the least safe, and why should programmers have to work harder to get a natural and safe abstraction?
"who says the compiler can't optimize out the arbitrary precision library calls in cases where the values are known at compile time just because the library isn't an official part of the language? The calls are likely to be short enough to be inlined and then if all the values are static the optimizer has a good shot at figuring it out."
Can you name a C++ compiler that does this?
"String literals as const char arrays are mostly harmless because the compiler ensures that they're "right" -- if they're const then you can't accidentally go writing past the end because you can't accidentally write at all"
You can accidentally read past the end, and you can accidentally print what you read. That can cause a lot of problems. There is no requirement that people use the standard library to iterate through a string.
"Moreover, bounds checking is pretty easy if you want it"
Once again, the programmer has to do extra work just to get something safe, because the default semantics are unsafe. If bounds checking is so easy, why not make it the default, and have unchecked access be an option for cases where speed matters? You already have at() and operator[] -- all that is needed is to switch which one of those does the bounds check.
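Even today you can sketch the flipped default yourself -- checked_vector and unchecked() are made-up names, of course, not anything standard:

#include <cstddef>
#include <vector>

template <typename T>
class checked_vector {
    std::vector<T> data_;
public:
    explicit checked_vector(std::size_t n) : data_(n) {}
    // checked by default: throws std::out_of_range on a bad index
    T& operator[](std::size_t i) { return data_.at(i); }
    // the unchecked fast path is the one you have to ask for by name
    T& unchecked(std::size_t i) noexcept { return data_[i]; }
};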
>That sounds like the sort of memory management problem that smart pointers are supposed to solve.
Using a smart pointer would solve that specific problem, but then you're de facto mandating the use of smart pointers in every code block that an exception could be thrown through, which is every code block.
And what if mutex_wrapper is a smart pointer class? Do I now need to use a smart pointer to implement my smart pointer? Turtles all the way down? Or take operator new, which isn't an object at all so doesn't have a destructor, but still has to deallocate the memory it allocated if the constructor it calls throws, so it would need a "special" smart pointer object to use internally that only deallocates but doesn't call the destructor. It's not just calling destructors -- it's any cleanup operations because you can't do any destructive cleanup from a catch block anymore. So you end up requiring atomic RAII, and then none of the code actually implementing RAII can safely throw. It feels like just moving the problem: Now instead of not being able to throw from a destructor, you can't throw from code between resource allocation and turning on the destructor. Which is very close to saying you can't throw through a constructor.
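Just to illustrate what that "special" smart pointer would amount to -- a rough sketch using a unique_ptr whose deleter frees the storage without ever calling a destructor (the helper name is mine):

#include <memory>
#include <new>
#include <utility>

template <typename T, typename... Args>
T* guarded_construct(Args&&... args) {
    // deleter frees the raw storage only; no destructor is ever called
    std::unique_ptr<void, void(*)(void*)> mem(
        ::operator new(sizeof(T)),
        [](void* p) { ::operator delete(p); });
    T* obj = ::new (mem.get()) T(std::forward<Args>(args)...);  // may throw
    mem.release();  // construction succeeded; ownership passes to the caller
    return obj;
}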
>Really? It sounds like the second option would be what you would want: if the stack was unwound implicitly after the body of the catch block had executed, and unwinding the stack caused an exception to be thrown, then the exception was thrown out of the catch block. How is that unpleasant?
void foo()
{
    std::vector<destructor_always_throws> v;
    v.reserve(1000);
    for(size_t i = 0; i < 1000; ++i)
        v.emplace_back();  // construct in place; a temporary here would throw inside the loop
}
Now I've got a thousand destructors that are going to throw one after the other as soon as the function returns. The nearest catch block will catch the first one, then go back to unwinding the stack where the next one throws. The next nearest catch block will catch that one and then go back to unwinding the stack again. Pretty sure I'm going to run out of catch blocks eventually, but I'd like to be able to catch all the destructor exceptions somehow and not end up with program termination, since that was supposed to be the whole idea.
>Except that a failure to initialize should at least be reported to the user.
Which it is:
$ ./a.out
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Granted the message could be less cryptic, but we already know how to fix it. Don't call constructors that throw from global scope. It's pretty much the same thing as saying don't throw out of main(), because the result is the same.
>In other words, all classes should provide a non-throwing constructor and an initialization routine, because any object might be constructed in the global scope.
Not any object, just the ones that have already been used that way thoughtlessly and need to be fixed without impacting existing code using the object.
If you just need a global in new code (and you're insistent on a global), make it a pointer or smart pointer and then use operator new in main to initialize it.
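Something like this, say (Config here is just a stand-in for any class whose constructor can throw):

#include <iostream>
#include <memory>
#include <stdexcept>

struct Config {
    explicit Config(const char* path) {
        if (!path) throw std::runtime_error("no config path");
    }
};

std::unique_ptr<Config> g_config;  // empty at static init time; cannot throw

int main() {
    try {
        g_config = std::make_unique<Config>("settings.ini");
    } catch (const std::exception& e) {
        std::cerr << "startup failed: " << e.what() << '\n';
        return 1;
    }
    // ... rest of the program uses *g_config ...
}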
>If the application is reading one of the operands from the user, that overflow could be a security problem -- such problems are frequently reported.
All user input needs to be validated. The language doesn't matter. If you're using a language that provides arbitrary precision, the user can instead provide you with a number which is a hundred gigabytes long and will take a hundred years for your computer to multiply. "If num > threshold then reject" is going to be necessary one way or another.
>The real question is, why should the default type be the least safe, and why should programmers have to work harder to get a natural and safe abstraction?
Because it's faster. The language is old, being fast was important then, and sometimes it still is today. But given that both types are available, isn't your complaint more with the textbooks that teach people to use the faster, less safe versions rather than the slower, safer versions by default? Or do you just not like that the trade-off between runtime checking and performance is even allowed?
>You can accidentally read past the end, and you can accidentally print what you read. That can cause a lot of problems. There is no requirement that people use the standard library to iterate through a string.
There is no requirement that people don't write code that says "uid = 0" where it should say "uid == 0" either. People who write bad code write bad code. I understand that languages can and should help you avoid doing things like that, but at some point, when everybody says "don't do that" and you do it anyway, you get what you get. Most languages allow you to call C libraries from them, and if you call them wrong you get the same unsafe behavior. Does that make all those languages too unsafe to use too?
"a smart pointer would solve that specific problem, but then you're de facto mandating the use of smart pointers in every code block that an exception could be thrown through, which is every code block."
Which is already what the standard does to closures: you are basically forced to use smart pointers and capture by value if you are returning a closure from a function.
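For example (names made up, but this is the usual shape of it):

#include <functional>
#include <iostream>
#include <memory>
#include <string>

// Returning a closure that outlives the enclosing frame: capture a
// shared_ptr by value so the data stays alive, instead of dangling.
std::function<void()> make_greeter(const std::string& name) {
    auto data = std::make_shared<std::string>(name);
    return [data] { std::cout << "hello, " << *data << '\n'; };
}

int main() {
    auto greet = make_greeter("world");
    greet();  // *data is still alive via the captured shared_ptr
}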
"Or take operator new, which isn't an object at all so doesn't have a destructor, but still has to deallocate the memory it allocated if the constructor it calls throws"
So have the catch set a flag, save a copy of the exception, and then after the catch block the flag is checked; if the flag is set, free the memory and throw the copied exception. Or just give programmers a way to explicitly unwind the stack at any point in a catch block. Or give programmers a Lisp-style "restarts" system, and create a standard restart for freeing resources, so that resources will only be freed if no recovery is possible (and so that constructor exceptions can be recovered without having to reallocate resources).
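The flag-and-saved-copy version can even be written down in today's C++ with std::exception_ptr; a rough sketch of a manual construction helper doing that dance (the real new-expression does the equivalent for you):

#include <exception>
#include <new>
#include <utility>

template <typename T, typename... Args>
T* construct(Args&&... args) {
    void* mem = ::operator new(sizeof(T));
    std::exception_ptr saved;
    try {
        return ::new (mem) T(std::forward<Args>(args)...);
    } catch (...) {
        saved = std::current_exception();  // the "flag" plus the saved copy
    }
    ::operator delete(mem);                // free the raw memory; nothing to destruct
    std::rethrow_exception(saved);         // re-throw the saved exception
}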
The difference is in what errors can be handled: handling constructor exceptions well would require a bit more care if the stack were not unwound until the catch executes, but right now destructor exceptions cannot be handled at all unless you are willing to have a program terminate (which is the default behavior).
"Pretty sure I'm going to run out of catch blocks eventually, but I'd like to be able to catch all the destructor exceptions somehow"
As opposed to the current situation, where your program would terminate without ever reaching a catch block? This sounds like another case where restarts would be handy: stack unwinding could set a restart (or the vector destructor, since the objects are being destroyed there), so that one catch block could keep invoking a restart and then handle each exception until no objects remain. I can imagine cases where it would be better to ignore some errors until a particular resource is freed than to have a program quit or to allow a low-priority error to prevent that resource from being freed.
So if your point is, "Catching before the stack has been unwound necessitates a restart system," I can agree to that, especially since catching before unwinding makes restarts possible.
"Granted the message could be less cryptic,"
It could also be completely useless. What if there is no terminal? What if the user is only presented with a GUI? The default exception handler has no way to know what sort of user interface a program will have, so it has no reliable way to present errors to users, let alone to allow users to correct errors.
"we already know how to fix it. Don't call constructors that throw from global scope."
Which is basically saying that all classes need a non-throwing constructor, or that you should never have a global object (not necessarily a bad idea, but people still create global objects sometimes). A better idea, which I think we agree on, would be to give programmers a way to handle exceptions outside of main.
"All user input needs to be validated"
OK, sure. Except that people do not always validate input, which is how we wind up with bugs. Input validation adds complexity to code, and like all things that involve extra programmer work, it is likely to be forgotten or done incorrectly somewhere.
"The language doesn't matter"
Sure it does, because the language decides whether forgetting to validate some input will result in the program terminating (from an exception) or the program having a vulnerability (because it will use bad input). If there is no bounds checking on arrays, failing to validate input that is used as an array index is a vulnerability. If there is no error signalled when an integer overflows, failing to validate input that is used in integer arithmetic is a vulnerability.
I think experience has shown that it is easy for programmers to forget about validating and sanitizing input, and that it is easy for programmers to validate or sanitize input incorrectly. SQL injection attacks can be prevented by either (a) sanitizing all inputs or (b) using parameterized queries; it is hard to argue that (a) is a superior solution to (b), because there are fewer things to forget or get wrong with (b).
"the user can instead provide you with a number which is a hundred gigabytes long and will take a hundred years for your computer to multiply"
Sure, and then they can trigger a denial of service attack. And then the admin will see that something strange is happening, kill the process, and take some appropriate action, and that will be that. Denial of service attacks are a problem, sure, but it is almost always worse for a program to leak secret data or to give an untrusted user the ability to execute arbitrary commands -- especially when the user might do so without immediately alerting the system administrator to the problem (which spinning the CPU will probably do). It is also worth noting that a vulnerability that allows a remote attacker to run arbitrary code on a machine gives the attacker the ability to mount a denial of service attack (the attacker could just use their escalated privileges to spin the CPU); denial of service does not imply the ability to control a machine or read (or modify) sensitive information.
"isn't your complaint more with the textbooks that teach people to use the faster, less safe versions rather than the slower, safer versions by default?"
It's not just the textbooks; it is the language that encourages this. It is harder to use arbitrary precision types than to use fixed-width types in C++, because the default numeric types are all fixed width. That, in a nutshell, is the problem: C++ encourages programmers to write unsafe code by making safe code much harder to write. It is not just about numeric types: bounds checking is harder than unchecked array access, it is easier to use a primitive pointer type, it is easier to use a C-style cast, etc.
It would not have been hard to say that "int" is arbitrary precision, and to force programmers to use things like "uint64_t" if they want fixed-width (and perhaps to have a c_int type for programmers who need to deal with C functions that return the C "int" type). It would result in slower code for programmers who were not paying attention, but that is usually going to be better than unsafe code (can you name a case where speed is more important than correctness or safety?). Even something as simple as ensuring that any integer overflow causes an exception to be thrown unless the programmer explicitly disables that check (e.g. introduce an unsafe_arithmetic{} block to the language) would go a long way without forcing anyone to sacrifice speed.
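Even without language support, the checked-by-default direction is easy to sketch (checked_add is an illustrative helper; the built-in + on the default types stays unchecked, and signed overflow there is undefined behaviour):

#include <cstdint>
#include <limits>
#include <stdexcept>

std::int64_t checked_add(std::int64_t a, std::int64_t b) {
    // reject any pair of operands whose sum would fall outside the type
    if ((b > 0 && a > std::numeric_limits<std::int64_t>::max() - b) ||
        (b < 0 && a < std::numeric_limits<std::int64_t>::min() - b))
        throw std::overflow_error("integer overflow in checked_add");
    return a + b;  // safe: the check above rules out overflow
}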
"People who write bad code write bad code"
It's not just bad code; people can forget things when they are on a tight deadline. It happens, and languages should be designed with that in mind.
"at some point, when everybody says "don't do that" and you do it anyway, you get what you get"
I think there is a lesson from C++98 that is relevant here. Everyone says not to use C-style casts, and to use static_cast, dynamic_cast, or const_cast instead (or reinterpret_cast if for some reason it makes sense to do so), yet you still see people using C-style casts. It is just less difficult to use a C-style cast: less typing, less mental effort (there is no need to choose the correct cast), fewer confusing messages from the compiler, etc. Likewise, people continue to use operator[] in lieu of at()/iterators, because it is less effort.
Blaming programmers for writing bad code when that is the easiest thing for them to do is the wrong approach (but unfortunately, it is an approach that seems common among C and C++ programmers). The right approach is to make writing bad code harder, and to make writing good code easier.
"Most languages allow you to call C libraries from them, and if you call them wrong you get the same unsafe behavior. Does that make all those languages too unsafe to use too?"
No, because most languages with an FFI do not require you to use the FFI, nor do they encourage you to do so. The two FFIs I am most familiar with are JNI and SBCL's FFI, and both of those require some effort to use at all. One cannot carelessly invoke an FFI in most languages; usually, a programmer must be very explicit about using the FFI, and it is often the case that special functions are needed to deal with unsafe C types. You could be a fairly successful Java programmer without ever touching the FFI; likewise with Lisp, Haskell, Python, and other high-level languages.
I am actually a big fan of languages with FFIs, because sometimes high-level programs must do low-level things, but most of the time a high-level program is only doing high-level things. FFIs help to isolate low-level code, making it easier to debug low-level problems and allowing programmers to work on high-level logic without having to worry about low-level issues.
It is worth noting that there is nothing special about C. You can take high-level languages and retool their FFIs for some other low-level language, and the FFIs would be just as useful. You usually see C because most OSes expose a C API, and so low-level code for those systems is usually written in C. Were you to use Haskell on a system that exposes a SPARK API, you would want an FFI for SPARK, and your FFI would be less of a liability (since SPARK is a much safer language than C). So no, I do not think you can argue that having an FFI that allows calling code written in an unsafe language makes a high-level language unsafe; if a high-level program is unsafe because of its use of C via an FFI, the problem is still C (and the problem is still solved by either not using C or by only using a well-defined subset of C).
>Which is already what the standard does to closures: you are basically forced to use smart pointers and capture by value if you are returning a closure from a function.
Well, you have to make sure somehow that the thing a pointer is pointing to will still be there when the closure gets executed, sure. But the change you're asking for would have a much wider impact. Anywhere you have a dynamically allocated pointer -- or really anything that needs destruction whatsoever -- without an already-associated destructor would become unsafe for exceptions. Which is commonly the case in constructors. It's basically this pattern (which is extremely common) that would become prohibited.
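A rough sketch of the kind of constructor I mean (the class and helper functions are illustrative, not taken from real code):

#include <stdexcept>

// stand-ins for whatever might throw mid-construction
int  open_socket()     { throw std::runtime_error("no network"); }
void handshake(int)    {}
void close_socket(int) {}

class connection {
    char* buffer;  // raw resource with no destructor of its own
    int   fd;
public:
    connection() : buffer(new char[4096]), fd(-1) {
        fd = open_socket();  // may throw after buffer was allocated
        handshake(fd);       // may also throw
        // If either call throws, ~connection() never runs, so buffer leaks
        // unless a catch block deletes it -- exactly the destructive cleanup
        // that couldn't safely happen before the stack is unwound.
    }
    ~connection() { close_socket(fd); delete[] buffer; }
};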
What you would need to do is exclude exceptions from passing through any constructor that has an associated destructor, because the destructor wouldn't be called if the constructor throws and the catch block couldn't safely destroy the resources before the stack is unwound.
>So have the catch set a flag, save a copy of the exception, and then after the catch block the flag is checked; if the flag is set, free the memory and throw the copied exception.
Obviously it can be worked around, but can you see how quickly it becomes a headache? And now you're adding code and complexity to operator new, which is a good candidate for the most frequently called function in any given program.
>Or just give programmers a way to explicitly unwind the stack at any point in a catch block.
In other words, the existing functionality is good and necessary for some circumstances, but you want something different in addition to it.
It seems like you're looking for something like this:
void foo::bar()
{
    connect_network_drives();
    try {
        do_some_stuff();
    } catch(file_write_exception) {
        check_and_reconnect_network_drives();
        resume;
    } catch(fatal_file_write_exception) {
        // epic fail, maybe network is dead
        // handle serious error, maybe terminate etc.
    }
}

void do_some_stuff()
{
    // …
    file.write(stuff);
    if(file.write_failed()) {
        throw file_write_exception();
        on resume {
            file.write(stuff); // try again
            // no resume this time if error not fixed
            if(file.write_failed())
                throw fatal_file_write_exception();
        }
    }
}
But if that's what you want, why do you need special language support, instead of just doing something like this?
#include <functional>
#include <utility>
#include <vector>

// (this class could be a template for multiple different kinds of errors)
class file_write_error
{
private:
    static thread_local std::vector< std::function<void()> > handlers;
public:
    file_write_error(std::function<void()> handler) {
        handlers.push_back(std::move(handler));
    }
    ~file_write_error() {
        handlers.pop_back();
    }
    static void occurred() {
        // execute most recently registered handler
        if(handlers.size() > 0)
            handlers.back()();
    }
};
thread_local std::vector< std::function<void()> > file_write_error::handlers;
// (and then poison operator new for file_write_error
// so it can only be allocated on the stack
// and destructors run in reverse construction order)
void foo::bar()
{
    connect_network_drives();
    try {
        file_write_error handler([this]() {
            check_and_reconnect_network_drives();
        });
        do_some_stuff();
    } catch(fatal_file_write_exception) {
        // epic fail, maybe network is dead
        // handle serious error, maybe terminate etc.
    }
}

void do_some_stuff()
{
    // …
    file.write(stuff);
    if(file.write_failed()) {
        file_write_error::occurred(); // handle error
        file.write(stuff); // try again
        if(file.write_failed())
            throw fatal_file_write_exception();
    }
}
It seems like you're just looking for an error callback that gets called to try to fix a problem before throwing a fatal stack-unwinding exception is necessary. And it's not a bad idea, maybe more people should do that. But doesn't the language already provide what is necessary to accomplish that? Are we just arguing about syntax?
You could easily use that sort of error handler in a destructor as an alternative to exceptions as it is now. There is nothing a catch block running before stack unwinding could do that the error handler can't. And if you call such a thing and it fails, the two alternatives of either ignoring the error or terminating the program are all you really have left anyway, because if there was anything else to do then you could have either done it in the destructor or in the error handler. (I can imagine that an error handler may benefit from being able to call the next one up in the hierarchy if any, analogous to 'throw' from a catch block, but that could be accomplished with minimal changes to the above.)
"In other words, the existing functionality is good and necessary for some circumstances, but you want something different in addition to it."
The problem with the existing approach is that the safety of throwing an exception from a destructor depends on the context in which the destructor is invoked, and there is no workaround for it. What I am proposing would ensure that the safety of throwing an exception would not depend on why the function throwing it was called; edge cases where this might be unsafe could either be worked around (perhaps in a headache-inducing way, but nothing close to the headaches associated with destructors having no way to signal errors), or a language feature could be added to solve those problems.
"It seems like you're just looking for an error callback that gets called to try to fix a problem before throwing a fatal stack-unwinding exception is necessary. And it's not a bad idea, maybe more people should do that. But doesn't the language already provides what is necessary to accomplish that? Are we just arguing about syntax?"
Well, if we are arguing about programming languages, what is wrong with arguing about syntax? The language does provide enough features to manually implement restarts -- but if that is your argument, why do you even bother with C++? C gives you everything you need to implement any C++ feature; assembly language gives you all you need to implement any C feature. We use high-level languages because our productivity is greatly enhanced by having things automated.
Take exception handling itself as an example. We do not actually need the compiler to set it up for us -- we already have setjmp/longjmp, which are enough to manually create an exception handling system. The problem is that the programmer would be responsible for setting up everything related to exceptions -- stack unwinding, catching exceptions, etc. Nobody complains about exceptions being a language feature instead of something programmers implement by hand -- so why not add another useful language feature?
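For a sense of what "manually" means here, a bare-bones sketch with setjmp/longjmp -- no destructors run, and all the unwinding bookkeeping would be on the programmer:

#include <csetjmp>
#include <cstdio>

static std::jmp_buf handler;

void might_fail(bool fail) {
    if (fail)
        std::longjmp(handler, 1);  // "throw": jump back to the setjmp point
    std::puts("work done");
}

int main() {
    if (setjmp(handler) == 0) {    // "try": records where to jump back to
        might_fail(true);
    } else {                       // "catch": reached only via longjmp
        std::puts("recovered from failure");
    }
}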
Rather than callbacks, what you really want for restarts is continuations. One way a Lisp compiler might implement the Lisp "conditions" system would be something like this: convert the program to continuation passing style; each function takes two continuations, one that returns from the function (normal return) and one that is invoked when an exception is raised. Each function passes an exception handler continuation to the functions it calls; when a function declares an exception handler, it modifies the exception handler continuation to include its handler (the continuation would need to distinguish between exception types; this can be done any number of ways), which would include the previous exception handler continuation (so that exceptions can be propagated to higher levels). Exception handler continuations will take two continuations: one to unwind the stack (which is used to "return" from the handler), and the restarts continuation that is used for invoking restarts. When an exception is thrown, the thrower passes a restart continuation (or a continuation that throws some exception if no restarts are available), and then the handler continuation will either invoke the appropriate handler or it will invoke the handler continuation from the next higher level.
Complicated? Yes, and what I described above is just the simplified version. It should, however, be done automatically by the compiler, and the programmer should not even be aware that those extra continuation arguments are being inserted. The stack unwinding continuation could potentially be exposed to the programmer, and for convenience it could be set up to take a continuation as an argument -- either the rest of the handler, or else the "return from handler" continuation that exits the handler, so that the programmer could perform some error handling after the stack is unwound (e.g. the clean-up code), although this could potentially be accomplished using restarts (but that might be less "pretty").
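A heavily compressed sketch of the flavour of this, with std::function standing in for real continuations and all the names made up:

#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>

using Restart = std::function<void()>;
using Handler = std::function<void(const std::string&, const Restart&)>;

// The handler runs while the failing frame is still live and can invoke
// the restart it was handed instead of letting the error become fatal.
void read_config(const Handler& on_error) {
    bool ok = false;
    Restart use_defaults = [&] { ok = true; };  // "restart": recover in place
    if (!ok) {
        on_error("config missing", use_defaults);
        if (!ok)
            throw std::runtime_error("config missing and no restart taken");
    }
    std::cout << "config loaded\n";
}

int main() {
    read_config([](const std::string& err, const Restart& restart) {
        std::cout << "handler saw: " << err << ", invoking restart\n";
        restart();  // fix things without unwinding read_config's stack
    });
}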
Perhaps continuations should be suggested for C++14; they would be a logical follow-up to the introduction of closures in C++11.