The behavior of 'assert' is not an anomaly. It comes from 'design by contract.' Assert is primarily meant to be documentation of constraints in code and secondarily a way of catching errors during development.
"Contract conditions should never be violated during execution of a bug-free program. Contracts are therefore typically only checked in debug mode during software development. Later at release, the contract checks are disabled to maximize performance." - https://en.wikipedia.org/wiki/Design_by_contract
That is certainly one approach, and the article agrees.
> The root cause of this weakness is that the assert mechanism is designed purely for testing purposes, as is done in C++.
However, C and C++ are perhaps unique in how much undefined behavior is possible and in how simple it is to create. Inserting into a vector while iterating through it, for instance. Or an uninitialized pointer.
That's why many C++ experts believe in runtime assertions in production. Crashing the application with a core dump is generally preferable to trashing memory, corrupting your database, or launching the missiles.
All of the C/C++ experts I know, as well as people from primarily that background who have interviewed me, have always been among the most adamant in stressing that an application crashing unexpectedly should never happen and is always the wrong outcome.
I imagine they would say that your statement about crashing vs. e.g. launching the missiles is a false dilemma. You don't crash and you don't incorrectly launch the missiles.
I'm not a C++ developer so I can't say it with certainty, and I actually agree more with what you're saying. I'm just relaying that, in my experience, out of many different language communities, C++ seems adamantly the opposite of what you're describing.
I think the industry is on the cusp of settling on the Erlang model which is essentially allowing pieces of a program to crash so that the whole program doesn't have to. It will take time for practices and tools to spread.
I have occasionally needed to argue with a long-time C dev that crashing is exactly what I want my program to do if the user gives unexpected input. They're used to core dumps instead of pleasant tracebacks.
I'm a big fan of fatal errors and crashing the program with a stack trace:
1. A stack trace at the point of a contract violation tends to capture the most relevant context for debugging -- the faster it is to discover and debug an issue, the easier it is to fix.
2. Interacting code has to become sufficiently coupled to preserve a "sane program state" -- an exception may or may not be recoverable, but a fatal error never is, so there's no point in building code that tries to recover. If the programmer has to design the interactions among program components to avoid fatal errors, then there must be fewer total states in the program than in a program which recovers from errors -- this makes the program easier to reason about.
3. On delivering a good user experience -- I'd rather have clear and obvious crashes, which are more likely to include the most relevant debug information, than deliver the user some kind of non-crashing but non-working behavior (with possibly unknown security consequences) which may take longer to get noticed and fixed because the error handling mechanism deliberately _tries_ to paper over programming problems ...
I've actually modified third party libraries I've used to remove catch blocks or replace error handling within with fatal errors -- when dealing with unknown code it really can vastly speed up the learning process and the understanding based on observational behavior ... -- especially in understanding the behavior around edge cases.
In my experience, "You don't crash" means you catch the exception and exit gracefully, reporting a fatal error has occurred. Users don't distinguish between a crash and a fatal error.
Higher level languages are better at reporting uncaught runtime errors than C/C++ is, because they'll automatically do things like print useful stack traces and then exit gracefully even if you don't catch an exception. The interpreter doesn't crash when your code does.
I think you misunderstand the use case. If your container is tracking work done and it thinks 20 requests were handled and only 10 were received, you have an invariant failure. Without more context, this could easily be trashed memory, in which case, you might already be in the middle of undefined behavior. In that case, getting the hell out of the process is the most responsible course of action. Efforts to even log what happened could be counterproductive. You might log inaccurate info or write garbage to the DB.
Also, if you don't catch an exception in C++, most systems will give you a full core, which includes a stack trace for all running threads. Catching an exception and 'exiting cleanly' actually loses that information.
I've been writing software in C and C++ for a long time. Crashing is never a good user experience, so avoid it. If something unexpected happens, catch and report the error, then carry on, if possible, or gracefully exit otherwise.
As someone who works in support, customers (at least the ones I support) REALLY need a clear and obvious crash to understand something's wrong. That really forces them to "do something differently" and/or look for help. You're correct in that it's not a good user experience. Neither is chaos.
Exactly. "Undefined behavior" includes showing private data to the wrong user and booking ten times more orders than the user originally indicated. I'll take crashing over that.
It really depends on the type of software. Sometimes if something unexpected happens and you just catch and report an error, then you may end up in a state which will result in further errors a few hours later. It is much easier to track the root cause, if the program crashes immediately, than having to analyze several hours of logs. And then crashing (i.e. quitting with a core dump) is as graceful as it can be, since it provides all the necessary information to analyze the problem right when it happened.
I think it depends on the job the program is being used for. If the program is used in a setting where an occasional crash has no severe consequences, say a search engine backend, it may be useful for the company to actually run in production a program that is allowed to crash whenever a severe error condition occurs. In scenarios where lives or lots of money hinge on the program being alive and functioning, such as a plane autopilot or rocket / space probe control, a crash or fatal error of the main program often means disaster and thus should never occur. If an error condition occurs during execution, the program should withstand it and continue with a default path of execution. In the past, lots of lives and money could have been saved if only the software and hardware had conformed to this paradigm.
I agree with 'it depends' except for the case of safety critical systems, which I actually have experience in. A proper safety critical system should also be able to withstand a crash in an arbitrary process. The thing to remember is that there is no default path of execution if there is undefined behavior in C or C++ code. The process may do the worst thing it can, at least with the OS-level permissions it has.
It's not an anomaly, but it can be a surprise for people who don't understand what it does. For example, they use asserts for validation and then the validation doesn't work in production. It's absolutely right the way it works, but it's still a gotcha for the audience this blog post is aimed at.
The point is that they think they understand it, because most of Python behaves the same in development and production. You see this function called 'assert', it gives the right error at the right time, and all is good. Then you push it to production and it stops throwing errors. Eventually, you read the manual and it tells you that this specific function is ignored in production. This is a surprise because, say, print doesn't behave like that.
That only happens if you have different production and development environment settings though -- in which case you should expect the different results.
In this particular case, you compiled your code with "-O", so it's not the "same code" used in production, but code compiled with a different flag. Shouldn't they check what the flag does?
The developer may be deploying code to a server they didn't configure.
I agree with the way it's done in Python, as it's consistent with most other languages. But the blog post is right to point out to inexperienced developers that the way assert behaves might give them a surprise.
Unless the article was updated after your comment: the reason is right there in the article:
"However, Python does not produce any instructions for assert statements when compiling source code into optimized byte code (e.g. python -O). That silently removes whatever protection against malformed data that the programmer wired into their code leaving the application open to attacks.
The root cause of this weakness is that the assert mechanism is designed purely for testing purposes, as is done in C++. Programmers must use other means for ensuring data consistency."
I would argue that one should never use '-O' -- and certainly not '-OO', which also strips docstrings from running code. Not really an `optimization`, but they had to do something, right? One couldn't run __unoptimized__ in production, could they?
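A quick way to see what each flag actually strips, on Python 3, using a throwaway script (the file name is hypothetical):

# check.py
"""Module docstring."""
assert False, "stripped under -O and -OO"
print("docstring kept?", __doc__ is not None)

# $ python check.py      -> AssertionError: stripped under -O and -OO
# $ python -O check.py   -> docstring kept? True
# $ python -OO check.py  -> docstring kept? False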
Couldn't the assert message just say something like ", warning: this is only checked in development"? I don't know; requirements like knowing how something works are always kind of tough, since a lot of people's first interactions with things are in the code (like if they've just joined a new project), and they may assume they understand the functionality, and their assumptions may initially seem correct as they test it themselves. It's one of those "don't know what you don't know" scenarios, and "look up every function you ever see just in case what you think it does isn't what it actually does!" can be a bit impractical. So if this is known to be a gotcha, making the function itself speak that gotcha might be useful.
> Assert is primarily meant to be documentation of constraints in code
Real or imagined constraints? AFAICT, an assert only tells me what you wish your program did, but that has absolutely no bearing on what it will actually do.
>AFAICT, an assert only tells me what you wish your program did, but that has absolutely no bearing on what it will actually do.
Depending on the implementation, an assert can either merely log or absolutely stop a program that doesn't pass its test, so it very much has a bearing on what the program will actually do.
Then imagined it is. Your desires are totally a part of your imagination, unless you make them become real.
> Depending on the implementation, an assert can either merely log or absolutely stop a program that doesn't pass its test, so it very much has a bearing on what the program will actually do.
Point taken. Unfortunately, logging errors or aborting the program won't make assertions magically become true, though.
Only in the most pedantic and useless sense of the term.
Asserts are not just some random imagination; they are added based on the program's specifications and expected/desired functionality and constraints. The CS term for that kind of constraint is "invariant", and asserts are a way to be notified if those invariants are violated.
>unless you make them become real.
Only there are no assurances for that. If the invariants in your program were somehow guaranteed to be "real" then you wouldn't need asserts.
Asserts are there because whether you tried to make your invariants "become real" or not, you'll still miss things, have bugs, have unexpected interactions with code/systems outside your control etc. So they are there to tell you about those misses.
>Point taken. Unfortunately, logging errors or aborting the program won't make assertions magically become true, though.
Assertions are not expected to "magically become true" -- just to (a) inform about anytime they are violated, and, optionally, (b) not be violated and still have the program continue to run.
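As a concrete (entirely made-up) sketch of that usage -- the assert doesn't make the invariant true, it just tells you loudly the moment it stops being true:

class Counter:
    def __init__(self):
        self.received = 0
        self.handled = 0

    def record_received(self):
        self.received += 1

    def record_handled(self):
        self.handled += 1
        # Invariant: we can never have handled more requests than we received.
        assert self.handled <= self.received, "invariant violated: handled > received"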
> The CS term for that kind of constraint is "invariant", and asserts are a way to be notified if those invariants are violated.
An “invariant” is a function of the process state whose value remains constant (hence “invariant”) in spite of changes to the process state. Perhaps you meant “precondition” or “postcondition”?
> Only there are no assurances for that. If the invariants in your program were somehow guaranteed to be "real"
Guaranteeing that preconditions, postconditions and invariants hold when they're supposed to hold is your job, not the computer's.
> then you wouldn't need asserts.
I absolutely don't need asserts. An assert merely describes what you want, but that's useless to me, unless you establish a relation between what you want and what your program actually does - with proof.
> Asserts are there because whether you tried to make your invariants "become real" or not, you'll still miss things, have bugs,
It will become patently clear when the proof doesn't go through.
> have unexpected interactions with code/systems outside your control etc.
What happened to sanitizing input at system boundaries?
> Assertions are not expected to "magically become true"
Of course. Assertions are expected to always be true.
>An “invariant” is a function of the process state whose value remains constant (hence “invariant”) in spite of changes to the process state. Perhaps you meant “precondition” or “postcondition”?
No, I meant invariant. An invariant is something that is supposed to hold true, not just something that is guaranteed to hold true (e.g. a constant that can't ever change anyway). That's why assertions are needed to check that invariants hold.
From Wikipedia:
"In computer science, an invariant is a condition that can be relied upon to be true during execution of a program, or during some portion of it. It is a logical assertion that is held to always be true during a certain phase of execution. (...) Programmers often use assertions in their code to make invariants explicit."
Preconditions and postconditions are similar in concept, but are supposed/wanted to hold true before (pre) or after (post) a method runs.
>I absolutely don't need asserts. An assert merely describes what you want, but that's useless to me, unless you establish a relation between what you want and what your program actually does - with proof.
Well, asserts weren't created specifically for you. Feel free not to use them.
They are useful to me and, judging from their widespread use, to others, even if they don't formally prove that the program does 100% of what it needs to (which nobody expected them to do anyway).
Until we all program in Coq or similar, they will be useful for all kinds of checks. A correct program is a spectrum, not a binary option.
>Of course. Assertions are expected to always be true.
No, they are also expected to sometimes be false -- that's why we add assertion statements: to check whether our assertions hold. But we're splitting hairs two or three times over here.
> That's why assertions are needed to check that invariants hold.
No, you need proof.
> Until we all program in Coq or similar
So you're saying humans are fundamentally incapable of establishing the logical validity of what they assert by themselves? This contradicts historical evidence that people have done this for well over 2 millennia, using various methods and tools.
> A correct program is a spectrum, not a binary option.
Some errors might be easier to fix or have less disastrous consequences than others, but a correct program is one that has no errors, so I don't see where the spectrum is.
>That's why assertions are needed to check that invariants hold.
>No, you need proof.
You might need proof, but it doesn't mean you'll get it. In most languages in common use (e.g. not Coq and co) and for any larger than trivial program "proof" is impossible.
So, we'll continue to need all the tools we can realistically use, including assertions, unit tests and others.
>So you're saying humans are fundamentally incapable of establishing the logical validity of what they assert by themselves? This contradicts historical evidence that people have done this for well over 2 millennia, using various methods and tools.
This particular question is not even wrong in the context of the discussion. I don't usually throw around the term "troll", but you're either trolling or being alternately naive-on-principle and too pedantic.
In any case, whether or not people are "capable of establishing the logical validity of what they assert by themselves" for trivial things or for narrow domains, they absolutely have not been able to do it manually, or do it fast enough to be practical, for software programs, especially non-trivial ones. Even the best programmers introduce bugs and have behavior in their programs that they didn't expect.
Which is also why even the best programmers use assertions. It's not some obscure feature relegated to newbies or bad programmers. It's a standard practice, even in the most demanding and hardcore programming environments, from the Linux kernel (which uses the BUG_ON assertion macro) to NASA rocket code.
Or I could turn "troll mode" on an answer on the same vein as the question: if "people have done this for well over 2 millennia, using various methods and tools" then they haven't been doing it "by themselves" any more so than when using assertions (which is also one of such "tools").
And of course, I haven't anywhere stated that "humans are fundamentally incapable of establishing the logical validity of what they assert by themselves".
The gist of my comment would be merely that humans are bad at establishing the logical validity of their computer programs by themselves -- for which there is ample "historical evidence".
>Some errors might be easier to fix or have less disastrous consequences than others, but a correct program is one that has no errors, so I don't see where the spectrum is.
The spectrum is obviously in that correctness is not black and white, and all non-trivial programs have bugs in practice. Those programs whose bugs are few and far between are more correct than others.
> If you have the process quit it definitely stops them from being false though.
The assertion remains false for the final process state, before the process quits. Outside of the process, the assertion is simply meaningless (neither true nor false), because the assertion's free variables are only bound inside the process.
>The assertion remains false for the final process state, before the process quits.
Which is inconsequential. Programmers don't expect automatic "recovery" from assertions; they expect them to notify them of the violated constraint, and/or to ensure that the program won't go on and use a value that violates an assertion further down -- which program termination achieves.
>Outside of the process, the assertion is simply meaningless (neither true nor false), because the assertion's free variables are only bound inside the process.
> He meant from being false subsequently in the program.
But, you see, the assertion is no less false just because the process was aborted. The fact remains that there exists a reachable state for which the assert fails. So apparently what I meant is no more obvious to you than it was for JonnieCache.
>But, you see, the assertion is no less false just because the process was aborted. The fact remains that there exists a reachable state for which the assert fails
Yes, Captain Obvious, and that reachable state is exactly what every programmer who uses an assert() statement expects when he writes it.
If there wasn't the potential for such a state, assert statements would do nothing ever in the first place -- so it would be kinda silly to even have them in.
One Python gotcha that has bitten people in my company a lot:
fun_call('string1',
'string2'
'string3')
That is, missing commas and subsequent string concatenations can lead to nasty errors. I wish Python didn't nick this from C and would have just enforced the use of + to concat over-length strings, if they need to be split to multiple lines.
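To make the gotcha concrete, this is what the interpreter actually sees -- adjacent string literals are concatenated at compile time, as in C:

>>> ('string1',
...  'string2'
...  'string3')
('string1', 'string2string3')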
They considered dropping that for Python 3. I forget the reason why they changed their minds, but there's probably a PEP about it. You may find that their discussion will change your mind as well.
Nothing mind-changing in here. Translations seem to take the biggest hit, but it's largely a matter of company conventions if this is a problem. IMO the grounds of rejection weren't discussed very thoroughly.
In many cases, this invocation of fun_call() would not match its definition signature and it would generate an exception. When that's the case it's not at all a Python gotcha because it halted and forced you to fix the error.
I like this string catenation behavior and I prefer it, even if it causes some confusion in (IMO rare) cases.
Unless I'm using a string more than once or its meaning is unclear, I always use the literal. In most cases, I find it more legible than having to go look up a constant's definition.
he probably meant not using string literals / ints outside of top-level declarations, so you would instead assign all these parameters to "constants" at top level and then use those constants in function calls, hence avoiding this error.
If you're interested in reviewing Python code for potential security issues, here's a related project: https://github.com/openstack/bandit (I'm one of the devs)
It will actually pick up a number of the security issues listed in the post. It's useful in the real world too -- it has led to a number of CVEs being reported.
I looked at Bandit earlier this year, but had to put it down when I discovered it didn't have a way to fill in default config -- the instant I specified anything in the config file, I had to supply a complete config, including literally every single check it's capable of doing, because Bandit would discard all its defaults on encountering that single line of custom config.
It's not ideal - the config file is still a complete list, but there are a few things you can do.
- `bandit-config-generator` will give you a file filled with the current/default configuration, so it's a simple way to start with the defaults and modify just what you need
- if you just need to enable/disable tests rather than reconfigure, you can do that in command line options
- if you want to get rid of specific warning, you can mark the line with "# nosec" in the source
Merging various configs is possible, but rather complex to implement considering we aim for the config to be a complete description that won't ever need to change between versions.
If none of the above workarounds solve your use case, feel free to report an issue. (https://bugs.launchpad.net/bandit) I can't guarantee how/whether we'll fix this, but we'd definitely like to know what the problem is and how you're trying to use Bandit.
FYI when using pypy bandit fails to discover tests for some reason (could be a pypy bug?). In this case, the default output generated by bandit-config-generator is mostly empty and bandit fails to parse it. It doesn't indicate what the nature of the parse failure is (what line, what rule(s) were violated), even with verbose mode. The readme doesn't cover the format of the config file. Is it YAML/TOML/JSON/other?
EDIT: nevermind, couldn't reproduce it with a new virtualenv. Whatever problem occurred in that virtualenv likely wasn't that interesting (wild guess is a package name collision).
pypy2-v5.3.1-linux64 / [PyPy 5.3.1 with GCC 4.8.2]
If you can't reproduce it with that tarball I'll dig deeper to see the mechanism of failure, maybe it's not pypy and it's just something local to my config or venv.
I did something similar, yes. I cannot reproduce this with a new virtualenv anymore. It may have been due to odd bits in my environment (likely not worth further investigation).
virtualenv -p `which pypy` ~/pypy_env
source ~/pypy_env/bin/activate
# indeterminate <but probably critical> changes to this venv
pip install bandit
bandit-config-generator -o tmp_file
bandit --help # "The following sets..." is empty
bandit -c tmp_file -r path/to/some/project # gives the error regarding config file parse failure
I can reproduce the parse error given the nearly empty config file, but it's not clear to me whether the parse error is expected in this case or not.
I wouldn't call it 'traps'. I would call it 'read and understand documentation before writing code' like: what is 'is' operator, or how floats behave in EVERY programming language, or why you should sanitize EVERY user input.
So, basically, I can write such a list for every language I know.
I'd argue that a good programming language should rarely force you to go back to the docs. Anyway, for a list of weird language properties, check out this tag:
Relying on developers to read and remember every bit of documentation for every bit of code is more likely to end up with insecure code compared to introducing sane defaults with an explicit, expressive API.
Which is why any sane industry has lots of safety involved. We don't just shrug every time someone gets electrocuted to death and say "they forgot part c page 4 of the operations manual which indicates that the off switch doesn't work on tuesdays".
And the way we handle that is by designing systems to compensate for the fallibility of humans so that the human-computer system is more robust as a whole.
I think the point of the parent comment is that they should. I'm not even a regular user of Python (or dynamic languages in general), and yet the list of things in the article doesn't seem all that surprising.
I would never accuse Python of "language clarity and friendliness". Far from it. For someone who came up through C, Java, Perl, and Ruby, but who's wrangled with Python, Javascript, Go, and even Haskell in recent years, I still find Python mysterious, self-contradictory, and filled with implicit rules and assumptions that are entirely un-intuitive to me far more than other languages. And yet, people seem to like it. Certainly this article does. It's an interesting effect.
I find that with Python that's almost always caused by not quite understanding the underlying rules. Once understood, they're very consistent.
For example: does Python pass function arguments by value or by reference? Neither! It passes them by object reference - not by variable reference like C/C++.
Check out:
>>> def foo(a):
... a = 2
...
>>> value = 1
>>> value
1
>>> foo(value)
>>> value
1
and:
>>> def mutate(dct):
... dct['foo'] = 'bar'
...
>>> value = {}
>>> value
{}
>>> mutate(value)
>>> value
{'foo': 'bar'}
This apparent contradiction confuses a lot of people. The first example would imply that Python's pass-by-value, but the second looks a lot like pass-by-reference. If you don't know the actual answer, it looks magical and inconsistent, and I've heard all sorts of explanations like "mutable arguments are passed by reference while immutable objects are passed by value".
In reality, the object itself - not the variable name referring to the object - is what gets passed to the function. In the first example we're passing in the object `int(1)`, not the variable `value`, and creating a new variable `a` to refer to it. When we then run `a = 2`, we're creating a new object `int(2)` and altering `a` to point to the new object instead of the old one. Nothing happens to the old `int(1)` object. It's still there, and the top-level `value` variable still points to it. `a` is just a symlink: it doesn't have a value of its own. Neither does `value` or any other Python variable name. That's why the second example works: we're passing in the actual dictionary object and then mutating it. We're not passing in the variable `value`; we're passing in the object that `value` refers to.
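One way to convince yourself of this (a small sketch on top of the examples above) is to check object identity inside the function:

>>> value = {}
>>> def check(a):
...     return a is value    # the parameter is bound to the very same object
...
>>> check(value)
True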
The point of this long-windedness is that Python's rules tend to be very, very simple and consistent. Its behavior can be unexpected if you don't truly understand the details or if you try to infer parallels to other languages by observing it and hoping you're right.
Except that's 100% wrong here. There's not even a facility for marking an object as immutable in Python, so that wouldn't support user-defined classes at all.
Python is always pass-by-object-reference, and assignment is always a pointer operation. It's not special-cased like you're describing.
In my experience, and as implied by the rising popularity of Python, you would be in the minority. Personally, I find Python to be the clearest of any language I've worked with, most resembling natural language in the way I typically speak. Do you have some examples of how you find it self-contradictory?
Here's an example of its expressiveness that a colleague and I were discussing the other day:
Python: [os.remove(i.local_path) for i in old_q if i not in self.queue]
Java: old_q.stream().filter(i -> !self.queue.contains(i)).map(i -> new Path(i.local_path)).forEach(Files::delete);
I've programmed in both languages but joked I could only understand the Java line by using the Python line as documentation!
Well you've picked the perfect example where Python's list comprehensions shine, 1 map, 1 filter, 1 'each'.
I think your Java example only looks gross because it's using ugly APIs, and isn't indented well, but otherwise, apart from contrived examples, pipelining is superior.
I find Python's lack of pipeline capability, whilst every other modern language supports it, very frustrating. JavaScript, Scala, Swift, Rust, Ruby, Elixir, C#, F#, Java, Kotlin <-- all support pipelines.
Meanwhile, Python has borked one-line lambdas that compose awkwardly with map/filter (if you do a map over the result of a filter, they'll be written in reverse order), and it refuses to implement useful methods on lists that would allow pipelining. It's like it can't decide whether to pick the OO solution to the problem (and add the methods to lists) or to go the FP route (and fix its lambdas), so it has done neither.
So we're stuck hoping our problem at hand fits neatly into a list comprehension, which still won't be composable when we come back to it and realise we want to add another operation.
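For instance (a small sketch with made-up names): the nested form reads backwards relative to the data flow, and the comprehension can't easily grow another stage.

nums = range(10)

# Nested map/filter: the filter runs first but is written last.
squares_of_evens = list(map(lambda x: x * x,
                            filter(lambda x: x % 2 == 0, nums)))

# Comprehension: readable here, but awkward to extend with further steps.
squares_of_evens = [x * x for x in nums if x % 2 == 0]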
I like Python very much, but this is one of its weakest areas in my opinion, so I'm surprised you bring it up as a strength.
I'm afraid I disagree. I programmed in a variety of languages in grad school (physics): C, C++, Fortran 77, Tcl, Perl, Matlab, Maple, Mathematica, IDL, Emacs LISP, etc. Not to mention the stuff I started on in high school.
When I switched my analysis to Python, I became so much more productive. And other science researchers I have known have echoed this sentiment. Even writing a C module to speed up my Python was pretty straightforward, if tedious.
Python had the fewest surprises. And debugging other people's Python is exponentially less annoying than debugging other people's Fortran or C. It's still my go-to language to get stuff done without fuss.
Yep, this was my point. I'm curious what makes Python seem so obviously clear to other people, but not to me. Maybe it's the OO approach. I had done a lot of Java, Smalltalk, and Ruby before I ever tried to approach Python. Maybe it's because Python's OO support feels (to me) bolted on compared to those other languages which are obviously OO from the bottom up, and I'm unwittingly trying to apply mental models I developed in those other languages to how I think about Python.
But Python is OO from the bottom up. Unlike Java, everything in Python is an object.
Perhaps your experience with objects in other languages has given you a different mental model for what an object is. I find Python objects to be more straightforward than in other languages, especially because classes are objects, too.
If it really is OO, then the global `len()` function, and things like explicitly declaring "self" in method declarations, make it _feel_ bolted on (to me). Why is `len()` special? I immediately question what other basic operations aren't methods, but global functions.
And as for method declaration, if you aren't satisfied with implicit self, I much prefer Go's choice of having you declare the self reference for methods before the method name, instead of in the argument spec list (which then doesn't match the calling list). Python's way makes it feel like the compiler writer couldn't be bothered to hide the OO implementation on the declaration side, but embraced it on the calling side.
Meh, I know these have been hashed over a thousand times here. Just some of the things that rub me the wrong way when I've tried to deal with Python.
Python is a multi-paradigm language in the sense that it does not explicitly force the programmer to write all code in a particular way. So, for example, Python does not forbid the existence of standalone functions, or the execution of functions without looking them up through a class of which they happen to be a member.
But given that it is inescapably true that every function call in Python is translated to a call of a method of an object, it's hard to argue that it isn't "really" OO.
Guido once explained further in an email to the mailing list. I've forgotten some of it, but the gist is that he didn't want anyone to accidentally create or override a .len() method to do something other than tell the number of elements in the container.
And... almost everything is a method. Even ``len(obj)`` is just sugar for ``obj.__len__()``.
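A tiny illustration of that sugar (any class defining __len__ works with the built-in; Bag is a made-up example):

>>> class Bag:
...     def __len__(self):
...         return 3
...
>>> len(Bag())
3
>>> Bag().__len__()
3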
There are a few functions in the C API to let you call things, depending on the type of thing and set of arguments you feel like providing, and they rely on the Python API to handle the calling for you. I imagine if you really wanted to, and knew enough about the structure and expectations of the Python object you were working with, you could "manually" call without going through one of those C API functions, but I don't know that I'd recommend trying it...
I'm inclined to agree that python isn't the "clearest and friendliest". I've been using it a while, and I still find myself looking up how to do X, when it should be obvious. I like python, but I'm amazed at how people love it.
I have to maintain a codebase of php/perl/java/python. "Pythonistic" programming seems to encourage finding the shortest/fastest way to code things at the expense of clarity.
Plus dependencies can get headachy. This might just be the code I have to work with, but while it's better than Perl, in my case it's harder to maintain than Java or PHP (the global scope thing in Python seems to get me).
I usually find it fairly easy to figure out what Python code does... as long as no errors occur. The fact that basically no Python code documents what errors it can throw/generate is really annoying.
every language has warts and gotchas. I judge a language by how nice its _idiomatic_ patterns are, rather than by how bad its warts are (unless there are an overwhelming number of warts that can't be avoided even in idiomatic code).
The point the article makes on comparing floating point values and the floating point type is true, but it's not because of any rounding error.
It's because the comparison operators are defined for every value. That is, "True < []" is valid in Python 2.7, along with any other 2 values, regardless of type. This is a surprising instance of weak typing in Python, which is otherwise strongly typed, which is why this was fixed in Python 3 (https://docs.python.org/3.0/whatsnew/3.0.html#ordering-compa...).
This is also not a case of Python doing something useful, like with '"foo"*2'. The result of the comparison is defined, but it's not useful. I suppose it was useful for making sure that you can always sort a list, but there are better ways to do that.
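For reference, roughly what that looks like in practice (on CPython 2.7 the cross-type ordering is arbitrary but consistent; a recent Python 3 refuses, with error wording that varies by version):

# Python 2.7
>>> True < []
True
>>> float('infinity') < 'a string'
True

# Python 3
>>> float('infinity') < 'a string'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'float' and 'str'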
> But my point was that it's not related to weak typing as the parent seems to suggest.
And you're completely right there. Any language using floating-point numbers will have the same issue regardless of its typing discipline e.g. Rust: https://is.gd/4BNoWa
The speed of these operations isn't on the same order of magnitude as floating-point operations. I do agree that literals for `Fraction` and `Decimal` would be interesting.
Also, I think that '2.2' is better represented as `Decimal`, as it's a decimal number (which is a subset of rational numbers, that are usually better represented using `Decimal`) (edit: that of course depends on the use case, as Decimal uses fixed-point precision).
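A small sketch of the trade-off being discussed (the float result is the usual binary rounding artifact):

>>> from decimal import Decimal
>>> 2.2 + 0.1
2.3000000000000003
>>> Decimal('2.2') + Decimal('0.1')
Decimal('2.3')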
The documentation of most modules cited in the article starts with a paragraph in red and bold warning the reader of the same danger explained by the author. So this is a nice compilation, but nothing new and nothing that somebody looking at the documentation of the module they're using will miss.
There are nonetheless good remarks about poor design choices in Python which can mislead newbies, such as naming `input` the function that does `eval(raw_input(prompt))` (as casually documented[0]), and the existence of such a function in the first place.
Completely out of context, sorry, but I couldn't avoid noting this:
"Being easy to pick up and progress quickly towards developing larger and more complicated applications, Python is becoming increasingly ubiquitous in computing environments".
Why would you change the order of the subject in such an unreadable way? Isn't it much easier to say:
"Python is becoming increasingly ubiquitous in computing environments, as it's easy to pick up and progress quickly towards developing larger and more complicated applications"
I'm no expert in writing; it just sounded weird. If anyone can explain what's going on there, I'd really appreciate it.
Sounds to me like they started with something like "Being easy to pick up, Python is becoming increasingly ubiquitous..." and then made the old engineer-as-writer error of adding extra specificity.
The phrase "increasingly ubiquitous" doesn't make sense. Something is either ubiquitous or not. There's no such thing as increasingly infinite, for example.
It's best to avoid passive voice, but sometimes that requires difficult thinking about who is taking action. In this case, who is causing Python to be deployed in more computing environments? How about this revision:
"With increasing frequency, software engineers and system administrators are choosing Python, because the language is easy to learn and productive for developing large, complex applications."
I'm not sure that's how I'd explain Python's popularity.
I forget the second half of the line, but one of my high school English teachers would always say "being is bad, ...", meaning there's rarely ever a time when you'd want to use it over some other way of saying it. I guess it's along the same lines as "don't end a sentence in a preposition".
On the float behavior: I really wish Python 3 had the sense to do what Perl 6 did and interpret all literals with decimal points (except those that use scientific notation) as Fractions instead of floats. That would solve all these floating-point errors without requiring significant modification of code, plus Python 3 would be the perfect time to do it because they're already throwing out backwards compatibility because of the str/bytes thing.
If Python used the Perl 6 model, you could still use floats by writing your literals in scientific notation, so if you want floating-point performance, you can still get it. For example, 2.2 would be a Fraction, but 2.2e0 would be a float. I don't want to eliminate floats from the language, just hide them from average users by default.
And it's not like rationals-as-default are just some weird Perl 6-ism. Haskell does the same thing, and the language is fairly well-received.
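For what it's worth, here is the difference that proposal would make, shown with today's explicit Fraction type (a hedged sketch, not Perl 6 syntax):

>>> from fractions import Fraction
>>> 2.2 * 3 == 6.6
False
>>> Fraction('2.2') * 3 == Fraction('6.6')
True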
> Also, while nice, Fractions have their own pitfalls due to potentially catastrophic runtime behavior.
It's in their threat model under 'Module injection':
> The mitigation is to maintain secure access permissions on all directories and package files in search path to ensure unprivileged users do not have write access to them.
Ok, I see. To be honest I read that as "keep your PYTHONPATH sane". I think that's a bit different from worrying about someone having write access to the source, but still related - point taken.
CVE-2008-5983 (https://bugs.python.org/issue5753) "Untrusted search path vuln... prepends an empty string to sys.path when the argv[0] argument does not contain a path separator"
Yes it does... One of the examples is monkey patching using bytecode. How are you going to do that without write access to the filesystem running your code?
The same is true for module imports... If you have write access to the same directory as the code itself there's all sorts of havoc one can cause beyond merely substituting your own os.py.
The part of the article about an issue with name mangling of private fields is somewhat misleading.
The feature is just some syntactic sugar.
Within a class, private fields such as:
class Foo:
    def __init__(self):
        self.__bar = 'hello'
are accessible from other methods of class Foo as `self.__bar`. But that's just syntactic sugar; the real name of `self.__bar` is `self._Foo__bar`.
So from the outside "world", including `hasattr()`, you can still access `self.__bar` as `Foo()._Foo__bar`.
>>> class Foo():
... def __init__(self):
... self.__bar = 'hello'
... def show(self):
... print(1, self.__bar)
... print(2, getattr(self, '__bar'))
...
>>> foo = Foo()
>>> foo._Foo__bar
'hello'
>>> foo.show()
1 hello
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 6, in show
AttributeError: 'Foo' object has no attribute '__bar'
>>> foo.__bar = 'world'
>>> foo.show()
1 hello
2 world
In the end, when `x.__private` is set up outside of the class definition, obviously, it's a new member, as its name differs from the internal name `__private` (which really is `_X__private`).
From within the code, `getattr(x, '__private')` will return the `__private` set up from outside the class, and `getattr(x, '_X__private')` the one defined from within the class.
The whole point of that feature is to ensure that members defined within a class that are not part of the public API are left untouched when that class get subclassed, to avoid unexpected behaviours.
Here's an example of why this has been designed:
>>> class A:
... def __init__(self):
... self.__internal = "this is a"
... def show(self):
... print(1, "A", self.__internal)
...
>>> class B(A):
... def __init__(self):
... super(B, self).__init__()
... self.__internal = "this is b"
... def show(self):
... super(B, self).show()
... print(2, "B", self._A__internal)
... print(3, "B", self.__internal)
...
>>> B().show()
1 A this is a
2 B this is a
3 B this is b
>>>
There's nothing that should be surprising or asymmetrical to anybody who's read the Python documentation and uses the feature appropriately. It's maybe a weird feature, but it's still coherent and homogeneous behaviour, and it actually adds more safety to code.
I'd say that private methods are double underscored, and protected methods are single underscored, since the goal of the __ is to prevent child classes from being able to use the parent implementation via self.__meth.
I would not call it "syntactic sugar", but rather a leak of implementation details. It could be deliberate, like Perl did for its OO (showing the entrails of all its objects), but it's not particularly sugary-sweet-yummy.
It's not an implementation detail, it is a deliberate, specified and documented feature of the language dedicated to namespace conflict resolutions in the context of inheritance.
It's so crude that, again, it looks like implementation details having been promoted to specs. Field visibility and name mangling done with very magic underscores just doesn't look right, to me.
Which is my point, it's still roll-your-own, with some magic when necessary. I don't want to make a fuss about it, I mean, it's great for tinkering or small projects, but I prefer when my compiler works harder for me and actually enforces what is intended (bar some runtime reflection for very special cases).
Or it's just a philosophy that you happen not to like, which doesn't make it objectively bad or wrong. There's room for people to have different tastes and preferences.
Yes, it is intended and fully part of the philosophy behind the language's design (as shown in PEP-20, aka the Zen of Python):
> Explicit is better than implicit
So there's no "leaking" of implementation details, because the implementation shall always be fully exposed.
As said in a sibling post, the private fields are just public fields, which are not documented as part of the public API and start with a `_`.
And as I said in the parent post, the reason to use the name mangling mechanism on top of that is to ensure that those variables won't be used by descendants in the class hierarchy, when a given class is intended to be subclassed by a peer.
The sugar /is/ actually sugary-sweet-yummy, as it prevents potential faults from people who blindly subclass stuff they haven't read the source code of.
I get full well what it does, but I find the syntax "magic" and not very attractive. Why no special keyword for visibility, proper namespaces etc.?
In c++ and likes, I know there's a vtable somewhere, and I get why there must be one for virtual funcs, but I don't want to deal with it directly, it's the compiler job. Same for Python, I know it must prevent fields from getting clobbered when inheriting, but I don't want to be exposed to its mangling or whatever other mechanism it uses.
> Same for Python, I know it must prevent fields from getting clobbered when inheriting, but I don't want to be exposed to its mangling or whatever other mechanism it uses.
Python does not prevent anything, it gives the developer a name mangling tool, it's up to the developer whether they want to use it or not. By default, identical names will conflict and you will clobber supertype fields or methods.
Again, that's very Perl-ish, to me. "Here are some tools and tricks, use them to do your own OO if you want--but you can easily bypass it all every time you wish".
The last one (script injection) isn't limited to Python but applies to any language that makes use of a template engine. Escaping variables should be the default behavior.
Now, I like Python; it has many useful libraries -- in fact it is one of the languages with the most libraries for any purpose. I wish, even as a dynamically typed language, it were stricter sometimes though.
Yes, in Python 2, input() is a shortcut for eval(raw_input(...)), and documented as such. Obviously that is not a safe way to parse user input, and therefore it has been changed in Python 3. So this has been fixed, but if you don't read the documentation you probably will keep introducing security issues with whatever programming language.
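To spell it out, an illustrative Python 2 session (the evaluated results shown are made up): whatever expression the user types gets evaluated, instead of being returned as a string.

>>> input("How many items? ")
How many items? 2 + 2
4
>>> input("How many items? ")
How many items? __import__('os').getcwd()
'/home/user'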
> Assert statement
If you want to effectively protect against a certain condition, raise an exception! Asserts, on the other hand, exist to help debugging (and documenting) conditions that should never occur by proper API usage. Stripping debugging code when optimizing is common practice, not only with Python.
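In other words, a minimal sketch of the recommended pattern (the function and names are made up):

# Anti-pattern: this check disappears under `python -O`.
def set_price_with_assert(item, price):
    assert price >= 0, "price must be non-negative"
    item.price = price

# Recommended: an explicit exception is always enforced.
def set_price(item, price):
    if price < 0:
        raise ValueError("price must be non-negative")
    item.price = price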
> Reusable integers
First of all, this behavior isn't part of the Python programming language but an implementation detail, and a feature, as it reduces memory footprint. But even if small integers weren't cached, you would still have the same situation when using the is operator on variables holding the same int object. On the other hand, caching all integers could easily cause a notable memory leak, in particular considering that ints in Python 3 (like longs in Python 2) can be as large as available memory. But either way, there is no good reason to check for identity if you want to compare values, anyway.
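The cache in question looks like this in a CPython interactive session (an implementation detail, not a language guarantee; results can differ in other implementations or contexts):

>>> a = 256
>>> b = 256
>>> a is b        # CPython caches small ints in [-5, 256]
True
>>> a = 257
>>> b = 257
>>> a is b        # distinct objects here; use == to compare values
False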
> Floats comparison
Floats in Python use essentially the native "double" type. Hence they have whatever precision your CPU has for double-precision floating point numbers; actually, it is specified in IEEE 754. That way floating point numbers are reasonably fast, while as precise as in most other programming languages. However, if that still isn't enough for your use case, Python also comes with the decimal module (for fixed-point decimal numbers) and the fractions module (for exact, arbitrary-precision fractions).
And as for infinity, while one would expect float('infinity') to be larger than any numerical value, the result of comparing a numerical value with a non-numerical type is undefined. However, Python 3 is more strict and raises a TypeError.
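If exact comparison is the concern, the usual options look like this (math.isclose requires Python 3.5+):

>>> 0.1 + 0.2 == 0.3
False
>>> import math
>>> math.isclose(0.1 + 0.2, 0.3)
True
>>> from fractions import Fraction
>>> Fraction('0.1') + Fraction('0.2') == Fraction('0.3')
True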
> Private attributes
Class-private attributes (those starting with __) exist to avoid conflicts with class-private attributes of other classes in the class hierarchy, or similar accidents. From my experience that is a feature that is rarely needed, even more rarely in combination with getattr()/setattr()/delattr(). But if you need to dynamically look up class-private attributes you can still do so, e.g. with hasattr(obj, '_classname__attrname'). After all, self.__attrname is just syntactical sugar for self._classname__attrname.
Also note that private attributes aren't meant as a security mechanism, but merely to avoid accidents. That's not specific to Python; in most object-oriented languages it is possible to access private attributes, one way or another. However, Python tries to be transparent about that fact, by keeping it simple.
> Module injection
Yes, Python looks in a few places for modules to be imported. That mechanism is quite useful for a couple of reasons, but most notably it's necessary in order to use modules without installing them system-wide. It can only become a security hole if a malicious user has write access to some location in sys.path, but not to the script importing the modules itself. I can hardly think of a scenario like that, and even then I'd rather blame the misconfiguration of the server.
> Code execution on import
Yes, just like in every other scripting language, Python modules can execute arbitrary code on import. That is quite expected, necessary, and not limited to Python. Even if module injection is an issue, it doesn't make anything worse, as you don't necessarily have to run malicious code on module import but could do it from whatever API is being called. But as outlined above, this is a rather theoretical scenario.
> Shell injection via subprocess
Yes, executing untrusted input is insecure. That is why the functions in Python's subprocess module, by default, expect a sequence of arguments rather than a string that is parsed by the system's shell. The documentation clearly explains the consequences of using shell=True. So introducing a shell injection vulnerability by accident seems less likely in Python than in most other programming languages.
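Roughly the difference in question (the filename stands in for attacker-controlled input):

import subprocess

filename = "report.txt; echo owned"   # untrusted input

# Safe default: the list is passed to the OS directly with no shell parsing,
# so the whole string is treated as one literal (probably non-existent) filename.
subprocess.call(["ls", "-l", filename])

# Dangerous: the string goes through /bin/sh, so ';' starts a second command.
subprocess.call("ls -l " + filename, shell=True)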
> Temporary files
If anything, Python is as insecure as the underlying system, and therefore as most other programming languages too. But CWE-377, the issue the author is talking about, isn't particularly easy to exploit in a meaningful way, plus it requires the attacker to already have access to the local temporary directory. Moreover, Python's tempfile module encourages the use of high-level APIs that aren't affected.
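For comparison, the high-level API the module steers you towards versus the racy pattern CWE-377 describes (sketch only; the prefix and path are made up):

import tempfile

# Encouraged: the file is created securely and opened in one step.
with tempfile.NamedTemporaryFile(prefix="myapp-") as tmp:
    tmp.write(b"scratch data")

# Discouraged (CWE-377 style): predictable name, create and open as separate steps.
# open("/tmp/myapp.tmp", "w")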
> Templating engines
The reason jinja2 doesn't escape HTML markup by default is that it is not an HTML template engine, but a general purpose template engine, which is meant to generate any text-based format. Of course, it is highly recommended to turn on autoescaping when generating HTML/XML output. But enforcing autoescaping would break other formats.
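For anyone rendering HTML with jinja2, turning escaping on is a one-liner (a sketch; select_autoescape is available in jinja2 2.9+):

from jinja2 import Environment, select_autoescape

env = Environment(autoescape=select_autoescape(["html", "xml"]))
template = env.from_string("Hello {{ name }}!")
print(template.render(name="<script>alert(1)</script>"))
# -> Hello &lt;script&gt;alert(1)&lt;/script&gt;!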
Having written code in Python for a few years, I've come across most of these (some of the ways to hack builtins/modify the code on a function reference were new to me).
However, it had also never occurred to me to write anything whose security I cared about in Python. Perhaps this article is aimed at people who are writing system utilities for Linux distributions and are considering Python? Presumably some such utilities are written that way already.
It comes down to doing a proper security analysis before you define the requirements of the software: Specifically what attack vectors you want to defend against. A valid conclusion for some types of software, given the list of "bugs" in the post, would be don't write it in Python. (Indeed, I have done exactly this before writing 200 lines of C instead of 20 lines of Python.)
> A valid conclusion for some types of software, given the list of "bugs" in the post, would be don't write it in Python.
Do you have some specific types in mind? I know some types of protection are not reachable from python directly and require native modules, but I'm not sure what would cause you to drop Python altogether. I'd be interested to hear some examples.
Well, for example, you might want to write a package manager for a Linux distribution[1]. An attacker could change the behaviour of someone's package manager by some non-privileged means (messing with the user's python path and placing an evil self-hiding module that changes the behaviour of the engine e.g. by always listing a malicious package as a dependency of whatever you're currently installing).
The problematic code here is Python's `import` mechanism and mutable global references to standard library functions. You can cut out the "buggy" code by writing in another language.
That's possible, but I don't think it's relevant. If you have access to the user's profile, it doesn't really matter what language the package manager uses. You still have rights to create aliases, create local wrapper scripts, use LD_PRELOAD, set LD_LIBRARY_PATH, and many other ways to execute your own code. You can stop the user accessing the original package manager in the first place.
So many people chiming in with their dismissive comments and superior Python knowledge. The article is excellent and should be required reading for Python devs. Having it all in one place is a valuable resource.
It's historical - Python was originally conceived as a toy language for teaching, in which context being able to do x = input(), type 2, and get an integer is a desirable property.
Then we got stuck with it because backwards compatibility.
"Reusable integers" is a real fail - it violates the principle of least surprise and introduces a nasty inconsistency - all integers should logically be (refer to) the same integer object, not just the first 100.
Assert is a statement, not an expression, so do not use it as an expression.
One should never compare floats for exact equality. This is taught in any freshman CS course. The limitation is due to the standard encoding of floats - IEEE 754 - not Python's fault.
Everything else is a feature of a truly dynamic language, designed for really quick prototyping. Python 3.x got rid of many inconsistencies and caveats of 2.x.
Yes, but it either should be that way for all of them or for none of them, not for some of them.)
Logically, they should refer to the same entity. It is "natural" - when people are trying to communicate a concept to one another they assume they are referring to the same concept. Not to an instance of it.)
> "Reusable integers" is a real fail - it violates the principle of least surprise and introduces a nasty inconsistency - all integers should logically be (refer to) the same integer object, not just the first 100.
I find the concept of special-casing ints to behave that way to be surprising and inconsistent. If ints act that way, shouldn't strings? And if they (very much unexpectedly) did, why not every other type?
"is" is very useful on its own. "variable is None" is a common and powerful idiom entirely distinct from "variable == None". There are many cases when you want to compare object identity. None of those use cases apply to ints where "==" is always the correct way to compare them, so the fact that "a == b" and "a is b" might occasionally be the same or different doesn't affect anything at all in practice.
`str` can be interned in some situations, though the rules vary across implementations and versions. Most of these things just boil down to unintuitive caching optimizations. Like you mention, it's pretty rare to check the object identity of integers or strings, but if you are doing so, you probably want the real answer.
Aside from Python's small integers, True, False, and None, Java has these rules for boxing in the specification [1]:
> If the value p being boxed is an integer literal of type int between -128 and 127 inclusive (§3.10.1), or the boolean literal true or false (§3.10.3), or a character literal between '\u0000' and '\u007f' inclusive (§3.10.4), then let a and b be the results of any two boxing conversions of p. It is always the case that a == b.
> Ideally, boxing a primitive value would always yield an identical reference. In practice, this may not be feasible using existing implementation techniques. The rule above is a pragmatic compromise, requiring that certain common values always be boxed into indistinguishable objects. The implementation may cache these, lazily or eagerly. For other values, the rule disallows any assumptions about the identity of the boxed values on the programmer's part
"Contract conditions should never be violated during execution of a bug-free program. Contracts are therefore typically only checked in debug mode during software development. Later at release, the contract checks are disabled to maximize performance." - https://en.wikipedia.org/wiki/Design_by_contract