Which turned into a big internet fight, and he changed his mind.
It's good to see people willing to change the way they do things, especially when that means publicly contradicting their former selves and with such an intense personality. It's oddly inspiring.
Also, when he wrote it, it was kind of true. A lot of things hadn't been ported to Python 3 yet, and using it was asking for pain. Now we're finally in the opposite situation, where many things are ending support for Python 2 and the easier route is Python 3.
He updated it a few months ago, but he's had versions of this for quite some time. Yes, maintaining it in late 2016 is somewhat stubborn, but it made a lot of sense in the early years of Python 3.
I have a difficult time believing that we're talking about the same rant, because the one I'm talking about would not have made any sense even in the early years of Python 3.
Oh, I see it's been updated with a sober disclaimer now, where "In the previous version I trolled people" and "I even had a note after the gag saying it was a gag, but everyone is too stupid to read that note even when they do elaborate responses to my writing."
I read his writing very carefully. This note didn't exist.
For those of you now breaking out the Internet Archive to prove the note existed:
> "Yes, that is kind of funny way of saying that there's no reason why Python 2 and Python 3 can't coexist other than the Python project's incompetence and arrogance. Obviously it's theoretically possible to run Python 2 in Python 3, but until they do it then they have decided to say that Python 3 cannot run one other Turing complete language so logically Python 3 is not Turing complete."
Even though he says it's a "funny" way to say something, his explanation of said "joke" is not a correct description of what Turing completeness is. There is no "gag" here; Zed didn't understand it, and when people on the internet corrected him, he decided to play it off as a gag.
Edit: Just to make it explicit for those of you reading who don't get it, "Turing completeness" doesn't mean "I can run a program written in X in Y's runtime." It means that a language is capable of expressing all operations a Turing machine can do, which means that you can re-write a program in X to a program in Y if both X and Y are Turing complete. You can obviously re-write a Python 2 program to be a Python 3 program, so both of those languages are Turing complete.
The property he describes later, of the JVM and the CLR supporting multiple languages, has absolutely nothing to do with Turing completeness. Lisp and Javascript are both Turing complete languages, but the fact that you can't run Lisp on Node.js doesn't mean you can't re-write a Lisp program in Javascript. The fact that he's equating being able to run many languages on a single runtime with Turing completeness means he doesn't understand what he's talking about.
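To make the rewrite point above concrete, here's a trivial, made-up Python 2 snippet and an equivalent Python 3 program; the computation is identical, only the surface syntax changes:

    # Python 2 original:
    #   print "hello, %s" % name
    #   print d.has_key("greeting")

    # The same program rewritten in Python 3:
    name = "world"
    print("hello, %s" % name)     # print is a function in Python 3
    d = {"greeting": "hi"}
    print("greeting" in d)        # dict.has_key() is gone; use the `in` operator

Nothing about the rewrite requires the Python 3 runtime to execute the Python 2 text, which is the whole point.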
>> Obviously it's theoretically possible to run Python 2 in Python 3, but until they do it then they have decided to say that Python 3 cannot run one other Turing complete language so logically Python 3 is not Turing complete."
> "Turing completeness" doesn't mean "I can run a program written in X in Y's runtime."
That's not quite what he meant. He meant that Turing completeness of both languages implies that an interpreter for Python 2 can be written in Python 3, and that if the Python maintainers say it isn't possible, etc. etc.
Which is technically true, except that it is completely obvious that the performance of Python 2 under that arrangement would so totally suck that no sane developer would ever take that approach (and I can believe that the Python maintainers just said "that won't work" or some such), and that is what he apparently didn't understand (or is now claiming he pretended not to understand for comedic effect).
> "Turing completeness" doesn't mean "I can run a program written in X in Y's runtime."
It's not the definition of Turing Complete, but it is a provable property of Turing-Completeness. If Y is Turing-Complete, then you can use Y to write an interpreter for X. Then the issue is reduced to arguing about what the word "in" means in the phrase "run Python 2 in Python 3."
Actually it's reduced to the words "in Python 3" because the conflation between the Python 3.x language and the CPython 3.x runtime is something that is prevalent throughout the document.
Just because the maintainers of CPython 3 haven't included a Python 2 interpreter doesn't mean that Python 3 is not Turing complete. Their choice not to do that has nothing to do with the Turing completeness of anything.
> Just because the maintainers of CPython 3 haven't included a Python 2 interpreter doesn't mean that Python 3 is not Turing complete. Their choice not to do that has nothing to do with the Turing completeness of anything.
Which is precisely the point that the original version of the essay was making. The joke was on those insisting that it couldn't possibly work, and the punchline was telling them "oh, so what you're saying is that your language is not Turing complete?"
Ok, maybe that doesn't follow the format for a classic joke, but it's certainly a humorous and sarcastic remark.
Except that the original statement isn't really that you can't write a Python 2 interpreter in Python 3 (in fact, I'm quite sure those exist somewhere on GitHub), but that there wasn't some kind of transparent interop between py2 and py3.
That is, why can't the CPython 3 implementation of the Python 3 language also execute Python 2 code? To which I think the simplest response is: why can't the gcc implementation of the C++ language spec link arm-compiled C++ against x86-compiled C++?
But you can write a Lisp interpreter in Node.js, because they are both Turing complete. In fact, I believe I've seen tutorials written up on how to write a Lisp in Javascript.
The fact that Node.js, the software package that some people use to execute Javascript, does not include in its default distribution a way to execute Lisp programs says absolutely nothing about the Turing completeness of the Javascript programming language.
What those of you repeating this interpreter thing are missing is that there's a four-term fallacy being employed here. Zed is trying to make an argument of the form:
And who are these "some people"? I don't see citations here identifying anybody.
Pro-Tip: They're straw men. Even if you find examples of people asserting that after the fact, the unidentified parties asserting "Python 3 cannot run Python 2" in the way that Zed claims he's responding to are only mentioned in this rant to make Zed look smarter than he is.
Agreed, it's not a problem of theoretical possibility. It's just that the maintainers' time can be better spent improving Python 3 than maintaining a Python 2 compatible runtime.
> You can obviously re-write a Python 2 program to be a Python 3 program, so both of those languages are Turing complete.
This is true, but the converse isn't: e.g. Python and Brainfuck are both Turing complete, but Python has interfaces around a hell of a lot more syscalls than Brainfuck, which only has operations for reading and writing stdio. You can't write a Python implementation in Brainfuck; it is impossible.
No, that's just more work, because you have to write the required syscall interfaces for Brainfuck. There's nothing about Brainfuck that makes that impossible.
There's a difference between "impossible" and "really difficult giant waste of time" here when it comes to discussing Turing completeness.
Actually, how would you do a syscall in Brainfuck? Its only way to interact with the outside world is through "input byte" and "output byte", and that does not give you the ability to do arbitrary syscalls. As far as I can tell, arbitrary syscalls are impossible to do in Brainfuck (which is why that one person made a fork of Brainfuck with added syscall support).
Of course, syscalls are not related to Turing-completeness in any way. Turing-completeness means that the language can express any computation which is possible to do on a Turing machine. Turing machine computations can not have any side effects, and syscalls are a kind of side effect. Therefore, having syscalls is not necessary for a language to be Turing-complete.
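To make the interpreter point concrete, here's a minimal sketch of a Brainfuck interpreter written in Python 3 (assuming standard Brainfuck semantics: a 30,000-cell tape of bytes). Note that "read a byte" and "write a byte" cover Brainfuck's entire interaction with the outside world; syscalls never enter the picture:

    import sys

    def run_brainfuck(code, stdin=sys.stdin.buffer, stdout=sys.stdout.buffer):
        """Minimal Brainfuck interpreter: 30k byte cells, one data pointer."""
        tape = bytearray(30000)
        ptr = 0
        # Precompute matching bracket positions for the loop instructions.
        jumps, stack = {}, []
        for i, ch in enumerate(code):
            if ch == '[':
                stack.append(i)
            elif ch == ']':
                j = stack.pop()
                jumps[i], jumps[j] = j, i
        pc = 0
        while pc < len(code):
            ch = code[pc]
            if ch == '>':
                ptr += 1
            elif ch == '<':
                ptr -= 1
            elif ch == '+':
                tape[ptr] = (tape[ptr] + 1) % 256
            elif ch == '-':
                tape[ptr] = (tape[ptr] - 1) % 256
            elif ch == '.':
                stdout.write(bytes([tape[ptr]]))   # output one byte
            elif ch == ',':
                byte = stdin.read(1)               # input one byte (0 on EOF)
                tape[ptr] = byte[0] if byte else 0
            elif ch == '[' and tape[ptr] == 0:
                pc = jumps[pc]                     # skip the loop body
            elif ch == ']' and tape[ptr] != 0:
                pc = jumps[pc]                     # jump back to the loop start
            pc += 1

    # Prints "hi": set a cell to 104 ('h'), output, increment to 105 ('i'), output.
    run_brainfuck('+' * 104 + '.' + '+.')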
Yes, that is kind of funny way of saying that there's no reason why Python 2 and Python 3 can't coexist other than the Python project's incompetence and arrogance. Obviously it's theoretically possible to run Python 2 in Python 3, but until they do it then they have decided to say that Python 3 cannot run one other Turing complete language so logically Python 3 is not Turing complete. I should also mention that as stupid as that sounds, actual Python project developers have told me this, so it's their position that their own language is not Turing complete.
If you check hacker news and reddit threads from the original posting, that note wasn't on the article when it was originally posted. It was only added later, but apparently still before anyone archived it.
Edit: I'm mistaken; however, I'll say again what I said then:
The note does nothing to show that he actually understands the statements he's making. The note gets the definition of Turing completeness wrong too.
If you have a link proving that, I would love to see it. I completely believe you that it was added later, as it's my own recollection that there was no note.
But regardless, it doesn't change the fact that Zed wrote as if he didn't know what he was talking about.
You didn't read it carefully enough. It did exist.
It read:
"Yes, that is kind of funny way of saying that there's no reason why Python 2 and Python 3 can't coexist other than the Python project's incompetence and arrogance. Obviously it's theoretically possible to run Python 2 in Python 3, but until they do it then they have decided to say that Python 3 cannot run one other Turing complete language so logically Python 3 is not Turing complete. I should also mention that as stupid as that sounds, actual Python project developers have told me this, so it's their position that their own language is not Turing complete."
I think Zed was with the majority when it came to holding off on Py3. He kind of comes off as a jerk due to his personality, but I really like his approach and he always makes me think. I might not agree with him, but he certainly is not boring. Look at Pandas, which is an R-like library: Wes wrote his book in Py2 and did not have any desire for Py3.
I moved from Python mostly to using R, Racket and Haxe for my other side projects. This infighting just left a bad taste in my mouth, and I started seeing greener pastures, which for once in my life were in fact greener. I still love Python but I don't use it much due to community drama.
> Wes wrote his book in Py2 and did not have any desire for Py3
I find it hard to believe you're citing a book that was published in 2012[1], when it made sense to still target Python 2, as relevant to today's argument. The new version is updated for Python 3.5[2].
I am totally confused. Zed Shaw wrote in 2010 how he didn't like Python 3 and his book wasn't going to show Python 3. Then he edited the note in November 2016 restating his position again.
Sometime between November 2016 and today he decided to release a Python 3 version.
My point is that in 2010 the idea that Python 3 was not yet good practice was the majority view, and again in 2012 a major library was released with Python 2 as its main target and the majority of the community was fine with that Python 2 focus.
How is my timeline making no sense?
I like Zed Shaw as a member of the Python community and I was defending his stance on sticking with Python 2 (I have personally supported Python 3 since its announcement).
So what is it that you agree and disagree with totally?
Don't blame the community for that. Blame Guido and the core development team. The community at large wasn't rallying for breaking changes to the language. GvR wanted that; it's "his" language after all. Of course, some of the community are more conservative and were willing to fall in line and goosestep with the core team on Python3 more quickly. So the tension stems from the Python3 folks having extreme disdain for all the Python2 users and codebases.
I agree with you, though; it's what got me looking towards other tech like Node, Go, Elixir. I can look past it all except for the fact that they got 'unicode by default' wrong. They should simply do what Go does: everything is a bytestring with an assumed encoding of UTF8. What they have today is ridiculous and needs to change before I could embrace Python3+.
I can't understand what's going on with the Python community for folks to even make statements like yours. It has to be an influx of new Python developers who started on 3.
Python2 has unicode support. Python2 already had (better) async IO with Gevent (Tornado is also there). There's just nothing there with Python3 except what I can only describe as propaganda from the Python Software Foundation, which has led to outright ignorance among many users.
Some people wanted "unicode strings by default", which Python3 does have, but they even got that wrong. The general consensus on how to handle this correctly is to make everything a bytestring and assume the encoding is UTF8 by default. Then, like Python2, it just works.
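For readers following along, here is roughly what Python 3's current model looks like; the "bytestring assumed UTF8" model described above would make the explicit encode/decode steps below unnecessary:

    s = "héllo"                  # Python 3 str: a sequence of Unicode code points
    b = s.encode("utf-8")        # crossing over to bytes is an explicit step
    print(len(s), len(b))        # 5 code points vs. 6 bytes (é is 2 bytes in UTF-8)
    print(b.decode("utf-8"))     # and coming back is explicit too

    # Mixing the two types is a TypeError rather than an implicit coercion:
    try:
        s + b
    except TypeError as err:
        print(err)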
You mean most of the mainstream, general-purpose languages in use today, which were initially designed 20-50 years ago? And the ones which don't actively look for opportunities to shoot themselves in the face with changes that break the language? Then no, not that consensus.
But if you were to find any sensible designer of a new language with skyrocketing, runaway popularity, such as Rob Pike, they'll tell you differently. Or even Guido van Rossum, if he were to be honest with you. While Pike's colleagues at Bell Labs like Dennis Ritchie may not have designed C this way, for obvious reasons, Pike and company did design Go that way.
So now it's the consensus of "sensible designers of new languages". Where "sensible" is very subjective, and I have a feeling that your definition of it would basically presuppose agreeing with your conceptual view of strings, begging the question.
Aside from Go, can you give any other examples? Swift is obviously in the "new language" category (more so than Go, anyway), and yet it didn't go down that path.
Well, do your research and come to your own conclusions. Most people are going to agree that UTF8 is the way to go. You can advocate something else, since you seem to take offense at my opposition to Python3's Microsoft-oriented implementation.
If you know anything about Swift, it was designed with a primary goal of being a smooth transition from, and interop with, ObjC, so like other legacy implementations (such as CPython3) it made sacrifices that limited how forward-looking it could be.
I'm not at all opposed to UTF-8 as internal encoding for strings. But that's completely different and orthogonal to what you're talking about, which is whether strings and byte arrays should be one and the same thing semantically, and represented by the same type, or by compatible types that are used interchangeably.
I want my strings to be strings - that is, an object for which it is guaranteed that enumerating codepoints always succeeds (and, consequently, any operation that requires valid codepoints, like e.g. case-insensitive comparison, also succeeds). This is not the case for "just assume a bytearray is UTF-8 if used in string context", which is why I consider this model broken at the abstraction level. It's kinda similar to languages that don't distinguish between strings and numbers, and interpret any numeric-looking string as a number in the appropriate context.
FWIW, the string implementation in Python 3 supports UTF-8 representation. And it could probably be changed to use that as the primary canonical representation, only generating other ones if and when they're requested.
A default UTF8 string type has to be allowed to be used interchangeably with bytestrings, since ASCII is a valid subset. Your string type shouldn't be spellchecking or checking for complete sentences either. What comes in, comes in. Validate it elsewhere.
Thus Go's strings don't fail the guarantee you want any more than anything else assuming UTF8 would. They're unicode-by-default, which was the whole point of Python3, but Go has it too, in a more elegant way. That's the beauty of UTF8 by default: you can pass it valid UTF8, or ASCII since it's a subset, and the context in which it's received is up to you. If you're expecting bytes, it works; if you're expecting unicode codepoints, that works too. There's no reason to get your hands dirty with encodings unless you need to decode UTF16 etc. first. If there is still a concern about data validation, it's up to you, not your string type, to throw an exception.
> A default UTF8 string-type has to be allowed to be used interchangeably with bytestrings since ASCII is a valid subset.
This only works in one direction. Sure, any valid UTF-8 is a bytestring. But not every bytestring is valid UTF-8. "Use interchangeably" implies the ability to substitute in both directions, which violates LSP.
> What comes in, comes in. Validate it elsewhere.
I have a problem with that. It's explicitly against the fail-fast design philosophy, whereby invalid input should be detected and acted upon as early as possible. First, because failing early helps identify the true origin of the error. And second because there's a category of subtle bugs where invalid input can be combined or processed in ways that make it valid-but-nonsensical, and as a result there are no reported errors at all, just quiet incorrect behavior.
Any language that has Unicode strings can handle ASCII just fine, since ASCII is a strict subset of UTF-8 - that doesn't require the encoding of the strings to be UTF-8. For languages that use a different encoding, it would mean that ASCII gets re-encoded into whatever the language uses, but this is largely an implementation detail.
Of course, if you're reading a source that is not necessarily in any Unicode encoding (UTF-8 or otherwise), and that may be non-string data, and you just need to pass the data through - well then, that's exactly what bytestrings are there for. The fact that you cannot easily mix them with strings (again, even if they're UTF-8-encoded) is a benefit in this case, because such mixing only makes sense if the bytestring is itself properly encoded. If it's not, you just silently get garbage. Using two different types at the input/output boundary makes it clear what assumptions can be made about every particular bit of input.
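A small illustration of the "only works in one direction" and fail-fast points in Python 3 terms (the specific bytes below are just an arbitrary example of non-UTF-8 data):

    ascii_bytes = b"plain ASCII"
    print(ascii_bytes.decode("utf-8"))   # every ASCII bytestring is also valid UTF-8

    not_utf8 = b"\xff\xfe\x00A"          # e.g. UTF-16 with a BOM, or just binary junk
    try:
        not_utf8.decode("utf-8")         # fail-fast: bad input is rejected at the boundary
    except UnicodeDecodeError as err:
        print("rejected:", err)

    # "Assume it's UTF-8 and carry on" defers the problem instead:
    print(not_utf8.decode("utf-8", errors="replace"))   # quietly yields replacement-char garbage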
I understand your position. This is a longstanding debate within the PL community, as you know. I've considered that stance before, but I have to say thanks for stating it so well, because it's given me pause. I can't say I agree, but you're not "wrong". I agree with the fail-fast concept but disagree that this is the place to do it. A strict fail-fast diehard probably shouldn't even be using Python, IMO. I don't have anything else useful to add, because this is simply an engineering design philosophy difference. We both think our own conclusions are better, of course, but I appreciate you detailing yours. Good chat, upvoted.
I agree that this is a difference caused by different basic premises, and irreconcilable without surrendering one of those premises.
And yes, you're right that Python is probably not a good example of ideologically pure fail-fast in general, simply because of its dynamic typing, so that part of the argument is not really strong in that particular context.
(Side note: don't you find it amusing that between the two languages that we're discussing here, the one that is statically typed - Go - chose to be more relaxed about its strings in the type system, while the one that is dynamically typed - Python - chose to be more strict?)
The key takeaway, I think, is that there is no general consensus on this. As you say, "this is a longstanding debate within the PL community" (and not just the topics we touched upon, but the more broad design of strings - there's also the Ruby take on it, for example, where string is bytes + encoding).
My broader take, encoding aside, is that strings are actually not sufficiently high-level in most modern languages (including Python). I really want to get away from the notion of string as a container of anything, even Unicode code points, and instead treat it as an opaque representation of text that has various views exposing things like code points, glyphs, bytes in various encodings etc. But none of those things should be promoted as the primary view of what a string is - and, consequently, you shouldn't be able to write things like `s[i]` or `for ch in s`.
The reason is that I find that the ability to index either bytes or codepoints, while useful, has this unfortunate trait that it often gives right results on limited inputs (e.g. ASCII only, or UCS2 only, or no combining characters) while being wrong in general. When it's accessible via shorthand "default" syntax that doesn't make it explicit, people use it because that's what they notice first, and without thinking about what their mode of access implies. Then they throw inputs that are common for them (e.g. ASCII for Americans, Latin-1 for Western Europeans), observe that it works correctly, and conclude that it's correct (even though it's not).
If they are, instead, forced to explicitly spell out access mode - like `s.get_code_point(i)` and `for ch in s.code_points()` - they have to at least stop and think what a "code point" even is, how it's different from various other explicit options that they have there (like `s.get_glyph(i)`), and which one is more suitable to the task at hand.
And if we assume that all strings are sequences of bytes in UTF-8, the same point would also apply to those bytes - i.e. I'd still expect `s[i]` to not work, and having to write something like `s.get_byte(i)` or `for b in s.bytes()` - for all the same reasons.
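As a rough sketch of that idea in Python (every name here is hypothetical; nothing like this type ships in the standard library, and a real version would also need a grapheme/glyph view, which requires a segmentation library and is omitted), with the combining-character trap from above in the usage example:

    import unicodedata

    class Text:
        """A hypothetical opaque text type: no default indexing or iteration,
        only explicit views onto code points, bytes, etc."""

        def __init__(self, raw: str):
            self._raw = raw

        # Explicit views -- the caller has to pick one and think about what it means.
        def code_points(self):
            return list(self._raw)

        def get_code_point(self, i: int) -> str:
            return self._raw[i]

        def utf8_bytes(self) -> bytes:
            return self._raw.encode("utf-8")

        def casefold_equal(self, other: "Text") -> bool:
            # Normalize before comparing, since "é" can be one code point or two.
            nfc = lambda t: unicodedata.normalize("NFC", t._raw)
            return nfc(self).casefold() == nfc(other).casefold()

        # Deliberately no __getitem__ / __iter__: `t[i]` and `for ch in t` won't work.

    t = Text("cafe\u0301")                      # "café" with a combining acute accent
    print(len(t.code_points()))                 # 5 code points -- not 4 "characters"
    print(len(t.utf8_bytes()))                  # 6 bytes
    print(t.casefold_equal(Text("caf\u00e9")))  # True: same text, different code points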
It's worth noting that I think Python's success and simplicity came from its ease of use and flexibility. It should have Go's strings; you have a point there. The type annotations are really jumping the shark too; it's just no longer the language that once made so much sense.
On the consensus about bytestrings assumed to be UTF8: there's only one new language without legacy baggage that has skyrocketing popularity, and it has bytestrings assumed to be UTF8. Everyone I've surveyed says that's no coincidence, including members of the Python core dev team. So that's where I'm seeing consensus on the string issue. Meanwhile Python3 has struggled and folks are rightly irritated. Because some were irresponsible, everyone has to pay; that's not really how I viewed Python's strengths prior to 3.x. Zed said it best: it really was one of the best examples of how dynamic typing can be fun.
I normally wouldn't respond to a comment that is merely a link to someone else's thoughts without anything original of your own, because it usually means you don't know what you're talking about and are merely attempting to speak through someone else you think you agree with. So I will respond, not for your benefit but for anyone else who is new to Python and may come across this.
I've read that before and the author is ignorant. He's parroting GvR and the CPython core development team's line that unicode strings are codepoints. Sure, but he's arguing with himself; note that the argument is framed as Python2 vs 3. That narrow focus is what results in his tunnel vision. Within the argument as he frames it, Python3 is not better than Python2 at string handling, it's merely different. One favors the POSIX (Linux) way of doing things: Python2. The other favors the Windows way: Python3.
There is an outright better way to handle strings. It's what Google did with Go. How do we know it's better? Because it makes more sense on technical merits, and members of the CPython core dev team have admitted that if Python3 were designed today they would go down this path. During the initial Python3000 talks this option was not as obvious. Bad timing or poor implementation choices; take your pick. Given the runaway feature soup that Python3 has become, I'd assume both.
So, like all tech, let Python3 live or die on its technical merits. That's exactly what the PSF has been afraid of, so we have the 2020 date, which is nothing more than a political stunt among others. Python3 is merely different; it favors one use case over another, but it did not outright make Python better. To break a language for technical churn is and was a terrible idea.
You're right, I do lean towards agreeing with the author of the blog post. However, I wasn't (and am still not) in any way certain, and didn't want to be one of those asses you see on the internet who turn everything into a religious war. So I just put the information out there because I (in my ignorance) thought it was useful information from which intelligent people could draw their own conclusions.
Honestly, I don't care a great deal about string handling in Python and just wanted to inject (what I thought was) more information into the discussion. I'm kinda regretting that now. Lesson learned: steer clear and leave it to the experts.
Well, I'm not an expert, but in the interest of full disclosure, Guido and the CPython core dev team aren't either. They hold a myriad of excuses for their decisions, and they're all highly suspect to even a casual observer who doesn't just drink the koolaid. In the end, they'll just tell you they maintain CPython so only their opinion matters. Fair, but they're still wrong. Python3 is controversial for good reasons.[0] It's not lazy folks or whatever ad hominem is out there today. I couldn't tell your intentions given the lack of information included with your post.
Go handles strings by having one type that covers both Python3's byte-strings and unicode-strings. This lets you write code that rarely forces you to think about encodings, which you shouldn't have to, since UTF8 is the one true encoding.[1] Nor do you have to litter your code with encode/decode, or get exceptions from strings where there weren't any previously (see Zed's post on some of that). Python3 solved the mojibake problems that some developers created for themselves in Python2 by mixing unicode and bytes, but it did so by forcing the burden onto every single Python3 developer, and by breaking everyone's code while simultaneously refusing to engineer an interpreter that could run both CPython2 and CPython3 bytecode. Which is possible; the HotSpot JVM and the .NET CLR prove it. Shifting additional burdens onto the developer makes sense in situations where it's necessary. It wasn't necessary here, both because of Python's general abstraction level and because Go showed it can be solved elegantly: strings are just bytes, and they're assumed to be UTF8-encoded. Everyone wins. Only Windows-specific implementations like the original .NET CLR shouldn't use UTF-8 by default for both internal and external representation. Only a diehard Windows-centric person would disagree, or someone with a legacy implementation (Java, C#, Python3, etc.). The CPython3 maintainers fully admit they're leaning towards the Windows way of handling strings.
As you know, handling text/bytes is fairly critical and fundamental. For Python3 to get this wrong, with such a late-stage breaking change and no effort to make up for it with a unified bytecode VM, is unfortunate. Add in the feature soup and the whole thing is a mess.