A Python internals adventure (2014)

_ooqq · on Dec 29, 2017

The whole Unicode check is done in Python 3 to guarantee Unicode support. I disliked the old half-string, half-binary approach, and almost from the start I enjoyed the clear distinction between str and bytes in Python 3.

That being said, the strings/bytes cleanup was also one of the few things that really broke backward compatibility with 2.x.
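For anyone who hasn't hit it yet, here is a minimal Python 3 sketch of that str/bytes split. The variable names are just for illustration. Text and raw bytes are different types, converting between them needs an explicit encoding, and mixing them raises instead of silently guessing:

```python
text = "héllo"                 # str: a sequence of Unicode code points
data = text.encode("utf-8")    # bytes: raw octets, produced by an explicit encode step

print(len(text), len(data))    # 5 6  ("é" takes two bytes in UTF-8)

# Python 3 refuses to combine str and bytes implicitly.
mixing_error = None
try:
    text + data
except TypeError as exc:
    mixing_error = exc
print(type(mixing_error).__name__)  # TypeError

# Going back requires an explicit decode with the right codec.
print(data.decode("utf-8") == text)  # True
```

In Python 2 the equivalent `"..." + u"..."` would have silently coerced using ASCII, which is exactly the half-string, half-binary behaviour that the cleanup removed.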

tbodt · on Dec 29, 2017

The convention with the Python C API is to return a non-NULL pointer to a Python object on success, and to return NULL and set the exception "global variable" (strictly, the per-thread error indicator) on error. Yes, global variables are also alive and well.
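You can watch this convention from Python itself. This is a CPython-only sketch that assumes `ctypes.pythonapi` is available (it is not on PyPy). Functions loaded from it keep the GIL during the call and check the error indicator afterwards, so a NULL return with an exception set comes back as a raised Python exception. `PyBytes_AsString` is used here only because it returns a plain `char *` (no reference counting to manage) and follows the NULL-on-error rule:

```python
import ctypes

# ctypes.pythonapi is a PyDLL over the running interpreter's own C API.
# After each call it checks the error indicator and raises if it is set.
bytes_as_string = ctypes.pythonapi.PyBytes_AsString
bytes_as_string.argtypes = [ctypes.py_object]  # PyObject *op
bytes_as_string.restype = ctypes.c_char_p      # char * (NULL on error)

payload = b"hello"
ok_result = bytes_as_string(payload)  # success: non-NULL pointer, copied into a bytes object
print(ok_result)                      # b'hello'

error = None
try:
    bytes_as_string("hello")          # wrong type: C side returns NULL and sets a TypeError
except TypeError as exc:
    error = exc
print(type(error).__name__)           # TypeError
```

With a plain `ctypes.CDLL` the error indicator would not be checked (and the GIL would be released, which is unsafe for C API calls), so `PyDLL`/`pythonapi` is the right loader for this kind of experiment.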

    PyObject *fout = _PySys_GetObjectId(&PyId_stdout);
    stdout_encoding = _PyObject_GetAttrId(fout, &PyId_encoding);

The Python equivalent of this is `sys.stdout.encoding`. Here stdout is an io.StringIO object, and StringIO has no underlying encoding (its `encoding` attribute is always None), so this is None.
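A quick way to see that None from Python is to swap `sys.stdout` for a `StringIO` and read the attribute. This sketch uses `contextlib.redirect_stdout` as a stand-in for however stdout was replaced in the original scenario, and contrasts it with a text stream that really does sit on top of bytes:

```python
import contextlib
import io
import sys

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    # Inside this block sys.stdout *is* the StringIO object.
    redirected_encoding = sys.stdout.encoding

# A TextIOWrapper over a bytes buffer does carry an encoding.
wrapper_encoding = io.TextIOWrapper(io.BytesIO(), encoding="utf-8").encoding

print(redirected_encoding)  # None
print(wrapper_encoding)     # utf-8
```

StringIO stores text directly and never encodes it, so it has nothing meaningful to report; C code that assumes `encoding` is always a str will trip over it.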

    stdout_encoding_str = PyUnicode_AsUTF8(stdout_encoding);

This tries to convert None to a C string, which fails: `PyUnicode_AsUTF8` only accepts str objects, so it returns NULL and sets a TypeError.
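The failing call can be reproduced directly. This is a CPython-only sketch that assumes `ctypes.pythonapi` is available; it calls the real `PyUnicode_AsUTF8` once with a str and once with None, the value `sys.stdout.encoding` has when stdout is a StringIO:

```python
import ctypes

# Call the interpreter's own PyUnicode_AsUTF8 through ctypes.pythonapi.
# pythonapi checks the error indicator after the call, so a NULL return
# with an exception set shows up here as that Python exception.
as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8
as_utf8.argtypes = [ctypes.py_object]  # PyObject *unicode
as_utf8.restype = ctypes.c_char_p      # const char * (NULL on error)

converted = as_utf8("utf-8")           # a real encoding name converts fine
print(converted)                       # b'utf-8'

failure = None
try:
    as_utf8(None)                      # the None that sys.stdout.encoding produced
except TypeError as exc:
    failure = exc
print(type(failure).__name__)          # TypeError
```

In the C code nothing checked that NULL before using the result, which is the kind of missing check the linked issue is about.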

jwilk · on Dec 29, 2017

The bug has been fixed since then:

https://bugs.python.org/issue8256