https://stackoverflow.com/questions/26079392/how-is-unicode-represented-internally-in-python
https://www.python.org/dev/peps/pep-0393/
Python 3.3 switched to a new internal string representation (PEP 393) that uses the most compact fixed-width form able to hold every character in a string.
Each character takes 1, 2 or 4 bytes, and the width is chosen per string from its highest code point. ASCII and Latin-1 text (up to U+00FF) uses just 1 byte per character,
the remaining BMP characters (up to U+FFFF) require 2 bytes, and anything beyond the BMP requires 4 bytes.
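
The widths described above can be observed from Python itself. The following is a minimal sketch, assuming CPython 3.3 or later (other implementations may store strings differently); `bytes_per_char` is a hypothetical helper name, not part of any library. It compares the sizes of two strings of different lengths so that the fixed object header cancels out.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate how many bytes CPython stores per character of `ch`.

    Assumes CPython 3.3+ (PEP 393). Subtracting the sizes of two strings
    of different lengths cancels the fixed per-object overhead.
    """
    longer = sys.getsizeof(ch * (2 * n))
    shorter = sys.getsizeof(ch * n)
    return (longer - shorter) // n


print(bytes_per_char("a"))           # ASCII
print(bytes_per_char("\u00e9"))      # Latin-1 (U+00E9)
print(bytes_per_char("\u20ac"))      # BMP, outside Latin-1 (U+20AC)
print(bytes_per_char("\U0001F600"))  # outside the BMP (U+1F600)
```

On a CPython build that follows PEP 393 this should print 1, 1, 2 and 4. Because the width is chosen per string, a single character outside the BMP widens every character in that string to 4 bytes.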