Handling bytes consistently and correctly has traditionally been one
of the most difficult tasks in writing a Py2/3 compatible codebase. This
is because the Python 2 bytes object is simply an alias for Python 2's
str, rather than a true implementation of the Python 3 bytes object,
which is substantially different.
future contains a backport of the bytes object from Python 3, which
passes most of the Python 3 tests for bytes. (See
tests/test_future/test_bytes.py in the source tree.) You can use it as
follows:
    >>> from builtins import bytes
    >>> b = bytes(b'ABCD')
On Py3, this is simply the builtin bytes object. On Py2, this object is
a subclass of Python 2's str that enforces the same strict separation of
unicode strings and byte strings as Python 3's bytes object:
    >>> b + u'EFGH'      # TypeError
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: argument can't be unicode string

    >>> bytes(b',').join([u'Fred', u'Bill'])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: sequence item 0: expected bytes, found unicode string

    >>> b == u'ABCD'
    False

    >>> b < u'abc'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unorderable types: bytes() and <type 'unicode'>
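Because implicit mixing is rejected, text has to be encoded explicitly before it is combined with a byte string. The following is a minimal sketch of that pattern. The helper name append_text is hypothetical and not part of future. On Py3 the import resolves to the standard-library builtins module; on Py2 it assumes the future package is installed.

```python
from builtins import bytes  # stdlib on Py3; supplied by future on Py2


def append_text(data, text, encoding='utf-8'):
    """Hypothetical helper: append unicode text to a byte string."""
    # Encode explicitly so that both operands are byte strings.
    return data + bytes(text.encode(encoding))


b = bytes(b'ABCD')

try:
    b + u'EFGH'           # implicit mixing raises TypeError
except TypeError:
    mixed_ok = False
else:
    mixed_ok = True

combined = append_text(b, u'EFGH')
```

Making the encoding an explicit argument keeps the choice visible at the call site rather than relying on Python 2's implicit ASCII coercion.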
In most other ways, these bytes objects behave identically to Python 3's
bytes:

    b = bytes(b'ABCD')
    assert list(b) == [65, 66, 67, 68]
    assert repr(b) == "b'ABCD'"
    assert b.split(b'B') == [b'A', b'CD']
Currently the easiest way to ensure identical behaviour of byte-strings
in a Py2/3 codebase is to wrap all byte-string literals b'...' in a
bytes() call as follows:
    from builtins import bytes

    # ...

    b = bytes(b'This is my bytestring')

    # ...
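To illustrate why the wrapping matters, here is a small sketch of code that depends on Python 3 semantics, where indexing and iterating over a byte string yield integers. The function xor_checksum is a hypothetical example, not part of future. On Py2 an unwrapped b'...' literal would yield one-character strings instead, so the function wraps its argument defensively (this assumes the future package is installed on Py2).

```python
from builtins import bytes  # stdlib on Py3; supplied by future on Py2


def xor_checksum(data):
    """Hypothetical helper: XOR together all byte values in data."""
    result = 0
    # Wrapping in bytes() means iteration yields ints on both Py2 and Py3.
    for value in bytes(data):
        result ^= value
    return result


payload = bytes(b'This is my bytestring')
first = payload[0]            # an int, as on Python 3
checksum = xor_checksum(payload)
```

Wrapping at the function boundary as well as at the literal is a defensive choice: callers that pass a native Py2 str still get integer iteration.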
This is not perfect, but it is superior to manually debugging and fixing code incompatibilities caused by the many differences between Py3 bytes and Py2 strings.
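The bytes type from builtins also supports the surrogateescape error
handler on Python 2. Here is an example that works the same way on
Python 2 and 3:

    >>> from builtins import bytes
    >>> b = bytes(b'\xff')
    >>> b.decode('utf-8', 'surrogateescape')
    '\udcff'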
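The usual reason to decode with surrogateescape is that undecodable bytes survive a round trip through text. The sketch below shows this with a sample byte string chosen for illustration. The behaviour shown is that of Python 3; on Python 2 it assumes that future's backport handles both decode and encode with this error handler.

```python
from builtins import bytes  # stdlib on Py3; supplied by future on Py2

# Latin-1 encoded data, so the \xe9 byte is not valid UTF-8.
raw = bytes(b'caf\xe9.txt')

# The invalid byte is mapped to a lone surrogate instead of raising.
text = raw.decode('utf-8', 'surrogateescape')

# Encoding with the same handler restores the original byte.
restored = text.encode('utf-8', 'surrogateescape')
```

Each undecodable byte 0xNN is represented as the code point U+DC00 + 0xNN, which is what allows the encoder to map it back without loss.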
This feature is in alpha. Please leave feedback here about whether this works for you.