str¶

The str object in Python 3 is quite similar but not identical to the Python 2 unicode object.

The major difference is the stricter type-checking of Py3’s str that enforces a distinction between unicode strings and byte-strings, such as when comparing, concatenating, joining, or replacing parts of strings.

There are also other differences, such as the repr of unicode strings in Py2 having a u'...' prefix, versus simply '...', and the removal of the str.decode() method in Py3.

future contains a newstr type that is a backport of the str object from Python 3. This inherits from the Python 2 unicode class but has customizations to improve compatibility with Python 3’s str object. You can use it as follows:

>>> from __future__ import unicode_literals
>>> from builtins import str

On Py2, this gives us:

>>> str
future.types.newstr.newstr

(On Py3, it is simply the usual builtin str object.)

Then, for example, the following code has the same effect on Py2 as on Py3:

>>> s = str(u'ABCD')
>>> assert s != b'ABCD'
>>> assert isinstance(s.encode('utf-8'), bytes)
>>> assert isinstance(b.decode('utf-8'), str)

These raise TypeErrors:

>>> bytes(b'B') in s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'in <string>' requires string as left operand, not <type 'str'>

>>> s.find(bytes(b'A'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument can't be <type 'str'>

Various other operations that mix strings and bytes or other types are permitted on Py2 with the newstr class even though they are illegal with Python 3. For example:

>>> s2 = b'/' + str('ABCD')
>>> s2
'/ABCD'
>>> type(s2)
future.types.newstr.newstr

This is allowed for compatibility with parts of the Python 2 standard library and various third-party libraries that mix byte-strings and unicode strings loosely. One example is os.path.join on Python 2, which attempts to add the byte-string b'/' to its arguments, whether or not they are unicode. (See posixpath.py.) Another example is the escape() function in Django 1.4’s django.utils.html.

In most other ways, these builtins.str objects on Py2 have the same behaviours as Python 3’s str:

>>> s = str('ABCD')
>>> assert repr(s) == 'ABCD'      # consistent repr with Py3 (no u prefix)
>>> assert list(s) == ['A', 'B', 'C', 'D']
>>> assert s.split('B') == ['A', 'CD']

The str type from builtins also provides support for the surrogateescape error handler on Python 2.x. Here is an example that works identically on Python 2.x and 3.x:

>>> from builtins import str
>>> s = str(u'\udcff')
>>> s.encode('utf-8', 'surrogateescape')
b'\xff'

This feature is in alpha. Please leave feedback here about whether this works for you.