str
The str object in Python 3 is quite similar but not identical to the Python 2 unicode object.
The major difference is the stricter type-checking of Py3’s str, which enforces a distinction between unicode strings and byte-strings, such as when comparing, concatenating, joining, or replacing parts of strings.
There are also other differences, such as the repr of unicode strings in Py2 having a u'...' prefix, versus simply '...', and the removal of the str.decode() method in Py3.
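The differences listed above can be checked directly on Python 3. The following is a minimal sketch that assumes Python 3 and uses only built-in types; the helper name mixing_errors and the variable names are hypothetical and exist only for illustration.

```python
# Assumes Python 3: illustrates the stricter separation of str and bytes.
text = 'ABCD'
data = b'ABCD'

# A str never compares equal to bytes; there is no implicit decoding.
never_equal = (text != data)


def mixing_errors():
    """Return the names of the operations that raise TypeError when mixing types."""
    operations = {
        'concatenate': lambda: text + data,
        'join': lambda: ''.join([text, data]),
        'replace': lambda: text.replace(data, 'x'),
        'contains': lambda: data in text,
    }
    failures = []
    for name, operation in operations.items():
        try:
            operation()
        except TypeError:
            failures.append(name)
    return sorted(failures)


# The repr has no u prefix, and str has no decode() method.
plain_repr = repr(text)
has_decode = hasattr(text, 'decode')
```

On Python 3, every operation in the dictionary is expected to fail with a TypeError, which is the behaviour the backported type aims to reproduce for the cases shown later in this section.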
future contains a newstr type that is a backport of the str object from Python 3. This inherits from the Python 2 unicode class but has customizations to improve compatibility with Python 3’s str object. You can use it as follows:
>>> from __future__ import unicode_literals
>>> from builtins import str
On Py2, this gives us:
>>> str
future.types.newstr.newstr
(On Py3, it is simply the usual builtin str object.)
Then, for example, the following code has the same effect on Py2 as on Py3:
>>> s = str(u'ABCD')
>>> assert s != b'ABCD'
>>> assert isinstance(s.encode('utf-8'), bytes)
>>> assert isinstance(b'ABCD'.decode('utf-8'), str)
These raise TypeErrors:
>>> bytes(b'B') in s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'in <string>' requires string as left operand, not <type 'str'>
>>> s.find(bytes(b'A'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument can't be <type 'str'>
Various other operations that mix strings and bytes or other types are permitted on Py2 with the newstr class even though they are illegal with Python 3. For example:
>>> s2 = b'/' + str('ABCD')
>>> s2
'/ABCD'
>>> type(s2)
future.types.newstr.newstr
This is allowed for compatibility with parts of the Python 2 standard library and various third-party libraries that mix byte-strings and unicode strings loosely. One example is os.path.join on Python 2, which attempts to add the byte-string b'/' to its arguments, whether or not they are unicode. (See posixpath.py.) Another example is the escape() function in Django 1.4’s django.utils.html.
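For contrast, Python 3 itself rejects this kind of loose mixing in path handling. The sketch below assumes Python 3; the helper to_text is hypothetical (it is not part of future or the standard library) and shows one way code could normalise its inputs to text before joining, rather than relying on the leniency of newstr.

```python
import posixpath


def to_text(value, encoding='utf-8'):
    """Hypothetical helper: decode bytes to text and leave text untouched."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# On Python 3, mixing text and bytes path components raises TypeError.
try:
    posixpath.join('dir', b'file.txt')
    mixed_join_allowed = True
except TypeError:
    mixed_join_allowed = False

# Normalising every component to text first avoids the problem.
parts = ['dir', b'file.txt']
joined = posixpath.join(*(to_text(part) for part in parts))
```

Decoding at the boundary like this keeps the rest of the code working with text only, which is the model that Python 3's str enforces.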
In most other ways, these builtins.str objects on Py2 have the same behaviours as Python 3’s str:
>>> s = str('ABCD')
>>> assert repr(s) == "'ABCD'"  # consistent repr with Py3 (no u prefix)
>>> assert list(s) == ['A', 'B', 'C', 'D']
>>> assert s.split('B') == ['A', 'CD']
The str type from builtins also provides support for the surrogateescape error handler on Python 2.x. Here is an example that works identically on Python 2.x and 3.x:
>>> from builtins import str
>>> s = str(u'\udcff')
>>> s.encode('utf-8', 'surrogateescape')
b'\xff'
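The surrogateescape handler is mainly useful for round-tripping bytes that are not valid in the chosen encoding. The following is a brief sketch, assuming Python 3, where builtins.str is simply the native str type; the variable names are illustrative only.

```python
from builtins import str  # on Python 3 this is the built-in str

raw = b'caf\xff'  # the byte \xff is not valid UTF-8

# Undecodable bytes are mapped to lone surrogates in the range U+DC80..U+DCFF.
text = raw.decode('utf-8', 'surrogateescape')

# Encoding with the same handler restores the original bytes exactly.
round_tripped = str(text).encode('utf-8', 'surrogateescape')
```

Here the invalid byte becomes the code point U+DCFF after decoding and is turned back into the same byte on encoding, so no data is lost in the round trip.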
This feature is in alpha. Please leave feedback about whether this works for you.