What else you need to know¶
Keep the following points in mind when writing Python 2/3 compatible code.
bytes¶
Handling bytes consistently and correctly has traditionally been one of the most difficult tasks in writing a Py2/3 compatible codebase. This is because the Python 2 bytes object is simply an alias for Python 2’s str, rather than a true implementation of the Python 3 bytes object, which is substantially different.
future contains a backport of the bytes object from Python 3 which passes most of the Python 3 tests for bytes. (See tests/test_future/test_bytes.py in the source tree.) You can use it as follows:
>>> from builtins import bytes
>>> b = bytes(b'ABCD')
On Py3, this is simply the builtin bytes object. On Py2, this object is a subclass of Python 2’s str that enforces the same strict separation of unicode strings and byte strings as Python 3’s bytes object:
>>> b + u'EFGH'      # TypeError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument can't be unicode string
>>> bytes(b',').join([u'Fred', u'Bill'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected bytes, found unicode string
>>> b == u'ABCD'
False
>>> b < u'abc'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: bytes() and <type 'unicode'>
In most other ways, these bytes objects have identical behaviours to Python 3’s bytes:
b = bytes(b'ABCD')
assert list(b) == [65, 66, 67, 68]
assert repr(b) == "b'ABCD'"
assert b.split(b'B') == [b'A', b'CD']
Currently the easiest way to ensure identical behaviour of byte-strings in a Py2/3 codebase is to wrap all byte-string literals b'...' in a bytes() call as follows:
from builtins import bytes
# ...
b = bytes(b'This is my bytestring')
# ...
This is not perfect, but it is superior to manually debugging and fixing code incompatibilities caused by the many differences between Py3 bytes and Py2 strings.
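As a small illustration of what the wrapping buys you, the sketch below checks two Py3-style behaviours on wrapped byte-strings: integer indexing and the refusal to mix with unicode strings. The helper name byte_values is hypothetical. On Py3 the import resolves to the standard library builtins module; on Py2 it is assumed that future is installed.

```python
from builtins import bytes  # stdlib on Py3; assumed to come from `future` on Py2


def byte_values(raw, count=4):
    """Hypothetical helper: return the first `count` bytes as integers."""
    data = bytes(raw)          # wrap so that iteration yields integers on Py2 too
    return list(data[:count])


header = byte_values(b'\x89PNG\r\n\x1a\n')

# Mixing a wrapped byte-string with a unicode string should raise TypeError.
try:
    bytes(b'AB') + u'CD'
except TypeError:
    mixing_rejected = True
else:
    mixing_rejected = False
```

Wrapping at the point where a literal is created keeps the strict behaviour close to the data, so a mixed-type bug surfaces where it is introduced.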
The bytes type from builtins also provides support for the surrogateescape error handler on Python 2.x. Here is an example that works identically on Python 2.x and 3.x:
>>> from builtins import bytes
>>> b = bytes(b'\xff')
>>> b.decode('utf-8', 'surrogateescape')
'\udcff'
This feature is in alpha. Feedback about whether it works for you is welcome.
str¶
The str object in Python 3 is quite similar but not identical to the Python 2 unicode object.
The major difference is the stricter type-checking of Py3’s str that enforces a distinction between unicode strings and byte-strings, such as when comparing, concatenating, joining, or replacing parts of strings.
There are also other differences, such as the repr of unicode strings in Py2 having a u'...' prefix, versus simply '...', and the removal of the str.decode() method in Py3.
future contains a newstr type that is a backport of the str object from Python 3. This inherits from the Python 2 unicode class but has customizations to improve compatibility with Python 3’s str object. You can use it as follows:
>>> from __future__ import unicode_literals
>>> from builtins import str
On Py2, this gives us:
>>> str
future.types.newstr.newstr
(On Py3, it is simply the usual builtin str object.)
Then, for example, the following code has the same effect on Py2 as on Py3:
>>> s = str(u'ABCD')
>>> assert s != b'ABCD'
>>> assert isinstance(s.encode('utf-8'), bytes)
>>> assert isinstance(b.decode('utf-8'), str)
These raise TypeErrors:
>>> bytes(b'B') in s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'in <string>' requires string as left operand, not <type 'str'>
>>> s.find(bytes(b'A'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument can't be <type 'str'>
Various other operations that mix strings and bytes or other types are permitted on Py2 with the newstr class even though they are illegal with Python 3. For example:
>>> s2 = b'/' + str('ABCD')
>>> s2
'/ABCD'
>>> type(s2)
future.types.newstr.newstr
This is allowed for compatibility with parts of the Python 2 standard library and various third-party libraries that mix byte-strings and unicode strings loosely. One example is os.path.join on Python 2, which attempts to add the byte-string b'/' to its arguments, whether or not they are unicode. (See posixpath.py.) Another example is the escape() function in Django 1.4’s django.utils.html.
In most other ways, these builtins.str objects on Py2 have the same behaviours as Python 3’s str:
>>> s = str('ABCD')
>>> assert repr(s) == "'ABCD'"      # consistent repr with Py3 (no u prefix)
>>> assert list(s) == ['A', 'B', 'C', 'D']
>>> assert s.split('B') == ['A', 'CD']
The str type from builtins also provides support for the surrogateescape error handler on Python 2.x. Here is an example that works identically on Python 2.x and 3.x:
>>> from builtins import str
>>> s = str(u'\udcff')
>>> s.encode('utf-8', 'surrogateescape')
b'\xff'
This feature is in alpha. Feedback about whether it works for you is welcome.
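The two surrogateescape examples (decoding bytes and encoding str) are two halves of a round trip. The following sketch, which assumes only the builtins names already used in this section and a made-up file name, shows that an undecodable byte survives decoding and re-encoding unchanged:

```python
from builtins import bytes, str  # stdlib on Py3; assumed to come from `future` on Py2

raw = bytes(b'name-\xff.txt')    # \xff is not valid UTF-8

# The undecodable byte becomes a lone surrogate code point ...
text = raw.decode('utf-8', 'surrogateescape')

# ... and the same error handler turns it back into the original byte.
restored = text.encode('utf-8', 'surrogateescape')
```

This is the pattern Python 3 itself uses for file names and environment variables that are not valid in the expected encoding.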
dict¶
Python 3 dictionaries have .keys(), .values(), and .items() methods which return memory-efficient view objects, not lists; the key and item views are also set-like. (See PEP 3106.)
If your dictionaries are small, performance is not critical, and you don’t need the set-like behaviour of Python 3’s view objects, you can of course stick with standard Python 3 code in your Py2/3 compatible codebase:
# Assuming d is a native dict ...
for key in d:
    # code here
for item in d.items():
    # code here
for value in d.values():
    # code here
In this case there will be memory overhead of list creation on Py2 for each call to items, values or keys.
For improved efficiency, future.builtins (aliased to builtins) provides a Python 2 dict subclass whose keys(), values(), and items() methods return iterators on all versions of Python >= 2.7. On Python 2.7, these iterators also have the same set-like view behaviour as dictionaries in Python 3. This can streamline code that iterates over large dictionaries. For example:
from __future__ import print_function
from builtins import dict, range
# Memory-efficient construction:
d = dict((i, i**2) for i in range(10**7))
assert not isinstance(d.items(), list)
# Because items() is memory-efficient, so is this:
d2 = dict((v, k) for (k, v) in d.items())
As usual, on Python 3 dict imported from either builtins or future.builtins is just the built-in dict class.
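To make the set-like behaviour concrete, here is a minimal sketch that uses the same builtins import as above. The variable names are illustrative only, and on Py2 the import is assumed to resolve to the dict subclass provided by future:

```python
from builtins import dict, range  # stdlib on Py3; assumed to come from `future` on Py2

squares = dict((i, i ** 2) for i in range(6))

# keys() supports set operations directly, without building a list first.
small_even_keys = squares.keys() & set([0, 2, 4, 10])

# items() is set-like as well, provided that the values are hashable.
shared_items = squares.items() & set([(2, 4), (3, 10)])

# None of the three methods returns a list.
returns_list = isinstance(squares.values(), list)
```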
Memory-efficiency and alternatives¶
If you already have large native dictionaries, the downside to wrapping them in a dict call is that memory is copied (on both Py3 and on Py2). For example:
# This allocates and then frees a large amount of temporary memory:
d = dict({i: i**2 for i in range(10**7)})
If dictionary methods like values and items are called only once, this copy obviously negates the memory benefit that the overridden methods offer by not creating temporary lists.
The memory-efficient (and CPU-efficient) alternatives are:
- to construct a dictionary from an iterator. The above line could use a generator like this:
    d = dict((i, i**2) for i in range(10**7))
- to construct an empty dictionary with a dict() call using builtins.dict (rather than {}) and then update it;
- to use the viewitems etc. functions from future.utils, passing in regular dictionaries:
    from future.utils import viewkeys, viewvalues, viewitems
    for (key, value) in viewitems(hugedictionary):
        # some code here
    # Set intersection:
    d = {i**2: i for i in range(1000)}
    both = viewkeys(d) & set(range(0, 1000, 7))
    # Set union (key and item views are set-like; value views are not):
    both = viewkeys(d1) | viewkeys(d2)
For compatibility, the functions iteritems etc. are also available in future.utils. These are equivalent to the functions of the same names in six: each is equivalent to calling the iteritems etc. methods on Python 2, or to calling items etc. on Python 3.
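The equivalence described above can be sketched in a few lines. The function name iteritems_sketch is hypothetical and this is not necessarily how future or six implement the helper; it only mirrors the behaviour stated in the text:

```python
import sys

PY2 = sys.version_info[0] == 2


def iteritems_sketch(d):
    """Hypothetical stand-in that mirrors the described behaviour of iteritems."""
    if PY2:
        return d.iteritems()      # lazy iterator on Py2
    return iter(d.items())        # items() is already a lazy view on Py3


totals = {'a': 1, 'b': 2}
pairs = sorted(iteritems_sketch(totals))
```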
int¶
Python 3’s int type is very similar to Python 2’s long, except for the representation (which omits the L suffix that Python 2 appends). Python 2’s usual (short) integers have been removed from Python 3, as has the long builtin name.
Python 3:
>>> 2**64
18446744073709551616
Python 2:
>>> 2**64
18446744073709551616L
future includes a backport of Python 3’s int that is a subclass of Python 2’s long with the same representation behaviour as Python 3’s int. To ensure that an integer is a long on Py2 while staying compatible with Py3, cast it like this:
>>> from builtins import int
>>> must_be_a_long_integer = int(1234)
The backported int object helps with writing doctests and simplifies code that deals with long and int as special cases on Py2. An example is the following code from xlwt-future (called by the xlwt.antlr.BitSet class) for writing out Excel .xls spreadsheets. With future, the code is:
from builtins import int
def longify(data):
    """
    Turns data (an int or long, or a list of ints or longs) into a
    list of longs.
    """
    if not data:
        return [int(0)]
    if not isinstance(data, list):
        return [int(data)]
    return list(map(int, data))
Without future (or with future < 0.7), this might be:
import sys
PY3 = sys.version_info[0] >= 3
def longify(data):
    """
    Turns data (an int or long, or a list of ints or longs) into a
    list of longs.
    """
    if not data:
        if PY3:
            return [0]
        else:
            return [long(0)]
    if not isinstance(data, list):
        if PY3:
            return [int(data)]
        else:
            return [long(data)]
    if PY3:
        return list(map(int, data))   # same as returning data, but with up-front typechecking
    else:
        return list(map(long, data))
isinstance¶
The following tests all pass on Python 3:
>>> assert isinstance(2**62, int)
>>> assert isinstance(2**63, int)
>>> assert isinstance(b'my byte-string', bytes)
>>> assert isinstance(u'unicode string 1', str)
>>> assert isinstance('unicode string 2', str)
However, two of these normally fail on Python 2:
>>> assert isinstance(2**63, int)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
>>> assert isinstance(u'my unicode string', str)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
And if this import is in effect on Python 2:
>>> from __future__ import unicode_literals
then the fifth test fails too:
>>> assert isinstance('unicode string 2', str)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
After importing the builtins from future, all these tests pass on Python 2 as on Python 3:
>>> from builtins import bytes, int, str
>>> assert isinstance(10, int)
>>> assert isinstance(10**100, int)
>>> assert isinstance(b'my byte-string', bytes)
>>> assert isinstance(u'unicode string 1', str)
However, note that the last test requires that unicode_literals be imported to succeed:
>>> from __future__ import unicode_literals
>>> assert isinstance('unicode string 2', str)
This works because the backported types int, bytes and str (and others) have metaclasses that override __instancecheck__. See PEP 3119 for details.
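The mechanism can be illustrated with a small, self-contained sketch. The names IntLikeMeta and IntLike are hypothetical and this is not the metaclass that future actually uses; it only shows how overriding __instancecheck__ lets isinstance() accept native objects that do not inherit from the class being tested:

```python
try:
    NATIVE_INTS = (int, long)   # Py2: both native integer types
except NameError:
    NATIVE_INTS = (int,)        # Py3: `long` no longer exists


class IntLikeMeta(type):
    """Hypothetical metaclass, not the one used by future."""

    def __instancecheck__(cls, instance):
        # Called by isinstance(instance, IntLike).
        return isinstance(instance, NATIVE_INTS)


# Calling the metaclass directly avoids the incompatible Py2/Py3 class syntax.
IntLike = IntLikeMeta('IntLike', (object,), {})

small_ok = isinstance(10, IntLike)
large_ok = isinstance(10 ** 100, IntLike)
text_ok = isinstance('10', IntLike)
```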
Passing data to/from Python 2 libraries¶
If you are passing any of the backported types (bytes, int, dict, str) into brittle library code that performs type-checks using type(), rather than isinstance(), or requires that you pass Python 2’s native types (rather than subclasses) for some other reason, it may be necessary to upcast the types from future to their native superclasses on Py2.
The native function in future.utils is provided for this. Here is how to use it. (The output shown is from Py2):
>>> from builtins import int, bytes, str
>>> from future.utils import native
>>> a = int(10**20) # Py3-like long int
>>> a
100000000000000000000
>>> type(a)
future.types.newint.newint
>>> native(a)
100000000000000000000L
>>> type(native(a))
long
>>> b = bytes(b'ABC')
>>> type(b)
future.types.newbytes.newbytes
>>> native(b)
'ABC'
>>> type(native(b))
str
>>> s = str(u'ABC')
>>> type(s)
future.types.newstr.newstr
>>> native(s)
u'ABC'
>>> type(native(s))
unicode
On Py3, the native() function is a no-op.
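To see why exact-type checks are a problem for subclasses, consider the following sketch. TaggedInt, brittle_check and tolerant_check are hypothetical names standing in for a backported type and for library code; the sketch does not use future itself and runs on either Python version:

```python
class TaggedInt(int):
    """Hypothetical subclass standing in for a backported type."""


def brittle_check(value):
    # Hypothetical library code that compares exact types.
    return type(value) is int


def tolerant_check(value):
    # The isinstance() form accepts subclasses as well.
    return isinstance(value, int)


wrapped = TaggedInt(42)
plain = int(wrapped)   # convert back to the exact native type
```

Here brittle_check rejects the subclass instance but accepts the converted value, which is the situation in which upcasting with native() is useful on Py2.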
Native string type¶
Some library code, including standard library code like the array.array() constructor, requires native strings on Python 2 and Python 3. This means that there is no simple way to pass the appropriate string type when the unicode_literals import from __future__ is in effect.
The objects native_str and native_bytes are available in future.utils for this case. These are equivalent to the str and bytes objects in __builtin__ on Python 2 or in builtins on Python 3.
The functions native_str_to_bytes and bytes_to_native_str are also available for more explicit conversions.
open()¶
The Python 3 builtin open() function for opening files returns file contents as (unicode) strings unless the binary (b) flag is passed, as in:
open(filename, 'rb')
in which case its methods like read() return Py3 bytes objects.
On Py2 with future installed, the builtins module provides an open function that is mostly compatible with that on Python 3 (e.g. it offers keyword arguments like encoding). This maps to the open backport available in the standard library io module on Py2.7.
One difference to be aware of between the Python 3 open and future.builtins.open on Python 2 is that the return types of methods such as read() from the file object that open returns are not automatically cast from native bytes or unicode strings on Python 2 to the corresponding future.builtins.bytes or future.builtins.str types. If you need the returned data to behave exactly the same way on Py2 as on Py3, you can cast it explicitly as follows:
from __future__ import unicode_literals
from builtins import open, bytes
data = open('image.png', 'rb').read()
# On Py2, data is a standard 8-bit str with loose Unicode coercion.
# data + u'' would likely raise a UnicodeDecodeError
data = bytes(data)
# Now it behaves like a Py3 bytes object...
assert data[:4] == b'\x89PNG'
assert data[4] == 13 # integer
# Raises TypeError:
# data + u''
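The text-mode side can be sketched in the same way. The example below writes a small file with an explicit encoding and reads it back in both binary and text mode. The file name demo.txt and the temporary directory are illustrative assumptions; on Py2 the imports are assumed to come from future:

```python
import os
import shutil
import tempfile
from builtins import bytes, open  # stdlib on Py3; assumed to come from `future` on Py2

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'demo.txt')   # hypothetical file name
try:
    # newline='\n' keeps the written bytes identical on every platform.
    with open(path, 'w', encoding='utf-8', newline='\n') as f:
        f.write(u'caf\xe9\n')

    with open(path, 'rb') as f:
        raw = bytes(f.read())      # explicit cast, as described above

    with open(path, 'r', encoding='utf-8') as f:
        text = f.read()
finally:
    shutil.rmtree(workdir)
```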
Custom __str__ methods¶
If you define a custom __str__ method for any of your classes, functions like print() expect __str__ on Py2 to return a byte string, whereas on Py3 they expect a (unicode) string.
Use the following decorator to map __str__ to __unicode__ on Py2 and define __str__ to encode it as utf-8:
from future.utils import python_2_unicode_compatible
@python_2_unicode_compatible
class MyClass(object):
    def __str__(self):
        return u'Unicode string: \u5b54\u5b50'
a = MyClass()
# This then prints the name of a Chinese philosopher:
print(a)
This decorator is identical to the decorator of the same name in django.utils.encoding.
This decorator is a no-op on Python 3.
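For readers curious about what such a decorator does, here is a rough sketch of the idea. The name unicode_compatible_sketch is hypothetical, and the body is a simplified illustration rather than the code shipped in future or Django:

```python
import sys


def unicode_compatible_sketch(cls):
    """Hypothetical re-implementation, for illustration only."""
    if sys.version_info[0] == 2:
        # Keep the text-returning method under the name Py2 expects ...
        cls.__unicode__ = cls.__str__
        # ... and make __str__ return UTF-8 encoded bytes.
        cls.__str__ = lambda self: self.__unicode__().encode('utf-8')
    return cls   # the class is returned unchanged on Py3


@unicode_compatible_sketch
class Philosopher(object):
    def __str__(self):
        return u'\u5b54\u5b50'


rendered = str(Philosopher())
```

On Py3 rendered is the unicode text itself; on Py2 it would be the UTF-8 encoded byte string.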
Custom iterators¶
If you define your own iterators, there is an incompatibility in the method name to retrieve the next item across Py3 and Py2. On Python 3 it is __next__, whereas on Python 2 it is next.
The most elegant solution to this is to derive your custom iterator class from builtins.object and define a __next__ method as you normally would on Python 3. On Python 2, object then refers to the future.types.newobject base class, which provides a fallback next method that calls your __next__. Use it as follows:
from builtins import object
class Upper(object):
    def __init__(self, iterable):
        self._iter = iter(iterable)
    def __next__(self):                 # Py3-style iterator interface
        return next(self._iter).upper()
    def __iter__(self):
        return self
itr = Upper('hello')
assert next(itr) == 'H'
assert next(itr) == 'E'
assert list(itr) == list('LLO')
You can use this approach unless you are defining a custom iterator as a subclass of a base class defined elsewhere that does not derive from newobject. In that case, you can provide compatibility across Python 2 and Python 3 using the next function from future.builtins:
from builtins import next
from some_module import some_base_class
class Upper2(some_base_class):
    def __init__(self, iterable):
        self._iter = iter(iterable)
    def __next__(self):                 # Py3-style iterator interface
        return next(self._iter).upper()
    def __iter__(self):
        return self
itr2 = Upper2('hello')
assert next(itr2) == 'H'
assert next(itr2) == 'E'
next() also works with regular Python 2 iterators with a .next method:
itr3 = iter(['one', 'three', 'five'])
assert 'next' in dir(itr3)
assert next(itr3) == 'one'
This approach is feasible whenever your code calls the next() function explicitly. If you consume the iterator implicitly in a for loop or list() call or by some other means, the future.builtins.next function will not help; the third assertion below would fail on Python 2:
itr2 = Upper2('hello')
assert next(itr2) == 'H'
assert next(itr2) == 'E'
assert list(itr2) == list('LLO') # fails because Py2 implicitly looks
# for a ``next`` method.
Instead, you can use a decorator called implements_iterator from future.utils to allow Py3-style iterators to work identically on Py2, even if they don’t inherit from future.builtins.object. Use it as follows:
from future.utils import implements_iterator
Upper2 = implements_iterator(Upper2)
print(list(Upper2('hello')))
# prints ['H', 'E', 'L', 'L', 'O']
This can of course also be used with the @ decorator syntax when defining the iterator as follows:
@implements_iterator
class Upper2(some_base_class):
    def __init__(self, iterable):
        self._iter = iter(iterable)
    def __next__(self):                 # note the Py3 interface
        return next(self._iter).upper()
    def __iter__(self):
        return self
On Python 3, as usual, this decorator does nothing.
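The idea behind such a decorator can be sketched briefly. The names py3_iterator_sketch and Countdown are hypothetical, and the real implements_iterator in future may differ in detail; the sketch only shows how a Py3-style __next__ can be exposed under the name that Py2 looks for:

```python
import sys


def py3_iterator_sketch(cls):
    """Hypothetical decorator: expose __next__ under the Py2 name as well."""
    if sys.version_info[0] == 2:
        cls.next = cls.__next__
    return cls   # nothing to do on Py3


@py3_iterator_sketch
class Countdown(object):
    def __init__(self, start):
        self._current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self._current <= 0:
            raise StopIteration
        value = self._current
        self._current -= 1
        return value


# list() consumes the iterator implicitly, which is the case next() cannot cover.
remaining = list(Countdown(3))
```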
Binding a method to a class¶
Python 2 draws a distinction between bound and unbound methods, whereas in Python 3 this distinction is gone: unbound methods have been removed from the language. To bind a method to a class compatibly across Python 3 and Python 2, you can use the bind_method() helper function:
from future.utils import bind_method
class Greeter(object):
    pass
def greet(self, message):
    print(message)
bind_method(Greeter, 'greet', greet)
g = Greeter()
g.greet('Hi!')
On Python 3, calling bind_method(cls, name, func) is equivalent to calling setattr(cls, name, func). On Python 2 it is equivalent to:
import types
setattr(cls, name, types.MethodType(func, None, cls))
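Putting the two equivalences together gives the following sketch. The function name bind_method_sketch is hypothetical; it simply follows the description above and is not claimed to be the code in future.utils:

```python
import sys
import types


def bind_method_sketch(cls, name, func):
    """Hypothetical stand-in that follows the equivalence described above."""
    if sys.version_info[0] == 2:
        # Py2: wrap the function as an unbound method of cls.
        setattr(cls, name, types.MethodType(func, None, cls))
    else:
        # Py3: plain functions become methods when looked up on an instance.
        setattr(cls, name, func)


class Greeter(object):
    pass


def greet(self, message):
    return 'Greeter says: ' + message


bind_method_sketch(Greeter, 'greet', greet)
reply = Greeter().greet('Hi!')
```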
Metaclasses¶
The Python 3 and Python 2 syntaxes for metaclasses are incompatible. future provides a function (from jinja2/_compat.py) called with_metaclass() that can assist with specifying metaclasses portably across Py3 and Py2. Use it like this:
from future.utils import with_metaclass
class BaseForm(object):
    pass
class FormType(type):
    pass
class Form(with_metaclass(FormType, BaseForm)):
    pass
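One simple way such a helper can work is to create a temporary base class by calling the metaclass directly, so that any class deriving from it inherits the metaclass. The sketch below assumes this simplified approach; with_metaclass_sketch and the created_by attribute are hypothetical, and the helper shipped with future may be more elaborate:

```python
def with_metaclass_sketch(meta, *bases):
    """Simplified, hypothetical variant: build a temporary base class with `meta`."""
    return meta('TemporaryBase', bases, {})


class BaseForm(object):
    pass


class FormType(type):
    def __new__(mcls, name, bases, namespace):
        # Record which metaclass built the class, as a visible side effect.
        namespace.setdefault('created_by', mcls.__name__)
        return super(FormType, mcls).__new__(mcls, name, bases, namespace)


class Form(with_metaclass_sketch(FormType, BaseForm)):
    pass
```

A known cost of this simplified variant is that the temporary base class stays in the method resolution order of Form.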