What else you need to know

The following points are important to know about when writing Python 2/3 compatible code.

bytes

Handling bytes consistently and correctly has traditionally been one of the most difficult tasks in writing a Py2/3 compatible codebase. This is because the Python 2 bytes object is simply an alias for Python 2’s str, rather than a true implementation of the Python 3 bytes object, which is substantially different.

future contains a backport of the bytes object from Python 3 which passes most of the Python 3 tests for bytes. (See tests/test_future/test_bytes.py in the source tree.) You can use it as follows:

>>> from builtins import bytes
>>> b = bytes(b'ABCD')

On Py3, this is simply the builtin bytes object. On Py2, this object is a subclass of Python 2’s str that enforces the same strict separation of unicode strings and byte strings as Python 3’s bytes object:

>>> b + u'EFGH'      # TypeError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument can't be unicode string

>>> bytes(b',').join([u'Fred', u'Bill'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected bytes, found unicode string

>>> b == u'ABCD'
False

>>> b < u'abc'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: bytes() and <type 'unicode'>

In most other ways, these bytes objects have identical behaviours to Python 3’s bytes:

b = bytes(b'ABCD')
assert list(b) == [65, 66, 67, 68]
assert repr(b) == "b'ABCD'"
assert b.split(b'B') == [b'A', b'CD']

Currently the easiest way to ensure identical behaviour of byte-strings in a Py2/3 codebase is to wrap all byte-string literals b'...' in a bytes() call as follows:

from builtins import bytes

# ...

b = bytes(b'This is my bytestring')

# ...

This is not perfect, but it is superior to manually debugging and fixing code incompatibilities caused by the many differences between Py3 bytes and Py2 strings.

The bytes type from builtins also provides support for the surrogateescape error handler on Python 2.x. Here is an example that works identically on Python 2.x and 3.x:

>>> from builtins import bytes
>>> b = bytes(b'\xff')
>>> b.decode('utf-8', 'surrogateescape')
'\udcc3'

This feature is in alpha. Please leave feedback here about whether this works for you.

str

The str object in Python 3 is quite similar but not identical to the Python 2 unicode object.

The major difference is the stricter type-checking of Py3’s str that enforces a distinction between unicode strings and byte-strings, such as when comparing, concatenating, joining, or replacing parts of strings.

There are also other differences, such as the repr of unicode strings in Py2 having a u'...' prefix, versus simply '...', and the removal of the str.decode() method in Py3.

future contains a newstr type that is a backport of the str object from Python 3. This inherits from the Python 2 unicode class but has customizations to improve compatibility with Python 3’s str object. You can use it as follows:

>>> from __future__ import unicode_literals
>>> from builtins import str

On Py2, this gives us:

>>> str
future.types.newstr.newstr

(On Py3, it is simply the usual builtin str object.)

Then, for example, the following code has the same effect on Py2 as on Py3:

>>> s = str(u'ABCD')
>>> assert s != b'ABCD'
>>> assert isinstance(s.encode('utf-8'), bytes)
>>> assert isinstance(b.decode('utf-8'), str)

These raise TypeErrors:

>>> bytes(b'B') in s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'in <string>' requires string as left operand, not <type 'str'>

>>> s.find(bytes(b'A'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument can't be <type 'str'>

Various other operations that mix strings and bytes or other types are permitted on Py2 with the newstr class even though they are illegal with Python 3. For example:

>>> s2 = b'/' + str('ABCD')
>>> s2
'/ABCD'
>>> type(s2)
future.types.newstr.newstr

This is allowed for compatibility with parts of the Python 2 standard library and various third-party libraries that mix byte-strings and unicode strings loosely. One example is os.path.join on Python 2, which attempts to add the byte-string b'/' to its arguments, whether or not they are unicode. (See posixpath.py.) Another example is the escape() function in Django 1.4’s django.utils.html.

In most other ways, these builtins.str objects on Py2 have the same behaviours as Python 3’s str:

>>> s = str('ABCD')
>>> assert repr(s) == 'ABCD'      # consistent repr with Py3 (no u prefix)
>>> assert list(s) == ['A', 'B', 'C', 'D']
>>> assert s.split('B') == ['A', 'CD']

The str type from builtins also provides support for the surrogateescape error handler on Python 2.x. Here is an example that works identically on Python 2.x and 3.x:

>>> from builtins import str
>>> s = str(u'\udcff')
>>> s.encode('utf-8', 'surrogateescape')
b'\xff'

This feature is in alpha. Please leave feedback here about whether this works for you.

dict

Python 3 dictionaries have .keys(), .values(), and .items() methods which return memory-efficient set-like iterator objects, not lists. (See PEP 3106.)

If your dictionaries are small, performance is not critical, and you don’t need the set-like behaviour of iterator objects from Python 3, you can of course stick with standard Python 3 code in your Py2/3 compatible codebase:

# Assuming d is a native dict ...

for key in d:
    # code here

for item in d.items():
    # code here

for value in d.values():
    # code here

In this case there will be memory overhead of list creation on Py2 for each call to items, values or keys.

For improved efficiency, future.builtins (aliased to builtins) provides a Python 2 dict subclass whose keys(), values(), and items() methods return iterators on all versions of Python >= 2.7. On Python 2.7, these iterators also have the same set-like view behaviour as dictionaries in Python 3. This can streamline code that iterates over large dictionaries. For example:

from __future__ import print_function
from builtins import dict, range

# Memory-efficient construction:
d = dict((i, i**2) for i in range(10**7))

assert not isinstance(d.items(), list)

# Because items() is memory-efficient, so is this:
d2 = dict((v, k) for (k, v) in d.items())

As usual, on Python 3 dict imported from either builtins or future.builtins is just the built-in dict class.

Memory-efficiency and alternatives

If you already have large native dictionaries, the downside to wrapping them in a dict call is that memory is copied (on both Py3 and on Py2). For example:

# This allocates and then frees a large amount of temporary memory:
d = dict({i: i**2 for i in range(10**7)})

If dictionary methods like values and items are called only once, this obviously negates the memory benefits offered by the overridden methods through not creating temporary lists.

The memory-efficient (and CPU-efficient) alternatives are:

  • to construct a dictionary from an iterator. The above line could use a generator like this:

    d = dict((i, i**2) for i in range(10**7))
    
  • to construct an empty dictionary with a dict() call using builtins.dict (rather than {}) and then update it;

  • to use the viewitems etc. functions from future.utils, passing in regular dictionaries:

    from future.utils import viewkeys, viewvalues, viewitems
    
    for (key, value) in viewitems(hugedictionary):
        # some code here
    
    # Set intersection:
    d = {i**2: i for i in range(1000)}
    both = viewkeys(d) & set(range(0, 1000, 7))
    
    # Set union:
    both = viewvalues(d1) | viewvalues(d2)
    

For compatibility, the functions iteritems etc. are also available in future.utils. These are equivalent to the functions of the same names in six, which is equivalent to calling the iteritems etc. methods on Python 2, or to calling items etc. on Python 3.

int

Python 3’s int type is very similar to Python 2’s long, except for the representation (which omits the L suffix in Python 2). Python 2’s usual (short) integers have been removed from Python 3, as has the long builtin name.

Python 3:

>>> 2**64
18446744073709551616

Python 2:

>>> 2**64
18446744073709551616L

future includes a backport of Python 3’s int that is a subclass of Python 2’s long with the same representation behaviour as Python 3’s int. To ensure an integer is long compatibly with both Py3 and Py2, cast it like this:

>>> from builtins import int
>>> must_be_a_long_integer = int(1234)

The backported int object helps with writing doctests and simplifies code that deals with long and int as special cases on Py2. An example is the following code from xlwt-future (called by the xlwt.antlr.BitSet class) for writing out Excel .xls spreadsheets. With future, the code is:

from builtins import int

def longify(data):
    """
    Turns data (an int or long, or a list of ints or longs) into a
    list of longs.
    """
    if not data:
        return [int(0)]
    if not isinstance(data, list):
        return [int(data)]
    return list(map(int, data))

Without future (or with future < 0.7), this might be:

def longify(data):
    """
    Turns data (an int or long, or a list of ints or longs) into a
    list of longs.
    """
    if not data:
        if PY3:
            return [0]
        else:
            return [long(0)]
    if not isinstance(data,list):
        if PY3:
            return [int(data)]
        else:
            return [long(data)]
    if PY3:
        return list(map(int, data))   # same as returning data, but with up-front typechecking
    else:
        return list(map(long, data))

isinstance

The following tests all pass on Python 3:

>>> assert isinstance(2**62, int)
>>> assert isinstance(2**63, int)
>>> assert isinstance(b'my byte-string', bytes)
>>> assert isinstance(u'unicode string 1', str)
>>> assert isinstance('unicode string 2', str)

However, two of these normally fail on Python 2:

>>> assert isinstance(2**63, int)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

>>> assert isinstance(u'my unicode string', str)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

And if this import is in effect on Python 2:

>>> from __future__ import unicode_literals

then the fifth test fails too:

>>> assert isinstance('unicode string 2', str)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

After importing the builtins from future, all these tests pass on Python 2 as on Python 3:

>>> from builtins import bytes, int, str

>>> assert isinstance(10, int)
>>> assert isinstance(10**100, int)
>>> assert isinstance(b'my byte-string', bytes)
>>> assert isinstance(u'unicode string 1', str)

However, note that the last test requires that unicode_literals be imported to succeed.:

>>> from __future__ import unicode_literals
>>> assert isinstance('unicode string 2', str)

This works because the backported types int, bytes and str (and others) have metaclasses that override __instancecheck__. See PEP 3119 for details.

Passing data to/from Python 2 libraries

If you are passing any of the backported types (bytes, int, dict, ``str) into brittle library code that performs type-checks using type(), rather than isinstance(), or requires that you pass Python 2’s native types (rather than subclasses) for some other reason, it may be necessary to upcast the types from future to their native superclasses on Py2.

The native function in future.utils is provided for this. Here is how to use it. (The output showing is from Py2):

>>> from builtins import int, bytes, str
>>> from future.utils import native

>>> a = int(10**20)     # Py3-like long int
>>> a
100000000000000000000
>>> type(a)
future.types.newint.newint
>>> native(a)
100000000000000000000L
>>> type(native(a))
long

>>> b = bytes(b'ABC')
>>> type(b)
future.types.newbytes.newbytes
>>> native(b)
'ABC'
>>> type(native(b))
str

>>> s = str(u'ABC')
>>> type(s)
future.types.newstr.newstr
>>> native(s)
u'ABC'
>>> type(native(s))
unicode

On Py3, the native() function is a no-op.

Native string type

Some library code, include standard library code like the array.array() constructor, require native strings on Python 2 and Python 3. This means that there is no simple way to pass the appropriate string type when the unicode_literals import from __future__ is in effect.

The objects native_str and native_bytes are available in future.utils for this case. These are equivalent to the str and bytes objects in __builtin__ on Python 2 or in builtins on Python 3.

The functions native_str_to_bytes and bytes_to_native_str are also available for more explicit conversions.

open()

The Python 3 builtin open() function for opening files returns file contents as (unicode) strings unless the binary (b) flag is passed, as in:

open(filename, 'rb')

in which case its methods like read() return Py3 bytes objects.

On Py2 with future installed, the builtins module provides an open function that is mostly compatible with that on Python 3 (e.g. it offers keyword arguments like encoding). This maps to the open backport available in the standard library io module on Py2.7.

One difference to be aware of between the Python 3 open and future.builtins.open on Python 2 is that the return types of methods such as read() from the file object that open returns are not automatically cast from native bytes or unicode strings on Python 2 to the corresponding future.builtins.bytes or future.builtins.str types. If you need the returned data to behave the exactly same way on Py2 as on Py3, you can cast it explicitly as follows:

from __future__ import unicode_literals
from builtins import open, bytes

data = open('image.png', 'rb').read()
# On Py2, data is a standard 8-bit str with loose Unicode coercion.
# data + u'' would likely raise a UnicodeDecodeError

data = bytes(data)
# Now it behaves like a Py3 bytes object...

assert data[:4] == b'\x89PNG'
assert data[4] == 13     # integer
# Raises TypeError:
# data + u''

Custom __str__ methods

If you define a custom __str__ method for any of your classes, functions like print() expect __str__ on Py2 to return a byte string, whereas on Py3 they expect a (unicode) string.

Use the following decorator to map the __str__ to __unicode__ on Py2 and define __str__ to encode it as utf-8:

from future.utils import python_2_unicode_compatible

@python_2_unicode_compatible
class MyClass(object):
    def __str__(self):
        return u'Unicode string: \u5b54\u5b50'
a = MyClass()

# This then prints the name of a Chinese philosopher:
print(a)

This decorator is identical to the decorator of the same name in django.utils.encoding.

This decorator is a no-op on Python 3.

Custom iterators

If you define your own iterators, there is an incompatibility in the method name to retrieve the next item across Py3 and Py2. On Python 3 it is __next__, whereas on Python 2 it is next.

The most elegant solution to this is to derive your custom iterator class from builtins.object and define a __next__ method as you normally would on Python 3. On Python 2, object then refers to the future.types.newobject base class, which provides a fallback next method that calls your __next__. Use it as follows:

from builtins import object

class Upper(object):
    def __init__(self, iterable):
        self._iter = iter(iterable)
    def __next__(self):                 # Py3-style iterator interface
        return next(self._iter).upper()
    def __iter__(self):
        return self

itr = Upper('hello')
assert next(itr) == 'H'
assert next(itr) == 'E'
assert list(itr) == list('LLO')

You can use this approach unless you are defining a custom iterator as a subclass of a base class defined elsewhere that does not derive from newobject. In that case, you can provide compatibility across Python 2 and Python 3 using the next function from future.builtins:

from builtins import next

from some_module import some_base_class

class Upper2(some_base_class):
    def __init__(self, iterable):
        self._iter = iter(iterable)
    def __next__(self):                 # Py3-style iterator interface
        return next(self._iter).upper()
    def __iter__(self):
        return self

itr2 = Upper2('hello')
assert next(itr2) == 'H'
assert next(itr2) == 'E'

next() also works with regular Python 2 iterators with a .next method:

itr3 = iter(['one', 'three', 'five'])
assert 'next' in dir(itr3)
assert next(itr3) == 'one'

This approach is feasible whenever your code calls the next() function explicitly. If you consume the iterator implicitly in a for loop or list() call or by some other means, the future.builtins.next function will not help; the third assertion below would fail on Python 2:

itr2 = Upper2('hello')

assert next(itr2) == 'H'
assert next(itr2) == 'E'
assert list(itr2) == list('LLO')      # fails because Py2 implicitly looks
                                      # for a ``next`` method.

Instead, you can use a decorator called implements_iterator from future.utils to allow Py3-style iterators to work identically on Py2, even if they don’t inherit from future.builtins.object. Use it as follows:

from future.utils import implements_iterator

Upper2 = implements_iterator(Upper2)

print(list(Upper2('hello')))
# prints ['H', 'E', 'L', 'L', 'O']

This can of course also be used with the @ decorator syntax when defining the iterator as follows:

@implements_iterator
class Upper2(some_base_class):
    def __init__(self, iterable):
        self._iter = iter(iterable)
    def __next__(self):                 # note the Py3 interface
        return next(self._iter).upper()
    def __iter__(self):
        return self

On Python 3, as usual, this decorator does nothing.

Binding a method to a class

Python 2 draws a distinction between bound and unbound methods, whereas in Python 3 this distinction is gone: unbound methods have been removed from the language. To bind a method to a class compatibly across Python 3 and Python 2, you can use the bind_method() helper function:

from future.utils import bind_method

class Greeter(object):
    pass

def greet(self, message):
    print(message)

bind_method(Greeter, 'greet', greet)

g = Greeter()
g.greet('Hi!')

On Python 3, calling bind_method(cls, name, func) is equivalent to calling setattr(cls, name, func). On Python 2 it is equivalent to:

import types
setattr(cls, name, types.MethodType(func, None, cls))

Metaclasses

Python 3 and Python 2 syntax for metaclasses are incompatible. future provides a function (from jinja2/_compat.py) called with_metaclass() that can assist with specifying metaclasses portably across Py3 and Py2. Use it like this:

from future.utils import with_metaclass

class BaseForm(object):
    pass

class FormType(type):
    pass

class Form(with_metaclass(FormType, BaseForm)):
    pass