Python FAQ

Sudeshna

5 months ago

Table of Contents

Add a header to begin generating the table of contents

General Python FAQ Frequently Asked Questions

What is Python ?

Python is an interpreted, interactive, object-oriented programming language. It incorporates modules, exceptions, dynamic typing, very high level dynamic data types, and classes. It supports multiple programming paradigms beyond object-oriented programming, such as procedural and functional programming. Python combines remarkable power with very clear syntax. It has interfaces to many system calls and libraries, as well as to various window systems, and is extensible in C or C++. It is also usable as an extension language for applications that need a programmable interface. Finally, Python is portable: it runs on many Unix variants including Linux and macOS, and on Windows.

What is the Python Software Foundation ?

The Python Software Foundation is an independent non-profit organization that holds the copyright on Python versions 2.1 and newer. The PSF’s mission is to advance open source technology related to the Python programming language and to publicize the use of Python.

Are there copyright restrictions on the use of Python ?

You can do anything you want with the source, as long as you leave the copyrights in and display those copyrights in any documentation about Python that you produce. If you honor the copyright rules, it’s OK to use Python for commercial use, to sell copies of Python in source or binary form (modified or unmodified), or to sell products that incorporate Python in some form. We would still like to know about all commercial use of Python, of course.

Why was Python created in the first place ?

Here’s a very brief summary of what started it all, written by Guido van Rossum:

I had extensive experience with implementing an interpreted language in the ABC group at CWI, and from working with this group I had learned a lot about language design. This is the origin of many Python features, including the use of indentation for statement grouping and the inclusion of very-high-level data types (although the details are all different in Python).
I had a number of gripes about the ABC language, but also liked many of its features. It was impossible to extend the ABC language (or its implementation) to remedy my complaints – in fact its lack of extensibility was one of its biggest problems. I had some experience with using Modula-2+ and talked with the designers of Modula-3 and read the Modula-3 report. Modula-3 is the origin of the syntax and semantics used for exceptions, and some other Python features.
I was working in the Amoeba distributed operating system group at CWI. We needed a better way to do system administration than by writing either C programs or Bourne shell scripts, since Amoeba had its own system call interface which wasn’t easily accessible from the Bourne shell. My experience with error handling in Amoeba made me acutely aware of the importance of exceptions as a programming language feature.
It occurred to me that a scripting language with a syntax like ABC but with access to the Amoeba system calls would fill the need. I realized that it would be foolish to write an Amoeba-specific language, so I decided that I needed a language that was generally extensible.
During the 1989 Christmas holidays, I had a lot of time on my hand, so I decided to give it a try. During the next year, while still mostly working on it in my own time, Python was used in the Amoeba project with increasing success, and the feedback from colleagues made me add many early improvements.
In February 1991, after just over a year of development, I decided to post to USENET. The rest is in the Misc/HISTORY file.

What is Python good for ?

Python is a high-level general-purpose programming language that can be applied to many different classes of problems.

The language comes with a large standard library that covers areas such as string processing (regular expressions, Unicode, calculating differences between files), internet protocols (HTTP, FTP, SMTP, XML-RPC, POP, IMAP), software engineering (unit testing, logging, profiling, parsing Python code), and operating system interfaces (system calls, filesystems, TCP/IP sockets). Look at the table of contents for The Python Standard Library to get an idea of what’s available. A wide variety of third-party extensions are also available. Consult the Python Package Index to find packages of interest to you.

How does the Python version numbering scheme work ?

Python versions are numbered “A.B.C” or “A.B”:

A is the major version number – it is only incremented for really major changes in the language.
B is the minor version number – it is incremented for less earth-shattering changes.
C is the micro version number – it is incremented for each bugfix release.

Not all releases are bugfix releases. In the run-up to a new feature release, a series of development releases are made, denoted as alpha, beta, or release candidate. Alphas are early releases in which interfaces aren’t yet finalized; it’s not unexpected to see an interface change between two alpha releases. Betas are more stable, preserving existing interfaces but possibly adding new modules, and release candidates are frozen, making no changes except as needed to fix critical bugs.

Alpha, beta and release candidate versions have an additional suffix:

The suffix for an alpha version is “aN” for some small number N.
The suffix for a beta version is “bN” for some small number N.
The suffix for a release candidate version is “rcN” for some small number N.

In other words, all versions labeled 2.0aN precede the versions labeled 2.0bN, which precede versions labeled 2.0rcN, and those precede 2.0.

You may also find version numbers with a “+” suffix, e.g. “2.2+”. These are unreleased versions, built directly from the CPython development repository. In practice, after a final minor release is made, the version is incremented to the next minor version, which becomes the “a0” version, e.g. “2.4a0”.

See the Developer’s Guide for more information about the development cycle, and PEP 387 to learn more about Python’s backward compatibility policy. See also the documentation for sys.version, sys.hexversion, and sys.version_info.

How do I obtain a copy of the Python source ?

The latest Python source distribution is always available from python.org, at https://www.python.org/downloads/. The latest development sources can be obtained at https://github.com/python/cpython/.

The source distribution is a gzipped tar file containing the complete C source, Sphinx-formatted documentation, Python library modules, example programs, and several useful pieces of freely distributable software. The source will compile and run out of the box on most UNIX platforms.

Consult the Getting Started section of the Python Developer’s Guide for more information on getting the source code and compiling it.

How do I get documentation on Python?

The standard documentation for the current stable version of Python is available at https://docs.python.org/3/. PDF, plain text, and downloadable HTML versions are also available at https://docs.python.org/3/download.html.

The documentation is written in reStructuredText and processed by the Sphinx documentation tool. The reStructuredText source for the documentation is part of the Python source distribution.

I’ve never programmed before. Is there a Python tutorial?

Numerous tutorials and books are available. The standard documentation includes The Python Tutorial. Consult the Beginner’s Guide to find information for beginning Python programmers, including lists of tutorials.

Is there a newsgroup or mailing list devoted to Python ?

There is a newsgroup, comp.lang.python, and a mailing list, python-list. The newsgroup and mailing list are gatewayed into each other – if you can read news it’s unnecessary to subscribe to the mailing list. comp.lang.python is high-traffic, receiving hundreds of postings every day, and Usenet readers are often more able to cope with this volume.

Announcements of new software releases and events can be found in comp.lang.python.announce, a low-traffic moderated list that receives about five postings per day. It’s available as the python-announce mailing list.

More info about other mailing lists and newsgroups can be found at https://www.python.org/community/lists/.

How do I get a beta test version of Python ?

Alpha and beta releases are available from https://www.python.org/downloads/. All releases are announced on the comp.lang.python and comp.lang.python.announce newsgroups and on the Python home page at https://www.python.org/; an RSS feed of news is available.

You can also access the development version of Python through Git. See The Python Developer’s Guide for details.

How do I submit bug reports and patches for Python ?

To report a bug or submit a patch, use the issue tracker at https://github.com/python/cpython/issues.

For more information on how Python is developed, consult the Python Developer’s Guide.

Are there any published articles about Python that I can reference ?

It’s probably best to cite your favorite book about Python.

The very first article about Python was written in 1991 and is now quite outdated.

Guido van Rossum and Jelke de Boer, “Interactively Testing Remote Servers Using the Python Programming Language”, CWI Quarterly, Volume 4, Issue 4 (December 1991), Amsterdam, pp 283–303.

Are there any books on Python ?

Yes, there are many, and more are being published. See the python.org wiki at https://wiki.python.org/moin/PythonBooks for a list.

You can also search online bookstores for “Python” and filter out the Monty Python references; or perhaps search for “Python” and “language”.

Where in the world is www.python.org located ?

The Python project’s infrastructure is located all over the world and is managed by the Python Infrastructure Team. Details here.

Why is it called Python ?

When he began implementing Python, Guido van Rossum was also reading the published scripts from “Monty Python’s Flying Circus”, a BBC comedy series from the 1970s. Van Rossum thought he needed a name that was short, unique, and slightly mysterious, so he decided to call the language Python.

How stable is Python ?

Very stable. New, stable releases have been coming out roughly every 6 to 18 months since 1991, and this seems likely to continue. As of version 3.9, Python will have a new feature release every 12 months (PEP 602).

The developers issue bugfix releases of older versions, so the stability of existing releases gradually improves. Bugfix releases, indicated by a third component of the version number (e.g. 3.5.3, 3.6.2), are managed for stability; only fixes for known problems are included in a bugfix release, and it’s guaranteed that interfaces will remain the same throughout a series of bugfix releases.

The latest stable releases can always be found on the Python download page. Python 3.x is the recommended version and supported by most widely used libraries. Python 2.x is not maintained anymore.

How many people are using Python ?

There are probably millions of users, though it’s difficult to obtain an exact count.

Python is available for free download, so there are no sales figures, and it’s available from many different sites and packaged with many Linux distributions, so download statistics don’t tell the whole story either.

The comp.lang.python newsgroup is very active, but not all Python users post to the group or even read it.

Have any significant projects been done in Python ?

See https://www.python.org/about/success for a list of projects that use Python. Consulting the proceedings for past Python conferences will reveal contributions from many different companies and organizations.

High-profile Python projects include the Mailman mailing list manager and the Zope application server. Several Linux distributions, most notably Red Hat, have written part or all of their installer and system administration software in Python. Companies that use Python internally include Google, Yahoo, and Lucasfilm Ltd.

What new developments are expected for Python in the future ?

See https://peps.python.org/ for the Python Enhancement Proposals (PEPs). PEPs are design documents describing a suggested new feature for Python, providing a concise technical specification and a rationale. Look for a PEP titled “Python X.Y Release Schedule”, where X.Y is a version that hasn’t been publicly released yet.

New development is discussed on the python-dev mailing list.

Is it reasonable to propose incompatible changes to Python ?

In general, no. There are already millions of lines of Python code around the world, so any change in the language that invalidates more than a very small fraction of existing programs has to be frowned upon. Even if you can provide a conversion program, there’s still the problem of updating all documentation; many books have been written about Python, and we don’t want to invalidate them all at a single stroke.

Providing a gradual upgrade path is necessary if a feature has to be changed. PEP 5 describes the procedure followed for introducing backward-incompatible changes while minimizing disruption for users.

Is Python a good language for beginning programmers ?

Yes.

It is still common to start students with a procedural and statically typed language such as Pascal, C, or a subset of C++ or Java. Students may be better served by learning Python as their first language. Python has a very simple and consistent syntax, a large standard library, and, most importantly, using Python in a beginning programming course allows students to concentrate on essential programming skills, such as problem decomposition and data type design. With Python, students can be quickly introduced to basic concepts such as loops and procedures. They can probably even work with user-defined objects in their very first course.

For a student who has never programmed before, using a statically typed language seems unnatural. It presents additional complexity that the student must master and slows the pace of the course. The students are trying to learn to think like a computer, decompose problems, design consistent interfaces, and encapsulate data. While learning to use a statically typed language is important in the long term, it is not necessarily the best topic to address in the students’ first programming course.

Many other aspects of Python make it a good first language. Like Java, Python has a large standard library so that students can be assigned programming projects very early in the course that do something. Assignments aren’t restricted to the standard four-function calculator and check balancing programs. By using the standard library, students can gain the satisfaction of working on realistic applications as they learn the fundamentals of programming. Using the standard library also teaches students about code reuse. Third-party modules such as PyGame are also helpful in extending the students’ reach.

Python’s interactive interpreter enables students to test language features while they’re programming. They can keep a window with the interpreter running while they enter their program’s source in another window. If they can’t remember the methods for a list, they can do something like this:

L = []
dir(L)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__',
'__dir__', '__doc__', '__eq__', '__format__', '__ge__',
'__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__',
'__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__',
'__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__',
'__sizeof__', '__str__', '__subclasshook__', 'append', 'clear',
'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove',
'reverse', 'sort']
[d for d in dir(L) if '__' not in d]
['append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

help(L.append)
Help on built-in function append:

append(...)
    L.append(object) -> None -- append object to end

L.append(1)
L
[1]

With the interpreter, documentation is never far from the student as they are programming.

There are also good IDEs for Python. IDLE is a cross-platform IDE for Python that is written in Python using Tkinter. Emacs users will be happy to know that there is a very good Python mode for Emacs. All of these programming environments provide syntax highlighting, auto-indenting, and access to the interactive interpreter while coding.

Programming FAQ Frequently Asked Questions

Is there a source code level debugger with breakpoints, single-stepping, etc.?

Yes.

Several debuggers for Python are described below, and the built-in function breakpoint() allows you to drop into any of them. The pdb module is a simple but adequate console-mode debugger for Python. It is part of the standard Python library, and is documented in the Library Reference Manual. You can also write your own debugger by using the code for pdb as an example.

The IDLE interactive development environment, which is part of the standard Python distribution (normally available as Tools/scripts/idle3), includes a graphical debugger.

PythonWin is a Python IDE that includes a GUI debugger based on pdb. The PythonWin debugger colours breakpoints and has quite a few cool features, such as debugging non-PythonWin programs. PythonWin is available as part of pywin32 project and as part of the ActivePython distribution. Eric is an IDE built on PyQt and the Scintilla editing component. trepan3k is a gdb-like debugger.

Visual Studio Code is an IDE with debugging tools that integrates with version-control software.

There are a number of commercial Python IDEs that include graphical debuggers. They include:

Are there tools to help find bugs or perform static analysis ?

Yes. Pylint and Pyflakes do basic checking that will help you catch bugs sooner. Static type checkers such as Mypy, Pyre, and Pytype can check type hints in Python source code.

How can I create a stand-alone binary from a Python script ?

You don’t need the ability to compile Python to C code if all you want is a stand-alone program that users can download and run without having to install the Python distribution first. There are a number of tools that determine the set of modules required by a program and bind these modules together with a Python binary to produce a single executable.

One is to use the freeze tool, which is included in the Python source tree as Tools/freeze. It converts Python bytecode to C arrays; with a C compiler, you can embed all your modules into a new program, which is then linked with the standard Python modules.

It works by scanning your source recursively for import statements (in both forms) and looking for the modules in the standard Python path as well as in the source directory (for built-in modules). It then turns the bytecode for modules written in Python into C code (array initialisers that can be turned into code objects using the marshal module) and creates a custom-made config file that only contains those built-in modules that are actually used in the program. It then compiles the generated C code and links it with the rest of the Python interpreter to form a self-contained binary that acts exactly like your script.

The following packages can help with the creation of console and GUI executables:

Are there coding standards or a style guide for Python programs ?

Yes. The coding style required for standard library modules is documented as PEP 8.

Why am I getting an UnboundLocalError when the variable has a value ?

It can be a surprise to get the UnboundLocalError in previously working code when it is modified by adding an assignment statement somewhere in the body of a function.

This code:

x = 10
def bar():
    print(x)

bar()
10

works, but this code:

x = 10
def foo():
    print(x)
    x += 1

results in an UnboundLocalError:

foo()
Traceback (most recent call last):
  ...
UnboundLocalError: local variable 'x' referenced before assignment

This is because when you make an assignment to a variable in a scope, that variable becomes local to that scope and shadows any similarly named variable in the outer scope. Since the last statement in foo assigns a new value to x, the compiler recognizes it as a local variable. Consequently when the earlier print(x) attempts to print the uninitialized local variable and an error results.

In the example above you can access the outer scope variable by declaring it global:

x = 10
def foobar():
    global x
    print(x)
    x += 1

foobar()
10

This explicit declaration is required in order to remind you that (unlike the superficially analogous situation with class and instance variables) you are actually modifying the value of the variable in the outer scope:

print(x)
11

You can do a similar thing in a nested scope using the nonlocal keyword:

def foo():
   x = 10
   def bar():
       nonlocal x
       print(x)
       x += 1
   bar()
   print(x)

foo()
10
11

What are the rules for local and global variables in Python ?

In Python, variables that are only referenced inside a function are implicitly global. If a variable is assigned a value anywhere within the function’s body, it’s assumed to be a local unless explicitly declared as global.

Though a bit surprising at first, a moment’s consideration explains this. On one hand, requiring global for assigned variables provides a bar against unintended side-effects. On the other hand, if global was required for all global references, you’d be using global all the time. You’d have to declare as global every reference to a built-in function or to a component of an imported module. This clutter would defeat the usefulness of the global declaration for identifying side-effects.

Why do lambdas defined in a loop with different values all return the same result ?

Assume you use a for loop to define a few different lambdas (or even plain functions), e.g.:

squares = []
for x in range(5):
    squares.append(lambda: x**2)

This gives you a list that contains 5 lambdas that calculate x**2. You might expect that, when called, they would return, respectively, 0, 1, 4, 9, and 16. However, when you actually try you will see that they all return 16:

squares[2]()
16
squares[4]()
16

This happens because x is not local to the lambdas, but is defined in the outer scope, and it is accessed when the lambda is called — not when it is defined. At the end of the loop, the value of x is 4, so all the functions now return 4**2, i.e. 16. You can also verify this by changing the value of x and see how the results of the lambdas change:

x = 8
squares[2]()
64

In order to avoid this, you need to save the values in variables local to the lambdas, so that they don’t rely on the value of the global x:

squares = []
for x in range(5):
    squares.append(lambda n=x: n**2)

Here, n=x creates a new variable n local to the lambda and computed when the lambda is defined so that it has the same value that x had at that point in the loop. This means that the value of n will be 0 in the first lambda, 1 in the second, 2 in the third, and so on. Therefore each lambda will now return the correct result:

squares[2]()
4
squares[4]()
16

Note that this behaviour is not peculiar to lambdas, but applies to regular functions too.

How do I share global variables across modules ?

The canonical way to share information across modules within a single program is to create a special module (often called config or cfg). Just import the config module in all modules of your application; the module then becomes available as a global name. Because there is only one instance of each module, any changes made to the module object get reflected everywhere. For example:

config.py:

x = 0   # Default value of the 'x' configuration setting

mod.py:

import config
config.x = 1

main.py:

import config
import mod
print(config.x)

Note that using a module is also the basis for implementing the singleton design pattern, for the same reason.

What are the “best practices” for using import in a module ?

In general, don’t use from modulename import *. Doing so clutters the importer’s namespace, and makes it much harder for linters to detect undefined names.

Import modules at the top of a file. Doing so makes it clear what other modules your code requires and avoids questions of whether the module name is in scope. Using one import per line makes it easy to add and delete module imports, but using multiple imports per line uses less screen space.

It’s good practice if you import modules in the following order:

standard library modules – e.g. sys, os, argparse, re
third-party library modules (anything installed in Python’s site-packages directory) – e.g. dateutil, requests, PIL.Image
locally developed modules

It is sometimes necessary to move imports to a function or class to avoid problems with circular imports. Gordon McMillan says:

Circular imports are fine where both modules use the “import <module>” form of import. They fail when the 2nd module wants to grab a name out of the first (“from module import name”) and the import is at the top level. That’s because names in the 1st are not yet available, because the first module is busy importing the 2nd.

In this case, if the second module is only used in one function, then the import can easily be moved into that function. By the time the import is called, the first module will have finished initializing, and the second module can do its import.

It may also be necessary to move imports out of the top level of code if some of the modules are platform-specific. In that case, it may not even be possible to import all of the modules at the top of the file. In this case, importing the correct modules in the corresponding platform-specific code is a good option.

Only move imports into a local scope, such as inside a function definition, if it’s necessary to solve a problem such as avoiding a circular import or are trying to reduce the initialization time of a module. This technique is especially helpful if many of the imports are unnecessary depending on how the program executes. You may also want to move imports into a function if the modules are only ever used in that function. Note that loading a module the first time may be expensive because of the one time initialization of the module, but loading a module multiple times is virtually free, costing only a couple of dictionary lookups. Even if the module name has gone out of scope, the module is probably available in sys.modules.

Why are default values shared between objects ?

This type of bug commonly bites neophyte programmers. Consider this function:

def foo(mydict={}):  # Danger: shared reference to one dict for all calls
    ... compute something ...
    mydict[key] = value
    return mydict

The first time you call this function, mydict contains a single item. The second time, mydict contains two items because when foo() begins executing, mydict starts out with an item already in it.

It is often expected that a function call creates new objects for default values. This is not what happens. Default values are created exactly once, when the function is defined. If that object is changed, like the dictionary in this example, subsequent calls to the function will refer to this changed object.

By definition, immutable objects such as numbers, strings, tuples, and None, are safe from change. Changes to mutable objects such as dictionaries, lists, and class instances can lead to confusion.

Because of this feature, it is good programming practice to not use mutable objects as default values. Instead, use None as the default value and inside the function, check if the parameter is None and create a new list/dictionary/whatever if it is. For example, don’t write:

def foo(mydict={}):
    ...

but:

def foo(mydict=None):
    if mydict is None:
        mydict = {}  # create a new dict for local namespace

This feature can be useful. When you have a function that’s time-consuming to compute, a common technique is to cache the parameters and the resulting value of each call to the function, and return the cached value if the same value is requested again. This is called “memoizing”, and can be implemented like this:

# Callers can only provide two parameters and optionally pass _cache by keyword
def expensive(arg1, arg2, *, _cache={}):
    if (arg1, arg2) in _cache:
        return _cache[(arg1, arg2)]

    # Calculate the value
    result = ... expensive computation ...
    _cache[(arg1, arg2)] = result           # Store result in the cache
    return result

You could use a global variable containing a dictionary instead of the default value; it’s a matter of taste.

How can I pass optional or keyword parameters from one function to another ?

Collect the arguments using the * and ** specifiers in the function’s parameter list; this gives you the positional arguments as a tuple and the keyword arguments as a dictionary. You can then pass these arguments when calling another function by using * and **:

def f(x, *args, **kwargs):
    ...
    kwargs['width'] = '14.3c'
    ...
    g(x, *args, **kwargs)

What is the difference between arguments and parameters ?

Parameters are defined by the names that appear in a function definition, whereas arguments are the values actually passed to a function when calling it. Parameters define what kind of arguments a function can accept. For example, given the function definition:

def func(foo, bar=None, **kwargs):
    pass

foo, bar and kwargs are parameters of func. However, when calling func, for example:

func(42, bar=314, extra=somevar)

the values 42, 314, and somevar are arguments.

Why did changing list ‘y’ also change list ‘x’ ?

If you wrote code like:

x = []
y = x
y.append(10)
y
[10]
x
[10]

you might be wondering why appending an element to y changed x too.

There are two factors that produce this result:

Variables are simply names that refer to objects. Doing y = x doesn’t create a copy of the list – it creates a new variable y that refers to the same object x refers to. This means that there is only one object (the list), and both x and y refer to it.
Lists are mutable, which means that you can change their content.

After the call to append(), the content of the mutable object has changed from [] to [10]. Since both the variables refer to the same object, using either name accesses the modified value [10].

If we instead assign an immutable object to x:

x = 5  # ints are immutable
y = x
x = x + 1  # 5 can't be mutated, we are creating a new object here
x
6
y
5

we can see that in this case x and y are not equal anymore. This is because integers are immutable, and when we do x = x + 1 we are not mutating the int 5 by incrementing its value; instead, we are creating a new object (the int 6) and assigning it to x (that is, changing which object x refers to). After this assignment we have two objects (the ints 6 and 5) and two variables that refer to them (x now refers to 6 but y still refers to 5).

Some operations (for example y.append(10) and y.sort()) mutate the object, whereas superficially similar operations (for example y = y + [10] and sorted(y)) create a new object. In general in Python (and in all cases in the standard library) a method that mutates an object will return None to help avoid getting the two types of operations confused. So if you mistakenly write y.sort() thinking it will give you a sorted copy of y, you’ll instead end up with None, which will likely cause your program to generate an easily diagnosed error.

However, there is one class of operations where the same operation sometimes has different behaviors with different types: the augmented assignment operators. For example, += mutates lists but not tuples or ints (a_list += [1, 2, 3] is equivalent to a_list.extend([1, 2, 3]) and mutates a_list, whereas some_tuple += (1, 2, 3) and some_int += 1 create new objects).

In other words:

If we have a mutable object (list, dict, set, etc.), we can use some specific operations to mutate it and all the variables that refer to it will see the change.
If we have an immutable object (str, int, tuple, etc.), all the variables that refer to it will always see the same value, but operations that transform that value into a new value always return a new object.

If you want to know if two variables refer to the same object or not, you can use the is operator, or the built-in function id().

How do I write a function with output parameters (call by reference)?

Remember that arguments are passed by assignment in Python. Since assignment just creates references to objects, there’s no alias between an argument name in the caller and callee, and so no call-by-reference per se. You can achieve the desired effect in a number of ways.

By returning a tuple of the results:

def func1(a, b):
    a = 'new-value'        # a and b are local names
    b = b + 1              # assigned to new objects
    return a, b            # return new values

x, y = 'old-value', 99
func1(x, y)
('new-value', 100)

This is almost always the clearest solution.

By using global variables. This isn’t thread-safe, and is not recommended.

By passing a mutable (changeable in-place) object:

def func2(a):
    a[0] = 'new-value'     # 'a' references a mutable list
    a[1] = a[1] + 1        # changes a shared object

args = ['old-value', 99]
func2(args)
args
['new-value', 100]

By passing in a dictionary that gets mutated:

def func3(args):
    args['a'] = 'new-value'     # args is a mutable dictionary
    args['b'] = args['b'] + 1   # change it in-place

args = {'a': 'old-value', 'b': 99}
func3(args)
args
{'a': 'new-value', 'b': 100}

Or bundle up values in a class instance:

class Namespace:
    def __init__(self, /, **args):
        for key, value in args.items():
            setattr(self, key, value)

def func4(args):
    args.a = 'new-value'        # args is a mutable Namespace
    args.b = args.b + 1         # change object in-place

args = Namespace(a='old-value', b=99)
func4(args)
vars(args)
{'a': 'new-value', 'b': 100}

There’s almost never a good reason to get this complicated.

Your best choice is to return a tuple containing the multiple results.

How do you make a higher order function in Python ?

You have two choices: you can use nested scopes or you can use callable objects. For example, suppose you wanted to define linear(a,b) which returns a function f(x) that computes the value a*x+b. Using nested scopes:

def linear(a, b):
    def result(x):
        return a * x + b
    return result

Or using a callable object:

class linear:

    def __init__(self, a, b):
        self.a, self.b = a, b

    def __call__(self, x):
        return self.a * x + self.b

In both cases,

taxes = linear(0.3, 2)

gives a callable object where taxes(10e6) == 0.3 * 10e6 + 2.

The callable object approach has the disadvantage that it is a bit slower and results in slightly longer code. However, note that a collection of callables can share their signature via inheritance:

class exponential(linear):
    # __init__ inherited
    def __call__(self, x):
        return self.a * (x ** self.b)

Object can encapsulate state for several methods:

class counter:

    value = 0

    def set(self, x):
        self.value = x

    def up(self):
        self.value = self.value + 1

    def down(self):
        self.value = self.value - 1

count = counter()
inc, dec, reset = count.up, count.down, count.set

Here inc(), dec() and reset() act like functions which share the same counting variable.

How do I copy an object in Python ?

In general, try copy.copy() or copy.deepcopy() for the general case. Not all objects can be copied, but most can.

Some objects can be copied more easily. Dictionaries have a copy() method:

newdict = olddict.copy()

Sequences can be copied by slicing:

new_l = l[:]

How can I find the methods or attributes of an object ?

For an instance x of a user-defined class, dir(x) returns an alphabetized list of the names containing the instance attributes and methods and attributes defined by its class.

How can my code discover the name of an object ?

Generally speaking, it can’t, because objects don’t really have names. Essentially, assignment always binds a name to a value; the same is true of def and class statements, but in that case the value is a callable. Consider the following code:

class A:
    pass

B = A
a = B()
b = a
print(b)
<__main__.A object at 0x16D07CC>
print(a)
<__main__.A object at 0x16D07CC>

Arguably the class has a name: even though it is bound to two names and invoked through the name B the created instance is still reported as an instance of class A. However, it is impossible to say whether the instance’s name is a or b, since both names are bound to the same value.

Generally speaking it should not be necessary for your code to “know the names” of particular values. Unless you are deliberately writing introspective programs, this is usually an indication that a change of approach might be beneficial.

In comp.lang.python, Fredrik Lundh once gave an excellent analogy in answer to this question:

The same way as you get the name of that cat you found on your porch: the cat (object) itself cannot tell you its name, and it doesn’t really care – so the only way to find out what it’s called is to ask all your neighbours (namespaces) if it’s their cat (object)…
….and don’t be surprised if you’ll find that it’s known by many names, or no name at all!

What’s up with the comma operator’s precedence ?

Comma is not an operator in Python. Consider this session:

"a" in "b", "a"
(False, 'a')

Since the comma is not an operator, but a separator between expressions the above is evaluated as if you had entered:

("a" in "b"), "a"

not:

"a" in ("b", "a")

The same is true of the various assignment operators (=, += etc). They are not truly operators but syntactic delimiters in assignment statements.

Is there an equivalent of C’s “?:” ternary operator ?

Yes, there is. The syntax is as follows:

[on_true] if [expression] else [on_false]

x, y = 50, 25
small = x if x < y else y

Before this syntax was introduced in Python 2.5, a common idiom was to use logical operators:

[expression] and [on_true] or [on_false]

However, this idiom is unsafe, as it can give wrong results when on_true has a false boolean value. Therefore, it is always better to use the ... if ... else ... form.

Is it possible to write obfuscated one-liners in Python ?

Yes. Usually this is done by nesting lambda within lambda. See the following three examples, slightly adapted from Ulf Bartelt:

from functools import reduce

# Primes < 1000
print(list(filter(None,map(lambda y:y*reduce(lambda x,y:x*y!=0,
map(lambda x,y=y:y%x,range(2,int(pow(y,0.5)+1))),1),range(2,1000)))))

# First 10 Fibonacci numbers
print(list(map(lambda x,f=lambda x,f:(f(x-1,f)+f(x-2,f)) if x>1 else 1:
f(x,f), range(10))))

# Mandelbrot set
print((lambda Ru,Ro,Iu,Io,IM,Sx,Sy:reduce(lambda x,y:x+'\n'+y,map(lambda y,
Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,Sy=Sy,L=lambda yc,Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,i=IM,
Sx=Sx,Sy=Sy:reduce(lambda x,y:x+y,map(lambda x,xc=Ru,yc=yc,Ru=Ru,Ro=Ro,
i=i,Sx=Sx,F=lambda xc,yc,x,y,k,f=lambda xc,yc,x,y,k,f:(k<=0)or (x*x+y*y
>=4.0) or 1+f(xc,yc,x*x-y*y+xc,2.0*x*y+yc,k-1,f):f(xc,yc,x,y,k,f):chr(
64+F(Ru+x*(Ro-Ru)/Sx,yc,0,0,i)),range(Sx))):L(Iu+y*(Io-Iu)/Sy),range(Sy
))))(-2.1, 0.7, -1.2, 1.2, 30, 80, 24))
#    \___ ___/  \___ ___/  |   |   |__ lines on screen
#        V          V      |   |______ columns on screen
#        |          |      |__________ maximum of "iterations"
#        |          |_________________ range on y axis
#        |____________________________ range on x axis

Don’t try this at home, kids!

What does the slash(/) in the parameter list of a function mean ?

A slash in the argument list of a function denotes that the parameters prior to it are positional-only. Positional-only parameters are the ones without an externally usable name. Upon calling a function that accepts positional-only parameters, arguments are mapped to parameters based solely on their position. For example, divmod() is a function that accepts positional-only parameters. Its documentation looks like this:

help(divmod)
Help on built-in function divmod in module builtins:

divmod(x, y, /)
    Return the tuple (x//y, x%y).  Invariant: div*y + mod == x.

The slash at the end of the parameter list means that both parameters are positional-only. Thus, calling divmod() with keyword arguments would lead to an error:

divmod(x=3, y=4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: divmod() takes no keyword arguments

How do I specify hexadecimal and octal integers ?

To specify an octal digit, precede the octal value with a zero, and then a lower or uppercase “o”. For example, to set the variable “a” to the octal value “10” (8 in decimal), type:

>>> a = 0o10
>>> a

Hexadecimal is just as easy. Simply precede the hexadecimal number with a zero, and then a lower or uppercase “x”. Hexadecimal digits can be specified in lower or uppercase. For example, in the Python interpreter:

>>> a = 0 x a5
>>> a
165
>>> b = 0 X B2
>>> b
178

Why does -22 // 10 return -3 ?

It’s primarily driven by the desire to i % j have the same sign as j. If you want that, and also want:

i == (i // j) * j + (i % j)

Then integer division has to return the floor. C also requires that identity to hold, and then compilers that truncate i // j need to i % j have the same sign as i.

There are few real use cases for i % j when j is negative. When j is positive, there are many, and in virtually all of them it’s more useful i % j to be >= 0. If the clock says 10 now, what did it say 200 hours ago? -190 % 12 == 2 is useful; -190 % 12 == -10 is a bug waiting to bite.

How do I get int literal attribute instead of SyntaxError ?

Trying to look up an int literal attribute in the normal manner gives a SyntaxError because the period is seen as a decimal point:

1.__class__
  File "<stdin>", line 1
  1.__class__
   ^
SyntaxError: invalid decimal literal

The solution is to separate the literal from the period with either a space or parentheses.

1 .__class__
<class 'int'>
(1).__class__
<class 'int'>

How do I convert a string to a number ?

For integers, use the built-in int() type constructor, e.g. int('144') == 144. Similarly, float() converts to a floating-point number, e.g. float('144') == 144.0.

By default, these interpret the number as decimal, so that int('0144') == 144 holds true, and int('0x144') raises ValueError. int(string, base) takes the base to convert from as a second optional argument, so int( '0x144', 16) == 324. If the base is specified as 0, the number is interpreted using Python’s rules: a leading ‘0o’ indicates octal, and ‘0x’ indicates a hex number.

Do not use the built-in function eval() if all you need is to convert strings to numbers. eval() will be significantly slower and it presents a security risk: someone could pass you a Python expression that might have unwanted side effects. For example, someone could pass __import__('os').system("rm -rf $HOME") which would erase your home directory.

eval() also has the effect of interpreting numbers as Python expressions, so that e.g. eval('09') gives a syntax error because Python does not allow leading ‘0’ in a decimal number (except ‘0’).

How do I convert a number to a string ?

To convert, e.g., the number 144 to the string '144', use the built-in type constructor str(). If you want a hexadecimal or octal representation, use the built-in functions hex() or oct(). For fancy formatting, see the f-strings and Format String Syntax sections, e.g. "{:04d}".format(144) yields '0144' and "{:.3f}".format(1.0/3.0) yields '0.333'.

How do I modify a string in place ?

You can’t, because strings are immutable. In most situations, you should simply construct a new string from the various parts you want to assemble it. However, if you need an object with the ability to modify in-place Unicode data, try using an io.StringIO object or the array module:

import io
s = "Hello, world"
sio = io.StringIO(s)
sio.getvalue()
'Hello, world'
sio.seek(7)
7
sio.write("there!")
6
sio.getvalue()
'Hello, there!'

import array
a = array.array('w', s)
print(a)
array('w', 'Hello, world')
a[0] = 'y'
print(a)
array('w', 'yello, world')
a.tounicode()
'yello, world'

How do I use strings to call functions/methods ?

There are various techniques.

The best is to use a dictionary that maps strings to functions. The primary advantage of this technique is that the strings do not need to match the names of the functions. This is also the primary technique used to emulate a case construct:
```
def a():
    pass

def b():
    pass

dispatch = {'go': a, 'stop': b}  # Note lack of parens for funcs

dispatch[get_input()]()  # Note trailing parens to call function
```

Use the built-in function getattr():

import foo
getattr(foo, 'bar')()

Note that getattr() works on any object, including classes, class instances, modules, and so on.

This is used in several places in the standard library, like this:

class Foo:
    def do_foo(self):
        ...

    def do_bar(self):
        ...

f = getattr(foo_instance, 'do_' + opname)
f()

Use locals() to resolve the function name:

def myFunc():
    print("hello")

fname = "myFunc"

f = locals()[fname]
f()

Is there an equivalent to Perl’s chomp() for removing trailing newlines from strings ?

You can use S.rstrip("\r\n") to remove all occurrences of any line terminator from the end of the string S without removing other trailing whitespace. If the string S represents more than one line, with several empty lines at the end, the line terminators for all the blank lines will be removed:

lines = ("line 1 \r\n"
         "\r\n"
         "\r\n")
lines.rstrip("\n\r")
'line 1 '

Since this is typically only desired when reading text one line at a time, using S.rstrip() this way works well.

Is there a scanf() or sscanf() equivalent ?

Not as such.

For simple input parsing, the easiest approach is usually to split the line into whitespace-delimited words using the split() method of string objects and then convert decimal strings to numeric values using int() or float(). split() supports an optional “sep” parameter which is useful if the line uses something other than whitespace as a separator.

For more complicated input parsing, regular expressions are more powerful than C’s sscanf and better suited for the task.

What does UnicodeDecodeError or UnicodeEncodeError error mean ?

This HOWTO discusses Python’s support for the Unicode specification for representing textual data and explains various problems that people commonly encounter when working with Unicode.

Introduction to Unicode

Definitions

Today’s programs need to be able to handle a wide variety of characters. Applications are often internationalised to display messages and output in a variety of user-selectable languages; the same program might need to output an error message in English, French, Japanese, Hebrew, or Russian. Web content can be written in any of these languages and can also include a variety of emoji symbols. Python’s string type uses the Unicode Standard for representing characters, which lets Python programs work with all these different possible characters.

Unicode (https://www.unicode.org/) is a specification that aims to list every character used by human languages and give each character its own unique code. The Unicode specifications are continually revised and updated to add new languages and symbols.

A character is the smallest possible component of a text. ‘A’, ‘B’, ‘C’, etc., are all different characters. So are ‘È’ and ‘Í’. Characters vary depending on the language or context you’re talking about. For example, there’s a character for “Roman Numeral One”, ‘Ⅰ’, that’s separate from the uppercase letter ‘I’. They’ll usually look the same, but these are two different characters that have different meanings.

The Unicode standard describes how characters are represented by code points. A code point value is an integer in the range 0 to 0x10FFFF (about 1.1 million values, the actual number assigned is less than that). In the standard and in this document, a code point is written using the notation U+265E to mean the character with value 0x265e (9,822 in decimal).

The Unicode standard contains a lot of tables listing characters and their corresponding code points:

0061    'a'; LATIN SMALL LETTER A
0062    'b'; LATIN SMALL LETTER B
0063    'c'; LATIN SMALL LETTER C
...
007B    '{'; LEFT CURLY BRACKET
...
2167    'Ⅷ'; ROMAN NUMERAL EIGHT
2168    'Ⅸ'; ROMAN NUMERAL NINE
...
265E    '♞'; BLACK CHESS KNIGHT
265F    '♟'; BLACK CHESS PAWN
...
1F600   '😀'; GRINNING FACE
1F609   '😉'; WINKING FACE
...

Strictly, these definitions imply that it’s meaningless to say ‘this is character U+265E’. U+265E is a code point, which represents some particular character; in this case, it represents the character ‘BLACK CHESS KNIGHT’, ‘♞’. In informal contexts, this distinction between code points and characters will sometimes be forgotten.

A character is represented on a screen or on paper by a set of graphical elements that’s called a glyph. The glyph for an uppercase A, for example, is two diagonal strokes and a horizontal stroke, though the exact details will depend on the font being used. Most Python code doesn’t need to worry about glyphs; figuring out the correct glyph to display is generally the job of a GUI toolkit or a terminal’s font renderer.

Encodings

To summarize the previous section: a Unicode string is a sequence of code points, which are numbers from 0 through 0x10FFFF (1,114,111 decimal). This sequence of code points needs to be represented in memory as a set of code units, and code units are then mapped to 8-bit bytes. The rules for translating a Unicode string into a sequence of bytes are called a character encoding, or just an encoding.

The first encoding you might think of is using 32-bit integers as the code unit, and then using the CPU’s representation of 32-bit integers. In this representation, the string “Python” might look like this:

   P           y           t           h           o           n
0x50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00
   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

This representation is straightforward but using it presents a number of problems.

It’s not portable; different processors order the bytes differently.
It’s very wasteful of space. In most texts, the majority of the code points are less than 127, or less than 255, so a lot of space is occupied by 0x00 bytes. The above string takes 24 bytes compared to the 6 bytes needed for an ASCII representation. Increased RAM usage doesn’t matter too much (desktop computers have gigabytes of RAM, and strings aren’t usually that large), but expanding our usage of disk and network bandwidth by a factor of 4 is intolerable.
It’s not compatible with existing C functions such as strlen(), so a new family of wide string functions would need to be used.

Therefore this encoding isn’t used very much, and people instead choose other encodings that are more efficient and convenient, such as UTF-8.

UTF-8 is one of the most commonly used encodings, and Python often defaults to using it. UTF stands for “Unicode Transformation Format”, and the ‘8’ means that 8-bit values are used in the encoding. (There are also UTF-16 and UTF-32 encodings, but they are less frequently used than UTF-8.) UTF-8 uses the following rules:

If the code point is < 128, it’s represented by the corresponding byte value.
If the code point is >= 128, it’s turned into a sequence of two, three, or four bytes, where each byte of the sequence is between 128 and 255.

UTF-8 has several convenient properties:

It can handle any Unicode code point.
A Unicode string is turned into a sequence of bytes that contains embedded zero bytes only where they represent the null character (U+0000). This means that UTF-8 strings can be processed by C functions such as strcpy() and sent through protocols that can’t handle zero bytes for anything other than end-of-string markers.
A string of ASCII text is also valid UTF-8 text.
UTF-8 is fairly compact; the majority of commonly used characters can be represented with one or two bytes.
If bytes are corrupted or lost, it’s possible to determine the start of the next UTF-8-encoded code point and resynchronize. It’s also unlikely that random 8-bit data will look like valid UTF-8.
UTF-8 is a byte oriented encoding. The encoding specifies that each character is represented by a specific sequence of one or more bytes. This avoids the byte-ordering issues that can occur with integer and word oriented encodings, like UTF-16 and UTF-32, where the sequence of bytes varies depending on the hardware on which the string was encoded.

References

The Unicode Consortium site has character charts, a glossary, and PDF versions of the Unicode specification. Be prepared for some difficult reading. A chronology of the origin and development of Unicode is also available on the site.

On the Computerphile Youtube channel, Tom Scott briefly discusses the history of Unicode and UTF-8 (9 minutes 36 seconds).

To help understand the standard, Jukka Korpela has written an introductory guide to reading the Unicode character tables.

Another good introductory article was written by Joel Spolsky. If this introduction didn’t make things clear to you, you should try reading this alternate article before continuing.

Wikipedia entries are often helpful; see the entries for “character encoding” and UTF-8, for example.

Can I end a raw string with an odd number of backslashes ?

A raw string ending with an odd number of backslashes will escape the string’s quote:

r'C:\this\will\not\work\'
  File "<stdin>", line 1
    r'C:\this\will\not\work\'
    ^
SyntaxError: unterminated string literal (detected at line 1)

There are several workarounds for this. One is to use regular strings and double the backslashes:

'C:\\this\\will\\work\\'
'C:\\this\\will\\work\\'

Another is to concatenate a regular string containing an escaped backslash to the raw string:

r'C:\this\will\work' '\\'
'C:\\this\\will\\work\\'

It is also possible to use os.path.join() to append a backslash on Windows:

os.path.join(r'C:\this\will\work', '')
'C:\\this\\will\\work\\'

Note that while a backslash will “escape” a quote for the purposes of determining where the raw string ends, no escaping occurs when interpreting the value of the raw string. That is, the backslash remains present in the value of the raw string:

r'backslash\'preserved'
"backslash\\'preserved"

Also see the specification in the language reference.

My program is too slow. How do I speed it up ?

That’s a tough one, in general. First, here are a list of things to remember before diving further:

Performance characteristics vary across Python implementations. This FAQ focuses on CPython.
Behaviour can vary across operating systems, especially when talking about I/O or multi-threading.
You should always find the hot spots in your program before attempting to optimize any code (see the profile module).
Writing benchmark scripts will allow you to iterate quickly when searching for improvements (see the timeit module).
It is highly recommended to have good code coverage (through unit testing or any other technique) before potentially introducing regressions hidden in sophisticated optimizations.

That being said, there are many tricks to speed up Python code. Here are some general principles which go a long way towards reaching acceptable performance levels:

Making your algorithms faster (or changing to faster ones) can yield much larger benefits than trying to sprinkle micro-optimization tricks all over your code.
Use the right data structures. Study documentation for the Built-in Types and the collections module.
When the standard library provides a primitive for doing something, it is likely (although not guaranteed) to be faster than any alternative you may come up with. This is doubly true for primitives written in C, such as builtins and some extension types. For example, be sure to use either the list.sort() built-in method or the related sorted() function to do sorting (and see the Sorting Techniques for examples of moderately advanced usage).
Abstractions tend to create indirections and force the interpreter to work more. If the levels of indirection outweigh the amount of useful work done, your program will be slower. You should avoid excessive abstraction, especially under the form of tiny functions or methods (which are also often detrimental to readability).

If you have reached the limit of what pure Python can allow, there are tools to take you further away. For example, Cython can compile a slightly modified version of Python code into a C extension, and can be used on many different platforms. Cython can take advantage of compilation (and optional type annotations) to make your code significantly faster than when interpreted. If you are confident in your C programming skills, you can also write a C extension module yourself.

What is the most efficient way to concatenate many strings together ?

str and bytes objects are immutable, therefore concatenating many strings together is inefficient as each concatenation creates a new object. In the general case, the total runtime cost is quadratic in the total string length.

To accumulate many str objects, the recommended idiom is to place them into a list and call str.join() at the end:

chunks = []
for s in my_strings:
    chunks.append(s)
result = ''.join(chunks)

(another reasonably efficient idiom is to use io.StringIO)

To accumulate many bytes objects, the recommended idiom is to extend a bytearray object using in-place concatenation (the += operator):

result = bytearray()
for b in my_bytes_objects:
    result += b

How do I convert between tuples and lists ?

The type constructor tuple(seq) converts any sequence (actually, any iterable) into a tuple with the same items in the same order.

For example, tuple([1, 2, 3]) yields (1, 2, 3) and tuple('abc') yields ('a', 'b', 'c'). If the argument is a tuple, it does not make a copy but returns the same object, so it is cheap to call tuple() when you aren’t sure that an object is already a tuple.

The type constructor list(seq) converts any sequence or iterable into a list with the same items in the same order. For example, list((1, 2, 3)) yields [1, 2, 3] and list('abc') yields ['a', 'b', 'c']. If the argument is a list, it makes a copy just like seq[:] would.

What’s a negative index ?

Python sequences are indexed with positive numbers and negative numbers. For positive numbers 0 is the first index 1 is the second index and so forth. For negative indices -1 is the last index and -2 is the penultimate (next to last) index and so forth. Think of seq[-n] as the same as seq[len(seq)-n].

Using negative indices can be very convenient. For example S[:-1] is all of the string except for its last character, which is useful for removing the trailing newline from a string.

How do I iterate over a sequence in reverse order ?

Use the reversed() built-in function:

for x in reversed(sequence):
    ...  # do something with x ...

This won’t touch your original sequence, but build a new copy with reversed order to iterate over.

How do you remove duplicates from a list ?

See the Python Cookbook for a long discussion of many ways to do this:

https://code.activestate.com/recipes/52560/

If you don’t mind reordering the list, sort it and then scan from the end of the list, deleting duplicates as you go:

if mylist:
    mylist.sort()
    last = mylist[-1]
    for i in range(len(mylist)-2, -1, -1):
        if last == mylist[i]:
            del mylist[i]
        else:
            last = mylist[i]

If all elements of the list may be used as set keys (i.e. they are all hashable) this is often faster

mylist = list(set(mylist))

This converts the list into a set, thereby removing duplicates, and then back into a list.

How do you remove multiple items from a list ?

As with removing duplicates, explicitly iterating in reverse with a delete condition is one possibility. However, it is easier and faster to use slice replacement with an implicit or explicit forward iteration. Here are three variations.:

mylist[:] = filter(keep_function, mylist)
mylist[:] = (x for x in mylist if keep_condition)
mylist[:] = [x for x in mylist if keep_condition]

The list comprehension may be fastest.

How do you make an array in Python ?

Use a list:

["this", 1, "is", "an", "array"]

Lists are equivalent to C or Pascal arrays in their time complexity; the primary difference is that a Python list can contain objects of many different types.

The array module also provides methods for creating arrays of fixed types with compact representations, but they are slower to index than lists. Also note that NumPy and other third party packages define array-like structures with various characteristics as well.

To get Lisp-style linked lists, you can emulate cons cells using tuples:

lisp_list = ("like",  ("this",  ("example", None) ) )

If mutability is desired, you could use lists instead of tuples. Here the analogue of a Lisp car is lisp_list[0] and the analogue of cdr is lisp_list[1]. Only do this if you’re sure you really need to, because it’s usually a lot slower than using Python lists.

How do I create a multidimensional list ?

You probably tried to make a multidimensional array like this:

A = [[None] * 2] * 3

This looks correct if you print it:

A
[[None, None], [None, None], [None, None]]

But when you assign a value, it shows up in multiple places:

A[0][0] = 5
A
[[5, None], [5, None], [5, None]]

The reason is that replicating a list with * doesn’t create copies, it only creates references to the existing objects. The *3 creates a list containing 3 references to the same list of length two. Changes to one row will show in all rows, which is almost certainly not what you want.

The suggested approach is to create a list of the desired length first and then fill in each element with a newly created list:

A = [None] * 3
for i in range(3):
    A[i] = [None] * 2

This generates a list containing 3 different lists of length two. You can also use a list comprehension:

w, h = 2, 3
A = [[None] * w for i in range(h)]

Or, you can use an extension that provides a matrix datatype; NumPy is the best known.

How do I apply a method or function to a sequence of objects ?

To call a method or function and accumulate the return values is a list, a list comprehension is an elegant solution:

result = [obj.method() for obj in mylist]

result = [function(obj) for obj in mylist]

To just run the method or function without saving the return values, a plain for loop will suffice:

for obj in mylist:
    obj.method()

for obj in mylist:
    function(obj)

Why does a_tuple[i] += [‘item’] raise an exception when the addition works ?

This is because of a combination of the fact that augmented assignment operators are assignment operators, and the difference between mutable and immutable objects in Python.

This discussion applies in general when augmented assignment operators are applied to elements of a tuple that point to mutable objects, but we’ll use a list and += as our exemplar.

If you wrote:

a_tuple = (1, 2)
a_tuple[0] += 1
Traceback (most recent call last):
   ...
TypeError: 'tuple' object does not support item assignment

The reason for the exception should be immediately clear: 1 is added to the object a_tuple[0] points to (1), producing the result object, 2, but when we attempt to assign the result of the computation, 2, to element 0 of the tuple, we get an error because we can’t change what an element of a tuple points to.

Under the covers, what this augmented assignment statement is doing is approximately this:

result = a_tuple[0] + 1
a_tuple[0] = result
Traceback (most recent call last):
  ...
TypeError: 'tuple' object does not support item assignment

It is the assignment part of the operation that produces the error, since a tuple is immutable.

When you write something like:

a_tuple = (['foo'], 'bar')
a_tuple[0] += ['item']
Traceback (most recent call last):
  ...
TypeError: 'tuple' object does not support item assignment

The exception is a bit more surprising, and even more surprising is the fact that even though there was an error, the append worked:

a_tuple[0]
['foo', 'item']

To see why this happens, you need to know that (a) if an object implements an __iadd__() magic method, it gets called when the += augmented assignment is executed, and its return value is what gets used in the assignment statement; and (b) for lists, __iadd__() is equivalent to calling extend() on the list and returning the list. That’s why we say that for lists, += is a “shorthand” for list.extend():

a_list = []
a_list += [1]
a_list
[1]

This is equivalent to:

result = a_list.__iadd__([1])
a_list = result

The object pointed to by a_list has been mutated, and the pointer to the mutated object is assigned back to a_list. The end result of the assignment is a no-op, since it is a pointer to the same object that a_list was previously pointing to, but the assignment still happens.

Thus, in our tuple example what is happening is equivalent to:

result = a_tuple[0].__iadd__(['item'])
a_tuple[0] = result
Traceback (most recent call last):
  ...
TypeError: 'tuple' object does not support item assignment

The __iadd__() succeeds, and thus the list is extended, but even though result points to the same object that a_tuple[0] already points to, that final assignment still results in an error, because tuples are immutable.

I want to do a complicated sort: can you do a Schwartzian Transform in Python ?

The technique, attributed to Randal Schwartz of the Perl community, sorts the elements of a list by a metric which maps each element to its “sort value”. In Python, use the key argument for the list.sort() method:

Isorted = L[:]
Isorted.sort(key=lambda s: int(s[10:15]))

How can I sort one list by values from another list ?

Merge them into an iterator of tuples, sort the resulting list, and then pick out the element you want.

list1 = ["what", "I'm", "sorting", "by"]
list2 = ["something", "else", "to", "sort"]
pairs = zip(list1, list2)
pairs = sorted(pairs)
pairs
[("I'm", 'else'), ('by', 'sort'), ('sorting', 'to'), ('what', 'something')]
result = [x[1] for x in pairs]
result
['else', 'sort', 'to', 'something']

What is a class ?

A class is the particular object type created by executing a class statement. Class objects are used as templates to create instance objects, which embody both the data (attributes) and code (methods) specific to a datatype.

A class can be based on one or more other classes, called its base class(es). It then inherits the attributes and methods of its base classes. This allows an object model to be successively refined by inheritance. You might have a generic Mailbox class that provides basic accessor methods for a mailbox, and subclasses such as MboxMailbox, MaildirMailbox, OutlookMailbox that handle various specific mailbox formats.

What is a method ?

A method is a function on some object x that you normally call as x.name(arguments...). Methods are defined as functions inside the class definition:

class C:
    def meth(self, arg):
        return arg * 2 + self.attribute

What is self ?

Self is merely a conventional name for the first argument of a method. A method defined as meth(self, a, b, c) should be called as x.meth(a, b, c) for some instance x of the class in which the definition occurs; the called method will think it is called as meth(x, a, b, c).

How do I check if an object is an instance of a given class or of a subclass of it ?

Use the built-in function isinstance(obj, cls). You can check if an object is an instance of any of a number of classes by providing a tuple instead of a single class, e.g. isinstance(obj, (class1, class2, ...)), and can also check whether an object is one of Python’s built-in types, e.g. isinstance(obj, str) or isinstance(obj, (int, float, complex)).

Note that isinstance() also checks for virtual inheritance from an abstract base class. So, the test will return True for a registered class even if hasn’t directly or indirectly inherited from it. To test for “true inheritance”, scan the MRO of the class:

from collections.abc import Mapping

class P:
     pass

class C(P):
    pass

Mapping.register(P)

c = C()
isinstance(c, C)        # direct
True
isinstance(c, P)        # indirect
True
isinstance(c, Mapping)  # virtual
True

# Actual inheritance chain
type(c).__mro__
(<class 'C'>, <class 'P'>, <class 'object'>)

# Test for "true inheritance"
Mapping in type(c).__mro__
False

Note that most programs do not use isinstance() on user-defined classes very often. If you are developing the classes yourself, a more proper object-oriented style is to define methods on the classes that encapsulate a particular behaviour, instead of checking the object’s class and doing a different thing based on what class it is. For example, if you have a function that does something:

def search(obj):
    if isinstance(obj, Mailbox):
        ...  # code to search a mailbox
    elif isinstance(obj, Document):
        ...  # code to search a document
    elif ...

A better approach is to define a search() method on all the classes and just call it:

class Mailbox:
    def search(self):
        ...  # code to search a mailbox

class Document:
    def search(self):
        ...  # code to search a document

obj.search()

What is delegation ?

Delegation is an object oriented technique (also called a design pattern). Let’s say you have an object x and want to change the behaviour of just one of its methods. You can create a new class that provides a new implementation of the method you’re interested in changing and delegates all other methods to the corresponding method of x.

Python programmers can easily implement delegation. For example, the following class implements a class that behaves like a file but converts all written data to uppercase:

class UpperOut:

    def __init__(self, outfile):
        self._outfile = outfile

    def write(self, s):
        self._outfile.write(s.upper())

    def __getattr__(self, name):
        return getattr(self._outfile, name)

Here the UpperOut class redefines the write() method to convert the argument string to uppercase before calling the underlying self._outfile.write() method. All other methods are delegated to the underlying self._outfile object. The delegation is accomplished via the __getattr__() method; consult the language reference for more information about controlling attribute access.

Note that for more general cases delegation can get trickier. When attributes must be set as well as retrieved, the class must define a __setattr__() method too, and it must do so carefully. The basic implementation of __setattr__() is roughly equivalent to the following:

class X:
    ...
    def __setattr__(self, name, value):
        self.__dict__[name] = value
    ...

Many __setattr__() implementations call object.__setattr__() to set an attribute on self without causing infinite recursion:

class X:
    def __setattr__(self, name, value):
        # Custom logic here...
        object.__setattr__(self, name, value)

Alternatively, it is possible to set attributes by inserting entries into self.__dict__ directly.

How do I call a method defined in a base class from a derived class that extends it ?

Use the built-in super() function:

class Derived(Base):
    def meth(self):
        super().meth()  # calls Base.meth

In the example, super() will automatically determine the instance from which it was called (the self value), look up the method resolution order (MRO) with type(self).__mro__, and return the next in line after Derived in the MRO: Base.

How can I organize my code to make it easier to change the base class ?

You could assign the base class to an alias and derive from the alias. Then all you have to change is the value assigned to the alias. Incidentally, this trick is also handy if you want to decide dynamically (e.g. depending on the availability of resources) which base class to use. Example:

class Base:
    ...

BaseAlias = Base

class Derived(BaseAlias):
    ...

How do I create static class data and static class methods ?

Both static data and static methods (in the sense of C++ or Java) are supported in Python.

For static data, simply define a class attribute. To assign a new value to the attribute, you have to explicitly use the class name in the assignment:

class C:
    count = 0   # number of times C.__init__ called

    def __init__(self):
        C.count = C.count + 1

    def getcount(self):
        return C.count  # or return self.count

c.count also refers to C.count for any c such that isinstance(c, C) holds, unless overridden by c itself or by some class on the base-class search path from c.__class__ back to C.

Caution: within a method of C, an assignment like self.count = 42 creates a new and unrelated instance named “count” in self’s own dict. Rebinding of a class-static data name must always specify the class whether inside a method or not:

C.count = 314

Static methods are possible:

class C:
    @staticmethod
    def static(arg1, arg2, arg3):
        # No 'self' parameter!
        ...

However, a far more straightforward way to get the effect of a static method is via a simple module-level function:

def getcount():
    return C.count

If your code is structured so as to define one class (or tightly related class hierarchy) per module, this supplies the desired encapsulation.

How can I overload constructors (or methods) in Python ?

This answer actually applies to all methods, but the question usually comes up first in the context of constructors.

In C++ you’d write

class C {
    C() { cout << "No arguments\n"; }
    C(int i) { cout << "Argument is " << i << "\n"; }
}

In Python you have to write a single constructor that catches all cases using default arguments. For example:

class C:
    def __init__(self, i=None):
        if i is None:
            print("No arguments")
        else:
            print("Argument is", i)

This is not entirely equivalent, but close enough in practice.

You could also try a variable-length argument list, e.g.

def __init__(self, *args):
    ...

The same approach works for all method definitions.

I try to use __spam and I get an error about _SomeClassName__spam.

Variable names with double leading underscores are “mangled” to provide a simple but effective way to define class private variables. Any identifier of the form __spam (at least two leading underscores, at most one trailing underscore) is textually replaced with _classname__spam, where classname is the current class name with any leading underscores stripped.

The identifier can be used unchanged within the class, but to access it outside the class, the mangled name must be used:

class A:
    def __one(self):
        return 1
    def two(self):
        return 2 * self.__one()

class B(A):
    def three(self):
        return 3 * self._A__one()

four = 4 * A()._A__one()

In particular, this does not guarantee privacy since an outside user can still deliberately access the private attribute; many Python programmers never bother to use private variable names at all.

My class defines del but it is not called when I delete the object.

There are several possible reasons for this.

The del statement does not necessarily call __del__() – it simply decrements the object’s reference count, and if this reaches zero __del__() is called.

If your data structures contain circular links (e.g. a tree where each child has a parent reference and each parent has a list of children) the reference counts will never go back to zero. Once in a while Python runs an algorithm to detect such cycles, but the garbage collector might run some time after the last reference to your data structure vanishes, so your __del__() method may be called at an inconvenient and random time. This is inconvenient if you’re trying to reproduce a problem. Worse, the order in which object’s __del__() methods are executed is arbitrary. You can run gc.collect() to force a collection, but there are pathological cases where objects will never be collected.

Despite the cycle collector, it’s still a good idea to define an explicit close() method on objects to be called whenever you’re done with them. The close() method can then remove attributes that refer to subobjects. Don’t call __del__() directly – __del__() should call close() and close() should make sure that it can be called more than once for the same object.

Another way to avoid cyclical references is to use the weakref module, which allows you to point to objects without incrementing their reference count. Tree data structures, for instance, should use weak references for their parent and sibling references (if they need them!).

Finally, if your __del__() method raises an exception, a warning message is printed to sys.stderr.

How do I get a list of all instances of a given class ?

Python does not keep track of all instances of a class (or of a built-in type). You can program the class’s constructor to keep track of all instances by keeping a list of weak references to each instance.

Why does the result of id() appear to be not unique ?

The id() builtin returns an integer that is guaranteed to be unique during the lifetime of the object. Since in CPython, this is the object’s memory address, it happens frequently that after an object is deleted from memory, the next freshly created object is allocated at the same position in memory. This is illustrated by this example:

id(1000)
13901272
id(2000)
13901272

The two ids belong to different integer objects that are created before, and deleted immediately after execution of the id() call. To be sure that objects whose id you want to examine are still alive, create another reference to the object:

a = 1000; b = 2000
id(a)
13901272
id(b)
13891296

When can I rely on identity tests with the is operator ?

The is operator tests for object identity. The test a is b is equivalent to id(a) == id(b).

The most important property of an identity test is that an object is always identical to itself, a is a always returns True. Identity tests are usually faster than equality tests. And unlike equality tests, identity tests are guaranteed to return a boolean True or False.

However, identity tests can only be substituted for equality tests when object identity is assured. Generally, there are three circumstances where identity is guaranteed:

Assignments create new names but do not change object identity. After the assignment new = old, it is guaranteed that new is old.
Putting an object in a container that stores object references does not change object identity. After the list assignment s[0] = x, it is guaranteed that s[0] is x.
If an object is a singleton, it means that only one instance of that object can exist. After the assignments a = None and b = None, it is guaranteed that a is b because None is a singleton.

In most other circumstances, identity tests are inadvisable and equality tests are preferred. In particular, identity tests should not be used to check constants such as int and str which aren’t guaranteed to be singletons:

a = 1000
b = 500
c = b + 500
a is c
False

a = 'Python'
b = 'Py'
c = b + 'thon'
a is c
False

Likewise, new instances of mutable containers are never identical:

a = []
b = []
a is b
False

In the standard library code, you will see several common patterns for correctly using identity tests:

As recommended by PEP 8, an identity test is the preferred way to check for None. This reads like plain English in code and avoids confusion with other objects that may have boolean values that evaluate to false.

Detecting optional arguments can be tricky when None is a valid input value. In those situations, you can create a singleton sentinel object guaranteed to be distinct from other objects. For example, here is how to implement a method that behaves like dict.pop():

_sentinel = object()

def pop(self, key, default=_sentinel):
    if key in self:
        value = self[key]
        del self[key]
        return value
    if default is _sentinel:
        raise KeyError(key)
    return default

Container implementations sometimes need to augment equality tests with identity tests. This prevents the code from being confused by objects such as float('NaN') that are not equal to themselves.

For example, here is the implementation of collections.abc.Sequence.__contains__():

def __contains__(self, value):
    for v in self:
        if v is value or v == value:
            return True
    return False

How can a subclass control what data is stored in an immutable instance ?

When subclassing an immutable type, override the __new__() method instead of the __init__() method. The latter only runs after an instance is created, which is too late to alter data in an immutable instance.

All of these immutable classes have a different signature than their parent class:

from datetime import date

class FirstOfMonthDate(date):
    "Always choose the first day of the month"
    def __new__(cls, year, month, day):
        return super().__new__(cls, year, month, 1)

class NamedInt(int):
    "Allow text names for some numbers"
    xlat = {'zero': 0, 'one': 1, 'ten': 10}
    def __new__(cls, value):
        value = cls.xlat.get(value, value)
        return super().__new__(cls, value)

class TitleStr(str):
    "Convert str to name suitable for a URL path"
    def __new__(cls, s):
        s = s.lower().replace(' ', '-')
        s = ''.join([c for c in s if c.isalnum() or c == '-'])
        return super().__new__(cls, s)

The classes can be used like this:

FirstOfMonthDate(2012, 2, 14)
FirstOfMonthDate(2012, 2, 1)
NamedInt('ten')
10
NamedInt(20)
20
TitleStr('Blog: Why Python Rocks')
'blog-why-python-rocks'

How do I cache method calls ?

The two principal tools for caching methods are functools.cached_property() and functools.lru_cache(). The former stores results at the instance level and the latter at the class level.

The cached_property approach only works with methods that do not take any arguments. It does not create a reference to the instance. The cached method result will be kept only as long as the instance is alive.

The advantage is that when an instance is no longer used, the cached method result will be released right away. The disadvantage is that if instances accumulate, so too will the accumulated method results. They can grow without bound.

The lru_cache approach works with methods that have hashable arguments. It creates a reference to the instance unless special efforts are made to pass in weak references.

The advantage of the least recently used algorithm is that the cache is bounded by the specified maxsize. The disadvantage is that instances are kept alive until they age out of the cache or until the cache is cleared.

This example shows the various techniques:

class Weather:
    "Lookup weather information on a government website"

    def __init__(self, station_id):
        self._station_id = station_id
        # The _station_id is private and immutable

    def current_temperature(self):
        "Latest hourly observation"
        # Do not cache this because old results
        # can be out of date.

    @cached_property
    def location(self):
        "Return the longitude/latitude coordinates of the station"
        # Result only depends on the station_id

    @lru_cache(maxsize=20)
    def historic_rainfall(self, date, units='mm'):
        "Rainfall on a given date"
        # Depends on the station_id, date, and units.

The above example assumes that the station_id never changes. If the relevant instance attributes are mutable, the cached_property approach can’t be made to work because it cannot detect changes to the attributes.

To make the lru_cache approach work when the station_id is mutable, the class needs to define the __eq__() and __hash__() methods so that the cache can detect relevant attribute updates:

class Weather:
    "Example with a mutable station identifier"

    def __init__(self, station_id):
        self.station_id = station_id

    def change_station(self, station_id):
        self.station_id = station_id

    def __eq__(self, other):
        return self.station_id == other.station_id

    def __hash__(self):
        return hash(self.station_id)

    @lru_cache(maxsize=20)
    def historic_rainfall(self, date, units='cm'):
        'Rainfall on a given date'
        # Depends on the station_id, date, and units.

How do I create a .pyc file ?

When a module is imported for the first time (or when the source file has changed since the current compiled file was created) a .pyc file containing the compiled code should be created in a __pycache__ subdirectory of the directory containing the .py file. The .pyc file will have a filename that starts with the same name as the .py file, and ends with .pyc, with a middle component that depends on the particular python binary that created it. (See PEP 3147 for details.)

One reason that a .pyc file may not be created is a permissions problem with the directory containing the source file, meaning that the __pycache__ subdirectory cannot be created. This can happen, for example, if you develop as one user but run as another, such as if you are testing with a web server.

Unless the PYTHONDONTWRITEBYTECODE environment variable is set, creation of a .pyc file is automatic if you’re importing a module and Python has the ability (permissions, free space, etc…) to create a __pycache__ subdirectory and write the compiled module to that subdirectory.

Running Python on a top level script is not considered an import and no .pyc will be created. For example, if you have a top-level module foo.py that imports another module xyz.py, when you run foo (by typing python foo.py as a shell command), a .pyc will be created for xyz because xyz is imported, but no .pyc file will be created for foo since foo.py isn’t being imported.

If you need to create a .pyc file for foo – that is, to create a .pyc file for a module that is not imported – you can, using the py_compile and compileall modules.

The py_compile module can manually compile any module. One way is to use the compile() function in that module interactively:

import py_compile
py_compile.compile('foo.py')

This will write the .pyc to a __pycache__ subdirectory in the same location as foo.py (or you can override that with the optional parameter cfile).

You can also automatically compile all files in a directory or directories using the compileall module. You can do it from the shell prompt by running compileall.py and providing the path of a directory containing Python files to compile:

python -m compileall .

How do I find the current module name ?

A module can find out its own module name by looking at the predefined global variable __name__. If this has the value '__main__', the program is running as a script. Many modules that are usually used by importing them also provide a command-line interface or a self-test, and only execute this code after checking __name__:

def main():
    print('Running test...')
    ...

if __name__ == '__main__':
    main()

How can I have modules that mutually import each other ?

Suppose you have the following modules:

foo.py:

from bar import bar_var
foo_var = 1

bar.py:

from foo import foo_var
bar_var = 2

The problem is that the interpreter will perform the following steps:

main imports foo
Empty globals for foo are created
foo is compiled and starts executing
foo imports bar
Empty globals for bar are created
bar is compiled and starts executing
bar imports foo (which is a no-op since there already is a module named foo)
The import mechanism tries to read foo_var from foo globals, to set bar.foo_var = foo.foo_var

The last step fails, because Python isn’t done with interpreting foo yet and the global symbol dictionary for foo is still empty.

The same thing happens when you use import foo, and then try to access foo.foo_var in global code.

There are (at least) three possible workarounds for this problem.

Guido van Rossum recommends avoiding all uses of from <module> import ..., and placing all code inside functions. Initializations of global variables and class variables should use constants or built-in functions only. This means everything from an imported module is referenced as <module>.<name>.

Jim Roskind suggests performing steps in the following order in each module:

exports (globals, functions, and classes that don’t need imported base classes)
import statements
active code (including globals that are initialized from imported values).

Van Rossum doesn’t like this approach much because the imports appear in a strange place, but it does work.

Matthias Urlichs recommends restructuring your code so that the recursive import is not necessary in the first place.

These solutions are not mutually exclusive.

import(‘x.y.z’) returns ; how do I get z ?

Consider using the convenience function import_module() from importlib instead:

z = importlib.import_module('x.y.z')

When I edit an imported module and reimport it, the changes don’t show up. Why does this happen ?

For reasons of efficiency as well as consistency, Python only reads the module file on the first time a module is imported. If it didn’t, in a program consisting of many modules where each one imports the same basic module, the basic module would be parsed and re-parsed many times. To force re-reading of a changed module, do this:

import importlib
import modname
importlib.reload(modname)

Warning: this technique is not 100% fool-proof. In particular, modules containing statements like

from modname import some_objects

will continue to work with the old version of the imported objects. If the module contains class definitions, existing class instances will not be updated to use the new class definition. This can result in the following paradoxical behaviour:

import importlib
import cls
c = cls.C()                # Create an instance of C
importlib.reload(cls)
<module 'cls' from 'cls.py'>
isinstance(c, cls.C)       # isinstance is false?!?
False

The nature of the problem is made clear if you print out the “identity” of the class objects:

hex(id(c.__class__))
'0x7352a0'
hex(id(cls.C))
'0x4198d0'

Design and History FAQ Frequently Asked Questions

Why does Python use indentation for grouping of statements ?

Guido van Rossum believes that using indentation for grouping is extremely elegant and contributes a lot to the clarity of the average Python program. Most people learn to love this feature after a while.

Since there are no begin/end brackets there cannot be a disagreement between grouping perceived by the parser and the human reader. Occasionally C programmers will encounter a fragment of code like this:

if (x <= y)
        x++;
        y--;
z++;

Only the x++ statement is executed if the condition is true, but the indentation leads many to believe otherwise. Even experienced C programmers will sometimes stare at it a long time wondering as to why y is being decremented even for x > y.

Because there are no begin/end brackets, Python is much less prone to coding-style conflicts. In C there are many different ways to place the braces. After becoming used to reading and writing code using a particular style, it is normal to feel somewhat uneasy when reading (or being required to write) in a different one.

Many coding styles place begin/end brackets on a line by themselves. This makes programs considerably longer and wastes valuable screen space, making it harder to get a good overview of a program. Ideally, a function should fit on one screen (say, 20–30 lines). 20 lines of Python can do a lot more work than 20 lines of C. This is not solely due to the lack of begin/end brackets – the lack of declarations and the high-level data types are also responsible – but the indentation-based syntax certainly helps.

Why am I getting strange results with simple arithmetic operations ?

Check next answers.

Why are floating-point calculations so inaccurate ?

Users are often surprised by results like this:

1.2 - 1.0
0.19999999999999996

and think it is a bug in Python. It’s not. This has little to do with Python, and much more to do with how the underlying platform handles floating-point numbers.

The float type in CPython uses a C double for storage. A float object’s value is stored in binary floating-point with a fixed precision (typically 53 bits) and Python uses C operations, which in turn rely on the hardware implementation in the processor, to perform floating-point operations. This means that as far as floating-point operations are concerned, Python behaves like many popular languages including C and Java.

Many numbers that can be written easily in decimal notation cannot be expressed exactly in binary floating point. For example, after:

x = 1.2

the value stored for x is a (very good) approximation to the decimal value 1.2, but is not exactly equal to it. On a typical machine, the actual stored value is:

1.0011001100110011001100110011001100110011001100110011 (binary)

which is exactly:

1.1999999999999999555910790149937383830547332763671875 (decimal)

The typical precision of 53 bits provides Python floats with 15–16 decimal digits of accuracy.

For a fuller explanation, please see the floating-point arithmetic chapter in the Python tutorial.

Why are Python strings immutable ?

There are several advantages.

One is performance: knowing that a string is immutable means we can allocate space for it at creation time, and the storage requirements are fixed and unchanging. This is also one of the reasons for the distinction between tuples and lists.

Another advantage is that strings in Python are considered as “elemental” as numbers. No amount of activity will change the value 8 to anything else, and in Python, no amount of activity will change the string “eight” to anything else.

Why must ‘self’ be used explicitly in method definitions and calls ?

The idea was borrowed from Modula-3. It turns out to be very useful, for a variety of reasons.

First, it’s more obvious that you are using a method or instance attribute instead of a local variable. Reading self.x or self.meth() makes it absolutely clear that an instance variable or method is used even if you don’t know the class definition by heart. In C++, you can sort of tell by the lack of a local variable declaration (assuming globals are rare or easily recognizable) – but in Python, there are no local variable declarations, so you’d have to look up the class definition to be sure. Some C++ and Java coding standards call for instance attributes to have an m_ prefix, so this explicitness is still useful in those languages, too.

Second, it means that no special syntax is necessary if you want to explicitly reference or call the method from a particular class. In C++, if you want to use a method from a base class which is overridden in a derived class, you have to use the :: operator – in Python you can write baseclass.methodname(self, <argument list>). This is particularly useful for __init__() methods, and in general in cases where a derived class method wants to extend the base class method of the same name and thus has to call the base class method somehow.

Finally, for instance variables it solves a syntactic problem with assignment: since local variables in Python are (by definition!) those variables to which a value is assigned in a function body (and that aren’t explicitly declared global), there has to be some way to tell the interpreter that an assignment was meant to assign to an instance variable instead of to a local variable, and it should preferably be syntactic (for efficiency reasons). C++ does this through declarations, but Python doesn’t have declarations and it would be a pity having to introduce them just for this purpose. Using the explicit self.var solves this nicely. Similarly, for using instance variables, having to write self.var means that references to unqualified names inside a method don’t have to search the instance’s directories. To put it another way, local variables and instance variables live in two different namespaces, and you need to tell Python which namespace to use.

Why can’t I use an assignment in an expression ?

Starting in Python 3.8, you can!

Assignment expressions using the walrus operator := assign a variable in an expression:

while chunk := fp.read(200):
   print(chunk)

See PEP 572 for more information.

Why does Python use methods for some functionality (e.g. list.index()) but functions for other (e.g. len(list)) ?

As Guido said:

(a) For some operations, prefix notation just reads better than postfix – prefix (and infix!) operations have a long tradition in mathematics which likes notations where the visuals help the mathematician thinking about a problem. Compare the easy with which we rewrite a formula like x*(a+b) into x*a + x*b to the clumsiness of doing the same thing using a raw OO notation.
(b) When I read code that says len(x) I know that it is asking for the length of something. This tells me two things: the result is an integer, and the argument is some kind of container. To the contrary, when I read x.len(), I have to already know that x is some kind of container implementing an interface or inheriting from a class that has a standard len(). Witness the confusion we occasionally have when a class that is not implementing a mapping has a get() or keys() method, or something that isn’t a file has a write() method.
—https://mail.python.org/pipermail/python-3000/2006-November/004643.html

Why is join() a string method instead of a list or tuple method ?

Strings became much more like other standard types starting in Python 1.6, when methods were added which give the same functionality that has always been available using the functions of the string module. Most of these new methods have been widely accepted, but the one which appears to make some programmers feel uncomfortable is:

", ".join(['1', '2', '4', '8', '16'])

which gives the result:

"1, 2, 4, 8, 16"

There are two common arguments against this usage.

The first runs along the lines of: “It looks really ugly using a method of a string literal (string constant)”, to which the answer is that it might, but a string literal is just a fixed value. If the methods are to be allowed on names bound to strings there is no logical reason to make them unavailable on literals.

The second objection is typically cast as: “I am really telling a sequence to join its members together with a string constant”. Sadly, you aren’t. For some reason there seems to be much less difficulty with having split() as a string method, since in that case it is easy to see that

"1, 2, 4, 8, 16".split(", ")

is an instruction to a string literal to return the substrings delimited by the given separator (or, by default, arbitrary runs of white space).

join() is a string method because in using it you are telling the separator string to iterate over a sequence of strings and insert itself between adjacent elements. This method can be used with any argument which obeys the rules for sequence objects, including any new classes you might define yourself. Similar methods exist for bytes and bytearray objects.

How fast are exceptions ?

A try/except block is extremely efficient if no exceptions are raised. Actually catching an exception is expensive. In versions of Python prior to 2.0 it was common to use this idiom:

try:
    value = mydict[key]
except KeyError:
    mydict[key] = getvalue(key)
    value = mydict[key]

This only made sense when you expected the dict to have the key almost all the time. If that wasn’t the case, you coded it like this:

if key in mydict:
    value = mydict[key]
else:
    value = mydict[key] = getvalue(key)

For this specific case, you could also use value = dict.setdefault(key, getvalue(key)), but only if the getvalue() call is cheap enough because it is evaluated in all cases.

Why isn’t there a switch or case statement in Python ?

In general, structured switch statements execute one block of code when an expression has a particular value or set of values. Since Python 3.10 one can easily match literal values, or constants within a namespace, with a match ... case statement. An older alternative is a sequence of if... elif... elif... else.

For cases where you need to choose from a very large number of possibilities, you can create a dictionary mapping case values to functions to call. For example:

functions = {'a': function_1,
             'b': function_2,
             'c': self.method_1}

func = functions[value]
func()

For calling methods on objects, you can simplify yet further by using the getattr() built-in to retrieve methods with a particular name:

class MyVisitor:
    def visit_a(self):
        ...

    def dispatch(self, value):
        method_name = 'visit_' + str(value)
        method = getattr(self, method_name)
        method()

It’s suggested that you use a prefix for the method names, such as visit_ in this example. Without such a prefix, if values are coming from an untrusted source, an attacker would be able to call any method on your object.

Imitating switch with fallthrough, as with C’s switch-case-default, is possible, much harder, and less needed.

Can’t you emulate threads in the interpreter instead of relying on an OS-specific thread implementation ?

Answer 1: Unfortunately, the interpreter pushes at least one C stack frame for each Python stack frame. Also, extensions can call back into Python at almost random moments. Therefore, a complete threads implementation requires thread support for C.

Answer 2: Fortunately, there is Stackless Python, which has a completely redesigned interpreter loop that avoids the C stack.

Why can’t lambda expressions contain statements ?

Python lambda expressions cannot contain statements because Python’s syntactic framework can’t handle statements nested inside expressions. However, in Python, this is not a serious problem. Unlike lambda forms in other languages, where they add functionality, Python lambdas are only a shorthand notation if you’re too lazy to define a function.

Functions are already first class objects in Python, and can be declared in a local scope. Therefore the only advantage of using a lambda instead of a locally defined function is that you don’t need to invent a name for the function – but that’s just a local variable to which the function object (which is exactly the same type of object that a lambda expression yields) is assigned!

Can Python be compiled to machine code, C or some other language ?

Cython compiles a modified version of Python with optional annotations into C extensions. Nuitka is an up-and-coming compiler of Python into C++ code, aiming to support the full Python language.

How does Python manage memory ?

The details of Python memory management depend on the implementation. The standard implementation of Python, CPython, uses reference counting to detect inaccessible objects, and another mechanism to collect reference cycles, periodically executing a cycle detection algorithm which looks for inaccessible cycles and deletes the objects involved. The gc module provides functions to perform a garbage collection, obtain debugging statistics, and tune the collector’s parameters.

Other implementations (such as Jython or PyPy), however, can rely on a different mechanism such as a full-blown garbage collector. This difference can cause some subtle porting problems if your Python code depends on the behavior of the reference counting implementation.

In some Python implementations, the following code (which is fine in CPython) will probably run out of file descriptors:

for file in very_long_list_of_files:
    f = open(file)
    c = f.read(1)

Indeed, using CPython’s reference counting and destructor scheme, each new assignment to f closes the previous file. With a traditional GC, however, those file objects will only get collected (and closed) at varying and possibly long intervals.

If you want to write code that will work with any Python implementation, you should explicitly close the file or use the with statement; this will work regardless of memory management scheme:

for file in very_long_list_of_files:
    with open(file) as f:
        c = f.read(1)

Why doesn’t CPython use a more traditional garbage collection scheme ?

For one thing, this is not a C standard feature and hence it’s not portable. (Yes, we know about the Boehm GC library. It has bits of assembler code for most common platforms, not for all of them, and although it is mostly transparent, it isn’t completely transparent; patches are required to get Python to work with it.)

Traditional GC also becomes a problem when Python is embedded into other applications. While in a standalone Python it’s fine to replace the standard malloc() and free() with versions provided by the GC library, an application embedding Python may want to have its own substitute for malloc() and free(), and may not want Python’s. Right now, CPython works with anything that implements malloc() and free() properly.

Why isn’t all memory freed when CPython exits ?

Objects referenced from the global namespaces of Python modules are not always deallocated when Python exits. This may happen if there are circular references. There are also certain bits of memory that are allocated by the C library that are impossible to free (e.g. a tool like Purify will complain about these). Python is, however, aggressive about cleaning up memory on exit and does try to destroy every single object.

If you want to force Python to delete certain things on deallocation use the atexit module to run a function that will force those deletions.

Why are there separate tuple and list data types ?

Lists and tuples, while similar in many respects, are generally used in fundamentally different ways. Tuples can be thought of as being similar to Pascal records or C structs; they’re small collections of related data which may be of different types which are operated on as a group. For example, a Cartesian coordinate is appropriately represented as a tuple of two or three numbers.

Lists, on the other hand, are more like arrays in other languages. They tend to hold a varying number of objects all of which have the same type and which are operated on one-by-one. For example, os.listdir('.') returns a list of strings representing the files in the current directory. Functions which operate on this output would generally not break if you added another file or two to the directory.

Tuples are immutable, meaning that once a tuple has been created, you can’t replace any of its elements with a new value. Lists are mutable, meaning that you can always change a list’s elements. Only immutable elements can be used as dictionary keys, and hence only tuples and not lists can be used as keys.

How are lists implemented in CPython ?

CPython’s lists are really variable-length arrays, not Lisp-style linked lists. The implementation uses a contiguous array of references to other objects, and keeps a pointer to this array and the array’s length in a list head structure.

This makes indexing a list a[i] an operation whose cost is independent of the size of the list or the value of the index.

When items are appended or inserted, the array of references is resized. Some cleverness is applied to improve the performance of appending items repeatedly; when the array must be grown, some extra space is allocated so the next few times don’t require an actual resize.

How are dictionaries implemented in CPython ?

CPython’s dictionaries are implemented as resizable hash tables. Compared to B-trees, this gives better performance for lookup (the most common operation by far) under most circumstances, and the implementation is simpler.

Dictionaries work by computing a hash code for each key stored in the dictionary using the hash() built-in function. The hash code varies widely depending on the key and a per-process seed; for example, 'Python' could hash to -539294296 while 'python', a string that differs by a single bit, could hash to 1142331976. The hash code is then used to calculate a location in an internal array where the value will be stored. Assuming that you’re storing keys that all have different hash values, this means that dictionaries take constant time – O(1), in Big-O notation – to retrieve a key.

Why must dictionary keys be immutable ?

The hash table implementation of dictionaries uses a hash value calculated from the key value to find the key. If the key were a mutable object, its value could change, and thus its hash could also change. But since whoever changes the key object can’t tell that it was being used as a dictionary key, it can’t move the entry around in the dictionary. Then, when you try to look up the same object in the dictionary it won’t be found because its hash value is different. If you tried to look up the old value it wouldn’t be found either, because the value of the object found in that hash bin would be different.

If you want a dictionary indexed with a list, simply convert the list to a tuple first; the function tuple(L) creates a tuple with the same entries as the list L. Tuples are immutable and can therefore be used as dictionary keys.

Some unacceptable solutions that have been proposed:

Hash lists by their address (object ID). This doesn’t work because if you construct a new list with the same value it won’t be found; e.g.:
```
mydict = {[1, 2]: '12'}
print(mydict[[1, 2]])
```
would raise a KeyError exception because the id of the [1, 2] used in the second line differs from that in the first line. In other words, dictionary keys should be compared using ==, not using is.
Make a copy when using a list as a key. This doesn’t work because the list, being a mutable object, could contain a reference to itself, and then the copying code would run into an infinite loop.
Allow lists as keys but tell the user not to modify them. This would allow a class of hard-to-track bugs in programs when you forgot or modified a list by accident. It also invalidates an important invariant of dictionaries: every value in d.keys() is usable as a key of the dictionary.
Mark lists as read-only once they are used as a dictionary key. The problem is that it’s not just the top-level object that could change its value; you could use a tuple containing a list as a key. Entering anything as a key into a dictionary would require marking all objects reachable from there as read-only – and again, self-referential objects could cause an infinite loop.

There is a trick to get around this if you need to, but use it at your own risk: You can wrap a mutable structure inside a class instance which has both a __eq__() and a __hash__() method. You must then make sure that the hash value for all such wrapper objects that reside in a dictionary (or other hash based structure), remain fixed while the object is in the dictionary (or other structure).

class ListWrapper:
    def __init__(self, the_list):
        self.the_list = the_list

    def __eq__(self, other):
        return self.the_list == other.the_list

    def __hash__(self):
        l = self.the_list
        result = 98767 - len(l)*555
        for i, el in enumerate(l):
            try:
                result = result + (hash(el) % 9999999) * 1001 + i
            except Exception:
                result = (result % 7777777) + i * 333
        return result

Note that the hash computation is complicated by the possibility that some members of the list may be unhashable and also by the possibility of arithmetic overflow.

Furthermore it must always be the case that if o1 == o2 (ie o1.__eq__(o2) is True) then hash(o1) == hash(o2) (ie, o1.__hash__() == o2.__hash__()), regardless of whether the object is in a dictionary or not. If you fail to meet these restrictions dictionaries and other hash based structures will misbehave.

In the case of ListWrapper, whenever the wrapper object is in a dictionary the wrapped list must not change to avoid anomalies. Don’t do this unless you are prepared to think hard about the requirements and the consequences of not meeting them correctly. Consider yourself warned.

Why doesn’t list.sort() return the sorted list ?

In situations where performance matters, making a copy of the list just to sort it would be wasteful. Therefore, list.sort() sorts the list in place. In order to remind you of that fact, it does not return the sorted list. This way, you won’t be fooled into accidentally overwriting a list when you need a sorted copy but also need to keep the unsorted version around.

If you want to return a new list, use the built-in sorted() function instead. This function creates a new list from a provided iterable, sorts it and returns it. For example, here’s how to iterate over the keys of a dictionary in sorted order:

for key in sorted(mydict):
    ...  # do whatever with mydict[key]...

How do you specify and enforce an interface spec in Python ?

An interface specification for a module as provided by languages such as C++ and Java describes the prototypes for the methods and functions of the module. Many feel that compile-time enforcement of interface specifications helps in the construction of large programs.

Python 2.6 adds an abc module that lets you define Abstract Base Classes (ABCs). You can then use isinstance() and issubclass() to check whether an instance or a class implements a particular ABC. The collections.abc module defines a set of useful ABCs such as Iterable, Container, and MutableMapping.

For Python, many of the advantages of interface specifications can be obtained by an appropriate test discipline for components.

A good test suite for a module can both provide a regression test and serve as a module interface specification and a set of examples. Many Python modules can be run as a script to provide a simple “self test.” Even modules which use complex external interfaces can often be tested in isolation using trivial “stub” emulations of the external interface. The doctest and unittest modules or third-party test frameworks can be used to construct exhaustive test suites that exercise every line of code in a module.

An appropriate testing discipline can help build large complex applications in Python as well as having interface specifications would. In fact, it can be better because an interface specification cannot test certain properties of a program. For example, the list.append() method is expected to add new elements to the end of some internal list; an interface specification cannot test that your list.append() implementation will actually do this correctly, but it’s trivial to check this property in a test suite.

Writing test suites is very helpful, and you might want to design your code to make it easily tested. One increasingly popular technique, test-driven development, calls for writing parts of the test suite first, before you write any of the actual code. Of course Python allows you to be sloppy and not write test cases at all.

Why is there no goto ?

In the 1970s people realized that unrestricted goto could lead to messy “spaghetti” code that was hard to understand and revise. In a high-level language, it is also unneeded as long as there are ways to branch (in Python, with if statements and or, and, and if/else expressions) and loop (with while and for statements, possibly containing continue and break).

One can also use exceptions to provide a “structured goto” that works even across function calls. Many feel that exceptions can conveniently emulate all reasonable uses of the go or goto constructs of C, Fortran, and other languages. For example:

class label(Exception): pass  # declare a label

try:
    ...
    if condition: raise label()  # goto label
    ...
except label:  # where to goto
    pass
...

This doesn’t allow you to jump into the middle of a loop, but that’s usually considered an abuse of goto anyway. Use sparingly.

Why can’t raw strings (r-strings) end with a backslash ?

More precisely, they can’t end with an odd number of backslashes: the unpaired backslash at the end escapes the closing quote character, leaving an unterminated string.

Raw strings were designed to ease creating input for processors (chiefly regular expression engines) that want to do their own backslash escape processing. Such processors consider an unmatched trailing backslash to be an error anyway, so raw strings disallow that. In return, they allow you to pass on the string quote character by escaping it with a backslash. These rules work well when r-strings are used for their intended purpose.

If you’re trying to build Windows pathnames, note that all Windows system calls accept forward slashes too:

f = open("/mydir/file.txt")  # works fine!

If you’re trying to build a pathname for a DOS command, try e.g. one of

dir = r"\this\is\my\dos\dir" "\\"
dir = r"\this\is\my\dos\dir\ "[:-1]
dir = "\\this\\is\\my\\dos\\dir\\"

Why doesn’t Python have a “with” statement for attribute assignments ?

Python has a with statement that wraps the execution of a block, calling code on the entrance and exit from the block. Some languages have a construct that looks like this:

with obj:
    a = 1               # equivalent to obj.a = 1
    total = total + 1   # obj.total = obj.total + 1

In Python, such a construct would be ambiguous.

Other languages, such as Object Pascal, Delphi, and C++, use static types, so it’s possible to know, in an unambiguous way, what member is being assigned to. This is the main point of static typing – the compiler always knows the scope of every variable at compile time.

Python uses dynamic types. It is impossible to know in advance which attribute will be referenced at runtime. Member attributes may be added or removed from objects on the fly. This makes it impossible to know, from a simple reading, what attribute is being referenced: a local one, a global one, or a member attribute?

For instance, take the following incomplete snippet:

def foo(a):
    with a:
        print(x)

The snippet assumes that a must have a member attribute called x. However, there is nothing in Python that tells the interpreter this. What should happen if a is, let us say, an integer? If there is a global variable named x, will it be used inside the with block? As you see, the dynamic nature of Python makes such choices much harder.

The primary benefit of with and similar language features (reduction of code volume) can, however, easily be achieved in Python by assignment. Instead of:

function(args).mydict[index][index].a = 21
function(args).mydict[index][index].b = 42
function(args).mydict[index][index].c = 63

write this:

ref = function(args).mydict[index][index]
ref.a = 21
ref.b = 42
ref.c = 63

This also has the side-effect of increasing execution speed because name bindings are resolved at run-time in Python, and the second version only needs to perform the resolution once.

Similar proposals that would introduce syntax to further reduce code volume, such as using a ‘leading dot’, have been rejected in favour of explicitness (see https://mail.python.org/pipermail/python-ideas/2016-May/040070.html).

Why don’t generators support the with statement ?

For technical reasons, a generator used directly as a context manager would not work correctly. When, as is most common, a generator is used as an iterator run to completion, no closing is needed. When it is, wrap it as contextlib.closing(generator) in the with statement.

Why are colons required for the if/while/def/class statements ?

The colon is required primarily to enhance readability (one of the results of the experimental ABC language). Consider this:

if a == b
    print(a)

versus

if a == b:
    print(a)

Notice how the second one is slightly easier to read. Notice further how a colon sets off the example in this FAQ answer; it’s a standard usage in English.

Another minor reason is that the colon makes it easier for editors with syntax highlighting; they can look for colons to decide when indentation needs to be increased instead of having to do a more elaborate parsing of the program text.

Why does Python allow commas at the end of lists and tuples ?

Python lets you add a trailing comma at the end of lists, tuples, and dictionaries:

[1, 2, 3,]
('a', 'b', 'c',)
d = {
    "A": [1, 5],
    "B": [6, 7],  # last trailing comma is optional but good style
}

There are several reasons to allow this.

When you have a literal value for a list, tuple, or dictionary spread across multiple lines, it’s easier to add more elements because you don’t have to remember to add a comma to the previous line. The lines can also be reordered without creating a syntax error.

Accidentally omitting the comma can lead to errors that are hard to diagnose. For example:

x = [
  "fee",
  "fie"
  "foo",
  "fum"
]

This list looks like it has four elements, but it actually contains three: “fee”, “fiefoo” and “fum”. Always adding the comma avoids this source of error.

Allowing the trailing comma may also make programmatic code generation easier.

Library and Extension FAQ Frequently Asked Questions

How do I find a module or application to perform task X ?

Check the Library Reference to see if there’s a relevant standard library module. (Eventually you’ll learn what’s in the standard library and will be able to skip this step.)

For third-party packages, search the Python Package Index or try Google or another web search engine. Searching for “Python” plus a keyword or two for your topic of interest will usually find something helpful.

Where is the math.py (socket.py, regex.py, etc.) source file ?

If you can’t find a source file for a module it may be a built-in or dynamically loaded module implemented in C, C++ or other compiled language. In this case you may not have the source file or it may be something like mathmodule.c, somewhere in a C source directory (not on the Python Path).

There are (at least) three kinds of modules in Python:

modules written in Python (.py);
modules written in C and dynamically loaded (.dll, .pyd, .so, .sl, etc);
modules written in C and linked with the interpreter; to get a list of these, type:
```
import sys
print(sys.builtin_module_names)
```

How do I make a Python script executable on Unix ?

You need to do two things: the script file’s mode must be executable and the first line must begin with #! followed by the path of the Python interpreter.

The first is done by executing chmod +x scriptfile or perhaps chmod 755 scriptfile.

The second can be done in a number of ways. The most straightforward way is to write

#!/usr/local/bin/python

as the very first line of your file, using the pathname for where the Python interpreter is installed on your platform.

If you would like the script to be independent of where the Python interpreter lives, you can use the env program. Almost all Unix variants support the following, assuming the Python interpreter is in a directory on the user’s PATH:

#!/usr/bin/env python

Don’t do this for CGI scripts. The PATH variable for CGI scripts is often very minimal, so you need to use the actual absolute pathname of the interpreter.

Occasionally, a user’s environment is so full that the /usr/bin/env program fails; or there’s no env program at all. In that case, you can try the following hack (due to Alex Rezinsky):

#! /bin/sh
""":"
exec python $0 ${1+"$@"}
"""

The minor disadvantage is that this defines the script’s __doc__ string. However, you can fix that by adding

__doc__ = """...Whatever..."""

Is there a curses/termcap package for Python ?

For Unix variants: The standard Python source distribution comes with a curses module in the Modules subdirectory, though it’s not compiled by default. (Note that this is not available in the Windows distribution – there is no curses module for Windows.)

The curses module supports basic curses features as well as many additional functions from ncurses and SYSV curses such as colour, alternative character set support, pads, and mouse support. This means the module isn’t compatible with operating systems that only have BSD curses, but there don’t seem to be any currently maintained OSes that fall into this category.

Is there an equivalent to C’s onexit() in Python ?

The atexit module provides a register function that is similar to C’s onexit().

Why don’t my signal handlers work ?

The most common problem is that the signal handler is declared with the wrong argument list. It is called as

handler(signum, frame)

so it should be declared with two parameters:

def handler(signum, frame):
    ...

How do I test a Python program or component ?

Python comes with two testing frameworks. The doctest module finds examples in the docstrings for a module and runs them, comparing the output with the expected output given in the docstring.

The unittest module is a fancier testing framework modelled on Java and Smalltalk testing frameworks.

To make testing easier, you should use good modular design in your program. Your program should have almost all functionality encapsulated in either functions or class methods – and this sometimes has the surprising and delightful effect of making the program run faster (because local variable accesses are faster than global accesses). Furthermore the program should avoid depending on mutating global variables, since this makes testing much more difficult to do.

The “global main logic” of your program may be as simple as

if __name__ == "__main__":
    main_logic()

at the bottom of the main module of your program.

Once your program is organized as a tractable collection of function and class behaviours, you should write test functions that exercise the behaviours. A test suite that automates a sequence of tests can be associated with each module. This sounds like a lot of work, but since Python is so terse and flexible it’s surprisingly easy. You can make coding much more pleasant and fun by writing your test functions in parallel with the “production code”, since this makes it easy to find bugs and even design flaws earlier.

“Support modules” that are not intended to be the main module of a program may include a self-test of the module.

if __name__ == "__main__":
    self_test()

Even programs that interact with complex external interfaces may be tested when the external interfaces are unavailable by using “fake” interfaces implemented in Python.

How do I create documentation from doc strings ?

The pydoc module can create HTML from the doc strings in your Python source code. An alternative for creating API documentation purely from docstrings is epydoc. Sphinx can also include docstring content.

How do I get a single keypress at a time ?

For Unix variants there are several solutions. It’s straightforward to do this using curses, but curses is a fairly large module to learn.

How do I program using threads ?

Be sure to use the threading module and not the _thread module. The threading module builds convenient abstractions on top of the low-level primitives provided by the _thread module.

None of my threads seem to run: why ?

As soon as the main thread exits, all threads are killed. Your main thread is running too quickly, giving the threads no time to do any work.

A simple fix is to add a sleep to the end of the program that’s long enough for all the threads to finish:

import threading, time

def thread_task(name, n):
    for i in range(n):
        print(name, i)

for i in range(10):
    T = threading.Thread(target=thread_task, args=(str(i), i))
    T.start()

time.sleep(10)  # <---------------------------!

But now (on many platforms) the threads don’t run in parallel, but appear to run sequentially, one at a time! The reason is that the OS thread scheduler doesn’t start a new thread until the previous thread is blocked.

A simple fix is to add a tiny sleep to the start of the run function:

def thread_task(name, n):
    time.sleep(0.001)  # <--------------------!
    for i in range(n):
        print(name, i)

for i in range(10):
    T = threading.Thread(target=thread_task, args=(str(i), i))
    T.start()

time.sleep(10)

Instead of trying to guess a good delay value for time.sleep(), it’s better to use some kind of semaphore mechanism. One idea is to use the queue module to create a queue object, let each thread append a token to the queue when it finishes, and let the main thread read as many tokens from the queue as there are threads.

How do I parcel out work among a bunch of worker threads ?

The easiest way is to use the concurrent.futures module, especially the ThreadPoolExecutor class.

Or, if you want fine control over the dispatching algorithm, you can write your own logic manually. Use the queue module to create a queue containing a list of jobs. The Queue class maintains a list of objects and has a .put(obj) method that adds items to the queue and a .get() method to return them. The class will take care of the locking necessary to ensure that each job is handed out exactly once.

Here’s a trivial example:

import threading, queue, time

# The worker thread gets jobs off the queue.  When the queue is empty, it
# assumes there will be no more work and exits.
# (Realistically workers will run until terminated.)
def worker():
    print('Running worker')
    time.sleep(0.1)
    while True:
        try:
            arg = q.get(block=False)
        except queue.Empty:
            print('Worker', threading.current_thread(), end=' ')
            print('queue empty')
            break
        else:
            print('Worker', threading.current_thread(), end=' ')
            print('running with argument', arg)
            time.sleep(0.5)

# Create queue
q = queue.Queue()

# Start a pool of 5 workers
for i in range(5):
    t = threading.Thread(target=worker, name='worker %i' % (i+1))
    t.start()

# Begin adding work to the queue
for i in range(50):
    q.put(i)

# Give threads time to run
print('Main thread sleeping')
time.sleep(5)

When run, this will produce the following output:

Running worker
Running worker
Running worker
Running worker
Running worker
Main thread sleeping
Worker <Thread(worker 1, started 130283832797456)> running with argument 0
Worker <Thread(worker 2, started 130283824404752)> running with argument 1
Worker <Thread(worker 3, started 130283816012048)> running with argument 2
Worker <Thread(worker 4, started 130283807619344)> running with argument 3
Worker <Thread(worker 5, started 130283799226640)> running with argument 4
Worker <Thread(worker 1, started 130283832797456)> running with argument 5
...

Consult the module’s documentation for more details; the Queue class provides a featureful interface.

What kinds of global value mutation are thread-safe ?

A global interpreter lock (GIL) is used internally to ensure that only one thread runs in the Python VM at a time. In general, Python offers to switch among threads only between bytecode instructions; how frequently it switches can be set via sys.setswitchinterval(). Each bytecode instruction and therefore all the C implementation code reached from each instruction is therefore atomic from the point of view of a Python program.

In theory, this means an exact accounting requires an exact understanding of the PVM bytecode implementation. In practice, it means that operations on shared variables of built-in data types (ints, lists, dicts, etc) that “look atomic” really are.

For example, the following operations are all atomic (L, L1, L2 are lists, D, D1, D2 are dicts, x, y are objects, i, j are ints):

L.append(x)
L1.extend(L2)
x = L[i]
x = L.pop()
L1[i:j] = L2
L.sort()
x = y
x.field = y
D[x] = y
D1.update(D2)
D.keys()

These aren’t:

i = i+1
L.append(L[-1])
L[i] = L[j]
D[x] = D[x] + 1

Operations that replace other objects may invoke those other objects’ __del__() method when their reference count reaches zero, and that can affect things. This is especially true for the mass updates to dictionaries and lists. When in doubt, use a mutex!

Can’t we get rid of the Global Interpreter Lock ?

The global interpreter lock (GIL) is often seen as a hindrance to Python’s deployment on high-end multiprocessor server machines, because a multi-threaded Python program effectively only uses one CPU, due to the insistence that (almost) all Python code can only run while the GIL is held.

With the approval of PEP 703 work is now underway to remove the GIL from the CPython implementation of Python. Initially it will be implemented as an optional compiler flag when building the interpreter, and so separate builds will be available with and without the GIL. Long-term, the hope is to settle on a single build, once the performance implications of removing the GIL are fully understood. Python 3.13 is likely to be the first release containing this work, although it may not be completely functional in this release.

The current work to remove the GIL is based on a fork of Python 3.9 with the GIL removed by Sam Gross. Prior to that, in the days of Python 1.5, Greg Stein actually implemented a comprehensive patch set (the “free threading” patches) that removed the GIL and replaced it with fine-grained locking. Adam Olsen did a similar experiment in his python-safethread project. Unfortunately, both of these earlier experiments exhibited a sharp drop in single-thread performance (at least 30% slower), due to the amount of fine-grained locking necessary to compensate for the removal of the GIL. The Python 3.9 fork is the first attempt at removing the GIL with an acceptable performance impact.

The presence of the GIL in current Python releases doesn’t mean that you can’t make good use of Python on multi-CPU machines! You just have to be creative with dividing the work up between multiple processes rather than multiple threads. The ProcessPoolExecutor class in the new concurrent.futures module provides an easy way of doing so; the multiprocessing module provides a lower-level API in case you want more control over dispatching of tasks.

Judicious use of C extensions will also help; if you use a C extension to perform a time-consuming task, the extension can release the GIL while the thread of execution is in the C code and allow other threads to get some work done. Some standard library modules such as zlib and hashlib already do this.

An alternative approach to reducing the impact of the GIL is to make the GIL a per-interpreter-state lock rather than truly global. This was first implemented in Python 3.12 and is available in the C API. A Python interface to it is expected in Python 3.13. The main limitation to it at the moment is likely to be 3rd party extension modules, since these must be written with multiple interpreters in mind in order to be usable, so many older extension modules will not be usable.

How do I delete a file? (And other file questions…)

Use os.remove(filename) or os.unlink(filename); for documentation, see the os module. The two functions are identical; unlink() is simply the name of the Unix system call for this function.

To remove a directory, use os.rmdir(); use os.mkdir() to create one. os.makedirs(path) will create any intermediate directories in path that don’t exist. os.removedirs(path) will remove intermediate directories as long as they’re empty; if you want to delete an entire directory tree and its contents, use shutil.rmtree().

To rename a file, use os.rename(old_path, new_path).

To truncate a file, open it using f = open(filename, "rb+"), and use f.truncate(offset); offset defaults to the current seek position. There’s also os.ftruncate(fd, offset) for files opened with os.open(), where fd is the file descriptor (a small integer).

The shutil module also contains a number of functions to work on files including copyfile(), copytree(), and rmtree().

How do I copy a file ?

The shutil module contains a copyfile() function. Note that on Windows NTFS volumes, it does not copy alternate data streams nor resource forks on macOS HFS+ volumes, though both are now rarely used. It also doesn’t copy file permissions and metadata, though using shutil.copy2() instead will preserve most (though not all) of it.

How do I read (or write) binary data ?

To read or write complex binary data formats, it’s best to use the struct module. It allows you to take a string containing binary data (usually numbers) and convert it to Python objects; and vice versa.

For example, the following code reads two 2-byte integers and one 4-byte integer in big-endian format from a file:

import struct

with open(filename, "rb") as f:
    s = f.read(8)
    x, y, z = struct.unpack(">hhl", s)

The ‘>’ in the format string forces big-endian data; the letter ‘h’ reads one “short integer” (2 bytes), and ‘l’ reads one “long integer” (4 bytes) from the string.

For data that is more regular (e.g. a homogeneous list of ints or floats), you can also use the array module.

Note

To read and write binary data, it is mandatory to open the file in binary mode (here, passing "rb" to open()). If you use "r" instead (the default), the file will be open in text mode and f.read() will return str objects rather than bytes objects.

I can’t seem to use os.read() on a pipe created with os.popen(); why ?

os.read() is a low-level function which takes a file descriptor, a small integer representing the opened file. os.popen() creates a high-level file object, the same type returned by the built-in open() function. Thus, to read n bytes from a pipe p created with os.popen(), you need to use p.read(n).

How do I access the serial (RS232) port ?

For Win32, OSX, Linux, BSD, Jython, IronPython:

pyserial

For Unix, see a Usenet post by Mitch Chapman:

https://groups.google.com/groups?selm=34A04430.CF9@ohioee.com

Why doesn’t closing sys.stdout (stdin, stderr) really close it ?

Python file objects are a high-level layer of abstraction on low-level C file descriptors.

For most file objects you create in Python via the built-in open() function, f.close() marks the Python file object as being closed from Python’s point of view, and also arranges to close the underlying C file descriptor. This also happens automatically in f’s destructor, when f becomes garbage.

But stdin, stdout and stderr are treated specially by Python, because of the special status also given to them by C. Running sys.stdout.close() marks the Python-level file object as being closed, but does not close the associated C file descriptor.

To close the underlying C file descriptor for one of these three, you should first be sure that’s what you really want to do (e.g., you may confuse extension modules trying to do I/O). If it is, use os.close():

os.close(stdin.fileno())
os.close(stdout.fileno())
os.close(stderr.fileno())

Or you can use the numeric constants 0, 1 and 2, respectively.

What WWW tools are there for Python ?

See the chapters titled Internet Protocols and Support and Internet Data Handling in the Library Reference Manual. Python has many modules that will help you build server-side and client-side web systems.

A summary of available frameworks is maintained by Paul Boddie at https://wiki.python.org/moin/WebProgramming.

What module should I use to help with generating HTML ?

You can find a collection of useful links on the Web Programming wiki page.

How do I send mail from a Python script ?

Use the standard library module smtplib.

Here’s a very simple interactive mail sender that uses it. This method will work on any host that supports an SMTP listener.

import sys, smtplib

fromaddr = input("From: ")
toaddrs  = input("To: ").split(',')
print("Enter message, end with ^D:")
msg = ''
while True:
    line = sys.stdin.readline()
    if not line:
        break
    msg += line

# The actual mail send
server = smtplib.SMTP('localhost')
server.sendmail(fromaddr, toaddrs, msg)
server.quit()

A Unix-only alternative uses sendmail. The location of the sendmail program varies between systems; sometimes it is /usr/lib/sendmail, sometimes /usr/sbin/sendmail. The sendmail manual page will help you out. Here’s some sample code:

import os

SENDMAIL = "/usr/sbin/sendmail"  # sendmail location
p = os.popen("%s -t -i" % SENDMAIL, "w")
p.write("To: receiver@example.com\n")
p.write("Subject: test\n")
p.write("\n")  # blank line separating headers from body
p.write("Some text\n")
p.write("some more text\n")
sts = p.close()
if sts != 0:
    print("Sendmail exit status", sts)

How do I avoid blocking in the connect() method of a socket ?

The select module is commonly used to help with asynchronous I/O on sockets.

To prevent the TCP connect from blocking, you can set the socket to non-blocking mode. Then when you do the connect(), you will either connect immediately (unlikely) or get an exception that contains the error number as .errno. errno.EINPROGRESS indicates that the connection is in progress, but hasn’t finished yet. Different OSes will return different values, so you’re going to have to check what’s returned on your system.

You can use the connect_ex() method to avoid creating an exception. It will just return the errno value. To poll, you can call connect_ex() again later – 0 or errno.EISCONN indicate that you’re connected – or you can pass this socket to select.select() to check if it’s writable.

Note

The asyncio module provides a general purpose single-threaded and concurrent asynchronous library, which can be used for writing non-blocking network code. The third-party Twisted library is a popular and feature-rich alternative.

Are there any interfaces to database packages in Python ?

Yes.

Interfaces to disk-based hashes such as DBM and GDBM are also included with standard Python. There is also the sqlite3 module, which provides a lightweight disk-based relational database.

Support for most relational databases is available. See the DatabaseProgramming wiki page for details.

How do you implement persistent objects in Python ?

The pickle library module solves this in a very general way (though you still can’t store things like open files, sockets or windows), and the shelve library module uses pickle and (g)dbm to create persistent mappings containing arbitrary Python objects.

How do I generate random numbers in Python ?

The standard module random implements a random number generator. Usage is simple:

import random
random.random()

This returns a random floating-point number in the range [0, 1).

There are also many other specialized generators in this module, such as:

randrange(a, b) chooses an integer in the range [a, b).
uniform(a, b) chooses a floating-point number in the range [a, b).
normalvariate(mean, sdev) samples the normal (Gaussian) distribution.

Some higher-level functions operate on sequences directly, such as:

choice(S) chooses a random element from a given sequence.
shuffle(L) shuffles a list in-place, i.e. permutes it randomly.

There’s also a Random class you can instantiate to create independent multiple random number generators.