
Python (programming language)

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Tompsci (talk | contribs) at 14:27, 14 January 2006 (AWB assisted clean up). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
Paradigm: imperative, object-oriented
Appeared in: 1990
Designed by: Guido van Rossum
Developer: Python Software Foundation
Latest release version: 2.4.2
Latest release date: September 28, 2005
Typing discipline: strong, dynamic ("duck")
Major implementations: CPython, Jython, IronPython
Dialects: --
Influenced by: ABC, Perl, Lisp, Smalltalk, Tcl
Influenced: Ruby
Operating system: Cross-platform
License: Python Software Foundation License
Website: www.python.org

Python is an interpreted programming language created by Guido van Rossum in 1990. Python is fully dynamically typed and uses automatic memory management; it is thus similar to Perl, Ruby, Scheme, Smalltalk, and Tcl. Python is developed as an open source project, managed by the non-profit Python Software Foundation. Python 2.4.2 was released on September 28, 2005.

History

Python was created in the early 1990s by Guido van Rossum at Stichting Mathematisch Centrum (CWI) in the Netherlands as a successor of the ABC programming language. Guido is Python's principal author, although it includes many contributions from others. Guido's continuing central role in deciding the direction of Python is jokingly acknowledged by referring to him as its Benevolent Dictator for Life (BDFL).

The last version released from CWI was Python 1.2. In 1995, Guido continued his work on Python at the Corporation for National Research Initiatives (CNRI) in Reston, Virginia where he released several versions of the software. Python 1.6 was the last of the versions released by CNRI. In 2000, Guido and the Python core development team moved to BeOpen.com to form the BeOpen PythonLabs team. Python 2.0 was the first and only release from BeOpen.com.

Following the release of Python 1.6, and after Guido van Rossum left CNRI to work with commercial software developers, it became clear that the ability to use Python with software available under the GNU General Public License (GPL) was very desirable. CNRI and the Free Software Foundation (FSF) interacted to develop enabling wording changes to the Python license which would make Python's license GPL-compatible. That year, Guido was awarded the FSF Award for the Advancement of Free Software.

Python 1.6.1 is essentially the same as Python 1.6, with a few minor bug fixes, and with the new GPL-compatible license. Python 2.1 also includes this new license and is a derivative work of Python 1.6.1, as well as of Python 2.0. Current versions of the license are called the Python Software Foundation License.

After Python 2.0 was released by BeOpen.com, Guido van Rossum and the other PythonLabs developers joined Digital Creations. All intellectual property added from this point on, starting with Python 2.1 and its alpha and beta releases, is owned by the Python Software Foundation (PSF), a non-profit organization modeled after the Apache Software Foundation.

Python 3000

Python developers have an ongoing discussion of a future "Python 3000" that will break backwards compatibility with the 2.x series in order to repair perceived flaws in the language. The guiding principle is to "reduce feature duplication by removing old ways of doing things". There is no definite schedule for Python 3000, but a Python Enhancement Proposal that details planned changes exists. [1]

Among other changes, Python 3000 would not support functional programming constructs such as lambda, map, filter or reduce, the rationale being that map and filter are equivalent to list comprehensions in power, lambda is irrelevant with the advent of nested functions, and that reduce is incomprehensible to the majority of programmers who do not have a functional programming background. Also Python 3000 would add optional static typing, remove "classic classes", and usually replace immediate sequences with iterators.

Philosophy

Python is a multi-paradigm language. This means that, rather than forcing coders to adopt one particular style of coding, it permits several. Object orientation, structured programming, functional programming, aspect-oriented programming, and more recently, design by contract are all supported. Python is dynamically type-checked and uses garbage collection for memory management. An important feature of Python is dynamic name resolution, which binds method and variable names during program execution.

While offering choice in coding methodology, Python's designers reject exuberant syntax, such as in Perl, in favor of a more sparse, less cluttered one. As with Perl, Python's developers expressly promote a particular "culture" or ideology based on what they want the language to be, favoring language forms they see as "beautiful", "explicit" and "simple". For the most part, Perl and Python users differ in their interpretation of these terms and how they are best implemented (see TIMTOWTDI and PythonPhilosophy).

Another important goal of the Python developers is making Python fun to use. This is reflected in the origin of the name (after the television series Monty Python's Flying Circus), in the common practice of using Monty Python references in example code, and in an occasionally playful approach to tutorials and reference materials.

Python is sometimes referred to as a "scripting language". In practice, it is used as a dynamic programming language for both application development and occasional scripting. Python has been used to develop many large software projects such as the Zope application server and the Mnet and BitTorrent file sharing systems. It is also extensively used by Google. [2]

Another important goal of the language is ease of extensibility. New built-in modules are easily written in C or C++. Python can also be used as an extension language for existing modules and applications that need a programmable interface.

Though the designer of Python is somewhat hostile to functional programming and the Lisp tradition, there are significant parallels between the philosophy of Python and that of minimalist Lisp-family languages such as Scheme. Many past Lisp programmers have found Python appealing for this reason.

Usage

Syntax

Python was designed to be a highly readable language. It has a simple visual layout, uses English keywords frequently where other languages use punctuation, and has notably fewer syntactic constructions than many structured languages such as C, Perl, or Pascal.

For instance, Python has only two structured loop forms:

  1. for item in iterator:, which loops over elements of a list or iterator
  2. while expression:, which loops as long as a boolean expression is true.

It thus forgoes the more complex, C-style for (initialize; end condition; increment) syntax common in many popular languages. It also lacks the common alternative loop syntaxes such as do...while and repeat...until, though equivalents can of course be expressed. Likewise, it has only if...elif...else for branching, with no switch or labeled goto (goto was implemented as a joke for 1 April 2004, in an add-on module).
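As a rough illustration of the two loop forms, and of one way a do...while loop can be emulated, consider the following sketch. The function names total and count_digits are hypothetical and exist only for this example.

```python
def total(items):
    """Sum a sequence using the for form."""
    result = 0
    for item in items:          # visits each element in order
        result += item
    return result


def count_digits(n):
    """Count the decimal digits of a non-negative integer.

    Emulates do...while with 'while True' and a trailing 'break'.
    """
    digits = 0
    while True:                 # the body always runs at least once
        digits += 1
        n //= 10
        if n == 0:              # the exit test sits at the bottom of the loop
            break
    return digits
```

Here count_digits(0) still reports one digit, because the test is made only after the first pass through the body.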

Indentation

One unusual aspect of Python's syntax is its use of whitespace to delimit program blocks (the off-side rule). Sometimes termed "the whitespace thing", it is one aspect of Python syntax that many programmers otherwise unfamiliar with Python have heard of, since it is nearly unique among currently widespread languages.

In so-called "free-format" languages that use the block structure ultimately derived from ALGOL, blocks of code are set off with braces ({ }) or keywords. In all these languages, however, programmers conventionally indent the code within a block to set it off visually from the surrounding code.

Python, instead, borrows a feature from the lesser-known language ABC—instead of punctuation or keywords, it uses this indentation itself to indicate the run of a block. A brief example will make this clear. Here are C and Python recursive functions which do the same thing—computing the factorial of an integer:

Factorial function in C:

int factorial(int x)
{
    if (x == 0) 
        return 1;
    else 
        return x * factorial(x-1);
}

Factorial function in Python:

def factorial(x):
    if x == 0:
        return 1
    else:
        return x * factorial(x-1)

Some programmers used to ALGOL-style languages, in which whitespace is semantically empty, find this confusing or even offensive. A few have drawn unflattering comparison to the column-oriented style used on punched-card Fortran systems. When ALGOL was new, it was a major development to have "free-form" languages in which only symbols mattered and not their position on the line.

To Python programmers, however, "the whitespace thing" is simply the enforcement of a convention that programmers in ALGOL-style languages already follow anyway. They also point out that the free-form syntax has the disadvantage that, since indentation is ignored, good indentation cannot be enforced. Thus, incorrectly indented code may be misleading, since a human reader and a compiler could interpret it differently. Here is an example:

Misleading indentation in C:

for (i = 0; i < 20; ++i)
   a();
   b();
   c();

This code is intended to call functions a(), b(), and c() 20 times—and at first glance, that's what it appears to do. Actually it calls a() 20 times, and then calls b() and c() one time each. This sort of mistake can be very hard to spot when reading code. In Python, if the indentation looks correct, it is correct.

The whitespace thing has minor disadvantages. Both space characters and tab characters are currently accepted as forms of indentation. Since they are not visually distinguishable (in many tools), mixing spaces and tabs can create bugs that are particularly difficult to find (a perennial suggestion among Python users has been removing tabs as block markers—except, of course, among those Python users who propound removing spaces instead).

Because whitespace is syntactically significant, it is not always possible for a program to automatically correct the indentation on Python code as can be done with C or Lisp code. Moreover, formatting routines which remove whitespace—for instance, many Internet forums—can completely destroy the syntax of a Python program, whereas a program in a bracketed language would merely become more difficult to read.

Data structures

Since Python is a dynamically typed language, Python values, not variables, carry type. This has implications for many aspects of the way the language functions.

All variables in Python hold references to objects, and these references are passed to functions by value; a function cannot rebind a variable in its calling function, although it can modify a mutable object that was passed to it. Some people (including Guido van Rossum himself) have called this parameter-passing scheme "Call by object reference."
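The difference between rebinding a name and changing the object it refers to can be sketched as follows. The function names rebind and mutate are hypothetical, chosen only for this illustration.

```python
def rebind(seq):
    seq = [0, 0, 0]     # rebinds the local name only; the caller's list is untouched


def mutate(seq):
    seq.append(4)       # changes the shared object, so the caller sees the effect


numbers = [1, 2, 3]

rebind(numbers)
after_rebind = list(numbers)    # copy taken after rebind(): still [1, 2, 3]

mutate(numbers)
after_mutate = list(numbers)    # copy taken after mutate(): [1, 2, 3, 4]
```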

Among dynamically typed languages, Python is moderately type-checked. Implicit conversion is defined for numeric types, so one may validly multiply a complex number by a long integer (for instance) without explicit casting. However, there is no implicit conversion between (e.g.) numbers and strings; a string is an invalid argument to a mathematical function expecting a number.
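A small sketch of these rules, using only the standard math module; the variable names are illustrative.

```python
import math

product = (1 + 2j) * 10**20     # the integer operand is converted implicitly
mixed = 3 + 0.5                 # integer plus float gives a float

try:
    math.sqrt("16")             # a string is not converted to a number
    outcome = "accepted"
except TypeError:
    outcome = "rejected"

explicit = math.sqrt(float("16"))   # the conversion has to be spelled out
```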

Base types

Python has a broad range of basic data types. Alongside conventional integer and floating point arithmetic, it transparently supports arbitrary-precision arithmetic and complex numbers.

It supports the usual panoply of string operations, with one caveat: strings in Python are immutable objects. This means that any string operation, such as a substitution of characters, that in other programming languages might alter a string will instead return a new string in Python. While this at first sight appears to be a limitation, it in fact allows programmers to write code that is much more readable, maintainable and efficient, as they never have to worry about unwanted or unexpected modifications to strings by other parts of the program.
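A minimal sketch of what immutability means in practice; the variable names are illustrative.

```python
original = "spam"
changed = original.replace("sp", "h")   # returns a new string; 'original' is unchanged

try:
    original[0] = "S"                   # in-place modification is not supported
    in_place = "allowed"
except TypeError:
    in_place = "rejected"
```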

Collection types

One of the very useful aspects of Python is the concept of collection (or container) types. In general a collection is an object that contains other objects in a way that is easily referenced or indexed. Collections come in two basic forms: sequences and mappings.

The ordered sequential types are lists (dynamic arrays), tuples, and strings. All sequences are indexed positionally (0 through length − 1) and all but strings can contain any type of object, including multiple types in the same sequence. Both strings and tuples are immutable, making them perfect candidates for dictionary keys (see below). Lists, on the other hand, are mutable; elements can be inserted, deleted, modified, appended, or sorted in place.

On the other side of the collections coin are mappings, which are unordered types implemented in the form of dictionaries which "map" a set of immutable keys to corresponding elements, much like a mathematical function. The keys in a dictionary must be of an immutable Python type such as an integer or a string. For example, one could define a dictionary having a string "foo" mapped to the integer 42 or vice versa. This is done under the covers via a hash function, which makes for faster lookup times but is also the culprit for a dictionary's lack of order and is the reason mutable objects (e.g. other dictionaries or lists) cannot be used as keys. Dictionaries are also central to the internals of the language as they reside at the core of all Python objects and classes: the mapping between variable names (strings) and the values which the names reference is stored as a dictionary (see Object system). Since these dictionaries are directly accessible (via an object's __dict__ attribute), meta-programming is a surprisingly straightforward and natural process in Python.
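The points about keys and about __dict__ can be sketched as follows. The Point class and the variable names are hypothetical, introduced only for this example.

```python
table = {"foo": 42, 42: "foo", (1, 2): "tuple key"}   # immutable keys only

try:
    table[[1, 2]] = "list key"      # a list is mutable, hence not hashable
    list_key = "accepted"
except TypeError:
    list_key = "rejected"


class Point(object):                # hypothetical class for illustration
    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
names = sorted(p.__dict__.keys())   # instance attributes live in a dictionary
p.__dict__["z"] = 3                 # adding a key creates the attribute p.z
```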

A set collection type was added to the core language in version 2.4. A set is an unindexed, unordered collection that contains no duplicates. This container type has many applications where only membership information is required and acts essentially like a dictionary without values. There are two types of sets: set and frozenset, the only difference being that set is mutable and frozenset is immutable. Elements in a set must be hashable and immutable. Thus, for example, a frozenset can be an element of a regular set whereas the opposite is not true.
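A short sketch of the two set types; the variable names are illustrative.

```python
evens = set([2, 4, 6, 4])           # the duplicate 4 is stored only once
has_four = 4 in evens               # membership test

frozen = frozenset([1, 2])
outer = set([frozen])               # a frozenset is hashable, so it can be an element

try:
    set([set([1, 2])])              # a mutable set cannot be an element
    nested = "accepted"
except TypeError:
    nested = "rejected"
```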

Python also provides extensive collection manipulating abilities such as built in containment checking and a generic iteration protocol.
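As a sketch of how a user-defined class can take part in iteration and containment checks, the hypothetical Countdown class below defines only __iter__; the in operator then falls back to iterating over the object.

```python
class Countdown(object):            # hypothetical class for illustration
    def __init__(self, start):
        self.start = start

    def __iter__(self):             # makes instances usable in for loops
        n = self.start
        while n > 0:
            yield n
            n -= 1


values = [n for n in Countdown(3)]  # iteration protocol
found = 2 in Countdown(3)           # containment check via iteration
missing = 7 in Countdown(3)
```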

Object system

In Python, everything is an object, even classes. Classes, as objects, have a class, which is known as their metaclass. Python also supports multiple inheritance and mixins (see also MixinsForPython).

The language supports extensive introspection of types and classes. Types can be read and compared— types are instances of a type. The attributes of an object can be extracted as a dictionary.

Operators can be overloaded in Python by defining special member functions—for instance, defining __add__ on a class permits one to use the + operator on members of that class.
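A minimal sketch of operator overloading and type introspection; the Vector class is hypothetical and exists only for this example.

```python
class Vector(object):                   # hypothetical class for illustration
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):           # invoked for: self + other
        return Vector(self.x + other.x, self.y + other.y)


v = Vector(1, 2) + Vector(3, 4)

kind = type(v)                          # the class of the instance
meta = type(Vector)                     # the class of the class (its metaclass)
attributes = vars(v)                    # the instance's attributes as a dictionary
```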

Operators

Comparison operators

The basic comparison operators such as ==, <, >=, and so forth, are used on all manner of values. Numbers, strings, sequences, and mappings can all be compared. Objects of dissimilar type (such as a string and a number) can be compared; the result is arbitrary, but consistent.

Chained comparison expressions such as a < b < c have roughly the meaning that they have in mathematics, rather than the unusual meaning found in C and similar languages. The terms are evaluated and compared in order. The operation is short circuit, meaning that evaluation stops as soon as the expression is proven false: if a < b is false, c is never evaluated.

For expressions without side effects, a < b < c is equivalent to a < b and b < c. However, there is a substantial difference when the expressions have side effects. a < f(x) < b will evaluate f(x) exactly once, whereas a < f(x) and f(x) < b may evaluate it once or twice.
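The difference can be observed by counting evaluations. In this sketch f is a hypothetical function that records each call in a list.

```python
calls = []


def f(x):
    calls.append(x)         # record each evaluation (a visible side effect)
    return x


chained = 1 < f(5) < 3      # f(5) is evaluated once
chained_calls = len(calls)

del calls[:]                # reset the record

expanded = 1 < f(5) and f(5) < 3    # f(5) is evaluated twice, since 1 < 5 holds
expanded_calls = len(calls)
```

Both expressions yield the same truth value; only the number of calls to f differs.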

Logical operators

Versions of Python up to and including 2.2 do not have an explicit boolean type. In all versions of Python, boolean operators treat zero values or empty values such as "", 0, None, 0.0, [], and {} as false, while in general treating non-empty, non-zero values as true. In Python 2.2.1 the boolean constants True and False were added to the language (a dedicated bool type followed in version 2.3). Thus, the binary comparison operators such as == and > return either True or False. The lazily evaluated boolean operations, and and or, return the value of the last evaluated subexpression, in order to preserve backwards compatibility. For example, as of Python 2.2.1 the expression (2 == 2) evaluates to True, the expression (4 and 5) evaluates to 5, and the expression (4 or 5) evaluates to 4.
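These rules can be checked directly. The variable names in this sketch are illustrative.

```python
empty_values = ["", 0, None, 0.0, [], {}]

all_false = True
for value in empty_values:
    if value:                   # none of these counts as true
        all_false = False

first = 4 and 5                 # 'and' returns the last operand it evaluated
second = 4 or 5                 # 'or' stops at the first true operand
default = "" or "fallback"      # a common idiom for supplying a default value
```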

Functional programming

As mentioned above, another strength of Python is the availability of a functional programming style. As may be expected, this makes working with lists and other collections much more straightforward. One such construction is the list comprehension, as seen here in calculating the first five powers of two:

numbers = [1, 2, 3, 4, 5]
powers_of_two = [2**n for n in numbers]

The Quicksort algorithm can be expressed elegantly using list comprehensions:

def qsort(L):
  if L == []: return []
  return qsort([x for x in L[1:] if x< L[0] ]) + L[0:1] + \
         qsort([x for x in L[1:] if x>=L[0] ])

Although execution of this naïve form of Quicksort is less space-efficient than forms which alter the sequence in-place, it is often cited as an example of the expressive power of list comprehensions.

First-class functions

In Python, functions are first-class objects that can be created and passed around dynamically.

Python's lambda construct can be used to create anonymous functions within expressions. Lambdas are however limited to containing expressions; statements can only be used in named functions created with the def statement. (However, any type of control flow can in principle be implemented within lambda expressions[3] by short-circuiting the and and or operators.)
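A brief sketch of functions used as values, and of the short-circuit trick mentioned above. The names compose, double, increment and sign are hypothetical.

```python
def compose(f, g):
    """Return a new function computing f(g(x))."""
    return lambda x: f(g(x))


double = lambda n: n * 2
increment = lambda n: n + 1

double_then_increment = compose(increment, double)
registry = {"double": double, "increment": increment}   # functions stored as values

# Short-circuit operators standing in for if/else inside a lambda.
# Caveat: this form gives the wrong answer if the first branch is itself false.
sign = lambda n: (n < 0 and "negative") or "non-negative"
```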

Closures

Python has had support for lexical closures since version 2.2. Python's syntax, though, sometimes leads programmers of other languages to think that closures are not supported. Since assignment binds names locally, an inner function can read the variables of its enclosing scope but cannot rebind them; the trick to creating a closure with updatable state is to use a mutable container within the enclosing scope. Many Python tutorials explain this usage, but it is an atypical style in Python programs.
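The mutable-container technique can be sketched as follows; make_counter is a hypothetical name used only here.

```python
def make_counter():
    count = [0]                 # mutable container held in the enclosing scope

    def increment():
        count[0] += 1           # mutates the list; the name 'count' is never rebound
        return count[0]

    return increment


counter = make_counter()
first = counter()
second = counter()
other = make_counter()()        # a separate closure with its own state
```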

Generators

Introduced in Python 2.2 as an optional feature and finalized in version 2.3, generators are Python's mechanism for lazy evaluation of a function that would otherwise return a space-prohibitive or computationally intensive list.

This is an example to lazily generate the prime numbers:

import sys
def generate_primes(max=sys.maxint):
    primes = []
    n = 2
    while n < max:
        composite = False
        for p in primes:
            if not n % p:
                composite = True
                break
            elif p**2 > n: 
                break
        if not composite:
            primes.append(n)
            yield n
        n += 1

To use this function simply call, e.g.:

for i in generate_primes():  # iterate over ALL primes
    if i > 100: break
    print i,

The definition of a generator appears identical to that of a function, except the keyword yield is used in place of return. However, a generator is an object with persistent state, which can repeatedly enter and leave the same dynamic extent. A generator call can then be used in place of a list, or other structure whose elements will be iterated over. Whenever the for-loop in the example requires the next item, the generator is called, and yields the next item.
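The persistent state of a generator can be made visible by consuming it in two steps. This sketch uses itertools.islice from the standard library; countdown is a hypothetical generator.

```python
from itertools import islice


def countdown(n):
    """Yield n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1


gen = countdown(5)
first_two = list(islice(gen, 2))    # runs the body only far enough for two values
remaining = list(gen)               # resumes where the generator left off
```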

Generator expressions

Introduced in Python 2.4, generator expressions are the lazy evaluation equivalent of list comprehensions. To loop over a lazily produced series of values, one could either write a specific generator

def generate_ints(N):
   for i in xrange(N):
      yield i
for x in generate_ints(100):
   print x

or write a slightly more concise

for x in (i for i in xrange(100)):
   print x

Note that the example given is purely to demonstrate generator expressions. Since xrange is itself an iterable, for the example above

for x in xrange(100):
   print x

is actually simpler.
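Generator expressions pay off when the consumer does not need the whole sequence in memory, or stops early. A minimal sketch, with illustrative variable names:

```python
squares_list = [n * n for n in range(10)]       # builds the whole list first
squares_total = sum(n * n for n in range(10))   # feeds values to sum() one at a time

lazy = (n * n for n in range(10))
first_over_20 = None
for value in lazy:
    if value > 20:              # stop early; later squares are never computed
        first_over_20 = value
        break
```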

Objects

Python's support for the object-oriented programming paradigm is extensive. It supports polymorphism, not only within a class hierarchy but also by duck typing: any object can be used wherever another type is expected, and it will work so long as it has the proper methods and attributes. Everything in Python is an object, including classes, functions, numbers and modules. Python also has support for metaclasses, an advanced tool for enhancing classes' functionality. Naturally, inheritance, including multiple inheritance, is supported. Python has limited support for private variables using name mangling; see the "Classes" section of the tutorial for details. Many Python users do not feel the need for private variables, though, and the slogan "We're all consenting adults here" is used to describe this attitude. Some consider information hiding to be unpythonic, in that it suggests that the class in question contains unaesthetic or ill-planned internals.

From the tutorial: As is true for modules, classes in Python do not put an absolute barrier between definition and user, but rather rely on the politeness of the user not to "break into the definition."
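Name mangling, and the way it relies on politeness rather than enforcement, can be sketched as follows. The Account class is hypothetical.

```python
class Account(object):              # hypothetical class for illustration
    def __init__(self):
        self.__balance = 100        # stored under the mangled name _Account__balance

    def balance(self):
        return self.__balance       # inside the class the short name works


acct = Account()

try:
    acct.__balance                  # outside the class the short name is not found
    direct = "visible"
except AttributeError:
    direct = "hidden"

mangled = acct._Account__balance    # still reachable by anyone who insists
```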

OOP doctrines such as the use of accessor methods to read data members are not enforced in Python. Just as Python offers functional-programming constructs but does not attempt to demand referential transparency, it offers (and extensively uses!) its object system but does not demand OOP behavior. Moreover, it is always possible to redefine the class using properties so that when a certain variable is set or retrieved in calling code, it really invokes a function call, so that foo.x = y might really invoke foo.set_x(y). This nullifies the practical advantage of accessor functions, and it remains OOP because the property 'x' becomes a legitimate part of the object's interface: it need not reflect an implementation detail.
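A minimal sketch of a property, assuming a hypothetical Celsius class. Assignment to the attribute in calling code is routed through a setter method without any change to that calling code.

```python
class Celsius(object):                  # hypothetical new-style class
    def __init__(self):
        self._degrees = 0.0
        self.sets = 0                   # counts how often the setter has run

    def get_degrees(self):
        return self._degrees

    def set_degrees(self, value):
        self.sets += 1                  # arbitrary code can run on assignment
        self._degrees = float(value)

    degrees = property(get_degrees, set_degrees)


t = Celsius()
t.degrees = 21                          # really calls t.set_degrees(21)
reading = t.degrees                     # really calls t.get_degrees()
```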

In version 2.2 of Python, "new-style" classes were introduced. With new-style classes, objects and types were unified, allowing the subclassing of types. Even new types entirely can be defined, complete with custom behavior for infix operators. This allows for many radical things to be done syntactically within Python. A new multiple inheritance model was adopted with new-style classes, making a much more logical order of inheritance. The new method __getattribute__ was also defined for unconditional handling of attribute access.
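Two of these features, subclassing a built-in type and the revised inheritance order, can be sketched as follows. All class names here are hypothetical.

```python
class ZeroDict(dict):                   # subclass of the built-in dict type
    def __getitem__(self, key):
        if key not in self:
            return 0                    # missing keys read as zero
        return dict.__getitem__(self, key)


class A(object): pass
class B(A): pass
class C(A): pass
class D(B, C): pass                     # "diamond" inheritance


counts = ZeroDict()
counts["a"] = counts["a"] + 1           # no KeyError on first use

order = [cls.__name__ for cls in D.__mro__]   # method resolution order
```

In the resulting order, each class appears before its own base classes, so C is consulted before the shared base A.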

For a useful introduction to the principles of object oriented programming, read Introduction to OOP with Python.

Exceptions

Python supports (and extensively uses) exception handling as a means of testing for error conditions and other "exceptional" events in a program. Indeed, it is even possible to trap the exception caused by a syntax error.
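Trapping a syntax error can be sketched with the built-in compile function, which raises SyntaxError when handed malformed source text. The source string here is deliberately invalid.

```python
source = "x = = 1"                      # deliberately malformed statement

try:
    compile(source, "<example>", "exec")
    result = "compiled"
except SyntaxError:
    result = "syntax error trapped"
```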

Python style calls for the use of exceptions whenever an error condition might arise. Indeed, rather than testing for access to a file or resource before actually using it, it is conventional in Python to just go ahead and try to use it, catching the exception if access is rejected.

Exceptions can also be used as a more general means of non-local transfer of control, even when an error is not at issue. For instance, the Mailman mailing list software, written in Python, uses exceptions to jump out of deeply-nested message-handling logic when a decision has been made to reject a message or hold it for moderator approval.

Exceptions are often, especially in threaded situations, used as an alternative to the if-block. A commonly invoked motto is EAFP, or "It is Easier to Ask for Forgiveness than to ask for Permission." In this first code sample, there is an explicit check for the attribute (i.e., it "asks permission"):

if hasattr(foo, 'bar'):
    baz = foo.bar
else:
    handle_error()

This second sample follows the EAFP paradigm:

try:
    baz = foo.bar
except AttributeError:
    handle_error()

These two code samples have the same effect, although there will be performance differences. When foo has the attribute bar, the EAFP sample will run faster. When foo does not have the attribute bar (the "exceptional" case), the EAFP sample will run significantly slower. The Python programmer usually writes for code readability first, then uses Python's code profiling tools for performance analysis to determine if further optimization is required. In most cases, the EAFP paradigm results in faster and more readable code.

Comments and docstrings

Python has two ways to annotate Python code. One is by using comments to indicate what some part of the code does.

import sys

def getline():
    return sys.stdin.readline()       # Get one line and return it

Comments begin with the hash character ("#") and are terminated by the end of line. Python does not support comments that span more than one line. The other way is to use docstrings (documentation strings): a docstring is a string placed alone, without assignment, as the first statement within a module, class, method or function. Such strings can be delimited with " or ' for single-line strings, or may span multiple lines if delimited with either """ or ''', which is Python's notation for specifying multi-line strings. However, the style guide for the language specifies that triple double quotes (""") are preferred for both single and multi-line docstrings.

Single line docstring:

def getline():
    """Get one line from stdin and return it."""
    return sys.stdin.readline()

Multi-line docstring:

def getline():
    """Get one line
       from stdin
       and return it."""
    return sys.stdin.readline()

Docstrings can be as large as the programmer wants and contain line breaks (if multi-line strings are used). In contrast with comments, docstrings are themselves Python objects and are part of the interpreted code that Python runs. That means that a running program can retrieve its own docstrings and manipulate that information. But the normal usage is to give other programmers information about how to invoke the object being documented in the docstring.
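A running program can read a docstring through the __doc__ attribute. In this sketch getline_stub is a hypothetical stand-in function that performs no real input.

```python
def getline_stub():
    """Return one line of input (stub used only for this example)."""
    return ""


text = getline_stub.__doc__             # the docstring is an ordinary string object
summary = text.split("(")[0].strip()    # and can be processed like any other string
```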

There are tools available that can extract the docstrings to generate API documentation from the code. Docstring documentation can also be accessed from the interpreter with the help() function, or from the shell with the pydoc command.

Resources

Implementations

The standard Python interpreter supports an interactive mode in which it acts as a kind of shell: expressions can be entered one at a time, and the result of their evaluation is seen immediately. This is a boon for those learning the language and experienced developers alike: snippets of code can be tested in interactive mode before integrating them into a proper program.

Python also includes a unit testing framework for creating exhaustive test suites. While static typing aficionados see this as a replacement for a static type-checking system, Python programmers largely do not share this view.
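A minimal sketch of the standard unittest module in use. The factorial function repeats the earlier example, and the test class name is hypothetical; the suite is run programmatically here so that the outcome can be inspected.

```python
import unittest


def factorial(x):
    if x == 0:
        return 1
    return x * factorial(x - 1)


class FactorialTest(unittest.TestCase):     # hypothetical test case
    def test_base_case(self):
        self.assertEqual(factorial(0), 1)

    def test_recursive_case(self):
        self.assertEqual(factorial(5), 120)


suite = unittest.TestLoader().loadTestsFromTestCase(FactorialTest)
result = unittest.TestResult()
suite.run(result)                           # collects outcomes without printing
```

In everyday use such a file would more commonly end with a call to unittest.main(), which runs the tests and prints a report.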

Standard Python does not support continuations (and never will, according to Guido van Rossum), but there is a variant known as Stackless Python that does. However, support for coroutines (based on generators) is planned; see [4].

Standard library


Python has a large standard library, which makes it well suited to many tasks. This comes from a so-called "batteries included" philosophy for Python modules. The modules of the standard library can be augmented with custom modules written in either C or Python. The standard library is particularly well tailored to writing Internet-facing applications, with a large number of standard formats and protocols (such as MIME and HTTP) supported. Modules for creating graphical user interfaces, connecting to relational databases, arithmetic with arbitrarily precise decimals, and manipulating regular expressions are also included.

The standard library is one of Python's greatest strengths. The bulk of it is cross-platform compatible, meaning that even heavily leveraged Python programs can often run on Unix, Windows, Macintosh, and other platforms without change.

It is currently being debated whether or not third-party but open source Python modules such as Twisted, NumPy, or wxPython should be included in the standard library, in accordance with the batteries included philosophy.

Availability

Supported platforms

The most popular (and therefore best maintained) platforms Python runs on are Linux, BSD, Mac OS X, Microsoft Windows and Java (this JVM version is a separate implementation). Other platforms are also supported.

Most of the third-party libraries for Python (and even some first-party ones) are only available on Windows, Linux, BSD, and Mac OS X.

Python was originally developed as a scripting language, capable of making system calls, for the Amoeba operating system; that version is no longer maintained.

Neologisms

A few neologisms have come into common use within the Python community. One of the most common is "pythonic", which can have a wide range of meanings related to program style. To say that a piece of code is pythonic is to say that it uses Python idioms well; that it is natural or shows fluency in the language. Likewise, to say of an interface or language feature that it is pythonic is to say that it works well with Python idioms; that its use meshes well with the rest of the language.

In contrast, a mark of unpythonic code is that it attempts to "write C++ (or Lisp, or Perl) code in Python"—that is, provides a rough transcription rather than an idiomatic translation of forms from another language.

The prefix Py- can be used to show that something is related to Python. Examples of the use of this prefix in names of Python applications or libraries include Pygame, a binding of SDL to Python (commonly used to create games), PyUI, a GUI encoded entirely in Python, and PySol, a series of solitaire card games programmed in Python.

Users and admirers of Python—most especially those considered knowledgeable or experienced—are often referred to as Pythonists, Pythonistas, and Pythoneers.

References

Journals

  • Py, "The Python Online Technical Journal".
