Overview of Python

Introduction:

    Python is a popular, free, cross-platform, open-source programming language. It has no licensing restrictions that would prevent its use in commercial projects. It has a rich set of libraries for scientific and technical applications. Support, tutorials and documentation are widely available.

    Python is one of a class of languages that began as simple scripting languages but evolved over time to become more powerful languages with compilers and libraries — Perl and Ruby are other examples. To indicate the informal way they evolved, I call these "bottom-up" languages. Each of them has advantages and drawbacks, but like many technical issues, there is a snowball effect — as a language's popularity increases, this motivates people to write application libraries and dedicated development environments, which increases its popularity further.

    Some will argue that this is a rather uncivilized way to create computer languages, and a more formal, "top-down" approach should produce better results. There are examples of language designs like C, C++ and Java that have advantages in structure and syntax because architecture and consistent design were (to some extent) considered in advance of practical embodiments.


    The advantages of "top-down" language design include a more formal way to manage things like classes, a structure conducive to compilation and rational memory management, and a consistent, reliable way to link with other modules. The advantages of "bottom-up" languages like Python include the existence of an accessible, interpreted form for experimentation, and very quick development times.

    This is not to say that languages like Python don't have drawbacks when used for serious development work. Python's management of classes is an example. Every function within a class must identify itself as belonging to that class in a rather peculiar way (by declaring an explicit "self" parameter), and every call from one method to another within the class must carry the same "self." prefix, which reveals the degree to which classes were an afterthought (more on this topic later).
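    To make this concrete, here is a minimal class. The name Counter and its methods are hypothetical and exist only for illustration, and the sketch uses Python 3 syntax, where print is a function. Every method declares self as its first parameter, and both instance data and sibling methods are reached only through self.

```python
class Counter(object):
    """A tiny illustrative class showing Python's explicit 'self' parameter."""

    def __init__(self, start=0):
        # Each method receives the instance explicitly as 'self'.
        self.count = start

    def increment(self):
        # Instance data is reached only through 'self'.
        self.count += 1
        return self.count

    def increment_twice(self):
        # Calling a sibling method also requires the 'self.' prefix;
        # a bare increment() here would raise NameError.
        self.increment()
        return self.increment()


c = Counter()
print(c.increment_twice())   # prints 2
```

    The call c.increment_twice() is shorthand for Counter.increment_twice(c), which is where the explicit self parameter comes from.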

    There is one aspect of Python that, all by itself, prevented me from adopting it for many years: the absence of block tokens, an unambiguous way for a programmer (or a syntax-checking algorithm) to identify a logical block. In most languages, there is a clear, unambiguous way to structure a program's logical blocks:

    C/C++/Java:

    if (condition == x) {
      result = option(y);
      if (result == z) {
        process(a);
      }
      process(b);
    }
    process(c);
             

    Ruby:

    if (condition == x)
      result = option(y)
      if (result == z)
        process(a)
      end
      process(b)
    end
    process(c)
             

    Python:

    if (condition == x):
      result = option(y)
      if (result == z):
        process(a)
      process(b)
    process(c)
             

    Yes, the final example means just what it appears to. Because Python has no block-delimiting tokens, whitespace is syntactically significant, and changing the indentation of a line can change the program's meaning. This makes a syntax checker or beautifier difficult to write and of limited usefulness, because there are things such a tool must never do. A line or block with the wrong indentation doesn't merely look wrong, it means something different, so source code editors and beautifiers must never change indentation.
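    A small sketch (Python 3 syntax; the function names are illustrative) shows the problem. The two functions below differ only in the indentation of one line, yet they compute different results, and no tool could tell which one the programmer intended.

```python
def doubled_inside(values):
    total = 0
    for v in values:
        total += v
        total *= 2      # indented: runs on every pass through the loop
    return total


def doubled_after(values):
    total = 0
    for v in values:
        total += v
    total *= 2          # dedented: runs once, after the loop finishes
    return total


print(doubled_inside([1, 2, 3]))   # prints 22
print(doubled_after([1, 2, 3]))    # prints 12
```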

    From time to time I have a nightmare in which someone applies a filter to a directory of Python source files, removing all the leading white space — I wake up in a cold sweat. A group of Python source files filtered in that way might as well be thrown away, but source files for C, C++, Java, JavaScript, Ruby and other languages can be quickly and unambiguously reformatted.

    This issue kept me from using Python for many years, but over time I've become involved with a number of projects that required some knowledge of Python (Sage and Blender among others), so Python was more or less forced on me. For example, in order to write my Sage tutorial, I had to create quite a lot of Python code, and now that I'm starting to use Blender (a ray-tracing and graphic modeling environment), I've discovered that it also uses Python.

    I personally won't expend energy on a computer language unless I can write a "beautifier" for it, a way to automatically clean up source code files. My Python beautifier can't really do much compared to its predecessors, but it has its uses (explanation later) and it works within the limitations of the language — primarily the fact that the program's meaning is determined by the indentation of the lines. For example, in the Python code snippet above, changing the indentation of the three final lines would change the meaning of the program, not just its appearance.

    Here are some of Python's features:

        Accessible interactively on the command line, by way of interpreted scripts, and in some compiled forms.
        Easy learning curve to the level of useful programs, harder after that (typical for modern languages).
        Support for all the expected properties and libraries of a modern language — regular expressions, classes, graphics, scientific and technical libraries, graphical user interfaces, portability between platforms.
        Interpreted scripts are compiled into bytecode before execution to improve speed and efficiency.
        Several Python development environments for serious work and GUI design, as well as syntax and indentation support in most programming editors.
        Plenty of readily available documentation.
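    The bytecode step is normally invisible, but the built-in compile() function and the standard dis module expose it. This sketch uses Python 3 syntax (exec and print are functions there). It compiles a one-line statement, runs the resulting bytecode, and disassembles it.

```python
import dis

source = "result = 6 * 7"

# compile() turns source text into a code object holding bytecode.
code = compile(source, "<example>", "exec")

# exec() runs that bytecode; the namespace dict receives its variables.
namespace = {}
exec(code, namespace)
print(namespace["result"])   # prints 42

# dis.dis() prints the bytecode instructions; the exact listing
# differs from one Python version to the next.
dis.dis(code)
```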

Quick Tour

    Readers may simply browse this section, but I recommend downloading and installing Python so you can run the examples firsthand.

    Open a shell session (indicated by a '$' prompt) and type "python" to start an interactive session:

    $ python
    Python 2.6.4 (r264:75706, Jun  4 2010, 18:20:16)
    [GCC 4.4.4 20100503 (Red Hat 4.4.4-2)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
             

    The ">>>" string is Python's prompt for interactive user entries.

    Long Integers

    Now type this:

    >>> 111111111**2
    12345678987654321L
    >>>
             

    What does this tell us? We typed nine "1"s followed by "**2", which is how you tell Python to square the number to its left. The trailing "L" in the result, "12345678987654321L", means that Python automatically switched to a long-integer mode that isn't limited by your computer's native integer size. In fact, the size of integers is limited only by your computer's memory:

    >>> 111111111**11
    31866355102719439692709575611832245125767178743323754858490688959195755275492295598602711L
    >>>
             

    >>> 111111111**111
    11997241580139700753317722306541179696019537945276677076620236371271367738530816272427293245
    36044189785106590881907871589130011785279768267350151806505206291147217916354548236760857171
    58345945071801061169908649699656875046803888091118822420445243239281252464917711608768124665
    00043506858289314436574268356926519770457593308469952173691331994280973613468226937436046635
    29154170932905626164217324294276789230868612486289475029336445992904460558155636629618461983
    93063655793401359040660390618941657378193647262405747743293028881358033403212918442648709014
    92567835554908020509498870115766477236087661882191422576442109159919614878846924042427291286
    52340825486056145043201441623544527642732118816973915240715987481283448945380735612004616582
    51609207270501435418753039853438608270414558491772799555462255286036384211202553997802177952
    587642973236246989126832243739532931275812149042894103594586318711L
    >>>
             

    I think my readers can see where this is going, and readers should feel free to create absurdly long integers.
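    In Python 3 the separate long type was merged into int, so the trailing "L" no longer appears, but integers still grow without limit. The sketch below (Python 3 syntax; digit_count is a hypothetical helper) checks the results above and contrasts them with floats, which have a fixed size.

```python
def digit_count(n):
    """Return the number of decimal digits in a non-negative integer."""
    return len(str(n))


square = 111111111 ** 2
print(square)                          # prints 12345678987654321
print(digit_count(111111111 ** 111))   # prints 894

# Floats have a fixed size, so a value this large cannot become one.
try:
    float(111111111 ** 111)
except OverflowError as err:
    print("OverflowError:", err)
```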

    Big Floating-point Numbers

    There is a Python module called "mpmath" that allows the above sort of extended precision for floating-point numbers. It can be installed with the usual package management utilities under the name "python-mpmath". Here is an example of its use:

    >>> from mpmath import *
    >>> mp.dps = 200
    >>> print pi
    3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825
    34211706798214808651328230664709384460955058223172535940812848111745028410270193852110555964
    4622948954930382
    >>> print e
    2.718281828459045235360287471352662497757247093699959574966967627724076630353547594571382178
    52516642742746639193200305992181741359662904357290033429526059563073813232862794349076323382
    9880753195251019
    >>> print sqrt(2)
    1.414213562373095048801688724209698078569671875376948073176679737990732478462107038850387534
    32764157273501384623091229702492483605585073721264412149709993583141322266592750559275579995
    05011527820605715
    >>> print 80.0/81.0
    0.987654320988
    >>> print mpf(80)/81
    0.987654320987654320987654320987654320987654320987654320987654320987654320987654320987654320
    98765432098765432098765432098765432098765432098765432098765432098765432098765432098765432098
    765432098765432099
    >>>
             

    The command "mp.dps = 200" sets the number of decimal digits of precision (it can be anything within your computer's memory capacity). The last two examples show that, unlike mathematical constants and the results of mpmath's own functions, ordinary Python numbers must first be created as mpmath floats (mpf(n)) to get extended precision. 80.0/81.0 is computed with ordinary floats, while mpf(80)/81 is computed by mpmath.
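    mpmath is a third-party package, so the sketch below uses the standard library's decimal module to illustrate the same point. This is a swapped-in technique, not mpmath itself. As with mpf(), the extra precision applies only to values created as Decimal objects. The 50-digit precision is an arbitrary choice, and the code uses Python 3 syntax.

```python
from decimal import Decimal, getcontext

# Work with 50 significant digits (analogous to setting mp.dps).
getcontext().prec = 50

plain = 80.0 / 81.0                  # ordinary float: about 17 significant digits
precise = Decimal(80) / Decimal(81)  # Decimal: 50 significant digits

print(repr(plain))
print(precise)
```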

    I won't display this next result, but I just set mp.dps = 100000 and printed the value of pi. My system took about seven seconds to produce the result. Pretty impressive.

    Plotting

    In this next example, we will plot some functions using the mpmath module we loaded above (mpmath's plot function relies on the matplotlib module, which must also be installed). There is no shortage of Python modules that support plotting, and I strongly recommend SciPy and related modules, both for plotting and for their mathematical content.

    >>> plot([cos, sin], [-4, 4])
    >>> plot([fresnels, fresnelc], [-4, 4])
    >>> plot([lambda x:exp(-(x*x)),lambda x:exp(-(x*x)) * sin(x*3*pi)**2] ,[-2,2])
             

    Python Objects


    It's time to talk about objects — in Python, everything is an object, and each object has a type. This is both important and useful. (The following interactive Python session isn't a continuation of the above session — there is no mpmath module loaded.)

    >>> type(1)
    <type 'int'>
    >>> type (1.0)
    <type 'float'>
    >>> type(111111111**2)
    <type 'long'>
    >>> type (pi)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'pi' is not defined
    >>> import math
    >>> type(math)
    <type 'module'>
    >>> type(math.pi)
    <type 'float'>
    >>> math.pi
    3.1415926535897931
    >>>
             

    We examined the type of a few objects, then we "imported" the math module and examined its type. Remember that importing modules is how one extends Python's abilities, and all but the simplest math functions are part of the math module. If you try to use a common math function and Python tells you it cannot find it, chances are you haven't yet imported the math module: "import math".

    There are two primary ways to import a module's contents — we can say "import (module name)", or we can say "from (module name) import (names)". Here is an example of the difference:

    >>> sqrt(2)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'sqrt' is not defined
    >>> math.sqrt(2)
    1.4142135623730951
    >>>
             

    It seems we must prefix the module name to each math function. But this is because we earlier said "import math". If we say "from math import *", the outcome is different:

    >>> from math import *
    >>> sqrt(2)
    1.4142135623730951
    >>> pi
    3.1415926535897931
    >>>
             

    Each approach has advantages and drawbacks. If we say "import math" and always prefix the math functions with the module name "math", then we will always know which function we're calling — what module it comes from. In larger, more complex programs, or in a case where we have both "math" and "mpmath" modules loaded, this may be very important. The convenience of being able to type "sqrt(2)" instead of "math.sqrt(2)" may be undermined by a confusion about names and their origins as programs become more complex.
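    The standard library has a ready-made illustration of this risk. Both math (real numbers) and cmath (complex numbers) define sqrt, and importing both with "*" lets the second silently replace the first. This is a sketch in Python 3 syntax.

```python
import math
import cmath

# With module prefixes, the origin of each function is explicit.
print(math.sqrt(4))     # prints 2.0
print(cmath.sqrt(-1))   # prints 1j  (math.sqrt(-1) raises ValueError)

# With star imports, the later import wins without any warning.
from math import *
from cmath import *

print(sqrt(4))          # prints (2+0j): this is cmath.sqrt, not math.sqrt
```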

    Lists


    Now let's play with lists. It turns out that lists are a very powerful part of Python (and most modern languages). If you are accustomed to thinking of lists as arrays, that's fine, but "list" is the preferred term and ... it's one syllable easier to say. Imagine the energy saved over a period of years!

    Let's make a simple list. Enter:

    >>> a = range(1,13)
    >>> a
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
    >>> type(a)
    <type 'list'>
    >>>
             

        We used the "range" function to create a list of 12 members with values 1 through 12.
        Then we looked at it by entering the list's name: "a".
        Then we asked Python what the type of "a" is, and Python identified it as a list.
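    The second argument to range() is a stopping point that is never reached, which is why range(1,13) ends at 12. An optional third argument sets the step. In Python 3, range() returns a lazy sequence rather than a list, so this sketch (Python 3 syntax) wraps it in list(), which is harmless under Python 2.

```python
# range(start, stop) counts up to, but not including, stop.
a = list(range(1, 13))
print(len(a), min(a), max(a))   # prints 12 1 12

# A third argument sets the step size, which may be negative.
evens = list(range(2, 13, 2))
countdown = list(range(5, 0, -1))
print(evens)       # prints [2, 4, 6, 8, 10, 12]
print(countdown)   # prints [5, 4, 3, 2, 1]
```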

    Here are some examples of things we can do with lists:

    >>> a = range(1,13)
    >>> a
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
    >>> for n in a: print "%2d: %3d" % (n,n*n)
    ...
     1:   1
     2:   4
     3:   9
     4:  16
     5:  25
     6:  36
     7:  49
     8:  64
     9:  81
    10: 100
    11: 121
    12: 144
    >>>
             

        We accessed each of the list's members in a way that avoids a numerical index: "for n in a:".
        We printed each member in two ways: as the original number and as that number multiplied by itself.
        Those who have used other computer languages may recognize the formatting string for "print" — it contains a specification for the style of each item to be printed: "%2d: %3d" (the syntax originates in a C/C++ function called "printf()").
        The specification is followed by a percent sign, then the variables to be printed enclosed in parentheses: "(n,n*n)".
        The variables enclosed in parentheses have a special name. What is it?

    >>> type((n,n*n))
    <type 'tuple'>
    >>>
             

    A "tuple". Okay. When I first heard this name, I thought there would be a different name for each grouping based on size — two members would be called a "tuple," three members would be a "triple," four members would be a "quadruple." But no — regardless of their size, all these parenthesized entities are called "tuples".

    Tuples — "(1,2,3,4)" — are much like lists — "[1,2,3,4]" — but tuples can't be changed once they're created. This makes them a good choice for static data, or for a case where we need to be sure the content won't change after we create it.
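    A short sketch (Python 3 syntax; the variable names are illustrative) shows the practical difference. A list element can be reassigned, while the same operation on a tuple raises TypeError. It also shows that the % formatting operator takes its values as a tuple, as in the loop above.

```python
point = (3, 4)
squares = [1, 4, 9]

# Lists can be modified in place...
squares[0] = 0

# ...but tuples cannot: item assignment raises TypeError.
try:
    point[0] = 10
except TypeError as err:
    print("TypeError:", err)

# The % operator takes its values as a tuple.
print("%2d: %3d" % point)    # prints " 3:   4"
```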

    List Indexing


    There are some subtle ways to access the contents of lists — to slice them up and take only the parts we want. Let's create a list and access its contents:

    >>> a = ["dog","cat","bird","penguin"]
    >>> a
    ['dog', 'cat', 'bird', 'penguin']
    >>> a[0]
    'dog'
    >>> a[3]
    'penguin'
    >>>
             

    The above example shows that list indexing is zero-based: the indices for our four-element list are 0, 1, 2 and 3.

    >>> a[2:2]
    []
    >>> a[2:3]
    ['bird']
    >>> a[1:3]
    ['cat', 'bird']
    >>>
             

    In this two-argument form, called a slice, the first value is the index of the first desired element, and the second value is the index of the last desired element plus one.

    >>> a[:2]
    ['dog', 'cat']
    >>> a[2:]
    ['bird', 'penguin']
    >>>
              

    By leaving off one argument, we say we want all the members in that direction — "[:n]" means "all from the beginning to n-1" and "[n:]" means "all from n to the end".

    >>> a[-1]
    'penguin'
    >>>
             

    Surprised? The idea of negative indices is that we won't necessarily know how long a list is, but we know we want an element near the end. A negative index -n means "n places back from the end of the list", so a[-1] is the last element and a[-2] is the one before it.

    >>> a[1:-1]
    ['cat', 'bird']
    >>>
              

    The above is how one accesses all the list's members except the first and the last.

    >>> a[::-1]
    ['penguin', 'bird', 'cat', 'dog']
    >>>
             

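    The above is a simple way to make a copy of a list with the elements in reverse order. The optional third value in a slice is a step size, and a step of -1 walks through the list backward.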

    There are many similar operations — the reader should feel free to experiment with different indexing methods. Errors are harmless, and there are any number of variations on the above examples.
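    Here are a few of those variations as a script (Python 3 syntax). A slice takes up to three values, [start:stop:step], a negative index -n stands for len(a) - n, and every slice returns a new list that leaves the original untouched.

```python
a = ["dog", "cat", "bird", "penguin"]

# A negative index -n is shorthand for len(a) - n.
print(a[-1] == a[len(a) - 1])   # prints True

# The third slice value is a step: every second element...
print(a[::2])      # prints ['dog', 'bird']
# ...and a step of -1 walks the list backward.
print(a[::-1])     # prints ['penguin', 'bird', 'cat', 'dog']

# Slices are copies, so changing one leaves the original alone.
b = a[:]
b[0] = "wolf"
print(a[0])        # prints dog
```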

    List Comprehensions


    It may seem that I'm dwelling a long time on lists, but they're very important in program design, so it's time well spent. List comprehensions are operations on lists that create other lists, in various useful ways:

    >>> a = range(1,9)
    >>> a
    [1, 2, 3, 4, 5, 6, 7, 8]
    >>> [(x,x*x) for x in a]
    [(1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 36), (7, 49), (8, 64)]
    >>> [2**n for n in a]
    [2, 4, 8, 16, 32, 64, 128, 256]
    >>>

    That's the basic idea of a list comprehension — you create a new list out of an old one, or from the result of a sequence or range, plus any transformations you care to create. Here's a more exotic example:

    >>> [(c,ord(c)) for c in list("zygote")]
    [('z', 122), ('y', 121), ('g', 103), ('o', 111), ('t', 116), ('e', 101)]
    >>>

    Here is an example that nests list comprehensions to create a two-dimensional list:

    >>> [[x*y for x in range(1,13)] for y in range(1,13)]
    [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24],
    [3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36],
    [4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48],
    [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60],
    [6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72],
    [7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84],
    [8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96],
    [9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108],
    [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
    [11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121, 132],
    [12, 24, 36, 48, 60, 72, 84, 96, 108, 120, 132, 144]]
    >>>
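    A comprehension may also end with an if clause that filters which items are used. This sketch (Python 3 syntax; names are illustrative) places one next to the explicit loop it replaces. Both produce the same list.

```python
a = list(range(1, 9))

# Comprehension form: squares of the even numbers only.
even_squares = [x * x for x in a if x % 2 == 0]

# Equivalent explicit loop.
loop_result = []
for x in a:
    if x % 2 == 0:
        loop_result.append(x * x)

print(even_squares)                  # prints [4, 16, 36, 64]
print(even_squares == loop_result)   # prints True
```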

Python Scripts


    To create a Python script, first create a plain-text file, give it a suffix of ".py" and enter these lines:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    print "Hello World!"
             

    The above example script includes the standard heading for Python scripts. If you want your Python programs to run anywhere, use this heading. The first line tells Unix/Linux systems which interpreter to use when the script is run directly, as in "./first_program.py". Windows ignores it, but without it the script can't be launched that way on Unix/Linux, so it's a good idea to keep it. The second line is also a good idea: it declares that the source file is encoded in UTF-8, so non-ASCII characters in strings and comments are read correctly.

    On Unix/Linux platforms, remember to make your Python scripts executable before trying to run them. Let's say the above script file has been named "first_program.py":

    $ chmod +x *.py
    $ ./first_program.py
    Hello World!
    $
             

    Common Program Operations

    Here are some brief script examples (omitting the header lines shown above) that show how to do useful things.

    A keyboard input example:

    line = ""

    while (line != "q"):
      line = raw_input("Type something (q = quit): ")
      print "You typed \"%s\"." % line
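    In Python 3, raw_input() was renamed input(), and print became a function. The sketch below ports the loop to Python 3. As an illustrative design choice, it takes the line-reading function as a parameter so it can be exercised without a keyboard. The names echo_until_quit and read_line are hypothetical.

```python
def echo_until_quit(read_line=input):
    """Echo each line typed until the user enters 'q'.

    Returns how many lines were read, including the final 'q'.
    """
    count = 0
    line = ""
    while line != "q":
        line = read_line("Type something (q = quit): ")
        count += 1
        print('You typed "%s".' % line)
    return count


# Feeding canned input instead of the keyboard:
answers = iter(["hello", "q"])
n = echo_until_quit(lambda prompt: next(answers))
```

    Calling echo_until_quit() with no argument reads from the keyboard as the original does.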
             

    Here are function definitions to read and write text files:

    def write_entire_file(path,data):
      with open(path,'w') as f:
        f.write(data)

    def read_entire_file(path):
      with open(path) as f:
        return f.read()
       
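    def read_file_lines(path):
      # Iterate over the file object directly instead of reading it all at once;
      # rstrip() removes the line ending but keeps any leading indentation.
      with open(path) as f:
        for n, line in enumerate(f):
          print "Line %3d: %s" % (n+1, line.rstrip())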
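    A quick round trip exercises the first two functions. This sketch restates them so it runs on its own under Python 3. It writes to a scratch directory created with the standard tempfile module, and the file name is an arbitrary choice.

```python
import os
import tempfile


def write_entire_file(path, data):
    with open(path, 'w') as f:
        f.write(data)


def read_entire_file(path):
    with open(path) as f:
        return f.read()


# Write two lines to a scratch file, then read them back.
scratch_dir = tempfile.mkdtemp()
path = os.path.join(scratch_dir, "example.txt")
write_entire_file(path, "first line\nsecond line\n")
text = read_entire_file(path)
print(text.splitlines())   # prints ['first line', 'second line']

# Clean up the scratch file and directory.
os.remove(path)
os.rmdir(scratch_dir)
```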
             

    Here is a function that accepts a list as an argument and prints some formatted results:

    import math
         
    def display_list(data):
      for n in data:
        print "The square root of %2.0f is %3.16f" % (n,math.sqrt(n))

    display_list(range(2,11))
             

    Here is the output:

    The square root of  2 is 1.4142135623730951
    The square root of  3 is 1.7320508075688772
    The square root of  4 is 2.0000000000000000
    The square root of  5 is 2.2360679774997898
    The square root of  6 is 2.4494897427831779
    The square root of  7 is 2.6457513110645907
    The square root of  8 is 2.8284271247461903
    The square root of  9 is 3.0000000000000000
    The square root of 10 is 3.1622776601683795
             

    Here is a nested loop, a loop within a loop:

    def show_matrix(mat):
      for y in mat:
        for x in mat:
          print "%3d" % (x * y),
        print

    show_matrix(range(1,13))
             

    Here is the output:

      1   2   3   4   5   6   7   8   9  10  11  12
      2   4   6   8  10  12  14  16  18  20  22  24
      3   6   9  12  15  18  21  24  27  30  33  36
      4   8  12  16  20  24  28  32  36  40  44  48
      5  10  15  20  25  30  35  40  45  50  55  60
      6  12  18  24  30  36  42  48  54  60  66  72
      7  14  21  28  35  42  49  56  63  70  77  84
      8  16  24  32  40  48  56  64  72  80  88  96
      9  18  27  36  45  54  63  72  81  90  99 108
     10  20  30  40  50  60  70  80  90 100 110 120
     11  22  33  44  55  66  77  88  99 110 121 132
     12  24  36  48  60  72  84  96 108 120 132 144
         

    Notice in the above example that the first print statement doesn't emit a linefeed because of the trailing comma, and the second print statement's sole purpose is to emit a linefeed at the end of each row.
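    Python 3 replaced the print statement with a function, and the trailing-comma form no longer exists. One way to produce the same table there is to build each row as a string with join() and print it in one call. This is a sketch, and the function name matrix_rows is hypothetical.

```python
def matrix_rows(values):
    """Build each row of a multiplication table as a string."""
    rows = []
    for y in values:
        # "%3d" pads each product to three characters; " ".join adds one
        # space between cells, matching the spacing of the table above.
        rows.append(" ".join("%3d" % (x * y) for x in values))
    return rows


for row in matrix_rows(range(1, 13)):
    print(row)
```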

Indentation, logical blocks

    Newcomers to Python should be aware that whitespace is syntactically significant and that the Python interpreter will accept any consistent indentation, even if a particular block's indentation is at odds with others in the same source file. The only rule is that the indentation of a logical block be consistent with itself:

    for c in list("This is accepted."):
            print c,
    print

    for c in list("And so is this."):
        print c,
    print

    The first example above indents by eight spaces, the second by four, and Python accepts both. Most programming editors try to keep indentation consistent by automatically indenting a new line after a line ending with ":", but as a programming project moves forward and logical blocks are moved by hand, this discipline can erode. My point is that Python won't reject valid code because its indentation is inconsistent from one block to another (unless a line's indentation doesn't agree with the lines around it in its own block), so for the sake of consistent, readable code, programmers should pay attention to this issue themselves.

    My Python "beautifier" script PyBeautify enforces consistent indentation by flagging lines that aren't consistent (it won't try to correct indentation errors, only flag them). The standard Python module "tabnanny" performs a related check, detecting ambiguous indentation such as inconsistent mixing of tabs and spaces, but it doesn't enforce overall consistency (i.e. the idea that all indentations should be multiples of the same basic unit).
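    Here is one way such a consistency check might work, as a minimal sketch in Python 3. It is not PyBeautify. The function name flag_odd_indents, the four-space unit, and the treatment of tabs as eight columns are all assumptions. It relies on the standard tokenize module, whose INDENT tokens mark the start of each logical block, so continuation lines and multi-line strings are skipped automatically. Blocks whose indentation is not a multiple of the unit are reported, never changed.

```python
import io
import tokenize


def flag_odd_indents(source, unit=4):
    """Return (line_number, width) for each block whose indentation
    is not a multiple of `unit` columns. Nothing is modified."""
    flagged = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.INDENT:
            # tok.string holds the block's full leading whitespace;
            # tabs are expanded to 8 columns (an assumption).
            width = len(tok.string.expandtabs(8))
            if width % unit != 0:
                flagged.append((tok.start[0], width))
    return flagged


sample = (
    "for c in 'ok':\n"
    "        print(c)\n"      # 8 spaces: a multiple of 4
    "for c in 'odd':\n"
    "   print(c)\n"           # 3 spaces: flagged
)
print(flag_odd_indents(sample))   # prints [(4, 3)]
```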
