Python Things

Python is a clean and powerful semicompiled object-oriented programming language. If you haven't heard of Python, go find out about it now!

Stop press. :) Strange "Python" twins sighted at Python 10.


Here is a collection of some possibly useful Python things. They're in chronological order, so scroll down to the end for the most recent stuff.

Perlish features for Python regexes
The internal regex module has a syntax mode labelled RE_SYNTAX_AWK that helps to make regexes more familiar for Perl hackers by allowing unbackslashed parens and pipes. But it doesn't have \d, \D, \s, or \S; and \w and \W don't work quite the same as Perl's. So this patch adds these conveniences under a new regex syntax flag named RE_EXTRA_CLASSES. For kicks, i also added \h for hexadecimal digits and \l for letters of the alphabet.
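For readers on a current Python, where the old regex module is gone, the standard re module already understands \d, \D, \s, \S, \w, and \W. It has no \h or \l, so the sketch below spells those two out as bracket expressions. The names HEX_DIGIT and LETTER are made up for illustration, and none of this is the patch itself.

```python
import re

# Hypothetical stand-ins for the patch's extra classes.
HEX_DIGIT = r"[0-9A-Fa-f]"   # what the patch calls \h
LETTER = r"[A-Za-z]"         # what the patch calls \l

# Six hexadecimal digits after a hash sign.
colour = re.compile(r"#" + HEX_DIGIT + r"{6}")
print(colour.search("background: #ff8800;").group(0))

# Some letters, some whitespace, then some digits.
word_then_number = re.compile(LETTER + r"+\s+\d+")
print(word_then_number.search("see page 42").group(0))
```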

Also, this patch adds a syntax flag named RE_MINIMAL_OPS which enables the new operators ??, *?, and +? (from Perl 5). These have similar meanings to their counterparts ?, *, and +, but the minimal versions prefer to match the shortest string possible, which can be really useful in some situations.
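The difference between the ordinary and minimal operators is easiest to see side by side. This small sketch uses the re module from current Python, which supports the same *? and +? spellings; it is not the patched regex module described here.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# The ordinary + grabs as much as it can, so it runs to the last ">".
greedy = re.search(r"<.+>", text).group(0)

# The minimal +? stops at the first ">" that lets the match succeed.
minimal = re.search(r"<.+?>", text).group(0)

print(greedy)
print(minimal)
```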

These two features are both enabled when you select the regex syntax dubiously labelled RE_SYNTAX_PERLISH. This patchfile should be applied to the Python 1.4 beta 3 distribution, and it alters three files: Modules/regexpr.c, Modules/regexpr.h, and Lib/

To apply the patch, simply go to the directory into which you extracted the Python tar file (probably named Python1.4beta3), and -- if you saved the patch file as, say, /home/bob/regex-perlish.patch -- go

patch </home/bob/regex-perlish.patch
Then just run make to build the new Python interpreter.
extra registers for regex matches
The compiled regular expression objects produced by the internal regex module don't have attributes equivalent to Perl's $&, $`, or $' (the part of the string that matched, everything before what matched, and everything after what matched, respectively). It's possible to construct these strings using the indices from the regs attribute, but it's somewhat inconvenient, harder to read, and slower.

So i suggested the addition of three attributes, before, found, and after, to the compiled regex objects. Andrew Kuchling almost immediately posted a patch to do just that.
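To show what those three attributes would save you from writing, here is a rough equivalent built from match indices. It uses match objects from the current re module rather than the regs attribute of the old regex module, and the helper name split_around is invented for this example.

```python
import re

def split_around(match):
    """Return (before, found, after) for a match object."""
    return (match.string[:match.start()],   # like Perl's $`
            match.group(0),                 # like Perl's $&
            match.string[match.end():])     # like Perl's $'

m = re.search(r"\d+", "python 1.4 final")
before, found, after = split_around(m)
print(repr(before), repr(found), repr(after))
```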

rxb, a regex builder
Here's an idea suggested by Greg Ewing that i coded up one day. This Python module lets you verbosely build regular expressions with phrases such as
digit + some(whitespace) + exactly(':)')
without worrying about the exact syntax or bothering to backslash dangerous characters. You might like it if you find yourself wasting a lot of time looking up regular-expression syntax.
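The general idea can be sketched in a few lines with operator overloading. This is not the rxb source: the class and helpers below are a simplified guess at the technique, written against the current re module, and they reuse the names digit, some, whitespace, and exactly only so the phrase above works unchanged.

```python
import re

class Pattern:
    """Wraps a regex source string and builds bigger patterns with + and *."""

    def __init__(self, source):
        self.source = source

    def __add__(self, other):
        return Pattern(self.source + as_pattern(other).source)

    def __radd__(self, other):
        return Pattern(as_pattern(other).source + self.source)

    def __mul__(self, count):
        # Repeat the whole pattern exactly `count` times.
        return Pattern("(?:%s){%d}" % (self.source, count))

    def search(self, text):
        return re.search(self.source, text)

def as_pattern(value):
    """Plain strings are taken literally, so dangerous characters get escaped."""
    if isinstance(value, Pattern):
        return value
    return Pattern(re.escape(value))

def exactly(text):
    return as_pattern(text)

def some(pattern):
    """One or more repetitions."""
    return Pattern("(?:%s)+" % as_pattern(pattern).source)

digit = Pattern("[0-9]")
whitespace = Pattern(r"\s")

pat = digit + some(whitespace) + exactly(":)")
print(pat.search("room 7   :) ok").group(0))
```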

This new version allows a more concise syntax and generates instances of a Pattern class, instead of strings; this way you can directly use methods like search on the result, and you don't need to worry about compiling and caching. It makes regexes more convenient to use. Here's an example:

>>> import rxb
>>> rxb.welcome()
>>> pat = label.spam(some(letters)) + digit
>>>'foo bar python8')
>>> pat.spam

>>> pat
<Pattern \\(<spam>[A-Za-z]+\\)[0-9]>

>>> rxb.banish()
or, a slightly more complex example from Grail:
import rxb

flag = member(letters, '-')

LISTING_PATTERN = (begline +
        flag +                            # file type
        flag*3 + flag*3 + flag*3) +       # owner, group, world perms
        somespace + anything +            # links, owner, grp, sz, date
        somespace +
        digit*2 + maybe(':') + digit*2 +  # year or hh:mm
        somespace) +
        anybut('->')) +                   # anything before symlink
        somespace + '->' + anything)) +   # possible symlink

Thanks to William S. Lear, who pointed out a problem with this example which was due to a bug in the regex.symcomp() routine. The rxb module has recently been modified to produce regular expressions using backslashed (instead of bare) parentheses for grouping, as a workaround for the symcomp() bug.

The bug is this: if you open a new subgroup with a left-parenthesis immediately following the greater-than sign which ends a group label, symcomp() will miss the parenthesis and thus miscount the rest of the subgroups. The bug has been fixed in Python 1.4 final.

concatenation in place for lists
Since lists are mutable, you can modify them in place using list methods. Using list.append(element) is much faster than doing list = list + [element], since the latter has to make a new object with a copy of the whole list. Unfortunately, list.append can only append one element at a time. I wrote the following patch to add the concat method to lists, which will concatenate a list argument onto another list in place.
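The in-place behaviour is easy to check by keeping a second reference to the same list. Current Python lists have an extend method that does this kind of in-place concatenation, so the sketch below uses it in place of the concat method from the patch.

```python
items = [1, 2, 3]
alias = items              # a second name for the same list object

items.extend([4, 5])       # grows the existing list in place
print(alias)               # the alias sees the new elements too

items = items + [6]        # builds a whole new list and rebinds the name
print(alias is items)      # the alias still points at the old list
```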
new version of
I decided to rewrite as an exercise in text-processing with Python (in part, this is what prompted me to think of the regex modifications above, but this script does not require them). The new version is quite a bit more general than the old, and should be able to convert most reasonably-formatted FAQs into HTML, provided that questions are preceded with Q. and answers preceded with A.. Check the top of the script for details.
small improvement to
This tiny patch makes dis.disco() display the names of local variables along with the disassembly.
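For comparison, the dis module in current Python prints local variable names next to the instructions that use them. This sketch captures that output with the documented dis.dis function; the sample function add_tax is made up for illustration.

```python
import dis
import io

def add_tax(price, rate):
    total = price * (1 + rate)
    return total

# Collect the disassembly in a string instead of printing it directly.
buffer = io.StringIO()
dis.dis(add_tax, file=buffer)
listing = buffer.getvalue()

# Local names such as price and rate appear in parentheses in the listing.
print(listing)
```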
string interpolation for Python
This module lets you quickly and conveniently interpolate values into strings (in the flavour of Perl or Tcl, but with less extraneous punctuation). You get a bit more power than in other languages, because this module allows subscripting, slicing, function calls, attribute lookup, or arbitrary expressions. Here are the simple interpolation rules:
  1. A dollar sign and a name, possibly followed by any of:
    • an open paren, and anything up to the matching paren
    • an open bracket, and anything up to the matching bracket
    • a period and a name
    any number of times, is evaluated as a Python expression.
  2. A dollar sign immediately followed by an open curly-brace, and anything up to the matching curly-brace, is evaluated as a Python expression.
  3. Two dollar signs in a row give you one literal dollar sign.
  4. Anything else is left alone.
Expressions are evaluated in the namespace of the caller. This lets you painlessly do:
"Here is a $string."
"Here is a $module.member."

"Here is an $object.member."

"Here is a $functioncall(with, arguments)."

"Here is an ${arbitrary + expression}."

"Here is an $array[3] member."

"Here is a $dictionary['member']."

You can download the module from this site. It contains a class named 'Itpl' for representing interpolated-string objects, and a function named 'printpl' which will interpolate a given string and print the results. Here is the documentation page generated from the module.
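The rules above can be roughly approximated with one regular expression and eval. This is a simplified sketch, not the Itpl module: it handles only rules 2 to 4 plus the dotted-name part of rule 1 (no parentheses or brackets after a name), the function name interpolate is invented here, and the caller lookup relies on sys._getframe, which is specific to CPython.

```python
import re
import sys

# "$$", "${expression}", or "$name" optionally followed by ".attribute" parts.
_TOKEN = re.compile(
    r"\$(?:(\$)|\{([^{}]*)\}|([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*))")

def interpolate(text, namespace=None):
    """Replace $-expressions in text with their evaluated values."""
    if namespace is None:
        # Assumption: evaluate in the caller's namespace (CPython only).
        frame = sys._getframe(1)
        namespace = dict(frame.f_globals)
        namespace.update(frame.f_locals)
    else:
        namespace = dict(namespace)   # eval() adds __builtins__ to its globals

    def substitute(match):
        if match.group(1):            # rule 3: "$$" gives a literal "$"
            return "$"
        expression = match.group(2)   # rule 2: "${...}"
        if expression is None:
            expression = match.group(3)   # rule 1 (dotted names only)
        return str(eval(expression, namespace))

    # Rule 4: anything the pattern does not match is left alone.
    return _TOKEN.sub(substitute, text)

string = "spam"
print(interpolate("Here is a $string."))
print(interpolate("Here is an ${1 + 2} expression that costs $$5."))
```

Because it calls eval, a function like this should only ever be given template strings you trust.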

generalized string.join
This patch generalizes the string.join() routine to accept any instance of a class that implements the __len__ and __getitem__ disciplines, rather than accepting only the built-in sequence types (list and tuple). Your __getitem__ method will be called twice for each element (once to add up the total length of the result, and once during construction), so it had better return consistent results for this to work...

This lack of safety is bad. If the returned string lengths are inconsistent, you can cause a segmentation fault. Watch here for a more robust update.
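Here is what such a class might look like. The sketch runs on current Python, where the join method of strings accepts any iterable and falls back to calling __getitem__ with 0, 1, 2, ... until IndexError is raised; it does not exercise the patched string.join() itself, and the Countdown class is invented for the example.

```python
class Countdown:
    """A read-only sequence of the strings 'start', ..., '2', '1'."""

    def __init__(self, start):
        self.start = start

    def __len__(self):
        return self.start

    def __getitem__(self, index):
        if not 0 <= index < self.start:
            raise IndexError(index)
        # Always returns the same string for the same index.
        return str(self.start - index)

print(", ".join(Countdown(3)))
```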

assignment of while and if conditions (warning: controversial!)
This small patch, recently discussed on the Python newsgroup to some degree, changes the syntax of while and if statements to allow an optional from keyword to save the result of the conditional in a variable. This lets you write, for example:
while line from sys.stdin.readline():

if status from pipe.close():
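For comparison, this is the kind of loop the first example is meant to replace, written in unpatched Python. An io.StringIO object stands in for sys.stdin so the sketch can be run on its own.

```python
import io

stream = io.StringIO("first\nsecond\n")   # stands in for sys.stdin

lines = []
while True:
    line = stream.readline()
    if not line:              # readline() returns "" at end of file
        break
    lines.append(line.rstrip("\n"))

print(lines)
```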

My goal was to put the condition where it belongs instead of having to put extra "if ... break" statements inside the loop or duplicate the condition at the end where it is less apparent. There have been a fair number of comments about this. Just for fun, i'll quote some here (with apologies to the speakers)...

"Reads like Python." (David Ascher)

"... a very elegant solution, IMHO." (Andrew Kuchling)

"... seems like a C idiom trying to work its way into Python." (Johann Hibschman)

"... can easily be emulated using a file iterator." (Fredrik Lundh)

"I like this proposal." (Anthony Baxter)

"... looks just right to me." (Konrad Hinsen)

"I don't see why grammar changes are needed for what is essentially just an addition to a class's methods..." (Tony J Ibbs)

"Is a syntax change really worth it when all you save is one (1) line of source code?" (Fredrik Lundh)

"I'd really like to see it in the official release." (Marnix Klooster)

"I don't see what all the fuss is about. I commonly use a while 1 loop with one or more if:break clauses..." (Donn Cave)

"I have to concur with Donn on this one. I'm never really been inconvenienced by using the while 1:...break idiom." (Barry Warsaw)

"... agree with Donn that this is all unnecessary and we're better off with the 'while 1' idiom." (Guido van Rossum)

"I really like the 'from' proposal." (Richard Jones)

Oh, well. Anyway, here it is. After applying the patch, you need to go into the Grammar/ subdirectory and do a make to rebuild the parser (isn't that cool?) before going back up and doing make to build the interpreter.

improved tokenizer module
You don't need to get if you have Python 1.5. A patched copy of this module made it into the standard distribution.

The tokenize module included with Python 1.3 and Python 1.4 does not quite "match the working of the Python tokenizer exactly", as it claims. Specifically, the new double-star operator is not recognized, CR/LF is not accepted at the end of a line, FF is not accepted, and there is no support for triple-quoted strings or backslash-continuations of lines. The new module fixes the regex in tokenize.tokenprog to accept the double-star, but such a regex is only good for scanning individual lines of text.

So the new module (posted 1 April) includes a new function tokenize.tokenize() which will scan streams of text. The function accepts a readline-like method which is called to come up with the next input line (or "" for EOF) and a "token-eater" function. The "token-eater" function should accept five arguments: the type of the token, a string containing the token, the starting and ending (row, column) coordinates of the token, and the line itself. This function should match the working of the Python tokenizer, and will return INDENT and DEDENT tokens as the line indentation changes.
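The same five values are still what the tokenize module in current Python hands out, although today it yields them from a generator instead of calling a "token-eater" for you. This sketch adapts the generator interface to a callback of that shape; the token_eater function and the sample source are made up for illustration.

```python
import io
import tokenize

source = "def square(x):\n    return x ** 2\n"

seen = []

def token_eater(kind, text, start, end, line):
    """Receives the token type, its text, both coordinates, and the line."""
    seen.append((tokenize.tok_name[kind], text, start, end))

# generate_tokens() takes a readline-like callable that returns strings.
for token in tokenize.generate_tokens(io.StringIO(source).readline):
    token_eater(*token)

for entry in seen:
    print(entry)
```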

The information your "token-eater" function gets from tokenize.tokenize() should be enough to exactly reconstruct the original source script, if you need it. The regurgitate script below is an example of how to do this. The cedit script below is an example of using the tokenizer to colourize Python code in a simple Tk text-editing window.

simple text-based Python "lint" script
This entry in the growing list of Python "lint" scripts has two (so far) unique features in particular:

The principle is simple -- any identifier which is seen only once in your script is considered suspect. Warnings are not generated for keywords or for built-in object methods (when used as methods); extra warnings are generated for identifiers that look like __reserved__ words but aren't known.

With the -i option, this script will also import modules whenever it sees import statements in your script, so that if you use string.split only once, there won't be a complaint about split if you have imported string in your script.

To use this script, you need to also have the "" module mentioned above.
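The core principle, flagging any identifier that appears only once, fits in a few lines on top of a tokenizer. This is only a sketch of that one idea using the tokenize module from current Python; it ignores built-in methods, __reserved__ names, and imports, and the function name names_seen_once is invented here.

```python
import collections
import io
import keyword
import tokenize

def names_seen_once(source):
    """Return the sorted identifiers that occur exactly once in source."""
    counts = collections.Counter()
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        # Keywords such as "for" and "in" are NAME tokens too, so skip them.
        if token.type == tokenize.NAME and not keyword.iskeyword(token.string):
            counts[token.string] += 1
    return sorted(name for name, count in counts.items() if count == 1)

script = (
    "total = 0\n"
    "for item in items:\n"
    "    totl = total + item\n"    # "totl" is a typo for "total"
)
print(names_seen_once(script))
```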

an interactive Tk-enabled shell (like wish)
This script uses the _tkinter.createfilehandler call and a simple Python interpreter written in Python to make it look like you're running Python the normal way in a terminal window, but still have live widgets in Tk windows, like wish. Funnily enough, this one's called pywish. With it, you can play with user interfaces in a quick and natural way:
wheat[251]% pywish
Python 1.4 (Mar 17 1997) [C] (pywish)
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import sys
>>> def hello():
...   print 'hi!'
>>> b = Tkinter.Button(root, text='hi', command=hello)
>>> b.pack()
>>> hi!
>>> q = Tkinter.Button(root, text='quit', command=sys.exit)
>>> q.pack()
>>> hi!

The words in boldface are not ones that i entered; they were printed on the screen when i pressed the "hi" button in the Tk window (first three times, then twice, and then i pressed the "quit" button, which exited to the shell prompt).

My thanks are due to Guido van Rossum for pointing out createfilehandler so i could produce this effect, and also for a tip on successful use of compile and exec with multi-line strings: tack on a few newlines.
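The "interpreter written in Python" part of this can be sketched with the standard codeop module, which reports whether the text typed so far forms a complete statement. This is a guess at the general technique, not the pywish source; it leaves out Tk and createfilehandler entirely, and the helper name run_lines is made up.

```python
import codeop

def run_lines(lines, namespace):
    """Feed lines one at a time, executing each block once it is complete."""
    pending = []
    for line in lines:
        pending.append(line)
        source = "\n".join(pending)
        # compile_command() returns None while the input is still incomplete.
        code = codeop.compile_command(source, "<input>", "single")
        if code is not None:
            exec(code, namespace)
            pending = []
    return pending    # whatever is still waiting for more input

session = [
    "x = 1",
    "def double(n):",
    "    return n * 2",
    "",                       # a blank line ends the function definition
    "y = double(x + 20)",
]
namespace = {}
leftover = run_lines(session, namespace)
print(namespace["y"])
```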

a Tkinter-based console component
This console features more tolerant pasting of code from other interactive sessions, better handling of continuations than the standard Python interpreter, highlighting of the most recently-executed code block, the ability to edit and reexecute previously entered code, a history of recently-entered lines, and automatic multi-level completion with pop-up menus. It will also present pop-up help based on documentation strings.

I plan to clean it up a little to make it more usable as a general component in other applications, but for now i'll just post it in its current state and hope you find it useful. You can just run this script directly to pop up a Console window.

Roundup, a simple and effective bug-tracking system (see the short paper).
Roundup is out! (sort of) You will really have to excuse the mess, and don't say i didn't warn you! It has been hurriedly extracted from its running implementation into a hacked-up copy -- but at least you should be able to run it yourself if you have a web server set up.
htmldoc, a documentation generator that produces HTML documents from live Python objects.
You can see samples of the output from htmldoc (i ran it over the Python 1.5.2 standard library) at


To do:

The "htmldoc" module is actually quite small (only about 300 lines) as most of the hard work has been factored out into the "inspect" module -- a non-HTML-specific collection of routines for getting all kinds of information out of your Python objects. My favourite routine in "inspect" is inspect.getsource(object), which can get you the source code for a function, method, or class.
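A quick way to try inspect.getsource is to point it at a function from the standard library. This sketch uses the inspect module as shipped with current Python and assumes the library's .py source files are installed alongside the interpreter; textwrap.dedent is just a convenient target.

```python
import inspect
import textwrap

# Fetch the source text of a function written in Python.
source = inspect.getsource(textwrap.dedent)
print(source.splitlines()[0])

# getdoc() returns the cleaned-up documentation string for the same object.
summary = inspect.getdoc(textwrap.dedent).splitlines()[0]
print(summary)
```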

pydoc, a documentation generator that works from the command line, in the Python interpreter, and as a web server in the background.
pydoc sys                    # document a built-in module
pydoc copy                   # document a module written in Python
pydoc types                  # document a module written in Python
pydoc abs                    # document a built-in function
pydoc repr.Repr              # document a single class
pydoc -k mail                # keyword search like man -k
pydoc -p 6789                # start a web server at http://localhost:6789/

>>> from pydoc import help
>>> help("getopt.getopt")    # document something you haven't imported
>>> import calendar
>>> help(calendar)           # document a live object

To get it, download these two files:

copyright © by Ka-Ping Yee, updated Mon 20 Aug 2001