PEP:215
Title:String Interpolation
Version:$Revision: 1.3 $
Author:ping@lfw.org (Ka-Ping Yee)
Status:Draft
Type:Standards Track
Python-Version:2.1
Created:24-Jul-2000
Post-History:


Abstract

    This document proposes a string interpolation feature for Python
    to allow easier string formatting.  The suggested syntax change
    is the introduction of a '$' prefix that triggers the special
    interpretation of the '$' character within a string, in a manner
    reminiscent to the variable interpolation found in Unix shells,
    awk, Perl, or Tcl.


Copyright

    This document is in the public domain.


Specification

    Strings may be preceded with a '$' prefix that comes before the
    leading single or double quotation mark (or triplet) and before
    any of the other string prefixes ('r' or 'u').  Such a string is
    processed for interpolation after the normal interpretation of
    backslash-escapes in its contents.  The processing occurs just
    before the string is pushed onto the value stack, each time the
    string is pushed.  In short, Python behaves exactly as if '$'
    were a unary operator applied to the string.  The operation
    performed is as follows:

    The string is scanned from start to end for the '$' character
    (\x24 in 8-bit strings or \u0024 in Unicode strings).  If there
    are no '$' characters present, the string is returned unchanged.

    Any '$' found in the string, followed by one of the two kinds of
    expressions described below, is replaced with the value of the
    expression as evaluated in the current namespaces.  The value is
    converted with str() if the containing string is an 8-bit string,
    or with unicode() if it is a Unicode string.

    1.  A Python identifier optionally followed by any number of
        trailers, where a trailer consists of:
            - a dot and an identifier,
            - an expression enclosed in square brackets, or
            - an argument list enclosed in parentheses
        (This is exactly the pattern expressed in the Python grammar
        by "NAME trailer*", using the definitions in Grammar/Grammar.)

    2.  Any complete Python expression enclosed in curly braces.

    Two dollar-signs ("$$") are replaced with a single "$".


Examples

    Here is an example of an interactive session exhibiting the
    expected behaviour of this feature.

        >>> a, b = 5, 6
        >>> print $'a = $a, b = $b'
        a = 5, b = 6
        >>> $u'uni${a}ode'
        u'uni5ode'
        >>> print $'\$a'
        5
        >>> print $r'\$a'
        \5
        >>> print $'$$$a.$b'
        $5.6
        >>> print $'a + b = ${a + b}'
        a + b = 11
        >>> import sys
        >>> print $'References to $a: $sys.getrefcount(a)'
        References to 5: 15
        >>> print $"sys = $sys, sys = $sys.modules['sys']"
        sys = <module 'sys' (built-in)>, sys = <module 'sys' (built-in)>
        >>> print $'BDFL = $sys.copyright.split()[4].upper()'
        BDFL = GUIDO


Discussion

    '$' is chosen as the interpolation character within the
    string for the sake of familiarity, since it is already used
    for this purpose in many other languages and contexts.

    It is then natural to choose '$' as a prefix, since it is a
    mnemonic for the interpolation character.

    Trailers are permitted to give this interpolation mechanism
    even more power than the interpolation available in most other
    languages, while the expression to be interpolated remains
    clearly visible and free of curly braces.

    '$' works like an operator and could be implemented as an
    operator, but that prevents the compile-time optimization
    and presents security issues.  So, it is only allowed as a
    string prefix.


Security Issues

    "$" has the power to eval, but only to eval a literal.  As
    described here (a string prefix rather than an operator), it
    introduces no new security issues since the expressions to be
    evaluated must be literally present in the code.


Implementation

    The Itpl module at http://www.lfw.org/python/Itpl.py provides a
    prototype of this feature.  It uses the tokenize module to find
    the end of an expression to be interpolated, then calls eval()
    on the expression each time a value is needed.  In the prototype,
    the expression is parsed and compiled again each time it is
    evaluated.

    As an optimization, interpolated strings could be compiled
    directly into the corresponding bytecode; that is,

        $'a = $a, b = $b'

    could be compiled as though it were the expression

        ('a = ' + str(a) + ', b = ' + str(b))

    so that it only needs to be compiled once.