PEP 2xx -- Iterators

PEP:	2xx
Title:	Iterators
Version:	$Revision: 1.1 $
Author:	ping@lfw.org (Ka-Ping Yee)
Status:	Draft
Type:	Standards Track
Python-Version:	2.1
Created:	30-Jan-2001
Post-History:

Abstract

    This document proposes an iteration interface that objects can
    provide to control the behaviour of 'for' loops.  Looping is
    customized by providing a method that produces an iterator object.
    The iterator should be a callable object that returns the next
    item in the sequence each time it is called, raising an exception
    when no more items are available.

Copyright

    This document is in the public domain.

Sequence Iterators

    A new field named 'sq_iter' for requesting an iterator is added
    to the PySequenceMethods table.  Upon an attempt to iterate over
    an object with a loop such as

        for item in sequence:
            ...body...

    the interpreter looks for the 'sq_iter' of the 'sequence' object.
    If the method exists, it is called to get an iterator; it should
    return a callable object.  If the method does not exist, the
    interpreter produces a built-in iterator object in the following
    manner (described in Python here, but implemented in the core):

        def make_iterator(sequence):
            def iterator(sequence=sequence, index=[0]):
                item = sequence[index[0]]
                index[0] += 1
                return item
            return iterator

    To execute the above 'for' loop, the interpreter would proceed as
    follows, where 'iterator' is the iterator that was obtained:

        while 1:
            try:
                item = iterator()
            except IndexError:
                break
            ...body...

    (Note that the 'break' above doesn't translate to a "real" Python
    break, since it would go to the 'else:' clause of the loop whereas
    a "real" break in the body would skip the 'else:' clause.)

    The list() and tuple() built-in functions would be updated to use
    this same iterator logic to retrieve the items in their argument.

    List and tuple objects would implement the 'sq_iter' method by
    calling the built-in make_iterator() routine just described.

    Instance objects would implement the 'sq_iter' method as follows:

        if hasattr(self, '__iter__'):
            return self.__iter__()
        elif hasattr(self, '__getitem__'):
            return make_iterator(self)
        else:
            raise TypeError, thing.__class__.__name__ + \
                ' instance does not support iteration'

    Extension objects can implement 'sq_iter' however they wish, as
    long as they return a callable object.

Mapping Iterators

    An additional proposal from Guido is to provide special syntax
    for iterating over mappings.  The loop:

        for key:value in mapping:

    would bind both 'key' and 'value' to a key-value pair from the
    mapping on each iteration.  Tim Peters suggested that similarly,

        for key: in mapping:

    could iterate over just the keys and

        for :value in mapping:

    could iterate over just the values.

    The syntax is unambiguous since the new colon is currently not
    permitted in this position in the grammar.

    This behaviour would be provided by additional methods in the
    PyMappingMethods table: 'mp_iteritems', 'mp_iterkeys', and
    'mp_itervalues' respectively.  'mp_iteritems' is expected to
    produce a callable object that returns a (key, value) tuple;
    'mp_iterkeys' and 'mp_itervalues' are expected to produce a
    callable object that returns a single key or value.

    The implementations of these methods on instance objects would
    then check for and call the '__iteritems__', '__iterkeys__',
    and '__itervalues__' methods respectively.

    When 'mp_iteritems', 'mp_iterkeys', or 'mp_itervalues' is missing,
    the default behaviour is to do make_iterator(mapping.items()),
    make_iterator(mapping.keys()), or make_iterator(mapping.values())
    respectively, using the definition of make_iterator() above.

Indexing Sequences

    The special syntax described above can be applied to sequences
    as well, to provide the long-hoped-for ability to obtain the
    indices of a sequence without the strange-looking 'range(len(x))'
    expression.

        for index:item in sequence:

    causes 'index' to be bound to the index of each item as 'item' is
    bound to the items of the sequence in turn, and

        for index: in sequence:

    simply causes 'index' to start at 0 and increment until an attempt
    to get sequence[index] produces an IndexError.  For completeness,

        for :item in sequence:

    is equivalent to

        for item in sequence:

    In each case we try to request an appropriate iterator from the
    sequence.  In summary:

        for k:v in x    looks for mp_iteritems, then sq_iter
        for k: in x     looks for mp_iterkeys, then sq_iter
        for :v in x     looks for mp_itervalues, then sq_iter
        for v in x      looks for sq_iter

    If we fall back to sq_iter in the first two cases, we generate
    indices for k as needed, by starting at 0 and incrementing.

    The implementation of the mp_iter* methods on instance objects
    then checks for methods in the following order:

        mp_iteritems    __iteritems__, __iter__, items, __getitem__
        mp_iterkeys     __iterkeys__, __iter__, keys, __getitem__
        mp_itervalues   __itervalues__, __iter__, values, __getitem__
        sq_iter         __iter__, __getitem__

    If a __iteritems__, __iterkeys__, or __itervalues__ method is
    found, we just call it and use the resulting iterator.  If a
    mp_* function finds no such method but finds __iter__ instead,
    we generate indices as needed.

    Upon finding an items(), keys(), or values() method, we use
    make_iterator(x.items()), make_iterator(x.keys()), or
    make_iterator(x.values()) respectively.  Upon finding a
    __getitem__ method, we use it and generate indices as needed.

    For example, the complete implementation of the mp_iteritems
    method for instances can be roughly described as follows:

        def mp_iteritems(thing):
            if hasattr(thing, '__iteritems__'):
                return thing.__iteritems__()
            if hasattr(thing, '__iter__'):
                def iterator(sequence=thing, index=[0]):
                    item = (index[0], sequence.__iter__())
                    index[0] += 1
                    return item
                return iterator
            if hasattr(thing, 'items'):
                return make_iterator(thing.items())
            if hasattr(thing, '__getitem__'):
                def iterator(sequence=thing, index=[0]):
                    item = (index[0], sequence[index[0]])
                    index[0] += 1
                    return item
                return iterator
            raise TypeError, thing.__class__.__name__ + \
                ' instance does not support iteration over items'

Examples

    Here is a class written in Python that represents the sequence of
    lines in a file.

        class FileLines:
            def __init__(self, filename):
                self.file = open(filename)
            def __iter__(self):
                def iter(self=self):
                    line = self.file.readline()
                    if line: return line
                    else: raise IndexError
                return iter

        for line in FileLines('spam.txt'):
            print line

    And here's an interactive session demonstrating the proposed new
    looping syntax:

        >>> for i:item in ['a', 'b', 'c']:
        ...     print i, item
        ...
        0 a
        1 b
        2 c
        >>> for i: in 'abcdefg':        # just the indices, please
        ...     print i,
        ... print
        ...
        0 1 2 3 4 5 6
        >>> for k:v in os.environ:      # os.environ is an instance, but
        ...     print k, v              # this still works because we fall
        ...                             # back to calling items()
        MAIL /var/spool/mail/ping
        HOME /home/ping
        DISPLAY :0.0
        TERM xterm
        .
        .
        .

Rationale

    If all the parts of the proposal are included, this addresses many
    concerns in a consistent and flexible fashion.  Among its chief
    virtues are the following three -- no, four -- no, five -- points:

    1. It provides an extensible iterator interface.

    2. It resolves the endless "i indexing sequence" debate.

    3. It allows performance enhancements to dictionary iteration.

    4. It allows one to provide an interface for just iteration
       without pretending to provide random access to elements.

    5. It is backward-compatible with all existing user-defined
       classes and extension objects that emulate sequences and
       mappings, even mappings that only implement a subset of
       {__getitem__, keys, values, items}.

Errors

    Errors that occur during sq_iter, mp_iter*, or the __iter*__
    methods are allowed to propagate normally to the surface.

    An attempt to do

        for item in dict:

    over a dictionary object still produces:

        TypeError: loop over non-sequence

    An attempt to iterate over an instance that provides neither
    __iter__ nor __getitem__ produces:

        TypeError: instance does not support iteration

    Similarly, an attempt to do mapping-iteration over an instance
    that doesn't provide the right methods should produce one of the
    following errors:

        TypeError: instance does not support iteration over items
        TypeError: instance does not support iteration over keys
        TypeError: instance does not support iteration over values

    It's an error for the iterator produced by __iteritems__ or
    mp_iteritems to return an object whose length is not 2:

        TypeError: item iterator did not return a 2-tuple

Open Issues

    We could introduce a new exception type such as IteratorExit just
    for terminating loops rather than using IndexError.  In this case,
    the implementation of make_iterator() would catch and translate an
    IndexError into an IteratorExit for backward compatibility.

    We could provide access to the logic that calls either 'sq_item'
    or make_iterator() with an iter() function in the built-in module
    (just as the getattr() function provides access to 'tp_getattr').
    One possible motivation for this is to make it easier for the
    implementation of __iter__ to delegate iteration to some other
    sequence.  Presumably we would then have to consider adding
    iteritems(), iterkeys(), and itervalues() as well.

    An alternative way to let __iter__ delegate iteration to another
    sequence is for it to return another sequence.  Upon detecting
    that the object returned by __iter__ is not callable, the
    interpreter could repeat the process of looking for an iterator
    on the new object.  However, this process seems potentially
    convoluted and likely to produce more confusing error messages.

    If we decide to add "freezing" ability to lists and dictionaries,
    it is suggested that the implementation of make_iterator
    automatically freeze any list or dictionary argument for the
    duration of the loop, and produce an error complaining about any
    attempt to modify it during iteration.  Since it is relatively
    rare to actually want to modify it during iteration, this is
    likely to catch mistakes earlier.  If a programmer wants to
    modify a list or dictionary during iteration, they should
    explicitly make a copy to iterate over using x[:], x.clone(),
    x.keys(), x.values(), or x.items().

    For consistency with the 'key in dict' expression, we could
    support 'for key in dict' as equivalent to 'for key: in dict'.

BDFL Pronouncements

    The "parallel expression" to 'for key:value in mapping':

        if key:value in mapping:

    is infeasible since the first colon ends the "if" condition.
    The following compromise is technically feasible:

        if (key:value) in mapping:

    but the BDFL has pronounced a solid -1 on this.

    The BDFL gave a +0.5 to:

        for key:value in mapping:
        for index:item in sequence:

    and a +0.2 to the variations where the part before or after
    the first colon is missing.