From: Tim Peters Subject: RE: [Python-Dev] Accessing globals without dict lookup To: Guido van Rossum Cc: python-dev@python.org Date: Sun, 10 Feb 2002 16:13:07 -0500 >>> When you use its getitem method, the PyObject * in the cell is >>> dereferenced, and if a NULL is found, getitem raises KeyError >>> even if the cell exists. >> Since we're doing something with "the PyObject* in the cell", >> surely "the cell" *must* exist. So what is the "even if the cell >> exists" trying to say? > It is trying to say "despite the cell's existence". Then s/even if/even though/, and that's what it will say . >> I believe it means to say >> >> even if the cell's cellptr is not NULL >> >> and "the cell's cellptr is not NULL" is quite different from "the cell >> exists". > No, it doesn't try to say that. But you're right that it's useful to > add that the cell's cellptr is irrelevant to getitem. So even though the cell exists, and even if its cellptr is non-NULL. [on a write-through dict] > This works for propagating changes from the celldict to the real dict, > but not the other way around. Fudge, that's right. Make it a 2-way write-through dict pair . > Example: > > d = {'x': 10} > def set_x(x): > d['x'] = x > exec "...some code that calls set_x()..." in d Understood. What about this? import cheater def f(): cheater.cheat() return pachinko() print f() where cheater.py contains: def cheat(): import __builtin__ __builtin__.pachinko = lambda: 666 That works fine today, and prints 666. Under the proposed scheme, I believe: 1. The main module's globals don't get a cell for pachinko at module creation time, because pachinko isn't in the builtins at that point. 2. When the function object for f() is created, the main module's globals grow a {NULL, NULL} cell for pachinko. 3. When cheat() is called, __builtin__'s celldict grows a {lambda: 666, NULL} slot for pachinko. 4. But the main module's globals neither sees that nor has a pointer to the new cell. 5. The reference to pachinko in f's return finds {NULL, NULL} in the globals, and so raises NameError. I'm thinking more radically now, about using module dicts mapping to pairs PyObject* (the actual value) a "I got this PyObject* from the builtins" flag (the "shadow flag") Invariants: 1. value==NULL implies flag is false. 2. flag is true implies value is the value in the builtin dict Suppose that, at module creation time, the module's globals still got an entry for every then-existing builtin, but rather than point to the builtin's cell, copied the ultimate PyObject* into one of these pairs (and set the flag). Most of the other machinery in the proposal stays the same. Accessing a global (builtin or not) via LOAD_GLOBAL_CELL is then very simple: the pair does or doesn't contain NULL, and that's the end of it either way. This makes the most frequent operation as fast as I can imagine it becoming (short of plugging ultimate PyObject* values directly into func_cells -- still thinking about that). How can this get out of synch? 1. Mutations of the builtin dict (creation of a new builtin, rebinding an existing builtin, del'ing an existing builtin). This has got to be exceedingly rare, and never happens in most programs: it doesn't matter how expensive this is. Each dict serving as a builtin dict could, e.g., maintain a list of weakrefs to the modules that were initialized from it. Then mutations of the dict could reach into the relevant module dicts and adjust them accordingly. The shadow flags in the module dicts' pairs let the fixup code know whether a given entry really refers to the builtin. This makes mutations of the builtins very expensive, but I'd be surprised to find a real program where it's a measurable expense. Note: It may be helpful to view this as akin to propagating changes in new-style base classes down to their descendants. 2. del'ing a module global may uncover a builtin of the same name. While not as exceedingly rare as mutations of the builtins, it's still a rare thing. Seems like it would be reasonably cheap anyway: Module delitem: raise exception if no such key # key exists raise exception if it came from the builtins (flag is set) # key exists and is a module global, not a builtin; flag is false set value to NULL if the builtins have this name as a key: copy the current builtin value set the flag 3. Module setitem: if key exists: overwrite the value and clear the flag else: add new {value, false} pair 4. Module getitem: if name isn't a key or flag is set or value is NULL: raise exception else: return value That's for non-builtin module timdicts. I expect the same code would work for the builtin module's timdict too provided it were given an empty dict as the source from which to initialize itself (then the flag component of all its pairs would start out, and remain, false, and the code above would "do the right thing" by magic, reading "the builtins" as "the timdict from which I got initialized"). > ... > I meant that for "len" it would not change, i.e. it would be > > {NULL, pointer to __builtin__'s "len" cell} > > but for a global "foo" it would change to > > {value of foo or NULL if foo is undefined, pointer to this very cell} > > Then if foo is defined, the code would find the value of foo in the > first cell it tries, and if foo is undefined, it would find a NULL in > the cell and in the cell it points to. Whereas the original scheme stored {value of foo or NULL if foo is undefined, NULL} in this case. So that's just as quick if foo is defined, but if it isn't defined as a module global, has to do an extra NULL check on the cellptr. Gotcha. OTOH, if "foo" later *becomes* defined in the builtins too, the module dict won't know that it must change its foo's cellptr. [on the compiler's architecture] > Actually, the concrete syntax tree was never a very good > representation; it was convenient for the parser to generate that, and > it was "okay" (or "good enough") to generate code from and to do > anything else from. I think it worked great until the local-variable optimization got added. Unfortunately, that happened shortly after the dawn of time. > I agree that it's a good idea to start thinking about changing the > parse tree representation to a proper abstract syntax tree. Maybe the > normalization that the compiler.py package uses would be a good start? > Except that I've never quite grasped the visitor architecture there. :-( This msg is too long already <0.7 wink>. ... [on Jeremy's global.attr scheme] > One problem with that is that it's hard to know when in > . is a module, and when it's something else. I > guess global analysis could help -- if it's imported ("import math") > it's likely a module, if it's assigned from an expression ("L = []") > or a locally defined function or class, it's likely not a module. But > "from X import Y" creates a mystery -- X could be a package containing > a module Y, or it could be a module containing a function or class Y. Jeremy is aware of all this (I've heard him ponder these specific points), but I don't think he has a fully fleshed out approach to all of it yet. [a rework of the "len resolution" example, incorporating Guido's comments] """ If I'm reading this right, then in the normal case of resolving "len" in def mylen(s): return len(s) 1. A pointer to a cell object is read out of func_cells at a fixed (wrt this function) offset. This points to len's cell object in the module's celldict. 2. The cell object's PyObject* pointer is tested and found to be NULL. [3'. The cell object's cellptr pointer is tested and found not to be NULL. This points to len's cell object in __builtin__'s celldict. > This NULL test shouldn't be needed given my trick of linking cells > that do not shadow globals to themselves. As above, in the presence of mutations to builtins, a global that didn't shadow a builtin at first may end up shadowing one later. Perhaps you want to punt on preserving current behavior in such cases. The variant I sketched above is intended to preserve all current behavior, while running faster in non-pathological cases. ] 3. The cell object's cellptr's PyObject* is tested and found not to be NULL. 4. The cell object's cellptr's PyObject* is returned. """ In the {PyObject*, flag} variant: 1. A pointer to a pair is read out of func_cells at a fixed (wrt this function) offset. This points to len's pair in the module's timdict. 2. The pair's PyObject* pointer is tested and found to be non-NULL. 3. The pair's PyObject* is returned. > ... > I know I got sick there and am now stuck with a horrible cold (the > umpteenth one this season). In empathy, I'll refrain from painting a word-picture of my neck . _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev