From: Jeremy Hylton
To: Tim Peters, Guido van Rossum
Cc: python-dev@python.org
Subject: RE: [Python-Dev] Accessing globals without dict lookup
Date: Sat, 9 Feb 2002 19:04:19 -0500

Here's a brief review of the example function.

    def mylen(s):
        return len(s)

    LOAD_BUILTIN       0 (len)
    LOAD_FAST          0 (s)
    CALL_FUNCTION      1
    RETURN_VALUE

The interpreter has a dlict for all the builtins.  The details don't
matter here.  Let's say that len is at index 4.

The function mylen has an array:

    func_builtin_index = [4]  # an index for each builtin used in mylen

The entry at index 0 of func_builtin_index is the index of len in the
interpreter's builtin dlict.  It is either initialized when the
function is created or on first use of len.  (It doesn't matter for
the mechanism and there's no need to decide which is better yet.)

The module has an md_globals_dirty flag.  If it is true, then a global
was introduced dynamically, i.e. a name binding op occurred that the
compiler did not detect statically.

The code object has a co_builtin_names that is like co_names except
that it only contains the names of builtins used by LOAD_BUILTIN.
It's there to get the correct behavior when shadowing of a builtin by
a local occurs at runtime.

The frame grows a bunch of pointers --

    f_module from the function (which stores it instead of func_globals)
    f_builtin_names from the code object
    f_builtins from the interpreter

The implementation of LOAD_BUILTIN 0 is straightforward -- in pidgin C:

    case LOAD_BUILTIN:
        if (f->f_module->md_globals_dirty) {
            /* slow path: look the name up, since a global may now
               shadow the builtin */
            PyObject *w = PyTuple_GET_ITEM(f->f_builtin_names, oparg);
            ... /* rest is just like current LOAD_GLOBAL
                   except that it uses PyDLict_GetItem() */
        } else {
            /* fast path: two array indexing operations, no lookup */
            int builtin_index = f->f_builtin_index[oparg];
            PyObject *x = f->f_builtins[builtin_index];
            if (x == NULL)
                raise NameError
            Py_INCREF(x);
            PUSH(x);
        }

The LOAD_GLOBAL opcode ends up looking basically the same, except that
it doesn't need to check md_globals_dirty.

    case LOAD_GLOBAL:
        int global_index = f->f_global_index[oparg];
        PyObject *x = f->f_module->md_globals[global_index];
        if (x == NULL) {
            check for dynamically introduced builtin
        }
        Py_INCREF(x);
        PUSH(x);

In the x == NULL case above, we need to take extra care for a builtin
that the compiler didn't expect.  It's an odd case.  There is a global
for the module named spam that hasn't yet been assigned to in the
module and there's also a builtin named spam that will be hidden once
spam is bound in the module.

Jeremy

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev

From: Jeremy Hylton
To: Tim Peters, Guido van Rossum
Cc: python-dev@python.org
Subject: RE: [Python-Dev] Accessing globals without dict lookup
Date: Sat, 9 Feb 2002 20:13:53 -0500

Let's try an attribute of a module.

    import math

    def mysin(x):
        return math.sin(x)

There are two variants of support for this that differ in the way they
handle math being rebound.  Say another function is:

    def yikes():
        global math
        import string as math

We can either check on each use of math.attr to see if math is
rebound, or we can require that STORE_GLOBAL marks all the math.attr
entries as invalid.  I'm not sure which is better, so I'll try to
describe both.

Case #1: Binding operation responsible for invalidating cache.

The module has a dlict for globals that contains three entries:
[math, mysin, yikes].  Each is a PyObject *.

The module also has a global attrs cache, where each entry is

    struct {
        int ce_initialized;   /* just a flag */
        PyObject **ce_ref;
    } cache_entry;

In the case we're considering, ce_module points to math and
ce_module_index is math's index in the globals dlict.  It's assigned
to when the module object is created and never changes.

There is one entry in the global attrs cache, for math.sin.  There's
only one entry because the compiler only found one attribute access of
a global bound by an import statement.
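
For readers who want something concrete, the layout described so far
can be sketched in Python.  This is only an illustration under stated
assumptions: the Dlict and CacheEntry classes, the store method, and the
NULL sentinel are hypothetical names that do not appear in the proposal,
and because Python has no PyObject **, ce_ref is simply left unset here.

```python
NULL = object()  # hypothetical stand-in for a NULL PyObject *


class Dlict:
    """Toy dlict: each name gets a fixed slot index; values live in a list."""

    def __init__(self, names):
        # Slot indices are fixed up front, in the order the names are given.
        self.index = {name: i for i, name in enumerate(names)}
        self.slots = [NULL] * len(names)

    def store(self, name, value):
        # Rebinding a name overwrites its slot; the index never changes.
        self.slots[self.index[name]] = value


class CacheEntry:
    """Mirrors the two fields of the cache_entry struct."""

    def __init__(self):
        self.ce_initialized = False  # just a flag
        self.ce_ref = None           # would point at a dlict slot once set


# Globals dlict of the example module: three entries.
module_globals = Dlict(["math", "mysin", "yikes"])

# One cache entry, because the compiler saw one attribute access: math.sin.
md_cache = [CacheEntry()]
```

The point of the fixed indices is that an opcode argument can name a
slot directly, so a load becomes a list index rather than a hash lookup.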

The function mysin(x) uses LOAD_GLOBAL_ATTR 0 (math.sin).

    case LOAD_GLOBAL_ATTR:
        cache_entry *e = f->f_module->md_cache[oparg];
        if (!e->ce_initialized) {
            /* lookup module and find its sin attr.
               store pointer to module dlict entry in ce_ref.
               NB: cache shared by all functions.

               if the thing we expected to be a module isn't actually
               a module, handle that case here and leave initialized
               set to false.
            */
        }
        if (*e->ce_ref == NULL) {
            /* raise NameError if global module isn't bound yet.
               raise AttributeError if module is bound, but doesn't
               have attr.
            */
        }
        Py_INCREF(*e->ce_ref);
        PUSH(*e->ce_ref);

To support invalidation of cache entries, we need to arrange the cache
entries in a particular order and add an auxiliary data structure that
maps from module globals to the cache entries they must invalidate.

For example, say a module uses math.sin, math.cos, and math.tan.  The
three cache entries for the math module should be stored contiguously
in the cache.

    cache_entry *cache[] = {
        math.sin entry,
        math.cos entry,
        math.tan entry,
    }

    struct {
        int index;   /* first attr of this module in cache */
        int length;  /* number of attrs for this module in cache */
    } invalidation_info;

There is one invalidation_info for each module that has cached
attributes.  (And only for things that the compiler determines to be
modules.)  The invalidation_info for math would be {0, 3}.  If a
STORE_GLOBAL rebinds math, it must walk through math's cache entries
(here entries 0 through 2) and set ce_initialized to false for each
one.

This isn't exactly the scheme I described in the slides, where I
suggested that the LOAD_GLOBAL_ATTR would check if the module binding
was still valid on each use.  A question from Ping pushed me back in
favor of the approach that I just described.

No time this weekend to describe that check-on-each-use scheme.

Jeremy
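
Case #1 above can be approximated by a small Python simulation.  It is a
rough sketch, not the proposed implementation: the Module class, the
"names" key, the NULL sentinel, and the function names are all
hypothetical, and ce_ref is modelled as a closure because Python cannot
hold a pointer into another module's dlict slot.  It assumes cache
entries for the same global are passed in contiguously.

```python
import math
import string
import types

NULL = object()  # hypothetical stand-in for a NULL PyObject *


class Module:
    """Toy module: a globals dlict plus the shared attribute cache."""

    def __init__(self, global_names, cached_attrs):
        self.index = {n: i for i, n in enumerate(global_names)}
        self.md_globals = [NULL] * len(global_names)
        # cached_attrs: (global name, attr name) pairs, grouped by global.
        self.md_cache = [
            {"ce_initialized": False, "ce_ref": None, "names": pair}
            for pair in cached_attrs
        ]
        # invalidation_info: global name -> (index, length) into md_cache.
        self.invalidation_info = {}
        for pos, (gname, _) in enumerate(cached_attrs):
            first, length = self.invalidation_info.get(gname, (pos, 0))
            self.invalidation_info[gname] = (first, length + 1)


def load_global_attr(mod, oparg):
    e = mod.md_cache[oparg]
    gname, attr = e["names"]
    if not e["ce_initialized"]:
        target = mod.md_globals[mod.index[gname]]
        if target is NULL:
            raise NameError(gname)
        if not isinstance(target, types.ModuleType):
            # Not a module after all: plain lookup, cache stays invalid.
            return getattr(target, attr)
        # Stand-in for "store pointer to module dlict entry in ce_ref".
        e["ce_ref"] = lambda: getattr(target, attr, NULL)
        e["ce_initialized"] = True
    value = e["ce_ref"]()
    if value is NULL:
        raise AttributeError(attr)
    return value


def store_global(mod, name, value):
    mod.md_globals[mod.index[name]] = value
    # The binding operation invalidates every cached attr of this global.
    first, length = mod.invalidation_info.get(name, (0, 0))
    for e in mod.md_cache[first:first + length]:
        e["ce_initialized"] = False


mod = Module(["math", "mysin", "yikes"],
             [("math", "sin"), ("math", "cos"), ("math", "tan")])
store_global(mod, "math", math)      # what "import math" would do
sin = load_global_attr(mod, 0)       # first use fills the cache entry
```

Calling store_global(mod, "math", string) afterwards plays the role of
yikes(): all three entries are marked uninitialized, so the next
load_global_attr(mod, 0) looks at the new binding instead of the stale
one.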