Amit's Python Scoping Dreams

2003

I have to warn you. This is a bit wacky, and I have no illusion that any of this will go into Python. It's just some stuff that Python has inspired me to think about, mainly because Python exposes namespaces to the programmer, and programmers do cool things with it. How far can that go? (You're supposed to run away yelling when you hear someone say "How far can something go?") This idea touches on stackless (because stack frames become objects, which are on the heap) and nested scopes, too, if that'll help you run screaming. But if you're ready to read some crazy stuff, here it is:

[2012-12-02] I wrote this article in 2003 and didn't realize that there was a more general pattern underneath. See Steve Yegge's article on properties.

[2017-01-23] Older versions of Lua have something like this. Just as you can modify the lookup process on objects with metatables, you can modify the lookup process on scopes with the _ENV table. Then you can use this to implement objects, classes, modules, stack frames, with statement, methods of modules, changing parents, etc. Cool!

[2020] Smalltalk has something like this too, which shouldn't surprise me.

Namespace Unification

Python and many other OO languages have a nice way of looking up field names in objects — they look in an object, then in the class, then superclass, and so on. It's a chained lookup system. Python also uses a chained lookup for variable names. It first looks in locals() and then looks in globals(). If we have nested scopes in P3k, it will look in the local namespace, then in the parent namespace, then in its parent, and so on until we reach the global namespace. Jim Fulton's ExtensionClass implements a chained lookup system that can go from one object to its "container" to find values.

I'd like to see all of the chained namespace implementations unified. Objects, classes, modules, activation records ("stack frames")— all are namespaces underneath.

In this proposal, the __parent__ name is treated specially in an object. For looking up a name, you first check the namespace and then if you don't have the name, you ask your parent. If you have no __parent__, you give a NameError. For assigning to a name, you always assign the name in the current namespace, and you never look at the parent.

Note that this is just like what we do now, except we call the special field "__class__" if it's in an object and "__bases__" if it's a class. In this system, "ordinary" objects would simply have their class as the __parent__. An ordinary class would have its base class as its __parent__, as long as there's no multiple inheritance [1]. Environmental acquisition (as implemented in ExtensionClass) would have its container has its __parent__. At this point, an ordinary class or an acquisition class is no different than a "normal" object, so we may not need classes. (Already in Python, classes and objects aren't so different.)

The lookup semantics (chaining for getattr and not chaining for setattr) almost match both how fields in objects work and how local variables work [2]. So we could make namespaces into objects too. In the following example:

  # module spam.py

  x = 5

  def foo(y):
     print x
     x = 8
     print x

     def cheese(z):
        print x, y, z
     return spam

  bar = foo(10)
  bar(35)
    

The global (module) namespace is an object. (In fact, it's the module object.) It's the object <x = 5, foo = function....>. The function foo needs to remember its scope, so its co_namespace points to the spam module object.

When we call foo(10), we need to create a new namespace for the local variables. This is an object <y=10, __parent__ = spam-module>. When we print x, we look for x in this object, and it isn't there. So we look for x in the __parent__, which is the spam module object. It's there, so we return 5 and print it. The next thing we do is assign x to 8. Since there's no chaining on assignment, we create an entry x=8 in the namespace object, which becomes <y=10, x=8, __parent__ = spam-module>. Now we try to print x again. We look in the namespace object and find x is 8, so that's what we print. Next we define a function cheese, which gets its co_namespace set to the current namespace object. Then we return that function object.

The global module now gets bar=function... added to it.

And we call bar(35). What happens here? When we call a function, we have to create a new namespace object where the parent is set to the co_namespace. So that's <z=35, __parent__ = <y=10, x=8, __parent__ = spam-module> >. So when we try to print x, y, z, we find 8, 10, 35.

Does that all make sense?

Summary: rules for variable lookup look and smell a lot like rules for object lookup. If we combine the two, we can also put in classes, acquisition classes, and modules. It's "everything's an object" pushed a bit farther, into the area of local variables. I would keep the syntax that distinguishes classes from objects, but it'd be just syntax— nothing deeply different underneath.

Notes

Methods on objects

We have to resolve the issue of "self" being a magic argument. If objects and classes are the same, then either def foo(self, x) or .self.foo = lambda x: .. isn't consistent.

We might want to say "method" introduces an unbound method and "def" introduces a bound method (i.e., a function that isn't going to take 'self').

A magic "self" variable

We could have it so that when you invoke a method, it sets not only __parent__ to the co_namespace, but also sets self to the object. That way, you'd define foo() without listing "self", but you'd still have a "self" object and you'd access fields with "self.f".

     method foo(x):
         print self.f + x
      

I'm not convinced this is the right thing to do.

Inline objects

It'd be nice to have a lightweight way of declaring objects, like the <name=value, name=value, ...> syntax I used in the examples above. If objects are everything, you want to be able to make them all the time. It's possible to use {} even though {} is used for dictionaries. If it's {key:value, ..} then it's a dictionary and if it's {name=value, ..} then it's an object. (Maybe objects and dictionaries become the same thing underneath.) But that may be too confusing.

The "global" keyword

We still want some way of getting to the next level of scope. Since objects and namespaces are combined, and the object has a special __parent__, the __parent__ is exposed as a local variable. So we could do:

  x = 5
  def foo():
     __parent__.x = 10
      

to assign to the global x. If it's nested more:

  x = 5
  def foo():
     def cheese():
        __parent__.__parent__.x = 5
      

Not ideal, but then, I don't deal with globals too much anyway. Maybe 'global' can be a magic variable (not keyword) that gets set to the last non-empty __parent__. Then you'd end up doing:

  x = 5
  def foo():
     def cheese():
        global.x = 5
      
Cycles and refcounting

If we do nested scopes, we already get the cycles problem. I don't think making namespaces into objects makes anything worse.

Modules

Modules already exhibit this unification of namespaces and objects. From inside the module, you can refer to variables. From outside the module, you refer to fields. Unifying namespaces and objects wouldn't really do anything to modules. Instead, it'd make all namespaces behave like a module in that they're objects.

Methods on modules

I think you could get methods (including things like __repr__) on modules free here. I haven't thought about it enough, but my intuition tells me that since classes have gone away, and you can put methods on objects, you should be able to put methods on modules too. We just have resolve the bound/unbound method issue.

The "with" statement from Pascal

I suspect you can do with pretty easily but I'm not sure about the details. It may require multiple inheritance (aiee) if you want to be able to see local variables at the same time you see an object's fields.

Assignment to __parent__

So far I've been assuming that __parent__ is something set by Python for internal use. For objects, it's set when you call a class constructor. For classes, it's set when you declare the class using the "class" construct. For namespaces, it's set when you call a function.

But really, the cool stuff is really when the user can assign to __parent__!

For example, __parent__ = self would make it so that you can assign to fields in your object without using "self.". (You'd also lose your local and global variables, which would discourage anyone from doing this.) The environmental acquisition aspect of ExtensionClass is pretty easy now — you just create a raw object and set its __parent__ to its container. You could "reparent" an object on the fly to change its class. Or you could use prototype-based programming techniques, as in Self.

Efficiency

It'll probably be harder to optimize name lookup with something like this (especially if you can change __parent__). However .. with only one underlying implementation for namespaces, objects, modules, and classes, any optimization work you perform here will benefit all of those language features.

Related language features

Simula's objects were originally activation records / stack frames. So making the two similar or even the same is a really old idea. Smalltalk makes everything an object, so my guess is that activation records too were objects. Smalltalk blocks are objects too. JavaScript tries to do something that unifies the two, but gets many aspects totally wrong, by mixing up static scopes and dynamic activation records. (For example, functions get an object with local variables when they're declared, even though they haven't been called yet!) Pascal and JavaScript have a "with" statement that lets you view an object's fields as variables.

Footnotes

[1] One snag is multiple inheritance. I don't think we really need multiple inheritance once we have nested scopes, but MI is a really controversial issue. Here's an example of MI:

     class TCPServer:
        ...
     class ForkingMixIn:
        ...
     class ForkingTCPServer(ForkingMixIn, TCPServer):
        pass
    

What I'd do instead is write ForkingMixIn this way:

     def ForkingMixIn(AnyServer):
        class ForkingMixIn(AnyServer):
           ...
        return ForkingMixIn
    

(Note: I wouldn't want to use this syntax, but this illustrates the underlying construct.)

Now I can write ForkingTCPServer this way:

     ForkingTCPServer = ForkingMixIn(TCPServer)
    

The advantage of this over MI (in Python) is that you can call the methods from the base class. Right now, ForkingMixIn can't easily define method foo() to call TCPServer's foo() and then do one more thing. With the above setup, ForkingMixIn has a handle to the base class, and can call AnyServer.foo(self).

Alternatively, a namespace could have a __parents__ tuple, and it could search each one of them. However, this would be unnecessary for variable scoping (unless your mind is really twisted!). It'd be useful for the "with" statement, but it seems confusing.


[2] One thing I find confusing in Python is that a local name definition applies even before it's been executed:

     x = 5
     def foo():
        print x
        x = 3
        print x
    

IMO, this should print 5 and then 3. The unified namespace proposal would do this. However, Python currently gives a NameError.