[Update: Michael Salib's PyCon 2004 session will be addressing exactly this topic --- "Faster than C: Static Type Inference with Starkiller"...
This dynamism makes programming in Python a joy, but generating optimal code a nightmare. Yet while the presence of such abundant dynamism makes traditional static optimization impossible, in most programs, there is surprisingly little dynamism. For example, in most Python programs:
- all class and function objects are created exactly once
- class inheritance relationships do not change at run time
- methods are not added after a class object has been created
- most expressions have exactly one type; the vast majority of those that have more than one type have only a few types
The flip side is that what little dynamism a particular program makes use of is often absolutely vital.
I have developed a type inference algorithm for Python that is able to resolve most dispatches statically. This algorithm is based on Ole Agesen's Cartesian Product Algorithm for type inference of Self programs. I have built a type inferencer for Python based on this algorithm called Starkiller. ]
As folks are discussing, C Python's simple implementation has a cost/benefit. Compiling efficiently may require some kind of "optimistic with fallback" approach. Assume the internal representationss are not messed with, and so compile the code to be optimistically efficient. If the internals become messed with then flip the bit and fallback to the simple implementation for that object.
This is similar to optimistically compiling for efficient data representations. For example when the code is about to do some math, compile in-line a test for the data type. Branch to multiple paths of code based on the type. Create a branch for in-line integer math and/or a branch for in-line double math and also include a branch for the fully boxed math.
Another approach that should not be overlooked is whole/partial program analysis. When moving from development to production, whole program analysis could be employed to optimize the specific application or groups of packages. This approach has been employed for Scheme with a good deal of promise...
Stalin has been tested on a suite of benchmarks whose length ranges up to a thousand lines. Stalin outperforms Scheme->C, Gambit-C, Chez, SML/NJ, and even handwritten Ada, Pascal, Fortran, and C code, on most of these benchmarks.