Joe Gregorio wonders about hardware support for functional languages...
"I've been thinking about this in the context of the Professionalization of Scripting Languages. Does anyone know if any of the processor manufacturers are working on adding silicon support for activation records?"

I know first-hand that Intel has been searching for ways to spend transistors for at least a decade. There has been some looking at objects, including closures, generally, and some consideration of Smalltalk, Lisp, etc. But I am not aware of any work looking specifically at stack vs. heap allocation of activation records.
On the other hand, I have witnessed decades of consideration of hardware support for functional languages. In each case the problems seem to have been solved at least as well with general-purpose hardware combined with clever software at compile-time and run-time. Looking back at the history of hardware support for Lisp and Smalltalk, and for OOP and functional programming generally, will turn up many research and production efforts that have not panned out in the long run. However, the economics may have changed, assuming scripting languages become as "professional" (!) as Joe predicts.
I've been paid well from time to time over 25 years to program with sophisticated dynamic languages. That today we're just now thought to be entering a "professional" stage is an indicator that the profession is missing some of its own literature. So please consider the following. From my historical viewpoint as a veteran dynamic language applications programmer, and dynamic language implementation aficionado, I believe the activation record solutions you seek have a rich history to continue from (so to speak), and they are almost all in software.
A good place to start is the work published by Appel, et al. for Standard ML of New Jersey (SML/NJ). An entry point is the textbook "Modern Compiler Implementation". From there, follow the broader research done around compiling ML.
Along the same lines are all the years of work around compilers and run-times for Scheme. SML/NJ and Scheme have particularly interesting activation/closure needs because they are mostly functional, but do have side-effects, and they also have first-class continuations. Look up various work by Kranz, et al. at Yale (in particular for the Orbit compiler for "T", a dialect of Scheme); Dybvig, et al. at Indiana U.; Feeley at U. Montreal; and Clinger, et al. at U. Oregon and Northeastern.
And then include the publications about Smalltalk, Self, etc. implementations by Peter Deutsch, David Ungar, Craig Chambers (U. of Washington), and Eliot Miranda, among others. Generally, reading through the years of OOPSLA proceedings, especially the earlier years, and the proceedings of the ACM Lisp and Functional Programming Conference will keep you busy for a while.
Efficient implementation of first-class activation records revolves around a few decisions. One is whether to use the hardware stack at all. Some efficient implementations use the stack and copy records to and from the heap as needed; others allocate solely on the heap. Efficiency also becomes intertwined with interoperability with "C" and so-called "native code".
These choices have more to do with first-class continuations and languages like Smalltalk that promote activation records as first-class objects. If you go back and look at MIT Scheme and work by Guillermo Rozas, they paid a lot of attention to providing features like continuations and first-class top-level environments without _always_ paying the price.
The more common problem of allocating heap for first-class lexical environments is probably even less conducive to hardware-based solutions. A naive compiler (the most naive compiler is a simple interpreter) will allocate a lot of heap and generate a lot of garbage. So part of the solution is how efficiently short-lived heap allocations can be collected. The more complicated compiler problem is analyzing the code to determine how to reduce the amount or frequency of allocated heap. Based on determining which closures are "upward", what lexical state must be captured, and which closures share some or all of that lexical state, a compiler can very smartly move state capture up or down the lexical scope.
For example, if closure C1 captures variables V1, V2, and V3, and in the same scope closure C2 captures just V2, then C1 and C2 need to refer to the same "instance" of V2. But maybe V2 is bound several scopes "up" from V1 and V3. The compiler may choose to heap allocate space for V1, V2, and V3 at the point where V2 becomes bound. C2's code is then generated to reference V2 at some offset from the start of this allocation, while C1 references all three slots.
And that's the simplest case I could write about. There are other problems that I've not thought about for at least 15 years. My main point is that none of this "professionalism" is new. Many problems new implementers will encounter have a long history of solutions that are likely to be sufficient and, at minimum, good starting points. All of the software solutions could show up within a year, whereas hardware solutions would not show up for five years, if ever.
The hardware instructions for all of this are fairly basic: incrementing pointers, relative addressing, short jumps. Not much more than that. The problem is how to do as little of all that as possible, by walking a lot of tree structure at compile-time and leaving a marker here or there at run-time.
In addition to diving into the literature, I would recommend studying, if not using directly as a "common language compiler and run-time", the Gambit Scheme compiler and run-time. Gambit is the system to beat for efficiency, if it is not going to be your system to utilize as a "back-end".