Update 3: Anonymous points out that Jim Weirich has an inspired exposition of the spreadsheet implementation in Ruby. Very nice.
Python and Ruby are excellent languages. There are just a few small cracks in these languages where if you look carefully through them toward Smalltalk and Lisp, you can see how truly gifted, or maybe just lucky, the designers of those original dynamic languages were. These are just small cracks though, as Jim pointed out in Ruby and Ian Bicking mentioned about Python, for example, that delaying some of the operators with Formulas is difficult or impossible.
This is a lesson for designers to be as regular as possible. (If you are into the theory of regularity in design: pdf) End of Update 3
I've wondered several times in conversations why the concept of a spreadsheet is not more easily a first-class feature of popular programming languages. There is a long history of constraint programming of various flavors that show up typically as an API in an imperative language or as the dominant style of a new language. (Mozart/Oz comes to mind as a language that emphasizes constraints but does not make them dominant.)
Kimbly points to an approximation in Python...
What bothers me about this of course (?) is the "extra-lingual" expressions embedded as text, e.g.
>>> ss = SpreadSheet()
>>> ss['a1'] = '5'
>>> ss['a2'] = 'a1*6'
>>> ss['a3'] = 'a2*7'
>>> ss['a3']
210
'a1*6'
. A first-class implementation would be done entirely in the language per se. After the text 'a1*6' is passed around, which functions assume it is just text and which assume it is a constrained expression? (We like objects.)
When I have seen a "spreadsheet" implemented in a language, invariably the objects are exposed through a user interface primarily, and programmatically through an awkward API at best. (With certain exceptions, in particular Lisp, which make programmatic integration essentially transparent, everything looks the same.)
In Smalltalk, maybe, using points rather than the alphanumeric column/row convention...
Spreadsheets should be useful without a user interface. Constraints should be a part of contemporary programming languages. Why aren't they?
ss := Spreadsheet new.
ss cell: 1@1 value: 5.
ss cell: 1@2 value: (ss cell: 1@1) * 6.
ss cell: 1@3 value: (ss cell: 1@2) * 7.
(ss cell: 1@3) value
210
Update: Matthias Ernst asks in a comment...
Don't you rather want:The use of a block here is intended to "delay" the computation until a change to a dependency forces a revaluation.
ss cell: 1@2 value: [(ss cell: 1@1) * 6]
Add cell value caching and invalidation after change.
The problem then is the programmer has to know where to put the blocks to delay computation and where to force the revaluations. Instead of blocks, though, my intention is to use some new object, a Formula
, say, and to make those objects implicit as much as possible.
So the message Spreadsheet>>cell: aPoint value: aFormula
would store the formula at the cell's point in the spreadsheet. If aFormula
is not a Formula
, then it is assumed to be a constant, and so the spreadsheet would wrap the constant in a subclass of Formula
, say ConstantFormula
. (Probably the thing to do is to send the argument the message Object>>asFormula
which would simply return itself if it is already a formula.)
The result of the message Spreadsheet>>cell: aPoint
is a Formula
(which may be a ConstantFormula
) and so to get its value, send it the message Formula>>value
. (e.g. the last expression in the example returns 210
.
But what about (ss cell: 1@1) * 6
? The expression (ss cell: 1@1)
results in the formula at the point 1@1
. The result of multiplying a formula by a constant is a new formula that does the obvious multiplication when evaluated. (And since multiplication is commutative and Smalltalk allows you to redefine "system" methods, numbers should understand (through double dispatching) that the result of 6 * (ss cell: 1@1)
is also a formula.)
*"is not a formula" does not mean your code should test for the class of the object. Rather a message, say Object>>isFormula
, should be sent to the object. the default implementation for all objects returns false
. The implementation of Formula>>isFormula
returns true
. Other implementations may return true as well if they are intended to behave as formulas rather than as constants in a spreadsheet.
**Double Dispatching: Yes, multiple argument dispatching in CLOS is better, and yes, Python has this in 2.4. Maybe something should go into Smalltalk, but the good news is you are free to try it then tell the rest of us. (Dan Ingalls, Ralph Johnson, Andrew Black)
Update 2: Gavin McGovern writes in a comment...
One of my favorite spreadsheets is Levoy's "Spreadsheets for Images",Nice. Thanks. This is kind of like Kurt Piersol's spreadsheet in Smalltalk he presented at OOPSLA '86. He uses image processing as an example of operating on non-numeric data. (Unfortunately or not, he also extends the syntax for operating on spreadsheets. From the examples above, I am not yet convinced this is necessary.)Bit backwards: he essentially put Tcl (+ image stuff) in a spreadsheet. But a neat example nonetheless of a alternative user interface.
5 comments:
> ss cell: 1@2 value: (ss cell: 1@1) * 6.
Don't you rather want:
ss cell: 1@2 value: [(ss cell: 1@1) * 6]
?
Add cell value caching and invalidation after change ...
Matthias Ernst
One of my favorite spreadsheets is Levoy's "Spreadsheets for Images", http://graphics.stanford.edu/projects/spreadsheets/
Bit backwards: he essentially put Tcl (+ image stuff) in a spreadsheet. But a neat example nonetheless of a alternative user interface.
gavin
You can use that same technique (ss cell: 1@1) in Python. That's what I do in SQLObject.sqlbuilder to construct queries; you can override all the operators (as of Python 2.1) to return arbitrary objects. Except the boolean operators (and, or, not), which is a bit annoying. I still find it kind of hacky, since it involves these magic objects, that are kind of value and kind of not. But one of the advantages of this over eval() is that you can mix in known constants, even when they are stored in a variable. And you could potentially mix in other late-bound values and expressions, though both in Python and Smalltalk you'd have to wrap them in a closure. (Which is prettier in Smalltalk)
When I think of spreadsheets I usually think about it in terms of naked objects -- variables and arrays made visually manifest, and thus much more accessible. But this aspect is important too... it's more the mathematical notion: the use of "=" to mean "is equal to", not "is equal to?" or "make equal to". Prolog-ish, even if the implementations tend to be constrained so that they can be implemented in an imperative way.
Jim Weirich responded to this post and tackled it in Ruby. It's a pity there is no referer or trackback on this page. URL (can't figure out how to make it a link):
http://onestepback.org/index.cgi/Tech/Ruby/SlowingDownCalculations.rdoc
How about using lambdas for expressions. The you are using the language.
For instance
class spreadsheet(dict):
def __getitem__(self,val):
if callable(super(spreadsheet,self).__getitem__(val)) :
return super(spreadsheet,self).__getitem__(val) (self)
else:
return super(spreadsheet,self).__getitem__(val)
# Excuse the formatting I couldn't get it too look right ;-)
Then
>>> s = spreadsheet()
>>> s['a1'] = 4
>>> s['a2'] = 2
>>> s['a3'] = lambda self: self['a1'] + self['a2']
>>> s['a3]
6
>>> s['a4'] = lambda self: self['a3'] * 10
60
Yep heaps of stuff is missing ;-)
Post a Comment