"I have a mind like a steel... uh... thingy." Patrick Logan's weblog.

Friday, December 31, 2004

Languages and Data Integration

Years of endless debates about static type annotations regurgitating the same information over again have led me to shy away from them in recent years. I've done pretty well, yet here I am in the gravity field of another one. I'm not sure how long this will last, but for this day I am somewhat interested in continuing.

PJE writes in a comment...

Um, just because a given use case can be accomplished without a given tool, doesn't mean the tool is useless, or isn't useful for that use case. If that were true, we'd never use anything but machine code.
I have not found any useful scenarios where I want static type annotations. There's not much more to say than that. I'd like to hear of scenarios others have found useful so I can compare them to my own.
Meanwhile, I notice you haven't responded to the point that solid integration with statically typed languages often requires type annotation; e.g. jythonc's use of special patterns in docstrings. Integration with other languages, documentation, and runtime introspection/adaptation are my use cases for type annotation.
I'm not sure what you mean by "solid" integration. I have done production-quality integration between "agile" languages like Lisp, Smalltalk, and Python and "rigid" languages like Pascal, C, and C++.

There are situations where the more rigid languages have required, say, a 32-bit integer or a specifically formatted sequence of values (i.e. a "struct" or "record"). "Agile" languages tend to provide integers that are not limited by the word size of the underlying hardware. In scenarios like this, the agile side of the integration needs a check that the integer will fit in the rigid size of 32 bits, and so on. Likewise, a conversion is necessary to actually format the scalar data and structured data into the kind expected on the rigid side.
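A minimal sketch of that check-and-convert step, assuming Python on the agile side and its standard `struct` module. The record layout (one 32-bit integer followed by one double) and the names `to_int32` and `pack_record` are made up for illustration.

```python
import struct

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def to_int32(value):
    """Check that an unbounded Python int fits a signed 32-bit slot."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"{value} does not fit in a signed 32-bit integer")
    return value

def pack_record(record_id, weight):
    """Format an (int, float) pair as a C-style record for the rigid side."""
    # "<id" = little-endian, no padding: 4-byte signed int, then 8-byte double.
    return struct.pack("<id", to_int32(record_id), weight)

packed = pack_record(42, 1.5)
print(len(packed))  # 12 bytes: 4 for the int, 8 for the double
```

Both the range check and the formatting are ordinary expressions in the agile language; nothing here is declared on the function's signature.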

I would not call these expressions "static type annotations" in the agile language though. Rather they are hoops I'd not have to jump through as much if the rigid language were more agile. The language systems I have used that I can recall have tended to allow me a way to express these in the agile language itself rather than in a comment or doc string, but that's a minor point.

Languages like Java and C# are less rigid, and given their capabilities for reflection, integration with more agile languages tends to be more automatic. And so Jython and Python.Net, among others, tend to do most of the conversion automatically without special annotations or coding.

Corner cases require more programmer attention, but that's also a minor point. They could agree on a common model for data exchange, such as is done in CORBA and its descendant WS-xxx. Doing integration via CORBA using Smalltalk is so easy that I believe, had more programmers been working this way, the WS-xxx fiasco would never have happened. Rather, improvements to CORBA's relationship with HTTP would have been getting all the attention. These are market-force issues, not software issues.

Another source of feedback on this is the Inter-Language Unification work done at Xerox. It is related to CORBA but more focused on integration aspects and RPC, whereas CORBA was beginning to branch out into a variety of distributed programming models.

I don't see how type inference helps with either documentation or introspection, either. If it's pure "duck typing", how is the system going to infer that I want an IAtomicStream, say, versus an IRawStream, if the two have the same methods? Duck typing isn't sufficient to distinguish different *semantics* of a type/interface, only its method signature -- and that is not sufficient for documentation!
If you need an IAtomicStream on the rigid side then there is probably enough information to infer that from the code on the agile side. If the rigid specification is ambiguous (well, then it's not really a "specification" is it?)... then, yes, you need to have some kind of an expression on the agile side to disambiguate.
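The ambiguity under discussion can be sketched in modern Python, where `typing.Protocol` gives a structural ("duck") check. The method names on the two interfaces and the `PlainBuffer` class are assumptions invented for this illustration; only the interface names come from the comment.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class IRawStream(Protocol):
    def read(self, n: int) -> bytes: ...
    def write(self, data: bytes) -> int: ...

@runtime_checkable
class IAtomicStream(Protocol):
    # Same methods as IRawStream. The intended difference (all-or-nothing
    # writes) lives only in documentation, not in the signature.
    def read(self, n: int) -> bytes: ...
    def write(self, data: bytes) -> int: ...

class PlainBuffer:
    """An in-memory buffer that makes no atomicity promise at all."""
    def __init__(self):
        self._data = b""

    def read(self, n):
        chunk, self._data = self._data[:n], self._data[n:]
        return chunk

    def write(self, data):
        self._data += data
        return len(data)

buf = PlainBuffer()
# The structural check only looks for the method names, so both pass.
print(isinstance(buf, IRawStream), isinstance(buf, IAtomicStream))  # True True
```

A purely structural check accepts the buffer under either name, so some explicit expression is needed wherever the two must be told apart.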

The point is, this is a burden rather than a feature. Why would I *want* to work at this level of detail?

A good bit of information on the power of inference for agile languages can be found in the implementation of the Stalin Scheme compiler; see "Flow-Directed Lightweight Closure Conversion" by Jeffrey Mark Siskind.

Documentation needs to present the semantics of the interface, so that if you are implementing that interface yourself, you know what semantics to implement.
The documentation I want to see is (1) executable examples or (2) technical memos that augment executable examples. Period.
Likewise, for introspection, I don't see how an implicit signature helps. Let's say I want to create an XML schema for a document format that can contain messages that get converted to method calls on some object. If each of those methods uses a different set of methods on their arguments, I'm going to end up defining new types for each method, which might be okay for the computer to process, but for a human being it's going to be a mess to understand.
I am not sure what this example is about. I agree there are integration scenarios where something like an XML Schema is required to specify the kinds of messages that can be exchanged between language-independent processes. That does not in any way imply that static type annotations are a benefit to the languages I use to implement the independent processes.
Type inference, IMO, is just type checking. It doesn't seem to me to be useful for anything else -- and like you, I'm not interested in type checking! (I suppose it could be used for optimization in some cases, but only if you can statically prove a concrete type - not just an operation signature.)
Type inference is better than type annotation. The real benefit I see is in getting even more capable agile modeling and inferencing tools by extrapolating from recent work in model checking. Simple, even compound, type checks are uninteresting. We do need a lot of help with concurrent state machines however. Tools that can tell me more about *real* behavior problems rather than simple type checks should be getting all this attention.
Now, perhaps I could be wrong, and there is some other way to accomplish these use cases that's just as clean. If so, I'd sure like to see them. I personally don't see that any of my uses for type annotation *can* be satisfied by inference; there's just not enough information. Heck, I don't see how a *human* could do the inference in the absence of added documentation, let alone a computer. It would be little more than blind guessing.
I think I see one use case here... how to integrate languages that have different data models. I agree system integration requires some kind of agreement on specifying data integration. I'm not sure what else I am seeing here.

OK. I am getting tired of writing about type checking. I think we're in agreement on this use case of language-independent data integration.


Anonymous said...

Static typing is only useful if it helps ME. I don't care if it makes the compiler's job easier, and I don't care if it makes someone else's job easier, unless that's someone I care about. The reality is that even Lisp and Smalltalk are perhaps 30 years behind where they should be because we continue to argue about quaint notions.

For example, if Python wants to add type "annotation" to help out in the short term while type inferencing develops inside the compiler, then we can have an intellectual discussion. If it's just a fetishistic addition for a check-box, then it's counter-productive.

PJE said...

"""Years of endless debates about static type annotations regurgitating the same information over again have led me to shy away from them in recent years. I've done pretty well, yet here I am in the gravity field of another one."""

You may be debating about static typing, but I'm not. :) I agree with you about static typing; I'm only continuing to comment because I'm not yet sure you realize that, and may still be missing the use case I was (and am) trying to explain. Or, if you didn't miss it, and have some really cool way to do what I want without doing it in the way I think it would work best, then I'd still like to find out about it. :)

"""The language systems I have used that I can recall have tended to allow me a way to express these in the agile language itself rather than in a comment or doc string, but that's a minor point."""

I'm not so sure. It depends on what you mean by "express these in the agile language itself". Do you mean something like:

@java_signature(jint, jstr)
def javaMethod(self, anInt, aStr):
    ...

Or do you mean something like:

def javaMethod(self, anInt, aStr):
    anInt = jint2pyint(anInt)

My point here is that something more like the former and less like the latter is more attractive to me, because it allows me to build the system on explicit metadata. For example, it would be easier to build a Python-to-Java translator that used explicit metadata, rather than implicit inference.
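A minimal sketch of the "explicit metadata" style, assuming a hypothetical `java_signature` decorator that only records the declared types on the function. The `jint` and `jstr` markers are stand-in strings, not part of any real Jython API.

```python
def java_signature(*arg_types):
    """Hypothetical decorator: record Java-side argument types as metadata."""
    def decorate(func):
        func.java_signature = arg_types  # stored for tools to read, not enforced
        return func
    return decorate

# Stand-ins for Java type markers; a real bridge would supply its own.
jint, jstr = "int", "java.lang.String"

class Example:
    @java_signature(jint, jstr)
    def javaMethod(self, anInt, aStr):
        return f"{aStr}:{anInt}"

# A translator or bridge can read the signature without analyzing the body.
print(Example.javaMethod.java_signature)  # ('int', 'java.lang.String')
```

With the inline-conversion style, the same information would have to be recovered by inspecting or inferring from the calls inside the method body.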

"""The point is, this is a burden rather than a feature. Why would I *want* to work at this level of detail? """

I'm writing a library. The library has certain requirements of items passed to certain methods. These requirements are common across several methods. I would like to have a concise way to refer to them; an interface name is a concise, precise, symbolic way to do that.

I've never programmed in Smalltalk, but I've heard that one common approach there is to define a "protocol", usually by adding an 'isFoo' method to the base object type. This is another way of addressing the same issue; you can at least document that such-and-such message requires an object for whom 'isFoo' is true, and then use 'isFoo' itself to hold the documentation explaining the semantics and interface, or at least to be the name you use to look them up by.

In Python, a similar approach is to use an interface IFoo, and call 'IFoo(ob)' instead of testing 'ob isFoo', or whatever the correct Smalltalk is for that.

My point is, it would be nice to be able to 1) explicitly state in computer readable form, "this method takes an IFoo", 2) be able to introspect that, and 3) automatically adapt to IFoo if a suitable object is passed to the method.
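Those three steps can be sketched with the function annotation syntax Python gained later, which postdates this discussion. The `adapt` function is a simplified stand-in loosely modeled on PEP 246's `__conform__` hook, not the PEP's full algorithm; `IFoo`, `adapting`, `Text`, and `FooFromText` are hypothetical names.

```python
import functools
import inspect

class IFoo:
    """Hypothetical interface: objects that can foo()."""
    def foo(self):
        raise NotImplementedError

def adapt(obj, protocol):
    """Simplified adaptation: use obj as-is, or ask it to conform at run time."""
    if isinstance(obj, protocol):
        return obj
    conform = getattr(obj, "__conform__", None)
    if conform is not None:
        adapted = conform(protocol)
        if adapted is not None:
            return adapted
    raise TypeError(f"cannot adapt {obj!r} to {protocol.__name__}")

def adapting(func):
    """Adapt each annotated argument to its annotated interface before the call."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in list(bound.arguments.items()):
            wanted = sig.parameters[name].annotation
            if wanted is not inspect.Parameter.empty:
                bound.arguments[name] = adapt(value, wanted)
        return func(*bound.args, **bound.kwargs)
    return wrapper

class FooFromText(IFoo):
    def __init__(self, text):
        self.text = text

    def foo(self):
        return self.text.upper()

class Text:
    def __init__(self, text):
        self.text = text

    def __conform__(self, protocol):
        # The object decides dynamically whether it supports the interface.
        return FooFromText(self.text) if protocol is IFoo else None

@adapting
def use_foo(ob: IFoo):      # 1) states "this method takes an IFoo"
    return ob.foo()

wanted = inspect.signature(use_foo).parameters["ob"].annotation
print(wanted is IFoo)           # 2) the annotation is introspectable: True
print(use_foo(Text("hello")))   # 3) Text is adapted automatically: HELLO
```

Nothing here is checked before the program runs; whether `Text` can stand in for `IFoo` is decided by the object at call time.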

This is *not* the same thing as "static typing"; please don't confuse it with that. It is type *annotation*. In fact, if PEP 246 semantics are used, there isn't necessarily any way for a compiler to prove at compile time that a program is incorrect, since an object can *dynamically* decide to support (or not support) an interface, and the interface can *dynamically* decide to put a wrapper around the object or not.

I agree that required explicit static typing is a pain, and I am not advocating it. To the extent that Guido's proposal is mere static type enforcement, I'm against it.

However, being able to *annotate* methods or classes with type information is useful, as is being able to do automatic adaptation with a compact notation. Having explicit annotation allows you to do things with the data, like generate a database schema or other syntax-driven machinery.

For me, the main reason to support a type annotation syntax in Python is to pre-empt the fifty different kinds of decorators people will need to write to do type annotation in order to support these use cases. (E.g. one for ctypes, one for Jython, one for PyObjC, etc.)

It would be really nice to be able to have a common syntax and semantics that could be leveraged to implement lots of different metadata-driven use cases. That way, whenever someone reads a Python program, the documentation purpose is served, even if they are reading the code in a plain ol' text editor, rather than a fancy type-inferencing IDE. :)

Anyway, it's pretty clear we are talking about different animals; you keep saying "type checking is useless, type checking is bad", and I *agree with you*. I don't want annotation for type checking; I just want to be able to use type annotations to drive other things. (Btw, adaptation is not type checking either.)

For example, Zope has this thing where it maps form fields into function arguments on the method of an object; if you POST a form to a URL that represents an object method, Zope introspects the method's argument names to see what form fields it should pass to the method.

However, because there's no type information to extract, Zope uses this naming convention for form fields where you tack on things like ':int' or ':list:int' in order to put the type information into the HTML form! This is duplicative, to say the least, and it also causes interesting issues with client-side JavaScript at times. It would be much nicer if you could simply annotate the method with what types it wants, and the framework could then look up converters to map from the posted form data to the type desired by the method being invoked.

This is a simple example of a case where the logical place to express needed type information is on the method itself, and it isn't even an inter-language integration issue, unless you consider HTTP form posts to be another language. Even then, it's certainly not a statically typed language being interfaced with; rather, the HTTP post is essentially typeless, and the typefulness is needed by the Python program in order to escape from everything's-a-string land.

Notice that the issue here isn't type checking, it's type conversion; an example of one kind of metadata-driven transformation.
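A minimal sketch of that kind of conversion, assuming modern Python function annotations and a hypothetical converter registry. This is not how Zope does it; `CONVERTERS`, `call_with_form`, and `Cart` are invented for the example.

```python
import inspect

# Hypothetical registry: maps a wanted type to a parser for posted strings.
CONVERTERS = {int: int, float: float}

def call_with_form(method, form):
    """Convert posted strings to the types the method's annotations ask for."""
    kwargs = {}
    for name, param in inspect.signature(method).parameters.items():
        # Unannotated or unregistered types are passed through as strings.
        convert = CONVERTERS.get(param.annotation, lambda value: value)
        kwargs[name] = convert(form[name])
    return method(**kwargs)

class Cart:
    def add(self, sku: str, quantity: int, unit_price: float):
        return quantity * unit_price

# Everything arrives as a string, as in an HTTP form post.
form = {"sku": "A-1", "quantity": "3", "unit_price": "2.5"}
total = call_with_form(Cart().add, form)
print(total)  # 7.5
```

The type information sits on the method, so the form field names need no ':int' style suffixes.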

Here's another one: create a GUI wrapper that uses argument metadata to "intuit" suitable forms to invoke methods, using the types to look up suitable widgets or subforms.

Can you meet these use cases other ways? Sure, trivially, just write your own darn function/method decorators, or invent some other specification language outside the code, use code generators, or any of half a dozen other techniques. I just think it'd be nice to have a convenient, easy-to-read/use syntax that everyone would then have in *common*, instead of the decorator-per-use-case scenario.

Paul Moore said...

I think that the key point here was in the last paragraph of one of PJE's comments - it's about having a notation *in common*. This is a key issue in a number of areas - adaptation, interfaces, introspection, etc. People have created ways of doing all of these, and they work fine. But they aren't interoperable (without substantial effort). If I use Twisted interfaces in my code, I can't just plug it into something that uses Zope interfaces. To my mind, *that's* the reason for language support.

Oh, and I'm fairly scared by Guido's latest posting on type algebras, inferencing, etc. If it can be done, great - but that looks like a lot of developer effort on something I don't (yet) see the benefit of - given that I'm another of the people who has no interest in static type checking, but might have an interest in the annotation aspects of type declarations.

We'll see - I'm much clearer on what Guido's getting at after his second posting, so maybe it's just a matter of time. I don't see any of this happening soon, though, for better or worse.

About Me

Portland, Oregon, United States
I'm usually writing from my favorite location on the planet, the pacific northwest of the u.s. I write for myself only and unless otherwise specified my posts here should not be taken as representing an official position of my employer. Contact me at my gee mail account, username patrickdlogan.