"I have a mind like a steel... uh... thingy." Patrick Logan's weblog.


Sunday, September 30, 2007

Many-Node has many more implications than Many-Core

ManyCoreEra refers to a post on the Parrot VM and various concurrency mechanisms.

These discussions of shared-memory mechanisms continue to miss the point about the Many Core Era...

The many-core era will also be a many-node era. You will not have a C: drive except for those legacy systems running in their little jars of formaldehyde.

You will have a lot of computing power "out there" and several kinds of displays that are not directly attached to "your computer". You probably will not be able to locate an "application" as being installed on your C: drive and running on the CPU on the mother board that has a ribbon running out to the disk that has C: formatted on it.

Hardware will be too cheap to be organized this way. We need to begin reorganizing our software now, or five years from now we'll be really badly off, without any excuses.

If your current programming model distinguishes between "these threads" and "those other processes over there" then it's almost certainly the wrong model for today, not to mention tomorrow.

Update:

The Java Server Side site has a lengthy discussion based on this li'l post of mine. A number of different views there.

More on the ESB

Cross-referencing fuzzy's cross reference to my cross reference -- we may create a vortex that implodes on itself. But anyway, in my URI post I said the ESB subject per se is a topic for another post, which turns out to be this one. Mike's hit my main impression with ESBs over the years. I wonder what Ross Mason or other ESB advocates have to say about my experience. First Mike's note on his recent exploration with Mule...

I have more learning on Mule to do. I was a little overwhelmed in working with it yesterday (lots of jar files, lots of XML config, examples that didn't help with what I wanted to do, docs that didn't answer my questions - typical newbie stuff). I did succeed in getting it configured to do what I wanted it to do after going down a number of rat holes & now have something to build on this week.
In my experience with various middleware tools (some of which predate the term "ESB" but all of which now tend to place themselves somewhere in ESB territory), these turn out to be exercises in installation and configuration by dialog box and/or by XML config file. I dislike both.

ESBs seem to me to be lumps of things useful for integration, but do not form any kind of coherent shape out of the individual lumps. If you need XML transformation, email, HTTP, file drop, etc. then why not just use the simplest dang library for the one or two of those needed for any given situation? If you need more than two of those in a single situation, then that's a symptom rather than a need.

Look at the list of "adapters" or whatever an ESB advertises. In each case, one can locate a fairly simple library, probably already on the machine, that implements that one feature well enough. ESBs say they can "wire" all that stuff together.

Such wiring usually appears to be more complicated than the original problem. ESBs have just never appealed to me. But never say never, I guess.

I am from Missouri when it comes to ESBs. Show me. I'll give you the benefit of the doubt if you can just show me why it makes more sense to use such an all-in-one, configuration nightmare. :-/

I may be biased already, but I've not seen where these things pay off over a little code, some tests, and a couple simple libraries that do exactly what's needed. Assuming your experience differs from mine, please show me.

Update from comments: Yuen-Chi Lian's comment to this post responds to Dan Creswell's. First, Dan Creswell's... (Dan's connected to Jini/Javaspaces, in particular to the Blitz Javaspace implementation and some significant uses of J/J in production situations)

Of course the more point-to-point type integration solutions can also get out of hand but as you say it's simpler, more contained and easier to get to grips with.....
And now Yuen-Chi Lian's response to that... (he's connected to Mule, says google, so presumably has some good case studies in applying ESBs)
That's the point. And when it comes to a system with more and more enterprise applications connecting to it -- different vendors, different languages, etc. -- you need a centralized mechanism, a universal connectivity, a message bus, an ESB. To keep messy things at a manageable level.

Also, programming in the large is simply a different (not better) development model to follow. But there are still fundamental concepts at its core which we have learned throughout the years in "traditional" development -- "high abstraction" and "loose coupling".

I've got some experience with the Tibco RV bus, JMS, and Javaspaces. I understand how to use pub/sub, queues generally, and tuple spaces to reduce the NxM connectivity problem. I understand how to define "canonical messages" so that the messages on the bus are not just propagating the internal implementation details of the participants. And so this to me seems like appropriate "high abstraction" and "loose coupling".

I've never been a fan of the big tools sets that have grown up around tools like RV. All the point-and-click, configure the xml, "wire-up-the-bubbles" tools just don't do much for me. Writing a little bit of code to read/write messages on a bus seems just fine.

And so to me "message bus" and "enterprise service bus" seem to be two different animals. One is a simple animal (basic bus capabilities), the other a mangy mutt (ESB). The ESB is like a grab-bag of mechanisms that form no coherent whole. And they tend to have WS-Deathstar infused in their being, lacking simplicity and simply defined terms.

So give me a bus, give me a tuple space, give me simple mechanisms like that. But xml config'ing a zillion jar files to "mix-n-match" all the ESB thingies -- who really does that? Successfully? And what does the *architecture* of that look like? I remain unconvinced and nearly uninterested.

Today I am drawn to the simple web approach as opposed to the simple bus approach. The web, via atompub and the atom format, seems to incorporate enough "bus-like" behavior to meet many needs. Where it falls short, simple bus mechanisms can be employed, as I'd have turned to before. But I'd look to do that *without* hurting the web aspects. All that NxM connectivity stuff should have a web woven around the bits. They can all use the web as the simple, unifying architecture.

ESBs do not form an architecture in and of themselves. They are mechanisms that can provide some plumbing. I'm not sure at all that they add much to the simple plumbing mechanisms I've had success with already. I have trouble understanding how to approach them, and nothing I have seen to date has eased that for me.

So really good case studies would be appreciated. Give me your best shot, and I'll try to consider it.

Leaning

Fuzzy again...

The Poppendiecks are my favorite methodologists. I have yet to be disappointed with Lean thinking.
(Get their books if you have not yet. And read their other articles too.)

Mike has some good quotes and I'll pull some others out here. Another nice article from that pair. They provide a setting for having discussions with the rest of the business, less tied to the jargon and rigidity that has overtaken a true "agile" approach to software development. Here they are quoting Taiichi Ohno of Toyota...

"Years ago, I made them hang the standard work documents on the shop floor. After a year I said to a team leader, 'The color of the paper has changed, which means you have been doing it the same way, so you have been a salary thief for the last year.' I said 'What do you come to work to do each day? If you are observing every day you ought to be finding things you don't like, and rewriting the standard immediately. Even if the document hanging there is from last month, this is wrong.' At Toyota in the beginning we had the team leaders write down the dates on the standard work sheets when they hung them. This gave me a good reason to scold the team leaders, saying 'Have you been goofing off all month?'

"If it takes one or two months to create these documents, this is nonsense. You should not create these away from the job. See what is happening on the gemba and write it down."

Years ago I attended several weeks of CMMI training from a couple of really good instructors from SEI. They taught us essentially this. Start where you are, make small, incremental improvements.

I was the "agile" person there, and of course identified with this immediately. Most of the other people there were from "the CMMI team" whose job it was to "improve" everyone else. Unfortunately after years of discussion, they never really understood how to make improvements. To a large degree this was due to the fact that none of them ever developed much software, ever made many improvements, and were determined to force a made-up, "ideal" process on all the developers and project managers, who did nothing but resent it.

And so it goes. Now we have "agile" groups acting just as rigidly about their crown jewels.

Saturday, September 29, 2007

Happy 80th

John McCarthy turned 80 recently. Happy birthday!

In 1961, he was the first to publicly suggest (in a speech given to celebrate MIT's centennial) that computer time-sharing technology might lead to a future in which computing power and even specific applications could be sold through the utility business model (like water or electricity).

Enterprise URIs

I saw the page in our wiki where fuzzy was working out some URIs. That *is* cool...

I'd rather bet on the URI than any ESB vendor...

With a URI naming scheme that doesn't change you get a very different, very simple view of your systems. Sure, there may be madness today behind those URIs, but over time that madness will hopefully start to go away. But your URIs will stay the same.

Rather than selling proprietary ESB middleware, "integration" vendors should be developing and selling "information architecture" expertise, as well as experience in planning and building simple tools that don't hurt the web around the awful messes every organization has and would like to move away from over time.

Update: Ross Mason comments...

I agree that URI design is very important, but I'm not sure I understand what that has to do with ESB vendors. In Mule we use URIs to express service endpoints, but this isn't limited to http but also Jms queues, data repositories, even IM and EJBs. However, in all these cases, we, the vendor, don't control how the URIs are defined beyond ensuring that they are valid. Can you elaborate a little on your comments about the ESB vendor involvement?
Good question. Over the last few years, in a couple of different IT shops, I have been directly involved in "SOA" efforts. At the same time I have been a fairly close observer of other similar efforts. In each of these efforts a lot of time has been devoted to products in the ESB category.

My primary concern over the last 25 years of software development has been the inability of software developers to make a change they would otherwise choose to make, except for the fact that making that change is inordinately difficult. The difficulty in almost all circumstances was due to implementation details of one component being expressed as dependencies by another component. In order to change A, you also have to change B, and usually worse than that.

As a result the *apparently* most cost-effective choice *in the moment* usually is, "don't change anything". And so these problems compound with more problems over time until the entire data center is in a development deadlock. Nothing significant can be improved because everything depends on everything else. I've been involved in a couple of situations where the big changes *were* funded and they are nightmares.

To avoid, or to climb out of, such massive technical debt, an enterprise must invest in separation of concerns, abstractions that hide details as much as possible, "wedges", if you will, between the components that should be able to change independently. The WS-* proponents envisioned WSDL-based interfaces as such "wedges". The message bus proponents (and I was one of those for a long time) envisioned "messages" as such wedges. The distributed object proponents (and I was one of those for a long time too) envisioned "objects" as such wedges.

I was never a WSDL/WS-* fan, but I still believe reasonable solutions could be implemented with messages and/or objects in many data center situations. But taking a step further back, I can now see how identifying the significant "resources" of an organization, and how to name (and "locate") them, can be even more abstract than messages or objects. Atom, for example, can be used to identify, locate, manipulate, and announce changes to resources over time. Nothing at that level must be said about the lower-level techniques (e.g. messages and/or objects) used to do so.

I would like to say as little as possible at the enterprise level about how I implemented the enterprise. An ESB product may or may not fit in somewhere behind the scenes (that's another discussion). I would like products and technologies to come and go as desired without unduly affecting the rest of the enterprise.

An ESB, or any kind of "bus" or object system is not an architecture, per se. They may help to realize an architecture. Right now it looks like "the web" is a pretty good architecture for the enterprise.

Third parties, such as ESB vendors, over time could have more real success with enterprises by helping them implement web-like architectures rather than helping them install and configure ESBs. Mule may play a role here or there behind the scenes, but enterprises need to learn how to build more lasting structures ("information architectures" may be a decent name for it) and focus less on the plumbing that they've been sold by the ESB vendors.

Thursday, September 27, 2007

When it rains, it pours i/o all over Erlang

klacke adds to the erlang i/o discussion as he did with the regexp discussion, with a faster library...

Originally at Bluetail, we had some serious problems with high performance file I/O, especially line oriented such.

I then wrote a portable (yes win32 too) linked in driver for fast FILE I/O. It's based on an old and hacked version of the BSD FILE* interface. It's called bfile and we've been using it in pretty much all projects during the past 8 years. I've prepared a tarball of it at

http://yaws.hyber.org/download/bfile-1.0.tgz

2> bfile:load_driver().
ok
4> {ok, Fd} = bfile:fopen("Makefile", "r").
{ok,{bfile,#Port<0.98>}}
5> bfile:fgets(Fd).
{line,<<10>>}
6> bfile:fgets(Fd).
{line,<<10>>}
7> bfile:fgets(Fd).
{line,<<97,108,108,58,32,10>>}
14> bfile:fread(Fd, 10000).
{ok,<<10,10,105,110,115,116,97,108,108,58,32,97,108,108,10,9,40,99,100,32,99,95,115,114,99,59,32,...>>}
15> bfile:fread(Fd, 10000).
eof
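
For what it's worth, a tiny line-counting loop over that interface might look like the sketch below. This is my code, not part of the bfile tarball; the {line, Bin} return shape comes from the session above, and I'm assuming fgets/1 returns eof at end-of-file the way fread/2 does (closing the file is left out).

count_lines(Filename) ->
    ok = bfile:load_driver(),
    {ok, Fd} = bfile:fopen(Filename, "r"),
    count_lines(Fd, 0).

count_lines(Fd, N) ->
    case bfile:fgets(Fd) of
        {line, _Bin} -> count_lines(Fd, N + 1);
        eof          -> N    %% assumed, by analogy with fread/2 above
    end.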

More on Erlang's i/o Rationale

From Ulf Wiger of Ericsson on the erlang-questions email list, regarding the performance of some of the i/o functions...

One reason is that file IO in Erlang has traditionally been tuned in order to be as unobtrusive as possible, in massively concurrent systems. For example, Mnesia's log dumps usually run in the background at low priority in such systems, and the more important IO is the signaling to/from the network. In these systems, writes to disk are uncommon, and reading large volumes of data from disk only occurs at restarts (which are - hopefully - exceedingly uncommon).

While we've noticed for a long time that Erlang's IO generally sucks in benchmarks that test raw sequential speed on one large file or one socket, it hasn't been clear that this adversely affects the key products using Erlang.

I'm sure that we can find ways to speed up such IO without adversely affecting the characteristics of massively concurrent IO. As Erlang is spreading more into other application areas, this is bound to be a major issue.

Solid State

(via Steve Dekorte)

Fusion io's flash storage card. Neat.

...the cards will start at 80 GB and will scale to 320 and 640 GB next year. By the end of 2008, Fusion io also hopes to roll out a 1.2 TB card...

...the card has 160 parallel pipelines that can read data at 800 megabytes per second and write at 600 MB/sec. He even proved it by running a Linux drive I/O benchmark. But for large corporations running busy databases, operations per second is a much more important number than bandwidth.

Flynn set the benchmark for the worst case scenario by using small 4K blocks and then streaming eight simultaneous 1 GB reads and writes. In that test, the ioDrive clocked in at 100,000 operations per second. “That would have just thrashed a regular hard drive,” said Flynn.

Five years from now will be fun, running not-so-little data centers in a pizza box.

How are you going to justify running your operations on a mainframe in 2012?

Moore's Law is changing the hardware landscape orders of magnitude more quickly than the software community can track. We have not even grasped the difference between today and tomorrow because we're still way back before yesterday in the way we think about software.

How long before someone gets rid of that artificial disk driver sitting between the processor/caches/memory and the "disk"?

Tuesday, September 25, 2007

Apparently Fast Erlang File Read and Regexp

Claes Wikstrom sent a link to the erlang-questions list to his faster regexp library. No report yet on speed...

...the only fast way today to process a large file line/by/line is to
  1. file:open(Filename, [read, raw])
  2. In a loop {ok, Bin} = file:read(Fd, BufSize)
  3. Use a binary regex matcher such as...
http://yaws.hyber.org/download/posregex-1.0.tgz

(I don't know the state of the regex lib in OTP today, last time I looked it sucked bigtime though)

/klacke

Here's an example of its use. Note it uses Erlang's binary representation (a sequential hunk of memory) of strings instead of its list-of-characters representation...
Erl Interface to posix regular expressions by klacke@emailaddress.xyz
LICENSE: BSD style, free, use,  molest and rewrite

To build, make and sudo make install

To use:

1. Compile your regexp.

4>  {ok, RE} = posregex:compile(<<"abc.*foo">>, [extended]).
{ok,#Port<0.101>}

Try to match something 

7> posregex:match(RE, <<"abc mre text here foo">>, []).
ok

If it doesn't match 

9> posregex:match(RE, <<"abdc mre text here foo">>, []).
{error,nomatch}

Try to match and find out where the match occured

10> posregex:exec(RE, <<"abc mre text here foo">>, []).  
{ok,[{0,21}]}

Free memory occupied by the compilation (or exit process since
RE is an erlang port)

11> posregex:free(RE).
ok 
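
Pulling klacke's three steps together with those posregex calls, a rough chunk-at-a-time matcher might look like the sketch below. This is my code, not klacke's: I add the binary option to file:open so reads come back as binaries (which posregex wants), Pattern is expected to be a binary, and the real-world wrinkle of a match straddling a buffer boundary is ignored.

count_matching_chunks(Filename, Pattern) ->
    {ok, RE} = posregex:compile(Pattern, [extended]),
    {ok, Fd} = file:open(Filename, [read, raw, binary]),
    Count = count_chunks(Fd, RE, 0),
    file:close(Fd),
    posregex:free(RE),
    Count.

count_chunks(Fd, RE, N) ->
    case file:read(Fd, 65536) of
        {ok, Bin} ->
            case posregex:match(RE, Bin, []) of
                ok               -> count_chunks(Fd, RE, N + 1);
                {error, nomatch} -> count_chunks(Fd, RE, N)
            end;
        eof ->
            N
    end.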

Monday, September 24, 2007

Beer Riot

Oh and BeerRiot.com is also done in Erlang and ErlyWeb. Gulp.

I gotta try that Sam Adams Imperial Pilsner. (Sam Adams? Imperial? The irony.)

Vimagi

Vimagi was built with Erlang and the ErlyWeb framework. (And Flash apparently.)

Check out an example.

Pier Port for Cincom Smalltalk

Via James Robertson, Pier has been ported to Cincom Smalltalk. What's that mean?

Pier may be the best open source application built on the open source Seaside framework. (The best commercial application being DabbleDB of course.)

Pier may be the best extension of Ward Cunningham's "wiki" concept. And it is built on Magritte, which may be the best self-describing meta application system on... well, on earth.

And Cincom Smalltalk may be the best OO dynamic language system, probably the best such commercial system, and has all the openness that Smalltalk systems have had going back to the early 1980s. In fact CST's lineage goes all the way back. (Why would you use Ruby when you could use Smalltalk???)

Cincom has or will soon have support for Seaside. Try out Pier. It's cool.

Regular Expression Matching Can Be Simple And Fast

Speaking of regexp. Interesting analysis of approaches to regexp design. Rob Pike shows up again in this. Those Unix guys had something going back then.

This is a tale of two approaches to regular expression matching. One of them is in widespread use in the standard interpreters for many languages, including Perl. The other is used only in a few places, notably most implementations of awk and grep. The two approaches have wildly different performance characteristics...

Notice that Perl requires over sixty seconds to match a 29-character string. The other approach, labeled Thompson NFA for reasons that will be explained later, requires twenty microseconds to match the string. That's not a typo. The Perl graph plots time in seconds, while the Thompson NFA graph plots time in microseconds: the Thompson NFA implementation is a million times faster than Perl when running on a miniscule 29-character string. The trends shown in the graph continue: the Thompson NFA handles a 100-character string in under 200 microseconds, while Perl would require over 10^15 years. (Perl is only the most conspicuous example of a large number of popular programs that use the same algorithm; the above graph could have been Python, or PHP, or Ruby, or many other languages. A more detailed graph later in this article presents data for other implementations.)

It may be hard to believe the graphs: perhaps you've used Perl, and it never seemed like regular expression matching was particularly slow. Most of the time, in fact, regular expression matching in Perl is fast enough...

Today, regular expressions have also become a shining example of how ignoring good theory leads to bad programs. The regular expression implementations used by today's popular tools are significantly slower than the ones used in many of those thirty-year-old Unix tools.

This article reviews the good theory: regular expressions, finite automata, and a regular expression search algorithm invented by Ken Thompson in the mid-1960s. It also puts the theory into practice, describing a simple implementation of Thompson's algorithm. That implementation, less than 400 lines of C, is the one that went head to head with Perl above. It outperforms the more complex real-world implementations used by Perl, Python, PCRE, and others. The article concludes with a discussion of how theory might yet be converted into practice in the real-world implementations...

While writing the text editor sam in the early 1980s, Rob Pike wrote a new regular expression implementation, which Dave Presotto extracted into a library that appeared in the Eighth Edition. Pike's implementation incorporated submatch tracking into an efficient NFA simulation but, like the rest of the Eighth Edition source, was not widely distributed. Pike himself did not realize that his technique was anything new. Henry Spencer reimplemented the Eighth Edition library interface from scratch, but using backtracking, and released his implementation into the public domain. It became very widely used, eventually serving as the basis for the slow regular expression implementations mentioned earlier: Perl, PCRE, Python, and so on. (In his defense, Spencer knew the routines could be slow, and he didn't know that a more efficient algorithm existed. He even warned in the documentation, “Many users have found the speed perfectly adequate, although replacing the insides of egrep with this code would be a mistake.”) Pike's regular expression implementation, extended to support Unicode, was made freely available with sam in late 1992, but the particularly efficient regular expression search algorithm went unnoticed. The code is now available in many forms: as part of sam, as Plan 9's regular expression library, or packaged separately for Unix. Ville Laurikari independently discovered Pike's algorithm in 1999, developing a theoretical foundation as well.

Finally, any discussion of regular expressions would be incomplete without mentioning Jeffrey Friedl's book Mastering Regular Expressions, perhaps the most popular reference among today's programmers. Friedl's book teaches programmers how best to use today's regular expression implementations, but not how best to implement them. What little text it devotes to implementation issues perpetuates the widespread belief that recursive backtracking is the only way to simulate an NFA. Friedl makes it clear that he neither understands nor respects the underlying theory.

Regular expression matching can be simple and fast, using finite automata-based techniques that have been known for decades. In contrast, Perl, PCRE, Python, Ruby, Java, and many other languages have regular expression implementations based on recursive backtracking that are simple but can be excruciatingly slow. With the exception of backreferences, the features provided by the slow backtracking implementations can be provided by the automata-based implementations at dramatically faster, more consistent speeds.

Companion articles, not yet written, will cover NFA-based submatch extraction and fast DFA implementations in more detail.

Just goes to show that various benchmarks are relative, and there's likely a good bit of low-hanging fruit in Erlang's implementation.

Steve Vinoski on Tim Bray and Erlang

Not only is Steve Vinoski blogging again, he's blogging about Erlang. Not only is he blogging about Erlang, he's written some code like Tim Bray's but parallelized it on two-core and eight-core machines, with ease, as a relative newbie to Erlang...

Reading between the lines, it seems that Tim was hoping to take advantage of Erlang’s concurrency to put his multicore machines to work analyzing his logs...

I decided to take a crack at it myself...

The way this solution works is that it uses multiple Erlang processes to convert chunks of the input file to lists of strings and process them for matches...

The best I got on my MacBook Pro after numerous runs was 0.301 seconds with 2400 processes, but the average best seems to be about 0.318 seconds. The performance of this approach comes pretty close to other solutions that rely on external non-Erlang assistance, at least for Tim’s sample dataset on this machine.

I also tried it on an 8-core (2 Intel Xeon E5345 CPUs) 64-bit Dell box running Linux, and it clocked in at 0.126 seconds with 2400 processes, and I saw a 0.124 seconds with 1200 processes. I believe this utilization of multiple cores was exactly what Tim was looking for.

If you’re a Java or C++ programmer, note the ease with which we can spawn Erlang processes and have them communicate, and note how quickly we can launch thousands of processes. This is what Tim was after, I believe, so hopefully my example provides food for thought in that area. BTW, I’m no Erlang expert, so if anyone wants to suggest improvements to what I’ve written, please feel free to comment here.

Very cool. There are still the benefits to be gained from improving Erlang's I/O and regexp libraries for doing the sequential aspects of Tim's work. But this shows the real value of Erlang (and Erlang-like capabilities if they show up in other language systems) for the increasingly multi-core, multi-node world.
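
For anyone who hasn't seen the pattern, the scatter/gather core of that kind of solution fits in a few lines of Erlang. This is my own sketch of the general idea, not Steve's code; splitting the file into Chunks, and the per-chunk matching function, are left out.

pmap(Fun, Chunks) ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {self(), Fun(Chunk)} end)
            || Chunk <- Chunks],
    [receive {Pid, Result} -> Result end || Pid <- Pids].

Given some count_matches_in_chunk/1 (hypothetical here), lists:sum(pmap(fun count_matches_in_chunk/1, Chunks)) totals the matches across all the workers.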

Sunday, September 23, 2007

Intel C/C++ STM Compiler

James Reinders, who lives just down the road a bit, announces a prototype C/C++ compiler with Software Transactional Memory...

We have a lot to learn before we can decide whether STM offers some relief from locks (they are NOT going away) and offers help for programming, or for tools which compose programs automatically. We think that the existence of a C/C++ compiler supporting Software Transactional Memory (STM) would be a great help. So... Today, we released a prototype version of the Intel C/C++ Compiler with support for STM. It is available from Whatif.intel.com. The Intel STM Compiler supports Linux and Windows producing 32 bit code for x86 (Intel and AMD) processors. We hope that the availability of such a prototype compiler allows unprecedented exploration by C / C++ software developers of a promising technique to make programming for multi-core easier.
That's a healthy attitude. Have fun with it.

If you *are* interested in STM (well, I'm not), then you might consider how a system like Gambit Scheme, which compiles to C, could use this new C compiler. (You'd also have to consider how the Gambit Scheme interpreter does the same.)

Postmodern I/O

Update: As it turns out, get_line is intended primarily for writing interactive tty character-by-character apps. On the erlang-questions list, I think, someone on the implementation team announced they're updating the performance guides. And now, I assume, the i/o and regexp implementations too. End.

Tim Bray's note to Erlang...

I like you. Really, I do. But until you can read lines of text out of a file and do basic pattern-matching against them acceptably fast (which most people would say is faster than Ruby), you’re stuck in a niche; you’re a thought experiment and a consciousness-raiser and an engineering showpiece, but you’re not a general-purpose tool. Sorry
I've never had to do a lot of really fast I/O in Erlang, so Tim's exercise (and Steve Loughran's a while back) has been useful for me.

Fortunately for Erlang, making improvements to the I/O and perhaps the regexp libraries should be a fair bit easier than making concurrency and distributed system improvements in other languages.

If I had to do a lot of really fast I/O and Erlang did not pan out for that, I would probably turn to Gambit Scheme. In fact Gambit can do really nice Erlang-like concurrency as well as really fast I/O. It just doesn't have all the OTP libraries that Erlang has.

Maybe Gambit will get there some day. Maybe someone will port Ruby to Gambit, so Ruby can run really fast and have really fast I/O too. And have really nice concurrency. All without building a new virtual machine from scratch or building on top of the JVM.

(If you're interested in that, let me know. I just don't care enough for the cruft in Ruby to write an implementation of it for my own use. There're a helluva lot of good things that fall out of this approach, like secure multiple application spaces per OS process, multiple languages with message-passing integration per OS process, "engines" that can be metered, pet-named secure access to resources, etc. OK -- that's its own blog post.)

If I had to do a lot of really fast I/O in the context of a reliable, scalable distributed system, I would probably do the really fast I/O in Gambit Scheme connected to an Erlang external port. Or this could be Python or Ruby if you wanted something from those systems.
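
In case the external port mechanism is unfamiliar, the Erlang side of such a hook-up is small. Here is a minimal sketch of mine, with a made-up "fastio" external command; a real setup would also define the framing and message encoding on the external program's side.

start_port() ->
    %% {packet, 4} length-prefixes each message with four bytes.
    open_port({spawn, "fastio"}, [binary, {packet, 4}]).

request(Port, RequestBin) ->
    Port ! {self(), {command, RequestBin}},
    receive
        {Port, {data, ReplyBin}} -> ReplyBin
    end.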

For example instead of implementing a rule engine in Erlang, I'd integrate from Erlang to PyClips (a really nice integration of Python and the Clips rules engine) like this. That seems like the way to develop postmodern systems... use good tools for appropriate situations, especially if they are built to be integrated easily. Programming today of any size probably leads to multiple languages, by its very nature.

Meanwhile Tim's I/O results have been taken to the erlang-questions list. With those numbers there seems to be some low hanging fruit that may or may not require some underlying systems coding. Already solutions are pouring in just using a different approach than the apparently slow get_line.

Saturday, September 22, 2007

PDXFunc Wiki

PDXFunc, the Portland functional programmers group, has a wiki. Nothing there yet, but there it is, and it's a wiki.

Actually it seems to be password protected, so I could not edit the sidebar to include a link to the PDXFunc google group. Coming soon, I hope.

Biztalk Services In The Cloud, Or: Why Not Jingle All The Way

Jon Udell mentioned Biztalk's Internet Service Bus a while back. I'm just getting caught up with that.

Curiously, in this video they mention using WCF and Biztalk Services to implement a chat that can traverse firewalls, etc.

Have they considered not hurting the web and using IETF standards like XMPP? XMPP defines an HTTP-based client. And the Jingle (XEP-0166) proposal, em, which actually has some implementations, extends all *that* to accommodate higher-bandwidth, out-of-band protocols for VOIP, including firewall negotiation, etc., to figure out whether they need to go through a server.

Ah, but, yeah, we could just use this Microsoft code and go non-standard all the way. Maybe I'm missing something. Or maybe you're happy in your little Microsoft corner of the internets.

Early Diagnostics

Tim Bray and Cédric Beust each make good points about Erlang's error reporting. My first reaction was: apparently they've not used Haskell. But coming back around, I do agree that the messages can be confusing, especially to beginners, especially to beginners whose mindset is in the typical assignment-oriented, imperative C/C++/Java model of computing.

Pattern matching errors can result in a "bad match" message. When you thought you were doing assignment, and you forgot you cannot "re-assign" a different value to the one some variable already has, then having the system tell you "bad match" is going to cause some grief.

Learning anything new, and especially "re-learning" something you thought you already knew, will take time. Granted, that effort may not always pay off. Maybe it's just not your cup of tea. On the other hand, what I learned about programming in Erlang is that a few error messages pop up with some regularity, and I am now fairly quick to identify the cause.

Good practices to aid diagnosing errors are:

  • Write a little code at a time.
  • Test-drive that code using an xUnit framework or just using the erl command line.
  • Use pattern matching and guards to narrow the acceptable parameters to a function.
That last one is key: catch the error ASAP, and rely on the language itself to catch it for you. In Tim's example, the error message is correct - it's a bad argument. But the top-level function did not catch it. Rather, the recursion was fairly deep before some other function caught the problem.

That top-level function looks as though it should expect a list, and not an atom. So this would have helped greatly...

scan(Arg) when is_list(Arg) ->
  ...
Or maybe better...
scan([]) ->
  %% Empty list, do the default thing...
  done;
scan([Head | Tail]) ->
  %% Do something with Head, then recurse on the Tail...
  do_something(Head),  %% do_something/1 is a placeholder for the real per-element work
  scan(Tail).
Cédric also says Erlang "feels old"... so do I sometimes, but I still have something left every now and then. :-D

Erlinguistics: Counting with a Functional Accumulator

Tim Bray thought Erlang weird, which it is, but he's pushing on with it, which is fun and educational, hopefully, for everyone following along.

Tail recursion is really a glorified goto statement, and after a while it starts to feel like an elaborate fake-recursive lie.
Once this style is adopted, though, solutions tend to use less code and become more tractable. At least the code comes out more factored and there's less of it in one place. I.e. you know those long stretches of looping code that can last for a couple of screenfuls? You tend not to find those in code written in a functional style. Assignments don't change the state of a variable 17 lines into a 49-line iteration. So in a sense there is no "lying" in a recursive function, but sometimes some unwinding is needed to get a sense of what's happening.

On counting...

I thought of two ways to count things. Every Erlang process has a process dictionary, i.e. a hashtable or content-addressable store or whatever, with get() and put() calls. This is the kind of thing that the Ruby code above depends on; the square brackets are a dictionary lookup. The problem is, Joe Armstrong severely disses the use of the dictionary... [Tim's analogy elided]

There’s also a hashtable package called ets, of which Joe expresses less disapproval.

Another way to count things, which I suspect would make Erlinguists happier, is to spawn a process for each thing you need to count

Yes, using a process to maintain changeable state is generally a good thing. Think of each such process as a "little database" and a "little messaging protocol" for accessing that database. Counting and inspecting the count would be the degenerate example. This approach usually would be reserved for more complicated kinds of "little databases".
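
As a concrete, if trivial, illustration of that "little database with a little protocol" idea, here's a sketch of mine of a counter process and its two-message protocol...

start_counter() ->
    spawn(fun() -> counter(0) end).

counter(N) ->
    receive
        bump          -> counter(N + 1);
        {value, From} -> From ! {count, N}, counter(N)
    end.

bump(Counter) -> Counter ! bump.

value(Counter) ->
    Counter ! {value, self()},
    receive {count, N} -> N end.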

More preferable in this simple counting situation might be to include one or more "accumulators" in the recursive function. Instead of passing around a Counter process id, the actual "count" could be passed around. For example, the original "morse codes in erlang" uses an accumulator that is not a count, but it is an accumulator of a list of results. As the results are accumulated, those results are passed around until the recursion is terminated, and then those results are returned as, well, the results.

A simpler counter kind of accumulator could look like the code below, which is close enough to Tim's original example. I'm not sure what his process_match is doing, but in this case, handle_match is playing that part, and here simply printing information about the match. The count is maintained in the Count "accumulator" which is passed around with the recursion. The top-level function "seeds" the accumulator with 0.

-module(foo).
-export([count_matches/2]).

-import(file, [open/2]).
-import(io, [format/2, get_line/2]).
-import(regexp, [match/2]).

count_matches(Filename, Pattern) ->
  {ok, In} = open(Filename, read),
  count_matches(In, get_line(In, ""), Pattern, 0).

count_matches(_In, eof, _Pattern, Count) ->
  Count;
count_matches(In, Line, Pattern, Count) ->
  case match(Line, Pattern) of
    {match, Start, Length} ->
      handle_match(Line, Start, Length),
      count_matches(In, get_line(In, ""), Pattern, Count + 1);
    nomatch ->
      count_matches(In, get_line(In, ""), Pattern, Count)
  end.

handle_match(Line, Start, Length) ->
  format("Found a match in ~s from ~w for ~w~n", [Line, Start, Length]).
Taking this approach to its limits results in functions like lists:foldl and lists:foldr which "fold" some value over a list of elements, starting from the left or the right of the list, respectively. It's all a variation on "map/reduce", essentially. The above code could be made more general to "fold" over the lines in a file from top to bottom for some given functional argument.
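
To make that last point concrete, here's a sketch (mine, not Tim's) of the generalization: fold a function over the lines of a file, foldl-style, and re-express count_matches/2 in terms of it.

fold_lines(Fun, Acc0, Filename) ->
    {ok, In} = file:open(Filename, read),
    fold_lines(Fun, Acc0, In, get_line(In, "")).

fold_lines(_Fun, Acc, In, eof) ->
    file:close(In),
    Acc;
fold_lines(Fun, Acc, In, Line) ->
    fold_lines(Fun, Fun(Line, Acc), In, get_line(In, "")).

%% count_matches(Filename, Pattern) then becomes roughly:
%%   fold_lines(fun(Line, Count) ->
%%                  case match(Line, Pattern) of
%%                    {match, _, _} -> Count + 1;
%%                    nomatch       -> Count
%%                  end
%%              end, 0, Filename).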

Wednesday, September 19, 2007

Atom and RelaxNG

(Update: Links now work, thanks to Sjoerd Visscher.)

I'm fairly new to the details of Atom format and brand-spanking new to the details of RelaxNG. So here's what I did, and if you have any comments, then add them here or send an email to patrickdlogan at gmail dot com. I also did not find any explicit examples like this on the internets, so this may help someone else -- it's not complex, and that makes me feel good about atom.rnc and relaxng in general.

My objective was to use relaxng to validate an atom entry where the content is inline'd xml. The validation should be for the atom stuff per se, as well as for the specific xml content. A second objective was to apply the relaxng schema for that specific xml to that content separate from the atom stuff. So the xml content should stand alone or be inline'd in an entry and one schema definition should be usable for both cases.

I grabbed atom.rnc, the compact relaxng definition for atom. I grabbed the jing validator. Applying just this schema worked. Good. Good.

Then I spent some time looking at that schema and reading through various relaxng combination mechanisms. The combination takes place at the definition of atomInlineOtherContent in atom.rnc. This is where the inline'd xml content goes.

An interleaved combination did not work, as I found out, because that definition as well as the definition of my own xml includes "text", which causes relaxng trouble. Although I don't fully understand why this should cause trouble if my schema is more specific.

The combination mechanism that did work is to redefine atomInlineOtherContent in my own schema...

namespace atom = "http://www.w3.org/2005/Atom"

# Usage:
# java -jar jing.jar -c atom_policy.rnc example_insurance.atom

# Note: this includes an unmodified atom.rnc grammar from the ietf
# standard. This grammar here redefines the inline content pattern to
# specify that such content should include appropriate attributes from
# the atom spec, e.g. type="application/xml", but the content itself
# should be a  tree of elements.

include "policy.rnc"

include "atom.rnc"
{
  atomInlineOtherContent =
     element atom:content {
        atomCommonAttributes,
        attribute type { atomMediaType }?,
        amitPolicy
     }
}
This looks just like the original definition except the actual content has to match the definition of amitPolicy as defined in policy.rnc. (I should probably redefine the attributes to require a specific type of "application/xml" or something, but this is close enough for now.)

The example entry with policy content validates using this more specific grammar. More goodness.

I met the second objective with just a little bit of cruft. I'd like to use policy.rnc to validate just some xml that starts with amitPolicy and has no atom elements at all.

I could not apply policy.rnc directly because I'd have to add a "start" to that grammar, which then bogs down the validator when that grammar is combined with atom: two starting points, not combined correctly.

Instead I wrote another small schema to define the starting point for a standalone policy...

include "policy.rnc"

# This currently exists to support validating policy xml content
# without being part of an atom entry. There may be another way to
# avoid having this separate little grammar.

start = amitPolicy
There may be another way to do this without defining another schema like this.

Tuesday, September 18, 2007

Erlang Interview

André Pang was interviewed at builder.au re: Erlang...

Our very first server was written in C++, but we realised that we had a problem if the server crashes. There might be 20 VFX houses connected to this server, and when it goes down, they all go down with it. So we looked at using an Erlang server, by writing custom plug-ins to ejabberd, which is a popular XMPP (Extensible Messaging and Presence Protocol) platform -- so we layered all of our code above that.

In the end we found that to be a good solution, in fact the only time I can remember it crashing is when we forgot to allocate swap space for the server and it ran out of memory. Due to the way you can set up ejabberd and Erlang in general is that it really suits a distributed system without a single point of failure. It makes it easy to set up different nodes which all replicate as each other, for instance, there were plenty of times when the primary server went down, but users were automatically connected to the secondary server instead. With a bit of smarts in the client it was possible to reconnect to a secondary server in the middle of the session and only lose a small amount of information in the process.

We had three or four servers around the world, and it was incredibly cheap to deploy, because with the Erlang runtime you don't need any special hardware or anything. It was very reliable and turned out to be a very good decision.

Sunday, September 16, 2007

AMQP, XMPP

Stefan Tilkov suggests...

Personally, I see AMQP on the one side and AtomPub/XMPP on the other side of a fence — AMQP internally, addressing the same problem domain as current, proprietary queueing solutions, and AtomPub and XMPP over the Internet.
I'd like to understand more about his reasoning.

What would be the drivers for this dichotomy? Why are two different messaging systems necessary? What would limit AMQP from being used on the internet scale? What would limit XMPP from being used on the intranet scale?

Without some reasoning behind this dichotomy, it just feels too much like the old WS-* on the intranet, HTTP on the internet. I don't recall seeing a sufficient rationale for that.

Now it seems fairly clear that HTTP fits well on both scales, no?

So is HTTP sufficient for all kinds of "message transfer"? HTTP and XMPP? HTTP, XMPP, and AMQP? Or... how many kinds of messaging are enough?

As an aside, our viking friend suggests Apache can go beyond the traditional as well. I think we have a lot to learn still about how simple the enterprise can eventually be.

Emacs Tag Magic

Tim Bray on programming in C using Emacs...

I’ve been using Emacs, and I seem to recall that it has all sorts of navigation magic.
That would be the "tags" mechanism. Tags work with non-emacs tools as well, but emacs is especially good with them. See the manual.

Tag databases can be created for all kinds of languages.

Those IDE's still have *nothing* on emacs. :-D

Nuke-you-lair

On nuke-you-lair power...

How effective is long term storage of nuclear waste? Stewart's answer was typically provocative. As I recall it, he said something like this: "We don't know, but our framing of the question shows a failure of long term thinking. We've all been imagining that we have to solve the nuclear waste problem for all time to come. In fact, we only have to solve it for a few hundred years. Either by then technology will have advanced sufficiently that it will no longer be a problem, or we will have regressed so far that a few nuclear waste dumps in out of the way places will be the least of our worries."
In other words, we're screwed. Welcome to your post-apocalyptic future.

Saturday, September 15, 2007

Sweet Linux Air

From the RIA Cowboy himself. Yeehaw. While we're waiting for Air to be officially available on Linux...

*** WARNING - THIS IS TOTALLY UNSUPPORTED, UNENDORSED, AND A COMPLETE HACK ***

...

All I need is ADL - the testing tool for AIR applications. So I gave it a try on a Salesforce.com project I’m working on:

jamesw@dos:~/projects/mavericks/examples/air/AccountTracker/bin$ wine ~/flex_sdk-3_b1/bin/adl.exe salesforceTest-app.xml
And to my total surprise the AIR application loaded and ran on Linux! Sweet!
The one component that truly needs Windows or Mac right now can run on Wine on Linux. The degree to which the API is mapped to the Linux environment is probably limited, but for at least some things, developers can currently use Linux for Air.

Friday, September 14, 2007

Squeak by Example

(via James Robertson)

A nice-looking new book for getting started with Squeak Smalltalk. The pdf is free online, or you can order it print-on-demand from lulu. They also accept donations for the pdf.

Co-authored by local Portlander Andrew Black.

The book seems comprehensive for getting started, covering the development environment (including the Monticello package tool) as well as the language, the GUI classes, "meta" classes, etc.

This is a good year to learn Smalltalk and Erlang, with two of the best programming language books ever now available for two of the most distinguished programming languages. Every programmer would be better at whatever they do by knowing a good bit about these languages.

Thursday, September 13, 2007

ONLamp: an Introduction to Erlang

From Gregory Brown at ONLamp...

I consider myself someone with a middling grasp of concurrency at best, and though I'd likely be scratching my head trying to write this in many other languages, I hacked this together in about 20 minutes with only a beginner's level of experience in Erlang. This certainly says a lot for the possibilities of the language to make parallel programming very easy.

A Scalable Distributed Data Structure for P2P

Google Tech Talk

Tuesday, September 11, 2007

Robust Composition: Unified Access and Concurrency Control

Mark S. Miller's PhD dissertation, "Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control"...

When separately written programs are composed so that they may cooperate, they may instead destructively interfere in unanticipated ways. These hazards limit the scale and functionality of the software systems we can successfully compose. This dissertation presents a framework for enabling those interactions between components needed for the cooperation we intend, while minimizing the hazards of destructive interference.

Great progress on the composition problem has been made within the object paradigm, chiefly in the context of sequential, single-machine programming among benign components. We show how to extend this success to support robust composition of concurrent and potentially malicious components distributed over potentially malicious machines. We present E, a distributed, persistent, secure programming language, and CapDesk, a virus-safe desktop built in E, as embodiments of the techniques we explain.

Performance of Selective Receive

Pascal Brisset explains (on erlang-questions some time ago) a scenario where Erlang's selective receive can fall behind...

The system is dimensioned so that the CPU load is low (say 10 %). Now at some point in time, the backend service takes one second longer than usual to process one particular request. You'd expect that some requests will be delayed (by no more than one second) and that quality of service will return to normal within two seconds, since there is so much spare capacity.

Instead, the following can happen: During the one second outage, requests accumulate in the message queue of the server process. Subsequent gen_server calls take more CPU time than usual because they have to scan the whole message queue to extract replies. As a result, more messages accumulate, and so on.

snowball.erl (attached) simulates all this. It slowly increases the CPU load to 10 %. Then it pauses the backend for one second, and you can see the load rise to 100 % and remain there, although the throughput has fallen dramatically.

Here are several ways to avoid this scenario...

...

Add a proxy process dedicated to buffering requests from clients and making sure the message queue of the server remains small. This was suggested to me at the erlounge. It is probably the best solution, but it complicates process naming and supervision. And programmers just shouldn't have to wonder whether each server needs a proxy or not.

I'm not sure how it really complicates naming and supervision so much. I think it is the best solution. The problem is not in selective receive, per se, which has a lot of benefits that outweigh this specific scenario. Especially wrong would be to gum up the Erlang language and simple message passing mechanisms just for this.

The problem in this scenario is *coupling* too closely the asynchronous selective receive with the backend synchronous service. This is not an uncommon scenario in all kinds of "service-oriented architectures" and the solution, generally, should be the one quoted above.

A programmer should legitimately wonder whether some kind of a "proxy" is needed when they see this kind of a combination.
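
To make the proxy idea concrete, here's a minimal sketch of mine (not code from the thread): the proxy absorbs the backlog in its own mailbox and hands the real server one request at a time, so the server's selective receive only ever scans a short queue.

start_proxy(Server) ->
    spawn(fun() -> proxy_loop(Server) end).

call(Proxy, Request) ->
    Proxy ! {request, self(), Request},
    receive
        {reply, Reply} -> Reply
    end.

proxy_loop(Server) ->
    receive
        {request, From, Request} ->
            %% The synchronous call means only one request is in flight
            %% to the server at a time; everything else waits here.
            Reply = gen_server:call(Server, Request),
            From ! {reply, Reply},
            proxy_loop(Server)
    end.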

This is related to the blogs going round not so long ago among fuzzy, Bill de hÓra, Dan Creswell, and others.

Receptionists for Shared Resources

Carl Hewitt, et al.'s Linguistic Support of Receptionists for Shared Resources (MIT AI Memo 781 -- pdf).

exmpp

Back in December, Mickaël Rémond wrote on the erlang-questions list...

Jabberlang for now rely on ejabberd. You need ejabberd to use it. The shared object you are mentionning comes from ejabberd. That said Jabberlang is a client library that rely on a server implementation which seems strange. That's why we are currently doing some code refactoring. Our target is to have:
  • exmpp: A common XMPP library that is used by ejabberd and Jabberlang
  • Jabberlang: Relying on exmpp.
  • ejabberd: Relying on exmpp.
We are in the middle of this refactoring currently.
Recently, elsewhere, he updated:
Jabberlang will be replaced soon by a much better and easier to use library called exmpp. Stay tuned. It will be published here: http://www.process-one.net/

The Killer App for Multi-Node

From Many Core Era, the question: what's the killer app for many-core?

The true "many core" era will have arrived when we realize this is the wrong question.

By the time we get to a significant number of cores we shouldn't care much about cores any longer.

Monday, September 10, 2007

So You Flip The Common Torch Around

This google tech talk by Van Jacobson on networking is great.

Fun game: the video has closed captions in English. Watch for the phrase around 28 min. into it, where the caption reads "so you flip the common torch around", and catch what Jacobson actually says. (Answer below.)

Was the caption written by a human or a machine? Was it proofed and edited?

.

.

.

.

.

.

.

.

.

.

.

.

.

.

Answer: "so you flip the combinatorics around"

On The Web

Douglas Crockford at the Rich Web Experience conference...

AJAX applications are highly interactive, highly social, easy to use, and offer great network efficiency, according to Crockford. "The big problem is that it is too damn hard to write these apps," he said.

"The most interesting innovation in software development in 20 years has got to be the mashup," which shows the benefits of distributed programming. "Unfortunately, mashups are insecure [in the browser]," said Crockford, with components unable to be protected from each other.

The model in the browser is fully broken and needs to be fixed, he said. "The Web is an exploit waiting to happen," Crockford said.

Crockford then went through a critique of various Web technologies.

"[JavaScript is a] deeply flawed language," with an unpopular programming model. "But to its credit, it's working really, really well in an environment where Java failed," said Crockford.

The planned JavaScript 2 upgrade also has problems. "It will make the language considerably more complicated," Crockford said.

Good so far, but then...
If the Web is unable to repair itself, it could be replaced with a proprietary system such as Microsoft's Silverlight or Adobe's AIR (Adobe Integrated Runtime), Crockford said. Proprietary systems do present advantages, such as having only one source of new bugs and presenting a simpler upgrade story. But people like open systems and are suspicious of proprietary systems, he said.
Em, AIR is like any other programming language and framework -- it can be used "on the web" or it can be used to route around the web. We need better ways than the browser to be "on the web" but we certainly don't need to route around the web per se.

The modern-day web browser is not the only client that can be "on the web". Look at blogging tools like Bottomfeeder. BF is a great example of an end-user application that is (1) not based in a popular web browser like IE or FF, and yet (2) is "on the web".

On the Road...

We were all delighted, we all realized we were leaving confusion and nonsense behind and performing our one noble function of the time, move.

List Comprehensions

Merlyn Albery-Speyer rewrote my morse code example using Erlang's list comprehensions. It's posted on the pdx.erl site. Here I have reproduced it, but using the same variable names as my earlier example to make the comparison easier on the eye.

It's doing essentially the same thing in fewer lines of code because the list comprehension syntax is so concise. Not that the previous example was so long in the first place, but coming from an older Lisp world where we wrote out our recursions in "long form" ;-/, I need to remember to keep list comprehensions in my toolkit.

-module(lcmorse).
-export([codes/0, decode/1]).

-import(lists, [prefix/2, reverse/1]).
-import(string, [len/1, substr/2]).


codes() -> 
  [{$A, ".-"},   {$B, "-..."}, {$C, "-.-."}, {$D, "-.."},  {$E, "."}, 
   {$F, "..-."}, {$G, "--."},  {$H, "...."}, {$I, ".."},   {$J, ".---"}, 
   {$K, "-.-"},  {$L, ".-.."}, {$M, "--"},   {$N, "-."},   {$O, "---"}, 
   {$P, ".--."}, {$Q, "--.-"}, {$R, ".-."},  {$S, "..."},  {$T, "-"}, 
   {$U, "..-"},  {$V, "...-"}, {$W, ".--"},  {$X, "-..-"}, {$Y, "-.--"}, 
   {$Z, "--.."}].

decode("") -> [""];
decode(String) ->
  [[Char | Rest] ||
    {Char, Code} <- codes(),
    prefix(Code, String),
    Rest <- decode(substr(String, 1 + len(Code)))].

Saturday, September 08, 2007

Things from Phil Windley

Things from Phil Windley... Longtails and Software... User Centric Identity.

Morse Codes in Erlang

The Portland Erlang group is considering some of the Ruby quizzes as a way to dive into Erlang. If you're not used to programming recursively, you'll probably want to start with simpler problems like implementing the member function, factorial, or count, to determine the number of occurrences of some element in a list. When those are comfortable, then the Ruby quizzes might be a good challenge.

Here's a sequential, recursive solution in erlang to ruby quiz #121: morse code. It's not *tail* recursive so it could maybe blow out some stack given a really long morse code word. (I assume morse code is translated one word at a time, with spaces between words.)

A tail recursive implementation... hmm... would be a bit more complicated. The two "stacks" are the dots and dashes yet to be decoded and the codes to be considered with each undecoded substring. So one way would be to manage the stacks explicitly in one recursive "loop". Another way would be to use continuation passing style.
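Just to sketch the first idea (explicit stacks in one tail-recursive loop), something roughly like this ought to work, though I have not run it; the module name is mine, and it borrows codes/0 from morse.erl below:

-module(wlmorse).
-export([decode/1]).

-import(lists, [prefix/2, reverse/1]).
-import(string, [len/1, substr/2]).

%% The worklist holds {RemainingSignal, PartialDecoding} pairs, so the
%% only recursion left is the tail call of decode/2 on the worklist.
decode("") -> [];
decode(String) -> decode([{String, ""}], []).

decode([], Results) ->
  Results;
decode([{"", PartialDecoding} | Work], Results) ->
  decode(Work, [reverse(PartialDecoding) | Results]);
decode([{String, PartialDecoding} | Work], Results) ->
  More = [{substr(String, 1 + len(Code)), [Char | PartialDecoding]}
            || {Char, Code} <- morse:codes(), prefix(Code, String)],
  decode(More ++ Work, Results).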

Here's (I think) a fairly straight-forward recursive style, morse.erl...

-module(morse).
-export([codes/0, decode/1]).

-import(lists, [prefix/2, reverse/1]).
-import(string, [len/1, substr/2]).

%% http://www.rubyquiz.com/quiz121.html
%%
%% decode/1 when given a string of dots and dashes returns a list of
%% all possible decodings using morse code.
%%
%% Examples:
%%
%% morse:decode(".-").
%% ["ET","A"]
%%
%% lists:member("SOFIA", morse:decode("...---..-....-")).
%% true
%%
%% lists:member("SOPHIA", morse:decode("...---..-....-")).
%% false
%%
%% lists:member("EUGENIA", morse:decode("...---..-....-")).
%% true
%%
%% length(morse:decode("...---..-....-")).
%% 5104

codes() -> 
  [{$A, ".-"},   {$B, "-..."}, {$C, "-.-."}, {$D, "-.."},  {$E, "."}, 
   {$F, "..-."}, {$G, "--."},  {$H, "...."}, {$I, ".."},   {$J, ".---"}, 
   {$K, "-.-"},  {$L, ".-.."}, {$M, "--"},   {$N, "-."},   {$O, "---"}, 
   {$P, ".--."}, {$Q, "--.-"}, {$R, ".-."},  {$S, "..."},  {$T, "-"}, 
   {$U, "..-"},  {$V, "...-"}, {$W, ".--"},  {$X, "-..-"}, {$Y, "-.--"}, 
   {$Z, "--.."}].

decode("") ->
  [];
decode(String) when is_list(String) ->
  decode(String, "", [], codes()).

decode("", "", Results, _Codes) ->
  Results;
decode("", PartialDecoding, Results, _Codes) ->
  [reverse(PartialDecoding) | Results];
decode(_String, _PartialDecoding, Results, []) ->
  Results;
decode(String, PartialDecoding, Results, [{Char, Code} | Rest]) ->
  MoreResults =
    case prefix(Code, String) of
      true ->
        decode(substr(String, 1 + len(Code)), [Char | PartialDecoding], Results, codes());
      false ->
        Results
    end,
  decode(String, PartialDecoding, MoreResults, Rest).

Friday, September 07, 2007

Secure Enterprise-grade Persistent Group Chat

Also from Todd Bishop... Microsoft buys MindAlign...

MindAlign is a secure enterprise-grade persistent group chat solution.
Oh, dear. Is there any reason a large organization would not want to base its future messaging capabilities on XMPP?

But I had similar thoughts about email, and yet there's Exchange. One difference, though, may be that with email and Exchange, Microsoft got in before email could become much of a platform for messaging applications. I would imagine over the last 10 years there's been little venture money for email-based platforms.

The same is probably not true for "instant messaging". What do you think? Am I just being ignorant in my recollection and/or prognostications?

Just a Router

Via Todd Bishop on Microsoft's Seattle-area buses running Linux-based wi-fi...

"Obviously this could be a sensitive issue for an operating system company like Microsoft," Polson said of the Linux issue, "but it's just a router, and that happens to be the operating system."
Just a router. The problem for an operating system company like Microsoft is that the operating system is disappearing as quickly as it can. No one cares, nor should they.

Before long no one will care about Word, SQL Server, or SharePoint. Excel may last a bit longer, but most people will realize they don't need 3/4 of what's in it, and there are better ways of getting at the 1/4 they do care about.

Microsoft has shown few signs of being a successful "design shop" in an age of increasingly boutique long tails running on increasingly simple, open, common foundations that have little to do with any specific piece of hardware sitting right in front of you.

I know, I know. They've got a lot of money, past success, patents, and Silverspoon. There's hope for them yet.

And they are a sponsor of the Open Router Platform project.

Parallel Jars of Formaldehyde

Phil Windley observing Parallels in action...

One of the cool features of Parallels is something they call “Smart Select.” With Smart Select you can specify which file types are handled by which application and in which OS. So for example, you can specify that Word docs are always opened in Office 2007 in Windows, regardless of which OS you click on the document. Or that clicking on a URL, regardless of which OS you’re using always opens the page in Safari on the Mac.
Little jars of formaldehyde to preserve the remains of yesterday's mammoth operating systems.

(Re)New Web

"Air will set the World on Fire."

The current browser is a terrible platform for the web. Whatever you think about Adobe's Air, per se, it will ultimately change "the browser" for the betterment of the web generally.

Wednesday, September 05, 2007

Naming and Synchronization in a Decentralized Computer System

This is David Reed's dissertation from 1978. Croquet, developed in Squeak Smalltalk, is based on the mechanisms described there.

It's Simply Different

Sam Ruby gushes...

With many frameworks and languages, I get the feeling that I’m dealing with a metal cabinet covered by layers of marine paint; one where scratches tend to reveal sharp edges. With Erlang, I get the feeling of a Victorian mahogany armoire; one where scratches in the wood simply reveal more rich wood.
Reading erlang can be fun. Here's Sam's atom2json.erl.

When combining this kind of concurrency with an imperative language, there's typically a bit of code explosion implementing loops and iterators. The tendency also is to write long stretches of code that update variables and structures in place, even when concurrency and first class functions exist.

Erlang, being a Lisp-like language (really), instead traverses lists recursively.

Erlang, being influenced by Prolog (somewhat), combines recursion with (a simpler form of) pattern matching.

List recursion, pattern matching, and simple concurrency mechanisms combine into a code implosion. Not weird at all to this Lisp programmer.
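A throwaway example of my own (nothing to do with Sam's atom2json.erl) of how those three pieces read together:

-module(implode).
-export([double_all/1, doubler/0, ask/2]).

%% plain list recursion with pattern matching: no counters, no mutation
double_all([]) -> [];
double_all([N | Rest]) -> [2 * N | double_all(Rest)].

%% the same shape as a process: a receive loop matching on message patterns
doubler() ->
  receive
    {From, Numbers} when is_list(Numbers) ->
      From ! {self(), double_all(Numbers)},
      doubler();
    stop ->
      ok
  end.

%% e.g. Pid = spawn(implode, doubler, []), then implode:ask(Pid, [1,2,3]) -> [2,4,6]
ask(Pid, Numbers) ->
  Pid ! {self(), Numbers},
  receive
    {Pid, Doubled} -> Doubled
  end.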

It's Already On

To Bob Warfield's point that the multicore crisis is already upon us: yes, even on the desktop.

Here's an exercise -- think of your favorite or your most frustrating desktop applications. Maybe a browser, a mail reader, presentation or drawing apps. No matter which specific application comes to mind, it almost certainly consists of long stretches of sequential code. And a good bit of that code could almost certainly be concurrent, not in the sense of a parallel algorithm, but in the sense that there is no logical reason for C to follow B to follow A other than that the languages we use reinforce that style.
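The fix is almost embarrassingly small to express. This is only a sketch of mine, and the step names in the comment are made up, but when A, B, and C really are independent you can launch them and gather the results rather than lining them up:

-module(unsequence).
-export([run_all/1]).

%% e.g. unsequence:run_all([fun check_mail/0, fun load_prefs/0, fun render_sidebar/0])
%% (those three names are hypothetical) -- each step runs in its own process,
%% and the caller then collects one result from each.
run_all(Funs) ->
  Parent = self(),
  Pids = [spawn(fun() -> Parent ! {self(), F()} end) || F <- Funs],
  [receive {Pid, Result} -> Result end || Pid <- Pids].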

Any time you see an hourglass, you see an opportunity for concurrency.

I've been trying out Yahoo's beta email application. It's fine, but it really cries out to be developed in a truly concurrent system. Moving more applications into an environment even worse than the modern desktop, and nowhere near securely concurrent, is just a shame. Modern browsers are horrible platforms. We need a new browser model that is concurrent and secure. We need a new desktop that is concurrent and connected.

Both the desktop and the browser are poorly suited for upcoming many-core laptops and desktops. Not to mention five or ten years from now, when we may well be deluged with so much more cheap iron looking to do something useful for us other than run bloated sequential code.

Programming shared memory threads in C# with transactional memory will not get us any closer to where we need to be. We need to think much differently with languages that support that thinking. It's only too bad we're still in the 1970s.

Tony Hoare developed monitors in the early 1970s then replaced them with concurrent message passing in the late 1970s. Then Java brought us monitors again in the mid-1990s.

Over a decade later we're still twiddling bits in critical sections stuck in a stretch of sequential code. It's ludicrous to think that's the natural human problem solving model. The tools have shaped our thinking. And I am rambling.

My Agile Failed - pause - NOT

OK, here's the deal: "agile" *cannot* fail!

Preposterous?

Here's why "agile" cannot fail: it is a set of tools that can be adapted to your needs. You may do better or worse with them. In that sense you may fail to benefit from them, or you may simply prefer not to use them. Or you may benefit from them. In either case it is you suffering or benefiting, not "agile".

If you think "agile" can fail, there is a different problem to talk about.

But "agile" cannot fail in the same way "hammer" cannot fail. (Thanks, Ed, for the analogy.)

Delaying The Inevitable

Dan Creswell on a few of the ways we developers paint ourselves into a corner...

Every time we assume we can keep all our data in a single memory or database (even if it’s a cluster) we’re embedding assumptions into our software that will be broken come the day we must partition across multiple memories or databases.

Each time we choose an algorithm that doesn’t easily partition or assumes a single memory/database we’re storing up trouble in our data and computational models.

In big monolithic systems it’s possible to create (by force) a never-fails environment which allows developers to ignore various edge cases.

Tuesday, September 04, 2007

PDX.erl

Merlyn Albery-Speyer has started a yahoo group for portlanders (and beyond, I presume) interested in erlang.

pdx.erl

The Race Is On, Or Is That Off?

Or: "I just dropped in to see what condition my condition was in."

Tim Bray ponders more cores, hardware (and software -- cannot forget the software) transactional memory, as well as erlang, or some sort of erlang transmogrification into java, or something less weird.

Ralph Johnson addressed that idea, probably accurately.

These are fascinatingly intertwined topics. Dan Creswell sent a link to me the other day: this Sun research paper on Hybrid Transactional Memory (pdf). I hope he blogs about it. He's got a good handle on the situation.

Unlike apparently many smart people at Sun, Microsoft, Intel, and elsewhere, I'm still unconvinced that transactional memory makes the overall problem any easier or better.

I do know that focusing on transactional memory is a short-term solution at best. Erlang addresses the true problem: a simple programming model that can scale out beyond SMP and yet scale down to a single core just as well.

Tim suggests transactional memory will remain well out of sight for application programmers. But those programmers need better tools, no matter how HTM affects them in the small (and eight, even sixteen, cores should be considered small over the next decade). The results of system programmers using transactional memory in low-level SMP code are a drop in the bucket compared to today's and tomorrow's application development problems. Those problems have little to do with a single piece of silicon and everything to do with networks and complex concurrent collaboration among disparate subsystems.

Not so many years from now we will be awash in more, and cheaper, hardware than our current application development languages and tools can reasonably accommodate. We should have a simple model for addressing all that hardware with less developer effort. We need simple languages and tools for concurrency and *distribution* so that we can waste cheap hardware in exchange for leveraging valuable developer ergs.

Today we are wasting hardware running garbage collectors in order to save developer ergs. Increasingly we need to be wasting hardware running large numbers of small processes.

Transactional memory is not even close to the support we need. I am not sure why so many people find it shiny. Maybe I'll be surprised.

Update: Some URLs from the comments, made easier to follow here:

Brit's got a conversation going with Tim Sweeney on transactional memory vs. message passing.

Tim's example for TM is coordinating a lot of objects quickly in a shared memory for some game scenarios. Fair enough - I am unable to compare the complexity of transactional memory vs. traditional "critical section" mechanisms for this. Off the top of my head I would agree that a shared-nothing message passing mechanism does not really address this problem, but I would imagine it is still useful for other aspects of that kind of game system. My bigger point is this: there are relatively few people with that kind of problem. Most of us have the kinds of problems that are far better addressed by shared-nothing message passing.

So what frightens me as much as the transactional memory hardware and/or software itself is the *attention* it is receiving as any sort of general solution to developing software. Is the cost worth the benefit?

Monday, September 03, 2007

Phil Windley on the Microwulf

Phil Windley on driving down the cost of Beowulf clusters...

The world's cheapest supercomputer, built by a Calvin College CS professor Joel Adams and student Tim Brom is very interesting. They built an 8 core Beowulf cluster using four motherboards and a gigabit network switch for less than $2500. The resulting machine has a price/performance ratio of $100/Gigaflop. That's just plain fun.

I think there ought to be a yearly competition of this sort for students. Who can build the fastest supercomputer for $2500?

Rob Pike on Concurrency and Message passing in Newsqueak

Rob Pike and Luca Cardelli created a simple, concurrent language called Squeak (not to be confused with Squeak Smalltalk) to simplify the programming of user interfaces. Pike then turned that into Newsqueak, a more complete programming language. (See "Squinting at Power Series" (pdf).)

Newsqueak has processes like Erlang. But instead of an implicit inbox and outbox per process, Newsqueak has channels as first-class types. All channels are half-duplex, synchronously communicating, well, "channels", as in C.A.R. Hoare's Communicating Sequential Processes. Newsqueak also has only immutable data values, although it does have true variables rather than single-assignment. (I'm not sure yet how that interacts with concurrent processes running closures over variables lexically known to each process.)

And so Newsqueak can be used to do Erlang-like programming and vice versa, using processes/channels/functions or processes/pids/functions to implement something like the other's mechanisms.

Newsqueak does not, though, have the distribution mechanisms or the failure mechanisms of Erlang. And rather than pattern matching over all the messages in a process inbox, a Newsqueak process might use something like one channel per pattern, or encode all the possible patterns into a structured type passed over a single channel.
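For the Erlang side of that equivalence, a rough sketch (mine, untested, and surely not how Newsqueak does it internally) of a synchronous, CSP-ish channel built out of an ordinary process might look like this:

-module(chan).
-export([new/0, send/2, recv/1]).

new() ->
  spawn(fun channel/0).

%% rendezvous: hold a value from a sender until some receiver asks for it,
%% then acknowledge the sender so its send/2 unblocks
channel() ->
  receive
    {send, Sender, Value} ->
      receive
        {recv, Receiver} ->
          Receiver ! {self(), Value},
          Sender ! {self(), ok}
      end
  end,
  channel().

send(Chan, Value) ->
  Chan ! {send, self(), Value},
  receive {Chan, ok} -> ok end.

recv(Chan) ->
  Chan ! {recv, self()},
  receive {Chan, Value} -> Value end.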

Russ Cox has an overview of CSP in the context of Bell Labs where Newsqueak was developed and then led to other interesting things.

Pike is now at Google and not too long ago recorded an interesting and well-done Tech Talk on Newsqueak and concurrent, message passing programming generally...

  • Define components as interfaces with all data flow and sharing done as communication over channels.
  • The interface is a type; implementations of that interface just honor the protocol.
  • Composition is linear in complexity of design but superlinear in expressibility. (The opposite of composition of state machines.) Interleaving is free. Compose interfaces, not state machines.
  • Parallelism is not the point, but falls out for free. Networking and remote execution are not the point, but also can fall out (although not quite for free).

...Concurrent processes with message passing can be a powerful model for programming, whether the problem is intrinsically parallel (server design) or not (power series)...

The expressiveness - notation - is important.

Update: Ehud linked here from Lambda the Ultimate, and that thread has several comments of its own.
