Making it stick.: 5/18/08

Saturday, May 24, 2008

Erlang and the Facebook Chat Architecture

Seen on InfoQ re: the Facebook Chat Architecture...

Another challenge was delivering messages in real time. Facebook chose a technique whereby the client pulls updates from the server, similar to Comet's XHR Long Polling Process...
Facebook chose a combination of C++ and Erlang to implement clustered and partitioned subsystems. The C++ module is used to log chat messages, while Erlang "holds online users' conversations in-memory and serves the long-polled HTTP requests". epoll, a new system call introduced in Linux 2.6, was used to drive the Erlang module. Eugene states why the decision was made to go with Erlang...
because the problem domain fits Erlang like a glove. Erlang is a functional concurrency-oriented language with extremely low-weight user-space "processes", share-nothing message-passing semantics, built-in distribution, and a "crash and recover" philosophy proven by two decades of deployment on large soft-realtime production systems...
The secret for going from zero to seventy million users overnight is to avoid doing it all in one fell swoop. We chose to simulate the impact of many real users hitting many machines by means of a "dark launch" period in which Facebook pages would make connections to the chat servers, query for presence information and simulate message sends without a single UI element drawn on the page.
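
For anyone who hasn't seen long polling before, here's a rough sketch of the idea in Python. It is not Facebook's code and not Erlang: the `Mailbox` class and its method names are made up for illustration, and a real server would hold an HTTP request open where this sketch simply blocks on an in-memory queue.

```python
import queue
import threading


class Mailbox:
    """Hypothetical per-user holder of pending chat messages (in memory only)."""

    def __init__(self):
        self._pending = queue.Queue()

    def deliver(self, message):
        """Called when someone sends this user a message."""
        self._pending.put(message)

    def long_poll(self, timeout):
        """Wait up to `timeout` seconds for a message, then return what is pending.

        An empty list means the poll timed out and the client should ask again.
        """
        try:
            first = self._pending.get(timeout=timeout)
        except queue.Empty:
            return []
        messages = [first]
        # Drain anything else that arrived, without blocking again.
        while True:
            try:
                messages.append(self._pending.get_nowait())
            except queue.Empty:
                return messages


if __name__ == "__main__":
    box = Mailbox()
    # Simulate another user sending a message shortly after the poll starts.
    threading.Timer(0.1, box.deliver, args=("hello",)).start()
    print(box.long_poll(timeout=2.0))
```

The point is that the client's request waits on the server until there is something to say, so a message shows up almost immediately without the client hammering the server with short polls.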

I am kind of amazed to see the attention Erlang has been given over the last two or three years. OSCON in July here in Portland will have at least three Erlang-related sessions: a couple of tutorials and a presentation.

Mike Herrick on CSI and on InfoQ

InfoQ interviews Mike Herrick re: using Rails and JRuby for the CSI. Mike also blog'd about his project recently...

When deployed, UT-NEDSS project will directly contribute to the prevention of sickness and death by effectively collecting, identifying, tracking and trending information gathered about infectious diseases and bioterrorism attacks. Additionally, this unique partnership is providing lessons for public health informatics

Steve Vinoski on RPC and Messaging

There's been a stream of messages on the erlang list about RPC. Steve Vinoski's put an explanation of RPC and his position on his blog. Now he's followed up on the email list with a position on integration generally, and pub/sub messaging in particular...

One of the most effective forms of enterprise integration I've seen over the years is publish/subscribe messaging. I worked many years for CORBA vendors, and we'd often lose potential deals to messaging systems. Message queuing systems work well because (in no particular order):

they don't pretend to be programming language procedure or method calls, so they avoid the associated impedance mismatch problems

they don't try to hide distributed systems issues

coupling is low -- drop a message into a queue here, pick up a message from a queue there

queues can be persistent, or more generally, delivery guarantees can be varied as needed

asynchrony

payloads need not conform to some made-up IDL type system

getting two different messaging systems to interoperate is easier than getting two different RPC or distributed object systems to interoperate

The problem with messaging systems, though, is that traditionally they've been quite expensive. Thankfully, I believe AMQP (http://amqp.org/) solves that issue nicely, and of course Erlang is the perfect way to implement it, which Alexis, Tony, and the rest of the RabbitMQ guys have already done (http://www.rabbitmq.com/).

I agree with Steve's take. The best pub/sub mechanism I have experience with has been Tibco's rv. All the Tibco layers above it have been enterprisey bits to make a sale. The basic rv bus is super-easy to just use from C, Python, Java, anything. Horribly expensive back in the day, and I assume it still is.
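
To make the low-coupling point in Steve's list concrete, here's a minimal in-process sketch in Python. It is neither AMQP nor rv, and there is no network or persistence involved: the `Broker` class and the topic names are hypothetical, just enough to show that a publisher only knows a topic name and never the subscribers.

```python
import queue


class Broker:
    """Hypothetical in-process publish/subscribe hub: topic -> subscriber queues."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic):
        """Register interest in a topic and get back a private inbox queue."""
        inbox = queue.Queue()
        self._subscribers.setdefault(topic, []).append(inbox)
        return inbox

    def publish(self, topic, payload):
        """Drop the payload into every subscriber's inbox; return how many got it."""
        inboxes = self._subscribers.get(topic, [])
        for inbox in inboxes:
            inbox.put(payload)
        return len(inboxes)


if __name__ == "__main__":
    broker = Broker()
    billing = broker.subscribe("orders")
    shipping = broker.subscribe("orders")
    # The publisher does not know (or care) who is listening.
    broker.publish("orders", {"order_id": 1})
    print(billing.get_nowait(), shipping.get_nowait())
```

Nothing here pretends to be a procedure call: the publisher drops a message and moves on, and each subscriber picks it up whenever it gets around to it.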

AMQP should have helped address this. But I've not seen much on the internets about AMQP over the last many months. I wonder if this is going to go anywhere? Or am I just missing the traffic?

I'm not sure XMPP pub/sub in the enterprise is going anywhere too quickly either, but that's another candidate for getting beyond vendor lock-in for enterprise messaging.

Why on earth Cisco is defining a new RPC protocol is beyond me.

And to any AMQP folks: the redirect from http://amqp.org/ seems broken.

Enterprises should be moving toward HTTP first and foremost, figuring out what's happening with XMPP for various other kinds of messaging, and/or AMQP in situations where HTTP and XMPP don't seem to cut it.

Friday, May 23, 2008

Matthias Felleisen speaking at Portland State University on June 2

Just rec'd the following announcement on the pdxfunc email list...

Matthias Felleisen is speaking at PSU a week from Monday (June 2). The
talk is open to the public.

"The First Year in Computer Science"
Matthias Felleisen
Northeastern University
2nd June, 16:40–18:00
Location: Hoffman Hall Rm 109, Portland State University
Abstract

My team and I have developed an alternative approach to the first-year courses on programming and computing. Unlike conventional approaches, ours focuses on designing programs in a systematic manner. The syntax of the chosen programming languages (both Java and Scheme) is only discussed as needed to support design principles.

Field tests with over 500 high schools and a dozen colleges have shown that the approach produces better students than conventional approaches that use a single language.

In several controlled studies, we could also show that students find our curriculum more appealing than the AP curriculum.

In my talk, I will provide an overview of the project, especially its intellectual premises and principles. My goal is to encourage you to think about the first year in a different way. The old ones are of questionable value. If we want our beautiful discipline to survive, we must find a good way of teaching it.

See http://web.cecs.pdx.edu/~colloq/felleisen-firstYear.html for more information.

Sunday, May 18, 2008

More Ted

This is part two of my response to Ted Neward's latest misunderstanding of Erlang. Ted wrote to Steve Vinoski...

Sorry, Steve-O, but I think you're out in left field on this one. I'm happy to argue it further with you over beer, but if you want the last word, have at it, and we'll compare scores when we run into each other at the next conference.

Everything I've read of Steve's tells me he's well informed by actual experience. Ted's writing exhibits the most basic misunderstandings. I wonder why Ted didn't spend a little beer time with some experts before attempting to write with supposed authority.

Toward Finer Tuning of Definitions of Processes and Languages, Reliably

Here's an attempt to define terms more finely (finally?) after reading the latest (final?) response from Ted Neward to Steve Vinoski's attempts to help Ted clarify the differences between Erlang and other languages and run-times, esp. the Java/JVM-ish ones. At a superficial level of understanding, what Ted writes has some meaning. Just enough to be troublesome. So here's my pass at making clarifications.

The first terms to narrow in on are "process" and "thread" as they are used in various languages and operating systems. There are similarities for sure, but the subtle differences are the important aspects when making comparisons. Ted writes a few things using these terms...

Processes in *nix are just as vulnerable to bad coding practices as are processes in Windows or Mac OS X

Aside: Mac OS X _is_ Unix. But that's for a different debate on the merits of Windows per se.

Anyway. I don't expect anyone to argue with Ted on this point, so let's dig deeper into "bad coding practices" and processes. Erlang is a "mostly-functional" language and there is very little shared memory in its concurrency model. This removes many opportunities for bad coding. So can we say that programming an operating system process in Erlang is less vulnerable than programming an operating system process using most popular languages? I would say yes, generally.

the robustness and reliability of a system is entirely held hostage to the skill and care of the worst programmer on the system

The emphasis is Ted's, not mine. There are many fine points to be made about this claim. One is the term "entirely" and another is the term "system". But first is the observation that this claim is about development practices more than anything else. To argue about language and mechanism, shouldn't we assume that the teams are skilled at least in their own languages and mechanisms?

Let's say no and look deeper. You won't read much Erlang literature before you come across terms like robustness and reliability, along with explanations of the language constructs and framework mechanisms that support these traits.

On the other hand, one can program in Java and similar languages for years before realizing that robustness and reliability are even the developer's concern.

Back to "system" and "entirely" being at the mercy of the worst programmer. Perhaps the most important point made by the Erlang experts is to expect the worst and to design your system to recover from that. The Erlang culture does its best to help you isolate bad parts and the Erlang mechanisms make that isolation and recovery easier than with most other systems.
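
Here's a rough sketch of what "expect the worst and recover" looks like as code, written in Python rather than Erlang. It is only loosely in the spirit of an Erlang supervisor: the `supervise` function and its parameters are my own invention, and since Python threads share memory this gives none of the real isolation that Erlang processes do.

```python
import threading


def supervise(task, max_restarts=3):
    """Run `task(attempt)` in its own thread and restart it if it crashes.

    Returns (result, crashes) on success. Gives up with RuntimeError once the
    task has crashed more than `max_restarts` times.
    """
    crashes = []
    for attempt in range(max_restarts + 1):
        outcome = {}

        def run():
            # The worker is allowed to fail; the failure is recorded, not hidden.
            try:
                outcome["result"] = task(attempt)
            except Exception as exc:
                outcome["error"] = exc

        worker = threading.Thread(target=run)
        worker.start()
        worker.join()

        if "error" not in outcome:
            return outcome["result"], crashes
        crashes.append(repr(outcome["error"]))

    raise RuntimeError(f"gave up after {len(crashes)} crashes")


if __name__ == "__main__":
    def flaky(attempt):
        # Stand-in for the worst programmer's code: fails the first two times.
        if attempt < 2:
            raise ValueError("boom")
        return "done"

    print(supervise(flaky))
```

The worker contains no defensive code at all. The recovery policy lives in one place, outside the part that is allowed to be bad, which is the design habit the Erlang folks keep pointing at.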

Erlang's model is similar to the CLR's AppDomain construct, which has been around since .NET 1.0, and Java's proposed "Isolate" feature which has yet to be implemented.

I'm not going to presume to know anything more about AppDomains than Ted knows about Erlang, but this comparison seems only barely true. Here's some information I found about AppDomains...

The main purpose of an Application Domain is to isolate our applications from other applications. Application domains run on a single Win32 process... Objects in the same Application Domain communicate directly while objects that exist in different Application Domains interact with each other by transporting copies of objects to each other or by using proxies for message exchange (by reference).

Any somewhat-skilled programmer familiar in the least with Erlang should be able to tell you significant differences between the AppDomain described above and Erlang's process model. I'm bothered that Ted Neward has such a large stage for making superficial statements that these mechanisms are in any way meaningfully the same. This is a significant disservice to programmers who follow his writing and are trying to learn useful information. If Ted does not have (apparently) even a few days' worth of reasonable Erlang experience or even reading, why should he presume to make such significant and horribly misleading statements to his large audience?

Here are the significant differences between these two mechanisms (Erlang processes and dotnet AppDomains)...

AppDomains appear to be used to put distinct applications inside the same Windows OS process.
Erlang processes are used to decompose a single application into many small, independent, shared-nothing processes.
Objects in different AppDomains are considered separate and rarely-interacting.
Processes in Erlang are roughly the same "size" as an instance of a major role-playing object in a single application. They collaborate with other processes frequently and are highly interactive.
Objects in an AppDomain are "shared-everything": they are used by concurrent threads that share everything in the domain.
Erlang processes (the equivalent of objects in dotnet) in an application are "shared-nothing".
AppDomains pass copies of values or references to objects (as far as I can tell from this one article).
Erlang processes always pass copies of values. Erlang and its frameworks per se do not provide references, and so it's "shared-nothing" all the way down. The equivalent of "reference to an object" in Erlang is "identifier of a process".
AppDomains appear to be a large-grain, compile-time, declarative mechanism. Any Windows OS process would have relatively few AppDomains. I would suppose these number in the dozens per OS process at most.
Erlang processes are a fine-grain, run-time, dynamic mechanism. An Erlang node (essentially an OS process) would have many hundreds of processes or more.
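
The Erlang side of that list can be sketched in a few lines of Python. This is an approximation under loud assumptions: the `Actor` class is hypothetical, Python threads are far heavier than Erlang processes, and the copying that Erlang gives you for free has to be done by hand here with `copy.deepcopy`. The handle to the actor plays the role of a process identifier.

```python
import copy
import queue
import threading

_STOP = object()  # private sentinel that tells an actor to shut down


class Actor:
    """Hypothetical stand-in for an Erlang-style process: private state plus a mailbox."""

    def __init__(self, handler, state):
        self._handler = handler  # function (state, message) -> new state
        self._state = state      # touched only by the actor's own thread while running
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def send(self, message):
        # Copy on send, so sender and receiver never share a mutable object.
        self._mailbox.put(copy.deepcopy(message))

    def stop(self):
        """Ask the actor to finish its queued messages, then return its final state."""
        self._mailbox.put(_STOP)
        self._thread.join()
        return self._state

    def _loop(self):
        while True:
            message = self._mailbox.get()
            if message is _STOP:
                return
            self._state = self._handler(self._state, message)


if __name__ == "__main__":
    def remember(state, message):
        return state + [message]

    log = Actor(remember, [])
    note = {"text": "hi"}
    log.send(note)
    note["text"] = "changed after sending"  # the actor never sees this change
    print(log.stop())
```

Mutating `note` after the send has no effect on what the actor received, which is the whole point of passing copies of values instead of references to objects.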

Ted continues...

if the argument here is that Erlang's reliability comes from its lack of shared state between threads, hell, man, that's hardly a difficult architecture to cook up. Most transactional systems get there pretty easily, including EJB, though programmers then go through dozens of hoops to try and break it.

Ted does a pretty good job of hinting that EJB is a broken model. EJB isolation is more like an AppDomain. EJB separates one application from another, with fairly laborious mechanisms required to get them to communicate with each other.

Within an EJB application there is supposed to be no concurrency at all. And so again this is the opposite of Erlang. Erlang encourages high concurrency and ease of interaction. But these points are apparently lost on Ted. Usually one cannot read a single page of Erlang literature without having this made absolutely obvious.

The next post will get into Ted's arguments about language and common run-times.
