"I have a mind like a steel... uh... thingy." Patrick Logan's weblog.

Saturday, May 24, 2008

Erlang and the Facebook Chat Architecture

Seen on InfoQ re: the Facebook Chat Architecture...

Another challenge was delivering messages in real time. Facebook chose a technique whereby the client pulls updates from the server, similar to Comet's XHR Long Polling Process...
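For anyone who hasn't run into long polling, here's a rough sketch of the server side of the idea in Python. This is not Facebook's code; the `Mailbox` class and its method names are made up for illustration. The "request" blocks until a message shows up or a timeout passes, and an empty answer just means the client should poll again.

```python
import queue
import threading


class Mailbox:
    """Per-user in-memory queue of pending chat messages (hypothetical name)."""

    def __init__(self):
        self._pending = queue.Queue()

    def deliver(self, message):
        """Called by whoever sends a message to this user."""
        self._pending.put(message)

    def long_poll(self, timeout):
        """Block until a message arrives or the timeout expires.

        Returns a list of messages; an empty list means "nothing yet, ask again".
        """
        try:
            first = self._pending.get(timeout=timeout)
        except queue.Empty:
            return []
        messages = [first]
        # Drain anything else already waiting, without blocking a second time.
        while True:
            try:
                messages.append(self._pending.get_nowait())
            except queue.Empty:
                return messages


def demo():
    box = Mailbox()
    # A sender delivers a message shortly after the client starts polling.
    sender = threading.Timer(0.05, box.deliver, args=["hello"])
    sender.start()
    received = box.long_poll(timeout=2.0)
    sender.join()
    return received
```

In a real deployment each blocked poll would be an open HTTP request, which is where very cheap processes start to matter.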

Facebook chose a combination of C++ and Erlang to implement clustered and partitioned subsystems. The C++ module is used to log chat messages, while Erlang "holds online users' conversations in-memory and serves the long-polled HTTP requests". epoll, a new system call introduced in Linux 2.6, was used to drive the Erlang module. Eugene states why the decision was made to go with Erlang...

because the problem domain fits Erlang like a glove. Erlang is a functional concurrency-oriented language with extremely low-weight user-space "processes", share-nothing message-passing semantics, built-in distribution, and a "crash and recover" philosophy proven by two decades of deployment on large soft-realtime production systems...

The secret for going from zero to seventy million users overnight is to avoid doing it all in one fell swoop. We chose to simulate the impact of many real users hitting many machines by means of a "dark launch" period in which Facebook pages would make connections to the chat servers, query for presence information and simulate message sends without a single UI element drawn on the page.
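The "dark launch" idea can be sketched in a few lines of Python. Everything here is hypothetical (the `ChatBackend` stand-in, the `load_page` function, the flag name); the point is only that the backend calls happen on every page load while the UI is drawn only when the feature is switched on.

```python
class ChatBackend:
    """Stand-in for the chat servers; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def connect(self, user):
        self.calls.append(("connect", user))

    def query_presence(self, user):
        self.calls.append(("presence", user))
        return []  # no friends online in this toy backend

    def send_message(self, user, text):
        self.calls.append(("send", user, text))


def load_page(user, backend, chat_ui_enabled):
    """Exercise the chat backend on every page load; draw the UI only when enabled."""
    backend.connect(user)
    backend.query_presence(user)
    if not chat_ui_enabled:
        # Dark launch: generate realistic load with a simulated send, draw nothing.
        backend.send_message(user, "simulated message")
        return ""
    return "<div id='chat'>chat bar</div>"
```

Flipping `chat_ui_enabled` is then the actual launch, and by that point the servers have already seen the traffic.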

I am kind of amazed to see the attention Erlang has been given over the last two or three years. OSCON in July here in Portland will have at least three Erlang-related sessions: a couple of tutorials and a presentation.

Mike Herrick on CSI and on InfoQ

InfoQ interviews Mike Herrick re: using Rails and JRuby for the CSI. Mike also blog'd about his project recently...

When deployed, UT-NEDSS project will directly contribute to the prevention of sickness and death by effectively collecting, identifying, tracking and trending information gathered about infectious diseases and bioterrorism attacks. Additionally, this unique partnership is providing lessons for public health informatics

Steve Vinoski on RPC and Messaging

There's been a stream of messages on the erlang list about RPC. Steve Vinoski's put an explanation of RPC and his position on his blog. Now he's followed up on the email list with a position on integration generally, and pub/sub messaging in particular...

One of the most effective forms of enterprise integration I've seen over the years is publish/subscribe messaging. I worked many years for CORBA vendors, and we'd often lose potential deals to messaging systems. Message queuing systems work well because (in no particular order):
  • they don't pretend to be programming language procedure or method calls, so they avoid the associated impedance mismatch problems
  • they don't try to hide distributed systems issues
  • coupling is low -- drop a message into a queue here, pick up a message from a queue there
  • queues can be persistent, or more generally, delivery guarantees can be varied as needed
  • asynchrony
  • payloads need not conform to some made-up IDL type system
  • getting two different messaging systems to interoperate is easier than getting two different RPC or distributed object systems to interoperate

The problem with messaging systems, though, is that traditionally they've been quite expensive. Thankfully, I believe AMQP (http://amqp.org/) solves that issue nicely, and of course Erlang is the perfect way to implement it, which Alexis, Tony, and the rest of the RabbitMQ guys have already done (http://www.rabbitmq.com/).

I agree with Steve's take. The best pub/sub mechanism I have experience with has been Tibco's rv. All the Tibco layers above it have been enterprisey bits to make a sale. The basic rv bus is super-easy to just use from C, Python, Java, anything. Horribly expensive back in the day, and I assume it still is.
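To make the low-coupling point concrete, here's a toy in-process pub/sub bus in Python. It is not rv, AMQP, or any real product's API; the `Bus` class and its methods are invented for this sketch. Publishers drop a payload on a subject, subscribers pick it up from their own queue whenever they get around to it, and neither side knows about the other.

```python
import queue
from collections import defaultdict


class Bus:
    """Toy publish/subscribe bus (hypothetical, single process, no persistence)."""

    def __init__(self):
        # subject -> list of subscriber queues
        self._queues = defaultdict(list)

    def subscribe(self, subject):
        """Return a fresh queue that will receive everything published on subject."""
        q = queue.Queue()
        self._queues[subject].append(q)
        return q

    def publish(self, subject, payload):
        """Drop the payload into every subscriber's queue; return how many got it."""
        subscribers = self._queues.get(subject, [])
        for q in subscribers:
            q.put(payload)
        return len(subscribers)


def demo():
    bus = Bus()
    inbox = bus.subscribe("quotes")
    # The payload is a plain dict, not an instance of some IDL-defined type.
    bus.publish("quotes", {"symbol": "XYZ", "price": 1.5})
    return inbox.get_nowait()
```

A real bus adds the hard parts (network transport, delivery guarantees, persistence), but the programming model seen by the application stays about this small.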

AMQP should have helped address this. But I've not seen much on the internets about AMQP over the last many months. I wonder if this is going to go anywhere? Or am I just missing the traffic?

I'm not sure XMPP pub/sub in the enterprise is going anywhere too quickly either, but that's another candidate for getting beyond vendor lock-in for enterprise messaging.

Why on earth Cisco is defining a new RPC protocol is beyond me.

And to any AMQP folks: the redirect from http://amqp.org/ seems broken.

Enterprises should be moving toward HTTP first and foremost, figuring out what's happening with XMPP for various other kinds of messaging, and/or AMQP in situations where HTTP and XMPP don't seem to cut it.

Friday, May 23, 2008

Matthias Felleisen speaking at Portland State University on June 2

Just rec'd the following announcement on the pdxfunc email list...


Matthias Felleisen is speaking at PSU a week from Monday (June 2). The talk is open to the public.

"The First Year in Computer Science"
Matthias Felleisen
Northeastern University
2nd June, 16:40–18:00
Location: Hoffman Hall Rm 109, Portland State University
Abstract

My team and I have developed an alternative approach to the first-year courses on programming and computing. Unlike conventional approaches, ours focuses on designing programs in a systematic manner. The syntax of the chosen programming languages (both Java and Scheme) is only discussed as needed to support design principles.

Field tests with over 500 high schools and a dozen colleges have shown that the approach produces better students than conventional approaches that use a single language.

In several controlled studies, we could also show that students find our curriculum more appealing than the AP curriculum.

In my talk, I will provide an overview of the project, especially its intellectual premises and principles. My goal is to encourage you to think about the first year in a different way. The old ones are of questionable value. If we want our beautiful discipline to survive, we must find a good way of teaching it.

http://web.cecs.pdx.edu/~colloq/felleisen-firstYear.html for more information.

Sunday, May 18, 2008

More Ted

This is part two of my response to Ted Neward's latest misunderstanding of Erlang. Ted wrote to Steve Vinoski...

Sorry, Steve-O, but I think you're out in left field on this one. I'm happy to argue it further with you over beer, but if you want the last word, have at it, and we'll compare scores when we run into each other at the next conference.

Everything I've read of Steve's tells me he's well informed by actual experience. Ted's writing exhibits the most basic misunderstandings. I wonder why Ted didn't spend a little beer time with some experts before attempting to write with supposed authority.

More from Ted...

any time we incorporate something directly as part of the language, there's all kinds of versioning and revision issues that come with it. This, to my mind, is one of Scala's (and F#'s and Lisp's and Clojure's and Scheme's and other composite languages') greatest strengths, the ability to create constructs that look like they're part of the language, but in fact come from libraries.

Most of Erlang's power comes from libraries, but Ted wouldn't know that. Erlang the language is very small and expressive, but Ted wouldn't know that. Erlang's not really suffered from revision issues over the nine years I've been following the language, but Ted wouldn't know that. And Ted apparently didn't bother to ask anyone or read anything that might inform him.

I agree with Ted's observation that extensible language syntax is desirable. Erlang's extensibility is cumbersome and not-too-beautiful. That's not the goal of Erlang though.

So you have to decide whether the benefits of Erlang outweigh issues like this. By the way, Gambit Scheme has Erlang-like concurrency performance, although shared-everything. But since Scheme can be used to define new languages such as Termite, it can be used to create syntactically-extensible Erlang-ish systems.

If you decide to go with Erlang, base the decision on what Erlang is now, and don't expect Erlang to change much. It's been largely stable as a language for years, as opposed to the arms race in Java and Java-like worlds. Language extensibility should not be as important to you as the simple Erlang programming model. This is a reasonable approach to take. Erlang is a known quantity, well proven. Not every language has to be as extensible as Lisp; in fact most are not.

Beware of languages like Scala and Clojure that intend to be safer concurrently than Java: they still run on the JVM and integrate with Java code, so by definition their safety can only go so far.

I do criticize Erlang for having its own VM (though I think it's not a VM, it's an interpreter, which is a far cry from an actual VM)

Maybe we should criticize Java for having its own JVM? Why doesn't it use Smalltalk's or Erlang's which pre-date the JVM? (Trick question. The JVM, and Hot Spot in particular, is based on ideas developed in Smalltalk, etc. Essentially the JVM is a dynamic language VM plus layers of cruft.)

Not much digging would have told Ted that Erlang compiles to native code or to an intermediate compiled code. Ted's misleading claims are heading into the dangerously ignorant. Why should I believe anything Ted writes about any topic? When Ted writes "I think that..." I think that we should read it as, "I'm guessing that..."

The JVM and the CLR have (literally) thousands of man-months sunk into them to reach high levels in all those areas. Can Erlang claim the same?

Well, yeah. I don't know how many "man months" it stacks up to be, but Erlang has been around for decades in some very demanding situations. Did Ted bother to look into any of these VM questions?

Best part is, the IT department doesn't have to do anything different to their existing Java-based network topology to start taking advantage of this. Can you say the same for Erlang?

If the architecture is going to change from "big, honkin' J2ee behemoths" to highly concurrent, lightweight interaction, then the topology should change. This is true whether moving from J2EE to Erlang or J2EE to JRuby/Rails or J2EE to Project Zero or J2EE to anything reasonably resembling modern best practices.

Moving to Erlang can take advantage of existing Java bits using the various network protocol standards or CORBA or SOAP or the JInterface class library which makes a Java application act like an Erlang node.

Frankly, whether the application you're monitoring hooks into the monitoring infrastructure is not really part of the argument, since Erlang doesn't offer that, either. I'm more concerned with whether the infrastructure is monitoring-friendly... if Erlang ties into SNMP out of the box with no work required by the programmer, please tell me where that's doc'ed and how it works!

Apparently Ted's world of monitoring begins and ends with Java and dotnet's JMX and WMI. How would you monitor both Java and dotnet? SNMP likely. Both JMX and WMI can be hooked into SNMP. SNMP is the internet standard, more mature, and extends beyond Java and dotnet.

I've never used it, but just doing a quick check shows that Erlang has a pretty extensive SNMP API, Reference Manual, and User Guide. Hmm. Did Ted even bother to look?

Yes, the JVM could easily adopt the multi-process model if it chose to. (Said work is being done via the Java Isolates JSR.) The CLR already does (via AppDomains).

I addressed Erlang vs. AppDomains in my previous post. These are clearly two different mechanisms with two different intents: AppDomains provide application isolation; Erlang processes provide finer-grained isolation of concurrent activities ("actor"-like) within an application.

Within an application neither the JVM nor the JVM-ish CLR provides Erlang-like processes. Scala and Clojure go a good bit of the way, but the advantage of Erlang is in its simplicity and combination of concurrency programming _and_ management. This goes well beyond a little bit of syntax around Java threads within a single JVM.

Barry Kelly has a really good blog post explaining more about these various VMs and Scala's Lift framework and implementation of Actors.

Toward Finer Tuning of Definitions of Processes and Languages, Reliably

Here's an attempt to define terms more finely (finally?) after reading the latest (final?) response from Ted Neward to Steve Vinoski's attempts at helping Ted to clarify differences between Erlang and other languages and run-times, esp. those based on Java/JVM-ish. At a superficial level of understanding, what Ted writes has some meaning, just enough to be troublesome. So here's my pass at making clarifications.

The first terms to narrow in on are "process" and "thread" as they are used in various languages and operating systems. There are similarities for sure, but the subtle differences are the important aspects when making comparisons. Ted writes a few things using these terms...

Processes in *nix are just as vulnerable to bad coding practices as are processes in Windows or Mac OS X

Aside: Mac OS X _is_ Unix. But that's for a different debate on the merits of Windows per se.

Anyway. I don't expect anyone to argue with Ted on this point, so let's dig deeper into "bad coding practices" and processes. Erlang is a "mostly-functional" language and there is very little shared memory in its concurrency model. This removes many opportunities for bad coding. So can we say that programming an operating system process in Erlang is less vulnerable than programming an operating system process using most popular languages? I would say yes, generally.

the robustness and reliability of a system is entirely held hostage to the skill and care of the worst programmer on the system

The emphasis is Ted's, not mine. There are many fine points to be made about this claim. One is the term "entirely" and another is the term "system". But first is the observation that this claim is about development practices more than anything else. To argue about language and mechanism shouldn't we assume that the teams are skilled at least in their own languages and mechanisms?

Let's say no and look deeper. You won't read much Erlang literature before you come across terms like robustness and reliability, and explanation of language constructs and framework mechanisms that support these traits.

On the other hand, one can program in Java and similar languages for years before realizing that attention to robustness and reliability is even the developer's concern.

Back to "system" and "entirely" being at the mercy of the worst programmer. Perhaps the most important point made by the Erlang experts is to expect the worst and to design your system to recover from that. The Erlang culture does its best to help you isolate bad parts and the Erlang mechanisms make that isolation and recovery easier than with most other systems.
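Here's a loose illustration of "expect the worst and recover", written in Python rather than Erlang. This is not OTP's supervisor, just an analogy; `supervise` and `make_flaky_worker` are hypothetical names. The supervising code does not try to prevent the crash. It lets the worker fail, restarts it, and gives up only after a restart limit.

```python
def supervise(worker, max_restarts=3):
    """Run worker(); if it raises, restart it, up to max_restarts times.

    Returns (result, restarts_used). Re-raises the last error once the
    restart budget is spent, so a persistent fault is still visible.
    """
    restarts = 0
    while True:
        try:
            return worker(), restarts
        except Exception:
            if restarts >= max_restarts:
                raise
            restarts += 1


def make_flaky_worker(failures):
    """Build a worker that crashes `failures` times, then succeeds.

    The countdown stands in for a transient fault outside the worker.
    """
    remaining = {"count": failures}

    def worker():
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise RuntimeError("simulated crash")
        return "ok"

    return worker
```

With these definitions, `supervise(make_flaky_worker(2))` comes back with the result after two restarts, while a worker that keeps failing past the limit propagates its error to whoever is supervising the supervisor.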

Erlang's model is similar to the CLR's AppDomain construct, which has been around since .NET 1.0, and Java's proposed "Isolate" feature which has yet to be implemented.

I'm not going to presume to know anything more about AppDomains than Ted knows about Erlang, but this comparison seems only barely true. Here's some information I found about AppDomains...

The main purpose of an Application Domain is to isolate our applications from other applications. Application domains run on a single Win32 process... Objects in the same Application Domain communicate directly while Objects that exists in different Application Domains interact with each other by transporting copies of objects to each other or by using proxies for message exchange (by reference).

Any somewhat-skilled programmer familiar in the least with Erlang should be able to tell you significant differences between the AppDomain described above and Erlang's process model. I'm bothered that Ted Neward has such a large stage for making superficial statements that these mechanisms are in any way meaningfully the same. This is a significant disservice to programmers following his writing, trying to learn useful information. If Ted does not have (apparently) even a few days' worth of reasonable Erlang experience or even reading, why should he presume to make such significant and horribly misleading statements to his large audience?

Here are the significant differences between these two mechanisms (Erlang processes and dotnet AppDomains)...

  • AppDomains appear to be used to put distinct applications inside the same Windows OS process.
  • Erlang processes are used to decompose a single application into many small, independent, shared-nothing processes.
  • Objects in different AppDomains are considered separate and rarely-interacting.
  • Processes in Erlang are roughly the same "size" as an instance of a major role-playing object in a single application. They collaborate with other processes frequently and are highly interactive.
  • Objects in an AppDomain are "shared-everything", running concurrent, shared-everything threads.
  • Erlang processes (the equivalent of objects in dotnet) in an application are "shared-nothing".
  • AppDomains pass copies of values or references to objects (as far as I can tell from this one article).
  • Erlang processes always pass copies of values. Erlang and its frameworks per se do not provide references, and so it's "shared-nothing" all the way down. The equivalent of "reference to an object" in Erlang is "identifier of a process".
  • AppDomains appear to be a large-grain, compile-time, declarative mechanism. Any Windows OS process would have just a few AppDomains. I would suppose these number in the dozens per OS process.
  • Erlang processes are a fine-grain, run-time, dynamic mechanism. An Erlang node (essentially an OS process) would have many hundreds of processes or more.
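The Erlang side of that list can be imitated, very roughly, in Python. All of the names below (`mailbox`, `spawn`, `send`, `receive`, `counter`) are made up for this sketch, and Python threads are far heavier than Erlang processes and do share memory, so the isolation here is only by convention: every message is deep-copied on send, and the only handle anyone holds on a "process" is its integer id.

```python
import copy
import itertools
import queue
import threading

_mailboxes = {}              # pid -> queue; the registry is the only shared structure
_pids = itertools.count(1)


def mailbox():
    """Create a mailbox and return its pid (a plain integer)."""
    pid = next(_pids)
    _mailboxes[pid] = queue.Queue()
    return pid


def send(pid, message):
    """Deliver a copy of the message, so sender and receiver share nothing."""
    _mailboxes[pid].put(copy.deepcopy(message))


def receive(pid, timeout=None):
    return _mailboxes[pid].get(timeout=timeout)


def spawn(handler, state):
    """Start a 'process': a loop that folds incoming messages into its own state."""
    pid = mailbox()

    def loop():
        current = state
        while True:
            message = receive(pid)
            if message == "stop":
                return
            current = handler(current, message)

    threading.Thread(target=loop, daemon=True).start()
    return pid


def counter(total, message):
    """Handler: ("add", numbers) accumulates; ("get", reply_pid) reports the total."""
    kind, payload = message
    if kind == "add":
        return total + sum(payload)
    if kind == "get":
        send(payload, total)
    return total


def demo():
    me = mailbox()
    pid = spawn(counter, 0)
    numbers = [1, 2, 3]
    send(pid, ("add", numbers))
    numbers.append(100)      # changing our list later does not affect the copy sent
    send(pid, ("get", me))
    total = receive(me, timeout=2.0)
    send(pid, "stop")
    return total
```

The thing to notice is that there is no way to hand the counter a reference to `numbers`. It gets a copy, and the only way to talk to it is through its pid.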
Ted continues...

if the argument here is that Erlang's reliability comes from its lack of shared state between threads, hell, man, that's hardly a difficult architecture to cook up. Most transactional systems get there pretty easily, including EJB, though then programmers then go through dozens of hoops to try and break it.

Ted does a pretty good job of hinting that EJB is a broken model. EJB isolation is more like an AppDomain. EJB separates one application from another, with fairly laborious mechanisms required to get them to communicate with each other.

Within an EJB application there is supposed to be no concurrency at all. And so again this is the opposite of Erlang. Erlang encourages high concurrency and ease of interaction. But these points are apparently lost on Ted. Usually one cannot read a single page of Erlang literature without having this made absolutely obvious.

The next post will get into Ted's arguments about language and common run-times.

About Me

Portland, Oregon, United States
I'm usually writing from my favorite location on the planet, the pacific northwest of the u.s. I write for myself only and unless otherwise specified my posts here should not be taken as representing an official position of my employer. Contact me at my gee mail account, username patrickdlogan.