"I have a mind like a steel... uh... thingy." Patrick Logan's weblog.

Thursday, July 03, 2008

Taking a Global Snapshot

In a recent comment here Dan Creswell refers to an algorithm for taking a global snapshot across a distributed message system...

...if our snapshot could just capture the "after" state, then we'd be done. However, we can't do that either because we're not allowed to stop the other messages flowing through the system after we take the local snapshot at a node. Therefore, by the time all nodes take their local snapshots, the global system state may have changed.

So, we define a recent and consistent global state as a global state that "could have occurred" between the "before" state and the "after" state. In other words, consider the sequence S of communication events that occurs in the system between the "before" and the "after" state. A recent and consistent global state X is reachable from the "before" state by some subsequence of the events in S. Furthermore, the "after" state is reachable from the state X by the remaining events in S. Notice that this global state didn't necessarily occur at any point in real time, but it "could have occurred" between the "before" and "after" states using the same sequence of communication events.

Global snapshots are useful in a wide variety of distributed applications. One application is in distributed databases, for instance a group of bank branches. Another use is deadlock detection: a global snapshot can be examined to see if there has been any progress made by the algorithm. Termination of a distributed algorithm can be detected in the same way.

From an interesting-looking course in distributed systems at Washington University, St. Louis.
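
To make the "could have occurred" idea concrete, here is a minimal single-threaded sketch using the bank-branch example from the quote. It assumes a marker-based scheme in the style of Chandy and Lamport over FIFO channels, which may or may not be the algorithm Dan had in mind. The names Branch, Network, transfer, start_snapshot, drain and snapshot_total are all made up for illustration.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical control message that separates "before" from "after"

class Branch:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.snapshot = None   # recorded local balance, None until taken
        self.in_flight = {}    # sender name -> amounts recorded on that channel
        self.closed = set()    # senders whose marker has already arrived

class Network:
    def __init__(self, balances):
        self.branches = {n: Branch(n, b) for n, b in balances.items()}
        names = list(balances)
        # Assumption: one reliable FIFO channel per ordered pair (src, dst).
        self.channels = {(s, d): deque() for s in names for d in names if s != d}

    def transfer(self, src, dst, amount):
        # Money leaves src immediately and sits on the channel until delivered.
        self.branches[src].balance -= amount
        self.channels[(src, dst)].append(amount)

    def _record(self, name):
        # Take the local snapshot, then send a marker on every outgoing channel.
        b = self.branches[name]
        b.snapshot = b.balance
        b.in_flight = {s: [] for (s, d) in self.channels if d == name}
        for (s, d), ch in self.channels.items():
            if s == name:
                ch.append(MARKER)

    def start_snapshot(self, name):
        self._record(name)

    def deliver(self, src, dst):
        msg = self.channels[(src, dst)].popleft()
        b = self.branches[dst]
        if msg == MARKER:
            if b.snapshot is None:
                self._record(dst)   # first marker seen: record local state now
            b.closed.add(src)       # nothing more is recorded for this channel
        else:
            b.balance += msg
            # A message that arrives after the local snapshot but before the
            # sender's marker was "in flight" and belongs to the channel state.
            if b.snapshot is not None and src not in b.closed:
                b.in_flight[src].append(msg)

    def drain(self):
        # Deliver everything still queued, one message per channel per pass.
        while any(self.channels.values()):
            for key, ch in self.channels.items():
                if ch:
                    self.deliver(*key)

    def snapshot_total(self):
        # Recorded balances plus the money recorded as in flight.
        return sum(b.snapshot + sum(sum(v) for v in b.in_flight.values())
                   for b in self.branches.values())

net = Network({"A": 100, "B": 100, "C": 100})
net.transfer("A", "B", 10)   # in flight when the snapshot starts
net.start_snapshot("B")
net.transfer("C", "A", 5)    # sent while the snapshot is still spreading
net.drain()
print(net.snapshot_total())  # 300
```

Nobody stopped the transfers, and no single instant had exactly the recorded balances (A=90, B=100, C=95), yet the recorded balances plus the recorded in-flight amounts still add up to 300. That is the bank-branch version of a state that "could have occurred".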

Judge vs. Law: "their privacy concerns are speculative"

From the EFF deeplinks blog...

Yesterday, in the Viacom v. Google litigation, the federal court for the Southern District of New York ordered Google to produce to Viacom (over Google's objections):
all data from the Logging database concerning each time a YouTube video has been viewed on the YouTube website or through embedding on a third-party website...

for each instance a video is watched, the unique “login ID” of the user who watched it, the time when the user started to watch the video, the internet protocol address other devices connected to the internet use to identify the user’s computer (“IP address”), and the identifier for the video.


Google correctly argued that “the data should not be disclosed because of the users’ privacy concerns,” citing the VPPA, 18 U.S.C. § 2710. However, the Court dismissed this argument with no analysis, stating “defendants cite no authority barring them from disclosing such information in civil discovery proceedings, and their privacy concerns are speculative.”

Dang. Wish I hadn't watched so much Rick Astley. I hope Google tells the judge, "Never gonna give you up..."

Reminder: Program Casualties

Bill de hÓra responds to yet another RPC effort...

"JSON is the sweet spot of type driven interop"


Wednesday, July 02, 2008

Lisp's 50th Birthday Celebration

This will coincide with OOPSLA in Nashville. John McCarthy will present.


50th! Take that, other dynamic languages!

Tuesday, July 01, 2008

RabbitMQ / XMPP Gateway

RabbitMQ and XMPP... Now get the best of both worlds, I guess, through this gateway.

Out of Control

James Robertson observes OnStar, the Pentagon, and your local transit authority losing control over their control freakishness...

This is like DRM, but with extra stupidity added in.
Because I am _real_ worried I might end up in Speed 3.

Re: Question about message passing paradigm

Responding to a thread on the erlang-questions list...
The problem we are discussing is processes B, C, D hold information X, Y, Z respectively; process A wants a coherent snapshot of X, Y, Z.

There are actually two slightly different cases depending on whether A needs "X Y Z as of *now*" (A, B, C, and D must all synchronise), or A needs "X Y Z as of *some* time in the recent past" (B, C, and D must all synchronise but can then send the information to A without waiting for A to receive it).

I like this problem because it is simple yet subtle. One way that it is subtle is that in "multithread" programming most people STILL think in terms of a single absolute time shared by all threads. (After all, there _is_ such a thing, the system clock. And yes, it's not exactly true, but it's close enough to make people _think_ it's true.) But when you start thinking about Erlang and especially *distributed* Erlang, you start realising that "now" is a pretty fuzzy concept.

Yes, the problem seems simple yet subtle. The downside is there are many unwritten constraints (or not) on any specific problem that could lead the solution alternatives one way or another. Unless you really dig into those, the cost/benefit estimate for one solution or another could be more or less off.

For example, why not coordinate through an in-memory database? This could be reasonable, or not. We don't know enough.

Why not schedule the source processes to send a message on a periodic or scheduled basis? This could be reasonable, or not, and cut down the message traffic, which seemed to be a concern.

Why is sending fewer than N messages a concern? Why does one process have to collect the information? How much information? How tight is the deadline? Is "now" an actual timestamp or just some unknown point in time that a request has been received? How close to "now" do the other "nows" have to be with respect to each other? Can you widen that window if it would decrease the effort to build?
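
As a rough sketch of the scheduled-send idea and the "how wide is the window" question, the collector below keeps only the latest value from each source and hands out a snapshot when all of the latest values fall within an allowed spread. It is an illustration under assumptions, not anything proposed in the thread: Collector, on_message, snapshot, max_spread and the integer tick are hypothetical names, and it assumes the sources agree on how scheduled ticks are numbered.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    value: object
    tick: int  # hypothetical: number of the scheduled round the value was sent in

class Collector:
    """Plays the role of process A; sources push values on their own schedule."""

    def __init__(self, sources, max_spread):
        self.sources = set(sources)
        self.max_spread = max_spread  # widest tolerated gap between the "nows"
        self.latest = {}

    def on_message(self, source, value, tick):
        # Keep only the newest reading per source; ignore stale arrivals.
        prev = self.latest.get(source)
        if prev is None or tick >= prev.tick:
            self.latest[source] = Reading(value, tick)

    def snapshot(self):
        # No snapshot until every source has reported at least once.
        if set(self.latest) != self.sources:
            return None
        ticks = [r.tick for r in self.latest.values()]
        # Reject the combination if the readings are too far apart.
        if max(ticks) - min(ticks) > self.max_spread:
            return None
        return {s: r.value for s, r in self.latest.items()}

a = Collector(["B", "C", "D"], max_spread=1)
a.on_message("B", "X", 10)
a.on_message("C", "Y", 10)
a.on_message("D", "Z", 11)
print(a.snapshot())  # {'B': 'X', 'C': 'Y', 'D': 'Z'}
```

Widening max_spread makes a snapshot available more often at the cost of coherence, which is the trade being asked about. Setting it to 0 only accepts readings from the same scheduled round.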

If synchronization across the processes is needed then is an "eventually consistent" approach reasonable if it lowers the effort to build?

Interesting stuff, but challenging to talk about when the details are too abstract.

Sunday, June 29, 2008

Thoughts on Hotels and Coffee in Belfast

As long as I'm writing down my thoughts on this trip: I've been in two hotels in Belfast, the Hilton and the Radisson. Both are nice, close to the city center, and on the water.

They both have internet frustrations: neither has wireless in the rooms; the Hilton charges for wired internet in the rooms at £15 per 24 hours! When you pay that fee then they also throw in wireless _in the lobby_! Well, it's BT and the voucher is good for any BT provider in that 24-hour period. I found BT is the provider around some nearby coffee shops.

The Radisson at least provides free, but wired, internet. The problem then is the MacBook's adapter for UK plugs does not fit in the limited space above the immovable desk provided in the room! The nearest outlet that provides enough clearance for the adapter is across the room on the other side of the bed. If I had a 25' extension cord for either electricity or the internet, then this might be less frustrating.

(Let me update this with a warning: if you intend to use the internets with any expectation of getting bits back to your laptop, avoid the Radisson altogether. Bandwidth-wise this turned out to be simply unusable for long stretches.)

And so on to coffee: Caffe Nero is the best I've found in the city center so far, and the best I've found anywhere for a large chain. http://www.caffenero.com/

About Me

Portland, Oregon, United States
I'm usually writing from my favorite location on the planet, the pacific northwest of the u.s. I write for myself only and unless otherwise specified my posts here should not be taken as representing an official position of my employer. Contact me at my gee mail account, username patrickdlogan.