In a recent comment here Dan Creswell refers to an algorithm for taking a global snapshot across a distributed message system...
...if our snapshot could just capture the "after" state, then we'd be done. However, we can't do that either because we're not allowed to stop the other messages flowing through the system after we take the local snapshot at a node. Therefore, by the time all nodes take their local snapshots, the global system state may have changed.From an interesting-looking course in distributed systems at Washington University, St. Louis.So, we define a recent and consistent global state as a global state that "could have occurred" between the "before" state and the "after" state. In other words, consider the sequence S of communication events that occurs in the system between the "before" and the "after" state. A recent and consistent global state X is reachable from the "before" state by some subsequence of the events in C. Furthermore, the "after" state is reachable from the state X by the remaining events in C. Notice that this global state didn't necessarily occur at any point in real time, but it "could have occured" between the "before" and "after" states using the same sequence of communication events.
Global snapshots are useful in a wide variety of distributed applications. One application is in distributed databases, for instance a group of bank branches. Another use is deadlock detection: a global snapshot can be examined to see if there has been any progress made by the algorithm. Termination of a distributed algorithm can detected in the same way.
1 comment:
Also there are a couple of books out there, covering this topic and others:
Distributed Computing: Principles, Algorithms and Systems
http://www.amazon.com/gp/product/0521876346
Distributed Algorithms:
http://www.amazon.com/gp/product/1558603484
The former is probably the more accessible of the two texts.
Post a Comment