5 Fault-Tolerant Examples

This chapter shows how to use the failure model to build robust distributed applications. We first present basic fault-tolerant versions of common language operations. Then we present fault-tolerant versions of the server examples. We conclude with a bigger example: reliable objects with recovery.

5.1 A fault-tolerant broadcast channel

We will show how to define a simple open fault-tolerant broadcast channel. The example uses almost every concept from the previous chapter. This is a useful abstraction; for example, it can serve as the heart of a chat tool such as IRC. The service has a client/server structure and is aware of permanent crashes of clients or the server. If a client crashes, the system continues to work. If the server crashes, the service is no longer available, and the clients are notified.

Users access the broadcast service through a local client. The user creates a client object that is initialized with a ticket provided by the server. The client object has a method sendMessage for broadcasting a message. When the client object receives a message or is notified of a client or server crash, it informs the user by calling a user-defined procedure with one argument. The following events are possible:

message(UserID Mess): receive the message Mess from client UserID.
registered(UserID): the client identified by UserID has registered to the channel.
unregistered(UserID): the client identified by UserID has unregistered from the channel.
permClient(UserID): the client identified by UserID has crashed.
permServer: the broadcast channel server has crashed.

We give an example of how the broadcast channel is used, and we follow this by showing its implementation. We first show how to use and implement a non-fault-tolerant broadcast channel, and then we show the small extensions needed for it to detect client and server crashes.

5.1.1 Sample use (no fault tolerance)

First we create the channel server. The server is a port. To let clients connect, the server is made available through a ticket that can be taken any number of times (created with Connection.offerMany). The ticket is available through a publicly accessible URL.

    local
       S={NewChannelServer}
    in
       {Pickle.save {Connection.offerMany S} "/usr/staff/pvr/public_html/chat"}
    end

A client can be created on another site. On the client's site, we first define a procedure HandleIncomingMessage that handles incoming messages from the broadcast channel. Then we access the channel through its URL. Finally, we create a local client and give it our handler procedure.

    local
       proc {HandleIncomingMessage M}
          %% System.showInfo prints a virtual string as text
          {System.showInfo
           case M
           of message(From Content) then From#' : '#Content
           [] registered(UserID) then UserID#' joined us'
           [] unregistered(UserID) then UserID#' left us'
           end}
       end
       S={Connection.take {Pickle.load "http://www.info.ucl.ac.be/~pvr/chat"}}
       C={New ChannelClient init(S 'Raphael' HandleIncomingMessage)}
    in
       {For 1 1000 1
        proc {$ I}
           {C sendMessage('hello'#I)}
           {Delay 800}   % wait 800 ms between messages
        end}
       {C close}
    end

In this example we send 1000 messages of the form 'hello'#I, where I takes successive values from 1 to 1000, pausing 800 milliseconds between messages. Then we close the client.

Note that the client needs the code of the class ChannelClient. This requirement can easily be avoided by having the server provide a functor that the client applies to instantiate a ChannelClient. Please consult the Application Programming tutorial to learn more about functors.

5.1.2 Definition (no fault tolerance)

The whole client-server communication is implemented with ports. Ports have a well-defined behavior in the case of a permanent crash: a Send to a crashed port has no effect. We show the service's implementation in two steps. First, we show how it is written without taking fault tolerance into account. Second, we complete the example by adding fault-handling code. This is easy; it amounts to adding a few threads that monitor the fault state of the client and server ports.
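The port semantics that the fault handling relies on can be modeled outside Mozart. The following is a toy Python model, not Oz and not Mozart's API: the class Port and its methods crash and wait_perm_fail are hypothetical names, with crash standing in for the crash of the site that hosts the port and wait_perm_fail for waiting until permFail appears on the port's fault stream.

```python
import threading


class Port:
    """Toy Python model of an Oz port; not Mozart's API.

    send() appends to an append-only stream. After crash() the port is
    permanently failed: send() silently does nothing, mirroring the rule
    that Send has no more effect on a port in state permFail.
    """

    def __init__(self):
        self._stream = []
        self._failed = False
        self._cond = threading.Condition()

    def send(self, msg):
        with self._cond:
            if not self._failed:        # a crashed port swallows messages
                self._stream.append(msg)

    def crash(self):
        """Hypothetical stand-in for the crash of the port's site."""
        with self._cond:
            self._failed = True
            self._cond.notify_all()     # wake up any failure watchers

    def wait_perm_fail(self, timeout=None):
        """Block until the port is permanently failed, like waiting for
        permFail on its fault stream. Returns False if the timeout expires."""
        with self._cond:
            return self._cond.wait_for(lambda: self._failed, timeout)

    def contents(self):
        with self._cond:
            return list(self._stream)


p = Port()
p.send("a")
p.crash()
p.send("b")            # no effect: the port has crashed
print(p.contents())    # ['a']
```

Because a send to a crashed port is silently dropped rather than raising, senders need no special casing; noticing the crash is left to separate watcher threads, which is the approach taken in Section 5.1.4.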

The client is an object with the following structure, while the server is a port created by the function NewChannelServer.

<Client class and server function>=:
    class ChannelClient
       feat client userID
       attr server
       <Client methods>
    end
    fun {NewChannelServer}
       Stream
       Server={NewPort Stream}
       <Server's serving procedures>
    in
       thread {Serve Stream} end
       Server
    end

Client definition

The client has three methods. The first one, init, initializes a client with a server port, a user identifier, and a message handler procedure. The second one, close, deregisters the client from the service. The last one, sendMessage, sends a message for broadcast.

<Client methods>=:
    meth init(Server UserID MsgHandler)
       Ms
    in
       server := Server
       self.client={NewPort Ms}
       self.userID=UserID
       {Send Server register(self.client self.userID)}
       %% hand every event arriving on our port to the user's handler
       thread {ForAll Ms MsgHandler} end
       <Client's server failure handler>
    end
    meth close
       {Send @server unregister(self.client self.userID)}
       server := unit
    end
    meth sendMessage(M)
       {Send @server broadcast(self.userID M)}
    end

The client keeps a reference to the server, to its own port (needed to unregister), and to its user identifier. The user-defined handler procedure is applied directly to the stream of incoming messages from the server. In our case, the server only sends the events described above. The server must therefore process three kinds of messages: register and unregister for managing a client's connection to the server, and broadcast for sending a message to all clients.
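To make the client's structure concrete, here is a rough Python analogue; it is an illustrative sketch, not code from the tutorial. ChannelClientModel, RecordingServer, and the tuple encoding of messages such as ("register", port, user_id) are assumptions. A queue.Queue plays the role of the client's port, and setting server to None mimics server := unit.

```python
import queue
import threading


class RecordingServer:
    """Stand-in for the server port: records whatever clients send it."""

    def __init__(self):
        self.received = []

    def send(self, msg):
        self.received.append(msg)


class ChannelClientModel:
    """Python model of ChannelClient (illustrative, not the Oz class).

    self.port plays the role of the client's own port self.client; a
    background thread hands every event on it to the user's handler,
    like thread {ForAll Ms MsgHandler} end.
    """

    def __init__(self, server, user_id, handler):
        self.server = server
        self.user_id = user_id
        self.port = queue.Queue()
        server.send(("register", self.port, user_id))
        threading.Thread(target=self._dispatch, args=(handler,), daemon=True).start()

    def _dispatch(self, handler):
        while True:
            handler(self.port.get())

    def send_message(self, msg):
        # After close() self.server is None, so this raises AttributeError,
        # much as {Send unit ...} raises an exception in Oz.
        self.server.send(("broadcast", self.user_id, msg))

    def close(self):
        self.server.send(("unregister", self.port, self.user_id))
        self.server = None


server = RecordingServer()
inbox = queue.Queue()
client = ChannelClientModel(server, "Raphael", inbox.put)
client.send_message("hello")
client.close()
client.port.put(("registered", "Peter"))   # pretend the server sent an event
print(inbox.get(timeout=1))                 # ('registered', 'Peter')
```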

One statement in method init, <Client's server failure handler>, is in charge of handling server failures. For now we assume that this statement is skip, which is how the handler behaves when no failure occurs.

A user should access the broadcast channel only through a client.

Server definition

The server's procedure Serve must (at least) process client messages. To avoid dependencies between clients when forwarding messages, the server creates one thread per registered client. This thread reads the messages arriving at the server and sends events to its own client only. It is created when the server observes a registration message from that client. This design is possible because the service is so simple.

<Server's serving procedures>=:
    proc {Serve X|T}
       case X
       of register(Client UserID) then {ServeClient Client UserID X|T}
       else skip
       end
       {Serve T}
    end
    <Server's simple ServeClient>
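The essential trick in Serve is that a port's stream can be read by many threads, each from its own position, and that a new client's thread starts at that client's own register message (the X|T argument). The following Python model uses hypothetical names (Stream, serve, forward_all) and a STOP sentinel that the Oz code does not need; it shows that a client registered later never sees earlier broadcasts.

```python
import threading


class Stream:
    """Append-only stream that several readers traverse independently,
    each keeping its own position (a model of an Oz port's stream)."""

    def __init__(self):
        self._items = []
        self._cond = threading.Condition()

    def append(self, msg):
        with self._cond:
            self._items.append(msg)
            self._cond.notify_all()

    def get(self, i):
        """Block until element i exists, then return it."""
        with self._cond:
            self._cond.wait_for(lambda: i < len(self._items))
            return self._items[i]


STOP = "stop"   # sentinel so the model terminates; the Oz Serve runs forever


def forward_all(client, user_id, stream, start):
    """Simplified per-client thread: copy messages from position `start` on."""
    def loop():
        i = start
        while (msg := stream.get(i)) != STOP:
            client.append(msg)
            i += 1
    t = threading.Thread(target=loop)
    t.start()
    return t


def serve(stream, serve_client):
    """Like Serve: scan the stream and, on each register message, start a
    per-client thread positioned at that very message."""
    threads, i = [], 0
    while (msg := stream.get(i)) != STOP:
        if msg[0] == "register":
            _, client, user_id = msg
            threads.append(serve_client(client, user_id, stream, i))
        i += 1
    return threads


s = Stream()
a, b = [], []
s.append(("register", a, "A"))
s.append(("broadcast", "A", "early"))
s.append(("register", b, "B"))
s.append(("broadcast", "B", "late"))
s.append(STOP)
for t in serve(s, forward_all):
    t.join()
```

After this runs, client A has seen all four messages, while client B has seen only its own registration and the later broadcast.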

The procedure ServeClient creates one thread that sends appropriate event messages to a given client. Currently the thread processes the three messages sent by clients, namely register, unregister, and broadcast. Note how the client thread terminates automatically when its client unregisters: an exception is raised in the processing loop, and caught at the outer level. This makes the registration management implicit in the server.

<Server's simple ServeClient>=:
    proc {ServeClient Client UserID L}
       proc {Loop X|T}
          case X
          of broadcast(U M) then {Send Client message(U M)}
          [] register(C U) then {Send Client registered(U)}
          [] unregister(C U) andthen C==Client then raise done end
          [] unregister(C U) then {Send Client unregistered(U)}
          end
          {Loop T}
       end
    in
       thread
          %% the loop ends when our own client unregisters
          try {Loop L} catch done then skip end
       end
    end
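The control flow of Loop, including leaving it through an exception, can be checked on a finite list of messages. This Python sketch is a model, not the Oz code: serve_client, Done, and the tuple message format are assumptions, and object() instances stand in for client ports so that clients are compared by identity.

```python
class Done(Exception):
    """Raised when the loop meets its own client's unregister message."""


def serve_client(client, messages, deliver):
    """Model of the simple ServeClient Loop over a finite list of messages.

    Clients are compared by identity (`is`), like C==Client on port
    references in Oz; deliver(event) plays the role of {Send Client ...}.
    """
    def loop(msgs):
        for msg in msgs:
            kind = msg[0]
            if kind == "broadcast":
                _, user, text = msg
                deliver(("message", user, text))
            elif kind == "register":
                _, _, user = msg
                deliver(("registered", user))
            elif kind == "unregister":
                _, c, user = msg
                if c is client:
                    raise Done          # our client left: stop serving it
                deliver(("unregistered", user))

    try:
        loop(messages)
    except Done:                        # caught at the outer level
        pass


me, other = object(), object()
msgs = [
    ("register", me, "Raphael"),
    ("register", other, "Peter"),
    ("broadcast", "Peter", "hi"),
    ("unregister", me, "Raphael"),
    ("broadcast", "Peter", "after Raphael left"),
]
events = []
serve_client(me, msgs, events.append)
print(events)
# [('registered', 'Raphael'), ('registered', 'Peter'), ('message', 'Peter', 'hi')]
```

Only the unregistering client's thread stops; the thread serving any other client still forwards the unregistered event and the later broadcast.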

Note that clients can receive server messages at different rates, since the server does not wait for client acknowledgements when sending messages. Nevertheless, every client receives events in the same order, because all ServeClient threads read the same sequential stream.
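A quick way to convince yourself of the ordering claim is to let two readers of one stream run at different speeds. In this Python model (Stream, reader, and the END sentinel are hypothetical names), the producer appends without waiting for either reader, yet both readers end up with the same sequence.

```python
import threading
import time


class Stream:
    """Append-only stream read independently by several threads."""

    def __init__(self):
        self._items = []
        self._cond = threading.Condition()

    def append(self, msg):
        with self._cond:
            self._items.append(msg)
            self._cond.notify_all()

    def get(self, i):
        """Block until element i exists, then return it."""
        with self._cond:
            self._cond.wait_for(lambda: i < len(self._items))
            return self._items[i]


END = "end"   # sentinel used only to stop the model's reader threads


def reader(stream, delay, out):
    """A client-forwarding thread; `delay` simulates a slow client."""
    i = 0
    while (msg := stream.get(i)) != END:
        time.sleep(delay)
        out.append(msg)
        i += 1


s = Stream()
fast, slow = [], []
threads = [threading.Thread(target=reader, args=(s, 0.0, fast)),
           threading.Thread(target=reader, args=(s, 0.01, slow))]
for t in threads:
    t.start()
for n in range(5):
    s.append(n)        # the "server" never waits for either reader
s.append(END)
for t in threads:
    t.join()
print(fast == slow == [0, 1, 2, 3, 4])   # True
```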

Note that clients are identified uniquely by references to their ports (the argument Client), and not by the user ID UserID. This is visible in the processing of the message unregister. As a consequence, the channel works correctly even if several clients share the same user ID. The users may get confused, but the channel will not.
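The difference between matching on identity and matching on the user ID fits in a few lines. This Python sketch uses a hypothetical Client class and two predicate functions: stops_serving mirrors the C==Client guard, while stops_serving_by_name is a deliberately flawed alternative shown only for contrast.

```python
class Client:
    """Hypothetical client handle; two handles may share a user ID."""

    def __init__(self, user_id):
        self.user_id = user_id


def stops_serving(client, msg):
    # Correct: compare the client reference by identity, like C==Client.
    return msg[0] == "unregister" and msg[1] is client


def stops_serving_by_name(client, msg):
    # Flawed alternative: match on the user ID alone.
    return msg[0] == "unregister" and msg[2] == client.user_id


alice1, alice2 = Client("alice"), Client("alice")
bye = ("unregister", alice1, "alice")
print(stops_serving(alice1, bye), stops_serving(alice2, bye))                  # True False
print(stops_serving_by_name(alice1, bye), stops_serving_by_name(alice2, bye))  # True True
```

With name matching, one "alice" leaving would also silence the thread serving the other "alice".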

5.1.3 Sample use (with fault tolerance)

The fault-tolerant channel can be used in exactly the same way as the non-fault-tolerant version. The only difference is that the user-defined handler procedure can receive two extra messages, permClient and permServer, to indicate client and server crashes:

    proc {UserMessageHandler Msg}
       {System.showInfo
        case Msg
        of message(From Content) then From#' : '#Content
        [] registered(UserID) then UserID#' joined us'
        [] unregistered(UserID) then UserID#' left us'
        [] permClient(UserID) then UserID#' has crashed'
        [] permServer then 'Server has crashed'
        end}
    end

5.1.4 Definition (with fault tolerance)

The non-fault-tolerant version of Section 5.1.2 is easily extended to detect client and server crashes.

Client definition

This definition extends the one given in Section 5.1.2. The client creates a concurrent failure handler that monitors the server's port through its fault stream. As soon as the server's port reaches state permFail, the client sends the message permServer to its own port, where the user message handler will see it.

<Client's server failure handler>=:
    thread
       %% blocks until permFail appears on the server port's fault stream
       if {List.member permFail {DP.getFaultStream Server}} then
          {Send self.client permServer}
          server := unit
       end
    end

Notice that the attribute server is set to unit, as in method close. This makes every subsequent call to sendMessage on the client object fail with an exception, because {Send unit ...} raises an error.
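The client-side failure handler can also be sketched in Python. Here a threading.Event on a toy ServerPort stands in for permFail appearing on the fault stream, and crash is a hypothetical method simulating the crash of the server's site. Client and the message tuples are illustrative names, not part of Mozart.

```python
import queue
import threading


class ServerPort:
    """Toy server port: records messages until crash(), then ignores them."""

    def __init__(self):
        self.received = []
        self.failed = threading.Event()   # stands in for permFail on the fault stream

    def send(self, msg):
        if not self.failed.is_set():
            self.received.append(msg)

    def crash(self):
        """Hypothetical stand-in for the server's site crashing."""
        self.failed.set()


class Client:
    """Client model with a server failure handler (illustrative names)."""

    def __init__(self, server, user_id):
        self.server = server
        self.user_id = user_id
        self.port = queue.Queue()         # the client's own port
        server.send(("register", self.port, user_id))
        # Watch the `server` argument, not self.server, which may be cleared.
        threading.Thread(target=self._watch, args=(server,), daemon=True).start()

    def _watch(self, server):
        server.failed.wait()              # blocks until permFail
        self.server = None                # later send_message calls now raise
        self.port.put(("permServer",))    # notify the user's handler

    def send_message(self, msg):
        self.server.send(("broadcast", self.user_id, msg))


srv = ServerPort()
c = Client(srv, "Raphael")
c.send_message("hi")
srv.crash()
print(c.port.get(timeout=1))   # ('permServer',)
```

One deliberate difference from the Oz chunk: the model clears server before posting permServer, so once the handler has seen permServer, any later send_message is guaranteed to fail. The Oz code performs the two steps in the opposite order.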

Server definition

We extend the server's definition to handle permanent failures of clients. For that purpose we slightly modify the Loop procedure in ServeClient and add a failure handler that monitors the client's port. This robust ServeClient replaces <Server's simple ServeClient> in <Server's serving procedures>.

<Server's robust ServeClient>=:
    proc {ServeClient Client UserID L}
       proc {Loop X|T}
          case X
          of broadcast(U M) then {Send Client message(U M)}
          [] register(C U) then {Send Client registered(U)}
          [] unregister(C U) andthen C==Client then raise done end
          [] unregister(C U) then {Send Client unregistered(U)}
          [] permClient(C U) andthen C==Client then raise done end
          [] permClient(C U) then {Send Client permClient(U)}
          end
          {Loop T}
       end
    in
       thread
          try {Loop L} catch done then skip end
       end
       %% failure handler: report our client's crash to all client threads
       thread
          if {List.member permFail {DP.getFaultStream Client}} then
             {Send Server permClient(Client UserID)}
          end
       end
    end

The client thread in the server now also processes the message permClient. The thread that recognizes its own client port in the message raises the exception done to stop sending messages to that client, just as if the client had unregistered. Every other thread simply notifies its client with the event permClient(UserID). The failure handler created by ServeClient is responsible for sending the message permClient to the server's port, so that it is seen by all client threads on the server.
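Both halves of the server-side change, the extended Loop and the per-client failure watcher, can be modeled in Python on a finite message list. serve_client, watch_client, Done, and the tuple encoding are assumptions made for this sketch; a threading.Event stands in for the client's fault stream reaching permFail.

```python
import threading


class Done(Exception):
    """Stops the loop for this client (unregistered or crashed)."""


def serve_client(client, messages, deliver):
    """Model of the robust Loop over a finite list of server messages.

    An unregister or permClient message about *this* client stops the loop;
    a permClient message about another client becomes permClient(UserID).
    """
    try:
        for msg in messages:
            kind = msg[0]
            if kind == "broadcast":
                deliver(("message", msg[1], msg[2]))
            elif kind in ("unregister", "permClient") and msg[1] is client:
                raise Done
            elif kind == "register":
                deliver(("registered", msg[2]))
            elif kind == "unregister":
                deliver(("unregistered", msg[2]))
            elif kind == "permClient":
                deliver(("permClient", msg[2]))
    except Done:
        pass


def watch_client(client_failed, send_to_server, client, user_id):
    """Failure handler: once the client's port is permanently failed, post
    permClient(client, user_id) to the server. `client_failed` is a
    threading.Event standing in for permFail on the client's fault stream."""
    def run():
        client_failed.wait()
        send_to_server(("permClient", client, user_id))
    t = threading.Thread(target=run, daemon=True)
    t.start()
    return t


me, peter = object(), object()
server_stream = [("register", me, "Raphael"), ("register", peter, "Peter")]
peter_failed = threading.Event()
watcher = watch_client(peter_failed, server_stream.append, peter, "Peter")
peter_failed.set()                     # simulate Peter's site crashing
watcher.join(timeout=1)
server_stream.append(("broadcast", "Raphael", "anyone there?"))

mine, peters = [], []
serve_client(me, server_stream, mine.append)
serve_client(peter, server_stream, peters.append)
print(mine)
# [('registered', 'Raphael'), ('registered', 'Peter'), ('permClient', 'Peter'), ('message', 'Raphael', 'anyone there?')]
print(peters)
# [('registered', 'Raphael'), ('registered', 'Peter')]
```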