11 General Distributed Programming Support: `DP`

This module presents the primitive support for distribution provided by the Mozart system. It gives the programmer some control over how a language entity is distributed, through a system of annotations. It also provides primitives to handle distribution failures.

11.1 Initializing the Distribution Layer

The distribution layer of Mozart is dynamically loaded when used. Its load and initialization can be forced with the procedures DP.init and DP.initWith.

init

{DP.init}

Initializes the distribution layer with parameters determined automatically by the system. The call has no effect if the layer has already been initialized.

initWith

{DP.initWith +Spec}

Initializes the distribution layer according to Spec. It has no effect if the layer had already been initialized. The value Spec is a record of the type

init(ip: IP <= best
     port: PN <= 'from'(9000)
     firewall: FW <= false)

All entries in record Spec are optional. The default values shown above are used if not given. The fields have the following meaning.

IP defines the IP address that the Mozart site should expose as its home address. It is either a string like "193.10.66.192" or the atom best. With the latter, the system will map the host name of the computer to an IP address. The IP address should only be set if the operating system returns a faulty address for some reason.
PN sets the port number that should be used for listening to incoming connection attempts. If PN has the form exact(N), the distribution layer will try to obtain the port number N. If PN has the form 'from'(N), the system will try to obtain the port number N. If that port is unavailable, port N+1 will be tried, then N+2, and so on, until a port number is granted by the operating system. If PN is the atom free, then the distribution layer will pick the first port number granted by the operating system.
FW is a boolean value that indicates if this site's host is behind a firewall. If it has value true, connection attempts will be tried from inside to outside the firewall.

The record Spec given to DP.initWith can also be set as property 'dp.listenerParams' in module Property. Setting the property does not force the distribution layer to be initialized. The call {DP.initWith Spec} is equivalent to

{Property.put 'dp.listenerParams' {Adjoin {Property.get 'dp.listenerParams'} Spec}}
{DP.init}
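
As a minimal sketch of the call described above, the following forces initialization with an explicit listening port. It assumes that the module is bound to the identifier DP (as in the interactive environment, or through a functor import); the port number 9500 is an arbitrary value chosen for illustration.

```oz
% Ask for port 9500, falling back to 9501, 9502, ... if it is taken.
% The ip field is omitted, so it keeps its default value (best).
{DP.initWith init(port:'from'(9500) firewall:false)}
```

Since a later initialization has no effect, such a call should presumably be made before any operation that causes the distribution layer to be loaded.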

11.2 Distribution Annotations

Since Mozart 1.4.0, the distribution support provides more choices on how an entity is distributed. The programmer can choose between several protocols for the management of an entity's state, and its distributed memory management. The choice is specified by attaching an annotation to the entity. Once an entity is distributed, the system determines its distribution parameters (state protocol and memory management) by looking up the entity's annotation. If some parameters are missing in the annotation, the system uses default values for the corresponding parameters. Default values are discussed below (Section 11.2.3).

An annotation value is either an atom or a list of atoms; valid values are given below. An annotation may specify a protocol (like stationary), a memory management scheme (like [persistent]), or both (like [stationary persistent]). The annotation is complete if both a protocol and a memory management scheme are given, otherwise it is partial.

11.2.1 Annotating Entities

The following two operations handle entity annotations. The annotation mechanism is incremental: one can annotate an entity several times, the resulting annotation for the entity being the conjunction of the given annotations.

annotate

{DP.annotate X +Annot}

Annotate entity X with the given annotation. The annotation must be consistent and valid for the given entity. An exception is raised otherwise.

getAnnotation

{DP.getAnnotation X ?Annot}

Return the annotation Annot of entity X as a list of atomic annotations. The list is empty if the entity has not been annotated yet.
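
The incremental behaviour can be sketched as follows. This is an illustration only: it assumes that a cell is a valid target for the stationary protocol, and it makes no assumption about the order of the atoms in the returned list.

```oz
declare
C = {NewCell 0}

{DP.annotate C stationary}      % partial annotation: protocol only
{DP.annotate C [credit]}        % adds a memory management scheme

% Expected to list both stationary and credit (order not assumed here)
{Show {DP.getAnnotation C}}
```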

11.2.2 Annotation Values

The following table lists the valid protocol annotations, and to which kind of entity they can be applied.

Value	Entity type	Description
`sited`	mutable, immutable	Makes an entity sited (no distribution support). Outside its home site, the entity appears as an unusable entity.
`stationary`	mutable, immutable	The entity's state remains on its home site.
`migratory`, `pilgrim`	mutable	The state of the entity is freely mobile; it migrates to the sites where entity operations must be performed. With the `pilgrim` protocol, the state continuously moves between the sites that regularly perform operations on the entity.
`replicated`	mutable	The entity's state is replicated on all sites referring to the entity. Read operations are performed locally, while write operations atomically update all replicas with a two-phase commit-like protocol.
`immediate`	immutable	The complete representation of the entity is sent together with its reference.
`eager`, `lazy`	immutable	With those protocols, the representation of the entity is sent at most once to a site. The `eager` protocol copies the entity as soon as possible, while the `lazy` protocol delays the copy until it is required on the site.
`variable`, `reply`	transient	For variables and read-only futures. The `reply` protocol is an optimization of the `variable` protocol for the case where two sites share the variable, and the variable is bound by the remote site. That protocol should not be used with read-only futures.

The table below lists annotations for distributed memory management.

Annotation	Description
`persistent`	The entity remains alive forever.
`credit`	All references to the entity (on sites or in network messages) embed a certain amount of credits. The total amount of credits remains constant, and remote sites can exchange references without notifying the home site (they put a part of their own credits with the reference they exchange). When all credits are sent back to the home site (by the remote sites' respective garbage collectors), that site is able to remove the entity from its own memory.
`lease`	All sites referring to an entity periodically notify its home site of their presence. When the home site has not received any notification for a long time, it considers that the entity has no more remote references, and makes the entity local to its site. This scheme allows memory to be reclaimed when remote references are lost through site failures. It is not guaranteed to be correct, though.

Note. The annotations credit and lease can be combined in a single annotation as [credit lease]. In that case, the home site will localize the entity as soon as either scheme declares it local. The credit scheme reacts more quickly when there is no failure, while the lease scheme still allows garbage collection when some credits are lost in a site failure.
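
The two tables can be combined into complete annotations as in the sketch below. The choice of protocol per cell is purely illustrative, and the sketch assumes that cells count as mutable entities in the sense of the first table. The last call shows the error case: lazy is listed for immutable entities only, so annotating a cell with it is expected to raise an exception.

```oz
declare
Counter = {NewCell 0}
Config  = {NewCell nil}

% Updated from several sites: let the state move to its users.
{DP.annotate Counter [migratory credit]}

% Read often, written rarely: replicate, and combine both collection schemes.
{DP.annotate Config [replicated credit lease]}

% lazy is not valid for a mutable entity, so this call should raise.
try
   {DP.annotate {NewCell 0} lazy}
catch E then
   {Show rejected(E)}
end
```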

11.2.3 Default Annotations

The system defines default annotations for all types of entities. Those default annotations are set as system properties in the module Property (Section "Distribution: dp"). The default annotation set for a given type must be complete and adequate for that type of entity. For instance, the statements below configure objects to be stationary by default, and classes to be copied lazily. Note that default annotations are used for entities created on the current site only.

{Property.put 'dp.annotation.object' [stationary credit]}
{Property.put 'dp.annotation.class' [lazy credit]}

11.3 Fault Handling

At global scale, there is only one kind of entity failure in Mozart: the entity permanently ceases to function. For instance, if the home site of an entity crashes, and that site was necessary for the entity to function properly, the entity itself fails permanently. The failure of an entity can also be provoked on purpose by the programmer. This may simplify some fault handling algorithms, because it guarantees that the entity has failed permanently.

At local scale, every site can observe entity failures. It may observe the global failure of an entity, but it may also observe unknown failures. Bad network communication can prevent an operation on an entity from being performed. The failure may be temporary, and go away once network communication is reestablished. But it may also hide a real permanent failure. The possible fault states observed at a site are

ok: the entity seems to be working properly.
tempFail: the entity is unusable on this site because of an unknown failure.
localFail: the entity is permanently unusable on this site, but the failure may not be global. This failure is provoked by the operation break below.
permFail: the entity is failed globally.

Valid fault state transitions are shown in Figure 11.1. One can see that from fault states localFail and permFail, one can never go back to state ok again (see Section 11.5, though).

Figure 11.1: State diagram with valid fault state transitions

A synchronous operation on a failed entity simply blocks. If the failure is transient, then the operation naturally resumes once the failure goes away. If the failure is permanent, then the operation blocks forever. Contrary to versions of Mozart prior to 1.4.0, an operation on a failed entity never raises an exception because of the failure.

An asynchronous operation on a failed entity returns immediately as usual. If the failure is temporary, then the operation will eventually perform its effect. In the case of a permanent failure, the effect will never occur.
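
Because a synchronous operation may block indefinitely, a caller that cannot afford to wait can run the operation in its own thread and race it against a timer. The sketch below is one possible way to do so; TryAccess is a hypothetical helper, not part of DP, and the result records some(V) and timeout are names chosen for this example.

```oz
declare
% TryAccess is a hypothetical helper, not part of DP.
fun {TryAccess C Timeout}
   V Done Expired
in
   Expired = {Alarm Timeout}     % bound to unit after Timeout milliseconds
   thread
      V = {Access C}             % may block while C is failed
      Done = unit
   end
   {WaitOr Done Expired}         % resume when either one is determined
   if {IsDet Done} then some(V) else timeout end
end
```

Note that on timeout the inner thread stays blocked: it resumes if the failure turns out to be temporary, and remains suspended forever if the failure is permanent.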

11.3.1 Entity Fault Stream

Every site maintains a local fault state for every entity, and that fault state is available to the programmer. In order to allow the programmer to react to fault state transitions, the site provides a fault stream for every entity. The fault stream of an entity is extended automatically with that entity's local fault state every time it changes.

getFaultStream

{DP.getFaultStream X ?FS}

Return the current tail of the fault stream of entity X, prefixed with the current fault state of X. Therefore, the current fault state of X is given by {DP.getFaultStream X}.1.

Once an entity is removed from memory by the local garbage collector, its fault stream is automatically closed, i.e., its tail is bound to nil. This gives a simple way for a fault handler to detect that the entity it watches is no longer in use. This finalization mechanism is triggered at the same time as the post-mortem finalization of the entity (Chapter 26).
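
A fault handler built on this stream can be as simple as a thread that traverses it. In the sketch below, Watch is a hypothetical helper and Handler is any one-argument procedure supplied by the caller; neither is part of DP.

```oz
declare
% Watch is a hypothetical helper; Handler is any one-argument procedure.
proc {Watch X Handler}
   FS = {DP.getFaultStream X}
in
   thread
      {ForAll FS Handler}   % returns once the stream is closed with nil
   end
end

C = {NewCell 0}
{Watch C proc {$ State} {Show faultState(State)} end}
```

The handler is first called with the current fault state, then once for every later transition. The watching thread terminates by itself when the stream is closed.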

11.3.2 Kill and Break

Entity failure can be provoked explicitly by the programmer. It can apply either globally, or locally. These operations can be useful in order to guarantee that an entity will no longer be used.

kill

{DP.kill X}

Eventually make entity X fail permanently, if possible. This operation is asynchronous, and is not guaranteed to proceed. Its success is confirmed once the fault state permFail appears on the entity's fault stream.

break

{DP.break X}

This operation makes the entity X fail on the current site, if possible. The operation proceeds immediately, since only the current site is involved. It has no effect on other sites.
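
Since kill is asynchronous, confirmation has to be read from the fault stream. The following sketch shows one way to do this; KillConfirmed and Scan are hypothetical names introduced for this example.

```oz
declare
% KillConfirmed is a hypothetical helper, not part of DP.
fun {KillConfirmed X}
   fun {Scan S}
      case S
      of permFail|_ then true
      [] _|T then {Scan T}
      [] nil then false     % stream closed before permFail was observed
      end
   end
   FS
in
   FS = {DP.getFaultStream X}   % take the stream before requesting the kill
   {DP.kill X}
   {Scan FS}
end
```

The call blocks until permFail is observed or the stream is closed. As kill is not guaranteed to proceed, it may block forever, so it is best run in a separate thread.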

11.4 Special Case: Variables

Variables are a bit trickier to manipulate because of their transient nature. One may experience a race condition if a variable is bound and concurrently annotated: the annotation may fail. In order to avoid such a race condition, one should perform the annotation on a fresh variable, then bind the latter to the target variable. If the target variable is not annotated yet, the annotation is automatically transferred to it.

local Y in
   {DP.annotate Y reply}   % annotate Y
   X=Y                     % transfer annotation to X if possible
end

A similar rule applies for the fault stream of a variable. If a variable is bound to another variable, their fault streams are merged: their tails are bound to each other, possibly with an intermediate fault state if they have different fault states.

local Y in
   FS={DP.getFaultStream Y}   % take fault stream of Y
   X=Y                        % FS is now the fault stream of X, too
end

Note that if the variable is bound to a value, its fault stream is closed.

11.5 Limitations

If an entity X has fault state localFail on a site S, then no operation on X will ever succeed on S. This property is guaranteed, but only as long as X remains in the memory of S. Indeed, this state has no effect on other sites, and by default only S knows that fault state for X. If X is removed by the garbage collector of S, then its fault state on S vanishes. In that case, if X is later reintroduced to S, it might be considered to be in state ok again.
Since version 1.4.0, the fault state tempFail is triggered based on an adaptive timeout. The system monitors its site connections, and determines an average round-trip time and deviation for each connection. Because the mechanism is adaptive, it requires some time to find appropriate values. Therefore the failure detection can sometimes appear too sensitive, and sometimes a bit sloppy.