Automatically assigned DDC number:
Manually assigned DDC number: 00435
Title: Replication For Efficiency And Fault Tolerance In A DSM System
Author:
Subject: Anne-Marie Kermarrec Replication For Efficiency And Fault Tolerance In A DSM System
Description: Distributed Shared Memory (DSM) systems implemented on a network of workstations (NOW) have become a convenient alternative to shared memory architectures for executing long-running parallel applications. However, such architectures are prone to failures. This paper presents the design and implementation of a recoverable DSM (RDSM) based on a backward error recovery (BER) mechanism. The RDSM design focuses on exploiting data replication for both fault tolerance and efficiency. The RDSM has been implemented on a NOW, and a performance evaluation shows the benefits of exploiting both types of replication to design an efficient, scalable and low-cost recoverable DSM. Key Words: Distributed Shared Memory, Replication, Fault Tolerance, Network of Workstations. 1 INTRODUCTION Networks of workstations (NOW) are an attractive and much cheaper alternative [1] to shared memory parallel architectures for executing long-running parallel applications. A DSM [2] implemented o...
Contributor: The Pennsylvania State University CiteSeer Archives
Publisher: unknown
Date: 1998-04-03
Pubyear: unknown
Format: ps
Identifier: http://citeseer.ist.psu.edu/140391.html
Source: http://www.irisa.fr/EXTERNE/projet/solidor/members/../doc/ps97/pdcs.ps.gz
Language: en
Rights: unrestricted
<?xml version="1.0" encoding="UTF-8"?>
<references_metadata>
<rec ID="SELF" Type="SELF" CiteSeer_Book="SELF" CiteSeer_Volume="SELF" Title="Replication For Efficiency And Fault Tolerance In A DSM System" />
</references_metadata>