Performability Modeling And Analysis Of Fault Tolerance Support In Communication Protocols
Technical documentation   Open access

Samian Kaur
Rutgers University
2000
DOI: https://doi.org/10.7282/T39027DX

Abstract

Much research has assessed the performance of messaging systems, but performance metrics alone often cannot fully characterize such a system. For an emerging class of large-scale distributed servers, robustness is at least as important as performance. Three factors make protocol robustness critical: (i) these servers have very high availability requirements (e.g., minutes of downtime per year), implying that even occasional message loss must not be catastrophic; (ii) intra-server communication depends on external client service demands, making it extremely difficult to exert enough control over the system "by design" to avoid message loss; and (iii) many commodity LANs do not implement sufficient hardware flow control to always prevent loss inside the network under arbitrarily adverse communication patterns. Most current paradigms of reliable communication provide either strong consistency semantics with high overhead (e.g., transactional RPC) or reliability with indeterminate failure states using retransmissions (e.g., TCP/IP). This work aims to build a new messaging layer that provides additional recovery states for applications, allowing designers to reason about the cause of an error and to build customized recovery mechanisms. We present the design and implementation of a high-performance Active Message (AM) layer over the Virtual Interface Architecture (VIA) library as such a messaging infrastructure. Its performance is evaluated to ensure that the additional recovery states are achieved at a reasonable overhead. We then present a queuing model for analyzing and evaluating the robustness of this messaging protocol by computing its performance as a function of dependability in the presence of component and overall failures.
PDF: dcs-tr-426 (219.58 kB)
Version of Record (VoR) Open Access