Abstract:
Software and hardware reliability problems in computer systems have different
characteristics. With these differences in view, this paper discusses the
latest developments in fault tolerance technology and analyzes the fault
tolerance methods used in computer systems, including traditional redundancy
design, error rollback recovery mechanisms, and the generalized fault
tolerance design methods that are currently the subject of much research. It
examines the shortcomings of some existing fault tolerance methods with
respect to response delay, fault tolerance cost, precise quantification,
heterogeneous synchronization, and reliability modeling, identifies the key
problems that remain to be solved, and summarizes how these methods can be
further improved and better applied.