Bibliography
[1] “PCI Express® Base Specification Revision 6.0.” [Online]. Available: pcisig.com/pci-express-6.0-specification.
[2] “Compute Express Link® (CXL) Specification Revision 3.0.” [Online]. Available: www.computeexpresslink.org/download-the-specification.
[3] A. Avizienis, J.-C. Laprie, B. Randell, and C. Landwehr, “Basic Concepts and Taxonomy of Dependable and Secure Computing,” IEEE Trans. Dependable Secur. Comput., vol. 1, no. 1, pp. 11–33, Jan. 2004, doi: 10.1109/TDSC.2004.2.
[4] M. Snir et al., “Addressing Failures in Exascale Computing,” Int. J. High Perform. Comput. Appl., vol. 28, no. 2, pp. 129–173, May 2014, doi: 10.1177/1094342014522573.
[5] Y. Kim et al., “Flipping Bits in Memory without Accessing Them: An Experimental Study of DRAM Disturbance Errors,” in Proceedings of the 41st Annual International Symposium on Computer Architecture, 2014, pp. 361–372.
[6] P. Radojkovic, “Towards Resilient EU HPC Systems: A Blueprint,” in Proceedings of the 16th ACM International Conference on Computing Frontiers, New York, NY, USA, 2019, p. 339, doi: 10.1145/3310273.3323434.
[7] F. Cappello, A. Geist, W. Gropp, S. Kale, B. Kramer, and M. Snir, “Toward Exascale Resilience: 2014 Update,” Supercomput. Front. Innov.: Int. J., vol. 1, no. 1, pp. 5–28, Apr. 2014, doi: 10.14529/jsfi140101.
[8] B. Schroeder, E. Pinheiro, and W.-D. Weber, “DRAM Errors in the Wild: A Large-Scale Field Study,” Commun. ACM, vol. 54, no. 2, pp. 100–107, Feb. 2011, doi: 10.1145/1897816.1897844.
[9] V. Sridharan and D. Liberty, “A Study of DRAM Failures in the Field,” in International Conference on High Performance Computing, Networking, Storage and Analysis (SC), 2012, pp. 76:1–76:11.
[10] D. Zivanovic et al., “DRAM Errors in the Field: A Statistical Approach,” 2019.
[11] A. A. Hwang, I. A. Stefanovici, and B. Schroeder, “Cosmic Rays Don’t Strike Twice: Understanding the Nature of DRAM Errors and the Implications for System Design,” 2012.
[12] J. Meza, Q. Wu, S. Kumar, and O. Mutlu, “Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field,” in IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2015, pp. 415–426.
[13] D. Tang, P. Carruthers, Z. Totari, and M. W. Shapiro, “Assessment of the Effect of Memory Page Retirement on System RAS Against Hardware Faults,” 2006.
[14] X. Du et al., “Fault-Aware Prediction-Guided Page Offlining for Uncorrectable Memory Error Prevention,” 2021.
[15] C. Di Martino, Z. Kalbarczyk, R. K. Iyer, F. Baccanico, J. Fullop, and W. Kramer, “Lessons Learned from the Analysis of System Failures at Petascale: The Case of Blue Waters,” in International Conference on Dependable Systems and Networks (DSN), 2014, pp. 610–621.
[16] X. Du, C. Li, S. Zhou, M. Ye, and J. Li, “Predicting Uncorrectable Memory Errors for Proactive Replacement: An Empirical Study on Large-Scale Field Data,” 2020.
[17] “RISC-V Instruction Set Manual, Volume II: Privileged Architecture.” [Online]. Available: github.com/riscv/riscv-isa-manual.
[18] “RISC-V Instruction Set Manual, Volume I: Unprivileged ISA.” [Online]. Available: github.com/riscv/riscv-isa-manual.