Please use this identifier to cite or link to this item: http://localhost/handle/Hannan/630275
Title: Using Low Cost Erasure and Error Correction Schemes to Improve Reliability of Commodity DRAM Systems
Authors: Hsing-Min Chen;Supreet Jeloka;Akhil Arunkumar;David Blaauw;Carole-Jean Wu;Trevor Mudge;Chaitali Chakrabarti
subject: DRAM Memory system|error control coding (ECC)|erasure and error correction|chipkill-correct|reliability
Year: 2016
Publisher: IEEE
Abstract: Most server-grade systems provide Chipkill-Correct error protection at the expense of power and performance. In this paper we present a low overhead solution to improving the reliability of commodity DRAM systems with no change in the existing memory architecture. Specifically, we propose five erasure and error correction (E-ECC) schemes that provide at least Chipkill-Correct protection for x4 (Schemes 1, 2 and 3), x8 (Scheme 4) and x16 (Scheme 5) DRAM systems. All schemes have superior error correction performance due to the use of strong symbol-based codes. Synthesis results in a 28 nm node show that the decoding latency of these codes is negligible compared to the DRAM access latency. In addition, we make use of erasure codes to extend the lifetime of the DRAM systems. Specifically, once a chip is marked faulty due to persistent errors, all E-ECC schemes correct erasures due to that faulty chip and also correct an additional random error in a second chip. Evaluation with SPEC2006 workloads shows that, compared to x4 Chipkill-Correct schemes, Scheme 5 has the highest IPC improvement (mean of 7 percent) and Scheme 4 has the largest power reduction (mean of 18 percent) and the largest increase in energy efficiency (mean of 25 percent).
Description: 
URI: http://localhost/handle/Hannan/181814
http://localhost/handle/Hannan/630275
ISSN: 0018-9340
Volume: 65
Issue: 12
Appears in Collections: 2016

Files in This Item:
File: 7447716.pdf
Size: 1.97 MB
Format: Adobe PDF