Please use this identifier to cite or link to this item: http://localhost/handle/Hannan/581837
Title: VLSI Implementation of Fully Parallel LTE Turbo Decoders
Authors: An Li;Luping Xiang;Taihai Chen;Robert G. Maunder;Bashir M. Al-Hashimi;Lajos Hanzo
Subject: LTE turbo code|fully-parallel turbo decoder|VLSI design
Year: 2016
Publisher: IEEE
Abstract: Turbo codes facilitate near-capacity transmission throughputs by achieving a reliable iterative forward error correction. However, owing to the serial data dependence imposed by the logarithmic Bahl-Cocke-Jelinek-Raviv algorithm, the limited processing throughputs of the conventional turbo decoder implementations impose a severe bottleneck upon the overall throughputs of real-time communication schemes. Motivated by this, we recently proposed a floating-point fully parallel turbo decoder (FPTD) algorithm, which eliminates the serial data dependence, allowing parallel processing and hence significantly reducing the number of clock cycles required. In this paper, we conceive a technique for reducing the critical datapath of the FPTD, and we propose a novel fixed-point version as well as its very large scale integration (VLSI) implementation. We also propose a novel technique, which allows the FPTD to also decode shorter frames employing compatible interleaver patterns. We strike beneficial tradeoffs amongst the latency, core area, and energy consumption by investigating the minimum bit widths and techniques for message log-likelihood ratio scaling and state metric normalization. Accordingly, the design flow and design tradeoffs considered in this paper are also applicable to other fixed-point implementations of error correction decoders. We demonstrate that upon using Taiwan Semiconductor Manufacturing Company (TSMC) 65-nm low-power technology for decoding the longest long-term evolution frames (6144 b) received over an additive white Gaussian noise channel having E_b/N_0 = 1 dB, the proposed fixed-point FPTD VLSI achieves a processing throughput of 21.9 Gb/s and a processing latency of 0.28 μs. These results are 17.1 times superior to those of the state-of-the-art benchmarker. Furthermore, the proposed fixed-point FPTD VLSI achieves an energy consumption of 2.69 μJ/frame and a normalized core area of 5 mm²/Gb/s, which are 34% and 23% lower than those of the benchmarker, respectively.
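The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below is not taken from the paper: it assumes that processing throughput equals frame length divided by processing latency, and the function names are hypothetical. Under that assumption, the quoted 6144-bit frame and 0.28 μs latency agree with the quoted 21.9 Gb/s, and the quoted 2.69 μJ/frame corresponds to a little under half a nanojoule per bit.

```python
# Consistency check on the figures quoted in the abstract.
# Assumption (not stated in the record): throughput = frame bits / latency.

FRAME_BITS = 6144             # longest LTE frame, from the abstract
LATENCY_S = 0.28e-6           # processing latency, 0.28 microseconds
ENERGY_PER_FRAME_J = 2.69e-6  # energy consumption, 2.69 microjoules per frame


def throughput_gbps(frame_bits: int, latency_s: float) -> float:
    """Throughput in Gb/s, assuming one frame is decoded per latency period."""
    return frame_bits / latency_s / 1e9


def energy_per_bit_nj(energy_per_frame_j: float, frame_bits: int) -> float:
    """Energy per decoded bit in nanojoules."""
    return energy_per_frame_j / frame_bits * 1e9


if __name__ == "__main__":
    print(f"throughput: {throughput_gbps(FRAME_BITS, LATENCY_S):.2f} Gb/s")
    print(f"energy per bit: {energy_per_bit_nj(ENERGY_PER_FRAME_J, FRAME_BITS):.3f} nJ")
```

The small gap between the computed throughput and the quoted 21.9 Gb/s is within the rounding of the 0.28 μs latency figure.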
Description: 
URI: http://localhost/handle/Hannan/182293
http://localhost/handle/Hannan/581837
ISSN: 2169-3536
Volume: 4
Appears in Collections: 2016

Files in This Item:
File: 7378273.pdf
Size: 11.53 MB
Format: Adobe PDF