Please use this identifier to cite or link to this item: http://localhost/handle/Hannan/717032
Title: Joint Estimation of Reverberation Time and Early-To-Late Reverberation Ratio From Single-Channel Speech Signals
Other Titles: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Authors: Feifei Xiong|Stefan Goetze|Birger Kollmeier|Bernd T. Meyer
subject: multi-task learning|temporal modulation features|reverberation time|early-to-late reverberation ratio|joint estimation
Year: -1-Uns- -1
Abstract: The reverberation time (RT) and the early-to-late reverberation ratio (ELR) are two key parameters commonly used to characterize acoustic room environments. In contrast to conventional blind estimation methods that process the two parameters separately, we propose a joint estimation model, referred to as the joint room parameter estimator (jROPE), that predicts the RT and the ELR simultaneously from single-channel speech signals using either full-band or sub-band frequency data. An artificial neural network is employed to learn the mapping from acoustic observations to the RT and the ELR classes. Auditory-inspired acoustic features obtained by temporal modulation filtering of the speech time-frequency representations are used as input for the neural network. Based on an in-depth analysis of the dependency between the RT and the ELR, a two-dimensional (RT, ELR) distribution with constrained boundaries is derived, which is then exploited to evaluate four different configurations for jROPE. Experimental results show that, in comparison to the single-task ROPE system, which estimates the RT or the ELR individually, jROPE provides improved results for both tasks in various reverberant and (diffuse) noisy environments. Among the four proposed joint types, the one incorporating multi-task learning with shared input and hidden layers yields the best estimation accuracies on average. Under extreme reverberant conditions, with RTs and ELRs lying beyond the derived (RT, ELR) distribution, the type that treats RT and ELR as a joint parameter performs particularly robustly. Among the state-of-the-art algorithms tested in the Acoustic Characterization of Environments (ACE) challenge, jROPE achieves results comparable to the best for all individual tasks (RT and ELR estimation from full-band and sub-band signals).
URI: http://localhost/handle/Hannan/717032
ISSN: 2329-9290
Appears in Collections:New Ieee 2019

Files in This Item:
File: 08506462.pdf
Size: 2.27 MB
Format: Adobe PDF