Please use this identifier to cite or link to this item: http://localhost/handle/Hannan/232385
Title: Detection of Double Compressed AMR Audio Using Stacked Autoencoder
Authors: Da Luo;Rui Yang;Bin Li;Jiwu Huang
Year: 2017
Publisher: IEEE
Abstract: The adaptive multi-rate (AMR) audio codec adopted by many portable recording devices is widely used in speech compression. The use of AMR speech recordings as evidence in court is growing. Nowadays, it is easy to tamper with digital speech recordings, which makes audio forensics increasingly important. The detection of double compressed audio is one of the key issues in audio forensics. In this paper, we propose a framework for detecting double compressed AMR audio based on the stacked autoencoder (SAE) network and the universal background model-Gaussian mixture model (UBM-GMM). Instead of hand-crafted features, we used the SAE to learn the optimal features automatically from the audio waveforms. Audio frames are used as network input and the last hidden layer's output constitutes the features of a single frame. For an audio clip with many frames, the features of all the frames are aggregated and classified by UBM-GMM. Experimental results show that our method is effective in distinguishing single/double compressed AMR audio and outperforms the existing methods by achieving a detection accuracy of 98% on the TIMIT database. Exhaustive experiments demonstrate the effectiveness and robustness of the proposed method.
URI: http://localhost/handle/Hannan/232385
Volume: 12
Issue: 2
More Information: pp. 432-444
Appears in Collections: 2017

Files in This Item:
File          Size      Format
7707437.pdf   3.37 MB   Adobe PDF