Feature extraction design for embedded neural network urban sound classifier
Main Author:
Format: Thesis
Language: English
Published: 2021
Subjects:
Online Access: http://eprints.utm.my/id/eprint/99482/1/LimChinShenMKE2021.pdf
Summary: Urban sound research has become a hot topic in recent years for city-growth observation and surveillance applications through noise-source identification. However, sound identification is challenging because multiple sound sources are blended together, and new, as-yet-unclassified sounds appear as city regions develop. In recent audio-classification work, sound features are extracted from an image of the audio, obtained from its time-frequency representation, otherwise known as a spectrogram.

This project aims to design a noise-robust neural network urban sound classifier implemented on an embedded system. Two feature extractors that convert audio to images are explored and compared to determine which produces better features for urban sound. The Mel-Frequency Cepstral Coefficient (MFCC) is commonly used in sound classifiers with good results, while the Gammatone Frequency Cepstral Coefficient (GFCC) is an emerging feature extractor said to be better at handling noisy data. UrbanSound8K, which contains 8732 labelled sounds classified into eight classes, is used as the dataset. Noise at different decibel levels was added to the dataset to simulate the actual urban sound scenario and to explore the noise robustness of the two feature extractors.

To classify urban sound, the audio is converted into an image; therefore, a Convolutional Neural Network (CNN) model is employed, as it is one of the best machine-learning models for images. Since the design focuses on embedded-system applications, the lightweight CNN model MobileNetV2 is used. The feature extractor and the neural network model are developed in the Python language with the TensorFlow library.

The experimental results show that MFCC outperforms GFCC in classification accuracy by an average of 14.34% across all SNR levels. MFCC is also more robust to noise in the dataset, with drops in accuracy of 2.75% and 2.87% at 30 dB and 10 dB noise respectively, relative to the noiseless baseline, whereas GFCC drops by 6.18% and 3.87% at 30 dB and 10 dB respectively.
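The noise-injection step described in the abstract (corrupting the clean UrbanSound8K clips with noise at chosen SNR levels such as 30 dB and 10 dB) can be sketched as follows. This is a minimal illustration in plain NumPy, not the thesis's exact implementation; the function name `add_noise_at_snr` and the choice of white Gaussian noise are assumptions:

```python
import numpy as np

def add_noise_at_snr(signal: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Mix white Gaussian noise into `signal` at the requested SNR (in dB).

    Hypothetical helper for illustration; the thesis may use recorded
    urban noise rather than synthetic white noise.
    """
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(signal ** 2)
    noise = rng.standard_normal(signal.shape)
    # Scale the noise so that 10*log10(signal_power / noise_power) == snr_db
    noise *= np.sqrt(signal_power / (10 ** (snr_db / 10) * np.mean(noise ** 2)))
    return signal + noise

# Example: a 1-second 440 Hz tone at 16 kHz, corrupted at 30 dB and 10 dB SNR
t = np.linspace(0, 1, 16000, endpoint=False)
clean = np.sin(2 * np.pi * 440 * t)
noisy_30 = add_noise_at_snr(clean, 30.0)
noisy_10 = add_noise_at_snr(clean, 10.0)
```

The noisy clips would then be passed through the MFCC or GFCC extractor to produce the spectrogram-like images fed to MobileNetV2.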