A DNN-Based Accurate Masking Using Significant Feature Sets

dc.contributor.author: Shoba Sivapatham
dc.contributor.author: Pankaj Goel
dc.contributor.author: Srikanth Burra
dc.contributor.author: Pitikhate Sooraksa
dc.contributor.author: Asutosh Kar
dc.date.accessioned: 2026-05-08T19:23:05Z
dc.date.issued: 2022-11-23
dc.description.abstract: Monaural speech separation has long remained a very challenging problem. It can be addressed using a supervised learning approach that uses features of the noisy input to predict an accurate time-frequency mask. Effective acoustic-phonetic features can help in accurate mask prediction at low signal-to-noise ratios (SNRs). Individual features capture specific attributes of the audio signal; therefore, it is essential to employ a set of features. This work examines different combinations of monaural features as input, with the ideal ratio mask as the training target for the DNN model. Feature combination sets are constructed by examining single features and then combining the most relevant ones. The results are evaluated for different feature combinations under non-stationary noises at low SNR levels. Feature performance is evaluated using intelligibility and quality measures. A combination of two features is considered the best feature combination, as it indicates a significant increase in speech intelligibility compared with individual features and with combinations of more than two features.
dc.identifier.doi: 10.1109/ictke55848.2022.9982802
dc.identifier.uri: https://dspace.kmitl.ac.th/handle/123456789/18883
dc.subject: Speech and Audio Processing
dc.subject: Music and Audio Processing
dc.subject: Video Analysis and Summarization
dc.title: A DNN-Based Accurate Masking Using Significant Feature Sets
dc.type: Article
