Inverse filter

Signal processing is an electrical engineering subfield that focuses on analysing, modifying, and synthesizing signals such as sound, images, and scientific measurements. Within it, an inverse filter is defined relative to another filter: given a filter g, an inverse filter h is one such that applying g and then h to a signal returns the original signal. Software or electronic inverse filters are often used to compensate for the effect of unwanted environmental filtering of signals.
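As a minimal sketch of this definition, the Python below pairs a hypothetical first-order filter g with its exact inverse h and checks that applying g and then h returns the input. The coefficient a = 0.5, the function names, and the test signal are illustrative assumptions, not taken from any particular system.

```python
def apply_g(x, a=0.5):
    """Forward filter g: y[n] = x[n] + a * x[n-1] (a single zero at z = -a)."""
    y = []
    prev = 0.0
    for sample in x:
        y.append(sample + a * prev)
        prev = sample
    return y


def apply_h(y, a=0.5):
    """Inverse filter h: x[n] = y[n] - a * x[n-1] (a single pole at z = -a)."""
    x = []
    prev = 0.0
    for sample in y:
        cur = sample - a * prev
        x.append(cur)
        prev = cur
    return x


signal = [1.0, -2.0, 3.0, 0.5, 0.0, -1.5]   # arbitrary test input
filtered = apply_g(signal)                   # g alone changes the signal
recovered = apply_h(filtered)                # g then h should undo the change

max_error = max(abs(s - r) for s, r in zip(signal, recovered))
```

The inverse is well behaved here because |a| < 1. With |a| > 1 the recursion in apply_h would amplify any error in its input from sample to sample, which is one reason an exact inverse is not always usable in practice.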

In speech science

In all proposed models for the production of human speech, an important variable is the waveform of the airflow, or volume velocity, at the glottis. The glottal volume velocity waveform provides the link between movements of the vocal folds and the acoustical results of such movements, in that the glottis acts approximately as a source of volume velocity. That is, the impedance of the glottis is usually much higher than that of the vocal tract, and so glottal airflow is controlled mostly (but not entirely) by glottal area and subglottal pressure, and not by vocal-tract acoustics. This view of voiced speech production is often referred to as the source-filter model.

A technique for obtaining an estimate of the glottal volume velocity waveform during voiced speech is the “inverse-filtering” of one of two signals: the radiated acoustic waveform, as measured by a microphone having a good low-frequency response, or the volume velocity at the mouth, as measured by a pneumotachograph. The pneumotachograph must have a linear response, cause little speech distortion, and have a response time of under approximately 1/2 ms. A pneumotachograph having these properties was first described by Rothenberg and termed by him a circumferentially vented mask or CV mask.

As practiced, inverse-filtering is usually limited to non-nasalized or slightly nasalized vowels. The recorded waveform is passed through an “inverse-filter” having a transfer characteristic that is the inverse of the transfer characteristic of the supraglottal vocal tract configuration at that moment. The transfer characteristic of the supraglottal vocal tract is defined with the input to the vocal tract considered to be the volume velocity at the glottis. For non-nasalized vowels, assuming a high-impedance volume velocity source at the glottis, the transfer function of the vocal tract below about 3000 Hz contains a number of pairs of complex-conjugate poles, more commonly referred to as resonances or formants. Thus, an inverse-filter would have a pair of complex-conjugate zeroes, more commonly referred to as an anti-resonance, for every vocal tract formant in the frequency range of interest.
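The pole-zero pairing can be sketched in discrete time as follows. This is an illustration under stated assumptions, not a description of any published implementation: the sample rate, the three formant frequencies and bandwidths, the toy glottal pulse, and all function names are hypothetical. Each formant is modelled as a two-pole resonator whose pole radius and angle follow from the common mapping r = exp(-πB/fs), θ = 2πF/fs, and each anti-resonance is the matching two-zero filter.

```python
import math


def formant_coeffs(freq_hz, bandwidth_hz, fs):
    """Feedback coefficients for one complex-conjugate pole pair."""
    r = math.exp(-math.pi * bandwidth_hz / fs)   # pole radius from bandwidth
    theta = 2.0 * math.pi * freq_hz / fs         # pole angle from frequency
    return 2.0 * r * math.cos(theta), -(r * r)


def resonator(x, c1, c2):
    """Formant (pole pair): y[n] = x[n] + c1*y[n-1] + c2*y[n-2]."""
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        cur = sample + c1 * y1 + c2 * y2
        y.append(cur)
        y1, y2 = cur, y1
    return y


def anti_resonator(y, c1, c2):
    """Anti-resonance (zero pair): x[n] = y[n] - c1*y[n-1] - c2*y[n-2]."""
    x, y1, y2 = [], 0.0, 0.0
    for sample in y:
        x.append(sample - c1 * y1 - c2 * y2)
        y1, y2 = sample, y1
    return x


FS = 10000  # sample rate in Hz (assumed)
FORMANTS = [(700.0, 80.0), (1200.0, 90.0), (2600.0, 120.0)]  # (F, B) in Hz, illustrative

# Toy glottal flow: two half-sine pulses, each followed by a closed phase.
pulse = [math.sin(math.pi * n / 40) for n in range(40)] + [0.0] * 60
glottal_flow = pulse * 2

# "Vocal tract": a cascade of one resonator per formant.
speech = glottal_flow
for f, bw in FORMANTS:
    speech = resonator(speech, *formant_coeffs(f, bw, FS))

# Inverse filter: one anti-resonance per formant.
estimate = speech
for f, bw in FORMANTS:
    estimate = anti_resonator(estimate, *formant_coeffs(f, bw, FS))

max_error = max(abs(a - b) for a, b in zip(glottal_flow, estimate))
```

In this sketch the inverse filter reuses the exact formant values that generated the signal, so the recovery is essentially perfect. With recorded speech the formant frequencies and bandwidths are not known in advance and have to be estimated, and the quality of the glottal flow estimate depends on that step.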

If the input is from a microphone, and not a CV mask or its equivalent, the inverse filter also must have a pole at zero frequency (an integration operation) to account for the radiation characteristic that connects volume velocity with acoustic pressure. Inverse filtering the output of a CV mask retains the level of zero flow, while inverse filtering a microphone signal does not.
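The loss of the zero-flow level can be illustrated with a small sketch. It assumes, as a simplification not stated in the text above, that the radiation characteristic behaves like a first difference of the mouth flow in discrete time; the integrator (a running sum, with its pole at zero frequency) then undoes it. The pulse shape, the 0.3 offset, and the function names are hypothetical.

```python
import math


def radiate(flow):
    """Crude radiation model (assumed): pressure ~ first difference of flow."""
    return [flow[n] - flow[n - 1] for n in range(1, len(flow))]


def integrate(pressure):
    """Inverse of the first difference: a running sum (pole at zero frequency)."""
    out, acc = [], 0.0
    for p in pressure:
        acc += p
        out.append(acc)
    return out


# Toy mouth flow: one half-sine pulse, with and without a constant offset.
pulse = [math.sin(math.pi * n / 20) for n in range(21)]
flow_closed = [0.0 + v for v in pulse]   # returns to zero flow
flow_leaky = [0.3 + v for v in pulse]    # same shape riding on a 0.3 offset

pressure_closed = radiate(flow_closed)
pressure_leaky = radiate(flow_leaky)

# The two "microphone" signals are numerically indistinguishable ...
mic_difference = max(abs(a - b) for a, b in zip(pressure_closed, pressure_leaky))

# ... so integration recovers the pulse shape but not the offset.
recovered = integrate(pressure_leaky)
shape_error = max(abs(r - (f - flow_leaky[0]))
                  for r, f in zip(recovered, flow_leaky[1:]))
```

Because both flows produce the same pressure signal, the integrated result is the flow relative to its starting value only. A flow signal taken directly from a CV mask needs no such integration, which is why it keeps the zero-flow level.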

Inverse filtering depends on the source-filter model and on a vocal tract filter that is a linear system; however, the source and filter need not be independent.

References

Sengupta, Nandini; Sahidullah, Md; Saha, Goutam (August 2016). "Lung sound classification using cepstral-based statistical features". Computers in Biology and Medicine. 75 (1): 118–129. doi:10.1016/j.compbiomed.2016.05.013. PMID 27286184.
Rothenberg, M. "A new inverse-filtering technique for deriving the glottal air flow waveform during voicing". Journal of the Acoustical Society of America. 53 (6): 1632–1645.
