Acoustic Analysis for Comparison and Identification of Normal and Disguised Speech of Individuals

Although forensic speaker recognition technology has developed rapidly, many problems remain to be solved. The biggest problem arises when cases involving disguised voice samples are received for examination and identification. Such voice samples of anonymous callers are frequently encountered in crimes involving kidnapping, blackmail, hoax extortion and more, where the speaker makes a deliberate effort to manipulate their natural voice in order to conceal their identity for fear of being caught. Voice disguise seriously distorts the natural vocal parameters of the speaker and thus complicates the process of identification. The sole objective of this doctoral project is to determine whether definite opinions can be rendered in cases involving disguised speech, by experimentally determining the effects of different forms of disguise on personal identification and on the speaker recognition rate for various disguise techniques, such as raised pitch, lowered pitch, increased nasality, covering the mouth, constricting the vocal tract and an obstacle in the mouth, through auditory and spectrographic analysis and comparison of the phonetic and acoustic variation between the artificial (disguised) and natural samples of an individual.


... the speaker with the intention to modify his voice". This is the biggest limitation faced by voice experts all over India [23]. This study aims to solve the problems arising in an individual's speech due to different forms of disguise and to assist experts in the examination of such challenging voice exhibits [24][25][26].
Therefore, the sole objective of this project is to determine whether definite opinions can be rendered in cases involving disguised speech, by experimentally determining the effects of different forms of disguise on personal identification and on the speaker recognition rate for various disguise techniques, such as raised pitch, lowered pitch, increased nasality, covering the mouth, constricting the vocal tract and an obstacle in the mouth, through auditory and spectrographic analysis and comparison of the phonetic and acoustic variation between the artificial (disguised) and natural samples of an individual [27,28].

Materials and Methods
This research was conducted at the Voice Division of the Directorate of Forensic Science (DFS), Gandhinagar, and the Institute of Forensic Science, Gujarat Forensic Sciences University, Gandhinagar. The study included disguised and control samples from 200 individuals of different sexes, religions and age groups, mostly from Gujarat. Of the 200 speakers, 102 were male and 98 female, aged 20 to 60 years; most speakers of both sexes were 25 to 35 years old. All voice samples were collected using a high-quality digital recorder. The disguised voice sample of each speaker was carefully collected under a distinctive condition that imposes certain variations on the acoustic and perceptual parameters of the recorded voice. In addition, three control samples (routine voice samples) were collected from each individual in order to study the degree of variation between the disguised and natural voice of a person. The disguise conditions on which we focused were:
a. Keeping a hand/cloth on the mouth
b. Variations in vocal pitch
c. Simulating anger
d. Condition of extreme cold
e. Condition of bad throat
f. Chewing pan or tobacco
g. Constriction of the vocal tract
h. Pinching the nostrils
i. Pulling the cheeks
j. Changing the accent and talking style
k. Mimicry

Materials required
1. Go Gear Philips digital recorder
2. High-quality headphones
3. Data cable
4. GoldWave software
5. Computerized Speech Lab (CSL) Model 4500
6. SIS software
7. Voice Net automatic software

Steps for sample collection
1. A transcript was prepared with contents designed to simulate a blackmailing call of approximately 2 min duration, and it was presented to each individual for collection of their voice sample.
2. The recordings were conducted in the soundproof recording room of DFS, Gandhinagar.
3. While collecting voice samples, all speakers were asked to recite the same transcript four times in the same session, i.e. once in a disguised state (of his/her choice) and three times in the control state. Therefore, a total of four samples were collected from each of the 200 speakers (800 samples in all).
4. All voice samples were collected on a Go Gear Mix Philips digital recorder at a distance of approximately 40 cm from the speaker's mouth.
5. A duly filled consent form was collected from each speaker along with their voice samples. A declaration was also provided to each speaker to ensure the secrecy and usability of their voice samples.
6. Detailed records of the name, age, sex, concerned guardian, geographical origin and educational background of each speaker were maintained along with their samples.
The subjects were asked to give one of their voice samples after deliberately modifying their original voice. Among the 200 subjects, the following disguise techniques were adopted: constriction of tract (6%), lowering of pitch (6%), pinching nostrils (9%), pulling cheeks (3%), raising pitch (10%), changing tone/accent (1%), covering mouth (34%), simulating anger (5%), state of cold (2%), mimicry (3%), some obstacle in mouth (9%), protruding lips (3%), throat infection (3%) and whispering (6%) (Figure 1).

Examination and analysis of voice samples
All the disguised and control speech samples of each individual were then analysed with different software for comparison, in order to determine the similarities and dissimilarities in their auditory and spectrographic parameters. Almost 22 acoustic parameters were compared for identification of disguised speakers, including:
• Auditory features: quality of speech sample, delivery of speech, frequently used words, pronunciation, accent, talking style, dialect used, flow of speech, degree of phonation, nature & degree of pauses, nasality and speech time (S/T) rate.
The results of the analysis were recorded and statistically evaluated to frame the final conclusions. The statistical tests applied were:
1. Pearson correlation, to measure the association between ideal (control) and disguised speech parameters.
2. Chi-square test, to assess the dependency between a set of observed values (disguised speech parameters) and those expected (control speech parameters).
3. Z-test, to measure the amount and nature of variation between disguised and control voice samples of individuals.

Preparing Files for Analysis
Each recording device has its own format for recording voice files. Files in an inappropriate format are not suitable for spectrographic analysis, so it is recommended to convert them into the accepted format. Each file was converted to the following format with GoldWave software and saved:
• Sampling rate: 11025 Hz
• Bit rate: 172 Kbps
• Bit depth: 16 bits
• Channel: Mono
• File format: Wave
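The bit rate in this list is not an independent setting: for uncompressed PCM it equals sampling rate × bit depth × channels, i.e. 11025 × 16 × 1 = 176,400 bit/s, which is about 172 kbit/s when 1 kbit is taken as 1024 bits. The Python sketch below computes this and uses the standard-library `wave` module to check whether a converted file matches the target format. It is a minimal illustration under that assumption, not the study's procedure (conversion was done in GoldWave); the names `TARGET`, `bit_rate_kbps` and `check_format` are hypothetical.

```python
import io
import wave

# Target format from the list above (hypothetical constant; sample width is in bytes).
TARGET = {"framerate": 11025, "sampwidth": 2, "nchannels": 1}


def bit_rate_kbps(framerate: int, bits: int, channels: int, kilo: int = 1024) -> float:
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return framerate * bits * channels / kilo


def check_format(fileobj) -> dict:
    """Read a WAV header and report its parameters and whether they match TARGET."""
    with wave.open(fileobj, "rb") as w:
        params = {
            "framerate": w.getframerate(),
            "sampwidth": w.getsampwidth(),
            "nchannels": w.getnchannels(),
        }
    params["matches"] = all(params[key] == value for key, value in TARGET.items())
    params["kbps"] = bit_rate_kbps(params["framerate"], 8 * params["sampwidth"], params["nchannels"])
    return params


# Demonstration: write one second of silence in the target format to memory, then check it.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(11025)
    w.writeframes(b"\x00\x00" * 11025)
buffer.seek(0)
print(check_format(buffer))
```

Checking the header rather than trusting the file extension matters here, because a file saved as `.wav` by a recorder may still use a different sampling rate or stereo channels.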

Aural parameters in disguised and control/normal speech
During the examination, analysis and comparison of the disguised speech samples (n=200) with their respective controls (n=200), it was observed that disguising the voice degrades the aural parameters relative to normal voice conditions. The auditory analysis was carried out in the presence of three expert listeners aged 25-40 years, using high-quality headphones. The disguised and control speech samples of each individual were listened to repeatedly in order to determine the degree of similarity and dissimilarity between them. The results of the analysis were recorded in a standard format (Figure 2).
Parameters such as quality of speech, delivery of speech, flow of speech, speech rate and dynamic loudness degraded to a greater extent in the disguised conditions than in their respective control samples. Degree of phonation in the disguised voice samples showed moderate variation from that in the control samples. Nasality and nature of pauses were found to be consistent with those in the control samples.
Most of the aural parameters of voice samples disguised by constricting the tract, pinching the nostrils, covering the mouth, an obstacle in the mouth, a state of cold, a state of throat infection and whispering deviated strongly from those in their respective control samples. On the other hand, the voice samples disguised by simulating anger, pulling cheeks and changing accent/tone showed high consistency and similarity in aural parameters with their respective control counterparts.
The variations in the aural parameters depended significantly on the type of speech sample (disguised or control) and were found to be independent of the sex of the speaker.

Quality of speech:
Voice quality is derived from a variety of laryngeal and supralaryngeal features running continuously through an individual's speech. Speech quality degrades markedly under voice disguise. About 61% of the disguised speech samples collected from the 200 subjects had low speech quality, compared with only 12% of the control samples (roughly one-fifth of the disguised proportion).
A strong negative correlation in speech quality was observed between the control samples and the samples disguised by constricting tract, lowering pitch, pinching nostrils, raising pitch, covering mouth, state of cold, mimicry, obstacle in mouth, protruding lips, throat infection and whispering, indicating significant variation (at alpha=0.05) between the two samples (Tables 1 and 2). On the other hand, the speech quality of voice samples disguised by pulling cheeks, simulating anger and changing accent/tone showed high consistency and a strong positive correlation with their control counterparts, indicating non-significant variation (at alpha=0.05) between the two samples (Tables 1 and 2).
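The text does not state exactly which values were correlated. Because the chi-square tests in Tables 2, 6, 10 and 12 have df = 2, one plausible reading is that each aural parameter was rated in three categories (for example high, medium and low) and that, for each disguise technique, the category counts of the disguised samples were correlated with those of the control samples. The Python sketch below computes the Pearson coefficient under that assumption; the function name `pearson_r` and all counts are hypothetical, not data from the study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient of two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one sample is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical counts of samples rated (high, medium, low) for speech quality
control = [8, 3, 1]
disguised_degraded = [1, 3, 8]    # ratings shift towards "low"
disguised_consistent = [7, 4, 1]  # ratings stay close to the control pattern

print(round(pearson_r(control, disguised_degraded), 2))    # -0.88
print(round(pearson_r(control, disguised_consistent), 2))  # 0.97
```

In this reading, a strongly negative r means the disguised ratings concentrate in the opposite categories to the controls, which matches how the text interprets a strong negative correlation as significant variation.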

Delivery of speech:
Because voice imitation involves manipulating the articulators to produce a sound closer to the model voice, a majority (about 65%) of the disguised speech samples collected from the 200 subjects showed low speech delivery, while only 25% of the control voice samples did (a proportion 2.6 times lower).
A strong negative correlation in speech delivery was observed between the control samples and the samples disguised by constricting tract, pinching nostrils, raising pitch, covering mouth, state of cold, mimicry, obstacle in mouth, throat infection and whispering, indicating significant variation (at alpha=0.05) between the two samples (Tables 3 and 4).
On the other hand, the speech delivery of voice samples disguised by lowering pitch, pulling cheeks, simulating anger, protruding lips and changing accent/tone showed high consistency and a strong positive correlation with their control counterparts, indicating non-significant variation (at alpha=0.05) between the two samples (Tables 3 and 4).

Degree of phonation
Of all the disguised speech samples of both males and females, 25% showed a low degree of phonation, whereas none of the control samples did. This is because phonation occurs when the potential energy of the compressed airstream below the larynx is converted into the kinetic energy of egressive airflow, producing audible sound. Any constriction or modification of the laryngeal passage (as in voice disguise) makes the airflow turbulent, causing audible friction and degrading the degree of phonation of the voice.
A strong negative correlation in degree of phonation was observed between the control samples and the samples disguised by constricting tract, pinching nostrils, state of cold, obstacle in mouth, protruding lips, throat infection and whispering, indicating significant variation (at alpha=0.05) between the two samples (Tables 5 and 6).
On the other hand, the degree of phonation of voice samples disguised by lowering pitch, pulling cheeks, raising pitch, simulating anger, protruding lips, mimicry and changing accent/tone showed high consistency and a strong positive correlation with their control counterparts, indicating non-significant variation (at alpha=0.05) between the two samples (Tables 5 and 6).

Flow of speech:
Flow of speech degrades strongly with voice disguise. About 62% of the disguised speech samples collected from the 200 subjects, both male and female, showed a degraded, low flow of speech because of the unnatural manipulation of the vocal tract. By contrast, only 11% of the control voice samples of males and females had a low flow of speech.
A strong negative correlation in flow of speech was observed between the control samples and the samples disguised by constricting tract, pinching nostrils, raising pitch, changing accent/tone, covering mouth, state of cold, mimicry, obstacle in mouth, protruding lips, throat infection and whispering, indicating significant variation (at alpha=0.05) between the two samples (Tables 7 and 8).
On the other hand, the flow of speech of voice samples disguised by lowering pitch, pulling cheeks and simulating anger showed high consistency and a strong positive correlation with their control counterparts, indicating non-significant variation (at alpha=0.05) between the two samples (Tables 7 and 8).

Speaking rate:
Speech rate degrades strongly with voice disguise. About 43% of the disguised speech samples collected from the 200 subjects, both male and female, showed high variation in speech rate from their control counterparts, because voice disguise is a conscious effort in which the impersonator must slow down at each point to perfect the imitated voice. Sometimes a mimicry artist has to impersonate a person whose speaking rate is higher than his own normal capacity.
A weak correlation in speech rate was observed between the control samples and the samples disguised by constricting tract, lowering pitch, pinching nostrils, pulling cheeks, raising pitch, changing accent/tone, covering mouth, state of cold, mimicry, obstacle in mouth, protruding lips, throat infection and whispering, indicating significant variation (at alpha=0.05) between the two samples (Tables 9 and 10).
On the other hand, a moderate correlation was observed between the speech rate of voice samples disguised by simulating anger and their control counterparts, indicating non-significant variation (at alpha=0.05) between the two samples (Tables 9 and 10).

Nasality:
Nasality was found in 12% of the disguised samples of males and females, and the remaining 88% were non-nasal. All of the control voice samples of both males and females were non-nasal. The chi-square value for nasality across all disguised and control voice samples was 23.45 (p<0.0001; df=1), which is significant at alpha=0.05; the null hypothesis is therefore rejected in favour of the alternative hypothesis that the variation in nasality depends significantly on the type of speech sample.
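The nasality statistic can be checked from the percentages alone. Assuming the test was run on a 2 × 2 contingency table (rows: disguised and control, 200 samples each; columns: nasal and non-nasal), 12% nasal disguised samples gives the table [[24, 176], [0, 200]]. The Python sketch below (the function name `chi_square_2x2` is hypothetical) reproduces the reported 23.45 only when Yates' continuity correction is applied, which suggests, though the text does not say so, that the correction was used.

```python
from typing import List


def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = True) -> float:
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    With yates=True, Yates' continuity correction is applied to each cell.
    """
    table: List[List[int]] = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            deviation = abs(table[i][j] - expected)
            if yates:
                deviation = max(deviation - 0.5, 0.0)
            chi2 += deviation ** 2 / expected
    return chi2


# Rows: disguised, control (200 samples each). Columns: nasal, non-nasal.
# 12% of 200 disguised samples were nasal; no control sample was.
print(round(chi_square_2x2(24, 176, 0, 200), 2))               # 23.45
print(round(chi_square_2x2(24, 176, 0, 200, yates=False), 2))  # 25.53
```

Without the correction the same table gives 25.53. With 10 abnormal pauses among the 200 disguised samples (5%) and none among the controls, the same function returns 8.31, the value reported for nature of pauses.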

Dynamic loudness:
About 40% of the disguised speech samples collected from the 200 subjects, both male and female, showed low loudness. Loudness varies under different disguise conditions depending on how much kinetic energy the impersonator delivers to the egressive speech sound. On the other hand, only 10% of the control voice samples of male and female subjects had low loudness.

Volume 4 | Issue 4
Annex Publishers | www.annexpublishers.com

A weak correlation in dynamic loudness was observed between the control samples and the samples disguised by constricting tract, pinching nostrils, pulling cheeks, raising pitch, obstacle in mouth, simulating anger, state of cold, mimicry, protruding lips, throat infection and whispering, indicating significant variation (at alpha=0.05) between the two samples (Tables 11 and 12).
On the other hand, a strong positive correlation was observed between the dynamic loudness of voice samples disguised by lowering pitch and their control counterparts, indicating non-significant variation (at alpha=0.05) between the two samples (Tables 11 and 12).

Nature of pauses:
95% of the disguised voice samples and 100% of the control voice samples showed normal pauses. Only the voice samples disguised by a state of cold or throat infection showed abnormal pauses (5%).
The chi-square value for nature of pauses across all disguised and control voice samples was 8.31 (p=0.0039; df=1), which is significant at alpha=0.05; the null hypothesis is therefore rejected in favour of the alternative hypothesis that the variation in nature of pauses depends significantly on the type of speech sample.
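The p-values quoted with these chi-square statistics come from the upper tail of the chi-square distribution. For df = 1 and df = 2, the only cases used in this section and in Tables 2-12, the tail has a closed form, so no statistics library is needed. A minimal Python sketch follows; the function name `chi2_sf` is hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail p-value P(X > x) of the chi-square distribution for df = 1 or 2.

    df = 1: X is the square of a standard normal, so P(X > x) = erfc(sqrt(x / 2)).
    df = 2: X is exponential with mean 2, so P(X > x) = exp(-x / 2).
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("closed form implemented only for df = 1 or 2")


ALPHA = 0.05
for name, stat in [("nature of pauses", 8.31), ("nasality", 23.45)]:
    p = chi2_sf(stat, df=1)
    print(f"{name}: chi2 = {stat}, p = {p:.4g}, significant = {p < ALPHA}")
```

Equivalently, a statistic is significant at alpha = 0.05 when it exceeds the critical value of about 3.841 for df = 1 or 5.991 for df = 2.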
Spectrographic parameters in disguised and control/normal speech
The Computerized Speech Lab (CSL) Model 4500 was used for the spectrographic analysis of the voice samples. Spectrographic parameters such as fundamental frequency, formant bands, formant frequencies and energy levels were found to be significantly more reliable than the aural parameters in cases involving disguised speech samples: their values in the disguised samples remained more consistent with those in the respective control samples.
The third formant (F3) and fourth formant (F4) were found to be the most useful for identification of disguised voice samples, followed by fundamental frequency (F0), first formant (F1), second formant (F2) and energy levels. The fifth formant (F5) was found to be the least important for comparison and identification of disguised voice samples.
Fundamental frequency (F0): F0 was found to be a crucial parameter for identification of voice samples disguised by constricting tract, lowering pitch (in male subjects), changing accent, pulling cheeks, state of cold, simulating anger and covering mouth. The F0 values of samples disguised by these techniques showed no significant variation from their control counterparts (at significance level 0.05).
F0 was not found to be important for identification of voice samples disguised by lowering pitch (in female subjects), pinching nostrils, raising pitch, mimicry, obstacle in mouth, state of throat infection and whispering. The F0 values of samples disguised by these techniques showed significant variation from their control counterparts (at significance level 0.05).
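The text does not describe how CSL extracts F0. As an illustration only, the Python sketch below uses a common textbook technique, time-domain autocorrelation: the lag at which a voiced frame best matches a shifted copy of itself is taken as the pitch period, and F0 is the sampling rate divided by that lag. The 60-500 Hz search range, the 2048-sample frame and the function name `estimate_f0` are assumptions; the test signal is a synthetic 200 Hz tone sampled at the study's 11025 Hz.

```python
import math
from typing import Sequence


def estimate_f0(signal: Sequence[float], fs: int,
                fmin: float = 60.0, fmax: float = 500.0) -> float:
    """Estimate F0 (Hz) of a voiced frame from the peak of its autocorrelation.

    Only lags corresponding to fmin..fmax (an assumed voice range) are searched.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove any DC offset
    min_lag = int(fs / fmax)
    max_lag = min(int(fs / fmin), n - 1)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Raw (unnormalised) autocorrelation at this lag
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


fs = 11025
tone = [math.sin(2 * math.pi * 200.0 * t / fs) for t in range(2048)]
print(round(estimate_f0(tone, fs), 1))
```

Because the lag is an integer, the estimate is quantised (here 11025 / 55 ≈ 200.5 Hz). Picking the largest raw autocorrelation can also produce octave errors on real speech, and whispered samples have no periodic voicing at all, so practical pitch trackers add normalisation and a voicing decision.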
First formant (F1): F1 was found to be a crucial parameter for identification of voice samples disguised by constricting tract, lowering pitch (in male subjects), pulling cheeks, raising pitch, state of cold, simulating anger and covering mouth. The F1 values of samples disguised by these techniques showed no significant variation from their control counterparts (at significance level 0.05).
F1 was not found to be important for identification of voice samples disguised by lowering pitch (in female subjects), pinching nostrils, changing accent, mimicry, obstacle in mouth, state of throat infection and whispering. The F1 values of samples disguised by these techniques showed significant variation from their control counterparts (at significance level 0.05).
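The significance statements for each spectrographic parameter rest on the Z-test listed among the statistical methods. A minimal two-sample Z-test on summary statistics is sketched below in Python; the function name `two_sample_z` and the F1 figures (means, standard deviations and sample sizes) are hypothetical and only illustrate the arithmetic.

```python
import math
from typing import Tuple


def two_sample_z(mean1: float, sd1: float, n1: int,
                 mean2: float, sd2: float, n2: int) -> Tuple[float, float]:
    """Two-sample Z statistic for a difference in means, with its two-sided p-value."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / standard_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical F1 summaries in Hz: disguised vs control samples of one disguise group
z, p = two_sample_z(530.0, 40.0, 16, 500.0, 40.0, 16)
print(f"z = {z:.2f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

With these invented numbers z is about 2.12 and p about 0.034, below 0.05, so the disguised F1 would be judged significantly different from the control F1. The Z-test is a large-sample approximation; for small disguise groups a t-test is usually the safer choice.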
Second formant (F2): F2 was found to be a crucial parameter for identification of voice samples disguised by constricting tract, lowering pitch (in female subjects), mimicry, obstacle in mouth, throat infection and whispering. The F2 values of samples disguised by these techniques showed no significant variation from their control counterparts (at significance level 0.05).
F2 was not found to be important for identification of voice samples disguised by lowering pitch (in male subjects), pinching nostrils, pulling cheeks, raising pitch, changing accent, covering mouth, simulating anger and state of cold. The F2 values of samples disguised by these techniques showed significant variation from their control counterparts (at significance level 0.05).
Third formant (F3): F3 was found to be a crucial parameter for identification of voice samples disguised by lowering pitch, pinching nostrils, pulling cheeks, raising pitch, changing accent, covering mouth, obstacle in mouth and whispering. The F3 values of samples disguised by these techniques showed no significant variation from their control counterparts (at significance level 0.05).
F3 was not found to be important for identification of voice samples disguised by constricting tract, simulating anger, state of cold, mimicry and state of throat infection. The F3 values of samples disguised by these techniques showed significant variation from their control counterparts (at significance level 0.05).
Fourth formant (F4): F4 was found to be a crucial parameter for identification of voice samples disguised by lowering pitch, pinching nostrils, changing accent, covering mouth, simulating anger, state of cold, mimicry, state of throat infection and whispering. The F4 values of samples disguised by these techniques showed no significant variation from their control counterparts (at significance level 0.05).
F4 was not found to be important for identification of voice samples disguised by constricting tract, pulling cheeks, raising pitch and obstacle in mouth. The F4 values of samples disguised by these techniques showed significant variation from their control counterparts (at significance level 0.05).
Fifth formant (F5): F5 was found to be a crucial parameter for identification of voice samples disguised by a state of throat infection, state of cold, changing accent, pinching nostrils and constricting tract. The F5 values of samples disguised by these techniques showed no significant variation from their control counterparts (at significance level 0.05).
F5 was not found to be important for identification of voice samples disguised by whispering, obstacle in mouth, mimicry, simulating anger, covering mouth, raising pitch, lowering pitch and pulling cheeks. The F5 values of samples disguised by these techniques showed significant variation from their control counterparts (at significance level 0.05).
Energy contour: The energy pattern was found to be a crucial parameter for identification of voice samples disguised by lowering pitch, raising pitch, pulling cheeks, pinching nostrils, changing accent and simulating anger. The energy values of samples disguised by these techniques showed no significant variation from their control counterparts (at significance level 0.05).
The energy pattern was not found to be important for identification of voice samples disguised by constricting tract, covering mouth, state of cold, mimicry, obstacle in mouth, state of throat infection and whispering. The energy values of samples disguised by these techniques showed significant variation from their control counterparts (at significance level 0.05).
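The text does not specify how the energy pattern was measured in CSL. A simple stand-in is a short-time RMS energy contour in decibels, computed over consecutive frames. The Python sketch below assumes non-overlapping 256-sample frames (about 23 ms at 11025 Hz) and levels relative to a full-scale amplitude of 1.0; the function name `energy_contour_db` and the test signal are hypothetical.

```python
import math
from typing import List, Sequence


def energy_contour_db(signal: Sequence[float], frame_len: int = 256,
                      floor: float = 1e-10) -> List[float]:
    """Short-time RMS energy in dB (re amplitude 1.0) over non-overlapping frames.

    A trailing partial frame is ignored; `floor` avoids log10(0) on silent frames.
    """
    contour = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        contour.append(20.0 * math.log10(max(rms, floor)))
    return contour


fs = 11025
# Hypothetical signal: a 200 Hz tone at full amplitude, then the same tone at one tenth amplitude
loud = [math.sin(2 * math.pi * 200 * t / fs) for t in range(1024)]
quiet = [0.1 * s for s in loud]
contour = energy_contour_db(loud + quiet)
print([round(v, 1) for v in contour])
```

The loud frames sit near -3 dB (the RMS of a sine is 1/√2 of its peak) and the quiet frames 20 dB lower. Contours computed this way for a disguised and a control sample could then be compared frame by frame or correlated.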

Figure 1: Chart showing the disguise techniques preferred by the different speakers (Total, N=200)

Figure 2: Observation sheet maintained for recording the results of auditory analysis

Table 1: Pearson correlation coefficient for speech quality between disguised & control voice samples of both males and females (TOTAL, N=200)

Table 2: p-values for chi-square test for speech quality of disguised and control voice samples (TOTAL, N=200; df=2)

Table 3: Pearson correlation coefficient for speech delivery between disguised & control voice samples of both males and females (TOTAL, N=200)

Table 4: p-values for chi-square test for delivery of speech of disguised and control voice samples (TOTAL, N=200; df=1)

Table 5: Pearson correlation coefficient for degree of phonation between disguised & control voice samples of both male and female subjects (TOTAL, N=200)

Table 6: p-values for chi-square test for degree of phonation of disguised and control voice samples (TOTAL, N=200; df=2)

Table 7: Pearson correlation coefficient for flow of speech between disguised & control voice samples of both male and female subjects (TOTAL, N=200)

Table 9: Pearson correlation coefficient for speech rate between disguised & control voice samples of both males and females (TOTAL, N=200)

Table 10: p-values for chi-square test for speech rate of disguised and control voice samples (TOTAL, N=200; df=2)

Table 11: Pearson correlation coefficient for dynamic loudness between disguised & control voice samples of both male and female subjects (TOTAL, N=200)

Table 12: p-values for chi-square test for dynamic loudness of disguised and control voice samples (TOTAL, N=200; df=2)