An HMM (Hidden Markov Model) can model a sequence of feature vectors quite accurately, whereas a GMM (Gaussian Mixture Model) models only a single feature vector corresponding to a single frame. HMMs are effective for text-dependent tasks, and GMMs are well suited to text-independent tasks. The pattern matching is shown above in Fig. 2.3.
A number of classes are first defined for each speech frame, such as unvoiced or voiced. Vector quantization is then used to group the feature vectors according to their similarity, and the last phase uses speech sound information. In the training phase, the pattern matching algorithm uses all of the training feature vectors to build the speaker models. One model is created per voice and per speaker.
Training starts from an initial model architecture, whose parameter values are then re-estimated accordingly. Lastly, in the testing phase, the feature vectors are evaluated against the previously trained models, and a likelihood score, i.e. the probability that the given voice sound was produced by the speaker model, is output for each speaker [72].
The pattern matching algorithm used in this thesis operates on text-dependent data. Thus, stochastic models are used, combining GMMs and HMMs. Some speaker-independent sequential information, which is extracted from the random feature vector sequence, is quite useful for text-dependent speaker identification and verification.
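The testing phase described above can be sketched as follows. This is a minimal illustration and not the implementation used in this thesis: each speaker model is reduced to a single diagonal-covariance Gaussian for brevity, and the names `frame_log_likelihood`, `identify_speaker`, and the toy speaker labels are assumptions made for this example. The score of a speaker is the sum of the per-frame log-likelihoods, and the speaker with the highest score is selected.

```python
import numpy as np


def frame_log_likelihood(features, mean, var):
    """Per-frame log-density of a diagonal-covariance Gaussian.

    features: (T, D) array of feature vectors.
    mean, var: (D,) arrays (hypothetical stand-in for a trained speaker model).
    """
    diff = features - mean
    log_norm = np.sum(np.log(2.0 * np.pi * var))
    return -0.5 * (log_norm + np.sum(diff ** 2 / var, axis=1))


def identify_speaker(features, models):
    """Score the test features against every model and pick the best one.

    models: dict mapping speaker name -> (mean, var).
    Returns (best_speaker, scores), where scores holds total log-likelihoods.
    """
    scores = {
        name: float(np.sum(frame_log_likelihood(features, mean, var)))
        for name, (mean, var) in models.items()
    }
    best_speaker = max(scores, key=scores.get)
    return best_speaker, scores


# Toy data (assumption): two speakers with well-separated means.
rng = np.random.default_rng(0)
models = {
    "speaker_a": (np.zeros(3), np.ones(3)),
    "speaker_b": (np.full(3, 5.0), np.ones(3)),
}
test_features = rng.normal(loc=5.0, scale=1.0, size=(50, 3))

best, scores = identify_speaker(test_features, models)
print(best, scores)
```

Working in the log domain avoids numerical underflow when the likelihoods of many frames are multiplied together.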
2.5 Gaussian Mixture Model (GMM)

The GMM was introduced for speaker modeling by Reynolds and Rose (1995) and is widely used as a speaker model because it can model arbitrarily shaped probability density functions (pdfs) using a superposition of multivariate Gaussians. This holds even for diagonal covariance matrices, because the loss in expressiveness caused by restricting the Gaussians to axis-aligned shapes can be compensated for by using more Gaussians. Using diagonal covariances also helps recognition performance, because the smaller number of model parameters can be estimated more reliably from limited training data.
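To make this trade-off concrete, the sketch below evaluates the log-density of a diagonal-covariance GMM and counts the free parameters for diagonal versus full covariances. It is an illustrative sketch under stated assumptions: the function names `gmm_log_density` and `count_parameters` are hypothetical, and the component count and dimension in the usage lines are example values rather than settings confirmed for the experiments.

```python
import numpy as np


def gmm_log_density(x, weights, means, variances):
    """Log-density of feature vectors under a diagonal-covariance GMM.

    x: (T, D) feature vectors; weights: (M,) summing to 1;
    means, variances: (M, D). Returns a (T,) array.
    """
    diff = x[:, None, :] - means[None, :, :]                       # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (M,)
    log_comp = log_norm[None, :] - 0.5 * np.sum(
        diff ** 2 / variances[None, :, :], axis=2
    )                                                              # (T, M)
    weighted = np.log(weights)[None, :] + log_comp
    # Log-sum-exp over the components for numerical stability.
    peak = weighted.max(axis=1, keepdims=True)
    return (peak + np.log(np.sum(np.exp(weighted - peak), axis=1, keepdims=True)))[:, 0]


def count_parameters(num_components, dim, diagonal=True):
    """Free parameters: M - 1 weights, M * D means, plus covariance entries."""
    cov_entries = dim if diagonal else dim * (dim + 1) // 2
    return (num_components - 1) + num_components * (dim + cov_entries)


# Example values (assumption): 32 components, 12-dimensional features.
print(count_parameters(32, 12, diagonal=True))   # 799
print(count_parameters(32, 12, diagonal=False))  # 2911
```

For these example values the diagonal model has fewer than a third of the parameters of the full-covariance model, which is the reason it can be estimated more reliably from a small training set.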
The main reason for choosing this model formulation is that each mixture component models an underlying broad class of speech sounds present in a speaker's voice. A GMM consists of a mixture of M Gaussians, where M depends non-linearly on the content and size of the training data. A typical value of M is 32 for feature dimensions in the range of 12 to 26. Each mixture component is a D-dimensional Gaussian with a mean vector and a diagonal covariance vector, weighted by a factor w so that the overall probability mass is 1 and the model forms a valid distribution. The