Frame blocking and windowing: The frame blocking method divides the signal into short frames, and the feature parameters are extracted from these frames. Each frame is windowed (Gupta 2016) to minimize the discontinuities of the signal at both ends of the frame. Windowing multiplies each frame by a window function w(n), which yields the windowed independent components in each frequency bin. Here w(n) denotes the Hamming window (Gupta 2016), defined as

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1,$$

where N is the frame length.

Linear Predictive Cepstral Coefficients (LPCC): Feature extraction represents the speech signal by a finite number of measures. Each feature represents the spectrum of the speech signal in one windowed frame. The coefficients of the autoregressive model are chosen to minimize the difference between the predicted and the original sample values.
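The frame blocking and Hamming windowing step described above can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' implementation. The function name `frame_and_window` is hypothetical. The frame length and hop size are also illustrative parameters, because the text does not state them.

```python
import numpy as np


def frame_and_window(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames and apply a Hamming window.

    frame_len and hop are in samples; both are illustrative choices.
    """
    x = np.asarray(signal, dtype=float)
    if frame_len < 2 or len(x) < frame_len:
        raise ValueError("need frame_len >= 2 and a signal at least one frame long")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Hamming window: w(n) = 0.54 - 0.46 cos(2*pi*n / (N - 1)), 0 <= n <= N - 1
    n = np.arange(frame_len)
    w = 0.54 - 0.46 * np.cos(2 * np.pi * n / (frame_len - 1))
    # Frame blocking: frame i starts at sample i * hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return frames * w  # broadcasting applies w(n) to every frame
```

For example, `frame_and_window(x, 256, 128)` would split `x` into 256-sample frames with 50% overlap. NumPy's `np.hamming(N)` produces the same window and could replace the explicit formula.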
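LPC analysis (Gupta 2016) is an effective method for estimating the parameters of speech signals. The predictor coefficients are obtained by solving the matrix equation

$$\mathbf{R}\,\mathbf{a} = \mathbf{r},$$

where $\mathbf{R}$ denotes the $p \times p$ autocorrelation matrix

$$\mathbf{R} = \begin{bmatrix} R(0) & R(1) & \cdots & R(p-1) \\ R(1) & R(0) & \cdots & R(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ R(p-1) & R(p-2) & \cdots & R(0) \end{bmatrix},$$

the autocorrelation vector is $\mathbf{r} = [R(1), R(2), \ldots, R(p)]^{T}$, and the filter coefficient vector is $\mathbf{a} = [a_1, a_2, \ldots, a_p]^{T}$. Here $R(n)$ represents the autocorrelation function of the windowed speech signal $\tilde{s}(m)$ of length N:

$$R(n) = \sum_{m=0}^{N-1-n} \tilde{s}(m)\,\tilde{s}(m+n).$$

Cepstral analysis is the process of finding the cepstrum of a speech sequence.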
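The autocorrelation method above can be sketched in Python with NumPy, together with the LPC-to-cepstrum recursion given in the next paragraph. The sketch assumes one windowed frame stored as a NumPy array. The helper names `autocorrelation`, `lpc` and `lpc_to_cepstrum` are hypothetical. For clarity the Toeplitz system is solved with a general linear solver, although the Levinson-Durbin recursion is the usual faster alternative. The gain term $c_0$ is omitted.

```python
import numpy as np


def autocorrelation(frame, max_lag):
    """R(k) = sum_{m=0}^{N-1-k} s(m) s(m+k) for k = 0..max_lag."""
    s = np.asarray(frame, dtype=float)
    N = len(s)
    return np.array([np.dot(s[: N - k], s[k:]) for k in range(max_lag + 1)])


def lpc(frame, p):
    """Solve R a = r for the predictor coefficients a_1..a_p."""
    R = autocorrelation(frame, p)
    # p x p symmetric Toeplitz matrix with entries R(|i - j|)
    lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    return np.linalg.solve(R[lags], R[1 : p + 1])


def lpc_to_cepstrum(a, n_ceps):
    """Convert a_1..a_p (stored as a[0..p-1]) to cepstral coefficients c_1..c_n_ceps."""
    p = len(a)
    # 1-based copy of a, zero-padded so that a_ext[k] = 0 for k > p
    a_ext = np.zeros(max(p, n_ceps) + 1)
    a_ext[1 : p + 1] = a
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a_ext[n]  # the a_n term; zero when n > p
        # k runs from 1 (n <= p) or from n - p (n > p) up to n - 1
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a_ext[n - k]
        c[n] = acc
    return c[1:]
```

As a sanity check, a first-order predictor with $a_1 = 0.5$ gives $c_n = 0.5^n / n$ from `lpc_to_cepstrum`. This matches the power-series expansion of $\ln\bigl(1/(1 - 0.5z^{-1})\bigr)$.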
Cepstral coefficients (Gupta 2016) can be computed from the LPC coefficients through a recursive procedure. The coefficients obtained in this way are called Linear Predictive Cepstral Coefficients (LPCC). The recursion is

$$c_n = a_n + \sum_{k=1}^{n-1} \frac{k}{n}\, c_k\, a_{n-k}, \quad 1 \le n \le p,$$

$$c_n = \sum_{k=n-p}^{n-1} \frac{k}{n}\, c_k\, a_{n-k}, \quad n > p.$$

The underlying model expresses each speech sample as a linear combination of the previous p samples plus an excitation term:

$$s(m) = \sum_{k=1}^{p} a_k\, s(m-k) + e(m).$$

For this reason, the speech production model is also called the linear prediction model or the autoregressive model. Here p is the order of the LPC analysis. The excitation signal e(m) is referred to as the prediction error signal or the residual signal of the LPC analysis. LPC analysis yields a smoothed spectral envelope, so most of the influence of the excitation is removed.

Dynamic Time Warping (DTW): Dynamic Time Warping is used to compute the distance between two time series.

Fig. 3. Flow chart for Dynamic Time Warping (blocks: frame blocking and windowing; extracting feature parameters (LPCC); reference model; DTW algorithm; need inverse?; inverse; establish new model; check if it is the last frequency bin; end)
A time series is a list of samples taken from an original signal, ordered by the time at which the corresponding samples were obtained. One way to measure the distance between two time series is to resample one of them and then compare the two sample by sample. The drawback of this approach is that it does not produce intuitive results, because the compared samples may not correspond well to each other (for example, when the same word is spoken at different rates). The DTW algorithm removes this discrepancy by computing the optimal alignment between the sample points of the two time series. The algorithm is called “time warping” because it warps the time axes of the two series so that corresponding samples appear at the same location on a common time axis. The adjustment matrix P can be determined from the minimum distortion of the independent components between two adjacent frequency bins, that is, the minimum distortion between the independent components of one bin and those of the adjacent bin.
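The alignment idea can be illustrated with the classic dynamic-programming formulation of DTW. This is a generic textbook sketch and not necessarily the exact variant used here. It assumes a Euclidean local distance between feature vectors (such as LPCC frames) and the three standard step directions. The name `dtw_distance` is hypothetical.

```python
import numpy as np


def dtw_distance(x, y):
    """DTW distance between two sequences of feature vectors (or scalars)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat scalars as 1-D feature vectors
    if y.ndim == 1:
        y = y[:, None]
    n, m = len(x), len(y)
    # Local cost: Euclidean distance between every pair of frames
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    # D[i, j] = cost of the best warping path aligning x[:i] with y[:j]
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j],      # step in x only
                                               D[i, j - 1],      # step in y only
                                               D[i - 1, j - 1])  # diagonal step
    return D[n, m]
```

With this sketch, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0 because the warping absorbs the repeated sample. A sample-by-sample comparison after linear resampling would typically report a nonzero difference for the same pair.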
In the matrix of correlation coefficients, the largest coefficients are set to 1 and the others to 0. For two sources, the two possible adjustment matrices are then

$$P = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{or} \quad P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

When the adjustment matrix P (Noboru Murata 2001) is the identity matrix, the maxima lie on the diagonal. In that case the independent components located on the diagonal come from the same speech source, and their positions need not be adjusted. Otherwise, their order must be inverted, that is, the two components are swapped.
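The choice between the two adjustment matrices can be sketched as follows for the two-source case. The text states the criterion as minimum distortion between adjacent bins. This sketch substitutes a simple assumption instead: the components of each bin are compared through the Pearson correlation of their magnitude envelopes, and the matrix whose ones fall on the larger total correlation is kept. The name `choose_adjustment_matrix` is hypothetical.

```python
import numpy as np

IDENTITY = np.array([[1, 0], [0, 1]])  # components already aligned
EXCHANGE = np.array([[0, 1], [1, 0]])  # components must be swapped


def choose_adjustment_matrix(ref, cur):
    """ref, cur: 2 x T arrays holding the two independent components (assumed
    magnitude envelopes) of two adjacent frequency bins."""
    # corr[i, j] = correlation between component i of ref and component j of cur
    corr = np.corrcoef(ref, cur)[:2, 2:]
    keep = corr[0, 0] + corr[1, 1]  # total correlation if the order is kept
    swap = corr[0, 1] + corr[1, 0]  # total correlation if the order is swapped
    return IDENTITY if keep >= swap else EXCHANGE
```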
Thus the permutation ambiguity is solved by multiplying the scaled independent components by the permutation matrix P: the independent components of each frequency bin are multiplied by P (D. S. Jayaraman 2002). Finally, a new reference template is created, and it replaces the previously stored template.

Perceptual Evaluation of Speech Quality (PESQ): Quality evaluation is important in blind source separation (BSS) of speech signals, a field that has been growing in recent years. For convolutive BSS, the quality of algorithms is measured using the signal-to-interference ratio. This measure, however, requires knowledge of the mixing conditions, which makes the signal-to-interference ratio difficult to determine in a real-time environment.
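Applying the permutation bin by bin, with each aligned bin becoming the new reference template, can be sketched as below. The sketch extends the two-source rule to N sources (the simulation uses three) by trying every permutation, which is only practical for small N. Correlation of magnitude envelopes is an assumed alignment criterion. The name `align_bins` and the array layout (bins × components × frames) are hypothetical, and the components are assumed to be scaled already.

```python
import itertools

import numpy as np


def align_bins(components):
    """components: array of shape (F, N, T) with N independent components per
    frequency bin. Returns a copy reordered so that row i of every bin refers
    to the same source."""
    comps = np.array(components, dtype=float)  # work on a copy
    F, N, _ = comps.shape
    reference = comps[0]  # initial reference template
    for f in range(1, F):
        cur = comps[f]
        # corr[i, j] = correlation between reference row i and current row j
        corr = np.corrcoef(reference, cur)[:N, N:]
        best = max(itertools.permutations(range(N)),
                   key=lambda perm: sum(corr[i, perm[i]] for i in range(N)))
        # Permutation matrix: row i picks component best[i] of the current bin
        P = np.eye(N)[list(best)]
        comps[f] = P @ cur
        reference = comps[f]  # the new template replaces the stored one
    return comps
```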
PESQ is therefore adopted. In PESQ, the reference signal REF and the degraded signal DEG must be sampled at the same rate (PESQ accepts 8 kHz or 16 kHz). PESQ can compute both NB-PESQ (the narrowband PESQ measure) and WB-PESQ (the wideband PESQ measure), and the MODE parameter selects between the two modes. The PESQ measure is reported as a score value.

Simulated Results

Step 1: Three input signals are given initially. Each is plotted with time on the x-axis and amplitude on the y-axis.

Fig. 4. Input signals read and their spectrogram representation

Each input signal is a 1×5121 double array. Samples within the range -1 to +1 are not clipped, which preserves the full fidelity of the signal. Samples outside this range are clipped to -1 or +1.

Step 2: The three input signals are mixed, and the mixtures are plotted in the same way.

Fig. 5. Mixed input signal

The mixed signal is a 3×50000 double array. It is obtained by generating a random matrix and multiplying it by the transpose of the input signals.

Step 3: The three mixed signals are separated. The higher-order feature parameters are extracted using RKHS (reproducing kernel Hilbert space), and the permutation ambiguity is overcome by DTW.
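Steps 1 and 2 can be approximated in Python as below. Each source is clipped to the range -1 to +1, and the sources are then mixed with a random square matrix. This is an illustrative sketch, not the original code. It assumes the sources have equal length and are stacked as the rows of a matrix, which plays the role of the transpose mentioned in Step 2. The name `mix_sources` and its `seed` argument are hypothetical.

```python
import numpy as np


def mix_sources(sources, seed=None):
    """sources: list of equal-length 1-D signals, one per source.
    Returns (A, X) where X = A @ S and each row of S is one clipped source."""
    # Clip every source to [-1, +1], then stack the sources as rows of S
    S = np.vstack([np.clip(np.asarray(s, dtype=float), -1.0, 1.0) for s in sources])
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    A = rng.random((n, n))  # random square mixing matrix, entries in [0, 1)
    return A, A @ S         # each row of X is one mixed signal
```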
Fig. 6. Separation of the target signal from the mixed signal

Step 4: PESQ is estimated between each original signal and its estimated (separated) signal. The PESQ value is taken from the resulting score, which is computed for all three signals.