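Frame blocking and windowing:

Frame blocking is employed to extract the feature parameters. Each frame is windowed (Gupta 2016) to minimize the discontinuities of the signal at both ends of the frame. The windowing process can be expressed as follows:

$$\tilde{y}_k(n) = y_k(n)\, w(n), \qquad 0 \le n \le N-1,$$

where $\tilde{y}_k(n)$ denotes the windowed independent components in the $k$-th frequency bin, $y_k(n)$ is the corresponding frame of length $N$ before windowing, and $w(n)$ denotes the Hamming window (Gupta 2016), which is defined as

$$w(n) = 0.54 - 0.46 \cos\!\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n \le N-1.$$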
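As an illustration of the step above, the following is a minimal Python sketch of frame blocking followed by Hamming windowing. The function name `frame_and_window`, the frame length of 256 samples, and the hop of 128 samples are assumptions chosen for the example; the source does not state these values.

```python
import numpy as np

def frame_and_window(signal, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames and apply a Hamming window.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Hamming window: w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1))
    n = np.arange(frame_len)
    window = 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (frame_len - 1))
    # Stack the frames row by row, then taper every frame with the window
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    return frames * window

# Usage: a constant signal makes the window shape visible in every frame
frames = frame_and_window(np.ones(1024))
```

Each row of `frames` is one windowed frame, so the tapering toward both ends of the frame can be inspected directly.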

Linear Predictive Cepstral Coefficients (LPCC):

Feature extraction is used to represent the speech signal by a finite number of measures. Each feature represents the spectrum of the speech signal in a windowed frame. The coefficients are taken from an autoregressive model that minimizes the difference between the estimated and the original values. LPC analysis (Gupta 2016) is an effective method for estimating the parameters of speech signals.

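The LPC coefficients are obtained as $\mathbf{a} = \mathbf{R}^{-1}\mathbf{r}$, where $\mathbf{R}$ denotes the autocorrelation matrix for an analysis of order $p$:

$$\mathbf{R} = \begin{bmatrix} r(0) & r(1) & \cdots & r(p-1) \\ r(1) & r(0) & \cdots & r(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(p-1) & r(p-2) & \cdots & r(0) \end{bmatrix}$$

The autocorrelation vector is given as

$$\mathbf{r} = [\, r(1),\ r(2),\ \ldots,\ r(p) \,]^{T}$$

The filter coefficient vector is given as

$$\mathbf{a} = [\, a_1,\ a_2,\ \ldots,\ a_p \,]^{T}$$

The matrix equation that needs to be solved is

$$\begin{bmatrix} r(0) & r(1) & \cdots & r(p-1) \\ r(1) & r(0) & \cdots & r(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(p-1) & r(p-2) & \cdots & r(0) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} r(1) \\ r(2) \\ \vdots \\ r(p) \end{bmatrix}$$

where $r(n)$ represents the autocorrelation function of a windowed speech signal.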
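The autocorrelation method described above can be sketched in Python as follows. This is a minimal illustration, assuming the predictor convention in which each sample is approximated by a weighted sum of the previous p samples; the function name `lpc_autocorrelation` and the synthetic second-order test signal are assumptions made for the example and do not come from the source.

```python
import numpy as np

def lpc_autocorrelation(frame, order):
    """Estimate predictor coefficients a_1..a_p by solving R a = r."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation lags r(0)..r(p): r(k) = sum_m s(m) * s(m + k)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz autocorrelation matrix: R[i, j] = r(|i - j|)
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[lags]
    # Solve the normal equations R a = [r(1), ..., r(p)]^T
    return np.linalg.solve(R, r[1:])

# Usage on a synthetic autoregressive signal (illustrative, not paper data):
# s(m) = 1.3 s(m-1) - 0.4 s(m-2) + e(m)
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
s = np.zeros_like(e)
for m in range(2, len(s)):
    s[m] = 1.3 * s[m - 1] - 0.4 * s[m - 2] + e[m]
a_hat = lpc_autocorrelation(s, order=2)
```

For long signals the estimate `a_hat` should lie close to the coefficients used to generate the signal. A direct solve is used here for clarity; the Levinson-Durbin recursion is a common alternative that exploits the Toeplitz structure.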

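Cepstral analysis is the process of finding the cepstrum of a speech sequence. Cepstral coefficients (Gupta 2016) can be computed from the LPC coefficients via a recursive procedure. The cepstral coefficients obtained in this way are called Linear Predictive Cepstral Coefficients (LPCC).

The recursive procedure is given as follows.

For $1 \le n \le p$:

$$c_n = a_n + \sum_{k=1}^{n-1} \frac{k}{n}\, c_k\, a_{n-k}$$

For $n > p$:

$$c_n = \sum_{k=n-p}^{n-1} \frac{k}{n}\, c_k\, a_{n-k}$$

where $c_n$ denotes the $n$-th cepstral coefficient and $a_k$ the $k$-th LPC coefficient.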
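The recursion above can be sketched in a few lines of Python. This is a minimal illustration, assuming the predictor coefficients are stored as a list `a` with `a[0]` holding the first coefficient and that the gain term is omitted; the function name `lpc_to_lpcc` is hypothetical.

```python
def lpc_to_lpcc(a, n_ceps):
    """Convert LPC coefficients a_1..a_p into cepstral coefficients c_1..c_{n_ceps}.

    a[0] holds a_1, a[1] holds a_2, and so on (gain term not included).
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused so that c[n] matches c_n
    for n in range(1, n_ceps + 1):
        # The direct term a_n exists only while n <= p
        acc = a[n - 1] if n <= p else 0.0
        # Sum over k = max(1, n - p) .. n - 1 of (k / n) * c_k * a_{n-k}
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

# Usage: a first-order predictor with a_1 = 0.5
ceps = lpc_to_lpcc([0.5], n_ceps=3)
```

For a first-order model with coefficient a_1, the recursion yields c_n = a_1^n / n, which gives a simple way to check the implementation.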

Each speech sample is thereby modelled as a linear combination of the previous p samples. The speech production model can therefore also be described as a linear prediction model or an autoregressive model. Here p indicates the order of the LPC analysis, and the excitation signal e(m) can be termed the prediction error signal or residual signal of the LPC analysis. LPC analysis yields a smoothed spectrum, so most of the influence of the excitation is discarded.

Dynamic Time Warping (DTW):

Fig3. Flow chart for Dynamic Time Warping. Blocks shown: frequency bin; frame blocking and windowing; extracting feature parameters (LPCC); reference model; DTW algorithm; need inverse (Y/N); inverse; establish new model; check if it is last frequency bin (Y/N); end.

Dynamic Time Warping is used to compute the distance between two time series. A time series is a list of samples taken from an original signal, ordered by the time at which the samples were obtained. A simple way to measure the matching distance between two time series is to resample one of them to the length of the other and then compare the two sample by sample. The drawback is that this does not produce intuitive results, because the compared samples may not correspond well. The DTW algorithm removes this discrepancy by finding the optimal alignment between the sample points of the two time series. The algorithm is called “time warping” because it warps the time axes of the two series in such a way that corresponding samples appear at the same location on a common time axis.
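The alignment idea described above can be sketched with the classic dynamic-programming form of DTW. This is a minimal illustration under stated assumptions: the local cost is the Euclidean distance between samples (or feature vectors), the three standard step directions are allowed, and no warping-window constraint is applied. The source does not specify these choices, and the function name `dtw_distance` is hypothetical.

```python
import numpy as np

def dtw_distance(x, y):
    """Accumulated cost of the optimal warping path between two sequences.

    x and y may be 1-D sequences of samples or 2-D arrays with one
    feature vector (for example, LPCC) per row.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    n, m = len(x), len(y)
    # D[i, j]: minimum accumulated cost aligning x[:i] with y[:j]
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible predecessor paths
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Usage: the second series repeats one sample, which warping can absorb
d_warped = dtw_distance([1, 2, 3], [1, 2, 2, 3])
d_offset = dtw_distance([0, 0], [1, 1])
```

In the first usage line the two series differ only by a repeated sample, so a sample-by-sample comparison would misalign them while the warped comparison does not.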

The adjustment matrix P is determined by the minimum distortion of the independent components between two adjacent frequency bins. The minimum distortion is obtained between the independent components of the two bins. The maximum values of the correlation coefficients are located on the diagonal; the maximum correlation coefficients are set to 1 and the others to 0. The two adjustment matrices can then be represented as

$$P = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{or} \quad P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

When the adjustment matrix P (Noboru Murata 2001) is the identity matrix, the independent components located on the diagonal come from the same speech source and their positions do not need to be adjusted. Otherwise, their positions should be inverted.

The permutation ambiguity is thus resolved by multiplying the scaled independent components by the permutation matrix P: each of the independent components is multiplied by P (D. S. Jayaraman 2002). Finally, a new reference template is generated, and it replaces the previously stored template.
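The selection and application of the adjustment matrix can be sketched as follows. This is a minimal illustration, assuming a square matrix of pairwise distortions (for example, DTW distances between LPCC sequences) is already available, with entry (i, j) comparing reference component i with component j of the adjacent frequency bin. The function name `adjustment_matrix` and the numeric values are hypothetical.

```python
from itertools import permutations
import numpy as np

def adjustment_matrix(distortion):
    """Return the permutation matrix P that minimizes the total distortion.

    distortion[i, j] compares reference component i with component j
    of the adjacent frequency bin.
    """
    d = np.asarray(distortion, dtype=float)
    n = d.shape[0]
    # Exhaustive search over permutations; adequate for a few sources
    best = min(
        permutations(range(n)),
        key=lambda perm: sum(d[i, perm[i]] for i in range(n)),
    )
    P = np.zeros((n, n), dtype=int)
    for i, j in enumerate(best):
        P[i, j] = 1  # row i of P selects component j of the adjacent bin
    return P

# Usage with made-up distortions
P_same = adjustment_matrix([[0.1, 0.9], [0.8, 0.2]])     # no adjustment
P_swapped = adjustment_matrix([[0.9, 0.1], [0.2, 0.8]])  # positions swapped
y_next = np.array([[10.0, 11.0], [20.0, 21.0]])  # one component per row
y_aligned = P_swapped @ y_next  # rows reordered to match the reference
```

Multiplying by P reorders the rows, so after alignment row i of `y_aligned` corresponds to reference component i.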

Perceptual Evaluation of Speech Quality (PESQ):

Quality evaluation is important for speech signals in the field of BSS, which has been growing in recent years. For convolutive BSS, the quality of an algorithm is usually measured using the signal-to-interference ratio, but this requires knowledge of the mixing conditions, and the signal-to-interference ratio is difficult to determine in a real-time environment. The Perceptual Evaluation of Speech Quality measure is therefore adopted. In PESQ, both the reference signal REF and the degraded signal DEG are sampled at the same rate, specified in Hz. It can compute both NB-PESQ (the narrowband PESQ measure) and WB-PESQ (the wideband PESQ measure); the mode is selected through the MODE parameter. The speech quality is then determined from the resulting score value.

Simulated Results

Step1:

Initially, three input signals are given. Their spectrogram representation is plotted with time on the x-axis and amplitude on the y-axis.

Fig4. Input signals and their spectrogram representation

Each input signal is taken as a <1x5121 double> array. No clipping is applied within this range, so the full fidelity of the signal is preserved. If the signal exceeds the range, the audio is clipped to the interval from ‘-1’ to ‘+1’.

Step2:

The three input signals are mixed, and their spectrogram representation is shown in Fig5.

Fig5. Mixed input signal

The mixed signal is a ‘3x50000 double’ array. It is obtained by generating a random matrix for the given input signals and multiplying it by the transpose of each input signal.
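The mixing step can be sketched as an instantaneous mixture. This is a minimal illustration, assuming the sources are stacked as rows of a matrix and that the random mixing matrix is drawn uniformly from [0, 1); the source does not state the distribution or the orientation of the arrays, and the function name `mix_sources` and the sinusoidal test sources are hypothetical.

```python
import numpy as np

def mix_sources(sources, rng):
    """Mix the sources with a random square matrix: X = A @ S.

    sources has shape (n_sources, n_samples); A is an assumed
    uniform random mixing matrix of shape (n_sources, n_sources).
    """
    S = np.asarray(sources, dtype=float)
    A = rng.random((S.shape[0], S.shape[0]))
    return A @ S, A

# Usage with three synthetic sources (illustrative, not the paper's data)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 5000)
S = np.vstack([
    np.sin(2 * np.pi * 5 * t),
    np.sign(np.sin(2 * np.pi * 3 * t)),
    rng.standard_normal(t.size),
])
X, A = mix_sources(S, rng)
```

Each row of `X` is one mixed observation, a weighted sum of all three sources with weights taken from the corresponding row of `A`.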

Step3:

Here the three mixed signals are separated. The higher-order feature parameters are extracted using RKHS, and the permutation ambiguity is overcome by DTW.

Fig6. Separation of target signal from the mixed signal

Step4:

Here the PESQ is estimated between the original and the estimated signals. The PESQ is obtained from the score value, which is computed for all three signals.