(1) Department of Electrical and Computer Engineering,
Engineering Science Building,
The University of Texas at Austin,
Austin, TX 78712-1084 USA
arslan@ece.utexas.edu
bevans@ece.utexas.edu
(2) Department of Electronics and Telecommunication Engineering,
Yildiz Technical University,
80750 Istanbul, Turkey
sakarya@ana.cc.yildiz.edu.tr
(3) Wireless Technology Laboratory,
Lucent Technologies,
Holmdel, NJ 07733-3030 USA
sakarya@lucent.com
We do not determine the source location explicitly; the neural network takes care of that. We compute the cross-correlations between sensor pairs. In the far field, the time delay (or, equivalently, the phase difference in the cross-correlations) is the same for every adjacent sensor pair; in the near field it is not. This leaves a distinct pattern in the cross-correlation coefficients, from which the neural network can decide which approximation it should use. In fact, the far-field case is an approximation of the near-field case: a technique that works in the near field should also work in the far field.
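To make the near-field/far-field distinction concrete, here is a minimal sketch in Python with NumPy (not part of the original system) that computes the propagation delays between adjacent sensors of a uniform linear array. The array geometry, source positions, and speed of sound below are illustrative assumptions; the point is only that adjacent-pair delays come out nearly equal for a distant source and clearly unequal for a close one, which is the pattern the neural network exploits.

    import numpy as np

    c = 343.0  # speed of sound in air (m/s)
    # Hypothetical 5-element uniform linear array, 10 cm spacing, on the x-axis
    sensors = np.array([[0.1 * i, 0.0] for i in range(5)])

    def adjacent_delays(source):
        """Propagation-delay differences between adjacent sensor pairs (s)."""
        dist = np.linalg.norm(sensors - source, axis=1)  # source-to-sensor distances
        return np.diff(dist / c)                         # delay between neighbors

    near = np.array([0.3, 0.5])    # source well under 1 m away: near field
    far  = np.array([30.0, 50.0])  # same direction, ~58 m away: effectively far field

    print(adjacent_delays(near))  # unequal delays -> near-field pattern
    print(adjacent_delays(far))   # nearly equal delays -> far-field pattern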
Our speech model is a sum of several sinusoids that captures the strong harmonics of the speech signal, so that should not be a problem.
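As a rough illustration of such a model (the fundamental frequency, number of harmonics, and amplitudes below are made-up values, not the paper's parameters), a voiced speech frame can be synthesized as a short sum of sinusoids at multiples of a pitch frequency:

    import numpy as np

    fs = 8000.0                      # sampling rate (Hz)
    t = np.arange(0, 0.05, 1 / fs)   # one 50 ms frame
    f0 = 150.0                       # assumed fundamental (pitch) frequency
    amps = [1.0, 0.6, 0.4, 0.25]     # assumed amplitudes of the strong harmonics

    # Sum of sinusoids at harmonics of f0 models a voiced speech segment
    frame = sum(a * np.sin(2 * np.pi * (k + 1) * f0 * t)
                for k, a in enumerate(amps))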
One limitation of our model is that it does not account for echoes: it assumes the speech arrives from only one direction. Most applications in which you would use a speaker localization system do have echoes.
Last Updated 08/07/99.