{"id":176,"date":"2016-02-29T14:37:17","date_gmt":"2016-02-29T13:37:17","guid":{"rendered":"http:\/\/project.inria.fr\/ssse\/?p=176"},"modified":"2017-06-21T10:16:21","modified_gmt":"2017-06-21T08:16:21","slug":"4-multichannel-speech-activity-detection-localization-and-tracking","status":"publish","type":"post","link":"https:\/\/project.inria.fr\/ssse\/4-multichannel-speech-activity-detection-localization-and-tracking\/","title":{"rendered":"4. Multichannel source activity detection, localization, and tracking"},"content":{"rendered":"<h1>Exercises:<\/h1>\n<hr \/>\n<h2>Exercise 1: GCC-PHAT &amp; Acoustic Maps<\/h2>\n<p>Given a speech signal of a static human source recorded in a real multi-channel acquisition setup, compute the GCC-PHAT, focusing on:<\/p>\n<ul>\n<li>temporal evolution of the GCC-PHAT due to speech sparsity;<\/li>\n<li>behaviour of GCC-PHAT at different microphone pairs.<\/li>\n<\/ul>\n<p>Using the computed GCC-PHAT, derive the corresponding GCF (SRP-PHAT) acoustic map.<\/p>\n<p>The <a href=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/08\/Chap04_ex1.zip\">package<\/a> includes:<a href=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/layout.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-341 alignright\" src=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/layout-300x176.png\" alt=\"layout\" width=\"300\" height=\"176\" srcset=\"https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/layout-300x176.png 300w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/layout-250x147.png 250w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/layout-150x88.png 150w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/layout.png 604w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<ul>\n<li>audio signals;<\/li>\n<li>Matlab scripts with:\n<ul>\n<li>functions to read and de-interlace audio files;<\/li>\n<li>a simple implementation of GCC-PHAT;<\/li>\n<li>microphone and source nominal positions;<\/li>\n<li>pseudo-code for GCF 
(SRP-PHAT) computation.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>The figure on the left shows the layout of the experimental setup. The source is oriented downward. Note that the Matlab script is deliberately left unoptimized so that it is easier to read.<\/p>\n<hr \/>\n<h2>Exercise 2: Effect of microphone array geometry on source localization<\/h2>\n<p>For a near-field source, the TDOA equation is given in Chapter 3 of the book (Eq. 3.8). A\u00a0small error\u00a0in the TDOA measurement propagates to the estimated source position <strong>s<\/strong> in a way that depends on the microphone and source geometry. By investigating the partial derivative of the TDOA with respect to the source position, \u2202\u03c4\/\u2202<strong>s<\/strong>, i.e. the Jacobian matrix <strong>J<\/strong>, we can quantify how the localization error is expected to behave when the TDOA estimate \u03c4\u00a0is corrupted by Gaussian noise (zero mean, standard deviation \u03c3).<\/p>\n<p><strong>Question 1<\/strong>: Derive the Jacobian matrix <strong>J<\/strong>. Hint: use the vector form of the TDOA equation, \u03c4<sub>ii&#8217;<\/sub> = (||<strong>s<\/strong>&#8211;<strong>m<\/strong><sub>i<\/sub>||-||<strong>s<\/strong>&#8211;<strong>m<\/strong><sub>i&#8217;<\/sub>||)\/c, and differentiate it with respect to the vector <strong>s<\/strong>, where <strong>m<\/strong><sub>i<\/sub>\u00a0and\u00a0<strong>m<\/strong><sub>i&#8217;<\/sub> are the positions of the two microphones of a single microphone pair.<\/p>\n<p><strong>Question 2<\/strong>: Take a look at the Matlab program code to reproduce the position error figures in Chapter 4. 
(You need to download an external Matlab tool called <a href=\"https:\/\/se.mathworks.com\/matlabcentral\/fileexchange\/4705-error-ellipse\">error_ellipse<\/a>, which plots the error ellipse from a mean and a covariance matrix).<\/p>\n<p>Provided material: Matlab implementation for the error analysis: <a href=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/partial_derivative_of_position.m.zip\">error visualization code<\/a><\/p>\n<p>Note that only two microphone pairs are used. Consider the following two exercises:<\/p>\n<ol>\n<li>In the code, replace the current microphone setup with\u00a0a circular microphone array of 8 elements and a diameter of 15 cm. Place a source at a distance of 2 m from the array center. What does the error ellipse look like?<\/li>\n<li>Intuitively, with such a geometry, estimating the 3D source position is difficult, while the direction of arrival is easier to obtain. Does the resulting error ellipse support this or not?<\/li>\n<\/ol>\n<hr \/>\n<h2>Exercise 3: Tracking with particle filter<\/h2>\n<p>This exercise deals with tracking a moving source in a distributed microphone recording setup consisting of 7 microphone triplets. The data were recorded in the FBK labs, and the ground truth trajectory was obtained using a multi-camera target tracking system. The speaker is silent twice during the recording. 
The figures below show the recording set up, the speaker trajectory and the audio file recorded at 1 channel of the acquisition network.<\/p>\n<div id=\"attachment_395\" style=\"width: 261px\" class=\"wp-caption alignright\"><a href=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/trajectory.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-395\" class=\"wp-image-395\" src=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/trajectory-300x225.png\" width=\"251\" height=\"188\" srcset=\"https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/trajectory-300x225.png 300w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/trajectory-768x576.png 768w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/trajectory-1024x768.png 1024w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/trajectory-200x150.png 200w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/trajectory-150x113.png 150w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/trajectory.png 1200w\" sizes=\"auto, (max-width: 251px) 100vw, 251px\" \/><\/a><p id=\"caption-attachment-395\" class=\"wp-caption-text\">Source trajectory in the x- and y-coordinate<\/p><\/div>\n<div id=\"attachment_394\" style=\"width: 261px\" class=\"wp-caption alignright\"><a href=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/mikeandtrajectory.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-394\" class=\"wp-image-394\" src=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/mikeandtrajectory-300x225.png\" width=\"251\" height=\"188\" srcset=\"https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/mikeandtrajectory-300x225.png 300w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/mikeandtrajectory-768x576.png 768w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/mikeandtrajectory-1024x768.png 1024w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/mikeandtrajectory-200x150.png 200w, 
https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/mikeandtrajectory-150x113.png 150w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/mikeandtrajectory.png 1200w\" sizes=\"auto, (max-width: 251px) 100vw, 251px\" \/><\/a><p id=\"caption-attachment-394\" class=\"wp-caption-text\">Position of the microphone triplets and speaker trajectory. Green dots indicate trajectory portions where the source was active, red segments correspond to silence.<\/p><\/div>\n<div id=\"attachment_393\" style=\"width: 261px\" class=\"wp-caption alignright\"><a href=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/ch1-1.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-393\" class=\"wp-image-393\" src=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/ch1-1-300x225.png\" width=\"251\" height=\"188\" srcset=\"https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/ch1-1-300x225.png 300w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/ch1-1-768x576.png 768w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/ch1-1-1024x768.png 1024w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/ch1-1-200x150.png 200w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/ch1-1-150x113.png 150w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/ch1-1.png 1200w\" sizes=\"auto, (max-width: 251px) 100vw, 251px\" \/><\/a><p id=\"caption-attachment-393\" class=\"wp-caption-text\">Audio signal (channel 1)<\/p><\/div>\n<p>The <a href=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/Chap04_ex3.zip\">package<\/a> includes:<\/p>\n<ul>\n<li>Audio files: raw 16 bit, 14 interlaced channels at 44.1 kHz.<\/li>\n<li>Reference trajectory=[time x\u00a0 y]n<\/li>\n<li>Matlab script to be completed with localization and tracking algorithms<\/li>\n<\/ul>\n<p><strong>Question<\/strong>: Complete the script filling in the empty spaces and implementing the GCC-PHAT (see Ex.1) and the particle filtering (see the pseudo code below).<\/p>\n<p>&nbsp;<\/p>\n<hr \/>\n<h2>Exercise 4: non 
linear phase combination<\/h2>\n<p>Modeling the phase of the cross-spectrum is a crucial aspect of many speech processing techniques. In the presence of early arrivals, the linear anechoic phase is modified non-linearly. Consider a simple finite impulse response (FIR) model of the room impulse response (RIR):<\/p>\n<p>H(<em>f<\/em>)=\u03a3<em>h<sub>i<\/sub><\/em>exp(<em>-j2\u03c0f\u03c4<sub>i<\/sub><\/em>),<\/p>\n<p>where <em>f<\/em> is frequency, <em>\u03c4<sub>i<\/sub><\/em>\u00a0is the time of flight of the i-th sound propagation path to the microphone, <em>h<sub>i<\/sub><\/em> is the amplitude of that path, and the sum is taken over several image source positions with different amplitudes. Each image source models the sound propagation path of a reflection. In this exercise we observe how the phase of the cross-spectrum changes as isolated early arrivals are added to the direct propagation path.<\/p>\n<p>Given a microphone pair <strong>m<\/strong>, the magnitude-normalized or whitened cross-spectrum is:<br \/>\n<em>\u03a8<sub>m<\/sub>(f)=H<sub>m1<\/sub>(f)H<sup>*<\/sup><sub>m2<\/sub>(f)\/(|H<sub>m1<\/sub>(f)|\u00b7|H<sub>m2<\/sub>(f)|),<\/em><\/p>\n<p>where <strong>m<sub>1<\/sub><\/strong> and <strong>m<sub>2<\/sub><\/strong> denote two separate microphones, and |<em>x<\/em>| denotes the absolute value of <em>x<\/em>.<\/p>\n<h3>Anechoic model<\/h3>\n<p>Let us consider an anechoic model, where both RIRs consist of a single tap associated with the direct path between the source and the microphones:<br \/>\nH<sub>m1<\/sub>(f)=h<sub>m1,0<\/sub>exp(-j2\u03c0f\u03c4<sub>m1,0<\/sub>)<br \/>\nH<sub>m2<\/sub>(f)=h<sub>m2,0<\/sub>exp(-j2\u03c0f\u03c4<sub>m2,0<\/sub>),<br \/>\nwhere\u00a0\u03c4<sub>m1,0<\/sub> and \u03c4<sub>m2,0<\/sub>\u00a0denote the times of flight of the direct path signals for the two microphones.<\/p>\n<p>The phase of the whitened cross-spectrum simplifies to a linear phase:<br \/>\n\u2220\u03a8<sub>m<\/sub>(f) 
=\u2220[H<sub>m1<\/sub>(f)H<sup>*<\/sup><sub>m2<\/sub>(f)\/(|H<sub>m1<\/sub>(f)|\u00b7|H<sub>m2<\/sub>(f)|)]<br \/>\n=\u00a0\u2220[h<sub>m1,0<\/sub>exp(-j2\u03c0f\u03c4<sub>m1,0<\/sub>)\u00b7h<sub>m2,0<\/sub>exp(j2\u03c0f\u03c4<sub>m2,0<\/sub>)\/(h<sub>m1,0<\/sub>\u00b7h<sub>m2,0<\/sub>)]<br \/>\n=\u2220[exp(-j2\u03c0f(\u03c4<sub>m1,0<\/sub>-\u03c4<sub>m2,0<\/sub>))]<br \/>\n=-2\u03c0f(\u03c4<sub>m1,0<\/sub>-\u03c4<sub>m2,0<\/sub>).<\/p>\n<p>To summarize, in anechoic conditions the whitened cross-spectrum of the received wavefront consists of a linear phase component that depends on frequency <em>f<\/em> and whose slope is a function of the time difference of arrival between the wavefronts \u00a0\u03c4<sub>m1,0<\/sub>-\u03c4<sub>m2,0<\/sub>. The figure below describes two impulse responses (h1 and h2) and the resulting (unwrapped) phase component of the cross-spectrum \u2220\u03a8<sub>m<\/sub>(f). What is the phase component\u2220\u03a8<sub>m<\/sub>(f) value when the wavefront arrives at\u00a0the microphones simultaneously, i.e.\u00a0\u03c4<sub>m1,0<\/sub>=\u03c4<sub>m2,0<\/sub>?<\/p>\n<p><a href=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_0.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-458\" src=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_0-300x225.png\" alt=\"rir_0\" width=\"300\" height=\"225\" srcset=\"https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_0-300x225.png 300w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_0-200x150.png 200w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_0-150x113.png 150w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_0.png 560w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><a href=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase0.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-459\" src=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase0-300x225.png\" alt=\"phase0\" 
width=\"300\" height=\"225\" srcset=\"https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase0-300x225.png 300w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase0-200x150.png 200w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase0-150x113.png 150w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase0.png 560w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<h3>Echoic model: one early\u00a0arrival<\/h3>\n<p>Let&#8217;s now consider a case where the first microphone&#8217;s RIR is anechoic and the second microphone receives one reflected wavefront. The RIRs can be written as:<br \/>\nH<sub>m1<\/sub>(f)=h<sub>m1,0<\/sub>exp(-j2\u03c0f\u03c4<sub>m1,0<\/sub>)<br \/>\nH<sub>m2<\/sub>(f)=h<sub>m2,0<\/sub>exp(-j2\u03c0f\u03c4<sub>m2,0<\/sub>) + h<sub>m2,1<\/sub>exp(-j2\u03c0f\u03c4<sub>m2,1<\/sub>),<br \/>\nwhere \u03c4<sub>m2,1<\/sub>\u00a0is the time of flight of the reflected wavefront, and h<sub>m2,1<\/sub>\u00a0is the amplitude of the reflected wavefront at microphone 2. 
The whitened cross-spectrum of these two signals becomes<\/p>\n<p>\u03a8<sub>m<\/sub>(f) = H<sub>m1<\/sub>(f)H<sup>*<\/sup><sub>m2<\/sub>(f)\/(|H<sub>m1<\/sub>(f)|\u00b7|H<sub>m2<\/sub>(f)|)<br \/>\n= \u03b1<sub>0<\/sub>(f)\u00b7exp(-j2\u03c0f(\u03c4<sub>m1,0<\/sub>-\u03c4<sub>m2,0<\/sub>))+\u03b1<sub>1<\/sub>(f)\u00b7exp(-j2\u03c0f(\u03c4<sub>m1,0<\/sub>-\u03c4<sub>m2,1<\/sub>)),<br \/>\nwhere\u00a0\u03b1<sub>0<\/sub>(f) is (h<sub>m1,0\u00a0<\/sub>\u00b7h<sub>m2,0<\/sub>)\/\u03b3, \u03b1<sub>1<\/sub>(f)\u00a0is (h<sub>m1,0<\/sub>\u00b7h<sub>m2,1<\/sub>)\/\u03b3, and\u00a0\u03b3=|h<sub>m1,0<\/sub>|\u00b7|h<sub>m2,0<\/sub>exp(-j2\u03c0f\u03c4<sub>m2,0<\/sub>) + h<sub>m2,1<\/sub>exp(-j2\u03c0f\u03c4<sub>m2,1<\/sub>)|<\/p>\n<p><a href=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-460\" src=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_1-300x225.png\" alt=\"rir_1\" width=\"300\" height=\"225\" srcset=\"https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_1-300x225.png 300w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_1-200x150.png 200w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_1-150x113.png 150w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_1.png 560w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><a href=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-461\" src=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase1-300x225.png\" alt=\"phase1\" width=\"300\" height=\"225\" srcset=\"https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase1-300x225.png 300w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase1-200x150.png 200w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase1-150x113.png 150w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase1.png 560w\" sizes=\"auto, (max-width: 300px) 100vw, 
300px\" \/><\/a><\/p>\n<p>The second arrival modulates the anechoic linear phase to an extent that depends on the relative amplitude \u03bb=h<sub>m2,1\u00a0<\/sub>\/\u00a0h<sub>m2,0<\/sub>\u00a0of the arrival (see the figure above).<\/p>\n<h3>Echoic model: multiple arrivals<\/h3>\n<p>Let&#8217;s now consider two more complex cases.<\/p>\n<ol>\n<li>Both RIRs have one reflected path, and the whitened\u00a0cross-spectrum consists of the cross-terms of each component:<br \/>\n\u03a8<sub>m<\/sub>(f)=\u03b1<sub>0<\/sub>(f)exp(-j2\u03c0f(\u03c4<sub>m1,0<\/sub>-\u03c4<sub>m2,0<\/sub>))+\u03b1<sub>1<\/sub>(f)exp(-j2\u03c0f(\u03c4<sub>m1,0<\/sub>-\u03c4<sub>m2,1<\/sub>))+\u03b1<sub>2<\/sub>(f)exp(-j2\u03c0f(\u03c4<sub>m1,1<\/sub>-\u03c4<sub>m2,0<\/sub>))+\u03b1<sub>3<\/sub>(f)exp(-j2\u03c0f(\u03c4<sub>m1,1<\/sub>-\u03c4<sub>m2,1<\/sub>))<br \/>\n<a href=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-462\" src=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_2-300x225.png\" alt=\"rir_2\" width=\"300\" height=\"225\" srcset=\"https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_2-300x225.png 300w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_2-200x150.png 200w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_2-150x113.png 150w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_2.png 560w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><a href=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-463\" src=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase2-300x225.png\" alt=\"phase2\" width=\"300\" height=\"225\" srcset=\"https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase2-300x225.png 300w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase2-200x150.png 200w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase2-150x113.png 150w, 
https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase2.png 560w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/li>\n<li>One RIR has three arrivals (the cross-spectrum similarly consists of all cross-terms):<br \/>\n<a href=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_3.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-464\" src=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_3-300x225.png\" alt=\"rir_3\" width=\"300\" height=\"225\" srcset=\"https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_3-300x225.png 300w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_3-200x150.png 200w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_3-150x113.png 150w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/RIR_3.png 560w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><a href=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase3.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-465\" src=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase3-300x225.png\" alt=\"phase3\" width=\"300\" height=\"225\" srcset=\"https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase3-300x225.png 300w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase3-200x150.png 200w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase3-150x113.png 150w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/phase3.png 560w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/li>\n<\/ol>\n<p><strong>Question:<\/strong> Modify the provided\u00a0<a href=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/11\/nonlinearPhase.zip\">Matlab script<\/a> to evaluate what happens when<br \/>\n1) the number of taps in the RIRs is increased;<br \/>\n2) the relative amplitude of the reflections is increased.<\/p>\n<h1>Particle filtering<\/h1>\n<p>As discussed in the text, particle filtering (PF) is a tracking method well suited to estimating a speaker trajectory from acoustic maps. 
Several versions of the PF exist. The basic method, known as the &#8220;bootstrap filter&#8221; or Sequential Importance Resampling (SIR) [1], is described in Algorithm 2, and the resampling step in Algorithm 1 (after [1] and [2]). An example Matlab implementation of Algorithm 1 is available <a href=\"http:\/\/project.inria.fr\/ssse\/files\/2017\/05\/resample.zip\">here<\/a>.<\/p>\n<p><a href=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/ch04-sir.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-296 alignleft\" src=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/ch04-sir-1024x465.png\" alt=\"ch04-sir\" width=\"543\" height=\"247\" srcset=\"https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/ch04-sir-1024x465.png 1024w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/ch04-sir-300x136.png 300w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/ch04-sir-768x349.png 768w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/ch04-sir-250x113.png 250w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/ch04-sir-150x68.png 150w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/ch04-sir.png 1062w\" sizes=\"auto, (max-width: 543px) 100vw, 543px\" \/><\/a><a href=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/res2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-506\" src=\"http:\/\/project.inria.fr\/ssse\/files\/2016\/02\/res2-290x300.png\" alt=\"\" width=\"335\" height=\"347\" srcset=\"https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/res2-290x300.png 290w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/res2-145x150.png 145w, https:\/\/project.inria.fr\/ssse\/files\/2016\/02\/res2.png 310w\" sizes=\"auto, (max-width: 335px) 100vw, 335px\" \/><\/a><\/p>\n<p>Typical models of the target dynamics (depending on the target state and the application scenario under investigation) are:<\/p>\n<ul>\n<li>Gaussian noise<\/li>\n<li>Langevin<\/li>\n<\/ul>\n<p>[1]\u00a0Arulampalam, M. S., Maskell, S., Gordon, N., &amp; Clapp, T. (2002). 
A tutorial on particle filters for online nonlinear\/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50(2), 174-188.<\/p>\n<p>[2] Stone, L. D., Streit, R. L., Corwin, T. L., &amp; Bell, K. L. (2014). Bayesian Multiple Target Tracking (2nd ed.). Artech House, p. 100.<\/p>\n<hr \/>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Exercises: Exercise 1: GCC-PHAT &amp; Acoustic Maps Given a speech signal of a static human source recorded in a real multi-channel acquisition set up, compute the GCC-PHAT focusing on: temporal evolution of the GCC-PHAT due to speech sparsity; behaviour of GCC-PHAT at different microphone pairs. Using the computed GCC-PHAT, derive\u2026<\/p>\n<p> <a class=\"continue-reading-link\" href=\"https:\/\/project.inria.fr\/ssse\/4-multichannel-speech-activity-detection-localization-and-tracking\/\"><span>Continue reading<\/span><i class=\"crycon-right-dir\"><\/i><\/a> <\/p>\n","protected":false},"author":983,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-176","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/project.inria.fr\/ssse\/wp-json\/wp\/v2\/posts\/176","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/project.inria.fr\/ssse\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/project.inria.fr\/ssse\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/project.inria.fr\/ssse\/wp-json\/wp\/v2\/users\/983"}],"replies":[{"embeddable":true,"href":"https:\/\/project.inria.fr\/ssse\/wp-json\/wp\/v2\/comments?post=176"}],"version-history":[{"count":65,"href":"https:\/\/project.inria.fr\/ssse\/wp-json\/wp\/v2\/posts\/176\/revisions"}],"predecessor-version":[{"id":512,"href":"https:\/\/project.inria.fr\/ssse\/wp-json\/wp\/v2\/posts\/176\/revisions\/512"}],"wp:at
tachment":[{"href":"https:\/\/project.inria.fr\/ssse\/wp-json\/wp\/v2\/media?parent=176"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/project.inria.fr\/ssse\/wp-json\/wp\/v2\/categories?post=176"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/project.inria.fr\/ssse\/wp-json\/wp\/v2\/tags?post=176"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}