As part of the LOCATA Challenge, an extensive data corpus is released, targeted at sound source localization and tracking in general and at the above 6 tasks in particular. The corpus is open access and distributed under the Open Data Commons license. It aims to cover a wide range of scenarios encountered in acoustic signal processing, with an emphasis on dynamic scenarios. All recordings contained in the corpus were made in a realistic, reverberant acoustic environment. Ground truth positions, trajectories, and orientations of sources and sensors were obtained by means of an optical tracking system (OptiTrack) that uses 10 infrared cameras to localize and track moving objects. Ground truth positional data are made available to the participants. Ground truth positions of the sources will be used for evaluation of the challenge results, and released as part of the data corpus after completion of the challenge. Because the OptiTrack system is a fixed installation, recordings are limited to a single room. To ensure different acoustic conditions between recordings, source-sensor distances and angles were changed, thereby enforcing varying Direct-to-Reverberant Ratios (DRRs) between the recordings.
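To illustrate why changing the source-sensor distance changes the DRR, the following minimal sketch estimates the DRR from a single-channel room impulse response. It is not part of the LOCATA corpus or tools: the function name, the 2.5 ms direct-path window, and the synthetic impulse responses are all assumptions chosen for illustration, and other DRR definitions use different window lengths.

```python
import numpy as np

def direct_to_reverberant_ratio(rir, fs, window_ms=2.5):
    """Estimate the DRR in dB from a single-channel room impulse response.

    Assumption: the largest absolute sample marks the direct path, and the
    direct sound is contained in +/- window_ms around that sample.
    """
    rir = np.asarray(rir, dtype=float)
    peak = int(np.argmax(np.abs(rir)))
    half = int(round(window_ms * 1e-3 * fs))
    start = max(0, peak - half)
    stop = min(len(rir), peak + half + 1)
    direct_energy = np.sum(rir[start:stop] ** 2)
    reverberant_energy = np.sum(rir[stop:] ** 2)
    return 10.0 * np.log10(direct_energy / reverberant_energy)

# Synthetic example: the same decaying noise tail with a strong direct path
# (source close to the sensor) and a weak one (source further away).
fs = 48000
rng = np.random.default_rng(0)
t = np.arange(int(0.5 * fs)) / fs
tail = 0.01 * rng.standard_normal(t.size) * np.exp(-t / 0.1)

near = tail.copy()
near[100] += 1.0    # hypothetical direct-path amplitude, short distance
far = tail.copy()
far[100] += 0.25    # hypothetical direct-path amplitude, larger distance

drr_near = direct_to_reverberant_ratio(near, fs)
drr_far = direct_to_reverberant_ratio(far, fs)
print(f"near: {drr_near:.1f} dB, far: {drr_far:.1f} dB")
```

With the reverberant tail held fixed, the weaker direct path yields the lower DRR, which is the effect exploited by varying distances and angles within one room.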
Tasks 1 and 2, involving static loudspeakers, are based on the CSTR VCTK database. The VCTK database provides over 400 newspaper sentences spoken by 109 native English talkers, recorded in a semi-anechoic environment at 96 kHz and down-sampled to 48 kHz. The database is distributed under the Open Data Commons license, which permits open access for participants. As a result, the challenge corpus is also distributed under the Open Data Commons license to facilitate open access. Tasks 3 to 6 use speech recordings of live talkers reading randomly selected VCTK sentences. The talkers were equipped with DPA microphones near their mouths to record the close-talking speech signals. Participants are provided with the close-talking speech signals only for the development phase. The corresponding signals for the evaluation dataset are released as part of the corpus once the challenge is completed. These recordings are representative of practical challenges, including natural speech inactivity during sentences, sporadic utterances, and dialogues between talkers.
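One possible use of the close-talking signals during development, which the challenge does not prescribe, is to mark the periods of speech inactivity mentioned above. The sketch below is a simple frame-energy detector, given as an assumption-laden illustration only: the function name, the 20 ms frame length, and the -40 dB threshold are hypothetical choices, and the test signal is synthetic noise rather than LOCATA data.

```python
import numpy as np

def frame_activity(signal, fs, frame_ms=20.0, threshold_db=-40.0):
    """Flag frames whose energy lies within threshold_db of the loudest frame.

    Assumption: the close-talking signal is clean enough that frame energy
    alone separates speech from pauses.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(frame_ms * 1e-3 * fs))
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    return energy_db > (energy_db.max() + threshold_db)

# Synthetic stand-in: 1 s "speech", 0.5 s near-silent pause, 1 s "speech".
fs = 48000
rng = np.random.default_rng(0)
speech_like = rng.standard_normal(fs)
pause = 1e-4 * rng.standard_normal(fs // 2)
toy_signal = np.concatenate([speech_like, pause, speech_like])

active = frame_activity(toy_signal, fs)
print(f"{int(active.sum())} of {active.size} frames flagged as active")
```

A relative threshold is used here so that the sketch does not depend on the absolute recording level; a real detector would need to be tuned on the development data.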
Acoustic Sensor Configurations
The following microphone arrays were used for the recordings:
- Planar microphone array with 15 microphones that includes several uniform linear sub-arrays
- 32-channel spherical Eigenmike manufactured by mh acoustics
- 12-channel pseudo-spherical microphone array integrated in the prototype head of the humanoid robot NAO
- Binaural recordings from a pair of hearing aid dummies (Siemens Signia) mounted on a dummy head (HEAD acoustics)
These recordings are representative of practical challenges, including variation in orientation, position, and speed of the microphone arrays as well as the talkers. Detailed information about the room acoustics (such as the DRR) is not provided to the participants, in order to stimulate the development of algorithms that require a minimum of a priori information. Measurements of the head-related transfer functions (HRTFs) / steering vectors for the equipment used for the LOCATA recordings are not provided as part of the LOCATA data corpus, but participants are allowed to use any measured or simulated HRTFs that are available in the public domain or have been acquired using the participant’s own equipment. For example, simulated generalized head-related transfer functions (GHRTFs) for the robot prototype head used for the recordings are provided at http://www.ee.bgu.ac.il/~acl/register.php. Measured head-related impulse responses (HRIRs) for a robot prototype head that is manufactured to the same specifications as the one used for the LOCATA Challenge are provided at https://robot-ears.eu/category/data-sets-and-videos/.
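Where no measured steering vectors are available, a common fallback for open arrays is a free-field, far-field model. The sketch below computes such a steering vector for a uniform linear array, as found among the sub-arrays of the planar array. It is an illustration under stated assumptions, not a description of the LOCATA hardware: the microphone count, the 4 cm spacing, the speed of sound, and the sign convention of the phase are all hypothetical choices, and the model ignores scattering, so it does not apply to the head-mounted or robot arrays.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def ula_steering_vector(n_mics, spacing_m, freq_hz, azimuth_rad):
    """Free-field, far-field steering vector of a uniform linear array.

    The azimuth is measured from broadside (perpendicular to the array axis).
    The phase reference is the array centre; the sign convention is one of
    several in use.
    """
    positions = (np.arange(n_mics) - (n_mics - 1) / 2.0) * spacing_m
    delays = positions * np.sin(azimuth_rad) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)

# Hypothetical 5-microphone sub-array with 4 cm spacing, evaluated at 1 kHz.
broadside = ula_steering_vector(5, 0.04, 1000.0, 0.0)
endfire = ula_steering_vector(5, 0.04, 1000.0, np.pi / 2)
print(np.round(np.angle(endfire), 3))
```

At broadside all microphones receive the wavefront simultaneously, so every entry equals one; towards endfire the phase advances linearly along the array.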
The LOCATA data corpus can be downloaded via this link.