Virtual 3D sound synthesis aims to create a three-dimensional perception of a
sound using only two earphones. Compared with physical 3D sound synthesis, which
requires multiple speakers placed at designated positions, virtual 3D sound synthesis
has advantages in a wide range of applications, such as mobile entertainment devices
(MP3, MP4, etc.), human aid systems, computer games and military simulations.
Head-related impulse response (HRIR), which captures the filtering effects of the human
torso, head and pinna on a sound propagating from a specific spatial position to
the eardrum of a listener, is the core component of virtual 3D sound synthesis. Using the
measured HRIRs, a vivid 3D sound illusion can be created with sounds produced by two
transducers positioned at the listener's ears. However, the measured HRIR,
which is a function of time, elevation and azimuth and varies from subject to subject, forms a large
dataset. Moreover, the tedious measurement procedures and the special equipment required
have made experimental measurement impractical for adoption in commercial
applications.
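The binaural rendering described above amounts to convolving a mono source with the left- and right-ear HRIRs for the desired direction. The following minimal sketch illustrates the idea; the function name and the 4-tap filters are illustrative, not taken from the thesis.

```python
import numpy as np

def spatialize(mono, hrir_left, hrir_right):
    """Render a mono signal at the direction where the HRIR pair
    was measured, by per-ear convolution."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])  # shape: (2, len(mono) + len(hrir) - 1)

# Toy example with made-up 4-tap HRIRs
mono = np.array([1.0, 0.5, 0.25])
hl = np.array([0.9, 0.1, 0.0, 0.0])   # nearer ear: stronger, earlier
hr = np.array([0.0, 0.0, 0.4, 0.05])  # farther ear: attenuated, delayed
binaural = spatialize(mono, hl, hr)
```

In practice each spatial direction (and each subject) needs its own HRIR pair, which is exactly why the dataset grows so large and why the compression schemes below matter.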
It has long been desirable to generate individualized HRIRs more efficiently
and to reduce the storage requirement and computational complexity of real-time
virtual 3D sound synthesis. In this thesis, a hybrid implementation scheme
combining principal component analysis (PCA) with balanced model truncation
(BMT) is proposed to reduce both the computational complexity and the storage
requirement. A grouping strategy is developed to divide HRIRs into several groups
according to their similarities so that PCA performs better. This implementation
scheme has a potential advantage in cases where there are multiple sound sources:
its computational complexity hardly increases as sound sources are
added. A common factor decomposition (CFD) algorithm with IIR modeling
of the directional factor is also proposed to improve the performance of the virtual
sound system. A two-dimensional common factor decomposition (2D-CFD) algorithm is
further developed to represent the three-dimensional (time, elevation and azimuth) HRIR
dataset with a set of elevation-dependent impulse responses and a set of
azimuth-dependent impulse responses, reducing the storage requirement. Common pole IIR
(CP-IIR) filter modeling is further used to simplify the computation. The proposed
algorithm is much more efficient and produces lower distortion than other
algorithms in the literature. However, as the size of the dataset obtained by 2D-CFD
and CP-IIR modeling depends on the spatial resolution of measurement, the storage
requirement is still large if HRIRs are measured at a high resolution. To avoid the
expansion in dataset size caused by increasing the spatial resolution of measurement, a
continuous function model is proposed that represents the measured HRIRs as an IIR filter
whose coefficients are low-order harmonic functions of elevation and azimuth. The
continuous function model reduces the HRIR storage dramatically, and the memory
requirement does not increase even if HRIRs are measured at a higher spatial
resolution. By applying an efficient method for harmonic function calculation, the
proposed model requires comparatively low computational complexity.
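To make the PCA half of the hybrid scheme concrete, the sketch below applies plain PCA (via SVD) to a randomly generated stand-in for an HRIR dataset: each HRIR is approximated as a weighted sum of a few shared basis responses. This is only the generic PCA step under assumed array shapes; it does not reproduce the thesis's grouping strategy or the BMT stage.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical HRIR dataset: 72 azimuths x 14 elevations, 128 taps each
hrirs = rng.standard_normal((72 * 14, 128))

# PCA via SVD of the mean-centred data
mean = hrirs.mean(axis=0)
centred = hrirs - mean
U, s, Vt = np.linalg.svd(centred, full_matrices=False)

k = 16                       # number of retained principal components
basis = Vt[:k]               # k shared basis responses (128 taps each)
weights = centred @ basis.T  # k direction-dependent weights per HRIR

# Reconstruction from the reduced representation
approx = weights @ basis + mean
err = np.linalg.norm(hrirs - approx) / np.linalg.norm(hrirs)
```

Instead of storing 1008 filters of 128 taps, one stores 16 basis responses, the mean, and 16 weights per direction; with multiple sources, the per-source cost reduces to mixing weighted inputs before filtering through the shared basis.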
For HRIR customization, the 2D-CFD algorithm is further applied to an HRIR dataset
that contains HRIRs of multiple subjects at multiple directions, extracting a set
of direction-dependent impulse responses (DDIRs) that are common to all subjects.
A subject-dependent impulse response (SDIR) is extracted for each subject
simultaneously to capture the subject-dependent information contained in the HRIRs.
Such modeling not only reduces the dimensionality of the HRIR dataset but also
allows a whole set of HRIRs to be customized by customizing a single SDIR. Two
methods are proposed to calculate a target subject's SDIR for customization. In the
first method, joint support vector regression (JSVR) is applied to train a nonlinear
model that predicts a target subject's SDIR from his or her anthropometric parameters.
In the second method, the target subject's SDIR is extracted from several sampled
HRIR measurements of the subject. The derived SDIR is then convolved with the
trained DDIRs to construct the whole set of HRIRs of the target subject. Listening
tests show that both methods can generate HRIRs similar to the measured ones.
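The final reconstruction step described above, convolving one SDIR with the shared DDIRs to recover a subject's full HRIR set, can be sketched as follows; the shapes and filter values are illustrative assumptions, not the thesis's trained data.

```python
import numpy as np

def reconstruct_hrirs(sdir, ddirs):
    """Rebuild a subject's full HRIR set from one subject-dependent
    impulse response (SDIR) and the shared direction-dependent
    impulse responses (DDIRs), one convolution per direction."""
    return np.array([np.convolve(sdir, d) for d in ddirs])

sdir = np.array([1.0, -0.3, 0.05])      # hypothetical 3-tap SDIR
ddirs = np.full((5, 8), 0.1)            # 5 directions, 8-tap DDIRs
hrirs = reconstruct_hrirs(sdir, ddirs)  # one 10-tap HRIR per direction
```

Customization then only has to produce the short SDIR (by JSVR prediction or from a few sampled measurements), while the DDIRs are trained once and shared across all subjects.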
| Date of Award | 3 Oct 2014 |
|---|---|
| Original language | English |
| Awarding Institution | City University of Hong Kong |
| Supervisor | Cheung Fat CHAN (Supervisor) |
- Auditory perception
- Surround-sound systems
Head-related transfer function modeling and customization
WANG, Z. (Author). 3 Oct 2014
Student thesis: Doctoral Thesis