Natural Sound Statistics in Auditory Scene Analysis


Student thesis: Doctoral Thesis

Award date: 30 Sep 2020


We encounter a wide variety of sounds every day, and most of the time they reach us from a multitude of sources rather than a single one. In such a complex auditory environment, the brain's ability to segregate the flux of incoming sound into separate auditory sources, or streams, plays a crucial role in auditory perception. How the brain represents different sound objects during auditory scene analysis remains an open area of research. Among natural sounds, "sound textures" have recently been recognized as an important class. Textures are stochastic streams of sound that are temporally homogeneous, i.e. their statistical properties do not vary significantly over time. Typical examples include the noise made by waves on a beach or the buzzing of a swarm of insects. Such sound textures are easily identified and segregated into foreground or background sounds in the course of scene analysis, suggesting that the auditory system must be sensitive to the statistical features that make sound textures identifiable and discriminable.

In a recent psychoacoustic study, McDermott and Simoncelli [2011] described methods for synthesizing naturalistic sounds from white noise by systematically imposing statistical features: the mean, variance, skew, and kurtosis of the envelope amplitudes in cochlear frequency channels, the correlations between frequency channels, and the modulation power. However, how neurons at mid- and higher-level auditory stations encode or represent these statistical features is not known in detail. Moreover, the space of all theoretically possible sound textures is huge, which makes exploring it in a systematic or representative way a challenging task. My thesis therefore has two objectives:

1. To compile and survey a sufficiently large corpus of natural sound textures to estimate the distributions of statistical features that are typically found in our environment, given that knowledge of these distributions will enable us to explore the sensitivity of the auditory system in a systematic manner.

2. To characterize the sensitivity of neurons in the auditory pathway to statistical features, using synthetic stimuli selected to form a representative sample of the "natural sound texture space" characterized in objective 1.
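The envelope statistics named above (per-channel mean, variance, skew, and kurtosis, plus correlations between channels) can be illustrated with a small sketch. This is not the analysis code used in the thesis or in McDermott and Simoncelli's study: rectangular FFT bands stand in for a cochlear filterbank, the band edges are arbitrary, modulation power is omitted, and all function names are hypothetical.

```python
import numpy as np


def bandpass_fft(x, fs, lo, hi):
    """Keep only the frequency content in [lo, hi) Hz (crude rectangular band)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=x.size)


def envelope(x):
    """Amplitude envelope as the magnitude of the analytic signal."""
    n = x.size
    spec = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spec * weights))


def envelope_statistics(x, fs, band_edges):
    """Marginal moments of each band envelope and cross-band correlations.

    band_edges is an assumed list of band boundaries in Hz; consecutive
    pairs define one band each.
    """
    envs = np.array([
        envelope(bandpass_fft(x, fs, lo, hi))
        for lo, hi in zip(band_edges[:-1], band_edges[1:])
    ])
    mean = envs.mean(axis=1)
    centered = envs - mean[:, None]
    variance = (centered ** 2).mean(axis=1)
    skew = (centered ** 3).mean(axis=1) / variance ** 1.5
    kurtosis = (centered ** 4).mean(axis=1) / variance ** 2
    return {
        "mean": mean,
        "variance": variance,
        "skew": skew,
        "kurtosis": kurtosis,
        "correlation": np.corrcoef(envs),  # rows are bands
    }


# Illustrative input: one second of white noise at an assumed 16 kHz rate.
rng = np.random.default_rng(0)
noise = rng.standard_normal(16000)
stats = envelope_statistics(noise, fs=16000, band_edges=[100, 400, 1600, 6400])
print({name: np.round(value, 3) for name, value in stats.items()})
```

A texture synthesis procedure of the kind described would start from noise such as this and adjust it until these measured values match those of a target sound.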

To address the first objective, I collected a corpus of 200 natural sounds and established a statistical framework based on principal component analysis to explore the natural sound texture space. I found that the many statistical parameters describing this space are largely redundant, so that the space can be explored efficiently with only a few of them. To address the second objective, I selected a set of sounds from the corpus, which I call representative textures. I resynthesized these sounds from white noise, systematically morphing and imposing their statistics in a hierarchical fashion, to generate a set of synthetic stimuli, or morphed textures, with which to explore the impact of the different statistics. I used these morphed textures in electrophysiological recordings from the inferior colliculus (IC) and auditory cortex (AC) of young adult female Wistar rats. Subsequent analysis revealed that more than 70% of IC neurons during the onset response and ~30% of auditory cortical neurons are sensitive only to the power-to-variance statistical transition present in natural sound textures. Auditory cortical neurons remain insensitive to the other transitions. In contrast, ~2–30% of IC neurons are sensitive to other statistical transitions during the onset response. During the ongoing response, ~10–90% of IC neurons are sensitive, whereas only ~2% of cortical neurons are sensitive, and only to modulation power.
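The redundancy finding from the principal component analysis can be illustrated with a minimal sketch. The abstract does not describe the implementation, so everything here is an assumption made for illustration: the feature matrix layout (one row per sound, one column per statistical parameter), the synthetic data, the 90% variance threshold, and the function names.

```python
import numpy as np


def pca_explained_variance(features):
    """Return the explained-variance ratio and principal axes of a feature matrix.

    features: array of shape (n_sounds, n_parameters). Columns are
    standardized first so that parameters with different units are comparable.
    """
    X = np.asarray(features, dtype=float)
    X = X - X.mean(axis=0)
    scale = X.std(axis=0)
    scale[scale == 0] = 1.0  # leave constant columns untouched
    X = X / scale
    _, singular_values, axes = np.linalg.svd(X, full_matrices=False)
    variance = singular_values ** 2
    return variance / variance.sum(), axes


def components_needed(ratio, threshold=0.9):
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    count = int(np.searchsorted(np.cumsum(ratio), threshold)) + 1
    return min(count, len(ratio))


# Synthetic stand-in for a texture corpus: 200 "sounds" described by 50
# parameters that are in fact driven by only 3 hidden factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 3))
mixing = rng.standard_normal((3, 50))
features = latent @ mixing + 0.05 * rng.standard_normal((200, 50))

ratio, axes = pca_explained_variance(features)
print("components for 90% of variance:", components_needed(ratio))
```

When many parameters co-vary, as in this synthetic example, a handful of components carries most of the variance, which is the sense in which a high-dimensional parameter space can be explored with only a few dimensions.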