Abstract
Music production is the creative process of composing, recording, and manipulating audio to create a piece of music. Within the music production workflow, composing and recording concern the creation of initial musical material, while manipulation covers the processing and refinement that yield the final version. Accordingly, deep learning methods have been introduced into music production, especially for music generation and processing. The overarching research questions that have guided my research are: How has AI technology been involved in music production? How can AI tools better support the music production pipeline? (Chapters 5 and 6) How do we evaluate new tools, and do they produce the desired results from technical and artistic perspectives? (Chapter 7) What concerns are raised by this development, and how do we account for differing attitudes? (Chapter 8) And what technical and conceptual challenges are we currently facing?
This thesis explores deep learning architectures in three respects: music signal generation, music signal processing, and the evaluation of music samples output by neural networks. In other words, we investigate deep learning at different steps of the music production workflow and then assess the resulting music from musical and artistic perspectives.
Firstly, in terms of music generation, generating music with pitch and rhythm has received ample attention from researchers; by contrast, noise music, as a special genre, has been largely ignored. In this project, we therefore construct a deep learning architecture, a variational autoencoder (VAE), to generate noise music in the style of Merzbow. The results show that our proof-of-concept system can generate noise music.
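The abstract does not detail the architecture; purely for illustration, a minimal sketch of a VAE over magnitude-spectrogram frames, with hypothetical layer sizes and a standard reparameterised latent, might look like:

```python
# Minimal VAE sketch for spectrogram frames (hypothetical dimensions; the
# thesis's actual architecture and training data are not specified here).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectrogramVAE(nn.Module):
    def __init__(self, n_bins: int = 513, latent_dim: int = 64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_bins, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, n_bins), nn.Softplus())  # non-negative magnitudes

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterisation trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def vae_loss(x_hat, x, mu, logvar):
    recon = F.mse_loss(x_hat, x, reduction="sum")                   # reconstruction
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # KL divergence
    return recon + kld
```

Sampling latent vectors and decoding them yields novel spectrogram frames, which an inversion step (e.g., Griffin-Lim) would turn back into audio.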
Secondly, for music processing, we investigate music dereverberation using a proposed neural network architecture. Reverberation is an audio effect widely used in music production, affecting the audio spectrum, timbre, and sense of space. Technical ear training in how reverberation acts on music samples is crucial for sound engineers: to achieve a desired texture, they must be able to perceive the subtle audio changes that reverberation introduces. However, “dry” recordings, i.e., those without reverberation, are not universally available to music production students for practice. This thesis proposes a deep learning-based music dereverberation method that generates dereverbed music samples for technical ear training in music production education. Experimental results across various objective evaluation metrics show that the proposed method realises dereverberation more effectively than other neural network-based methods.
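The abstract names “various objective evaluation metrics” without listing them; as an illustrative assumption, one metric commonly used for this task is the scale-invariant signal-to-distortion ratio (SI-SDR) between the dereverbed output and the dry reference:

```python
# Sketch of one common objective metric for dereverberation quality.
# SI-SDR is an assumption here; the thesis's exact metric set is not
# listed in the abstract.
import numpy as np

def si_sdr(estimate: np.ndarray, reference: np.ndarray) -> float:
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to isolate the target component.
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    noise = estimate - target
    return 10 * np.log10(np.dot(target, target) / np.dot(noise, noise))
```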
Thirdly, we present an approach to evaluating music samples output from a commercial AI reverberation tool through objective analysis metrics and perceptual listening tests. For the objective metrics, we extract audio features along dynamic and spectral dimensions to compare target and processed music samples. For the perceptual side, two listening tests (N = 10, N = 33) were conducted via questionnaires in which participants rated and assessed audio samples, and perceptual ratings on semantic scales were collected. Logistic regression carried out on the two datasets in parallel showed that an increase in perceived Wetness, or a decrease in perceived Clarity, was associated with a higher probability that the reverberation was made by AI rather than a human. Among the extracted audio features, lower Brightness, Rolloff, and Centroid, all indicators of a darker, low-frequency-emphasised sound, were associated with AI-made samples. This study contributes to understanding the differences between AI- and human-generated audio effects in music production.
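For illustration only, a minimal sketch of this analysis pipeline, extracting two of the named spectral features and fitting a logistic regression to predict AI versus human origin, could look like the following; the stand-in signals and labels are hypothetical, not the study's data:

```python
# Sketch: spectral feature extraction + logistic regression on AI/Human labels.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def spectral_features(y: np.ndarray, sr: int) -> np.ndarray:
    """Mean spectral centroid and rolloff, two of the features named above."""
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr).mean()
    return np.array([centroid, rolloff])

# Hypothetical stand-in data: white noise ("bright") vs. low-passed noise
# ("dark"); in the study these would be the processed music samples.
rng = np.random.default_rng(0)
sr = 22050
bright = rng.standard_normal(sr).astype(np.float32)
dark = np.convolve(bright, np.ones(32) / 32, mode="same")  # crude low-pass

X = np.array([spectral_features(bright, sr), spectral_features(dark, sr)])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise before fitting
y = np.array([0, 1])  # hypothetical labels: 1 = AI-made reverberation

model = LogisticRegression().fit(X, y)
# Negative coefficients indicate that lower (darker) feature values raise
# the predicted probability of the AI-made class.
print(model.coef_)
```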
Lastly, rather than focusing on deep learning systems, we use a philosophical theory from Simondon to reflect on the current state of AI music mastering and to analyse why people hold different attitudes towards the changes and developments that AI technology brings to music mastering.
| Date of Award | 15 Oct 2024 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | P. M. Lindborg (Supervisor) & Ryo IKESHIRO (Co-supervisor) |