Audio Musical Genre Classification Using Convolutional Neural Networks and Pitch and Tempo Transformations
Author: 黎立華
Publisher: —
Total Pages: 124
Release: 2010
OCLC: 713010663
Rating: 4/5 (63 Downloads)
Author: Shijia Geng
Publisher: —
Total Pages: 0
Release: 2016
OCLC: 1424826114
Rating: 4/5 (14 Downloads)
It is not difficult for most people to distinguish one music style from another; however, how the brain processes this seemingly simple task is still unknown. To shed light on this problem, and to explore ways of applying cutting-edge deep learning technology in the music engineering field, two tasks were conducted using convolutional neural networks (CNNs). CNNs, inspired by biological visual systems, have been widely used for image-related applications with great success, but they have rarely been applied in audio-related fields. In this thesis study, we examined the possibility of deploying a CNN in audio-related tasks and its potential as a creative music composition tool. The first task applied a CNN model with three convolutional and two fully connected layers to a binary music style classification task. The trained CNN is designed to distinguish a five-second Chinese popular music (C-pop) clip from a melodic death metal music (MDM) clip of the same duration, using the raw audio signal as input. With 4800 training examples and 20 epochs, it obtained about 80% accuracy on 480 testing examples. The second task was based on the trained CNN model and is analogous to the DeepDream visual project. The DeepDream project uses a CNN trained for a visual classification task to enhance the emergence of elements that may not exist in an input image. The resulting image has a dreamlike appearance and, depending on which CNN layer is used for the enhancement, the emerging elements differ: for lower layers the image shows more elementary shapes, and for higher layers more complete objects. Also, if a reference image is used to guide the modification, elements of the reference image are blended into the input. In this thesis study, similar procedures were applied to a randomly selected C-pop clip using the trained CNN model from the classification task.
The goal is to modify the input audio signal to increase the activations of a particular convolutional layer, so that extra elements stored in that layer emerge alongside the original audio signal. The resulting non-guided audio clips ranged hierarchically from bursts and pulses to a mix of the original C-pop with some metal textures, depending on the depth of the convolutional layer used. The guided audio clips gained some metal-style features but lost the original timing of dynamic changes.
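The thesis itself does not publish code, but its core operation, gradient ascent on a convolutional layer's activations with respect to the audio input, can be sketched in plain NumPy. Everything below (the random filter, the single ReLU conv layer, the step size, the signal length) is a hypothetical stand-in for the trained CNN described above, not the actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.standard_normal(256)      # stand-in for a short audio clip
kernel = rng.standard_normal(16)       # stand-in for a learned conv filter

def layer_activation(x, w):
    """Sum of ReLU-rectified responses of a 1-D 'valid' convolution."""
    resp = np.convolve(x, w, mode="valid")
    return np.maximum(resp, 0.0).sum()

def activation_gradient(x, w):
    """Analytic gradient of layer_activation with respect to the input x."""
    resp = np.convolve(x, w, mode="valid")
    mask = (resp > 0).astype(float)    # ReLU subgradient
    # Backprop through a 'valid' convolution: full convolution of the
    # mask with the flipped kernel recovers a length-len(x) gradient.
    return np.convolve(mask, w[::-1], mode="full")

step = 0.01
before = layer_activation(signal, kernel)
for _ in range(50):                    # gradient ascent on the input signal
    signal = signal + step * activation_gradient(signal, kernel)
after = layer_activation(signal, kernel)
print(after > before)
```

Repeating this with deeper layers of a real trained network, or adding a guide signal's activations to the objective, is what produces the hierarchical and guided variants the abstract describes.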
Author: Rosalind M. Davis
Publisher: —
Total Pages: 99
Release: 2018
OCLC: 1200244640
Rating: 4/5 (40 Downloads)
Since 2015, the music industry has experienced a resurgence driven by online music sales and streaming, which has in turn been facilitated by very large archives of musical data. These large archives, however, remain challenging to search and index effectively, due to the scale of the data involved and the subjective, perceptual nature of how humans relate to music. Contemporary research in music information retrieval seeks to bridge this gap by applying algorithmic analysis to features extracted from the underlying audio in order to automatically classify and identify perceptual features in music. This project applied three machine learning techniques (support vector classification, traditional neural networks, and convolutional neural networks) to two sets of audio features (Mel-frequency cepstral coefficients and the discrete wavelet transform) for genre classification. Because convolutional neural networks have been used on images to great effect, the discrete wavelet transform data was used to map audio into the image domain, in order to leverage publicly available, pre-trained weight sets for four large, sophisticated image recognition networks. For all tasks, two subsets of a large, publicly available musical dataset were used, along with multiple training and optimization techniques. While all models met or exceeded some pre-existing benchmarks for the genre classification task, support vector classification yielded better results (a best overall test-set accuracy of 61%) than either traditional neural networks (51.4%) or convolutional neural networks (40.5%) on an eight-genre multi-class classification task. Applying the pre-trained image recognition networks to audio wavelet data decreased training time, but did not yield accuracies comparable to those the networks achieved on image data.
The small size of the dataset relative to datasets in other domains, the reuse of data augmentation techniques intended for images, and sub-optimal feature extraction techniques are suggested as factors in the inability of the evaluated machine-learning models to match the quality of results observed in the image domain. Audio-native augmentation techniques and the use of ensemble models present worthwhile avenues for future investigation.
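The best-performing pipeline above, support vector classification over per-track feature vectors, can be illustrated with a small scikit-learn sketch. The Gaussian "genres" below are synthetic stand-ins for 13-dimensional MFCC summary features; the sample counts, kernel, and C value are arbitrary assumptions, not the project's actual configuration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_per_genre, n_mfcc = 200, 13

# Two synthetic "genres": Gaussian clouds with different means standing
# in for per-track mean MFCC vectors extracted from real audio.
genre_a = rng.normal(loc=-1.0, scale=1.0, size=(n_per_genre, n_mfcc))
genre_b = rng.normal(loc=+1.0, scale=1.0, size=(n_per_genre, n_mfcc))
X = np.vstack([genre_a, genre_b])
y = np.array([0] * n_per_genre + [1] * n_per_genre)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)
clf = SVC(kernel="rbf", C=1.0)         # RBF-kernel support vector classifier
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)            # held-out accuracy
print(round(acc, 2))
```

On real MFCCs the classes overlap far more than these well-separated clouds, which is why the project's eight-genre accuracy tops out near 61% rather than the near-perfect score this toy achieves.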
Author: Yi-Hsuan Yang
Publisher: CRC Press
Total Pages: 251
Release: 2011-02-22
ISBN-10: 143985047X
ISBN-13: 9781439850473
Rating: 4/5 (73 Downloads)
Providing a complete review of existing work on music emotion developed in psychology and engineering, Music Emotion Recognition explains how to account for the subjective nature of emotion perception when developing automatic music emotion recognition (MER) systems. Among the first publications dedicated to automatic MER, it begins with
Author: Alexander Lerch
Publisher: John Wiley & Sons
Total Pages: 273
Release: 2012-11-05
ISBN-10: 1118393503
ISBN-13: 9781118393505
Rating: 4/5 (05 Downloads)
With the proliferation of digital audio distribution over digital media, audio content analysis is fast becoming a requirement for designers of intelligent signal-adaptive audio processing systems. Written by a well-known expert in the field, this book provides quick access to different analysis algorithms and allows comparison between different approaches to the same task, making it useful for newcomers to audio signal processing and industry experts alike. A review of relevant fundamentals in audio signal processing, psychoacoustics, and music theory, as well as downloadable MATLAB files are also included. Please visit the companion website: www.AudioContentAnalysis.org
Author: Nikki Pelchat
Publisher: —
Total Pages: 0
Release: 2021
OCLC: 1339100408
Rating: 4/5 (08 Downloads)
Music recommendation systems have become popular in recent years with the increasing variety of music being produced and the sheer size of digital music collections available at the touch of a finger. Large collections of digital music are commonly organized using genre labels, and music genres are regularly used by recommendation systems to suggest new music to listeners. Classifying a large amount of music manually is difficult and time-consuming; for these reasons, the automatic classification of music by genre is a crucial task. Automatically classifying music by genre using machine learning can be quicker, and arguably more accurate, than doing it manually. Using neural networks for generic classification tasks is a well-researched area within machine learning, and in recent years the classification of music by genre has become part of the same problem domain. Approaches differing in song libraries, machine learning techniques, input formats, and types of neural networks have met with varying levels of success. This thesis implements a convolutional neural network that classifies music by genre through the examination of spectrogram images. It concentrates on three specific types of spectrogram input (linear, logarithmic, and Mel-scaled spectrograms), as well as several input variables and neural network learning techniques, to determine the effect they have on the overall accuracy of the genre classification network. This thesis demonstrates these convolutional neural network techniques for music genre classification and assesses their viability and accuracy.
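The three spectrogram types the thesis compares all start from the same short-time Fourier transform; the Mel-scaled variant additionally pools FFT bins through triangular filters spaced evenly on the Mel scale. A minimal NumPy sketch follows; the window length, hop size, filter count, and test tone are arbitrary choices for illustration, not the thesis's settings:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters with centers spaced evenly on the Mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):                  # rising slope
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):                 # falling slope
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(x, sr, n_fft=512, hop=256, n_mels=40):
    window = np.hanning(n_fft)
    frames = [x[s:s + n_fft] * window
              for s in range(0, len(x) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # linear spectrogram
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T  # Mel pooling
    return np.log1p(mel)                               # log compression

sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)       # one second of A440
spec = mel_spectrogram(tone, sr)
print(spec.shape)                          # (time frames, mel bands)
```

Dropping the Mel pooling gives the linear spectrogram, and applying the log to the unpooled power spectrum gives the logarithmic variant; the CNN then treats whichever 2-D array is chosen as an image.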
Author: Jean-Pierre Briot
Publisher: Springer
Total Pages: 284
Release: 2019-11-08
ISBN-10: 3319701630
ISBN-13: 9783319701639
Rating: 4/5 (39 Downloads)
This book is a survey and analysis of how deep learning can be used to generate musical content. The authors offer a comprehensive presentation of the foundations of deep learning techniques for music generation. They also develop a conceptual framework used to classify and analyze various types of architecture, encoding models, generation strategies, and ways to control the generation. The five dimensions of this framework are: objective (the kind of musical content to be generated, e.g., melody, accompaniment); representation (the musical elements to be considered and how to encode them, e.g., chord, silence, piano roll, one-hot encoding); architecture (the structure organizing neurons, their connections, and the flow of their activations, e.g., feedforward, recurrent, variational autoencoder); challenge (the desired properties and issues, e.g., variability, incrementality, adaptability); and strategy (the way to model and control the process of generation, e.g., single-step feedforward, iterative feedforward, decoder feedforward, sampling). To illustrate the possible design decisions and to allow comparison and correlation analysis, they analyze and classify more than 40 systems, and they discuss important open challenges such as interactivity, originality, and structure. The authors have extensive knowledge and experience in all related research, technical, performance, and business aspects. The book is suitable for students, practitioners, and researchers in the artificial intelligence, machine learning, and music creation domains. The reader does not require any prior knowledge about artificial neural networks, deep learning, or computer music. The text is fully supported with a comprehensive table of acronyms, bibliography, glossary, and index, and supplementary material is available from the authors' website.
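As a concrete illustration of the "representation" dimension above, a melody can be encoded as a piano roll with a one-hot column per time step. The pitch range and the melody itself are arbitrary examples, and the MIDI note numbering is the usual convention, not a specific system from the book:

```python
import numpy as np

pitch_range = (60, 72)                  # C4..B4, an arbitrary one-octave span
melody = [60, 62, 64, 65, 67]           # C D E F G, as MIDI note numbers

# Piano roll: rows are pitches, columns are time steps; a monophonic
# melody yields exactly one active pitch (a one-hot vector) per column.
n_pitches = pitch_range[1] - pitch_range[0]
roll = np.zeros((n_pitches, len(melody)), dtype=int)
for step, note in enumerate(melody):
    roll[note - pitch_range[0], step] = 1

print(roll.sum(axis=0))                 # one active note per time step
```

Polyphony, silence, and note duration each require extending this scheme (multiple ones per column, an all-zero column, or an explicit hold symbol), which is exactly the kind of encoding decision the framework's representation dimension covers.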
Author: Albert S. Bregman
Publisher: MIT Press
Total Pages: 800
Release: 1994-09-29
ISBN-10: 0262521954
ISBN-13: 9780262521956
Rating: 4/5 (54 Downloads)
Auditory Scene Analysis addresses the problem of hearing complex auditory environments, using a series of creative analogies to describe the process required of the human auditory system as it analyzes mixtures of sounds to recover descriptions of individual sounds. In a unified and comprehensive way, Bregman establishes a theoretical framework that integrates his findings with an unusually wide range of previous research in psychoacoustics, speech perception, music theory and composition, and computer modeling.
Author: Meinard Müller
Publisher: Springer Nature
Total Pages: 495
Release: 2021-04-09
ISBN-10: 3030698084
ISBN-13: 9783030698089
Rating: 4/5 (89 Downloads)
The textbook provides both profound technological knowledge and a comprehensive treatment of essential topics in music processing and music information retrieval (MIR). Including numerous examples, figures, and exercises, this book is suited for students, lecturers, and researchers working in audio engineering, signal processing, computer science, digital humanities, and musicology. The book consists of eight chapters. The first two cover foundations of music representations and the Fourier transform—concepts used throughout the book. Each of the subsequent chapters starts with a general description of a concrete music processing task and then discusses—in a mathematically rigorous way—essential techniques and algorithms applicable to a wide range of analysis, classification, and retrieval problems. By mixing theory and practice, the book’s goal is to offer detailed technological insights and a deep understanding of music processing applications. As a substantial extension, the textbook’s second edition introduces the FMP (fundamentals of music processing) notebooks, which provide additional audio-visual material and Python code examples that implement all computational approaches step by step. Using Jupyter notebooks and open-source web applications, the FMP notebooks yield an interactive framework that allows students to experiment with their music examples, explore the effect of parameter settings, and understand the computed results by suitable visualizations and sonifications. The FMP notebooks are available from the author’s institutional web page at the International Audio Laboratories Erlangen.
Author: Niraj Kumar
Publisher: Springer Nature
Total Pages: 838
Release: 2021-04-12
ISBN-10: 9811599564
ISBN-13: 9789811599569
Rating: 4/5 (69 Downloads)
This book comprises the select proceedings of the International Conference on Future Learning Aspects of Mechanical Engineering (FLAME) 2020. This volume focuses on several emerging interdisciplinary areas involving mechanical engineering. Some of the topics covered include automobile engineering, mechatronics, applied mechanics, structural mechanics, hydraulic mechanics, human vibration, biomechanics, biomedical instrumentation, ergonomics, biodynamic modeling, nuclear engineering, and agriculture engineering. The contents of this book will be useful for students, researchers, and professionals interested in interdisciplinary topics of mechanical engineering.