Musical Tour in programming

Table of Contents

Cool Things
Working with audio
1. The anatomy of audio samples
My visualizer - Murl
1. Visualizer Demo
Conclusion
References

The Audio Industry

To be completely honest, I had no real knowledge of this field, nor the curiosity to find out that it even exists, but YouTube happened to recommend this talk¹ from CppCon 2015 by Timur Doumler.

This area intrigues me because it (like a lot of other fields) isn’t really “well known”. Much of the software that you use on a daily basis has a sound engine in it. From viewing YouTube videos in your browser and streaming music on your phone to the small QOL audio feedback you get when interacting with software on your OS, all of it has been carefully engineered to talk to the actual audio hardware. This software is key in our lives and even to the livelihoods of music producers, DJs, musicians, and artists, just to name a few.

Cool Things

What really interests me are the new and upcoming “digital” instruments and synthesizers that have gone into production in the last few years.


Roli Seaboard²

The Seaboard is a 3D touch surface with an audio engine running on an embedded system. It deals with extremely high-quality instrument samples that have to be played back in real time with < 2 ms latency.

Working with audio

Anyway, coming back to what I am actually able to explain, working with audio comes down to these key requirements:

Fast & Efficient Digital Signal Processing
Lock-free thread synchronization
Cross-platform support

Hence, the obvious question is: what is the best way to go about it? C++ is a good choice. Most of these drivers and APIs are written in C, so they are easy to interface with from C++, which is the language I will be talking about.

There are conventions and best practices for using these APIs, depending on your needs. Audio data can be represented in a lot of ways, but this is the most common one that I have come across:

Audio data is represented as a sequence of amplitude values over time for each channel, played back at the sample rate.

We interact with this data through some audio callbacks that usually run on a separate high priority thread. These are used to generate as well as process the audio and typically have the following signature:

/**
 * @brief Audio callback
 * @param userdata Pointer to user specified data shared by the audio thread
 * @param stream The audio stream to write or read from.
 * @param len The number of bytes to write or read.
 */
void audio_callback(void *userdata, uint8_t *stream, uint32_t len) noexcept;
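To make the signature a bit more concrete, here is a minimal sketch of a callback that generates a sine tone. It assumes the device was opened with 32-bit float samples, interleaved per channel (L R L R ...), and the `SineState` struct is a hypothetical name for whatever you pass in through `userdata`. A real API may hand you a different sample format or layout.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical state shared with the audio thread through `userdata`.
struct SineState {
    double phase = 0.0;            // current phase in radians
    double frequency = 440.0;      // tone frequency in Hz
    double sample_rate = 44100.0;  // samples per second
    uint32_t channels = 2;         // assumed interleaved stereo
};

// Fills `stream` with a sine tone, writing the same value to every channel.
// Assumption: samples are 32-bit floats, interleaved as L R L R ...
void audio_callback(void *userdata, uint8_t *stream, uint32_t len) noexcept {
    auto *state = static_cast<SineState *>(userdata);
    auto *out = reinterpret_cast<float *>(stream);
    const uint32_t bytes_per_frame =
        static_cast<uint32_t>(sizeof(float)) * state->channels;
    const uint32_t frames = len / bytes_per_frame;
    const double two_pi = 6.283185307179586;
    const double step = two_pi * state->frequency / state->sample_rate;

    for (uint32_t frame = 0; frame < frames; ++frame) {
        // Scale to a quarter of full amplitude to keep the tone quiet.
        const float sample = 0.25f * static_cast<float>(std::sin(state->phase));
        for (uint32_t ch = 0; ch < state->channels; ++ch) {
            out[frame * state->channels + ch] = sample;
        }
        state->phase += step;
        if (state->phase >= two_pi) state->phase -= two_pi;
    }
}
```

Note that the loop only does arithmetic and memory writes: no allocation, no locks, and no I/O.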

The sample rate is on the order of kilohertz, commonly 44.1 kHz, 96 kHz, etc. Our audio callbacks are called with chunks of samples (typically 32 to 1024 samples, and not necessarily the same size every time) roughly 100 times per second. Hence, our callbacks must never block.
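The time budget of a single callback follows directly from the buffer size and the sample rate. The helpers below are a small sketch of that arithmetic, and the function names are mine, not from any audio API.

```cpp
#include <cstdint>

// How long one buffer of `frames` samples lasts at `sample_rate` Hz, in
// milliseconds. This is the deadline the callback has to meet.
constexpr double buffer_duration_ms(uint32_t frames, double sample_rate) {
    return 1000.0 * frames / sample_rate;
}

// How many times per second the callback runs for that buffer size.
constexpr double callbacks_per_second(uint32_t frames, double sample_rate) {
    return sample_rate / frames;
}
```

For example, a buffer of 512 samples at 44.1 kHz lasts about 11.6 ms, so the callback runs about 86 times per second and has to finish well within those 11.6 ms every single time.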

If a callback takes longer than the time its buffer lasts, the result is an audio dropout or glitch, which you hear as a crackle or a silent gap in the audio. This happens because the hardware consumes the audio buffer faster than the callback fills it. It’s immediately audible: even if you drop just one buffer, you can hear it.

So here are the rules of audio code that Timur Doumler mentioned in his talk:

Rule #0 of audio code: The audio callback waits for nothing.
Rule #1 of audio code: You never want to cause the audio callback to drop out.
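One common way to follow these rules when another thread needs to talk to the callback is to share a value through `std::atomic` instead of a mutex. The sketch below is my own illustration, not code from the talk. `GainControl` and `set_gain` are hypothetical names, and the buffer is assumed to hold 32-bit float samples.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical shared state: the UI thread writes, the audio thread reads.
struct GainControl {
    std::atomic<float> gain{1.0f};
};

// Called from any non-audio thread, e.g. when a volume slider moves.
void set_gain(GainControl &control, float value) noexcept {
    control.gain.store(value, std::memory_order_relaxed);
}

// Audio callback: scales every sample by the current gain.
// It takes no locks, allocates nothing, and does no I/O.
// Assumption: the buffer holds 32-bit float samples.
void audio_callback(void *userdata, uint8_t *stream, uint32_t len) noexcept {
    auto *control = static_cast<GainControl *>(userdata);
    auto *samples = reinterpret_cast<float *>(stream);
    const uint32_t count = len / static_cast<uint32_t>(sizeof(float));
    // Read the shared value once per buffer, never waiting for the writer.
    const float gain = control->gain.load(std::memory_order_relaxed);
    for (uint32_t i = 0; i < count; ++i) {
        samples[i] *= gain;
    }
}
```

`std::atomic<float>` is usually lock-free on desktop platforms, but that is not guaranteed everywhere. With C++17 you can check `std::atomic<float>::is_always_lock_free` at compile time.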

The anatomy of audio samples

Audio samples have a butt-ton of terminology associated with them. So let’s define the following terms:

Number of channels $C = 2$ (for stereo)
Number of samples per buffer $N = 256$
The sample rate $f$ is the number of samples per second
The stream $\mathbf{S}$ can be represented as a $C \times N$ matrix, with one row per channel and one column per sample:

$$
\mathbf{S}_{C \times N} = \begin{bmatrix} x_{1,1} & x_{1,2} & \dots & x_{1,N} \\ x_{2,1} & x_{2,2} & \dots & x_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ x_{C,1} & x_{C,2} & \dots & x_{C,N} \end{bmatrix}
$$
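In memory, this matrix usually arrives as one flat buffer. The sketch below assumes an interleaved layout, where the samples of all channels for one point in time sit next to each other (L R L R ... for stereo). That layout is an assumption on my part, since some APIs deliver one buffer per channel instead. `interleaved_index` and `to_matrix` are hypothetical helper names.

```cpp
#include <cstddef>
#include <vector>

// Maps the 1-based matrix element x_{i,j} (channel i, sample j) to its
// position in an interleaved buffer. Interleaving is an assumption here.
constexpr std::size_t interleaved_index(std::size_t i, std::size_t j,
                                        std::size_t channels) {
    return (j - 1) * channels + (i - 1);
}

// Rebuilds the C x N matrix S, one row per channel, from interleaved data.
std::vector<std::vector<float>> to_matrix(const float *interleaved,
                                          std::size_t channels,
                                          std::size_t samples) {
    std::vector<std::vector<float>> rows(channels,
                                         std::vector<float>(samples));
    for (std::size_t i = 1; i <= channels; ++i) {
        for (std::size_t j = 1; j <= samples; ++j) {
            rows[i - 1][j - 1] =
                interleaved[interleaved_index(i, j, channels)];
        }
    }
    return rows;
}
```

Because `to_matrix` allocates memory, it belongs outside the audio callback, for example in code that prepares data for a visualizer.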

I gloss over a lot of explanation here, but the gist is that each $x_{i,j}$ is stored in a specific format that your OS or API supports. Some common data types are int32_t, int16_t, and normalized ($[-1.0, 1.0]$) float and double.

This fact alone makes a lot of code platform dependent, since these buffers have to be parsed correctly for whichever format you are given.
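As a small illustration of what handling one of these formats looks like, here is a sketch that converts signed 16-bit samples to normalized floats. Dividing by 32768 is one common convention, not the only one, and `to_normalized_float` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts signed 16-bit PCM samples to floats in [-1.0, 1.0).
// Dividing by 32768 maps INT16_MIN to exactly -1.0.
std::vector<float> to_normalized_float(const int16_t *samples,
                                       std::size_t count) {
    std::vector<float> out(count);
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = static_cast<float>(samples[i]) / 32768.0f;
    }
    return out;
}
```

Code that supports several formats typically has one such conversion per format and picks the right one based on what the device reports.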

My visualizer - Murl

I decided to create a visualizer for these audio buffers. Murl³ (Music Uptake Rendering Library) is open source, and the source code is available under the GPLv3 license.

I’m using things here that I have learned from a lot of places. I got the idea from our beloved programming outcast Zozin⁴, and I am also using a lot of other stuff that will definitely make him very angry XD.

It taught me many things, and I am happy to share it with you. I know that I can improve it a lot more, and I’ll definitely post updates in the future.

It is very simple for now: you can drag and drop any audio file, and it will visualize it.

Visualizer Demo

Here⁵ is a demo of the visualizer compiled to WebAssembly in action:

Open the Demo in a new tab

Use R to reload the fragment shader (when testing on desktop).
Use Space to play/pause, or tap the screen on mobile.

If you aren’t able to interact with the demo above, here is a small video⁶ of it visualizing a sine wave.

Conclusion

I would appreciate it if you gave the repo³ a star and followed me on GitHub. I would also love to hear your feedback, suggestions, or contributions to the repo. And I hope to have made you just a little more excited the next time you are listening to or recording some music.