VSVI Dev. Blog 7: Digital Audio Myths

[Image: spectrograph of an artificial harmonic glissando]

In digital audio, we think about fidelity in four domains:

  • Bit Depth (bits)
  • Sample Rate (Hz)
  • Bitrate (kbps or kb/s)
  • Channels

It’s important to understand what each of these terms means when shopping for samples and sample libraries, as some advertised features do little more than multiply the size of the library, making it seem more valuable than it is!

Bit Depth describes just one element of the digital audio equation: the noise floor. It is the number of bits in each sample taken. It plays no role in the sound quality of your audio beyond determining where the noise floor sits. With noise-shaped dither, 16-bit audio can easily cover the useful range of human hearing (a noise floor of about -96 dB, or as far down as about -120 dB with shaped dither). We always record and process in 24-bit for improved filter performance, but provide our instruments in 16-bit.

Why? The preexisting noise floor found in all recordings made with microphones, even with careful recording and denoising, is always considerably higher than the theoretical -96 dB noise floor of 16-bit audio (possibly extendable down to -120 dB with correct dither usage). In fact, any sample library developer that tries to sell you samples with more than 16-bit audio that were not recorded in a fully isolated and insulated anechoic chamber is wasting their bandwidth and storage space, and your hard-drive space and time.
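For the curious, the -96 dB figure above is not arbitrary. Here is a minimal Python sketch (the function name is our own, purely for illustration) that applies the usual textbook formula for integer PCM, 20 × log10(2^bits). It ignores dither and noise shaping entirely.

```python
import math

def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range of integer PCM: 20 * log10(2 ** bit_depth)."""
    return 20 * math.log10(2 ** bit_depth)

# Compare a few common bit depths (values are theoretical, before dither).
for bits in (8, 16, 24):
    print(f"{bits}-bit: about {dynamic_range_db(bits):.1f} dB")
```

Each extra bit adds roughly 6 dB, which is why 16-bit lands near 96 dB and 24-bit near 144 dB.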

The best way to understand bit depth is to imagine one second of a waveform (say, a set of sine waves) cut into 44,100 columns (samples). At each slice, in 16-bit PCM, we pick the number between −32,768 and +32,767 (with 0 representing the middle line of the waveform) that most closely matches the point we see on our analog curve, and place our sample there. It must be a whole number (0, 1, 2, 3, etc.); nothing in between is allowed. If we record something very small (i.e. quiet) and boost it a lot digitally, we will encounter artifacts from that earlier quantization (don’t worry, you would have to be recording at around -40 dB or lower for this to happen). In 24-bit, we get to pick from 16,777,216 possible integer values, so quieter waveforms can be represented and then boosted digitally without encountering quantization distortion.

32-bit float is another format, which stores each sample as a floating-point number rather than an integer, so it can hold fractional values. Because it is more resource-expensive and the dynamic range it provides is essentially unnecessary (extending far beyond the range of human hearing), 32-bit float is generally reserved for recording highly unpredictable sources and for ultra-critical processing, i.e. industrial or scientific uses. Taking real advantage of it requires ultra-high-end equipment and recording conditions, since most mic and preamp self-noise is far too high for a 32-bit noise floor. It can also be useful for extremely heavy effects processing on a single signal, where repeated quantization could raise the noise, but processing on that scale would require a very, very powerful computer just to function.
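To make the "rows" idea concrete, here is a rough Python sketch (our own illustration, not part of our toolchain) that quantizes one second of a very quiet sine wave and counts how many different integer values it actually uses. The -80 dB level, the 440 Hz tone, and the helper names are all assumptions chosen to exaggerate the effect.

```python
import math

def quantize(value: float, bit_depth: int) -> int:
    """Round a sample in [-1.0, 1.0] to the nearest signed integer step."""
    max_int = 2 ** (bit_depth - 1) - 1   # 32,767 for 16-bit
    min_int = -(2 ** (bit_depth - 1))    # -32,768 for 16-bit
    return max(min_int, min(max_int, round(value * max_int)))

def distinct_levels(peak_db: float, bit_depth: int,
                    sample_rate: int = 44100, freq_hz: float = 440.0) -> int:
    """Count the integer values used by one second of a sine at peak_db."""
    amplitude = 10 ** (peak_db / 20)     # convert dB to linear amplitude
    samples = (
        quantize(amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate),
                 bit_depth)
        for n in range(sample_rate)
    )
    return len(set(samples))

# A very quiet (-80 dB, hypothetical) tone at two bit depths.
print("16-bit levels used:", distinct_levels(-80.0, 16))
print("24-bit levels used:", distinct_levels(-80.0, 24))
```

At 16-bit, the quiet tone is squeezed into a handful of steps, so boosting it later reveals a staircase; at 24-bit, the same tone is spread over hundreds of steps.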

Sample Rate determines the total frequency range that can be represented in the digital audio. It is a measure of the number of times the audio signal is sampled every second. Under the Nyquist Theorem, the sample rate must be at least twice the highest frequency we want to capture. If we accept that the hearing of a young female toddler may reach, at its very greatest, 20,000 Hz, a sample rate that would fully include all frequencies in this range would be 40,000 Hz (40 kHz). Add a little buffer and do a little manipulation to make synchronization with video recording easier, and voila, 44.1 kHz! In Europe, they decided to add a bit more of a buffer (no pun intended), and went with 48 kHz. We use 44.1 kHz at all points in the sampling and distribution process.

Why? A sample rate of 44.1 kHz extends beyond the maximum range of human hearing (if you’re male and/or over 20, your hearing likely drops off around 17-18 kHz). Recording at much higher rates can result in distortion on equipment (amplifiers, speakers, etc.) not designed to handle those rates, which could cause issues for our customers, aside from using enormous amounts of space. Any developer who sells samples at more than 44.1 kHz that are not intended for very extensive resampling/manipulation is possibly multiplying the size of their library by 2, 3, or even 4x for NO perceivable improvement. Beware!

If we take our bit depth example from before, imagine we kept the same rows (65,536 of them for 16-bit, or 16,777,216 for 24-bit) and wanted to cut our sine waves into a different number of columns. Increasing the number of columns increases how faithfully each sine wave is traced, and in particular how high a frequency can be captured. Remember, a lower sample rate means any sound higher than 1/2 the sample rate will be lost (this is why applying an 8 kHz sample rate results in a sound not dissimilar to the fidelity of 78 rpm records, which could only reach around 4 kHz of total frequency range).
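The "half the sample rate" rule is easy to play with in code. The sketch below (function names are our own) computes the Nyquist limit for a few sample rates. It also shows where a too-high tone would land if it were sampled without any filtering: in practice, converters low-pass the signal first so that content is simply removed, but without that filter it folds back down as an alias.

```python
def nyquist_limit(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2

def alias_frequency(freq_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a tone sampled WITHOUT an anti-aliasing filter."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

for rate in (8000, 44100, 48000):
    print(f"{rate} Hz sample rate -> content up to {nyquist_limit(rate):.0f} Hz")

# A 5 kHz tone is above the 4 kHz limit of an 8 kHz sample rate.
print("5 kHz tone at 8 kHz appears at", alias_frequency(5000, 8000), "Hz")
```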

Bitrate measures the total number of bits stored every second. In uncompressed audio, this is Bit Depth * Sample Rate * Channels (for 44.1/16 mono, 705.6 kbps; for stereo, 1,411.2 kbps). Bitrate only drops below these figures if compression is applied, such as lossy codecs like Ogg Vorbis (.ogg) or LAME-encoded .mp3. Lossy compression, by definition, degrades the sound quality of the audio, no matter how little you use. For .mp3, a bitrate of 320 kbps is more or less indistinguishable from an uncompressed signal for most music (particularly signals without strong transients) for consumers. We do not use any lossy compression in any stage of our development process.

Why? Lossy compression compromises the signal far more than any lossless format does. Chances are, many customers will want a higher-fidelity sample than compressed audio is capable of.
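The bitrate arithmetic above can be checked in a couple of lines. This is only an illustrative sketch (the function name is ours), assuming plain uncompressed PCM and 1 kbps = 1,000 bits per second.

```python
def pcm_bitrate_kbps(bit_depth: int, sample_rate_hz: int, channels: int = 1) -> float:
    """Uncompressed PCM bitrate in kbps (1 kbps = 1,000 bits per second)."""
    return bit_depth * sample_rate_hz * channels / 1000

mono = pcm_bitrate_kbps(16, 44100, channels=1)
stereo = pcm_bitrate_kbps(16, 44100, channels=2)
print(f"44.1/16 mono:   {mono:.1f} kbps")
print(f"44.1/16 stereo: {stereo:.1f} kbps")

# How much data a 320 kbps lossy file discards relative to stereo PCM.
print(f"Stereo PCM is about {stereo / 320:.1f}x the size of a 320 kbps file")
```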

Channels describes the number of different audio streams used. Most modern audio work is recorded in stereo (2-channel), and occasionally in mono (1-channel), although recent advances in technology have led to the development of affordable ambisonic microphone arrays, which capture a 360-degree signal that a decoder can then reduce to a 2-channel, 4-channel, 7-channel, or other experience. We record all instruments in stereo whenever possible, and when multi-mic recording is done, we occasionally use arrays of mono or stereo mics to capture different angles.

How does this fit in with other digital formats, such as video?

In digital video, we think of a number of frames per second, and an amount of data per frame. For example, 30 frames per second of 720p footage (that’s 921,600 pixels per frame) is 27.65 million pixels every second. If we assume the color of each pixel is expressed in 8 bits, we end up with 221,184,000 bits every second (about 27.65 MB every second, or roughly 221,200 kbps, compared to a mere 1,411.2 kbps for 44.1/16 stereo audio). Of course, in this example, we assume zero compression and also leave out other information that might be included in the specific codec, but it is a good way to get a feeling for the relative size of audio data.
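Here is the same video arithmetic as a small Python sketch. It uses the simplifying assumptions from the paragraph above (1280×720 frames, 8 bits per pixel, no compression, no container overhead); the function name is just ours for illustration.

```python
def raw_video_bits_per_second(width: int, height: int, fps: int,
                              bits_per_pixel: int) -> int:
    """Uncompressed video data rate in bits per second."""
    return width * height * fps * bits_per_pixel

video_bps = raw_video_bits_per_second(1280, 720, fps=30, bits_per_pixel=8)
audio_bps = 16 * 44100 * 2   # 44.1/16 stereo PCM

print(f"Pixels per frame: {1280 * 720:,}")
print(f"Video: {video_bps:,} bits/s ({video_bps / 8 / 1e6:.2f} MB/s)")
print(f"Audio: {audio_bps:,} bits/s")
print(f"Raw video is roughly {video_bps / audio_bps:.0f}x the audio data rate")
```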

How do all of the above elements fit together?

In digital audio, a recording is made up of a series of samples (sample rate), each containing a certain number of bits (bit depth), across a certain number of channels. Multiplying these three values gives the total amount of data being transferred every second, in bits (make sure you convert bits to bytes if you are concerned with storage space).
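To see what this multiplication means for disk space, here is a minimal sketch comparing two formats for a purely hypothetical library containing one hour of stereo audio. The one-hour figure, the 24-bit/96 kHz comparison format, and the function name are assumptions for illustration only.

```python
def pcm_size_bytes(duration_s: int, bit_depth: int,
                   sample_rate_hz: int, channels: int) -> int:
    """Uncompressed PCM size in bytes (8 bits per byte, no file headers)."""
    total_bits = duration_s * sample_rate_hz * channels * bit_depth
    return total_bits // 8

ONE_HOUR = 3600  # seconds; a made-up library length

cd_quality = pcm_size_bytes(ONE_HOUR, 16, 44100, channels=2)
high_res = pcm_size_bytes(ONE_HOUR, 24, 96000, channels=2)

print(f"16-bit / 44.1 kHz: {cd_quality / 1e6:,.0f} MB")
print(f"24-bit / 96 kHz:   {high_res / 1e6:,.0f} MB")
print(f"Size multiplier:   {high_res / cd_quality:.2f}x")
```

The multiplier is simply the ratio of the bit depths times the ratio of the sample rates, which is where the "2, 3, or even 4x" library sizes come from.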

IotW 6: Shure Hercules (?)


Although the exact model of this mic is a bit of a mystery, it sounds pretty good!

Something often neglected in sampling and recording is the less expensive side of microphones. For example, when I wanted to produce a track with an early jazz sound a bit closer to the era in which it was written, I turned to this interesting mic I picked up at a tag sale years ago, which appears to be a late model or descendant of the Shure Hercules.

Check out the results (unaltered) and a parallel stereo recording made with the XY capsule of an H6. That trumpet solo at 50″ sounds straight off a 78, minus the distortion and crackle. It also worked well on another period piece.

VSVI Dev. Blog 6: Mic Usage in Sampling

“Ur Doin’ It Wrong”; Image by Geoff Kaiser

In the last few years, there has been a fascination with primarily three features of orchestral sample libraries: whether the number of multisamples/RR is in two digits, whether it has sampled/“live” legato, and how many mixable mic positions are available. Today, I’m going to talk a bit about the last of these, “multiple mixable mic positions”, as well as how to use microphones effectively to create a good experience for the end user of the samples, to the point where they really do have control over the tone of the instrument.

IotW 5: Performing for Samples


I’m not always the one behind the mics, as this (admittedly shaky) shot from last summer’s sampling bonanza shows.

If you’ve seen trailers for virtual instruments with real footage of the musicians performing, you’ve probably seen a clip of 10 seconds or less of some cool note, or just some silent close-ups while dramatic music, created using the plugin months after the original session, plays over the top.

In reality, sampling sessions are long, slow, borderline Zen marathons of endurance, especially when alone, as is often the case in the guerrilla-style sampling sessions I tend to run. In those cases, I see either something like the above or something like the image below:

Any pianist will tell you sitting at a piano for two hours is a long time... sitting at a piano for two hours playing one note at a time waiting for each note to decay is an eternity.


IotW 4: Consorts and Cousins: A Tale of Two Trombones


A “Bb” Tenor Trombone by C.G. Conn (Foreground) and a “G” Bass Trombone by Hawkes & Sons (Background) lounging.

The two instruments you see depicted are roughly contemporaries (the Bass Trombone is actually a little later, from the 1910s, and English rather than American, but they are close enough to count as contemporaries).

Image of the Week 3


Possibly early 1900s or late 1800s Classical Trombone Mouthpiece

I actually rescued this mouthpiece from a pile that was going to be scrapped, and boy, what a find! With a little polish, a great background shot for a future library may have been born.

Some consultation with a more experienced individual suggests that this mouthpiece is based on the sort of measurements one might find from the Classical period: large, relatively flat rim, smallish cup, somewhat sharp transition into the backbore, small bore.

Image of the Week 2

Interior close-up of a grand piano sampled in December, 2014.


The above shot comes from a brief sampling session in which I completed a basic sampling of a grand piano for the upcoming VSCO 2 and other applications. It’s a bit dusty, but sounds pretty nice!