Audio Engineering Basics

joepampel

Jun 19, 2024 · 15 min read

Updated: Jan 19

Basic Basics

By no means exhaustive or definitive, but ideally short & helpful for new folks to fill in some gaps.

"Stands For Decibel"

Signals we record and later listen to are converted from pressure waves in the air into electrical waves by a transducer, generally a microphone. The signal from the microphone is a very low voltage signal, and it needs to be amplified before we can do much with it. Pre-amps (whether mic pre-amps or home HiFi pre-amps) are first and foremost voltage amplifiers. They bring the signal up to a level where now we can work with it.

If we are recording the sound or running it through various devices (EQ, Compressors, etc) this voltage gain is enough. But if we want to reproduce the sound through loudspeakers, we need to deliver power - we need a current amplifier. This is essentially what a power amplifier is. An integrated amplifier or receiver for a home system has both kinds of amplifiers - voltage and current - built in. So does a guitar amplifier for that matter. There is a pre-amp to raise the voltage level of the guitar signal up to where we can drive a power amplifier section and finally some speakers.

This voltage gain is measured in decibels, or 1/10 of a Bel. Bel is capitalized because it is named after Alexander Graham Bell. Abbreviated dB, decibels are always a ratio. Because human hearing is logarithmic in its response, the dB scale is a log scale as well. We can (and do) use dB to measure voltages, we can measure power, and we can measure acoustic level (SPL, for sound pressure level).

The formula for voltage gain is 20 X Log (Vo/Vi), where Log is the base-10 logarithm. Power is the same basic formula except it is 10 X Log (P2/P1) - same ratio, just 10X instead of 20X. The Bel is a dimensionless unit.

If your input signal is 100mV and the pre-amp in question can output 10 volts cleanly, we would say that it has a voltage gain of 20 X Log (10/.1) or 40dB.
If you revise your power amplifier circuit to produce twice as many watts of output, say going from 50W to 100W at the same range of frequencies and the same level of distortion, you will have increased the power by 10 X Log (P2/P1) or 3dB.
"Line Level" in a studio is considered to be +4dBu and in consumer grade equipment line level is -10dBV. Line level is the level your CD player puts out, or your pre-amp can send to your power amp (if you have a separate one). 0dBu is .775 volts, while 0dBV is 1 volt. The .775 volt reference goes back to the early days of Telephony*
Mic level can be far lower, and may need anywhere up to around 60dB of gain to get up to line level (depends on both the mic and the source).
When we talk about acoustic sound levels, we use 0dB SPL as the reference for how loud something is. 0dB SPL is the accepted threshold of human perception - the quietest thing we can hear - and corresponds to a sound pressure of 20 micropascals. The suffix SPL (for Sound Pressure Level) signifies what we are referring to. You may also see levels referenced to air pressure in Pascals (most commonly in mic specs).
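The two gain formulas above are easy to check with a few lines of Python. This is just a sketch; the function names are mine and not from any audio library.

```python
import math

def voltage_gain_db(v_out, v_in):
    """Voltage gain in dB: 20 x Log(Vo / Vi)."""
    return 20 * math.log10(v_out / v_in)

def power_gain_db(p2, p1):
    """Power gain in dB: 10 x Log(P2 / P1)."""
    return 10 * math.log10(p2 / p1)

# 100 mV in, 10 V out: the pre-amp example
print(round(voltage_gain_db(10, 0.1), 1))   # 40.0
# 50 W to 100 W: the power amp example
print(round(power_gain_db(100, 50), 1))     # 3.0
```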

There is more dB detail than I can possibly cover here, but hopefully this will get you started. We use 10X in the power formula because a decibel is a tenth of a Bel. We use 20X in the voltage formula because power goes as the square of the voltage, and squaring something inside a log doubles the result, so 10X becomes 20X. There are certainly more thorough explanations, but that is the short version.

Originally, we used dBm for power. 0dBm is 1 milliwatt, which across a 600 Ohm termination works out to .775V. This was relevant to analog telephony. We use dBu today because we don't reference 600 Ohms any more, but .775V has hung in there. (dBV is a separate voltage scale that is referenced to 1 volt.)
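To put actual voltages on those line levels, here is a small sketch. It assumes the usual conventions that dBu is referenced to .775V (rounded from the 600 Ohm / 1 milliwatt figure) and that the consumer -10 figure is in dBV, referenced to 1 volt. The names are made up for illustration.

```python
import math

DBU_REF_VOLTS = 0.775   # 0 dBu, rounded
DBV_REF_VOLTS = 1.0     # 0 dBV

def db_to_volts(level_db, ref_volts):
    """Convert a level in dB (relative to ref_volts) back to volts."""
    return ref_volts * 10 ** (level_db / 20)

# Where .775 V comes from: 1 milliwatt dissipated in 600 ohms (V = sqrt(P x R))
print(round(math.sqrt(0.001 * 600), 3))            # 0.775

print(round(db_to_volts(4, DBU_REF_VOLTS), 3))     # pro line level, about 1.228 V
print(round(db_to_volts(-10, DBV_REF_VOLTS), 3))   # consumer line level, about 0.316 V
```

So pro gear is running roughly four times the voltage of consumer gear, which is why plugging one into the other usually needs a gain change.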

TL;DR, levels mattered a lot more in the analog world because of self-noise and dynamic range. All devices make their own noise. Worse, if you chain two devices together that each have the same signal to noise ratio, you actually double the noise power - you lose 3dB of S/N. Noise is additive. The general answer is to boost the signal as much as you can as early as you can to minimize this issue (you'll never have less noise than you do at the source) and then go through as few devices as possible. In the digital world this is largely a thing of the past.
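Here is a quick sketch of that 3dB figure. It assumes the noise sources are uncorrelated (random hiss, not hum from the same ground loop), so their powers add. The function name and the -90dB noise floors are made up for the example.

```python
import math

def sum_uncorrelated_db(*levels_db):
    """Combine uncorrelated noise levels (in dB) by adding their powers."""
    total_power = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(total_power)

# Two devices, each with a noise floor at -90 dB
print(round(sum_uncorrelated_db(-90, -90), 1))   # -87.0, i.e. 3 dB worse
```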

Photo: Rupert Neve Designs

Mixers - Parts of a Mixer

All those knobs can look intimidating, but really they are just fairly intuitive channels replicated in parallel. Each channel, going back to front, will have a matching set of controls that generally fall into these categories:

Pre-amp: voltage gain to get you from input level to line level. Input is usually a Mic but could be a DI box, a keyboard, etc.

EQ: filters to boost or cut certain frequencies. There are several types of EQs.

Sends & Returns: Take some of the signal on a channel and send it somewhere for processing, Reverb for example, and then return that signal. Often the send is a Buss - where multiple channels can send signal to the same reverb unit so that their sounds happen in the same space.

Insert: Rather than send some signal off on a buss, we can put the additional device in-line for just a single channel. This is called inserting it.

Busses: We can create a group of sends and they can all go to the same device. One buss you always have is your stereo bus or master bus. In your DAW this is commonly provided by default. All of your channels go here to give you a stereo mix.

Fader: Most commonly a mixing board will have a gain knob on the pre-amp and then a fader for the channel level. It controls the level of that channel whether recording or mixing.

Summing: We need to sum the various channels together into a mix in order to listen to it. This can be an active or a passive device, but is often active. When you pan tracks to one side or the other, you are summing them into Left or Right busses for the output.

Routing

How we route signals through the studio was pretty intuitive back in the day because we ran actual cables into patch panels. It's a bit more opaque in a modern DAW but the same things are happening - they just try to set things up in a useable manner so you don't spend the first hour of every session wiring stuff together. In a classic studio you would "normalize" the board and patch bays when you finished a session, meaning you would un-do everything you did so the next session would not have to clean up after you.

One thing to keep in mind is that if you want to do something special, say re-amp a guitar track, you need to know where your routing configuration is so you can map things accordingly. Generally the mixers are very flexible and you can do things like send a channel or Aux to an output of some sort. The idea is flexibility. Even my old Roland 2000CD had routing screens where you could edit how things were connected. Logic Pro X which I use now has a much simpler schema, you just go to the track you are working with and use a drop down to see all of the options, and it will make (and plumb) any Aux you make automatically. What a time to be alive. And if you open a new project, it is already normalized. And if you want a specific layout, you can make a template and save it for re-use whenever you want.

Aux sends

These are really busses that are sent outside the mixer (sends) and brought back in (returns). Typically they are for some sort of effect, like Reverb, where you want to put a bunch of instruments in the same space and you can conserve CPU by not running an instance per track.

VCAs

In Logic they call these "groups" and essentially they are faders which are locked together so if you move one, they all go. Say you get the drum kit mixed up so it sounds great and now you want to mix. You can put the kit into a group and bring the whole thing up and down without messing up your settings. Backing vocals, a string section, you name it, anything you might want to group will work.

Mic level - is whatever your mic is putting out, and this is determined by the sound source (how intense/loud?) and the mic's own sensitivity. Roughly this is -60dB to -35dB. You'll see sensitivity quoted as millivolts per pascal (mV/Pa). That's thousandths of a volt for each pascal of sound pressure, and 1 pascal is about 94dB SPL. Pascals are a measure of air pressure and are how we measure sound pressure.

Line level - is either -10dBV for consumer stuff or +4dBu for pro gear

dBu - "back in the day..." we had dBm. This was established by the phone company (Western Electric, the manufacturing arm of AT&T) since they did all the early work with audio signals (telephones!). 0 dBm was 1 milliwatt, which is .775 volts across 600 ohms. dBu is unterminated - so we don't reference 600 ohms now. That is what the "u" stands for.
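To see how a mic spec turns into an actual mic level, here is a rough sketch. It assumes sensitivity is given in mV/Pa (as in the Neumann article in the references) and that 0dB SPL is 20 micropascals. The 10 mV/Pa mic is hypothetical, and the function names are mine.

```python
import math

REF_PRESSURE_PA = 20e-6   # 0 dB SPL

def spl_to_pascals(spl_db):
    """Sound pressure in pascals for a given dB SPL."""
    return REF_PRESSURE_PA * 10 ** (spl_db / 20)

def mic_output_dbv(sensitivity_mv_per_pa, spl_db):
    """Mic output level in dBV (relative to 1 volt) at a given SPL."""
    volts = (sensitivity_mv_per_pa / 1000) * spl_to_pascals(spl_db)
    return 20 * math.log10(volts)

# Hypothetical mic rated at 10 mV/Pa
print(round(mic_output_dbv(10, 94), 1))   # about -40 dBV at 94 dB SPL (1 pascal)
print(round(mic_output_dbv(10, 74), 1))   # about -60 dBV on a quieter source
```

The same mic on a quiet source versus a loud one covers most of the mic level range quoted above, which is why the pre-amp gain knob has so much travel.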

Outboard Gear / Plugin Basics

A Compressor - An easy one but very misunderstood. It simply reduces the dynamic range of a signal. A normal pre-amp with a voltage gain of 5 (about 14dB) might get a 1 volt signal in and produce a 5 volt output. 2 volts in would yield 10 volts of output. This is a nice linear function. Input X Gain = Output. In dB terms, every extra 1dB in gives you 1dB more out.

A Compressor has a "compression ratio" which bends that straight line. A 4:1 ratio means that, above the threshold, you have to put in another 4dB to get 1dB more out. 8:1 means put in 8 more dB to get 1 more out, and so on. There is a threshold value where this takes effect - after what volume level do we start compressing? And there are attack and release controls which affect the timing - how fast does the device respond to volume changes and how long does it take to stop responding to them.
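The threshold and ratio part can be sketched in a few lines. This only covers the static input/output curve (a "hard knee", no attack or release), and the function name and the -20dB threshold are just for illustration.

```python
def compressed_level_db(input_db, threshold_db, ratio):
    """Output level for a given input level, all in dB."""
    if input_db <= threshold_db:
        return input_db                      # below threshold: left alone
    # Above threshold: every `ratio` dB in gives 1 dB more out
    return threshold_db + (input_db - threshold_db) / ratio

# Threshold at -20 dB, 4:1 ratio
print(compressed_level_db(-30, -20, 4))   # -30  (untouched)
print(compressed_level_db(-16, -20, 4))   # -19.0 (4 dB over becomes 1 dB over)
print(compressed_level_db(-12, -20, 4))   # -18.0 (8 dB over becomes 2 dB over)
```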

When I was in school in the mid 1980's, a device with a compression ratio of 8:1 or higher was referred to as a limiter. In the UA article below they seem to have moved that definition out to 20:1 through infinity:1. But the idea is the same. At some point you are not merely reducing the dynamic range, you are giving it a hard ceiling.

The origin of these devices, like many others, was in radio. Stations wanted to broadcast the loudest, clearest undistorted signal they could. It turned out to be really handy for recording, mixing and mastering as well. The human ear has a dynamic range of about 140dB from the quietest sound we can perceive (zero dB SPL) to 140dB SPL where we have immediate hearing damage. Most commercial releases run around 10dB of dynamic range so that you can hear them over background noise and also not distort your reproducing systems with huge transients. An increase of 10dB is perceived as twice as loud and requires 10 times the power to reproduce. A 20dB peak would be 100 times more energy - literally if you were listening to a 1 watt playback your system would need to produce 100 watts just to handle the peak. You multiply power changes, so there it's 10 times 10.
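The power arithmetic above, as a quick sketch (the function names are made up):

```python
def db_to_power_ratio(db):
    """How many times more power a given dB increase represents."""
    return 10 ** (db / 10)

def peak_power_watts(average_watts, peak_db):
    """Amplifier power needed to pass a peak that sits peak_db above the average."""
    return average_watts * db_to_power_ratio(peak_db)

print(db_to_power_ratio(10))      # 10.0
print(peak_power_watts(1, 20))    # 100.0 watts for a 20 dB peak over 1 watt
```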

Besides the attack and release there can also be a "knee" which describes how the compression settings take effect as the signal reaches the threshold setting. All at once? Or as some compressors advertise, using a "soft knee" so there is a bit more of a transition to going from an uncompressed program to whatever full setting you have chosen.
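For the curious, here is one common textbook way to write a soft knee: outside the knee it behaves like a plain threshold-and-ratio compressor, and inside the knee a quadratic curve blends the two straight lines. This is a sketch of that formulation, not how any particular hardware unit or plug-in does it, and the names and settings are illustrative.

```python
def soft_knee_output_db(input_db, threshold_db, ratio, knee_width_db):
    """Static compressor curve with a knee centred on the threshold (all in dB)."""
    over = input_db - threshold_db
    if knee_width_db <= 0 or 2 * abs(over) > knee_width_db:
        # Outside the knee: plain hard-knee behaviour
        if over <= 0:
            return input_db
        return threshold_db + over / ratio
    # Inside the knee: quadratic blend from 1:1 into the full ratio
    return input_db + (1 / ratio - 1) * (over + knee_width_db / 2) ** 2 / (2 * knee_width_db)

# Threshold -20 dB, 4:1, 6 dB wide knee
for level in (-23, -20, -17, -10):
    print(level, soft_knee_output_db(level, -20, 4, 6))
# -23 -> -23.0, -20 -> -20.5625, -17 -> -19.25, -10 -> -17.5
```

Note that with the soft knee a little compression is already happening right at the threshold, which is the whole point.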

The most basic idea is you take the signal, you rectify it (change it from AC to DC voltage) and use that DC control voltage to change the gain of the compressor circuit. Older compressors (using tubes) often contained push-pull power amps and non-linear control tubes which often gave them interesting distortion and coloration characteristics. Even newer transistor units like the Urei 1176 generated some "beneficial" EQ changes and distortion products which became part of their charm. They fall into 2 broad categories, feed-forward or feed-back. Feed-forward takes the input signal, splits it and processes it to control the circuit. These are the more modern designs. Feed-back compressors are the vintage type where you amplify the signal and take a sample of that output to control the compressor.
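The "rectify it and make a control voltage" step is where attack and release live. A minimal digital sketch of that detector, assuming a simple one-pole smoother (a common approach, though real units differ a lot), looks like this. The function name and the settings are made up.

```python
import math

def envelope(samples, sample_rate, attack_ms, release_ms):
    """Rectify the signal and smooth it into a slowly moving control level."""
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000))
    level = 0.0
    out = []
    for sample in samples:
        rectified = abs(sample)                       # AC to "DC"
        coeff = attack if rectified > level else release
        level = coeff * level + (1 - coeff) * rectified
        out.append(level)
    return out

# A burst of full-scale signal followed by silence, at a 1 kHz sample rate
env = envelope([1.0] * 100 + [0.0] * 100, 1000, attack_ms=5, release_ms=50)
print(round(env[4], 2), round(env[99], 2), round(env[199], 2))
```

The control level climbs quickly during the burst (fast attack) and drifts back down slowly afterwards (slow release). In a feed-forward design this level would then be fed to the threshold and ratio maths to decide how much gain to take away.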

Keep in mind that your initial transients have more high frequency information than the sustain of the notes so playing with the compressor ratio and attack can impact the sound significantly.

Lastly, when we think about specs like signal to noise ratio, they are measured as maximum undistorted output signal divided by the self noise of the unit. If we reduce the dynamic range, the signal to noise gets smaller - which as a practical matter raises the noise floor. It's less of an issue in our digital age, but still something to be aware of.

Three little tricks with compressors to try out:

Ducking is controlling a compressor with a 2nd signal, fed into the compressor's side-chain input. The classic example is using the kick to duck the bass signal such that the kick hit will lower the level on the bass just for that moment. So you have the compressor on the bass track, but you take a send from the kick and use that as the actual trigger. Most compressor plug-ins can handle this.

Pre-eq is a bit more subtle, but the idea is simple: If you compress a wide band signal (maybe a buss) there could be low frequency transients that are triggering the compressor which then may interfere with the frequencies you are most interested in. By placing an EQ ahead of the compressor you can enable the compressor to focus on the frequencies you are most interested in and achieve a better mix. A HPF is often beneficial here.

Parallel compression (not to be confused with side-chain compression, which is the external-trigger trick used for ducking) is where you put a compressor on an Aux and then mix the compressed sound back with the uncompressed sound. You can get some really neat effects this way. Maybe you EQ the high frequencies of a vocal track (the breathy sounds) and compress it fairly heavily to really bring them out and then mix that back in to give your vocals a certain feel.

More:

https://www.uaudio.com/blog/audio-compression-basics

An Equalizer ("EQ") is a device that can cut or boost the program signal at specific frequencies or frequency ranges. The bass and treble controls on your car stereo are an EQ.

Back in the day it was common to roll off very low frequencies since they would show up as 'rumble' on LP records. Some old HiFi gear even featured "rumble filters". These are a family of filters known as "high pass" filters. They allow frequencies over some value X to pass through, and attenuate anything lower.

There are low pass filters which are the opposite - anything below X will pass. These appeared on old HiFi's as 'hiss filters'.

There are bandpass filters, which combine high and low rolloffs.

There are parametric EQs which generally have an adjustable center frequency, an adjustable gain but also a 3rd control called "Q" which refers to how tightly the EQ acts around the center frequency. You can be very specific and use the EQ to reduce a frequency that is feeding back in a PA system, or you can put a big wide hump on some low end frequency to boost the kick drum. Parametrics can be really handy for finding a resonance; live sound engineers use them to "ring out the room" - make a peaky (high Q) EQ curve and then sweep the frequency band from low to high, listening for ringing or feedback. Then configuring cuts at those points. You can use that same technique to learn about the frequency ranges that are present in any program material and quickly find out what boosting or cutting somewhere can do. Just set up a cut and then sweep the frequency to see if there is a desired effect.
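If you want to see what the three parametric controls do to the response, here is a sketch using the peaking filter formulas from Robert Bristow-Johnson's widely used "Audio EQ Cookbook". That is one common digital formulation, not necessarily what your EQ plug-in uses, and the function names and settings here are mine.

```python
import cmath
import math

def peaking_eq_coeffs(center_hz, gain_db, q, sample_rate):
    """Biquad coefficients (b, a) for a peaking EQ band."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    b = (1 + alpha * amp, -2 * math.cos(w0), 1 - alpha * amp)
    a = (1 + alpha / amp, -2 * math.cos(w0), 1 - alpha / amp)
    return b, a

def response_db(b, a, freq_hz, sample_rate):
    """Boost or cut (in dB) that the filter applies at freq_hz."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20 * math.log10(abs(num / den))

# A narrow (high Q) 12 dB cut at 1 kHz, like notching out a feedback frequency
b, a = peaking_eq_coeffs(1000, -12, q=8, sample_rate=48000)
for freq in (250, 900, 1000, 1100, 4000):
    print(freq, round(response_db(b, a, freq, 48000), 1))
```

Try lowering q to 1 and the cut spreads out over a much wider range of frequencies, which is the "big wide hump" case.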

When do you use one or the other? You need to consider the problem you need to solve; knowing what is available just helps you once you know what the problem is. Less is nearly always more, so find the solution that takes the least amount of tweaking. When I was in school at least, cutting was preferred over boosting (aka 'subtractive eq'). With some circuits it's a lot cleaner and less likely to add distortion. It can also be easier to control and focus on what you are trying to get done. Cutting mids is like boosting lows and highs. The goal is to record a good sound and then do as little to it as possible as you go along.

Reverb

One issue we run into when recording is "space". Your brain is pretty sensitive to placing sounds in space; it examines the delay between the sound reaching each ear and estimates where things are.

Since many things are close mic'd in a studio, we often want (or need) to add back some space. Space comes in 2 basic flavors - echo and reverb. Reverb is short for 'reverberation' and if you have ever been in an empty house or office space you are already familiar with it. It is simply all of the sound reflections from all of the surfaces from the time the sound occurs until time X where it's too quiet to be heard. It's an important distinction that reverb is happening both during and after the sound is produced; echo as we know it in the studio is primarily happening after the fact. The timing on these reflections is generally short - too short to hear as an echo generally speaking, often a few milliseconds. The timing, the reflections, how they affect the sound and more play into how big a space you perceive. Is it a tiled bathroom or a stone church or a concert hall?

Legends like Al Schmitt mentioned using 5 reverbs at once when mixing. Famous studios like EMI (Abbey Road) and Capitol Records in LA have large rooms designed to create reverberations. They would send some signal to the room where it would be reproduced by a loudspeaker and then there would be 1 or more mics to pick up the reflections. There could also be room components (pillars etc) to change the sound in the room, and often 2 or 3 rooms of different sizes.

For my own music I have been playing with 3 or 4 of them since reading Al's book. It's made a big difference for me. I use something to give me short reflections, like the studio, something to give me mid-sized reverb and then usually something big and clear for "space". I use a lot of UA plug-ins so in my case that is often Ocean Way (room), the EMT140 (mid size space) and Capitol Chambers ("The One" - the Al Schmitt setting!), respectively. Just put each one on an Aux and mix to taste. Not every song needs everything, and neither does every instrument.

Adding space "after the fact" lets us control it better and also lets us use the spaces we have to record in - which might be less than ideal. I work out of my attic most of the time for example. In the old days you needed a great room, and if you go back and listen to tracks like Ray Charles' "Hit The Road Jack" you will hear a live band recorded in a great room and it's pretty incredible. (Atlantic Studios in NYC, engineered by Tom Dowd, live to 8 track) It is hard to beat the energy of a tight band playing live, but it's hard to do these days.

Duane Eddy famously used an empty 2,000 gallon water tank as an echo chamber which is what you hear on many of his hits (link below). Many instruments, including Hammond Organs and most guitar amps, use the Hammond patented Reverb Tank, which is a transducer that uses the audio signal to vibrate a system of springs; those vibrations are picked up by another transducer and then amplified. Their use of different springs that were interconnected led to the richness of the effect. German company EMT used a giant steel plate for reverb, and the operator would move a damping pad closer to or further from the plate to get different reverb times. The first rule of recording club is there are no rules, just what sounds good to you.

An actual EMT140 plate reverb. You would mount this thing inside a wall usually. They are big and heavy, and pretty amazing. Photo: EMT

Delay / Echo

Echoes are clear repeats of a sound, usually delayed by a few tens of milliseconds or more. These are really heard after the sound. There may be 1 repeat or more, and many delay units can feed back on themselves at high gain settings, giving many repeats. For pop or dance tracks, there may be echoes that are set in time to the beat so that while you don't hear the discrete echoes, your brain still processes them and you perceive the "space". You can use the calculator linked below (or any other - just search millisecond to BPM calculator) to create echoes in time with your rhythm track. Musicians like David Gilmour, Pete Townshend & The Edge have experimented a lot with dotted 8th note echoes to build complex textures out of simple guitar figures. In the case of The Edge, it became part of his signature sound. Rockabilly often uses a single short delay, also known as "slap back", as a rhythm element.

The original delays were tape machines, and you would get a delay set by the speed of the tape and the distance between the record and play heads. If your tape machine had say 7-1/2 IPS and 15 IPS speed settings, you had two possible echo settings to work with, and the slower tape speed would be the longer delay. You will probably read or hear stories about studios where they ran tape off the tape deck and around a mic stand or something to get other delay times. It was a thing. "Tape" settings on modern units often try to duplicate the various imperfections of tape echo - rolled off high frequencies, maybe some chorus/flange type sounds from tape speed variation and often some small amount of distortion too. Of course we have super clean digital delays available as well.
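If you would rather do the delay maths yourself than use a calculator, it is only a couple of lines. This sketch assumes 4/4 with the quarter note as the beat; the 2 inch head spacing is a made-up number for illustration, not the spec of any real tape machine.

```python
def beat_ms(bpm):
    """Length of one quarter note in milliseconds."""
    return 60000 / bpm

def dotted_eighth_ms(bpm):
    """A dotted 8th note is three quarters of a beat."""
    return beat_ms(bpm) * 0.75

def tape_delay_ms(head_spacing_inches, tape_speed_ips):
    """Tape echo time: distance between record and play heads over tape speed."""
    return head_spacing_inches / tape_speed_ips * 1000

print(beat_ms(120))                      # 500.0
print(dotted_eighth_ms(120))             # 375.0
print(round(tape_delay_ms(2, 15), 1))    # 133.3
print(round(tape_delay_ms(2, 7.5), 1))   # 266.7 (half the speed, twice the delay)
```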

Adjacent - Phase Plug-ins: Since we are in the magical digital world of today, we can play with even shorter time offsets. While reverb and echo are audible as unique sounds of their own, if you continue to reduce the amount of time delay you eventually start to just tweak the phase of the signal. Phase is measured in degrees, where one cycle of a sine wave consists of a total of 360 degrees - 180 for the peak, 180 for the trough - and small amounts of phase change can result in a variety of effects from subtle masking to comb filtering to making certain sounds jump out of a mix. Humans cannot hear absolute phase - but we can hear relative phase. So the same sound source hitting 2 different microphones will have a phase difference that varies with frequency (different frequencies have different wave lengths after all). If you use more than 1 mic on drum overheads or a guitar amp or a snare, try playing with the phase of one of the 2 sources to see what happens. You might surprise yourself.
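Here is a sketch of why a fixed time offset means a different phase shift at every frequency, and where the comb filter notches land when the delayed and direct signals are mixed at equal level. It assumes a speed of sound of 343 m/s (roughly room temperature), and the names are mine.

```python
SPEED_OF_SOUND_M_S = 343.0   # assumed, roughly 20 degrees C

def delay_from_distance_ms(extra_distance_m):
    """Extra arrival time at the further of two mics."""
    return extra_distance_m / SPEED_OF_SOUND_M_S * 1000

def phase_shift_degrees(freq_hz, delay_ms):
    """Phase offset a given delay produces at one frequency."""
    return (360 * freq_hz * delay_ms / 1000) % 360

def comb_notches_hz(delay_ms, count=3):
    """Frequencies where the delayed copy is 180 degrees out and cancels."""
    delay_s = delay_ms / 1000
    return [(2 * k + 1) / (2 * delay_s) for k in range(count)]

# Second mic 34.3 cm further from the source: about 1 ms late
delay = delay_from_distance_ms(0.343)
print(round(delay, 2))
for freq in (250, 500, 1000):
    print(freq, round(phase_shift_degrees(freq, 1.0), 1))   # 90.0, 180.0, 0.0
print([round(f) for f in comb_notches_hz(1.0)])             # [500, 1500, 2500]
```

Moving the mic a few centimetres slides all of those notches up or down, which is why small changes in mic position can make such a difference.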

References:

Intro to dBs: https://www.production-expert.com/production-expert-1/2020/8/19/dbs-explained-a-musicians-guide-to-understanding-decibels

Atlantic Studios (Hit the road jack) https://en.wikipedia.org/wiki/Atlantic_Studios

Millisecond to BPM calculator: https://tuneform.com/tools/time-tempo-bpm-to-milliseconds-ms

Al Schmitt "On The Record" https://www.halleonard.com/product/158230/al-schmitt-on-the-record

Duane Eddy's water tank: https://www.musicradar.com/news/duane-eddy-classic-interview

Hammond's original Reverb Patent (1939) https://patents.google.com/patent/US2230836A/en

Mic Sensitivity (high level) https://www.neumann.com/fr-fr/company/neumann-im-homestudio/homestudio-academy/what-is-sensitivity/

Intro to Audio Phase: https://www.uaudio.com/blog/understanding-audio-phase
