Digital audio is audio that has been sampled at regular, discrete intervals and
converted to numeric data. All the original audio information that lies
between the sample intervals is, of course, lost.
Delta Sigma ("new hotness")
Delta Sigma analog/digital converters are rapidly gaining
popularity in affordable digital-processing chips. They work by sampling
the audio signal at extremely high frequencies (usually 64 or more times higher
than the upper limit of the audio band) and storing the data as a
1-bit stream. This stream is basically an image of the audio signal with
a lot of higher-frequency noise superimposed on it, represented as a
sequence of on and off bits. The sampling frequency is so high that the
original analog waveform is approximated by dithering the on/off bits over
the time axis, with longer "on" times for the positive side of the wave and
longer "off" times for the negative side. To convert it back to analog,
just pump the stream through an analog low-pass filter (a simple op-amp
integrator circuit) to filter out the noise. The ons and offs then average
out to the original audio waveform again. It's not perfect, but I think a
very "tape-like" sound quality can result from just the right amount of
low-pass filtering. If the sample rate is sufficiently high, the added noise
lies above the audio frequencies, so the processed audio has plenty of clean
bandwidth.
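To make the on/off idea concrete, here is a minimal Python sketch of a first-order delta-sigma modulator, with a moving average standing in for the analog low-pass filter. It is an illustration under simplifying assumptions, not how any particular converter chip works: real parts use higher-order modulators and much better filters, and the function names are made up for this example.

```python
import math

def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator.

    Each input sample must lie in [-1, 1]. The output is a list of +1.0 ("on")
    and -1.0 ("off") values whose local density of ons tracks the input level.
    """
    integrator = 0.0
    feedback = 0.0  # previous output bit, fed back and subtracted
    bits = []
    for x in samples:
        integrator += x - feedback
        feedback = 1.0 if integrator >= 0.0 else -1.0
        bits.append(feedback)
    return bits

def moving_average(bits, width):
    """Crude low-pass filter standing in for the analog output filter."""
    out, running = [], 0.0
    for i, b in enumerate(bits):
        running += b
        if i >= width:
            running -= bits[i - width]  # keep only the last `width` bits
        out.append(running / min(i + 1, width))
    return out

# Assumed oversampled rate: 64 x 44.1 kHz = 2,822,400 Hz; a 1 kHz tone at half scale.
fs = 64 * 44_100
tone = [0.5 * math.sin(2 * math.pi * 1_000 * n / fs) for n in range(fs // 100)]
bits = delta_sigma_1bit(tone)
smoothed = moving_average(bits, 64)
print("first 32 bits:", "".join("1" if b > 0 else "0" for b in bits[:32]))
print("max error vs. input:", max(abs(a - b) for a, b in zip(smoothed[64:], tone[64:])))
```

The averaged bit stream follows the input closely; the remaining error comes partly from the leftover modulator noise and partly from the moving average's small delay.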
Learn more about Delta Sigma:
http://en.wikipedia.org/wiki/Delta-sigma_modulation
http://www.beis.de/Elektronik/DeltaSigma/DeltaSigma.html
http://www.dsdproaudio.com/html/dsd_sacd_explained.html
44.1 kHz ("old and busted")
At the top limit of the human audio spectrum, 20 kHz, a sampler
digitizing at the standard CD-quality rate of 44.1 kHz only measures the signal roughly
2 times per cycle.
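The arithmetic behind "roughly 2 times per cycle" is just the sample rate divided by the tone frequency. A tiny Python sketch (the list of test tones is arbitrary):

```python
def samples_per_cycle(sample_rate_hz, tone_hz):
    """How many samples land on each cycle of a pure tone."""
    return sample_rate_hz / tone_hz

CD_RATE_HZ = 44_100
for tone_hz in (1_000, 5_000, 10_000, 16_000, 20_000):
    print(f"{tone_hz:>6} Hz tone: {samples_per_cycle(CD_RATE_HZ, tone_hz):5.2f} samples per cycle")
```

At 1 kHz that works out to 44.1 samples per cycle; at 20 kHz it drops to about 2.2.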
The problem I have with 44.1 kHz is that sampling a signal only slightly more
than 2 times per cycle is the bare theoretical minimum required to produce
"accurate" samples. I use quotation marks around the word "accurate" because
information is actually lost and errors are introduced during the sampling process.
The only reason the word "accurate" can be used is that the original waveforms
can be reproduced using complex algorithms, assuming that the original source
material consisted only of sine waves and contained no frequencies above half
the sample rate. Much of the original detail must be replaced by a
mathematically reconstructed version during playback.
Theorists will tell you that this is "perfect" reproduction (ignoring
the distortion caused by the various filters which process the information
during its conversion from analog to digital, through various up-sampling and
down-sampling intermediate steps, and back to analog, again through
various intermediate steps, all of which rely on imperfect technology that is
forced to make compromises along the way).
The frequency range of human hearing lies roughly between
20 Hz and 20 kHz. Many species and some humans can hear
frequencies beyond this range, but there are several other reasons for
sampling audio at much more than just twice the highest audio frequency. One
reason is the Hypersonic Effect. Another is that the anti-aliasing filters
would be much less critical and could do their work well above the audio
spectrum, where they would cause less damage.
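One way to put a number on "much less critical" is to ask how steep an analog anti-aliasing filter has to be. The sketch below uses the standard Butterworth order formula; the choice of a Butterworth response, the 1 dB passband loss, and the 96 dB stopband target are all assumptions picked for illustration, and many real converters instead oversample and do most of the filtering digitally.

```python
import math

def butterworth_order(pass_hz, stop_hz, pass_ripple_db=1.0, stop_atten_db=96.0):
    """Smallest analog Butterworth low-pass order meeting the given spec.

    pass_ripple_db: allowed loss at the top of the passband (assumed 1 dB).
    stop_atten_db: required attenuation at the stopband edge (assumed 96 dB,
    roughly the 16-bit range).
    """
    ratio = (10 ** (stop_atten_db / 10) - 1) / (10 ** (pass_ripple_db / 10) - 1)
    return math.ceil(math.log10(ratio) / (2 * math.log10(stop_hz / pass_hz)))

for fs in (44_100, 96_000):
    nyquist = fs / 2  # everything above this must be removed before sampling
    print(f"fs = {fs} Hz: pass 20 kHz, block {nyquist:.0f} Hz -> "
          f"order {butterworth_order(20_000, nyquist)}")
```

With these assumptions the sketch asks for a 121st-order filter at 44.1 kHz against a 14th-order one at 96 kHz.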
One huge problem is the collision
between what is theoretically possible and what technology actually
produces.
Time Smearing
As a first step, a sampler must filter away everything higher in frequency
than half the sample rate. A steep-slope low-pass filter, such as the one needed
to pass 20 kHz and block 22.05 kHz, would distort the phase of the signal more
and more as the frequency increases. What this means is that the higher
frequencies will be shifted somewhat in the time domain relative to the lower
frequencies. This will have some audible effect on the sound itself.
(I think hi-hat cymbals, for example, sound somewhat fake and plastic when
digitized; the sizzle is gone.) Even more apparent to the human ear
will be the 'smearing' effect on the 3-D localization of a stereo image.
The human ear is very sensitive to differences in time between the left and
right ears. If the higher frequencies arrive just a tiny bit later than
the lower frequencies, the 3-D soundstage will be less detailed and less
natural.
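For the analog case this paragraph describes, the time shift can be estimated as the filter's group delay. This Python sketch computes it for an analog Butterworth low-pass from its pole positions; the 8th-order, 20 kHz design is a hypothetical example (it is far too gentle to actually block 22.05 kHz), and many modern converters use linear-phase digital filters whose delay is the same at every frequency.

```python
import math

def butterworth_poles(order, cutoff_hz):
    """Left-half-plane poles of an analog Butterworth low-pass filter (rad/s)."""
    wc = 2 * math.pi * cutoff_hz
    poles = []
    for k in range(1, order + 1):
        angle = math.pi * (2 * k + order - 1) / (2 * order)
        poles.append(wc * complex(math.cos(angle), math.sin(angle)))
    return poles

def group_delay_s(order, cutoff_hz, freq_hz):
    """Group delay in seconds at freq_hz.

    Each pole p = sigma + j*beta contributes -sigma / (sigma**2 + (w - beta)**2).
    """
    w = 2 * math.pi * freq_hz
    return sum(-p.real / (p.real ** 2 + (w - p.imag) ** 2)
               for p in butterworth_poles(order, cutoff_hz))

for f in (1_000, 10_000, 19_000):
    print(f"{f:>6} Hz: {1e6 * group_delay_s(8, 20_000, f):6.1f} us")
```

For this assumed filter the delay near 19 kHz comes out more than 30 microseconds longer than at 1 kHz.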
Jitter
Another source of sampler error is clock jitter. The very fast, precise
timing required for our 44.1 kHz sampling rate should take one sample every
22.7 microseconds, but instead the moment when each sample is taken can vary
by 50 nanoseconds. These more or less random timing inaccuracies smear the
audio image and give it a cheap, artificial quality.
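A common rule of thumb links clock jitter to noise: for a full-scale sine at frequency f sampled with random timing error of RMS size sigma, the best achievable signal-to-noise ratio is about -20·log10(2π·f·sigma) dB. The Python sketch below evaluates that formula with the 50 ns figure from the paragraph and checks it with a small simulation; the Gaussian jitter model and the helper names are assumptions.

```python
import math
import random

def jitter_snr_db(tone_hz, jitter_rms_s):
    """Rule-of-thumb jitter-limited SNR for a full-scale sine, in dB."""
    return -20 * math.log10(2 * math.pi * tone_hz * jitter_rms_s)

def simulated_jitter_snr_db(tone_hz, jitter_rms_s, fs=44_100, n=50_000, seed=1):
    """Sample a sine with Gaussian timing errors and measure the resulting SNR."""
    rng = random.Random(seed)
    signal_power = noise_power = 0.0
    for k in range(n):
        t = k / fs                                   # ideal sampling instant
        t_actual = t + rng.gauss(0.0, jitter_rms_s)  # instant the clock really delivers
        ideal = math.sin(2 * math.pi * tone_hz * t)
        actual = math.sin(2 * math.pi * tone_hz * t_actual)
        signal_power += ideal ** 2
        noise_power += (actual - ideal) ** 2
    return 10 * math.log10(signal_power / noise_power)

print(f"sample period at 44.1 kHz: {1e6 / 44_100:.1f} us")
print(f"rule of thumb: {jitter_snr_db(20_000, 50e-9):.1f} dB")
print(f"simulation:    {simulated_jitter_snr_db(20_000, 50e-9):.1f} dB")
```

With 50 ns of jitter on a 20 kHz tone, the formula gives about 44 dB, far below the roughly 98 dB a 16-bit format can otherwise deliver.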
Aliasing, Errors, Noise, and Loss
The sampler will detect and store data less and less accurately as the input
signal rises in frequency, and this is further complicated by quick,
high-frequency transient spikes (such as cymbal crashes). Everything above
the 2 kHz frequency range is subjected to increasing degrees of sampler
distortion as the frequency rises (because the sampler will rarely encounter a
higher-frequency signal at the exact top or bottom of its cycle).
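The "aliasing" in this heading refers to what happens to any content above half the sample rate that gets past the anti-aliasing filter: it folds back down and reappears as a lower frequency. A small Python sketch of that folding (the 25.1 kHz test tone is an arbitrary example):

```python
import math

def alias_frequency(tone_hz, fs_hz):
    """Apparent frequency of a pure tone after sampling at fs_hz, folded into 0..fs/2."""
    f = tone_hz % fs_hz
    return fs_hz - f if f > fs_hz / 2 else f

FS = 44_100
print(alias_frequency(25_100, FS))  # 19000: a 25.1 kHz tone shows up at 19 kHz

# The sample values themselves are identical, so nothing downstream can tell them apart.
same_samples = all(
    math.isclose(math.cos(2 * math.pi * 25_100 * n / FS),
                 math.cos(2 * math.pi * 19_000 * n / FS), abs_tol=1e-9)
    for n in range(1_000)
)
print(same_samples)  # True
```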
Sampling and Data Loss
Analog input

Let's say this is a 5 kHz audio signal.

Now, if we add a 16 kHz signal to it, the two waves add point by point:
where they have the same sign they reinforce each other, and where they
have opposite signs they partially cancel...

So the result of adding the 16 kHz to the 5 kHz might look something like
this.
Sample Points

Now if we sample this signal at 44.1 kHz, these are the points we'd end up
with.
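The sample points can also be computed directly. This Python sketch assumes the 16 kHz component has half the amplitude of the 5 kHz one, since the original figures are not available; the amplitudes and names are illustrative only.

```python
import math

FS = 44_100  # CD sample rate in Hz

def mixed_signal(t):
    """Hypothetical version of the example: 5 kHz tone plus a 16 kHz tone at half the amplitude."""
    return math.sin(2 * math.pi * 5_000 * t) + 0.5 * math.sin(2 * math.pi * 16_000 * t)

# The first dozen sample points, one every 1/44,100 s (about 22.7 microseconds).
sample_points = [mixed_signal(n / FS) for n in range(12)]
for n, value in enumerate(sample_points):
    print(f"n = {n:2d}   t = {1e6 * n / FS:6.1f} us   x = {value:+.3f}")
```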
Decimation

The sampled audio is now a series of numbers, which would not sound
too good if it were not for the mathematics involved in bringing it back
into the analog domain. Note that the sharp corners of the stair-step
pattern, and the differences between the formerly smooth wave and those
steps, are unwanted noise that must be filtered out at the digital-to-analog
conversion stage. (Strictly speaking, "quantization noise" is the related
error caused by rounding each sample to the nearest available level.)
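Quantization, in the strict sense, means rounding each sample to one of a fixed number of levels, and the rounding error can be measured with a short simulation. This Python sketch quantizes a 997 Hz sine (an arbitrary test frequency) to 8 and 16 bits and compares the measured signal-to-noise ratio with the textbook rule of thumb of 6.02·N + 1.76 dB for an N-bit quantizer; the quantizer is a simple assumed design with no dither.

```python
import math

def quantize(x, bits):
    """Round x (expected in [-1, 1)) to the nearest level of a signed `bits`-bit quantizer."""
    step = 2.0 / 2 ** bits
    code = round(x / step)
    # Clamp to the available integer codes, e.g. -32768..32767 for 16 bits.
    code = max(-(2 ** (bits - 1)), min(2 ** (bits - 1) - 1, code))
    return code * step

def quantization_snr_db(bits, tone_hz=997, fs=44_100, n=44_100):
    """Measured SNR of a near-full-scale sine after quantization (no dither)."""
    step = 2.0 / 2 ** bits
    amplitude = 1.0 - step  # just below full scale so nothing clips
    signal_power = noise_power = 0.0
    for k in range(n):
        x = amplitude * math.sin(2 * math.pi * tone_hz * k / fs)
        signal_power += x * x
        noise_power += (quantize(x, bits) - x) ** 2
    return 10 * math.log10(signal_power / noise_power)

for bits in (8, 16):
    print(f"{bits:2d}-bit: measured {quantization_snr_db(bits):5.1f} dB, "
          f"rule of thumb {6.02 * bits + 1.76:5.1f} dB")
```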
Reconstruction

Modern digital-to-analog filters up-sample the data to a much higher rate
than the original sample rate in order to fill in the spaces between samples
with curves calculated using the sinc function.
Here's roughly how the sampled audio might look after being converted back
into analog by a digital interpolation filter (forgive my crude rendering).
It does a very good job of approximating the analog waveform, considering
that much of the original data was lost in the sampling process. In fact, it
will probably do a better job than my drawing. I will try to create these
curves using the actual sinc function soon (I'm looking for some free software
to run the numbers through); this rendering just uses simple curves.
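For readers who want to try the actual sinc function, here is a minimal Python sketch of Whittaker-Shannon (sinc) interpolation applied to the 5 kHz plus 16 kHz example. The 0.5 amplitude of the 16 kHz tone and the block length are assumptions, and real converters use a finite, windowed version of this filter rather than the plain sum below.

```python
import math

FS = 44_100.0  # sample rate in Hz

def original(t):
    """Hypothetical test signal: 5 kHz tone plus a 16 kHz tone at half the amplitude."""
    return math.sin(2 * math.pi * 5_000 * t) + 0.5 * math.sin(2 * math.pi * 16_000 * t)

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, t):
    """Whittaker-Shannon interpolation: x(t) = sum over n of x[n] * sinc(t*FS - n).

    Only a finite block of samples is available, so the infinite sum is truncated;
    accuracy is best near the middle of the block.
    """
    position = t * FS
    return sum(s * sinc(position - n) for n, s in enumerate(samples))

samples = [original(n / FS) for n in range(4_001)]
for offset in (0.25, 0.5, 0.75):
    t = (2_000 + offset) / FS  # middle of the block, between two sample points
    print(f"t = {1e6 * t:9.3f} us   original {original(t):+.4f}   "
          f"rebuilt {reconstruct(samples, t):+.4f}")
```

Because the sum is truncated to a finite block the rebuilt values are not exact, but between sample points near the middle of a long block they track the original closely.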
A high-quality digital-to-analog system can re-create surprisingly accurate sine
waves based on even scant information from the digital samples by using digital
filters which re-create smooth, curved transitions between the samples, rather
than jagged stair-step patterns. What if the original data was much more complex
than a simple sine wave, though? No doubt a great deal of lost high-frequency
audio detail, ambience, and texture can never be re-created in this way, since
the only information stored for each sample is the amplitude, not the actual
angle of the rise or fall of the wave. The angle can only be calculated from the
data before and after each sample, filling the lost pieces with smooth curves
derived from sine waves. All the original audio data in between each sample
point has already been lost forever. Hence the "deadness" that is ascribed to
digital audio by audiophiles. Digital has the ability to reproduce simple sine
waves quite well, but more complex and textured waveforms must necessarily be
simplified and smoothed out somewhat by the conversion processes, and,
ultimately, not enough data is being saved at the currently-popular sample rates
to ensure detailed reproduction of real-world audio. Real-world audio will
contain waveforms that are not only additive mixtures of various pure
frequencies, but also phase-shifted components (frequencies which are subtracted
rather than added, in varying, frequency-dependent degrees), subtle harmonic
overtones and ambient nuances, very fast transitory spikes, and asymmetrical and unnatural waveforms
produced by electronic instruments, making them no longer simple derivations of
the sine function. In other words, the spaces in between the samples (at
the common sample rate of 44.1 kHz) are often significant and not
always predictable.
The details the anti-aliasing filter and sampler errors leave out will
have to be inferred by the listener at some psycho-acoustic level. This
extra effort required of the listener, plus the harshness and lack of subtlety
of digitally-sampled audio, results in a quick onset of ear fatigue. The
absence of some of the original harmonic and ambient information kills the feeling
of "being there." The best 30 ips half-inch analog recorders can capture
frequencies past 50 kHz, re-creating a live, detailed, realistic, "present",
exciting sonic image. I remember when I was a kid, I would close my
eyes as I listened to records (especially the first few tracks along the outside
of the record, where the bandwidth is highest) and the stereo image would
transport me. I don't get that from CDs (sample rate of 44.1 kHz for a
bandwidth of just barely 20 kHz) or FM radio (32 kHz for a bandwidth of only 16
kHz).
Music as Art and Self Expression
From an electronic standpoint, an all-analog signal path presents a musician
with an instrument which is physically coupled to the output device. An electric
guitarist can think of his instrument as consisting of everything from the
strings to the amp's speakers, which then acoustically feed back again into the
strings; they are all one. There is no latency. That feeling of oneness is part
of the magic of performing music.
The feeling of playing through a digital effect is that of inputting to a system
which then creates an artificial product. The link is broken. What comes out is
not what you put in; it is something different. It might even sound awesome, but
it is still different. This can have an adverse effect on the creative
flow of a performance.
Latency
Virtually all digital recording and signal processing equipment will have
some discernible lag. Often it is not enough of a delay to affect the
player's performance. In the case of multitrack digital recording,
latency can keep a musician's tracks from being tight and "in the pocket."
Even the best player can't make his performance sync up with a recording if the
equipment won't track and play back accurately in real time.
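Much of that lag comes from buffering: a buffer of N samples at sample rate fs adds N / fs seconds of delay at each stage it passes through. The Python sketch below uses hypothetical buffer sizes and compares each delay to the distance sound travels in air in the same time (about 343 m/s at room temperature).

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # in air at roughly 20 degrees C

def buffer_latency_ms(buffer_samples, fs_hz):
    """Delay added by one buffer of `buffer_samples` samples at `fs_hz`."""
    return 1000.0 * buffer_samples / fs_hz

for buffer_samples in (64, 256, 1024):  # hypothetical buffer sizes
    ms = buffer_latency_ms(buffer_samples, 44_100)
    metres = SPEED_OF_SOUND_M_PER_S * ms / 1000.0
    print(f"{buffer_samples:5d} samples: {ms:5.1f} ms, "
          f"like standing {metres:4.1f} m farther from the amp")
```

A full record-and-monitor path usually passes through at least an input and an output buffer, plus the converters' own filter delay, so the total is larger than a single buffer's figure.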
Cheapo Digital Guitar Effects and Playing Dynamics
There's a reason digital audio has
earned its reputation for sterility, lifelessness, and harshness. Mainly,
it's because it tends to be sterile, lifeless, and harsh. Especially when
used as an electric guitar effect, in my opinion (since most of those
applications tend to be very cheaply made). Electric guitars are generally used in
conjunction with distortion or overdrive. The most natural-sounding
overdrive devices are really vacuum tubes, which many solid-state distortions
are designed to emulate. A natural-sounding overdrive will make use of
playing dynamics and will respond to different string attacks with varying tones
and levels of distortion. Tube amplifiers are generally used with guitars
to help facilitate this desirable behavior. A low-quality digital sampler anywhere in the signal
chain between a guitar and an amplifier will remove this natural
link between the player's hands and the tone of the amp. If you must
use digital effects, keep in mind they are generally best used after
distortion effects (unless you prefer a compressed, dead guitar tone). If your amp's
input stage is overdriven, then using your digital effect in the amp's effects loop will
help to minimize its dynamic-killing tendency (assuming the digital effect can
handle the amp's line-level signal; if not, it's easy enough to add an
attenuator in front of the digital effect's input).
Best Digital Practices
Ideally, the best place to make use of digital effects is
after the guitar amp's speaker, processing either a mic'ed or a
cabinet-simulated, direct-out signal. At this point in the signal chain,
the instrument and amplifier have already done everything they need to do
together (assuming there is an acoustic interaction between the guitar and the
speaker to allow for natural feedback), so the basic tone won't be killed.
In the same way, when recording, going first into analog tape
and then converting to digital may warm up the signal and create a more
natural-sounding result.
Obviously, the higher quality your digital equipment and the
higher sample rates and word lengths you use, the better results you will get.
Technology Improvements
I think that the emerging 24-bit/96 kHz standard is a move in the right
direction. Unfortunately, the consumer audio CD format standard is still
16-bit/44.1 kHz for the time being. The next big thing, however, is DSD
(Direct Stream Digital), developed by Sony and Philips (see
http://www.dsdproaudio.com/html/dsd_sacd_explained.html).
DSD samples the signal 2,822,400 times per second (about 2.8 MHz) using only
one bit (on/off). The audible result is virtually indistinguishable from
analog, the capacity is increased to 4.7 GB for the same physical disc size,
and the dynamic range is increased to beyond 120 dB.
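For a sense of scale, the raw (uncompressed) data rates of these formats follow from sample rate × bits per sample × channels. A quick Python sketch, assuming two channels:

```python
def bit_rate_kbps(sample_rate_hz, bits_per_sample, channels=2):
    """Raw data rate in kilobits per second (no compression, no container overhead)."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

FORMATS = {
    "CD, 16-bit/44.1 kHz": (44_100, 16),
    "24-bit/96 kHz": (96_000, 24),
    "DSD, 1-bit/2.8224 MHz": (2_822_400, 1),
}

for name, (rate_hz, bits) in FORMATS.items():
    print(f"{name:24s} {bit_rate_kbps(rate_hz, bits):8.1f} kbit/s (stereo)")

print(2_822_400 / 44_100)  # 64.0: the DSD rate is 64 times the CD rate
```

The DSD rate is exactly 64 times the CD rate, and a stereo DSD stream carries exactly four times as many raw bits per second as a stereo CD (5,644.8 versus 1,411.2 kbit/s).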