Klarinet Archive - Posting 000235.txt from 1997/12

From: Josias Associates <josassoc@-----.com>
Subj: Sampling Parameters in Digital Audio Systems
Date: Fri, 5 Dec 1997 19:20:49 -0500

Jonathan:

During recent conversations about digital audio technology, I
posted a message reporting 12-year-old findings of an audio expert
asserting that commercial digital audio sampling rates (44.1 kHz) and
quantizing resolution (16 bits) were inadequate for the medium. When you
disputed those claims, implying that they were typical of those advanced
by "audiophile zealots," I prepared a draft of a detailed quantitative
reply on the subject. But, because the debate on the thread was then still
at its height, I decided against the knee-jerk reaction of posting an
immediate response and opted to wait until the dust settled. During this
hiatus, I've thought additionally about the subject and have solicited
comments outside the list about some of the message traffic.

Although I'd prefer not to spend further time on this matter, I
view it as unfinished business requiring at least an interim attempt at
closure. Here, then, is my reply, tempered by a few weeks of reflection
and also by the inclusion of information previously unknown to me.

Before going any further, I should comment on what motivated me
to delve into this subject in detail. It certainly wasn't a passion about
audio engineering, because that's not one of my interests or professional
fortes. However, as a developer of sophisticated scientific instrumentation
for more years than I'd care to admit, data-sampling similar to what is
used in digital audio systems has been a frequent element of my stock in
trade. With that background, I was struck by the seeming inconsistency
of strongly held opinions on the subject. For example, I agreed with a
number of things that Jerry Korten said that you rejected. In my case, with
all the admiration I have for your musicianship and your breadth of other
knowledge, I felt particularly hurt that your comments seemed to scold me
as though I were a callow student who was colossally ignorant of basic
science and mathematics. I refer to your comments about my relayed report
about undersampling, underresolution, and high-frequency phase distortion
as being inconsistent with the Sampling Theorem. And yet, I believe that,
however you chose to couch your disagreement, your statements about the
Sampling Theorem were valid... as far as they went. While I also commend
Mark Charette for his superb and valuable work as keeper of the flame,
I was troubled with his negative comment about Jerry Korten, whose
postings had merit. Those are the things that prompted me to write this
message.

I regret loading up the in-box memories of uninterested people, but
the idea of taking this outside the list is not practical. (I have already
offered an apology in advance about this to Neil Leupold, who has
expressed strong feelings about such specialized threads.) For those
people only mildly interested in the subject, I preface my discussions
with a comparatively brief summary. Those more interested in the subject
matter or provoked by the summary claims, as I believe you will be,
Jonathan, should read further before deleting.

SUMMARY

1. SAMPLING RATE - Assuming a useful hearing range extending to
20 kHz, analog reconstruction of audio waveforms at a sampling rate
of 44.1 kHz produces non-negligible amounts of several types of
high-frequency distortion that would be reduced at higher sampling and
reconstruction rates;

2. SAMPLING THEOREM VALIDITY - I have no quarrel with your
arguments about the sampling theorem. However, the analog
reconstruction process, which provides a delayed staircase approximation
to the source waveform (because it is convenient and expedient to do it
that way), departs from the kind of curve fitting to discrete data points
or from spectral synthesis that permits the perfection contemplated by the
Sampling Theorem;

3. FREQUENCY RANGE OF HEARING - While I argue that there are deficiencies
in the ability of commercial digital audio systems to reproduce audio
signals accurately out to 20 kHz, there is growing evidence that human
auditory response goes beyond 20 kHz. As an example, human auditory
perception through bone conduction extends to beyond 40 kHz (Ref. 1:
Lenhardt et al). There is also other evidence of sound perception above
the audible range (Ref. 2: Oohashi et al);

4. COMMERCIAL USE OF HIGHER SAMPLING RATES - Much of
the recording industry has been aware for some time of correctable
deficiencies in recording practices. In one example, studios are already
using higher sampling rates, and digital video discs will soon do the same
for the audio portions;

5. QUANTIZING RESOLUTION - Louis Fielder, now Chief Engineer
at Dolby Labs, reports in a series of related papers that the dynamic range
of live music dwarfs the dynamic ranges of analog and 16-bit-digital
systems. Results of tests performed about 10 years ago by the British
Broadcasting Corporation demonstrate that 99% of the population can hear
quantizing granularity down to a 22-bit resolution threshold (where the
granularity becomes concealed), and 1% can hear better than that; and

6. DOUBLE BLIND TESTS - Although I suspect that a number of
comparative analog/digital double-blind tests have been run, especially at
manufacturers' facilities, I do know without speculation that one such
double-blind test was performed off the same feed at Caltech in 1982. I
witnessed a convincing demonstration of the Caltech comparisons in 1985.
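
As a point of reference for item 5, the textbook signal-to-noise figure
for an ideal N-bit quantizer driven by a full-scale sine wave is about
6.02N + 1.76 dB. The short sketch below is an illustration added for
context, not a calculation taken from Fielder's papers or the BBC tests;
the function name is hypothetical.

```python
import math

def ideal_dynamic_range_db(bits: int) -> float:
    """Theoretical SNR (dB) of an ideal N-bit quantizer, full-scale sine input."""
    # Textbook result: 20*log10(2**N) + 10*log10(1.5), roughly 6.02*N + 1.76
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

for bits in (16, 22):
    print(f"{bits} bits: {ideal_dynamic_range_db(bits):.1f} dB")
```

On this idealized basis, each added bit is worth about 6 dB, so the gap
between 16 and 22 bits is roughly 36 dB.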

The balance of this posting is devoted mainly to a discussion of the
kinds and levels of distortions that result from undersampling.

DISCUSSION

If I am correct about the existence of high-frequency anomalies in
digital audio systems, how then can such effects be reconciled with your
valid statements about the Sampling Theorem?

I aim to answer this question and then describe quantitatively the
consequences to the reconstructed waveforms of sampling too slowly.
Within the constraints of the Sampling Theorem, one can presumably
curve fit the original source waveform perfectly about discrete sampling
points, preserving both amplitude and phase information. Alternatively,
one could conceivably synthesize periodic waveforms using fast Fourier
transforms, except that audio waveforms are inconveniently aperiodic.

While sampling procedures used in recording digitizers are
consistent with the Sampling Theorem up to that point, the analog
reconstruction playback process is not, because it employs a flat step
between reconstituted samples and not a discrete point value against
which curve fitting or some other synthetic procedure could be used. At
frequencies much lower than F/2 (where F is the sampling frequency),
this reconstruction process has a negligible effect on waveform
reproduction. But, at higher frequencies, this type of system not only
produces high-frequency phase distortion, it also produces non-negligible
amounts of harmonic distortion and amplitude modulation together with
attendant spurious sidebands.
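
Under the usual idealized model, a DAC that holds each value for one full
sample period behaves as a zero-order hold, with gain |sin(x)/x| where
x = pi*f/F, and with a delay of half a sample period. The sketch below
evaluates that model at a few frequencies. It assumes F = 44.1 kHz and an
ideal hold with no post-DAC filter or oversampling; the function name is
hypothetical, and the formula is used here only for 0 <= f < F.

```python
import math

def zoh_response(f: float, fs: float) -> tuple[float, float]:
    """Return (gain, phase in degrees) of an ideal zero-order hold, 0 <= f < fs."""
    x = math.pi * f / fs
    gain = 1.0 if x == 0 else abs(math.sin(x) / x)
    # Holding for one sample period delays the signal by half a sample,
    # which is a phase lag of 180 * f / fs degrees.
    phase_deg = -math.degrees(x)
    return gain, phase_deg

FS = 44_100.0  # assumed sampling frequency in Hz
for f in (1_000.0, FS / 8, FS / 4, FS / 2):
    gain, phase = zoh_response(f, FS)
    print(f"{f:8.0f} Hz  gain {20 * math.log10(gain):6.2f} dB  phase {phase:6.1f} deg")
```

In this model the lag at F/4 comes out at 45 degrees, which matches the
reconstruction-induced figure quoted further on in this message, and both
the droop and the lag shrink toward zero as f/F becomes small.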

These pernicious high-frequency effects are artifacts of the non-ideal
reconstruction process and have nothing to do with imperfect ADCs or
DACs. I have satisfied myself that the artifacts do exist, and others have
demonstrated that the artifacts can be heard.

The following section examines the distorting effect of the delayed
staircase approximation to curve fitting used in reconstructing sampled
audio signals. You questioned my understanding of the Nyquist Theorem
as follows:

On Thu, 13 Nov 1997, Jonathan Cohler wrote:

> This is not correct. The Nyquist Theorem says that if an analog
> signal is digitally sampled at a frequency F then using those
> digital samples one can PERFECTLY reconstruct all frequency
> components of the original signal up to a frequency of 1/2 * F.

You highlight the word "perfectly" with caps, which is valid, but you
should also highlight the preceding word "can." That is because, although
one "can" presumably reconstruct perfectly all frequency components
up to F/2 using digital samples, the reproductive process employed in
digital audio systems "does not" do a perfect reconstruction. While it may
be possible to approach such perfection with another system, the
complexity and costs of such a reconstruction processor would be much
greater and less convenient than the comparatively simpler and more
straightforward approach now in use. As a result, there are important
limitations to being able to perfectly reconstruct waveforms with respect
to phase, amplitude, and structure, as I'll demonstrate.

Assuming perfect ADCs and DACs, the receiver DAC will extrapolate
a flat level until receipt of the next digital number. At audio frequencies
well below the sampling frequency, one can expect faithful waveform
reproduction. But that is not the case in reconstructing the top end of the
sampled-data spectrum.

As an extreme example, consider a phase-synchronous sinusoidal input
to the ADC at the Nyquist Frequency. The output will be a square wave
at the Nyquist Frequency. If sampling occurs near the nodes, the output
square wave will be nearly in phase with the source signal, except that the
amplitude will be highly attenuated. Thus, like Doppler radars, which are
blind to moving targets of certain velocities, sampling systems produce
two blind phases at F/2 -- phases where the signal disappears. If ADC
sampling occurs at the waveform maxima, a full-amplitude square wave
will be produced, but phase shifted by 90 degrees. Thus, reconstructed
amplitude and phase are not automatically that of the source signal. At
nearby lower frequencies, signals will slip by the sampler so that a
maximum can be detected for purposes of a spectral display. But, in terms
of real-time audio near F/2, there will be a square-wave component
modulated at the slip rate with a peak-to-peak modulation index of 100%.
Viewed on a spectrum analyzer, one would see the signal frequency plus
sidebands. At frequencies higher than F/2, aliasing occurs, which
produces false signals at apparent frequencies below F/2. We'll assume,
however, that the audio input to the ADC is band limited to attenuate any
residual signals above the Nyquist Frequency.
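
The dependence on sampling phase at exactly F/2 can be checked with a few
lines of arithmetic. The sketch below assumes a unit-amplitude sine and
ideal sampling; the function name is hypothetical. Because the argument
advances by 180 degrees per sample, the samples simply alternate in sign
with magnitude |sin(phase)|.

```python
import math

def nyquist_samples(phase_deg: float, n: int = 6) -> list[float]:
    """Sample sin(2*pi*(F/2)*t + phase) at t = k/F for k = 0..n-1."""
    phase = math.radians(phase_deg)
    # At exactly F/2 the sine argument advances by pi radians per sample.
    return [math.sin(math.pi * k + phase) for k in range(n)]

for phase_deg in (0, 10, 45, 90):
    samples = nyquist_samples(phase_deg)
    print(f"{phase_deg:3d} deg:", [round(s, 3) for s in samples])
```

A phase of 0 degrees (sampling at the nodes) gives samples that are zero
to within rounding error, while 90 degrees (sampling at the maxima) gives
alternating full-scale values.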

Now consider a similar source signal at 1/4 of the sampling frequency,
which would be near Jerry Korten's 10 kHz. For the sake of illustration,
sample this waveform at the nodes and maxima. Beginning at the first
node, one gets zero for 90 degrees, plus maximum for the next 90
degrees, zero for the next 90 degrees, and finally negative maximum for
the last 90 degrees of that cycle. This full-amplitude bipolar pulsed
waveform is displaced by 45 degrees and, worse still, is rich in harmonic
content that was never there in the first place. But, fast edge transitions
and higher harmonics will undoubtedly be attenuated with a post-DAC
low-pass filter, which might possibly produce a bumpy poor-man's
triangular wave of the kind Jerry Korten saw.
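
The 45-degree displacement and the added harmonic content of this held
waveform can be estimated numerically. The sketch below is a rough check
under stated assumptions: an ideal hold, no post-DAC filter, and one
cycle of the sample sequence 0, +1, 0, -1. The helper names are
hypothetical, and the Fourier coefficients are approximated with a
midpoint sum.

```python
import math

def held_cycle(samples: list[float], steps_per_sample: int = 1000) -> list[float]:
    """Zero-order hold: repeat each sample for one full sample period."""
    return [s for s in samples for _ in range(steps_per_sample)]

def harmonic(wave: list[float], k: int) -> tuple[float, float]:
    """Amplitude and phase (degrees, relative to a sine) of the k-th harmonic."""
    n = len(wave)
    # Project one cycle onto sin and cos using a midpoint-rule sum.
    a = 2 / n * sum(w * math.sin(2 * math.pi * k * (i + 0.5) / n) for i, w in enumerate(wave))
    b = 2 / n * sum(w * math.cos(2 * math.pi * k * (i + 0.5) / n) for i, w in enumerate(wave))
    # wave ~ a*sin + b*cos = A*sin(theta + phase), so phase = atan2(b, a)
    return math.hypot(a, b), math.degrees(math.atan2(b, a))

wave = held_cycle([0.0, 1.0, 0.0, -1.0])  # F/4 sine sampled at nodes and maxima
for k in (1, 3):
    amp, phase = harmonic(wave, k)
    print(f"harmonic {k}: amplitude {amp:.4f}, phase {phase:.1f} deg")
```

In this idealized case the fundamental lags by 45 degrees at about 90%
of full amplitude, and the third harmonic, which falls at 3F/4 and so
lies above F/2, is about one third the size of the fundamental before
any low-pass filtering.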

Sampling of the same F/4 waveform at 45 degrees, 135 degrees,
225 degrees, and 315 degrees produces a square wave at 70.7% of full
amplitude, also phase shifted by 45 degrees lagging. As before, signals at
nearby frequencies, both above and below, produce artificial amplitude
modulation (in this case with a peak-to-peak modulation index of 29.3%)
and unwanted sidebands. The analog reconstruction approaches
distortionless perfection asymptotically as the signal frequency becomes
a smaller fraction of the sampling frequency.
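
The 29.3% figure follows from how the largest held sample in each cycle
varies as the sampling phase slips. The sketch below assumes a
unit-amplitude sine at exactly F/4 and sweeps the sampling phase in
one-degree steps; the function name is hypothetical.

```python
import math

def peak_held_level(sampling_phase_deg: float) -> float:
    """Largest sample magnitude in one cycle of a unit sine sampled at 4x its frequency."""
    phi = math.radians(sampling_phase_deg)
    # Four samples per cycle, spaced 90 degrees apart.
    return max(abs(math.sin(phi + k * math.pi / 2)) for k in range(4))

levels = [peak_held_level(d) for d in range(0, 91)]
lo, hi = min(levels), max(levels)
print(f"peak held level ranges from {lo:.3f} to {hi:.3f}")
print(f"variation: {100 * (hi - lo):.1f}% of full amplitude")
```

The peak level swings between full amplitude (sampling at the maxima) and
70.7% (sampling at 45 degrees), a spread of 29.3% of full amplitude.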

The question of phase distortion was raised again in your
critique, as follows:

You said:
> Wrong. "The digitizing process" has no high-frequency phase
> distortion associated with it. Some bad A-to-D converters may
> have some high-frequency phase distortion. Certainly, phase
> distortion problems are much more prevalent in analog recording
> equipment.

The problem is not with the digitizing process; the problem is with
the decoding or reconstruction process. The phase distortion of the
high-frequency components has nothing to do with imperfections of the
ADCs or DACs. It is, rather, a byproduct of the relationship of the
sampling and signal phases at higher frequencies and the staircase
approximation of the reconstructed signal to the original signal. Also,
the problems associated with the composite output signal are not
exclusively those of phase displacement, because undesirable
attenuations, modulations, and harmonic distortions occur, as well.

Furthermore, if predigitized signals are band limited to prevent
frequency foldback of aliased signals above F/2 after sampling, something
that is usually done in sampled-data systems, reconstructed signals will
not only have the previously mentioned distortions attributable to the
playback process but will also carry an additional phase lag from the
pre-digitizing band limiter in the recording setup. If, for example, a
double-pole filter with a corner frequency of F/2 (about 22 kHz) were used
ahead of the ADC, the phase shift at F/4 (about 11 kHz) would be -53
degrees, and that would be in addition to the -45 degrees resulting from
reconstruction-induced phase distortion.
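
The filter phase figure depends on which kind of double-pole filter is
meant, which the text does not specify. The sketch below is a rough
check of two common readings: two identical first-order sections in
cascade, and a second-order Butterworth section. Both interpretations
and the function names are assumptions introduced here for illustration.

```python
import math

def cascaded_poles_phase_deg(f: float, corner: float, poles: int = 2) -> float:
    """Phase of `poles` identical first-order low-pass sections in cascade."""
    # Each first-order section contributes -atan(f / corner).
    return -poles * math.degrees(math.atan(f / corner))

def butterworth2_phase_deg(f: float, corner: float) -> float:
    """Phase of a second-order Butterworth low-pass, H(s) = 1/(s^2 + sqrt(2)s + 1)."""
    r = f / corner
    return -math.degrees(math.atan2(math.sqrt(2) * r, 1 - r * r))

FS = 44_100.0  # assumed sampling frequency in Hz
f, corner = FS / 4, FS / 2
print(f"two cascaded poles:   {cascaded_poles_phase_deg(f, corner):.1f} deg")
print(f"2nd-order Butterworth: {butterworth2_phase_deg(f, corner):.1f} deg")
```

The cascaded first-order reading gives about -53 degrees at F/4, in line
with the figure above; a Butterworth alignment would give a somewhat
smaller lag of about -43 degrees.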

While the industry still has a way to go to reach a Utopian ideal in
digital audio signal processing, I have no doubt that digital processing
quality, which is already superior in noise performance, can eventually
equal that of analog systems. As illogical and surprising as it may seem
to some who are first becoming aware of it, knowledge of defects in
digital audio reproduction due to undersampling and underresolution has
existed for a long time in the industry. I am pleased to learn that
manufacturers are moving in a direction to eliminate these deficiencies,
something they would not be doing unless they believed that there were both
room for improvement and competitive pressure to do so.

Regards,

Connie

Conrad Josias
Consulting Engineer
La Canada, California

Reference 1: Martin L. Lenhardt, Ruth Skellett, Peter Wang, Alex M.
Clarke, "Human Ultrasonic Speech Perception." Science, Vol. 253, 5 July
1991, pp. 82-85.

Reference 2: Tsutomu Oohashi, Emi Nishina, Norie Kawai, Yoshitaka
Fuwamoto, Hiroshi Imai, "High-Frequency Sound Above the Audible
Range Affects Brain Electric Activity and Sound Perception." Audio
Engineering Society preprint No. 3207 (91st Convention, New York
City).