fit a smooth curve through one dot. A calculated average of a single number is simply that number itself, and thus can't give you any improvement over that number.
Averaging provides progressively more benefits at progressively lower frequencies, since at lower frequencies there are more sample dots per cycle to average. The more dots there are in a statistical scatter, the more accurately a waveform curve can be re-created that fits the original musical waveform curve that spawned the scatter. Averaging can improve re-created waveform accuracy at 10 kc more than at 20 kc, and it can do even better at 5 kc. For every doubling in the number of sample dots (i.e. every halving of frequency), averaging can yet further improve bit resolution, transparent revelation of subtle inner detail, and natural musicality (the degree of improvement per octave depends on the curve engineered for the averaging algorithm, a subject beyond our scope here). This is good news for sonic improvement at music's middle and low frequencies, but bad news for music's highest frequencies at the top of the passband, still within the audible spectrum. That's why, for example, some 1 bit digital systems that have such absurdly crude native resolution, and rely so heavily on averaging to improve resolution even to passable sonic quality (e.g. Bitstream and DSD-SACD), nevertheless still have very poor resolution in music's upper trebles, and this is obviously audible as very defocused, incoherent, smeared, veiled, and noisy sound quality in their upper trebles. If we could get just a few more samples to average at music's highest frequencies (still within the audible spectrum), we could improve the resolution and sonic quality up here too, and this could provide benefits to all digital systems within the audible spectrum, including both 1 bit systems and 16 bit PCM.
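The arithmetic behind this octave-by-octave claim is simple enough to sketch. Here is a small hypothetical illustration (the function name and code are mine, not the article's) of how many sample dots fall within one cycle of a tone at CD's 44.1 kc sampling rate, showing the dot count doubling with every halving of frequency:

```python
# Hypothetical sketch (not from the article): count the sample dots
# that land within one cycle of a tone, at a given sampling rate.
def dots_per_cycle(sample_rate_hz, tone_hz):
    """Number of sample points per cycle of a tone."""
    return sample_rate_hz / tone_hz

fs = 44_100  # CD sampling rate, in cycles per second
for tone in (20_000, 10_000, 5_000, 2_500):
    print(f"{tone / 1000:g} kc tone: {dots_per_cycle(fs, tone):.2f} dots per cycle")
```

At 20 kc there are barely more than two dots per cycle for averaging to work with; at 2.5 kc there are nearly eighteen, which is why averaging has so much more statistical scatter to exploit at middle and low frequencies.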
Solving the Four Problems
All four digital problems, just discussed above, adversely affect the accuracy with which the music waveform is re-created, and well within the audible range, below 20 kc. If we could ameliorate these digital problems, we could re-create the original music waveform more accurately. This would give us improved bit resolution, better transparency revealing more inner detail (with better stereo imaging), cleaner purity with less distortion, and more natural musicality since the more accurate waveform would naturally sound more like real music and less like artificial digital. And all these sonic improvements would be audible throughout much of the musical spectrum.
But how could we possibly ameliorate these various problems within the audible spectrum? Might there perchance be one miracle silver bullet that could somehow address all these digital shortcomings, even though they are diverse in nature? Yes.
To improve these practical digital problems within the audible range, we should sample the music at a higher sampling rate!
But how on earth could a higher sampling rate improve the sonic quality of digital within the audible spectrum? How is one even relevant to the other? Doesn't a higher sampling rate merely extend the passband, and for that matter extend it merely into the ultrasonic region where we can't hear any benefits from such extension anyway?
Let's take a look at what a higher sampling rate could do for each of the four digital problem areas just discussed above. For simplicity, let's assume we double the sampling rate, from 44.1 or 48 kc to 88.2 or 96 kc.
The first problem just above was that sampling at just 44.1 kc or 48 kc forced the analog anti-aliasing filter to be very complex and steep. So here we have a digital system forcing a degradation of the analog music signal before it is even digitized. If however we double the sampling rate to say 96 kc, then we can employ a much gentler analog anti-aliasing filter. This filter could leisurely start above the 20 kc musical passband, and then not have to be significantly down until the 48 kc Nyquist frequency, a span of over an octave (in contrast, with 44.1 kc sampling the filter has to achieve even greater rejection in the small span from 20 kc to 22.05 kc). Additionally, with rejection not needed until 48 kc instead of 22.05 kc, this filter could be even gentler than you might think, since other links in the recording chain provide additional filtering (e.g. microphones, which die above 25 kc). This much gentler anti-aliasing filter would degrade the analog signal far less, since mathematically it would introduce nowhere near as much phase distortion, and also since the far smaller parts count would mean that the analog music signal would have to traverse far fewer fidelity degrading parts in the signal path. It's worth noting that the benefits here are disproportionately large; doubling the sampling frequency cuts signal degradation to much less than half, because the filter can be so much gentler. That's one reason why many of today's digital master recording systems employ higher sampling rates in their initial digitizing, even if the signal is ultimately destined to be transported to you via a medium with only 44.1 kc sampling such as CD.
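To put numbers on how much breathing room the filter gains, here is a brief back-of-envelope calculation (the code and function name are mine; the frequencies are the article's), measuring the transition band from the 20 kc passband edge up to the Nyquist frequency, in octaves:

```python
import math

# Hedged sketch of the article's arithmetic: width of the anti-aliasing
# filter's transition band, from the passband edge to Nyquist, in octaves.
def transition_octaves(passband_hz, sample_rate_hz):
    nyquist_hz = sample_rate_hz / 2
    return math.log2(nyquist_hz / passband_hz)

print(f"44.1 kc sampling: {transition_octaves(20_000, 44_100):.2f} octaves (20 to 22.05 kc)")
print(f"96 kc sampling:   {transition_octaves(20_000, 96_000):.2f} octaves (20 to 48 kc)")
```

With 44.1 kc sampling the filter must do all its work in about a seventh of an octave; with 96 kc sampling it has more than a full octave, which is why its slope, parts count, and phase distortion can shrink so dramatically.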
The second problem above was that we cannot physically make playback digital reconstruction filters with the ideally required boxcar shape, and therefore even our best playback filters are imperfect at filling in the gaps left by the sketchy clues. As you recall, the sample dots coming off a CD at a 44.1 kc sampling rate are sufficient to outline the correct music waveform only up to about 2 kc. Above 2 kc they become inadequate, and at progressively higher frequencies they become progressively more inadequate, since they become fewer and farther between relative to a cycle at progressively higher frequencies. Thus, above 2 kc we can no longer simply connect the sample dots to correctly outline the original music waveform, and at progressively higher frequencies above 2 kc any connect-the-dots model becomes progressively worse at accurately giving us the original music waveform, making progressively worse mistakes. Therefore, above 2 kc we must begin relying on an alternative powerful algorithm to correctly fill in the gaps among the too sketchy data points coming off the CD, and at progressively higher frequencies above 2 kc we must rely on this alternative powerful algorithm more and more. This alternative powerful algorithm is of course the boxcar reconstruction filter, whose job it is to correctly re-create the original music waveform, especially above 2 kc where the sample points coming off the CD become progressively more inadequate. So we begin relying upon the digital filter somewhat in the 2-4 kc octave, and we rely on it more in the 4-8 kc octave, and yet more in the 8-16 kc octave. The only problem is that this digital filter, upon whom we begin relying above 2 kc and upon whom we progressively rely more and more at higher frequencies, is itself imperfect, and we cannot physically build the ideal boxcar digital filter that the Nyquist theory requires.
How nice it would be if we didn't have to rely so much on the digital filter to reconstruct so much of our musical spectrum, all the way from 2 kc to 20 kc. If we could rely on it less, then its inevitable errors would not degrade our music waveform as much. And that's precisely what happens when we raise the sampling rate! If we double the sampling rate on the digital medium to 96 kc, and deliver this higher sampling rate to the consumer's playback unit, then the sample dots coming off this digital medium will be twice as frequent, so they'll be adequate to outline the correct original music waveform up to 4 kc instead of merely up to 2 kc. Thus, with 96 kc sampling we won't have to rely on the imperfect digital filter at all in the 2 kc to 4 kc octave. This octave is crucial to music and is also an octave in which our hearing is very sensitive and discriminating, so we can predict that better waveform fidelity in this octave will have all kinds of audible sonic benefits (including those benefits which people in fact report hearing, such as better inner detail and more natural musicality).
Then, in the next octave up from 4 kc to 8 kc, with 96 kc sampling we'll only have to rely on the imperfect digital filter half as much (approximately) as before, to fill in the gaps among sample points, since the sample points occur twice as frequently. So the musical waveform fidelity in the 4 kc to 8 kc octave will be (approximately) twice as good from 96 kc sampling as it was with 44.1 or 48 kc sampling, since we will be relying upon the imperfect digital filter only half as much to fill the gaps and re-create the music waveform. Of course, the 4 to 8 kc octave is also musically crucial and still in the region of peak human hearing acuity. And then, the octave from 8 kc to 16 kc will likewise be approximately twice as accurate and sound approximately twice as good with 96 kc sampling instead of 44.1 or 48 kc sampling. Note emphatically that all these sonic benefits, from superfluously doubling the sampling rate, occur plumb in the middle of the musical spectrum where they're dramatically audible, not just in the ultrasonic region above 20 kc. And note emphatically that these sonic benefits relate unexpectedly to waveform fidelity, not just to the spectral extension one expects from superfluously increasing the sampling rate beyond the Nyquist criterion. The chief sonic benefits people hear from 96 kc sampling media include improved transparency, more inner detail (with better stereo imaging), cleaner purity, and more natural musicality - all of which relate to improved waveform fidelity, not to bandwidth extension. Also, note incidentally that relying less on the digital filter means that we will be liberated from all kinds of errors it might make, including both the amplitude type of errors and also the temporal ringing interval type of errors discussed above.
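The scaling here can be made concrete. If (per the article) the sample dots coming off a 44.1 kc medium adequately outline the waveform up to about 2 kc, and dots per cycle scale linearly with the sampling rate, then the connect-the-dots limit scales the same way. This is a hypothetical extrapolation of mine, not the article's calculation:

```python
# Hypothetical back-of-envelope: the 2 kc "adequate outline" figure is
# the article's; the proportional scaling to other rates is my assumption.
def outline_limit_hz(sample_rate_hz, reference_rate_hz=44_100, reference_limit_hz=2_000):
    # Dots per cycle grow linearly with sampling rate, so the frequency
    # up to which the dots alone outline the waveform grows the same way.
    return reference_limit_hz * sample_rate_hz / reference_rate_hz

print(f"96 kc sampling outlines up to about {outline_limit_hz(96_000):.0f} cycles")
```

Strict proportionality gives roughly 4.35 kc; the article rounds this to the 4 kc octave boundary, which is the conservative figure.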
The third problem just above related to a specific shortcoming of practical digital filters where a guard band is required during playback. When the highest frequency of interest is close to the Nyquist frequency (22.05 kc, with a sampling rate of 44.1 kc from the CD), then there is danger during playback of ultrasonic images creeping down into the passband and causing distortion. The Nyquist criterion is being met, but just barely. This means that the playback reconstruction filter must also be a vigorous anti-imaging filter policeman, cutting out all ultrasonic images above 22.05 kc. But, since practical anti-imaging filters cannot have infinitely steep cutoff slopes, the corner frequency of this filter must be set a ways down from 22.05 kc, usually at 20 kc, which is the top edge of the passband of interest.
Unfortunately, since the anti-imaging filter is also used as the reconstruction filter, this means that the corner of the reconstruction filter's response is also set at 20 kc. And 20 kc is the wrong frequency at which to place the corner for accurate reconstruction. For accurate reconstruction of the original music waveform, we want the corner of the reconstruction filter to be at the Nyquist frequency, which is 22.05 kc for a 44.1 kc sampling medium. As discussed above, when we set the corner at the wrong frequency, then the ringing pattern of the boxcar filter has the wrong temporal spacing between ringing wavelets, and thus this filter would supply ringing energy at the wrong time instants, thereby re-creating a temporally inaccurate music waveform. We'd like to be able to set the reconstruction filter's corner at the Nyquist frequency (half the sampling rate), but we can't so long as we also need this filter to act as a vigorous anti-imaging policeman. Is there any way out of this Catch 22?
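The temporal error described here can be quantified with a short sketch (my arithmetic, not the article's): an ideal boxcar low-pass filter rings as a sinc function whose wavelets are spaced 1/(2 x corner frequency) apart, and only a corner at the Nyquist frequency makes that spacing line up with the medium's sample period.

```python
# Hedged sketch: spacing between the ringing wavelets (sinc zero
# crossings) of an ideal boxcar filter, in microseconds.
def ringing_interval_us(corner_hz):
    return 1e6 / (2 * corner_hz)

sample_period_us = 1e6 / 44_100  # about 22.68 us between CD samples
print(f"sample period:      {sample_period_us:.2f} us")
print(f"corner at 22.05 kc: {ringing_interval_us(22_050):.2f} us  (matches the sample period)")
print(f"corner at 20 kc:    {ringing_interval_us(20_000):.2f} us  (too wide)")
```

With the corner misplaced at 20 kc, every ringing wavelet arrives about 2.3 microseconds late relative to where the sample grid needs it, and these misplacements accumulate across the filter's many ringing cycles.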
One way might be to separate the roles of anti-imaging filter and reconstruction filter in the playback unit, probably an expensive proposition. Is there another way? Yes. Raise the sampling rate! What could this accomplish? If we (approximately) double the sample rate of the medium, say up to 96 kc, then the playback filter no longer needs to act as a strict anti-imaging policeman.
Why not? In digital systems, the unwanted first image is essentially a mirror image of the wanted audio spectrum, and so comes down from the sampling rate frequency toward the Nyquist frequency, even as the primary audio signal goes up from 20 hz to 20 kc and then toward the Nyquist frequency. For example, with the CD medium and its 44.1 kc sampling rate, the Nyquist frequency is 22.05 kc, and as the primary music signal generates ever higher frequencies going up toward 20 kc, its mirror image comes down from 44.1 kc, with ever lower frequencies coming down toward 24.1 kc (which is 44.1 kc minus 20 kc). The danger here lies in the fact that some musical sounds (cymbal sounds, vocal sibilants, etc.) naturally contain lots of very high frequency spectral energy, up close to the 22.05 kc Nyquist frequency (and up close to the presumably 20 kc anti-aliasing filter cutoff in the digital master recorder). If the original music signal has lots of energy up at 20 kc, then its digital mirror image in playback from a 44.1 kc medium will have lots of energy down at 24.1 kc. We can't leave this image energy in the system, especially so close in frequency to the primary music signal, because the two signals will beat against each other, causing bad distortion at the difference beat frequency (4.1 kc), which would be highly audible and presumably obnoxious. So we must resolutely eliminate this image energy at 24.1 kc. But no practical filter can be steep enough to have its corner at 22.05 kc (the correct frequency for its reconstruction role) and yet have adequate image rejection by 24.1 kc. Thus, most playback filters are designed for needed adequate image rejection, but suboptimal reconstruction accuracy, by placing their corners at 20 kc instead of the correct 22.05 kc Nyquist frequency.
Now, how does all this change when we (approximately) double the sampling rate to 96 kc on the medium? With a sampling rate of 96 kc, the first digital mirror image now comes downward from 96 kc instead of downward from 44.1 kc, and the rising music signal and its falling image meet each other at 48 kc. Since they meet at 48 kc, this means that the music signal would have to reach upward for a 48 kc frequency span before it met its image coming down the corresponding 48 kc frequency span from 96 kc. Thus, with a sampling rate of 96 kc, the recorded primary music signal would have to have lots of energy up around 48 kc before it got near its mirror image. Of course, the rolloffs of the microphones and the anti-aliasing filters at the beginning of the digital recording chain guarantee that the music signal on the medium won't have any meaningful energy up near 48 kc. So beat distortion from unwanted digital mirror images ceases to be a problem if we simply switch from a 44.1 kc sampling rate to a 96 kc sampling rate. And then we don't need to worry much about employing a strict anti-imaging filter policeman. We no longer need to force the digital reconstruction filter to also act as a strict anti-imaging filter. This therefore frees us to finally be able to optimize the corner frequency of the reconstruction filter, by setting it correctly at the Nyquist frequency (now 48 kc).
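The mirror-image arithmetic for both sampling rates can be sketched in a few lines (function names are mine; the frequencies are the article's): the first image of a tone at frequency f lands at the sampling rate minus f, and the beat sits at the difference between image and tone.

```python
# Hedged illustration of the article's numbers: where the first digital
# mirror image of a tone lands, and where the resulting beat would sit.
def first_image_hz(sample_rate_hz, tone_hz):
    return sample_rate_hz - tone_hz

def beat_hz(sample_rate_hz, tone_hz):
    return first_image_hz(sample_rate_hz, tone_hz) - tone_hz

# 20 kc energy on a 44.1 kc medium: image at 24.1 kc, beat at 4.1 kc,
# squarely in the audible band.
print(first_image_hz(44_100, 20_000), beat_hz(44_100, 20_000))

# The same 20 kc energy on a 96 kc medium: image at 76 kc, beat at
# 56 kc, both far above audibility.
print(first_image_hz(96_000, 20_000), beat_hz(96_000, 20_000))
```

This is why the image is dangerous at 44.1 kc but harmless at 96 kc: the image doesn't merely move, it and its beat product both leap entirely out of the audible spectrum.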
Incidentally, with its corner frequency set at the Nyquist frequency of 48 kc, the reconstruction filter is still acting as a potent anti-imaging filter over most of the spectrum. It is only leaving relatively unguarded the small segment just above its Nyquist frequency corner, where its non-infinitely-steep cutoff slope has yet to achieve much rejection (from 48 kc up to say 52 kc). This unguarded narrow 4 kc span could get us into trouble if a music signal were to come along with lots of energy in the narrow span from 44 kc to 48 kc, but that's not likely. In contrast, with a 44.1 kc sampling rate we could not afford to set the corner at the correct 22.05 kc Nyquist frequency and thereby leave the 2 kc span from 22.05 to 24.1 kc unguarded, since the primary music signal could easily still have lots of energy in this region.
In sum, doubling the sampling rate on the medium to 96 kc has an unexpected and fascinating byproduct. It allows us to correct the temporal errors that the digital reconstruction filter was committing. Since we are now free to set the filter's corner at the correct frequency, its ringing pattern will now have the correct temporal spacing to supply ringing energy at the correct instants instead of at the wrong instants, when re-creating the original music waveform. Thus, one whole type of reconstruction filter error can be entirely eliminated (not merely ameliorated), by the simple expedient of raising the sample rate. It was an insidious type of temporal error, which was re-creating temporally warped music waveforms, so it is all the more remarkable that such an error can be eliminated by the simple and seemingly unrelated tactic of raising the sampling rate.
The above three problem areas can also be ameliorated by high power averaging. But if we can first solve or reduce these problems in other ways, such as by raising the sampling rate, then high power averaging has a more accurate musical waveform to begin with, and upon which to wreak its magic. And so the results are even more sonically wonderful when we combine both high power averaging and also raising the sampling rate above the Nyquist minimum.
The fourth problem above relates to limitations in averaging itself as a tool, even high power averaging. With each progressively higher octave, averaging has only half as many sample points to average, so it can't do as much of its magic. When we reach the Nyquist frequency, averaging pretty much runs out of steam, since there is now only one sample point to average with itself. If the Nyquist frequency is near the top edge of the passband (as 22.05 kc is near to 20 kc), then the top edge of the musical passband won't be improved much if at all by averaging. It would be nice if we could give averaging more muscle in the upper reaches of our musical passband, so it could wreak its magical improvements in musical accuracy there as well, just as it does so well at lower musical frequencies.
Is there a way to do this? Yes. Increase the sampling rate! If the averaging algorithm has twice as many sample points to average for improving a given audio frequency, then it can do at least twice as good a job, at that frequency, of reducing various digital errors and improving the accuracy of the music waveform. High power averaging algorithms can do even better than twice as well, depending on the curves engineered into the algorithm. If we double the sampling rate, we double the number of sample points per cycle at every audio frequency.
The most spectacular degree of empowerment for the averaging algorithm comes at the top edge of the passband. If our sampling rate just barely meets the Nyquist criterion, e.g. 44.1 kc sampling rate for a 20 kc passband, then the musical information at the top edge (20 kc) of the passband will have essentially only one sample per half cycle, so the averaging algorithm (as we've discussed it thus far) won't be able to improve the fidelity much if at all at these highest audio frequencies. But if we double the sampling rate so that it superfluously surpasses the Nyquist criterion, then the averaging algorithm will have plural samples to average and wreak its magic upon, at even the highest audio frequencies around 20 kc. Thus, for music's highest audible frequencies, superfluously doubling the sampling rate empowers the averaging algorithm to go from virtually zero improvement to a significant improvement, which of course is far more than twice as good (it's actually infinitely better, since some improvement divided by zero improvement is infinity).
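The collapse of averaging at the top of the band is easy to tally (this sketch and its function name are mine, not the article's): count the samples available within one half cycle of a 20 kc tone at each sampling rate.

```python
# Hypothetical tally: samples available to an averaging algorithm
# within one half cycle of a tone, at a given sampling rate.
def samples_per_half_cycle(sample_rate_hz, tone_hz):
    return sample_rate_hz / (2 * tone_hz)

print(f"44.1 kc: {samples_per_half_cycle(44_100, 20_000):.2f} samples per half cycle of 20 kc")
print(f"96 kc:   {samples_per_half_cycle(96_000, 20_000):.2f} samples per half cycle of 20 kc")
```

At 44.1 kc there is essentially just one sample per half cycle at 20 kc, so averaging has nothing to average it against; at 96 kc there are better than two, and averaging finally has plural samples to work with at music's highest audible frequencies.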
At musical frequencies below the top edge of the passband, even small improvements are far more audible, since our hearing is more sensitive. So, if doubling the sampling rate empowers the averaging algorithm to improve musical fidelity by at least a factor of two over most of the audio spectrum, compared to its improvement powers with half the sampling rate, then we have achieved a very important improvement in musical fidelity over the entire musical spectrum, simply by
(Continued on page 30)