it because many people make the following mistake: they take a technique, which works for this rare, weird, identical clone entity (so different from everything else on earth that is normally individuated), and then they apply it to these other normally individuated things anyway. Of course, that mistake is utterly invalid. You can't take a technique which works only for this weird, identical clone entity, and then even hope to apply it legitimately to normally individuated populations of other entities. Since people keep trying to do so, however, we need to keep correcting their mistake. Oh, by the way, this rare, weird identical clone entity is called a sine wave.
By looking at and then averaging 100,000 cycles of this sine wave with noise, we have developed a description of the prototypical, average sine wave without the noise. Even though we couldn't really see the sine wave well in any single sample (since it was always hidden by added noise), we have been able to infer with great accuracy what the hidden sine wave signal itself must look like, via the averaging technique. In other words, the averaging technique doesn't actually strip away the noise on any one of the 100,000 input sine wave cycles we gathered in, and doesn't allow us to see any of the input signal more accurately, with less noise. Instead, the averaging technique only infers and generates a new hypothetical model of what an average, prototypical input signal would look like. The averaging technique essentially looks for the lowest common denominator among all the many, many noisy input samples, and then hypothetically infers what the average, prototypical input signal without the noise must look like.
The averaging technique takes in 100,000 cycles and gives us only one cycle out. And the cycle it gives us as output is not a real sine wave cycle, but rather a hypothetical model of an average, typical sine wave cycle. In other words, 100,000 distinct real individuals go into the hopper, but they are discarded in a tremendous loss of information, and out comes only one entity that is not even a distinctive real individual but instead is merely a hypothetical pastiche, a lowest common denominator generalization.
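To make the "many cycles in, one cycle out" picture concrete, here is a minimal sketch of averaging noisy sine wave cycles. It is only an illustration under stated assumptions: the function name `average_noisy_cycles`, the Gaussian noise, the 16 sample points per cycle, and the reduced count of 2,000 cycles (chosen so it runs quickly) are all hypothetical and are not taken from the HFN article.

```python
import math
import random

def average_noisy_cycles(num_cycles, samples_per_cycle, noise_sigma, seed=0):
    """Average many noisy cycles of a unit sine wave into one model cycle.

    Assumption: the noise is Gaussian and independent from sample to sample.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # The hidden, noise-free cycle that every input cycle is a clone of.
    clean = [math.sin(2 * math.pi * k / samples_per_cycle)
             for k in range(samples_per_cycle)]
    sums = [0.0] * samples_per_cycle
    for _ in range(num_cycles):
        for k in range(samples_per_cycle):
            # Each input cycle is the same sine wave plus fresh noise.
            sums[k] += clean[k] + rng.gauss(0.0, noise_sigma)
    # Many cycles go in; a single averaged model cycle comes out.
    model = [s / num_cycles for s in sums]
    return clean, model

clean, model = average_noisy_cycles(num_cycles=2000,
                                    samples_per_cycle=16,
                                    noise_sigma=0.5)
worst_error = max(abs(m - c) for m, c in zip(model, clean))
print(f"cycles in: 2000, cycles out: 1, worst error: {worst_error:.4f}")

# Rebuilding a long output span from the one model cycle yields pure clones.
output_span = model * 3
```

Note that the function returns exactly one cycle no matter how many cycles were fed in, and that any longer output built from it, like `output_span`, is just that one model cycle repeated.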
Then comes the time to generate the output signal. We have only one model cycle upon which to base the entire span of 100,000 cycles of the output signal. So all of the members of the population of our output signal, for this entire span, will be mere uniform clones, the same as one another. None of them will be individuals. Furthermore, the model on which all these clones are based is itself not even a real individual input signal, but instead is merely a lowest common denominator average of 100,000 input individuals, whose individuality has long since been discarded. Note that this is actually a double insult, a double loss twice removed from true individuality. The output signal samples are not individuals (as they should be for real music), but rather are merely all clones, and furthermore what they are a clone of is itself merely a generalization rather than a true individual. It's as if the government first ascertained that the favorite food on average was hamburger, and then forced everyone to uniformly eat nothing but hamburger all the time.
In the Faustian bargain to achieve our goal of quieting noise, we have had to give up all the information we knew about 100,000 individual samples as individuals, and treat them generically as if they were all the same and also no better or different than the lowest common denominator average that summarily characterizes them generically (from Faust to Orwell in one tragic misstep). We have had to discard all this wealth of information about 100,000 individual samples and collapse it all into a generalization about one hypothetical model sample. Then we have to re-expand the population back to 100,000 members. But since we now have only this single generic model as information, all the re-expanded 100,000 members will be identical clones of the single generic model. Furthermore, since the single generic model has no individualistic characteristics left, but instead represents only the lowest common denominator average of 100,000 individuals, it itself will be characterless and generically bland, and therefore all its 100,000 clones will be characterless and generically bland.
It's as if we took 100,000 different complex beautiful flowers, collapsed them all into a common vat of dust by drying and grinding them into powder and mixing up the powder - and then we tried to re-expand this dust into 100,000 members of a population by adding water to reconstitute this dust and form 100,000 identical blobs of flower dough. All 100,000 blobs of dough would be the same as each other, and none would have the striking individualistic characteristics that made each original individual flower beautiful.
These are horrible crimes against an input signal, and horrible crimes to perpetrate on a created output signal destined for human listeners, and horrible crimes for any system that pretends to be a recording/reproducing system. There's a huge loss of individualistic information from the input signal samples. There's the creation of output signal samples that are identical clones. And there's the modeling of these output clones upon a bland, characterless lowest common denominator generalized average, rather than upon a true exemplar or paradigm individual.
But none of these horrible crimes matter if the input signal is a single sine wave. If the input signal is a single sine wave, then all the input samples (appropriately timed) are identical clones already. They have no differentiating individualistic features (other than perhaps the added noise, which we want to get rid of). Also, if the input signal is a single sine wave, then of course the output signal will also be a single sine wave, and it too should consist of only identical clones, so it's perfectly OK if we create 100,000 identical clones from a single model as an output signal. Furthermore, a sine wave is a very simple signal, indeed arguably the simplest possible AC signal, so it is already as bland and characterless as you can get. Therefore, the bland, characterless lowest common denominator generalized average we use as a model, for creating all 100,000 output clones, really hasn't lost any practically meaningful individuality relative to the 100,000 individual member samples of the input signal. In other words, the individual members of the input signal were already so boringly simple (and so uniform), that generalizing their characteristics into a lowest common denominator average really didn't lose any interesting information.
Of course, you can't say these same things about populations of most other entities. Take 100,000 individual humans, or flowers, or musical sounds in a symphony, and you'll see striking and complex individuating characteristics. With real music, every instant is a different individual, with complex, multifaceted differences individuating it from its neighboring instants. A single sine wave is a very special, singular case - not at all typical or representative of the rest of the world.
Therefore, it is tragically misleading to use a single sine wave as a representative example in the context of music recording and reproduction. The worst danger is that someone will see this HFN article that gives such impressive results for a sine wave, and think he can get the same improvements on music by using aggressive averaging, so he goes ahead with the aggressive averaging and winds up butchering the music by making too much of it sound like a repeated clone of its averaged lowest common denominator self.
This of course is exactly the tragic mistake the DSD/SACD designers made. They aimed for a 1 bit digital system with only 6 bits of intrinsic resolution, and then they were forced to aim for a dramatic amount of noise quieting and resolution enhancement via averaging, in order to make the system's performance saleable. Indeed, it's a mistake to aim for any specific degree of improvement in noise quieting or resolution enhancement, and then have to do aggressive enough averaging to achieve that goal. This effectively commits two sins upon real music: it characterizes a long input span of music by a single number, an average, a lowest common denominator; and then it tyrannically imposes this uniformity on us by creating a long output span of music samples that are all mere clones of this lowest common denominator average. And that of course is precisely what DSD/SACD indeed sounds like: a smoothed down series of cloned samples which are modeled on a lowest common denominator average, and which have lost much of the distinct individuality that each moment and transient had in the original music signal.
The averaging technique can be applied beneficially to a music signal, but only in limited ways, if we are to avoid destroying the very music we are hoping to improve. And naturally these limited applications of the averaging technique will only provide a limited degree of noise quieting and resolution enhancement. The averaging technique can't provide for music nearly the benefits it can for a single sine wave, because those hidden assumptions, which encouraged us to use averaging over a wide span of say 100,000 samples to get dramatic improvements in noise quieting and resolution enhancement, apply to a sine wave but don't apply to music. With a real music signal, we can't gather in a wide span or duration with many, many samples, on pain of discarding individualistic musical information and changing the face, heart, and soul of the music into a simplistic uniform cloned repetition of merely its lowest common denominator averaged self.
With a real music signal, preserving the integrity of the information already there has to be our primary goal, thereby making a secondary goal of any improvements we might realize in noise quieting and resolution enhancement. What's the good of seeking to enhance information beyond what's there, if we have to discard musical information that's already there? That's self-defeating.
We know that in a real music signal there are unique, singular transients which are never repeated. The sample taken at the instant a guitar is plucked, a triangle is struck, etc. is a unique, individualistic sound sample, different from its neighboring samples. And it cannot be averaged or commingled with its neighbors if we are to preserve its musical integrity. Nor can we ever accurately recreate that unique transient instant if we generate an output signal composed of a series of identical clone samples.
Incidentally, it's worth mentioning in passing that most averages are in practice computed as a sliding or running average. This doesn't significantly change the fundamentals of the cloning problem we're discussing here. It only means that the clones differ slightly instead of being absolutely identical. But the fundamental problems remain. If the input span for the running average is too wide, then there will be loss of individualistic musical information, and the output signal samples will be too alike, and the output samples will be too much like a lowest common denominator generic average.
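A small sketch can show what a running average does to a single, never-repeated transient. This is an illustration only: the helper `running_average`, the trailing-window form of the average, and the ten-sample test signal are assumptions made for the example, not a description of any particular product's filter.

```python
def running_average(samples, window):
    """Trailing running average: each output is the mean of the most
    recent `window` inputs (fewer at the very start of the signal)."""
    out = []
    total = 0.0
    for i, x in enumerate(samples):
        total += x
        if i >= window:
            # Drop the sample that has just slid out of the window.
            total -= samples[i - window]
        out.append(total / min(i + 1, window))
    return out

# Silence with one unique transient at sample 5.
signal = [0.0] * 10
signal[5] = 1.0

narrow = running_average(signal, 2)
wide = running_average(signal, 5)
print("narrow window peak:", max(narrow))
print("wide window peak:  ", max(wide))
print("wide window output:", wide)
```

With the two-sample window the transient survives at half its height. With the five-sample window it is flattened to one fifth of its height and smeared across five output samples that all carry the same value, which is the "too alike" behavior described above.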
The Limit for Legitimate Averaging
Obviously, then, we have to limit the averaging technique so that we limit the scope, span, and duration of how many samples we gather in to sum up and average, in our hoped for attempt to achieve that long run which allows self-cancellation of random noise (as we discussed at the beginning). We're caught between a rock and a hard place. The longer run of input samples we can gather into one average, the better noise quieting and resolution enhancement we can achieve. But if we overstep a limit in this quest and gather in too long a run with too many samples, then we'll start discarding important individualistic information about singular musical transients. We will be creating a distorted, averaged version of the input music signal, with less true information in it, not more.
So where is the limit? There's a simple fact which can be our guide. At any given frequency, each half cycle of information in a music signal is an individual, and can be different from neighboring half cycles. That of course is in diametric opposition to what we saw in the single sine wave, where we knew in advance that every half cycle was already an exact clone of all its neighbors. Clearly, we need to preserve this individuality of each musical half cycle if we are to preserve the integrity of the true musical information already encoded in the input signal.
How do we preserve the individuality of each half cycle of a music signal? We can simply turn to the Nyquist criterion, which tells us how. The Nyquist criterion dictates that we need at least one sample point, i.e. one number, per half cycle (at a given frequency), in order to be able to digitally preserve all the information in a waveform (assuming a suitable reconstruction filter is available).
This tells us clearly and simply that the span of samples gathered in for our average can only be as wide as a half cycle at any given frequency.
That's because, as we discussed above, a calculated average gives us only one number to characterize the whole span of input samples gathered to be averaged together. And we now know that we need at least one distinct (unique, individual) number per half cycle, in order to preserve the individuality of each half cycle and thereby preserve all the information in a waveform. Thus, if we are to preserve the integrity of the music signal through the averaging technique's process, we need to come up with a fresh number, a fresh average for every new half cycle (at any given frequency), which simply means that the span of samples gathered in for that fresh average cannot be any wider than that half cycle of music (at that given frequency).
If we tried extending the averaging span beyond a half cycle, hoping for more dramatic noise quieting and resolution enhancement, we'd wind up with one number (the average) attempting to characterize more than half a cycle. This means that we'd be forcing neighboring half cycles to be clones, which would destroy genuine individualistic musical information, and would smooth down the music toward being merely repetitive clones of an average lowest common denominator.
From this guideline, we can develop some easy practical rules to follow, again obeying the Nyquist criterion. The Nyquist criterion tells us how much individualistic information is necessary to define or reproduce the waveform of real music. If we don't violate this criterion, we won't destroy the music. The Nyquist criterion says that we need at least one individualistic sample per half cycle (at any given frequency), to preserve the individualistic information about that half cycle (as distinct and different from other neighboring half cycles, which in real music are not clones).
Suppose we're working with a digital system that oversamples. An oversampling system samples the input data more often, and generates more samples, than the minimum required by the Nyquist criterion. Thus, an oversampling system deals with more than one sample point per half cycle of the input signal, at a given frequency. For example, if the given frequency we're reproducing is 20 kHz, the Nyquist criterion requires a minimum sampling rate of 40 kHz, in order to provide the required one sample point per half cycle. For practical reasons, this 40 kHz must be made slightly higher, usually 44.1 kHz or 48 kHz.
If our digital system oversamples the music signal at any given frequency, then we will have more than one sample per half cycle at that frequency. We may average these oversamples together, to yield one unique number per half cycle, to define that individual half cycle. For example, if our digital system oversamples at 176 kHz or 192 kHz, then it will be sampling musical information at 20 kHz four times more often than the Nyquist criterion requires it to. This oversampling system will have four sample points for every half cycle of musical information at 20 kHz. But the Nyquist criterion tells us that we only need one sample per half cycle, to completely define that individual half cycle as unique and distinct from all its neighboring half cycles, and to preserve all its information. Therefore, in this digital system it would be legitimate to average together four adjacent samples, to produce just one number to individually characterize each half cycle at 20 kHz.
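The four-into-one averaging just described can be sketched in a few lines. This is a minimal illustration, assuming simple non-overlapping blocks of equal weight; the function name `block_average` and the sample values are made up for the example.

```python
def block_average(samples, block):
    """Collapse each run of `block` adjacent samples into their mean.

    Any leftover samples that do not fill a whole block are ignored.
    """
    usable = len(samples) - len(samples) % block
    return [sum(samples[i:i + block]) / block
            for i in range(0, usable, block)]

# Eight oversampled points: four on a positive half cycle,
# then four on the following negative half cycle.
oversampled = [1.0, 3.0, 2.0, 2.0, -1.0, -3.0, -2.0, -2.0]

per_half_cycle = block_average(oversampled, 4)
print(per_half_cycle)  # [2.0, -2.0]
```

Each half cycle keeps its own number, so neighboring half cycles remain distinct from one another, while the differences among the four points inside each block are averaged away.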
There would still be some loss of information, in the collapse of four individualistic samples into one number that represents only their average, lowest common denominator shared characteristics. But this information loss would only be relevant to ultrasonic information, beyond 20 kHz, that might be represented by the individualistic differences among the four samples. So far as the 20 kHz frequency is concerned, one sample per half cycle can contain all the information that needs to be preserved, so there need not be any relevant information loss in collapsing four sample point numbers into one number via averaging.
And of course there could be some benefits to averaging these four sample points into one. We could achieve some noise quieting and some resolution enhancement. These benefits would be very modest, because averaging together four samples won't give us nearly the noise quieting and resolution enhancement we saw when we averaged 100,000 samples. But that was a single sine wave whose half cycles could be endlessly cloned, while this is real music, each of whose half cycles needs to be preserved as a distinct individual.
If we are able to engineer a more complex averaging algorithm, we might be able to effectively design a shaped averaging curve, by which we do different amounts of averaging at different frequencies. For example, if our oversampling digital system handles samples at 176 kHz or 192 kHz, then there are 4 samples per half cycle of musical information at 20 kHz, as we just discussed. But there are also 8 samples per half cycle of musical information at 10 kHz, and 16 samples per half cycle of musical information at 5 kHz, and 32 samples per half cycle at 2500 Hz, and 64 samples at 1250 Hz, and 128 samples at 625 Hz, and 256 samples at 312 Hz. We could legitimately average together that many samples at each musical frequency, without sacrificing the integrity of the original musical information. And of course the more samples we can average together, the better improvements we can legitimately obtain in noise quieting and resolution enhancement.
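The half cycle arithmetic behind these figures can be checked with a short sketch. The round numbers above correspond to a nominal rate of exactly four times the 40 kHz Nyquist minimum, so the code assumes 160 kHz; that rate, the function name `samples_per_half_cycle`, and the constant `SAMPLE_RATE_HZ` are assumptions made for illustration. At the actual 176.4 kHz or 192 kHz rates the spans come out slightly wider (about 4.4 and 4.8 samples per half cycle at 20 kHz).

```python
def samples_per_half_cycle(sample_rate_hz, frequency_hz):
    """Number of sample points that fall within one half cycle."""
    return sample_rate_hz / (2.0 * frequency_hz)

# Assumption: nominal 4x oversampling of the 40 kHz Nyquist minimum.
SAMPLE_RATE_HZ = 160_000

table = []
frequency_hz = 20_000.0
for _ in range(7):
    span = int(samples_per_half_cycle(SAMPLE_RATE_HZ, frequency_hz))
    table.append((frequency_hz, span))
    frequency_hz /= 2  # step down one octave

for freq, span in table:
    print(f"{freq:8.1f} Hz: up to {span:3d} samples per half cycle")
```

Each octave down doubles the widest span that still leaves one distinct number per half cycle, which is why the lower frequencies allow so much more averaging than 20 kHz does.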
It's especially telling to focus our attention on the lower frequencies, where the greatest benefits can be legitimately obtained. Note that 312 Hz is still a higher frequency than middle C, which is the heart of any musical melody. By the time we get down to middle C, the heart of the music, we can average together more than 256 samples, without sacrificing any individualistic integrity of the original music signal. That could possibly be the equivalent of over 8 more bits of resolution enhancement, thus taking a 16 bit system to over 24 bits of effective resolution at middle C.
Noise quieting also can improve at these lower frequencies, and indeed can give us benefits in other areas beyond resolution enhancement. Any type of noise present in the signal can be reduced, whatever its cause. Various causes for noise of course include noise per se and quantization error (from limited bit resolution), but also can include various amplitude errors and distortions, and various timing distortions (which produce amplitude errors). Thus, for example, the noise quieting could also effectively reduce the effects of input signal jitter, some errors from nonlinear digital encoding, etc. In other words, averaging can ameliorate a wide variety of sins, including sins peculiar to digital systems (not germane to analog systems).
Thus, averaging can help a digital music signal to be more accurate to the original music signal, to sound more like real live music, and to sound more like great analog. The best upsamplers and