had before, which was merely the tan surface color of the sand. But you also missed some information. Because you're nearsighted or color blind, you missed seeing the fact that, among the layer of uniformly grey rocks which repeat the same color over and over, there are also many individual rocks of sparklingly different color, which give startling accents to an otherwise bland, smooth, uniformly repeating color field of grey.
Thus, even though you correctly inferred the overall average color as grey, when you apply this repeating pattern average globally and uniformly, to label or to re-create the color of the entire rock layer beneath, you are making a mistake. You are taking the repeating pattern average, the lowest common denominator, and then you are painting the whole layer of rocks with that single, uniform, oversimplifying broad brush. And by so doing you have eliminated the many sparkling individual rocks with dazzling accent colors. You have eliminated the singular, non-recurring transient events like the starting and stopping noises of a violin note, and you have re-created a whole violin note which is entirely painted with the single broad brush of the average sound of the violin note for most of the duration of the violin note, which is essentially an ongoing sine wave with a few ongoing overtones, i.e. a smoothed down, blander, gentler, prettified version of the original. You have applied the principle of Orwellian society to music, erasing the individuality of singular transients, and insisting that each individual instant be more like the average of all other instants of the same musical note.
Actually, human eyesight, which has pretty poor resolution (far worse than human hearing, and far worse than an eagle's eyesight), inherently makes the same kind of averaging mistake, painting big blobs of what it can discern (given its poor, nearsighted resolution) with a single broad brush consisting of the overall average trend of individual contributing details, which lose their individuality. For example, suppose you are looking at a distant mountainside covered with intermingled yellow and red flowers. Your eyesight's resolution is too poor to resolve the individual flowers, so you only discern an overall blob that is colored.
Worse yet, your eyesight discerns only the average of the strikingly different red and yellow individuals, and then applies and attributes their overall average color, say a murky monotone orange, to the whole blob. The reality of the field of flowers is that these flowers are strikingly different individuals, vividly contrasting punctuating accents, so that a high resolution (eagle eye) scan would sparkle with individual, singular, non-recurring transient information from the various and varying flowers (just as a high resolution rendition of a violin note sparkles with individual, singular, non-recurring timbral and textural transient sounds). But your poor resolution eyesight with its averaging function sees only a smoothed down, uniform, continuously ongoing single color in a featureless ongoing blob (just as a rendition of a violin note, which calculates the repeating pattern average sound trend for the whole note and then applies that average with a broad brush to the re-creation of the entire note, will create a smoothed down, uniform, ongoing simplistic sound).
Worst of all, your poor resolution eyesight with its averaging function not only fails to see, and not only smoothes down the detailed information that is there, but also actually sees wrong. Your eyesight's averaging function, imposing the Orwellian average upon all individuals, sees only orange. But in point of fact there is not even a single orange flower in this whole field. There are only red flowers and yellow flowers. So your poor resolution eyesight's averaging function makes the mistake of seeing something that isn't actually even there, and of changing the appearance of what is in fact there (just as DSD/SACD fails to reproduce violin or close-miked guitar timbral and textural sounds that are really there, and then changes the sound of what is in fact there by creating a fictional new, smoothed down averaged violin or guitar sound that the microphone never picked up).
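The flower field argument can be checked with a few lines of arithmetic. The sketch below is only an illustration: the RGB values for red and yellow and the 500/500 split of flowers are our assumptions, not measurements. It averages the colors of the field and then asks whether any flower actually has the averaged color.

```python
# Illustrative RGB values (assumed for this sketch, not from the article).
RED = (255, 0, 0)
YELLOW = (255, 255, 0)

def average_color(colors):
    """Return the per-channel mean of a list of RGB tuples."""
    n = len(colors)
    return tuple(sum(c[i] for c in colors) / n for i in range(3))

# A mountainside with equal numbers of red and yellow flowers (assumed split).
field = [RED, YELLOW] * 500
blob = average_color(field)

print(blob)           # (255.0, 127.5, 0.0), an orange
print(blob in field)  # False: no flower actually has this color
```

The average lands halfway between the two real colors, on a value that appears nowhere in the data being averaged.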
Incidentally, pointillist painters take purposeful advantage of these mistakes made by our eyesight's poor resolution and averaging function. When you stand at the properly far viewing distance, your visual resolution can't tell that the painting is actually composed of individual dots of paint, since your eyesight can no longer resolve the individual dots at that distance. So instead your eyesight's poor resolution sees only smooth larger shapes or blobs. Since your eyesight can't see the individual dots, your eyesight also can't see that they come in strikingly different and varied colors. Now, here comes the most interesting part. Your eyesight's averaging function creates a pretty uniform single color for each larger shape or blob composed of hundreds of individual dots actually having strikingly different strong colors. That single color created by your eyesight's averaging function is pretty much an average of the many varied colors of the dots actually in the field. That created single color, being an average, will almost always be a paler, smoothed down, gentler, more pastel color than the individual dots, which are often stronger colors. The individuality of the strikingly vivid, differently and strongly saturated color dots (with their individual sharp transients) is lost, and replaced by a watered down, more pallid, uniform, continuous single color for a whole field or area of the picture. Most interestingly, that created single color, representing the average manufactured by your vision, is actually a totally false and spurious creation, since no single dot in the field on the painting actually has this color. 
After you view a pointillist painting at the proper distance, it's quite a shock when you come close not only to see the individual transient dots, but also to see that their colors are so strikingly strong and saturated and varied, are so sharply contrasty from one another, and are so very different from the single, gentler color your vision manufactured as an average. Of course, pointillists intentionally exploit this mistake, made by the averaging function of human vision, as a tool of their art; the human act of perceiving and the mistakes thereof become part of the work of art itself, and the perceiver assumes an active role in helping to create this work of art, instead of merely being the usual passive perceiver of a work of art (pointillist art would lose its effect if an eagle were viewing it).
We can now return to our poker chip analogy from before. Recall that, on our 32 mile trip that represented the total maximum amplitude of the music waveform, we could discern the color of each individual poker chip in a 32 mile high stack that was laid down alongside the road, if we had 24 bit resolution like DVD-A. But with only 6 bit intrinsic resolution like DSD/SACD, we could not even discern objects alongside the road unless they were at least half a mile big. If each poker chip had a strikingly different color from its neighbor, just like the individual neighboring transients that sparkle with vividly different sonic life in the sound of real live musical instruments, then we could see and appreciate all these colorful variations if we had high resolution vision like 24 bit DVD-A. But with only crude nearsighted vision, equivalent to the 6 bit intrinsic resolution of DSD/SACD, we could at best only create an average overall single uniform color for these poker chips, a single color uniform over a half mile big blob of these poker chips (just as our poor resolution eyesight could only see a single average orange color for the huge blob on the mountainside that was actually occupied by individual yellow and red flowers). We would miss all the variety and subtle ever changing detail of the individually colored poker chips. The color we perceived would be a smoothed down, pale color that represented merely the average or lowest common denominator of all the actual vivid colors. And the color we perceived would be a complete fiction, since not even one single poker chip in the whole half mile stretch would actually have this average color (just as no flower on the mountainside was actually orange).
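The half mile figure follows directly from dividing the trip into 2 to the power of the bit count. The sketch below redoes that arithmetic using the bit depths quoted in the text; the function name is ours, and the only outside fact is the number of inches in a mile.

```python
TRIP_MILES = 32           # full scale amplitude in the analogy
INCHES_PER_MILE = 63_360

def step_size_miles(bits):
    """Size of one step when the trip is split into 2**bits equal levels."""
    return TRIP_MILES / 2 ** bits

coarse_miles = step_size_miles(6)                      # 6 bit figure from the text
fine_inches = step_size_miles(24) * INCHES_PER_MILE    # 24 bit figure from the text

print(coarse_miles)            # 0.5 (miles per step)
print(round(fine_inches, 3))   # 0.121 (inches per step)
```

So 6 bits gives steps half a mile long, while 24 bits gives steps of about an eighth of an inch, which is the scale of a single chip in the stack.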
Again, DSD/SACD can achieve better than this crude 6 bit resolution, by using its aggressive algorithms, but only for repeating patterns of music, not for individual transient events. So it couldn't help us to see the actual varied, non-repeating colors of individual poker chips any more clearly.
DSD/SACD's pattern recognition and averaging algorithm can only find one kind of musical information object hidden beneath the surface of the garbage and noise of its intrinsically low resolution nearsightedness. It can only discover repeating patterns as averages, i.e. lowest common denominator trends of objects that are the same as each other and keep repeating. Thus, it can only help to discern simple, continuously repeating musical waveform objects (like sine waves). The algorithm cannot help to discern singular, unique, non-repeating musical transient events. Of course, it's these singular, unique, non-repeating transients that represent the ever changing textural and timbral noises that real musical instruments make as they start, stop, and modulate musical notes. Without these unique transient events, only the simpler, repeating aspects of an ongoing music note are left. Each musical note is left as a more simplified, gentler, smoothed down (lowest common denominator, averaged) version of its original sound, sounding more like an ongoing sine wave. And of course that's exactly the dramatic transformation that DSD/SACD wreaks upon the sound of the original music input signal.
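What averaging does to a singular event can be shown with a toy signal. The sketch below is not DSD/SACD's actual algorithm, whose details are not given here; it is simply the plainest averaging scheme that fits the description, with an assumed period of 16 samples, 50 repeats, and one click added to the first cycle. It averages all cycles into one repeating pattern and rebuilds the note from that pattern.

```python
import math

PERIOD = 16    # samples per cycle (assumed)
CYCLES = 50    # number of repeats in the note (assumed)
CLICK_AT = 5   # sample index of the one-off transient (assumed)

# A repeating sine wave plus one singular click in the first cycle.
note = [math.sin(2 * math.pi * n / PERIOD) for n in range(PERIOD * CYCLES)]
note[CLICK_AT] += 1.0

# Average all cycles, sample by sample, to get the "repeating pattern".
pattern = [
    sum(note[c * PERIOD + k] for c in range(CYCLES)) / CYCLES
    for k in range(PERIOD)
]

# Re-create the whole note by repeating that average.
recreated = pattern * CYCLES

clean = math.sin(2 * math.pi * CLICK_AT / PERIOD)
print(round(note[CLICK_AT] - clean, 3))       # 1.0: the click in the original
print(round(recreated[CLICK_AT] - clean, 3))  # 0.02: the click diluted to 1/50
```

The click shrinks to one fiftieth of its size, and that small residue is then stamped onto every cycle of the rebuilt note, including the 49 cycles where nothing happened.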
It makes one wonder whether the design engineers actually ever critically listened to this new digital recording system as they were developing DSD/SACD. Perhaps they only listened to its output single ended, never comparing its sound to anything. Perhaps they thought that the sound of the music output by the system was pretty, smooth, gentle, and easy listening, so they were pleased with their work. They probably kept cranking up the aggressiveness of their pattern recognition and averaging algorithm, in their quest for the specmanship of higher resolution on paper, and, when they heard the music getting ever smoother and gentler as a result, they probably liked the sound of that and thought it was good or would sell well. One cannot imagine that they ever took the time to actually listen to a comparison of the input signal with the output signal, the classic bypass test that should be the cornerstone of every engineer's training. If they had, they would have been shocked at the sonic changes wrought on music by their system design, and one hopes they would have scrapped this design approach (crude intrinsic resolution, very aggressive enhancement) or even the whole project forthwith. Their lapse might be forgivable if they were designing a cheap, playback only consumer format (say like MP3 but for the cocktail hour), wherein easy listening likeability and inoffensive gentle smoothness would be important selling criteria, whereas accuracy would be less important. But when you're designing a recording, mastering, and archiving system, accuracy is not merely the most important thing; it is, as Lombardi said, the only thing. To design a recording, mastering, and archiving system as manifestly inaccurate as DSD/SACD is unforgivable.
White Noise Bursts and Distortion
We've now explained perhaps 90% of the large sonic change that DSD/SACD imposes on the input signal. But we also reported hearing some other changes and artifacts. We also reported hearing treble musical sounds changed into bursts of white noise, with a phase inverted or random phase quality. And we reported hearing ugly artifacts when strong treble energy came along, such as cymbals and sibilants. What's the explanation for these other sonic gremlins in DSD/SACD?
There are yet more pipers to pay, for the quieting and resolution enhancement achieved by DSD/SACD's aggressive algorithm. We discussed above how DSD/SACD's aggressive pattern recognition and averaging algorithm quiets random noise and garbage, in favor of emphasizing repeated patterns that it finds. But how can an algorithm or circuit quiet noise? What does it do with the noise? Where does the noise go? It turns out that there is a law of conservation of noise (much like the law of conservation of energy, and the law that entropy cannot decrease). The disorder of noise and garbage intrinsic to a low resolution system (like DSD/SACD's mere 6 bits) cannot truly be decreased or quieted. It can only be transferred or shifted. What DSD/SACD's algorithm does is to shift the noise and garbage out of the main music spectrum, and dump it into the ultrasonic part of the system sampling spectrum, which winds up having worse than 6 bit resolution. This shifting of noise can be characterized by a curve, which looks like a frequency response curve, and different noise shifting algorithms can be designed to produce different shaped curves (hence the moniker of noise shaping for the process).
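The idea that noise is moved rather than removed can be seen in the simplest textbook noise shaper, a first-order error feedback loop. The sketch below is that textbook loop, not the DSD/SACD modulator, and every constant in it (the coarse step, the slow test sine, the 64 sample moving average standing in for the low frequency band) is an assumption chosen for illustration. It compares the error of a plain coarse quantizer with the error of the same quantizer inside the feedback loop.

```python
import math

STEP = 0.5     # deliberately coarse quantizer step (assumed)
N = 4096       # samples in one slow sine cycle (assumed)
WINDOW = 64    # moving average length standing in for the low band (assumed)

signal = [0.9 * math.sin(2 * math.pi * n / N) for n in range(N)]

def quantize(v):
    """Round to the nearest multiple of STEP."""
    return STEP * round(v / STEP)

def plain(xs):
    """Quantize each sample independently."""
    return [quantize(x) for x in xs]

def noise_shaped(xs):
    """First-order error feedback: subtract the previous quantization error."""
    out, err = [], 0.0
    for x in xs:
        v = x - err
        y = quantize(v)
        err = y - v
        out.append(y)
    return out

def power(vals):
    """Mean square value."""
    return sum(v * v for v in vals) / len(vals)

def low_band(vals):
    """Crude low-pass filter: moving average over WINDOW samples."""
    return [sum(vals[i:i + WINDOW]) / WINDOW
            for i in range(len(vals) - WINDOW + 1)]

for name, coder in (("plain", plain), ("shaped", noise_shaped)):
    error = [y - x for y, x in zip(coder(signal), signal)]
    print(name, "total:", power(error), "low band:", power(low_band(error)))
```

With the feedback loop the low band error falls sharply while the total error rises: the loop has not destroyed any noise, it has pushed it up to frequencies the moving average does not pass.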
But there are limitations on the shape of the final noise curve achieved by a noise shaping algorithm. Note that the frequency response curve of any physically real system has some limitations, dictated by circuit capabilities, restrictions of instability or overload, etc. This is true for speaker systems, for amplifiers, and also for digital systems. The curve can be shaped only in certain limited ways, reflecting the fact that the circuit can only do things in certain limited ways. For example, DSD/SACD cannot achieve quieting of noise to such a high degree (6 bit quieting enhanced all the way up to 20 bit quieting) at a constant level all the way up to 20 kc, and then suddenly transition to dumping adequately huge amounts of that shifted noise immediately above 20 kc; there has to be a more gradual transition from the quieting region below 20 kc to the dumping region above 20 kc. What this boils down to is the fact that for music's upper treble the algorithm cannot be as effective doing its quieting as it can be at music's lower frequencies. Less quieting in music's upper treble means that there is more random noise and garbage competing with the music signal, so that portion of the music will sound noisier and less like pure clean music.
This shortcoming is then compounded by two further problems. First, for most music the upper treble energy is at a much lower level than the rest of the spectrum. When we measured a full symphony orchestra blazing away with the Rite of Spring, the spectral energy of the music was flat up to about 2 kc, but then fell above that at about 6 db per octave. That would mean that this music's upper treble energy was already about 18-20 db down from midrange energy. Now, DSD/SACD's intrinsic resolution of 6 bits puts its noise and garbage plateau just 36 db down from maximum full scale amplitude. And maximum full scale amplitude has to encompass the total sum of musical energy at all frequencies (which were 18-20 db higher than the upper treble, from 2 kc all the way down to the bass). So this means that music's upper treble is already buried pretty much down into the noise and garbage level of DSD/SACD's 6 bit intrinsic resolution. The second compounding problem is that music's upper treble content by nature tends not to be a very repetitive or uniform pattern (there is too much variation over time). So DSD/SACD's quieting algorithm is crippled even further in trying to help lift this upper treble musical information out of the noise it is buried in, because this information is not enough of a repeating pattern for the algorithm to recognize and do any quieting on.
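The decibel figures in this argument can be reproduced with two short formulas. The sketch below uses the usual rule of thumb of about 6 db of range per bit and the 6 db per octave rolloff quoted above; the function names are ours, and it ignores refinements such as the extra 1.76 db usually credited to a full scale sine wave.

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02 db of range per bit

def noise_floor_db(bits):
    """Rule-of-thumb noise floor below full scale for a `bits`-bit system."""
    return bits * DB_PER_BIT

def rolloff_db(f_low, f_high, db_per_octave):
    """Level drop between two frequencies for a constant db-per-octave slope."""
    return math.log2(f_high / f_low) * db_per_octave

floor = noise_floor_db(6)                    # the 6 bit figure used in the text
treble_drop = rolloff_db(2_000, 20_000, 6)   # 2 kc to 20 kc at 6 db per octave

print(round(floor, 1))        # 36.1 (db below full scale)
print(round(treble_drop, 1))  # 19.9 (db below the midrange level)
```

These match the 36 db plateau and the 18-20 db treble deficit cited in the paragraph.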
In sum, there's a triple whammy that conspires to keep music's upper treble information buried in noise for DSD/SACD. When a burst of genuine musical upper treble information appears at the input, DSD/SACD responds with a burst of white noise accompanying this musical information. And indeed the level of noise is so high that it obscures much of the true timbral and textural information in the original musical sound. For example, the subtle individuated brassy sounds of a gentle cymbal kiss get lost amidst the noise, so that the whole cymbal kiss sounds like a burst from a white noise generator or FM tuner.
Moreover, random white noise has no coherent phase to it. So even a coherent piece of upper treble musical information would, if accompanied by enough random phase white noise, lose its phase coherency. It would sound like random phase or inverted polarity musical information, just as we reported hearing on the upper trebles of the violin master recording. This random noise phase incoherence problem of DSD/SACD is then further aggravated by the fact that its pattern recognition and averaging algorithm softens and smoothes down transient attacks and other important cues that help to establish correct phase coherence.
DSD/SACD overall lacks believable tactile phase coherence, and sounds indirect and far away instead of direct and immediate. That's partly because it is too smoothed down and averaged, and averages don't have coherent phase. It's partly because the singular event attack cues which help establish coherent phase have been stripped away. It's partly because the output music waveform is a total fiction (created by recognizing repeating patterns in the input noise and applying these repeating patterns uniformly with too broad a brush to the output), and repeating uniform patterns don't have unique events to establish phase coherent reference points (an ongoing sine wave sounds the same no matter when you start or stop it). And finally it's partly because there's still lingering noise and garbage that accompanies some of the music, especially the upper trebles (where the incisive transients establishing phase coherence get their leading edge from), and DSD/SACD cannot quiet this noise.
What about the ugly artifacts that we hear when there's strong treble energy in the music, as with struck cymbals and vocal sibilants? We think the most likely culprit is temporary overload in the very aggressive algorithm that does the quieting, resolution enhancement, and noise shaping. Our colleague Martin Colloms measured a huge mountain of noise in DSD/SACD, with a peak at about 53 kc that was merely 20 db below maximum full scale amplitude. Our understanding from the Sony and Philips engineers is that this mountain of noise with its high peak was necessary to allow the desired aggressive fifth order noise quieting within the audible spectrum. You can think of this in the following way. There was a huge amount of rubbish, i.e. noise and garbage, to remove from the audible spectrum, in order to get a crude 6 bit resolution system to approach the resolution of a 20 bit system. When you excavate a mountain of rubbish from one location, you wind up with a mountain of rubbish you have to pile high at another location. Hence the huge mountain of noise DSD/SACD adds just above the audible spectrum, peaking at about 53 kc.
It's worth noting that this huge mountain of noise, peaking at 53 kc, is DSD/SACD's response to the simplest possible input test signal, a single 1 kc sine wave tone. It's conceivable that this mountain of noise would get even worse when DSD/SACD is faced with real music as an input signal, which would contain large energy at all spectral frequencies simultaneously, and which therefore might excite or stimulate even worse amounts of noise dumping into the ultrasonic region.
Now, this mountain of noise could create two problems. First, although the mountain's peak is at 53 kc, consider the huge amount of noise energy down on this mountain's lower skirts, especially at 40 kc and below. When a significant burst of musical upper treble information comes along just below 20 kc, the noise energy in this mountain at 40 kc and below could intermodulate with the genuine musical energy, producing very noisy sounding byproducts in the upper treble (just below 20 kc) that would directly compete with the genuine musical information. Incidentally, this could
(Continued on page 44)