oversamples into one sample. In the case of DSD/SACD, it oversamples by 64 times, so it can average these 64 samples to obtain a single sample with improved resolution at the required minimum 44.1 kHz sample rate. Since 64 is equivalent to 2 raised to the 6th power, DSD/SACD's 64 times oversampling is equivalent to having 6 bits of intrinsic resolution.
As we've discussed previously, this is actually 4 times worse than Bitstream, which at 256 times oversampling has the equivalent of 8 bits of intrinsic resolution. Philips invented Bitstream as a cheap alternative to multibit PCM, for its cheapest consumer CD players. So Bitstream was intended only for consumer use, only for playback, and only for the cheapest players. Now along comes Sony with DSD/SACD. It is intended for professional use, indeed for archiving use, not just for consumer use as Bitstream was. It is intended for recording use (which means that its errors are forever enshrined), not just playback use as Bitstream was. It is intended for the most expensive systems and professional equipment, not just for the cheapest consumer systems. And yet, amazingly, its intrinsic resolution is four times worse than Bitstream! We already found Bitstream's sonic performance to be unacceptable, and we documented this with measurements in IAR issue # 61-62 (pp. 3-9). So you can imagine what the quality must be of a system like DSD/SACD that is intrinsically four times worse than Bitstream.
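The bit counting behind this comparison can be sketched in a few lines of Python. This is only an illustration of the simple rule used in the text, where plain averaging of N oversamples is credited with log2(N) bits; the function name intrinsic_bits is our own and does not come from any Sony or Philips documentation.

```python
import math

def intrinsic_bits(oversampling_ratio: int) -> float:
    """Bits credited to plain averaging, per the rule in the text: log2(ratio)."""
    return math.log2(oversampling_ratio)

dsd_bits = intrinsic_bits(64)          # DSD/SACD: 64 times oversampling
bitstream_bits = intrinsic_bits(256)   # Bitstream: 256 times oversampling

# Each extra bit doubles the resolution, so a 2 bit gap is a factor of 4.
resolution_ratio = 2 ** (bitstream_bits - dsd_bits)

print(dsd_bits, bitstream_bits, resolution_ratio)
```

The 2 bit gap between the two systems is where the "four times worse" figure comes from.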
It's worth mentioning briefly that DSD/SACD's 6 bits of resolution is equivalent to merely a 36 dB signal to noise ratio (at roughly 6 dB per bit), which is worse than most AM radio. This intrinsic 36 dB s/n ratio of DSD/SACD contrasts dramatically with the 144 dB of 24 bit DVD-A, and with the 96 dB of conventional CD. Again, since classical music spends most of its time with its loudest notes around the 10% level, the effective intrinsic s/n ratio for DSD/SACD would be far worse even than the wretched 36 dB, for most of the time. At the 10% level, only about 6 of the 64 available steps are in use, so any musical detail smaller than about 1/6 of the loudest classical music notes would, for most of the time, be undetectable and lost to the intrinsic resolution of DSD/SACD. Looking back at our 32 mile long trip analogy, let's re-scale this trip length so that the loudest classical music note for most of the time (which is at the 10% level) is re-scaled to 32 miles, since it represents the full amplitude most of the time; then any objects smaller than 1/6 of this trip, i.e. smaller than about 5 miles across, would be undetectable by the intrinsic DSD/SACD amidst its random noise and garbage. That's pretty sad.
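For readers who want to check these figures, here is a minimal sketch of the arithmetic. It assumes the usual rule of thumb of roughly 6 dB per bit, and the helper names (snr_db, steps_in_use, smallest_detectable_miles) are hypothetical names chosen for illustration.

```python
DB_PER_BIT = 6  # rule of thumb assumed here: roughly 6 dB of s/n per bit

def snr_db(bits: int) -> int:
    """Signal to noise ratio credited to a given number of bits."""
    return DB_PER_BIT * bits

def steps_in_use(bits: int, level: float) -> float:
    """Quantization steps spanned by a signal at `level` (fraction of full scale)."""
    return (2 ** bits) * level

def smallest_detectable_miles(trip_miles: float, steps: float) -> float:
    """Size of the smallest resolvable object when the trip is divided into `steps`."""
    return trip_miles / steps

snr = {name: snr_db(bits) for name, bits in [("DSD/SACD", 6), ("CD", 16), ("DVD-A", 24)]}
steps = steps_in_use(6, 0.10)                 # music peaking at the 10% level
miles = smallest_detectable_miles(32, steps)  # the re-scaled 32 mile trip

print(snr, steps, miles)
```

With music peaking at the 10% level, a 6 bit system has only about 6.4 of its 64 steps in play, which is the source of the 1/6 and 5 mile figures.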
Enhancing Repeated Patterns
Obviously, something had to be done to improve this very sad intrinsic resolution of DSD/SACD. So Sony did something. They pulled out the old smoke and mirrors bag of tricks. They implemented a kind of pattern recognition and averaging algorithm, sometimes called noise shaping.
How does this special kind of pattern recognition and averaging work? Recall that, with only 6 bits of intrinsic resolution, any and all objects of musical information smaller than 1/64 of full scale are simply lost and undetectable. They are basically buried beneath the random garbage and noise of the 6 bit digital system.
The goal of this pattern recognition and averaging algorithm is to attempt to penetrate beneath the surface of this garbage and noise, and probe for repeating patterns of musical information buried beneath the rubble. Essentially, the algorithm sifts through the rubble over and over (using a digital recirculating loop), comparing old noise and garbage with new noise and garbage, and looking for hints of repeating patterns in that noise and garbage.
When it finds a repeating pattern, this algorithm can bring that repeating pattern into the foreground, and can quiet the rest of the noise and garbage relative to that repeating pattern. Thus, this algorithm can enhance the effective resolution of the digital system for that repeating pattern. It can enhance the 6 bit intrinsic resolution of DSD/SACD to a higher effective resolution for repeating musical patterns. If this algorithm is designed to be very aggressive, it can provide dramatic improvements in effective resolution, even approaching 20 bits, for repeating musical patterns. For example, say a violin note lasts two seconds. This algorithm can sift through all the noise and garbage for the entire two seconds (which is a very long time for a digital system), and can recognize that certain patterns are repeated continuously, over and over, for virtually the entire two seconds. There's the fundamental sine wave of the violin note, plus sine waves representing a few overtones, and these several sine waves continue as a repeating pattern for the entire two second duration of the violin note. The algorithm can bring this simple repeating pattern into the foreground, and can quiet the rest of the noise and garbage into the background, thereby enhancing the resolution of DSD/SACD for that simple repeating pattern, of the continuing sine waves of the two second violin note.
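To make the idea of enhancing a repeating pattern concrete, here is a minimal sketch of plain cycle-by-cycle averaging. It is not Sony's actual DSD/SACD processing, which we have not reproduced here. It assumes the period of the repeating pattern is already known, and the names average_cycles, rms_error, PERIOD, and CYCLES, along with the tone amplitude and noise level, are hypothetical choices made purely for illustration.

```python
import math
import random

def average_cycles(signal, period):
    """Average corresponding samples across consecutive cycles of length `period`."""
    cycles = len(signal) // period
    return [sum(signal[c * period + i] for c in range(cycles)) / cycles
            for i in range(period)]

def rms_error(a, b):
    """Root mean square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

PERIOD, CYCLES = 32, 64
rng = random.Random(0)  # fixed seed so the sketch is repeatable

# A small repeating tone, buried under noise five times larger than itself.
tone = [0.01 * math.sin(2 * math.pi * i / PERIOD) for i in range(PERIOD)]
noisy = [tone[i % PERIOD] + rng.uniform(-0.05, 0.05)
         for i in range(PERIOD * CYCLES)]

single_cycle_error = rms_error(noisy[:PERIOD], tone)
averaged_error = rms_error(average_cycles(noisy, PERIOD), tone)

print(single_cycle_error, averaged_error)
```

In a sketch like this, averaging 64 cycles would be expected to reduce the random error by something like the square root of 64, which is why a long sustained note gives such an algorithm so much to work with.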
So far, so good. At this point, Sony's engineers might be feeling mighty proud of themselves, achieving (via their new aggressive algorithm) 20 bit effective resolution (16 times better than Bitstream) from a system that intrinsically has only 6 bits of resolution (4 times worse than Bitstream).
But there's a problem. Several problems in fact. There's no free lunch, and there are several pipers to pay.
Let's start with the simplest, most obvious problem. The algorithm, sifting through random noise and garbage, can only look for repeating patterns among the noise, so it can only recognize repeating patterns of music. It cannot recognize singular or non-recurring musical events, unique transients that happen only once within a note, sounds that tell us about the timbre and texture of the real instrument, sounds such as the starting and stopping sounds of each note (which obviously happen only once within each note, and which do not continuously repeat as a pattern throughout the note). All these singular sounds are crucial to making a musical instrument sound vivid, vibrant, tactile, direct, and real. But the algorithm does not look for and cannot recognize these crucial singular transient musical sounds, so it cannot bring them forward, nor can it enhance the system's effective resolution to better reveal them.
Second, the algorithm inflicts an even worse insult on these crucial singular musical sounds. Not only does it fail to recognize and enhance them, but it also actually subdues them. The algorithm, looking for repeating patterns amid the random noise and garbage, simply cannot tell the difference between a genuine singular musical sound and a singular burst of random noise (the algorithm doesn't have a sensitive ear and musical intelligence as we humans do). Random noise spikes look the same as genuine musical transients, both being singular distinctive waveform events that are not part of a repeating pattern. So the algorithm can't tell them apart, since it has no hearing with musical sensibilities. And so it thinks all unique, distinctive musical transients are just noise, and it subdues them along with the noise. Thus, when the algorithm quiets the noise and garbage in order to bring the repeating patterns to the foreground, it also quiets and subdues the genuine singular musical sounds that were mixed in with the noise and garbage (especially all those singular musical sounds which were at less than 1/64 of full scale, i.e. less than 1/6 of a typical note's full loudness in classical music, and which therefore were buried amongst the noise and garbage of DSD/SACD's poor intrinsic 6 bit resolution).
Third, by finding and then emphasizing the repeating pattern that exists for virtually the entire musical note, the algorithm is effectively computing the simplistic average of the whole note and then applying it uniformly, with a broad brush, as a lowest common denominator to the entire duration of the musical note. Thus, this algorithm is literally transforming the sound and character of the whole musical note. It is making the whole musical note more simplistically uniform, from beginning to end. The individual musical sounds that are different at the beginning, middle, and end of the note are subdued. You know that a string sounds different (in the midranges as well as the trebles) when it is first set in motion, by the bow of a violin or the pluck of a guitar, than it does later on for the long sustain of the note. The algorithm's emphasis of the repeating pattern, the average trend over time, will ignore and erase the singular non-recurring sounds, and those sounds that only exist for a minority of the time -- thereby simplifying and subduing music's true transient variety, and producing a very smoothed down, rounded, averaged sound, more like a simple sine wave, which is one uniform tone lasting unchanged forever. The initial harder attack transient and the sudden scruffiness in the middle of the note as the violinist digs in or changes fingering are crucial, distinctive, individualistic sounds that mark distinct instants in the progress of a violin's musical note, but these differentiating individualistic sounds are stifled, to make way for the uniformity of the continuously repeating pattern. The distinctive true sounds of the note's beginning, end, and sparkling accents in the middle are all rounded, sanitized, and emasculated. In making the note more uniform from beginning to end, the algorithm is imposing an Orwellian society dictum wherein individuality is squelched, and uniformity reigns throughout the duration of the note.
Moreover, in imposing the tyranny of the average or lowest common denominator upon the whole musical note, the algorithm is literally creating a new sound that never actually occurred in the first place.
Thus, the algorithm does succeed in providing a quieting of background noise and garbage, and in enhancing DSD/SACD's effective resolution for repeating musical patterns. But there are pipers to pay. Individual singular musical sounds are subdued along with the noise. The uniformity of the average reigns throughout each musical note. And the nature of all musical notes is changed by these transformations. These are big changes, and of course no recording/reproduction system, certainly not a mastering and archiving system, should be making such changes to musical signals.
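The fate of a one-time event under this kind of averaging can be sketched with a toy example. Again, this is plain cycle averaging under our own simplifying assumptions, not a reproduction of the actual DSD/SACD processing; the waveform values, the transient height of 64, and the name average_cycles are hypothetical.

```python
def average_cycles(signal, period):
    """Average corresponding samples across consecutive cycles of length `period`."""
    cycles = len(signal) // period
    return [sum(signal[c * period + i] for c in range(cycles)) / cycles
            for i in range(period)]

PERIOD, CYCLES = 8, 64
pattern = [0, 3, 5, 3, 0, -3, -5, -3]  # the repeating body of the note

note = pattern * CYCLES
note[2] += 64  # a single attack transient, occurring once, in the first cycle only

averaged = average_cycles(note, PERIOD)

# What is left of the transient once it has been averaged over all 64 cycles.
transient_residue = averaged[2] - pattern[2]

print(averaged, transient_residue)
```

The repeating pattern comes through the averaging intact, while the one-time transient of height 64 is diluted to a residue of 1, that is, by the number of cycles averaged. Note also that the residue is smeared into the average for the whole note, rather than staying at the instant where the transient actually occurred.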
What would these unwanted signal changes actually sound like? Guess what? By strange coincidence, these unwanted signal changes would sound exactly like DSD/SACD does in fact sound. These unwanted signal changes would change the sound of a live mike feed of a close miked guitar to sound very different, and (most importantly) different in exactly the ways that we heard DSD/SACD sounding when its master recording was played. These unwanted signal changes would change the sound of violin notes to sound just as we heard DSD/SACD sounding, in contrast to the sound revealed by 24/192 DVD-A (which is also the sound that we know a real live violin has).
After DSD/SACD's aggressive pattern recognition averaging algorithm gets through with them, the sound of each violin note and each guitar note becomes more uniform and homogenized from beginning to end. The distinctive, individual, non-repeating parts of each note, such as the transient attack, are dramatically subdued and smoothed down. The whole musical note becomes much smoother, gentler, more like a simple continuing sine wave (with a few continuing overtones), uniform from beginning to end. That's because the algorithm recognized and emphasized the pattern that repeated itself most from beginning to end of the whole note, i.e. that was most uniform throughout the whole note. The sharp, sparkling, gruff, even grating sounds of real live music (especially as picked up by a close up mike) are squelched (discarded with the noise), since they do not form a continuously repeating pattern throughout the note. A violin's real sounds, the singular scruffy noises of gut and rosin scraping steel that are so unique that they do not form a simple repeating pattern, are discarded as noise by DSD/SACD's algorithm, and are replaced by a syrupy sweet simpler sine wave with a few overtones.
The beginning and end of each musical note from the guitar and violin are changed if they are abrupt transitions (say a sudden transient attack). DSD/SACD's aggressive algorithm subdues the individual, singular, unique aspects of the actual original sharp transient attack, and then what remains is merely a simpler and uniformly ongoing sine wave tone. And DSD/SACD even changes the beginning and ending of those simple sine wave tones, averaging them with the preceding and succeeding silence, such that each simplified musical note begins and ends gradually and gently, rather than starting suddenly with a hard transient attack as the original real musical instrument did. The sound of that musical note's beginning becomes changed into a gradual onset of the sine wave tones of the main body of the note -- a more gradual, rounded, gentled down introduction to each musical note. The whole musical note is rounded and prettified, made more like a clarinet note (clarinets start and stop relatively gradually, and their tone is a pretty simplistic, uniformly ongoing sine wave with overtones). With each and every musical note, you hear more of the simple, rounded, prettified, ongoing sine wave aspect of the note, and less of the singular, non-repeating aspects of the note. The language of music is stripped of its consonants by DSD/SACD, and is left with only its vowels. The sound of all music loses its immediacy, forceful directness, and tactile coherence, instead becoming indirect, soft, gentle, and diffuse.
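The rounding of a note's beginning and end can be illustrated with the simplest possible averaging, a running average over the loudness envelope of a note. This is a sketch under our own assumptions, not the actual DSD/SACD filter; the function name moving_average, the window length of 4, and the idealized envelope are hypothetical.

```python
def moving_average(samples, window):
    """Average each sample with the `window - 1` samples before it.

    Samples before the start of the list are treated as silence (zero).
    """
    out = []
    for i in range(len(samples)):
        start = max(0, i - window + 1)
        out.append(sum(samples[start:i + 1]) / window)
    return out

# Idealized envelope: silence, a note that starts and stops abruptly, silence.
envelope = [0.0] * 4 + [1.0] * 8 + [0.0] * 4

smoothed = moving_average(envelope, 4)

onset = smoothed[4:8]     # where the original jumps straight from 0 to 1
release = smoothed[12:16]  # where the original drops straight from 1 to 0

print(onset, release)
```

The original envelope reaches full level in a single step, whereas the averaged version takes four steps to climb (0.25, 0.5, 0.75, 1.0) and four more to die away, which is the gradual, clarinet-like onset and release described above.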
All musical notes are dramatically changed in character. The change is similar to what a live musical instrument would sound like if you placed it behind a velvet curtain, or if you heard it at a great distance (especially in a large room or hall). Listening through a velvet curtain, or through a long distance of air from a seat in the indirect reverberant field of a large room, also changes the sound that a real musical instrument makes, the sound that you would hear if you were listening up close where the mike actually was. The velvet curtain subdues the individual sharp edges at the beginning and end of each note, and the unique, individual subtle grating sounds within each note, thereby making each note softer, gentler, more homogenized, more uniform throughout its duration, just as the DSD/SACD algorithm does. When musical notes are changed this way, to become more like homogenized pablum, it's certainly more relaxing on the ears. It's a fine effect if you want background cocktail music or elevator music. And, as discussed previously, it could even be a euphonic transformation for some recordings that were miked much too closely and then EQ'd too harshly. We don't dispute those who think that DSD/SACD gives them a prettified and easy listening version of music that they subjectively enjoy. But any rational person must take severe issue with the inclusion or promotion of such a drastically inaccurate coloration tool in the context of a recording, mastering, and archiving system.
You might think we're being too harsh on DSD/SACD. Actually, we're being too polite. The first truth is that, for all DSD/SACD's claims of having 20 bit resolution for music, it only has this resolution for repeating patterns of music (and only for the lower frequencies of the spectrum). For singular, unique, transient musical sounds, DSD/SACD might as well just be a crude system with only 6 bits of resolution. If singular transient sounds fall below 1/64 of full scale amplitude, then they become part of and lost in the noise and garbage, and DSD/SACD's pattern recognition and averaging algorithm has no means of recognizing and rescuing them, since they are non-repeating (indeed, DSD/SACD's algorithm might subdue them further, as part of its noise quieting in favor of uniform repeating musical patterns).
The second truth is that the effective resolution gains of DSD/SACD's aggressive algorithm are largely smoke and mirrors, playing a cheap parlor trick that fools the hearing of some of us. The basic trick is that if you give people something that sounds quiet, clean, relaxing, and musically pretty, then they won't question how accurate it is. The aggressive DSD/SACD pattern recognition and averaging algorithm achieves a very dramatic reduction in noise and garbage, starting with a 6 bit system and driving the noise and garbage all the way down to about the 20 bit level (at least for lower spectral frequencies). But the smoke and mirrors deception is that the signal DSD/SACD gives you, riding on top of this dramatically quieter and cleaner background, is not at all the same signal as was input to the DSD/SACD system. The resolution enhancement algorithm can only enhance repeating patterns, so the output signal is different from the input signal. The output signal largely consists of repeated patterns, recognized in and extracted from the input signal, those repeated patterns having been enhanced and magnified until they form most of the cloth of the output signal, making the output signal much more uniform and smoothed down than the input signal. DSD/SACD's resolution enhancing algorithm quiets the background noise in the restaurant, but it gives you different music as the main food course. You ordered spicy food with dazzling contrasts of vibrant distinct flavors, but DSD/SACD instead brings you politely pallid porridge of uniform flavor that is guaranteed not to offend anyone. DSD/SACD's algorithm might be claimed to enhance the resolution with which the music is reproduced (compared to the intrinsic 6 bit system capability), but what's the point if it is different music that is reproduced?
The third truth is that DSD/SACD does not merely smooth down the real sound of music, the real sound of the input signal. It actually creates a fictional new sound that never existed. The recording mike never heard the sound of the guitar or violin that comes out of DSD/SACD. An existing recording, transcribed, re-issued, or archived via DSD/SACD, never sounded at all like what DSD/SACD turns it into. Thus, in a sense, DSD/SACD is committing 100% distortion, creating fictional new music that sounds very different from the true input signal. This fictional new sound might be euphonically pleasing to some listeners, because it is generally similar to the sound of music heard at a distance in a large, heavily curtained room. But it is a total fiction, and a fiction very different from the input signal.
In previous IAR articles, we have discussed other digital systems that make some of these same smoothing mistakes, but in much milder form than DSD/SACD does. We have also demonstrated this with some measurements. It's worth re-reading IAR # 55, pp. 3-9, and IAR # 58, pp. 8-18. Graph 12 on page 17 is especially dramatic, showing a music waveform being correctly reproduced by one digital system, but being smoothed down and averaged by another digital system, with a dramatic loss of real musical timbral and textural information.
Let's consider a couple of visual analogies, to help get a better intuitive feel for the above discussion. Imagine a river bank with a beach of tan sand. Hidden beneath the sand is a layer of thousands of uniformly grey round river rocks. With some effort, you move aside the sand here and there, and you see this solid grey layer of river rocks beneath, at the several places you look. You inferentially conclude that the repeating color pattern of the stone layer is grey, and so in your mind you apply that grey color globally and uniformly to the whole stone layer. Now, you have made a correct inference about the overall average color of the layer beneath the sand. And you are correct that the only repeating pattern of color is indeed grey, and the average color is indeed grey. And you have indeed penetrated beneath the sand, so you now discern and have more information than you