using a contemporary video DVD player as a complete audio player (the contamination can come from multiple clocks, jitter, noise leakage, video frame rate chopping, etc.). Incidentally, if you are going to use a video DVD player as just a transport, you should restrict yourself to Pioneer DVD players, since they are the only ones to output a 96 kHz digital signal. Other video DVD players output merely a 48 kHz digital signal. This degrades the music worse than you might suspect. Not only is the sampling rate cut in half, but distortion is added. This added distortion comes from the fact that the 48 kHz digital output is derived from the original 96 kHz internal signal by a process called decimation, in which some digital samples are simply omitted. Simply omitting some digital information causes distortion when the reconstruction filter later does its work to reconstruct the correct analog music waveform. If the sampling rate is to be reduced without adding significant extra distortion, then the fewer digital samples of the lower sampling rate must be recalculated from the original (full sampling rate) digital data, using a sophisticated recalculating algorithm, implemented by a sophisticated digital computer chipset. The correct values of the recalculated samples differ from the values of any of the original data samples, so any machine that reduces the sampling rate by simply picking some of the original data samples will have the wrong values for those remaining samples, and thereby will add spurious distortion. Only the Pioneer video DVD players avoid this trap, and they avoid it by not reducing the sampling rate at all. So the Pioneer video DVD players, used as a transport in conjunction with a good outboard D/A converter, can at least provide very good sound from today's 24/96 audio discs.
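For readers who want to see the decimation problem in numbers, here is a minimal sketch in Python. It is our own illustration, not the circuitry of any actual player. It assumes a single 30 kHz test tone, a 101 tap windowed-sinc low-pass filter with a 20 kHz cutoff, and helper names (`tone`, `lowpass_taps`, `fir_filter`, `amplitude_at`) that are all hypothetical choices made for this example. In signal processing terms, the spurious distortion from simply omitting samples shows up as aliasing: content above 24 kHz folds back down into the audible band.

```python
import math

FS_IN = 96_000   # original sampling rate in Hz
N_IN = 4_800     # 50 ms of signal at 96 kHz

def tone(freq_hz, fs, n):
    """Unit-amplitude sine wave sampled at fs."""
    return [math.sin(2 * math.pi * freq_hz * i / fs) for i in range(n)]

def lowpass_taps(cutoff_hz, fs, num_taps=101):
    """Hamming-windowed sinc low-pass FIR taps, scaled to unity gain at DC."""
    mid = (num_taps - 1) / 2
    fc = cutoff_hz / fs
    taps = []
    for k in range(num_taps):
        x = k - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]

def fir_filter(signal, taps):
    """Convolve signal with taps, keeping only fully overlapped outputs."""
    m = len(taps)
    return [sum(taps[j] * signal[i + m - 1 - j] for j in range(m))
            for i in range(len(signal) - m + 1)]

def amplitude_at(signal, freq_hz, fs):
    """Estimate the amplitude of one frequency component by correlation."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / fs) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / fs) for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n

# A 30 kHz component is legal at 96 kHz but lies above the 24 kHz limit of a 48 kHz stream.
source = tone(30_000, FS_IN, N_IN)

# Method 1: keep every other original sample, untouched.
dropped = source[::2]

# Method 2: low-pass filter first (new sample values are computed), then keep every other one.
recalculated = fir_filter(source, lowpass_taps(20_000, FS_IN))[::2]

# In a 48 kHz stream the 30 kHz tone folds down to 48 - 30 = 18 kHz.
alias_dropped = amplitude_at(dropped, 18_000, 48_000)
alias_recalculated = amplitude_at(recalculated, 18_000, 48_000)
print(f"18 kHz alias, samples dropped:      {alias_dropped:.4f}")
print(f"18 kHz alias, samples recalculated: {alias_recalculated:.4f}")
```

With samples merely dropped, the spurious 18 kHz tone comes through at essentially full strength; with samples recalculated through the filter, it is strongly suppressed. The filter length and cutoff here are arbitrary, and a real sample rate converter would choose them with far more care.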
Does this mean that we can finally recommend that you buy a DVD video player or transport for music? No. Not even a Pioneer video DVD player. Why not? Not because the sound still isn't good enough. Rather, the reason has to do with our second piece of good news. There's something even better right around the corner. So good that it deserves a whole IAR story for itself.

The Sound of Tomorrow: 24/96 from DVD-A vs. DSD from Sony-Philips

DVD-A is a new standard format for putting music (and some optional video information) on DVD. It offers 24/96 music, but it is different from the ad hoc 24/96 standard (now called Advanced Audio Disc) that shoehorns 24/96 music onto today's video DVD format. Today's video DVD players will not play discs made to the new DVD-A standard. New audio+video DVD players that will play this new DVD-A standard, as well as all video DVDs, should be available to you in about a year or less. That's why we advise that you should not invest in a video DVD player for audio today (buying a DVD player just for video might be OK). As the new DVD-A standard for 24/96 music takes root, today's ad hoc 24/96 standard (shoehorning 24/96 music onto video DVD) will gradually fade away, as will today's 24/96 music DVDs made to this ad hoc standard.
Why should you wait for the new DVD-A digital music standard? The DVD-A standard, for putting music on DVD, achieves the best sound for your home we have heard yet from digital. The best, period.
The technical people from DVD-A put on a music demo at AES, using a variety of real consumer DVD player prototypes (not a fancy studio master hard disk recorder, as Sony-Philips used for their SACD demo). The sound was stunning. Especially impressive was a recent classical symphonic recording, featuring complex music, replete with abundant high frequency transients (cymbals, triangles, violins) that are so difficult for any recording system (particularly digital) to get right. The DVD-A digital also allowed us to hear into the music's complex orchestration better than any other recording we have ever heard.
The most remarkable achievement of DVD-A digital is the superb quality of music's trebles. The difference between DVD-A and Sony-Philips DSD-SACD is huge here in the trebles. DVD-A achieves the best treble reproduction we have ever heard from digital, while DSD-SACD achieves some of the worst, with horrible mangling of some musical sounds above 8000 Hz, thanks to a sampling rate that is too low: merely one quarter of the rate used by cheap, old fashioned consumer Bitstream CD players.
The difference is so humungous that it's a joke to compare the two systems on everyday musical sounds like cymbals, vocal sibilants, or triangles. Yet we in the marketplace are being asked to seriously compare these two systems, as the prime candidates for our next digital music standard. In fact, I wish that someone would set up a direct comparison demo, featuring music rich in energy above 8000 Hz, so audio pros everywhere could hear for themselves the huge difference in quality of musical treble reproduction we heard between these two contending systems.
Incidentally, in music's lower frequencies, from the midranges on down, both systems are excellent, with some subtle sonic pros and cons (DSD sounds a little more relaxed and airy, while DVD-A sounds a little more focussed and coherent).
It is the region above 8000 Hz, still a very important part of music, that sets them apart. In this region the sonic difference is simply put: DVD-A is glorious and DSD-SACD is incompetent.
Some apologists for Sony-Philips might say that music's treble region is not that important, that sonic compromises are allowable here, since people don't pay that much attention to high quality for music's treble regions (as must be the case, since many listeners are impressed by DSD's midranges and don't seem to notice the mangling of treble information). But such apologies miss the whole point.
You see, virtually any half-baked digital system can do a good job reproducing music's bass and midranges. That's the easy part. The hard part is handling music's trebles, the high frequencies that require a digital system to turn around quickly, and track subtle details while turning (music's treble details usually take the form of small hairpin squiggle turns, superimposed on the larger overall, gently curving waveform that follows the lower frequencies, as we showed you in Hotline 49 and succeeding issues).
Consider this analogy. A large circular, smooth surface racetrack requires only that a car have four wheels and a motor (no steering response at all required, since the wheels could be cocked at a fixed angle to follow the circle). It doesn't demand cars with good handling, good tracking and traction over bumps, and good transient steering response -- all of which are required in the real world, to accurately follow complex transient changes in a real road's curves and bumpy surface (especially a road such as a mountain road, whose many hairpin curves have various and changing radiuses, slopes, and attack angles).
So if I tell you that I can offer you a car which performs great on a smooth circular track, I haven't told you anything (i.e. that doesn't prove anything) about the car's real world handling capabilities, specifically its ability to handle that transiently changing part of real world driving that's difficult for a car to handle (indeed, I might be offering you only half a car, a car so primitive that it has only a motor and 4 wheels, but no steering response or suspension at all). Likewise, if I offer you a digital system that performs great tracking the smooth, gently turning (relatively constant radius) turns that are characteristic of a music waveform at bass and midrange frequencies, I might be offering you the equivalent of half a car, with 4 wheels and a motor, but no steering response or suspension; I haven't told you anything about the digital system's ability to handle the real world, specifically that transiently changing part of the real world that's difficult for a digital system to handle.
In order for a car to perform well, it has to be able to follow quick transient changes in real mountain roads. Without this ability it is useless in the real world. This ability requires well engineered handling. And handling is a difficult achievement for cars and car designers, which separates the men from the boys. If a car merely performs well on a smooth circular track, that's the easy stuff, which could even be managed by a car with no handling at all, so it doesn't tell you anything about how well the car is designed, especially in the area that's difficult for cars.
Similar considerations apply to digital (indeed also analog) systems reproducing music. Music's bass and midrange frequencies create a waveform that has smooth, gently turning curves, like a smooth circular racetrack. It's easy for any digital system, even a poorly designed one, to follow these smooth curves. A digital system could even follow these smooth gentle curves if it were only half a digital system, or a very primitive digital system, the equivalent of a half car with a motor and four wheels but no steering response or suspension for handling.
But music's trebles superimpose sharp, suddenly changing little squiggles on the overall music waveform. The overall trend shape of the music waveform is established by the lower frequencies, but the high frequencies add the sharp curves within the overall trend, and a music reproducing system must have the transient handling ability to accurately track these changing sharp curves; otherwise it will crash off the road.
The mountain road is the metaphor for a musical waveform. Both have complex, ever changing sharp curves within an overall trend shape. Both demand excellent transient handling if they are to be accurately tracked. If your machine doesn't have the handling capability to stay accurately on track, it will crash off the road/waveform path, with painfully audible scrunching byproducts.
A road map might show you that the overall trend of a mountain road is northeast, but it obviously doesn't show you every curve in the actual mountain road. When you arrive at the road, you can't just say that your map shows the road's general trend at northeast, so you'll just fix your car's steering at northeast and go. That's not good enough. Your car must be able to follow every squiggle and curve in the actual road, and if it can't you'll crash off the edge of the road.
Likewise, any half-baked digital system can easily follow the gently curving overall road map trend set by music's lower frequencies, but it's difficult for a digital system to accurately track the sharp curving transient squiggles that music's trebles superimpose on the overall road map trend. It isn't good enough for a digital system to merely follow the overall trend of a road as indicated on a road map. A digital system has to be able to follow the actual road of music's complex waveform, including the difficult sharp curves that are added to the overall trend by music's trebles. That's what separates the men from the boys, a half-baked digital system from a real digital system. That's why treble performance is the key to evaluating a digital system's success and technical prowess, and is not just an incidental aspect that we can brush aside or excuse.
That's why, if I tell you that a car works well going around a smooth surfaced large circular track, you'd reply: "So what? That's the easy stuff, which even half a car with no handling can manage. That doesn't tell me squat about how well the car is engineered to handle real world roads, which have bumps and changing sharp curves." Likewise, if you tell me that you listened to DSD-SACD and it sounded great up through the midranges, I'd reply: "So what? That's the easy stuff, which any half-baked digital system can manage. That doesn't tell me squat about how well the digital system is engineered to handle real world music, which has vocal sibilants and cymbal sounds."
If a digital system can't handle the difficult part, can't handle the sharp curves in the waveform of real music, then it will crash off the road plotted by that musical waveform, as surely as a car built with no handling will crash off a real world mountain road.
And, ladies and gentlemen, it is precisely the sound of that crash that we hear from Sony-Philips DSD-SACD, as it literally mangles music's trebles, every time a sharp musical transient curve comes along that requires good handling capability. The handling capability engineered into DSD-SACD is so poor that it literally can't handle the sharper curves thrown at it by real world musical sounds (such as vocal sibilants and cymbal sounds). Like a car with poorly engineered, substandard handling, DSD-SACD can't stay on a real world road when the curves get sharp, so it crashes off the musical waveform road, with grotesque distortions -- that indeed do actually sound much like a car crash.
The Sony-Philips DSD-SACD is a 1 bit oversampling system. Any half-baked kind of 1 bit oversampling system, even one with a too-low oversampling rate and with incompetent averaging and noise-shaping algorithms, can do a credible job reproducing music's lower frequencies, since there are so many 1 bit samples to average per cycle of the music. But at music's high frequencies there are fewer samples to average per musical cycle, so it is here that a high margin of oversampling rate becomes crucial, to preserve enough samples to average per musical cycle; the skill of the averaging algorithm becomes crucial here too. This becomes especially critical when the musical note being captured and reproduced is a single non-repetitive transient (which affords the digital system an even smaller window for averaging), and/or is a steeply rising waveform with a high slew rate.
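The averaging argument can be tried out with a toy model. The sketch below is only that: it assumes a first-order 1 bit modulator and a plain moving average covering one tenth of a musical cycle, neither of which is claimed to be what DSD-SACD actually uses (real systems use higher-order noise shaping and different filtering). The function names and the choices of 4000 and 100 samples per cycle are hypothetical, picked purely for illustration.

```python
import math

def one_bit_modulate(samples):
    """Toy first-order 1 bit modulator: emits +1 or -1 and feeds the error back."""
    integrator = 0.0
    bits = []
    for x in samples:
        bit = 1.0 if integrator >= 0.0 else -1.0
        integrator += x - bit
        bits.append(bit)
    return bits

def moving_average(values, window):
    """Mean of every run of `window` consecutive values."""
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        out.append(running / window)
    return out

def rms_error_after_averaging(samples_per_cycle, cycles=4, amplitude=0.5, fraction=0.1):
    """RMS gap between the averaged 1 bit stream and the equally averaged input."""
    n = samples_per_cycle * cycles
    x = [amplitude * math.sin(2 * math.pi * i / samples_per_cycle) for i in range(n)]
    window = max(1, int(samples_per_cycle * fraction))
    decoded = moving_average(one_bit_modulate(x), window)
    reference = moving_average(x, window)  # same smoothing applied to the true signal
    return math.sqrt(sum((d - r) ** 2 for d, r in zip(decoded, reference)) / len(decoded))

low_note_error = rms_error_after_averaging(samples_per_cycle=4000)  # 400 bits per average
high_note_error = rms_error_after_averaging(samples_per_cycle=100)  # 10 bits per average
print(f"error with 400 bits per average: {low_note_error:.4f}")
print(f"error with  10 bits per average: {high_note_error:.4f}")
```

In this toy, the error left after averaging shrinks roughly in proportion to the number of 1 bit samples in each average, so the note with fewer samples per cycle comes out much rougher. How far that carries over to a well designed high-order system is a separate question that this sketch does not settle.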
The Sony-Philips DSD-SACD system simply has too low an oversampling rate to handle all the actual sounds that music makes in the trebles. One could even predict that it would crash, on the sharper curves of music's waveform, from the simple fact that its oversampling rate is merely 1/4 of the sampling rate Philips used 10 years ago for their cheapest consumer CD players in their ancient Bitstream system.
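To put rough numbers on "samples per musical cycle", here is a small worked calculation. The 2.8224 MHz figure (64 times 44.1 kHz) is the published DSD sampling rate and is our assumption here, since it is not stated in the text above; the Bitstream figure is simply four times that, taken from the 1/4 ratio just cited. The test tones are arbitrary.

```python
DSD_RATE_HZ = 64 * 44_100            # 2,822,400 one-bit samples per second (assumed figure)
BITSTREAM_RATE_HZ = 4 * DSD_RATE_HZ  # implied by the 1/4 ratio in the text

def samples_per_cycle(rate_hz, tone_hz):
    """How many 1 bit samples fall within one cycle of a tone."""
    return rate_hz / tone_hz

print("tone (Hz)   DSD samples/cycle   Bitstream samples/cycle")
for tone_hz in (100, 1_000, 8_000, 20_000):
    dsd = samples_per_cycle(DSD_RATE_HZ, tone_hz)
    bitstream = samples_per_cycle(BITSTREAM_RATE_HZ, tone_hz)
    print(f"{tone_hz:>9}   {dsd:>17.1f}   {bitstream:>23.1f}")
```

Under these assumptions a 100 Hz note gets about 28,000 one-bit samples per cycle, while an 8000 Hz note gets only about 350, and a 20 kHz component about 140.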
The merit of a digital system has to be assessed where music's curves are the sharpest, because that's the difficult part, which proves whether a digital system's handling is well engineered or not. Where the curves are the sharpest is precisely where DSD-SACD crashes the worst. Its grotesque sonic performance on music's sharpest curves thus earns it a grotesque (eminently unsatisfactory) rating as a whole for handling music in general.
Ask yourself this simple question: if a single digital system can't handle all musical notes, what good is it at all? Are we supposed to use DSD-SACD for all music except vocals and cymbals, running another digital system in parallel for these other musical sounds? Or perhaps we should use DSD-SACD for all music below 8000 Hz, running another digital system in parallel to fill in all music above 8000 Hz?
Where the Sony-Philips DSD-SACD is at its sonic worst, the new DVD-A digital standard is at its glorious best, surpassing all other digital systems in handling the sharpest curves of music's complex trebles. So DVD-A excels at precisely the area that is most difficult for all digital systems, indeed the one area that is the key for evaluating a digital system's success. DVD-A is like the best handling car in the world, and DSD-SACD is like a car with such poor handling that it keeps crashing off the curves of real world roads.
Why does the DVD-A standard sound so superb, especially in handling music's difficult trebles? It is a PCM system, which has an inherent advantage over 1 bit systems (such as DSD) in handling the sharpest curves and steepest waveforms of music. PCM doesn't rely on averaging at all, and thus it can negotiate the sharpest curves and steepest waveforms in a single step, without having to rely on many sampling steps for averaging to follow some average path. As discussed in our 1998 Master Guide, PCM can leap tall buildings in a single bound. PCM also has nimble steering response, so it can instantly step sideways to follow a squiggle in the path of the road, to follow a musical treble detail squiggle in the overall path trend of the musical waveform.
And, perhaps most importantly, PCM's combination of nimble steering response and instant tracking in a single step means that it can accurately track treble transients that occur only once and are non-repetitive. The 1 bit digital systems such as DSD-SACD, which rely on averaging, can't accurately track a single non-repetitive treble transient of music, since there are no repetitions to average together.
Furthermore, low bit systems rely on averaging not just to follow a musical waveform, but also to reduce the very high noise and very high distortion that's inherent in the crude amplitude resolution of a low bit system. A low bit system can only crudely approximate the correct amplitude of the music at any given instant. The fewer the bits of resolution, the worse the noise and distortion. Indeed, the reason that 24 bit PCM sounds audibly better than 16 bit PCM is that human hearing can even detect and appreciate the difference in musical accuracy, in low noise, in low distortion, between a 16 bit approximation of music's correct amplitude and a 24 bit approximation (this confirms our research findings reported way back in Hotline 49). Digital systems with merely 8 bits of resolution sound hopelessly crude at each musical sample, with gross noise and distortion. Digital systems with merely 4 bits sound worse yet, and those with merely 1 bit (such as DSD) even worse. How then can these low bit systems deliver tolerably low noise and distortion? They rely on complex averaging algorithms to average many samples of music, and the average sounds decent because the complex averaging process reduces the excessive noise and distortion, by homing in on the correct average value of the music signal. But this complex averaging process only works when there are many oversampled data points to average for each musical note. The gross inherent noise and distortion can only be reduced in proportion to how many oversampled data points there are to average per musical note. Thus, this reduction of inherent gross noise and distortion works very well at music's lower frequencies, where a musical note lingers a long time, so that there are many oversampled data points per cycle of the musical note. For example, our listening evaluations show that DSD-SACD accomplishes the required noise and distortion reduction very well at frequencies below 8000 Hz. 
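The point about crude amplitude resolution at low bit counts can be checked with a short calculation. This is a minimal sketch under our own assumptions: a full-scale sine wave, a plain uniform quantizer with no dither and no averaging, and hypothetical helper names `quantize` and `snr_db`. It measures only the raw per-sample error and says nothing by itself about audibility.

```python
import math

def quantize(x, bits):
    """Round x in [-1, 1] to the centre of one of 2**bits equal-width steps."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(levels - 1, max(0, math.floor((x + 1.0) / step)))
    return -1.0 + (index + 0.5) * step

def snr_db(bits, n=10_000, cycles_per_sample=0.0123456789):
    """Signal-to-error ratio in dB for a full-scale sine quantized to `bits` bits."""
    signal = [math.sin(2 * math.pi * cycles_per_sample * i) for i in range(n)]
    error_power = sum((s - quantize(s, bits)) ** 2 for s in signal) / n
    signal_power = sum(s * s for s in signal) / n
    return 10 * math.log10(signal_power / error_power)

for bits in (1, 4, 8, 16, 24):
    print(f"{bits:2d} bits: {snr_db(bits):6.1f} dB")
```

Each added bit buys roughly 6 dB, in line with the textbook figure of about 6.02 dB per bit plus 1.76 dB for a full-scale sine. That gap is the raw error a low bit system must win back by averaging many samples.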
But at higher frequencies there are fewer data points per cycle, so the complex averaging algorithm has less material to work with in reducing the gross inherent noise and distortion of the low bit digital system. It can't be as effective, and so, with every low bit digital system, the noise and distortion always get worse at music's higher frequencies. The big question is, how bad do the noise and distortion get at the highest frequencies put out by real musical instruments? If the low bit system has a very high oversampling rate, then there are still enough data samples at music's highest frequencies to average together, in order to reduce the low bit system's inherently gross noise and distortion down to audibly acceptable levels. But if there's not a high enough oversampling rate, then there simply are not enough data points to average on music's highest frequency notes, and therefore music's highest frequency notes will be very noisy (smeared together with a shshsh noise) and will be very distorted (frazzled, etc.). That's exactly the problem we hear with DSD-SACD. And it's no coincidence that DSD-SACD has a tragically low oversampling rate. This is the key factor that renders DSD-SACD incompetent at handling music's highest frequencies, and produces both the noisy smearing and the distortion we hear from DSD-

(Continued on page 19)