A correct theoretical analysis shows that MQA's nearly 'ideal impulse response' actually eviscerates and erases high frequency information, especially short transient attack peaks - which of course makes MQA sound too dull. This too dull tonal balance is then partially offset by MQA's addition of several types of high frequency garbage (the sharp isolated spikes in MQA's 'reconstruction', the leakage of negative frequency ultrasonic images, the interference patterns from the music signal beating with these ultrasonic images, etc.).
But, evidently, this added high frequency garbage is not sufficient to fully offset MQA's too dull tonal balance, and its erasure of high frequency sharp transients. So, with such a horribly imperfect system as a true baseline (rather than the perfect 'ideal' system they had erroneously assumed), it might well help to deliberately import and add even more high frequency information, and MQA's design engineers would hear MQA's tonal balance get better. Thus, it is simply the case that MQA is so bad at 'reconstructing' the original time domain signal that it needs to deliberately import and add even more high frequency information, just to sound as passable as it does. Note that MQA's designers are way off base in mis-interpreting their experiment (which deliberately added even more high frequency information, to help offset MQA's fatal flaws), as purportedly showing that human hearing benefits from ultrasonic information, indeed so much so that it's worth sacrificing music's fine detail (bits 17-24).
Interestingly, this deliberate adding of extra high frequency information might sound better not only because it improves brightness in MQA's erroneous tonal balance, but also because it might help to partially heal the time domain portrayal of all those high frequency transient attacks that were eviscerated or even erased in the time domain by MQA's 'reconstruction'. The spurious ultrasonic sampling images still implicitly represent and contain, scattered among a wide group of sample dots, all the information required to reconstruct the unluckily sampled sharp peak transients that MQA is utterly unable to directly reconstruct due to the very narrow data gathering field of view afforded by its 'ideally' short 'impulse response'. So it's conceivable that deliberately importing and adding an extra dose of high frequency information, sourced from these ultrasonic images, could help slightly (but still not correctly) to strengthen (in the time domain) the sharp high frequency transient peaks that MQA eviscerated.
Many years ago we published signal waveforms showing that the human ear/brain can hear and appreciate very fine musical details that measured way down at the 21 bit resolution depth. And with today's higher resolution sources and playback systems, we doubtless can hear and appreciate musical detail down at the 24 bit resolution level. Thus, MQA has made yet another blunder by deliberately erasing and discarding bits 17-24 of bit depth resolution, from all music recordings. And, again, their only reason for doing so was, when truly analyzed, to give a crutch to the hopelessly defective system that is MQA.
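To put these bit depths on a common scale, here is a minimal sketch in Python of the usual rule of thumb that each bit of linear PCM resolution is worth about 6.02 dB. The function name `bit_depth_to_db` is ours, chosen for illustration only. The figures are idealized and ignore dither, noise shaping, and analog noise floors.

```python
import math

def bit_depth_to_db(bits):
    """Ratio, in dB, between full scale and one LSB for linear PCM with `bits` bits."""
    return 20.0 * math.log10(2 ** bits)

# The three bit depths discussed in the text.
for bits in (16, 21, 24):
    print(f"{bits} bits: {bit_depth_to_db(bits):.1f} dB")

# Range, in dB, spanned by bits 17-24 (the 8 bits below a 16 bit floor).
lost_db = bit_depth_to_db(24) - bit_depth_to_db(16)
print(f"bits 17-24 span about {lost_db:.1f} dB")
```

On this idealized scale, 16, 21, and 24 bits correspond to roughly 96.3 dB, 126.4 dB, and 144.5 dB, so bits 17-24 span about 48 dB of resolution.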
We mentioned above that most PCM is also sonically imperfect here, but in the opposite direction, sounding too closed in and dull. To be fair, we'll briefly mention here how and why this PCM defect occurs (a full explanation requires several entire installments of our article Digital Done Wrong).
Modern revisionist PCM, employing a 'time domain optimized' short reconstruction filter, sounds much too rounded and dull, because its short 'impulse response' coefficient function is not nearly wide enough, to gather information from other sample dots far away, in order to correctly reconstruct the full amplitude and correct signal waveform shape, for any and all high frequency information (especially sharp transients) that were inevitably unluckily sampled (i.e. not at their peak).
But even traditional PCM, that fully depends on the sampling theorem, still sounds too closed in and dull (even though it is far better than modern revisionist PCM). Why? It still does digital wrong.
First, its reconstruction filter coefficient function, even if it's the correct sinc function, is still truncated too short nowadays, and needs to be much wider, in order to reconstruct high frequencies, especially within an accuracy of 1 LSB in a 24 bit resolution environment.
Second, the new interpolated dots it reconstructs, between incoming sample data dots, need to be much denser and more numerous, far more than the 8x 'oversampling' now common in reconstruction filters.
Third, as we reveal and prove in a full serial installment of Digital Done Wrong, the sampling theorem itself, the very bedrock foundation for all digital signal handling, is actually invalid. Because it fails to deliver its promise of accurate reconstruction up to the Nyquist frequency (half the sampling rate), it is in fact totally invalid. In terms of actual performance, it progressively degrades signal reconstruction at progressively higher frequencies, getting really bad for the whole top ¾ of the promised Nyquist passband (i.e. 5 kHz-20 kHz).
The cure for this is simply to never rely on the sampling theorem for the top ¾ of its promised Nyquist passband, which is simply accomplished by sampling the target signal at a 4x higher rate (e.g. 192 kHz for audio signals). High sample rate recordings (176 or 192 kHz sample rate) do not try to fully utilize the passband promised by the sampling theorem, but instead use only the bottom ¼ of this promised passband. Here the sampling theorem works quite well, so high frequency transients can be reconstructed far more accurately.
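The passband arithmetic behind this can be checked with a short sketch. It assumes a 20 kHz upper edge for the audio band. The names `nyquist_hz` and `band_fraction_used` are hypothetical helpers, not part of any library.

```python
def nyquist_hz(sample_rate_hz):
    """Nyquist frequency: half the sampling rate."""
    return sample_rate_hz / 2.0

def band_fraction_used(audio_top_hz, sample_rate_hz):
    """Fraction of the Nyquist passband occupied by audio up to `audio_top_hz`."""
    return audio_top_hz / nyquist_hz(sample_rate_hz)

AUDIO_TOP_HZ = 20_000.0  # assumed upper edge of the audio band

for fs in (44_100.0, 176_400.0, 192_000.0):
    frac = band_fraction_used(AUDIO_TOP_HZ, fs)
    print(f"fs = {fs / 1000:.1f} kHz: Nyquist = {nyquist_hz(fs) / 1000:.2f} kHz, "
          f"audio band uses {frac:.0%} of the passband")
```

Under these assumptions, a 44.1 kHz recording uses about 91% of its Nyquist passband for audio, while 176.4 kHz and 192 kHz recordings use about 23% and 21%, i.e. less than the bottom quarter.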
That's the main reason why high sample rate PCM recordings sound so much better than low sample rate PCM that relies on the full Nyquist passband promised by the sampling theorem.
PCM using a high sampling rate sets the sonic standard for accurate reconstruction of high frequency transients, and accurate open airiness, and extended trebles. In contrast, MQA's phasey soft high frequency garbage doesn't even come close. And this sonic contrast is confirmed and corroborated by our correct technical analysis here, which shows that these sonic contrasts are really there, and how and why they occur.
Incidentally, speculations by others that the sonic superiority of high sample rate PCM arises from a shorter 'impulse response' (or from audible ultrasonic information) are all wrong, indeed so wrong that they are backwards. It's true that doubling the sampling rate of a given filter design shortens the duration of its 'impulse response' by half. But this shortening per se actually makes the curve-fitting reconstruction, by a given filter design, less accurate (not more accurate as speculated). That's because this shortening factor, per se, halves the width of this filter's data gathering reach, over the signal waveform's extent.
On the other hand, this high sample rate doubling also provides a distinct, different factor, of making the sampling dots twice as dense, which does help bypass the sampling theorem's fatal defects. The fault, dear Brutus, lies in our sampling theorem.
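The two distinct factors just described can be separated with simple arithmetic. The sketch below assumes a hypothetical filter with a fixed tap count of 128, a value chosen only for illustration. It approximates the filter's time span as the tap count times the sample period.

```python
def impulse_duration_ms(num_taps, sample_rate_hz):
    """Approximate time span of a fixed-length filter, in milliseconds."""
    return 1000.0 * num_taps / sample_rate_hz

def sample_spacing_us(sample_rate_hz):
    """Time between adjacent sample dots, in microseconds."""
    return 1e6 / sample_rate_hz

NUM_TAPS = 128  # hypothetical filter length, for illustration only

for fs in (44_100.0, 88_200.0):
    print(f"fs = {fs / 1000:.1f} kHz: filter spans "
          f"{impulse_duration_ms(NUM_TAPS, fs):.3f} ms, "
          f"dots are {sample_spacing_us(fs):.2f} us apart")
```

Doubling the rate halves the time span covered by the same filter design, and separately halves the spacing between sample dots. The sketch shows only this arithmetic and makes no claim about the sonic result.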
MQA's third unique sonic feature is that, compared to a PCM rendition of the same master tape, MQA produces a spatial image much larger, with much richer ambience, around the performers. The simple key question is, of course, whether MQA is providing a more accurate rendition than PCM of the hall space as encoded by the recording microphones, or whether MQA is adding a signal distortion that, however euphonically seductive, is objectively an inaccuracy and hence wrong, perhaps due to yet another defect in MQA's signal handling.
To begin to answer this key question, here is what we do know.
First, we know that, in a Columbia close miked recording of an intimate duo (Bob Dylan's voice and guitar), MQA somehow generated a large, richly ambient surrounding space halo that was almost certainly not encoded by the close-up microphones into the original recording. This of course aroused our scientific suspicions.
Second, we know that the listener's perception of space and ambience can be enriched by adding or creating random incoherent phase information from a signal, that is then injected into that signal itself.
Third, we know that dedicated reverb chambers (acoustic or electronic) have often been employed to add the sound of a large, ambience rich space into a recording, and that many listeners like this deliberate distortion of the original microphone signal (which is precisely why mastering engineers employ these reverb chambers). These reverb chambers effectuate their sonic change by delaying the signal (sometimes in a complex pattern), and then intermingling the delayed signal version(s) back into the main signal, which in turn directly causes random incoherent phase interference, the very same phenomenon that enriches spatial dimensions and spatial ambience for the listener.
Fourth, we know that large concert halls employ the very same techniques, but acoustically, to allow perception of the hall's genuinely large space and genuinely rich ambience - namely random incoherent phase interference, and also long reverberant delays, from the hall's multiple complex wall reflections.
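The delay-and-intermingle mechanism described above for reverb chambers can be sketched in its simplest form as a single delayed copy mixed back into the signal. This is an illustrative toy, assuming one delay and one gain value. Real reverb units combine many delays, feedback, and filtering, and nothing here represents MQA's actual processing. The function name `delay_and_mix` is hypothetical.

```python
def delay_and_mix(signal, delay_samples, gain):
    """Return the signal plus one delayed, scaled copy of itself."""
    out = []
    for n, x in enumerate(signal):
        # Before the delay has elapsed there is nothing to mix in yet.
        delayed = signal[n - delay_samples] if n >= delay_samples else 0.0
        out.append(x + gain * delayed)
    return out

# A single click, followed by silence.
impulse = [1.0] + [0.0] * 7
echoed = delay_and_mix(impulse, delay_samples=3, gain=0.5)
print(echoed)
```

Feeding in a single click yields the click followed, three samples later, by a half-amplitude copy. With real music, the direct and delayed versions overlap and interfere.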
Thus, we know that there are indeed manipulative methods for enriching spatial size and spatial ambience. Our question here therefore becomes: can and does MQA employ any of these methods, and does it thereby create a distortion by fabricating a larger ambience rich space that, however subjectively beguiling, is nevertheless objectively wrong?
Again, a correct technical analysis explains everything we heard here.
It's actually pretty simple. It all starts with MQA's single minded pursuit of what they very erroneously believe to be their goal: a reconstructed signal waveform whose 'impulse response' meets their tragically uncomprehending 'ideal' of appearing virtually perfect. This MQA design goal, successfully implemented, guarantees that signal waveform reconstruction will in fact be maximally distorted (as discussed above), and also guarantees that there will be maximal leakage of spurious ultrasonic sampling images, down into the audio passband. These spurious leaked images directly interact with the original signal in the audio passband, to form complex interference patterns, having random incoherent phase.
Presto! MQA has now, via one of its defects, artificially manufactured excess, bloated rich spatial ambience, which is added as a distorted halo (whether you want it or not) around the performers (including close miked performers like the Columbia Bob Dylan recording). The random incoherent phase from these complex interference patterns is especially potent, in its spatial distorting efficacy, because the music signal's positive frequencies, progressing spectrally upward in the normal fashion, are beating (interfering) with the spurious ultrasonic images that represent negative frequencies, progressing spectrally downward in inverted fashion (so higher frequencies actually appear at lower frequencies).
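The spectral inversion of the first sampling image can be illustrated with a short sketch. It assumes a 44.1 kHz sampling rate purely for the example, and `first_image_hz` is a hypothetical helper name. The sketch shows only where the image of a tone lands, not how much of that image any particular filter lets through.

```python
def first_image_hz(tone_hz, sample_rate_hz):
    """Frequency of the first (spectrally inverted) sampling image of a tone."""
    return sample_rate_hz - tone_hz

FS = 44_100.0  # assumed sampling rate, for illustration only

for tone in (5_000.0, 10_000.0, 20_000.0):
    print(f"tone at {tone / 1000:.1f} kHz -> image at "
          f"{first_image_hz(tone, FS) / 1000:.1f} kHz")
```

As the tone rises from 5 kHz to 20 kHz, its image falls from 39.1 kHz to 24.1 kHz. The image band runs backward, with higher music frequencies appearing at lower image frequencies.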
Note that this random incoherent phase interference, caused by the music signal beating against the spurious leakage by MQA of the signal from ultrasonic sampling images, would already occur even if these two mutually beating signals were parked entirely within their separate adjacent spectral bands, with no overlap (and the spurious extra ambience generated thereby would already sound like random incoherent phase shadows of the music, because one of these beating signals consists of negative frequencies progressing backward spectrally).
But this artificially fabricated added ambience is made even stronger and more bloated by two of MQA's deliberate design features. First, MQA's extremely short 'impulse response' produces sharp spikes in attempting to 'reconstruct' the music signal. These sharp spikes, shorter than a sampling interval, surely contain ultrasonic information that overlaps into the spectral territory of the spurious negative frequency ultrasonic sampling images leaked by MQA. This overlap causes more direct, hence stronger, beat interference, and even more complex interference patterns with complex random incoherent phase. Second, MQA's misguided design trick, of deliberately adding extra ultrasonic overtone energy to the music signal, winds up fabricating yet more spectral overlap with these same leaked images, which again strengthens the beat interference and further complicates the interference patterns.
What about the large spatial size manufactured and added by MQA? This would require that the complex interference patterns, and the random incoherent phase they produce, also incorporate some time delays. Not to worry. A correct technical analysis shows that MQA can also generate this kind of signal distortion. Here's how. When portions of the mutually interfering signals, the original music signal and the spurious ultrasonic sampling images radiating backwards down into the audio passband, happen to be close in frequency, then the beat pattern generated by their mutual interference will be a low frequency beat pattern. This means that MQA's phony ambience enrichment will cover the full spectrum, even down into low frequencies, which is a sonic hallmark of what a large dimensioned concert hall does. Furthermore, low frequencies by definition last a long time, so these low frequency beat interference patterns artificially manufactured by MQA will last a long time, hence will produce the same sonic effect as an acoustic reverberant field in a large concert hall produces, with its long temporal delays built into its reverb pattern.
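The relationship between two nearby frequencies and their beat pattern follows from a standard trigonometric identity: the sum of two sines equals a carrier at their average frequency multiplied by an envelope governed by their difference. The sketch below checks this numerically. The tone and image frequencies are assumed values (a 21 kHz tone and its first image at a 44.1 kHz sampling rate), and the helper names are hypothetical. It illustrates only the beat arithmetic, not audibility.

```python
import math

def two_tone_sum(f1, f2, t):
    """Plain sum of two unit-amplitude sines at time t (seconds)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def carrier_times_envelope(f1, f2, t):
    """Same signal, written as a carrier times a slowly varying envelope."""
    carrier = math.sin(math.pi * (f1 + f2) * t)
    envelope = 2.0 * math.cos(math.pi * (f1 - f2) * t)
    return carrier * envelope

def beat_hz(f1, f2):
    """Rate at which the envelope magnitude repeats."""
    return abs(f1 - f2)

F_TONE = 21_000.0   # assumed music tone
F_IMAGE = 23_100.0  # its first image, assuming fs = 44.1 kHz

# Compare the two forms at 1000 instants, 1 microsecond apart.
max_diff = max(
    abs(two_tone_sum(F_TONE, F_IMAGE, n / 1e6)
        - carrier_times_envelope(F_TONE, F_IMAGE, n / 1e6))
    for n in range(1000)
)
print(f"beat rate: {beat_hz(F_TONE, F_IMAGE):.0f} Hz, max mismatch: {max_diff:.2e}")
```

The closer the two frequencies, the lower the beat rate: a 21 kHz tone against a 23.1 kHz image beats at 2.1 kHz, while a 22 kHz tone against a 22.1 kHz image would beat at only 100 Hz.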
This is much the same effect as is produced by artificial reverb chambers, employed to artificially alter what the recording microphones heard. Recording and mastering engineers use their judgment to decide how much, if any, artificial reverb to add to the recordings that they sell and you buy.
What about the vague amorphous nature of MQA's rich ambience and bloated space? This is explained, and also confirmed and corroborated, by the fact that these sonic effects originate from high frequency spectral garbage (e.g. spurious ultrasonic images with negative frequencies that progress spectrally backwards), not from the genuine music recording and its capture of the recording venue acoustics. Also, these sonic effects are caused by random incoherent interference between signals, not by the signals themselves.
Thus, these vague amorphous bloated sonic effects are modulated by the primary audio signal, and ride along with that signal, but they do not represent or convey any true recorded signal information, e.g. the true size of the recording venue, or where the reflecting hall walls are located. Instead, MQA's vague amorphous artificial fabrication of spatial effects actually swamps and drowns any genuine spatial information captured in the original recording, be it of a concert hall or a close-miked recording booth.
The question for you is simple. Do you want your digital playback system to accurately play back the recording as the mastering engineer chose to publish it, for your benefit and enjoyment? Or do you want to employ a digital playback system that effectively adds the distortion of a further artificial reverb chamber, with a heavy extra dose of bloated space and ambience added to what the mastering engineer chose to give you on all your recordings, without giving you any choice to delete it and accurately hear the original correct recording?
Further Foundation of Technical Analysis
Any and every digital playback device has one primary job: connect the sample dots, to accurately reconstruct the path of the original pre-sampled signal waveform (much like a 3 year old child, who connects the dots to accurately re-create the path of the outline of a bunny rabbit's head).
The sampling theorem, which is the bedrock foundation and literal sine qua non for all digital signal handling, promises us that we can correctly execute this primary playback job of reconstruction, even with very sparse sampling, where the sample dots are spaced far apart, relative to the twists and turns of the signal waveform.
This sparse sampling in turn necessarily a priori dictates that virtually all of our sample dots will be unlucky samplings, since they fail to occur at the precise instant that the signal waveform happens to be at a local peak, or at the inflection point for a twist or turn. Thus, sample dots fail to explicitly show 2 crucial signal waveform facts: exactly when each peak or inflection point occurred in the original pre-sampled signal waveform, and exactly what amplitude they occurred at. Somehow, our primary playback job must reconstruct peaks and inflection points between sample dots that are not even shown by the sample dots themselves.
It is easy to lazily think of digital reconstruction as merely plotting a path that connects the sample dots, just as digital textbooks teach. But this is erroneous, naively over-simplistic thinking and teaching.
In point of fact, the sampling theorem promises us that we can correctly reconstruct the original pre-sampled signal waveform path, even when the sample dots are so sparse and so unlucky that the correct original signal path to be reconstructed is not at all apparent to the human eye trying to connect the dots.
For example, the sampling theorem promises us that we can correctly reconstruct a simple single full scale sine wave even when and where the sample dots unluckily happen to be at and very near zero amplitude (and the sampling theorem does indeed work correctly for this difficult reconstruction circumstance). This is equivalent to us at age 3 trying to connect the dots outlining that bunny rabbit head profile, and seeing dots at the left and right zero amplitude bases of the bunny's ear, but no lucky sampling dots whatsoever anywhere on the height of this ear, nor at its top. How on earth can we reconstruct the correct shape and amplitude of the bunny's ear, or the full scale sine wave, when an unlucky sampling gives us no sample dots whatsoever above zero?
So, how can the playback digital reconstruction filter possibly reconstruct the correct peak, and the correct reconstructed signal path through that peak, when that peak occurs between unlucky sampling dots, and thus this peak is not even explicitly represented at all by the unlucky sampling dots? The answer in a nutshell is that this unsampled peak, though not explicitly represented by the sample dots on either side that unluckily missed it, is nevertheless implicitly represented in the overall pattern of sample dots, specifically in the overall pattern that extends beyond (sometimes far beyond) the single pair of sample dots immediately adjacent to this unsampled peak (the 2 dots that immediately flank this peak).
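The idea that an unsampled peak is implicitly carried by the wider pattern of dots can be illustrated with a minimal sketch of sinc (Whittaker-Shannon) reconstruction. The test signal is our own choice: a full scale sine at one quarter of the sampling rate, phased so that every sample lands at about ±0.707 and every peak falls exactly midway between two dots. The function names and the `reach` parameter are hypothetical, and the plain truncated sum used here is only one simple way to limit the filter's width.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def sample(n):
    """Sample n of a full scale sine at fs/4, phased so all peaks are missed."""
    return math.sin(math.pi * n / 2.0 + math.pi / 4.0)

def reconstruct(t, reach):
    """Truncated sinc sum using `reach` sample dots on each side of time t.

    Time t is measured in sample periods.
    """
    base = math.floor(t)
    return sum(sample(n) * sinc(t - n)
               for n in range(base - reach + 1, base + reach + 1))

PEAK_T = 0.5  # the true peak (amplitude 1.0) lies midway between dots 0 and 1

linear = 0.5 * (sample(0) + sample(1))   # naive straight line between the 2 dots
narrow = reconstruct(PEAK_T, reach=1)    # only the 2 flanking dots
wide = reconstruct(PEAK_T, reach=2000)   # dots far beyond the flanking pair

print(f"straight line: {linear:.4f}")
print(f"sinc, 2 flanking dots: {narrow:.4f}")
print(f"sinc, 2000 dots per side: {wide:.4f}")
```

In this example, a straight line between the two flanking dots gives about 0.707, and a sinc sum over only those two dots gives about 0.900. A sinc sum reaching 2000 dots to each side comes within a fraction of a percent of the true peak amplitude of 1.0.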