Tuesday 7 May 2013

Digital Clipping

Clipping is a term that originated with analog audio, and refers to the situation where the magnitude of the signal rises to a level that is larger than the ability of the medium to store it, or the electronics to deliver it.  For example, with magnetic tape, the signal is stored by embedding a magnetic signal onto the tape.  But if the musical signal gets beyond a certain level, the tape will not have the magnetic capacity to store a large enough magnetic signal.  Same thing with an amplifier – if you amplify a signal enough, you will eventually run out of voltage (or current) at the output.

With magnetic tape, as well as – generally speaking – with good old-fashioned vacuum tube amplifiers, when the signal level approaches and exceeds the maximum the system was designed to handle, the musical peaks get gradually compressed, so gradually in fact that for the most part you don’t notice it happening.  This so-called “soft clipping” meant that clipping was rarely the most crucial sonically degrading issue faced by early audio designers.

This all changed with the advent of solid-state electronics.  Your typical transistor amplifier does not soft-clip.  It hard-clips.  This means that when it tries to deliver an output voltage larger than the maximum it was designed for, the output voltage just sits at the maximum value and stays there until the output signal drops below that maximum value.  The peak of the signal is just wiped out, and the signal waveform develops a flat-topped appearance everywhere this hard-clip occurs.  Imagine Shaquille O’Neal walking through your front door, and instead of gracefully ducking to avoid bumping his head, the door simply chops his head off.  The effect on the music is similarly messy.
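To make the difference between the two behaviours concrete, here is a minimal Python sketch.  The limit of 1.0, the test sine wave, and the use of tanh as the soft-clipping curve are all assumptions chosen for illustration; tanh is just a convenient smooth curve, not a model of any real tape or tube circuit.

```python
import math

LIMIT = 1.0  # hypothetical maximum level the system can deliver

def hard_clip(x, limit=LIMIT):
    """Anything beyond the limit simply sits at the limit."""
    return max(-limit, min(limit, x))

def soft_clip(x, limit=LIMIT):
    """Illustrative smooth compression: approaches the limit gradually."""
    return limit * math.tanh(x / limit)

# One cycle of a sine wave whose peaks are 50% over the limit.
signal = [1.5 * math.sin(2 * math.pi * n / 32) for n in range(32)]
hard = [hard_clip(v) for v in signal]
soft = [soft_clip(v) for v in signal]

# Hard clipping leaves a run of samples stuck exactly at the limit (the flat top).
flat_top_samples = sum(1 for v in hard if v == LIMIT)
print("samples stuck at the limit:", flat_top_samples)
print("largest soft-clipped sample:", max(soft))
```

The hard-clipped version has a row of identical samples at the top of each peak, while the soft-clipped version gets squashed toward the limit but never quite sits on it.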

In digital audio, the effect of clipping can actually be even worse!  Let’s look at what happens when a signal is clipped.  The easiest way to do that is to consider the clipping as being an error signal which is added to the music signal.  This error signal comprises nothing but the peaks that got chopped off.  If we analyze this signal, we find that it has frequency components which extend from within the audio bandwidth (which is considered to be about 16Hz – 20,000Hz) on up into frequency ranges above the audio bandwidth.  In analog space, we can generally just ignore any components above the audio bandwidth because we can’t hear them anyway.  But in digital audio we can’t do that.
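The “error signal” view can be sketched in a few lines of Python.  The limit and the test sine wave are assumptions for illustration only; the point is that the clipped signal equals the original plus an error that is zero everywhere except where the peaks were chopped off.

```python
import math

LIMIT = 1.0  # hypothetical clipping level

def hard_clip(x, limit=LIMIT):
    return max(-limit, min(limit, x))

# One cycle of a sine wave that overshoots the limit by 50%.
original = [1.5 * math.sin(2 * math.pi * n / 32) for n in range(32)]
clipped = [hard_clip(v) for v in original]

# The error is whatever must be ADDED to the original to get the clipped signal.
error = [c - o for c, o in zip(clipped, original)]
rebuilt = [o + e for o, e in zip(original, error)]

# The error is non-zero only at the samples where a peak was removed.
nonzero_positions = [n for n, e in enumerate(error) if e != 0.0]
print("error is non-zero at samples:", nonzero_positions)
```

Because the error consists of short, sharp-edged bursts, it is rich in high-frequency content, which is where the trouble starts.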

Typical digital audio has a sampling frequency of 44,100Hz, the standard developed for the Compact Disc.  There is a firm and fixed mathematical law that says if we want to sample a waveform at a certain frequency, then we have to make sure that the waveform contains no frequencies at or above one half of the sampling frequency.  This frequency is termed the “Nyquist” frequency.  For CD, that means it has to have no content at or above 22,050Hz.  What happens if you try to encode a signal at, say, “N” Hz ABOVE the Nyquist frequency?  What you find is that the result you get is EXACTLY THE SAME as you would have got if instead the signal was “N” Hz BELOW the Nyquist frequency.  When you play back this signal, it is not the original high frequencies you will hear, but the “bogus” lower ones.  This effect is called mirroring (it is more commonly known as aliasing), and is a very audibly destructive artifact.  It explains why the original analog signal has to be very tightly filtered prior to being sampled, to eliminate all traces of any frequency components above the Nyquist frequency.
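The mirroring claim is easy to check numerically.  The sketch below samples two cosine waves at the CD rate, one “N” Hz above Nyquist and one “N” Hz below, and compares the samples.  The offset of 3,000Hz and the choice of 32 samples are arbitrary assumptions.

```python
import math

FS = 44100        # CD sampling frequency, Hz
NYQUIST = FS / 2  # 22,050 Hz

def sample_cosine(freq_hz, count=32):
    """Sample a unit-amplitude cosine of the given frequency at FS."""
    return [math.cos(2 * math.pi * freq_hz * n / FS) for n in range(count)]

N = 3000  # hypothetical offset from the Nyquist frequency, Hz
above = sample_cosine(NYQUIST + N)  # 25,050 Hz, too high to encode
below = sample_cosine(NYQUIST - N)  # 19,050 Hz, within the audio band

# If mirroring holds, the two sets of samples are indistinguishable.
largest_difference = max(abs(a - b) for a, b in zip(above, below))
print("largest difference between the two sample sets:", largest_difference)
```

A cosine was used so the two sets of samples line up exactly; with a sine wave the mirrored version comes out with inverted polarity, but it is still a tone at the lower frequency.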

Back to clipping.  If you take a perfectly good signal in the digital domain, and perform some signal processing on it, then the possibility generally exists that the resultant signal will contain peaks that are above the maximum value that can be represented by the digital encoding system.  What do you do with those peaks?  The easiest thing is to “clip” them at the digital maximum, so that just as with analog clipping in a solid-state amplifier, each sample that works out to be above the digital maximum is encoded as a digital maximum.  You will have, in effect, encoded a waveform containing frequency components above the Nyquist frequency.  When you play back that signal, those otherwise inaudible components will be recreated as audible components at corresponding frequencies below the Nyquist frequency.  This will sound even worse than hard-clipping in an amplifier.
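Here is a rough Python sketch of that effect.  It clips a pure 5,000Hz tone in the digital domain and then measures how much signal appears at 19,100Hz, which is where the 25,000Hz fifth harmonic created by the clipping should land once it is mirrored about the Nyquist frequency.  The tone frequency, the clipping level, and the `level_at` helper are all assumptions made up for this illustration.

```python
import math

FS = 44100      # CD sampling frequency, Hz
COUNT = 441     # 0.01 s of audio: exactly 50 cycles of the test tone
TONE_HZ = 5000  # hypothetical test tone
LIMIT = 0.5     # hypothetical digital maximum; the tone peaks at 1.0

def level_at(samples, freq_hz):
    """Amplitude of the component at freq_hz, measured with a single-bin DFT."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / FS) for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / FS) for n, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)

clean = [math.sin(2 * math.pi * TONE_HZ * n / FS) for n in range(COUNT)]
clipped = [max(-LIMIT, min(LIMIT, s)) for s in clean]

# The 5th harmonic (25,000 Hz) is above Nyquist, so it mirrors to FS - 25,000 Hz.
ALIAS_HZ = FS - 5 * TONE_HZ  # 19,100 Hz
clean_level = level_at(clean, ALIAS_HZ)
clipped_level = level_at(clipped, ALIAS_HZ)
print("level at 19,100 Hz before clipping:", clean_level)
print("level at 19,100 Hz after clipping: ", clipped_level)
```

The unclipped tone has essentially nothing at 19,100Hz, while the clipped one has a clearly measurable component there.  That component is not harmonically related to the 5,000Hz tone at all, which is a big part of why it sounds so unpleasant.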

The solution is to use mathematics to “re-shape” the portion of the signal that is being driven into clipping, in such a way as to remove all of the unwanted high-frequency components.  Of course, there will be a sonic price to pay, even for this.  But once you have driven the signal into overload in the first place, there is no escaping without some sort of penalty.

This sort of situation arises in general with any form of signal processing, but “mirroring” is most commonly encountered when down-sampling from a higher sample rate to a lower one, particularly when the higher-rate data is derived from a DSD source, which has (by design) a lot of high-frequency noise.  In general, you have to assume that the higher-rate-sampled “source” data can contain frequency components anywhere below its own Nyquist frequency.  But some of those frequencies can still be higher than the Nyquist frequency of the lower sample rate which is the “target” of the conversion.  So, unless you know for certain that the “source” material contains no frequency content above the Nyquist frequency of the “target”, your downsampling process needs to incorporate an appropriately designed low-pass digital filter.
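As a rough illustration of what such a filter does, the Python sketch below downsamples from 88,200Hz to 44,100Hz using a simple Hamming-windowed sinc low-pass filter.  Everything specific here is an assumption chosen for the example: the 2:1 ratio, the 20,000Hz cutoff, the 101 taps, and the function names.  A production sample rate converter would use a much more carefully designed filter.

```python
import math

SOURCE_FS = 88200  # hypothetical "source" rate; the "target" is 44,100 Hz

def lowpass_taps(cutoff_hz, fs_hz, num_taps=101):
    """Hamming-windowed sinc low-pass FIR coefficients (num_taps must be odd)."""
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at DC

def downsample_by_2(samples, taps):
    """Low-pass filter, then keep every second sample (fully overlapped outputs only)."""
    out = []
    for i in range(len(taps) - 1, len(samples), 2):
        out.append(sum(t * samples[i - j] for j, t in enumerate(taps)))
    return out

def tone(freq_hz, fs_hz, count):
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(count)]

taps = lowpass_taps(20000, SOURCE_FS)
audible = downsample_by_2(tone(1000, SOURCE_FS, 2000), taps)      # should pass
ultrasonic = downsample_by_2(tone(30000, SOURCE_FS, 2000), taps)  # should be removed
peak_audible = max(abs(v) for v in audible)
peak_ultrasonic = max(abs(v) for v in ultrasonic)
print("1 kHz tone peak after conversion: ", peak_audible)
print("30 kHz tone peak after conversion:", peak_ultrasonic)
```

Without the filter, the 30,000Hz tone would survive the conversion at full strength and be mirrored down to 14,100Hz.  With the filter in place it is reduced to a small residue, while the 1,000Hz tone passes through essentially untouched.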