MPEG (pronounced EM-peg, an acronym for Moving Picture Experts Group) is a set of standards for compressing and storing digital audio and video. MP3 is short for “MPEG Audio Layer 3” and it identifies a way to store digital audio files. MP3 files give you near-CD-quality sound in a file format that requires roughly 1 megabyte for every minute of sound. (CDs and WAV files, by contrast, require about 11MB per minute.)
Regardless of where a sound comes from, what you actually hear is analog sound. Computers, however, translate and store this information as digital sound. This is done through sampling, the process of taking a snapshot of the sound many times per second. CDs store information in a digital audio format known as CD-DA, which is very similar to the standard WAV format and samples 44,100 times per second.
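To make sampling concrete, here is a minimal Python sketch. It assumes a pure 440 Hz sine wave as a stand-in for the analog sound, and the function name `sample_sine` is made up for this illustration. It also shows where the CD bit rate comes from: 44,100 samples per second, 16 bits per sample, two stereo channels.

```python
import math

SAMPLE_RATE = 44_100    # CD audio: snapshots per second, per channel
BITS_PER_SAMPLE = 16    # CD audio: size of each snapshot
CHANNELS = 2            # stereo

def sample_sine(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return 16-bit integer samples of a sine wave (hypothetical helper)."""
    count = round(duration_s * sample_rate)
    peak = 2 ** (BITS_PER_SAMPLE - 1) - 1   # 32767, the largest 16-bit sample
    return [
        round(peak * math.sin(2 * math.pi * freq_hz * n / sample_rate))
        for n in range(count)
    ]

samples = sample_sine(440, 0.01)   # 10 milliseconds of a 440 Hz tone
cd_bit_rate = SAMPLE_RATE * BITS_PER_SAMPLE * CHANNELS

print(len(samples))    # 441 snapshots in 10 ms
print(cd_bit_rate)     # 1411200 bits per second, about 1411 kbps
```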
MP3 files are based on psychoacoustics, the study of how the human brain perceives sound. This science has determined that not all of the sound we hear is perceived by the brain. To create an MP3 file, an MP3 encoder reads a WAV file and then strips out the parts that you won’t miss hearing.
For example, most people can’t hear sounds above 16 kHz, so the encoder strips out any sounds above a preset threshold level. Loud sounds will mask quieter sounds at or near the same frequency; the encoder removes these, too. By whittling away the parts you don’t hear, the encoder creates a file that sounds almost the same but is dramatically smaller.
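The two rules above can be sketched as a toy filter in Python. This is only an illustration, not how a real MP3 encoder works internally: it assumes the sound has already been broken into a list of (frequency, loudness) pairs, and the function name `prune_inaudible` and its thresholds (`cutoff_hz`, `mask_width_hz`, `mask_margin_db`) are invented for the example.

```python
def prune_inaudible(components, cutoff_hz=16_000,
                    mask_width_hz=200, mask_margin_db=20):
    """Drop components a listener is unlikely to hear (toy model).

    components: list of (frequency_hz, level_db) pairs.
    """
    kept = []
    for freq, level in components:
        # Rule 1: discard anything above the preset frequency threshold.
        if freq > cutoff_hz:
            continue
        # Rule 2: discard a sound if a much louder one sits close by.
        masked = any(
            abs(other_freq - freq) <= mask_width_hz
            and other_level - level >= mask_margin_db
            for other_freq, other_level in components
        )
        if not masked:
            kept.append((freq, level))
    return kept

components = [
    (440, 70),      # loud tone
    (500, 40),      # quiet tone right next to the loud one: masked
    (3_000, 45),    # quiet but isolated: kept
    (18_000, 60),   # above the 16 kHz cutoff: dropped
]
print(prune_inaudible(components))   # [(440, 70), (3000, 45)]
```

Half of the components vanish, yet the two that remain are the ones a listener would actually notice.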
An MP3 file can also contain information about the file itself in a tag. The tag can contain things like the artist’s name, a graphic (usually the CD cover art), the genre, and more.
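As an illustration, the sketch below reads the older and simpler of the two common tag formats, ID3v1, which occupies the last 128 bytes of the file and begins with the letters "TAG". Cover art and richer fields live in the newer ID3v2 format, which this sketch does not handle. The function name `read_id3v1` and the fake file contents are made up for the example.

```python
import struct

def read_id3v1(data):
    """Parse an ID3v1 tag from the end of MP3 file bytes, or return None."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    # Fixed layout: marker, title, artist, album, year, comment, genre index.
    _, title, artist, album, year, comment, genre = struct.unpack(
        "=3s30s30s30s4s30sB", data[-128:]
    )

    def text(raw):
        # Fields are padded with zero bytes (or spaces) to their full width.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "comment": text(comment),
        "genre": genre,   # an index into the fixed ID3v1 genre list
    }

def pad(value, width):
    """Encode a string and pad it with zero bytes (helper for the demo)."""
    return value.encode("latin-1").ljust(width, b"\x00")

# Fake file: some stand-in audio bytes followed by a hand-built tag.
fake_file = (
    b"\xff\xfb" + b"\x00" * 100
    + b"TAG" + pad("Song", 30) + pad("Artist", 30) + pad("Album", 30)
    + b"1999" + pad("", 30) + bytes([17])
)
print(read_id3v1(fake_file))
```

With a real file you would pass in the bytes read from disk, for example `open(path, "rb").read()`.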
“MPEG Layer-3 is a perceptual audio coding scheme that analyses the audio signal and applies a psycho-acoustic model using the properties of the human ear trying to maintain the original sound quality as far as possible.”
So what does this mean in practical terms? Well, there are sounds beyond the range of human hearing (extreme highs and lows) that are part of the digital signal that can easily be stripped away without changing the sound of the audio file. In the same way, when there is a strong signal it overpowers the weaker signals.
In terms of a song, a loud snare drum might overpower the weaker guitar. In the WAV file, all the data is maintained, but when you convert it to MP3 or WMA, these extraneous sounds are stripped away. Hence, the algorithm attempts to duplicate the way the human ear perceives the audio, and tosses everything else out.
The potential severity of this data loss is indicated by the “bit rate”: the number of bits used to store each second of audio. The lower the bit rate, the more data has been thrown away. A song on a CD is recorded at a bit rate of 1411 kbps, while an extremely high quality MP3 is 192 kbps or even 320 kbps. Frequently MP3s are encoded at 128 kbps, which is high enough audio quality that most people wouldn’t notice the difference between that and the original CD format.
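Bit rate translates directly into file size, which a few lines of Python can show. The helper name `megabytes_per_minute` is made up for this example, and it counts a megabyte as one million bytes and ignores tags and other overhead.

```python
def megabytes_per_minute(bit_rate_kbps):
    """Approximate storage needed for one minute of audio at a given bit rate."""
    bits = bit_rate_kbps * 1000 * 60    # kilobits per second -> bits per minute
    return bits / 8 / 1_000_000         # bits -> bytes -> megabytes

for rate in (1411, 320, 192, 128, 64):
    print(f"{rate:>5} kbps -> {megabytes_per_minute(rate):.2f} MB per minute")
```

At 128 kbps this works out to just under 1 MB per minute, and at the CD rate of 1411 kbps to a little over 10.5 MB per minute.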
I prefer to encode my MP3s at 320 kbps, however, because when I’m listening with headphones I can hear the data loss in a 128 kbps file (specifically in the high-end “swish” of cymbals on drums). Different people have different perceptions of the data loss, so it’s very much an individual decision as to which bit rate sounds the “best.” It depends on how sensitive your hearing is.
To conserve space on portable devices, some people will even go down to 64 kbps with their files. At a bit rate this low, data loss becomes extremely evident as both the low and high-end frequencies are lost. Bass response disappears and high-frequency sounds have an artificial tone.
The final spice in the mix is the esoteric-sounding “variable bit rate encoding” (VBR). Constant bit rate (CBR) is the regular method of encoding — a single bit rate throughout the whole file.
VBR is what makes MP3 even more efficient. During the parts of a song where it’s quiet or less sound is going on, the encoder will drop the bit rate as low as it needs to go, often down to 32 kbps or lower. Then in parts where there is a great deal of signal, it will increase the bit rate up to whatever maximum you specify, usually 192 kbps or higher. The benefit of VBR is higher overall sound quality at a smaller file size. If your encoder gives you the option of using VBR, use it!
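Here is a toy Python sketch of the idea. It assumes each short slice of the song has already been given a "complexity" score between 0 (near silence) and 1 (very busy). Both that score and the function name `choose_bit_rate` are invented for this illustration, since real encoders decide using their psychoacoustic model.

```python
# A subset of the bit rates an MP3 frame may use, in kbps.
BIT_RATES_KBPS = [32, 64, 96, 128, 160, 192]

def choose_bit_rate(complexity, min_kbps=32, max_kbps=192):
    """Pick the lowest allowed rate that covers a complexity score in [0, 1]."""
    target = min_kbps + complexity * (max_kbps - min_kbps)
    for rate in BIT_RATES_KBPS:
        if min_kbps <= rate <= max_kbps and rate >= target:
            return rate
    return max_kbps

# Hypothetical complexity scores for six consecutive slices of a song.
frames = [0.0, 0.1, 0.9, 1.0, 0.3, 0.05]
rates = [choose_bit_rate(c) for c in frames]
average = sum(rates) / len(rates)

print(rates)             # [32, 64, 192, 192, 96, 64]
print(round(average))    # 107
```

The busy slices still get the full 192 kbps, but the average comes out far lower than it would with a constant 192 kbps, which is where the space saving comes from.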