Before working with the iPhone, I had sadly little experience with sound formats. I knew the difference between .WAVs and .MP3s, but for the life of me I couldn’t tell you exactly what a .AAC or a .CAF was, or what the best way to convert audio files on the Mac was.
I’ve learned that if you want to develop on the iPhone, it really pays to have a basic understanding of file and data formats, conversion, recording, and which APIs to use when.
This is the first article in a three-part series covering audio topics of interest to the iPhone developer. In this article, we’ll start by covering file and data formats.
File Formats and Data Formats, Oh My!
The thing to understand is that there are actually two pieces to every audio file: its file format (or audio container), and its data format (or audio encoding).
File formats (or audio containers) describe the format of the file itself, while the actual audio data inside can be encoded in many different ways. For example, CAF is a file format that can contain audio encoded in MP3, linear PCM, and many other data formats.
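To make the container/encoding split concrete, here is a minimal Python sketch. It uses the standard library's `wave` module and a WAVE file rather than CAF, simply because Python's standard library has no CAF support. It wraps a few 16-bit linear PCM samples in a WAVE container in memory, then reads the container's header back: the header is where the file format records which data format its payload uses. The helper names `make_wav` and `describe` are just for illustration.

```python
import io
import struct
import wave

def make_wav(samples, sample_rate=44100):
    """Wrap 16-bit mono linear PCM samples in a WAVE container (in memory)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(2)            # 2 bytes = 16-bit samples
        w.setframerate(sample_rate)
        # WAVE stores PCM samples little-endian, hence "<h"
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()

def describe(wav_bytes):
    """Report what the container header says about the encoding inside it."""
    # Bytes 20-21 of a canonical WAVE header hold the format tag (1 = PCM).
    format_tag = struct.unpack_from("<H", wav_bytes, 20)[0]
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        return {
            "container": wav_bytes[8:12].decode("ascii"),
            "format_tag": format_tag,
            "channels": r.getnchannels(),
            "bits_per_sample": r.getsampwidth() * 8,
            "sample_rate": r.getframerate(),
            "frames": r.getnframes(),
        }

info = describe(make_wav([0, 1000, -1000, 32767]))
print(info)
```

The same idea carries over to CAF: the container is just the wrapper, and a few header fields tell the player how to interpret the bytes.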
So let’s dig into each of these more thoroughly.
Data Formats (or Audio Encoding)
We’re going to start with the audio encoding rather than the file format, because the encoding is the most important part.
Here are the data formats supported by the iPhone, and a description of each:
- AAC: AAC stands for “Advanced Audio Coding”, and it was designed to be the successor to MP3. As you would guess, it compresses the original sound in a lossy way, saving disk space at the cost of some quality. Depending on the bit rate you choose (more on this later), however, the loss of quality may not be noticeable. In practice, AAC usually sounds better than MP3 at the same bit rate, especially at bit rates below 128kbit/s.
- HE-AAC: HE-AAC is a superset of AAC, where the HE stands for “high efficiency.” HE-AAC is optimized for low bit rate audio such as streaming audio.
- AMR: AMR stands for “Adaptive Multi-Rate” and is an encoding optimized for speech, featuring very low bit rates.
- ALAC: Also known as “Apple Lossless”, this is an encoding that compresses the audio data without losing any quality. In practice, compressed files typically end up about 40-60% of the size of the original data. The algorithm was designed so that data could be decompressed at high speeds, which is good for devices such as the iPod or iPhone.
- iLBC: This is another encoding optimized for speech, well suited to voice over IP and streaming audio.
- IMA4: This is a compression format that gives you roughly 4:1 compression on 16-bit audio. It is an important encoding for the iPhone, for reasons we will discuss later.
- linear PCM: This stands for linear pulse code modulation, and describes the technique used to convert analog sound data into a digital format. In simple terms, this just means uncompressed data. Since the data is uncompressed, it is the fastest to play and is the preferred encoding for audio on the iPhone when space is not an issue.
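- μ-law and A-law: As I understand it, these are alternative ways of converting analog audio into digital form. Rather than storing samples linearly, they compress each sample logarithmically (typically into 8 bits), which makes them well suited to speech, as in telephony.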
- MP3: And of course the format we all know and love, MP3. MP3 is still a very popular format after all of these years, and is supported by the iPhone.
For more information about these encodings, see Apple’s Using Audio.
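To put the compression figures above in perspective, here is a rough size calculator in Python. It assumes 44.1kHz, 16-bit stereo source audio and Apple's IMA4 packet layout (64 frames per packet, stored in 34 bytes per channel), which is why the real IMA4 ratio comes out a little under 4:1. The ALAC figure simply applies the 40-60% rule of thumb from the list; real ALAC results depend on the material. The function names are hypothetical helpers, not part of any Apple API.

```python
import math

def lpcm_bytes(seconds, sample_rate=44100, channels=2, bits=16):
    """Uncompressed linear PCM: every sample stored at full width."""
    return seconds * sample_rate * channels * bits // 8

def ima4_bytes(seconds, sample_rate=44100, channels=2):
    """Apple IMA4: 64 frames per packet, 34 bytes per packet per channel."""
    packets = math.ceil(seconds * sample_rate / 64)
    return packets * 34 * channels

def alac_range(seconds, **kw):
    """ALAC estimate from the 40-60% rule of thumb (not a real encoder)."""
    raw = lpcm_bytes(seconds, **kw)
    return raw * 40 // 100, raw * 60 // 100

raw = lpcm_bytes(60)
ima = ima4_bytes(60)
print(f"LPCM: {raw:,} bytes")
print(f"IMA4: {ima:,} bytes ({raw / ima:.2f}:1)")
print("ALAC: {:,}-{:,} bytes".format(*alac_range(60)))
```

For one minute of CD-style audio this gives about 10.6 MB uncompressed versus about 2.8 MB as IMA4, a ratio of roughly 3.76:1.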
So which do I use?
That looks like a big list, but only a few of these are the preferred encodings. To decide which to use, first keep the following in mind:
- Linear PCM, IMA4, and a few other formats that are either uncompressed or use simple, fast compression can be played simultaneously with no issues.
- For more advanced compression methods such as AAC, MP3, and ALAC, the iPhone does have hardware support to decompress the data quickly. The problem is that the hardware can only handle one file at a time, so if you play more than one of these files at once, the extra ones are decompressed in software, which is slow.
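The one-file-at-a-time limit is easy to lose track of once several sounds overlap. As a purely illustrative sketch (the `HW_DECODED` set and `plan_playback` helper are hypothetical, not an Apple API), this Python snippet walks a list of sounds playing at the same time and reports which one could use the hardware decoder and which would fall back to software decoding under the rule above.

```python
# Formats that compete for the single hardware decoder, per the rule above
HW_DECODED = {"aac", "mp3", "alac"}

def plan_playback(encodings):
    """Return (hardware, software) index lists for sounds playing at once."""
    hardware, software = [], []
    for i, enc in enumerate(encodings):
        if enc.lower() not in HW_DECODED:
            continue                 # linear PCM, IMA4, etc. are cheap to play
        if not hardware:
            hardware.append(i)       # first one gets the hardware decoder
        else:
            software.append(i)       # decoder already busy: decoded on the CPU
    return hardware, software

print(plan_playback(["aac", "ima4", "ima4", "mp3"]))
```

Here the AAC track gets the hardware decoder, while the MP3 that starts alongside it would be decoded in software.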
So to pick your data format, here are a couple of rules that generally apply:
- If space is not an issue, just encode everything with linear PCM. Not only is this the fastest way for your audio to play, but you can play multiple sounds simultaneously without running into any CPU resource issues.
- If space is an issue, most likely you’ll want to use AAC encoding for your background music and IMA4 encoding for your sound effects.
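The two rules above boil down to a small decision function. This is a hypothetical helper (`choose_encoding` is just an illustrative name) that encodes nothing more than the rules of thumb in this section.

```python
def choose_encoding(role, space_is_an_issue):
    """Rules of thumb: LPCM if space allows, else AAC for music and IMA4 for effects."""
    if not space_is_an_issue:
        return "lpcm"   # fastest to play, and any number can play at once
    if role == "music":
        return "aac"    # a single background track can use the hardware decoder
    return "ima4"       # effects stay cheap to decode and can overlap freely

sounds = {"theme": "music", "explosion": "effect", "coin": "effect"}
print({name: choose_encoding(role, space_is_an_issue=True) for name, role in sounds.items()})
```

Note how the space-constrained plan leaves only one sound (the music) needing the hardware decoder, which is exactly what the previous rules call for.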
The Many Variants of Linear PCM
One final, important note concerns linear PCM, which again is the preferred uncompressed data format for the iPhone. There are several variants of linear PCM, depending on how the data is stored: samples can be stored in big- or little-endian byte order, as floats or integers, and in varying bit widths.
The most important thing to know here is that the preferred variant of linear PCM on the iPhone is little-endian integer 16-bit, or LEI16 for short. Note that this differs from the preferred variant on Mac OS X, which is native-endian 32-bit floating point. Because audio files are often created on the Mac, it’s a good idea to examine them and convert them to the iPhone’s preferred format.
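To see what "little-endian integer 16-bit" means at the byte level, here is a minimal Python sketch that converts native-endian 32-bit float samples (the Mac's preferred variant) into LEI16 bytes, plus a byte swap for big-endian 16-bit data such as the PCM found in AIFF files. It assumes float samples in the range -1.0 to 1.0 and uses plain scaling by 32767 with clipping and no dithering; a real converter may handle scaling and dithering differently.

```python
import struct

def float32_to_lei16(float_bytes):
    """Convert native-endian 32-bit float PCM bytes to little-endian int16 bytes."""
    n = len(float_bytes) // 4
    samples = struct.unpack("=%df" % n, float_bytes[:n * 4])  # '=' = native byte order
    out = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))                  # clip out-of-range samples
        out += struct.pack("<h", round(s * 32767))  # '<h' = little-endian int16
    return bytes(out)

def bei16_to_lei16(data):
    """Byte-swap big-endian int16 PCM (as stored in AIFF) to little-endian."""
    n = len(data) // 2
    return struct.pack("<%dh" % n, *struct.unpack(">%dh" % n, data[:n * 2]))

floats = struct.pack("=3f", 0.0, 1.0, -1.0)
print(float32_to_lei16(floats).hex())   # 0000ff7f0180
```

Each LEI16 sample puts its low byte first, so full-scale positive (0x7FFF) is stored as `ff 7f`.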
File Formats (or Audio Containers)
The iPhone supports many file formats, including MPEG-1 (.mp3), MPEG-2 ADTS (.aac), AIFF, CAF, and WAVE. But the most important thing to know here is that you’ll usually want to use CAF, because it can contain any encoding supported on the iPhone and it is the iPhone’s preferred file format.
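Because a CAF file records its data format up front, you can tell which encoding a .caf holds by reading its first few dozen bytes. Here is a minimal Python sketch based on Apple's published CAF layout: an 8-byte file header (`caff`, version, flags) followed immediately by the Audio Description (`desc`) chunk, with all fields big-endian. The helpers `build_caf_header` and `read_caf_format` are hypothetical; the sketch builds a tiny in-memory header and parses it back, and does not handle the other chunk types a real CAF file contains.

```python
import struct

CAF_HEADER = struct.Struct(">4sHH")    # file type, version, flags
CHUNK_HEADER = struct.Struct(">4sq")   # chunk type, chunk size (SInt64)
DESC = struct.Struct(">dIIIIII")       # Audio Description chunk body (32 bytes)

FLAG_IS_FLOAT = 1 << 0                 # kCAFLinearPCMFormatFlagIsFloat
FLAG_IS_LITTLE_ENDIAN = 1 << 1         # kCAFLinearPCMFormatFlagIsLittleEndian

def build_caf_header(sample_rate, format_id, flags, bytes_per_packet,
                     frames_per_packet, channels, bits):
    """File header plus a 'desc' chunk describing the audio encoding."""
    return (CAF_HEADER.pack(b"caff", 1, 0)
            + CHUNK_HEADER.pack(b"desc", DESC.size)
            + DESC.pack(sample_rate, int.from_bytes(format_id, "big"), flags,
                        bytes_per_packet, frames_per_packet, channels, bits))

def read_caf_format(data):
    """Read the encoding out of the 'desc' chunk that must follow the file header."""
    file_type, _version, _flags = CAF_HEADER.unpack_from(data, 0)
    if file_type != b"caff":
        raise ValueError("not a CAF file")
    chunk_type, _size = CHUNK_HEADER.unpack_from(data, CAF_HEADER.size)
    if chunk_type != b"desc":
        raise ValueError("desc chunk must come first")
    rate, fmt, flags, _bpp, _fpp, channels, bits = DESC.unpack_from(
        data, CAF_HEADER.size + CHUNK_HEADER.size)
    fmt = fmt.to_bytes(4, "big").decode("ascii")
    info = {"format": fmt, "sample_rate": rate, "channels": channels, "bits": bits}
    if fmt == "lpcm":                  # flags only describe linear PCM variants
        info["float"] = bool(flags & FLAG_IS_FLOAT)
        info["little_endian"] = bool(flags & FLAG_IS_LITTLE_ENDIAN)
    return info

# Stereo LEI16 at 44.1kHz: 4 bytes per packet, 1 frame per packet
lei16 = build_caf_header(44100.0, b"lpcm", FLAG_IS_LITTLE_ENDIAN, 4, 1, 2, 16)
print(read_caf_format(lei16))
```

The same reader reports `ima4`, `aac `, and so on for compressed files, which is a quick way to check whether an audio file is already in the format you want.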
There’s an important piece of terminology related to audio encoding that we need to mention next: bit rates.
The bit rate of an audio file is the number of bits processed per unit of time, usually expressed in bits per second (bit/s) or kilobits per second (kbit/s). Higher bit rates produce larger files. Some encodings, such as AAC or MP3, let you specify the bit rate to use when compressing the audio file; the lower the bit rate, the more quality you lose. Unlike some other computer-related units, 1 kbit/s is exactly 1000 bit/s, not 1024 bit/s.
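Since 1 kbit/s is 1000 bit/s, estimating a compressed file's size from its bit rate is simple arithmetic: bits per second times duration, divided by 8 to get bytes. A small Python sketch (`size_from_bit_rate` is an illustrative name; it ignores container headers and metadata, which add a little on top):

```python
def size_from_bit_rate(kbit_per_s, seconds):
    """Approximate audio payload size in bytes (1 kbit/s = 1000 bit/s)."""
    return kbit_per_s * 1000 * seconds // 8

# A three-minute song at a few common rates (MB here means 1,000,000 bytes)
for rate in (64, 128, 320):
    print(f"{rate} kbit/s -> {size_from_bit_rate(rate, 180) / 1_000_000:.2f} MB")
```

At 128kbit/s, for example, three minutes of audio comes to about 2.88 MB, and halving the bit rate halves the file.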
You should choose a bit rate based on your particular sound file: try it at different bit rates and see where you get the best balance between file size and quality. If your file is mostly speech, you can probably get away with a lower bit rate.
Here’s an overview of the most common bit rates:
- 32kbit/s: AM Radio quality
- 48kbit/s: Common rate for long speech podcasts
- 64kbit/s: Common rate for normal-length speech podcasts
- 96kbit/s: FM Radio quality
- 128kbit/s: Most common bit rate for MP3 music
- 160kbit/s: Musicians or sensitive listeners prefer this to 128kbit/s
- 192kbit/s: Digital radio broadcasting quality
- 320kbit/s: Virtually indistinguishable from CDs
- 500kbit/s-1,411kbit/s: Lossless audio such as ALAC or uncompressed linear PCM
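The 1,411kbit/s figure at the end of the list is not arbitrary: the bit rate of uncompressed linear PCM is just sample rate times bits per sample times channels. A quick Python check, assuming CD-style 44.1kHz, 16-bit stereo, plus a lower-fidelity 22.05kHz mono setting for comparison (`pcm_bit_rate` is an illustrative helper):

```python
def pcm_bit_rate(sample_rate, bits_per_sample, channels):
    """Bit rate of uncompressed linear PCM, in bit/s."""
    return sample_rate * bits_per_sample * channels

cd = pcm_bit_rate(44100, 16, 2)
voice = pcm_bit_rate(22050, 16, 1)
print(cd / 1000, "kbit/s")      # 1411.2 kbit/s
print(voice / 1000, "kbit/s")   # 352.8 kbit/s
```

Even the modest mono setting is well above every compressed rate in the list, which is why linear PCM is only the right choice when space is not an issue.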