
WAV files are probably the simplest of the common formats for
storing audio samples. Unlike MPEG and other compressed formats,
WAVs store samples "in the raw", where no pre-processing is
required other than formatting of the data.
The following information was derived from several sources,
including some on the internet which no longer exist. Because WAV
is somewhat of a proprietary Microsoft format, some elements here
were determined empirically, and so some details may remain
somewhat sketchy. From what I've heard, the best source for
information is the File Formats Handbook by Gunter Born
(1995, ITP Boston).
The WAV file itself consists of three "chunks" of information:
the RIFF chunk, which identifies the file as a WAV file; the
FORMAT chunk, which identifies parameters such as sample rate; and
the DATA chunk, which contains the actual data (samples).
Each chunk breaks down as follows:

RIFF Chunk (12 bytes in length total)

| Byte Number | Contents |
|---|---|
| 0 - 3 | "RIFF" (ASCII characters) |
| 4 - 7 | Total length of package to follow (binary, little endian) |
| 8 - 11 | "WAVE" (ASCII characters) |

FORMAT Chunk (24 bytes in length total)

| Byte Number | Contents |
|---|---|
| 0 - 3 | "fmt " (ASCII characters; the fourth character is a space) |
| 4 - 7 | Length of the FORMAT data to follow (binary, 0x10 in the basic format) |
| 8 - 9 | Always 0x01 (format tag; 1 = uncompressed PCM) |
| 10 - 11 | Number of channels (0x01 = mono, 0x02 = stereo) |
| 12 - 15 | Sample rate (binary, in Hz) |
| 16 - 19 | Bytes per second |
| 20 - 21 | Bytes per sample: 1 = 8-bit mono, 2 = 8-bit stereo or 16-bit mono, 4 = 16-bit stereo |
| 22 - 23 | Bits per sample |

DATA Chunk

| Byte Number | Contents |
|---|---|
| 0 - 3 | "data" (ASCII characters) |
| 4 - 7 | Length of data to follow |
| 8 - end | Data (samples) |

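The three tables above can be read in one pass with a fixed layout. The following is a minimal sketch, assuming Python and its standard `struct` module, and assuming the file uses exactly the basic 44-byte layout with no extra chunks. The helper name `parse_canonical_header` and the dictionary keys are made up for illustration.

```python
import struct

# One field per table row: RIFF chunk (12 bytes), FORMAT chunk (24 bytes),
# then the first 8 bytes of the DATA chunk. "<" means little endian, no padding.
HEADER = struct.Struct("<4sI4s4sIHHIIHH4sI")  # 44 bytes in total


def parse_canonical_header(raw):
    """Decode the first 44 bytes of a basic PCM WAV file (illustrative helper)."""
    (riff, riff_len, wave, fmt, fmt_len, fmt_tag, channels, sample_rate,
     bytes_per_sec, bytes_per_sample, bits, data, data_len) = HEADER.unpack(raw[:44])
    if (riff, wave, fmt, data) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a basic 44-byte WAV header")
    return {
        "riff_length": riff_len,
        "format_length": fmt_len,
        "format_tag": fmt_tag,
        "channels": channels,
        "sample_rate": sample_rate,
        "bytes_per_second": bytes_per_sec,
        "bytes_per_sample": bytes_per_sample,
        "bits_per_sample": bits,
        "data_length": data_len,
    }


# Example input: an 8-bit mono 22,050 Hz header typed in by hand.
example = bytes.fromhex(
    "52 49 46 46 46 2D 00 00 57 41 56 45 66 6D 74 20"
    "10 00 00 00 01 00 01 00 22 56 00 00 22 56 00 00"
    "01 00 08 00 64 61 74 61 22 2D 00 00"
)
info = parse_canonical_header(example)
print(info["sample_rate"], info["data_length"])  # prints: 22050 11554
```

A fixed layout like this rejects files that carry additional chunks, which is a deliberate simplification for the basic format only.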
The easiest approach to this file format might be to look at
an actual WAV file to see how data is stored. In this case, we
examine DING.WAV, which is standard with all Windows packages.
DING.WAV is an 8-bit, mono, 22.050 kHz WAV file of 11,598 bytes
in length. Let's begin by looking at the header of the file
(using DEBUG).
    246E:0100 52 49 46 46 46 2D 00 00-57 41 56 45 66 6D 74 20 RIFFF-..WAVEfmt 
    246E:0110 10 00 00 00 01 00 01 00-22 56 00 00 22 56 00 00 ........"V.."V..
    246E:0120 01 00 08 00 64 61 74 61-22 2D 00 00 80 80 80 80 ....data"-......
    246E:0130 80 80 80 80 80 80 80 80-80 80 80 80 80 80 80 80 ................
    246E:0140 80 80 80 80 80 80 80 80-80 80 80 80 80 80 80 80 ................
As expected, the file begins with the ASCII characters
"RIFF", which (together with the "WAVE" tag that follows the
length) identify it as a WAV file. The next four bytes, read in
little-endian order, tell us the length is 0x2D46 bytes (11,590
bytes in decimal), which is the length of the entire file minus
the 8 bytes for the "RIFF" and length fields
(11,598 - 11,590 = 8 bytes).
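The little-endian reading of the length field can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are arbitrary, and the four bytes are copied from offsets 4 - 7 of the dump.

```python
# Bytes 4 - 7 of the dump, in the order they appear in the file.
length_field = bytes([0x46, 0x2D, 0x00, 0x00])

# Little endian: the least significant byte comes first.
riff_length = int.from_bytes(length_field, "little")

# The field does not count the 8 bytes of "RIFF" plus the length itself.
file_size = riff_length + 8

print(hex(riff_length), riff_length, file_size)  # prints: 0x2d46 11590 11598
```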
The ASCII characters for "WAVE" and "fmt " follow. Next
(line 2 above) we find the value 0x00000010 in the first 4
bytes (length of format chunk: always constant at 0x10). The
next four bytes hold two 16-bit values: 0x0001 (always) and
0x0001 (a mono WAV, one channel used).
Since this is an 8-bit mono WAV, the sample rate and the
bytes/second are the same at 0x00005622, or 22,050 in decimal.
For a 16-bit stereo WAV the bytes/sec would be 4 times the
sample rate. The next 2 bytes show the number of bytes per
sample to be 0x0001 (8-bit mono), and the 2 bytes after that show
the number of bits per sample to be 0x0008.
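The relationship between these fields can be written out as a small calculation. The sketch below is an assumption-laden illustration: the function name `derived_fields` is invented here, and it assumes the bits per sample is a multiple of 8, as in the 8-bit and 16-bit cases discussed.

```python
def derived_fields(sample_rate, channels, bits_per_sample):
    """Return (bytes per sample, bytes per second) for uncompressed PCM.

    Assumes bits_per_sample is a multiple of 8 (8 or 16 in the text).
    """
    # One sample "frame" holds one value for every channel.
    bytes_per_sample = channels * bits_per_sample // 8
    bytes_per_second = sample_rate * bytes_per_sample
    return bytes_per_sample, bytes_per_second


print(derived_fields(22050, 1, 8))   # 8-bit mono: prints (1, 22050)
print(derived_fields(44100, 2, 16))  # 16-bit stereo: prints (4, 176400)
```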
Finally, the ASCII characters for "data" appear, followed by
0x00002D22 (11,554 decimal), which is the number of bytes of
data to follow (actual samples). In an 8-bit file each sample is
a value from 0x00 to 0xFF. In the example above, 0x80 represents
"0" or silence on the output, since the DAC used to play back
samples is a bipolar device (i.e. a value of 0x00 would output a
negative voltage and a value of 0xFF would output a positive
voltage at the output of the DAC on the sound card).
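To make the "0x80 is silence" convention concrete, the 8-bit sample bytes can be shifted so that silence becomes zero. This is a small sketch in Python; the helper name `center_8bit` is hypothetical and it applies only to 8-bit data as described above.

```python
def center_8bit(samples):
    """Map 8-bit sample bytes (0x00..0xFF) to signed values around zero."""
    # 0x80 is the midpoint, so subtracting it puts silence at 0.
    return [b - 0x80 for b in samples]


print(center_8bit(bytes([0x00, 0x80, 0xFF])))  # prints: [-128, 0, 127]
```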
Note that there are extensions to the basic WAV format which
may be supported in newer systems -- for example, if you look at
DING.WAV in C:\Windows\Media you'll see some extra bytes
added after the format chunk before the "data" area -- but the
basic format remains the same.
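One way to cope with such extra bytes is to step from chunk to chunk using each chunk's length field instead of assuming "data" starts at a fixed offset. The sketch below is a hedged illustration, not a complete reader: `find_chunk` is an invented name, and it assumes the usual RIFF rule that a chunk with an odd length is followed by one padding byte.

```python
import struct


def find_chunk(raw, wanted):
    """Return (payload offset, payload length) of the first chunk named
    `wanted`, or None if it is not present (illustrative helper)."""
    if raw[0:4] != b"RIFF" or raw[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # first chunk starts right after "WAVE"
    while pos + 8 <= len(raw):
        chunk_id, length = struct.unpack_from("<4sI", raw, pos)
        if chunk_id == wanted:
            return pos + 8, length
        # Skip the 8-byte chunk header, the payload, and a pad byte if odd.
        pos += 8 + length + (length & 1)
    return None


# Hand-built test file: a "fmt " chunk, an unknown 3-byte chunk, then "data".
# The chunk name "xtra" and all payload bytes are made up for the example.
demo = (b"RIFF" + struct.pack("<I", 0) + b"WAVE"
        + b"fmt " + struct.pack("<I", 16) + bytes(16)
        + b"xtra" + struct.pack("<I", 3) + b"abc" + b"\x00"
        + b"data" + struct.pack("<I", 4) + b"\x80" * 4)

print(find_chunk(demo, b"data"))  # prints: (56, 4)
```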
As a final example, consider the header of the following WAV
file, recorded at 44,100 samples per second in 16-bit
stereo.
    246E:0100 52 49 46 46 2C 48 00 00-57 41 56 45 66 6D 74 20 RIFF,H..WAVEfmt 
    246E:0110 10 00 00 00 01 00 02 00-44 AC 00 00 10 B1 02 00 ........D.......
    246E:0120 04 00 10 00 64 61 74 61-00 48 00 00 00 00 00 00 ....data.H......
    246E:0130 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
Again we find all the expected structures. Note that the
sample rate is 0xAC44 (44,100 in decimal when read as an unsigned
integer) and the bytes/second is 4 times that figure (0x0002B110,
or 176,400) since this is a 16-bit WAV (* 2) and is stereo
(again * 2). The Channel Numbers field is also found to be 0x02
here, the bytes per sample is 0x04, and the bits per sample is
0x10 (16 decimal).
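As a rough check on how the fields fit together, the values in this header are enough to estimate how much audio the file holds. The short Python sketch below uses only numbers read from the dump (data length 0x4800, bytes/second 0x0002B110, bytes per sample 0x0004); the variable names are arbitrary.

```python
data_length = 0x00004800       # bytes of sample data, from the "data" chunk
bytes_per_second = 0x0002B110  # 176,400
bytes_per_sample = 0x0004      # one 16-bit value for each of two channels

# Number of stereo sample pairs stored in the data area.
frames = data_length // bytes_per_sample

# Playing time is simply the data length divided by the byte rate.
seconds = data_length / bytes_per_second

print(frames, round(seconds, 4))  # prints: 4608 0.1045
```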