Yeah, I actually need to fix that so it takes standard wav files.
As it is now, you have to save it as a raw (headerless) audio file: 16 kHz, 8-bit, mono, unsigned.
A bit of an explainer about each of those settings:
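One shortcut in the meantime: 8-bit PCM inside a WAV file is already stored as unsigned bytes (128 = silence). So a WAV saved as 16 kHz, 8-bit, mono is just that raw data plus a header. Here's a minimal sketch using Python's standard `wave` module that strips the header. The function name `wav_to_raw` is hypothetical and isn't part of the tool.

```python
import wave


def wav_to_raw(wav_path, raw_path):
    """Copy the sample data of a 16 kHz, 8-bit, mono PCM WAV into a headerless raw file.

    8-bit PCM in a WAV file is already unsigned (128 = silence), so the frames
    can be written out unchanged. Returns the number of bytes written.
    """
    with wave.open(wav_path, "rb") as w:
        fmt = (w.getframerate(), w.getsampwidth(), w.getnchannels())
        if fmt != (16000, 1, 1):
            raise ValueError(f"expected 16 kHz, 8-bit, mono; got {fmt}")
        frames = w.readframes(w.getnframes())
    with open(raw_path, "wb") as out:
        out.write(frames)
    return len(frames)
```

Usage would be something like `wav_to_raw("voice.wav", "voice.raw")` (file names made up). Anything that isn't already in that format raises an error instead of being silently converted.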
16 kHz is the sample rate of the audio. It doesn't matter much, except that it controls the size of the file, and therefore the size of the voiceprint graphic that's integrated into the ID. You can resample the recording in most audio programs, or record at that rate in the first place.
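For the size side of this: at 16 kHz, 8-bit, mono, every second of audio is 16,000 bytes. If you'd rather resample in a script than in an audio program, here's a rough sketch using plain linear interpolation. The function names are hypothetical. A proper resampler also low-pass filters before downsampling, and this one doesn't, so it can alias. Your audio program will do a cleaner job.

```python
def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample integer PCM samples by linear interpolation (no anti-alias filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        j = int(pos)
        if j >= last:
            out.append(samples[last])  # past the end: hold the final sample
        else:
            frac = pos - j
            out.append(round(samples[j] + (samples[j + 1] - samples[j]) * frac))
    return out


def raw_size_bytes(seconds, rate=16000, bytes_per_sample=1, channels=1):
    """Size of a headerless recording: seconds x rate x bytes per sample x channels."""
    return int(seconds * rate * bytes_per_sample * channels)
```

For comparison, `raw_size_bytes(3)` is 48,000 bytes, while three seconds of 44.1 kHz, 16-bit stereo, `raw_size_bytes(3, 44100, 2, 2)`, is 529,200 bytes.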
As for 8-bit vs. 16-bit, that's the number of bits per sample. Image data is almost always 8 bits per channel (24-bit color is 8 bits each for the red, green, and blue channels). So each sample needs to be 8 bits in order to map onto one pixel of a greyscale image.
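Most WAV files are 16-bit, meaning each sample is two bytes of little-endian signed data. A minimal sketch of getting down to 8 bits is to keep only the high byte of each sample, which maps -32768..32767 onto -128..127. The function name is hypothetical. The result is still signed; the unsigned offset is a separate step.

```python
import struct


def pcm16_to_8bit(data):
    """Convert little-endian signed 16-bit PCM bytes to signed 8-bit values.

    Keeping only the high byte maps -32768..32767 onto -128..127, so each
    sample fits in one byte, i.e. one greyscale pixel.
    """
    # Python's >> on negative ints is an arithmetic (floor) shift, which keeps the sign.
    return [s >> 8 for (s,) in struct.iter_unpack("<h", data)]
```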
It needs to be mono, again, for simplicity and the smallest possible file size. The voiceprint block will automatically be trimmed to fit the space provided in the ID template.
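Stereo WAV data is interleaved (left, right, left, right...), so going to mono means averaging each frame. The trimming step can be pictured as laying the bytes out in rows of the block's width and cutting off whatever doesn't fit. This is only a sketch. `width` and `height` are hypothetical stand-ins for the template's actual block size, and dropping an incomplete last row is my assumption, not necessarily what the tool does.

```python
def mix_to_mono(interleaved, channels):
    """Average interleaved multi-channel integer samples down to one channel."""
    frames = len(interleaved) // channels
    return [
        sum(interleaved[f * channels:(f + 1) * channels]) // channels
        for f in range(frames)
    ]


def fit_to_block(samples, width, height):
    """Lay samples out as greyscale rows of `width` pixels, dropping anything
    past width * height (and any incomplete final row)."""
    kept = list(samples[: width * height])
    return [kept[r * width:(r + 1) * width] for r in range(len(kept) // width)]
```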
As for unsigned format, there are usually three options when saving raw files: signed, unsigned, and with a sign bit. Save it as unsigned so that the greyscale image directly reflects the amplitude of the signal: 0 (black) is the maximum negative amplitude, 128 (mid grey) is zero amplitude, and 255 (white) is the maximum positive amplitude. Suffice it to say, the unsigned format makes a more pleasant and easier-to-interpret voiceprint image.
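If you end up with signed data anyway, converting it to unsigned is just an offset of 128. For ordinary two's-complement bytes, flipping the top bit does exactly that. For the "sign bit" option, I'm assuming it means sign-magnitude (top bit is the sign, low seven bits are the size). That's a guess about what your audio program means by it. Both function names are hypothetical.

```python
def signed_to_unsigned(data):
    """Two's-complement signed 8-bit bytes -> unsigned bytes (value + 128).

    Flipping the top bit is the same as adding 128 modulo 256:
    0x80 (-128) -> 0 black, 0x00 (0) -> 128 grey, 0x7F (127) -> 255 white.
    """
    return bytes(b ^ 0x80 for b in data)


def sign_bit_to_unsigned(data):
    """Sign-bit (sign-magnitude) 8-bit bytes -> unsigned bytes.

    The top bit is the sign and the low 7 bits are the magnitude, so the range
    is -127..127 and both 0x00 and 0x80 mean zero.
    """
    out = bytearray()
    for b in data:
        magnitude = b & 0x7F
        value = -magnitude if b & 0x80 else magnitude
        out.append(value + 128)
    return bytes(out)
```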
This is probably as clear as mud, but I hope it helps a little.