Let's first tackle the audio quality issue. Every audio file played over the phone (PSTN or VoIP) must be encoded as 8 kHz, 8-bit u-law audio. For our IVR servers to play a high-quality mp3 file, we first have to download the file from your server, convert it to a WAV file, downsample it to 8 kHz, and then convert it to 8-bit u-law audio. Because what actually plays over the phone is an 8 kHz, 8-bit u-law file, any difference you hear between a u-law file and an mp3 file when calling into our IVR platform comes from artifacts that your audio conversion process introduces and ours does not. Downsampling is most often the biggest source of audio quality problems. The best way to avoid it is to record all audio directly at 8 kHz. If you do have existing audio files that need to be downsampled, the SoX utility performs high-quality conversions.
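To make the last step of that pipeline concrete, here is a minimal sketch of standard G.711 u-law companding, which maps each signed 16-bit PCM sample to one 8-bit byte. It assumes the audio is already mono and resampled to 8 kHz; it is an illustration of the encoding, not our platform's converter. In practice SoX does the whole job in one command (for example `sox input.wav -r 8000 -c 1 -e u-law output.wav`), including the resampling step that is most likely to add artifacts.

```python
"""Sketch: 16-bit linear PCM -> 8-bit G.711 u-law (the final conversion step).

Assumes samples are already mono and at 8 kHz; resampling is not done here.
"""

BIAS = 0x84   # 132, added so every magnitude has a set bit at position 7 or above
CLIP = 32635  # largest magnitude that still fits in 15 bits once BIAS is added


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as an 8-bit u-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment (exponent) is how far the highest set bit sits above bit 7: 0..7.
    exponent = magnitude.bit_length() - 8
    # The four bits just below the highest set bit form the mantissa.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 transmits the bitwise complement of sign|exponent|mantissa.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def encode_ulaw(samples) -> bytes:
    """Encode a sequence of 16-bit PCM samples; one output byte per sample."""
    return bytes(linear_to_ulaw(s) for s in samples)


if __name__ == "__main__":
    # Silence, -1, full-scale positive, full-scale negative.
    print(encode_ulaw([0, -1, 32767, -32768]).hex())  # ff7f8000
```

Since each sample becomes exactly one byte, one second of 8 kHz u-law audio is 8000 bytes (64 kbit/s), regardless of content.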
Because of this fetch/convert/downsample/convert process, using mp3 files creates the worst-case scenario for audio latency. Encoding your audio as 8 kHz, 8-bit u-law gives you the lowest possible latency, with small file sizes and no conversion step. If low latency is your primary concern, do not use mp3 files.
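The conversion steps are skipped only when a file already arrives in the native format. The sketch below checks a WAV file's header for that, assuming files are stored as WAV containers and that the target is mono (the reply above does not state the channel count; mono is an assumption here). Format code 7 is the standard WAV tag for u-law. The helper names `is_native_ulaw` and `make_wav_header` are hypothetical, written for this illustration.

```python
import struct

WAVE_FORMAT_MULAW = 0x0007  # WAV "fmt " format tag for G.711 u-law


def is_native_ulaw(data: bytes) -> bool:
    """Return True if `data` is a WAV file holding 8 kHz, 8-bit, mono u-law audio."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        return False
    offset = 12
    # Walk the RIFF chunks until the "fmt " chunk is found.
    while offset + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, offset)
        body = offset + 8
        if chunk_id == b"fmt ":
            if chunk_size < 16 or body + 16 > len(data):
                return False
            fmt_tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, body)
            return (fmt_tag == WAVE_FORMAT_MULAW and channels == 1
                    and rate == 8000 and bits == 8)
        # RIFF chunks are padded to an even number of bytes.
        offset = body + chunk_size + (chunk_size & 1)
    return False


def make_wav_header(fmt_tag: int, channels: int, rate: int, bits: int) -> bytes:
    """Build a minimal WAV header (empty data chunk) for testing the check."""
    block_align = channels * bits // 8
    fmt = struct.pack("<HHIIHH", fmt_tag, channels, rate,
                      rate * block_align, block_align, bits)
    body = (b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
            + b"data" + struct.pack("<I", 0))
    return b"RIFF" + struct.pack("<I", len(body)) + body


if __name__ == "__main__":
    print(is_native_ulaw(make_wav_header(WAVE_FORMAT_MULAW, 1, 8000, 8)))  # True
    print(is_native_ulaw(make_wav_header(1, 2, 44100, 16)))  # False: 44.1 kHz PCM
```

Running a check like this over your audio directory before deployment is one way to catch files that would otherwise force a conversion on every uncached fetch.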
Regarding caching, what you stated is not entirely accurate. The caching system is designed to minimize fetch latency for frequently used files, and each server has a substantial amount of drive space so that even infrequently used files are evicted as rarely as possible. However, there is no guarantee that all of your audio files will be in every one of our server caches at all times. You are on a multi-server shared system that also caches other customers' files. If you call into a server and the next audio file to be played is either not in that server's cache or has expired from it, the server will fetch and cache your audio file at that point.
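To illustrate why a shared cache cannot guarantee that your files stay resident, here is a toy model. It assumes a least-recently-used (LRU) eviction policy and counts capacity in files; both are simplifications, since the reply above does not describe the platform's actual eviction policy. The class `SharedAudioCache` and its `fetch` callback are hypothetical stand-ins for the server cache and the HTTP fetch.

```python
from collections import OrderedDict
from typing import Callable


class SharedAudioCache:
    """Toy LRU cache shared by several customers (hypothetical model).

    Capacity is counted in files for simplicity; a real cache budgets disk space.
    """

    def __init__(self, capacity: int, fetch: Callable[[str], bytes]):
        self.capacity = capacity
        self.fetch = fetch  # called on a miss; stands in for the remote fetch
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self.entries:
            self.entries.move_to_end(url)  # hit: mark as most recently used
            return self.entries[url]
        self.misses += 1
        data = self.fetch(url)  # miss: the caller pays the fetch latency
        self.entries[url] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used file
        return data


if __name__ == "__main__":
    cache = SharedAudioCache(capacity=2, fetch=lambda url: url.encode())
    cache.get("https://you.example/greeting.wav")  # miss: fetched and cached
    cache.get("https://other-a.example/menu.wav")  # another customer's file
    cache.get("https://other-b.example/hold.wav")  # evicts your greeting
    cache.get("https://you.example/greeting.wav")  # miss again
    print(cache.misses)  # 4
```

Other customers' traffic pushed the greeting out, so the second call paid the fetch cost again even though nothing about the file changed.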
The best way to minimize latency is to use the audio format described above and to set the IVR global property "audiomaxage" to the highest possible value (currently 604800 seconds, i.e., one week) in all of your documents. The native format minimizes latency for audio files that are not yet cached, and a high audiomaxage tells the cache to refetch only files that are more than a week old.
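The effect of audiomaxage can be estimated with a small simulation. This sketch assumes the file stays in the cache the whole time (no eviction) and that a cached copy is refetched only once it is strictly older than the max-age; the real cache may behave differently, for example by revalidating rather than refetching. `count_fetches` is a hypothetical helper, not a platform API.

```python
AUDIOMAXAGE = 7 * 24 * 60 * 60  # 604800 seconds = one week


def count_fetches(play_times, maxage: int) -> int:
    """Count fetches for one file played at `play_times` (seconds, ascending).

    Assumes the file is never evicted; it is refetched only when the cached
    copy is more than `maxage` seconds old.
    """
    fetches = 0
    cached_at = None
    for t in play_times:
        if cached_at is None or t - cached_at > maxage:
            fetches += 1  # cache miss or expired copy: fetch it again
            cached_at = t
    return fetches


if __name__ == "__main__":
    hourly_for_a_week = range(0, AUDIOMAXAGE, 3600)  # 168 plays
    print(count_fetches(hourly_for_a_week, maxage=0))            # 168
    print(count_fetches(hourly_for_a_week, maxage=AUDIOMAXAGE))  # 1
```

Under these assumptions, a file played every hour is fetched on every play with a max-age of zero, but only once per week at the maximum value.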