
How do I convert audio files to a different encoding?

Questions and answers about IVR programming for Plum DEV

prairieblue
Posts: 10
Joined: Sat Nov 22, 2003 1:15 am

How do I convert audio files to a different encoding?

Post by prairieblue »

I would like to upload voice files from VoiceXML to my server. I am getting close, but need some help.
I use:

Code: Select all

<record name="message" beep="true" maxtime="10s" type="audio/x-wav" finalsilence="4000ms" dtmfterm="true">
to record.
I use:
<submit next="http://www.somedomain.com/vxml/base/RcdTest.php"
  method="post" 
  namelist="message fromname"
  enctype="multipart/form-data"/>
to submit.

I use this php to read the file and save it.

Code: Select all

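// $_FILES['message'] holds the recording posted by the <submit> above.
$file = $_FILES['message'];
if ($file['size'] > 0)
{
  // Read the uploaded recording from PHP's temporary file.
  $bytelen = $file['size'];
  $fp = fopen($file['tmp_name'], 'rb') or die("can't open upload");
  $data = fread($fp, $bytelen);
  fclose($fp);

  // Write in binary mode ('wb') so Windows does not translate line endings.
  $fh = fopen('../temp/wavtest.wav', 'wb') or die("can't open file");
  // fwrite() returns false on failure, never -1.
  if (fwrite($fh, $data) === false) { die("can't write data"); }
  fclose($fh) or die("can't close file");

  $from = "File loaded.";
}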
I use this file successfully again with:

Code: Select all

<prompt>
  <audio src="http://www.somedomain.com/vxml/temp/wavtest.wav">
    I hope I do not hear this.
  </audio>
</prompt>
Now comes the hard part. I cannot play this file using WinAmp, and I cannot manipulate it as I need to (change formats, etc.). I am using Apache, PHP, Windows NT, and C++ language routines.

I could not record and play back per the above without setting type="audio/x-wav".

The code that I want to interface with requires WAVE file format headers. I noticed that the AudioFormat set with “audio/x-wav” is 7, and not 1 as my existing code is looking for. Do you know what AudioFormat = 7 means? This could be a useful clue.

In my research, I used http://support.plumgroup.com/plumdocs/v ... audio.html, which states “See section 4.14.1 for supported audio formats.” Do you know where this section is? I couldn’t find it.

prairieblue
Posts: 10
Joined: Sat Nov 22, 2003 1:15 am

sox

Post by prairieblue »

I found the reference to sox in another topic, and have given it a try. I think this will meet my needs. I would still enjoy your insight into strategies for uploading and making files available on the server. My sox solution is implemented via one more program interface and one more call, rather than out of PHP or VoiceXML. This seems unnecessarily complicated.
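For readers who would rather call sox straight from the PHP upload script, here is a minimal sketch. It assumes a sox binary is installed and on the PATH, and that the installed version accepts the -e, -b and -r format options (older releases used different flags). The function names and file paths are made up for illustration.

```php
<?php
// Sketch: build and run a sox command from PHP to convert a recording
// to 16-bit linear PCM. Assumes sox is installed and on the PATH.
// build_sox_command() and convert_with_sox() are hypothetical helper names.

function build_sox_command(string $in, string $out, int $rate = 8000): string
{
    // Options placed before the output file describe the output format.
    return sprintf(
        'sox %s -e signed-integer -b 16 -r %d %s',
        escapeshellarg($in),   // quote paths so odd characters are safe
        $rate,
        escapeshellarg($out)
    );
}

function convert_with_sox(string $in, string $out, int $rate = 8000): bool
{
    // exec() fills $status with the exit code; 0 means sox reported success.
    exec(build_sox_command($in, $out, $rate) . ' 2>&1', $output, $status);
    return $status === 0;
}
```

Something like convert_with_sox('../temp/wavtest.wav', '../temp/wavtest_pcm.wav', 22050) could then run right after the upload is saved, which avoids the extra program interface.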

support
Posts: 3632
Joined: Mon Jun 02, 2003 3:47 pm
Location: Boston, MA

Re: IVR developers Insight for converting audio formats

Post by support »

sox is the tool most of our in-house IVR developers turn to when converting to and from various audio formats for use in other contexts.


While this solution makes support for different audio formats slightly more cumbersome, it also provides a greater degree of flexibility since sox is a powerful tool for audio processing, and can be run via command line with support for a variety of features.

This is not to say there is a lack of desire to support built-in audio types; there has simply been a lack of good reason to pursue more comprehensive built-in audio support. This is primarily because there are a number of different wav file codecs in use, and deciding which formats to support would be difficult, since adding support for all of them is not a realistic goal.

Hope this helps.

Plum Support Staff
Last edited by support on Wed Jan 06, 2010 3:25 pm, edited 1 time in total.

prairieblue
Posts: 10
Joined: Sat Nov 22, 2003 1:15 am

Post by prairieblue »

Here is some additional info on voice coding, gained by detective work mostly.

Microsoft SAPI TTS nominally generates a wav file with 16 bit linear coding at a sample rate of 22050 Hz. If you look inside the RIFF header for a wav file, the linear encoding is identified by audio format = 1.
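To look inside the header from PHP, something like the sketch below should work. It walks the RIFF chunk list, finds the "fmt " chunk and reports the audio format code. The function and constant names are my own, and the name table only covers the three codes relevant here (1 = linear PCM, 6 = A-law, 7 = mu-law in the standard WAV format registry).

```php
<?php
// Sketch: report the audio format code stored in a WAV file's "fmt " chunk.
// wav_format_info() and WAV_FORMAT_NAMES are illustrative names only.

const WAV_FORMAT_NAMES = [
    1 => 'linear PCM',
    6 => 'A-law',
    7 => 'mu-law',
];

function wav_format_info(string $bytes): ?array
{
    $len = strlen($bytes);
    // A RIFF/WAVE file starts with "RIFF", a 4-byte size, then "WAVE".
    if ($len < 12 || substr($bytes, 0, 4) !== 'RIFF' || substr($bytes, 8, 4) !== 'WAVE') {
        return null;
    }
    $pos = 12;
    // Walk the chunk list until the "fmt " chunk turns up.
    while ($pos + 8 <= $len) {
        $id = substr($bytes, $pos, 4);
        $size = unpack('V', substr($bytes, $pos + 4, 4))[1];
        if ($id === 'fmt ') {
            if ($size < 16 || $pos + 24 > $len) {
                return null; // truncated or malformed fmt chunk
            }
            // All fields are little-endian: v = 16-bit, V = 32-bit.
            $f = unpack(
                'vformat/vchannels/Vrate/VbyteRate/vblockAlign/vbits',
                substr($bytes, $pos + 8, 16)
            );
            $f['name'] = WAV_FORMAT_NAMES[$f['format']] ?? 'other';
            return $f;
        }
        $pos += 8 + $size + ($size % 2); // chunks are padded to even length
    }
    return null;
}
```

Calling wav_format_info(file_get_contents('../temp/wavtest.wav')) returns an array with the format code, channel count, sample rate and bit depth, or null if the data is not a RIFF/WAVE file.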

The Plum Voice record parameter selection of "audio/basic" generates a headerless voice file with 8 bit mu-law coding at a sample rate of 8000 Hz. Mu-law coding provides a logarithmic compression rather than the linear encoding above.
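To make the logarithmic compression concrete, here is a sketch of the standard G.711 mu-law expansion in PHP. Each byte holds a sign bit, a 3-bit exponent (the segment) and a 4-bit mantissa, so step sizes double from one segment to the next. The function names are my own, and this is the textbook decode rather than anything taken from Plum's code.

```php
<?php
// Sketch: expand 8-bit mu-law (G.711) samples to signed 16-bit linear PCM.
// ulaw_to_linear() and ulaw_string_to_pcm16() are illustrative names.

function ulaw_to_linear(int $u): int
{
    $u = ~$u & 0xFF;                 // mu-law bytes are stored bit-inverted
    $sign = $u & 0x80;               // top bit: set means a negative sample
    $exponent = ($u >> 4) & 0x07;    // 3-bit segment number
    $mantissa = $u & 0x0F;           // 4-bit position within the segment
    // 0x84 (132) is the bias added by the encoder; remove it after scaling.
    $sample = ((($mantissa << 3) + 0x84) << $exponent) - 0x84;
    return $sign ? -$sample : $sample;
}

function ulaw_string_to_pcm16(string $ulaw): string
{
    $out = '';
    for ($i = 0, $n = strlen($ulaw); $i < $n; $i++) {
        // Mask to 16 bits so negative values pack as two's complement.
        $out .= pack('v', ulaw_to_linear(ord($ulaw[$i])) & 0xFFFF);
    }
    return $out; // little-endian 16-bit samples, still at the input sample rate
}
```

The output is raw sample data only. It still needs a WAV header, and resampling if 22050 Hz is wanted, before other tools will treat it as a linear wav file.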

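The Plum Voice record parameter selection of "audio/x-wav" adds a RIFF/WAVE header to the above file. The audio data is still 8 bit mu-law at 8000 Hz, which is why the header reports audio format = 7 (the WAV code for mu-law) rather than 1.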
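Going the other way, a headerless "audio/basic" recording can be given a WAV header in PHP. The sketch below is an assumption about a reasonable layout, not a description of the exact header Plum writes, which this thread does not document. It follows the usual WAV convention for non-PCM data: an 18-byte "fmt " chunk followed by a "fact" chunk. The function name is made up.

```php
<?php
// Sketch: prepend a RIFF/WAVE header to headerless 8-bit mono mu-law data.
// wrap_ulaw_in_wav() is a hypothetical helper; the layout is an assumption.

function wrap_ulaw_in_wav(string $ulaw, int $rate = 8000): string
{
    $dataLen = strlen($ulaw);
    $pad = ($dataLen % 2) ? "\0" : '';   // chunks are padded to even length

    // fmt chunk: size 18, format 7 (mu-law), 1 channel, sample rate,
    // byte rate (1 byte per sample), block align 1, 8 bits, no extra bytes.
    $fmt = 'fmt ' . pack('VvvVVvvv', 18, 7, 1, $rate, $rate, 1, 8, 0);

    // fact chunk: number of samples, which equals the byte count here.
    $fact = 'fact' . pack('VV', 4, $dataLen);

    $data = 'data' . pack('V', $dataLen) . $ulaw . $pad;

    $body = 'WAVE' . $fmt . $fact . $data;
    return 'RIFF' . pack('V', strlen($body)) . $body;
}
```

The header produced this way is 58 bytes long, and the recorded bytes follow it unchanged.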

The Plum Voice record parameter selection of "audio/x-alaw-basic" generates a headerless voice file with 8 bit A-law encoding at a sample rate of 8000 Hz. A-law encoding is another logarithmic compression scheme.

This was enough information for me to be able to use SoX as a tool for converting between encoding schemes.

I also use the Brooktrout convert routines to convert in and out of vox format, and lame to convert to mp3 files. I found that by using the SAPI TTS wav format as the go-between format for all files, I ended up with good fidelity and reasonable performance.

support
Posts: 3632
Joined: Mon Jun 02, 2003 3:47 pm
Location: Boston, MA

IVR post for research done on audio conversions

Post by support »

Thank you for posting the information you've discovered on the support site!

Undoubtedly, the research you have done will be of help to other IVR developers in doing audio conversions of this type.

Thanks again!

Plum Support Staff
