Full Version: Ported VB ActiveX DTMF Decoder
henrik gustav
Hello, I found this VB code for decoding DTMF:
(67331_DTMF_DECODER_UPDATED_V2.0.zip)

Limitation: it does not seem to detect silence, so when I press, say, '6', it may be detected as 6666. As long as you do not need to tell repeated key presses apart, this does the job; it detects 1234567890*#ABCD nicely.

For those who do not have Visual Basic, I made an ActiveX COM component:
DTMFDecoder.zip


Example of use:

Dim dtmfDecoder As New DTMFDecoder.CMain
While cCallStatus_Inprogress = oCall.Status
Me.Text &= dtmfDecoder.GetKey
Threading.Thread.Sleep(10)
End While
dtmfDecoder = Nothing

Replace Thread.Sleep with proper threading code.

Using this ActiveX COM component, it works perfectly for my requirements.
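
One way to work around the repeated-digit limitation described above is to filter the polled keys by time. The sketch below is only an illustration in C#: KeyDebouncer is a hypothetical helper (not part of DTMFDecoder.CMain), and it assumes the polled value is an empty string when no tone is detected, which the post does not confirm.

```csharp
using System;

// Hypothetical helper: collapses a run of identical keys into one press.
public sealed class KeyDebouncer
{
    private readonly TimeSpan _gap;
    private string _lastKey = "";
    private DateTime _lastSeen = DateTime.MinValue;

    // gap: how long a key must be absent before it counts as pressed again.
    public KeyDebouncer(TimeSpan gap)
    {
        _gap = gap;
    }

    // Returns the key if it looks like a new press, otherwise "".
    public string Filter(string key, DateTime now)
    {
        if (string.IsNullOrEmpty(key))
        {
            return "";
        }
        bool isNewPress = key != _lastKey || (now - _lastSeen) > _gap;
        _lastKey = key;
        _lastSeen = now;
        return isNewPress ? key : "";
    }
}
```

With a gap of, say, 200 ms, a held '6' polled every 10 ms is reported once, while a second '6' pressed half a second later is reported again. The right gap value depends on the decoder and would need tuning.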
minerva
Looks interesting. I notice that the component has a couple of properties (ConnectKind and RemoteMachineName). What did you set these to? Also, what did you do to feed the Skype audio stream into the component?

Thanks,
minerva
TheUberOverlord
QUOTE (henrik gustav @ Wed Sep 26 2007, 01:28)


Original Source code and article are located here:

http://www.planet-source-code.com/vb/scrip...31&lngWid=1

Please be aware of what it says there, I quote:

"2) You MAY NOT redistribute this code (for example to a web site) without written permission from the original author. Failure to do so is a violation of copyright laws."

You place this author in a very odd situation if you added code that could be of harm to others, and you also gave no link to the original source code or article of this author.

Regarding detecting silence, I wrote some posts in the API forum about that issue as well; do a search for DTMF g729.
minerva
Thanks Uber,

You seem to know more about DTMF with Skype than anybody else. The VB code behind this ActiveX is much as expected - FFT Goertzel style.

I have Delphi code that can detect DTMF embedded in a .wav file

I need to better understand how to map the bytestream output from a 'pCall.OutputDevice[callIoDeviceTypePort]' assigned port into my FFT function.

Can I use the raw bytestream from the port or is some sort of codec needed?

Your '5*' implementation to work around the 'multiple digit' issue is a good idea.

Several posts point to pages on testing.onlytherightanswers.com, but these seem broken just now.

minerva
TheUberOverlord
QUOTE (minerva @ Fri Aug 15 2008, 08:02)


I need to answer this in a very long-winded, complicated way so that not only you but others understand the issues that can/will be encountered when decoding DTMF in real time using the Skype API/Skype4COM voice stream interface.

1. You can't use the .wav file options, because the Skype client locks the file(s) while they are open, so you can't read data from them or write data to them in real time ("Using normal methods"). This means real-time DTMF processing for calls must be done using the "Port" option.

2. You can only use the PORT interface, like the .wav file interface, while a call is in progress, and you can't use the same voice API methods for more than one call at a time. This means, for example, that if you are the host of a conference call, you cannot look for DTMF on more than one call at a time in the conference call.

3. Don't be frugal: use a 64K buffer for the port data from the Skype client. In C#:

CODE
int BufferSize = 65536;
handler.BeginReceive(state.buffer, 0, BufferSize, 0,
                     new AsyncCallback(ReadCallback), state);


4. The Microsoft Wave file format states, I quote from:

http://ccrma.stanford.edu/CCRMA/Courses/42...cts/WaveFormat/

"8-bit samples are stored as unsigned bytes, ranging from 0 to 255. 16-bit samples are stored as 2's-complement signed integers, ranging from -32768 to 32767."

So, since we are getting raw bytes from the Skype client, we need to make sure that when these bytes are combined into 16-bit samples they are treated as "2's-complement signed integer" samples and NOT as unsigned 16-bit integers. Example using one byte:

-1 = Signed Int vs 255 = Unsigned Int - from Byte 1111 1111

See: http://academic.evergreen.edu/projects/bio...ram/2s_comp.htm

So, as you can see if we don't sew these bytes back together again properly, we can create BAD 16 bit samples that will not be decoded correctly by our DTMF decoder code.
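
The same point for a full 16-bit sample, as a small hedged illustration. SignedSampleDemo and ToSample are names made up for this sketch; it assumes the port delivers little-endian 16-bit PCM (low byte first), as a standard PCM .wav file stores it.

```csharp
using System;

public static class SignedSampleDemo
{
    // Assemble one little-endian 16-bit PCM sample from two bytes.
    public static short ToSample(byte low, byte high)
    {
        return unchecked((short)(low | (high << 8)));
    }

    public static void Main()
    {
        // Bytes 0x00 0x80: the most negative sample when read as signed.
        short asSigned = ToSample(0x00, 0x80);
        // The same 16 bits misread as an unsigned value.
        ushort asUnsigned = unchecked((ushort)asSigned);
        Console.WriteLine(asSigned + " vs " + asUnsigned); // -32768 vs 32768
    }
}
```

A decoder fed the unsigned reading would see a large positive value where the signal is actually at its most negative point.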

In C#, we take the bytes read on the port from our byte buffer 2 at a time and sew each pair together into one "2's-complement signed integer" 16-bit sample, then pass the samples to our DTMF decoder code, one 16-bit sample at a time:

CODE
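// Stop one byte early so an odd trailing byte is never read as half a sample.
for (int index = 0; index + 1 < bytesRead; index += 2)
{
    // BitConverter uses the machine's byte order (little-endian on x86).
    // The result is a signed Int16, which is 1 sample of 16 bit PCM.
    Int16 sample = BitConverter.ToInt16(byte_buffer, index);
    // Your DTMF processing goes here.
}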
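
For readers wondering what the "DTMF processing" step might look like, here is a minimal Goertzel-style sketch. It is not the code from the VB project discussed above; DtmfSketch, the block size, and the minPower threshold are assumptions for illustration, and the sample rate is passed in by the caller (the thread later uses 16000).

```csharp
using System;

public static class DtmfSketch
{
    // Standard DTMF row and column frequencies in Hz.
    static readonly double[] Rows = { 697, 770, 852, 941 };
    static readonly double[] Cols = { 1209, 1336, 1477, 1633 };
    static readonly char[,] Keys =
    {
        { '1', '2', '3', 'A' },
        { '4', '5', '6', 'B' },
        { '7', '8', '9', 'C' },
        { '*', '0', '#', 'D' }
    };

    // Goertzel: squared magnitude of one frequency within a block of samples.
    public static double Power(short[] block, double hz, int sampleRate)
    {
        double coeff = 2.0 * Math.Cos(2.0 * Math.PI * hz / sampleRate);
        double s1 = 0, s2 = 0;
        foreach (short x in block)
        {
            double s0 = x + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

    // Returns the detected key, or '\0' when no row/column pair exceeds minPower.
    public static char Detect(short[] block, int sampleRate, double minPower)
    {
        int r = Strongest(block, Rows, sampleRate, minPower);
        int c = Strongest(block, Cols, sampleRate, minPower);
        return (r < 0 || c < 0) ? '\0' : Keys[r, c];
    }

    // Index of the strongest frequency above minPower, or -1 if none.
    static int Strongest(short[] block, double[] freqs, int sampleRate, double minPower)
    {
        int best = -1;
        double bestPower = minPower;
        for (int i = 0; i < freqs.Length; i++)
        {
            double p = Power(block, freqs[i], sampleRate);
            if (p > bestPower)
            {
                bestPower = p;
                best = i;
            }
        }
        return best;
    }
}
```

In use, the samples produced by the loop above would be collected into blocks (for example 20 to 30 ms of audio each) and each block passed to Detect. A '\0' result can serve as the "silence" marker between presses. A production decoder would add further checks, such as comparing the two tone levels, which this sketch leaves out.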


5. We also have some other issues:

a. If we are using a single-threaded application, we can't sit and wait for this port data from Skype to arrive; we need to do other things as well, asynchronously. We can do this by using an asynchronous server socket. More here:

http://msdn.microsoft.com/en-us/library/5w7b7x5f.aspx

NOTE: The above Microsoft example code can fail in some cases in the "StartListening" section. The reason is that Vista defaults to IPv6 for TCP/IP, while XP and older Windows versions use IPv4. So, to set up a listener that will work on ALL Windows operating systems, including IPv6 Vista, you need to do the following (example in C#).

Replace these two lines in the Microsoft example code at the above link, under "StartListening", with the single line shown further below:

BAD in some cases, only supports IPv4 ("XP and before but NOT Vista")
CODE
IPHostEntry ipHostInfo = Dns.Resolve(Dns.GetHostName());
IPEndPoint localEP = new IPEndPoint(ipHostInfo.AddressList[0],11000);


See: http://bytes.com/forum/thread406011.html

GOOD, IPv4 and IPv6 (Vista, XP and all prior operating systems that support .NET)
CODE
IPEndPoint localEP = new IPEndPoint(System.Net.IPAddress.Loopback,11000);


See: http://msdn.microsoft.com/en-us/library/sy...s.loopback.aspx

b. We want to make sure we only allow one open of this port. We can't support more than one call in progress anyway, and we don't want any hacker code trying to open ports with our application. Example in C# for 1 accept on the port:

CODE
listener.Listen(0);


The Microsoft example shows:
CODE
listener.Listen(10);

We don't want to manage 10 opens on this port, let alone allow another program to open the port 9 other times besides the one open we need ;) for the Skype Client to send data.
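
A hedged sketch of how points a and b might fit together. LoopbackListener, Create and AcceptOne are names invented for this example. Note that the backlog argument of Listen limits the queue of pending connections rather than the total number of connections accepted over time, so this sketch also closes the listener after the first accept to keep other programs from connecting.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class LoopbackListener
{
    // Bind to the loopback address only, so nothing outside this machine can connect.
    // Passing port 0 lets the operating system pick a free port.
    public static Socket Create(int port)
    {
        var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        listener.Bind(new IPEndPoint(IPAddress.Loopback, port));
        listener.Listen(0); // smallest pending-connection queue
        return listener;
    }

    // Accept exactly one connection, then stop listening.
    public static Socket AcceptOne(Socket listener)
    {
        Socket handler = listener.Accept(); // blocks until the client connects
        listener.Close();
        return handler;
    }
}
```

This version uses the blocking Accept for brevity; in the asynchronous design described above, the same idea applies inside the accept callback.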

c. We want to make sure that this port is available and that the call is in progress before we call the 'pCall.OutputDevice[callIoDeviceTypePort]' code.

d. We also do NOT want to accept any open on the port if there is no call in progress (see b), to avoid any hacking attempts as well. Why take the chance when we can avoid it?

NOTES on b and d: Call me paranoid, but I don't take chances. If I have server code that is waiting to accept port opens, and I know that should only happen while a call is in progress, then I do this after the call has finished, in C#:

CODE
if (handler.Connected)
{
    handler.Shutdown(SocketShutdown.Both);
    handler.Close();
}


Once a call is in progress, just before getting data from the port from Skype, I do this in C# to accept the request to open:

CODE
if (handler.Connected)
{
    handler.BeginReceive(state.buffer, 0, BufferSize, 0,
                         new AsyncCallback(ReadCallback), state);
}
else
{
    listener.BeginAccept(new AsyncCallback(AcceptCallback), listener);
}


I am moving my web site to another hosting company so it is a mess for the moment. It will be back to normal soon however.
trying
Hi, I have two ports receiving data on both the microphone and output from Skype in C#; however, if I append the bytes received from Skype to a .wav file, the WAV file is incorrect. You mention "DTMF Processing" -- is there some way to interweave the bytes coming from Skype and write to a WAV file?

CODE
                    int readBytes = state.Read(m_buffer, 0, 65536);
                    if (readBytes > 0)
                    {
                        lock (s_lock)
                        {
                            using (FileStream stream = File.Open(m_file, FileMode.Append, FileAccess.Write, FileShare.None))
                            {
                                for (int index = 0; index < readBytes; index += 2)
                                {
                                    Int16 sample = BitConverter.ToInt16(m_buffer, index);
                                    // The above int sample contains 2 bytes now Signed Int
                                    // which is 1 sample of 16 bit PCM
                                    // Your DTMF Processing goes here;
                                    stream.WriteByte(Convert.ToByte(sample & 0xff));
                                    stream.WriteByte(Convert.ToByte((sample >> 8) & 0xff));
                                }
                            }
                        }
                       }


Thanks!
TheUberOverlord
What do you mean when you say "WAV file is incorrect"? It plays, but the sound is bad? It does not play?


Did you put a .wav header on the file? Is it correct?
trying
Thanks for the fast response.. The WAV file doesn't play in any media player. Also, the media player is unable to read the effective duration of the media file, so therefore something must be corrupt.
I didn't add any WAV header, I'll start to read the documentation on that. By the way, is it okay for me to concurrently write what I'm getting from the Skype TCP client to the same file? I want to make one WAV file which has the microphone and audio output in the same WAV File..

Thanks!
TheUberOverlord
QUOTE (trying @ Mon Aug 18 2008, 00:10)


Yes, you must always have a .wav file header when creating a .wav file. If the header is corrupt, .wav file players will NOT play the .wav file; or, if they do play it using bad header data, they may not play the data at the correct sample rate or byte rate, and may not use the correct number of channels. Here are the specs for the .wav file format:

http://ccrma.stanford.edu/CCRMA/Courses/42...cts/WaveFormat/

Just create a header with the ChunkSize set to zero. Then, after adding your sample data after the header, POSITION back to the top of the file and rewrite the header with the correct ChunkSize (and Subchunk2Size, which also depends on the amount of sample data).

You can quickly read a working .wav file to see if your header structure is correct, and even use your code to read a good file to see if you calculate the correct ChunkSize and that you write the same data, then you can compare easily that your .wav file processing is working correctly.
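
A minimal sketch of the "write a placeholder header, then go back and fix the sizes" approach, following the canonical 44-byte PCM layout from the linked spec. WavPatchSketch and its method names are made up for this example, and the mono, 16000 Hz, 16-bit values are assumptions taken from later in this thread.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavPatchSketch
{
    const int HeaderSize = 44; // canonical PCM header length

    public static void WriteHeader(BinaryWriter bw, short channels, int sampleRate,
                                   short bitsPerSample, int dataLength)
    {
        short blockAlign = (short)(channels * bitsPerSample / 8);
        bw.Write(Encoding.ASCII.GetBytes("RIFF"));
        bw.Write(HeaderSize - 8 + dataLength);     // ChunkSize
        bw.Write(Encoding.ASCII.GetBytes("WAVEfmt "));
        bw.Write(16);                              // Subchunk1Size for plain PCM
        bw.Write((short)1);                        // AudioFormat: PCM
        bw.Write(channels);                        // NumChannels
        bw.Write(sampleRate);                      // SampleRate
        bw.Write(sampleRate * blockAlign);         // ByteRate
        bw.Write(blockAlign);                      // BlockAlign
        bw.Write(bitsPerSample);                   // BitsPerSample
        bw.Write(Encoding.ASCII.GetBytes("data"));
        bw.Write(dataLength);                      // Subchunk2Size
    }

    // Writes a placeholder header, the samples, then rewrites the header in place.
    // The stream must be seekable; it is left open for the caller to close.
    public static void Save(Stream output, byte[] pcm)
    {
        var bw = new BinaryWriter(output);
        WriteHeader(bw, 1, 16000, 16, 0);          // sizes not known yet
        bw.Write(pcm);
        bw.Flush();
        int dataLength = (int)output.Length - HeaderSize;
        output.Position = 0;                       // back to the top of the file
        WriteHeader(bw, 1, 16000, 16, dataLength); // now with the real sizes
        bw.Flush();
    }
}
```

In a real recorder the samples would be appended over the course of the call rather than in one Write, and the header rewrite would happen once when the call ends.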
trying
Great news! With your help, I got it working. I now have the TCP socket servers writing the Skype data from both the microphone and speaker to a file, then I write the WAV header and append the contents of the file (code below).

However, when I play the file, it plays fine, and the length of the file is correct, and I can also hear both the speaker on the other end as well as myself talking into the microphone... the problem is that the sound is at half of the speed! Is there some fiddling I need to do with channels, bit rate, or something else? Attached is the wav file so that you can hear.

CODE
FileInfo tfile = new FileInfo(_tempFile);
int length = (int)tfile.Length;
short channels = 1;
int samplerate = 16000;
int dataLength = length;
short bitsPerSample = 16;

// http://ccrma.stanford.edu/CCRMA/Courses/422/projects/WaveFormat/
using (FileStream fs = new FileStream(_file, FileMode.Create, FileAccess.Write))
{
    using (BinaryWriter bw = new BinaryWriter(fs))
    {
        bw.Write(new char[4] { 'R', 'I', 'F', 'F' });
        // This is the size of the entire file in bytes minus 8 bytes
        bw.Write(length + 44 - 8);
        bw.Write(new char[8] { 'W', 'A', 'V', 'E', 'f', 'm', 't', ' ' });
        bw.Write((int)16);
        bw.Write((short)1);
        bw.Write(channels);
        bw.Write(samplerate);
        bw.Write((int)(samplerate * ((bitsPerSample * channels) / 8)));
        bw.Write((short)((bitsPerSample * channels) / 8));
        bw.Write(bitsPerSample);
        bw.Write(new char[4] { 'd', 'a', 't', 'a' });
        bw.Write(dataLength);
    
        // now append all of the data from the temp file
        using (FileStream readerfs = new FileStream(_tempFile, FileMode.Open, FileAccess.Read))
        {
            using (BinaryReader br = new BinaryReader(readerfs))
            {
                byte[] buffer = new byte[SkypeTcpServer.BufferSize];
                int bytesRead;
                while ((bytesRead = br.Read(buffer, 0, buffer.Length)) > 0)
                {
                    bw.Write(buffer, 0, bytesRead);
                }
            }
        }
    }
}


Wave file attached
TheUberOverlord
First, let's fix the header here. Skype does NOT play by the pure Microsoft PCM rules. It uses an optional header field of 2 bytes called ExtraParamSize:

2   ExtraParamSize   if PCM, then doesn't exist
X   ExtraParams      space for extra parameters

This makes the header 2 bytes longer for the .wav file.

The value of Subchunk1Size is normally 16 for PCM; this is the size of the rest of the subchunk which follows this number. Skype uses the ExtraParamSize field, which makes Subchunk1Size 18 (hex 12) instead of 16 (hex 10) as in your case. So you need to make your wave header 2 bytes longer, and the 2-byte value of ExtraParamSize should be hex "0000".

See SubChunk1Size here:

http://ccrma.stanford.edu/CCRMA/Courses/42...cts/WaveFormat/

[Image: Size of TheUberOverLordCount1To10]

I also attached the Example .wav which is used in the picture as well.

So......

1. Your header is 2 bytes short ("Not your fault, I forgot that Skype is NOT using a pure Microsoft PCM .wav header, it has been a while")

2. The extra 2 bytes are located at the "ExtraParamSize" field in the header which is shown here:

http://ccrma.stanford.edu/CCRMA/Courses/42...cts/WaveFormat/

3. The value of those 2 bytes ("ExtraParamSize") should be zeros, as the picture displays.

4. Of course, bw.Write(length + 44 - 8); should now be bw.Write(length + 46 - 8); as well.

Here is what the original file length in bytes was before I made my changes:

[Image: original file length]

Here is what the new size is after I inserted 2 extra bytes for the "ExtraParamSize":

[Image: new file length]

Here is what the changes look like:

[Image: your old file]

[Image: your file with header changes]

What was changed:

1. ChunkSize increased by 2.
2. A 2-byte "ExtraParamSize" with a value of hex "0000" was added between BitsPerSample and Subchunk2ID.

Let's fix this header first. But are you also saying you are merging the Microphone and ("and as you say") Speakers, so this data is port data merged from 2 sources?

The attached files include a .wav produced by the Skype Client of me counting 1 to 10 and your original file modified by me:
trying
Hi TheUberOverlord, thanks for fixing the header. One question: The Microsoft WAVE pseudo-standard says:

"ExtraParamSize if PCM, then doesn't exist"

However, you didn't mention changing PCM to 0 (AudioFormat), so is this correct that PCM is 1, yet ExtraParamSize exists?

To answer your question:

QUOTE
But are you also saying you are merging the Microphone and ("and as you say") Speakers, so this data is port data merged from 2 sources?


Yes. It's still not quite working. What's interesting in the latest skype.wav file (attached) is that the first few seconds, before my microphone layer kicks in, are fine, but once the microphone is laid over the speaker, both of them slow down.

Updated code:

CODE
FileInfo tfile = new FileInfo(_tempFile);
int length = (int)tfile.Length;
short channels = 1;
int samplerate = 16000;
int dataLength = length;
short bitsPerSample = 16;
int headerSize = 46;

// http://ccrma.stanford.edu/CCRMA/Courses/422/projects/WaveFormat/
// http://forum.skype.com/index.php?showtopic=97944&st=0&gopid=831361&#entry831361
using (FileStream fs = new FileStream(_file, FileMode.Create, FileAccess.Write))
{
    using (BinaryWriter bw = new BinaryWriter(fs))
    {
        bw.Write(new char[4] { 'R', 'I', 'F', 'F' });
        // This is the size of the entire file in bytes minus 8 bytes
        bw.Write(length + headerSize - 8);
        bw.Write(new char[8] { 'W', 'A', 'V', 'E', 'f', 'm', 't', ' ' });
        // Skype uses a custom PCM
        bw.Write((int)18);
        //bw.Write((int)16);
        bw.Write((short)1);
        bw.Write(channels);
        bw.Write(samplerate);
        bw.Write((int)(samplerate * ((bitsPerSample * channels) / 8)));
        bw.Write((short)((bitsPerSample * channels) / 8));
        bw.Write(bitsPerSample);

        // 2 bytes extra for custom PCM
        bw.Write((short)0);

        bw.Write(new char[4] { 'd', 'a', 't', 'a' });
        bw.Write(dataLength);

        // now append all of the data from the temp file
        using (FileStream readerfs = new FileStream(_tempFile, FileMode.Open, FileAccess.Read))
        {
            using (BinaryReader br = new BinaryReader(readerfs))
            {
                byte[] buffer = new byte[SkypeTcpServer.BufferSize];
                int bytesRead;
                while ((bytesRead = br.Read(buffer, 0, buffer.Length)) > 0)
                {
                    bw.Write(buffer, 0, bytesRead);
                }
            }
        }
    }
}


Thanks!
TheUberOverlord
You say:

"However, you didn't mention changing PCM to 0 (AudioFormat), so is this correct that PCM is 1, yet ExtraParamSize exists?"

AudioFormat has NOTHING to do with ExtraParamSize. In fact, for AudioFormat the spec says: "PCM = 1 (i.e. Linear quantization). Values other than 1 indicate some form of compression." So it stays at 1.

You CANNOT mix two different audio sources and claim in the .wav file header that you have ONLY 1 channel. Not only does NumChannels need to go from a value of 1 to 2 (stereo); ByteRate, BlockAlign, and Subchunk2Size also need to be recalculated because NumChannels went from 1 to 2.

Your samples now need to be sewn as:

16 Bits from the Speaker, 16 Bits from the Microphone.

Every 32 bits has the Speaker in the left 16 bits and the Microphone in the right, as an example. You have 2 sources that impact the samples in different ways. The assumption is that there are 16,000 16-bit samples per second from one source. If you add another source and don't add a channel, ALL the samples are impacted!
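
The recalculation can be written out using the formulas from the linked WaveFormat page. WavFields is a name invented for this sketch; the 16000 Hz and 16-bit values are the ones used in this thread.

```csharp
using System;

public static class WavFields
{
    // BlockAlign = NumChannels * BitsPerSample / 8 (bytes per frame, all channels)
    public static int BlockAlign(int channels, int bitsPerSample)
    {
        return channels * bitsPerSample / 8;
    }

    // ByteRate = SampleRate * NumChannels * BitsPerSample / 8
    public static int ByteRate(int sampleRate, int channels, int bitsPerSample)
    {
        return sampleRate * BlockAlign(channels, bitsPerSample);
    }

    // Subchunk2Size = NumSamples * NumChannels * BitsPerSample / 8
    public static int Subchunk2Size(int samplesPerChannel, int channels, int bitsPerSample)
    {
        return samplesPerChannel * BlockAlign(channels, bitsPerSample);
    }

    public static void Main()
    {
        // Mono:   BlockAlign 2, ByteRate 32000
        Console.WriteLine(BlockAlign(1, 16) + " " + ByteRate(16000, 1, 16));
        // Stereo: BlockAlign 4, ByteRate 64000
        Console.WriteLine(BlockAlign(2, 16) + " " + ByteRate(16000, 2, 16));
    }
}
```

SampleRate and BitsPerSample stay at 16000 and 16; they describe each channel on its own, so only the fields that depend on NumChannels change.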

Did you even read this? It even shows a Left/Right 2 channel example.



http://ccrma.stanford.edu/CCRMA/Courses/42...cts/WaveFormat/

I don't have a clue how you are currently adding the bytes from these 2 ports/inputs. As you get them, or what?

I HAD a full head of hair, NOW I am BALD :P

Just Kidding....Sorry Bald people.
trying
Hi, yes, I read it, but I can't say I understand it all that well :-/

Right now, yes, I just append the bytes to the file as I get them. That, as you mention, is clearly wrong. How do I know when a sample for one of the channels is complete? I'm guessing I'll need to put them into memory first to coordinate adding the samples to the file. I found this:

http://www.lightlink.com/tjweber/StripWav/Canon.html

Is there a fixed size for a Skype WAV sample?

Also, beyond changing channels from 1 to 2, will I also need to change the "samplerate" and "bitsPerSample" in the following algorithm? Do I just multiply both by the number of channels?

CODE
FileInfo tfile = new FileInfo(_tempFile);
int length = (int)tfile.Length;
short channels = 2;
int samplerate = 16000;
int dataLength = length;
short bitsPerSample = 16;
int headerSize = 46;

using (FileStream fs = new FileStream(_file, FileMode.Create, FileAccess.Write))
{
    using (BinaryWriter bw = new BinaryWriter(fs))
    {
        bw.Write(new char[4] { 'R', 'I', 'F', 'F' });
        // This is the size of the entire file in bytes minus 8 bytes
        bw.Write(length + headerSize - 8);
        bw.Write(new char[8] { 'W', 'A', 'V', 'E', 'f', 'm', 't', ' ' });
        bw.Write((int)18);
        bw.Write((short)1);
        bw.Write(channels);
        bw.Write(samplerate);
        bw.Write((int)(samplerate * ((bitsPerSample * channels) / 8)));
        bw.Write((short)((bitsPerSample * channels) / 8));
        bw.Write(bitsPerSample);
        bw.Write((short)0);
        bw.Write(new char[4] { 'd', 'a', 't', 'a' });
        bw.Write(dataLength);
        // (the sample data is appended here, as in the earlier listing)
    }
}


Thanks!

Oh, I guess bitsPerSample is 16 (per channel)? So I will just need to put the samples Skype sends on the ports into memory first, and coordinate writing them, so that I write as:

Sample 0 from Microphone port (16 bits)
Sample 0 from Speaker port (16 bits)
Sample 1 from Microphone port (16 bits)
Sample 1 from Speaker port (16 bits)
...etc.

Is the samplerate still 16000?

Thanks!
trying
One thing I don't understand in this example:

CODE
                                    Int16 sample = BitConverter.ToInt16(m_buffer, index);
                                    // The above int sample contains 2 bytes now Signed Int
                                    // which is 1 sample of 16 bit PCM
                                    // Your DTMF Processing goes here;
                                    stream.WriteByte(Convert.ToByte(sample & 0xff));
                                    stream.WriteByte(Convert.ToByte((sample >> 8) & 0xff));


The Skype TCP server sends my port a 4byte integer, but a sample is 2 bytes, so does that mean the Skype TCP server always sends my port 2 samples?
TheUberOverlord
Try taking 2 bytes of data from the Speaker and then 2 bytes of data from the Microphone, and keep doing this until you run out of data. If you run out of bytes from either the Speaker or the Microphone, keep processing the bytes from the other and add a zero for the one that ran out. Assume the order is:

16 Bits from the Speaker, 16 Bits from the Microphone

While you have port data from both ports to process, add to the .wav file using the above logic.

Say that at the end of the call you suddenly end up with zero bytes of port data for the Speaker but still have 200 for the Microphone. Taking 2 bytes at a time from the Microphone, write this 100 times to the .wav file:

16 Bits of Zero, 16 Bits of port data from the Microphone.

Say instead that at the end of the call you end up with zero bytes of port data for the Microphone but still have 200 bytes for the Speaker. Taking 2 bytes at a time from the Speaker port data, write this 100 times to the .wav file:

16 Bits from the Speaker, 16 Bits of Zero.

Get it?
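
The logic above could be sketched as follows. StereoInterleaver and Interleave are hypothetical names; the sketch assumes both buffers hold whole 16-bit samples and ignores a trailing odd byte.

```csharp
using System;

public static class StereoInterleaver
{
    // Interleaves two mono 16-bit PCM buffers as Speaker, Microphone, Speaker, ...
    // The shorter source is padded with zero samples.
    public static byte[] Interleave(byte[] speaker, byte[] mic)
    {
        int speakerFrames = speaker.Length / 2;
        int micFrames = mic.Length / 2;
        int frames = Math.Max(speakerFrames, micFrames);
        var output = new byte[frames * 4]; // new arrays start out as all zeros

        for (int f = 0; f < frames; f++)
        {
            if (f < speakerFrames)
            {
                output[4 * f] = speaker[2 * f];
                output[4 * f + 1] = speaker[2 * f + 1];
            }
            if (f < micFrames)
            {
                output[4 * f + 2] = mic[2 * f];
                output[4 * f + 3] = mic[2 * f + 1];
            }
        }
        return output;
    }
}
```

For example, 200 bytes of Microphone data with no Speaker data produce 100 frames, each holding a zero sample followed by a Microphone sample.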

You say "Is there a fixed size for a Skype WAV sample?" What?

Yes, until the call is no longer in progress, port data can/will be returned from each source by Skype in 16-bit samples.

What do you mean? ALL the samples returned are 16 bits long. You're causing the problem by merging 2 sources of samples and calling it one channel, and by NOT merging them correctly and changing the header to reflect what you are doing :P

There is no need to use another link about the .wav file format. The picture I added here for samples, from the original link, is excellent in showing how 2 channels ("Sources/Inputs") of 16-bit samples are added/merged into the .wav file.
trying
Yep, I think I got it! Let me try it...

I'm learning :-)

Thanks!
trying
One issue here is that I have two sockets running on two threads. Let's say I get data on one of the sockets, I can't "peek" at the other socket because Receive is a blocking call. I guess what I'll do is just read all bytes as I get them, then use some cross-thread data block to say whether or not at the exact time of writing a sample, there is another sample in the other socket. If not, I write a 0 sample.
TheUberOverlord
QUOTE (trying @ Tue Aug 19 2008, 12:57)


Right. The only other way to do it is to write the data to 2 files as binary data as you receive it and then, after the call is completed, re-read both files and create a merged .wav result, with silence added at the end of the merged file for whichever channel is shorter. Even in this case the timing is not perfect when played back.

If you have no need to use these files in real time, this might be your best option, because padding zeros may create a stuttering effect that is just created by threading (more bytes arriving at a time on one port than on the other) and is not really present in any real sense.

A tricky way might be to process for length of the SHORTEST buffer:

For (I = 0 to $MIN(buffer1,buffer2) -1) DO
{
sew together channels.
}

Save the remaining bytes, those beyond the length of the shorter buffer, to some saved buffer that is at least as long as the read buffer ("Suggested Buffer Length on these port reads is 64k"), and the next time your thread wakes up, process the leftover bytes from last time first. The tricky part about this is overflow: if one port never has data, or has very little data, at some point you may get yourself in trouble. This is where the file option comes in: just store the data now and process it later.
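
A rough sketch of the shortest-buffer idea with leftovers carried over between reads. StereoWeaver is a hypothetical class; it is not thread-safe, so with one thread per socket the calls would need a lock around them, and PendingBytes is only a hook for the overflow check mentioned above.

```csharp
using System;
using System.Collections.Generic;

public sealed class StereoWeaver
{
    private readonly Queue<byte> _speaker = new Queue<byte>();
    private readonly Queue<byte> _mic = new Queue<byte>();

    public void AddSpeaker(byte[] data, int count)
    {
        for (int i = 0; i < count; i++) _speaker.Enqueue(data[i]);
    }

    public void AddMic(byte[] data, int count)
    {
        for (int i = 0; i < count; i++) _mic.Enqueue(data[i]);
    }

    // Bytes still waiting on the fuller side; watch this to detect overflow.
    public int PendingBytes
    {
        get { return Math.Max(_speaker.Count, _mic.Count); }
    }

    // Sews together as many frames as BOTH sources can supply; the rest stays queued.
    public byte[] Drain()
    {
        int frames = Math.Min(_speaker.Count, _mic.Count) / 2;
        var output = new byte[frames * 4];
        int o = 0;
        for (int f = 0; f < frames; f++)
        {
            output[o++] = _speaker.Dequeue();
            output[o++] = _speaker.Dequeue();
            output[o++] = _mic.Dequeue();
            output[o++] = _mic.Dequeue();
        }
        return output;
    }
}
```

Each receive callback would call AddSpeaker or AddMic with the bytes it read and then write the result of Drain to the .wav file. If PendingBytes keeps growing, one source has gone quiet and some padding or file-based fallback is needed.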

The reason WHY I wanted you to understand the header difference is that if you wanted to feed one of these .wav files you are creating back into a Skype call at some point, it would "FAIL" without these header changes.

The closest method to perfection would be to timestamp the receipt of each buffer on each port. Then, knowing that 16,000 samples a second at 2 bytes each is 32,000 bytes per second per port, pad each buffer with zeros to cover the time elapsed since the previous receipt.

Pseudocode example:

32,000 / 1000 = 32 bytes per millisecond

Microphone initial timestamp: 10:30:01:00
Speaker initial timestamp:    10:30:01:00

Speaker port buffer of 500 bytes received at 10:30:01:30
LastBufferReceivedAt: 10:30:01:30

TimeElapsed = 30 milliseconds

32 * 30 = 960 bytes max during this time.

960 - 500 = 460

SpeakerBuff = (460 * [0]) + SpeakerPortBuffer (500 bytes);
MicrophoneBuff = (960 * [0]);

Microphone port buffer of 110 bytes received at 10:30:01:45
10:30:01:45 - LastBufferReceivedAt (10:30:01:30) = 15 milliseconds
LastBufferReceivedAt: 10:30:01:45

32 * 15 = 480 bytes max during this time.
480 - 110 = 370

SpeakerBuff = (480 * [0]);
MicrophoneBuff = (370 * [0]) + MicrophonePortBuffer (110 bytes);

...continue until the call is finished.
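
The padding arithmetic from the pseudocode, as a small C# sketch. TimestampPadding and PadToElapsed are made-up names, and the sketch only covers the per-buffer calculation, not the timestamp bookkeeping around it.

```csharp
using System;

public static class TimestampPadding
{
    // 16000 samples per second * 2 bytes per sample / 1000 = 32 bytes per millisecond.
    public const int BytesPerMs = 16000 * 2 / 1000;

    // Prefixes the received bytes with enough zeros to cover the elapsed time.
    // For the channel that received nothing, pass an empty array.
    public static byte[] PadToElapsed(byte[] received, int elapsedMs)
    {
        int expected = elapsedMs * BytesPerMs;
        int pad = Math.Max(0, expected - received.Length);
        var result = new byte[pad + received.Length]; // the padding part stays zero
        Buffer.BlockCopy(received, 0, result, pad, received.Length);
        return result;
    }
}
```

With the numbers above, a 500-byte Speaker buffer after 30 ms becomes 960 bytes (460 zeros plus the data), while the silent Microphone channel gets PadToElapsed(new byte[0], 30), which is 960 zeros.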

This is the best way to preserve timing, since Skype might NOT send bytes if the silence threshold is NOT broken for the Microphone or Speaker. You can prove this by covering up the Microphone and looking at the bytes read from the port the Microphone data is being sent to, and then talking into the Microphone: the number of bytes received will be greater for the same time period when someone is speaking. So the above calculation can preserve actual channel times better than even file storage would.

CPU overhead using this method would be minimal, but file size could/would increase. Timing of channels would be near perfect, versus anyone's guess using the other methods.
trying
Yes, I agree with you, padding and synchronizing real time has too much overhead and is too much at the whim of the OS in how it schedules the threads, so like you said, I could end up with a badly sewn-together WAV file. I have decided to just forget about all the port stuff, and let Skype write each channel to a WAV and now I am writing code to weave the two waves together...

Thanks!
trying
Okay, now everything works (see code below), except that the microphone only goes to the left ear, and the speaker goes to the right ear. I guess this is the whole point of "channels" lol. What do I need to do to make the microphone and speaker both go to both of the channels? Do I just need to rearrange how I write my bytes below?

Here is my weave function which takes the speaker and microphone WAV files and appends them to the BinaryWriter bw which is the output WAV (to which the header has already been written):

CODE
using (FileStream f1s = new FileStream(file1, FileMode.Open, FileAccess.Read))
{
    using (FileStream f2s = new FileStream(file2, FileMode.Open, FileAccess.Read))
    {
        f1s.Position = SkypeWavHeaderSize;
        f2s.Position = SkypeWavHeaderSize;
        int i1a, i1b, i2a, i2b;
        byte b1a, b1b, b2a, b2b;

        while (true)
        {
            i1a = f1s.ReadByte();
            i1b = f1s.ReadByte();
            i2a = f2s.ReadByte();
            i2b = f2s.ReadByte();

            if (i1a == -1 && i2a == -1)
            {
                // both files at the end
                break;
            }
            else if (i1b == -1 && i2b == -1)
            {
                // both files not at 2 byte boundary
                break;
            }

            b1a = i1a == -1 ? (byte)0 : (byte)i1a;
            b1b = i1b == -1 ? (byte)0 : (byte)i1b;
            b2a = i2a == -1 ? (byte)0 : (byte)i2a;
            b2b = i2b == -1 ? (byte)0 : (byte)i2b;

            bw.Write(b1a);
            bw.Write(b1b);
            bw.Write(b2a);
            bw.Write(b2b);
        }
    }
}


Thanks!

Attached is the WAV file created. Notice the speaker goes to the left ear, and my voice goes to the right ear.
TheUberOverlord
QUOTE (trying @ Tue Aug 19 2008, 14:13)
Yes, I agree with you, padding and synchronizing real time has too much overhead and is too much at the whim of the OS in how it schedules the threads, so like you said, I could end up with a badly sewn-together WAV file. I have decided to just forget about all the port stuff, and let Skype write each channel to a WAV and now I am writing code to weave the two waves together...

Thanks!


OK, but please realize you cannot access the .wav files Skype has open for writing until the call in question is no longer in progress.

Going on hold will cause the Skype client to stop writing data to the .wav file, just as the call finishing will. Anything that causes the call to change state from in progress will cause the .wav file to be closed by the Skype client.

Once the call is no longer in progress, you can open both files and easily merge them, and even delete the files afterwards if you wish.

If Skype has the permissions to do so, it will create or overwrite a .wav file that is already there. Better safe than sorry, though: delete the files once you have done your merge, just in case your application shares the same folder between different Windows users who could use the same Skype name. This will avoid possible Windows user security issues.
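
One way to tell whether Skype has let go of a file is simply to try opening it for exclusive access. This is only a minimal sketch that relies on standard .NET file sharing behaviour; `WavFileCheck` and `IsReleased` are made-up names for illustration, not part of the Skype API:

```csharp
using System.IO;

static class WavFileCheck
{
    // Hypothetical helper: returns true if the file can be opened exclusively.
    // It relies on FileStream throwing an IOException while another process
    // still holds the file open, or when the file does not exist yet.
    public static bool IsReleased(string path)
    {
        try
        {
            using (new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None))
            {
                return true;
            }
        }
        catch (IOException)
        {
            // Still in use, or not there (FileNotFoundException is an IOException).
            return false;
        }
    }
}
```

You would call this after the call status has left in progress, and retry after a short delay if it returns false.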
TheUberOverlord
QUOTE (trying @ Tue Aug 19 2008, 14:30)
Okay, now everything works (see code below), except that the microphone only goes to the left ear, and the speaker goes to the right ear. I guess this is the whole point of "channels" lol. What do I need to do to make the microphone and speaker both go to both of the channels? Do I just need to rearrange how I write my bytes below?


It fails at the end in Windows Media Player. Make sure your last sample frame contains a sample for both channels. You need to be on 32-bit boundaries now, meaning the size of your data in bytes must be divisible by 4, not 2 as before. Check your number of samples and your header data; it seems to work up until the end.
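
To make the 4-byte rule concrete, here is a rough sketch of an interleave step that only ever emits whole stereo frames. It assumes both inputs are 16-bit mono PCM data with the headers already stripped; `StereoWeave` and `Interleave` are illustrative names, not from your code:

```csharp
using System;

static class StereoWeave
{
    // Interleaves two 16-bit mono PCM buffers into one stereo buffer.
    // Each output frame is 4 bytes: 2 bytes left, then 2 bytes right.
    // A trailing odd byte is dropped, and the shorter input is padded with silence.
    public static byte[] Interleave(byte[] left, byte[] right)
    {
        int leftSamples = left.Length / 2;    // whole 16-bit samples only
        int rightSamples = right.Length / 2;
        int frames = Math.Max(leftSamples, rightSamples);
        byte[] output = new byte[frames * 4]; // zero-filled, so padding is silence

        for (int i = 0; i < frames; i++)
        {
            if (i < leftSamples)
            {
                output[i * 4] = left[i * 2];
                output[i * 4 + 1] = left[i * 2 + 1];
            }
            if (i < rightSamples)
            {
                output[i * 4 + 2] = right[i * 2];
                output[i * 4 + 3] = right[i * 2 + 1];
            }
        }
        return output;
    }
}
```

The returned length is always a multiple of 4, and it is this length that belongs in the data size field of the header.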
trying
Okay, I will fix that, but didn't you notice that the speaker goes into one ear and the microphone goes into the other? I'd like to have them go to both ears-- is that possible to do with weaving of two WAV files?

Thanks!
trying
Hmmmm, yes, I have been using Winamp to test and it works fine in Winamp, but I can see the error in Windows Media Player now. I'm not quite sure what you mean by keeping it on a word boundary? You mean something like:

CODE
f1s.Position = SkypeWavHeaderSize;
f2s.Position = SkypeWavHeaderSize;
int i1a, i1b, i2a, i2b;
byte b1a, b1b, b2a, b2b;
int numWrites = 0;

while (true)
{
    i1a = f1s.ReadByte();
    i1b = f1s.ReadByte();
    i2a = f2s.ReadByte();
    i2b = f2s.ReadByte();

    if (i1a == -1 && i2a == -1)
    {
        // both files at the end
        break;
    }
    else if (i1b == -1 && i2b == -1)
    {
        // both files not at 2 byte boundary
        break;
    }

    b1a = i1a == -1 ? (byte)0 : (byte)i1a;
    b1b = i1b == -1 ? (byte)0 : (byte)i1b;
    b2a = i2a == -1 ? (byte)0 : (byte)i2a;
    b2b = i2b == -1 ? (byte)0 : (byte)i2b;

    bw.Write(b1a);
    bw.Write(b1b);
    bw.Write(b2a);
    bw.Write(b2b);
    numWrites++;
}

if ((numWrites % 2) != 0)
{
    bw.Write((byte)0);
    bw.Write((byte)0);
}
TheUberOverlord
QUOTE (trying @ Tue Aug 19 2008, 14:48)
Okay, I will fix that, but didn't you notice that the speaker goes into one ear and the microphone goes into the other? I'd like to have them go to both ears-- is that possible to do with weaving of two WAV files?

Thanks!


Yes, you could do something like this:

http://www.codeproject.com/KB/cs/WAVE_Processor_In_C_.aspx
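
The basic idea is to add the two samples together and write the result to both channels. What follows is a minimal sketch (not taken from the linked article), assuming 16-bit signed little-endian PCM at the same sample rate in both files; `MonoMix` and its methods are made-up names:

```csharp
using System;

static class MonoMix
{
    // Adds two 16-bit samples and clamps the sum so it cannot wrap around.
    public static short MixSample(short a, short b)
    {
        int sum = a + b;
        if (sum > short.MaxValue) return short.MaxValue;
        if (sum < short.MinValue) return short.MinValue;
        return (short)sum;
    }

    // Mixes two mono buffers and writes the mixed sample to left AND right,
    // so both voices are heard in both ears. Extra samples in the longer
    // buffer are mixed with silence.
    public static byte[] MixToStereo(byte[] mic, byte[] speaker)
    {
        int frames = Math.Max(mic.Length, speaker.Length) / 2;
        byte[] output = new byte[frames * 4];
        for (int i = 0; i < frames; i++)
        {
            short mixed = MixSample(ReadSample(mic, i), ReadSample(speaker, i));
            byte lo = (byte)(mixed & 0xFF);
            byte hi = (byte)((mixed >> 8) & 0xFF);
            output[i * 4] = lo;      // left channel
            output[i * 4 + 1] = hi;
            output[i * 4 + 2] = lo;  // right channel, same sample
            output[i * 4 + 3] = hi;
        }
        return output;
    }

    // Reads sample i as little-endian, or 0 (silence) past the end of the data.
    static short ReadSample(byte[] data, int i)
    {
        int offset = i * 2;
        if (offset + 1 >= data.Length) return 0;
        return (short)(data[offset] | (data[offset + 1] << 8));
    }
}
```

Clamping is the simplest way to handle two loud signals; it can distort when both sides talk loudly at once. If you do not need stereo at all, writing the mixed samples as a one-channel file works too.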
TheUberOverlord
QUOTE (trying @ Tue Aug 19 2008, 14:56)
Hmmmm, yes, I have been using Winamp to test and it works fine in Winamp, but I can see the error in Windows Media Player now. I'm not quite sure what you mean by keeping it on a word boundary? You mean something like:


Check your data and header.
TheUberOverlord
Please check your work; don't make me do it for you. Playing a file is NOT checking a file. Good thing you are not paying my hourly consulting rate: I have been helping you ALL day, and this is the last time I will let you be LAZY! ;) Debug your own code from now on, and check it like a programmer: look at the file in HEX rather than playing it in a media player!

You have MANY things wrong. First, here is what the file length says it is:



Note it says it is 496,686 bytes. Now let's look at what the header says:



The header says:

http://ccrma.stanford.edu/CCRMA/Courses/42...cts/WaveFormat/

ChunkSize = This is the size of the entire file in bytes minus 8 bytes for the two fields not included in this count.

So ChunkSize should be 496,686 - 8 = 496,678.

ChunkSize comes right after "RIFF" and is stored byte-reversed (little-endian). Un-reversed, the hex value is "079454", which in decimal is 496,724. That is LARGER than the actual file size, let alone the file size minus 8.

Also, SubChunk2Size, right after "data", is 496,686, which is wrong as well. It should be 496,686 - 46 = 496,640, which appears in the file as the bytes "009407", NOT "2E9407".
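
The arithmetic above can be sketched in code. This assumes the 46-byte header used in the calculation above and a single "data" chunk that runs to the end of the file; `WavSizes` and its method names are illustrative only:

```csharp
using System;

static class WavSizes
{
    // ChunkSize: whole file minus the 8 bytes of "RIFF" and ChunkSize itself.
    public static uint ChunkSize(long fileLength)
    {
        return (uint)(fileLength - 8);
    }

    // SubChunk2Size: the audio data only, i.e. file length minus the header.
    public static uint DataSize(long fileLength, int headerSize)
    {
        return (uint)(fileLength - headerSize);
    }

    // Returns the 4 bytes in the order they must appear in the file
    // (least significant byte first).
    public static byte[] ToLittleEndian(uint value)
    {
        return new byte[]
        {
            (byte)(value & 0xFF),
            (byte)((value >> 8) & 0xFF),
            (byte)((value >> 16) & 0xFF),
            (byte)((value >> 24) & 0xFF)
        };
    }
}
```

For this file, `WavSizes.ChunkSize(496686)` gives 496,678 (bytes 26 94 07 00 in the file) and `WavSizes.DataSize(496686, 46)` gives 496,640 (bytes 00 94 07 00).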

Fixed header

Bad Header


Modified working .wav file, works with Media Player:
trying
Hi, your help has been amazing! I am very grateful. Unfortunately, I've gone to a simpler solution, so I am no longer even writing my own header (because we both know I wasn't good at that anyway ;-P). Here is the code now:

http://code.google.com/p/opensourceskyperecorder/
http://opensourceskyperecorder.googlecode.com/svn/trunk/

It is a command line program you run, type the path of the WAV file and it will record the conversation..

Thanks,
TheUberOverlord
QUOTE (trying @ Tue Aug 19 2008, 16:01)
Hi, your help has been amazing! I am very grateful. Unfortunately, I've gone to a simpler solution, so I am no longer even writing my own header (because we both know I wasn't good at that anyway ;-P). Here is the code now:

http://code.google.com/p/opensourceskyperecorder/
http://opensourceskyperecorder.googlecode.com/svn/trunk/

It is a command line program you run, type the path of the WAV file and it will record the conversation..

Thanks,


Well, hopefully, after spending my day on this, it will help somebody.

I would just suggest that next time you please check your work; otherwise it takes HOURS to debug what you seem too busy to do ;)

The only reason I let it get this far is that I am sure this will be helpful to others in the future.

FYI: You can use the Windows Calculator to do the hex/decimal conversions easily. Start Windows Calculator from:

Start -> All Programs -> Accessories -> Calculator

Click on View and set it to Scientific.

Enter a decimal number and select Hex to see the hex value, or enter a hex value and then select decimal. Always remember that the hex bytes are reversed in the .wav file compared with the calculator.
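
The same conversion can be done in code instead of the calculator. A small sketch, where `HexField` and `FromFileBytes` are made-up names and the input is the hex text exactly as a hex editor shows it in the file:

```csharp
using System;
using System.Globalization;

static class HexField
{
    // Takes up to 4 header bytes as shown in a hex editor, e.g. "54 94 07 00",
    // and returns the decimal value, un-reversing the byte order.
    public static uint FromFileBytes(string hexBytes)
    {
        string[] parts = hexBytes.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        uint value = 0;
        for (int i = parts.Length - 1; i >= 0; i--)
        {
            // The last byte in the file is the most significant one.
            value = (value << 8) | byte.Parse(parts[i], NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        }
        return value;
    }
}
```

For example, `HexField.FromFileBytes("54 94 07 00")` returns 496724 and `HexField.FromFileBytes("2E 94 07 00")` returns 496686.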

If anyone has any questions, please ask. This should make it much easier for others to see what is involved in using the Voice API with .wav files and their headers and data formats.