Audio over TCP
Skype Community > English > Development, Betas and Skype Garage > Public API
Bundabrg
Hi All,

I have had good success getting Skype to pull audio from a WAV file by redirecting the INPUT to that file.

I have now decided to get Skype to connect to my app using the PORT approach instead. To this end, I have directed the INPUT and OUTPUT to two different ports.

To my surprise, in both cases Skype acts as a client and tries to connect to the port (so you need to listen as a server). This seems contrary to some of the posts I've read, unless MIC works the other way round (I haven't tried it, nor do I need to capture it).

This is quite tricky with conference calls in Skype, since you direct one of the 'calls' to a port, and if that participant disconnects, it is hard to know whether to redirect to someone else or not. Ideally the redirects would be done per 'conversation' rather than per call, but I'm guessing conference calling was a later addition.
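A minimal Python sketch of the server side described above, assuming Skype connects to 127.0.0.1 on the port named in the redirect (the host default and the helper name accept_skype_connection are illustrative assumptions, not part of the Skype API):

```python
import socket


def accept_skype_connection(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Listen on host:port and block until Skype connects to it.

    Only one connection per redirect is expected, so the listening
    socket is closed once the connection has been accepted.
    """
    with socket.create_server((host, port)) as server:
        conn, _addr = server.accept()
    return conn
```

Usage would be something like conn = accept_skype_connection(3754), where 3754 stands in for whatever port was given to Skype.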

I can read the audio data fine; in fact my app does a good job of silence detection and isolates sentences very well. For the moment, I'm just dumping them into separate WAV files (with an autogenerated header), but the eventual aim is to pass them through the Julius speech recognition system (I've actually already done this, and it does an OK job). Sphinx isn't bad either.
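For the "autogenerated header" part, Python's standard wave module can wrap raw PCM in a WAV header. This sketch assumes Skype's stream is 16 kHz, 16-bit mono PCM; save_sentence is a hypothetical helper name:

```python
import wave

# Assumed format of Skype's raw audio stream: 16 kHz, 16-bit, mono PCM.
SAMPLE_RATE = 16000
SAMPLE_WIDTH = 2  # bytes per sample
CHANNELS = 1


def save_sentence(path: str, pcm: bytes) -> None:
    """Wrap raw PCM bytes in a WAV header and write them to *path*."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)  # header length fields are patched when the file closes
```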

My issue is playing sound back to the caller. Skype correctly connects to my listener, and I can pass the raw data to it (making sure to skip the first 44 or 46 bytes of WAV header when sending a WAV file). However, there seems to be no limit to how fast I can write to the socket.
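Rather than slicing off a fixed 44 or 46 bytes, a WAV reader can locate the data chunk itself, since the header length varies with an extended fmt chunk or extra chunks such as LIST. A sketch using the wave module, again assuming 16 kHz, 16-bit mono files (read_pcm is a hypothetical name):

```python
import wave


def read_pcm(path: str) -> bytes:
    """Return only the sample data of a WAV file, without its header.

    The wave module walks the RIFF chunks and stops at the data chunk,
    so no fixed header size has to be assumed.
    """
    with wave.open(path, "rb") as wav:
        fmt = (wav.getsampwidth(), wav.getnchannels(), wav.getframerate())
        if fmt != (2, 1, 16000):  # assumed stream format; adjust if different
            raise ValueError(f"expected 16-bit mono 16 kHz PCM, got {fmt}")
        return wav.readframes(wav.getnframes())
```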

I am performing a 'select' on the socket to tell me when it's available for writing, and it comes back ready-to-write nearly every time. This means I can write 30 s of audio in just a few seconds.

What happens, however, is that after about 10 s the audio gets very mushy and garbled, as if Skype's audio buffer is being overrun.

Any thoughts? I'm trying to work out how to slow down my transmission to Skype without it skipping. Adding slight delays seems to make it last longer, but only a bit. As a fallback, I may go back to redirecting INPUT to WAV files.
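One likely explanation: select() reports a socket writable whenever the kernel send buffer has room, which says nothing about how fast Skype actually plays the audio, so writing whenever select() says "ready" can flood it. A common remedy is to pace writes against a clock at the stream's byte rate, staying only a little ahead. A sketch assuming 16 kHz, 16-bit mono PCM (32,000 bytes/s); the chunk size and the 0.5 s lead are guesses to tune, not values from Skype's documentation:

```python
import socket
import time

BYTES_PER_SECOND = 16000 * 2  # assumed 16 kHz, 16-bit mono PCM
CHUNK = 4096                  # bytes per write
LEAD_SECONDS = 0.5            # how far ahead of real time we allow ourselves to get


def send_paced(sock: socket.socket, pcm: bytes,
               clock=time.monotonic, sleep=time.sleep) -> None:
    """Send *pcm* no faster than real time, staying at most LEAD_SECONDS ahead."""
    start = clock()
    sent = 0
    while sent < len(pcm):
        # Wall-clock moment at which the audio sent so far would be played,
        # minus the allowed lead; wait until then before writing more.
        due = start + sent / BYTES_PER_SECOND - LEAD_SECONDS
        delay = due - clock()
        if delay > 0:
            sleep(delay)
        chunk = pcm[sent:sent + CHUNK]
        sock.sendall(chunk)
        sent += len(chunk)
```

The clock and sleep parameters exist only so the pacing can be tested without waiting in real time.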

Keep in mind I'm using Skype for Linux, so for all I know this issue is resolved in the far newer Windows version.

Thanks,
Bundabrg
TheUberOverlord
Not sure how you got this to work with server sockets only; I would double-check.

When receiving TCP/IP data from the Skype client, you should be acting like a Server ("Output"), and when sending TCP/IP data to the Skype client ("Input"), you should be acting as a Client.
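A sketch of the client direction as described here, connecting to a port on which Skype is listening (connect_to_skype and the localhost default are illustrative assumptions):

```python
import socket


def connect_to_skype(port: int, host: str = "127.0.0.1",
                     timeout: float = 5.0) -> socket.socket:
    """Open a client connection to a port on which Skype is listening."""
    # The timeout applies only to establishing the connection.
    sock = socket.create_connection((host, port), timeout=timeout)
    sock.settimeout(None)  # back to blocking mode for continuous streaming
    return sock
```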

Personally, I use a buffer size of 8192 when receiving data from the Skype client.
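A receive loop along these lines reads in 8192-byte buffers until Skype closes the connection. It is a sketch assuming 16-bit samples; iter_audio is a hypothetical name:

```python
import socket
from typing import Iterator

RECV_SIZE = 8192  # buffer size suggested above


def iter_audio(sock: socket.socket) -> Iterator[bytes]:
    """Yield raw PCM blocks from Skype until it closes the connection.

    recv() may return fewer than RECV_SIZE bytes, and an odd byte count
    would split a 16-bit sample, so a trailing half-sample is held back
    and prepended to the next block.
    """
    leftover = b""
    while True:
        data = sock.recv(RECV_SIZE)
        if not data:  # empty read: the peer closed the connection
            break
        data = leftover + data
        cut = len(data) - (len(data) % 2)
        leftover = data[cut:]
        if cut:
            yield data[:cut]
```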

When sending to the Skype client, I would use a buffer size of 4096 and send as often as possible without impacting your processing flow, so that there is always data to process.

You can only use one Voice API method at a time, so you can't use the same Voice API on multiple calls. You can use the GET method to see which call is doing what, for example for Input/Output.
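Since only one redirect can be active at a time, an application handling a conference would need to track which call currently owns it and pick another when that call ends. A hypothetical bookkeeping class for that decision; the actual Skype commands that apply or query the redirect are deliberately left out, and all names here are illustrative:

```python
from typing import List, Optional


class RedirectOwner:
    """Track which call in a conference currently holds the audio redirect."""

    def __init__(self) -> None:
        self.active: List[str] = []       # call IDs in the order they started
        self.owner: Optional[str] = None  # call that currently has the redirect

    def call_started(self, call_id: str) -> Optional[str]:
        """Return call_id if it should receive the redirect now, else None."""
        self.active.append(call_id)
        if self.owner is None:
            self.owner = call_id
            return call_id
        return None

    def call_ended(self, call_id: str) -> Optional[str]:
        """Return the call that should receive the redirect next, if any."""
        if call_id in self.active:
            self.active.remove(call_id)
        if call_id == self.owner:
            # Hand the redirect to the longest-running remaining call.
            self.owner = self.active[0] if self.active else None
            return self.owner
        return None
```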

In Windows, this process is easier to do, I think:

Using an Asynchronous Server Socket:

http://msdn.microsoft.com/en-us/library/5w7b7x5f.aspx

Using an Asynchronous Client Socket:

http://msdn.microsoft.com/en-us/library/bbx2eya8.aspx

The trick when sending data to the client is to always have some data ready for the client to process.
Bundabrg
Definitely, the Linux Skype connects to both the input and output sockets as a client (I have to be the server). Possibly something they changed later?

I use 4096 for receiving (mainly because I want about 4*4096 bytes, i.e. four packets, to detect silence, and to keep about 3*4096 bytes of history in the buffer to tack onto the end of the saved data).
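One possible reading of this scheme as code: an RMS energy threshold per 4096-byte block, a sentence ending after four consecutive quiet blocks, and up to three quiet blocks from just before the onset kept as pre-roll so the start of a sentence is not clipped. The threshold value, the pre-roll interpretation, and all names are assumptions, and blocks are assumed to hold whole 16-bit little-endian samples:

```python
import array
import math
import sys
from collections import deque
from typing import Deque, Iterable, Iterator, List

BLOCK = 4096          # bytes per block (2048 samples at 16-bit)
SILENT_BLOCKS = 4     # this many quiet blocks in a row ends a sentence
PRE_ROLL_BLOCKS = 3   # quiet blocks kept from just before speech starts
THRESHOLD = 500.0     # RMS level treated as silence; tune for your line


def rms(block: bytes) -> float:
    """Root-mean-square level of a block of 16-bit little-endian samples."""
    samples = array.array("h", block)  # len(block) must be even
    if sys.byteorder == "big":
        samples.byteswap()
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def sentences(blocks: Iterable[bytes]) -> Iterator[bytes]:
    """Group PCM blocks into sentences separated by runs of silence."""
    pre_roll: Deque[bytes] = deque(maxlen=PRE_ROLL_BLOCKS)
    current: List[bytes] = []
    quiet = 0
    for block in blocks:
        loud = rms(block) >= THRESHOLD
        if not current:
            if loud:  # speech starts: include the recent quiet blocks
                current = list(pre_roll) + [block]
                pre_roll.clear()
                quiet = 0
            else:
                pre_roll.append(block)
            continue
        current.append(block)
        quiet = 0 if loud else quiet + 1
        if quiet >= SILENT_BLOCKS:
            yield b"".join(current)
            current, quiet = [], 0
    if current:  # stream ended mid-sentence
        yield b"".join(current)
```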

I use 8196 for sending, though I have dropped that to 256 and raised it to 16K without any difference. I even ended up using a blocking socket in its own thread, and the sound still gets really messed up after about 10 seconds of continuous audio. I think it's probably a bug in Skype for Linux that is fixed in the later Windows releases.

I'll probably end up using WAV file redirects instead, though I have an issue where 'OnInputStatus' is sometimes triggered too quickly when a person joins a chat, so the first sentence may be cut off. I'm sure I can work around that, perhaps by delaying a tiny bit.
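For reference, a blocking sender thread like the one described might look like this sketch (start_sender is a hypothetical name). On its own it does not limit the rate: sendall() returns as soon as the kernel has buffered the data, which is consistent with the blocking socket making no difference here, so the writes would still need to be paced to the playback rate:

```python
import queue
import socket
import threading
from typing import Optional, Tuple


def start_sender(sock: socket.socket) -> Tuple["queue.Queue[Optional[bytes]]", threading.Thread]:
    """Start a thread that writes queued PCM chunks to *sock*.

    Put bytes on the returned queue to send them; put None to stop the thread.
    """
    q: "queue.Queue[Optional[bytes]]" = queue.Queue()

    def run() -> None:
        while True:
            chunk = q.get()
            if chunk is None:
                break
            sock.sendall(chunk)  # blocks only while the kernel send buffer is full

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return q, thread
```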

For an example of the sound issue, try calling 'lsf.chmmr' and then send a text chat to it saying 'what is aiml'. The response is long enough to show the issue. It sounds a bit demonic.

Thanks,
- Bundabrg
TheUberOverlord
I am checking with Skype to see what is different with the Linux API in your case.
Bundabrg
Muchly appreciated.

- Bundabrg
TheUberOverlord
Well, if it is working as a server socket connection, then to me at least, that's odd.

I am wondering whether there is anything you haven't tried yet that might make it a client socket connection. What happens when you try to do this?

I can't reproduce it, because all my systems are currently Windows-based, but I do wonder whether acting as a server could be causing or contributing to the delay.
Bundabrg
I'll write a small proof of concept program and will get back with my results of running it under both a Linux and a Windows environment (and will post the code here).

- Bundabrg
TheUberOverlord
Thanks