IceLink 3 introduces a new media pipeline that is vastly more powerful than its IceLink 2 predecessor.
Both media stacks are rooted in the concept of a local media stream and one or more remote media streams. The local media stream represents access to local source media, like your webcam or microphone. The remote media stream represents access to incoming media from a remote peer, usually directed to your screen (video) or speakers (audio).
The local media stream can exist independently of any network connections, but is most often attached to a network stream where it feeds outbound media to remote peers. The remote media stream only ever exists within the context of a network stream.
Generally speaking, you have one remote media stream for each connection to a remote peer.
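To make the relationship concrete, here is a rough sketch of how these pieces fit together in IceLink 3, using the LocalMedia and RemoteMedia class names from the chat examples. Constructor arguments are omitted and vary by platform, so treat this as illustrative rather than exact:

var localMedia = new LocalMedia();   // wraps your local sources (webcam, microphone)
var remoteMedia = new RemoteMedia(); // one instance per remote peer

// A stream ties outbound (local) and inbound (remote) media together.
var audioStream = new AudioStream(localMedia, remoteMedia);
var videoStream = new VideoStream(localMedia, remoteMedia);

// One connection per remote peer.
var connection = new Connection(new Stream[] { audioStream, videoStream });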
IceLink 3 vs. IceLink 2 Media Pipelines
A media stream subdivides into media tracks, where each track represents either audio or video.
A local audio track, for example, will usually involve a microphone source and one or more encoders (e.g. Opus) that can compress the raw audio to a size suitable for sending out over a network. On the remote side of things, you would need a decoder that can decompress the sound as well as access to the device audio output to play it.
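In IceLink 3 terms, that description maps onto a pair of tracks, sketched below using the .NET class names that appear later in this post. Opus.Decoder and NAudio.Sink are assumed counterparts to the encoder and source, and audioConfig stands in for whatever audio configuration you are using:

// Local side: capture from the microphone, encode, and packetize for the network.
var localAudioTrack = new AudioTrack(new NAudio.Source(audioConfig))
    .Next(new Opus.Encoder())
    .Next(new Opus.Packetizer());

// Remote side: depacketize, decode, and play out through the device speakers.
var remoteAudioTrack = new AudioTrack(new Opus.Depacketizer())
    .Next(new Opus.Decoder())
    .Next(new NAudio.Sink(audioConfig));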
IceLink 2 is somewhat flexible in allowing you to customize where the local media comes from (the source, or capture provider), where the remote media goes to (the sink, or render provider), and which codecs to use at a global level within your application.
For example, you can switch between using your webcam or screen as the video source, or even configure a particular connection to write its incoming audio to disk instead of rendering to your device speakers.
However, there are a number of use cases that aren’t covered easily by these options, and this is where IceLink 3’s integrated media pipeline really shines.
Each remote media track is built up separately for each connection, allowing you to optimize the available codec selection for each peer. Each element in a given media track is customizable, and new elements can be wired in easily, making extensibility a breeze.
Want to add a watermark, take a snapshot, or do face detection? Easy.
Want to stream media out from an IP camera, resample audio, scale images, or convert between formats? Simple.
The Basics of IceLink 3
Each of the IceLink 3 chat examples comes with LocalMedia and RemoteMedia classes, in source form, that are set up for you and ready to go. You can view the examples by downloading a free trial of IceLink 3 here.
These classes support the “standard” use case - basic audio/video calls and screen sharing:
- One audio track, one video track, or one of each
- Webcam or screen capture
- Recording of audio and/or video
- Support for the standard WebRTC/ORTC audio and video codecs
If what you want to build is a video calling or screen sharing application, then the bulk of the work is already done.
If you want to tweak a few things or add an event handler, then you can probably get away with applying a few minor customizations to these two classes.
For example, say you want to apply a bit of dynamic volume adjustment to the audio source. Just attach an event handler to the source in CreateAudioSource:
protected override AudioSource CreateAudioSource(AudioConfig config)
{
    var source = new NAudio.Source(config);
    source.OnRaiseFrame += (e) =>
    {
        // Raised for each captured audio frame before it enters the pipeline.
        e.Buffer.ApplyGain(2.0); // 1.0 == unity gain
    };
    return source;
}
Want to scale the video source images at runtime? Just modify CreateImageScaler:
private Yuv.ImageScaler _ImageScaler = new Yuv.ImageScaler(1.0); // 1.0 == original size

protected override VideoPipe CreateImageScaler()
{
    return _ImageScaler;
}

public void SetImageScale(double scale)
{
    _ImageScaler.Scale = scale;
}
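With that in place, your application can adjust the outgoing video scale on the fly. For example (assuming your LocalMedia instance is named localMedia):

// Drop the outgoing video to half size, e.g. when bandwidth is constrained.
localMedia.SetImageScale(0.5);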
Thinking about using your IP camera as a video source? Just change CreateVideoSource to pull from the camera's Motion JPEG URL using AForge.NET:
protected override VideoSource CreateVideoSource()
{
    return new AForge.MotionJpegSource(Url);
}
IceLink 3 ensures that the API is the same on each platform, so even though the above snippets use C#, the same functionality is available to your Objective-C, Swift, or Java applications.
Going Deeper
Both LocalMedia and RemoteMedia are wrappers around a WebRTC-compatible AudioTrack and/or VideoTrack.
Because of this, they are also completely optional. IceLink 3 allows you to create your own tracks directly (note that this does not apply to JavaScript, where we are limited to the high-level abstraction built into HTML5 and WebRTC).
For example, if we already have access to a stream of Opus packets to use as an audio source, we can set up an AudioTrack that simply adds an RTP header before streaming out to the remote peer:
var localAudioTrack = new AudioTrack(new CustomOpusSource())
    .Next(new Opus.Packetizer());
…
var audioStream = new AudioStream(localAudioTrack, …);
Perhaps we want our client to record incoming media and not pass it to the device speakers. If we know the remote side supports Opus (e.g. supports WebRTC), we can do this easily:
var remoteAudioTrack = new AudioTrack(new Opus.Depacketizer())
    // Matroska is an open-standard file format
    // that supports Opus, VP8, and H.264.
    .Next(new Matroska.AudioSink("remote.opus"));
…
var audioStream = new AudioStream(..., remoteAudioTrack);
Simple, right?
Deeper Still
Let’s take this a step further. At its core, each MediaTrack is composed of one or more of the following elements:
- Sources (output-only)
- Pipes (input/output)
- Sinks (input-only)
An output element (source/pipe) raises media frames which are then processed by one or more input elements (pipes/sinks). Sources are always initial, in the sense that nothing precedes them, while sinks are always terminal, in the sense that nothing follows them.
Pipes are what go in between - encoders, decoders, image converters, resamplers, etc. Within a given MediaTrack, it is possible to create a branching pipeline. The most common use case for this is to support multiple codecs.
Take a basic WebRTC-compatible audio track and video track for example:
var localAudioTrack = new AudioTrack(new NAudio.Source(...))
    .Next(new[]
    {
        // Branch! Assuming we are capturing at 48,000Hz
        // stereo (the default for Opus), we can feed
        // directly into the Opus encoder. PCMU and PCMA
        // (a.k.a. G.711) both run at 8,000Hz mono, so
        // we will need to downsample before encoding.
        new AudioTrack(new Opus.Encoder())
            .Next(new Opus.Packetizer()),
        new AudioTrack(new SoundConverter(G711.Format.DefaultConfig))
            .Next(new[]
            {
                // Branch! Since PCMU and PCMA both run
                // at 8,000Hz mono, we can use the same
                // sound converter for both.
                new AudioTrack(new Pcmu.Encoder())
                    .Next(new Pcmu.Packetizer()),
                new AudioTrack(new Pcma.Encoder())
                    .Next(new Pcma.Packetizer())
            })
    });
var localPreview = new Wpf.ImageSink();
var localVideoTrack = new VideoTrack(new AForge.CameraSource(...))
    .Next(new[]
    {
        // Branch! The camera source can feed its
        // RGB data directly into a local preview.
        // The VP8 and H.264 video encoders require
        // YUV (I420) as input, so we will need to
        // convert the images before encoding.
        new VideoTrack(localPreview),
        new VideoTrack(new Yuv.ImageConverter(VideoFormat.I420))
            .Next(new[]
            {
                // Branch! Since VP8 and H.264 both
                // take I420 as input, we can use the
                // same image converter for both.
                new VideoTrack(new Vp8.Encoder())
                    .Next(new Vp8.Packetizer()),
                new VideoTrack(new OpenH264.Encoder())
                    .Next(new H264.Packetizer())
            })
    });
IceLink 3 makes use of the phenomenal libyuv library in its ImageConverter to make switching formats fast and easy.
Stay Tuned for More on IceLink 3
In our next blog post, we’ll be looking at building custom media elements that can be used in an IceLink 3 media pipeline, starting with a custom video source.
If you are building your own custom media elements, or are curious about customizing the media pipeline to suit your needs, be sure to contact our support team, who can offer helpful suggestions and make sure your application runs as efficiently as possible!