# Migration Guide
This guide details the main steps to migrate between major release versions.
## Migrate from 1.x to 2.x

The 2.0 release introduces significant API changes compared to the 1.0 release, in order to support multiple media tracks per peer connection.
### Old C++ library

Previously, in the 1.x release, the `Microsoft.MixedReality.WebRTC.Native.dll` module acted both as the implementation DLL for the C# library and as a C++ library for direct use in end-user C++ applications. This dual usage imposed diverging constraints which made the internal implementation unnecessarily complex.

Starting from the 2.0 release, this module exposes a pure C API providing the core implementation of MixedReality-WebRTC. This library can be used from C/C++ programs with ease, as the use of a C API:

- allows use from C programs, which was not previously possible;
- sidesteps C++ complications with DLLs (destructors, templates, inlining, etc.).

The library has been renamed to `mrwebrtc.dll` (`libmrwebrtc.so` on Android) to emphasize this change and the fact that there is no C++ library anymore, only a C library.
### C# library

The C# library exposes a transceiver API very similar to the one found in the WebRTC 1.0 standard, so familiarity with that standard helps in understanding the API model. The API is not guaranteed to follow the standard exactly, but generally stays close to it.
#### Standalone audio and video sources

Audio and video sources (webcam, microphone, external callback-based source) are now standalone objects not tied to any peer connection. These objects, called track sources, can be reused by many audio and video tracks, which enables usage scenarios such as sharing a single webcam or microphone device among multiple peer connections.

- Users must create the audio and video track source objects explicitly, independently of the peer connection (see the sketch after this list).
  - Device-based (microphone) audio track sources are created with the class method `DeviceAudioTrackSource.CreateAsync()`.
  - Video track sources are created with class methods such as `DeviceVideoTrackSource.CreateAsync()` for device-based sources (webcam) or `ExternalVideoTrackSource.CreateFromI420ACallback()` for custom callback-based sources.
- Users own those track source objects and must ensure they stay alive while in use by any audio or video track, and are properly disposed of after use (`IDisposable`).
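For illustration, a minimal sketch of creating standalone track sources with the 2.0 C# library (method names as above; error handling omitted):

```csharp
using System.Threading.Tasks;
using Microsoft.MixedReality.WebRTC;

public static class TrackSourceExample
{
    public static async Task RunAsync()
    {
        // Standalone sources, created independently of any peer connection;
        // a single source can feed tracks on multiple peer connections.
        DeviceAudioTrackSource microphoneSource = await DeviceAudioTrackSource.CreateAsync();
        DeviceVideoTrackSource webcamSource = await DeviceVideoTrackSource.CreateAsync();

        // ... create tracks from the sources (see the next section) ...

        // The user owns the sources: dispose of them once no track uses them.
        webcamSource.Dispose();
        microphoneSource.Dispose();
    }
}
```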
#### Standalone local track objects

Audio and video tracks are now standalone objects, not owned by a peer connection. They are initially not tied to any peer connection, and become bound to one on first use.

- Users must create the audio and video track objects explicitly, independently of the peer connection. These tracks are created from a track source (see above), as shown in the sketch after this list.
  - Audio tracks are created with `LocalAudioTrack.CreateFromSource()`.
  - Video tracks are created with `LocalVideoTrack.CreateFromSource()`.
- Users own those track objects and must ensure they stay alive while in use by the peer connection they are bound to on first use (when first assigned to a transceiver; see below), and are properly disposed of after use (`IDisposable`).

Note that remote tracks remain owned by the peer connection which created them in response to an SDP offer or answer being received and applied, as in the previous 1.0 API.
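Continuing the sketch above, a hedged example of creating local tracks from the sources (the `trackName` values are arbitrary; the init-config types are assumed from the 2.0 C# API):

```csharp
// Create standalone local tracks from the existing sources. The tracks are
// bound to a peer connection only when first assigned to a transceiver.
LocalAudioTrack audioTrack = LocalAudioTrack.CreateFromSource(microphoneSource,
    new LocalAudioTrackInitConfig { trackName = "microphone_track" });
LocalVideoTrack videoTrack = LocalVideoTrack.CreateFromSource(webcamSource,
    new LocalVideoTrackInitConfig { trackName = "webcam_track" });

// The user owns the tracks: dispose of them after use, before their sources.
videoTrack.Dispose();
audioTrack.Dispose();
```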
#### Transceivers

Previously, in the 1.0 API, the peer connection was based on an API similar to the track-based API of the pre-standard WebRTC specification. The 2.0 release introduces a different, transceiver-based API for manipulating audio and video tracks, which is more closely based on the WebRTC 1.0 standard. A sketch follows this list.

- A transceiver is a "media pipe" in charge of the encoding and transport of some audio or video tracks.
- Each transceiver has a media kind (audio or video), a sender track slot, and a receiver track slot. Audio tracks can be attached to audio transceivers (transceivers with a `Transceiver.MediaKind` property equal to `MediaKind.Audio`). Conversely, video tracks can be attached to video transceivers (`MediaKind.Video`).
- An empty sender track slot on a transceiver makes it send (if its direction includes sending) empty data, that is, black frames for video or silence for audio. An empty receiver track slot on a transceiver means the received media data, if any (depending on direction), is discarded by the implementation.
- A peer connection owns an ordered collection of audio and video transceivers. Users must create a transceiver with `PeerConnection.AddTransceiver()`. Transceivers cannot be removed; they stay attached to the peer connection until that peer connection is destroyed.
- Transceivers have a media direction which indicates whether they are currently sending and/or receiving media from the remote peer. This direction can be set by the user by changing the `Transceiver.DesiredDirection` property.
- Changing a transceiver direction requires an SDP session renegotiation; therefore changing the value of the `Transceiver.DesiredDirection` property raises a `PeerConnection.RenegotiationNeeded` event. After the session has been renegotiated, the negotiated direction can be read from the read-only `Transceiver.NegotiatedDirection` property.
- Media tracks are attached to and removed from transceivers. Unlike in 1.0, this does not require any session negotiation: tracks can be transparently (from the point of view of the SDP session) attached to a transceiver, detached from it, attached to a different transceiver, etc., without any of these operations raising a `PeerConnection.RenegotiationNeeded` event.
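For illustration, a minimal sketch of a transceiver setup, continuing the track examples above (`Transceiver.Direction` and the properties used here are from the 2.0 C# API):

```csharp
// Add an audio transceiver; it stays attached until the peer connection
// is destroyed. Attaching or detaching tracks does not trigger renegotiation.
Transceiver audioTransceiver = peerConnection.AddTransceiver(MediaKind.Audio);
audioTransceiver.LocalAudioTrack = audioTrack;

// Changing the desired direction raises PeerConnection.RenegotiationNeeded;
// after renegotiation, Transceiver.NegotiatedDirection reflects the result.
audioTransceiver.DesiredDirection = Transceiver.Direction.SendReceive;
```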
A typical workflow with the transceiver API is as follows (a sketch of the answering-peer side follows this list):

1. The offering peer creates some transceivers and creates an SDP offer.
2. The offer is sent to the answering peer.
3. The answering peer accepts the offer; this automatically creates the transceivers present in the offer, which the offering peer created in step 1.
4. The answering peer optionally adds more transceivers beyond those already existing.
5. The answering peer creates an SDP answer.
6. The answer is sent back to the offering peer.
7. The offering peer accepts the answer; this automatically creates any additional transceivers that the answering peer added in step 4.
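As a hedged sketch of steps 3-5 on the answering peer, assuming the remote offer has arrived as an `SdpMessage` through the application's signaling channel (error handling omitted):

```csharp
// Step 3: applying the remote offer automatically creates the transceivers
// the offering peer added before creating its offer.
await peerConnection.SetRemoteDescriptionAsync(remoteOffer);

// Attach a local track to the matching auto-created transceiver; here the
// peers implicitly agree that the first transceiver carries microphone audio.
Transceiver audioTransceiver = peerConnection.Transceivers[0];
audioTransceiver.LocalAudioTrack = audioTrack;

// Step 5: create the SDP answer, delivered asynchronously through the
// PeerConnection.LocalSdpReadytoSend event.
peerConnection.CreateAnswer();
```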
Migrating from the 1.x release, users typically:

- On the offering peer, replace calls to `PeerConnection.AddLocalAudioTrack()` with (see the sketch after this list):
  - a call to `DeviceAudioTrackSource.CreateAsync()` to create the device-based (microphone) audio track source;
  - a call to `LocalAudioTrack.CreateFromSource()` to create the audio track that will bind the microphone audio of that source to the transceiver sending it to the remote peer;
  - a call to `PeerConnection.AddTransceiver()` to add an audio transceiver;
  - assigning the `Transceiver.LocalAudioTrack` property to the audio track;
  - setting the `Transceiver.DesiredDirection` to `Direction.SendReceive` or `Direction.SendOnly`, depending on whether they expect to also receive an audio track from the remote peer.
- On the answering peer, replace calls to `PeerConnection.AddLocalAudioTrack()` in the same way as on the offering peer, creating a track source and a track. However, do not immediately call `PeerConnection.AddTransceiver()`; instead, wait for the offer to create the transceiver. This requires some coordination, either implicit (pre-established transceiver order) or explicit (user communication between the two peers, for example using data channels), to determine on each peer which transceiver to use for which track. Note that the Unity library uses implicit coordination via media lines (declarative model).
- Proceed similarly for video tracks.
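Putting those steps together, a hedged sketch of the offering-peer replacement (the track name is arbitrary; `LocalAudioTrackInitConfig` is assumed from the 2.0 C# API):

```csharp
// 1.x: peerConnection.AddLocalAudioTrack(...) created and added the track.
// 2.x equivalent:
DeviceAudioTrackSource microphoneSource = await DeviceAudioTrackSource.CreateAsync();
LocalAudioTrack audioTrack = LocalAudioTrack.CreateFromSource(microphoneSource,
    new LocalAudioTrackInitConfig { trackName = "microphone_track" });
Transceiver audioTransceiver = peerConnection.AddTransceiver(MediaKind.Audio);
audioTransceiver.LocalAudioTrack = audioTrack;
audioTransceiver.DesiredDirection = Transceiver.Direction.SendOnly; // or SendReceive
```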
#### Signaling

- SDP messages and ICE candidates are now encapsulated in dedicated data structures for clarity (`SdpMessage` and `IceCandidate`), as sketched below.
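For illustration, a minimal sketch of plugging these structures into a custom signaling solution (`SendSdpToRemotePeer` and `SendIceToRemotePeer` are hypothetical application helpers; the events and types are from the 2.0 C# API):

```csharp
// Outgoing: forward locally generated messages to the remote peer.
peerConnection.LocalSdpReadytoSend += (SdpMessage message) =>
    SendSdpToRemotePeer(message.Type, message.Content);
peerConnection.IceCandidateReadytoSend += (IceCandidate candidate) =>
    SendIceToRemotePeer(candidate.SdpMid, candidate.SdpMlineIndex, candidate.Content);

// Incoming: rebuild the data structures from the received fields and apply them.
await peerConnection.SetRemoteDescriptionAsync(
    new SdpMessage { Type = SdpMessageType.Offer, Content = receivedSdpContent });
peerConnection.AddIceCandidate(new IceCandidate
{
    SdpMid = receivedSdpMid,
    SdpMlineIndex = receivedMlineIndex,
    Content = receivedCandidateContent
});
```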
### Unity integration

Unlike the C# library, which stays close to the WebRTC 1.0 standard in terms of transceiver behavior, the Unity integration takes a step back and builds some convenience features on top of the transceiver API of the C# library. This spares the user from manually pairing transceivers and tracks, relying instead on a much simpler declarative model.

Users are encouraged to follow the updated Unity tutorial to understand how to set up a peer connection with this new API.
#### Peer connection

The `PeerConnection` component now holds a collection of media lines, which can be described as a sort of "transceiver intent". These describe the final result the user intends to produce, in terms of transceivers and their attached tracks, after an SDP session is fully negotiated. The component internally manages adding transceivers when needed to match the user's media line description, as well as creating and destroying local sender tracks when a source is assigned to or removed from a media line.
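For illustration, a hedged sketch of declaring a media line from code (media lines can also be configured in the Unity Inspector; `AddMediaLine` and the `Source`/`Receiver` properties are assumed from the 2.0 Unity library):

```csharp
using Microsoft.MixedReality.WebRTC.Unity;
using UnityEngine;

public class VideoMediaLineSetup : MonoBehaviour
{
    public PeerConnection peerConnection; // the Unity component
    public WebcamSource webcamSource;     // local video track source component
    public VideoReceiver videoReceiver;   // remote video receiver component

    void Awake()
    {
        // Declare the intent: one video transceiver sending the webcam video
        // and receiving remote video. The component creates and pairs the
        // underlying transceiver during negotiation.
        MediaLine videoLine = peerConnection.AddMediaLine(
            Microsoft.MixedReality.WebRTC.MediaKind.Video);
        videoLine.Source = webcamSource;
        videoLine.Receiver = videoReceiver;
    }
}
```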
#### Signaler

- The `PeerConnection.Signaler` property has been removed; the `Signaler` component is now only a helper for creating custom signaling solutions, and is not required anymore.
- As a result, the `Signaler` component now has a `Signaler.PeerConnection` property which must be set by the user, as sketched below. This is true in particular for the `NodeDssSignaler` component, which derives from the `Signaler` abstract base component.
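A minimal sketch of that wiring from code (this can also be set in the Inspector; the snippet assumes both components live on the same GameObject):

```csharp
using Microsoft.MixedReality.WebRTC.Unity;
using UnityEngine;

public class SignalerSetup : MonoBehaviour
{
    void Awake()
    {
        // The signaler no longer reaches the peer connection through a
        // PeerConnection.Signaler property; wire it up explicitly instead.
        var signaler = GetComponent<NodeDssSignaler>();
        signaler.PeerConnection = GetComponent<PeerConnection>();
    }
}
```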
#### Video

- The `LocalVideoSource` component, which was previously using a webcam to capture video frames, has been renamed to the more explicit `WebcamSource` component.
- The `WebcamSource` component derives from the abstract `VideoTrackSource` component, which is also the base class for callback-based sources like the `SceneVideoSource` component (previously `SceneVideoSender`), which captures its video frames from the rendering of any Unity Camera component.
- The `RemoteVideoSource` component has been renamed to the `VideoReceiver` component.
- The `VideoSource` component has been split into `VideoTrackSource` for local sources and `VideoReceiver` for remote sources.
- The `MediaPlayer` component has been renamed to the `VideoRenderer` component for clarity, as it only deals with video rendering, not audio. This also prevents a name collision with the Unity built-in component. This component can render any video source exposed as an `IVideoSource`, most notably `VideoTrackSource` (local video) and `VideoReceiver` (remote video); see the sketch after this list.
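As a hedged sketch, wiring a remote video feed to the renderer from code (this is typically configured through the receiver's events in the Inspector; the `VideoStreamStarted`/`VideoStreamStopped` events and `StartRendering`/`StopRendering` methods are assumed from the 2.0 Unity library):

```csharp
using Microsoft.MixedReality.WebRTC.Unity;
using UnityEngine;

public class RemoteVideoSetup : MonoBehaviour
{
    public VideoReceiver videoReceiver;   // remote video source
    public VideoRenderer videoRenderer;   // formerly MediaPlayer

    void Awake()
    {
        // Start/stop rendering when the remote video stream starts/stops.
        // The same pattern works with a VideoTrackSource for local feedback.
        videoReceiver.VideoStreamStarted.AddListener(videoRenderer.StartRendering);
        videoReceiver.VideoStreamStopped.AddListener(videoRenderer.StopRendering);
    }
}
```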
#### Audio

- The `LocalAudioSource` component, which was previously using a microphone to capture audio frames, has been renamed to the more explicit `MicrophoneSource` component.
- The `MicrophoneSource` component derives from the abstract `AudioTrackSource` component, to enable future custom usages and for symmetry with video.
- The `RemoteAudioSource` component has been renamed to the `AudioReceiver` component. This component now forwards its audio to a local Unity `AudioSource` component on the same `GameObject`, which allows the audio to be injected into the Unity DSP pipeline and mixed with other audio sources from the scene. This in particular enables usage scenarios such as spatial audio (3D localization of audio sources for increased audio immersion); see the sketch after this list.
- The `AudioSource` component has been split into `AudioTrackSource` for local sources and `AudioReceiver` for remote sources.
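For illustration, a hedged sketch of preparing a GameObject for spatialized remote audio (standard Unity `AudioSource` settings; the `AudioReceiver` must still be paired with an audio media line on the `PeerConnection` component, as described above):

```csharp
using Microsoft.MixedReality.WebRTC.Unity;
using UnityEngine;

public class RemoteAudioSetup : MonoBehaviour
{
    void Awake()
    {
        // AudioReceiver forwards remote audio to the Unity AudioSource on the
        // same GameObject, injecting it into the Unity DSP pipeline.
        var unityAudioSource = gameObject.AddComponent<AudioSource>();
        unityAudioSource.spatialBlend = 1.0f; // fully 3D: enables spatial audio

        gameObject.AddComponent<AudioReceiver>();
    }
}
```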