RTP --- Real-time Transport Protocol
For the future Integrated Services Internet will provide means to transmit real-time multimedia data across networks. RSVP, RTP, RTCP and RTSP are the foundation of real-time services. RTP is the transport protocol for real-time data. It provides timestamp, sequence number and other means to handle the timing issues in real-time data transport. It relies on RVSP for resource reservation to provide quality of service.
Real-time transport protocol (RTP) is an IP-based protocol providing support for the transport of real-time data such as video and audio streams. The services provided by RTP include time reconstruction, loss detection, security and content identification. RTP is primarily designed for multicast of real-time data, but it can be also used in unicast. It can be used for one-way transport such as video-on-demand as well as interactive services such as Internet telephony.
RTP is designed to work in conjunction with the auxiliary control protocol RTCP to get feedback on quality of data transmission and information about participants in the on-going session.
Attempts to send voice over networks began in early 70's. Several patents on packet transmission of speech, time stamp and sequence numbering were granted in 70's and 80's. In 1991, a series of voice experiments were completed on DARTnet. In August 1991, the Network Research Group of Lawrence Berkeley National Laboratory released an audio conference tool vat for DARTnet use. The protocol used was referred later as RTP version 0.
In December 1992 Henning Schulzrinne, GMD Berlin, published RTP version 1. It went through several states of Internet Drafts and was finally approved as an Proposed Standard on November 22, 1995 by the IEFT/IESG. This version was called RTP version 2 and was published as
- RFC 1889, RTP: A Transport Protocol for Real-Time Applications
- RFC 1890, RTP Profile for Audio and Video Conferences with Minimal Control
On January 31, 1996, Netscape announced "Netscape LiveMedia" based on RTP and other standards. Microsoft claims that their NetMeeting Conferencing Software supports RTP. The latest extensions have been made by an industry alliance around Netscape Inc., who uses RTP as the basis of the Real Time Streaming Protocol RTSP.
- RFC References
- RFC 4103, RTP Payload Format for Text Conversation
- RFC 3984, RTP Payload Format for H.264 Video
- RFC 3640, RTP Payload Format for Transport of MPEG-4 Elementary Streams
- RFC 3267, Real-Time Transport Protocol (RTP) Payload Format and File Storage for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codec
- RFC 3016, RTP Payload Format for MPEG-4 Audio/Visual Streams
- RFC 3551, Standard 65, RTP Profile for Audio and Video Conferences with Minimal Control
- RFC 3550, Standard 64, RTP: A Transport Protocol for Real-Time Applications
- RFC 2250, Proposed Standard, RTP Payload Format for MPEG1/MPEG2 Video
- How does RTP Work
As discussed in the first section, Internt is a shared datagram network. Packets sent on the Internet have unpredictable delay and jitter. But multimedia applications require appropriate timing in data transmission and playing back. RTP provides time stamping, sequence numbering, and other mechanisms to take care of the timing issues. Through these mechanisms, RTP provides end-to-end transport for real-time data over datagram network.
Timestamping is the most important information for real-time applications. The sender sets the timestamp according to the instant the first octet in the packet was sampled. Timestamps increase by the time covered by a packet. After receiving the data packets, the receiver uses the timestamp to reconstruct the original timing in order to play out the data in correct rate. Timestamp is also used to synchronize different streams with timing properties, such as audio and video data in MPEG. However, RTP itself is not responsible for the synchronization. This has to be done in the application level.
UDP does not deliver packets in timely order, so sequence numbers are used to place the incoming data packets in the correct order. They are also used for packet loss detection. Notice that in some video format, when a video frame is split into several RTP packets, all of them can have the same timestamp. So just timestamp is not enough to put the packets in order.
The payload type identifier specifies the payload format as well as the encoding/compression schemes. From this payload type identifier, the receiving application knows how to interpret and play out the payload data. Default payload types are defined in RFC 1890. Example specifications include PCM, MPEG1/MPEG2 audio and video, JPEG video, Sun CellB video, H.261 video streams, et al. More payload types can be added by providing a profile and payload format specification. At any given time of transmission, an RTP sender can only send one type of payload, although the payload type may change during transmission, for example, to adjust to network congestion.
Another function is source identification. It allows the receiving application to know where the data is coming from. For example, in an audio conference, from the source identifier a user could tell who is talking.
The above mechanisms are implemented through the RTP header. Figure 1 shows a RTP packet encapsulated in a UDP/IP packet.
Figure 1: RTP data in an IP packet
RTP is typically run on top of UDP to make use of its multiplexing and checksum functions. TCP and UDP are two most commonly used transport protocols on the Internet. TCP provides a connection-oriented and reliable flow between two hosts, while UDP provides a connectionless but unreliable datagram service over the network. UDP was chosen as the target transport protocol for RTP because of two reasons. First, RTP is primarily designed for multicast, the connection-oriented TCP does not scale well and therefore is not suitable. Second, for real-time data, reliability is not as important as timely delivery. Even more, reliable transmission provided by retransmission as in TCP is not desirable. For example, in network congestion, some packets might get lost and the application would result in lower but acceptable quality. If the protocol insists a reliable transmission, the retransmitted packets could possibly increase the delay, jam the network, and eventually starve the receiving application.
RTP and RTCP packets are usually transmitted using UDP/IP service. However, efforts have been made to make them transport-independent so they can be also run on CLNP(Connectionless Network Protocol), IPX (Internetwork Packet Exchange), AAL5/ATM or other protocols.
In practice, RTP is usually implemented within the application. Many issues, such as lost packet recovery and congestion control, have to be implemented in the application level. To set up an RTP session, the application defines a particular pair of destination transport addresses (one network address plus a pair of ports for RTP and RTCP). In a multimedia session, each medium is carried in a separate RTP session, with its own RTCP packets reporting the reception quality for that session. For example, audio and video would travel on separate RTP sessions, enabling a receiver to select whether or not to receive a particular medium.
An audio-conferencing scenario presented in RFC 1889 illustrates the use of RTP. Suppose each participant sends audio data in segments of 20 ms duration. Each segment of audio data is preceded by an RTP header, and then the resulting RTP message is placed in a UDP packet. The RTP header indicates the type of audio encoding that is used, e.g., PCM. Users can opt to change the encoding during a conference in reaction to network congestion or, for example, to accommodate low-bandwidth requirements of a new conference participant. Timing information and a sequence number in the RTP header are used by the receivers to reconstruct the timing produced by the source, so that in this example, audio segments are contiguously played out at the receiver every 20 ms.
- RTP Fixed Header Fields
The RTP header has the following format:
Figure 2: RTP Packet Header
The RTP header has a minimum size of 12 bytes. After the header, optional header extensions may be present. This is followed by the RTP payload, the format of which is determined by the particular class of application. The fields in the header are as follows:
Version: (2 bits) Indicates the version of the protocol. Current version is 2.
P (Padding): (1 bit) Used to indicate if there are extra padding bytes at the end of the RTP packet. Padding might be used to fill up a block of certain size, for example as required by an encryption algorithm.
X (Extension): (1 bit) Indicates presence of an Extension header between standard header and payload data. This is application or profile specific.
CC (CSRC Count): (4 bits) Contains the number of CSRC identifiers (defined below) that follow the fixed header.
M (Marker): (1 bit) Used at the application level and defined by a profile. If it is set, it means that the current data has some special relevance for the application.
PT (Payload Type): (7 bits) Indicates the format of the payload and determines its interpretation by the application. This is specified by an RTP profile. For example, see RTP Profile for audio and video conferences with minimal control (RFC 3551).
Sequence Number: (16 bits) The sequence number is incremented by one for each RTP data packet sent and is to be used by the receiver to detect packet loss and to restore packet sequence. The RTP does not take any action when it sees a packet loss, but it is left to the application to take the desired action. For example, video applications may play the last known frame in place of the missing frame. According to RFC 3550, the initial value of the sequence number should be random to make known-plaintext attacks on encryption more difficult. RTP provides no guarantee of delivery, but the presence of sequence numbers makes it possible to detect missing packets.
Timestamp: (32 bits) Used to enable the receiver to play back the received samples at appropriate intervals. When several media streams are present, the timestamps are independent in each stream, and may not be relied upon for media synchronization. The granularity of the timing is application specific. For example, an audio application that samples data once every 125 µs (8 kHz, a common sample rate in digital telephony) could use that value as its clock resolution. The clock granularity is one of the details that is specified in the RTP profile or payload format for an application.
SSRC: (32 bits) Synchronization source identifier uniquely identifies the source of a stream. The synchronization sources within the same RTP session will be unique.
CSRC: Contributing source IDs enumerate contributing sources to a stream which has been generated from multiple sources.
Extension header: (optional) The first 32-bit word contains a profile-specific identifier (16 bits) and a length specifier (16 bits) that indicates the length of the extension (EHL=extension header length) in 32-bit units, excluding the 32 bits of the extension header.
- Profiles and Payload formats
One of the design considerations of the RTP was to support a range of multimedia formats (such as H.264, MPEG-4, MJPEG, MPEG, etc.) and allow new formats to be added without revising the RTP standard. The design of RTP is based on the architectural principle known as Application Level Framing (ALF). The information required by a specific application needs are not present in the generic RTP header and are specified by RTP Profiles and Payload formats. For each class of application (e.g., audio, video), RTP defines a profile and one or more associated payload formats.
The Profile defines the codecs used to encode the payload data and their mapping to payload format codes in the Payload Type (PT) field of the RTP header (see below). Each profile is accompanied by several payload format specifications, each of which describes the transport of a particular encoded data. Some of the audio payload formats include: G.711, G.723, G.726, G.729, GSM, QCELP, MP3, DTMF etc., and some of the video payload formats include: H.261, H.263, H.264, MPEG etc.
Examples of RTP Profiles include:
- The RTP profile for Audio and video conferences with minimal control (RFC 3551) defines a set of static payload type assignments, and a mechanism for mapping between a payload format, and a payload type identifier (in header) using Session Description Protocol (SDP).
- The Secure Real-time Transport Protocol (SRTP) (RFC 3711) defines a profile of RTP that provides cryptographic services for the transfer of payload data.
A complete specification of RTP for a particular application usage will require a profile and/or payload format specification(s).
- RTP-Based Systems
A complete network based system will include other protocols and standards in conjunction with RTP. Protocols like SIP, RTSP, H.225 and H.245 are used for session initiation, control and termination. Other standards like H.264, MPEG, H.263 etc., are used to encode the payload data (specified via RTP Profile).
An RTP sender captures the multimedia data, which are then encoded as frames and transmitted as RTP packets, with appropriate timestamps and increasing sequence numbers. Depending on the RTP Profile in use, the Payload Type field is set. The RTP receiver, captures the RTP packets, and may perform reordering of packets, which may have resulted because of the underlying IP network and the frames are decoded depending on the payload format and presented to the end user.
- RTP Features
RTP provides end-to-end delivery services for data with real-time characteristics, such as interactive audio and video. But RTP itself does not provide any mechanism to ensure timely delivery. It needs support from lower layers that actually have control over resources in switches and routers. RTP depends on RSVP to reserve resources and to provide the requested quality of service.
RTP doesn't assume anything about the underlying network, except that it provides framing. RTP is typically run on the top of UDP to make use of its multiplexing and checksum service, but efforts have been made to make RTP compatible with other transport protocols, such as ATM AAL5 and IPv6.
Unlike usual data transmission, RTP does not offer any form of reliability or flow/congestion control. It provides timestamps, sequence numbers as hooks for adding reliability and flow/congestion control, but how to implement is totally left to the application.
RTP is a protocol framework that is deliberately not complete. It is open to new payload formats and new multimedia software. By adding new profile and payload format specifications, one can tailor RTP to new data formats and new applications.
RTP/RTCP provides functionality and control mechanisms necessary for carrying real-time content. But RTP/RTCP itself is not responsible for the higher-level tasks like assembly and synchronization. These have to be done at application level.
The flow and congestion control information of RTP is provided by RTCP sender and receiver reports.
- RTP Implementation Resources
RTP is an open protocol that does not provide pre-implemented system calls. Implementation is tightly coupled to the application itself. Application developers have to add the complete functionality in the application layer by themselves. However, it is always more more efficient to share and reuse code rather than starting from scratch. The RFC 1889 specification itself contains numerous code segments that can be borrowed directly to the applications. There are some implementations with source code available on the web for evaluation and educational purposes. Many modules in the source code can be usable with minor modifications. The following is a list of useful resources:
- vat (http://www-nrg.ee.lbl.gov/vat/)
- RTP page (http://www/cs.columbia.edu/~hgs/rtp) maintained by Henning Schulzrinne is a very complete reference.
- RTP Library (http://www.cs.columbia.edu/~jdrosen/rtp_api.html) provides a high level interface for developing applications that make use of the Real Time Transport Protocol (RTP).