Session Initiation Protocol (SIP) is the most popular Voice over IP (VoIP) signaling protocol. Several VoIP devices and applications, including Apple’s FaceTime, use this protocol. What’s a signaling protocol, though, and, more importantly, what is SIP? If you’re curious to know more, keep reading! I’ll tell you all about what SIP is, the various elements that make up SIP, and the protocol’s key features.
SIP is defined in RFC 3261, so if you’ve got the technical know-how and patience to read a 200+ page highly technical document, feel free to read it in full. That said, if you’re looking for a solid introduction to the subject that you can easily digest in just about 10 minutes, this article is for you.
Let’s start by answering the question, “What is SIP?”
What Is SIP?
As indicated earlier, SIP is a signaling protocol. A signaling protocol is a set of rules for establishing, maintaining, modifying, and tearing down IP-based calls or multimedia sessions. You can think of an IP-based call as a session composed of two phases:
- The phase for setting up the call.
- The phase for transmitting voice or other multimedia packets.
SIP mainly operates in phase one, but it does stick around in case the session needs any modifications or if it’s time to terminate the session.
Developers and network administrators prefer SIP over other signaling protocols, like H.323, MGCP, and Q.931, since it’s text-based. That means it’s easier to review and debug. SIP is also more scalable and flexible and has a lower cost of entry.
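To show what “text-based” means in practice, here’s a minimal Python sketch that parses a shortened INVITE request. The message is modeled on the style of the examples in RFC 3261, but the addresses, tags, and Call-ID are made-up placeholders, and `parse_sip_request` is a hypothetical helper, not part of any SIP library.

```python
# Illustrative only: a shortened INVITE with placeholder addresses and IDs.
RAW_INVITE = (
    "INVITE sip:alice@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc1.example.org;branch=z9hG4bK776asdhds\r\n"
    "From: Bob <sip:bob@example.org>;tag=1928301774\r\n"
    "To: Alice <sip:alice@example.com>\r\n"
    "Call-ID: a84b4c76e66710@pc1.example.org\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_sip_request(raw: str) -> dict:
    """Split a SIP request into method, request URI, version, headers, and body."""
    # A blank line separates the header section from the (optional) body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # The first line looks like: METHOD request-URI SIP-version
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, since values may contain colons too.
        # Simplification: a repeated header name overwrites the earlier value.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return {"method": method, "uri": uri, "version": version,
            "headers": headers, "body": body}


msg = parse_sip_request(RAW_INVITE)
print(msg["method"], msg["uri"])
print(msg["headers"]["From"])
```

Because every field is readable text, you can inspect a captured SIP message with nothing more than a packet sniffer or a text editor.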
Depending on the nature of the SIP session you want to carry out, you’ll need SIP user agents, registrars, proxy servers, and other SIP elements (e.g., session border controller, gateway, redirect server). However, let’s focus our discussion on the key SIP elements. It’s all we’ll need to understand a basic SIP call.
3 Key SIP Elements
SIP has 3 main elements: user agents, registrars, and proxy servers. Yes, a full-blown SIP network will certainly have several other SIP elements, but these three are the most widely used. Let’s discuss each one.
1. User Agent
When a user wants to make a call or send a message to another user using SIP, they’ll do that through a SIP user agent. A SIP user agent is a physical device or software application that sends or receives calls/messages. It can function both as a client and as a server on a SIP network. As a client, a user agent sends requests to a server, which can be another user agent or another SIP element like a proxy server. As a server, it responds to requests sent by a client, which also can be another user agent or SIP element.
Theoretically, a user agent can communicate directly with another user agent, but only if the first user agent knows the second user agent’s location (e.g., IP address). That’s rarely the case in the real world. The user or user agent usually doesn’t know the other party’s location. That’s where other SIP elements come into play.
2. Registrar
One of the key functions of SIP is to pinpoint the location of a user that another user wants to communicate with. To do that, the location of each user should be recorded somewhere on the SIP network. That’s the job of a SIP registrar. A SIP registrar is a server that accepts registrations from users through the user agent they’re currently logged in to.
Upon successful registration, the registrar associates the user in question with the user agent they logged in with. The registrar then records that association in a location service. This service is simply a database that contains the locations of users and user agents. Thus, when another SIP element (normally a proxy server) wants to find a user’s location, it simply looks up this database.
While a registrar can be hosted on a separate machine, it’s usually colocated with the location service on the same physical server as a proxy server.
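The relationship between a registrar and its location service can be sketched as a small in-memory lookup table. This is only an illustration: the `LocationService` class, its method names, and the example addresses are hypothetical, and the expiry handling is an assumption added to reflect that SIP registrations are time-limited.

```python
import time


class LocationService:
    """Toy in-memory location service: maps a user's SIP URI to contact addresses."""

    def __init__(self):
        # user URI -> {contact address: expiry timestamp}
        self._bindings = {}

    def register(self, user_uri, contact, expires=3600, now=None):
        """Record that user_uri is reachable at contact for `expires` seconds."""
        now = time.time() if now is None else now
        self._bindings.setdefault(user_uri, {})[contact] = now + expires

    def lookup(self, user_uri, now=None):
        """Return the contacts for user_uri whose registration hasn't expired."""
        now = time.time() if now is None else now
        contacts = self._bindings.get(user_uri, {})
        return [c for c, expiry in contacts.items() if expiry > now]


service = LocationService()
# Alice registers from two user agents (placeholder addresses).
service.register("sip:alice@example.com", "sip:alice@192.0.2.10:5060", now=0)
service.register("sip:alice@example.com", "sip:alice@192.0.2.20:5060",
                 expires=60, now=0)
print(service.lookup("sip:alice@example.com", now=30))   # both contacts
print(service.lookup("sip:alice@example.com", now=120))  # only the first
```

A real registrar would also authenticate the user before accepting a registration, which this sketch leaves out.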
3. Proxy Server
The job of a registrar is only to record the location of a user on a SIP network. It doesn’t route user calls. That’s the proxy server’s job. Proxy servers are SIP elements that perform several functions, mostly routing requests and calls. That said, proxy servers also provide authentication and authorization services, as well as enforce policies.
As mentioned earlier, a proxy server can also host a location service, as well as a registrar. When a user wants to call another user, that call will go through proxy servers. While not shown in the diagram below, each proxy server looks up users’ locations in its respective location service.
Once two user agents know the other party’s location, they can start communicating. That part is no longer SIP’s responsibility. That said, as mentioned earlier, SIP can step back in to modify or terminate the session.
Although it’s certainly one of the main functions of the SIP protocol, determining every user’s location isn’t its only feature.
5 SIP Features
As mentioned earlier, SIP is mainly responsible for setting up an IP-based call and other multimedia sessions. That should entail more than just finding a user’s location on the network. Here are 5 key SIP features involved in setting up and terminating multimedia communications.
1. User Location
You already know this from our earlier discussion. SIP determines the location of a SIP user and the user agent they’re currently logged on to. Without this feature, your callers can’t find whoever they want to communicate with across a network or the internet.
2. User Availability
In SIP, a user can register in more than one location. That said, that user can’t be in multiple locations at the same time. SIP can determine if a user is available and willing to take a call. If a call doesn’t go through, the user is treated as unavailable on that user agent. SIP can then transfer the call to another user agent registered to that user. If the feature is enabled, it can also transfer the call to voicemail.
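The fallback behavior described here can be sketched as a simple loop. This is a simplified illustration, not how any particular proxy implements it: `route_call`, the `is_available` callback, and all addresses are hypothetical, and trying contacts one after another is an assumption (real deployments may handle this differently).

```python
def route_call(contacts, is_available, voicemail=None):
    """Try each registered contact in order; fall back to voicemail if none answers."""
    for contact in contacts:
        if is_available(contact):
            return contact
    # No user agent took the call: use voicemail if configured, else None.
    return voicemail


contacts = ["sip:alice@desk.example.com", "sip:alice@mobile.example.com"]
online = {"sip:alice@mobile.example.com"}  # pretend only the mobile is reachable

target = route_call(contacts, lambda c: c in online,
                    voicemail="sip:voicemail@example.com")
print(target)
```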
3. User Capabilities
SIP has a built-in feature that enables user agents to know the capabilities of the party they want to communicate with. This is necessary because user agents can come in different forms and capabilities. As a result, incompatibility issues may arise. For example, let’s say a PC-based user agent wants to communicate with another user agent embedded on a limited-function device. It clearly won’t be able to use all of its features. In addition, some user agents can do video calls, while others can only do voice calls. By knowing the capabilities of the other party, the two parties can communicate using common denominators.
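Finding the “common denominators” boils down to intersecting what one side offers with what the other side supports. The sketch below assumes capabilities are plain lists of codec names per media type. In practice, this information usually travels in a session description carried inside the SIP message body, and `common_capabilities` and the codec lists are hypothetical names chosen for illustration.

```python
def common_capabilities(offered, supported):
    """Keep the caller's preference order, dropping anything the callee lacks."""
    common = {}
    for media, codecs in offered.items():
        shared = [c for c in codecs if c in supported.get(media, [])]
        if shared:  # skip media types with nothing in common
            common[media] = shared
    return common


# A PC-based user agent offers audio and video...
pc_agent = {"audio": ["opus", "PCMU"], "video": ["H264"]}
# ...but the limited-function device only handles basic audio.
desk_phone = {"audio": ["PCMU", "PCMA"]}

agreed = common_capabilities(pc_agent, desk_phone)
print(agreed)
```

Here the two parties would end up with a voice-only call using the one audio codec they share.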
4. Session Setup
This feature comes into play when a user agent alerts its user (the person being called) that another party is inviting them to a call. It also works to establish session parameters between the calling party and the receiving party. The step-by-step example shown below will illustrate how a SIP session setup is carried out.
5. Session Management
This encompasses several functions, including transferring sessions, modifying session parameters, invoking services, and terminating sessions. Using this feature, your users can, for example, transfer seamlessly from a PC-based SIP application to an IP-based phone. They can even transfer from a voice call to a conference call.
Alright, let’s get into the meat of this discussion. Let’s talk about how SIP works.
How Does SIP Work?
I’ll explain how SIP works using a diagram. I’ve drawn a simple example showing what a typical SIP call looks like. In this example, Bob is calling Alice. Each of them has IP phones or user agents they’re using to communicate.
The blue arrows indicate packets going to Alice, while the orange arrows indicate packets going to Bob. Some of the arrows need to hop on a couple of proxy servers before reaching their destination. Some arrows go directly from Bob to Alice and vice versa.
The numbers indicate the various steps involved. Note that some numbers (e.g., #3) aren’t on the same arrow or even on the next adjacent arrow as the number before it (e.g., #2). I want to point that out so you understand the numbering. The steps behave like this because SIP follows a request and response transaction model. That is, one party or element issues a request and the other party responds to that request.
Now, let’s run through the steps.
Step-by-Step Explanation of the Sample SIP Session Flowchart
- Bob sends an INVITE request to Alice. That INVITE includes Alice’s identity, known as a SIP Uniform Resource Identifier (URI). Bob enters that URI on his user agent by typing it in, clicking a link, or selecting an entry in the user agent’s contact list. So that Alice knows who’s calling, the INVITE request also includes Bob’s URI. Bob’s user agent then sends all that information, along with several other details, to Bob’s proxy server, Proxy Server A.
- Upon receiving Bob’s INVITE request, Proxy Server A looks up where to send requests addressed to Alice. It then forwards Bob’s INVITE request to the proxy server that serves Alice’s domain, which in this case is Proxy Server B.
- Proxy Server A replies to Bob’s INVITE request with a TRYING response (status code 100). This TRYING response confirms that the INVITE request was received and that Proxy Server A is already routing the request to the right destination.
- Upon receiving Bob’s INVITE request from Proxy Server A, Proxy Server B looks up Alice’s location. It then forwards the INVITE request to that location, which is Alice’s user agent.
- Proxy Server B also replies to Proxy Server A with its own TRYING response. This TRYING response confirms that it received Proxy Server A’s forwarded INVITE request and that it’s already routing the request to the right destination.
- Upon receiving the INVITE request, Alice’s user agent notifies Alice and then sends back a “Ringing” response (status code 180). This “Ringing” response indicates that Alice has been notified. It’s sent to Proxy Server B.
- Proxy Server B recognizes who this “Ringing” response is for. It then forwards it to the proxy server that forwarded the request (Proxy Server A, in this case).
- Proxy Server A recognizes who this “Ringing” response is for, and then forwards it to the user agent that sent the request (Bob’s user agent, in this case).
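The TRYING and “Ringing” responses in the steps above are both provisional: they report progress but don’t finish the INVITE transaction. SIP responses carry a three-digit status code (100 for Trying, 180 for Ringing, 200 for OK), and the first digit tells you the class. Here’s a small Python sketch of that classification; the function names are my own.

```python
RESPONSE_CLASSES = {
    1: "provisional",     # e.g., 100 Trying, 180 Ringing
    2: "success",         # e.g., 200 OK
    3: "redirection",
    4: "client error",
    5: "server error",
    6: "global failure",
}


def response_class(status_code: int) -> str:
    """Return the class of a SIP response based on its first digit."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a valid SIP status code: {status_code}")
    return RESPONSE_CLASSES[status_code // 100]


def is_final(status_code: int) -> bool:
    """Anything that isn't provisional (1xx) completes the request."""
    return response_class(status_code) != "provisional"


for code in (100, 180, 200):
    print(code, response_class(code), "final" if is_final(code) else "not final")
```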
Call Answer Stage
- If Alice answers the call, her user agent sends an OK response (status code 200) to Proxy Server B, indicating that Alice is ready to talk.
- Proxy Server B recognizes who this OK response is for and thus forwards it to Proxy Server A.
- Proxy Server A likewise recognizes who this OK response is for and in turn forwards it to Bob’s user agent.
- Bob’s user agent sends an acknowledgment (ACK), and the call or media session commences. Notice that the acknowledgment doesn’t pass through the proxy servers. Rather, it goes directly to Alice’s user agent. That’s because, at this point, Bob’s user agent has learned Alice’s location from the messages exchanged so far.
This is now the main call or media session. Most of what takes place here no longer uses SIP. If we recall the discussion above, this is Phase 2 of a call.
- Once Alice decides to end the call and hangs up, her user agent sends a BYE request to Bob’s user agent.
- Bob’s user agent then sends back an OK response, and the call terminates.
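The whole exchange, from INVITE to the final OK, can be replayed against a tiny state table to check that messages arrive in a sensible order. This is a deliberately simplified sketch of the call as a whole. It isn’t the transaction state machine defined in RFC 3261, and the state names are invented for illustration.

```python
# (current state, message) -> next state. State names are hypothetical.
TRANSITIONS = {
    ("idle", "INVITE"): "calling",
    ("calling", "100 Trying"): "proceeding",
    ("proceeding", "180 Ringing"): "proceeding",
    ("proceeding", "200 OK"): "answered",
    ("answered", "ACK"): "in call",       # media session (phase two) starts here
    ("in call", "BYE"): "terminating",
    ("terminating", "200 OK"): "ended",
}


def run_call(messages, state="idle"):
    """Replay a list of SIP messages and return the final call state."""
    for message in messages:
        key = (state, message)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {message!r} while {state}")
        state = TRANSITIONS[key]
    return state


flow = ["INVITE", "100 Trying", "180 Ringing", "200 OK", "ACK", "BYE", "200 OK"]
print(run_call(flow))
```

Feeding the table an out-of-order message, such as a BYE before the call is answered, raises an error, which mirrors how each response only makes sense in the context of the request it answers.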
Well, there you have it. That’s how a SIP call works. Of course, the full process involves more detail, but that sums up what happens.
Before we end this article, I’d like to cover one question that’s asked quite often when people start rolling out a VoIP solution.
SIP vs VoIP
I’m not sure why, probably due to misconceptions of what SIP does, but one of the questions I often encounter about SIP is how it compares with VoIP. Well, SIP and VoIP aren’t competing protocols or technologies. Rather, one (in this case, SIP) contributes to the other’s existence (in this case, VoIP).
As you might have figured out by now, SIP takes charge of setting up VoIP calls. Remember those two phases we talked about when we answered the question, “What is SIP”? Well, SIP handles phase one of a VoIP call. Yes, it has alternatives. Some VoIP solutions use H.323 and/or MGCP as their VoIP signaling protocol. That said, they’re not nearly as widely used as SIP. I hope that clears up the confusion!
Most VoIP implementations, whether devices or software applications, rely on SIP to set up calls. As an IT administrator, it’s important to have a solid understanding of SIP, as you may need it when troubleshooting VoIP-related issues. This article aims to give you an overview of the main aspects of SIP.
In this article, you’ve learned what SIP is, the 3 key SIP elements, the 5 SIP features, and the basic flow of a SIP call. I hope you’ve gained enough knowledge to proceed with more in-depth research on the subject.
Have more questions about SIP? Check out the FAQ and Resources sections below.
Is SIP secure?
Not by default. SIP messages are sent in plaintext unless you add protection, so an eavesdropper can easily obtain information from intercepted SIP packets. To protect your SIP communications, you can employ Transport Layer Security (TLS). SIP readily supports TLS, which encrypts SIP messages from the caller to the domain of the receiver. You can also use a VPN.
Can you protect SIP with a VPN?
Yes, since Virtual Private Network (VPN) technologies like IPsec encrypt internet traffic. A VPN, especially one deployed using site-to-site architecture, protects all traffic passing between two VPN gateways. That said, a poorly configured VPN can cause your calls to suffer performance issues. To avoid this, you can standardize the routers your remote workers use and consult your vendor on how to configure them for your VPN solution.
What ports should I configure on my firewall to use SIP?
By default, SIP uses port 5060 for unencrypted signaling over UDP or TCP and port 5061 for SIP over TLS. That said, you normally use SIP as part of a VoIP solution, not on its own. The entire VoIP solution, unfortunately, may use a multitude of ports. That can be a problem if you’re only using a packet filter firewall. Your best bet is to use a type of firewall that does stateful packet inspection, as it can determine which packets are part of the same conversation.
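If you’re scripting firewall rules or sanity checks, the conventional SIP defaults (5060 for UDP and TCP, 5061 for TLS) fit in a small lookup. The helper below is a hypothetical convenience function. It covers SIP signaling only, and your deployment may be configured to use different ports.

```python
# Conventional default SIP signaling ports per transport.
DEFAULT_SIP_PORTS = {"udp": 5060, "tcp": 5060, "tls": 5061}


def default_sip_port(transport: str) -> int:
    """Return the conventional default SIP port for the given transport."""
    try:
        return DEFAULT_SIP_PORTS[transport.lower()]
    except KeyError:
        raise ValueError(f"unknown SIP transport: {transport!r}") from None


for transport in ("UDP", "TCP", "TLS"):
    print(transport, default_sip_port(transport))
```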
Would you recommend managed VoIP solutions?
Yes. If you’re not familiar with them, managed VoIP solutions are VoIP solutions managed by a third-party service provider. If you prefer to outsource your VoIP infrastructure to reduce your administrative overhead, a managed VoIP solution is a good option. You can also see the best-in-class VoIP solutions for businesses. They’re all managed services, so you might want to check them out.
What voice and video conferencing tools would you recommend?
You can pick from several good options out there, including Zoom, GoToMeeting, and Cisco Webex. Check these seven voice and video conferencing tools for businesses. Most of them have the features you need!
TechGenix: Article on VoIP Trends
Learn how VoIP is evolving.
TechGenix: Article on Voice Communication
Discover why voice communication is becoming cool again.
TechGenix: Article on Cloud Computing Trends
Find out the top 6 latest trends in cloud computing to help grow your business.
TechGenix: Guide on Remote Network Access
Dive into the concepts and options for remote network access in this definitive guide.
TechGenix: Guide on Deploying a VPN on Windows
Understand what you need to know in deploying a VPN in Windows.