Understanding the SMTP Protocol
The Simple Mail Transfer Protocol, and you
One of the most attractive parts of the internet and computers to many people is the ability to send and receive email. How this sending and receiving of email works, though, is largely a mystery to many. What we will talk about in this article is the protocol responsible for the sending of email. That protocol is SMTP, also known as the Simple Mail Transfer Protocol. By convention, an SMTP server listens for client connections on TCP port 25. One of the best known email servers in use today is Microsoft Exchange.
Well, as always I try to map protocols to the OSI Reference Model, and this one is no exception. SMTP itself is an application layer protocol. It uses TCP as its transport protocol, and TCP in turn relies on IP for routing. Much like HTTP, the SMTP protocol has a number of status codes to enhance its functionality. These status codes are used to relay specific conditions between the client and server. Yes, you are indeed right! This protocol does conform to the much talked about client/server model. Think of Microsoft Outlook as the client, and Microsoft Exchange as the server.
Further to the status codes that SMTP uses, there is also a series of SMTP commands, such as “AUTH” for authentication and “EHLO” for extended hello. These commands are the way that the email client and server talk to each other. I always say that seeing is believing, so let’s see an example of the client and server talking to each other.
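To make the idea of status codes a little more concrete, here is a minimal Python sketch that splits a server reply line into its three-digit code and its text. The function names parse_reply and reply_class, and the sample reply strings, are my own inventions for illustration; they are not taken from the capture in this article.

```python
def parse_reply(line):
    """Split one SMTP reply line into (code, is_last_line, text)."""
    line = line.rstrip("\r\n")
    code = int(line[:3])          # every reply starts with a three-digit code
    # A hyphen right after the code marks a continuation line;
    # a space (or nothing at all) marks the last line of the reply.
    is_last = line[3:4] != "-"
    text = line[4:]
    return code, is_last, text


def reply_class(code):
    """Name the broad meaning of a reply, judged by its first digit."""
    classes = {
        1: "preliminary",
        2: "success",
        3: "more input needed",
        4: "temporary failure",
        5: "permanent failure",
    }
    return classes.get(code // 100, "unknown")


# Sample replies made up for illustration only.
samples = [
    "250-mail.example.com at your service\r\n",
    "250 8BITMIME\r\n",
    "354 Start mail input\r\n",
    "550 No such user here\r\n",
]
for sample in samples:
    code, is_last, text = parse_reply(sample)
    print(code, reply_class(code), is_last, text)
```

The first digit is what a client really acts on: a 2 means carry on, a 4 means try again later, and a 5 means give up on that command.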
Give me packets!
We can see in the packet below that the SMTP command “HELO” is present. This command is issued once the TCP three-way handshake is complete between the client and server and the server has sent its greeting. What does HELO actually mean though? Well, it pretty much stands for what it sounds like. The email client is saying hello, and following the HELO command is the domain that the client is coming from. We can see the domain directly after HELO in the ASCII content of the packet below.
The HELO command is often said to have been superseded by the EHLO command. EHLO stands for “extended hello”. When the EHLO command is sent, the mail server responds by advertising all of its features, such as being able to transport characters other than safe ASCII characters. Strictly speaking though, EHLO has not superseded HELO, as that would imply that HELO is no longer used. It is very much in use today, and all mail servers are still required to accept a simple HELO.
06/09/2005 06:10:46.595221 192.168.1.100.40565 > 192.168.1.200.25: P [tcp sum ok] 159505509:159505543(34) ack 578397676 win 33304 <nop,nop,timestamp 310237481 108030715> (DF) (ttl 52, id 34293, len 86)
0x0000 4500 0056 85f5 4000 3406 5235 c0a8 0164 E..V..@.4.R5...d
0x0010 c0a8 01c8 9e75 0019 0981 dc65 2279 a5ec .....u.....e"y..
0x0020 8018 8218 0449 0000 0101 080a 127d d929 .....I.......}.)
0x0030 0670 6afb 4845 4c4f 2077 6562 3334 3231 .pj.HELO.web3421
0x0040 332e 6d61 696c 2e6d 7564 2e79 6168 6f6f 3.mail.mud.yahoo
0x0050 2e63 6f6d 0d0a .com..
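If you would like to check the ASCII column for yourself, the short Python sketch below decodes the payload bytes of the packet above. The hex string is copied from offset 0x0034 onwards; the variable names are simply my own choices.

```python
# Hex of the TCP payload, copied from offset 0x0034 of the dump above.
payload_hex = (
    "4845 4c4f 2077 6562 3334 3231 "
    "332e 6d61 696c 2e6d 7564 2e79 6168 6f6f "
    "2e63 6f6d 0d0a"
)

# Remove the spaces tcpdump prints between byte pairs, then convert to bytes.
payload = bytes.fromhex(payload_hex.replace(" ", ""))

# An SMTP command line is plain ASCII and ends with CR LF (0d 0a).
line = payload.decode("ascii")
command, _, argument = line.rstrip("\r\n").partition(" ")

print(len(payload))       # payload size in bytes
print(command)            # the SMTP command verb
print(argument)           # the domain the client announces
```

The length comes out at 34 bytes, which agrees with the (34) that tcpdump prints after the sequence numbers in the summary line of the packet.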
Well, what happens after the HELO has been issued to the mail server? What happens next is that the mail client will say that it has mail from someone. We can see this in the packet below, where “MAIL FROM” appears at the start of the payload in the ASCII content. Digressing a bit here to TCP/IP, we can see that the TCP sequence numbers in the above and below packets follow each other, as they should: the HELO packet ends at 159505543, and that is exactly where the MAIL FROM packet begins. We can also see that the acknowledgement number moves forward by 23 between the two packets, from 578397676 to 578397699, which tells us the client received 23 bytes from the mail server in between, most likely the server's reply to the HELO.
06/09/2005 06:10:46.641311 192.168.1.100.40565 > 192.168.1.200.25: P [tcp sum ok] 159505543:159505580(37) ack 578397699 win 33304 <nop,nop,timestamp 310237486 108030720> (DF) (ttl 52, id 35311, len 89)
0x0000 4500 0059 89ef 4000 3406 4e38 c0a8 0164 E..Y..@.4.N8...d
0x0010 c0a8 01c8 9e75 0019 0981 dc87 2279 a603 .....u......"y..
0x0020 8018 8218 053c 0000 0101 080a 127d d92e .....<.......}..
0x0030 0670 6b00 4d41 494c 2046 524f 4d3a xxxx .pk.MAIL.FROM:<x
0x0040 xxxx xxxx xxxx xxxx xxxx xxxx 4079 6168 xxxxxxxxxxxx@yah
0x0050 6f6f 2e63 6f6d 3e0d 0a oo.com>..
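The sequence number arithmetic for these two packets is easy to verify with a few lines of Python. This is only a sketch: the regular expression and the helper name seq_fields are my own, and the two strings are shortened copies of the tcpdump summary lines shown in this article.

```python
import re

# Matches the "first:next(length) ack N" part of a tcpdump summary line.
SUMMARY = re.compile(r"(\d+):(\d+)\((\d+)\) ack (\d+)")


def seq_fields(summary_line):
    """Return (first_seq, next_seq, length, ack) from a tcpdump summary line."""
    match = SUMMARY.search(summary_line)
    if match is None:
        raise ValueError("no sequence numbers found")
    return tuple(int(group) for group in match.groups())


helo = seq_fields("P [tcp sum ok] 159505509:159505543(34) ack 578397676 win 33304")
mail = seq_fields("P [tcp sum ok] 159505543:159505580(37) ack 578397699 win 33304")

# First sequence number plus payload length gives the next sequence number.
print(helo[0] + helo[2] == helo[1])
# The second packet starts exactly where the first one ended.
print(helo[1] == mail[0])
# Bytes the client received from the server between the two packets.
print(mail[3] - helo[3])
```

The last line prints the difference between the two acknowledgement numbers, which is the amount of data that arrived from the server while the client was between commands.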
Now let's examine the packet above a little further. Some quick steps to orient ourselves are as follows. We know that we have an IP header at the start, which is using IPv4 as declared by the 4 at the very start of the header (the first digit of 4500). Also we can see that the transport protocol is TCP, as declared by the 06 in the IP header (the second byte of 3406). From the 8 in the TCP header (the first digit of 8018) we see that the TCP header is 8 x 4 = 32 bytes long, which is 12 bytes more than the minimum of 20, so we have 12 bytes of TCP options set. From bytes 4d41 onwards is where we have our actual SMTP application layer data starting.
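For those who prefer to let the computer do the counting, here is a rough Python sketch that pulls the same fields out of the first 52 bytes of the MAIL FROM packet. The hex is copied from the dump above, stopping before the masked payload bytes; the variable names are my own, and the sketch assumes a plain IPv4 packet carrying TCP.

```python
import socket
import struct

# IP header plus TCP header (with options), copied from the dump above.
header_hex = (
    "4500 0059 89ef 4000 3406 4e38 c0a8 0164 "
    "c0a8 01c8 9e75 0019 0981 dc87 2279 a603 "
    "8018 8218 053c 0000 0101 080a 127d d92e "
    "0670 6b00"
)
raw = bytes.fromhex(header_hex.replace(" ", ""))

# IPv4 header fields.
version = raw[0] >> 4                       # high nibble of the first byte
ip_header_len = (raw[0] & 0x0F) * 4         # low nibble counts 32-bit words
total_length = struct.unpack("!H", raw[2:4])[0]
protocol = raw[9]                           # 6 means TCP
src_ip = socket.inet_ntoa(raw[12:16])
dst_ip = socket.inet_ntoa(raw[16:20])

# TCP header fields, starting right after the IP header.
tcp = raw[ip_header_len:]
src_port, dst_port, seq, ack = struct.unpack("!HHII", tcp[:12])
tcp_header_len = (tcp[12] >> 4) * 4         # data offset, in 32-bit words
options_len = tcp_header_len - 20           # anything beyond 20 bytes is options
payload_offset = ip_header_len + tcp_header_len

print(version, protocol, src_ip, dst_ip)
print(src_port, dst_port, seq, ack)
print(tcp_header_len, options_len, hex(payload_offset))
print(total_length - payload_offset)        # bytes of SMTP data in the packet
```

The payload offset works out to 0x34, which is exactly where the bytes 4d41 begin in the dump.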
Once again let's take the opportunity to backtrack a bit to TCP. In the packet above, we have two different TCP options, namely NOP (no operation) and timestamp, as shown in the tcpdump summary line above. Now let's break these options down at the hex level.
Starting at the bytes 0101 shown above, we have two NOP options in a row. Each one is a single byte of kind 01, and a NOP carries no length field; it is simply padding that keeps the option that follows aligned. Following this we have byte 08, which represents the timestamp option, as seen at the bottom of the TCP/IP and tcpdump flyer found at the bottom of this page. Following this byte is byte 0a, which represents the length of the timestamp option as measured in bytes. 0a equates to ten in decimal, and that count includes the kind and length bytes themselves. Next come the bytes 127d d92e, which represent the first timestamp value of 310237486, the sender's own timestamp. Lastly there are the bytes 0670 6b00, which represent the second timestamp value of 108030720, an echo of the timestamp most recently received from the other side. The timestamp option is not one that you always see, so I wanted to take this opportunity to show it to you, and how it looks.
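The same walk through the option bytes can be sketched in Python. The twelve bytes below are copied from the packet above; the function name parse_tcp_options is my own, and the sketch only gives special treatment to the two options that appear in this capture.

```python
import struct

# The 12 option bytes from the TCP header of the packet above.
options = bytes.fromhex("0101080a127dd92e06706b00")


def parse_tcp_options(data):
    """Walk a TCP options field and return a list of (name, value) pairs."""
    parsed = []
    i = 0
    while i < len(data):
        kind = data[i]
        if kind == 0:                  # end of option list
            parsed.append(("EOL", None))
            break
        if kind == 1:                  # NOP is one byte, with no length field
            parsed.append(("NOP", None))
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("option is missing its length byte")
        length = data[i + 1]           # counts the kind and length bytes too
        if length < 2 or i + length > len(data):
            raise ValueError("malformed option length")
        value = data[i + 2:i + length]
        if kind == 8 and length == 10:
            # Two 32-bit values: our timestamp and the echoed timestamp.
            parsed.append(("timestamp", struct.unpack("!II", value)))
        else:
            parsed.append(("kind %d" % kind, value))
        i += length
    return parsed


for name, value in parse_tcp_options(options):
    print(name, value)
```

The two timestamp values it prints can be compared with the numbers tcpdump shows between the angle brackets in the summary line of the packet.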
Let's get back to it!
Right then, let’s get back to discussing SMTP itself and how it works. We last covered the second step that a client takes when sending email to a mail server. This is displayed in the above packet by the “MAIL FROM:” ASCII content. What is next though in the chain of events? Well, the client next names the recipient with the “RCPT TO:” command, and then issues the “DATA” command to tell the server that the message is coming. The packets that follow contain the actual email message itself, meaning the header fields and the email body, and the message is ended by a line containing only a period. By “email body” I mean the actual contents of the email itself. The last step taken by the client is the “QUIT” command, thereby closing the connection to the mail server.
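To pull the whole conversation together, here is a small Python sketch that lists what a client sends, in order, to deliver one message, including the RCPT TO and DATA commands defined in the SMTP RFC. Please treat it as an illustration only: the function name smtp_commands and the example.com addresses are made up, and a real client waits for the server's reply after each command before sending the next one.

```python
def smtp_commands(helo_domain, sender, recipient, message):
    """Return the lines a client sends, in order, to deliver one message."""
    body_lines = []
    for line in message.splitlines():
        # A line starting with "." gets an extra "." in front of it, so that
        # it cannot be mistaken for the end-of-message marker.
        if line.startswith("."):
            line = "." + line
        body_lines.append(line + "\r\n")   # every line ends with CR LF
    return [
        "HELO %s\r\n" % helo_domain,       # say hello and name our domain
        "MAIL FROM:<%s>\r\n" % sender,     # who the mail is from
        "RCPT TO:<%s>\r\n" % recipient,    # who the mail is for
        "DATA\r\n",                        # announce that the message follows
        "".join(body_lines) + ".\r\n",     # headers, body, then a lone period
        "QUIT\r\n",                        # close the session
    ]


# Hypothetical message, for illustration only.
message = "Subject: Test\n\nHello there.\n.a line that starts with a dot\n"
for command in smtp_commands("client.example.com", "alice@example.com",
                             "bob@example.org", message):
    print(repr(command))
```

Printing each item with repr makes the CR LF line endings visible, which is handy when comparing against the 0d0a bytes in a packet capture.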
That, in a nutshell, is how SMTP works and how it delivers mail for you. There is of course more to SMTP than what I have stated here. I would certainly encourage you to read the appropriate RFCs pertaining to SMTP. Heck, you can even interface with a mail server directly if you wish, and play with issuing commands to it. I hope that this taste of SMTP is enough to whet your appetite, and perhaps lead you to further reading on this protocol. As always, I sincerely hope this article was of use to you, and I always welcome feedback. Till next time!