Application Aware Triggered Quality of Service

APPLICATION AWARE TRIGGERED QUALITY OF SERVICE (AATQoS)

Jared Valentine (hidden at xmission.com)
v1.1 February 28, 2008

This is a previous version of the AATQoS webpage. A current version can be found here: http://www.xmission.com/~hidden/aatqos

BACKGROUND / PROBLEM DESCRIPTION

I live in a multi-computer household. There are desktops, laptops, media centers, and servers on the network. When I'm not out visiting clients, I typically work out of my home office. The only acceptable option for broadband is a Qwest 1.5mbps DSL circuit (1500k down / 1000k up). My problem, and the part that frustrates me to no end, is my Vonage Voice over IP phone line. Don't get me wrong, I've been quite pleased with the Vonage service. What frustrates me is the contention for bandwidth that causes the Vonage audio quality to suffer.

Some people say "well, don't use the Internet while you use the phone", but that's not realistic. Everyone uses bandwidth in different ways and at different times of the day. All it takes is one person downloading something, anything, and my Vonage quality tanks. Windows Updates, AntiVirus updates, YouTube, inline "video" ads on webpages, e-mail with large attachments, or my daughter watching an online preview of the next "My Little Pony" movie all make callers sound robotic and unintelligible. That doesn't even consider heavier traffic like FTP, NNTP, Bittorrent, online backups, etc. The future holds even more contention for limited bandwidth as media companies rush to make HD movies available for download over the internet and streaming IPTV takes off.

For those of you who are "gamers", feel free to substitute the terms "VoIP" and "phone call" in this article with your favorite online game. You know what I mean... every millisecond counts and a congested line can be the difference between an enjoyable game and being pwnd by a 14-year-old with a better ping than you.

I've spent countless hours on Google searching for the perfect traffic prioritization solution. Almost all of the QoS-related search results emphasize that "you can prioritize your outbound traffic, not your inbound." True prioritization requires control of both ends of the connection. Since I don't have control of my ISP's routers, this doesn't help either.

- http://www.faqs.org/docs/Linux-HOWTO/ADSL-Bandwidth-Management-HOWTO.html#AEN149 (see 3.5.1)
- http://vonage.nmhoy.net/qos.html
- http://www.aetherwide.com/articles/voip-pf.html

UNACCEPTABLE SOLUTIONS

I was able to implement one QoS scheme that provided perfect sounding VoIP calls. Unfortunately, it came at a cost that was too much to bear. I applied a heavy-handed traffic policing scheme to my WAN router that limited all incoming TCP traffic to 800kbps, which is a little more than half my 1.5Mbps DSL connection. The Vonage service uses UDP packets so they would be allowed to pass through the rate limiting uninhibited. By limiting all TCP traffic to 800k things sounded great. The cost: 700k of wasted bandwidth... bandwidth I couldn't use for anything else. You'd think that reserving something smaller would work, maybe 100-200k. Not so. I had to reserve a total of 700k before things sounded perfect. Granted, the Vonage service doesn't use all 700k of bandwidth, but that's the headroom needed to meet the latency requirements for a good VoIP conversation on my particular DSL line. Severely limiting everything else, all the time, just didn't sit well with me. (The ADSL-Bandwidth-Management-HOWTO referenced above poses a similar problem and resolution - and recommends against it due to the high cost).
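The arithmetic behind those numbers can be sketched in a few lines of Python. The helper name policed_rate_kbps is made up for illustration; the 1500k and 700k figures are the ones quoted above, and the right headroom for another line would have to be found by experiment.

```python
def policed_rate_kbps(line_rate_kbps, headroom_kbps):
    """Return the TCP rate limit that leaves headroom_kbps free for VoIP."""
    if headroom_kbps >= line_rate_kbps:
        raise ValueError("headroom must be smaller than the line rate")
    return line_rate_kbps - headroom_kbps


# Figures from the text: 1500k downstream, 700k of headroom.
limit = policed_rate_kbps(1500, 700)
print(limit, "kbps =", limit * 1000, "bps")
```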

For a short time, I used a one-click script that would telnet to my router and enable the traffic policing. I would launch that shortcut before I made or answered a call, and then launch a different script to disable the policing when the call was complete. That worked great except when my computer was off, rebooting, blue-screened, or being tinkered with. And of course, it didn't help at all when I was away from my computer.

I started investigating alternative traffic-shaping products looking for a better solution. I looked at dd-wrt on a Linksys WRT54G and loaded pfSense on an old PC. I read all sorts of IOS manuals and QoS guides for a Cisco 1801 DSL router, the same for a 3Com 3031 DSL router. No matter what I tried or looked at, it always came back to the same paradigm: "you can only prioritize your outbound, not the inbound."

THE HOLY GRAIL - AATQoS

For me, the holy grail of traffic prioritization would limit all incoming data traffic to 800k, _only while I was on the phone_. I would call this perfect scheme "Application Aware Triggered Quality of Service" or "AATQoS" for short. The basic steps for this prioritization scheme include: 1.) detecting the presence of a VoIP call, 2.) reacting to the presence of a VoIP call with rate limiting, 3.) detecting the absence of a VoIP call, and 4.) reacting to the absence of a VoIP call by removing rate limiting.

I have a background in Networking, Security, and VoIP. When I took a step back and spent a little time defining the problem and visualizing the "perfect solution", the answer became readily apparent. Why not take a security application that is generally tasked with looking for "bad" traffic and have it watch for traffic that needs to be prioritized instead? When the important traffic is seen, the previously identified steps can be taken to rate limit all other traffic. This ensures the latency-sensitive traffic's timely and unhindered delivery. Once the latency-sensitive traffic is gone, the temporary limits on all of the other traffic can be removed. This restores full access to all of the bandwidth.

At first I thought this project might take days, but thanks to pfSense and a handful of other open-source projects, this project only took an hour or two.

THE PLATFORM - pfSense

First off, pfSense (www.pfsense.org) is super-easy to install. I downloaded an ISO and installed it on an unused PC. pfSense is a very powerful platform that supports firewalling, traffic shaping, routing, intrusion detection, and transparent webcaching among many other things. Once the base system is configured, pfSense offers one-click options to install additional services. The Snort IDS engine is perfectly suited to detect VoIP calls. I am currently running the 1.2-RELEASE version of pfSense, although this process worked with the pfSense BETA-3 and -4 releases.

THE SENSING MECHANISM - Snort

In the pfSense GUI, go to System, then Packages. From here, you can click on the "+" sign next to Snort, which will launch the automated installation process. After installing Snort, I registered at the Snort website (www.snort.org) and requested an "oinkmaster key". This key allowed me to download pre-packaged Snort rules. I entered this key into the pfSense GUI and pfSense automatically downloaded a bunch of Snort rules. pfSense makes it very easy to enable rule categories and individual Snort rules. I enabled all of the voip.rules to see what would happen. Unfortunately, the built-in rules didn't seem to alert when I made or received calls, most likely because Snort usually looks for malicious traffic, not normal VoIP traffic. Without reliable logs, I couldn't get reliable traffic policing. I ended up writing my own Snort rules.

PACKET CAPTURE - tcpdump

Before I could write Snort rules, I needed to get a packet trace using a sniffer. It is impossible to write a detection rule when you don't know what you're looking for. Lucky for me, pfSense includes the ever-popular tcpdump utility. With tcpdump, I could capture packets on the wire and see the Vonage adapter's communications while making & receiving calls. I ssh'd to the pfSense box, got a shell, and then issued the command "tcpdump -w capture.cap -s 0 udp port 10000". I moved the capture.cap file from pfSense to my desktop using SCP and used Wireshark (www.wireshark.org) to investigate exactly what happened when I made a call, and when I hung up. Each call included SIP "INVITE" messages, and when I hung up, I saw either "BYE" or, more usually, "CANCEL" messages. If I could key off of these SIP messages, all would be well. Please keep in mind that different VoIP providers may use different port numbers and protocols. Most standard SIP traffic uses UDP port 5060 and 5061 for call control messages, while Vonage seems to have moved to UDP 10000 for their signalling. Your mileage may vary depending on your provider & underlying VoIP technology. If you're unsure, you could always use "tcpdump -w capture.cap -s 0 udp" and capture all UDP traffic to see what happens during a call, otherwise use Google or talk to your VoIP provider.

I tried having Snort match on the SIP keywords INVITE and CANCEL. For some reason this produced a ton of INVITE matches, even while I wasn't on a call. I dug a little deeper into the packet capture to see if I could find out why. My Vonage VoIP adapter sends a SIP REGISTER packet every 10 seconds, and the very end of each REGISTER packet includes the INVITE keyword. This meant I needed something more specific to match on. Taking a closer look at the packet captures, I saw that I could use "INVITE sip:", "CANCEL sip:", and "BYE sip:" instead.
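To see why the longer strings help, here is a small Python sketch that runs both search strings against two SIP payloads. The payloads are made up and heavily abbreviated (they are not taken from my capture), and I'm assuming the stray INVITE keyword sits in an Allow-style header listing the supported methods. The plain substring test stands in for a Snort content match with no offsets.

```python
# Made-up, abbreviated SIP payloads for illustration only.
REGISTER = (
    "REGISTER sip:voip.example.net SIP/2.0\r\n"
    "CSeq: 42 REGISTER\r\n"
    "Allow: ACK, BYE, CANCEL, INVITE\r\n\r\n"
)
INVITE = (
    "INVITE sip:15555550100@voip.example.net SIP/2.0\r\n"
    "CSeq: 1 INVITE\r\n\r\n"
)


def matches(payload, needle):
    """Case-sensitive substring search anywhere in the payload."""
    return needle in payload


for name, payload in (("REGISTER", REGISTER), ("INVITE", INVITE)):
    print(name, matches(payload, "INVITE"), matches(payload, "INVITE sip:"))
```

The bare keyword hits both packets, while "INVITE sip:" only hits the real call setup, because the method name is followed by "sip:" only on the request line.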

CREATING SNORT RULES

At this time, I wasn't interested in using any other Snort rules, so I disabled all of the default rules through the pfSense GUI and created my own category. Creating a new category is as simple as creating a file in the /usr/local/etc/snort/rules directory called 00police.rules. I used the vi (http://en.wikipedia.org/wiki/Vi) text editor to create the file. I used the name 00police.rules so that it is the first category to show up in the pfSense Snort GUI. The 00police.rules file has the following entries:

/usr/local/etc/snort/rules/00police.rules

alert udp any 10000 -> any any (msg:"VOIP-SIP Outbound INVITE Message"; content:"INVITE sip:"; reference:url,www.ietf.org/rfc/rfc3261.txt; classtype:protocol-command-decode; sid:72001; rev:1;)
alert udp any any -> any 10000 (msg:"VOIP-SIP Inbound INVITE Message"; content:"INVITE sip:"; reference:url,www.ietf.org/rfc/rfc3261.txt; classtype:protocol-command-decode; sid:72002; rev:1;)
alert udp any any -> any 10000 (msg:"VOIP-SIP CANCEL Message IN"; content:"CANCEL sip:"; reference:url,www.ietf.org/rfc/rfc3261.txt; classtype:protocol-command-decode; sid:72003; rev:1;)
alert udp any 10000 -> any any (msg:"VOIP-SIP CANCEL Message OUT"; content:"CANCEL sip:"; reference:url,www.ietf.org/rfc/rfc3261.txt; classtype:protocol-command-decode; sid:72004; rev:1;)
alert udp any any -> any 10000 (msg:"VOIP-SIP BYE Message IN CANCEL"; content:"BYE sip:"; reference:url,www.ietf.org/rfc/rfc3261.txt; classtype:protocol-command-decode; sid:72005; rev:1;)
alert udp any 10000 -> any any (msg:"VOIP-SIP BYE Message OUT CANCEL"; content:"BYE sip:"; reference:url,www.ietf.org/rfc/rfc3261.txt; classtype:protocol-command-decode; sid:72006; rev:1;)

Note: I don't claim to be a Snort expert. Until today, I've never written a Snort rule. I'm sure there are more specific and accurate ways to write these rules with offsets, distances, regular expressions, etc. However, the above rules work perfectly for my purposes.

Now, with Snort running, I issued the command "tail -f /var/log/snort/alert" and made some test calls. For each call setup and teardown, there were corresponding INVITEs and CANCELs seen in the Snort log file. Initially, I was concerned that there were multiple INVITEs and CANCELs for each call. I later found out that this could be worked around using a log file analysis tool. I also placed the CANCEL description in each of the VOIP-SIP BYE lines so that matching CANCEL would catch both styles of call disconnects. Once Snort was reliably detecting when I was on and off the phone, it was time to move to the next step: automating the policing.

AUTOMATING ACTIONS - Simple Event Correlator

I didn't see anything in the Snort documentation about automated actions based on rule matches, although I have to admit I didn't look for long. I found a log file analysis tool that did exactly what I was looking for: the Simple Event Correlator (SEC). You can download SEC from here: (http://sourceforge.net/projects/simple-evcorr/). SEC Documentation is found here: (http://simple-evcorr.sourceforge.net/)

In a nutshell, SEC can be configured to watch a log file for specific entries and then take customized actions for each match.

While reading the SEC documentation, the "pair" event correlation rule type quickly stood out as the right one to use. From the SEC documentation: "Pair - match input event, execute an action list, and ignore the following matching events until some other input event arrives. On the arrival of the second event execute another action list." I used the Pair rule because of the multiple INVITE and CANCEL messages. The SEC pair rule will only trigger once on the first INVITE message, and will only trigger once on the first corresponding CANCEL message.
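To make the suppression behaviour concrete, here is a rough Python simulation of how I understand a Pair rule to behave. It is not SEC code: pair_actions is a hypothetical helper, the alert lines are trimmed to just their descriptions, and the window is only checked when the next event arrives, which is a simplification.

```python
def pair_actions(events, window=10800):
    """Yield (time, action) for a Pair-style rule over (time, text) events."""
    armed_at = None  # time of the INVITE that started the current pair
    for when, text in events:
        if armed_at is not None and when - armed_at > window:
            armed_at = None  # window expired; wait for a new INVITE
        if armed_at is None:
            if "INVITE" in text:
                armed_at = when
                yield when, "enable"
        elif "CANCEL" in text:
            armed_at = None
            yield when, "disable"
        # anything else while armed (e.g. a repeated INVITE) is ignored


alerts = [
    (0, "VOIP-SIP Outbound INVITE Message"),
    (1, "VOIP-SIP Outbound INVITE Message"),
    (95, "VOIP-SIP BYE Message OUT CANCEL"),
    (96, "VOIP-SIP BYE Message OUT CANCEL"),
]
print(list(pair_actions(alerts)))
```

Four alert lines produce exactly two actions: one enable at the first INVITE and one disable at the first CANCEL.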

I downloaded sec-2.4.1.tar.gz and placed it into the /usr/src directory. I extracted the files with "tar xzfp sec-2.4.1.tar.gz". This created a new directory called "/usr/src/sec-2.4.1", which I ended up using as the working directory for all of the scripts and log files associated with this project. There's probably a better, more appropriate place for all of these files - but this worked for me. Here is the conf.txt configuration file I used for SEC:

/usr/src/sec-2.4.1/conf.txt

type=Pair
ptype=RegExp
pattern=INVITE
desc=Received SIP Invite, Enable Police
action=shellcmd /usr/bin/perl /usr/src/sec-2.4.1/police.pl;shellcmd /bin/date >> /usr/src/sec-2.4.1/sec.log;write /usr/src/sec-2.4.1/sec.log Enabling Police
ptype2=substr
pattern2=CANCEL
desc2=Received SIP Cancel, Disabling Police
action2=shellcmd /usr/bin/perl /usr/src/sec-2.4.1/nopolice.pl;shellcmd /bin/date >> /usr/src/sec-2.4.1/sec.log;write /usr/src/sec-2.4.1/sec.log Disabling Police
window=10800

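Note: you don't tell SEC which log file to watch in the configuration file; you do it on the sec.pl command line. Also keep the rule as one unbroken block, since SEC treats an empty line or a comment line as the end of a rule definition. Details on launching sec.pl are found a little later in this document.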

The first half of the Pair rule watches the input file for an INVITE match (the pattern matches either of the INVITE alert messages from the 00police.rules file). When it matches, it executes 3 actions:

- launches a perl script called police.pl (script included below) that telnets to the router and enables traffic policing
- takes the output of /bin/date and adds it to the sec.log file
- writes "Enabling Police" to the sec.log file

Note: the SEC documentation says that you can use $d or %d to write the date, but I couldn't get that part to work properly on my pfSense box.

The second half of the rule watches the same input file for any of the CANCEL lines from the 00police.rules file (the BYE alerts carry CANCEL in their descriptions, so they match too). When it matches, it takes these 3 actions:

- launches the nopolice.pl script (script included below) that telnets to the router and disables the policing
- takes the output of /bin/date and adds it to the sec.log file
- writes "Disabling Police" to the sec.log file

AUTOMATED ACTIONS - Perl Telnet Scripts

The Net::Telnet perl module is required in order to create a perl-based telnet script. I don't think Net::Telnet was included with my pfSense distribution. I was unable to use cpan to get a copy of it either. I manually downloaded Net::Telnet from the following location: (http://search.cpan.org/~jrogers/Net-Telnet-3.03/lib/Net/Telnet.pm)

I placed Telnet.pm into /usr/local/lib/perl5/5.8.8/Net/ (the directory name is case-sensitive, since Perl resolves Net::Telnet to Net/Telnet.pm). Thankfully, there are plenty of perl/telnet script examples available on the Internet. After some trial and error, I came up with these scripts. They are very simplistic scripts that leverage 3 functions: open a telnet connection, wait for text, send text.

/usr/src/sec-2.4.1/police.pl

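# police.pl - log in to the router and add the police statement
use strict;
use warnings;
use Net::Telnet ();

# Prompt is only used by login()/cmd(); this script relies on waitfor() instead
my $t = Net::Telnet->new(Timeout => 3,
                         Prompt  => '/bash\$ $/');
$t->open("10.0.0.1");                     # router address
$t->waitfor('/Username: $/i');
$t->print("admin");                       # replace with your own login
$t->waitfor('/Password: $/i');
$t->print("password");
$t->waitfor('/Router.*$/');               # exec prompt
$t->print("conf t");
$t->waitfor('/config.*$/');
$t->print("policy-map police");
$t->waitfor('/-pmap.*$/');
$t->print("class acgroup110");
$t->waitfor('/-pmap-c.*$/');
$t->print("police 800000 conform tr ex dr");   # enable the 800k limit
$t->waitfor('/-c-police.*$/');            # wait for the next prompt before exiting
$t->close;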

/usr/src/sec-2.4.1/nopolice.pl

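# nopolice.pl - log in to the router and remove the police statement
use strict;
use warnings;
use Net::Telnet ();

# Prompt is only used by login()/cmd(); this script relies on waitfor() instead
my $t = Net::Telnet->new(Timeout => 5,
                         Prompt  => '/bash\$ $/');
$t->open("10.0.0.1");                     # router address
$t->waitfor('/Username: $/i');
$t->print("admin");                       # replace with your own login
$t->waitfor('/Password: $/i');
$t->print("password");
$t->waitfor('/Router1801.*$/');           # exec prompt
$t->print("conf t");
$t->waitfor('/config.*$/');
$t->print("policy-map police");
$t->waitfor('/-pmap.*$/');
$t->print("class acgroup110");
$t->waitfor('/-pmap-c.*$/');
$t->print("no police");                   # remove the rate limit
$t->waitfor('/-pmap-c.*$/');              # wait for the next prompt before exiting
$t->close;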

ROUTER CONFIGURATION - Cisco 1801

The router I used for the test is a Cisco 1801 DSL router, although any rate-limiting router would do the trick. Here are the snippets from the configuration that apply the rate limiting. They could probably be a little more specific and detailed (i.e., different limits for inbound vs outbound traffic, further limiting "bulk" applications vs. interactive ones, etc.). I'm sure I'll modify this to be much more specific as time goes on.

class-map match-all acgroup110
 match access-group 110

policy-map police
 class acgroup110
  police 800000 conform-action transmit exceed-action drop

access-list 110 permit tcp any any

interface Dialer0
 service-policy input police

The above police.pl perl script goes into policy-map police, class acgroup110, and adds the "police 800000..." line when I start a call, and nopolice.pl removes that same police line once I hang up. The rest of the lines in the router configuration remain unchanged during and after VoIP calls.
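Put another way, the two scripts send the same sequence of configuration commands and differ only in the last line. The Python sketch below just builds those command lists so the difference is easy to see. It does not talk to a router, and policy_commands is a hypothetical helper name, not part of my setup.

```python
def policy_commands(enable, rate_bps=800000):
    """Return the IOS config lines that add or remove the police statement."""
    if enable:
        last = f"police {rate_bps} conform-action transmit exceed-action drop"
    else:
        last = "no police"
    # Both variants walk into the same policy-map and class first.
    return ["conf t", "policy-map police", "class acgroup110", last]


print(policy_commands(True))
print(policy_commands(False))
```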

TYING IT TOGETHER w/ pfSense

Finally, I wanted SEC to launch each time that pfSense reboots. To do this, I placed a startup script on the pfSense box.

/usr/local/etc/rc.d/sec.sh

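#!/bin/sh
# Stop any SEC instance that is already running (-f matches the full command line)
pkill -f "perl /usr/src/sec-2.4.1/sec.pl"
sleep 1
# Start SEC in the background, watching the Snort alert log
perl /usr/src/sec-2.4.1/sec.pl -conf=/usr/src/sec-2.4.1/conf.txt -input=/var/log/snort/alert -detach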

I also ran "chmod a+x sec.sh" to make sure it was seen as an executable file by the operating system. I believe pfSense calls the scripts in /usr/local/etc/rc.d once in a while (every 24 hours?) and I didn't want multiple instances of the SEC script running. The "pkill" line effectively kills any running instance of the SEC perl script first and then launches a new copy. I rebooted the pfSense box, ssh'd in and checked the output of "ps aux". From here, I could see Perl running the sec.pl script.

MAKING SURE IT ALL WORKS

Finally, to make sure it was all working, I tailed the /usr/src/sec-2.4.1/sec.log file and made a few calls. I saw exactly one Enabling message and one Disabling message, corresponding with the call setup and tear-down. SEC was doing its job and suppressing the additional INVITE and CANCEL messages. I also made sure that the police statement was in my router while I was on a call. I checked it again once the call was complete and the police statement was gone. A few more tests with multiple HTTP/FTP downloads saturating the link showed that the police statement immediately took effect and the Vonage voice quality was crystal-clear.

OTHER THOUGHTS

Keep in mind that AATQoS doesn't need to be specific to pfSense, although I don't know if it could have been any easier with a different firewall/shaper. This should work great with any package that includes Snort and Perl. I fully expect others to be able to do this with IPCop, dd-wrt, m0n0wall, smoothwall, etc. I'm sure that router manufacturers could easily add capability like this to their products.

This should work for online gaming as well, as long as you can capture the sign-in/out messages. Once you understand the protocol, Snort rules could be written and the rest of the AATQoS framework could be followed.

Hopefully, someone will read this article and take it to the next level. I'd love to see a setup that detects the presence of a particular datastream as opposed to watching for control messages. In the Vonage world, this would mean seeing streaming UDP port 10000-20000. A "while" loop could potentially enable policing for as long as the UDP packets are flowing, and turn off policing once the UDP stream has stopped. It would also make it a lot easier to create Snort rules for games and other VoIP services that don't have easily identifiable start/stop messages.

Actually, at that point, you wouldn't even need to necessarily run Snort watching for Layer 7 control messages. You could just key off of standard Layer 3/4 ACL matches.
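I haven't built this, but the idea can be sketched in Python as a small watcher that is fed packet timestamps. Everything here is an assumption for illustration: the StreamWatcher class, the 2-second idle timeout, and the idea that something else (a sniffer or ACL counter) calls packet() for each matching UDP packet and tick() on a timer.

```python
class StreamWatcher:
    """Track whether a media stream is active based on packet timestamps."""

    def __init__(self, idle_timeout=2.0):
        self.idle_timeout = idle_timeout
        self.last_seen = None
        self.policing = False

    def packet(self, now):
        """Call for every matching UDP packet; returns an action or None."""
        self.last_seen = now
        if not self.policing:
            self.policing = True
            return "enable"
        return None

    def tick(self, now):
        """Call periodically; returns 'disable' once the stream goes quiet."""
        if self.policing and now - self.last_seen > self.idle_timeout:
            self.policing = False
            return "disable"
        return None


w = StreamWatcher(idle_timeout=2.0)
print(w.packet(0.00), w.packet(0.02), w.tick(1.0), w.tick(2.5))
```

The first packet turns policing on, later packets only refresh the timestamp, and policing is turned off once no packet has been seen for longer than the timeout.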

CAVEATS

This scheme only works for one Vonage line. It breaks when there are 2 or more lines active at once. (Imagine making a call on the first line, making a call on the 2nd line, and then finishing the call on either line. The CANCEL message would disable policing even though one of the calls was still connected.) This could be fixed by creating rules & smart scripts that key off of the individual IP addresses of the VoIP phones/adapters and keep track of the number of active calls. They could even perform additional rate limiting based on the number of calls active at once. I'm not going to do this because I don't need a 3rd phone line.
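For anyone who does want to tackle it, the bookkeeping might look something like this Python sketch. It is untested against real traffic and makes some loud assumptions: one call at a time per adapter, adapters identified by made-up IP addresses, and events already reduced to "INVITE" or "CANCEL". CallCounter is a hypothetical name.

```python
class CallCounter:
    """Track adapters with a call in progress; police while any is active."""

    def __init__(self):
        self.active = set()  # adapter IPs with a call in progress

    def event(self, adapter_ip, kind):
        """kind is 'INVITE' or 'CANCEL'; returns 'enable', 'disable' or None."""
        was_busy = bool(self.active)
        if kind == "INVITE":
            self.active.add(adapter_ip)
        elif kind == "CANCEL":
            self.active.discard(adapter_ip)
        is_busy = bool(self.active)
        if is_busy and not was_busy:
            return "enable"
        if was_busy and not is_busy:
            return "disable"
        return None


c = CallCounter()
steps = [("10.0.0.21", "INVITE"), ("10.0.0.22", "INVITE"),
         ("10.0.0.21", "CANCEL"), ("10.0.0.22", "CANCEL")]
print([c.event(ip, kind) for ip, kind in steps])
```

Policing is enabled when the first call starts and only disabled when the last one ends, which is exactly the case the single Pair rule gets wrong.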

THINGS TO DO

- Find a better directory to house everything
- Figure out how to get the contents of the sec.log to show up in the pfSense GUI log
- Figure out a way to key off the presence of a particular data stream as opposed to start/stop control messages

FREQUENTLY ASKED QUESTIONS

Q1.) Why didn't you have the scripts modify the built-in QoS in pfSense?
A1.) This is a great idea and I'd love to see someone do it. Personally, I don't use pfSense's built-in QoS because it interferes with (read: rate-limits) responses from the Squid cache. (What's the point of having a web cache that's limited to 1.5mbps?) For those who don't have a router capable of ACLs and rate-limiting, this would actually be a great way to do it. pfSense prioritizes based on a percentage of available bandwidth. The automated action could re-write the config file that holds the prioritization settings, modify the qlanroot/qwanroot to the point where TCP data is limited to some smaller value, and then restart the shaper process with the new settings. While plenty of headroom would be available for VoIP, the remaining data would still be prioritized within the smaller pipe. Interactive sessions like Telnet could still have a higher priority than HTTP, which would have a higher priority than bulk applications like FTP, Bittorrent, etc. At the end of the call, replace the config file, restart the shaper process, and all of the bandwidth becomes available again.

Q2.) Would this work with broadband technologies that have a "boost" or "burst" feature (10+mbps for the first few seconds of a download, and drops off to the standard 5mbps or 3mbps)?
A2.) Sure. Just make your rate limiting calculations assume the worst case scenario, i.e. without the boost technology there. Not that any connection is "guaranteed", but base your calculations off of your standard bandwidth number, not the boost number. While you're on the phone, the boost won't be available, but things will sound crystal-clear. Once you're off the phone the boost will be ready to go.

Q3.) Why don't you just get more bandwidth?
A3.) Since I'm only a block or two away from the DSLAM, I assume the reason I can only get 1.5mbps is because it's a T1/copper-fed remote terminal. I wish Qwest would pull some fiber to it as I know the copper supports 7mbps. Cable is not available in my cul-de-sac and Comcast won't run a cable to me even though my backyard neighbor has it.

Q4.) Why don't you get another phone line from the local Telco?
A4.) I'm not happy about only getting 1.5mbps and would rather spread my consumer dollars around. Vonage has some cool features too and the price is good.

Q5.) Why don't you just use a cellular telephone instead?
A5.) Cell service in my basement office is dismal.

Q6.) Will this work for IAX VoIP?
A6.) I don't see why not. IAX, like SIP, uses UDP as a transport and signaling protocol. Where SIP usually uses UDP 5060/5061 for call control, IAX uses UDP port 4569 for both control & media. You would probably be matching on the IAX "NEW" message for call setup, and the "HANGUP" message for teardown. Fire up a sniffer and watch the traffic on UDP 4569.

Q7.) AATQoS?
A7.) All of the shorter acronyms were already taken (Application Aware QoS, Application Triggered QoS, Triggered QoS, etc.) and had plenty of results on search engines. I wanted to be able to watch the search engine results grow over time... maybe I'm a little vain that way.

DISCLAIMER

Use the information in this document at your own risk. I disavow any potential liability for the contents of this document. Use of the concepts, examples, and/or other content of this document is entirely at your own risk. All copyrights are owned by their owners, unless specifically noted otherwise. Use of a term in this document should not be regarded as affecting the validity of any trademark or service mark. Naming of particular products or brands should not be seen as endorsements. You are strongly recommended to take a backup of your system before major installation and backups at regular intervals.

CREDITS

I'd like to publicly thank the authors and contributors of pfSense, tcpdump, Snort, Simple Event Correlator, and FreeBSD for their time and effort in providing us these great tools.

v1.0 January 15, 2008 - Initial Release
v1.1 February 28, 2008 - After a power bump, the script stopped working. Further inspection showed that Vonage had changed the SIP signalling port on my Linksys adapter from UDP 5060 / 5061 to UDP port 10000. Made modifications to the Snort rules for INVITE / CANCEL / BYE and things started working again.

The content presented on this web page is provided for your personal non-commercial use only and may not be republished in whole or in part without the express written or verbal consent of the publisher. All rights are reserved.