At Christian Aid, our field offices quite often have limited bandwidth, particularly at sites in Africa. In some cases we are using expensive VSAT links for these sites, so I am very interested in traffic shaping to ensure we get the best value and don’t spend too much to make our applications work for the local teams.
I haven’t found much material on traffic shaping strategies. It is easy enough to find instructions on how to set up traffic shaping, but there is little out there that explains the theory, and the theory would help in working out which settings to apply. So I’ve ended up taking the following approach, based on my own made-up theory. Comments would be appreciated!
Set your uplink bandwidth as accurately as possible
For most offices we have a contracted bandwidth level. In the Traffic shaping settings for a Meraki MX network I can set this using a slider, or by clicking into the details view where I can enter exact values (necessary where the up and down bandwidths differ, which is often the case). Here is an image of the UI for this:
Note that there are WAN 2 and Cellular options – the Meraki equipment allows for a second failover/load balancing uplink, as well as a 3/4G uplink using a USB dongle.
This is important for two reasons. Firstly, by setting the available bandwidth here you ensure that any queuing that results from bandwidth demand exceeding supply occurs at the MX appliance, where you can manage it, rather than at the ISP equipment, where you probably can’t.
Secondly, the settings here allow traffic shaping rules to be based on proportions of your available bandwidth. If you leave these at 100 Mbps (the default) then the priority settings for traffic shaping rules will not work effectively!
Note – I am not sure what to do in the case of a contended connection, where you know the maximum burst rate and the committed information rate (CIR) you may see most of the time. Please comment if you know about this!
Apply device bandwidth limits if you have limited bandwidth
You want to ensure that you prevent a single user (or device) from hogging all the bandwidth.
Using the Global bandwidth limits you can specify the maximum bandwidth available to a single client. There is also a feature called SpeedBurst which allows a client to burst above this at the start of a single download for 10 seconds. This reduces the amount that the bandwidth limitation will be noticed by users. I am not clear whether this has been effective, but it certainly hasn’t hurt.
I’m not sure of the best way to determine the per-client limit. It’s a balance: you want to prevent a single client from hogging all the bandwidth, but if only one client is using the connection you don’t necessarily want to stop them using it. Currently the SpeedBurst setting is the only tool that allows any sort of compromise.
If anyone has any ideas of how to determine a per-client limit, please leave a comment!
Note that you should set the Per-client limit here lower than the overall bandwidth – otherwise it will not achieve anything.
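As a rough sanity check on a candidate per-client limit, here is a minimal sketch in Python. It is only arithmetic and not anything the Meraki dashboard provides; the function name and the example figures (a 1000 Kb/s uplink and a 250 Kb/s per-client limit) are assumptions for illustration.

```python
import math


def check_per_client_limit(uplink_kbps: int, per_client_kbps: int) -> dict:
    """Summarise how a per-client cap relates to the uplink bandwidth."""
    if uplink_kbps <= 0 or per_client_kbps <= 0:
        raise ValueError("bandwidth values must be positive")
    return {
        # A cap at or above the uplink can never restrict anyone.
        "limit_is_effective": per_client_kbps < uplink_kbps,
        # Share of the uplink one client can take when running at the cap.
        "single_client_share": per_client_kbps / uplink_kbps,
        # Number of clients at the cap needed to fill the uplink.
        "clients_to_saturate": math.ceil(uplink_kbps / per_client_kbps),
    }


# Hypothetical figures, not taken from a real office.
print(check_per_client_limit(uplink_kbps=1000, per_client_kbps=250))
```

A lower limit protects the link from a single heavy user, but it also caps a lone user on an otherwise idle connection, which is the trade-off described above.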
Traffic shaping rules
Now those basic global settings are out of the way, you can start shaping the different kinds of traffic passing through the device.
The first thing you need to do is think about the different kinds of traffic – I find it helpful to make five categories and identify the kinds of traffic in each:
- Realtime traffic
- Background traffic
- Things you want to limit but not block
- Things you want to block
- Everything else
The first three are the ones you want to create traffic shaping rules for. Blocking is achieved through the firewall (and possibly through another filtering solution such as OpenDNS about which I may write soon). Everything else you just leave alone.
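The split between categories and where each one is handled can be written down as a small lookup table. This is just a sketch of my own bookkeeping in Python; the category keys and the `Action` names are made up for illustration and are not Meraki settings.

```python
from enum import Enum


class Action(Enum):
    SHAPE = "traffic shaping rule"
    BLOCK = "firewall rule"
    NONE = "leave alone"


# The five categories and where each one is dealt with.
CATEGORY_ACTIONS = {
    "realtime": Action.SHAPE,
    "background": Action.SHAPE,
    "limit_not_block": Action.SHAPE,
    "block": Action.BLOCK,
    "everything_else": Action.NONE,
}


def categories_needing_shaping_rules() -> list:
    """Return the categories that get a traffic shaping rule."""
    return [name for name, action in CATEGORY_ACTIONS.items()
            if action is Action.SHAPE]


print(categories_needing_shaping_rules())
```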
Slicing up the pie
Consider the uplink bandwidth you set as 100%. One element of traffic shaping is limiting different kinds of traffic to a portion of this hundred percent. Decide the proportions for each category you want to limit. I do it as follows:
- Background traffic – 30% of bandwidth
- Other things I want to limit but not block – 20%
This leaves 50% of my bandwidth if the background and discouraged traffic is running at full whack. Here is a schematic that explains better:
Calculate the actual bandwidth limits using the percentages you decide and the uplink bandwidth you set.
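The calculation is simple enough to do by hand, but here is a minimal Python sketch of it. The uplink figures (1000 Kb/s down, 500 Kb/s up) are a hypothetical example, and the function and category names are my own labels rather than anything from the Meraki UI.

```python
def rule_limits(uplink_down_kbps: int, uplink_up_kbps: int, shares_pct: dict):
    """Turn percentage shares into per-rule (down, up) limits in Kb/s.

    Returns the limits and the percentage of the uplink left unreserved.
    """
    total = sum(shares_pct.values())
    if total > 100:
        raise ValueError("shares add up to more than 100%")
    limits = {
        category: (uplink_down_kbps * pct // 100, uplink_up_kbps * pct // 100)
        for category, pct in shares_pct.items()
    }
    return limits, 100 - total


# Hypothetical uplink; the shares are the ones listed above.
limits, unreserved = rule_limits(1000, 500, {"background": 30, "limited": 20})
print(limits)      # per-rule (down, up) limits in Kb/s
print(unreserved)  # percentage left for everything else
```

With these assumed figures the background rule comes out at 300 Kb/s down and 150 Kb/s up, the limited rule at 200 Kb/s down and 100 Kb/s up, and half the link stays unreserved.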
Realtime traffic is the kind of traffic that is critical to your operations and where any delays will be immediately noticeable, for example voice traffic, live video (video conferences rather than streaming concert footage) and remote desktop solutions such as Citrix or RDP.
In my opinion this is the only kind of traffic you want to give high priority to. Don’t be tempted to put email traffic or things that non-IT folk might consider high priority in here.
Here is my rule:
In the definition box I have used a combination of custom rules and Meraki’s predefined categories.
In my organisation I want to prioritise:
- voice traffic – that means Skype and Lync (now Skype for Business). Skype is included in the predefined VoIP & Video Conferencing rule, but I believed Lync wasn’t yet, so using the available documentation about Lync traffic I specified the ports and hosts used in Lync communication. Update: Meraki support confirmed Skype for Business is included in the Skype rule.
- video conference traffic – again Skype covers some of this, but also port and host values for VSee which isn’t yet in Meraki’s predefined rules
- remote desktop traffic – Citrix and RDP. Again this isn’t in the predefined rules, so I’ve specified our internal hostnames: citrixfarm and the FQDN citrixfarm.caid.local. I’ve also specified port 3389, which is used by RDP traffic.
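To make the idea of a custom rule definition concrete, here is a minimal Python sketch of matching a flow against a list of hosts and ports. This is not how the MX implements matching, which I don’t know; `Flow` and `ShapingRule` are hypothetical types, and only the hostnames and port come from my rule above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    dest_host: str
    dest_port: int


@dataclass(frozen=True)
class ShapingRule:
    name: str
    hosts: frozenset = frozenset()
    ports: frozenset = frozenset()

    def matches(self, flow: Flow) -> bool:
        """A flow matches if either its host or its port is listed."""
        return flow.dest_host.lower() in self.hosts or flow.dest_port in self.ports


REMOTE_DESKTOP = ShapingRule(
    name="remote desktop",
    hosts=frozenset({"citrixfarm", "citrixfarm.caid.local"}),
    ports=frozenset({3389}),  # RDP
)

print(REMOTE_DESKTOP.matches(Flow("citrixfarm.caid.local", 1494)))
print(REMOTE_DESKTOP.matches(Flow("example.org", 443)))
```

One thing the sketch makes obvious is that a port-only entry such as 3389 matches RDP to any destination, not just our own servers.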
I’ve allowed this traffic to ignore the per-client limit – if people doing a video conference need it then let them have it even if it gets in the way of other traffic.
I’ve also specified the Priority as high. Initially I understood this as a QoS setting: if a queue formed at the device, then let this traffic through first. After reading Meraki’s documentation on prioritisation more deeply I now think this works a bit differently, and it may be better to just use this setting rather than the custom bandwidth limits I’ve specified below.
Lastly there is DSCP tagging. Here I am tagging packets matching the rule with DSCP value 7. I think this may result in the traffic being recognised by upstream devices doing traffic shaping as needing some kind of priority, but I am unsure how effective this is. Can’t hurt, right?
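For anyone wondering what the tag actually is on the wire: DSCP is a 6-bit value carried in the upper six bits of the IP header’s DS byte (the old TOS byte), so the byte value is the DSCP value shifted left by two bits. A small Python sketch of that conversion follows; the function names are my own. For comparison it also shows 46 (Expedited Forwarding), the value commonly used for voice, though whether any upstream device honours a tag is entirely up to that device.

```python
def dscp_to_ds_byte(dscp: int) -> int:
    """Return the DS (TOS) byte carrying this DSCP value, with ECN bits zero."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    return dscp << 2


def describe(dscp: int) -> str:
    return f"DSCP {dscp} = binary {dscp:06b}, DS byte 0x{dscp_to_ds_byte(dscp):02x}"


print(describe(7))   # the value used in the rule above
print(describe(46))  # Expedited Forwarding, for comparison
```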
In my analysis of our use of bandwidth in each of our field offices I found that the number one application that was likely to hog bandwidth was email – mostly internal organisational email. Not torrenting, video or anything else. Our staff just send and receive a lot of email with large attachments. The worst cases happen when someone needs to resync their entire mailbox with Outlook.
Staff consider email to be the highest priority, but in most cases they won’t notice if syncing is slow. Since uncontrolled email regularly hogs the connections, I treat it as background traffic and limit it. In our case we use Office 365 email, which is matched under the Windows Live Hotmail and Outlook rule (perhaps Meraki will update this label sometime soon?).
I’ve also put software updates in here, both from public update sources such as Windows Update and from our internal WSUS server lps253gbr. When Microsoft release a lot of patches this often ties up limited bandwidth connections for many hours. We still want the updates, but not at the expense of other things working. I’m hoping that Microsoft’s use of peer-to-peer technology in Windows 10 and Windows Update for Business will help us here.
Microsoft Skydrive (now called OneDrive) is also in here. We are planning on using OneDrive for Business to allow syncing of content – but any syncing tool is potentially a bandwidth hog, so I am trying to manage it.
I’ve specified that traffic matching this rule should be limited to 30% of the uplink – approximately 333 Kb/s down and 150 Kb/s up. As mentioned previously, the Priority setting may be a better way of applying the limits, but I’m not yet confident that I understand why, so I am still manually setting the bandwidth as well.
Finally I’m tagging this with DSCP value 3. Again, I’m not quite sure I understand which setting to use and may change this later.
Apple.com is listed in my rule for things I want to limit but not block, in order to control automatic updating of iOS clients such as iPhones. These updates are often huge, and while I don’t want out-of-date clients on our networks, I really don’t want their updating to prevent day-to-day work by other people.
The other items are generally related to non-work related traffic. It is debatable whether these should really be limited as much as this. For the moment I’m throttling them in most locations.
The bandwidth is limited to 20% of the uplink – 200 Kb/s down and 100 Kb/s up. The priority is set as Low and the DSCP tag is 0.
There isn’t a lot of traffic that I think we should be blocking – it is really things that potentially jeopardise our operations or reputation – namely bittorrenting copyrighted material, which could result in our ISPs disconnecting us, or getting threatening messages from litigating bodies or worse. This blocking isn’t done on the Traffic Shaping page, but instead the Firewall page.
I’ve only selected Peer to Peer and Web file sharing. Peer to peer blocks BitTorrent and similar traffic, whereas Web file sharing blocks the sites where you find torrent files – e.g. Pirate bay, Kick Ass Torrents etc.
How effective is this?
I had the opportunity to confirm that these rules could improve the experience of using a saturated connection when our office in Delhi started to complain about slow speeds and disconnections.
Before applying traffic shaping rules based on the above the connection looked like this:
The “plateau” shape of this graph is classic saturation: users are demanding more bandwidth than the link can supply. Looking at the specific traffic that was happening at the time, I concluded that no-one was abusing the link. It was just the combination of so many staff doing work-related things demanding more bandwidth than was available.
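If you have usage samples exported from somewhere, a crude way to spot this plateau is to check what share of the samples sit close to the uplink limit. The Python sketch below is only an illustration: the thresholds (within 90% of the limit, for at least 80% of samples) and the sample figures are assumptions I picked, not values from the Delhi graph or from Meraki.

```python
def looks_saturated(samples_kbps, uplink_kbps, near_fraction=0.9, min_share=0.8):
    """Return True if most samples sit close to the uplink limit (a plateau)."""
    if not samples_kbps:
        return False
    near_cap = sum(1 for s in samples_kbps if s >= near_fraction * uplink_kbps)
    return near_cap / len(samples_kbps) >= min_share


# Hypothetical samples for a 1000 Kb/s link.
plateau = [950, 980, 990, 1000, 970, 960, 400, 990, 995, 985]
bursty = [100, 950, 200, 300, 980, 150, 120, 400, 90, 60]

print(looks_saturated(plateau, uplink_kbps=1000))
print(looks_saturated(bursty, uplink_kbps=1000))
```

Occasional peaks that touch the limit are normal; it is the sustained flat top that suggests demand is exceeding supply.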
After applying traffic shaping rules the link looked far healthier:
Staff reported that the connection had improved. I feel this validated the principles of my rules. In other locations I think that applying these rules has reduced our need to pay for increased bandwidth, and that was one of the main selling points of adopting Meraki MX Security Appliances in the first place.
However, I still have questions and am not sure I am applying the best possible rules for what I want to achieve. Should I use only the Priority settings, or only the custom bandwidth limits? What are appropriate DSCP tags to apply, and are they at all effective?
If you have any thoughts on this, please leave a comment.