Nagle Delays Explained


#1

The ExtraHop Nagle Counter

We count Nagle delays not just based on the presence of the (ubiquitous) algorithm, but based on the interaction between delayed acknowledgements and Nagle.

They both try to do the right thing, but can interact poorly.

Nagle Overview

Here’s the meat of the logic. We’ll start with the Wikipedia article on Nagle’s algorithm (emphasis mine):

  IF the window size >= MSS and the available data is >= MSS 
   --> send complete MSS segment now
  ELSE IF there is unconfirmed data still in the pipe
   --> enqueue data in the buffer **until an ACK is received**

So the “cause” is this logic and its caveat:

  1. You’ve got data to send, but you’ve got extra room in the segment.
  2. You can fit more data into the segment if you wait a bit. If more data appears, add it to the segment until it’s full and then send. (There’s a toy sketch of this decision just after this list.)
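
For the code-minded, here’s a minimal Python sketch of that decision as a toy model. The function and variable names (pending_bytes, window_bytes, unacked_bytes) are invented for illustration - a real TCP stack keeps this state inside the kernel:

    # Toy model of Nagle's sender-side decision. Illustrative only;
    # a real TCP stack implements this inside the kernel.
    MSS = 1460  # a typical Ethernet-derived maximum segment size

    def nagle_should_send_now(pending_bytes, window_bytes, unacked_bytes):
        """Return True if pending data may be sent immediately."""
        if pending_bytes >= MSS and window_bytes >= MSS:
            return True   # a full segment is ready: send it now
        if unacked_bytes == 0:
            return True   # nothing in flight: small data can go out right away
        return False      # otherwise buffer it until an ACK arrives
                          # (or until a full MSS worth of data accumulates)

The last branch is the one that matters for the rest of this post: small writes sit in the send buffer until an ACK comes back.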

The benefit here is pretty obvious: fewer, fuller segments on the wire, less per-packet overhead, more efficient network utilization.

But sometimes (or often) you want to send what you’ve got and don’t want to wait. In these cases Nagle can hurt you.

Let’s go back to the algorithm - see the “until an ACK is received” bit? THAT is the killer here, because of Delayed ACKs.

Delayed ACKs
Delayed ACK is another well-intentioned algorithm that tries to send more data per segment when it can. But because part of Nagle’s algorithm depends on receiving an ACK before it will send, the two create a problem.

Effectively it’s a race condition - let’s imagine for a moment that your app is serving, say, lots of small (i.e. less than the MSS) images or small XML responses. The payloads are small and they should be delivered ASAP.

Here’s the race: You’ve got data to send. Great! But recall that Nagle is waiting until:

  1. It gets enough data to fill up the MSS -or-
  2. Its timer expires -or-
  3. It gets an ACK from the receive side

See #3 above? That’s the issue - delayed ACK is doing something similar!

Its logic goes like this: “Hey, if I CAN STASH DATA on this segment, I will. Don’t send a bare ACK - we’ll be kinder to the network this way.”

Here’s the algorithm, basically:

    IF you are ready to send an ACK:
       --> wait (usually 200ms, can be up to 500ms) for outbound data to piggyback on
    IF (the delayed_ack timer fires) OR (another inbound segment arrives that needs an ACK):
       --> send the ACK now

And there’s the deadlock:

Delayed ACK is waiting around to send the ACK, and Nagle is waiting around to receive it!

So the net effect? You get random stalls of 200-500ms on segments that could otherwise be sent immediately and delivered to the receive side stack and apps above it.
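
If you want to see the stall for yourself, here’s a small, self-contained Python sketch of the classic write-write-read pattern: the client sends a “request” in two small pieces, the server won’t reply until it has both, so the second piece can sit behind Nagle waiting for an ACK that the receiver is busy delaying. The address, port (50007), request size, and round count are arbitrary choices for the demo, and whether (and how badly) the stall shows up - especially over loopback - depends on your OS’s delayed-ACK behavior:

    import socket
    import threading
    import time

    HOST, PORT = "127.0.0.1", 50007   # arbitrary demo address
    REQUEST_SIZE = 32                 # server waits for this many bytes per round
    ROUNDS = 5

    def server():
        # Accept one connection and answer ROUNDS small "requests" on it.
        with socket.create_server((HOST, PORT)) as srv:
            conn, _ = srv.accept()
            with conn:
                for _ in range(ROUNDS):
                    buf = b""
                    while len(buf) < REQUEST_SIZE:     # wait for the *whole* request
                        chunk = conn.recv(4096)
                        if not chunk:
                            return
                        buf += chunk
                    conn.sendall(b"ok")                # then send a tiny reply

    def run_client(nodelay):
        with socket.create_connection((HOST, PORT)) as sock:
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, int(nodelay))
            for i in range(ROUNDS):
                start = time.monotonic()
                sock.sendall(b"x" * 16)                # first half of the request
                sock.sendall(b"y" * 16)                # second half: Nagle may hold it
                sock.recv(16)                          # wait for the reply
                ms = (time.monotonic() - start) * 1000
                print(f"  TCP_NODELAY={nodelay} round {i}: {ms:.1f} ms")

    for nodelay in (False, True):
        t = threading.Thread(target=server, daemon=True)
        t.start()
        time.sleep(0.2)                                # crude wait for the listener
        run_client(nodelay)
        t.join()

With Nagle left on, rounds after the first often show a stall on the order of tens to hundreds of milliseconds (the delayed-ACK timer); with TCP_NODELAY set they should drop to well under a millisecond. If your OS ACKs aggressively over loopback, you may not see it locally even though it bites over the real network.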

COMMON CAUSES

Many servers these days disable Nagle by setting TCP_NODELAY on their sockets, but not all. Many others expose it as a config option. Intermediate devices like proxies and load balancers often re-impose the algorithm because of “sane” defaults. They’re trying to do the right thing, but in these cases it backfires.
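
For reference, “setting TCP_NODELAY” boils down to one setsockopt call per connection. A Python sketch of the server side (the listener address and port 8080 are placeholders):

    import socket

    # Disable Nagle on each accepted connection so small responses go out
    # immediately instead of waiting behind an unacknowledged segment.
    srv = socket.create_server(("0.0.0.0", 8080))   # placeholder listener
    conn, addr = srv.accept()
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)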

RESOLUTIONS

Basically, here are the options to minimize or eliminate the issue:

  1. Disable Nagle via global socket options on your servers, or profile tweaks on your proxy / LB / ADC. This will eliminate the issue.

  2. Pull down the delayed ACK timer on your servers, LBs, etc. This will help, but won’t eliminate it. (Both options are sketched at the socket level just below.)
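
As far as I know there’s no portable per-socket knob for the delayed-ACK timer itself - it’s usually an OS-level tunable. The closest per-socket control is Linux’s TCP_QUICKACK, which temporarily disables delayed ACKs on the receiving side and has to be reapplied over the life of the connection. A hedged sketch of both options, assuming a placeholder connection and a Linux build of Python that exposes the constant:

    import socket

    sock = socket.create_connection(("example.com", 80))   # placeholder endpoint

    # Option 1: disable Nagle on the sending side.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    # Option 2 (Linux only): ask the receiving side not to delay its ACKs.
    # TCP_QUICKACK is not sticky, so it needs to be reapplied (e.g. around
    # recv() calls) if you rely on it.
    if hasattr(socket, "TCP_QUICKACK"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)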

At the end of the day, it all boils down to your specific workload and the dominant traffic patterns on a service. The classic extreme example is X11: if I move my mouse or type a few characters in my terminal session, I want to see those events echoed immediately.

COMMON PROTOCOLS that take the Nagle hit are:

  1. Interactive traffic (terminal sessions, X11, and the like)
  2. RPC-style calls (MS-RPC, SOAP, XMLRPC, CICS, some JMS, etc.)
  3. HTTP / web traffic, at least in many cases

Some examples where Nagle really helps:

  1. Large file transfers
  2. Large responses in email
  3. Large CIFS transfers
  4. Large images

…the pattern should be pretty clear by now.

Hope this helps!

–Matt


#2

I forgot to add this killer write-up by Stuart Cheshire - definitely worth a read: http://www.stuartcheshire.org/papers/NagleDelayedAck/


#3

And here’s an answer to a question about handling this on servers.

Q: If it’s not practical to disable Nagle on servers, are there any drawbacks to turning off delayed acks on the ‘upstream’ devices?

A: Very generally speaking, the L7 process handling the connections on the server side will have set its sockets up to disable Nagle (if that is appropriate), or it will expose it as a configuration option. For example, Apache and other web servers will usually have this socket option (TCP_NODELAY) set. By far the most common cause I’ve seen is full-proxy devices in the flow, like an application delivery controller, re-imposing the algorithm - which makes resolution easy (a profile tweak on that device).
Turning off delayed ACK entirely will cause more chatter on the wire, as you’ll be sending far more bare ACKs. This may or may not be an issue - it depends on the workload. But pulling down the delayed ACK timer is often a good compromise.
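
One quick sanity check you can do from inside your own application code is to read the option back and confirm the L7 process really did disable Nagle. A Python sketch with a placeholder endpoint; most languages expose the equivalent getsockopt call:

    import socket

    sock = socket.create_connection(("example.com", 80))   # placeholder endpoint
    nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
    print("Nagle is", "disabled" if nodelay else "enabled", "on this socket")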