In past blog posts, we've talked about performance metrics, such as TCP retransmission timeouts (RTOs) and database metrics, that help IT teams solve hard-to-pinpoint issues before they cause big problems. This month, we turn our attention to an especially tricky network performance metric: PAWS Dropped SYNs.
One customer we worked with saw odd connection behavior between their office network and their colocated data center. New connections took longer and longer to establish over time, and sometimes connection attempts failed altogether. Looking at traffic through the ExtraHop system, one metric that stood out was PAWS Dropped SYNs. This metric counts the number of SYN packets (the packets that open a TCP connection) dropped by the Protection Against Wrapped Sequence numbers (PAWS) mechanism in TCP.
PAWS was proposed for TCP back in 1992 to prevent old duplicate segments from corrupting open TCP connections. This approach became necessary on faster network connections, where the 32-bit TCP sequence number can wrap during the transfer of a long data stream. At gigabit network speeds, this can happen after approximately 17 seconds. In addition to checking the sequence number, PAWS compares the TCP timestamp carried in each packet against the most recent timestamp seen on the connection to confirm that the packet is new and not a duplicate. If the sequence number has wrapped but the timestamp is not older than the last one seen, the packet is accepted and the data transfer continues. A packet with an older timestamp is discarded.
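To make the arithmetic and the timestamp check concrete, here is a minimal Python sketch. It assumes the 17-second figure refers to the time needed to send 2^31 bytes (half of the 32-bit sequence space) at line rate, and it models the PAWS test as a 32-bit wraparound-aware comparison. The function names are our own for illustration and do not come from any TCP stack.

```python
def wrap_time_seconds(bits_per_second: float) -> float:
    """Seconds needed to send 2**31 bytes at the given line rate.

    Assumption: 2**31 bytes (half the 32-bit sequence space) is the point
    at which old and new sequence numbers become ambiguous.
    """
    bytes_per_second = bits_per_second / 8
    return 2**31 / bytes_per_second


def ts_before(a: int, b: int) -> bool:
    """True if 32-bit timestamp a is older than b, allowing for wraparound."""
    return ((a - b) & 0xFFFFFFFF) > 0x7FFFFFFF


def paws_accepts(seg_tsval: int, ts_recent: int) -> bool:
    """Accept a segment unless its timestamp is older than the last one seen."""
    return not ts_before(seg_tsval, ts_recent)


if __name__ == "__main__":
    print(f"Wrap time at 1 Gbps: {wrap_time_seconds(1e9):.1f} s")
    print("Newer timestamp accepted:", paws_accepts(2000, 1000))
    print("Older timestamp accepted:", paws_accepts(1000, 2000))
```

The comparison is done modulo 2^32 because the timestamp field itself eventually wraps, so a plain less-than test would misjudge packets sent just after the timestamp rolls over.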
However, in some fairly common deployment scenarios, the PAWS mechanism can have unintended consequences. In the case of one of our customers, PAWS Dropped SYNs were the culprit behind failing connection attempts from a branch office to a data center. ExtraHop was able to confirm this conclusion right away by analyzing connection attempts in real time during a live reproduction test. From that point, the customer IT team quickly deduced the root cause: the NAT-enabled firewall hid multiple servers behind a single IP address, and because those servers' internal clocks did not match, the timestamps on packets arriving from that address did not always increase in sequence. If a SYN from behind the NAT carried a timestamp that ran afoul of PAWS, it got dropped because the receiving device interpreted the packet as belonging to a previous connection.
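The failure mode can be illustrated with a small simulation. This is a simplified sketch under stated assumptions, not a model of any particular TCP stack or of the customer's devices: the hypothetical `Receiver` class remembers the last timestamp seen per source IP address for a configurable linger period and drops any SYN from that address whose timestamp is lower. The IP address and timestamp values are made up, and timestamp wraparound is ignored for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical receiver that applies a PAWS-style check per source IP."""

    linger_seconds: float
    # Maps source IP to (last accepted timestamp value, arrival time).
    recent: dict = field(default_factory=dict)

    def on_syn(self, src_ip: str, tsval: int, now: float) -> bool:
        """Return True if the SYN is accepted, False if it is dropped."""
        entry = self.recent.get(src_ip)
        if entry is not None:
            last_tsval, last_seen = entry
            still_lingering = (now - last_seen) < self.linger_seconds
            if still_lingering and tsval < last_tsval:
                return False  # looks like an old duplicate, so drop it
        self.recent[src_ip] = (tsval, now)
        return True


if __name__ == "__main__":
    nat_ip = "203.0.113.10"  # both hosts appear as this address after NAT
    rx = Receiver(linger_seconds=60)

    # Host A's timestamp clock is far ahead of host B's.
    print("Host A at t=0s: ", rx.on_syn(nat_ip, tsval=500_000, now=0.0))
    print("Host B at t=1s: ", rx.on_syn(nat_ip, tsval=101_000, now=1.0))
    print("Host B at t=61s:", rx.on_syn(nat_ip, tsval=161_000, now=61.0))
```

In this sketch, host B's first SYN is dropped because it arrives while the receiver still remembers host A's higher timestamp for the shared address. Once the linger period expires, host B's retry goes through, which is why connection attempts appear slow rather than failing outright every time.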
Have you seen slow connection attempts in your environment? If so, PAWS may be the culprit. If it is, the fix is often much simpler than the diagnosis: either increase the connection linger time on the NAT device or decrease the connection linger time on the receiving device.
We'd like to hear about your experiences with similar problems. Just leave a comment and let us know your take on PAWS Dropped SYNs.
This is a companion discussion topic for the original entry at http://www.extrahop.com/post/blog/performance-metrics/paws-dropped-syns-network-address-translation/