[nis-transit] Incident Report: Seattle IPv6 BGP session flapping

Fri May 15 23:40:57 UTC 2020

We have noticed frequent IPv6 BGP session flapping on our Seattle router. Upon further investigation, it appears to be caused by a misconfigured kernel parameter and/or a kernel bug.

The kernel parameter "net.ipv6.route.max_size" has a low default value of 4096. That value used to be fine, because the parameter was meant to limit the size of the IPv6 routing table cache, not the size of the actual IPv6 routing table. On kernel 4.19, however, the parameter does not work as intended: it appears to constrain the actual IPv6 routing table, which breaks IPv6 forwarding. Here are some of the clues we found:

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861115#10
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1221915#c7
[3] https://bird.network.cz/pipermail/bird-users/2020-February/014270.html
[4] https://serverfault.com/questions/902161/linux-host-randomly-stops-answering-ipv6-neighbor-solicitation-requests
[5] https://www.prolixium.com/blog?id=1041

We have increased the value of the "net.ipv6.route.max_size" parameter, which should mitigate the issue for now. We will continue to monitor the situation and work toward a permanent fix.
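
For anyone who wants to check a Linux router of their own, here is a minimal sketch (not the script we run) that compares the number of installed IPv6 routes against the configured limit. It assumes the usual procfs locations, /proc/sys/net/ipv6/route/max_size for the limit and /proc/net/ipv6_route with one route entry per line. The function names and the 90% warning threshold are our own illustrative choices, not anything defined by the kernel.

```python
from pathlib import Path

# Assumed procfs locations on a typical Linux host.
MAX_SIZE_PATH = "/proc/sys/net/ipv6/route/max_size"
ROUTE_TABLE_PATH = "/proc/net/ipv6_route"


def read_max_size(path=MAX_SIZE_PATH):
    """Return the current value of net.ipv6.route.max_size."""
    return int(Path(path).read_text().strip())


def count_ipv6_routes(path=ROUTE_TABLE_PATH):
    """Count the non-empty lines in the kernel's IPv6 route listing."""
    with open(path) as f:
        return sum(1 for line in f if line.strip())


def near_limit(route_count, max_size, threshold=0.9):
    """Return True when the route count reaches threshold * max_size.

    The 0.9 default is an arbitrary illustrative value.
    """
    return route_count >= threshold * max_size
```

On a router, calling near_limit(count_ipv6_routes(), read_max_size()) from a periodic job gives a simple early warning. If it returns True, the limit is probably worth raising with sysctl, with the new value chosen to fit the number of routes the router actually carries.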

