The TCP/IP stack written in Python
PyTCP is a TCP/IP stack written in pure Python. It runs in user space, attached to a Linux TAP/TUN interface, and implements the protocol layers itself rather than calling the host stack.
The stack covers Ethernet II and IEEE 802.3 framing, ARP, IPv4 and IPv6 (extension headers and fragmentation), ICMPv4 and ICMPv6, IPv6 Neighbor Discovery and SLAAC, a DHCPv4 client, UDP, and RFC 9293 TCP. The TCP implementation includes the full finite state machine, congestion control (CUBIC, NewReno, PRR, HyStart++), SACK and RACK-TLP loss recovery, and RFC 5961 hardening. It exchanges traffic with other hosts on the local segment and over the Internet.
The project's goal is a pure-Python stack that is feature-equivalent to the Linux kernel network stack. RFC text is the primary authority; where a spec is silent or offers a choice, PyTCP follows Linux. Host-stack parity is the current scope; router-grade forwarding is planned.
Behaviour is covered by roughly 11,000 unit and integration tests and tracked against more than 100 per-RFC adherence audits kept in the repository under docs/rfc/.
The stack has zero runtime dependencies (standard library only), is organised as three packages (net_addr, net_proto, pytcp), and exposes a Berkeley-sockets-style API so it can be used in place of the standard socket layer.
Contributions are welcome.
- Zero-copy packet parser and assembler (buffer-protocol / memoryview based).
- net_addr value-type libraries for MAC / IPv4 / IPv6 addresses, networks, hosts and masks; no third-party dependencies.
- Importable as a zero-runtime-dependency library (stdlib only), split into three independent packages: net_addr, net_proto, pytcp.
- Event-driven millisecond-resolution timer (heap-based deadline scheduler, no polling tick).
- Runtime-tunable sysctl registry mirroring the Linux /proc/sys/net/ surface (boot-time and live overrides).
- Link control API (ip-link-style): per-interface MAC / MTU / state / counters.
- Per-protocol packet-flow stat counters; TX-path feedback so send failures reach sockets.
- Homegrown high-performance logger (no third-party logging dependency).
- Berkeley-sockets-style API for TCP / UDP / RAW:
fileno()/eventfd + selectors integration, blocking & non-blocking modes, errno-mapped OSError, getaddrinfo family, common setsockopt options, IP_RECVERR/MSG_ERRQUEUE error queue.
- Native unittest suite (~11,000 unit + integration tests); per-RFC adherence audits in docs/rfc/.
- Ethernet II framing with EtherType demux, broadcast and multicast mapping (RFC 894)
- Inbound IEEE 802.3 / LLC + SNAP support (RFC 1042)
- ARP resolution with a neighbor cache, replies and queries (RFC 826, RFC 1122)
- IPv4 Address Conflict Detection — probe, announce, defend (RFC 5227)
- IANA-correct ARP codepoint handling (RFC 5494)
- IPv4 with options parsing, inbound reassembly and outbound fragmentation (RFC 791, RFC 815)
- Multiple host addresses; private, special-purpose and broadcast address handling (RFC 1918, RFC 6890, RFC 919, RFC 922)
- ECN, DSCP and Router Alert support (RFC 3168, RFC 2474, RFC 6398)
- IPv4 link-local autoconfiguration (RFC 3927)
- Host-side IP multicasting (RFC 1112)
- Echo, Destination Unreachable, Time Exceeded and Parameter Problem, with RFC-correct generation gating and rate-limiting (RFC 792, RFC 1122)
- Obsolete message types correctly omitted (RFC 6633, RFC 6918)
- IPv6 with the full extension-header chain and TLV options (RFC 8200)
- Inbound reassembly and outbound fragmentation, with fragmentation hardening (RFC 5722, RFC 6946, RFC 7739)
- Unique-local and special-purpose addressing (RFC 4193, RFC 8190)
- Flow-label generation (RFC 6437)
- Default source-address selection (RFC 6724); Path MTU Discovery (RFC 8201); node requirements (RFC 8504)
- Full ICMPv6 message set including Packet Too Big (RFC 4443)
- Stateless Address Autoconfiguration: link-local, DAD, RA prefixes and lifetimes (RFC 4862)
- Stable opaque and temporary (privacy) addresses (RFC 7217, RFC 8981)
- Optimistic DAD, Enhanced DAD and Gratuitous NA (RFC 4429, RFC 7527, RFC 9131)
- Neighbor Discovery with a NUD cache and Router Solicitation backoff (RFC 4861, RFC 7559)
- MLDv2 listener (RFC 3810)
- UDP with full host-requirements conformance (RFC 768, RFC 1122)
- Zero-checksum UDP over IPv6 (RFC 6935)
- Ephemeral-port randomisation (RFC 6056)
- Echo / Discard / Daytime example services
- Complete TCP: full finite state machine and reliable bulk transfer (RFC 9293, RFC 1122)
- Modern congestion control — CUBIC, NewReno, PRR, HyStart++, ABE, IW10 (RFC 9438, RFC 6582, RFC 6937, RFC 9406, RFC 8511, RFC 6928)
- Advanced loss recovery — SACK, D-SACK, RACK-TLP, F-RTO, limited transmit (RFC 2018, RFC 2883, RFC 8985, RFC 5682, RFC 3042)
- RFC-correct RTO with Karn's algorithm and backoff (RFC 6298, RFC 8961)
- Window Scale, Timestamps, PAWS, MSS and TCP Fast Open (RFC 7323, RFC 6691, RFC 7413)
- ECN and Accurate ECN (RFC 3168, RFC 9768)
- Blind-attack and ICMP-attack hardening, randomised ISS and ports, robust TIME-WAIT (RFC 5961, RFC 5927, RFC 6528, RFC 1337, RFC 6191)
- Keep-alive, zero-window probing, silly-window-syndrome avoidance, Nagle
- Full DHCPv4 client: lease acquisition, RENEW / REBIND / DECLINE (RFC 2131, RFC 1542)
- Detecting Network Attachment and client-ID handling (RFC 4436, RFC 6842, RFC 4361)
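The RTO items above (RFC 6298 with Karn's algorithm and backoff) reduce to a few lines of arithmetic. A minimal illustrative sketch of that RFC's update rules, not PyTCP's actual code:

```python
class RtoEstimator:
    """Minimal RFC 6298 retransmission-timeout estimator (illustrative only)."""

    def __init__(self):
        self.srtt = None   # smoothed RTT, set on the first valid sample
        self.rttvar = 0.0  # RTT variance
        self.rto = 1.0     # initial RTO in seconds (RFC 6298 sec. 2.1)

    def sample(self, rtt, retransmitted=False):
        """Feed one RTT measurement; Karn's algorithm skips retransmits."""
        if retransmitted:  # ambiguous sample: ignore it (Karn's algorithm)
            return
        if self.srtt is None:  # first measurement (RFC 6298 sec. 2.2)
            self.srtt, self.rttvar = rtt, rtt / 2
        else:  # exponential smoothing (RFC 6298 sec. 2.3)
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt
        self.rto = max(1.0, self.srtt + 4 * self.rttvar)  # 1-second floor

    def on_timeout(self):
        """Double the RTO on each expiry, capped at 60 s (RFC 6298 sec. 5.5)."""
        self.rto = min(self.rto * 2, 60.0)
```

On a LAN the smoothed estimate stays far below the floor, so the RTO sits at 1 second until backoff doubles it.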
The PyTCP stack depends on a Linux TAP/TUN interface. The TAP interface is a virtual interface that, on the network end, can be 'plugged' into existing virtual network infrastructure via either Linux bridge or Open vSwitch. On the internal end, the TAP interface can be used like any other NIC by programmatically sending and receiving packets to/from it.
If you wish to test the PyTCP stack on your local network, create the following network setup, which connects both the Linux kernel stack (essentially your Linux OS) and the PyTCP stack to the local network at the same time.
<INTERNET> <---> [ROUTER] <---> (eth0)-[Linux bridge]-(br0) <---> [Linux TCP/IP stack]
                                                       |
                                                       |--(tap7) <---> [PyTCP TCP/IP stack]

After the example program (either client or service) starts the stack, it can communicate with it
via a simplified BSD-sockets-like API. Packets can also be sent
directly by calling one of the internal _phtx_*() methods on the PacketHandler.
In most cases, PyTCP should be cloned directly from the GitHub repository, as this provides the full development and testing environment.
git clone https://github.com/ccie18643/PyTCP

After cloning, we can run one of the included examples:
- Go to the stack root directory (it is called 'PyTCP').
- Run sudo make bridge to create the 'br0' bridge if needed.
- Run sudo make tap7 to create the tap7 interface and assign it to the 'br0' bridge.
- Run make venv to create the virtual environment for development and testing.
- Run . venv/bin/activate to activate the virtual environment.
- Execute any example, e.g., python -m examples.stack (see the examples/ directory; pass --help for options).
- Hit Ctrl-C to stop it.
Stack parameters are configured per run via the stack.init(...) keyword arguments and the runtime sysctl registry (see pytcp/stack/), not a static config file.
PyTCP can also be installed as a regular module from the PyPi repository.
python -m pip install PyTCP

After installation, please ensure the TAP interface is operational and added to the bridge:
sudo ip tuntap add name tap7 mode tap
sudo ip link set dev tap7 up
sudo ip link add name br0 type bridge
sudo ip link set dev br0 up
sudo ip link set dev tap7 master br0

PyTCP is consumed as a library through the pytcp.stack lifecycle API
(stack.init(...) → stack.start() → stack.stop()) and the
pytcp.socket Berkeley-sockets-style API. The subsystems run in their own
threads; after start() control returns to your code.
For a complete, runnable reference — opening the TAP/TUN file descriptor,
calling stack.init(...), and driving the stack — see
examples/stack.py and the other programs in the
examples/ directory.
All output below is captured from a live stack on a Linux tap7
interface bridged to a LAN — PyTCP's own log plus a tshark wire
capture. RFC back-off delays (RFC 5227 ACD, RFC 4862 DAD) are
visible in the timestamps.
Every wire block uses the same columns:
time(s) PROTO src → dst summary
src → dst is the IPv4/IPv6 source → destination; for ARP it is
the ARP-payload sender → target. — marks IPv6 ND/MLD frames
whose link-local/multicast endpoints are named in the summary
instead (the boot capture did not record them as columns).
Every example is produced by the bundled tools/capture runner
and is reproducible. With the TAP/bridge up and the venv built —
sudo make tap7 && sudo make bridge && make venv

— run any example with the exact command listed under it (loss is
random, so a --loss run differs every time; everything else is
deterministic). The general form is
sudo PYTHONPATH=. venv/bin/python -m tools.capture [GLOBAL OPTS] <scenario>;
python -m tools.capture --help lists every scenario and option.
Reproduce:
sudo PYTHONPATH=. venv/bin/python -m tools.capture boot

On start the stack autoconfigures itself: it derives an IPv6 link-local address and runs Duplicate Address Detection, reports its multicast groups via MLDv2, solicits routers, builds a global address from the Router Advertisement and DADs that too, then runs RFC 5227 conflict detection for its IPv4 address.
Stack log:
0000.05 | STACK | ICMPv6 ND DAD - Starting process for fe80::7bde:94e9:3254:9daf
0001.28 | STACK | ICMPv6 ND DAD - No duplicate address detected for fe80::7bde:94e9:3254:9daf
0001.28 | STACK | Successfully claimed IPv6 address fe80::7bde:94e9:3254:9daf/64
0001.28 | STACK | Sent out ICMPv6 ND Router Solicitation
0001.28 | STACK | ICMPv6 ND DAD - Starting process for 2603:808c:2800:4301:7d08:ba99:95db:c5
0002.78 | STACK | Successfully claimed IPv6 address 2603:808c:2800:4301:7d08:ba99:95db:c5/64
0006.21 | STACK | Sent out ARP Announcement for 192.168.1.77
0008.21 | STACK | Successfully claimed IPv4 address 192.168.1.77
Wire capture (tshark -i tap7, rebased to the first frame; IPv6
ND/MLD endpoints are now real columns, not —):
0.000 ARP 0.0.0.0 → 192.168.1.77 Who has 192.168.1.77? (ARP Probe)
0.177 ICMPv6 :: → ff02::1:ff54:9daf Neighbor Solicitation for fe80::7bde:94e9:3254:9daf (link-local DAD)
1.178 ICMPv6 fe80::7bde:94e9:3254:9daf → ff02::16 Multicast Listener Report Message v2
1.179 ICMPv6 fe80::7bde:94e9:3254:9daf → ff02::2 Router Solicitation from 02:00:00:77:77:77
1.679 ICMPv6 :: → ff02::1:ffdb:c5 Neighbor Solicitation for 2603:808c:2800:4301:7d08:ba99:95db:c5 (SLAAC GUA DAD)
6.107 ARP 192.168.1.77 → 192.168.1.77 ARP Announcement for 192.168.1.77
6.180 ICMPv6 fe80::7bde:94e9:3254:9daf → fe80::2e0:67ff:fe26:88cb Neighbor Advertisement fe80::7bde:94e9:3254:9daf (sol) is at 02:00:00:77:77:77
8.108 ARP 192.168.1.77 → 192.168.1.77 ARP Announcement for 192.168.1.77
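The ff02::1:ffxx:xxxx destinations in the DAD frames are solicited-node multicast addresses: ff02::1:ff00:0/104 with the low 24 bits of the unicast address appended (RFC 4291 sec. 2.7.1). A stdlib sketch reproduces the addresses seen in this capture:

```python
import ipaddress

def solicited_node(addr):
    """Map a unicast IPv6 address to its solicited-node multicast group
    (RFC 4291 sec. 2.7.1): ff02::1:ff00:0/104 + the address's low 24 bits."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    return ipaddress.IPv6Address(int(ipaddress.IPv6Address("ff02::1:ff00:0")) | low24)

print(solicited_node("fe80::7bde:94e9:3254:9daf"))              # ff02::1:ff54:9daf
print(solicited_node("2603:808c:2800:4301:7d08:ba99:95db:c5"))  # ff02::1:ffdb:c5
```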
Reproduce:
sudo PYTHONPATH=. venv/bin/python -m tools.capture arp-acd

The stack defends each configured IPv4 address: it sends three ARP
Probes (sender 0.0.0.0), and if no host objects, claims the
address with two ARP Announcements (sender = target).
Wire capture (tshark -i tap7 -f arp):
0.00 ARP 0.0.0.0 → 192.168.1.77 ARP Probe — Who has 192.168.1.77?
1.83 ARP 0.0.0.0 → 192.168.1.77 ARP Probe — Who has 192.168.1.77?
3.38 ARP 0.0.0.0 → 192.168.1.77 ARP Probe — Who has 192.168.1.77?
6.44 ARP 192.168.1.77 → 192.168.1.77 ARP Announcement for 192.168.1.77
8.45 ARP 192.168.1.77 → 192.168.1.77 ARP Announcement for 192.168.1.77
Probe vs. Announcement, decoded (tshark -V):
ARP Probe:        Opcode: request   Sender IP: 0.0.0.0       Target IP: 192.168.1.77
ARP Announcement: Opcode: request   Sender IP: 192.168.1.77  Target IP: 192.168.1.77
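The whole Probe/Announcement distinction lives in the sender-IP field of the ARP payload. A rough struct-based sketch of the RFC 826 wire format (illustrative only; PyTCP's own assembler is zero-copy):

```python
import struct

def arp_request(sender_mac, sender_ip, target_ip):
    """Build a minimal 28-byte ARP request payload (RFC 826 wire format)."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # HTYPE: Ethernet
        0x0800,       # PTYPE: IPv4
        6, 4,         # HLEN, PLEN
        1,            # OPER: request
        sender_mac, sender_ip,
        b"\x00" * 6,  # target MAC: unknown
        target_ip,
    )

mac = bytes.fromhex("020000777777")
ip = bytes([192, 168, 1, 77])
probe = arp_request(mac, bytes(4), ip)  # sender IP 0.0.0.0 -> Probe
announce = arp_request(mac, ip, ip)     # sender IP == target IP -> Announcement
```

A Probe uses sender IP 0.0.0.0 precisely so it cannot pollute neighbours' ARP caches before the address is confirmed; the Announcement then sets sender = target to update them deliberately (RFC 5227).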
Reproduce:
sudo PYTHONPATH=. venv/bin/python -m tools.capture ip4-icmp-echo

A host on the segment pings the stack. Having learned the stack's MAC from its ARP Announcement, the host sends the Echo Request directly; the stack then resolves the host's MAC via ARP before replying:
Wire capture (tshark -i tap7, rebased to the first Echo Request):
0.000 ICMP 192.168.1.10 → 192.168.1.77 Echo (ping) request id=0x626e, seq=1, ttl=64
0.001 ARP 192.168.1.77 → 192.168.1.10 Who has 192.168.1.10? Tell 192.168.1.77
0.001 ARP 192.168.1.10 → 192.168.1.77 192.168.1.10 is at a2:4b:a1:00:92:56
0.001 ICMP 192.168.1.77 → 192.168.1.10 Echo (ping) reply id=0x626e, seq=1, ttl=64
1.001 ICMP 192.168.1.10 → 192.168.1.77 Echo (ping) request id=0x626e, seq=2, ttl=64
1.002 ICMP 192.168.1.77 → 192.168.1.10 Echo (ping) reply id=0x626e, seq=2, ttl=64
2.032 ICMP 192.168.1.10 → 192.168.1.77 Echo (ping) request id=0x626e, seq=3, ttl=64
2.033 ICMP 192.168.1.77 → 192.168.1.10 Echo (ping) reply id=0x626e, seq=3, ttl=64
From the pinging host:
3 packets transmitted, 3 received, 0% packet loss; rtt min/avg/max/mdev = 0.693/0.873/1.185/0.221 ms.
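Each Echo Request the stack receives must checksum-verify, and each Reply it sends must carry a fresh RFC 1071 Internet checksum. A short illustrative version of that arithmetic (PyTCP's real parser/assembler works on memoryviews):

```python
import struct

def inet_checksum(data):
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit big-endian words."""
    if len(data) % 2:  # pad odd-length input with a zero byte
        data += b"\x00"
    total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# ICMP Echo Request: type 8, code 0, zeroed checksum, id (as in the capture), seq.
header = struct.pack("!BBHHH", 8, 0, 0, 0x626E, 1) + b"ping-payload"
packet = header[:2] + struct.pack("!H", inet_checksum(header)) + header[4:]
assert inet_checksum(packet) == 0  # a correctly checksummed packet sums to zero
```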
Reproduce:
sudo PYTHONPATH=. venv/bin/python -m tools.capture ip6-icmp-echo

The IPv6 counterpart: a host on a ULA pings the stack's IPv6 address. The host sends the Echo Request directly; the stack resolves the host with ICMPv6 Neighbor Discovery (Neighbor Solicitation → Neighbor Advertisement) before replying:
Wire capture (tshark -i tap7, rebased to the first Echo
Request; unrelated LAN router/host traffic filtered out):
0.000 ICMPv6 fd00:1::1 → fd00:1::77 Echo (ping) request id=0x626f, seq=1, hlim=64
0.001 ICMPv6 fd00:1::77 → ff02::1:ff00:1 Neighbor Solicitation for fd00:1::1 (from 02:00:00:77:77:77)
0.001 ICMPv6 fd00:1::1 → fd00:1::77 Neighbor Advertisement — fd00:1::1 is at a2:4b:a1:00:92:56
0.001 ICMPv6 fd00:1::77 → fd00:1::1 Echo (ping) reply id=0x626f, seq=1, hlim=255
1.001 ICMPv6 fd00:1::1 → fd00:1::77 Echo (ping) request id=0x626f, seq=2, hlim=64
1.002 ICMPv6 fd00:1::77 → fd00:1::1 Echo (ping) reply id=0x626f, seq=2, hlim=255
2.044 ICMPv6 fd00:1::1 → fd00:1::77 Echo (ping) request id=0x626f, seq=3, hlim=64
2.045 ICMPv6 fd00:1::77 → fd00:1::1 Echo (ping) reply id=0x626f, seq=3, hlim=255
(tshark's heuristic dissector tags the Echo payload as
"HiPerConTracer" — a harmless false positive; the frames are plain
ICMPv6 Echo.)
From the pinging host:
3 packets transmitted, 3 received, 0% packet loss; rtt min/avg/max/mdev = 0.680/0.882/1.276/0.278 ms.
Reproduce:
sudo PYTHONPATH=. venv/bin/python -m tools.capture ip4-tcp-monkeys

PyTCP ships a matching TCP echo client and service
(examples/client__tcp_echo.py / examples/service__tcp_echo.py).
As a quick end-to-end check the client streams two ASCII-art
"monkeys" as the payload and the service echoes them back over the
TCP connection — the original "two monkeys delivered via TCP" demo,
now reproducible as plain text. Connecting to the service returns
its banner, then the monkeys make the full round trip through the
stack's TCP path intact; sending quit asks the service to close,
and PyTCP performs the graceful active close itself:
$ { printf 'malpi\n'; sleep 3; printf 'quit\n'; } | nc 192.168.1.77 7
***CLIENT OPEN / SERVICE OPEN***
______AAAA_______________AAAA______
VVVV VVVV
(__) (__)
\ \ / /
.="=. \ \ / /
_/.-.-.\_ _ > \ .="=. / <
( ( o o ) ) )) > \ / \ / <
|/ " \| // > \\_o_o_// <
\'---'/ // > ( (_) ) <
/`---`\ (( >| |<
/ /_,_\ \ \\ / |\___/| \
\_\_'__/ \ )) / \_____/ \
/` /`~\ |// / \
/ / \ / / o \
,--`,--'\/\ / ) ___ (
'-- "--' '--' / / \ \
( / \ )
>< ><
///\ /\\\
''' '''
***CLIENT OPEN, SERVICE CLOSING***
On the wire (tshark -i tap7, rebased to the SYN) — the full
RFC 9293 exchange, handshake through graceful close:
0.000 TCP 192.168.1.10 → 192.168.1.77 [SYN] Seq=0 MSS=1460 SACK_PERM WS=1024 TSopt
0.002 ARP 192.168.1.77 → 192.168.1.10 Who has 192.168.1.10? Tell 192.168.1.77
0.002 ARP 192.168.1.10 → 192.168.1.77 192.168.1.10 is at a2:4b:a1:00:92:56
0.002 TCP 192.168.1.77 → 192.168.1.10 [SYN,ACK] Seq=0 Ack=1 MSS=1460 SACK_PERM WS=128 TSopt
0.002 TCP 192.168.1.10 → 192.168.1.77 [ACK] Seq=1 Ack=1
0.002 TCP 192.168.1.10 → 192.168.1.77 [PSH,ACK] len 6 "malpi\n" (request)
0.005 TCP 192.168.1.77 → 192.168.1.10 [ACK] len 1448 banner + monkeys, segment 1 (full MSS)
0.005 TCP 192.168.1.10 → 192.168.1.77 [ACK] Ack=1449
0.007 TCP 192.168.1.77 → 192.168.1.10 [PSH,ACK] len 146 monkeys, segment 2
0.007 TCP 192.168.1.10 → 192.168.1.77 [ACK] Ack=1595
2.999 TCP 192.168.1.10 → 192.168.1.77 [PSH,ACK] len 5 "quit\n" (request)
3.000 TCP 192.168.1.77 → 192.168.1.10 [PSH,ACK] len 35 "SERVICE CLOSING" banner
3.000 TCP 192.168.1.10 → 192.168.1.77 [ACK] Ack=1630
3.003 TCP 192.168.1.77 → 192.168.1.10 [FIN,ACK] PyTCP active close
3.044 TCP 192.168.1.10 → 192.168.1.77 [ACK] Ack=1631 peer acks the FIN
6.000 TCP 192.168.1.10 → 192.168.1.77 [FIN,ACK] peer closes its half
6.001 TCP 192.168.1.77 → 192.168.1.10 [ACK] Ack=13 connection fully closed (no RST)
The stack negotiates MSS / SACK-permitted / window-scale /
timestamps on the handshake, resolves the peer's MAC via ARP
mid-handshake, segments the echoed monkeys to the MSS, tracks the
peer's cumulative ACKs, and on quit performs the RFC 9293 §3.6
active close — FIN, peer ACK, peer FIN, FIN ACK — a complete TCP
connection opened, used, and gracefully torn down entirely by
pure-Python code.
Reproduce:
sudo PYTHONPATH=. venv/bin/python -m tools.capture ip6-tcp-monkeys

The same demo, unchanged, over IPv6 (the service bound to a ULA; the host resolves it with ICMPv6 Neighbor Discovery instead of ARP). The IPv6 MSS is 1440 (vs 1460 on IPv4 — the 20-byte-larger fixed header). Same handshake, echo, and RFC 9293 §3.6 graceful close:
0.000 ICMPv6 fd00:1::1 → ff02::1:ff00:77 Neighbor Solicitation for fd00:1::77 (from a2:4b:a1:00:92:56)
0.001 ICMPv6 fd00:1::77 → fd00:1::1 Neighbor Advertisement — fd00:1::77 is at 02:00:00:77:77:77
0.001 TCP fd00:1::1 → fd00:1::77 [SYN] Seq=0 MSS=1440 SACK_PERM WS=1024 TSopt
0.003 TCP fd00:1::77 → fd00:1::1 [SYN,ACK] Seq=0 Ack=1 MSS=1440 SACK_PERM WS=128 TSopt
0.003 TCP fd00:1::1 → fd00:1::77 [ACK] Seq=1 Ack=1
0.003 TCP fd00:1::1 → fd00:1::77 [PSH,ACK] len 6 "malpi\n" (request)
0.006 TCP fd00:1::77 → fd00:1::1 [ACK] len 1428 banner + monkeys, segment 1 (full MSS)
0.006 TCP fd00:1::1 → fd00:1::77 [ACK] Ack=1429
0.008 TCP fd00:1::77 → fd00:1::1 [PSH,ACK] len 166 monkeys, segment 2
0.008 TCP fd00:1::1 → fd00:1::77 [ACK] Ack=1595
2.999 TCP fd00:1::1 → fd00:1::77 [PSH,ACK] len 5 "quit\n" (request)
3.001 TCP fd00:1::77 → fd00:1::1 [PSH,ACK] len 35 "SERVICE CLOSING" banner
3.001 TCP fd00:1::1 → fd00:1::77 [ACK] Ack=1630
3.005 TCP fd00:1::77 → fd00:1::1 [FIN,ACK] PyTCP active close
3.045 TCP fd00:1::1 → fd00:1::77 [ACK] Ack=1631 peer acks the FIN
6.001 TCP fd00:1::1 → fd00:1::77 [FIN,ACK] peer closes its half
6.002 TCP fd00:1::77 → fd00:1::1 [ACK] Ack=13 connection fully closed (no RST)
(tshark labels the port-7 data segments "ECHO" — a heuristic;
they are plain TCP.)
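The 1460-vs-1440 MSS difference, and the segment-1 lengths in both captures, are plain header arithmetic:

```python
MTU = 1500    # Ethernet payload size
TCP_HDR = 20  # base TCP header
TS_OPT = 12   # Timestamps option (10 B) + 2 B padding, carried on every segment

mss_v4 = MTU - 20 - TCP_HDR  # 20-byte IPv4 header -> 1460
mss_v6 = MTU - 40 - TCP_HDR  # 40-byte IPv6 fixed header -> 1440
print(mss_v4, mss_v6)                    # 1460 1440
print(mss_v4 - TS_OPT, mss_v6 - TS_OPT)  # 1448 1428, the captured segment-1 lengths
```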
Reproduce:
sudo PYTHONPATH=. venv/bin/python -m tools.capture ip4-udp-monkeys

The same ASCII monkeys, echoed over the UDP service. The reply (~1.5 KB) exceeds the 1500-byte link MTU, so the stack IPv4-fragments it — the classic "IP fragmentation" demo, captured for real.
$ printf 'malpi\n' | nc -u 192.168.1.77 7
______AAAA_______________AAAA______
VVVV VVVV
(__) (__)
\ \ / /
.="=. \ \ / /
_/.-.-.\_ _ > \ .="=. / <
( ( o o ) ) )) > \ / \ / <
|/ " \| // > \\_o_o_// <
\'---'/ // > ( (_) ) <
/`---`\ (( >| |<
/ /_,_\ \ \\ / |\___/| \
\_\_'__/ \ )) / \_____/ \
/` /`~\ |// / \
/ / \ / / o \
,--`,--'\/\ / ) ___ (
'-- "--' '--' / / \ \
( / \ )
>< ><
///\ /\\\
''' '''
On the wire (tshark -i tap7, rebased to the request; the
summary carries the IPv4 fragmentation fields — IP-id, MF,
frag-offset):
0.000 UDP 192.168.1.10 → 192.168.1.77 id=0x9655 MF=0 off=0 UDP "malpi\n" request (14 B)
0.001 ARP 192.168.1.77 → 192.168.1.10 Who has 192.168.1.10? Tell 192.168.1.77
0.001 ARP 192.168.1.10 → 192.168.1.77 192.168.1.10 is at a2:4b:a1:00:92:56
0.001 UDP 192.168.1.77 → 192.168.1.10 id=0x0001 MF=1 off=0 fragment 1 — UDP header + first 1480 B
0.002 UDP 192.168.1.77 → 192.168.1.10 id=0x0001 MF=0 off=185 fragment 2 — final 89 B (offset 185×8 = 1480)
The oversized UDP datagram is split into two IPv4 fragments sharing
one IP id; the peer's kernel reassembles them and nc -u prints
the monkeys. The first datagram is held in the per-neighbour queue
until the ARP reply resolves the peer's MAC (RFC 1122 §2.3.2.2),
then both fragments are flushed in order — a fragmented datagram
delivered to a cold neighbour, lost by neither the DF bit nor a
single-slot queue.
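The fragment geometry follows directly from RFC 791: offsets are counted in 8-byte units, and every non-final fragment must carry a multiple of 8 payload bytes. A sketch of the split (illustrative, not PyTCP's fragmenter), fed the 1569-byte datagram from this capture:

```python
def ip4_fragments(payload_len, mtu=1500, ihl=20):
    """Split an IPv4 payload into (offset_in_8_byte_units, length, more_fragments)
    tuples; non-final fragments carry a multiple of 8 bytes (RFC 791)."""
    per_frag = (mtu - ihl) // 8 * 8  # 1480 bytes per fragment at a 1500-byte MTU
    frags, off = [], 0
    while off < payload_len:
        length = min(per_frag, payload_len - off)
        frags.append((off // 8, length, off + length < payload_len))
        off += length
    return frags

print(ip4_fragments(1569))  # [(0, 1480, True), (185, 89, False)]
```

The same arithmetic reproduces the three-fragment 4008-byte ping in the ip4-icmp-frag-rx capture: offsets 0, 185 and 370.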
Reproduce:
sudo PYTHONPATH=. venv/bin/python -m tools.capture ip6-udp-monkeys

The same oversized echo over IPv6. IPv6 fragments differently from IPv4: the base header is never modified — the source inserts a Fragment extension header (RFC 8200 §4.5), and only the source may fragment. The stack resolves the peer via ICMPv6 Neighbor Discovery (NS → NA), then emits the ~1.5 KB reply as two IPv6 fragments sharing one identification:
0.000 UDP fd00:1::1 → fd00:1::77 "malpi\n" request
0.001 ICMPv6 fd00:1::77 → ff02::1:ff00:1 Neighbor Solicitation for fd00:1::1 (from 02:00:00:77:77:77)
0.001 ICMPv6 fd00:1::1 → fd00:1::77 Neighbor Advertisement — fd00:1::1 is at a2:4b:a1:00:92:56
0.002 IPv6 fd00:1::77 → fd00:1::1 Fragment header: off=0 more=1 ident=0xc6713a45 next=UDP (fragment 1)
0.002 UDP fd00:1::77 → fd00:1::1 final fragment — reassembles to the 1569-byte datagram
(tshark labels the port-7 datagrams "ECHO" — a heuristic; they
are plain UDP. The 1561-byte reply + 8-byte UDP header = 1569 B,
over the 1500-byte link MTU, so the stack splits it across the two
fragments above.)
Reproduce:
sudo PYTHONPATH=. venv/bin/python -m tools.capture ip4-icmp-frag-rx --count 1

The receive-side counterpart of the fragmentation demos. The host
sends a 4000-byte ping, which its kernel splits into three IPv4
fragments. The stack reassembles them into one Echo Request,
then replies with a 4000-byte Echo Reply that it itself
fragments into three:
0.000 IPv4 192.168.1.10 → 192.168.1.77 id=0xf29f MF=1 off=0 Echo Request — fragment 1/3
0.000 IPv4 192.168.1.10 → 192.168.1.77 id=0xf29f MF=1 off=185 fragment 2/3 (off 185×8 = 1480 B)
0.000 IPv4 192.168.1.10 → 192.168.1.77 id=0xf29f MF=0 off=370 fragment 3/3 → reassembles to Echo Request id=0x6271, seq=1
0.001 ARP 192.168.1.77 → 192.168.1.10 Who has 192.168.1.10? Tell 192.168.1.77
0.001 ARP 192.168.1.10 → 192.168.1.77 192.168.1.10 is at a2:4b:a1:00:92:56
0.002 IPv4 192.168.1.77 → 192.168.1.10 id=0x0001 MF=1 off=0 Echo Reply — fragment 1/3
0.002 IPv4 192.168.1.77 → 192.168.1.10 id=0x0001 MF=1 off=185 fragment 2/3
0.002 IPv4 192.168.1.77 → 192.168.1.10 id=0x0001 MF=0 off=370 fragment 3/3 → Echo Reply id=0x6271, seq=1
From the pinging host:
1 packets transmitted, 1 received, 0% packet loss; rtt min/avg/max/mdev = 2.048/2.048/2.048/0.000 ms
(4008 bytes from 192.168.1.77 — the full 4000-byte payload made
the round trip, reassembled on both ends).
Reproduce (needs a DHCPv4 server reachable on the bridge):
sudo PYTHONPATH=. venv/bin/python -m tools.capture ip4-dhcp

With no static IPv4 configured, the stack runs its DHCPv4 client: the full DORA exchange (Discover → Offer → Request → ACK), and then — because the address is unverified — RFC 5227 Address Conflict Detection on the DHCP-assigned address before it is used. A randomized RFC 2131 initial-desync delay (~6.8 s here) precedes the first Discover:
0.000 DHCP 0.0.0.0 → 255.255.255.255 DHCP Discover xid 0x3207aee
0.000 DHCP 192.168.1.1 → 255.255.255.255 DHCP Offer xid 0x3207aee (offers 192.168.1.145)
3.002 DHCP 0.0.0.0 → 255.255.255.255 DHCP Request xid 0x3207aee (requesting 192.168.1.145)
3.002 DHCP 192.168.1.1 → 255.255.255.255 DHCP ACK xid 0x3207aee (lease 3600 s)
3.810 ARP 0.0.0.0 → 192.168.1.145 ARP Probe — Who has 192.168.1.145? (RFC 5227 ACD on the leased address)
5.599 ARP 0.0.0.0 → 192.168.1.145 ARP Probe — Who has 192.168.1.145?
6.891 ARP 0.0.0.0 → 192.168.1.145 ARP Probe — Who has 192.168.1.145?
10.252 ARP 192.168.1.145 → 192.168.1.145 ARP Announcement for 192.168.1.145
12.252 ARP 192.168.1.145 → 192.168.1.145 ARP Announcement for 192.168.1.145
Stack log:
0015.05 | DHCP4 | Initial desync delay: 6.83s
0021.89 | DHCP4 | TX - DHCPv4 Request ... [message_type Discover ...]
0021.89 | DHCP4 | RX - DHCPv4 Reply ... yiaddr 192.168.1.145 ... [message_type Offer, server_id 192.168.1.1 ...]
0024.89 | DHCP4 | TX - DHCPv4 Request ... [message_type Request, server_id 192.168.1.1, req_ip_addr 192.168.1.145 ...]
0024.89 | DHCP4 | RX - DHCPv4 Reply ... [message_type ACK, lease_time 3600 ...]
0032.14 | DHCP4 | Lease acquired: 192.168.1.145/24 (lease_time=3600s, server=192.168.1.1)
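From the ACK the client also schedules its renewal timers. With RFC 2131's defaults (T1 = 0.5 × lease, T2 = 0.875 × lease, unless the server supplies options 58/59), the 3600 s lease above gives:

```python
lease = 3600                # seconds, from the DHCP ACK above
t1_renew = lease * 0.5      # T1: enter RENEWING, unicast Request to the server
t2_rebind = lease * 0.875   # T2: enter REBINDING, broadcast Request to any server
print(t1_renew, t2_rebind)  # 1800.0 3150.0
```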
Reproduce (asserts the connection still completes — exits non-zero if it does not):
sudo PYTHONPATH=. venv/bin/python -m tools.capture \
--loss 20 --expect-wire '\[FIN, ACK\]' ip4-tcp-monkeys
# … → [PASS] wire: /\[FIN, ACK\]/ , exit 0

Every example above runs on a clean bridge, so the loss-recovery
machinery never fires. Driven through a tc netem loss 20%
qdisc, the same TCP monkeys exchange has segments dropped in both
directions — and the stack recovers: it retransmits its own
segments on RTO, the peer SACKs the holes, and the connection
still completes and closes cleanly (no RST). One representative
run (loss is random — every run drops different packets; the
invariant is that it completes), rebased to the SYN:
0.000 TCP 192.168.1.10 → 192.168.1.77 [SYN] Seq=0 MSS=1460 SACK_PERM WS=1024
0.003 TCP 192.168.1.77 → 192.168.1.10 [SYN,ACK] Seq=0 Ack=1
0.003 TCP 192.168.1.10 → 192.168.1.77 [ACK]
0.006 TCP 192.168.1.77 → 192.168.1.10 [PSH,ACK] open banner, Len 33
0.035 TCP 192.168.1.77 → 192.168.1.10 [PSH,ACK] [TCP Retransmission] Seq=1 Len 33 (banner drop → RTO resend)
0.035 TCP 192.168.1.10 → 192.168.1.77 [ACK] [Previous segment not captured] SACK SLE=1 SRE=34
0.036 TCP 192.168.1.77 → 192.168.1.10 [ACK] [TCP Dup ACK]
0.208 TCP 192.168.1.10 → 192.168.1.77 [PSH,ACK] [TCP Retransmission] "malpi\n" request resent
0.211 TCP 192.168.1.77 → 192.168.1.10 [PSH,ACK] monkeys, segment 1
0.279 TCP 192.168.1.77 → 192.168.1.10 [PSH,ACK] [TCP Retransmission] Seq=1482 Len 113 monkeys seg 2 resent
0.279 TCP 192.168.1.10 → 192.168.1.77 [ACK] SACK SLE=1482 SRE=1595
3.210 TCP 192.168.1.10 → 192.168.1.77 [PSH,ACK] "quit\n" request
4.001 TCP 192.168.1.77 → 192.168.1.10 [PSH,ACK] [TCP Retransmission] "SERVICE CLOSING" banner resent
4.004 TCP 192.168.1.77 → 192.168.1.10 [FIN,ACK] PyTCP active close
4.044 TCP 192.168.1.10 → 192.168.1.77 [ACK] peer acks the FIN
6.000 TCP 192.168.1.10 → 192.168.1.77 [FIN,ACK] peer closes its half
6.001 TCP 192.168.1.77 → 192.168.1.10 [ACK] connection fully closed (no RST)
--loss (and --delay-ms / --reorder / --duplicate /
--corrupt) plus the --expect-log / --expect-wire /
--expect-client assertions are global options that go before
any scenario, so any capture can be turned into a loss /
latency e2e check.