In the previous exercise we implemented ECMP, a very basic (but widely used) technique to load balance traffic across multiple equal-cost paths. ECMP works very well when it has to load balance many small flows of similar size, since it
randomly maps each flow to one of the possible paths. However, real traffic does not look like that: it is composed of many
small flows and a few much bigger ones. This makes ECMP suffer from a well-known performance problem, hash collisions,
in which a few big flows end up on the same path. In this exercise we will use state and the information provided by the `simple_switch`'s
`standard_metadata` to fix the collision problem of ECMP by implementing flowlet switching on top of it.
Flowlet switching leverages the burstiness of TCP flows to achieve better load balancing. TCP flows tend to come in bursts (for instance, because a flow needs to wait to get window space). Every time there is a gap between packets of the same flow that is big enough (e.g., 50ms), flowlet switching will rehash the flow to another path (by hashing an ID value together with the 5-tuple).
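To make the notion of a flowlet concrete, here is a minimal Python sketch (not part of the exercise code) that groups the packet arrival times of one flow into flowlets. The function name `split_into_flowlets` and the sample arrival times are made up for illustration; the 50ms gap is the example value mentioned above.

```python
def split_into_flowlets(timestamps_ms, gap_ms):
    """Group sorted packet arrival times (in ms) of one flow into flowlets.

    A new flowlet starts whenever the idle time since the previous
    packet is larger than gap_ms.
    """
    flowlets = []
    for t in timestamps_ms:
        if flowlets and t - flowlets[-1][-1] <= gap_ms:
            flowlets[-1].append(t)   # still inside the current burst
        else:
            flowlets.append([t])     # gap was big enough: new flowlet
    return flowlets


arrivals = [0, 5, 12, 90, 95, 200]   # hypothetical arrival times in ms
print(split_into_flowlets(arrivals, gap_ms=50))
# [[0, 5, 12], [90, 95], [200]]
```

Each inner list is one burst. Flowlet switching is free to send each burst over a different path, because the idle gap before it makes packet reordering unlikely.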
For more information about flowlet switching check out this paper
This exercise is an enhancement of the ECMP exercise. However, table names, action names, and arguments are not exactly the same. Please modify them accordingly based on the information in the `sx-runtimes.json` files.
To solve this exercise you will have to use two registers, one for flowlet ids (to extend the previous hash fields) and one to keep the last timestamp for
every flow. You will have to slightly change the ingress logic, define new actions to read and write the flowlet registers, and modify
the hash function used in ECMP by adding a new field (the `flowlet_id`) which will vary over time.
You will have to fill the gaps in the `flowlet.p4` file. To successfully complete the exercise you have to do the following:
- Like in the previous exercises, header definitions are already provided.
- Define the parser that is able to parse packets up to `tcp`. Note that for simplicity we do not consider `udp` packets in this exercise.
- Copy the tables and actions from the ECMP exercise. However, table names, action names, and arguments are not exactly the same. Please modify them accordingly based on the information in the `sx-runtimes.json` files.
- Define two registers, `flowlet_to_id` and `flowlet_time_stamp` (for register sizing use the constants defined at the beginning of the `flowlet.p4` file: `REGISTER_SIZE`, `TIMESTAMP_WIDTH`, `ID_WIDTH`). We will use these two registers to keep two things:
  - In the `flowlet_to_id` register we keep the id (a randomly generated number) of each flowlet. This id is now added to the hash function that decides the output port. As long as this id does not change, packets for that flow will stay on the same path.
  - In the `flowlet_time_stamp` register we keep the timestamp of the last observed packet belonging to a flow.

  Note: for more information about registers look at the v1model.p4 architecture file or check the lecture slides.
- Define an action to read the flowlet's register values (`read_flowlet_registers`). In this task, you will need to hash the 5-tuple of each packet to determine the index for reading the flowlet registers. To store the index, define a new metadata field with a width of 14 bits. Using the index you got from the hash function, read the flowlet id and the last timestamp and save them in metadata fields (you also have to define them). Finally, update the timestamp register using `standard_metadata.ingress_global_timestamp`.
- Define another action to update the flowlet id (`update_flowlet_id`). We will use this action to update flowlet ids when needed. In this action you just have to generate a random number and then save it in the `flowlet_to_id` register (using the index you already computed previously).
- Modify the `hash` function you defined in the ECMP exercise (`ecmp_group`): now, instead of just hashing the 5-tuple, you have to add the metadata field where you store the `flowlet_id` you read from the register (or just updated).
- Define the ingress control logic (keep the logic from the ECMP example and add the following):
  - Before applying the `ipv4_lpm` table:
    - Read the flowlet registers (calling the action).
    - Compute the time difference between now and the last packet observed for the current flow.
    - Check if the time difference is bigger than `FLOWLET_TIMEOUT` (defined at the beginning of the file with a default value of 200ms).
    - Update the flowlet id if the difference is bigger. Updating the flowlet id will make the hash function output a new value.
  - Apply `ipv4_lpm` and `ecmp_group` in the same way you did in `ecmp`.
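The register logic described in the steps above can be modelled outside the switch to check your understanding before writing any P4. The following Python sketch is only an illustration and not the exercise solution. The class name `FlowletModel`, the helper `crc`, the values chosen for `REGISTER_SIZE` and `ID_WIDTH`, the use of CRC32 as the hash, and the assumption that timestamps are in microseconds are all choices made for this example and do not come from `flowlet.p4`.

```python
import random
import zlib

REGISTER_SIZE = 8192          # assumed value; flowlet.p4 defines its own constant
ID_WIDTH = 16                 # assumed width of a flowlet id in bits
FLOWLET_TIMEOUT_US = 200_000  # 200ms, assuming timestamps in microseconds


def crc(fields):
    """Hash a tuple of fields with CRC32 (a stand-in for the switch hash)."""
    return zlib.crc32(repr(fields).encode())


class FlowletModel:
    """Toy model of the two registers and the ingress logic."""

    def __init__(self, num_paths, rng=None):
        self.num_paths = num_paths
        self.rng = rng or random.Random()
        self.flowlet_to_id = [0] * REGISTER_SIZE
        self.flowlet_time_stamp = [0] * REGISTER_SIZE

    def process(self, five_tuple, now_us):
        """Return the path index chosen for a packet arriving at now_us."""
        index = crc(five_tuple) % REGISTER_SIZE
        # read_flowlet_registers: read both values, then refresh the timestamp
        flowlet_id = self.flowlet_to_id[index]
        last_seen = self.flowlet_time_stamp[index]
        self.flowlet_time_stamp[index] = now_us
        # update_flowlet_id: only when the gap exceeds the timeout
        if now_us - last_seen > FLOWLET_TIMEOUT_US:
            flowlet_id = self.rng.randrange(1 << ID_WIDTH)
            self.flowlet_to_id[index] = flowlet_id
        # ecmp_group: hash the 5-tuple together with the flowlet id
        return crc(five_tuple + (flowlet_id,)) % self.num_paths
```

Packets of the same flow that arrive within the timeout keep the same flowlet id and therefore the same path. After a longer pause a new random id is stored, so the path may change. It may also stay the same, since the new hash can map to the same path.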
At this point you should be able to send ping requests from h1 to h2. You can run tcpdump from h2 to see the result: (tcpdump -enn -l -i eth0)
It is now your task to modify your existing solution such that the ping responses from h2 get back to h1. Note that you do not need to change anything in the P4 code itself. The only thing you have to do is to add more table entries in the `sx-runtimes.json` files. Don't forget to configure s6 as an ECMP load balancer as well, similar to s1.
Once you have finished both Task 1 and Task 2, you can test the behaviour of your solution:
- Start the topology (this will also compile and load the program): `make run`
- Check that you can ping: `mininet> pingall`
- Monitor the 4 links from `s1` that will be used during `ecmp` (from `s1-eth2` to `s1-eth5`). Doing this you will be able to check which path each flow is taking: `sudo tcpdump -enn -i s1-ethX`
- Do iperf between two hosts. If you do iperf between `h1` and `h2` you should see all the packets cross the same interfaces almost all the time (unless you set the gap interval very small).
- Get a terminal in `h1` and use the `send.py` script: `send.py 10.0.6.2 <num_of_packets> <sleep_time_between_packets>`. This will send `tcp syn` packets with the same 5-tuple. You can play with the sleep time (third parameter). If you set it bigger than your gap, packets should change paths; if you set it smaller (set it much smaller, since the software model is not very precise), you will see all the packets cross the same interfaces.