GSO (“Generic Segmentation Offload”) is a performance optimisation that generalises the concept of TSO (TCP Segmentation Offload).
It was added in Linux 2.6.18.
Taken from Herbert Xu's posting on linux-netdev:
Many people have observed that a lot of the savings in TSO come from traversing the networking stack once rather than many times for each super-packet. These savings can be obtained without hardware support. In fact, the concept can be applied to other protocols such as TCPv6, UDP, or even DCCP.
The key to minimising the cost of implementing this is to postpone the segmentation as late as possible. In the ideal world, the segmentation would occur inside each NIC driver, where they would rip the super-packet apart and either produce SG (scatter/gather) lists which are directly fed to the hardware, or linearise each segment into pre-allocated memory to be fed to the NIC. This would eliminate segmented skb's altogether.
Unfortunately this requires modifying each and every NIC driver so it would take quite some time. A much easier solution is to perform the segmentation just before the entry into the driver's xmit routine. This concept is called GSO: Generic Segmentation Offload.
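The idea of segmenting just before the driver's transmit routine can be illustrated with a minimal user-space sketch. This is not kernel code: `gso_segment_and_xmit`, `xmit_fn`, `count_xmit` and `struct xmit_stats` are hypothetical names invented for illustration, and header replication and checksum fix-ups, which real segmentation has to do, are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver's transmit routine: it receives
 * one wire-sized segment at a time. */
typedef void (*xmit_fn)(const unsigned char *seg, size_t len, void *ctx);

/*
 * Split a super-packet payload into segments of at most mss bytes and
 * hand each one to the transmit hook. Everything above this call has
 * handled the payload as a single unit. Returns the number of segments.
 * Per-segment header construction and checksums are omitted.
 */
size_t gso_segment_and_xmit(const unsigned char *payload, size_t len,
                            size_t mss, xmit_fn xmit, void *ctx)
{
    size_t sent = 0;

    assert(mss > 0);
    for (size_t off = 0; off < len; off += mss) {
        size_t remaining = len - off;
        size_t seg_len = remaining < mss ? remaining : mss;

        xmit(payload + off, seg_len, ctx);
        sent++;
    }
    return sent;
}

/* Example hook that only tallies what it is given. */
struct xmit_stats {
    size_t segments;
    size_t bytes;
};

void count_xmit(const unsigned char *seg, size_t len, void *ctx)
{
    struct xmit_stats *st = ctx;

    (void)seg; /* contents are not inspected in this sketch */
    st->segments++;
    st->bytes += len;
}
```

Calling `gso_segment_and_xmit` on a 65536-byte payload with an assumed segment size of 1460 bytes invokes the hook 45 times, while the code above the call sees only one packet.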
Herbert Xu has also posted some numbers on the performance gains from doing this:
The test was performed through the loopback device, which is a fairly good approximation of an SG-capable NIC. GSO, like TSO, is only effective if the MTU is significantly less than the maximum value of 64K. So only the case where the MTU was set to 1500 is of interest. There we can see that the throughput improved by 17.5% (3061.05Mb/s ⇒ 3598.17Mb/s). The actual saving in transmission cost is in fact a lot more than that, as the majority of the time here is spent on the RX side, which still has to deal with 1500-byte packets.
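The quoted improvement can be checked from the two throughput figures. The helper below is a small sketch whose name, `throughput_gain_percent`, is made up for this example; it computes the relative gain as (after - before) / before.

```c
#include <assert.h>

/* Relative throughput gain in percent: (after - before) / before * 100. */
double throughput_gain_percent(double before_mbps, double after_mbps)
{
    assert(before_mbps > 0.0);
    return (after_mbps - before_mbps) / before_mbps * 100.0;
}
```

With the figures above, `throughput_gain_percent(3061.05, 3598.17)` gives roughly 17.5.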
The worst-case scenario is where the NIC does not support SG and the user uses write(2) which means that we have to copy the data twice. The files gso-off/gso-on provide data for this case (the test was carried out on e100). As you can see, the cost of the extra copy is mostly offset by the reduction in the cost of going through the networking stack.
For now GSO is off by default but can be enabled through ethtool. It is conceivable that with enough optimisation GSO could be a win in most cases and we could enable it by default.
However, even without enabling GSO explicitly, it can still function on bridged and forwarded packets. As it is, passing TSO packets through a bridge only works if all constituents support TSO. GSO provides a fallback, so that we may enable TSO for a bridge even if some of its constituents do not support TSO.
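The fallback amounts to a per-port decision at transmit time. The sketch below states that decision under simplifying assumptions: `struct port`, `enum tx_path` and `choose_tx_path` are hypothetical names, and a single boolean stands in for the device's feature flags.

```c
#include <assert.h>
#include <stdbool.h>

/* What to do with a packet that is about to leave through a port. */
enum tx_path {
    TX_PASS_THROUGH,     /* hand the packet to the device unchanged */
    TX_SOFTWARE_SEGMENT  /* split it in software first (the GSO fallback) */
};

/* Hypothetical bridge constituent; one flag replaces the feature mask. */
struct port {
    bool supports_tso;
};

/*
 * Ordinary packets and TSO-capable ports need no extra work. Only a
 * super-packet headed for a port without TSO takes the software path.
 */
enum tx_path choose_tx_path(const struct port *out, bool is_super_packet)
{
    assert(out);
    if (!is_super_packet || out->supports_tso)
        return TX_PASS_THROUGH;
    return TX_SOFTWARE_SEGMENT;
}
```

Under this model a bridge can accept super-packets regardless of its members, because the member that cannot handle them gets segmented packets instead.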
This provides massive savings for Xen as it uses a bridge-based architecture and TSO/GSO produces a much larger effective MTU for internal traffic between domains.