TCP Offload Engine (TOE) is the name for allowing the network driver to do part or all of the TCP/IP protocol processing. Vendors have modified Linux to support TOE and have submitted these changes for kernel inclusion, but the changes were rejected. This page describes why Linux engineers currently feel that full network stack offload (TCP Offload Engine, TOE) has little merit.
A TOE net stack is closed-source firmware. Linux engineers have no way to fix security issues that arise in it. As a result, only non-TOE users will receive security updates, leaving random windows of vulnerability for each TOE NIC's users.
Each TOE NIC has a limited lifetime of usefulness, because system hardware rapidly catches up to TOE performance levels and eventually exceeds them. We saw this with 10 Mbit, 100 Mbit, and gigabit TOE, and will soon see it with 10-gigabit TOE.
System administrators are quite familiar with how the Linux network stack interoperates with the world at large. TOE is a black box, so each NIC requires re-examination of network behavior. Network scanners and analysis tools must be updated, or they will provide faulty analysis.
Experience has shown that TOE implementations require additional work (programming the hardware, hardware-specific socket manipulation) to set up and tear down connections. For connection intensive protocols such as HTTP, TOE often underperforms.
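The trade-off described above can be sketched as a simple break-even calculation. This is only an illustrative model, not a measurement: the function names and every number used with them (extra setup cost per connection, CPU time saved per kilobyte) are made-up assumptions chosen to show the shape of the argument.

```python
def break_even_bytes(setup_cost_us, saved_us_per_kb):
    """Bytes a connection must carry before offload savings cover the
    extra setup/teardown cost. Both inputs are hypothetical figures."""
    return setup_cost_us / saved_us_per_kb * 1024


def offload_pays_off(bytes_per_conn, setup_cost_us, saved_us_per_kb):
    """True if the per-byte savings exceed the per-connection overhead."""
    saved_us = (bytes_per_conn / 1024) * saved_us_per_kb
    return saved_us > setup_cost_us


if __name__ == "__main__":
    # Assumed values for illustration only: 50 us of extra setup work,
    # 0.5 us of host CPU time saved per kilobyte transferred.
    print(break_even_bytes(50, 0.5))                     # 102400.0
    print(offload_pays_off(10 * 1024, 50, 0.5))          # short HTTP reply
    print(offload_pays_off(10 * 1024 * 1024, 50, 0.5))   # bulk transfer
```

Under these assumed numbers a short-lived connection carrying a 10 KB response never recovers the setup overhead, while a long bulk transfer does, which is the pattern the paragraph above attributes to connection-intensive protocols.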
TOE NICs are more resource limited than your overall computer system. This is most readily apparent under load, when trying to support thousands of simultaneous connections. TOE NICs simply do not have the memory resources to buffer thousands of connections, much less have the CPU power to handle such loads. Further, each TOE NIC has different resource limitations (often unpublished, only to be discovered at the worst moments).
Once resources are exhausted, TOE will either fall back to a 100% software net stack, defeating the purpose of TOE, or deny service to additional clients.
If an attacker can discover the TOE NIC model in use, they can use this information to enable resource-based algorithmic attacks. For example, a SYN flood could potentially use up all TOE resources in a matter of seconds. The TOE NIC will either stop accepting connections (complete DoS), or will constantly bounce back to the software net stack.
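The exhaustion scenario can be illustrated with a toy model. Nothing here reflects a real NIC: `ToyOffloadNic`, its `capacity`, and the `fallback` switch are hypothetical names invented for this sketch, and real hardware limits are, as noted above, often unpublished.

```python
from dataclasses import dataclass


@dataclass
class ToyOffloadNic:
    capacity: int          # hypothetical number of on-card connection slots
    fallback: bool = True  # True: spill to software stack, False: refuse
    offloaded: int = 0
    software: int = 0
    refused: int = 0

    def open_connection(self) -> str:
        """Place a new connection and report where it ended up."""
        if self.offloaded < self.capacity:
            self.offloaded += 1
            return "offloaded"
        if self.fallback:
            self.software += 1
            return "software"
        self.refused += 1
        return "refused"


def simulate(nic, half_open_syns, legit_clients):
    """Fill slots with attacker SYNs, then see what legitimate clients get."""
    for _ in range(half_open_syns):
        nic.open_connection()
    return [nic.open_connection() for _ in range(legit_clients)]


if __name__ == "__main__":
    print(simulate(ToyOffloadNic(capacity=1000, fallback=False), 1000, 3))
    print(simulate(ToyOffloadNic(capacity=1000, fallback=True), 1000, 3))
```

In this model, once the attacker holds every slot, legitimate clients are either refused outright or handled entirely in software, matching the two outcomes described above.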
Linux is the most RFC-compliant network stack available. TOE can at best equal this, and is more likely to diminish it. Further, as a black box, each TOE NIC may have a different level of RFC compliance, and different supported TCP/IP features.
TOE is by definition poorly integrated into Linux. TOE NICs will not provide netfilter, packet scheduling, QoS, and many other features that Linux users depend on. If they do provide such features, they implement them in a vendor-specific manner, so the feature set itself becomes vendor-specific.
In order to configure a TOE NIC, hardware-specific tools are usually required. This dramatically increases support costs.
Linux engineers cannot provide an adequate level of support for TOE users, and must instead refer users to the vendor – who in all likelihood cares more about non-Linux operating systems.
Supporting TOE requires massive, heavily invasive hooks into the network stack. This increases the kernel maintenance burden on Linux engineers, to support a solution Linux engineers have no control over.
Linux has been in existence for over a decade, and some pieces of decade-old hardware continue to be used and supported. In contrast, most hardware vendors end-of-life (stop supporting) their hardware after just a few years. For most hardware vendors, the sales of old hardware simply do not justify dedicating engineers to Linux support for many years.
Similarly, kernel engineers must support TOE for as long as users continue to use the hardware. Hardware vendors get bought or simply disappear (go out of business) during our maintenance timeframe. Once a hardware vendor loses interest in Linux, TOE NICs will cease to receive security updates, and hardware issues become incredibly difficult to debug. Each new generation of system hardware often requires re-examination of hardware drivers, a task made far more difficult without a hardware vendor to answer questions.
With TOE, the system no longer has a complete picture of all resources used by network connections. Some connections are software-based, and thus limited by existing policy controls (such as per-socket memory limits). Other connections are managed by TOE, and these details are hidden. As such, the VM cannot adequately manage overall socket buffer memory usage, TOE-enabled connections cannot be rate-limited by the same controls as software-based connections, per-user socket security limits may be ignored, etc.
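As one example of the per-socket controls mentioned above, a software-stack socket's receive buffer can be capped and inspected through the standard sockets API. This is a minimal sketch; the helper names `capped_socket` and `effective_rcvbuf` and the 64 KiB figure are assumptions for illustration. Whether a given TOE implementation would honor such a setting is exactly the detail hidden from the system.

```python
import socket


def capped_socket(rcvbuf_bytes=65536):
    """Create a TCP socket and request a receive-buffer limit for it."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    return s


def effective_rcvbuf(s):
    """Ask the kernel what limit it actually applied to this socket."""
    return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    sock = capped_socket()
    # On Linux the reported value is typically double the request
    # (bookkeeping overhead) and is bounded by net.core.rmem_max.
    print(effective_rcvbuf(sock))
    sock.close()
```

With the software stack, the kernel both enforces this limit and can report it back, so it can be counted against overall socket buffer memory.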
Linux has several TCP congestion control algorithms available. For TOE connections this would no longer be true: all congestion control would be done by proprietary, vendor-specific algorithms on the card.
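On a Linux host using the software stack, the available algorithms and the one in effect for a given socket can be inspected from user space. The sketch below assumes a Linux system and a Python build that exposes `socket.TCP_CONGESTION`; the helper names are illustrative, and the functions return an empty list or `None` where that information is unavailable.

```python
import socket
from pathlib import Path

# Standard Linux procfs entry listing the loaded congestion control algorithms.
AVAILABLE = Path("/proc/sys/net/ipv4/tcp_available_congestion_control")


def available_algorithms():
    """Return the algorithm names the kernel offers, or [] if unreadable."""
    try:
        return AVAILABLE.read_text().split()
    except OSError:
        return []


def socket_algorithm(sock):
    """Return the congestion control algorithm of a TCP socket, or None."""
    if not hasattr(socket, "TCP_CONGESTION"):
        return None
    try:
        raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
    except OSError:
        return None
    # The kernel returns a NUL-padded name; keep the part before the padding.
    return raw.split(b"\x00", 1)[0].decode()


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(available_algorithms())
        print(socket_algorithm(s))
```

The same option can be set per socket with `setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"reno")`, provided the named algorithm is available. An offloaded connection would have no equivalent knob unless the vendor chose to provide one.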