Date: Mon, 13 May 2002 17:52:38 -0700 (PDT)
From: Adi Masputra
To: one-pager@Sun.COM
Subject: TCP Multi-Data Transmit (MDT)

Template Version: @(#)onepager.txt 1.19 02/02/28 SMI
@(#)mdt.onepager.txt 1.13

This information is Copyright 2002, Sun Microsystems

1. Introduction
    1.1. Project/Component Working Name:
        TCP Multi-Data Transmit (MDT)

    1.2. Name of Document Author/Supplier:
        Adi Masputra

    1.3. Date of This Document:
        13 May 2002

    1.4. Name of Major Document Customer(s)/Consumer(s):
        ONSC, PSARC, SNT, Networking & Security

    1.5. Email Aliases:
        1.5.1. Responsible Manager: Mimi Wong
        1.5.2. Responsible Engineer: Adi Masputra
        1.5.3. Marketing Manager: Smita Thakur
        1.5.4. Interest List:
            Data Movement I-Team
            Multi-Data I-Team

2. Project Summary
    2.1. Project Description
        The current model for packet transmission throughout the Solaris
        TCP/IP stack and network device driver is geared towards sending
        one packet at a time.
        Because of the high per-packet costs in the Solaris stack, this
        model consumes a lot of system processing time and makes it hard
        for Solaris to saturate the 1 gigabit per second (Gbps) line rate
        on a single TCP connection.

        This project introduces Multi-Data Transmit (MDT) support for
        TCP, which allows multiple TCP segments of the same connection
        to be generated and processed together, thereby reducing the
        per-packet transmission costs. This translates into improved
        host CPU utilization and network throughput. The scheme is
        hardware independent, and may be applied to all of Sun's network
        device drivers.

        A Sun-private, protocol-independent extension to the X/Open Data
        Link Provider Interface (DLPI) [2] specification, which would
        allow for multiple packet transmission, is proposed. In addition,
        the project defines a new capability type which may be used by
        the DLPI user and provider to query and obtain information [4]
        related to the MDT optimization.

        Source code of the prototype implementation, along with a paper
        describing the technical details and benchmark results, can be
        found on the MDT project home page:

            http://arachnid.eng/inet/InternetPerf/multidata/

    2.2. Risks and Assumptions
        There are no apparent risks which would keep this project from
        meeting its requirements. No assumptions have been made.

3. Business Summary
    3.1. Problem Area
        The concept of transmitting multiple packets to a network device
        in one call from the networking stack to the data link layer is
        popular outside Sun, and lends itself very easily to improving
        network throughput. This is particularly important for high-speed
        networking, such as TCP/IP over Gigabit Ethernet. At any given
        time, the TCP/IP stack may have a large enough open window to
        allow multiple MSS-sized segments to be sent out.
        Without the ability to transmit multiple packets, the TCP/IP
        stack and driver have to process and transmit packets one at a
        time, using either the DL_UNITDATA_REQ or the M_DATA fastpath
        method [3].

        The current model is expensive with respect to DMA resources:
        the per-packet mapping and flushing are prone to thrashing the
        MMU page table entries, and each such operation carries its own
        overhead. It is also suboptimal with respect to maintaining
        efficient CPU cache usage, because the current transmission
        scheme causes many traversals across the different networking
        modules whenever multiple packets are allowed to be transmitted
        (e.g. during TCP bulk data transfers).

        With simple changes allowing multiple packets to be transmitted
        from the TCP/IP stack to the driver per call, the overall system
        performance is increased.

    3.2. Market/Requester
        All network applications, especially those involved in sending
        large amounts of network data (e.g. Web and File servers), will
        benefit from this technology.

    3.3. Business Justification
        This will help increase the performance of web and file servers.
        In addition, the reduction in host CPU utilization will allow
        more tasks to be handled by the system.

    3.4. Competitive Analysis
        Hewlett-Packard's HP-UX is already capable of generating 1 Gbps
        network traffic without requiring the speed of the host CPUs to
        be in the gigahertz (GHz) range.

    3.5. Opportunity Window/Exposure
        Solaris bulk data transfer performance will be better than (or
        at least comparable to) that offered by the competitor.

    3.6. How will you know when you are done?
        MDT capability is available and well tested, and delivers at
        least a 5% performance improvement (CPU utilization and/or
        network throughput) on benchmarks of real applications. In
        addition, quality will drive the schedule.

4. Technical Description
        The current DLPI data transmission path can accommodate a data
        packet no larger than the link MTU size per call. Each DLPI
        transmit call invokes expensive IOMMU operations, e.g. setting
        up the DVMA addresses and flushing the streaming buffer cache.
        MDT combines many packets into one call, thus reducing the IOMMU
        costs.

        Under most conditions, the TCP/IP stack may be able to transmit
        more than one packet at a time. Since the current TCP send path
        generates at most one MSS-sized packet at a time, the OS
        transmission path has to be traversed repeatedly until all of
        the packets are sent. This makes it hard to maintain a high
        instruction execution rate and data locality.

        TCP MDT provides the ability for the networking stack to transmit
        multiple segments of the same TCP connection to the network
        driver in one call. It allows the overhead of IOMMU operations
        to be amortized across a given number of packets, which in turn
        reduces the per-packet transmission costs. It also provides
        better cache utilization throughout the transmission path of the
        networking stack and driver, therefore reducing the CPU system
        time. More details can be found in [1].

5. Reference Documents
        [1] A. Masputra, F. Dimambro, K. Poon. "An Efficient Networking
            Transmit Mechanism for Solaris: Multi-Data Transmit (MDT)."
            May 2002.
            http://arachnid.eng/inet/InternetPerf/multidata/mdt.pdf

        [2] DLPI: X/Open Data Link Provider Interface. PSARC/1997/235.

        [3] D. Butterfield. "Solaris Network Fastpath Technical
            Description." November 1998.
            http://devi.eng/~dab/fastpath.html

        [4] DL_CAPABILITY_REQ/DL_CONTROL_REQ extensible interface for
            detecting, enabling and controlling DLS provider
            capabilities. PSARC/2001/070.

6. Resources and Schedule
    6.1. Projected Availability
        Q1 FY03

    6.2. Cost of Effort
        4 months, 2 person engineering
        2 months, 1 person test engineering

    6.3. Cost of Capital Resources
        No additional capital resources expected.

    6.4. Steering Committee requested information
        6.4.1. Consolidation C-team Name:
            Solaris OS/Net (ON C-team)
        6.4.2. Contributing OpCo/BU/Division Name:
            SSG/SOE
        6.4.3. Type of SC Approval needed:
            FastTrack
        6.4.4. Project Boundary Conditions:
            TBW
        6.4.5. Is this a necessary project for OEM agreements: [Y/N]
            N
        6.4.6. Notes/Dependencies:
            None.
        6.4.7. Target RTI Date/Release:
            Solaris 9 Update 3 (S9u3)
        6.4.8. Target Code Design Review Date:
            17 June 2002
        6.4.9. Did this project have prior SOESC approval for a
               Marketing Release and now you're requesting to go into
               an Update Release or Early Access CD?
            No

    6.5. ARC review type:
        FastTrack

7. Prototype Availability
    7.1. Prototype Availability
        A kernel implementation exists to support TCP MDT over IPv4,
        coupled with the MDT-capable Gigaswift Ethernet (internally
        known as Cassini) network device driver. IPv6 support will be
        incorporated in the delivered code. Pointers to the locations
        can be found on the project's home page.

    7.2. Prototype Cost
        2 man-months
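
Illustrative note on Section 4 (not part of the proposal itself):
        The amortization argument in the Technical Description can be
        sketched as a simple counting model. This is only a sketch
        under stated assumptions: it assumes exactly one IOMMU
        setup/flush per driver call, and the function names, the 64 KB
        payload and the limit of 16 segments per call are hypothetical
        values chosen for illustration. None of them come from the MDT
        design; the real interface and measurements are described
        in [1].

```python
from dataclasses import dataclass


@dataclass
class TransmitCost:
    driver_calls: int  # trips from the TCP/IP stack down to the driver
    iommu_setups: int  # assumed: one IOMMU setup/flush per driver call
    segments: int      # TCP segments handed to the driver


def segment_count(payload_bytes: int, mss: int) -> int:
    """Number of segments needed to carry payload_bytes at the given MSS."""
    if payload_bytes < 0 or mss <= 0:
        raise ValueError("payload_bytes must be >= 0 and mss must be > 0")
    return (payload_bytes + mss - 1) // mss  # integer ceiling division


def per_packet_cost(payload_bytes: int, mss: int) -> TransmitCost:
    """Current model: every segment is its own call and its own IOMMU setup."""
    n = segment_count(payload_bytes, mss)
    return TransmitCost(driver_calls=n, iommu_setups=n, segments=n)


def multidata_cost(payload_bytes: int, mss: int,
                   max_segments_per_call: int) -> TransmitCost:
    """MDT-style model: up to max_segments_per_call segments share one call."""
    if max_segments_per_call <= 0:
        raise ValueError("max_segments_per_call must be > 0")
    n = segment_count(payload_bytes, mss)
    calls = (n + max_segments_per_call - 1) // max_segments_per_call
    return TransmitCost(driver_calls=calls, iommu_setups=calls, segments=n)


if __name__ == "__main__":
    # Hypothetical workload: 64 KB of sendable data, 1460-byte MSS.
    print(per_packet_cost(65536, 1460))
    print(multidata_cost(65536, 1460, max_segments_per_call=16))
```

        In this model the number of segments on the wire is the same in
        both cases; only the number of driver calls, and with it the
        assumed number of IOMMU operations, shrinks as more segments
        are batched per call.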