From Larry.Liu@sun.com Mon Jul 26 22:31:55 2010
Date: Tue, 27 Jul 2010 13:31:30 +0800
From: Yu Larry Liu
Subject: Add tunable to control RMW for Flash Devices [PSARC/2010/296 FastTrace timeout 08/02/2010]
To: PSARC-ext@sun.com
Cc: Bo.Zhou@sun.com
Message-id: <4C4E6F32.1030007@Sun.COM>

I'm sponsoring the following case for Bo Zhou. The case seeks patch
binding. The timer is set to 08/02/2010.

Template Version: @(#)sac_nextcase 1.70 03/30/10 SMI
This information is Copyright (c) 2010, Oracle and/or its affiliates.
All rights reserved.

1. Introduction
    1.1. Project/Component Working Name:
         Add tunable to control RMW for Flash Devices

    1.2. Name of Document Author/Supplier:
         Author: Bo Zhou

    1.3. Date of This Document:
         27, Jul, 2010

4. Technical Description:

    4.1 Details

    o Background

      A recent putback enabled support for large-sector-size disk drives:
      PSARC 2008/769 Multiple disk sector size support. This putback
      includes Read-Modify-Write (RMW) code for disk drives that
      advertise a 4K logical sector size to the host: if non-4K-aligned
      I/Os are issued, the sd driver uses RMW to handle them so that
      applications and modules still work as usual, albeit with some
      expected performance degradation.

      There are two kinds of disk block size: physical block size and
      logical block size. The physical block size is the physical unit
      of storage on the surface of the disk; it is the smallest unit of
      data that can be physically written to or read from the disk. The
      logical block size is the unit in which the disk firmware presents
      the disk to the host as a linear address space of logical blocks.
      Based on these two block sizes, we define two modes of a disk:
      native mode and emulation mode. In native mode the disk's physical
      block size equals its logical block size; in emulation mode the
      two block sizes differ, and the disk firmware reports to the host
      a block size that is different from the physical one.
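The native and emulation modes described above, and the alignment test that decides whether an I/O needs RMW, can be illustrated with a minimal Python sketch. This is not sd driver code: the function names `disk_mode`, `needs_rmw`, and `rmw_span`, and the use of byte offsets, are assumptions made purely for illustration.

```python
def disk_mode(logical_bs, physical_bs):
    """Classify a disk by comparing its logical and physical block sizes."""
    return "native" if logical_bs == physical_bs else "emulation"


def needs_rmw(offset, length, physical_bs):
    """Return True if the byte range does not start and end on
    physical block boundaries (hypothetical helper)."""
    return offset % physical_bs != 0 or length % physical_bs != 0


def rmw_span(offset, length, physical_bs):
    """Return (start, length) of the aligned region that would have to be
    read, modified, and written back to cover the request."""
    start = (offset // physical_bs) * physical_bs
    # Round the end of the request up to the next physical block boundary.
    end = -(-(offset + length) // physical_bs) * physical_bs
    return start, end - start


if __name__ == "__main__":
    # A 512B write at byte offset 512 on a 512B-logical / 4KB-physical disk.
    print(disk_mode(512, 4096))
    print(needs_rmw(512, 512, 4096))
    print(rmw_span(512, 512, 4096))
```

Whoever performs the RMW (disk f/w or the host) has to read the whole aligned span, patch the requested bytes into it, and write the span back, which is why unaligned writes cost more than aligned ones.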
    o Problems

      Disks in emulation mode require RMW, executed either in disk
      firmware (f/w) or in the host OS (sd driver), to align the logical
      I/O size and starting address to the physical ones. Where RMW is
      performed is currently not an option open to users: because
      running the RMW algorithm in disk f/w normally performs much
      better than running it in the host, RMW in the sd driver is never
      triggered for devices that export a 512B logical sector size. In
      some cases, however, a few SSDs perform extremely badly with RMW
      in f/w, and RMW has to be moved to the sd driver to gain higher
      IOPS.

      In PSARC 2008/769, a tunable named "rmw-type" is defined to turn
      RMW on or off for disk drives in native mode, that is, drives
      whose logical and physical block sizes are the same. This tunable
      cannot be used in emulation mode because it does not consider the
      physical block size at all.

    o Proposal

      To control RMW in the sd driver for disks in emulation mode, a new
      user-visible tunable, "emulation-rmw", is proposed. It is a
      boolean: 0 disables RMW in sd and 1 enables it. If the tunable is
      not set, RMW behavior follows the internal static configuration.

      The tunable applies only to flash devices listed in the disk table
      of the sd driver, and to devices that support the READ CAPACITY 16
      command and report a physical block size of 4KB and a logical
      block size of 512B.

      VID/PID is used to identify these devices. For example, to turn on
      RMW for a flash device, a user can add the following lines to the
      (s)sd.conf file:

        sd-config-list = "ATA MARVELL SD88SA02",
                         "emulation-rmw:1";

    o Performance gain

      Performance benchmarks show very positive results on F20/F5100:

      - ZFS random write, random fs block size test shows
        100x improvement compared to running RMW in f/w.

      - ZFS random write, 4K-aligned fs block size test shows
        230x improvement compared to running RMW in f/w.

    4.2. Bug/RFE Number(s):

      6947063 Add tunable to control RMW in sd driver for drives in
              emulation mode
      6951276 FMOD, DOM, SSD device defaults can be optimized

    4.5. Interfaces:

      This change exports the following interface:

      Interface name   Commitment   Data Type   Comments
      ------------------------------------------------------------------------------
      emulation-rmw    Committed    BOOLEAN     Turn on/off RMW in sd driver for
                                                devices in emulation mode.

    4.6. Doc Impact:

      Manpage change to sd(7D):

      emulation-rmw    Turns RMW in the sd driver on or off for disks in
                       emulation mode. A disk is in emulation mode when
                       its physical block size differs from its logical
                       block size. This improves the throughput of some
                       SSDs that have poor RMW performance in firmware.

5. Reference Documents:

    [1] PSARC 2008/465 Improved [s]sd-config-list support
        http://sac.sfbay/PSARC/2008/465

    [2] PSARC 2008/769 Multiple disk sector size support
        http://sac.sfbay/PSARC/2008/769/

6. Resources and Schedule

    6.4. Steering Committee requested information
         6.4.1. Consolidation C-team Name:
                ON
    6.5. ARC review type:
         Automatic
    6.6. ARC Exposure:
         open

From Larry.Liu@Sun.COM Mon Jul 26 22:40:14 2010
Date: Tue, 27 Jul 2010 13:40:15 +0800
From: Yu Larry Liu
Subject: Add tunable to control RMW for Flash Devices [PSARC/2010/296 FastTrack timeout 08/02/2010]
To: PSARC-ext@Sun.COM
Cc: Bo Steven Zhou
Message-id: <4C4E713F.5080309@Sun.COM>

Sorry, there is a typo in the title. Resending.

-Larry

From garrett@damore.org Tue Jul 27 12:47:59 2010
Date: Tue, 27 Jul 2010 12:47:31 -0700
From: "Garrett D'Amore"
Subject: Re: Add tunable to control RMW for Flash Devices [PSARC/2010/296 FastTrace timeout 08/02/2010]
In-reply-to: <4C4E6F32.1030007@Sun.COM>
To: Yu Larry Liu
Cc: PSARC-ext@sun.com, Bo.Zhou@sun.com
Message-id: <1280260051.6326.23.camel@velocity>

I'm really confused here. Why run in an emulation mode at all? It
seems like if we can align and use 4K blocks directly, then we should
*always* do so - at least for those devices which have a 4K physical
block size. The "emulated" block size might be helpful for legacy OS,
but we can do better, can't we?

Is there a reason that we would ever, under normal circumstances,
want to use RMW on these devices? Is there a reason that this should
even be exposed as a tunable to customers?
- Garrett

From Nicolas.Williams@oracle.com Tue Jul 27 13:13:28 2010
Date: Tue, 27 Jul 2010 15:13:55 -0500
From: Nicolas Williams
Subject: Re: Add tunable to control RMW for Flash Devices [PSARC/2010/296 FastTrace timeout 08/02/2010]
In-reply-to: <1280260051.6326.23.camel@velocity>
To: "Garrett D'Amore"
Cc: Yu Larry Liu, PSARC-ext@sun.com, Bo.Zhou@sun.com
Message-id: <20100727201354.GK566@oracle.com>

On Tue, Jul 27, 2010 at 12:47:31PM -0700, Garrett D'Amore wrote:
> I'm really confused here. Why run in an emulation mode at all? It
> seems like if we can align and use 4K blocks directly, then we should
> *always* do so - at least for those devices which have a 4K physical
> block size. The "emulated" block size might be helpful for legacy OS,
> but we can do better, can't we?
>
> Is there a reason that we would ever, under normal circumstances,
> want to use RMW on these devices? Is there a reason that this should
> even be exposed as a tunable to customers?

Probably when accessing UFS, FAT, and other such filesystems.

But I don't get why this needs to be a tunable either, at least for ZFS,
since we could ensure that ZFS always does 4KB aligned writes, and then
who cares if UFS, FAT, and friends run slow on flash.
Nico
-- 

From Bo.Zhou@sun.com Tue Jul 27 16:06:55 2010
Date: Wed, 28 Jul 2010 07:06:20 +0800
From: bz211116
Subject: Re: Add tunable to control RMW for Flash Devices [PSARC/2010/296 FastTrace timeout 08/02/2010]
In-reply-to: <20100727201354.GK566@oracle.com>
To: Nicolas Williams
Cc: "Garrett D'Amore", Yu Larry Liu, PSARC-ext@sun.com
Message-id: <4C4F666C.8080101@sun.com>

Nicolas Williams wrote:
> On Tue, Jul 27, 2010 at 12:47:31PM -0700, Garrett D'Amore wrote:
>
>> I'm really confused here. Why run in an emulation mode at all? It
>> seems like if we can align and use 4K blocks directly, then we should
>> *always* do so - at least for those devices which have a 4K physical
>> block size. The "emulated" block size might be helpful for legacy OS,
>> but we can do better, can't we?
>>
>> Is there a reason that we would ever, under normal circumstances,
>> want to use RMW on these devices? Is there a reason that this should
>> even be exposed as a tunable to customers?
>>
>
> Probably when accessing UFS, FAT, and other such filesystems.
>

Yes. For other file systems or applications that still issue
non-4K-aligned I/Os to SSDs with low-performance RMW in f/w, turning
on RMW in sd can greatly improve performance; our experiments show it
is 100x faster than RMW in f/w. But for disk drives that perform
RMW better in f/w, we should turn RMW in the sd driver off. This is one
reason we need this tunable.

> But I don't get why this needs to be a tunable either, at least for ZFS,
> since we could ensure that ZFS always does 4KB aligned writes, and then
> who cares if UFS, FAT, and friends run slow on flash.
>

ZFS now uses READ CAPACITY 16 (if it succeeds) to get the physical
block size of the SSD. If the physical block size is 4096, ZFS aligns
its minimum I/O size and request address to a 4K boundary, which gives
the best performance. But most SSDs still report a 512B physical
sector size, or do not report a physical sector size at all, and some
of them also have bad RMW performance in f/w. In this case, we should
turn on RMW in sd for them. That's another reason we need this tunable.

Thanks.
-bo

> Nico
>

From Nicolas.Williams@oracle.com Tue Jul 27 16:14:09 2010
Date: Tue, 27 Jul 2010 18:12:09 -0500
From: Nicolas Williams
Subject: Re: Add tunable to control RMW for Flash Devices [PSARC/2010/296 FastTrace timeout 08/02/2010]
In-reply-to: <4C4F666C.8080101@sun.com>
To: bz211116
Cc: "Garrett D'Amore", Yu Larry Liu, PSARC-ext@sun.com
Message-id: <20100727231208.GO566@oracle.com>

On Wed, Jul 28, 2010 at 07:06:20AM +0800, bz211116 wrote:
> Nicolas Williams wrote:
> >On Tue, Jul 27, 2010 at 12:47:31PM -0700, Garrett D'Amore wrote:
> >>Is there a reason that we would ever, under normal circumstances,
> >>want to use RMW on these devices? Is there a reason that this should
> >>even be exposed as a tunable to customers?
> >
> >Probably when accessing UFS, FAT, and other such filesystems.
>
> Yes. For other file systems or applications that still issue
> non-4K-aligned I/Os to SSDs with low-performance RMW in f/w, turning
> on RMW in sd can greatly improve performance; our experiments show it
> is 100x faster than RMW in f/w. But for disk drives that perform
> RMW better in f/w, we should turn RMW in the sd driver off. This is one
> reason we need this tunable.

But do we care about performance of non-ZFS here? (Answer: probably.
I'm thinking of raw device uses too.)

> >But I don't get why this needs to be a tunable either, at least for ZFS,
> >since we could ensure that ZFS always does 4KB aligned writes, and then
> >who cares if UFS, FAT, and friends run slow on flash.
>
> ZFS now uses READ CAPACITY 16 (if it succeeds) to get the physical
> block size of the SSD. If the physical block size is 4096, ZFS aligns
> its minimum I/O size and request addresses on 4K boundaries, which
> gives the best performance. But most SSDs today still report a 512B
> physical sector size or do not support physical sector size at all,
> and some of them also have poor RMW performance in f/w. In this case,
> we should turn on RMW in sd for them. That's another reason we need
> this tunable.

But shouldn't ZFS just always do 4KB aligned writes and be done? Who
cares if the SSD claims to have a 512B physical sector size?

Nico
--

From Bo.Zhou@sun.com Tue Jul 27 16:18:23 2010
Received: from sunmail3mpk.sfbay.sun.com (sunmail3mpk.SFBay.Sun.COM [129.146.11.52]) by sac.sfbay.sun.com (8.13.8+Sun/8.13.8) with ESMTP id o6RNIN2K022436 for ; Tue, 27 Jul 2010 16:18:23 -0700 (PDT)
Received: from brm-avmta-1.central.sun.com (brm-avmta-1.Central.Sun.COM [129.147.4.11]) by sunmail3mpk.sfbay.sun.com (8.13.8+Sun/8.13.8/ENSMAIL,v2.4) with ESMTP id o6RNIMSY019500 for <@sunmail2sca.sfbay.sun.com:PSARC-ext@sun.com>; Tue, 27 Jul 2010 16:18:23 -0700 (PDT)
Received: from pmxchannel-daemon.brm-avmta-1.central.sun.com by brm-avmta-1.central.sun.com (Sun Java System Messaging Server 6.2-3.04 (built Jul 15 2005)) id <0L6800C09OQMVU00@brm-avmta-1.central.sun.com> for PSARC-ext@sun.com (ORCPT PSARC-ext@sun.com); Tue, 27 Jul 2010 17:18:22 -0600 (MDT)
Received: from gmp-eb-inf-2.sun.com ([192.18.6.24]) by brm-avmta-1.central.sun.com (Sun Java System Messaging Server 6.2-3.04 (built Jul 15 2005)) with ESMTP id <0L68001YMOQLFZ90@brm-avmta-1.central.sun.com> for PSARC-ext@sun.com (ORCPT PSARC-ext@sun.com); Tue, 27 Jul 2010 17:18:22 -0600 (MDT)
Received: from fe-emea-13.sun.com (gmp-eb-lb-1-fe1.eu.sun.com [192.18.6.7] (may be forged)) by gmp-eb-inf-2.sun.com (8.13.7+Sun/8.12.9) with ESMTP id o6RNILbY029622 for ; Tue, 27 Jul 2010 23:18:21 +0000 (GMT)
Received: from conversion-daemon.fe-emea-13.sun.com by fe-emea-13.sun.com (Sun Java(tm) System Messaging Server 7u2-7.04 64bit (built Jul 2 2009)) id <0L6800I00OK1GX00@fe-emea-13.sun.com> for PSARC-ext@sun.com (ORCPT PSARC-ext@sun.com); Wed, 28 Jul 2010 00:17:56 +0100 (BST)
Received: from [129.150.144.12] ([unknown] [129.150.144.12]) by fe-emea-13.sun.com (Sun Java(tm) System Messaging Server 7u2-7.04 64bit (built Jul 2 2009)) with ESMTPSA id <0L6800I3GOPTI500@fe-emea-13.sun.com> for PSARC-ext@sun.com (ORCPT PSARC-ext@sun.com); Wed, 28 Jul 2010 00:17:56 +0100 (BST)
Date: Wed, 28 Jul 2010 07:17:48 +0800
From: bz211116
Subject: Re: Add tunable to control RMW for Flash Devices [PSARC/2010/296 FastTrace timeout 08/02/2010]
In-reply-to: <20100727231208.GO566@oracle.com>
Sender: Bo.Zhou@sun.com
To: Nicolas Williams
Cc: "Garrett D'Amore" , Yu Larry Liu , PSARC-ext@sun.com
Message-id: <4C4F691C.1090907@sun.com>
MIME-version: 1.0
Content-type: text/plain; CHARSET=US-ASCII; format=flowed
Content-transfer-encoding: 7BIT
X-PMX-Version: 5.4.1.325704
References: <4C4E6F32.1030007@Sun.COM> <1280260051.6326.23.camel@velocity> <20100727201354.GK566@oracle.com> <4C4F666C.8080101@sun.com> <20100727231208.GO566@oracle.com>
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
Status: RO
Content-Length: 1980

Nicolas Williams wrote:
> On Wed, Jul 28, 2010 at 07:06:20AM +0800, bz211116 wrote:
>
>> Nicolas Williams wrote:
>>
>>> On Tue, Jul 27, 2010 at 12:47:31PM -0700, Garrett D'Amore wrote:
>>>
>>>> Is there a reason that we would ever, under normal circumstances,
>>>> want to use RMW on these devices? Is there a reason that this should
>>>> even be exposed as a tunable to customers?
>>>>
>>> Probably when accessing UFS, FAT, and other such filesystems.
>>>
>> Yes. For other file systems or applications that still issue non-4K
>> aligned I/Os to SSDs that have low-performance RMW in f/w, turning
>> on RMW in sd can greatly improve performance; our experiments show
>> it is 100x faster than RMW in f/w. But for disk drives that perform
>> RMW better in f/w, we should turn RMW in the sd driver off. This is
>> one reason we need this tunable.
>>
>
> But do we care about performance of non-ZFS here? (Answer: probably.
> I'm thinking of raw device uses too.)
>
>
>>> But I don't get why this needs to be a tunable either, at least for ZFS,
>>> since we could ensure that ZFS always does 4KB aligned writes, and then
>>> who cares if UFS, FAT, and friends run slow on flash.
>>>
>> ZFS now uses READ CAPACITY 16 (if it succeeds) to get the physical
>> block size of the SSD. If the physical block size is 4096, ZFS aligns
>> its minimum I/O size and request addresses on 4K boundaries, which
>> gives the best performance. But most SSDs today still report a 512B
>> physical sector size or do not support physical sector size at all,
>> and some of them also have poor RMW performance in f/w. In this case,
>> we should turn on RMW in sd for them. That's another reason we need
>> this tunable.
>>
>
> But shouldn't ZFS just always do 4KB aligned writes and be done? Who
> cares if the SSD claims to have a 512B physical sector size?
>
Most of the SSDs in the F5100/F20 are in this case...

Thanks.
-bo
> Nico
>

From garrett@damore.org Mon Aug 16 21:16:58 2010
Received: from sunmail3mpk.sfbay.sun.com (sunmail3mpk.SFBay.Sun.COM [129.146.11.52]) by sac.sfbay.sun.com (8.13.8+Sun/8.13.8) with ESMTP id o7H4GvIE021288 for ; Mon, 16 Aug 2010 21:16:57 -0700 (PDT)
Received: from brm-avmta-1.central.sun.com (brm-avmta-1.Central.Sun.COM [129.147.4.11]) by sunmail3mpk.sfbay.sun.com (8.13.8+Sun/8.13.8/ENSMAIL,v2.4) with ESMTP id o7H4GvrR019029 for <@sunmail2sca.sfbay.sun.com:PSARC-ext@sun.com>; Mon, 16 Aug 2010 21:16:57 -0700 (PDT)
Received: from pmxchannel-daemon.brm-avmta-1.central.sun.com by brm-avmta-1.central.sun.com (Sun Java System Messaging Server 6.2-3.04 (built Jul 15 2005)) id <0L7A00E013W92R00@brm-avmta-1.central.sun.com> for PSARC-ext@sun.com (ORCPT PSARC-ext@Sun.COM); Mon, 16 Aug 2010 22:16:57 -0600 (MDT)
Received: from brmea-mail-4.sun.com ([192.18.98.36]) by brm-avmta-1.central.sun.com (Sun Java System Messaging Server 6.2-3.04 (built Jul 15 2005)) with ESMTP id <0L7A00B7H3W8H910@brm-avmta-1.central.sun.com> for PSARC-ext@sun.com (ORCPT PSARC-ext@Sun.COM); Mon, 16 Aug 2010 22:16:56 -0600 (MDT)
Received: from relay44i.sun.com ([192.5.209.118]) by brmea-mail-4.sun.com (8.13.6+Sun/8.12.9) with ESMTP id o7H49f3A003933 for ; Tue, 17 Aug 2010 04:16:56 +0000 (GMT)
Received: from mmp42es.mmp.us.syntegra.com ([160.41.221.11] [160.41.221.11]) by relay44i.sun.com with ESMTP id BT-MMP-511999 for PSARC-ext@Sun.COM; Tue, 17 Aug 2010 04:16:56 +0000 (Z)
Received: from relay45i.sun.com (relay45i.sun.com [192.5.209.94]) by mmp42es.mmp.us.syntegra.com with ESMTP id BT-MMP-81762557 for PSARC-ext@Sun.COM; Tue, 17 Aug 2010 04:16:56 +0000 (Z)
Received: from oproxy1-pub.bluehost.com ([66.147.249.253] [66.147.249.253]) by relay4i.sun.com id BT-MMP-9062245 for PSARC-ext@Sun.COM; Tue, 17 Aug 2010 04:16:55 +0000 (Z)
Received: (qmail 24487 invoked by uid 0); Tue, 17 Aug 2010 04:17:27 +0000
Received: from unknown (HELO box374.bluehost.com) (69.89.31.174) by oproxy1.bluehost.com.bluehost.com with SMTP; Tue, 17 Aug 2010 04:17:27 +0000
Received: from cpe-75-82-74-133.socal.res.rr.com ([75.82.74.133] helo=[192.168.251.102]) by box374.bluehost.com with esmtpsa (SSLv3:AES256-SHA:256) (Exim 4.69) (envelope-from ) id 1OlDba-0006Nn-QZ; Mon, 16 Aug 2010 22:16:55 -0600
Date: Mon, 16 Aug 2010 21:15:26 -0700
From: "Garrett D'Amore"
Subject: Re: Add tunable to control RMW for Flash Devices [PSARC/2010/296 FastTrace timeout 08/02/2010]
In-reply-to: <4C4F691C.1090907@sun.com>
To: bz211116
Cc: Nicolas Williams , Yu Larry Liu , PSARC-ext@sun.com
Message-id: <1282018526.1840.277.camel@velocity>
MIME-version: 1.0
X-Mailer: Evolution 2.28.3
Content-type: text/plain; charset=UTF-8
Content-transfer-encoding: 7BIT
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=default; d=damore.org; h=Received:Subject:From:To:Cc:In-Reply-To:References:Content-Type:Date:Message-ID:Mime-Version:X-Mailer:Content-Transfer-Encoding:X-Identified-User; b=TJ1hITYSWS5pRg/azXrgk5DthnIIAOwYz2SLRfxZB27LA+7XneoSIbLoBvWg0gwmFr9IIvzJePaCGb+J1rd3S5nMg0aXTlJUzNhlixOtniN0f0ws24hAzwRZL9xbvsnZ;
X-PMX-Version: 5.4.1.325704
X-Brightmail-Tracker: AAAAAA==
X-Identified-User: {2225:box374.bluehost.com:damoreor:damore.org} {sentby:smtp auth 75.82.74.133 authed with garrett+damore.org}
X-Antispam: No, score=-0.7/5.0, scanned in 0.279sec at (localhost [127.0.0.1]) by smf-spamd v1.3.1 - http://smfs.sf.net/
References: <4C4E6F32.1030007@Sun.COM> <1280260051.6326.23.camel@velocity> <20100727201354.GK566@oracle.com> <4C4F666C.8080101@sun.com> <20100727231208.GO566@oracle.com> <4C4F691C.1090907@sun.com>
Status: RO
Content-Length: 2213

Having looked at this case, it seems like the issues are settled.
+1

- Garrett

On Wed, 2010-07-28 at 07:17 +0800, bz211116 wrote:
> Nicolas Williams wrote:
> > On Wed, Jul 28, 2010 at 07:06:20AM +0800, bz211116 wrote:
> >
> >> Nicolas Williams wrote:
> >>
> >>> On Tue, Jul 27, 2010 at 12:47:31PM -0700, Garrett D'Amore wrote:
> >>>
> >>>> Is there a reason that we would ever, under normal circumstances,
> >>>> want to use RMW on these devices? Is there a reason that this should
> >>>> even be exposed as a tunable to customers?
> >>>>
> >>> Probably when accessing UFS, FAT, and other such filesystems.
> >>>
> >> Yes. For other file systems or applications that still issue non-4K
> >> aligned I/Os to SSDs that have low-performance RMW in f/w, turning
> >> on RMW in sd can greatly improve performance; our experiments show
> >> it is 100x faster than RMW in f/w. But for disk drives that perform
> >> RMW better in f/w, we should turn RMW in the sd driver off. This is
> >> one reason we need this tunable.
> >>
> >
> > But do we care about performance of non-ZFS here? (Answer: probably.
> > I'm thinking of raw device uses too.)
> >
> >
> >>> But I don't get why this needs to be a tunable either, at least for ZFS,
> >>> since we could ensure that ZFS always does 4KB aligned writes, and then
> >>> who cares if UFS, FAT, and friends run slow on flash.
> >>>
> >> ZFS now uses READ CAPACITY 16 (if it succeeds) to get the physical
> >> block size of the SSD. If the physical block size is 4096, ZFS aligns
> >> its minimum I/O size and request addresses on 4K boundaries, which
> >> gives the best performance. But most SSDs today still report a 512B
> >> physical sector size or do not support physical sector size at all,
> >> and some of them also have poor RMW performance in f/w. In this case,
> >> we should turn on RMW in sd for them. That's another reason we need
> >> this tunable.
> >>
> >
> > But shouldn't ZFS just always do 4KB aligned writes and be done? Who
> > cares if the SSD claims to have a 512B physical sector size?
> >
> Most of the SSDs in the F5100/F20 are in this case...
>
> Thanks.
> -bo
> > Nico
> >
>
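
For readers following the thread, the read-modify-write (RMW) handling of non-4K-aligned I/O discussed above can be illustrated with a small sketch. This is not the sd driver code and is not taken from the case materials; the Device class, the SECTOR constant, and the is_aligned and rmw_write functions are hypothetical names invented for illustration, assuming a device that accepts only whole, 4K-aligned sectors.

```python
SECTOR = 4096  # sector size assumed for this sketch


class Device:
    """Toy block device (hypothetical) that only transfers whole 4K sectors."""

    def __init__(self, nsectors):
        self.data = bytearray(nsectors * SECTOR)

    def read_sector(self, lba):
        return bytes(self.data[lba * SECTOR:(lba + 1) * SECTOR])

    def write_sector(self, lba, buf):
        assert len(buf) == SECTOR
        self.data[lba * SECTOR:(lba + 1) * SECTOR] = buf


def is_aligned(offset, length):
    """True when a request needs no RMW: start and length are 4K multiples."""
    return offset % SECTOR == 0 and length % SECTOR == 0


def rmw_write(dev, offset, payload):
    """Write payload at any byte offset; return how many sectors were read."""
    reads = 0
    pos = offset
    end = offset + len(payload)
    while pos < end:
        lba = pos // SECTOR
        start = pos % SECTOR
        n = min(SECTOR - start, end - pos)
        chunk = payload[pos - offset:pos - offset + n]
        if n == SECTOR:
            # Whole sector covered: a plain write, no extra read needed.
            dev.write_sector(lba, chunk)
        else:
            buf = bytearray(dev.read_sector(lba))  # read
            reads += 1
            buf[start:start + n] = chunk           # modify
            dev.write_sector(lba, bytes(buf))      # write
        pos += n
    return reads
```

In this sketch an aligned 4K write costs no extra reads, while a 512-byte write inside one sector costs one read and one full-sector write, which is the overhead the thread is weighing between the driver and the device firmware.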