From sacadmin Wed Mar 21 13:48:29 2007
Received: from jurassic.eng.sun.com (jurassic-224-b.SFBay.Sun.COM [129.146.224.130])
    by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2LKmT8k008267;
    Wed, 21 Mar 2007 13:48:29 -0700 (PDT)
Received: from jurassic.eng.sun.com (localhost [127.0.0.1])
    by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l2LKmS4Y388571
    (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
    Wed, 21 Mar 2007 13:48:29 -0700 (PDT)
Received: (from maybee@localhost)
    by jurassic.eng.sun.com (8.13.8+Sun/8.13.8/Submit) id l2LKmSXp388561;
    Wed, 21 Mar 2007 13:48:28 -0700 (PDT)
Date: Wed, 21 Mar 2007 13:48:28 -0700 (PDT)
From: Mark Maybee
Message-Id: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com>
To: PSARC@sac.sfbay.sun.com
Cc: neil.perrin@sun.com
Subject: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007]
Status: RO
Content-Length: 7139

Subject: PSARC FastTrack [03/28/2007]: ZFS Separate Intent Log


Template Version: @(#)sac_nextcase %I% %G% SMI
This information Copyright 2007 Sun Microsystems
1. Introduction
    1.1. Project/Component Working Name:
         ZFS Separate Intent Log
    1.2. Name of Document Author/Supplier:
         Author: Neil Perrin
    1.3  Date of This Document:
         21 March, 2007
4. Technical Description

This case adds extensions to several existing zpool commands to allow
separate log devices to be created and manipulated. It also extends
the output of some commands to include log device status. The stability
of these changes is committed, and the release binding is patch/micro.

SUMMARY:

    This is a proposal to allow separate devices to be used
    for the ZFS Intent Log (ZIL). The sole purpose of this is
    performance. The devices can be disks, solid state drives,
    nvram drives, or any device that presents a block interface.

PROBLEM:

    The ZIL satisfies the synchronous requirements of POSIX.
    For instance, databases often require their
    transactions to be on stable storage on return from the system
    call. NFS and other applications can also use fsync() to ensure
    data stability. The speed of the ZIL is therefore essential in
    determining the latency of writes for these critical applications.

    Currently the ZIL is allocated dynamically from the pool.
    It consists of a chain of varying block sizes which are
    anchored in fixed objects. Blocks are sized to fit the
    demand and will come from different metaslabs and thus
    different areas of the disk. This causes more head movement.

    Furthermore, the log blocks are freed as soon as the intent
    log transaction (system call) is committed. So a swiss cheesing
    effect can occur leading to pool fragmentation.

PROPOSED SOLUTION:

    This proposal takes advantage of the greatly faster media speeds
    of nvram, solid state disks, or even dedicated disks.
    To this end, additional extensions to the zpool command
    are defined:

    zpool create log
        Creates a pool with a separate log. If more than one
        log device is specified then writes are load-balanced
        between devices. It's also possible to mirror log
        devices. For example a log consisting of
        two sets of two mirrors could be created thus:

            zpool create \
                log mirror c1t8d0 c1t9d0 mirror c1t10d0 c1t11d0

        A raidz/raidz2 log is not supported, nor is placing logs
        on files.

    zpool add log
        Creates a separate log if it doesn't exist, or
        adds extra devices if it does.

    zpool remove
        Remove the log devices. If all log devices are removed
        we revert to placing the log in the pool. Evacuating a
        log is easily handled by ensuring all txgs are committed.

    zpool attach
        Attaches a new log device to an existing log device. If
        the existing device is not a mirror then a 2 way mirror
        is created. If device is part of a two-way log mirror,
        attaching new_device creates a three-way log mirror,
        and so on.

    zpool detach pool
        Detaches a log device from a mirror.

    zpool status
        Additionally displays the log devices

    zpool iostat
        Additionally shows IO statistics for log devices.

    zpool export/import
        Will export and import the log devices.

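    As a rough illustration of how the extended subcommands listed
    above might be used together, here is a dry-run sketch. The pool
    name "tank", the device names, and the run() wrapper are
    placeholders invented for this example and are not part of the
    case; each command is printed rather than executed, and the
    operand order follows the existing zpool(1M) usage.

```shell
#!/bin/sh
# Dry-run sketch: print each zpool command instead of executing it.
# "tank" and the cNd0 device names are hypothetical placeholders.
run() {
    printf '+ %s\n' "$*"
}

# Create a pool with two data disks and one separate log device.
run zpool create tank c0d0 c1d0 log c2d0

# Add a second log device; per the proposal, log writes are then
# load-balanced between c2d0 and c3d0.
run zpool add tank log c3d0

# Attach c4d0 to the existing log device c3d0, making a two-way
# log mirror. No "log" keyword is used here.
run zpool attach tank c3d0 c4d0

# Detach one side of that log mirror again.
run zpool detach tank c4d0

# Remove a log device from the pool.
run zpool remove tank c2d0

# Log devices are expected to show up in the status output.
run zpool status tank
```

    Replacing the body of run() with "$@" would execute the commands
    for real, which needs a ZFS build that includes this feature and
    disks that can be safely overwritten.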
    When a separate log that is not mirrored fails then
    logging will start using chained logs within the main pool.

    The name "log" will become a reserved word. Attempts to create
    a pool with the name "log" will fail with:

        "cannot create 'log': name is reserved
         pool name may have been omitted"

    Hot spares cannot replace log devices.

BINDING:

    A micro/patch binding is requested.

MAN PAGE CHANGES:

*** zpool.ori   Fri Mar 16 10:05:31 2007
--- zpool.new   Sat Mar 17 00:50:24 2007
***************
*** 169,174 ****
--- 169,180 ----
                   hot spares for a pool. For more information, see the
                   "Hot Spares" section.

+      log         A separate intent log device. If more than one
+                  log device is specified then writes are load-balanced
+                  between devices. Log devices can also be mirrored.
+                  However, neither raidz/raidz2 nor files are supported
+                  for the intent log. For more information, see the
+                  "Intent Log" section.



***************
*** 286,292 ****
--- 292,301 ----
       pools.


+      Spares cannot replace log devices.

+
+
    Alternate Root Pools
       The "zpool create -R" and "zpool import -R" commands allow
       users to create and import a pool with a different root
***************
*** 313,318 ****
--- 322,346 ----



+   Intent Log
+      The ZFS Intent Log satisfies the synchronous requirements of POSIX.
+      For instance, databases often require their transactions
+      to be on stable storage on return from the system call.
+      NFS and other applications can also use fsync() to ensure
+      data stability. By default, the intent log is allocated from
+      blocks within the pool, however for greater performance, separate
+      intent log device(s) can be specified. For example,
+
+        # zpool create pool c0d0 c1d0 log c2d0
+
+      Multiple log devices can also be specified, and they can be
+      mirrored. See the EXAMPLES section later for an example of this.
+
+      Log devices can be added with the "zpool add" command and
+      removed with the "zpool remove" command.
+
+
+
    Subcommands
       All subcommands that modify state are logged persistently to
       the pool in their original form.
***************
*** 355,362 ****
           ices specified on the command line. The pool name must
           begin with a letter, and can only contain alphanumeric
           characters as well as underscore ("_"), dash ("-"), and
!          period ("."). The pool names "mirror", "raidz", and
!          "spare" are reserved, as are names beginning with the
           pattern "c[0-9]". The vdev specification is described in
           the "Virtual Devices" section.

--- 383,390 ----
           ices specified on the command line. The pool name must
           begin with a letter, and can only contain alphanumeric
           characters as well as underscore ("_"), dash ("-"), and
!          period ("."). The pool names "mirror", "raidz",
!          "spare", and "log" are reserved, as are names beginning with the
           pattern "c[0-9]". The vdev specification is described in
           the "Virtual Devices" section.

***************
*** 1283,1288 ****
--- 1311,1324 ----
         # zpool remove tank c0t2d0


+      Example 11 Creating a pool with mirrored separate intent logs.
+
+      The following command creates a pool with a separate intent
+      log consisting of two sets of two mirrors:
+
+        # zpool create pool c0d0 c1d0 log mirror c2d0 c3d0 mirror c4d0 c5d0
+
+

  EXIT STATUS
       The following exit values are returned:

6. Resources and Schedule
    6.4. Steering Committee requested information
        6.4.1. Consolidation C-team Name:
               ON
    6.5. ARC review type: FastTrack

From sacadmin Wed Mar 21 14:21:25 2007
Received: from eastmail1bur.East.Sun.COM (eastmail1bur.East.Sun.COM [129.148.9.49])
    by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2LLLOkd008990
    for ; Wed, 21 Mar 2007 14:21:25 -0700 (PDT)
Received: from thunk.east.sun.com (thunk.East.Sun.COM [129.148.174.66])
    by eastmail1bur.East.Sun.COM (8.13.6+Sun/8.13.6/ENSMAIL,v2.2) with ESMTP id l2LLLKxp016783;
    Wed, 21 Mar 2007 17:21:20 -0400 (EDT)
Received: from [IPv6:::1] (localhost [IPv6:::1])
    by thunk.east.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l2LLLKGp023886;
    Wed, 21 Mar 2007 17:21:20 -0400 (EDT)
Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007]
From: Bill Sommerfeld
To: Mark Maybee
Cc: PSARC@sac.sfbay.sun.com, Neil.Perrin@sun.com
In-Reply-To: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com>
References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com>
Content-Type: text/plain
Date: Wed, 21 Mar 2007 17:21:19 -0400
Message-Id: <1174512079.22436.75.camel@thunk>
Mime-Version: 1.0
X-Mailer: Evolution 2.8.1.1
Content-Transfer-Encoding: 7bit
Status: RO
Content-Length: 1208

On Wed, 2007-03-21 at 13:48 -0700, Mark Maybee wrote:
> +   Intent Log
> +      The ZFS Intent Log satisfies the synchronous requirements of POSIX.
> +      For instance, databases often require their transactions
> +      to be on stable storage on return from the system call.
> +      NFS and other applications can also use fsync() to ensure
> +      data stability. By default, the intent log is allocated from
> +      blocks within the pool, however for greater performance, separate
> +      intent log device(s) can be specified.

This implies that a dedicated log device will always be faster than
putting the log within the pool. My instincts tell me that this is a
situation where the real answer will start with "it depends"...

When all devices are of similar speed, can we give any guidance better
than "try a few configs and see what works best" at this point?

Also, is there any guidance on how you might figure out how much nvram
you'd need to keep the intent log size from being a bottleneck?

(Maybe some of this guidance would belong in a higher level
administrative/sizing/best practices document but I don't see one
referenced in a recent version of zpool(1m)).

                                        - Bill

From sacadmin Wed Mar 21 14:32:02 2007
Received: from ivrel.sfbay.sun.com (ivrel.SFBay.Sun.COM [129.146.74.76])
    by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2LLW2pt009472
    for ; Wed, 21 Mar 2007 14:32:02 -0700 (PDT)
Received: from ivrel (ivrel [129.146.74.76])
    by ivrel.sfbay.sun.com (8.13.8+Sun/8.13.8) with SMTP id l2LLU04o004963;
    Wed, 21 Mar 2007 14:30:00 -0700 (PDT)
Message-Id: <200703212130.l2LLU04o004963@ivrel.sfbay.sun.com>
Date: Wed, 21 Mar 2007 14:30:00 -0700 (PDT)
From: Glenn Skinner
Reply-To: Glenn Skinner
Subject: Re: 2007/171 [ZFS Separate Intent Log]
To: PSARC@sac.sfbay.sun.com, maybee@jurassic.eng.sun.com
Cc: neil.perrin@sun.com
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
Content-MD5: bkBccPWFTRi0+4oi9fpKqg==
X-Mailer: dtmail 1.3.0 @(#)CDE Version 1.6_36 SunOS 5.11 sun4u sparc
Status: RO
Content-Length: 629

    Date: Wed, 21 Mar 2007 13:48:28 -0700 (PDT)
    From: Mark Maybee
    Subject: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007]

    ...
    The name "log" will become a reserved word. Attempts to create
    a pool with the name "log" will fail with:

        "cannot create 'log': name is reserved
         pool name may have been omitted"

What happens to existing pools with this (now unfortunate) name? Does
the introduction of this feature require an explicit "zpool upgrade" to
enable it (at which point the upgrade could be made to fail pending
renaming the offending pool)?

        -- Glenn

From sacadmin Wed Mar 21 15:47:54 2007
Received: from sfbaymail1sca.SFBay.Sun.COM (sfbaymail1sca.SFBay.Sun.COM [129.145.154.35])
    by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2LMlsVK014491
    for ; Wed, 21 Mar 2007 15:47:54 -0700 (PDT)
Received: from brmea-mail-3.sun.com (brmea-mail-3.Sun.COM [192.18.98.34])
    by sfbaymail1sca.SFBay.Sun.COM (8.13.6+Sun/8.13.6/ENSMAIL,v2.2) with ESMTP id l2LMlr9s026448
    for ; Wed, 21 Mar 2007 15:47:53 -0700 (PDT)
Received: from fe-amer-04.sun.com ([192.18.108.178])
    by brmea-mail-3.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l2LMlrev011737
    for ; Wed, 21 Mar 2007 22:47:53 GMT
Received: from conversion-daemon.mail-amer.sun.com by mail-amer.sun.com
    (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006))
    id <0JF900F01Z7OZO00@mail-amer.sun.com>
    (original mail from Neil.Perrin@Sun.COM) for PSARC@sac.sfbay.sun.com;
    Wed, 21 Mar 2007 16:47:53 -0600 (MDT)
Received: from [129.147.9.35] by mail-amer.sun.com
    (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006))
    with ESMTPSA id <0JF9008PSZBLZC35@mail-amer.sun.com>;
    Wed, 21 Mar 2007 16:47:53 -0600 (MDT)
Date: Wed, 21 Mar 2007 16:47:44 -0600
From: Neil Perrin
Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007]
In-reply-to: <1174512079.22436.75.camel@thunk>
Sender: Neil.Perrin@Sun.COM
To: Bill Sommerfeld
Cc: Mark Maybee , PSARC@sac.sfbay.sun.com, Cindy.Swearingen@Sun.COM
Reply-to: Neil.Perrin@Sun.COM
Message-id: <4601B610.5060802@Sun.COM>
MIME-version: 1.0
Content-type: text/plain; format=flowed; charset=us-ascii
Content-transfer-encoding: 7BIT
X-Accept-Language: en-us, en
References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com>
    <1174512079.22436.75.camel@thunk>
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.7) Gecko/20060120
Status: RO
Content-Length: 2989

Bill Sommerfeld wrote On 03/21/07 15:21,:
> On Wed, 2007-03-21 at 13:48 -0700, Mark Maybee wrote:
>
>
>>+   Intent Log
>>+      The ZFS Intent Log satisfies the synchronous requirements of POSIX.
>>+      For instance, databases often require their transactions
>>+      to be on stable storage on return from the system call.
>>+      NFS and other applications can also use fsync() to ensure
>>+      data stability. By default, the intent log is allocated from
>>+      blocks within the pool, however for greater performance, separate
>>+      intent log device(s) can be specified.
>
>
> This implies that a dedicated log device will always be faster than
> putting the log within the pool. My instincts tell me that this is a
> situation where the real answer will start with "it depends"...

Yes, you're right Bill. It will depend on the type of device added;
the rest of the pool configuration (hardware & software); and the
application workload. I think the wording should be changed to:

    By default, the intent log is allocated from
    blocks within the pool, however separate
    intent log device(s) can be specified.
    Depending on the speed and size of the device(s),
    and the rest of the pool configuration,
    greater application performance may be achieved.

>
> When all devices are of similar speed, can we give any guidance better
> than "try a few configs and see what works best" at this point?

Using a local benchmark that has multiple threads doing 8K O_DSYNC writes
(similar to a DB), I have seen it work better to have 2 pool + 1 log
rather than using all 3 devices in the pool. I believe Bryan, in his
fishworks testing, has seen a 1+1 configuration outperform a 2 device
pool. Of course nvram is much faster and solid state disks should be
good as well.

> Also, is there any guidance on how you might figure out how much nvram you'd
> need to keep the intent log size from being a bottleneck?

A good question, to which I don't have a good answer. Again it will
depend on the workload. However, if the log device becomes full we
have the option of waiting for the oldest txg to sync (hopefully rare)
or reverting to using the main pool for the log. This latter scheme
is more complex for replay. Anyway, this is an implementation detail,
and not part of this case. I agree, there will need to be visibility
into such stall or overflow events - probably a kstat. More analysis
needs to be done.

>
> (Maybe some of this guidance would belong in a higher level
> administrative/sizing/best practices document but I don't see one
> referenced in a recent version of zpool(1m)).

Right. So the General Storage Pool Performance Considerations section
of the ZFS Best Practices Guide (which Cindy helpfully pointed me at)
will need updating:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#General_Storage_Pool_Performance_Considerations

>
>                                         - Bill
>
>
>
>

From sacadmin Wed Mar 21 17:21:47 2007
Received: from sfbaymail2sca.sfbay.sun.com (sfbaymail2sca.SFBay.Sun.COM [129.145.155.42])
    by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2M0LkkX020647
    for ; Wed, 21 Mar 2007 17:21:46 -0700 (PDT)
Received: from brmea-mail-3.sun.com (brmea-mail-3.Sun.COM [192.18.98.34])
    by sfbaymail2sca.sfbay.sun.com (8.13.6+Sun/8.12.10/ENSMAIL,v2.2) with ESMTP id l2M0LkMw007665
    for ; Wed, 21 Mar 2007 17:21:46 -0700 (PDT)
Received: from fe-amer-01.sun.com ([192.18.108.175])
    by brmea-mail-3.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l2M0Lkjc013101
    for ; Thu, 22 Mar 2007 00:21:46 GMT
Received: from conversion-daemon.mail-amer.sun.com by mail-amer.sun.com
    (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006))
    id <0JFA002013L7KF00@mail-amer.sun.com>
    (original mail from Neil.Perrin@Sun.COM) for PSARC@sac.sfbay.sun.com;
    Wed, 21 Mar 2007 18:21:46 -0600 (MDT)
Received: from [129.147.9.35] by mail-amer.sun.com
    (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006))
    with ESMTPSA id <0JFA00DAX3OAJBD1@mail-amer.sun.com>;
    Wed, 21 Mar 2007 18:21:46 -0600 (MDT)
Date: Wed, 21 Mar 2007 18:21:46 -0600
From: Neil Perrin
Subject: Re: 2007/171 [ZFS Separate Intent Log]
In-reply-to: <200703212130.l2LLU04o004963@ivrel.sfbay.sun.com>
Sender: Neil.Perrin@Sun.COM
To: Glenn Skinner
Cc: PSARC@sac.sfbay.sun.com, maybee@jurassic.eng.sun.com
Reply-to: Neil.Perrin@Sun.COM
Message-id: <4601CC1A.6070501@Sun.COM>
MIME-version: 1.0
Content-type: text/plain; format=flowed; charset=us-ascii
Content-transfer-encoding: 7BIT
X-Accept-Language: en-us, en
References: <200703212130.l2LLU04o004963@ivrel.sfbay.sun.com>
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.7) Gecko/20060120
Status: RO
Content-Length: 1004

Glenn Skinner wrote On 03/21/07 15:30,:
>     Date: Wed, 21 Mar 2007 13:48:28 -0700 (PDT)
>     From: Mark Maybee
>     Subject: ZFS Separate Intent Log [PSARC/2007/171 Timeout:
>     03/28/2007]
>
>     ...
>     The name "log" will become a reserved word. Attempts to create
>     a pool with the name "log" will fail with:
>
>         "cannot create 'log': name is reserved
>          pool name may have been omitted"
>
> What happens to existing pools with this (now unfortunate) name? Does
> the introduction of this feature require an explicit "zpool upgrade" to
> enable it (at which point the upgrade could be made to fail pending
> renaming the offending pool)?
>
>         -- Glenn

Good question. We went around a bit on the name, but an easy admin model
won the day. This case will change the on-disk format, so yes, a zpool
upgrade will be needed, and yes, it should fail if a reserved name is
used for the pool. There is precedent for this from when "spare" was
added.

Thanks: Neil.

From sacadmin Wed Mar 21 17:42:03 2007
Received: from sineb-mail-1.sun.com ([192.18.19.6])
    by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2M0g2m2020866
    for ; Wed, 21 Mar 2007 17:42:03 -0700 (PDT)
Received: from fe-apac-06.sun.com (fe-apac-06.sun.com [192.18.19.177] (may be forged))
    by sineb-mail-1.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l2M0fuCU008053
    for ; Thu, 22 Mar 2007 00:41:57 GMT
Received: from conversion-daemon.mail-apac.sun.com by mail-apac.sun.com
    (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006))
    id <0JFA00B014D38X00@mail-apac.sun.com>
    (original mail from Darren.Reed@Sun.COM) for PSARC@sac.sfbay.sun.com;
    Thu, 22 Mar 2007 08:41:56 +0800 (SGT)
Received: from [129.146.106.55] by mail-apac.sun.com
    (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006))
    with ESMTPSA id <0JFA00LNH4LVIK31@mail-apac.sun.com>;
    Thu, 22 Mar 2007 08:41:56 +0800 (SGT)
Date: Wed, 21 Mar 2007 17:41:12 -0700
From: Darren.Reed@Sun.COM
Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007]
In-reply-to: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com>
Sender: Darren.Reed@Sun.COM
To: Neil.Perrin@Sun.COM
Cc: Mark Maybee , PSARC@sac.sfbay.sun.com
Message-id: <4601D0A8.6060401@Sun.COM>
MIME-version: 1.0
Content-type: text/plain; format=flowed; charset=us-ascii
Content-transfer-encoding: 7BIT
X-Accept-Language: en-us, en
References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com>
User-Agent: Mozilla/5.0 (X11; U; SunOS i86pc; en-US; rv:1.7) Gecko/20060120
Status: RO
Content-Length: 1485

Mark Maybee wrote:

>...
>
>    zpool add log
>        Creates a separate log if it doesn't exist, or
>        adds extra devices if it does.
>
>    zpool remove
>        Remove the log devices. If all log devices are removed
>        we revert to placing the log in the pool. Evacuating a
>        log is easily handled by ensuring all txgs are committed.
>
>    zpool attach
>        Attaches a new log device to an existing log device. If
>        the existing device is not a mirror then a 2 way mirror
>        is created. If device is part of a two-way log mirror,
>        attaching new_device creates a three-way log mirror,
>        and so on.
>
>    zpool detach pool
>        Detaches a log device from a mirror.
>
>

Just a nit on the interface here: why is "add" the only place
where the "log" keyword is present?

From the configuration of the pool, yes, it should be obvious
whether a device is being used for logging or data, but wouldn't
it be clearer (and better for sanity checking of the command
line options being used, both by the programs AND the administrator)
to also require the "log" keyword elsewhere?

How does "zpool scrub" interact with log devices?
Are the log devices included or excluded?

Also, "zpool replace" doesn't appear to be supported for log
devices. Is that likely to change in the future, or is there some
sort of limitation in the architecture for log devices that makes
them ineligible for use with replace?

Darren

From sacadmin Wed Mar 21 17:58:22 2007
Received: from sfbaymail2sca.sfbay.sun.com (sfbaymail2sca.SFBay.Sun.COM [129.145.155.42])
    by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2M0wLss021411
    for ; Wed, 21 Mar 2007 17:58:21 -0700 (PDT)
Received: from brmea-mail-4.sun.com (brmea-mail-4.Sun.COM [192.18.98.36])
    by sfbaymail2sca.sfbay.sun.com (8.13.6+Sun/8.12.10/ENSMAIL,v2.2) with ESMTP id l2M0wLF0021731
    for ; Wed, 21 Mar 2007 17:58:21 -0700 (PDT)
Received: from fe-amer-03.sun.com ([192.18.108.177])
    by brmea-mail-4.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l2M0wLin015647
    for ; Thu, 22 Mar 2007 00:58:21 GMT
Received: from conversion-daemon.mail-amer.sun.com by mail-amer.sun.com
    (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006))
    id <0JFA00L014AUHS00@mail-amer.sun.com>
    (original mail from Neil.Perrin@Sun.COM) for PSARC@sac.sfbay.sun.com;
    Wed, 21 Mar 2007 18:58:21 -0600 (MDT)
Received: from [129.147.9.35] by mail-amer.sun.com
    (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006))
    with ESMTPSA id <0JFA00CJY5D9GIH0@mail-amer.sun.com>;
    Wed, 21 Mar 2007 18:58:21 -0600 (MDT)
Date: Wed, 21 Mar 2007 18:58:20 -0600
From: Neil Perrin
Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007]
In-reply-to: <4601D0A8.6060401@Sun.COM>
Sender: Neil.Perrin@Sun.COM
To: Darren.Reed@Sun.COM
Cc: Mark Maybee , PSARC@sac.sfbay.sun.com
Reply-to: Neil.Perrin@Sun.COM
Message-id: <4601D4AC.2060308@Sun.COM>
MIME-version: 1.0
Content-type: text/plain; format=flowed; charset=us-ascii
Content-transfer-encoding: 7BIT
X-Accept-Language: en-us, en
References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com>
    <4601D0A8.6060401@Sun.COM>
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.7) Gecko/20060120
Status: RO
Content-Length: 2280

Darren.Reed@Sun.COM wrote On 03/21/07 18:41,:
> Mark Maybee wrote:
>
>> ...
>>
>>    zpool add log
>>        Creates a separate log if it doesn't exist, or adds
>>        extra devices if it does.
>>
>>    zpool remove
>>        Remove the log devices. If all log devices are removed
>>        we revert to placing the log in the pool. Evacuating a
>>        log is easily handled by ensuring all txgs are committed.
>>
>>    zpool attach
>>        Attaches a new log device to an existing log device. If
>>        the existing device is not a mirror then a 2 way mirror
>>        is created. If device is part of a two-way log mirror,
>>        attaching new_device creates a three-way log mirror,
>>        and so on.
>>
>>    zpool detach pool
>>        Detaches a log device from a mirror.
>>
>>
>
> Just a nit on the interface here: why is "add" the only place
> where the "log" keyword is present?

It's also on "create".

>
> From the configuration of the pool, yes, it should be obvious
> whether a device is being used for logging or data, but wouldn't
> it be clearer (and better for sanity checking of the command
> line options being used, both by the programs AND the administrator)
> to also require the "log" keyword elsewhere?

The scheme follows the method used for adding/removing spares.
The "log" specifier is not needed in other cases, as internally we
can determine the device's type. I do see what you mean about the
lack of symmetry though.

>
> How does "zpool scrub" interact with log devices?

Scrubbing is not done to the log. The log holds future committed
transactions for the pool. Each log block is ephemeral - lasting only
seconds (a few txgs).

> Are the log devices included or excluded?

They are excluded.

>
> Also, "zpool replace" doesn't appear to be supported for log
> devices. Is that likely to change in the future, or is there some
> sort of limitation in the architecture for log devices that makes
> them ineligible for use with replace?

Er, well caught, thanks. Somehow this got dropped.
This should be relatively easy to implement and will be supported.

>
> Darren
>

Thanks for your feedback: Neil.

From sacadmin Thu Mar 22 03:39:58 2007
Received: from bebop.France.Sun.COM (bebop.France.Sun.COM [129.157.174.15])
    by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2MAdvF2000982
    for ; Thu, 22 Mar 2007 03:39:58 -0700 (PDT)
Received: from corn.Sun.COM (corn [129.157.192.240])
    by bebop.France.Sun.COM (8.13.8+Sun/8.13.8) with SMTP id l2MAdpfJ018335;
    Thu, 22 Mar 2007 11:39:51 +0100 (MET)
Message-ID: <17922.23686.964792.928122@gargle.gargle.HOWL>
Date: Thu, 22 Mar 2007 11:37:58 +0100
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
From: Roch - PAE
To: Mark Maybee
Cc: PSARC@sac.sfbay.sun.com, Neil.Perrin@Sun.Com
Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007]
In-Reply-To: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com>
References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com>
X-Mailer: VM 7.19 under 21.1 (patch 3) "Acadia" XEmacs Lucid
Organization: SUN Microsystems
Phone: (+33).4.76.18.83.20 (x[70]38320)
Status: RO
Content-Length: 8152


The ZFS Intent log (zil) is currently protected to the level of
the weakest vdev. A pool with mirrored vdevs has mirrored ZILs.
Should we not preserve this characteristic and prevent single
device logs associated with mirrored or raidz pools?

For instance, we should raise an alarm upon this scenario:

    # zpool create pool mirror c0d0 c1d0 log c2d0

-r

Mark Maybee writes:
 > Subject: PSARC FastTrack [03/28/2007]: ZFS Separate Intent Log
 > [...]

From sacadmin Thu Mar 22 04:08:59 2007
Received: from bebop.France.Sun.COM (bebop.France.Sun.COM [129.157.174.15])
    by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2MB8wOR001738
    for ; Thu, 22 Mar 2007 04:08:58 -0700 (PDT)
Received: from corn.Sun.COM (corn [129.157.192.240])
    by bebop.France.Sun.COM (8.13.8+Sun/8.13.8) with SMTP id l2MB8tWI018617;
    Thu, 22 Mar 2007 12:08:55 +0100 (MET)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <17922.25431.158509.54336@gargle.gargle.HOWL>
Date: Thu, 22 Mar 2007 12:07:03 +0100
From: Roch - PAE
To: Mark Maybee
Cc: PSARC@sac.sfbay.sun.com, Neil.Perrin@Sun.Com
Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007]
In-Reply-To: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com>
References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com>
X-Mailer: VM 7.19 under 21.1 (patch 3) "Acadia" XEmacs Lucid
Organization: SUN Microsystems
Phone: (+33).4.76.18.83.20 (x[70]38320)
Status: RO
Content-Length: 8463


This case establishes a set of devices that are to be used
exclusively for storing ZFS intent logs.

Since ZILs are per-filesystem structures, the current
proposal is to have each filesystem use the new log devices
for its ZIL.

There might be a future need to control, per individual ZFS
filesystem, whether or not the dedicated device is used. For
instance, a DB's redo logs would be a real target for this
extension; however, in the interest of preserving the fast
log, the main DB space might want to avoid committing to it.

So we might just open up the wording and dissociate the
existence of log-only devices from their implied use by every
filesystem's log.

-r

Mark Maybee writes:
 > Subject: PSARC FastTrack [03/28/2007]: ZFS Separate Intent Log
 >
 >
 > Template Version: @(#)sac_nextcase %I% %G% SMI
 > This information Copyright 2007 Sun Microsystems
 > 1. Introduction
 >     1.1. Project/Component Working Name:
 >          ZFS Separate Intent Log
 >     1.2.
Name of Document Author/Supplier: > Author: Neil Perrin > 1.3 Date of This Document: > 21 March, 2007 > 4. Technical Description > > This case adds extensions to several existing zpool commands to allow > separate log devices to be created and manipulated. It also extends > the output of some commands to include log device status. The stability > of these changes is committed, and the release binding is patch/micro. > > SUMMARY: > > This is a proposal to allow separate devices to be used > for the ZFS Intent Log (ZIL). The sole purpose of this is > performance. The devices can be disks, solid state drives, > nvram drives, or any device that presents a block interface. > > PROBLEM: > > The ZIL satisfies the synchronous requirements of POSIX. > For instance, databases often require their > transactions to be on stable storage on return from the system > call. NFS and other applications can also use fsync() to ensure > data stability. The speed of the ZIL is therefore essential in > determining the latency of writes for these critical applications. > > Currently the ZIL is allocated dynamically from the pool. > It consists of a chain of varying block sizes which are > anchored in fixed objects. Blocks are sized to fit the > demand and will come from different metaslabs and thus > different areas of the disk. This causes more head movement. > > Furthermore, the log blocks are freed as soon as the intent > log transaction (system call) is committed. So a swiss cheesing > effect can occur leading to pool fragmentation. > > PROPOSED SOLUTION: > > This proposal takes advantage of the greatly faster media speeds > of nvram, solid state disks, or even dedicated disks. > To this end, additional extensions to the zpool command > are defined: > > zpool create log > Creates a pool with a separate log. If more than one > log device is specified then writes are load-balanced > between devices. It's also possible to mirror log > devices. 
For example a log consisting of > two sets of two mirrors could be created thus: > > zpool create \ > log mirror c1t8d0 c1t9d0 mirror c1t10d0 c1t11d0 > > A raidz/raidz2 log is not supported, nor is placing logs > on files. > > zpool add log > Creates a separate log if it doesn't exist, or > adds extra devices if it does. > > zpool remove > Remove the log devices. If all log devices are removed > we revert to placing the log in the pool. Evacuating a > log is easily handled by ensuring all txgs are committed. > > zpool attach > Attaches a new log device to an existing log device. If > the existing device is not a mirror then a 2 way mirror > is created. If device is part of a two-way log mirror, > attaching new_device creates a three-way log mirror, > and so on. > > zpool detach pool > Detaches a log device from a mirror. > > zpool status > Additionally displays the log devices > > zpool iostat > Additionally shows IO statistics for log devices. > > zpool export/import > Will export and import the log devices. > > When a separate log that is not mirrored fails then > logging will start using chained logs within the main pool. > > The name "log" will become a reserved word. Attempts to create > a pool with the name "log" will fail with: > > "cannot create 'log': name is reserved > pool name may have been omitted" > > Hot spares cannot replace log devices. > > BINDING: > > A micro/patch binding is requested. > > MAN PAGE CHANGES: > > *** zpool.ori Fri Mar 16 10:05:31 2007 > --- zpool.new Sat Mar 17 00:50:24 2007 > *************** > *** 169,174 **** > --- 169,180 ---- > hot spares for a pool. For more information, see the > "Hot Spares" section. > > + log A separate intent log device. If more than one > + log device is specified then writes are load-balanced > + between devices. Log devices can be also be mirrored. > + However, neither raidz/raidz2 nor files are supported > + for the intent log. For more information, see the > + "Intent Log" section. 
> > > > *************** > *** 286,292 **** > --- 292,301 ---- > pools. > > > + Spares cannot replace log devices. > > + > + > Alternate Root Pools > The "zpool create -R" and "zpool import -R" commands allow > users to create and import a pool with a different root > *************** > *** 313,318 **** > --- 322,346 ---- > > > > + Intent Log > + The ZFS Intent Log satisfies the synchronous requirements of POSIX. > + For instance, databases often require their transactions > + to be on stable storage on return from the system call. > + NFS and other applications can also use fsync() to ensure > + data stability. By default, the intent log is allocated from > + blocks within the pool, however for greater performance, separate > + intent log device(s) can be specified. For example, > + > + # zpool create pool c0d0 c1d0 log c2d0 > + > + Multiple log devices can also be specified, and they can be > + mirrored. See the EXAMPLES section later for an example of this. > + > + Log devices can be added with the "zpool add" command and > + removed with the "zpool remove" command. > + > + > + > Subcommands > All subcommands that modify state are logged persistently to > the pool in their original form. > *************** > *** 355,362 **** > ices specified on the command line. The pool name must > begin with a letter, and can only contain alphanumeric > characters as well as underscore ("_"), dash ("-"), and > ! period ("."). The pool names "mirror", "raidz", and > ! "spare" are reserved, as are names beginning with the > pattern "c[0-9]". The vdev specification is described in > the "Virtual Devices" section. > > --- 383,390 ---- > ices specified on the command line. The pool name must > begin with a letter, and can only contain alphanumeric > characters as well as underscore ("_"), dash ("-"), and > ! period ("."). The pool names "mirror", "raidz", > ! "spare", and "log" are reserved, as are names beginning with the > pattern "c[0-9]". 
The vdev specification is described in > the "Virtual Devices" section. > > *************** > *** 1283,1288 **** > --- 1311,1324 ---- > # zpool remove tank c0t2d0 > > > + Example 11 Creating a pool with mirrored separate intent logs. > + > + The following command creates a pool with a separate intent > + log consisting of two sets of two mirrors: > + > + # zpool create pool c0d0 c1d0 log mirror c2d0 c3d0 mirror c4d0 c5d0 > + > + > > EXIT STATUS > The following exit values are returned: > > 6. Resources and Schedule > 6.4. Steering Committee requested information > 6.4.1. Consolidation C-team Name: > ON > 6.5. ARC review type: FastTrack From sacadmin Thu Mar 22 07:02:38 2007 Received: from sfbaymail1sca.SFBay.Sun.COM (sfbaymail1sca.SFBay.Sun.COM [129.145.154.35]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2ME2cYW004559 for ; Thu, 22 Mar 2007 07:02:38 -0700 (PDT) Received: from gmp-ea-fw-1.sun.com (gmpes-gis-mail-1.UK.Sun.COM [129.156.42.5]) by sfbaymail1sca.SFBay.Sun.COM (8.13.6+Sun/8.13.6/ENSMAIL,v2.2) with ESMTP id l2ME2a3N016982 for ; Thu, 22 Mar 2007 07:02:37 -0700 (PDT) Received: from d1-emea-09.sun.com (d1-emea-09.sun.com [192.18.2.119]) by gmp-ea-fw-1.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l2ME2Vr9014581 for ; Thu, 22 Mar 2007 14:02:31 GMT Received: from conversion-daemon.d1-emea-09.sun.com by d1-emea-09.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) id <0JFB009015AVWS00@d1-emea-09.sun.com> (original mail from Darren.Moffat@Sun.COM) for PSARC@sac.sfbay.sun.com; Thu, 22 Mar 2007 14:02:31 +0000 (GMT) Received: from [192.168.73.101] (nessieroo.force9.co.uk [81.174.224.49]) by d1-emea-09.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPSA id <0JFB00C2W5NWDT30@d1-emea-09.sun.com>; Thu, 22 Mar 2007 14:02:21 +0000 (GMT) Date: Thu, 22 Mar 2007 14:02:20 +0000 From: Darren J Moffat Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007] In-reply-to: 
<200703212048.l2LKmSXp388561@jurassic.eng.sun.com> Sender: Darren.Moffat@Sun.COM To: Mark Maybee Cc: PSARC@sac.sfbay.sun.com, Neil.Perrin@Sun.COM Message-id: <46028C6C.7020706@Sun.COM> MIME-version: 1.0 Content-type: text/plain; format=flowed; charset=ISO-8859-1 Content-transfer-encoding: 7BIT References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com> User-Agent: Thunderbird 1.5.0.8 (X11/20061127) Status: RO Content-Length: 214 How big a disk is needed to hold the ZIL ? Is it a fixed size of some dynamic size based on pool size ? What happens if the log disk fills up ? What is the minimum size disk for the log disk ? -- Darren J Moffat From sacadmin Thu Mar 22 07:33:22 2007 Received: from bebop.France.Sun.COM (bebop.France.Sun.COM [129.157.174.15]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2MEXLbs005032 for ; Thu, 22 Mar 2007 07:33:21 -0700 (PDT) Received: from corn.Sun.COM (corn [129.157.192.240]) by bebop.France.Sun.COM (8.13.8+Sun/8.13.8) with SMTP id l2MEXINC022847; Thu, 22 Mar 2007 15:33:18 +0100 (MET) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <17922.37694.378357.200649@gargle.gargle.HOWL> Date: Thu, 22 Mar 2007 15:31:26 +0100 From: Roch - PAE To: Darren J Moffat Cc: Mark Maybee , PSARC@sac.sfbay.sun.com, Neil.Perrin@Sun.Com Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007] In-Reply-To: <46028C6C.7020706@Sun.COM> References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com> <46028C6C.7020706@Sun.COM> X-Mailer: VM 7.19 under 21.1 (patch 3) "Acadia" XEmacs Lucid Organization: SUN Microsystems Phone: (+33).4.76.18.83.20 (x[70]38320) Status: RO Content-Length: 552 Darren J Moffat writes: > How big a disk is needed to hold the ZIL ? > Is it a fixed size of some dynamic size based on pool size ? > What happens if the log disk fills up ? The ZIL maintains state for 5 seconds between storage pool sync. 
So we can either estimate this to be "5sec * #pool disk * disk throughput" or "5sec * data channel capacity" Implementation wise, if the log device is half filled, I assume we would cut a transaction at that point. > > What is the minimum size disk for the log disk ? > > -- > Darren J Moffat From sacadmin Thu Mar 22 08:33:46 2007 Received: from localhost.east.sun.com (punchin-sommerfeld.East.Sun.COM [129.148.19.3]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2MFXjZf006716 for ; Thu, 22 Mar 2007 08:33:45 -0700 (PDT) Received: from localhost.east.sun.com (localhost [127.0.0.1]) by localhost.east.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l2MFXIWR008172; Thu, 22 Mar 2007 15:33:18 GMT Received: (from sommerfeld@localhost) by localhost.east.sun.com (8.13.8+Sun/8.13.8/Submit) id l2MFXHS8008171; Thu, 22 Mar 2007 11:33:17 -0400 (EDT) X-Authentication-Warning: localhost.east.sun.com: sommerfeld set sender to sommerfeld@sun.com using -f Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007] From: Bill Sommerfeld To: Mark Maybee Cc: PSARC@sac.sfbay.sun.com, Neil.Perrin@sun.com In-Reply-To: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com> References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com> Content-Type: text/plain Content-Transfer-Encoding: 7bit Date: Thu, 22 Mar 2007 11:33:17 -0400 Message-Id: <1174577597.7927.45.camel@localhost> Mime-Version: 1.0 X-Mailer: Evolution 2.6.2 Status: RO Content-Length: 1686 As I understand it, loss of the information in the intent log means that the last few seconds of changes to a pool have been lost, but the pool is otherwise intact. What happens if dedicated intent log devices are missing or unreadable when zfs needs to read them? in at least some cases, the intent log devices are going to be different (nvram vs. 
disk, possibly in different enclosures, or reachable via
different paths) and as a result a system might temporarily or
permanently lose access to all log devices while continuing to be able
to reach the main pool devices.

my expectation would be that:
1) all I/O to the pool (or the part of the pool covered by the intent
log) would fail (writes might destroy things; reads might return stale
data) until the missing intent log devices surfaced (they might only be
temporarily unreachable).

2) there would be some way to tell zfs that the intent log contents
were gone forever but we can cope with falling back to the state of the
world at the time of the last committed transaction group.

rationale: the intent log is there for a reason; if you expect to find
it and it's not there it's better to prevent any further damage, *but*
if it is actually unrecoverable, the pool minus the last N seconds of
changes may still be more current than your most recent offline backups,
and forcing you to blow the entire pool away when all that was
unrecoverable was the intent log could lead to psychotic reactions in
the sysadmins responsible for putting things back together..

(I'm not going to hold up this case waiting for a spec for part (2).
I'm more interested in hearing what this case proposes to deliver..)

- Bill From sacadmin Thu Mar 22 11:42:42 2007 Received: from sineb-mail-1.sun.com ([192.18.19.6]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2MIgfmX012175 for ; Thu, 22 Mar 2007 11:42:42 -0700 (PDT) Received: from fe-apac-02.sun.com (fe-apac-02.sun.com [192.18.19.173] (may be forged)) by sineb-mail-1.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l2MIgarH002746 for ; Thu, 22 Mar 2007 18:42:36 GMT Received: from conversion-daemon.mail-apac.sun.com by mail-apac.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) id <0JFB00201GF7OJ00@mail-apac.sun.com> (original mail from Darren.Reed@Sun.COM) for PSARC@sac.sfbay.sun.com; Fri, 23 Mar 2007 02:42:35 +0800 (SGT) Received: from [129.146.106.55] by mail-apac.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPSA id <0JFB00HARIMY6HI1@mail-apac.sun.com>; Fri, 23 Mar 2007 02:42:35 +0800 (SGT) Date: Thu, 22 Mar 2007 11:41:49 -0700 From: Darren.Reed@Sun.COM Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007] In-reply-to: <4601D4AC.2060308@Sun.COM> Sender: Darren.Reed@Sun.COM To: Neil.Perrin@Sun.COM Cc: Mark Maybee , PSARC@sac.sfbay.sun.com Message-id: <4602CDED.9060100@Sun.COM> MIME-version: 1.0 Content-type: text/plain; format=flowed; charset=ISO-8859-1 Content-transfer-encoding: 7BIT X-Accept-Language: en-us, en References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com> <4601D0A8.6060401@Sun.COM> <4601D4AC.2060308@Sun.COM> User-Agent: Mozilla/5.0 (X11; U; SunOS i86pc; en-US; rv:1.7) Gecko/20060120 Status: RO Content-Length: 1626 Neil Perrin wrote: > > > Darren.Reed@Sun.COM wrote On 03/21/07 18:41,: > ... > >> >> From the configuration of the pool, yes, it should be obvious >> whether a device is being used for logging or data, but wouldn't >> it be more clear (and better for sanity checking of the command >> line options being used, both by the programs AND the administrator) >> to also require the "log" keyword elsewhere? 
>
>
> The scheme follows the method used for adding/removing spares.
> The "log" specifier is not needed in other cases as internally we can
> determine
> its type, so is not required. I do see what you mean about lack of
> symmetry though.

I give you two different commands:

zpool add mypool log /dev/dsk/c1t1d0s6
zpool remove mypool /dev/dsk/c2t3d0s6

With respect to reviewing command history or even just in scripts
or pasted into logs, there is a distinct difference in the amount of
information carried in each of the above commands.

If I replace the latter with:

zpool remove mypool spare /dev/dsk/c2t3d0s6

or even:

zpool remove mypool log /dev/dsk/c2t3d0s6

any ambiguity that may have existed is removed. For a human to
otherwise understand what the 2nd command was intended to do
requires knowledge of what's in the pool at that point in time. In
general, shell history files and scripts don't contain that information.

Having the extra keyword also protects you from finger trouble,
for example, if "s6" is the log device but "s7" is configured as a
spare, then accidentally typing "zpool remove ..s7" might have
unintended consequences and not even be noticed until some
later point in time.

Darren

From sacadmin Thu Mar 22 14:03:57 2007
Received: from sfbaymail2sca.sfbay.sun.com (sfbaymail2sca.SFBay.Sun.COM [129.145.155.42]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2ML3uWT016806 for ; Thu, 22 Mar 2007 14:03:56 -0700 (PDT)
Received: from brmea-mail-1.sun.com (brmea-mail-1.Sun.COM [192.18.98.31]) by sfbaymail2sca.sfbay.sun.com (8.13.6+Sun/8.12.10/ENSMAIL,v2.2) with ESMTP id l2ML3uBi001690 for ; Thu, 22 Mar 2007 14:03:56 -0700 (PDT)
Received: from fe-amer-04.sun.com ([192.18.108.178]) by brmea-mail-1.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l2ML3u5a014886 for ; Thu, 22 Mar 2007 21:03:56 GMT
Received: from conversion-daemon.mail-amer.sun.com by mail-amer.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) id <0JFB00H01P0GFP00@mail-amer.sun.com> (original mail from Neil.Perrin@Sun.COM) for PSARC@sac.sfbay.sun.com; Thu, 22 Mar 2007 15:03:56 -0600 (MDT)
Received: from [129.147.9.35] by mail-amer.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPSA id <0JFB008UKP6IYRS4@mail-amer.sun.com>; Thu, 22 Mar 2007 15:03:54 -0600 (MDT)
Date: Thu, 22 Mar 2007 15:03:54 -0600
From: Neil Perrin
Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007]
In-reply-to: <17922.23686.964792.928122@gargle.gargle.HOWL>
Sender: Neil.Perrin@Sun.COM
To: Roch - PAE
Cc: Mark Maybee , PSARC@sac.sfbay.sun.com
Reply-to: Neil.Perrin@Sun.COM
Message-id: <4602EF3A.1000309@Sun.COM>
MIME-version: 1.0
Content-type: text/plain; format=flowed; charset=us-ascii
Content-transfer-encoding: 7BIT
X-Accept-Language: en-us, en
References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com> <17922.23686.964792.928122@gargle.gargle.HOWL>
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.7) Gecko/20060120
Status: RO
Content-Length: 1332

Good point. I don't think we should disallow such configurations.
It may be that the log device is much more reliable.
However, I think we should issue a warning and fail.
The admin can override this using the force flag. this is akin to:

: mull ; zpool create whirl c1t10d0 mirror c1t8d0 c1t9d0
invalid vdev specification
use '-f' to override the following errors:
mismatched replication level: both disk and mirror vdevs are present
: mull ; zpool create -f whirl c1t10d0 mirror c1t8d0 c1t9d0
: mull ; zpool status
  pool: whirl
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        whirl       ONLINE       0     0     0
          c1t10d0   ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t8d0  ONLINE       0     0     0
            c1t9d0  ONLINE       0     0     0

errors: No known data errors
: mull ;

Roch - PAE wrote On 03/22/07 04:37,:
> The ZFS Intent log (zil) is currently protected to the level
> of the weakest vdev. A pool with mirrored vdevs has mirrored
> ZILs. Should we not preserve this characteristic and prevent
> single device logs associated with mirrored or raidz pools.
> For instance we should raise an alarm upon this scenario :
>
>    # zpool create pool mirror c0d0 c1d0 log c2d0
>
> -r

From sacadmin Thu Mar 22 14:56:04 2007
Received: from sfbaymail2sca.sfbay.sun.com (sfbaymail2sca.SFBay.Sun.COM [129.145.155.42]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2MLu3TS019576 for ; Thu, 22 Mar 2007 14:56:03 -0700 (PDT)
Received: from brmea-mail-3.sun.com (brmea-mail-3.Sun.COM [192.18.98.34]) by sfbaymail2sca.sfbay.sun.com (8.13.6+Sun/8.12.10/ENSMAIL,v2.2) with ESMTP id l2MLu3GB026533 for ; Thu, 22 Mar 2007 14:56:03 -0700 (PDT)
Received: from fe-amer-04.sun.com ([192.18.108.178]) by brmea-mail-3.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l2MLu3WC006426 for ; Thu, 22 Mar 2007 21:56:03 GMT
Received: from conversion-daemon.mail-amer.sun.com by mail-amer.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) id <0JFB00801RGDKZ00@mail-amer.sun.com> (original mail from Neil.Perrin@Sun.COM) for PSARC@sac.sfbay.sun.com; Thu, 22 Mar 2007 15:56:03 -0600 (MDT)
Received: from [129.147.9.35] by mail-amer.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPSA
id <0JFB008SXRLEZAU4@mail-amer.sun.com>; Thu, 22 Mar 2007 15:56:03 -0600 (MDT)
Date: Thu, 22 Mar 2007 15:56:02 -0600
From: Neil Perrin
Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007]
In-reply-to: <46028C6C.7020706@Sun.COM>
Sender: Neil.Perrin@Sun.COM
To: Darren J Moffat
Cc: Mark Maybee , PSARC@sac.sfbay.sun.com
Reply-to: Neil.Perrin@Sun.COM
Message-id: <4602FB72.4090902@Sun.COM>
MIME-version: 1.0
Content-type: text/plain; format=flowed; charset=us-ascii
Content-transfer-encoding: 7BIT
X-Accept-Language: en-us, en
References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com> <46028C6C.7020706@Sun.COM>
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.7) Gecko/20060120
Status: RO
Content-Length: 767

Darren J Moffat wrote On 03/22/07 08:02,:
> How big a disk is needed to hold the ZIL ?

It depends on system synchronous activity (i.e., how much data and
metadata is being fsynced or flushed due to O_DSYNC).

> Is it a fixed size of some dynamic size based on pool size ?

Neither. It is dynamic, but based on system synchronous activity.

> What happens if the log disk fills up ?

If the log device becomes full we have the option of waiting for
the oldest txg to sync (hopefully rare) or reverting to using the
main pool for the log. I'm now leaning towards the latter, as it
turns out to be fairly easy to implement.

>
> What is the minimum size disk for the log disk ?

This should be the same as the minimum zpool device size of 64MB.

> > -- > Darren J Moffat From sacadmin Thu Mar 22 15:45:57 2007 Received: from sfbaymail1sca.SFBay.Sun.COM (sfbaymail1sca.SFBay.Sun.COM [129.145.154.35]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2MMjvhM020853 for ; Thu, 22 Mar 2007 15:45:57 -0700 (PDT) Received: from brmea-mail-2.sun.com (brmea-mail-2.Sun.COM [192.18.98.43]) by sfbaymail1sca.SFBay.Sun.COM (8.13.6+Sun/8.13.6/ENSMAIL,v2.2) with ESMTP id l2MMjuEr027876 for ; Thu, 22 Mar 2007 15:45:56 -0700 (PDT) Received: from fe-amer-04.sun.com ([192.18.108.178]) by brmea-mail-2.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l2MMjuWJ012795 for ; Thu, 22 Mar 2007 22:45:56 GMT Received: from conversion-daemon.mail-amer.sun.com by mail-amer.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) id <0JFB00G01TRCH300@mail-amer.sun.com> (original mail from Neil.Perrin@Sun.COM) for PSARC@sac.sfbay.sun.com; Thu, 22 Mar 2007 16:45:56 -0600 (MDT) Received: from [129.147.9.35] by mail-amer.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPSA id <0JFB0089BTWKZEQ4@mail-amer.sun.com>; Thu, 22 Mar 2007 16:45:56 -0600 (MDT) Date: Thu, 22 Mar 2007 16:45:56 -0600 From: Neil Perrin Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007] In-reply-to: <1174577597.7927.45.camel@localhost> Sender: Neil.Perrin@Sun.COM To: Bill Sommerfeld Cc: Mark Maybee , PSARC@sac.sfbay.sun.com Reply-to: Neil.Perrin@Sun.COM Message-id: <46030724.3070106@Sun.COM> MIME-version: 1.0 Content-type: text/plain; format=flowed; charset=us-ascii Content-transfer-encoding: 7BIT X-Accept-Language: en-us, en References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com> <1174577597.7927.45.camel@localhost> User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.7) Gecko/20060120 Status: RO Content-Length: 2279 Bill Sommerfeld wrote On 03/22/07 09:33,: > As I understand it, loss of the information in the intent log means that > the last few seconds of changes to a pool have been lost, but the 
pool
> is otherwise intact.

True.

>
> What happens if dedicated intent log devices are missing or unreadable
> when zfs needs to read them?
>
> in at least some cases, the intent log devices are going to be different
> (nvram vs. disk, possibly in different enclosures, or reachable via
> different paths) and as a result a system might temporarily or
> permanently lose access to all log devices while continuing to be able
> to reach the main pool devices.
>
> my expectation would be that:
> 1) all I/O to the pool (or the part of the pool covered by the intent
> log) would fail (writes might destroy things; reads might return stale
> data) until the missing intent log devices surfaced (they might only be
> temporarily unreachable).

Yes, the pool will go into the faulted state and remain there
until the log is readable.

>
> 2) there would be some way to tell zfs that the intent log contents
> were gone forever but we can cope with falling back to the state of the
> world at the time of the last committed transaction group.
>
> rationale: the intent log is there for a reason; if you expect to find
> it and it's not there it's better to prevent any further damage, *but*
> if it is actually unrecoverable, the pool minus the last N seconds of
> changes may still be more current than your most recent offline backups,
> and forcing you to blow the entire pool away when all that was
> unrecoverable was the intent log could lead to psychotic reactions in
> the sysadmins responsible for putting things back together..

I agree it should be possible to force ignore a missing/unreadable log
device (that isn't mirrored). No mechanism exists for this currently,
but one is obviously needed. As you suspected I do not want to make that
part of this case. I'm on a very tight schedule to provide this functionality
to fishworks. Anyway, nothing specified in this PSARC case prevents such
a future enhancement.

>
> (I'm not going to hold up this case waiting for a spec for part (2).
> I'm more interested in hearing what this case proposes to deliver..) > > - Bill From sacadmin Thu Mar 22 15:53:32 2007 Received: from sfbaymail2sca.sfbay.sun.com (sfbaymail2sca.SFBay.Sun.COM [129.145.155.42]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2MMrWWq020900 for ; Thu, 22 Mar 2007 15:53:32 -0700 (PDT) Received: from brmea-mail-1.sun.com (brmea-mail-1.Sun.COM [192.18.98.31]) by sfbaymail2sca.sfbay.sun.com (8.13.6+Sun/8.12.10/ENSMAIL,v2.2) with ESMTP id l2MMrWah025109 for ; Thu, 22 Mar 2007 15:53:32 -0700 (PDT) Received: from fe-amer-04.sun.com ([192.18.108.178]) by brmea-mail-1.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l2MMrWUk007712 for ; Thu, 22 Mar 2007 22:53:32 GMT Received: from conversion-daemon.mail-amer.sun.com by mail-amer.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) id <0JFB00L01U6RVC00@mail-amer.sun.com> (original mail from Neil.Perrin@Sun.COM) for PSARC@sac.sfbay.sun.com; Thu, 22 Mar 2007 16:53:32 -0600 (MDT) Received: from [129.147.9.35] by mail-amer.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPSA id <0JFB008KVU97ZEQ4@mail-amer.sun.com>; Thu, 22 Mar 2007 16:53:31 -0600 (MDT) Date: Thu, 22 Mar 2007 16:53:31 -0600 From: Neil Perrin Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007] In-reply-to: <4602CDED.9060100@Sun.COM> Sender: Neil.Perrin@Sun.COM To: Darren.Reed@Sun.COM Cc: Mark Maybee , PSARC@sac.sfbay.sun.com Reply-to: Neil.Perrin@Sun.COM Message-id: <460308EB.20707@Sun.COM> MIME-version: 1.0 Content-type: text/plain; format=flowed; charset=us-ascii Content-transfer-encoding: 7BIT X-Accept-Language: en-us, en References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com> <4601D0A8.6060401@Sun.COM> <4601D4AC.2060308@Sun.COM> <4602CDED.9060100@Sun.COM> User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.7) Gecko/20060120 Status: RO Content-Length: 2627 I agree there could be some confusion, but it is possible to see the current configuration 
with "zpool status " and a history with "zpool history". For example:

: mull ; zpool status
  pool: whirl
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        whirl       ONLINE       0     0     0
          c1t10d0   ONLINE       0     0     0
        logs
          c1t8d0    ONLINE       0     0     0
          c1t9d0    ONLINE       0     0     0
        spares
          c1t11d0   AVAIL

errors: No known data errors
: mull ; zpool history
History for 'whirl':
2007-03-22.16:48:36 zpool create whirl c1t10d0 log c1t8d0
2007-03-22.16:49:32 zpool add whirl log c1t9d0
2007-03-22.16:51:15 zpool add whirl spare c1t11d0
: mull ;

I think it's better to fit in with the current scheme used by spares.

Neil.

Darren.Reed@Sun.COM wrote On 03/22/07 12:41,:
> Neil Perrin wrote:
>
>>
>>
>> Darren.Reed@Sun.COM wrote On 03/21/07 18:41,:
>> ...
>>
>>>
>>> From the configuration of the pool, yes, it should be obvious
>>> whether a device is being used for logging or data, but wouldn't
>>> it be more clear (and better for sanity checking of the command
>>> line options being used, both by the programs AND the administrator)
>>> to also require the "log" keyword elsewhere?
>>
>>
>>
>> The scheme follows the method used for adding/removing spares.
>> The "log" specifier is not needed in other cases as internally we can
>> determine
>> its type, so is not required. I do see what you mean about lack of
>> symmetry though.
>
>
>
> I give you two different commands:
>
> zpool add mypool log /dev/dsk/c1t1d0s6
> zpool remove mypool /dev/dsk/c2t3d0s6
>
> With respect to reviewing command history or even just in scripts
> or pasted into logs, there is a distinct difference in the amount of
> information carried in each of the above commands.
>
> If I replace the latter with:
>
> zpool remove mypool spare /dev/dsk/c2t3d0s6
>
> or even:
>
> zpool remove mypool log /dev/dsk/c2t3d0s6
>
> any ambiguity that may have existed is removed. For a human to
> otherwise understand what the 2nd command was intended to do
> requires knowledge of what's in the pool at that point in time.
In > general, shell history files and scripts don't contain that information. > > Having the extra keyword also protects you from finger trouble, > for example, if "s6" is the log device but "s7" is configured as a > spare, then accidentally typing "zpool remove ..s7" might have > unintended consequences and not even be noticed until some > later point in time. > > Darren > From sacadmin Thu Mar 22 16:14:27 2007 Received: from eastmail2bur.East.Sun.COM (eastmail2bur.East.Sun.COM [129.148.13.40]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2MNERTL021545 for ; Thu, 22 Mar 2007 16:14:27 -0700 (PDT) Received: from thunk.east.sun.com (thunk.East.Sun.COM [129.148.174.66]) by eastmail2bur.East.Sun.COM (8.13.6+Sun/8.13.6/ENSMAIL,v2.2) with ESMTP id l2MNEJUa006638; Thu, 22 Mar 2007 19:14:19 -0400 (EDT) Received: from [IPv6:::1] (localhost [IPv6:::1]) by thunk.east.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l2MNEJ8M000021; Thu, 22 Mar 2007 19:14:19 -0400 (EDT) Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007] From: Bill Sommerfeld To: Neil.Perrin@sun.com Cc: Mark Maybee , PSARC@sac.sfbay.sun.com In-Reply-To: <46030724.3070106@Sun.COM> References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com> <1174577597.7927.45.camel@localhost> <46030724.3070106@Sun.COM> Content-Type: text/plain Date: Thu, 22 Mar 2007 19:14:18 -0400 Message-Id: <1174605258.29799.16.camel@thunk> Mime-Version: 1.0 X-Mailer: Evolution 2.8.1.1 Content-Transfer-Encoding: 7bit Status: RO Content-Length: 521 On Thu, 2007-03-22 at 16:45 -0600, Neil Perrin wrote: > I agree it should be possible to force ignore a missing/unreadable log > device (that isn't mirrored). No mechanism exists for this currently, > but is obviously needed. As you suspected I do not want to make that > part of this case. I'm on a very tight schedule to provide this functionality > to fishworks. Anyway, nothing specified in this PSARC case prevents such > a future enhancement.
Can you file an RFE so this doesn't get lost? Thanks. - Bill From sacadmin Thu Mar 22 16:18:29 2007 Received: from sfbaymail2sca.sfbay.sun.com (sfbaymail2sca.SFBay.Sun.COM [129.145.155.42]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2MNIT2O021601 for ; Thu, 22 Mar 2007 16:18:29 -0700 (PDT) Received: from brmea-mail-2.sun.com (brmea-mail-2.Sun.COM [192.18.98.43]) by sfbaymail2sca.sfbay.sun.com (8.13.6+Sun/8.12.10/ENSMAIL,v2.2) with ESMTP id l2MNITn0008864 for ; Thu, 22 Mar 2007 16:18:29 -0700 (PDT) Received: from fe-amer-05.sun.com ([192.18.108.179]) by brmea-mail-2.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l2MNITdV025496 for ; Thu, 22 Mar 2007 23:18:29 GMT Received: from conversion-daemon.mail-amer.sun.com by mail-amer.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) id <0JFB00B01UZWVA00@mail-amer.sun.com> (original mail from Neil.Perrin@Sun.COM) for PSARC@sac.sfbay.sun.com; Thu, 22 Mar 2007 17:18:29 -0600 (MDT) Received: from [129.147.9.35] by mail-amer.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPSA id <0JFB00KGBVESQ5U0@mail-amer.sun.com>; Thu, 22 Mar 2007 17:18:29 -0600 (MDT) Date: Thu, 22 Mar 2007 17:18:28 -0600 From: Neil Perrin Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007] In-reply-to: <1174605258.29799.16.camel@thunk> Sender: Neil.Perrin@Sun.COM To: Bill Sommerfeld Cc: Mark Maybee , PSARC@sac.sfbay.sun.com Reply-to: Neil.Perrin@Sun.COM Message-id: <46030EC4.7000809@Sun.COM> MIME-version: 1.0 Content-type: text/plain; format=flowed; charset=us-ascii Content-transfer-encoding: 7BIT X-Accept-Language: en-us, en References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com> <1174577597.7927.45.camel@localhost> <46030724.3070106@Sun.COM> <1174605258.29799.16.camel@thunk> User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.7) Gecko/20060120 Status: RO Content-Length: 630 Bill Sommerfeld wrote On 03/22/07 17:14,: > On Thu, 2007-03-22 at 16:45 -0600, Neil Perrin 
wrote: > >>I agree it should be possible to force ignore a missing/unreadable log >>device (that isn't mirrored). No mechanism exists for this currently, >>but is obviously needed. As you suspected I do not want to make that >>part of this case. I'm on a very tight schedule to provide this functionality >>to fishworks. Anyway, nothing specified in this PSARC case prevents such >>a future enhancement. > > > Can you file an RFE so this doesn't get lost? Thanks. > > - Bill Certainly, and thanks for being considerate. Neil. From sacadmin Thu Mar 22 16:23:37 2007 Received: from sineb-mail-2.sun.com ([192.18.19.7]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2MNNaS0021645 for ; Thu, 22 Mar 2007 16:23:37 -0700 (PDT) Received: from fe-apac-06.sun.com (fe-apac-06.sun.com [192.18.19.177] (may be forged)) by sineb-mail-2.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l2MNNU4I019266 for ; Thu, 22 Mar 2007 23:23:30 GMT Received: from conversion-daemon.mail-apac.sun.com by mail-apac.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) id <0JFB00G01VH3K400@mail-apac.sun.com> (original mail from Darren.Reed@Sun.COM) for PSARC@sac.sfbay.sun.com; Fri, 23 Mar 2007 07:23:30 +0800 (SGT) Received: from [129.146.106.55] by mail-apac.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPSA id <0JFB006BYVN1W9L1@mail-apac.sun.com>; Fri, 23 Mar 2007 07:23:26 +0800 (SGT) Date: Thu, 22 Mar 2007 16:22:40 -0700 From: Darren.Reed@Sun.COM Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007] In-reply-to: <460308EB.20707@Sun.COM> Sender: Darren.Reed@Sun.COM To: Neil.Perrin@Sun.COM Cc: Mark Maybee , PSARC@sac.sfbay.sun.com Message-id: <46030FC0.5030907@Sun.COM> MIME-version: 1.0 Content-type: text/plain; format=flowed; charset=ISO-8859-1 Content-transfer-encoding: 7BIT X-Accept-Language: en-us, en References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com> <4601D0A8.6060401@Sun.COM> <4601D4AC.2060308@Sun.COM> 
<4602CDED.9060100@Sun.COM> <460308EB.20707@Sun.COM> User-Agent: Mozilla/5.0 (X11; U; SunOS i86pc; en-US; rv:1.7) Gecko/20060120 Status: RO Content-Length: 2523 Neil Perrin wrote: > I agree there could be some confusion, but it is possible to see > the current configuration with "zpool status " and a history with > "zpool history". For example: > > : mull ; zpool status > pool: whirl > state: ONLINE > scrub: none requested > config: > > NAME STATE READ WRITE CKSUM > whirl ONLINE 0 0 0 > c1t10d0 ONLINE 0 0 0 > logs > c1t8d0 ONLINE 0 0 0 > c1t9d0 ONLINE 0 0 0 > spares > c1t11d0 AVAIL > > errors: No known data errors > : mull ; zpool history > History for 'whirl': > 2007-03-22.16:48:36 zpool create whirl c1t10d0 log c1t8d0 > 2007-03-22.16:49:32 zpool add whirl log c1t9d0 > 2007-03-22.16:51:15 zpool add whirl spare c1t11d0 > > : mull ; > > I think it's better to fit in with the current scheme used by spares. That worked for spares because spares was the only possible device to remove...this case changes that. And while the current interface works because there is only one type of device to remove, it doesn't work well for the future... The point I'm trying to make here is that you can only make sense of the "other" zpool commands (remove/replace) when you have the full context of the pool before the command is executed. As I mentioned before, add/create are fine, it is elsewhere that the design falls down and is of questionable architecture. If we look further forward, there could be more/other uses of devices that get added to the mix and it becomes less clear, through examining the commands themselves, what is actually being removed. The design here, to me, seems to be more dangerous than what we should be allowing. By requiring a "log" or "spare" keyword to be added to the command line, there is the possibility of a sanity check of "what did the admin really mean". 
e.g. if I were to do: zpool remove whirl spare c1t9d0 it could return an error saying that c1t9d0 is not a spare. The extra effort required to add in "spare" or "log" is a small price to pay for the added safety to the operation as a whole. A question for the ZFS team - at the time PSARC/2006/223 (ZFS hot spares) was presented, was it known if there would be any further uses of add/remove for "other" devices? Given that 2006/223 only sought "Evolving" for the introduction of how to interact with "spares", it would seem that there is nothing to preclude that interface from being revised? Darren From sacadmin Thu Mar 22 16:28:44 2007 Received: from sfbaymail1sca.SFBay.Sun.COM (sfbaymail1sca.SFBay.Sun.COM [129.145.154.35]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2MNSi3P021664 for ; Thu, 22 Mar 2007 16:28:44 -0700 (PDT) Received: from brmea-mail-2.sun.com (brmea-mail-2.Sun.COM [192.18.98.43]) by sfbaymail1sca.SFBay.Sun.COM (8.13.6+Sun/8.13.6/ENSMAIL,v2.2) with ESMTP id l2MNShxU022327 for ; Thu, 22 Mar 2007 16:28:43 -0700 (PDT) Received: from fe-amer-04.sun.com ([192.18.108.178]) by brmea-mail-2.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l2MNShFc028452 for ; Thu, 22 Mar 2007 23:28:43 GMT Received: from conversion-daemon.mail-amer.sun.com by mail-amer.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) id <0JFB00G01VQSWY00@mail-amer.sun.com> (original mail from Neil.Perrin@Sun.COM) for PSARC@sac.sfbay.sun.com; Thu, 22 Mar 2007 17:28:43 -0600 (MDT) Received: from [129.147.9.35] by mail-amer.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPSA id <0JFB008ELVVUZC96@mail-amer.sun.com>; Thu, 22 Mar 2007 17:28:42 -0600 (MDT) Date: Thu, 22 Mar 2007 17:28:42 -0600 From: Neil Perrin Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007] In-reply-to: <17922.25431.158509.54336@gargle.gargle.HOWL> Sender: Neil.Perrin@Sun.COM To: Roch - PAE Cc: Mark Maybee , PSARC@sac.sfbay.sun.com Reply-to:
Neil.Perrin@Sun.COM Message-id: <4603112A.9090302@Sun.COM> MIME-version: 1.0 Content-type: text/plain; format=flowed; charset=us-ascii Content-transfer-encoding: 7BIT X-Accept-Language: en-us, en References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com> <17922.25431.158509.54336@gargle.gargle.HOWL> User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.7) Gecko/20060120 Status: RO Content-Length: 1113 Roch - PAE wrote On 03/22/07 05:07,: > This case establishes a set of devices that are to be > used exclusively for storing ZFS intent logs. Yes. > > Since ZILs are per-filesystem structures, the current > proposal is to have each filesystem use the new log devices > for its ZIL. There might be a future need to control, for each > individual ZFS filesystem, whether the dedicated device is used. For > instance, DB redo logs would be a real target for > this extension; however, in the interest of preserving the > fast log, the main DB space might want to avoid committing to > it. So we might just open up the wording and dissociate > the existence of log-only devices from their implied use by > every filesystem's log. I can see what you mean, and the potential uses. I think this would be tricky to specify and complex to implement. The general theme of ZFS is pooling resources, so also providing the opposite, partitioning devices for various uses, doesn't fit well with the architecture. Anyway, I'd prefer not to extend this PSARC case for this functionality. Neil.
From sacadmin Thu Mar 22 22:15:11 2007 Received: from sfbaymail1sca.SFBay.Sun.COM (sfbaymail1sca.SFBay.Sun.COM [129.145.154.35]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2N5FBE8028918 for ; Thu, 22 Mar 2007 22:15:11 -0700 (PDT) Received: from brmea-mail-4.sun.com (brmea-mail-4.Sun.COM [192.18.98.36]) by sfbaymail1sca.SFBay.Sun.COM (8.13.6+Sun/8.13.6/ENSMAIL,v2.2) with ESMTP id l2N5FBaO011656 for ; Thu, 22 Mar 2007 22:15:11 -0700 (PDT) Received: from fe-amer-04.sun.com ([192.18.108.178]) by brmea-mail-4.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l2N5FBZ1014446 for ; Fri, 23 Mar 2007 05:15:11 GMT Received: from conversion-daemon.mail-amer.sun.com by mail-amer.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) id <0JFC00H01BN7RN00@mail-amer.sun.com> (original mail from Neil.Perrin@Sun.COM) for PSARC@sac.sfbay.sun.com; Thu, 22 Mar 2007 23:15:11 -0600 (MDT) Received: from [129.147.9.123] by mail-amer.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPSA id <0JFC008D5BXBZ9N4@mail-amer.sun.com>; Thu, 22 Mar 2007 23:15:11 -0600 (MDT) Date: Thu, 22 Mar 2007 23:15:10 -0600 From: Neil Perrin Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007] In-reply-to: <1174605258.29799.16.camel@thunk> Sender: Neil.Perrin@Sun.COM To: Bill Sommerfeld Cc: Mark Maybee , PSARC@sac.sfbay.sun.com Reply-to: Neil.Perrin@Sun.COM Message-id: <4603625E.5050200@Sun.COM> MIME-version: 1.0 Content-type: text/plain; format=flowed; charset=us-ascii Content-transfer-encoding: 7BIT X-Accept-Language: en-us, en References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com> <1174577597.7927.45.camel@localhost> <46030724.3070106@Sun.COM> <1174605258.29799.16.camel@thunk> User-Agent: Mozilla/5.0 (X11; U; SunOS sun4v; en-US; rv:1.7) Gecko/20060120 Status: RO Content-Length: 656 Bill Sommerfeld wrote On 03/22/07 17:14,: > On Thu, 2007-03-22 at 16:45 -0600, Neil Perrin wrote: > >>I agree it should be possible to force ignore a 
missing/unreadable log >>device (that isn't mirrored). No mechanism exists for this currently, >>but is obviously needed. As you suspected I do not want to make that >>part of this case. I'm on a very tight schedule to provide this functionality >>to fishworks. Anyway, nothing specified in this PSARC case prevents such >>a future enhancement. > > > Can you file an RFE so this doesn't get lost? Thanks. > > - Bill Done: 6538021 Need a way to force pool startup, when a separate log fails. From sacadmin Sun Mar 25 10:40:00 2007 Received: from sfbaymail2sca.sfbay.sun.com (sfbaymail2sca.SFBay.Sun.COM [129.145.155.42]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2PHe0Ej027381 for ; Sun, 25 Mar 2007 10:40:00 -0700 (PDT) Received: from brmea-mail-1.sun.com (brmea-mail-1.Sun.COM [192.18.98.31]) by sfbaymail2sca.sfbay.sun.com (8.13.6+Sun/8.12.10/ENSMAIL,v2.2) with ESMTP id l2PHe0r7010072 for ; Sun, 25 Mar 2007 10:40:00 -0700 (PDT) Received: from fe-amer-04.sun.com ([192.18.108.178]) by brmea-mail-1.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l2PHe04e003562 for ; Sun, 25 Mar 2007 17:40:00 GMT Received: from conversion-daemon.mail-amer.sun.com by mail-amer.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) id <0JFG00H01ZEHVS00@mail-amer.sun.com> (original mail from Neil.Perrin@Sun.COM) for PSARC@sac.sfbay.Sun.COM; Sun, 25 Mar 2007 11:40:00 -0600 (MDT) Received: from [129.147.9.123] by mail-amer.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPSA id <0JFG00LK3ZQHFSI3@mail-amer.sun.com> for PSARC@sac.sfbay.Sun.COM; Sun, 25 Mar 2007 11:39:59 -0600 (MDT) Date: Sun, 25 Mar 2007 11:39:52 -0600 From: Neil Perrin Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007] In-reply-to: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com> Sender: Neil.Perrin@Sun.COM To: Mark Maybee Cc: PSARC@sac.sfbay.sun.com Reply-to: Neil.Perrin@Sun.COM Message-id: <4606B3E8.3030605@Sun.COM> MIME-version: 1.0 Content-type: 
multipart/mixed; boundary="Boundary_(ID_bEezyLk4FJAoINLVXsWueg)" X-Accept-Language: en-us, en References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com> User-Agent: Mozilla/5.0 (X11; U; SunOS sun4v; en-US; rv:1.7) Gecko/20060120 Status: RO Content-Length: 8154 This is a multi-part message in MIME format. --Boundary_(ID_bEezyLk4FJAoINLVXsWueg) Content-type: text/plain; format=flowed; charset=us-ascii Content-transfer-encoding: 7BIT The feedback seems to have died down a bit, so attached is an updated case. This covers the following comments: Jeff Bonwick Concern: Why disallow separate logs on files? Resolution: Added. Glenn Skinner Concern: What happens to existing pools named "log"? Resolution: A zpool upgrade is needed, which will fail until the pool is renamed. Darren Reed Concern: Missing "zpool replace". Resolution: Added. Darren Reed Concern: Would like an additional "log" or "spare" keyword to be required for zpool remove/replace/attach/detach. Resolution: Unwilling to change this, as the current scheme already exists for spares and it is easy to determine the device configuration. Bill Sommerfeld Concern: Need guidance on sizing and when to use separate logs. Resolution: Will be added to the ZFS Best Practices Guide. Bill Sommerfeld Concern: Should be possible to force ignore a faulty separate log device. Resolution: Filed RFE 6538021. Roch Bourbonnais Concern: Would like potential to specify a dedicated device for other purposes. Resolution: Prefer not to extend this PSARC case for this functionality. I hope I've covered everything. BTW: I'm on vacation until Apr/2. Neil. --Boundary_(ID_bEezyLk4FJAoINLVXsWueg) Content-type: text/plain; name=slog_psarc Content-transfer-encoding: 7BIT Content-disposition: inline; filename=slog_psarc PSARC CASE: 2007/171 ZFS Separate Intent Log SUMMARY: This is a proposal to allow separate devices to be used for the ZFS Intent Log (ZIL). The sole purpose of this is performance.
The devices can be disks, solid state drives, nvram drives, or any device that presents a block interface. PROBLEM: The ZIL satisfies the synchronous requirements of POSIX. For instance, databases often require their transactions to be on stable storage on return from the system call. NFS and other applications can also use fsync() to ensure data stability. The speed of the ZIL is therefore essential in determining the latency of writes for these critical applications. Currently the ZIL is allocated dynamically from the pool. It consists of a chain of varying block sizes which are anchored in fixed objects. Blocks are sized to fit the demand and will come from different metaslabs and thus different areas of the disk. This causes more head movement. Furthermore, the log blocks are freed as soon as the intent log transaction (system call) is committed. So a swiss cheesing effect can occur leading to pool fragmentation. PROPOSED SOLUTION: This proposal takes advantage of the greatly faster media speeds of nvram, solid state disks, or even dedicated disks. To this end, additional extensions to the zpool command are defined: zpool create log Creates a pool with a separate log. If more than one log device is specified then writes are load-balanced between devices. It's also possible to mirror log devices. For example a log consisting of two sets of two mirrors could be created thus: zpool create \ log mirror c1t8d0 c1t9d0 mirror c1t10d0 c1t11d0 A raidz/raidz2 log is not supported zpool add log Creates a separate log if it doesn't exist, or adds extra devices if it does. zpool remove Remove the log devices. If all log devices are removed we revert to placing the log in the pool. Evacuating a log is easily handled by ensuring all txgs are committed. zpool replace Replace old log device with new log device. zpool attach Attaches a new log device to an existing log device. If the existing device is not a mirror then a 2 way mirror is created. 
If the device is part of a two-way log mirror, attaching new_device creates a three-way log mirror, and so on. zpool detach pool Detaches a log device from a mirror. zpool status Additionally displays the log devices. zpool iostat Additionally shows IO statistics for log devices. zpool export/import Will export and import the log devices. When a separate log that is not mirrored fails, logging falls back to chained logs within the main pool. The name "log" will become a reserved word. Attempts to create a pool with the name "log" will fail with: "cannot create 'log': name is reserved pool name may have been omitted" Hot spares cannot replace log devices. BINDING: A micro/patch binding is requested. MAN PAGE CHANGES: *** zpool.ori Fri Mar 16 10:05:31 2007 --- zpool.new Sat Mar 17 00:50:24 2007 *************** *** 169,174 **** --- 169,180 ---- hot spares for a pool. For more information, see the "Hot Spares" section. + log A separate intent log device. If more than one + log device is specified then writes are load-balanced + between devices. Log devices can also be mirrored. + However, neither raidz nor raidz2 is supported + for the intent log. For more information, see the + "Intent Log" section. *************** *** 286,292 **** --- 292,301 ---- pools. + Spares cannot replace log devices. + + Alternate Root Pools The "zpool create -R" and "zpool import -R" commands allow users to create and import a pool with a different root *************** *** 313,318 **** --- 322,346 ---- + Intent Log + The ZFS Intent Log satisfies the synchronous requirements of POSIX. + For instance, databases often require their transactions + to be on stable storage on return from the system call. + NFS and other applications can also use fsync() to ensure + data stability. By default, the intent log is allocated from + blocks within the pool, however for greater performance, separate + intent log device(s) can be specified.
For example, + + # zpool create pool c0d0 c1d0 log c2d0 + + Multiple log devices can also be specified, and they can be + mirrored. See the EXAMPLES section later for an example of this. + + Log devices can be added with the "zpool add" command and + removed with the "zpool remove" command. + + + Subcommands All subcommands that modify state are logged persistently to the pool in their original form. *************** *** 355,362 **** ices specified on the command line. The pool name must begin with a letter, and can only contain alphanumeric characters as well as underscore ("_"), dash ("-"), and ! period ("."). The pool names "mirror", "raidz", and ! "spare" are reserved, as are names beginning with the pattern "c[0-9]". The vdev specification is described in the "Virtual Devices" section. --- 383,390 ---- ices specified on the command line. The pool name must begin with a letter, and can only contain alphanumeric characters as well as underscore ("_"), dash ("-"), and ! period ("."). The pool names "mirror", "raidz", ! "spare", and "log" are reserved, as are names beginning with the pattern "c[0-9]". The vdev specification is described in the "Virtual Devices" section. *************** *** 1283,1288 **** --- 1311,1324 ---- # zpool remove tank c0t2d0 + Example 11 Creating a pool with mirrored separate intent logs. 
+ + The following command creates a pool with a separate intent + log consisting of two sets of two mirrors: + + # zpool create pool c0d0 c1d0 log mirror c2d0 c3d0 mirror c4d0 c5d0 + + EXIT STATUS The following exit values are returned: --Boundary_(ID_bEezyLk4FJAoINLVXsWueg)-- From sacadmin Mon Mar 26 11:35:41 2007 Received: from eastmail1bur.East.Sun.COM (eastmail1bur.East.Sun.COM [129.148.9.49]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2QIZeZY020877 for ; Mon, 26 Mar 2007 11:35:41 -0700 (PDT) Received: from thunk.east.sun.com (thunk.East.Sun.COM [129.148.174.66]) by eastmail1bur.East.Sun.COM (8.13.6+Sun/8.13.6/ENSMAIL,v2.2) with ESMTP id l2QIZdmu015688; Mon, 26 Mar 2007 14:35:39 -0400 (EDT) Received: from [IPv6:::1] (localhost [IPv6:::1]) by thunk.east.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l2QIZdY2016614; Mon, 26 Mar 2007 14:35:39 -0400 (EDT) Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007] From: Bill Sommerfeld To: Neil.Perrin@sun.com Cc: Mark Maybee , PSARC@sac.sfbay.sun.com In-Reply-To: <4606B3E8.3030605@Sun.COM> References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com> <4606B3E8.3030605@Sun.COM> Content-Type: text/plain Date: Mon, 26 Mar 2007 14:35:37 -0400 Message-Id: <1174934137.13736.33.camel@thunk> Mime-Version: 1.0 X-Mailer: Evolution 2.8.1.1 Content-Transfer-Encoding: 7bit Status: RO Content-Length: 661 On Sun, 2007-03-25 at 11:39 -0600, Neil Perrin wrote: > + By default, the intent log is allocated from > + blocks within the pool, however for greater performance, separate > + intent log device(s) can be specified. If I'm not mistaken this appears to be unchanged from the first version -- IMHO it should be weakened to add some amount of "your mileage may vary". How about something like: "By default, the intent log is allocated from blocks within the pool; under {many, most} workloads, having separate intent log devices will improve write performance at the cost of reduced pool capacity."
(you pick "many" or "most"). - Bil From sacadmin Tue Mar 27 01:20:17 2007 Received: from bebop.France.Sun.COM (bebop.France.Sun.COM [129.157.174.15]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2R8KGO0012373 for ; Tue, 27 Mar 2007 01:20:17 -0700 (PDT) Received: from corn.Sun.COM (corn [129.157.192.240]) by bebop.France.Sun.COM (8.13.8+Sun/8.13.8) with SMTP id l2R8KF7P012176; Tue, 27 Mar 2007 10:20:16 +0200 (MEST) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <17928.54086.193358.131675@gargle.gargle.HOWL> Date: Tue, 27 Mar 2007 10:18:14 +0200 From: Roch - PAE To: Bill Sommerfeld Cc: Neil.Perrin@sun.com, Mark Maybee , PSARC@sac.sfbay.sun.com Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007] In-Reply-To: <1174934137.13736.33.camel@thunk> References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com> <4606B3E8.3030605@Sun.COM> <1174934137.13736.33.camel@thunk> X-Mailer: VM 7.19 under 21.1 (patch 3) "Acadia" XEmacs Lucid Organization: SUN Microsystems Phone: (+33).4.76.18.83.20 (x[70]38320) Status: RO Content-Length: 1993 Bill Sommerfeld writes: > On Sun, 2007-03-25 at 11:39 -0600, Neil Perrin wrote: > > > + By default, the intent log is allocated from > > + blocks within the pool, however for greater performance, separate > > + intent log device(s) can be specified. > > If I'm not mistaken this appears to be unchanged from the first version > -- IMHO it should be weakened to add some amount of "your mileage may > vary". > > How about something like: > > "By default, the intent log is allocated from blocks within the pool; > under {many, most} workloads, having separate intent log devices will > improve write performance at the cost of reduced pool capacity." > > (you pick "many" or "most"). > > The wording covers the situation where a disk of significant size is taken away from a pool of similar disks to be used as a log-only device.
Taking away disks from the main pool would trade away pool capacity, read IOPS and (possibly) write throughput. But another interesting scenario is to enable other types of devices to be used as log-only devices. A small NVRAM-based LUN is one, and a 15K rpm drive could fit well in this space. So in some situations we are not reducing the pool capacity, but just "improving synchronous write latency at the cost of an extra low-latency device". If I may: "By default, the intent log is allocated from blocks within the pool. For workloads that are sensitive to synchronous write latency (such as databases, NFS, ...) it can be worthwhile for performance reasons to dedicate a subset of pool disks as log devices. Note that disks used as log devices do not contribute to pool block capacity and do not participate in read operations. Alternatively, the use of additional lower-latency log devices can have a great positive impact on such workloads. In general, log devices need not be larger than a few seconds' (10) worth of pool throughput."
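The sizing rule of thumb in the proposed wording above (a log device need hold no more than about 10 seconds of pool throughput) is simple arithmetic. The following is only a sketch of that calculation: the function name, the 200 MiB/s example figure and the choice of bytes as the unit are assumptions made for illustration, not values from this case.

```python
# Sketch of the rule of thumb quoted above: log size ~ pool throughput x ~10 s.
# The function name and the example throughput are hypothetical.

def log_device_size_bytes(pool_throughput_bytes_per_sec, seconds=10):
    """Upper bound on useful log device size, per the rule of thumb above."""
    if pool_throughput_bytes_per_sec < 0 or seconds < 0:
        raise ValueError("throughput and seconds must be non-negative")
    return pool_throughput_bytes_per_sec * seconds


if __name__ == "__main__":
    mib = 1024 ** 2
    # Assumed example: a pool that sustains 200 MiB/s of writes.
    size = log_device_size_bytes(200 * mib)
    print(size // mib, "MiB")  # 2000 MiB, i.e. roughly 2 GiB
```

Under that assumption a log device of about 2 GiB would suffice, which is why a small NVRAM LUN is a plausible candidate.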
-r > - Bil > > From sacadmin Wed Mar 28 14:53:37 2007 Received: from sfbaymail2sca.sfbay.sun.com (sfbaymail2sca.SFBay.Sun.COM [129.145.155.42]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l2SLraCh012365 for ; Wed, 28 Mar 2007 14:53:37 -0700 (PDT) Received: from brmea-mail-3.sun.com (brmea-mail-3.Sun.COM [192.18.98.34]) by sfbaymail2sca.sfbay.sun.com (8.13.6+Sun/8.12.10/ENSMAIL,v2.2) with ESMTP id l2SLrajm028881 for ; Wed, 28 Mar 2007 14:53:36 -0700 (PDT) Received: from fe-amer-04.sun.com ([192.18.108.178]) by brmea-mail-3.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l2SLra5b014025 for ; Wed, 28 Mar 2007 21:53:36 GMT Received: from conversion-daemon.mail-amer.sun.com by mail-amer.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) id <0JFM00201VDV2Y00@mail-amer.sun.com> (original mail from Mark.Maybee@Sun.COM) for PSARC@sac.sfbay.Sun.COM; Wed, 28 Mar 2007 15:53:36 -0600 (MDT) Received: from [192.9.61.138] by mail-amer.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPSA id <0JFM0087FVHBZCC8@mail-amer.sun.com> for PSARC@sac.sfbay.Sun.COM; Wed, 28 Mar 2007 15:53:36 -0600 (MDT) Date: Wed, 28 Mar 2007 14:55:26 -0700 From: Mark Maybee Subject: Re: ZFS Separate Intent Log [PSARC/2007/171 Timeout: 03/28/2007] In-reply-to: <4606B3E8.3030605@Sun.COM> Sender: Mark.Maybee@Sun.COM To: Neil.Perrin@Sun.COM Cc: PSARC@sac.sfbay.sun.com Message-id: <460AE44E.6050504@Sun.COM> MIME-version: 1.0 Content-type: text/plain; format=flowed; charset=ISO-8859-1 Content-transfer-encoding: 7BIT References: <200703212048.l2LKmSXp388561@jurassic.eng.sun.com> <4606B3E8.3030605@Sun.COM> User-Agent: Thunderbird 2.0b2 (X11/20070212) Status: RO Content-Length: 76 This case was approved during ARC Business at the 03/28/2007 PSARC meeting.
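For readers of this archive, the sanity check Darren Reed asked for earlier in the thread (a "log" or "spare" keyword on "zpool remove", rejected for this case) can be sketched as follows. This is not how zpool is implemented and the keyword was not adopted; the dictionary models the pool "whirl" from the zpool status output shown in the thread, and the function name and error wording are hypothetical.

```python
# Hypothetical model of Darren's proposed check, NOT actual zpool behaviour.
# Device names are taken from the "whirl" zpool status output in this thread.

def checked_remove(pool, kind, device):
    """Remove `device` only if it is configured in `pool` as `kind`."""
    if kind not in pool:
        raise ValueError(f"unknown device type '{kind}'")
    if device not in pool[kind]:
        # This is the "what did the admin really mean" error Darren describes.
        raise ValueError(f"{device} is not a {kind}")
    pool[kind].remove(device)
    return pool


if __name__ == "__main__":
    whirl = {"log": ["c1t8d0", "c1t9d0"], "spare": ["c1t11d0"]}
    try:
        # Analogue of: zpool remove whirl spare c1t9d0
        checked_remove(whirl, "spare", "c1t9d0")
    except ValueError as err:
        print("error:", err)
    # Analogue of: zpool remove whirl log c1t9d0
    checked_remove(whirl, "log", "c1t9d0")
    print(whirl)
```

The point of the sketch is only that the keyword carries the intent in the command itself, so a mismatch can be reported instead of silently removing a device of another type.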