Subject: PSARC FastTrack [03/28/2007]: ZFS Separate Intent Log

Template Version: @(#)sac_nextcase %I% %G% SMI
This information is Copyright 2007, Sun Microsystems

1. Introduction
    1.1. Project/Component Working Name:
         ZFS Separate Intent Log
    1.2. Name of Document Author/Supplier:
         Author: Neil Perrin
    1.3  Date of This Document:
         21 March, 2007

4. Technical Description

This case adds extensions to several existing zpool commands to allow
separate log devices to be created and manipulated. It also extends
the output of some commands to include log device status. The
stability of these changes is committed, and the release binding is
patch/micro.

SUMMARY:

    This is a proposal to allow separate devices to be used for the
    ZFS Intent Log (ZIL). The sole purpose of this is performance.
    The devices can be disks, solid state drives, nvram drives, or
    any device that presents a block interface.

PROBLEM:

    The ZIL satisfies the synchronous requirements of POSIX. For
    instance, databases often require their transactions to be on
    stable storage on return from the system call. NFS and other
    applications can also use fsync() to ensure data stability. The
    speed of the ZIL is therefore essential in determining the
    latency of writes for these critical applications.

    Currently the ZIL is allocated dynamically from the pool. It
    consists of a chain of varying block sizes which are anchored in
    fixed objects. Blocks are sized to fit the demand and will come
    from different metaslabs and thus different areas of the disk.
    This causes more head movement.

    Furthermore, the log blocks are freed as soon as the intent log
    transaction (system call) is committed. So a swiss cheesing
    effect can occur leading to pool fragmentation.

PROPOSED SOLUTION:

    This proposal takes advantage of the greatly faster media speeds
    of nvram, solid state disks, or even dedicated disks.

    To this end, additional extensions to the zpool command are
    defined:

    zpool create log
        Creates a pool with a separate log.
        If more than one log device is specified then writes are
        load-balanced between devices. It's also possible to mirror
        log devices. For example, a log consisting of two sets of
        two mirrors could be created thus:

            zpool create \
                log mirror c1t8d0 c1t9d0 mirror c1t10d0 c1t11d0

        A raidz/raidz2 log is not supported, nor is placing logs on
        files.

    zpool add log
        Creates a separate log if it doesn't exist, or adds extra
        devices if it does.

    zpool remove
        Removes the log devices. If all log devices are removed we
        revert to placing the log in the pool. Evacuating a log is
        easily handled by ensuring all txgs are committed.

    zpool attach
        Attaches a new log device to an existing log device. If the
        existing device is not a mirror then a two-way mirror is
        created. If device is part of a two-way log mirror,
        attaching new_device creates a three-way log mirror, and so
        on.

    zpool detach pool
        Detaches a log device from a mirror.

    zpool status
        Additionally displays the log devices.

    zpool iostat
        Additionally shows IO statistics for log devices.

    zpool export/import
        Will export and import the log devices.

    When a separate log that is not mirrored fails, logging falls
    back to chained logs within the main pool.

    The name "log" will become a reserved word. Attempts to create a
    pool with the name "log" will fail with:

        "cannot create 'log': name is reserved
         pool name may have been omitted"

    Hot spares cannot replace log devices.

BINDING:

    A micro/patch binding is requested.

MAN PAGE CHANGES:

*** zpool.ori   Fri Mar 16 10:05:31 2007
--- zpool.new   Sat Mar 17 00:50:24 2007
***************
*** 169,174 ****
--- 169,180 ----
            hot spares for a pool. For more information, see the
            "Hot Spares" section.

+     log   A separate intent log device. If more than one
+           log device is specified then writes are load-balanced
+           between devices. Log devices can also be mirrored.
+           However, neither raidz/raidz2 nor files are supported
+           for the intent log. For more information, see the
+           "Intent Log" section.
***************
*** 286,292 ****
--- 292,301 ----
       pools.

+      Spares cannot replace log devices.
+
+
    Alternate Root Pools
       The "zpool create -R" and "zpool import -R" commands allow
       users to create and import a pool with a different root
***************
*** 313,318 ****
--- 322,346 ----

+   Intent Log
+      The ZFS Intent Log satisfies the synchronous requirements of POSIX.
+      For instance, databases often require their transactions
+      to be on stable storage on return from the system call.
+      NFS and other applications can also use fsync() to ensure
+      data stability. By default, the intent log is allocated from
+      blocks within the pool, however for greater performance, separate
+      intent log device(s) can be specified. For example,
+
+        # zpool create pool c0d0 c1d0 log c2d0
+
+      Multiple log devices can also be specified, and they can be
+      mirrored. See the EXAMPLES section later for an example of this.
+
+      Log devices can be added with the "zpool add" command and
+      removed with the "zpool remove" command.
+
+
+
    Subcommands
       All subcommands that modify state are logged persistently to
       the pool in their original form.
***************
*** 355,362 ****
           ices specified on the command line. The pool name must
           begin with a letter, and can only contain alphanumeric
           characters as well as underscore ("_"), dash ("-"), and
!          period ("."). The pool names "mirror", "raidz", and
!          "spare" are reserved, as are names beginning with the
           pattern "c[0-9]". The vdev specification is described
           in the "Virtual Devices" section.
--- 383,390 ----
           ices specified on the command line. The pool name must
           begin with a letter, and can only contain alphanumeric
           characters as well as underscore ("_"), dash ("-"), and
!          period ("."). The pool names "mirror", "raidz",
!          "spare", and "log" are reserved, as are names beginning with the
           pattern "c[0-9]". The vdev specification is described
           in the "Virtual Devices" section.
***************
*** 1283,1288 ****
--- 1311,1324 ----
         # zpool remove tank c0t2d0

+      Example 11 Creating a pool with mirrored separate intent logs.
+
+      The following command creates a pool with a separate intent
+      log consisting of two sets of two mirrors:
+
+        # zpool create pool c0d0 c1d0 log mirror c2d0 c3d0 mirror c4d0 c5d0
+
+
  EXIT STATUS
       The following exit values are returned:

6. Resources and Schedule
    6.4. Steering Committee requested information
        6.4.1. Consolidation C-team Name:
               ON
    6.5. ARC review type: FastTrack
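
Illustrative note (not part of the case materials): the pool-name rule
in the revised man page text above can be sketched as a small check.
This is a minimal sketch only, written in Python for brevity. The
function name check_pool_name and the reason strings are hypothetical,
and treating the reserved names as exact matches is an assumption read
from the man page wording; the actual zpool implementation may differ.

```python
import re

# Reserved pool names, as listed in the revised man page text.
# Assumption: these are compared as exact names.
RESERVED_NAMES = ("mirror", "raidz", "spare", "log")

# Must begin with a letter; may contain alphanumerics, "_", "-", ".".
_VALID_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_.-]*\Z")

# Names beginning with the pattern c[0-9] are also reserved.
_DISK_LIKE = re.compile(r"c[0-9]")


def check_pool_name(name):
    """Return None if the name is acceptable, else a short reason string.

    Hypothetical helper for illustration; not the zpool code.
    """
    if not _VALID_NAME.match(name):
        return "invalid character or does not begin with a letter"
    if name in RESERVED_NAMES:
        return "name is reserved"
    if _DISK_LIKE.match(name):
        return "name is reserved"
    return None


if __name__ == "__main__":
    for candidate in ("tank", "log", "c0d0", "my_pool-1.a"):
        print(candidate, "->", check_pool_name(candidate))
```

Under these assumptions a name such as "tank" passes, while "log" is
rejected as reserved, which matches the failure case described in the
proposed solution when the pool name is omitted from "zpool create".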