Subject: Improved [s]sd-config-list support [PSARC/2008/465 FastTrack timeout 07/30/2008]
To: PSARC-ext@Sun.Com
Cc:
Bcc: one-pager-list@sac.sfbay one-pager-log@sac.sfbay sac-bar@sac.sfbay

I am sponsoring the following fasttrack for Li He (Nikko), with timeout
set to 07/30/2008. Micro/patch binding is requested.

-Chris

1. Introduction
    1.1. Project/Component Working Name:
         Improved [s]sd-config-list support
    1.2. Name of Document Author/Supplier:
         Author: Nikko He
    1.3. Date of This Document:
         15 July 2008

4. Technical Description

4.1 Background

    Solaris 8/9 supported a system(4) [s]sd driver tunable called
    [s]sd_retry_count to specify the number of times a disk operation
    should be retried before returning failure (EIO). This tunable was
    adjusted by adding a line to /etc/system:

        set sd:sd_retry_count=5

    Some third-party multipathing implementations (e.g. Veritas DMP and
    EMC Powerpath) developed dependencies on being able to tune error
    handling parameters: the default settings (60 second IO timeout,
    3 retries) caused problems for their products. Without full
    understanding of the third-party dependencies, this tunable was
    removed in s10 by PSARC 2003/556 [7]. Now, the inability to tune
    driver error handling parameters is causing problems for third-party
    multipathing solutions and is preventing some large customers from
    migrating to s10.

    The current SD_BSY_TIMEOUT define is used to specify the delay
    before retry when busy status is received. This is currently hard
    coded in the [s]sd driver. A customer is requesting to tune this
    variable to reduce delay in their environment.

    Based on the scsi_get_device_type_scsi_options(9F) interface defined
    by PSARC 1995/445 [1], 'Partner Private' [s]sd-config-list support
    was added to [s]sd(7D) as part of PSARC 1999/015 "delayed retries"
    [2]. This property is matched to the vendor ID and product ID
    strings in the device's SCSI INQUIRY data.

    This mechanism was subsequently used (Project Private) for

        PSARC 2001/692 [4]  minimum throttle setting support
        PSARC 2001/693 [5]  enable/disable disk sorting support
        PSARC 2002/294 [6]  SCSI LOGICAL UNIT RESET support

    An example that enables LOGICAL UNIT RESET use is:

        ssd-config-list= "SUN T4", "t4-data";
        t4-data=1,0x20000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1;

    The first number in the t4-data array above is the version of the
    ssd-config-list. Currently only version 1 is supported. The second
    number of t4-data is a bit mask of values to set. Bits 0-17 are
    already defined and each bit identifies a particular disk behavior.
    When a particular bit is set in the mask, the value associated with
    the behavior is found using "bit_number + 2" as an index into the
    t4-data. In the example above, bit 17 is set, indicating support for
    the LOGICAL UNIT RESET tunable, and the tunable value is 1.

4.2 Problem

    There are two problems this fasttrack would like to address:

    1. The current bit-mask/index approach and the use of indirect
       properties are hard to use. In addition, for historical reasons,
       the definitions of bit positions in [s]sd-config-list differ
       across platforms: for example, fabricate device id is defined as
       bit 3 on SPARC but as bit 2 on x86/x64. This inconsistency in bit
       position definition is confusing and error-prone.

    2. Solaris 10 and above currently do not support tuning retry
       counts. This has been a problem for customers who are willing to
       migrate to S10, but want to continue using third-party
       multipathing software. The proposed public tunable
       'retries-timeout' meets this requirement. Similarly, current
       SD_BSY_TIMEOUT functionality needs to be tunable to meet customer
       requirements. The proposed public tunable 'delay-busy' meets this
       requirement.

    We need to improve the way [s]sd-config-list works.
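
    To make the first problem concrete, the v1 lookup described in
    section 4.1 (bit mask in the second element, value at index
    "bit_number + 2") can be sketched as follows. This is an
    illustration only, not the [s]sd driver code: the function and
    constant names are hypothetical, and the array is rebuilt to mirror
    the t4-data example.

```python
# Illustrative sketch of the v1 [s]sd-config-list lookup; not driver code.
# All names below are hypothetical.

V1_VERSION = 1                 # only version 1 is supported
V1_ENABLE_LUN_RESET_BIT = 17   # bit 17: LOGICAL UNIT RESET tunable


def v1_tunable_value(data, bit):
    """Return the value for `bit`, or None if the bit is not set in the mask."""
    version, mask = data[0], data[1]
    if version != V1_VERSION:
        raise ValueError("unsupported [s]sd-config-list version: %d" % version)
    if not mask & (1 << bit):
        return None
    # The value lives at index bit_number + 2 (after version and mask).
    return data[bit + 2]


# Rebuild the t4-data array: version, mask, then one slot per bit 0..17.
t4_data = [V1_VERSION, 0x20000] + [0] * 18
t4_data[V1_ENABLE_LUN_RESET_BIT + 2] = 1

print(v1_tunable_value(t4_data, V1_ENABLE_LUN_RESET_BIT))  # value for bit 17
print(v1_tunable_value(t4_data, 0))                        # bit 0 is not set
```

    The caller has to know both the bit number and the platform it
    applies to, which is the usability issue the new format is meant to
    remove.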

4.3 Proposed Solution

    The proposal is to introduce a new backwards compatible format for
    [s]sd-config-list property values which uses a name-value pair
    string, in JSON text format [10], to provide device-specific tuning.

    The existing [s]sd-config-list format is
    '"VIDPID-string", "property-name"'. The new [s]sd-config-list format
    is '"VIDPID-string", "JSON-text"'. The format of the tuple can be
    detected by looking for ':' in the second string.

    In addition to supporting existing private behaviors, support for
    tuning various forms of retry behavior will be added.

    Using the proposed [s]sd-config-list improvements, tuning device
    specific behavior in [s]sd.conf will look something like:

        sd-config-list = "SUN T4", "delay-busy:600, retries-timeout:6";

    Please refer to section 4.4 for specific tunables supported by the
    improved [s]sd-config-list, and also refer to section 4.5 for string
    syntax definition.

    The [s]sd driver will be enhanced to support both the current and
    the improved [s]sd-config-list formats. The current format is kept
    unchanged for backward compatibility.

    Interface name      Commitment     Comments
    ---------------------------------------------------------------------
    sd-config-list      Committed      New JSON-text format, but only
    ssd-config-list                    the specific tunables mentioned
                                       below are committed.

    delay-busy:         Committed      tunable name in [s]sd-config-list
                                       JSON-text that specifies the
                                       delay before retry when busy:
                                       nsecs

    retries-timeout:    Committed      tunable name in [s]sd-config-list
                                       JSON-text that specifies the
                                       number of retries to perform on
                                       an io timeout: count.

    :                   ProjPrivate    tunable name in [s]sd-config-list
                                       JSON-text string.

    This case supersedes PSARC 2007/505 [9], which was never delivered.
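
    The format detection and name-value parsing described in this
    section can be sketched as follows. This is a minimal illustration,
    not the driver implementation: the function names are hypothetical,
    only the unbracketed form with unquoted names (per the grammar in
    section 4.5) is handled, values are assumed to be unsigned decimal
    integers or true/false, and a full JSON parser is deliberately not
    attempted.

```python
import re

# Unquoted member name from section 4.5: alpha *( alpha / digit / "_" / "-" )
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]*\Z")
UINT_RE = re.compile(r"[0-9]+\Z")


def is_new_format(second_string):
    """The new format is recognised by a ':' in the second string."""
    return ":" in second_string


def parse_tunables(text):
    """Parse 'name:value, name:value' into a dict (simplified sketch)."""
    tunables = {}
    for member in text.split(","):
        name, sep, value = member.partition(":")
        name, value = name.strip(), value.strip()
        if not sep or not NAME_RE.match(name):
            raise ValueError("bad tunable: %r" % member)
        if value in ("true", "false"):
            tunables[name] = (value == "true")
        elif UINT_RE.match(value):
            tunables[name] = int(value)
        else:
            raise ValueError("unsupported value: %r" % value)
    return tunables


entry = ("SUN T4", "delay-busy:600, retries-timeout:6")
if is_new_format(entry[1]):
    print(parse_tunables(entry[1]))
else:
    print("old format, indirect property name:", entry[1])
```

    An old-style entry such as '"SUN T4", "t4-data"' contains no ':' in
    its second string, so it would continue down the existing v1 path.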

4.4 Supported 'JSON-text' Tunable Names

    Tunable names supported by the improved [s]sd-config-list:

     _______________________________________________________
    |Tunable_Name                     |Commitment |Data_Type|
    |_________________________________|___________|_________|
    | cache-nonvolatile               | Private   | BOOLEAN |
    |_________________________________|___________|_________|
    | controller-type                 | Private   | UINT32  |
    |_________________________________|___________|_________|
    | delay-busy                      | Committed | UINT32  |
    |_________________________________|___________|_________|
    | disksort                        | Private   | BOOLEAN |
    |_________________________________|___________|_________|
    | timeout-releasereservation      | Private   | UINT32  |
    |_________________________________|___________|_________|
    | reset-lun                       | Private   | BOOLEAN |
    |_________________________________|___________|_________|
    | retries-busy                    | Private   | UINT32  |
    |_________________________________|___________|_________|
    | retries-timeout                 | Committed | UINT32  |
    |_________________________________|___________|_________|
    | retries-notready                | Private   | UINT32  |
    |_________________________________|___________|_________|
    | retries-reset                   | Private   | UINT32  |
    |_________________________________|___________|_________|
    | throttle-max                    | Private   | UINT32  |
    |_________________________________|___________|_________|
    | throttle-min                    | Private   | UINT32  |
    |_________________________________|___________|_________|

    The general format of a tunable name is '-'. When choosing a name
    for a new tunable, consideration should be given to whether the
    tunable fits into any of the existing categories.

        timeout-    maximum time something should take.
        delay-      delay time before issuing a retry.
        retries-    number of retries before failure.
        throttle-   activity control.
        reset-      reset control.

    Some of the private tunables supported by the current v1 mechanism
    (all private) are not being carried forward (same ones dropped by
    [9]). See Appendix B for private tunables which are now considered
    obsolete.
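
    The tunable table in this section lends itself to a simple lookup
    when checking values. The sketch below assumes BOOLEAN maps to
    true/false and UINT32 to an integer in the range 0 to 2^32 - 1; the
    dictionary and function names are hypothetical and this is not the
    driver's data structure.

```python
UINT32_MAX = 2**32 - 1

# name -> (commitment, data type), copied from the table in section 4.4
TUNABLES = {
    "cache-nonvolatile":          ("Private",   "BOOLEAN"),
    "controller-type":            ("Private",   "UINT32"),
    "delay-busy":                 ("Committed", "UINT32"),
    "disksort":                   ("Private",   "BOOLEAN"),
    "timeout-releasereservation": ("Private",   "UINT32"),
    "reset-lun":                  ("Private",   "BOOLEAN"),
    "retries-busy":               ("Private",   "UINT32"),
    "retries-timeout":            ("Committed", "UINT32"),
    "retries-notready":           ("Private",   "UINT32"),
    "retries-reset":              ("Private",   "UINT32"),
    "throttle-max":               ("Private",   "UINT32"),
    "throttle-min":               ("Private",   "UINT32"),
}


def check_tunable(name, value):
    """Return `value` if it fits the declared type of `name`, else raise."""
    if name not in TUNABLES:
        raise ValueError("unknown tunable: %s" % name)
    _, data_type = TUNABLES[name]
    if data_type == "BOOLEAN":
        ok = isinstance(value, bool)
    else:  # UINT32; bool is excluded because it is a subclass of int
        ok = (isinstance(value, int) and not isinstance(value, bool)
              and 0 <= value <= UINT32_MAX)
    if not ok:
        raise ValueError("%s expects %s" % (name, data_type))
    return value


def committed_names():
    """Names whose commitment level is Committed, in sorted order."""
    return sorted(n for n, (c, _) in TUNABLES.items() if c == "Committed")


print(committed_names())
print(check_tunable("retries-timeout", 6))
```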

4.5 Future

    We define the format of a compatible JSON-text with reference to
    the grammar of JSON [10]. In the future, we plan on providing
    generic interfaces in libnvpair [11] to convert between nvlist and
    JSON text, and will also adjust the sd implementation to use those
    generic interfaces at that time.

    The JSON RFC [10] states that "A JSON parser MAY accept non-JSON
    forms or extensions". This proposal accepts the following non-JSON
    forms:

    o In addition to supporting

          member = string name-separator value

      defined in section 2.2 of the RFC, we support

          member = name name-separator value

      where name is

          alpha    = %x41-5A / %x61-7A   ; A-Z a-z
          minus    = %x2D                ; -
          ubar     = %x5F                ; _
          digit1-9 = %x31-39             ; 1-9
          zero     = %x30                ; 0
          alphap   = alpha / zero / digit1-9 / ubar / minus
          name     = alpha *alphap

      The result is that we will accept member names without quotes.

    o If JSON text does not start/end with 'begin-object/end-object' or
      'begin-array/end-array' as specified in section 2.2/2.3 of the
      RFC, then 'begin-object/end-object' bracketing is assumed.

    These are common extensions.

    The future nvlist JSON text decode implementation will obtain its
    type information from a prototype nvlist supplied by the caller that
    describes the name-type information of all supported objects.

4.6 Release Binding

    Micro/patch binding is requested.

4.7 References

    [1]  PSARC 1995/445 scsi-options per device type
         http://sac.sfbay/PSARC/1995/445
         http://www.opensolaris.org/os/community/arc/caselog/1995/445

    [2]  PSARC 1999/015 delayed retries
         http://sac.sfbay/PSARC/1999/015
         http://www.opensolaris.org/os/community/arc/caselog/1999/015

    [3]  PSARC 2000/016 Merged Disk Driver (s81_37(s9))
         http://sac.sfbay/PSARC/2000/016
         http://www.opensolaris.org/os/community/arc/caselog/2000/016

    [4]  PSARC 2001/692 Per-Disk-Device Minimum Throttle Setting
         http://sac.sfbay/PSARC/2001/692
         http://www.opensolaris.org/os/community/arc/caselog/2001/692

    [5]  PSARC 2001/693 Per-Disk-Device Disabling of disksort
         http://sac.sfbay/PSARC/2001/693
         http://www.opensolaris.org/os/community/arc/caselog/2001/693

    [6]  PSARC 2002/294 SCSI LOGICAL UNIT RESET
         http://sac.sfbay/PSARC/2002/294
         http://www.opensolaris.org/os/community/arc/caselog/2002/294

    [7]  PSARC 2003/556 Common Solaris Target Disk Driver (4961447 s10_55)
         http://sac.sfbay/PSARC/2003/556
         http://www.opensolaris.org/os/community/arc/caselog/2003/556

    [8]  PSARC 2006/710 scsi_get_device_type_string
         http://sac.sfbay/PSARC/2006/710
         http://www.opensolaris.org/os/community/arc/caselog/2006/710

    [9]  PSARC 2007/505 [s]sd-config-list version 2 and retry count tuning
         http://sac.sfbay/PSARC/2007/505
         http://www.opensolaris.org/os/community/arc/caselog/2007/505

    [10] JavaScript Object Notation
         http://tools.ietf.org/html/rfc4627
         http://www.json.org

    [11] PSARC 2000/212 libnvpair - A Name Value Pairs Library
         http://sac.sfbay/PSARC/2000/212
         http://www.opensolaris.org/os/community/arc/caselog/2000/212

    [12] EMC Powerpath information

4.8 Manpage changes

    See below.

Appendix A: Bit definitions for current [s]sd-config-list (v1)

    Bit  sd (SPARC)            ssd (SPARC)           sd/ssd (x86/x64)
     0   max_throttle          max_throttle          max_throttle
     1   controller_type       not_ready_retries     controller_type
     2   not_ready_retries     busy_retries          fabricate_device_id
     3   fabricate_device_id   fabricate_device_id   disable_caching
     4   disable_caching       disable_caching       play_BCD
     5   busy_retries          controller_type       read_subchannel_BCD
     6   play_BCD              play_BCD              read_TOC_TRK_BCD
     7   read_subchannel_BCD   read_subchannel_BCD   read_TOC_ADDR_BCD
     8   read_TOC_TRK_BCD      read_TOC_TRK_BCD      no_READ_HDR
     9   read_TOC_ADDR_BCD     read_TOC_ADDR_BCD     read_CD_XD4
    10   no_READ_HDR           no_READ_HDR           not_ready_retries
    11   read_CD_XD4           read_CD_XD4           busy_retries
    12   reset_retries         reset_retries         reset_retries
    13   reserv_release_time   reserv_release_time   reserv_release_time
    14   TUR_check             TUR_check             TUR_check
    15   min_throttle          min_throttle          min_throttle
    16   disable_disksort      disable_disksort      disable_disksort
    17   enable_LUN_reset      enable_LUN_reset      enable_LUN_reset
    18   cache_is_nonvolatile  cache_is_nonvolatile  cache_is_nonvolatile

Appendix B: Mapping of new tunable 'names' to current pseudo-names above

    NEW:                          CURRENT:
    cache-nonvolatile             cache_is_nonvolatile
    controller-type               controller_type
    delay-busy
    disksort                      disable_disksort
    timeout-releasereservation    reservation_release_time
    reset-lun                     enable_LUN_reset
    retries-busy                  busy_retries
    retries-timeout
    retries-notready              not_ready_retries
    retries-reset                 reset_retries
    throttle-max                  max_throttle
    throttle-min                  min_throttle
                                  TUR_check
                                  disable_caching
                                  fabricate_device_id
                                  no_READ_HDR
                                  play_BCD
                                  read_CD_XD4
                                  read_TOC_ADDR_BCD
                                  read_TOC_TRK_BCD
                                  read_subchannel_BCD

Appendix C: Changes to sd(7D)

--- sd.orig     Fri Jul 18 10:46:12 2008
+++ sd.new      Fri Jul 18 11:21:16 2008
@@ -216,10 +216,54 @@
      (Note: the default behavior for the SPARC-based sd driver
      prior to Solaris 9 was not to bind to optical devices.)

+     In addition to the above properties, some device-specific
+     tunables can be configured in sd.conf using the 'sd-config-list'
+     global property. The value of this property is a list of
+     duplets. The formal syntax is:
+
+       sd-config-list = <duplet> [, <duplet> ]* ;
+
+     where
+
+       <duplet> := "<vid+pid>" , "<tunable-list>"
+
+     and
+
+       <tunable-list> := <tunable> [, <tunable> ]*;
+       <tunable> = <name> : <value>
+
+     The <vid+pid> is the string that is returned by the target device
+     on a SCSI inquiry command.
+
+     The <tunable-list> contains one or more tunables to apply to
+     the target devices with the specified <vid+pid>.
+
+     Each <tunable> is a "<name> : <value>" pair. Supported
+     tunable names are:
+
+       delay-busy: when busy, nsecs of delay before retry.
+
+       retries-timeout: retries to perform on an IO timeout.
+
+EXAMPLES
+
+     The following is an example of a global sd-config-list property:
+
+       sd-config-list =
+          "SUN T4", "delay-busy:600, retries-timeout:6",
+          "SUN StorEdge_3510", "retries-timeout:3";
+
 FILES
      /kernel/drv/sd.conf     driver configuration file
      /dev/dsk/cntndnsn       block files

Appendix D: Changes to ssd(7D)

--- ssd.orig    Fri Jul 18 10:46:37 2008
+++ ssd.new     Fri Jul 18 11:22:05 2008
@@ -97,10 +97,54 @@
      iostat(1M) even though the -p/-P option is specified.
      Regardless of this setting, disk IO statis-
      tics are always maintained.

+     In addition to the above properties, some device-specific
+     tunables can be configured in ssd.conf using the 'ssd-config-list'
+     global property. The value of this property is a list of
+     duplets. The formal syntax is:
+
+       ssd-config-list = <duplet> [, <duplet> ]* ;
+
+     where
+
+       <duplet> := "<vid+pid>" , "<tunable-list>"
+
+     and
+
+       <tunable-list> := <tunable> [, <tunable> ]*;
+       <tunable> = <name> : <value>
+
+     The <vid+pid> is the string that is returned by the target device
+     on a SCSI inquiry command.
+
+     The <tunable-list> contains one or more tunables to apply to
+     the target devices with the specified <vid+pid>.
+
+     Each <tunable> is a "<name> : <value>" pair. Supported
+     tunable names are:
+
+       delay-busy: when busy, nsecs of delay before retry.
+
+       retries-timeout: retries to perform on an IO timeout.
+
+EXAMPLES
+
+     The following is an example of a global ssd-config-list property:
+
+       ssd-config-list =
+          "SUN T4", "delay-busy:600, retries-timeout:6",
+          "SUN StorEdge_3510", "retries-timeout:3";
+
 FILES
      ssd.conf                Driver configuration file
      /dev/dsk/cntndnsn       block files

6. Resources and Schedule
    6.4. Steering Committee requested information
        6.4.1. Consolidation C-team Name:
                ON
    6.5. ARC review type: FastTrack
    6.6. ARC Exposure: open