From sacadmin Thu Jan 11 18:20:00 2007
Received: from zion.eng.sun.com (zion.SFBay.Sun.COM [129.146.17.75])
    by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0C2K0sQ028879
    for ; Thu, 11 Jan 2007 18:20:00 -0800 (PST)
Received: from [129.146.228.109] (cyber [129.146.228.109])
    by zion.eng.sun.com (8.13.7+Sun/8.13.7) with ESMTP id l0C2K0No000833;
    Thu, 11 Jan 2007 18:20:00 -0800 (PST)
Message-ID: <45A6F043.9000500@Sun.COM>
Date: Thu, 11 Jan 2007 18:19:47 -0800
From: Bart Smaalders
Organization: Sun Microsystems
User-Agent: Thunderbird 1.5.0.8 (X11/20061204)
MIME-Version: 1.0
To: psarc@sac.sfbay.sun.com
CC: prakash.sangappa@Sun.COM, darren.kenny@Sun.COM, alan.bateman@Sun.COM,
    Michen.Chang@Sun.COM, Doug.Leavitt@Sun.COM
Subject: PSARC/2007/027 File Events Notification API
Content-Type: multipart/mixed; boundary="------------050609070503000105010101"
Status: RO
Content-Length: 20868

This is a multi-part message in MIME format.
--------------050609070503000105010101
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

I'm sponsoring the attached fast track for Prakash Sangappa.
The requested release binding is minor, and the proposed timeout
is 1/17/2007. The user interfaces described herein extend the
ones introduced in PSARC/2002/498 and have the same stability
levels.

- Bart

--
Bart Smaalders              Solaris Kernel Performance
barts@cyber.eng.sun.com     http://blogs.sun.com/barts

--------------050609070503000105010101
Content-Type: text/plain; name="file-events.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="file-events.txt"

Copyright 2007 Sun Microsystems, Inc.

1. Introduction
    1.1. Project/Component Working Name:
         File Events Notification API

    1.2. Name of Document Author/Supplier: Prakash Sangappa

    1.3. Date of This Document: 12/8/06

    1.4. Name of Major Document Customer(s)/Consumer(s):
        1.4.2. The ARC(s) you expect to review your project: PSARC
        1.4.4. The name of your business unit: OPG

    1.5. Email Aliases:
        1.5.1. Responsible Manager: michael.pogue@sun.com
        1.5.2. Responsible Engineer: prakash.sangappa@sun.com
        1.5.3. Marketing Manager: N/A
        1.5.4. Interest List: bart.smaalders@sun.com
                              darren.kenny@sun.com
                              alan.bateman@sun.com
                              Michen.Chang@Sun.COM
                              Doug.Leavitt@Sun.COM

2. Project Summary
    2.1. Project Description:

        This project delivers an API for receiving event notifications
        when file or directory status changes. The API is based on the
        event ports interfaces (PSARC 2002/498). The file events
        notification facility will be added as a new event source to the
        event ports framework.

    2.2. Risks and Assumptions:

        The file events notifications may not be accurate on distributed
        file systems like NFS, or on file systems which do not update the
        time stamps (e.g. a file system mounted with the 'no access time
        update' option, or a read-only file system).

3. Business Summary
    3.1. Problem Area:

        Some applications need to monitor files and directories for
        changes caused by non-communicating processes. The current method
        is to periodically stat them, which is inefficient. Therefore there
        is a need for a file/directory monitoring facility. This facility
        will allow applications to monitor files and directories and
        receive notification when their status changes.

    3.4. Competitive Analysis:

        Linux (inotify, dnotify), SGI (imon, FAM) and Mac OS all have some
        flavor of file events notification mechanism. There are some user
        land file monitoring services implemented using the kernel file
        events notification API. FAM (file alteration monitoring) from SGI
        and Gamin, which is a simplified version of FAM, are user land
        implementations of file monitoring services.

4. Technical Description:
    4.1. Details:

        The file events notification facility is implemented as a new event
        source (PORT_SOURCE_FILE) under the event ports framework. The API
        is based on the event ports (PSARC 2002/498) API. The object and
        the event types for this event source are described in the man page
        changes below.

        Other implementations of file events notification, such as Linux's
        inotify/dnotify, support queuing events in the kernel, and the
        events provide additional context (like the name of the file
        created/deleted).

        As has been discussed on the perf-discuss@opensolaris.org alias,
        queuing events can cause scalability issues. On a large multiuser
        system, there can be many file operations occurring; as a result
        the kernel may generate events at a faster rate than the rate at
        which the application can process them, forcing the events to be
        queued and thus locking down kernel memory. Since a limit must be
        imposed on the number of events that can be queued, the application
        will have to implement a fallback method to handle events missed
        due to overflow.

        Some applications, like Beagle and Spotlight (both desktop search
        tools), require watching all the file and directory activity under
        a given path (directory tree), so that whenever files get
        modified/created/deleted, the search indices for those files get
        updated. For example, the desktop search application 'Beagle' on
        Linux uses 'inotify' to watch the directory tree. It walks the
        directory tree registering a file monitor on each file and
        directory under it. On a large multiuser system with a large number
        of files and directories, this approach does not scale, since
        monitoring very large filesystems will require an inordinate amount
        of system memory. System scaling trends imply that available
        storage is growing much faster than system memory.

        A better solution for such use cases would be to have the
        filesystem provide a mechanism/interface which would provide the
        list of files and directories that have been added, modified or
        deleted since some given time in the past. It appears possible on
        ZFS to provide this functionality by allowing the user to get the
        difference between two snapshots; this is anticipated to take time
        on the order of the number of files changed, allowing even
        arbitrarily large filesystems to be indexed.
        This project does not propose these interfaces at present; we just
        want to point out what we know we're not addressing.

        In the approach we are taking with the file events notification
        API, there will be no queuing of events. The event types delivered
        represent changes to the file's 'access', 'modification' and
        'change' time stamps. The events do not provide any other details.
        This approach is in accordance with what the application can find
        out by statting a file and comparing its time stamps. The goal is
        to eliminate the need for applications (such as Nautilus, the Gnome
        file manager, or daemons monitoring config files) to periodically
        stat the files of interest.

        Section 2 of the man pages describes the system calls that update
        the file/directory time stamps. The vnode operations corresponding
        to these system calls are intercepted and the relevant events
        delivered. The file event monitoring (FEM - PSARC 2003/172) hooks
        are used to intercept the vnode operations.

        There can be only one event outstanding per file or directory that
        is being monitored, i.e. upon delivering an event, the file monitor
        is disabled. The file or directory needs to be re-associated to
        reactivate monitoring of the file and receive the next event. To
        ensure that no events get missed in between, time stamps are used.
        The application has to pass the time stamps collected from a
        stat(2) call at the time of registering the file monitor. The time
        stamps passed in are compared against the current time stamps of
        the file and, if they have changed, the relevant events are
        delivered immediately. This behavior enables multithreaded
        programming using the file events notification API. It will also
        help filter out redundant events.

        Example:

        A multithreaded application can have a pool of threads processing
        file modification events.
        If file events were to be continuously delivered after a single
        registration and a file of interest was written to multiple times,
        multiple threads would receive change notification events and
        proceed to process them. This would force these threads to
        synchronize with each other.

        Note that when only one event is delivered and the file monitor
        gets disabled, one thread will be able to collect an event from a
        file and process it. While this thread is processing the file, no
        other thread will process the same file, because the file monitor
        is disabled and no new events are delivered until the file is
        re-associated and file monitoring is reactivated. There is no need
        for any type of synchronization, as only one thread processes the
        file at a time.

        Another useful aspect of this design is that rapid writes result in
        a much reduced set of file notification events; the monitoring
        application is never subject to a flood of events, even if it runs
        very slowly.

        The following code snippet illustrates how a multithreaded
        application with a pool of worker threads can use this file events
        notification API to process file status change events.

        #include <sys/types.h>
        #include <sys/stat.h>
        #include <port.h>

        /*
         * To initiate watching a file, this function can be called
         * once. The file_obj_t structure is initialized with the file
         * name. The fobj pointer will be passed as the user pointer
         * to be returned with the event. The 'port' is the
         * event port fd obtained from a port_create(3C) call.
         */
        int
        watchfile(int port, file_obj_t *fobj, int events)
        {
                struct stat sbuf;

                /* Record the current time stamps for the association. */
                if (stat(fobj->fo_name, &sbuf) == -1)
                        return (-1);
                fobj->fo_atime = sbuf.st_atim;
                fobj->fo_mtime = sbuf.st_mtim;
                fobj->fo_ctime = sbuf.st_ctim;
                return (port_associate(port, PORT_SOURCE_FILE,
                    (uintptr_t)fobj, events, fobj));
        }

        /*
         * Application threads that process file events call
         * this function. The file name is in the file_obj_t.
         * This 'fobj' would be passed in as the 'user pointer'
         * to be returned along with the event.
         */
        void
        wait_for_fileevents(int port, int events)
        {
                port_event_t pe;

                while (1) {
                        file_obj_t *fobj;

                        if (port_get(port, &pe, NULL) == -1)
                                return;
                        /*
                         * Check for exception events and process file.
                         */
                        if (!(pe.portev_events & FILE_EXCEPTION)) {
                                /* Re-associate to receive the next event. */
                                fobj = (file_obj_t *)pe.portev_user;
                                if (watchfile(port, fobj, events) == -1)
                                        return;
                        }
                }
        }

    4.2. Bug/RFE Number(s):

        6367770 add user land interface to fem (file event monitoring)
        4667502 need file system event notification framework for Solaris

    4.3. In Scope:
        N/A

    4.4. Out of Scope:
        N/A

    4.5. Interfaces:

        Proposed man page changes:
        --------------------------

        Changes to port_create(3C) man page:

          source              object type      association mechanism
          PORT_SOURCE_AIO     struct aiocb     aio_read(3RT), aio_write(3RT),
                                               lio_listio(3RT)
          PORT_SOURCE_FD      file descriptor  port_associate(3C)
          PORT_SOURCE_MQ      mqd_t            mq_notify(3RT)
          PORT_SOURCE_TIMER   timer_t          timer_create(3RT)
          PORT_SOURCE_USER    uintptr_t        port_send(3C)
          PORT_SOURCE_ALERT   uintptr_t        port_alert(3C)
        + PORT_SOURCE_FILE    file_obj_t       port_associate(3C)

          ...

        + PORT_SOURCE_FILE events represent file/directory status change. Once
        + an event is delivered, the file object is no longer associated with
        + the port. A file object is associated or re-associated with a port
        + using the port_associate(3C) function.

        Changes to port_associate(3C) man page:

        - The only objects associated with a port by way of the
        - port_associate() function are objects of type
        - PORT_SOURCE_FD. Objects of other types have type-specific
        - association mechanisms. See port_create(3C) for details.

        to

        + The objects that can be associated with a port by way of the
        + port_associate() function are objects of type PORT_SOURCE_FD
        + and PORT_SOURCE_FILE. Objects of other types have type-specific
        + association mechanisms. See port_create(3C) for details.

        Add the following to port_associate(3C) man page:

        Objects of type PORT_SOURCE_FILE are pointers to the structure
        file_obj defined in .
        This event source provides event notification when the specified
        file/directory is accessed or modified, or when its status changes.
        The path name of the file/directory to be watched is passed in the
        'struct file_obj' along with the 'access', 'modification', and
        'change' time stamps acquired from a stat(2) call. If the file name
        is a symbolic link, it is not followed; i.e. the link itself is
        monitored.

        The struct file_obj contains the following elements:

            timestruc_t fo_atime;  /* Access time got from stat() */
            timestruc_t fo_mtime;  /* Modification time from stat() */
            timestruc_t fo_ctime;  /* Change time from stat() */
            char        *fo_name;  /* Pointer to a null terminated path name */

        At the time the port_associate() function is called, the time
        stamps passed in the structure file_obj are compared with the file
        or directory's current time stamps and, if there has been a change,
        an event is immediately sent to the port. If not, an event will be
        sent when such a change occurs.

        The event types that can be specified at port_associate() time for
        PORT_SOURCE_FILE are FILE_ACCESS, FILE_MODIFIED and FILE_ATTRIB,
        corresponding to the three time stamps. An atime change will result
        in the FILE_ACCESS event, an mtime change will result in the
        FILE_MODIFIED event, and a ctime change will result in the
        FILE_ATTRIB event.

        The following exception events are delivered when they occur. These
        event types cannot be filtered.

            FILE_DELETE       /* Monitored file/directory was deleted */
            FILE_RENAME_TO    /* Monitored file/directory was renamed */
            FILE_RENAME_FROM  /* Monitored file/directory was renamed */
            UNMOUNTED         /* Monitored file system got unmounted */

        At most one event notification will be generated per associated
        'file_obj'. When the event for the associated 'file_obj' is
        retrieved, the object is no longer associated with the port. The
        event can be processed without the possibility that another thread
        can retrieve a subsequent event for the same object.
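
The time stamp comparison described above can be sketched in portable C.
This is only an illustration, not the proposed kernel implementation: the
EV_* flag values, struct stamps and pending_events() are hypothetical names
standing in for the FILE_* event types and the fo_atime/fo_mtime/fo_ctime
members of struct file_obj, and masking the result by the requested events
is an assumption about how "relevant" events are selected.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical flag values for illustration only; the real FILE_*
 * constants come from the system headers and may differ.
 */
enum {
    EV_ACCESS   = 0x1,  /* stands in for FILE_ACCESS   (atime changed) */
    EV_MODIFIED = 0x2,  /* stands in for FILE_MODIFIED (mtime changed) */
    EV_ATTRIB   = 0x4   /* stands in for FILE_ATTRIB   (ctime changed) */
};

/* Hypothetical mirror of the three time stamps kept in struct file_obj. */
struct stamps {
    struct timespec s_atime;
    struct timespec s_mtime;
    struct timespec s_ctime;
};

/* Two time stamps differ if either the seconds or nanoseconds differ. */
static int
ts_differ(const struct timespec *a, const struct timespec *b)
{
    return (a->tv_sec != b->tv_sec || a->tv_nsec != b->tv_nsec);
}

/*
 * Return the events that would be reported at association time: one bit
 * per time stamp that differs between the values the caller registered
 * and the file's current values, restricted to the events asked for.
 */
int
pending_events(const struct stamps *registered,
    const struct stamps *current, int wanted)
{
    int ev = 0;

    assert(registered != NULL && current != NULL);
    if (ts_differ(&registered->s_atime, &current->s_atime))
        ev |= EV_ACCESS;
    if (ts_differ(&registered->s_mtime, &current->s_mtime))
        ev |= EV_MODIFIED;
    if (ts_differ(&registered->s_ctime, &current->s_ctime))
        ev |= EV_ATTRIB;
    return (ev & wanted);
}
```

A caller would fill 'registered' from the stat(2) values saved when the
file was last examined and 'current' from a fresh stat(2); a non-zero
result corresponds to the case where an event is sent to the port
immediately instead of waiting for a later change.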
        port_associate() can be called to re-associate the file_obj object
        with the port.

        The association is also removed if the port gets closed or when
        port_dissociate() is called.

        Note: On NFS file systems, only events from client side (local)
        access/modifications to files or directories will be delivered.

        Add the following to the ERRORS section of port_associate()

        EACCES   The "source" argument is PORT_SOURCE_FILE and search
                 permission is denied on a component of the path prefix, or
                 the file exists and the permissions corresponding to the
                 "events" argument are denied.

        ENOENT   The "source" argument is PORT_SOURCE_FILE and the file
                 does not exist, or the path prefix does not exist, or the
                 path points to an empty string.

        ENOTSUP  The "source" argument is PORT_SOURCE_FILE and the
                 filesystem on which the specified file resides does not
                 support watching for file events notifications.

        Add the following to the ERRORS section of port_dissociate()

        EINVAL   The "source" argument is PORT_SOURCE_FILE and the
                 specified file is currently not associated with the port
                 (not being watched for file events notifications).

        Changes to the VOP, FEM interfaces
        ----------------------------------

        In order to correctly identify file events on files having hard
        links, it is required to pass the directory vnode pointer and the
        file name component along with the VNEVENT type to the
        VOP_VNEVENT() interface routine.

        Example:

        Suppose a file has the following links

            /tmp/dir1/foo
            /tmp/dir2/foo

        and an application is watching /tmp/dir2/foo for file events.

        When /tmp/dir1/foo gets removed (rm), right now we receive a
        VN_REMOVE vnevent on the vnode. It is not possible to determine
        whether /tmp/dir1/foo or /tmp/dir2/foo got removed. When the link
        count is increased/decreased, the ctime gets updated on the file.
        So the correct event when /tmp/dir1/foo is removed is FILE_ATTRIB,
        indicating a 'ctime' change.
        Whereas if /tmp/dir2/foo gets removed (rm), the application should
        receive a FILE_DELETE event, as the name /tmp/dir2/foo got removed.

        This can be determined if the directory vnode pointer and the file
        name components are passed to the VOP_VNEVENT() interface.

        Modified VOP and supporting FEM interfaces - Consolidation private
        ------------------------------------------------------------------

        Two new arguments added, 'vnode_t *dvp' and 'char *cname'

        VOP_VNEVENT(vnode_t *vp, vnevent_t vnevent, vnode_t *dvp, char *cname)
        fop_vnevent(vnode_t *vp, vnevent_t vnevent, vnode_t *dvp, char *cname)
        vnext_vnevent(femarg_t *vf, vnevent_t vnevent, vnode_t *dvp, char *cname)

        vnevent_rename_src(vnode_t *vp, vnode_t *dvp, char *name)
        vnevent_rename_dest(vnode_t *vp, vnode_t *dvp, char *name)
        vnevent_remove(vnode_t *vp, vnode_t *dvp, char *name)
        vnevent_rmdir(vnode_t *vp, vnode_t *dvp, char *name)

        New VNEVENT types - Consolidation private:
        ------------------------------------------

        VE_CREATE          - Represents a create operation on an already
                             existing file.
        VE_LINK            - The source file of a 'link' system call to file.
        VE_RENAME_DEST_DIR - Destination directory of a rename() operation

        Corresponding new vnevent routines added - Consolidation private:
        -----------------------------------------------------------------

        void vnevent_create(vnode_t *vp)
        void vnevent_link(vnode_t *vp)
        void vnevent_rename_dest_dir(vnode_t *vp)

        New member added to private section of 'vnode.h'
        ------------------------------------------------

        + void *v_fopdata; /* file events notification - private data */

        VNEVENT support in NFS:
        -----------------------

        Added VNEVENTS support to the NFS file system to report client side
        file events. It is used to catch any local (client side) file
        operations on an NFS file system and report file events. Clearly,
        this is not complete, as it will not be able to catch any of the
        server side file operations. This is documented in the man page.

    4.6. Doc Impact:
        port_associate(3C) - man page
        port_create(3C) - man page

5. Reference Documents:
        project page - http://perf.eng.sun.com/twiki/bin/view/EventPorts/EPFileEvents
        PSARC/2002/498 - Event Completion Framework
        PSARC/2003/172 - FEM (File event Monitoring)
        PSARC/2004/170 - VOP_VNEVENT()

6. Resources and Schedule:
    6.1. Projected Availability:
        S11

    6.2. Cost of Effort:
        Development is largely done.
        Test case development - 1 week.

    6.3. Cost of Capital Resources:
        N/A

    6.4. Product Approval Committee requested information:
        6.4.1. Consolidation or Component Name: ON
        6.4.3. Type of CPT Review and Approval expected: RFE
        6.4.4. Project Boundary Conditions: N/A
        6.4.5. Is this a necessary project for OEM agreements: No
        6.4.6. Notes: N/A
        6.4.7. Target RTI Date/Release:
        6.4.8. Target Code Design Review Date:
        6.4.9. Update approval addition: No

    6.5. ARC review type: FastTrack

7. Prototype Availability:
    7.1. Prototype Availability:
        now

--------------050609070503000105010101--

From sacadmin Fri Jan 12 10:54:48 2007
Received: from jurassic.eng.sun.com (jurassic.SFBay.Sun.COM [129.146.17.57])
    by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0CIsmjs017024
    for ; Fri, 12 Jan 2007 10:54:48 -0800 (PST)
Received: from [129.145.154.94] (sr1-umpk-44.SFBay.Sun.COM [129.145.154.94])
    by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l0CIsl3Q731455;
    Fri, 12 Jan 2007 10:54:47 -0800 (PST)
Message-ID: <45A7D977.4010801@Sun.COM>
Date: Fri, 12 Jan 2007 10:54:47 -0800
From: Kais Belgaied
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4v; en-US; rv:1.7) Gecko/20060120
X-Accept-Language: ar-eg, en-us, en, ar, ar-dz, ar-bh, ar-iq, ar-jo, ar-kw,
    ar-lb, ar-ly, ar-ma, ar-om, ar-qa, ar-sa, ar-sy, ar-tn, ar-ae, ar-ye
MIME-Version: 1.0
To: Bart Smaalders
CC: psarc@sac.sfbay.sun.com, prakash.sangappa@Sun.COM, darren.kenny@Sun.COM,
    alan.bateman@Sun.COM, Michen.Chang@Sun.COM, Doug.Leavitt@Sun.COM
Subject: Re: PSARC/2007/027 File Events Notification API
References: <45A6F043.9000500@Sun.COM>
In-Reply-To: <45A6F043.9000500@Sun.COM>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Status: RO
Content-Length: 1291

Is this case delivering changes to any consumer for this API?
Many subsystems in Solaris have their own internal monitor-config-change
then refresh/reload logic and could be ported to this.

> There can be only one event outstanding per file or directory that is
> being monitored, i.e upon delivering an event, the file monitor is disabled.
> The file or directory needs to be re-associated to activate monitoring the
> file and receive the next event. To ensure that no events get missed in
>
>

Does this mean at most one application can be monitoring a given file or
directory at any given time?

> between, time stamps are used. The application has to pass the time stamps
> collected from a stat(2) call at the time of registering the file monitor.
> The time stamps passed in are compared against the current time stamps of
> the file and if they have changed, relevant events are delivered
> immediately. This behavior enables multithreaded programming using
> the file events notification API. It will also help filter out redundant
> events.
>
>
>

Are there any auditing considerations here? When an application signs up
for event notification on a file that it has no stat access to? Or when the
event is actually delivered?

Kais.

From sacadmin Fri Jan 12 11:41:12 2007
Received: from jurassic.eng.sun.com (jurassic.SFBay.Sun.COM [129.146.17.57])
    by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0CJfCrO018002
    for ; Fri, 12 Jan 2007 11:41:12 -0800 (PST)
Received: from [129.146.228.98] (justforkicks.Eng.Sun.COM [129.146.228.98])
    by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l0CJfBOd745409;
    Fri, 12 Jan 2007 11:41:11 -0800 (PST)
Message-ID: <45A7E387.8040406@sun.com>
Date: Fri, 12 Jan 2007 11:37:43 -0800
From: Prakash Sangappa
User-Agent: Mail/News 1.5.0.4 (X11/20060701)
MIME-Version: 1.0
To: Kais Belgaied
CC: Bart Smaalders , psarc@sac.sfbay.sun.com, darren.kenny@sun.com,
    alan.bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com
Subject: Re: PSARC/2007/027 File Events Notification API
References: <45A6F043.9000500@Sun.COM> <45A7D977.4010801@Sun.COM>
In-Reply-To: <45A7D977.4010801@Sun.COM>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Status: RO
Content-Length: 2023

Kais Belgaied wrote:
> Is this case delivering changes to any consumer for this API?
> Many subsystems in Solaris have their own internal
> monitor-config-change then refresh/reload
> logic and could be ported to this.
>

The NSCD daemon does exactly that. The plan is to make NSCD use this API
and deliver it. I am working with Michen Chang and Doug Leavitt to make
the necessary changes to NSCD.

>> There can be only one event outstanding per file or directory that is
>> being monitored, i.e upon delivering an event, the file monitor is
>> disabled.
>> The file or directory needs to be re-associated to activate
>> monitoring the
>> file and receive the next event. To ensure that no events get
>> missed in
>>
>>
>
> Does this mean at most one application can be monitoring a given file
> or directory
> at any given time?

No, there will be one event outstanding per file events watch registration.
The file events registrations are per process.
Therefore more than one process can be watching the same file. When this
file's status changes, each of these processes will receive an event
notification, depending on the event types the process is interested in.

>
>> between, time stamps are used. The application has to pass the
>> time stamps
>> collected from a stat(2) call at the time of registering the file
>> monitor.
>> The time stamps passed in are compared against the current time
>> stamps of
>> the file and if they have changed, relevant events are delivered
>> immediately. This behavior enables multithreaded programming using
>> the file events notification API. It will also help filter out
>> redundant
>> events.
>>
>>
>>
>
> Are there any auditing considerations here? when a application signs
> up for event notification on a file
> that it has no stat access to? or when the event is actually delivered?
>

No. Is this a requirement? The behavior should be similar to what happens
when an application stats a file.

-Prakash.

> Kais.

From sacadmin Fri Jan 12 11:49:44 2007
Received: from binky.Central.Sun.COM (binky.Central.Sun.COM [129.153.128.104])
    by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0CJnhdk018240
    for ; Fri, 12 Jan 2007 11:49:43 -0800 (PST)
Received: from binky.Central.Sun.COM (localhost [127.0.0.1])
    by binky.Central.Sun.COM (8.13.6+Sun/8.13.6) with ESMTP id l0CJnN1V012752;
    Fri, 12 Jan 2007 13:49:23 -0600 (CST)
Received: (from nw141292@localhost)
    by binky.Central.Sun.COM (8.13.6+Sun/8.13.6/Submit) id l0CJnNKn012751;
    Fri, 12 Jan 2007 13:49:23 -0600 (CST)
Date: Fri, 12 Jan 2007 13:49:23 -0600
From: Nicolas Williams
To: Kais Belgaied
Cc: Bart Smaalders , psarc@sac.sfbay.sun.com, prakash.sangappa@sun.com,
    Darren.Kenny@sun.com, Alan.Bateman@sun.com, Michen.Chang@sun.com,
    Doug.Leavitt@sun.com
Subject: Re: PSARC/2007/027 File Events Notification API
Message-ID: <20070112194922.GY1010@binky.Central.Sun.COM>
References: <45A6F043.9000500@Sun.COM> <45A7D977.4010801@Sun.COM>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <45A7D977.4010801@Sun.COM>
User-Agent: Mutt/1.5.7i
Status: RO
Content-Length: 992

On Fri, Jan 12, 2007 at 10:54:47AM -0800, Kais Belgaied wrote:
> Is this case delivering changes to any consumer for this API?
> Many subsystems in Solaris have their own internal monitor-config-change
> then refresh/reload
> logic and could be ported to this.

If only one Solaris component were to be modified initially to use this
facility, then that component should be SMF (for file: dependencies
FMRIs). Please, pretty please, with sugar on top.

> Are there any auditing considerations here? when a application signs up
> for event notification on a file
> that it has no stat access to? or when the event is actually delivered?

I would imagine that normal file permissions/ACLs should take care of
this.
Whereas an API where the app could get events for an entire filesystem
without having to register every node in the filesystem should probably
require privilege, rather than normal access controls (or perhaps only
perform access controls on the FS root directory).

Nico
--

From sacadmin Fri Jan 12 18:11:45 2007
Received: from jurassic.eng.sun.com (jurassic.SFBay.Sun.COM [129.146.226.130])
    by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0D2Bglg001505
    for ; Fri, 12 Jan 2007 18:11:45 -0800 (PST)
Received: from [129.145.154.94] (sr1-umpk-44.SFBay.Sun.COM [129.145.154.94])
    by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l0D2BV15838907;
    Fri, 12 Jan 2007 18:11:32 -0800 (PST)
Message-ID: <45A83FD3.4020909@Sun.COM>
Date: Fri, 12 Jan 2007 18:11:31 -0800
From: Kais Belgaied
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4v; en-US; rv:1.7) Gecko/20060120
X-Accept-Language: ar-eg, en-us, en, ar, ar-dz, ar-bh, ar-iq, ar-jo, ar-kw,
    ar-lb, ar-ly, ar-ma, ar-om, ar-qa, ar-sa, ar-sy, ar-tn, ar-ae, ar-ye
MIME-Version: 1.0
To: Prakash Sangappa
CC: Bart Smaalders , psarc@sac.sfbay.sun.com, darren.kenny@Sun.COM,
    alan.bateman@Sun.COM, Michen.Chang@Sun.COM, Doug.Leavitt@Sun.COM
Subject: Re: PSARC/2007/027 File Events Notification API
References: <45A6F043.9000500@Sun.COM> <45A7D977.4010801@Sun.COM>
    <45A7E387.8040406@sun.com>
In-Reply-To: <45A7E387.8040406@sun.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Status: RO
Content-Length: 636

>>
>> Are there any auditing considerations here? when a application signs
>> up for event notification on a file
>> that it has no stat access to? or when the event is actually delivered?
>>
>
> No, Is this a requirement? The behavior should be similar to what
> happens
> when an application stat's a file.

If you're adding a mechanism for processes to get the same information
about files as what they would by directly stat()'ing the file, then the
same audit record ought to be generated.
You don't want the file event notification to be a way to circumvent
getting audited.

Kais

>
> -Prakash.
>
>
>> Kais.
>
>

From sacadmin Tue Jan 16 09:32:19 2007
Received: from jurassic.eng.sun.com (jurassic.SFBay.Sun.COM [129.146.108.31])
    by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0GHWJ3x004632
    for ; Tue, 16 Jan 2007 09:32:19 -0800 (PST)
Received: from [192.9.61.136] (punchin-client-192-9-61-136.SFBay.Sun.COM [192.9.61.136])
    by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l0GHWJOW242512;
    Tue, 16 Jan 2007 09:32:19 -0800 (PST)
Message-ID: <45AD0C20.3010505@sun.com>
Date: Tue, 16 Jan 2007 09:32:16 -0800
From: prakash sangappa
User-Agent: Thunderbird 1.5 (X11/20060113)
MIME-Version: 1.0
To: Kais Belgaied
CC: Bart Smaalders , psarc@sac.sfbay.sun.com, darren.kenny@sun.com,
    alan.bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com
Subject: Re: PSARC/2007/027 File Events Notification API
References: <45A6F043.9000500@Sun.COM> <45A7D977.4010801@Sun.COM>
    <45A7E387.8040406@sun.com> <45A83FD3.4020909@Sun.COM>
In-Reply-To: <45A83FD3.4020909@Sun.COM>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Status: RO
Content-Length: 442

Kais Belgaied wrote:
>
>
> If you're adding a mechanism for processes to get the same information
> about files as
> what they would by directly stat()'ing the file, then the same audit
> record ought to be generated.
> you don't want the file event notification to be a way to circumvent
> getting audited.
>

Ok, I will add the appropriate audit calls to the code path.

-Prakash.

> Kais
>
>>
>> -Prakash.
>>
>>
>>> Kais.
>>
>>

From sacadmin Tue Jan 16 09:39:12 2007
Received: from sfbaymail1sca.SFBay.Sun.COM (sfbaymail1sca.SFBay.Sun.COM [129.145.154.35])
    by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0GHdClx005104
    for ; Tue, 16 Jan 2007 09:39:12 -0800 (PST)
Received: from gmp-ea-fw-1.sun.com (gmpes-gis-mail-2.UK.Sun.COM [129.156.42.6])
    by sfbaymail1sca.SFBay.Sun.COM (8.13.6+Sun/8.13.6/ENSMAIL,v2.2) with ESMTP id l0GHdB9A026032
    for ; Tue, 16 Jan 2007 09:39:11 -0800 (PST)
Received: from d1-emea-10.sun.com ([192.18.2.120])
    by gmp-ea-fw-1.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l0GHd58G000556
    for ; Tue, 16 Jan 2007 17:39:05 GMT
Received: from conversion-daemon.d1-emea-10.sun.com by d1-emea-10.sun.com
    (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006))
    id <0JBZ00I012CFEY00@d1-emea-10.sun.com>
    (original mail from Darren.Moffat@Sun.COM) for psarc@sac.sfbay.sun.com;
    Tue, 16 Jan 2007 17:39:05 +0000 (GMT)
Received: from [129.156.173.21] by d1-emea-10.sun.com
    (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006))
    with ESMTPSA id <0JBZ006AE2D4HZ10@d1-emea-10.sun.com>;
    Tue, 16 Jan 2007 17:39:05 +0000 (GMT)
Date: Tue, 16 Jan 2007 17:39:04 +0000
From: Darren J Moffat
Subject: Re: PSARC/2007/027 File Events Notification API
In-reply-to: <45AD0C20.3010505@sun.com>
Sender: Darren.Moffat@Sun.COM
To: prakash sangappa
Cc: Kais Belgaied , Bart Smaalders , psarc@sac.sfbay.sun.com,
    Darren.Kenny@Sun.COM, Alan.Bateman@Sun.COM, Michen.Chang@Sun.COM,
    Doug.Leavitt@Sun.COM
Message-id: <45AD0DB8.4080708@Sun.COM>
MIME-version: 1.0
Content-type: text/plain; format=flowed; charset=ISO-8859-1
Content-transfer-encoding: 7BIT
References: <45A6F043.9000500@Sun.COM> <45A7D977.4010801@Sun.COM>
    <45A7E387.8040406@sun.com> <45A83FD3.4020909@Sun.COM>
    <45AD0C20.3010505@sun.com>
User-Agent: Thunderbird 1.5.0.8 (X11/20061128)
Status: RO
Content-Length: 852

prakash sangappa wrote:
> Kais Belgaied wrote:
>>
>>
>> If you're adding a mechanism for processes to get the same information
>> about files as
>> what they would by directly stat()'ing the file, then the same audit
>> record ought to be generated.
>> you don't want the file event notification to be a way to circumvent
>> getting audited.
>>
>
> Ok, I will add the appropriate audit calls to the code path.

It might not be that easy. We first need to determine whether there should
be new audit events or whether reusing the event already used for stat is
acceptable. Please discuss this with audit-core@sun.com; until we have
determined this, the spec for this case isn't complete.

In general, every new system call or modification to one could have an
impact on auditing and may require a new or updated audit event.

--
Darren J Moffat

From sacadmin Tue Jan 16 14:02:47 2007
Received: from sfbaymail1sca.SFBay.Sun.COM (sfbaymail1sca.SFBay.Sun.COM [129.145.154.35])
    by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0GM2lV6015040
    for ; Tue, 16 Jan 2007 14:02:47 -0800 (PST)
Received: from gmp-ea-fw-1.sun.com (gmpes-gis-mail-1.UK.Sun.COM [129.156.42.5])
    by sfbaymail1sca.SFBay.Sun.COM (8.13.6+Sun/8.13.6/ENSMAIL,v2.2) with ESMTP id l0GM2kHN029669
    for ; Tue, 16 Jan 2007 14:02:47 -0800 (PST)
Received: from d1-emea-09.sun.com (d1-emea-09.sun.com [192.18.2.119])
    by gmp-ea-fw-1.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l0GM2fBI019479
    for ; Tue, 16 Jan 2007 22:02:41 GMT
Received: from conversion-daemon.d1-emea-09.sun.com by d1-emea-09.sun.com
    (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006))
    id <0JBZ00601EGO9U00@d1-emea-09.sun.com>
    (original mail from Mark.Phalan@Sun.COM) for psarc@sac.sfbay.sun.com;
    Tue, 16 Jan 2007 22:02:40 +0000 (GMT)
Received: from [192.168.1.34] ([193.85.70.14]) by d1-emea-09.sun.com
    (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006))
    with ESMTPSA id <0JBZ00M01EKE0JN3@d1-emea-09.sun.com>;
    Tue, 16 Jan 2007 22:02:40 +0000 (GMT)
Date: Tue, 16 Jan 2007 22:59:50 +0100
From: Mark Phalan
Subject: Re: PSARC/2007/027 File Events Notification API
In-reply-to: <45A6F043.9000500@Sun.COM>
Sender: Mark.Phalan@Sun.COM To: Bart Smaalders Cc: psarc@sac.sfbay.sun.com, prakash.sangappa@Sun.COM, Darren.Kenny@Sun.COM, Alan.Bateman@Sun.COM, Michen.Chang@Sun.COM, Doug.Leavitt@Sun.COM Message-id: <1168984790.777.12.camel@localhost> MIME-version: 1.0 X-Mailer: Evolution 2.8.1.1 Content-type: text/plain Content-transfer-encoding: 7BIT References: <45A6F043.9000500@Sun.COM> Status: RO Content-Length: 15513 On Thu, 2007-01-11 at 18:19 -0800, Bart Smaalders wrote: > I'm sponsoring the attached fast track for Prakash Sangappa. > The requested release binding is minor, and the proposed timeout > is 1/17/2007. The user interfaces described herein extend the > ones introduced in PSARC/2002/498 and have the same stability > levels. > > - Bart > > plain text document attachment (file-events.txt) > Copyright 2007 Sun Microsystems, Inc. > > 1. Introduction > 1.1. Project/Component Working Name: > File Events Notification API > > 1.2. Name of Document Author/Supplier: Prakash Sangappa > > 1.3. Date of This Document: 12/8/06 > > 1.4. Name of Major Document Customer(s)/Consumer(s): > 1.4.2. The ARC(s) you expect to review your project: PSARC > 1.4.4. The name of your business unit: OPG > > 1.5. Email Aliases: > 1.5.1. Responsible Manager: michael.pogue@sun.com > 1.5.2. Responsible Engineer: prakash.sangappa@sun.com > 1.5.3. Marketing Manager: N/A > 1.5.4. Interest List: bart.smaalders@sun.com > darren.kenny@sun.com > alan.bateman@sun.com > Michen.Chang@Sun.COM > Doug.Leavitt@Sun.COM > > 2. Project Summary > 2.1. Project Description: > > This project delivers API for receiving event notifications when > file or directory status changes. The API is based on the event > ports interfaces(PSARC 2002/498). The file events notification > facility will be added as a new event source to the event ports > framework. > > 2.2. 
Risks and Assumptions: > > The file events notifications may not be accurate on distributed > file systems like NFS, and on file systems which do not update the > time stamps(eg: file system mounted with 'noaccess time' update > option or read only file systems). > > 3. Business Summary > 3.1. Problem Area: > > Some applications have the need to monitor files and directories for > changes caused by non communicating processes. The current method is > to periodically stat them, which is inefficient. Therefore there is > a need for a file/directory monitoring facility. This facility will > allow applications to monitor files and directories and receive > notification when their status changes. > > 3.4. Competitive Analysis: > > Linux(inotify, dnotify), SGI(imon, FAM), Mac OS have flavors of file > events notification mechanism. There are some user land file monitoring > services implemented using the the kernel file events notification > API. The FAM(file alteration monitoring) from SGI and Gamin, which is > a simplified version of FAM, are user land implementations of file > monitoring services. > > 4. Technical Description: > 4.1. Details: > > The file events notification facility is implemented as a new event > source(PORT_SOURCE_FILE) under the events ports framework. The API is > based on the event ports(PSARC 2002/498) API. The object and the event > types for this event source are described in the man page changes below. > > Other implementations of file events notification, like the linux's > inotify/dnotify, support queuing events in the kernel and the events > provide additional context(like the file name created/deleted). > > As it has been discussed on the perf-discuss@opensolaris.org alias, > queuing events can cause scalability issues. 
On a large multiuser system, > there can be many file operations occurring; as a result the > kernel may generate events at a faster rate then the rate at which the > application can process them, forcing the events to be queued > and thus locking down kernel memory. Since a limit must be imposed > on the number of events that can be queued, the application will > have to implement a fall back method to handle missed events due to > overflow. > > Some applications, like Beagle and Spotlight (both desktop search tools) > require watching all the file and directory activity under a given > path(directory tree), so that whenever files get modified/created/deleted, > the search indicies for those files get updated. > > For example, the desktop search application 'Beagle' on Linux uses > 'inotify' to watch the directory tree. It walks the directory tree > registering a file monitor on each file and directory under it. > > On a large multiuser system with a large number of files and directories, > this approach does not scale, since monitoring very large filesystems will > require an inordinate amount of system memory. System scaling trends > imply that available storage is growing much faster than system memory. > A better solution for such use cases will be to have the filesystem > provide a mechanism/interface which would provide the list of files and > directories that have been added, modified or deleted since some given > time in the past. It appears possible on ZFS to provide this functionality > by allowing the user to get the difference between two snapshots; this is > anticipated to take order the number of files changed, allowing even > arbitrarily large filesystems to be indexed. This project does not > propose these interfaces at present; we just want to point out what we > know we're not addressing. > > In the approach we are taking with the file events notifications API, > there will be no queuing of events. 
The event types delivered represent > changes to the file's 'access', 'modification' and 'change' time stamps. > The events do not provide any other details. This approach is in > accordance with what the application can find out by statting a file > and comparing its timestamps. The goal is to eliminate the need for > applications (such as Nautilus, the Gnome file manager, or daemons > monitoring config files) to periodically stat the files of interest. > > The man page section 2 describes the system calls that update the > file/directory time stamps. The vnode operations corresponding these > system calls are intercepted and relevant events delivered. > The file event monitoring (FEM - PSARC 2003/172) hooks are used to > intercept the vnode operations. > > There can be only one event outstanding per file or directory that is > being monitored, i.e upon delivering an event, the file monitor is disabled. > The file or directory needs to be re-associated to activate monitoring the > file and receive the next event. To ensure that no events get missed in > between, time stamps are used. The application has to pass the time stamps > collected from a stat(2) call at the time of registering the file monitor. > The time stamps passed in are compared against the current time stamps of > the file and if they have changed, relevant events are delivered > immediately. This behavior enables multithreaded programming using > the file events notification API. It will also help filter out redundant > events. > > Example: A multithreaded application can have a pool of threads processing > file modification events. If file events were to be continuously > delivered after a single registration and a file of interest > was written to multiple times, multiple threads would receive > change notification events and proceed to process them. This > would force these threads to synchronize with each other. 
Note
> that when only one event is delivered and the file monitor gets
> disabled, one thread will be able to collect an event from a file
> and process it. While this thread is processing the file no other
> thread will process the same file as the file monitor is disabled
> and no new events get delivered until the file is re-associated
> and the file monitoring activated. There will be no need for any
> type of synchronization as only one thread would be processing
> the file at a time. Another useful aspect of this design is that
> rapid writes result in a much reduced set of file notification
> events; the monitoring application is never subject to a flood of
> events even if it runs very slowly.
>
> The following code snippet illustrates how a multithreaded
> application with a pool of worker threads can use this file
> events notification API to process file status change events.
>
> /*
>  * To initiate watching a file, this function can be called
>  * once. The file_obj_t structure is initialized with the file
>  * name. The fobj pointer will be passed as the user pointer
>  * to be returned with the event. The 'port' is the
>  * event port fd obtained from a port_create(3C) call.
>  */
> int
> watchfile(int port, file_obj_t *fobj, int events) {
>     struct stat sbuf;
>
>     /* Without fresh time stamps the association would be meaningless. */
>     if (stat(fobj->fo_name, &sbuf) == -1)
>         return (-1);
>
>     (fobj->fo_name, events);
>
>     /* Record the time stamps the kernel will compare against. */
>     fobj->fo_atime = sbuf.st_atim;
>     fobj->fo_mtime = sbuf.st_mtim;
>     fobj->fo_ctime = sbuf.st_ctim;
>
>     return (port_associate(port, PORT_SOURCE_FILE,
>         (uintptr_t)fobj, events, fobj));
> }
>
> /*
>  * Application threads that process file events call
>  * this function. The file name is in the file_obj_t.
>  * This 'fobj' would be passed in as the 'user pointer'
>  * to be returned along with the event.
>  */
> void
> wait_for_fileevents(int port, int events) {
>
>     port_event_t pe;
>
>     while (1) {
>         struct file_obj *fobj;
>         if (port_get(port, &pe, NULL) == -1)
>             return;
>
>         /*
>          * Check for exception events and process file.
>          */
>         if (!(pe.portev_events & (FILE_EXCEPTION))) {
>             fobj = (file_obj_t *)pe.portev_user;
>
>             /* Re-associate to re-arm the (one-shot) file monitor. */
>             if (watchfile(port, fobj, events) == -1)
>                 return;
>         }
>     }
> }
>
>
> 4.2. Bug/RFE Number(s):
>     6367770 add user land interface to fem (file event monitoring)
>     4667502 need file system event notification framework for Solaris
>
> 4.3. In Scope:
>     N/A
>
> 4.4. Out of Scope:
>     N/A
>
> 4.5. Interfaces:
>
> Proposed man page changes:
> --------------------------
>
> Changes to port_create(3C) man page:
>
>   source             object type       association mechanism
>   PORT_SOURCE_AIO    struct aiocb      aio_read(3RT),
>                                        aio_write(3RT),
>                                        lio_listio(3RT)
>   PORT_SOURCE_FD     file descriptor   port_associate(3C)
>   PORT_SOURCE_MQ     mqd_t             mq_notify(3RT)
>   PORT_SOURCE_TIMER  timer_t           timer_create(3RT)
>   PORT_SOURCE_USER   uintptr_t         port_send(3C)
>   PORT_SOURCE_ALERT  uintptr_t         port_alert(3C)
> + PORT_SOURCE_FILE   file_obj_t        port_associate(3C)
>
> ...
>
> + PORT_SOURCE_FILE events represent file/directory status change. Once
> + an event is delivered, the file object is no longer associated with
> + the port. A file object is associated or re-associated with a port
> + using the port_associate(3C) function.
>
>
> Changes to port_associate(3C) man page:
>
> - The only objects associated with a port by way of the
> - port_associate() function are objects of type
> - PORT_SOURCE_FD. Objects of other types have type-specific
> - association mechanisms. See port_create(3C) for details.
>
> to
>
> + The objects that can be associated with a port by way of the
> + port_associate() function are objects of type PORT_SOURCE_FD
> + and PORT_SOURCE_FILE. Objects of other types have type-specific
> + association mechanisms. See port_create(3C) for details.
>
>
> Add the following to port_associate(3C) man page :
>
> Objects of type PORT_SOURCE_FILE are pointers to the structure
> file_obj defined in . This event source provides
> event notification when the specified file/directory is accessed,
> modified or its status changes.
The path name of the file/directory > to be watched is passed in the 'struct file_obj' along with the > 'access', 'modification', and 'change' time stamps acquired from > a stat(2) call. If the file name is a symbolic link, it is not > followed; e.g. the link is monitored. > > The struct file_obj contains the following elements: > > timestruc_t fo_atime; /* Access time got from stat() */ > timestruc_t fo_mtime; /* Modification time from stat() */ > timestruc_t fo_ctime; /* Change time from stat() */ > char *fo_name; /* Pointer to a null terminated path name */ > > At the time the port_associate function is called, the time stamps > passed in the structure file_obj are compared with the file or > directory's current time stamps and if there has been a change > an event is immediately sent to the port. If not, an event will be > sent when such a change occurs. > > The event types that can be specified at port_associate() time for > the PORT_SOURCE_FILE are FILE_ACCESS, FILE_MODIFIED, FILE_ATTRIB, > corresponding to the three time stamps. A atime change will result > in the FILE_ACCESS event, mtime time change will result in the > FILE_MODIFIED event. The ctime change will result in the FILE_ATTRIB > event. > > Following exception events are delivered when they occur. These > event types cannot be filtered. > > FILE_DELETE /* Monitored file/directory was deleted */ > FILE_RENAME_TO /* Monitored file/directory was renamed */ > FILE_RENAME_FROM /* Monitored file/directory was renamed */ > UNMOUNTED /* Monitored file system got unmounted */ > Can non-existant files be monitored for creation, i.e. set up a watch on a file-path which doesn't exist (yet)? If not, is the recommended way to monitor the directory in which the file might be created? This may not be desirable - think of monitoring /etc/. It might be desireable to have a file creation event. 
-Mark From sacadmin Tue Jan 16 16:52:52 2007 Received: from zion.eng.sun.com (zion.SFBay.Sun.COM [129.146.17.75]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0H0qqrQ020459 for ; Tue, 16 Jan 2007 16:52:52 -0800 (PST) Received: from [129.146.228.109] (cyber [129.146.228.109]) by zion.eng.sun.com (8.13.7+Sun/8.13.7) with ESMTP id l0H0qpNP023913; Tue, 16 Jan 2007 16:52:52 -0800 (PST) Message-ID: <45AD7355.6010003@Sun.COM> Date: Tue, 16 Jan 2007 16:52:37 -0800 From: Bart Smaalders Organization: Sun Microsystems User-Agent: Thunderbird 1.5.0.8 (X11/20061204) MIME-Version: 1.0 To: Mark Phalan CC: psarc@sac.sfbay.sun.com, prakash.sangappa@Sun.COM, Darren.Kenny@Sun.COM, Alan.Bateman@Sun.COM, Michen.Chang@Sun.COM, Doug.Leavitt@Sun.COM Subject: Re: PSARC/2007/027 File Events Notification API References: <45A6F043.9000500@Sun.COM> <1168984790.777.12.camel@localhost> In-Reply-To: <1168984790.777.12.camel@localhost> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Status: RO Content-Length: 743 Mark Phalan wrote: > Can non-existant files be monitored for creation, i.e. set up a watch on > a file-path which doesn't exist (yet)? If not, is the recommended way to > monitor the directory in which the file might be created? This may not > be desirable - think of monitoring /etc/. > It might be desireable to have a file creation event. /etc isn't exactly a hot-bed of activity on my system. We could create an API like this, but we'd end up doing exactly the same thing in the kernel that you'd end up doing in userland - waiting for the directory to change and then re-reading it. Do you have a specific use case in mind? 
- Bart -- Bart Smaalders Solaris Kernel Performance barts@cyber.eng.sun.com http://blogs.sun.com/barts From sacadmin Wed Jan 17 08:29:51 2007 Received: from sfbaymail2sca.sfbay.sun.com (sfbaymail2sca.SFBay.Sun.COM [129.145.155.42]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0HGTp3R007806 for ; Wed, 17 Jan 2007 08:29:51 -0800 (PST) Received: from gmp-ea-fw-1.sun.com (gmpes-gis-mail-1.UK.Sun.COM [129.156.42.5]) by sfbaymail2sca.sfbay.sun.com (8.13.6+Sun/8.12.10/ENSMAIL,v2.2) with ESMTP id l0HGTof4022505 for ; Wed, 17 Jan 2007 08:29:50 -0800 (PST) Received: from d1-emea-09.sun.com (d1-emea-09.sun.com [192.18.2.119]) by gmp-ea-fw-1.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l0HGTiEB005776 for ; Wed, 17 Jan 2007 16:29:44 GMT Received: from conversion-daemon.d1-emea-09.sun.com by d1-emea-09.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) id <0JC000G01TNT6V00@d1-emea-09.sun.com> (original mail from Mark.Phalan@Sun.COM) for psarc@sac.sfbay.sun.com; Wed, 17 Jan 2007 16:29:44 +0000 (GMT) Received: from [129.157.18.58] by d1-emea-09.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPSA id <0JC000ML5TTI0NJS@d1-emea-09.sun.com>; Wed, 17 Jan 2007 16:29:43 +0000 (GMT) Date: Wed, 17 Jan 2007 17:27:35 +0100 From: Mark Phalan Subject: Re: PSARC/2007/027 File Events Notification API In-reply-to: <45AD7355.6010003@Sun.COM> Sender: Mark.Phalan@Sun.COM To: Bart Smaalders Cc: psarc@sac.sfbay.sun.com, prakash.sangappa@Sun.COM, Darren.Kenny@Sun.COM, Alan.Bateman@Sun.COM, Michen.Chang@Sun.COM, Doug.Leavitt@Sun.COM Message-id: <1169051255.2348.19.camel@phalan.czech.sun.com> MIME-version: 1.0 X-Mailer: Evolution 2.8.1.1 Content-type: text/plain Content-transfer-encoding: 7BIT References: <45A6F043.9000500@Sun.COM> <1168984790.777.12.camel@localhost> <45AD7355.6010003@Sun.COM> Status: RO Content-Length: 1375 On Tue, 2007-01-16 at 16:52 -0800, Bart Smaalders wrote: > Mark Phalan wrote: > > Can non-existant files be monitored for 
creation, i.e. set up a watch on > > a file-path which doesn't exist (yet)? If not, is the recommended way to > > monitor the directory in which the file might be created? This may not > > be desirable - think of monitoring /etc/. > > It might be desireable to have a file creation event. > > > /etc isn't exactly a hot-bed of activity on my system. We could > create an API like this, but we'd end up doing exactly the same thing > in the kernel that you'd end up doing in userland - waiting for the > directory to change and then re-reading it. > Ok. It just seemed like a piece missing from the API. I guess it also depends on how common this scenario is, if everyone ends up doing it it might make sense to have a single implementation. > Do you have a specific use case in mind This came up with gssd which when used with the Kerberos GSS mech relies on /etc/resolv.conf. An SMF file dependency is probably not ideal as not all the GSS mechs rely on /etc/resolv.conf. At the moment gssd needs to be restarted if /etc/resolv.conf pops into existence. It would be nice to be able to get a file creation event for that. I agree that its possible to simply "watch" /etc/ for changes but it seems rather clumsy... -Mark > ? 
> > - Bart > > From sacadmin Wed Jan 17 10:00:21 2007 Received: from zion.eng.sun.com (zion.SFBay.Sun.COM [129.146.17.75]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0HI0Km5010637 for ; Wed, 17 Jan 2007 10:00:20 -0800 (PST) Received: from [129.146.228.109] (cyber [129.146.228.109]) by zion.eng.sun.com (8.13.7+Sun/8.13.7) with ESMTP id l0HI0K4v020266; Wed, 17 Jan 2007 10:00:20 -0800 (PST) Message-ID: <45AE6425.1050007@Sun.COM> Date: Wed, 17 Jan 2007 10:00:05 -0800 From: Bart Smaalders Organization: Sun Microsystems User-Agent: Thunderbird 1.5.0.8 (X11/20061204) MIME-Version: 1.0 To: Mark Phalan CC: psarc@sac.sfbay.sun.com, prakash.sangappa@Sun.COM, Darren.Kenny@Sun.COM, Alan.Bateman@Sun.COM, Michen.Chang@Sun.COM, Doug.Leavitt@Sun.COM Subject: Re: PSARC/2007/027 File Events Notification API References: <45A6F043.9000500@Sun.COM> <1168984790.777.12.camel@localhost> <45AD7355.6010003@Sun.COM> <1169051255.2348.19.camel@phalan.czech.sun.com> In-Reply-To: <1169051255.2348.19.camel@phalan.czech.sun.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Status: RO Content-Length: 2333 Mark Phalan wrote: > On Tue, 2007-01-16 at 16:52 -0800, Bart Smaalders wrote: >> Mark Phalan wrote: >>> Can non-existant files be monitored for creation, i.e. set up a watch on >>> a file-path which doesn't exist (yet)? If not, is the recommended way to >>> monitor the directory in which the file might be created? This may not >>> be desirable - think of monitoring /etc/. >>> It might be desireable to have a file creation event. >> >> /etc isn't exactly a hot-bed of activity on my system. We could >> create an API like this, but we'd end up doing exactly the same thing >> in the kernel that you'd end up doing in userland - waiting for the >> directory to change and then re-reading it. >> > > Ok. It just seemed like a piece missing from the API. 
I guess it also > depends on how common this scenario is, if everyone ends up doing it it > might make sense to have a single implementation. > >> Do you have a specific use case in mind > > This came up with gssd which when used with the Kerberos GSS mech relies > on /etc/resolv.conf. An SMF file dependency is probably not ideal as not > all the GSS mechs rely on /etc/resolv.conf. At the moment gssd needs to > be restarted if /etc/resolv.conf pops into existence. It would be nice > to be able to get a file creation event for that. I agree that its > possible to simply "watch" /etc/ for changes but it seems rather > clumsy... > > -Mark In any case, you're going to have to code it in the same fashion - wait for an event (either that /etc was modified, or that /etc/resolv.conf was created) and then stat the file, and if that succeeds, parse it and see if it is correct and then reconfigure. The whole idea behind this API is that it cuts down on polling needlessly for file changes. It doesn't obviate the need to code defensively; the file can exist for a brief moment and then disappear. # cp /etc/resolv.foo /etc/resolv.conf; rm /etc/resolv.conf can always happen. 
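A minimal sketch of the defensive pattern described above, assuming plain POSIX stat(2): whatever notification woke the program up (a change to /etc, or a hypothetical creation event), it still has to check whether the file is really there before parsing it. The helper name file_ready() is made up for this illustration and is not part of the proposed API.

```c
#include <errno.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not part of the proposed API. Call it after any
 * notification that the watched directory or file may have changed.
 *
 * Returns  1 if 'path' currently names a regular file ('sb' is filled in),
 *          0 if it is absent or is not a regular file,
 *         -1 on any other stat(2) failure (errno is left set).
 */
static int
file_ready(const char *path, struct stat *sb)
{
	if (stat(path, sb) == -1) {
		/* The file may already have been removed again. */
		if (errno == ENOENT || errno == ENOTDIR)
			return (0);
		return (-1);
	}
	return (S_ISREG(sb->st_mode) ? 1 : 0);
}
```

A caller would parse and reconfigure only when file_ready() returns 1, and simply go back to waiting otherwise; the cp-then-rm race above then shows up as an ordinary 0 return rather than an error.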
- Bart -- Bart Smaalders Solaris Kernel Performance barts@cyber.eng.sun.com http://blogs.sun.com/barts From sacadmin Wed Jan 17 10:10:19 2007 Received: from binky.Central.Sun.COM (binky.Central.Sun.COM [129.153.128.104]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0HIAIpB010862 for ; Wed, 17 Jan 2007 10:10:19 -0800 (PST) Received: from binky.Central.Sun.COM (localhost [127.0.0.1]) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6) with ESMTP id l0HI9ut4020543; Wed, 17 Jan 2007 12:09:56 -0600 (CST) Received: (from nw141292@localhost) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6/Submit) id l0HI9uM2020542; Wed, 17 Jan 2007 12:09:56 -0600 (CST) Date: Wed, 17 Jan 2007 12:09:56 -0600 From: Nicolas Williams To: Bart Smaalders Cc: Mark Phalan , psarc@sac.sfbay.sun.com, prakash.sangappa@sun.com, Darren.Kenny@sun.com, Alan.Bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com Subject: Re: PSARC/2007/027 File Events Notification API Message-ID: <20070117180954.GJ1010@binky.Central.Sun.COM> References: <45A6F043.9000500@Sun.COM> <1168984790.777.12.camel@localhost> <45AD7355.6010003@Sun.COM> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <45AD7355.6010003@Sun.COM> User-Agent: Mutt/1.5.7i Status: RO Content-Length: 1674 On Tue, Jan 16, 2007 at 04:52:37PM -0800, Bart Smaalders wrote: > Mark Phalan wrote: > >Can non-existant files be monitored for creation, i.e. set up a watch on > >a file-path which doesn't exist (yet)? If not, is the recommended way to > >monitor the directory in which the file might be created? This may not > >be desirable - think of monitoring /etc/. > >It might be desireable to have a file creation event. > > > /etc isn't exactly a hot-bed of activity on my system. We could > create an API like this, but we'd end up doing exactly the same thing > in the kernel that you'd end up doing in userland - waiting for the > directory to change and then re-reading it. Really? 
But the kernel surely could easily post "file created" events as files are created (I'll call this "directory notifications" because that's what this is called in the NFSv4 community). > Do you have a specific use case in mind? I do: SMF file FMRI dependencies. There's also been plenty of talk of similar features for the NFSv4 protocol, so it may well be important for future NFSv4 server functionality in Solaris (and the client should be able to subscribe to directory notifications as well). Re-reading directories doesn't scale for large directories, but directory notifications don't scale with large metadata change volumes. How robust will the event system be? If a file is opened for write many times in a tight loop, how many events will be delivered? I would think that the system should be able to throttle event firing in order to deal with bursts of activity on a single file. I would expect something similar for directory notifications. Nico -- From sacadmin Wed Jan 17 10:24:20 2007 Received: from zion.eng.sun.com (zion.SFBay.Sun.COM [129.146.17.75]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0HIOKQE011676 for ; Wed, 17 Jan 2007 10:24:20 -0800 (PST) Received: from [129.146.228.109] (cyber [129.146.228.109]) by zion.eng.sun.com (8.13.7+Sun/8.13.7) with ESMTP id l0HIOJh6021723; Wed, 17 Jan 2007 10:24:19 -0800 (PST) Message-ID: <45AE69C4.7080409@Sun.COM> Date: Wed, 17 Jan 2007 10:24:04 -0800 From: Bart Smaalders Organization: Sun Microsystems User-Agent: Thunderbird 1.5.0.8 (X11/20061204) MIME-Version: 1.0 To: Nicolas Williams CC: Mark Phalan , psarc@sac.sfbay.sun.com, prakash.sangappa@Sun.COM, Darren.Kenny@Sun.COM, Alan.Bateman@Sun.COM, Michen.Chang@Sun.COM, Doug.Leavitt@Sun.COM Subject: Re: PSARC/2007/027 File Events Notification API References: <45A6F043.9000500@Sun.COM> <1168984790.777.12.camel@localhost> <45AD7355.6010003@Sun.COM> <20070117180954.GJ1010@binky.Central.Sun.COM> In-Reply-To: <20070117180954.GJ1010@binky.Central.Sun.COM> Content-Type: 
text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Status: RO Content-Length: 768 Nicolas Williams wrote: > I do: SMF file FMRI dependencies. > > There's also been plenty of talk of similar features for the NFSv4 > protocol, so it may well be important for future NFSv4 server > functionality in Solaris (and the client should be able to subscribe to > directory notifications as well). > > Re-reading directories doesn't scale for large directories, but > directory notifications don't scale with large metadata change volumes. > > How robust will the event system be? If a file is opened for write many > times in a tight loop, how many events will be delivered? > Please read the material in the case, this was explained in detail. - Bart -- Bart Smaalders Solaris Kernel Performance barts@cyber.eng.sun.com http://blogs.sun.com/barts From sacadmin Wed Jan 17 11:25:16 2007 Received: from jurassic.eng.sun.com (jurassic.SFBay.Sun.COM [129.146.17.57]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0HJPFQ4014334 for ; Wed, 17 Jan 2007 11:25:16 -0800 (PST) Received: from [129.146.228.98] (justforkicks.Eng.Sun.COM [129.146.228.98]) by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l0HJPBK1796212; Wed, 17 Jan 2007 11:25:12 -0800 (PST) Message-ID: <45AE7740.7010805@sun.com> Date: Wed, 17 Jan 2007 11:21:36 -0800 From: Prakash Sangappa User-Agent: Mail/News 1.5.0.4 (X11/20060701) MIME-Version: 1.0 To: Nicolas Williams CC: Bart Smaalders , Mark Phalan , psarc@sac.sfbay.sun.com, Darren.Kenny@sun.com, Alan.Bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com Subject: Re: PSARC/2007/027 File Events Notification API References: <45A6F043.9000500@Sun.COM> <1168984790.777.12.camel@localhost> <45AD7355.6010003@Sun.COM> <20070117180954.GJ1010@binky.Central.Sun.COM> In-Reply-To: <20070117180954.GJ1010@binky.Central.Sun.COM> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Status: RO Content-Length: 
1125 Nicolas Williams wrote: > On Tue, Jan 16, 2007 at 04:52:37PM -0800, Bart Smaalders wrote: > >> Mark Phalan wrote: >> >>> Can non-existant files be monitored for creation, i.e. set up a watch on >>> a file-path which doesn't exist (yet)? If not, is the recommended way to >>> monitor the directory in which the file might be created? This may not >>> be desirable - think of monitoring /etc/. >>> It might be desireable to have a file creation event. >>> >> /etc isn't exactly a hot-bed of activity on my system. We could >> create an API like this, but we'd end up doing exactly the same thing >> in the kernel that you'd end up doing in userland - waiting for the >> directory to change and then re-reading it. >> > > Really? But the kernel surely could easily post "file > created" events as files are created (I'll call this "directory > notifications" because that's what this is called in the NFSv4 > community). > > > That would mean that we have to post one event for each file being created. This leads to the queuing issues discussed in the case. -Prakash. 
> Nico > From sacadmin Wed Jan 17 11:29:26 2007 Received: from binky.Central.Sun.COM (binky.Central.Sun.COM [129.153.128.104]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0HJTQqt014387 for ; Wed, 17 Jan 2007 11:29:26 -0800 (PST) Received: from binky.Central.Sun.COM (localhost [127.0.0.1]) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6) with ESMTP id l0HJT3am021057; Wed, 17 Jan 2007 13:29:03 -0600 (CST) Received: (from nw141292@localhost) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6/Submit) id l0HJT3CW021056; Wed, 17 Jan 2007 13:29:03 -0600 (CST) Date: Wed, 17 Jan 2007 13:29:03 -0600 From: Nicolas Williams To: Prakash Sangappa Cc: Bart Smaalders , Mark Phalan , psarc@sac.sfbay.sun.com, Darren.Kenny@sun.com, Alan.Bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com Subject: Re: PSARC/2007/027 File Events Notification API Message-ID: <20070117192902.GK1010@binky.Central.Sun.COM> References: <45A6F043.9000500@Sun.COM> <1168984790.777.12.camel@localhost> <45AD7355.6010003@Sun.COM> <20070117180954.GJ1010@binky.Central.Sun.COM> <45AE7740.7010805@sun.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <45AE7740.7010805@sun.com> User-Agent: Mutt/1.5.7i Status: RO Content-Length: 897 On Wed, Jan 17, 2007 at 11:21:36AM -0800, Prakash Sangappa wrote: > Nicolas Williams wrote: > >Really? But the kernel surely could easily post "file > >created" events as files are created (I'll call this "directory > >notifications" because that's what this is called in the NFSv4 > >community). > > > That would mean that we have to post one event for each file being > created. This > leads to the queuing issues discussed in the case. I alluded to that. I believe that it would be acceptable for an interface that provided directory notifications to drop notifications, provided that there is a marker that indicates that such drops happened (so the application can resort to re-reading the directory). 
You really want both kinds of interfaces since re-reading directories isn't all that great when you have a network in the mix and large directories to boot. Nico -- From sacadmin Wed Jan 17 13:41:28 2007 Received: from jurassic.eng.sun.com (jurassic.SFBay.Sun.COM [129.146.228.31]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0HLfSN3017777 for ; Wed, 17 Jan 2007 13:41:28 -0800 (PST) Received: from [129.146.228.98] (justforkicks.Eng.Sun.COM [129.146.228.98]) by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l0HLfS6E838420; Wed, 17 Jan 2007 13:41:28 -0800 (PST) Message-ID: <45AE9730.20607@sun.com> Date: Wed, 17 Jan 2007 13:37:52 -0800 From: Prakash Sangappa User-Agent: Mail/News 1.5.0.4 (X11/20060701) MIME-Version: 1.0 To: Nicolas Williams CC: Bart Smaalders , Mark Phalan , psarc@sac.sfbay.sun.com, Darren.Kenny@sun.com, Alan.Bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com Subject: Re: PSARC/2007/027 File Events Notification API References: <45A6F043.9000500@Sun.COM> <1168984790.777.12.camel@localhost> <45AD7355.6010003@Sun.COM> <20070117180954.GJ1010@binky.Central.Sun.COM> <45AE7740.7010805@sun.com> <20070117192902.GK1010@binky.Central.Sun.COM> In-Reply-To: <20070117192902.GK1010@binky.Central.Sun.COM> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Status: RO Content-Length: 2302 Nicolas Williams wrote: > On Wed, Jan 17, 2007 at 11:21:36AM -0800, Prakash Sangappa wrote: > >> Nicolas Williams wrote: >> >>> Really? But the kernel surely could easily post "file >>> created" events as files are created (I'll call this "directory >>> notifications" because that's what this is called in the NFSv4 >>> community). >>> >>> >> That would mean that we have to post one event for each file being >> created. This >> leads to the queuing issues discussed in the case. >> > > I alluded to that. 
I believe that it would be acceptable for an > interface that provided directory notifications to drop notifications, > provided that there is a marker that indicates that such drops happened > (so the application can resort to re-reading the directory). > > You really want both kinds of interfaces since re-reading directories > isn't all that great when you have a network in the mix and large > directories to boot. > > What is the use case we are talking about? How would the application use this information? How does it operate now, or is this some new requirement? Note that currently, on an NFS file system, only the client-side (local) file events get reported. In addition to file create events, file delete events will have to be sent when files get deleted, in order for this information to be useful. Otherwise the application will have to stat and verify that the file still exists, since a file could get created and deleted before the application can process the event. The issues surrounding event queuing have been described in the case and they have been discussed in detail on the OpenSolaris forum perf-discuss@opensolaris.org. The event ports API does not have provision to return more than just the events. Note the events get collected into the port_event_t structure. A pointer to the port_event_t is passed to port_get()/port_getn(). In order to support providing more context, like the file name, with the events, we will have to extend (or overload) some members of the port_event_t structure where the file name could be copied. The proposed API addresses the need where the application can find out when a file or directory status changes without having to periodically stat them. -Prakash.
> Nico > From sacadmin Wed Jan 17 13:59:29 2007 Received: from zion.eng.sun.com (zion.SFBay.Sun.COM [129.146.17.75]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0HLxTvF018394 for ; Wed, 17 Jan 2007 13:59:29 -0800 (PST) Received: from [129.146.228.109] (cyber [129.146.228.109]) by zion.eng.sun.com (8.13.7+Sun/8.13.7) with ESMTP id l0HLxQMe029745; Wed, 17 Jan 2007 13:59:26 -0800 (PST) Message-ID: <45AE9C2D.40900@Sun.COM> Date: Wed, 17 Jan 2007 13:59:09 -0800 From: Bart Smaalders Organization: Sun Microsystems User-Agent: Thunderbird 1.5.0.8 (X11/20061204) MIME-Version: 1.0 To: Nicolas Williams CC: Prakash Sangappa , Mark Phalan , psarc@sac.sfbay.sun.com, Darren.Kenny@Sun.COM, Alan.Bateman@Sun.COM, Michen.Chang@Sun.COM, Doug.Leavitt@Sun.COM Subject: Re: PSARC/2007/027 File Events Notification API References: <45A6F043.9000500@Sun.COM> <1168984790.777.12.camel@localhost> <45AD7355.6010003@Sun.COM> <20070117180954.GJ1010@binky.Central.Sun.COM> <45AE7740.7010805@sun.com> <20070117192902.GK1010@binky.Central.Sun.COM> In-Reply-To: <20070117192902.GK1010@binky.Central.Sun.COM> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Status: RO Content-Length: 1174 Nicolas Williams wrote: > > You really want both kinds of interfaces since re-reading directories > isn't all that great when you have a network in the mix and large > directories to boot. > The right way to implement this would be to give clients a lease on the directory contents; when those contents are updated from other machines the lease holders would receive either updated directory contents or lease invalidation notifications. In any case, today we have no way of getting notification of events that occur due to changes by other clients of the nfs server; we can only detect local changes. In this case, the directory contents are likely to be cached locally, right? 
As you'll note in the original spec, there is a very strong preference on our part for one port_associate call resulting in at most one event. Designs that break this model do not fit into a multi-threaded programming model, and due to queuing overflow problems require applications to code for two conditions (normal and overflowed event queue), practically insuring bugs. - Bart -- Bart Smaalders Solaris Kernel Performance barts@cyber.eng.sun.com http://blogs.sun.com/barts From sacadmin Wed Jan 17 14:19:59 2007 Received: from binky.Central.Sun.COM (binky.Central.Sun.COM [129.153.128.104]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0HMJwID019001 for ; Wed, 17 Jan 2007 14:19:59 -0800 (PST) Received: from binky.Central.Sun.COM (localhost [127.0.0.1]) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6) with ESMTP id l0HMJaCL022058; Wed, 17 Jan 2007 16:19:36 -0600 (CST) Received: (from nw141292@localhost) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6/Submit) id l0HMJaqW022057; Wed, 17 Jan 2007 16:19:36 -0600 (CST) Date: Wed, 17 Jan 2007 16:19:36 -0600 From: Nicolas Williams To: Prakash Sangappa Cc: Bart Smaalders , Mark Phalan , psarc@sac.sfbay.sun.com, Darren.Kenny@sun.com, Alan.Bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com Subject: Re: PSARC/2007/027 File Events Notification API Message-ID: <20070117221936.GO1010@binky.Central.Sun.COM> References: <45A6F043.9000500@Sun.COM> <1168984790.777.12.camel@localhost> <45AD7355.6010003@Sun.COM> <20070117180954.GJ1010@binky.Central.Sun.COM> <45AE7740.7010805@sun.com> <20070117192902.GK1010@binky.Central.Sun.COM> <45AE9730.20607@sun.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <45AE9730.20607@sun.com> User-Agent: Mutt/1.5.7i Status: RO Content-Length: 1093 On Wed, Jan 17, 2007 at 01:37:52PM -0800, Prakash Sangappa wrote: > >>Nicolas Williams wrote: > >>>Really? 
But the kernel surely could easily post "file > >>>created" events as files are created (I'll call this "directory > >>>notifications" because that's what this is called in the NFSv4 > >>>community). > What is the use case we are talking about?. How would the application use > this information? How does it operate now or this is some new > requirement . So, the concern is scalability w.r.t. very large directories. Re-reading on every event is likely to be painful. Both approaches to filesystem event monitoring, whether by delivery of simple "this object changed" or complex "this object changed as follows," have scalability issues. My question is: have these been considered? (A mixed system that degenerates into "this object changed" notifications under change volume pressure seems like a better system, but that isn't obvious; perhaps you've studied the matter and concluded that scaling well with directory size simply isn't achievable.) Nico -- From sacadmin Wed Jan 17 14:34:25 2007 Received: from binky.Central.Sun.COM (binky.Central.Sun.COM [129.153.128.104]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0HMYOg0019357 for ; Wed, 17 Jan 2007 14:34:25 -0800 (PST) Received: from binky.Central.Sun.COM (localhost [127.0.0.1]) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6) with ESMTP id l0HMY2rN022145; Wed, 17 Jan 2007 16:34:02 -0600 (CST) Received: (from nw141292@localhost) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6/Submit) id l0HMY2v5022144; Wed, 17 Jan 2007 16:34:02 -0600 (CST) Date: Wed, 17 Jan 2007 16:34:02 -0600 From: Nicolas Williams To: Bart Smaalders Cc: Prakash Sangappa , Mark Phalan , psarc@sac.sfbay.sun.com, Darren.Kenny@sun.com, Alan.Bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com Subject: Re: PSARC/2007/027 File Events Notification API Message-ID: <20070117223401.GP1010@binky.Central.Sun.COM> References: <45A6F043.9000500@Sun.COM> <1168984790.777.12.camel@localhost> <45AD7355.6010003@Sun.COM> 
<20070117180954.GJ1010@binky.Central.Sun.COM> <45AE7740.7010805@sun.com> <20070117192902.GK1010@binky.Central.Sun.COM> <45AE9C2D.40900@Sun.COM> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <45AE9C2D.40900@Sun.COM> User-Agent: Mutt/1.5.7i Status: RO Content-Length: 1539 On Wed, Jan 17, 2007 at 01:59:09PM -0800, Bart Smaalders wrote: > >You really want both kinds of interfaces since re-reading directories > >isn't all that great when you have a network in the mix and large > >directories to boot. > > The right way to implement this would be to give clients a lease > on the directory contents; when those contents are updated from > other machines the lease holders would receive either updated > directory contents or lease invalidation notifications. Well, this too has been discussed w.r.t. NFSv4 extensions. To me it seems that the designs fall into two categories (simple events vs. complex events), with the second degenerating into the first under pressure. Directory leases fall into the first category from a protocol perspective, but allow clients to implement a system in the second category. > In any case, today we have no way of getting notification of > events that occur due to changes by other clients of the nfs > server; we can only detect local changes. In this case, the > directory contents are likely to be cached locally, right? Of course. Today. I mention the NFSv4 extensions work precisely to make the point that this need not always be the case. Speaking of which, IIRC CIFS does have an event facility today. > As you'll note in the original spec, there is a very strong > preference on our part for one port_associate call resulting > in at most one event. Ah! OK. I get it. You really could not implement a complex fs event system atop event ports. 
Nico -- From sacadmin Wed Jan 17 15:04:58 2007 Received: from dm-holland-01.uk.sun.com (dm-holland-01.UK.Sun.COM [129.156.101.192]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0HN4vqK020665 for ; Wed, 17 Jan 2007 15:04:58 -0800 (PST) Received: from vaticaan.holland.sun.com (vaticaan.Holland.Sun.COM [129.159.213.1]) by dm-holland-01.uk.sun.com (8.13.6+Sun/8.13.6/ENSMAIL,v2.2) with ESMTP id l0HN4nWG026901; Wed, 17 Jan 2007 23:04:49 GMT Received: from holland (casper@room101 [129.159.201.52]) by vaticaan.holland.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l0HN4mfR006793; Thu, 18 Jan 2007 00:04:48 +0100 (MET) Message-Id: <200701172304.l0HN4mfR006793@vaticaan.holland.sun.com> From: Casper.Dik@sun.com To: Nicolas Williams cc: Prakash Sangappa , Bart Smaalders , Mark Phalan , psarc@sac.sfbay.sun.com, Darren.Kenny@sun.com, Alan.Bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com Subject: Re: PSARC/2007/027 File Events Notification API In-Reply-To: <20070117221936.GO1010@binky.Central.Sun.COM> References: <45A6F043.9000500@Sun.COM> <1168984790.777.12.camel@localhost> <45AD7355.6010003@Sun.COM> <20070117180954.GJ1010@binky.Central.Sun.COM> <45AE7740.7010805@sun.com> <20070117192902.GK1010@binky.Central.Sun.COM> <45AE9730.20607@sun.com> <20070117221936.GO1010@binky.Central.Sun.COM> Date: Thu, 18 Jan 2007 00:04:48 +0100 Sender: casper@holland.sun.com Status: RO Content-Length: 1101 >On Wed, Jan 17, 2007 at 01:37:52PM -0800, Prakash Sangappa wrote: >> >>Nicolas Williams wrote: >> >>>Really? But the kernel surely could easily post "file >> >>>created" events as files are created (I'll call this "directory >> >>>notifications" because that's what this is called in the NFSv4 >> >>>community). >> What is the use case we are talking about?. How would the application use >> this information? How does it operate now or this is some new >> requirement . > >So, the concern is scalability w.r.t. very large directories. >Re-reading on every event is likely to be painful. 
You mean like how Windows XP repaints the desktop several times when you install software? >Both approaches to filesystem event monitoring, whether by delivery of >simple "this object changed" or complex "this object changed as >follows," have scalability issues. My question is: have these been >considered? I'm not sure why a model which generates an event for a system call would have scalability issues. Requiring a directory to be reread, though, might have such issues. Casper From sacadmin Wed Jan 17 15:17:20 2007 Received: from engmail3mpk.sfbay.Sun.COM (engmail3mpk.SFBay.Sun.COM [129.146.11.26]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0HNHK9g022108 for ; Wed, 17 Jan 2007 15:17:20 -0800 (PST) Received: from marduk.eng.sun.com (marduk.SFBay.Sun.COM [129.146.108.224]) by engmail3mpk.sfbay.Sun.COM (8.13.6+Sun/8.13.6/ENSMAIL,v2.2) with ESMTP id l0HNHGYK000067; Wed, 17 Jan 2007 15:17:16 -0800 (PST) Received: from marduk.eng.sun.com (localhost [127.0.0.1]) by marduk.eng.sun.com (8.13.6+Sun/8.12.11) with ESMTP id l0HNHQDT013836; Wed, 17 Jan 2007 15:17:26 -0800 (PST) Received: (from gww@localhost) by marduk.eng.sun.com (8.13.6+Sun/8.12.11/Submit) id l0HNHQwh013835; Wed, 17 Jan 2007 15:17:26 -0800 (PST) Date: Wed, 17 Jan 2007 15:17:26 -0800 (PST) From: Gary Winiger Message-Id: <200701172317.l0HNHQwh013835@marduk.eng.sun.com> To: bart.smaalders@sun.com, psarc@sac.sfbay.sun.com Cc: Doug.Leavitt@sun.com, Michen.Chang@sun.com, alan.bateman@sun.com, darren.kenny@sun.com, prakash.sangappa@sun.com Subject: Re: PSARC/2007/027 File Events Notification API Status: RO Content-Length: 1144 > I'm sponsoring the attached fast track for Prakash Sangappa. > The requested release binding is minor, and the proposed timeout > is 1/17/2007. The user interfaces described herein extend the > ones introduced in PSARC/2002/498 and have the same stability > levels. I've been away and am again today. I'd like to ask for more time. 
There does seem to be an ongoing discussion -- I'm not caught up, so maybe this case will be extended anyway. If I'm the only reason and the project is under time pressure, Friday will do for me to get caught up. Specifically, I'd like to know how these events may or may not be eventually related to the ACE_SYSTEM_AUDIT_ACE_TYPE and/or ACE_SYSTEM_ALARM_ACE_TYPE of ZFS (and presumably NFSv4 and CIFS) -- yes, I read that this project does not deal with the diffs between two snapshots; however, I believe the audit and alarm ACEs may be relevant in the future. I'd also like to digest the question sent to audit-core relative to port-associate. And I'd like to think more about using this mechanism to generate audit events that administrators may wish to set. Thanks, Gary.. From sacadmin Wed Jan 17 15:32:31 2007 Received: from binky.Central.Sun.COM (binky.Central.Sun.COM [129.153.128.104]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0HNWV63023086 for ; Wed, 17 Jan 2007 15:32:31 -0800 (PST) Received: from binky.Central.Sun.COM (localhost [127.0.0.1]) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6) with ESMTP id l0HNW8K8022507; Wed, 17 Jan 2007 17:32:08 -0600 (CST) Received: (from nw141292@localhost) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6/Submit) id l0HNW8bw022506; Wed, 17 Jan 2007 17:32:08 -0600 (CST) Date: Wed, 17 Jan 2007 17:32:08 -0600 From: Nicolas Williams To: Casper.Dik@sun.com Cc: Prakash Sangappa , Bart Smaalders , Mark Phalan , psarc@sac.sfbay.sun.com, Darren.Kenny@sun.com, Alan.Bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com Subject: Re: PSARC/2007/027 File Events Notification API Message-ID: <20070117233208.GU1010@binky.Central.Sun.COM> References: <45A6F043.9000500@Sun.COM> <1168984790.777.12.camel@localhost> <45AD7355.6010003@Sun.COM> <20070117180954.GJ1010@binky.Central.Sun.COM> <45AE7740.7010805@sun.com> <20070117192902.GK1010@binky.Central.Sun.COM> <45AE9730.20607@sun.com> <20070117221936.GO1010@binky.Central.Sun.COM> 
<200701172304.l0HN4mfR006793@vaticaan.holland.sun.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200701172304.l0HN4mfR006793@vaticaan.holland.sun.com> User-Agent: Mutt/1.5.7i Status: RO Content-Length: 1686 On Thu, Jan 18, 2007 at 12:04:48AM +0100, Casper.Dik@Sun.COM wrote: > >So, the concern is scalability w.r.t. very large directories. > >Re-reading on every event is likely to be painful. > > You mean like how Windows XP repaints the desktop several times > when you install software? I guess. I don't do that often. My kids do :) > >Both approaches to filesystem event monitoring, whether by delivery of > >simple "this object changed" or complex "this object changed as > >follows," have scalability issues. My question is: have these been > >considered? > > I'm not sure why a model which generates an event for a system call > would have scalability issues. Requiring a directory to be reread, > though, might have such issues. Re-reading directories clearly doesn't scale with directory size, particularly if the volume of changes is large enough (but the app is in control of how soon after re-reading it's willing to do it again). Per-syscall/VOP events don't scale with change volume, not if the events have to be robust. Say I unzip a very large archive containing thousands of files and some GUI is repainting a selection box on each event -- that GUI, and perhaps the unzip process, or even the system as a whole, could slow to a crawl if we guarantee robustness as flow control pushes the GUI's slow refresh period down the chain to the kernel and the source of the events. My guess is that most applications of fs event APIs other than auditing- type applications (for which we already have a suitable audit facility) will not mind non-robust and delayed events provided that they know when events are dropped and can set a time limit on event delays. 
Nico -- From sacadmin Wed Jan 17 16:37:04 2007 Received: from jurassic.eng.sun.com (jurassic.SFBay.Sun.COM [129.146.104.45]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0I0b4ga027996 for ; Wed, 17 Jan 2007 16:37:04 -0800 (PST) Received: from [129.146.228.98] (justforkicks.Eng.Sun.COM [129.146.228.98]) by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l0I0b37i900337; Wed, 17 Jan 2007 16:37:04 -0800 (PST) Message-ID: <45AEC058.6090801@sun.com> Date: Wed, 17 Jan 2007 16:33:28 -0800 From: Prakash Sangappa User-Agent: Mail/News 1.5.0.4 (X11/20060701) MIME-Version: 1.0 To: Nicolas Williams CC: Casper.Dik@sun.com, Bart Smaalders , Mark Phalan , psarc@sac.sfbay.sun.com, Darren.Kenny@sun.com, Alan.Bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com Subject: Re: PSARC/2007/027 File Events Notification API References: <45A6F043.9000500@Sun.COM> <1168984790.777.12.camel@localhost> <45AD7355.6010003@Sun.COM> <20070117180954.GJ1010@binky.Central.Sun.COM> <45AE7740.7010805@sun.com> <20070117192902.GK1010@binky.Central.Sun.COM> <45AE9730.20607@sun.com> <20070117221936.GO1010@binky.Central.Sun.COM> <200701172304.l0HN4mfR006793@vaticaan.holland.sun.com> <20070117233208.GU1010@binky.Central.Sun.COM> In-Reply-To: <20070117233208.GU1010@binky.Central.Sun.COM> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Status: RO Content-Length: 2502 Nicolas Williams wrote: > >> I'm not sure why a model which generates an event for a system call >> would have scalability issues. Requiring a directory to be reread, >> though, might have such issues. >> > > Re-reading directories clearly doesn't scale with directory size, > particularly if the volume of changes is large enough (but the app is in > control of how soon after re-reading it's willing to do it again). > > Yes, the application can control how often it reads the directory. 
It could detect whether the directory is changing very frequently and, if so, wait a short while before re-registering (port_associate()) to receive the next event. The one-event-notification-per-port_associate() model will help filter out redundant events as well. > Per-syscall/VOP events don't scale with change volume, not if the events > have to be robust. Say I unzip a very large archive containing > thousands of files and some GUI is repainting a selection box on each > event -- that GUI, and perhaps the unzip process, or even the system as > a whole, could slow to a crawl if we guarantee robustness as flow > control pushes the GUI's slow refresh period down the chain to the > kernel and the source of the events. > > With the proposed API, we never have to flow control event delivery and stop the event source. The timestamps are used to ensure that no events get missed between the time an event is sent and the time the application comes back to re-register. Therefore it is robust. For example, we don't want to stop directory modifications from occurring because there is a slow GUI application trying to catch up with the changes that have occurred. With the one event per registration (port_associate) approach, the application will receive an event indicating the directory has been modified. Then the application can take its sweet time processing the directory changes before it comes back and re-registers (port_associate) the directory. Meanwhile the directory changes can continue to occur. At the time of re-registering, the timestamps passed in will be compared with the directory's current timestamps and, if they have changed, an event will be sent. -Prakash. > My guess is that most applications of fs event APIs other than auditing- > type applications (for which we already have a suitable audit facility) > will not mind non-robust and delayed events provided that they know when > events are dropped and can set a time limit on event delays. 
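The re-registration model Prakash describes above can be illustrated with a small user-level simulation. This is only a sketch in Python, not the Solaris implementation: the Directory and Port classes, the logical mtime counter, and the helper names are all hypothetical stand-ins for the kernel's vnode timestamps and the real port_associate()/port_get() calls. It shows two properties claimed in the message: a burst of changes yields a single event per association, and a change that lands while the application is busy is caught when it re-registers with a stale timestamp.

```python
class Directory:
    """Stand-in for a watched directory; mtime is a logical clock."""

    def __init__(self):
        self.entries = set()
        self.mtime = 0


class Port:
    """Toy event port: each association yields at most one event."""

    def __init__(self):
        self.armed = set()   # directories with a live association
        self.events = []     # delivered, not yet retrieved events

    def associate(self, directory, seen_mtime):
        # Compare the caller's remembered timestamp with the current one.
        if directory.mtime != seen_mtime:
            self.events.append(directory)   # changed while the caller was busy
        else:
            self.armed.add(directory)

    def on_change(self, directory):
        # First change after arming fires once and dissolves the association.
        if directory in self.armed:
            self.armed.remove(directory)
            self.events.append(directory)


def create_file(port, directory, name):
    directory.entries.add(name)
    directory.mtime += 1
    port.on_change(directory)


def demo():
    port, d = Port(), Directory()
    port.associate(d, d.mtime)
    for i in range(1000):              # burst, e.g. unpacking an archive
        create_file(port, d, f"f{i}")
    burst_events = len(port.events)    # one event for the whole burst
    port.events.clear()
    seen = d.mtime                     # timestamp noted when scanning starts
    create_file(port, d, "late")       # change arrives during the slow scan
    port.associate(d, seen)            # stale timestamp: fires immediately
    return burst_events, len(port.events)
```

Note that the event source (create_file here) never blocks on the consumer, and the memory held for pending events is bounded by the number of associations rather than by the number of changes.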
> > Nico From sacadmin Wed Jan 17 18:55:35 2007 Received: from zion.eng.sun.com (zion.SFBay.Sun.COM [129.146.17.75]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0I2tZfK002201 for ; Wed, 17 Jan 2007 18:55:35 -0800 (PST) Received: from [129.146.228.109] (cyber [129.146.228.109]) by zion.eng.sun.com (8.13.7+Sun/8.13.7) with ESMTP id l0I2tYxu015308; Wed, 17 Jan 2007 18:55:34 -0800 (PST) Message-ID: <45AEE198.9060502@Sun.COM> Date: Wed, 17 Jan 2007 18:55:20 -0800 From: Bart Smaalders Organization: Sun Microsystems User-Agent: Thunderbird 1.5.0.8 (X11/20061204) MIME-Version: 1.0 To: Casper.Dik@Sun.COM CC: Nicolas Williams , Prakash Sangappa , Mark Phalan , psarc@sac.sfbay.sun.com, Darren.Kenny@Sun.COM, Alan.Bateman@Sun.COM, Michen.Chang@Sun.COM, Doug.Leavitt@Sun.COM Subject: Re: PSARC/2007/027 File Events Notification API References: <45A6F043.9000500@Sun.COM> <1168984790.777.12.camel@localhost> <45AD7355.6010003@Sun.COM> <20070117180954.GJ1010@binky.Central.Sun.COM> <45AE7740.7010805@sun.com> <20070117192902.GK1010@binky.Central.Sun.COM> <45AE9730.20607@sun.com> <20070117221936.GO1010@binky.Central.Sun.COM> <200701172304.l0HN4mfR006793@vaticaan.holland.sun.com> In-Reply-To: <200701172304.l0HN4mfR006793@vaticaan.holland.sun.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Status: RO Content-Length: 999 Casper.Dik@Sun.COM wrote: > I'm not sure why a model which generates an event for a system call > would have scalability issues. Requiring a directory to be reread, > though, might have such issues. The proposed architecture requires the monitoring application to "re-arm" the event after every event by calling port_associate. This is an explicit and intentional part of the design, as is discussed in some detail in the spec and the discussions on opensolaris referenced in the spec. 
Note that re-reading a directory to discover the newly created files may take some time - but this cost is borne by the monitoring application. Note that if the monitoring app falls behind, no harm is done; it will catch up sooner or later, or not. The more loaded the box, the further behind the application gets - but there is no increase in costs, and memory is not locked down queuing events. - Bart -- Bart Smaalders Solaris Kernel Performance barts@cyber.eng.sun.com http://blogs.sun.com/barts From sacadmin Fri Jan 19 15:24:39 2007 Received: from jurassic.eng.sun.com (jurassic.SFBay.Sun.COM [129.146.224.31]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0JNOdjh025677 for ; Fri, 19 Jan 2007 15:24:39 -0800 (PST) Received: from sheplap.central.sun.com (sheplap.Central.Sun.COM [10.1.194.251]) by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l0JNOcSt451412; Fri, 19 Jan 2007 15:24:38 -0800 (PST) Received: by sheplap.central.sun.com (Postfix, from userid 76367) id C8CEC360123; Fri, 19 Jan 2007 17:24:36 -0600 (CST) Date: Fri, 19 Jan 2007 17:24:36 -0600 From: Spencer Shepler To: Bart Smaalders Cc: psarc@sac.sfbay.sun.com, prakash.sangappa@sun.com, darren.kenny@sun.com, alan.bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com Subject: Re: PSARC/2007/027 File Events Notification API Message-ID: <20070119232436.GR9232@sheplap.central.sun.com> Reply-To: spencer.shepler@sun.com References: <45A6F043.9000500@Sun.COM> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <45A6F043.9000500@Sun.COM> User-Agent: Mutt/1.4.2.1i Status: RO Content-Length: 776 On Thu, Bart Smaalders wrote: > I'm sponsoring the attached fast track for Prakash Sangappa. > The requested release binding is minor, and the proposed timeout > is 1/17/2007. The user interfaces described herein extend the > ones introduced in PSARC/2002/498 and have the same stability > levels. 
If it isn't too much trouble, I would like to have this case extended until next Wednesday (1/24). The reason is that NFSv4.1 will introduce a set of directory/file notification events and I would like to ensure that the architecture presented here will fit effectively with the upcoming capabilities of NFSv4.1. In fairness to the project team, I would like the extra time to be complete in my review so that I don't unnecessarily delay the review any further. Spencer From sacadmin Fri Jan 19 15:42:03 2007 Received: from sfbaymail2sca.sfbay.sun.com (sfbaymail2sca.SFBay.Sun.COM [129.145.155.42]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0JNg3Ks025760 for ; Fri, 19 Jan 2007 15:42:03 -0800 (PST) Received: from nwk-ea-fw-1.sun.com (nwkes-gis-mail-1.SFBay.Sun.COM [10.4.134.5]) by sfbaymail2sca.sfbay.sun.com (8.13.6+Sun/8.12.10/ENSMAIL,v2.2) with ESMTP id l0JNg3gJ023383 for ; Fri, 19 Jan 2007 15:42:03 -0800 (PST) Received: from d1-sfbay-09.sun.com ([192.18.39.119]) by nwk-ea-fw-1.sun.com (8.13.6+Sun/8.12.9) with ESMTP id l0JNfw4t013604 for ; Fri, 19 Jan 2007 15:41:58 -0800 (PST) Received: from conversion-daemon.d1-sfbay-09.sun.com by d1-sfbay-09.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) id <0JC500G0133BZW00@d1-sfbay-09.sun.com> (original mail from Ed.Gould@Sun.COM) for psarc@sac.sfbay.sun.com; Fri, 19 Jan 2007 15:41:58 -0800 (PST) Received: from [129.146.106.203] by d1-sfbay-09.sun.com (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPSA id <0JC50080C35P1M1R@d1-sfbay-09.sun.com>; Fri, 19 Jan 2007 15:41:50 -0800 (PST) Date: Fri, 19 Jan 2007 15:41:56 -0800 From: Ed Gould Subject: Re: PSARC/2007/027 File Events Notification API In-reply-to: <20070119232436.GR9232@sheplap.central.sun.com> Sender: Ed.Gould@Sun.COM To: Spencer.Shepler@Sun.COM Cc: Bart Smaalders , psarc@sac.sfbay.sun.com, prakash.sangappa@Sun.COM, Darren.Kenny@Sun.COM, Alan.Bateman@Sun.COM, Michen.Chang@Sun.COM, Doug.Leavitt@Sun.COM Message-id: 
<45B15744.7020600@sun.com> Organization: Sun Cluster Engineering - GDD MIME-version: 1.0 Content-type: multipart/mixed; boundary="Boundary_(ID_nwAsA8PwsB+G9ecoXbvqqQ)" References: <45A6F043.9000500@Sun.COM> <20070119232436.GR9232@sheplap.central.sun.com> User-Agent: Thunderbird 1.5 (X11/20060113) Status: RO Content-Length: 962 This is a multi-part message in MIME format. --Boundary_(ID_nwAsA8PwsB+G9ecoXbvqqQ) Content-type: text/plain; format=flowed; charset=ISO-8859-1 Content-transfer-encoding: 7BIT Spencer Shepler wrote: > If it isn't too much trouble, I would like to have this case extended > until next Wednesday (1/24). Spencer, The timer on this case was extended at Wednesday's PSARC meeting, per Gary's request, so it will now time out on the 24th. -- --Ed --Boundary_(ID_nwAsA8PwsB+G9ecoXbvqqQ) Content-type: text/x-vcard; name=ed.gould.vcf; charset=utf-8 Content-transfer-encoding: 7BIT Content-disposition: attachment; filename=ed.gould.vcf begin:vcard fn:Ed Gould n:Gould;Ed org:Sun Microsystems, Inc.;Solaris Cluster adr;dom:M/S UMPK17-201;;17 Network Circle;Menlo Park;CA;94025 email;internet:ed.gould@sun.com title:File System Architect, PSARC Chair tel;work:+1.650.786.4937 x-mozilla-html:FALSE version:2.1 end:vcard --Boundary_(ID_nwAsA8PwsB+G9ecoXbvqqQ)-- From sacadmin Fri Jan 19 15:42:52 2007 Received: from binky.Central.Sun.COM (binky.Central.Sun.COM [129.153.128.104]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0JNgpXI025796 for ; Fri, 19 Jan 2007 15:42:51 -0800 (PST) Received: from binky.Central.Sun.COM (localhost [127.0.0.1]) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6) with ESMTP id l0JNgS3e009961; Fri, 19 Jan 2007 17:42:28 -0600 (CST) Received: (from nw141292@localhost) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6/Submit) id l0JNgSYf009960; Fri, 19 Jan 2007 17:42:28 -0600 (CST) Date: Fri, 19 Jan 2007 17:42:28 -0600 From: Nicolas Williams To: Bart Smaalders Cc: Prakash Sangappa , Mark Phalan , psarc@sac.sfbay.sun.com, Darren.Kenny@sun.com, 
Alan.Bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com Subject: Re: PSARC/2007/027 File Events Notification API Message-ID: <20070119234228.GB9516@binky.Central.Sun.COM> References: <45A6F043.9000500@Sun.COM> <1168984790.777.12.camel@localhost> <45AD7355.6010003@Sun.COM> <20070117180954.GJ1010@binky.Central.Sun.COM> <45AE7740.7010805@sun.com> <20070117192902.GK1010@binky.Central.Sun.COM> <45AE9C2D.40900@Sun.COM> <20070117223401.GP1010@binky.Central.Sun.COM> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20070117223401.GP1010@binky.Central.Sun.COM> User-Agent: Mutt/1.5.7i Status: RO Content-Length: 881 On Wed, Jan 17, 2007 at 04:34:02PM -0600, Nicolas Williams wrote: > > As you'll note in the original spec, there is a very strong > > preference on our part for one port_associate call resulting > > in at most one event. > > Ah! OK. I get it. You really could not implement a complex fs event > system atop event ports. But I think you can still get this right with port events if you try. Imagine every directory vnode as having a fixed-sized queue of metadata events[*], then the port_associate() could mean "fire if there's an event on the given directory's queue or when one is added" and preserve event port semantics (that the association has to be re-made in order to get any more events). Drops would be detected at port_associate time[**]. [*] Of this form, perhaps: {timestamp, event_id, inode, name, LINK/UNLINK}. 
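The per-directory queue Nico proposes here can be sketched at user level. This is a Python illustration only, assuming a record of the form given in the footnote; the DirQueue class, its since() method, and the use of a monotonically increasing event_id to detect overwritten records are hypothetical choices for the sketch, not anything specified in the case. The caller remembers the last event_id it processed and presents it when it re-associates; if older records have already been overwritten, it learns that it must fall back to re-reading the directory.

```python
from collections import deque, namedtuple

# Record shape follows the footnote; kind is "LINK" or "UNLINK".
Event = namedtuple("Event", "timestamp event_id inode name kind")


class DirQueue:
    """Fixed-size queue of metadata events for one directory."""

    def __init__(self, capacity):
        self.events = deque(maxlen=capacity)   # oldest records fall off when full
        self.next_id = 0

    def record(self, timestamp, inode, name, kind):
        self.events.append(Event(timestamp, self.next_id, inode, name, kind))
        self.next_id += 1

    def since(self, last_seen_id):
        """Return (dropped, events) for records newer than last_seen_id.

        dropped is True when records the caller never saw were overwritten.
        """
        wanted = last_seen_id + 1
        oldest = self.events[0].event_id if self.events else self.next_id
        dropped = oldest > wanted
        return dropped, [e for e in self.events if e.event_id >= wanted]
```

A consumer that starts with last_seen_id = -1 and a queue that has not overflowed gets every record; after a burst larger than the capacity it gets dropped == True plus the surviving tail, which is the "marker that indicates that such drops happened" asked for earlier in the thread.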
[**] Using timestamp and/or event_id Nico -- From sacadmin Fri Jan 19 15:47:13 2007 Received: from jurassic.eng.sun.com (jurassic.SFBay.Sun.COM [129.146.17.55]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0JNlDNg025844 for ; Fri, 19 Jan 2007 15:47:13 -0800 (PST) Received: from sheplap.central.sun.com (sheplap.Central.Sun.COM [10.1.194.251]) by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l0JNlDt9489653; Fri, 19 Jan 2007 15:47:13 -0800 (PST) Received: by sheplap.central.sun.com (Postfix, from userid 76367) id 3F285360240; Fri, 19 Jan 2007 17:47:08 -0600 (CST) Date: Fri, 19 Jan 2007 17:47:08 -0600 From: Spencer Shepler To: Ed Gould Cc: Spencer.Shepler@sun.com, Bart Smaalders , psarc@sac.sfbay.sun.com, prakash.sangappa@sun.com, Darren.Kenny@sun.com, Alan.Bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com Subject: Re: PSARC/2007/027 File Events Notification API Message-ID: <20070119234708.GS9232@sheplap.central.sun.com> Reply-To: spencer.shepler@sun.com References: <45A6F043.9000500@Sun.COM> <20070119232436.GR9232@sheplap.central.sun.com> <45B15744.7020600@sun.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <45B15744.7020600@sun.com> User-Agent: Mutt/1.4.2.1i Status: RO Content-Length: 343 On Fri, Ed Gould wrote: > Spencer Shepler wrote: > >If it isn't too much trouble, I would like to have this case extended > >until next Wednesday (1/24). > > Spencer, > > The timer on this case was extended at Wednesday's PSARC meeting, per > Gary's request, so it will now time out on the 24th. Well, isn't that fortuitous. 
:-) Spencer From sacadmin Sun Jan 21 18:50:22 2007 Received: from engmail3mpk.sfbay.Sun.COM (engmail3mpk.SFBay.Sun.COM [129.146.11.26]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0M2oMZL006747 for ; Sun, 21 Jan 2007 18:50:22 -0800 (PST) Received: from marduk.eng.sun.com (marduk.SFBay.Sun.COM [129.146.108.224]) by engmail3mpk.sfbay.Sun.COM (8.13.6+Sun/8.13.6/ENSMAIL,v2.2) with ESMTP id l0M2oIrp024190; Sun, 21 Jan 2007 18:50:18 -0800 (PST) Received: from marduk.eng.sun.com (localhost [127.0.0.1]) by marduk.eng.sun.com (8.13.6+Sun/8.12.11) with ESMTP id l0M2oYxf019281; Sun, 21 Jan 2007 18:50:34 -0800 (PST) Received: (from gww@localhost) by marduk.eng.sun.com (8.13.6+Sun/8.12.11/Submit) id l0M2oY4N019280; Sun, 21 Jan 2007 18:50:34 -0800 (PST) Date: Sun, 21 Jan 2007 18:50:34 -0800 (PST) From: Gary Winiger Message-Id: <200701220250.l0M2oY4N019280@marduk.eng.sun.com> To: bart.smaalders@sun.com, gww@eng.sun.com, psarc@sac.sfbay.sun.com Cc: Doug.Leavitt@sun.com, Michen.Chang@sun.com, alan.bateman@sun.com, darren.kenny@sun.com, prakash.sangappa@sun.com Subject: Re: PSARC/2007/027 File Events Notification API Status: RO Content-Length: 2030 > I've been away and am again today. I'd like to ask for more time. Thanks for the additional time. Seems I wasn't alone. > Specifically I'd like to know how these events may or may not > be eventually related to the ACE_SYSTEM_AUDIT_ACE_TYPE and/or > ACE_SYSTEM_ALARM_ACE_TYPE of ZFS (and presumably NFSv4 and CIFS) So let me now ask (and perhaps ZFS/NFS/CIFS folk will also be needed to answer), how will this project interplay with, or make use of, an underlying mechanism like ACE_SYSTEM_ALARM_ACE_TYPE? It's OK to assert that they are unrelated. I'm just asking whether you have looked into it and are comfortable that there won't be conflicts that can't be avoided or worked around. > I'd also like to digest the question sent to audit-core relative > to port-associate. Audit is a more interesting topic. 
It seems like none of the portfs system call is audited. I don't recall any audit discussion related to 2002/498. So there may be other things hidden below the surface. If any of these functions can be affected by another process, then audit would seem to be a requirement. In the case of this particular project, it seems to me that registering for an event as well as receiving the event might need to be audited. Or at least receiving the event as that would be analogous to doing a stat. I'll try to catch Bart and Prakash offline Mon or Tue and summarize to the case. > And I'd like to think more about using this > mechanism to generate audit events that administrators may wish to > set. > The event types that can be specified at port_associate() time for > the PORT_SOURCE_FILE are FILE_ACCESS, FILE_MODIFIED, FILE_ATTRIB, I've do this and believe that I could use this project to implement what I was thinking about if the event types specified to port_associate() are a mask. I couldn't find that in the spec, nor could I find it in the man page for port_associate(3C). Are the event types a mask so I can ask for all three events? Gary.. 
From sacadmin Mon Jan 22 12:26:17 2007 Received: from engmail3mpk.sfbay.Sun.COM (engmail3mpk.SFBay.Sun.COM [129.146.11.26]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0MKQGoa027381 for ; Mon, 22 Jan 2007 12:26:16 -0800 (PST) Received: from marduk.eng.sun.com (marduk.SFBay.Sun.COM [129.146.108.224]) by engmail3mpk.sfbay.Sun.COM (8.13.6+Sun/8.13.6/ENSMAIL,v2.2) with ESMTP id l0MKQDCY017420; Mon, 22 Jan 2007 12:26:13 -0800 (PST) Received: from marduk.eng.sun.com (localhost [127.0.0.1]) by marduk.eng.sun.com (8.13.6+Sun/8.12.11) with ESMTP id l0MKQUSF019988; Mon, 22 Jan 2007 12:26:30 -0800 (PST) Received: (from gww@localhost) by marduk.eng.sun.com (8.13.6+Sun/8.12.11/Submit) id l0MKQU16019987; Mon, 22 Jan 2007 12:26:30 -0800 (PST) Date: Mon, 22 Jan 2007 12:26:30 -0800 (PST) From: Gary Winiger Message-Id: <200701222026.l0MKQU16019987@marduk.eng.sun.com> To: bart.smaalders@sun.com, gww@eng.sun.com, psarc@sac.sfbay.sun.com Subject: Re: PSARC/2007/027 File Events Notification API Cc: Doug.Leavitt@sun.com, Michen.Chang@sun.com, alan.bateman@sun.com, darren.kenny@sun.com, prakash.sangappa@sun.com X-Sun-Charset: US-ASCII Status: RO Content-Length: 1375 > > I'd also like to digest the question sent to audit-core relative > > to port-associate. > > Audit is a more interesting topic. It seems like none of the > portfs system call is audited. I don't recall any audit > discussion related to 2002/498. So there may be other things > hidden below the surface. If any of these functions can be > affected by another process, then audit would seem to be a requirement. > In the case of this particular project, it seems to me that > registering for an event as well as receiving the event might > need to be audited. Or at least receiving the event as that > would be analogous to doing a stat. > I'll try to catch Bart and Prakash offline Mon or Tue and summarize > to the case. 
I chatted with both Bart and Prakash and we concluded that the existing portfs vectored system call (SYS_port) correctly doesn't require audit since in introduced no new unaudited "signaling" paths to the subject's address space that were not already audited by the existing "signaling" mechanism. However, this project does introduce new communication paths that require audit. The project team agreed to work with the audit project team to define the proper audit records and implement them as part of this project. I'll leave the project team to confirm this agreement in a follow up message to the case ;-) Gary.. From sacadmin Mon Jan 22 13:48:41 2007 Received: from jurassic.eng.sun.com (jurassic.SFBay.Sun.COM [129.146.226.31]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0MLmfEO000032 for ; Mon, 22 Jan 2007 13:48:41 -0800 (PST) Received: from [129.146.228.98] (justforkicks.Eng.Sun.COM [129.146.228.98]) by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l0MLmeC5117695; Mon, 22 Jan 2007 13:48:41 -0800 (PST) Message-ID: <45B5305C.8020102@sun.com> Date: Mon, 22 Jan 2007 13:45:00 -0800 From: Prakash Sangappa User-Agent: Mail/News 1.5.0.4 (X11/20060701) MIME-Version: 1.0 To: Gary Winiger CC: bart.smaalders@sun.com, psarc@sac.sfbay.sun.com, Doug.Leavitt@sun.com, Michen.Chang@sun.com, alan.bateman@sun.com, darren.kenny@sun.com Subject: Re: PSARC/2007/027 File Events Notification API References: <200701222026.l0MKQU16019987@marduk.eng.sun.com> In-Reply-To: <200701222026.l0MKQU16019987@marduk.eng.sun.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Status: RO Content-Length: 1592 Yes, as Gary mentioned, we will work with the audit project team to define and implement appropriate auditing functionality. -Prakash. Gary Winiger wrote: >>> I'd also like to digest the question sent to audit-core relative >>> to port-associate. >>> >> Audit is a more interesting topic. 
It seems like none of the >> portfs system call is audited. I don't recall any audit >> discussion related to 2002/498. So there may be other things >> hidden below the surface. If any of these functions can be >> affected by another process, then audit would seem to be a requirement. >> In the case of this particular project, it seems to me that >> registering for an event as well as receiving the event might >> need to be audited. Or at least receiving the event as that >> would be analogous to doing a stat. >> I'll try to catch Bart and Prakash offline Mon or Tue and summarize >> to the case. >> > > I chatted with both Bart and Prakash and we concluded that the > existing portfs vectored system call (SYS_port) correctly doesn't > require audit since in introduced no new unaudited "signaling" > paths to the subject's address space that were not already > audited by the existing "signaling" mechanism. However, this > project does introduce new communication paths that require audit. > The project team agreed to work with the audit project team to > define the proper audit records and implement them as part of > this project. > I'll leave the project team to confirm this agreement in a follow > up message to the case ;-) > > Gary.. 
> From sacadmin Mon Jan 22 13:56:21 2007 Received: from jurassic.eng.sun.com (jurassic.SFBay.Sun.COM [129.146.56.144]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0MLuLbS000665 for ; Mon, 22 Jan 2007 13:56:21 -0800 (PST) Received: from [129.146.228.98] (justforkicks.Eng.Sun.COM [129.146.228.98]) by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l0MLuHMA120843; Mon, 22 Jan 2007 13:56:18 -0800 (PST) Message-ID: <45B53224.5060002@sun.com> Date: Mon, 22 Jan 2007 13:52:36 -0800 From: Prakash Sangappa User-Agent: Mail/News 1.5.0.4 (X11/20060701) MIME-Version: 1.0 To: Gary Winiger CC: bart.smaalders@sun.com, psarc@sac.sfbay.sun.com, Doug.Leavitt@sun.com, Michen.Chang@sun.com, alan.bateman@sun.com, darren.kenny@sun.com Subject: Re: PSARC/2007/027 File Events Notification API References: <200701220250.l0M2oY4N019280@marduk.eng.sun.com> In-Reply-To: <200701220250.l0M2oY4N019280@marduk.eng.sun.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Status: RO Content-Length: 1599 Gary Winiger wrote: >> I've been away and am again today. I'd like to ask for more time. >> > > Thanks for the additionaly time. Seem's I wasn't alone. > > >> Specifically I'd like to know how these events may or may not >> be eventually related to the ACE_SYSTEM_AUDIT_ACE_TYPE and/or >> ACE_SYSTEM_ALARM_ACE_TYPE of ZFS (and presumably NFSv4 and CIFS) >> > > So let me now ask (and perhaps ZFS/NFS/CIFS folk will also be > needed to answer), how will this project interplay, or make use > of an underlying mechanism like ACE_SYSTEM_ALARM_ACE_TYPE? > It's OK to assert that they are unrelated. I'm just looking > for have you looked into it and are you confortable that there > won't be conflicts that can't be avoided or worked around. > > As discussed with Gary, we will investigate how ACE_SYSTEM_ALARM_ACE_TYPE would interact with file events API and send an update to the case. 
> >> The event types that can be specified at port_associate() time for >> the PORT_SOURCE_FILE are FILE_ACCESS, FILE_MODIFIED, FILE_ATTRIB, >> > > I've do this and believe that I could use this project to implement > what I was thinking about if the event types specified to > port_associate() are a mask. I couldn't find that in the spec, > nor could I find it in the man page for port_associate(3C). > Are the event types a mask so I can ask for all three events? > Yes, the event types requested(at port_associate time) are masks. The application can request to watch all the three event types or any combination of those. -Prakash. > Gary.. > From sacadmin Mon Jan 22 14:22:07 2007 Received: from engmail3mpk.sfbay.Sun.COM (engmail3mpk.SFBay.Sun.COM [129.146.11.26]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0MMM6Kh000898 for ; Mon, 22 Jan 2007 14:22:07 -0800 (PST) Received: from marduk.eng.sun.com (marduk.SFBay.Sun.COM [129.146.108.224]) by engmail3mpk.sfbay.Sun.COM (8.13.6+Sun/8.13.6/ENSMAIL,v2.2) with ESMTP id l0MMM3J5017113; Mon, 22 Jan 2007 14:22:03 -0800 (PST) Received: from marduk.eng.sun.com (localhost [127.0.0.1]) by marduk.eng.sun.com (8.13.6+Sun/8.12.11) with ESMTP id l0MMML4O020244; Mon, 22 Jan 2007 14:22:21 -0800 (PST) Received: (from gww@localhost) by marduk.eng.sun.com (8.13.6+Sun/8.12.11/Submit) id l0MMMLs8020243; Mon, 22 Jan 2007 14:22:21 -0800 (PST) Date: Mon, 22 Jan 2007 14:22:21 -0800 (PST) From: Gary Winiger Message-Id: <200701222222.l0MMMLs8020243@marduk.eng.sun.com> To: gww@eng.sun.com, prakash.sangappa@sun.com Subject: Re: PSARC/2007/027 File Events Notification API Cc: bart.smaalders@sun.com, psarc@sac.sfbay.sun.com, Doug.Leavitt@sun.com, Michen.Chang@sun.com, alan.bateman@sun.com, darren.kenny@sun.com X-Sun-Charset: US-ASCII Status: RO Content-Length: 1405 > And I'd like to think more about using this > mechanism to generate audit events that administrators may wish to > set. 
> >> The event types that can be specified at port_associate() time for > >> the PORT_SOURCE_FILE are FILE_ACCESS, FILE_MODIFIED, FILE_ATTRIB, > >> > > > > I've do this and believe that I could use this project to implement > > what I was thinking about if the event types specified to > > port_associate() are a mask. I couldn't find that in the spec, > > nor could I find it in the man page for port_associate(3C). > > Are the event types a mask so I can ask for all three events? > > > Yes, the event types requested(at port_associate time) are masks. The > application can > request to watch all the three event types or any combination of those. My garbled reply was intended to go to building into the audit daemon a way to generate audit event if the administrator wanted auditing to monitor certain files for changes. In thinking more about that, it would greatly benefit this functionality if the pid that caused the event would be included in the data returned. Reading between the lines, I'd expect fo_name to be the renamed name if the FILE_RENAME_TO or FILE_RENAME_FROM event is delivered. Is that correct? And would it be possible (and acceptable to the project) to return the pid causing the event? Thankx, Gary.. 
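As a minimal sketch of the mask semantics confirmed in the quoted reply above, the three event types can be OR'ed into a single value and tested bit by bit. The helper names watch_all_events() and event_requested() are hypothetical and do not come from the case materials. The fallback bit values are stand-ins so that the sketch builds on systems without the Solaris headers; they are assumptions, not values from the proposal.

```c
#if defined(__sun)
#include <port.h>	/* FILE_ACCESS, FILE_MODIFIED, FILE_ATTRIB on Solaris */
#else
/*
 * Assumption: stand-in bit values so the sketch builds elsewhere.
 * They are illustrative only and are not taken from the case materials.
 */
#define	FILE_ACCESS	0x1
#define	FILE_MODIFIED	0x2
#define	FILE_ATTRIB	0x4
#endif

/* Hypothetical helper: request all three event types in one mask. */
int
watch_all_events(void)
{
	return (FILE_ACCESS | FILE_MODIFIED | FILE_ATTRIB);
}

/* Hypothetical helper: nonzero if 'type' is set in the mask 'events'. */
int
event_requested(int events, int type)
{
	return ((events & type) != 0);
}
```

A mask built this way would be passed as the events argument of port_associate(), in the same position as 'events' in the watchfile() example of the proposal; any subset of the three bits should work the same way.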
From sacadmin Mon Jan 22 14:31:16 2007 Received: from zion.eng.sun.com (zion.SFBay.Sun.COM [129.146.17.75]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0MMVG9D000972 for ; Mon, 22 Jan 2007 14:31:16 -0800 (PST) Received: from [129.146.228.109] (cyber [129.146.228.109]) by zion.eng.sun.com (8.13.7+Sun/8.13.7) with ESMTP id l0MMVFOG023172; Mon, 22 Jan 2007 14:31:16 -0800 (PST) Message-ID: <45B53B22.2050703@Sun.COM> Date: Mon, 22 Jan 2007 14:30:58 -0800 From: Bart Smaalders Organization: Sun Microsystems User-Agent: Thunderbird 1.5.0.8 (X11/20061204) MIME-Version: 1.0 To: Gary Winiger CC: prakash.sangappa@Sun.COM, psarc@sac.sfbay.sun.com, Doug.Leavitt@Sun.COM, Michen.Chang@Sun.COM, alan.bateman@Sun.COM, darren.kenny@Sun.COM Subject: Re: PSARC/2007/027 File Events Notification API References: <200701222222.l0MMMLs8020243@marduk.eng.sun.com> In-Reply-To: <200701222222.l0MMMLs8020243@marduk.eng.sun.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Status: RO Content-Length: 1632 Gary Winiger wrote: >> And I'd like to think more about using this >> mechanism to generate audit events that administrators may wish to >> set. > >>>> The event types that can be specified at port_associate() time for >>>> the PORT_SOURCE_FILE are FILE_ACCESS, FILE_MODIFIED, FILE_ATTRIB, >>>> >>> I've do this and believe that I could use this project to implement >>> what I was thinking about if the event types specified to >>> port_associate() are a mask. I couldn't find that in the spec, >>> nor could I find it in the man page for port_associate(3C). >>> Are the event types a mask so I can ask for all three events? >>> >> Yes, the event types requested(at port_associate time) are masks. The >> application can >> request to watch all the three event types or any combination of those. 
> > My garbled reply was intended to go to building into the audit daemon > a way to generate audit event if the administrator wanted auditing > to monitor certain files for changes. In thinking more about that, > it would greatly benefit this functionality if the pid that > caused the event would be included in the data returned. > > Reading between the lines, I'd expect fo_name to be the renamed > name if the FILE_RENAME_TO or FILE_RENAME_FROM event is delivered. > Is that correct? And would it be possible (and acceptable to > the project) to return the pid causing the event? What are you trying to achieve? What pid should we use for nfs servers, CIFS, etc? - Bart -- Bart Smaalders Solaris Kernel Performance barts@cyber.eng.sun.com http://blogs.sun.com/barts From sacadmin Mon Jan 22 14:42:31 2007 Received: from binky.Central.Sun.COM (binky.Central.Sun.COM [129.153.128.104]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0MMgUuN001325 for ; Mon, 22 Jan 2007 14:42:31 -0800 (PST) Received: from binky.Central.Sun.COM (localhost [127.0.0.1]) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6) with ESMTP id l0MMg5b2003643; Mon, 22 Jan 2007 16:42:05 -0600 (CST) Received: (from nw141292@localhost) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6/Submit) id l0MMg5sl003642; Mon, 22 Jan 2007 16:42:05 -0600 (CST) Date: Mon, 22 Jan 2007 16:42:05 -0600 From: Nicolas Williams To: Bart Smaalders Cc: Gary Winiger , prakash.sangappa@sun.com, psarc@sac.sfbay.sun.com, Doug.Leavitt@sun.com, Michen.Chang@sun.com, Alan.Bateman@sun.com, Darren.Kenny@sun.com Subject: Re: PSARC/2007/027 File Events Notification API Message-ID: <20070122224204.GQ12298@binky.Central.Sun.COM> References: <200701222222.l0MMMLs8020243@marduk.eng.sun.com> <45B53B22.2050703@Sun.COM> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <45B53B22.2050703@Sun.COM> User-Agent: Mutt/1.5.7i Status: RO Content-Length: 972 On Mon, Jan 22, 2007 at 02:30:58PM -0800, Bart 
Smaalders wrote: > Gary Winiger wrote: > > My garbled reply was intended to go to building into the audit daemon > > a way to generate audit event if the administrator wanted auditing > > to monitor certain files for changes. In thinking more about that, > > it would greatly benefit this functionality if the pid that > > caused the event would be included in the data returned. > > > > Reading between the lines, I'd expect fo_name to be the renamed > > name if the FILE_RENAME_TO or FILE_RENAME_FROM event is delivered. > > Is that correct? And would it be possible (and acceptable to > > the project) to return the pid causing the event? > > What are you trying to achieve? > > What pid should we use for nfs servers, CIFS, etc? Also, if the point is to obtain a subject token from the PID then that may not work since the events aren't synchronous and the process may be gone by the time a consumer gets the event. From sacadmin Mon Jan 22 14:47:10 2007 Received: from engmail3mpk.sfbay.Sun.COM (engmail3mpk.SFBay.Sun.COM [129.146.11.26]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0MMlAYn001415 for ; Mon, 22 Jan 2007 14:47:10 -0800 (PST) Received: from marduk.eng.sun.com (marduk.SFBay.Sun.COM [129.146.108.224]) by engmail3mpk.sfbay.Sun.COM (8.13.6+Sun/8.13.6/ENSMAIL,v2.2) with ESMTP id l0MMkvOd001212; Mon, 22 Jan 2007 14:46:57 -0800 (PST) Received: from marduk.eng.sun.com (localhost [127.0.0.1]) by marduk.eng.sun.com (8.13.6+Sun/8.12.11) with ESMTP id l0MMlFbv020397; Mon, 22 Jan 2007 14:47:15 -0800 (PST) Received: (from gww@localhost) by marduk.eng.sun.com (8.13.6+Sun/8.12.11/Submit) id l0MMlFam020396; Mon, 22 Jan 2007 14:47:15 -0800 (PST) Date: Mon, 22 Jan 2007 14:47:15 -0800 (PST) From: Gary Winiger Message-Id: <200701222247.l0MMlFam020396@marduk.eng.sun.com> To: gww@eng.sun.com, bart.smaalders@sun.com Subject: Re: PSARC/2007/027 File Events Notification API Cc: prakash.sangappa@sun.com, psarc@sac.sfbay.sun.com, Doug.Leavitt@sun.com, Michen.Chang@sun.com, 
alan.bateman@sun.com, darren.kenny@sun.com X-Sun-Charset: US-ASCII Status: RO Content-Length: 224 > What are you trying to achieve? I'm trying to get to who triggered the event. Perhaps, I could ask for the ucred_t. > What pid should we use for nfs servers, CIFS, etc? ucred_t would be more useful here. Gary.. From sacadmin Mon Jan 22 16:01:13 2007 Received: from binky.Central.Sun.COM (binky.Central.Sun.COM [129.153.128.104]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0N01D7A003038 for ; Mon, 22 Jan 2007 16:01:13 -0800 (PST) Received: from binky.Central.Sun.COM (localhost [127.0.0.1]) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6) with ESMTP id l0N00m6Z004195; Mon, 22 Jan 2007 18:00:48 -0600 (CST) Received: (from nw141292@localhost) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6/Submit) id l0N00lll004194; Mon, 22 Jan 2007 18:00:47 -0600 (CST) Date: Mon, 22 Jan 2007 18:00:47 -0600 From: Nicolas Williams To: Gary Winiger Cc: bart.smaalders@sun.com, prakash.sangappa@sun.com, psarc@sac.sfbay.sun.com, Doug.Leavitt@sun.com, Michen.Chang@sun.com, Alan.Bateman@sun.com, Darren.Kenny@sun.com Subject: Re: PSARC/2007/027 File Events Notification API Message-ID: <20070123000047.GC12298@binky.Central.Sun.COM> References: <200701222247.l0MMlFam020396@marduk.eng.sun.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200701222247.l0MMlFam020396@marduk.eng.sun.com> User-Agent: Mutt/1.5.7i Status: RO Content-Length: 686 On Mon, Jan 22, 2007 at 02:47:15PM -0800, Gary Winiger wrote: > > What are you trying to achieve? > > I'm trying to get to who triggered the event. Perhaps, I could > ask for the ucred_t. > > > What pid should we use for nfs servers, CIFS, etc? > > ucred_t would be more useful here. I agree, except that if the source of the events is an NFSv4.1 server then there may still be no way to get a ucred_t (unless the protocol for NFSv4.1 were to have that info in its notifications). 
To be generic (other FS protocols may have a directory notification system but no ucred_t or mappable equivalent) a ucred_t should be optional here (or define a ucred for "unknown"). Nico -- From sacadmin Mon Jan 22 16:18:09 2007 Received: from jurassic.eng.sun.com (jurassic.SFBay.Sun.COM [129.146.106.105]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0N0I86b003337 for ; Mon, 22 Jan 2007 16:18:08 -0800 (PST) Received: from [129.146.228.98] (justforkicks.Eng.Sun.COM [129.146.228.98]) by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l0N0I8dv179125; Mon, 22 Jan 2007 16:18:08 -0800 (PST) Message-ID: <45B55363.6080902@sun.com> Date: Mon, 22 Jan 2007 16:14:27 -0800 From: Prakash Sangappa User-Agent: Mail/News 1.5.0.4 (X11/20060701) MIME-Version: 1.0 To: Nicolas Williams CC: Gary Winiger , bart.smaalders@sun.com, psarc@sac.sfbay.sun.com, Doug.Leavitt@sun.com, Michen.Chang@sun.com, Alan.Bateman@sun.com, Darren.Kenny@sun.com Subject: Re: PSARC/2007/027 File Events Notification API References: <200701222247.l0MMlFam020396@marduk.eng.sun.com> <20070123000047.GC12298@binky.Central.Sun.COM> In-Reply-To: <20070123000047.GC12298@binky.Central.Sun.COM> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Status: RO Content-Length: 1000 Nicolas Williams wrote: > On Mon, Jan 22, 2007 at 02:47:15PM -0800, Gary Winiger wrote: > >>> What are you trying to achieve? >>> >> I'm trying to get to who triggered the event. Perhaps, I could >> ask for the ucred_t. >> >> >>> What pid should we use for nfs servers, CIFS, etc? >>> >> >> ucred_t would be more useful here. >> > > I agree, except that if the source of the events is an NFSv4.1 server > then there may still be no way to get a ucred_t (unless the protocol for > NFSv4.1 were to have that info in its notifications). 
> > To be generic (other FS protocols may have a directory notification > system but no ucred_t or mappable equivalent) a ucred_t should be > optional here (or define a ucred for "unknown"). > > Nico > Note also that one event is delivered per file events watch registration. Therefore the application can only get the ucred info when there is an active file events watch on the file when it gets renamed/deleted. -Prakash. From sacadmin Mon Jan 22 16:29:32 2007 Received: from binky.Central.Sun.COM (binky.Central.Sun.COM [129.153.128.104]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0N0TWud003546 for ; Mon, 22 Jan 2007 16:29:32 -0800 (PST) Received: from binky.Central.Sun.COM (localhost [127.0.0.1]) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6) with ESMTP id l0N0T7d2004385; Mon, 22 Jan 2007 18:29:07 -0600 (CST) Received: (from nw141292@localhost) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6/Submit) id l0N0T6Z0004383; Mon, 22 Jan 2007 18:29:06 -0600 (CST) Date: Mon, 22 Jan 2007 18:29:06 -0600 From: Nicolas Williams To: Prakash Sangappa Cc: Gary Winiger , bart.smaalders@sun.com, psarc@sac.sfbay.sun.com, Doug.Leavitt@sun.com, Michen.Chang@sun.com, Alan.Bateman@sun.com, Darren.Kenny@sun.com Subject: Re: PSARC/2007/027 File Events Notification API Message-ID: <20070123002906.GE12298@binky.Central.Sun.COM> References: <200701222247.l0MMlFam020396@marduk.eng.sun.com> <20070123000047.GC12298@binky.Central.Sun.COM> <45B55363.6080902@sun.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <45B55363.6080902@sun.com> User-Agent: Mutt/1.5.7i Status: RO Content-Length: 434 On Mon, Jan 22, 2007 at 04:14:27PM -0800, Prakash Sangappa wrote: > Note also that one event is delivered per file events watch > registration. Therefore the application can only get the ucred info > when there is an active file events watch on the file when it gets > renamed/deleted. Ah, right, you could have some aliasing here. 
I don't think that you could build a file watching utility that can "audit" file changes. Nico -- From sacadmin Tue Jan 23 16:53:32 2007 Received: from jurassic.eng.sun.com (jurassic.SFBay.Sun.COM [129.146.224.130]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0O0rWUH010979 for ; Tue, 23 Jan 2007 16:53:32 -0800 (PST) Received: from [129.146.228.98] (justforkicks.Eng.Sun.COM [129.146.228.98]) by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l0O0rWFe581537; Tue, 23 Jan 2007 16:53:32 -0800 (PST) Message-ID: <45B6AD2E.5010205@sun.com> Date: Tue, 23 Jan 2007 16:49:50 -0800 From: Prakash Sangappa User-Agent: Mail/News 1.5.0.4 (X11/20060701) MIME-Version: 1.0 To: Bart Smaalders CC: psarc@sac.sfbay.sun.com, darren.kenny@sun.com, alan.bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com Subject: Re: PSARC/2007/027 File Events Notification API References: <45A6F043.9000500@Sun.COM> In-Reply-To: <45A6F043.9000500@Sun.COM> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Status: RO Content-Length: 23000 The currently proposed man page documents that symbolic links are not followed. This behavior makes using File Events Notification API to watch files with symbolic links unnecessarily complicated. The application will have to monitor the symbolic links along with the target file. The File Events API needs to follow links if the specified file is a symbolic link. The following amends the proposed man page changes. The changes are as follows: Add the following to port_associate(3C) man page : Objects of type PORT_SOURCE_FILE are pointer to the structure file_obj defined in . This event source provides event notification when the specified file/directory is accessed, modified or its status changes. The path name of the file/directory to be watched is passed in the 'struct file_obj' along with the 'access', 'modification', and 'change' time stamps acquired from - a stat(2) call. 
- If the file name is a symbolic link, it is not
- followed; e.g. the link is monitored.

--- change to ----

+ a stat(2) call. If the file name specified is a symbolic link, it
+ is followed. All the symbolic links encountered during the
+ resolution of the file name and the target file are monitored.
+ If any symbolic link gets renamed or deleted, it will be treated as
+ a file delete operation.

Add the following to the ERRORS section of port_associate()

+ ELOOP    A loop exists in symbolic links or more than
+          {SYMLOOP_MAX} symbolic links were encountered during
+          the resolution of the file path name specified.

-Prakash.

Bart Smaalders wrote:
> I'm sponsoring the attached fast track for Prakash Sangappa.
> The requested release binding is minor, and the proposed timeout
> is 1/17/2007. The user interfaces described herein extend the
> ones introduced in PSARC/2002/498 and have the same stability
> levels.
>
> - Bart
>
> ------------------------------------------------------------------------
>
> Copyright 2007 Sun Microsystems, Inc.
>
> 1. Introduction
>    1.1. Project/Component Working Name:
>         File Events Notification API
>
>    1.2. Name of Document Author/Supplier: Prakash Sangappa
>
>    1.3. Date of This Document: 12/8/06
>
>    1.4. Name of Major Document Customer(s)/Consumer(s):
>         1.4.2. The ARC(s) you expect to review your project: PSARC
>         1.4.4. The name of your business unit: OPG
>
>    1.5. Email Aliases:
>         1.5.1. Responsible Manager: michael.pogue@sun.com
>         1.5.2. Responsible Engineer: prakash.sangappa@sun.com
>         1.5.3. Marketing Manager: N/A
>         1.5.4. Interest List: bart.smaalders@sun.com
>                               darren.kenny@sun.com
>                               alan.bateman@sun.com
>                               Michen.Chang@Sun.COM
>                               Doug.Leavitt@Sun.COM
>
> 2. Project Summary
>    2.1. Project Description:
>
>    This project delivers an API for receiving event notifications when
>    file or directory status changes. The API is based on the event
>    ports interfaces (PSARC 2002/498).
>    The file events notification
>    facility will be added as a new event source to the event ports
>    framework.
>
>    2.2. Risks and Assumptions:
>
>    The file events notifications may not be accurate on distributed
>    file systems like NFS, and on file systems which do not update the
>    time stamps (e.g., a file system mounted with the 'noaccess time'
>    update option, or read-only file systems).
>
> 3. Business Summary
>    3.1. Problem Area:
>
>    Some applications have the need to monitor files and directories for
>    changes caused by non-communicating processes. The current method is
>    to periodically stat them, which is inefficient. Therefore there is
>    a need for a file/directory monitoring facility. This facility will
>    allow applications to monitor files and directories and receive
>    notification when their status changes.
>
>    3.4. Competitive Analysis:
>
>    Linux (inotify, dnotify), SGI (imon, FAM), and Mac OS have flavors of
>    file events notification mechanism. There are some user land file
>    monitoring services implemented using the kernel file events
>    notification API. FAM (file alteration monitoring) from SGI and Gamin,
>    which is a simplified version of FAM, are user land implementations
>    of file monitoring services.
>
> 4. Technical Description:
>    4.1. Details:
>
>    The file events notification facility is implemented as a new event
>    source (PORT_SOURCE_FILE) under the event ports framework. The API is
>    based on the event ports (PSARC 2002/498) API. The object and the event
>    types for this event source are described in the man page changes below.
>
>    Other implementations of file events notification, like Linux's
>    inotify/dnotify, support queuing events in the kernel and the events
>    provide additional context (like the file name created/deleted).
>
>    As has been discussed on the perf-discuss@opensolaris.org alias,
>    queuing events can cause scalability issues.
>    On a large multiuser system,
>    there can be many file operations occurring; as a result the
>    kernel may generate events at a faster rate than the rate at which the
>    application can process them, forcing the events to be queued
>    and thus locking down kernel memory. Since a limit must be imposed
>    on the number of events that can be queued, the application will
>    have to implement a fallback method to handle missed events due to
>    overflow.
>
>    Some applications, like Beagle and Spotlight (both desktop search tools)
>    require watching all the file and directory activity under a given
>    path (directory tree), so that whenever files get modified/created/deleted,
>    the search indices for those files get updated.
>
>    For example, the desktop search application 'Beagle' on Linux uses
>    'inotify' to watch the directory tree. It walks the directory tree
>    registering a file monitor on each file and directory under it.
>
>    On a large multiuser system with a large number of files and directories,
>    this approach does not scale, since monitoring very large filesystems will
>    require an inordinate amount of system memory. System scaling trends
>    imply that available storage is growing much faster than system memory.
>    A better solution for such use cases will be to have the filesystem
>    provide a mechanism/interface which would provide the list of files and
>    directories that have been added, modified or deleted since some given
>    time in the past. It appears possible on ZFS to provide this functionality
>    by allowing the user to get the difference between two snapshots; this is
>    anticipated to take time on the order of the number of files changed,
>    allowing even arbitrarily large filesystems to be indexed. This project
>    does not propose these interfaces at present; we just want to point out
>    what we know we're not addressing.
>
>    In the approach we are taking with the file events notifications API,
>    there will be no queuing of events.
>    The event types delivered represent
>    changes to the file's 'access', 'modification' and 'change' time stamps.
>    The events do not provide any other details. This approach is in
>    accordance with what the application can find out by statting a file
>    and comparing its timestamps. The goal is to eliminate the need for
>    applications (such as Nautilus, the Gnome file manager, or daemons
>    monitoring config files) to periodically stat the files of interest.
>
>    The man page section 2 describes the system calls that update the
>    file/directory time stamps. The vnode operations corresponding to these
>    system calls are intercepted and relevant events delivered.
>    The file event monitoring (FEM - PSARC 2003/172) hooks are used to
>    intercept the vnode operations.
>
>    There can be only one event outstanding per file or directory that is
>    being monitored, i.e., upon delivering an event, the file monitor is
>    disabled. The file or directory needs to be re-associated to activate
>    monitoring the file and receive the next event. To ensure that no events
>    get missed in between, time stamps are used. The application has to pass
>    the time stamps collected from a stat(2) call at the time of registering
>    the file monitor. The time stamps passed in are compared against the
>    current time stamps of the file and if they have changed, relevant events
>    are delivered immediately. This behavior enables multithreaded
>    programming using the file events notification API. It will also help
>    filter out redundant events.
>
>    Example: A multithreaded application can have a pool of threads processing
>             file modification events. If file events were to be continuously
>             delivered after a single registration and a file of interest
>             was written to multiple times, multiple threads would receive
>             change notification events and proceed to process them. This
>             would force these threads to synchronize with each other.
Note
> that when only one event is delivered and the file monitor gets
> disabled, one thread will be able to collect an event from a file
> and process it. While this thread is processing the file, no other
> thread will process the same file, as the file monitor is disabled
> and no new events get delivered until the file is re-associated
> and file monitoring is reactivated. There will be no need for any
> type of synchronization, as only one thread will be processing
> the file at a time. Another useful aspect of this design is that
> rapid writes result in a much reduced set of file notification
> events; the monitoring application is never subject to a flood of
> events even if it runs very slowly.
>
> The following code snippet illustrates how a multithreaded
> application with a pool of worker threads can use this file
> events notification API to process file status change events.
>
> #include <port.h>
> #include <sys/stat.h>
>
> /*
>  * To initiate watching a file, this function can be called
>  * once. The file_obj_t structure is initialized with the file
>  * name. The fobj pointer will be passed as the user pointer
>  * to be returned with the event. The 'port' is the
>  * event port fd obtained from a port_create(3C) call.
>  */
> int
> watchfile(int port, file_obj_t *fobj, int events)
> {
>         struct stat sbuf;
>
>         if (stat(fobj->fo_name, &sbuf) == -1)
>                 return (-1);
>
>         /* Application-specific processing of fobj->fo_name goes here. */
>
>         /* Save the time stamps that the next event is measured against. */
>         fobj->fo_atime = sbuf.st_atim;
>         fobj->fo_mtime = sbuf.st_mtim;
>         fobj->fo_ctime = sbuf.st_ctim;
>
>         return (port_associate(port, PORT_SOURCE_FILE,
>             (uintptr_t)fobj, events, fobj));
> }
>
> /*
>  * Application threads that process file events call
>  * this function. The file name is in the file_obj_t.
>  * This 'fobj' would be passed in as the 'user pointer'
>  * to be returned along with the event.
>  */
> void
> wait_for_fileevents(int port, int events)
> {
>         port_event_t pe;
>
>         while (1) {
>                 file_obj_t *fobj;
>
>                 if (port_get(port, &pe, NULL) == -1)
>                         return;
>
>                 /*
>                  * Check for exception events and process file.
>                  */
>                 if (!(pe.portev_events & FILE_EXCEPTION)) {
>                         fobj = (file_obj_t *)pe.portev_user;
>
>                         /* Re-associate to receive the next event. */
>                         if (watchfile(port, fobj, events) == -1)
>                                 return;
>                 }
>         }
> }
>
>
> 4.2. Bug/RFE Number(s):
>         6367770 add user land interface to fem (file event monitoring)
>         4667502 need file system event notification framework for Solaris
>
> 4.3. In Scope:
>         N/A
>
> 4.4. Out of Scope:
>         N/A
>
> 4.5. Interfaces:
>
> Proposed man page changes:
> --------------------------
>
> Changes to port_create(3C) man page:
>
>   source              object type       association mechanism
>   PORT_SOURCE_AIO     struct aiocb      aio_read(3RT),
>                                         aio_write(3RT),
>                                         lio_listio(3RT)
>   PORT_SOURCE_FD      file descriptor   port_associate(3C)
>   PORT_SOURCE_MQ      mqd_t             mq_notify(3RT)
>   PORT_SOURCE_TIMER   timer_t           timer_create(3RT)
>   PORT_SOURCE_USER    uintptr_t         port_send(3C)
>   PORT_SOURCE_ALERT   uintptr_t         port_alert(3C)
> + PORT_SOURCE_FILE    file_obj_t        port_associate(3C)
>
> ...
>
> + PORT_SOURCE_FILE events represent file/directory status change. Once
> + an event is delivered, the file object is no longer associated with
> + the port. A file object is associated or re-associated with a port
> + using the port_associate(3C) function.
>
>
> Changes to port_associate(3C) man page:
>
> - The only objects associated with a port by way of the
> - port_associate() function are objects of type
> - PORT_SOURCE_FD. Objects of other types have type-specific
> - association mechanisms. See port_create(3C) for details.
>
> to
>
> + The objects that can be associated with a port by way of the
> + port_associate() function are objects of type PORT_SOURCE_FD
> + and PORT_SOURCE_FILE. Objects of other types have type-specific
> + association mechanisms. See port_create(3C) for details.
>
>
> Add the following to port_associate(3C) man page:
>
> Objects of type PORT_SOURCE_FILE are pointers to the structure
> file_obj defined in . This event source provides
> event notification when the specified file/directory is accessed,
> modified or its status changes.
The path name of the file/directory
> to be watched is passed in the 'struct file_obj' along with the
> 'access', 'modification', and 'change' time stamps acquired from
> a stat(2) call. If the file name is a symbolic link, it is not
> followed; i.e., the link itself is monitored.
>
> The struct file_obj contains the following elements:
>
>     timestruc_t fo_atime;   /* Access time from stat() */
>     timestruc_t fo_mtime;   /* Modification time from stat() */
>     timestruc_t fo_ctime;   /* Change time from stat() */
>     char        *fo_name;   /* Pointer to a null-terminated path name */
>
> At the time the port_associate() function is called, the time stamps
> passed in the structure file_obj are compared with the file or
> directory's current time stamps and, if there has been a change,
> an event is immediately sent to the port. If not, an event will be
> sent when such a change occurs.
>
> The event types that can be specified at port_associate() time for
> the PORT_SOURCE_FILE are FILE_ACCESS, FILE_MODIFIED and FILE_ATTRIB,
> corresponding to the three time stamps. An atime change will result
> in the FILE_ACCESS event, an mtime change will result in the
> FILE_MODIFIED event, and a ctime change will result in the FILE_ATTRIB
> event.
>
> The following exception events are delivered when they occur. These
> event types cannot be filtered.
>
>     FILE_DELETE       /* Monitored file/directory was deleted */
>     FILE_RENAME_TO    /* Monitored file/directory was renamed */
>     FILE_RENAME_FROM  /* Monitored file/directory was renamed */
>     UNMOUNTED         /* Monitored file system got unmounted */
>
> At most one event notification will be generated per associated
> 'file_obj'. When the event for the associated 'file_obj' is
> retrieved, the object is no longer associated with the port. The
> event can be processed without the possibility that another thread
> can retrieve a subsequent event for the same object. The
> port_associate() function can be called to re-associate the file_obj
> object with the port.
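The time stamp comparison described above can be sketched in portable C. This is only an illustration of the rule, not the kernel implementation: the EX_* event bits, struct ex_stamps and ex_pending_events() are hypothetical names invented for this sketch, and the bit values are not those of the real header.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical event bits for illustration; not the real header values. */
#define EX_FILE_ACCESS   0x1
#define EX_FILE_MODIFIED 0x2
#define EX_FILE_ATTRIB   0x4

/* Hypothetical stand-in for the three time stamps kept in struct file_obj. */
struct ex_stamps {
    struct timespec ex_atime;   /* access time */
    struct timespec ex_mtime;   /* modification time */
    struct timespec ex_ctime;   /* change time */
};

/* Nonzero if the two time stamps differ in seconds or nanoseconds. */
static int
ex_ts_differs(const struct timespec *a, const struct timespec *b)
{
    return (a->tv_sec != b->tv_sec || a->tv_nsec != b->tv_nsec);
}

/*
 * Compare the time stamps saved by the application with the file's
 * current ones and return the requested events that are already due.
 * A nonzero result corresponds to "an event is immediately sent".
 */
int
ex_pending_events(const struct ex_stamps *saved,
    const struct ex_stamps *now, int wanted)
{
    int ev = 0;

    assert(saved != NULL && now != NULL);
    if (ex_ts_differs(&saved->ex_atime, &now->ex_atime))
        ev |= EX_FILE_ACCESS;
    if (ex_ts_differs(&saved->ex_mtime, &now->ex_mtime))
        ev |= EX_FILE_MODIFIED;
    if (ex_ts_differs(&saved->ex_ctime, &now->ex_ctime))
        ev |= EX_FILE_ATTRIB;

    /* Only the event types the caller asked for are reported. */
    return (ev & wanted);
}
```

Because the check happens at association time, a change that lands between the application's stat(2) call and its port_associate() call still produces an event, which is what closes the window for missed updates.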
>
> The association is also removed if the port gets closed or
> when port_dissociate() is called.
>
> Note: On NFS file systems, only events from client-side (local)
> access/modifications to files or directories will be delivered.
>
> Add the following to the ERRORS section of port_associate():
>
> EACCES    The "source" argument is PORT_SOURCE_FILE and
>           search permission is denied on a component of the
>           path prefix, or the file exists and the
>           permissions corresponding to the "events"
>           argument are denied.
>
> ENOENT    The "source" argument is PORT_SOURCE_FILE and
>           the file does not exist, or the path prefix
>           does not exist, or the path points to an empty
>           string.
>
> ENOTSUP   The "source" argument is PORT_SOURCE_FILE and
>           the filesystem on which the specified file resides
>           does not support watching for file events
>           notifications.
>
> Add the following to the ERRORS section of port_dissociate():
>
> EINVAL    The "source" argument is PORT_SOURCE_FILE and
>           the specified file is currently not associated
>           with the port (not being watched for file events
>           notifications).
>
>
> Changes to the VOP, FEM interfaces
> ----------------------------------
>
> In order to correctly identify file events on files having hard links,
> it is required to pass the directory vnode pointer and the file name
> component along with the VNEVENT type to the VOP_VNEVENT() interface routine.
>
> Example:
>
> Suppose a file has the following links
>
>     /tmp/dir1/foo
>     /tmp/dir2/foo
>
> and an application is watching /tmp/dir2/foo for file events.
> When /tmp/dir1/foo gets removed (rm), right now we receive a
> VN_REMOVE vnevent on the vnode. It is not possible to determine
> if /tmp/dir1/foo got removed or /tmp/dir2/foo got removed.
>
> When the link count is increased/decreased, the ctime gets updated
> on the file. So, the correct event for the removal of /tmp/dir1/foo
> should be FILE_ATTRIB, indicating a 'ctime' change.
>
> Whereas if /tmp/dir2/foo gets removed (rm), then the application should
> receive a FILE_DELETE event, as the name /tmp/dir2/foo got removed.
>
>
> This can be determined if the directory vnode pointer and the
> file name component are passed to the VOP_VNEVENT() interface.
>
>
> Modified VOP and supporting FEM interfaces - Consolidation private
> ------------------------------------------------------------------
>
> Two new arguments added, 'vnode_t *dvp' and 'char *cname'
>
>     VOP_VNEVENT(vnode_t *vp, vnevent_t vnevent, vnode_t *dvp, char *cname)
>     fop_vnevent(vnode_t *vp, vnevent_t vnevent, vnode_t *dvp, char *cname)
>     vnext_vnevent(femarg_t *vf, vnevent_t vnevent, vnode_t *dvp, char *cname)
>
>     vnevent_rename_src(vnode_t *vp, vnode_t *dvp, char *name)
>     vnevent_rename_dest(vnode_t *vp, vnode_t *dvp, char *name)
>     vnevent_remove(vnode_t *vp, vnode_t *dvp, char *name)
>     vnevent_rmdir(vnode_t *vp, vnode_t *dvp, char *name)
>
>
> New VNEVENT types - Consolidation private:
> ------------------------------------------
>
>     VE_CREATE - Represents a create operation on an already existing file.
>
>     VE_LINK - Delivered on the source file of a 'link' system call.
>
>     VE_RENAME_DEST_DIR - Destination directory of a rename() operation.
>
>
> Corresponding new vnevent routines added - Consolidation private:
> -----------------------------------------------------------------
>
>     void vnevent_create(vnode_t *vp)
>     void vnevent_link(vnode_t *vp)
>     void vnevent_rename_dest_dir(vnode_t *vp)
>
>
> New member added to private section of 'vnode.h'
> ------------------------------------------------
>
> +   void *v_fopdata;   /* file events notification - private data */
>
>
> VNEVENT support in NFS:
> -----------------------
> Added VNEVENTS support to the NFS file system to report client-side file
> events. It is used to catch any local (client-side) file operations on an
> NFS file system and report file events. Clearly, this is not complete as
> it will not be able to catch any of the server-side file operations. This
> is documented in the man page.
>
> 4.6.
Doc Impact: > port_associate(3C) - man page > port_create(3C) - man page > > > 5. Reference Documents: > project page - http://perf.eng.sun.com/twiki/bin/view/EventPorts/EPFileEvents > > PSARC/2002/498 - Event Completion Framework > PSARC/2003/172 - FEM (File event Monitoring) > PSARC/2004/170 - VOP_VNEVENT() > > > 6. Resources and Schedule: > 6.1. Projected Availability: > S11 > > 6.2. Cost of Effort: > Development is largely done. > Test case development - 1 week. > > 6.3. Cost of Capital Resources: > N/A > > 6.4. Product Approval Committee requested information: > 6.4.1. Consolidation or Component Name: ON > 6.4.3. Type of CPT Review and Approval expected: RFE > 6.4.4. Project Boundary Conditions: N/A > 6.4.5. Is this a necessary project for OEM agreements: No > 6.4.6. Notes: N/A > 6.4.7. Target RTI Date/Release: > 6.4.8. Target Code Design Review Date: > 6.4.9. Update approval addition: No > > 6.5. ARC review type: FastTrack > > 7. Prototype Availability: > 7.1. Prototype Availability: now > From sacadmin Wed Jan 24 08:59:50 2007 Received: from jurassic.eng.sun.com (jurassic.SFBay.Sun.COM [129.146.224.31]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0OGxnam029682 for ; Wed, 24 Jan 2007 08:59:49 -0800 (PST) Received: from sheplap.central.sun.com (sheplap.Central.Sun.COM [10.1.194.251]) by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l0OGxnW8882877; Wed, 24 Jan 2007 08:59:49 -0800 (PST) Received: by sheplap.central.sun.com (Postfix, from userid 76367) id 7664436817D; Wed, 24 Jan 2007 10:59:47 -0600 (CST) Date: Wed, 24 Jan 2007 10:59:47 -0600 From: Spencer Shepler To: psarc@sac.sfbay.sun.com Cc: prakash.sangappa@sun.com, darren.kenny@sun.com, alan.bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com, Bart Smaalders Subject: Re: PSARC/2007/027 File Events Notification API Message-ID: <20070124165946.GF13054@sheplap.central.sun.com> Reply-To: spencer.shepler@sun.com References: <45A6F043.9000500@Sun.COM> Mime-Version: 1.0 Content-Type: 
text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <45A6F043.9000500@Sun.COM> User-Agent: Mutt/1.4.2.1i Status: RO Content-Length: 3278 On Thu, Bart Smaalders wrote: > > VNEVENT support in NFS: > ----------------------- > Added VNEVENTS support to the NFS file system to report client side file > events. It is used to catch any local(client side) file operations on a > NFS file system and report file events. Clearly, this is not complete as > it will not be able to catch any of the server side file operations. This > is documented in the man page. It is true that the NFSv2/v3/v4 client is incapable of providing the necessary notification mechanisms for the proposed interface. In the case of NFSv4.1, this is not the case. There is a brief introduction and description of the upcoming NFSv4.1 support located at: http://www.nfsv4-editor.org/draft-08/draft-ietf-nfsv4-minorversion1-08.html#anchor86 -and- http://www.nfsv4-editor.org/draft-08/draft-ietf-nfsv4-minorversion1-08.html#OP_GET_DIR_DELEGATION http://www.nfsv4-editor.org/draft-08/draft-ietf-nfsv4-minorversion1-08.html#OP_CB_NOTIFY The information located there will be cryptic for the unitiated but essentially NFSv4.1 adds the ability of an NFSv4.1 server to provide directory delegations (client will be notified of a directory modification in the form of a "recall" of the directory delegation). As part of that support, there is a set of notifications that ride on top of the directory delegation. The notification events that can be requested of the server for the directory and its contents are: - change in directory's attributes - change in child object attributes - directory entry removed - directory entry added - directory entry renamed - directory cookie change (protocol specific) The main intent of this mechanism is to provide the NFSv4.1 client the ability to reduce the amount of cache validation traffic that flows between the traditional NFS client and server. 
However, these mechanism could be used to provide functionality for the proposed notification APIs. There is nothing that I can see at this point that would inhibit the NFSv4.1 client in provide the feature set described in this case. One of my concerns is related to taking advantage of the full potential of the NFSv4.1 protocol for this API (can we provide more than just timestamp update events given that there are directory name events that are available). The second concern is that the current FEM usage doesnt' inhibit the needs of the NFSv4.1 server. What I mean by this is that for effective scalability of an NFSv4.1 server providing directory delegation support it may be necessary (in fact likely) that the underlying filesystem will need to be updated to store NFSv4.1 notification actions. The main reason for this is that holding all of the vnodes in memory to use the FEM mechanism to implement the NFSv4.1 feature will be too cumbersome (recalling the directory delegations when vnodes fall out of the cache is too burdensome on the client as well). In summary, I had hoped for a user-level API that would take greater advantage of the NFSv4.1 capabilities so that the Solaris implementation would be stronger in this area. However, I don't currently see anything in the proposal that is incompatible with the NFSv4.1 support given that we remain aware of the NFSv4.1 server side requirements. 
Spencer From sacadmin Wed Jan 24 11:37:26 2007 Received: from zion.eng.sun.com (zion.SFBay.Sun.COM [129.146.17.75]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0OJbQeV007717 for ; Wed, 24 Jan 2007 11:37:26 -0800 (PST) Received: from [129.146.228.109] (cyber [129.146.228.109]) by zion.eng.sun.com (8.13.7+Sun/8.13.7) with ESMTP id l0OJbP4x027949; Wed, 24 Jan 2007 11:37:26 -0800 (PST) Message-ID: <45B7B564.4070008@Sun.COM> Date: Wed, 24 Jan 2007 11:37:08 -0800 From: Bart Smaalders Organization: Sun Microsystems User-Agent: Thunderbird 1.5.0.8 (X11/20061204) MIME-Version: 1.0 To: spencer.shepler@Sun.COM CC: psarc@sac.sfbay.sun.com, prakash.sangappa@Sun.COM, darren.kenny@Sun.COM, alan.bateman@Sun.COM, Michen.Chang@Sun.COM, Doug.Leavitt@Sun.COM Subject: Re: PSARC/2007/027 File Events Notification API References: <45A6F043.9000500@Sun.COM> <20070124165946.GF13054@sheplap.central.sun.com> In-Reply-To: <20070124165946.GF13054@sheplap.central.sun.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Status: RO Content-Length: 4949 Spencer Shepler wrote: > On Thu, Bart Smaalders wrote: >> VNEVENT support in NFS: >> ----------------------- >> Added VNEVENTS support to the NFS file system to report client side file >> events. It is used to catch any local(client side) file operations on a >> NFS file system and report file events. Clearly, this is not complete as >> it will not be able to catch any of the server side file operations. This >> is documented in the man page. > > It is true that the NFSv2/v3/v4 client is incapable of providing > the necessary notification mechanisms for the proposed interface. > In the case of NFSv4.1, this is not the case. 
> > There is a brief introduction and description of the upcoming > NFSv4.1 support located at: > > http://www.nfsv4-editor.org/draft-08/draft-ietf-nfsv4-minorversion1-08.html#anchor86 > -and- > http://www.nfsv4-editor.org/draft-08/draft-ietf-nfsv4-minorversion1-08.html#OP_GET_DIR_DELEGATION > http://www.nfsv4-editor.org/draft-08/draft-ietf-nfsv4-minorversion1-08.html#OP_CB_NOTIFY > > The information located there will be cryptic for the unitiated but > essentially NFSv4.1 adds the ability of an NFSv4.1 server to provide > directory delegations (client will be notified of a directory > modification in the form of a "recall" of the directory delegation). > As part of that support, there is a set of notifications that ride > on top of the directory delegation. The notification events that > can be requested of the server for the directory and its contents > are: > - change in directory's attributes > - change in child object attributes > - directory entry removed > - directory entry added > - directory entry renamed > - directory cookie change (protocol specific) > > The main intent of this mechanism is to provide the NFSv4.1 client > the ability to reduce the amount of cache validation traffic that > flows between the traditional NFS client and server. > > However, these mechanism could be used to provide functionality > for the proposed notification APIs. > > There is nothing that I can see at this point that would inhibit the > NFSv4.1 client in provide the feature set described in this case. > Good. How does the 4.1 protocol deal with the producer/consumer problem? Is there a way for the server to signal the client that massive change has occurred? If 100 clients are watching a directory, and I remove 1000 files from that directory, does that mean that we need to send 100K directory entry removed notifications? Will all servers be required to send all notifications, or is a change notification sufficient? 
> One of my concerns is related to taking advantage of the full potential > of the NFSv4.1 protocol for this API (can we provide more than just > timestamp update events given that there are directory name events > that are available). Note that the real performance advantage here is avoiding over the wire traffic. Does it make sense for the client to maintain a "shadow" version of the directory contents in the kernel? Since the client kernel would know that its directory cache was valid, it could satisfy a readdir request locally w/ little overhead. If the client ran low on memory, it could scrap the cache; subsequent readdir requests would result in over the wire traffic. This would keep the application coding the same, yet provide the majority of the performance benefits. We're loathe to provide a name value that only works some of the time. > The second concern is that the current FEM > usage doesnt' inhibit the needs of the NFSv4.1 server. What I mean > by this is that for effective scalability of an NFSv4.1 server providing > directory delegation support it may be necessary (in fact likely) that > the underlying filesystem will need to be updated to store NFSv4.1 > notification actions. The main reason for this is that holding > all of the vnodes in memory to use the FEM mechanism to implement > the NFSv4.1 feature will be too cumbersome (recalling the directory > delegations when vnodes fall out of the cache is too burdensome on the > client as well). > Are you saying that the FEM mechanism as designed for nfsv4 is not sufficient for nfsv4.1? What would you like this project to do about that? > In summary, I had hoped for a user-level API that would take greater > advantage of the NFSv4.1 capabilities so that the Solaris implementation > would be stronger in this area. However, I don't currently see anything > in the proposal that is incompatible with the NFSv4.1 support given > that we remain aware of the NFSv4.1 server side requirements. 
> > Spencer We do not want multiple change events to flow from a single event request registration. This is largely incompatible with multi-threaded usage, and memory load on the system increases the slower the system runs. - Bart -- Bart Smaalders Solaris Kernel Performance barts@cyber.eng.sun.com http://blogs.sun.com/barts From sacadmin Wed Jan 24 17:20:15 2007 Received: from jurassic.eng.sun.com (jurassic.SFBay.Sun.COM [129.146.226.130]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0P1KF6R016295 for ; Wed, 24 Jan 2007 17:20:15 -0800 (PST) Received: from [129.146.228.98] (justforkicks.Eng.Sun.COM [129.146.228.98]) by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l0P1K3m5149502; Wed, 24 Jan 2007 17:20:04 -0800 (PST) Message-ID: <45B804E4.9070107@sun.com> Date: Wed, 24 Jan 2007 17:16:20 -0800 From: Prakash Sangappa User-Agent: Mail/News 1.5.0.4 (X11/20060701) MIME-Version: 1.0 To: Gary Winiger CC: bart.smaalders@sun.com, psarc@sac.sfbay.sun.com, Doug.Leavitt@sun.com, Michen.Chang@sun.com, alan.bateman@sun.com, darren.kenny@sun.com Subject: Re: PSARC/2007/027 File Events Notification API References: <200701220250.l0M2oY4N019280@marduk.eng.sun.com> In-Reply-To: <200701220250.l0M2oY4N019280@marduk.eng.sun.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Status: RO Content-Length: 1249 Update on the ACE_SYSTEM_ALARM_TYPE - This was a proposal by the ZFS project with their ACL's implementation, but it has not been implemented. I have discussed it with the ZFS engineer Mark Shellenbaum. At the moment, details regarding how the ACE ALARM type event notification will be delivered has not been decided. It may even be possible to deliver the ACE ALARM events under the File Events API. This has to be looked into if and when the ACE ALARM type events are to be implemented. We agreed that there will not be any conflicts, with the File Events API, that cannot be avoided or worked around. -Prakash. 
Gary Winiger wrote: >> Specifically I'd like to know how these events may or may not >> be eventually related to the ACE_SYSTEM_AUDIT_ACE_TYPE and/or >> ACE_SYSTEM_ALARM_ACE_TYPE of ZFS (and presumably NFSv4 and CIFS) >> > > So let me now ask (and perhaps ZFS/NFS/CIFS folk will also be > needed to answer), how will this project interplay, or make use > of an underlying mechanism like ACE_SYSTEM_ALARM_ACE_TYPE? > It's OK to assert that they are unrelated. I'm just looking > for have you looked into it and are you confortable that there > won't be conflicts that can't be avoided or worked around. > From sacadmin Thu Jan 25 09:38:03 2007 Received: from jurassic.eng.sun.com (jurassic.SFBay.Sun.COM [129.146.56.144]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0PHc30I004254 for ; Thu, 25 Jan 2007 09:38:03 -0800 (PST) Received: from sheplap.local (vpn-129-150-32-195.Central.Sun.COM [129.150.32.195]) by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l0PHc2qY250449; Thu, 25 Jan 2007 09:38:03 -0800 (PST) Received: by sheplap.local (Postfix, from userid 76367) id 53A8C3695A1; Thu, 25 Jan 2007 11:38:02 -0600 (CST) Date: Thu, 25 Jan 2007 11:38:01 -0600 From: Spencer Shepler To: Bart Smaalders Cc: spencer.shepler@sun.com, psarc@sac.sfbay.sun.com, prakash.sangappa@sun.com, darren.kenny@sun.com, alan.bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com Subject: Re: PSARC/2007/027 File Events Notification API Message-ID: <20070125173801.GC395@sheplap.local> Reply-To: spencer.shepler@sun.com References: <45A6F043.9000500@Sun.COM> <20070124165946.GF13054@sheplap.central.sun.com> <45B7B564.4070008@Sun.COM> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <45B7B564.4070008@Sun.COM> User-Agent: Mutt/1.4.2.1i Status: RO Content-Length: 7703 On Wed, Bart Smaalders wrote: > Spencer Shepler wrote: > >On Thu, Bart Smaalders wrote: > >> VNEVENT support in NFS: > >> ----------------------- > >> Added VNEVENTS 
support to the NFS file system to report client side > >> file > >> events. It is used to catch any local(client side) file operations on a > >> NFS file system and report file events. Clearly, this is not complete > >> as > >> it will not be able to catch any of the server side file operations. > >> This > >> is documented in the man page. > > > >It is true that the NFSv2/v3/v4 client is incapable of providing > >the necessary notification mechanisms for the proposed interface. > >In the case of NFSv4.1, this is not the case. > > > >There is a brief introduction and description of the upcoming > >NFSv4.1 support located at: > > > >http://www.nfsv4-editor.org/draft-08/draft-ietf-nfsv4-minorversion1-08.html#anchor86 > >-and- > >http://www.nfsv4-editor.org/draft-08/draft-ietf-nfsv4-minorversion1-08.html#OP_GET_DIR_DELEGATION > >http://www.nfsv4-editor.org/draft-08/draft-ietf-nfsv4-minorversion1-08.html#OP_CB_NOTIFY > > > >The information located there will be cryptic for the unitiated but > >essentially NFSv4.1 adds the ability of an NFSv4.1 server to provide > >directory delegations (client will be notified of a directory > >modification in the form of a "recall" of the directory delegation). > >As part of that support, there is a set of notifications that ride > >on top of the directory delegation. The notification events that > >can be requested of the server for the directory and its contents > >are: > > - change in directory's attributes > > - change in child object attributes > > - directory entry removed > > - directory entry added > > - directory entry renamed > > - directory cookie change (protocol specific) > > > >The main intent of this mechanism is to provide the NFSv4.1 client > >the ability to reduce the amount of cache validation traffic that > >flows between the traditional NFS client and server. > > > >However, these mechanism could be used to provide functionality > >for the proposed notification APIs. 
> > > >There is nothing that I can see at this point that would inhibit the > >NFSv4.1 client in provide the feature set described in this case. > > > > Good. > > How does the 4.1 protocol deal with the producer/consumer problem? > Is there a way for the server to signal the client that massive change > has occurred? If 100 clients are watching a directory, and I remove > 1000 files from that directory, does that mean that we need to send > 100K directory entry removed notifications? Will all servers be > required to send all notifications, or is a change notification > sufficient? There are a couple of things that are in the NFSv4.1 notification support that allows for dealing with scalability issues. The first is that the server is not required to grant notifications requested. The client can ask for a directory delegation and associate notifications and the server can say "no". The server can also provide a subset of the requested set of notifications. Therefore, a client that wants, say, all of the notifications types and the server realizes that it is becoming overloaded (number of clients on a directory or number of directories) it could just provide the "change in directory attributes" notification and not all of the others. This combined with the fact that the server can delay its notifications, based on the client's initial request, the server can reduce the amount of network traffic involved with the notifications. The other things about the way the protocol is defined, the server is capable of sending multiple notification events for a particular filesystem directory. Again, based on the client's specification about how long of a delay is acceptable for notification, the server can gather a number of notification events and incur one network event. Note that based on the NFSv4.1 server's ability to not provide notification events, it may be reasonable to allow the API to provide this information to the application. 
Maybe a method of saying that the notification will be local-only vs. fully network aware. Maybe this should be present anyway given that we will need to support NFSv3 mounts that can not provide full notification support like a local filesystem would be. > >One of my concerns is related to taking advantage of the full potential > >of the NFSv4.1 protocol for this API (can we provide more than just > >timestamp update events given that there are directory name events > >that are available). > > Note that the real performance advantage here is avoiding over the > wire traffic. Does it make sense for the client to maintain a > "shadow" version of the directory contents in the kernel? Since > the client kernel would know that its directory cache was valid, > it could satisfy a readdir request locally w/ little overhead. > If the client ran low on memory, it could scrap the cache; subsequent > readdir requests would result in over the wire traffic. > > This would keep the application coding the same, yet provide the > majority of the performance benefits. We're loathe to provide > a name value that only works some of the time. Yes, the idea for the NFSv4.1 support is mainly for allowing the NFS client kernel to keep a directory and name entries effectively cached. The name notification events can be used for keeping directory contents cached. > > The second concern is that the current FEM > >usage doesnt' inhibit the needs of the NFSv4.1 server. What I mean > >by this is that for effective scalability of an NFSv4.1 server providing > >directory delegation support it may be necessary (in fact likely) that > >the underlying filesystem will need to be updated to store NFSv4.1 > >notification actions. The main reason for this is that holding > >all of the vnodes in memory to use the FEM mechanism to implement > >the NFSv4.1 feature will be too cumbersome (recalling the directory > >delegations when vnodes fall out of the cache is too burdensome on the > >client as well). 
> > > > Are you saying that the FEM mechanism as designed for nfsv4 is > not sufficient for nfsv4.1? What would you like this project to do > about that? For the NFSv4.1 client, there needs to be a way to interact with the underlying filesystem given that we will be asking the server to help with notification events. I assume that VNEVENT will allow for that. For the NFSv4.1 server, the use of FEM is insufficient. The main reason is that a vnode may fall out of cache for the NFSv4.1 server (in fact we will want it to to scale) and it would be preferable that the NFSv4.1 server does not have to recall the associated delegation when that occurs. Therefore, there needs to be a method of instantiating the notification events persistently at the NFSv4.1 server and that seems to best handled by the underlying filesystem. I don't have a specific design in mind. > >In summary, I had hoped for a user-level API that would take greater > >advantage of the NFSv4.1 capabilities so that the Solaris implementation > >would be stronger in this area. However, I don't currently see anything > >in the proposal that is incompatible with the NFSv4.1 support given > >that we remain aware of the NFSv4.1 server side requirements. > > > >Spencer > > We do not want multiple change events to flow from a single > event request registration. This is largely incompatible with > multi-threaded usage, and memory load on the system increases the > slower the system runs. The NFSv4.1 client can filter the events to match the upper level API appropriately. 
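As a rough illustration of that filtering, here is a minimal sketch in C of a one-shot watch that turns a stream of lower-level notifications into at most one delivered event per association. The names struct ex_watch, ex_notify() and ex_rearm() are hypothetical and are not part of the proposed interfaces or of the NFS client.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-file watch state kept by the filtering layer. */
struct ex_watch {
    bool armed;   /* true while the file is associated with a port */
    int  wanted;  /* event bits the application registered for */
};

/*
 * Called for every incoming lower-level notification. Returns the
 * event mask to deliver now, or 0 if the notification is filtered
 * out. Delivering an event disarms the watch, so later notifications
 * are dropped until the application re-associates.
 */
int
ex_notify(struct ex_watch *w, int incoming)
{
    int ev;

    assert(w != NULL);
    ev = incoming & w->wanted;
    if (!w->armed || ev == 0)
        return (0);
    w->armed = false;
    return (ev);
}

/* Called when the application re-associates the file with the port. */
void
ex_rearm(struct ex_watch *w)
{
    assert(w != NULL);
    w->armed = true;
}
```

Notifications dropped while the watch is disarmed are not lost in the proposed design: the time stamp comparison made at the next port_associate() call reports any change that happened in between.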
Spencer From sacadmin Thu Jan 25 13:28:28 2007 Received: from jurassic.eng.sun.com (jurassic.SFBay.Sun.COM [129.146.106.105]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0PLSPkO010819 for ; Thu, 25 Jan 2007 13:28:28 -0800 (PST) Received: from [192.9.61.228] (punchin-client-192-9-61-228.SFBay.Sun.COM [192.9.61.228]) by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l0PLSJV8332971; Thu, 25 Jan 2007 13:28:19 -0800 (PST) Message-ID: <45B920EE.3090701@sun.com> Date: Thu, 25 Jan 2007 13:28:14 -0800 From: prakash sangappa User-Agent: Thunderbird 1.5 (X11/20060113) MIME-Version: 1.0 To: spencer.shepler@sun.com CC: Bart Smaalders , psarc@sac.sfbay.sun.com, darren.kenny@sun.com, alan.bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com Subject: Re: PSARC/2007/027 File Events Notification API References: <45A6F043.9000500@Sun.COM> <20070124165946.GF13054@sheplap.central.sun.com> <45B7B564.4070008@Sun.COM> <20070125173801.GC395@sheplap.local> In-Reply-To: <20070125173801.GC395@sheplap.local> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Status: RO Content-Length: 1500 Spencer Shepler wrote: > > Note that based on the NFSv4.1 server's ability to not provide notification > events, it may be reasonable to allow the API to provide this > information to the application. Maybe a method of saying that the > notification will be local-only vs. fully network aware. Maybe this > should be present anyway given that we will need to support NFSv3 > mounts that can not provide full notification support like a local > filesystem would be. > Are you proposing that we provide this information now with the file events API? Currently, all the file event notifications will be from local file operations on all supported file systems including nfs. If we have to provide such information to the application, we will have to do that at the time of registering the file events watch(port_associate()). 
I think we can make provision to return such information in the file
object for now, and work out the details once we have the NFSv4.1
directory delegation implemented.

Add another member (fo_pad) to the file object structure.

struct file_obj {
        timestruc_t     fo_atime;  /* Access time got from stat(2) */
        timestruc_t     fo_mtime;  /* Modification time from stat(2) */
        timestruc_t     fo_ctime;  /* Change time from stat(2) */
        uintptr_t       fo_pad;    /* For future expansion */
        char            *fo_name;  /* Pointer to a null terminated path name */
};

-Prakash.

From sacadmin Tue Jan 30 12:44:38 2007
Received: from engmail3mpk.sfbay.Sun.COM (engmail3mpk.SFBay.Sun.COM [129.146.11.26])
        by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0UKicGu023382
        for ; Tue, 30 Jan 2007 12:44:38 -0800 (PST)
Received: from marduk.eng.sun.com (marduk.SFBay.Sun.COM [129.146.108.224])
        by engmail3mpk.sfbay.Sun.COM (8.13.6+Sun/8.13.6/ENSMAIL,v2.2) with ESMTP id l0UKiYWS001199;
        Tue, 30 Jan 2007 12:44:34 -0800 (PST)
Received: from marduk.eng.sun.com (localhost [127.0.0.1])
        by marduk.eng.sun.com (8.13.6+Sun/8.12.11) with ESMTP id l0UKj4HJ018628;
        Tue, 30 Jan 2007 12:45:04 -0800 (PST)
Received: (from gww@localhost)
        by marduk.eng.sun.com (8.13.6+Sun/8.12.11/Submit) id l0UKj3Je018627;
        Tue, 30 Jan 2007 12:45:04 -0800 (PST)
Date: Tue, 30 Jan 2007 12:45:04 -0800 (PST)
From: Gary Winiger
Message-Id: <200701302045.l0UKj3Je018627@marduk.eng.sun.com>
To: prakash.sangappa@sun.com, spencer.shepler@sun.com
Cc: Doug.Leavitt@sun.com, Michen.Chang@sun.com, alan.bateman@sun.com,
        bart.smaalders@sun.com, darren.kenny@sun.com, psarc@sac.sfbay.sun.com
Subject: Re: PSARC/2007/027 File Events Notification API
Status: RO
Content-Length: 1088

> Add a another member(fo_pad) to the file object structure.
> > struct file_obj { > timestruc_t fo_atime; /* Access time got from stat(2) */ > timestruc_t fo_mtime; /* Modification time from stat(2) */ > timestruc_t fo_ctime; /* Change time from stat(2) */ > uintptr_t fo_pad; /* For future expansion > */ > char *fo_name; /* Pointer to a null > terminated path name */ > } Perhaps I'm misreading this. Perhaps I read too much into a previous reply. Perhaps I've just missed something all together. I'd expected the to see some indication of the type of event received FILE_ACCESS, FILE_MODIFIED, FILE_ATTRIB, FILE_RENAME_TO, FILE_RENAME_FROM in the data returned with the event. And I'm not sure my question of 22 Jan (message 33): "Reading between the lines, I'd expect fo_name to be the renamed name if the FILE_RENAME_TO or FILE_RENAME_FROM event is delivered. Is that correct?" was answered. The other question in the paragraph was. Gary.. From sacadmin Tue Jan 30 13:02:14 2007 Received: from binky.Central.Sun.COM (binky.Central.Sun.COM [129.153.128.104]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0UL2ED7023830 for ; Tue, 30 Jan 2007 13:02:14 -0800 (PST) Received: from binky.Central.Sun.COM (localhost [127.0.0.1]) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6) with ESMTP id l0UL1kxV009190; Tue, 30 Jan 2007 15:01:46 -0600 (CST) Received: (from nw141292@localhost) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6/Submit) id l0UL1kVR009189; Tue, 30 Jan 2007 15:01:46 -0600 (CST) Date: Tue, 30 Jan 2007 15:01:46 -0600 From: Nicolas Williams To: prakash sangappa Cc: Spencer.Shepler@sun.com, Bart Smaalders , psarc@sac.sfbay.sun.com, Darren.Kenny@sun.com, Alan.Bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com Subject: Re: PSARC/2007/027 File Events Notification API Message-ID: <20070130210145.GQ28618@binky.Central.Sun.COM> References: <45A6F043.9000500@Sun.COM> <20070124165946.GF13054@sheplap.central.sun.com> <45B7B564.4070008@Sun.COM> <20070125173801.GC395@sheplap.local> <45B920EE.3090701@sun.com> Mime-Version: 1.0 
Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <45B920EE.3090701@sun.com> User-Agent: Mutt/1.5.7i Status: RO Content-Length: 753 On Thu, Jan 25, 2007 at 01:28:14PM -0800, prakash sangappa wrote: > Add a another member(fo_pad) to the file object structure. > > struct file_obj { > timestruc_t fo_atime; /* Access time got from stat(2) */ > timestruc_t fo_mtime; /* Modification time from stat(2) */ > timestruc_t fo_ctime; /* Change time from stat(2) */ > uintptr_t fo_pad; /* For future expansion > */ > char *fo_name; /* Pointer to a null > terminated path name */ > } Sounds cool, but make sure that there's a way to convey event drop events (corresponding to, e.g., NFSv4.1 directory delegation recalls when a directory is experiencing change large volumes). From sacadmin Tue Jan 30 13:43:37 2007 Received: from jurassic.eng.sun.com (jurassic.SFBay.Sun.COM [129.146.104.31]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0ULhbP4025698 for ; Tue, 30 Jan 2007 13:43:37 -0800 (PST) Received: from [129.146.228.98] (justforkicks.Eng.Sun.COM [129.146.228.98]) by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l0ULhZg3737290; Tue, 30 Jan 2007 13:43:36 -0800 (PST) Message-ID: <45BFBB22.3010402@sun.com> Date: Tue, 30 Jan 2007 13:39:46 -0800 From: Prakash Sangappa User-Agent: Mail/News 1.5.0.4 (X11/20060701) MIME-Version: 1.0 To: Gary Winiger CC: spencer.shepler@sun.com, Doug.Leavitt@sun.com, Michen.Chang@sun.com, alan.bateman@sun.com, bart.smaalders@sun.com, darren.kenny@sun.com, psarc@sac.sfbay.sun.com Subject: Re: PSARC/2007/027 File Events Notification API References: <200701302045.l0UKj3Je018627@marduk.eng.sun.com> In-Reply-To: <200701302045.l0UKj3Je018627@marduk.eng.sun.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Status: RO Content-Length: 2347 Gary Winiger wrote: >> Add a another member(fo_pad) to the file object structure. 
>>
>> struct file_obj {
>>         timestruc_t     fo_atime;  /* Access time got from stat(2) */
>>         timestruc_t     fo_mtime;  /* Modification time from stat(2) */
>>         timestruc_t     fo_ctime;  /* Change time from stat(2) */
>>         uintptr_t       fo_pad;    /* For future expansion */
>>         char            *fo_name;  /* Pointer to a null terminated path name */
>> }
>>
>
> Perhaps I'm misreading this. Perhaps I read too much into a
> previous reply. Perhaps I've just missed something all together.
> I'd expected the to see some indication of the type of event
> received FILE_ACCESS, FILE_MODIFIED, FILE_ATTRIB, FILE_RENAME_TO,
> FILE_RENAME_FROM in the data returned with the event. And I'm
>

The event type is returned in the port_event_t structure passed to the
port_get()/port_getn() call. Yes, the types of events returned are

        FILE_ACCESS
        FILE_MODIFIED
        FILE_ATTRIB
        FILE_RENAME_TO
        FILE_RENAME_FROM
        FILE_DELETE
        UNMOUNTED

These will be defined in port.h.

The event FILE_RENAME_TO indicates that some other file was renamed to
the file name in 'fo_name', which is being monitored.

Example:
        $ mv /tmp/source /tmp/target
        The file name '/tmp/target' was specified in 'fo_name' and is
        being monitored.

The event FILE_RENAME_FROM indicates that the file whose name is in
'fo_name' was renamed to some other name.

Example:
        $ mv /tmp/source /tmp/target
        The file name '/tmp/source' was specified in 'fo_name' and is
        being monitored.

> not sure my question of 22 Jan (message 33):
> "Reading between the lines, I'd expect fo_name to be the renamed
> name if the FILE_RENAME_TO or FILE_RENAME_FROM event is delivered.
> Is that correct?"
>

I am sorry if this was not clarified. No: as per the current proposal,
the source/target file name of the rename operation is not returned in
fo_name. In order to return the renamed file name, the fo_name buffer
would need to be large enough to accommodate the resulting target file
path name string.
Which means the application should always allocate the fo_name array to be of size 'MAXPATHLEN' (which is set at 1024). -Prakash. > was answered. The other question in the paragraph was. > > > Gary.. > From sacadmin Tue Jan 30 13:47:46 2007 Received: from jurassic.eng.sun.com (jurassic.SFBay.Sun.COM [129.146.224.31]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0ULlkVv025938 for ; Tue, 30 Jan 2007 13:47:46 -0800 (PST) Received: from [129.146.228.98] (justforkicks.Eng.Sun.COM [129.146.228.98]) by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l0ULljYv738620; Tue, 30 Jan 2007 13:47:45 -0800 (PST) Message-ID: <45BFBC1C.2050503@sun.com> Date: Tue, 30 Jan 2007 13:43:56 -0800 From: Prakash Sangappa User-Agent: Mail/News 1.5.0.4 (X11/20060701) MIME-Version: 1.0 To: Nicolas Williams CC: Spencer.Shepler@sun.com, Bart Smaalders , psarc@sac.sfbay.sun.com, Darren.Kenny@sun.com, Alan.Bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com Subject: Re: PSARC/2007/027 File Events Notification API References: <45A6F043.9000500@Sun.COM> <20070124165946.GF13054@sheplap.central.sun.com> <45B7B564.4070008@Sun.COM> <20070125173801.GC395@sheplap.local> <45B920EE.3090701@sun.com> <20070130210145.GQ28618@binky.Central.Sun.COM> In-Reply-To: <20070130210145.GQ28618@binky.Central.Sun.COM> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Status: RO Content-Length: 1079 Nicolas Williams wrote: > On Thu, Jan 25, 2007 at 01:28:14PM -0800, prakash sangappa wrote: > >> Add a another member(fo_pad) to the file object structure. 
>> >> struct file_obj { >> timestruc_t fo_atime; /* Access time got from stat(2) */ >> timestruc_t fo_mtime; /* Modification time from stat(2) */ >> timestruc_t fo_ctime; /* Change time from stat(2) */ >> uintptr_t fo_pad; /* For future expansion >> */ >> char *fo_name; /* Pointer to a null >> terminated path name */ >> } >> > > Sounds cool, but make sure that there's a way to convey event drop > events (corresponding to, e.g., NFSv4.1 directory delegation recalls > when a directory is experiencing change large volumes). > With the currently proposed API, there is no need to indicate that there are event drops. However, with fo_pad being an uintptr_t, we can include what ever details we can by means of defining a new data structure that fo_pad can be made to point to. -Prakash. From sacadmin Tue Jan 30 13:50:40 2007 Received: from binky.Central.Sun.COM (binky.Central.Sun.COM [129.153.128.104]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0ULodAC025984 for ; Tue, 30 Jan 2007 13:50:39 -0800 (PST) Received: from binky.Central.Sun.COM (localhost [127.0.0.1]) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6) with ESMTP id l0ULoBux009510; Tue, 30 Jan 2007 15:50:11 -0600 (CST) Received: (from nw141292@localhost) by binky.Central.Sun.COM (8.13.6+Sun/8.13.6/Submit) id l0ULoBaK009509; Tue, 30 Jan 2007 15:50:11 -0600 (CST) Date: Tue, 30 Jan 2007 15:50:11 -0600 From: Nicolas Williams To: Prakash Sangappa Cc: Spencer.Shepler@sun.com, Bart Smaalders , psarc@sac.sfbay.sun.com, Darren.Kenny@sun.com, Alan.Bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com Subject: Re: PSARC/2007/027 File Events Notification API Message-ID: <20070130215011.GU28618@binky.Central.Sun.COM> References: <45A6F043.9000500@Sun.COM> <20070124165946.GF13054@sheplap.central.sun.com> <45B7B564.4070008@Sun.COM> <20070125173801.GC395@sheplap.local> <45B920EE.3090701@sun.com> <20070130210145.GQ28618@binky.Central.Sun.COM> <45BFBC1C.2050503@sun.com> Mime-Version: 1.0 Content-Type: text/plain; 
charset=us-ascii Content-Disposition: inline In-Reply-To: <45BFBC1C.2050503@sun.com> User-Agent: Mutt/1.5.7i Status: RO Content-Length: 643 On Tue, Jan 30, 2007 at 01:43:56PM -0800, Prakash Sangappa wrote: > >Sounds cool, but make sure that there's a way to convey event drop > >events (corresponding to, e.g., NFSv4.1 directory delegation recalls > >when a directory is experiencing change large volumes). > > With the currently proposed API, there is no need to indicate that there > are event > drops. And I agree that it wouldn't be necessary. But once you add name information then you need it, IMO. (Sure, the NFSv4 client could do the re-readdir thing and work out the changes, but this could be very wasteful if the app were to consider doing something else.) Nico -- From sacadmin Tue Jan 30 14:36:23 2007 Received: from jurassic.eng.sun.com (jurassic.SFBay.Sun.COM [129.146.228.50]) by sac.sfbay.sun.com (8.13.6+Sun/8.13.6) with ESMTP id l0UMaNPf028000 for ; Tue, 30 Jan 2007 14:36:23 -0800 (PST) Received: from [129.146.228.98] (justforkicks.Eng.Sun.COM [129.146.228.98]) by jurassic.eng.sun.com (8.13.8+Sun/8.13.8) with ESMTP id l0UMaM2W754160; Tue, 30 Jan 2007 14:36:22 -0800 (PST) Message-ID: <45BFC781.3060503@sun.com> Date: Tue, 30 Jan 2007 14:32:33 -0800 From: Prakash Sangappa User-Agent: Mail/News 1.5.0.4 (X11/20060701) MIME-Version: 1.0 To: Prakash Sangappa CC: Bart Smaalders , psarc@sac.sfbay.sun.com, darren.kenny@sun.com, alan.bateman@sun.com, Michen.Chang@sun.com, Doug.Leavitt@sun.com Subject: Re: PSARC/2007/027 File Events Notification API References: <45A6F043.9000500@Sun.COM> <45B6AD2E.5010205@sun.com> In-Reply-To: <45B6AD2E.5010205@sun.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Status: RO Content-Length: 25709 Correction regarding the symbolic links issue: Monitoring all the intermediate symbolic links encountered during the resolution of a given file name, so that it could detect changes to symbolic 
links, does not address the problem completely. We will have to watch
the entire path of the file name as well, because the path name can
change. Therefore the same problem applies to regular files as well.

For example, suppose an application wishes to watch the file
/var/tmp/foo.cfg. With the file events notification API, we will watch
just the file "foo.cfg" and not the entire path. So anybody could change
the path like this:

        # mv /var/tmp /var/tmp.junk
        # mkdir /var/tmp/
        # touch /var/tmp/foo.cfg

The monitoring application will not be notified that the path has
changed.

If required, the file path and the symbolic links can be monitored in
user land by the application, or this functionality can be provided by
library routines that applications could use. In order to keep the
kernel implementation simple and scalable, this will not be done in the
kernel.

Instead, so that applications can monitor either the symbolic link
itself or the target of a symbolic link, a flag will be provided that
can be passed along with the event types to monitor.

        + FILE_NOFOLLOW

Therefore, by default symbolic links are followed and the target of the
symbolic link is monitored. This is similar to how stat(2) behaves. If
the application wishes to watch the symbolic link itself, it can do so
by passing FILE_NOFOLLOW along with the event types. In that case the
lstat(2) call has to be used to stat the symbolic link.

-Prakash.

Prakash Sangappa wrote:
> The currently proposed man page documents that symbolic links are not
> followed. This behavior makes using File Events Notification API to watch
> files with symbolic links unnecessarily complicated. The application will
> have to monitor the symbolic links along with the target file.
>
> The File Events API needs to follow links if the specified file is a
> symbolic link.
> The following amends the proposed man page changes.
> > > The changes are as follows: > > > Add the following to port_associate(3C) man page : > > Objects of type PORT_SOURCE_FILE are pointer to the structure > file_obj defined in . This event source provides > event notification when the specified file/directory is accessed, > modified or its status changes. The path name of the > file/directory > to be watched is passed in the 'struct file_obj' along with the > 'access', 'modification', and 'change' time stamps acquired from > - a stat(2) call. If the file name is a symbolic link, it is not > - followed; e.g. the link is monitored. > > --- change to ---- > > + a stat(2) call. If the file name specified is a symbolic link, it > + is followed. All the symbolic links encountered during the > + resolution of the file name and the target file are monitored. > + If any symbolic link gets renamed or deleted, it will be > treated as > + a file delete operation. > > > Add following to the ERRORS section of port_associate() > > > + ELOOP A loop exists in symbolic links or more then > + {SYMLOOP_MAX} symbolic links where encountered > during > + the resolution of the file path name specified. > > > > -Prakash. > > > Bart Smaalders wrote: >> I'm sponsoring the attached fast track for Prakash Sangappa. >> The requested release binding is minor, and the proposed timeout >> is 1/17/2007. The user interfaces described herein extend the >> ones introduced in PSARC/2002/498 and have the same stability >> levels. >> >> - Bart >> >> ------------------------------------------------------------------------ >> >> Copyright 2007 Sun Microsystems, Inc. >> >> 1. Introduction >> 1.1. Project/Component Working Name: >> File Events Notification API >> >> 1.2. Name of Document Author/Supplier: Prakash Sangappa >> >> 1.3. Date of This Document: 12/8/06 >> >> 1.4. Name of Major Document Customer(s)/Consumer(s): >> 1.4.2. The ARC(s) you expect to review your project: PSARC >> 1.4.4. The name of your business unit: OPG >> >> 1.5. 
Email Aliases: >> 1.5.1. Responsible Manager: michael.pogue@sun.com >> 1.5.2. Responsible Engineer: prakash.sangappa@sun.com >> 1.5.3. Marketing Manager: N/A >> 1.5.4. Interest List: bart.smaalders@sun.com >> darren.kenny@sun.com >> alan.bateman@sun.com >> Michen.Chang@Sun.COM >> Doug.Leavitt@Sun.COM >> >> 2. Project Summary >> 2.1. Project Description: >> >> This project delivers API for receiving event notifications when >> file or directory status changes. The API is based on the event >> ports interfaces(PSARC 2002/498). The file events notification >> facility will be added as a new event source to the event ports >> framework. >> 2.2. Risks and Assumptions: >> >> The file events notifications may not be accurate on distributed >> file systems like NFS, and on file systems which do not >> update the >> time stamps(eg: file system mounted with 'noaccess time' update >> option or read only file systems). >> >> 3. Business Summary >> 3.1. Problem Area: >> >> Some applications have the need to monitor files and directories for >> changes caused by non communicating processes. The current >> method is >> to periodically stat them, which is inefficient. Therefore >> there is >> a need for a file/directory monitoring facility. This >> facility will >> allow applications to monitor files and directories and receive >> notification when their status changes. >> >> 3.4. Competitive Analysis: >> >> Linux(inotify, dnotify), SGI(imon, FAM), Mac OS have flavors of file >> events notification mechanism. There are some user land file >> monitoring >> services implemented using the the kernel file events >> notification >> API. The FAM(file alteration monitoring) from SGI and Gamin, >> which is >> a simplified version of FAM, are user land implementations of >> file >> monitoring services. >> >> 4. Technical Description: >> 4.1. Details: >> >> The file events notification facility is implemented as a new event >> source(PORT_SOURCE_FILE) under the events ports framework. 
The >> API is >> based on the event ports(PSARC 2002/498) API. The object and the >> event >> types for this event source are described in the man page changes >> below. >> >> Other implementations of file events notification, like the linux's >> inotify/dnotify, support queuing events in the kernel and the events >> provide additional context(like the file name created/deleted). >> As it has been discussed on the perf-discuss@opensolaris.org >> alias, >> queuing events can cause scalability issues. On a large multiuser >> system, >> there can be many file operations occurring; as a result the >> kernel may generate events at a faster rate then the rate at >> which the >> application can process them, forcing the events to be queued >> and thus locking down kernel memory. Since a limit must be >> imposed on the number of events that can be queued, the >> application will have to implement a fall back method to handle >> missed events due to >> overflow. >> >> Some applications, like Beagle and Spotlight (both desktop search >> tools) >> require watching all the file and directory activity under a >> given path(directory tree), so that whenever files get >> modified/created/deleted, >> the search indicies for those files get updated. >> >> For example, the desktop search application 'Beagle' on Linux uses >> 'inotify' to watch the directory tree. It walks the directory tree >> registering a file monitor on each file and directory under it. >> >> On a large multiuser system with a large number of files and >> directories, >> this approach does not scale, since monitoring very large >> filesystems will >> require an inordinate amount of system memory. System scaling trends >> imply that available storage is growing much faster than system >> memory. 
>> A better solution for such use cases will be to have the filesystem >> provide a mechanism/interface which would provide the list of >> files and >> directories that have been added, modified or deleted since some >> given >> time in the past. It appears possible on ZFS to provide this >> functionality >> by allowing the user to get the difference between two snapshots; >> this is >> anticipated to take order the number of files changed, allowing even >> arbitrarily large filesystems to be indexed. This project does >> not propose these interfaces at present; we just want to point >> out what we >> know we're not addressing. >> >> In the approach we are taking with the file events notifications >> API, >> there will be no queuing of events. The event types delivered >> represent >> changes to the file's 'access', 'modification' and 'change' time >> stamps. >> The events do not provide any other details. This approach is in >> accordance with what the application can find out by statting a file >> and comparing its timestamps. The goal is to eliminate the need for >> applications (such as Nautilus, the Gnome file manager, or daemons >> monitoring config files) to periodically stat the files of interest. >> >> The man page section 2 describes the system calls that update the >> file/directory time stamps. The vnode operations corresponding these >> system calls are intercepted and relevant events delivered. >> The file event monitoring (FEM - PSARC 2003/172) hooks are used to >> intercept the vnode operations. >> There can be only one event outstanding per file or directory >> that is >> being monitored, i.e upon delivering an event, the file monitor >> is disabled. >> The file or directory needs to be re-associated to activate >> monitoring the >> file and receive the next event. To ensure that no events get >> missed in >> between, time stamps are used. 
The application has to pass the >> time stamps >> collected from a stat(2) call at the time of registering the file >> monitor. >> The time stamps passed in are compared against the current time >> stamps of >> the file and if they have changed, relevant events are delivered >> immediately. This behavior enables multithreaded programming using >> the file events notification API. It will also help filter out >> redundant >> events. >> Example: A multithreaded application can have a pool of threads >> processing >> file modification events. If file events were to be >> continuously delivered after a single registration and a >> file of interest was written to multiple times, multiple >> threads would receive >> change notification events and proceed to process them. >> This >> would force these threads to synchronize with each >> other. Note >> that when only one event is delivered and the file >> monitor gets >> disabled, one thread will be able to collect an event >> from a file >> and process it. While this thread is processing the file >> no other >> thread will process the same file as the file monitor is >> disabled >> and no new events get delivered until the file is >> re-associated >> and the file monitoring activated. There will be no need >> for any >> type of synchronization as only one thread would be >> processing >> the file at a time. Another useful aspect of this >> design is that >> rapid writes result in a much reduced set of file >> notification >> events; the monitoring application is never subject to a >> flood of >> events even if it runs very slowly. >> >> The following code snippet illustrates how a mulithreaded >> application with a pool of worker threads can use this file >> events notification API to process file status change >> events. >> >> /* * To initiate watching a file, >> this function can be called >> * once. The fobj_t structure is initialized with the >> file >> * name. 
The fobj pointer will be passed as the user pointer
>>  * to be returned with the event. The 'port' is the
>>  * event port fd obtained from a port_create(3C) call.
>>  */
>> int
>> watchfile(int port, file_obj_t *fobj, int events)
>> {
>>         struct stat sbuf;
>>
>>         /* Collect the current time stamps so that no change is missed. */
>>         if (stat(fobj->fo_name, &sbuf) == -1)
>>                 return (-1);
>>         /* Process the file named by fobj->fo_name here. */
>>         fobj->fo_atime = sbuf.st_atim;
>>         fobj->fo_mtime = sbuf.st_mtim;
>>         fobj->fo_ctime = sbuf.st_ctim;
>>
>>         return (port_associate(port, PORT_SOURCE_FILE,
>>             (uintptr_t)fobj, events, fobj));
>> }
>>
>> /*
>>  * Application threads that process file events call
>>  * this function. The file name is in the file_obj_t.
>>  * This 'fobj' would be passed in as the 'user pointer'
>>  * to be returned along with the event.
>>  */
>> void
>> wait_for_fileevents(int port, int events)
>> {
>>         port_event_t pe;
>>
>>         while (1) {
>>                 file_obj_t *fobj;
>>
>>                 if (port_get(port, &pe, NULL) == -1)
>>                         return;
>>                 /*
>>                  * Check for exception events and process file.
>>                  */
>>                 if (!(pe.portev_events & FILE_EXCEPTION)) {
>>                         fobj = (file_obj_t *)pe.portev_user;
>>                         if (watchfile(port, fobj, events) == -1)
>>                                 return;
>>                 }
>>         }
>> }
>>
>>
>> 4.2. Bug/RFE Number(s):
>>      6367770 add user land interface to fem (file event monitoring)
>>      4667502 need file system event notification framework for Solaris
>>
>> 4.3. In Scope:
>>      N/A
>>
>> 4.4. Out of Scope:
>>      N/A
>>
>> 4.5. Interfaces:
>>
>> Proposed man page changes:
>> --------------------------
>>
>> Changes to port_create(3C) man page:
>>
>>      source              object type       association mechanism
>>      PORT_SOURCE_AIO     struct aiocb      aio_read(3RT), aio_write(3RT),
>>                                            lio_listio(3RT)
>>      PORT_SOURCE_FD      file descriptor   port_associate(3C)
>>      PORT_SOURCE_MQ      mqd_t             mq_notify(3RT)
>>      PORT_SOURCE_TIMER   timer_t           timer_create(3RT)
>>      PORT_SOURCE_USER    uintptr_t         port_send(3C)
>>      PORT_SOURCE_ALERT   uintptr_t         port_alert(3C)
>>    + PORT_SOURCE_FILE    file_obj_t        port_associate(3C)
>>
>> ...
>>
>> + PORT_SOURCE_FILE events represent file/directory status change.
Once >> + an event is delivered, the file object is no longer >> associated with >> + the port. A file object is associated or re-associated with >> a port >> + using the port_associate(3C) function. >> >> >> Changes to port_associate(3C) man page: >> >> - The only objects associated with a port by way of the >> - port_associate() function are objects of type >> - PORT_SOURCE_FD. Objects of other types have type-specific >> - association mechanisms. See port_create(3C) for details. >> >> to >> >> + The objects that can be associated with a port by way of the >> + port_associate() function are objects of type PORT_SOURCE_FD >> + and PORT_SOURCE_FILE. Objects of other types have >> type-specific >> + association mechanisms. See port_create(3C) for details. >> >> >> Add the following to port_associate(3C) man page : >> >> Objects of type PORT_SOURCE_FILE are pointer to the structure >> file_obj defined in . This event source provides >> event notification when the specified file/directory is >> accessed, >> modified or its status changes. The path name of the >> file/directory >> to be watched is passed in the 'struct file_obj' along with the >> 'access', 'modification', and 'change' time stamps acquired from >> a stat(2) call. If the file name is a symbolic link, it is not >> followed; e.g. the link is monitored. >> >> The struct file_obj contains the following elements: >> >> timestruc_t fo_atime; /* Access time got from stat() */ >> timestruc_t fo_mtime; /* Modification time from stat() */ >> timestruc_t fo_ctime; /* Change time from stat() */ >> char *fo_name; /* Pointer to a null terminated >> path name */ >> >> At the time the port_associate function is called, the time >> stamps passed in the structure file_obj are compared with the >> file or >> directory's current time stamps and if there has been a >> change an event is immediately sent to the port. If not, an >> event will be >> sent when such a change occurs. 
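As an illustration of the time stamp comparison described above, here is a
minimal user-space sketch in C. It is not taken from the proposal: the
types sketch_time and sketch_stamps, the function sketch_pending_events(),
and the SKETCH_* flag values are hypothetical stand-ins, and the real
comparison would be done by the kernel at port_associate() time using the
values defined in port.h. The sketch only shows the idea that a mismatch
between the saved and the current stamps yields an immediate event,
limited to the event types the caller asked for.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder flag values for illustration; the real ones belong to port.h. */
enum {
        SKETCH_FILE_ACCESS   = 0x1,
        SKETCH_FILE_MODIFIED = 0x2,
        SKETCH_FILE_ATTRIB   = 0x4
};

/* Hypothetical stand-in for timestruc_t. */
struct sketch_time {
        long sec;
        long nsec;
};

/* Hypothetical stand-in for the three stamps carried in struct file_obj. */
struct sketch_stamps {
        struct sketch_time access;      /* plays the role of fo_atime */
        struct sketch_time modify;      /* plays the role of fo_mtime */
        struct sketch_time change;      /* plays the role of fo_ctime */
};

static int
stamp_differs(const struct sketch_time *a, const struct sketch_time *b)
{
        return (a->sec != b->sec || a->nsec != b->nsec);
}

/*
 * Return the subset of 'wanted' events that would be delivered at once
 * because the saved stamps no longer match the current ones. A result
 * of 0 means the caller would wait for a later change instead.
 */
int
sketch_pending_events(const struct sketch_stamps *saved,
    const struct sketch_stamps *current, int wanted)
{
        int events = 0;

        assert(saved != NULL && current != NULL);
        if (stamp_differs(&saved->access, &current->access))
                events |= SKETCH_FILE_ACCESS;
        if (stamp_differs(&saved->modify, &current->modify))
                events |= SKETCH_FILE_MODIFIED;
        if (stamp_differs(&saved->change, &current->change))
                events |= SKETCH_FILE_ATTRIB;
        return (events & wanted);
}
```

With identical stamps the function returns 0; if only the modification
stamp differs and the caller asked for all three event types, it returns
SKETCH_FILE_MODIFIED.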
>> >> The event types that can be specified at port_associate() >> time for >> the PORT_SOURCE_FILE are FILE_ACCESS, FILE_MODIFIED, >> FILE_ATTRIB, >> corresponding to the three time stamps. A atime change will >> result >> in the FILE_ACCESS event, mtime time change will result in the >> FILE_MODIFIED event. The ctime change will result in the >> FILE_ATTRIB >> event. >> >> Following exception events are delivered when they occur. These >> event types cannot be filtered. >> >> FILE_DELETE /* Monitored file/directory was deleted */ >> FILE_RENAME_TO /* Monitored file/directory was renamed */ >> FILE_RENAME_FROM /* Monitored file/directory was renamed */ >> UNMOUNTED /* Monitored file system got unmounted */ >> >> At most one event notification will be generated per >> associated 'file_obj'. When the event for the associated >> 'file_obj' is retrieved, the object is no longer associated >> with the port. The >> event can be processed without the possibility that another >> thread >> can retrieve a subsequent event for the same object. The >> port_associate() can be called to re-associate the file_obj >> object >> with the port. >> >> The association is also removed if the port gets closed or >> when port_dissociate() is called. >> >> Note: On NFS file systems, events from only the client >> side(local) >> access/modifications to files or directories will be delivered. >> >> Add following to the ERRORS section of port_associate() >> EACCES The "source" argument is >> PORT_SOURCE_FILE and, >> Search permission is denied on a component of >> path prefix or the file exists and the >> permissions, corresponding to the "events" >> argument, are denied. >> >> ENOENT The "source" argument is PORT_SOURCE_FILE and >> the file does not exist or the path prefix >> does not exist or the path points to an empty >> string. 
>>
>>     ENOTSUP    The "source" argument is PORT_SOURCE_FILE and
>>                the file system on which the specified file
>>                resides does not support watching for file
>>                event notifications.
>>
>> Add the following to the ERRORS section of port_dissociate():
>>
>>     EINVAL     The "source" argument is PORT_SOURCE_FILE and
>>                the specified file is currently not associated
>>                with the port (not being watched for file event
>>                notifications).
>>
>> Changes to the VOP, FEM interfaces
>> ----------------------------------
>>
>> To correctly identify file events on files that have hard links,
>> the directory vnode pointer and the file name component must be
>> passed along with the VNEVENT type to the VOP_VNEVENT() interface
>> routine.
>>
>> Example:
>>     Suppose a file has the following links
>>
>>         /tmp/dir1/foo
>>         /tmp/dir2/foo
>>
>>     and an application is watching /tmp/dir2/foo for file events.
>>     When /tmp/dir1/foo gets removed (rm), right now we receive a
>>     VE_REMOVE vnevent on the vnode. It is not possible to determine
>>     whether /tmp/dir1/foo or /tmp/dir2/foo got removed.
>>
>>     When the link count is increased or decreased, the ctime gets
>>     updated on the file. So when /tmp/dir1/foo is removed, the
>>     correct event for the watcher of /tmp/dir2/foo is FILE_ATTRIB,
>>     indicating a 'ctime' change.
>>
>>     Whereas if /tmp/dir2/foo gets removed (rm), the watcher should
>>     receive a FILE_DELETE event, as the name /tmp/dir2/foo got removed.
>>
>>     This can be determined if the directory vnode pointer and the
>>     file name component are passed to the VOP_VNEVENT() interface.
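The hard link example above boils down to comparing the (directory, name) pair carried by the remove event with the pair being watched. The following is a minimal sketch of that decision, not the kernel code from the proposal: the `sketch_` types, the flag values, and the use of pointer identity to stand in for "same directory vnode" are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative event values; these are assumptions, not the real flags. */
enum {
    SKETCH_FILE_ATTRIB = 0x4,   /* link count (ctime) changed */
    SKETCH_FILE_DELETE = 0x10   /* the watched name itself went away */
};

/* Hypothetical stand-in for a directory vnode. */
typedef struct sketch_vnode {
    int id;
} sketch_vnode_t;

/* What a watcher remembers about the name it monitors. */
struct sketch_watch {
    const sketch_vnode_t *dvp;  /* directory holding the watched name */
    const char *cname;          /* last component of the watched path */
};

/*
 * Decide which event a watcher should see when some link to the
 * watched file is removed. `dvp` and `cname` describe the link that
 * was removed, i.e. the two new arguments discussed above.
 */
int sketch_remove_event(const struct sketch_watch *w,
    const sketch_vnode_t *dvp, const char *cname)
{
    assert(w != NULL && dvp != NULL && cname != NULL);

    /* Same directory and same name: the watched link was removed. */
    if (w->dvp == dvp && strcmp(w->cname, cname) == 0)
        return SKETCH_FILE_DELETE;

    /* A different link went away: only the link count changed. */
    return SKETCH_FILE_ATTRIB;
}
```

With a watch on ("dir2", "foo"), removing ("dir1", "foo") yields the attribute-change value, while removing ("dir2", "foo") yields the delete value. Without the directory and name, the two cases are indistinguishable because both arrive on the same vnode.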
>>
>>
>> Modified VOP and supporting FEM interfaces - Consolidation private
>> ---------------------------------------
>>
>> Two new arguments added, 'vnode_t *dvp' and 'char *cname':
>>
>> VOP_VNEVENT(vnode_t *vp, vnevent_t vnevent, vnode_t *dvp, char *cname)
>> fop_vnevent(vnode_t *vp, vnevent_t vnevent, vnode_t *dvp, char *cname)
>> vnext_vnevent(femarg_t *vf, vnevent_t vnevent, vnode_t *dvp, char *cname)
>>
>> vnevent_rename_src(vnode_t *vp, vnode_t *dvp, char *name)
>> vnevent_rename_dest(vnode_t *vp, vnode_t *dvp, char *name)
>> vnevent_remove(vnode_t *vp, vnode_t *dvp, char *name)
>> vnevent_rmdir(vnode_t *vp, vnode_t *dvp, char *name)
>>
>>
>> New VNEVENT types - Consolidation private:
>> ------------------
>>
>> VE_CREATE - Represents a create operation on an already existing
>>             file.
>> VE_LINK - The source file of a 'link' system call.
>>
>> VE_RENAME_DEST_DIR - Destination directory of a rename() operation.
>>
>>
>> Corresponding new vnevent routines added - Consolidation private:
>> -----------------------------------
>>
>> void vnevent_create(vnode_t *vp)
>> void vnevent_link(vnode_t *vp)
>> void vnevent_rename_dest_dir(vnode_t *vp)
>>
>>
>> New member added to private section of 'vnode.h'
>> -----------------------------------------------
>> +     void *v_fopdata;   /* file events notification -
>>                             private data */
>>
>>
>> VNEVENT support in NFS:
>> -----------------------
>> VNEVENT support has been added to the NFS file system to report
>> client-side file events. It catches local (client-side) file
>> operations on an NFS file system and reports the corresponding file
>> events. This is not complete, as it cannot catch any server-side
>> file operations. This limitation is documented in the man page.
>>
>> 4.6. Doc Impact:
>>     port_associate(3C) - man page
>>     port_create(3C) - man page
>>
>> 5.
Reference Documents:
>>     project page -
>>         http://perf.eng.sun.com/twiki/bin/view/EventPorts/EPFileEvents
>>
>>     PSARC/2002/498 - Event Completion Framework
>>     PSARC/2003/172 - FEM (File event Monitoring)
>>     PSARC/2004/170 - VOP_VNEVENT()
>>
>> 6. Resources and Schedule:
>>     6.1. Projected Availability:
>>         S11
>>     6.2. Cost of Effort:
>>         Development is largely done.
>>         Test case development - 1 week.
>>
>>     6.3. Cost of Capital Resources:
>>         N/A
>>
>>     6.4. Product Approval Committee requested information:
>>         6.4.1. Consolidation or Component Name: ON
>>         6.4.3. Type of CPT Review and Approval expected: RFE
>>         6.4.4. Project Boundary Conditions: N/A
>>         6.4.5. Is this a necessary project for OEM agreements: No
>>         6.4.6. Notes: N/A
>>         6.4.7. Target RTI Date/Release:
>>         6.4.8. Target Code Design Review Date:
>>         6.4.9. Update approval addition: No
>>
>>     6.5. ARC review type: FastTrack
>>
>> 7. Prototype Availability:
>>     7.1. Prototype Availability: now
>>
>