In this project we are interested in producing a suite of tools that verify that every file system operation works correctly; that every combination of file system operations behaves as expected; that the file system code is free of race conditions, deadlocks, and obvious crashes; and that there is no way to corrupt the file system's data or crash the kernel with any combination of file system operations, however obscure. Of particular interest to us are boundary conditions that are often missed. For example, when disks begin to fail, the file system may not gracefully handle the receipt of many EIO errors; when memory runs out, getting ENOMEM can confuse file system code; when a single file larger than 2GB is created and fills the entire file system; or when directory data (struct dirent) somehow becomes corrupted.
To give you an idea of the techniques we are developing, consider suspect code that does not properly lock a directory inode before modifying the directory. Perhaps the lock is misplaced, so a minor race could be triggered under the right conditions. Our suite will pick a set of machines with different characteristics: SMP (multiprocessor) machines in particular, but also uniprocessor machines, with different disks, amounts of memory, CPU speeds, and so on. On each machine, our tool will run two concurrent threads: one repeatedly tries to exclusively create a given file name at a given rate (often as quickly as possible), and the other repeatedly tries to delete the same file name at a given rate. We vary the hardware because whether a race is won or lost often depends on specific timing conditions. The creation thread should always get either a success code or EEXIST (the file already exists); the deletion thread should always get either a success code or ENOENT (the file does not exist). If either thread gets any other error, we have very likely discovered a bug that needs to be investigated further.

|#|Title|Formats|Published In|Date|
|---|---|---|---|---|
|1|Auto-pilot: A Platform for System Software Benchmarking|PS PDF BibTeX|USENIX Technical Conference, FREENIX Track|Apr 2005|
|2|High-Confidence Operating Systems|PS PDF BibTeX|Tenth ACM SIGOPS European Workshop|Sep 2002|

|#|Name|Program|Period|Current Location|
|---|---|---|---|---|
|1|Charles P. Wright|PhD|May 2003 - May 2006|Application Software Developer, Walleye Software (New York, NY)|
|2|Naveen Gupta|MS|Sep 2004 - Dec 2005|Member of the Technical Staff, Systems Software group, Google (Mountain View, CA)|
|3|Kiran-Kumar Muniswamy-Reddy|MS|Jan 2002 - May 2004|Senior Software Engineer (DynamoDB), Amazon (Seattle, WA)|
|4|Sunil Satnur|MS|Sep 2004 - Dec 2005|Staff Engineer, Storage and Availability Group, VMware Inc. (Palo Alto, CA)|

|#|Sponsor|Amount|Period|Type|Title|
|---|---|---|---|---|---|
|1|NSF HECURA|$760,253|2006-2009|Lead-PI|File System Tracing, Replaying, Profiling, and Analysis on HEC Systems|