Type-Safe Disks

Pointers are the fundamental means by which modern file systems organize raw disk data into semantically meaningful entities such as files and directories. Pointers define three things: (1) the semantic dependency between blocks (e.g., a data block is accessible only through a pointer from an inode block); (2) the logical grouping of blocks (e.g., blocks pointed to by the same indirect block are part of the same file or directory); and (3) the importance of a block (e.g., blocks with many outgoing pointers are important because they affect the accessibility of a large set of blocks).

Despite the rich semantic information inherently available through pointers, pointers are completely opaque to disk systems today. Due to a narrow read-write interface, storage systems view data simply as a raw sequence of uninterpreted blocks, thus losing all semantic structure imposed on the data by higher layers such as the file system or database system. This leads to the well-known information gap between the storage system and higher layers. Because of this information gap, storage systems are constrained in the range of functionality they can provide, despite the powerful processing capability and the great deal of low-level layout knowledge they have.

This project proposes the notion of a type-safe disk (TSD), a disk system that has knowledge of the pointer relationships between blocks. A TSD uses this knowledge in two key ways. First, semantic structure conveyed through pointers is used to enforce invariants on data access, providing better data integrity and security. For example, a TSD prevents access to an unallocated block. Second, a TSD can perform various semantics-aware optimizations that are difficult to provide in the current storage hierarchy. A TSD extends the traditional block-based read-write interface with three new primitives: block allocation, pointer creation, and pointer removal. By performing block allocation and de-allocation, a TSD frees the file system from the need for free-space management. Similar in spirit to type-safe programming languages, a TSD also exploits its pointer awareness to perform automatic garbage collection of unused blocks; blocks which have no pointers pointing to them are reclaimed automatically, thus freeing file systems of the need to track reference counts for blocks in many cases.
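The three primitives and the garbage-collection behavior described above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not the project's actual interface: the class `TypeSafeDisk`, the exception `UnallocatedBlockError`, and the method names `alloc_block`, `create_ptr`, and `delete_ptr` are hypothetical. The sketch also assumes that a block allocated without a parent is a root that stays live, and it reclaims every block that is no longer reachable from a root, which covers blocks left with no incoming pointers.

```python
class UnallocatedBlockError(Exception):
    """Raised when a request names a block that is not currently allocated."""


class TypeSafeDisk:
    """Toy in-memory model of a pointer-aware disk (all names are hypothetical)."""

    def __init__(self, num_blocks):
        self.free = set(range(num_blocks))  # block numbers available for allocation
        self.data = {}                      # allocated block -> contents
        self.out = {}                       # allocated block -> blocks it points to
        self.roots = set()                  # assumed: live without incoming pointers

    def _check(self, blk):
        # Invariant from the text: no access to an unallocated block.
        if blk not in self.data:
            raise UnallocatedBlockError(f"block {blk} is not allocated")

    def alloc_block(self, parent=None):
        """Allocate a block; link it from `parent`, or make it a root if none is given."""
        if parent is not None:
            self._check(parent)
        if not self.free:
            raise OSError("no free blocks")
        blk = min(self.free)
        self.free.remove(blk)
        self.data[blk] = b""
        self.out[blk] = set()
        if parent is None:
            self.roots.add(blk)
        else:
            self.out[parent].add(blk)
        return blk

    def create_ptr(self, src, dst):
        """Record a pointer from block `src` to block `dst`."""
        self._check(src)
        self._check(dst)
        self.out[src].add(dst)

    def delete_ptr(self, src, dst):
        """Remove a pointer, then reclaim whatever became unreachable."""
        self._check(src)
        self.out[src].discard(dst)
        self._collect()

    def read(self, blk):
        self._check(blk)
        return self.data[blk]

    def write(self, blk, payload):
        self._check(blk)
        self.data[blk] = payload

    def _collect(self):
        """Free every allocated block that cannot be reached from a root."""
        live, stack = set(), list(self.roots)
        while stack:
            blk = stack.pop()
            if blk not in live:
                live.add(blk)
                stack.extend(self.out[blk])
        for blk in set(self.data) - live:
            del self.data[blk]
            del self.out[blk]
            self.free.add(blk)


disk = TypeSafeDisk(num_blocks=8)
root = disk.alloc_block()              # root block, e.g. a superblock
inode = disk.alloc_block(parent=root)
data = disk.alloc_block(parent=inode)
disk.write(data, b"hello")
disk.delete_ptr(root, inode)           # inode and data become unreachable and are reclaimed
```

After the final call, a read of `data` raises `UnallocatedBlockError`: the file system never freed the block explicitly, yet the disk knows it is no longer in use. A real device would have to persist the pointer state and collect incrementally; the full traversal here is chosen only for brevity.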

Type-safe disks can enable not only the wide range of useful functionality that alternative proposals such as OSDs and SDSs provide, with fewer modifications to the interface and software, but additional functionality as well. Uses of TSDs include, but are not limited to, the following:

Capability-based Access Control at the Disk Level: Pointers can be used to authenticate groups of blocks at the disk level using the parent-child relationship. Child blocks can have implicit capabilities, that is, capabilities inherited from their parents.
Operation-based Constraints: TSDs can enforce restricted forms of constraints, such as append-only, because they can infer certain higher-level operations from the pointer operations. For example, when a pointer is added to a pointer block, the higher-level entity represented by that pointer block is being extended.
Secure Deletion: Secure deletion in a TSD is simpler and more secure than software-level solutions. Automatic garbage collection in a TSD can be augmented with secure deletion by adding just a few tens of lines of code.
Track-Aligned Extents: The disk can try to align groups of blocks, as indicated by pointers, on track boundaries to reduce unnecessary seeks.
Semantic-guided Prefetching: TSDs can perform intelligent prefetching of data because of the pointer information. When a pointer block is accessed, a TSD can prefetch the data blocks it points to and store them in the on-disk buffers for improved read performance.
Intelligent Replication in RAID: Since TSDs are capable of differentiating data and pointers, they can identify metadata blocks as those blocks that contain outgoing pointers and replicate them to a higher degree, or distribute them evenly across all the disks. This could provide graceful degradation of availability as provided by D-GRAID (FAST '04).
On-disk Caching using Semantics: A TSD can give metadata blocks higher priority in the on-disk buffers to improve performance.
Intelligent Data Placement: Using the knowledge of pointers, a TSD can co-locate blocks along with their reference blocks (blocks that point to them). In general, blocks will be accessed just after their pointer blocks are accessed, and hence there would be better locality during access.
Disk-level Free-block Scheduling: Block liveness information in TSDs can identify free blocks and can be used to improve write performance by writing to the nearest free block.
Power-saving in Large RAID Systems: With TSDs, related blocks can be co-located on the same devices so that the fewest possible devices need to be powered on to read a file.
Detecting Pointer Corruptions: Corruption of block pointers stored in file system metadata (like inodes and bitmaps) could potentially result in serious damage to data. As TSDs maintain their own copy of pointers, they can be used to verify the integrity of pointers.
Data Isolation: Path-based capabilities can enable multiple file systems to co-exist in the same partition of the disk. This can be used to provide each user with his/her own file system which is totally isolated from the others.
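
Several of the uses listed above reduce to simple queries over the disk's own pointer table. The following is a minimal sketch, assuming the disk keeps its pointers as a mapping from each block number to the set of blocks it points to. The function names and the `budget` parameter are hypothetical, chosen for illustration only.

```python
def metadata_blocks(pointers):
    """Blocks with at least one outgoing pointer, treated here as metadata
    (candidates for extra replication or preferential caching)."""
    return {src for src, dsts in pointers.items() if dsts}


def prefetch_candidates(pointers, accessed, budget):
    """Up to `budget` blocks pointed to by the block just accessed,
    in ascending block-number order."""
    return sorted(pointers.get(accessed, ()))[:budget]


def corrupted_pointers(disk_view, fs_view):
    """Pointers claimed by file system metadata that are absent from the
    disk's own copy, returned as (source, destination) pairs."""
    return {
        (src, dst)
        for src, dsts in fs_view.items()
        for dst in dsts
        if dst not in disk_view.get(src, set())
    }


# Disk-side pointer table: block 1 points to 2 and 3, block 2 points to 4, 5, 6.
pointers = {1: {2, 3}, 2: {4, 5, 6}, 3: set(), 4: set()}

# File system metadata in which block 2's pointer to 6 was corrupted into 9.
fs_view = {1: {2, 3}, 2: {4, 5, 9}}

print(metadata_blocks(pointers))               # {1, 2}
print(prefetch_candidates(pointers, 2, 2))     # [4, 5]
print(corrupted_pointers(pointers, fs_view))   # {(2, 9)}
```

The corruption check only flags pointers the file system claims but the disk never recorded. Deciding which copy to trust, and how to repair the damage, is a policy question this sketch leaves open.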

Conference and Workshop Papers:

#	Title	Formats	Published In	Date	Comments
1	DHIS: Discriminating Hierarchical Storage	PS PDF BibTeX	The 2nd Israeli Experimental Systems Conference (ACM SYSTOR 2009)	May 2009
2	Exploiting Type-Awareness in a Self-Recovering Disk	PS PDF BibTeX	Third ACM International Workshop on Storage Security and Survivability (StorageSS 2007) held in conjunction with the 14th ACM CCS.	Oct 2007
3	Type-Safe Disks	PS PDF BibTeX	Seventh USENIX Symposium on Operating Systems Design and Implementation (OSDI 2006)	Nov 2006
4	Ensuring Data Integrity in Storage: Techniques and Applications	PS PDF BibTeX	First ACM International Workshop on Storage Security and Survivability (StorageSS 2005) held in conjunction with the 12th ACM CCS.	Nov 2005

Past Students:

#	Name	Program	Period	Current Location
1	Gopalan Sivathanu	PhD	Sep 2003 - May 2008	Software Engineer, Systems Infrastructure group, Google (Mountain View, CA)
2	Swaminathan Sundararaman	MS	Dec 2005 - Aug 2007	Research Scientist, ParallelIM (Sunnyvale, CA)
3	Kiron Vijayasankar	MS	Dec 2006 - Dec 2007	Member of Technical Staff, Engineering Development, Riverbed Technology (Sunnyvale, CA)
4	Chaitanya Yalamanchili	MS	Sep 2007 - Dec 2008	Software Engineer, Storage Availability and Management Group, Symantec, Inc. (Mountain View, CA)

Sponsors:

#	Sponsor	Amount	Period	Type	Title
1	NSF Trusted Computing (TC)	$400,000	2003-2006	Sole PI	A Layered Approach to Securing Network File Systems