Type-Safe Disks

Pointers are the fundamental means by which modern file systems organize raw disk data into semantically meaningful entities such as files and directories. Pointers define three things: (1) the semantic dependency between blocks (e.g., a data block is accessible only through a pointer from an inode block); (2) the logical grouping of blocks (e.g., blocks pointed to by the same indirect block are part of the same file or directory); and (3) the importance of a block (e.g., blocks with many outgoing pointers are important because they affect the accessibility of a large set of blocks).

Despite the rich semantic information inherently available through pointers, pointers are completely opaque to disk systems today. Due to a narrow read-write interface, storage systems view data simply as a raw sequence of uninterpreted blocks, thus losing all semantic structure imposed on the data by higher layers such as the file system or database system. This leads to the well-known information gap between the storage system and higher layers. Because of this information gap, storage systems are constrained in the range of functionality they can provide, despite the powerful processing capability and the great deal of low-level layout knowledge they have.

This project proposes the notion of a type-safe disk (TSD), a disk system that has knowledge of the pointer relationships between blocks. A TSD uses this knowledge in two key ways. First, semantic structure conveyed through pointers is used to enforce invariants on data access, providing better data integrity and security. For example, a TSD prevents access to an unallocated block. Second, a TSD can perform various semantics-aware optimizations that are difficult to provide in the current storage hierarchy. A TSD extends the traditional block-based read-write interface with three new primitives: block allocation, pointer creation, and pointer removal. By performing block allocation and de-allocation, a TSD frees the file system from the need for free-space management. Similar in spirit to type-safe programming languages, a TSD also exploits its pointer awareness to perform automatic garbage collection of unused blocks; blocks which have no pointers pointing to them are reclaimed automatically, thus freeing file systems of the need to track reference counts for blocks in many cases.
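The three primitives and the automatic reclamation described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the project's actual interface: the class name `TypeSafeDiskSketch`, the method names, the `root` flag, and the use of simple reference counting are all hypothetical choices made for illustration. In particular, the sketch reclaims a block only when its last incoming pointer is removed, so a freshly allocated block that has never been pointed to is left alone, and cycles of blocks are not collected.

```python
class TypeSafeDiskSketch:
    """Toy in-memory model of a pointer-aware disk (all names hypothetical)."""

    def __init__(self, num_blocks):
        self.free = set(range(num_blocks))  # unallocated block numbers
        self.data = {}       # allocated block number -> contents
        self.out_ptrs = {}   # allocated block number -> set of blocks it points to
        self.in_count = {}   # allocated block number -> number of incoming pointers
        self.roots = set()   # blocks exempt from reclamation (assumption)

    def _check_allocated(self, blk):
        # Invariant from the text: unallocated blocks cannot be accessed.
        if blk not in self.data:
            raise PermissionError(f"block {blk} is not allocated")

    def alloc_block(self, root=False):
        """Block allocation: the disk, not the file system, picks a free block."""
        if not self.free:
            raise RuntimeError("no free blocks")
        blk = min(self.free)
        self.free.remove(blk)
        self.data[blk] = b""
        self.out_ptrs[blk] = set()
        self.in_count[blk] = 0
        if root:
            self.roots.add(blk)
        return blk

    def create_ptr(self, src, dst):
        """Pointer creation: record that block src points to block dst."""
        self._check_allocated(src)
        self._check_allocated(dst)
        if dst not in self.out_ptrs[src]:
            self.out_ptrs[src].add(dst)
            self.in_count[dst] += 1

    def delete_ptr(self, src, dst):
        """Pointer removal: drop src -> dst and reclaim dst if unreferenced."""
        self._check_allocated(src)
        if dst not in self.out_ptrs[src]:
            raise KeyError(f"no pointer from {src} to {dst}")
        self.out_ptrs[src].remove(dst)
        self.in_count[dst] -= 1
        if self.in_count[dst] == 0 and dst not in self.roots:
            self._reclaim(dst)

    def _reclaim(self, blk):
        # Free blk and, transitively, any block left with no incoming pointers.
        stack = [blk]
        while stack:
            b = stack.pop()
            if b not in self.data:
                continue
            children = self.out_ptrs.pop(b)
            del self.data[b]
            del self.in_count[b]
            self.free.add(b)
            for child in children:
                if child in self.in_count:
                    self.in_count[child] -= 1
                    if self.in_count[child] == 0 and child not in self.roots:
                        stack.append(child)

    def read(self, blk):
        self._check_allocated(blk)
        return self.data[blk]

    def write(self, blk, contents):
        self._check_allocated(blk)
        self.data[blk] = contents
```

In this model, a file system would allocate an inode block as a root, allocate a data block, and link the two with `create_ptr`. Calling `delete_ptr` on that link returns the data block to the free set without any free-space bookkeeping in the file system, and a later `read` of that block raises `PermissionError`.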

Type-safe disks can enable not only the wide range of useful functionality that alternative proposals such as OSDs and SDSs enable, with fewer modifications to the interface and software, but additional functionality as well. The list of uses of TSDs includes but is not limited to the following:

Conference and Workshop Papers:

# | Title | Published In | Date
1 | DHIS: Discriminating Hierarchical Storage | The 2nd Israeli Experimental Systems Conference (ACM SYSTOR 2009) | May 2009
2 | Exploiting Type-Awareness in a Self-Recovering Disk | Third ACM International Workshop on Storage Security and Survivability (StorageSS 2007), held in conjunction with the 14th ACM CCS | Oct 2007
3 | Type-Safe Disks | Seventh USENIX Symposium on Operating Systems Design and Implementation (OSDI 2006) | Nov 2006
4 | Ensuring Data Integrity in Storage: Techniques and Applications | First ACM International Workshop on Storage Security and Survivability (StorageSS 2005), held in conjunction with the 12th ACM CCS | Nov 2005

Past Students:

# | Name | Program | Period | Current Location
1 | Gopalan Sivathanu | PhD | Sep 2003 - May 2008 | Software Engineer, Systems Infrastructure group, Google (Mountain View, CA)
2 | Swaminathan Sundararaman | MS | Dec 2005 - Aug 2007 | Research Scientist, ParallelIM (Sunnyvale, CA)
3 | Kiron Vijayasankar | MS | Dec 2006 - Dec 2007 | Member of Technical Staff, Engineering Development, Riverbed Technology (Sunnyvale, CA)
4 | Chaitanya Yalamanchili | MS | Sep 2007 - Dec 2008 | Software Engineer, Storage Availability and Management Group, Symantec, Inc. (Mountain View, CA)

Sponsors:

# | Sponsor | Amount | Period | Type | Title
1 | NSF Trusted Computing (TC) | $400,000 | 2003-2006 | Sole PI | A Layered Approach to Securing Network File Systems