Archival Storage
Description
We have several active and past projects in archival storage, all of which contribute to building
more efficient, reliable, and secure long-term storage systems. In addition, we maintain a
wiki page with links to resources on archival storage systems.
- Archival Workload Studies: The first study of publicly accessible long-term data repositories ever done, and the first study of tertiary storage in over 15 years
- Logan: A management system to scalably grow, maintain, and evolve a heterogeneous archival storage system
- Computation-Storage Trade-off: Using provenance to reduce storage overhead by storing intermediate and initial inputs and recomputing a dataset on demand
- Pergamum: Long-term evolvable storage built from intelligent network-attached bricks with both disk and NVRAM such as flash
- (Past Project) Deep Store: Building more efficient archival storage using deduplication to take advantage of intra-file and inter-file redundancy
- (Past Project) POTSHARDS: long-term secure storage, which allows the secure preservation of data for decades without relying upon traditional encryption to prevent information leakage.
Workload Studies: Information on archival workloads is badly out of date. The most recent studies of archives were done on tertiary storage systems over 15 years ago. Meanwhile, there are now many publicly available archives of historical, compliance, and scientific data whose usage and access patterns are poorly understood. We are obtaining and analyzing a variety of archival workloads to better understand their usage patterns, in order to aid in the design and verification of current and future archival storage architectures.
Logan: A system composed of heterogeneous devices offers unique opportunities for administrators to dictate when and where devices are integrated and utilized based upon their characteristics.
Logan optimizes the growth of a system by choosing which devices to integrate into the system based on administratively defined policies. Similarly, it maintains and improves system state by allowing administrators to dictate at a high level when and where data should be migrated or rebuilt when a device fails or is decommissioned.
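The idea of policy-driven device selection can be illustrated with a small sketch. Everything here is an assumption made for illustration: the Device fields, the choose_devices helper, the greedy strategy, and the low_power policy are hypothetical and do not describe Logan's actual interfaces or algorithms.

```python
from dataclasses import dataclass


@dataclass
class Device:
    # Hypothetical device description; field names are illustrative only.
    name: str
    capacity_tb: float
    watts_idle: float
    cost: float


def choose_devices(candidates, needed_tb, policy):
    """Greedily pick devices, lowest policy score first, until capacity is met."""
    chosen, total = [], 0.0
    for dev in sorted(candidates, key=policy):
        if total >= needed_tb:
            break
        chosen.append(dev)
        total += dev.capacity_tb
    return chosen


def low_power(dev):
    """Example administrator policy: prefer low idle power per terabyte."""
    return dev.watts_idle / dev.capacity_tb


pool = [
    Device("disk-a", 2.0, 6.0, 120.0),
    Device("disk-b", 4.0, 8.0, 260.0),
    Device("flash-c", 1.0, 0.5, 400.0),
]
picked = choose_devices(pool, needed_tb=5.0, policy=low_power)
print([d.name for d in picked])
```

Swapping in a different policy function (for example, one that scores by cost per terabyte) changes which devices are integrated without touching the selection loop.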
Computation-Storage Trade-off: Computations often produce many rarely used intermediate or final results. Naively storing or discarding results can prove very expensive: frequently used results may need to be recomputed repeatedly, while results that are never used waste storage. We examine storing the provenance (workflow) used to create a dataset and choosing an optimal set of inputs and intermediate results to yield the best level of overhead and availability under a variety of constraints, such as the time to retrieve a result and the feasibility of recomputation.
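As a rough illustration of this trade-off, the sketch below compares the cost of keeping a result with the expected cost of recomputing it on demand. The linear cost model, the should_store function, and all parameter names and values are assumptions made for illustration; they are not taken from the project.

```python
def should_store(size_gb, storage_cost_per_gb_month, months,
                 recompute_cost, expected_accesses):
    """Return True if keeping the result is expected to be no more expensive.

    Hypothetical linear cost model: storage cost grows with size and time,
    recomputation cost grows with the number of expected accesses.
    """
    cost_to_store = size_gb * storage_cost_per_gb_month * months
    cost_to_recompute = recompute_cost * expected_accesses
    return cost_to_store <= cost_to_recompute


# A large, rarely used result: storing costs 240, recomputing costs 30.
print(should_store(size_gb=500, storage_cost_per_gb_month=0.02, months=24,
                   recompute_cost=15.0, expected_accesses=2))   # False

# A small, frequently used result: storing costs 0.48, recomputing costs 750.
print(should_store(size_gb=1, storage_cost_per_gb_month=0.02, months=24,
                   recompute_cost=15.0, expected_accesses=50))  # True
```

A real decision would also have to account for the constraints mentioned above, such as retrieval deadlines and whether a step can be re-run at all.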
Pergamum: This project's goal is to develop a long-term storage system that is evolvable and controls the major storage cost contributors: static, operational, and management. Pergamum consists of a fully distributed network of intelligent storage devices. Each node, called a tome, consists of a SATA hard drive, a low-power processor, NVRAM, and a standardized network interface. Reliability is provided through two levels of redundancy encoding: intra-tome redundancy handles latent sector errors, and inter-tome redundancy handles lost devices. By keeping most of the devices spun down and by using commodity hardware, Pergamum provides cost efficiency on par with tape-based systems while providing superior random access performance. Further cost savings are realized through hierarchical consistency checking, staged rebuilds, and NVRAM-based metadata stores, all of which reduce disk spin-ups and therefore yield dramatic energy savings.
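The two levels of redundancy can be sketched with simple XOR parity. This is only an illustration of the layering: XOR parity stands in for whatever erasure codes the system actually uses, and the block contents, tome names, and xor_blocks helper are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks into one block."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


# Intra-tome: each tome keeps a local parity block over its own data blocks.
tome_a = [b"AAAA", b"BBBB", b"CCCC"]
tome_b = [b"DDDD", b"EEEE", b"FFFF"]
parity_a = xor_blocks(tome_a)

# A latent sector error destroys block 1 on tome A; the tome repairs it
# locally from its surviving blocks and parity, without waking other devices.
repaired = xor_blocks([tome_a[0], tome_a[2], parity_a])
assert repaired == tome_a[1]

# Inter-tome: a third tome stores parity across the corresponding blocks of
# tomes A and B, so a wholly lost tome can be rebuilt from the survivors.
tome_p = [xor_blocks([a, b]) for a, b in zip(tome_a, tome_b)]
rebuilt_b = [xor_blocks([a, p]) for a, p in zip(tome_a, tome_p)]
assert rebuilt_b == tome_b
```

The point of the split is that the common failure (a bad sector) is handled inside one device, while the rarer failure (a lost device) is the only case that requires spinning up other tomes.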
Deep Store: Disk-based deep storage is becoming practical because magnetic disks are rapidly becoming as inexpensive as magnetic tape and optical storage, the traditional storage media used for backup and archiving today. The Deep Store architecture uses inter-file (differential) and intra-file (sliding dictionary) data compression to increase storage density, and adds distribution and redundancy to improve request bandwidth and robustness. As a result, the expected media costs will be much lower than those of traditional backup and archival storage.
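A minimal sketch of how the two kinds of redundancy might be exploited is shown below. It is not the Deep Store design: duplicate-chunk elimination with fixed-size chunks is used as a simple stand-in for differential compression, zlib's DEFLATE supplies the sliding-dictionary compression, and the ChunkStore class and its methods are hypothetical.

```python
import hashlib
import zlib


class ChunkStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 hex digest -> zlib-compressed chunk

    def put(self, data):
        """Store data and return its recipe (ordered list of chunk digests)."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Inter-file redundancy: a chunk seen before is not stored again.
            if digest not in self.chunks:
                # Redundancy within the chunk: DEFLATE is a sliding-dictionary coder.
                self.chunks[digest] = zlib.compress(chunk)
            recipe.append(digest)
        return recipe

    def get(self, recipe):
        """Reassemble the original bytes from a recipe."""
        return b"".join(zlib.decompress(self.chunks[d]) for d in recipe)


store = ChunkStore(chunk_size=8)
v1 = store.put(b"headerAAbody0001trailer!")
v2 = store.put(b"headerAAbody0002trailer!")  # differs only in the middle chunk
assert store.get(v1) == b"headerAAbody0001trailer!"
assert store.get(v2) == b"headerAAbody0002trailer!"
print(len(store.chunks))  # 4 unique chunks stored instead of 6
```

With such tiny chunks the zlib step adds overhead rather than saving space; the example only shows where each technique sits in the write path.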
POTSHARDS: This project aims to securely preserve data by breaking it into pieces (shards) and spreading them across multiple archives, so that no individual archive can reconstruct the data or even know which shards it must steal from other archives to rebuild it. However, a user who gathers all of the shards must be able to reconstruct the original data with no additional information (including encryption keys). We accomplish this using multiple levels of secret splitting and approximate pointers, which limit the space that must be searched for related shards while requiring an attacker to obtain an exponential number of shards that cannot be identified in advance. Because it relies on secret splitting, this approach has information-theoretic security, unlike encryption, which might be broken by advances in algorithms or computer hardware. We believe that this approach will become common as the need to securely store data for decades becomes more pressing.
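The basic idea of secret splitting can be shown with the simplest possible scheme, an n-of-n XOR split. This sketch covers a single level of splitting only; it does not model the multiple levels, the approximate pointers, or the particular splitting algorithms used by POTSHARDS, and the function names are hypothetical.

```python
import secrets


def split_secret(data, n):
    """Split data into n shares; all n are required to recover it."""
    if n < 2:
        raise ValueError("need at least two shares")
    # n - 1 shares are pure random bytes of the same length as the data.
    shares = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    # The last share is the data XORed with every random share.
    last = data
    for share in shares:
        last = bytes(a ^ b for a, b in zip(last, share))
    return shares + [last]


def combine(shares):
    """XOR all shares together to recover the original data."""
    result = bytes(len(shares[0]))
    for share in shares:
        result = bytes(a ^ b for a, b in zip(result, share))
    return result


shares = split_secret(b"archive me", 3)
assert combine(shares) == b"archive me"
```

Any subset of fewer than n shares is statistically independent of the data, which is why no key is needed and why the security does not depend on the hardness of a computational problem.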
Status
Workload Studies: We have thus far obtained several archive access and update logs and are in the process of obtaining more. If you have workload information you wish to share, please contact a current graduate student or faculty member.
Logan: We have designed and initially validated the basic Logan architecture through simulation and are moving on to investigating several areas. First, scalability: the system is ultimately self-managing, which means there are several challenges to be addressed, such as scalable communications, group membership, and resource discovery. Second, layout heuristics: in a system with thousands of storage devices of varying types and characteristics, brute-force search is simply not an option when searching for devices to coordinate for reliability purposes. Furthermore, access to low-level traces for planning and provisioning incurs overhead, and such traces cannot be guaranteed to be available. Therefore we must take a detailed look at the methods and heuristics available, ranging from simulated annealing techniques to basic heuristics based on power draw and feasible I/O.
Computation-Storage Trade-off: There are a variety of areas under investigation in this project. First, provenance content: identifying the necessary information to store within the provenance and workflows, as well as how to gather and represent it. This is less trivial than initially thought, as many issues must be accounted for, such as how deterministic a process is, scheduling constraints, security issues, and so forth. Second, useful metrics: a user debating this trade-off should receive concise and simple answers to questions such as "how long will this take to recompute under my defined conditions?" This is a difficult problem, as there may be many possible ways to schedule recomputations among many different processes. Third, data selection: what is the ideal set of data to store and to discard? Large workflows can contain literally thousands of processes and intermediate results, so the selection process must be as automated as possible.
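The "how long will this take to recompute" question can be sketched for the simplest case, where the missing steps run one after another. The workflow, its step names, the runtimes, and the recompute_time helper are all hypothetical; real scheduling across many processes is considerably harder, as noted above.

```python
def recompute_time(target, inputs, runtime, stored):
    """Total serial time to regenerate target from what is already stored.

    inputs:  maps each result to the results it is derived from
    runtime: maps each result to the time its producing step takes
    stored:  set of results already available in storage
    """
    needed = set()

    def visit(node):
        # Stored results need no work; already-visited ones are counted once.
        if node in stored or node in needed:
            return
        needed.add(node)
        for parent in inputs.get(node, []):
            visit(parent)

    visit(target)
    return sum(runtime[node] for node in needed)


# Hypothetical workflow: raw -> cleaned -> features -> report
inputs = {"cleaned": ["raw"], "features": ["cleaned"], "report": ["features"]}
runtime = {"raw": 0, "cleaned": 30, "features": 120, "report": 5}

print(recompute_time("report", inputs, runtime, stored={"raw"}))              # 155
print(recompute_time("report", inputs, runtime, stored={"raw", "features"}))  # 5
```

Comparing the two calls shows how keeping one expensive intermediate result changes the answer a user would receive, which is the kind of input the data-selection step needs.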
Publications
2011
-
Ian Adams,
Ethan L. Miller,
David S.H. Rosenthal,
Using Storage Class Memory for Archives with DAWN, a Durable Array of Wimpy Nodes,
Technical Report UCSC-SSRC-11-07,
October 2011.
-
Lawrence You,
Kristal Pollack,
Darrell D. E. Long,
Kanchi Gopinath,
PRESIDIO: A Framework for Efficient Archival Data Storage,
ACM Transactions on Storage 7(2),
July 2011.
-
Yulai Xie,
Kiran-Kumar Muniswamy-Reddy,
Darrell D. E. Long,
Ahmed Amer,
Dan Feng,
Zhipeng Tan,
Compressing Provenance Graphs,
3rd USENIX Workshop on the Theory and Practice of Provenance,
June 2011.
-
Ian Adams,
Ethan L. Miller,
David S.H. Rosenthal,
Using Storage Class Memory for Archives with DAWN, a Durable Array of Wimpy Nodes,
Technical Report UCSC-SSRC-11-05,
May 2011.
NOTE: This report has been superseded by Technical Report UCSC-SSRC-11-07, please refer to that version.
-
Ian Adams,
Ethan L. Miller,
Mark W. Storer,
Analysis of Workload Behavior in Scientific and Historical Long-Term Data Repositories,
Technical Report UCSC-SSRC-11-01,
March 2011.
2010
-
Avani Wildani,
Ethan L. Miller,
Semantic Data Placement for Power Management in Archival Storage,
Proceedings of the 5th International Workshop on Petascale Data Storage (PDSW10), held in conjunction with SC2010,
November 2010.
-
Ian Adams,
Ethan L. Miller,
Mark W. Storer,
Examining Energy Use in Heterogeneous Archival Storage Systems,
Proceedings of the 18th Annual Meeting of the IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2010),
August 2010, pages 297-306.
2009
-
Avani Wildani,
Thomas Schwarz,
Ethan L. Miller,
Darrell D. E. Long,
Protecting Against Rare Event Failures in Archival Systems,
Proceedings of the 17th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2009),
September 2009.
-
Mark W. Storer,
Kevin Greenan,
Ethan L. Miller,
Kaladhar Voruganti,
POTSHARDS—A Secure, Long-Term Storage System,
ACM Transactions on Storage 5(2),
June 2009.
-
Ian Adams,
Darrell D. E. Long,
Ethan L. Miller,
Shankar Pasupathy,
Mark W. Storer,
Maximizing Efficiency By Trading Storage for Computation,
Proceedings of the Workshop on Hot Topics in Cloud Computing (HotCloud ’09),
June 2009.
-
Avani Wildani,
Thomas Schwarz,
Ethan L. Miller,
Darrell D. E. Long,
Protecting Against Rare Event Failures in Archival Systems,
Technical Report UCSC-SSRC-09-03,
April 2009.
Preliminary version of a paper that appeared in MASCOTS 2009.
-
Mark W. Storer,
Secure, Energy-Efficient, Evolvable, Long-Term Archival Storage,
Technical Report UCSC-SSRC-09-01,
March 2009.
2008
-
Kevin Greenan,
Darrell D. E. Long,
Ethan L. Miller,
Thomas Schwarz,
Jay Wylie,
A Spin-Up Saved is Energy Earned: Achieving Power-Efficient, Erasure-Coded Storage,
Proceedings of the Fourth Workshop on Hot Topics in System Dependability (HotDep '08),
December 2008.
-
Mark W. Storer,
Kevin Greenan,
Ian Adams,
Ethan L. Miller,
Darrell D. E. Long,
Kaladhar Voruganti,
Logan: Automatic Management for Evolvable, Large-Scale, Archival Storage,
Proceedings of the 2008 Petascale Data Storage Workshop (PDSW 08),
November 2008.
-
Mark W. Storer,
Kevin Greenan,
Darrell D. E. Long,
Ethan L. Miller,
Secure Data Deduplication,
Proceedings of the 4th International Workshop on Storage Security and Survivability (StorageSS 2008), held in conjunction with the 15th ACM Conference on Computer and Communications Security (CCS 2008),
October 2008.
-
Kevin Greenan,
Ethan L. Miller,
Thomas Schwarz,
Optimizing Galois Field Arithmetic for Diverse Processor Architectures,
Proceedings of the 16th Annual IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2008),
September 2008.
-
Casey Marshall,
Efficient and safe data backup with Arrow,
Technical Report UCSC-SSRC-08-02,
June 2008.
Masters project report.
-
Mark W. Storer,
Kevin Greenan,
Ethan L. Miller,
Kaladhar Voruganti,
Pergamum: Energy-efficient Archival Storage with Disk Instead of Tape,
;login: — The USENIX Magazine 33(3),
June 2008.
-
Mark W. Storer,
Kevin Greenan,
Ethan L. Miller,
Kaladhar Voruganti,
Pergamum: Replacing Tape with Energy Efficient, Reliable, Disk-Based Archival Storage,
Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST '08),
February 2008, pages 1-16.
2007
-
Kevin Greenan,
Ethan L. Miller,
Thomas Schwarz,
Darrell D. E. Long,
Disaster Recovery Codes: Increasing Reliability with Large-Stripe Error Correction Codes,
Proceedings of the 3rd International Workshop on Storage Security and Survivability (StorageSS 2007), held in conjunction with the 14th ACM Conference on Computer and Communications Security (CCS 2007),
October 2007.
-
Kevin Greenan,
Ethan L. Miller,
Thomas Schwarz,
Analysis and Construction of Galois Fields for Efficient Storage Reliability,
Technical Report UCSC-SSRC-07-09,
August 2007.
Revised version published in MASCOTS 2008.
-
Deepavali Bhagwat,
Kave Eshghi,
Pankaj Mehra,
Content-based Document Routing and Index Partitioning for Scalable Similarity-based Searches in a Large Corpus,
Proceedings of the 13th ACM SIGKDD international conference on Knowledge Discovery and Data Mining (KDD '07),
August 2007, pages 105-112.
-
Mark W. Storer,
Kevin Greenan,
Ethan L. Miller,
Kaladhar Voruganti,
POTSHARDS: Secure Long-Term Storage Without Encryption,
Proceedings of the 2007 USENIX Technical Conference,
June 2007, pages 143-156.
-
Jehan-François Pâris,
Thomas Schwarz,
Darrell D. E. Long,
Self-Adaptive Two-Dimensional RAID Arrays,
Proceedings of the International Performance Conference on Computers and Communication (IPCCC '07),
April 2007.
2006
-
Mark W. Storer,
Kevin Greenan,
Ethan L. Miller,
Long-Term Threats to Secure Archives,
Proceedings of the 2nd ACM Workshop on Storage Security and Survivability (StorageSS 2006),
October 2006.
-
Mark W. Storer,
Kevin Greenan,
Ethan L. Miller,
Kaladhar Voruganti,
POTSHARDS: Secure Long-Term Archival Storage Without Encryption,
Technical Report UCSC-SSRC-06-03, Storage Systems Research Center, University of California, Santa Cruz,
September 2006.
Later version published in USENIX 2007.
-
Deepavali Bhagwat,
Kristal Pollack,
Darrell D. E. Long,
Thomas Schwarz,
Ethan L. Miller,
Jehan-François Pâris,
Providing High Reliability in a Minimum Redundancy Archival Storage System,
Proceedings of the 14th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS '06),
September 2006, pages 413-421.
-
Thomas Schwarz,
Ethan L. Miller,
Store, forget, and check: Using algebraic signatures to check remotely administered storage,
Proceedings of the IEEE Int'l Conference on Distributed Computing Systems (ICDCS '06),
July 2006.
-
Lawrence You,
Efficient Archival Data Storage,
Technical Report UCSC-SSRC-06-04,
June 2006.
Ph.D. thesis.
2005
-
Mark W. Storer,
Kevin Greenan,
Ethan L. Miller,
Carlos Maltzahn,
POTSHARDS: Storing Data for the Long-Term Without Encryption,
Proceedings of the 3rd International IEEE Security in Storage Workshop,
December 2005.
-
Lawrence You,
Kristal Pollack,
Darrell D. E. Long,
Deep Store: An Archival Storage System Architecture,
Proceedings of the 21st International Conference on Data Engineering (ICDE '05),
April 2005.
-
Joerg Meyer,
Large-Scale Multi-Type Inverted List Indexing,
Masters thesis, University of California, Santa Cruz,
March 2005.
2004
-
Thomas Schwarz,
Qin Xin,
Ethan L. Miller,
Darrell D. E. Long,
Andy Hospodor,
Spencer Ng,
Disk Scrubbing in Large Archival Storage Systems,
Proceedings of the 12th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS '04),
October 2004, pages 409-418.
Won Best Paper award.
-
Lawrence You,
Christos Karamanolis,
Evaluation of efficient archival storage techniques,
Proceedings of the 21st IEEE / 12th NASA Goddard Conference on Mass Storage Systems and Technologies,
April 2004.
1998
Last modified 30 Sep 2011