ABSTRACT
In the era of data-intensive computing, large-scale applications in both the scientific and BigData communities demonstrate unique I/O requirements, leading to a proliferation of different storage devices and software stacks, many of which have conflicting requirements. In this paper, we investigate how to support a wide variety of conflicting I/O workloads under a single storage system. We introduce the idea of a Label, a new data representation, and we present LABIOS, a new, distributed, Label-based I/O system. LABIOS boosts I/O performance by up to 17x via asynchronous I/O, supports heterogeneous storage resources, offers storage elasticity, and promotes in-situ analytics via data provisioning. LABIOS demonstrates the effectiveness of storage bridging to support the convergence of HPC and BigData workloads on a single platform.