DOI: 10.1145/3307681.3325405 (HPDC conference proceedings)

LABIOS: A Distributed Label-Based I/O System

Published: 17 June 2019

ABSTRACT

In the era of data-intensive computing, large-scale applications in both the scientific and BigData communities demonstrate unique I/O requirements, leading to a proliferation of different storage devices and software stacks, many of which have conflicting requirements. In this paper, we investigate how to support a wide variety of conflicting I/O workloads under a single storage system. We introduce the idea of a Label, a new data representation, and we present LABIOS: a new, distributed, Label-based I/O system. LABIOS boosts I/O performance by up to 17x via asynchronous I/O, supports heterogeneous storage resources, offers storage elasticity, and promotes in-situ analytics via data provisioning. LABIOS demonstrates the effectiveness of storage bridging to support the convergence of HPC and BigData workloads on a single platform.
