research-article

Check Out the Big Brain on BRAD: Simplifying Cloud Data Processing with Learned Automated Data Meshes

Published: 01 July 2023

Abstract

The last decade of database research has led to the prevalence of specialized systems for different workloads. Consequently, organizations often rely on a combination of specialized systems, organized in a Data Mesh. Data meshes present significant challenges for system administrators, including picking the right system for each workload, moving data between systems, maintaining consistency, and correctly configuring each system. Many non-expert end users (e.g., data analysts or app developers) either cannot solve their business problems, or suffer from sub-optimal performance or cost due to this complexity. We envision BRAD, a cloud system that automatically integrates and manages data and systems into an instance-optimized data mesh, allowing users to efficiently store and query data under a unified data model (i.e., relational tables) without knowledge of underlying system details. With machine learning, BRAD automatically deduces the strengths and weaknesses of each engine through a combination of offline training and online probing. Then, BRAD uses these insights to route queries to the most suitable (combination of) system(s) for efficient execution. Furthermore, BRAD automates configuration tuning, resource scaling, and data migration across component systems, and makes recommendations for more impactful decisions, such as adding or removing systems. As such, BRAD exemplifies a new class of systems that utilize machine learning and the cloud to make complex data processing more accessible to end users, raising numerous new problems in database systems, machine learning, and the cloud.
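The routing idea in the abstract (send each query to the engine expected to run it best) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `QueryFeatures` fields, the engine names, and the hand-written cost functions standing in for learned models are hypothetical and are not taken from BRAD itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict


# Hypothetical query features; the abstract does not specify BRAD's feature set.
@dataclass(frozen=True)
class QueryFeatures:
    rows_scanned: int
    is_write: bool


# A cost model maps query features to a predicted latency in milliseconds.
CostModel = Callable[[QueryFeatures], float]


def route(query: QueryFeatures, models: Dict[str, CostModel]) -> str:
    """Return the name of the engine whose model predicts the lowest latency."""
    if not models:
        raise ValueError("no engines registered")
    return min(models, key=lambda name: models[name](query))


# Toy stand-ins for learned models (hand-written constants, not trained).
MODELS: Dict[str, CostModel] = {
    # Row-oriented engine: low fixed cost, expensive per scanned row.
    "oltp_engine": lambda q: 1.0 + 0.01 * q.rows_scanned,
    # Column-oriented engine: high fixed cost (higher for writes), cheap scans.
    "olap_engine": lambda q: (500.0 if q.is_write else 50.0)
    + 0.0001 * q.rows_scanned,
}

if __name__ == "__main__":
    point_lookup = QueryFeatures(rows_scanned=1, is_write=False)
    large_scan = QueryFeatures(rows_scanned=10_000_000, is_write=False)
    print(route(point_lookup, MODELS))
    print(route(large_scan, MODELS))
```

In this sketch the router is only an argmin over per-engine predictions; a system like the one envisioned would also need to account for data placement, freshness, and the cost of moving data, which the sketch leaves out.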


Published in

Proceedings of the VLDB Endowment, Volume 16, Issue 11
July 2023
789 pages
ISSN: 2150-8097

Publisher

VLDB Endowment