research-article

Check Out the Big Brain on BRAD: Simplifying Cloud Data Processing with Learned Automated Data Meshes

Authors:
Tim Kraska, MIT CSAIL and Amazon Web Services
Tianyu Li, MIT CSAIL
Samuel Madden, MIT CSAIL
Markos Markakis, MIT CSAIL
Amadou Ngom, MIT CSAIL
Ziniu Wu, MIT CSAIL
Geoffrey X. Yu, MIT CSAIL

Proceedings of the VLDB Endowment, Volume 16, Issue 11, pp. 3293–3301. https://doi.org/10.14778/3611479.3611526

Published: 01 July 2023

Abstract

The last decade of database research has led to the prevalence of specialized systems for different workloads. Consequently, organizations often rely on a combination of specialized systems, organized in a Data Mesh. Data meshes present significant challenges for system administrators, including picking the right system for each workload, moving data between systems, maintaining consistency, and correctly configuring each system. Many non-expert end users (e.g., data analysts or app developers) either cannot solve their business problems, or suffer from sub-optimal performance or cost due to this complexity. We envision BRAD, a cloud system that automatically integrates and manages data and systems into an instance-optimized data mesh, allowing users to efficiently store and query data under a unified data model (i.e., relational tables) without knowledge of underlying system details. With machine learning, BRAD automatically deduces the strengths and weaknesses of each engine through a combination of offline training and online probing. Then, BRAD uses these insights to route queries to the most suitable (combination of) system(s) for efficient execution. Furthermore, BRAD automates configuration tuning, resource scaling, and data migration across component systems, and makes recommendations for more impactful decisions, such as adding or removing systems. As such, BRAD exemplifies a new class of systems that utilize machine learning and the cloud to make complex data processing more accessible to end users, raising numerous new problems in database systems, machine learning, and the cloud.
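
The routing idea in the abstract (estimate how each engine will perform, refine the estimates with online probing, then send each query to the engine expected to run it best) can be illustrated with a minimal sketch. This is not BRAD's implementation: the engine names, query classes, prior latencies, and the exponential-moving-average update are all hypothetical stand-ins chosen for illustration, and the abstract does not specify which models or update rules BRAD uses.

```python
from dataclasses import dataclass


@dataclass
class CostRouter:
    """Toy router that keeps one latency estimate per (engine, query class)."""

    # Hypothetical priors in milliseconds, standing in for offline training.
    estimates: dict[str, dict[str, float]]
    # Weight given to each new online observation (an assumed constant).
    alpha: float = 0.5

    def route(self, query_class: str) -> str:
        """Return the engine with the lowest estimated latency for this class."""
        return min(self.estimates, key=lambda e: self.estimates[e][query_class])

    def observe(self, engine: str, query_class: str, latency_ms: float) -> None:
        """Fold a measured latency into the estimate (exponential moving average)."""
        old = self.estimates[engine][query_class]
        self.estimates[engine][query_class] = (
            (1 - self.alpha) * old + self.alpha * latency_ms
        )


# Hypothetical two-engine mesh with made-up prior estimates.
router = CostRouter(
    estimates={
        "row_store": {"point_lookup": 2.0, "scan_aggregate": 900.0},
        "column_store": {"point_lookup": 40.0, "scan_aggregate": 120.0},
    }
)

first_choice = router.route("scan_aggregate")

# Online probing: the column store turns out far slower than predicted,
# so its estimate is revised and the next routing decision can change.
router.observe("column_store", "scan_aggregate", 3000.0)
second_choice = router.route("scan_aggregate")
```

In this sketch, a single surprising measurement is enough to flip the routing decision because `alpha` is large; a smaller value would make the router slower to react but less sensitive to outliers.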


Published in
Proceedings of the VLDB Endowment, Volume 16, Issue 11
July 2023
789 pages
ISSN: 2150-8097
Editors: Georgia Koutrika (Athena Research Center), Jun Yang (Duke University)
Publisher: VLDB Endowment