2023 | Book

Google Cloud Platform for Data Science

A Crash Course on Big Data, Machine Learning, and Data Analytics Services

Written by: Dr. Shitalkumar R. Sukhdeve, Sandika S. Sukhdeve

Publisher: Apress

About This Book

This book is your practical and comprehensive guide to learning Google Cloud Platform (GCP) for data science, using only the free tier services offered by the platform.

Data science and machine learning are increasingly becoming critical to businesses of all sizes, and the cloud provides a powerful platform for these applications. GCP offers a range of data science services that can be used to store, process, and analyze large datasets, and train and deploy machine learning models.

The book is organized into seven chapters covering various topics such as GCP account setup, Google Colaboratory, Big Data and Machine Learning, Data Visualization and Business Intelligence, Data Processing and Transformation, Data Analytics and Storage, and Advanced Topics. Each chapter provides step-by-step instructions and examples illustrating how to use GCP services for data science and big data projects.

Readers will learn how to set up a Google Colaboratory account and run Jupyter notebooks, access GCP services and data from Colaboratory, use BigQuery for data analytics, and deploy machine learning models using Vertex AI. The book also covers how to visualize data using Looker Data Studio, run data processing pipelines using Google Cloud Dataflow and Dataprep, and store data using Google Cloud Storage and Cloud SQL.

What You Will Learn

Set up a GCP account and project
Explore BigQuery and its use cases, including machine learning
Understand Google Cloud AI Platform and its capabilities
Use Vertex AI for training and deploying machine learning models
Explore Google Cloud Dataproc and its use cases for big data processing
Create and share data visualizations and reports with Looker Data Studio
Explore Google Cloud Dataflow and its use cases for batch and stream data processing
Run data processing pipelines on Cloud Dataflow
Explore Google Cloud Storage and its use cases for data storage
Get an introduction to Google Cloud SQL and its use cases for relational databases
Get an introduction to Google Cloud Pub/Sub and its use cases for real-time data streaming

Who This Book Is For

Data scientists, machine learning engineers, and analysts who want to learn how to use Google Cloud Platform (GCP) for their data science and big data projects

Table of Contents

Frontmatter
Chapter 1. Introduction to GCP
Abstract
Google Cloud Platform (GCP) provides a comprehensive collection of cloud computing services that run on the same infrastructure as Google’s own products. In this chapter, we introduced GCP, highlighting its core services and their significance for data science. We covered a number of GCP services that matter for data science, such as BigQuery, Cloud AI Platform, Cloud Dataflow, Cloud Datalab, Cloud Dataproc, Cloud Storage, and Cloud Vision API. Each of these services fills a distinct role in the data science workflow, from data storage and processing to the development and deployment of machine learning models. We also touched upon the free-tier options available for various GCP data science services, which let users start exploring these capabilities without incurring any costs. Free-tier offerings and pricing change over time, so refer to the GCP pricing page for the latest information. By following the outlined steps, you can set up your GCP account and create projects, which gives you access to a cloud computing platform for your data science work. GCP’s infrastructure, range of services, and compatibility with popular data science tools make it a compelling choice for both organizations and individuals running data-driven projects in the cloud. With GCP, you can use the scalability, reliability, and performance of Google’s infrastructure to address complex data challenges. As we progress through this book, we will look more closely at specific GCP services and explore how they can be used for various data science tasks.
Dr. Shitalkumar R. Sukhdeve, Sandika S. Sukhdeve
Chapter 2. Google Colaboratory
Abstract
In this chapter, we will explore Google Colaboratory, a cloud-based Jupyter notebook environment that Google provides free of charge. We will discover its key features, including its cloud-based nature, user-friendly interface, integration with Google Cloud Platform (GCP) services, sharing and collaboration capabilities, and support for GPU and TPU computing. These features make Colaboratory an accessible and powerful tool for data scientists, software engineers, and students. We will delve into the process of creating and running Jupyter notebooks on Google Colaboratory. We will learn how to access the platform, sign up for a Google account, and navigate the Colab interface. We will also understand how to create new Colaboratory notebooks, select the runtime type, and start coding in Python. Through hands-on examples, we will practice inserting text, performing arithmetic operations, generating random numbers, and visualizing data using libraries like Matplotlib and Seaborn. Moreover, we will explore importing libraries into Colab notebooks and working with data stored in Google Drive using the google.colab library. We will learn how to import data from and write data to Google Drive, as well as access data in Google Drive through Colaboratory. Additionally, we will briefly touch upon running machine learning models on Google Colaboratory and mention the availability of powerful machine learning tools and libraries like TensorFlow and Keras. We will even explore an example of building a machine learning model using the scikit-learn library. In conclusion, this chapter will provide an overview of Google Colaboratory and its features, guide us through the process of accessing and using Colab, and teach us various coding, data manipulation, and visualization techniques within the Colaboratory environment.
Dr. Shitalkumar R. Sukhdeve, Sandika S. Sukhdeve
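The Chapter 2 abstract mentions hands-on cells for arithmetic operations and random number generation. A minimal sketch of what such a first notebook cell might look like follows; it is not taken from the book, it uses only the Python standard library so it runs in any Colab runtime, and the variable names and the seed value are illustrative assumptions.

```python
import random
import statistics

# Basic arithmetic, as one might type into a first notebook cell.
total = 7 + 5
ratio = 7 / 5

# A seeded generator (seed chosen arbitrarily) so reruns give the same values.
rng = random.Random(42)
samples = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Summary statistics of the generated numbers.
sample_mean = statistics.mean(samples)
sample_stdev = statistics.stdev(samples)

print(total, ratio)
print(round(sample_mean, 3), round(sample_stdev, 3))
```

In a notebook, the same list could then be passed to a plotting library such as Matplotlib to draw a histogram; that step is left out here to keep the sketch dependency-free.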
Chapter 3. Big Data and Machine Learning
Abstract
Google Cloud Platform offers multiple services for the management of big data and the execution of Extract, Transform, Load (ETL) operations. Additionally, it provides tools for the training and deployment of machine learning models. Within this chapter, we will delve into BigQuery and its query execution. Moreover, we will gain insights into employing BigQuery ML for the development of machine learning models. The exploration extends to Google Cloud AI Platform, where we will gain practical experience with Vertex AI for training and deploying machine learning models.
Shitalkumar R. Sukhdeve, Sandika S. Sukhdeve
Chapter 4. Data Visualization and Business Intelligence
Abstract
In this chapter, we’ll explore data visualization and business intelligence, with a focus on Looker Data Studio and other tools for creating insightful reports and visuals. We’ll start with an introduction to Looker Data Studio and its capabilities. Then, we’ll learn how to create and share visualizations, explore BigQuery integration, build dashboards, and discover data visualization with Colab. Throughout, we’ll emphasize the importance of data visualization in driving business intelligence and decision-making.
Shitalkumar R. Sukhdeve, Sandika S. Sukhdeve
Chapter 5. Data Processing and Transformation
Abstract
In this chapter, we will explore data processing and transformation, centering our attention on Google Cloud Dataflow and Google Cloud Dataprep. We begin with an introduction to Google Cloud Dataflow, a versatile service for both batch and stream data handling. We explore its use in real-time analytics, ETL pipelines, and event-triggered processing, understanding its pivotal role in scalable data processing. Moving forward, we execute data processing pipelines within Cloud Dataflow, covering the concept of pipelines and their integration with Google Cloud services like BigQuery and Cloud Pub/Sub. Our exploration then leads us to Google Cloud Dataprep, designed for data refinement, including cleaning, normalization, and data exploration through a user-friendly visual interface that avoids complex coding. This chapter underscores the vital role of data processing in the data lifecycle, using Google Cloud Dataflow and Dataprep to process and refine data at scale, enabling valuable insights and informed decision-making. With Dataflow and Dataprep, we develop expertise in creating efficient data processing pipelines and ensuring data integrity.
Shitalkumar R. Sukhdeve, Sandika S. Sukhdeve
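The pipeline concept named in the Chapter 5 abstract (a source, a chain of transforms, and an aggregated result) can be illustrated without any cloud service. The sketch below is plain Python and does not use Cloud Dataflow or its SDK; the event records, function names, and the "click" filter are all hypothetical and chosen only to show how stages compose.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical input records; a real pipeline would read these from a source
# such as a file or a message queue.
EVENTS = [
    "login,alice",
    "click,bob",
    "login,bob",
    "malformed",
    "click,alice",
    "click,alice",
]


def parse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Transform 1: split each line into (event_type, user), skipping bad rows."""
    for line in lines:
        parts = line.split(",")
        if len(parts) == 2:
            yield parts[0], parts[1]


def only_clicks(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Transform 2: keep only click events."""
    return (record for record in records if record[0] == "click")


def count_per_user(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Transform 3: aggregate the number of events per user."""
    return dict(Counter(user for _, user in records))


def run_pipeline(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the transforms: parse, then filter, then aggregate."""
    return count_per_user(only_clicks(parse(lines)))


clicks_by_user = run_pipeline(EVENTS)
print(clicks_by_user)
```

Because the first two stages are generators, records flow through one at a time rather than being materialized between steps, which loosely mirrors how a pipeline service streams elements between transforms.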
Chapter 6. Data Analytics and Storage
Abstract
This chapter delves into data analytics and storage, spotlighting Google Cloud Storage, Google Cloud SQL, and Google Cloud Pub/Sub. We commence with Google Cloud Storage, a scalable object storage solution. It finds utility in data archiving, hosting websites, and backups. We explore its features and compatibility with other Google services.
Shitalkumar R. Sukhdeve, Sandika S. Sukhdeve
Chapter 7. Advanced Topics
Abstract
This chapter delves into advanced aspects of securing and managing Google Cloud Platform (GCP) resources, version control using Google Cloud Source Repositories, and powerful data integration tools: Dataplex and Cloud Data Fusion.
Shitalkumar R. Sukhdeve, Sandika S. Sukhdeve
Backmatter
Metadata
Title
Google Cloud Platform for Data Science
Written by
Dr. Shitalkumar R. Sukhdeve
Sandika S. Sukhdeve
Copyright Year
2023
Publisher
Apress
Electronic ISBN
978-1-4842-9688-2
Print ISBN
978-1-4842-9687-5
DOI
https://doi.org/10.1007/978-1-4842-9688-2
