Artículos relacionados a Big Data With Pyspark: Processing Large Datasets: A...

Big Data With Pyspark: Processing Large Datasets: A Hands-On Guide To Distributed Data Engineering, Machine Learning And Big Data Pipelines With Apache Spark And Python - Tapa blanda

 
9798290030715: Big Data With Pyspark: Processing Large Datasets: A Hands-On Guide To Distributed Data Engineering, Machine Learning And Big Data Pipelines With Apache Spark And Python

Sinopsis

You'll Learn

  • Understand the Foundations of Big Data and Distributed Computing: Gain a solid grasp of Big Data concepts, including the 5 Vs, the challenges of traditional systems, and the fundamental principles of distributed computing like parallelism, fault tolerance, and scalability.

  • Master the PySpark Ecosystem: Learn the architecture of Apache Spark, its core components (Spark SQL, Structured Streaming, MLlib, GraphFrames), and how the PySpark API seamlessly integrates with Python.

  • Set Up Your PySpark Environment: Get hands-on experience setting up a complete development environment on your local machine and learn how to run applications in various cloud platforms like Databricks, AWS EMR, and Google Cloud Dataproc.

  • Process Data with RDDs and DataFrames: Master Spark's core data structures, from the low-level RDDs to the powerful and optimized DataFrames. Learn to apply a wide range of transformations and actions for data manipulation.

  • Perform Advanced Data Wrangling and Feature Engineering: Acquire skills in data cleaning, handling missing values and duplicates, and performing complex transformations using Spark SQL, Window Functions, and User-Defined Functions (UDFs), including high-performance Pandas UDFs.

  • Connect to Diverse Data Sources: Read and write data from various formats (CSV, JSON, Parquet) and connect to external systems like relational databases (JDBC), NoSQL stores (Cassandra, MongoDB), and cloud storage (S3, ADLS).

  • Build Real-Time Data Pipelines: Implement modern, fault-tolerant data ingestion with Structured Streaming, including handling event time, watermarking, and performing stateful transformations for real-time analytics.

  • Apply Machine Learning at Scale with MLlib: Learn to build and evaluate distributed machine learning pipelines for classification, regression, and clustering tasks using Spark's MLlib library.

  • Analyze Graph-Structured Data: Explore the power of GraphFrames to model and analyze complex relationships, run graph algorithms like PageRank, and find patterns in network data.

  • Optimize PySpark Applications for Performance: Dive deep into performance tuning, including understanding DAGs and shuffles, managing partitioning, optimizing joins, and configuring memory settings to make your code run faster and more efficiently.

  • Monitor, Debug, and Deploy Applications: Utilize the Spark UI to monitor your jobs, troubleshoot common errors, and learn to package and deploy your PySpark applications to different cluster managers like YARN and Kubernetes.

  • Solve Real-World Big Data Problems: Apply your knowledge through practical case studies, including building a recommendation engine, a real-time fraud detection system, and an ETL pipeline, to solidify your skills and build a portfolio.

"Sinopsis" puede pertenecer a otra edición de este libro.

Comprar nuevo

Ver este artículo

EUR 6,79 gastos de envío desde Estados Unidos de America a España

Destinos, gastos y plazos de envío

Resultados de la búsqueda para Big Data With Pyspark: Processing Large Datasets: A...

Imagen de archivo

Publishing, PythQuill
Publicado por Independently published, 2025
ISBN 13: 9798290030715
Nuevo Tapa blanda
Impresión bajo demanda

Librería: California Books, Miami, FL, Estados Unidos de America

Calificación del vendedor: 5 de 5 estrellas Valoración 5 estrellas, Más información sobre las valoraciones de los vendedores

Condición: New. Print on Demand. Nº de ref. del artículo: I-9798290030715

Contactar al vendedor

Comprar nuevo

EUR 20,11
Convertir moneda
Gastos de envío: EUR 6,79
De Estados Unidos de America a España
Destinos, gastos y plazos de envío

Cantidad disponible: Más de 20 disponibles

Añadir al carrito