Data streaming / India / Kolkata / 2 Years Experience

PySpark Streaming

Roles and Responsibilities

JOB DESCRIPTION:

We are looking for a Spark/Scala/PySpark developer who knows how to fully exploit the potential of our Spark cluster.
The candidate should be able to clean, transform, and analyze vast amounts of raw data from various systems using Spark, and deliver it as ready-to-use data.

RESPONSIBILITIES:

Create Scala/Spark/PySpark jobs for data transformation and aggregation

Produce unit tests for Spark transformations and helper methods

Write Scaladoc-style documentation with all code

Design data processing pipelines

SKILLS:

PySpark

Scala (with a focus on the functional programming paradigm)

Apache Spark 2.x, 3.x

Apache Spark RDD API

Apache Spark SQL DataFrame API

Apache Spark Streaming API

Spark query tuning and performance optimization

SQL database integration (Postgres and/or MySQL)

Experience working with HDFS, AWS (S3, Redshift, EMR, IAM, Policies, Routing)

CI/CD pipelines (Jenkins, GitLab/Bitbucket)

Deep understanding of distributed systems (e.g. partitioning, replication, consistency, and consensus)