The candidate will be responsible for developing, testing, and deploying scalable and reliable data processing applications using Scala and Spark.
The candidate will also need to tune the performance of Spark jobs, transformations, and data structures, and handle data skew, partitioning, shuffling, and data locality issues (a sketch of one such technique follows below).
The candidate will work in an agile environment, follow the SDLC, and adhere to the configuration and release management process.
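By way of illustration only, the following minimal Scala sketch shows one common skew-mitigation technique referenced above: salting a hot key before aggregation so its rows spread across shuffle partitions. The dataset paths, column names, and salt factor are hypothetical, not part of this role description.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SkewedAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("skew-handling-sketch")
      .getOrCreate()

    // Hypothetical input: an events dataset where a few customer_id values
    // dominate, causing skewed shuffle partitions during aggregation.
    val events = spark.read.parquet("/data/events")

    // Add a random salt so rows for a hot key land in many partitions,
    // aggregate per (key, salt), then aggregate again to merge the salts.
    val salted = events.withColumn("salt", (rand() * 16).cast("int"))

    val partial = salted
      .groupBy(col("customer_id"), col("salt"))
      .agg(sum("amount").as("partial_amount"))

    val totals = partial
      .groupBy("customer_id")
      .agg(sum("partial_amount").as("total_amount"))

    totals.write.mode("overwrite").parquet("/data/customer_totals")

    spark.stop()
  }
}
```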
Required Skills
Deep understanding of functional programming in Scala, including type systems, collections, monads, and higher-order functions, as well as frameworks such as Akka (see the first sketch after this list).
Expertise in working with Spark SQL, Spark Core, DataFrames, and RDDs (see the second sketch after this list).
Experience working with various data sources and formats, such as JSON, CSV, Parquet, and ORC.
Knowledge of Spark tuning, debugging, and monitoring tools, such as the Spark UI, Spark History Server, and Spark Shell.
Familiarity with cloud platforms, such as Azure or GCP, and their data services.
Good understanding of distributed systems concepts, such as concurrency, parallelism, fault tolerance, consistency, and scalability.
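First, a small sketch of the functional programming concepts listed above: a higher-order function and monadic composition with Option. All names are hypothetical, and toIntOption assumes Scala 2.13 or later.

```scala
object FpSketch {
  // A higher-order function: takes another function as an argument.
  def applyTwice[A](f: A => A)(a: A): A = f(f(a))

  // Option is a monad: flatMap/map chain computations that may fail,
  // and the for-comprehension is sugar over them.
  def parsePort(s: String): Option[Int] =
    s.toIntOption.filter(p => p > 0 && p < 65536)

  def buildUrl(host: Option[String], port: Option[String]): Option[String] =
    for {
      h <- host
      p <- port.flatMap(parsePort)
    } yield s"https://$h:$p"

  def main(args: Array[String]): Unit = {
    println(applyTwice[Int](_ + 1)(40))                  // 42
    println(buildUrl(Some("example.com"), Some("8443"))) // Some(https://example.com:8443)
    println(buildUrl(Some("example.com"), Some("-1")))   // None
  }
}
```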
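Second, a sketch touching Spark SQL, DataFrames, RDDs, and several of the file formats listed above. The paths, schemas, and column names are hypothetical; the point is only that the same DataFrame API reads and writes CSV, JSON, Parquet, and ORC.

```scala
import org.apache.spark.sql.SparkSession

object FormatsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("formats-sketch")
      .master("local[*]") // local run for illustration
      .getOrCreate()

    // Hypothetical inputs in two different formats.
    val ordersCsv = spark.read.option("header", "true").csv("/data/orders.csv")
    val usersJson = spark.read.json("/data/users.json")

    // Spark SQL over a temporary view.
    usersJson.createOrReplaceTempView("users")
    val activeUsers = spark.sql("SELECT id, country FROM users WHERE active = true")

    // Dropping down to the RDD API when row-level control is needed.
    val countries = activeUsers.rdd.map(row => row.getAs[String]("country")).distinct()
    println(s"Distinct countries: ${countries.count()}")

    // Writing out in columnar formats.
    activeUsers.write.mode("overwrite").parquet("/data/active_users.parquet")
    ordersCsv.write.mode("overwrite").orc("/data/orders.orc")

    spark.stop()
  }
}
```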