The role requires 6+ years of overall IT experience, excellent communication and presentation skills, and 3+ years of recent, hands-on experience in Scala programming and distributed Apache Spark architecture.
- The candidate will be responsible for developing, testing, and deploying scalable and reliable data processing applications using Scala and Spark.
- The candidate will also need to optimize the performance of Spark jobs, transformations, and data structures, and handle data skew, partitioning, shuffling, and data locality issues (see the skew-handling sketch after this list).
- The candidate will work in an agile environment, follow the SDLC, and adhere to the configuration and release management process.
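To illustrate the skew-handling responsibility above, here is a minimal sketch of one common mitigation technique: salting a hot join key so its rows spread across more partitions before a join. The dataset names, column names, and salt factor are illustrative assumptions, not part of the role description.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Sketch: mitigate join skew by salting the skewed side and replicating the
// small side across all salt values. Names and values are hypothetical.
object SkewedJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("skew-salting-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val saltBuckets = 16

    // Large, skewed fact table: a few customer_ids dominate.
    val orders = Seq((1, 100.0), (1, 20.0), (2, 35.0)).toDF("customer_id", "amount")
    // Small dimension table keyed by the same column.
    val customers = Seq((1, "alice"), (2, "bob")).toDF("customer_id", "name")

    // Add a random salt to the skewed side...
    val saltedOrders = orders.withColumn("salt", (rand() * saltBuckets).cast("int"))
    // ...and explode the other side across every salt value so all pairs still match.
    val saltedCustomers = customers.withColumn(
      "salt", explode(array((0 until saltBuckets).map(lit): _*)))

    // Joining on (customer_id, salt) spreads the hot key over more partitions.
    val joined = saltedOrders
      .join(saltedCustomers, Seq("customer_id", "salt"))
      .drop("salt")

    joined.show()
    spark.stop()
  }
}
```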
Required Skills
- Deep understanding of functional programming in Scala, including type systems, collections, monads, and higher-order functions, as well as frameworks such as Akka (see the first sketch after this list).
- Expertise in working with Spark SQL, Spark Core, DataFrames, and RDDs.
- Experience in working with various data sources and formats, such as JSON, CSV, Parquet, and ORC (see the DataFrame sketch after this list).
- Knowledge of Spark tuning, debugging, and monitoring tools, such as the Spark UI, the Spark History Server, and the Spark shell.
- Familiarity with cloud platforms such as Azure or GCP, and their data services.
- Good understanding of distributed systems concepts, such as concurrency, parallelism, fault tolerance, consistency, and scalability.
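As a brief illustration of the functional-programming skills listed above, the following sketch shows immutable collections, a higher-order function, and Option used monadically. The function and value names are illustrative assumptions.

```scala
import scala.util.Try

// Sketch of higher-order functions, collections, and monadic Option composition.
object FunctionalSketch {

  // A higher-order function: takes another function as a parameter.
  def applyTwice[A](f: A => A)(a: A): A = f(f(a))

  // Option behaves monadically: flatMap/map chain computations that may fail.
  def parsePositive(s: String): Option[Int] =
    Try(s.toInt).toOption.filter(_ > 0)

  def main(args: Array[String]): Unit = {
    println(applyTwice[Int](_ * 2)(3)) // 12

    // Immutable collections with higher-order functions.
    val evensSquared = (1 to 10).filter(_ % 2 == 0).map(n => n * n)
    println(evensSquared)

    // For-comprehension desugars to flatMap/map over Option.
    val sum = for {
      a <- parsePositive("4")
      b <- parsePositive("5")
    } yield a + b
    println(sum) // Some(9)
  }
}
```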
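And as a sketch of the Spark SQL, DataFrame, and file-format skills, the example below reads a few of the formats named in the posting and queries them through both the SQL and DataFrame/RDD APIs. The file paths and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: load JSON, CSV, and Parquet into DataFrames, query with Spark SQL,
// and drop down to the RDD API. Paths and schemas are assumptions.
object FormatsAndSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("formats-sql-sketch")
      .master("local[*]")
      .getOrCreate()

    // DataFrame readers for a few of the formats mentioned above.
    val events  = spark.read.json("data/events.json")
    val users   = spark.read.option("header", "true").csv("data/users.csv")
    val metrics = spark.read.parquet("data/metrics.parquet")

    // Register views and mix Spark SQL with the DataFrame API.
    events.createOrReplaceTempView("events")
    users.createOrReplaceTempView("users")

    val counts = spark.sql(
      """SELECT u.user_id, COUNT(*) AS event_count
        |FROM events e JOIN users u ON e.user_id = u.user_id
        |GROUP BY u.user_id""".stripMargin)
    counts.show()

    // RDD view of the same data for lower-level transformations.
    val ids = metrics.rdd.map(row => row.getAs[String]("metric_id"))
    println(ids.take(5).mkString(", "))

    spark.stop()
  }
}
```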