Experience designing, coding, tuning, and optimizing ETL pipelines using Apache Spark or similar technologies.
Experience building data pipelines and applications that stream and process datasets at low latency.
Demonstrated efficiency in handling data: tracking data lineage, ensuring data quality, and improving data discoverability.
Experience with programming languages such as Python and R.
Extensive experience developing Spark applications in Scala.
Familiarity with various cloud data sources and architectures, such as AWS (S3, EC2, EMR, Lambda), Parquet, and Kafka.
Experience with both relational (SQL) and non-relational (NoSQL) database design, as well as real-time data technologies.
Familiarity with Docker, CI/CD tooling (such as Jenkins or GitLab), and AWS.
Strong understanding of cloud architecture and big data fundamentals.
Ability to support continuing increases in data velocity, volume, and complexity.
Self-directed, able to multi-task, with sharp analytical abilities and excellent communication skills; capable of working effectively in a dynamic environment.
Collaborate closely with fellow data team members, as well as tech and product teams and company leadership.
Sound knowledge of distributed systems and data architecture (e.g., the Lambda architecture); able to design and implement batch and stream data processing pipelines, and to optimize the distribution, partitioning, and massively parallel processing (MPP) of high-level data structures.
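To illustrate the partitioning skill this bullet refers to, here is a minimal Python sketch of key-based hash partitioning, the same idea behind Spark's default HashPartitioner: records sharing a key always land in the same partition, which is what lets work be distributed across executors without repeated shuffles. The function and sample data are hypothetical, purely for illustration.

```python
from collections import defaultdict

def hash_partition(records, key_fn, num_partitions):
    """Assign each record to a partition by hashing its key.

    Records with equal keys always map to the same partition index,
    mirroring the co-location guarantee of Spark's HashPartitioner.
    """
    partitions = defaultdict(list)
    for record in records:
        idx = hash(key_fn(record)) % num_partitions
        partitions[idx].append(record)
    return partitions

# Hypothetical example: partition click events by user_id across 4 partitions.
events = [{"user_id": u, "page": p}
          for u, p in [(1, "a"), (2, "b"), (1, "c"), (3, "d")]]
parts = hash_partition(events, lambda e: e["user_id"], 4)
```

In a real Spark job the analogous choice is made via `partitionBy` or `repartition`; picking a key with high cardinality and even distribution is what avoids skewed partitions.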
Strong object-oriented (OO) and functional programming (FP) design-pattern, data-structure, and algorithm design skills.
Experience with or knowledge of Agile Software Development methodologies
Experience developing and/or consuming web interfaces (REST APIs) and associated technologies (HTTP, web services).