- Overall, 8-12 years of experience
- Able to communicate with business and technical stakeholders to propose solutions, create work breakdowns, and estimate the effort for execution.
- Able to coordinate and collaborate with cross-functional teams, stakeholders, and vendors for the smooth functioning of the enterprise data system.
- Ability and readiness for hands-on work
- At least 3 years of professional experience with Spark, Python, and SQL
- Experienced in designing data pipelines for both real-time and batch data sources
- Comfortable with containerization technologies (Docker, ECS, Fargate)
- Intellectually curious and self-directed problem solver, keen to work on a variety of data projects and independently search for answers
- Has worked with common Python data libraries (pandas, NumPy)
- Has worked with Databricks and RDS, as well as NoSQL databases (Cassandra, HBase)
- Has experience with DevOps tools such as Git, Jenkins, and Terraform
- Has hands-on experience with SQL and can write complex queries with ease
- Has experience building batch and real-time data pipelines
- Has familiarity with AWS services (Kinesis, Fargate, Lambda, S3)
- Has familiarity with data warehouse design and management, including Master Data Management (MDM)
- Good to have: experience deploying machine learning models to production and building the systems that support these processes; experience with Databricks MLOps tooling or AWS SageMaker is a plus
- Cares deeply about code quality, but is also mindful of timelines and doesn't spend countless hours chasing perfection
- Communicates clearly and effectively