The Data Engineer contractor role is a project-based role focused on migrating data pipelines from legacy frameworks such as Scalding to modern, supported infrastructure such as Spark with Scala. This role will be responsible for:
- Analyzing existing data pipelines to understand their architecture, dependencies, and functionality.
- Working closely with data engineers to develop a migration strategy for converting pipelines from the current framework to the target framework.
- Designing, building, and testing new data pipelines in the target framework, ensuring they meet or exceed existing performance and reliability standards.
- Debugging and troubleshooting issues that arise during the migration process, working collaboratively with cross-functional teams to resolve them quickly.
- Communicating progress, challenges, and timelines to stakeholders on a regular basis.
Requirements
The ideal candidate is a Data Engineer with considerable experience in migrations and Big Data frameworks.
Must-Haves
- Scala programming language expertise
- Spark framework expertise
- Experience working with BigQuery
- Familiarity with scheduling jobs in Airflow
- Fluency with Google Cloud Platform, in particular GCS and Dataproc
Nice-to-Haves
- Python programming language fluency
- Scalding framework fluency
- PySpark framework fluency
- Dataflow (Apache Beam) framework fluency