Vanguard is seeking a skilled Senior Data Engineer to join our team. The ideal candidate will have a strong foundation in ETL processes using EMR and Glue, as well as experience with the wider AWS ecosystem, SQL, Python, and PySpark. This role is critical to building and optimizing the data infrastructure that supports our business objectives.
Responsibilities:
- Design and implement robust data pipelines using AWS services such as EMR, Glue, S3, EC2, Service Catalog, Lambda, CloudFormation, SNS, and EventBridge, along with PostgreSQL.
- Develop and optimize Hive tables and PySpark DataFrames, tuning joins, partitioning, and parallelism for performance.
- Collaborate with cross-functional teams to analyze requirements and translate business objectives into scalable data solutions.
- Use SQL, PySpark, and Python for data extraction, transformation, and loading (ETL).
- Utilize tools like Jupyter Notebooks, GitHub, and CI/CD pipelines to streamline development and deployment processes.
- Stay up to date with industry trends and best practices in data engineering.
Qualifications:
- Minimum of 6 years of experience in data engineering.
- Strong proficiency in SQL, Python, and PySpark.
- In-depth knowledge of AWS services, particularly EMR and Glue.
- Experience with ETL processes, data modeling, and data warehousing.
- Familiarity with Agile methodologies and DevOps practices.
- Excellent problem-solving and communication skills.