Client: TCS
Title: Data Engineer with Python, AWS, ETL
Location: Tempe, AZ
Job Description
- The candidate must have 8-12 years of overall experience in ETL, data warehousing, data lakes, data quality, etc.
- Minimum of 2 years of experience implementing AWS technologies such as S3, Glue, and RDS, along with PySpark, Python, and Confluent Kafka
- Minimum of 2 years of experience implementing ETL technologies such as Informatica, BODS, and SSIS
- Expertise in scripting technologies such as Python, Spark/PySpark, and Linux
- Batch solutions (AWS Glue, AWS Data Pipeline)
- Data analysis through standard SQL (Athena)
- Distributed compute solutions (Spark, EMR)
- Serverless functional solutions (AWS Lambda)
- Distributed storage (Redshift, S3)
- Real-time solutions (Kafka, AWS Kinesis)
- Experience with or understanding of Agile, Scrum, and CI/CD.
- Experience with Redshift query optimization, conversion, and execution
- Design and build ETL jobs to support the customer's data lake and enterprise data warehouse
- Comprehensive understanding of ETL concepts and cross-environment data transfers
- Write Extract-Transform-Load (ETL) jobs and Spark/Hadoop jobs to calculate business metrics
- AWS Data Pipeline knowledge to develop ETL for data movement to Redshift; experience mapping source-to-target rules and fields.
- Experience migrating data from on-premises/traditional big data systems, relational databases (pgSQL), data lakes, and data warehouses to AWS-native services
- Working knowledge of the AWS ecosystem and data analytics services such as queuing, notifications, Lambda, AWS Batch, etc.
- Good knowledge of the AWS environment and its services, including an understanding of S3 storage.
- Hands-on experience with data import/export mechanisms between Redshift and S3
- Hands-on experience writing PostgreSQL procedures, functions, and views in the Redshift database
- Hands-on experience with Amazon Redshift architecture.
- Amazon Redshift experience creating database objects
- Strong at writing complex queries with nested joins and derived tables.
- Good knowledge of Greenplum/PostgreSQL databases.
- Able to perform technical root cause analysis and outline corrective actions for given problems
- Proficient SQL and performance tuning skills for the Redshift database
- Should be flexible to overlap US/India business hours
- Fluent in complex, distributed, and massively parallel systems.
- Ability to build data pipelines across multiple systems is a key requirement
- Knowledge of AWS Lake Formation with EMR is an added advantage
- AWS-certified applicants preferred