The candidate must have 8-12 years of overall experience in ETL, data warehousing, data lakes, data quality, etc.
Minimum 2+ years of experience implementing AWS and related technologies such as S3, Glue, RDS, PySpark, Python, and Confluent Kafka.
Design and build ETL jobs to support the customer data lake and enterprise data warehouse.
Comprehensive understanding of ETL concepts and cross-environment data transfers.
Write Extract-Transform-Load (ETL) jobs and Spark/Hadoop jobs to calculate business metrics
Minimum 2+ years of experience implementing ETL technologies such as Informatica, BODS, and SSIS.
Expertise in scripting technologies such as Python, Spark/PySpark, and Linux shell.
Batch solutions (AWS Glue, AWS Data Pipeline)
Distributed compute solutions (Spark, EMR)
Data analysis with standard SQL (Athena)
Serverless functions (AWS Lambda)
Distributed storage (Redshift, S3)
Real-time streaming solutions (Kafka, AWS Kinesis)
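The batch-ETL pattern listed above (extract from storage, transform, load to the warehouse) can be sketched in miniature. This is a minimal pure-Python illustration, assuming toy in-memory stand-ins for the source and target; in practice this logic would run as a Glue PySpark job reading from and writing to S3/Redshift.

```python
# Minimal batch-ETL sketch: extract -> transform -> load.
# The "source" and "target" are in-memory stand-ins (assumption);
# a production job would use AWS Glue/PySpark against S3 and Redshift.

def extract(source):
    """Pull raw order records from the source system."""
    return list(source)

def transform(rows):
    """Compute a business metric: total revenue per customer."""
    revenue = {}
    for row in rows:
        revenue[row["customer"]] = (
            revenue.get(row["customer"], 0.0) + row["qty"] * row["price"]
        )
    return revenue

def load(metrics, target):
    """Write the aggregated metric into the target store."""
    target.update(metrics)
    return target

raw = [
    {"customer": "acme", "qty": 2, "price": 10.0},
    {"customer": "acme", "qty": 1, "price": 5.0},
    {"customer": "globex", "qty": 3, "price": 4.0},
]
warehouse = {}
load(transform(extract(raw)), warehouse)
print(warehouse)  # {'acme': 25.0, 'globex': 12.0}
```

The same three-stage shape carries over directly to a Glue job: `extract` becomes a DataFrame read, `transform` a groupBy/aggregation, and `load` a write to the warehouse.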
Experience with and understanding of Agile, Scrum, and CI/CD.
Experience with Redshift query optimization, conversion and execution
AWS Data Pipeline knowledge to develop ETL for data movement into Redshift; experience mapping source-to-target rules and fields.
Experience migrating data from on-premises/traditional big data systems, relational databases (e.g., PostgreSQL), data lakes, and data warehouses to AWS native services.
Working knowledge of the AWS ecosystem and data analytics services such as queuing, notifications, Lambda, AWS Batch, etc.
Good knowledge of the AWS environment and its services, including an understanding of S3 storage.
Hands-on experience with data import/export mechanisms between S3 and Redshift.
Hands-on experience writing PostgreSQL-style procedures, functions, and views in the Redshift database.
Hands-on experience with Amazon Redshift architecture.
AWS Redshift experience creating database objects.
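The S3-to-Redshift import mentioned above is typically done with Redshift's COPY command. The sketch below only composes the statement as a string; the table name, bucket path, and IAM role ARN are hypothetical placeholders, and in practice the statement would be executed over a live connection (e.g., psycopg2 or the Redshift Data API).

```python
# Sketch of composing a Redshift COPY statement for an S3 import.
# Table, S3 path, and IAM role ARN below are hypothetical placeholders;
# execution against a real cluster is out of scope for this sketch.

def build_copy_statement(table, s3_path, iam_role, fmt="CSV"):
    """Return a Redshift COPY command loading `table` from `s3_path`."""
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS {fmt};"
    )

stmt = build_copy_statement(
    "sales_fact",
    "s3://example-bucket/exports/sales/",
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",
)
print(stmt)
```

COPY is generally preferred over row-by-row INSERTs because Redshift parallelizes the load across slices when reading from S3.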
Strong ability to write complex queries with nested joins and derived tables.
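A derived table is a subquery in the FROM clause that the outer query joins like a regular table. As a runnable sketch, the example below uses Python's built-in sqlite3 as a stand-in for Redshift/PostgreSQL; the SQL shape is the same on either engine.

```python
# Demonstrates a derived table (subquery in FROM) joined to another table.
# sqlite3 stands in for Redshift/Postgres here (assumption); the query
# structure is portable.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'acme'), (2, 'globex');
    INSERT INTO orders VALUES (1, 10.0), (1, 15.0), (2, 12.0);
""")

# Derived table `t` pre-aggregates orders, then joins to customers.
rows = conn.execute("""
    SELECT c.name, t.total
    FROM customers AS c
    JOIN (SELECT customer_id, SUM(amount) AS total
          FROM orders
          GROUP BY customer_id) AS t
      ON t.customer_id = c.id
    ORDER BY c.name;
""").fetchall()
print(rows)  # [('acme', 25.0), ('globex', 12.0)]
```

Aggregating inside the derived table before joining keeps the join one-row-per-customer, which avoids double-counting and is often cheaper than joining first and aggregating after.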
Candidate should have good knowledge of Greenplum/PostgreSQL databases.
Able to perform technical root cause analysis and outline corrective actions for given problems.
Proficient SQL and performance-tuning skills for the Redshift database.
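Performance tuning usually starts by inspecting the query plan. As a hedged, runnable illustration, the sketch below uses sqlite3's EXPLAIN QUERY PLAN as a stand-in for Redshift's EXPLAIN; on Redshift you would instead check for distribution-key and sort-key usage and data redistribution steps.

```python
# Illustration of checking a query plan before tuning.
# sqlite3's EXPLAIN QUERY PLAN stands in for Redshift's EXPLAIN
# (assumption); table and index names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, ts TEXT)")
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = 42"
).fetchall()
# The plan should report a search using idx_events_user, not a full scan.
print(plan)
```

The tuning loop is the same on any engine: read the plan, find the full scans or redistributions, then adjust keys, indexes, or the query itself and re-check.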
Should be flexible to overlap US/India business hours.
Fluent in complex, distributed and massively parallel systems.
The ability to build data pipelines across multiple systems is a key requirement.