Create, maintain, and optimize ETL/ELT pipelines to ingest, process, and manage data from various sources using Python, Apache Spark, and AWS services.
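For illustration only, a minimal PySpark sketch of one such pipeline step; the S3 paths, bucket names, and column names are placeholder assumptions, not an actual pipeline:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Ingest raw data from an S3 landing zone (bucket/prefix are placeholders).
raw = spark.read.json("s3a://example-landing/orders/2024/")

# Basic cleansing and enrichment before loading into the curated zone.
curated = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("ingest_date", F.current_date())
)

# Write partitioned Parquet for downstream analytics consumers.
curated.write.mode("overwrite").partitionBy("ingest_date").parquet(
    "s3a://example-curated/orders/"
)
```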
Design data models, build data structures, and implement data storage solutions that ensure data integrity, consistency, and security.
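A hedged sketch of what "data integrity" can look like in practice here: enforcing an explicit schema at read time so malformed records fail fast instead of silently corrupting downstream tables. Field names, types, and paths are assumptions for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import (
    StructType, StructField, StringType, DecimalType, TimestampType
)

spark = SparkSession.builder.appName("schema_enforcement").getOrCreate()

# Explicit schema: required keys are non-nullable, monetary values use fixed precision.
orders_schema = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("customer_id", StringType(), nullable=False),
    StructField("amount", DecimalType(12, 2), nullable=True),
    StructField("order_ts", TimestampType(), nullable=True),
])

# FAILFAST surfaces malformed rows at ingest rather than at query time.
orders = (
    spark.read
         .schema(orders_schema)
         .option("mode", "FAILFAST")
         .json("s3a://example-landing/orders/")
)
```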
Tune data processing workflows for performance, scalability, and cost efficiency on distributed systems using Spark and AWS.
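As a rough sketch of typical tuning levers (adaptive execution, shuffle partition sizing, broadcast joins, output file consolidation), assuming a hypothetical fact/dimension join; the configuration values shown are illustrative, not recommendations:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("tuning_example")
    # Adaptive query execution lets Spark coalesce shuffle partitions at runtime.
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.shuffle.partitions", "200")
    .getOrCreate()
)

facts = spark.read.parquet("s3a://example-curated/orders/")
dims = spark.read.parquet("s3a://example-curated/customers/")

# Broadcasting the small dimension table avoids an expensive shuffle join.
joined = facts.join(F.broadcast(dims), on="customer_id", how="left")

# Fewer, larger output files reduce S3 request overhead and cost.
joined.coalesce(64).write.mode("overwrite").parquet(
    "s3a://example-curated/orders_enriched/"
)
```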
Work with cross-functional teams (e.g., data science, product, analytics) to understand data requirements and support business needs. Document data workflows, processes, and solutions for transparency and reproducibility.
Implement data quality checks, error handling, and recovery processes. Ensure compliance with data governance and security protocols.
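One possible shape of such a quality gate, sketched in PySpark: valid rows continue, failing rows are quarantined for review, and the run aborts if the rejection rate exceeds a threshold. The rules, threshold, and paths are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq_checks").getOrCreate()
orders = spark.read.parquet("s3a://example-curated/orders/")

# Rule set: required key present and amounts non-negative.
is_valid = F.col("order_id").isNotNull() & (F.col("amount") >= 0)

valid = orders.filter(is_valid)
rejected = orders.filter(~is_valid)

# Fail the run if the rejection rate exceeds an agreed threshold (1% here, as an example).
reject_ratio = rejected.count() / max(orders.count(), 1)
if reject_ratio > 0.01:
    raise ValueError(f"Data quality check failed: {reject_ratio:.2%} of rows rejected")

valid.write.mode("overwrite").parquet("s3a://example-curated/orders_validated/")
rejected.write.mode("append").parquet("s3a://example-quarantine/orders_rejected/")
```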