- Advanced hands-on experience designing AWS data lake solutions.
- Experience integrating Redshift with other AWS services, such as DMS, Glue, Lambda, S3, Athena, and Airflow.
- Proficiency in Python programming with a focus on developing efficient Airflow DAGs and operators.
- Experience with PySpark and AWS Glue ETL scripting, including Glue transforms such as relationalize, performing joins, and transforming DataFrames in PySpark.
- Competency in developing CloudFormation templates to deploy AWS infrastructure, including YAML-defined IAM policies and roles.
- Experience with Airflow DAG creation.
- Familiarity with debugging serverless applications using AWS tooling such as CloudWatch Logs and Logs Insights, CloudTrail, and IAM.
- Ability to work in a large, complex object-oriented Python platform.
- Strong understanding of ETL best practices, data integration, data modeling, and data transformation.
- Proficiency in identifying and resolving performance bottlenecks and fine-tuning Redshift queries.
- Familiarity with version control systems, particularly Git, for maintaining a structured code repository.
- Strong coding and problem-solving skills, and attention to detail in data quality and accuracy.
- Ability to work collaboratively in a fast-paced, agile environment and effectively communicate technical concepts to non-technical stakeholders.
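As a sketch of the YAML-defined IAM work mentioned above, a CloudFormation template might declare a role a Glue job can assume; the logical ID, policy name, and bucket ARN below are illustrative placeholders, not values from this posting.

```yaml
Resources:
  GlueJobRole:                       # illustrative logical ID
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: glue.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: ReadRawBucket  # illustrative inline policy
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - s3:GetObject
                  - s3:ListBucket
                Resource:            # example bucket ARNs
                  - arn:aws:s3:::example-raw-bucket
                  - arn:aws:s3:::example-raw-bucket/*
```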
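To illustrate the Glue/PySpark expectation above: Glue's relationalize transform flattens nested records into flat relational columns. The snippet below is a minimal pure-Python sketch of that flattening idea only; the `flatten` helper is hypothetical and is not the Glue API.

```python
def flatten(record, prefix=""):
    # Flatten a nested dict into dotted column names -- the same
    # idea Glue's relationalize applies to nested DynamicFrames.
    # (Plain-Python sketch for illustration, not the Glue API.)
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

row = {"id": 1, "address": {"city": "Boston", "zip": "02118"}}
print(flatten(row))
# {'id': 1, 'address.city': 'Boston', 'address.zip': '02118'}
```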
Additional Useful Experience:
- Docker
- Airflow Server Administration
- Parquet file formats
- AWS Security
- Jupyter Notebooks
- API best practices: API Gateway, route structuring, and standard API authentication protocols, including token-based authentication
- Git, Git flow best practices
- Release management and DevOps
- Shell scripting
- AWS certifications related to data engineering or databases are a plus.
- Experience with DevOps technologies and processes.
- Experience with complex ETL scenarios, such as change data capture (CDC) and slowly changing dimension (SCD) logic, and integrating data from multiple source systems.
- Experience converting Oracle scripts and stored procedures to Redshift equivalents.
- Experience working with large-scale, high-volume data environments.
- Exposure to higher education, finance, and/or human resources data is a plus.
- Proficiency in SQL programming and Redshift stored procedures for efficient data manipulation and transformation.
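To illustrate the SCD scenarios noted above, Type 2 logic closes out the current dimension row when attributes change and opens a new versioned row. Below is a minimal stdlib-Python sketch of the pattern on an in-memory table (the `scd2_upsert` helper and its row shape are assumptions for illustration, not a Redshift implementation).

```python
from datetime import date

def scd2_upsert(dim_rows, incoming, today=None):
    # Apply SCD Type 2 logic to an in-memory dimension table.
    # dim_rows: dicts with key, attrs, valid_from, valid_to, is_current.
    # incoming: {business_key: attrs} from the source system.
    today = today or date.today()
    for key, attrs in incoming.items():
        current = next(
            (r for r in dim_rows if r["key"] == key and r["is_current"]),
            None,
        )
        if current and current["attrs"] == attrs:
            continue  # no change: leave the current row open
        if current:
            # change detected: close out the old version
            current["valid_to"] = today
            current["is_current"] = False
        # insert the new (or first) version as the open row
        dim_rows.append({
            "key": key, "attrs": attrs,
            "valid_from": today, "valid_to": None, "is_current": True,
        })
    return dim_rows
```

A changed attribute yields two rows for the same key: the old version with `valid_to` set and `is_current` false, and a new open row; in Redshift this same pattern is typically expressed as an UPDATE plus INSERT inside a transaction.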