Design, build, and operationalize large-scale enterprise data solutions and applications using AWS data and analytics services (EMR, Redshift, Lambda, Glue) in combination with third-party technologies such as Apache Spark.
- Design and build production data pipelines, from ingestion to consumption, within a big data architecture using Java or Python (a minimal sketch follows this list).
- Design and implement data engineering, ingestion, and curation functions on the AWS cloud using AWS-native services or custom programming.
- Work experience with ETL, Data Modeling, and Data Architecture.
- Experience with Big Data technologies such as Hadoop/Hive/Spark.
- Skilled in writing and optimizing SQL.
- Experience operating very large data warehouses or data lakes.
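As an illustration of the ingestion-to-consumption work described above, here is a minimal PySpark sketch of a pipeline that reads raw data, curates it, and exposes it to SQL. The bucket paths, view name, and column names (event_id, event_ts) are hypothetical placeholders, not references to any actual system.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical source and target locations, for illustration only.
SOURCE_PATH = "s3://example-bucket/raw/events/"
TARGET_PATH = "s3://example-bucket/curated/events/"

spark = SparkSession.builder.appName("ingestion-demo").getOrCreate()

# Ingest: read raw JSON events from the (hypothetical) source location.
raw = spark.read.json(SOURCE_PATH)

# Curate: drop malformed rows, normalize the timestamp, derive a partition column.
curated = (
    raw.dropna(subset=["event_id", "event_ts"])
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
)

# Consume: register a SQL view and run a simple summary query over it.
curated.createOrReplaceTempView("events")
daily_counts = spark.sql("""
    SELECT event_date, COUNT(*) AS event_count
    FROM events
    GROUP BY event_date
    ORDER BY event_date
""")

# Persist the curated layer, partitioned by date for efficient downstream scans.
curated.write.mode("overwrite").partitionBy("event_date").parquet(TARGET_PATH)
daily_counts.show()
```

Partitioning the curated output by date is one common choice for keeping scans cheap as the data lake grows; the right partition key depends on the dominant query patterns.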
Required/Preferred
- Minimum of 5 years of experience with data warehousing solutions.
- Experience designing and implementing highly performant data ingestion pipelines from multiple sources using Apache Spark and/or Azure Databricks.
- Demonstrated efficiency in handling data: tracking data lineage, ensuring data quality, and improving data discoverability.
- Experience integrating end-to-end data pipelines that take data from source systems to target data repositories, ensuring data quality and consistency are always maintained.
- Knowledge of engineering and operational excellence practices using standard methodologies.
- Comfortable using PySpark APIs to perform advanced data transformations (see the sketch after this list).
- Familiarity with implementing classes in Python.
- Experience operating in an Agile/Scrum environment.
- Strong written and verbal communication skills.
- Bachelor’s degree in Computer Science or a related field.
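To illustrate the PySpark transformation and Python class skills listed above, here is a minimal sketch of a reusable transformation implemented as a class. The class name (SessionRanker), column names, and sample data are hypothetical, chosen only to show a window-function transformation behind a small object-oriented interface.

```python
from pyspark.sql import DataFrame, SparkSession, functions as F
from pyspark.sql.window import Window


class SessionRanker:
    """Hypothetical transformation: rank each user's events by recency."""

    def __init__(self, user_col: str = "user_id", ts_col: str = "event_ts"):
        self.user_col = user_col
        self.ts_col = ts_col

    def transform(self, df: DataFrame) -> DataFrame:
        # Window partitioned per user, with the newest events first.
        w = Window.partitionBy(self.user_col).orderBy(F.col(self.ts_col).desc())
        # row_number() assigns each event a per-user recency rank.
        return df.withColumn("recency_rank", F.row_number().over(w))


if __name__ == "__main__":
    spark = SparkSession.builder.appName("transform-demo").getOrCreate()
    df = spark.createDataFrame(
        [("u1", "2024-01-01"), ("u1", "2024-01-03"), ("u2", "2024-01-02")],
        ["user_id", "event_ts"],
    )
    SessionRanker().transform(df).show()
```

Wrapping the window logic in a class keeps the transformation testable and composable with other pipeline stages, which is one way teams apply the engineering practices this role calls for.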