Design, build, and operationalize large-scale enterprise data solutions and applications using AWS data and analytics services in combination with third-party tools: Spark, EMR, Kinesis, Kafka, and API-based data sources. Hands-on experience with Python and PySpark/Glue is a must (see the PySpark sketch after this list).
Experience with Airflow and with Lake Formation tag-based access control, including LF-tag and role creation, which is key for the project (a tagging and grant sketch appears below).
Experience with big data technologies and Hive.
Analyze, re-architect, and re-platform on-premises data warehouses to data platforms on the AWS cloud using AWS or third-party services.
Design and build production data pipelines from ingestion to consumption within a big data architecture, using Java, Python, or Scala.
Design and implement data engineering, ingestion, and curation functions on the AWS cloud using AWS-native services or custom programming.
Perform detailed assessments of current-state data platforms and create an appropriate transition path to the AWS cloud.
Experience with the Amazon Redshift database; Snowflake knowledge preferable (a Redshift load sketch appears below).
Experience handling projects across the AWS data landscape: EMR, Hadoop, Hive, Kinesis, Kafka, Spark, and API-based data sources.
Experience with Splunk or the ELK stack (Elasticsearch, Logstash, and Kibana) preferable.
Hands-on experience administering Jenkins, or proficiency with other CI/CD tools.
Deployment orchestration and containerization (Docker or Kubernetes) preferable.
Experience building data pipelines and applications that stream and process datasets at low latency (see the streaming sketch below).
Efficiency in handling data: tracking data lineage, ensuring data quality, and improving data discoverability (a quality-check sketch appears below).
Sound knowledge of distributed systems and data architecture (e.g., the lambda architecture): design and implement batch and stream data processing pipelines, and optimize the distribution, partitioning, and MPP processing of high-level data structures (a partitioning sketch appears below).
Knowledge of engineering and operational excellence using standard methodologies.
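For illustration, a minimal sketch of the hands-on PySpark/Glue work described above; the bucket paths and column names are hypothetical, and the job assumes a Spark runtime such as EMR or Glue:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Spark session; on EMR/Glue this is typically provided by the runtime.
spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Read raw JSON landed from an API or Kinesis feed (hypothetical path).
raw = spark.read.json("s3://example-raw-bucket/orders/")

# Light curation: type the timestamp, drop malformed rows, derive a date column.
curated = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .dropna(subset=["order_id", "order_ts"])
       .withColumn("order_date", F.to_date("order_ts"))
)

# Write partitioned Parquet for downstream consumption.
curated.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-curated-bucket/orders/"
)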
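A sketch of Lake Formation tag-based access setup with boto3, as referenced above; the tag key, database name, account ID, and role ARN are all hypothetical:

import boto3

# Lake Formation client; error handling and idempotency checks omitted.
lf = boto3.client("lakeformation")

# Create an LF-tag used to classify data.
lf.create_lf_tag(TagKey="domain", TagValues=["sales", "finance"])

# Attach the tag to a Glue database so its tables inherit it.
lf.add_lf_tags_to_resource(
    Resource={"Database": {"Name": "curated_db"}},
    LFTags=[{"TagKey": "domain", "TagValues": ["sales"]}],
)

# Grant an IAM role SELECT on every table tagged domain=sales.
lf.grant_permissions(
    Principal={"DataLakePrincipalArn": "arn:aws:iam::123456789012:role/sales-analyst"},
    Resource={
        "LFTagPolicy": {
            "ResourceType": "TABLE",
            "Expression": [{"TagKey": "domain", "TagValues": ["sales"]}],
        }
    },
    Permissions=["SELECT"],
)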
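A sketch of loading curated data into Redshift through the Redshift Data API; the cluster, database, secret, table, and IAM role identifiers are hypothetical:

import boto3

# Redshift Data API client.
rsd = boto3.client("redshift-data")

# Bulk-load curated Parquet from S3 into a Redshift table via COPY.
resp = rsd.execute_statement(
    ClusterIdentifier="example-cluster",
    Database="analytics",
    SecretArn="arn:aws:secretsmanager:us-east-1:123456789012:secret:redshift-creds",
    Sql="""
        COPY analytics.orders
        FROM 's3://example-curated-bucket/orders/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
        FORMAT AS PARQUET;
    """,
)
print(resp["Id"])  # statement id, pollable via describe_statement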
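A sketch of a low-latency streaming pipeline using Spark Structured Streaming over Kafka; the broker, topic, and S3 paths are hypothetical, and the spark-sql-kafka connector must be on the classpath:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-stream").getOrCreate()

# Subscribe to a Kafka topic and decode message payloads as strings.
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "orders")
         .load()
         .select(F.col("value").cast("string").alias("payload"))
)

# Micro-batch sink with checkpointing for exactly-once file output.
query = (
    events.writeStream.format("parquet")
          .option("path", "s3://example-stream-bucket/orders/")
          .option("checkpointLocation", "s3://example-stream-bucket/_checkpoints/orders/")
          .trigger(processingTime="30 seconds")
          .start()
)
query.awaitTermination()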
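A sketch of the kind of data quality gate mentioned above; the dataset and key column are hypothetical:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
df = spark.read.parquet("s3://example-curated-bucket/orders/")

# Basic checks: no null keys, no duplicate keys.
total = df.count()
null_keys = df.filter(F.col("order_id").isNull()).count()
dupes = total - df.dropDuplicates(["order_id"]).count()

# Fail fast so bad data never reaches consumers.
assert null_keys == 0, f"{null_keys} rows missing order_id"
assert dupes == 0, f"{dupes} duplicate order_id values"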
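Finally, a sketch of partition tuning for distributed (MPP-style) processing; the column names, partition count, and paths are hypothetical and would be tuned to actual data volumes:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-tuning").getOrCreate()

orders = spark.read.parquet("s3://example-curated-bucket/orders/")

# Repartition on the grouping key so the shuffle distributes work
# evenly across executors before a wide aggregation.
by_customer = orders.repartition(200, "customer_id")

daily = by_customer.groupBy("customer_id", "order_date").count()

# Persist with a partition layout matching the dominant query filter,
# so downstream engines prune partitions instead of scanning everything.
daily.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-marts-bucket/daily_order_counts/"
)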