At least 4 years of experience in designing and developing production data pipelines for data ingestion or transformation using tools from the Hadoop stack (HDFS, Hive, Spark (PySpark/Scala), HBase, Kafka, NiFi, Oozie, Apache Beam, Apache Airflow, etc.).
At least 4 years of experience with the following Big Data platforms: Cloudera / Hortonworks / Snowflake / AWS EMR / Redshift / AWS Glue.
At least 4 years of experience developing applications with monitoring, build tools, version control, unit testing, TDD, and change management to support DevOps
At least 2 years of experience with SQL and shell scripting
Experience troubleshooting JVM-related issues.
Familiarity with implementing machine learning using Spark ML / TensorFlow.
Experience with data visualization tools such as Cognos, Arcadia, and Tableau.
Experience with data warehouse modeling techniques.
Preferred:
Familiarity with Java development on platforms like PCF or Kubernetes will be a plus
2+ years of experience with Amazon Web Services (AWS), Google Cloud, or another public cloud service
2+ years of experience with stream processing using Spark, Flink, or Kafka
2+ years of experience working with dimensional data models and the pipelines that support them
Intermediate-level experience/knowledge of at least one scripting language (Python, Perl, JavaScript)
Hands-on experience designing data pipelines, including joining structured and unstructured data
Familiarity with SAS programming will be a plus
Experience implementing open-source frameworks and exposure to various open-source and packaged software architectures (AngularJS, ReactJS, Node, Elasticsearch, Spark, Scala, Splunk, Apigee, Jenkins, etc.).
Experience with various NoSQL databases (Hive, MongoDB, Couchbase, Cassandra, and Neo4j) will be a plus
Experience with Ab Initio technologies including, but not limited to, Ab Initio graph development, EME, Co-Op, BRE, and Continuous Flows