Apache Beam / Apache Flume / Kafka / Azure Databricks, Python, Integration, Data Engineer, Azure
- 8+ years of proven experience developing and deploying data pipelines, preferably in the cloud; Azure and Snowflake experience is a plus.
- 2+ years of proven expertise building pipelines for real-time and near-real-time integration.
- Proven experience with Spark SQL, Spark Streaming, and the core Spark API for building data pipelines.
- 2+ years of experience with at least one of the following technologies: Apache Beam, Apache Flume, Kafka, or Databricks.
- Experience with Azure.
- Hands-on experience productionizing and deploying Big Data platforms and applications. Hands-on experience working with relational/SQL databases, distributed columnar data stores and NoSQL databases (MongoDB or Cassandra), graph databases, time-series databases, HDFS, HBase, MapReduce, NiFi, Spark Streaming, Kafka, Sqoop, Hive, Oozie, Avro, and more.
- Knowledge of Databricks and Delta tables is a plus.
- Extensive experience with data transformations for retail business use cases is a plus.
- Knowledge of exception handling, automated reprocessing, and data reconciliation.
- A passion for data quality and the ability to build data-quality checks into deliverables.
- Prior use of Big Data components and the ability to evaluate how well each one fits a given business case.
- Experience working with varied data sources: flat files, XML, JSON, Avro files, and databases.
- Proficiency in slowly changing dimension (SCD) techniques.
- Knowledge of Jenkins for continuous integration and end-to-end automation of application builds and deployments.