Implements and automates deployment of our distributed system for ingesting and transforming data from various types of sources (relational, event-based, unstructured).
Designs and implements Spark Structured Streaming and API workflows (a minimal sketch follows this list).
Implements methods to continuously monitor and troubleshoot data quality and data integrity issues.
Implements data governance processes and methods for managing metadata, data access, and data retention for internal and external users.
Develops reliable, efficient, scalable, high-quality data pipelines, with monitoring and alerting mechanisms, that combine a variety of sources using ETL/ELT tools or scripting languages.
Develops physical data models and implements data storage architectures as per design guidelines.
Analyzes complex data elements and systems, data flows, dependencies, and relationships in order to contribute to conceptual, logical, and physical data models.
Participates in testing and troubleshooting of data pipelines.
Develops and operates large-scale data storage and processing solutions using distributed and cloud-based platforms (e.g., data lakes, Hadoop, HBase, Cassandra, MongoDB, Accumulo, DynamoDB, and others).
Uses agile development practices, such as DevOps, Scrum, Kanban, and continuous-improvement cycles, for data-driven applications; attends daily stand-ups.
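By way of illustration only, the sketch below shows one shape such a streaming ingestion pipeline might take in Scala, using the Spark Structured Streaming and Kafka stack listed under the experience requirements. The broker address, topic name, JSON field, and data-lake paths are hypothetical placeholders, and the Kafka connector (spark-sql-kafka) is assumed to be on the classpath; this is a sketch under those assumptions, not a reference implementation.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object IngestEvents {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("event-ingest")
      .getOrCreate()
    import spark.implicits._

    // Read a stream of events from Kafka. Broker address and topic
    // name ("events") are hypothetical placeholders.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()

    // Kafka delivers key/value as binary; cast the value to a string.
    val events = raw.selectExpr("CAST(value AS STRING) AS json", "timestamp")

    // Basic data-quality gate: keep only records whose JSON payload
    // carries the expected "id" field (schema is an assumption here).
    val valid = events
      .withColumn("id", get_json_object($"json", "$.id"))
      .filter($"id".isNotNull)

    // Land valid records in a data-lake path (hypothetical location),
    // with checkpointing so the file sink restarts cleanly.
    val query = valid.writeStream
      .format("parquet")
      .option("path", "/datalake/events/clean")
      .option("checkpointLocation", "/datalake/events/_checkpoints")
      .start()

    query.awaitTermination()
  }
}
```

In practice the rejected records would be routed to a quarantine path and surfaced through the pipeline's monitoring and alerting, rather than silently dropped.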
Hands-on experience in Spark Structured Streaming and API workflows.
Spark, Scala/Java, MapReduce, Hive, HBase, Kafka, and Microsoft Azure Databricks.
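Likewise, here is a minimal batch sketch of the kind of data-quality and integrity monitoring described above, using plain Spark aggregations over a Hive-style table. The table name, key column, and 1% null-rate threshold are assumptions, and the alert is stubbed as a log line where a real pipeline would notify a monitoring system.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object QualityCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("quality-check")
      .enableHiveSupport()
      .getOrCreate()

    // Table name ("warehouse.orders") and key column ("order_id")
    // are hypothetical placeholders.
    val df = spark.table("warehouse.orders")

    val total         = df.count()
    val nullKeys      = df.filter(col("order_id").isNull).count()
    val duplicateKeys = total - df.dropDuplicates("order_id").count()

    // Alert is stubbed as a log line; the 1% threshold is an assumption.
    if (total > 0 && (nullKeys.toDouble / total > 0.01 || duplicateKeys > 0))
      println(s"ALERT: orders quality check failed (nulls=$nullKeys, dups=$duplicateKeys)")
    else
      println(s"orders quality check passed ($total rows)")
  }
}
```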