Develop, customize, and manage integration tools, databases, warehouses, and analytical systems using data-related instruments and instances.
- Create and run automation scripts for operational data processing, build out Python ETL processes, and write complex SQL queries (a minimal sketch follows this list).
- Test the reliability and performance of each part of the system and collaborate with the testing team.
- Deploy data models into production environments: feed each model with data stored in a warehouse or arriving directly from sources, configure data attributes, manage computing resources, and set up monitoring tools.
- Set up tools to view data, generate reports, and create visuals.
- Monitor the overall performance and stability of the system; adjust and adapt the automated pipeline as data, models, and requirements change.
- Apply an excellent understanding of the ETL cycle: analyze and organize raw data, and build data systems and pipelines.
- Combine raw data from different sources, explore ways to enhance data quality and reliability, and interpret trends and patterns in the data.
- Experience using Python, PySpark, and/or Scala for data engineering (see the PySpark sketch after this list).
- Understanding of data types and of handling different data models.
- Good knowledge of the SDLC phases (requirement analysis, design, development, and testing) gained on development and enhancement projects.
- Experience with Spark, Flink, Kafka, Flask, Scala, and PySpark for data engineering is desirable.
- Experience with Microsoft Azure or AWS data management tools, such as Azure Data Factory, Azure Data Lake, and Databricks, or Snowflake on AWS, is a plus.
- Experience with data visualization tools (Power BI, Tableau) is a plus.
- Understanding of descriptive and exploratory statistics, predictive modelling, evaluation metrics, decision trees, and machine learning algorithms is a plus.
- Good scripting and programming skills.
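Below is a minimal sketch of the kind of Python ETL process and SQL query this role involves. It assumes a SQLite warehouse containing a `raw_orders` table; the database path, table, and column names are hypothetical placeholders chosen only for illustration.

```python
# Minimal ETL sketch: extract rows with SQL, transform with pandas, load back.
# The table, column names, and SQLite database are hypothetical placeholders.
import sqlite3
import pandas as pd

def run_etl(db_path: str = "warehouse.db") -> None:
    conn = sqlite3.connect(db_path)
    try:
        # Extract: pull raw order records with a plain SQL query.
        raw = pd.read_sql_query(
            "SELECT order_id, order_date, amount "
            "FROM raw_orders WHERE amount IS NOT NULL",
            conn,
        )

        # Transform: normalise dates and aggregate revenue per day.
        raw["order_date"] = pd.to_datetime(raw["order_date"]).dt.date
        daily = raw.groupby("order_date", as_index=False)["amount"].sum()
        daily = daily.rename(columns={"amount": "daily_revenue"})

        # Load: write the aggregate back to a reporting table.
        daily.to_sql("daily_revenue", conn, if_exists="replace", index=False)
    finally:
        conn.close()

if __name__ == "__main__":
    run_etl()
```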
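For the Spark/PySpark items in the skills list, here is a hedged sketch of a simple PySpark transformation; the input path, schema, and output location are assumptions for illustration, not details from this posting.

```python
# PySpark sketch: read raw CSV events, clean them, and aggregate per user per day.
# The file path and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: load raw events from CSV (schema inferred for brevity).
events = spark.read.csv("data/events.csv", header=True, inferSchema=True)

# Transform: drop rows without a user id and count events per user per day.
daily_counts = (
    events
    .filter(F.col("user_id").isNotNull())
    .withColumn("event_date", F.to_date("event_time"))
    .groupBy("user_id", "event_date")
    .agg(F.count("*").alias("event_count"))
)

# Load: write the result as partitioned Parquet for downstream reporting.
daily_counts.write.mode("overwrite").partitionBy("event_date").parquet("output/daily_counts")

spark.stop()
```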