The candidate should have 15+ years of experience in Data Engineering.
Designing, creating, testing, and maintaining complete data management and processing systems.
Working closely with stakeholders and the solution architect.
Building highly scalable, robust, and fault-tolerant systems.
Knowledge of the Hadoop ecosystem and the frameworks within it: HDFS, YARN, MapReduce, Apache Pig, Hive, Flume, Sqoop, ZooKeeper, Oozie, Impala, and Kafka.
Must have knowledge of and working experience with real-time processing frameworks (Apache Spark, PySpark) as well as AWS Redshift, Apache Airflow, and EMR.
Must have experience with SQL-based technologies (e.g., MySQL, Oracle DB) and NoSQL technologies (e.g., Cassandra, MongoDB).
Should have programming skills in Python, Scala, or Java.
Discovering data acquisition opportunities.
Finding methods to extract value from existing data.
Improving the data quality, reliability, and efficiency of individual components and of the complete system.
A problem-solving mindset and experience working in an agile environment.