Experience in database programming with SQL and Python
Understand data, analytical, and functional requirements and translate them into technical requirements
Build, maintain, and deploy scalable data pipelines to support large-scale data management projects
Ensure alignment with the data strategy and data-processing standards
Experience with an orchestration/workflow tool such as Airflow/Oozie for scheduling pipelines
Exposure to the latest cloud ETL tools such as Glue/ADF/Dataflow
Hands-on experience using Spark Streaming, Kafka, and HBase
BE/BS/MTech/MS in Computer Science or equivalent work experience
Additional Responsibilities:
Expertise in data structures and in manipulating and analyzing complex, high-volume data from a variety of internal and external sources
Experience in building structured and unstructured data pipelines
Proficient in a programming language such as Python/Scala
Good understanding of data analysis techniques
Good understanding of relational/dimensional modelling and ETL concepts
Understanding of any reporting tool such as Looker, Tableau, QlikView, or Power BI
Good to have:
Experience in the Big Data ecosystem - on-prem (Hortonworks/MapR) or cloud (Dataproc/EMR/HDInsight)
Experience in Hadoop, Pig, SQL, Hive, Sqoop, and Spark SQL
Understand and work with in-memory distributed computing frameworks such as Spark (and/or Databricks), including parameter tuning and writing optimized Spark queries
Preferred qualifications:
2-4 years of experience
Category: Bachelor's Degree, Master's Degree
Field specialization: Computer Science / IT
Degree: Bachelor of Engineering (BE), Bachelor of Science (BS), Master of Engineering (MEng), Master of Science (MS)