Work Authorization: US Citizen, Green Card, EAD (OPT/CPT/GC/H4)
Tax Terms: Corp-to-Corp (C2C), Consulting/Contract
UG: Not Required
PG: Not Required
No. of Positions: 1
Posted: 6th Jan 2021
· Design and implement distributed data processing pipelines using Spark, Hive, Sqoop, Python, and other tools and languages prevalent in the Hadoop ecosystem.
· Code and architect end-to-end applications on a modern data processing technology stack (e.g., Hadoop, cloud, and Spark ecosystem technologies)
· Build continuous integration/continuous delivery (CI/CD), test-driven development, and production deployment frameworks
· Build utilities, user-defined functions, and frameworks to better enable data flow patterns.
· Lead conversations with infrastructure teams (on-prem & cloud) on analytics application requirements (e.g., configuration, access, tools, services, compute capacity, etc.)
· Familiarity with building data pipelines and with data modeling, architecture, and governance concepts
· Experience implementing ML models and building highly scalable, highly available systems
· Experience operating in distributed environments, including cloud (Azure, GCP, AWS, etc.)
· Experience building, launching and maintaining complex analytics pipelines in production
· Platforms: Hadoop, Spark, Kafka, Kinesis, Oracle, Teradata
· Languages: Python, PySpark, Hive, Shell Scripting, SQL, Pig, Java / Scala
· Proficient in MapReduce, Conda, H2O, Spark, Airflow / Oozie / Jenkins, HBase, Pig, NoSQL, Chef / Puppet, Git
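As a toy illustration of the MapReduce pattern named in the stack above (a hypothetical sketch in plain Python, with no Hadoop cluster assumed), a word count can be split into a mapper that emits key/value pairs and a reducer that aggregates them by key:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    # Emit (word, 1) pairs, as a Hadoop mapper would.
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    # Sort by key, group, and sum counts, as a Hadoop reducer would.
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

counts = dict(reduce_phase(map_phase(["spark hive sqoop", "spark python"])))
print(counts)  # {'hive': 1, 'python': 1, 'spark': 2, 'sqoop': 1}
```

In a real Spark or Hadoop job the map and reduce phases run in parallel across partitions, but the mapper/reducer contract shown here is the same.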