Required Skills: PySpark, Big Data Management, AWS EMR, Python
Job summary
- Design and build end-to-end distributed data processing pipelines on the Hadoop ecosystem and AWS EMR using PySpark, Hive, and Python, and publish RESTful APIs for real-time consumption of the resulting data; full responsibilities follow below.
Roles & Responsibilities
- Design and implement distributed data processing pipelines using Spark, Hive, Python, and other tools and languages prevalent in the Hadoop ecosystem (a minimal PySpark sketch follows this list).
- Ability to design and implement end-to-end solutions.
- Experience publishing RESTful APIs that enable real-time data consumption, documented with OpenAPI specifications (see the API sketch below).
- Experience with NoSQL technologies such as HBase and Cassandra, or managed services such as DynamoDB (see the DynamoDB sketch below).
- Familiarity with distributed stream-processing frameworks for fast and big data, such as Apache Spark (Structured Streaming), Apache Flink, and Kafka Streams (see the streaming sketch below).
- Build utilities, user-defined functions (UDFs), and frameworks to better enable data-flow patterns (see the UDF sketch below).
- Work with architecture/engineering leads and other teams to ensure quality solutions are implemented and that engineering best practices are defined and adhered to.
- Experience with business rule management systems such as Drools.
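
The sketches below are illustrative only: table names, paths, endpoints, and framework choices are assumptions for the examples, not requirements of this posting. First, a minimal PySpark batch pipeline that reads a hypothetical Hive table and writes partitioned Parquet to S3:

```python
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("daily-events-pipeline")
    .enableHiveSupport()  # read tables registered in the Hive metastore
    .getOrCreate()
)

# Read raw events (hypothetical table), aggregate per user per day,
# and write the result back as date-partitioned Parquet.
events = spark.table("raw.events")

daily_counts = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("user_id", "event_date")
    .agg(F.count("*").alias("event_count"))
)

(
    daily_counts.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-bucket/curated/daily_counts/")  # hypothetical path
)
```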
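
Next, a sketch of publishing an endpoint whose OpenAPI document is generated automatically; FastAPI is an assumed framework choice (the posting does not prescribe one), and the in-memory store is a stand-in for the curated data layer:

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="Events API", version="1.0.0")

class DailyCount(BaseModel):
    user_id: str
    event_date: str
    event_count: int

# Stand-in for a real query against the curated store.
STORE = {("u1", "2024-01-01"): 42}

@app.get("/users/{user_id}/counts/{event_date}", response_model=DailyCount)
def get_daily_count(user_id: str, event_date: str) -> DailyCount:
    count = STORE.get((user_id, event_date))
    if count is None:
        raise HTTPException(status_code=404, detail="not found")
    return DailyCount(user_id=user_id, event_date=event_date, event_count=count)

# FastAPI serves the OpenAPI specification at /openapi.json
# and interactive docs at /docs.
```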
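
For NoSQL access, a sketch of key-based reads and writes against DynamoDB using boto3 (HBase and Cassandra have analogous Python clients such as happybase and cassandra-driver); the table name, region, and key schema are hypothetical:

```python
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("daily_counts")  # hypothetical table

# Write one item, then read it back by its composite key.
table.put_item(
    Item={"user_id": "u1", "event_date": "2024-01-01", "event_count": 42}
)
resp = table.get_item(Key={"user_id": "u1", "event_date": "2024-01-01"})
print(resp.get("Item"))
```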
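
For stream processing, a sketch of a Spark Structured Streaming job consuming from Kafka; the broker, topic, and checkpoint path are placeholders, and Flink or Kafka Streams jobs follow the same read-transform-write shape:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-stream").getOrCreate()

# Requires the spark-sql-kafka connector package at submit time.
stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "events")                     # placeholder topic
    .load()
)

# Kafka delivers bytes; cast the key to a string and count events per key.
counts = (
    stream
    .select(F.col("key").cast("string").alias("key"))
    .groupBy("key")
    .count()
)

query = (
    counts.writeStream
    .outputMode("complete")  # emit the full aggregate table each trigger
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/events-stream")
    .start()
)
query.awaitTermination()
```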
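
Finally, a sketch of a reusable user-defined function; a pandas UDF (which requires pyarrow) is assumed here because it typically outperforms a plain Python UDF:

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.appName("udf-demo").getOrCreate()

@pandas_udf("string")
def normalize_email(emails: pd.Series) -> pd.Series:
    # Lower-case and trim so joins on email are consistent.
    return emails.str.strip().str.lower()

df = spark.createDataFrame([(" Alice@Example.COM ",)], ["email"])
df.select(normalize_email("email").alias("email")).show()
```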