This position is responsible for hands-on design and development in Spark and Python (PySpark), along with other Hadoop ecosystem components such as HDFS, Hive, Hue, Impala, and Zeppelin
The purpose of the position includes analysis, design, and implementation of business requirements using PySpark (Python)
Cloudera Hadoop development around Big Data
Solid SQL experience
Development experience with PySpark and Spark SQL, with strong analytical and debugging skills
Development work building new solutions around Hadoop and automating operational tasks
Assisting the team and troubleshooting issues
Duties and Responsibilities
Design and development around PySpark, Python, and the Hadoop framework
Experience with RDDs and DataFrames within Spark
Experience with data analytics and working knowledge of big data infrastructure, including Hadoop ecosystem components such as HDFS, Hive, and Spark
Work with gigabytes to terabytes of data; must understand the challenges of transforming and enriching such large datasets
Provide effective solutions to address business problems, both strategic and tactical
Collaboration with team members, project managers, business analysts, and business users in conceptualizing, estimating, and developing new solutions and enhancements
Work closely with the stakeholders to define and refine the big data platform to achieve company product and business objectives
Collaborate with other technology teams and architects to define and develop cross-functional technology stack interactions
Read, extract, transform, stage, and load data to multiple targets, including Hadoop and Oracle
Develop automation scripts around the Hadoop framework to automate processes and existing workflows
Modify existing programs/code for new requirements
Unit testing and debugging
Perform root cause analysis (RCA) for any failed processes
Document existing processes as well as analyze for potential automation and performance improvements
Convert business requirements into technical design specifications and execute on them
Execute new development as per design specifications and business rules/requirements
Job Description epsilon.com
Participate in code reviews and keep applications/code base in sync with version control
Effective communicator; self-motivated, with the ability to work independently while staying aligned within a team environment
Required Skills
Bachelor's in Computer Science (or equivalent), or Master's, with 3+ years of experience in big data ingestion, transformation, and staging using the following technologies/principles/methodologies:
Design and solution capabilities
Rich experience with Hadoop distributed frameworks, handling large amounts of data using Apache Spark and Hadoop ecosystem components
Working knowledge of and good experience in Unix environments, capable of writing Unix shell scripts (ksh, bash)
Basic Hadoop administration knowledge
DevOps knowledge is an added advantage
Ability to work within deadlines and effectively prioritize and execute tasks
Strong communication skills (verbal and written), with the ability to communicate across internal and external teams at all levels
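The Hadoop automation scripting mentioned in the duties above often amounts to wrapping Hadoop CLI output in a small script. A hedged sketch (the `hdfs dfs -ls` listing layout is the standard one, but the sample paths and the staleness policy are invented for illustration; a real script would obtain the listing via a subprocess call and act on the candidates):

```python
from datetime import datetime

def parse_hdfs_ls(ls_output):
    """Parse lines of `hdfs dfs -ls` output into (path, mtime, size) tuples."""
    entries = []
    for line in ls_output.splitlines():
        parts = line.split()
        # Expected layout: perms replication owner group size date time path
        if len(parts) < 8 or parts[0] == "Found":
            continue
        size = int(parts[4])
        mtime = datetime.strptime(parts[5] + " " + parts[6], "%Y-%m-%d %H:%M")
        entries.append((parts[7], mtime, size))
    return entries

def stale_paths(entries, cutoff):
    """Return paths last modified before `cutoff` (candidates for cleanup)."""
    return [path for path, mtime, _ in entries if mtime < cutoff]

# Sample listing, hard-coded here so the sketch is self-contained.
sample = """Found 3 items
-rw-r--r--   3 etl hadoop   1048576 2023-01-02 04:15 /data/stage/part-0000
-rw-r--r--   3 etl hadoop    524288 2023-06-01 04:15 /data/stage/part-0001
drwxr-xr-x   - etl hadoop         0 2023-06-10 09:00 /data/stage/_tmp"""

entries = parse_hdfs_ls(sample)
old = stale_paths(entries, datetime(2023, 3, 1))
```

In practice the candidate paths would then be passed to `hdfs dfs -rm` (or archived); keeping the parsing pure makes the script easy to unit-test, which fits the unit-testing duty listed above.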
Certifications (any one of these):
CCA Spark and Hadoop Developer
MapR Certified Spark Developer (MCSD)
MapR Certified Hadoop Developer (MCHD)
HDP Certified Apache Spark Developer
HDP Certified Developer
Preferred Skills
Technical:
o Working knowledge of Oracle databases and PL/SQL
o Hadoop administration / DevOps
o Experience with JIRA for user-story/bug tracking
o Experience with Git/Bitbucket
Non-Technical:
o Familiarity with SDLC and development/migration processes
o Good analytical thinking and problem-solving skills
o Ability to diagnose and troubleshoot problems quickly
o Motivated to learn new technologies, applications, and domains
o Possess an appetite for learning through exploration and reverse engineering
o Strong time management skills
o Ability to take full ownership of tasks and projects
Behavioral Attributes:
o Team player with excellent interpersonal skills
o Good verbal and written communication
o Possess a can-do attitude to overcome any kind of challenge