Strong experience with Hadoop, Hive, and PySpark, including optimizing Spark jobs.
Senior, hands-on engineer able to lead offshore deliverables.
Responsibilities:
Design, develop, and maintain large-scale data processing systems using Hadoop and related technologies.
Create and optimize data pipelines for ingesting, processing, and transforming structured and unstructured data.
Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and implement effective data solutions.
Ensure data quality and integrity by implementing data validation and cleansing processes.
Monitor and troubleshoot performance issues in the Hadoop ecosystem, optimizing data processing and storage for efficiency.
Develop and implement data security and privacy measures to protect sensitive data.
Stay current with the latest trends and technologies in big data and distributed computing, and propose improvements to our existing systems.
Document and communicate technical specifications and solutions to both technical and non-technical team members.
Qualifications:
Bachelor's degree in Computer Science, Engineering, or a related field; a Master's degree is a plus.
Strong experience working with Hadoop ecosystem technologies, such as HDFS, MapReduce, Hive, Pig, Spark, and Kafka.
Proficiency in programming languages commonly used in the Hadoop ecosystem, such as Java, Scala, or Python.
Experience with data modeling and database design principles.
Solid understanding of distributed computing principles and large-scale data processing frameworks.
Familiarity with cloud-based data technologies, such as Amazon Web Services (AWS) or Google Cloud Platform (GCP).
Strong problem-solving and troubleshooting skills.
Excellent communication and collaboration abilities.