US Citizen
Green Card
EAD (OPT/CPT/GC/H4)
H1B Work Permit
Corp-Corp
W2-Permanent
W2-Contract
Contract to Hire
Consulting/Contract
UG :- - Not Required
PG :- - Not Required
No of position :- ( 1 )
Post :- 21st Oct 2025
• Lead complex technology initiatives towards Hadoop platform stability, automation initiatives.
• Platform Incident and Problem management
• Be part of 24/7 platform support team to perform platform incident management and triaging team.
• Assess the impact of the platform issue and prioritize the triage by partnering with platform tenants, platform development and Hadoop vendor teams.
• Facilitate and perform root cause analysis of platform incidents towards determining permanent resolution involving Tenants and Hadoop vendor.
• Communicate with Stakeholders, leadership team on the triaging progress status.
• To troubleshoot tenant reported failures by deep diving onto client logs to identify the root cause.
• Use Autosys as scheduler to schedule, execute platform routine activities.
• Hadoop Cluster Monitoring
• Monitor Hadoop cluster health and performance.
• Automate new monitoring opportunities leveraging available monitoring and ing tools such as Prometheus, Thousand Eyes, Splunk etc.
• Perform troubleshooting on the actionable s and tuning to ensure optimal cluster performance.
• Hadoop Cluster Configuration and tuning
• Periodically tune and configure Hadoop ecosystem components such as HDFS, YARN, Hive, HBase, Spark, HPOS etc.
• Schedule the Change request and track the approval process for implementation.
• Platform Backup and Recovery:
• Ensure Hadoop platform is resilient with data backup and recovery strategies in place.
• Perform emergency or scheduled failover of platform to Disaster recovery site and fall back.
• Upgrades and Patch Management:
• Plan and execute upgrades of Hadoop ecosystem components and patches.
• Ensure compatibility and smooth integration of new features and enhancements.
• Work with Operating System administrators and patching automation build team to schedule and execute patching.
• Perform Cluster health validation post patching and communicate with stakeholders.
• Identify automation and process improvement opportunities related to patching to implement it.
• Documentation and Reporting:
• Maintain comprehensive documentation of Hadoop cluster configurations & troubleshooting processes, and procedures.
• Develop and Schedule to Generate reports on cluster usage, performance metrics, and capacity utilization.
• Collaboration and Support:
• Ability to work both independently and in collaboration with Platform development and Platform tenant teams to troubleshoot and fix Hadoop platform issues, maintain production stability to enterprise-wide applications.
• Collaborate and consult with key technical experts, senior technology team, and Hadoop vendor to resolve complex technical issues and achieve goals.
Mandatory Skills and Qualifications:
• Proven Experience in technology incident and problem management role.
• Proven Experience in using monitoring tools like Prometheus, Splunk, SiteScope & thousand eyes etc.,
• Proven experience as a Unix Scripting
• Proven experience in Hadoop administration or in a similar role.
• Strong knowledge of Hadoop ecosystem and its components (Hive, HBase, Spark, Hue ).
• Proven Experience is using Autosys or similar job scheduler.
• Excellent problem-solving and troubleshooting skills.
• Excellent communication and people skills for collaborating with cross-functional teams.