Required Skills

Site Reliability Engineer

Work Authorization

  • US Citizen

  • Green Card

  • EAD (OPT/CPT/GC/H4)

  • H1B Work Permit

Preferred Employment

  • Corp-Corp

  • W2-Permanent

  • W2-Contract

  • Contract to Hire

Employment Type

  • Consulting/Contract

Education Qualification

  • UG: Not Required

  • PG: Not Required

Other Information

  • No. of positions: 1

  • Posted: 23rd Dec 2025

Job Details

We are seeking a highly motivated and skilled Site Reliability Engineer to join our HDInsight team. In this role, you will play a critical part in ensuring the stability, performance, and reliability of our big data platform. You will work closely with the HDInsight product team to provide exceptional SRE support, helping customers resolve complex issues and ensuring smooth operations.

Responsibilities:

  • Customer Support: Provide top-notch support to customers, troubleshooting and resolving HDInsight-related issues via Incident Communication Management (ICM).
  • Performance Optimization: Analyze and optimize the performance of Spark, Hive, and Hadoop jobs, ensuring efficient and scalable big data processing.
  • Root Cause Analysis: Investigate production incidents, identify root causes, and implement effective mitigations to prevent future occurrences.
  • Tool Development: Build and maintain tools and services that enhance the debuggability and supportability of HDInsight.
  • Proactive Monitoring: Monitor the health of clusters for key customers, identifying potential problems before they impact operations.
  • Migration Support: Assist in the migration of big data workloads, leveraging your expertise to ensure seamless transitions.

Essential Skills:

  • Deep understanding of big data technologies and the Hadoop ecosystem (HBase, Kafka, etc.)
  • Hands-on experience with Hadoop administration and troubleshooting
  • Proficiency in cloud technologies, particularly AWS
  • Strong problem-solving and analytical skills
  • Excellent communication and customer service skills
  • Experience with Hortonworks Data Platform (HDP) is a plus

Desirable Skills:

  • Knowledge of Spark, Hive, and other big data processing frameworks
  • Familiarity with performance tuning techniques for big data workloads
  • Experience with scripting and automation (e.g., Python, Bash)
  • Understanding of DevOps principles and practices

Qualifications:

  • Bachelor's degree in Computer Science or a related field
  • 3+ years of experience in Site Reliability Engineering or a similar role
  • Proven track record of supporting and troubleshooting large-scale distributed systems

Benefits:

  • Competitive salary and benefits package
  • Opportunity to work on cutting-edge big data technologies
  • Collaborative and innovative work environment
  • Potential for professional growth and development

If you are passionate about big data and have a strong desire to help customers succeed, we encourage you to apply!

Keywords: Site Reliability Engineer, SRE, Big Data, Hadoop, HBase, Kafka, AWS, Cloud, Hortonworks, HDInsight, Spark, Hive, Performance Tuning, Troubleshooting, Customer Support.

Company Information