Required Skills

SRE - AWS, Docker, Kubernetes, Apigee, Cassandra, Oracle, PostgreSQL, Jenkins, GitHub, ELK, New Relic, Terraform, Python, Bash

Work Authorization

  • US Citizen

  • Green Card

  • EAD (OPT/CPT/GC/H4)

  • H1B Work Permit

Preferred Employment

  • Corp-Corp

  • W2-Permanent

  • W2-Contract

  • Contract to Hire

Employment Type

  • Consulting/Contract

Education Qualification

  • UG :- Not Required

  • PG :- Not Required

Other Information

  • No. of positions :- 1

  • Posted :- 24th Sep 2025

Job Details

  • Team Leadership:
    • Lead and mentor a team of SREs, ensuring they have the resources and support needed to succeed
    • Foster a culture of reliability and continuous improvement within the team
  • System Reliability:
    • Ensure the availability, performance, and scalability of systems and services
    • Develop and implement strategies for monitoring and maintaining system health
  • Incident Management:
    • Oversee the response to incidents, ensuring quick resolution and minimal downtime
    • Conduct post-mortems to identify root causes and prevent future incidents
  • Automation and Tooling:
    • Develop and maintain automation tools to reduce manual work and improve efficiency
    • Implement and manage CI/CD pipelines to streamline deployments
  • Collaboration:
    • Work closely with development, operations, and product teams to ensure alignment on reliability goals
    • Communicate effectively with stakeholders about system performance and reliability
  • Risk Management:
    • Identify and mitigate potential risks to system reliability
    • Implement strategies to handle failures and ensure disaster recovery

Skills: 

  • Technical Expertise:
    • Experience with:
      • Cloud platforms (AWS), containerization technologies (Docker and Kubernetes), API management (Apigee), databases (NoSQL: Cassandra; SQL: Oracle, PostgreSQL, and DB2), and CI/CD (Jenkins, GitHub)
      • Other technologies: ELK Stack, APM (New Relic), and infrastructure as code (Terraform)
      • Scripting languages such as Python or Bash
  • Problem-Solving:
    • Strong analytical skills to diagnose and resolve complex system issues
    • Ability to design and implement effective monitoring and alerting systems
  • Leadership:
    • Proven experience in leading and growing engineering teams
    • Excellent communication and collaboration skills
  • Automation:
    • Expertise in automation tools and practices to reduce manual intervention
    • Familiarity with CI/CD processes and tools
  • Resilience Engineering:
    • Knowledge of best practices in building resilient, self-healing systems
    • Experience with disaster recovery planning and execution

Company Information