Required Skills

SRE

Work Authorization

  • US Citizen

  • Green Card

  • H1B Work Permit

Preferred Employment

  • Corp-Corp

  • W2-Permanent

  • W2-Contract

  • Contract to Hire

Employment Type

  • Consulting/Contract

Education Qualification

  • UG: Not Required

  • PG: Not Required

Other Information

  • No. of positions: 1

  • Posted: 3rd Feb 2023

Job Detail

• 5+ years’ experience managing and monitoring incident/crisis management

• 3+ years’ experience monitoring with tools such as Grafana and New Relic

• 1+ years’ experience programming in a language such as Python or Go

• Infrastructure as Code with Terraform

• On-call experience

• Lead the on-call teams and processes to improve site reliability

• Focus on managing large scale systems with high loads 24/7

• Support our SRE and engineering teams in their day-to-day work

• Build, enhance, and maintain runbooks, working cross-functionally with various teams

• Thrive on automating processes as much as possible

• Observability and monitoring with services like Prometheus, Grafana, and New Relic

• Other duties and responsibilities as assigned

• Lead the NOC tools, runbooks, processes and teams

• Automate runbooks as necessary

• Work with our development teams on improving the system

• Experience in monitoring using New Relic/Grafana/Prometheus

• Experience scripting in Python/Go

Required Experience:

1. Cloud Concepts: Extensive hands-on experience with AWS/GCP and Kubernetes

2. Infrastructure as Code: Terraform; CI/CD: GitHub, Jenkins

3. Monitoring Tools: Experience using New Relic, Grafana, and Prometheus

7-8 years’ experience

Experience in incident/crisis management, support, and on-call work

Team Culture:

• Candidate with support and on-call experience

• Attention to detail and ability to manage multiple projects

• Strong analytical skills and ability to present complex data on site reliability and other factors

• Demonstrated ability to work with third parties and collaborate on solutions

• Ability to work with the offshore team, set goals, and ensure 100% adherence to processes and deliverables

• Willingness to provide on-call support and join calls to troubleshoot major outages

 

Company Information