US Citizen
Green Card
EAD (OPT/CPT/GC/H4)
H1B Work Permit
Corp-Corp
W2-Permanent
W2-Contract
Contract to Hire
Consulting/Contract
UG: Not Required
PG: Not Required
No. of positions: 1
Posted: 2nd Aug 2023
Implement SRE practices
Identify, craft, and maintain SLIs and SLOs for teams, along with metrics such as MTTR, Lead Time for Changes, Deployment Frequency, and Change Failure Rate
Work with application teams to set up observability and telemetry
Define what it means for a service to be available, then develop, monitor, and alert on the corresponding SLIs/SLOs
Define, track, and enforce error budgets
Review code instrumentation with development teams and ensure the necessary dashboards are created to monitor SLIs, SLOs, and SLAs
Establish, test, and tune alerting for varying tiers of applications
Document and maintain runbooks and procedures, automating as much as possible
Plan and execute periodic disaster recovery exercises, including both tabletop exercises and simulated failures (fault injection)
8+ years of SRE or Systems Engineering experience and a total of 12-15 years of software industry experience
Experience with any SRE tool (Grafana, Dynatrace, or Splunk preferred)
Experience with distributed tracing
Experience establishing hooks in CI/CD pipelines in lower environments to catch SRE violations
Strong analytical and problem-solving mindset combined with experience troubleshooting under pressure
Strategic thinking, complex problem-solving, and analytical capabilities
Strong organizational and interpersonal skills, with experience developing and instilling a culture of operational maturity
Ability to adjust quickly to new technologies