US Citizen
Green Card
EAD (OPT/CPT/GC/H4)
H1B Work Permit
Corp-Corp
W2-Permanent
W2-Contract
Contract to Hire
Consulting/Contract
UG: Not Required
PG: Not Required
No. of positions: 1
Post: 31st Jan 2026
This position is tailored for an experienced AWS-focused Site Reliability Engineer (SRE). The individual will be primarily responsible for ensuring basic system reliability and reducing operational toil. SREs at this level can monitor system performance, configure proactive alerts, and actively participate in the on-call rotation. They also contribute to disaster recovery testing and may assist in training new team members.
Scope and Key Responsibilities:
- Creates monitoring queries and establishes service level baselines
- Provides support to senior engineers during incidents
- Contributes to post-mortems and Root Cause Analyses (RCAs)
- Participates in disaster recovery testing
- Implements automation and executes code in production environments
- Contributes to SRE knowledge documentation
Top 3 must-have skills:
a) AWS
b) Windows/Linux
c) Troubleshooting
Technical Skills:
Observability (Level 3):
- Creates proactive alert rules and configures browser agents for monitoring
- Develops complex synthetic transactions and advanced Application Performance Monitoring (APM)
- Recommends and establishes Service Level Objectives (SLOs)
Incident Management (Level 3):
- Leads RCAs and scenario modeling exercises
- Participates in on-call rotation and provides support
- Writes advanced automation scripts for incident response
Design for Reliability (Level 3):
- Makes performance and capacity recommendations based on customer demand
- Demonstrates proficiency in DevOps practices, including monitoring and configuration management
Disaster Recovery (Level 3):
- Assists in recovery from major incidents and tests system failover
- Automates system recovery using Infrastructure-as-Code
Platforms and Automation (Level 3):
- Identifies opportunities for improving developer experience and software delivery performance
- Maintains and secures cloud environments
Reliability Culture (Level 3):
- Contributes to SRE knowledge base and toil elimination projects
- Analyzes ticket trends and provides recommendations for improvement
Behavioral Competencies:
- Collaboration and Teamwork
- Customer & External Focus
- Problem Solving and Issue Analysis
- Learning Agility