- SRE with genuine interest and experience in troubleshooting Linux systems, networking, monitoring, databases, containers/Kubernetes, cloud technologies, etc., and a proven track record of using software engineering to solve operational problems.
- Comfortable writing code to automate API-driven tasks at scale; Python preferred.
- Architect and implement automations to auto-remediate/self-heal issues in production.
- Participate in SRE software engineering, writing code that continually reduces human intervention in operational tasks and automates processes.
- Monitor the application ecosystem, join incident bridge calls, and resolve issues.
- Good understanding of core DevOps and SRE practices and technologies.
- Be ready to participate in a 24x7x365 on-call rotation and close out incidents within 30 minutes.
- Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
Skills & Qualifications
10+ years of overall experience applying DevOps and SRE practices, technologies, and industry standards to make production systems reliable and resilient.
Experience with core DevOps and SRE practices and technologies such as:
- Chaos engineering
- Ansible
- Docker
- Kubernetes, Helm
- Jenkins
- Terraform (infrastructure as code)
- Prometheus, Grafana
- ELK stack
- Azure Cloud Stack
- Azure DevOps
- Expert hands-on experience provisioning and deploying infrastructure in the Azure public cloud in a large-scale enterprise environment with mission-critical applications
- Expert hands-on experience using the Azure DevOps stack to build automated CI/CD pipelines that deploy applications and infrastructure