- Design, build, and maintain large-scale, high-performing, secure Kubernetes and other application platform infrastructure on AWS, Azure, GCP, etc.
- Own the infrastructure and APM, and work with DevOps teams to build, release, monitor, and run services to improve service reliability
- Work on the config management / orchestration suite: know where it is broken, work towards fixing it, and explore new alternatives
- Handle cross-team performance issues, from identifying the cause and determining the areas of improvement to driving those actions to closure
- Baseline the performance and maturity of DevOps processes, tool maturity and coverage, metrics, technology, and engineering practices
- Define, measure, and improve reliability metrics, observability (monitoring, logging, and tracing solutions), and Ops processes (incident and problem management), and streamline and automate release management
- Be a subject matter expert, able to upskill and cross-skill engineering teams on SRE principles, tools, and execution
- DevOps and debugging skills, with experience in logging and monitoring solutions such as Elasticsearch, Kibana, Prometheus, AWS CloudWatch/Cloud Metrics, etc.
- Troubleshoot, debug, and diagnose operational issues and drive them to closure.
Experience:
- Experience with various Kubernetes platforms (OpenShift, EKS, AKS, GKE)
- Proven experience in designing and building large-scale, high-performing, secure Kubernetes and other application platform infrastructure on AWS, Azure, GCP, etc.
- Experience with scripting and orchestration including Terraform and/or CloudFormation
- Experience with monitoring tools such as Dynatrace, AppDynamics, ELK, Grafana, Prometheus, or equivalent
- Experience with public cloud (AWS, GCP, Azure)
- Experience working with Jira, Jenkins, JFrog Xray, ECR, ACR, etc.
- Experience automating the software dev/test/deployment lifecycle with continuous integration and continuous deployment
- Experience with scaling, monitoring, and troubleshooting actively running systems
- Good understanding of, and implementation experience with, the 12-factor app principles
- Experience in building monitoring/metrics and alerting tools (APM tools) and custom dashboards for each application stack in each supported environment
- Experience with container networking and security, image scanning for vulnerabilities, source code management, and implementation of security best practices and other DevSecOps principles and tools.
- Excellent hands-on experience with Unix/Linux OS internals and administration.
- Excellent hands-on experience with containerization using platforms and tools like Kubernetes, Docker, Jenkins, JFrog, Git, Prometheus, Grafana, etc.
- Good understanding of how to uplift maturity (app engineering practices and Ops)
- Understanding of software delivery lifecycles, particularly Agile/Lean & DevOps
- Proven experience in handling large-scale, growing infrastructure across data centres and heterogeneous cloud platforms
Qualifications:
Skills - Requirements
- Hands-on experience with AWS and Azure PaaS, CaaS, and other cloud infrastructure components.
- Must have expert-level knowledge of Kubernetes and hold the Certified Kubernetes Administrator (CKA) / Certified Kubernetes Application Developer (CKAD) certification
- Production experience designing, building, and operating K8s, EKS, AKS, OpenShift, etc.
- Solid understanding of automation principles and scripting experience (e.g., Python, Bash, PowerShell)
AWS/Azure Certified Solutions Architect - Associate, AWS Certified C