Lead and manage the Cloud Reliability teams to provide strong Managed Services support to end-customers.
Own automation using Docker, Puppet, Ansible, Chef, or an equivalent tool. Deploy, automate, and maintain production systems to ensure their availability, performance, scalability, and security.
Handle code deployments in all environments and work towards CI/CD.
Provide technical guidance and educate team members on development and operations. Monitor metrics and develop ways to improve them.
Troubleshoot and solve problems across platform and application domains, using a wide variety of open-source technologies and cloud services.
Build, maintain, and monitor configuration standards.
Ensure the security of critical systems using best-in-class cloud security solutions.
Analyze the technologies currently used within the company and develop steps and processes to improve and expand upon them.
Work closely with engineers and developers to build and maintain the tools needed to complete projects efficiently.
Continually improve DevOps processes, such as deployments, and optimize infrastructure cost.
Handle sudden spikes and anomalies as quickly as possible and in a cost-effective manner.
Desired Candidate Profile
Good command of English, written and spoken.
Good analytical, communication, problem-solving, and learning skills.
Cloud: AWS/GCP/Azure (professional certification is a plus)
Infrastructure-as-Code: CloudFormation/Terraform
Configuration Management: Ansible/Chef/Puppet
Continuous Integration, Delivery, and Deployment principles