Monitor and improve the availability, performance, and security of production services.
- Apply preventive measures to improve the reliability of production services.
- Mitigate issues on production systems and build automated solutions to prevent them from recurring.
- Enhance and extend the monitoring system to improve service reliability and to give other teams at CyberArk the dashboards they need to deliver excellent service to our customers.
- Automate common, repeatable tasks using Ansible and scripting languages.
- Triage and manage escalation of cases.
- Perform deliberate and structured troubleshooting.
- Share the on-call rotation and act as an escalation contact for incidents.
- Influence the design and architecture of services to proactively prevent system failures.
Role requirements
- 5-8 years of experience focused on site reliability, DevOps engineering, system administration, or application development.
- Strong hands-on experience in:
  - Linux/Unix and Windows OS.
  - Network architecture and security configurations.
- Hands-on experience with the following automation and scripting technologies:
  - Automation/configuration management using Ansible, Puppet, Chef, or an equivalent.
  - Python, Ruby, Bash, PowerShell.
- Hands-on experience with IaC (Infrastructure as Code) tools such as Terraform or CloudFormation.
- Hands-on experience with cloud infrastructure such as AWS, Azure, or GCP.
- Bachelor's degree in Computer Science or a related field.
- Think like an attacker.
- Excellent communication skills.
- Strong attention to detail.
- Strong hands-on technical abilities.
- Strong computer literacy and/or the comfort, ability, and desire to advance technically.
- Strong understanding of Information Security in various environments.
- Demonstrated ability to assume sole and independent responsibilities.
- Ability to keep track of numerous detail-intensive, interdependent tasks and ensure their accurate completion.