Position Responsibilities:
- Work closely with engineering and production operations teams to design, build, and maintain large-scale distributed systems.
- Automate current and new systems and processes.
- Design, develop and execute automated tests to validate solutions and environments.
- Troubleshoot issues across the entire stack.
- Maintain a hybrid environment consisting of hosted systems as well as public and private cloud systems.
- Represent the Production Operations team and contribute towards new and ongoing technology projects in areas of scalability, performance and availability.
- Maintain current and future configuration processes and policies.
- Perform troubleshooting analysis and implement fixes to ensure availability SLAs are met.
- Take part in a 24x7 on-call rotation.
Position Qualifications:
- Command of your favorite scripting language: Python, Perl, Ruby, Bash, Java, or C++.
- Linux systems administration background.
- 3+ years of experience in Linux systems administration.
- Experience with web server configuration, monitoring, trending, network design and high availability.
- A desire to use existing tools when possible, and build custom solutions when needed.
- Familiarity with a wide range of open-source tools and solutions, to guide selection and implementation in a quickly growing and always-changing environment. Specific examples include monitoring, version control, system automation, graphing, load testing, and build/integration/release tools.
- Require limited supervision and direction; drive results and set priorities independently.
- Ability to handle multiple complex tasks concurrently under tight deadlines.
- Excellent verbal and written communication skills.
- A passion for supporting, designing, analyzing and troubleshooting large-scale distributed systems.
Preferred Qualifications:
- Experience with large-scale deployments on either GCP or Azure.
- 5+ years of professional experience in a Linux-based, large-scale operations role.
- 5+ years of hands-on operational experience in a high-volume or critical production service environment.
- Experience with container-based architecture and deployments (Docker and Kubernetes).
- Experience with web-based Java/J2EE architectures and JVM configuration.
- You have recently argued with someone about why Ansible is superior to Puppet or Chef.