We are looking for a talented Site Reliability Engineer (SRE) with a strong background in Google Cloud Platform (GCP) and Red Hat OpenShift administration. The ideal candidate will be responsible for ensuring the reliability, performance, and scalability of our on-premises and cloud-based systems, with a focus on reducing Google Cloud costs.
- System Reliability: Ensure the reliability and uptime of critical services and infrastructure.
- Google Cloud Expertise: Design, implement, and manage cloud infrastructure using Google Cloud services.
- Automation: Develop and maintain automation scripts and tools to improve system efficiency and reduce manual intervention.
- Monitoring and Incident Response: Implement monitoring solutions and respond to incidents to minimize downtime and ensure quick recovery.
- Collaboration: Work closely with development and operations teams to improve system reliability and performance.
- Capacity Planning: Conduct capacity planning and performance tuning to ensure systems can handle future growth.
- Documentation: Create and maintain comprehensive documentation for system configurations, processes, and procedures.