Keywords: SRE with Splunk, Grafana, and Prometheus, Infra, AWS and Kube
Skills:
We are looking for a Service Reliability Engineer (SRE) to design and build tools for, and to support, our large-scale storage systems.
Candidates should have proven software development skills and strong Linux / Systems expertise, understand SRE, and know what it will take to run services at large scale with high operational precision.
You will play a critical role in the day-to-day operations of services, partnering with engineering teams to ensure that both we and they are successful.
Required skills:
- Multi-year experience in a Site Reliability Engineering, DevOps, or infrastructure-focused role
- Experience supporting and understanding distributed systems, networking, and Linux
- Ability to implement and coordinate telemetry using monitoring and observability tools such as Splunk, Grafana, and Prometheus
- Strong verbal and written communication skills
- Experience with cloud technologies such as Kubernetes and AWS
- Coding experience using a high-level programming language like Java, Golang, or Python
- Automation advocate, able to write software and scripts for automation
- A strong sense of ownership; at the same time, you’re a great teammate who communicates clearly and transparently
- Self-motivated, inquisitive, and always looking to learn more
- Experience managing, scaling, and troubleshooting Java applications
- Familiarity with cloud infrastructure concepts (zones, regions, VPCs, etc.)
- An understanding of a variety of software service packaging, deployment strategies, and tooling
- Working understanding of common authentication schemes, certificates, and securely managing secrets
- Capable of designing and implementing automated configuration management processes for repeatable and consistent service deployment