Design, deployment and maintenance of Grafana dashboards. Must be able to develop visualizations that are simple yet meaningful for executive-level presentations
Operational support of the event notification platform (PagerDuty)
Operational support of the observability infrastructure, including on-call shifts, covering components such as Prometheus, Grafana, Loki, Jaeger, Fluentd/Fluent Bit, etc.
Assisting internal customers in getting the most from the company's observability platforms, both by guiding them through the onboarding stage and by encouraging best-practice methodologies
Required Skills:
Knowledge of Prometheus and the PromQL query language
Grafana plugin development
Strong experience with query languages and with writing complex queries, including joins and aggregations, over large amounts of data
2+ years of experience with application and infrastructure monitoring tools
Experience with IT operations monitoring dashboards is mandatory
Experience with data integration
Experience with data validation
Desired Skills:
Understanding of foundational monitoring concepts: avoiding noise, defining SLIs/SLOs, etc.
Experience designing and implementing a centralized logging solution using tools such as Fluent Bit, Fluentd and Promtail
Experience with the ELK stack (Elasticsearch, Logstash, Kibana)
Understanding of multiple approaches to data storage, such as Prometheus (time-series) or NoSQL databases, and experience analyzing complex data sets
Understanding of network and system management standards and event logging protocols such as SNMP, WMI, syslog, and Cisco Telemetry
2+ years of experience working with the Kubernetes platform, especially the AWS EKS ecosystem
2+ years of experience with CI/CD, particularly GitLab
Understanding of Linux and networking fundamentals