- US Citizen
- Green Card
- EAD (OPT/CPT/GC/H4)
- H1B Work Permit
- Corp-Corp
- W2-Permanent
- W2-Contract
- Contract to Hire
- UG: Not Required
- PG: Not Required
- No. of positions: 1
- Posted: 26th Feb 2025
- Architecture & Design
  - Design end-to-end Grafana solutions for metrics, logs, traces, and dashboards, ensuring scalability, security, and compliance.
  - Architect integrations with Prometheus, Loki, Mimir, Tempo, and third-party tools (e.g., AWS CloudWatch, Datadog).
  - Define best practices for Grafana deployment (self-managed vs. Grafana Cloud) and optimize data storage and retention strategies.
- SRE Leadership
  - Implement SRE principles: SLAs/SLOs/SLIs, error budgets, and blameless post-mortems.
  - Build automated monitoring and alerting systems that identify system bottlenecks and failures before they cause outages.
  - Lead incident response, root cause analysis, and remediation for observability-related outages.
- Collaboration & Integration
  - Partner with DevOps teams to embed Grafana into CI/CD pipelines and automate provisioning via IaC (Terraform, Ansible).
  - Work with developers to instrument applications for observability (OpenTelemetry, custom exporters).
  - Advise stakeholders on cost-effective monitoring strategies and resource optimization.
- Performance Optimization
  - Tune Grafana dashboards, queries, and data sources for high-performance environments.
  - Optimize PromQL and LogQL (Loki) queries and manage large-scale time-series databases (Mimir).
  - Conduct capacity planning and disaster recovery testing for Grafana ecosystems.
- Governance & Security
  - Ensure compliance with security policies (RBAC, SSO, encryption) and audit requirements.
  - Monitor Grafana stack health, perform upgrades, and enforce version control.
- Mentorship & Innovation
  - Mentor SRE and engineering teams on Grafana best practices and SRE culture.
  - Stay ahead of Grafana and observability trends and pilot new tools (e.g., AI-driven anomaly detection).