Tagcor

Required Skills

Site Reliability Engineer

Work Authorization

US Citizen
Green Card
EAD (OPT/CPT/GC/H4)
H1B Work Permit

Preferred Employment

Corp-Corp
W2-Permanent
W2-Contract
Contract to Hire

Employment Type

Consulting/Contract

Education Qualification

UG :- Not Required
PG :- Not Required

Other Information

No. of positions :- 1
Post :- 1st Dec 2023

JOB DETAIL

● 7+ years of overall experience, including 5+ years of proven experience as an Observability
Engineer, Site Reliability Engineer (SRE), or in a similar role.
● Strong proficiency in implementing and maintaining observability tools such as New Relic,
Datadog, Prometheus, Grafana, the ELK Stack (Elasticsearch, Logstash, Kibana), and the TICK Stack.
● Solid experience with instrumentation practices, including metrics, logging, and distributed
tracing.
● Familiarity with cloud platforms and containerization technologies (Docker, Kubernetes).
● Proficiency in scripting and automation using languages such as Python, Bash, or PowerShell.
● Excellent problem-solving skills with the ability to analyze complex issues and provide
efficient solutions.
● Strong communication skills and the ability to collaborate effectively across teams.
● Understanding of Agile and DevOps principles and their application in observability and
monitoring contexts.
● Relevant certifications in observability tools and practices (e.g., Certified Prometheus
Practitioner) are a plus.

Roles and Responsibilities:
● Develop and implement observability solutions to gain insights into application and
infrastructure performance, availability, and reliability.
● Collaborate with development, operations, and other teams to instrument applications and
services for metrics, logs, traces, and other relevant data.
● Design and implement monitoring solutions using industry-standard tools and practices to
detect, analyze, and mitigate incidents and anomalies.
● Create and manage dashboards, alerts, and visualization tools to provide real-time visibility
into system behavior and performance.
● Perform in-depth analysis of system behavior and trends to identify areas for improvement,
optimization, and increased efficiency.
● Troubleshoot complex issues by analyzing data from various sources to quickly diagnose and
resolve incidents, minimizing downtime.
● Continuously evaluate and recommend improvements to observability processes, tools, and
practices to align with industry best practices.
● Contribute to the development of automation scripts and tools to enhance observability and
incident response.
● Collaborate with development teams to improve application design for better observability,
including implementing distributed tracing and structured logging.
● Stay updated with emerging trends, technologies, and methodologies in observability,
monitoring, and performance analysis.