Tagcor

Required Skills

Site Reliability Engineer, Splunk, ServiceNow, Azure

Work Authorization

US Citizen
Green Card

Preferred Employment

Corp-Corp
W2-Permanent
W2-Contract
Contract to Hire

Employment Type

Consulting/Contract

Education Qualification

UG: Not Required
PG: Not Required

Other Information

No. of positions: 1
Posted: 22nd Sep 2025

JOB DETAIL

• Cloud Strategy

• Provide thought leadership, mentorship, and technical vision related to site reliability, DevOps, and a ‘cloud-first’ culture.

• Analyze and implement cloud services to meet business goals, focusing on cost optimization, efficiency, and scalability.

• Drive orchestration efforts for cloud services, design self-service aspects, and stay updated with emerging cloud technologies.

• Infrastructure Automation and Design

• Collaborate on designing, building, and maintaining scalable infrastructure across cloud and on-prem environments.

• Automate provisioning and configuration using tools like Terraform, Terragrunt, and Puppet.

• Develop automation scripts, maintain CI/CD pipelines, and plan for scalability and capacity, conducting load testing as needed.

• Reliability and Performance Engineering

• Ensure system reliability, availability, and performance through monitoring, alerting, and incident response.

• Implement and manage SLOs/SLIs to meet reliability standards.

• Identify and address performance bottlenecks across the infrastructure and application stack.

• Build and maintain observability solutions (e.g., monitoring, logging, and tracing) and improve system health dashboards.

• Security and Compliance

• Implement security measures for cloud-native applications and ensure compliance with industry standards (SOC 2, PCI, etc.).

• Collaborate with security teams to audit and monitor systems, continuously updating security configurations and dashboards.

• Incident Management and Root Cause Analysis

• Participate in on-call rotations to provide 24/7 support for production environments.

• Lead incident response activities and perform root cause analysis to prevent recurring incidents.

• Conduct and document post-incident retrospectives (postmortems) to drive continuous improvement.

• Create and maintain runbooks and operational documentation to support continuous improvement.

• Proactively test system resilience through Chaos Engineering experiments and failure injection.

• Disaster Recovery and Business Continuity

• Design and test disaster recovery (DR) and business continuity strategies, ensuring backup and failover mechanisms are effective.

• Cost Management and Financial Optimization

• Monitor cloud usage and implement financial optimization practices (FinOps) to control infrastructure costs.

• Collaborate with stakeholders to drive financial efficiency.

• Collaboration, Knowledge Sharing, and Communication

• Collaborate across teams to ensure alignment and effective project implementation.

• Communicate during incidents and changes, providing transparency to stakeholders.

• Mentor and share knowledge with team members to foster a collaborative and continuous learning environment.

• Maintain comprehensive documentation of system configurations, processes, and best practices.

Site Reliability Engineer