Tagcor

Required Skills

Site Reliability Engineer

Work Authorization

US Citizen
Green Card
EAD (OPT/CPT/GC/H4)
H1B Work Permit

Preferred Employment

Corp-Corp
W2-Contract
1099-Contract
Contract to Hire

Employment Type

Consulting/Contract

Education Qualification

UG: Not Required
PG: Not Required

Other Information

No. of Positions: 1
Posted: 28th Oct 2022

JOB DETAIL

Technical/Functional Skills

·Experience implementing SRE solutions in the areas of monitoring, resiliency, incident management, and automation.

·Experience resolving issues in the areas of user applications, cloud platforms, system uptime, system recovery, performance, etc.

·Strong hands-on experience developing applications using Java, NodeJS / AngularJS, etc.

·Deep understanding of and experience with microservices, APIs, and web services.

·Experience working with Web Content Management (WCM) technologies like Adobe Experience Manager (AEM), etc.

·Strong scripting experience with Bash, Python, Go, etc. to automate tasks and reduce toil in Java/NodeJS runtime environments.

·Exposure to monitoring tools for setting up RUM, APM, synthetic monitoring, infrastructure monitoring, and alerting.

·Hands-on experience setting up dashboards and providing analytics using any cloud platform like Azure, AWS, etc.

·Strong hands-on experience working with any Real User Monitoring (RUM) tools like Dynatrace, New Relic, AppDynamics, etc.

·Experience working with any synthetic monitoring tools like Dynatrace, Blue Triangle, Sematext, etc.

·Dashboarding experience with any log monitoring tools like Splunk, SolarWinds, etc.

·Experience handling infrastructure issues in cloud-native applications using Docker, Kubernetes, etc.

·Experience supporting the healthcare domain.

·Experience with CI/CD pipelines using Jenkins and GitHub.

·Excellent verbal and written communication skills.

·Certifications in any monitoring tools and cloud platforms are preferred.

Experience Required

6-8 years

Roles & Responsibilities

·Responsible for toil reduction in processes involving testing, release management, change management, and incident management.

·Hands-on experience with identifying required dashboards for Java and NodeJS services.

·Familiar with resolving JavaScript and user application issues in Adobe Experience Manager applications.

·Hands-on experience with setting up dashboards using JSON scripting.

·Create rules to optimize incident response through metrics, streamlined alert flows, and better collaboration and communication across squads/dev teams.

·Hands-on experience in scripting for setting up log monitoring dashboards.

·Proactively identify issues that might disrupt service in production.

·Experience leading the resolution of P1 production issues as an SRE Lead/Manager.

·Hands-on experience leading Root Cause Analysis (RCA) calls with stakeholders.

·Hands-on experience with setting up monitoring thresholds, metrics, and alerts in any cloud tools (Azure or AWS). Certifications are preferred.

·Hands-on experience with SSH monitoring of pods and containers in Kubernetes.

·Partner with development teams to improve services through rigorous testing and release procedures.

·Balance feature development speed and reliability using well-defined service level objectives and indicators (SLOs, SLIs).

·Define and monitor service level metrics that include incident management KPIs like MTTD, MTTR, MTBF, MTTF, unavailability rate, incident count, etc.

·Debug production issues across services and API endpoints.

·Build reports using analytics tools like Adobe Analytics or cloud analytics.