Incident Management: Troubleshoot, investigate, and triage incidents, and coordinate them with stakeholders.
Problem Management: Work with developers and other infra teams to determine RCA and implement preventive measures. Triage recurring issues and resolve them with a permanent fix.
Autosys Batch Monitoring: Provide daily batch support and generate status reports.
Environment patching management: Support infra patch upgrades, coordinate post-patching checks, and obtain QA checkouts for sign-off.
Middleware certificate management: Support and coordinate between RLC and the app team for certificate renewals. After each renewal, obtain sign-off from stakeholders.
Environment monitoring:
AppDynamics: Monitor server and JVM health.
Configure alerts and policies for proactive monitoring.
Database health monitoring.
Daily and weekly reporting on incidents and environment downtime.
Skills:
7+ years of experience in a relevant field required, including experience handling a team.
Proficient in complex Unix commands and shell scripting.
Working knowledge of the Autosys tool, Informatica, and Cloudera.
Middleware knowledge: application servers, web servers (Tomcat), MQ, and Kafka.
Web development and API knowledge.
Hands-on experience with Azure monitoring tools, AppDynamics, Splunk, and Grafana.