- Production health monitoring and failure analysis.
- ICM incident creation, acknowledgement, analysis, mitigation, resolution, and RCA.
- Creation of ADO user stories/bugs with supporting analysis and proposed fixes.
- Driving Issue Management triage with PMs.
- Backlog prioritization of Production issues/CRs with PMs.
- LSI communication to impacted stakeholders.
- Upstream/downstream coordination for Prod/Staging issues, with follow-up until resolution.
- Maintain RCAs, KB articles, and TSG materials.
- Daily/Weekly/Monthly/Quarterly Status Reporting.
- Production health reporting dashboard for service health, reviewed with engineering/leadership.
- Provide data insights on incident trends, resolution categorization, DQ metrics, LSI trends, and pipeline failure metrics.
- Focus on customer challenges and look for ways to improve existing processes and reduce effort.
Primary Skill Category Group > Skill Category > Skill
P4 - Applied Intelligence > Data Engineering > Microsoft Azure Data Factory
Secondary Skill Category Group > Skill Category > Skill
P4 - Other Skill Categories > No Category > Incident Analysis