Tagcor

Required Skills

Site Reliability Engineer

Work Authorization

US Citizen
Green Card
H1B Work Permit

Preferred Employment

Corp-Corp
W2-Permanent
W2-Contract
Contract to Hire

Employment Type

Consulting/Contract

Education Qualification

UG: Not Required
PG: Not Required

Other Information

No. of positions: 1
Posted: 3rd Jul 2024

JOB DETAIL

· Contribute to Development Activities: The SRE is expected to participate in SDLC activities, including design, development, testing, deployment, and operations, across both frontend and backend

· Cross-Functional Work: Collaborate with global teams to integrate with existing internal systems and GCP cloud

· Issue Resolution: Triage and resolve product or system issues, ensuring quality and performance

· Documentation: Write technical documentation, support guides, and runbooks

· Agile Practices: Participate in sprint planning, retrospectives, and other agile activities

· Compliance: Ensure software meets secure development guidelines and engineering standards

SRE Accountability

· General: Use coding, automation, and software engineering principles to ensure scalability, performance, and reliability efficiently and with minimal toil

· IaC: Build infrastructure as code (IaC) patterns that meet security and engineering standards using one or more technologies (Terraform, scripting with the cloud CLI, and programming with the cloud SDK)

· CI/CD: Build CI/CD pipelines for building, testing, and deploying applications and cloud architecture patterns, using platform tooling (Jenkins) and cloud-native toolchains

· Automation: Build automated tooling that deploys service requests and pushes changes into production. Build comprehensive, detailed runbooks to detect, remediate, and restore services

· Change Management: Work closely with the dev team to ensure all DevSecOps issues are addressed in a timely manner, in compliance with Equifax security policies and in adherence to the Engineering Handbook

· Incident Management: Triage and solve problems across complex distributed architecture service maps. Serve on call for high-severity application incidents and improve runbooks to reduce MTTR

· RCA and Postmortem: Lead root cause analyses and blameless postmortems, and own the calls to action that prevent recurrences

· Customer Focus: Address service disruptions and downtime, ensuring end-customer needs are met, and drive processes for a flawless customer experience

· Reliability and Availability: Ensure the SRE golden signals are monitored and that SLOs, SLIs, and SLAs are honoured within error budgets. Work closely with developers, QE, POs, and other stakeholders, providing continuous feedback on uptime, scalability, and reliability, and influence best practices with the aim of providing excellent operational experiences

· Reliability Roadmap: Own the reliability roadmap by taking a holistic view of all data operations management capabilities, including participating in Production Readiness Reviews (PRRs) and working with stakeholders to ensure DR plans are in place

Must-Have Skills

· General experience: 5-7 years of experience in software engineering, systems administration, database administration, and networking. System administration skills, including automation and orchestration of Linux/Windows using Terraform, Chef, Ansible and/or containers (Docker, Kubernetes), and shell scripting

· Cloud-Native Application Development: 3+ years. Solid experience with developing and supporting cloud-native applications. Experience with cloud-based security: IAM, AuthZ

· End-user Application Experience: 3+ years of experience as an SRE supporting an end-user-facing application, e.g. a web/mobile/desktop app that includes UI, APIs, and backend systems

· Development Experience: 2+ years of general proficiency with Java or JavaScript/NodeJS

· Frontend Experience: Experience with Angular, JavaScript, TypeScript, or modern web application development frameworks

· Architecture Knowledge: Understanding of modular systems, performance, scalability, security

· Agile Experience: Agile development mindset and experience

· Service-Oriented Architecture: Knowledge of RESTful web services, JSON, Avro

· Application Troubleshooting: Debugging, performance tuning, production support

· Documentation Skills: Strong written and verbal communication

· General SDLC: Experience with CI/CD and release management concepts, and the ability to use tools including Jenkins/Bamboo. Understanding of GCP services related to big data, such as BigQuery, Dataflow, Pub/Sub, GCS, and Composer/Airflow, or similar solutions in AWS: Redshift, SNS, SQS, S3, Kinesis, and others

Nice-to-Have Skills

· Big Data Processing: ETL/ELT experience

· Scripting Languages: Groovy, Python

· Cloud Certification: Relevant certifications in cloud technologies