1. Infrastructure Management:
- Design, deploy, and maintain AWS infrastructure to ensure high availability, performance, and security.
- Implement infrastructure as code using tools such as Terraform or CloudFormation.
2. Monitoring and Observability:
- Develop and maintain monitoring, logging, and alerting solutions using OpenTelemetry, New Relic, Splunk, and related tools.
- Ensure comprehensive observability for all services and infrastructure components.
3. Incident Management:
- Respond to and resolve incidents, ensuring minimal downtime and impact on end-users.
- Conduct root cause analysis for incidents and implement preventive measures.
4. Performance Optimization:
- Analyze system performance metrics and optimize resource utilization.
- Collaborate with development teams to improve application performance and reliability.
5. Automation and Tooling:
- Automate repetitive tasks and processes to improve efficiency and reduce human error.
- Develop and maintain CI/CD pipelines to ensure reliable, repeatable deployments.
6. Security and Compliance:
- Implement security best practices and ensure compliance with industry standards.
- Conduct regular security assessments and audits.
7. Collaboration and Communication:
- Work closely with development, QA, and operations teams to ensure seamless integration and deployment of services.
- Communicate effectively with stakeholders regarding system status, incidents, and project updates.
Skill Set:
Technical Skills:
- Strong experience with AWS services such as EC2, S3, RDS, and Lambda.
- Proficiency in infrastructure as code tools like Terraform or CloudFormation.
- Deep understanding of monitoring and observability tools, especially OpenTelemetry, New Relic, and Splunk.
- Solid scripting skills in languages such as Python or Bash.
- Familiarity with containerization and orchestration tools like Docker and Kubernetes.
- Experience with CI/CD tools such as Jenkins, GitLab CI, or AWS CodePipeline.
Problem-Solving Skills:
- Ability to troubleshoot and resolve complex system issues.
- Analytical mindset for performance tuning and optimization.
Communication Skills:
- Excellent verbal and written communication skills.
- Ability to collaborate effectively with cross-functional teams.
Security and Compliance Knowledge:
- Understanding of security best practices and compliance requirements.
- Experience conducting security assessments and audits.
Soft Skills:
- Strong organizational and multitasking abilities.
- Proactive attitude with a focus on continuous improvement.
Preferred Qualifications:
- Relevant certifications (e.g., AWS Certified Solutions Architect, Certified Kubernetes Administrator).
- Experience with additional monitoring tools and platforms.