Design and implement scalable, resilient, and highly available cloud infrastructure solutions that follow best practices, using Terraform and GitHub Actions.
Develop comprehensive technical strategies that align with business objectives and drive operational excellence, both overall and specifically for high-availability/disaster-recovery (HA/DR) cloud systems.
Create architectural blueprints and detailed technical specifications for complex systems and applications.
Develop expertise in event-driven architectures and related technologies (e.g., Apache Kafka/Azure Event Hubs, Redis, MongoDB Atlas, Azure IoT Hub).
DevOps and SRE Practices
Establish and promote DevOps and Site Reliability Engineering (SRE) practices throughout the HON IA-PSS Group.
Implement continuous integration and delivery (CI/CD) workflows to streamline software development and deployment processes.
Design and implement monitoring, alerting, and observability solutions to ensure system reliability and performance.
Create self-healing systems and automate routine operational tasks to reduce manual intervention.
Automation and Optimization
Identify opportunities for automation and develop strategies to implement them across the infrastructure.
Optimize existing systems and processes to improve efficiency, reduce costs, and enhance performance.
Implement Infrastructure as Code (IaC) practices using tools such as Terraform (preferred), Ansible, or Azure Resource Manager (ARM) templates.
Collaboration and Leadership
Work closely with development teams, operations, and other stakeholders to ensure alignment between technical solutions and business needs.
Provide technical guidance and mentorship to junior engineers and team members.
Act as a liaison between technical and non-technical stakeholders, translating complex concepts into understandable terms.
Risk Management and Security
Assess and mitigate technical risks associated with infrastructure and application architectures.
Ensure that security best practices are integrated into all aspects of the infrastructure and application design.