Mad Street Den is looking for a DevOps Manager who will maintain and execute organizational policies and procedures for change management, configuration management, release and deployment management, service monitoring, support, and problem management. If collaborating cross-functionally and translating requirements into technical solutions with minimal supervision are your strengths, it's you we're looking for!
Responsibilities:
- Configure, optimize, document, and support all infrastructure components hosted in cloud services such as AWS and Azure
- Implement and support Real User Monitoring solutions as part of our overall Application Performance Monitoring strategy
- Design, build, and deliver cloud computing solutions, hosted services, and underlying software infrastructure
- Work with engineers on the design, deployment, and continuous improvement of important infrastructure services (e.g., logging, monitoring, and alerting)
- Support and troubleshoot scalability, high availability, performance, monitoring, backup, and restoration of different environments
- Act as a subject matter expert for troubleshooting and resolving complex, multi-tier web problems that span several different platforms
- Take full ownership of the DevOps discipline and manage all server/infrastructure-related issues
- Manage, nurture, and grow the DevOps team and be responsible for their learning and development
- Evaluate new tools, technologies, and processes to improve the speed, efficiency, and scalability of websites and continuous integration environments
Requirements:
- 5+ years of experience working with cloud platforms like AWS or Azure
- 3+ years of experience leading a team technically
- Good understanding of core DevOps principles (e.g., testing automation, BDD, TDD, release automation, CI/CD)
- Has experience with and is opinionated about containerization (Docker) and container orchestration systems like ECS or Kubernetes
- Hands-on experience with web server log monitoring and the configuration of reverse proxies
- Expert in one or more Application Performance Monitoring tools like Prometheus, Grafana, or ELK
- Good understanding of how different software components interact in a distributed environment
- Familiarity with relational and NoSQL databases
- Ability to work independently, prioritize existing projects, and proactively determine areas requiring additional attention, monitoring, or maintenance
- Prior working experience in Python and one or more scripting languages like Bash or Golang
- Experience working with Infrastructure as Code using tools like Terraform
- Exposure to configuration management tools like Ansible
- Has a deep understanding of server/network security concepts and implementations
- Hands-on experience in installing, configuring, operating, and monitoring any CI/CD pipeline tools and security tools
- Must exhibit a high-level end-to-end understanding of web application design, construction, testing, deployment, and delivery
- Certifications like CKAD and CKA are a bonus