Required Skills

Site Reliability Engineer

Work Authorization

  • US Citizen

  • Green Card

  • EAD (OPT/CPT/GC/H4)

Preferred Employment

  • Corp-Corp

  • W2-Permanent

  • W2-Contract

  • Contract to Hire

Employment Type

  • Consulting/Contract

Education Qualification

  • UG: Not Required

  • PG: Not Required

Other Information

  • No. of positions: 1

  • Posted: 2nd Aug 2025

Job Details

  • Monitor and recover fleet assets in our private cloud environment, which houses several compute servers with NVIDIA GPUs.
  • Focus specifically on building and stabilizing our virtualization infrastructure (ESXi, KVM, and Hyper-V).
  • Deploy and maintain a large farm of machines using the latest configuration management and infrastructure automation tools (Chef, Ansible, Terraform).
  • Participate in on-call and rotational L1 support for round-the-clock monitoring and remediation of infrastructure issues (PagerDuty).
  • Analyze and debug operating system, networking, configuration, and performance problems.
  • Assist in the roll-out and deployment of infrastructure configurations to support the latest hardware and technologies.
  • Contribute to the development of monitoring systems that provide a fast, reliable, real-time pulse of the various infrastructure subsystems (Zabbix, BigPanda, Grafana).


  • Bachelor’s or Master’s degree in Computer Science or Software Engineering, or equivalent experience.
  • Strong system and platform debugging skills.
  • Virtualization experience (key match if available): vSphere, Hyper-V, KVM, XenServer.
  • Familiarity with configuration management tools (Chef preferred, Ansible).
  • Experience working in large-scale enterprise production systems.
  • 5+ years of professional experience required.
  • Ability to debug and analyze system issues and code to triage, root-cause, and resolve issues in the infrastructure. Work closely with the platform engineering team to understand hardware setups.
  • Familiarity with the setup and maintenance of Linux and Windows hosts.
  • Scripting experience in Python or Go; Unix shell proficiency.
  • Experience with version control systems such as Perforce and Git.
  • Preferred:
    - Familiarity with private cloud setups (VMware, Dell, Apple).
    - Scripting (Bash, Python, Go).
    - Experience with VM and hardware virtualization technologies such as VMware, KVM, Hyper-V, Docker, and Kubernetes.
    - Background in automating bare-metal and VM provisioning.
    - Experience supporting GPUs, embedded device development, driver development, and CUDA/TensorRT applications.
    - Development experience with Chef, Ansible, and infrastructure orchestration.

Company Information