Required Skills

Data Engineer

Work Authorization

  • US Citizen

  • Green Card

  • EAD (OPT/CPT/GC/H4)

  • H1B Work Permit

Preferred Employment

  • Corp-Corp

  • W2-Permanent

  • W2-Contract

  • Contract to Hire

Employment Type

  • Consulting/Contract

Education Qualification

  • UG: Not Required

  • PG: Not Required

Other Information

  • Number of positions: 1

  • Posted: 27th Nov 2025

JOB DETAIL

  • Day to day:

 

    • This role focuses on workflow management for data sets in the genomics space and requires the ability to create workflow pipelines through data management and engineering.
    • Utilize Python, AWS, and Kubernetes to design, develop, optimize, and maintain scalable bioinformatics workflows for processing and analyzing large-scale genomics datasets in the cloud and in-house
    • Build a flexible, modular architecture into the workflows so that analysis components and algorithms can be exchanged
    • Implement the bioinformatics data processing pipelines using workflow management tools and programming languages such as Python
    • Work with team members to perform quality control and validation of pipelines to ensure accuracy and reproducibility of results
    • Document the development processes, including code, workflows, data flow diagrams, and standard operating procedures, following software development and DataOps best practices

 

Qualifications: (Recruiter)

  • Must Haves:
    • Bachelor's degree or higher in Engineering (prefer someone from outside Biology/sciences)
    • Open on years of experience, as long as the candidate has the following:
      • Ability to create robust and scalable data workflows/pipelines
      • Python
      • AWS
      • Kubernetes

 

  • Plus Haves:
    • Life Sciences/Bioinformatics/Genomics background
  • Perfect fit:
    •  
  • Disqualifiers:
    •  

Interview Process: (Account Manager)

  1. 45-minute skills assessment – candidate will access a GitHub file/doc. Focused on Python skills
  2. Teams w/ Maurcio, panel w/ his boss and 2 engineers on the team

Ending Questions: (Account Manager)

  • When can we put some time on the calendar to walk through the candidates we are coming across? Thurs
  • Are there any other recruiting companies or internal HR working on this role? Yes LOTS
  • Are there any other people in process? No. Will be setting up interviews next week


Project Scope and Brief Description:

The position is for work in the bioinformatics space, principally writing new and/or maintaining existing bioinformatics workflows and pipelines, such as a Eukaryote Genome Annotation Pipeline. As such, the role requires knowledge of cloud technologies (AWS, Kubernetes, container orchestration) as well as experience with industry-level scientific workflow management.

Responsibilities:

  • Design, develop, optimize, and maintain scalable bioinformatics workflows for processing and analyzing large-scale genomics datasets in the cloud and in-house
  • Build a flexible, modular architecture into the workflows so that analysis components and algorithms can be exchanged
  • Implement the bioinformatics data processing pipelines using workflow management tools and programming languages such as Python
  • Work with team members to perform quality control and validation of pipelines to ensure accuracy and reproducibility of results
  • Document the development processes, including code, workflows, data flow diagrams, and standard operating procedures, following software development and DataOps best practices

Skills / Experience:

Required Qualifications

  • Previous experience developing industrial-scale scientific data workflows.
  • Strong programming skills in Python, including data science libraries such as NumPy, pandas, NetworkX, and Matplotlib.
  • Working knowledge of container technologies (such as Docker, containerd, or Podman) and container orchestration.
  • Experience with data pipeline tools (such as Argo, Ray, Airflow, redun, or Nextflow).
  • Familiarity with the AWS platform (IAM, EC2, S3, CloudWatch, Spot Instances) and with Kubernetes, EKS, ECS, AWS Batch, or other cloud compute architectures.
  • Ability to work both independently and collaboratively, with good communication skills. Interest in learning new technologies.

 

Preferred Qualifications

  • Specific experience analyzing large genomics datasets
  • Familiarity with common bioinformatics tools and data types for the analysis of next-generation sequencing (NGS) data
  • Familiarity with statistical analysis methods and tools commonly used in bioinformatics analyses such as gene expression or ChIP-seq
  • Knowledge of additional programming languages such as C, Rust, Perl, R, or Unix shell

Company Information