Education: Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Systems, or a related field.
7+ years of experience in data engineering roles working with large, complex datasets, preferably in the pharmaceutical or life sciences domain.
Proven experience working with drug development data, including clinical trials, preclinical studies, and regulatory submissions.
Experience in developing data products and infrastructure to support AI applications.
Experience managing data pipelines in a variety of environments and handling evolving source-data schemas.
Experience designing and optimizing scalable data pipelines that efficiently process and manage large datasets (100+ million records).
Proficiency in programming languages and frameworks such as Python, PySpark, and SQL.
Expertise in data engineering platforms such as Databricks, Snowflake, and dbt, including their underlying functions.
Strong SQL skills and experience with relational databases (e.g., PostgreSQL).
Experience with cloud platforms (AWS preferred) and infrastructure-as-code tools (e.g., Terraform, CloudFormation).
Familiarity with containerization and orchestration tools like Docker and Kubernetes.
Knowledge of data governance frameworks and compliance with pharmaceutical industry regulations.
Excellent problem-solving skills with a focus on practical solutions.
Enthusiasm for continuous learning and professional growth, with a passion for exploring new technologies, frameworks, and software development methodologies.
Willingness to embrace rapid prototyping with an emphasis on user feedback.
Autonomy and excitement about taking ownership of major initiatives.
Strong communication skills, capable of conveying complex technical concepts to both technical and non-technical stakeholders.
Strong collaboration skills, with a demonstrated ability to work effectively in cross-functional teams.