Design and build scalable data pipelines to train and run knowledge graph embedding (KGE) models on drug discovery knowledge graphs.
Conduct comprehensive evaluations of KGE models, including ULTRA, ComplEx, DistMult, RotatE, TransE, and TransH, to assess their performance specifically in biomedical applications.
Analyze model performance across various factors, such as hyperparameters, model initialization, dataset splits, and different knowledge graphs, including Disqover, PrimeKG, Hetionet, and BioKG.
Design and run experiments to understand how different training setups and configurations affect model effectiveness.
Implement hyperparameter optimization techniques to boost model accuracy and generalizability, using tools such as Optuna and PyKEEN.
Collaborate with cross-functional teams to ensure that KGEs align with real-world drug discovery applications and adhere to fair evaluation and reproducibility standards.
Document and communicate findings and recommendations to improve KGE model evaluation practices and contribute to the broader knowledge base in biomedical AI.
Qualifications
Proven experience in building and managing data pipelines and handling large datasets.
Strong programming skills in Python, with proficiency in PyTorch and libraries like PyG and PyKEEN.
Experience with knowledge graphs, machine learning, and graph embedding models, including their applications.
Familiarity with biomedical knowledge graphs, such as Disqover, PrimeKG, Hetionet, or BioKG.
Demonstrated expertise in hyperparameter tuning and optimization, including the use of Bayesian optimization techniques.
Excellent analytical skills and the ability to communicate complex ideas effectively to both technical and non-technical stakeholders.