Sagentia Project Proposals

Predicting the cell types of single-cell RNA-Seq samples
Single-cell RNA-Seq makes it possible to characterize the transcriptomes of cell types and identify their transcriptional signatures via differential analysis. The challenge is to create a machine learning program/pipeline that identifies cell types from a dataset of single-cell RNA-Seq samples. You can use the method described here:
https://www.biorxiv.org/content/biorxiv/early/2018/02/14/258566.full.pdf
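
As a starting point, the classification task can be sketched with a very simple baseline. The example below is not the method from the linked preprint; it is a nearest-centroid classifier, swapped in purely for illustration. The normalisation (scaling each cell to a fixed total, then a log transform), the function names and the toy counts are all assumptions made for this sketch.

```python
import math

def log_normalise(counts, scale=10_000):
    """Scale a cell's raw counts to a fixed total, then apply log1p."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [math.log1p(c * scale / total) for c in counts]

def fit_centroids(cells, labels):
    """Average the normalised profiles of the training cells of each type."""
    sums, sizes = {}, {}
    for counts, label in zip(cells, labels):
        profile = log_normalise(counts)
        acc = sums.setdefault(label, [0.0] * len(profile))
        for i, value in enumerate(profile):
            acc[i] += value
        sizes[label] = sizes.get(label, 0) + 1
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, counts):
    """Return the label whose centroid is closest in Euclidean distance."""
    profile = log_normalise(counts)
    return min(centroids, key=lambda label: math.dist(profile, centroids[label]))

# Toy data invented for illustration: 3 genes, two hypothetical cell types.
train_cells = [[90, 5, 5], [80, 10, 10], [5, 90, 5], [10, 85, 5]]
train_labels = ["type_A", "type_A", "type_B", "type_B"]

centroids = fit_centroids(train_cells, train_labels)
prediction = predict(centroids, [70, 20, 10])
print(prediction)
```

A baseline like this is mainly useful as a reference point: any more elaborate pipeline should at least outperform it on held-out cells.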

 

Predicting gene expression from histone modification signals
Post-translational modifications to histone proteins can impact gene expression in different ways. The challenge is to use machine learning to predict the level of gene expression from histone modification signals. Read the detailed description of the problem and data here:
https://www.kaggle.com/c/gene-expression-prediction

 

Build a predictive model that differentiates between true and false donor sites
In human DNA, most introns start with the dinucleotide GT, called the donor site of the intron. However, a gene contains many more GT dinucleotides that are not donor sites. The challenge is to build a machine learning program/pipeline that can tell apart true and false donor sites. Read the detailed description of the problem and data here:
https://www.kaggle.com/c/human-gene-donor-site-prediction
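
To make the problem concrete, the sketch below scans a sequence for every GT dinucleotide, cuts out a fixed window around each one, and one-hot encodes it so that it could be fed to a classifier. The window sizes (3 bases upstream, 6 downstream), the function names and the toy sequence are assumptions for illustration only; the competition data may be supplied in a different form.

```python
BASES = "ACGT"

def candidate_donor_windows(sequence, upstream=3, downstream=6):
    """Yield (position, window) for every GT with enough flanking sequence.

    The window covers `upstream` bases before the GT, the GT itself,
    and `downstream` bases after it.
    """
    sequence = sequence.upper()
    start = sequence.find("GT")
    while start != -1:
        left, right = start - upstream, start + 2 + downstream
        if left >= 0 and right <= len(sequence):
            yield start, sequence[left:right]
        start = sequence.find("GT", start + 1)

def one_hot(window):
    """Encode a window as a flat 0/1 vector, four entries per base (A, C, G, T).

    Any other character (for example N) is encoded as four zeros.
    """
    return [1 if base == b else 0 for base in window for b in BASES]

# Toy sequence invented for illustration.
toy = "CAGGTAAGTATCCGTTT"
candidates = list(candidate_donor_windows(toy))
features = [one_hot(window) for _, window in candidates]

for position, window in candidates:
    print(position, window)
```

Each feature vector would then be paired with a true/false label from the training data. The third GT in the toy sequence is skipped because it lies too close to the end for a full window.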

 

Signatures of mutational processes in human cancer data
Somatic mutations are present in all cells of the human body and occur throughout life. Different mutational processes generate unique combinations of mutation types, termed Mutational Signatures. The challenge is to use machine learning to analyse cancer datasets and identify specific signatures. You can read more here:
https://cancer.sanger.ac.uk/cosmic/signatures
https://www.nature.com/articles/nature12477
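
A common first step in this kind of analysis is to summarise each sample as counts over 96 mutation types: six substitution classes, each in the context of its 5' and 3' neighbouring bases. The sketch below builds such a catalogue from a list of (trinucleotide context, mutated base) pairs. The input format, function names and toy mutations are assumptions made for illustration; real data would first need the context of each mutation looked up in a reference genome.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# By convention, substitutions are reported with a pyrimidine (C or T) as the reference base.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# All 96 channels, written like "A[C>T]G": 5' base, substitution, 3' base.
CHANNELS = [
    f"{five}[{sub}]{three}"
    for sub in SUBSTITUTIONS
    for five, three in product(BASES, repeat=2)
]

def channel(context, alt):
    """Map a trinucleotide context and the mutated base to one of the 96 channels."""
    context, alt = context.upper(), alt.upper()
    if context[1] in "AG":
        # Purine reference: switch to the reverse-complement strand.
        context = context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"

def mutation_catalogue(mutations):
    """Count mutations per channel, returned in the fixed order of CHANNELS."""
    counts = Counter(channel(ctx, alt) for ctx, alt in mutations)
    return [counts[name] for name in CHANNELS]

# Toy mutations invented for illustration: (reference trinucleotide, mutated middle base).
toy_mutations = [("ACG", "T"), ("CGT", "A"), ("TTA", "C")]
catalogue = mutation_catalogue(toy_mutations)
print(dict(zip(CHANNELS, catalogue))["A[C>T]G"])
```

Stacking one such 96-element vector per sample gives a count matrix that a decomposition method, for example non-negative matrix factorisation, could then split into signatures and their per-sample contributions.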

 

For more details on any of these projects, please contact: steven.wingett@babraham.ac.uk
 
