The next AIRG is at 4PM on 4/24 in CS 3310.
I have been collaborating with a lot of biologists for several years, and they often ask questions like "What genes are involved with cardiomyocytes?" or "What genes are involved with WNT signaling?" or maybe "What drugs might be useful for a particular disease?" So I would go off to Google Scholar and comb through the literature to try to find an association between some gene or drug and some biological process or key phrase (such as "cardiomyocyte"). I got sick of doing that. While there is extensive literature on various aspects of NLP, I mostly ignored it to start, and so we built a simple method based on co-occurrence to rank genes or drugs by how likely they are to be involved or associated with a biological process or key phrase.
The first paper is our KinderMiner paper (Finn Kuusisto is the first author):
Published online 2017 Jul 26.
A Simple Text Mining Approach for Ranking Pairwise Associations in Biomedical Applications
KinderMiner provides a quick way to get an idea of what the roughly 30 million articles in PubMed say about the biological process or key phrase you are interested in.
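
To make the co-occurrence idea concrete, here is a minimal sketch in Python. It assumes the ranking works by counting articles that mention a term, the key phrase, or both, filtering with a one-sided Fisher's exact test, and then sorting by the fraction of a term's articles that also mention the key phrase. The function names, the threshold `alpha`, the gene names, and all counts are made up for illustration; this is not the KinderMiner code itself.

```python
from math import comb


def fisher_right_tail(both, term_only, phrase_only, neither):
    """One-sided Fisher's exact p-value: chance of seeing at least `both`
    co-occurring articles if the term and key phrase were independent."""
    n_term = both + term_only
    n_phrase = both + phrase_only
    total = both + term_only + phrase_only + neither
    upper = min(n_term, n_phrase)
    # Exact integer arithmetic; fine for toy counts, but a statistics
    # library would be a better choice at PubMed scale.
    tail = sum(
        comb(n_term, k) * comb(total - n_term, n_phrase - k)
        for k in range(both, upper + 1)
    )
    return tail / comb(total, n_phrase)


def rank_terms(term_counts, phrase_count, total, alpha=0.05):
    """term_counts maps term -> (articles with term, articles with term and phrase).
    Keeps terms that pass the significance filter, ranked by co-occurrence ratio."""
    ranked = []
    for term, (n_term, n_both) in term_counts.items():
        p = fisher_right_tail(
            n_both,
            n_term - n_both,
            phrase_count - n_both,
            total - n_term - phrase_count + n_both,
        )
        if p <= alpha:
            ranked.append((term, n_both / n_term, p))
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked


# Hypothetical toy corpus: 1000 articles, 50 of which mention the key phrase.
toy_counts = {
    "GENE_A": (40, 20),
    "GENE_B": (200, 10),
    "GENE_C": (30, 1),
    "GENE_D": (20, 15),
}
ranking = rank_terms(toy_counts, phrase_count=50, total=1000)
for term, ratio, p in ranking:
    print(f"{term}\tratio={ratio:.2f}\tp={p:.2e}")
```

In this toy example GENE_B co-occurs with the key phrase about as often as chance would predict, so the filter drops it even though its raw co-occurrence count is not small.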
------------------------------------------------------------------------------------------------------------------------
After going through KinderMiner (which is not math heavy and certainly not very AI-ish), I want to talk about Serial KinderMining (SKiM). SKiM is designed to look for associations across distinct and separated literature domains. (Example: some drug or compound might be reported in the nutrition literature to have some effect on a symptom. This symptom might be mentioned in some medical journals as being associated with a disease. However, you might never see a paper that talks about the drug/compound and the disease together.) This falls under the larger scope of Literature-Based Discovery (LBD).
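
The compound, symptom, disease example can be sketched as a two-step chain in Python. This is only an illustration of the idea, not the SKiM implementation: it uses raw co-occurrence ratios rather than any significance filter, and the function names, term names, and article IDs are all hypothetical.

```python
def cooccurrence_ratio(articles, x, y):
    """Fraction of the articles mentioning x that also mention y."""
    if not articles[x]:
        return 0.0
    return len(articles[x] & articles[y]) / len(articles[x])


def serial_sketch(articles, a_term, b_terms, c_terms, top_b=2):
    """Step 1: keep the B terms most associated with A.
    Step 2: find C terms associated with any kept B term.
    Returns (c_term, linking_b_term, score, direct A-C article count)."""
    b_scores = sorted(
        ((cooccurrence_ratio(articles, b, a_term), b) for b in b_terms),
        reverse=True,
    )
    kept_b = [b for score, b in b_scores[:top_b] if score > 0]

    hits = []
    for c in c_terms:
        best_score, best_b = max(
            ((cooccurrence_ratio(articles, c, b), b) for b in kept_b),
            default=(0.0, None),
        )
        if best_score > 0:
            direct = len(articles[a_term] & articles[c])
            hits.append((c, best_b, best_score, direct))
    hits.sort(key=lambda h: h[2], reverse=True)
    return hits


# Hypothetical toy data: term -> set of article IDs that mention it.
toy_articles = {
    "compound_x": {1, 2, 3},
    "symptom_y": {2, 3, 4, 5},
    "symptom_z": {9},
    "disease_q": {4, 5, 6},
    "disease_r": {7, 8},
}
hits = serial_sketch(
    toy_articles,
    a_term="compound_x",
    b_terms=["symptom_y", "symptom_z"],
    c_terms=["disease_q", "disease_r"],
)
for c, b, score, direct in hits:
    print(f"compound_x -> {b} -> {c} (score={score:.2f}, direct articles={direct})")
```

Here compound_x and disease_q never appear in the same article, but both co-occur with symptom_y, which is the kind of indirect link the example describes.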
A nice review that covers some of the classic examples of LBD is:
PMCID: PMC5771422
NIHMSID: NIHMS932897
Rediscovering Don Swanson: the Past, Present and Future of Literature-Based Discovery
Again, none of this is very AI-ish. I'll talk a little about SKiM and how we are using it for LBD in drug repurposing.
------------------------------------------------------------------------------------------------------------------------
I'm interested in word embeddings as a potential complement to KinderMining-like approaches.
If there is any time left after that, then I would probably talk about word embeddings and some recent work on context-specific word embeddings such as BERT:
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Ron
Ron Stewart, Ph.D
Associate Director-Bioinformatics
Regenerative Biology Laboratory
Morgridge Institute for Research
608 316-4349