BioNLP and Knowledge Discovery, Course Note

— A Course for Graduates and Junior/Senior Undergraduates


Discover the Syllabus


Contents

1  Preface

2  Introduction to BioNLP and This Course

2.1  Fundamentals in Linguistics, NLP and BioNLP

2.2  What Makes BioNLP Unique?

2.3  Main Research Issues

2.4  Course Contents for 2020 Spring

3  First Class of Linux and Lexical Analysis

3.1  Corpus and TTR

3.2  Commands in Linux

3.3  Case Study: Comparison of BROWN Corpus and PubMed

3.4  Additional Metric for Text Complexity Evaluation

3.5  Assignment This Week

4  R Programming and Word Cloud

4.1  R and RStudio

4.2  Case Study: Word Cloud Plotting

4.3  Assignment This Week (TTR and Word Cloud Plotting of GENIA and AGAC)

5  Gene Ontology (GO Enrichment and R Implementation)

5.1  Ontology

5.2  Case Study: GO Enrichment

5.3  Assignment This Week (GO Enrichment Analysis by Using R Packages)

6  Human Phenotype Ontology (Enrichment Theory and HPO Enrichment)

6.1  Theory of Enrichment Analysis

6.2  Human Phenotype Ontology

6.3  HPO-Shuffle, A Case Study on HPO Application

6.4  Assignment This Week (HPO Enrichment)

6.5  A Case Study of HPO Enrichment

7  Semantic Annotation with Plant Trait Ontology

7.1 Motivation for PTO Mapping

7.2 Plant Trait Ontology

7.3 PTO Research on Plant Breeding

7.4 Assignment This Week (PTO Mapping)

8  PubMed Terms NER and Shell Programming

8.1 PubMed

8.2 PubTator

8.3 Assignment This Week (Gene and Mutation Extraction for Genes of Interest in Plant-related Abstracts)

8.4 A Case Study of Bio Concept Network

9 Advanced NLP Topic in Dependency Tree and Shortest Dependency Path

9.1 Grammatical Relation of Words

9.2 Spyder, a User-friendly Interface for Python

9.3 spaCy for Dependency Tree & NetworkX for SDP

9.4 Assignment This Week (SDP Distribution of PTO and Gene Terms)

10 Advanced NLP Topic in Latent Semantic Analysis, from SVD to LSA

10.1 Introduction to SVD

10.2 Proof of SVD

10.3 Conclusion of SVD

10.4 Application to Latent Semantic Analysis

11 A Customized Biomedical Corpus on Mutations, AGAC

11.1 What For?

11.2 AGAC Corpus

11.3 AGAC Track in BioNLP OST 2019

12 Advanced NLP Topic in Sequence Labeling, from HMM to CRF

12.1 Road Map and External Resources for Students

12.2 Graphical Model

12.3 Naïve Bayes, HMM and ME

12.4 Viterbi on HMM and CRF

12.5 A Case Study of CRF on AGAC

13 Advanced NLP Topic in Topic Modeling, from Variational Inference to Gibbs Sampling

13.1 Latent Dirichlet Allocation

13.2 VI on LDA

13.3 Gibbs Sampling on LDA

13.4 Appendix to Gamma/Beta and Dirichlet/Multinomial

14 Modern NLP Topic in Word Embedding, from Count-based to Prediction-based

14.1 Count-based Vector Space Word Representation

14.2 Prediction-based Word Embedding

14.3 Case Study: Word2Vec Term Embedding with PyTorch

14.4 Assignment This Week: Word Embedding of PTO Abstracts

14.5 Appendix: Basic Introduction to ANN

15 Modern NLP Topic in Graph Embedding and Knowledge Graph, about Their Biomedical Application

15.1 Graph Embedding and Knowledge Graph

15.2 Translational Distance Models

15.3 Semantic Matching Models

15.4 Neural Network on Graphs

15.5 Application of Graph Embedding to Biomedical Entities

15.6 Recent Progress in NLP and Its Application in the Bio Field

16 Acknowledgement


Sample Chapters

Chapter 6
Chapter 9
Chapter 14

Request the Course Note.


The course note (latest version: April 2021) is available on request; please fill in the form below. In addition, if you have any problems with the course, feel free to fill in the form and send it to me.


Course for BioNLP

Cross disciplines, integrate what you learn, and master BioNLP.

Course Hours

See jw.hzau.edu.cn

Office

C610, Yifu bldg

Contact me

xiajingbo.math@gmail.com
