A Course for Graduates and Junior/Senior Undergraduates
Applying NLP Ideas to Biomedical Texts
There is no spring that will not come.
Welcome to the course “BioNLP and Knowledge Discovery”
Discover the Syllabus
Contents
1 Preface
2 Introduction to BioNLP and This Course
2.1 Fundamentals of Linguistics, NLP and BioNLP
2.2 What Makes BioNLP Unique?
2.3 Main Research Issues
2.4 Course Contents for Spring 2020
3 First Class: Linux and Lexical Analysis
3.1 Corpus and TTR (Type-Token Ratio)
3.2 Commands in Linux
3.3 Case Study: Comparison of the Brown Corpus and PubMed
3.4 Additional Metrics for Text Complexity Evaluation
3.5 Assignment This Week
4 R Programming and Word Clouds
4.1 R and RStudio
4.2 Case Study: Word Cloud Plotting
4.3 Assignment This Week (TTR and Word Cloud Plotting of GENIA and AGAC)
5 Gene Ontology (GO Enrichment and R Implementation)
5.1 Ontology
5.2 Case Study: GO Enrichment
5.3 Assignment This Week (GO Enrichment Analysis Using R Packages)
6 Human Phenotype Ontology (Enrichment Theory and HPO Enrichment)
6.1 Theory of Enrichment Analysis
6.2 Human Phenotype Ontology
6.3 HPO-Shuffle: A Case Study of HPO Application
6.4 Assignment This Week (HPO Enrichment)
6.5 A Case Study of HPO Enrichment
7 Semantic Annotation with the Plant Trait Ontology
7.1 Motivation for PTO Mapping
7.2 Plant Trait Ontology
7.3 PTO Research on Plant Breeding
7.4 Assignment This Week (PTO Mapping)
8 PubMed Term NER and Shell Programming
8.1 PubMed
8.2 PubTator
8.3 Assignment This Week (Gene and Mutation Extraction for Genes of Interest in Plant-related Abstracts)
8.4 A Case Study of a Bio-Concept Network
9 Advanced NLP Topic: Dependency Trees and Shortest Dependency Paths
9.1 Grammatical Relations between Words
9.2 Spyder, a User-friendly Interface for Python
9.3 spaCy for Dependency Trees and NetworkX for SDP
9.4 Assignment This Week (SDP Distribution of PTO and Gene Terms)
10 Advanced NLP Topic: Latent Semantic Analysis, from SVD to LSA
10.1 Introduction to SVD
10.2 Proof of SVD
10.3 Summary of SVD
10.4 Application to Latent Semantic Analysis
11 A Customized Biomedical Corpus on Mutations: AGAC
11.1 What Is It For?
11.2 AGAC Corpus
11.3 AGAC Track in BioNLP OST 2019
12 Advanced NLP Topic: Sequence Labeling, from HMM to CRF
12.1 Road Map and External Resources for Students
12.2 Graphical Models
12.3 Naïve Bayes, HMM and ME
12.4 Viterbi on HMM and CRF
12.5 A Case Study of CRF on AGAC
13 Advanced NLP Topic: Topic Modeling, from Variational Inference to Gibbs Sampling
13.1 Latent Dirichlet Allocation
13.2 VI on LDA
13.3 Gibbs Sampling on LDA
13.4 Appendix: Gamma/Beta and Dirichlet/Multinomial
14 Modern NLP Topic: Word Embedding, from Count-based to Prediction-based
14.1 Count-based Vector Space Word Representation
14.2 Prediction-based Word Embedding
14.3 Case Study: Word2Vec Term Embedding with PyTorch
14.4 Assignment This Week: Word Embedding of PTO Abstracts
14.5 Appendix: Basic Introduction to ANNs
15 Modern NLP Topic: Graph Embedding and Knowledge Graphs, and Their Biomedical Applications
15.1 Graph Embedding and Knowledge Graphs
15.2 Translational Distance Models
15.3 Semantic Matching Models
15.4 Neural Networks on Graphs
15.5 Application of Graph Embedding to Biomedical Entities
15.6 Recent Progress in NLP and Its Applications in the Bio Field
16 Acknowledgement
Sample Chapters
Chapter 6
Chapter 9
Chapter 14
Request the Course Notes
The course notes are available on request; please fill in the form below. If you have any other questions about the course, please also fill in the form and send it to me.
Course for BioNLP
Cross disciplinary boundaries, integrate what you learn, and master BioNLP.
Course Hours
See jw.hzau.edu.cn
Office
C610, Yifu bldg
Contact me
xiajingbo.math@gmail.com