- Nov 20, 2018
-
-
Jurica Seva authored
Training a VAE on PubTator text for transfer learning and fine-tuning on the corpora at hand; the VAE serves as input to the ranking function.
To Do Now:
1) start writing the paper as each module is finished!
2) add a corpora overview (i.e. unique documents, cancer types, 0/1 ratio for the about_cancer and clinical tasks)
3) describe the preparation (link to the BioNLP paper)
4) describe the training process for a) the DL models (FastText embeddings, 75th-percentile sentence length, masking) with 5-fold CV and b) the sklearn models (custom tokenizer, tf-idf, no chi2, RandomizedSearchCV with 1- folds, char and word n-grams, hyperparameter optimization over 10 combinations)
To Do Future:
1) reindex VIST with the new models (after selecting the best one)
2) extend the ranking-function evaluation metrics with MAP and MAR
3) try Logistic Regression and a (V)AE as the ranking function
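The 75th-percentile sentence-length cutoff with masking mentioned above can be sketched as a preprocessing step (a minimal sketch, assuming integer-encoded token sequences; the function name and pad value are hypothetical — a downstream masking layer would skip the pad value):

```python
import numpy as np

def pad_to_percentile(token_seqs, percentile=75, pad_value=0):
    """Truncate/pad integer token sequences to the given length percentile.

    Sequences longer than the cutoff are truncated; shorter ones are
    right-padded with `pad_value`, which a masking layer can then ignore.
    """
    lengths = [len(s) for s in token_seqs]
    cutoff = int(np.percentile(lengths, percentile))
    out = np.full((len(token_seqs), cutoff), pad_value, dtype=np.int64)
    for i, seq in enumerate(token_seqs):
        trimmed = seq[:cutoff]
        out[i, :len(trimmed)] = trimmed
    return out, cutoff
```

Computing the cutoff once on the training corpus and reusing it at inference time keeps train and test inputs shaped consistently.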
-
- Nov 19, 2018
-
-
Jurica Seva authored
Developed a VAE for document retrieval.
To Do Now:
1) start writing the paper as each module is finished!
2) add a corpora overview (i.e. unique documents, cancer types, 0/1 ratio for the about_cancer and clinical tasks)
3) describe the preparation (link to the BioNLP paper)
4) describe the training process for a) the DL models (FastText embeddings, 75th-percentile sentence length, masking) with 5-fold CV and b) the sklearn models (custom tokenizer, tf-idf, no chi2, RandomizedSearchCV with 1- folds, char and word n-grams, hyperparameter optimization over 10 combinations)
To Do Future:
1) reindex VIST with the new models (after selecting the best one)
2) extend the ranking-function evaluation metrics with MAP and MAR
3) try Logistic Regression and a (V)AE as the ranking function
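The MAP metric listed under "To Do Future" averages, per query, the precision at each rank where a relevant document appears. A minimal sketch over binary relevance flags (names hypothetical; MAR would analogously average per-query recall instead of precision):

```python
def average_precision(ranked_relevance):
    """AP for one query: `ranked_relevance` is a list of 0/1 relevance
    flags in the order the ranking function returned the documents."""
    hits, score = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            score += hits / rank  # precision at this relevant rank
    return score / hits if hits else 0.0

def mean_average_precision(all_queries):
    """MAP over a list of per-query relevance lists."""
    return sum(average_precision(q) for q in all_queries) / len(all_queries)
```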
-
- Nov 07, 2018
-
-
Jurica Seva authored
Developed a VAE for document retrieval.
To Do Now:
1) start writing the paper as each module is finished!
2) add a corpora overview (i.e. unique documents, cancer types, 0/1 ratio for the about_cancer and clinical tasks)
3) describe the preparation (link to the BioNLP paper)
4) describe the training process for a) the DL models (FastText embeddings, 75th-percentile sentence length, masking) with 5-fold CV and b) the sklearn models (custom tokenizer, tf-idf, no chi2, RandomizedSearchCV with 1- folds, char and word n-grams, hyperparameter optimization over 10 combinations)
To Do Future:
1) reindex VIST with the new models (after selecting the best one)
2) extend the ranking-function evaluation metrics with MAP and MAR
3) try Logistic Regression and a (V)AE as the ranking function
-
- Oct 31, 2018
-
-
Jurica Seva authored
To Do Now:
1) start writing the paper as each module is finished!
2) add a corpora overview (i.e. unique documents, cancer types, 0/1 ratio for the about_cancer and clinical tasks)
3) describe the preparation (link to the BioNLP paper)
4) describe the training process for a) the DL models (FastText embeddings, 75th-percentile sentence length, masking) with 5-fold CV and b) the sklearn models (custom tokenizer, tf-idf, no chi2, RandomizedSearchCV with 1- folds, char and word n-grams, hyperparameter optimization over 10 combinations)
To Do Future:
1) reindex VIST with the new models (after selecting the best one)
2) extend the ranking-function evaluation metrics with MAP and MAR
3) try Logistic Regression and a (V)AE as the ranking function
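The sklearn setup described in 4b) — tf-idf over char and word n-grams with randomized hyperparameter search — can be sketched as follows (a minimal sketch: the classifier choice, parameter ranges, and fold count are illustrative assumptions, since the log does not pin them down):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline

# Word- and character-level tf-idf features side by side.
features = FeatureUnion([
    ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
])
pipe = Pipeline([("tfidf", features), ("clf", LogisticRegression(max_iter=1000))])

# Hypothetical search space; n_iter=10 mirrors the "10 combinations" above.
param_distributions = {
    "tfidf__word__max_features": [5000, 10000, None],
    "clf__C": [0.01, 0.1, 1.0, 10.0],
}
search = RandomizedSearchCV(pipe, param_distributions, n_iter=10, cv=5,
                            scoring="f1", random_state=0)
```

Calling `search.fit(texts, labels)` then runs the cross-validated randomized search; `search.best_estimator_` is the refit pipeline.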
-
- Oct 23, 2018
-
-
Jurica Seva authored
To Do:
1) each corpus separately (excluding Onco+/- and PubMed, which are used as clinically not relevant)
2) concatenate all datasets
3) leave-one-out (ablation)
Models: 1. HATT 2. MTL 3. Linear SVM 4. Random Forest
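The leave-one-out ablation in 3) can be sketched as a generator over (held-out corpus, train pool, test pool) triples — a minimal sketch, assuming the caller has already dropped Onco+/- and PubMed from the mapping:

```python
def leave_one_out_splits(corpora):
    """Yield (held_out_name, train_docs, test_docs) for corpus-level ablation.

    `corpora` maps corpus name -> list of documents; each round trains on
    the concatenation of all other corpora and tests on the held-out one.
    """
    for name, docs in corpora.items():
        train = [d for other, ds in corpora.items() if other != name for d in ds]
        yield name, train, docs
```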
-
- Oct 08, 2018
-
-
Jurica Seva authored
To Do:
1) each corpus separately (excluding Onco+/- and PubMed, which are used as clinically not relevant)
2) concatenate all datasets
3) leave-one-out (ablation)
Models: 1. HATT 2. MTL 3. Linear SVM 4. Random Forest
-
- Oct 01, 2018
-
-
Jurica Seva authored
To Do: evaluate on GS data, separately for each model.
-
- Sep 26, 2018
-
-
Jurica Seva authored
Not stratified, but no class in the test set is unseen during training. Next: tokenize, convert to sequences, labelize, save all models, and store the data as such. Implement the MTL models. Evaluate.
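A non-stratified split that still guarantees no unseen classes in the test set can be obtained by anchoring one example of every label in the training set before splitting the rest at random — a minimal sketch (function name, test fraction, and seed are hypothetical):

```python
import random
from collections import defaultdict

def split_no_unseen(labels, test_frac=0.2, seed=0):
    """Random (non-stratified) index split that keeps at least one example
    of every label in the training set, so the test set never contains a
    class the model has not seen."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    train, pool = [], []
    for idxs in by_label.values():
        rng.shuffle(idxs)
        train.append(idxs[0])   # anchor one example per class in train
        pool.extend(idxs[1:])
    rng.shuffle(pool)
    n_test = int(len(labels) * test_frac)
    test = pool[:n_test]
    train.extend(pool[n_test:])
    return sorted(train), sorted(test)
```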
-
- Sep 25, 2018
-
-
Jurica Seva authored
Rewrote data preparation/loading to support both the HATT and the MTL schema. Added recent_pubmed.p for NotCancer examples. Next: create the labeling schemata for MTL. Input: training doc [embedded]. Output: { [0,1], [0,1], [0,..,n] }
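The Output: { [0,1], [0,1], [0,..,n] } schema reads as two binary heads (about_cancer, clinical) plus a categorical head over n cancer types. A minimal sketch of such a label builder, with one-hot encoding for the categorical head (names and vocabulary handling are hypothetical):

```python
def mtl_labels(about_cancer, clinical, cancer_type, type_vocab):
    """Build the three MTL targets for one document:
    two binary heads and a one-hot cancer-type head over `type_vocab`."""
    one_hot = [0] * len(type_vocab)
    if cancer_type in type_vocab:
        one_hot[type_vocab.index(cancer_type)] = 1
    return [int(about_cancer), int(clinical), one_hot]
```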
-
- Sep 17, 2018
-
-
Jurica Seva authored
Hierarchical text classification works. Minor issue with tokenization (performed twice, which is unnecessary). To Do: implement P/R/F1 scores on the validation data to compare with the stat-ML performance.
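The planned P/R/F1 comparison against the stat-ML baselines can be sketched with scikit-learn (a minimal sketch; the macro averaging and function name are assumptions):

```python
from sklearn.metrics import precision_recall_fscore_support

def prf_report(y_true, y_pred):
    """Macro-averaged precision/recall/F1 on validation predictions,
    comparable across the DL and stat-ML models."""
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    return {"precision": p, "recall": r, "f1": f1}
```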
-
- Sep 16, 2018
-
-
Jurica Seva authored
-