- Nov 20, 2018
-
-
Jurica Seva authored
Training a VAE on PubTator text for transfer learning and fine-tuning on the corpora at hand; the VAE serves as input to the ranking function.
To Do Now:
1) start writing the paper as each module is finished!
2) add a corpora overview (i.e. unique documents, cancer types, 0/1 ratio for the about_cancer and clinical tasks)
3) describe the preparation (link to the BioNLP paper)
4) describe the training process for a) the DL models (FastText embeddings, 75th-percentile sentence length, masking) with 5-fold CV and b) the sklearn models (custom tokenizer, tf-idf, no chi2, RandomizedSearchCV with 1- folds, char and word n-grams, hyperparameter optimization over 10 combinations)
To Do Future:
1) reindex VIST with the new models (after selecting the best one)
2) extend the ranking-function evaluation metrics with MAP and MAR
3) try Logistic Regression and a (V)AE as the ranking function
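The 75th-percentile sentence-length cutoff with masking mentioned above can be sketched as a preprocessing step (a minimal sketch, assuming integer-encoded token sequences; the function name and pad value are hypothetical — a downstream masking layer would skip the pad value):

```python
import numpy as np

def pad_to_percentile(token_seqs, percentile=75, pad_value=0):
    """Truncate/pad integer token sequences to the given length percentile.

    Sequences longer than the cutoff are truncated; shorter ones are
    right-padded with `pad_value`, which a masking layer can then ignore.
    """
    lengths = [len(s) for s in token_seqs]
    cutoff = int(np.percentile(lengths, percentile))
    out = np.full((len(token_seqs), cutoff), pad_value, dtype=np.int64)
    for i, seq in enumerate(token_seqs):
        trimmed = seq[:cutoff]
        out[i, :len(trimmed)] = trimmed
    return out, cutoff
```

Computing the cutoff once on the training corpus and reusing it at inference time keeps train and test inputs shaped consistently.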
-
- Nov 19, 2018
-
-
Jurica Seva authored
Developed a VAE for document retrieval.
To Do Now:
1) start writing the paper as each module is finished!
2) add a corpora overview (i.e. unique documents, cancer types, 0/1 ratio for the about_cancer and clinical tasks)
3) describe the preparation (link to the BioNLP paper)
4) describe the training process for a) the DL models (FastText embeddings, 75th-percentile sentence length, masking) with 5-fold CV and b) the sklearn models (custom tokenizer, tf-idf, no chi2, RandomizedSearchCV with 1- folds, char and word n-grams, hyperparameter optimization over 10 combinations)
To Do Future:
1) reindex VIST with the new models (after selecting the best one)
2) extend the ranking-function evaluation metrics with MAP and MAR
3) try Logistic Regression and a (V)AE as the ranking function
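The MAP metric listed under "To Do Future" averages, per query, the precision at each rank where a relevant document appears. A minimal sketch over binary relevance flags (names hypothetical; MAR would analogously average per-query recall instead of precision):

```python
def average_precision(ranked_relevance):
    """AP for one query: `ranked_relevance` is a list of 0/1 relevance
    flags in the order the ranking function returned the documents."""
    hits, score = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            score += hits / rank  # precision at this relevant rank
    return score / hits if hits else 0.0

def mean_average_precision(all_queries):
    """MAP over a list of per-query relevance lists."""
    return sum(average_precision(q) for q in all_queries) / len(all_queries)
```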
-
- Nov 07, 2018
-
-
Jurica Seva authored
Developed a VAE for document retrieval.
To Do Now:
1) start writing the paper as each module is finished!
2) add a corpora overview (i.e. unique documents, cancer types, 0/1 ratio for the about_cancer and clinical tasks)
3) describe the preparation (link to the BioNLP paper)
4) describe the training process for a) the DL models (FastText embeddings, 75th-percentile sentence length, masking) with 5-fold CV and b) the sklearn models (custom tokenizer, tf-idf, no chi2, RandomizedSearchCV with 1- folds, char and word n-grams, hyperparameter optimization over 10 combinations)
To Do Future:
1) reindex VIST with the new models (after selecting the best one)
2) extend the ranking-function evaluation metrics with MAP and MAR
3) try Logistic Regression and a (V)AE as the ranking function
-
- Oct 31, 2018
-
-
Jurica Seva authored
To Do Now:
1) start writing the paper as each module is finished!
2) add a corpora overview (i.e. unique documents, cancer types, 0/1 ratio for the about_cancer and clinical tasks)
3) describe the preparation (link to the BioNLP paper)
4) describe the training process for a) the DL models (FastText embeddings, 75th-percentile sentence length, masking) with 5-fold CV and b) the sklearn models (custom tokenizer, tf-idf, no chi2, RandomizedSearchCV with 1- folds, char and word n-grams, hyperparameter optimization over 10 combinations)
To Do Future:
1) reindex VIST with the new models (after selecting the best one)
2) extend the ranking-function evaluation metrics with MAP and MAR
3) try Logistic Regression and a (V)AE as the ranking function
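The sklearn setup described in 4b) — tf-idf over char and word n-grams with randomized hyperparameter search — can be sketched as follows (a minimal sketch: the classifier choice, parameter ranges, and fold count are illustrative assumptions, since the log does not pin them down):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline

# Word- and character-level tf-idf features side by side.
features = FeatureUnion([
    ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
])
pipe = Pipeline([("tfidf", features), ("clf", LogisticRegression(max_iter=1000))])

# Hypothetical search space; n_iter=10 mirrors the "10 combinations" above.
param_distributions = {
    "tfidf__word__max_features": [5000, 10000, None],
    "clf__C": [0.01, 0.1, 1.0, 10.0],
}
search = RandomizedSearchCV(pipe, param_distributions, n_iter=10, cv=5,
                            scoring="f1", random_state=0)
```

Calling `search.fit(texts, labels)` then runs the cross-validated randomized search; `search.best_estimator_` is the refit pipeline.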
-
- Oct 23, 2018
-
-
Jurica Seva authored
To Do:
1) each corpus separately (excluding Onco+/- and PubMed, which are used as clinically not relevant)
2) concatenate all datasets
3) leave-one-out (ablation)
Models: 1. HATT 2. MTL 3. Linear SVM 4. Random Forest
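The leave-one-out ablation in 3) can be sketched as a generator over (held-out corpus, train pool, test pool) triples — a minimal sketch, assuming the caller has already dropped Onco+/- and PubMed from the mapping:

```python
def leave_one_out_splits(corpora):
    """Yield (held_out_name, train_docs, test_docs) for corpus-level ablation.

    `corpora` maps corpus name -> list of documents; each round trains on
    the concatenation of all other corpora and tests on the held-out one.
    """
    for name, docs in corpora.items():
        train = [d for other, ds in corpora.items() if other != name for d in ds]
        yield name, train, docs
```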
-
- Oct 08, 2018
-
-
Jurica Seva authored
To Do:
1) each corpus separately (excluding Onco+/- and PubMed, which are used as clinically not relevant)
2) concatenate all datasets
3) leave-one-out (ablation)
Models: 1. HATT 2. MTL 3. Linear SVM 4. Random Forest
-
- Oct 01, 2018
-
-
Jurica Seva authored
To Do: evaluate on GS data, separately for each model.
-
- Sep 26, 2018
-
-
Jurica Seva authored
Not stratified, but no class in the test set is unseen during training. Next: tokenize, convert to sequences, labelize, save all models, and store the data as such. Implement the MTL models. Evaluate.
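A non-stratified split that still guarantees no unseen classes in the test set can be obtained by anchoring one example of every label in the training set before splitting the rest at random — a minimal sketch (function name, test fraction, and seed are hypothetical):

```python
import random
from collections import defaultdict

def split_no_unseen(labels, test_frac=0.2, seed=0):
    """Random (non-stratified) index split that keeps at least one example
    of every label in the training set, so the test set never contains a
    class the model has not seen."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    train, pool = [], []
    for idxs in by_label.values():
        rng.shuffle(idxs)
        train.append(idxs[0])   # anchor one example per class in train
        pool.extend(idxs[1:])
    rng.shuffle(pool)
    n_test = int(len(labels) * test_frac)
    test = pool[:n_test]
    train.extend(pool[n_test:])
    return sorted(train), sorted(test)
```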
-
- Sep 25, 2018
-
-
Jurica Seva authored
Rewrote data preparation/loading to support both the HATT and the MTL schema. Added recent_pubmed.p for NotCancer examples. Next: create the labeling schemata for MTL. Input: training doc [embedded]. Output: { [0,1], [0,1], [0,..,n] }
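The Output: { [0,1], [0,1], [0,..,n] } schema reads as two binary heads (about_cancer, clinical) plus a categorical head over n cancer types. A minimal sketch of such a label builder, with one-hot encoding for the categorical head (names and vocabulary handling are hypothetical):

```python
def mtl_labels(about_cancer, clinical, cancer_type, type_vocab):
    """Build the three MTL targets for one document:
    two binary heads and a one-hot cancer-type head over `type_vocab`."""
    one_hot = [0] * len(type_vocab)
    if cancer_type in type_vocab:
        one_hot[type_vocab.index(cancer_type)] = 1
    return [int(about_cancer), int(clinical), one_hot]
```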
-
- Sep 17, 2018
-
-
Jurica Seva authored
Hierarchical text classification works. Minor issue with tokenization (performed twice, which is unnecessary). To Do: implement P/R/F1 scores on the validation data to compare with the stat-ML performance.
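The planned P/R/F1 comparison against the stat-ML baselines can be sketched with scikit-learn (a minimal sketch; the macro averaging and function name are assumptions):

```python
from sklearn.metrics import precision_recall_fscore_support

def prf_report(y_true, y_pred):
    """Macro-averaged precision/recall/F1 on validation predictions,
    comparable across the DL and stat-ML models."""
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    return {"precision": p, "recall": r, "f1": f1}
```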
-
- Sep 16, 2018
-
-
Jurica Seva authored
-