  1. Nov 20, 2018
    • Training VAE on PubTator text to use for transfer and fine tune on the corpora... · f29eaba9
      Jurica Seva authored
      Training VAE on PubTator text to use for transfer, then fine-tuning on the corpora at hand. VAE as input to ranking function.
      
      To Do Now:
      1) start writing paper as you do each module!
      2) Add corpora overview (i.e. unique documents, cancer types, ratio 0/1 for about_cancer and clinical tasks)
      3) Describe preparation (link to BioNLP paper)
      4) Describe training process for a) DL models (used FastText, 75th percentile for sentence length, used masking) with 5-fold CV b) sklearn models (custom tokenizer, tf-idf, no chi2, RandomizedSearchCV with 1- folds, char and word n-gram, hyperparam optimization with 10 combinations)
      
      To Do Future:
      1) reindex VIST with new models (after selecting best one)
      2) extend ranking function evaluation metrics with MAP, MAR
      3) try Logistic Regression and (V)AE as ranking function
      f29eaba9
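The sklearn side of item 4 above (custom tokenizer, tf-idf over word and char n-grams, randomized search over 10 combinations) could be sketched roughly as follows. This is a minimal sketch assuming scikit-learn is installed; `simple_tokenizer`, the toy `TEXTS`/`LABELS`, the parameter ranges, the choice of `LinearSVC`, and the fold count of 3 are all placeholders for illustration, not the project's actual settings.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC


def simple_tokenizer(text):
    # Hypothetical stand-in for the project's custom tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_search(n_iter=10, cv=3, seed=0):
    # Word-level and character-level tf-idf features, concatenated.
    features = FeatureUnion([
        ("word", TfidfVectorizer(tokenizer=simple_tokenizer, token_pattern=None)),
        ("char", TfidfVectorizer(analyzer="char_wb")),
    ])
    pipeline = Pipeline([("tfidf", features), ("clf", LinearSVC())])
    # Placeholder search space (16 possible combinations, 10 are sampled).
    param_distributions = {
        "tfidf__word__ngram_range": [(1, 1), (1, 2)],
        "tfidf__char__ngram_range": [(2, 4), (3, 5)],
        "clf__C": [0.01, 0.1, 1.0, 10.0],
    }
    return RandomizedSearchCV(
        pipeline,
        param_distributions,
        n_iter=n_iter,
        cv=cv,
        scoring="f1",
        random_state=seed,
    )


# Toy data only; the real corpora are not reproduced here.
TEXTS = [
    "lung cancer tumor growth in patients",
    "breast cancer therapy response",
    "melanoma tumor mutation analysis",
    "colorectal cancer clinical trial",
    "leukemia treatment outcome study",
    "prostate cancer tumor staging",
    "weather patterns in coastal regions",
    "soil nutrients and crop yield",
    "traffic flow in urban networks",
    "bird migration across continents",
    "river sediment transport model",
    "solar panel efficiency testing",
]
LABELS = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

search = build_search()
search.fit(TEXTS, LABELS)
print(search.best_params_)
```

No chi2 feature selection step is included, matching the note above; the fold count and search space would need to be replaced with the real ones.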
  2. Nov 19, 2018
    • Evaluation finished/in progress. Write-up begun. Corpora analysis, data preparation, etc. · 63d16a7f
      Jurica Seva authored
      Developed VAE for document retrieval.
      
      To Do Now:
      1) start writing paper as you do each module!
      2) Add corpora overview (i.e. unique documents, cancer types, ratio 0/1 for about_cancer and clinical tasks)
      3) Describe preparation (link to BioNLP paper)
      4) Describe training process for a) DL models (used FastText, 75th percentile for sentence length, used masking) with 5-fold CV b) sklearn models (custom tokenizer, tf-idf, no chi2, RandomizedSearchCV with 1- folds, char and word n-gram, hyperparam optimization with 10 combinations)
      
      To Do Future:
      1) reindex VIST with new models (after selecting best one)
      2) extend ranking function evaluation metrics with MAP, MAR
      3) try Logistic Regression and (V)AE as ranking function
      63d16a7f
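For the planned ranking evaluation, mean average precision (MAP) can be computed as in the sketch below. It uses the standard definition of average precision over binary relevance; the function names and the dictionary layout (query id to ranked document ids, query id to relevant ids) are assumptions for illustration. MAR is not covered here, since the note above does not pin down its definition.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank of each relevant document.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(rankings, relevance):
    """Mean of per-query average precision.

    rankings: dict mapping query id -> ranked list of document ids
    relevance: dict mapping query id -> iterable of relevant document ids
    """
    if not rankings:
        return 0.0
    total = sum(
        average_precision(ranked, relevance.get(query, ()))
        for query, ranked in rankings.items()
    )
    return total / len(rankings)


rankings = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d6"]}
relevance = {"q1": {"d1", "d3"}, "q2": {"d6"}}
print(mean_average_precision(rankings, relevance))
```

Relevant documents that never appear in the ranking still count in the denominator, so missing them lowers the score.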
  3. Nov 07, 2018
    • Evaluation finished/in progress. · d349f5fd
      Jurica Seva authored
      Developed VAE for document retrieval.
      
      To Do Now:
      1) start writing paper as you do each module!
      2) Add corpora overview (i.e. unique documents, cancer types, ratio 0/1 for about_cancer and clinical tasks)
      3) Describe preparation (link to BioNLP paper)
      4) Describe training process for a) DL models (used FastText, 75th percentile for sentence length, used masking) with 5-fold CV b) sklearn models (custom tokenizer, tf-idf, no chi2, RandomizedSearchCV with 1- folds, char and word n-gram, hyperparam optimization with 10 combinations)
      
      To Do Future:
      1) reindex VIST with new models (after selecting best one)
      2) extend ranking function evaluation metrics with MAP, MAR
      3) try Logistic Regression and (V)AE as ranking function
      d349f5fd
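The DL preprocessing mentioned in item 4 above (cutting sentences at the 75th percentile of length and masking the padding) might look like the following. This is a framework-independent sketch: the nearest-rank percentile method, the pad id of 0, and the function names are assumptions, and the project's actual deep learning framework is not shown.

```python
import math


def percentile_length(sequences, q=75):
    """Nearest-rank q-th percentile of the sequence lengths."""
    lengths = sorted(len(seq) for seq in sequences)
    if not lengths:
        raise ValueError("no sequences given")
    rank = max(1, math.ceil(q / 100 * len(lengths)))
    return lengths[rank - 1]


def pad_and_mask(sequences, max_len, pad_id=0):
    """Truncate or right-pad each sequence to max_len and build 0/1 masks."""
    padded, masks = [], []
    for seq in sequences:
        trimmed = list(seq[:max_len])
        n_pad = max_len - len(trimmed)
        padded.append(trimmed + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        masks.append([1] * len(trimmed) + [0] * n_pad)
    return padded, masks


sequences = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18]]
max_len = percentile_length(sequences)
padded, masks = pad_and_mask(sequences, max_len)
print(max_len, padded[0], masks[0])
```

Using a percentile rather than the maximum length keeps a few very long sentences from inflating every padded input.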
  4. Oct 31, 2018
    • Implemented all ML models; evaluation underway. · c0cb111f
      Jurica Seva authored
      To Do Now:
      1) start writing paper as you do each module!
      2) Add corpora overview (i.e. unique documents, cancer types, ratio 0/1 for about_cancer and clinical tasks)
      3) Describe preparation (link to BioNLP paper)
      4) Describe training process for a) DL models (used FastText, 75th percentile for sentence length, used masking) with 5-fold CV b) sklearn models (custom tokenizer, tf-idf, no chi2, RandomizedSearchCV with 1- folds, char and word n-gram, hyperparam optimization with 10 combinations)
      
      To Do Future:
      1) reindex VIST with new models (after selecting best one)
      2) extend ranking function evaluation metrics with MAP, MAR
      3) try Logistic Regression and (V)AE as ranking function
      c0cb111f
  5. Oct 23, 2018
    • Defined evaluations. · f65d561d
      Jurica Seva authored
      To Do:
      1) each corpus (excluding Onco+/- and PubMed (used as clinically not relevant))
      2) concat all datasets
      3) leave one out (ablation)
      
      Models:
      1. HATT
      2. MTL
      3. Linear SVM
      4. Random Forest
      f65d561d
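The three evaluation settings in the to-do list above can be enumerated with a small helper. This is a sketch only: it reads the third setting as leaving out one whole corpus at a time, which is an assumption, and the corpus names are hypothetical placeholders.

```python
def evaluation_settings(corpus_names):
    """Return (setting name, corpora to train on) pairs for the three setups."""
    names = list(corpus_names)
    settings = []
    # 1) each corpus on its own
    for name in names:
        settings.append(("single:" + name, [name]))
    # 2) all corpora concatenated
    settings.append(("all", list(names)))
    # 3) ablation: all corpora except one
    for held_out in names:
        remaining = [name for name in names if name != held_out]
        settings.append(("without:" + held_out, remaining))
    return settings


# Hypothetical corpus names, for illustration only.
for setting, corpora in evaluation_settings(["corpus_a", "corpus_b", "corpus_c"]):
    print(setting, corpora)
```

Each setting would then be run once per model in the list above.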
  6. Oct 08, 2018
    • Defined evaluations. · 95a14714
      Jurica Seva authored
      To Do:
      1) each corpus (excluding Onco+/- and PubMed (used as clinically not relevant))
      2) concat all datasets
      3) leave one out (ablation)
      
      Models:
      1. HATT
      2. MTL
      3. Linear SVM
      4. Random Forest
      95a14714
  7. Oct 01, 2018
  8. Sep 26, 2018
    • Data preparation for MTL done. · ecca3cba
      Jurica Seva authored
      Not stratified but no unseen train/test classes.
      Next: tokenize, convert to sequences, labelize, save all models and store data as such.
      Implement MTL models.
      Evaluate.
      ecca3cba
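The next steps listed above (tokenize, convert to sequences, labelize) could be sketched in plain Python as below. The regex tokenizer, the reserved ids for `<pad>` and `<unk>`, and all function names are assumptions for illustration; the project may rely on library utilities instead.

```python
import re

PAD_ID = 0
UNK_ID = 1


def tokenize(text):
    # Lowercase and split on word characters.
    return re.findall(r"\w+", text.lower())


def build_vocab(token_lists):
    """Assign an integer id to every token seen in the training data."""
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_sequences(token_lists, vocab):
    """Map tokens to ids; unseen tokens fall back to the <unk> id."""
    return [[vocab.get(token, UNK_ID) for token in tokens] for tokens in token_lists]


def encode_labels(labels):
    """Map class names to integer ids in sorted order."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[label] for label in labels], classes


texts = ["Lung cancer study", "cancer trial"]
token_lists = [tokenize(text) for text in texts]
vocab = build_vocab(token_lists)
sequences = to_sequences(token_lists, vocab)
label_ids, classes = encode_labels(["yes", "no"])
print(sequences, label_ids, classes)
```

Building the vocabulary and label index from the training split only, then saving both, keeps train and test encodings consistent.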
  9. Sep 25, 2018
  10. Sep 17, 2018
  11. Sep 16, 2018