The entire model is optimized using the Adam optimization algorithm \cite{kingma}.
Model training was performed either for 100 epochs or until an early stopping criterion was met (no change in validation loss for two epochs).
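The early stopping rule described above can be sketched as a small helper that inspects the validation-loss history; the function name and signature are illustrative, not taken from the original implementation:

```python
def should_stop(val_losses, patience=2, min_delta=0.0):
    """Illustrative early-stopping check: return True when the validation
    loss has not improved for `patience` consecutive epochs."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    # Stop only if none of the last `patience` epochs improved on the
    # best loss observed before them.
    return all(loss >= best_before - min_delta for loss in val_losses[-patience:])
```

With `patience=2` this reproduces the criterion used here: training halts as soon as two consecutive epochs fail to improve the validation loss.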
As the provided datasets are imbalanced regarding the tasks' languages, we devised two different evaluation settings: (1) DCEM-Balanced, where each language was supported by 49,823 randomly drawn instances (the size of the smallest corpus), and (2) DCEM-Full, where all available data is used.
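The DCEM-Balanced setting amounts to downsampling each language's corpus to the size of the smallest one. A minimal sketch of such a sampling step, assuming the corpora are held as per-language lists of instances (names and data layout are illustrative):

```python
import random

def balance_corpora(corpora, seed=0):
    """Downsample each language's instance list to the size of the
    smallest corpus, drawing instances at random without replacement."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n_min = min(len(instances) for instances in corpora.values())
    return {lang: rng.sample(instances, n_min)
            for lang, instances in corpora.items()}
```

Every language then contributes the same number of training instances, removing the language imbalance at the cost of discarding data from the larger corpora.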
Table \ref{tab:s2s} shows the results obtained on the training and validation sets.
The figures reveal that the distribution of training instances per language has a strong influence on the performance of the model.
The model trained on the full training data achieves an accuracy of 0.678 on the validation set.
In contrast, the model trained on the balanced data set reaches an accuracy of 0.899 (+32.5\%).
...
...
We tested both death cause extraction models (based on the balanced and unbalanced training data).
On the contrary, both ICD-10 classification models perform similarly, so we used only the extended ICD-10 classification model, with word-level tokens\footnote{Although models supporting character-level tokens were developed and evaluated, their performance fared poorly compared to word-level tokens.}, in the final pipeline.
To evaluate the pipeline we built a training and a hold-out validation set during development.
The obtained results on the validation set are presented in Table \ref{tab:final_train}.
The scores are calculated using a prevalence-weighted macro-average across the output classes, i.e.\ we calculated precision, recall and F-score for each ICD-10 code and averaged them, weighting each score by the number of occurrences of the code in the gold standard.
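This prevalence-weighted macro-average can be sketched as follows (a minimal illustration equivalent to scikit-learn's `average='weighted'` option; the function name and label representation are assumptions):

```python
from collections import Counter

def weighted_macro_scores(gold, pred):
    """Prevalence-weighted macro-average: per-class precision, recall and
    F-score, each weighted by the class frequency in the gold standard."""
    support = Counter(gold)
    avg_p = avg_r = avg_f = 0.0
    for cls, count in support.items():
        tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
        fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / count  # count = tp + fn by definition of support
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = count / len(gold)
        avg_p += weight * prec
        avg_r += weight * rec
        avg_f += weight * f1
    return avg_p, avg_r, avg_f
```

Codes that never occur in the gold standard receive zero weight, so rare but present ICD-10 codes still contribute in proportion to their prevalence.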
\caption{Evaluation results of the final pipeline on the validation set of the training data. Reported figures represent
the prevalence-weighted macro-average across the output classes. Final-Balanced = DCEM-Balanced + ICD-10\_Extended.
Final-Full = DCEM-Full + ICD-10\_Extended}
\end{table}
Although the individual models, as shown in Tables \ref{tab:s2s} and \ref{tab:icd10Classification}, are promising, the performance decreases considerably in a pipeline setting. %, by roughly a third.
title={{CLEF} {eHealth} 2018 {Multilingual} {Information} {Extraction} task {Overview}: {ICD}10 {Coding} of {Death} {Certificates} in {French}, {Hungarian} and {Italian}},
booktitle={{CLEF} 2018 {Evaluation} {Labs} and {Workshop}: {Online} {Working} {Notes}},
publisher={CEUR-WS},
author={Névéol, Aurélie and Robert, Aude and Grippo, F and Morgand, C and Orsi, C and Pelikán, L and Ramadier, Lionel and Rey, Grégoire and Zweigenbaum, Pierre},