From b043ef915d83f2adf3dcf6acb168e88184c7bd73 Mon Sep 17 00:00:00 2001
From: Mario Sänger <mario.saenger@student.hu-berlin.de>
Date: Wed, 30 May 2018 16:31:27 +0200
Subject: [PATCH] Minor changes to experiments + wording in introduction

---
 paper/40_experiments.tex | 63 +++++++++++++++++++++++-----------------
 paper/wbi-eclef18.tex    |  4 +--
 2 files changed, 38 insertions(+), 29 deletions(-)

diff --git a/paper/40_experiments.tex b/paper/40_experiments.tex
index 9148392..cbe5dbe 100644
--- a/paper/40_experiments.tex
+++ b/paper/40_experiments.tex
@@ -44,7 +44,7 @@ encoders output is used as the initial state of the decoder. The decoder
 generates, based on the input description from the dictionary and a special
 start token, a death cause word by word. This decoding process continues until
 a special end token is generated. The entire model is optimized using the
-Adam optimization algorithm \cite{kingma_adam} and a batch size of 700. Model
+Adam optimization algorithm \cite{kingma_adam:_2014} and a batch size of 700. Model
 training was performed either for 100 epochs or if an early stopping criteria
 is met (no change in validation loss for two epochs).
 
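(For illustration: the training setup described in the hunk above could be
reproduced roughly as follows. This is a minimal sketch assuming Keras 2.x,
not the authors' code; all sizes and the commented-out array names are
illustrative placeholders beyond the values stated in the text, i.e. batch
size 700, at most 100 epochs, and a patience of two epochs.)

# Minimal sketch (assumption: Keras 2.x); sizes and data arrays are
# placeholders, not values taken from the paper.
from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Dense
from keras.callbacks import EarlyStopping

vocab_size, embed_dim, hidden_dim = 10000, 100, 256  # assumed sizes

# Encoder: reads the dictionary description; its final states
# initialize the decoder, as described in the hunk above.
enc_in = Input(shape=(None,))
enc_emb = Embedding(vocab_size, embed_dim, mask_zero=True)(enc_in)
_, state_h, state_c = LSTM(hidden_dim, return_state=True)(enc_emb)

# Decoder: generates the death cause word by word, starting from a
# special start token and stopping at the special end token.
dec_in = Input(shape=(None,))
dec_emb = Embedding(vocab_size, embed_dim, mask_zero=True)(dec_in)
dec_seq = LSTM(hidden_dim, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])
word_probs = Dense(vocab_size, activation="softmax")(dec_seq)

model = Model([enc_in, dec_in], word_probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Early stopping: no change in validation loss for two epochs.
stopper = EarlyStopping(monitor="val_loss", patience=2)
# model.fit([enc_x, dec_x], dec_y, batch_size=700, epochs=100,
#           validation_data=..., callbacks=[stopper])
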
@@ -72,7 +72,7 @@ DCEM-Full & 9 &0.709 & 0.098 & 0.678 & 0.330 \\
 \bottomrule
 \end{tabularx}
 \caption{Experiment results of our death cause extraction sequence-to-sequence
-model concerning balanced (equal number of training data per language) and full
+model concerning balanced (equal number of training instances per language) and full
 data set setting.}
 \end{table}
 
@@ -80,40 +80,49 @@ data set setting.}
 The classification model is responsible for assigning a ICD-10 code to death
 cause description obtained during the first step. Our model uses an embedding
 layer with input masking on zero values, followed by and bidirectional LSTM
-layer with 256 dimension hidden layer. Thereafter a attention layer builds an
-adaptive weighted average over all LSTM states. They ICD-10 code will be
-determined by a dense layer with softmax activation function.
-
-We use the Adam optimizer to perform model training. The model was validated on
-25\% od the data. As for the extraction model, no cross-validation or
-hyperparameter was performed due to time contraints during development. Once
-again, we devised two approaches. This was manly influenced by the lack of
+layer with a 256-dimensional hidden layer. Thereafter an attention layer builds an
+adaptive weighted average over all LSTM states. The respective ICD-10 code will
+be determined by a dense layer with softmax activation function. We use the Adam
+optimizer to perform model training. The model was validated on 25\% of the
+data. As for the extraction model, no cross-validation or hyperparameter
+optimization was performed due to time constraints during development.
+
+Once again, we devised two approaches. This was mainly caused by the lack of
 adequate training data in terms of coverage for individual ICD-10 codes.
-Therefore, we once again defined two datasets: (1) minimal, where only ICD-10
-codes with 2 or more supporting data points are used. This, of course, minimizes
-the number of ICD-10 codes in the label space. Therefore, (2) an extended
-dataset was defined. Here, the original ICD-10 codes mappings, found in the
-supplied dictionaries, are extended with the data from individual langugae
-Causes Calcules. Finally, for the remaining ICD-10 codes with support of 1 we
-duplicate those datapoints. The goal of this approach is to extend our possible
-label space to all available ICD-10 codes. The results obtained from the two
-approaches are shown in Table \ref{tab:icd10Classification}.
+Therefore, we defined two training data settings: (1) minimal, where
+only ICD-10 codes with two or more supporting training instances are used. This,
+of course, minimizes the number of ICD-10 codes in the label space. In addition,
+(2) an extended dataset was defined. Here, the original ICD-10 code mappings,
+found in the supplied dictionaries, are extended with the training instances
+from the individual certificate data of the three languages. Finally, for the
+remaining ICD-10 codes that have only one supporting diagnosis text (death
+cause description), we duplicate those data points. The goal of this approach is
+to extend our possible label space to all available ICD-10 codes. The results
+obtained from the two approaches on the validation set are shown in Table
+\ref{tab:icd10Classification}. Using the \textit{minimal} data set the model
+achieves an accuracy of 0.937. In contrast, using the extended data set the
+model reaches an accuracy of 0.954, which represents an improvement of 1.8\%.
 
 \begin{table}[]
 \label{tab:icd10Classification}
 \centering
-\begin{tabularx}{\textwidth}{p{2.25cm}|p{1.75cm}|c|c|c|c|c}
+\begin{tabularx}{0.85\textwidth}{p{2.25cm}|c|c|c|c|c}
 \toprule
-\multirow{2}{*}{\textbf{Tokenization}}&\multirow{2}{*}{\textbf{Model}}&\multirow{2}{*}{\textbf{Trained Epochs}}&\multicolumn{2}{c|}{\textbf{Train}}&\multicolumn{2}{c}{\textbf{Validation}} \\
-\cline{4-7}
-&&&\textbf{Accuracy}&\textbf{Loss}&\textbf{Accuracy}&\textbf{Loss} \\
+%\multirow{2}{*}{\textbf{Tokenization}}&\multirow{2}{*}{\textbf{Model}}&\multirow{2}{*}{\textbf{Trained Epochs}}&\multicolumn{2}{c|}{\textbf{Train}}&\multicolumn{2}{c}{\textbf{Validation}} \\
+%\cline{4-7}
+\multirow{2}{*}{\textbf{Setting}}&\multirow{2}{*}{\textbf{Trained Epochs}}&\multicolumn{2}{c|}{\textbf{Train}}&\multicolumn{2}{c}{\textbf{Validation}} \\
+\cline{3-6}
+&&\textbf{Accuracy}&\textbf{Loss}&\textbf{Accuracy}&\textbf{Loss} \\
 \hline
-Word & Minimal & 69 & 0.925 & 0.190 & 0.937 & 0.169 \\
-Word & Extended & 41 & 0.950 & 0.156 & 0.954 & 0.141 \\
-Character & Minimal & 91 & 0.732 & 1.186 & 0.516 & 2.505 \\
+Minimal & 69 & 0.925 & 0.190 & 0.937 & 0.169 \\
+Extended & 41 & 0.950 & 0.156 & 0.954 & 0.141 \\
+%Character & Minimal & 91 & 0.732 & 1.186 & 0.516 & 2.505 \\
 \bottomrule
 \end{tabularx}
-\caption{Experiment results for our ICD-10 classification model regarding different settings.}
+\caption{Experiment results for our ICD-10 classification model regarding different data settings. The \textit{Minimal}
+setting uses only ICD-10 codes with two or more training instances in the supplied dictionary. In contrast,
+\textit{Extended} additionally takes the diagnosis texts from the certificate data and duplicates
+ICD-10 training instances with only one diagnosis text in the dictionary and certificate lines.}
 \end{table}
 
 \subsection{Complete Pipeline}
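(For illustration: one possible realization of the classification model
described in the hunk above. Again a minimal sketch assuming Keras 2.x; the
attention block, a Dense scoring layer followed by a softmax over time and a
weighted sum, is one common Keras construction and an assumption, since the
patch does not spell out the exact mechanism. Zero masking is noted but
omitted, as this simplified attention stack does not propagate masks.)

# Minimal sketch (assumption: Keras 2.x); all sizes are placeholders.
from keras.models import Model
from keras.layers import (Input, Embedding, Bidirectional, LSTM, Dense,
                          Flatten, Activation, RepeatVector, Permute,
                          Multiply, Lambda)
import keras.backend as K

vocab_size, embed_dim, hidden_dim = 10000, 100, 256  # assumed sizes
num_codes, max_len = 1000, 30                        # assumed sizes

tokens = Input(shape=(max_len,))
# The paper masks zero inputs; mask_zero is left out here because the
# Flatten/Lambda layers below do not support masking.
emb = Embedding(vocab_size, embed_dim)(tokens)
states = Bidirectional(LSTM(hidden_dim, return_sequences=True))(emb)

# Attention: score every LSTM state, normalize the scores over time and
# build the adaptive weighted average described above.
scores = Dense(1, activation="tanh")(states)     # (batch, time, 1)
scores = Flatten()(scores)                       # (batch, time)
weights = Activation("softmax")(scores)
weights = RepeatVector(2 * hidden_dim)(weights)  # (batch, 2*hidden, time)
weights = Permute((2, 1))(weights)               # (batch, time, 2*hidden)
average = Lambda(lambda t: K.sum(t, axis=1))(Multiply()([states, weights]))

code_probs = Dense(num_codes, activation="softmax")(average)
model = Model(tokens, code_probs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x, y, validation_split=0.25, ...)  # 25% held out, as above
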
diff --git a/paper/wbi-eclef18.tex b/paper/wbi-eclef18.tex
index d89780a..a11ff54 100644
--- a/paper/wbi-eclef18.tex
+++ b/paper/wbi-eclef18.tex
@@ -49,7 +49,7 @@ This paper describes the participation of the WBI team in the CLEF eHealth
 2018 shared task 1 (``Multilingual Information Extraction - ICD-10 coding''). Our
 contribution focus on the setup and evaluation of a baseline language-independent
 neural architecture for ICD-10 classification as well as a simple, heuristic
-multi-language word embedding technique. The approach builds on two recurrent
+multi-language word embedding space. The approach builds on two recurrent
 neural networks models to extract and classify causes of death from French,
 Italian and Hungarian death certificates. First, we employ a LSTM-based
 sequence-to-sequence model to obtain a death cause from each death certificate
@@ -57,7 +57,7 @@ line. We then utilize a bidirectional LSTM model with attention mechanism to
 assign the respective ICD-10 codes to the received death cause description. Both
 models take multi-language word embeddings as inputs. During evaluation our best
 model achieves an F-measure of 0.34 for French, 0.45 for Hungarian and 0.77 for
-Italian. The results are encouraging for future work as well as extension and
+Italian. The results are encouraging for future work as well as the extension and
 improvement of the proposed baseline system.
 \keywords{ICD-10 coding \and Biomedical information extraction \and Multi-lingual
 sequence-to-sequence model
-- 
GitLab
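(For illustration, returning to the two training data settings described in
40_experiments.tex above: they could be constructed along these lines. A
hedged sketch; `dictionary` and `certificates` are hypothetical lists of
(text, ICD-10 code) pairs, as the patch does not show the actual data
structures.)

# Hedged sketch of the minimal and extended data settings; inputs are
# hypothetical (text, icd10_code) pair lists.
from collections import Counter

def build_dataset(dictionary, certificates, setting="extended"):
    pairs = list(dictionary)
    if setting == "minimal":
        # Keep only ICD-10 codes with two or more supporting instances,
        # which shrinks the label space.
        support = Counter(code for _, code in pairs)
        return [(t, c) for t, c in pairs if support[c] >= 2]
    # Extended: add the diagnosis texts from the certificate lines ...
    pairs += list(certificates)
    # ... and duplicate the single data point of every ICD-10 code with
    # support of one, so the label space covers all available codes.
    support = Counter(code for _, code in pairs)
    return pairs + [(t, c) for t, c in pairs if support[c] == 1]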