From b043ef915d83f2adf3dcf6acb168e88184c7bd73 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mario=20Sa=CC=88nger?= <mario.saenger@student.hu-berlin.de>
Date: Wed, 30 May 2018 16:31:27 +0200
Subject: [PATCH] Minor changes to experiments + wording in introduction

---
 paper/40_experiments.tex | 63 +++++++++++++++++++++++-----------------
 paper/wbi-eclef18.tex    |  4 +--
 2 files changed, 38 insertions(+), 29 deletions(-)

diff --git a/paper/40_experiments.tex b/paper/40_experiments.tex
index 9148392..cbe5dbe 100644
--- a/paper/40_experiments.tex
+++ b/paper/40_experiments.tex
@@ -44,7 +44,7 @@ encoders output is used as the initial state of the decoder.
 The decoder generates, based on the input description from the dictionary and a
 special start token, a death cause word by word. This decoding process continues
 until a special end token is generated. The entire model is optimized using the
-Adam optimization algorithm \cite{kingma_adam} and a batch size of 700. Model
+Adam optimization algorithm \cite{kingma_adam:_2014} and a batch size of 700. Model
 training was performed for a maximum of 100 epochs or until an early stopping
 criterion was met (no change in validation loss for two epochs).
 
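For concreteness, the following is a minimal sketch of such an encoder-decoder model, assuming a Keras implementation; vocabulary sizes and dimensions are placeholder assumptions, while the optimizer, batch size, epoch limit and early stopping patience come from the paragraph above.

```python
# Hypothetical sketch of the described sequence-to-sequence model;
# vocabulary sizes and dimensions are placeholders, not the paper's values.
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import LSTM, Dense, Embedding, Input
from tensorflow.keras.models import Model

SRC_VOCAB, TGT_VOCAB, EMB_DIM, HIDDEN_DIM = 20000, 20000, 100, 256

# Encoder: reads the dictionary description; its final states are used
# as the initial state of the decoder.
enc_in = Input(shape=(None,))
enc_emb = Embedding(SRC_VOCAB, EMB_DIM, mask_zero=True)(enc_in)
_, state_h, state_c = LSTM(HIDDEN_DIM, return_state=True)(enc_emb)

# Decoder: generates the death cause word by word, starting from a special
# start token; at inference time decoding stops at the special end token.
dec_in = Input(shape=(None,))
dec_emb = Embedding(TGT_VOCAB, EMB_DIM, mask_zero=True)(dec_in)
dec_seq = LSTM(HIDDEN_DIM, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])
word_probs = Dense(TGT_VOCAB, activation="softmax")(dec_seq)

model = Model([enc_in, dec_in], word_probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Training regime from the text: batch size 700, at most 100 epochs, and
# early stopping after two epochs without improvement in validation loss.
stopper = EarlyStopping(monitor="val_loss", patience=2)
# model.fit([enc_x, dec_x], dec_y, batch_size=700, epochs=100,
#           validation_data=val_data, callbacks=[stopper])
```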
@@ -72,7 +72,7 @@ DCEM-Full &  9 &0.709 & 0.098 & 0.678 & 0.330  \\
 \bottomrule
 \end{tabularx}
 \caption{Experiment results of our death cause extraction sequence-to-sequence
-model concerning balanced (equal number of training data per language) and full
+model concerning balanced (equal number of training instances per language) and full
 data set settings.}
 \end{table}
 
@@ -80,40 +80,49 @@ data set setting.}
 The classification model is responsible for assigning an ICD-10 code to the
 death cause description obtained during the first step. Our model uses an
 embedding layer with input masking on zero values, followed by a bidirectional LSTM
-layer with 256 dimension hidden layer. Thereafter a attention layer builds an
-adaptive weighted average over all LSTM states. They ICD-10 code will be
-determined by a dense layer with softmax activation function.
-
-We use the Adam optimizer to perform model training. The model was validated on
-25\% od the data. As for the extraction model, no cross-validation or
-hyperparameter was performed due to time contraints during development. Once
-again, we devised two approaches. This was manly influenced by the lack of
+layer with a hidden size of 256. Thereafter, an attention layer builds an
+adaptive weighted average over all LSTM states. The respective ICD-10 code is
+determined by a dense layer with a softmax activation function. We use the Adam
+optimizer to perform model training. The model was validated on 25\% of the
+data. As for the extraction model, no cross-validation or hyperparameter
+optimization was performed due to time constraints during development.
+
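As an illustration, here is a minimal sketch of this classifier, again assuming a Keras implementation; the vocabulary size, the number of ICD-10 labels and the exact attention formulation are assumptions, and the padding mask is not explicitly handled in the attention for brevity.

```python
# Hypothetical sketch of the described ICD-10 classifier; vocabulary size,
# label count and the attention variant are placeholder assumptions.
import tensorflow as tf
from tensorflow.keras import Model, layers

VOCAB_SIZE, EMB_DIM, HIDDEN_DIM, NUM_CODES = 20000, 100, 256, 1000

inputs = layers.Input(shape=(None,))
# Embedding layer with input masking on zero (padding) values.
embedded = layers.Embedding(VOCAB_SIZE, EMB_DIM, mask_zero=True)(inputs)
# Bidirectional LSTM with a hidden size of 256, returning all states.
states = layers.Bidirectional(
    layers.LSTM(HIDDEN_DIM, return_sequences=True))(embedded)

# Attention: score every LSTM state, normalize the scores with a softmax,
# and build the adaptive weighted average over all states.
scores = layers.Dense(1, activation="tanh")(states)
weights = layers.Softmax(axis=1)(scores)
context = layers.Lambda(
    lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([weights, states])

# A dense layer with softmax activation determines the ICD-10 code.
outputs = layers.Dense(NUM_CODES, activation="softmax")(context)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x, y, validation_split=0.25)  # 25% of the data held out
```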
+Once again, we devised two approaches, mainly motivated by the lack of
 adequate training data in terms of coverage for individual ICD-10 codes.
-Therefore, we once again defined two datasets: (1) minimal, where only ICD-10
-codes with 2 or more supporting data points are used. This, of course, minimizes
-the number of ICD-10 codes in the label space. Therefore, (2) an extended
-dataset was defined. Here, the original ICD-10 codes mappings, found in the
-supplied dictionaries, are extended with the data from individual langugae
-Causes Calcules. Finally, for the remaining ICD-10 codes with support of 1 we
-duplicate those datapoints. The goal of this approach is to extend our possible
-label space to all available ICD-10 codes. The results obtained from the two
-approaches are shown in Table \ref{tab:icd10Classification}.
+Therefore, we once again defined two training data settings: (1) a minimal
+setting, in which only ICD-10 codes with two or more supporting training
+instances are used. This, of course, minimizes the number of ICD-10 codes in
+the label space. To counter this, (2) an extended dataset was defined. Here,
+the original ICD-10 code mappings, found in the supplied dictionaries, are
+extended with the training instances from the certificate data of the three
+languages. Finally, for the remaining ICD-10 codes that have only one
+supporting diagnosis text or death cause description, we duplicate those data
+points. The goal of this approach is to extend our possible label space to all
+available ICD-10 codes. The results obtained from the two approaches on the
+validation set are shown in Table \ref{tab:icd10Classification}. Using the
+\textit{minimal} data set, the model achieves an accuracy of 0.937. In
+contrast, using the extended data set, the model reaches an accuracy of 0.954,
+a relative improvement of 1.8\%.
 
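A small sketch of how the two settings could be assembled, assuming the training data is given as (text, ICD-10 code) pairs; all function and variable names are hypothetical.

```python
# Hypothetical sketch of the two training data settings; the representation
# as (text, icd10_code) pairs is an assumption.
from collections import Counter

def build_minimal(dictionary_pairs):
    """(1) Minimal: keep only ICD-10 codes with two or more instances."""
    support = Counter(code for _, code in dictionary_pairs)
    return [(text, code) for text, code in dictionary_pairs
            if support[code] >= 2]

def build_extended(dictionary_pairs, certificate_pairs):
    """(2) Extended: add certificate texts, duplicate singleton codes."""
    pairs = dictionary_pairs + certificate_pairs
    support = Counter(code for _, code in pairs)
    # Duplicate data points of codes with only one supporting description,
    # so that every available ICD-10 code remains in the label space.
    return pairs + [(text, code) for text, code in pairs
                    if support[code] == 1]
```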
 \begin{table}[]
 \centering
-\begin{tabularx}{\textwidth}{p{2.25cm}|p{1.75cm}|c|c|c|c|c}
+\begin{tabularx}{0.85\textwidth}{p{2.25cm}|c|c|c|c|c} 
 \toprule
-\multirow{2}{*}{\textbf{Tokenization}}&\multirow{2}{*}{\textbf{Model}}&\multirow{2}{*}{\textbf{Trained Epochs}}&\multicolumn{2}{c|}{\textbf{Train}}&\multicolumn{2}{c}{\textbf{Validation}} \\
-\cline{4-7} 
-&&&\textbf{Accuracy}&\textbf{Loss}&\textbf{Accuracy}&\textbf{Loss} \\
+%\multirow{2}{*}{\textbf{Tokenization}}&\multirow{2}{*}{\textbf{Model}}&\multirow{2}{*}{\textbf{Trained Epochs}}&\multicolumn{2}{c|}{\textbf{Train}}&\multicolumn{2}{c}{\textbf{Validation}} \\
+%\cline{4-7} 
+\multirow{2}{*}{\textbf{Setting}}&\multirow{2}{*}{\textbf{Trained Epochs}}&\multicolumn{2}{c|}{\textbf{Train}}&\multicolumn{2}{c}{\textbf{Validation}} \\
+\cline{3-6}
+&&\textbf{Accuracy}&\textbf{Loss}&\textbf{Accuracy}&\textbf{Loss} \\
 \hline
-Word & Minimal &  69 & 0.925 & 0.190 & 0.937 & 0.169 \\
-Word & Extended &  41 & 0.950 & 0.156 & 0.954 & 0.141 \\
-Character & Minimal &   91 & 0.732 & 1.186 & 0.516 & 2.505 \\
+Minimal &  69 & 0.925 & 0.190 & 0.937 & 0.169 \\
+Extended &  41 & 0.950 & 0.156 & 0.954 & 0.141 \\
+%Character & Minimal &   91 & 0.732 & 1.186 & 0.516 & 2.505 \\
 \bottomrule
 \end{tabularx}
-\caption{Experiment results for our ICD-10 classification model regarding different settings.}
+\caption{Experiment results for our ICD-10 classification model regarding different data settings. The \textit{Minimal}
+setting uses only ICD-10 codes with two or more training instances in the supplied dictionary. In contrast,
+\textit{Extended} additionally takes the diagnosis texts from the certificate data and duplicates
+training instances of ICD-10 codes with only one diagnosis text across the dictionary and certificate lines.}
+\label{tab:icd10Classification}
 \end{table}
 
 \subsection{Complete Pipeline}
diff --git a/paper/wbi-eclef18.tex b/paper/wbi-eclef18.tex
index d89780a..a11ff54 100644
--- a/paper/wbi-eclef18.tex
+++ b/paper/wbi-eclef18.tex
@@ -49,7 +49,7 @@ This paper describes the participation of the WBI team in the CLEF eHealth 2018
 shared task 1 (``Multilingual Information Extraction - ICD-10 coding''). Our
 contribution focuses on the setup and evaluation of a baseline language-independent
 neural architecture for ICD-10 classification as well as a simple, heuristic
-multi-language word embedding technique. The approach builds on two recurrent
+multi-language word embedding space. The approach builds on two recurrent
 neural network models to extract and classify causes of death from French,
 Italian and Hungarian death certificates. First, we employ an LSTM-based
 sequence-to-sequence model to obtain a death cause from each death certificate
@@ -57,7 +57,7 @@ line. We then utilize a bidirectional LSTM model with attention mechanism to
 assign the respective ICD-10 codes to the received death cause description. Both
 models take multi-language word embeddings as inputs. During evaluation our best
 model achieves an F-measure of 0.34 for French, 0.45 for Hungarian and 0.77 for
-Italian. The results are encouraging for future work as well as extension and
+Italian. The results are encouraging for future work as well as the extension and
 improvement of the proposed baseline system.
 
 \keywords{ICD-10 coding \and Biomedical information extraction \and Multi-lingual sequence-to-sequence model 
-- 
GitLab