Commit b043ef91 authored by Mario Sänger's avatar Mario Sänger
Minor changes to experiments + wording in introduction

encoder's output is used as the initial state of the decoder. The decoder
generates, based on the input description from the dictionary and a special
start token, a death cause word by word. This decoding process continues until a
special end token is generated. The entire model is optimized using the Adam
optimization algorithm \cite{kingma_adam:_2014} and a batch size of 700. Model
training was performed either for 100 epochs or until an early stopping
criterion was met (no change in validation loss for two epochs).
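The early stopping criterion described above can be sketched as follows (a minimal, hypothetical illustration of the rule, not the actual training code; the function name and \texttt{min\_delta} parameter are our own):

```python
def should_stop(val_losses, patience=2, min_delta=0.0):
    """Return True when the validation loss has not improved for
    `patience` consecutive epochs (the criterion described above)."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent = val_losses[-patience:]
    # stop only if none of the recent epochs improved on the earlier best
    return all(loss >= best_before - min_delta for loss in recent)
```

With \texttt{patience=2} this matches the setting used here: training halts as soon as two consecutive epochs fail to lower the validation loss.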
DCEM-Full & 9 & 0.709 & 0.098 & 0.678 & 0.330 \\
\bottomrule
\end{tabularx}
\caption{Experiment results of our death cause extraction sequence-to-sequence
model concerning the balanced (equal number of training instances per language)
and full data set settings.}
\end{table}
The classification model is responsible for assigning an ICD-10 code to the
death cause description obtained during the first step. Our model uses an
embedding layer with input masking on zero values, followed by a bidirectional
LSTM layer with a hidden size of 256. Thereafter an attention layer builds an
adaptive weighted average over all LSTM states. The respective ICD-10 code is
determined by a dense layer with a softmax activation function. We use the Adam
optimizer to perform model training. The model was validated on 25\% of the
data. As for the extraction model, no cross-validation or hyperparameter
optimization was performed due to time constraints during development.
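The adaptive weighted average computed by the attention layer can be illustrated with a small numpy sketch (the scoring vector \texttt{w} stands in for a learned parameter; the actual layer and its parametrization may differ):

```python
import numpy as np

def attention_pool(states, w):
    """Adaptive weighted average over BiLSTM states.

    states: (timesteps, hidden_dim) LSTM outputs
    w:      (hidden_dim,) hypothetical learned scoring vector
    """
    scores = states @ w                              # one score per timestep
    scores = scores - scores.max()                   # numerical stability
    alphas = np.exp(scores) / np.exp(scores).sum()   # softmax attention weights
    return alphas @ states                           # weighted average, (hidden_dim,)
```

The resulting fixed-size vector summarizes the whole sequence and is fed into the final dense softmax layer.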

Once again, we devised two approaches. This was mainly caused by the lack of
adequate training data in terms of coverage for individual ICD-10 codes.
Therefore, we defined two training data settings: (1) a minimal setting, where
only ICD-10 codes with two or more supporting training instances are used. This,
of course, minimizes the number of ICD-10 codes in the label space. Therefore,
(2) an extended dataset was defined. Here, the original ICD-10 code mappings,
found in the supplied dictionaries, are extended with the training instances
from the individual certificate data of the three languages. Finally, for the
remaining ICD-10 codes that have only one supporting diagnosis text or death
cause description, we duplicate those data points. The goal of this approach is
to extend the possible label space to all available ICD-10 codes. The results
obtained from the two approaches on the validation set are shown in Table
\ref{tab:icd10Classification}. Using the \textit{minimal} data set the model
achieves an accuracy of 0.937. In contrast, using the extended data set the
model reaches an accuracy of 0.954, which represents an improvement of 1.8\%.
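The construction of the extended setting, merging the dictionary mappings with the certificate instances and duplicating single-support codes, can be sketched as follows (hypothetical data structures and function name; not the actual preprocessing code):

```python
from collections import defaultdict

def build_extended_dataset(dictionary_pairs, certificate_pairs):
    """Merge (description, icd10) pairs from the dictionaries with the
    certificate lines, then duplicate data points of codes that have only
    a single supporting description, so every ICD-10 code in the label
    space ends up with at least two training instances."""
    by_code = defaultdict(list)
    for text, code in dictionary_pairs + certificate_pairs:
        by_code[code].append(text)
    dataset = []
    for code, texts in by_code.items():
        if len(texts) == 1:
            texts = texts * 2   # duplicate single-support codes
        dataset.extend((text, code) for text in texts)
    return dataset
```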
\begin{table}[]
\centering
\begin{tabularx}{0.85\textwidth}{p{2.25cm}|c|c|c|c|c}
\toprule
\multirow{2}{*}{\textbf{Setting}}&\multirow{2}{*}{\textbf{Trained Epochs}}&\multicolumn{2}{c|}{\textbf{Train}}&\multicolumn{2}{c}{\textbf{Validation}} \\
\cline{3-6}
&&\textbf{Accuracy}&\textbf{Loss}&\textbf{Accuracy}&\textbf{Loss} \\
\hline
Minimal & 69 & 0.925 & 0.190 & 0.937 & 0.169 \\
Extended & 41 & 0.950 & 0.156 & 0.954 & 0.141 \\
\bottomrule
\end{tabularx}
\caption{Experiment results for our ICD-10 classification model regarding the
two data settings. The \textit{Minimal} setting uses only ICD-10 codes with two
or more training instances in the supplied dictionary. In contrast,
\textit{Extended} additionally takes the diagnosis texts from the certificate
data into account and duplicates ICD-10 training instances with only one
diagnosis text in the dictionary and certificate lines.}
\label{tab:icd10Classification}
\end{table}
\subsection{Complete Pipeline}
This paper describes the participation of the WBI team in the CLEF eHealth 2018
shared task 1 (``Multilingual Information Extraction - ICD-10 coding''). Our
contribution focuses on the setup and evaluation of a baseline
language-independent neural architecture for ICD-10 classification as well as a
simple, heuristic multi-language word embedding space. The approach builds on
two recurrent neural network models to extract and classify causes of death from
French, Italian and Hungarian death certificates. First, we employ an LSTM-based
sequence-to-sequence model to obtain a death cause from each death certificate
line. We then utilize a bidirectional LSTM model with attention mechanism to
assign the respective ICD-10 codes to the received death cause description. Both
models take multi-language word embeddings as inputs. During evaluation our best
model achieves an F-measure of 0.34 for French, 0.45 for Hungarian and 0.77 for
Italian. The results are encouraging for future work as well as the extension
and improvement of the proposed baseline system.
\keywords{ICD-10 coding \and Biomedical information extraction \and Multi-lingual sequence-to-sequence model