Commit b043ef91 authored by Mario Sänger

Minor changes to experiments + wording in introduction

parent 9f123f75
@@ -44,7 +44,7 @@ encoder's output is used as the initial state of the decoder.
The decoder generates, based on the input description from the dictionary and a
special start token, a death cause word by word. This decoding process continues
until a special end token is generated. The entire model is optimized using the
Adam optimization algorithm \cite{kingma_adam} and a batch size of 700. Model
Adam optimization algorithm \cite{kingma_adam:_2014} and a batch size of 700. Model
training was performed for a maximum of 100 epochs or until an early stopping
criterion was met (no change in validation loss for two epochs).
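For illustration, the sketch below shows one way this training setup could be realised. The paper does not name its framework, so a Keras-style implementation is assumed here, and all vocabulary, embedding and hidden sizes are placeholders, not values from the paper:

\begin{verbatim}
# Hypothetical sketch of the extraction model's training setup
# (Keras assumed; sizes are illustrative placeholders).
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping

VOCAB, EMB, HID = 10000, 100, 256   # assumed sizes

# Encoder: reads the input description; its final state
# initialises the decoder.
enc_in = Input(shape=(None,))
enc_emb = Embedding(VOCAB, EMB, mask_zero=True)(enc_in)
_, h, c = LSTM(HID, return_state=True)(enc_emb)

# Decoder: generates the death cause word by word, from the
# special start token until the end token is produced.
dec_in = Input(shape=(None,))
dec_emb = Embedding(VOCAB, EMB, mask_zero=True)(dec_in)
dec_seq, _, _ = LSTM(HID, return_sequences=True,
                     return_state=True)(dec_emb,
                                        initial_state=[h, c])
out = Dense(VOCAB, activation="softmax")(dec_seq)

model = Model([enc_in, dec_in], out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy")

# At most 100 epochs, batch size 700, early stopping after two
# epochs without improvement in validation loss.
stop = EarlyStopping(monitor="val_loss", patience=2)
# model.fit([enc_x, dec_x], dec_y, batch_size=700, epochs=100,
#           validation_data=..., callbacks=[stop])
\end{verbatim}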
@@ -72,7 +72,7 @@ DCEM-Full & 9 & 0.709 & 0.098 & 0.678 & 0.330 \\
\bottomrule
\end{tabularx}
\caption{Experiment results of our death cause extraction sequence-to-sequence
model concerning balanced (equal number of training data per language) and full
model concerning balanced (equal number of training instances per language) and full
data set settings.}
\end{table}
@@ -80,40 +80,49 @@ data set settings.}
The classification model is responsible for assigning an ICD-10 code to the
death cause description obtained during the first step. Our model uses an
embedding layer with input masking on zero values, followed by a bidirectional LSTM
layer with 256 dimension hidden layer. Thereafter a attention layer builds an
adaptive weighted average over all LSTM states. They ICD-10 code will be
determined by a dense layer with softmax activation function.
We use the Adam optimizer to perform model training. The model was validated on
25\% od the data. As for the extraction model, no cross-validation or
hyperparameter was performed due to time contraints during development. Once
again, we devised two approaches. This was manly influenced by the lack of
layer with a 256-dimensional hidden state. Thereafter an attention layer builds
an adaptive weighted average over all LSTM states. The respective ICD-10 code is
then determined by a dense layer with a softmax activation function. We use the
Adam optimizer to perform model training. The model was validated on 25\% of the
data. As for the extraction model, no cross-validation or hyperparameter
optimization was performed due to time constraints during development.
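As an illustration, a minimal sketch of this classification architecture is given below. A Keras-style implementation is again assumed, and the attention pooling shown is one common way to realise an adaptive weighted average over all LSTM states; vocabulary size, embedding dimension and label count are placeholders:

\begin{verbatim}
# Hypothetical sketch of the ICD-10 classifier (Keras assumed).
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB, EMB, NUM_CODES = 10000, 100, 1000   # assumed sizes

inp = layers.Input(shape=(None,))
x = layers.Embedding(VOCAB, EMB, mask_zero=True)(inp)  # mask zeros
states = layers.Bidirectional(
    layers.LSTM(256, return_sequences=True))(x)  # all LSTM states

# Attention: score each state, softmax-normalise the scores, then
# build the adaptive weighted average over all states.
scores = layers.Dense(1)(states)
weights = layers.Softmax(axis=1)(scores)
context = tf.reduce_sum(weights * states, axis=1)

out = layers.Dense(NUM_CODES, activation="softmax")(context)
model = Model(inp, out)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, validation_split=0.25)  # 25% held out
\end{verbatim}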
Once again, we devised two approaches. This was mainly due to the lack of
adequate training data in terms of coverage for individual ICD-10 codes.
Therefore, we once again defined two datasets: (1) minimal, where only ICD-10
codes with 2 or more supporting data points are used. This, of course, minimizes
the number of ICD-10 codes in the label space. Therefore, (2) an extended
dataset was defined. Here, the original ICD-10 codes mappings, found in the
supplied dictionaries, are extended with the data from individual langugae
Causes Calcules. Finally, for the remaining ICD-10 codes with support of 1 we
duplicate those datapoints. The goal of this approach is to extend our possible
label space to all available ICD-10 codes. The results obtained from the two
approaches are shown in Table \ref{tab:icd10Classification}.
Therefore, we once again defined two training data settings: (1) minimal, where
only ICD-10 codes with two or more supporting training instances are used. This,
of course, minimizes the number of ICD-10 codes in the label space. To counter
this, (2) an extended dataset was defined. Here, the original ICD-10 code
mappings, found in the supplied dictionaries, are extended with the training
instances from the certificate data of the three languages. Finally, for the
remaining ICD-10 codes with only one supporting diagnosis text or death cause
description, we duplicate those data points. The goal of this approach is to
extend our possible label space to all available ICD-10 codes. The results
obtained from the two approaches on the validation set are shown in Table
\ref{tab:icd10Classification}. Using the \textit{minimal} data set the model
achieves an accuracy of 0.937. In contrast, using the \textit{extended} data set
the model reaches an accuracy of 0.954, which represents a relative improvement
of 1.8\%.
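The following sketch shows how these two settings could be constructed. The helper names are hypothetical, and \texttt{dictionary} and \texttt{certificates} stand for lists of text--code pairs; this is our reading of the data, not an interface defined by the task:

\begin{verbatim}
# Hypothetical construction of the two data settings; `dictionary`
# and `certificates` are placeholder lists of (text, icd10) pairs.
from collections import Counter

def build_minimal(dictionary):
    # Setting (1): keep only ICD-10 codes with two or more
    # supporting training instances.
    counts = Counter(code for _, code in dictionary)
    return [(t, c) for t, c in dictionary if counts[c] >= 2]

def build_extended(dictionary, certificates):
    # Setting (2): add the certificate diagnosis texts, then
    # duplicate data points of codes with a single instance so the
    # label space covers all available ICD-10 codes.
    data = list(dictionary) + list(certificates)
    counts = Counter(code for _, code in data)
    data += [(t, c) for t, c in data if counts[c] == 1]
    return data
\end{verbatim}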
\begin{table}
\centering
\begin{tabularx}{\textwidth}{p{2.25cm}|p{1.75cm}|c|c|c|c|c}
\begin{tabularx}{0.85\textwidth}{p{2.25cm}|c|c|c|c|c}
\toprule
\multirow{2}{*}{\textbf{Tokenization}}&\multirow{2}{*}{\textbf{Model}}&\multirow{2}{*}{\textbf{Trained Epochs}}&\multicolumn{2}{c|}{\textbf{Train}}&\multicolumn{2}{c}{\textbf{Validation}} \\
\cline{4-7}
&&&\textbf{Accuracy}&\textbf{Loss}&\textbf{Accuracy}&\textbf{Loss} \\
\multirow{2}{*}{\textbf{Setting}}&\multirow{2}{*}{\textbf{Trained Epochs}}&\multicolumn{2}{c|}{\textbf{Train}}&\multicolumn{2}{c}{\textbf{Validation}} \\
\cline{3-6}
&&\textbf{Accuracy}&\textbf{Loss}&\textbf{Accuracy}&\textbf{Loss} \\
\hline
Word & Minimal & 69 & 0.925 & 0.190 & 0.937 & 0.169 \\
Word & Extended & 41 & 0.950 & 0.156 & 0.954 & 0.141 \\
Character & Minimal & 91 & 0.732 & 1.186 & 0.516 & 2.505 \\
Minimal & 69 & 0.925 & 0.190 & 0.937 & 0.169 \\
Extended & 41 & 0.950 & 0.156 & 0.954 & 0.141 \\
\bottomrule
\end{tabularx}
\caption{Experiment results for our ICD-10 classification model regarding different settings.}
\caption{Experiment results for our ICD-10 classification model regarding different data settings. The \textit{Minimal}
setting uses only ICD-10 codes with two or more training instances in the supplied dictionary. In contrast,
\textit{Extended} additionally takes the diagnosis texts from the certificate data into account and duplicates
training instances of ICD-10 codes with only one supporting diagnosis text across dictionary and certificate lines.}
\label{tab:icd10Classification}
\end{table}
\subsection{Complete Pipeline}
@@ -49,7 +49,7 @@ This paper describes the participation of the WBI team in the CLEF eHealth 2018
shared task 1 (``Multilingual Information Extraction - ICD-10 coding''). Our
contribution focuses on the setup and evaluation of a baseline, language-independent
neural architecture for ICD-10 classification as well as a simple, heuristic
multi-language word embedding technique. The approach builds on two recurrent
multi-language word embedding space. The approach builds on two recurrent
neural network models to extract and classify causes of death from French,
Italian and Hungarian death certificates. First, we employ an LSTM-based
sequence-to-sequence model to obtain a death cause from each death certificate
@@ -57,7 +57,7 @@ line. We then utilize a bidirectional LSTM model with attention mechanism to
assign the respective ICD-10 codes to the obtained death cause descriptions. Both
models take multi-language word embeddings as inputs. During evaluation, our best
model achieves an F-measure of 0.34 for French, 0.45 for Hungarian and 0.77 for
Italian. The results are encouraging for future work as well as extension and
Italian. The results are encouraging for future work as well as the extension and
improvement of the proposed baseline system.
\keywords{ICD-10 coding \and Biomedical information extraction \and Multi-lingual sequence-to-sequence model