diff --git a/paper/10_introduction.tex b/paper/10_introduction.tex
index 3913217d8d118294a8b5bd3d83828f5babcf907b..1f014ca94ec69d3d56b8366282855f5173ba6c20 100644
--- a/paper/10_introduction.tex
+++ b/paper/10_introduction.tex
@@ -1,43 +1,26 @@
-Automatic extraction, classification and analysis of biological and medical
-concepts from unstructured texts, such as scientific publications or electronic
-health documents, is a highly important task to support many applications in
-research, daily clinical routine and policy-making. Computer-aided approaches
-can improve decision making and support clinical processes, for example, by
-giving a more sophisticated overview about a research area, providing detailed
-information about the aetiopathology of a patient or disease patterns. In the
-past years major advances have been made in the area of natural language
-processing. However, improvements in the field of biomedical text mining lag
-behind other domains mainly due to privacy issues and concerns regarding the
-processed data (e.g. electronic health records).
+Automatic extraction, classification and analysis of biological and medical concepts from unstructured texts, such as scientific publications or electronic health documents, is a highly important task to support many applications in research, daily clinical routine and policy-making.
+Computer-aided approaches can improve decision making and support clinical processes, for example, by giving a more sophisticated overview about a research area, providing detailed information about the aetiopathology of a patient or disease patterns.
+In recent years, major advances have been made in the area of natural language processing.
+However, improvements in the field of biomedical text mining lag behind other domains mainly due to privacy issues and concerns regarding the processed data (e.g. electronic health records).
 
-The CLEF eHealth lab attends to this circumstance through organization of
-various shared tasks which aid and support the development of approaches to
-exploit electronically available medical content \cite{suominen_overview_2018}.
-In particular, Task 1 of the lab was concerned with the extraction and
-classification of causes of death from death certificates originating from
-different languages \cite{neveol_clef_2018}. Participants were asked to classify
-the death causes mentioned in the certificates according to the International
-Classification of Disease version 10 (ICD-10). The task has been carried out the
-last two years of the lab, however was only concerned with French and English
-certificates. In contrast, the organizers provided annotated death reports as
-well as ICD-10 dictionaries for French, Italian and Hungarian this year. The
-development of language-independent, multilingual approaches was encouraged.
-
-Inspired by the recent success of recurrent neural network models
-\cite{cho_learning_2014,lample_neural_2016,dyer_transition-based_2015} in
-general and the convincing performance of the work from Miftahutdinov and
-Tutbalina \cite{miftakhutdinov_kfu_2017} in the last year's competition we opt
-for the development of a deep learning model for this year's task. Our work
-introduces a language independent approach for ICD-10 classification using
-multi-language word embeddings and LSTM-based recurrent models. We divide the
-the classification into two tasks. First, we extract the death cause description
-from a certificate line backed by an encoder-decoder model. Given the death
-cause the actual ICD-10 classification will be performed by a separate LSTM
-model. Our work focus on the setup and evaluation of an initial, baseline
-language-independent approach which builds on a heuristic multi-language
-embedding space and therefore only needs one single model for all three data
-sets. Moreover, we tried to as little as possible additional external resources.
+The CLEF eHealth lab attempts to address this through the organization of various shared tasks %which aid and support the development of approaches
+to exploit electronically available medical content \cite{suominen_overview_2018}.
+In particular, Task 1\footnote{\url{https://sites.google.com/view/clef-ehealth-2018/task-1-multilingual-information-extraction-icd10-coding}} of the lab is concerned with the extraction and classification of death causes from death certificates written in different languages \cite{neveol_clef_2018}.
+Participants were asked to classify the death causes mentioned in the certificates according to the International Classification of Diseases version 10 (ICD-10).
+The task %has been carried out the last two years of the lab, however
+was concerned with French and English death certificates in previous years.
+In contrast, this year the organizers provided annotated death reports as well as ICD-10 dictionaries for French, Italian and Hungarian.
+The development of language-independent, multilingual approaches was encouraged.
 
+Inspired by the recent success of recurrent neural network models \cite{cho_learning_2014,lample_neural_2016,dyer_transition-based_2015} in general and the convincing performance of the work from Miftahutdinov and Tutubalina \cite{miftakhutdinov_kfu_2017} in the CLEF eHealth 2017 Task 1 competition, we opt for the development of a deep learning model for this year's task.
+Our work introduces a language-independent approach for ICD-10 classification using multi-language word embeddings and LSTM-based recurrent models.
+We divide the proposed pipeline %$classification
+into two tasks.
+First, we perform Named Entity Recognition (NER), i.e. extract the death cause description from a certificate line, with an encoder-decoder model.
+Given the death cause, Named Entity Normalization (NEN), i.e. assigning an ICD-10 code to the extracted death cause, is performed by a separate LSTM model.
+In this work we present the setup and evaluation of an initial, baseline language-independent approach which builds on a heuristic multi-language embedding space and therefore only needs one single model for all three data sets.
+Moreover, we tried to use as few additional external resources as possible.
+% TODO: add a paragraph about embeddings.
  
 
 
diff --git a/paper/20_related_work.tex b/paper/20_related_work.tex
index bbd7b4f59c1ed331ea6d98b969736c2f8b340ad6..d63f3ab2f49447e316c263c48e91c4b1df7b3ee9 100644
--- a/paper/20_related_work.tex
+++ b/paper/20_related_work.tex
@@ -4,7 +4,7 @@ eHealth lab. Participating teams used a plethora of different approaches to
 tackle the classification problem. The methods can essentially be divided into
 two categories: knowledge-based
 \cite{cabot_sibm_2016,jonnagaddala_automatic_2017,van_mulligen_erasmus_2016} and
-machine learning approaches
+machine learning (ML) approaches
 \cite{dermouche_ecstra-inserm_2016,ebersbach_fusion_2017,ho-dac_litl_2016,miftakhutdinov_kfu_2017}.
 The former relies on lexical sources, medical terminologies and other ontologies
 to match (parts of) the certificate text with entries from the knowledge-bases
@@ -13,14 +13,13 @@ according to a rule framework.  For example, Di Nunzio et al.
 by summing the binary or tf-idf weights of each term of a certificate line
 segment and assign the ICD-10 code with the highest score. In contrast, Ho-Dac
 et al. \cite{ho-dac_litl_2017} treat the problem as information retrieval task
-and utilze the Apache Solr search engine\footnote{\url{http://lucene.apache.org/solr/}}.
+and utilize the Apache Solr search engine\footnote{\url{http://lucene.apache.org/solr/}}.
 
-The machine learning based approaches employ a variety techniques, e.g.
+The ML-based approaches employ a variety of techniques, e.g.
 Conditional Random Fields (CRFs) \cite{ho-dac_litl_2016}, Labeled Latent
 Dirichlet Analysis (LDA) \cite{dermouche_ecstra-inserm_2016} and Support Vector
 Machines (SVMs) \cite{ebersbach_fusion_2017} with diverse hand-crafted features.
 
-
-Most similar to our approach is the work from Miftahutdinov and Tutbalina
+Most similar to our approach is the work from Miftahutdinov and Tutubalina
 \cite{miftakhutdinov_kfu_2017}, which achieved the best results for English
 certificates in the last year's competition. They use a neural LSTM-based
@@ -31,7 +30,7 @@ diagnosis texts of the individual ICD-10 codes is used to integrate prior
 knowledge into the model. The concatenation of both vector representations is
 then used to output the characters and numbers of the ICD-10 code in the
 decoding step. In contrast to their work, our approach introduces a model for
-multi-language ICD-10 classification. We utilitize two separate recurrent neural
+multi-language ICD-10 classification. We utilize two separate recurrent neural
 networks, one sequence to sequence model for death cause extraction and one for
 classification, to predict the ICD-10 codes for a certificate text independent
 from which language they originate.
diff --git a/paper/30_methods_intro.tex b/paper/30_methods_intro.tex
index 0226452b5b412f1af81a958892ea17bb3570f3c9..1f4649c32602151a718891b28e6094820af6d640 100644
--- a/paper/30_methods_intro.tex
+++ b/paper/30_methods_intro.tex
@@ -33,4 +33,6 @@ which make the past and future context available in every time step. A
 bidirectional LSTM model consists of a forward chain, which processes the input
-data from left to right, and and backward chain, consuming the data in the
+data from left to right, and a backward chain, consuming the data in the
 opposite direction. The final representation is typically the concatenation or a
-linear combination of both states. 
\ No newline at end of file
+linear combination of both states. 
+
+% TODO: Aren't we moving this to related work?
\ No newline at end of file
diff --git a/paper/31_methods_seq2seq.tex b/paper/31_methods_seq2seq.tex
index e18d6c1ab14f3592a035dc9a21eab955e9c65eab..42cfde18daeedbffee0add5989b996cbafda651f 100644
--- a/paper/31_methods_seq2seq.tex
+++ b/paper/31_methods_seq2seq.tex
@@ -1,52 +1,33 @@
 \subsection{Death Cause Extraction Model}
-The first step in our pipeline is the extraction of the death cause description
-from a given certificate line. We use the training certificate lines (with their
-corresponding ICD-10 codes) and the ICD-10 dictionaries as basis for our model.
-The dictionaries provide us with death causes resp. diagnosis for each ICD-10
-code. The goal of the model is to reassemble the dictionary death cause
-description text from the certificate line.
+The first step in our pipeline is the extraction of the death cause description from a given certificate line. 
+We use the training certificate lines (with their corresponding ICD-10 codes) and the ICD-10 dictionaries as the basis for our model.
+The dictionaries provide us with death causes or diagnoses for each ICD-10 code.
+The goal of the model is to reassemble the dictionary death cause description text from the certificate line.
 
-For this we adopt the encoder-decoder architecture proposed in
-\cite{sutskever_sequence_2014}. Figure \ref{fig:encoder_decoder} illustrates the
-architecture of the model. As encoder we utilize a forward LSTM model, which
-takes the single words of a certificate line as inputs and scans the line from
-left to right. Each token will be represented using pre-trained fastText
-word embeddings. Word embedding models represent words using a real-valued
-vector and caputure syntactic and semantic similiarities between them. fastText
-embeddings take sub-word information into account during training whereby the
-model is able to provide suitable representations even for unseen,
-out-of-vocabulary words. We utilize fastText embeddings for French, Italian and
-Hungarian trained on Common Crawl and Wikipedia articles\footnote{\url{https://github.com/facebookresearch/fastText/blob/master/docs/crawl-vectors.md}}. 
-Independently from which lanugage a word originates we lookup the word in all
-three embedding models and concatenate the obtained vectors. Through this we get
-a (kind of) multi-language representation of the word. This heuristic
-composition constitutes a naive solution to build a multi-language embedding
-space, however we opted to evaluate this approach as simple baseline for future
-investigations. The encoders final state represents the semantic meaning of the
-certificate line and serves as intial input for decoding process.
+For this we adopt the encoder-decoder architecture proposed in \cite{sutskever_sequence_2014}. Figure \ref{fig:encoder_decoder} illustrates the architecture of the model. 
+As encoder we utilize a forward LSTM model, which takes the single words of a certificate line as inputs and scans the line from left to right.
+Each token is represented using pre-trained fastText\footnote{\url{https://github.com/facebookresearch/fastText/}} word embeddings~\cite{bojanowski_enriching_2016}.
+Word embedding models represent words using a real-valued vector and capture syntactic and semantic similarities between them. 
+fastText embeddings take sub-word information into account during training whereby the model is able to provide suitable representations even for unseen, out-of-vocabulary (OOV) words. 
+We utilize fastText embeddings for French, Italian and Hungarian trained on Common Crawl and Wikipedia articles\footnote{\url{https://github.com/facebookresearch/fastText/blob/master/docs/crawl-vectors.md}}. 
+Regardless of the language a word originates from, we look up the word in all three embedding models and concatenate the obtained vectors.
+Through this we get a (basic) multi-language representation of the word. 
+This heuristic composition constitutes a naive solution to build a multi-language embedding space. 
+However, we opted to evaluate this approach as a simple baseline for future work.
+The encoder's final state forms a semantic representation of the certificate line and serves as the initial input for the decoding process.
 
 \begin{figure}
-\includegraphics[width=\textwidth,trim={0 17cm 0
-3cm},clip=true]{encoder-decoder-model.pdf} \caption{Illustration of the neural
-encoder-decoder model for death cause extraction. The encoder processes a death
-certificate line token-wise from left to right. The final state of the encoder
-forms a semantic representation of the line and serves as initial input for the
-decoding process. The decoder will be trained to predict the death cause
-description text from the provided ICD-10 dictionaries word by word (using
-special tags \textbackslash s and \textbackslash e for start resp. end of a
-sequence). All input tokens will be represented using the concatenation of the
-fastText embeddings \cite{bojanowski_enriching_2016} of all three languages.}
+\includegraphics[width=\textwidth,trim={0 17cm 0 3cm},clip=true]{encoder-decoder-model.pdf} 
+\caption{Illustration of the neural encoder-decoder model for death cause extraction. The encoder processes a death certificate line token-wise from left to right. The final state of the encoder forms a semantic representation of the line and serves as initial input for the decoding process. The decoder is trained to predict the death cause description text from the provided ICD-10 dictionaries word by word (using special tags \textbackslash s and \textbackslash e for the start and end of a sequence, respectively). All input tokens are represented using the concatenation of the fastText embeddings %\cite{bojanowski_enriching_2016} 
+of all three languages.}
 \label{fig:encoder_decoder}
 \end{figure}
 
-As decoder with utilize another LSTM model. The initial input of the decoder is
-the final state of the encoder. Moreover, each token of the dictionary death
-cause description name (padded with special start and end tag) serves as input
-for the different time steps. Again, we use FastEmbeddngs of all three languages
-to represent the token. The decoder predicts one-hot-encoded words of the
-symptom name. During test time we use the encoder to obtain a semantic
-representation of the certificate line and decode the death cause description
-word by word starting with the special start tag. The decoding process finishs
-when the decoder outputs the end tag.
+For the decoder we utilize another LSTM model. The initial input of the decoder is the final state of the encoder model.
+Moreover, each token of the dictionary death cause description name (padded with special start and end tags) serves as input for the different time steps.
+Again, we use fastText embeddings of all three languages to represent the token. 
+The decoder predicts one-hot-encoded words of the symptom name. 
+During test time we use the encoder to obtain a semantic representation of the certificate line and decode the death cause description word by word starting with the special start tag. 
+The decoding process finishes when the decoder outputs the end tag.
 
 
diff --git a/paper/32_methods_icd10.tex b/paper/32_methods_icd10.tex
index 89c2c572dfac5b09cf8785cf4484716852a89c31..263a1826755cfc900787bcc7a68f7da664eebfea 100644
--- a/paper/32_methods_icd10.tex
+++ b/paper/32_methods_icd10.tex
@@ -1,33 +1,22 @@
 \subsection{ICD-10 Classification Model}  
-The second step in our pipeline is to assign a ICD-10 code to the obtained death
-cause description. For this purpose we employ a bidirectional LSTM model which
-is able to capture the past and future context for each token of a death cause description.
-Just as in our encoder-decoder model we encode each token using the
-concatenation of the fastText embeddings of the word from all three languages.
-To enable our model to attend to different parts of the death cause description
-we add an extra attention layer \cite{raffel_feed-forward_2015} to the model.
-Through the attention mechanism our model learns a fixed-sized embedding of the
-death cause description by computing an adaptive weighted average of the state
-sequence of the LSTM model. This allows the model to better integrate
-information over time. Figure \ref{fig:classification-model} presents the
-architecture of our ICD-10 classification model.
+The second step in our pipeline is to assign an ICD-10 code to the generated death cause description.
+For this we employ a bidirectional LSTM model which is able to capture the past and future context for each token of a death cause description.
+Just as in our encoder-decoder model we encode each token using the concatenation of the fastText embeddings of the word from all three languages.
+To enable our model to attend to different parts of the death cause description we add an extra attention layer \cite{raffel_feed-forward_2015} to the model.
+Through the attention mechanism our model learns a fixed-sized embedding of the death cause description by computing an adaptive weighted average of the state sequence of the LSTM model. 
+This allows the model to better integrate information over time. Figure \ref{fig:classification-model} presents the architecture of our ICD-10 classification model.
 
 \begin{figure}
 \centering
-\includegraphics[width=\textwidth,trim={0cm 16.5cm 0cm
-3cm},clip=true]{classification-model.pdf} \caption{Illustration of the neural
-ICD-10 classification model. The model utilizes a bi-directional LSTM layer,
-which processes the death cause description from left to right and vice versa.
-The attention layer summarizes the whole description by computing an adaptive
-weighted average over the LSTM states. The resulting death cause embedding will
-be feed through a softmax layer to get the final classification. Equivalent to
-our encoder-decoder model all input tokens will be represented using the
-concatenation of the fastText embeddings of all three languages.}
+\includegraphics[width=\textwidth,trim={0cm 16.5cm 0cm 3cm},clip=true]{classification-model.pdf} 
+\caption{Illustration of the neural ICD-10 classification model. The model utilizes a bi-directional LSTM layer, which processes the death cause description from left to right and vice versa. 
+The attention layer summarizes the whole description by computing an adaptive weighted average over the LSTM states. 
+The resulting death cause embedding is fed through a softmax layer to get the final classification.
+Equivalent to our encoder-decoder model all input tokens will be represented using the concatenation of the fastText embeddings of all three languages.}
 \label{fig:classification-model}
 \end{figure}
  
-We train the model using the provided ICD-10 dictionaries from all three
-languages. During development we also experimented with character-level RNNs for
-better ICD-10 classification, however couldn't achieve any performance
+We train the model using the provided ICD-10 dictionaries from all three languages. 
+During development we also experimented with character-level RNNs for better ICD-10 classification, but could not achieve any performance
-approvements. 
+improvements. 
 
diff --git a/paper/40_experiments.tex b/paper/40_experiments.tex
index cbe5dbeff9355537627fb59fb7fdaa4e1d360370..2f96df92e98b2d6ed3a7638f05d02667761d7b80 100644
--- a/paper/40_experiments.tex
+++ b/paper/40_experiments.tex
@@ -1,62 +1,39 @@
-In this section we will present experiments and obtained results for the two
-developed models, both individually as well as combined in a pipeline setting.
+In this section we will present experiments and obtained results for the two developed models, both individually as well as combined in a pipeline setting.
 
 \subsection{Training Data and Experiment Setup}
-The CLEF e-Health 2018 Task 1 participants where provided with annotated death
-certificates for the three selected languages: French, Italian and Hungarian.
-Each of the languages is supported by training certificate lines as well as a
-dictionary with death cause descriptions resp. diagnosises for the different ICD-10
-codes. The provided training data sets were imbalanced concerning the different
-languages: the Italian corpora consists of 49,823, French corpora of 77,348\footnote{For
-French we only took the provided data set from 2014.} and Hungarian corpora 323,175
-certificate lines. We split each data set into a training and a hold-out evaluation set. The
-complete training data set was then created by combining the certificate lines
-of all three languages into one data set. Despite the provided certificate data
-we used no further, external knowledge resources or annotated texts were
-incorporated.
-
-Due to time constraints during developement we didn't perform cross-validation
-to optimize the (hyper-) parameters and the inidividual layers of our models. We
-either keep the default values of the hyperparameters or set them to reasonable
-values according to existing work. During model training we shuffle the training
-instances and use varying validation instances to perform a validation of the
-epoch.
-
-As representation for the input tokens of the model we use pre-trained fastText
-word embeddings \cite{bojanowski_enriching_2016}. The embeddings were trained on
-Common Crawl and Wikipedia articles. For the training of the embeddings,
-Bojanowski et al. used the following parameter settings: CBOW with
-position-weights, embedding dimension size 300, with character n-grams of length
-5, a window of size 5 and 10 negatives. Unfortunately, they are trained on
-corpora not related with the biomedical domain and therefore do not represent
-the best possible textual basis for an embedding space for biomedical
-information extraction. Final embedding space used by our models is created by
-concatenating individual embedding vectors for all three languages. Thus the
-input of our model is embedding vector of size 900. All models were implemented
-with the Keras library \footnote{\url{https://keras.io/}} in Version X.X.
+The CLEF eHealth 2018 Task 1 participants were provided with annotated death certificates for the three selected languages: French, Italian and Hungarian.
+Each of the languages is supported by training certificate lines as well as a dictionary with death cause descriptions or diagnoses for the different ICD-10 codes.
+The provided training data sets were imbalanced concerning the different languages: the Italian corpus consists of 49,823, the French corpus of 77,348\footnote{For French we only took the provided data set from 2014.} and the Hungarian corpus of 323,175 certificate lines.
+We split each data set into a training and a hold-out evaluation set. 
+The complete training data set was then created by combining the certificate lines of all three languages into one data set. 
+Besides the provided certificate data, no additional knowledge resources or annotated texts were used.
+
+Due to time constraints during development, no cross-validation was performed to optimize the (hyper-)parameters and the individual layers of our models.
+We either keep the default values of the hyper-parameters or set them to reasonable values according to existing work. 
+During model training we shuffle the training instances and validate each epoch on a varying set of validation instances.
+
+To represent the input tokens of the model we use pre-trained fastText word embeddings. % \cite{bojanowski_enriching\_2016}. The embeddings were trained on Common Crawl and Wikipedia articles. 
+The embeddings were trained using the following parameter settings: CBOW with position-weights, embedding dimension size 300, with character n-grams of length 5, a window of size 5 and 10 negatives.
+Unfortunately, they are trained on corpora unrelated to the biomedical domain and therefore do not represent the best possible textual basis for an embedding space for biomedical information extraction.
+The final embedding space used by our models is created by concatenating the individual embedding vectors for all three languages.
+Thus, the input of our model is an embedding vector of size 900.
+All models were implemented with the Keras library\footnote{\url{https://keras.io/}}.% in Version X.X.
 
 \subsection{Death cause extraction model} 
-To identify possible tokens as candidates for a death cause description, we
-focused on the use of an encoder-decoder model. The encoder uses an embedding
-layer with input masking on zero values and a LSTM layer with 256 units. The 
-encoders output is used as the initial state of the decoder.
-
-The decoder generates, based on the input description from the dictionary and a
-special start token, a death cause word by word. This decoding process continues
-until a special end token is generated. The entire model is optimized using the
-Adam optimization algorithm \cite{kingma_adam:_2014} and a batch size of 700. Model
-training was performed either for 100 epochs or if an early stopping criteria is
-met (no change in validation loss for two epochs).
-
-As the available dataset are imbalanced concerning the different languages, we
-devised two different evaluation settings: (1) DCEM-Balanced, where each
-language was supported by 49.823 randomly drawn instances (size of the smallest
-corpus) and (2) DCEM-Full, where all available data is used. The results,
-obtained on the training and validation set, are shown in Table \ref{tab:s2s}.
-The figures reveal that distribution of training instances per language have a
-huge influence on the performance of the model. The model trained on the
-full training data achieves an accuracy of 0.678 on the validation set. In contrast
-using the balanced data set the model reaches an accuracy of 0.899 (+ 32.5\%).
+To identify possible candidates for a death cause description, we focus on the use of an encoder-decoder model. 
+The encoder model uses an embedding layer with input masking on zero values and an LSTM layer with 256 units.
+The encoder's output is used as the initial state of the decoder model.
+
+Based on the input description from the dictionary and a special start token, the decoder generates a death cause word by word. 
+This decoding process continues until a special end token is generated. 
+The entire model is optimized using the Adam optimization algorithm \cite{kingma_adam:_2014} and a batch size of 700. 
+Model training was performed either for 100 epochs or until an early stopping criterion was met (no change in validation loss for two epochs).
+
+As the available data sets are imbalanced concerning the different languages, we devised two different evaluation settings: (1) DCEM-Balanced, where each language was supported by 49,823 randomly drawn instances (size of the smallest corpus) and (2) DCEM-Full, where all available data is used.
+The results, obtained on the training and validation set, are shown in Table \ref{tab:s2s}.
+The figures reveal that the distribution of training instances per language has a large influence on the performance of the model.
+The model trained on the full training data achieves an accuracy of 0.678 on the validation set. 
+In contrast, using the balanced data set the model reaches an accuracy of 0.899 (+32.5\%).
 
 \begin{table}[]
 \label{tab:s2s}
@@ -68,6 +45,7 @@ using the balanced data set the model reaches an accuracy of 0.899 (+ 32.5\%).
 &&\textbf{Accuracy}&\textbf{Loss}&\textbf{Accuracy}&\textbf{Loss} \\
 \hline
 DCEM-Balanced &  18 & 0.958 & 0.205 & 0.899 & 0.634 \\
+\hline
 DCEM-Full &  9 &0.709 & 0.098 & 0.678 & 0.330  \\
 \bottomrule
 \end{tabularx}
@@ -77,36 +55,28 @@ data set setting.}
 \end{table}
 
 \subsection{ICD-10 Classification Model}
-The classification model is responsible for assigning a ICD-10 code to death
-cause description obtained during the first step. Our model uses an embedding
-layer with input masking on zero values, followed by and bidirectional LSTM
-layer with 256 dimension hidden layer. Thereafter an attention layer builds an
-adaptive weighted average over all LSTM states. The respective ICD-10 code will
-be determined by a dense layer with softmax activation function. We use the Adam
-optimizer to perform model training. The model was validated on 25\% of the
-data. As for the extraction model, no cross-validation or hyperparameter
-optimization was performed due to time contraints during development.
-
-Once again, we devised two approaches. This was mainly caused by the lack of
-adequate training data in terms of coverage for individual ICD-10 codes.
-Therefore, we once again defined two training data settings: (1) minimal, where
-only ICD-10 codes with two or more supporting training instances are used. This,
-of course, minimizes the number of ICD-10 codes in the label space. Therefore,
-(2) an extended dataset was defined. Here, the original ICD-10 code mappings,
-found in the supplied dictionaries, are extended with the training instances
-from individual certificate data from the three languages. Finally, for the
-remaining ICD-10 codes that have only one supporting diagnosis text resp. death
-cause description, we duplicate those data points. The goal of this approach is
-to extend our possible label space to all available ICD-10 codes. The results
-obtained from the two approaches on the validation set are shown in Table
-\ref{tab:icd10Classification}. Using the \textit{minimal} data set the model
-achieves an accuracy of 0.937. In contrast, using the extended data set the
-model reaches an accuracy of 0.954 which represents an improvment of 1.8\%.
+The classification model is responsible for assigning an ICD-10 code to the death cause description obtained during the first step.
+Our model uses an embedding layer with input masking on zero values, followed by a bidirectional LSTM layer with a 256-dimensional hidden layer.
+Thereafter an attention layer builds an adaptive weighted average over all LSTM states. 
+The respective ICD-10 code will be determined by a dense layer with softmax activation function. 
+We use the Adam optimizer to perform model training. 
+The model was validated on 25\% of the data. 
+As for the extraction model, no cross-validation or hyper-parameter optimization was performed due to time constraints during development.
+
+Once again, we devised two approaches, mainly because of the lack of adequate training data in terms of coverage for individual ICD-10 codes.
+We defined two training data settings: (1) minimal (ICD-10\_Minimal), where only ICD-10 codes with two or more supporting training instances are used.
+This, of course, reduces the number of ICD-10 codes in the label space.
+Therefore, (2) an extended (ICD-10\_Extended) data set was defined. Here, the original ICD-10 code mappings, found in the supplied dictionaries, are extended with the training instances from the individual certificate data of the three languages.
+Finally, for the remaining ICD-10 codes that have only one supporting diagnosis text or death cause description, we duplicate those data points.
+The goal of this approach is to extend our possible label space to all available ICD-10 codes. 
+The results obtained from the two approaches on the validation set are shown in Table \ref{tab:icd10Classification}. 
+Using the \textit{minimal} data set, the model achieves an accuracy of 0.937.
+In contrast, using the extended data set, the model reaches an accuracy of 0.954, which represents a relative improvement of 1.8\%.
 
 \begin{table}[]
 \label{tab:icd10Classification}
 \centering
-\begin{tabularx}{0.85\textwidth}{p{2.25cm}|c|c|c|c|c} 
+\begin{tabularx}{0.9\textwidth}{p{2.25cm}|c|c|c|c|c} 
 \toprule
 %\multirow{2}{*}{\textbf{Tokenization}}&\multirow{2}{*}{\textbf{Model}}&\multirow{2}{*}{\textbf{Trained Epochs}}&\multicolumn{2}{c|}{\textbf{Train}}&\multicolumn{2}{c}{\textbf{Validation}} \\
 %\cline{4-7} 
@@ -114,33 +84,36 @@ model reaches an accuracy of 0.954 which represents an improvment of 1.8\%.
 \cline{3-6}
 &&\textbf{Accuracy}&\textbf{Loss}&\textbf{Accuracy}&\textbf{Loss} \\
 \hline
-Minimal &  69 & 0.925 & 0.190 & 0.937 & 0.169 \\
-Extended &  41 & 0.950 & 0.156 & 0.954 & 0.141 \\
+ICD-10\_Minimal &  69 & 0.925 & 0.190 & 0.937 & 0.169 \\
+\hline
+ICD-10\_Extended &  41 & 0.950 & 0.156 & 0.954 & 0.141 \\
 %Character & Minimal &   91 & 0.732 & 1.186 & 0.516 & 2.505 \\
 \bottomrule
 \end{tabularx}
-\caption{Experiment results for our ICD-10 classification model regarding different data settings. The \textit{Minimal} 
-setting uses only ICD-10 codes with two or more training instances in the supplied dictionary. In contrast, 
-\textit{Extended} addtionally takes the diagnosis texts from the certificate data and duplicates 
-ICD-10 training instances with only one diagnosis text in the dictionary and certificate lines.}
+\caption{Experiment results for our ICD-10 classification model regarding different data settings. 
+The \textit{Minimal} setting uses only ICD-10 codes with two or more training instances in the supplied dictionary. 
+In contrast, \textit{Extended} additionally takes the diagnosis texts from the certificate data and duplicates ICD-10 training instances with only one diagnosis text in the dictionary and certificate lines.}
 \end{table}
 
 \subsection{Complete Pipeline}
-The two models where combined to create the final pipeline. We tested both
-neural models in the final pipeline, as their performance differs greatly.
-As both ICD-10 classification models perform similarly, we used the word and
-extended ICD-10 classification model in the final pipeline. The results obtained
-during training are presented in Table \ref{tab:final_train}. Results obtained
-on the evaluation dataset are shown in Table \ref{tab:final_test}.
+The two models were combined to create the final pipeline.
+We tested both extraction models (DCEM-Balanced and DCEM-Full) in the final pipeline, as their performance differs greatly.
+As both ICD-10 classification models perform similarly, we used only the word-based, extended ICD-10 classification model in the final pipeline.
+The results obtained during training are presented in Table \ref{tab:final_train}. 
+Results obtained on the evaluation dataset are shown in Table \ref{tab:final_test}.
 
 \begin{table}[]
 \centering
-\begin{tabular}{|l|l|l|l|}
-Model &  Precision & Recall & F-score \\
-S2S balanced + ICD-10 extended & 0.73 & 0.61 & 0.61 \\
-S2S extended + ICD-10 extended & 0.74 & 0.62 & 0.63 \\
+\begin{tabular}{|l|c|c|c|}
+\toprule
+\textbf{Model} &  \textbf{Precision} & \textbf{Recall} & \textbf{F-score} \\
+\hline
+Final-Balanced & 0.73 & 0.61 & 0.61 \\
+\hline
+Final-Full & 0.74 & 0.62 & 0.63 \\
+\bottomrule
 \end{tabular}
-\caption{Final Pipeline Evaluation}
+\caption{Final Pipeline Performance - Training Data. Final-Balanced = DCEM-Balanced + ICD-10\_Extended. Final-Full = DCEM-Full + ICD-10\_Extended}
 \label{tab:final_train}
 \end{table}
 
@@ -151,8 +124,8 @@ S2S extended + ICD-10 extended & 0.74 & 0.62 & 0.63 \\
 \textbf{Language} & \textbf{Model} & \textbf{Precision} & \textbf{Recall} & \textbf{F-score}\\
 \hline
 \multirow{2}{*}{French}
-& DCEM-Balanced & 0.494 & 0.246 & 0.329 \\
-& DCEM-Full     & 0.512 & 0.253 & 0.339 \\
+& Final-Balanced & 0.494 & 0.246 & 0.329 \\
+& Final-Full     & 0.512 & 0.253 & 0.339 \\
 \cline{2-5}
 & Baseline      & 0.341 & 0.200 & 0.253 \\
 & Average       & 0.723 & 0.410 & 0.507 \\
@@ -160,8 +133,8 @@ S2S extended + ICD-10 extended & 0.74 & 0.62 & 0.63 \\
 \hline
 
 \multirow{2}{*}{Hungarian}
-& DCEM-Balanced & 0.518 & 0.384 & 0.441 \\
-& DCEM-Full     & 0.522 & 0.388 & 0.445 \\
+& Final-Balanced & 0.518 & 0.384 & 0.441 \\
+& Final-Full     & 0.522 & 0.388 & 0.445 \\
 \cline{2-5}
 & Baseline      & 0.243 & 0.174 & 0.202 \\
 & Average       & 0.827 & 0.783 & 0.803 \\
@@ -169,15 +142,15 @@ S2S extended + ICD-10 extended & 0.74 & 0.62 & 0.63 \\
 \hline
 
 \multirow{3}{*}{Italian} 
-& DCEM-Balanced & 0.857 & 0.685 & 0.761 \\
-& DCEM-Full     & 0.862 & 0.689 & 0.766 \\
+& Final-Balanced & 0.857 & 0.685 & 0.761 \\
+& Final-Full     & 0.862 & 0.689 & 0.766 \\
 \cline{2-5}
 & Baseline      & 0,165 & 0.172 & 0.169 \\
 & Average       & 0.844 & 0.760 & 0.799 \\
 & Median        & 0,900 & 0.824 & 0.863 \\
 \bottomrule
 \end{tabularx}
-\caption{Final Pipeline Evaluation}
+\caption{Final Pipeline Performance - Evaluation Data. Final-Balanced = DCEM-Balanced + ICD-10\_Extended. Final-Full = DCEM-Full + ICD-10\_Extended}
 \label{tab:final_test}
 \end{table}
 
diff --git a/paper/50_conclusion.tex b/paper/50_conclusion.tex
index f4e76cfdc61572028f81e0c58251fa9605f01609..c071a5bc14e3ed0fb51b9c56f3dffdbfd34a5fc5 100644
--- a/paper/50_conclusion.tex
+++ b/paper/50_conclusion.tex
@@ -1,33 +1,24 @@
-In this paper we tackled the problem of information extraction of death causes
-in an multilingual environment. The proposed solution was focused on the setup
-and evaluation of an initial language-independent model which relies on a
-heuristic mutual word embedding space for all three languages. The proposed pipeline
-is divided in two steps: possible token describing the death cause are generated
-by using a sequence to sequence model first. Afterwards the generated token
-sequence is normalized to a ICD-10 code using a distinct LSTM-based
-classification model with attention mechanism. During evaluation our best model
-achieves an f-measure of 0.34 for French, 0.45 for Hungarian and 0.77 for
-Italian. The obtained results are encouraging for furthur investigation however
-can't compete with the solutions of the other participants yet.
+In this paper we tackled the problem of information extraction of death causes in a multilingual environment.
+The proposed solution was focused on the setup and evaluation of an initial language-independent model which relies on a
+heuristic mutual word embedding space for all three languages.
+The proposed pipeline is divided into two steps: first, possible tokens describing the death cause are generated by a sequence-to-sequence model.
+Afterwards, the generated token sequence is normalized to an ICD-10 code using a distinct LSTM-based classification model with an attention mechanism.
+During evaluation our best model achieves an F-measure of 0.34 for French, 0.45 for Hungarian and 0.77 for Italian.
+The obtained results encourage further investigation but cannot yet compete with the solutions of the other participants.
  
-We detected several issues with the proposed pipeline. These issues serve as
-prospective future work to us. First of all the representation of the input
-words can be improved in several ways. The word embeddings we used are not
-optimized to the biomedical domain but are trained on general text. Existing
-work was proven that in-domain embeddings improve the quality of achieved
-results. Although this was our initial approach, the difficulties of finding adequate
-in-domain corpora for selected languages has proven to be to a hard to tackle.
-Moreover, the multi-language embedding space is currently heuristically defined
-as concatenation of the three word embeddings models for individual tokens.
-Creating an unified embedding space would create a truly language-independent
-token representation. The improvement of the input layer will be the main focus
-of our future work.
+We detected several issues with the proposed pipeline. 
+These issues are directions for our future work.
+First of all, the representation of the input words can be improved in several ways.
+The word embeddings we used are not optimized for the biomedical domain but are trained on general text.
+Existing work has shown that in-domain embeddings improve the quality of the achieved results.
+Although this was our initial approach, finding adequate in-domain corpora for the selected languages proved too difficult.
+Moreover, the multi-language embedding space is currently heuristically defined as the concatenation of the three word embedding models for individual tokens.
+A unified embedding space would yield a truly language-independent token representation.
+The improvement of the input layer will be the main focus of our future work.
 
-The ICD-10 classification step also suffers from lack of adequate training
-data. Unfortunately, we were unable to obtain extensive ICD-10 dictinaries for all
-languages and therefore can't guarantee the completeness of the ICD-10 label
-space. Another disadvantage of the current pipeline is the missing support for
-mutli-label classification.
+The ICD-10 classification step also suffers from a lack of adequate training data.
+Unfortunately, we were unable to obtain extensive ICD-10 dictionaries for all languages and therefore cannot guarantee the completeness of the ICD-10 label space.
+Another disadvantage of the current pipeline is the missing support for multi-label classification.
 
 
 
diff --git a/paper/wbi-eclef18.tex b/paper/wbi-eclef18.tex
index a11ff54a5d9eb43e2fa973673e80e28db0c6f5bd..d6d00dae21cfdef89a224629f7b5a27bcbdba6e7 100644
--- a/paper/wbi-eclef18.tex
+++ b/paper/wbi-eclef18.tex
@@ -45,20 +45,14 @@ Bioinformatics, \\ Berlin, Germany\\
 \maketitle              % typeset the header of the contribution
 % 
 \begin{abstract}
-This paper describes the participation of the WBI team in the CLEF eHealth 2018
-shared task 1 (``Multilingual Information Extraction - ICD-10 coding''). Our
-contribution focus on the setup and evaluation of a baseline language-independent
-neural architecture for ICD-10 classification as well as a simple, heuristic
-multi-language word embedding space. The approach builds on two recurrent
-neural networks models to extract and classify causes of death from French,
-Italian and Hungarian death certificates. First, we employ a LSTM-based
-sequence-to-sequence model to obtain a death cause from each death certificate
-line. We then utilize a bidirectional LSTM model with attention mechanism to
-assign the respective ICD-10 codes to the received death cause description. Both
-models take multi-language word embeddings as inputs. During evaluation our best
-model achieves an F-measure of 0.34 for French, 0.45 for Hungarian and 0.77 for
-Italian. The results are encouraging for future work as well as the extension and
-improvement of the proposed baseline system.
+This paper describes the participation of the WBI team in the CLEF eHealth 2018 shared task 1 (``Multilingual Information Extraction - ICD-10 coding''). 
+Our contribution focuses on the setup and evaluation of a baseline language-independent neural architecture for ICD-10 classification as well as a simple, heuristic multi-language word embedding space.
+The approach builds on two recurrent neural network models to extract and classify causes of death from French, Italian and Hungarian death certificates.
+First, we employ an LSTM-based sequence-to-sequence model to obtain a death cause from each death certificate line.
+We then utilize a bidirectional LSTM model with an attention mechanism to assign the respective ICD-10 codes to the extracted death cause description.
+Both models take multi-language word embeddings as inputs. 
+During evaluation our best model achieves an F-measure of 0.34 for French, 0.45 for Hungarian and 0.77 for Italian. 
+The results are encouraging for future work as well as the extension and improvement of the proposed baseline system.
 
 \keywords{ICD-10 coding \and Biomedical information extraction \and Multi-lingual sequence-to-sequence model 
 \and Represention learning \and Recurrent neural network \and Attention mechanism \and Multi-language embeddings}