diff --git a/paper/10_introduction.tex b/paper/10_introduction.tex
index 045cff6251abbe30cd1200d2801275576fd28fb8..53b092b03c42e4ea6ba504e4dfcbc91431aa06d8 100644
--- a/paper/10_introduction.tex
+++ b/paper/10_introduction.tex
@@ -3,24 +3,23 @@ Computer-assisted approaches can improve decision making and support clinical pr
 In the past years major advances have been made in the area of natural-language processing (NLP).
 However, improvements in the field of biomedical text mining lag behind other domains mainly due to privacy issues and concerns regarding the processed data (e.g. electronic health records).
 
-The CLEF eHealth lab attends to circumvent this through organization of various shared tasks %which aid and support the development of approaches
+The CLEF eHealth lab\footnote{\url{https://sites.google.com/site/clefehealth/}} aims to circumvent this situation through the organization of various shared tasks %which aid and support the development of approaches
 to exploit electronically available medical content \cite{suominen_overview_2018}.
 In particular, Task 1\footnote{\url{https://sites.google.com/view/clef-ehealth-2018/task-1-multilingual-information-extraction-icd10-coding}} of the lab is concerned with the extraction and classification of death causes from death certificates written in different languages \cite{neveol_clef_2018}.
-Participants were asked to classify the death causes mentioned in the certificates according to the International Classification of Disease version 10 (ICD-10).
+Participants were asked to classify the death causes mentioned in the certificates according to the International Classification of Diseases, version 10 (ICD-10)\footnote{\url{http://www.who.int/classifications/icd/en/}}.
 The task %has been carried out the last two years of the lab, however
 was concerned with French and English death certificates in previous years.
 In contrast, this year the organizers provided annotated death reports as well as ICD-10 dictionaries for French, Italian and Hungarian.
 The development of language-independent, multilingual approaches was encouraged.
 
-Inspired by the recent success of recurrent neural network models (RNN) \cite{cho_learning_2014,lample_neural_2016,dyer_transition-based_2015} in general and the convincing performance of the work from Miftahutdinov and Tutbalina \cite{miftakhutdinov_kfu_2017} in CLEF eHealth Task 1 2017 competition, we opt for the development of a deep learning model for this year's task.
-Our work introduces a language independent approach for ICD-10 classification using multi-language word embeddings and LSTM-based recurrent models.
+Inspired by the recent success of recurrent neural network (RNN) models \cite{cho_learning_2014,lample_neural_2016,dyer_transition-based_2015} in general and the convincing performance of the work by Miftahutdinov and Tutubalina \cite{miftakhutdinov_kfu_2017} in the last edition of the lab, we opt for the development of a deep learning model for this year's competition.
+Our work introduces a prototypical, language-independent approach to ICD-10 classification using multi-language word embeddings and long short-term memory (LSTM) models.
 We divide the proposed pipeline %$classification
 into two tasks.
 First, we perform Named Entity Recognition (NER), i.e. extract the death cause description from a certificate line, with an encoder-decoder model.
-Given the death cause, Named Entity Normalization (NEN), i.e. assigning an ICD-10 code to extracted death cause, is performed by a separate long short-term memory model (LSTM).
-In this work we present the setup and evaluation of an initial, baseline language-independent approach which builds on a heuristic multi-language embedding space and therefore only needs one single model for all three data sets.
-Moreover, we tried to use as little as possible additional external resources.
- 
+Given the death cause, Named Entity Normalization (NEN), i.e. assigning an ICD-10 code to the extracted death cause, is performed by a separate LSTM.
+Our approach builds upon a heuristic multi-language embedding space and therefore needs only a single model for all three data sets.
+With this work we want to evaluate the performance that can be achieved with such a simple shared embedding space.
 
 
 
diff --git a/paper/20_related_work.tex b/paper/20_related_work.tex
index 012886338e52d149536cd74929c48ef7f9448368..9378ec8652a28954f1b1ddcc329b374253d4d460 100644
--- a/paper/20_related_work.tex
+++ b/paper/20_related_work.tex
@@ -11,7 +11,7 @@ RNNs are a widely used technique for sequence learning problems such as machine
 RNNs model dynamic temporal behaviour in sequential data through recurrent
 units, i.e. the hidden, internal state of a unit in one time step depends on the
 internal state of the unit in the previous time step. These feedback connections
-enable the network to memorize information from recent time steps and capture
+enable the network to memorize information from recent time steps and, in principle, to capture
 long-term dependencies. 
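+For reference, the state update of such a basic recurrent unit can be written as
+\begin{equation*}
+h_t = \sigma(W_x x_t + W_h h_{t-1} + b),
+\end{equation*}
+where $x_t$ denotes the input at time step $t$, $h_{t-1}$ the hidden state of the previous time step and $\sigma$ a non-linear activation function.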
 
 However, training of RNNs can be difficult due to the vanishing gradient problem
@@ -32,21 +32,21 @@ linear combination of both states.
 
 \subsection{Word Embeddings} 
 Distributional semantic models (DSMs) have been researched for decades in NLP \cite{turney_frequency_2010}.
-Based on a huge amount of unlabeled texts, DSMs aim to represent words using a real-valued vector (also called embedding) which captures syntactic and semantic similarities between the units.
-Starting with the publication of the work from Collobert et al. \cite{collobert_natural_2011} in 2011, learning embeddings for linguistic units, such as words, sentences or paragraphs, are one of the hot topics in NLP and a plethora of appraoches have been proposed \cite{bojanowski_enriching_2017,mikolov_distributed_2013,peters_deep_2018,pennington_glove_2014}.
+Based on huge amounts of unlabeled text, DSMs aim to represent words using a real-valued vector (also called embedding) which captures syntactic and semantic similarities between the words.
+Starting with the work of Collobert et al. \cite{collobert_natural_2011} in 2011, learning embeddings for linguistic units, such as words, sentences or paragraphs, has become one of the hot topics in NLP and a plethora of approaches have been proposed \cite{bojanowski_enriching_2017,mikolov_distributed_2013,peters_deep_2018,pennington_glove_2014}.
  
 The majority of today's embedding models are based on deep learning models trained to perform some kind of language modeling task \cite{peters_semi-supervised_2017,peters_deep_2018,pinter_mimicking_2017}. 
 The most popular embedding model is the Word2Vec model introduced by Mikolov et al. \cite{mikolov_distributed_2013,mikolov_efficient_2013}. 
 They propose two shallow neural network models, continuous bag-of-words (CBOW) and SkipGram: the former is trained to predict a center word from its context, the latter to predict the context given the center word.
 In contrast, Pennington et al. \cite{pennington_glove_2014} use the ratios of co-occurrence probabilities of two words with a third one to learn vector representations.
-In \cite{peters_deep_2018} deep bi-directional LSTM models will be utilized to learn word embeddings that also capture different contexts of it. 
+In \cite{peters_deep_2018} multi-layer, bi-directional LSTM models are utilized to learn word embeddings that also capture the different contexts a word appears in. 
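+
+For illustration, a minimal example of training such embeddings with the gensim library (version 4 API; the SkipGram objective and a toy corpus are used here):
+\begin{verbatim}
+from gensim.models import Word2Vec
+
+# Toy corpus; in practice DSMs are trained on large unlabeled text collections.
+sentences = [["acute", "myocardial", "infarction"],
+             ["cardiac", "arrest", "after", "infarction"]]
+
+# sg=1 selects the SkipGram objective, sg=0 the CBOW objective.
+model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)
+vector = model.wv["infarction"]  # 50-dimensional word embedding
+\end{verbatim}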
 
 Several recent models focus on the integration of subword and morphological information to provide suitable representations even for unseen, out-of-vocabulary words. 
 For example, Pinter et al. \cite{pinter_mimicking_2017} try to reconstruct a pre-trained word embedding by learning a bi-directional LSTM model on the character level. 
 Similarly, Bojanowski et al. \cite{bojanowski_enriching_2017} adapt the SkipGram model by taking character n-grams into account. 
-Within their so called fastText model they assign a vector representation to each character n-gram and represent words by summing over all of these representations of a word.
+Their so-called fastText model assigns a vector representation to each character n-gram and represents a word as the sum of the vectors of its n-grams.
 
-Next to embeddings that capture word similiarities in one language also multi- resp. cross-lingual approaches have been investigated.
+Besides embeddings that capture word similarities within a single language, multi- and cross-lingual approaches have also been investigated.
 Proposed methods either learn a linear mapping between monolingual representations \cite{faruqui_improving_2014,xing_normalized_2015} or utilize word- \cite{guo_cross-lingual_2015,vyas_sparse_2016}, sentence- \cite{pham_learning_2015} or document-aligned \cite{sogaard_inverted_2015} corpora to build a shared embedding space.
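+
+As a rough illustration of the linear-mapping idea, the following sketch fits a mapping between two monolingual embedding spaces with ordinary least squares (the seed dictionary and all matrices are placeholders):
+\begin{verbatim}
+import numpy as np
+
+# Rows of X (source language) and Y (target language) are the embeddings of
+# word pairs from a small bilingual seed dictionary (placeholder data here).
+X = np.random.randn(500, 300)
+Y = np.random.randn(500, 300)
+
+# Learn a linear map W that minimizes ||XW - Y||^2.
+W = np.linalg.lstsq(X, Y, rcond=None)[0]
+
+# Project a source-language vector into the target space.
+projected = X[0] @ W
+\end{verbatim}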
     
 \subsection{ICD-10 Classification}
@@ -55,14 +55,13 @@ Participating teams used a plethora of different approaches to tackle the classi
 The methods can essentially be divided into two categories: knowledge-based \cite{cabot_sibm_2016,jonnagaddala_automatic_2017,van_mulligen_erasmus_2016} and machine learning (ML) approaches \cite{dermouche_ecstra-inserm_2016,ebersbach_fusion_2017,ho-dac_litl_2016,miftakhutdinov_kfu_2017}.
 The former relies on lexical sources, medical terminologies and other ontologies to match (parts of) the certificate text with entries from the knowledge bases according to a rule framework. 
 For example, Di Nunzio et al. \cite{di_nunzio_lexicon_2017} calculate a score for each ICD-10 dictionary entry by summing the binary or tf-idf weights of each term of a certificate line segment and assign the ICD-10 code with the highest score. 
-In contrast, Ho-Dac et al. \cite{ho-dac_litl_2017} treat the problem as information retrieval task and utilize the Apache Solr search engine\footnote{\url{http://lucene.apache.org/solr/}}.
+In contrast, Ho-Dac et al. \cite{ho-dac_litl_2017} treat the problem as an information retrieval task and utilize the Apache Solr search engine\footnote{\url{http://lucene.apache.org/solr/}} to classify the individual lines.
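+
+The lexicon-scoring idea of Di Nunzio et al. described above can be sketched roughly as follows (dictionary structure and weighting are heavily simplified; all names are illustrative):
+\begin{verbatim}
+def assign_code(line_tokens, icd10_dictionary, term_weights):
+    """Pick the ICD-10 code whose dictionary entry shares the highest
+    summed term weight (e.g. binary or tf-idf) with the line segment."""
+    best_code, best_score = None, 0.0
+    for code, entry_tokens in icd10_dictionary.items():
+        score = sum(term_weights.get(t, 0.0)
+                    for t in line_tokens if t in entry_tokens)
+        if score > best_score:
+            best_code, best_score = code, score
+    return best_code
+\end{verbatim}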
 
 The ML-based approaches employ a variety of techniques, e.g. Conditional Random Fields (CRFs) \cite{ho-dac_litl_2016}, Labeled Latent Dirichlet Allocation (LDA) \cite{dermouche_ecstra-inserm_2016} and Support Vector Machines (SVMs) \cite{ebersbach_fusion_2017} with diverse hand-crafted features.
-
 Most similar to our approach is the work of Miftahutdinov and Tutubalina \cite{miftakhutdinov_kfu_2017}, which achieved the best results for English certificates in last year's competition. 
 They use a neural LSTM-based encoder-decoder model that processes the raw certificate text as input and encodes it into a vector representation. 
-Furthermore a vector which captures the textual similarity between the certificate line and the death causes resp. diagnosis texts of the individual ICD-10 codes is used to integrate prior knowledge into the model. 
+Additionally, a vector which captures the textual similarity between the certificate line and the death cause or diagnosis texts of the individual ICD-10 codes is used to integrate prior knowledge into the model. 
 The concatenation of both vector representations is then used to output the characters and numbers of the ICD-10 code in the decoding step. 
 In contrast to their work, our approach introduces a model for multi-language ICD-10 classification. 
-We utilize two separate RNNs, a sequence-to-sequence model for death cause extraction and a for classification, to predict the ICD-10 codes for a certificate text independent from the original language.
+Moreover, we divide the task into two distinct steps: death cause extraction and ICD-10 classification.
 
diff --git a/paper/30_methods_intro.tex b/paper/30_methods_intro.tex
index ae1fbedfb8e723ca5134802f280c4028df5cf8e2..b724586f3815bc5f8f0411b07261433073bd5449 100644
--- a/paper/30_methods_intro.tex
+++ b/paper/30_methods_intro.tex
@@ -1,4 +1,4 @@
 Our approach models the extraction and classification of death causes as a two-step process. 
 First, we employ a neural, multi-language sequence-to-sequence model to obtain a death cause description for a given death certificate line.
 We then use a second classification model to assign the respective ICD-10 codes to the obtained death cause. 
-The remainder of this section detailed explanation of the architecture of the two models. 
\ No newline at end of file
+The remainder of this section gives a detailed explanation of the architecture of the two models. 
\ No newline at end of file
diff --git a/paper/31_methods_seq2seq.tex b/paper/31_methods_seq2seq.tex
index ba6249e6c008756c7bdc518d14407dc997fdf6c2..282d1857f3b3c6dd7c4085e87f85d80cf44361da 100644
--- a/paper/31_methods_seq2seq.tex
+++ b/paper/31_methods_seq2seq.tex
@@ -1,29 +1,29 @@
 \subsection{Death Cause Extraction Model}
-The first step in our pipeline is the extraction of the death cause description from a given certificate line. 
+The first step in our pipeline is the extraction of the death cause from a given certificate line. 
 We use the training certificate lines (with their corresponding ICD-10 codes) and the ICD-10 dictionaries as the basis for our model. 
-The dictionaries provide us with death causes resp. diagnosis for each ICD-10 code. 
-The goal of the model is to reassemble the dictionary death cause description text from the certificate line.
+The dictionaries provide us with death causes for each ICD-10 code. 
+The goal of the model is to reassemble the dictionary death cause text from the certificate line.
 
 For this we adopt the encoder-decoder architecture proposed in \cite{sutskever_sequence_2014}. Figure \ref{fig:encoder_decoder} illustrates the architecture of the model. 
 As encoder we utilize a unidirectional LSTM model, which takes the individual words of a certificate line as input and scans the line from left to right.
 Each token is represented using pre-trained fastText\footnote{\url{https://github.com/facebookresearch/fastText/}} word embeddings \cite{bojanowski_enriching_2017}.   
 We utilize fastText embedding models for French, Italian and Hungarian trained on Common Crawl and Wikipedia articles\footnote{\url{https://github.com/facebookresearch/fastText/blob/master/docs/crawl-vectors.md}}. 
 Independently of the original language of a word, we represent it by looking it up in all three embedding models and concatenating the obtained vectors. 
-Through this we get a (basic) multi-language representation of the word. 
+Through this we get a simple multi-language representation of the word. 
 This heuristic composition constitutes a naive solution to build a multi-language embedding space. 
 However, we opted to evaluate this approach as a simple baseline for future work. 
 The encoder's final state represents the semantic representation of the certificate line and serves as the initial input for the decoding process.
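+
+The lookup can be sketched as follows (assuming the gensim fastText loader; model paths are illustrative):
+\begin{verbatim}
+import numpy as np
+from gensim.models.fasttext import load_facebook_vectors
+
+# Pre-trained fastText vectors for the three task languages.
+embeddings = [load_facebook_vectors(path) for path in
+              ("cc.fr.300.bin", "cc.it.300.bin", "cc.hu.300.bin")]
+
+def multilang_vector(word):
+    """Concatenate the French, Italian and Hungarian fastText vectors of a
+    word; fastText backs off to character n-grams for unknown words."""
+    return np.concatenate([emb[word] for emb in embeddings])  # 3 x 300 dims
+\end{verbatim}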
 
 \begin{figure}
 \includegraphics[width=\textwidth,trim={0 17cm 0 3cm},clip=true]{encoder-decoder-model.pdf} 
-\caption{Illustration of the neural encoder-decoder model for death cause extraction. The encoder processes a death certificate line token-wise from left to right. The final state of the encoder forms a semantic representation of the line and serves as initial input for the decoding process. The decoder will be trained to predict the death cause description text from the provided ICD-10 dictionaries word by word (using special tags \textbackslash s and \textbackslash e for start resp. end of a sequence). All input tokens will be represented using the concatenation of the fastText embeddings %\cite{bojanowski_enriching_2016} 
+\caption{Illustration of the encoder-decoder model for death cause extraction. The encoder processes a death certificate line token-wise from left to right. The final state of the encoder forms a semantic representation of the line and serves as initial input for the decoding process. The decoder is trained to predict the death cause text from the provided ICD-10 dictionaries word by word (using the special tags \textbackslash s and \textbackslash e to mark the start and end of a sequence). All input tokens are represented using the concatenation of the fastText embeddings %\cite{bojanowski_enriching_2016} 
 of all three languages.}
 \label{fig:encoder_decoder}
 \end{figure} 
 
 For the decoder we utilize another LSTM model. The initial input of the decoder is the final state of the encoder model. 
-Moreover, each token of the dictionary death cause description name (padded with special start and end tag) serves as input for the different time steps. 
-Again, we use fastText embeddings of all three languages to represent the token. 
-The decoder predicts one-hot-encoded words of the symptom name. 
+Moreover, each token of the dictionary death cause text (padded with special start and end tags) serves as (sequential) input. 
+Again, we use fastText embeddings of all three languages to represent the input tokens. 
+The decoder predicts one-hot-encoded words of the death cause. 
 During test time we use the encoder to obtain a semantic representation of the certificate line and decode the death cause description word by word starting with the special start tag. 
 The decoding process finishes when the decoder outputs the end tag.
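+
+The following Keras-style sketch illustrates this training setup (teacher forcing); the layer sizes and vocabulary size are placeholders, not necessarily the settings of our actual submission:
+\begin{verbatim}
+from tensorflow.keras.layers import Input, LSTM, Dense
+from tensorflow.keras.models import Model
+
+EMB_DIM = 3 * 300   # concatenated fastText vectors (fr + it + hu)
+HIDDEN = 256        # LSTM size (illustrative)
+VOCAB = 20000       # death cause output vocabulary size (illustrative)
+
+# Encoder: reads the embedded certificate line and keeps its final state.
+enc_in = Input(shape=(None, EMB_DIM))
+_, state_h, state_c = LSTM(HIDDEN, return_state=True)(enc_in)
+
+# Decoder: consumes the embedded dictionary death cause tokens (starting
+# with the start tag) and is initialised with the encoder state.
+dec_in = Input(shape=(None, EMB_DIM))
+dec_seq = LSTM(HIDDEN, return_sequences=True)(dec_in,
+                                              initial_state=[state_h, state_c])
+dec_out = Dense(VOCAB, activation="softmax")(dec_seq)
+
+model = Model([enc_in, dec_in], dec_out)
+model.compile(optimizer="adam", loss="categorical_crossentropy")
+\end{verbatim}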
diff --git a/paper/32_methods_icd10.tex b/paper/32_methods_icd10.tex
index 073cfff66c52198f52a9c4c1195b7cfcfd3ec454..d91df973d688c3e5a232b9609a22b08e18e954f0 100644
--- a/paper/32_methods_icd10.tex
+++ b/paper/32_methods_icd10.tex
@@ -2,21 +2,17 @@
 The second step in our pipeline is to assign an ICD-10 code to the generated death cause description. 
 For this we employ a bidirectional LSTM model which is able to capture the past and future context for each token of a death cause description.
 Just as in our encoder-decoder model we encode each token using the concatenation of the fastText embeddings of the word from all three languages.
-To enable our model to attend to different parts of the death cause description we add an extra attention layer \cite{raffel_feed-forward_2016} to the model.
+To enable the model to attend to different parts of the death cause, we add an extra attention layer \cite{raffel_feed-forward_2016}.
 Through the attention mechanism our model learns a fixed-size embedding of the death cause description by computing an adaptive weighted average of the state sequence of the LSTM model. 
 This allows the model to better integrate information over time. Figure \ref{fig:classification-model} presents the architecture of our ICD-10 classification model.
+We train the model using the provided ICD-10 dictionaries from all three languages.
 
 \begin{figure}
 \centering
 \includegraphics[width=\textwidth,trim={0cm 16.5cm 0cm 3cm},clip=true]{classification-model.pdf} 
-\caption{Illustration of the neural ICD-10 classification model. The model utilizes a bi-directional LSTM layer, which processes the death cause description from left to right and vice versa. 
+\caption{Illustration of the ICD-10 classification model. The model utilizes a bi-directional LSTM layer, which processes the death cause from left to right and vice versa. 
 The attention layer summarizes the whole description by computing an adaptive weighted average over the LSTM states. 
 The resulting death cause embedding is fed through a softmax layer to obtain the final classification. 
 As in our encoder-decoder model, all input tokens are represented using the concatenation of the fastText embeddings of all three languages.}
 \label{fig:classification-model}
-\end{figure}
- 
-We train the model using the provided ICD-10 dictionaries from all three languages. 
-During development we also experimented with character-level RNNs for better ICD-10 classification, however couldn't achieve any performance
-approvements. 
-
+\end{figure}
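+
+The following Keras-style sketch illustrates this classification architecture; the attention scoring is reduced to a single dense projection and all sizes are placeholders:
+\begin{verbatim}
+import tensorflow as tf
+from tensorflow.keras import layers, Model
+
+EMB_DIM = 3 * 300   # concatenated fastText vectors (fr + it + hu)
+HIDDEN = 128        # per-direction LSTM size (illustrative)
+NUM_CODES = 2000    # number of ICD-10 codes (illustrative)
+
+tokens = layers.Input(shape=(None, EMB_DIM))
+states = layers.Bidirectional(layers.LSTM(HIDDEN, return_sequences=True))(tokens)
+
+# Feed-forward attention: one score per time step, softmax-normalised over
+# time and used as weights for an adaptive average of the LSTM states.
+scores = layers.Dense(1, activation="tanh")(states)
+alphas = layers.Softmax(axis=1)(scores)
+summary = tf.reduce_sum(alphas * states, axis=1)  # death cause embedding
+
+probs = layers.Dense(NUM_CODES, activation="softmax")(summary)
+model = Model(tokens, probs)
+model.compile(optimizer="adam", loss="categorical_crossentropy")
+\end{verbatim}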