diff --git a/paper/20_related_work.tex b/paper/20_related_work.tex
index 00d9f8e2eab982e6a24a69f07abadfc4ab972583..690a80fa2d5579c51318d6acfc4339811f4d6649 100644
--- a/paper/20_related_work.tex
+++ b/paper/20_related_work.tex
@@ -4,20 +4,22 @@ eHealth lab. Participating teams used a plethora of different approaches to
 tackle the classification problem. The methods can essentially be divided into
 two categories: knowledge-based
 \cite{cabot_sibm_2016,jonnagaddala_automatic_2017,van_mulligen_erasmus_2016} and
-machine learning approaches. The former relies on lexical sources, medical
-terminologies and other ontologies to match (parts of) the certificate text with
-entries from the knowledge-bases according to a rule framework.  For example, Di
-Nunzio et al. \cite{di_nunzio_lexicon_2017} calculate a score for each ICD-10
-dictionary entry by summing the binary or tf-idf weights of each term of a
-certificate line segment and assign the ICD-10 code with the highest score. In
-contrast, Ho-Dac et al. \cite{ho-dac_litl_2017} treat the problem as information
-retrieval task and utilze the SOLR search engine.
+machine learning approaches
+\cite{dermouche_ecstra-inserm_2016,ebersbach_fusion_2017,ho-dac_litl_2016,miftakhutdinov_kfu_2017}.
+The former relies on lexical sources, medical terminologies and other ontologies
+to match (parts of) the certificate text with entries from the knowledge bases
+according to a rule framework. For example, Di Nunzio et al.
+\cite{di_nunzio_lexicon_2017} calculate a score for each ICD-10 dictionary entry
+by summing the binary or tf-idf weights of each term of a certificate line
+segment and assign the ICD-10 code with the highest score. In contrast, Ho-Dac
+et al. \cite{ho-dac_litl_2017} treat the problem as an information retrieval
+task and utilize the Solr search engine.
 
 The machine learning based approaches employ a variety techniques, e.g.
 Conditional Random Fields (CRFs) \cite{ho-dac_litl_2016}, Labeled Latent
-Dirichlet Analysis (LDA) \cite{dermouche_ecstra} and Support Vector Machines
+Dirichlet Allocation (LDA) \cite{dermouche_ecstra-inserm_2016} and Support Vector Machines
 (SVMs) \cite{ebersbach_fusion_2017} with diverse hand-crafted features. Most
-similar to our approach is the work from Miftahutdinov and Tutbalina \cite{},
+similar to our approach is the work of Miftahutdinov and Tutubalina \cite{miftakhutdinov_kfu_2017},
 which achieved the best results for English certificates in the last year's
 competition. They use a neural LSTM-based encoder-decoder model that processes the raw
 certificate text as input and encodes it into a vector representation.
@@ -29,7 +31,7 @@ ICD-10 code in the decoding step. In contrast to their work, our approach
 introduces a model for multi-language ICD-10 classification. We utilitize two
 separate recurrent neural networks, one sequence to sequence model for symptom
 extraction and one for classification, to predict the ICD-10 codes for a
-certificate text independent from which language they originate. 
+certificate text independent from which language they originate.