diff --git a/paper/40_experiments.tex b/paper/40_experiments.tex
index cbc1d58c34743f62469b5415fc621f4d49ae9d4e..171c3aff2e92867a8236b0e79dc528ed68238b20 100644
--- a/paper/40_experiments.tex
+++ b/paper/40_experiments.tex
@@ -31,7 +31,7 @@ The entire model is optimized using the Adam optimization algorithm \cite{kingma
 Model training was performed either for 100 epochs or until an early stopping criterion was met (no change in validation loss for two epochs).
 
 As the provided datasets are imbalanced regarding the tasks' languages, we devised two different evaluation settings: (1) DCEM-Balanced, where each language is represented by 49,823 randomly drawn instances (the size of the smallest corpus), and (2) DCEM-Full, where all available data is used.
-The results, obtained on the training and validation set, are shown in Table \ref{tab:s2s}.
+Table \ref{tab:s2s} shows the results obtained on the training and validation set.
 The figures reveal that the distribution of training instances per language has a considerable influence on the performance of the model.
 The model trained on the full training data achieves an accuracy of 0.678 on the validation set.
 In contrast, the model trained on the balanced data set reaches an accuracy of 0.899 (+32.5\%).
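The early-stopping criterion above (stop when the validation loss has not improved for two consecutive epochs) can be sketched as a standalone check. This is a minimal illustration only; the function name and the `patience`/`min_delta` parameters are assumptions, not taken from the paper's actual training code.

```python
def should_stop(val_losses, patience=2, min_delta=0.0):
    """Return True when the last `patience` epochs brought no improvement.

    An epoch counts as an improvement only if its validation loss is more
    than `min_delta` below the best loss seen before the patience window.
    """
    if len(val_losses) <= patience:
        return False  # not enough history to judge
    best_before = min(val_losses[:-patience])
    recent = val_losses[-patience:]
    return all(loss >= best_before - min_delta for loss in recent)

# Loss plateaus for two epochs -> training would be stopped:
stop = should_stop([0.9, 0.7, 0.7, 0.7])
# Loss still improving -> training continues:
cont = should_stop([0.9, 0.7, 0.6])
```

With `patience=2` this mirrors the "no change in validation loss for two epochs" rule; frameworks such as Keras expose the same idea through an `EarlyStopping` callback.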
@@ -45,7 +45,7 @@ In contrast using the balanced data set the model reaches an accuracy of 0.899 (
 \cline{3-6}
 &&\textbf{Accuracy}&\textbf{Loss}&\textbf{Accuracy}&\textbf{Loss} \\
 \hline
-DCEM-Balanced&  18 & 0.958 & 0.205 & 0.899 & 0.634 \\
+DCEM-Balanced &  18 & 0.958 & 0.205 & 0.899 & 0.634 \\
 \hline
 DCEM-Full &  9 &0.709 & 0.098 & 0.678 & 0.330  \\
 \bottomrule
@@ -106,6 +106,7 @@ We tested both death cause extraction models (based on the balanced and unbalanc
 In contrast, both ICD-10 classification models perform similarly, so we used only the extended ICD-10 classification model, with word-level tokens\footnote{Although models supporting character-level tokens were developed and evaluated, their performance fared poorly compared to the word-level tokens.}, in the final pipeline.
 To evaluate the pipeline, we built a training set and a hold-out validation set during development.
 The obtained results on the validation set are presented in Table \ref{tab:final_train}. 
+The scores are calculated using a prevalence-weighted macro-average across the output classes, i.e., we calculated precision, recall and F-score for each ICD-10 code and averaged them, weighting each code's scores by its number of occurrences in the gold standard.
 
 \begin{table}[t!]
 \centering
@@ -118,7 +119,9 @@ Final-Balanced & 0.73 & 0.61 & 0.61 \\
 Final-Full & 0.74 & 0.62 & 0.63 \\
 \bottomrule
 \end{tabular}
-\caption{Evaluation results of the final pipeline on the validation set of the training data. Final-Balanced = DCEM-Balanced + ICD-10\_Extended. Final-Full = DCEM-Full + ICD-10\_Extended}
+\caption{Evaluation results of the final pipeline on the validation set of the training data. Reported figures represent
+the prevalence-weighted macro-average across the output classes. Final-Balanced = DCEM-Balanced + ICD-10\_Extended. 
+Final-Full = DCEM-Full + ICD-10\_Extended}
 \end{table}
 
 Although the individual models, as shown in Tables \ref{tab:s2s} and \ref{tab:icd10Classification}, are promising, the performance decreases considerably in a pipeline setting. %, by roughly a third.
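The prevalence-weighted macro-average described for Table \ref{tab:final_train} can be sketched in pure Python: compute precision, recall and F-score per class, then weight each class's scores by its support in the gold standard. The helper name and the toy ICD-10 codes below are illustrative, not drawn from the paper's data; scikit-learn's `precision_recall_fscore_support(..., average='weighted')` implements the same averaging.

```python
from collections import Counter

def weighted_macro_prf(gold, pred):
    """Prevalence-weighted macro-average of per-class precision/recall/F1.

    Each class's scores are weighted by its number of occurrences in the
    gold standard (its support), summed, and divided by the total support.
    """
    support = Counter(gold)
    total = sum(support.values())
    w_p = w_r = w_f = 0.0
    for cls, n in support.items():
        tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
        fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
        fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        w_p += n * prec
        w_r += n * rec
        w_f += n * f1
    return w_p / total, w_r / total, w_f / total

# Toy example with hypothetical ICD-10 codes:
gold = ["I21", "I21", "I21", "J18", "C34"]
pred = ["I21", "I21", "J18", "J18", "I21"]
p, r, f = weighted_macro_prf(gold, pred)
```

Note that classes absent from the predictions still contribute zero-valued scores weighted by their gold support, which is why this average penalizes missed rare codes.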
diff --git a/paper/references.bib b/paper/references.bib
index 95c0f2f502a2c30f22a11c831a3a6c16adba83f0..1dbd54967535b49075ca8313ca710284a47e0683 100644
--- a/paper/references.bib
+++ b/paper/references.bib
@@ -268,7 +268,7 @@ The system proposed in this study provides automatic identification and characte
 	title = {{CLEF} {eHealth} 2018 {Multilingual} {Information} {Extraction} task {Overview}: {ICD}10 {Coding} of {Death} {Certificates} in {French}, {Hungarian}  and {Italian}},
 	booktitle = {{CLEF} 2018 {Evaluation} {Labs} and {Workshop}: {Online} {Working} {Notes}},
 	publisher = {CEUR-WS},
-	author = {Névéol, Aurélie and Robert, Aude and Grippo, F and Lavergne, Thomas and Morgand, C and Orsi, C and Pelikán, L and Ramadier, Lionel and Rey, Grégoire and Zweigenbaum, Pierre},
+	author = {Névéol, Aurélie and Robert, Aude and Grippo, F and Morgand, C and Orsi, C and Pelikán, L and Ramadier, Lionel and Rey, Grégoire and Zweigenbaum, Pierre},
 	year = {2018}
 }