diff --git a/paper/20_related_work.tex b/paper/20_related_work.tex
index 00d9f8e2eab982e6a24a69f07abadfc4ab972583..690a80fa2d5579c51318d6acfc4339811f4d6649 100644
--- a/paper/20_related_work.tex
+++ b/paper/20_related_work.tex
@@ -4,20 +4,22 @@ eHealth lab. Participating teams used a plethora of different approaches to
 tackle the classification problem. The methods can essentially be divided into
 two categories: knowledge-based
 \cite{cabot_sibm_2016,jonnagaddala_automatic_2017,van_mulligen_erasmus_2016} and
-machine learning approaches. The former relies on lexical sources, medical
-terminologies and other ontologies to match (parts of) the certificate text with
-entries from the knowledge-bases according to a rule framework. For example, Di
-Nunzio et al. \cite{di_nunzio_lexicon_2017} calculate a score for each ICD-10
-dictionary entry by summing the binary or tf-idf weights of each term of a
-certificate line segment and assign the ICD-10 code with the highest score. In
-contrast, Ho-Dac et al. \cite{ho-dac_litl_2017} treat the problem as information
-retrieval task and utilze the SOLR search engine.
+machine learning approaches
+\cite{dermouche_ecstra-inserm_2016,ebersbach_fusion_2017,ho-dac_litl_2016,miftakhutdinov_kfu_2017}.
+The former relies on lexical sources, medical terminologies and other ontologies
+to match (parts of) the certificate text with entries from the knowledge-bases
+according to a rule framework. For example, Di Nunzio et al.
+\cite{di_nunzio_lexicon_2017} calculate a score for each ICD-10 dictionary entry
+by summing the binary or tf-idf weights of each term of a certificate line
+segment and assign the ICD-10 code with the highest score. In contrast, Ho-Dac
+et al. \cite{ho-dac_litl_2017} treat the problem as information retrieval task
+and utilze the SOLR search engine.
 
 The machine learning based approaches employ a variety techniques, e.g.
 Conditional Random Fields (CRFs) \cite{ho-dac_litl_2016}, Labeled Latent
-Dirichlet Analysis (LDA) \cite{dermouche_ecstra} and Support Vector Machines
+Dirichlet Analysis (LDA) \cite{dermouche_ecstra-inserm_2016} and Support Vector Machines
 (SVMs) \cite{ebersbach_fusion_2017} with diverse hand-crafted features. Most
-similar to our approach is the work from Miftahutdinov and Tutbalina \cite{},
+similar to our approach is the work from Miftahutdinov and Tutbalina \cite{miftakhutdinov_kfu_2017},
 which achieved the best results for English certificates in the last year's
 competition. They use a neural LSTM-based encoder-decoder model that processes
 the raw certificate text as input and encodes it into a vector representation.
@@ -29,7 +31,7 @@ ICD-10 code in the decoding step. In contrast to their work, our approach
 introduces a model for multi-language ICD-10 classification. We utilitize two
 separate recurrent neural networks, one sequence to sequence model for symptom
 extraction and one for classification, to predict the ICD-10 codes for a
-certificate text independent from which language they originate.
+certificate text independent from which language they originate.