Commit b2e0e924 authored by Mario Sänger

Updated introduction + added explanation of ICD-10 model + updated tables in experiments

parent ff54c4bd
Tutubalina \cite{miftakhutdinov_kfu_2017} in the last year's competition, we opt
for the development of a deep learning model for this year's task. Our work
introduces a language-independent approach for ICD-10 classification using
multi-language word embeddings and LSTM-based recurrent models. We divide the
classification into two tasks. First, we extract the death cause description
from a certificate line using an encoder-decoder model. Given the death cause
description, the actual ICD-10 classification is performed by a separate LSTM
model. Our work focuses on the introduction of, and experiments with, a
language-independent approach which requires as few additional resources as
possible and needs only a single model for all three languages.
which achieved the best results for English certificates in the last year's
competition. They use a neural LSTM-based encoder-decoder model that processes the raw
certificate text as input and encodes it into a vector representation.
Furthermore, a vector which captures the textual similarity between the
certificate line and the death cause or diagnosis texts of the individual
ICD-10 codes
is used to integrate prior knowledge into the model. The concatenation of both
vector representations is then used to output the characters and numbers of the
ICD-10 code in the decoding step. In contrast to their work, our approach
introduces a model for multi-language ICD-10 classification. We utilize two
separate recurrent neural networks, one sequence-to-sequence model for death cause
extraction and one for classification, to predict the ICD-10 codes for a
certificate text independent of the language it originates from.
Our approach models the extraction and classification of death causes as
a two-step process. First, we employ a neural, multi-language sequence-to-sequence
model to obtain a death cause description for a given death certificate line. We then
use a second classification model to assign the respective ICD-10 codes to the
obtained death cause. The remainder of this section gives a short introduction
to recurrent neural networks, followed by a detailed explanation of our two models.
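
To make the interplay of the two models explicit, the following sketch outlines
the prediction pipeline; \texttt{generate} and \texttt{classify} are
placeholder names and do not reflect the exact interface of our implementation.
\begin{verbatim}
# Hypothetical sketch of the two-step prediction pipeline.
def predict_icd10(certificate_line, extraction_model, classification_model):
    # Step 1: generate a death cause description for the certificate line
    # using the multi-language sequence-to-sequence model.
    death_cause = extraction_model.generate(certificate_line)

    # Step 2: assign an ICD-10 code to the generated description
    # using the attention-based LSTM classifier.
    return classification_model.classify(death_cause)
\end{verbatim}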
\subsection{Recurrent neural networks}
\subsection{Death Cause Extraction Model}
The first step in our pipeline is the extraction of the death cause description
from a given certificate line. We use the training certificate lines (with their
corresponding ICD-10 codes) and the ICD-10 dictionaries as the basis for our model.
The dictionaries provide us with a death cause or diagnosis text for each ICD-10
code. The goal of the model is to reassemble this dictionary death cause
description from the certificate line.
For this we adopt the encoder-decoder architecture proposed in
\cite{sutskever_sequence_2014}. Figure \ref{fig:encoder_decoder} illustrates
the architecture of the model. As encoder we utilize an LSTM model which
processes the certificate line token by token; each token is represented by the
concatenated FastText embeddings of all three languages. The encoder's final
state represents the semantic meaning of the certificate line and serves as
initial input for the decoding process.
\begin{figure}
\includegraphics[width=\textwidth,trim={0 17cm 0 3cm},clip=true]{encoder-decoder-model.pdf}
\caption{Illustration of the neural encoder-decoder model for death cause
extraction. The encoder processes a death certificate line token-wise from left
to right. The final state of the encoder forms a semantic representation of the
line and serves as initial input for the decoding process. The decoder is
trained to predict the death cause description text from the provided ICD-10
dictionaries word by word (using the special tags \textbackslash s and
\textbackslash e for the start and end of a sequence, respectively). All input
tokens are represented using the concatenation of the FastText embeddings of
all three languages.}
\label{fig:encoder_decoder}
\end{figure}
As decoder we utilize another LSTM model. The initial input of the decoder is
the final state of the encoder. Moreover, each token of the dictionary death
cause description (padded with special start and end tags) serves as input for
the different time steps. Again, we use the FastText embeddings of all three
languages to represent the tokens. The decoder predicts one-hot-encoded words
of the death cause description. During test time we use the encoder to obtain a
semantic representation of the certificate line and decode the death cause
description word by word, starting with the special start tag. The decoding
process finishes when the decoder outputs the end tag.
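
A minimal sketch of this greedy decoding procedure is shown below; it assumes
separate Keras inference models \texttt{encoder\_model} and
\texttt{decoder\_model} (the usual way to run a trained encoder-decoder step by
step) and uses illustrative variable names rather than our exact implementation.
\begin{verbatim}
import numpy as np

def decode_death_cause(line_input, encoder_model, decoder_model,
                       start_id, end_id, id2word, max_len=30):
    # Obtain the semantic representation (final LSTM states) of the line.
    state_h, state_c = encoder_model.predict(line_input)

    token_id, words = start_id, []               # start with the \s tag
    for _ in range(max_len):
        # Predict the next word from the previous token and current states.
        probs, state_h, state_c = decoder_model.predict(
            [np.array([[token_id]]), state_h, state_c])
        token_id = int(np.argmax(probs[0, -1]))
        if token_id == end_id:                   # stop at the \e tag
            break
        words.append(id2word[token_id])
    return " ".join(words)
\end{verbatim}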
\subsection{ICD-10 Classification Model}
The second step in our pipeline is to assign an ICD-10 code to the obtained
death cause description. For this purpose we employ a bidirectional LSTM model
which is able to capture the past and future context for each token of a death
cause description. Just as in our encoder-decoder model, we encode each token
using the concatenation of the FastText embeddings of the word from all three
languages.
To enable our model to attend to different parts of the death cause description
we add an extra attention layer \cite{raffel_feed-forward_2015} to the model.
Through the attention mechanism our model learns a fixed-size embedding of the
death cause description by computing an adaptive weighted average of the state
sequence of the LSTM model. This allows the model to better integrate
information over time. Figure \ref{fig:classification-model} presents the
architecture of our ICD-10 classification model.
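
More precisely, the feed-forward attention of \cite{raffel_feed-forward_2015}
computes a scalar score $e_t$ for every LSTM state $h_t$, normalizes the scores
over the sequence and uses them to form the weighted average $c$ that serves as
death cause embedding:
\[
e_t = a(h_t), \qquad
\alpha_t = \frac{\exp(e_t)}{\sum_{k=1}^{T}\exp(e_k)}, \qquad
c = \sum_{t=1}^{T} \alpha_t h_t ,
\]
where $a$ is a small learnable feed-forward function, for instance
$a(h_t) = \tanh(w^{\top} h_t + b)$.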
\begin{figure}
\centering
\includegraphics[width=\textwidth,trim={0cm 16.5cm 0cm 3cm},clip=true]{classification-model.pdf}
\caption{Illustration of the neural ICD-10 classification model. The model
utilizes a bi-directional LSTM layer, which processes the death cause
description from left to right and vice versa. The attention layer summarizes
the whole description by computing an adaptive weighted average over the LSTM
states. The resulting death cause embedding is fed through a softmax layer to
obtain the final classification. Equivalent to our encoder-decoder model, all
input tokens are represented using the concatenation of the FastText embeddings
of all three languages.}
\label{fig:classification-model}
\end{figure}
We train the model using the provided ICD-10 dictionaries from all three
languages. During development we also experimented with character-level RNNs
for better ICD-10 classification, however we couldn't achieve any performance
improvements.
In this section we will present experiments and obtained results for the two
developed models, both individually as well as combined in a pipeline setting.
\subsection{Training Data and Experiment Setup}
The CLEF eHealth 2018 Task 1 participants were provided with annotated death
certificates for the three selected languages: French, Italian and Hungarian.
Each of the languages is supported by several data sources. The provided data
sets are imbalanced concerning the different languages: the Italian corpus
consists of 49,823, the French corpus of 77,348\footnote{For French we only
used the provided data from 2014.} and the Hungarian corpus of 323,175
certificate lines.
The training data used in this approach was created by combining the data
sources of all three languages. Apart from the provided certificate data we
used no further, external data sources. Each dataset was split into a training
and a hold-out evaluation set. We did not perform cross-validation during
development; however, we shuffled the training and validation data before each
training epoch. Moreover, no hyperparameter optimization was performed due to
time constraints during the development phase. Instead, we used the default
parameter values of the individual layers.
We used pre-trained
fastText\footnote{https://github.com/facebookresearch/fastText/blob/master/docs/crawl-vectors.md}
word embeddings \cite{bojanowski_enriching_2016}. The embeddings were trained
on Common Crawl and a Wikipedia dump using CBOW with position-weights, an
embedding dimension of 300, character n-grams of length 5, a window of size 5
and 10 negative samples. Unfortunately, they are trained on corpora not related
to the biomedical domain and therefore do not represent the best possible
embedding space for biomedical information extraction. The final embedding
space used by our models is created by concatenating the individual embedding
vectors of all three languages. Thus, the input to our models is an embedding
vector of size 900. All models were implemented with the Keras
library\footnote{https://keras.io/}.
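
As an illustration, the embedding lookup can be sketched with the official
fastText Python bindings as follows; the file names refer to the published
Common Crawl/Wikipedia vectors, and the actual preprocessing in our
implementation may differ in detail.
\begin{verbatim}
import numpy as np
import fasttext

# Pre-trained 300-dimensional vectors for the three task languages.
models = [fasttext.load_model(path) for path in
          ("cc.fr.300.bin", "cc.it.300.bin", "cc.hu.300.bin")]

def embed_token(token):
    # Concatenate the French, Italian and Hungarian fastText vectors of the
    # token into a single 900-dimensional input representation.
    return np.concatenate([m.get_word_vector(token) for m in models])

print(embed_token("pneumonie").shape)  # -> (900,)
\end{verbatim}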
\subsection{Death Cause Extraction Model}
To identify possible tokens as candidates for a death cause description, we
focused on the use of an encoder-decoder model. The encoder uses an embedding
layer with input masking on zero values and an LSTM layer with 256 units. The
encoder's output is used as the initial state of the decoder.
The decoder generates, based on the input description from the dictionary and a
special start token, a death cause description word by word. This decoding
process continues until a special end token is generated. The entire model is
optimized using the Adam optimizer with a batch size of 700. Model training was
performed for at most 100 epochs or until an early stopping criterion was met
(no change in validation loss for two epochs).
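
A minimal Keras sketch of such a training model is given below. Vocabulary
sizes and the 900-dimensional embedding matrices are placeholders; the sketch
mirrors the configuration described above but is not our exact implementation.
\begin{verbatim}
import numpy as np
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model
from keras.callbacks import EarlyStopping

# Placeholder sizes; the real vocabularies and fastText matrices are
# built from the training data and the pre-trained embeddings.
src_vocab, tgt_vocab, emb_dim = 5000, 3000, 900
src_matrix = np.zeros((src_vocab, emb_dim))
tgt_matrix = np.zeros((tgt_vocab, emb_dim))

# Encoder: masked embedding lookup followed by a 256-unit LSTM.
enc_in = Input(shape=(None,))
enc_emb = Embedding(src_vocab, emb_dim, weights=[src_matrix],
                    mask_zero=True, trainable=False)(enc_in)
_, state_h, state_c = LSTM(256, return_state=True)(enc_emb)

# Decoder: initialised with the encoder states, predicts the next word.
dec_in = Input(shape=(None,))
dec_emb = Embedding(tgt_vocab, emb_dim, weights=[tgt_matrix],
                    mask_zero=True, trainable=False)(dec_in)
dec_seq = LSTM(256, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])
probs = Dense(tgt_vocab, activation="softmax")(dec_seq)

model = Model([enc_in, dec_in], probs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
early_stop = EarlyStopping(monitor="val_loss", patience=2)
# model.fit([enc_x, dec_x], dec_y_one_hot, batch_size=700, epochs=100,
#           validation_split=0.25, callbacks=[early_stop])
\end{verbatim}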
As the available datasets are imbalanced concerning the different languages,
we devised two approaches: (1) DCEM-Balanced, where each language was supported
by 49,823 randomly drawn data points (the size of the smallest corpus), and (2)
DCEM-Full, where all available data is used. The results, obtained on the
validation set, are shown in Table \ref{tab:s2s}.
\begin{table}[]
\centering
\begin{tabularx}{0.9\textwidth}{p{3cm}|c|c|c|c|c}
\toprule
\multirow{2}{*}{\textbf{Setting}} & \multirow{2}{*}{\textbf{Trained Epochs}}&\multicolumn{2}{c|}{\textbf{Train}}&\multicolumn{2}{c}{\textbf{Validation}} \\
\cline{3-6}
&&\textbf{Accuracy}&\textbf{Loss}&\textbf{Accuracy}&\textbf{Loss} \\
\hline
DCEM-Balanced & 18 & 0.958 & 0.205 & 0.899 & 0.634 \\
DCEM-Full & 9 &0.709 & 0.098 & 0.678 & 0.330 \\
\bottomrule
\end{tabularx}
\caption{Experiment results of our death cause extraction sequence-to-sequence
model concerning balanced (equal number of training data per language) and full
data set setting.}
\label{tab:s2s}
\end{table}
\subsection{ICD-10 Classification Model}
The classification model is responsible for assigning an ICD-10 code to the
death cause description obtained during the first step. Our model uses an
embedding layer with input masking on zero values, followed by a bidirectional
LSTM layer with 256 hidden units. Thereafter, an attention layer computes an
adaptive weighted average over all LSTM states. The ICD-10 code is then
determined by a dense layer with a softmax activation function.
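
The following Keras sketch illustrates this architecture with placeholder
vocabulary and label sizes; input masking is omitted for brevity and the
attention layer is written out explicitly as the weighted average described
above.
\begin{verbatim}
import numpy as np
from keras.layers import Input, Embedding, Bidirectional, LSTM, Dense, Lambda
from keras.models import Model
import keras.backend as K

# Placeholder sizes; the real vocabulary, 900-dimensional fastText matrix
# and ICD-10 label set are built from the provided dictionaries.
vocab, emb_dim, num_codes = 5000, 900, 2000
emb_matrix = np.zeros((vocab, emb_dim))

def attention_pool(tensors):
    # Feed-forward attention: normalise the scalar scores over the time
    # axis and build the weighted average of the BiLSTM states.
    states, scores = tensors
    weights = K.exp(scores)
    weights = weights / K.sum(weights, axis=1, keepdims=True)
    return K.sum(states * weights, axis=1)

tok_in = Input(shape=(None,))
emb = Embedding(vocab, emb_dim, weights=[emb_matrix],
                trainable=False)(tok_in)
states = Bidirectional(LSTM(256, return_sequences=True))(emb)
scores = Dense(1, activation="tanh")(states)       # one score per time step
pooled = Lambda(attention_pool)([states, scores])  # death cause embedding
probs = Dense(num_codes, activation="softmax")(pooled)

model = Model(tok_in, probs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
\end{verbatim}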
We use the Adam optimizer to perform model training. The model was validated
on 25\% of the data. As for the extraction model, no cross-validation or
hyperparameter optimization was performed due to time constraints during
development. Motivated by the lack of adequate training data in terms of
coverage for the individual ICD-10 codes, we once again defined two datasets:
(1) a minimal dataset, where only ICD-10 codes with two or more supporting data
points are used. This, of course, reduces the number of ICD-10 codes in the
label space. Therefore, we also defined (2) an extended dataset, where the
original ICD-10 code mappings found in the supplied dictionaries are extended
with the data from the individual languages' Causes Calcules. Finally, for the
remaining ICD-10 codes with a support of one we duplicate those data points.
The goal of this approach is to extend the possible label space to all
available ICD-10 codes. The results obtained with the two approaches are shown
in Table \ref{tab:icd10Classification}.
\begin{table}[]
\centering
\begin{tabularx}{\textwidth}{p{2.25cm}|p{1.75cm}|c|c|c|c|c}
\toprule
\multirow{2}{*}{\textbf{Tokenization}}&\multirow{2}{*}{\textbf{Model}}&\multirow{2}{*}{\textbf{Trained Epochs}}&\multicolumn{2}{c|}{\textbf{Train}}&\multicolumn{2}{c}{\textbf{Validation}} \\
\cline{4-7}
&&&\textbf{Accuracy}&\textbf{Loss}&\textbf{Accuracy}&\textbf{Loss} \\
\hline
Word & Minimal & 69 & 0.925 & 0.190 & 0.937 & 0.169 \\
Word & Extended & 41 & 0.950 & 0.156 & 0.954 & 0.141 \\
Character & Minimal & 91 & 0.732 & 1.186 & 0.516 & 2.505 \\
\bottomrule
\end{tabularx}
\caption{Experiment results for our ICD-10 classification model regarding different settings.}
\label{tab:icd10Classification}
\end{table}
\subsection{Complete Pipeline}
The two models were combined to create the final pipeline. We tested both
death cause extraction models in the final pipeline, as their performance
differs greatly. As both ICD-10 classification models perform similarly, we
used the word-level, extended ICD-10 classification model in the final
pipeline. The results obtained during training are presented in Table
\ref{tab:final_train}. Results obtained on the evaluation dataset are shown in
Table \ref{tab:final_test}.
\begin{table}[]
\centering
\begin{tabular}{l|c|c|c}
\toprule
\textbf{Model} & \textbf{Precision} & \textbf{Recall} & \textbf{F-score} \\
\hline
DCEM-Balanced + ICD-10 extended & 0.73 & 0.61 & 0.61 \\
DCEM-Full + ICD-10 extended & 0.74 & 0.62 & 0.63 \\
\bottomrule
\end{tabular}
\caption{Results of the complete pipeline on the held-out validation data.}
\label{tab:final_train}
\end{table}
\begin{table}[]
\centering
\begin{tabularx}{0.8\textwidth}{p{2cm}|p{3cm}|c|c|c}
\toprule
\textbf{Language} & \textbf{Model} & \textbf{Precision} & \textbf{Recall} & \textbf{F-score}\\
\hline
\multirow{5}{*}{French}
& DCEM-Balanced & 0.494 & 0.246 & 0.329 \\
& DCEM-Full & 0.512 & 0.253 & 0.339 \\
\cline{2-5}
& Baseline & 0.341 & 0.200 & 0.253 \\
& Average & 0.723 & 0.410 & 0.507 \\
& Median & 0.798 & 0.475 & 0.579 \\
\hline
\multirow{5}{*}{Hungarian}
& DCEM-Balanced & 0.518 & 0.384 & 0.441 \\
& DCEM-Full & 0.522 & 0.388 & 0.445 \\
\cline{2-5}
& Baseline & 0.243 & 0.174 & 0.202 \\
& Average & 0.827 & 0.783 & 0.803 \\
& Median & 0.922 & 0.897 & 0.910 \\
\hline
\multirow{5}{*}{Italian}
& DCEM-Balanced & 0.857 & 0.685 & 0.761 \\
& DCEM-Full & 0.862 & 0.689 & 0.766 \\
\cline{2-5}
& Baseline & 0.165 & 0.172 & 0.169 \\
& Average & 0.844 & 0.760 & 0.799 \\
& Median & 0.900 & 0.824 & 0.863 \\
\bottomrule
\end{tabularx}
\caption{Results of the complete pipeline on the evaluation dataset for the three languages, compared to the baseline, average and median results.}
\label{tab:final_test}
\end{table}
In this paper we tackled the problem of information extraction of death causes
in a multilingual environment. The proposed solution focuses on
language-independent models and relies on word embeddings for each of the
languages.
The proposed pipeline is divided into two steps: (1) first, tokens describing the death cause are generated using a sequence-to-sequence model; then, (2) the generated token sequence is normalized to an ICD-10 code.
We detected several issues with the proposed pipeline, which also point to prospective future work.
Creating a unifying embedding space would yield a truly language-independent approach.
Additionally, it has been shown that in-domain embeddings improve the quality of the achieved results. This will be the main focus of our future work.
The normalization step also suffered from a lack of adequate training data.
Unfortunately, we were unable to obtain ICD-10 dictionaries for all languages
and can, therefore, not guarantee the completeness of the ICD-10 label space.
Another downside of the proposed pipeline is the lack of support for multi-label classification.
@article{bojanowski_enriching_2016,
title = {Enriching {Word} {Vectors} with {Subword} {Information}},
url = {http://arxiv.org/abs/1607.04606},
abstract = {Continuous word representations, trained on large unlabeled corpora are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words, by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skipgram model, where each word is represented as a bag of character \$n\$-grams. A vector representation is associated to each character \$n\$-gram; words being represented as the sum of these representations. Our method is fast, allowing to train models on large corpora quickly and allows us to compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, both on word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks.},
urldate = {2018-03-12},
journal = {arXiv:1607.04606 [cs]},
author = {Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
month = jul,
year = {2016},
note = {arXiv: 1607.04606},
keywords = {Read, Embeddings, Word Embeddings, FastText},
file = {arXiv\:1607.04606 PDF:/Users/mario/Zotero/storage/9WC5C7M6/Bojanowski et al. - 2016 - Enriching Word Vectors with Subword Information.pdf:application/pdf;arXiv.org Snapshot:/Users/mario/Zotero/storage/YPS6YZHR/1607.html:text/html}
}
@inproceedings{neveol_clef_2017,
title = {{CLEF} {eHealth} 2017 {Multilingual} {Information} {Extraction} task overview: {ICD}10 coding of death certificates in {English} and {French}},
shorttitle = {{CLEF} {eHealth} 2017 {Multilingual} {Information} {Extraction} task overview},
publisher = {CEUR-WS},
author = {Miftakhutdinov, Zulfat and Tutubalina, Elena},
year = {2017},
keywords = {Read, CLEF, ICD-10-Classification},
file = {Fulltext:/Users/mario/Zotero/storage/HRZ6Q8Q6/Miftakhutdinov und Tutubalina - 2017 - Kfu at clef ehealth 2017 task 1 Icd-10 coding of .pdf:application/pdf;Snapshot:/Users/mario/Zotero/storage/J8TXTUNT/Miftakhutdinov und Tutubalina - 2017 - Kfu at clef ehealth 2017 task 1 Icd-10 coding of .pdf:application/pdf}
}
author = {Ebersbach, Mike and Herms, Robert and Eibl, Maximilian},
year = {2017},
file = {Fulltext:/Users/mario/Zotero/storage/LKIZA2P4/Ebersbach et al. - 2017 - Fusion Methods for ICD10 Code Classification of De.pdf:application/pdf;Snapshot:/Users/mario/Zotero/storage/CIX48RIC/Ebersbach et al. - 2017 - Fusion Methods for ICD10 Code Classification of De.pdf:application/pdf}
}
@inproceedings{xu_show_2015,
title = {Show, attend and tell: {Neural} image caption generation with visual attention},
shorttitle = {Show, attend and tell},
booktitle = {International {Conference} on {Machine} {Learning}},
author = {Xu, Kelvin and Ba, Jimmy and Kiros, Ryan and Cho, Kyunghyun and Courville, Aaron and Salakhudinov, Ruslan and Zemel, Rich and Bengio, Yoshua},
year = {2015},
pages = {2048--2057},
file = {Fulltext:/Users/mario/Zotero/storage/QASCM4G3/Xu et al. - 2015 - Show, attend and tell Neural image caption genera.pdf:application/pdf;Snapshot:/Users/mario/Zotero/storage/VILIPKYC/Xu et al. - 2015 - Show, attend and tell Neural image caption genera.pdf:application/pdf}
}
@inproceedings{chan_listen_2016,
title = {Listen, attend and spell: {A} neural network for large vocabulary conversational speech recognition},
shorttitle = {Listen, attend and spell},
booktitle = {Acoustics, {Speech} and {Signal} {Processing} ({ICASSP}), 2016 {IEEE} {International} {Conference} on},
publisher = {IEEE},
author = {Chan, William and Jaitly, Navdeep and Le, Quoc and Vinyals, Oriol},
year = {2016},
pages = {4960--4964},
file = {Fulltext:/Users/mario/Zotero/storage/ZV5B2GQJ/Chan et al. - 2016 - Listen, attend and spell A neural network for lar.pdf:application/pdf;Snapshot:/Users/mario/Zotero/storage/RS8MBCM8/7472621.html:text/html}
}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\usepackage{color}
\usepackage{multirow,tabularx}
\usepackage{booktabs}
% Used for displaying a sample figure. If possible, figure files should
% be included in EPS format.