The generated sequence of tokens is then passed on to the next step for normalization.
This model consists of two parts: the encoder and the decoder.
The encoder uses an embedding layer with input masking on zero values and an LSTM with 256 dimensions.
The encoder's output is used as the initial state of the decoder.
The decoder employs the same architecture, followed by a dense layer and a softmax activation function.
Based on the input sentence and a start token, the model generates tokens from the vocabulary until it generates the end token.
The entire model is optimized using the Adam optimizer, with a batch size of 700.
The model trains for at most 100 epochs or until an early stopping criterion is met (no change in validation loss for two epochs).
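The architecture described above can be expressed as a Keras-style sketch, shown below; the 256-dimensional LSTM, zero-masking, Adam optimizer, batch size of 700, and the early stopping criterion come from the text, while the vocabulary size and embedding dimension are illustrative assumptions.
\begin{verbatim}
# Minimal sketch of the described encoder-decoder model. Vocabulary size and
# embedding dimension are placeholders, not values from the paper.
from tensorflow.keras import layers, models, callbacks

VOCAB_SIZE = 20000   # assumed vocabulary size
EMBED_DIM = 128      # assumed embedding dimension
LSTM_DIM = 256       # 256-dimensional LSTM, as described

# Encoder: embedding with zero-masking, LSTM returning its final states.
enc_in = layers.Input(shape=(None,), name="encoder_tokens")
enc_emb = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(enc_in)
_, state_h, state_c = layers.LSTM(LSTM_DIM, return_state=True)(enc_emb)

# Decoder: same architecture, initialised with the encoder states,
# followed by a dense layer with a softmax over the vocabulary.
dec_in = layers.Input(shape=(None,), name="decoder_tokens")
dec_emb = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(dec_in)
dec_seq = layers.LSTM(LSTM_DIM, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])
probs = layers.Dense(VOCAB_SIZE, activation="softmax")(dec_seq)

model = models.Model([enc_in, dec_in], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training setup: batch size 700, at most 100 epochs, early stopping on
# validation loss with a patience of two epochs.
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=2)
# model.fit([enc_x, dec_x], dec_y, batch_size=700, epochs=100,
#           validation_data=([enc_xv, dec_xv], dec_yv), callbacks=[early_stop])
\end{verbatim}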
As the available dataset is highly imbalanced, we devised two approaches: (1) balanced, where each language is supported by 49,823 randomly drawn data points (the size of the smallest corpus), and (2) extended, where all available data is used.
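The balanced setting amounts to downsampling every language corpus to the size of the smallest one; a possible sketch with pandas is given below, where the data frame layout and its language column are assumptions.
\begin{verbatim}
# Sketch of the "balanced" setting: downsample every language to the size of
# the smallest corpus. The DataFrame layout ("language" column) is assumed.
import pandas as pd

def balance_per_language(df: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    smallest = df["language"].value_counts().min()
    return (df.groupby("language", group_keys=False)
              .apply(lambda g: g.sample(n=smallest, random_state=seed)))
\end{verbatim}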
The results, obtained on the validation set, are shown in Table \ref{tab:s2s}.
\begin{table}[]
...
...
The model itself uses an embedding layer with input masking on zero values.
It is followed by an attention layer and a dense layer with a softmax activation function.
Adam was used as the optimizer.
The model was validated on 25\% of the data.
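A possible reading of this classifier is sketched below; the zero-masked embedding, the attention and softmax layers, the Adam optimizer, and the 25\% validation split come from the text, while the vocabulary size, embedding dimension, label-space size, and the concrete (dot-product self-)attention variant are assumptions.
\begin{verbatim}
# Sketch of the ICD-10 classification model: zero-masked embedding, an
# attention layer, and a dense softmax over the ICD-10 label space.
# Sizes and the exact attention variant are assumptions.
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000   # assumed vocabulary size
EMBED_DIM = 128      # assumed embedding dimension
NUM_CODES = 1000     # assumed number of ICD-10 codes in the label space

tokens = layers.Input(shape=(None,), name="tokens")
emb = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(tokens)
att = layers.Attention()([emb, emb])           # dot-product self-attention
pooled = layers.GlobalAveragePooling1D()(att)  # collapse the time dimension
probs = layers.Dense(NUM_CODES, activation="softmax")(pooled)

model = models.Model(tokens, probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x, y, validation_split=0.25, ...)  # 25% of the data held out
\end{verbatim}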
Again, no cross-validation or hyperparameter optimization was performed.
Once again, we devised two approaches.
This was mainly influenced by the lack of adequate training data in terms of coverage for individual ICD-10 codes.
Therefore, we defined two datasets: (1) minimal, where only ICD-10 codes with two or more supporting data points are used.
This, of course, reduces the number of ICD-10 codes in the label space.
Therefore, the (2) extended dataset was defined.
Here, the original ICD-10 code mappings, found in the supplied dictionaries, are extended with the data from the individual language Causes Calcules.
Finally, for the remaining ICD-10 codes with a support of 1, we duplicate those data points.
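This construction of the extended dataset can be expressed as two data-frame operations, sketched below with pandas; the data frames and their column names are assumptions.
\begin{verbatim}
# Sketch of the "extended" dataset: merge the dictionary mappings with the
# per-language Causes Calcules data, then duplicate every data point whose
# ICD-10 code has a support of 1. DataFrame and column names are assumed.
import pandas as pd

def build_extended_dataset(dictionary_df: pd.DataFrame,
                           causes_calcules_df: pd.DataFrame) -> pd.DataFrame:
    extended = pd.concat([dictionary_df, causes_calcules_df],
                         ignore_index=True)
    support = extended["icd10"].value_counts()
    singletons = extended[extended["icd10"].isin(support[support == 1].index)]
    return pd.concat([extended, singletons], ignore_index=True)
\end{verbatim}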
...
...
The results obtained from the two approaches are shown in Table \ref{tab:icd10Cl}.
\begin{table}[]
\centering
\begin{tabular}{l|l|l|l|l|l|l}
Mode & Model & Epochs Trained & Train Accuracy & Train Loss & Validation Accuracy & Validation Loss \\
In this paper we tackled the problem of information extraction of death causes in a multilingual environment.
The proposed solution focuses on language-independent models and relies on word embeddings for each of the languages.
The proposed pipeline is divided into two steps: (1) first, possible tokens describing the death cause are generated using a sequence-to-sequence model with an attention mechanism; then, (2) the generated token sequence is normalized to a possible ICD-10 code.