Commit 9163335a authored by Mario Sänger

Fix references

parent f5caafac
......@@ -4,7 +4,7 @@ The section concludes with a summary of ICD-10 classification approaches used in
\subsection{Recurrent neural networks (RNN)}
RNNs are a widely used technique for sequence learning problems such as machine translation
\cite{bahdanau_neural_2014,cho_learning_2014}, image captioning
\cite{bahdanau_neural_2018,cho_learning_2014}, image captioning
\cite{bengio_scheduled_2015}, named entity recognition
\cite{lample_neural_2016,wei_disease_2016}, dependency parsing
\cite{dyer_transition-based_2015} and POS-tagging \cite{wang_part--speech_2015}.
......@@ -33,22 +33,22 @@ linear combination of both states.
\subsection{Word Embeddings}
Distributional semantic models (DSMs) have been researched for decades in NLP \cite{turney_frequency_2010}.
Based on large amounts of unlabeled text, DSMs aim to represent words as real-valued vectors (also called embeddings) which capture syntactic and semantic similarities between linguistic units.
Starting with the publication of the work of Collobert et al. \cite{collobert_natural_2011} in 2011, learning embeddings for linguistic units, such as words, sentences or paragraphs, has become one of the hot topics in NLP and a plethora of approaches have been proposed \cite{bojanowski_enriching_2016,mikolov_distributed_2013,peters_deep_2018,pennington_glove}.
Starting with the publication of the work of Collobert et al. \cite{collobert_natural_2011} in 2011, learning embeddings for linguistic units, such as words, sentences or paragraphs, has become one of the hot topics in NLP and a plethora of approaches have been proposed \cite{bojanowski_enriching_2017,mikolov_distributed_2013,peters_deep_2018,pennington_glove_2014}.
The majority of today's embedding models are based on deep learning models trained to perform some kind of language modeling task \cite{peters_semi-supervised_2017,peters_deep_2018,pinter_mimicking_2017}.
The most popular embedding model is the Word2Vec model introduced by Mikolov et al. \cite{mikolov_distributed_2013, mikolov_efficient_2013}.
The most popular embedding model is the Word2Vec model introduced by Mikolov et al. \cite{mikolov_distributed_2013,mikolov_efficient_2013}.
They propose two shallow neural network models, continuous bag-of-words (CBOW) and SkipGram, that are trained to reconstruct the context given a center word and vice versa.
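To make the SkipGram training setup concrete, the following minimal sketch (a hypothetical helper, not taken from the cited work) enumerates the (center, context) pairs such a model is trained on:
\begin{verbatim}
def skipgram_pairs(tokens, window=2):
    # Enumerate (center, context) training pairs within a fixed window.
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

# Example: [('immediate', 'cause'), ('cause', 'immediate'), ('cause', 'of'), ...]
print(skipgram_pairs(["immediate", "cause", "of", "death"], window=1))
\end{verbatim}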
In contrast, Pennington et al. \cite{pennington_glove:_2014} use the ratio of the co-occurrence probabilities of two words with a third probe word to learn vector representations.
In contrast, Pennington et al. \cite{pennington_glove_2014} use the ratio of the co-occurrence probabilities of two words with a third probe word to learn vector representations.
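For reference, the resulting GloVe objective can be written as follows, with co-occurrence counts $X_{ij}$, word and context vectors $w_i$, $\tilde{w}_j$, biases $b_i$, $\tilde{b}_j$ and weighting function $f$ as defined in \cite{pennington_glove_2014}:
\begin{equation}
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2
\end{equation}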
In \cite{peters_deep_2018}, deep bi-directional LSTM models are utilized to learn word embeddings that also capture the different contexts in which a word occurs.
Several recent models focus on the integration of subword and morphological information to provide suitable representations even for unseen, out-of-vocabulary words.
For example, Pinter et al. \cite{pinter_mimicking_2017} try to reconstruct a pre-trained word embedding by training a bi-directional LSTM model on the character level.
Similarly, Bojanowski et al. \cite{bojanowski_enriching_2016} adapt the SkipGram model by taking character n-grams into account.
Similarly, Bojanowski et al. \cite{bojanowski_enriching_2017} adapt the SkipGram model by taking character n-grams into account.
In their so-called fastText model they assign a vector representation to each character n-gram and represent a word by summing over the representations of all its n-grams.
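The following sketch illustrates this composition (the vectors and the n-gram inventory are made up for illustration; the actual fastText implementation additionally includes a vector for the word itself):
\begin{verbatim}
import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    # fastText-style n-grams over the word padded with boundary symbols.
    padded = "<" + word + ">"
    return [padded[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def word_vector(word, ngram_vectors, dim=300):
    # Sum the vectors of all known character n-grams of the word.
    vec = np.zeros(dim)
    for ng in char_ngrams(word):
        if ng in ngram_vectors:
            vec += ngram_vectors[ng]
    return vec

# Toy example with random n-gram vectors instead of trained ones.
rng = np.random.default_rng(0)
ngram_vectors = {ng: rng.normal(size=300) for ng in char_ngrams("pneumonia")}
print(word_vector("pneumonia", ngram_vectors).shape)  # (300,)
\end{verbatim}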
In addition to embeddings that capture word similarities within one language, multi- and cross-lingual approaches have also been investigated.
Proposed methods either learn a linear mapping between monolingual representations \cite{faruqui_improving_2014,xing_normalized_2015} or utilize word- \cite{guo_cross-lingual_2015,vyas_sparse_2016}, sentence- \cite{pham_learning_2015} or document-aligned \cite{sogaard_inverted_2015} corpora to build a shared embedding space.
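As an illustration of the first family of methods, a linear mapping between two monolingual embedding spaces can be estimated from a seed dictionary of translation pairs via least squares (a simplified sketch with random stand-in vectors; published methods typically add constraints such as orthogonality):
\begin{verbatim}
import numpy as np

# X: source-language vectors, Y: target-language vectors of translation pairs.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 300))   # e.g. embeddings of seed words
Y = rng.normal(size=(500, 300))   # embeddings of their translations

# Solve min_W ||X W - Y||_F, then map any source vector into the target space.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
mapped = X @ W
\end{verbatim}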
\subsection{ICD-10 Classification}
The ICD-10 coding task has already been carried out in the 2016 \cite{neveol_clinical_2016} and 2017 \cite{neveol_clef_2017} editions of the eHealth lab.
Participating teams used a plethora of different approaches to tackle the classification problem.
......@@ -66,10 +66,3 @@ The concatenation of both vector representations is then used to output the char
In contrast to their work, our approach introduces a model for multi-language ICD-10 classification.
We utilize two separate RNNs, a sequence-to-sequence model for death cause extraction and one for classification, to predict the ICD-10 codes for a certificate text independently of the original language.
......@@ -6,7 +6,7 @@ The goal of the model is to reassemble the dictionary death cause description te
For this we adopt the encoder-decoder architecture proposed in \cite{sutskever_sequence_2014}. Figure \ref{fig:encoder_decoder} illustrates the architecture of the model.
As encoder we utilize a unidirectional LSTM model, which takes the individual words of a certificate line as input and scans the line from left to right.
Each token is represented using pre-trained fastText\footnote{\url{https://github.com/facebookresearch/fastText/}} word embeddings \cite{bojanowski_enriching_2016}.
Each token is represented using pre-trained fastText\footnote{\url{https://github.com/facebookresearch/fastText/}} word embeddings \cite{bojanowski_enriching_2017}.
We utilize fastText embedding models for French, Italian and Hungarian trained on Common Crawl and Wikipedia articles\footnote{\url{https://github.com/facebookresearch/fastText/blob/master/docs/crawl-vectors.md}}.
Independently of a word's original language, we represent it by looking it up in all three embedding models and concatenating the obtained vectors.
In this way we obtain a (basic) multi-language representation of the word.
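A minimal sketch of this lookup, assuming the three pre-trained embedding models have been loaded into plain dictionaries mapping words to vectors (in practice the fastText bindings would also produce vectors for out-of-vocabulary words via their subword units):
\begin{verbatim}
import numpy as np

def multilingual_vector(word, fr_vecs, it_vecs, hu_vecs, dim=300):
    # Look the word up in the French, Italian and Hungarian embedding
    # models and concatenate the three vectors (zeros for missing words).
    parts = []
    for vecs in (fr_vecs, it_vecs, hu_vecs):
        parts.append(vecs.get(word, np.zeros(dim)))
    return np.concatenate(parts)   # shape: (3 * dim,)
\end{verbatim}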
......@@ -19,7 +19,7 @@ Encoders' final state represents the semantic representation of the certificate
\caption{Illustration of the neural encoder-decoder model for death cause extraction. The encoder processes a death certificate line token-wise from left to right. The final state of the encoder forms a semantic representation of the line and serves as initial input for the decoding process. The decoder is trained to predict the death cause description text from the provided ICD-10 dictionaries word by word (using special tags \textbackslash s and \textbackslash e for the start and end of a sequence, respectively). All input tokens are represented using the concatenation of the fastText embeddings %\cite{bojanowski_enriching_2016}
of all three languages.}
\label{fig:encoder_decoder}
\end{figure}
\end{figure}
For the decoder we utilize another LSTM model. The initial input of the decoder is the final state of the encoder model.
Moreover, each token of the dictionary death cause description (padded with special start and end tags) serves as input at the different time steps.
......@@ -27,5 +27,3 @@ Again, we use fastText embeddings of all three languages to represent the token.
The decoder predicts one-hot-encoded words of the symptom name.
At test time, we use the encoder to obtain a semantic representation of the certificate line and decode the death cause description word by word, starting with the special start tag.
The decoding process finishes when the decoder outputs the end tag.
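The overall encoder-decoder architecture can be sketched as follows (a minimal Keras-style illustration with assumed dimensions rather than our exact implementation; it omits details such as embedding lookup, masking and padding):
\begin{verbatim}
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

emb_dim, hidden, vocab = 3 * 300, 256, 10000   # concatenated fastText dims (assumed)

# Encoder: reads the certificate line; its final states initialise the decoder.
enc_in = Input(shape=(None, emb_dim))
_, state_h, state_c = LSTM(hidden, return_state=True)(enc_in)

# Decoder: receives the dictionary death cause description shifted by one
# position (teacher forcing) and predicts the next word at every time step.
dec_in = Input(shape=(None, emb_dim))
dec_out = LSTM(hidden, return_sequences=True)(dec_in,
                                              initial_state=[state_h, state_c])
dec_out = Dense(vocab, activation="softmax")(dec_out)

model = Model([enc_in, dec_in], dec_out)
model.compile(optimizer="adam", loss="categorical_crossentropy")
\end{verbatim}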
......@@ -2,7 +2,7 @@
The second step in our pipeline is to assign an ICD-10 code to the generated death cause description.
For this we employ a bidirectional LSTM model which is able to capture the past and future context for each token of a death cause description.
Just as in our encoder-decoder model, we encode each token using the concatenation of its fastText embeddings from all three languages.
To enable the model to attend to different parts of the death cause description, we add an extra attention layer \cite{raffel_feed-forward_2015}.
To enable the model to attend to different parts of the death cause description, we add an extra attention layer \cite{raffel_feed-forward_2016}.
Through the attention mechanism our model learns a fixed-size embedding of the death cause description by computing an adaptive weighted average over the state sequence of the LSTM model.
This allows the model to better integrate information over time. Figure \ref{fig:classification-model} presents the architecture of our ICD-10 classification model.
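The attention computation itself can be sketched as follows (a NumPy illustration in the spirit of \cite{raffel_feed-forward_2016}; parameter shapes are assumptions):
\begin{verbatim}
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pooling(H, W, b, v):
    # H: (T, d) LSTM state sequence; W: (d, d), b: (d,), v: (d,) learned parameters.
    scores = np.tanh(H @ W + b) @ v        # one unnormalised score per time step
    alpha = softmax(scores)                # attention weights over time steps
    return alpha @ H                       # adaptive weighted average, shape (d,)

rng = np.random.default_rng(2)
T, d = 12, 256
H = rng.normal(size=(T, d))
print(attention_pooling(H, rng.normal(size=(d, d)), rng.normal(size=d),
                        rng.normal(size=d)).shape)   # (256,)
\end{verbatim}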
......
@article{peters_deep_2018,
@inproceedings{peters_deep_2018,
title = {Deep contextualized word representations},
url = {https://openreview.net/forum?id=SJTCsqMUf},
journal = {arXiv:1802.05365},
abstract = {We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across...},
urldate = {2018-02-16},
booktitle = {The 16th {Annual} {Conference} of the {North} {American} {Chapter} of the {Association} for {Computational} {Linguistics}},
author = {Peters, Matthew E. and Neumann, Mark and Iyyer, Mohit and Gardner, Matt and Clark, Christopher and Lee, Kenton and Zettlemoyer, Luke},
month = feb,
year = {2018},
keywords = {Context Embeddings, Document Classification, Embeddings, Read},
file = {arXiv\:1802.05365 PDF:/Users/mario/Zotero/storage/89C2DP8R/Peters et al. - 2018 - Deep contextualized word representations.pdf:application/pdf;arXiv.org Snapshot:/Users/mario/Zotero/storage/YF7GZNUI/1802.html:text/html;Full Text PDF:/Users/mario/Zotero/storage/2SWMPWEA/Peters et al. - 2018 - Deep contextualized word representations.pdf:application/pdf;Snapshot:/Users/mario/Zotero/storage/9X2UN33P/forum.html:text/html}
}
@article{peters_semi-supervised_2017,
title = {Semi-supervised sequence tagging with bidirectional language models},
url = {http://arxiv.org/abs/1705.00108},
abstract = {Pre-trained word embeddings learned from unlabeled text have become a standard component of neural network architectures for NLP tasks. However, in most cases, the recurrent network that operates on word-level representations to produce context sensitive representations is trained on relatively little labeled data. In this paper, we demonstrate a general semi-supervised approach for adding pre- trained context embeddings from bidirectional language models to NLP systems and apply it to sequence labeling tasks. We evaluate our model on two standard datasets for named entity recognition (NER) and chunking, and in both cases achieve state of the art results, surpassing previous systems that use other forms of transfer or joint learning with additional labeled data and task specific gazetteers.},
urldate = {2018-02-16},
journal = {arXiv:1705.00108},
author = {Peters, Matthew E. and Ammar, Waleed and Bhagavatula, Chandra and Power, Russell},
month = apr,
year = {2017},
note = {arXiv: 1705.00108},
keywords = {Read, Embeddings, Word Embeddings, Language Models},
file = {arXiv\:1705.00108 PDF:/Users/mario/Zotero/storage/DW4C3I9R/Peters et al. - 2017 - Semi-supervised sequence tagging with bidirectiona.pdf:application/pdf;arXiv.org Snapshot:/Users/mario/Zotero/storage/SQ4CHQJL/1705.html:text/html}
}
@article{bojanowski_enriching_2016,
title = {Enriching {Word} {Vectors} with {Subword} {Information}},
url = {http://arxiv.org/abs/1607.04606},
abstract = {Continuous word representations, trained on large unlabeled corpora are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words, by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skipgram model, where each word is represented as a bag of character \$n\$-grams. A vector representation is associated to each character \$n\$-gram; words being represented as the sum of these representations. Our method is fast, allowing to train models on large corpora quickly and allows us to compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, both on word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks.},
urldate = {2018-03-12},
journal = {arXiv:1607.04606 [cs]},
author = {Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
month = jul,
year = {2016},
note = {arXiv: 1607.04606},
keywords = {Read, Embeddings, Word Embeddings, FastText},
file = {arXiv\:1607.04606 PDF:/Users/mario/Zotero/storage/9WC5C7M6/Bojanowski et al. - 2016 - Enriching Word Vectors with Subword Information.pdf:application/pdf;arXiv.org Snapshot:/Users/mario/Zotero/storage/YPS6YZHR/1607.html:text/html}
}
@article{pinter_mimicking_2017,
title = {Mimicking {Word} {Embeddings} using {Subword} {RNNs}},
url = {http://arxiv.org/abs/1707.06961},
abstract = {Word embeddings improve generalization over lexical features by placing each word in a lower-dimensional space, using distributional information obtained from unlabeled data. However, the effectiveness of word embeddings for downstream NLP tasks is limited by out-of-vocabulary (OOV) words, for which embeddings do not exist. In this paper, we present MIMICK, an approach to generating OOV word embeddings compositionally, by learning a function from spellings to distributional embeddings. Unlike prior work, MIMICK does not require re-training on the original word embedding corpus; instead, learning is performed at the type level. Intrinsic and extrinsic evaluations demonstrate the power of this simple approach. On 23 languages, MIMICK improves performance over a word-based baseline for tagging part-of-speech and morphosyntactic attributes. It is competitive with (and complementary to) a supervised character-based model in low-resource settings.},
urldate = {2018-03-12},
journal = {arXiv:1707.06961 [cs]},
author = {Pinter, Yuval and Guthrie, Robert and Eisenstein, Jacob},
month = jul,
year = {2017},
note = {arXiv: 1707.06961},
keywords = {Embeddings, Read, Word Embeddings},
file = {arXiv\:1707.06961 PDF:/Users/mario/Zotero/storage/33XVJS9Z/Pinter et al. - 2017 - Mimicking Word Embeddings using Subword RNNs.pdf:application/pdf;arXiv.org Snapshot:/Users/mario/Zotero/storage/5U39STXC/1707.html:text/html}
}
@inproceedings{neveol_clef_2017,
title = {{CLEF} {eHealth} 2017 {Multilingual} {Information} {Extraction} task overview: {ICD}10 coding of death certificates in {English} and {French}},
shorttitle = {{CLEF} {eHealth} 2017 {Multilingual} {Information} {Extraction} task overview},
......@@ -105,7 +61,7 @@
file = {Fulltext:/Users/mario/Zotero/storage/494A5KSG/Mikolov et al. - 2013 - Efficient estimation of word representations in ve.pdf:application/pdf;Snapshot:/Users/mario/Zotero/storage/84YYF44Z/1301.html:text/html}
}
@inproceedings{pennington_glove:_2014,
@inproceedings{pennington_glove_2014,
title = {Glove: {Global} vectors for word representation},
shorttitle = {Glove},
booktitle = {Proceedings of the 2014 conference on empirical methods in natural language processing ({EMNLP})},
......@@ -214,11 +170,11 @@ The system proposed in this study provides automatic identification and characte
file = {Fulltext:/Users/mario/Zotero/storage/3UDDZ4LG/Hochreiter et al. - 2001 - Gradient flow in recurrent nets the difficulty of.pdf:application/pdf;Snapshot:/Users/mario/Zotero/storage/SU2LW7FM/Hochreiter et al. - 2001 - Gradient flow in recurrent nets the difficulty of.pdf:application/pdf}
}
@article{bahdanau_neural_2014,
@inproceedings{bahdanau_neural_2018,
title = {Neural machine translation by jointly learning to align and translate},
journal = {arXiv preprint arXiv:1409.0473},
booktitle = {Proceedings of the 6th {International} {Conference} on {Learning} {Representations} ({ICLR} 2018)},
author = {Bahdanau, Dzmitry and Cho, Kyunghyun and Bengio, Yoshua},
year = {2014},
year = {2018},
file = {Fulltext:/Users/mario/Zotero/storage/IS5LGCET/Bahdanau et al. - 2014 - Neural machine translation by jointly learning to .pdf:application/pdf;Snapshot:/Users/mario/Zotero/storage/GR2XHEZN/1409.html:text/html}
}
......@@ -228,7 +184,6 @@ The system proposed in this study provides automatic identification and characte
booktitle = {Advances in {Neural} {Information} {Processing} {Systems} 28},
publisher = {Curran Associates, Inc.},
author = {Bengio, Samy and Vinyals, Oriol and Jaitly, Navdeep and Shazeer, Noam},
editor = {Cortes, C. and Lawrence, N. D. and Lee, D. D. and Sugiyama, M. and Garnett, R.},
year = {2015},
pages = {1171--1179},
file = {NIPS Full Text PDF:/Users/mario/Zotero/storage/D2B4JCFG/Bengio et al. - 2015 - Scheduled Sampling for Sequence Prediction with Re.pdf:application/pdf;NIPS Snapshort:/Users/mario/Zotero/storage/VDKFT7GD/5956-scheduled-sampling-for-sequence-prediction-with-recurrent-neural-networks.html:text/html}
......@@ -236,7 +191,7 @@ The system proposed in this study provides automatic identification and characte
@inproceedings{lample_neural_2016,
title = {Neural {Architectures} for {Named} {Entity} {Recognition}},
booktitle = {Proceedings of {NAACL}-{HLT}},
booktitle = {Proceedings of the 15th {Annual} {Conference} of the {North} {American} {Chapter} of the {Association} for {Computational} {Linguistics}: {Human} {Language} {Technologies}},
author = {Lample, Guillaume and Ballesteros, Miguel and Subramanian, Sandeep and Kawakami, Kazuya and Dyer, Chris},
year = {2016},
pages = {260--270},
......@@ -246,7 +201,7 @@ The system proposed in this study provides automatic identification and characte
@article{wei_disease_2016,
title = {Disease named entity recognition by combining conditional random fields and bidirectional recurrent neural networks},
volume = {2016},
journal = {Database},
journal = {Database: The Journal of Biological Databases and Curation},
author = {Wei, Qikang and Chen, Tao and Xu, Ruifeng and He, Yulan and Gui, Lin},
year = {2016},
file = {Fulltext:/Users/mario/Zotero/storage/CCKZ2IWM/2630532.html:text/html;Snapshot:/Users/mario/Zotero/storage/KPKNC9SU/2630532.html:text/html}
......@@ -292,11 +247,11 @@ The system proposed in this study provides automatic identification and characte
file = {Fulltext:/Users/mario/Zotero/storage/XVFURMYQ/Hochreiter und Schmidhuber - 1997 - Long short-term memory.pdf:application/pdf;Snapshot:/Users/mario/Zotero/storage/BA5KN5ZW/neco.1997.9.8.html:text/html}
}
@article{raffel_feed-forward_2015,
@inproceedings{raffel_feed-forward_2016,
title = {Feed-forward networks with attention can solve some long-term memory problems},
journal = {arXiv preprint arXiv:1512.08756},
booktitle = {Workshop {Extended} {Abstracts} of the 4th {International} {Conference} on {Learning} {Representations}},
author = {Raffel, Colin and Ellis, Daniel PW},
year = {2015},
year = {2016},
file = {Fulltext:/Users/mario/Zotero/storage/V3UB65AD/Raffel und Ellis - 2015 - Feed-forward networks with attention can solve som.pdf:application/pdf;Snapshot:/Users/mario/Zotero/storage/66LDNKRG/1512.html:text/html}
}
......@@ -320,7 +275,6 @@ The system proposed in this study provides automatic identification and characte
@inproceedings{cho_learning_2014,
address = {Doha, Qatar},
title = {Learning {Phrase} {Representations} using {RNN} {Encoder}–{Decoder} for {Statistical} {Machine} {Translation}},
url = {http://www.aclweb.org/anthology/D14-1179},
urldate = {2018-05-23},
booktitle = {Proceedings of the 2014 {Conference} on {Empirical} {Methods} in {Natural} {Language} {Processing} ({EMNLP})},
publisher = {Association for Computational Linguistics},
......@@ -335,7 +289,6 @@ The system proposed in this study provides automatic identification and characte
title = {Clinical {Information} {Extraction} at the {CLEF} {eHealth} {Evaluation} lab 2016},
volume = {1609},
issn = {1613-0073},
url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5756095/},
abstract = {This paper reports on Task 2 of the 2016 CLEF eHealth evaluation lab which extended the previous information extraction tasks of ShARe/CLEF eHealth evaluation labs. The task continued with named entity recognition and normalization in French narratives, as offered in CLEF eHealth 2015. Named entity recognition involved ten types of entities including disorders that were defined according to Semantic Groups in the Unified Medical Language System® (UMLS®), which was also used for normalizing the entities. In addition, we introduced a large-scale classification task in French death certificates, which consisted of extracting causes of death as coded in the International Classification of Diseases, tenth revision (ICD10). Participant systems were evaluated against a blind reference standard of 832 titles of scientific articles indexed in MEDLINE, 4 drug monographs published by the European Medicines Agency (EMEA) and 27,850 death certificates using Precision, Recall and F-measure. In total, seven teams participated, including five in the entity recognition and normalization task, and five in the death certificate coding task. Three teams submitted their systems to our newly offered reproducibility track. For entity recognition, the highest performance was achieved on the EMEA corpus, with an overall F-measure of 0.702 for plain entities recognition and 0.529 for normalized entity recognition. For entity normalization, the highest performance was achieved on the MEDLINE corpus, with an overall F-measure of 0.552. For death certificate coding, the highest performance was 0.848 F-measure.},
urldate = {2018-05-23},
journal = {CEUR workshop proceedings},
......@@ -450,15 +403,6 @@ The system proposed in this study provides automatic identification and characte
file = {Fulltext:/Users/mario/Zotero/storage/ZV5B2GQJ/Chan et al. - 2016 - Listen, attend and spell A neural network for lar.pdf:application/pdf;Snapshot:/Users/mario/Zotero/storage/RS8MBCM8/7472621.html:text/html}
}
@article{kingma_adam:_2014,
title = {Adam: {A} method for stochastic optimization},
shorttitle = {Adam},
journal = {arXiv preprint arXiv:1412.6980},
author = {Kingma, Diederik P. and Ba, Jimmy},
year = {2014},
file = {Snapshot:/Users/mario/Zotero/storage/YSR9BL4W/1412.html:text/html}
}
@inproceedings{vinyals_show_2015,
title = {Show and tell: {A} neural image caption generator},
shorttitle = {Show and tell},
......@@ -532,4 +476,42 @@ The system proposed in this study provides automatic identification and characte
author = {Søgaard, Anders and Agić, Željko and Alonso, Héctor Martínez and Plank, Barbara and Bohnet, Bernd and Johannsen, Anders},
year = {2015},
file = {Fulltext:/Users/mario/Zotero/storage/UZN66Q7M/Søgaard et al. - 2015 - Inverted indexing for cross-lingual NLP.pdf:application/pdf;Snapshot:/Users/mario/Zotero/storage/26MECM8N/Søgaard et al. - 2015 - Inverted indexing for cross-lingual NLP.pdf:application/pdf}
}
@article{bojanowski_enriching_2017,
title = {Enriching {Word} {Vectors} with {Subword} {Information}},
volume = {5},
number = {1},
journal = {Transactions of the Association of Computational Linguistics},
author = {Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
year = {2017},
pages = {135--146},
file = {Fulltext:/Users/mario/Zotero/storage/ZMHFQUNA/Bojanowski et al. - 2017 - Enriching Word Vectors with Subword Information.pdf:application/pdf;Snapshot:/Users/mario/Zotero/storage/USMIEAEL/Bojanowski et al. - 2017 - Enriching Word Vectors with Subword Information.pdf:application/pdf}
}
@inproceedings{kingma_adam:_2014,
title = {Adam: {A} method for stochastic optimization},
booktitle = {Proceedings of the 3rd {International} {Conference} on {Learning} {Representations} ({ICLR})},
author = {Kingma, Diederik P. and Ba, Jimmy},
year = {2014},
file = {Fulltext:/Users/mario/Zotero/storage/A9DC95XN/Kingma und Ba - 2014 - Adam A method for stochastic optimization.pdf:application/pdf;Snapshot:/Users/mario/Zotero/storage/4CQAFF7H/1412.html:text/html}
}
@inproceedings{peters_semi-supervised_2017,
title = {Semi-supervised sequence tagging with bidirectional language models},
volume = {1},
booktitle = {Proceedings of the 55th {Annual} {Meeting} of the {Association} for {Computational} {Linguistics} ({Volume} 1: {Long} {Papers})},
author = {Peters, Matthew and Ammar, Waleed and Bhagavatula, Chandra and Power, Russell},
year = {2017},
pages = {1756--1765},
file = {Fulltext:/Users/mario/Zotero/storage/UQYRUUBQ/Peters et al. - 2017 - Semi-supervised sequence tagging with bidirectiona.pdf:application/pdf;Snapshot:/Users/mario/Zotero/storage/PJ2YN7VR/Peters et al. - 2017 - Semi-supervised sequence tagging with bidirectiona.pdf:application/pdf}
}
@inproceedings{pinter_mimicking_2017,
title = {Mimicking {Word} {Embeddings} using {Subword} {RNNs}},
booktitle = {Proceedings of the 2017 {Conference} on {Empirical} {Methods} in {Natural} {Language} {Processing}},
author = {Pinter, Yuval and Guthrie, Robert and Eisenstein, Jacob},
year = {2017},
pages = {102--112},
file = {Fulltext:/Users/mario/Zotero/storage/QY3T7DCJ/Pinter et al. - 2017 - Mimicking Word Embeddings using Subword RNNs.pdf:application/pdf;Snapshot:/Users/mario/Zotero/storage/MD8TGGLY/Pinter et al. - 2017 - Mimicking Word Embeddings using Subword RNNs.pdf:application/pdf}
}
\ No newline at end of file