COMPUTATION AND PSYCHOLINGUISTICS LAB

@ JOHNS HOPKINS UNIVERSITY


What are the mental representations that constitute our knowledge of language? How do we use them to understand and produce language? In the Computation and Psycholinguistics Lab, we address these questions and others through the use of computational models and human experiments. Our lab is part of the Department of Cognitive Science at Johns Hopkins University, and we frequently collaborate with the Center for Language and Speech Processing. Read on to learn more about who we are and what we do.


Students (from left) Grusha, Karl, Junghyun, Suhas, and Tom unpack the black box of NLP.

LAB NEWS


  • Tom and Tal's work was featured in an article in Quanta Magazine on the state of language understanding in NLP. (October 17th, 2019)
  • Grusha, Marty and Tal's paper on syntactic priming for studying neural network representations is accepted to CoNLL. (August 28th, 2019)
  • Marty, Aaron Mueller, and Tal's paper on data efficiency of neural network language models is accepted to EMNLP. (August 13, 2019)
  • Tal is giving a keynote talk at the Conference on Formal Grammar in Riga (August 11, 2019)
  • Tal is participating in the Workshop on Compositionality in Brains and Machines in Leiden (August 5-9, 2019)
  • Tal is co-organizing the BlackboxNLP 2019 workshop at ACL (August 1, 2019)
  • Karl Mulligan joins the lab as a PhD student, and Junghyun Min joins the lab as a Master's student (July 2019)
  • Tal and Yoav Goldberg were awarded a grant by the US-Israel Binational Science Foundation. (July 2019)
  • Tal gave a keynote talk at the Workshop on Evaluating Vector Space Representations for NLP (RepEval) at NAACL. Slides. (June 6, 2019)
  • Tal gave talks at Waseda University and the RIKEN Institute in Tokyo (May 24, 2019)
  • Paper on syntactic heuristics in neural language inference systems accepted to ACL (May 13, 2019)
  • Upcoming keynote talk at Midwest Speech and Language Days (May 3, 2019)
  • Upcoming talk at the Cognitive Talk Series at Princeton (April 24, 2019)
  • Tal gave a talk at the CompLang seminar at MIT (April 18, 2019)
  • Tal received a Google Faculty Research Award (March 13, 2019)
  • Tal gave LTI colloquium talk at Carnegie Mellon University (February 8, 2019)
  • Grusha had two abstracts accepted for CUNY 2019, one with Tal and the other with Tal and Marty (January 25, 2019)
  • Marty gave a presentation at SCiL (January 4, 2019)
  • Tal gave talks over winter break at Yale Linguistics (December 10), Microsoft Research Redmond (December 13), Allen AI Institute (December 14), and Google New York (January 7). (December 2018 - January 2019)

PEOPLE



PRINCIPAL INVESTIGATOR



Tal Linzen (personal site)
Assistant Professor of Cognitive Science


Tal is an Assistant Professor in the Department of Cognitive Science at Johns Hopkins University, where he directs the Computation and Psycholinguistics Lab. He is also affiliated with the Center for Language and Speech Processing.




GRADUATE STUDENTS


Grusha Prasad (personal site)
PhD Student in Cognitive Science


I am interested in how people represent statistical regularities in linguistic structure and what factors can cause these representations to change. My approach to addressing these questions involves running psycholinguistic experiments on humans that are informed by computational models of the linguistic/cognitive phenomenon of interest. Outside of work, puns and word play get me very excited.


Tom McCoy (personal site)
PhD Student in Cognitive Science


I use computational modeling to understand the formal properties of language, how these properties are instantiated in the mind, and which of these properties are innate vs. learned. I am currently co-advised by Tal Linzen and Paul Smolensky, and I continue to collaborate with my undergraduate advisor, Robert Frank. Outside of research, I enjoy running and constructing crossword puzzles.


Suhas Arehalli (personal site)
PhD Student in Cognitive Science


My interests include machine learning, computational modeling, and psycholinguistics. I am particularly interested in the cognitive mechanisms underlying sentence processing, and especially in what linguistic illusions can tell us about them. I am also passionate about teaching statistical and computational literacy: how algorithms reason about data, and what impact those algorithms have on society.


Karl Mulligan (personal site)
PhD Student in Cognitive Science


Broadly, I'm interested in the representations behind language production and understanding. I hope to characterize how supervision and context contribute to linguistic development, and in particular what role information from extralinguistic cognitive processes, like numerical or spatial cognition, might play. My work uses methods from formal linguistics, machine learning, and psychological research. In my spare time, I like to cook, hike, and wipe out on my surfboard.


Junghyun Min (personal site)
Master's Student in Cognitive Science





UNDERGRADUATE RESEARCH ASSISTANTS


Michael Lepori 
Undergraduate Senior in Physics and Computer Science



Nicholas Douglass 
Undergraduate Senior in Cognitive Science and the Writing Seminars



Daniela Torres 
Undergraduate Junior in Cognitive Science





LAB MANAGER


Brian Leonard 
Lab Manager


I'm interested in everything for which science doesn't have clear answers yet. Naturally, this makes language an ideal field of study. I'm particularly interested in the question of how syntactic structure is generated and processed in the mind. I spend my free time telling jokes, reading fiction, and writing poems.




LAB ALUMNI


Marten van Schijndel (personal site)
Post-Doctoral Fellow


I'm interested in incremental (left-to-right, single pass) neural language models. I analyze the linguistic representations learned by these models to see what linguistic aspects they find helpful, and I test their cognitive plausibility by evaluating how well their performance matches human behavior (e.g. reading times or speech errors).


Rebecca Marvin (personal site)
PhD Student in Computer Science


I'm working on my PhD in the Center for Language and Speech Processing in the Computer Science Department at JHU. My main research interests include machine translation, error analysis, and interpretability of neural systems. When I'm not working, I'm probably spending time with my cat.





RESEARCH



OVERVIEW


What are the mental representations that constitute our knowledge of language? How do we use them to understand and produce language?

We address these questions using computational models and human experiments. The goal of our models is to mimic the processes that humans engage in when learning and processing language; these models often combine techniques from machine learning with representations from theoretical linguistics.

We then compare the predictions of these models to human language comprehension. In a typical experiment in our lab, we invite participants to read a range of sentences and record how long they take to read each word, as measured by key presses or eye movements. Other techniques include artificial language learning experiments and neural measurements.
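
To make this concrete, here is a schematic sketch (hypothetical data and code, not our actual analysis pipeline) of how a self-paced reading trial reduces to per-word reading times: each key press reveals the next word, so a word's reading time is the interval between consecutive presses.

# Hypothetical self-paced reading trial: one key-press timestamp (in ms) per
# revealed word, plus a final press that ends the sentence.
words = ["The", "cat", "the", "dog", "chased", "ran", "away"]
timestamps = [0, 350, 690, 1010, 1400, 1980, 2410, 2760]

# A word's reading time is the gap between successive key presses.
reading_times = [t2 - t1 for t1, t2 in zip(timestamps, timestamps[1:])]

for word, rt in zip(words, reading_times):
    print(f"{word:>8}: {rt} ms")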

Finally, we use linguistics and psycholinguistics to understand and improve artificial intelligence systems, in particular “deep learning” models that are otherwise difficult to analyze.


EXPECTATION-BASED LANGUAGE COMPREHENSION


The probability of a word or a syntactic structure is a major predictor of how difficult it is to read. What are the syntactic representations over which those probability distributions are maintained? How is processing difficulty affected by the probability distribution we maintain over the representations we predict, and in particular by our uncertainty about the structure and meaning of the sentence?

We can study these questions by implementing computational models that incorporate different representational assumptions, and deriving quantitative predictions from those models.

We can then test the extent to which these predictions match human sentence comprehension, as measured by reading times (eye-tracking, self-paced reading) or neural measurements such as MEG.
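
As a toy illustration of this workflow, the sketch below computes word-by-word surprisal, the negative log probability of each word given its context, which is the kind of quantity we compare against reading times. The bigram "model" and miniature corpus here are assumptions for illustration; our actual models are grammar-based or neural rather than simple count-based.

import math
from collections import Counter

# Toy corpus and bigram counts standing in for a real language model.
corpus = "the horse raced past the barn . the horse fell .".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def surprisal(prev, word):
    # Add-one smoothing so unseen continuations get a finite surprisal.
    vocab = len(unigrams)
    p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)
    return -math.log2(p)

# Higher surprisal = a less expected word = predicted to be read more slowly.
sentence = "the horse raced past the barn fell".split()
for prev, word in zip(sentence, sentence[1:]):
    print(f"{word:>6}: {surprisal(prev, word):.2f} bits")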

Expectations are sometimes malleable and context-specific. If the person we’re talking to is unusually fond of a particular syntactic construction, say passive verbs, we might learn to expect them to use this construction more often than other people. In ongoing research, we’re investigating the extent to which our expectations for specific syntactic representations can vary from context to context.
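
A schematic sketch of this idea (a deliberately simplified illustration, not a model from any of our papers): treat the expectation for a construction as a probability estimated from experience, and watch it shift after exposure to a speaker who uses that construction unusually often.

# Hypothetical prior pseudo-counts reflecting overall experience with English:
# passives are fairly rare.
passive_count, total_count = 5.0, 100.0

def p_passive():
    return passive_count / total_count

print(f"before exposure: P(passive) = {p_passive():.2f}")

# This particular speaker happens to produce many passives.
observed = ["passive", "passive", "active", "passive", "passive", "passive"]
for construction in observed:
    passive_count += construction == "passive"
    total_count += 1

print(f"after exposure:  P(passive) = {p_passive():.2f}")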


LINGUISTIC REPRESENTATIONS IN ARTIFICIAL NEURAL NETWORKS


Artificial neural networks are a powerful statistical learning technique that underpins some of the best-performing artificial intelligence software we have. Many of the neural networks that have been successful in practical applications do not have any explicit linguistic representations (e.g., syntax trees or logical forms). Is the performance of neural networks really as impressive when evaluated using rigorous linguistic and psycholinguistic tests? If so, how do these networks represent or approximate the structures that are normally seen as the building blocks of language?
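
The sketch below illustrates the minimal-pair logic behind this kind of targeted evaluation: a model should assign higher probability to the grammatical member of each pair (here, pairs differing only in subject-verb agreement). The scorer is a trivial placeholder included only to make the example self-contained; in practice it would be a trained language model.

# Minimal pairs: (grammatical, ungrammatical), differing only in agreement.
minimal_pairs = [
    ("the keys to the cabinet are on the table",
     "the keys to the cabinet is on the table"),
    ("the author that the guards like laughs",
     "the author that the guards like laugh"),
]

def sentence_logprob(sentence):
    # Placeholder scorer (illustration only): replace with a real model's
    # summed token log probabilities. With this stand-in the accuracy below
    # is meaningless; a well-trained model should approach 100%.
    return -len(sentence.split())

correct = sum(
    sentence_logprob(good) > sentence_logprob(bad)
    for good, bad in minimal_pairs
)
print(f"accuracy: {correct}/{len(minimal_pairs)}")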

A related topic of research is lexical representations in neural networks. Neural networks are typically allowed to evolve their own lexical representations, which are normally nothing but unstructured lists of numbers. We have explored to what extent such lexical representations implicitly capture the linguistic distinctions that are assumed in linguistics (in particular, formal semantics).
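
As a toy illustration of how such probing can work (the three-dimensional vectors below are made up, not real embeddings): if a distinction such as universal vs. existential quantification is implicitly encoded, words on the same side of the distinction should be geometrically closer to each other than to words on the other side.

import math

# Made-up vectors; real lexical representations have hundreds of dimensions.
vectors = {
    "every": [0.9, 0.1, 0.2],
    "all":   [0.8, 0.2, 0.1],
    "some":  [0.1, 0.9, 0.3],
    "few":   [0.2, 0.8, 0.4],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Words that share the relevant feature should be more similar to each other.
print("every ~ all :", round(cosine(vectors["every"], vectors["all"]), 2))
print("every ~ some:", round(cosine(vectors["every"], vectors["some"]), 2))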


GENERALIZATION IN LANGUAGE


We regularly generalize our knowledge of language to words and sentences we have never heard before. When is our linguistic knowledge limited to a specific item, and when do we apply it to novel items? What representations do we use to generalize beyond the specific items that we have encountered?

We can often study these questions using artificial language learning experiments. In one experiment, for example, we taught participants an artificial language with a simple phonological regularity and tested how they generalized this regularity to new sounds.
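
The sketch below shows, schematically, how stimuli for such an experiment might be constructed; the specific regularity (identical consonants within a CVCV word) and the sound inventories are assumptions chosen for illustration, not necessarily those used in our studies.

import itertools
import random

# Exposure language: every CVCV word has identical consonants.
exposure_consonants = ["p", "t", "k"]
held_out_consonants = ["b", "d", "g"]   # never heard during exposure
vowels = ["a", "i", "u"]

def make_word(c, v1, v2):
    return f"{c}{v1}{c}{v2}"

exposure_words = [make_word(c, v1, v2)
                  for c in exposure_consonants
                  for v1, v2 in itertools.product(vowels, vowels)]

# Test items with held-out consonants probe whether learners extend the
# regularity to sounds they never encountered during exposure.
test_words = [make_word(c, v1, v2)
              for c in held_out_consonants
              for v1, v2 in itertools.product(vowels, vowels)]

random.seed(0)
print("exposure sample:", random.sample(exposure_words, 5))
print("test sample:    ", random.sample(test_words, 5))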


PUBLICATIONS



IN PROGRESS


    • Reassessing the evidence for syntactic adaptation from self-paced reading studies. G. Prasad, & T. Linzen. (2019). In Proceedings of the 32nd CUNY Conference on Human Sentence Processing. [BibTeX]

    • Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop. A. Alishahi, G. Chrupała, & T. Linzen. (2019). Natural Language Engineering. [Abstract] [PDF] [BibTeX]

    • Do self-paced reading studies provide evidence for rapid syntactic adaptation? G. Prasad, & T. Linzen. (2019). [Abstract] [PDF] [BibTeX]

    • Studying the Inductive Biases of RNNs with Synthetic Variations of Natural Languages. S. Ravfogel, Y. Goldberg, & T. Linzen. (2019). North American Chapter of the Association for Computational Linguistics (NAACL). [Abstract] [PDF] [BibTeX]

    • Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference. R. T. McCoy, E. Pavlick, & T. Linzen. (2019). arXiv preprint arXiv:1902.01007. [Abstract] [PDF] [BibTeX]

    • Probing what different NLP tasks teach machines about function word comprehension. N. Kim, R. Patel, A. Poliak, A. Wang, P. Xia, R. T. McCoy, I. Tenney, A. Ross, T. Linzen, B. Van Durme, S. R. Bowman, & E. Pavlick. (2019). In Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019). [BibTeX]

    • Human few-shot learning of compositional instructions. B. M. Lake, T. Linzen, & M. Baroni. (2019). In Proceedings of the 41st Annual Conference of the Cognitive Science Society. [Abstract] [PDF] [BibTeX]

    • Using syntactic priming to investigate how recurrent neural networks represent syntax. G. Prasad, M. van Schijndel, & T. Linzen. (2019). To appear in Proceedings of the 32nd CUNY Conference on Human Sentence Processing. [BibTeX]

    • RNNs Implicitly Implement Tensor Product Representations. R. T. McCoy, T. Linzen, E. Dunbar, & P. Smolensky. (2019). To appear in International Conference on Learning Representations (ICLR). [Abstract] [PDF] [BibTeX]

    • Syntactic categories as lexical features or syntactic heads: An MEG approach. J. King, T. Linzen, & A. Marantz. (2015). Linguistic Inquiry (accepted with revisions). [Abstract] [PDF] [BibTeX]



PUBLISHED


2019

    • Non-entailed subsequences as a challenge for natural language inference. R. T. McCoy, & T. Linzen. (2019). In Proceedings of the Society for Computation in Linguistics (SCiL). [PDF] [BibTeX]

    • Can Entropy Explain Successor Surprisal Effects in Reading? M. van Schijndel, & T. Linzen. (2019). In Proceedings of the Society for Computation in Linguistics (SCiL). [Abstract] [PDF] [BibTeX]

2018

    • Colorless green recurrent networks dream hierarchically. K. Gulordava, P. Bojanowski, E. Grave, T. Linzen, & M. Baroni. (2018). CoRR. [Abstract] [PDF] [BibTeX]

    • What can linguistics and deep learning contribute to each other? T. Linzen. (2018). To appear in Language. [Abstract] [PDF] [BibTeX]

    • A morphosyntactic inductive bias in artificial language learning. I. Kastner, & T. Linzen. (2018). [PDF] [BibTeX]

    • Revisiting the poverty of the stimulus: hierarchical generalization without a hierarchical bias in recurrent neural networks. R. T. McCoy, R. Frank, & T. Linzen. (2018). In Proceedings of the 40th Annual Conference of the Cognitive Science Society. [Abstract] [PDF] [BibTeX]

    • The reliability of acceptability judgments across languages. T. Linzen, & Y. Oseki. (2018). Glossa: a Journal of General Linguistics. [Abstract] [PDF] [BibTeX]

    • Modeling garden path effects without explicit hierarchical syntax. M. van Schijndel, & T. Linzen. (2018). In Proceedings of the 40th Annual Conference of the Cognitive Science Society. [Abstract] [PDF] [BibTeX]

    • Distinct patterns of syntactic agreement errors in recurrent networks and humans. T. Linzen, & B. Leonard. (2018). In Proceedings of the 40th Annual Conference of the Cognitive Science Society. [Abstract] [PDF] [BibTeX]

    • In spoken word recognition the future predicts the past. L. Gwilliams, T. Linzen, D. Poeppel, & A. Marantz. (2018). Journal of Neuroscience. [Abstract] [PDF] [BibTeX]

    • Targeted syntactic evaluation of language models. R. Marvin, & T. Linzen. (2018). In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018). [Abstract] [PDF] [BibTeX]

    • A neural model of adaptation in reading. M. van Schijndel, & T. Linzen. (2018). In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018). [Abstract] [PDF] [BibTeX]

    • Phonological (un)certainty weights lexical activation. L. Gwilliams, D. Poeppel, A. Marantz, & T. Linzen. (2018). In Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018). [Abstract] [PDF] [BibTeX]

    • Preference for locality is affected by the prefix/suffix asymmetry: Evidence from artificial language learning. J. White, R. Kager, T. Linzen, G. Markopoulos, A. Martin, A. Nevins, S. Peperkamp, K. Polgárdi, N. Topintzi, & R. van de Vijver. (2018). In the 48th Annual Meeting of the North East Linguistic Society (NELS 48). [PDF] [BibTeX]

2017

    • Prediction and uncertainty in an artificial language. T. Linzen, N. Siegelman, & L. Bogaerts. (2017). In Proceedings of the 39th Annual Conference of the Cognitive Science Society. [Abstract] [PDF] [BibTeX]

    • Exploring the Syntactic Abilities of RNNs with Multi-task Learning. É. Enguehard, Y. Goldberg, & T. Linzen. (2017). In Proceedings of the SIGNLL Conference on Computational Natural Language Learning (CoNLL). [Abstract] [PDF] [BibTeX]

    • Rapid generalization in phonotactic learning. G. Gallagher, & T. Linzen. (2017). Laboratory Phonology. [Abstract] [PDF] [BibTeX]

    • Comparing Character-level Neural Language Models Using a Lexical Decision Task. G. Le Godais, T. Linzen, & E. Dupoux. (2017). In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. [Abstract] [PDF] [BibTeX]

2016

    • The diminishing role of inalienability in the Hebrew possessive dative. T. Linzen. (2016). Corpus Linguistics and Linguistic Theory. [Abstract] [PDF] [BibTeX]

    • Against all odds: exhaustive activation in lexical access of verb complementation options. E. Shetreet, T. Linzen, & N. Friedmann. (2016). Language, Cognition and Neuroscience. [Abstract] [PDF] [BibTeX]

    • Uncertainty and Expectation in Sentence Processing: Evidence From Subcategorization Distributions. T. Linzen, & T. F. Jaeger. (2016). Cognitive Science. [Abstract] [PDF] [BibTeX]

    • Evaluating vector space models using human semantic priming results. A. Ettinger, & T. Linzen. (2016). In Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP. [Abstract] [PDF] [BibTeX]

    • Quantificational features in distributional word representations. T. Linzen, E. Dupoux, & B. Spector. (2016). In Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics (*SEM 2016). [Abstract] [PDF] [BibTeX]

    • Issues in evaluating semantic spaces using word analogies. T. Linzen. (2016). In Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP. [Abstract] [PDF] [BibTeX]

    • Assessing the ability of LSTMs to learn syntax-sensitive dependencies. T. Linzen, E. Dupoux, & Y. Goldberg. (2016). Transactions of the Association for Computational Linguistics. [Abstract] [PDF] [BibTeX]

2015

    • Morphological conditioning of phonological regularization. M. Gouskova, & T. Linzen. (2015). The Linguistic Review. [Abstract] [PDF] [BibTeX]

    • Pronominal datives: The royal road to argument status. M. Ariel, E. Dattner, J. W. Du Bois, & T. Linzen. (2015). Studies in Language. [Abstract] [PDF] [BibTeX]

    • Lexical preactivation in basic linguistic phrases. J. Fruchter, T. Linzen, M. Westerlund, & A. Marantz. (2015). Journal of Cognitive Neuroscience. [Abstract] [PDF] [BibTeX]

    • A model of rapid phonotactic generalization. T. Linzen, & T. J. O’Donnell. (2015). In Proceedings of Empirical Methods for Natural Language Processing (EMNLP) 2015. [Abstract] [PDF] [BibTeX]

2014

    • The role of morphology in phoneme prediction: Evidence from MEG. A. Ettinger, T. Linzen, & A. Marantz. (2014). Brain and Language. [Abstract] [PDF] [BibTeX]

    • Parallels between cross-linguistic and language-internal variation in Hebrew possessive constructions. T. Linzen. (2014). Linguistics. [Abstract] [PDF] [BibTeX]

    • The timecourse of generalization in phonotactic learning. T. Linzen, & G. Gallagher. (2014). In Proceedings of Phonology 2013, J. Kingston, C. Moore-Cantwell, J. Pater, & R. Staub (Editors). [PDF] [BibTeX]

    • Investigating the role of entropy in sentence processing. T. Linzen, & T. F. Jaeger. (2014). In Proceedings of the 2014 ACL Workshop on Cognitive Modeling and Computational Linguistics. [Abstract] [PDF] [BibTeX]

2013

    • Syntactic context effects in visual word recognition: An MEG study. T. Linzen, A. Marantz, & L. Pylkkänen. (2013). The Mental Lexicon. [Abstract] [PDF] [BibTeX]

    • Lexical and phonological variation in Russian prepositions. T. Linzen, S. Kasyanenko, & M. Gouskova. (2013). Phonology. [Abstract] [PDF] [BibTeX]

HIRING POSTDOCTORAL FELLOW:



MODELING LANGUAGE IN THE HUMAN BRAIN USING ARTIFICIAL NEURAL NETWORKS


A joint postdoctoral position is available in the labs of Christopher Honey and Tal Linzen at Johns Hopkins University.

The goal of this project is to use state-of-the-art artificial neural networks to understand the mechanisms and architectures that enable the human brain to integrate linguistic information at the levels of syllables, words and sentences. For this purpose, the project lead will have access to high-fidelity intracranial recordings from the surface of the human brain, as people process sentences and narratives. In parallel, this project is expected to generate new computational models and analytic methods for natural language processing (NLP), informed and constrained by human data.

Johns Hopkins is home to a large and vibrant community in neuroscience and computational linguistics, and the training environment will span the Departments of Cognitive Science, Psychological and Brain Sciences, and Computer Science. The postdoctoral researcher will be affiliated with the Center for Language and Speech Processing, one of the world’s largest centers for computational linguistics.

For candidates who wish to collect new datasets, Hopkins provides a top-notch neuroimaging center, including 3T and 7T scanners; new TMS and EEG facilities housed in the PBS department; and access to human intracranial experiments via neurology collaborators in Baltimore and Toronto. The postdoctoral researcher will have access to a large number of GPUs for training neural networks and other computational models through the Maryland Advanced Research Computing Center.

The position is available immediately, though the start date is somewhat flexible. Applications will be reviewed on a rolling basis. The initial appointment is for one year, with the opportunity for renewal thereafter. We especially encourage applications from women and members of minorities that are underrepresented in science.


QUALIFICATIONS


Candidates should have (i) a PhD in a relevant field (e.g., linguistics, cognitive science, neuroscience, physics, psychology, mathematics, or computer science) by the start date; (ii) a publication record that includes computational modeling and empirical data analysis. The ideal candidate will have a combined background in computational linguistics, machine learning and neuroscience.


APPLICATION INSTRUCTIONS


To apply, please email a cover letter (including a brief summary of previous research accomplishments and future plans), a current CV, and a relevant publication to [email protected]. In the CV or cover letter, please include contact information for three references. For any questions, feel free to email Chris Honey ([email protected]) and Tal Linzen ([email protected]).




HIRING POSTDOCTORAL FELLOW:



MODELING LANGUAGE USING ARTIFICIAL NEURAL NETWORKS


The Computation and Psycholinguistics Lab at Johns Hopkins University (caplabjhu.edu), directed by Tal Linzen (tallinzen.net), is seeking to hire a post-doctoral researcher. Research in the lab lies at the intersection of linguistics, psycholinguistics and deep learning (for a survey of some of the areas of research in the lab, see this paper). There is considerable flexibility as to the specific topic of research; potential areas include:

* Studying syntactic and semantic generalization across languages and neural network architectures. This topic is particularly well-suited to candidates with a strong background in syntax or semantics and significant computational skills; it does not require existing expertise in neural networks.

* Developing neural network models that learn syntax from the input available to a child and/or match human comprehension and reading behavior.

The training environment will span the Departments of Cognitive Science and Computer Science. The postdoctoral researcher will be affiliated with the Center for Language and Speech Processing (CLSP), one of the world's largest centers for computational linguistics; collaborations with other groups at CLSP will be encouraged. The candidate will have access to extensive computational resources through the Maryland Advanced Research Computing Center, as well as an eye-tracker for running behavioral experiments, if relevant to the project.

The position is available immediately, and the start date is flexible. Applications will be reviewed on a rolling basis. The initial appointment is for one year, with the opportunity for renewal thereafter. We especially encourage applications from women and members of minorities that are underrepresented in science.


QUALIFICATIONS


Candidates should have a PhD in a relevant field (including, but not limited to, linguistics, psychology, cognitive science and computer science) by the start date.


APPLICATION INSTRUCTIONS


To apply, please email a cover letter (including a brief summary of previous research accomplishments and future plans), a current CV, and a relevant publication to [email protected]. In the CV or cover letter, please include contact information for three references. For any questions, feel free to email Tal Linzen ([email protected]).