For ACL 2020, I wrote up a nontechnical summary of our research. Posting it here in case it’s useful.

Inflecting When There’s No Majority: Limitations of Encoder-Decoders as Cognitive Models of German Plurals

Kate McCurdy, Sharon Goldwater, Adam Lopez

Our paper looks at whether artificial neural networks behave similarly to human speakers when faced with new words. Neural networks successfully learn to extend frequent patterns to new words: for example, they can learn to use the English plural suffix -s (as in dogs and cats) for new words (e.g. wugs), and some researchers have argued that this behavior is human-like. In some cases, however, human speakers also extend infrequent or rare patterns to new words. A hypothetical English example would be a new word that takes the same singular and plural form, like sheep or deer. While this almost never happens in English, there are other languages where speakers reliably apply rare patterns to new words. Can neural models also learn this behavior?

If a German speaker learns a new word Bral and wants to produce its plural form, that speaker must make a decision. Will Bral take the suffix -e (like the existing word Male, which means “times”) or -en (like Wahlen, “votes”), both of which are quite frequent patterns in German? Or will it take a rarer suffix such as -er (like Täler, “valleys”) or -s (like Schals, “scarves”)? Based on previous linguistic research, the speaker is more likely to use the rare suffix -s if the new word sounds unusual (e.g. Bnöhk or Plaupf) rather than similar to existing German words (e.g. Bral). This feature of the German plural system makes it a useful test case for evaluating whether neural models can learn to generalize rare patterns as speakers do.

Our experiment used two lists of made-up words, developed by previous researchers: Rhymes, which sound like existing German words (e.g. Bral, Spert), and Non-Rhymes, which sound more unusual (e.g. Bnöhk, Plaupf). We presented these words to German speakers and asked them to produce plural forms. We also trained a neural network to produce plural forms of German words, and presented the same list of made-up words to the network.

Our results indicate that neural models do not use rare patterns the way speakers do. We found that speakers used the rare suffix -s more on Non-Rhymes compared to Rhymes, consistent with earlier studies. By contrast, the neural network did not use -s more on Non-Rhymes; instead, it showed a frequency bias, using the frequent suffix -e more on Non-Rhymes than Rhymes. This finding suggests that neural models do not fully capture human speaker cognition: they fail to learn the conditions under which speakers generalize rare patterns.
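As a rough illustration of the kind of tally behind this comparison, here is a minimal sketch that classifies each produced plural by its ending and compares -s usage rates across the two word lists. The response data is invented, and the ending-only classifier is a deliberate simplification (real German plurals can also involve umlaut changes to the stem); this is not the study’s actual analysis code.

```python
from collections import Counter

# Plural suffixes checked longest-first so "-en" is not misread as "-n".
SUFFIXES = ("en", "er", "e", "n", "s")

def classify_plural(singular: str, plural: str) -> str:
    """Classify a produced plural form by its ending.
    Simplification: ignores stem changes other than the final suffix."""
    if plural == singular:
        return "zero"  # same singular and plural form, like English "sheep"
    for suf in SUFFIXES:
        if plural.endswith(suf):
            return "-" + suf
    return "other"

# Invented toy responses (singular, produced plural) -- not the study's data.
responses = {
    "Rhymes": [("Bral", "Brale"), ("Spert", "Sperten"), ("Bral", "Brals")],
    "Non-Rhymes": [("Bnöhk", "Bnöhks"), ("Plaupf", "Plaupfs"), ("Plaupf", "Plaupfe")],
}

# Tally suffix choices per condition and compute the rate of -s responses.
s_rates = {}
for condition, pairs in responses.items():
    counts = Counter(classify_plural(s, p) for s, p in pairs)
    s_rates[condition] = counts["-s"] / len(pairs)
    print(condition, dict(counts), f"-s rate: {s_rates[condition]:.2f}")
```

With the toy data above, the Non-Rhymes condition shows a higher -s rate than the Rhymes condition, which is the speaker-like pattern; the neural network in our experiment showed the opposite, frequency-driven pattern.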