[Lecture Series] Content-Driven Machine Learning: Using Lexical Variability to Optimize Models of Natural Language – The Centre for Advanced Research in Experimental and Applied Linguistics

By Dr. Brendan Johns, October 22, 2019, 11:30 am to 1:00 pm

The Department of Linguistics and Languages will host a talk by Dr. Brendan Johns, an Assistant Professor at the University at Buffalo in Communicative Disorders and Sciences, Computational Linguistics, and Cognitive Science. Dr. Johns obtained his PhD from Indiana University in the Departments of Psychological and Brain Sciences and Cognitive Science. His research is based on machine learning and big data approaches to language and memory, and has won awards from the Canadian Society for Brain, Behaviour, and Cognitive Science, the Cognitive Science Society, the Psychonomic Society, and the Society for Computers in Psychology.

Title: Content-Driven Machine Learning: Using Lexical Variability to Optimize Models of Natural Language
Presenter: Dr. Brendan Johns
Date: Tuesday, October 22, 2019
Time: 11:30 am to 1:00 pm
Location: TSH 201, McMaster University

Abstract

The collection of large text sources has revolutionized the field of natural language processing, and has led to the development of many different cognitive models that are capable of extracting sophisticated semantic representations of words based on the statistical redundancies contained within natural language. However, these models are trained on corpora that are a random collection of language written by many different authors, designed to represent one’s average experience with language. This talk will focus on two main issues: 1) how variable the usage of language is across individuals, and 2) how this variability can be used to optimize models of natural language processing. It will be shown that by optimizing models of natural language based on the same lexical variability that humans experience, it is possible to attain benchmark fits to a wide variety of lexical tasks. Additionally, it will be demonstrated that the optimization procedure is capable of inferring demographic characteristics (such as time and place of data collection) of a group of subjects, shedding light on the experiential basis of language processing.
