Gries for SPLaT on Thursday

Please mark your calendars for a visit that should interest anyone who works with large collections of data. Stefan Gries is one of the leading researchers in corpus methods, tackling research questions that span the linguistic spectrum – phonology, acquisition, cognition, syntax, semantics, and more. This time, his visit is hosted by SPLaT!

When: Thursday, May 16th, 4:15-6
Where: Greenberg Room

Contact Robin Melnick (rmelnick@stanford.edu) ASAP if you are interested in meeting with Stefan one-on-one while he is here.

Improving corpus-linguistic methods: three small examples / case studies
In this talk, I will discuss three corpus-linguistic case studies that aim to draw attention to patterns and methods that corpus-linguistic approaches may have neglected.

The first case study is concerned with phonological similarity within a set of successively more abstract/versatile syntactic patterns or constructions (and a comparison/control set). I will show that, in addition to previous findings on alliteration, phonological similarity can also be found between different components of constructions; in passing, I will also make a case for the use of robust statistics.

The second case study discusses an ever-popular topic in corpus linguistics: measures of association between different linguistic units. Most regularly used measures of association, such as MI (mutual information), t (the t-score), and LL (log-likelihood), are bidirectional: they quantify the mutual attraction of two units and cannot show whether one unit attracts the other more than vice versa, even though association need not be bidirectional. In this case study, I will showcase one particular measure from the associative learning literature and explore its characteristics when applied to two-word units and ‘control’ collocations.

The final case study discusses a proposed improvement to the analysis of learner corpora. Most traditional learner corpus research relies on simplistic (and risky) statistical analyses while claiming to be context-based. I will try to show a statistical approach to comparing learner and native-speaker data that is much more appropriate given the stated goals of work in corpus-based SLA.