I am a computational linguist living in Singapore. My primary research interests include:
I'm currently a postdoctoral research fellow in the School of Humanities at Nanyang Technological University. Here is my CV.
Abstract Meaning Representation (AMR) (Banarescu et al., 2013) is a framework for semantic dependencies that encodes its rooted and directed acyclic graphs in a format called PENMAN notation. The format is simple enough that users of AMR data often write small scripts or libraries for parsing it into an internal graph representation, but there is enough complexity that these users could benefit from a more sophisticated and well-tested solution. The open-source Python library Penman provides a robust parser, functions for graph inspection and manipulation, and functions for formatting graphs into PENMAN notation. Many functions are also available in a command-line tool, thus extending its utility to non-Python setups.
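To illustrate the kind of small script the abstract mentions, here is a minimal sketch of a PENMAN reader that turns a well-formed string into a top variable and a list of triples. It is not the Penman library's API: the names `parse_penman` and `TOKEN` are hypothetical, and the sketch assumes well-formed input and ignores comments, metadata lines, and error reporting, which is the complexity a well-tested library is meant to handle.

```python
import re

# Tokens: parentheses, slash, roles (:ARG0), quoted strings, and bare symbols.
# This tokenization is a simplifying assumption, not the full PENMAN grammar.
TOKEN = re.compile(r'\(|\)|/|:[^\s()/]*|"(?:\\.|[^"\\])*"|[^\s()/:"]+')


def parse_penman(text):
    """Parse one well-formed PENMAN string into (top, triples)."""
    tokens = TOKEN.findall(text)
    pos = 0
    triples = []

    def parse_node():
        nonlocal pos
        if tokens[pos] != '(':
            raise ValueError(f'expected "(" but found {tokens[pos]!r}')
        pos += 1
        var = tokens[pos]
        pos += 1
        # Optional concept: (var / concept ...)
        if tokens[pos] == '/':
            triples.append((var, ':instance', tokens[pos + 1]))
            pos += 2
        # Zero or more role-target pairs until the node closes.
        while tokens[pos] != ')':
            role = tokens[pos]
            pos += 1
            if tokens[pos] == '(':
                # Record the edge first, then descend into the nested node.
                triples.append((var, role, tokens[pos + 1]))
                parse_node()
            else:
                # A bare target is a constant or a re-used variable.
                triples.append((var, role, tokens[pos]))
                pos += 1
        pos += 1  # consume ")"
        return var

    top = parse_node()
    return top, triples


top, triples = parse_penman('(b / bark-01 :ARG0 (d / dog) :polarity -)')
print(top)
for triple in triples:
    print(triple)
```

The example prints the top variable `b` followed by four triples: the two `:instance` triples, the `:ARG0` edge from `b` to `d`, and the `:polarity` attribute. A re-used variable such as the second `b` in `(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))` simply appears as a bare target, which is how the notation encodes reentrancy in the graph.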
In this paper we discuss the experience of bringing together over 40 different wordnets. We introduce some extensions to the GWA wordnet LMF format proposed in Vossen et al. (2016) and look at how this new information can be displayed. Notable extensions include: confidence, corpus frequency, orthographic variants, lexicalized and non-lexicalized synsets and lemmas, new parts of speech, and more. Many of these extensions already exist in multiple wordnets; the challenge was to find a compatible representation. To this end, we introduce a new version of the Open Multilingual Wordnet (Bond and Foster, 2013) that integrates a new set of tools that tests the extensions introduced by this new format, while also ensuring the integrity of the Collaborative Interlingual Index (CILI: Bond et al., 2016) by preventing the same new concept from being introduced through multiple projects.
Abstract Meaning Representation (AMR; Banarescu et al., 2013) encodes the meaning of sentences as a directed graph and Smatch (Cai and Knight, 2013) is the primary metric for evaluating AMR graphs. Smatch, however, is unaware of some meaning-equivalent variations in graph structure allowed by the AMR Specification and gives different scores for AMRs exhibiting these variations. In this paper I propose four normalization methods for helping to ensure that conceptually equivalent AMRs are evaluated as equivalent. Equivalent AMRs with and without normalization can look quite different: comparing a gold corpus to itself with relation reification alone yields a difference of 25 Smatch points, suggesting that the outputs of two systems may not be directly comparable without normalization. The algorithms described in this paper are implemented on top of an existing open-source Python toolkit for AMR and will be released under the same license.
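As a rough illustration of why relation reification changes what a triple-based metric sees, the sketch below rewrites selected edges as nodes over a plain list of triples. It is not the implementation described in the paper: the function `reify`, the generated variable names, and the two-entry table are assumptions made for the example, with the two mappings (`:location` to `be-located-at-91`, `:cause` to `cause-01`) taken from the AMR guidelines' reification table.

```python
# Partial, illustrative reification table (an assumption for this sketch):
# role -> (concept, role pointing at the edge's source, role pointing at its target)
REIFICATIONS = {
    ':location': ('be-located-at-91', ':ARG1', ':ARG2'),
    ':cause': ('cause-01', ':ARG1', ':ARG0'),
}


def reify(triples, table=REIFICATIONS):
    """Replace each edge whose role is in `table` with a new node and two edges."""
    # Collect every symbol in use so that fresh variables cannot collide.
    used = {s for s, _, _ in triples} | {t for _, _, t in triples}
    counter = 0
    result = []
    for source, role, target in triples:
        if role not in table:
            result.append((source, role, target))
            continue
        concept, source_role, target_role = table[role]
        # Pick a fresh variable name (hypothetical naming scheme).
        counter += 1
        var = f'_r{counter}'
        while var in used:
            counter += 1
            var = f'_r{counter}'
        used.add(var)
        result.append((var, ':instance', concept))
        result.append((var, source_role, source))
        result.append((var, target_role, target))
    return result


# Triples for: (s / sleep-01 :ARG0 (c / cat) :location (h / house))
original = [
    ('s', ':instance', 'sleep-01'),
    ('s', ':ARG0', 'c'),
    ('c', ':instance', 'cat'),
    ('s', ':location', 'h'),
    ('h', ':instance', 'house'),
]
reified = reify(original)
print(len(original), len(reified))
print(sorted(set(original) - set(reified)))
```

Here one `:location` edge becomes three triples, so the reified graph has seven triples where the original has five, and the original `:location` triple no longer appears at all. A metric that counts matching triples therefore scores the two graphs as different even though they are conceptually equivalent, unless both sides are normalized the same way first.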
An open-source library for working with Minimal Recursion Semantics, [incr tsdb()] test suites, TDL code, and other representations used in HPSG grammars as produced within the DELPH-IN consortium.
A framework for working with interlinear glossed text, including the eponymous Xigt data model that uses a flat structure with ID-references in order to accommodate non-projective annotations, e.g., for annotating semantic dependencies.
A processing pipeline and related scripts for transfer-based machine translation in the LOGON paradigm. This project forms the bulk of the task-specific code I used for my Ph.D. research, although it may be useful for others working in a similar space.
Postdoctoral Research Fellow, Nanyang Technological University, February 2019–
Ph.D. Linguistics, University of Washington, June 2018
Dissertation: Semantic Operations for Transfer-based Machine Translation (defense slides)
Research Associate, Nanyang Technological University, April 2014–August 2015
MA Computational Linguistics, University of Washington, August 2009
Thesis: Egad: Efficiently Evaluating and Extracting Errors from Deep Grammars
Contractor for Microsoft Translator, Microsoft Research (Populus Group), February 2009–June 2009
Invited Advisor, National Institute of Information and Communications Technology (NICT / 情報通信研究機構), October 2008–January 2009
BS Computer Science, Oregon State University, August 2007