Native Language Identification Shared Task 2013

A shared task in Native Language Identification (NLI): the goal is to identify the native language of a writer based solely on a sample of their writing.

Home page: https://sites.google.com/site/nlisharedtask2013/home
Team: Barbora Hladka (contact person, related projects, data, ML), Martin Holub (algorithms, ML), Silvie Cinkova (features), …
Important Dates:
- January 14 Training Data Release
- March 11 Test Data Release
- March 18 Submissions Due
- March 25 Results Announcement
- April 08 Papers Due
- April 10 Revision Requests Sent
- April 12 Camera Ready Version Due
- June 13 or 14 NLI Shared Task Presentations @ BEA8 Workshop, Atlanta, GA, USA
Data: TBA
References
1. Brooke, Julian, Graeme Hirst. Robust lexicalized native language identification. COLING 2012.
2. Brooke, Julian, Graeme Hirst. Native language detection with 'cheap' learner corpora. In Proceedings of the Conference on Learner Corpus Research, Louvain-la-Neuve. 2011.
  - learner corpora review;
  - the authors discuss topic bias; do we have to care about it in the NLI task?
  - Feature set: [character|POS|word] n-grams, function words, features from machine translation, features from L1 corpora. How many features: ???
  - Machine learning algorithm: ??
3. Wong, Sze-Meng Jojo, Mark Dras, Mark Johnson. Topic Modeling for Native Language Identification. In Proceedings of Australasian Language Technology Association Workshop, pp. 115-124.
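
The surface features listed in the notes on reference 2 (character and word n-grams, function words) could be extracted roughly as in the sketch below. This is only an illustration and not the setup used in any of the cited papers. The function names, the `char3=`/`word2=`/`fw=` feature prefixes, and the tiny function-word list are all hypothetical. POS n-grams would use the same `word_ngrams` helper on a tag sequence produced by a tagger, which is not shown here.

```python
from collections import Counter

# Hypothetical, deliberately tiny function-word list (illustration only).
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "but", "that"}


def char_ngrams(text, n):
    """Count overlapping character n-grams in a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(tokens, n):
    """Count overlapping n-grams over a token list (words or POS tags)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def function_word_counts(tokens):
    """Count occurrences of tokens that appear in FUNCTION_WORDS."""
    return Counter(t for t in tokens if t in FUNCTION_WORDS)


def extract_features(text):
    """Build one sparse feature dictionary for a single essay.

    Assumes lowercasing and whitespace tokenization, which is a
    simplification; a real system would use a proper tokenizer.
    """
    lowered = text.lower()
    tokens = lowered.split()
    features = {}
    for gram, count in char_ngrams(lowered, 3).items():
        features["char3=" + gram] = count
    for gram, count in word_ngrams(tokens, 2).items():
        features["word2=" + " ".join(gram)] = count
    for word, count in function_word_counts(tokens).items():
        features["fw=" + word] = count
    return features


if __name__ == "__main__":
    feats = extract_features("The cat sat on the mat")
    print(len(feats), "features extracted")
```

A dictionary of this kind can be passed to most machine learning toolkits after vectorization. The number of features grows quickly with the n-gram order, which relates to the open "How many features" question in the notes above.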
