
Native Language Identification Shared Task 2013

A shared task in Native Language Identification (NLI): the goal is to identify the native language of a writer based solely on a sample of their writing.

Home page: https://sites.google.com/site/nlisharedtask2013/home
Team: Barbora Hladka (contact person, related projects, data, ML), Martin Holub (algorithms, ML), Silvie Cinkova (features), …
Important Dates:
- January 14 Training Data Release
- March 11 Test Data Release
- March 18 Submissions Due
- March 25 Results Announcement
- April 08 Papers Due
- April 10 Revision Requests Sent
- April 12 Camera Ready Version Due
- June 13 or 14 NLI Shared Task Presentations @ BEA8 Workshop, Atlanta, GA, USA
Data: TBA
References
1. Brooke, Julian, Graeme Hirst. Native language detection with 'cheap' learner corpora. In Proceedings of the Conference on Learner Corpus Research, Louvain-la-Neuve. 2011.
  - learner corpora review;
  - they discuss topic bias: do we have to care about it in the NLI task?
  - Feature set: [character|POS|word] n-grams, function words, features from machine translation, features from L1 corpora. How many features: ???
  - Machine learning algorithm: ??
2. Wong, Sze-Meng Jojo, Mark Dras, Mark Johnson. Topic Modeling for Native Language Identification. In Proceedings of the Australasian Language Technology Association Workshop, pp. 115-124. 2011 (pdf).
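
To make the surface features listed under reference 1 concrete (character n-grams, word n-grams, function words), here is a minimal sketch of how they could be counted for one text. It is not the implementation from either paper. The names `extract_features`, `ngrams`, `tokenize` and the tiny `FUNCTION_WORDS` set are hypothetical, and POS n-grams are left out because they need an external tagger.

```python
from collections import Counter
import re

# Assumption: illustrative subset only; a real function-word list is much longer.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "but", "that"}


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(items, n):
    """Return all contiguous n-grams of a sequence (string or list) as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def extract_features(text, char_n=3, word_n=2):
    """Count character n-grams, word n-grams and function words in one text."""
    tokens = tokenize(text)
    features = Counter()
    # Character n-grams are taken over the lowercased raw text, spaces included.
    for gram in ngrams(text.lower(), char_n):
        features["char:" + "".join(gram)] += 1
    # Word n-grams are taken over the token sequence.
    for gram in ngrams(tokens, word_n):
        features["word:" + " ".join(gram)] += 1
    # Function words are counted as individual tokens.
    for token in tokens:
        if token in FUNCTION_WORDS:
            features["func:" + token] += 1
    return features


if __name__ == "__main__":
    feats = extract_features("The cat sat on the mat")
    print(feats["func:the"], feats["word:the cat"], feats["char:the"])
```

The resulting counts could be fed to any standard classifier; which learner and how many features to keep are the open questions noted above.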
