Native Language Identification Shared Task 2013
A shared task in Native Language Identification (NLI): identifying a writer's native language based solely on a sample of their writing.
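As a rough illustration of the task framing, NLI can be treated as supervised text classification. A model is trained on essays labelled with each writer's L1 and then predicts the L1 of an unseen essay. This is a minimal sketch that assumes a multinomial naive Bayes classifier over whitespace-tokenized word unigrams. The classifier, the `train`/`predict` helpers, and the toy labels `L1_A`/`L1_B` are all hypothetical. None of this is the team's chosen method, since the choice of machine learning algorithm is still open on this page.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(essays: list[tuple[str, str]]):
    """Fit a multinomial naive Bayes model from (text, l1_label) pairs."""
    doc_counts = Counter()              # number of essays per L1 label
    word_counts = defaultdict(Counter)  # per-L1 word frequencies
    vocab = set()
    for text, label in essays:
        doc_counts[label] += 1
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return doc_counts, word_counts, vocab


def predict(model, text: str):
    """Return the L1 label with the highest log posterior for `text`."""
    doc_counts, word_counts, vocab = model
    n_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        total = sum(word_counts[label].values())
        # Log prior plus log likelihood of each token, with add-one
        # (Laplace) smoothing so unseen words never give probability zero.
        score = math.log(doc_counts[label] / n_docs)
        for tok in tokenize(text):
            score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Invented toy examples for illustration only (not task data).
    toy = [("I am agree with this opinion", "L1_A"),
           ("I have went to school", "L1_B")]
    model = train(toy)
    print(predict(model, "I am agree"))  # → L1_A
```

Naive Bayes appears here only because it fits in a few lines. Any classifier that maps a text to one of a fixed set of L1 labels would fill the same role.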
- Team: Barbora Hladka (contact person, related projects, data, ML), Martin Holub (algorithms, ML), Silvie Cinkova (features), …
- Important Dates:
  - January 14 Training Data Release
  - March 11 Test Data Release
  - March 18 Submissions Due
  - March 25 Results Announcement
  - April 08 Papers Due
  - April 10 Revision Requests Sent
  - April 12 Camera Ready Version Due
  - June 13 or 14 NLI Shared Task Presentations @ BEA8 Workshop, Atlanta, GA, USA
- Data: TBA
- References
- Brooke, Julian and Graeme Hirst. Robust lexicalized native language identification. COLING 2012.
- Brooke, Julian and Graeme Hirst. Native language detection with 'cheap' learner corpora. In Proceedings of the Conference on Learner Corpus Research, Louvain-la-Neuve, 2011.
  - Reviews learner corpora.
  - The authors discuss topic bias. Do we need to account for it in the NLI task?
  - Feature set: character, POS, and word n-grams; function words; features from machine translation; features from L1 corpora. Number of features: ???
  - Machine learning algorithm: ??
- Wong, Sze-Meng Jojo, Mark Dras and Mark Johnson. Topic modeling for native language identification. In Proceedings of the Australasian Language Technology Association Workshop, pp. 115-124.
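Several of the feature families listed for Brooke and Hirst (2011) can be combined into a single sparse feature vector. This is a minimal sketch that assumes lowercased whitespace tokenization, character trigrams, word bigrams, and a small hypothetical function-word list (`FUNCTION_WORDS`). The n-gram orders and all helper names are illustrative assumptions. POS n-grams, which need a tagger, are omitted, and so are machine-translation features and L1-corpus features.

```python
from collections import Counter

# Hypothetical, deliberately tiny function-word list; a real list would be far longer.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "it"}


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams (spaces included)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(tokens: list[str], n: int) -> Counter:
    """Count word n-grams, each joined into one string with a space."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def extract_features(text: str) -> Counter:
    """Merge several feature families into one sparse feature vector.

    Each key is prefixed with its family, so the character trigram "the"
    and the function word "the" remain distinct features.
    """
    lowered = text.lower()
    tokens = lowered.split()
    features = Counter()
    for gram, count in char_ngrams(lowered, 3).items():
        features["char3=" + gram] += count
    for gram, count in word_ngrams(tokens, 2).items():
        features["word2=" + gram] += count
    for tok in tokens:
        if tok in FUNCTION_WORDS:
            features["fw=" + tok] += 1
    return features


if __name__ == "__main__":
    feats = extract_features("It is the end of the day")
    print(feats["fw=the"], feats["word2=of the"])  # → 2 1
```

The family prefixes (`char3=`, `word2=`, `fw=`) keep the key spaces apart. The resulting `Counter` can then be mapped to columns of a sparse matrix for whichever learner is eventually chosen.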