Institute of Formal and Applied Linguistics Wiki


How to Write a Master's Thesis

NLP Master's Thesis from Enrollment to Defense

This tutorial could also be called “things I wish I had known when I was writing my diploma thesis”. It is intended to make writing your master's thesis easier (and, hopefully, to lead to better theses and successful defenses as a result), and what better way is there than to share what I learned from being on both sides of the trench. I'm going to guide you through the process of your master's thesis assignment, writing and defense. The guideline is mostly tailored to a typical experimental NLP master's thesis (at ÚFAL), but I'm sure you can adapt it to other situations once you get the general idea.

Timeline

I don't think I can emphasize this enough: start EARLY. Seriously. You will make your life much easier if you start programming, measuring and writing (that especially!) in good time. Most opponents can recognize hasty writing hurriedly finished over the last week before the deadline. As for the question of the minimum time in which one can write a thesis, let's pretend I never heard it.

The exact dates of everything that happens at the Faculty of Mathematics and Physics, such as the beginnings and ends of semesters, deadlines for course and exam enrollment, and deadlines for thesis submission and thesis exams, are strictly bound by the official Academic Calendar of the faculty. Look up the exact deadlines for the required actions in the current version.

With that in mind, let's say that a typical master's curriculum takes two academic years (four semesters). Master's thesis writing is officially a three-semester course, consisting of three consecutive (!) semesters, each of which you must register for in the Study Information System (SIS) by the appointed deadline. ML: BUT this is not an official requirement, and students often register for all three courses at once at the end of the fourth semester; moreover, the Academic Calendar recommends choosing the thesis topic only in November of the 2nd year (which is especially important for LCT students).

ML: However, this ideal timetable is not applicable in all cases, especially for LCT students, who spend one year abroad. LCT students typically select the topic at the beginning of their second year, i.e. during October (note that those doing their second year at another university must still find a co-supervisor from Prague).

That generates roughly the following course of action:

- Master's thesis defense (obhajoba in Czech): requires submitting the electronic (PDF) and the paper (hardcover) version of the thesis by the appointed deadline, enrolling for the exam in the Study Information System (SIS) (again, by the official deadline) and, obviously, showing up for the defense. More about the actual thesis defense later.
- Master's state final examinations (státnice in Czech): requires registering for the examination by the required deadline and showing up for the examination. Also more on the examination later.

So each of the two parts is offered separately in the Study Information System (SIS), each must be enrolled for by the required deadline, and they can be taken jointly or separately in three possible terms throughout the year: summer (June), autumn (September) and winter (February). Please note that the registration deadline differs for each of the terms. The preferred way, when everything goes well, is the summer term after the 2nd year's summer semester, taking both the defense and the examination; this places the registration for both and the submission of the thesis somewhere in May.

For LCT students: You will also have to come in person and defend the thesis plus pass the state exams at Charles University, even if you are at a partner university for the second year (unless discussed and arranged in a different way well in advance).

Survivor Advice:

“One thing I would like to do better next time (and advise students to do better in their first time) is time management: The programming part should be finished in a half time assigned to bachelor's thesis (even in case it is the main part of the master's thesis). In the end, it will delay and there will be a lot to correct and improve. The experimenting, evaluation, text writing and corrections (corrections!!!) will take more time than one expects. And even if it is not the case - then it's great, at least one is not stressed out from submitting at the last time and will have some spare time.”

Selecting the Topic, Finding the Supervisor and Registering the Topic in SIS

There are several ways to select your thesis topic:

Your supervisor will typically be one of the ÚFAL professors, researchers or Ph.D. students, although the supervisor may in theory come from another department of Charles University (or even from another institution).

The student and their supervisor put together the title of the thesis and the description of the work (abstract): this will constitute the official assignment. It should be broad enough to allow deviations once the student gets their first results and realizes that the originally anticipated course of experiments is not the best one to follow. Later changes of title and topic are possible in theory, but they are an administrative hassle that is best avoided.

Necessary steps:

The student office will later send back the hardcopy stamped and signed by the vice-dean. Ms. Brdičková will keep it until the end of the academic year. The student will then have to pick it up and include it in one of the printed copies of the thesis (it is a mandatory part; this copy goes to the faculty library).

Assignment of the thesis via SIS is described here: https://www.mff.cuni.cz/en/students/bachelor-and-master-thesis/guidelines-for-writing-a-master-thesis

Writing the Thesis

From the faculty guidelines:
The aim of the master thesis is to demonstrate an ability to work independently. The form a bachelor thesis takes may vary (background research, software, proof of a new theorem, etc.), due to the differences between various study programmes and disciplines. Presentation of original results is always desirable, but not necessary.

How Much Is Enough

The very first question I usually get is “How much do I have to write?”. As far as I know, there is no official number anywhere, so I'm just going to speak from experience: an experimental NLP master's thesis most probably cannot be rejected on the grounds of “too little text” if it contains an absolute minimum of 40-50 A4 pages of content, not including the front page, acknowledgements, table of contents, bibliography and appendices. That is, the Conclusion section should appear on page 40-50.

Me or We

Most scientific contributions are achieved in a team, and it is therefore customary to write publications in the plural, as in “We present”, “We implemented” and “We conclude that…”. If you are the single author of the work, though, opinions differ on the choice of singular (“I”) and plural (“We”). Some even say the third person should be used, as in “The researcher found out that…”. It is generally advisable to avoid the singular/plural choice wherever you can by using impersonal constructions, such as “The research shows that…” or “The findings suggest that…”. The active voice singular (“I”) would then be kept for sentences in which you underscore your particular contribution: “I implemented”, “I measured”. Even using “We” throughout the thesis, in all places where you are clearly referring to yourself as the single author, is acceptable. Using the plural “we” instead of the singular “I” in scientific writing is supposedly called ''pluralis auctoris'', and Wikipedia says it is more common towards the East (hence the Wiki page is only in a few languages, including Czech, but excluding English). The English term for this is author's we. The APA style allows the first person both in singular and plural. That being said, the choice is really yours.

The Content

A typical experimental NLP thesis usually consists of these parts:

I describe each section in detail on a separate page.

Typesetting and Formatting

This is an area which you can get right, earning some plus points at very low cost. Correct typesetting and formatting can be achieved with a little diligence and patience, even if one is no Einstein. You won't get complaints for your thesis not being rocket science, but you can get a lot of complaints for poor presentation of your work. All of this can be avoided if you:

(Petricek, 2016)1)

For LCT students: You should follow the instructions for the thesis formatting as strictly as possible; however, some slight deviations commonly occur for LCT students and are tolerated. You can have your other university's name and logo on the title page. If you have a co-supervisor from the other university, you can include their name as well. For the purpose of the defense at Charles University, you should show your CUNI supervisor (i.e. the one named in your official CUNI assignment) as the main one. Name your CUNI supervisor on the abstract page.

Referencing, Plagiarism and These Things (Don't Skip Me!)

Plagiarism is a big NO-NO in science. It's so big you can get in serious trouble if caught. The trouble is, many people are not quite sure what plagiarism is and what it isn't. Sometimes, there is a grey area, but sometimes, the borderline is pretty clear.

Generally, plagiarism is presenting someone else's work/idea/text/source code as your own. Specifically, anything that appears in your thesis and is neither properly referenced nor general knowledge is being presented by you as your own work/idea/text/source code. Some very obvious examples of things that should be referenced are:

Every time you copy/reproduce a sentence/definition/figure/table, the reference must be repeated. It is not OK to state at the beginning of a section “And from now on, I shall draw from publication XY” and then go on for three pages freely mixing your text and sentences from XY. With some exceptions, it is also not OK to copy entire paragraphs or pages, even if they are properly referenced, as you are supposed to write your own thesis, not copy someone else's work. It is, however, all right to introduce/explain an idea with a proper reference and then discuss the idea in the next three paragraphs without referencing it again and again. If your discussion replicates someone else's opinion, however, you should reference it, e.g. “Our data support hypothesis ABC, as well as results of REF-to-XY.”

Less obvious and not always necessary references may be required by:

In scientific writing, there is a rigid, established way of referencing properly. TODO write about reference norms.

CUNI student handbook:
(pdf cz) https://karolinum.cz/knihy/foltynek-jak-se-vyhnout-plagiatorstvi-24022
(web cz) https://www.akademickaetika.cz/prirucka-pro-studenty/

(pdf eng) https://karolinum.cz/knihy/foltynek-how-to-avoid-plagiarism-24023
(web eng) https://www.akademickaetika.cz/en/student-handbook/

CUNI handbook for academic staff:
(pdf cz) https://karolinum.cz/knihy/foltynek-jak-predchazet-plagiatorstvi-ve-studentskych-pracich-24082
(web cz) https://www.akademickaetika.cz/prirucka-pro-akademicke-pracovniky/

(pdf eng) https://karolinum.cz/knihy/foltynek-how-to-prevent-plagiarism-in-student-work-24024
(web eng) https://www.akademickaetika.cz/en/handbook-for-academic-staff/

Moodle course at the Faculty of Arts (FF):
https://www.ff.cuni.cz/2021/02/e-learning-focused-academic-integrity-research-ethics-english/
(probably available only to FF students for now; a university-wide version is in preparation and should be available in April/May 2022)

Links from other universities:
https://gpsdocs.rice.edu/orientation/Plagiarism_Hewitt_document.pdf
http://cnx.org/content/col10604/1.1/

Abstract

The abstract is written last :-))
As for the translation into Czech, you can typically ask your supervisor for help :-)

It is strongly recommended that the abstract in your thesis differs from the abstract in the official assignment. Even though in theory you could claim that you did exactly what was in the assignment, it is better to show the reviewer that it is not just a copy. Typically, the assignment is more vague because you do not know what exactly you will do after you see the results of the first experiments. In contrast, the abstract should summarize what you actually did. Even if your thesis closely matches the assignment, the abstract probably should highlight your main achievement(s) (e.g. “we were able to improve the state of the art by 50%”).

Submitting the Thesis

Getting ECTS credits for the thesis

When you plan your classes for the academic year (and register them in SIS), you should also register for the three pseudo-courses reflecting your work on the thesis:

At the end of the summer semester you will (hopefully) get the credits (in SIS) from your supervisor. This is not directly connected with the fact that the thesis will be (can be) defended. The credits just reflect the fact that you have invested significant time and effort into doing the research and preparing the thesis / meeting the agreed milestones.

In practice, the supervisor will not be able to award the credits if SIS does not know that you want them, i.e. if you have not registered for these pseudo-courses! While the registration can be completed just before the submission of the thesis, it may complicate the situation because the assistance of the student office is needed, the staff may be out of office (vacation time!), you may be traveling from the other end of Europe, trying to find all the people and get all the signatures within one afternoon etc.

Submitting

Note that you have to finish all your courses and get the necessary credits before you are allowed to submit your diploma thesis and/or register for the state exam.

https://www.mff.cuni.cz/en/students/bachelor-and-master-thesis/guidelines-for-writing-a-master-thesis

The deadline for submission of a master thesis is specified in the Academic Calendar: https://www.mff.cuni.cz/en/students/academic-calendar
Three possible submission terms are offered throughout the year: summer (mid May), autumn (mid July) and winter (beginning of January).

You are supposed to submit your thesis in the following formats:

  1. An electronic version of the thesis should be submitted to SIS in due time.
  2. Two hardcopies are submitted as well.

Electronic version

Don't fall into the PDF/A trap! The electronic system requires every submitted PDF to be in PDF/A format, and there is an automatic check for PDF/A. Allow yourself enough time to find out how to convert your PDF into PDF/A (that is, not in the last two hours before the midnight deadline). Some good advice on PDF/A, unfortunately only in Czech, can be found here.

The electronic file should not exceed 850 MB in size.

Dan writes in his notes:
PDF/A requirements:
There is a ÚFAL-wide installation of the verapdf tool available at /opt/tools/verapdf, with a useful wrapper script /opt/tools/verapdf/checkpdf that can be more talkative if you use the --verbose option.
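
The two checks above (the size limit and the PDF/A validation) can be bundled into a small pre-submission helper. The following is only a minimal sketch under several assumptions: the function names are made up for illustration, the size limit is interpreted as 850 * 10^6 bytes, and the wrapper is assumed to take the option before the file name and to report its verdict in its output. Only the wrapper path and the verbose option come from the notes above; check the wrapper itself for its actual usage.

```python
import os
import subprocess

# Path of the wrapper script quoted in the notes above.
CHECKPDF = "/opt/tools/verapdf/checkpdf"
# Size limit quoted above; treating 1 MB as 10**6 bytes is an assumption.
MAX_SIZE_MB = 850


def size_ok(pdf_path, limit_mb=MAX_SIZE_MB):
    """Return True if the file does not exceed the size limit."""
    return os.path.getsize(pdf_path) <= limit_mb * 10**6


def checkpdf_command(pdf_path, verbose=False):
    """Build the command line for the wrapper.

    The argument order (option first, then the file) is an assumption.
    """
    cmd = [CHECKPDF]
    if verbose:
        cmd.append("--verbose")
    cmd.append(pdf_path)
    return cmd


def run_checkpdf(pdf_path, verbose=True):
    """Run the wrapper and return (exit code, combined output).

    How the wrapper signals a failed check is not documented here,
    so the caller should read the output rather than trust the code.
    """
    result = subprocess.run(
        checkpdf_command(pdf_path, verbose),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout + result.stderr
```

On a ÚFAL machine, one would call size_ok("thesis.pdf") first and then print the output of run_checkpdf("thesis.pdf"), well before the deadline.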

Hardcopies

The hardcopies typically also contain a CD/DVD with the electronic version of the text (PDF) and all data and software that you created during the work, including documentation and possibly third-party software, if it is relevant and redistributable.
TODO: is this still true in 2021? The standard way is a zip attachment in SIS (plus a link to GitHub/Lindat/etc.).

If you order the binding in Prague, the stores typically know what the product should look like, so they will only ask you for your name and the name of the faculty. You will have to submit two hardcopies of the thesis: one for the supervisor / the opponent and one for the library. The hardcopy for the library must also contain the original assignment, signed and stamped by the dean's office (at this time it is probably being kept for you by Ms. Doušová, see above).

Final Examination

As has already been said, the final examination consists of two parts:

  1. Master's thesis defence (obhajoba in Czech): requires submitting the electronic (PDF) and the paper (hardcover) version of the thesis by the appointed deadline, enrolling for the exam in the Study Information System (SIS) (again, by the official deadline) and, obviously, showing up for the defence. More about the actual thesis defence later.
  2. Master's state final examinations (státnice in Czech): requires registering for the examination by the required deadline and showing up for the examination. Also more on the examination later.

So each of the two parts is offered separately in the Study Information System (SIS), each must be enrolled for by the required deadline, and they can be taken jointly or separately in three possible terms throughout the year: summer (June), autumn (September) and winter (February). Please note that the registration deadline differs for each of the terms. The preferred way, when everything goes well, is the summer term after the 2nd year's summer semester, taking both the defence and the examination; this places the registration for both and the submission of the thesis somewhere in May.

In fact, the defence and the final state examination are usually organized on two different days about a week apart. One day accommodates all defences, the other all final state examinations. The reason for this is purely organizational: different committees have to meet for each event.

Thesis Defence

https://www.mff.cuni.cz/en/students/bachelor-and-master-thesis/guidelines-for-writing-a-master-thesis

Prepare a presentation (PDF, Microsoft PowerPoint or LibreOffice Impress). A laptop with a data projector will be available (or you can use your own laptop).

You will have only 10 minutes, which is a very short time! The committee will be NLP-literate, but not necessarily specialized in your problem. Keep the introduction very short: the area I selected is XXX, the concrete problem is YYY, it is a problem because ZZZ (impact, motivation). Then add another slide introducing the methodology, then possibly a few interesting details from your research, then the results. As with every presentation, avoid slides with too much text or too many numbers (or, if you believe you need many numbers to be able to answer questions, highlight the one or two numbers that the audience should not miss). Avoid complicated formulas or anything that may take the audience some time to grasp (remember, you will not give them much time because you will not have it). Avoid too many slides (1 minute per slide should be the minimum).
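
The timing rule above is simple arithmetic: 10 minutes at no less than 1 minute per slide means at most 10 slides. As a rough illustration only, the sketch below checks a presentation outline against both numbers; the function names, the section names and the example durations are made up and should be replaced with your own plan.

```python
TIME_LIMIT_MIN = 10        # time limit for the talk, as stated above
MIN_MINUTES_PER_SLIDE = 1  # minimum time per slide, as stated above


def max_slides(time_limit=TIME_LIMIT_MIN, per_slide=MIN_MINUTES_PER_SLIDE):
    """Upper bound on the number of slides that fits the time limit."""
    return int(time_limit // per_slide)


def check_plan(plan, time_limit=TIME_LIMIT_MIN, per_slide=MIN_MINUTES_PER_SLIDE):
    """Return a list of problems found in a plan.

    The plan is a list of (section name, number of slides, minutes).
    """
    problems = []
    total = sum(minutes for _, _, minutes in plan)
    if total > time_limit:
        problems.append(f"total time {total} min exceeds {time_limit} min")
    for name, slides, minutes in plan:
        if slides > 0 and minutes / slides < per_slide:
            problems.append(f"{name}: less than {per_slide} min per slide")
    return problems


# Hypothetical outline following the structure suggested above.
example_plan = [
    ("Introduction", 1, 1.0),
    ("Methodology", 1, 1.5),
    ("Details", 3, 3.5),
    ("Results", 2, 3.0),
    ("Conclusion", 1, 1.0),
]
```

An empty list from check_plan(example_plan) means the outline fits; a dry run with a stopwatch is still the real test.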

Present the slides to yourself aloud, possibly several times. Try to arrange a dry-run with your supervisor. Get confident about what you are going to say. Check the timing.

There will be two written reviews of your thesis: one by your supervisor, the other by an opponent. Both reviews shall be available to you at least one week before the defense. The reviews may contain questions that you will have to answer during the defense. If so, prepare the answers. You may prepare slides to back the answers if applicable. In that case, don't include these slides in your main presentation. They don't count towards your time limit. Wait until asked, then show them (they may be part of the same presentation file, e.g. you may put them after the Thank You slide).

Final State Examination

https://ufal.mff.cuni.cz/teaching/state-exams

1)
Tomas Petricek [@tomaspetricek]. (2016, May 11). Making silly #latex jokes is much more fun than doing final tweaks in my thesis on #coeffects…. Twitter. URL
