khresmoi:fr [2012/01/23 10:55] ufal
===== Language data FR =====
For now, I have everything stored locally. --- //
== Legend ==
8-) data already downloaded
:?: not sure whether we want it
:-? we want to download it, but do not yet know how to get it (for various reasons)
==== ELRA ====

  * **ELRA-E0022:

Subpart: 140 Mb of data from the medical domain
Not yet delivered (PP)
- | |||
- | * **ELRA-E0019: | ||
- | |||
- | Subpart (medical corpus): 9,000,000 words | ||
Not yet delivered (PP)
- | |||
- | ==== LDC ==== | ||
  * **French Gigaword** 3rd edition, catalogue number LDC2011T10, we have the DVD
Format: SGML, segmented into sentences, not tokenized
862,851 words, i.e. simply the number of whitespace-separated tokens (of all types) remaining after all SGML tags are removed
General newswire text, not medical: Agence France-Presse,
I also found:
  * **Hansard French/
This would have to be ordered, but it is expensive:
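The Gigaword word count follows a simple rule: whitespace-separated tokens of all types, counted after the SGML tags are stripped. A minimal Python sketch of that rule, for checking counts on our local copy, might look like the following. It assumes uncompressed, UTF-8 SGML files whose tags never contain a literal `>`; `count_words` is a hypothetical helper, not LDC's own counting script, and replacing each tag with a space (rather than deleting it outright) is our own assumption so that words on either side of a tag cannot fuse into one token.

```python
import re
import sys

# Matches one SGML tag such as <DOC id="...">, <P> or </TEXT>.
# Assumption: no tag contains a literal ">" inside an attribute value.
TAG_RE = re.compile(r"<[^>]*>")


def count_words(sgml_text: str) -> int:
    """Count whitespace-separated tokens (of all types) left after removing SGML tags."""
    # Replace each tag with a space so that text on both sides of a tag
    # cannot fuse into a single token (an assumption, see the lead-in).
    plain = TAG_RE.sub(" ", sgml_text)
    # str.split() with no argument splits on any run of whitespace.
    return len(plain.split())


if __name__ == "__main__":
    # Usage: python count_words.py FILE.sgml [FILE.sgml ...]
    # Assumption: files are already decompressed and UTF-8 encoded.
    total = 0
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            total += count_words(f.read())
    print(total)
```

Punctuation attached to a word (e.g. `dort.`) stays part of that token, which matches "not tokenized" above.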
Member fee: $0 for 1995, 1996, 1997 members
Reduced-License Fee: US $3250.00
  * **UN Parallel Text (Complete)** ... LDC Catalog No.: LDC94T4A, languages: EN, FR, SP; government documents
This would have to be ordered, but it is expensive:
Member fee: $0 for 1994 members
Non-member Fee: US $4000.00
Reduced-License Fee: US $2000.00