Treebank

In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The construction of parsed corpora in the early 1990s revolutionized computational linguistics, which benefited from large-scale empirical data. [Alexander Clark, Chris Fox and Shalom Lappin (2010). The Handbook of Computational Linguistics and Natural Language Processing. Wiley.]


Etymology
The term treebank was coined by linguist Geoffrey Leech in the 1980s, by analogy to other repositories such as a seedbank or bloodbank. [Sampson, G. (2003). 'Reflections of a dendrographer.' In A. Wilson, P. Rayson and T. McEnery (eds.), Corpus Linguistics by the Lune: A Festschrift for Geoffrey Leech. Frankfurt am Main: Peter Lang, pp. 157-184.] The name reflects the fact that both syntactic and semantic structure are commonly represented compositionally as a tree structure. The term parsed corpus is often used interchangeably with the term treebank, with the emphasis on the primacy of sentences rather than trees.


Construction
Treebanks are often created on top of a corpus that has already been annotated with part-of-speech tags. In turn, treebanks are sometimes enhanced with semantic or other linguistic information. Treebanks can be created completely manually, where linguists annotate each sentence with syntactic structure, or semi-automatically, where a parser assigns some syntactic structure which linguists then check and, if necessary, correct. In practice, fully checking and completing the parsing of natural language corpora is a labour-intensive project that can take teams of graduate linguists several years. The level of annotation detail and the breadth of the linguistic sample determine the difficulty of the task and the length of time required to build a treebank.

Some treebanks follow a specific linguistic theory in their syntactic annotation (e.g. the BulTreeBank follows HPSG), but most try to be less theory-specific. However, two main groups can be distinguished: treebanks that annotate phrase structure (for example the Penn Treebank or ICE-GB) and those that annotate dependency structure (for example the Prague Dependency Treebank or the Quranic Arabic Dependency Treebank).
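
The difference between the two groups can be made concrete with a small sketch. The Python below encodes one sentence both ways. The class names and the relation labels (root, nsubj, obj, in the style of Universal Dependencies) are illustrative assumptions, not the storage format of any treebank named here.

```python
from dataclasses import dataclass, field


@dataclass
class Phrase:
    """A phrase-structure node: a label over sub-phrases or words."""
    label: str
    children: list = field(default_factory=list)


@dataclass
class Dependency:
    """A labelled arc from a head word to a dependent (1-based; head 0 = root)."""
    head: int
    relation: str
    dependent: int


words = ["John", "loves", "Mary"]

# Phrase structure: words are grouped into nested constituents.
phrase_tree = Phrase("S", [
    Phrase("NP", ["John"]),
    Phrase("VP", ["loves", Phrase("NP", ["Mary"])]),
])

# Dependency structure: every word is attached directly to a head word.
dependencies = [
    Dependency(head=0, relation="root", dependent=2),
    Dependency(head=2, relation="nsubj", dependent=1),
    Dependency(head=2, relation="obj", dependent=3),
]


def leaves(node):
    """Return the words under a phrase node, left to right."""
    out = []
    for child in node.children:
        out.extend(leaves(child) if isinstance(child, Phrase) else [child])
    return out


def head_word(index):
    """Return the head word of the word at a 1-based index (None for the root)."""
    for arc in dependencies:
        if arc.dependent == index:
            return words[arc.head - 1] if arc.head else None
    raise ValueError(f"no arc for word {index}")
```

Both structures cover the same three words. The phrase tree introduces intermediate nodes (NP, VP), whereas the dependency analysis has exactly one arc per word and no intermediate nodes.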

It is important to distinguish the formal representation from the file format used to store the annotated data. Treebanks are necessarily constructed according to a particular grammar, and the same grammar may be implemented by different file formats. For example, the syntactic analysis for "John loves Mary" may be represented by simple labelled brackets in a text file, like this (following the Penn Treebank notation):

(S (NP (NNP John))
   (VP (VBZ loves)
       (NP (NNP Mary)))
   (. .))
     

This type of representation is popular because it is light on resources, and the tree structure is relatively easy to read without software tools. However, as corpora become increasingly complex, other file formats may be preferred. Alternatives include treebank-specific schemes, numbered indentation and various types of standoff notation.
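
Part of the appeal of labelled brackets is that they are easy to read by program as well as by eye. The following is a minimal sketch of a reader for this notation, assuming well-formed input with one tree per string; the function names are hypothetical and it does not handle the extra conventions (traces, function tags, empty outer brackets) found in real Penn Treebank files.

```python
import re


def parse_brackets(text):
    """Parse one labelled-bracket tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1                               # consume ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree


def words_of(node):
    """Return the leaf words of a parsed tree, left to right."""
    _, children = node
    out = []
    for child in children:
        out.extend(words_of(child) if isinstance(child, tuple) else [child])
    return out


example = "(S (NP (NNP John)) (VP (VBZ loves) (NP (NNP Mary))) (. .))"
tree = parse_brackets(example)
sentence_words = words_of(tree)
```

Here `tree[0]` is the root label and `sentence_words` recovers the original token sequence, including the final full stop.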


Applications
From a computational linguistics perspective [Haitao Liu and Wei Huang, A Chinese Dependency Syntax for Treebanking, Communication University of China, published online by the Association for Computational Linguistics, accessed 2020-2-4], treebanks have been used to engineer state-of-the-art natural language processing systems such as part-of-speech taggers, parsers, semantic analyzers and machine translation systems. Most computational systems use gold-standard treebank data. However, an automatically parsed corpus that has not been corrected by human linguists can still be useful: applying a parser to large amounts of text and gathering rule frequencies provides evidence that can be used to improve the parser. Only by correcting and completing a corpus by hand, however, is it possible to identify rules absent from the parser's knowledge base. Frequencies drawn from a hand-corrected corpus are also likely to be more accurate.
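
Gathering rule frequencies can be sketched in a few lines. The example below counts phrase-structure rules in a two-sentence toy treebank and converts the counts to relative frequencies per left-hand side. The trees, the nested-tuple encoding and the function name are assumptions made for illustration; a real study would read the trees from a treebank's own files.

```python
from collections import Counter


def productions(node):
    """Yield (lhs, rhs) rules for a tree of nested (label, children) tuples."""
    label, children = node
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    yield (label, rhs)
    for child in children:
        if isinstance(child, tuple):
            yield from productions(child)


# Toy data: two hand-written trees standing in for a parsed corpus.
toy_treebank = [
    ("S", [("NP", [("NNP", ["John"])]),
           ("VP", [("VBZ", ["loves"]), ("NP", [("NNP", ["Mary"])])])]),
    ("S", [("NP", [("NNP", ["Mary"])]),
           ("VP", [("VBZ", ["sleeps"])])]),
]

rule_counts = Counter(
    rule for tree in toy_treebank for rule in productions(tree)
)

# Total number of expansions observed for each left-hand-side label.
lhs_totals = Counter()
for (lhs, _), count in rule_counts.items():
    lhs_totals[lhs] += count

# Relative frequency of each rule among rules with the same left-hand side.
relative_frequency = {
    rule: count / lhs_totals[rule[0]] for rule, count in rule_counts.items()
}
```

In this toy sample VP expands to "VBZ NP" in one of its two occurrences, so its relative frequency is 0.5. A rule that never appears in the sample gets no entry at all, which is why uncorrected parser output cannot reveal rules the parser does not already know.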

In corpus linguistics, treebanks are used to study syntactic phenomena (for example, diachronic corpora can be used to study the time course of syntactic change). Once parsed, a corpus will contain frequency evidence showing how common different grammatical structures are in use. Treebanks also provide evidence of coverage and support the discovery of new, unanticipated, grammatical phenomena.

Another use of treebanks in theoretical linguistics and psycholinguistics is interaction evidence. A completed treebank can help linguists carry out experiments as to how the decision to use one grammatical construction tends to influence the decision to form others, and to try to understand how speakers and writers make decisions as they form sentences. Interaction research is particularly fruitful as further layers of annotation, e.g. semantic, pragmatic, are added to a corpus. It is then possible to evaluate the impact of non-syntactic phenomena on grammatical choices.

In linguistics research, annotated treebank data has been used to test theories of sentence structure against large quantities of naturally occurring examples.


Semantic treebanks
A semantic treebank is a collection of natural language sentences annotated with a meaning representation. These resources use a formal representation of each sentence's semantic structure. Semantic treebanks vary in the depth of their semantic representation. A notable example of deep semantic annotation is the Groningen Meaning Bank, developed at the University of Groningen and annotated using Discourse Representation Theory. An example of a shallow semantic treebank is PropBank, which provides annotation of verbal propositions and their arguments, without attempting to represent every word in the corpus in logical form.
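
As a rough illustration of what "shallow" means here, the sketch below records one verbal proposition and its arguments for a single sentence and then checks which tokens the annotation leaves uncovered. The data structure is an assumption made for this example, and the ARG0/ARG1 role labels merely imitate the numbered-argument style used by PropBank; this is not PropBank's actual file format.

```python
from dataclasses import dataclass


@dataclass
class Proposition:
    """One verb with its labelled arguments, each a list of token indices."""
    predicate: int   # index of the verb token
    arguments: dict  # role label -> list of token indices


tokens = ["John", "loves", "Mary", "."]

# Shallow annotation: only the verb and its arguments are labelled.
propositions = [
    Proposition(predicate=1, arguments={"ARG0": [0], "ARG1": [2]}),
]


def covered_indices(props):
    """Return the set of token indices mentioned by any proposition."""
    covered = set()
    for prop in props:
        covered.add(prop.predicate)
        for span in prop.arguments.values():
            covered.update(span)
    return covered


uncovered = [
    tok for i, tok in enumerate(tokens)
    if i not in covered_indices(propositions)
]
```

In longer sentences the uncovered set would typically include material that a deep representation would have to account for but that a proposition-level annotation can simply leave unlabelled.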

| Language | Treebank | Semantic annotation |
|  | Chinese Universal Propositions | semantics |
|  | Abstract Meaning Representation (AMR) Bank | Deep semantics |
|  |  | Shallow semantics |
|  | Universal Conceptual Cognitive Annotation (UCCA) | Deep semantics |
|  | Robot Commands Treebank [Kais Dukes (2013). Semantic Annotation of Robotic Spatial Commands. Language and Technology Conference (LTC). Poznan, Poland.] | Deep semantics |
|  | Groningen Meaning Bank | Deep semantics |
|  | Parallel Meaning Bank | Deep semantics |
|  | DeepBank project | Deep semantics |
|  | Treebank Semantics Parsed Corpus | Deep semantics |
|  | RoboCup Corpus | Deep semantics |
|  | Geoquery | Deep semantics |
|  | PropBank | semantics |
|  | Finnish Universal Propositions | semantics |
|  | Finnish PropBank | semantics |
|  | French Universal Propositions | semantics |
|  | German Universal Propositions | semantics |
|  | Italian Universal Propositions | semantics |
| Portuguese | Portuguese PortLex | semantics |
| Portuguese | Portuguese Universal Propositions | semantics |
|  | Spanish Universal Propositions | semantics |
|  | Turkish PropBank | semantics |


Syntactic treebanks
Many syntactic treebanks have been developed for a wide variety of languages:

| Language | Treebank | Annotation |
|  | Universal Dependencies, ATB | Dependency |
| Afrikaans | Universal Dependencies, AfriBooms | Dependency |
| Akkadian | Universal Dependencies, PISANDUB | Dependency |
| Albanian | Universal Dependencies, TSA | Dependency |
|  | Universal Dependencies, ATT | Dependency |
| Ancient Greek | Universal Dependencies, Perseus | Dependency |
| Ancient Greek | Universal Dependencies, PROIEL | Dependency |
|  | [https://github.com/PerseusDL/treebank_data/edit/master/AGDT2/guidelines; Mambrini, F. 2016. The Ancient Greek Dependency Treebank: Linguistic Annotation in a Teaching Environment. In: Bodard, G & Romanello, M (eds.) Digital Classics Outside the Echo-Chamber: Teaching, Knowledge Exchange & Public Engagement, pp. 83–99. London: Ubiquity Press.] | Dependency |
|  | PROIEL Treebank [Dag Haug. 2015. Treebanks in historical linguistic research. In Carlotta Viti (ed.), Perspectives on Historical Syntax, Benjamins, 188-202. A preprint is available at http://folk.uio.no/daghaug/historical-treebanks.pdf.] | Dependency |
|  | Columbia Arabic Treebank (CATiB) | Dependency |
|  | Prague Arabic Dependency Treebank (PADT) | Dependency |
|  | Universal Dependencies, NYUAD | Dependency |
|  | Universal Dependencies, PADT | Dependency |
|  | Universal Dependencies, PUD | Dependency |
|  | Penn Arabic Treebank | Phrase structure |
| Armenian | Universal Dependencies, ArmTDP | Dependency |
| Assyrian (Neo-Aramaic) | Universal Dependencies, AS | Dependency |
|  | Universal Dependencies, CRB | Dependency |
|  | Universal Dependencies, BDT | Dependency |
| Belarusian | Universal Dependencies, HSE | Dependency |
| Bhojpuri | Universal Dependencies, BhEn | Dependency |
| Bhojpuri | Universal Dependencies, BHTB | Dependency |
|  | Universal Dependencies, KEB | Dependency |
| Bulgarian | Universal Dependencies, BTB | Dependency |
| Bulgarian | BulTreeBank | HPSG |
|  | Universal Dependencies, BDT | Dependency |
| Cantonese | Universal Dependencies, HK | Dependency |
|  | Cat3LB | Phrase structure |
|  | Universal Dependencies, AnCora | Dependency |
|  | Sinica Treebank |  |
|  | Universal Dependencies, CFL | Dependency |
|  | Universal Dependencies, GSD | Dependency |
|  | Universal Dependencies, GSDSimp | Dependency |
|  | Universal Dependencies, HK | Dependency |
|  | Universal Dependencies, PUD | Dependency |
|  | Penn Chinese Treebank | Phrase structure |
|  | Chinese Dependency Treebank | Dependency |
|  | Quranic Arabic Dependency Treebank (QADT) (Quranic Arabic Corpus) | Dependency |
| Classical Armenian | PROIEL Treebank | Dependency |
|  | Universal Dependencies, Coptic Scriptorium | Dependency |
| Croatian | Croatian Dependency Treebank | Dependency |
| Croatian | Universal Dependencies, SET | Dependency |
|  | Prague Dependency Treebank | Dependency |
|  | Universal Dependencies, CAC | Dependency |
|  | Universal Dependencies, CLTT | Dependency |
|  | Universal Dependencies, FicTree | Dependency |
|  | Universal Dependencies, PDT | Dependency |
|  | Universal Dependencies, PUD | Dependency |
|  | Danish Dependency Treebank | Dependency |
|  | Arboretum: A syntactic tree corpus of Danish | Phrase structure |
|  | Universal Dependencies, DDT | Dependency |
|  | Universal Dependencies, DTB | Dependency |
|  | Spoken Dutch Corpus (CGN) | Phrase structure |
|  | Universal Dependencies, Alpino | Dependency |
|  | Universal Dependencies, LassySmall | Dependency |
|  | LASSY Small and Large | Dependency |
|  | Alpino Treebank | Dependency |
| Egyptian | Universal Dependencies, UJaen | Dependency |
|  | CCGbank | Combinatory categorial grammar |
|  | LinGO Redwoods | HPSG |
|  | Lancaster Parsed Corpus | Phrase structure |
|  | Prague English Dependency Treebank | Dependency |
|  | Universal Dependencies, BhEn | Dependency |
|  | Universal Dependencies, ESL | Dependency |
|  | Universal Dependencies, EWT | Dependency |
|  | Universal Dependencies, GUM | Dependency |
|  | Universal Dependencies, GUMReddit | Dependency |
|  | Universal Dependencies, LinES | Dependency |
|  | Universal Dependencies, ParTUT | Dependency |
|  | Universal Dependencies, Pronouns | Dependency |
|  | Universal Dependencies, PUD | Dependency |
|  | Treebank Semantics Parsed Corpus | Phrase structure |
|  | Christine Corpus | Phrase structure |
|  | Lucy Corpus | Phrase structure |
|  | Susanne Corpus | Phrase structure |
|  | BLLIP WSJ corpus | Phrase structure |
|  | Tübingen Treebank of English / Spontaneous Speech (TüBa-E/S) | HPSG |
|  | Diachronic Corpus of Present-Day Spoken English (DCPSE) | Phrase structure |
|  | British Component of the International Corpus of English (ICE-GB) | Phrase structure |
|  | The PARC 700 Dependency Bank | Dependency |
|  | Yahoo Query Treebank | Dependency |
|  | Penn Treebank | Phrase structure |
|  | Multi-Treebank | Phrase structure |
|  | CHILDES Brown Eve corpus with dependency annotation | Dependency |
|  | SMULTRON - Parallel Treebank EN-DE-SV | Phrase structure |
|  | Universal Dependencies, JR | Dependency |
| Estonian | Arborest | Phrase structure |
| Estonian | Syntactically analyzed and disambiguated text corpus | Dependency |
| Estonian | Universal Dependencies, EDT | Dependency |
| Estonian | Universal Dependencies, EWT | Dependency |
|  | Universal Dependencies, FarPaHC | Dependency |
|  | Universal Dependencies, OFT | Dependency |
|  | Turku Dependency Treebank (TDT) | Dependency |
|  | Universal Dependencies, FTB | Dependency |
|  | Universal Dependencies, PUD | Dependency |
|  | Universal Dependencies, TDT | Dependency |
|  | Rhapsodie | Dependency and macrosyntactic annotation |
|  | L'Arboratoire | Phrase structure |
|  | Universal Dependencies, CrapBank | Dependency |
|  | Universal Dependencies, FQB | Dependency |
|  | Universal Dependencies, FTB | Dependency |
|  | Universal Dependencies, GSD | Dependency |
|  | Universal Dependencies, ParTUT | Dependency |
|  | Universal Dependencies, PUD | Dependency |
|  | Universal Dependencies, Sequoia | Dependency |
|  | Universal Dependencies, Spoken | Dependency |
|  | French Treebank | Phrase structure |
|  | Free French Treebank | Phrase structure |
|  | Sequoia Treebank | Phrase structure and dependency |
| Galician | Universal Dependencies, CTG | Dependency |
| Galician | Universal Dependencies, TreeGal | Dependency |
|  | Hamburg Dependency Treebank (HDT) | Dependency |
|  | Universal Dependencies, GSD | Dependency |
|  | Universal Dependencies, LIT | Dependency |
|  | Universal Dependencies, PUD | Dependency |
|  | SMULTRON - Parallel Treebank EN-DE-SV | Phrase structure |
|  | NEGRA | Phrase structure |
|  | TIGER | Phrase structure |
|  | Tübingen Treebank of German / Spontaneous Speech (TüBa-D/S) | Phrase structure |
|  | Tübingen Treebank of Written German (TüBa-D/Z) | Phrase structure |
|  | Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z) | Phrase structure |
|  | PROIEL Treebank | Dependency |
|  | Universal Dependencies, PROIEL | Dependency |
|  | Greek Dependency Treebank | Dependency |
|  | Universal Dependencies, GDT | Dependency |
|  | Universal Dependencies, HTB | Dependency |
|  | Hebrew Dependency Treebank | Dependency |
| Hindi English | Universal Dependencies, HIENCS | Dependency |
|  | Universal Dependencies, HDTB | Dependency |
|  | Universal Dependencies, PUD | Dependency |
|  | AnnCorra | Dependency |
| English (historical) | Penn Parsed Corpora of Historical English | Phrase structure |
| English (historical) | York-Toronto-Helsinki Parsed Corpus of Old English Prose (YCOE) | Phrase structure |
| French (historical) | Corpus MCVF | Phrase structure |
| Portuguese (historical) | Tycho Brahe corpus | Phrase structure |
| Hungarian | Universal Dependencies, Szeged | Dependency |
| Hungarian | Hungarian Treebank | Phrase structure |
| Icelandic | IcePaHC - Icelandic Parsed Historical Corpus | Phrase structure |
| Icelandic | Universal Dependencies, IcePaHC | Dependency |
| Icelandic | Universal Dependencies, PUD | Dependency |
| Indonesian | Universal Dependencies, GSD | Dependency |
| Indonesian | Universal Dependencies, PUD | Dependency |
| Indonesian | ICON | Phrase structure |
|  | Universal Dependencies, IDT | Dependency |
|  | ISST - Italian Syntactic-Semantic Treebank | Phrase structure and dependency |
|  | MIDT (Merged Italian Dependency Treebank) resulting from the merging and harmonization of the TUT and ISST-CoNLL/TANL treebanks | Dependency |
|  | VIT - Venice Italian Treebank | Phrase structure and dependency |
|  | Universal Dependencies, ISDT | Dependency |
|  | Universal Dependencies, ParTUT | Dependency |
|  | Universal Dependencies, PoSTWITA | Dependency |
|  | Universal Dependencies, PUD | Dependency |
|  | Universal Dependencies, TWITTIRO | Dependency |
|  | Universal Dependencies, VIT | Dependency |
|  | Italian Syntactic-Semantic Treebank for the CoNLL-2007 Shared Task (ISST-CoNLL) | Dependency |
|  | SUT - Siena University Treebank |  |
|  | TUT - Turin University Treebank | Dependency |
|  | ISDT (Italian Stanford Dependency Treebank) | Dependency |
| Japanese | Kyoto Text Corpus |  |
| Japanese | Universal Dependencies, BCCWJ | Dependency |
| Japanese | Universal Dependencies, GSD | Dependency |
| Japanese | Universal Dependencies, KTC | Dependency |
| Japanese | Universal Dependencies, Modern | Dependency |
| Japanese | Universal Dependencies, PUD | Dependency |
| Japanese | Keyaki Treebank | Phrase structure |
| Japanese | Tübingen Treebank of Japanese / Spontaneous Speech (TüBa-J/S) | Phrase structure |
| Japanese | ATR Dependency corpus | Dependency |
| Karelian | Universal Dependencies, KKPP | Dependency |
|  | Universal Dependencies, KTB | Dependency |
| Komi Permyak | Universal Dependencies, UH | Dependency |
| Komi Zyrian | Universal Dependencies, IKDP | Dependency |
| Komi Zyrian | Universal Dependencies, Lattice | Dependency |
|  | Universal Dependencies, GSD | Dependency |
|  | Universal Dependencies, Kaist | Dependency |
|  | Universal Dependencies, Penn | Dependency |
|  | Universal Dependencies, PUD | Dependency |
|  | Universal Dependencies, Sejong | Dependency |
|  | Korean Treebank | Phrase structure |
| Kurmanji | Universal Dependencies, MG | Dependency |
|  | Universal Dependencies, ITTB | Dependency |
|  | Universal Dependencies, LLCT | Dependency |
|  | Universal Dependencies, Perseus | Dependency |
|  | Universal Dependencies, PROIEL | Dependency |
|  | Index Thomisticus Treebank | Dependency |
|  | PROIEL Treebank | Dependency |
|  | Latin Dependency Treebank [Bamman, David et al. 2008. Guidelines for the Syntactic Annotation of Latin Treebanks (v. 1.3). http://nlp.perseus.tufts.edu/syntax/treebank/1.3/docs/guidelines.pdf] | Dependency |
|  | Universal Dependencies, LVTB | Dependency |
| Lithuanian | Universal Dependencies, ALKSNIS | Dependency |
| Lithuanian | Universal Dependencies, HSE | Dependency |
|  | Universal Dependencies, KKPP | Dependency |
|  | Universal Dependencies, MGTB | Dependency |
|  | Universal Dependencies, MUDT | Dependency |
|  | Universal Dependencies, UFAL | Dependency |
| Mbya Guarani | Universal Dependencies, Dooley | Dependency |
| Mbya Guarani | Universal Dependencies, Thomas | Dependency |
| Middle Irish | Universal Dependencies, CritMITB | Dependency |
| Middle Irish | Universal Dependencies, DipMITB | Dependency |
|  | Universal Dependencies, JR | Dependency |
|  | Universal Dependencies, NSC | Dependency |
| North Sami | Universal Dependencies, Giella | Dependency |
| Norwegian | INESS treebanking infrastructure | LFG |
| Norwegian | Universal Dependencies, Bokmaal | Dependency |
| Norwegian | Universal Dependencies, Nynorsk | Dependency |
| Norwegian | Universal Dependencies, NynorskLIA | Dependency |
| Old Church Slavonic | Universal Dependencies, PROIEL | Dependency |
| Old Church Slavonic | TOROT Treebank | Dependency |
| Old French | Universal Dependencies, SRCMF | Dependency |
| Old Russian | Universal Dependencies, RNC | Dependency |
| Old Russian | Universal Dependencies, TOROT | Dependency |
|  | TOROT Treebank | Dependency |
|  | Persian Dependency Treebank (PerDT) | Dependency |
|  | PerTreeBank | HPSG |
|  | Universal Dependencies, Seraji | Dependency |
|  | A Treebank / Test Suite for Polish | HPSG |
|  | Universal Dependencies, LFG | Dependency |
|  | Universal Dependencies, PDB | Dependency |
|  | Universal Dependencies, PUD | Dependency |
|  | Składnica | Phrase structure and dependency |
| Portuguese | Universal Dependencies, Bosque | Dependency |
| Portuguese | Universal Dependencies, GSD | Dependency |
| Portuguese | Universal Dependencies, PUD | Dependency |
| Portuguese | Projecto Floresta Sintá(c)tica | Dependency and phrase structure |
| Romanian | Romanian Dependency Treebank | Dependency |
| Romanian | Universal Dependencies, Nonstandard | Dependency |
| Romanian | Universal Dependencies, RRT | Dependency |
| Romanian | Universal Dependencies, SiMoNERo | Dependency |
|  | Universal Dependencies, GSD | Dependency |
|  | Universal Dependencies, PUD | Dependency |
|  | Universal Dependencies, SynTagRus | Dependency |
|  | Universal Dependencies, Taiga | Dependency |
|  | SynTagRus Dependency Treebank (Russian National Corpus) | Dependency |
| Sanskrit | Universal Dependencies, UFAL | Dependency |
| Sanskrit | Universal Dependencies, Vedic | Dependency |
| Scottish Gaelic | Universal Dependencies, ARCOSG | Dependency |
|  | Universal Dependencies, SET | Dependency |
|  | Universal Dependencies, MazharDootio | Dependency |
| Skolt Sami | Universal Dependencies, Giellagas | Dependency |
|  | Universal Dependencies, SNK | Dependency |
|  | Slovene Dependency Treebank | Dependency |
| Slovenian | Universal Dependencies, SSJ | Dependency |
| Slovenian | Universal Dependencies, SST | Dependency |
|  | Cast3LB | Phrase structure and dependency |
|  | Universal Dependencies, AnCora | Dependency |
|  | Universal Dependencies, GSD | Dependency |
|  | Universal Dependencies, PUD | Dependency |
|  | UAM Treebank of Spanish | Phrase structure |
|  | Talbanken05 | Phrase structure and dependency |
|  | Swedish Treebank | Phrase structure |
|  | Universal Dependencies, LinES | Dependency |
|  | Universal Dependencies, PUD | Dependency |
|  | Universal Dependencies, Talbanken | Dependency |
|  | SMULTRON - Parallel Treebank EN-DE-SV | Phrase structure |
| Swedish Sign Language | Universal Dependencies, SSLC | Dependency |
| Swiss German | Universal Dependencies, UZH | Dependency |
|  | Universal Dependencies, TRG | Dependency |
|  | Universal Dependencies, Ugnayan | Dependency |
|  | Universal Dependencies, TTB | Dependency |
|  | Universal Dependencies, MTG | Dependency |
|  | NAiST Thai Treebank | Dependency |
|  | Universal Dependencies, PUD | Dependency |
|  | THTB | Phrase structure |
|  | METU-Sabanci Turkish Treebank | Dependency |
|  | Universal Dependencies, BOUN | Dependency |
|  | Universal Dependencies, GB | Dependency |
|  | Universal Dependencies, IMST | Dependency |
|  | Universal Dependencies, PUD | Dependency |
| Ukrainian | Institute for Ukrainian, NGO Gold Standard | Dependency |
| Ukrainian | Universal Dependencies, IU | Dependency |
| Upper Sorbian | Universal Dependencies, UFAL | Dependency |
|  | NU-FAST Treebank | Phrase structure |
|  | The URDU.KON-TB Treebank | Phrase and Hyper Dependency Structure |
|  | Universal Dependencies, UDTB | Dependency |
|  | Universal Dependencies, UDT | Dependency |
| Vietnamese | Universal Dependencies, VTB | Dependency |
| Vietnamese | Vietnamese Treebank | Phrase structure |
| Vietnamese | Vietnamese Dependency Treebank | Dependency |
| Warlpiri | Universal Dependencies, UFAL | Dependency |
|  | Universal Dependencies, CCG | Dependency |
|  | Universal Dependencies, WTB | Dependency |
|  | Universal Dependencies, YTB | Dependency |

To support research on multilingual tasks, some researchers have proposed universal annotation schemes that apply across languages, with the aim of combining or merging the strengths of different treebank corpora. Examples include a universal annotation approach for dependency treebanks and a universal annotation approach for phrase structure treebanks.


Search tools
One of the key ways to extract evidence from a treebank is through search tools. Search tools for parsed corpora typically depend on the annotation scheme that was applied to the corpus. User interfaces range in sophistication from expression-based query systems aimed at computer programmers to full exploration environments aimed at general linguists. Wallis (2008) discusses the principles of searching treebanks in detail and reviews the state of the art around that time. [Wallis, Sean (2008). Searching treebanks and other structured corpora. Chapter 34 in Lüdeling, A. & Kytö, M. (eds.) Corpus Linguistics: An International Handbook. Handbücher zur Sprache und Kommunikationswissenschaft series. Berlin: Mouton de Gruyter.]
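
At the programmer-oriented end of that range, a structural query can be as small as a recursive tree walk. The sketch below finds every node with a given label that sits directly under a parent with another given label. The nested-tuple tree encoding and the function name are assumptions made for illustration and do not correspond to the query language of any particular search tool.

```python
def find_child_of(node, parent_label, child_label):
    """Return all child_label nodes whose immediate parent is parent_label."""
    label, children = node
    hits = []
    for child in children:
        if isinstance(child, tuple):          # skip leaf words
            if label == parent_label and child[0] == child_label:
                hits.append(child)
            hits.extend(find_child_of(child, parent_label, child_label))
    return hits


# "John loves Mary" as nested (label, children) tuples.
tree = ("S", [
    ("NP", [("NNP", ["John"])]),
    ("VP", [("VBZ", ["loves"]),
            ("NP", [("NNP", ["Mary"])])]),
])

object_nps = find_child_of(tree, "VP", "NP")    # noun phrases inside the VP
subject_nps = find_child_of(tree, "S", "NP")    # noun phrases directly under S
```

Because the pattern is stated in terms of the labels and nesting of one annotation scheme, the same query would need rewriting for a treebank annotated differently, which is why search tools tend to be tied to the scheme of the corpus they serve.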

