HyTex Phase I

Model architecture and strategies of text-to-hypertext conversion

The specialized text corpus and its editing

Demo prototype HyTex.1

TermNet - modeling and processing of terminological knowledge

Technical implementation


The results of the first phase are documented in the work and results report of project B1 (PDF 184 KB).

Practical results of the first phase are:

If you are interested in these results, do not hesitate to contact us.


Model architecture and strategies of text-to-hypertext conversion

In the first phase, we developed a model architecture as the theoretical and methodological basis for automatic text-to-hypertext conversion. For segmentation and for linking according to coherence criteria, this architecture draws on information from three layers (cf. figure):

  • Information from the linguistic and text-grammatical annotation on the document layer.
  • A model of the central concepts and relations of the domain in a knowledge net (as a Topic Map).
  • Information from static and dynamic user profiles.

In developing strategies of text-to-hypertext conversion, we focused on the following subject areas:

  1. On the micro level, we experiment with strategies for producing cohesively closed module views based on the text-grammatical markup.
  2. On the macro level, we develop strategies for linking according to preconceptions. We distinguish three types:
    1. strategies for automatically linking terminology-bound preconceptions, based on the automatic annotation and weighting of definitional text segments;
    2. strategies for automatic link filtering and link weighting based on the knowledge excerpt modeled in TermNet;
    3. strategies for building directory paths that take topical and rhetorical-functional text structures into account.
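
The filtering and weighting idea in strategy 2 can be illustrated with a minimal Python sketch. It makes three assumptions. Each candidate link connects two terms. An excerpt of the terminology net is available as (source, relation, target) triples. A link is kept only if the strongest relation between its terms reaches a threshold. The relation names, the weights in `RELATION_WEIGHTS` and the threshold are hypothetical illustration values, not the ones used in HyTex.

```python
from dataclasses import dataclass

# Hypothetical weights per relation type (illustration values only).
RELATION_WEIGHTS = {
    "synonym": 1.0,
    "hypernym": 0.7,
    "meronym": 0.5,
    "related": 0.3,
}


@dataclass(frozen=True)
class CandidateLink:
    source_term: str  # term at the link anchor
    target_term: str  # term whose module would become the link target


def weight_link(link: CandidateLink, relations: list[tuple[str, str, str]]) -> float:
    """Return the highest weight of any relation between the two terms,
    in either direction (0.0 if the terms are not related)."""
    best = 0.0
    for src, rel, tgt in relations:
        if {src, tgt} == {link.source_term, link.target_term}:
            best = max(best, RELATION_WEIGHTS.get(rel, 0.0))
    return best


def filter_links(candidates, relations, threshold=0.5):
    """Keep links whose weight reaches the threshold, strongest first."""
    weighted = [(weight_link(c, relations), c) for c in candidates]
    kept = [(w, c) for w, c in weighted if w >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept


# Hypothetical net excerpt as (source, relation, target) triples.
relations = [
    ("hyperlink", "synonym", "link"),
    ("hypertext", "hypernym", "text"),  # "text" is a hypernym of "hypertext"
    ("node", "related", "link"),
]
candidates = [
    CandidateLink("hypertext", "text"),
    CandidateLink("link", "hyperlink"),
    CandidateLink("link", "node"),
    CandidateLink("link", "text"),
]
for weight, link in filter_links(candidates, relations):
    print(f"{weight:.1f}  {link.source_term} -> {link.target_term}")
```

With the default threshold, only the synonym link and the hypernym link survive. The weakly related pair and the unrelated pair are filtered out.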

Macro strategy: "Terminology-sensitive linking"

A main problem in establishing coherence when specialized texts are read selectively is that readers cannot tell, for a given use of a term, which specific conceptualization the author based it on. In the area of terminology-sensitive linking, we are developing a pragmatically grounded method that links each instance of a specialized term to the appropriate definition in the preceding text, i.e. the definition a reader needs in order to understand the correct meaning of the term in its current context.
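
Here is a minimal Python sketch of the core linking step, under two simplifying assumptions. First, the annotation is available as a document-ordered sequence of definitions and term uses. Second, the appropriate definition is taken to be the most recent preceding definition of the same term. The weighting of definitional segments is not modeled, and the `Definition`/`TermUse` types and IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Definition:
    term: str
    def_id: str  # hypothetical ID of a definitional text segment


@dataclass
class TermUse:
    term: str
    use_id: str  # hypothetical ID of one instance of term use


Event = Union[Definition, TermUse]


def link_term_uses(events: list[Event]) -> dict[str, Optional[str]]:
    """Link each term use to the closest preceding definition of the same term.

    Events must be in document order. Terms are compared exactly as given
    (no lemmatization). Uses that precede every definition of their term map to None.
    """
    latest: dict[str, str] = {}  # term -> ID of its most recent definition so far
    links: dict[str, Optional[str]] = {}
    for event in events:
        if isinstance(event, Definition):
            latest[event.term] = event.def_id
        else:
            links[event.use_id] = latest.get(event.term)
    return links


events = [
    TermUse("module", "u1"),
    Definition("module", "d1"),
    TermUse("module", "u2"),
    Definition("module", "d2"),  # the author later redefines the term
    TermUse("module", "u3"),
    TermUse("link", "u4"),
]
print(link_term_uses(events))
# {'u1': None, 'u2': 'd1', 'u3': 'd2', 'u4': None}
```

Choosing the closest preceding definition means that a later redefinition (here `d2`) takes over for all subsequent uses. This mirrors the problem that an author's conceptualization of a term can shift within a text.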

See also:

Generating links in order to reconstruct terminology-bound preconceptions (project publication; PDF 166 KB)

Annotation of definitional text segments and "terminology-sensitive linking" (work report; 122 KB)

Annotation layer: Definitions and instances of term use (documentation; PDF 216 KB)


The specialized text corpus and its editing

The complete specialized text corpus consists of documents of various text types and comprises approximately 25,000 standard pages. To mark up the logical document structure of the corpus, we developed a scheme in cooperation with the sub-project SemDoc in Gießen. The scheme is based on DocBook but tailored to marking up the available texts. Among other things, this annotation serves modularization. In addition, the text-grammatical annotation covers several further layers. As a basis for linking according to preconceptions, definitional text segments and instances of term use were annotated. In text-to-hypertext conversion, the annotation of phoric and text-deictic references is meant to establish cohesive closeness (e.g. by resolving pronouns whose antecedents lie outside the module boundaries). We also developed a scheme for annotating topical structures.
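
The role of the co-reference annotation for cohesive closeness can be sketched in Python. The sketch takes two inputs: the module that each annotated segment belongs to, and the anaphor-antecedent pairs. It finds anaphors (e.g. pronouns) whose antecedent lies in another module and proposes the antecedent's text as a substitute. The data layout and all IDs are hypothetical simplifications of the annotation layer.

```python
def cross_module_references(module_of: dict[str, str],
                            references: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the (anaphor, antecedent) pairs whose antecedent lies in a different module.

    module_of maps each annotated segment ID to the ID of the module containing it;
    references holds (anaphor_id, antecedent_id) pairs from the co-reference annotation.
    """
    return [(ana, ante) for ana, ante in references
            if module_of[ana] != module_of[ante]]


def propose_substitutions(module_of: dict[str, str],
                          references: list[tuple[str, str]],
                          surface: dict[str, str]) -> dict[str, str]:
    """Propose replacing each cross-module anaphor with its antecedent's surface text."""
    return {ana: surface[ante]
            for ana, ante in cross_module_references(module_of, references)}


# Hypothetical annotation: one noun phrase and two pronouns referring to it.
module_of = {"np1": "m1", "pr1": "m1", "pr2": "m2"}
references = [("pr1", "np1"), ("pr2", "np1")]
surface = {"np1": "the terminology net", "pr1": "it", "pr2": "it"}

print(cross_module_references(module_of, references))  # [('pr2', 'np1')]
print(propose_substitutions(module_of, references, surface))  # {'pr2': 'the terminology net'}
```

The pronoun inside the antecedent's own module is left alone. Only the pronoun in module `m2` would lose its antecedent when that module is shown on its own.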

See also:

Beißwenger, Michael/Wellinghoff, Sandra (February 2003, revised June 1, 2006): Content and composition of the specialized text corpus. Documentation (PDF 65 KB).

Download the specialized text corpus


Demo prototype HyTex.1

We have nearly completed a demo prototype with which the different strategies of text-to-hypertext conversion can be tested. For this purpose, the core corpus was annotated for logical text structure, definitions and instances of term use; the annotation of co-reference phenomena and connectives is not yet complete. The strategies of text-to-hypertext conversion (segmentation and linking) were applied to the annotated corpus.
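
The segmentation step can be sketched with Python's standard `xml.etree.ElementTree` module. The sketch assumes a DocBook-like document in which every `section` element becomes one module. The one-section-one-module rule and the sample document are assumptions; the actual HyTex segmentation draws on further annotation layers.

```python
import xml.etree.ElementTree as ET


def segment_into_modules(xml_text: str) -> list[dict]:
    """Split a DocBook-like document into modules, one per <section> element.

    Nested sections become modules of their own; each module keeps only the
    <para> elements that are direct children of its section.
    """
    root = ET.fromstring(xml_text)
    modules = []
    for i, section in enumerate(root.iter("section")):  # document order
        paragraphs = [
            # flatten inline markup and normalize whitespace
            " ".join("".join(p.itertext()).split())
            for p in section.findall("para")
        ]
        modules.append({
            "id": section.get("id", f"module-{i}"),
            "title": section.findtext("title", default="").strip(),
            "paragraphs": paragraphs,
        })
    return modules


doc = """<article>
  <section id="s1">
    <title>Hypertext</title>
    <para>A hypertext consists of <emphasis>modules</emphasis> and links.</para>
    <section id="s1.1">
      <title>Modules</title>
      <para>A module is a self-contained unit.</para>
    </section>
  </section>
</article>"""

for module in segment_into_modules(doc):
    print(module["id"], module["title"], module["paragraphs"])
```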

=> Screenshots...

=> The prototype...


TermNet - modeling and processing of terminological knowledge

Drawing on the descriptive concepts introduced in WordNet, we created a terminology net (TermNet) that contains the central concepts and terms of the specialized text domain. We extended these concepts with domain-specific relations as well as relations that are relevant for German. We use tools from Intelligent Views to create and maintain TermNet. Based on inferences drawn on the terminological net, we automatically generate a hypertextual glossary as well as navigable visualizations of excerpts from the net. These visualizations are implemented in SVG.
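
The WordNet-style structure of such a net and one simple inference can be sketched in Python. TermSets group the lexemes that denote a concept, a broader-term relation links TermSets, and a glossary entry lists the transitive closure of that relation. The `TermSet` class, the `broader` relation and the example terms are hypothetical. The sketch does not model the Intelligent Views tools or TermNet's actual relation inventory.

```python
from dataclasses import dataclass, field


@dataclass
class TermSet:
    """Lexemes that denote one concept, analogous to a WordNet synset."""
    ts_id: str
    lexemes: list[str]
    broader: list[str] = field(default_factory=list)  # IDs of directly superordinate TermSets


def all_broader(net: dict[str, TermSet], ts_id: str) -> list[str]:
    """Infer all direct and indirect superordinate TermSets (transitive closure)."""
    result: list[str] = []
    stack = list(net[ts_id].broader)
    while stack:
        current = stack.pop()
        if current not in result:  # guards against cycles and repeated paths
            result.append(current)
            stack.extend(net[current].broader)
    return result


def glossary_entry(net: dict[str, TermSet], ts_id: str) -> str:
    """Render a plain-text glossary entry: headword, synonyms, inferred broader terms."""
    ts = net[ts_id]
    lines = [ts.lexemes[0]]
    if len(ts.lexemes) > 1:
        lines.append("  synonyms: " + ", ".join(ts.lexemes[1:]))
    broader_terms = [net[b].lexemes[0] for b in all_broader(net, ts_id)]
    if broader_terms:
        lines.append("  broader terms: " + ", ".join(broader_terms))
    return "\n".join(lines)


# Hypothetical excerpt of a terminology net.
net = {
    "ts1": TermSet("ts1", ["relation"]),
    "ts2": TermSet("ts2", ["semantic relation"], broader=["ts1"]),
    "ts3": TermSet("ts3", ["hyponymy", "subordination"], broader=["ts2"]),
}
print(glossary_entry(net, "ts3"))
```

The entry for "hyponymy" lists "relation" as a broader term even though only the link to "semantic relation" is modeled directly. That indirect fact is what the inference contributes.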

Our statistics show the various units which have been modeled (TermSets, lexemes, different types of relations).

See also:

Modeling of a terminology net for automatic linking on the basis of WordNet (project publication; PDF 389 KB)

Modeling of the terminological knowledge net TermNet (documentation; PDF 548 KB)

Process operations of the terminological net (documentation; PDF 243 KB)


Technical implementation

The technical implementation is based on XML technologies. The different annotation layers are merged and converted into a web-based presentation format using XSLT, and TermNet is processed in the same step. In future work, this transformation will no longer be programmed directly in XSLT. It will instead be programmed in HTTL (Hypertext Transformation Language), a programming language we developed ourselves for generating hypertext views.
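
HyTex performs this transformation in XSLT. As a rough analogue, the sketch below uses Python's standard `xml.etree.ElementTree` module rather than XSLT or HTTL. It turns a module with annotated term instances into an HTML fragment in which each term instance links to a glossary entry. The element names (`module`, `para`, `term`), the `ref` attribute and the `glossary.html#<ref>` URL pattern are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def _append_text(parent: ET.Element, text: Optional[str]) -> None:
    """Append text after the last child of parent (or to parent.text if it has no children)."""
    if not text:
        return
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def module_to_html(module_xml: str) -> str:
    """Convert a hypothetical annotated module into an HTML fragment.

    <para> becomes <p>; <term ref="X"> becomes <a href="glossary.html#X">;
    any other inline element is flattened to its text.
    """
    source = ET.fromstring(module_xml)
    div = ET.Element("div", {"class": "module"})
    for para in source.findall("para"):
        p = ET.SubElement(div, "p")
        p.text = para.text
        for child in para:
            if child.tag == "term":
                a = ET.SubElement(p, "a", {"href": "glossary.html#" + child.get("ref", "")})
                a.text = "".join(child.itertext())
                a.tail = child.tail
            else:
                _append_text(p, "".join(child.itertext()))
                _append_text(p, child.tail)
    return ET.tostring(div, encoding="unicode")


module = ('<module id="m1"><para>A <term ref="hypertext">hypertext</term> '
          'is built from <emph>modules</emph>.</para></module>')
print(module_to_html(module))
```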


