Text-grammatical foundations for the (semi)automated text-to-hypertext conversion

Research topics and goals

Converting linear text documents into documents that can be published in a hypertext environment is a complex task. It requires conversion software on the technical side and conversion strategies and methods on the conceptual side. While most research on text-to-hypertext conversion has concentrated on technical aspects or has been tied to specific projects and systems, there is now a growing need for general principles and strategies for handling the conceptual problems of text-to-hypertext conversion, such as:
The project focuses on these conceptual problems and uses XML as the technical basis for hypertext modelling and viewing. The central idea of the project is to base conversion strategies on annotations that explicitly mark up the text-grammatical structures and relations between text segments, e.g. co-reference relations, the semantics of connectives, text-deictic expressions, and expressions indicating topic handling. The project developed a methodology that (semi-)automatically constructs hypertext layers and views from these text-grammatical annotations. Our conversion approach operates on two levels: the document level and the domain knowledge level.
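To make the idea of annotation-driven conversion more concrete, the following is a minimal sketch, not the project's actual implementation. It assumes a hypothetical XML format (not the HyTex annotation schema) in which each `<seg>` element is a text segment, `<anaphor antecedent="…">` marks a co-referring expression, and `<deixis target="…">` marks a text-deictic expression. The hypothetical functions `derive_links` and `reading_view` turn these annotations into typed links and then assemble a view for a reader who enters the document at one segment: the segments it depends on, in their original order, followed by the segment itself.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Hypothetical annotation format (illustration only, not the HyTex schema):
# <seg> = text segment; <anaphor>/<deixis> point to the segment they depend on.
SAMPLE = """
<doc>
  <seg id="s1">A hypertext consists of nodes and links.</seg>
  <seg id="s2"><anaphor antecedent="s1">Its nodes</anaphor> may be read in any order.</seg>
  <seg id="s3">Print media, in contrast, are read sequentially.</seg>
  <seg id="s4">As <deixis target="s1">defined above</deixis>,
    <anaphor antecedent="s2">this freedom</anaphor> can cause coherence problems.</seg>
</doc>
""".strip()


def derive_links(xml_text):
    """Turn text-grammatical annotations into (source, target, relation) triples."""
    root = ET.fromstring(xml_text)
    links = []
    for seg in root.iter("seg"):
        src = seg.get("id")
        for ana in seg.iter("anaphor"):
            links.append((src, ana.get("antecedent"), "coreference"))
        for dx in seg.iter("deixis"):
            links.append((src, dx.get("target"), "text-deixis"))
    return links


def segment_order(xml_text):
    """Segment ids in their original sequential order."""
    return [seg.get("id") for seg in ET.fromstring(xml_text).iter("seg")]


def reading_view(target, links, order):
    """Segments a reader entering at `target` needs, in original order."""
    depends_on = {}
    for src, dst, _relation in links:
        depends_on.setdefault(src, []).append(dst)
    needed, queue = set(), deque(depends_on.get(target, []))
    while queue:  # breadth-first walk over the dependency links
        seg = queue.popleft()
        if seg not in needed:
            needed.add(seg)
            queue.extend(depends_on.get(seg, []))
    needed.discard(target)  # guard against cyclic dependencies
    return [s for s in order if s in needed] + [target]


links = derive_links(SAMPLE)
print(reading_view("s4", links, segment_order(SAMPLE)))  # ['s1', 's2', 's4']
```

The view is only a list of segment ids, so the original document stays untouched and sequential reading remains possible; segment `s3`, on which `s4` does not depend, is simply left out of the selective view.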
In this approach, we store the hypertext views as an additional document layer. Since we preserve the structure and content of the original text documents, the reader can still choose between sequential and selective (hypertext-driven) reading modes.

The users we have in mind when generating our hypertext views are searching for information in a scientific domain in which they have some prior knowledge but no expertise. Their time is limited, and they have to solve a very specific type of problem. Such situations are typical of many contexts, e.g. interdisciplinary research, science journalism, or specialised lexicography. In scenarios like these, users often read selectively and take in only parts of longer documents. When these documents are organised sequentially, i.e. designed to be read from beginning to end, this selective reading may result in coherence problems. For example, a reader jumping into the middle of a sequential document may not understand (or may misunderstand) a paragraph because they lack the prerequisite knowledge given in the preceding text.

The goal of our conversion approach is to generate hypertext views on sequential documents that avoid these coherence problems and make selective reading and browsing more efficient and more convenient than would be possible with print media. The feasibility and performance of the conversion methodology are tested and evaluated on a sample corpus of documents from the domains "hypertext research" and "text technology".

Project phase I (finished)

The text-grammar-based conversion methodology was developed and tested on 20 German documents from a corpus covering the subject domains "text technology" and "hypertext research". On the document level of our two-level architecture, these documents were annotated on three annotation layers:
The annotations of these three layers were then combined using a unification approach developed in our partner project.

On the domain knowledge level, we represent all technical terms occurring in these documents in a WordNet-style representation. The technical basis for this representation is On this basis, we implemented the segmentation, linking, and reorganization procedures that generate hypertext views on these documents. Our conversion strategies were implemented and evaluated in a demonstration prototype. For further information, please consult our list of publications.

Project phase II

The HyTex project has been in its second phase since August 2005. The main issues of this phase are:
For further information, please consult our list of publications.