Annotating an article online
3/30/2024

Unstructured, free-text data sources often contain rich information that could be used for domain-specific and research purposes. However, the unstructured nature of free-text data poses a significant challenge for its utilisation, due to the substantial manual intervention required from domain experts to label embedded information. Annotation tools can assist with this process by providing functionality that enables the accurate capture and transformation of unstructured texts into structured annotations, which can be used individually or as part of larger Natural Language Processing (NLP) pipelines. It is therefore important to provide domain experts with the tools necessary to annotate unstructured text rapidly and accurately.

Several tools have been developed with the aim of assisting annotators throughout the annotation process. Support from such tools can consist of highlighting key phrases, capturing detailed attributes within a phrase, or suggesting domain-specific annotations (1). The brat rapid annotation tool (brat) is widely used and allows users to annotate both entities and attributes within entities (e.g., negation status), and to define linkages, or relationships, between entities (2). Some tools, such as the extensible Human Oracle Suite of Tools (eHOST) and Knowtator, include the ability to annotate against pre-existing ontologies such as the Unified Medical Language System (UMLS). Like brat, these tools are run locally on user machines (3, 4). More recent annotation tools, such as the brat-based WebAnno, TeamTat, and Marky, have introduced useful features that emphasise distributed annotation tasks (5–7). Users can set up projects with annotation schemas and are able to compute inter-annotator agreement across sessions. Machine learning has been used in some tools to speed up the annotation process by providing annotation suggestions to the user. These approaches fall into two categories: pre-annotation and active learning. The Rapid Text Annotation Tool generates pre-annotations from a gold-standard annotated corpus, while INCEpTION and ezTag use active learning, adapting to user annotations and improving suggestions over time (8, 9). There is also a growing emphasis on web-based software that does not rely on local installations, as shown by tools such as Anafora (10).

The structured annotations that result from annotating a document with an annotation tool can be used as the building blocks of datasets for training and developing NLP tools and Artificial Intelligence (AI) systems. However, it takes an experienced annotator an average of 15–30 min to annotate a document that has 41 data elements embedded (11). The time-consuming nature of this process, combined with the value of annotator time, can make it infeasible for individual groups to annotate the quantities of documents necessary to train large-scale, accurate AI systems. As such, opportunities to utilise the information embedded in unstructured documents for NLP and AI may be limited.

Markup aims to incorporate desirable features from existing tools, whilst introducing novel features to assist annotators and streamline the annotation process. We present Markup, an open-source, web-based annotation tool that is undergoing continued development for use across all domains. Markup incorporates NLP and Active Learning (AL) technologies to enable rapid and accurate annotation through custom user configurations, predictive annotation suggestions, and automated mapping suggestions to both domain-specific ontologies, such as UMLS, and custom, user-defined ontologies. We demonstrate a real-world use case of how Markup has been used in a healthcare setting to annotate structured information from unstructured clinic letters, where the captured annotations were used to build and test NLP applications.
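To make the idea of structured annotations concrete: brat stores each entity annotation as one line of a plain-text "standoff" file kept alongside the source document, in the shape `T<id><TAB><type> <start> <end><TAB><covered text>`. A minimal sketch of parsing such a line into a reusable record follows; the annotation string itself is an invented example, and this simple parser ignores brat features such as discontinuous spans.

```python
# Sketch: parse a brat-style standoff entity line into a dict.
# Documented brat shape: "T<id>\t<type> <start> <end>\t<covered text>".
# The example annotation below is invented for illustration.

def parse_entity(line):
    """Parse one brat 'T' (entity) line; discontinuous spans are not handled."""
    ann_id, type_span, text = line.rstrip("\n").split("\t")
    etype, start, end = type_span.split(" ")
    return {"id": ann_id, "type": etype,
            "start": int(start), "end": int(end), "text": text}

ann = parse_entity("T1\tDisorder 27 35\tepilepsy")
print(ann["type"], ann["start"], ann["end"])  # Disorder 27 35
```

Records like this, aggregated across a corpus, are exactly the "building blocks" a downstream NLP training pipeline consumes.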
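Inter-annotator agreement of the kind these tools compute across sessions is commonly reported as Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A self-contained sketch, with invented label sequences from two hypothetical annotators:

```python
# Sketch: Cohen's kappa for two annotators labelling the same token sequence.
# The label sequences below are toy data, not from any real corpus.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """kappa = (p_observed - p_expected) / (1 - p_expected)."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of each label's marginal frequencies.
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["DRUG", "DOSE", "DRUG", "O", "O", "DRUG"]
b = ["DRUG", "DOSE", "O",    "O", "O", "DRUG"]
print(round(cohens_kappa(a, b), 3))  # 0.739
```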
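The active-learning approach used by tools such as INCEpTION and ezTag typically relies on uncertainty sampling: the documents the current model is least confident about are routed to the annotator first, so each new annotation teaches the model the most. A toy sketch of this "least confidence" ordering, with made-up model probabilities standing in for a classifier retrained as annotations accumulate:

```python
# Sketch: uncertainty sampling for annotation prioritisation.
# The probabilities are invented; a real tool would obtain them from a model
# that is periodically retrained on the annotations collected so far.

def least_confident_first(doc_probs):
    """Order document ids so the lowest top-label confidence comes first."""
    return sorted(doc_probs, key=lambda doc: doc_probs[doc])

queue = least_confident_first({"doc1": 0.95, "doc2": 0.55, "doc3": 0.70})
print(queue)  # ['doc2', 'doc3', 'doc1']
```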
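Automated mapping suggestions to ontologies such as UMLS can, at their simplest, be approximated by fuzzy string matching between an annotated mention and concept labels. A minimal sketch using only the standard library; the concept table and identifiers here are invented stand-ins, not real UMLS entries:

```python
# Sketch: suggest ontology concepts for a mention via fuzzy matching.
# CONCEPTS is a toy stand-in; a real system would query UMLS or a
# user-defined ontology.
import difflib

CONCEPTS = {
    "epilepsy": "C0001",
    "migraine": "C0002",
    "diabetes mellitus": "C0003",
}

def suggest_mapping(mention, n=1, cutoff=0.6):
    """Return up to n (label, concept id) suggestions above the cutoff."""
    matches = difflib.get_close_matches(mention.lower(), CONCEPTS,
                                        n=n, cutoff=cutoff)
    return [(label, CONCEPTS[label]) for label in matches]

print(suggest_mapping("Epilespy"))  # typo still maps: [('epilepsy', 'C0001')]
```

Production tools would use stronger matching (normalisation, embeddings, or dedicated UMLS lookup services), but the interface shape, mention in and ranked concept suggestions out, is the same.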