The simplest and most familiar example is the document comparison feature in Microsoft Word. The most basic function of these tools is to take two different versions of the same document and highlight the differences between them. More sophisticated tools can perform high-end linguistic analysis, such as tagging parts of speech (POS), creating concordances, collating versions, analysing sentiment and keyword density or prominence, visualizing patterns, exploring intertextual parallels and modelling topics.
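The basic comparison function described above can be sketched with Python's standard `difflib` module. This is a minimal illustration under stated assumptions, not how Word or any particular collation tool works: the function name `highlight_changes` and the sample sentences are hypothetical, and the comparison is done word by word.

```python
import difflib


def highlight_changes(old: str, new: str) -> list[str]:
    """Return the word-level differences between two versions of a passage."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged runs are not reported
        removed = " ".join(old_words[i1:i2])
        added = " ".join(new_words[j1:j2])
        changes.append(f"{tag}: [{removed}] -> [{added}]")
    return changes


# Hypothetical sample versions of the same sentence.
draft = "The quick brown fox jumps over the lazy dog"
revised = "The quick red fox leaps over the lazy dog"

for change in highlight_changes(draft, revised):
    print(change)
```

Comparing whole words rather than characters keeps the report readable for prose; a real collation tool would also need to handle punctuation, spelling variants and moved passages.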
TEXT CONVERSION AND ENCODING
TEXT VISUALIZATION
TEXT TRANSCRIPTION
TEXT EDITING AND PROCESSING
At its most basic level, this is simply adding notes or glosses to a document, for instance, putting sticky-note comments on a PDF file for personal use. But it can also be done on web pages and HTML files and shared among a community of readers. This process usually involves a body, an anchor and a marker: that is, the text of the note, the material to which it specifically refers and the way the connection is indicated (e.g. with a circle or underline). These markers are by now common and well known, and they derive from the notation culture first formulated in medieval manuscripts.
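The body, anchor and marker of an annotation can be pictured as a small data structure. The sketch below is only illustrative: the class name `Annotation`, its fields and the use of character offsets as the anchor are assumptions, not the data model of any specific annotation tool.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    body: str                   # the text of the note
    start: int                  # anchor: offset where the annotated passage begins
    end: int                    # anchor: offset just past the annotated passage
    marker: str = "underline"   # how the connection is shown to the reader

    def anchor_text(self, document: str) -> str:
        """Return the passage in the document that this note refers to."""
        return document[self.start:self.end]


# Hypothetical example: glossing one word in a line of verse.
document = "Arma virumque cano, Troiae qui primus ab oris"
start = document.index("Troiae")
note = Annotation(
    body="Genitive singular: 'of Troy'.",
    start=start,
    end=start + len("Troiae"),
)

print(f"{note.marker} '{note.anchor_text(document)}': {note.body}")
```

Keeping the anchor separate from the body is what lets the same note be displayed with different markers, or shared among readers who see the same underlying text.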
Every text in computer format is encoded with tags, whether this is apparent to the user or not. Everything from font and point size, bold, italics and underline to line and paragraph spacing, justification and superscripts is the result of such coding tags. Common formats include RTF, plain text and robustly coded text. Text converters transform all these tags from one format to another so that the text can be used in different applications. Originally many of these converters were stand-alone applications. Now they are add-ons, or they are embedded within a program so that a user can, for example, create a PDF, HTML or ASCII file from a Microsoft Word document or create an EPUB file directly from an Adobe InDesign file.
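As a rough illustration of what a converter does with such tags, the following sketch maps a very small subset of HTML formatting tags to Markdown-style markers using Python's standard `html.parser` module. The class and function names are hypothetical, the choice of Markdown as the target format is an assumption made for the example, and a real converter handles far more tags and edge cases.

```python
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Translate a tiny subset of HTML tags into Markdown-style markers."""

    # Assumed mapping for this sketch; any other tag is simply dropped.
    TAG_MARKERS = {"b": "**", "strong": "**", "i": "*", "em": "*"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(self.TAG_MARKERS.get(tag, ""))

    def handle_endtag(self, tag):
        self.parts.append(self.TAG_MARKERS.get(tag, ""))

    def handle_data(self, data):
        self.parts.append(data)  # the text itself passes through unchanged


def html_to_markdown(html: str) -> str:
    converter = MarkdownConverter()
    converter.feed(html)
    converter.close()
    return "".join(converter.parts)


print(html_to_markdown("<p>A <b>bold</b> claim and an <i>italic</i> aside.</p>"))
```

The text content is untouched; only the tags that carry the formatting are rewritten, which is the essence of format conversion.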
These tools or applications generally allow users to perform the following operations in text documents: write, search, cut, paste, format, undo and redo, check spelling and grammar, outline and generate tables of contents. They can also include capabilities for HTML processing. They are among the digital tools most commonly used by humanities scholars.
When text material is incorporated into scholarly research, it often first needs to be converted into information that can be analysed for patterns. Developing software to derive this information from text has been a major undertaking of several digital humanities efforts. These programs extract data from text according to certain parameters and deliver the data in useful file formats.
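A minimal sketch of this kind of extraction follows, assuming the parameter of interest is four-digit years and the useful output format is CSV. The function names, the regular expression and the sample sentence are all chosen for illustration and do not come from any particular extraction tool.

```python
import csv
import io
import re

# Assumed parameter: four-digit years between 1000 and 2099.
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def extract_years(text: str) -> list[dict]:
    """Find each year mentioned in the text and record where it occurs."""
    return [
        {"year": match.group(1), "offset": match.start()}
        for match in YEAR_PATTERN.finditer(text)
    ]


def to_csv(rows: list[dict]) -> str:
    """Serialize the extracted rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["year", "offset"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = "Petrarch was born in 1304 and crowned poet laureate in 1341."
print(to_csv(extract_years(sample)))
```

Once the data is in tabular form it can be loaded into a spreadsheet or statistics package and analysed for patterns.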
There are several different types of tools for this process that automatically convert input into a standard text file format.
Optical Character Recognition (OCR). These tools automatically recognize characters and create documents from digital images of text. This is particularly effective for standard type, such as printed books and magazines, but great advances have been made in recognizing handwritten documents and a vast array of non-Western alphabets.
Handwriting Recognition (HWR). These tools allow users to transcribe handwriting and produce documents. Their effectiveness for reading manuscript books has evolved greatly over the past decade, but they still require much direct intervention or “instruction” on the part of a researcher.
Music Recognition. These tools can process a printed score and create editable music files.
Speech Recognition. Speech recognition software enables a user to automatically convert audio files, such as MP3s, to text. It is particularly useful for personal notes, but also for interviews, and can be applied to both user-created materials and materials downloaded from other sources.
These tools take text and create various visual representations of texts and words, such as semantic maps and word clouds.
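Behind a word cloud is usually a simple word-frequency count, with the size of each word scaled to how often it occurs. The sketch below shows that underlying count and prints it as a plain text bar chart. The stopword list, the function name and the sample line are assumptions made for the example.

```python
import re
from collections import Counter

# Assumed, deliberately short stopword list for the example.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in"}


def word_frequencies(text: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Count the most frequent non-stopword words in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return counts.most_common(top_n)


sample = "The sea, the sea, the open sea, the blue, the fresh, the ever free"

# A word cloud would map each count to a font size; here it is a bar length.
for word, count in word_frequencies(sample):
    print(f"{word:<8}{'#' * count}")
```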
There are several different types of tools that assist a user in converting images or recordings of words into digital information in a standard text file format. There are also tools that facilitate crowdsourcing the transcription of documents on the web. Through the New York Public Library’s What’s on the Menu?, for example, participants have transcribed more than one million dishes from more than ten thousand menus.
Speech to Text Transcription. These tools allow users to transcribe audio files in various formats. Many of these facilitate the process by eliminating the need to alternate between an audio player and a text editor. For instance, a user can load an audio file of a speech and have tools to control the audio on the same page where there is a window for transcribing the text.
Text to Text Transcription. These tools allow users to make transcriptions of the digital images of documents in the same interface, presenting the image alongside a text-editing window. For instance, a user can upload an image of a handwritten letter in one window and transcribe the letter into text format in a window alongside.