This chapter describes methods for encoding textual content with MEI. It is divided
into two sections: one deals with 9.1 Text Structures in MEI, the other with 9.2 Text in MEI. While the first covers structures of textual documents such as front matter or back
matter, the latter describes how to mark up features and various entities within a
text, such as names, tables or quotes. These features may appear both within data
(a figure scribbled as a marginal annotation into a score by a bored second violin…) and metadata (a number of dates within a text about the creation of a work). Accordingly, many of the elements and models explained in this chapter
are also used to encode metadata (see 3 Metadata in MEI). However, they should not be confused with the elements from the 8 Lyrics and Performance Directions chapter, which deals with performed text in MEI.
Most of the elements described here take inspiration from encoding formats that deal
primarily with text, such as HTML and the Text Encoding Initiative (TEI). These elements
are provided to encode relatively basic textual information. For deeper encoding of
text, these Guidelines recommend consideration of other text-specific encoding formats
with embedded MEI markup.
9.1 Text Structures
This section focuses on the text that accompanies the score, i.e., paratext (prefatory material, back matter, appendices, etc.).
9.1.1 Organizing Text into Divisions
Text can be organized in different parts, for example in chapters or sections. The
div element is used to encode such structural divisions.
div: Major structural division of text, such as a preface, chapter or section.
For example, a printed score may include text before the actual notation, organized
in multiple sections (e.g., a preface, a critical report, performance instructions; see the following
sections); each of these sections should be encoded as a separate div element. Text might also occur in between music sections (see 2.1.2.2 Content of Musical Divisions): in a collection of Romantic piano works, for example, a few pieces might be preceded
or followed by poetry. Such text should be encoded with the div element, as demonstrated in the following example:
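A minimal sketch of a poem placed between two pieces might look as follows. The poem lines are placeholders, and the placement of div as a child of section is an assumption; see 2.1.2.2 Content of Musical Divisions for where div is actually permitted.

```xml
<score>
  <section>
    <!-- notation of the first piano piece -->
  </section>
  <section>
    <!-- assumption: div is allowed as a child of section -->
    <div type="poem">
      <lg>
        <l>First line of the poem (placeholder)</l>
        <l>Second line of the poem (placeholder)</l>
      </lg>
    </div>
  </section>
  <section>
    <!-- notation of the second piano piece -->
  </section>
</score>
```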
Textual divisions may have titles or other forms of introductory material, which are
encoded with the head element.
head: Contains any heading, for example, the title of a section of text, or the
heading of a list.
The following example shows the encoding of a preface translated into three different
languages, each with a different heading:
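As a sketch, the three translations could be encoded as sibling div elements, each with its own head. The use of xml:lang to record the language and the type value are illustrative choices, and the paragraph content is left as placeholders.

```xml
<div type="preface" xml:lang="en">
  <head>Preface</head>
  <p><!-- English text of the preface --></p>
</div>
<div type="preface" xml:lang="de">
  <head>Vorwort</head>
  <p><!-- German text of the preface --></p>
</div>
<div type="preface" xml:lang="it">
  <head>Prefazione</head>
  <p><!-- Italian text of the preface --></p>
</div>
```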
Because div can identify any structural division of text, it is often helpful to distinguish
the kind of division. The type attribute can be used for this purpose.
type: Characterizes the textual division in some sense, using any convenient classification
scheme or typology that employs single-token labels.
type may contain any number of space-separated tags describing the nature of the div (or, in fact, any other element). The following example shows the use of type (in combination with n) to indicate that three prefaces in English, German and Italian are laid out as columns on the same
page.
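One possible encoding is sketched below. The tag values "preface" and "column" are hypothetical labels chosen for this illustration, not values prescribed by MEI; n records the position of each column.

```xml
<!-- "preface" and "column" are illustrative, user-chosen type labels -->
<div type="preface column" n="1" xml:lang="en">
  <head>Preface</head>
  <p><!-- English text --></p>
</div>
<div type="preface column" n="2" xml:lang="de">
  <head>Vorwort</head>
  <p><!-- German text --></p>
</div>
<div type="preface column" n="3" xml:lang="it">
  <head>Prefazione</head>
  <p><!-- Italian text --></p>
</div>
```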
9.1.2 Paratext
This section introduces paratextual material, such as title pages, prefaces, indexes
and other text that precedes or follows the actual score.
9.1.2.1 Front Matter
By ‘front matter’ these Guidelines mean distinct sections of a text (usually, but
not necessarily, a printed one), prefixed to it by way of introduction or identification
as a part of its production. Features such as title pages or prefaces are clear examples;
a less definite case might be the prologue attached to a dramatic work. The front
matter of an encoded text should not be confused with the MEI header described in
chapter 3.2 Structure of the MEI Header, which provides metadata for the entire file.
An encoder may choose simply to ignore the front matter in a text, if the original
presentation of the work is of no interest. No specific tags are provided for the
various kinds of subdivision which may appear within front matter: instead, generic
div (“division”) elements may be used, which should not be confused with mdiv (“musical division”) elements. The following suggested values for the type attribute may be used to distinguish various kinds of division characteristic of
front matter:
'preface':
A foreword or preface addressed to the reader in which the author or publisher explains
the content, purpose, or origin of the text.
'ack':
A formal declaration of acknowledgement by the author in which persons and institutions
are thanked for their part in the creation of a text.
'dedication':
A formal offering or dedication of a text to one or more persons or institutions by
the author.
'abstract':
A summary of the content of a text as continuous prose.
'contents':
A table of contents, specifying the structure of a work and listing its constituents.
The list element should be used to mark its structure.
'frontispiece':
A pictorial frontispiece, possibly including some text.
The following extended example demonstrates how various parts of the front matter
of a text may be encoded. The front matter begins with a title page, which is described
in section 3.7.1.2 Title Pages. This is followed by a dedication and a preface, each of which is encoded
as a distinct div:
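A shortened sketch of such front matter, with invented text and the title page left out, might read:

```xml
<front>
  <!-- title page encoding omitted; see 3.7.1.2 Title Pages -->
  <div type="dedication">
    <p>To my teacher, in gratitude.</p>
  </div>
  <div type="preface">
    <head>Preface</head>
    <p>These songs were written over the course of a single summer.</p>
  </div>
</front>
```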
The front matter concludes with another div element, shown in the next example, this time containing a table of contents, which
contains a list element (as described in chapter 9.2.4 Lists). Note the use of the ptr element to provide page-references: the implication here is that the target identifiers
(song1, song2, etc.) will correspond with identifiers used for the mdiv elements containing the individual songs. (For a description of the ptr element, see chapter 13.2 Linking from MEI.)
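A sketch of such a table of contents follows. The song titles are placeholders, and the targets assume that the corresponding mdiv elements carry xml:id values of song1 and song2.

```xml
<div type="contents">
  <head>Contents</head>
  <list>
    <!-- each ptr points at the mdiv that holds the song -->
    <li>First song (placeholder title) <ptr target="#song1"/></li>
    <li>Second song (placeholder title) <ptr target="#song2"/></li>
  </list>
</div>
```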
Alternatively, the pointers in the table of contents might link to the page beginnings
at which a song begins, assuming that these have been included in the markup:
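In that case the pointer targets a pb element instead of an mdiv. The identifier page3 below is hypothetical:

```xml
<!-- in the table of contents -->
<li>First song (placeholder title) <ptr target="#page3"/></li>

<!-- in the music, at the point where the song begins -->
<pb xml:id="page3" n="3"/>
```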
9.1.2.2 Back Matter
Conventions vary as to which elements are grouped as back matter and which as front.
For example, some books place the table of contents at the front, and others at the
back. For this reason, the content models of the front and back elements are identical.
The following suggested values may be used for the type attribute on all division elements, in order to distinguish various kinds of divisions
characteristic of back matter:
'appendix':
An ancillary self-contained section of a work, often providing additional but in some
sense extra-canonical text.
'glossary':
A list of terms associated with definition texts (‘glosses’).
'notes':
A section in which textual notes are gathered together.
'bibliography':
A list of bibliographic citations.
'index':
Any form of index to the work.
'colophon':
A statement appearing at the end of a book describing the conditions of its physical
production.
No additional elements are proposed for the encoding of back matter at present. Some
characteristic examples follow; first, an index (for the case in which a printed index
is of sufficient interest to merit transcription):
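A minimal sketch of a transcribed index, using invented entries and page numbers, could look like this:

```xml
<back>
  <div type="index">
    <head>Index</head>
    <list>
      <li>Adagio, 12, 45</li>
      <li>Cadenza, 31</li>
      <li>Da capo, 77</li>
    </list>
  </div>
</back>
```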
Note that if the page beginnings in the original source have also been explicitly
encoded, and given identifiers, the references to them in the above index can more
usefully be recorded as links. For example, assuming that the encoding of page 77
of the original source starts like this:
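For instance, a page beginning with a hypothetical identifier P77:

```xml
<pb xml:id="P77" n="77"/>
```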
then the last item above might be encoded more usefully in the following form:
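A sketch of such an entry, assuming an invented index term and a page beginning identified as P77, with the printed page number wrapped in a ref element:

```xml
<li>Da capo, <ref target="#P77">77</ref></li>
```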
9.2 Text in MEI
This section describes methods for encoding textual content with MEI. Textual information
on scores has several different uses, although some text is closer to music notation
than other kinds. For example, tempo marks, directives and lyrics are directly related
to the functionality of the notated music and are, therefore, described in other chapters
(see for example 8.1 Vocal Text and 2.2.5.1 Text Directives).
9.2.1 Paragraphs
Paragraphs are fundamental to prose text and typically group one or more sentences
that form a logical passage. A paragraph is usually typographically distinct; that is, it
typically begins on a new line, and the first letter of the content is often indented,
enlarged, or both. The p element has a meaning similar to that of the corresponding elements
in Encoded Archival Description (EAD), Text Encoding Initiative (TEI), and HTML.
p: One or more text phrases that form a logical prose passage.
Prose text is used for several different purposes within an MEI document; therefore
p can occur in many situations. For example, it may be used within metadata elements
(see 3.2 Structure of the MEI Header):
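A sketch of a paragraph inside the header is shown below. The choice of projectDesc as the containing element and the wording are illustrative only.

```xml
<encodingDesc>
  <projectDesc>
    <p>This encoding was prepared as a teaching example.</p>
  </projectDesc>
</encodingDesc>
```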
Alternatively, paragraphs may be part of the document contents (and therefore encoded
within music), either as 9.1.2 Paratext or within the music notation. In these cases, a paragraph will likely be contained
by a div or other elements containing prose (e.g., annot, figDesc, etc.).
The following example shows a paragraph in a preface section:
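As a sketch with invented text:

```xml
<div type="preface">
  <head>Preface</head>
  <p>This edition is based on the first printed edition of the work.</p>
</div>
```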
9.2.2 Text Rendition
Sometimes, it is desirable to capture the typographical qualities of a word or phrase
without assigning it a special meaning. For this purpose, MEI offers the rend element, similar to TEI’s hi element. Using CSS-like values, its rend attribute can be used to specify many typographic features, such as font style, font
variants, and relative font size and weight. In addition, text decoration, direction,
and enclosing ‘boxes’ may be captured. While rend is used to record relative font size and weight, absolute values for these qualities
(measured in printer’s points) should be specified using the fontsize and fontweight attributes. In addition to commonly found typographical qualities, MEI provides the
altrend attribute for the capture of additional, user-defined rendition information.
The rend element can accept glyph.auth and glyph.uri attributes, which provide encoders with the ability to specify an external authority
for Unicode codepoints in the textual content. Only the text content that should be
rendered using SMuFL code points should go inside the rend element when using glyph.auth and glyph.uri.
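The following sketch illustrates these attributes. The attribute values shown ("italic", "14pt", "bold", "smufl") are assumed to be accepted by the schema in use, and the character reference is assumed to be the SMuFL code point for the piano dynamic; both should be checked against the relevant specifications.

```xml
<!-- relative typographic features via the rend attribute -->
<p>The tempo marking is printed in <rend rend="italic">italics</rend>.</p>

<!-- absolute size and weight via fontsize and fontweight -->
<p>The title appears in <rend fontsize="14pt" fontweight="bold">large bold type</rend>.</p>

<!-- only the SMuFL character itself goes inside rend -->
<p>The passage is marked <rend glyph.auth="smufl">&#xE520;</rend> in the source.</p>
```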
fontsize: Indicates the size of a font expressed in printers' points, i.e., 1/72nd of an inch,
relative terms, e.g., small, larger, etc., or percentage values relative to normal
size, e.g., 125%.
9.2.3 Figures
The fig element groups elements representing or containing graphic information such as an
illustration or figure. This element is modelled on the figure element in the Text
Encoding Initiative (TEI). The fig element is used to contain images, captions, and textual descriptions of the pictures.
The images themselves are specified using the graphic element, whose target attribute provides the location of an image. For example:
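A minimal sketch, with a hypothetical file path:

```xml
<fig>
  <graphic target="images/titlepage.png"/>
</fig>
```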
The graphic element may occur multiple times within the markup of the figure in order
to indicate the availability of different image formats or resolutions:
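For instance, the same image might be offered in two formats. The file paths are hypothetical, and the use of the mimetype attribute to distinguish the formats is an assumption of this sketch.

```xml
<fig>
  <graphic target="images/titlepage.png" mimetype="image/png"/>
  <graphic target="images/titlepage.tiff" mimetype="image/tiff"/>
</fig>
```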
9.2.3.1 Figure Captions and Descriptions
The element caption may be used to transcribe (or supply) a title or descriptive heading for the graphic
itself, as in the following example:
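A sketch with an invented caption and a hypothetical file path:

```xml
<fig>
  <caption>Title page of the first edition</caption>
  <graphic target="images/titlepage.png"/>
</fig>
```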
The figure description (figDesc) element usually contains a brief prose description of the appearance or content
of a graphic figure, for use when documenting an image, perhaps without displaying
it. This element is intended for use as an alternative to the content of its parent
fig element; for example, for display when the equipment in use cannot display graphic
images. It may also be used for indexing or documentary purposes, in which case best
practice suggests the use of controlled vocabulary terms.
Occasionally, a figure description may have a complex structure. In this case, one
or more textual component elements (p [paragraph], table, list, quote, or lg [linegroup]) may be used to model the internal structure of the description:
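A sketch of a structured description, with invented content and a hypothetical file path:

```xml
<fig>
  <graphic target="images/frontispiece.png"/>
  <figDesc>
    <p>Engraved frontispiece showing a group of musicians:</p>
    <list>
      <li>a violinist standing at the left</li>
      <li>a harpsichordist seated at the centre</li>
    </list>
  </figDesc>
</fig>
```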
9.2.3.2 Images
The graphic element indicates the location of an inline graphic, illustration, or figure. As
noted above, there exists a wide variety of different graphics formats, and the following
list is in no way exhaustive. Moreover, inclusion of any format in this list should
not be taken as indicating endorsement by the MEI of this format or any products associated
with it. Some of the formats listed here are proprietary to a greater or lesser extent
and cannot therefore be regarded as standards in any meaningful sense. They are, however,
widely used by many different vendors. The following formats are widely used at the
present time, and are likely to remain supported by more than one vendor’s software:
BMP: Microsoft bitmap format
CGM: Computer Graphics Metafile
GIF: Graphics Interchange Format
JPEG: Joint Photographic Experts Group
PBM: Portable Bit Map
PCX: IBM PC raster format
PICT: Macintosh drawing format
PNG: Portable Network Graphics format
Photo-CD: Kodak Photo Compact Disk format
QuickTime: Apple real-time image system
SMIL: Synchronized Multimedia Integration Language format
SVG: Scalable Vector Graphics format
TIFF: Tagged Image File Format
Brief descriptions of most of the above are given below. Where possible, current addresses
or other contact information are shown for the originator of each format. Many formal
standards, especially those promulgated by the ISO and many related national organizations
(ANSI, DIN, BSI, and many more), are available from those national organizations.
Addresses may be found in any standard organizational directory for the country in
question.
9.2.3.2.1 Vector Graphic Formats
SVG (Scalable Vector Graphics format):
SVG is a language for describing two-dimensional vector and mixed vector or raster
graphics in XML. It is defined by the Scalable Vector Graphics (SVG) 1.0 Specification,
W3C Recommendation, 04 September 2001, available at http://www.w3.org/TR/2001/REC-SVG-20010904/.
PICT (Macintosh drawing format):
This format is universally supported on Macintosh (tm) systems, and readable by a
limited range of software for other systems. Documentation is available from Apple
Computer, Cupertino, California USA.
CGM (Computer Graphics Metafile):
This vector graphics format is specified by an ISO standard, ISO 8632:1987, amended
in 1990. It defines binary, character, and plain-text encodings; the non-binary forms
are safer for blind interchange, especially over networks. Documentation is available
from ISO and from its member national bodies, such as AFNOR, ANSI, BSI, DIN, JIS,
etc.
9.2.3.2.2 Raster Graphic Formats
PNG (Portable Network Graphics format):
PNG is a non-proprietary raster format currently widely available. It provides an
extensible file format for the losslessly compressed storage of raster images. Indexed-color,
grayscale, and true-color images are supported, plus an optional alpha channel. Sample
depths range from 1 to 16 bits. It is defined by IETF RFC 2083, March 1997.
TIFF (Tagged Image File Format):
Currently the most widely supported raster image format, especially for black and
white images, TIFF is also one of the few formats commonly supported on more than
one operating system. The drawback to TIFF is that it actually is a wrapper for several
formats, and some TIFF-supporting software does not support all variants. TIFF files
may use LZW, CCITT Group 4, or PackBits compression methods, or may use no compression
at all. Also, TIFF files may be monochrome, greyscale, or polychromatic. All such
options should be specified in prose at the end of the encodingDesc section of the
MEI header for any document including TIFF images. TIFF is owned by Aldus Corporation.
Documentation on TIFF is available from the owner at Craigcook Castle, Craigcook Road,
Edinburgh EH4 3UH, Scotland, or 411 First Avenue South, Seattle, Washington 98104
USA.
GIF (Graphics Interchange Format):
Raster images are widely available in this form, which was created by CompuServe Information
Services, but has by now been implemented for many other systems as well. Documentation
is copyright by, and is available from, CompuServe Incorporated, Graphics Technology
Department, 5000 Arlington Center Boulevard, Columbus, Ohio 43220 USA.
PBM (Portable Bit Map):
PBM files are easy to process, eschewing all compression in favor of transparency
of file format. PBM files can, of course, be compressed by generic file-compression
tools for storage and transfer. Public domain software exists which will convert many
other formats to and from PBM. Documentation of PBM is copyright by Jeff Poskanzer,
and is available widely on the Internet.
PCX (IBM PC raster format):
This format is used by most IBM PC paint programs, and supports both monochrome and
polychromatic images. Documentation is available from ZSoft Corporation, Technical
Support Department, ATTN: Technical Reference Manual, 450 Franklin Rd. Suite 100,
Marietta, GA 30067 USA.
BMP (Microsoft bitmap format):
This format is the standard raster format for computers running Microsoft Windows (tm)
or Presentation Manager (tm). Documentation is available from Microsoft Corporation.
9.2.3.2.3 Photographic and Motion Video Formats
JPEG (Joint Photographic Experts Group):
This format is sponsored by CCITT and by ISO. It is ISO/IEC Draft International Standard
10918-1, and CCITT T.81. It handles monochrome and polychromatic images with a variety
of compression techniques. JPEG per se, like CCITT Group IV, must be encapsulated
before transmission; this can be done via TIFF, or via the JPEG File Interchange Format
(JFIF), as commonly done for Internet delivery.
Photo-CD (Kodak Photo Compact Disk format):
This format was introduced by Kodak for rasterizing photographs and storing them on
CD-ROMs (about one hundred 35mm film images fit on one disk), for display on televisions
or CD-I systems. Information on Photo-CD is available from Kodak Limited, Research
and Development, Headstone Drive, Harrow, Middlesex HA1 4TY, UK.
9.2.4 Lists
When a text contains lists, they can be encoded with the following elements:
list: Contains a series of items, optionally preceded by a heading.
li: A single item in a list.
The list element can identify any kind of list; the form attribute can be used to specify whether the list is ordered, unordered, etc. Each
item in the list is encoded with the li element. The n attribute can be used to record a label for a list item, as in the following example:
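A sketch with invented list items, where n carries the label of each item:

```xml
<list>
  <li n="1">Allegro</li>
  <li n="2">Adagio</li>
  <li n="3">Rondo</li>
</list>
```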
Occasionally, lists have headers or titles, which can be encoded with head:
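For example, with an invented heading and items:

```xml
<list>
  <head>Instruments required</head>
  <li>Violin</li>
  <li>Viola</li>
  <li>Violoncello</li>
</list>
```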
9.2.5 Tables
The element table contains text displayed in tabular form, i.e., in rows and columns. A table is the least ‘graphic’ of the elements discussed in
this chapter. Almost any text structure can be presented as a series of rows and columns:
one might, for example, choose to show a glossary or other form of list in tabular
form, without necessarily regarding it as a table. When tabular presentation is regarded
as of less intrinsic importance, it is correspondingly simpler to encode descriptive
or functional information about the contents of the table, for example to identify
one cell as containing a name and another as containing a date, though the two methods
may be combined.
The table element may appear either within other components (such as paragraphs) or between
them, provided that the module defined in this chapter has been enabled. It is to
a large extent arbitrary whether a table should be regarded as a series of rows or
as a series of columns. For compatibility with currently available systems, however,
these Guidelines require a row-by-row description of a table.
While rows and columns are always encoded in top-to-bottom, left-to-right order, formatting
properties such as those provided by CSS may be used to specify that they should be
displayed differently.
9.2.5.1 Rows
The tr (table row) element is a formatting element that contains one or more td or th elements (cells) in a table. A cell is the intersection of a row and a column. The precise rendition of the table
and its cells should be specified in a style sheet.
9.2.5.2 Cells
The td (table data) element designates a table cell that contains data as opposed to a cell
that contains column or row heading information. The colspan and rowspan attributes provide tabular rendering information: they indicate that a particular
cell spans more than one column or row, respectively.
The th (table header) element designates a table cell containing column or row heading information
as opposed to one containing data. Its colspan and rowspan attributes work in the same way as those of td.
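The following sketch, with invented content, shows a row-by-row table with one header row and a data cell that spans two columns:

```xml
<table>
  <tr>
    <th>Movement</th>
    <th>Tempo</th>
    <th>Key</th>
  </tr>
  <tr>
    <td>I</td>
    <td>Allegro</td>
    <td>C major</td>
  </tr>
  <tr>
    <td>II</td>
    <!-- this cell covers both the Tempo and the Key column -->
    <td colspan="2">not transmitted in this source</td>
  </tr>
</table>
```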
9.2.6 Quotation
Quotations are common in many types of text. A quotation is typically
attributed to a text other than the one being encoded. Often, the quoted material
is typographically distinct from the surrounding text, i.e., surrounded by so-called ‘quote marks’ or rendered as a separate block of text. The
quote element is used to mark this function:
quote: Contains a paragraph-like block of text attributed to an external
source, normally set off from the surrounding text by spacing or other typographic
distinction.
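A sketch with an invented quotation embedded in a paragraph:

```xml
<p>The editor of the first edition remarks:
  <quote>This passage is missing in the autograph.</quote>
</p>
```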
9.2.7 Poetry
The lg (line group) element is used generically to encode any section of text that is organized
as a group of lines. Following the recommendations of the Text Encoding Initiative,
it is recommended to use it, along with the following elements, for marking up poetry:
lg: May be used for any section of text that is organized as a group of lines;
however, it is most often used for a group of verse lines functioning as a formal
unit, e.g., a stanza, refrain, verse paragraph, etc.
l: Contains a single line of text within a line group.
Because lg groups verses, it can be used to encode additional stanzas not integrated into the
music notation. In addition, it is common for a poem to include a title or a header,
as is demonstrated by the following example:
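A sketch using the opening lines of Goethe's "Wandrers Nachtlied", chosen here purely for illustration:

```xml
<lg>
  <head>Wandrers Nachtlied</head>
  <l>Über allen Gipfeln</l>
  <l>Ist Ruh,</l>
  <l>In allen Wipfeln</l>
  <l>Spürest du</l>
  <l>Kaum einen Hauch;</l>
</lg>
```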
9.2.8 Names
The name element may be used to mark up portions of a text that function as a name.
The name element is intended for generic applications and may be used to identify any named
entity, such as a person, item, application, place, etc. Sometimes, however, a more
specific encoding is desired, identifying the type of entity by using dedicated elements.
MEI offers an (optional) module for this, which provides such elements for various
types of names.
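The two approaches can be sketched side by side. The type values on name are illustrative labels, and persName and geogName are assumed to be available from the optional names module.

```xml
<!-- generic encoding with name -->
<p><name type="person">Clara Schumann</name> performed in
  <name type="place">Leipzig</name>.</p>

<!-- more specific encoding (assumes the optional module is enabled) -->
<p><persName>Clara Schumann</persName> performed in
  <geogName>Leipzig</geogName>.</p>
```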
9.2.9 Dates
The date element may be used to mark up portions of a text that denote a date.
date: A string identifying a point in time or the time period between two such points.
The element date contains a date in any format, including a date range. A date range may be expressed
as textual content or, when intervening punctuation is present, as a combination of
date sub-elements and text.
To be more specific about the date, the attributes in the att.datable and att.calendared classes can be used:
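The sketch below illustrates a few of these attributes with invented dates. The attribute names isodate, notbefore, notafter, startdate, enddate and calendar are assumed to be those supplied by the two classes.

```xml
<!-- a single date with a normalized value and its calendar -->
<date isodate="1827-03-26" calendar="gregorian">26 March 1827</date>

<!-- an uncertain date bounded by two years -->
<date notbefore="1802" notafter="1804">around 1803</date>

<!-- a date range expressed as textual content -->
<date startdate="1822" enddate="1824">1822-1824</date>

<!-- a date range expressed with date sub-elements and text -->
<date>between <date isodate="1822">1822</date> and <date isodate="1824">1824</date></date>
```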
9.2.10 Numbers
The num element is useful when it is necessary to provide specific information about
numeric data, such as the unit of measurement or the kind of quantity described, or
when it should be displayed in a special manner.
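A sketch of how numeric data might be marked up, assuming the element in question is num and that it offers value and unit attributes; the unit value "page" is likewise an assumption to be checked against the schema.

```xml
<p>The set comprises <num value="12">twelve</num> variations,
  printed on <num unit="page" value="8">8</num> pages.</p>
```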
9.2.11 Addresses
Addresses may be encoded using the address element, which itself may hold an arbitrary number of addrLine elements.
It is important to note that the address element does not hold a reference to the person or organization whose address is
specified. This must be provided in a separate element, as in the following example:
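In the sketch below, an invented publisher is named in a name element that sits next to the address. The placement of both inside publisher is one possible arrangement, not the only one.

```xml
<publisher>
  <!-- the party to whom the address belongs is named separately -->
  <name>Example Music Press</name>
  <address>
    <addrLine>1 Sample Street</addrLine>
    <addrLine>12345 Sampletown</addrLine>
  </address>
</publisher>
```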
9.2.12 Bibliographic Citations and References
The following elements are used in the encoding of bibliographic citations and references:
arranger: A person or organization who transcribes a musical composition, usually for a different
medium from that of the original; in an arrangement the musical substance remains essentially unchanged.
creation: Non-bibliographic details of the creation of an intellectual entity, in narrative form,
such as the date, place, and circumstances of its composition. More detailed information may
be captured within the history element.
edition: A word or text phrase that indicates a difference in either
content or form between the item being described and a related item previously issued by the
same publisher/distributor (e.g., 2nd edition, version 2.0, etc.), or simultaneously issued by
either the same publisher/distributor or another publisher/distributor (e.g., large print
edition, British edition, etc.).
funder: Names of individuals, institutions, or organizations responsible for funding. Funders
provide financial support for a project; they are distinct from sponsors, who provide
intellectual support and authority.
physLoc: Groups information about the current physical location of a
bibliographic item, such as the repository in which it is located and its shelf mark(s), and
its previous locations.
respStmt: Transcription of text that names one or more individuals,
groups, or in rare cases, mechanical processes, responsible for creation, realization,
production, funding, or distribution of the intellectual or artistic content.
sponsor: Names of sponsoring individuals, organizations or institutions. Sponsors give their
intellectual authority to a project; they are to be distinguished from funders, who provide
the funding but do not necessarily take intellectual responsibility.
These elements fall into the following categories:
- identification of the bibliographic entity and those responsible for its intellectual content
- publication and distribution data for the bibliographic entity
- description of the physical characteristics of the item
- annotation of the bibliographic citation and additional details regarding the item’s intellectual content
The identifier for a given item may be an International Standard Book/Music Number, Library of Congress
Control Number, a publisher’s or plate number, a personal identification number, an
entry in a bibliography or catalog, etc.
To classify the title according to some convenient typology, the type attribute may be used. Sample values include: main (main title), subordinate (subtitle,
title of part), abbreviated (abbreviated form of title), alternative (alternate title
by which the work is also known), translated (translated form of title), uniform (collective
title). The type attribute is provided for convenience in analysing titles and processing them according
to their type; where such specialized processing is not necessary, there is no need
for such analysis, and the entire title, including subtitles and any parallel titles,
may be enclosed within a single title element. Title parts may be encoded in title sub-elements. The name of the list from which a controlled value is taken may be
recorded using the authority attribute.
Publication and distribution data may be captured using pubPlace, publisher, distributor, and date elements directly inside bibl when the citation is unstructured. However, these elements should be grouped within
imprint whenever practical.
The physical characteristics of the cited item may be described using the extent element.
Annotation of the bibliographic citation and the provision of other pertinent details
are addressed by several elements. Commentary on the bibliographic item or citation
is accommodated by the annot and creation elements. The annot element is provided for generic comments, while creation is intended to hold information about the context of the creation of the cited item.
Terms by which the bibliographic item can be classified may be placed in genre. For letters and other correspondence, recipient captures the name of the person or organization to whom the item was addressed. The
natural language(s) of the item may be recorded in one or more textLang elements. Finally, a holding institution may be documented using the repository element directly within bibl, but physLoc should be used whenever possible as a grouping mechanism for location and shelfmark
information. To identify sub-units of the holding institution, repository sub-elements may be used. The name of the list from which a controlled value for
the agency name is taken may be recorded using the authority attribute.
When supplied with a target attribute, bibl may function as a hypertext reference to an external electronic resource. In addition,
other related bibliographic items may be described or referenced using the relatedItem element.
Please consult 9.2.8 Names and 9.2.9 Dates for more information about recording the names and dates frequently found in bibliographic
citations.
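The elements discussed in this section can be combined as in the following sketch. The cited item, names, and shelf mark are entirely invented, and the exact order and nesting should be checked against the schema in use.

```xml
<bibl>
  <!-- identification -->
  <title type="main">Sample Songs</title>
  <title type="subordinate">for voice and piano</title>
  <editor>Jane Doe</editor>
  <!-- publication and distribution data, grouped in imprint -->
  <imprint>
    <publisher>Example Music Press</publisher>
    <pubPlace>Sampletown</pubPlace>
    <date isodate="1901">1901</date>
  </imprint>
  <!-- physical characteristics -->
  <extent>24 pages</extent>
  <!-- holding institution and shelf mark, grouped in physLoc -->
  <physLoc>
    <repository>Example City Library</repository>
    <identifier>Mus. 123</identifier>
  </physLoc>
</bibl>
```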
9.2.12.1 Related Items
In some situations it is necessary to provide references from one bibliographic item
to another. For these situations, MEI offers the relatedItem element. A relatedItem may be used inside of bibl, and may either point to a different entity using its target attribute, or may hold the related item as a child.
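A sketch with invented titles, in which an article is related to the journal that contains it; the rel value "host" is assumed to be the appropriate MODS-style relationship for a container:

```xml
<bibl>
  <title>An Invented Article on Ornamentation</title>
  <relatedItem rel="host">
    <!-- the nested bibl describes the container of the outer bibl -->
    <bibl>
      <title>An Invented Journal of Music Studies</title>
    </bibl>
  </relatedItem>
</bibl>
```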
In this example, the nested relatedItem / bibl provides information about the ‘container’ where the outer bibl may be found. The kind of relation is expressed using the rel attribute. It describes the relationship of the child bibl to the relatedItem’s parent bibl.
Describes the relationship between the entity identified by the relatedItem element and the resource described in the parent element, i.e., bibl, source or relatedItem.
In these relations, the subject is always the relatedItem, and the object is always
the parent of the relatedItem. Thus, a value of rel="preceding" indicates that the resource described within the relatedItem (or referenced
by its target attribute) precedes the bibl containing the relatedItem. Following MODS, the values preceding and succeeding both indicate a temporal order.
annot: Provides a statement explaining the text or indicating the basis for an
assertion.
The annot element may be contained by a wide range of other elements and may contain a
large number of other elements. While this offers great flexibility in addressing
the wide variety of textual features that might occur within an annotation, it may
lead to markup that cannot be effectively processed mechanistically.
In all cases, annot provides a comment upon a feature of the encoding, but never contains textual transcription.
Depending on its context, an annotation will deal with either its parent element,
or, more usually, with the element(s) specified in its plist attribute. This attribute uses URI references to link to one or more other elements
using their xml:id attribute values, as in the following example:
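A sketch with invented notes and identifiers, in which one annotation refers to two notes through plist:

```xml
<measure n="1">
  <staff n="1">
    <layer n="1">
      <note xml:id="note1" pname="c" oct="4" dur="2"/>
      <note xml:id="note2" pname="d" oct="4" dur="2"/>
    </layer>
  </staff>
  <!-- plist holds space-separated URI references to the annotated elements -->
  <annot plist="#note1 #note2">These two notes are barely legible in the source.</annot>
</measure>
```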