This Chapter of the MEI Guidelines is based on the older MEI v3 release. It may contradict the current state of the MEI specifications as documented in the Elements, Attribute Classes, Model Classes, Data Types and Macro Groups sections. The Community is currently working to update these Guidelines. Of course, help is greatly appreciated. In case you would like to contribute, please reach out to us.
This chapter describes methods for encoding textual content with MEI. Textual information on scores has several different uses, although some text is closer to music notation than other kinds. For example, tempo marks, directives and lyrics are directly related to the functionality of the notated music and are, therefore, described in other chapters (see for example Vocal Text and Text Directives). This chapter, on the other hand, focuses on the text that accompanies the score, i.e., paratext (prefatory material, title pages, back matter, appendices, etc.), titles, prose, poetry, etc.
Most of the elements described here take inspiration from encoding formats that deal primarily with text, such as HTML and the Text Encoding Initiative (TEI). These elements are provided to encode relatively basic textual information. For deeper encoding of text, these Guidelines recommend consideration of other text-specific encoding formats with embedded MEI markup.
Text can be organized in different parts, for example in chapters or sections. The div element is used to encode such structural divisions.
For example, printed scores, before the actual notation, can have text that can be organized in multiple sections (e.g. a preface, a critical report, performance instructions, etc. for which see the following sections); each of these sections should be identified by a different div element. Text might also occur in between music sections (see Content of Musical Divisions), for example in a collection of romantic piano works, a few pieces might be preceded or followed by poetry. Such text should be encoded with the div element, as demonstrated in the following example:
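A minimal sketch of such a division, assuming the schema in use allows div directly inside a section ahead of the notation. The @type value and the poem lines are placeholders, not taken from any real source:

```xml
<section>
  <!-- Hypothetical: a poem printed before the piece it accompanies -->
  <div type="poem">
    <lg>
      <l>First line of the poem (placeholder text)</l>
      <l>Second line of the poem (placeholder text)</l>
    </lg>
  </div>
  <measure n="1">
    <!-- music notation of the piece follows here -->
  </measure>
</section>
```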
Textual divisions may have titles or other forms of introductory material, which are encoded with the head element.
The following example shows the encoding of a preface translated into three different languages, each with a different heading:
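One way this might look is sketched below. The use of xml:lang to mark the language and the @type value "preface" are illustrative choices, and the paragraph text is a placeholder:

```xml
<front>
  <div type="preface" xml:lang="en">
    <head>Preface</head>
    <p>English text of the preface (placeholder).</p>
  </div>
  <div type="preface" xml:lang="de">
    <head>Vorwort</head>
    <p>German text of the preface (placeholder).</p>
  </div>
  <div type="preface" xml:lang="it">
    <head>Prefazione</head>
    <p>Italian text of the preface (placeholder).</p>
  </div>
</front>
```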
Although div can identify any structural division of text, it is often helpful to distinguish the type of division. The attributes @type and @subtype can be used for this purpose. @type must be present whenever @subtype is used, though the values of both can be freely set by the encoder.
The following example shows the use of @type to indicate that three prefaces in English, German and Italian are set as columns on the same page.
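A possible encoding is sketched here. The value "column" is an encoder-defined assumption (the values of @type are free), and the wrapping div is only one way to group the three columns:

```xml
<div type="preface">
  <!-- Each inner div is one column on the same page; "column" is a hypothetical value -->
  <div type="column" xml:lang="en">
    <p>English preface (placeholder text).</p>
  </div>
  <div type="column" xml:lang="de">
    <p>German preface (placeholder text).</p>
  </div>
  <div type="column" xml:lang="it">
    <p>Italian preface (placeholder text).</p>
  </div>
</div>
```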
Paragraphs are fundamental to prose text and typically group one or more sentences that form a logical passage. A paragraph is usually typographically distinct: The text begins on a new line and the first letter of the content is often indented, enlarged, or both.
A paragraph is encoded with the p element:
Alternatively, paragraphs may be part of the document contents (and therefore encoded within music), either as Paratext or within the music notation. In these cases, a paragraph will likely be contained by a div or other elements containing prose (e.g. annot, figDesc, etc.).
The following example shows a paragraph in a preface section:
When a text contains lists, they can be encoded with the following elements:
The list element can identify any kind of list; the @form attribute can be used to specify whether the list is ordered, unordered, etc. Each item in the list is encoded with the li element. The @n attribute can be used to record a label for a list item, as in the following example:
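A short sketch with labelled items follows. The item texts are invented for illustration, and @form is left out because its permitted values differ between MEI versions:

```xml
<list>
  <!-- @n records the label printed with each item -->
  <li n="1">Tune the instrument before playing.</li>
  <li n="2">Observe the repeat signs in the first section.</li>
  <li n="3">Play the final chord softly.</li>
</list>
```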
Occasionally, lists have headers or titles, which can be encoded with head:
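For instance, a list with a heading might be sketched as follows (heading and items are placeholders):

```xml
<list>
  <head>Instruments</head>
  <li>Violin</li>
  <li>Viola</li>
  <li>Violoncello</li>
</list>
```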
It is common, in many types of texts, to find quotations. A quotation is typically attributed to a text other than the one being encoded. Often, the quoted material is typographically distinct from the surrounding text, i.e., surrounded by so-called ‘quote marks’ or rendered as a separate block of text. The quote element is used to mark this function:
The following examples show the use of quote.
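The two sketches below illustrate an inline quotation inside a paragraph and a quotation set as a separate block within a division. All wording is placeholder text rather than a real quotation:

```xml
<div>
  <!-- Inline quotation within running prose -->
  <p>The editor describes the source as <quote>a fair copy in the composer's hand</quote>.</p>
  <!-- Block quotation standing between paragraphs -->
  <quote>A longer quoted passage, rendered as a separate block of text (placeholder).</quote>
</div>
```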
The lg (line group) element is used generically to encode any section of text that is organized as a group of lines. Following the recommendations of the Text Encoding Initiative, these Guidelines recommend using it, along with the following elements, for marking up poetry:
Because lg groups verses, it can be used to encode additional stanzas not integrated into the music notation. In addition, it is common for a poem to include a title or a header, as is demonstrated by the following example:
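A minimal sketch of a titled poem with two additional stanzas, assuming nested lg elements are used for the stanzas. Title and lines are placeholders:

```xml
<lg>
  <head>Title of the poem (placeholder)</head>
  <lg n="2">
    <l>First line of the second stanza</l>
    <l>Second line of the second stanza</l>
  </lg>
  <lg n="3">
    <l>First line of the third stanza</l>
    <l>Second line of the third stanza</l>
  </lg>
</lg>
```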
This section introduces paratextual material, such as title pages, prefaces, indexes and other text that precedes or follows the actual score.
By ‘front matter’ these Guidelines mean distinct sections of a text (usually, but not necessarily, a printed one), prefixed to it by way of introduction or identification as a part of its production. Features such as title pages or prefaces are clear examples; a less definite case might be the prologue attached to a dramatic work. The front matter of an encoded text should not be confused with the MEI header described in chapter The MEI Header, which provides metadata for the entire file.
An encoder may choose simply to ignore the front matter in a text, if the original presentation of the work is of no interest. No specific tags are provided for the various kinds of subdivision which may appear within front matter: instead, generic div (“division”) elements may be used, which should not be confused with mdiv (“musical division”) elements. The following suggested values for the @type attribute may be used to distinguish various kinds of division characteristic of front matter:
‘preface’: A foreword or preface addressed to the reader in which the author or publisher explains the content, purpose, or origin of the text.
‘ack’: A formal declaration of acknowledgement by the author in which persons and institutions are thanked for their part in the creation of a text.
‘dedication’: A formal offering or dedication of a text to one or more persons or institutions by the author.
‘abstract’: A summary of the content of a text as continuous prose.
‘contents’: A table of contents, specifying the structure of a work and listing its constituents. The list element should be used to mark its structure.
‘frontispiece’: A pictorial frontispiece, possibly including some text.
The following extended example demonstrates how various parts of the front matter of a text may be encoded. The front part begins with a title page, which is presented in section Title Pages, below. This is followed by a dedication and a preface, each of which is encoded as a distinct div:
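The overall shape of such front matter might be sketched like this. The text content is placeholder wording, and the title page is reduced to a single paragraph for brevity:

```xml
<music>
  <front>
    <titlePage>
      <p>Title page text (placeholder)</p>
    </titlePage>
    <div type="dedication">
      <p>Dedicated to the members of the choir (placeholder).</p>
    </div>
    <div type="preface">
      <head>Preface</head>
      <p>Text of the preface (placeholder).</p>
    </div>
  </front>
  <body>
    <!-- mdiv elements containing the music follow here -->
  </body>
</music>
```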
The front matter concludes with another div element, shown in the next example, this time containing a table of contents, which contains a list element (as described in chapter Lists). Note the use of the ptr element to provide page-references: the implication here is that the target identifiers (song1, song2, etc.) will correspond with identifiers used for the mdiv elements containing the individual songs. (For a description of the ptr element, see chapter Pointers and References.)
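A sketch of such a table of contents, using the target identifiers song1 and song2 mentioned above. The song titles are placeholders, and the identifiers are assumed to exist on the corresponding mdiv elements:

```xml
<div type="contents">
  <head>Contents</head>
  <list>
    <!-- each ptr points at an mdiv such as <mdiv xml:id="song1"> in the body -->
    <li>First song (placeholder title) <ptr target="#song1"/></li>
    <li>Second song (placeholder title) <ptr target="#song2"/></li>
  </list>
</div>
```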
Alternatively, the pointers in the table of contents might link to the page breaks at which a song begins, assuming that these have been included in the markup:
Detailed analysis of the title page and other preliminaries of older printed books and manuscripts is of major importance in descriptive bibliography and the cataloging of printed books; such analysis, however, requires a more detailed approach than the general one described here. The following elements are suggested as a means of encoding the major features of most title pages for faithful rendition:
The following example shows the encoding of the title page of Vaughan Williams’ On Wenlock Edge. Note the use of the lb element to mark the line breaks present in the original.
The physical rendition of title page information is often of considerable importance. One approach to this requirement would be to use the rend element, described in chapter Text Rendition, to specify the rendition of each of the components of the title page. Another would be to employ a CSS stylesheet. Finally, a module customized for the description of typographic entities such as pages, lines, rules, etc., bearing special-purpose attributes to describe line-height, leading, degree of kerning, font, etc. could be employed.
Conventions vary as to which elements are grouped as back matter and which as front. For example, some books place the table of contents at the front, and others at the back. For this reason, the content models of the front and back elements are identical.
The following suggested values may be used for the @type attribute on all division elements, in order to distinguish various kinds of divisions characteristic of back matter:
‘appendix’: An ancillary self-contained section of a work, often providing additional but in some sense extra-canonical text.
‘glossary’: A list of terms associated with definition texts (‘glosses’).
‘notes’: A section in which textual notes are gathered together.
‘bibliography’: A list of bibliographic citations.
‘index’: Any form of index to the work.
‘colophon’: A statement appearing at the end of a book describing the conditions of its physical production.
No additional elements are proposed for the encoding of back matter at present. Some characteristic examples follow; first, an index (for the case in which a printed index is of sufficient interest to merit transcription):
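A printed index might be transcribed along the following lines. The entries and page numbers are invented for illustration:

```xml
<back>
  <div type="index">
    <head>Index</head>
    <list>
      <li>Appoggiatura, 12, 40</li>
      <li>Basso continuo, 77</li>
    </list>
  </div>
</back>
```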
Note that if the page breaks in the original source have also been explicitly encoded, and given identifiers, the references to them in the above index can more usefully be recorded as links. For example, assuming that the encoding of page 77 of the original source starts like this:
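A page break carrying an identifier could be sketched as follows; the identifier P77 is a hypothetical name chosen for this illustration:

```xml
<!-- start of page 77 in the original source; xml:id value is an assumption -->
<pb xml:id="P77" n="77"/>
```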
then the last item above might be encoded more usefully in the following form:
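A linked index entry might then look like the sketch below. The entry text is a placeholder, and the target assumes a page break with the hypothetical identifier P77:

```xml
<!-- the ptr replaces the plain page number and resolves to <pb xml:id="P77" n="77"/> -->
<li>Basso continuo, <ptr target="#P77"/></li>
```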
Apart from music and text, musical documents, both historical and contemporary, may also contain material in graphical or tabular format. In such materials, details of layout and presentation may also be of comparatively greater significance or complexity than they are for running text. Although some types of graphical material can be represented directly with markup, it is more common practice to include such information by using a reference to an external entity (typically a URL) encoded in a suitable graphical format.
The module described in this chapter defines special-purpose ‘container’ elements that can be used to encapsulate occurrences of such data within an MEI-conformant document in a portable way. Specific recommendations for the encoding of figures, figure descriptions and graphics as well as tables with their sub-elements tr, td and th are provided at the beginning of this chapter. As there exists a wide variety of different graphic formats, a short list of formats that are widely used at the present time is given in section Images. Each one includes a very brief description. The chapter closes with attribute and model classes which are defined by the module.
The module described in this chapter makes available the following components:
The fig element groups elements representing or containing graphic information such as an illustration or figure. This element is modelled on the figure element in the Text Encoding Initiative (TEI). The fig element is used to contain images, captions, and textual descriptions of the pictures. The images themselves are specified using the graphic element, whose @target attribute provides the location of an image. For example:
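In its simplest form this might be sketched as follows; the file path is a hypothetical location:

```xml
<fig>
  <!-- @target gives the location of the image; the path is a placeholder -->
  <graphic target="images/frontispiece.png"/>
</fig>
```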
The graphic element may occur multiple times within the markup of the figure in order to indicate the availability of different image formats or resolutions:
The element caption may be used to transcribe (or supply) a title or descriptive heading for the graphic itself, as in the following example:
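A captioned figure could be sketched like this; caption wording and file path are placeholders:

```xml
<fig>
  <caption>Fig. 1: Opening of the autograph score (placeholder caption)</caption>
  <graphic target="images/autograph-p1.png"/>
</fig>
```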
The figure description (figDesc) element usually contains a brief prose description of the appearance or content of a graphic figure, for use when documenting an image, perhaps without displaying it. This element is intended for use as an alternative to the content of its parent fig element; for example, for display when the equipment in use cannot display graphic images. It may also be used for indexing or documentary purposes, in which case best practice suggests the use of controlled vocabulary terms.
Occasionally, a figure description may have a complex structure. In this case, one or more textual component elements (p [paragraph], table, list, quote, or lg [linegroup]) may be used to model the internal structure of the description:
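A structured description might be sketched as below, combining a paragraph and a list. The described image, its path, and all wording are invented for illustration:

```xml
<fig>
  <graphic target="images/titlepage-engraving.png"/>
  <figDesc>
    <p>Engraved vignette on the title page (placeholder description).</p>
    <list>
      <li>A lyre surrounded by a laurel wreath</li>
      <li>A banner carrying the publisher's name</li>
    </list>
  </figDesc>
</fig>
```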
The graphic element indicates the location of an inline graphic, illustration, or figure. As noted above, there exists a wide variety of different graphics formats, and the following list is in no way exhaustive. Moreover, inclusion of any format in this list should not be taken as indicating endorsement by the MEI of this format or any products associated with it. Some of the formats listed here are proprietary to a greater or lesser extent and cannot therefore be regarded as standards in any meaningful sense. They are, however, widely used by many different vendors. The following formats are widely used at the present time, and are likely to remain supported by more than one vendor’s software:
Brief descriptions of all the above are given below. Where possible, current addresses or other contact information are shown for the originator of each format. Many formal standards, especially those promulgated by the ISO and many related national organizations (ANSI, DIN, BSI, and many more), are available from those national organizations. Addresses may be found in any standard organizational directory for the country in question.
CGM: Computer Graphics Metafile: This vector graphics format is specified by an ISO standard, ISO 8632:1987, amended in 1990. It defines binary, character, and plain-text encodings; the non-binary forms are safer for blind interchange, especially over networks. Documentation is available from ISO and from its member national bodies, such as AFNOR, ANSI, BSI, DIN, JIS, etc.
SVG: Scalable Vector Graphics format: SVG is a language for describing two-dimensional vector and mixed vector or raster graphics in XML. It is defined by the Scalable Vector Graphics (SVG) 1.0 Specification, W3C Recommendation, 04 September 2001, available at http://www.w3.org/TR/2001/REC-SVG-20010904/.
PICT: Macintosh drawing format: This format is universally supported on Macintosh (tm) systems, and readable by a limited range of software for other systems. Documentation is available from Apple Computer, Cupertino, California USA.
PNG: Portable Network Graphics format: PNG is a non-proprietary raster format currently widely available. It provides an extensible file format for the losslessly compressed storage of raster images. Indexed-color, grayscale, and true-color images are supported, plus an optional alpha channel. Sample depths range from 1 to 16 bits. It is defined by IETF RFC 2083, March 1997.
TIFF: Tagged Image File Format: Currently the most widely supported raster image format, especially for black and white images, TIFF is also one of the few formats commonly supported on more than one operating system. The drawback to TIFF is that it actually is a wrapper for several formats, and some TIFF-supporting software does not support all variants. TIFF files may use LZW, CCITT Group 4, or PackBits compression methods, or may use no compression at all. Also, TIFF files may be monochrome, greyscale, or polychromatic. All such options should be specified in prose at the end of the encodingDesc section of the MEI header for any document including TIFF images. TIFF is owned by Aldus Corporation. Documentation on TIFF is available from the owner at Craigcook Castle, Craigcook Road, Edinburgh EH4 3UH, Scotland, or 411 First Avenue South, Seattle, Washington 98104 USA.
GIF: Graphics Interchange Format: Raster images are widely available in this form, which was created by CompuServe Information Services, but has by now been implemented for many other systems as well. Documentation is copyright by, and is available from, CompuServe Incorporated, Graphics Technology Department, 5000 Arlington Center Boulevard, Columbus, Ohio 43220 USA.
PBM: Portable Bit Map: PBM files are easy to process, eschewing all compression in favor of transparency of file format. PBM files can, of course, be compressed by generic file-compression tools for storage and transfer. Public domain software exists which will convert many other formats to and from PBM. Documentation of PBM is copyright by Jeff Poskanzer, and is available widely on the Internet.
PCX: IBM PC raster format: This format is used by most IBM PC paint programs, and supports both monochrome and polychromatic images. Documentation is available from ZSoft Corporation, Technical Support Department, ATTN: Technical Reference Manual, 450 Franklin Rd. Suite 100, Marietta, GA 30067 USA.
BMP: Microsoft bitmap format: This format is the standard raster format for computers using Microsoft Windows (tm) or Presentation Manager (tm). Documentation is available from Microsoft Corporation.
JPEG: Joint Photographic Experts Group: This format is sponsored by CCITT and by ISO. It is ISO/IEC Draft International Standard 10918-1, and CCITT T.81. It handles monochrome and polychromatic images with a variety of compression techniques. JPEG per se, like CCITT Group IV, must be encapsulated before transmission; this can be done via TIFF, or via the JPEG File Interchange Format (JFIF), as commonly done for Internet delivery.
Photo-CD: Kodak Photo Compact Disk format: This format was introduced by Kodak for rasterizing photographs and storing them on CD-ROMs (about one hundred 35mm film images fit on one disk), for display on televisions or CD-I systems. Information on Photo-CD is available from Kodak Limited, Research and Development, Headstone Drive, Harrow, Middlesex HA1 4TY, UK.
The element table contains text displayed in tabular form, i.e., in rows and columns. A table is the least ‘graphic’ of the elements discussed in this chapter. Almost any text structure can be presented as a series of rows and columns: one might, for example, choose to show a glossary or other form of list in tabular form, without necessarily regarding it as a table. When tabular presentation is regarded as of less intrinsic importance, it is correspondingly simpler to encode descriptive or functional information about the contents of the table, for example to identify one cell as containing a name and another as containing a date, though the two methods may be combined.
The table element may appear either within other components (such as paragraphs) or between them, provided that the module defined in this chapter has been enabled. It is to a large extent arbitrary whether a table should be regarded as a series of rows or as a series of columns. For compatibility with currently available systems, however, these Guidelines require a row-by-row description of a table.
While rows and columns are always encoded in top-to-bottom, left-to-right order, formatting properties such as those provided by CSS may be used to specify that they should be displayed differently.
The tr (table row) element is a formatting element that contains one or more td or th elements (cells) in a table. A cell is the intersection of a row and a column. The precise rendition of the table and its cells should be specified in a style sheet.
The td (table data) element designates a table cell that contains data as opposed to a cell that contains column or row heading information. The @colspan and @rowspan attributes provide tabular rendering information. They indicate that a particular cell spans more than one column or row.
The th (table header) element designates a table cell containing column or row heading information as opposed to one containing data. The @colspan and @rowspan attributes provide tabular rendering information. They indicate that a particular cell spans more than one column or row.
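The elements described above can be combined as in the following sketch of a small table encoded row by row. All cell contents are placeholders, and the last row uses @colspan to let one cell cover two columns:

```xml
<table>
  <caption>Sources consulted (placeholder caption)</caption>
  <tr>
    <th>Siglum</th>
    <th>Library</th>
    <th>Date</th>
  </tr>
  <tr>
    <td>A</td>
    <td>Placeholder library</td>
    <td>1785</td>
  </tr>
  <tr>
    <td>B</td>
    <!-- this cell spans the "Library" and "Date" columns -->
    <td colspan="2">Lost; known only from a catalog entry</td>
  </tr>
</table>
```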
This chapter describes the MEI module used for the encoding of names (names of persons or corporations/organizations) or descriptive phrases for styles, periods or geographical indications. In section Names, Dates, Numbers, Abbreviations, and Addresses it was noted that the elements provided in the core module allow an encoder to specify that a given text segment is a proper noun. The elements provided by the present module allow the encoder to supply a detailed sub-structure for such proper nouns, and to distinguish explicitly between persons and organizations, and between stylistic, periodical or geographical indications.
The chapter begins by discussing the elements provided for the encoding of names (name) and dates (date) in general and finishes by addressing more specific elements for corporate names (corpName), geographic names (geogName), period names (periodName), personal names (persName) and style names (styleName). In general it is recommended to use standardized forms of proper nouns and to record the names and web-accessible locations of the controlled vocabularies used. There are several commonly-referenced authority files, especially for geographical, organizational and personal names, such as the Gemeinsame Normdatei (GND), the Library of Congress Authorities, the Getty Thesaurus of Geographic Names (TGN), and the MARC code list for relators. Recommendations on which standards could be used can be found in the descriptions of the individual elements.
The basic elements for capturing names and dates are defined in the shared module:
The name element contains the name of an entity that is difficult to tag more specifically as a corpName, geogName, persName, or title. In section Names, Dates, Numbers, Abbreviations, and Addresses it was noted that the name element may be used in place of the more specific elements when it is not known what kind of name is being described or when a high degree of precision is not necessary. For example, the name element might be used when it is not clear whether the name “Bach” refers to a person or a geographic feature. When name parts are needed, use name sub-elements. The recommended values for the @type attribute are:
‘pers’: a personal name
‘corp’: the name of a corporate entity
‘place’: a geographic name
‘process’: the name of a process or mechanical agent
The date sub-element is available within name in order to record any dates associated with the name, for example, creation and dissolution in the case of a corporate entity or place or birth and death dates in the case of an individual. The name of the list from which a controlled value is taken, such as the Library of Congress Name Authority File (LCNAF), may be recorded using the authority attribute.
Examples of the use of the name element:
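The following sketch shows an untyped name, a typed personal name with associated dates, and a typed place name. The @type values are those recommended above; the choice of names is illustrative:

```xml
<!-- unclear whether a person or a geographic feature is meant -->
<name>Bach</name>

<!-- personal name with birth and death years recorded in a date sub-element -->
<name type="pers">Johann Sebastian Bach <date>1685-1750</date></name>

<!-- geographic name -->
<name type="place">Leipzig</name>
```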
The element date contains a date in any format, including a date range. A date range may be expressed as textual content or, when intervening punctuation is present, as a combination of date sub-elements and text.
To be more specific about the date, the attributes in the att.datable class can be used:
startdate: contains the starting point of a date range in standard ISO form
enddate: contains the end point of a date range in standard ISO form
notbefore: contains a lower boundary for an uncertain date
notafter: contains an upper boundary for an uncertain date
isodate: gives the value of a textual date in standard ISO form
calendar: indicates the system or calendar to which a date belongs, for example, Gregorian, Julian, Roman, Mosaic, Revolutionary, Islamic, etc.
cert: signifies the degree of certainty or precision associated with a feature (high, medium, low, unknown)
In the following example, the ambiguous date text “5/3/05” is resolved using the @isodate attribute:
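A sketch of this disambiguation is given below. It assumes a day/month/year reading in the year 2005, which is only one of several possible interpretations of the text:

```xml
<!-- assumption: "5/3/05" is read as 5 March 2005 -->
<date isodate="2005-03-05">5/3/05</date>
```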
Corporate names are non-personal names which refer to structured bodies of one or more persons that act as a single entity. Typical examples include associations, businesses, projects or institutions (e.g., ‘the Royal College of Music’ or ‘the BBC’), but also racial or ethnic groupings or political factions where these are regarded as forming a single agency. Organization names typically include some type of indicator or pattern of words that helps identify them as non-personal names.
It may also be used wherever it is necessary to mark a corporate name, for example when a corporation is responsible for a certain event in the history of a musical work:
When it is necessary to provide structure for a name, the separate parts of the name may be encoded in corpName sub-elements, for example:
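A structured corporate name might be sketched as follows. The institution is fictitious and the @type values are encoder-defined assumptions:

```xml
<corpName>
  <!-- hypothetical parent body and subordinate unit -->
  <corpName type="parent">Example University</corpName>,
  <corpName type="subordinate">Department of Music</corpName>
</corpName>
```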
Standard designations for corporate bodies can be taken from a controlled vocabulary, such as the Gemeinsame Normdatei (GND). If a controlled value is used, the list from which it is taken should be recorded. In this case, the following attributes are particularly relevant:
authority: to record the list from which a controlled value is taken,
authURI (authority URI): to record the web-accessible location of the controlled vocabulary from which the value is taken,
codedval (coded value): to record a value which serves as a primary key in an external database.
Geographic names are proper noun designations for places (e.g., Baltimore, Maryland), natural features (e.g., Black Forest) or political jurisdictions (e.g., Quartier Latin, Paris).
The element can be used, e.g., to label geographical names in titles:
Geographic name sub-parts may be encoded in geogName sub-elements. For example:
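Using the place mentioned earlier, nested geogName elements might be sketched like this; the @type values are illustrative assumptions:

```xml
<geogName>
  <geogName type="city">Baltimore</geogName>,
  <geogName type="state">Maryland</geogName>
</geogName>
```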
Alternatively, geographic name sub-parts may be encoded using the following more specific elements:
In contrast to the way addrLine is used to mark the physical arrangement of the parts of an address, these elements can be used to mark the semantic components of an address. For example:
They may also be used to identify place name components within textual content:
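In running text this could look like the sketch below, which assumes that settlement and region are among the more specific elements available. The sentence itself is invented:

```xml
<p>The collection was assembled in <settlement>Baltimore</settlement>,
  <region>Maryland</region> (placeholder sentence).</p>
```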
To enable localization of an organization, or to distinguish between places with identical names, the use of a controlled vocabulary is recommended for names of administrative divisions, such as cities, states, and countries. In this case, the following attributes are particularly relevant:
authority: records the list from which a controlled value is taken, e.g., the Thesaurus of Geographic Names (TGN),
authURI (authority URI): records the web-accessible location of the controlled vocabulary from which the value is taken,
codedval (coded value): records a value which serves as a primary key in an external database.
The encoder may use these attributes in combination. In case of the German city of Frankfurt, for example, a clarification whether Frankfurt am Main or Frankfurt an der Oder is meant can be achieved by referring to the ID of the TGN entry:
The names of places given within addresses can be marked with geogName elements, for example:
The periodName element is for names which describe a particular period of time, for example, those which characterize obvious similarities in style, such as ‘Baroque’ or ‘3rd Style Period’:
The date sub-element is available within periodName in order to record any dates associated with the name that should be captured in the text, for example, start and end dates of the named period:
Recording start and end points of a certain period using the @startdate and @enddate attributes may prove to be better for machine processing:
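A sketch of the attribute-based form follows. The years are conventional approximations for the Baroque period, supplied here as an illustration rather than as authoritative boundaries:

```xml
<!-- approximate, conventional period boundaries given in machine-readable form -->
<periodName startdate="1600" enddate="1750">Baroque</periodName>
```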
If a controlled value is used, the list from which it is taken should be recorded. In this case the following attributes are relevant:
authority: to record the list from which a controlled value is taken,
authURI (authority URI): to record the web-accessible location of the controlled vocabulary from which the value is taken,
codedval (coded value): to record a value which serves as a primary key in an external database.
Personal names within an MEI document may simply be marked with the persName element containing a proper noun or proper noun phrase referring to an individual. For example:
Apart from the composer or originator of a musical work, there could be many other persons involved in the genesis of a musical work, such as librettists, lyricists, arrangers, editors, transcribers, printers, publishers, etc. In addition, sometimes a single individual may have multiple functions with regard to a musical work, e.g. composer and librettist. The @role attribute on persName may be used to capture a person’s responsibility. For example:
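For an opera such as Le nozze di Figaro, the responsibilities might be sketched as below. Wrapping the names in respStmt and spelling out the role terms in full are illustrative choices:

```xml
<respStmt>
  <persName role="composer">Wolfgang Amadeus Mozart</persName>
  <persName role="librettist">Lorenzo Da Ponte</persName>
</respStmt>
```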
The MARC code list for relators offers a variety of controlled terms that may serve as values for this use of @role.
Personal names often consist of several components, such as given names, surnames, inherited or life-time titles of nobility, honorific or academic prefixes, military ranks, and other traditional descriptive phrases. These components can be marked using persName sub-elements, the function of which may be indicated using the @type attribute with the following values:
‘forename’: contains a forename, given or baptismal name.
‘surname’: a family (inherited) name, as opposed to a given, baptismal, or nick name.
‘rolename’: contains a name component which indicates that the referent has a particular role or position in society, such as an official title or rank.
‘addname’ (additional name): contains an additional name component, such as a nickname, epithet, or alias, or any other descriptive phrase used within a personal name.
‘namelink’ (name link): contains a connecting phrase or link used within a name but not regarded as part of it, such as van der or of.
‘genname’ (generational name): contains a name component used to distinguish otherwise similar names on the basis of the relative ages or generations of the persons named.
However, the recommended practice is to employ the following sub-elements provided by the namesDates module:
In the case of individuals with more than one forename, it is often sufficient to place all given names within a single foreName element:
However, the advantage of marking names with specific name part elements instead of nested persName elements, is that it becomes possible to use the @type attribute to distinguish between multiple instances of the same generic name component. The following example indicates the function of each of the given names of Wolfgang Mozart:
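One possible encoding is sketched here. The @type values "first" and "middle" are assumptions made for this illustration and are not prescribed values:

```xml
<persName>
  <!-- hypothetical @type values distinguishing the given names -->
  <foreName type="first">Wolfgang</foreName>
  <foreName type="middle">Amadeus</foreName>
  <famName>Mozart</famName>
</persName>
```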
The use of a controlled list, such as the Gemeinsame Normdatei (GND) or the Library of Congress Name Authorities, is recommended for names, especially those occurring within the metadata header. When a controlled value is used, information about the value should be recorded. The following attributes are provided for this purpose:
For maximal machine-processability, these three attributes may be used in combination. For example:
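A sketch of the combined use follows, with the attribute names as given in this chapter (authority, authURI, codedval). The identifier is deliberately a placeholder and must be replaced with the real record number from the authority file:

```xml
<!-- codedval below is a placeholder, not a real GND identifier -->
<persName authority="GND"
          authURI="https://d-nb.info/gnd/"
          codedval="PLACEHOLDER-ID">Johann Sebastian Bach</persName>
```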
Music can be divided into different styles, genres, and forms. The term style denotes a mode of expression, or more particularly, the manner in which a work of art is executed:
“In the discussion of music, which is orientated towards relationships rather than meanings, the term raises special difficulties; it may be used to denote music characteristic of an individual composer, of a period, of a geographical area or center, or of a society or social function. For the aesthetician style concerns surface or appearance, though in music appearance and essence are ultimately inseparable. For the historian a style is a distinguishing and ordering concept, both consistent of and denoting generalities; he or she groups examples of music according to similarities between them.” (Source: “Style”, Grove Music Online, accessed: April 27, 2012)
The name of a musical style can be marked by the styleName element, for example:
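A minimal sketch within a sentence of prose; the sentence is invented for illustration:

```xml
<p>The second movement is written in the <styleName>galant</styleName> style
  (placeholder sentence).</p>
```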
It may be used, e.g., for recording a style name within a title:
or to record a style of a certain epoch by using the styleName sub-element: