12 Facsimiles and Recordings

MEI can be used to connect an encoding – either a transcription of existing material, or the specification of some intended output – with existing sources. This existing material may come in different formats: music notation in any combination of print and manuscript, or audio or video footage. The concepts for establishing such connections between encoded music and source material are described in the following chapters.

12.1 Facsimiles

Most often, MEI is used for the preparation of a digital musical text based on an existing music document, or with the intention of rendering the encoded notation into a document or audio rendition. MEI can, however, be used to provide a different kind of digital reproduction of a source document, which relies on the description and provision of digital imagery. Both approaches may be combined, so that the encoding of the musical content and digital facsimiles may add different facets to the same MEI document.

12.1.1 Elements of the Facsimile Module

This module makes available the following elements for encoding facsimiles:

facsimile – Contains a representation of a written source in the form of a set of images rather than as transcribed or encoded text.
surface – Defines a writing surface in terms of a rectangular coordinate space, optionally grouping one or more graphic representations of that space, and rectangular zones of interest within it.
zone – Defines an area of interest within a surface or graphic file.

These elements are used to add a separate subtree to MEI, starting with the facsimile element inside music, as seen in the following example:

<mei>
<meiHead>
<!-- metadata header -->
</meiHead>
<music>
<facsimile>
<!-- The facsimile subtree starts here. -->
</facsimile>
<body>
<!-- The encoding of the musical content goes here. -->
</body>
</music>
</mei>
Listing 396.

It is possible to have more than one facsimile element in this location. This is especially useful when multiple sources are encoded in the same file using the mechanisms described in chapter 11.2 Editorial Markup of these Guidelines. In this case, the decls (declarations) attribute of facsimile may be used to refer to a source defined in the document’s header, as seen in the following example:

<mei>
<meiHead>
<fileDesc>
<sourceDesc>
<source xml:id="facsimile.source1">
<!-- description of source -->
</source>
</sourceDesc>
</fileDesc>
</meiHead>
<music>
<facsimile decls="#facsimile.source1">
<!-- facsimile content -->
</facsimile>
</music>
</mei>
Listing 397.

When using the FRBR model (see 3.5 Functional Requirements for Bibliographic Records (FRBR)), it is equally possible to reference a manifestation element instead of source.
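
A minimal sketch of such a reference follows; it assumes that the FRBR elements are available in the header, and the xml:id value is purely illustrative:

<mei>
<meiHead>
<fileDesc>
<!-- file description -->
</fileDesc>
<manifestationList>
<!-- illustrative ID; the description of the physical source goes here -->
<manifestation xml:id="facsimile.manifestation1"/>
</manifestationList>
</meiHead>
<music>
<facsimile decls="#facsimile.manifestation1">
<!-- facsimile content -->
</facsimile>
</music>
</mei>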

Within a facsimile element, each page of the source is represented by a surface element. Each surface may be assigned an identifying string utilizing the label attribute. In addition, it may encapsulate more detailed metadata about itself in a figDesc element. The coordinate space of the surface may be recorded in abstract terms in the ulx, uly, lrx, and lry attributes. For navigation purposes, surface has a startid attribute that accommodates pointing to the first object appearing on this particular writing surface.

<facsimile>
<surface label="page 1" lrx="2000" lry="3000" startid="#measure1" ulx="0" uly="0"/>
</facsimile>
Listing 398.

Within surface elements, one may nest one or more graphic elements, each providing a reference to an image file that represents the writing surface. Multiple graphic elements are permitted in order to accommodate alternative versions (different resolutions or formats, for instance) of the surface image. Despite such differences in resolution or format, all images must contain the same content, i.e., the entire writing surface. A graphic may also refer to a single page within a multi-page document; at least for Adobe PDF documents, such a page can be addressed with a #page=X suffix on the target attribute.

<facsimile>
<surface>
<graphic height="2000px" target="image1.jpg" width="3000px"/>
<graphic height="1000px" target="image1smaller.jpg" width="1500px"/>
<graphic height="200px" target="image1smallest.png" width="300px"/>
</surface>
</facsimile>
<facsimile>
<surface>
<graphic height="297mm" target="source1.pdf#page=1" width="210mm"/>
</surface>
<surface>
<graphic height="297mm" target="source1.pdf#page=2" width="210mm"/>
</surface>
</facsimile>
Listing 399.

The preceding markup will provide the basis for most page-turning applications. Often, however, it is desirable to focus attention on particular areas of the graphical representation of the surface. The zone element fulfills this purpose:

<surface lrx="3000" lry="2000" ulx="0" uly="0">
<graphic height="2000px" target="image1.jpg" width="3000px"/>
<zone lrx="370" lry="410" ulx="300" uly="200"/>
<zone lrx="439" lry="410" ulx="367" uly="200"/>
<zone lrx="512" lry="410" ulx="436" uly="200"/>
</surface>
Listing 400.

The coordinates of each zone define a space relative to the coordinate space of its parent surface. Note that this is not necessarily the same coordinate space defined by the width and height attributes of the graphic that represents the surface. The zone coordinates in the preceding example do not represent regions within the graphic, but rather regions of the writing surface.
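
For instance, a surface may be defined in abstract units while its image is measured in pixels. In the following minimal sketch (the file name and all numbers are illustrative), the zone covers roughly the upper left quarter of the page, regardless of the resolution of the image:

<surface lrx="210" lry="297" ulx="0" uly="0">
<graphic height="3508px" target="page1.png" width="2480px"/>
<!-- coordinates in the surface's own units, not in image pixels -->
<zone lrx="105" lry="148" ulx="0" uly="0"/>
</surface>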

Because the coordinate space of a zone is defined relative to that of a surface, it is possible to provide multiple graphic elements and multiple zone elements within a single surface. In the following example, two different images representing the entire surface are provided alongside specification of two zones of interest within the surface:

<surface lrx="3000" lry="2000" ulx="0" uly="0">
<graphic height="2000px" target="image1.jpg" width="3000px"/>
<graphic height="1995px" target="image1cropped.jpg" width="2995px"/>
<zone lrx="370" lry="410" ulx="300" uly="200"/>
<zone lrx="30" lry="30" ulx="0" uly="0"/>
</surface>
Listing 401.

A zone element may contain figDesc or graphic elements that provide detailed descriptive information about the zone and additional images of the rectangle defined by the zone, e.g., at a different or higher resolution. The data objects contained within the zone may also be specified through the use of the data attribute, which holds ID references to one or more elements in the content tree of the MEI file, such as a note, measure, etc.

<!-- In the facsimile subtree: -->
<zone data="#facsimile.measure1" xml:id="facsimile.zone1"/>
<!-- somewhere in the content: -->
<measure xml:id="facsimile.measure1">
<!-- measure content -->
</measure>
Listing 402.
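
The use of descriptive child elements might look as follows; this is a minimal sketch, and the description text and the detail image file name are illustrative:

<surface>
<graphic height="2000px" target="image1.jpg" width="3000px"/>
<zone lrx="512" lry="410" ulx="436" uly="200">
<figDesc>An ornament in the upper right corner of the page.</figDesc>
<!-- higher-resolution detail image of this zone only -->
<graphic height="840px" target="image1detail.png" width="304px"/>
</zone>
</surface>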

Conversely, an element in the content may refer to the facsimile subtree using its facs attribute, which is made available by the att.facsimile attribute class. The last example could therefore be encoded with pointers in the other direction:

<!-- In the facsimile subtree: -->
<zone xml:id="facsimile.zone2"/>
<!-- somewhere in the content: -->
<measure facs="#facsimile.zone2" xml:id="facsimile.measure2">
<!-- measure content -->
</measure>
Listing 403.

The pb element, defined in chapter 2 Shared Concepts in MEI, makes special use of the facs attribute in that it points not to a zone, but to a surface element. A pb marks the beginning of a page, so all elements in the content tree encoded between two pb elements represent musical symbols written on the page (surface) referenced by the facs attribute of the first of these two pb elements.
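
A minimal sketch of this mechanism, with illustrative IDs and file names:

<facsimile>
<surface xml:id="facsimile.surface1">
<graphic target="page1.jpg"/>
</surface>
<surface xml:id="facsimile.surface2">
<graphic target="page2.jpg"/>
</surface>
</facsimile>
<!-- somewhere in the content: -->
<pb facs="#facsimile.surface1"/>
<!-- ... measures written on page 1 ... -->
<pb facs="#facsimile.surface2"/>
<!-- ... measures written on page 2 ... -->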

The encoding of facsimile elements is intended to support sequential display of page images. If an encoder wants to describe the physical structure of a source document, the foliaDesc element is more appropriate. The difference between the two approaches, and how to combine them, is described in chapter 3.7.1.5 Description of Folia.

12.2 Performances

This chapter describes the ‘performance’ module, which can be used for organizing audio and video files of performances of a musical work. The elements provided allow the encoder to group different recordings of the same performance, identify temporal segments within the recordings, and encode simple alignments with a music text.

12.2.1 Overview

The following elements are available to encode information about a recorded performance:

performance – A presentation of one or more musical works.
recording – A recorded performance.
avFile – References an external digital audio or video file.
clip – Defines a time segment of interest within a recording or within a digital audio or video file.
when – Indicates a point in time either absolutely (using the absolute attribute), or relative to another when element (using the since, interval, and inttype attributes).

The performance element begins a subtree of the music element and appears alongside, or instead of, body (described in 2.1.2 General Music Structure Elements) and facsimile (described in 12.1 Facsimiles). A performance element represents one recorded performance event. Because a performance may be recorded in multiple formats, by different personnel, or with different equipment, the performance element may group one or more recordings of the event.
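
Structurally, this results in the following skeleton (a minimal sketch):

<music>
<performance>
<!-- recordings of one performance event -->
</performance>
<body>
<!-- the encoded musical text -->
</body>
</music>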

The decls attribute can be used to point to performance medium metadata for the performed work. See 3.6.7 Performance Medium and 3.5 Functional Requirements for Bibliographic Records (FRBR) for more details.
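
A minimal sketch of such a pointer follows; it assumes a work with performance medium information in the header’s workList, and all IDs and content are illustrative:

<meiHead>
<fileDesc>
<!-- file description -->
</fileDesc>
<workList>
<work xml:id="performance.work1">
<title>Sonata</title>
<perfMedium>
<perfResList>
<perfRes>violin</perfRes>
<perfRes>piano</perfRes>
</perfResList>
</perfMedium>
</work>
</workList>
</meiHead>
<!-- ... -->
<performance decls="#performance.work1">
<!-- recordings -->
</performance>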

The recording element identifies a single recording event taking place within an absolute temporal space. The class att.mediaBounds contains attributes that can be used to define this space:

begin (att.mediaBounds) – Specifies a point where the relevant content begins. A numerical value must be less, and a time value must be earlier, than that given by the end attribute.
end (att.mediaBounds) – Specifies a point where the relevant content ends. If not specified, the end of the content is assumed to be the end point. A numerical value must be greater, and a time value must be later, than that given by the begin attribute.
betype (att.mediaBounds) – Type of values used in the begin/end attributes. The begin and end attributes can only be interpreted meaningfully in conjunction with this attribute.

The avFile element identifies an external file associated with a recording act. In the simplest case, the recording element will contain one avFile element identifying a file that represents it. The target attribute contains the URI of the digital media file. Use of the mimetype attribute is recommended for the avFile element. Its value should be a valid MIME media type defined by the Internet Engineering Task Force in RFC 2046. It is also recommended that all avFile elements have a recording or clip parent which bears the begin, end, and betype attributes.

<performance>
<recording begin="00:00:00.00" betype="time" end="00:03:10.00">
<avFile mimetype="audio/wav" target="http://example.com/path/to/audio/recording"/>
</recording>
</performance>
Listing 404.

Sometimes, multiple digital files are created in order to provide greater flexibility in redistribution and playback capabilities. In this case, multiple avFile elements may occur, each with a different mimetype. Keep in mind, however, that each file still represents the complete temporal extent of the recording act in spite of the change of file format:

<performance>
<recording begin="00:00:00.00" betype="time" end="00:03:10.00">
<avFile mimetype="audio/wav" target="http://example.com/path/to/audio/recording/wav"/>
<avFile mimetype="audio/mpeg" target="http://example.com/path/to/audio/recording/mpeg"/>
</recording>
</performance>
Listing 405.

The clip element identifies a temporal segment of a recording act. In the following example, the clip begins two minutes into the timeframe of the recording and ends 20 seconds later:

<recording begin="00:00:00.00" betype="time" end="00:03:10.00">
<clip begin="00:02:00.00" betype="time" end="00:20:20.00"/>
</recording>
Listing 406.

Beyond these relatively simple uses, complex situations may occur that require equally complex markup. For example, a single performance may be represented by multiple digital media files. Because they have differing durations, the media files must be the result of separate recording acts, even if these recording acts took place at the same time:

<performance>
<recording begin="00:00:00.00" betype="time" end="00:03:10.00">
<avFile mimetype="audio/wav" target="http://example.com/path/to/audio/recording/wav"/>
</recording>
<recording begin="00:00:00.00" betype="time" end="00:03:15.00">
<avFile mimetype="audio/mpeg" target="http://example.com/path/to/audio/recording/mpg"/>
</recording>
</performance>
Listing 407.

A single performance may also be represented by multiple, sequential digital files, as when a complete work is recorded in several so-called ‘takes’. In this case, the files may be considered to be parts of a single recording act, the extent of which is the combined extent of the individual clips. For example, a series of clip elements may be used to identify each movement of a piece and give start and end times for the movements in relation to the overall temporal space of the complete work:

<performance>
<recording>
<clip begin="00:00:00.00" betype="time" end="00:07:00.00" n="mov1">
<avFile mimetype="audio/aiff" target="movement01.aiff"/>
</clip>
<clip begin="00:07:01.00" betype="time" end="00:12:03.00" n="mov2">
<avFile mimetype="audio/aiff" target="movement02.aiff"/>
</clip>
</recording>
</performance>
Listing 408.

Similar markup is also applicable when a single file representing the entirety of a recording act is broken into segments later, as is often done for practical storage and distribution reasons. The file from which the clips are derived is indicated using an avFile element:

<performance>
<recording begin="00:00:00.00" betype="time" end="00:12:03.00" n="completeWork">
<avFile mimetype="audio/aiff" target="completeWork.aiff"/>
<clip begin="00:00:00.00" betype="time" end="00:07:00.00" n="mov1">
<avFile mimetype="audio/aiff" target="movement01.aiff"/>
</clip>
<clip begin="00:07:02.00" betype="time" end="00:12:03.00" n="mov2">
<avFile mimetype="audio/aiff" target="movement02.aiff"/>
</clip>
</recording>
</performance>
Listing 409.

A clip may be used to define any region of interest, such as a cadenza or a modulation, a song verse, etc. The following example shows the use of clip and its attributes to identify significant sections of a recording:

<performance>
<recording begin="00:00:00.00" betype="time" end="00:05:21.00">
<!-- Exposition -->
<clip begin="00:00:00.00" betype="time" end="00:01:41.00"/>
<!-- Development -->
<clip begin="00:01:41.00" betype="time" end="00:03:14.00"/>
<!-- Recapitulation -->
<clip begin="00:03:14.00" betype="time" end="00:04:28.00"/>
<!-- Coda -->
<clip begin="00:04:28.00" betype="time" end="00:05:21.00"/>
</recording>
</performance>
Listing 410.

The preceding example also demonstrates that media files are not required in order to define the temporal space of a recording act or clip. This makes it possible to set the boundaries of these features, then use the content of the performance element as a rudimentary "edit decision list" to create the matching digital files.

If an encoding of the notated text with which the media files are associated is included in the MEI file, the startid attribute can be used to indicate the first element in the sequence of events to which the recording corresponds:

<performance>
<recording begin="00:00:00.00" betype="time" end="00:07:00.00" n="mov1" startid="#performance.m1_1">
<avFile mimetype="audio/aiff" target="fullpiece.aiff"/>
</recording>
</performance>
<!-- ... -->
<body>
<mdiv>
<score>
<section>
<measure n="1" xml:id="performance.m1_1">
<!-- ... -->
</measure>
</section>
</score>
</mdiv>
</body>
Listing 411.

Clips can also be aligned with components of the musical text encoded in the body. The startid attribute can be used to specify the starting element in the sequence of events to which the clip corresponds. The following example shows the use of clip elements to identify the exposition of the first movement from Beethoven’s piano sonata Op. 14, no. 2 and its concluding ‘codetta’.

<performance>
<recording begin="00:00:00.00" betype="time" end="00:05:21.00">
<avFile mimetype="audio/aiff" target="BeethovenOp14N2-Mov1.aiff"/>
<!-- Exposition -->
<clip begin="00:00:0.00" betype="time" end="00:01:41.00" startid="#performance.m1"/>
<!-- Exposition's "codetta" -->
<clip begin="00:01:31.00" betype="time" end="00:01:41.00" startid="#performance.m48"/>
</recording>
</performance>
<!-- ... -->
<body>
<mdiv>
<score>
<section>
<measure n="1" xml:id="performance.m1">
<!-- ... -->
</measure>
<!-- ... -->
<measure n="48" xml:id="performance.m48">
<!-- ... -->
</measure>
</section>
</score>
</mdiv>
</body>
Listing 412.

Please note that the begin and end times of clips may overlap. In the preceding example, the extent of the codetta is contained within that of the exposition. Overlapping beginning and ending points may also be used to provide additional performance context for a segment, or to accommodate uncertainty about the precise values of these points.

<performance>
<recording begin="00:00:00.00" betype="time" end="00:03:06.54">
<!-- a section of interest -->
<clip begin="00:00:00.00" betype="time" end="00:00:41.00"/>
<!-- the following section starts a little before the end of the previous one to give some "adjustment" time -->
<clip begin="00:00:31.00" betype="time" end="00:01:07.00"/>
<!-- the boundaries of the following section are "fuzzy" -->
<clip begin="00:02:18.00" betype="time" end="00:02:49.85"/>
</recording>
</performance>
Listing 413.

A bibliographic description of a recording or metadata explaining how clip boundaries were determined may be associated with the recording and clip elements via the decls attribute:

<performance>
<recording begin="00:00:00.00" betype="time" decls="#performance.recBibDesc" end="00:03:06.54">
<!-- a section of interest -->
<clip begin="00:00:00.00" betype="time" end="00:00:41.00"/>
<!-- the following section starts a little before the end of the previous one to give some "adjustment" time -->
<clip begin="00:00:31.00" betype="time" decls="#performance.clipDesc" end="00:01:07.00"/>
<!-- the boundaries of the following section are "fuzzy" -->
<clip begin="00:02:18.00" betype="time" end="00:02:49.85"/>
</recording>
</performance>
Listing 414.

Associations between a feature of the encoding – such as a note, dynamic mark, or annotation – and a time point may be created using when elements and when attributes.

The when element identifies a particular point in time during the playback of a media file, such as an audio recording.

<when absolute="00:00:01.915291666" abstype="time" xml:id="t1"/>
Listing 415.

Time points may be identified in absolute terms, as above – that is, in hours, minutes, and seconds from the beginning of the recording – or in relative terms using the interval, inttype, and since attributes. In the following example, the time point of interest occurs 48 frames after the point labelled "t1".

<when interval="48" inttype="smpte-ndf29.97" since="#t1" xml:id="t1.1"/>
Listing 416.

Having identified a point of interest, another feature of the encoding may be associated with this point using its when attribute:

<annot resp="#LvB" when="#t1">
<p>I like this part!</p>
</annot>
Listing 417.

One use of the association created between the annotation and the time point is to display the text of the annotation as the recording or clip is played.

The when attribute allows only a single value, so only one-to-one relationships can be created using this mechanism. However, one-to-many relationships are accommodated in the opposite direction; that is, from a time point to other features of the markup. For example,

<when xml:id="t1.2" absolute="00:00:01.915291666" abstype="time" data="#feature1 #feature2 #feature3"/>
Listing 418.

indicates that the entities identified in data all occur at the same instant.

extData is a container for holding non-MEI data formats, similar to extMeta but available within when rather than in meiHead. extData allows data from audio or other sources to be linked to notes or other score events. The data should be enclosed in a CDATA section.

The following example shows JSON-formatted performance data encoded with extData for a single note (presumed to be defined elsewhere in the document with the ID "note_1"). Both single-value summaries (e.g., pitch) and time-series values (e.g., contF0) are encoded.

<when absolute="00:00:00.00" xml:id="when_1" data="#note_1">
<extData><![CDATA[
{"offset": "00:00:02.9005", "pitch": "455.98",
"contF0": [454.3737606, 454.7165531, 455.2337513, 455.4622624, 456.0605954]}
]]></extData>
</when>
Listing 419.