iXML chunk
"All iXML parameters and objects are OPTIONAL. Readers must not assume that any particular parameters exist."
[IXML2004]
"... to ensure that no products are developed which enforce schema compatibility, it has been decided NOT to create a schema for iXML."
[IXML2004]
Buckle up, folks: you're going to see a lot of variation in iXML chunks.
iXML root tags (WRID>RIFF-WAVE>iXML)
TODO: overview
| Chunk | Name | Bytes | Type | condition | Description | WRID |
|---|---|---|---|---|---|---|
| iXML | id | 4 | u8[4] | id = "iXML" | | ...>id |
| iXML | size | 4 | u32 | | The raw bytes of this chunk should be interpreted as XML, encoded in any Unicode encoding. | ...>size |
| iXML | ixml_version | | String | | The version number of the iXML specification used to prepare the iXML audio file. This version appears in the front page at http://www.ixml.info, and takes the form of x.y where x and y are whole numbers, for example 1.51. | ...>ixml_version |
| iXML | project | | String | | The name of the project to which this file belongs. This might typically be the name of the motion picture or program which is in production. | ...>project |
| iXML | scene | | String | | The name of the scene / slate being recorded. For US system this might typically be 32, 32A, 32B, A32B, 32AB etc. For UK system this might typically be an incrementing number with no letters. | ...>scene |
| iXML | tape | | String | | The SoundRoll which identifies a group of recordings. Normally, the SoundRoll is a vital component of workflow to differentiate audio recorded with time of day on different days. In other words, for 2 (completely different) recordings each covering a period around 11am, the soundroll would differentiate them by (typically) telling you which shooting day this recording applies to. Some projects may turn over sound more than once per day, and increment the soundroll at this point. In any event, the soundroll should change at least once in any 24 hour period. Some systems change the soundroll for every recording, which is also a valid option, in effect using the soundroll as a unique file identifier (although this function is explicitly provided with the iXML FILE_UID parameter). | ...>tape |
| iXML | take | | String | | The number of the take in the current scene or slate. Usually this will be a simple number, although variations for things like wild tracks may yield takes like 1, 2, 3, WT1, WT2 etc. | ...>take |
| iXML | take_type | | [Enum] | | (New in iXML v2.0) A dictionary based tag allowing selection from a defined list of values to explicitly categorise the type/purpose/function of the current take. This tag overlaps with the existing NO_GOOD / FALSE_START / WILD_TRACK, which are deprecated in iXML 2.0. This tag can contain multiple entries, separated by commas, and can be expanded in the future with additional dictionary entries, detailed in the TAKE_TYPE dictionary. | ...>take_type |
| iXML | (map to take_type) | | - | | (Deprecated in iXML v2.0, superseded by TAKE_TYPE) This parameter allows a recorder to mark this recording as "no good" (ie, of no use whatsoever, and in effect to be deleted). The value should be TRUE or FALSE. If absent, this should be assumed FALSE. | ...>no_good |
| iXML | (map to take_type) | | - | | (Deprecated in iXML v2.0, superseded by TAKE_TYPE) This parameter allows a recorder to mark this recording as a false start; this may indicate that another file could exist with the same take number. Typically this file might also be marked as <NO_GOOD>. The value should be TRUE or FALSE. If absent, this should be assumed FALSE. | ...>false_start |
| iXML | (map to take_type) | | - | | (Deprecated in iXML v2.0, superseded by TAKE_TYPE) This parameter allows a recorder to mark this recording as a wild track, with no specific relationship to any take, although it might be marked with a specific scene, for example when recording ambience for a given location. The value should be TRUE or FALSE. If absent, this should be assumed FALSE. | ...>wild_track |
| iXML | circled | | bool | | This parameter allows a recorder to mark this recording as a circle-take. The value should be TRUE or FALSE. If absent, this should be assumed FALSE. | ...>circled |
| iXML | file_uid | | String | | A unique number which identifies this physical FILE, regardless of the number of channels etc. If your system employs a unique SoundRoll per recording, your FILE_UID and TAPE parameters should be the same. | ...>file_uid |
| iXML | ubits | | String | | The userbits associated with this recording. This may have been extracted from incoming timecode when the file was recorded, or generated by the recorder from the date, or any other metadata. Typically the userbits are rarely used now because other more explicit metadata supersedes this function. | ...>ubits |
| iXML | note | | String | | A free text note to add user metadata to the recording. This might typically be used to communicate information such as TAIL SLATE, NO SLATE, or to warn of noise interruptions - PLANE OVERHEAD etc. | ...>note |
| iXML | pre_record_samplecount | | String | | (Deprecated - do not use) This parameter allows a recorder which incorporates a pre-record buffer to communicate how much of the recording came from the pre-record buffer. This is a sample count (like timestamp) offset from the start of the file. Playback applications can, by default, skip past the pre-record stage, offering the user a more intuitive playback which starts from the point where the record button was actually pressed. DEPRECATION NOTE: iXML 1.4 defines a SYNC_POINT_FUNCTION for Pre-record sample count, which is preferable to using this old iXML 1.3 deprecated root tag. For full backwards compatibility you should support both the root Pre-Record Samplecount tag, and also override it if one is found inside the Sync Point List. | ...>pre_record_samplecount |
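As a rough illustration of how a reader might cope with every tag being optional, the Python sketch below walks the RIFF chunk list to find the iXML chunk and reads root tags with defaults. It assumes the XML root element is `BWFXML` with upper-case child tags (as in the examples on this page). The helper names (`extract_ixml`, `text_of`, `is_true`, `take_types`) are hypothetical and not part of any library. Folding the deprecated NO_GOOD / FALSE_START / WILD_TRACK flags into the TAKE_TYPE set also assumes the TAKE_TYPE dictionary reuses those names.

```python
import struct
import xml.etree.ElementTree as ET
from typing import Optional, Set


def extract_ixml(wav_bytes: bytes) -> Optional[bytes]:
    """Return the raw payload of the first iXML chunk, or None if absent."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        return None
    pos = 12  # first chunk header follows the 12-byte RIFF/WAVE header
    while pos + 8 <= len(wav_bytes):
        chunk_id = wav_bytes[pos:pos + 4]
        (size,) = struct.unpack_from("<I", wav_bytes, pos + 4)
        if chunk_id == b"iXML":
            return wav_bytes[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    return None


def text_of(root: ET.Element, tag: str, default: Optional[str] = None) -> Optional[str]:
    """Read an optional child tag; any iXML tag may be missing."""
    value = root.findtext(tag)
    return value.strip() if value is not None else default


def is_true(root: ET.Element, tag: str) -> bool:
    """TRUE/FALSE flags default to FALSE when the tag is absent."""
    return (text_of(root, tag, "FALSE") or "").upper() == "TRUE"


def take_types(root: ET.Element) -> Set[str]:
    """Merge TAKE_TYPE entries with the deprecated boolean tags (assumed mapping)."""
    raw = text_of(root, "TAKE_TYPE", "") or ""
    types = {entry.strip() for entry in raw.split(",") if entry.strip()}
    for legacy in ("NO_GOOD", "FALSE_START", "WILD_TRACK"):
        if is_true(root, legacy):
            types.add(legacy)
    return types
```

Missing tags come back as `None` (or `False` for the TRUE/FALSE flags), which matches the "if absent, assume FALSE" rule in the table above.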
SYNC_POINT_LIST tags (WRID>RIFF-WAVE>iXML>SYNC_POINT_LIST)
TODO: overview
Example:
<SYNC_POINT_LIST>
<SYNC_POINT_COUNT>2</SYNC_POINT_COUNT>
<SYNC_POINT>
<SYNC_POINT_TYPE>RELATIVE</SYNC_POINT_TYPE>
<SYNC_POINT_FUNCTION>PRE_RECORD_SAMPLECOUNT</SYNC_POINT_FUNCTION>
<SYNC_POINT_LOW>930000</SYNC_POINT_LOW>
<SYNC_POINT_HIGH>0</SYNC_POINT_HIGH>
<SYNC_POINT_EVENT_DURATION>0</SYNC_POINT_EVENT_DURATION>
</SYNC_POINT>
<SYNC_POINT>
<SYNC_POINT_TYPE>ABSOLUTE</SYNC_POINT_TYPE>
<SYNC_POINT_FUNCTION>SLATE_GENERIC</SYNC_POINT_FUNCTION>
<SYNC_POINT_COMMENT>Near start</SYNC_POINT_COMMENT>
<SYNC_POINT_LOW>1242</SYNC_POINT_LOW>
<SYNC_POINT_HIGH>0</SYNC_POINT_HIGH>
<SYNC_POINT_EVENT_DURATION>0</SYNC_POINT_EVENT_DURATION>
</SYNC_POINT>
</SYNC_POINT_LIST>
| Chunk | Group | Name | Bytes | Type | Description | WRID |
|---|---|---|---|---|---|---|
| iXML | SYNC_POINT_LIST | sync_point_count | | String | Should appear at the start of the SYNC POINT LIST to allow readers to prepare memory for the following list. | ...>sync_point_count |
| iXML | SYNC_POINT_LIST | sync_point | | String | One or more sample-based counts, each representing a sync point for this recording. | ...>sync_point |
| Chunk | Group | Name | Bytes | Type | Description | WRID |
|---|---|---|---|---|---|---|
| iXML | SYNC_POINT | point_type | | String | Can be either RELATIVE, a sample frame count from the start of the file, or ABSOLUTE, a sample count since midnight. Note that the relative sample count needs multiplying by the word size and number of channels to be translated into a byte count from the start of the file. For 32 bit sample counts, only the _LOW parameter is needed, with _HIGH set to zero. For high sample rates, the _HIGH parameter is also needed to communicate 24 hour sample counts. | ...>point_type |
| iXML | SYNC_POINT | function | | String | Determines the function of this sync point. There are a number of defined functions for sync points, indicating things like the Pre-Record Sample Count, or the primary slate. See the Sync Point Function dictionary for a list of defined functions. | ...>function |
| iXML | SYNC_POINT | comment | | String | Allows a note for each sync point to be entered, for example "camera a", "camera b", etc. | ...>comment |
| iXML | SYNC_POINT | low | | String | For 32 bit sample counts, only the _LOW parameter is needed, with _HIGH set to zero. | ...>low |
| iXML | SYNC_POINT | high | | String | For high sample rates, the _HIGH parameter is also needed to communicate 24 hour sample counts. | ...>high |
| iXML | SYNC_POINT | event_duration | | String | Allows a sync point to be a region with a defined start and stop point/duration. This can be useful for playbacks where you would like the play to stop automatically at the end of the sync event. A value of 0 in the SYNC_POINT_EVENT_DURATION implies a non-duration (marker) or unknown duration event. (Note: units for this value are not specified in iXML spec.) | ...>event_duration |
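The _HIGH / _LOW split and the relative-to-byte conversion described above can be sketched in Python as follows. This is only an illustration: `sync_points` is a hypothetical helper, `bytes_per_sample` and `channels` are assumed to come from the fmt chunk, and the byte offset is taken relative to the first sample frame. The sketch deliberately iterates over the SYNC_POINT elements it finds instead of trusting SYNC_POINT_COUNT.

```python
import xml.etree.ElementTree as ET


def _int(element, tag):
    """Read an optional integer tag, treating a missing or empty tag as 0."""
    text = (element.findtext(tag) or "").strip()
    return int(text) if text else 0


def sync_points(ixml_root, bytes_per_sample, channels):
    """Return one dict per SYNC_POINT found under SYNC_POINT_LIST."""
    points = []
    for sp in ixml_root.findall("SYNC_POINT_LIST/SYNC_POINT"):
        # _HIGH holds the upper 32 bits of the count, _LOW the lower 32 bits.
        samples = (_int(sp, "SYNC_POINT_HIGH") << 32) | _int(sp, "SYNC_POINT_LOW")
        kind = (sp.findtext("SYNC_POINT_TYPE") or "").strip().upper()
        points.append({
            "type": kind,
            "function": (sp.findtext("SYNC_POINT_FUNCTION") or "").strip(),
            "samples": samples,
            # Only RELATIVE points translate directly into a byte offset.
            "byte_offset": samples * bytes_per_sample * channels
            if kind == "RELATIVE" else None,
        })
    return points
```

For the first sync point in the example above (RELATIVE, 930000 sample frames), a 24-bit stereo file gives 930000 × 3 × 2 = 5580000 bytes.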
SPEED tag (WRID>RIFF-WAVE>iXML>SPEED)
TODO: overview
We recommend carefully reading the SPEED section of the iXML spec for details and examples of both setting and reading these values.
Example:
<SPEED>
<NOTE>camera overcranked</NOTE>
<MASTER_SPEED>24/1</MASTER_SPEED>
<CURRENT_SPEED>48/1</CURRENT_SPEED>
<TIMECODE_FLAG>NDF</TIMECODE_FLAG>
<TIMECODE_RATE>24000/1001</TIMECODE_RATE>
<FILE_SAMPLE_RATE>48000</FILE_SAMPLE_RATE>
<AUDIO_BIT_DEPTH>24</AUDIO_BIT_DEPTH>
<DIGITIZER_SAMPLE_RATE>48048</DIGITIZER_SAMPLE_RATE>
<TIMESTAMP_SAMPLES_SINCE_MIDNIGHT_HI>0</TIMESTAMP_SAMPLES_SINCE_MIDNIGHT_HI>
<TIMESTAMP_SAMPLES_SINCE_MIDNIGHT_LO>48048000</TIMESTAMP_SAMPLES_SINCE_MIDNIGHT_LO>
<TIMESTAMP_SAMPLE_RATE>48000</TIMESTAMP_SAMPLE_RATE>
</SPEED>
| Chunk | Group | Name | Bytes | Type | Description | WRID |
|---|---|---|---|---|---|---|
| iXML | SPEED | note | | String | User comments about speed. | ...>note |
| iXML | SPEED | master_speed | | String | Speed at which the material will be replayed. | ...>master_speed |
| iXML | SPEED | current_speed | | String | Speed of this recording. Might be different than master_speed. In slow motion for example, master_speed might be "24/1", and current_speed might be "48/1". | ...>current_speed |
| iXML | SPEED | timecode_rate | | String | Timecode frame rate used during record. Ex: 24/1, 25/1, 30000/1001, 24000/1001, 30/1 | ...>timecode_rate |
| iXML | SPEED | timecode_flag | | String | Timecode Drop Frame used during record. Ex: DF for Drop Frame, NDF for Non Drop Frame. Defaults to NDF. Useful to calculate H:m:s:f format timecode. | ...>timecode_flag |
| iXML | SPEED | file_sample_rate | | String | Duplicated from fmt chunk "for convenience". | ...>file_sample_rate |
| iXML | SPEED | audio_bit_depth | | String | Duplicated from fmt chunk "for convenience". | ...>audio_bit_depth |
| iXML | SPEED | digitizer_sample_rate | | String | True wordclock speed of the A-D converters used during the recording. | ...>digitizer_sample_rate |
| iXML | SPEED | timestamp_samples_since_midnight_hi | | String | Duplicated from bext chunk "for convenience". | ...>timestamp_samples_since_midnight_hi |
| iXML | SPEED | timestamp_samples_since_midnight_lo | | String | Duplicated from bext chunk "for convenience". | ...>timestamp_samples_since_midnight_lo |
| iXML | SPEED | timestamp_sample_rate | | String | Sample rate used to calculate the timestamp, and which must be used in mathematic calculations to recover the timecode timestamp for the file. | ...>timestamp_sample_rate |
IMPORTANT - the file samplerate, audio bit depth and timestamp samples since midnight are included redundantly in the SPEED object in order to assemble all the important data in a single place for human readability when troubleshooting workflow problems. iXML Readers should ignore this information and instead take the data from the official fmt and bext chunks in the file. Generic utilities changing file data might change the fmt or bext chunk but not update the iXML SPEED tag, and for EBU officially specified BWF data like timestamp, the EBU data takes precedence (unlike the unofficial informal bext metadata which is superseded by iXML).
iXML spec
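To make the timestamp arithmetic concrete, here is a minimal Python sketch that turns a samples-since-midnight count into an H:M:S:F string. It covers only the non-drop (NDF) case and assumes TIMECODE_RATE is written as "numerator/denominator". `ndf_timecode` is a hypothetical helper name, and per the note above the sample count itself would normally be taken from the bext chunk instead of the SPEED object.

```python
from fractions import Fraction


def ndf_timecode(samples_hi, samples_lo, timestamp_sample_rate, timecode_rate):
    """Format a samples-since-midnight timestamp as non-drop HH:MM:SS:FF."""
    samples = (samples_hi << 32) | samples_lo
    num, den = (int(part) for part in timecode_rate.split("/"))
    rate = Fraction(num, den)  # true frame rate, e.g. 24000/1001
    fps = round(rate)          # nominal frames per timecode second, e.g. 24
    # Whole frames elapsed since midnight (fractional frames are truncated).
    frames = int(Fraction(samples, timestamp_sample_rate) * rate)
    seconds, ff = divmod(frames, fps)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"{hh % 24:02d}:{mm:02d}:{ss:02d}:{ff:02d}"
```

With the values from the SPEED example above (48048000 samples at 48000 Hz, 24000/1001 NDF), this gives 1001 seconds of real time, 24000 frames, and a timecode of 00:16:40:00. Drop-frame timecode needs additional frame-number skipping that this sketch does not attempt.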
LOUDNESS tag (WRID>RIFF-WAVE>iXML>LOUDNESS)
TODO: overview
Example:
<LOUDNESS>
<LOUDNESS_VALUE></LOUDNESS_VALUE>
<LOUDNESS_RANGE></LOUDNESS_RANGE>
<MAX_TRUE_PEAK_LEVEL></MAX_TRUE_PEAK_LEVEL>
<MAX_MOMENTARY_LOUDNESS></MAX_MOMENTARY_LOUDNESS>
<MAX_SHORT_TERM_LOUDNESS></MAX_SHORT_TERM_LOUDNESS>
</LOUDNESS>
| Chunk | Group | Name | Bytes | Type | Description | WRID |
|---|---|---|---|---|---|---|
| iXML | LOUDNESS | loudness_value | | String | Duplicated from bext chunk. iXML body equivalent to the LoudnessValue field from the BEXT. | ...>loudness_value |
| iXML | LOUDNESS | loudness_range | | String | Duplicated from bext chunk. iXML body equivalent to the LoudnessRange field from the BEXT. | ...>loudness_range |
| iXML | LOUDNESS | max_true_peak_level | | String | Duplicated from bext chunk. iXML body equivalent to the MaxTruePeakLevel field from the BEXT. | ...>max_true_peak_level |
| iXML | LOUDNESS | max_momentary_loudness | | String | Duplicated from bext chunk. iXML body equivalent to the MaxMomentaryLoudness field from the BEXT. | ...>max_momentary_loudness |
| iXML | LOUDNESS | max_short_term_loudness | | String | Duplicated from bext chunk. iXML body equivalent to the MaxShortTermLoudness field from the BEXT. | ...>max_short_term_loudness |
These Loudness fields are the iXML body equivalent to the BWF_LOUDNESS_VALUE etc. fields from the BEXT. Since the iXML BEXT object is intended to be a redundant copy of BEXT and of any iXML native information, it would be ignored by most iXML readers, and as such a dedicated native iXML LOUDNESS object is required.
iXML spec
FWIW... this explanation doesn't make sense to me. And it makes for yet another place BEXT data is duplicated in iXML. At least the field names match. Luckily, it seems like very few iXML writers are currently writing it. That seems like a good idea to me, since the data is already in the BEXT chunk. And if iXML was being used outside of a WAV file, there is already a full copy of BEXT data in iXML.
HISTORY tag (WRID>RIFF-WAVE>iXML>HISTORY)
The HISTORY object allows tracking of a file's origins, where it may have been created as a derivation of another file.
iXML spec
TODO: overview
Example:
<HISTORY>
<ORIGINAL_FILENAME>myname_1.wav</ORIGINAL_FILENAME>
<PARENT_FILENAME>myname.bwf</PARENT_FILENAME>
<PARENT_UID>9876543210</PARENT_UID>
</HISTORY>
| Chunk | Group | Name | Bytes | Type | Description | WRID |
|---|---|---|---|---|---|---|
| iXML | HISTORY | original_filename | | String | Name given to this file when it was created. Using the original_filename metadata allows systems to track back to the original name if it changes. | ...>original_filename |
| iXML | HISTORY | parent_filename | | String | Identification of the source of a derived file. | ...>parent_filename |
| iXML | HISTORY | parent_uid | | String | Identification of the source of a derived file. Likely contains values from file_uid in another file. | ...>parent_uid |
FILE_SET tag (WRID>RIFF-WAVE>iXML>FILE_SET)
TODO: overview
When multiple files should be treated as a group, FILE_SET helps identify other members of the group.
The rules for interpreting these fields are complex; see the spec for details.
Example:
<FILE_SET>
<TOTAL_FILES>1</TOTAL_FILES>
<FAMILY_UID>MTIPMX17654200508051445053840000</FAMILY_UID>
<FAMILY_NAME>21/33</FAMILY_NAME>
<FILE_SET_INDEX>A</FILE_SET_INDEX>
</FILE_SET>
| Chunk | Group | Name | Bytes | Type | Description | WRID |
|---|---|---|---|---|---|---|
| iXML | FILE_SET | total_files | | String | Total number of companion files in the same set of files. | ...>total_files |
| iXML | FILE_SET | family_uid | | String | Multiple files which represent a single recording should share a common family_uid. | ...>family_uid |
| iXML | FILE_SET | family_name | | String | Non-unique text name for the file set. | ...>family_name |
| iXML | FILE_SET | index | | String | Origination index in a group of files. For mono files, use indexes 1 to n. For multi-poly files, use letters A, B etc. It is strongly recommended that dual poly recordings should use this tag. | ...>index |
| iXML | FILE_SET | start_time_hi | | String | Rarely set. When a FILE_SET is non-coherent (tracks are enabled or disabled during a recording), each file of a FILE_SET group MUST include the additional FILE_SET tags <FILE_SET_START_TIME_HI> and <FILE_SET_START_TIME_LO> to locate the group as a whole. Uses the same counting used for the TIMESTAMP in the | ...>start_time_hi |
| iXML | FILE_SET | start_time_lo | | String | As above. | ...>start_time_lo |
TRACK_LIST tag (WRID>RIFF-WAVE>iXML>TRACK_LIST)
TODO: overview
An unambiguous way to identify track labels, track indexing (microphone source identification), multichannel / multifile relationship of tracks and track function identification.
iXML spec
Example:
<TRACK_LIST>
<TRACK_COUNT>2</TRACK_COUNT>
<TRACK>
<CHANNEL_INDEX>1</CHANNEL_INDEX>
<INTERLEAVE_INDEX>1</INTERLEAVE_INDEX>
<NAME>Mid</NAME>
<FUNCTION>M-MID_SIDE</FUNCTION>
</TRACK>
<TRACK>
<CHANNEL_INDEX>2</CHANNEL_INDEX>
<INTERLEAVE_INDEX>2</INTERLEAVE_INDEX>
<NAME>Side</NAME>
<FUNCTION>S-MID_SIDE</FUNCTION>
</TRACK>
</TRACK_LIST>
| Chunk | Group | Name | Bytes | Type | Description | WRID |
|---|---|---|---|---|---|---|
| iXML | TRACK_LIST | track_count | | String | Number of TRACK objects in this group, which would normally match the number of tracks in the file. | ...>track_count |
| Chunk | Group | Name | Bytes | Type | Description | WRID |
|---|---|---|---|---|---|---|
| iXML | TRACK | channel_index | | String | Input or source number on a recorder. For an 8 channel recorder, the inputs might be identified as 1 to 8, however for a given recording, only tracks 4 and 6 may be armed. Integer, counting from 1. | ...>channel_index |
| iXML | TRACK | interleave_index | | String | The index of the track in this file. Integer, counting from 1. | ...>interleave_index |
| iXML | TRACK | name | | String | Track label. | ...>name |
| iXML | TRACK | function | | String | Purpose of the track. Ex: LEFT, RIGHT. See "FUNCTION dictionary" for standard values. | ...>function |
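A reader could collect the TRACK objects roughly like this. The Python sketch below is an illustration under assumptions: `read_tracks` and `_opt_int` are hypothetical helper names, and sorting by INTERLEAVE_INDEX is a design choice made here, not something the spec is known to require. It iterates over the TRACK elements actually present and does not rely on TRACK_COUNT.

```python
import xml.etree.ElementTree as ET


def _opt_int(text):
    """Convert an optional tag value to int, or None if missing or malformed."""
    try:
        return int(text.strip())
    except (AttributeError, ValueError):
        return None


def read_tracks(ixml_root):
    """Return the TRACK objects as dicts, ordered by position in this file."""
    tracks = []
    for t in ixml_root.findall("TRACK_LIST/TRACK"):
        tracks.append({
            "channel_index": _opt_int(t.findtext("CHANNEL_INDEX")),
            "interleave_index": _opt_int(t.findtext("INTERLEAVE_INDEX")),
            "name": (t.findtext("NAME") or "").strip(),
            "function": (t.findtext("FUNCTION") or "").strip(),
        })
    # Tracks without a usable INTERLEAVE_INDEX sort last.
    tracks.sort(key=lambda tr: (tr["interleave_index"] is None,
                                tr["interleave_index"] or 0))
    return tracks
```

Keeping `channel_index` and `interleave_index` separate preserves the distinction made in the table: the first identifies the recorder input, the second the track's position in the file.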
BEXT tag (WRID>RIFF-WAVE>iXML>BEXT)
TODO: overview
Broadcast WAVE files employ a bext chunk to communicate EBU standardised metadata, which complements iXML. In non WAVE files, such as AIFF, it may be desirable to also store an equivalent set of information. For this purpose iXML includes an optional BEXT object which can communicate the standard EBU bext metadata in any file employing iXML. If this object appears (redundantly) in a WAVE file (for continuity of format) it is ESSENTIAL that the values match those in the official bext chunk in the same file.
iXML spec
Example:
<BEXT>
<BWF_DESCRIPTION>all the old stuff</BWF_DESCRIPTION>
<BWF_ORIGINATOR>METACORDER</BWF_ORIGINATOR>
<BWF_ORIGINATOR_REFERENCE>123456</BWF_ORIGINATOR_REFERENCE>
<BWF_ORIGINATION_DATE>2003-10-30</BWF_ORIGINATION_DATE>
<BWF_ORIGINATION_TIME>03:27:17</BWF_ORIGINATION_TIME>
<BWF_TIME_REFERENCE_LOW>123674376</BWF_TIME_REFERENCE_LOW>
<BWF_TIME_REFERENCE_HIGH>0</BWF_TIME_REFERENCE_HIGH>
<BWF_VERSION>1.0</BWF_VERSION>
<BWF_UMID>MTIPMX17654200508051445053840001</BWF_UMID>
<BWF_RESERVED>00000000000000000000000000000000000000000</BWF_RESERVED>
<BWF_CODING_HISTORY>some info</BWF_CODING_HISTORY>
</BEXT>
| Chunk | Group | Name | Bytes | Type | Description | WRID |
|---|---|---|---|---|---|---|
| iXML | BEXT | description | | String | Duplicated from bext chunk. | ...>description |
| iXML | BEXT | originator | | String | Duplicated from bext chunk. | ...>originator |
| iXML | BEXT | originator_reference | | String | Duplicated from bext chunk. | ...>originator_reference |
| iXML | BEXT | origination_date | | String | Duplicated from bext chunk. | ...>origination_date |
| iXML | BEXT | origination_time | | String | Duplicated from bext chunk. | ...>origination_time |
| iXML | BEXT | time_reference | | String | Duplicated from bext chunk. | ...>time_reference |
| iXML | BEXT | version | | String | Duplicated from bext chunk. | ...>version |
| iXML | BEXT | umid | | String | Duplicated from bext chunk. | ...>umid |
| iXML | BEXT | loudness_value | | String | Duplicated from bext chunk. | ...>loudness_value |
| iXML | BEXT | loudness_range | | String | Duplicated from bext chunk. | ...>loudness_range |
| iXML | BEXT | max_true_peak_level | | String | Duplicated from bext chunk. | ...>max_true_peak_level |
| iXML | BEXT | max_momentary_loudness | | String | Duplicated from bext chunk. | ...>max_momentary_loudness |
| iXML | BEXT | max_short_term_loudness | | String | Duplicated from bext chunk. | ...>max_short_term_loudness |
| iXML | BEXT | reserved | | String | Duplicated from bext chunk. | ...>reserved |
| iXML | BEXT | coding_history | | String | Duplicated from bext chunk. | ...>coding_history |
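Since the iXML BEXT object is supposed to match the real bext chunk, a validator might compare the two. The Python sketch below checks only the leading fixed-size fields, whose layout is assumed to follow EBU Tech 3285 (Description 256 bytes, Originator 32, OriginatorReference 32, OriginationDate 10, OriginationTime 8, then TimeReference low and high as little-endian u32). `bext_mismatches` and `BEXT_HEAD` are hypothetical names, and tags missing from the iXML object are skipped because everything in iXML is optional.

```python
import struct
import xml.etree.ElementTree as ET
from typing import List

# Leading fixed-size fields of the bext chunk payload (assumed layout).
BEXT_HEAD = struct.Struct("<256s32s32s10s8sII")


def _text(raw: bytes) -> str:
    """Decode a NUL-padded ASCII field."""
    return raw.split(b"\x00", 1)[0].decode("ascii", "replace").strip()


def bext_mismatches(bext_payload: bytes, ixml_root: ET.Element) -> List[str]:
    """Return the iXML BEXT tags whose values differ from the bext chunk."""
    obj = ixml_root.find("BEXT")
    if obj is None or len(bext_payload) < BEXT_HEAD.size:
        return []
    desc, orig, ref, date, time, low, high = BEXT_HEAD.unpack_from(bext_payload)
    chunk_values = {
        "BWF_DESCRIPTION": _text(desc),
        "BWF_ORIGINATOR": _text(orig),
        "BWF_ORIGINATOR_REFERENCE": _text(ref),
        "BWF_ORIGINATION_DATE": _text(date),
        "BWF_ORIGINATION_TIME": _text(time),
        "BWF_TIME_REFERENCE_LOW": str(low),
        "BWF_TIME_REFERENCE_HIGH": str(high),
    }
    mismatches = []
    for tag, expected in chunk_values.items():
        found = obj.findtext(tag)
        if found is not None and found.strip() != expected:
            mismatches.append(tag)
    return mismatches
```

When a mismatch is reported, the note in the SPEED section suggests treating the official bext chunk as authoritative for EBU-specified data such as the timestamp.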
USER tag (WRID>RIFF-WAVE>iXML>USER)
TODO: overview
USER is different than all of the other tags specified in iXML. It acts as both a top level "field" (the text between tags is the value) and as a container for additional user defined tags. It is allowed to have either or both kinds of data.
The USER field is completely defined by the user, hardware or application, it has no defined function, and may contain any kind of human readable data. It is intended that this field be used (compatible with a defined schema) to store miscellaneous information, which is not appropriate for any other field. Applications are free to sub-divide this field with tagging systems like the old bext description, although typically this field is designed to be human readable rather than machine readable, so any tagging should be based on interpretation of human readable, neat text. iXML viewing applications will typically display the entire USER field in one text area. One of the primary functions of this field would typically be to allow extended information about the recording process used, and personnel involved with a field recording. This is typically not file-specific but will be the same for a whole group of recordings, and which appears in the iXML of all recordings. It is ideally suited to storage of the metadata which would normally appear at the top of a sound report, ie. the name of the mixer, contact details etc. UPDATE: iXML v2.0 - the USER field can now contain XML tagged data, providing more explicit and machine readable information, using the list of fields recommended by AFSI:
iXML spec
Example:
<USER>
Production : iXML Test Movie Production
Mixer : Mark Gilbert
Recorder : MetaCorder 1.5
Contact : fieldsound@gallery.co.uk
Location : Leavesden Studios Sound Stage 5
Day : 5
Reference Level : -20dBf
Microphones : Sennheiser MKH-70, Sanken COS-11
<PRODUCTION_NAME>iXML Test Movie Production</PRODUCTION_NAME>
<SOUND_MIXER_NAME>Mark Gilbert</SOUND_MIXER_NAME>
</USER>
| Chunk | Group | Name | Bytes | Type | Description | WRID |
|---|---|---|---|---|---|---|
| iXML | USER | text | | String | The USER field is completely defined by the user, hardware or application, it has no defined function, and may contain any kind of human readable data. It is intended that this field be used (compatible with a defined schema) to store miscellaneous information, which is not appropriate for any other field. Applications are free to sub-divide this field with tagging systems like the old bext description, although typically this field is designed to be human readable rather than machine readable, so any tagging should be based on interpretation of human readable, neat text. iXML viewing applications will typically display the entire USER field in one text area. One of the primary functions of this field would typically be to allow extended information about the recording process used, and personnel involved with a field recording. This is typically not file-specific but will be the same for a whole group of recordings, and which appears in the iXML of all recordings. It is ideally suited to storage of the metadata which would normally appear at the top of a sound report, ie. the name of the mixer, contact details etc. UPDATE: iXML v2.0 - the USER field can now contain XML tagged data, providing more explicit and machine readable information, using the list of fields recommended by AFSI (https://www.afsi.eu/articles/30623-compte-rendu-du-groupe-de-travail-afsi-sur-les-rapports-son). | ...> |
| iXML | USER | full_title | | String | No description in spec. | ...>full_title |
| iXML | USER | director_name | | String | No description in spec. | ...>director_name |
| iXML | USER | production_name | | String | No description in spec. | ...>production_name |
| iXML | USER | production_address | | String | No description in spec. | ...>production_address |
| iXML | USER | production_email | | String | No description in spec. | ...>production_email |
| iXML | USER | production_phone | | String | No description in spec. | ...>production_phone |
| iXML | USER | production_note | | String | No description in spec. | ...>production_note |
| iXML | USER | sound_mixer_name | | String | No description in spec. | ...>sound_mixer_name |
| iXML | USER | sound_mixer_address | | String | No description in spec. | ...>sound_mixer_address |
| iXML | USER | sound_mixer_email | | String | No description in spec. | ...>sound_mixer_email |
| iXML | USER | sound_mixer_phone | | String | No description in spec. | ...>sound_mixer_phone |
| iXML | USER | sound_mixer_note | | String | No description in spec. | ...>sound_mixer_note |
| iXML | USER | audio_recorder_model | | String | No description in spec. | ...>audio_recorder_model |
| iXML | USER | audio_recorder_serial_number | | String | No description in spec. | ...>audio_recorder_serial_number |
| iXML | USER | audio_recorder_firmware | | String | No description in spec. | ...>audio_recorder_firmware |
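Because USER can carry free text, child tags, or both, a reader has to handle XML mixed content. One way to separate the two with Python's standard `xml.etree.ElementTree` is sketched below. `read_user` is a hypothetical helper, and joining the text fragments line by line is a choice made for this illustration.

```python
import xml.etree.ElementTree as ET


def read_user(ixml_root):
    """Split the USER object into (free_text, tagged_fields)."""
    user = ixml_root.find("USER")
    if user is None:
        return "", {}
    # Free text can sit before the first child (user.text) and after each
    # child element (child.tail), so gather all of those fragments.
    fragments = [user.text or ""] + [child.tail or "" for child in user]
    lines = [line.strip() for line in "".join(fragments).splitlines()]
    free_text = "\n".join(line for line in lines if line)
    fields = {child.tag: (child.text or "").strip() for child in user}
    return free_text, fields
```

Applied to the example above, the human readable "Production : ..." lines end up in `free_text`, while `PRODUCTION_NAME` and `SOUND_MIXER_NAME` land in the dictionary.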
LOCATION tag (WRID>RIFF-WAVE>iXML>LOCATION)
TODO: overview
The LOCATION object group is designed to hold machine readable information about the location this recording was made in, in particular to support geotagging of recordings. LOCATION_GPS is specified in standard decimal form: latitude, longitude.
iXML spec
Example:
<LOCATION>
<LOCATION_NAME>Human readable description of location</LOCATION_NAME>
<LOCATION_GPS>47.756787, -123.729977</LOCATION_GPS>
<LOCATION_ALTITUDE></LOCATION_ALTITUDE>
<LOCATION_TYPE>[dictionary]</LOCATION_TYPE>
<LOCATION_TIME>[dictionary]</LOCATION_TIME>
</LOCATION>
| Chunk | Name | Bytes | Type | Description | WRID |
|---|---|---|---|---|---|
| iXML | name | | String | Human readable description of location. | ...>name |
| iXML | gps | | String | Specified in standard decimal form: latitude, longitude, ex: 47.756787, -123.729977 | ...>gps |
| iXML | altitude | | String | No description in spec. | ...>altitude |
| iXML | type | | String | Contains one or more of the following values to describe the environment in which the recording was made. In the future additional items may be added to this dictionary, and multiple entries should be comma delimited. Ex: INT; EXT; INT,EXT | ...>type |
| iXML | time | | String | Contains one or more of the following values to describe the time in which the recording was made, in terms of subjective description, rather than literal time, which is represented with timestamps. Multiple entries should be comma delimited. | ...>time |
See http://www.gallery.co.uk/ixml/ for details on the two dictionary fields.
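The "latitude, longitude" decimal form of LOCATION_GPS is simple to parse, but real files may leave the tag empty or fill it with something else, so a defensive reader is worthwhile. The Python sketch below is one possible approach: `parse_location_gps` is a hypothetical helper, and the range check is an extra sanity test added here, not a rule taken from the spec.

```python
def parse_location_gps(value):
    """Parse 'latitude, longitude' in decimal degrees; return None if unusable."""
    if not value:
        return None
    parts = [part.strip() for part in value.split(",")]
    if len(parts) != 2:
        return None
    try:
        lat, lon = float(parts[0]), float(parts[1])
    except ValueError:
        return None
    # Reject values outside the valid coordinate ranges (this also drops NaN).
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return lat, lon
```

Returning `None` instead of raising keeps a malformed LOCATION object from blocking the rest of the iXML from being read.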
ASWG tags (WRID>RIFF-WAVE>iXML>ASWG)
TODO: overview
"Sony PlayStation Studios' Audio Standards Working Group - iXML Extension
The extension is designed to provide developers of interactive audio content and audio researchers the ability to store production and research related metadata within the BWFXML chunk of a Broadcast Wave file, describing its contents and other related information.
The specification defines fields relating to metadata that can be used within interactive media development applications and workflows as well as machine learning and deep learning feature sets.
The extension contains fields covering sound effects, music, dialogue and audio-driven haptic content, as well as more general project information.
Anyone may include the ASWG object and any of the tags contained in this document, if they find them useful. Please do not include any tags not listed in this document within the ASWG object." -- ASWG-G006 - iXML Extension Specification v1.1.pdf
At the time this section (ASWG) was last updated, there were two published versions of ASWG: 1.0 and 1.1. Version 1.1 appears to only add new fields, and since all fields in iXML are optional, we're documenting the 1.1 fields below.
| Chunk | Group | Name | Bytes | Type | Description | WRID |
|---|---|---|---|---|---|---|
| iXML | ASWG | content_type |
|
String |
Content Type (sfx/music/dialog/haptic/impulse/mixed). category:General | ...>content_type |
| iXML | ASWG | project |
|
String |
Project name asset was developed for. category:General | ...>project |
| iXML | ASWG | originator |
|
String |
Designer. category:General | ...>originator |
| iXML | ASWG | originator_studio |
|
String |
Name of originating studio. category:General | ...>originator_studio |
| iXML | ASWG | notes |
|
String |
General information not covered in other fields. category:General | ...>notes |
| iXML | ASWG | session |
|
String |
Application (Pro Tools/Reaper etc.) session name. category:General | ...>session |
| iXML | ASWG | state |
|
String |
File version: mastered, processed, raw, placeholder. category:General | ...>state |
| iXML | ASWG | editor |
|
String |
Name of editor. category:General | ...>editor |
| iXML | ASWG | mixer |
|
String |
Mix engineer. category:General | ...>mixer |
| iXML | ASWG | fx_chain_name |
|
String |
Name of FX chain used on file, Reaper chain name, for example. category:General | ...>fx_chain_name |
| iXML | ASWG | is_generated |
|
String |
Content is AI generated, or contains elements/sections that are AI generated. true/false. category:General | ...>is_generated |
| iXML | ASWG | mastering_engineer |
|
String |
Name of the mastering engineer. category:General | ...>mastering_engineer |
| iXML | ASWG | origination_date |
|
String |
Date of original upload of asset in format yyyy-MM-dd. category:General | ...>origination_date |
| iXML | ASWG | channel_config |
|
String |
Channel configuration of the file: mono, stereo, LCR, Quad, 5.0, 5.1, 7.0, 7.1, 12.2, ambisonic. category:Format | ...>channel_config |
| iXML | ASWG | ambisonic_format |
|
String |
Ambisonic format: #p, #h#p, #h#v. eg: 5p, 3h1v, 4h2p. category:Format" format="#p, #h#p, #h#v. eg: '5p', '3h1v', '4h2p' | ...>ambisonic_format |
| iXML | ASWG | ambisonic_chn_order |
|
String |
Ambisonic channel order: fuma, acn. category:Format | ...>ambisonic_chn_order |
| iXML | ASWG | ambisonic_norm |
|
String |
Ambisonic normalization: snd3, maxn, n3d. category:Format | ...>ambisonic_norm |
| iXML | ASWG | mic_type |
|
String |
Microphone(s) used. Where multiple mics used, prefix with channel number: 1-Neumann U87i, 2-AKG C414. category:Recording | ...>mic_type |
| iXML | ASWG | mic_config |
|
String |
Microphone configuration: Mono, AB, XY, ORTF, MS. category:Recording | ...>mic_config |
| iXML | ASWG | mic_distance |
|
String |
Microphone distance in meters OR headmounted - 1m, 2m, 0.3m, head. category:Recording | ...>mic_distance |
| iXML | ASWG | recording_loc |
|
String |
Recording location. category:Recording | ...>recording_loc |
| iXML | ASWG | is_designed |
|
String |
SFX: Is the sound designed, or is it a raw recording - true if designed, false if raw recording. category:Recording | ...>is_designed |
| iXML | ASWG | rec_engineer |
|
String |
Name of the recording engineer. category:Recording | ...>rec_engineer |
| iXML | ASWG | rec_studio |
|
String |
Music: Recording Studio. category:Recording | ...>rec_studio |
| iXML | ASWG | impulse_location |
|
String |
Impulse: Location of impulse. category:Impulse | ...>impulse_location |
| iXML | ASWG | category |
|
String |
UCS compliant SFX category. category:Sound Effects | ...>category |
| iXML | ASWG | sub_category |
|
String |
UCS compliant SFX sub-category. category:Sound Effects | ...>sub_category |
| iXML | ASWG | cat_id |
|
String |
UCS compliant SFX category ID. category:Sound Effects | ...>cat_id |
| iXML | ASWG | user_category |
|
String |
UCS compliant user category. category:Sound Effects | ...>user_category |
| iXML | ASWG | user_data |
|
String |
UCS compliant user data. category:Sound Effects | ...>user_data |
| iXML | ASWG | vendor_category |
|
String |
UCS compliant vendor category. category:Sound Effects | ...>vendor_category |
| iXML | ASWG | fx_name |
|
String |
UCS compliant FX name. category:Sound Effects | ...>fx_name |
| iXML | ASWG | library |
|
String |
UCS compliant library. category:Sound Effects | ...>library |
| iXML | ASWG | creator_id |
|
String |
UCS compliant SFX creator/publisher. category:Sound Effects | ...>creator_id |
| iXML | ASWG | source_id |
|
String |
UCS compliant SFX SourceID. category:Sound Effects | ...>source_id |
| iXML | ASWG | rms_power |
|
String |
RMS power of file. category:Audio Features | ...>rms_power |
| iXML | ASWG | loudness |
|
String |
Integrated loudness of file, measured with ITU-R BS1770-3 compliant metering. category:Audio Features | ...>loudness |
| iXML | ASWG | loudness_range |
|
String |
Loudness Range - EBU 3342 compliant. category:Audio Features | ...>loudness_range |
| iXML | ASWG | max_peak |
|
String |
Maximum sample value, in dBFS. category:Audio Features | ...>max_peak |
| iXML | ASWG | spec_density |
|
String |
Spectral density of file - amount of power at a standard set of frequency ranges. Freq ranges to be defined***. category:Audio Features | ...>spec_density |
| iXML | ASWG | zero_cross_rate |
|
String |
Zero Cross Rate, average frequency of entire file. category:Audio Features | ...>zero_cross_rate |
| iXML | ASWG | papr |
|
String |
Peak to average power ratio. category:Audio Features | ...>papr |
| iXML | ASWG | text |
|
String |
Dialogue: Transcript of the dialogue file. category:Dialogue | ...>text |
| iXML | ASWG | efforts |
|
String |
Dialogue: Whether the file contains efforts, dialogue or a mix of the two - True, False, Mixed. category:Dialogue | ...>efforts |
| iXML | ASWG | effort_type |
|
String |
Effort type - strain, pain. category:Dialogue | ...>effort_type |
| iXML | ASWG | projection |
|
String |
Dialogue projection level. 1- whispered, 2- spoken, 3- raised, 4- projected, 5- shouted. category:Dialogue | ...>projection |
| iXML | ASWG | language |
|
String |
Dialogue language - ISO639-1 Language Code, e.g. 'en'. category:Dialogue | ...>language |
| iXML | ASWG | timing_restriction |
|
String |
Dialogue timing restriction: wild, time, lip, na (not applicable). category:Dialogue | ...>timing_restriction |
| iXML | ASWG | character_name |
|
String |
Dialogue: Character name for dialogue files. category:Dialogue | ...>character_name |
| iXML | ASWG | character_gender |
|
String |
Dialogue: Sex/gender of character. category:Dialogue | ...>character_gender |
| iXML | ASWG | character_age |
|
String |
Dialogue: Age of (human) character. category:Dialogue | ...>character_age |
| iXML | ASWG | character_role |
|
String |
Dialogue: Whether the character is a main (significant) character or a background character: significant, background. category:Dialogue | ...>character_role |
| iXML | ASWG | actor_name |
|
String |
Dialogue: Name of actor. category:Dialogue | ...>actor_name |
| iXML | ASWG | actor_gender |
|
String |
Dialogue: Sex/gender of actor: male, female. category:Dialogue | ...>actor_gender |
| iXML | ASWG | director |
|
String |
Dialogue: Name of director. category:Dialogue | ...>director |
| iXML | ASWG | direction |
|
String |
Director’s notes, for context; explaining the scene and character motivation. category:Dialogue | ...>direction |
| iXML | ASWG | fx_used |
|
String |
Effects used on file eg. Radio. category:Dialogue | ...>fx_used |
| iXML | ASWG | usage_rights |
|
String |
Dialogue: Code for usage rights of content: *Internal. category:Dialogue | ...>usage_rights |
| iXML | ASWG | is_union |
|
String |
Dialogue: Was recording done under a union contract: true, false. category:Dialogue | ...>is_union |
| iXML | ASWG | accent |
|
String |
Regional accent of the spoken dialogue, if applicable. category:Dialogue | ...>accent |
| iXML | ASWG | emotion |
|
String |
Emotional content present in the delivery of the dialogue. category:Dialogue | ...>emotion |
| iXML | ASWG | addressee_gender |
|
String |
Gender of addressee; male/female/malegroup/femalegroup/mixedgroup. category:Dialogue | ...>addressee_gender |
| iXML | ASWG | is_formal |
|
String |
Either formal or informal, depending on the relationship between the speaker and the addressee. formal/informal. category:Dialogue | ...>is_formal |
| iXML | ASWG | dev_language |
|
String |
Original language used by developer. category:Dialogue | ...>dev_language |
| iXML | ASWG | billing_code |
|
String |
Music: project billing code. category:Music | ...>billing_code |
| iXML | ASWG | composer |
|
String |
Music: Composer. category:Music | ...>composer |
| iXML | ASWG | artist |
|
String |
Music: Name of artist. category:Music | ...>artist |
| iXML | ASWG | song_title |
|
String |
Music: Song title. category:Music | ...>song_title |
| iXML | ASWG | genre |
|
String |
Music: Genre. category:Music | ...>genre |
| iXML | ASWG | sub_genre |
|
String |
Music: Sub-genre. category:Music | ...>sub_genre |
| iXML | ASWG | producer |
|
String |
Music: Producer name. category:Music | ...>producer |
| iXML | ASWG | music_sup |
|
String |
Music: Music supervisor. category:Music | ...>music_sup |
| iXML | ASWG | instrument |
|
String |
Music: Instrument on track/stem. category:Music | ...>instrument |
| iXML | ASWG | music_publisher |
|
String |
Music: Publisher. category:Music | ...>music_publisher |
| iXML | ASWG | rights_owner |
|
String |
Music: Owner of the recorded work. category:Music | ...>rights_owner |
| iXML | ASWG | is_source |
|
String |
Music: Is this an asset as the composer delivered (source) or an edit of that source? true, false. category:Music | ...>is_source |
| iXML | ASWG | is_loop |
|
String |
Is the content loopable - true, false. category:Music | ...>is_loop |
| iXML | ASWG | intensity |
|
String |
Music: intensity. category:Music | ...>intensity |
| iXML | ASWG | is_final |
|
String |
Music: Is cue temp or final. category:Music | ...>is_final |
| iXML | ASWG | order_ref |
|
String |
Order reference of cue, if applicable *Internal. category:Music | ...>order_ref |
| iXML | ASWG | is_ost |
|
String |
Music: Is part of the Original Soundtrack. category:Music | ...>is_ost |
| iXML | ASWG | is_cinematic |
|
String |
Music: Asset is associated with a cinematic. category:Music | ...>is_cinematic |
| iXML | ASWG | is_licensed |
|
String |
Music: Asset is licensed and owned by 3rd party. category:Music | ...>is_licensed |
| iXML | ASWG | is_diegetic |
|
String |
Music: Track is diegetic in game. category:Music | ...>is_diegetic |
| iXML | ASWG | music_version |
|
String |
Music: Version number. category:Music | ...>music_version |
| iXML | ASWG | isrc_id |
|
String |
Music: ISRC code, in the format ## ### ## #####, e.g. 'UK AAA 05 00001'. category:Music | ...>isrc_id |
| iXML | ASWG | tempo |
|
String |
Music: Tempo in bpm. category:Music | ...>tempo |
| iXML | ASWG | time_sig |
|
String |
Music: Time Signature, in the form A:B, e.g. 3:4. category:Music | ...>time_sig |
| iXML | ASWG | in_key |
|
String |
Music: In key. category:Music | ...>in_key |
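Several of the String fields above document a fixed format (origination_date, ambisonic_format, language, isrc_id, time_sig). A minimal sketch of how a reader might sanity-check those values follows. The regular expressions are assumptions inferred from the descriptions in this table, not rules taken from [ASWG-G006], and `check_field` is a hypothetical helper keyed by the field names used on this page. Since every iXML parameter is optional and there is no schema, a failed check is probably best treated as a warning rather than a reason to reject the file.

```python
import re
from datetime import datetime

# Patterns inferred from the Description column above. These are
# assumptions for illustration, not normative ASWG rules.
PATTERNS = {
    "ambisonic_format": re.compile(r"\d+p|\d+h\d+[pv]"),   # 5p, 3h1v, 4h2p
    "language": re.compile(r"[a-z]{2}"),                    # ISO 639-1, e.g. en
    "isrc_id": re.compile(r"[A-Z]{2} ?[A-Z0-9]{3} ?\d{2} ?\d{5}"),
    "time_sig": re.compile(r"\d+:\d+"),                     # A:B, e.g. 3:4
}


def check_field(name: str, value: str) -> bool:
    """Return True if value looks plausible for the named field."""
    value = value.strip()
    if name == "origination_date":
        # Require zero-padded yyyy-MM-dd, then confirm it is a real date.
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
            return False
        try:
            datetime.strptime(value, "%Y-%m-%d")
        except ValueError:
            return False
        return True
    pattern = PATTERNS.get(name)
    if pattern is None:
        # No documented format: treat the field as free text.
        return True
    return pattern.fullmatch(value) is not None


for field, text in [("isrc_id", "UK AAA 05 00001"), ("time_sig", "3/4")]:
    status = "ok" if check_field(field, text) else "unexpected format"
    print(f"{field}: {text!r} -> {status}")
```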
BEXT and LOUDNESS tags
"If this object appears (redundantly) in a WAVE file (for continuity of format) it is ESSENTIAL that the values match those in the official bext chunk in the same file."
[IXML2004]
The iXML spec duplicates fields from the bext chunk in the <BEXT> and <LOUDNESS> tags. This gives a place to store BEXT data in non-WAV formats where other chunks may not exist. However, in WAV files it creates the possibility for confusion if the values differ. On top of that, the spec gives contradictory information on which to use if they're different.
For users: this is a tricky area between specifications, and applications will sometimes display different data when the duplicated fields hold different values. If you see applications disagreeing, check the metadata with a tool that shows you both versions.
For implementors focusing on maximum correctness:
- Warn users if both versions of fields exist and they have different values.
- Do write BEXT data into the bext chunk.
- Do NOT write <BEXT> and <LOUDNESS> tags when writing the iXML chunk.
However, I've heard of software which doesn't read from bext at all if iXML is present... so...
For implementors focusing on maximum compatibility:
- Warn users if both versions of fields exist and they have different values.
- Do write BEXT data into the bext chunk.
- Do write <BEXT> and <LOUDNESS> tags when writing the iXML chunk. Abort and give an error if values differ between the duplicate fields.
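Both lists start with the same step: detect when a duplicated field exists in both places with different values. A minimal sketch of that check is below. It assumes the caller has already decoded the bext chunk into a dict keyed by the matching iXML tag name (a convention invented for this sketch), and `find_duplicate_mismatches` is a hypothetical helper. The child tags in the sample (BWF_DESCRIPTION, BWF_ORIGINATOR) are illustrative; check [IXML2004] for the actual list.

```python
import xml.etree.ElementTree as ET


def find_duplicate_mismatches(ixml_text: str, chunk_values: dict) -> list:
    """Compare children of <BEXT> and <LOUDNESS> against chunk values.

    chunk_values maps an iXML tag name to the string decoded from the
    bext chunk. Returns one message per field that is present in both
    places with a different value.
    """
    root = ET.fromstring(ixml_text)
    messages = []
    for section_name in ("BEXT", "LOUDNESS"):
        section = root.find(section_name)
        if section is None:
            continue  # nothing duplicated in this section
        for child in section:
            ixml_value = (child.text or "").strip()
            chunk_value = chunk_values.get(child.tag)
            if chunk_value is None:
                continue  # field only exists in iXML
            if chunk_value.strip() != ixml_value:
                messages.append(
                    f"{child.tag}: bext chunk has {chunk_value!r}, "
                    f"iXML has {ixml_value!r}"
                )
    return messages


sample = """<BWFXML>
  <IXML_VERSION>1.51</IXML_VERSION>
  <BEXT>
    <BWF_DESCRIPTION>Scene 32A</BWF_DESCRIPTION>
    <BWF_ORIGINATOR>RecorderX</BWF_ORIGINATOR>
  </BEXT>
</BWFXML>"""
chunk = {"BWF_DESCRIPTION": "Scene 32B", "BWF_ORIGINATOR": "RecorderX"}

for message in find_duplicate_mismatches(sample, chunk):
    print("warning:", message)
```

The comparison here is a plain string match after trimming whitespace. Numeric fields such as loudness values may need to be parsed and compared as numbers so that formatting differences alone do not trigger a warning. A writer following the maximum compatibility list could call the same function before writing and abort if the result is not empty.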
Learning References
Specifications
- [IXML2004] Standard for embedded metadata in production media files (2004).
- [ASWG-G006] iXML-Extension from Sony PlayStation Studios' Audio Standards Working Group (2024).