WRID>RIFF-WAVE>iXML

iXML chunk

"All iXML parameters and objects are OPTIONAL. Readers must not assume that any particular parameters exist."

[IXML2004]

"... to ensure that no products are developed which enforce schema compatibility, it has been decided NOT to create a schema for iXML."

[IXML2004]

Buckle up, folks: you're going to see a lot of variation in iXML chunks.

iXML root tags (WRID>RIFF-WAVE>iXML)

TODO: overview

Chunk Name Bytes Type Condition Description WRID
iXML id 4 u8[4] id = "iXML" ...>id
iXML size 4 u32 The raw bytes of this chunk should be interpreted as XML, encoded in any Unicode encoding. ...>size
iXML ixml_version String The version number of the iXML specification used to prepare the iXML audio file. This version appears in the front page at http://www.ixml.info, and takes the form of x.y where x and y are whole numbers, for example 1.51 ...>ixml_version
iXML project String The name of the project to which this file belongs. This might typically be the name of the motion picture or program which is in production. ...>project
iXML scene String The name of the scene / slate being recorded. For US system this might typically be 32, 32A, 32B, A32B, 32AB etc. For UK system this might typically be an incrementing number with no letters. ...>scene
iXML tape String The SoundRoll which identifies a group of recordings. Normally, the SoundRoll is a vital component of workflow to differentiate audio recorded with time of day on different days. In other words for 2 (completely different) recordings each covering a period around 11am, the soundroll would differentiate them by (typically) telling you which shooting day this recording applies to. Some projects may turnover sound more than once per day, and increment the soundroll at this point. In any event, the soundroll should change at least once in any 24 hour period. Some systems change the soundroll for every recording which is also a valid option, in effect using the soundroll as a unique file identifier (although this function is explicitly provided with the iXML FILE_UID parameter). ...>tape
iXML take String The number of the take in the current scene or slate. Usually this will be a simple number, although variations for things like wild tracks may yield takes like 1, 2, 3, WT1, WT2 etc. ...>take
iXML take_type [Enum] (New in iXML v2.0) A dictionary based tag allowing selection from a defined list of values to explicitly categorise the type/purpose/function of the current take. This tag overlaps with the existing NO_GOOD / FALSE_START / WILD_TRACK which are deprecated in iXML 2.0. This tag can contain multiple entries, separated by commas, and can be expanded in the future with additional dictionary entries, detailed in the TAKE_TYPE dictionary. ...>take_type
iXML (map to take_type) - (Deprecated in iXML v2.0, superseded by TAKE_TYPE) This parameter allows a recorder to mark this recording as "no good" (ie, of no use whatsoever, and in effect to be deleted). The value should be TRUE or FALSE. If absent, this should be assumed FALSE. ...>no_good
iXML (map to take_type) - (Deprecated in iXML v2.0, superseded by TAKE_TYPE) This parameter allows a recorder to mark this recording as a false start, this may indicate that another file could exist with the same take number. Typically this file might also be marked as <NO_GOOD>. The value should be TRUE or FALSE. If absent, this should be assumed FALSE. ...>false_start
iXML (map to take_type) - (Deprecated in iXML v2.0, superseded by TAKE_TYPE) This parameter allows a recorder to mark this recording as a wild track, with no specific relationship to any take, although it might be marked with a specific scene, for example when recording ambience for a given location. The value should be TRUE or FALSE. If absent, this should be assumed FALSE. ...>wild_track
iXML circled bool This parameter allows a recorder to mark this recording as a circle-take. The value should be TRUE or FALSE. If absent, this should be assumed FALSE. ...>circled
iXML file_uid String A unique number which identifies this physical FILE, regardless of the number of channels etc. If your system employs a unique SoundRoll per recording, your FILE_UID and TAPE parameters should be the same. ...>file_uid
iXML ubits String The userbits associated with this recording. This may have been extracted from incoming timecode when the file was recorded, or generated by the recorder from the date, or any other metadata. Typically the userbits are rarely used now because other more explicit metadata supersedes this function. ...>ubits
iXML note String A free text note to add user metadata to the recording. This might typically be used to communicate information such as TAIL SLATE, NO SLATE, or to warn of noise interruptions - PLANE OVERHEAD etc. ...>note
iXML pre_record_samplecount String (Deprecated - do not use) This parameter allows a recorder which incorporates a pre-record buffer to communicate how much of the recording came from the pre-record buffer. This is a sample count (like timestamp) offset from the start of the file. Playback applications can, by default skip past the pre-record stage, offering the user a more intuitive playback which starts from the point where the record button was actually pressed. DEPRECATION NOTE: iXML 1.4 defines a SYNC_POINT_FUNCTION for Pre-record sample count, which is preferable to using this old iXML 1.3 deprecated root tag. For full backwards compatibility you should support both the root Pre-Record Samplecount tag, and also override it if one is found inside the Sync Point List. ...>pre_record_samplecount
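Because every parameter is optional, a reader has to treat each root tag as possibly missing. The following is a minimal Python sketch, not a reference implementation. It assumes the usual RIFF layout (4-byte id, little-endian u32 size, payload padded to an even length) and that the XML root element is BWFXML. The function names `find_ixml_chunk` and `read_root_tags` are hypothetical, and folding the deprecated NO_GOOD / FALSE_START / WILD_TRACK flags into same-named take_type entries is an assumption about how the "map to take_type" rows might be applied.

```python
import struct
import xml.etree.ElementTree as ET


def find_ixml_chunk(wav_bytes):
    """Return the raw payload of the first iXML chunk, or None if absent."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        return None
    pos = 12  # skip "RIFF", the RIFF size, and "WAVE"
    while pos + 8 <= len(wav_bytes):
        chunk_id = wav_bytes[pos:pos + 4]
        (size,) = struct.unpack_from("<I", wav_bytes, pos + 4)
        if chunk_id == b"iXML":
            return wav_bytes[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    return None


def read_root_tags(ixml_payload):
    """Read a few root tags, treating every one of them as optional."""
    # Passing bytes lets the parser honour the encoding in the XML declaration.
    root = ET.fromstring(ixml_payload)

    def text(tag):
        value = root.findtext(tag)
        return value.strip() if value else None

    def flag(tag):
        # Absent boolean tags are assumed FALSE, as the table above states.
        return (text(tag) or "FALSE").upper() == "TRUE"

    take_types = [t.strip() for t in (text("TAKE_TYPE") or "").split(",") if t.strip()]
    # Assumption: fold the deprecated boolean tags into same-named entries.
    for legacy in ("NO_GOOD", "FALSE_START", "WILD_TRACK"):
        if flag(legacy) and legacy not in take_types:
            take_types.append(legacy)

    return {
        "project": text("PROJECT"),
        "scene": text("SCENE"),
        "take": text("TAKE"),
        "tape": text("TAPE"),
        "circled": flag("CIRCLED"),
        "take_type": take_types,
    }
```

Missing tags come back as None (or FALSE for the boolean flags) rather than raising, which matches the "readers must not assume" rule quoted at the top of this page.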

SYNC_POINT_LIST tags (WRID>RIFF-WAVE>iXML>SYNC_POINT_LIST)

TODO: overview

Example:

<SYNC_POINT_LIST>
  <SYNC_POINT_COUNT>2</SYNC_POINT_COUNT>
  <SYNC_POINT>
    <SYNC_POINT_TYPE>RELATIVE</SYNC_POINT_TYPE>
    <SYNC_POINT_FUNCTION>PRE_RECORD_SAMPLECOUNT</SYNC_POINT_FUNCTION>
    <SYNC_POINT_LOW>930000</SYNC_POINT_LOW>
    <SYNC_POINT_HIGH>0</SYNC_POINT_HIGH>
    <SYNC_POINT_EVENT_DURATION>0</SYNC_POINT_EVENT_DURATION>
  </SYNC_POINT>
  <SYNC_POINT>
    <SYNC_POINT_TYPE>ABSOLUTE</SYNC_POINT_TYPE>
    <SYNC_POINT_FUNCTION>SLATE_GENERIC</SYNC_POINT_FUNCTION>
    <SYNC_POINT_COMMENT>Near start</SYNC_POINT_COMMENT>
    <SYNC_POINT_LOW>1242</SYNC_POINT_LOW>
    <SYNC_POINT_HIGH>0</SYNC_POINT_HIGH>
    <SYNC_POINT_EVENT_DURATION>0</SYNC_POINT_EVENT_DURATION>
  </SYNC_POINT>
</SYNC_POINT_LIST>
Chunk GroupName Bytes Type Description WRID
iXML SYNC_POINT_LIST sync_point_count String Should appear at the start of the SYNC POINT LIST to allow readers to prepare memory for the following list. ...>sync_point_count
iXML SYNC_POINT_LIST sync_point String Multiple sample based counts which represents a sync point for this recording. ...>sync_point
Chunk GroupName Bytes Type Description WRID
iXML SYNC_POINT point_type String Can be either RELATIVE or ABSOLUTE, which represents a sample frames count from the start of the file, or an absolute sample count since midnight. Note the relative sample count needs multiplying by the wordsize and number of channels to be translated into a byte count from the start of the file. For 32 bit sample counts, only the _LOW parameter is needed, with _HIGH set to zero. For high sample rates, the _HIGH parameter is also needed to communicate 24 hour sample counts. ...>point_type
iXML SYNC_POINT function String Determines the function of this sync point. There are a number of defined functions for sync points, indicating things like the Pre-Record Sample Count, or the primary slate. See the Sync Point Function dictionary for a list of defined functions. ...>function
iXML SYNC_POINT comment String Allows a note for each sync point to be entered, for example "camera a", "camera b", etc. ...>comment
iXML SYNC_POINT low String For 32 bit sample counts, only the _LOW parameter is needed, with _HIGH set to zero. ...>low
iXML SYNC_POINT high String For high sample rates, the _HIGH parameter is also needed to communicate 24 hour sample counts. ...>high
iXML SYNC_POINT event_duration String Allows a sync point to be a region with a defined start and stop point/duration. This can be useful for playbacks where you would like the play to stop automatically at the end of the sync event. A value of 0 in the SYNC_POINT_EVENT_DURATION implies a non-duration (marker) or unknown duration event. (Note: units for this value are not specified in iXML spec.) ...>event_duration
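The _LOW / _HIGH split and the byte-count conversion described above can be sketched in Python. This assumes _HIGH holds the upper 32 bits of a 64-bit sample count, which the table implies but does not state outright. Both function names are hypothetical.

```python
def sync_point_samples(low, high=0):
    """Combine the _LOW and _HIGH 32-bit halves into one sample count."""
    # Assumption: _HIGH is the upper 32 bits, _LOW the lower 32 bits.
    return (int(high) << 32) | (int(low) & 0xFFFFFFFF)


def relative_byte_offset(sample_frames, bytes_per_sample, channels):
    """Translate a RELATIVE sample frame count into a byte count."""
    return sample_frames * bytes_per_sample * channels
```

For the first sync point in the example above, `sync_point_samples("930000", "0")` gives 930000 sample frames. In a hypothetical 24-bit (3 bytes per sample), 2-channel file, that is 930000 * 3 * 2 = 5580000 bytes.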

SPEED tag (WRID>RIFF-WAVE>iXML>SPEED)

TODO: overview

Recommend carefully reading the SPEED section of the iXML spec for details and examples for both setting and reading these values.

Example:

<SPEED>
  <NOTE>camera overcranked</NOTE>
  <MASTER_SPEED>24/1</MASTER_SPEED>
  <CURRENT_SPEED>48/1</CURRENT_SPEED>
  <TIMECODE_FLAG>NDF</TIMECODE_FLAG>
  <TIMECODE_RATE>24000/1001</TIMECODE_RATE>
  <FILE_SAMPLE_RATE>48000</FILE_SAMPLE_RATE>
  <AUDIO_BIT_DEPTH>24</AUDIO_BIT_DEPTH>
  <DIGITIZER_SAMPLE_RATE>48048</DIGITIZER_SAMPLE_RATE>
  <TIMESTAMP_SAMPLES_SINCE_MIDNIGHT_HI>0</TIMESTAMP_SAMPLES_SINCE_MIDNIGHT_HI>
  <TIMESTAMP_SAMPLES_SINCE_MIDNIGHT_LO>48048000</TIMESTAMP_SAMPLES_SINCE_MIDNIGHT_LO>
  <TIMESTAMP_SAMPLE_RATE>48000</TIMESTAMP_SAMPLE_RATE>
</SPEED>
Chunk GroupName Bytes Type Description WRID
iXML SPEED note String User comments about speed. ...>note
iXML SPEED master_speed String Speed at which the material will be replayed. ...>master_speed
iXML SPEED current_speed String Speed of this recording. Might be different than master_speed. In slow motion for example, master_speed might be "24/1", and current_speed might be "48/1". ...>current_speed
iXML SPEED timecode_rate String Timecode frame rate used during record. Ex: 24/1, 25/1, 30000/1001, 24000/1001, 30/1 ...>timecode_rate
iXML SPEED timecode_flag String Timecode Drop Frame used during record. Ex: DF for Drop Frame, NDF for Non Drop Frame. Defaults to NDF. Useful to calculate H:m:s:f format timecode. ...>timecode_flag
iXML SPEED file_sample_rate String Duplicated from fmt chunk "for convenience". ...>file_sample_rate
iXML SPEED audio_bit_depth String Duplicated from fmt chunk "for convenience". ...>audio_bit_depth
iXML SPEED digitizer_sample_rate String True wordclock speed of the A-D convertors used during the recording. ...>digitizer_sample_rate
iXML SPEED timestamp_samples_since_midnight_hi String Duplicated from bext chunk "for convenience". ...>timestamp_samples_since_midnight_hi
iXML SPEED timestamp_samples_since_midnight_lo String Duplicated from bext chunk "for convenience". ...>timestamp_samples_since_midnight_lo
iXML SPEED timestamp_sample_rate String Sample rate used to calculate the timestamp, and which must be used in mathematic calculations to recover the timecode timestamp for the file. ...>timestamp_sample_rate

IMPORTANT - the file samplerate, audio bit depth and timestamp samples since midnight are included redundantly in the SPEED object in order to assemble all the important data in a single place for human readability when troubleshooting workflow problems. iXML Readers should ignore this information and instead take the data from the official fmt and bext chunks in the file. Generic utilities changing file data might change the fmt or bext chunk but not update the iXML SPEED tag, and for EBU officially specified BWF data like timestamp, the EBU data takes precedence (unlike the unofficial informal bext metadata which is superseded by iXML).

iXML spec
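The speed and rate values are written as "numerator/denominator" strings, so they are easiest to handle as exact fractions. The sketch below uses Python's standard `fractions` module. The function names are hypothetical, and the timestamp helper assumes the _HI value is the upper 32 bits of the sample count. Given the note above, the SPEED copies of the timestamp are better suited to troubleshooting than to being treated as authoritative.

```python
from fractions import Fraction


def parse_rate(text):
    """Parse a rate such as "24000/1001" or "48/1" into an exact Fraction."""
    return Fraction(text.strip())


def seconds_since_midnight(samples_hi, samples_lo, timestamp_sample_rate):
    """Convert the split timestamp into seconds since midnight."""
    # Assumption: _HI is the upper 32 bits of a 64-bit sample count.
    samples = (int(samples_hi) << 32) | int(samples_lo)
    return Fraction(samples, int(timestamp_sample_rate))
```

With the example values above, `parse_rate("48/1") / parse_rate("24/1")` is 2 (the recording runs at twice the replay speed), and `seconds_since_midnight("0", "48048000", "48000")` is 1001 seconds.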

LOUDNESS tag (WRID>RIFF-WAVE>iXML>LOUDNESS)

TODO: overview

Example:

<LOUDNESS>
  <LOUDNESS_VALUE></LOUDNESS_VALUE>
  <LOUDNESS_RANGE></LOUDNESS_RANGE>
  <MAX_TRUE_PEAK_LEVEL></MAX_TRUE_PEAK_LEVEL>
  <MAX_MOMENTARY_LOUDNESS></MAX_MOMENTARY_LOUDNESS>
  <MAX_SHORT_TERM_LOUDNESS></MAX_SHORT_TERM_LOUDNESS>
</LOUDNESS>
Chunk GroupName Bytes Type Description WRID
iXML LOUDNESS loudness_value String Duplicated from bext chunk. iXML body equivalent to the LoudnessValue field from the BEXT. ...>loudness_value
iXML LOUDNESS loudness_range String Duplicated from bext chunk. iXML body equivalent to the LoudnessRange field from the BEXT. ...>loudness_range
iXML LOUDNESS max_true_peak_level String Duplicated from bext chunk. iXML body equivalent to the MaxTruePeakLevel field from the BEXT. ...>max_true_peak_level
iXML LOUDNESS max_momentary_loudness String Duplicated from bext chunk. iXML body equivalent to the MaxMomentaryLoudness field from the BEXT. ...>max_momentary_loudness
iXML LOUDNESS max_short_term_loudness String Duplicated from bext chunk. iXML body equivalent to the MaxShortTermLoudness field from the BEXT. ...>max_short_term_loudness

These Loudness fields are the iXML body equivalent to the BWF_LOUDNESS_VALUE etc fields from the BEXT. Since the iXML BEXT object is intended to be a redundant copy of BEXT and of any iXML native information, it would be ignored by most iXML readers, and as such a dedicated native iXML LOUDNESS object is required.

iXML spec

FWIW... this explanation doesn't make sense to me. And it makes for yet another place BEXT data is duplicated in iXML. At least the field names match. Luckily, it seems like very few iXML writers are currently writing it. That seems like a good idea to me, since the data is already in the BEXT chunk. And if iXML was being used outside of a WAV file, there is already a full copy of BEXT data in iXML.

HISTORY tag (WRID>RIFF-WAVE>iXML>HISTORY)

The HISTORY object allows tracking of a file's origins, where it may have been created as a derivation of another file.

iXML spec

TODO: overview

Example:

<HISTORY>
	<ORIGINAL_FILENAME>myname_1.wav</ORIGINAL_FILENAME>
	<PARENT_FILENAME>myname.bwf</PARENT_FILENAME>
	<PARENT_UID>9876543210</PARENT_UID>
</HISTORY>
Chunk GroupName Bytes Type Description WRID
iXML HISTORY original_filename String Name given to this file when it was created. Using the original_filename metadata allows systems to track back to the original name if it changes. ...>original_filename
iXML HISTORY parent_filename String Identification of the source of a derived file. ...>parent_filename
iXML HISTORY parent_uid String Identification of the source of a derived file. Likely contains values from file_uid in another file. ...>parent_uid

FILE_SET tag (WRID>RIFF-WAVE>iXML>FILE_SET)

TODO: overview

When multiple files should be treated as a group, FILE_SET helps identify other members of the group.

The rules for interpreting these fields are complex; see the spec for details.

Example:

<FILE_SET>
  <TOTAL_FILES>1</TOTAL_FILES>
  <FAMILY_UID>MTIPMX17654200508051445053840000</FAMILY_UID>
  <FAMILY_NAME>21/33</FAMILY_NAME>
  <FILE_SET_INDEX>A</FILE_SET_INDEX>
</FILE_SET>
Chunk GroupName Bytes Type Description WRID
iXML FILE_SET total_files String Total number of companion files in the same set of files. ...>total_files
iXML FILE_SET family_uid String Multiple Files which represent a single recording should share a common family_uid. ...>family_uid
iXML FILE_SET family_name String Non-unique text name for the file set. ...>family_name
iXML FILE_SET index String Origination index in a group of files. For mono files, use indexes 1 to n. for multi-poly files, use letters A, B etc. It is strongly recommended that dual poly recordings should use this tag. ...>index
iXML FILE_SET start_time_hi String Rarely set. When a FILE_SET is non-coherent (tracks are enabled or disabled during a recording), each file of a FILE_SET group MUST include the additional FILE_SET tags <FILE_SET_START_TIME_HI> and <FILE_SET_START_TIME_LO> to locate the group as a whole. Uses the same counting used for the TIMESTAMP in the SPEED tag. The FILE_SET_START_TIME is used to locate the start of the FILE_SET group, with each file offset by its GROUP_OFFSET SYNC_POINT. Where a member of the FILE_SET has no GROUP_OFFSET sync point, it is assumed that this file starts with an offset of 0 from the group. ...>start_time_hi
iXML FILE_SET start_time_lo String As above. ...>start_time_lo
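As a rough illustration of how family_uid ties companion files together, the Python sketch below groups files that share a FAMILY_UID and orders each group by FILE_SET_INDEX. It deliberately ignores the more complex rules in the spec (non-coherent sets, GROUP_OFFSET sync points). The function name and the shape of the input (a dict of file name to iXML text) are assumptions made for the example.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def group_by_family(ixml_docs):
    """Group file names by FAMILY_UID; files without one are left out.

    ixml_docs maps a file name to its iXML text.
    """
    families = defaultdict(list)
    for name, xml_text in ixml_docs.items():
        root = ET.fromstring(xml_text)
        uid = root.findtext("FILE_SET/FAMILY_UID")
        if uid and uid.strip():
            index = (root.findtext("FILE_SET/FILE_SET_INDEX") or "").strip()
            families[uid.strip()].append((index, name))
    # Sort each family by (index, name) and keep only the file names.
    return {uid: [n for _, n in sorted(members)] for uid, members in families.items()}
```

Sorting is plain string ordering, which is enough for letter indexes like A, B but would need numeric handling for mono sets with ten or more files.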

TRACK_LIST tag (WRID>RIFF-WAVE>iXML>TRACK_LIST)

TODO: overview

An unambiguous way to identify track labels, track indexing (microphone source identification), multichannel / multifile relationship of tracks and track function identification.

iXML spec

Example:

<TRACK_LIST>
  <TRACK_COUNT>2</TRACK_COUNT>
  <TRACK>
    <CHANNEL_INDEX>1</CHANNEL_INDEX>
    <INTERLEAVE_INDEX>1</INTERLEAVE_INDEX>
    <NAME>Mid</NAME>
    <FUNCTION>M-MID_SIDE</FUNCTION>
  </TRACK>
  <TRACK>
    <CHANNEL_INDEX>2</CHANNEL_INDEX>
    <INTERLEAVE_INDEX>2</INTERLEAVE_INDEX>
    <NAME>Side</NAME>
    <FUNCTION>S-MID_SIDE</FUNCTION>
  </TRACK>
</TRACK_LIST>
Chunk GroupName Bytes Type Description WRID
iXML TRACK_LIST track_count String Number of TRACK objects in this group, which would normally match the number of tracks in the file. ...>track_count
Chunk GroupName Bytes Type Description WRID
iXML TRACK channel_index String Input or source number on a recorder. For an 8 channel recorder, the inputs might be identified as 1 to 8, however for a given recording, only tracks 4 and 6 may be armed. Integer, counting from 1. ...>channel_index
iXML TRACK interleave_index String The index of the track in this file. Integer, counting from 1. ...>interleave_index
iXML TRACK name String Track label. ...>name
iXML TRACK function String Purpose of the track. Ex: LEFT, RIGHT. See "FUNCTION dictionary" for standard values. ...>function
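A TRACK_LIST maps naturally onto a list of small records. The Python sketch below parses one with the standard library. The `Track` class and `parse_track_list` function are hypothetical names. Every field is optional, and the code counts the TRACK elements it actually finds rather than relying on TRACK_COUNT, which is a design choice and not something the spec mandates.

```python
from dataclasses import dataclass
from typing import List, Optional
import xml.etree.ElementTree as ET


@dataclass
class Track:
    channel_index: Optional[int]
    interleave_index: Optional[int]
    name: Optional[str]
    function: Optional[str]


def _int_or_none(text):
    """Convert text to int, returning None for missing or non-numeric values."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


def parse_track_list(track_list_xml: str) -> List[Track]:
    """Parse a <TRACK_LIST> element into Track records."""
    element = ET.fromstring(track_list_xml)
    return [
        Track(
            channel_index=_int_or_none(t.findtext("CHANNEL_INDEX")),
            interleave_index=_int_or_none(t.findtext("INTERLEAVE_INDEX")),
            name=t.findtext("NAME"),
            function=t.findtext("FUNCTION"),
        )
        for t in element.findall("TRACK")
    ]
```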

BEXT tag (WRID>RIFF-WAVE>iXML>BEXT)

TODO: overview

Broadcast WAVE files employ a bext chunk to communicate EBU standardised metadata, which complements iXML. In non WAVE files, such as AIFF, it may be desirable to also store an equivalent set of information. For this purpose iXML includes an optional BEXT object which can communicate the standard EBU bext metadata in any file employing iXML. If this object appears (redundantly) in a WAVE file (for continuity of format) it is ESSENTIAL that the values match those in the official bext chunk in the same file.

iXML spec

Example:

<BEXT>
  <BWF_DESCRIPTION>all the old stuff</BWF_DESCRIPTION>
  <BWF_ORIGINATOR>METACORDER</BWF_ORIGINATOR>
  <BWF_ORIGINATOR_REFERENCE>123456</BWF_ORIGINATOR_REFERENCE>
  <BWF_ORIGINATION_DATE>2003-10-30</BWF_ORIGINATION_DATE>
  <BWF_ORIGINATION_TIME>03:27:17</BWF_ORIGINATION_TIME>
  <BWF_TIME_REFERENCE_LOW>123674376</BWF_TIME_REFERENCE_LOW>
  <BWF_TIME_REFERENCE_HIGH>0</BWF_TIME_REFERENCE_HIGH>
  <BWF_VERSION>1.0</BWF_VERSION>
  <BWF_UMID>MTIPMX17654200508051445053840001</BWF_UMID>
  <BWF_RESERVED>00000000000000000000000000000000000000000</BWF_RESERVED>
  <BWF_CODING_HISTORY>some info</BWF_CODING_HISTORY>
</BEXT>
Chunk GroupName Bytes Type Description WRID
iXML BEXT description String Duplicated from bext chunk. ...>description
iXML BEXT originator String Duplicated from bext chunk. ...>originator
iXML BEXT originator_reference String Duplicated from bext chunk. ...>originator_reference
iXML BEXT origination_date String Duplicated from bext chunk. ...>origination_date
iXML BEXT origination_time String Duplicated from bext chunk. ...>origination_time
iXML BEXT time_reference String Duplicated from bext chunk. ...>time_reference
iXML BEXT version String Duplicated from bext chunk. ...>version
iXML BEXT umid String Duplicated from bext chunk. ...>umid
iXML BEXT loudness_value String Duplicated from bext chunk. ...>loudness_value
iXML BEXT loudness_range String Duplicated from bext chunk. ...>loudness_range
iXML BEXT max_true_peak_level String Duplicated from bext chunk. ...>max_true_peak_level
iXML BEXT max_momentary_loudness String Duplicated from bext chunk. ...>max_momentary_loudness
iXML BEXT max_short_term_loudness String Duplicated from bext chunk. ...>max_short_term_loudness
iXML BEXT reserved String Duplicated from bext chunk. ...>reserved
iXML BEXT coding_history String Duplicated from bext chunk. ...>coding_history
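Since the iXML BEXT values are required to match the real bext chunk, a validator might compare the two. The Python sketch below assumes the binary bext chunk has already been decoded elsewhere into a dict keyed by the BWF_* tag names (that decoding is not shown). The function name `bext_mismatches` and the dict shape are hypothetical.

```python
import xml.etree.ElementTree as ET


def bext_mismatches(ixml_bext_xml, bext_chunk_fields):
    """Return {tag: (ixml_value, chunk_value)} for tags that disagree.

    bext_chunk_fields holds values already decoded from the binary bext
    chunk, keyed by the same BWF_* names used in the iXML BEXT object.
    """
    element = ET.fromstring(ixml_bext_xml)
    mismatches = {}
    for child in element:
        if child.tag not in bext_chunk_fields:
            continue  # nothing to compare against
        ixml_value = (child.text or "").strip()
        chunk_value = str(bext_chunk_fields[child.tag]).strip()
        if ixml_value != chunk_value:
            mismatches[child.tag] = (ixml_value, chunk_value)
    return mismatches
```

An empty result means every tag present in both places agrees. Tags present in only one of the two are skipped rather than reported.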

USER tag (WRID>RIFF-WAVE>iXML>USER)

TODO: overview

USER is different than all of the other tags specified in iXML. It acts as both a top level "field" (the text between tags is the value) and as a container for additional user defined tags. It is allowed to have either or both kinds of data.

The USER field is completely defined by the user, hardware or application, it has no defined function, and may contain any kind of human readable data. It is intended that this field be used (compatible with a defined schema) to store miscellaneous information, which is not appropriate for any other field. Applications are free to sub-divide this field with tagging systems like the old bext description, although typically this field is designed to be human readable rather than machine readable, so any tagging should be based on interpretation of human readable, neat text. iXML viewing applications will typically display the entire USER field in one text area. One of the primary functions of this field would typically be to allow extended information about the recording process used, and personnel involved with a field recording. This is typically not file-specific but will be the same for a whole group of recordings, and which appears in the iXML of all recordings. It is ideally suited to storage of the metadata which would normally appear at the top of a sound report, ie. the name of the mixer, contact details etc. UPDATE: iXML v2.0 - the USER field can now contain XML tagged data, providing more explicit and machine readable information, using the list of fields recommended by AFSI.

iXML spec

Example:

<USER>
  Production : iXML Test Movie Production
  Mixer : Mark Gilbert
  Recorder : MetaCorder 1.5
  Contact : fieldsound@gallery.co.uk
  Location : Leavesden Studios Sound Stage 5
  Day : 5
  Reference Level : -20dBf
  Microphones : Sennheiser MKH-70, Sanken COS-11
  <PRODUCTION_NAME>iXML Test Movie Production</PRODUCTION_NAME>
  <SOUND_MIXER_NAME>Mark Gilbert</SOUND_MIXER_NAME>
</USER>
Chunk GroupName Bytes Type Description WRID
iXML USER text String The USER field is completely defined by the user, hardware or application, it has no defined function, and may contain any kind of human readable data. It is intended that this field be used (compatible with a defined schema) to store miscellaneous information, which is not appropriate for any other field. Applications are free to sub-divide this field with tagging systems like the old bext description, although typically this field is designed to be human readable rather than machine readable, so any tagging should be based on interpretation of human readable, neat text. iXML viewing applications will typically display the entire USER field in one text area. One of the primary functions of this field would typically be to allow extended information about the recording process used, and personnel involved with a field recording. This is typically not file-specific but will be the same for a whole group of recordings, and which appears in the iXML of all recordings. It is ideally suited to storage of the metadata which would normally appear at the top of a sound report, ie. the name of the mixer, contact details etc. UPDATE: iXML v2.0 - the USER field can now contain XML tagged data, providing more explicit and machine readable information, using the list of fields recommended by AFSI (https://www.afsi.eu/articles/30623-compte-rendu-du-groupe-de-travail-afsi-sur-les-rapports-son). ...>
iXML USER full_title String No description in spec. ...>full_title
iXML USER director_name String No description in spec. ...>director_name
iXML USER production_name String No description in spec. ...>production_name
iXML USER production_address String No description in spec. ...>production_address
iXML USER production_email String No description in spec. ...>production_email
iXML USER production_phone String No description in spec. ...>production_phone
iXML USER production_note String No description in spec. ...>production_note
iXML USER sound_mixer_name String No description in spec. ...>sound_mixer_name
iXML USER sound_mixer_address String No description in spec. ...>sound_mixer_address
iXML USER sound_mixer_email String No description in spec. ...>sound_mixer_email
iXML USER sound_mixer_phone String No description in spec. ...>sound_mixer_phone
iXML USER sound_mixer_note String No description in spec. ...>sound_mixer_note
iXML USER audio_recorder_model String No description in spec. ...>audio_recorder_model
iXML USER audio_recorder_serial_number String No description in spec. ...>audio_recorder_serial_number
iXML USER audio_recorder_firmware String No description in spec. ...>audio_recorder_firmware
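Because USER can hold free text, child tags, or both, a reader needs to separate the two. In ElementTree terms the free text is the element's own `text` plus the `tail` after each child. The Python sketch below shows one way to do that split. The function name `split_user` is hypothetical.

```python
import xml.etree.ElementTree as ET


def split_user(user_xml):
    """Return (free_text, tagged_fields) for a <USER> element."""
    user = ET.fromstring(user_xml)
    # Free text is the element's own text plus the tail after each child.
    pieces = [user.text or ""] + [child.tail or "" for child in user]
    lines = [line.strip() for line in "".join(pieces).splitlines()]
    free_text = "\n".join(line for line in lines if line)
    tagged = {child.tag: (child.text or "").strip() for child in user}
    return free_text, tagged
```

Applied to the example above, the free text would hold the "Production : ..." through "Microphones : ..." lines and the dict would hold PRODUCTION_NAME and SOUND_MIXER_NAME.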

LOCATION tag (WRID>RIFF-WAVE>iXML>LOCATION)

TODO: overview

The LOCATION object group is designed to hold machine readable information about the location this recording was made in, in particular to support geotagging of recordings. LOCATION_GPS is specified in standard decimal form: latitude, longitude.

iXML spec

Example:

<LOCATION>
  <LOCATION_NAME>Human readable description of location</LOCATION_NAME>
  <LOCATION_GPS>47.756787, -123.729977</LOCATION_GPS>
  <LOCATION_ALTITUDE></LOCATION_ALTITUDE>
  <LOCATION_TYPE>[dictionary]</LOCATION_TYPE>
  <LOCATION_TIME>[dictionary]</LOCATION_TIME>
</LOCATION>
Chunk Name Bytes Type Description WRID
iXML name String Human readable description of location. ...>name
iXML gps String Specified in standard decimal form: latitude, longitude, ex: 47.756787, -123.729977 ...>gps
iXML altitude String No description in spec. ...>altitude
iXML type String Contains one or more of the following values to describe the environment in which the recording was made. In the future additional items may be added to this dictionary and as shown below multiple entries should be comma delimited. Ex: INT EXT INT,EXT ...>type
iXML time String Contains one or more of the following values to describe the time in which the recording was made, in terms of subjective description, rather than literal time, which is represented with timestamps. Multiple entries should be comma delimited. ...>time

See http://www.gallery.co.uk/ixml/ for details on the two dictionary fields.
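The GPS value and the two dictionary fields are all comma-delimited strings, so they need a little parsing before use. The Python sketch below shows one approach. Both function names are hypothetical, and the latitude/longitude range check is an extra sanity check added for the example, not a rule stated in the spec.

```python
def parse_location_gps(text):
    """Parse "latitude, longitude" into a pair of floats, or None if malformed."""
    parts = [p.strip() for p in (text or "").split(",")]
    if len(parts) != 2:
        return None
    try:
        lat, lon = float(parts[0]), float(parts[1])
    except ValueError:
        return None
    # Assumption: reject values outside the usual decimal-degree ranges.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return lat, lon


def parse_dictionary_list(text):
    """Split a comma-delimited dictionary field such as "INT,EXT"."""
    return [item.strip() for item in (text or "").split(",") if item.strip()]
```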

ASWG tags (WRID>RIFF-WAVE>iXML>ASWG)

TODO: overview

"Sony PlayStation Studios' Audio Standards Working Group - iXML Extension

The extension is designed to provide developers of interactive audio content and audio researchers the ability to store production and research related metadata within the BWFXML chunk of a Broadcast Wave file, describing its contents and other related information.

The specification defines fields relating to metadata that can be used within interactive media development applications and workflows as well as machine learning and deep learning feature sets.

The extension contains fields covering sound effects, music, dialogue and audio-driven haptic content, as well as more general project information.

Anyone may include the ASWG object and any of the tags contained in this document, if they find them useful. Please do not include any tags not listed in this document within the ASWG object." -- ASWG-G006 - iXML Extension Specification v1.1.pdf

At the time this section (ASWG) was last updated, there were two published versions of ASWG: 1.0 and 1.1. Version 1.1 appears only to add new fields, and since all fields in iXML are optional, we're documenting the 1.1 fields below.

Chunk GroupName Bytes Type Description WRID
iXML ASWG content_type String Content Type (sfx/music/dialog/haptic/impulse/mixed). category:General ...>content_type
iXML ASWG project String Project name asset was developed for. category:General ...>project
iXML ASWG originator String Designer. category:General ...>originator
iXML ASWG originator_studio String Name of originating studio. category:General ...>originator_studio
iXML ASWG notes String General information not covered in other fields. category:General ...>notes
iXML ASWG session String Application (Pro Tools/Reaper etc.) session name. category:General ...>session
iXML ASWG state String File version: mastered, processed, raw, placeholder. category:General ...>state
iXML ASWG editor String Name of editor. category:General ...>editor
iXML ASWG mixer String Mix engineer. category:General ...>mixer
iXML ASWG fx_chain_name String Name of FX chain used on file, Reaper chain name, for example. category:General ...>fx_chain_name
iXML ASWG is_generated String Content is AI generated, or contains elements/sections that are AI generated. true/false. category:General ...>is_generated
iXML ASWG mastering_engineer String Name of the mastering engineer. category:General ...>mastering_engineer
iXML ASWG origination_date String Date of original upload of asset in format yyyy-MM-dd. category:General ...>origination_date
iXML ASWG channel_config String Channel configuration of the file: mono, stereo, LCR, Quad, 5.0, 5.1, 7.0, 7.1, 12.2, ambisonic. category:Format ...>channel_config
iXML ASWG ambisonic_format String Ambisonic format: #p, #h#p, #h#v. eg: 5p, 3h1v, 4h2p. category:Format ...>ambisonic_format
iXML ASWG ambisonic_chn_order String Ambisonic channel order: fuma, acn. category:Format ...>ambisonic_chn_order
iXML ASWG ambisonic_norm String Ambisonic normalization: sn3d, maxn, n3d. category:Format ...>ambisonic_norm
iXML ASWG mic_type String Microphone(s) used. Where multiple mics used, prefix with channel number: 1-Neumann U87i, 2-AKG C414. category:Recording ...>mic_type
iXML ASWG mic_config String Microphone configuration: Mono, AB, XY, ORTF, MS. category:Recording ...>mic_config
iXML ASWG mic_distance String Microphone distance in meters OR headmounted - 1m, 2m, 0.3m, head. category:Recording ...>mic_distance
iXML ASWG recording_loc String Recording location. category:Recording ...>recording_loc
iXML ASWG is_designed String SFX: Is the sound designed, or is it a raw recording - true if designed, false if raw recording. category:Recording ...>is_designed
iXML ASWG rec_engineer String Name of the recording engineer. category:Recording ...>rec_engineer
iXML ASWG rec_studio String Music: Recording Studio. category:Recording ...>rec_studio
iXML ASWG impulse_location String Impulse: Location of impulse. category:Impulse ...>impulse_location
iXML ASWG category String UCS compliant SFX category. category:Sound Effects ...>category
iXML ASWG sub_category String UCS compliant SFX sub-category. category:Sound Effects ...>sub_category
iXML ASWG cat_id String UCS compliant SFX category ID. category:Sound Effects ...>cat_id
iXML ASWG user_category String UCS compliant user category. category:Sound Effects ...>user_category
iXML ASWG user_data String UCS compliant user data. category:Sound Effects ...>user_data
iXML ASWG vendor_category String UCS compliant vendor category. category:Sound Effects ...>vendor_category
iXML ASWG fx_name String UCS compliant FX name. category:Sound Effects ...>fx_name
iXML ASWG library String UCS compliant library. category:Sound Effects ...>library
iXML ASWG creator_id String UCS compliant SFX creator/publisher. category:Sound Effects ...>creator_id
iXML ASWG source_id String UCS compliant SFX SourceID. category:Sound Effects ...>source_id
iXML ASWG rms_power String RMS power of file. category:Audio Features ...>rms_power
iXML ASWG loudness String Integrated loudness of file, measured with ITU-R BS1770-3 compliant metering. category:Audio Features ...>loudness
iXML ASWG loudness_range String Loudness Range - EBU 3342 compliant. category:Audio Features ...>loudness_range
iXML ASWG max_peak String Maximum sample value, in dBFS. category:Audio Features ...>max_peak
iXML ASWG spec_density String Spectral density of file - amount of power at a standard set of frequency ranges. Freq ranges to be defined***. category:Audio Features ...>spec_density
iXML ASWG zero_cross_rate String Zero Cross Rate, average frequency of entire file. category:Audio Features ...>zero_cross_rate
iXML ASWG papr String Peak to average power ratio. category:Audio Features ...>papr
iXML ASWG text String Dialogue: Transcript of the dialogue file. category:Dialogue ...>text
iXML ASWG efforts String Dialogue: Whether the file contains efforts, dialogue or a mix of the two - True, False, Mixed. category:Dialogue ...>efforts
iXML ASWG effort_type String Effort type - strain, pain. category:Dialogue ...>effort_type
iXML ASWG projection String Dialogue projection level. 1- whispered, 2- spoken, 3- raised, 4- projected, 5- shouted. category:Dialogue ...>projection
iXML ASWG language String Dialogue language - ISO 639-1 language code, format ## e.g. 'en'. category:Dialogue ...>language
iXML ASWG timing_restriction String Dialogue timing restriction: wild, time, lip, na (not applicable). category:Dialogue ...>timing_restriction
iXML ASWG character_name String Dialogue: Character name for dialogue files. category:Dialogue ...>character_name
iXML ASWG character_gender String Dialogue: Sex/gender of character. category:Dialogue ...>character_gender
iXML ASWG character_age String Dialogue: Age of (human) character. category:Dialogue ...>character_age
iXML ASWG character_role String Dialogue: Whether the character is a main (significant) character or a background character: significant, background. category:Dialogue ...>character_role
iXML ASWG actor_name String Dialogue: Name of actor. category:Dialogue ...>actor_name
iXML ASWG actor_gender String Dialogue: Sex/gender of actor: male, female. category:Dialogue ...>actor_gender
iXML ASWG director String Dialogue: Name of director. category:Dialogue ...>director
iXML ASWG direction String Director’s notes, for context; explaining the scene and character motivation. category:Dialogue ...>direction
iXML ASWG fx_used String Effects used on file eg. Radio. category:Dialogue ...>fx_used
iXML ASWG usage_rights String Dialogue: Code for usage rights of content: *Internal. category:Dialogue ...>usage_rights
iXML ASWG is_union String Dialogue: Was recording done under a union contract: true, false. category:Dialogue ...>is_union
iXML ASWG accent String Regional accent of the spoken dialogue, if applicable. category:Dialogue ...>accent
iXML ASWG emotion String Emotional content present in the delivery of the dialogue. category:Dialogue ...>emotion
iXML ASWG addressee_gender String Gender of addressee; male/female/malegroup/femalegroup/mixedgroup. category:Dialogue ...>addressee_gender
iXML ASWG is_formal String Either formal or informal, depending on the relationship between the speaker and the addressee. formal/informal. category:Dialogue ...>is_formal
iXML ASWG dev_language String Original language used by developer. category:Dialogue ...>dev_language
iXML ASWG billing_code String Music: project billing code. category:Music ...>billing_code
iXML ASWG composer String Music: Composer. category:Music ...>composer
iXML ASWG artist String Music: Name of artist. category:Music ...>artist
iXML ASWG song_title String Music: Song title. category:Music ...>song_title
iXML ASWG genre String Music: Genre. category:Music ...>genre
iXML ASWG sub_genre String Music: Sub-genre. category:Music ...>sub_genre
iXML ASWG producer String Music: Producer name. category:Music ...>producer
iXML ASWG music_sup String Music: Music supervisor. category:Music ...>music_sup
iXML ASWG instrument String Music: Instrument on track/stem. category:Music ...>instrument
iXML ASWG music_publisher String Music: Publisher. category:Music ...>music_publisher
iXML ASWG rights_owner String Music: Owner of the recorded work. category:Music ...>rights_owner
iXML ASWG is_source String Music: Is this an asset as the composer delivered (source) or an edit of that source? true, false. category:Music ...>is_source
iXML ASWG is_loop String Is the content loopable - true, false. category:Music ...>is_loop
iXML ASWG intensity String Music: intensity. category:Music ...>intensity
iXML ASWG is_final String Music: Is cue temp or final. category:Music ...>is_final
iXML ASWG order_ref String Order reference of cue, if applicable *Internal. category:Music ...>order_ref
iXML ASWG is_ost String Music: Is part of the Original Soundtrack. category:Music ...>is_ost
iXML ASWG is_cinematic String Music: Asset is associated with a cinematic. category:Music ...>is_cinematic
iXML ASWG is_licensed String Music: Asset is licensed and owned by 3rd party. category:Music ...>is_licensed
iXML ASWG is_diegetic String Music: Track is diegetic in game. category:Music ...>is_diegetic
iXML ASWG music_version String Music: Version number. category:Music ...>music_version
iXML ASWG isrc_id String Music: ISRC code, format ## ### ## ##### e.g. 'UK AAA 05 00001'. category:Music ...>isrc_id
iXML ASWG tempo String Music: Tempo in bpm. category:Music ...>tempo
iXML ASWG time_sig String Music: Time signature, format A:B e.g. '3:4'. category:Music ...>time_sig
iXML ASWG in_key String Music: In key. category:Music ...>in_key
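
Several of the fields above carry a format hint or a fixed list of values. Since iXML deliberately has no schema, a reader can only treat these as soft checks. The sketch below shows one way to flag suspicious values. It assumes a hypothetical `fields` dict keyed by the field names used on this page (the actual XML tag names may differ), and the regular expressions are my interpretation of the hints (for example, reading `##` in the ISRC format as two capital letters), not something the spec defines.

```python
import re

# Patterns derived from the format hints in the table above (assumptions).
# Failures should be reported as warnings, never used to reject a file.
CHECKS = {
    "language": re.compile(r"[A-Za-z]{2}"),                      # ISO 639-1, e.g. 'en'
    "isrc_id": re.compile(r"[A-Z]{2} [A-Z0-9]{3} \d{2} \d{5}"),  # e.g. 'UK AAA 05 00001'
    "time_sig": re.compile(r"\d+:\d+"),                          # e.g. '3:4'
    "projection": re.compile(r"[1-5]"),                          # 1 whispered .. 5 shouted
    "timing_restriction": re.compile(r"wild|time|lip|na"),
}


def check_aswg_fields(fields):
    """Return the names of fields whose values do not match the expected format."""
    problems = []
    for name, pattern in CHECKS.items():
        value = fields.get(name)
        # Every field is optional, so a missing value is never a problem.
        if value is not None and not pattern.fullmatch(value.strip()):
            problems.append(name)
    return problems


example = {
    "language": "en",
    "isrc_id": "UK AAA 05 00001",
    "time_sig": "3:4",
    "projection": "6",  # outside the documented 1-5 range
}
print(check_aswg_fields(example))
```

Fields not listed in `CHECKS` are ignored, which keeps the check tolerant of the variation you will see in real files.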

BEXT and LOUDNESS tags

"If this object appears (redundantly) in a WAVE file (for continuity of format) it is ESSENTIAL that the values match those in the official bext chunk in the same file."

[IXML2004]

The iXML spec duplicates fields from the bext chunk in the <BEXT> and <LOUDNESS> tags. This gives a place to store BEXT data in non-WAV formats where other chunks may not exist. However, in WAV files it creates the possibility for confusion if the values differ. On top of that, the spec gives contradictory information on which to use if they're different.

For users, be aware that this is a tricky area between specifications: different applications will sometimes display different data when values differ between these duplicated fields. If you see applications showing different data, check the metadata with a tool which will show you both.

For implementors focusing on maximum correctness:

  • Warn users if both versions of fields exist and they have different values.
  • Do write BEXT data into the bext chunk.
  • Do NOT write <BEXT> and <LOUDNESS> tags when writing the iXML chunk.
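
The "warn if values differ" step could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: it assumes the iXML chunk has already been decoded to a string with a `BWFXML` root, that the bext chunk has been parsed into a hypothetical `bext_fields` dict, and that the `<BEXT>` child tag names in the mapping match the ones in the iXML spec (check them against the spec before relying on them).

```python
import xml.etree.ElementTree as ET

# Assumed mapping from iXML <BEXT> child tags to keys of an already-parsed
# bext chunk. The dict keys are hypothetical; extend the mapping as needed.
IXML_TO_BEXT = {
    "BWF_DESCRIPTION": "description",
    "BWF_ORIGINATOR": "originator",
    "BWF_ORIGINATOR_REFERENCE": "originator_reference",
    "BWF_ORIGINATION_DATE": "origination_date",
    "BWF_ORIGINATION_TIME": "origination_time",
}


def find_bext_mismatches(ixml_text, bext_fields):
    """Return (tag, ixml_value, bext_value) for each duplicated field that differs."""
    root = ET.fromstring(ixml_text)
    bext_elem = root.find("BEXT")
    if bext_elem is None:
        return []  # no duplicated fields, so nothing can disagree
    mismatches = []
    for tag, key in IXML_TO_BEXT.items():
        child = bext_elem.find(tag)
        if child is None or key not in bext_fields:
            continue  # everything in iXML is optional
        ixml_value = (child.text or "").strip()
        bext_value = str(bext_fields[key]).strip()
        if ixml_value != bext_value:
            mismatches.append((tag, ixml_value, bext_value))
    return mismatches


sample = (
    "<BWFXML><BEXT>"
    "<BWF_ORIGINATOR>RecorderA</BWF_ORIGINATOR>"
    "<BWF_ORIGINATION_DATE>2024-01-02</BWF_ORIGINATION_DATE>"
    "</BEXT></BWFXML>"
)
bext = {"originator": "RecorderB", "origination_date": "2024-01-02"}
for tag, ixml_value, bext_value in find_bext_mismatches(sample, bext):
    print(f"warning: {tag} differs: iXML={ixml_value!r} bext={bext_value!r}")
```

Returning the mismatches rather than raising leaves it to the caller to decide how to surface the warning to the user.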

However, I've heard of software which doesn't read from bext at all if iXML is present... so...

For implementors focusing on maximum compatibility:

  • Warn users if both versions of fields exist and they have different values.
  • Do write BEXT data into the bext chunk.
  • Do write <BEXT> and <LOUDNESS> tags when writing the iXML chunk. Abort and give an error if values differ between the duplicate fields.
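
For the compatibility-focused approach, one possible shape for the write step is sketched below: copy the bext values into `<BEXT>`, and abort if a tag that is already present holds a different value. The `bext_fields` dict and its keys are hypothetical, the tag names are assumed to match the iXML spec, and only a few fields are shown.

```python
import xml.etree.ElementTree as ET

# Assumed tag names for the duplicated fields; the dict keys are hypothetical.
BEXT_TAGS = {
    "description": "BWF_DESCRIPTION",
    "originator": "BWF_ORIGINATOR",
    "origination_date": "BWF_ORIGINATION_DATE",
}


def sync_bext_into_ixml(root, bext_fields):
    """Copy bext values into <BEXT>, refusing to overwrite conflicting values."""
    bext_elem = root.find("BEXT")
    if bext_elem is None:
        bext_elem = ET.SubElement(root, "BEXT")
    for key, tag in BEXT_TAGS.items():
        if key not in bext_fields:
            continue  # nothing to duplicate for this field
        value = str(bext_fields[key])
        child = bext_elem.find(tag)
        if child is None:
            child = ET.SubElement(bext_elem, tag)
        elif (child.text or "") != value:
            # Abort rather than silently pick a winner.
            raise ValueError(f"{tag}: iXML has {child.text!r}, bext has {value!r}")
        child.text = value
    return root


root = ET.fromstring(
    "<BWFXML><BEXT><BWF_ORIGINATOR>RecorderA</BWF_ORIGINATOR></BEXT></BWFXML>"
)
sync_bext_into_ixml(root, {"originator": "RecorderA", "description": "Take 1"})
print(ET.tostring(root, encoding="unicode"))
```

Treating the bext chunk as the source of truth means the duplicated tags match by construction, and the only failure case is a pre-existing conflict.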

Learning References

Specifications

  • [IXML2004] Standard for embedded metadata in production media files (2004).
  • [ASWG-G006] iXML-Extension from Sony PlayStation Studios' Audio Standards Working Group (2024).