Preservation of 'Born-Digital' Audiovisual Files

Abstract

This article describes a possible workflow and strategy for ingesting audiovisual files (audio, video, film, etc.) in order to preserve them for as long as possible, and in a way that allows future users to open them under unknown, future technology conditions.

It discusses the properties of file formats, and their impact on preservability and quality. The concept of a feasible and practically proven workflow is also described so that others can reproduce it.

File Format Considerations

Here we'll take a look at which properties of a data format should be considered when Digital Long-Term Preservation (DLTP) of the content is the goal.

Let's assume "long-term" means "infinity +1" - just to be on the safe side ;)

Then we'll try to get as close to that goal as possible. Since it is very unrealistic that any file format or technology will be around and known forever, it makes sense to include the requirement of "Eternal Migration".

Eternal Migration is the concept of always having an exit plan from the current format or technology to a proper successor - not just once, but repeating this planning every time a new format (or technology) is adopted.

Unless we have a time machine or crystal ball to look into the future, we can only make assumptions, extrapolated from previous experience and know-how from other formats with similar properties or situations. The best we can do is to at least reduce the number of additional, often man-made obstacles or issues that hinder future accessibility of the object.

Long story short: There is no one-size-fits-all or silver bullet, but we can increase the chance that we did our job well.

We need to provide future archivists and users with as much of the "meta"-stuff as they need to get replayer X working with "Unknown Future Technology 3000", open our "weird" digital carriers, and properly play back the videos stored on them.

To quote "Impact of Technology-Licensing on Archiving":

"Therefore, proprietary formats should be avoided for use in long-term preservation. Technology or formats that are restrictively licensed, patented or solely available from one vendor, can generally be expected to cause problems or obstacles for future generations. In order to avoid preservation-hindering scenarios, demanding commercially supported FOSS instead might be a perfectly suitable improvement."

And:

"These requirements shall be applied to all data formats used within a single file or data set."

Compression: Lossy, Lossless, Uncompressed

This section is an overview of different compression methods for audiovisual media.

The more often I say these words in a row - "lossy, lossless, uncompressed" - the more it sounds a bit like a witch's spell. I hope it's not one, because this is such an important topic for everyone who has to do with digital audiovisual files that we'll keep hearing these three words a lot in the near future.

Why? Because lossy compression is currently the most convenient, while lossless or uncompressed are still not even widely supported or considered by developers, vendors and even teachers, archivists or producers.

The default is to think lossy. And assume "it just is the only way".

There's a good reason why lossy is so omnipresent: uncompressed or lossless files are still way too huge, and often require more processing power or faster storage devices and processing pipelines. That adds to costs. Considerably.

But it's still worth it: if you can, go for lossless or uncompressed for preservation (or editing), in order not to add additional generation/quality loss. Again, both have their pros and cons. It's often overlooked, by the way, that each form of "uncompressed" requires a different codec.

Uncompressed data is easier to understand without any specification; it can easily be reverse-engineered by brute force.

Lossless codecs may offer additional features, such as error-resilience properties which can help to restore copies that have survived into the future.

It should technically be (easily?) possible to create an uncompressed codec with error-resilience data embedded, but for the time being this is neither common nor widely implemented. As far as I know, using Matroska (.mkv) as the container format might add some resilience data. Not widely implemented or known either, but H.264 (MPEG-4 AVC) has an uncompressed mode that might make the bitstream more robust. Maybe "Uncompressed FFV1"? ;) Why not.

But please be aware that, since long-term preservation requires Eternal Migration, it's necessary to avoid generation loss wherever possible. I already wish I had not thrown away the original DV files of my own family videos. Back then I thought Video CDs (VCDs) were the future. I was wrong. And now I need to transcode (and lose even more quality) in order to watch them on my Raspberry Pi.

Thanks to 100fps.com, I also learned my lesson that deinterlacing can greatly hurt the image.

Since only FOSS licensed programs and open formats are used in the workflows mentioned here, I can easily apply professional preservation standards to my own private collections.

I've also learned some lessons from audio production that might save some trouble with video. Such as: how do I preserve my Digital Audio Workstation (DAW) editing projects for future edits? One lifesaver: hashcodes and proper backups. And testing the restore before you need it.

Container

Almost every audiovisual format has a container. This is a wrapper around the actual content, such as audio or video streams. It's often the place where technical and descriptive metadata is stored - for example the image resolution, the audio samplerate, or author and title.
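By the way: a quick way to peek at both the technical and the descriptive metadata is FFmpeg's companion tool "ffprobe" (more on FFmpeg later) - a sketch, with the filename as a placeholder:

$ ffprobe -show_format -show_streams VIDEO_IN.mkv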

The container also holds the data necessary for keeping all streams that are to be played simultaneously in sync. In contrast to analog carriers, which could literally contain streams of video and audio data physically locked to each other - and therefore never running out of sync - digital streams are separately stored chunks of audio and video data, fitted with timestamps that tell the replayer which chunks to align. It seems to be a tough job.

Additionally, containers can also hold time-based metadata tracks/streams, like timecode or subtitles for example. Even whole files can be embedded in an audiovisual container. RAWcooked, for example, uses this feature to create something like a "ZIP for film" by packing non-audiovisual metadata files into the video file.
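If I recall its basic command-line usage correctly, packing such a scanned image sequence (plus its sidecar files) into a Matroska file is a one-liner - the folder name here is a placeholder:

$ rawcooked SCAN_FOLDER/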

There are numerous audio/video container formats in existence - and used.

Some of the most common ones are:

  • Video
    • MKV (Matroska)
    • AVI (Audio Video Interleave)
    • MOV (Quicktime)
    • FLV (Flash Video)
    • MP4,M4V (MPEG-4 Video)
    • ...
  • Audio
    • WAV (RIFF Container)
    • MP3 (MPEG-1, Layer 3)
    • OGG (Ogg)
    • M4A,AAC (MPEG-4 Audio)
    • ...

Since this data format defines the file extension, it is often the term used to say "which audio/video format" a file is. This, however, only tells you which container is used.

Since the container is like a folder that contains the audio, video and metadata, it can be modified, augmented or even changed to a different container format without necessarily modifying the actual payload inside (audio, video, etc). This is great for DLTP, as it allows us to migrate from the source container format to one that is more suitable. For example, Matroska (MKV) contains error-resilience data in its structure, which makes it possible to detect and even correct bit errors.

This method is often used to "rewrap" or "remultiplex" (="remux" for short) from a proprietary container to an Open Standard format.

Unfortunately, not all applications support all containers equally. Even if an application supports, for example, "MOV", that doesn't mean it supports all versions, flavors and features of it. Sometimes not at all, sometimes improperly - thereby producing "dialect" versions of a format (language), which is a major reason for interoperability issues and quirks.

It can even make sense to rewrap the container to the same format: different programs or cameras that create media files may have implementation variations that cause "dialects", so files from different sources sometimes behave differently. A common issue, for example, is audio and video running out of sync.

If you rewrap the container - say, ".mp4" to ".mp4" - you use one and the same program (e.g. FFmpeg), and therefore one and the same "dialect" or "flavor", for creating your preservation copies. This gives you the ability to test whether your encoder understands the source "dialect" properly and produces specification-conformant output - thereby (re-)gaining control over your archive material. You may also be able to augment the newly generated container with error-correction bits or additional metadata.
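As a minimal sketch of such a same-format rewrap (filenames are placeholders; the parameters are explained in the workflow section below):

$ ffmpeg -i VIDEO_IN.mp4 -c copy VIDEO_OUT.mp4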

However, if you rewrap or modify the container, the hashcode (e.g. MD5) of that file changes of course, and must therefore be recalculated for future fixity checks.
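Recalculating and later verifying the checksum can be as simple as this sketch (on GNU/Linux; the filename is a placeholder):

$ md5sum VIDEO_OUT.mkv > VIDEO_OUT.mkv.md5
$ md5sum -c VIDEO_OUT.mkv.md5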

See the section "Rewrapping the Container" for details on how to do it practically.

Video

The video is stored as "stream" inside the container.

It not only contains the actual video/film image data, but also technical metadata, such as aspect ratio, framerate or error-correction data.

If we're lucky, the values of the metadata fields that container and codec have in common match. Unfortunately, I've seen too many files where this wasn't the case, so I cannot say mismatches are highly unlikely.

For proper DLTP, the principle of preferring FOSS and avoiding proprietary formats applies here too. So check whether your source video codec has the right licensing and specification properties.

If you want to convert (=transcode) to a different videocodec, you need to check if the target codec is able to preserve the significant properties, while maintaining the quality/fidelity closest to the original source.

Lossy, Lossless, Uncompressed

For example, if you transcode from a lossy codec (e.g. MPEG-2) to a lossy codec (e.g. MPEG-4 AVC/H.264), you preserve the significant properties (yay!) but add one generation loss to the image quality (oooh...).

The only way to avoid this generation loss, is to transcode to a lossless target format. This can either be "mathematically-losslessly compressed" ("lossless" for short) - or not compressed at all: literally "uncompressed".

A common assumption is that "uncompressed" is always the same form, supported by all applications. It is not.

Although uncompressed or lossless is by definition always the best possible option for preservation/editing, there's a reason why it's the exception and lossy is the rule.

Because video is HUGE (and fast camera storage expensive).

Uncompressed PAL SD material at 8 bits per component (bpc) comes to approximately 1.4 GB per minute. Imagine 4K at 16 bpc... See the section "File Size Matters" for details on this.
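To make that figure tangible, here's the back-of-the-envelope arithmetic - a sketch assuming 4:2:2 chroma subsampling at 8 bpc (2 bytes per pixel), 720x576 pixels and 25 frames per second:

$ echo "720 * 576 * 2 * 25 * 60" | bc

That prints 1244160000, i.e. roughly 1.2 GB of video data per minute; audio, container overhead or a higher bit-depth push it further towards the figure quoted above.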

Therefore, if the source codec is lossy, it would be nice if we could just keep it in this format. It's waaay smaller.

But if the source format is not an Open Standard (or at least has its de-facto- & reference-implementation available under a FOSS-license), we should transcode.

Audio

The audio is stored as "stream" inside the container.

Luckily, digital audio is well-charted territory. Audio engineers have always pioneered their profession ahead of video ;P Just kidding. I just like to remind myself every now and then not to forget about audio when dealing with video/film.

Audio has feelings too, you know.

Another great thing about audio is that we can now easily handle its file sizes. Compared with digital moving-image size considerations, audio sizes are a joy to deal with. See "File Size Matters" below for details on this.

Lossy, Lossless, Uncompressed

Whereas video is currently "lossy by default" - even in production and professional use - it's the opposite with audio.

This is also the reason, by the way, why high-quality, high-accuracy analog-to-digital converters (ADCs) are actively manufactured, supported and available for audio, while finding good ADCs for tape-based video sources is getting harder: video is now digital - and lossy - straight from the source, the camera. Audio, on the other hand, still has an analogue part in its signal chain as daily business.

Again, if you can: Store uncompressed or lossless.

The size doesn't really matter much in this case (because uncompressed audio is no problem today), so currently you're quite safe transcoding audio to uncompressed linear PCM. There might be issues here and there with endianness, signedness, or integer vs. float, but how to convert losslessly from one to the other is common knowledge among the audio tribe. Because: Mac vs Windows.

As a band, we always had to convert our demo tape recordings to "Big" Endian to be considered "professional". The majority of PCM audio is "Little" Endian still.
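Such an endianness conversion is lossless and, with FFmpeg, a one-liner - a sketch with placeholder filenames, going from little-endian WAV to big-endian AIFF:

$ ffmpeg -i AUDIO_IN.wav -c:a pcm_s16be AUDIO_OUT.aiff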

File Size Matters

TODO

Don't Trust In Digital: Keep the Original

Just kidding. Digital is not a bad thing. Just don't trust it blindly. There are pitfalls, but they can be managed with the right preparations.

So don't throw away the original source file(s) if at all possible.

It's like keeping the analogue carrier, even though you've already digitized it. We all know how often we were lucky to be able to re-digitize a precious artifact - and what would have been lost to future viewers if the original had been disposed of.

Even just an offline copy on LTO-tape. Just in case. You won't regret it.

There are two kinds of "unknowns" we have to deal with:

  1. The known unknowns
  2. The unknown unknowns

The first kind we can deal with. For example, if you know you can't carry the timecode track as a stream in your archive copy, you set the start-timecode metadata field instead, and document that you've done so.

How big the effect of that decision will be in the future is unknown - but you had to make a decision, and that's necessary.

The second kind is a bit trickier: what if there was valuable metadata embedded in the original data set that you were unaware of? Maybe you didn't know that e.g. deinterlacing is very bad, and only learned about it later. Too late? If you had kept the original, it would be no problem :)

You might also want to have a backup copy of the original.

Practical Workflows

FFmpeg is Key/King

If you are working with audiovisual media files, you will want to learn at least some "FFmpeg-Foo". "ffmprovisr" is a great reference for preservation-related FFmpeg commands. You'll master your use cases eventually, and you will find that you'll feel much more in control of your digital media "assets".

FFmpeg is the swiss army knife for manipulating digital audiovisual files and formats.

It is a command-line tool that is most often used to convert from one format to another. It is one of the few video tools that can rewrap the container, or cut and concatenate sequences, without re-transcoding the audio/video streams. This is perfect for high-quality workflows and digital preservation.

What makes it an even better partner for DLTP: FFmpeg originated as - and still is - a FOSS-licensed application. Thanks to its developers and maintainers (Hi Michael! Hi Carl! :)), its core libraries are well known and widely used inside almost every multimedia technology in existence today. Even proprietary ones. But for some reason proprietary vendors are usually quiet about this. It's often fun to ask them how they "implemented this or that". You should try! ;P Great at trade shows or conferences. It gives us archivists more accurate insight into their products than their sales materials do.

Rewrapping the Container

Above, I've explained why it almost always makes sense to rewrap the container. For video it definitely makes a difference, whereas I must admit that I haven't had container issues with audio files (except once: MP3-encoded audio in a RIFF (WAV) container).

Audio extracted from a video container to standalone audio: sometimes issues.
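The extraction itself is just a stream copy, so the audio data is not touched - a sketch, with the filenames and the Matroska-audio (.mka) extension as placeholders; "-vn" simply drops the video stream:

$ ffmpeg -i VIDEO_IN.mkv -vn -c:a copy AUDIO_OUT.mka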

Here's our first magic spell: Converting AVI to MKV.

$ ffmpeg -i VIDEO_IN.avi -c:v copy -c:a copy VIDEO_OUT.mkv

Here's what the parameters do:

  • -i: input audio/video source
  • -c:v: which video codec to use. "copy" means transfer codec-stream as-is.
  • -c:a: which audio codec to use. "copy" means transfer codec-stream as-is.

So FFmpeg takes "VIDEO_IN.avi", copies the audio and video streams in their respective source codecs, and writes them to "VIDEO_OUT.mkv". Usually, FFmpeg will guess the right output container format from the file extension of the output filename.

The same command, but with shorter syntax: just use "-c copy" to apply "copy" to both the audio and video streams.

$ ffmpeg -i VIDEO_IN.avi -c copy VIDEO_OUT.mkv

Comfortably easy, isn't it?
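By the way: in the commands above, FFmpeg guesses the Matroska container from the ".mkv" extension. If you'd rather not rely on that guess, you can name the output muxer explicitly - a sketch:

$ ffmpeg -i VIDEO_IN.avi -c copy -f matroska VIDEO_OUT.mkv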

Audio

For additional reading on transcoding/encoding audio, see the "HighQualityAudio" article in the FFmpeg Wiki.

Transcode to Uncompressed

Generates the largest files.

$ ffmpeg -i VIDEO_IN.mp4 -c:v copy -c:a pcm_s16le VIDEO_OUT.mkv

Here's what the parameters do:

  • "-c:v copy": Keep video codec-stream as-is.
  • "-c:a pcm_s16le": This contains several audio encoding options in one.
    • pcm: Uncompressed PCM as audio codec.
    • s: signed (as opposed to "unsigned")
    • 16: Audio sample bit-depth. Common values are 16 or 24 bits.
    • le: "Little Endian" [PC] (as opposed to "Big Endian") [MAC]

As mentioned above, transcoding audio to uncompressed PCM can be considered the default preservation option.

The above command is quite a popular choice, e.g. for keeping lossy video as-is (which saves space) while normalizing the audio format.

Oh, btw: The MP4 container specification seems not to include PCM as an option. It's technically possible though, and some Sony cameras produce H.264/PCM in MP4 files.

Transcode to Lossless

Generates "smaller" files than uncompressed, yet bigger than lossy.

$ ffmpeg -i VIDEO_IN.mp4 -c:v copy -c:a flac -compression_level 12 VIDEO_OUT.mkv

Here's what the parameters do:

  • "-c:v copy": Keep video codec-stream as-is.
  • "-c:a flac": Use the "Free Lossless Audio Codec" (FLAC) for audio encoding.
  • "-compression_level 12": Use maximum compression. Default is 5.

See the FFmpeg documentation for more FLAC encoding options.

Transcode to Lossy

Generates the smallest files.

The audible artifacts of a digital generation loss due to lossy compression are perceived as stronger and less "comfortable" than their analogue counterparts. Digital glitch art has a very different aesthetic than retro media.

So please avoid this for preservation copies. It's common for creating access copies though.

$ ffmpeg -i VIDEO_IN.mp4 -c:v copy -c:a aac -b:a 160k VIDEO_OUT.mkv

Here's what the parameters do:

  • "-c:v copy": Keep video codec-stream as-is.
  • "-c:a aac": Use the "Advanced Audio Coding" (AAC) as audio format. It's very common in combination with H.264.
  • "-b:a 160k": Defines the bitrate to use for encoding. Rule of thumb: "bigger = better quality."

Video

Transcode to Uncompressed

Generates the largest files.

One option is FFmpeg's "rawvideo" encoder - a sketch:

$ ffmpeg -i VIDEO_IN -c:a copy -c:v rawvideo VIDEO_OUT.mkv

Here's what the parameters do:

  • "-c:a copy": Keep audio codec-stream as-is.
  • "-c:v rawvideo": Store the video stream as uncompressed raw video. You may also want to pin down the pixel format explicitly (e.g. "-pix_fmt yuv422p"), so that the uncompressed flavor is documented and reproducible.

Transcode to Lossless

Generates "smaller" files than uncompressed, yet bigger than lossy.

$ ffmpeg -i VIDEO_IN -c:a copy -c:v ffv1 VIDEO_OUT.mkv

FFV1 is currently the best-supported lossless video codec, with a very good size/speed ratio. For preservation, it's good practice to define some additional encoding options, too:

$ ffmpeg -i VIDEO_IN -c:a copy -c:v ffv1 -level 3 -slices 24 -slicecrc 1 -g 1 VIDEO_OUT.mkv

Here's what the parameters do:

  • "-c:a copy": Keep audio codec-stream as-is.
  • "-c:v ffv1": Encode to FFV1
  • "-level 3": Use FFV1 Version 3 (FFV1.3) to support multithreading and error-resilience (slice-CRC).
  • "-slices 24": Split each frame into 24 pieces. This number has an impact on size and encoding/decoding performance.
  • "-slicecrc 1": Equip each frame-slice with "Cyclic Redundancy Check" (CRC) bits.

Transcode to Lossy

Generates the smallest files.

The visual artifacts of a digital generation loss due to lossy compression in (moving) images are perceived as stronger and less "comfortable" than their analogue counterparts. Digital glitch art has a very different aesthetic than retro media.

It also matters which lossy codecs are used as source and target formats, because artifacts of different strengths and shapes are produced, depending on the interaction and nature of the codec algorithms. The artifacts accumulate as the codecs are applied in sequence.

$ ffmpeg -i VIDEO_IN -c:a copy -c:v libx264 -b:v 850k VIDEO_OUT.mkv

Here's what the parameters do:

  • "-c:a copy": Keep audio codec-stream as-is.
  • "-c:v libx264": Use the "x264" implementation to encode H.264 video.
  • "-b:v 850k": Set video encoding bitrate to 850 kbps. Use higher values (e.g Mbit) for higher quality (=larger files, of course).

See the FFmpeg documentation for more H.264 encoding options.
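A commonly used alternative to a fixed bitrate is x264's constant-quality ("CRF") mode - a sketch; lower values mean higher quality and larger files, and 23 is the encoder's default:

$ ffmpeg -i VIDEO_IN -c:a copy -c:v libx264 -crf 18 VIDEO_OUT.mkv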

<!-- TODO ## Verifying the Rewrapping/Transcoding

TODO framemd5 -->