Reconciling the OAIS Model with Information Theory

Nick Krabbenhoeft

Information science is the realm of libraries, archives, and museums (LAMs). Information theory is the realm of mathematics and engineering. Never the twain shall meet… which is a shame. The results of information theory are so powerful and so applicable to LAMs that they deserve far more attention there. The Information by James Gleick is an incredibly accessible account of how we learned to quantify information and slowly came to understand how deeply it underpins everything.

As a researcher at Bell Labs, then part of AT&T, Claude Shannon laid the foundation of information theory. He studied communication (the transmission of information) and eventually developed a framework to describe it in a paper, A Mathematical Theory of Communication. The paper is important enough that when it was republished as a book, the A became The. On the third page, he draws a simple but powerful diagram that can describe any kind of communication.

Shannon's model of communication
Producer
something that makes a message; a person, a sensor, a frog, etc.
Encoder
a mechanism for converting a message into a signal in a specific medium; writing, beeping, ribbiting, etc.
Decoder
a mechanism for retrieving a message from a signal in a specific medium; reading, recording, listening, etc.
Consumer
something that receives a message; another person, a computer, another frog, etc.
Noise
any cause of unwanted distortion in the message signal between its encoding and decoding; a letter getting wet, electrical interference, a windy night, etc.
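The five parts of the diagram can be sketched as a tiny Python pipeline. This is a minimal illustration under my own assumptions: the message is text, the signal is a list of bits, and the noise flips each bit independently with a fixed probability (a binary symmetric channel, chosen for simplicity; the diagram itself does not specify a noise model). The function names `encode`, `noisy_channel`, and `decode` are hypothetical.

```python
import random


def encode(message: str) -> list[int]:
    """Encoder: turn text into a signal, here UTF-8 bytes as bits (most significant first)."""
    bits = []
    for byte in message.encode("utf-8"):
        bits.extend((byte >> i) & 1 for i in range(7, -1, -1))
    return bits


def noisy_channel(bits: list[int], flip_prob: float, rng: random.Random) -> list[int]:
    """Noise: each bit is independently flipped with probability flip_prob (assumed model)."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]


def decode(bits: list[int]) -> str:
    """Decoder: regroup the bits into bytes and read them back as text."""
    data = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        data.append(byte)
    # errors="replace" lets decoding continue if noise produced invalid UTF-8
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    rng = random.Random(42)
    message = "ribbit"                      # producer: a frog
    signal = encode(message)
    print(decode(noisy_channel(signal, 0.0, rng)))   # ribbit
    print(decode(noisy_channel(signal, 0.05, rng)))  # usually garbled; depends on the seed
```

With `flip_prob` at 0 the consumer receives exactly what the producer sent; raise it and the decoded text starts to drift. Nothing in this pipeline notices or repairs the damage.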

What makes this model so powerful is how generalizable it is. I included a few examples in the definitions, but the model can describe any process of communication. That includes institutions like libraries, archives, and museums (LAMs), which accept objects and protect them against many kinds of loss for the purpose of future access.

Now, I cheated in my version of Shannon’s diagram above. Since he was studying communication networks, he used terms like transmitter and destination, and his diagram has no box labeled “Information System.” I added it to link his model to another powerful model, one that does exist in the world of LAMs: OAIS (Open Archival Information System). OAIS describes a system that preserves access to materials. The full standard is extremely detailed, but we’re interested in its functional model, the framework at its core.

The OAIS model mixed with Shannon's model of communication. The OAIS functions should prevent noise from corrupting the contents of the system.

OAIS defines 6 core functions that every archival information system performs:

Ingest
accept and describe information from producers
Data Management
store the description (metadata) of the information
Archival Storage
store the information itself
Dissemination (called Access in the standard)
allow consumers to access the information via its description
Preservation Planning
scan for challenges to preservation
Administration
manage the operation of the system

Just like Shannon’s model, what makes OAIS so powerful is its generalizability. Unfortunately, OAIS includes the word archival, which makes it seem as if it’s only applicable to archives (particularly digital archives). However, the model is strong enough that it could describe:

a museum
an archive where you can only access what’s on exhibit (no touching!)
a library
an archive where you can take stuff home
a bank
an archive only concerned with how numbers go up and down

The list goes on, and is probably worth its own series of posts. I have already used the model to compare which functions really separate libraries, archives, and museums. The key point is that OAIS is generalizable because it is a more explicit version of Shannon’s model. Where Shannon has an encoder and a decoder, OAIS calls them ingest and dissemination, and it splits the connection between them into storage for the metadata and storage for the information itself. What OAIS does not include from Shannon's model, and what I sneakily added, is noise.
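That mapping can be sketched as a toy Python class, under heavy assumptions: one in-memory dictionary stands in for Data Management, another for Archival Storage, identifiers are simply sequential, and the names `ToyArchive`, `ingest`, and `disseminate` are hypothetical. It illustrates the shape of the model, not a conforming OAIS implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyArchive:
    """Hypothetical, drastically simplified archive; not an OAIS implementation."""
    # Data Management: item id -> description (metadata)
    metadata: dict[str, dict[str, str]] = field(default_factory=dict)
    # Archival Storage: item id -> the information itself
    storage: dict[str, bytes] = field(default_factory=dict)

    def ingest(self, content: bytes, description: dict[str, str]) -> str:
        """Ingest (Shannon's encoder): accept content from a producer and describe it."""
        item_id = f"item-{len(self.storage) + 1}"  # assumed sequential identifiers
        self.storage[item_id] = content
        self.metadata[item_id] = dict(description)  # copy so later edits by the caller don't leak in
        return item_id

    def disseminate(self, **query: str) -> list[tuple[dict[str, str], bytes]]:
        """Dissemination (Shannon's decoder): find items via their description."""
        return [
            (desc, self.storage[item_id])
            for item_id, desc in self.metadata.items()
            if all(desc.get(key) == value for key, value in query.items())
        ]


if __name__ == "__main__":
    archive = ToyArchive()
    archive.ingest(b"ribbit ribbit", {"creator": "a frog", "type": "Sound"})
    archive.ingest(b"Dear Mr. Shannon,", {"creator": "a person", "type": "Text"})
    print(archive.disseminate(type="Sound"))
    # [({'creator': 'a frog', 'type': 'Sound'}, b'ribbit ribbit')]
```

Notice what the sketch leaves out: nothing in it watches for noise. Preservation Planning and Administration have no natural place in a one-way pipeline like this.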

The entire purpose of OAIS is to protect long-term access to information against sources of noise. Noise can cover almost any threat to the stability of information. For example, noise could include cosmic rays (causing bit rot), user error (causing metadata loss), technology trends (causing format obsolescence), or shifting budget politics (causing funding shortages). A goal of any LAM should be to have policies, processes, and technologies to mitigate the threat of noise.
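For the bit-rot example, a common LAM mitigation is fixity checking: record a checksum for each file when it is ingested, then recompute and compare on a schedule. A checksum only detects noise; it cannot undo it, so it is typically paired with keeping multiple copies. Below is a minimal Python sketch using SHA-256 from the standard library's `hashlib`; the helper names (`sha256_of`, `make_manifest`, `audit`) and the in-memory manifest are my assumptions, not any particular repository's tooling.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(root: Path) -> dict[str, str]:
    """At ingest: record a checksum for every file under root."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> list[str]:
    """Later: recompute checksums and report anything missing or changed."""
    problems = []
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(path) != expected:
            problems.append(f"changed: {rel_path}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        item = root / "letter.txt"
        item.write_bytes(b"Dear Mr. Shannon,")
        manifest = make_manifest(root)
        print(audit(root, manifest))  # []
        # Simulate bit rot by flipping the lowest bit of the first byte.
        data = bytearray(item.read_bytes())
        data[0] ^= 1
        item.write_bytes(bytes(data))
        print(audit(root, manifest))  # ['changed: letter.txt']
```

A single flipped bit is enough to change the digest, which is what makes this cheap check useful against noise that is otherwise invisible.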

The functions that mitigate the threat of noise, preservation planning and administration, are the real workhorses of OAIS. Shannon’s model describes a one-way flow of information from producers to consumers. The preservation planning and administration functions, by contrast, are two-way conversations with both producers and consumers. They work separately from, but in parallel with, the flow of information, documenting the causes of noise and preparing updates to the ingest, data management, storage, and dissemination functions to respond to those threats.

Good preservation is not passive. I don't mean that preservation should require constant item-level actions; those should be the exception, not the rule. Good preservation requires regular contact with communities to understand their needs, regular updates to processes and technologies, and regular reports on how well a system is actually performing.

This is a high-level application of information theory in information science, but there are deeper ones as well. One that I’m looking at right now is using entropy to evaluate the richness of metadata records; see page 21 of Metadata Quality for Federated Collections for an example of its use on Dublin Core elements. Information theory is an incredibly deep field of mathematics that addresses many problems, including encryption, error checking, and decoding, and its results can re-contextualize the way we view and improve our collections.
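As a rough illustration of the entropy idea (not the exact method of the report cited above, which I haven't reproduced here), one can treat a metadata element as a random variable and compute the Shannon entropy of its values across a collection, H = -Σ p(v) log2 p(v). An element that holds the same value in every record scores 0 bits; one whose n values are all distinct scores log2 n bits. The record layout (a dict mapping element names to lists of values, since Dublin Core elements are repeatable), the function name `element_entropy`, and the sample records are all made up for illustration.

```python
import math
from collections import Counter


def element_entropy(records: list[dict[str, list[str]]], element: str) -> float:
    """Shannon entropy (in bits) of one element's values across a set of records.

    Each record maps Dublin Core-style element names to lists of values,
    because elements such as subject may repeat within a record.
    """
    values = [v for record in records for v in record.get(element, [])]
    if not values:
        return 0.0  # an element that no record uses carries no information
    n = len(values)
    # H = sum over distinct values of p * log2(1/p), with p = count / n
    return sum((c / n) * math.log2(n / c) for c in Counter(values).values())


if __name__ == "__main__":
    # Made-up records, for illustration only.
    records = [
        {"title": ["Frog calls, summer"], "type": ["Sound"], "rights": ["Unknown"]},
        {"title": ["Letters"], "type": ["Text"], "rights": ["Unknown"]},
        {"title": ["Telephone survey"], "type": ["Text"], "rights": ["Unknown"]},
        {"title": ["Untitled"], "type": ["Text"], "rights": ["Unknown"]},
    ]
    for element in ("title", "type", "rights"):
        print(element, round(element_entropy(records, element), 3))
    # title 2.0
    # type 0.811
    # rights 0.0
```

One design choice worth flagging: raw entropy grows with the number of distinct values, so comparing elements across collections of different sizes would likely need some normalization.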
