The basic metadata supporting OAI harvesting is Simple Dublin Core. A data provider (repository) that intends to be compliant with the requirements of OAI harvesting will produce an unqualified DC datastream as a minimum requirement.
At least one repository solution, VTLS’s VITAL, is designed to use the simple DC data as the basis for the repository’s metadata splash page, which carries the repository institution’s branding and directs the user to the archived document being requested. (As far as I know at this point, this is not an issue with open source repositories such as EPrints and DSpace.)
This means the repository is attempting to use a single datastream for two different purposes. That becomes a problem if the oai-dc data is constructed to suit one purpose (e.g. the OAI harvester) but that particular DC construct is not what we want for the other (e.g. the portal display of the metadata page). The metadata display page, with its link to the deposited document, would be better driven by some other datastream, such as MARC, MODS, VRA, or anything other than the OAI-DC data.
It also means that repository managers must be very clear about exactly what they want a service provider (SP) to display from their repository. For example, does one want the service provider to display the repository’s metadata splash page for each document, so that public users are first directed to the metadata details for a particular record, where institutional branding is also prominent, and from there link to the full article or document? Or does one want to cut out one’s repository branding and descriptive metadata page and allow the SP to take a user directly to the article? What the SP does will depend on how certain data is entered into the oai-dc datastream.
When an SP receives a request for a particular article in your repository, it will rely on the oai-dc record to “identify” that particular article. It thus looks for a dc.identifier value with a resolvable URI link.
This means that:
- If the URI value in a dc.identifier is the link to the repository’s metadata page, complete with the full descriptive metadata record of the article, institution’s branding, and link to the full text of the document, then the SP will direct users to this repository page.
- If, however, the URI value in the dc.identifier is the link directly to the article itself, possibly offsite at, say, a publisher’s site, then the SP will bypass the repository metadata page and direct users straight to the article wherever it is located.
- If there are dc.identifier values that are non-resolvable text strings, such as an ISSN, the SP will ignore these for this purpose.
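The three cases above can be sketched in a few lines of Python. This is only an illustration of the behaviour described here, not the code of any actual service provider: the function name `pick_landing_url`, the sample record, and the example.org and example.com URLs are all made up, and the rule "take the first identifier that looks like an http(s) URI" is an assumption. Real harvesters may choose differently when several identifiers are present.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace of the simple Dublin Core elements used inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"


def pick_landing_url(oai_dc_xml):
    """Return the first dc:identifier that looks like an http(s) URI, else None.

    Assumption: the SP takes the first URI-like identifier it finds and
    skips plain text strings such as an ISSN.
    """
    root = ET.fromstring(oai_dc_xml)
    for element in root.iter(f"{{{DC_NS}}}identifier"):
        value = (element.text or "").strip()
        parsed = urlparse(value)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            return value
    return None


# Hypothetical record: an ISSN, a repository handle, and an offsite PDF link.
SAMPLE_RECORD = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example article</dc:title>
  <dc:identifier>ISSN 1234-5678</dc:identifier>
  <dc:identifier>http://repository.example.org/handle/123/456</dc:identifier>
  <dc:identifier>http://publisher.example.com/article.pdf</dc:identifier>
</oai_dc:dc>
"""

if __name__ == "__main__":
    # The ISSN is skipped; the repository handle comes first, so it is chosen.
    print(pick_landing_url(SAMPLE_RECORD))
```

Under that assumption, the order in which identifiers appear in the oai-dc record decides where users land, which is why it matters which link the repository writes into dc.identifier.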
Normally a repository can and will be configured so that each document deposited into it generates in the oai-dc record a dc.identifier value that is a handle or link to the repository metadata page, listed first.
But if the repository contains only a link to an offsite copy of the document, and if this is also entered into a dc.identifier field, the SP will direct users away from the repository to the offsite document. No problem, perhaps, for the self-effacing repository manager who wishes to serve the user more than the reputation of the institution supporting the repository, but not politically savvy if one of the very arguments presented to fund the repository in the first place was that it would increase the institution’s exposure to the world. That mediating branding page is normally pretty important.
There are two or three ways around this but each has drawbacks.
One can enter an offsite link into dc.relation instead of dc.identifier. This reserves the dc.identifier field for the repository’s default metadata page link, which is normally machine generated by the repository itself.
Another solution is to enter the offsite link into the dc.format so that it would look like this:
<dc.format>PDF http://www.offsitelink.com/name.pdf </dc.format>
Either of these solutions will cause a problem for the manager whose repository is dependent on mapping its portal display options from that same oai-dc record.
It will mean that a portal display link to a deposited record within the repository itself will be mapped from dc.identifier, while a portal display link to an offsite record will be mapped from dc.relation or dc.format.
So the consistency issue arises if one’s repository depends on mapping its display from the same data that is used for oai harvesting.
An offsite link will have to be treated the same way for display purposes (same portal display label terms) as all other values entered in other dc.relation or dc.format fields.
That can cause headaches. One wants to show an onsite link and an offsite link to an article in a similar way for users. What is important to them is that they can see at a glance a consistent way to get to the article regardless of where it is stored. One does not want to present an offsite link in a way that looks like it is not a link to the article described, but to some “relation” of it, for example. One can rename “relation” for the portal display, but whatever display name is chosen must then be used for every other value mapped from that same dc.relation field in the oai-dc record.
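A small sketch may make the labelling problem concrete. It assumes a portal that picks its display label purely from the DC element name. The `DISPLAY_LABELS` table, the `render_display_rows` function, and the sample records are hypothetical and are not taken from VITAL or any other product.

```python
# Hypothetical portal configuration: one display label per DC element.
DISPLAY_LABELS = {
    "identifier": "Link to article",
    "relation": "Related resource",
    "format": "Format",
}


def render_display_rows(record):
    """Turn a record (element name -> list of values) into (label, value) rows."""
    rows = []
    for element, values in record.items():
        label = DISPLAY_LABELS.get(element, element)
        for value in values:
            rows.append((label, value))
    return rows


# Document held in the repository: the article link sits in dc.identifier.
onsite_record = {
    "identifier": ["http://repository.example.org/handle/123/456"],
}

# Document held offsite: the article link has been moved to dc.relation,
# where it shares a label with a genuinely "related" value.
offsite_record = {
    "identifier": ["http://repository.example.org/handle/123/789"],
    "relation": [
        "http://publisher.example.com/article.pdf",
        "Part of: Example Journal 12(3)",
    ],
}

if __name__ == "__main__":
    for label, value in render_display_rows(offsite_record):
        print(f"{label}: {value}")
```

In this sketch the offsite article link and the journal note both come out under "Related resource". Renaming that label to suit the link would mislabel the journal note instead, which is the headache described above.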
And whatever solution is decided upon will need to consider preservation and sustainability questions. One day the records will be migrated to some other software — what will happen to any such solutions then? What systems will ensure consistency over time within a repository, and what issues will arise in the broader world of databases needing to be able to talk to each other in the future?
Repository designs need to allow for record and metadata displays to be configured independently of the data used for oai harvesting.
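One way to picture that independence is a repository that keeps its own internal record and derives the harvesting output and the portal display from it through two separate mappings. The sketch below is purely illustrative: the internal field names (`title`, `landing_page`, `onsite_url`, `offsite_url`) and the functions `to_oai_dc` and `to_display` are assumptions made for this example, not the design of any existing repository software.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def to_oai_dc(record):
    """Harvesting view: dc:identifier always points at the branded landing page."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    ET.SubElement(root, f"{{{DC_NS}}}title").text = record["title"]
    ET.SubElement(root, f"{{{DC_NS}}}identifier").text = record["landing_page"]
    if record.get("offsite_url"):
        # The offsite copy goes into dc:relation so harvesters do not bypass
        # the repository's own page.
        ET.SubElement(root, f"{{{DC_NS}}}relation").text = record["offsite_url"]
    return ET.tostring(root, encoding="unicode")


def to_display(record):
    """Portal view: the article link gets one label wherever the file is stored."""
    article_url = record.get("offsite_url") or record["onsite_url"]
    return [("Title", record["title"]), ("Link to article", article_url)]


# Hypothetical internal record for a document that is only held offsite.
offsite_record = {
    "title": "An example article",
    "landing_page": "http://repository.example.org/handle/123/789",
    "onsite_url": None,
    "offsite_url": "http://publisher.example.com/article.pdf",
}

if __name__ == "__main__":
    print(to_oai_dc(offsite_record))
    print(to_display(offsite_record))
```

Because the display mapping never reads the oai-dc output, the decision to use dc.relation for harvesting has no effect on what the portal calls the link.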