September 14, 2007

Repositories 101 — Part 2 (For Users)

Filed under: Repositories,Repositories 101 — Neil Godfrey @ 12:13 am

Since repository collections are generally run by librarians, the librarian philosophy of “service for clients” rules.

Librarians take care in decisions about the decor of their reading rooms, how their shelves and computers are arranged, building signage and portal customizations, client service skills, policies for different types of users and different types of resources, how their materials are described and classified, vendor choices and workflows, and so on, all for the purpose of improving service to clients.

It is the same with repositories. Librarians adapt their user-first philosophy to the different types of resources and different ways of handling resources in repositories.

Where and who are the clients?

Repository users are academics and students within the repository’s institution.

They are also academics, students and generally interested others anywhere in the world.

They are also the authors of the materials deposited into the repositories, since they will be checking through the repository how often their work is accessed and downloaded by others.

These same authors will also be regular “input clients” as they access the repository to deposit their works with the expectation of higher citation rates; or to store datasets so that peers can verify and build on their research over the long term; or to raise issues with peers instantly through online, centrally accessed discussion papers.

Academic reporting brings in another set of “clients” or assessors although they will be making restricted use of certain materials accessed via repositories. This is a separate topic.

How do the clients access the documents in repositories?

On the internet.

First-time users in the repository’s institution will find it as easy as going to their library’s homepage and clicking on the link to their repository. The repository webpage will have browse and search pages to help users find works by a particular author, or works on a particular subject, or works of a particular category (e.g. peer-reviewed publications only), and so on.

Users anywhere else in the world who are looking for a particular work by a known academic can type the name and keywords into a Google or Yahoo! search box, and chances are very good that the exact thing they want will appear at or near the top of the search results page. A click takes them to that work in the repository.

You can test this by going to one of the repositories linked in my previous (Part 1) post, locating an author’s name and the title of something they have written, copying a few distinctive words from the name and title into a Google or Yahoo! search box, and seeing how easy the work is to find that way.

Users anywhere can also go to a website that specializes in collating repository resources and search across one or more repositories directly. These sites are known as Service Providers. The service they provide is a central store of information about what is held in all the repositories that have registered with that particular Service Provider. Users can go to a Service Provider and search a range of repositories by author, keyword, and so on.

Some examples of Service Providers:

OAIster (a union catalogue of digital resources)

Arrow National Discovery Service (hosted by National Library of Australia)

Scirus (for scientific resources only)

ADT (for higher research theses from Australasia)

Currently in development: Driver (for European research)

How it works

Users think how easy it is to find a book in a library and have no idea how much work goes into making it so easy. They wonder what on earth librarians could do all day behind office doors except label and read books and send out overdue notices.

We know libraries need to apply specialist standards and skills to make sure their books are easily found. AACR2 has become a basic standard for describing materials and access points so they can be readily found. There are name and subject authorities and ‘see’ and ‘see also’ references to put some controls on the various ways names and topics can be written. And MARC is a brilliant tool for processing these rules.

Easy-to-find hits in Google and Yahoo! for repository documents do not happen by chance.

Repositories need a way to communicate with Service Providers so their data can be centralized for easy searching.

And people who work with repositories need to know and understand what standards, authorities and tools are required to make it all happen so easily.

Communicating with Service Providers

The repository-Service Provider communication is the basic level of communication that enables repositories to be found on the web. (I’ll explain in a future post how communication with Service Providers affects Google and Yahoo! search results too. For now it is enough to appreciate, even vaguely, that repository-Service Provider communication is at this stage the basis of most Google and Yahoo! hits.)

The common standard: Dublin Core

The first step in communicating is to use a common language or standard way of reading information. Repositories and Service Providers speak to each other through a standard called Dublin Core. Dublin Core is a standard way of describing documents that was created for the purpose of sharing information about the vast range of resources available on the internet. So its role includes but also extends beyond the library world. And it is used to bring repository resources into the wider world of the internet:

  • an editor or librarian enters data into a repository to describe a document it holds;
  • we want the repository to communicate to a Service Provider that it contains this document so that it can be found by an international audience on the internet;
  • so the repository needs to convert key information about that document (its title, author, subject, date) into a Dublin Core description so the Service Provider can understand and use that information.
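The conversion step in the last bullet can be sketched roughly as follows. This is only an illustration, not how any particular repository system does it: the internal field names (`article_title`, `authors`, `file_size_kb`), the mapping table and the `to_dublin_core` helper are all hypothetical, and the file size value is invented. The namespace URI is the standard one for the Dublin Core elements.

```python
import xml.etree.ElementTree as ET

# Standard namespace for the 15 Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from a repository's internal field names
# to Dublin Core element names.
FIELD_TO_DC = {
    "article_title": "title",
    "authors": "creator",
    "keywords": "subject",
    "year_published": "date",
}


def to_dublin_core(record):
    """Build a <record> element with one dc:* child per mapped value."""
    root = ET.Element("record")
    for field, dc_name in FIELD_TO_DC.items():
        values = record.get(field, [])
        if isinstance(values, str):
            values = [values]  # treat a single string as a one-item list
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{dc_name}")
            child.text = value
    return root


# An internal record as an editor might have entered it. The file size is
# an invented, internal-only detail that has no mapping to Dublin Core.
internal = {
    "article_title": "Waking up to a new future",
    "authors": ["Inayatulluh, S"],
    "file_size_kb": 0,
}

print(ET.tostring(to_dublin_core(internal), encoding="unicode"))
```

Fields with no entry in the mapping simply stay behind in the repository, which is the point: only the key descriptive information travels to the Service Provider.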

Librarians are used to working with AACR2 and MARC; Dublin Core is another tool that is used by repositories. AACR2 and MARC are used to enable libraries to function for users and share information among themselves; Dublin Core is used by repositories to enable users to freely locate the resources they want to read on the internet. (Its name derives from Dublin in Ohio, not Dublin in Ireland.)

Dublin Core is much simpler than MARC or AACR. MARC has hundreds of fields and tag indicators and subelement codes and AACR has hundreds of pages of rules.

Dublin Core has 15 elements. That is 15 words. That’s it.

(There is also an expanded or “qualified” Dublin Core standard that extends these 15 into dozens of additional qualifiers, but that is not relevant for Service Providers to find repositories, gather or “harvest” a list of their contents, and enable users to search for specific authors or keywords. This expanded Dublin Core is referred to as “Qualified Dublin Core” and the basic Dublin Core of 15 elements is sometimes called Simple or Unqualified Dublin Core.)

The 15 elements or properties of Dublin Core are:

  • title
  • creator
  • contributor
  • date
  • type
  • format
  • subject
  • description
  • coverage
  • relation
  • publisher
  • source
  • language
  • rights
  • identifier

Those are listed in no particular order. It does not matter what order they appear in; as a general rule it does not matter how many times any of them is repeated; and one does not need to use all of them for every record one is describing.
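Those three freedoms (any order, any number of repeats, nothing compulsory) are easy to express in code. Here is a small sketch, assuming a record is held as a plain list of (element, value) pairs; the `unknown_elements` helper and the two "Example subject" values are made up for illustration.

```python
# The 15 elements of Simple (Unqualified) Dublin Core.
DC_ELEMENTS = frozenset({
    "title", "creator", "contributor", "date", "type",
    "format", "subject", "description", "coverage", "relation",
    "publisher", "source", "language", "rights", "identifier",
})


def unknown_elements(record):
    """Return, sorted, any element names that are not among the 15.

    Order, repetition and omission are all allowed, so the only thing
    worth checking is that each name is a real Dublin Core element.
    """
    return sorted({name for name, _ in record if name not in DC_ELEMENTS})


record = [
    ("creator", "Inayatulluh, S"),          # creator before title: fine
    ("title", "Waking up to a new future"),
    ("subject", "Example subject A"),       # placeholder value
    ("subject", "Example subject B"),       # repeated element: fine
]

print(unknown_elements(record))
print(unknown_elements(record + [("author", "Inayatulluh, S")]))
```

The second call reports `author`, because Dublin Core's word for that role is `creator`.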

But just as the complexity of AACR and MARC can sometimes raise more questions than it answers, so the simplicity of Dublin Core can sometimes pose questions that are difficult to resolve. So librarian, cataloguing and technical skills are not made obsolete by its simplicity.

What a Dublin Core record looks like:

<dc:title>Waking up to a new future</dc:title>
<dc:creator>Inayatulluh, S</dc:creator>
<dc:publisher>Graduate Institute of Futures Studies. Online</dc:publisher>
<dc:type>Journal Article</dc:type>
<dc:relation>Journal of Futures Studies</dc:relation>

This looks very brief, and librarians can be excused for wondering where all the other info that normally goes into a library record has gone.
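For the curious, here is roughly how a program on the receiving end might read the record above. It is a sketch only: the `parse_fragment` helper and the `<record>` wrapper are my own assumptions, added because the five lines on their own are a fragment with no enclosing element and no declaration of what the `dc:` prefix means.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # standard Dublin Core namespace

FRAGMENT = """
<dc:title>Waking up to a new future</dc:title>
<dc:creator>Inayatulluh, S</dc:creator>
<dc:publisher>Graduate Institute of Futures Studies. Online</dc:publisher>
<dc:type>Journal Article</dc:type>
<dc:relation>Journal of Futures Studies</dc:relation>
"""


def parse_fragment(fragment):
    """Turn a run of dc:* elements into {element name: [values]}."""
    # Wrap the fragment in a hypothetical root that declares the dc prefix.
    wrapped = f'<record xmlns:dc="{DC_NS}">{fragment}</record>'
    root = ET.fromstring(wrapped)
    fields = {}
    for child in root:
        name = child.tag.split("}", 1)[1]  # drop the "{namespace}" part
        # Values go in a list because any element may be repeated.
        fields.setdefault(name, []).append((child.text or "").strip())
    return fields


fields = parse_fragment(FRAGMENT)
for name, values in fields.items():
    print(name, "=", values)
```

Five elements in, five entries out. Everything else a librarian might expect (pagination, volume, issue and so on) is simply not part of this brief description.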

Data Providers, Service Providers and Harvesting

All the other details necessary for bibliographic citation and more are still in the repository ready to be used. But first things first. The repository wants to have its collection “harvested” by the Service Provider. That means it has to provide just enough data about each of its records so that the Service Provider can list them all and index them with a few basic headings like “title” and “creator”.

So in this context the repository is sometimes called a Data Provider. It provides data coded in Dublin Core to the Service Provider. The Service Provider is said to harvest the records provided by the repository or data provider.

The data provider at a more technical level is actually the component in the repository computer system that generates data for the service provider, but in general terms the repository itself is often referred to as the data provider.

The Data Provider will provide enough data (in the Dublin Core format above) for the Service Provider to know what each of the repository’s records is about and how to locate it. The Service Provider will not usually collect all the data the repository has about each of its records. But it does want enough to be able to present users with a brief general description and enough terms suitable for indexed searching, so users can search for “authors” or “subjects” etc. Have a look at some of the Service Providers listed above to see what information they contain about the repository records.
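To make the idea of harvesting a little more concrete, here is a toy sketch of a Service Provider gathering brief records from two Data Providers and indexing them under a couple of headings. Nothing here goes over a network, and none of it reflects how a real Service Provider is built: the repository names, the `example` addresses, the placeholder record and the `harvest` function are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical Data Providers, each exposing brief Dublin Core records.
DATA_PROVIDERS = {
    "Repository A": [
        {
            "title": ["Waking up to a new future"],
            "creator": ["Inayatulluh, S"],
            "identifier": ["https://repository-a.example/record/1"],
        },
    ],
    "Repository B": [
        {
            "title": ["Placeholder discussion paper"],
            "creator": ["Placeholder, A"],
            "identifier": ["https://repository-b.example/record/7"],
        },
    ],
}

# The few basic headings the Service Provider chooses to index.
INDEXED_ELEMENTS = ("title", "creator")


def harvest(data_providers):
    """Collect every brief record and index it by word under each heading."""
    index = defaultdict(list)
    for provider, records in data_providers.items():
        for record in records:
            for element in INDEXED_ELEMENTS:
                for value in record.get(element, []):
                    for word in value.lower().replace(",", " ").split():
                        index[(element, word)].append((provider, record))
    return index


index = harvest(DATA_PROVIDERS)
for provider, record in index[("creator", "inayatulluh")]:
    print(provider, "holds", record["title"][0])
```

A user searching the "creator" heading for a surname is then looking things up in one central index rather than visiting each repository in turn.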

Displaying the full records for users

Once the Service Provider has harvested the repositories via Dublin Core, users can then navigate their way to the document they want to read, which is most commonly still stored in the repository. The Service Provider redirects the user to the repository where the document is stored.

So the user gains access to the repository and its stored document via the Service Provider.

Once the user has been redirected to the repository, they can see all the other descriptive data about the stored document. It was not all available in Dublin Core to the Service Provider, but the Service Provider harvested enough Dublin Core information to enable the users to find what they wanted and to redirect them to the repository where it was located.

Once in the repository, the user can see the rest of the information necessary for full bibliographic citation, what rights are associated with the document, the size of the file they may have to download, etc. Most of that information is not in Dublin Core. Dublin Core data was needed only to share just enough information for Service Providers to help users find what they are looking for.
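The whole journey, from a search at the Service Provider to the full record in the repository, might be sketched like this. It is a deliberately simplified picture under stated assumptions: the record address, the extra field names (`rights_statement`, `file_size_kb`) and their values are placeholders, and `search` and `follow` are hypothetical helpers rather than anything a real system exposes.

```python
RECORD_URL = "https://repository.example/record/1"  # hypothetical address

# What the Service Provider harvested: a brief Dublin Core description.
HARVESTED = {
    RECORD_URL: {
        "title": "Waking up to a new future",
        "creator": "Inayatulluh, S",
    },
}

# What the repository itself holds: the same description plus the rest.
# The extra field names and values are placeholders, not real data.
REPOSITORY = {
    RECORD_URL: {
        "title": "Waking up to a new future",
        "creator": "Inayatulluh, S",
        "rights_statement": "placeholder rights text",
        "file_size_kb": 0,
    },
}


def search(term):
    """Service Provider side: return addresses of matching brief records."""
    term = term.lower()
    return [
        url
        for url, brief in HARVESTED.items()
        if any(term in value.lower() for value in brief.values())
    ]


def follow(url):
    """Repository side: return the full record stored at that address."""
    return REPOSITORY[url]


for url in search("waking"):
    full = follow(url)
    extra = sorted(set(full) - set(HARVESTED[url]))
    print(url, "also offers:", extra)
```

The brief description was enough to find the work; the fields listed as "also offers" only become visible once the user arrives at the repository.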
