I did not come into RUBRIC cold. I was involved in the planning stages of the implementation of an EPrints repository, and then as a cataloguer I was the one responsible for testing how the data entries and outputs worked.
The point was to understand how and where the repository data stood in relation to the rest of the library’s resources, and in relation to the needs and interests of the various university departments and academics who would be using the repository.
We began low-key. Not with award-winning research papers or datasets embedded with configurations for their re-use. Rather, we decided to make our first entries in the repository the fourth-year engineering projects. New students wanted to consult these, and lecturers encouraged studying them. We would soon learn that once in the repository they would acquire a sizable audience beyond our university, too.
But low-key soon made its voice heard loudly, if not immediately very clearly. It was a low-key beginning in one respect, but to make it all work I quickly found myself visiting the head and other academics in the engineering department in order to assess with confidence the metadata requirements for this sort of resource material. Dewey, USMARC, AACR2, the LOC and OCLC sites all left me high and dry with some of the metadata needs of this new type of database.
On the technical side, some of the files accompanying the projects were simply too big, or formatted in a way that would not fit in the repository, let alone be viewable to users. Some files were uselessly battened down with password-only access. Some MS Word documents that we wanted to convert to PDF contained formatting that made the conversion extremely difficult and time-consuming. Most of these issues were solvable, or at least postponable, at that stage.
On the metadata side, academics used a standard research classification code to classify their works. But when I looked into that code (the RFCD component of the ASRC), it emerged that this research classification served a different purpose from descriptive subject classifications such as LCSH. It was not a descriptive discovery scheme at all, but a classification scheme for grants and policy reviews. Yet it was the schema known and used among academic institutions. For starters, it contained too many “XYZ not elsewhere classified” entries to be of real use as a true subject finding aid.
It appeared from a library cataloguer’s perspective that with our repository we were trying to make a square peg fit into a round hole.
I had yet to learn that the repository was not going to be simply a quantitative extension of traditional library resource services, just an enhancement of what we already offered. It was going to be something qualitatively different, heading in a direction hitherto alien to libraries, and we needed to start thinking of ways traditional library services themselves could work with it.
From a metadata perspective, this exploratory testing produced three documents in particular. The first was a manual of best practice for metadata entry procedures for our EPrints 2.x repository: not just a do/don't list, but a guideline that explained rationales, so that whoever did the data entry could apply the principles with a bit of thought and understanding, and work through the inevitable curly questions that come along with data entry.
But that was the easy bit. Much of the fun and frustration of this exploratory work was trying to figure out how to handle all the exceptions to the rules.
Did the exceptions matter? Was it important to standardize repository and library catalogue data? If there was no immediate need, what of the future? What function(s) did each bit of data serve and to what extent could the software take care of our needs, and to what extent were data entry guidelines required?
To help work through these questions I compiled a list of all the potential metadata issues, from the big and fat to the nit-picky, or even the possibly illusory. It was constructed in three columns: the first a description of each issue; the second a comparison of that issue with what was understood from normal library practice; and the third for comments on what the real or imagined long-term implications of each “issue” might be.
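A register like that maps naturally onto a simple data structure. The sketch below is purely illustrative (the issues shown are hypothetical examples of the kind described in this post, not entries from the actual USQ list):

```python
# A minimal sketch of the three-column issues register:
# description, comparison with normal library practice, and
# comments on possible long-term implications.
# The rows here are invented examples, not the real register.
issues = [
    {
        "issue": "RFCD codes used as the subject classification",
        "library_practice": "LCSH provides descriptive subject access",
        "implications": "Weak subject discovery; many 'not elsewhere classified' entries",
    },
    {
        "issue": "Author names entered as free text",
        "library_practice": "Catalogue headings are under authority control",
        "implications": "Duplicate author identities; harder to aggregate outputs later",
    },
]

# Render the register as a simple three-column report.
for row in issues:
    print(f"- {row['issue']}")
    print(f"    vs. library practice: {row['library_practice']}")
    print(f"    long-term: {row['implications']}")
```

Even a crude structure like this makes it easy to sort, filter, and prioritize the issues before writing anything more formal.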
This helped focus and prioritize the issues, and eventually I produced a more formal thesaurus of metadata exceptions for USQ's EPrints repository.
The goal in compiling those documents was to sift through tried and true library standards and best practices and see how, if at all, these could be applied to repository metadata. Where no normal library data standard applied, the aim was to assess what a standard or best practice should look like. This sometimes meant extrapolating from a cataloguing rule and seeing whether it could or should be justified in the repository context.
I have no doubt that given what I have learned since I would change some, possibly much, of what I wrote in those documents. But one has to start somewhere.
I was still very green when it came to broader issues such as OAI harvesting. And at this stage we were only working with PDF documents.
But I will elaborate in future meta-reflections . . .