May 16, 2008

Reflecting on falling through the cracks, and segmented leadership in the Australian repository scene

Filed under: Harvesting,Repositories — Neil Godfrey @ 1:01 am

My last post recollecting the time it took to learn about the difference between a DCMI rule and an OAI-PMH rule for the meaning of dc:identifier — a difference that only made sense in the context of the politics of what repositories are about — in hindsight looks very embarrassing. It’s obvious when you know.

But it was not obvious to everyone I spoke to who is closely tied up with the DCMI community. And asking strangers via email questions about something complex which one is learning from scratch can be fraught with the cloudiness of not quite understanding what the real issue is, and therefore how to frame the question, and the two parties not quite understanding the frames of reference of each other.

The answer only finally became obvious after several face to face encounters at a conference, and then finally finding “the key person” — a harvester woman! — to talk to, with pen and paper and lots of doodling diagrams. Till then, some who were specialists in their particular area were saying the conflict ought not to exist, and that it needed to be fixed. I was beginning to think I knew more of the issues and questions than the veterans, if not the answers. But it was really only a question of finding which one of the scores of people in the room to single out for this particular question. A question relating to DC did not necessarily mean that a DC specialist theorist would know how to answer it.

Lesson: in something as complex and new as repositories and their related activities such as harvesting, we cannot rely on the normal channels of communication and learning that work for well-established protocols and systems, as we have with normal library functions. I found massive background reading was essential, and even then there turned out to be gaps that were only filled by direct personal exchanges.

It’s a team sport, with all players needing to share their experiences and issues, and to get together often (not exclusively virtually either) to plan and discuss what they are doing, what they are wrestling with, etc. Then the simple and obvious things really are simple and obvious.

But that’s hardly the optimum way of operating — it’s too easy for one to fall through cracks along the way and wait to be picked up and dusted off.

That was when I was involved with simple “first generation” repositories. Deposit an object, retrieve an object, with all the preservation and authentication bits in between.

There were other issues too even at this basic level. Some harvesters complained that the data they were picking up from repositories included a lot of “noise”. Sometimes a maverick repository would use a DC element for data unrelated to its real purpose. In other cases multiple terms would be used to describe the one type of resource (e.g. periodical, newspaper, journal). And in other cases there would be too many of the same DC elements coming through (e.g. Date) without any obvious differentiation (e.g. date published, date copyrighted, date awarded, etc).

None of those was or is insuperable. Why not (relatively) simply set up a program that enables the harvester to streamline the data it receives, so that the known common alternatives (e.g. periodical, magazine) are all mapped to the resource type “journal”, or whatever the desired standard term is? Or, in the case of multiple undifferentiated DC elements like dc:date, would it be too difficult for a specialist harvester to take the initiative and introduce a slightly modified DC schema (a DC application profile, possibly one already in successful use elsewhere) for, say, theses? There are other work-arounds, but a business-case / cost-benefit study should help assess the best alternative for the long term.
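As a minimal sketch of the first idea, here is how a harvester might fold known synonyms into one preferred resource type. The synonym table, the function names and the record layout (a dict of lists keyed by DC element name) are all assumptions made for illustration, not part of any real harvester.

```python
# Hypothetical synonym table: harvested dc:type values -> one preferred term.
# The entries are illustrative only; a real harvester would agree its own list.
TYPE_SYNONYMS = {
    "periodical": "journal",
    "magazine": "journal",
    "newspaper": "journal",
    "journal": "journal",
}


def normalise_type(value: str) -> str:
    """Return the preferred term for a harvested dc:type value.

    Unknown values are passed through (trimmed) so nothing is silently lost.
    """
    key = value.strip().lower()
    return TYPE_SYNONYMS.get(key, value.strip())


def normalise_record(record: dict) -> dict:
    """Return a copy of a record with its 'type' values normalised and de-duplicated."""
    cleaned = dict(record)
    seen = []
    for value in record.get("type", []):
        term = normalise_type(value)
        if term not in seen:
            seen.append(term)
    cleaned["type"] = seen
    return cleaned


if __name__ == "__main__":
    harvested = {"title": ["An example"], "type": ["Periodical", " magazine ", "Thesis"]}
    print(normalise_record(harvested))
```

Values the table does not know about are left alone rather than discarded, so a harvester can review them later and decide whether they belong in the table.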

One reason these issues have not been resolved here until now may be, I think, that Australia has lacked a coordinating or leadership body for these areas. Ad hoc team-work has its limits. Australia has had a number of bodies (ARROW, the National Library’s Discovery Service, the Australasian Digital Thesis Program, and other libraries that relate to only one or two of these) working on their own remits without a real coordinating vision.

Each of these bodies grew up like Topsy. There is now MACAR, and that body is looking at recommending metadata standards for repositories. What it is working on is important. But it does not have the resources to meet often enough and to make its presence felt strongly enough, or to address comprehensively enough the key issues affecting all stakeholders, to be seen as a coordinating leader providing the vision and programs needed to smooth out the issues each separate body feels are part of the way things must be for the foreseeable future. Leadership in Australian libraries has traditionally come from the National Library. What I missed when learning of the multi-faceted issues of repositories and metadata was something like a National Library coordinating leadership in this area. Such a nationally recognized body (or one with clear sponsorship by the National Library) might have had the means to lead in smoothing out the respective issues faced by each discrete part of the repository-harvesting picture.

But now there are other developments on the horizon that appear to have the potential to augment the very purposes and functionings of repositories. Till now Australian repositories have mainly been storage bins for single objects, sometimes multi-part or multi-file objects. They are often promoted as vehicles to showcase an institution’s (and an individual academic’s) scholarly output. But the next stage may be to use repositories as tools for research, with the needs of end users as the main rationale.

Going beyond first generation repositories, consider that in scholarly communities a single work can consist of many parts: a text discussion, datasets relating to the text, and specialized types of images that are not only illustrations but the very source and object of analysis. The sort of idea now being worked out is that of a user being able to draw out a representation of such an image from one repository and compare it with data harvested from another research repository in a single operation.

Developers are currently testing ways to harvest not just the representations of single pdf or jpeg objects from repositories, but to harvest, say, URIs assigned to selected parts of different objects across a number of repositories. In simplest terms, it may be possible, for example, to “harvest” or “create” a complete journal edition from the multiple journal articles scattered across a range of repositories. Okay, why? But think through the possibilities once it is understood that’s the sort of capability we want to establish.
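To make the journal-edition example a little more concrete, here is a rough sketch, not the method any of those developers are using, of grouping harvested article records into editions by collecting their URIs. The record fields ("journal", "issue", "uri", "repository"), the function name and the example.org addresses are all invented for illustration.

```python
from collections import defaultdict


def aggregate_issues(records):
    """Group article URIs by (journal, issue), whichever repository holds them.

    Each record is assumed (hypothetically) to be a dict with the keys
    'journal', 'issue' and 'uri'. Returns a dict mapping
    (journal, issue) -> sorted list of article URIs.
    """
    issues = defaultdict(list)
    for rec in records:
        key = (rec["journal"], rec["issue"])
        issues[key].append(rec["uri"])
    return {key: sorted(uris) for key, uris in issues.items()}


if __name__ == "__main__":
    # Made-up records, as if harvested from two different repositories.
    harvested = [
        {"journal": "Example Review", "issue": "12(1)",
         "uri": "http://repo-a.example.org/item/2", "repository": "A"},
        {"journal": "Example Review", "issue": "12(1)",
         "uri": "http://repo-b.example.org/item/9", "repository": "B"},
        {"journal": "Example Review", "issue": "12(2)",
         "uri": "http://repo-a.example.org/item/5", "repository": "A"},
    ]
    for (journal, issue), uris in aggregate_issues(harvested).items():
        print(journal, issue, uris)
```

The point of the sketch is only that the “edition” here is a list of URIs pointing back into several repositories, rather than a new copy of the articles themselves.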

The implications are vast. Different types of repositories from a range of institutions need to be part of a framework that has the sort of consensus that will make this possible. That means the technological infrastructure, the institutional support for each repository, the agreement on standards and policies that will support this, and the growth of a research community program that will facilitate the use of all this.

The spadework for some of this has now begun with the Open Archives Initiative’s sponsorship of the OAI-ORE project. Proposals for an Australian Data Commons are now being tabled. With effort, planning and maturing, these early-day visions and testings will generate their own leadership.

But in the meantime university library repositories have proved how responsive they are willing to be to a national leadership plan and vision. They all focussed on “what to do now” with the immediate future in mind when that was clearly spelled out as RQF. Okay, money and a bit of compulsion were at play there too. But they did not exercise their collective freedom to dig in and protest.

Libraries like authorities, whether AACR2 or the National Library. And being able to confidently adapt an authority to one’s own institutional requirements, without sacrificing anything important, makes some tasks worthwhile. In that sense, the authorities are seen as friendly guides towards the vision, with whom they willingly cooperate.

Till now changes have been happening so fast that there has scarcely been time for an acknowledged leadership in these areas to emerge. Everyone is grappling to learn their areas — and sometimes something can fall through the cracks for a time, waiting to be rescued along the way. The leaderships that do exist are in segmented areas.
