The earlier posts demonstrate a range of resource or document type terminologies used across repositories. Is that a problem? Would it be better to have standard terms? Would it be necessary? Possible?
In the library world it is important to use standard lists of genre terms in order to manage one’s collection efficiently. Things get lost otherwise. The general material designation [GMD] terms have to be consistent.
Within repositories the same principle applies — terms need to be consistent within each repository, obviously.
But does it matter if a repository at one university uses “Article” and another university uses “Journal article” for the same type of document in their respective repositories?
(I realize the portal display names may not be the same as the behind-the-scenes values that are mapped to Dublin Core values, but I am assuming that in many real-life cases the underlying values differ as much as the web displays do.)
To enable searching by resource type within a repository, one obviously needs to decide on a single term. But how strong is the case for all universities using the same terms?
Technology has brought us federated searching and crosswalks. If someone wanted to search all journal articles across Australian universities, would it not be easier to design a technological solution than to realistically expect some or all universities to change their metadata database entries?
The ADT example
One can search all Australasian research-level theses now through the ADT technology, for example. And no nationally uniform metadata value terms are required for this. What is necessary is application of the Dublin Core metadata schema. It doesn’t matter if one university lists this document type as “theses” or “thesis” or “dissertation” as a value in a DC field. Simple DC used with the right technological solution works for ADT theses. (More detailed information about how DC works with ADT harvesting can be found in the RUBRIC Toolkit’s Metadata Chapter under “Harvesting 3: Australian Digital Thesis”.)
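As a rough illustration of the idea (not a description of how ADT's own software works), here is a minimal Python sketch of a harvester picking out theses from simple DC records whatever spelling each university uses. The three sample records, the `THESIS_VARIANTS` set and the `is_thesis` helper are all invented for this example; only the two namespace URIs are the standard ones for oai_dc and the DC elements.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for simple Dublin Core as carried over OAI-PMH.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical list of local spellings that all mean "thesis".
THESIS_VARIANTS = {"thesis", "theses", "dissertation"}

# Invented template for sample records from imaginary universities.
TEMPLATE = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>{title}</dc:title>
  <dc:type>{doc_type}</dc:type>
</oai_dc:dc>"""

RECORD_A = TEMPLATE.format(title="Sample work A", doc_type="Theses")
RECORD_B = TEMPLATE.format(title="Sample work B", doc_type="Dissertation")
RECORD_C = TEMPLATE.format(title="Sample work C", doc_type="Journal article")


def is_thesis(record_xml: str) -> bool:
    """Return True if any dc:type value is one of the known thesis spellings."""
    root = ET.fromstring(record_xml)
    values = [(el.text or "") for el in root.findall("dc:type", NS)]
    return any(v.strip().lower() in THESIS_VARIANTS for v in values)


# Keep only the records that some university has labelled as a thesis.
theses = [r for r in (RECORD_A, RECORD_B, RECORD_C) if is_thesis(r)]
```

The matching here is deliberately crude (lower-casing and a fixed set). The point is only that the reconciliation lives in the harvester, so no university has to change its own records.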
While users will sometimes want to search for particular resource types within their own institution (How many refereed papers does my department have published in our repository?), users searching repositories collectively (e.g. via OAIster or the Australian National Library’s hosted ARROW Discovery Service) are primarily looking for information about a topic or for works by an author. How often will they only want “reports”, for example? I suggest they want to know what’s available and when they see the harvested material they will then decide which bits are the most useful for their needs, whether a thesis or a journal article.
The OAIster harvester search page does give an option to search by one (or all) of the following resource types:
Each of these types is populated with thousands of records. This short list gives the impression that OAIster has been able to conflate a much wider range of value expressions in their harvested data. But I have sent a query to OAIster to check this and will update this blog when I receive a reply.
It would be interesting to do a study to know who and how often and why users would use these resource types in their searching. But more to the point of this post, such a simple range of resource types for faceted searching can be created by the technology from the metadata provided. The technology can bring metadata values such as “Article” and “Journal article” both into the “Text” resource type.
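A crosswalk of this kind can be very small. The following Python sketch is only an assumption about how such a mapping might look, not OAIster's actual method: the `CROSSWALK` table, the "Image" and "Unmapped" buckets, the `broad_type` helper and the sample harvested values are all made up for illustration.

```python
from collections import Counter

# Hypothetical crosswalk from local type values (lower-cased) to broad facets.
CROSSWALK = {
    "article": "Text",
    "journal article": "Text",
    "thesis": "Text",
    "still image": "Image",
    "photograph": "Image",
}


def broad_type(local_value: str) -> str:
    """Map a repository's own type value to a broad facet, if one is known."""
    return CROSSWALK.get(local_value.strip().lower(), "Unmapped")


# Invented values as they might arrive from several repositories.
harvested = ["Article", "Journal article", "journal article ", "Still image", "Podcast"]

# Count records per broad facet for a faceted search display.
facets = Counter(broad_type(value) for value in harvested)
```

Values the table does not recognise fall into an "Unmapped" bucket rather than being dropped, so whoever maintains the harvester can see which new local terms need adding.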
The ARROW Discovery Service advanced search page lists more resource types:
- e-journal article
- research paper
- conference paper
- book chapter
- working/discussion paper
- technical report
- course list
- learning object
- rich-media (non text)
- still image
- multimedia object
- research dataset
- arc project report
But this appears to be mostly an early-days wish-list. Only those in bold type (my emphasis) yield hits. (And I wonder how many users will want to restrict their searching to book chapters across Australian repositories.)
There will continue to be mutation and innovation in resource types. Multimedia, Web 2.0, simple technological progress, make this state of affairs inevitable for the foreseeable future. Will it even be possible to make a definitive list?
Does it matter?
When I first confronted the wide variation in terminology in the metadata world, my instinctive reaction was to wish for a clearly defined standard set of terms for everything. I hoped that in time certain usages would come to dominate and settle as those standards. But now I’m not so sure. I can see change continuing apace, and terms that disappear from use will not always be replaced with “standard” terms, but with yet more variegated terms and concepts.
And it doesn’t even always matter if some standardized terms are misused by the technology. EPrints, for example, has used dc.identifier to refer to the eprint record itself, contrary to DCMI’s own guidelines that say of this element: “It should not be used for identification of the metadata record itself.” (See 4.14 Identifier on this page.) But EPrints still works. DC elements are repeatable, so at least the right identifiers still get a leg up.
By no means am I suggesting we can be careless about what data we enter in our records. Surveys show that in some cases some editors do enter, uselessly, “Yes” or “No” in dc:type and other fields.
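Catching entries like these does not need a shared vocabulary either. As a small sketch, assuming records have already been reduced to an id and a list of dc:type values, a repository could flag the obviously useless ones as below. The `JUNK_VALUES` set, the record layout and the `suspicious_types` helper are hypothetical.

```python
# Hypothetical starting list of values that say nothing about resource type.
JUNK_VALUES = {"yes", "no"}


def suspicious_types(records):
    """Return (record_id, value) pairs whose dc:type value looks like an entry error."""
    flagged = []
    for record in records:
        for value in record["types"]:
            if value.strip().lower() in JUNK_VALUES:
                flagged.append((record["id"], value))
    return flagged


# Invented sample records for illustration.
records = [
    {"id": "rec-1", "types": ["Journal article"]},
    {"id": "rec-2", "types": ["Yes"]},
    {"id": "rec-3", "types": ["Thesis", "no "]},
]

flagged = suspicious_types(records)
for record_id, value in flagged:
    print(f"{record_id}: dc:type value {value!r} needs review")
```

A check like this only reports problems for an editor to review; it does not try to guess what the correct type should have been.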
Is there a case?
But we are dealing with scholarly repositories, and each institution has a preferred set of terms and subterms for its scholarly works. Despite such variations, the scholarly institutions do talk to and understand each other. “Article” and “Journal article” are both understood to be close enough to the same thing. Technology now has the ability to replicate that mutual understanding in machine searching. I am not sure there is a very strong argument for coming up with a standard that decides between “article” and “journal article” or an alternative to either.