October 2, 2008

Online journals and institutional repositories – comparing their potential impacts on research methods (and journal publications)

While discussing the potential impact of institutional repositories on journal publications, an academic alerted me to an article — Electronic Publication and the Narrowing of Science and Scholarship by James A. Evans — discussing research into the impact of the online journal culture on research methods in the world of scholarship.

The article is interesting for several reasons, but I want to address its conclusions in relation to research repositories – something that the research did not intend to address. I think such a comparison is worthwhile to the extent that it helps clarify how institutional repositories and online journal databases may each affect research methods, and publishing companies’ futures, in different ways. I also can’t resist commenting on where I think we are headed beyond the current world of online databases, whether of journals or research repositories.

The opening abstract of the article notes:

Online journals . . .  are used differently than print — scientists and scholars tend to search electronically and follow hyperlinks rather than browse or peruse . . . .

IR comparison comment: IRs, on the other hand, do facilitate browsing by keywords, titles, authors, year, resource type (that is, text or still image or video . . . ) – not just searching, as with online journal databases like JSTOR or EBSCO.

Evans’ research into the impact of online journal databases found that:

As deeper backfiles became available, more recent articles were referenced; as more articles became available, fewer were cited and citations became more concentrated within fewer articles.

Costs and benefits of online journal databases

His interpretation of this paradox:

  1. as online searching replaces browsing in print, there is greater avoidance of older and less relevant literature;
  2. hyperlinking through an online archive puts experts in touch with consensus about the most important prior work – what work is broadly discussed and referenced;
  3. thus online search bypasses many marginally related articles that are still skimmed by print researchers.

Findings and ideas that do not become consensus quickly will be forgotten quickly.

This research ironically intimates that one of the chief values of print library research is poor indexing. Poor indexing – indexing by titles and authors, primarily within core journals – likely had unintended consequences that assisted the integration of science and scholarship. By drawing researchers through unrelated articles, print browsing and perusal may have facilitated broader comparisons and led researchers into the past.

Evans sees this as one more step in the shift from

the contextualized monograph, like Newton’s Principia or Darwin’s Origin of Species, to the modern research article. The Principia and Origin, each produced over the course of more than a decade, not only were engaged in current debates, but wove their propositions into conversations with astronomers, geometers, and naturalists from centuries past.

Thus the greater the efficiency with which arguments can be framed with the assistance of online searching and hyperlinking, “the more focused – and more narrow – past and present” they will be woven into.

It is not a strictly fair comparison, but this does remind me of a time I was inspired with fresh insights into a topic I was investigating simply by happening to notice a title on a tangential topic, shelved just one library shelf above the one that held the exact classification numbers I had been directed to browse. No online search – nor any print citation index – could ever enable me to repeat that particular stroke of luck.

In other words, any technological change will charge some costs against the way we used to do things. Maybe the advent of the printing press cost some researchers the chance discoveries they used to make in the days when they had to travel in person to wherever certain hand-copied books were known to be stored. But each change also brings its own new avenues for broader comparisons and for insights from unintended serendipity. If this is not yet happening on a large enough scale to show up in the statistics of Evans’ article, it is still reassuring to know that this is not the end of the story.

But no doubt Evans would also acknowledge that the broader conversations found in works like Principia and Origin were scarcely the unintended consequences of the relatively poor indexing of print media. Today’s researchers and scholars, I suspect, are under far more institutional pressure to specialize, produce and publish at a certain rate than Newton and Darwin regularly experienced.

Innovation also follows demands and needs, including those of researchers. Online journal and e-book databases and catalogs are not the only points to which the electronic media are leading.

Comparing IRs with online journals, their functions and potential impacts

IRs are finding their way into a growing number of universities. (See my list of current Australian IRs and links to other registries.) These IRs do support broader opportunities for browsing, not just searching: by uncontrolled keywords and sometimes controlled vocabularies, and by resource types, authors and titles. Browsing is one of the alleyways the Evans article laments is missing with electronic searches. That is not the case with most IRs. Nor are IRs restricted to the localized browsing of a single institution’s research repository. They can be browsed collectively through harvesters such as OAIster, Australia’s Discovery Service, etc.

IRs are something other than a high-tech attempt at a more efficient journal database. They are quite different. They are a means for individual academics – and their institutions – to make their collective works visible and accessible. They are a means of both showcasing and preserving personal and institutional research, and also of making publicly funded research instantly and openly accessible to all.

IRs offer the capacity to more easily interrogate the discussions of individual authors over time. This is surely a potentially useful alternative to journal databases that are structured around topics. Researchers do publish across many journals, and the history of a single researcher’s output can be immediately apparent in an IR even though the individual works have been published across a wide range of journals.

Not only immediately apparent, but immediately accessible in those repositories built on the open access principle that publicly funded research should be publicly available. Most online journal databases restrict access to those who belong to an institution with the appropriate subscriptions. If a journal is neither online nor held in print by an institution, then one must wait days for the interlibrary loan process before accessing the article. In an open access repository it is instantly available. No waiting time, and no tedious paperwork or online forms to negotiate.

Journal titles that originally hosted these publications are also referenced, thus in some cases also raising awareness of journals that might otherwise have been less widely known.

Where is it all headed?

That’s the present. And the web is still in its infancy. Our wheels are still revving in the first generation of the Web, Web 1.0. Online journal databases, and institutional repositories too, are nothing more than a mass of web pages or documents waiting to be accessed. They are little more than a “more efficient” form of the print media and print indexing. In the case of IRs, they have the bonus of allowing more innovative and extensive browsing, too. Web 2.0 is a cute next step allowing social networking, which a growing number of scholars are finding really is more than simply cute. But the next evolutionary step, Web 3.0, is beginning to mutate.

This will be the semantic web, where information will be meaningfully contextualized in the way that early (19th century!) information managers and innovators, who knew only the print medium, originally intended (I am thinking of Charles Cutter). The semantic web will mean an online world where all the varying information topics (not just web pages or PDF files) have their own URI namespaces, and where it will be possible for users to search through them via meaningful relationship enquiries (not just “X links to Y” but “how X links to Y”; and both X and Y can be interrogated within their ontological relationship to each other – is one a subset of the other? or is one an echo of another in a different discipline?). Not that Cutter envisaged the semantic web, of course. But he did seek a way of organizing information that was more meaningful and useful than the classification systems that we ended up with in libraries.
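To make the difference between “X links to Y” and “how X links to Y” a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the URIs, the relationship names (isSubsetOf, echoesInAnotherDiscipline, linksTo) and the how_linked function are all invented for this example and do not come from any real vocabulary or semantic web toolkit.

```python
from collections import namedtuple

# A statement of the form "subject --predicate--> object".
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Hypothetical URIs and relationship names, invented purely for illustration.
CITATION = "http://example.org/topic/citation-analysis"
BIBLIO = "http://example.org/topic/bibliometrics"
CENTRALITY = "http://example.org/topic/network-centrality"
ARTICLE_1 = "http://example.org/doc/article-1"
ARTICLE_2 = "http://example.org/doc/article-2"

TRIPLES = [
    Triple(CITATION, "isSubsetOf", BIBLIO),
    Triple(CENTRALITY, "echoesInAnotherDiscipline", CITATION),
    Triple(ARTICLE_1, "linksTo", ARTICLE_2),  # the plain Web 1.0 hyperlink
]


def how_linked(x, y, triples=TRIPLES):
    """Return the named relationships that directly connect x and y.

    Each result is (predicate, direction), where direction is "forward"
    if the statement runs from x to y and "reverse" if it runs from y to x.
    """
    found = []
    for t in triples:
        if t.subject == x and t.obj == y:
            found.append((t.predicate, "forward"))
        elif t.subject == y and t.obj == x:
            found.append((t.predicate, "reverse"))
    return found


if __name__ == "__main__":
    # Asks not merely whether the two topics are linked, but how.
    print(how_linked(CITATION, BIBLIO))
    print(how_linked(CITATION, CENTRALITY))
```

The point of the sketch is simply that the relationship itself carries a name that can be queried, which a bare hyperlink does not.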

Part of this will be the exchange and reuse of objects within datasets and research databases (OAI-ORE, Object Reuse and Exchange). Datasets from different fields and disciplines can scarcely “talk” to each other today because of their different approaches to measurement and conceptual modelling. Any overlapping concepts are known to few outside those familiar with both disciplines, and even those few may rarely be able to make use of such overlaps because the disciplines speak different languages. The next stage of web development will work towards the ability to interrogate information meaningfully, and to select and re-use in other contexts the specific information we seek, as well as the ability to explore major and minor side-avenues. We will not be restricted to searching or browsing pages that have been prepared for searching and browsing (and “mere” hyperlinking) by others.

If we are moving away from the “humanist” benefits of inefficient print indexing, we are, I believe, moving towards an even greater scope for creatively exploring the total chaos of information.


