Over the last few years I have worked closely with a number of different institutional repository solutions, both open-source and enterprise products. There are several I have not had personal experience with, but I have taken opportunities to speak with a wide range of users of these products, as well as with representatives and producers of those solutions. I have also sought input from other users of the repositories I am personally familiar with, in an attempt to balance my own impressions. The following comparison is based primarily on feedback from managers of the systems, whether they have live production systems or have done extensive testing on systems they expect to take live soon.
The purpose of this comparison is to give an intro level guideline for institutions interested in “what (else) is out there”.
- The comparisons are not a systematic point-by-point balanced presentation. Anyone interested in a serious in-depth comparison or study of any particular repository solution would need to speak to other users themselves, as well as the producers or agents of the solutions.
- It is also restricted to the repositories I know from my experiences in Australia.
- I have not referenced costs or specific institutions here.
- Nor have I attempted a serious comparison of the IT architecture across the systems.
- The main focus is on support provided/needed and functionality of each product.
Digital Commons is a “presentation repository”, not a “preservation repository”. Emphasis in a repository designed primarily to showcase an institution’s research is on an attractive and compelling interface for users, including self-submitters. Digital Commons is a hosted solution (i.e. hosted in California). There is no hardware to purchase, install or maintain. An institution can begin to upload papers immediately after installation. Purchase and maintenance is on a renewable one year or limited number of years basis.
Digital Commons cannot be synchronized with another preservation repository for migration purposes. A preservation repository, unlike Digital Commons, will record and preserve authentication, versioning, rights, structural and descriptive metadata. In Digital Commons such data will not be preserved for migration/exit strategy purposes to a preservation repository.
- Three universities in Australia using Digital Commons reported that the service from BePress is “very good”.
- The setup period consists of about 3 weeks. All report that Digital Commons is easy to set up.
- Phone hook-ups were used for training and instruction at the beginning. Web demonstrations accompanied these.
- Requests by a university to change the front page appearance were responded to quickly and changes made efficiently.
- Another institution has requested many ‘fine tuning’ modifications to their instance of Digital Commons, and all these requests have been met “pretty quickly”. One institution wanted the format of citations changed and BePress effected this change quickly for them.
- One institution who has used Digital Commons for more than 2 years said they had never had any down-time with it.
- Nightly uploads of new documents.
- Documents can be set to open or closed access
- Different authentications can be set up for different users
- Self submission is possible
- Workflow stages can be configured “to some extent”, so that a central library service can monitor self-submitted documents for quality control and copyright issues
- Embargo functionality
- Different types of media files can be deposited (e.g. mp3, pdf, video)
- OAI harvesting
- PRPs (Personal Researcher Pages) – this was a strong selling point at one institution. In addition to the central epubs repository, links can take one to a PRP of an author, and this PRP can contain a list of their publications, be used as their homepage, and a point from which to access their documents. The library instance of Digital Commons “harvests” these PRPs and includes links back to the PRPs on the document pages.
- Documents organized by collections
- Able to hide preparatory work on a document being uploaded until it is ready to go live.
- Reporting and statistics
An Ex Libris product, DigiTool is a Digital Asset Management System (DAMS). It is designed primarily for teaching functions, and its repository capacities consist of a set of additional modules. The DAMS is primarily designed for teachers to share their digital objects (images, course notes and notices, indexes of resources, exam papers etc.). Much of this data is ephemeral.
DigiTool is not a hosted solution, but there is a community of users, a consortium client of Ex Libris, that does support a “hosted server” – UNILINC: http://www.unilinc.edu.au/services/hosting.html
Users of DigiTool report that it definitely requires their own local IT support to configure it appropriately for specific institutional needs.
- Functionality depends on the modules purchased. Modules are available for:
- Academic self-submission
- another for collection management (for arranging objects, adding thumbnails and descriptive metadata)
- a JPEG 2000 viewer as an optional plugin
- OAI interoperability (harvesting) module
- The Ex Libris demo of DigiTool says it is scalable, and users of DigiTool all spoke of it being able to do much more than their immediate requirements.
- Ex Libris also advertises that it “supports interoperability through open architecture”.
- Embargo periods
- Self-submission via the Deposit Module. The Deposit Module provides an interface and workflow which enables submission of objects and metadata by non-staff users.
- Workflow also allows for authorized staff to control, edit, and approve/decline the submitted material.
- Different levels of authentication: user/patron authentication is handled by the Local User management function or via LDAP.
- Copyright: this can be managed by manual assignment of access rights to the object.
- Objects can be assigned with access rights permissions.
- The following formats are supported for load into DigiTool: MARCXML, DCXML, MODSXML, CSV, METS — Given the claim that DigiTool is based on open architecture, one should expect the data stored would be migratable to other systems.
- ExLibris advertises that DigiTool supports preservation standards such as PREMIS and the OAIS (Trusted Repositories) standard model.
- DigiTool’s “interoperability module” for OAI harvesting does not configure OAI compliant Dublin Core. This is not a problem for harvesting by the NLA’s Discovery Service because DS have configured their service provider to read and harvest their DigiTool feeders. But seamless OAI harvesting cannot be guaranteed by other service providers. DigiTool users are expected to be informed of this issue by USQ-Repository Services. I have not been able to learn if this is unique to DigiTool or is also an issue with other proprietary solutions discussed in this report.
- Some users see DigiTool’s deposit procedure as “klunky” in trying to get it to do what they want. Editing of objects can require hours (overnight) to take effect; citations need to be created separately since they are not automatically generated; and multiple key strokes are required for some “simple” operations such as moving an object from open to restricted access.
- One user said that the upcoming version of DigiTool “promises” to be able to give them the ability to handle hierarchical structures. “We think it will do what we want.”
- Citations need to be specifically created in DigiTool – they are not automatic as in EPrints.
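To make the OAI Dublin Core point above concrete: an OAI-PMH service provider typically requests records with `metadataPrefix=oai_dc` and expects unqualified Dublin Core elements inside an `oai_dc:dc` container. The following is only a minimal Python sketch of that expectation. The base URL and the sample record are hypothetical and are not taken from any DigiTool installation.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
DC_PREFIX = "{" + NS["dc"] + "}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_dc(record_xml):
    """Return {element_name: [values]} for the dc:* elements of one record.

    An empty dict means no oai_dc:dc container was found, which is the
    situation a generic harvester cannot handle.
    """
    root = ET.fromstring(record_xml)
    container = root.find(".//oai_dc:dc", NS)
    if container is None:
        return {}
    fields = {}
    for child in container:
        if child.tag.startswith(DC_PREFIX):
            name = child.tag[len(DC_PREFIX):]
            fields.setdefault(name, []).append((child.text or "").strip())
    return fields


# Hypothetical record, shaped the way a compliant data provider would send it.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header><identifier>oai:repository.example.edu:123</identifier></header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An example thesis</dc:title>
      <dc:creator>Smith, A.</dc:creator>
      <dc:creator>Jones, B.</dc:creator>
    </oai_dc:dc>
  </metadata>
</record>"""

if __name__ == "__main__":
    print(list_records_url("https://repository.example.edu/oai"))
    print(extract_dc(SAMPLE_RECORD))
```

A harvester that has been specially configured for one data provider, as described for the NLA's Discovery Service, would in effect replace `extract_dc` with provider-specific parsing.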
Reasons for adoption
Most institutions that have adopted it, or that are considering doing so, have said that their primary reason was to establish synchronicity with their other Ex Libris products. Some specifically added that it was policy for them to favour enterprise solutions over open-source solutions.
DSpace is an open source solution developed by MIT. It has a large and active community of users. At least 450 registered DSpace repositories worldwide are evidence of DSpace’s robustness, ease of implementation, simplicity of maintenance and ongoing use, and low-cost.
- A large and active community of supporters with experience and expertise available to draw on
- Thorough online documentation for IT staff and managers for customization and implementation
- Step by step online tutorials
- Online assistance
- The amount of local IT support required for the implementation of DSpace depends on the extent of configuration changes an institution wishes to make.
- DSpace provides a module, Manakin, which enables the configuration of much more “original” interfaces without “intensive long term” IT support.
- Institutions with basic, largely “out of the box” configurations report that they can do “in the main” without local IT support. The trade-off is that a few “minor issues” (e.g. maintaining correct indexing records when changing the location of an object from one collection to another) persist.
- DSpace manages objects in a hierarchical, collections-based structure. Collections (or hierarchies of collections) display alphabetically on the main page.
- This Collections based organization, with inbuilt workflow and authentication capabilities, enables different faculties or departments to manage their own deposits and structure of their collections. Workflows can be set up to still provide for central quality control and final editing by the library.
- Descriptive metadata for the objects has a flat structure, which means that in cases of objects with multiple authors from different affiliations, there is no automatic guarantee that data can be transferred intact from one repository to another. This requires IT support in order to set up, say, a METS package, in order to encapsulate the data in its original relationships for successful migration.
- Workflows and authentications are supported.
- Embargo periods are supported (the metadata page displays immediately, but the attached document becomes public only at a preset date)
- Objects can be made inactive to be hidden from public view.
- Different mime types are supported, including video and audio.
- DSpace is integrated with Research Management systems in several universities.
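The flat-metadata limitation noted in the list above can be illustrated with a short Python sketch. The field names here are hypothetical and are not DSpace's actual schema. The point is only that two different author-affiliation arrangements collapse to the same flat record, so the original pairing cannot be recovered after migration.

```python
def flatten(record):
    """Collapse a structured record into flat, repeatable fields (lossy)."""
    creators = [author["name"] for author in record["authors"]]
    affiliations = []
    for author in record["authors"]:
        # A flat schema stores each affiliation once, with no link to a person.
        if author["affiliation"] not in affiliations:
            affiliations.append(author["affiliation"])
    return {"creator": creators, "affiliation": affiliations}


# Two structured records that differ only in where Lee works.
record_a = {
    "authors": [
        {"name": "Smith, A.", "affiliation": "University X"},
        {"name": "Jones, B.", "affiliation": "University Y"},
        {"name": "Lee, C.", "affiliation": "University X"},
    ]
}
record_b = {
    "authors": [
        {"name": "Smith, A.", "affiliation": "University X"},
        {"name": "Jones, B.", "affiliation": "University Y"},
        {"name": "Lee, C.", "affiliation": "University Y"},
    ]
}

if __name__ == "__main__":
    # Both print the same flat record, although the sources differ.
    print(flatten(record_a))
    print(flatten(record_b))
```

A packaging format such as METS is one way to carry the structured form alongside the flat fields, which is why the migration scenario described above needs some IT effort.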
EPrints is an open source solution developed and supported by the University of Southampton. EPrints is “easy to install, easy to configure, and needs minimal maintenance. Once installed, it simply works without fuss. Over a year, no maintenance has been required to the UTas server apart from updates.” (Arthur Sale, UTas)
All EPrints administrators I have contacted have spoken well of its simplicity and stability. It is widely seen as an ideal repository solution for initial implementation in a university with limited financial resources and IT support.
“EPrints is a mature software package, with an established community. It offers a complete solution for managing a research repository for Open Access. EPrints can be put to other uses, but for other uses such as image repositories alternative software might be more appropriate. . . . However, the software is under active development and it is particularly useful as an Open Access document repository.” (http://rubric.edu.au/repositories/eprints.htm)
“Many institutions do not have the resources necessary to build or maintain an institutional repository. The EPrints Services team offers a complete range of advice and consultancy to support institutions who have adopted, or who are looking to adopt, the EPrints solution. We can provide as much or as little support as you need to create and maintain a professional repository.” – EPrints site
This assistance is gratis to those implementing and maintaining an EPrints repository.
I contacted at least half a dozen universities using EPrints, and all expressed unqualified praise for the level and timeliness of support from Southampton. This praise came from IT staff who have had to liaise with Southampton as well as from repository managers.
Patches and upgrades are released regularly. Users have remarked on the ease with which these are installed and the robustness of their maintenance.
- EPrints supports OAI-PMH harvesting protocols.
- Plug-ins have been developed to support specific research reporting requirements and the emerging SWAP (Scholarly Works Application Profile), which is pioneering interoperability and semantic web developments for scholarly works.
- Self-submission (with a simple self-submission interface that is quick and easy to learn) is supported.
- Workflows can be configured for editors, submitters, and monitoring staff with different permissions.
- Objects can be removed from open access.
- Batch import (e.g. of ADT records) is supported.
- Peer review status, publication status, copyright and other administrative information, and citation generation and statistics by objects and author are all part of the “out of the box” package.
- EPrints supports text (in particular pdf) and image files, including multiple files per object.
- Flat metadata structure. When there are multiple authors with different affiliations, there is no guarantee that the right author-affiliation matches will be maintained in future migrations to other repositories. Workarounds are available to ensure this, but they need IT support to implement. In raising the flat metadata structure issue, it should also be mentioned that EPrints is developing an RDF module that converts its metadata into “triples” (subject-predicate-object). RDF (Resource Description Framework) is the basis of the emerging Web 3.0 (Semantic Web) and enables data to be converted into multiple schemas, including complex hierarchical structures.
- EPrints has recently begun to support preservation metadata through the work of its Preserv project, and these preservation functions have been implemented in EPrints 3:
1 A history module to record changes to an object and actions performed on an object
2 METS/DIDL plugins to package and disseminate data for delivery to external preservation services
- Slow indexing issues in EPrints have been rectified with the EPrints 3 version.
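As a rough illustration of why RDF triples help with the author-affiliation problem mentioned in the list above, the sketch below models a record as plain (subject, predicate, object) tuples in Python. It does not use the EPrints RDF module, and all identifiers and the `ex:` predicates are hypothetical.

```python
# Each triple is (subject, predicate, object). All identifiers are made up.
TRIPLES = [
    ("paper:1", "dc:title", "An example paper"),
    ("paper:1", "dc:creator", "person:smith"),
    ("paper:1", "dc:creator", "person:jones"),
    ("person:smith", "ex:name", "Smith, A."),
    ("person:smith", "ex:affiliation", "University X"),
    ("person:jones", "ex:name", "Jones, B."),
    ("person:jones", "ex:affiliation", "University Y"),
]


def objects(triples, subject, predicate):
    """Return every object stated for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def authors_with_affiliations(triples, paper):
    """Follow creator links so each author keeps their own affiliations."""
    result = []
    for person in objects(triples, paper, "dc:creator"):
        name = objects(triples, person, "ex:name")[0]
        result.append((name, objects(triples, person, "ex:affiliation")))
    return result


if __name__ == "__main__":
    for name, affiliations in authors_with_affiliations(TRIPLES, "paper:1"):
        print(name, affiliations)
```

Because each affiliation is attached to a person rather than to the paper, the pairing survives conversion to other schemas.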
EPrints development has prioritised robust functionality, and this limited the file types supported in the earlier versions of the “out of the box” EPrints. Successive releases are allowing a wider variety of mime-types to be supported.
One limitation up till now (version 3.0) with EPrints has been the failure of the Embargo facility to publish objects on the release dates. These need to be manually published.
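Until the embargo facility releases objects automatically, a repository manager needs some way to know which objects are due. The sketch below shows the underlying check in Python. It is not EPrints code: the record layout, the `status` values and the function name are assumptions made for illustration.

```python
from datetime import date


def due_for_release(items, today):
    """Return ids of embargoed items whose release date has arrived."""
    return [
        item["id"]
        for item in items
        if item["status"] == "embargoed" and item["release_date"] <= today
    ]


# Hypothetical records exported from a repository.
ITEMS = [
    {"id": 101, "status": "embargoed", "release_date": date(2008, 6, 30)},
    {"id": 102, "status": "embargoed", "release_date": date(2009, 1, 1)},
    {"id": 103, "status": "public", "release_date": date(2008, 1, 1)},
]

if __name__ == "__main__":
    # Lists the objects a manager would need to publish by hand today.
    print(due_for_release(ITEMS, date.today()))
```

Run on a schedule, a check like this would produce the list of objects to publish manually.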
Dates for theses appear as “date published” instead of “date completed” in at least one institution. It is not clear to me if this is a configuration issue that is resolvable with IT and/or Southampton support.
269 archives are known to be running EPrints worldwide.
The reason for this recommendation was that it was “easy to set up, easy to configure, easy to use, and has a very large, open community supporting it and using it . . . and not all that expensive . . . for small institutions such as these, it was ideal.” (USQ-RS Manager in personal email)
Equella is developed by The Learning Edge International (a Tasmanian based company).
Equella is primarily a “Learning Content Management System”. Learning Edge also describe it as a repository, but speak of it as a teaching repository tool. It is “a fully integrated Digital Repository and Content Authoring Tool”. It can be used as a collaborative lesson planning tool as well as a repository. It can plug into Blackboard and WebCT.
Users have described its administration module as well developed. “A non-IT person can set it up with a graphic user interface for collections and permissions and configuration.”
- One institution needed “some” local IT support to set up Equella. They do not need local support for much more than activating periodic patches that Equella sends out now.
- Another also said that the “setting up” of Equella was the most complex part, but that this included the preparatory work. Equella is so flexible that many decisions needed to be made in advance about exactly what was wanted and what the best and proper policies should be. Once this work had been recorded it was relatively easy to set up. All the development work is done by Learning Edge.
- One institution said that the initial setup involved two days’ training, which was described as “very adequate”.
- self-submission (including students of higher theses)
- different levels of authentication (easy to set up – a simple switch; manages various levels of permissions among academics – very flexible)
- workflow systems (staged steps to check for copyright compliance with OAKLIST, Sherpa, etc)
- digital rights management (can aggregate objects to groups for specific workflows and permissions)
- embargo periods
- OAI-PMH harvesting (e.g. to Google Scholar)
- records can be pulled out of Research Master and sent back into Research Master
- one user noted that the user front end is boring and uninspiring, but that the functionality behind it is “great fun”; they are relying on their own IT people to improve the web appearance soon.
Equella has about 18 clients in Australia, including several Tasmanian, Queensland, South Australian and Victorian TAFEs and Education Departments. One institution chose Equella to handle 4 primary tasks:
- RQF reporting (now ERA)
- As a backend to Moodle CMS
- To be the base of the e-reserve collection
- Developing a university research repository
- Open Source solutions were not an option because of limitations of the brief.
- it has a well defined administration module, so a non-IT person can set it up with a graphic user interface with various collections and permissions, and they did not want to align with a particular library system
Fez is a front end to the Fedora repository software. It is developed by the University of Queensland Library as an open source project hosted on SourceForge. See http://sourceforge.net/projects/fez/
Project Overview: http://espace.library.uq.edu.au/documentation/
Fez is part of the Australian Partnership for Sustainable Repositories (APSR). Fez is one of the deliverables of the APSR eScholarship testbed in the University of Queensland Library.
- The University of Queensland developers do offer support for other users of Fez. They should be contacted for details.
- To implement Fez an IT officer with the following basic knowledge set would be required:
- MySQL
- Fez – written in PHP, but also CSS, HTML (Smarty HTML templating)
- (related other pre-req software)
- Understanding how it all fits together
- This would also involve that person (obviously) having programmer level access to the server on which Fez ran (something to be considered, depending on your IT department’s policies).
- The University of Queensland developer
RUBRIC recommendation at February 2007:
For institutions wanting to run a general purpose repository Fez is a promising choice, provided that the technical resources are available to manage it. Contact the developers, as they may be able to offer support under a formal arrangement.
- Fez is a rapidly maturing repository software application.
- Fez is built around constructs known as “Communities” and “Collections”.
- Supports self-archiving
- Workflow authentications and authorizations. These are configurable through a GUI.
- Security based on FezACML to describe user roles and rights on a per object basis or through parent collection or community security inheritance.
- Security at object granularity.
- Statistics (eg Downloads per Author, per Community, per Collection, per Subject etc).
- Preservation metadata extraction
- OAI service provider for harvesting
- Supports migration to and from other repository systems (DSpace, EPrints, VITAL)
VITAL provides every feature–ingesting, storing, indexing, cataloging, searching and retrieving–required for handling large text and rich content collections. VITAL takes advantage of technology standards such as RDF, XML, TEI, EAD and Dublin Core to easily describe and index an assortment of electronic resources. VITAL leverages the benefits of open-source solutions such as Apache, MySQL, McKOI and FEDORA™. VITAL conforms to common Internet data communications standards such as TCP/IP, HTTP, SOAP and FTP. Additional standards utilized include WSDL Web Services, OAI-PMH, Dublin Core, MARCXML, JHOVE, MIX (Metadata for Images in XML Schema), and SRU. (from the VTLS site)
A PREMIS datastream is also generated at ingest.
Australia’s community of VITAL users has been able to coordinate support through ARROW. 2008 is the final year of the ARROW project. ARROW is expected to be replaced by a CAUL sponsored body, CAIRRS, with a remit of supporting the Australian university repository community more generally. VTLS has acknowledged that their past record in prior testing of new product versions, and their follow up support, could have been better. At a recent ARROW meeting in Brisbane, a VITAL representative assured users that VITAL itself would upgrade users’ current versions to 3.4, due for release around the end of October. In the past, patches have not always been available for recognized issues (including significant ones such as certain PDF files not able to be indexed) and users have had to wait for new version releases. VTLS does have an online “hotline” for logging such issues.
The VITAL product promises much. It has the advantages of a Fedora base, which enables the storage of a wide variety of content types, and greater sophistication in their management and support for any standard metadata schema (hierarchical or flat).
- Documents can be set to active or inactive (public or hidden) — although at the moment of deposit they necessarily default to active
- Self submission is possible through VALET
- Workflow stages can be configured so that a central library service can monitor self-submitted documents for quality control and copyright issues
- No embargo functionality
- Different types of media files can be deposited (e.g. mp3, pdf, video)
- OAI harvesting
- Statistics. Some institutions opt to hide this function because the stats file corrupts at regular intervals or the statistics reset to zero; this is said to be fixed in version 3.4
- Copyright: this can be managed by manual assignment of access rights to the object.
- Supports migration to and from other repository systems (DSpace, EPrints, VITAL) with METS.
- Ability to search the full-text content of PDF, DOC, RTF and other document formats — although there are currently security issues with this function, such as public searches not being completely cut off from “hidden objects”
- Ability to display multi-page documents.
- Integrated editors for easy editing of metadata.
- Customizable templates for display of content.