I am sharing here a crosswalk between Australasian Digital Theses data and MARC. This has been done many times before, including by me, but I am making my latest stab at it available for the benefit of others who are still in the early days of (Australian) repository implementation.
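For readers new to crosswalks, here is a minimal sketch of the general shape of one in Python. It is not the crosswalk referred to in this post: the input field names and the sample record are hypothetical, and only the MARC tags themselves (100 main entry, 245 title statement, 502 dissertation note, 520 summary) are standard.

```python
# Hypothetical thesis record. These field names are illustrative only and
# are NOT the actual ADT element set.
thesis = {
    "author": "Citizen, Jane",
    "title": "Metadata for electronic theses",
    "degree": "PhD",
    "institution": "Example University",
    "year": "2007",
    "abstract": "A short abstract.",
}


def to_marc_fields(rec):
    """Map a thesis record to (tag, indicators, subfields) tuples."""
    dissertation_note = f"Thesis ({rec['degree']})--{rec['institution']}, {rec['year']}."
    return [
        ("100", "1 ", [("a", rec["author"])]),   # main entry, personal name
        ("245", "10", [("a", rec["title"])]),    # title statement
        ("502", "  ", [("a", dissertation_note)]),  # dissertation note
        ("520", "  ", [("a", rec["abstract"])]),  # summary / abstract
    ]


def render(fields):
    """Render the fields in a simple human-readable line format."""
    lines = []
    for tag, indicators, subfields in fields:
        body = "".join(f"${code}{value}" for code, value in subfields)
        lines.append(f"{tag} {indicators} {body}")
    return "\n".join(lines)


print(render(to_marc_fields(thesis)))
```

The thesis-specific information (degree, institution, year) all ends up in a single free-text 502 note here, which is one reason a crosswalk in this direction tends to lose granularity.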
August 5, 2008
August 16, 2007
To see the common elements in the five thesis metadata schemas (ETD-MS, UKETD-DC, XMetaDiss, TEF, TDL), check "Thesis-specific metadata across 5 schema (colour coded)" (a PDF file).
In sum, the thesis-specific elements that all five schemas contain are: (more…)
August 11, 2007
August 10, 2007
July 6, 2007
Just to be contrary (not really; merely expressing thoughts in flux, which is what blogs are for, right?): why bother with an ETD metadata schema at all? Why treat theses any differently from any other resource in repositories? Obviously there are some specific differences that need attention when it comes to images or videos in comparison with text or PDF files, but this is a function of format. The concept “thesis” is of course not a format but an idea based on intellectual content.
By all means maintain the uniquely “thesis” metadata in a repository record (awarding institution, degree name and level, etc), but for harvesting purposes, is there any need to go beyond what is already available through simple Dublin Core data? (more…)
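To make the question concrete, here is a rough Python sketch of the idea: the repository keeps the full record, and only a simple Dublin Core view is exposed for harvesting. The record, its field names and the mapping table are all hypothetical; in particular, putting the awarding institution into dc:publisher and the literal "Thesis" into dc:type are assumptions for illustration, not rules taken from any one schema. The two namespace URIs are the standard Dublin Core and OAI ones.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Hypothetical full repository record, including thesis-specific fields.
record = {
    "title": "Metadata for electronic theses",
    "author": "Citizen, Jane",
    "abstract": "A short abstract.",
    "date": "2007",
    "awarding_institution": "Example University",
    "degree_name": "Doctor of Philosophy",
    "degree_level": "Doctoral",
}

# Assumed mapping from local field names to simple DC element names.
# degree_name and degree_level are deliberately left out: they stay in the
# repository record and are not exposed to harvesters.
HARVEST_MAP = {
    "title": "title",
    "author": "creator",
    "abstract": "description",
    "date": "date",
    "awarding_institution": "publisher",
}


def to_simple_dc(rec):
    """Build an oai_dc element holding only the simple DC view of a record."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for local_name, dc_name in HARVEST_MAP.items():
        if local_name in rec:
            ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = rec[local_name]
    # Flag the resource as a thesis through dc:type.
    ET.SubElement(root, f"{{{DC_NS}}}type").text = "Thesis"
    return root


print(ET.tostring(to_simple_dc(record), encoding="unicode"))
```

What the harvester can no longer see is the degree name and level, so whether simple DC is "enough" depends on whether anyone downstream needs to search or filter on those.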
July 2, 2007
The NDLTD’s ETD-MS may not have the status of an international standard metadata schema for online theses and dissertations, but it has certainly established a lead throughout the US, Canada, UK, France and Germany at least. It is far from confined to the US. It is used by, or used as a mappable set of elements with, Library and Archives Canada (LAC), EThOS’s UKETD-DC, Theses Electroniques Francaises (TEF), Germany’s XMetaDiss, and also the Texas Digital Library Consortium’s MODS schema for theses. And much of the USA of course. (more…)
June 29, 2007
Other miscellaneous notes, no longer attributable:
The importance, when studying surveys etc., of observing what people DO, not just what they say they do. Thus, for example, people may not really want an interface like a Google box, but may really want a search box broken down into structured categories, e.g. one column for authors, an adjacent column for titles, an adjacent one for types, another for year ….. People may well prefer structure with tabs to tick etc., unlike Web of Science’s navigation.
The importance of being able to take users to the data itself so they can display it in their own preferred way. (Web 2.0; cf. iTunes, Greasemonkey….)
The resource type “thesis” in metadata schemas will need subdivisions not just for the type of thesis (e.g. research, professional, coursework…) but also for whether it is a PDF, a scanned PDF, a thesis by publication, or multimedia (one student in the US is now doing hers on a wiki).
From Last Day of Conference:
A session on plagiarism addressed mainly academic “cheating” rather than third party property issues and such. Another session focussed on catching up with getting all the other old print theses online — the logistics and strategies for coping with scanning these and adding them to a repository collection. One I regrettably missed but have since followed up via email (at least made personal contact at Uppsala) was a case study in Belgium of a library coping with changes that need to be made to the theses metadata over time, and how new policies and metadata issues in relation to etd’s are handled in repositories. Look forward to reading that paper in depth and reporting on the experiences it discusses.
In the humanities at least, is there a need for a database separate from a thesis database, the separate one being for the massive supporting evidence underlying the theses?
And just to make it simpler, we should be preparing for cases of dissertations that are co-authored — with parts of the dissertations being re-prints from published journals….
These posts reflect, of course, my own experiences of the conference and not the totality of what was covered. So many nuggets come up at such conferences that do not lend themselves easily to this sort of note-taking, though I have tried a few times to include them — I know many more will come to me over time as specific contexts jog memory and that will be time for making more notes no doubt. Many of such nuggets come from informal discussions, question and answer sessions, and other asides…. One of the biggest benefits was simply in meeting others from around the world, all continents, who are involved in working towards the same things — and thus knowing where we at RUBRIC and Australia do fit in with the larger picture. This is invaluable for better knowing how to interpret many of the articles one reads, and the various policies and practices both locally here and elsewhere, and to keep in mind a practical vision of what is required for the goals of meeting the tech changes and requirements this imposes on metadata (my specialty of course) and other aspects of repository management.
I have much to follow up on now — and have already blanketed the globe with follow up emails to certain other delegates, some of whom I met there and others I may have met — and to examine afresh the metadata requirements of Australian ETDs — not to forget getting a larger view of other related repository issues as well. I have made references to specifics throughout the posts, and expect to share some of the followup work here in future posts.
And many at least now have heard of RUBRIC, too, both from personal contacts and more formal discussions following the sessions, not to forget of course the presentation of Peter Sefton! Now that was a real hit with many subsequent mentions in the sessions. Many commented with envy that there was an organization like RUBRIC that would send a metadata delegate to such a conference almost as a matter of policy — to be on the cutting edge in order to deliver the best services possible. So I should thank RUBRIC management (past and present) for making it possible for me to attend.
Forgot to add earlier that the TDL uses the Manakin interface module, which is worth comparing with the normal DSpace view. In the next session I was able to discuss with others (e.g. Craig Thomas of MIT) their use of Manakin for improving DSpace’s functionality, and how they found it in ‘real life’.
Else Nygren addressed the differences between old and new ways of learning, which I found very interesting. She spoke of problems mixing Metalib with Google habits, the need to find out the habits of users, and the need to make content accessible across cultural and cognitive barriers. Asked afterwards who the “new users” were that a repository should be alert for, Else spoke of interested young people, not university students. I’m sure there are interested people among groups other than the young, too. I don’t see that public accessibility, open access, will be of interest exclusively to students and academics.
One of the most interesting sessions from my particular metadata perspective was Session 5’s “Discovery and Access” segment, and I made the most of the Q and A session at the end of it. Sharon Reeves discussed user-generated metadata for ETDs in Canada, in particular for the national LAC (Library and Archives Canada). Austin McLean of ProQuest read the paper of Dr Livia Vasas (who was unable to attend in person), and John Hagen of West Virginia libraries spoke on Building Effective Discovery Tools for Academic Promotion and Tenure Evidence. UMI’s PQDT (ProQuest’s progenitor database?) apparently pays authors royalties on sales of copies of online theses. LAC uses ETD-MS; cataloguers don’t look at the record, so there are no controlled vocabularies. (Compare this with the controlled subject vocabularies I noticed in other networks of repositories in the U.S.) There was a table in Sharon’s presentation showing the relationship between MARC and ETD-MS which I must see in detail as soon as it is available. I was curious to know why ETD-MS was chosen by Canada (it has reportedly not been adopted by ADT in Australia because it is not yet a universally recognized or adopted standard). I also wanted to know whether it was chosen after comparison with other metadata schemas.
The other main query I had concerned the problem of reconciling international differences in the meanings of terms such as “doctoral” and “masters”. Not all doctoral theses are research theses in all countries, although in, say, the Netherlands the term itself may be what defines a thesis as a research thesis. Clearly we cannot rely on or expect a common terminology. The differences in the terms are culturally and politically rooted. It is up to additional metadata fields to clarify the nature of each thesis type.
Place this in the context of the value of ETD-MS. I don’t think that schema does justice to this problem. The global solution has not yet arrived, but this did highlight for me the importance of building the required granularity into the metadata schema now, whether through a MODS application or some other. This is going to have to be a priority that I will want to work on and propose to others here in Australia.
But while my time was with this session I was missing out on comparative developments in India and Japan. Clearly Australia needs to be in step with Asia as much as Europe given much of our research focus. But I am currently following up personal contacts made with some of the delegates from these countries.
Also missed was the presentation on the DissOnline Portal by Natascha Schumann of Germany’s National Library, a topic I’d really need to tackle with input from ICE-RS’s Peter Sefton; also EThOS in the UK, though since briefly meeting Susan Copeland I have followed up on the metadata issues and schemas involved there, and will be making use of those in evaluating Australian needs.
The afternoon session was also a bit of a head spinner for me. There was a session on the power of PDF files to embed video and sound files, thus enabling interactive simulations within PDFs. But subsequent discussions with others showed some strong divisions, and necessary cautions, over this technology. Joan Cheverie of Georgetown Uni spoke on social science data and ETDs, and Austin of ProQuest also made an appearance in this context, though there was no apparent linkage between the two institutions. These presentations in part made reference to their use of controlled vocabularies, a topic of some interest to me at different levels, in contrast to the presentations either side of this one. Edward Fox discussed concept maps in NDLTD. The limitation of Scirus, for one, in not listing the department awarding the thesis was commented on. This underscored for me the impossibility of standard schemas and terminologies, and the need for interoperable (read, in part, granular) local or national schemas for future-proofing our databases. Again I found worthwhile the opportunity at the concluding Q and A to ask for views on the relative benefits of controlled vocabularies in the context of the available technologies. I know many will find that infrastructure imposed upon them decides the issue for them, but I did find myself leaning again, and further, towards maintaining controlled vocabs if at all possible.
Again, there were sessions I missed, and I look forward to catching up with some of those discussing situations in Italy and elsewhere. It is a plus to have made contact with the personnel involved, and to know that communication has begun with some of them, which I have since started to follow through. The abstracts at least are online at the conference site at this stage, and probably email addresses too for others who are interested.
Some of the keynote speakers did succeed in their intention to be provocative, but some of the delegates felt they were being too much so, and if taken at their word one might be left with the impression that repositories have no place at all. But what needs to be weighed before going that far is the work of integrated systems, such as ICE and others in use in Sweden and elsewhere, that are working towards this world. Peter Murray-Rust’s presentations, for example, should be read in tandem with Peter Sefton’s.
June 28, 2007
Greg Crane spoke of the need for, and the inevitability of, moving beyond book-imitation PDF files. He used Perseus Classics Online as an example of the potential we should be aiming towards, where texts contain multiple links for each word: to dictionaries, to other related texts, to commentaries. The potential impact will move us beyond the slow and limited intake of information that currently comes from reading lines at a time and then moving on to other texts …. a 2 dimensional process as opposed to the 3 dimensional or more organic structure possible with the sort of thing we now see at Perseus.
I don’t know the technical structure behind Perseus, but I know Perseus well enough to see it as one model for a future online database. As for metadata implications, what it is calling for is work on ontologies and the semantic web. (I suspect Perseus is not based on that at present but I could be wrong, and I see Greg has an article online discussing the Perseus project in more depth that I must read.) That means, ideally, RDF rather than traditional schemas such as MODS or MARC or DC, though the RDF-structured content could generate such schemas when needed. (My thoughts arising from Greg’s presentation.)
The next session I attended covered Emory University’s work (Martin Halbert) on integrating IRs on Fedora, and building Web 2.0 web services on top of the Fedora repositories for ETD submission, admin and user/public dissemination processes. The approach is to balance flexibility and standards to achieve interoperability. I have requested a copy of the paper presented so that I can investigate in more detail the metadata issues behind this balance of flexibility and standards.
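A toy illustration in Python of what I mean by RDF-structured content generating a traditional record on demand. Everything here is an assumption for the sake of the sketch: the triples, the "ex:" identifiers and the flattening rule are made up, and this says nothing about how Perseus actually works. Only the Dublin Core terms and FOAF name URIs are real vocabularies.

```python
DCT = "http://purl.org/dc/terms/"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical graph as plain (subject, predicate, object) tuples.
triples = [
    ("ex:thesis1", DCT + "title", "Metadata for electronic theses"),
    ("ex:thesis1", DCT + "creator", "ex:person1"),
    ("ex:thesis1", DCT + "references", "ex:text42"),
    ("ex:person1", FOAF_NAME, "Jane Citizen"),
]


def flat_dc(subject, graph):
    """Flatten the DC-terms statements about one subject into a simple record.

    Where an object is itself a node with a foaf:name, the name is used,
    so the linked structure collapses into plain strings.
    """
    names = {s: o for s, p, o in graph if p == FOAF_NAME}
    record = {}
    for s, p, o in graph:
        if s == subject and p.startswith(DCT):
            element = p[len(DCT):]
            record.setdefault(element, []).append(names.get(o, o))
    return record


print(flat_dc("ex:thesis1", triples))
```

The graph keeps the links (the author is a node, the referenced text is a node); the flat record is just one disposable view of it, which is the direction of travel I have in mind.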
I was intrigued by Adam Mikeal’s presentation on the Texas Digital Library. This is a consortium of libraries that deposit their ETD’s with the TDL — a federated collection of ETD’s apparently similar to our original Australasian Digital Theses Program. The metadata application used is a MODS application for theses, not ETD-MS. I had a brief discussion with Adam afterwards and have since received more info on the schema used. Keen to follow this through and see how it might be adapted for Australian needs.
An Indian presentation followed that pointed in a similar direction to the one the TDL is taking: a centralized ETD repository, a national database collection. There are several ETD repositories in India but the IR scene is not uniform, hence the hopes for the national db to fill the need.
By attending that group of sessions I missed RUBRIC colleague Peter Sefton’s presentation, but, well, I have heard Peter discuss aspects of the Integrated Content Environment for Research and Scholarship (ICE-RS) piecemeal a number of times: in this context, it’s about writing and publishing a thesis, multimedia format, in pdf/html, with versioning controls in the process, and preservation and descriptive metadata . . . But check out the full story in his own presentation at USQ Eprints repository.
One can’t attend all simultaneous presentations, and another I would have loved to attend was one discussing how SURF (The Netherlands), JISC (UK) and DiVA (Sweden) have begun a project to harvest ETDs from repositories internationally.
Where is Australia here? But having at least shaken hands with some of these people, and “being there”, gives one some hope that follow-up contacts can begin to work towards making things happen for the Australian-New Zealand ADT program. Earlier this year I was appalled when email correspondence indicated that Australian repositories (the ARROW Discovery Service) are a nonentity in the UK and Europe, and a bit player in harvesters such as OAIster or Scirus. I will have to begin email links now between the Europeans and ADT here to see where we can move and, if that fails, to see what foundations can be laid to propel future collaboration between Australian IRs (ETDs being the driving force?) and the “world”.
Another presentation I missed while attending one that presented MODS for e-theses, was Ana Pavani’s (Brazil) “Looking at ETD’s from Different Points of View”. This promised a discussion of the considerable efforts put into metadata sets and union catalogue creation for the discovery of e-theses. I have already emailed Ana for more details to catch up here. In another presentation it was clear that ETD’s have the potential in many quarters, whether housed in separate collections or part of the rest of an IR, to promote the university or granting institutions given the right structures and metadata and recovery systems.
I also regretted not being able to attend NDLTD presentations, but I did meet several people from NDLTD and its UK and European sub-projects, and look forward to replies from emails I have since sent back to them to resume contact, and to continue online engagement in what is happening re international cooperative potentials for harvesting of ETD’s. (Again, where has Australia been till now!!)
To be contd….
June 27, 2007
This is not really an update on the etd conference but a spinoff of thoughts from there, specifically about our Australian-Australasian situation.
We don’t need to adopt one of the theses metadata schema currently used in the US or Europe but we should develop something compatible with those while meeting our own needs.
The US ETD-MS could be seen as a minimal thesis schema: simple Dublin Core with a handful of additional ETD elements. But the UKETD-DC is a much richer thesis schema. It is an application of simple DC, some DC refinements (qualified elements), and about 10 “local refinements” such as publisher.institution for the awarding institution, publisher.department for the author affiliation, and publisher.commercial for a publisher.
There is also the French schema (TEF, theses electroniques francaises), which incorporates DC, DCterms, METS and METSRIGHTS as well as TEF thesis-specific elements. Germany is revamping its HTML-based MetaDiss into XMetaDiss, which is to be XML-based and compatible with ETD-MS.
We should be contacting reps from GUIDE (Guiding Universities in Doctoral E-Theses), a working group of the NDLTD focussed on European doctoral e-theses, as well as the NDLTD itself and others, with a view to doing the equivalent for Australia and the ADT program.
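A small Python sketch of what those refinements buy you, and what is lost when a harvester only understands simple DC. The record below is hypothetical and uses only the three refinement names mentioned above; the "strip everything after the first dot" rule is the usual dumb-down idea for qualified DC, written here as an assumption rather than taken from the UKETD-DC documentation.

```python
# Hypothetical record using dotted refinement names.
uketd_like = {
    "title": ["Metadata for electronic theses"],
    "publisher.institution": ["Example University"],
    "publisher.department": ["School of Information Studies"],
    "publisher.commercial": ["Example Press"],
}


def dumb_down(record):
    """Collapse refined elements onto their base DC element.

    "publisher.institution" becomes plain "publisher"; the values are
    kept but the distinction between them is lost.
    """
    simple = {}
    for name, values in record.items():
        base = name.split(".", 1)[0]
        simple.setdefault(base, []).extend(values)
    return simple


print(dumb_down(uketd_like))
```

After dumbing down, the awarding institution, the department and the commercial publisher are just three undifferentiated publisher values, which is why the richer schema matters for anything beyond basic discovery.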
Maybe the ADT program needs to be extended with a subbranch to look at harvesting other thesis types from repositories too?
I’m looking forward to studying the various e-thesis schema more closely with a view to Australian needs, and proposing something more concrete asap.
And not just the metadata schema — but a closer look at the multiple long term requirements for preservation and extensibility for theses in the broader Australian context.