One question that comes up when starting data entry in repositories is whether to include the nonsort initial article (A, An, The) in a title entry. When I started as an Eprint editor this question did arise, at least for me, and my first impulse was to omit it. The repository did not allow for an alphabetical sort by titles at that time, but what if that were to be required in the future?
But in my metadata chapter for the RUBRIC toolkit I expressed the opposite view.
Including the initial article is what most people expect to see in a title, especially if they are looking for a known title. And most people submitting to the repository will be academics, along with many untrained indexers and library staff, who will find this the natural way to enter data.
Okay, I know this brings up worries about indexing. But was not the rationale for omitting initial articles simply the problem of using systems that found them too difficult to handle when indexing? That is fast becoming an obsolete issue.
So what of indexing? Systems are developing quickly, and repositories that do not currently omit stopwords or articles when indexing can be expected to change. Vital 2.1 does not, but Vital 3 does have that option: it can be configured to omit stopwords in indexing. DSpace does omit the articles in the Title browse list. Eprints does not have a title browse list, nor, I think, does Fez. But I have no doubt that if Fez did introduce one, stopword handling would be an index feature.
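To make the idea concrete, here is a minimal sketch of how a title browse list can keep the article in the stored title while ignoring it for sorting. It is not taken from Vital, DSpace or any other repository; the names INITIAL_ARTICLES and browse_sort_key are hypothetical, and the stopword list covers English only.

```python
# Hypothetical stopword list; a real repository would make this configurable
# and would need entries for other languages.
INITIAL_ARTICLES = ("a", "an", "the")


def browse_sort_key(title: str) -> str:
    """Return a lowercase sort key with one leading article removed."""
    cleaned = title.strip()
    words = cleaned.split(None, 1)  # split off the first word only
    # Drop the first word only if it is an article AND something follows it,
    # so a title that consists solely of "A" is left alone.
    if len(words) == 2 and words[0].lower() in INITIAL_ARTICLES:
        return words[1].lower()
    return cleaned.lower()


titles = [
    "The Origin of Species",
    "An Essay on Man",
    "Beowulf",
    "A Tale of Two Cities",
]

# The stored titles are untouched; only the ordering ignores the articles.
browse_list = sorted(titles, key=browse_sort_key)

for title in browse_list:
    print(title)
```

The titles are displayed exactly as entered, article included, while the list is ordered under B, E, O and T.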
Eprints and DSpace citations look much more “natural”, that is, consistent with citations as listed elsewhere in print and electronic databases, when the initial articles are included. With the articles in the titles, the system-generated citations look “just right”.
As systems become increasingly sophisticated, I see no need to omit the initial articles. If they are part of a title, the “natural” and expected thing is to see them included in citations and in references to that title.
I should also add that where the articles are included and stored in a granular datastream such as MODS or MARC, the triggers for a system to regard these as stopwords are already part of the data.
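As a rough illustration of those triggers: MODS can hold the article in a nonSort element inside titleInfo, and MARC field 245 uses its second indicator to give the number of nonfiling characters. The sketch below reads both with the Python standard library only. The sample record and the function names mods_titles and marc_sort_title are made up for illustration, and joining the article and title with a single space is an assumption that would not suit elided articles such as "L'".

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Made-up sample record, for illustration only.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <nonSort>The </nonSort>
    <title>Origin of Species</title>
  </titleInfo>
</mods>"""


def mods_titles(xml_text):
    """Return (display_title, sort_title) from the first MODS titleInfo."""
    root = ET.fromstring(xml_text)
    info = root.find("mods:titleInfo", MODS_NS)
    non_sort = info.findtext("mods:nonSort", default="", namespaces=MODS_NS)
    title = info.findtext("mods:title", default="", namespaces=MODS_NS)
    # Assumption: article and title are separated by a single space.
    parts = [part.strip() for part in (non_sort, title) if part.strip()]
    return " ".join(parts), title.strip()


def marc_sort_title(title_245a, nonfiling_indicator):
    """Skip the number of characters given by the 245 second indicator."""
    skip = int(nonfiling_indicator) if nonfiling_indicator.isdigit() else 0
    return title_245a[skip:]


display_title, sort_title = mods_titles(SAMPLE_MODS)
print(display_title)
print(sort_title)
print(marc_sort_title("The origin of species", "4"))
```

In both cases the full title, article included, is what gets displayed and cited, while the sort form is derived from information already in the record.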
But I’m more than happy for alternative feedback.