September 29, 2009
by Bill Hubbard
A thread on JISC-Repositories this week has been discussing whether to delete repository records when an academic leaves. This set me thinking about such policies in general and how the interaction of different policies between repositories may affect access or collections in the long run. It is an example, I think, of the way that institutional repositories work best when seen as a network of interdependent and collaborative nodes that can be driven by their own needs but produce a more general collective system.
Our policy in Nottingham is that we see our repository as a collection of material produced by our staff. It follows that when a member of staff leaves, we do not delete their items, as these are a record of their research production while they were here.
More than that, authors should not expect such deletion even upon their request, except in very unusual circumstances. If repositories are to be used as trusted sources of information, the stability of the records they hold is very important.
If authors put material into the repository that includes their “back-catalogue” produced at previous institutions, then that is fine too: we will accept it and keep it. Strictly, they did not produce this material while they were employed at Nottingham, but if it is not openly accessible elsewhere, why not take it? It might be slightly anomalous to hold this material, but if it opens access to research information, that is what it’s all about.
I think there is a transition period here, while academics adopt the idea of depositing material. It’s likely that academics will put their back-catalogue to date into the first major repository that they use in earnest, if they have the right versions available. Thereafter, as this material should be kept safe and accessible, they can always link back to it. In other words, once they have deposited their back-catalogue, they are unlikely to want to do it again at every subsequent institution they move to, as long as they know it will be safe and that they can link to it. There is an advocacy theme here: helping researchers understand that repositories are linked and that the repository, and the repository network, will serve them throughout their career.
For a newly arrived member of staff with material in a previous institution’s repository, it all depends on the new institution’s collection policy. The institution might prefer them to deposit only the outputs they produce from that time on; to deposit all their own material again; or to create a virtual full record of outputs by copying the metadata and linking back to the full-text in the previous repository or repositories. This will depend in turn on whether the previous repositories are trusted to match the new institution’s own terms for access and preservation.
If the material is held on a repository without long-term assurance of durability (perhaps on a commercial service), and if the institution’s own repository offers a level of service that the other cannot match, then there would be a rationale for holding a local copy of the full-text. This copy may be held and exposed, or held in reserve in case the external service fails. Otherwise, simply linking back to the full-text held on the previous repository seems most practical if a full record is required.
If the previous repository is trusted to provide the same level of service in access, preservation, and stability, then it does not really matter which URL or repository underlies the “click for full text” link. Academics can compile their list of publications and draw from the different institutions at which they have worked; repositories can hold their own copy of metadata records and link to external trusted repositories; and as far as the reader-user is concerned it’s still “search for paper, find, click, get full-text paper”.
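The choice described above, between linking out and holding a local copy, can be sketched as a simple rule. This is only an illustration under stated assumptions: the class, the field names, and the idea of reducing "trust" to three yes/no checks (access, preservation, stability) are hypothetical, and do not describe any particular repository software or Nottingham's actual procedure.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Copy the metadata locally and point the full-text link at the other repository.
    LINK_ONLY = "link to external full-text"
    # Keep a local copy of the full-text (exposed, or held in reserve).
    LOCAL_COPY = "hold local copy of full-text"


@dataclass
class ExternalRepository:
    # Hypothetical description of a previous repository's service level.
    name: str
    open_access: bool        # is the full-text openly accessible?
    preservation: bool       # is there long-term assurance of durability?
    stable_records: bool     # are records kept when authors leave?


def is_trusted(repo: ExternalRepository) -> bool:
    """Trusted means it matches our own terms on all three counts."""
    return repo.open_access and repo.preservation and repo.stable_records


def choose_action(repo: ExternalRepository) -> Action:
    """Link out when the service levels match; otherwise keep a local copy."""
    return Action.LINK_ONLY if is_trusted(repo) else Action.LOCAL_COPY


# Usage with two made-up examples.
previous_university = ExternalRepository("Previous University", True, True, True)
commercial_service = ExternalRepository("Commercial service", True, False, True)

for repo in (previous_university, commercial_service):
    print(repo.name, "->", choose_action(repo).value)
```

In practice the judgement of whether another repository is trusted would rest on published policies and shared standards rather than three flags, but the shape of the decision is the same.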
This kind of pragmatic approach may well mean that some duplicates (metadata record and/or full-text) get into the system by being held at more than one location. Duplication, or close-to-duplication, will have to become a non-issue. I cannot see that duplication can be completely avoided in future: it already happens. As such, handling close and exact duplicates is an issue we cannot avoid and must solve in some way as it inevitably arises. That is not to say that the publisher’s version will automatically become the “official” record in the way that it tends to be used now. We do not know how versions, variants, and dynamic developments of papers will be used and regarded by researchers: we are just at the start of a period of change in research communications. Therefore, if a process offers solutions and benefits, the associated risks of duplication are not sufficient to dismiss the process as impractical.
After all, what is the alternative? If as repository managers we start deleting records when folks leave and have to create/import/ask the academic for a complete set of their outputs when they arrive at a new institution, I think we, and significantly the users of open access research, will very quickly get into a situation where we lose track of what is where.
Even if we try to create policies or systems to replace an old link (to the now-deleted full-text) with a new link (to the full-text in the new repository), I cannot see this working seamlessly, and things will get lost. In addition, I think that subsequent moves by the author would create daisy chains of onward references, which would be very fragile.
While the use of repository references in citation of materials relates to research practice and so is for resolution between researchers rather than between ourselves, I don’t think we should deliberately disrupt longer-term references to material. Rather, I would see the system building on existing stable records, with all institutional repositories able to play a part in the system-wide provision of information as stable sources.
Therefore, I would suggest that repositories should continue to hold items by members of staff after those staff have left, as this helps the repository fulfil its role as an institutional asset and record. Repositories can accept an academic’s back-catalogue, even if it was not produced at the institution, as being anomalous but in line with our joint overall aim of providing access to research information. Adopting standard practices will help reassure each institution that other repositories can be trusted with access and curation, and will allow stable cross-linking. Once a repository has material openly accessible, then, given matching service levels, the whole system supports linking to that material, and additional duplicate copies are needed only where there is a specific local need. Overall, repositories can follow their institutional self-interest and still create a robust networked system.
Bill