Links between open data

In another move towards more open exploration of university data, Southampton University have recently released a site which allows experiment and mashup with some of their administrative data.  This follows Tim Berners-Lee’s ideas on Linked Data and presents RDF structured data. There is an interesting piece from The Register IT-blog on the initiative which links the approach to work with the Ordnance Survey.

This takes its place in a range of current experiments and acts as one pole of an approach using structured data. The other pole is exemplified by the previously reported competition to use the heterogeneous collection of information available through Mendeley. The tension between usage of large sets of information with basic (if any) metadata and far smaller restricted sets with structure that allows wider experiment exists as a basic question in information management. The debate will doubtless continue.


Data-mining and repositories

There has been discussion for a while about the limits of using material from typical repositories.  In the absence of formal user-licences, there is an implicit permission to read material – but what about data-mining?

A formal, cautious response is to take a lowest-common-denominator approach, where the rights to re-use material in a repository are taken as being those of the most restricted piece of content.

Material in a repository is normally pretty heterogeneous with respect to re-use rights.  And where the individual pieces do not have their rights associated with them, pieces with liberal rights cannot be told apart from pieces with restrictive rights.  Material that is truly Open Access is in a minority and will generally result from archiving true Open Access published materials, from BMC for example, where limited named rights are explicitly given to the publisher.  Many more OA published materials are actually restricted in re-use:  many OA publishers are anything but true OA. The corollary of true OA publishing is that all rights not granted to the publisher are retained by the author and then, presumably, licensed to the repository.

The majority of content is in a different situation from true OA material.  It will be in there as a result of copyright being transferred or exclusively licensed to a publisher, who has then granted back, or allowed retention of, nominated rights.  In such circumstances the author (and by extension the repository, see above) has only these nominated rights, and if data-mining or other forms of re-use are not mentioned explicitly then, strictly, no such right exists.  Some publishers explicitly exclude the right to data-mine the article and so, without being able to identify these, the lowest-common-denominator approach kicks in.
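For the technically minded, the lowest-common-denominator rule can be sketched as a set intersection. This is only an illustration: the rights labels ("read", "redistribute", "data-mine"), the function name and the sample holdings are all made up for the example, and treating an item with no recorded rights as read-only is an assumption based on the implicit permission to read described above.

```python
from functools import reduce

# Assumption: an item with no recorded rights carries only the implicit
# permission to read. The labels here are illustrative, not a standard vocabulary.
IMPLICIT_RIGHTS = frozenset({"read"})


def repository_rights(items):
    """Return the rights that can safely be assumed for the whole repository.

    `items` is an iterable of sets of rights, with None where an item has
    no rights information attached. The result is the intersection of the
    per-item rights, i.e. the rights of the most restricted piece.
    """
    per_item = [
        frozenset(rights) if rights is not None else IMPLICIT_RIGHTS
        for rights in items
    ]
    if not per_item:
        return frozenset()
    return reduce(frozenset.intersection, per_item)


# Hypothetical holdings: one true OA article, one article whose publisher
# granted back limited rights, and one with no rights recorded at all.
holdings = [
    {"read", "redistribute", "data-mine"},
    {"read", "redistribute"},
    None,
]

print(sorted(repository_rights(holdings)))  # ['read']
```

A single item without rights information is enough to pull the whole collection down to read-only, which is exactly why recording rights against each individual piece matters.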

The easiest solution for data-mining (and, it could be argued, for open access in general) is for blanket data-mining rights to be retained by funders, or for publicly funded research to be placed in the public domain as regards copyright, as is done in the States.

All this concern, of course, restricts the full potential of open access being realised: what assumptions can or should be made, and what liability, if any, should be risked in order to get at this potential?

Mendeley have one solution:  do it and see!  They have just announced a competition to mine the articles that authors have put on their Mendeley accounts.  It will be interesting to see how the rights issue will be handled: it may prove to be a model for others to follow.


Altruism is not Enough

Another interesting evening event organised by RIN last week in the series Research Information in Transition looked at the topics of data handling and data sharing.

I was not surprised to see that many of the issues we’ve identified as having a bearing on the take-up of Green/Gold open access also raised their heads in connection with data.

Andrew Young from Liverpool John Moores University talked about the challenges of persuading researchers to put their data into an institutional repository even when it was a condition of their grant. Policies, systems and guidance may all be in place but further incentives seem to be needed.

Carole Goble from Manchester University described designing systems to encourage the sharing of data between scientists working on the SysMO (Systems Biology of Microorganisms) project. Some were reluctant to share – among other reasons, because data sharing isn’t recognised by the academic reward system.

Kevin Ashley, Director of the Digital Curation Centre, addressed similar issues, stressing the need for interaction between policies and behaviours: policies on their own don’t have enough effect.

Our discussions with researchers about open access are leading us to reach the same conclusion. So the challenge for policy makers and funders is to enmesh the open sharing of research results with the attainment of academic prestige, promotion and kudos. Altruism is not enough.

Read more about managing and sharing data in the Scholarly Communications Action Handbook.

Open data

Data punch card

There has been a lot of talk about open data of late (see the Panton Principles, Kevin Smith’s blog post “What is Open Data”, and Michael Gurstein’s blog posts here and here). I’ve also recently read about numerous open data projects/initiatives through the Connotea open access tracking project (developed by Peter Suber), including the Hawaii Open Data Project, DataSF, and the World Bank, to name a few. Many of these projects involve governmental and organisational data, which seems to be transitioning to open with little trouble.

Data produced in the academic environment, by contrast, still has a way to go. Discussions at the Publishing primary research data breakout session at Science Online London demonstrated that academics still have concerns about how data sources are cited and how credit is given, and would like to see the publication of data recognised as an “academic contribution”, that is, equivalent to the publication of a journal article. Fortunately work is being done by JISC (Managing Research Data Programme) and the British Library (DataCite) to support academics here. It will be interesting to see how things continue to develop.

Image credit: Chris Campbell