Peter Desmet

Reporting on open science

What data license should GBIF use?

On August 5, the Global Biodiversity Information Facility (GBIF) released a consultation document[1] asking for feedback on applying a machine-readable license to all GBIF-mediated data. I am very happy GBIF is finally addressing this issue. The big question is, of course: what license(s)?

I faced the same question for the data shared through the Canadensys network, and I'm still immensely proud that most of the Canadensys participants released their data into the public domain under a CC0 waiver and rely on community norms, not legal instruments, to express how the data should be used. This makes the data truly usable.

This is the best and probably the only valid approach GBIF should take, as Jonathan Rees, Karen Cranston, Hilmar Lapp and Todd Vision expressed much more eloquently than I ever could in their response to GBIF on this issue. In taking this approach, Canadensys benefited from being a new, unified and small network, but that doesn't mean GBIF cannot take such a "more radical" approach (as GBIF puts it).

This is why I would suggest the following:

  1. Draft community norms for data use and publication. The Canadensys norms could be a starting point (and are actually available on GitHub for that purpose). This would immediately benefit data publishers who want to rely on GBIF community-approved norms with CC0, instead of using the Canadensys norms (e.g. this dataset from the INBO).
  2. Educate the GBIF community about licenses and open data. Widely recognized best practices (such as CC0 for data or a definition of open data) have only recently started to emerge from numerous domains, so it's no wonder that many data publishers don't care, don't know, or have misconceptions. Heck, I didn't know or care about this until I educated myself two years ago.
  3. Allow new datasets to be published only under CC0 and the community norms (the approach Dryad has taken). This ensures that at least new data are truly usable. Data publishers already have to agree to the data sharing agreement before publishing, so the mechanism to notify them is already in place.
  4. Communicate the truly open stance the GBIF community is taking to the publishers of existing datasets, instead of provisionally applying a license that might be based on a misconception. I don't know, though, whether there should be a hard deadline for this migration period, what options or restrictions to offer data publishers who do not want to move to open data, or whether GBIF can legally apply CC0 to a dataset whose publisher has not responded after numerous attempts to contact them.
  5. Promote standards and technologies that enable the effective tracking of data use (as suggested by Rees et al. above). GBIF is already working on this and should continue to do so.
  6. And on the specific question of supporting restrictions on commercial use (option 1 in the document): I would find it disheartening if effort and resources were put into creating and supporting an infrastructure that allows the use of an ill-defined and non-open data license.

Most of all, I think this is an excellent opportunity for the GBIF community to send a strong message that truly open data is the way to go. I know it will help me convince data publishers to publish their data as open biodiversity data.

  1. The consultation document was sent by email, but Roderic Page has put it online on Dropbox and Google Drive.