<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://peterdesmet.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://peterdesmet.com/" rel="alternate" type="text/html" /><updated>2026-05-19T07:41:21+00:00</updated><id>https://peterdesmet.com/feed.xml</id><title type="html">Peter Desmet</title><subtitle>Content and settings for my blog</subtitle><author><name>Peter Desmet</name></author><entry><title type="html">Support GBIF in choosing CC0 + norms</title><link href="https://peterdesmet.com/posts/gbif-data-license-2.html" rel="alternate" type="text/html" title="Support GBIF in choosing CC0 + norms" /><published>2014-05-05T00:00:00+00:00</published><updated>2014-05-05T00:00:00+00:00</updated><id>https://peterdesmet.com/posts/gbif-data-license-2</id><content type="html" xml:base="https://peterdesmet.com/posts/gbif-data-license-2.html"><![CDATA[<p>Last year, the Global Biodiversity Information Facility (<a href="http://www.gbif.org">GBIF</a>) asked what license(s) should be chosen for GBIF-mediated data. This triggered 32 responses (<a href="/posts/gbif-data-license.html">including my own</a>) with the following key messages:</p>

<blockquote>
  <ol>
    <li>The proposed copyright licensing model (e.g. CC licenses) may not be a suitable mechanism to control or restrict use, as the type of data published through GBIF falls in a legal ‘grey area’. These data may not in fact be eligible for copyright at all, at least in some jurisdictions. Deciding this point may also be complicated by the types of content offered, particularly when video, image or other multimedia content is included. GBIF were advised not to contribute to misunderstanding by promoting licenses which may not be legally actionable.</li>
    <li>There is a strong desire by some (about one third of the respondents) to attempt to restrict access for commercial use. However, as some discussed in their responses, it is very difficult to agree or define what constitutes commercial use.</li>
    <li>Many of the concerns expressed relate to the need for better attribution. It is suggested that, rather than asserting a copyright license, GBIF should focus with high urgency on improving citation and tracking data use.</li>
  </ol>
</blockquote>

<p>In response to this, GBIF has issued a <a href="http://www.gbif.org/newsroom/consultations#licensing">second consultation call</a>, with a set of proposed changes which are, in my opinion, a tremendous step forward<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. These include adopting <a href="https://creativecommons.org/publicdomain/zero/1.0/">CC0</a> for all data<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>, educating the community, drafting community norms, allowing publishers to at least express the wish that data should not be used for commercial gain, and building tools to facilitate citation, attribution, and data use tracking.</p>

<p>If you want to support GBIF in taking these necessary steps towards open biodiversity data, <a href="http://www.gbif.org/newsroom/consultations#licensing">please reply to the consultation</a>, even (or especially) if you are not part of the GBIF community. The deadline for replying is <strong>June 14</strong>.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>The proposed changes go even further than <a href="/posts/gbif-data-license.html">my wish list</a> and include a realistic proposal for the non-commercial use issue. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>VertNet now also strongly recommends CC0 for data in the recently published “<a href="http://www.vertnet.org/resources/datalicensingguide.html">Quick guide to copyright and licenses for dataset publication</a>”, in addition to <a href="http://community.canadensys.net/2012/why-we-should-publish-our-data-under-cc0">Canadensys</a> and others. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Peter Desmet</name></author><category term="open data" /><category term="GBIF" /><summary type="html"><![CDATA[Changes proposed by GBIF regarding data licensing are a tremendous step forward.]]></summary></entry><entry><title type="html">Analyzing the licenses of all 11,000+ GBIF registered datasets</title><link href="https://peterdesmet.com/posts/analyzing-gbif-data-licenses.html" rel="alternate" type="text/html" title="Analyzing the licenses of all 11,000+ GBIF registered datasets" /><published>2013-11-22T00:00:00+00:00</published><updated>2013-11-22T00:00:00+00:00</updated><id>https://peterdesmet.com/posts/analyzing-gbif-data-licenses</id><content type="html" xml:base="https://peterdesmet.com/posts/analyzing-gbif-data-licenses.html"><![CDATA[<!-- jquery included via Petridish -->
<script src="https://d3js.org/d3.v3.min.js"></script>

<script src="https://datafable.github.io/gbif-data-licenses/charts-for-blog/js/nv.d3.min.js"></script>

<script src="https://datafable.github.io/gbif-data-licenses/charts-for-blog/js/data.js"></script>

<script src="https://datafable.github.io/gbif-data-licenses/charts-for-blog/js/charts.js"></script>

<link href="https://datafable.github.io/gbif-data-licenses/charts-for-blog/css/nv.d3.min.css" rel="stylesheet" type="text/css" />

<style>
  .chart {
    display: block;
    height: 300px;
    width: 100%;
  }
  .chart .title {
      font-weight: bold;
  }
  .nvtooltip h3 {
      font-size: 1.2em;
  }
</style>

<p>In my <a href="/posts/illegal-bullfrogs.html">previous post</a>, I highlighted the legal issues of showing 13,297 American bullfrog records downloaded from <a href="http://www.gbif.org">GBIF</a> on a map. 96% of those records had no or a non-standard data license, making data use legally cumbersome.</p>

<p>But how much of this applies to all <a href="http://www.gbif.org/occurrence">417+ million occurrence records</a> in GBIF? How challenging is GBIF’s <a href="/posts/gbif-data-license.html">2014 mission to provide a machine readable, standard license</a> for all datasets? Fellow Datafable<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> member <a href="https://twitter.com/bartaelterman">Bart Aelterman</a> and I tried to find out.</p>

<h2 id="methodology">Methodology</h2>

<p>We used the <a href="http://www.gbif.org/developer/registry">GBIF registry API</a> to obtain the metadata for all <a href="http://www.gbif.org/dataset/">11,000+ GBIF registered datasets</a> and in particular the <code class="language-plaintext highlighter-rouge">rights</code> field, which is where data publishers can provide the license under which the dataset is published. We then created a <a href="https://github.com/datafable/gbif-data-licenses/blob/master/data/licenses.csv">unique list of all licenses</a> used, which we annotated with parameters such as <code class="language-plaintext highlighter-rouge">use allowed</code> and <code class="language-plaintext highlighter-rouge">attribution required</code>. This information was joined back with the dataset information to get an idea of the distribution of certain types of licenses over all datasets and occurrence records. We also documented the <a href="https://github.com/datafable/gbif-data-licenses/blob/master/guidelines.md">guidelines</a> we used for annotating these licenses.</p>
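<p>A minimal sketch of the first two steps (harvesting the metadata and building a unique list of rights statements) in Python. It assumes the current v1 registry endpoint <code class="language-plaintext highlighter-rouge">https://api.gbif.org/v1/dataset</code>, paged with <code class="language-plaintext highlighter-rouge">offset</code> and <code class="language-plaintext highlighter-rouge">limit</code> and returning <code class="language-plaintext highlighter-rouge">results</code> and <code class="language-plaintext highlighter-rouge">endOfRecords</code> in each page. The <code class="language-plaintext highlighter-rouge">rights</code> field name is taken from this post; current registry records may expose the license under a different field, so treat it as an assumption. The page fetcher is a parameter, so the paging logic can be tried without network access.</p>

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

# Assumption: the current GBIF registry API (v1); the original analysis may have used an earlier version.
API_URL = "https://api.gbif.org/v1/dataset"


def fetch_page(offset, limit):
    """Fetch one page of dataset metadata from the GBIF registry API."""
    query = urllib.parse.urlencode({"offset": offset, "limit": limit})
    with urllib.request.urlopen(API_URL + "?" + query) as response:
        return json.load(response)


def iter_datasets(get_page=fetch_page, limit=200):
    """Yield every dataset record, following offset-based paging."""
    offset = 0
    while True:
        page = get_page(offset, limit)
        results = page.get("results", [])
        yield from results
        # Stop on the last page, or if the server unexpectedly returns nothing
        if page.get("endOfRecords", True) or not results:
            break
        # Advance by what was actually returned, in case the server caps the limit
        offset += len(results)


def unique_rights(datasets, field="rights"):
    """Count how many datasets use each distinct rights statement.

    Missing or blank statements are grouped under "No license".
    """
    counts = Counter()
    for dataset in datasets:
        statement = (dataset.get(field) or "").strip()
        counts[statement or "No license"] += 1
    return counts
```

<p>Calling <code class="language-plaintext highlighter-rouge">unique_rights(iter_datasets()).most_common()</code> would query the live API and list the rights statements from most to least used, a convenient starting point for annotating them one by one.</p>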

<p>In total we analyzed <strong>11,974 datasets</strong><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>, representing <strong>415,927,654 occurrences</strong>. The first thing we noticed is that only 10% of those datasets (26% of the occurrences) have a license. This is problematic (see below), but it had the welcome side effect that we “only” had to <a href="https://github.com/datafable/gbif-data-licenses/blob/master/data/licenses.csv">annotate 432 different licenses</a>.</p>

<p>All code and data<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> for this project are available on <a href="https://github.com/datafable/gbif-data-licenses">GitHub</a>. <code class="language-plaintext highlighter-rouge">#openresearch #ftw</code></p>

<h2 id="results">Results</h2>

<h3 id="overview-of-the-licenses-used">Overview of the licenses used</h3>

<table>
  <thead>
    <tr>
      <th>License</th>
      <th style="text-align: right"># of datasets</th>
      <th style="text-align: right"># of records</th>
      <th style="text-align: right">% of records</th>
      <th><a href="https://docs.google.com/file/d/0B-PC5KKdhYCQZ1Y5Q2RySmdPbjQ/edit?usp=sharing">GBIF practice?</a></th>
      <th><a href="http://opendefinition.org/okd/">Open data?</a></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><a href="http://creativecommons.org/publicdomain/zero/1.0/">CC0</a></td>
      <td style="text-align: right">105</td>
      <td style="text-align: right">2,155,108</td>
      <td style="text-align: right">0.5%</td>
      <td>yes</td>
      <td>yes</td>
    </tr>
    <tr>
      <td><a href="http://creativecommons.org/licenses/by/3.0/">CC BY</a></td>
      <td style="text-align: right">8</td>
      <td style="text-align: right">2,240,674</td>
      <td style="text-align: right">0.5%</td>
      <td>yes</td>
      <td>yes</td>
    </tr>
    <tr>
      <td><a href="http://opendatacommons.org/licenses/by/1.0/">ODC-By</a></td>
      <td style="text-align: right">11</td>
      <td style="text-align: right">567,675</td>
      <td style="text-align: right">0.1%</td>
      <td>yes</td>
      <td>yes</td>
    </tr>
    <tr>
      <td><a href="http://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA</a></td>
      <td style="text-align: right">16</td>
      <td style="text-align: right">450,421</td>
      <td style="text-align: right">0.1%</td>
      <td>no</td>
      <td>yes</td>
    </tr>
    <tr>
      <td><a href="http://opendatacommons.org/licenses/odbl/1.0/">ODbL</a> &amp; <a href="http://opendatacommons.org/licenses/dbcl/1.0/">DbCL</a></td>
      <td style="text-align: right">3</td>
      <td style="text-align: right">864</td>
      <td style="text-align: right">0.0%</td>
      <td>no</td>
      <td>yes</td>
    </tr>
    <tr>
      <td><a href="http://creativecommons.org/licenses/by-nc/3.0/">CC BY-NC</a></td>
      <td style="text-align: right">10</td>
      <td style="text-align: right">4,308,627</td>
      <td style="text-align: right">1.0%</td>
      <td>expected by some</td>
      <td>no</td>
    </tr>
    <tr>
      <td><a href="http://creativecommons.org/licenses/by-nc-sa/3.0/">CC BY-NC-SA</a></td>
      <td style="text-align: right">17</td>
      <td style="text-align: right">569,040</td>
      <td style="text-align: right">0.1%</td>
      <td>no</td>
      <td>no</td>
    </tr>
    <tr>
      <td><a href="http://creativecommons.org/licenses/by-nc-nd/3.0/">CC BY-NC-ND</a></td>
      <td style="text-align: right">1</td>
      <td style="text-align: right">26,132</td>
      <td style="text-align: right">0.0%</td>
      <td>no</td>
      <td>no</td>
    </tr>
    <tr>
      <td>Non-standard license</td>
      <td style="text-align: right">1,069</td>
      <td style="text-align: right">100,062,731</td>
      <td style="text-align: right">24.1%</td>
      <td>?</td>
      <td>?</td>
    </tr>
    <tr>
      <td>No license</td>
      <td style="text-align: right">10,734</td>
      <td style="text-align: right">305,546,382</td>
      <td style="text-align: right">73.5%</td>
      <td>?</td>
      <td>?</td>
    </tr>
  </tbody>
</table>

<h3 id="standard-licenses">Standard licenses</h3>

<p>Ignoring for a moment that <a href="http://community.canadensys.net/2012/why-we-should-publish-our-data-under-cc0">CC0 is</a> <a href="http://blog.datadryad.org/2011/10/05/why-does-dryad-use-cc0/">the only</a> <a href="http://doi.org/10.6084/m9.figshare.799766">sensible license</a> <a href="/posts/gbif-data-license.html">for data</a>, a standard license (<a href="http://creativecommons.org/licenses/">Creative Commons</a> or <a href="http://opendatacommons.org/licenses/">Open Data Commons</a>) is at least standardized and easy to understand. Only 1.4% of all datasets however (2% of all occurrences) are published with a standard license.</p>

<div class="clearfix">
  <svg id="chart1" class="chart" style="float:left; width: 50%;"></svg>
  <svg id="chart2" class="chart" style="float:left; width: 50%;"></svg>
</div>

<p>Data dedicated to the public domain under <a href="http://creativecommons.org/publicdomain/zero/1.0/">CC0</a> represents an even smaller percentage: 0.9% of all datasets (0.5% of all occurrences). The silver lining is that most data publishers who choose a standard license choose CC0 (105 of 171 datasets).</p>
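<p>The shares above follow directly from the counts in the overview table. A quick check in Python, with the rows copied from that table (the dictionary layout is only for illustration):</p>

```python
# (datasets, occurrence records) per license, copied from the overview table
TABLE = {
    "CC0": (105, 2_155_108),
    "CC BY": (8, 2_240_674),
    "ODC-By": (11, 567_675),
    "CC BY-SA": (16, 450_421),
    "ODbL/DbCL": (3, 864),
    "CC BY-NC": (10, 4_308_627),
    "CC BY-NC-SA": (17, 569_040),
    "CC BY-NC-ND": (1, 26_132),
    "Non-standard license": (1_069, 100_062_731),
    "No license": (10_734, 305_546_382),
}
NOT_STANDARD = {"Non-standard license", "No license"}
DATASETS, RECORDS = 0, 1  # column indexes in each tuple

# Column totals: 11,974 datasets and 415,927,654 records
totals = [sum(row[col] for row in TABLE.values()) for col in (DATASETS, RECORDS)]


def share(licenses, column):
    """Percentage of all datasets or records that use one of the given licenses."""
    return 100 * sum(TABLE[name][column] for name in licenses) / totals[column]


standard = [name for name in TABLE if name not in NOT_STANDARD]
for label, names in (("Standard license", standard), ("CC0", ["CC0"])):
    print(f"{label}: {share(names, DATASETS):.1f}% of datasets, "
          f"{share(names, RECORDS):.1f}% of records")
```

<p>At one decimal, this prints 1.4% of datasets and 2.5% of records for standard licenses, and 0.9% and 0.5% for CC0.</p>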

<h3 id="interpreting-the-other-licenses">Interpreting the other licenses</h3>

<p>All other data are provided with no or a non-standard license, with a percentage similar to the <a href="/posts/illegal-bullfrogs.html">bullfrog sample</a> (98% vs 96% of the occurrences). These data are in a legal gray zone: it’s a mixture of legalese, norms, restrictions, agreements, or in most cases no information at all. It is up to every data user to figure out the details.</p>

<p>We tried to lift some of that burden by <a href="https://github.com/datafable/gbif-data-licenses/blob/master/data/licenses.csv">interpreting all these licenses</a> and extracting some of their characteristics, but this is only an attempt<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup> and should be used with caution. The results are presented in the charts below. You can click the legends to toggle parts of the charts.</p>

<h4 id="datasets">Datasets</h4>

<div><svg id="chart3" class="chart"></svg></div>

<h4 id="occurrences">Occurrences</h4>

<div><svg id="chart4" class="chart"></svg></div>
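<p>The two charts weight the same interpretations differently: once per dataset and once per occurrence, so a handful of very large datasets can dominate the second chart. A minimal Python sketch of that join-and-count step, assuming each dataset record carries its <code class="language-plaintext highlighter-rouge">rights</code> statement and an <code class="language-plaintext highlighter-rouge">occurrences</code> count, and that the annotations are a mapping from rights statement to characteristics. The flag names (<code class="language-plaintext highlighter-rouge">use_allowed</code>, <code class="language-plaintext highlighter-rouge">attribution_required</code>) and sample values are hypothetical, loosely modelled on the columns of our <a href="https://github.com/datafable/gbif-data-licenses/blob/master/data/licenses.csv">annotated license list</a>.</p>

```python
from collections import defaultdict


def tally(datasets, annotations, flag):
    """Count datasets and occurrences per value of one annotated license characteristic.

    datasets: iterable of dicts with a "rights" statement and an "occurrences" count.
    annotations: dict mapping a rights statement to its annotated characteristics.
    Statements that were not annotated (including missing rights) count as "unknown".
    """
    by_datasets = defaultdict(int)
    by_occurrences = defaultdict(int)
    for dataset in datasets:
        value = annotations.get(dataset.get("rights"), {}).get(flag, "unknown")
        by_datasets[value] += 1
        by_occurrences[value] += dataset.get("occurrences", 0)
    return dict(by_datasets), dict(by_occurrences)


# Hypothetical annotations and datasets, for illustration only
annotations = {
    "CC0": {"use_allowed": "yes", "attribution_required": "no"},
    "CC BY-NC": {"use_allowed": "non-commercial", "attribution_required": "yes"},
}
datasets = [
    {"rights": "CC0", "occurrences": 1_000},
    {"rights": "CC BY-NC", "occurrences": 50_000},
    {"rights": None, "occurrences": 200_000},
]
per_dataset, per_occurrence = tally(datasets, annotations, "use_allowed")
print(per_dataset)
print(per_occurrence)
```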

<h2 id="conclusion">Conclusion</h2>

<p>Our analysis of the licenses of all 11,000+ GBIF registered datasets paints a bleak picture. Very few GBIF registered datasets can be used easily and legally, let alone without restrictions. This is mainly due to data being published with no or a non-standard license.</p>

<p>Fixing this is crucial, and GBIF’s 2014 mission to provide a machine readable, standard license to all datasets is a step in the right direction. We hope our <a href="https://github.com/datafable/gbif-data-licenses">analysis</a> (which can be run again) and <a href="https://github.com/datafable/gbif-data-licenses/blob/master/guidelines.md">guidelines</a> already help with:</p>

<blockquote>
  <p>The Secretariat would review existing metadata provisionally to assign<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup> each current data set to one of these categories and would then communicate with data publishers to confirm the assignment. [<a href="https://docs.google.com/file/d/0B-PC5KKdhYCQZ1Y5Q2RySmdPbjQ/edit?usp=sharing">source</a>]</p>
</blockquote>

<p>More importantly, this mission should be used as <a href="/posts/gbif-data-license.html">an opportunity</a> to make the <code class="language-plaintext highlighter-rouge">rights</code> field mandatory, require CC0, and shift the discussion about ethical data use (including attribution) to norms rather than ill-suited legal tools.</p>
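<p>The provisional assignment the Secretariat describes could start from simple pattern matching on the free-text <code class="language-plaintext highlighter-rouge">rights</code> field, with every guess then confirmed by the data publisher. A deliberately crude Python sketch: the patterns and category names are hypothetical, cover only three licenses, and would misread or miss many of the 432 statements we annotated by hand.</p>

```python
import re

# Hypothetical patterns, tried in order; the first match wins.
# The negative lookahead (?![-\w]) stops "CC BY" from also matching
# "CC BY-SA" or "CC BY-NC-SA", which are left for manual review here.
PATTERNS = [
    ("CC0", re.compile(r"\bcc0\b|publicdomain/zero", re.IGNORECASE)),
    ("CC BY-NC", re.compile(r"\bcc[ -]by[ -]nc(?![-\w])|licenses/by-nc/", re.IGNORECASE)),
    ("CC BY", re.compile(r"\bcc[ -]by(?![-\w])|licenses/by/", re.IGNORECASE)),
]


def provisional_category(rights):
    """Guess a category for a free-text rights statement, pending confirmation by the publisher."""
    if not rights or not rights.strip():
        return "No license"
    for category, pattern in PATTERNS:
        if pattern.search(rights):
            return category
    return "Needs manual review"
```

<p>Anything that is not matched falls through to manual review, which is where free-text statements like the ones in our annotated list would end up.</p>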
<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>To combine our skills and organize some of our extracurricular activities, we started a team of open data enthusiasts called <a href="https://twitter.com/datafable">Datafable</a>. The results of our first project were <a href="http://www.gbif.org/page/2991">published by GBIF</a> last week. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>These include <a href="http://www.gbif.org/dataset/search?type=CHECKLIST">checklist</a> and <a href="http://www.gbif.org/dataset/search?type=OCCURRENCE">occurrence datasets</a>. Obviously, only occurrence datasets are represented in the results for occurrences. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>Additional legal issue: what license applies to the <strong>metadata</strong> of GBIF registered datasets? Can we publish even part of it on a GitHub repository? Note that metadata <em>does</em> include creative content, and some of it is even published as data papers. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>We considered an alternative interpretation, taking into account the <a href="http://www.gbif.org/disclaimer/datause">GBIF data use agreement</a> (DUA). <a href="https://twitter.com/jar346">Jonathan A. Rees</a> pointed out, however, that a DUA can only add restrictions or conditions, but never grant permissions (only copyright holders have the legal standing to do so). In other words, the GBIF DUA does not solve the situation of having no license: users still have to figure out the legal implications. See <a href="https://github.com/datafable/gbif-data-licenses/issues/12">this issue</a> for the whole discussion. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p>The characteristics we assigned to the licenses (<code class="language-plaintext highlighter-rouge">commercial use allowed</code>, <code class="language-plaintext highlighter-rouge">notification required</code>, etc.) could even be provided as machine tags on the GBIF portal, allowing users to already get some indication of what is allowed/required. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Peter Desmet</name></author><category term="open data" /><category term="open research" /><category term="GBIF" /><category term="Datafable" /><summary type="html"><![CDATA[How much GBIF mediated data can be legally used easily? A collaborative analysis.]]></summary></entry><entry><title type="html">Showing you this map of aggregated bullfrog occurrences would be illegal</title><link href="https://peterdesmet.com/posts/illegal-bullfrogs.html" rel="alternate" type="text/html" title="Showing you this map of aggregated bullfrog occurrences would be illegal" /><published>2013-10-17T00:00:00+00:00</published><updated>2013-10-17T00:00:00+00:00</updated><id>https://peterdesmet.com/posts/illegal-bullfrogs</id><content type="html" xml:base="https://peterdesmet.com/posts/illegal-bullfrogs.html"><![CDATA[<p>Last week, the Global Biodiversity Information Facility (GBIF) launched their <a href="http://www.gbif.org/">new awesome data portal</a>. One of the things I like most is that the record limit on downloads has been lifted, so we now have free and open access to all 415+ million occurrence records GBIF aggregates. GBIF also makes an effort to lower the barrier to correctly attribute the data publishers, by providing extensive metadata and a citation suggestion in each data download.</p>

<p>That doesn’t mean however that it is actually easy to legally use the data, <a href="/posts/gbif-data-license.html">something GBIF is aware of</a>. As a test, I downloaded all <a href="http://www.gbif.org/occurrence/search?TAXON_KEY=2427091&amp;HAS_COORDINATE=true&amp;HAS_GEOSPATIAL_ISSUE=false">13,297 georeferenced American bullfrog records</a> and would like to visualize and share these on a map using <a href="http://cartodb.com">CartoDB</a>. Technically, this would only take me a few minutes, but to make sure I’m not violating any restrictions, I need to take a closer look at the fine print.</p>

<p><img src="/assets/images/2013-10-17-illegal-bullfrogs-map-unavailable.png" alt="Unavailable bullfrog records" /></p>

<h2 id="65-data-licenses">65 data licenses</h2>

<p>By downloading the data from GBIF, I agree with the <a href="http://www.gbif.org/disclaimer/datause">data use agreement</a>, which states among other things:</p>

<blockquote>
  <ul>
    <li>Users must comply with additional terms and conditions of use set by the Data Publisher. Where these exist they will be available through the metadata associated with the data.</li>
  </ul>
</blockquote>

<p>Indeed, GBIF includes metadata for each dataset included in my download, and a file with all the rights as supplied by the data publishers (<code class="language-plaintext highlighter-rouge">rights.txt</code>). Since my download aggregates records from 65 data publishers, I have to read and understand 65 license statements before I can use the data<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
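<p>The manual matching described in footnote 1 amounts to a join on <code class="language-plaintext highlighter-rouge">dataset_id</code>. A minimal Python sketch, assuming a tab-delimited occurrence file with a <code class="language-plaintext highlighter-rouge">dataset_id</code> column and a dictionary from dataset id to rights statement (parsing <code class="language-plaintext highlighter-rouge">rights.txt</code> into that dictionary is left out, since its exact layout is not shown here). The <code class="language-plaintext highlighter-rouge">gbif_id</code> column and the sample values are hypothetical.</p>

```python
import csv
import io
from collections import Counter


def rights_per_record(occurrence_file, rights_by_dataset):
    """Yield occurrence records with the rights statement of their dataset attached.

    Records whose dataset supplied no rights statement get "Not supplied".
    """
    for record in csv.DictReader(occurrence_file, delimiter="\t"):
        record["rights"] = rights_by_dataset.get(record["dataset_id"]) or "Not supplied"
        yield record


def records_per_license(records):
    """Count occurrence records per rights statement."""
    return Counter(record["rights"] for record in records)


# Hypothetical two-dataset sample, for illustration only
sample = io.StringIO("gbif_id\tdataset_id\n1\tA\n2\tA\n3\tB\n")
counts = records_per_license(rights_per_record(sample, {"A": "CC0"}))
print(counts)
```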

<h3 id="standard-data-licenses">Standard data licenses</h3>

<p>Four datasets are provided with a standard data license (e.g. <a href="http://creativecommons.org/licenses/">Creative Commons</a>) and are thus easy to understand:</p>

<table>
  <thead>
    <tr>
      <th>License</th>
      <th style="text-align: right"># of datasets</th>
      <th style="text-align: right"># of records</th>
      <th style="text-align: right">% of records</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><a href="http://creativecommons.org/publicdomain/zero/1.0/">CC0</a></td>
      <td style="text-align: right">2 (<a href="http://www.gbif.org/dataset/8c201186-d997-4b65-aac9-2fcf442a93f6">1</a> &amp; <a href="http://www.gbif.org/dataset/cc28549b-467f-448c-875e-881ca507aba8">1</a>)</td>
      <td style="text-align: right">543</td>
      <td style="text-align: right">4%</td>
    </tr>
    <tr>
      <td><a href="http://creativecommons.org/licenses/by-sa/3.0/">CC-BY-SA</a></td>
      <td style="text-align: right"><a href="http://www.gbif.org/dataset/b70121ef-b7ea-4316-a05b-abdf30f5ca09">1</a></td>
      <td style="text-align: right">4</td>
      <td style="text-align: right">0%</td>
    </tr>
    <tr>
      <td><a href="http://creativecommons.org/licenses/by-nc-sa/3.0/">CC-BY-NC-SA</a></td>
      <td style="text-align: right"><a href="http://www.gbif.org/dataset/94dce9c1-e2f0-45cb-a77b-8e5caa871a41">1</a></td>
      <td style="text-align: right">1</td>
      <td style="text-align: right">0%</td>
    </tr>
    <tr>
      <td>Non-standard license</td>
      <td style="text-align: right">61</td>
      <td style="text-align: right">12,749</td>
      <td style="text-align: right">96%</td>
    </tr>
  </tbody>
</table>

<h3 id="interpreting-the-other-licenses">Interpreting the other licenses</h3>

<p>I am entering unknown legal territory by interpreting the non-standard licenses, but since I would like to create an occurrence map with more than 4% of the data, I’ll try anyway.</p>

<p>24 datasets don’t supply rights, so I can either: 1) interpret this as being free to use these data under the general GBIF data use agreement, or 2) not use these data, so as not to risk violating any applicable copyright or database rights.</p>

<p>I interpreted<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> other statements as <em>Non-commercial use</em> (1 dataset, 22 records), <em>Non-commercial use with attribution</em> (5 datasets, 3,190 records), <em>Public domain</em> (1 dataset, 1 record) or <em>All rights reserved</em> (1 dataset: I’m looking at you, <a href="http://www.gbif.org/dataset/8138eb72-f762-11e1-a439-00145eb45e9a">Royal Belgian Institute for Natural Sciences</a>, 1 record). For 3 datasets (373 records) it was unclear to me what the license allowed.</p>

<p>The bulk of the data however (26 datasets, 6,894 records) have a license of the form (emphasis mine):</p>

<blockquote>
  <p>[Institution A] data records may be used by individual researchers or research groups, but <strong>they may not be repackaged, resold, or redistributed in any form without the express written consent</strong> of a curatorial staff member of [Institution A]. If any of these records are used in an analysis or report, the provenance of the original data must be acknowledged and [Institution A] notified. [Institution A] and its staff are not responsible for damages, injury or loss due to the use of these data.</p>
</blockquote>

<p>… or something along the same lines, which I interpreted as <em>Some use with attribution, no redistribution</em>.</p>

<h3 id="overview-of-the-licenses-used">Overview of the licenses used</h3>

<table>
  <thead>
    <tr>
      <th>License</th>
      <th style="text-align: right"># of datasets</th>
      <th style="text-align: right"># of records</th>
      <th style="text-align: right">% of records</th>
      <th><a href="https://docs.google.com/file/d/0B-PC5KKdhYCQZ1Y5Q2RySmdPbjQ/edit?usp=sharing">GBIF practice?</a></th>
      <th><a href="http://opendefinition.org/okd/">Open data?</a></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Public domain (incl. CC0)</td>
      <td style="text-align: right">3</td>
      <td style="text-align: right">544</td>
      <td style="text-align: right">4%</td>
      <td>yes</td>
      <td>yes</td>
    </tr>
    <tr>
      <td>Use with attribution and share alike (incl. CC-BY-SA)</td>
      <td style="text-align: right">1</td>
      <td style="text-align: right">4</td>
      <td style="text-align: right">0%</td>
      <td>no</td>
      <td>yes</td>
    </tr>
    <tr>
      <td>Non-commercial use</td>
      <td style="text-align: right">1</td>
      <td style="text-align: right">22</td>
      <td style="text-align: right">0%</td>
      <td>expected by some</td>
      <td>no</td>
    </tr>
    <tr>
      <td>Non-commercial use with attribution (similar to <a href="http://creativecommons.org/licenses/by-nc/3.0/">CC-BY-NC</a>)</td>
      <td style="text-align: right">5</td>
      <td style="text-align: right">3,190</td>
      <td style="text-align: right">24%</td>
      <td>expected by some</td>
      <td>no</td>
    </tr>
    <tr>
      <td>Non-commercial use with attribution and share alike (incl. CC-BY-NC-SA)</td>
      <td style="text-align: right">1</td>
      <td style="text-align: right">1</td>
      <td style="text-align: right">0%</td>
      <td>no</td>
      <td>no</td>
    </tr>
    <tr>
      <td>Some use with attribution, no redistribution</td>
      <td style="text-align: right">26</td>
      <td style="text-align: right">6,894</td>
      <td style="text-align: right">52%</td>
      <td>no</td>
      <td>no</td>
    </tr>
    <tr>
      <td>All rights reserved</td>
      <td style="text-align: right">1</td>
      <td style="text-align: right">1</td>
      <td style="text-align: right">0%</td>
      <td>no</td>
      <td>no</td>
    </tr>
    <tr>
      <td>Unclear</td>
      <td style="text-align: right">3</td>
      <td style="text-align: right">373</td>
      <td style="text-align: right">3%</td>
      <td>?</td>
      <td>?</td>
    </tr>
    <tr>
      <td>Not supplied</td>
      <td style="text-align: right">24</td>
      <td style="text-align: right">2,268</td>
      <td style="text-align: right">17%</td>
      <td>?</td>
      <td>?</td>
    </tr>
  </tbody>
</table>

<p>These results are quite bleak: only 4 data publishers (4% of the data) publish their bullfrog occurrences as open data and only 8 (28% of the data) publish their data compatible with practices or expectations in GBIF today. This doesn’t even address if the chosen license makes actual sense for data (<a href="/posts/gbif-data-license.html">see my previous blog post</a>).</p>
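<p>These summary figures can be recomputed from the overview table. A small Python sketch, with the rows copied from that table (the tuple layout is only for illustration): “open data” keeps the rows marked <em>yes</em> under <em>Open data?</em>, and “GBIF practice” keeps the rows marked <em>yes</em> or <em>expected by some</em>. Note that the table counts datasets rather than data publishers, so the second selection returns 9 datasets where the text counts 8 publishers.</p>

```python
# (datasets, records, GBIF practice?, open data?) per category, from the overview table
ROWS = {
    "Public domain (incl. CC0)": (3, 544, "yes", "yes"),
    "Use with attribution and share alike": (1, 4, "no", "yes"),
    "Non-commercial use": (1, 22, "expected by some", "no"),
    "Non-commercial use with attribution": (5, 3_190, "expected by some", "no"),
    "Non-commercial use with attribution and share alike": (1, 1, "no", "no"),
    "Some use with attribution, no redistribution": (26, 6_894, "no", "no"),
    "All rights reserved": (1, 1, "no", "no"),
    "Unclear": (3, 373, "?", "?"),
    "Not supplied": (24, 2_268, "?", "?"),
}
TOTAL_RECORDS = sum(row[1] for row in ROWS.values())  # 13,297


def summarize(keep):
    """Return (datasets, records, % of records) for the rows where keep(row) is true."""
    selected = [row for row in ROWS.values() if keep(row)]
    datasets = sum(row[0] for row in selected)
    records = sum(row[1] for row in selected)
    return datasets, records, 100 * records / TOTAL_RECORDS


open_data = summarize(lambda row: row[3] == "yes")
gbif_practice = summarize(lambda row: row[2] in ("yes", "expected by some"))
print(open_data)
print(gbif_practice)
```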

<h2 id="what-maps-can-i-show-you">What maps can I show you?</h2>

<h3 id="public-records">Public records</h3>

<p><img src="/assets/images/2013-10-17-illegal-bullfrogs-map-public.png" alt="Public bullfrog records" /></p>

<p>This map shows all <strong>544 records dedicated to the public domain (4%)</strong>. To comply with the GBIF data use agreement, I must publicly acknowledge the data publishers whose biodiversity data I have used here. I am happy to highlight such open datasets:</p>

<ul>
  <li><a href="http://www.gbif.org/dataset/8c201186-d997-4b65-aac9-2fcf442a93f6">Herpetology Collection - Royal Ontario Museum</a></li>
  <li><a href="http://www.gbif.org/dataset/cc28549b-467f-448c-875e-881ca507aba8">Colección de anfibios - Museo de Herpetología de la Universidad de Antioquia</a></li>
  <li><a href="http://www.gbif.org/dataset/635e4476-f762-11e1-a439-00145eb45e9a">Ministerio de Medio Ambiente, y Medio Rural y Marino. Dirección General de Medio Natural y Política Forestal. Inventario Nacional de Biodiversidad 2007, Anfibios</a>, though it would be useful if a standard license was used instead of <code class="language-plaintext highlighter-rouge">Público</code>.</li>
</ul>

<h3 id="public-and-non-commercial-use-records">Public and non-commercial use records</h3>

<p>Since my blog is not ad-supported, I can also include records with a non-commercial use restriction, which adds up to <strong>3,756 records (28%)</strong>. Beware if you want to repost this image.</p>

<p><img src="/assets/images/2013-10-17-illegal-bullfrogs-map-non-commercial.png" alt="Public and non-commercial bullfrog records" /></p>

<p>The included data publishers are now:</p>

<blockquote>
  <p>Museum of Vertebrate Zoology, Ministry of Agriculture, Food and Environment, Cornell Lab of Ornithology, Bird Studies Canada, Ohio State University Insect Collection, National Museum of Natural History, Smithsonian Institution, Royal Ontario Museum &amp; Universidad de Antioquia</p>
</blockquote>
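<p>Once every record carries its interpreted license category, selecting the records for this map and the publishers to acknowledge is a simple filter. A minimal Python sketch, assuming hypothetical <code class="language-plaintext highlighter-rouge">category</code> and <code class="language-plaintext highlighter-rouge">publisher</code> fields per record; the allowed categories are the three from the overview table that make up the 3,756 records above, with the share-alike categories left out.</p>

```python
# License categories from the overview table used for the non-commercial map;
# share-alike categories are left out because they are incompatible with the other data.
ALLOWED = {
    "Public domain (incl. CC0)",
    "Non-commercial use",
    "Non-commercial use with attribution",
}


def map_selection(records, allowed=ALLOWED):
    """Return the records that may be mapped and the sorted publishers to acknowledge."""
    kept = [record for record in records if record["category"] in allowed]
    publishers = sorted({record["publisher"] for record in kept})
    return kept, publishers


# Hypothetical records, for illustration only
records = [
    {"category": "Public domain (incl. CC0)", "publisher": "Publisher B"},
    {"category": "Non-commercial use with attribution", "publisher": "Publisher A"},
    {"category": "Some use with attribution, no redistribution", "publisher": "Institution A"},
]
kept, publishers = map_selection(records)
print(len(kept), publishers)
```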

<h3 id="all-records">All records</h3>

<p>I cannot show you <a href="http://www.gbif.org/species/2427091">a map</a><sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> including the other 72% of the records, partly because some licenses are incompatible with the other data (such as share alike), but mainly because I first need to contact 52 institutions to either get some clarification of their license or to receive written consent that I can actually repackage the data as a map. Frankly, that seems quite a hassle for a simple map. And even if I did this, you would have to do the same if you want to use, repackage or redistribute that map as well.</p>

<h2 id="conclusion">Conclusion</h2>

<p>I just used a small sample of aggregated data<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>, but I hope it demonstrates the unnecessary legal burden that is put on users of the data (note that I haven’t even started assessing the data quality or fit for my use). This will either result in less use of the data, or - and I think this is what happens most often - users ignoring the fine print. In both cases, it shows what <a href="/posts/gbif-data-license.html">I</a> and <a href="http://doi.org/10.6084/m9.figshare.799766">others</a> have written before:</p>

<blockquote>
  <p>A legal license is not the correct tool to enforce or communicate expected data use.</p>
</blockquote>

<p>Biodiversity data should be dedicated to the <a href="http://creativecommons.org/publicdomain/zero/1.0/">public domain</a>. Its ethical use should be communicated via community norms.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>Sadly, these licenses are not attached to the data itself (maybe to prevent file size bloat), so I had to manually match (with some help from <a href="http://openrefine.org/">OpenRefine</a>) the metadata with the data using <code class="language-plaintext highlighter-rouge">dataset_id</code> in order to have the license per record. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>I would like to show you how I interpreted the licenses by posting the data on GitHub, but that would violate some licenses. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>So why can GBIF show you the occurrence map that I can’t? I believe it is because of this clause in <a href="http://www.gbif.org/species/2427091">their data sharing agreement</a>: <em>GBIF Secretariat may cache a copy and serve full or partial data further to other users together with the terms and conditions for use set by the Data Publisher.</em> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>I would love to redo this exercise for all 415+ million occurrences, but that is beyond the scope of what I can do alone, in my free time. Collaborative GitHub project, anyone? <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Peter Desmet</name></author><category term="open data" /><category term="GBIF" /><summary type="html"><![CDATA[Non-standard data licenses are a burden for users of aggregated GBIF data.]]></summary></entry><entry><title type="html">What data license should GBIF use?</title><link href="https://peterdesmet.com/posts/gbif-data-license.html" rel="alternate" type="text/html" title="What data license should GBIF use?" /><published>2013-09-17T00:00:00+00:00</published><updated>2013-09-17T00:00:00+00:00</updated><id>https://peterdesmet.com/posts/gbif-data-license</id><content type="html" xml:base="https://peterdesmet.com/posts/gbif-data-license.html"><![CDATA[<p>On August 5, the Global Biodiversity Information Facility (<a href="http://www.gbif.org">GBIF</a>) released a consultation document<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> asking for feedback on applying a machine-readable license to all GBIF-mediated data. I am very happy GBIF is finally addressing this issue. The big question is, of course: what license(s)?</p>

<p>I faced the same question for the data shared through the <a href="http://www.canadensys.net">Canadensys</a> network, and I’m still immensely proud that most of the Canadensys participants <a href="http://community.canadensys.net/2012/why-we-should-publish-our-data-under-cc0">released their data into the public domain</a> under a <a href="http://creativecommons.org/publicdomain/zero/1.0/">CC0</a> waiver and rely on <a href="http://community.canadensys.net/norms">community norms</a>, not legal instruments, to express how the data should be used. This makes the data truly usable.</p>

<p>This is the best, and probably the only valid, approach for GBIF to take. Jonathan Rees, Karen Cranston, Hilmar Lapp and Todd Vision expressed this much more eloquently than I ever could in <a href="http://doi.org/10.6084/m9.figshare.799766">their response to GBIF on this issue</a>. In taking this approach, Canadensys benefitted from being a new, unified and small network, but that doesn’t mean GBIF cannot take such a “more radical” approach (as GBIF puts it).</p>

<p>This is why I would suggest the following:</p>

<ol>
  <li>Draft community norms for data use and publication. The <a href="http://community.canadensys.net/norms">Canadensys norms</a> could be a starting point (and are actually <a href="https://github.com/Canadensys/canadensys-norms">available on GitHub</a> for that purpose). This would immediately benefit data publishers that want to combine CC0 with norms approved by the GBIF community, instead of using the Canadensys norms (e.g. <a href="https://ipt.inbo.be/depletion-fishing-nete-occurrences">this dataset</a> from the INBO).</li>
  <li>Educate the GBIF community about licenses and open data. Widely recognized best practices (such as CC0 for data or <a href="http://opendefinition.org/okd/">a definition of open data</a>) have only recently started to emerge from numerous domains. It’s no wonder that many data publishers don’t care, don’t know or have misconceptions. Heck, I didn’t know or care about this until I educated myself 2 years ago.</li>
  <li>Only allow new datasets to be published under CC0 and the community norms (this is the approach <a href="http://blog.datadryad.org/2011/10/05/why-does-dryad-use-cc0/">Dryad</a> has taken). This ensures that at least new data are truly usable. Data publishers currently have to agree to the <a href="http://data.gbif.org/tutorial/datasharingagreement">data sharing agreement</a> before publishing, so the mechanism to notify them is already there.</li>
  <li>Communicate the truly open stance the GBIF community is taking to data publishers of existing datasets, instead of provisionally applying a license that might be based on a misconception. There are open questions I can’t answer, though. Should there be a hard deadline for this migration period? What options or restrictions should be offered to data publishers that do not want to move to open data? And can GBIF legally apply CC0 to a dataset after numerous unsuccessful attempts to contact its data publisher?</li>
  <li>Promote standards and technologies that enable the effective tracking of data use (from Rees et al. above). GBIF is already working on this and should continue to do so.</li>
  <li>And on the specific question regarding supporting restrictions on commercial use (option 1 in the document): I would find it disheartening if effort and resources were put into creating and supporting an infrastructure that would allow the use of an ill-defined and non-open data license.</li>
</ol>

<p>Most of all, I think this is an excellent opportunity for the GBIF community to send a strong message that truly open data is the way to go. I know it will help me convince data publishers to publish their data as open biodiversity data.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>The consultation document was sent by email, but has been put online by <a href="http://iphylo.blogspot.be/2013/08/gbif-and-open-biodiversity-data-what.html">Roderic Page</a> on <a href="https://docs.google.com/file/d/0B-PC5KKdhYCQZ1Y5Q2RySmdPbjQ/edit?usp=sharing">Google Drive</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Peter Desmet</name></author><category term="open data" /><category term="GBIF" /><summary type="html"><![CDATA[From free and open access to biodiversity data to open biodiversity data.]]></summary></entry><entry><title type="html">Finding open access research articles</title><link href="https://peterdesmet.com/posts/oa-3.html" rel="alternate" type="text/html" title="Finding open access research articles" /><published>2013-08-10T00:00:00+00:00</published><updated>2013-08-10T00:00:00+00:00</updated><id>https://peterdesmet.com/posts/oa-3</id><content type="html" xml:base="https://peterdesmet.com/posts/oa-3.html"><![CDATA[<p><em>Note: This post is a response to <a href="https://p2pu.org/en/courses/5/content/367/">this task</a> of an <a href="https://p2pu.org/en/courses/5/open-science-an-introduction/">online course on open science</a> I am following.</em></p>

<p>As an exercise, I searched for open access (OA) research articles published after <code class="language-plaintext highlighter-rouge">2008</code> and related to <code class="language-plaintext highlighter-rouge">"GBIF" AND "data quality"</code>.</p>

<h2 id="search-tools">Search tools</h2>

<p>I tried four search tools:</p>

<table>
  <thead>
    <tr>
      <th>Search tool</th>
      <th>Results</th>
      <th>OA filter</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><a href="http://scholar.google.com/scholar?as_q=gbif+%22data+quality%22&amp;as_occt=any&amp;as_ylo=2009&amp;as_yhi=2013">Google Scholar</a></td>
      <td>332</td>
      <td>no<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></td>
    </tr>
    <tr>
      <td><a href="http://www.sciencedirect.com/science?_ob=ArticleListURL&amp;_method=list&amp;_ArticleListID=-329228712&amp;_sort=r&amp;_st=4&amp;_acct=C000059224&amp;_version=1&amp;_urlVersion=0&amp;_userid=2932513&amp;md5=1ef663bb9bac18a3eb8260442c743c30&amp;searchtype=a">ScienceDirect</a></td>
      <td>100</td>
      <td>no<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></td>
    </tr>
    <tr>
      <td><a href="http://www.ncbi.nlm.nih.gov/pmc/?term=GBIF+AND+data+quality+AND+%222009%22%5BPublication+Date%5D+%3A+%223000%22%5BPublication+Date%5D+AND+%22open+access%22%5BFilter%5D">PubMed Central</a></td>
      <td>148</td>
      <td><a href="http://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/">yes</a></td>
    </tr>
    <tr>
      <td><a href="http://www.doaj.org/doaj?func=advancedSearch&amp;uiLanguage=en&amp;fromWeb=1&amp;first=1&amp;query1=GBIF&amp;field1=all&amp;bool1=AND&amp;query2=data+quality&amp;field2=all&amp;pubYear=rangeYears&amp;fromYear=2009&amp;toYear=2013">DOAJ</a></td>
      <td>5</td>
      <td>OA only</td>
    </tr>
  </tbody>
</table>
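<p>The PubMed Central link in the table packs the whole query, including the date range and the open access filter, into a single <code class="language-plaintext highlighter-rouge">term</code> parameter. Below is a minimal Python sketch that rebuilds that link. It assumes the field tags <code class="language-plaintext highlighter-rouge">[Publication Date]</code> and <code class="language-plaintext highlighter-rouge">[Filter]</code> behave exactly as they do in the original link, and it reuses <code class="language-plaintext highlighter-rouge">3000</code> as the open-ended upper year from that link. The helper name <code class="language-plaintext highlighter-rouge">pmc_search_url</code> is hypothetical.</p>

```python
from urllib.parse import urlencode

# Base URL of the PubMed Central search page, taken from the link in the table.
PMC_SEARCH = "http://www.ncbi.nlm.nih.gov/pmc/"


def pmc_search_url(keywords, from_year, to_year="3000", open_access_only=True):
    """Build a PMC search URL (hypothetical helper) for keywords within a year range."""
    # The date range syntax is copied from the original link.
    date_range = f'"{from_year}"[Publication Date] : "{to_year}"[Publication Date]'
    clauses = [keywords, date_range]
    if open_access_only:
        # Restrict results to the open access subset, as in the original link.
        clauses.append('"open access"[Filter]')
    term = " AND ".join(clauses)
    # urlencode escapes quotes, brackets and colons, and turns spaces into '+'.
    return PMC_SEARCH + "?" + urlencode({"term": term})


# "Published after 2008" becomes a range starting in 2009.
print(pmc_search_url("GBIF AND data quality", 2009))
```

<p>With these arguments, the function should return the same URL as the PubMed Central link in the table. Passing <code class="language-plaintext highlighter-rouge">open_access_only=False</code> simply leaves out the filter clause.</p>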

<p>I found it surprising that not all of these offer the option to filter for <code class="language-plaintext highlighter-rouge">open access only</code>. I am curious to know whether this is mostly because of technical or other limitations.</p>

<h2 id="finding-the-license">Finding the license</h2>

<p>For three articles, I dug a little deeper to find the license. Two of these use, and clearly indicate, the <a href="http://creativecommons.org/licenses/by/3.0/">Creative Commons Attribution</a> license (coincidentally, these two were not found by ScienceDirect):</p>

<blockquote>
  <p>Hill A.W., Guralnick R., Flemons P., Beaman R., Wieczorek J., Ranipeta A., Chavan V., &amp; Remsen D. (2009). Location, location, location: utilizing pipelines and services to more effectively georeference the world’s biodiversity data. BMC Bioinformatics, 10(Suppl 14), S3. doi: <a href="https://doi.org/10.1186/1471-2105-10-S14-S3">10.1186/1471-2105-10-S14-S3</a></p>
</blockquote>

<blockquote>
  <p>Belbin L., Daly J., Hirsch T., Hobern D., &amp; La Salle J. (2013). A specialist’s audit of aggregated occurrence records: An ‘aggregator’s’ perspective. ZooKeys (305), 67-76. doi: <a href="https://doi.org/10.3897/zookeys.305.5438">10.3897/zookeys.305.5438</a></p>
</blockquote>

<p>The other one:</p>

<blockquote>
  <p>Costello M.J., Michener W.K., Gahegan M., Zhang Z., &amp; Bourne P.E. (2013). Biodiversity data should be published, cited, and peer reviewed. Trends in Ecology &amp; Evolution, 28(8), 454-461. doi: <a href="https://doi.org/10.1016/j.tree.2013.05.002">10.1016/j.tree.2013.05.002</a></p>
</blockquote>

<p>… (which was found by neither PubMed Central nor DOAJ) can be downloaded as a PDF (= OA). But if you actually want to do something with the content, you have to <a href="https://s100.copyright.com/AppDispatchServlet?publisherName=ELS&amp;contentID=S0169534713001092&amp;orderBeanReset=true">indicate in a very detailed manner</a> what you want to (re)use the article for. Very few use cases seem to be free. In other words: yuck!</p>

<p>I am quite familiar with the <a href="http://creativecommons.org/licenses/">Creative Commons licenses</a>, but now I realize what a mess you get if those aren’t applied. This OA article also demonstrates that the term “open access” doesn’t tell you that much: there’s still a whole range of <a href="/posts/oa-2.html">how open something really is</a>.</p>
<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>Although <a href="http://scholar.google.com/">Google Scholar</a> does not provide an OA filter, it clearly indicates which search results can be accessed for free, by providing a link on the right-hand side (which includes the provider and format). <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="http://www.sciencedirect.com/">ScienceDirect</a> also indicates which search results have <code class="language-plaintext highlighter-rouge">Full-text available</code>. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Peter Desmet</name></author><category term="open science course" /><category term="open access" /><summary type="html"><![CDATA[Not all search tools are equal. Also, the advantage of standard licenses.]]></summary></entry><entry><title type="html">How open is my paper?</title><link href="https://peterdesmet.com/posts/oa-2.html" rel="alternate" type="text/html" title="How open is my paper?" /><published>2013-08-08T00:00:00+00:00</published><updated>2013-08-08T00:00:00+00:00</updated><id>https://peterdesmet.com/posts/oa-2</id><content type="html" xml:base="https://peterdesmet.com/posts/oa-2.html"><![CDATA[<p><em>Note: This post is a response to <a href="https://p2pu.org/en/courses/5/content/366/">this task</a> of an <a href="https://p2pu.org/en/courses/5/open-science-an-introduction/">online course on open science</a> I am following.</em></p>

<p>In my <a href="/posts/oa-1.html">previous post</a>, I explained what open access is. But we can go beyond the simple question of “Is it open access?” and evaluate how open a resource actually is.</p>

<p>PLOS, SPARC and OASPA developed a <strong><a href="https://www.plos.org/how-open-is-it">2-page guide</a></strong> to do just that. I will use it to assess the openness of <a href="https://doi.org/10.3897/phytokeys.25.3100">my first paper</a> (co-written with Luc Brouillet), which was published two weeks ago in <a href="https://phytokeys.pensoft.net/">PhytoKeys</a>, a <a href="http://www.pensoft.net/">Pensoft</a> journal. Not as a vanity project, but to figure out how well I, as an advocate for openness, did by choosing PhytoKeys as the journal for my paper.</p>

<blockquote>
  <p>Desmet P, Brouillet L (2013) Database of Vascular Plants of Canada (VASCAN): a community contributed taxonomic checklist of all vascular plants of Canada, Saint Pierre and Miquelon, and Greenland. PhytoKeys 25: 55–67. doi: <a href="https://doi.org/10.3897/phytokeys.25.3100">10.3897/phytokeys.25.3100</a> GBIF key: <a href="http://www.gbif.org/dataset/3f8a1297-3259-4700-91fc-acc4170b27ce">3f8a1297-3259-4700-91fc-acc4170b27ce</a></p>
</blockquote>

<h2 id="reader-rights">Reader rights</h2>

<blockquote>
  <p>This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge. (<a href="http://www.pensoft.net/journals/phytokeys/about/Open%20Access%20Policy#Open Access Policy">source</a>)</p>
</blockquote>

<p>No fees, no embargoes, better science. <strong>5/5</strong></p>

<h2 id="reuse-rights">Reuse rights</h2>

<blockquote>
  <p>The article and any associated published material is distributed under the <a href="http://creativecommons.org/licenses/by/3.0/">Creative Commons Attribution License 3.0 (CC-BY)</a>. (<a href="http://www.pensoft.net/journals/phytokeys/about/Open%20Access%20Policy#Copyright%20Notice">source</a>)</p>
</blockquote>

<p>Reusing and remixing galore. <strong>5/5</strong></p>

<h2 id="copyrights">Copyrights</h2>

<blockquote>
  <p>Copyright on any article is retained by the author(s). […] (<a href="http://www.pensoft.net/journals/phytokeys/about/Open%20Access%20Policy#Copyright%20Notice">source</a>)</p>
</blockquote>

<p>Rather than being transferred to the publisher, the necessary rights are granted through the CC-BY license (which applies to anyone), and some of these rights are also spelled out explicitly in the policy. Nice! <strong>5/5</strong></p>

<h2 id="author-posting-rights">Author posting rights</h2>

<blockquote>
  <p>[…] Authors are thus encouraged to post the pdf files of published papers on their homepages or elsewhere to expedite distribution. […] (<a href="https://phytokeys.pensoft.net/">source</a>)</p>
</blockquote>

<p>This encouragement is mentioned not in the editorial policies but on the journal’s homepage. It is not entirely clear whether pre-prints (before peer review) can be posted as well, but the fact that:</p>

<blockquote>
  <p>The work described has not been published before (except in the form of an abstract or as part of a published lecture, review or thesis) […] (<a href="http://www.pensoft.net/journals/phytokeys/about/Open%20Access%20Policy#Copyright%20Notice">source</a>)</p>
</blockquote>

<p>… and a <a href="http://www.sherpa.ac.uk/romeo/search.php">SHERPA/RoMEO</a> search listing PhytoKeys as a blue journal, with “pre-print archiving status unclear”, lead me to conclude that this is not allowed. <strong>4/5</strong></p>

<h2 id="automatic-posting">Automatic posting</h2>

<blockquote>
  <p>Archived in <a href="http://www.ncbi.nlm.nih.gov/pmc/journals/1603/">PubMedCentral</a> and <a href="http://www.clockss.org/">CLOCKSS</a>. (<a href="https://phytokeys.pensoft.net/">source</a>)</p>
</blockquote>

<p>The journal is <a href="http://www.ncbi.nlm.nih.gov/pmc/journals/1603/">indeed archived</a> in PubMedCentral, but the latest issue (25) is not available yet. I’m assuming this will be done within 6 months. <strong>4/5</strong></p>

<h2 id="machine-readability">Machine readability</h2>

<p>This criterion is more verbose:</p>

<blockquote>
  <p>Article full text, metadata, citations, &amp; data, including supplementary data, provided in community machine-readable standard formats through a community standard API or protocol. (<a href="https://www.plos.org/files/HowOpenIsIt_English.pdf">source</a>)</p>
</blockquote>

<p>… so let’s break it up into parts:</p>

<p>The article (including full text, metadata and citations) is <a href="http://www.pensoft.net/J_FILES/3/articles/3100/3100-G-2-layout.xml">available as XML</a>. These XML files are also deposited in a <a href="https://github.com/pensoft/PhytoKeys-xml">GitHub repository</a>. They seem to be marked up in a community standard (see the XML header info) and are provided through a standard API or protocol (GitHub and HTTP). I’m not sure the links to the included figures work, as these don’t start with <code class="language-plaintext highlighter-rouge">http://</code>.</p>

<p>In addition, I posted the full article (including figures) as an editable, <a href="http://daringfireball.net/projects/markdown/">Markdown</a>-formatted text file on <a href="https://github.com/peterdesmet/vascan-data-paper">GitHub</a>.</p>

<p>The data, which are hosted on the <a href="https://doi.org/10.5886/Y7SMZY5P">Canadensys Repository</a> and <a href="http://www.gbif.org/dataset/3f8a1297-3259-4700-91fc-acc4170b27ce">indexed by GBIF</a>, are available as a <a href="http://www.gbif.org/informatics/standards-and-tools/publishing-data/data-standards/darwin-core-archives/">Darwin Core archive</a>. This is a community standard, and the archive is provided through a standard protocol (HTTP). There are no supplementary data. <strong>5/5</strong></p>

<h2 id="conclusion">Conclusion</h2>

<p><strong>28/30</strong>, which is awesome!</p>

<p>Congrats to the folks at <a href="http://www.pensoft.net/">Pensoft</a> on providing <a href="http://www.pensoft.net/journals/phytokeys/about/Open%20Access%20Policy">clear policies for PhytoKeys</a>, although the author posting rights and automatic posting could be clarified. Please let me know if I misinterpreted something.</p>]]></content><author><name>Peter Desmet</name></author><category term="open science course" /><category term="open access" /><category term="PhytoKeys" /><summary type="html"><![CDATA[And how well is the journal communicating this?]]></summary></entry><entry><title type="html">What is open access?</title><link href="https://peterdesmet.com/posts/oa-1.html" rel="alternate" type="text/html" title="What is open access?" /><published>2013-08-07T00:00:00+00:00</published><updated>2013-08-07T00:00:00+00:00</updated><id>https://peterdesmet.com/posts/oa-1</id><content type="html" xml:base="https://peterdesmet.com/posts/oa-1.html"><![CDATA[<p><em>Note: This post is a response to <a href="https://p2pu.org/en/courses/5/content/365/">this task</a> of an <a href="https://p2pu.org/en/courses/5/open-science-an-introduction/">online course on open science</a> I am following. In addition to the sources mentioned in text, I have also used a blog post from fellow participant <a href="http://linked-data.blogspot.com/2013/08/introduction-to-open-access.html">Quentin Reul</a>.</em></p>

<p>Open access is the <strong>free</strong>, <strong>immediate</strong>, and <strong>online</strong> availability of peer-reviewed research articles and other scientific works. It has several advantages over traditional publication practices and will hopefully replace those in the long term.</p>

<h2 id="old-model">Old model</h2>

<p>The invention of the <a href="http://en.wikipedia.org/wiki/Printing_press">printing press</a> in the 15th century suddenly allowed works to be created, copied and distributed to a wide audience, rather than being painstakingly copied by hand. Scientific societies were later founded and used this tool to facilitate the publication of scientific works (peer reviewing, typesetting, distributing). To recoup the (rather high) costs, articles were published in <a href="http://en.wikipedia.org/wiki/Subscription_business_model">subscription-based</a> journals.</p>

<h2 id="new-model">New model</h2>

<p>This business model has worked for a long time (and is still the dominant model in use today). But with the ubiquity of the internet, the <strong>cost of publishing</strong> works has decreased dramatically, making other publication models possible.</p>

<p>Many scientific works are also the result of publicly funded research, so there is a <strong>moral duty</strong> to make those results available quickly and free of charge to anyone.</p>

<p>And publicly available scientific works <strong>increase the efficiency</strong> of research, not only in general, but also for the author of the article, by:</p>

<ul>
  <li>Reaching a wider (and different) audience</li>
  <li>Publishing research results faster (including <a href="http://blog.f1000research.com/2013/05/15/no-article-fee-for-negative-results-until-end-of-august/">negative results</a>)</li>
  <li>Increasing engagement and collaboration</li>
  <li>Increasing transparency (which allows reproducibility and plagiarism detection)</li>
  <li>Allowing <a href="http://altmetrics.org/manifesto/">altmetrics</a> systems to be built around scientific works</li>
</ul>

<p>The importance of open access is <a href="http://www.youtube.com/watch?v=L5rVH1KGBCY">illustrated in this video</a> by <a href="https://twitter.com/R2RC">Nick Shockey</a>, <a href="https://twitter.com/phylogenomics">Jonathan Eisen</a> and <a href="https://twitter.com/phdcomics">Jorge Cham</a> (of the excellent <a href="http://www.phdcomics.com/">PhD Comics</a>).</p>

<iframe width="100%" height="500" src="//www.youtube.com/embed/L5rVH1KGBCY" frameborder="0" allowfullscreen=""></iframe>

<h2 id="licenses">Licenses</h2>

<p>From a legal perspective, the open access movement has been made possible by the creation of licenses - such as the <a href="http://creativecommons.org/licenses/">Creative Commons licenses</a> - that allow copyright holders to loosen some of the rights they hold over their work. The <a href="http://creativecommons.org/licenses/by/3.0/">CC-BY license</a> for example, which I use for this blog post, allows anyone to legally share and remix the work, as long as they credit the original source.</p>

<h2 id="flavours-of-open-access">Flavours of open access</h2>

<p>As with any paradigm shift, the ultimate goal of <a href="http://en.wikipedia.org/wiki/Open_access">open access</a> won’t be reached immediately, so there are different flavours of open access:</p>

<ul>
  <li><strong>Green open access</strong>: the journal might not be open access, but the authors can <a href="http://en.wikipedia.org/wiki/Self-archiving">self-archive</a> their article under an open access license (here’s a <a href="http://datapub.cdlib.org/2012/11/06/researchers-make-your-previous-work-oa/">tutorial</a>).</li>
  <li><strong>Gold open access</strong>: the <a href="http://en.wikipedia.org/wiki/Open_access_journal">journal</a> itself is open access (see for example the <a href="http://www.doaj.org/">Directory of Open Access Journals</a>).
    <ul>
      <li><strong>Gratis open access</strong>: The article is available online at no cost.</li>
      <li><strong>Libre open access</strong>: The article is available online at no cost and there are additional usage rights (yay!)</li>
    </ul>
  </li>
</ul>]]></content><author><name>Peter Desmet</name></author><category term="open science course" /><category term="open access" /><summary type="html"><![CDATA[A short introduction to open access.]]></summary></entry><entry><title type="html">Coding resolutions</title><link href="https://peterdesmet.com/posts/coding-resolutions.html" rel="alternate" type="text/html" title="Coding resolutions" /><published>2013-01-17T00:00:00+00:00</published><updated>2013-01-17T00:00:00+00:00</updated><id>https://peterdesmet.com/posts/coding-resolutions</id><content type="html" xml:base="https://peterdesmet.com/posts/coding-resolutions.html"><![CDATA[<p>It’s mid-January, but it’s never too late to make some resolutions to feel bad about not reaching at the end of the year. Mine are coding-related and hardly world-shattering, giving me a chance to actually reach them.</p>

<h2 id="learning">Learning</h2>

<p>I’m enjoying the courses on <a href="http://www.codecademy.com/">Codecademy</a>, but since I have a tendency to abandon them before I reach the end, here are my learning resolutions:</p>

<ul>
  <li>Finish the <a href="http://www.codecademy.com/tracks/python">Python course</a></li>
  <li>Finish the <a href="http://www.codecademy.com/tracks/javascript">JavaScript course</a></li>
  <li>Finish the <a href="http://www.codecademy.com/tracks/jquery">jQuery course</a></li>
  <li>Create <a href="http://www.codecademy.com/tracks/projects">3 projects</a></li>
  <li>Follow <a href="http://www.codecademy.com/tracks/apis">3 API courses</a></li>
</ul>

<p>And as a bonus, <a href="http://www.codeschool.com/courses/try-r">try R on Code School</a>.</p>

<h2 id="creating">Creating</h2>

<p>Learning is fine, but doing is better:</p>

<ul>
  <li>Create a data repository on <a href="https://github.com/peterdesmet">GitHub</a> and release it to the public domain.</li>
  <li>Create a one-page website visualizing data, using the <a href="http://cartodb.com/">CartoDB</a> API.</li>
  <li>Transfer my website from WordPress to GitHub, using <a href="http://docs.getpelican.com/en/latest/">Pelican</a>.</li>
  <li>Write at least 5 posts.</li>
</ul>

<p>And as a bonus, create a theme for my website.</p>

<h2 id="contributing">Contributing</h2>

<ul>
  <li>Contribute at least once a month to someone else’s repository or documentation. (Must. keep. <a href="https://github.com/peterdesmet">contributions summary.</a> active.)</li>
</ul>

<p>And as a bonus, help an old lady cross the street and/or nurture a wounded fantasy animal into a powerful pet ally.</p>]]></content><author><name>Peter Desmet</name></author><category term="coding" /><category term="new year" /><summary type="html"><![CDATA[Resolutions in coding land. Also, fantasy pets.]]></summary></entry></feed>