Research Data Management

About Rights and Permissions

As a data creator, you have certain rights over your work and an opportunity to license your data appropriately to facilitate sharing and re-use. How copyright and licensing apply depends on several factors, including whether your data set contains quantitative data, qualitative data, or sensitive information.

Best practices include:

  • Understanding the nature of your dataset and whether your data are subject to copyright.
  • Making your data as open and reusable as possible, ideally by dedicating it to the Public Domain.
  • Identifying any restrictions on sharing data, e.g. from Terms of Use.
  • Asserting your rights under the Doctrine of Fair Use if necessary.
  • Considering carefully any ethical questions involved in sharing your data openly and choosing licensing and access options to match.

The information presented here is a brief overview of a very complicated topic. Please get in touch with the Research Data Management team for help with any of the rights and permissions considerations below. 

Rights and Quantitative Data

By quantitative data, we mean data that are numerical values or measurements of facts about the universe. Because facts are not subject to copyright, most quantitative data are not copyrightable in the United States and copyright laws usually do not apply or are not enforceable. 

However, the arrangement, selection, and coordination of the data set as a whole may be subject to copyright. This depends on the creativity involved with arranging and displaying the data. 

Many researchers believe in the importance of sharing data openly to facilitate the greatest possible reuse of the data. For example, Dryad and the Panton Principles for Open Data strongly recommend that data be contributed to the Public Domain. When a data set is dedicated to the Public Domain, the creator declares that others may use the data set in its current form; any potential copyright in the arrangement, selection, and coordination of the data set is therefore also dedicated to the Public Domain. Below are two examples of licenses that a data set creator can apply to a quantitative data set to dedicate it to the Public Domain.

Rights and Qualitative Data

By qualitative data, we mean data that contain observations, texts, conversations, and artistic or creative works, usually collected in the humanities and social sciences. Some examples of qualitative data include text corpora, interviews, photographs, and social media output. Because these are often creative expressions made by individuals and fixed in a tangible form, many of these data sets are subject to copyright, and permission may need to be obtained for their use. For those compiling qualitative data sets, privacy, ethics, and licenses are of key concern.

For those collecting interviews or other recordings and documentation made by research subjects, clear guidelines for the usage and ownership of these materials should be set out in a Consent Form and cleared with the IRB. This is also the case when research work is conducted via the Internet.

Researchers must identify whether the data are in the public domain, are subject to licensing terms, or can be used under Fair Use. Because work with these data sets often involves substantial transformative use, a Fair Use argument may be particularly powerful for qualitative data sets.

When obtaining data from the Internet via scraping tools, the restrictions in the source's Terms of Service and Developer Policies apply. This is especially relevant for social media websites.

Access Control and Permissions for Sensitive Data

For data sets that contain sensitive research, e.g. human subject research, access control may be an option. Mixed levels of access control may be put in place for some data, combining controlled access to confidential data with standard access to non-confidential data. 
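As a rough illustration of mixed access levels, the sketch below splits a single record into a part that could be shared under standard access and a part that would be held under controlled access. The field names, the `CONFIDENTIAL_FIELDS` set, and the `split_by_access` function are all hypothetical; which fields count as confidential is a decision for the research team and the IRB, not something code can determine.

```python
from typing import Dict, Tuple

# Hypothetical list of fields the research team has classed as confidential.
CONFIDENTIAL_FIELDS = {"interview_transcript", "diagnosis"}


def split_by_access(record: Dict, key: str = "participant") -> Tuple[Dict, Dict]:
    """Split one record into a standard-access part and a controlled-access part.

    Both parts keep the linking key so that approved users can rejoin them.
    """
    open_part = {k: v for k, v in record.items() if k not in CONFIDENTIAL_FIELDS}
    controlled_part = {k: v for k, v in record.items() if k in CONFIDENTIAL_FIELDS}
    controlled_part[key] = record[key]
    return open_part, controlled_part


record = {"participant": "P001", "age_range": "30-39", "diagnosis": "example"}
open_part, controlled_part = split_by_access(record)
print(open_part)
print(controlled_part)
```

The two parts could then be deposited separately, with the controlled part placed behind whatever access procedure the repository offers.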

Anonymizing Data

Before data collected during research with human subjects are published, researchers should ensure the removal of any personally identifiable information (PII). A documented plan for anonymizing the data will help mitigate the risk to participants, encourage consistent practices among the research team throughout the project, and help future users understand what decisions were made during anonymization.

Some approaches for anonymization include:

  • Avoid collecting identifying information that is unnecessary for the study.
  • Remove direct identifiers (e.g. participant names, addresses, phone numbers) from the data and, when appropriate, replace this information with a code (e.g. a participant number or pseudonym in place of a name).
  • Aggregate variables or reduce the precision of reporting when possible to lower the potential for identification. For example, rather than recording full birth dates or precise ages of participants, the research team may decide to record year of birth or age range.
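The approaches above can be sketched in a few lines of Python. This is a minimal illustration, not a complete anonymization procedure: the column names (`name`, `address`, `phone`, `age`), the `P001`-style participant codes, and the ten-year age bands are all assumptions made for the example, and the sketch assumes each participant has a unique name.

```python
from typing import Dict, List

# Hypothetical column names for direct identifiers; adjust to the actual data set.
DIRECT_IDENTIFIERS = ("name", "address", "phone")


def age_range(age: int, width: int = 10) -> str:
    """Reduce precision: map an exact age to a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(records: List[Dict]) -> List[Dict]:
    """Return copies of the records with direct identifiers replaced by a code."""
    # Maps name -> participant code. If this key is kept at all, it should be
    # stored separately from the published data and secured.
    codes: Dict[str, str] = {}
    cleaned = []
    for record in records:
        code = codes.setdefault(record["name"], f"P{len(codes) + 1:03d}")
        row = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        row["participant"] = code
        row["age_range"] = age_range(row.pop("age"))  # drop the exact age
        cleaned.append(row)
    return cleaned


sample = [
    {"name": "Ada Example", "address": "1 Main St", "phone": "555-0100", "age": 34, "response": "yes"},
    {"name": "Bo Example", "address": "2 Main St", "phone": "555-0101", "age": 58, "response": "no"},
]
print(anonymize(sample))
```

Removing direct identifiers does not by itself guarantee anonymity, since combinations of the remaining variables may still identify a participant. The documented anonymization plan should record which fields were removed, coded, or aggregated.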

Licensing Options

Beyond the Public Domain licensing options above, there are other licensing options that can apply to data sets. Creative Commons licenses allow creators to specify the rights for reuse, typically requiring attribution to the creator but potentially also banning commercial use and derivatives. Prohibiting derivative works of a data set is not recommended, as this compromises the usability of that data set.

 

Licenses can work in tandem with the access control, Fair Use, and ethical considerations detailed above. For complex situations, contact us for guidance.

Intellectual Property Considerations

Copyright law protects original creative expressions that are fixed in either physical or digital form. US copyright law provides examples of creative works that are protected -- including literary works, musical works, and motion pictures -- and works that are not eligible for copyright protection -- including ideas, processes, and concepts. Factual information has been interpreted as falling outside the protection of copyright law, which has implications for data. Peter Hirtle of Cornell University Library cautions, "Not all data is in the public domain. A project might, for example, be built around copyrighted photographs; the photographs are part of the project’s 'data.' But in many cases, the data in a data management system as well as the metadata describing that data will be factual, and hence not protected by copyright." For more information, see Cornell's "Introduction to Intellectual Property Rights in Data Management."

Even if datasets are not protected by copyright, researchers who are not the creators may be uncertain whether they are allowed to use them in their own work. Licenses that clearly outline the terms of use can help to alleviate this uncertainty and to promote the use of the data. Creative Commons licenses and Open Data Commons licenses are two noteworthy instruments for specifying the terms of use for datasets.