Metadata is often defined as "data about data," a characterization that fails to capture why it is important and what it does. Paul Miller provides a richer description that illustrates metadata's many values and purposes:
“In essence, metadata is the extra baggage associated with any resource that enables a real or potential user to find that resource; to decide whether or not it is of value to them; to discover where, when and by whom it was created, as well as for what purpose; to know what tools will be needed to manipulate the resource; to determine whether or not they will actually be allowed access to the resource itself and how much this will cost them. Metadata is, in short, a means by which largely meaningless data may be transformed into information, interpretable and reusable by those other than the creator of the data resource.”1
Metadata is structured information about an object, like a dataset, and has value to both the original creator and other users. Complete metadata allows researchers to locate data they created and recall the circumstances and context under which they created and analyzed the data. It allows researchers outside of the original research team to:
1. Miller, Paul (2004). Metadata: What it means for memory institutions. In Metadata applications and management, ed. G.E. Gorman and Daniel G. Dorner. Lanham, MD: Scarecrow Press, p. 4.
Metadata is structured information that provides context for information objects of all kinds, including research data, and in doing so enables discovery, use, exchange, and preservation of those objects. Metadata for data typically includes information about the researchers involved in creating the data, a name or title for the data set, dates associated with the creation of the data, a brief description or abstract, and terms and conditions associated with the data set.
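As a rough illustration of the elements listed above, the following Python sketch represents a descriptive metadata record as a simple dictionary and checks that each element is filled in. The field names (`title`, `creators`, `date_created`, `description`, `terms`) and all values are hypothetical placeholders chosen for this example; they are not drawn from any particular metadata standard.

```python
import json

# Hypothetical metadata record; every value below is a placeholder.
record = {
    "title": "Example stream temperature measurements",
    "creators": ["A. Researcher", "B. Researcher"],
    "date_created": "2020-01-15",
    "description": "Placeholder abstract describing the data set.",
    "terms": "Placeholder terms and conditions for reuse.",
}

# Element names assumed for this sketch, mirroring the paragraph above.
REQUIRED = ("title", "creators", "date_created", "description", "terms")


def missing_fields(rec):
    """Return the required element names that are absent or empty in rec."""
    return [name for name in REQUIRED if not rec.get(name)]


print(json.dumps(record, indent=2))
print("Missing elements:", missing_fields(record))
```

A formal standard would replace these ad hoc names with its own defined elements, but the idea of a small, structured, checkable record stays the same.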
There are a variety of metadata standards for describing data sets based on discipline, international standards, and many other characteristics of the data. Academic disciplines have supported initiatives to formalize metadata specifications within their community. The type of resource being represented and the desired uses of the represented resource will influence the metadata standards. Some examples of widely adopted metadata standards include the following:
General
Sciences
Digital Curation Centre - List of Biology metadata standards, including tools and use cases
Social Sciences
Humanities
For data to be interpretable and useful to others, researchers should document their research workflow, the decisions they make during the research process, and any manipulation of the data. The UK Data Archive outlines a set of best practices for data documentation, summarized here:
Good data documentation includes information on:
At data-level, datasets should also be documented with:
Variable-level descriptions may be embedded within a dataset itself as metadata. Other documentation may be contained in user guides, reports, publications, working papers and laboratory books (see Managing and Sharing Data UK Data Archive).
In the context of research data, a readme file is a plain text file (.txt) that helps others understand your data and the interconnections among data files. By titling the file "readme," the data creator signals to other users that this file should be looked at first. For researchers depositing data in D-Scholarship@Pitt, the information in the readme file may mirror and augment information included in the metadata form and, if the deposit includes multiple files, may explain the file naming structure, the relationships among the files, and any abbreviations used.
Cornell University's Research Data Management Service Group has made a useful readme file template available for download. At a minimum, the Cornell group recommends completing the following sections in the readme file template:
General information
Data set title
Name and contact information for investigators
Date (or date range) of data collection
Geographic location of data collection
Data and file overview
A short description of each file
Date that the file was created
Methodological information
Description of methods for data collection
Description of methods for data processing
Data-specific information
Variable list, with full names and definitions of column headings if tabular data
Units of measurement
Definitions for codes or symbols used to record missing information
(See Cornell University, Guide to writing "readme" style metadata.)
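The minimum sections listed above can be turned into a blank plain-text skeleton to fill in by hand. The Python sketch below is one possible way to do that; it is not the Cornell template itself, and the layout (uppercase section headings, one prompt per line) and the function name `build_readme` are assumptions made for illustration.

```python
# Section headings and prompts taken from the minimum list above.
SECTIONS = {
    "General information": [
        "Data set title",
        "Name and contact information for investigators",
        "Date (or date range) of data collection",
        "Geographic location of data collection",
    ],
    "Data and file overview": [
        "A short description of each file",
        "Date that the file was created",
    ],
    "Methodological information": [
        "Description of methods for data collection",
        "Description of methods for data processing",
    ],
    "Data-specific information": [
        "Variable list, with full names and definitions of column headings if tabular data",
        "Units of measurement",
        "Definitions for codes or symbols used to record missing information",
    ],
}


def build_readme(sections=SECTIONS):
    """Return a plain-text readme skeleton with one empty prompt per line."""
    lines = []
    for heading, prompts in sections.items():
        lines.append(heading.upper())
        for prompt in prompts:
            lines.append(f"{prompt}: ")
        lines.append("")  # blank line between sections
    return "\n".join(lines)


print(build_readme())
```

Saving the printed text as `readme.txt` alongside the data files gives a starting point that can then be completed for the specific deposit.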
A data dictionary describes all the data stored in a data set or used by a database, including their types, attributes, structure, relationships, and usage in the database or software program. A good data dictionary can be a valuable part of the metadata describing a data set, enabling a user to get a clear understanding of the content and organization of the data and how it could be modified, if necessary. In the context of a database or software package, the data dictionary may be an essential piece of software that programmers and the database management system require to access and use the data properly. The user view of a data dictionary is usually presented as a table or spreadsheet. Dictionaries may also be incorporated into XML files or other mark-up languages. A data dictionary does not contain the data, but only describes it.
A data dictionary typically contains a list of all files in the database, names for each file, the type of data included, a list of all field names and variable names, a description of the information contained in each field, and the various attributes of each field. These may include type (text, date, numeric, etc.), standard formats, units, field length, description, unique identifiers, default values, whether a value is required or not, and more, depending on the specific data.
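To make the user view concrete, the Python sketch below builds a small data dictionary as a table (CSV text) and checks a data file's column names against it. The column set (`field_name`, `type`, `format`, `units`, `length`, `required`, `default`, `description`) follows the attributes named above, while the three example fields and the helper names `to_csv` and `undocumented` are hypothetical and exist only for this illustration.

```python
import csv
import io

# Attribute columns for the dictionary table, following the list above.
FIELDS = ["field_name", "type", "format", "units", "length",
          "required", "default", "description"]

# Hypothetical entries describing three fields of an imagined data file.
entries = [
    {"field_name": "site_id", "type": "text", "length": 8, "required": True,
     "description": "Unique identifier for the sampling site"},
    {"field_name": "sample_date", "type": "date", "format": "YYYY-MM-DD",
     "required": True, "description": "Date the sample was collected"},
    {"field_name": "water_temp", "type": "numeric", "units": "degrees Celsius",
     "required": False, "default": "NA",
     "description": "Water temperature at the time of sampling"},
]


def to_csv(rows):
    """Render dictionary entries as CSV text; missing attributes stay blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def undocumented(columns, rows):
    """Return the data-file column names that have no dictionary entry."""
    documented = {row["field_name"] for row in rows}
    return [name for name in columns if name not in documented]


print(to_csv(entries))
print("Undocumented columns:", undocumented(["site_id", "depth_m"], entries))
```

Note that the table holds only descriptions of the fields, never the data values themselves, which matches the distinction drawn above.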
For some examples of data dictionaries, check the following sites: