Good data management begins with creating a record of research that thoroughly, accurately, and clearly documents the work and evidence that went into creating a scholarly product, such as a paper, book, patent, or computer program. Beyond data collection, good data management includes recognizing who owns data, when and how data should be shared, and when data can be destroyed.
What is Data?
Most definitions of data are very broad. For example, the National Institutes of Health (NIH) uses the following definition in its grants manual in connection with rules on the availability of research results:
For this purpose, "data" means recorded information, regardless of the form or media on which it may be recorded, and includes writings, films, sound recordings, pictorial reproductions, drawings, designs or other graphic representations, procedural manuals, forms, diagrams, work flow charts, equipment descriptions, data files, data processing or computer programs (software), statistical records, and other research data (NIH Grants Policy Statement (03/01)).
Further, NIH requires that researchers who receive its funds make available not only data, but also "unique resources," to other scholars. This includes a wide range of information and biologicals, such as "synthetic compounds, organisms, cell lines, viruses, cell products, and cloned DNA" (NIH Grants Policy Statement (03/01)).
Data Collection Guidelines
The most rigorous standards for data collection come from industry and human subjects research. The Bayh-Dole Act, passed by Congress in 1980, gives universities control over intellectual property created by researchers with federal grants. Since its passage, patenting has increased significantly on university campuses and, with this trend, universities have moved toward industrial standards for data collection. These standards focus on what should be recorded in a laboratory notebook and how a notebook should be kept. Guidelines often recommend that notebooks include:
- Descriptions of reasons for experiments
- Experimental protocols
- Observations, measurements, and other experimental results
- Printouts, photographs, and other machine-generated data
- Mathematical calculations performed on raw data
- Brief interpretations of the results
The following style conventions are widely recognized for laboratory notebooks:
- Permanent binding
- Consecutively numbered pages
- Tables of contents
- Explanations of abbreviations
- Dated entries
- Both dates recorded when the date of the experiment differs from the date of recording
- Dating and initialing all changes
- Keeping legible and clear records in permanent ink
- Periodic review and signing of notebooks by someone not directly involved in the research
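For groups that keep records electronically, the contents and conventions listed above can be sketched as a simple data structure. This is only an illustration, not a prescribed format: the class names, field names, and the `pages_are_consecutive` helper are all hypothetical, and a real system would also need to address permanence and independent review.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Amendment:
    """A later change to an entry, dated and initialed (hypothetical structure)."""
    changed_on: date
    initials: str
    note: str


@dataclass
class NotebookEntry:
    """One notebook page, mirroring the recommended contents (hypothetical structure)."""
    page: int
    recorded_on: date      # date the entry was written
    experiment_on: date    # date the experiment was performed
    purpose: str           # reason for the experiment
    protocol: str
    observations: str
    interpretation: str = ""
    amendments: list[Amendment] = field(default_factory=list)


def pages_are_consecutive(entries: list[NotebookEntry]) -> bool:
    """Return True when each page number is exactly one more than the previous."""
    pages = [entry.page for entry in entries]
    return all(later - earlier == 1 for earlier, later in zip(pages, pages[1:]))
```

Keeping `recorded_on` and `experiment_on` as separate fields reflects the convention of recording both dates when they differ.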
A traditional value of academic communities has been the sharing of research results. The federal government requires that data and unique resources created with its funding be shared and encourages timely dissemination of results through publication and presentation in academic venues. In the last two decades, emphases on industrial collaboration and patenting in medicine and the life sciences have challenged these older values.
Guidelines for data retention range from three years (NIH regulations) to twenty-three years (Patent Office). When no other concerns supersede, UW-Madison often uses seven years past publication or completion of research as a rule of thumb for how long data should be retained. This recommendation is based on the Wisconsin statute of limitations that applies to actions for fraud.
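As a rough illustration of the seven-year rule of thumb, the earliest discard date can be computed from the date of publication or completion of research. The sketch below is not an official tool: the function names are hypothetical, the default of seven years is only the rule of thumb described above, and any funder, patent, or other requirement that calls for a longer period would take precedence.

```python
from datetime import date


def retention_end(reference: date, years: int = 7) -> date:
    """Return the date `years` after publication or completion of research."""
    try:
        return reference.replace(year=reference.year + years)
    except ValueError:
        # February 29 has no counterpart in a non-leap year; use February 28.
        return reference.replace(year=reference.year + years, day=28)


def may_discard(reference: date, today: date, years: int = 7) -> bool:
    """Return True once the retention period has fully elapsed."""
    return today >= retention_end(reference, years)
```

For example, `retention_end(date(2020, 3, 15))` returns `date(2027, 3, 15)`, and passing `years=3` applies the shorter three-year period instead.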
What Rules Apply?
For investigators with funding from the National Institutes of Health, data ownership, data sharing, and data retention are addressed in the NIH Data Sharing Policy. Other funding agencies may have their own policies concerning data management.