Data housekeeping
File names
General principles
- Machine readable
- Human readable
- Plays well with default ordering
Separators
- No spaces
- Underscore to separate
- Hyphen to combine
2018-12-03
2018-12-06_1700
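The separator rules can be sketched in a few lines of Python. This is only an illustration: the helper `build_name` and the example fields are hypothetical, and it assumes that underscores separate fields while hyphens join the words inside a field.

```python
from datetime import datetime


def build_name(parts, when, extension, with_time=False):
    """Join fields with underscores and append an ISO 8601 date stamp.

    Spaces or underscores inside a field become hyphens, so underscores
    only ever separate fields.
    """
    cleaned = ["-".join(part.replace("_", " ").split()) for part in parts]
    stamp = when.strftime("%Y-%m-%d_%H%M" if with_time else "%Y-%m-%d")
    return "_".join(cleaned + [stamp]) + "." + extension


# Hypothetical example fields, chosen only for illustration.
example = build_name(["lab meeting", "notes"], datetime(2018, 12, 6, 17, 0), "txt", with_time=True)
print(example)  # lab-meeting_notes_2018-12-06_1700.txt
```

Putting the year first in the date stamp means that names sharing the same leading fields sort chronologically under default alphabetical ordering.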
Bad names
PhD-project-Jan19 alldata_final.foo
Finacial detailes BIocore 19/11/12.xls
ATACseq1Londonmapped.bam
Hlad.jez.M-L-průtoky JíObj.z Ohře-od 10-2011.xlsx
Good names
Iris-setosa_samples_1927-05-12.csv
PI102_Mouse12_EEG_2018-11-03_1245.tsv
Bioinfiniti_FullProposal_2018-11-15_1655.do
From Jenny Bryan by CC-BY
(https://speakerdeck.com/jennybc/how-to-name-files)
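A rough way to screen names against these conventions is sketched below. The function `name_problems` and its rules are assumptions drawn from the examples above (no spaces, only ASCII letters, digits, hyphens and underscores, underscore-separated fields, a `YYYY-MM-DD` date); they are not an official checker, and a group may well choose different rules.

```python
import re

# Assumed rule set: ASCII letters, digits, '-' and '_', then a single extension.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]+")
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def name_problems(filename):
    """Return a list of convention violations; an empty list means the name passes."""
    problems = []
    if " " in filename:
        problems.append("contains spaces")
    if not SAFE_NAME.fullmatch(filename):
        problems.append("has characters outside letters, digits, '-', '_'")
    if "_" not in filename:
        problems.append("has no underscore-separated fields")
    if not ISO_DATE.search(filename):
        problems.append("has no YYYY-MM-DD date")
    return problems


print(name_problems("Iris-setosa_samples_1927-05-12.csv"))  # []
print(name_problems("ATACseq1Londonmapped.bam"))
```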
File organization
- Have folder organization conventions for your group
  - Per Paper
  - Per Study/Project
  - Per Collaborator
- Keep readme files for data
  - Title
  - Date of Creation/Receipt
  - Instrument or software specific information
  - People involved
  - Relations between multiple files/folders
- Separate files you are actively working on from the old ones
- Orient newcomers to the group's conventions
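The readme fields listed above could be captured with a small template, for instance as in the following sketch. The function `render_readme`, the plain-text layout and all example values are hypothetical, not a format prescribed by the slides.

```python
from datetime import date


def render_readme(title, created, instrument, people, relations):
    """Render the readme fields as plain text, one field per line."""
    lines = [
        f"Title: {title}",
        f"Date of creation/receipt: {created.isoformat()}",
        f"Instrument or software: {instrument}",
        "People involved: " + ", ".join(people),
        "Relations between files/folders:",
    ]
    lines += [f"  - {item}" for item in relations]
    return "\n".join(lines) + "\n"


# All values below are made up for illustration.
readme = render_readme(
    title="Mouse EEG recordings",
    created=date(2018, 11, 3),
    instrument="hypothetical EEG recorder, export software v1",
    people=["A. Researcher", "B. Technician"],
    relations=["raw/ holds instrument exports", "derived/ is computed from raw/"],
)
print(readme)
```

Keeping the readme as plain text next to the data means it stays readable without any special software.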
When working
- Clarify and separate source and intermediate data
- Keep data copies to a minimum
- Clean up after analysis
- Clean up copies created for presentations or for sharing
- Hand over data to a new responsible person when leaving
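One way to spot leftover copies during cleanup is to group files by content hash. The sketch below uses only the Python standard library; the helper `find_duplicates` and the folder names in the demo are hypothetical, and reading each file fully into memory is only reasonable for small files.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Group files under root by SHA-256 digest and return groups with more than one file."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Whole-file read: fine for a sketch, use chunked reads for large data.
            by_hash[hashlib.sha256(path.read_bytes()).hexdigest()].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]


# Demo on a throwaway directory with one copied file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "raw").mkdir()
    (root / "slides").mkdir()
    (root / "raw" / "counts.csv").write_text("gene,count\nA,1\n")
    (root / "slides" / "counts_copy.csv").write_text("gene,count\nA,1\n")
    (root / "raw" / "notes.txt").write_text("unrelated\n")
    duplicate_names = [[p.name for p in group] for group in find_duplicates(root)]

print(duplicate_names)  # [['counts.csv', 'counts_copy.csv']]
```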
End of project
- data should be kept as a single copy on server-side storage
- no copies on desktops and external devices
- non-proprietary formats
- minimal metadata:
  - source
  - context of generation
  - data structure
  - content
- sensitive data (e.g. whole genome) must be encrypted
* If not specified otherwise, data must be kept for **10 years** following project end for reproducibility purposes.
Note: sometimes it is hard to find or understand a dataset that is only 10 days old.
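As an illustration, the minimal metadata and the default retention period could be recorded in a JSON sidecar file, which is itself a non-proprietary format. The field names, the helpers `keep_until` and `metadata_record`, and the example values are assumptions made for this sketch.

```python
import json
from datetime import date

RETENTION_YEARS = 10  # default from the text above; a project may specify otherwise


def keep_until(project_end, years=RETENTION_YEARS):
    """Return the date until which the data must be kept."""
    try:
        return project_end.replace(year=project_end.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year.
        return project_end.replace(year=project_end.year + years, day=28)


def metadata_record(source, context, structure, content, project_end):
    """Collect the four minimal metadata fields plus retention dates."""
    return {
        "source": source,
        "context_of_generation": context,
        "data_structure": structure,
        "content": content,
        "project_end": project_end.isoformat(),
        "keep_until": keep_until(project_end).isoformat(),
    }


# Example values are invented for illustration.
record = metadata_record(
    source="hypothetical EEG recorder",
    context="pilot study, session 3",
    structure="one TSV file per animal, one row per sample",
    content="raw EEG amplitudes",
    project_end=date(2018, 12, 6),
)
print(json.dumps(record, indent=2))
```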
In doubt about data archival?
Contact R3 for support on archival of datasets using tickets:
- https://service.uni.lu/sp
- Home > Catalog > LCSB > Biocore: Application services > Request for: Support
Data housekeeping - Summary
Server is your friend!
- Allows a consistent backup policy for your datasets
- Keeps the number of copies to a minimum
- Specification of clear access rights
- High accessibility
- Data are discoverable
- Server can't be stolen
General guidelines
- Use institutional media for storage of all data
- Research data (particularly sensitive data) should be in a single source location
- Enable encryption for data stored on removable media
- Clarify and separate source and intermediate data
- Disable write access to relevant source data (read-only)
- Backup research data!
- Install anti-virus software
- Generate checksums
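Checksum generation can be sketched with the Python standard library as follows. The manifest name `SHA256SUMS`, the line layout (digest, two spaces, relative path) and the helper names are choices made for this example, not a requirement from the guidelines.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir, manifest_name="SHA256SUMS"):
    """Write one 'digest  relative/path' line per file and return the manifest path."""
    data_dir = Path(data_dir)
    manifest = data_dir / manifest_name
    files = sorted(p for p in data_dir.rglob("*") if p.is_file() and p != manifest)
    lines = [f"{sha256_of(p)}  {p.relative_to(data_dir).as_posix()}" for p in files]
    manifest.write_text("\n".join(lines) + "\n")
    return manifest


def verify_manifest(data_dir, manifest_name="SHA256SUMS"):
    """Return the relative paths whose current digest differs from the manifest."""
    data_dir = Path(data_dir)
    changed = []
    for line in (data_dir / manifest_name).read_text().splitlines():
        expected, relative = line.split("  ", 1)
        if sha256_of(data_dir / relative) != expected:
            changed.append(relative)
    return changed


# Demo on a throwaway directory: verify, then tamper with the file and verify again.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text("id,value\n1,42\n")
    write_manifest(tmp)
    before_change = verify_manifest(tmp)
    sample.write_text("id,value\n1,43\n")
    after_change = verify_manifest(tmp)

print(before_change, after_change)  # [] ['sample.csv']
```

Storing the manifest alongside the source data makes it possible to confirm later that a restored backup or a transferred copy is identical to the original.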