# Data housekeeping
## File names
<div style="display:flex; position:static; width:100%">
<div class="fragment" data-fragment-index="0" style="position:static; width:30%">

### General principles
* Machine readable
* Human readable
* Plays well with default ordering
</div>
<div class="fragment" data-fragment-index="1" style="position:absolute; left:33%; width:30%">

### Separators
* No spaces
* Underscore to separate
* Hyphen to combine
</div>
<div class="fragment" data-fragment-index="2" style="position:absolute; left:66%; width:30%">

### Date format
follows **ISO 8601**<br>
2018-12-03<br>
2018-12-06_1700
</div>
</div>

<div class="fragment" data-fragment-index="3" style="width:100%; position:static">
<div style="position:absolute;width:55%">
<b>Bad</b> names

```bash
PhD-project-Jan19 alldata_final.foo
Finacial detailes BIocore 19/11/12.xls
ATACseq1Londonmapped.bam
Hlad.jez.M-L-průtoky JíObj.z Ohře-od 10-2011.xlsx
```
</div>
<div style="position:relative;width:55%; bottom:20%; left:50%">
<b>Good</b> names

```bash
Iris-setosa_samples_1927-05-12.csv
PI102_Mouse12_EEG_2018-11-03_1245.tsv
Bioinfiniti_FullProposal_2018-11-15_1655.do
```
</div>
</div>

<br>
<br>
<div class="fragment" data-fragment-index="3" style="width:100%;">

From Jenny Bryan, CC-BY (https://speakerdeck.com/jennybc/how-to-name-files)
</div>

# Data housekeeping
## File organization

* Have folder organization conventions for your **group**
  * Per paper
  * Per study/project
  * Per collaborator
* Keep <b>readme files</b> for data
  * Title
  * Date of creation/receipt
  * Instrument- or software-specific information
  * People involved
  * Relations between multiple files/folders
* Separate the files you are actively working on from old ones
* Orient newcomers to the group's conventions

# Data housekeeping
<div style="position:absolute">

## When working

* Clarify and separate source and intermediate data
* Keep data copies to a **minimum**
  * Clean up after analysis
  * Clean up copies created for presentations or for sharing
* Hand over data to a new responsible person when leaving
</div>
<div style="position:relative;left:50%; width:40%">
<img src="slides/img/cleaning-table.jpg" height="450px">
</div>

# Data housekeeping
## End of project

* Data should be kept as a single copy on server-side storage
  * No copies on desktops and external devices
* Non-proprietary formats
* Minimal metadata:
  * Source
  * Context of generation
  * Data structure
  * Content
* Sensitive data (e.g. whole genome) **must** be encrypted
<br/>
<br/>
* If not specified otherwise, data must be kept for **10 years** following project end for reproducibility purposes

<aside class="notes">
Note: sometimes it is hard to find or understand a dataset that is only 10 days old
</aside>

## In doubt about data archival?
Contact R<sup>3</sup> through a ticket for support with archiving datasets:
* https://service.uni.lu/sp
* Home > Catalog > LCSB > Biocore: Application services > Request for: Support

<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href="https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-03" style="color:grey; font-size:0.8em;">Research Data Retention and Archival Policy</a>
</div>

# Data housekeeping - Summary
## Server is your friend!

* Allows a consistent backup policy for your datasets
* Keeps the number of copies to a minimum
* Allows clear access rights to be specified
* High accessibility
* Data are discoverable
* Server can't be stolen

## General guidelines

* Use institutional media for storage of **all** data
* Research data (particularly sensitive data) should be in a single source location
* Enable encryption for data stored on removable media
* Clarify and separate source and intermediate data
* Disable write access to relevant source data (read-only)
* Back up research data!
* Install anti-virus software
* Generate checksums
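
The last guideline, generating checksums, can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed tool: the function names (`sha256_of_file`, `write_manifest`, `verify_manifest`) and the manifest file name `SHA256SUMS` are assumptions made for this example, and SHA-256 is one reasonable choice of hash among several. The manifest uses the `<digest>  <relative path>` layout that the `sha256sum` command prints.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder: Path, manifest_name: str = "SHA256SUMS") -> Path:
    """Write one '<digest>  <relative path>' line per file under folder.

    The manifest name is a hypothetical default chosen for this sketch.
    """
    manifest = folder / manifest_name
    lines = []
    for item in sorted(folder.rglob("*")):
        # Skip directories and the manifest itself.
        if item.is_file() and item != manifest:
            relative = item.relative_to(folder).as_posix()
            lines.append(f"{sha256_of_file(item)}  {relative}\n")
    manifest.write_text("".join(lines), encoding="utf-8")
    return manifest


def verify_manifest(manifest: Path) -> List[str]:
    """Return the relative paths that are missing or whose digest changed."""
    changed = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, relative = line.split("  ", 1)
        target = manifest.parent / relative
        if not target.is_file() or sha256_of_file(target) != expected:
            changed.append(relative)
    return changed
```

A typical use would be to call `write_manifest` once on a folder of source data when it is received or archived, and to call `verify_manifest` later (for example after a transfer or a restore from backup) to list any files that no longer match.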