# Data housekeeping
## File names
<div  style="display:flex; position:static; width:100%">
<div class="fragment" data-fragment-index="0" style="position:static; width:30%">

### General principles  
  * Machine readable
  * Human readable
  * Plays well with default ordering
</div>
<div class="fragment" data-fragment-index="1" style="position:absolute; left:33%; width:30%">

### Separators  
  * No spaces
  * Underscore to separate
  * Hyphen to combine
  
</div>
<div class="fragment" data-fragment-index="2" style="position:absolute; left:66%; width:30%">

### Date format follows **ISO 8601**<br>

  2018-12-03<br> 
  2018-12-06_1700  

</div>
</div>
 

<div class="fragment" data-fragment-index="3" style="width:100%; position:static">
<div style="position:absolute;width:55%">
<b>Bad</b> names

```bash
 PhD-project-Jan19 alldata_final.foo
 Finacial detailes BIocore 19/11/12.xls
 ATACseq1Londonmapped.bam
 Hlad.jez.M-L-průtoky JíObj.z Ohře-od 10-2011.xlsx
```
</div>
<div style="position:relative;width:55%; bottom:20%; left:50%">
<b>Good</b> names

```bash
Iris-setosa_samples_1927-05-12.csv
PI102_Mouse12_EEG_2018-11-03_1245.tsv
Bioinfiniti_FullProposal_2018-11-15_1655.do
```
</div>
</div>
<br>
<br>
<div class="fragment" data-fragment-index="3" style="width:100%;">
From Jenny Bryan, licensed under CC-BY  
(https://speakerdeck.com/jennybc/how-to-name-files)
</div>
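
The naming rules above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the helper name `make_filename` and its parameters are hypothetical. It joins name parts with underscores, turns whitespace inside a part into hyphens, and appends the date in the `YYYY-MM-DD` (optionally `_HHMM`) form shown on the slide.

```python
import re
from datetime import datetime


def make_filename(parts, when, extension, with_time=False):
    """Build a file name: parts joined by "_", date stamp last (hypothetical helper)."""
    cleaned = []
    for part in parts:
        # "Hyphen to combine": whitespace inside one part becomes a hyphen.
        part = re.sub(r"\s+", "-", part.strip())
        # "Underscore to separate": underscores are reserved for joining parts.
        part = part.replace("_", "-")
        cleaned.append(part)
    # Date stamp as on the slide: 2018-12-03 or 2018-12-06_1700.
    stamp = when.strftime("%Y-%m-%d_%H%M" if with_time else "%Y-%m-%d")
    return "_".join(cleaned + [stamp]) + "." + extension.lstrip(".")


print(make_filename(["Iris setosa", "samples"], datetime(1927, 5, 12), "csv"))
print(make_filename(["PI102", "Mouse12", "EEG"],
                    datetime(2018, 11, 3, 12, 45), "tsv", with_time=True))
```

Because the date is written year first with zero padding, names that share a prefix sort chronologically under default alphabetical ordering.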



# Data housekeeping
## File organization
* Have folder organization conventions for your **group**
  * Per Paper
  * Per Study/Project 
  * Per Collaborator
* Keep <b>readme files</b> for data  
  * Title
  * Date of Creation/Receipt
  * Instrument or software specific information
  * People involved
  * Relations between multiple files/folders 

* Separate files you are actively working on from the old ones  
* Orient newcomers to the group's conventions
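
As one possible way to keep readme files consistent, the sketch below writes a plain-text readme containing the fields listed above. The function name `write_readme`, the file name `README.txt`, and the example values are assumptions made for illustration; a group would adapt them to its own conventions.

```python
import tempfile
from datetime import date
from pathlib import Path


def write_readme(folder, title, people, instrument_info, relations, created=None):
    """Write a minimal README.txt with the fields from the list above (hypothetical helper)."""
    created = created or date.today()
    lines = [
        f"Title: {title}",
        f"Date of creation/receipt: {created.isoformat()}",
        f"Instrument or software: {instrument_info}",
        f"People involved: {', '.join(people)}",
        f"Relations between files/folders: {relations}",
    ]
    path = Path(folder) / "README.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


# Usage example in a temporary folder; all values are made up for illustration.
with tempfile.TemporaryDirectory() as folder:
    readme = write_readme(
        folder,
        title="Mouse EEG recordings",
        people=["A. Researcher", "B. Technician"],
        instrument_info="example amplifier, example software 1.0",
        relations="raw/ holds source files, derived/ holds intermediate data",
        created=date(2018, 11, 3),
    )
    print(readme.read_text(encoding="utf-8"))
```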



# Data housekeeping
<div style="position:absolute">

## When working 
  * Clarify and separate source and intermediate data
  * Keep data copies to a **minimum**
  * Clean up after the analysis
  * Clean up copies created for presentations or for sharing
  * Hand over data to a new responsible person when leaving
</div> 
<div style="position:relative;left:50%; width:40%">
<img src="slides/img/cleaning-table.jpg" height="450px">
</div>



# Data housekeeping
## End of project
  * data should be kept as a single copy on server-side storage 
    * no copies on desktops and external devices
  * non-proprietary formats
  * minimal metadata:
    * source
    * context of generation
    * data structure
    * content
  * sensitive data (e.g. whole genome) **must** be encrypted
  <br/>
  <br/>
  * If not specified otherwise, data must be kept for **10 years** following project end for reproducibility purposes
<aside class="notes">
Note: sometimes it is hard to find or understand a dataset that is only 10 days old
</aside>
 
## In doubt on data archival?
Contact R<sup>3</sup> by ticket for support with archiving datasets:
  * https://service.uni.lu/sp
  * Home > Catalog > LCSB > Biocore: Application services > Request for: Support

<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href="https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-03" style="color:grey; font-size:0.8em;">Research Data Retention and Archival Policy</a>
</div>



# Data housekeeping - Summary
## Server is your friend!
  * Allows a consistent backup policy for your datasets
  * Keeps the number of copies to a minimum
  * Allows clear access rights to be specified
  * High accessibility
  * Data are discoverable
  * Server can't be stolen
  
## General guidelines
  * Use institutional media for storage of **all** data
  * Research data (particularly sensitive data) should be in a single source location
  * Enable encryption for data stored on removable media
  * Clarify and separate source and intermediate data
  * Disable write access to relevant source data (read-only)
  * Back up research data!
  * Install antivirus software
  * Generate checksums
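
Two of the guidelines above, generating checksums and making source data read-only, can be illustrated with the Python standard library. This is a sketch under assumptions: the helper names `sha256_of` and `make_read_only` are hypothetical, SHA-256 is one reasonable choice of algorithm rather than a stated requirement, and on Windows `os.chmod` only toggles the read-only flag.

```python
import hashlib
import os
import stat
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_read_only(path):
    """Remove write permission for user, group and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


# Usage example on a throwaway file; the file name is made up for illustration.
with tempfile.TemporaryDirectory() as folder:
    source = Path(folder) / "source_2018-11-03.csv"
    source.write_bytes(b"sample,value\nA,1\n")
    print(sha256_of(source), source.name)
    make_read_only(source)
```

Storing the printed digest next to the data lets anyone recompute it later and confirm the file has not changed since it was archived or transferred.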