Commit 61b65b0d authored by Laurent Heirendt's avatar Laurent Heirendt :airplane:

Merge branch '2021-07-27_IT101-DM' into 'develop'

2021 07 27 it101 dm

See merge request !108
parents e9435d7e 10f66eac
Showing
with 824 additions and 116 deletions
...@@ -73,13 +73,6 @@ From Jenny Bryan by CC-BY ...@@ -73,13 +73,6 @@ From Jenny Bryan by CC-BY
* Separate files you are actively working on from the old ones * Separate files you are actively working on from the old ones
* Orient newcomers to the group's conventions * Orient newcomers to the group's conventions
<div style="position:absolute;left:43%;top:10%">
<img src="slides/img/folder_structure.png" height="700px">
</div>
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href="https://riojournal.com/article/56508/" style="color:grey; font-size:0.8em;">Foundational Practices of Research Data Management</a>
</div>
# Data housekeeping # Data housekeeping
......
...@@ -4,34 +4,50 @@ ...@@ -4,34 +4,50 @@
<img src="slides/img/LCSB_storages_full.png" height="750px"> <img src="slides/img/LCSB_storages_full.png" height="750px">
</div> </div>
<div class='fragment' style="position:relative"> <div class='fragment' style="position:absolute">
<img src="slides/img/LCSB_storages_personal-crossed.png" height="750px"> <img src="slides/img/LCSB_storages_personal-crossed.png" height="750px">
</div>
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href=" https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-02" style="color:grey; font-size:0.8em;">Data Storage and Backup Policy</a>
</div>
# Data ingestion/transfer
## Receiving and sending data
<div style="position:absolute;left:65%;top:60%"> <img height="400px" style="position:relative;left:10%" src="slides/img/banned_exchange_channels.png"><br>
<div style="position:absolute; left:10%;width:30%">
* Unless consortium/project has formally agreed to use a secure commercial cloud ## E-mail is not for data transfer
* Avoid transfer of any data by e-mail
* E-mail is a poor repository
* Data can be read on passage
</div> </div>
<div class="fragment" style="left:50%; width:30%; position:absolute">
## Exchanging data
* Share on Atlas server
* OwnCloud share (LCSB - BioCore)
* DropIt service (SIU)
* LFT (IBM Aspera) share for sensitive data
</div>
</div> </div>
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right"> <div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href=" https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-02" style="color:grey; font-size:0.8em;">Data Storage and Backup Policy</a> <a href=" https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-05" style="color:grey; font-size:0.8em;">Research Human Data Sharing Policy</a>
</div> </div>
# Data ingestion/transfer
# Data ingestion: Transfer and Integrity
* When sending data: <font color="red">Do not use emails, use secure platforms (Cloud, Aspera, Atlas share...)!</font>
<div class="fragment">
Data can be corrupted: Data can be corrupted:
* (non-)malicious modification * (non-)malicious modification
* faulty file transfer * faulty file transfer
* disk corruption * disk corruption
</div>
<div class="fragment"> <div class="fragment">
...@@ -39,8 +55,8 @@ Data can be corrupted: ...@@ -39,8 +55,8 @@ Data can be corrupted:
* disable write access to the source data * disable write access to the source data
* generate checksums! * generate checksums!
<div style="position:absolute;left:40%;top:30%"> <div style="position:absolute;left:40%">
<img src="slides/img/checksum.png" width="500px"> <img src="slides/img/checksum.png" width="500px">
</div> </div>
</div> </div>
...@@ -61,33 +77,3 @@ Data can be corrupted: ...@@ -61,33 +77,3 @@ Data can be corrupted:
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right"> <div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href=" https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-02" style="color:grey; font-size:0.8em;">Data Storage and Backup Policy</a> <a href=" https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-02" style="color:grey; font-size:0.8em;">Data Storage and Backup Policy</a>
</div> </div>
# Data ingestion/Integrity
## Encryption
<div class='fragment' style="position:relative;left:25%;top:60%">
<img align="middle" height="300px" src="slides/img/encryption.png">
</div>
<div class='fragment'>
* Sensitive data protected by encryption
</div>
<div class='fragment'>
* Guaranteed confidentiality
</div>
<div class='fragment'>
* Encryption key, which needs to be kept safe
</div>
<div class='fragment'>
* <font color= red>Losing your encryption key means losing your data!</font>
</div>
<div class='fragment'>
* Make an off-site backup of your data
</div>
...@@ -24,5 +24,3 @@ Prof. Dr. Rudi Balling, director ...@@ -24,5 +24,3 @@ Prof. Dr. Rudi Balling, director
* Technicians * Technicians
* Administrators * Administrators
</div> </div>
[ [
{ "filename": "index.md" }, {"filename": "index.md"},
{ "filename": "introduction.md" }, {"filename": "introduction.md"},
{ "filename": "access_management.md" }, {"filename": "data-introduction.md"},
{ "filename": "data-introduction.md" }, {"filename": "data_flow.md"},
{ "filename": "data_flow.md" }, {"filename": "ingestion.md"},
{ "filename": "ingestion.md" }, {"filename": "storage_setup.md"},
{ "filename": "storage_setup.md" }, {"filename": "data-housekeeping.md"},
{ "filename": "data-housekeeping.md" }, {"filename": "howtos.md"},
{ "filename": "howtos.md" }, {"filename": "reproducibility.md"},
{ "filename": "reproducibility.md" }, {"filename": "code_versioning.md"},
{ "filename": "code_versioning.md" }, {"filename": "visualization.md"},
{ "filename": "visualization.md" }, {"filename": "problem_solving.md"},
{ "filename": "data_life_cycle.md" }, {"filename": "fair-principles.md"},
{ "filename": "problem_solving.md" }, {"filename": "r3_group.md"},
{ "filename": "fair-principles.md" }, {"filename": "thanks.md"}
{ "filename": "r3_group.md" }, ]
{ "filename": "thanks.md" }
]
\ No newline at end of file
...@@ -94,4 +94,3 @@ ...@@ -94,4 +94,3 @@
<img src="slides/img/red-cross.png" width="700px"><br> <img src="slides/img/red-cross.png" width="700px"><br>
</div> </div>
</div> </div>
...@@ -4,8 +4,12 @@ ...@@ -4,8 +4,12 @@
* Regularly update your SW/OS * Regularly update your SW/OS
* Encrypt movable media * Encrypt movable media
### Passwords
<div class="fragment" > * Strong passwords
* Password manager
* Safe password exchange channels
* Expiration time on password share
### Backup ### Backup
* take care of your own backups! * take care of your own backups!
...@@ -20,35 +24,6 @@ ...@@ -20,35 +24,6 @@
<a href=" https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-02" style="color:grey; font-size:0.8em;">Data Storage and Backup Policy</a> <a href=" https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-02" style="color:grey; font-size:0.8em;">Data Storage and Backup Policy</a>
</div> </div>
</div>
<div class="fragment">
### Passwords
* Strong passwords
* Password manager
* Safe password exchange channels
* Expiration time on password share
</div>
# Storage set-up
## Password exchange channels
<div style="position:relative">
<img src="slides/img/privateBin.png" height="350px">
</div>
<div style="position:absolute;left:65%;top:85%">
* Free service provided by LCSB at <a href="https://privatebin.lcsb.uni.lu" style="color:blue; font-size:0.8em;">privatebin.lcsb.uni.lu</a>
* **LUMS** account is required
* Set expiry period
* Can expire upon first access
* Password only accessible by sender and recipient
</div>
# Storage set-up # Storage set-up
......
../../2021-04-20_IT101-DM/slides/code_versioning.md
\ No newline at end of file
# Code versioning
<div style="position:absolute; width:40%">
**git**
* Current standard for code versioning
* Maintain versions of your code as it develops
* Local system, which does not require an online repository
* Repositories allow distributed development
<img align="middle" height="300px" src="slides/img/Git-logo.png">
</div>
<div class="fragment" style="position:absolute; left:50%; width:40%">
**git@lcsb**
* Recommended, supported repository
* Allows tracking of issues
* Ready for continuous integration: code is checked on commits to the repository.
* [https://git-r3lab.uni.lu](https://git-r3lab.uni.lu)
**Use at LCSB**
* All analysis code should be in a repository
* Minimally at submission of a manuscript
* Better daily
* Even better "analyses chunkwise"
</div>
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href="https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-07" style="color:grey; font-size:0.8em;">LCSB-POL-BIC-07 Source Code Management Policy</a>
</div>
../../2021-04-20_IT101-DM/slides/data-housekeeping.md
\ No newline at end of file
# Data housekeeping
## File names
<div style="display:flex; position:static; width:100%">
<div class="fragment" data-fragment-index="0" style="position:static; width:30%">
### General principles
* Machine readable
* Human readable
* Plays well with default ordering
</div>
<div class="fragment" data-fragment-index="1" style="position:absolute; left:33%; width:30%">
### Separators
* No spaces
* Underscore to separate
* Hyphen to combine
</div>
<div class="fragment" data-fragment-index="2" style="position:absolute; left:66%; width:30%">
### Date format follows **ISO 8601**<br>
2018-12-03<br>
2018-12-06_1700
</div>
</div>
<div class="fragment" data-fragment-index="3" style="width:100%; position:static">
<div style="position:absolute;width:55%">
<b>Bad</b> names
```bash
PhD-project-Jan19 alldata_final.foo
Finacial detailes BIocore 19/11/12.xls
ATACseq1Londonmapped.bam
Hlad.jez.M-L-průtoky JíObj.z Ohře-od 10-2011.xlsx
```
</div>
<div style="position:relative;width:55%; bottom:20%; left:50%">
<b>Good</b> names
```bash
Iris-setosa_samples_1927-05-12.csv
PI102_Mouse12_EEG_2018-11-03_1245.tsv
Bioinfiniti_FullProposal_2018-11-15_1655.do
```
</div>
</div>
<br>
<br>
<div class="fragment" data-fragment-index="3" style="width:100%;">
From Jenny Bryan by CC-BY
(https://speakerdeck.com/jennybc/how-to-name-files)
</div>
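
The naming rules above (no spaces, underscore to separate, hyphen to combine, ISO 8601 dates) can be sketched in a few lines of Python. This is only an illustration: the helper names `build_file_name` and `is_well_formed` are hypothetical and not part of the original slides, and the character check assumes plain ASCII names.

```python
import re
from datetime import datetime


def build_file_name(parts, timestamp, extension, with_time=False):
    """Join name parts with underscores and append an ISO 8601 date.

    Hypothetical helper for illustration only.
    """
    cleaned = []
    for part in parts:
        # Hyphens combine words inside one part; spaces are not allowed.
        cleaned.append(re.sub(r"\s+", "-", part.strip()))
    # Date only (2018-12-03) or date plus time (2018-12-06_1700).
    date_format = "%Y-%m-%d_%H%M" if with_time else "%Y-%m-%d"
    cleaned.append(timestamp.strftime(date_format))
    return "_".join(cleaned) + "." + extension


def is_well_formed(name):
    """Check the separator rules: only letters, digits, '-', '_' and '.'."""
    return re.fullmatch(r"[A-Za-z0-9._-]+", name) is not None


name = build_file_name(
    ["PI102", "Mouse12", "EEG"], datetime(2018, 11, 3, 12, 45), "tsv", with_time=True
)
print(name)                  # PI102_Mouse12_EEG_2018-11-03_1245.tsv
print(is_well_formed(name))  # True
print(is_well_formed("Finacial detailes BIocore 19/11/12.xls"))  # False
```

The check only covers the separator rules; whether a name is human readable still needs a human.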
# Data housekeeping
## File organization
* Have folder organization conventions for your **group**
* Per Paper
* Per Study/Project
* Per Collaborator
* Keep <b>readme files</b> for data
* Title
* Date of Creation/Receipt
* Instrument or software specific information
* People involved
* Relations between multiple files/folders
* Separate files you are actively working on from the old ones
* Orient newcomers to the group's conventions
<div style="position:absolute;left:43%;top:10%">
<img src="slides/img/folder_structure.png" height="700px">
</div>
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href="https://riojournal.com/article/56508/" style="color:grey; font-size:0.8em;">Foundational Practices of Research Data Management</a>
</div>
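
As a rough sketch of the readme fields listed above, a small Python function can render a plain-text readme for a data folder. The function name `render_readme` and all example values are assumptions made for illustration; they do not come from the slides or from any LCSB template.

```python
from datetime import date


def render_readme(title, received, instrument, people, relations):
    """Render a minimal plain-text readme for a data folder.

    Hypothetical helper; the field names follow the list on the slide.
    """
    lines = [
        f"Title: {title}",
        f"Date of creation/receipt: {received.isoformat()}",
        f"Instrument or software: {instrument}",
        "People involved: " + ", ".join(people),
        "Relations between files/folders:",
    ]
    # One indented bullet per relation between files or folders.
    lines.extend(f"  - {relation}" for relation in relations)
    return "\n".join(lines) + "\n"


# All values below are made up for the example.
text = render_readme(
    title="Mouse EEG recordings",
    received=date(2018, 11, 3),
    instrument="Example EEG recorder, firmware 1.2",
    people=["A. Researcher", "B. Technician"],
    relations=["raw/ holds source data", "derived/ is computed from raw/"],
)
print(text)
```

Keeping the readme as plain text next to the data means it stays readable without any special software.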
# Data housekeeping
<div style="position:absolute">
## When working
* Clarify and separate source and intermediate data
* Keep data copies to a **minimum**
* Cleanup post-analysis
* Cleanup copies created for presentations or for sharing
</div>
<div style="position:relative;left:50%; width:40%">
<img src="slides/img/cleaning-table.jpg" height="450px">
</div>
# Data housekeeping
## End of project
* hand over data to a new responsible person when leaving
* data should be kept as a single copy on server-side storage
* no copies on desktops and external devices
* non-proprietary formats
* minimal metadata
* sensitive data (e.g. whole genome) **must** be encrypted
<br/>
<br/>
* If not specified otherwise, data must be kept for **10 years** following project end for reproducibility purposes
<aside class="notes">
Note: sometimes it is hard to find or understand a dataset that is only 10 days old
</aside>
## In doubt on data archival?
Contact R<sup>3</sup> for support on archival of datasets using tickets:
* https://service.uni.lu/sp
* Home > Catalog > LCSB > Biocore: Application services > Request for: Support
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href=" https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-03" style="color:grey; font-size:0.8em;">Research Data Retention and Archival Policy</a>
</div>
# Data housekeeping - Summary
## Server is your friend!
* Allows a consistent backup policy for your datasets
* Keeps number of copies to minimum
* Specification of clear access rights
* High accessibility
* Data are discoverable
* Server can't be stolen
## General guidelines
* Use institutional media for storage of **all** data
* Research data (particularly sensitive data) should be in a single source location
* Enable encryption for data stored on movable media
* Clarify and separate source and intermediate data
* Disable write access to relevant source data (read-only)
* Backup research data!
* Download Anti-virus software
* Generate checksums
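
A minimal sketch of the "generate checksums" guideline, using Python's standard `hashlib` module. The choice of SHA-256, the function names `sha256_of_file` and `verify_file`, and the demo file name are assumptions for illustration; the slides do not prescribe a particular algorithm or tool.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use low for large data files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_checksum):
    """Compare a file's current checksum with a previously recorded one."""
    return sha256_of_file(path) == expected_checksum.lower()


# Demo on a temporary file (hypothetical file name and content).
with tempfile.TemporaryDirectory() as folder:
    source = Path(folder) / "measurements.csv"
    source.write_bytes(b"sample,value\nA,1\n")
    checksum = sha256_of_file(source)      # record this next to the data
    print(verify_file(source, checksum))   # True: file is unchanged
    source.write_bytes(b"sample,value\nA,2\n")
    print(verify_file(source, checksum))   # False: content was modified
```

Recording the checksum when data is received and re-checking it after each transfer catches faulty transfers and silent modification.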
../../2021-04-20_IT101-DM/slides/data-introduction.md
\ No newline at end of file
# Data and metadata
<div style="display:grid;grid-gap:100px;grid-template-columns: 40% 40%">
<div >
## Data
* "*information in digital form that can be transmitted or processed*"
<p align="right">-- Merriam-Webster dictionary</p>
* "*information in an electronic form that can be stored and processed by a computer*"
<p align="right">--Cambridge dictionary</p>
</div>
<div>
## Metadata
* data describing other data
* information that is given to describe or help you use other information
* metadata are data
* can be processed and analyzed
</div>
</div>
<div class="fragment">
## Metadata examples:
<div style="position:absolute">
<ul>
<li> LabBook </li>
<li> author/owner of the data</li>
<li> origin of the data</li>
<li> data type</li>
</ul>
</div>
<div style="position:absolute;left:25%">
<ul>
<li> description of content </li>
<li> modification date </li>
<li> description of modification </li>
<li> location </li>
</ul>
</div>
<div style="position:relative;left:50%;top:0.7em">
<ul>
<li> calibration readings</li>
<li> software/firmware version</li>
<li> data purpose</li>
<li> means of creation</li>
</ul>
</div>
</div>
<div class="fragment">
<br>
<center style="color:red">!Insufficient metadata make the data useless!</center>
</div>
<aside class="notes">
Sometimes metadata collection takes more time than data collection
</aside>
# LCSB research data
three categories:
* **Primary data**
* scientific data
* measurements, images, observations, notes, surveys, ...
* models, software codes, libraries, ...
* metadata directly describing the data
* data dictionaries
* format, version, coverage descriptions, ...
* **Research record**
* description of the research process, including experiment
* experiment set-up
* followed protocols
* ...
* **Project accompanying documentation**
* ethical approvals, information on the consent
* collaboration agreements
* intellectual property ownership
* other relevant documentation
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right;">
<a href="https://howto.lcsb.uni.lu/internal/policies/LCSB-POL-BIC-03/" style="color:grey; font-size:0.8em;">LCSB-POL-BIC-03 Research Data Retention and Archival Policy</a>
</div>
\ No newline at end of file
../../2021-04-20_IT101-DM/slides/data_flow.md
\ No newline at end of file
# Typical flow of data
<div style="display:grid;grid-gap:10px;grid-template-columns: 30% 20% 30%;
grid-auto-flow:column;grid-template-rows: repeat(4,auto);position:relative;left:8%">
<div class="content-box fragment" data-fragment-index="1">
<div class="box-title red">Source data</div>
<div class="content">
* Experimental results
* Large data sets
* Manually collected data
* External
</div>
</div>
<div class="content-box fragment" data-fragment-index="2">
<div class="box-title yellow">Intermediate</div>
<div class="content">
* Derived data
* Tidy data
* Curated sets
</div>
</div>
<div class="content-box fragment" data-fragment-index="3">
<div class="box-title blue">Analyses</div>
<div class="content">
* Exploratory
* Model building
* Hypothesis testing
</div>
</div>
<div class="content-box fragment" data-fragment-index="4">
<div class="box-title green">Dissemination</div>
<div class="content">
* Manuscript, report, presentation, ...
</div>
</div>
<center>
<img src="slides/img/data-flow_sources.png" height=60%>
</center>
<center>
<img src="slides/img/data-flow_transformation.png" height=60%>
</center>
<center>
<img src="slides/img/data-flow_chart.png" height=60%>
</center>
<center>
<img src="slides/img/data-flow_paper.png" height=60%>
</center>
<div class="content-box fragment" data-fragment-index="5">
<div class="box-title red">Preserve</div>
<div class="content">
* Version data sets
* Backup
* Protect
</div>
</div>
<div class="content-box fragment" data-fragment-index="6">
<div class="box-title yellow">Reproduce</div>
<div class="content">
* Automate your builds
* Use workflow tools (e.g. Snakemake)
</div>
</div>
<div class="content-box fragment" data-fragment-index="7">
<div class="box-title blue">Trace</div>
<div class="content">
* Multiple iterations.
* Code versioning (Git)
</div>
</div>
<div class="content-box fragment" data-fragment-index="8">
<div class="box-title green">Track</div>
<div class="content">
* Through multiple versions
</div>
</div>
</div>
<aside class="notes">
flow of the data is downstream (mostly), but you are going back and forth
applies to all data (financial report, lab safety assessment)
</aside>
# Some practical recommendations # Some practical recommendations
* Plan your data walking along the data life cycle * Do your data processing according to the data life cycle steps
<div class='fragment' style="position:relative;left:25%;top:60%"> <div class='fragment' style="position:relative;left:25%;top:60%">
<img align="middle" height="300px" src="slides/img/rdm-cycle.png"> <img align="middle" height="300px" src="slides/img/rdm-cycle.png">
</div> </div>
<div class="fragment"> <div class="fragment">
* Use data management tools: * Use data management planning tools:
* DMPonline <a href="https://dmponline.elixir-luxembourg.org/" style="color:blue; font-size:0.8em;">https://dmponline.elixir-luxembourg.org/</a> * DMPonline <a href="https://dmponline.elixir-luxembourg.org/" style="color:blue; font-size:0.8em;">https://dmponline.elixir-luxembourg.org/</a>
<img src="slides/img/dmponline_logo.png" height="50px"> <img src="slides/img/dmponline_logo.png" height="50px">
* DS Wizard <a href="https://learning.ds-wizard.org/" style="color:blue; font-size:0.8em;">https://learning.ds-wizard.org/</a> * DS Wizard <a href="https://learning.ds-wizard.org/" style="color:blue; font-size:0.8em;">https://learning.ds-wizard.org/</a>
......
../../2021-04-20_IT101-DM/slides/fair-principles.md
\ No newline at end of file
# FAIR (meta)data principles
* dates back to 2014
* well accepted by scientific community
* necessity in data driven science
* officially embraced by EU and G20
* required by funding agencies and journal publishers
<center>
<img src="slides/img/fair-principles.png" height="400px">
</center>
<br>
<br>
../../2021-04-20_IT101-DM/slides/howtos.md
\ No newline at end of file
# LCSB How-Tos
<br>
https://howto.lcsb.uni.lu/
<center>
<img src="slides/img/howtocard.png" width="50%">
</center>
../../2021-04-20_IT101-DM/slides/img
\ No newline at end of file