Commit 61b65b0d authored by Laurent Heirendt

Merge branch '2021-07-27_IT101-DM' into 'develop'

2021 07 27 it101 dm

See merge request R3/school/courses!108
parents e9435d7e 10f66eac
Showing
with 824 additions and 116 deletions
......@@ -73,13 +73,6 @@ From Jenny Bryan by CC-BY
* Separate files you are actively working on from the old ones
* Orient newcomers to the group's conventions
<div style="position:absolute;left:43%;top:10%">
<img src="slides/img/folder_structure.png" height="700px">
</div>
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href="https://riojournal.com/article/56508/" style="color:grey; font-size:0.8em;">Foundational Practices of Research Data Management</a>
</div>
# Data housekeeping
......
......@@ -4,34 +4,50 @@
<img src="slides/img/LCSB_storages_full.png" height="750px">
</div>
<div class='fragment' style="position:relative">
<div class='fragment' style="position:absolute">
<img src="slides/img/LCSB_storages_personal-crossed.png" height="750px">
</div>
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href=" https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-02" style="color:grey; font-size:0.8em;">Data Storage and Backup Policy</a>
</div>
# Data ingestion/transfer
## Receiving and sending data
<div style="position:absolute;left:65%;top:60%">
<img height="400px" style="position:relative;left:10%" src="slides/img/banned_exchange_channels.png"><br>
<div style="position:absolute; left:10%;width:30%">
* Unless consortium/project has formally agreed to use a secure commercial cloud
## E-mail is not for data transfer
* Avoid transfer of any data by e-mail
* E-mail is a poor repository
* Data can be read in transit
</div>
<div class="fragment" style="left:50%; width:30%; position:absolute">
## Exchanging data
* Share on Atlas server
* OwnCloud share (LCSB - BioCore)
* DropIt service (SIU)
* LFT (IBM Aspera) share for sensitive data
</div>
</div>
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href=" https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-02" style="color:grey; font-size:0.8em;">Data Storage and Backup Policy</a>
<a href=" https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-05" style="color:grey; font-size:0.8em;">Research Human Data Sharing Policy</a>
</div>
# Data ingestion: Transfer and Integrity
* When sending data: <font color="red">Do not use e-mail; use secure platforms (Cloud, Aspera, Atlas share, ...)!</font>
<div class="fragment">
# Data ingestion/transfer
Data can be corrupted:
* (non-)malicious modification
* faulty file transfer
* disk corruption
</div>
<div class="fragment">
......@@ -39,8 +55,8 @@ Data can be corrupted:
* disable write access to the source data
* generate checksums!
<div style="position:absolute;left:40%;top:30%">
<div style="position:absolute;left:40%">
<img src="slides/img/checksum.png" width="500px">
</div>
</div>
......@@ -61,33 +77,3 @@ Data can be corrupted:
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href=" https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-02" style="color:grey; font-size:0.8em;">Data Storage and Backup Policy</a>
</div>
# Data ingestion/Integrity
## Encryption
<div class='fragment' style="position:relative;left:25%;top:60%">
<img align="middle" height="300px" src="slides/img/encryption.png">
</div>
<div class='fragment'>
* Sensitive data protected by encryption
</div>
<div class='fragment'>
* Guaranteed confidentiality
</div>
<div class='fragment'>
* The encryption key needs to be kept safe
</div>
<div class='fragment'>
* <font color="red">Losing your encryption key means losing your data!</font>
</div>
<div class='fragment'>
* Make an off-site backup of your data
</div>
......@@ -24,5 +24,3 @@ Prof. Dr. Rudi Balling, director
* Technicians
* Administrators
</div>
[
{ "filename": "index.md" },
{ "filename": "introduction.md" },
{ "filename": "access_management.md" },
{ "filename": "data-introduction.md" },
{ "filename": "data_flow.md" },
{ "filename": "ingestion.md" },
{ "filename": "storage_setup.md" },
{ "filename": "data-housekeeping.md" },
{ "filename": "howtos.md" },
{ "filename": "reproducibility.md" },
{ "filename": "code_versioning.md" },
{ "filename": "visualization.md" },
{ "filename": "data_life_cycle.md" },
{ "filename": "problem_solving.md" },
{ "filename": "fair-principles.md" },
{ "filename": "r3_group.md" },
{ "filename": "thanks.md" }
]
\ No newline at end of file
{"filename": "index.md"},
{"filename": "introduction.md"},
{"filename": "data-introduction.md"},
{"filename": "data_flow.md"},
{"filename": "ingestion.md"},
{"filename": "storage_setup.md"},
{"filename": "data-housekeeping.md"},
{"filename": "howtos.md"},
{"filename": "reproducibility.md"},
{"filename": "code_versioning.md"},
{"filename": "visualization.md"},
{"filename": "problem_solving.md"},
{"filename": "fair-principles.md"},
{"filename": "r3_group.md"},
{"filename": "thanks.md"}
]
......@@ -94,4 +94,3 @@
<img src="slides/img/red-cross.png" width="700px"><br>
</div>
</div>
......@@ -4,8 +4,12 @@
* Regularly update your SW/OS
* Encrypt movable media
### Passwords
<div class="fragment" >
* Strong passwords
* Password manager
* Safe password exchange channels
* Expiration time on password share
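
The "strong passwords" point can be illustrated with a short sketch. It is not part of the original slides: the helper name `generate_password`, the alphabet, and the 12-character minimum are assumptions chosen for illustration, and a password manager would normally do this for you.

```python
import secrets
import string

# Assumed alphabet: letters, digits and punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=20):
    """Return a random password drawn from a cryptographically secure source."""
    if length < 12:  # assumed minimum, not an LCSB rule
        raise ValueError("use at least 12 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_password())
```

The `secrets` module is used instead of `random` because it is intended for security-sensitive values.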
### Backup
* take care of your own backups!
......@@ -20,35 +24,6 @@
<a href=" https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-02" style="color:grey; font-size:0.8em;">Data Storage and Backup Policy</a>
</div>
</div>
<div class="fragment">
### Passwords
* Strong passwords
* Password manager
* Safe password exchange channels
* Expiration time on password share
</div>
# Storage set-up
## Password exchange channels
<div style="position:relative">
<img src="slides/img/privateBin.png" height="350px">
</div>
<div style="position:absolute;left:65%;top:85%">
* Free service provided by LCSB at <a href="https://privatebin.lcsb.uni.lu" style="color:blue; font-size:0.8em;">privatebin.lcsb.uni.lu</a>
* **LUMS** account is required
* Set expiry period
* Can expire upon first access
* Password only accessible by sender and recipient
</div>
# Storage set-up
......
../../2021-04-20_IT101-DM/slides/code_versioning.md
\ No newline at end of file
# Code versioning
<div style="position:absolute; width:40%">
**git**
* Current standard for code versioning
* Maintain versions of your code as it develops
* Local system, which does not require an online repository
* Repositories allow distributed development
<img align="middle" height="300px" src="slides/img/Git-logo.png">
</div>
<div class="fragment" style="position:absolute; left:50%; width:40%">
**git@lcsb**
* Recommended, supported repository
* Allows tracking of issues
* Ready for continuous integration - code checked on commits to the repository.
* [https://git-r3lab.uni.lu](https://git-r3lab.uni.lu)
**Use at LCSB**
* All analysis code should be in a repository
* Minimally at submission of a manuscript
* Better daily
* Even better "analyses chunkwise"
</div>
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href="https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-07" style="color:grey; font-size:0.8em;">LCSB-POL-BIC-07 Source Code Management Policy</a>
</div>
../../2021-04-20_IT101-DM/slides/data-housekeeping.md
\ No newline at end of file
# Data housekeeping
## File names
<div style="display:flex; position:static; width:100%">
<div class="fragment" data-fragment-index="0" style="position:static; width:30%">
### General principles
* Machine readable
* Human readable
* Plays well with default ordering
</div>
<div class="fragment" data-fragment-index="1" style="position:absolute; left:33%; width:30%">
### Separators
* No spaces
* Underscore to separate
* Hyphen to combine
</div>
<div class="fragment" data-fragment-index="2" style="position:absolute; left:66%; width:30%">
### Date format follows **ISO 8601**<br>
2018-12-03<br>
2018-12-06_1700
</div>
</div>
<div class="fragment" data-fragment-index="3" style="width:100%; position:static">
<div style="position:absolute;width:55%">
<b>Bad</b> names
```bash
PhD-project-Jan19 alldata_final.foo
Finacial detailes BIocore 19/11/12.xls
ATACseq1Londonmapped.bam
Hlad.jez.M-L-průtoky JíObj.z Ohře-od 10-2011.xlsx
```
</div>
<div style="position:relative;width:55%; bottom:20%; left:50%">
<b>Good</b> names
```bash
Iris-setosa_samples_1927-05-12.csv
PI102_Mouse12_EEG_2018-11-03_1245.tsv
Bioinfiniti_FullProposal_2018-11-15_1655.do
```
</div>
</div>
<br>
<br>
<div class="fragment" data-fragment-index="3" style="width:100%;">
From Jenny Bryan by CC-BY
(https://speakerdeck.com/jennybc/how-to-name-files)
</div>
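
The naming rules above (no spaces, underscores between fields, hyphens within a field, ISO 8601 dates) can be sketched in code. This is a minimal illustration, not part of the original slides; the helper name `build_file_name` and the validation pattern are assumptions.

```python
import re
from datetime import datetime

# Assumed rule: a field holds only letters, digits and hyphens (no spaces, no underscores).
FIELD = re.compile(r"^[A-Za-z0-9-]+$")


def build_file_name(fields, timestamp, extension):
    """Join fields with underscores and append an ISO 8601 date plus HHMM time."""
    for field in fields:
        if not FIELD.match(field):
            raise ValueError(f"invalid field: {field!r}")
    stamp = timestamp.strftime("%Y-%m-%d_%H%M")
    return "_".join([*fields, stamp]) + "." + extension


name = build_file_name(["PI102", "Mouse12", "EEG"], datetime(2018, 11, 3, 12, 45), "tsv")
print(name)  # PI102_Mouse12_EEG_2018-11-03_1245.tsv
```

Because the date comes last in a fixed format, files that share the same leading fields sort chronologically under default ordering.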
# Data housekeeping
## File organization
* Have folder organization conventions for your **group**
* Per Paper
* Per Study/Project
* Per Collaborator
* Keep <b>readme files</b> for data
* Title
* Date of Creation/Receipt
* Instrument or software specific information
* People involved
* Relations between multiple files/folders
* Separate files you are actively working on from the old ones
* Orient newcomers to the group's conventions
<div style="position:absolute;left:43%;top:10%">
<img src="slides/img/folder_structure.png" height="700px">
</div>
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href="https://riojournal.com/article/56508/" style="color:grey; font-size:0.8em;">Foundational Practices of Research Data Management</a>
</div>
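
The readme fields listed above could be turned into a small template generator. The following is a sketch under assumptions: the file name `README.txt`, the `field: value` layout, and the helper names are illustrative and not a group convention stated in the slides.

```python
from datetime import date
from pathlib import Path

# Field names follow the readme bullet points on the slide; the layout is an assumption.
README_FIELDS = [
    "Title",
    "Date of creation/receipt",
    "Instrument or software specific information",
    "People involved",
    "Relations between multiple files/folders",
]


def render_readme(values):
    """Return readme text; fields without a value are marked TODO."""
    lines = [f"{field}: {values.get(field, 'TODO')}" for field in README_FIELDS]
    return "\n".join(lines) + "\n"


def write_readme(folder, values):
    """Write README.txt into a data folder and return its path."""
    path = Path(folder) / "README.txt"
    path.write_text(render_readme(values), encoding="utf-8")
    return path


print(render_readme({
    "Title": "EEG recordings",  # placeholder values
    "Date of creation/receipt": date(2018, 11, 3).isoformat(),
}))
```

Marking missing fields as TODO keeps the gaps visible instead of silently omitting them.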
# Data housekeeping
<div style="position:absolute">
## When working
* Clarify and separate source and intermediate data
* Keep data copies to a **minimum**
* Cleanup post-analysis
* Cleanup copies created for presentations or for sharing
</div>
<div style="position:relative;left:50%; width:40%">
<img src="slides/img/cleaning-table.jpg" height="450px">
</div>
# Data housekeeping
## End of project
* hand over data to a new responsible person when leaving
* data should be kept as a single copy on server-side storage
* no copies on desktops and external devices
* non-proprietary formats
* minimal metadata
* sensitive data (e.g. whole genome) **must** be encrypted
<br/>
<br/>
* If not specified otherwise, data must be kept for **10 years** following project end for reproducibility purposes
<aside class="notes">
Note: sometimes it is hard to find or understand a dataset that is only 10 days old
</aside>
## In doubt about data archival?
Contact R<sup>3</sup> for support on archival of datasets using tickets:
* https://service.uni.lu/sp
* Home > Catalog > LCSB > Biocore: Application services > Request for: Support
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href=" https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-03" style="color:grey; font-size:0.8em;">Research Data Retention and Archival Policy</a>
</div>
# Data housekeeping - Summary
## Server is your friend!
* Allows a consistent backup policy for your datasets
* Keeps number of copies to minimum
* Specification of clear access rights
* High accessibility
* Data are discoverable
* Server can't be stolen
## General guidelines
* Use institutional media for storage of **all** data
* Research data (particularly sensitive data) should be in a single source location
* Enable encryption for data stored on movable media
* Clarify and separate source and intermediate data
* Disable write access to relevant source data (read-only)
* Backup research data!
* Install anti-virus software
* Generate checksums
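
Two of the guidelines above, disabling write access to source data and generating checksums, can be sketched with the Python standard library. This is a minimal illustration, not the procedure prescribed by the slides; the helper names `sha256_of` and `make_read_only`, the choice of SHA-256, and the sample file are assumptions.

```python
import hashlib
import os
import stat
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_read_only(path):
    """Clear the write permission bits for owner, group and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


# Demonstration on a temporary file standing in for a source data file.
with tempfile.TemporaryDirectory() as folder:
    source = os.path.join(folder, "source_data.csv")
    with open(source, "wb") as handle:
        handle.write(b"sample,value\nA,1\n")
    recorded = sha256_of(source)  # store this value next to the data
    make_read_only(source)
    assert sha256_of(source) == recorded  # later: recompute and compare
    # Restore write permission so the temporary folder can be removed on all platforms.
    os.chmod(source, stat.S_IRUSR | stat.S_IWUSR)
```

A mismatch between the recorded and recomputed digest indicates that the file changed, for example through a faulty transfer or disk corruption.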
../../2021-04-20_IT101-DM/slides/data-introduction.md
\ No newline at end of file
# Data and metadata
<div style="display:grid;grid-gap:100px;grid-template-columns: 40% 40%">
<div >
## Data
* "*information in digital form that can be transmitted or processed*"
<p align="right">-- Merriam-Webster dictionary</p>
* "*information in an electronic form that can be stored and processed by a computer*"
<p align="right">--Cambridge dictionary</p>
</div>
<div>
## Metadata
* data describing other data
* information that is given to describe or help you use other information
* metadata are data
* can be processed and analyzed
</div>
</div>
<div class="fragment">
## Metadata examples:
<div style="position:absolute">
<ul>
<li> LabBook </li>
<li> author/owner of the data</li>
<li> origin of the data </li>
<li> data type </li>
</ul>
</div>
<div style="position:absolute;left:25%">
<ul>
<li> description of content </li>
<li> modification date </li>
<li> description of modification </li>
<li> location </li>
</ul>
</div>
<div style="position:relative;left:50%;top:0.7em">
<ul>
<li> calibration readings</li>
<li> software/firmware version</li>
<li> data purpose</li>
<li> means of creation</li>
</ul>
</div>
</div>
<div class="fragment">
<br>
</center>
<center style="color:red">!Insufficient metadata make the data useless!</center>
</div>
<aside class="notes">
Sometimes metadata collection takes more time than data collection
</aside>
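
Since metadata are themselves data that can be processed, a few of the examples above could be stored as a machine-readable sidecar record. The following is a sketch only: the class name `DatasetMetadata`, the selected fields, and all values are illustrative assumptions, not an LCSB schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DatasetMetadata:
    """A few of the metadata examples from the slide; field names are illustrative."""
    author: str
    origin: str
    data_type: str
    description: str
    modification_date: str  # ISO 8601, e.g. 2018-12-03
    software_version: str


record = DatasetMetadata(
    author="Jane Doe",  # placeholder values throughout
    origin="Instrument export",
    data_type="EEG recording",
    description="Raw recordings for one session",
    modification_date="2018-12-03",
    software_version="1.0.0",
)

# Serialise to JSON so the record can be stored next to the data and parsed later.
sidecar = json.dumps(asdict(record), indent=2, sort_keys=True)
print(sidecar)
```

Keeping the record in a plain, non-proprietary format means it can be read without the software that produced the data.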
# LCSB research data
three categories:
* **Primary data**
* scientific data
* measurements, images, observations, notes, surveys, ...
* models, software codes, libraries, ...
* metadata directly describing the data
* data dictionaries
* format, version, coverage descriptions, ...
* **Research record**
* description of the research process, including experiment
* experiment set-up
* followed protocols
* ...
* **Project accompanying documentation**
* ethical approvals, information on the consent
* collaboration agreements
* intellectual property ownership
* other relevant documentation
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right;">
<a href="https://howto.lcsb.uni.lu/internal/policies/LCSB-POL-BIC-03/" style="color:grey; font-size:0.8em;">LCSB-POL-BIC-03 Research Data Retention and Archival Policy</a>
</div>
\ No newline at end of file
../../2021-04-20_IT101-DM/slides/data_flow.md
\ No newline at end of file
# Typical flow of data
<div style="display:grid;grid-gap:10px;grid-template-columns: 30% 20% 30%;
grid-auto-flow:column;grid-template-rows: repeat(4,auto);position:relative;left:8%">
<div class="content-box fragment" data-fragment-index="1">
<div class="box-title red">Source data</div>
<div class="content">
* Experimental results
* Large data sets
* Manually collected data
* External
</div>
</div>
<div class="content-box fragment" data-fragment-index="2">
<div class="box-title yellow">Intermediate</div>
<div class="content">
* Derived data
* Tidy data
* Curated sets
</div>
</div>
<div class="content-box fragment" data-fragment-index="3">
<div class="box-title blue">Analyses</div>
<div class="content">
* Exploratory
* Model building
* Hypothesis testing
</div>
</div>
<div class="content-box fragment" data-fragment-index="4">
<div class="box-title green">Dissemination</div>
<div class="content">
* Manuscript, report, presentation, ...
</div>
</div>
<center>
<img src="slides/img/data-flow_sources.png" height=60%>
</center>
<center>
<img src="slides/img/data-flow_transformation.png" height=60%>
</center>
<center>
<img src="slides/img/data-flow_chart.png" height=60%>
</center>
<center>
<img src="slides/img/data-flow_paper.png" height=60%>
</center>
<div class="content-box fragment" data-fragment-index="5">
<div class="box-title red">Preserve</div>
<div class="content">
* Version data sets
* Backup
* Protect
</div>
</div>
<div class="content-box fragment" data-fragment-index="6">
<div class="box-title yellow">Reproduce</div>
<div class="content">
* Automate your builds
* Use workflow tools (e.g. Snakemake)
</div>
</div>
<div class="content-box fragment" data-fragment-index="7">
<div class="box-title blue">Trace</div>
<div class="content">
* Multiple iterations.
* Code versioning (Git)
</div>
</div>
<div class="content-box fragment" data-fragment-index="8">
<div class="box-title green">Track</div>
<div class="content">
* Through multiple versions
</div>
</div>
</div>
<aside class="notes">
flow of the data is downstream (mostly), but you are going back and forth
applies to all data (financial report, lab safety assessment)
</aside>
# Some practical recommendations
* Plan your data management by walking through the data life cycle
* Do your data processing according to the data life cycle steps
<div class='fragment' style="position:relative;left:25%;top:60%">
<img align="middle" height="300px" src="slides/img/rdm-cycle.png">
</div>
<div class="fragment">
* Use data management tools:
* Use data management planning tools:
* DMPonline <a href="https://dmponline.elixir-luxembourg.org/" style="color:blue; font-size:0.8em;">https://dmponline.elixir-luxembourg.org/</a>
<img src="slides/img/dmponline_logo.png" height="50px">
* DS Wizard <a href="https://learning.ds-wizard.org/" style="color:blue; font-size:0.8em;">https://learning.ds-wizard.org/</a>
......
../../2021-04-20_IT101-DM/slides/fair-principles.md
\ No newline at end of file
# FAIR (meta)data principles
* dates back to 2014
* well accepted by scientific community
* necessity in data driven science
* officially embraced by EU and G20
* required by funding agencies and journal publishers
<center>
<img src="slides/img/fair-principles.png" height="400px">
</center>
<br>
<br>
../../2021-04-20_IT101-DM/slides/howtos.md
\ No newline at end of file
# LCSB How-Tos
<br>
https://howto.lcsb.uni.lu/
<center>
<img src="slides/img/howtocard.png" width="50%">
</center>
../../2021-04-20_IT101-DM/slides/img
\ No newline at end of file