R3
school
courses
Commits
15d837c7
Commit
15d837c7
authored
Jun 28, 2021
by
Laurent Heirendt
Merge branch 'develop' into 'master'
[release] Regular merge of develop See merge request
!105
parents
e1af3b7e
3e19938a
Pipeline
#43536
passed with stages
in 4 minutes and 24 seconds
Changes
76
Pipelines
1
2021/2021-04-20_IT101-DM/slides/code_versioning.md
0 → 100644
View file @
15d837c7
# Code versioning
<div
style=
"position:absolute; width:40%"
>
**git**
*
Current standard for code versioning
*
Maintain versions of your code as it develops
*
Local system, which does not require an online repository
*
Repositories allow distributed development
<img
align=
"middle"
height=
"300px"
src=
"slides/img/Git-logo.png"
>
</div>
<div
class=
"fragment"
style=
"position:absolute; left:50%; width:40%"
>
**git@lcsb**
*
Recommended, supported repository
*
Allows tracking of issues
*
Ready for continuous integration - code is checked on commits to the repository.
*
[
https://git-r3lab.uni.lu
](
https://git-r3lab.uni.lu
)
**Use at LCSB**
*
All analysis code should be in a repository
*
Minimally at submission of a manuscript
*
Better daily
*
Even better "analyses chunkwise"
</div>
<div
style=
"position:absolute; width:45%; left:50%; top:28em; text-align:right"
>
<a
href=
"https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-07"
style=
"color:grey; font-size:0.8em;"
>
LCSB-POL-BIC-07 Source Code Management Policy
</a>
</div>
2021/2021-04-20_IT101-DM/slides/data-housekeeping.md
0 → 100644
# Data housekeeping
## File names
<div
style=
"display:flex; position:static; width:100%"
>
<div
class=
"fragment"
data-fragment-index=
"0"
style=
"position:static; width:30%"
>
### General principles
*
Machine readable
*
Human readable
*
Plays well with default ordering
</div>
<div
class=
"fragment"
data-fragment-index=
"1"
style=
"position:absolute; left:33%; width:30%"
>
### Separators
*
No spaces
*
Underscore to separate
*
Hyphen to combine
</div>
<div
class=
"fragment"
data-fragment-index=
"2"
style=
"position:absolute; left:66%; width:30%"
>
### Date format follows **ISO 8601**<br>
2018-12-03
<br>
2018-12-06_1700
</div>
</div>
<div
class=
"fragment"
data-fragment-index=
"3"
style=
"width:100%; position:static"
>
<div
style=
"position:absolute;width:55%"
>
<b>
Bad
</b>
names
```bash
PhD-project-Jan19 alldata_final.foo
Finacial detailes BIocore 19/11/12.xls
ATACseq1Londonmapped.bam
Hlad.jez.M-L-průtoky JíObj.z Ohře-od 10-2011.xlsx
```
</div>
<div
style=
"position:relative;width:55%; bottom:20%; left:50%"
>
<b>
Good
</b>
names
```bash
Iris-setosa_samples_1927-05-12.csv
PI102_Mouse12_EEG_2018-11-03_1245.tsv
Bioinfiniti_FullProposal_2018-11-15_1655.do
```
</div>
</div>
<br>
<br>
<div
class=
"fragment"
data-fragment-index=
"3"
style=
"width:100%;"
>
From Jenny Bryan, licensed under CC-BY
(https://speakerdeck.com/jennybc/how-to-name-files)
</div>
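
The naming rules above (no spaces, underscore to separate, hyphen to combine, ISO 8601 dates) can be sketched in Python. This is a minimal illustration, not an LCSB tool: the helper names `build_file_name` and `follows_convention` and the exact pattern are assumptions made for this example.

```python
import re
from datetime import datetime


def build_file_name(parts, when, extension, with_time=False):
    """Hypothetical helper: join name parts with underscores and append an ISO 8601 date."""
    cleaned = []
    for part in parts:
        # Whitespace inside one part becomes a hyphen ("hyphen to combine").
        part = re.sub(r"\s+", "-", part.strip())
        # Keep only ASCII letters, digits and hyphens so the name stays machine readable.
        part = re.sub(r"[^A-Za-z0-9-]", "", part)
        if part:
            cleaned.append(part)
    stamp = when.strftime("%Y-%m-%d_%H%M" if with_time else "%Y-%m-%d")
    # Parts and the date stamp are separated by underscores ("underscore to separate").
    return "_".join(cleaned + [stamp]) + "." + extension.lstrip(".")


# Assumed pattern: underscore-separated parts, a date, an optional HHMM time, an extension.
NAME_PATTERN = re.compile(
    r"^[A-Za-z0-9-]+(_[A-Za-z0-9-]+)*_\d{4}-\d{2}-\d{2}(_\d{4})?\.[A-Za-z0-9]+$"
)


def follows_convention(name):
    """Return True if the name matches the assumed pattern above."""
    return NAME_PATTERN.match(name) is not None


print(build_file_name(["PI102", "Mouse12", "EEG"], datetime(2018, 11, 3, 12, 45), "tsv", with_time=True))
# prints PI102_Mouse12_EEG_2018-11-03_1245.tsv
print(follows_convention("ATACseq1Londonmapped.bam"))
# prints False
```

Because the date is written year first, a plain alphabetical sort of such names is also a chronological sort, which is what "plays well with default ordering" relies on.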
# Data housekeeping
## File organization
*
Have folder organization conventions for your
**group**
*
Per Paper
*
Per Study/Project
*
Per Collaborator
*
Keep
<b>
readme files
</b>
for data
*
Title
*
Date of Creation/Receipt
*
Instrument or software specific information
*
People involved
*
Relations between multiple files/folders
*
Separate files you are actively working on from the old ones
*
Orient newcomers to the group's conventions
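
As an illustration of the readme fields listed above, the following sketch renders them into a plain-text readme. The function name `render_readme`, the field labels and all example values are hypothetical; a group's own convention may look different.

```python
from datetime import date


def render_readme(title, received, instrument, people, relations):
    """Hypothetical helper: render the readme fields from the slide as plain text."""
    lines = [
        f"Title: {title}",
        # ISO 8601 date, consistent with the file-naming convention.
        f"Date of creation/receipt: {received.isoformat()}",
        f"Instrument or software: {instrument}",
        "People involved: " + ", ".join(people),
        "Relations between files/folders:",
    ]
    lines += [f"  - {item}" for item in relations]
    return "\n".join(lines) + "\n"


# All values below are made up for the example.
text = render_readme(
    title="Mouse EEG recordings",
    received=date(2018, 11, 3),
    instrument="Example amplifier, firmware 1.2",
    people=["A. Researcher", "B. Technician"],
    relations=["raw/ holds the source recordings", "tidy/ is derived from raw/"],
)
print(text)
```

Keeping the readme as plain text next to the data means it stays readable without any special software.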
# Data housekeeping
<div
style=
"position:absolute"
>
## When working
*
Clarify and separate source and intermediate data
*
Keep data copies to a
**minimum**
*
Cleanup post-analysis
*
Cleanup copies created for presentations or for sharing
</div>
<div
style=
"position:relative;left:50%; width:40%"
>
<img
src=
"slides/img/cleaning-table.jpg"
height=
"450px"
>
</div>
# Data housekeeping
## End of project
*
hand over data to a new responsible person when leaving
*
data should be kept as a single copy on server-side storage
*
no copies on desktops and external devices
*
non-proprietary formats
*
minimal metadata
*
sensitive data (e.g. whole genome)
**must**
be encrypted
<br/>
<br/>
*
If not specified otherwise, data must be kept for
**10 years**
following project end for reproducibility purposes
<aside
class=
"notes"
>
Note: sometimes it is hard to find or understand a dataset that is only 10 days old
</aside>
## In doubt on data archival?
Contact R
<sup>
3
</sup>
for support on archival of datasets using tickets:
*
https://service.uni.lu/sp
*
Home > Catalog > LCSB > Biocore: Application services > Request for: Support
<div
style=
"position:absolute; width:45%; left:50%; top:28em; text-align:right"
>
<a
href=
"https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-03"
style=
"color:grey; font-size:0.8em;"
>
Research Data Retention and Archival Policy
</a>
</div>
# Data housekeeping - Summary
## Server is your friend!
*
Allows a consistent backup policy for your datasets
*
Keeps number of copies to minimum
*
Specification of clear access rights
*
High accessibility
*
Data are discoverable
*
Server can't be stolen
## General guidelines
*
Use institutional media for storage of
**all**
data
*
Research data (particularly sensitive data) should be in a single source location
*
Enable encryption for data stored on movable media
*
Clarify and separate source and intermediate data
*
Disable write access to relevant source data (read-only)
*
Backup research data!
*
Install anti-virus software
*
Generate checksums
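
Two of the guidelines above, generating checksums and disabling write access to source data, can be sketched with the Python standard library. The function names and the manifest file name `SHA256SUMS` are assumptions for this example, not an LCSB convention.

```python
import hashlib
import os
import stat
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder, manifest_name="SHA256SUMS"):
    """Write one 'digest  relative/path' line per file found under folder."""
    folder = Path(folder)
    lines = []
    for path in sorted(folder.rglob("*")):
        if path.is_file() and path.name != manifest_name:
            lines.append(f"{sha256_of(path)}  {path.relative_to(folder).as_posix()}")
    manifest = folder / manifest_name
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def verify_manifest(folder, manifest_name="SHA256SUMS"):
    """Return the relative paths whose content changed or that are missing."""
    folder = Path(folder)
    changed = []
    for line in (folder / manifest_name).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, relative = line.split("  ", 1)
        target = folder / relative
        if not target.is_file() or sha256_of(target) != expected:
            changed.append(relative)
    return changed


def make_read_only(path):
    """Clear the write permission bits for user, group and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```

A typical use would be to call `write_manifest` once when source data arrive, call `make_read_only` on each source file, and run `verify_manifest` before an analysis or after a transfer. An empty result means no file has changed since the manifest was written.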
2021/2021-04-20_IT101-DM/slides/data-introduction.md
0 → 100644
# Data and metadata
<div
style=
"display:grid;grid-gap:100px;grid-template-columns: 40% 40%"
>
<div
>
## Data
*
"
*information in digital form that can be transmitted or processed*
"
<p
align=
"right"
>
-- Merriam-Webster dictionary
</p>
*
"
*information in an electronic form that can be stored and processed by a computer*
"
<p
align=
"right"
>
-- Cambridge dictionary
</p>
</div>
<div>
## Metadata
*
data describing other data
*
information that is given to describe or help you use other information
*
metadata are data
*
can be processed and analyzed
</div>
</div>
<div
class=
"fragment"
>
## Metadata examples:
<div
style=
"position:absolute"
>
<ul>
<li>
LabBook
</li>
<li>
author/owner of the data
</li>
<li>
origin of the data
</li>
<li>
data type
</li>
</ul>
</div>
<div
style=
"position:absolute;left:25%"
>
<ul>
<li>
description of content
</li>
<li>
modification date
</li>
<li>
description of modification
</li>
<li>
location
</li>
</ul>
</div>
<div
style=
"position:relative;left:50%;top:0.7em"
>
<ul>
<li>
calibration readings
</li>
<li>
software/firmware version
</li>
<li>
data purpose
</li>
<li>
means of creation
</li>
</ul>
</div>
</div>
<div
class=
"fragment"
>
<br>
<center
style=
"color:red"
>
!Insufficient metadata make the data useless!
</center>
</div>
<aside
class=
"notes"
>
Sometimes metadata collection takes more time than data collection
</aside>
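
Since metadata are data, they can be stored next to a dataset and checked automatically. The sketch below validates a metadata record and serialises it as JSON. The field names in `REQUIRED_FIELDS` are an assumed subset of the examples above, and every value in the record is made up.

```python
import json

# Assumed minimal set of fields, chosen from the metadata examples on the slide.
REQUIRED_FIELDS = (
    "owner",
    "origin",
    "data_type",
    "description",
    "modification_date",
    "software_version",
)


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in the record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def to_sidecar_json(record):
    """Serialise the record as JSON, refusing records with missing fields."""
    missing = missing_fields(record)
    if missing:
        raise ValueError("missing metadata: " + ", ".join(missing))
    return json.dumps(record, indent=2, sort_keys=True)


# Hypothetical record; "software_version" is deliberately left out.
record = {
    "owner": "A. Researcher",
    "origin": "Example instrument",
    "data_type": "EEG time series",
    "description": "Recordings of mouse 12",
    "modification_date": "2018-11-03",
}
print(missing_fields(record))
# prints ['software_version']
```

Running such a check when data are received is one way to notice insufficient metadata while the people who can still supply them are available.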
# LCSB research data
Three categories:
*
**Primary data**
*
scientific data
*
measurements, images, observations, notes, surveys, ...
*
models, software codes, libraries, ...
*
metadata directly describing the data
*
data dictionaries
*
format, version, coverage descriptions, ...
*
**Research record**
*
description of the research process, including experiment
*
experiment set-up
*
followed protocols
*
...
*
**Project accompanying documentation**
*
ethical approvals, information on the consent
*
collaboration agreements
*
intellectual property ownership
*
other relevant documentation
<div
style=
"position:absolute; width:45%; left:50%; top:28em; text-align:right;"
>
<a
href=
"https://howto.lcsb.uni.lu/internal/policies/LCSB-POL-BIC-03/"
style=
"color:grey; font-size:0.8em;"
>
LCSB-POL-BIC-03 Research Data Retention and Archival Policy
</a>
</div>
\ No newline at end of file
2021/2021-04-20_IT101-DM/slides/data_flow.md
0 → 100644
# Typical flow of data
<div style="display:grid;grid-gap:10px;grid-template-columns: 30% 20% 30%;
grid-auto-flow:column;grid-template-rows: repeat(4,auto);position:relative;left:8%">
<div
class=
"content-box fragment"
data-fragment-index=
"1"
>
<div
class=
"box-title red"
>
Source data
</div>
<div
class=
"content"
>
*
Experimental results
*
Large data sets
*
Manually collected data
*
External
</div>
</div>
<div
class=
"content-box fragment"
data-fragment-index=
"2"
>
<div
class=
"box-title yellow"
>
Intermediate
</div>
<div
class=
"content"
>
*
Derived data
*
Tidy data
*
Curated sets
</div>
</div>
<div
class=
"content-box fragment"
data-fragment-index=
"3"
>
<div
class=
"box-title blue"
>
Analyses
</div>
<div
class=
"content"
>
*
Exploratory
*
Model building
*
Hypothesis testing
</div>
</div>
<div
class=
"content-box fragment"
data-fragment-index=
"4"
>
<div
class=
"box-title green"
>
Dissemination
</div>
<div
class=
"content"
>
*
Manuscript, report, presentation, ...
</div>
</div>
<center>
<img
src=
"slides/img/data-flow_sources.png"
height=
60%
>
</center>
<center>
<img
src=
"slides/img/data-flow_transformation.png"
height=
60%
>
</center>
<center>
<img
src=
"slides/img/data-flow_chart.png"
height=
60%
>
</center>
<center>
<img
src=
"slides/img/data-flow_paper.png"
height=
60%
>
</center>
<div
class=
"content-box fragment"
data-fragment-index=
"5"
>
<div
class=
"box-title red"
>
Preserve
</div>
<div
class=
"content"
>
*
Version data sets
*
Backup
*
Protect
</div>
</div>
<div
class=
"content-box fragment"
data-fragment-index=
"6"
>
<div
class=
"box-title yellow"
>
Reproduce
</div>
<div
class=
"content"
>
*
Automate your builds
*
Use workflow tools (e.g. Snakemake)
</div>
</div>
<div
class=
"content-box fragment"
data-fragment-index=
"7"
>
<div
class=
"box-title blue"
>
Trace
</div>
<div
class=
"content"
>
*
Multiple iterations.
*
Code versioning (Git)
</div>
</div>
<div
class=
"content-box fragment"
data-fragment-index=
"8"
>
<div
class=
"box-title green"
>
Track
</div>
<div
class=
"content"
>
*
Through multiple versions
</div>
</div>
</div>
<aside
class=
"notes"
>
The flow of data is mostly downstream, but in practice you go back and forth between the steps.
This applies to all data (e.g. financial reports, lab safety assessments).
</aside>
2021/2021-04-20_IT101-DM/slides/fair-principles.md
0 → 100644
# FAIR (meta)data principles
*
dates back to 2014
*
well accepted by scientific community
*
a necessity in data-driven science
*
officially embraced by EU and G20
*
required by funding agencies and journal publishers
<center>
<img
src=
"slides/img/fair-principles.png"
height=
"400px"
>
</center>
<br>
<br>
2021/2021-04-20_IT101-DM/slides/howtos.md
0 → 100644
# LCSB How-Tos
<br>
https://howto.lcsb.uni.lu/
<center>
<img
src=
"slides/img/howtocard.png"
width=
"50%"
>
</center>
2021/2021-04-20_IT101-DM/slides/img/3pillars-full.png
0 → 120000
../../../../2020/2020-06-09_IT101-DM/slides/img/3pillars-full.png
\ No newline at end of file
2021/2021-04-20_IT101-DM/slides/img/DinoSequentialSmaller.gif
0 → 120000
../../../../2020/2020-06-09_IT101-DM/slides/img/DinoSequentialSmaller.gif
\ No newline at end of file
2021/2021-04-20_IT101-DM/slides/img/Git-logo.png
0 → 120000
../../../../2020/2020-06-09_IT101-DM/slides/img/Git-logo.png
\ No newline at end of file
2021/2021-04-20_IT101-DM/slides/img/LCSB_storages_backed-up.png
0 → 120000
../../../../2020/2020-06-09_IT101-DM/slides/img/LCSB_storages_backed-up.png
\ No newline at end of file
2021/2021-04-20_IT101-DM/slides/img/LCSB_storages_backup.png
0 → 120000
../../../../2020/2020-06-09_IT101-DM/slides/img/LCSB_storages_backup.png
\ No newline at end of file
2021/2021-04-20_IT101-DM/slides/img/LCSB_storages_full.png
0 → 120000
../../../../2020/2020-06-09_IT101-DM/slides/img/LCSB_storages_full.png
\ No newline at end of file
2021/2021-04-20_IT101-DM/slides/img/LCSB_storages_personal-crossed.png
0 → 120000
../../../../2020/2020-06-09_IT101-DM/slides/img/LCSB_storages_personal-crossed.png
\ No newline at end of file
2021/2021-04-20_IT101-DM/slides/img/R3_profile_pictures/alexey_kolodkin.png
0 → 120000
../../../../../2020/2020-06-09_IT101-DM/slides/img/R3_profile_pictures/alexey_kolodkin.png
\ No newline at end of file
2021/2021-04-20_IT101-DM/slides/img/R3_profile_pictures/christophe_trefois.png
0 → 120000
../../../../../2020/2020-06-09_IT101-DM/slides/img/R3_profile_pictures/christophe_trefois.png
\ No newline at end of file
2021/2021-04-20_IT101-DM/slides/img/R3_profile_pictures/karim_chaouch.png
0 → 100644
283 KB
2021/2021-04-20_IT101-DM/slides/img/R3_profile_pictures/laurent_heirendt.png
0 → 120000
../../../../../2020/2020-06-09_IT101-DM/slides/img/R3_profile_pictures/laurent_heirendt.png
\ No newline at end of file
2021/2021-04-20_IT101-DM/slides/img/R3_profile_pictures/maharshi_vyas.png
0 → 120000
../../../../../2020/2020-06-09_IT101-DM/slides/img/R3_profile_pictures/maharshi_vyas.png
\ No newline at end of file
2021/2021-04-20_IT101-DM/slides/img/R3_profile_pictures/nene_barry.png
0 → 100644
18.2 KB
2021/2021-04-20_IT101-DM/slides/img/R3_profile_pictures/noua_toukourou.png
0 → 120000
../../../../../2020/2020-06-09_IT101-DM/slides/img/R3_profile_pictures/noua_toukourou.png
\ No newline at end of file