Verified commit 159bed92 authored by Laurent Heirendt

move parts of integrity to external

parent 444f29d6
layout: page
permalink: /internal/encryption/disk/
shortcut: encryption:disk
- /cards/encryption:disk
- /internal/cards/encryption:disk
# Encrypting the Startup Disk for Your Laptop/Desktop
Encrypting an entire hard disk is an effective protective measure against computer theft and loss. In this lab card we provide instructions for switching on disk encryption on macOS and Windows platforms.
**IMPORTANT NOTICE:** An important requirement of using encryption is that you must manage your encryption passwords/keys. Failing to do so will mean **losing your data permanently**. At this [link](../../passwords/) we list tools that can be used for password management. **Please make sure you have arrangements for password management before starting the encryption of data**.
## macOS
The disk encryption feature of macOS is called **FileVault**. When you switch **FileVault** on for the first time, it will initially encrypt all of your existing data, and then it will ensure all new data is encrypted. The steps to enable **FileVault** are as follows:
1. Using Finder, open **/Applications/System Preferences**, select **Security & Privacy**, then select the **FileVault** tab. If you have not enabled FileVault already, you should see the following: <br/> ![Alt](disk_enc_mac_1.png "Title")
2. Select **Turn on FileVault**. If there are multiple user accounts on your Mac, you will be asked whether you want to enable FileVault for those users. If you choose to enable FileVault for users other than the one currently logged in, you will be prompted for their passwords. <br/> ![Alt](disk_enc_mac_2.png "Title")
3. FileVault links encryption and decryption of the disk to your login password. Additionally, it lets you create a recovery key in case you forget your password. It is advisable to create a recovery key (see below), which will be a 24-character alphanumeric key displayed to you at the time of creation. It is important that you keep this key somewhere safe (e.g. in your password manager). <br/> ![Alt](disk_enc_mac_3.png "Title")
## Windows
The native Windows feature for disk encryption is called **BitLocker**. Please note that not all Windows versions support it: **BitLocker** is available on Windows 10 (Education, Pro, Enterprise), Windows 8 (Pro, Enterprise) and Windows 7 (Enterprise, Ultimate). For BitLocker to work, you need to ensure that a hardware component called the Trusted Platform Module (TPM) is present and enabled on your computer's motherboard. The TPM settings are checked at the very beginning of **BitLocker** setup, and you will be notified if the requirements are not met; this is most likely to happen on Windows 7. If you receive the message below during BitLocker setup, please contact [University IT Support](
> **"TPM is disabled on your computer. Please enable TPM in your computer BIOS to install BitLocker"**
The steps to enable **BitLocker** are as follows:
1. Log in as administrator. From the **Control Panel**, select **BitLocker Drive Encryption** and click **Turn on BitLocker**. <br/> ![Alt](disk_enc_win_0.png "Title")
2. BitLocker will check whether your computer meets the disk encryption requirements (including the TPM described above). If the requirements are met and TPM is already switched on, the setup will take you directly to the **BitLocker Startup Preferences** (step 7 onwards). If TPM is switched off, you will also need to follow steps 3-6.
3. The setup wizard will list the two tasks that will be performed: first, turning on TPM and, second, encrypting the drive (see below). <br/> ![Alt](disk_enc_win_1.png "Title")
4. You will be prompted to remove any CDs, DVDs, and USB flash drives from your computer. Do so, and select **Shutdown** as seen below. <br/> ![Alt](disk_enc_win_2.png "Title")
5. Once your computer has shut down completely, turn it back on. You will be asked to confirm the TPM setup. Confirm that you want to perform this action.
6. Your computer may shut down once more after the confirmation; if so, turn it back on.
7. The **BitLocker Setup wizard** will pop up automatically and show that the first of the two setup tasks is complete (see below). <br/> ![Alt](disk_enc_win_3.png "Title")
8. You will then see the **BitLocker startup preferences** screen shown below. Select the option **Require PIN at every startup**. <br/> ![Alt](disk_enc_win_4.png "Title")
9. Enter a PIN when asked. Remember that this PIN will be requested each time your computer is (re)started, before you are asked for your login password. <br/> ![Alt](disk_enc_win_5.png "Title")
10. In the next step, BitLocker will generate a 48-digit recovery key for you and ask how you want to store it. You will be given the option to either print the key or save it to a USB drive. It is also a good idea to store this key in your password manager.
11. You will need to restart your computer. The disk encryption process will then start in the background; you may continue to work during this process.
If you want to change your startup PIN, go to **Control Panel/BitLocker Drive Encryption**, select **Manage BitLocker**, and then select **Change PIN** from the available options.
layout: page
permalink: /internal/encryption/file/
shortcut: encryption:file
- /cards/encryption:file
- /internal/cards/encryption:file
# Encrypting Files and Folders
Encryption is an effective measure to protect sensitive data. In this lab card we provide instructions for file/folder encryption on platforms commonly used by LCSB staff.
**IMPORTANT NOTICE:** An important requirement of using encryption is that you must manage your encryption passwords/keys. Failing to do so will mean **losing your data permanently**. At this [link](../../passwords/) we list tools that can be used for password management. **Please make sure you have arrangements for password management before starting the encryption of data**.
## macOS
The built-in mechanism for file-level encryption on a Mac is Encrypted Disk Images (*.dmg* files). In order to create a disk image:
1. Using Finder, open **/Applications/Utilities/Disk Utility**.
2. From the Disk Utility menu, select **/File/New Image** <br/> ![Alt](enc_mac_1.png "Title")
3. You will have two options: either to create a **Blank Image** or to create an **Image from Folder**. Choose the option that fits your situation.
4. You will be prompted to configure the image. <br/> ![Alt](enc_mac_2.png "Title")
5. Make the following settings:
   - Set a name for your image (also set a size if this is a blank image).
   - Set a format for the image. The format should be _read/write_ for blank images. When creating an image from a folder, the format can be _read/write_ or _read-only_.
   - Turn on encryption by selecting either the 128-bit or the 256-bit AES option. You will then be prompted for a _password_ for your image. Provide a password and save.
6. A _.dmg_ file will be created with the name you supplied. <br/> ![Alt](enc_mac_3.png "Title")
7. Whenever you want to load the image you will be prompted for the password. Remember to eject the disk image when you're not accessing the files within. <br/> ![Alt](enc_mac_4.png "Title")
## Linux
Make sure you have all relevant data in a single file. If you have multiple files, put them in a folder and create a compressed archive (a.k.a. tarball):
```shell
tar cvzf your-compressed-file-name.tar.gz your-directory/
```
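Before encrypting the tarball, it can be worth checking that it really contains all the files you expect. A minimal sketch, assuming a POSIX shell with `tar` and gzip support; the function name `make_tarball` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: bundle a directory into a gzip-compressed tarball
# and print the archive's contents so you can check them before encrypting.
make_tarball() {
  src_dir=$1   # e.g. your-directory/
  archive=$2   # e.g. your-compressed-file-name.tar.gz
  # c = create, z = compress with gzip, f = write to the named file
  tar czf "$archive" "$src_dir" || return 1
  # t = list the contents of the archive instead of extracting it
  tar tzf "$archive"
}

# Example:
# make_tarball your-directory/ your-compressed-file-name.tar.gz
```

Each listed path should correspond to a file you meant to include. After decryption, the archive can be unpacked again with `tar xzf your-compressed-file-name.tar.gz`.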
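You can use the command below to encrypt a file on Linux:
```shell
gpg -c file_to_be_encrypted
```
You will be asked for a passphrase:
```
Enter passphrase:<YOUR_PASSWORD>
Repeat passphrase:<YOUR_PASSWORD>
```
The encrypted copy is written to a new file with a `.gpg` extension (here, `file_to_be_encrypted.gpg`); the original, unencrypted file is left in place.
The following command can be used to decrypt the file:
```shell
gpg --output decrypted_file --decrypt encrypted_file
```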
Instead of using a passphrase, you can also encrypt files using an encryption key. You can use GPG to create an encryption key as follows:
```shell
gpg --gen-key
```
If the above command hangs for a long time and complains about _entropy_, run the following commands as root and then re-run the key generation:
```shell
yum install rng-tools
rngd -r /dev/urandom
```
## Windows
On Windows, file-level encryption can be achieved using the Encrypting File System (EFS) feature. Note that EFS is not available on the Home editions of Windows.
To turn on EFS for a folder:
1. Using **File Explorer** locate the folder you want to encrypt. Right click and select **Properties**.
2. Select **Advanced**. From the **Advanced Attributes** screen, check the option **Encrypt contents to secure data** and click **OK**, then **Apply**. If this option appears dimmed or disabled, please contact [University IT Support]( <br/> ![Alt](enc_win_1.png "Title")
3. When prompted select the option to apply changes (encryption) to **subfolders and files** and click **OK**.
4. Note that this process does not ask you for a password, as the files are protected with a key that is only unlocked when **you** log in. When other users, including administrators, log in to your machine, they will not be able to see the contents of encrypted folders/files.
5. When you enable EFS on your machine, Windows will start prompting you to back up your **encryption key**. It is advisable to do so, as you may not be able to access encrypted folders after reinstalling Windows.
6. When prompted for backup, choose **Backup now**. This will take you to the **Certificate Export Wizard**. For the export format, select **Personal Information Exchange (.PFX)** and also select **Enable certificate privacy**. <br/> ![Alt](enc_win_2.png "Title")
7. In the **Security** step select **Password** and set a password on the encryption key.
8. In the final step navigate to the location you want the key backup (the _.pfx_ file) to be stored. This could be a USB drive or your personal ownCloud folder.
## Cloud Platforms
As per LCSB Policy, you should not store sensitive human data on commercial cloud services (e.g. Google Drive, Dropbox). However, there may be situations where commercial clouds are used:
- There is a project/consortium level agreement to use external cloud storage,
- You're working with sensitive data, and need to temporarily co-access it with research collaborators,
- You're working with non-sensitive data and using the cloud as a backup target.
In such cases, you may use the following desktop tools to encrypt cloud folders.
- [boxcryptor]( (Paid). If you're holding sensitive LCSB research data on commercial cloud (case 2 above), you must use Boxcryptor. Contact the LCSB IT team to request a license.
  - [Installation/Mac](
  - [Installation/Windows](
  - [Sharing Folders](
  - [Encrypt Folder](
  - [Decrypt Folder](
- [Cyberduck]( (Free).
# Coming Soon
The Uni-LU HPC Team is planning to install [EncFS]( on the HPC clusters. EncFS allows for the creation of an encrypted volume (similar to a folder) and provides transparent encryption: once you mount the encrypted volume, anything that goes into it will automatically be encrypted. Likewise, whenever you view or process a file in a mounted EncFS volume, it will automatically be decrypted for you behind the scenes.
We will provide instructions for EncFS once it becomes available.
layout: page
permalink: /internal/integrity/checksum/
shortcut: integrity:checksum
- /cards/integrity:checksum
- /internal/cards/integrity:checksum
# Ensuring Integrity of Data Files with Checksums
Integrity of data files is critical for the verifiability of computational and lab-based analyses. The way to seal a data file's content at a point in time is to generate a checksum. A checksum is a small piece of data generated by running an algorithm, called a cryptographic hash function, on a file. As long as a data file does not change, calculating its checksum will always give the same result. If you recalculate the checksum and it differs from an earlier calculation, then you know the file has been **altered** or **corrupted** in some way.
Below are typical situations that call for checksum generation:
* A data file has been newly downloaded or received from a collaborator.
* You have copied data files to a new storage location, for instance you moved data from local computer to HPC to start an analysis.
* You want to create a snapshot of your data, for instance when you're creating a supplementary material folder for a paper/report. Note: if you snapshot your data by depositing it in a data repository, checksum generation will typically be taken care of by the repository.
In the remainder of this section we provide instructions for checksum generation on macOS, Windows and Linux platforms.
## Linux and macOS
The command-line instructions for checksumming are the same on Linux and macOS (Terminal):
```shell
shasum -a 256 name_of_file
```
As alternatives to **shasum**, you may also use the commands **sha256sum** or **md5sum** (Linux) and **md5** (macOS).
An example execution of the **shasum** command is given below.
```shell
$ shasum -a 256 my_file.csv
0a1802c47c9c7fb29d8a6116dc40250c33321b56767125de332a862078570364  my_file.csv
```
The recommended practice when generating checksums is to save the checksum to a file, ideally with the same base name as the data file. An example is given below.
```shell
$ shasum -a 256 my_file.csv > my_file.sha256
```
The _.sha256_ file extension denotes the algorithm that generated the checksum. Other common extensions are _.sha1_ or _.md5_.
Given a data file and its checksum, one can **verify** the file against the checksum with the following command.
```shell
$ shasum -c my_file.sha256
my_file.csv: OK
```
Finally, it is important to note that checksum calculation uses only a file's contents, not its name or location. When you create a copy of a file, the checksum calculator will generate the same value for the copy. See an example below.
```shell
$ shasum -a 256 my_file.csv
0a1802c47c9c7fb29d8a6116dc40250c33321b56767125de332a862078570364  my_file.csv
$ cp my_file.csv copy_file.csv
$ shasum -a 256 copy_file.csv
0a1802c47c9c7fb29d8a6116dc40250c33321b56767125de332a862078570364  copy_file.csv
```
When you have several data files, checksum creation and verification should be automated. The following are free and open source utilities that can be used for this purpose:
* [Checksums]( macOS Automator workflow and shell script
* [md5deep]( Command Line utility for recursive checksum operations (requires compilation on your platform)
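As a starting point for automation, the `shasum` commands above can be wrapped in a small script that checksums every file in a folder and later verifies them all at once. A minimal sketch, assuming a POSIX shell with `find` and either `sha256sum` (GNU coreutils, typical on Linux) or `shasum` (macOS); the function names and the manifest name `checksums.sha256` are hypothetical choices, not conventions of the tools listed above:

```shell
#!/bin/sh
# Use sha256sum (Linux) if available, otherwise fall back to shasum -a 256 (macOS).
# Both write and check the same "checksum  path" line format.
sha256_tool() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$@"
  else
    shasum -a 256 "$@"
  fi
}

# Write one checksum line per file below the folder into <folder>/checksums.sha256.
# Paths are stored relative to the folder, so the folder can be copied or moved.
create_manifest() {
  (
    cd "$1" || exit 1
    # Skip the manifest itself; sort the paths for a stable, diff-friendly order
    find . -type f ! -name checksums.sha256 | LC_ALL=C sort |
      while IFS= read -r file; do
        sha256_tool "$file"
      done > checksums.sha256
  )
}

# Recompute every checksum and compare it with the manifest.
# Returns a non-zero status (and reports FAILED) if any file has changed.
verify_manifest() {
  (
    cd "$1" || exit 1
    sha256_tool -c checksums.sha256
  )
}
```

For example, you could run `create_manifest my_data` before copying `my_data` to the HPC, then `verify_manifest my_data` on the copy. This sketch does not handle file names that contain line breaks.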
## Windows
CertUtil is the command-line tool that Windows provides for checksum calculation. It is available in Windows Version 7 onwards.
To access CertUtil, select **Command Prompt** from the **Start** menu. Go to the folder containing your data file (my_file.csv) and run the following command:
```
> certutil -hashfile my_file.csv SHA256
SHA256 hash of my_file.csv:
CertUtil: -hashfile command completed successfully.
```
You can use a different algorithm by changing the last parameter, **SHA256**, to **MD5** or **SHA512**.
You can redirect the result of the checksum operation to a file as follows:
```
> certutil -hashfile my_file.csv SHA256 > my_file.sha256
```
Note that the file will contain CertUtil's full output, including the header and status lines, not just the checksum.
CertUtil does not provide an option to automatically verify a given checksum against a file. Therefore, to verify a file, you need to re-run the **"certutil -hashfile ..."** command and manually compare the result with the earlier generated checksum.
[MD5Summer]( ) is a free MD5 checksum tool for Windows. It is operated via a GUI and can perform recursive checksumming of files in folders. A step-by-step tutorial on using MD5Summer is provided by the UK Data Archive [here](
layout: page
permalink: /internal/good-practice/file_naming/
shortcut: good-practice:file_naming
- /cards/good-practice:file_naming
- /internal/cards/good-practice:file_naming
# Naming files
(Re)naming a file is a very easy operation, usually just one or two clicks away (*right click+rename, F2, ...*). Maybe that's why people do not pay enough attention when choosing a proper file name, even though it can have a big impact on their ability to find those files later and to understand what they contain.
A good file name follows three basic principles:
* machine readable
* human readable
* plays well with default ordering
## Machine readable
Special characters can have different meanings for different operating systems or software. The most common problematic ones are whitespace characters like **space** or **tab**.
The only two special characters recommended in file names are the hyphen "**&#45;**" and the underscore "**&#95;**". Use the underscore to separate elements and the hyphen to combine words within an element.
The file name `2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_A01.csv`
already gives us some information about the date of creation (2013-06-26), assay (BRAFWTNEGASSAY), sample set (Plasmid-Cellline-100-1MutantFraction) and well (A01), whereas the following names
are much more prone to misinterpretation.
#### Accented characters
Your language might be rich in accented or special characters,
but both your colleagues and your machines will have a hard time working with them.
Special letters like **&#231;**, **&#228;**, **&#244;**,
**&#283;**, **&#341;**, etc. require special encoding and might cause problems when used in file names.
Beware of typos, and avoid using multiple names that vary in small ways unless the difference has a real meaning. The following file names are distinct, but can you tell where exactly?
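To check an existing folder for names that break these rules (spaces, accented letters or other special characters), a `find` pattern can flag any character outside the recommended set. A minimal sketch, assuming a POSIX shell; the function name is hypothetical, and the allowed set (ASCII letters, digits, dot, underscore, hyphen) follows the recommendations above:

```shell
#!/bin/sh
# Print every file or folder below the given directory whose name contains
# a character other than A-Z, a-z, 0-9, dot, underscore or hyphen.
list_problematic_names() {
  # LC_ALL=C makes the bracket expression byte-based: accented letters are
  # multi-byte in UTF-8, so they never count as allowed letters
  LC_ALL=C find "$1" -name '*[!A-Za-z0-9._-]*'
}

# Example: list_problematic_names ~/my_project
```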
#### Exploiting machine readable names
You may already have a lot of files collected for your project, or you may have received a big dataset from one of your collaborators. You might then think about organizing and renaming them to comply with your new or existing naming policy.
If the names are consistent and you don't want to lose time renaming them by hand, you may try dedicated tools (e.g. [PSRenamer]( or simple commands on the command line (**rename** on Linux, **ren** on Windows; macOS does not ship a **rename** command by default).
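Because the `rename` command comes in different, incompatible versions depending on the system, a plain shell loop can be a more portable option for simple clean-ups. A minimal sketch, assuming a POSIX shell whose `test` supports the common `-ef` extension (bash and dash do); the function name and the chosen rules (spaces become underscores, upper case becomes lower case, existing files are never overwritten) are illustrative only:

```shell
#!/bin/sh
# Rename the files directly inside a folder: spaces -> underscores, upper -> lower case.
# A rename is skipped if a different file already has the target name.
normalize_names() {
  for path in "$1"/*; do
    [ -f "$path" ] || continue            # skip folders (and an unmatched glob)
    dir=$(dirname "$path")
    old=$(basename "$path")
    new=$(printf '%s\n' "$old" | tr ' ' '_' | tr '[:upper:]' '[:lower:]')
    if [ "$old" = "$new" ]; then
      continue                            # already compliant
    fi
    # -ef: on case-insensitive file systems "Data.csv" and "data.csv" are the same file
    if [ -e "$dir/$new" ] && ! [ "$dir/$new" -ef "$path" ]; then
      echo "skipping '$old': '$new' already exists" >&2
      continue
    fi
    mv -- "$path" "$dir/$new"
  done
}

# Example: normalize_names ~/my_project/raw_data
```

Try it on a copy of your data first; renaming cannot be undone automatically.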
As your skills develop, you will be able to use machines and machine-readable file names to perform advanced operations, e.g. searching with regular expressions.
Imagine a folder with thousands of files. Running the simple R command
```r
flist <- list.files(pattern = "Plasmid")
```
will give you all file names containing the word "Plasmid".
This result can easily be processed further into a metadata table by splitting the names at underscores and dots. Because `stringr::str_split_fixed()` returns a character matrix, it is converted to a data frame before the columns are named:
```r
flist_df <- as.data.frame(stringr::str_split_fixed(flist, "[_\\.]", 5))
names(flist_df) <- c("Date", "Assay", "Sample_set", "Well", "Format")
```
| Date | Assay | Sample_set | Well | Format |
| ---- | ----- | ---------- | ---- | ------ |
| "2013-06-26" | "BRAFWTNEGASSAY" | "Plasmid-Cellline-100-1MutantFraction" | "A01" | csv |
| "2013-06-26" | "BRAFWTNEGASSAY" | "Plasmid-Cellline-100-1MutantFraction" | "A02" | csv |
| "2013-06-26" | "BRAFWTNEGASSAY" | "Plasmid-Cellline-100-1MutantFraction" | "A03" | csv |
| "2013-06-26" | "BRAFWTNEGASSAY" | "Plasmid-Cellline-100-1MutantFraction" | "B01" | csv |
| "2013-06-26" | "BRAFWTNEGASSAY" | "Plasmid-Cellline-100-1MutantFraction" | "B02" | csv |
| "2013-06-26" | "BRAFWTNEGASSAY" | "Plasmid-Cellline-100-1MutantFraction" | "B03" | csv |

Of course, similarly simple and powerful commands can be found in every programming language/interpreter (Python, Bash, ...).
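For example, a rough equivalent of the R snippet in POSIX shell could use `awk` to split each name at underscores and dots. A minimal sketch; the function name is hypothetical, and it assumes every matching name has exactly the five parts Date_Assay_Sample-set_Well.Format, as in the example files:

```shell
#!/bin/sh
# Print a CSV table (one row per matching file) built from the name parts
# Date_Assay_Sample-set_Well.Format.
names_to_table() {
  dir=$1
  pattern=$2
  echo "Date,Assay,Sample_set,Well,Format"
  for path in "$dir"/*"$pattern"*; do
    [ -e "$path" ] || continue   # no file matched the pattern
    # Split the bare file name at "_" and "." (same rule as "[_\\.]" in R)
    basename "$path" | awk -F'[_.]' 'BEGIN { OFS = "," } { print $1, $2, $3, $4, $5 }'
  done
}

# Example: names_to_table . Plasmid > metadata.csv
```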
#### Case sensitivity
It is generally recommended **not** to use upper case letters.
Firstly, matching patterns and splitting names with upper case letters is much harder and more error-prone. Another drawback is that the default file systems on Windows and macOS are case-insensitive (unlike those on Linux), so names that differ only in case, such as `Data.csv` and `data.csv`, may clash when files are moved between systems.
If you really want to extend the hyphen-underscore semantic separation, you can use so-called [**camelCase**](, which replaces the spaces between words by capitalizing the first letter of each word.
#### Machine readable names allow us to:
* easily search for files later
* easily narrow file lists based on names
* easily extract information from file names, e.g. by splitting
Remember that the rules on machine readability apply also for naming your **folders** (now containing your nicely named files). In fact, it is a good practice to stick to these rules even when naming **variables** in your data files.
## Human readable
* Be specific. It is generally better to create a longer file name that fulfils its purpose than to use short abbreviations, which might be hard to grasp for your colleagues and, eventually, for yourself after some time. Stay away from cryptic names and non-standard or unclear abbreviations.
| Bad name                  | Better name                                           |
| ------------------------- | ----------------------------------------------------- |
| myabstract.txt | John-White_Sensitivity-of-PLFA-analyses_abstract.txt |
| samples_project_start.csv | PA324_samples_2019-12-11.csv |
| ms_cresp_final.doc | John-White_Cell-respiration-manuscript_2019-12-11.doc |
| fig_1.png | John-White_Cell-respiration_fig-1_2019-12-11.png |
* Usually, the file extension already tells you something about the file itself, so there is no need to repeat that information in the name.
Here are some examples of file names that are unnecessarily long and could easily be shortened:
* Never use suffixes (or prefixes) like **"final"**, **"old"**, **"new"**, **"current"**, **"obsolete"**, **"recent"**, **"latest"**, **"best"**...
A file rarely stays in such a state; it will change sooner or later anyway.
* The name should naturally explain why the file exists. If you have to search for additional information (either by asking your colleagues or by reading README files), the file name is probably not well chosen. Name the file in a way that even a total stranger could understand it easily.
* Leave out meaningless or redundant words, e.g. "the", "and", "a", "file", "data" ...
* Do not be too creative, do not pun, and stay professional. Bad examples:
  * `bio-rect_UM.csv`: data related to bio-reactors at the University of Michigan
  * `PEPA_d-pic.jpeg`: a fourth picture from your paper on Performance Evaluation Process Algebra
#### Semantic versioning
If your files or documents change very often and you want to track the versions manually instead of using some sophisticated versioning software<!-- TODO: link to GIT howto-card -->, you might follow the semantic versioning scheme widely used in software development.
It adds several numbers (the standard is three) as a suffix to your file name, where:
* the first number, called the **MAJOR** version, is increased once the document has undergone **significant changes**
* the second number, called the **MINOR** version, is incremented once some new information is added to the document or something is deleted
* the last number, called **PATCH**, refers to very minor changes like fixing typos or rephrasing a sentence.
These numbers can be preceded by the letter "V" to indicate that version information follows.
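If you version files by hand, a small helper can compute the next file name for you. Following the semantic versioning convention, increasing MAJOR resets MINOR and PATCH to zero, and increasing MINOR resets PATCH. A minimal sketch in POSIX shell; the name layout `name_V1.2.3.ext` and the function name `bump_version` are assumptions for illustration:

```shell
#!/bin/sh
# Print the next version of a file name shaped like <name>_V<MAJOR>.<MINOR>.<PATCH>.<ext>
# Usage: bump_version <file name> major|minor|patch
bump_version() {
  name=$1
  part=$2
  # Extract the three version numbers that sit just before the extension
  ver=$(printf '%s\n' "$name" |
    sed -n 's/.*_V\([0-9][0-9]*\)\.\([0-9][0-9]*\)\.\([0-9][0-9]*\)\.[^.]*$/\1 \2 \3/p')
  if [ -z "$ver" ]; then
    echo "no _V<MAJOR>.<MINOR>.<PATCH> suffix found in '$name'" >&2
    return 1
  fi
  set -- $ver                 # split "1 2 3" into $1 $2 $3
  major=$1 minor=$2 patch=$3
  case $part in
    major) major=$((major + 1)); minor=0; patch=0 ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    patch) patch=$((patch + 1)) ;;
    *) echo "second argument must be major, minor or patch" >&2; return 1 ;;
  esac
  # Replace the old version numbers, keeping the extension
  printf '%s\n' "$name" |
    sed "s/_V[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\(\.[^.]*\)\$/_V$major.$minor.$patch\1/"
}

# Example: bump_version report_V1.2.3.docx minor   # prints report_V1.3.0.docx
```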
Human readable names allow us to:
* easily understand what the file is and what it contains
* easily share files with others
## Default ordering
Built-in tools (e.g. a file explorer) allow you to order files by name in alphanumerical order. Make the best of this great feature.
* Put the terms in general-to-specific order. That way, you will have files grouped in logical order and related files will be naturally close to each other.
* Put the date first to get chronological ordering:
* Put the number defining an explicit order first. Remember that ordering is done character by character, not by whole numbers, so you might want to add leading zeros to make sure the ordering stays correct as the number of your files grows.
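The effect of leading zeros is easy to see with the `sort` command, which compares names character by character. A minimal sketch in POSIX shell; `padded_name` is a hypothetical helper, and padding to three digits (enough for 999 files) is an arbitrary choice:

```shell
#!/bin/sh
# Hypothetical helper: prefix a label with a number padded to three digits.
padded_name() {
  printf '%03d_%s\n' "$1" "$2"
}

# Without padding, character-by-character sorting puts 10 before 2:
for n in 1 2 10; do printf '%s_step.csv\n' "$n"; done | LC_ALL=C sort
# 10_step.csv
# 1_step.csv
# 2_step.csv

# With padding, the same numbers sort in numerical order:
for n in 1 2 10; do padded_name "$n" step.csv; done | LC_ALL=C sort
# 001_step.csv
# 002_step.csv
# 010_step.csv
```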
## Dates
Including the date in your file names allows you to sort them easily and to find exactly the one you want very quickly.
Remember that recording dates using anything other than numbers (e.g. month abbreviations) can, depending on the language, result in formats like "*11dic2019*" or "*11Dez2019*", which may not be recognized as dates at all.
It is much better to use a purely numeric format, but even then dates can be written in endless variations that are hard to read or, more importantly, ambiguous, like the date **11th of December 2019** in the following examples:
Luckily, there is a standard date format, YYYY-MM-DD ([*ISO 8601*](, which complies nicely with all three principles above. Therefore, the **only** correct format for 11th of December 2019 is **2019-12-11**.
<!-- TODO: stability of names in shared repository which is not read-only - e.g. someone gets nuts and starts to rename everything. Dangerous if there is any analyses link directly to a file. -->
<!-- TODO: do some guidelines/rules/recommendations apply to different classes of files - source code, data, documents -->
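On the command line, `date` can produce the ISO 8601 form directly, which makes it easy to prefix new files consistently. A minimal sketch in POSIX shell; both helper names are hypothetical, and the date comes from the computer's local clock:

```shell
#!/bin/sh
# Hypothetical helper: prefix a file name with today's date as YYYY-MM-DD.
dated_name() {
  printf '%s_%s\n' "$(date +%Y-%m-%d)" "$1"
}

# Hypothetical check: does a name start with YYYY-MM-DD followed by an underscore?
# (It checks the format only; an impossible date such as 2019-13-45 still passes.)
has_iso_date_prefix() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}_'
}

# Example: cp samples.csv "$(dated_name samples.csv)"
# (run on 11 December 2019, this copies to 2019-12-11_samples.csv)
```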
## Final notes
When starting your project or creating a new repository, take the time to design a proper naming scheme.
Remember that it should also be accepted by your teammates and other collaborators accessing the files.
To make dissemination of the naming scheme as easy as possible, don't forget to document it and include it in the policies of your group/project.
Adopting the proposed recommendations might seem like a lot of work now.
But it will pay off once your projects get more complex and your skills evolve. Choosing good names takes time, but it saves more time than it takes.
If you don't agree with the naming rules adopted in your group, follow them anyway or make an effort to change them for everyone.
**Consistency** is much more important than your preferred naming.
# Resources
Jenny Bryan's [slides]( on "Naming things" from the Reproducible Science Workshop, Duke, 2015
Semantic versioning - [](
LCSB *IT101* training [presentation](