Environmental metagenomics data analysis
Access the virtual machine:
Check what your home folder contains; it should be pretty much empty:
We need to configure the shared conda environment as described in the useful tips. Activate the environment and check that you can run the installed software:
Check the data folder. What kind of files does it contain?
The data analysis workflow is as follows:
1. Quality control with fastqc
2. Aggregation of the quality control reports with multiqc
3. If the quality is not satisfactory, trimming with trim_galore (minimum read length 70, quality threshold 20)
4. OTU (operational taxonomic unit) calling with MetaPhlAn2
5. Generation of summary plots with Krona
6. Basic analysis in R with ggplot2
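To make steps 1 to 3 of the workflow concrete, here is a minimal Python sketch that only builds the command lines and prints them, without running anything. The file and folder names (`data/sample_R1.fastq.gz`, `qc`, `trimmed`) are hypothetical placeholders, not paths from the course machine, and single-end input is assumed; the thresholds are the ones stated in step 3.

```python
import shlex

# Thresholds taken from step 3 of the workflow.
MIN_LENGTH = 70   # reads shorter than this after trimming are discarded
MIN_QUALITY = 20  # Phred quality threshold for trimming


def fastqc_cmd(fastq_files, out_dir):
    """Step 1: one fastqc call covering all given FASTQ files."""
    return ["fastqc", "--outdir", str(out_dir), *[str(f) for f in fastq_files]]


def multiqc_cmd(report_dir, out_dir):
    """Step 2: aggregate every report found in report_dir."""
    return ["multiqc", str(report_dir), "--outdir", str(out_dir)]


def trim_galore_cmd(fastq, out_dir):
    """Step 3: quality and length trimming of a single-end FASTQ file."""
    return [
        "trim_galore",
        "--quality", str(MIN_QUALITY),
        "--length", str(MIN_LENGTH),
        "--output_dir", str(out_dir),
        str(fastq),
    ]


if __name__ == "__main__":
    # Hypothetical input file, used only for illustration.
    fastq = "data/sample_R1.fastq.gz"
    for cmd in (
        fastqc_cmd([fastq], "qc"),
        multiqc_cmd("qc", "qc"),
        trim_galore_cmd(fastq, "trimmed"),
    ):
        print(shlex.join(cmd))
```

Printing the commands first is a cheap way to check the parameters before putting them into workflow rules; for paired-end data trim_galore would additionally need its `--paired` option and both mate files.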
We can start with the advanced snakemake example that produces a fastqc report
and modify it to contain more rules for our workflow.
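When adding rules, each rule's declared output has to match the file names the tool really writes. As a rough sketch (plain Python, not part of the course's Snakefile), the helper below derives the report names fastqc produces for a set of FASTQ files, which is the kind of list a final target rule would request. The function names, the `qc` folder and the sample file names are hypothetical, and only the common `.fastq`/`.fq` suffixes (optionally gzipped) are handled.

```python
from pathlib import PurePosixPath

# Suffixes stripped from the input name; an assumption covering common cases only.
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def sample_stem(fastq):
    """Return the file name without its directory and FASTQ suffix."""
    name = PurePosixPath(fastq).name
    for suffix in FASTQ_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    raise ValueError(f"not a recognised FASTQ file name: {fastq}")


def fastqc_targets(fastq_files, report_dir):
    """List the <stem>_fastqc.html and <stem>_fastqc.zip files expected in report_dir."""
    targets = []
    for fastq in fastq_files:
        stem = sample_stem(fastq)
        for ext in ("html", "zip"):
            targets.append(str(PurePosixPath(report_dir) / f"{stem}_fastqc.{ext}"))
    return targets


if __name__ == "__main__":
    # Hypothetical inputs, used only for illustration.
    for target in fastqc_targets(["data/s1_R1.fastq.gz", "data/s1_R2.fastq.gz"], "qc"):
        print(target)
```

In a Snakefile the same mapping is usually expressed with wildcards in the rule's `output` section; the helper just spells out the naming pattern explicitly.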