The complete pipeline#
Let us build the complete pipeline, then alter it to demonstrate versatility.
Note
This pipeline uses workflow elements that were built with Linux/macOS in mind, and as such may not currently run on Windows.
Construct the workflow#
Open GRAPEVNE (or clear the graph view).
Select (vocpl) from the repository filter. Here you will find modules
corresponding to the different stages of our pipeline. Following the workflow
presented on the previous page, drag the modules listed below into the graph
view and connect them together in the correct order, using the In or fasta
inputs (do not connect the seeds inputs). Use the fasttree module for
maximum likelihood estimation.
Your workflow should now contain the following nodes:
subsample_alignment
nextalign
fasttree
treetime
dta
That’s our working pipeline complete! We now need to provide two pieces of
information: a seeds.txt file, containing the seeds for our analysis, and
the beta.fasta file, which is available through controlled access via GISAID
(if you are attending a GRAPEVNE workshop then this file will be provided
in-person).
In order to include a file (the beta.fasta file) from your local file system
we have some choices, but the simplest is to drag the LoadFile module into
our graph at the top of the workflow and connect it to the fasta input.
Within this module, ensure that the path to the local file is correctly
specified.
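Before pointing LoadFile at the file, it can help to confirm that the path resolves and that the file looks like FASTA. The following is a minimal sketch in Python, run outside GRAPEVNE; the helper name `check_fasta` is hypothetical and is not part of GRAPEVNE or the LoadFile module.

```python
from pathlib import Path


def check_fasta(path):
    """Return the number of FASTA records in `path`.

    Hypothetical helper for illustration only: raises if the file is
    missing or contains no header lines (lines starting with '>').
    """
    p = Path(path).expanduser()
    if not p.is_file():
        raise FileNotFoundError(f"No file found at {p}")
    with p.open() as handle:
        n_records = sum(1 for line in handle if line.startswith(">"))
    if n_records == 0:
        raise ValueError(f"{p} contains no FASTA header lines")
    return n_records
```

Calling `check_fasta` with the same path that you enter in the LoadFile module should return the number of sequences in the file; an exception indicates that the path or the file contents need attention.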
Finally, we need to provide the seeds for our analysis. We will do this by using
a module that supplies a seeds-file, provide_seeds. Conceptually, this module
could be replaced with a local file, a database query, or a prompt which asks
the user which seeds to use. Here, however, we are demonstrating that resources
can be provided along with our workflow modules (in fact, all scripts that are
used in this analysis are also resources which are downloaded automatically
when the workflow is run).
We now need to connect the provide_seeds module into our workflow. But where
should we connect it? While intuitively you might want to connect it to the
first module (subsample_alignment), or indeed connect it to all modules (which
would also work), we actually only need to connect the provide_seeds module to
the final module of the workflow: the dta module in our case. This is
because Snakemake is a build system, which means that we specify the desired
output, and Snakemake works out the steps necessary to produce that result.
Our graph provides a clear sequence of steps that must be undertaken
to take us from the provided beta.fasta file to the desired trait analysis
for our seeds of interest (dta and provide_seeds).
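The idea of working backwards from a desired output can be illustrated with a small Python sketch. This is a toy model, not Snakemake's actual implementation: the `DEPENDS_ON` dictionary and the `resolve` function are hypothetical, and the wiring is a simplified picture of the graph described in this section.

```python
# Toy model of target-driven resolution (not how Snakemake is implemented).
# Each module maps to the modules whose outputs it consumes; this wiring is
# an assumed, simplified picture of the graph described in this section.
DEPENDS_ON = {
    "LoadFile": [],
    "provide_seeds": [],
    "subsample_alignment": ["LoadFile"],
    "nextalign": ["subsample_alignment"],
    "fasttree": ["nextalign"],
    "treetime": ["fasttree"],
    "dta": ["treetime", "provide_seeds"],
}


def resolve(target, depends_on, order=None):
    """Return the modules needed to build `target`, upstream modules first."""
    if order is None:
        order = []
    for upstream in depends_on[target]:
        resolve(upstream, depends_on, order)
    if target not in order:
        order.append(target)
    return order


print(resolve("dta", DEPENDS_ON))
# ['LoadFile', 'subsample_alignment', 'nextalign', 'fasttree', 'treetime', 'provide_seeds', 'dta']
```

Requesting only the final target (`dta`) is enough to pull in every upstream step, which mirrors why the seeds only need to be attached at the end of the chain.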
Your workflow should now be ready to run. Select Build & Test and then wait
for the workflow to finish (this should take less than 10 minutes on a modern
laptop).
Change the tree estimation method#
By now the workflow should have completed and you should have access to the various files generated during the run.
Let us now imagine that we want to change the workflow and try out a different
maximum likelihood method for tree estimation. In particular, we want to replace
the fasttree module with another: the iqtree module.
Delete the fasttree module. Drag in and connect up the iqtree module. Done.
Re-run the workflow.
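Conceptually, swapping the tree estimation module only changes the connections that touch it; everything else in the graph stays as it was. A minimal Python sketch of that idea follows. The `swap_module` function and the dictionary layout are hypothetical illustrations and do not reflect how GRAPEVNE stores workflows.

```python
def swap_module(depends_on, old, new):
    """Return a copy of the graph with module `old` replaced by `new`.

    `depends_on` maps each module name to the list of modules it reads from.
    Hypothetical helper for illustration; the input graph is not modified.
    """
    swapped = {}
    for module, upstreams in depends_on.items():
        name = new if module == old else module
        swapped[name] = [new if u == old else u for u in upstreams]
    return swapped


# Assumed fragment of the pipeline around the tree estimation step.
pipeline = {
    "nextalign": ["subsample_alignment"],
    "fasttree": ["nextalign"],
    "treetime": ["fasttree"],
}

updated = swap_module(pipeline, "fasttree", "iqtree")
print(updated)
# {'nextalign': ['subsample_alignment'], 'iqtree': ['nextalign'], 'treetime': ['iqtree']}
```

Only the entry for the replaced module and the input of treetime differ between `pipeline` and `updated`, which corresponds to the two connections you redraw in the graph view.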