For very large data sets with thousands of samples, we recommend using GrapeTree to draw the Minimum Spanning Tree (MST).
Installation of GrapeTree on WSL
GrapeTree can be installed into a conda environment on Windows Subsystem for Linux (WSL) as follows.
- Step 1: Open the Start menu, type
wsl -d ridom_ubuntu
and run it.
- Step 2: If the computer requires a proxy to access the Internet, the proxy must first be configured in WSL. To do this, place a file named
.condarc
containing the proxy configuration in the Linux user's home directory. The file can be created with the following command, replacing PROXYSERVER:PROXYPORT with your proxy server and port:
echo -e "\nproxy_servers:\n http: http://PROXYSERVER:PROXYPORT\n https: http://PROXYSERVER:PROXYPORT\n" >> ~/.condarc
- Step 3: When the black WSL console window has started up, enter the command:
conda create --name grapetree -c bioconda grapetree
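Steps 2 and 3 can also be checked with one small, re-runnable script in the WSL console. This is a sketch, not part of SeqSphere+: it assumes the placeholder PROXYSERVER:PROXYPORT is replaced with your own values (or PROXY set to an empty string if no proxy is needed). It uses `printf` plus a `grep` check instead of the plain `echo -e ... >>`, so running it twice does not append a second `proxy_servers` entry. It does not create the environment; it only tests an existing installation with `conda run`.

```shell
#!/usr/bin/env bash
# Sketch: Step 2 (proxy) and a check of Step 3 (installation) for the WSL console.
# PROXY is a placeholder; set it to your proxy, or to "" if no proxy is needed.
PROXY="PROXYSERVER:PROXYPORT"
CONDARC="$HOME/.condarc"

# Append the proxy section only once: grep -s stays silent if the file does
# not exist yet, and the check prevents a duplicate proxy_servers key on re-runs.
if [ -n "$PROXY" ] && ! grep -qs '^proxy_servers:' "$CONDARC"; then
  printf '\nproxy_servers:\n  http: http://%s\n  https: http://%s\n' "$PROXY" "$PROXY" >> "$CONDARC"
fi

if command -v conda >/dev/null 2>&1; then
  # Show the proxy settings conda actually reads from its configuration.
  conda config --show proxy_servers
  # Run GrapeTree's help inside the "grapetree" environment without activating it.
  if conda run -n grapetree grapetree --help >/dev/null 2>&1; then
    STATUS="installed"
  else
    STATUS="not installed"
  fi
else
  STATUS="conda not found"
fi
echo "GrapeTree: $STATUS"
```

If the last line prints "not installed", repeat the `conda create` command above and check the proxy settings shown by `conda config`.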
Creating a GrapeTree MST from a SeqSphere+ Comparison Table
GrapeTree export function
- Step 1: In the comparison table menu, choose File | Export profile and metadata files for GrapeTree (tsv). This function creates two TSV files: a profile file containing the allelic profiles and a metadata file containing the epidemiological metadata from the comparison table. To be accessible from WSL, these files must be saved locally on your computer, i.e. on your C: or D: drive, not on a network drive.
- Step 2: If GrapeTree is installed correctly, it is started automatically and creates an NWK (Newick) tree file. The GrapeTree local server is then started and a web browser opens automatically at http://127.0.0.1:8000/.
- Step 3: The GrapeTree page is shown in the web browser. Click the Load Files button and import first the NWK file and then the metadata file created by SeqSphere+. Further information about loading the files and modifying the tree layout can be found in the GrapeTree tutorial.
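If GrapeTree does not start automatically, the tree can also be computed by hand in the WSL console. The sketch below assumes the profile file was exported to C:\grapetree\profile.tsv (a hypothetical path; WSL mounts Windows drive C: under /mnt/c) and uses GrapeTree's command-line options -p (profile file) and -m (tree method, here MSTreeV2). The options SeqSphere+ passes to GrapeTree may differ, so the result is not guaranteed to be identical to the automatically created tree.

```shell
#!/usr/bin/env bash
# Sketch: create the NWK tree manually from the exported GrapeTree profile file.
# Hypothetical export location C:\grapetree\profile.tsv, seen by WSL under /mnt/c.
PROFILE="/mnt/c/grapetree/profile.tsv"
TREE="${PROFILE%.tsv}.nwk"   # same folder and base name, .nwk extension

if command -v conda >/dev/null 2>&1 && [ -f "$PROFILE" ]; then
  # -p: allelic profile file; -m MSTreeV2: GrapeTree's minimum spanning tree method.
  # The Newick tree is written to standard output, so redirect it into $TREE.
  conda run -n grapetree grapetree -p "$PROFILE" -m MSTreeV2 > "$TREE"
  echo "Tree written to $TREE"
else
  echo "conda or $PROFILE not found; check the export path and installation" >&2
fi
```

The resulting .nwk file can then be loaded on the GrapeTree page together with the metadata file.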
Tree Topology
We noticed slight differences in tree topology between the SeqSphere+ MST and the GrapeTree tree. These differences are most likely due to different handling of missing data and/or different tie-breaking rules.
Runtime and Memory Usage
The following table lists the runtime and memory usage for calculating and visualizing trees of Mycobacterium tuberculosis samples. Each cell gives the runtime in minutes, with the memory usage in parentheses.
| No. of samples | Intel i7, 4 cores, WSL | Intel Xeon, 5 cores, Linux | Intel Xeon, 10 cores, Linux |
|---|---|---|---|
| 5k | 2 min (5 GB) | 2 min (5 GB) | 1 min (5 GB) |
| 10k | 8 min (18 GB) | 6 min (18 GB) | 2 min (18 GB) |
| 15k | 14 min (19 GB) | 10 min (20 GB) | 7 min (20 GB) |