SeqSphere+ can be used to download FASTQ files from the NCBI Sequence Read Archive (SRA). Invoke the function Tools | Download FASTQ from SRA to open a dialog window, then enter or import the NCBI accessions that should be downloaded.
The following types of accessions are supported (see more details at NCBI):
The metadata found for the given accessions is shown in a table in which each row represents one SRA run. Each SRA run belongs to an SRA experiment, each SRA experiment belongs to an SRA sample, and each SRA sample belongs to an SRA study. An SRA experiment can contain multiple SRA runs performed on the same library. An SRA sample can contain multiple SRA experiments, and it is usually not a good idea to assemble reads across different experiments.
All SRA samples have a Sample Alias, and most SRA samples also have a Strain Name and a Sample Title; none of these has to be unique. By default, the Strain Name is used as the SeqSphere+ Sample ID and as the FASTQ File Name Trunk. If the SRA sample has no Strain Name, the Sample Alias or the Sample Title is used instead. In the SeqSphere+ Sample ID and the FASTQ File Name Trunk, underscores and other special characters (e.g., !, :, /) are replaced by a space (the unchanged names are stored in the searchable SeqSphere+ data fields Strain and Alias ID(s), respectively). In addition, the SRA run accession is appended to the FASTQ File Name Trunk with a leading underscore.
The SeqSphere+ Sample ID determines how the downloaded reads are treated. If there are potential problems with the Sample ID, context-sensitive warnings are shown below the table in the left corner of the window. When FASTQs and metadata are downloaded with default settings, multiple SRA runs of the same SRA experiment are assembled together once a pipeline with default file naming parameters is started. However, multiple SRA experiments of the same SRA sample are then assembled together as well. To avoid this, either use the 'Append SRA experiment accession to SeqSphere+ Sample ID' checkbox option to assemble the experiments separately or apply the 'SRA experiments' filter. Similarly, if several SRA samples share the same Strain Name, their reads would wrongly be assembled together. In this case, either use the 'Append SRA experiment accession to SeqSphere+ Sample ID' checkbox option to assemble the SRA samples separately or apply the 'SRA samples' filter. Finally, in the unlikely case that two SRA samples have no Strain Name but the same Sample Alias, the warning 'Different SRA samples with the same SeqSphere+ Sample ID were found!' is shown. Again, the issue can be solved by using the 'Append SRA experiment accession to SeqSphere+ Sample ID' checkbox option to assemble the SRA samples separately.
The table can be filtered using the Filter Settings button. Four different filters are available:
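The Sample ID and file naming rules described above (not the filters) can be illustrated with a short Python sketch. It is only an approximation of the documented behavior, not SeqSphere+ code: the dictionary keys, the function names `derive_names` and `find_conflicts`, the fallback order (Sample Alias before Sample Title), the space used when appending the experiment accession, and the accessions in `EXAMPLE_RUNS` are all assumptions made for illustration. Only the special characters named in the text are replaced here; the real set is probably larger.

```python
import re
from collections import defaultdict

# Only the characters named in the text; the set used by SeqSphere+ is likely larger.
SPECIAL_CHARS = re.compile(r"[_!:/]")


def derive_names(run, append_experiment=False):
    """Return (sample_id, file_name_trunk) for one SRA run given as a dict."""
    # Assumed fallback order: Strain Name, then Sample Alias, then Sample Title.
    raw = run.get("strain") or run.get("alias") or run.get("title") or ""
    base = SPECIAL_CHARS.sub(" ", raw)
    sample_id = base
    if append_experiment:
        # Separator is an assumption; the text only says the accession is appended.
        sample_id = f"{base} {run['experiment']}"
    # The run accession is attached to the trunk with a leading underscore.
    trunk = f"{base}_{run['run']}"
    return sample_id, trunk


def find_conflicts(runs, append_experiment=False):
    """Report Sample IDs that cover more than one SRA experiment or SRA sample."""
    experiments = defaultdict(set)
    samples = defaultdict(set)
    for run in runs:
        sample_id, _ = derive_names(run, append_experiment)
        experiments[sample_id].add(run["experiment"])
        samples[sample_id].add(run["sample"])
    return {
        sid: {"experiments": sorted(experiments[sid]), "samples": sorted(samples[sid])}
        for sid in experiments
        if len(experiments[sid]) > 1 or len(samples[sid]) > 1
    }


# Made-up placeholder records, not real SRA entries.
EXAMPLE_RUNS = [
    {"run": "SRR0000001", "experiment": "SRX0000001", "sample": "SRS0000001",
     "strain": "K-12_MG1655", "alias": "a1", "title": "t1"},
    {"run": "SRR0000002", "experiment": "SRX0000002", "sample": "SRS0000001",
     "strain": "K-12_MG1655", "alias": "a1", "title": "t1"},
    {"run": "SRR0000003", "experiment": "SRX0000003", "sample": "SRS0000002",
     "strain": "", "alias": "iso/7", "title": "t2"},
]

if __name__ == "__main__":
    for run in EXAMPLE_RUNS:
        print(derive_names(run))
    print(find_conflicts(EXAMPLE_RUNS))
```

In this example the first two runs share one Sample ID although they come from different experiments, which is the situation the warnings point to. Calling `find_conflicts(EXAMPLE_RUNS, append_experiment=True)` mimics the checkbox option and leaves no shared IDs.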
When the FASTQ files are downloaded from SRA, SeqSphere+ automatically creates a SPEC file with the same file name that contains all available metadata. After the FASTQ files are downloaded, they can be processed using a SeqSphere+ Pipeline. The metadata from the SPEC files is imported automatically by the pipeline.
SeqSphere+ first tries to download the SRA file via a direct HTTPS download and then converts it to a FASTQ file using the SRA toolkit (fastq-dump). If this approach fails for any reason, the SRA toolkit is also used to retrieve the data, which normally takes longer than the direct download.
Get a List of Available Run Accessions of a Certain Species
A list of accessions for all available SRA sequences of a certain species can be downloaded from the SRA website using the following steps:
The list of run accessions can be entered in the SeqSphere+ Tools | Download FASTQ from SRA dialog to download the metadata and the FASTQ files. The metadata can also be exported, and a tool like MS Excel can be used to filter and/or sort the run accessions (e.g., by country and/or time). The remaining run accessions can then be entered again in the SeqSphere+ Tools | Download FASTQ from SRA dialog to download and process only the data of the samples required for analysis.
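As an alternative to a spreadsheet, the exported metadata can be filtered with a few lines of Python. The sketch below assumes a tab-separated export with columns named `Run Accession`, `Country`, and `Collection Date` (with the year in the first four characters); these column names, the delimiter, and the function name `select_runs` are assumptions and have to be adapted to the actual export.

```python
import csv
import io


def select_runs(table_text, country=None, min_year=None, max_year=None, delimiter="\t"):
    """Return the run accessions whose rows match the country and year range."""
    selected = []
    for row in csv.DictReader(io.StringIO(table_text), delimiter=delimiter):
        # Column names are assumptions; adapt them to the exported table.
        if country is not None and row["Country"].strip().lower() != country.lower():
            continue
        if min_year is not None or max_year is not None:
            year_text = row["Collection Date"].strip()[:4]
            # Rows without a usable year are dropped when a year bound is set.
            if not year_text.isdigit():
                continue
            year = int(year_text)
            if min_year is not None and year < min_year:
                continue
            if max_year is not None and year > max_year:
                continue
        selected.append(row["Run Accession"])
    return selected


if __name__ == "__main__":
    # "metadata_export.tsv" is a hypothetical file name.
    with open("metadata_export.tsv", encoding="utf-8") as handle:
        accessions = select_runs(handle.read(), country="Germany", min_year=2014)
    # One accession per line, ready to paste into the download dialog.
    print("\n".join(accessions))
```

The resulting list can be pasted or imported into the Tools | Download FASTQ from SRA dialog in the same way as a manually filtered list.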