Preparation
Before whole genome sequence data can be imported, a Project with at least one Task Template for whole genome sequencing data must exist in the database.
Important: When working with read files (FASTQ), the Pipeline Mode should be used instead.

Choose Input WGS Data
Use the menu function File | Process Assembled Genome Data to open a dialog window. A Project and at least one Task Template must be selected. If a single Task Template is selected, the process can be limited to specific targets using the Define Targets checkbox. In the Input Sequence Data section, the files with whole genome sequence data can be selected. It is possible to either
Allowed input file formats are FASTA, GenBank, SAM/BAM, and ACE files. If SAM/BAM files do not contain a reference sequence, a dialog window opens in which a FASTA file with the reference sequence can be specified. The sequence names in the FASTA file must match the names in the SAM/BAM file. If one or more files are added to the list, a dialog with details is shown. This allows viewing and editing:
After confirming, the files are added as samples to the preview table. The icon is shown if the Sample ID does not exist in the project yet. If the Sample ID already exists in the project, the icon of the existing sample is shown, and the right-click menu offers functions to open the existing sample(s) or to delete them directly. If the dialog is confirmed with Sample IDs of existing samples, a warning is shown. The targets of the existing samples will be overwritten if overwriting was enabled in the General Settings. Press the button Target Scan Procedure Details... to show the parameters:
Click OK to start the process.

Scanning Targets in WGS Data
Now the reference sequences from the Task Template are searched for in the WGS data using the integrated BLAST. A target counts as found if a unique hit exists that meets the thresholds that were defined in the Task Template or overridden in the Target Scan Procedure Details.
Hint: For SAM/BAM files (they contain reference-mapped data), a special consensus caller is used. Reads with a mapping quality below a given threshold (default 10) are discarded when the SAM/BAM file is read. The threshold can be set in the Preferences.

Preview the Found Targets
If batch mode is disabled, a table with all found hits is shown per input data file. Each row in this table represents one target that was searched. Rows highlighted in red do not fulfill the defined thresholds. Rows for targets that already exist in a Sample with the same name are disabled. To enable overwriting of existing target sequences, select the checkbox Allow to replace existing targets. The first column of the table shows a checkbox that defines whether the found region should be extracted as the sequence for the searched target. By default, only targets that fulfill the thresholds unambiguously and that are not already present in an existing Sample are selected. The thresholds can be changed in this preview; the selection marks in the first column are then updated automatically. The selection marks can also be changed manually, row by row. Press the confirm button at the bottom of the window to create the new Samples or to extend existing ones.

Importing the Found Targets
Now the regions that match the found targets are extracted from the input data and added to new or existing Samples. If the input data contains read information (ACE/BAM file), the aligned reads for these regions are also extracted and imported according to the advanced settings. However, with default settings, the read data is discarded to reduce disk storage size if the target passes all analysis checks.
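
The mapping-quality filter mentioned in the hint on SAM/BAM files can be illustrated with a short sketch. This is not the software's own implementation. It is a minimal Python example that assumes text SAM input (a BAM file would first have to be decoded to SAM) and relies only on the SAM convention that the fifth tab-separated column holds MAPQ. The function name `filter_sam_by_mapq`, the parameter `min_mapq`, and the sample records are hypothetical and chosen for illustration only.

```python
# Illustrative sketch only: function name, parameter name and sample data are
# assumptions, not part of the described software.

def filter_sam_by_mapq(sam_lines, min_mapq=10):
    """Yield SAM header lines plus alignment lines with MAPQ >= min_mapq.

    Reads with a mapping quality below min_mapq are dropped, mirroring the
    documented default threshold of 10. Note that this sketch treats the
    special SAM value 255 ("mapping quality not available") like any other
    number, so such reads are kept.
    """
    for raw in sam_lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("@"):
            yield line  # header lines are passed through unchanged
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            raise ValueError(f"not a valid SAM alignment line: {line!r}")
        mapq = int(fields[4])  # column 5 of a SAM record is MAPQ
        if mapq >= min_mapq:
            yield line


# Tiny made-up SAM content: r1 (MAPQ 60), r2 (MAPQ 9), r3 (MAPQ 10).
SAMPLE_SAM = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:ref1\tLN:100",
    "r1\t0\tref1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tref1\t5\t9\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tref1\t9\t10\t4M\t*\t0\t0\tACGT\tIIII",
])

if __name__ == "__main__":
    for kept_line in filter_sam_by_mapq(SAMPLE_SAM.splitlines()):
        print(kept_line)
```

With the default threshold, the read `r2` (MAPQ 9) is dropped, while `r1` and `r3` are kept because "below the threshold" excludes the threshold value itself. Raising `min_mapq` makes the filter stricter, at the cost of discarding more reads before the consensus is called.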