1 Overview
The Target Parameters of a Task Template are an important feature of SeqSphere+. They are used to configure the sequence processing and analysis algorithms for the targets of the Task Template. Each target of a Task Template has its own Target Parameters. The parameters can be edited for all targets at once, but in some cases it makes sense to change them for a specific target only. Some parts of the Target Parameters are only available for Sanger sequencing data Task Templates. Therefore, the following sections are marked with:
2 Base and Quality Calling
This dialog defines whether SeqSphere+ should perform quality, base, and heterozygote calling on all chromatograms that are added to a target. This feature is only available for Sanger sequencing data.
2.1 Perform base quality calling
SeqSphere+ can calculate a base quality for each base call in the reads. The quality values are related to the error probability of the base calls. This operation overwrites any existing base qualities that were calculated before and stored in the read files. It is recommended to enable this setting.
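This sketch illustrates how a quality value relates to an error probability. It assumes the Phred-style relation that section 4.1.1 uses for the quality consensus callers, quality = -10 * log10(error probability). It is plain Python, not part of SeqSphere+, and the function names are hypothetical.

```python
import math


def quality_to_error(q: float) -> float:
    """Error probability of a base call with Phred-style quality q: p = 10^(-q/10)."""
    return 10 ** (-q / 10)


def error_to_quality(p: float) -> float:
    """Phred-style quality for an error probability p: q = -10 * log10(p)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    print(f"Q{q}: error probability {quality_to_error(q):g}")
```

Each step of 10 in quality lowers the error probability tenfold. For example, Q20 corresponds to a 1% chance that the base call is wrong, and Q30 to 0.1%.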
2.2 Perform new base calling
SeqSphere+ has a built-in algorithm to perform base calling on chromatogram files. However, it is recommended to use the standard base-calling software shipped with your sequencing machine (if available), unless its results are poor.
2.3 Perform heterozygote calling
SeqSphere+ can detect heterozygote base calls in chromatograms. Heterozygote bases are called as ambiguities. The parameter for the second peak height/area configures the detection of mixed base positions. With the last option, the heterozygote caller only calls a heterozygote base if the average base quality of the surrounding region (10 bases before and after the position) exceeds a given minimum.
3 Read Trimming
This dialog defines trimming and processing options for the reads. This feature is only available for Sanger sequencing data.
3.1 Perform fixed length trimming
Cuts a fixed number of bases from the beginning of the sequence and trims the end of the sequence to a given length.
3.2 Perform read 5'/3' trimming
Trims the 5' and/or the 3' read region by base quality or by occurrences of the ambiguity N. For better assembly results, it is recommended to perform at least the trimming by Ns. If both trimming functions are selected, they are performed consecutively.
3.3 Read Rejection Thresholds
Three thresholds can be set to define the minimum quality a read must meet to be used in an assembly. If a read does not pass these thresholds, it is marked as rejected in the Navigation tree. If trimming is enabled, all thresholds refer to the trimmed sequence.
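The trimming options in 3.1 and 3.2 can be sketched as follows. This is a hypothetical illustration only, because the exact trimming rule SeqSphere+ uses is not described here. The window-based criterion (a window size and a maximum number of "bad" bases per window) and all function names are assumptions. Fixed-length trimming is assumed to cut the 5' bases first and then truncate the rest to the given length.

```python
from typing import Callable, List, Tuple

Read = Tuple[str, List[int]]  # (bases, per-base qualities)


def fixed_length_trim(seq: str, qual: List[int], start_cut: int, max_len: int) -> Read:
    """Assumed 3.1 behaviour: drop `start_cut` 5' bases, then keep at most `max_len` bases."""
    seq, qual = seq[start_cut:], qual[start_cut:]
    return seq[:max_len], qual[:max_len]


def window_trim(seq: str, qual: List[int], is_bad: Callable[[str, int], bool],
                window: int = 10, max_bad: int = 1) -> Read:
    """Hypothetical 5'/3' trimming: keep the region between the first and the last
    window that contains at most `max_bad` bad bases. Reads shorter than one
    window are trimmed away completely in this sketch."""
    n = len(seq)

    def window_ok(i: int) -> bool:
        return sum(is_bad(seq[j], qual[j]) for j in range(i, i + window)) <= max_bad

    start = next((i for i in range(n - window + 1) if window_ok(i)), None)
    if start is None:
        return "", []
    # Scan from the 3' end; at the latest this finds the window found at `start`
    end = next(i + window for i in range(n - window, -1, -1) if window_ok(i))
    return seq[start:end], qual[start:end]


seq = "NNACGTACGTACGTACGTNN"
qual = [5, 5] + [35] * 16 + [5, 5]
# N trimming first, then quality trimming, applied one after the other
seq, qual = window_trim(seq, qual, lambda b, q: b == "N", window=5, max_bad=0)
seq, qual = window_trim(seq, qual, lambda b, q: q < 20, window=5, max_bad=0)
print(seq)
```

N trimming and quality trimming reuse the same routine with different predicates. This mirrors the text, where both trimmings can be enabled and are performed consecutively.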
4 Read Assembling
This dialog defines the consensus calling and assembling functions. This feature is only available for Sanger sequencing data.
4.1 Consensus caller
The consensus caller for the contigs of an assembly can be chosen here. Currently, five different consensus callers are available.
4.1.1 Quality Consensus Caller Without Resolving of Ambiguities
This algorithm calculates the consensus base for a sequence column using Bayes' formula. The qualities of the read bases and the read orientations are used in this calculation. In a first step, separate quality values for the forward and reverse directions are calculated with Bayes' formula. In a second step, Bayes' formula is used again to calculate a combined consensus quality value. Ambiguity bases are counted as separate base types and are not resolved. Each gap is given a quality of 20, and if the quality sum of the gaps exceeds the quality sum of all base calls, the consensus will have a gap at this position. This consensus calling method should be used for input sequences with possible heterozygous positions (determined by a heterozygote caller) and with quality values related to an error probability by the formula <math>\text{quality} = -10 \cdot \log_{10}(\text{error probability})</math>.
4.1.2 Quality Consensus Caller With Resolving of Ambiguities
This algorithm is very similar to the first one (Quality Consensus Caller Without Resolving of Ambiguities). The only difference is the treatment of ambiguities: ambiguous bases are resolved here. For example, an S counts as both C and G. This consensus calling method should be used for input sequences (usually with the heterozygote caller turned off) with quality values related to an error probability by the formula <math>\text{quality} = -10 \cdot \log_{10}(\text{error probability})</math>.
4.1.3 Majority Consensus Caller
This algorithm gives each base the same weight. The consensus base is determined by the majority base type of a column.
Ambiguities cast a vote for each possible base they represent; for example, an S counts for C and G. If there is no unique majority base, an ambiguity including all bases will be called. If there are 66% or more gaps at a column position, the consensus will have a gap at this position. This consensus calling method should be used for input sequences without quality values.
4.1.4 Strict Consensus Caller
This algorithm performs a consensus calling that allows only a very limited number of mismatches. In general, the algorithm performs a majority consensus call. However, all consensus columns with a coverage of only one read get an N consensus base. The table below shows the maximum allowed number of mismatches in a column for a given coverage. More mismatches lead to an N call. All ambiguous base calls are treated as N. If there are 66% or more gaps at a column position, the consensus will have a gap at this position. This consensus calling method should be used for forensic DNA typing.
4.1.5 Inclusive Consensus Caller
This algorithm includes every base type in a column. The consensus base is the smallest ambiguity base covering all read bases. If there are 66% or more gaps at a column position, the consensus will have a gap at this position.
4.2 Minimum read overlap
This parameter configures the minimum overlap that two reads must share before they are aligned together. Under normal circumstances, the default value of 50 works well.
4.3 Mismatch / Gap Opening / Gap Extension Penalties
The penalty values for the assembling algorithm. The defaults are 3 (mismatch), 5 (gap opening), and 2 (gap extension).
4.4 Assemble to ref.-seq. only
Instead of aligning all reads to each other in the assembling process, only the reference sequence can be used to guide the building of an assembly. This speeds up the process.
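The inclusive consensus rule in 4.1.5 maps directly onto the IUPAC nucleotide codes, because every non-empty subset of A/C/G/T has exactly one code. Below is a minimal sketch for a single alignment column. It assumes standard IUPAC codes and assumes that an ambiguity code in a read contributes every base it represents. The function name is hypothetical, and the 66% gap rule is taken from the text above.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of bases -> the smallest IUPAC code covering exactly that set
CODE_FOR_BASES = {frozenset(bases): code for code, bases in IUPAC.items()}


def inclusive_consensus(column: str) -> str:
    """Inclusive consensus for one alignment column; '-' marks a gap."""
    if not column:
        raise ValueError("empty column")
    column = column.upper()
    # Gap rule from the text: 66% or more gaps give a gap in the consensus
    if column.count("-") >= 0.66 * len(column):
        return "-"
    covered = set()
    for call in column:
        if call != "-":
            covered |= set(IUPAC[call])  # ambiguities contribute every base they represent
    return CODE_FOR_BASES[frozenset(covered)]


for col in ("AAG", "ACS", "CT-", "A--"):
    print(col, "->", inclusive_consensus(col))
```

A column with A and G reads yields R. A column containing A, C, and an S read yields V, because S adds both C and G.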
5 Reference Sequence
5.1 Reference Sequence
Defines a reference sequence (ref.-seq.) for the target. A reference sequence is mainly used to orientate and crop a contig and to define variant positions. The reference sequence can be imported from a file or pasted from the clipboard. If the Task Template was created from a seed genome, the ref.-seq. is taken from the seed core gene of this target.
5.2 Layer Settings
A layer defines a coding area that does not need to be continuous. Each layer can consist of multiple areas.
5.2.1 ref.-seq. Areas
Areas are specially labeled, continuous regions of a reference sequence.
They can be imported from a GenBank sequence file, or they can be created manually.
Areas are orientated (forward or reverse) and they can be marked as translatable.
5.2.2 ref.-seq. Layers
Layers are used to group one or more non-overlapping areas of a reference sequence.
By default, the first layer always covers the whole reference sequence.
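This small data model illustrates how areas and layers relate: areas are continuous, oriented regions, and a layer groups non-overlapping areas. It is hypothetical throughout. The class names, the 1-based inclusive coordinates, the overlap check, and the forward orientation of the default area are illustrative assumptions, not the SeqSphere+ data format or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Area:
    """Hypothetical ref.-seq. area: a labeled, continuous region (1-based, inclusive)."""
    label: str
    start: int
    end: int
    forward: bool = True
    translatable: bool = False


@dataclass
class Layer:
    """Hypothetical layer: a group of one or more non-overlapping areas."""
    name: str
    areas: List[Area] = field(default_factory=list)

    def add(self, area: Area) -> None:
        if area.start > area.end:
            raise ValueError(f"{area.label}: start must not exceed end")
        for other in self.areas:
            # Two closed intervals overlap if each one starts before the other ends
            if area.start <= other.end and other.start <= area.end:
                raise ValueError(f"{area.label} overlaps {other.label}")
        self.areas.append(area)

    def covered_length(self) -> int:
        """Number of ref.-seq. positions covered by the layer's areas."""
        return sum(a.end - a.start + 1 for a in self.areas)


def default_layer(ref_length: int) -> Layer:
    """First layer: one area spanning the whole reference sequence (assumed forward)."""
    layer = Layer("Layer 1")
    layer.add(Area("whole ref.-seq.", 1, ref_length))
    return layer


cds = Layer("CDS")
cds.add(Area("exon 1", 1, 100, translatable=True))
cds.add(Area("exon 2", 201, 300, translatable=True))
print(cds.covered_length())
```

A layer like `cds` models a non-continuous coding area built from two separate areas. Adding an area that overlaps an existing one is rejected.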
5.2.3 Different Position Specifications
5.3 Contig Settings
5.3.1 Contig Signatures
Defines the 5' and the 3' signature for a contig target. For convenience, the signatures can be copied directly from a selection made in the reference sequence (buttons 5' and 3'). In addition, the signatures can be imported from a multiple alignment file by using the From Sequence Library button. In this case, an inclusive consensus is called, and the beginning and the end of the consensus can be used as signatures. The Including signatures check box defines whether the signatures themselves belong to the contig. The signatures may contain ambiguities. In this case, a signature matches every base that is expressed by the ambiguity, as well as the ambiguity itself. If the signature should match only the individual bases and not the ambiguity, the base characters must be grouped with [ ]. Example: W matches A, T, or W, but [AT] matches only A or T.
5.3.2 ref.-seq. Alignment Settings
The gap opening penalty and the gap extension penalty for the alignment between the consensus and the ref.-seq. can be set here.
5.3.3 Contig Cropping and Orientation
The signatures and the reference sequence can be used to orientate and trim the contig automatically.
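The matching rules for signatures can be expressed as a regular expression. Below is a minimal sketch that translates a signature into a Python `re` pattern. An ambiguity code matches its bases and the code itself, while a `[ ]` group matches only the listed bases. The function name is hypothetical. Several behaviours are assumptions: a plain base matches only itself, ambiguities written inside a group are expanded to their bases, and the contig is uppercase.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def signature_to_regex(signature: str) -> re.Pattern:
    """Translate a contig signature into a compiled regular expression."""
    sig = signature.upper()
    parts = []
    i = 0
    while i < len(sig):
        ch = sig[i]
        if ch == "[":
            close = sig.find("]", i)
            if close == -1:
                raise ValueError("unclosed [ in signature")
            bases = set()
            for c in sig[i + 1:close]:
                bases |= set(IUPAC[c])  # group: only the bases, never the ambiguity code
            if not bases:
                raise ValueError("empty [ ] group in signature")
            parts.append("[" + "".join(sorted(bases)) + "]")
            i = close + 1
        else:
            bases = IUPAC[ch]
            # Ambiguity: matches each represented base and the ambiguity itself
            parts.append(ch if len(bases) == 1 else "[" + bases + ch + "]")
            i += 1
    return re.compile("".join(parts))


pattern = signature_to_regex("GAW")
match = pattern.search("TTGATCC")
print(pattern.pattern, match.start() if match else None)
```

With this translation, `W` becomes `[ATW]` and `[AT]` stays `[AT]`, which reproduces the example in the text.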
6 Target QC Procedure
All targets are automatically checked for the quality issues that were defined in the Analysis Parameters of the Task Template. Each check can be set to Ignore, Warning, or Error.
By default, the parameters are the same for all targets, but they can also be defined per target. If a target has multiple sequences (and this is allowed by the first check), all further checks are performed on the first sequence only. The checks always work on the areas of the layers that are defined for the target in the Task Template. Usually (e.g., in cgMLST), a target has only one layer with a single area that covers the whole target sequence and defines its orientation (forward/reverse). Therefore, the terms 'layer' and 'consensus area(s)' are in most cases equivalent to the target sequence. The following checks can be set to Ignore/Warning/Error:
6.1 Processing Options
7 View Settings Defaults
These settings define which views are opened initially for the contigs of the corresponding Task Template.
7.1 Contig Alignment
7.2 Additional Views
7.3 Information Tables
7.4 Task Entry Overview