Fei Bioinformatics Lab

SVFilter

Introduction

System requirements and dependencies

Installation

SV file format

Run SVFilter

Frequently Asked Questions (FAQs)

Download SVFilter

Contact

Introduction

Genomic structural variations (SVs), including large deletions, insertions, inversions, duplications and translocations, constitute an important source of genetic diversity. Recent advances in next-generation sequencing (NGS) technologies and computational algorithms have enabled genome-wide mapping of SVs at fine resolution. However, the false discovery rate of current SV discovery programs remains high. We have developed the following five filters, which can be used to efficiently identify false SVs.

Ratio filter - filter based on the ratio of normal to abnormal reads. Within an anchoring window where abnormal reads cluster and form an SV, if a substantial number of normal reads that share the same orientation as the abnormal reads are also present, then this SV is discarded.

SNV filter - filter based on the SNVs between normal and abnormal reads. Within an anchoring window, if SNV(s) can be detected between normal and abnormal reads, then this SV is removed.

Gap filter - filter based on gaps in the identified SVs. If an SV spans a genomic region that contains one or more gaps, then this SV is discarded. Such an SV spans at least two contigs, or even two scaffolds.

Read coverage filter - filter based on the read coverage of the potential SV by normal reads. This filter can only be applied to deletion events. If a substantial fraction of the deleted region is covered by normal reads, then this candidate deletion is removed.

Sequencing depth filter - filter based on the sequencing depth of the potential SV region. This filter can be applied to tandem duplication events. If the average sequencing depth over the duplicated segment is not significantly higher than the genome-wide average, then the predicted tandem duplication is rejected.
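
As an illustration of the gap filter described above, the following shell/awk sketch labels each SV as kept or discarded by testing whether any known gap overlaps the SV. It is not the SVFilter implementation: the function name gap_sketch, the three-column gaps file, and the choice to treat an SV as spanning from field 2 to field 7 of the SV file format are all assumptions made for illustration.

```shell
# Sketch only; this is not the SVFilter implementation.
# Usage: gap_sketch GAPS_FILE SV_FILE
#   GAPS_FILE (hypothetical format): chromosome <TAB> gap_start <TAB> gap_end
#   SV_FILE: the 11-field tab-delimited SV list
gap_sketch() {
    awk -F'\t' '
        # First file: remember every gap interval.
        FILENAME == ARGV[1] {
            n++
            gchr[n] = $1; gstart[n] = $2 + 0; gend[n] = $3 + 0
            next
        }
        # Second file: label each SV as kept or discarded.
        {
            verdict = "kept"
            # Assumption: only SVs with both windows on one chromosome/scaffold are tested.
            if ($1 == $5) {
                lo = $2 + 0; hi = $7 + 0
                for (i = 1; i <= n; i++) {
                    # Standard interval-overlap test between the gap and the SV span.
                    if (gchr[i] == $1 && gstart[i] <= hi && gend[i] >= lo) {
                        verdict = "discarded"
                        break
                    }
                }
            }
            print verdict "\t" $0
        }
    ' "$1" "$2"
}
```

Each output line is the original SV record prefixed with its verdict, so the two groups can be separated afterwards with grep or awk.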

Please check this figure for further explanation of these filters.

System requirements and dependencies

Linux (required) - Mac OS X is not supported
samtools

Installation

Download SVFilter and extract the downloaded archive.

$ tar -xzvf SVFilter-1.0.tar.gz

This will generate a directory named "SVFilter-1.0". The directory contains three subdirectories:

bin directory: includes all executables.
test_files directory: includes all input files needed to test the SV filter programs.
src directory: includes the C++ source code.

The executables under the "bin" directory were pre-compiled on a 64-bit Linux machine. On a 32-bit Linux machine, the user needs to compile the C++ source code and then move the executables to the "bin" directory. This can be done by running the "install.sh" shell script (sh install.sh) provided in the package.

Next, add the "bin" directory to the PATH environment variable.
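
For example, in a Bourne-compatible shell such as bash, the PATH update might look like the sketch below. The location $HOME/SVFilter-1.0 and the variable name SVFILTER_HOME are assumptions for illustration; substitute the directory where the archive was actually extracted.

```shell
# Assumed install location; change this to wherever SVFilter-1.0 was extracted.
SVFILTER_HOME="$HOME/SVFilter-1.0"

# Prepend the bin directory so the filter executables are found by name.
export PATH="$SVFILTER_HOME/bin:$PATH"
```

To make the change persistent, the same two lines can be appended to the shell's startup file (for bash, typically ~/.bashrc).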

SV file format

Each of the SV filters requires a tab-delimited text input file that lists the SVs (deletion, insertion, inversion, duplication, etc.). Here is an example of the file. Each SV record contains 11 fields, which are explained below:

field No.	field name	example value	notes
1	chromosome/scaffold ID	chr1	Fields 1-4 define the left anchoring window of the SV: its location (chromosome/scaffold, start and end coordinates) and the orientation of the abnormal reads mapped within the window (R for reverse, F for forward)
2	start position	18099267
3	end position	18099607
4	read strand	R
5	chromosome/scaffold ID	chr1	Fields 5-8 define the right anchoring window of the SV, in the same way as fields 1-4
6	start position	18100733
7	end position	18101053
8	read strand	F
9	number of abnormal pairs	4
10	abnormal read IDs	(FC42CA5AAXX:5:51:1514:1044#0, FC42CA5AAXX:5:95:188:1715#0, FC42CA5AAXX:5:23:1419:1684#0, FC42CA5AAXX:5:78:1132:1599#0)	This field lists the IDs of the abnormal read pairs, one ID per pair. The IDs must be enclosed in parentheses and separated by commas
11	SV type	DELETION	The type of the SV, including DELETION, INSERTION, INVERSION and LARGE_DUPLI
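
Before running the filters, it can help to check that an input file follows the 11-field layout described above. The following shell/awk function is a minimal sketch of such a check, not part of SVFilter: the name check_sv_file is hypothetical, and the sketch assumes the file has no header line. It deliberately does not restrict field 11 to a fixed set of SV types.

```shell
# Sketch only; check_sv_file is a hypothetical helper, not an SVFilter program.
# Usage: check_sv_file SV_FILE
# Prints one message per problem and returns non-zero if any line is malformed.
check_sv_file() {
    awk -F'\t' '
        # Every record must have exactly 11 tab-delimited fields.
        NF != 11 {
            print "line " NR ": expected 11 fields, found " NF
            bad = 1
            next
        }
        {
            # Fields 2, 3, 6 and 7 are the window start and end coordinates.
            if ($2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $6 !~ /^[0-9]+$/ || $7 !~ /^[0-9]+$/) {
                print "line " NR ": window coordinates must be integers"
                bad = 1
            }
            # Fields 4 and 8 are the read strands.
            if ($4 !~ /^[FR]$/ || $8 !~ /^[FR]$/) {
                print "line " NR ": read strand must be F or R"
                bad = 1
            }
            # Field 9 is the number of abnormal pairs.
            if ($9 !~ /^[0-9]+$/) {
                print "line " NR ": number of abnormal pairs must be an integer"
                bad = 1
            }
            # Field 10 is the parenthesized list of read IDs.
            if ($10 !~ /^\(.*\)$/) {
                print "line " NR ": read IDs must be enclosed in parentheses"
                bad = 1
            }
        }
        END { exit bad + 0 }
    ' "$1"
}
```

A call such as check_sv_file test_SV prints nothing and returns 0 when every line passes these checks.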

Run SVFilter

The five filters implemented in SVFilter are run separately. Each filter generates two output files: one lists the SVs that pass the filter (kept), and the other lists the SVs that the filter discards. The file containing the kept SVs is in the format described above and can be used as input for the other filters. Click here for a detailed description of the files containing discarded SVs.

Ratio filter - run the program:

$ ratiofilter test_SV normPair.sam 0.2 75