iAssembler (current version: v1.3.3 - 04/06/17)
Introduction
iAssembler is a standalone package for assembling ESTs generated with Sanger and/or Roche-454 pyrosequencing technologies into contigs. By employing an iterative assembly strategy and automated correction of mis-assemblies, the pipeline achieves much higher accuracy in EST assembly than other existing assemblers. iAssembler first performs iterative assemblies using MIRA and CAP3 (default: four cycles of MIRA assembly followed by one CAP3 assembly) to correct the assembly errors that occur frequently when only one round of assembly is performed; most of these errors are sequences derived from the same transcript that fail to be assembled together. The program then performs post-assembly quality checking by 1) aligning each EST sequence to its corresponding unigene sequence to identify mis-assemblies; and 2) performing all-versus-all pairwise sequence alignments of unigenes to identify sequences derived from the same transcript that failed to be assembled together. The program then corrects the identified mis-assemblies automatically.
From version 1.3.2, iAssembler (64-bit version) can be used to perform a second assembly of a transcriptome produced by a transcriptome assembler (e.g. Trinity).
Citation:
Zheng Y, Zhao L, Gao J, Fei Z. (2011) iAssembler: a package for de novo assembly of Roche-454/Sanger transcriptome sequences. BMC Bioinformatics 12:453.
Check the short presentation on iAssembler for more information.
System requirements and dependencies
Release notes
- iAssembler v1.3.3 - 04/06/17. Changes from previous version:
- Added -c parameter for processing reads from strand-specific libraries
- iAssembler v1.3.2 - 07/20/12. Changes from previous version:
- Fixed a small bug in parsing megablast result to get sequence length
- iAssembler v1.3.1 - 06/04/12. Changes from previous version:
- Added support for MIRA v3.4.0. iAssembler no longer supports older versions of MIRA
- Fixed a bug in parsing megablast result
- iAssembler v1.3 - 05/04/11. Changes from previous version:
- Added a function to correct unigene base errors
- Added headers to the output SAM file
- iAssembler v1.2.2 - 03/28/11. Changes from previous version:
- Replaced MIRA with a newer version (V2.9.43 -> V3.2.0).
- Fixed several other small bugs
- iAssembler v1.2.1 - 12/02/10. Changes from previous version:
- Fixed a small bug - [-e can't be less than 6].
- iAssembler v1.2 - 08/02/10. Changes from previous version:
- Compatible with MIRA version 3.x
- iAssembler v1.1 - 06/23/10. Changes from previous version:
- Fixed the error that caused EST clustering to fail for datasets containing highly redundant sequences
- Fixed several other small bugs
- iAssembler v1.0 - 05/21/10. Changes from previous version:
- Added an output file in SAM format. The file contains the alignment information of each sequence read to its corresponding unigene and can be viewed with several visualization programs such as Tablet and IGV.
- Combined percent identity cutoff for clustering (-x) and assembly (-p) into a single parameter (-p). Parameter -x is disabled
- Disabled clustering using blastn. Currently only megablast is used for clustering. Parameter -b now has different meaning (see below)
- Added -b parameter which specifies the number of threads used for MIRA assembly program
- Added -d parameter to control whether to generate program log files
- iAssembler v1.0 (beta) - 04/13/10
Installation
Installation of iAssembler is straightforward. Just download the appropriate version of iAssembler for your system and uncompress the downloaded file.
shell$ tar -xzvf iAssembler-1.0.x32.tar.gz
This will generate a folder named "iAssembler-1.0.x32" on a 32-bit machine or "iAssembler-1.0.x64" on a 64-bit machine (we call this folder the "iAssembler home folder"). The iAssembler home folder includes two subfolders: a "bin" folder, which contains all executables, and a "doc" folder, which contains the program documentation and the example configuration file (see below). The home folder also contains a Perl script, iAssembler.pl, which is the core script that runs the whole iAssembler pipeline.
Run iAssembler
Quick Start
- Put the EST sequence file in FASTA format (assuming the file name is input_EST_seq) into the iAssembler home folder.
- Go to the iAssembler home folder and run iAssembler with the following command:
shell$ perl iAssembler.pl -i input_EST_seq
- The program will generate an output folder named input_EST_seq_output, which contains all the output files. See below for the description of the output files.
Input files
iAssembler takes a sequence file in FASTA format, and optionally the corresponding sequence quality file, as its input. Before assembly, the sequences must be processed and cleaned by removing low-quality regions and sequences derived from adapters, vectors, rRNAs, and tRNAs, as well as organelle sequences such as those from chloroplasts and mitochondria. iAssembler itself does not provide functions to clean and trim raw sequences. Two programs that can be used for this purpose are lucy and seqclean.
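When a quality file is supplied with the sequence file, it can help to confirm that the two files refer to the same records before starting an assembly. The following is a minimal sketch of such a check. The function names `fasta_ids` and `ids_missing_quality` are hypothetical and not part of iAssembler, and the idea that every sequence should have a matching quality record is an assumption rather than a documented requirement.

```python
def fasta_ids(lines):
    """Return record IDs (first word after '>') in file order.

    Works for any file that uses FASTA-style '>' header lines.
    """
    return [
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    ]


def ids_missing_quality(seq_lines, qual_lines):
    """List sequence IDs that have no record in the quality file.

    Assumption: each sequence is expected to have a quality record
    with the same ID. This is not stated in the iAssembler docs.
    """
    qual_ids = set(fasta_ids(qual_lines))
    return [seq_id for seq_id in fasta_ids(seq_lines) if seq_id not in qual_ids]


# In-memory example; with real files, pass open file handles instead.
example_seq = [">EST0001 sample read\n", "ACGT\n", ">EST0002\n", "GGCC\n"]
example_qual = [">EST0001\n", "30 30 30 30\n"]
print(ids_missing_quality(example_seq, example_qual))
```

With real data, the two arguments can be file objects opened on the files that would be passed to -i and -q.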
Parameters
(Note: Based on our experience, the default settings of iAssembler achieve very high quality assemblies for most Sanger and/or 454 ESTs.)
Section 1: Input parameters
-i | [String] | Name of the input sequence file in FASTA format (required) |
-q | [String] | Name of the quality file in FASTA format (default: none) |
Section 2: Assembly parameters
-a | [Integer] | number of CPUs used for megablast clustering (default = 1) |
-b | [String] | number of CPUs used for MIRA assembly program (default = 1) |
-e | [Integer] | maximum length of end clips (6~100; default = 30) |
-h | [Integer] | minimum overlap length (>=30; default = 40) |
-p | [Integer] | minimum percent identity for sequence clustering and assembly (95~100; default = 97) |
-m | | disable cap3 and mira |
-c | | only for sequences assembled from strand specific RNA-seq |
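The documented ranges for -e, -h and -p can be checked before a run is launched, for example when iAssembler is called from a wrapper script. The sketch below builds the command line from the flags listed above and rejects out-of-range values. The function `build_iassembler_command` and its keyword names are hypothetical; only the flags, ranges and defaults come from the parameter tables.

```python
def build_iassembler_command(input_fasta, quality=None, cpus_megablast=1,
                             cpus_mira=1, end_clip=30, min_overlap=40,
                             min_identity=97):
    """Return the argument list for iAssembler.pl.

    Values are checked against the ranges given in the parameter tables:
    -e 6~100, -h >= 30, -p 95~100.
    """
    if not 6 <= end_clip <= 100:
        raise ValueError("-e must be between 6 and 100")
    if min_overlap < 30:
        raise ValueError("-h must be at least 30")
    if not 95 <= min_identity <= 100:
        raise ValueError("-p must be between 95 and 100")

    cmd = ["perl", "iAssembler.pl", "-i", input_fasta]
    if quality is not None:
        cmd += ["-q", quality]
    cmd += [
        "-a", str(cpus_megablast),
        "-b", str(cpus_mira),
        "-e", str(end_clip),
        "-h", str(min_overlap),
        "-p", str(min_identity),
    ]
    return cmd


# Print the command instead of running it.
print(" ".join(build_iassembler_command("input_EST_seq", min_identity=98)))
```

The returned list can be passed to `subprocess.run` from the iAssembler home folder. Building a list rather than a single string avoids shell quoting problems with unusual file names.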
Section 3: Output parameters
-u | [String] | prefix used for IDs of the assembled unigenes (default = UN). iAssembler names the resulting unigenes with this prefix followed by trailing numbers, e.g., UN00001 |
-l | [Integer] | length of the trailing numbers in unigene IDs (>= default; default = number of characters in the maximum number assigned to unigenes). For example, if the maximum trailing number assigned to the resulting unigenes is 5000, then the default of -l is 4 ('5000' has 4 characters). In this case users can set a number greater than or equal to 4. |
-s | [Integer] | start number of unigene ID trailing number (>= 1; default = 1) |
-o | [String] | Name of the output directory (default = "input file name" + "_output") |
-d | | Produce log files. When this parameter is supplied, log files are written to the output folder |
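The interaction of -u, -l and -s can be illustrated with a short sketch that generates IDs in the documented style. It assumes that unigenes are numbered consecutively from the start number, so the maximum trailing number is start + count - 1; that numbering scheme and the function name `unigene_ids` are assumptions for illustration, not a description of iAssembler's internal code.

```python
def unigene_ids(count, prefix="UN", start=1, width=None):
    """Generate unigene IDs such as UN0001 from a prefix and padded numbers.

    prefix ~ -u, width ~ -l, start ~ -s.
    Assumption: numbers run consecutively from `start`.
    """
    if start < 1:
        raise ValueError("-s must be at least 1")
    max_number = start + count - 1
    # Default width: number of characters in the largest assigned number.
    default_width = len(str(max_number))
    if width is None:
        width = default_width
    elif width < default_width:
        raise ValueError("-l must be greater than or equal to the default")
    return [f"{prefix}{n:0{width}d}" for n in range(start, start + count)]


print(unigene_ids(3, width=5))  # ['UN00001', 'UN00002', 'UN00003']
```

With 5000 unigenes and default settings, the default width is 4, so the IDs run from UN0001 to UN5000.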
Output files
iAssembler generates five files and a "log" folder (if -d is supplied) in the output directory.
- unigene_seq.fasta
Unigene sequences (FASTA format) generated from the EST assembly process.
- unigene.sam
A SAM format file containing the alignment information of each sequence read to its corresponding unigene. The file can be viewed with Tablet, IGV, and other SAM-compatible viewers.
- contig_member
A tab-delimited text file containing unigenes and their corresponding EST members.
- unigene_mp
A tab-delimited text file containing the mapping details of EST members to their corresponding unigenes.
EST ID | EST Length | Unigene ID | Unigene Length | Query Start | Query End | Hit Start | Hit End | Strand | % Identity |
EST0001 | 116 | UN0001 | 1195 | 11 | 108 | 650 | 747 | 1 | 100.00 |
- member_position_stat
A tab-delimited file containing the summary statistics of aligning ESTs to their corresponding unigenes.
- log folder (if parameter -d is supplied)
A folder containing all the log files from the program
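The unigene_mp file can be loaded for downstream checks with a few lines of code. The sketch below reads the ten columns listed above into named records. The class `Mapping` and function `parse_unigene_mp` are hypothetical names. Whether the file starts with a header line is not documented here, so the parser simply skips any line whose second field is not an integer.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class Mapping(NamedTuple):
    """One row of unigene_mp, using the column order documented above."""
    est_id: str
    est_length: int
    unigene_id: str
    unigene_length: int
    query_start: int
    query_end: int
    hit_start: int
    hit_end: int
    strand: str  # kept as text, since only the value 1 is shown in the docs
    percent_identity: float


def parse_unigene_mp(lines: Iterable[str]) -> Iterator[Mapping]:
    """Yield Mapping records from tab-delimited lines."""
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and a possible header row (assumption).
        if len(row) != 10 or not row[1].isdigit():
            continue
        yield Mapping(
            row[0], int(row[1]), row[2], int(row[3]),
            int(row[4]), int(row[5]), int(row[6]), int(row[7]),
            row[8], float(row[9]),
        )


# The example row from the table above, as a tab-delimited line.
example = ["EST0001\t116\tUN0001\t1195\t11\t108\t650\t747\t1\t100.00\n"]
for rec in parse_unigene_mp(example):
    aligned = rec.query_end - rec.query_start + 1
    print(rec.est_id, rec.unigene_id, aligned)
```

With a real run, `parse_unigene_mp(open("input_EST_seq_output/unigene_mp"))` would iterate over the file, assuming the default output folder name from the Quick Start.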
Download
The current version of iAssembler is v1.3.3. It is available only for 64-bit Linux systems.
Download iAssembler from the ftp server
Note: For large datasets, 32-bit CAP3 can run out of memory. In this case, please use the 64-bit version of iAssembler.
Contact
For questions and suggestions, please contact us at bioinfo@cornell.edu