Fei Bioinformatics Lab

iAssembler (current version: v1.3.3 - 04/06/17)

Introduction

System requirement and dependencies

Release notes

Installation

Run iAssembler

Input files

Parameters

Output files

Download iAssembler

Contact

Introduction

iAssembler is a standalone package that assembles ESTs generated using Sanger and/or Roche-454 pyrosequencing technologies into contigs. The pipeline gives much higher accuracy in EST assembly than other existing assemblers by employing an iterative assembly strategy and automated correction of mis-assemblies. iAssembler first performs iterative assemblies using MIRA and CAP3 (default: four cycles of MIRA assembly followed by one CAP3 assembly) to correct the assembly errors that occur frequently in a single round of assembly; most of these errors are sequences derived from the same transcript that fail to be assembled together. The program then performs post-assembly quality checking by 1) aligning each EST sequence to its corresponding unigene sequence to identify mis-assemblies; and 2) performing all-versus-all pairwise sequence alignments of unigenes to identify sequences derived from the same transcript that failed to be assembled together. The identified mis-assemblies are then corrected automatically.
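
As a rough illustration of the default schedule described above, the following shell sketch only prints the order of the assembly steps. The functions run_mira_cycle and run_cap3 are hypothetical placeholders, not iAssembler commands; the real pipeline does far more at each step.

```shell
# Hypothetical stand-ins: they only print which step would run.
run_mira_cycle() { echo "MIRA assembly, cycle $1"; }
run_cap3()       { echo "CAP3 assembly"; }

# Default schedule from the text: four MIRA cycles, then one CAP3 assembly.
assembly_schedule() {
    cycle=1
    while [ "$cycle" -le 4 ]; do
        run_mira_cycle "$cycle"
        cycle=$((cycle + 1))
    done
    run_cap3
}

assembly_schedule
```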

From version 1.3.2, iAssembler (64-bit version) can be used to perform a second-round assembly of a transcriptome produced by a transcriptome assembler (e.g. Trinity).

Citation:
Zheng Y, Zhao L, Gao J, Fei Z. (2011) iAssembler: a package for de novo assembly of Roche-454/Sanger transcriptome sequences. BMC Bioinformatics 12:453

Check the short presentation on iAssembler for more information.

System requirement and dependencies

Linux (required) - Mac OS X is not supported
Perl version 5.10.0 or higher (required). Perl is installed by default on most Linux systems.
BioPerl version 1.006 or higher (required). Please check http://www.bioperl.org and wiki/Installing_BioPerl for more details on installation of BioPerl.
NCBI BLAST package (required). Provided in iAssembler.
MIRA assembly program (required). Provided in iAssembler.
CAP3 assembly program (required). Provided in iAssembler.
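
A quick way to check the Perl-related requirements is sketched below. It assumes that perl is on the PATH and that BioPerl exposes its version through the Bio::Root::Version module; the function names check_perl and bioperl_version are illustrative only and are not part of iAssembler.

```shell
# Succeeds if perl is on the PATH and is version 5.10.0 or newer.
check_perl() {
    command -v perl >/dev/null 2>&1 || return 1
    perl -e 'exit($] >= 5.010 ? 0 : 1)'
}

# Prints the installed BioPerl version, or fails if BioPerl cannot be loaded.
bioperl_version() {
    command -v perl >/dev/null 2>&1 || return 1
    perl -MBio::Root::Version -e 'print "$Bio::Root::Version::VERSION\n"' 2>/dev/null
}

if check_perl; then
    echo "Perl: OK"
else
    echo "Perl: missing or older than 5.10.0"
fi

if v=$(bioperl_version); then
    echo "BioPerl: $v (1.006 or higher is required)"
else
    echo "BioPerl: not found"
fi
```

The BioPerl version is printed rather than compared, so that it can be checked by eye against the required 1.006.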

Release notes

iAssembler v1.3.3 - 04/06/17. Changes from previous version:

Add -c parameter for processing reads from a strand-specific library

iAssembler v1.3.2 - 07/20/12. Changes from previous version:

Fixed a small bug in parsing megablast result to get sequence length

iAssembler v1.3.1 - 06/04/12. Changes from previous version:

Support MIRA v3.4.0. iAssembler no longer supports older versions of MIRA
Fixed a bug in parsing megablast result

iAssembler v1.3 - 05/04/11. Changes from previous version:

Add a function to correct unigene base errors
Add headers to the output SAM file

iAssembler v1.2.2 - 03/28/11. Changes from previous version:

Replaced MIRA with a newer version (V2.9.43 -> V3.2.0).
Fixed several other small bugs

iAssembler v1.2.1 - 12/02/10. Changes from previous version:

Fixed a small bug - [-e can't be less than 6].

iAssembler v1.2 - 08/02/10. Changes from previous version:

Compatible with MIRA version 3.x

iAssembler v1.1 - 06/23/10. Changes from previous version:

Fixed the error that caused EST clustering to fail for datasets containing highly redundant sequences
Fixed several other small bugs

iAssembler v1.0 - 05/21/10. Changes from previous version:

Added an output file in SAM format. The file contains the alignment information of each sequence read to its corresponding unigene and can be viewed with several visualization programs such as Tablet and IGV.
Combined percent identity cutoff for clustering (-x) and assembly (-p) into a single parameter (-p). Parameter -x is disabled
Disabled clustering using blastn. Currently only megablast is used for clustering. Parameter -b now has different meaning (see below)
Added -b parameter which specifies the number of threads used for MIRA assembly program
Added -d parameter to control whether to generate program log files

iAssembler v1.0 (beta) - 04/13/10

Installation

Installation of iAssembler is straightforward. Just download the appropriate version of iAssembler for your system and uncompress the downloaded file.

shell$ tar -xzvf iAssembler-1.0.x32.tar.gz

This will generate a folder named "iAssembler-1.0.x32" on a 32-bit machine or "iAssembler-1.0.x64" on a 64-bit machine (we call this folder the "iAssembler home folder"). The iAssembler home folder includes two subfolders: a "bin" folder, which contains all executables, and a "doc" folder, which contains the program documentation and the example configuration file (see below). The home folder also contains a Perl script, iAssembler.pl, which is the core script that runs the whole iAssembler pipeline.
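
The layout described above can be checked with a few lines of shell. This is a minimal sketch: the function name is_iassembler_home is made up for illustration, and it tests only for the two subfolders and the script named in this section.

```shell
# Succeeds if the given folder looks like an iAssembler home folder:
# a bin/ subfolder, a doc/ subfolder, and the iAssembler.pl script.
is_iassembler_home() {
    [ -d "$1/bin" ] && [ -d "$1/doc" ] && [ -f "$1/iAssembler.pl" ]
}

# Example: check the current directory.
if is_iassembler_home .; then
    echo "iAssembler home folder looks complete"
else
    echo "bin/, doc/ or iAssembler.pl is missing"
fi
```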

Run iAssembler

Quick Start

Put the EST sequence file in FASTA format (assuming the file name is input_EST_seq) into the iAssembler home folder.
Go to the iAssembler home folder and run iAssembler with the following command:

shell$ perl iAssembler.pl -i input_EST_seq

The program will generate an output folder named input_EST_seq_output which contains all the output files. See below for the description of the output files.

Input files

iAssembler takes a sequence file in FASTA format, and optionally the corresponding sequence quality file, as its input. The sequences must be processed and cleaned beforehand by removing low-quality regions and sequences derived from adapters, vectors, rRNAs and tRNAs, as well as sequences from organelles such as chloroplasts and mitochondria. iAssembler itself does not provide functions to clean and trim raw sequences. Two programs that do are lucy and seqclean.
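
When a quality file is supplied, it can be worth confirming that it matches the sequence file before starting a run. The sketch below assumes that both files use the same record IDs in the same order, which is the usual convention for FASTA/quality pairs but is not stated on this page. The function names fasta_ids and same_ids are illustrative only.

```shell
# Print the record IDs of a FASTA-style file, one per line:
# the first word of each header line, without the leading '>'.
fasta_ids() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}

# Succeeds if both files list the same IDs in the same order.
same_ids() {
    seq_ids=$(fasta_ids "$1") || return 1
    qual_ids=$(fasta_ids "$2") || return 1
    [ "$seq_ids" = "$qual_ids" ]
}

# Usage (the quality file name here is a hypothetical example):
#   same_ids input_EST_seq input_EST_seq.qual && echo "IDs match"
```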

Parameters

(Note: Based on our experience, the default settings of iAssembler achieve very high quality assemblies for most Sanger and/or 454 ESTs.)

Section 1: Input parameters

-i	[String]	Name of the input sequence file in FASTA format (required)
-q	[String]	Name of the quality file in FASTA format (default: none)

Section 2: Assembly parameters

-a	[Integer]	number of CPUs used for megablast clustering (default = 1)
-b	[String]	number of CPUs used for MIRA assembly program (default = 1)
-e	[Integer]	maximum length of end clips (6~100; default = 30)
-h	[Integer]	minimum overlap length (>=30; default = 40)
-p	[Integer]	minimum percent identity for sequence clustering and assembly (95~100; default = 97)
-m		disable cap3 and mira
-c		use only for sequences assembled from strand-specific RNA-seq
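
The numeric ranges listed above can be checked before launching a long run. The following is a minimal sketch; the helper names is_int and check_assembly_params are hypothetical, and iAssembler may perform its own validation.

```shell
# Succeeds if $1 is a non-empty string of digits.
is_int() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Arguments: value for -e, value for -h, value for -p.
check_assembly_params() {
    if ! is_int "$1" || [ "$1" -lt 6 ] || [ "$1" -gt 100 ]; then
        echo "-e must be in the range 6~100" >&2
        return 1
    fi
    if ! is_int "$2" || [ "$2" -lt 30 ]; then
        echo "-h must be >= 30" >&2
        return 1
    fi
    if ! is_int "$3" || [ "$3" -lt 95 ] || [ "$3" -gt 100 ]; then
        echo "-p must be in the range 95~100" >&2
        return 1
    fi
    return 0
}

# The documented defaults pass the check.
check_assembly_params 30 40 97 && echo "defaults are valid"
# A matching run would then be:
#   perl iAssembler.pl -i input_EST_seq -e 30 -h 40 -p 97
```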

Section 3: Output parameters

-u	[String]	prefix used for IDs of the assembled unigenes (default = UN). iAssembler names the resulting unigenes with a prefix and trailing numbers, e.g., UN00001.
-l	[Integer]	length of the trailing numbers in unigene IDs (>= default; default = number of characters in the maximum number assigned to unigenes). For example, if the maximum trailing number assigned to the resulting unigenes is 5000, then the default of -l is 4 ('5000' has 4 characters). In this case users can set a number greater than or equal to 4.
-s	[Integer]	start number of unigene ID trailing number (>= 1; default = 1)
-o	[String]	Name of the output directory (default = "input file name" + "_output")
-d		Produce log files. When this parameter is supplied, log files are written to the output folder.
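
To make the interplay of -u, -l and -s concrete, the sketch below prints IDs in the pattern described above. It assumes that unigenes are numbered consecutively from the start number and that shorter numbers are zero-padded, as the UN00001 example suggests. The function unigene_ids is purely illustrative and is not how iAssembler itself is implemented.

```shell
# Print `count` unigene IDs.
# Arguments: prefix (like -u), start number (like -s), count, optional width (like -l).
unigene_ids() {
    prefix=$1; start=$2; count=$3
    max=$((start + count - 1))
    default_width=${#max}            # characters in the largest number
    width=${4:-$default_width}
    if [ "$width" -lt "$default_width" ]; then
        echo "width must be >= $default_width" >&2
        return 1
    fi
    n=$start
    while [ "$n" -le "$max" ]; do
        printf "%s%0${width}d\n" "$prefix" "$n"
        n=$((n + 1))
    done
}

# Three unigenes, prefix UN, start number 1, width 5:
unigene_ids UN 1 3 5
# UN00001
# UN00002
# UN00003
```

With 5000 unigenes starting at 1, the default width is 4, so a requested width of 3 is rejected while 4 or more is accepted.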

Output files

iAssembler generates five files and a "log" folder (if -d is supplied) in the output directory.

unigene_seq.fasta
unigene.sam (in SAM format; can be viewed with programs such as Tablet and IGV)
contig_member
unigene_mp

EST ID	EST Length	Unigene ID	Unigene Length	Query Start	Query End	Hit Start	Hit End	Strand	% Identity
EST0001	116	UN0001	1195	11	108	650	747	1	100.00
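
A table in this layout is easy to post-process with awk. The sketch below assumes ten tab-separated columns in the order shown above and assumes that "Query" refers to the EST; neither is confirmed here beyond the example row. The function name est_coverage is illustrative. It reports the percentage of each EST covered by its alignment.

```shell
# For each data row, print the EST ID, the unigene ID and the percentage of
# the EST covered by the alignment: (Query End - Query Start + 1) / EST Length.
# Rows whose second column is not a number (e.g. a header line) are skipped.
est_coverage() {
    awk -F '\t' 'NF >= 10 && $2 ~ /^[0-9]+$/ {
        printf "%s\t%s\t%.1f\n", $1, $3, 100 * ($6 - $5 + 1) / $2
    }' "$@"
}

# The example row above: (108 - 11 + 1) / 116 = 84.5%
printf 'EST0001\t116\tUN0001\t1195\t11\t108\t650\t747\t1\t100.00\n' | est_coverage
```

Called with a file name, as in est_coverage unigene_mp, the function reads that file instead of standard input.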

member_position_stat

log folder (if parameter -d is supplied)

Download

The current version of iAssembler is v1.3.3. It is available only for 64-bit Linux systems.
Download iAssembler from the ftp server

Note: For large datasets, the 32-bit CAP3 can run into "out of memory" problems. In this case, please use the 64-bit version of iAssembler.

Contact

For questions and suggestions, please contact us at bioinfo@cornell.edu