Created by researchers at the Bioinformatics and Computational Biology (BCB) lab at Washington State University, pClust is an open-source (BSD-licensed) software package that enables fast and efficient clustering of protein and DNA sequences. The speedup comes from two highly efficient, parallelized components: Parasail for sequence alignment and Grappolo for clustering.
Parasail's sequence alignment algorithms are implemented in SIMD C (C99) for efficiency. The library contains vectorized implementations of the three most popular sequence alignment algorithms that are guaranteed to find optimal alignments. As its output, Parasail returns a graph and three alignment statistics for each edge computed. The aligned protein sequences can then be clustered with Grappolo based on their similarity measure. Thanks to its multi-threaded OpenMP implementation, Grappolo is very fast: for a graph with approximately 2M edges, it identifies high-homogeneity protein clusters in less than 1 minute.
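As a rough illustration of the align-then-cluster idea, the Python sketch below takes edges of the kind described above (a pair of sequence IDs plus a similarity value), drops the weak ones, and groups the remaining sequences with union-find. It is a deliberately simple stand-in and not Grappolo's clustering method; the function name, the edge tuple layout, and the cutoff are hypothetical choices made for this example.

```python
from collections import defaultdict


def cluster_by_threshold(edges, min_similarity):
    """Group sequence IDs into connected components of a thresholded graph.

    edges: iterable of (seq_a, seq_b, similarity) tuples (hypothetical layout).
    min_similarity: edges scoring below this cutoff are ignored (hypothetical).
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, similarity in edges:
        # Look up both endpoints first so that sequences with only
        # weak edges still show up as singleton clusters.
        root_a, root_b = find(a), find(b)
        if similarity >= min_similarity and root_a != root_b:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for node in parent:
        clusters[find(node)].add(node)
    # Sort members and clusters so the result is deterministic.
    return sorted(sorted(members) for members in clusters.values())


example_edges = [
    ("p1", "p2", 0.90),
    ("p2", "p3", 0.80),
    ("p3", "p4", 0.20),  # too weak: p3 and p4 stay in separate clusters
    ("p4", "p5", 0.95),
]
example_clusters = cluster_by_threshold(example_edges, min_similarity=0.5)
```

A single-linkage grouping like this is easy to follow but tends to chain loosely related sequences together, which is one reason a dedicated community-detection tool is used in the real pipeline.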
Finally, the pClust graphical user interface, implemented in Java, provides convenient control of many sequence alignment parameters, including gap opening and extension penalties, computation bit precision, choice of scoring matrix (PAM or BLOSUM), and many others.
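To show where the gap opening penalty, the gap extension penalty, and the scoring matrix enter the computation, here is a minimal scalar sketch of local alignment scoring with affine gaps, in Python. It is not Parasail's vectorized code and does not use its API. The function names, the default penalties, the toy match/mismatch scoring (standing in for a PAM or BLOSUM lookup), and the gap-cost convention are all assumptions made for this example.

```python
def simple_score(x, y, match=2, mismatch=-1):
    """Toy substitution score; a real run would look up PAM or BLOSUM."""
    return match if x == y else mismatch


def local_alignment_score(a, b, score=simple_score, gap_open=10, gap_extend=1):
    """Best local alignment score of a and b with affine gap penalties.

    Assumed convention for this sketch: a gap of length k costs
    gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    cols = len(b) + 1
    h_prev = [0] * cols        # best scores for the previous row
    f_prev = [neg_inf] * cols  # previous row, alignment ending in a vertical gap
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * cols
        f_cur = [neg_inf] * cols
        e = neg_inf            # current row, alignment ending in a horizontal gap
        for j in range(1, cols):
            # Either open a new gap or extend the one already running.
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            diag = h_prev[j - 1] + score(a[i - 1], b[j - 1])
            # The floor of 0 lets a local alignment start anywhere.
            h_cur[j] = max(0, diag, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best
```

Calling `local_alignment_score("ACGTTTACGT", "ACGTACGT", gap_open=2, gap_extend=1)` lets the aligner bridge the two extra `T` residues with a single gap, while raising `gap_open` makes it prefer shorter ungapped matches instead.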
Case Study: Using pClust to Construct a Phylogenomic Network of 102 Organisms
Using the pClust pipeline (Parasail + Grappolo), researchers at the BCB lab constructed a complete-genome phylogenomic network of 102 microorganisms. The organisms' 120K protein sequences were aligned and clustered with pClust, and the pairwise distances between the genomes were then computed. The total computation time was under 10 minutes.
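The distance measure used in the study is not stated here. Purely as an illustration of how protein clusters can be turned into genome-to-genome distances, the Python sketch below assumes a Jaccard distance over the sets of clusters each genome contributes to. The function name and both input layouts are hypothetical.

```python
from itertools import combinations


def genome_distances(protein_to_genome, clusters):
    """Jaccard distance between genomes over shared protein clusters.

    protein_to_genome: dict mapping protein ID -> genome ID (hypothetical layout).
    clusters: list of lists of protein IDs, one inner list per cluster.
    The Jaccard measure is an assumption of this sketch, not the study's method.
    """
    # Record which clusters each genome has at least one protein in.
    genome_clusters = {}
    for cluster_id, members in enumerate(clusters):
        for protein in members:
            genome = protein_to_genome[protein]
            genome_clusters.setdefault(genome, set()).add(cluster_id)

    distances = {}
    for g1, g2 in combinations(sorted(genome_clusters), 2):
        shared = genome_clusters[g1] & genome_clusters[g2]
        union = genome_clusters[g1] | genome_clusters[g2]
        # 0.0 means identical cluster sets, 1.0 means nothing in common.
        distances[(g1, g2)] = 1.0 - len(shared) / len(union)
    return distances


toy_genomes = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
toy_clusters = [["a1", "b1"], ["a2"], ["b2", "c1"]]
toy_distances = genome_distances(toy_genomes, toy_clusters)
```

The resulting distance table can be treated as a weighted graph over genomes, which is the general shape of a phylogenomic network.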
Use of Excelsior JET
The BCB lab uses Excelsior JET to create Mac‑ and Windows‑compatible installers for pClust. Here is what they had to say about our product:
We were really pleased to discover Excelsior JET. It provides a painless deployment toolkit for use in creating installers that contain all necessary files, including the application executable file, runtime files, data files, documentation, licenses, and so on.