There are two ways to use PLAST from the command line: PLAST Wizard or PLAST Tool.
- PLAST Wizard is an easy-to-use shell script that helps you configure your sequence comparison job. It is intended for newcomers trying PLAST for the first time.
- PLAST Tool is the official PLAST software; it requires you to pass in all the arguments needed to set up and run a sequence comparison job. It is intended for expert users who already know how to use PLAST.
Using PLAST Wizard
PLAST Wizard is a shell script located in the ‘scripts’ directory of the PLAST distribution. It is called “plast.sh” (Linux, MacOS X). Using this script is straightforward: type the script name on your command line, press the [Enter] key and follow the instructions. Just before running the PLAST job, PLAST Wizard displays the full set of arguments that will be used to start it, so the Wizard also shows you how to set up a PLAST job.
Using PLAST Tool
PLAST Tool is the official (i.e. standard) way of using PLAST. It requires you to know the PLAST command-line arguments. Let’s see how to use it.
A first job
A basic PLAST command-line is as follows:
plast -p method -i my_query -d my_databank -o my_results
These are the mandatory arguments: comparison method (-p), query file (-i), reference databank (-d) and output file (-o). The query file has to be a Fasta file. The reference databank can be either a Fasta file or a BLAST databank.
Let’s see how to use PLAST with a Fasta file as the reference (subject) databank. Next section covers the use of a BLAST formatted databank.
Notice: PLAST only accepts Fasta files that comply with the Fasta specifications. Among other things, sequences have to be written as a series of lines, each no longer than 80 characters.
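The 80-character rule can be checked before launching a job. The following is a minimal sketch and not part of PLAST: the function name check_fasta_width is hypothetical, and it only tests line length, not full Fasta compliance.

```shell
# Report sequence lines longer than a given width (default: 80 characters).
# Header lines (starting with '>') are ignored.
# Returns a non-zero status if at least one line is too long.
check_fasta_width() {
    awk -v max="${2:-80}" '
        !/^>/ && length($0) > max {
            printf "line %d: %d characters\n", NR, length($0)
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Usage: check_fasta_width my_query.fa
```

A non-zero exit status makes the check easy to combine with a job launch, for instance by running PLAST only when the check succeeds.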
Let’s try with the sample sequence files provided with PLAST software:
plast -p plastp -i [path_to_plast_home]/db/query.fa -d [path_to_plast_home]/db/tursiops.fa -o out.tab
The file created by PLAST (out.tab in this example) is, by default, a BLAST-like tabular file. It contains the following columns: query ID, subject ID, percent identities, alignment length, nb. misses, nb. gaps, query begin, query end, subject begin, subject end, e-value, bit score. Such a file is easy to process with other command-line data processing tools or with spreadsheet applications.
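As an illustration of command-line processing, here is a small awk sketch that keeps only the hits below an e-value threshold. It assumes the columns are separated by whitespace (tabs, as in BLAST tabular output); the function name filter_hits and the two sample records are made up for the example.

```shell
# Two made-up records in the 12-column layout described above.
sample=$(mktemp)
printf 'q1\ts1\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-50\t380\n'  > "$sample"
printf 'q1\ts2\t35.0\t80\t52\t2\t10\t89\t1\t80\t0.5\t30.1\n'   >> "$sample"

# Keep hits whose e-value (column 11) is at most the given threshold
# (default 1e-3) and print query ID, subject ID, percent identity, e-value.
filter_hits() {
    awk -v max_e="${2:-1e-3}" \
        '($11 + 0) <= (max_e + 0) { print $1, $2, $3, $11 }' "$1"
}

filter_hits "$sample" 1e-3   # prints: q1 s1 98.5 1e-50
rm -f "$sample"
```

On a real result file, the call would be filter_hits out.tab 1e-3.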
- accepted values for “-p” are: plastp, plastn, plastx, tplastn and tplastx.
- accepted value for “-i” (query) is a Fasta file.
- accepted value for “-d” (reference) is a Fasta file, or a comma-separated list of Fasta files, or a Blast formatted databank (use .pin, .pal, .nin or .nal file with its extension).
Notice: it is advised to always use absolute paths to specify -i, -d and -o argument values.
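The accepted -p values and the absolute-path advice can be combined in a small pre-flight step. This is only a sketch: is_valid_method and abs_path are hypothetical helper names, not part of PLAST, and abs_path expects the parent directory of the file to exist.

```shell
# Succeeds only for the comparison methods listed above.
is_valid_method() {
    case "$1" in
        plastp|plastn|plastx|tplastn|tplastx) return 0 ;;
        *) return 1 ;;
    esac
}

# Turn a relative path into an absolute one.
# The directory part is resolved with cd/pwd; the file itself need not exist.
abs_path() {
    printf '%s/%s\n' "$(CDPATH= cd "$(dirname "$1")" && pwd)" "$(basename "$1")"
}

# Usage sketch:
#   is_valid_method plastp && plast -p plastp -i "$(abs_path db/query.fa)" \
#       -d "$(abs_path db/tursiops.fa)" -o "$(abs_path out.tab)"
```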
Using a BLAST databank as the reference
As stated in the previous section, PLAST can use a BLAST databank as the subject (-d argument). This is quite simple: you just pass the path to the BLAST databank to the “-d” argument. The only trick is that a BLAST databank is made of several files, depending on the size of the databank and its content (nucleotide or protein). To use such a databank with PLAST, locate one of the following files: .nin (single-volume nucleotide databank), .nal (multi-volume nucleotide databank), .pin (single-volume protein databank), .pal (multi-volume protein databank).
Then run PLAST as follows:
plast -p plastp -i [path_to_plast_home]/db/query.fa -d [path_to_plast_home]/db/swissprot_vertebrate.pin -o out.csv
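Locating the right index file can be scripted. The sketch below takes a databank base path (without extension) and prints the first index file it finds; the function name find_blast_index and the lookup order (alias files before single-volume files) are assumptions made for this example.

```shell
# Print the index file to pass to -d for a BLAST databank base path.
# Multi-volume alias files (.pal, .nal) are tried before .pin and .nin.
find_blast_index() {
    for ext in pal pin nal nin; do
        if [ -f "$1.$ext" ]; then
            printf '%s\n' "$1.$ext"
            return 0
        fi
    done
    echo "no BLAST index file found for $1" >&2
    return 1
}

# Usage sketch:
#   plast -p plastp -i query.fa -d "$(find_blast_index db/swissprot_vertebrate)" -o out.csv
```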
A BLAST databank can come from one of the following sources:
- NCBI; see ftp://ftp.ncbi.nlm.nih.gov/blast/db/
- BLAST databanks prepared on your own using either formatdb (legacy Blast) or makeblastdb (Blast+)
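For the second option, a Fasta file can be formatted with makeblastdb from Blast+. The sketch below only prints the command so it can be reviewed first; -in, -dbtype and -out are makeblastdb options (not PLAST options), and the function name and the output base name are illustrative.

```shell
# Print the makeblastdb (Blast+) command that formats a Fasta file.
# $1 = Fasta file, $2 = prot or nucl, $3 = databank base name
makeblastdb_command() {
    case "$2" in
        prot|nucl) ;;
        *) echo "dbtype must be prot or nucl" >&2; return 1 ;;
    esac
    printf 'makeblastdb -in %s -dbtype %s -out %s\n' "$1" "$2" "$3"
}

# Review the command, then run it by piping the output to sh.
makeblastdb_command tursiops.fa prot tursiops
```

Once the databank exists, pass the matching index file (for example the .pin file of a single-volume protein databank) to the -d argument.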
Setting up search parameters
The above command lines run PLAST with default parameters: the e-value threshold is set to 10, the number of computing cores is set to the maximum number of cores available on your computer, sequence low-complexity filtering is disabled, and the number of matching hits is set to “one HSP per hit and one hit per query”. To control these values, use the following arguments: -e (e-value threshold), -a (number of cores), -F (low-complexity filtering), -H (number of HSPs per hit) and -Q (number of hits per query).
In this example, we set the e-value threshold to 1e-3 and the number of cores to 8, enable low-complexity filtering, and request 1 HSP per hit and at most 10 hits per query:
plast -e 1e-3 -a 8 -F T -H 1 -Q 10 -p plastp -i [path_to_plast_home]/db/query.fa -d [path_to_plast_home]/db/tursiops.fa -o out.tab
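When the same settings are reused across jobs, they can be kept in one place. The following sketch prints a PLAST command line built from shell variables; the variable names, the function name plast_command and the /data paths are illustrative, and only the arguments described above are used.

```shell
# Search settings; defaults are the values from the example above and can be
# overridden from the environment.
: "${PLAST_EVALUE:=1e-3}" "${PLAST_CORES:=8}" "${PLAST_HSPS:=1}" "${PLAST_HITS:=10}"

# Print a PLAST command line with low-complexity filtering switched on.
# $1 = method, $2 = query, $3 = databank, $4 = output file
plast_command() {
    printf 'plast -e %s -a %s -F T -H %s -Q %s -p %s -i %s -d %s -o %s\n' \
        "$PLAST_EVALUE" "$PLAST_CORES" "$PLAST_HSPS" "$PLAST_HITS" \
        "$1" "$2" "$3" "$4"
}

# Review the command, then run it by piping the output to sh.
plast_command plastp /data/query.fa /data/tursiops.fa /data/out.tab
```

Printing the command before running it mirrors what PLAST Wizard does; note that this simple version does not quote paths containing spaces.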
Producing all results
In addition to the BLAST-like tabular output format (-outfmt 1), you can ask PLAST to produce an extended tabular format that includes all data records:
plast ... -outfmt 2
With “-outfmt 2”, the result will contain the following columns: query ID, subject ID, percent identities, alignment length, nb. misses, nb. gaps, query begin, query end, subject begin, subject end, e-value, bit score, query length, query frame, query translated, query coverage, nb. gaps in query, subject length, subject frame, subject translated, subject coverage, nb. gaps in subject.
With “-outfmt 1”, the result will contain the following columns: query ID, subject ID, percent identities, alignment length, nb. misses, nb. gaps, query begin, query end, subject begin, subject end, e-value, bit score.
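For spreadsheet use, it can help to prepend a header line to a result file. The sketch below picks the header from the number of columns in the first record (12 or 22); it assumes tab-separated records, and the short column names and the function name with_header are my own, derived from the column lists above.

```shell
# Short names for the 12 common columns and the 10 additional ones.
plast_cols_base="query_id subject_id pct_identity aln_length mismatches gaps q_begin q_end s_begin s_end evalue bit_score"
plast_cols_extra="q_length q_frame q_translated q_coverage q_gaps s_length s_frame s_translated s_coverage s_gaps"

# Print the result file preceded by a tab-separated header line.
with_header() {
    # Count the fields of the first record to pick the matching header.
    n=$(awk 'NR == 1 { print NF }' "$1")
    case "$n" in
        12) names=$plast_cols_base ;;
        22) names="$plast_cols_base $plast_cols_extra" ;;
        *)  echo "unexpected column count: $n" >&2; return 1 ;;
    esac
    # Join the names with tabs, then append the original records.
    printf '%s\n' "$names" | tr ' ' '\t'
    cat "$1"
}

# Usage: with_header out.tab > out_with_header.tab
```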
Producing NCBI BLAST-XML Format
Instead of the BLAST-like tabular output format, PLAST can also produce XML output similar to NCBI BLAST XML:
plast ... -outfmt 4 -force-query-order 1000
The additional argument “-force-query-order 1000” is mandatory to produce appropriate XML files.
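A quick sanity check on an XML result is to count a few elements. This sketch assumes that PLAST reuses the NCBI BLAST XML element names Iteration (one per query) and Hit (one per matching subject); that is an assumption, since the format is only described as similar, and count_tag is a hypothetical helper name.

```shell
# Count occurrences of an opening tag in a file.
# $1 = element name (e.g. Hit), $2 = XML file
count_tag() {
    grep -o "<$1>" "$2" | wc -l | tr -d ' '
}

# Usage sketch:
#   count_tag Iteration out.xml
#   count_tag Hit out.xml
```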
Monitoring a job
PLAST enables you to monitor job execution:
plast ... -bargraph
When using the -bargraph argument, PLAST displays a progress bar such as this one:
plastp [1/1] 100.0% align=16960 time [00:00:08 - 00:00:00 - 00:00:08] mem=298.7Mo (max=298.7Mo tot=0.3Go) seeds [5082:5082] [====================] 100%
Several pieces of information are provided, such as the comparison method (plastp), the progress of the execution (%), the number of matches found (16960), execution times, etc.
Note: ensure that your terminal window displays at least 150 characters/line, otherwise bargraph won’t display appropriately on a single line.
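The width requirement can be tested before adding the argument. In this sketch, bargraph_flag is a hypothetical helper that prints -bargraph only when the terminal is wide enough; the width is read from COLUMNS or tput, falling back to 80 when neither is available.

```shell
# Print "-bargraph" only when the given width is at least 150 columns.
bargraph_flag() {
    if [ "${1:-0}" -ge 150 ]; then
        printf '%s\n' -bargraph
    fi
}

# Terminal width: $COLUMNS if set, else tput, else a conservative 80.
width=${COLUMNS:-$(tput cols 2>/dev/null || echo 80)}

# Usage sketch: plast ... $(bargraph_flag "$width")
```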
Getting help
Use the following command-line to see all available PLAST arguments:
plast -h