I was asked to help out with using ‘Novoplasty’ to build complete chloroplast sequences from shorter DNA strands. It’s a pretty amazing app that’s easy enough to use, but it only lets you run one seed file at a time (I’m assuming if you’re reading this you understand what ‘Novoplasty’ is!). Doing lots of runs can quickly eat up your day, so I decided to write a Perl script that will run batches of seed files through ‘Novoplasty’. Here is how you can use it.
First download a copy of ‘Novoplasty’ and ‘Batch Novoplasty’.
The batch script has been written to work with ‘Novoplasty’ version 2.5.6. You can open ‘Batch Novoplasty’ in a text editor and change this to work with other versions. The version is hard coded near the bottom of the script, so be careful when you change the text! Feel free to change the message that’s printed when the script finishes, if it makes your life better!
Place the ‘Novoplasty’ and ‘Batch Novoplasty’ files in the same folder. Open a terminal window, drag ‘Batch Novoplasty’ into the terminal window, and press return. This will create three folders and ask you to place your files in them. The folders are ‘1_Seed_Files’, ‘2_Reads_Forward’ and ‘3_Reads_Reverse’.
Place your .fasta files containing the seed data into the ‘1_Seed_Files’ folder. Each .fasta file should contain one seed. It’s best to clean the data by removing the header and any lower case a, c, g or t characters from the beginning. ‘Novoplasty’ will only read in 100 characters, so give it good data to work with.
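If you’d rather not clean each seed by hand, the idea can be sketched in a few lines of Python. This is only an illustration of the clean-up described above, not part of the batch script; the function name `clean_seed` is made up for the example, and it assumes a seed file holds a single sequence with optional header lines starting with “>”.

```python
import re


def clean_seed(fasta_text):
    """Return the sequence from a one-seed FASTA text, without header or leading lower case bases."""
    # Drop header lines (those starting with '>') and join the remaining lines.
    lines = [line.strip() for line in fasta_text.splitlines()]
    sequence = "".join(line for line in lines if line and not line.startswith(">"))
    # Remove any run of lower case a, c, g or t at the very start of the seed.
    return re.sub(r"^[acgt]+", "", sequence)


example = ">Seq_001 sample\nacgtACGTTGCA\nGGCC\n"
print(clean_seed(example))  # prints ACGTTGCAGGCC
```

Lower case letters further into the sequence are left alone, since only the beginning is being trimmed here.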
If your sequences have been supplied in a single .fasta file, I’ve attached another Perl script that can be used to pull them apart. It relies on each sequence starting with “>Seq_”. The text that follows “>Seq_” is used to create a unique name for each seed file. Either start all your seed fragments with “>Seq_” (followed by a unique identifier like a barcode number) or change the variable inside the script to suit your current format. You will need to name the file containing all your seed fragments “split.fasta”. This script assumes the first 100 characters of your seed fragment are header details and removes them. It also removes any lower case a, c, g or t characters from the beginning of the seed.
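To make the splitting idea concrete, here is a rough Python sketch of the same approach. It is not the attached Perl script and it skips the 100-character header trimming; it only shows how a delimiter such as “>Seq_” can be used to cut the file up and name the pieces. The names `split_fasta` and `write_seed_files` are invented for the example, and taking the first word after the delimiter as the file name is an assumption.

```python
import re
from pathlib import Path


def split_fasta(text, delimiter=">Seq_"):
    """Map the identifier that follows each delimiter to that record's text."""
    records = {}
    # Anything before the first delimiter is ignored.
    for chunk in text.split(delimiter)[1:]:
        header, _, body = chunk.partition("\n")
        # Assumption: the first word after the delimiter is the unique identifier.
        tokens = header.split()
        name = tokens[0] if tokens else "unnamed"
        # Keep only characters that are safe in a file name.
        name = re.sub(r"[^A-Za-z0-9_.-]", "_", name)
        # A repeated identifier would overwrite the earlier record.
        records[name] = delimiter + header + "\n" + body
    return records


def write_seed_files(source="split.fasta", out_dir="."):
    """Write one .fasta file per record found in the source file."""
    for name, record in split_fasta(Path(source).read_text()).items():
        (Path(out_dir) / f"{name}.fasta").write_text(record)
```

Calling `write_seed_files()` from the folder that holds “split.fasta” would write one file per record next to it.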
Rename the seed file you want to split to “split.fasta” and place it in the same folder as the “Split Fasta File” script. Open a terminal window, drag the “Split Fasta File” script in and hit return. The script should spit out a new fasta file for each occurrence of your delimiter string, i.e. “>Seq_”.
If running the script gives a permission error, check that the script file is set to be executable. Part of the script that passes the middle 100 characters of your seed to ‘Novoplasty’ has been commented out. The idea was that the middle 100 characters should be clear of any messy sequencing errors that often occur at the beginning and end of a read. If you’d like to use this, simply uncomment the section by removing the # at the beginning of each line. Move all the resulting seed files into the ‘1_Seed_Files’ folder for the batch script to use.
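For anyone curious what “the middle 100 characters” works out to, here is a small Python sketch of the idea. It is not the commented-out Perl code from the script, just one way the window could be picked; the function name `middle_window` is hypothetical, and returning short seeds unchanged is my assumption.

```python
def middle_window(sequence, width=100):
    """Return the central `width` characters of a sequence."""
    # Seeds no longer than the window are returned as they are.
    if len(sequence) <= width:
        return sequence
    # Centre the window, rounding the start position down.
    start = (len(sequence) - width) // 2
    return sequence[start:start + width]


seed = "A" * 50 + "C" * 100 + "G" * 50
print(middle_window(seed) == "C" * 100)  # prints True
```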
Next place your forward and reverse data into the folders ‘2_Reads_Forward’ and ‘3_Reads_Reverse’. The batch should work with .fastq or .fastq.gz, though I’ve only tested it with .fastq.gz files. The script will only read in one file per folder. Jump into terminal again, drag and drop the ‘Batch Novoplasty’ file into your terminal window, hit return and that ‘should’ be it. Your seed files will be read in one at a time and built using the supplied forward and reverse files. You won’t get all the normal feedback supplied by ‘Novoplasty’, just a simple ‘processing Seed file 1’ etc.
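The folder layout above boils down to “one job per seed file, all sharing the same forward and reverse reads”. Here is a hedged Python sketch of that pairing step only. It does not run ‘Novoplasty’ and is not how the Perl batch script is written; `single_reads_file` and `plan_batch` are made-up names, and raising an error when a reads folder doesn’t hold exactly one file is my own choice for the example.

```python
from pathlib import Path

READ_SUFFIXES = (".fastq", ".fastq.gz")


def single_reads_file(folder):
    """Return the one reads file in a folder, or raise if there is not exactly one."""
    matches = sorted(p for p in Path(folder).iterdir() if p.name.endswith(READ_SUFFIXES))
    if len(matches) != 1:
        raise ValueError(f"expected exactly one reads file in {folder}, found {len(matches)}")
    return matches[0]


def plan_batch(root="."):
    """Pair every seed file with the single forward and reverse reads file."""
    root = Path(root)
    forward = single_reads_file(root / "2_Reads_Forward")
    reverse = single_reads_file(root / "3_Reads_Reverse")
    # Seeds are processed in file name order.
    seeds = sorted((root / "1_Seed_Files").glob("*.fasta"))
    return [(seed, forward, reverse) for seed in seeds]
```

Calling `plan_batch()` from the folder that holds the three sub-folders returns one (seed, forward, reverse) tuple per seed file, which is roughly the list the batch works through.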
Note that changing variables in the config file won’t have any effect on ‘Novoplasty’, because the batch script recreates the config for every new seed file. If you need different settings in the config file, make the changes inside the ‘Batch Novoplasty’ script.
If you want to run multiple batches at the same time, just set up a folder for each batch, put copies of ‘Novoplasty’ and the ‘Batch Novoplasty’ script in each folder, and run the batch script in a different terminal window for each batch.