How to run arraylib
on the command line
To run arraylib
on a library deconvolution experiment with default parameters run:
arraylib-run <input_directory> <experimental_design.csv> -c <number_of_cpu_cores_to_use> -gb <path_to_genbank_reference_directory> -br <path_to_bowtie2_indices> -t <transposon_sequence> -bu <upstream_sequence_of_barcodes> -bd <downstream_sequence_of_barcodes>
Input parameters
Required parameters:
input_dir: path to directory holding the input fastq files
exp_design: path to csv file indicating experimental design (values should be separated by a comma). The experimental design file should have columns, Filename, Poolname and Pooldimension. (see example in tests/test_data/full_exp_design.csv)
Filename should contain all the unqiue input fastq filenames.
Poolname should indicate to which pool a given file belongs. Multiple files per poolname are allowed.
Pooldimension indicates the pooling dimension a pool belongs to. All pools sharing the same pooling dimension should have the same string in the Pooldimension column.
An example of how an exp_design file could look like:
Filename |
Poolname |
Pooldimension |
---|---|---|
column1.fastq |
column1 |
columns |
column2.fastq |
column2 |
columns |
row1.fastq |
row1 |
rows |
row2.fastq |
row2 |
rows |
platerow1.fastq |
platerow1 |
platerows |
platerow2.fastq |
platerow2 |
platerows |
platecol1.fastq |
platecol1 |
platecols |
platecol2.fastq |
platecol2 |
platecols |
-gb path to genbank reference file
-br path to bowtie index files, ending with the basename of your index (if the basename of your index is UTI89 and you store your bowtie2 [1] references in bowtie_ref it should be bowtie_ref/UTI89). Please visit https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#the-bowtie2-build-indexer for a manual how to create bowtie2 indices.
-t transposon sequence (e.g. AGATGTGTATAAGAGACAG)
-bu upstream sequence of barcode (e.g. CGAGGTCTCT)
-bd downstream sequence of barcode (e.g. CGTACGCTGC)
Optional parameters:
-mq minimum bowtie2 alignment quality score for each base to include read
-sq minimum phred score for each base to include read
-tm number of transposon mismatches allowed
-thr threshold for local filter (e.g. a threshold of 0.05 would filter out all reads < 0.05 of the maximum read count for a given mutant)
-g_thr threshold for global filter (all reads below g_thr will be set to 0)
Run only on barcodes
If you want to run arraylib-solve only on barcodes without alignment to the reference genome use the following command:
arraylib-run_on_barcodes <input_directory> <experimental_design.csv> -c <number_of_cpu_cores_to_use> -bu <upstream_sequence_of_barcodes> -bd <downstream_sequence_of_barcodes>
Optional parameters:
-thr threshold for local filter (e.g. a threshold of 0.05 would filter out all reads < 0.05 of the maximum read count for a given mutant)
-g_thr threshold for global filter (all reads below g_thr will be set to 0)
Output
arraylib-solve
outputs 4 files: * count_matrix.csv: Read counts per pool for each mutant, normalized and filtered. * mutant_location_summary.csv: A summary of mutants found in the well plate grid, where each row corresponds to a different mutant. * well_location_summary.csv: A summary of the deconvolved well plate grid, where each row corresponds to a different well.
References
[1] Langmead, B. and Salzberg, S.L., 2012. Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), pp.357-359.