How to run `arraylib` on the command line

To run arraylib on a library deconvolution experiment with default parameters, use:

arraylib-run <input_directory> <experimental_design.csv> -c <number_of_cpu_cores_to_use> -gb <path_to_genbank_reference_directory> -br <path_to_bowtie2_indices> -t <transposon_sequence> -bu <upstream_sequence_of_barcodes> -bd <downstream_sequence_of_barcodes>
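The same invocation can also be assembled from a Python script, which helps when the paths are computed programmatically. The sketch below only builds and prints the command. It uses the flags shown above and the example sequences from this document; the directory and file names (`fastq_files`, `experimental_design.csv`, `genbank_ref`) and the core count are hypothetical placeholders.

```python
import shlex

# Hypothetical paths and core count; replace them with your own values.
# The sequences are the example values this document gives for -t, -bu and -bd.
cmd = [
    "arraylib-run",
    "fastq_files",              # <input_directory> (hypothetical)
    "experimental_design.csv",  # <experimental_design.csv> (hypothetical)
    "-c", "4",                  # number of CPU cores to use
    "-gb", "genbank_ref",       # GenBank reference path (hypothetical)
    "-br", "bowtie_ref/UTI89",  # bowtie2 index path ending in the index basename
    "-t", "AGATGTGTATAAGAGACAG",
    "-bu", "CGAGGTCTCT",
    "-bd", "CGTACGCTGC",
]

# Print the command as it would be typed in a shell.
print(shlex.join(cmd))

# To actually run it (requires arraylib to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a path contains spaces.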

Input parameters

Required parameters:

input_dir: path to directory holding the input fastq files
exp_design: path to the CSV file describing the experimental design (values must be comma-separated). The file must have the columns Filename, Poolname and Pooldimension (see the example in tests/test_data/full_exp_design.csv).
- Filename should contain all the unique input fastq filenames.
- Poolname should indicate to which pool a given file belongs. Multiple files per poolname are allowed.
- Pooldimension indicates the pooling dimension a pool belongs to. All pools sharing the same pooling dimension should have the same string in the Pooldimension column.

An example of what an exp_design file could look like (shown here as a table; the file itself must be comma-separated):

Filename	Poolname	Pooldimension
column1.fastq	column1	columns
column2.fastq	column2	columns
row1.fastq	row1	rows
row2.fastq	row2	rows
platerow1.fastq	platerow1	platerows
platerow2.fastq	platerow2	platerows
platecol1.fastq	platecol1	platecols
platecol2.fastq	platecol2	platecols
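The example above can be written out as a comma-separated file with Python's standard `csv` module. This is a minimal sketch: the helper names `write_exp_design` and `check_exp_design` are hypothetical and not part of arraylib, and the checks only cover the rules stated in this document (the three required columns and unique filenames).

```python
import csv
import io

REQUIRED_COLUMNS = ["Filename", "Poolname", "Pooldimension"]

# Rows taken from the example table above.
ROWS = [
    ("column1.fastq", "column1", "columns"),
    ("column2.fastq", "column2", "columns"),
    ("row1.fastq", "row1", "rows"),
    ("row2.fastq", "row2", "rows"),
    ("platerow1.fastq", "platerow1", "platerows"),
    ("platerow2.fastq", "platerow2", "platerows"),
    ("platecol1.fastq", "platecol1", "platecols"),
    ("platecol2.fastq", "platecol2", "platecols"),
]


def write_exp_design(handle, rows):
    """Write the header and rows as comma-separated values (hypothetical helper)."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(REQUIRED_COLUMNS)
    writer.writerows(rows)


def check_exp_design(handle):
    """Check the required columns and filename uniqueness (hypothetical helper)."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    records = list(reader)
    filenames = [r["Filename"] for r in records]
    if len(filenames) != len(set(filenames)):
        raise ValueError("Filename entries must be unique")
    return records


# An in-memory buffer is used here; with a real file, use
# open("exp_design.csv", "w", newline="") instead.
buffer = io.StringIO()
write_exp_design(buffer, ROWS)
records = check_exp_design(io.StringIO(buffer.getvalue()))

# Group pool names by pooling dimension as a quick sanity check.
pools_by_dimension = {}
for record in records:
    pools_by_dimension.setdefault(record["Pooldimension"], []).append(record["Poolname"])
print(pools_by_dimension)
```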

-gb path to genbank reference file
-br path to the bowtie2 index files, ending with the basename of your index (e.g. if the basename of your index is UTI89 and you store your bowtie2 [1] references in bowtie_ref, pass bowtie_ref/UTI89). See https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#the-bowtie2-build-indexer for instructions on creating bowtie2 indices.
-t transposon sequence (e.g. AGATGTGTATAAGAGACAG)
-bu upstream sequence of barcode (e.g. CGAGGTCTCT)
-bd downstream sequence of barcode (e.g. CGTACGCTGC)
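Because the -br value is a path prefix rather than a single file, it is easy to get wrong. The sketch below checks that the six files `bowtie2-build` normally writes for a given basename are present (with the `.bt2` extension, or `.bt2l` for large indices). The function name `missing_bowtie2_index_files` is hypothetical and not part of arraylib or bowtie2.

```python
from pathlib import Path

# Suffixes of the six files bowtie2-build creates for an index basename.
BT2_SUFFIXES = [".1", ".2", ".3", ".4", ".rev.1", ".rev.2"]


def missing_bowtie2_index_files(index_prefix):
    """Return the names of expected index files that are not found on disk."""
    prefix = str(index_prefix)
    missing = []
    for suffix in BT2_SUFFIXES:
        small = Path(prefix + suffix + ".bt2")   # regular index
        large = Path(prefix + suffix + ".bt2l")  # large index
        if not (small.exists() or large.exists()):
            missing.append(small.name)
    return missing


# Example with the path used in this document; an empty list means
# all expected index files were found.
print(missing_bowtie2_index_files("bowtie_ref/UTI89"))
```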

Optional parameters:

-mq minimum bowtie2 alignment quality score required to include a read
-sq minimum phred score each base must have for the read to be included
-tm number of transposon mismatches allowed
-thr threshold for local filter (e.g. a threshold of 0.05 filters out all read counts below 0.05 times the maximum read count for a given mutant)
-g_thr threshold for global filter (all reads below g_thr will be set to 0)
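To make the two filters concrete, here is a small sketch of one reading of the descriptions above. It is not arraylib's implementation: the function names, the dictionary layout, the toy counts, and the choice to set locally filtered counts to 0 are all assumptions made for illustration.

```python
def local_filter(counts, thr):
    """Zero every count below thr * (maximum count of the same mutant)."""
    filtered = {}
    for mutant, pool_counts in counts.items():
        cutoff = thr * max(pool_counts.values())
        filtered[mutant] = {
            pool: (count if count >= cutoff else 0)
            for pool, count in pool_counts.items()
        }
    return filtered


def global_filter(counts, g_thr):
    """Zero every count below the fixed threshold g_thr."""
    return {
        mutant: {
            pool: (count if count >= g_thr else 0)
            for pool, count in pool_counts.items()
        }
        for mutant, pool_counts in counts.items()
    }


# Toy read counts for one mutant across four pools (made-up numbers).
counts = {"mutant_A": {"row1": 200, "row2": 4, "column1": 180, "column2": 12}}

# Local filter: the cutoff is 0.05 * 200 = 10, so only row2 (4) is removed.
print(local_filter(counts, thr=0.05))

# Global filter: everything below 20 is removed, here row2 (4) and column2 (12).
print(global_filter(counts, g_thr=20))
```

The local filter scales with each mutant's strongest signal, while the global filter applies the same absolute cutoff to every mutant.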

Run only on barcodes

If you want to run arraylib-solve only on barcodes, without alignment to the reference genome, use the following command:

arraylib-run_on_barcodes <input_directory> <experimental_design.csv> -c <number_of_cpu_cores_to_use> -bu <upstream_sequence_of_barcodes> -bd <downstream_sequence_of_barcodes>

Optional parameters:

-thr threshold for local filter (e.g. a threshold of 0.05 filters out all read counts below 0.05 times the maximum read count for a given mutant)
-g_thr threshold for global filter (all reads below g_thr will be set to 0)
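To illustrate what the -bu and -bd sequences refer to, the sketch below pulls out the stretch of a read that lies between an upstream and a downstream flanking sequence. It is an illustration only, assuming exact matches of both flanks; arraylib's own barcode detection may work differently. The function name `extract_barcode`, the read, and the barcode are made up, while the flanks are the example values from this document.

```python
import re


def extract_barcode(read, upstream, downstream):
    """Return the sequence between the two flanks, or None if they are not found."""
    pattern = re.escape(upstream) + r"([ACGTN]+?)" + re.escape(downstream)
    match = re.search(pattern, read)
    return match.group(1) if match else None


upstream = "CGAGGTCTCT"    # example -bu value from this document
downstream = "CGTACGCTGC"  # example -bd value from this document

# Made-up read: some leading bases, the flanked barcode, some trailing bases.
barcode = "ACGTTGCAACGTAGCT"
read = "TTGA" + upstream + barcode + downstream + "GGA"

print(extract_barcode(read, upstream, downstream))
```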

Output

arraylib-solve outputs 4 files:
- count_matrix.csv: Read counts per pool for each mutant, normalized and filtered.
- mutant_location_summary.csv: A summary of mutants found in the well plate grid, where each row corresponds to a different mutant.
- well_location_summary.csv: A summary of the deconvolved well plate grid, where each row corresponds to a different well.

References

[1] Langmead, B. and Salzberg, S.L., 2012. Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), pp.357-359.
