LibraryExperiment
- class arraylib.libraryexperiment.LibraryExperiment(cores, map_quality, seq_quality, gb_ref, bowtie_ref, tn_seq, tn_mismatches, input_dir, exp_design, use_barcodes, bar_upstream, bar_downstream, filter_thr, global_filter_thr, min_counts)
LibraryExperiment class used to perform the individual analysis steps and to store their intermediate results.
- Parameters
cores (int) – number of cores to use
map_quality (int) – minimum bowtie2 mapping quality (MAPQ) an aligned read must have to be included
seq_quality (int) – minimum Phred quality score that every base of a read must reach for the read to be included
gb_ref (str) – path to genbank reference file
bowtie_ref (str) – path to bowtie reference file
tn_seq (str) – transposon sequence (e.g. ATTGCCTA)
tn_mismatches (int) – number of transposon mismatches allowed
input_dir (str) – path to directory holding the input fastq files
exp_design (str) – path to the file describing the experimental design. The file must contain the columns Filename, Poolname and Pooldimension
use_barcodes (bool) – whether to perform deconvolution only on barcodes without genomic alignment
bar_upstream (str) – upstream sequence of barcode
bar_downstream (str) – downstream sequence of barcode
filter_thr (float) – threshold for the local filter. For a given mutant, read counts whose percentage of that mutant's maximum read count is lower than filter_thr are set to 0
global_filter_thr (int) – threshold for the global filter. Read counts lower than global_filter_thr are set to 0
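The local and global filters described by filter_thr and global_filter_thr can be sketched as follows. This is a minimal illustration, not arraylib's implementation. It assumes the count matrix is a list of rows (one row per mutant, one column per pool), treats filter_thr as a fraction of the row maximum (0.1 meaning 10 %), and applies the local filter before the global one. The function name apply_filters is hypothetical.

```python
from typing import List


def apply_filters(counts: List[List[float]], filter_thr: float,
                  global_filter_thr: float) -> List[List[float]]:
    """Apply a local, then a global filter to a mutant x pool count matrix.

    Assumptions (not taken from arraylib): rows are mutants, columns are
    pools, and filter_thr is a fraction of each row's maximum count.
    """
    filtered = []
    for row in counts:
        row_max = max(row) if row else 0
        new_row = []
        for value in row:
            # Local filter: zero counts that are a small fraction of this
            # mutant's maximum count.
            if row_max > 0 and value / row_max < filter_thr:
                value = 0
            # Global filter: zero counts below an absolute threshold.
            if value < global_filter_thr:
                value = 0
            new_row.append(value)
        filtered.append(new_row)
    return filtered


print(apply_filters([[100, 5, 40], [3, 2, 1]], filter_thr=0.1, global_filter_thr=5))
```

In the usage line, the 5 in the first row falls below 10 % of that row's maximum and is removed locally, while the second row is removed entirely by the global threshold.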
- align_genomic_seq()
Aligns the trimmed reads to the reference using bowtie2. The bowtie2 output is parsed and stored in temp/alignment_result.csv.
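bowtie2 writes its alignments in SAM format. A minimal sketch of the parsing step might look like the following, assuming the map_quality threshold is compared against the SAM MAPQ column. The function names and the output columns (read, reference, position, strand) are hypothetical; the actual layout of temp/alignment_result.csv may differ.

```python
import csv
from typing import Iterable, List, Tuple

# Hypothetical record layout: (read name, reference name, 1-based position, strand)
Record = Tuple[str, str, int, str]


def parse_sam(lines: Iterable[str], map_quality: int) -> List[Record]:
    """Keep mapped reads whose MAPQ is at least map_quality."""
    records = []
    for line in lines:
        if line.startswith("@"):  # SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # an alignment line has 11 mandatory fields
            continue
        qname, rname = fields[0], fields[2]
        flag, pos, mapq = int(fields[1]), int(fields[3]), int(fields[4])
        if flag & 4:  # FLAG bit 0x4: read is unmapped
            continue
        if mapq < map_quality:
            continue
        strand = "-" if flag & 16 else "+"  # FLAG bit 0x10: reverse strand
        records.append((qname, rname, pos, strand))
    return records


def write_alignment_csv(records: List[Record], path: str) -> None:
    """Write the parsed records to a CSV file with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["read", "reference", "position", "strand"])
        writer.writerows(records)
```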
- deconvolve(barcode_only=False, count_mat='filtered')
Deconvolves the mutant count matrix and returns a summary output with gene names.
- Parameters
barcode_only (bool) – whether to perform deconvolution only on barcodes without genomic alignment
count_mat (str) – which count matrix to use; options are raw, bc_filtered, normalized and filtered
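One simple way to deconvolve a single mutant, shown only as an illustration and not necessarily the strategy arraylib uses, is to pick the pool with the highest count within each pool dimension and to reject the mutant when a dimension has no signal or a tie. The sketch assumes a mapping from pool name to pool dimension built from the Poolname and Pooldimension columns of the experimental design file; the function name and the dimension labels "row" and "col" are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def deconvolve_mutant(pool_counts: Dict[str, float],
                      pool_dimension: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Assign one mutant to a single pool per dimension.

    pool_counts: pool name -> (filtered) read count for this mutant.
    pool_dimension: pool name -> dimension (hypothetical Poolname/Pooldimension map).
    Returns dimension -> best pool for every dimension present in pool_counts,
    or None if any dimension has no signal or a tie for the top pool.
    """
    by_dimension: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for pool, count in pool_counts.items():
        by_dimension[pool_dimension[pool]].append((count, pool))

    coordinates = {}
    for dimension, entries in by_dimension.items():
        entries.sort(reverse=True)  # highest count first
        best_count, best_pool = entries[0]
        tied = len(entries) > 1 and entries[1][0] == best_count
        if best_count <= 0 or tied:
            return None  # ambiguous or empty dimension
        coordinates[dimension] = best_pool
    return coordinates


design = {"R1": "row", "R2": "row", "C1": "col", "C2": "col"}
print(deconvolve_mutant({"R1": 0, "R2": 50, "C1": 30, "C2": 0}, design))
```

With pools R1/R2 in one dimension and C1/C2 in another, the counts in the usage line place the mutant in pool R2 and pool C1. A real library with several mutants per pool needs a more careful strategy than this per-mutant maximum.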
- get_genomic_seq(barcode_only=False)
Trims all reads based on the presence of the transposon recognition sequence. Only the genomic sequence downstream of the transposon border site is kept.
As a side effect, a temp folder is created in the current directory, and the trimmed genomic sequences are written to temp/trimmed_sequences.fastq.
- Parameters
barcode_only (bool) – whether to perform deconvolution only on barcodes without genomic alignment
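The trimming step can be sketched as a sliding-window search for tn_seq that tolerates up to tn_mismatches substitutions (Hamming distance) and returns everything after the first matching window. This is a minimal sketch under stated assumptions, not arraylib's code: a read passes only if every base meets seq_quality, qualities are Phred+33 encoded, and insertions or deletions in the transposon sequence are not tolerated. The function name is hypothetical.

```python
from typing import Optional


def trim_after_transposon(read: str, qualities: str, tn_seq: str,
                          tn_mismatches: int, seq_quality: int,
                          phred_offset: int = 33) -> Optional[str]:
    """Return the genomic sequence after tn_seq, or None if the read is rejected."""
    # Assumed quality rule: every base must reach seq_quality (Phred+33 encoding).
    if any(ord(q) - phred_offset < seq_quality for q in qualities):
        return None
    n = len(tn_seq)
    for start in range(len(read) - n + 1):
        window = read[start:start + n]
        # Count substitutions only (Hamming distance); indels are not handled.
        mismatches = sum(a != b for a, b in zip(window, tn_seq))
        if mismatches <= tn_mismatches:
            genomic = read[start + n:]
            return genomic or None  # nothing left after the transposon border
    return None


print(trim_after_transposon("GGATTGCCTAACGTTT", "I" * 16, "ATTGCCTA", 1, 30))
```

Scanning left to right and stopping at the first acceptable window keeps the sketch simple; with a larger tn_mismatches, an earlier spurious window could match, so a stricter implementation might pick the window with the fewest mismatches instead.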
- write_count_matrix(barcode_only=False)
Assembles the count matrix from the bowtie2 alignment, performs counts-per-million (CPM) normalization and filters out spurious barcodes that have very few reads for a given mutant. Stores four differently processed count matrices in LibraryExperiment:
raw_count_matrix : raw counts for all detected mutants
count_matrix : raw counts for detected mutants, with barcodes whose count sums are below 10 % of the max barcode removed
normalized_count_matrix : CPM-normalized count_matrix
filtered_count_matrix : normalized_count_matrix with the local and global filters applied
- Parameters
barcode_only (bool) – whether to perform deconvolution only on barcodes without genomic alignment
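The barcode filter and CPM normalization behind count_matrix and normalized_count_matrix can be sketched as below. This is a minimal sketch, not arraylib's implementation. It assumes counts are keyed by (mutant, barcode) pairs with one value per pool, that the 10 % rule compares each barcode's total count with the largest barcode total of the same mutant, and that CPM is computed per pool (column) so each pool sums to one million. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # hypothetical (mutant, barcode) key


def filter_barcodes(barcode_counts: Dict[Key, List[float]],
                    fraction: float = 0.1) -> Dict[Key, List[float]]:
    """Drop barcodes whose total count is below fraction x the mutant's top barcode."""
    totals = {key: sum(counts) for key, counts in barcode_counts.items()}
    max_per_mutant: Dict[str, float] = {}
    for (mutant, _barcode), total in totals.items():
        max_per_mutant[mutant] = max(max_per_mutant.get(mutant, 0), total)
    return {key: counts for key, counts in barcode_counts.items()
            if totals[key] >= fraction * max_per_mutant[key[0]]}


def cpm_normalize(matrix: List[List[float]]) -> List[List[float]]:
    """Scale each column (pool) so that its counts sum to one million."""
    n_cols = len(matrix[0]) if matrix else 0
    col_totals = [sum(row[j] for row in matrix) for j in range(n_cols)]
    return [[row[j] / col_totals[j] * 1e6 if col_totals[j] else 0.0
             for j in range(n_cols)]
            for row in matrix]


print(cpm_normalize([[1, 3], [3, 1]]))
```

Normalizing per pool makes counts comparable across fastq files sequenced to different depths; empty pools are left at zero instead of dividing by zero.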