LibraryExperiment

class arraylib.libraryexperiment.LibraryExperiment(cores, map_quality, seq_quality, gb_ref, bowtie_ref, tn_seq, tn_mismatches, input_dir, exp_design, use_barcodes, bar_upstream, bar_downstream, filter_thr, global_filter_thr, min_counts)

LibraryExperiment class used to perform individual analysis steps and store intermediate results.

Parameters
  • cores (int) – number of cores to use

  • map_quality (int) – minimum bowtie2 mapping quality score an alignment must have for the read to be included

  • seq_quality (int) – minimum phred score required at each base for the read to be included

  • gb_ref (str) – path to genbank reference file

  • bowtie_ref (str) – path to bowtie reference file

  • tn_seq (str) – transposon sequence (e.g. ATTGCCTA)

  • tn_mismatches (int) – number of transposon mismatches allowed

  • input_dir (str) – path to directory holding the input fastq files

  • exp_design (str) – path to the experimental design file. The file should have the columns Filename, Poolname and Pooldimension

  • use_barcodes (bool) – whether to perform deconvolution only on barcodes without genomic alignment

  • bar_upstream (str) – upstream sequence of barcode

  • bar_downstream (str) – downstream sequence of barcode

  • filter_thr (float) – threshold for the local filter. For a given mutant, read counts whose percentage of the maximum read count is lower than filter_thr are set to 0

  • global_filter_thr (int) – threshold for the global filter. Read counts lower than global_filter_thr are set to 0
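
The two thresholds can be illustrated with a minimal sketch. This is not arraylib's implementation: the function name apply_filters and the dictionary layout are hypothetical, and it assumes that filter_thr is given as a fraction (0.05 for 5 %) and that the "max read count" is the maximum across pools for the same mutant.

```python
def apply_filters(counts, filter_thr, global_filter_thr):
    """Zero out low read counts in a mutant-by-pool table.

    counts maps a mutant id to its list of per-pool read counts
    (hypothetical layout, chosen only for this illustration).
    """
    filtered = {}
    for mutant, row in counts.items():
        row_max = max(row) if row else 0
        new_row = []
        for c in row:
            # Local filter (assumed): compare each count with the
            # mutant's own maximum across pools.
            if row_max > 0 and c / row_max < filter_thr:
                c = 0
            # Global filter: absolute cutoff, independent of the mutant.
            if c < global_filter_thr:
                c = 0
            new_row.append(c)
        filtered[mutant] = new_row
    return filtered


counts = {"mutant_a": [100, 4, 60], "mutant_b": [3, 2, 0]}
result = apply_filters(counts, filter_thr=0.05, global_filter_thr=5)
print(result)
```

In this example the count of 4 for mutant_a is removed by the local filter (4 % of the maximum of 100), while every count of mutant_b falls below the global cutoff of 5.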

align_genomic_seq()

Aligns the trimmed reads to the reference using bowtie2. The output of bowtie2 is parsed and stored in temp/alignment_result.csv.

deconvolve(barcode_only=False, count_mat='filtered')

Deconvolves the mutant count matrix and returns summary output with gene names.

Parameters
  • barcode_only (bool) – whether to perform deconvolution only on barcodes without genomic alignment

  • count_mat (str) – which count matrix to use; options are raw, bc_filtered, normalized and filtered

get_genomic_seq(barcode_only=False)

Trims all reads based on the presence of the transposon recognition sequence. Only the genomic sequence downstream of the transposon border site is kept.

As a side effect, a temp folder is created in the current directory, and the trimmed genomic sequences are written to temp/trimmed_sequences.fastq.

Parameters
  • barcode_only (bool) – whether to perform deconvolution only on barcodes without genomic alignment
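
The trimming step can be pictured with a minimal sketch. It is not taken from arraylib: the helper trim_after_transposon is hypothetical, and it assumes that mismatches are counted as substitutions only (Hamming distance) and that the first matching position in the read is used.

```python
def trim_after_transposon(read, tn_seq, tn_mismatches):
    """Return the sequence downstream of the first transposon match.

    Returns None if tn_seq does not occur in the read with at most
    tn_mismatches substitutions.
    """
    n = len(tn_seq)
    for start in range(len(read) - n + 1):
        window = read[start:start + n]
        # Count positions where the read differs from the transposon sequence.
        mismatches = sum(1 for a, b in zip(window, tn_seq) if a != b)
        if mismatches <= tn_mismatches:
            # Keep only what follows the transposon border.
            return read[start + n:]
    return None


tn_seq = "ATTGCCTA"
exact = trim_after_transposon("GGATTGCCTAACGTACGT", tn_seq, 0)
fuzzy = trim_after_transposon("GGATTGACTAACGT", tn_seq, 1)
missing = trim_after_transposon("GGATTGACTAACGT", tn_seq, 0)
print(exact, fuzzy, missing)
```

Reads without a match (here, the second read when no mismatch is allowed) yield None and would be discarded in this sketch.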

write_count_matrix(barcode_only=False)

Assembles the count matrix from the bowtie2 alignment. Performs counts-per-million normalization and filters out spurious barcodes with very few reads for a given mutant. Stores 4 differently processed count matrices in LibraryExperiment:

  • raw_count_matrix : contains raw counts for all detected mutants

  • count_matrix : contains raw counts for detected mutants, but barcodes that have count sums below 10 % of the max barcode are removed

  • normalized_count_matrix : cpm normalized count_matrix

  • filtered_count_matrix : local and global filters applied to normalized_count_matrix

Parameters
  • barcode_only (bool) – whether to perform deconvolution only on barcodes without genomic alignment
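
Counts-per-million normalization can be sketched as follows. This illustrates the general technique rather than arraylib's code: the function cpm_normalize is hypothetical, and it assumes that rows are mutants, columns are pools, and each pool is scaled by its own total read count.

```python
def cpm_normalize(counts):
    """Scale every column (pool) so that its counts sum to one million.

    counts is a list of rows, one per mutant, each holding per-pool counts
    (assumed layout for this illustration).
    """
    n_pools = len(counts[0]) if counts else 0
    # Total number of reads in each pool.
    totals = [sum(row[j] for row in counts) for j in range(n_pools)]
    return [
        [row[j] * 1_000_000 / totals[j] if totals[j] else 0.0
         for j in range(n_pools)]
        for row in counts
    ]


counts = [[10, 0], [30, 5], [60, 15]]
normalized = cpm_normalize(counts)
for row in normalized:
    print(row)
```

After normalization, pools sequenced to different depths become comparable, which is why the local and global filters are described as being applied to normalized_count_matrix.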