{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "05d573cf",
   "metadata": {},
   "source": [
    "# How to run `arraylib` on the command line\n",
    "\n",
    "To run `arraylib` on a library deconvolution experiment with default parameters run:\n",
    "\n",
    "```\n",
    "arraylib-run <input_directory> <experimental_design.csv> -c <number_of_cpu_cores_to_use> -gb <path_to_genbank_reference_directory> -br <path_to_bowtie2_indices> -t <transposon_sequence> -bu <upstream_sequence_of_barcodes> -bd <downstream_sequence_of_barcodes>\n",
    "```\n",
    "\n",
    "## Input parameters\n",
    "\n",
    "Required parameters:\n",
    "\n",
    "* input_dir: path to directory holding the input fastq files\n",
    "* exp_design: path to csv file indicating experimental design (values should be separated by a comma). The experimental design file \n",
    "       should have columns, Filename, Poolname and Pooldimension. (see example in tests/test_data/full_exp_design.csv)\n",
    "  * Filename should contain all the unqiue input fastq filenames.\n",
    "  * Poolname should indicate to which pool a given file belongs. Multiple files per poolname are allowed.\n",
    "  * Pooldimension indicates the pooling dimension a pool belongs to. All pools sharing the same pooling dimension should have the same string in the Pooldimension column.\n",
    "  \n",
    "\n",
    "An example of how an exp_design file could look like:\n",
    "\n",
    "| Filename          | Poolname        | Pooldimension  |\n",
    "| :---------------: | :-------------: | :------------: |\n",
    "| column1.fastq     | column1         | columns        |\n",
    "| column2.fastq     | column2         | columns        |\n",
    "| row1.fastq        | row1            | rows           |\n",
    "| row2.fastq        | row2            | rows           |\n",
    "| platerow1.fastq   | platerow1       | platerows      |\n",
    "| platerow2.fastq   | platerow2       | platerows      |\n",
    "| platecol1.fastq   | platecol1       | platecols      |\n",
    "| platecol2.fastq   | platecol2       | platecols      |\n",
    "\n",
    "* -gb path to genbank reference file\n",
    "* -br path to bowtie index files, ending with the basename of your index (if the basename of your index is UTI89 and you store your bowtie2 [[1]](#1) references in bowtie_ref it should be bowtie_ref/UTI89). Please visit https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#the-bowtie2-build-indexer for a manual how to create bowtie2 indices.\n",
    "* -t transposon sequence (e.g. AGATGTGTATAAGAGACAG)\n",
    "* -bu upstream sequence of barcode (e.g. CGAGGTCTCT)\n",
    "* -bd downstream sequence of barcode (e.g. CGTACGCTGC)\n",
    "\n",
    "Optional parameters:\n",
    "\n",
    "* -mq minimum bowtie2 alignment quality score for each base to include read\n",
    "* -sq minimum phred score for each base to include read\n",
    "* -tm number of transposon mismatches allowed\n",
    "* -thr threshold for local filter (e.g. a threshold of 0.05 would filter out all reads < 0.05 of the maximum read count for a given mutant)\n",
    "* -g\\_thr threshold for global filter (all reads below g_thr will be set to 0) \n",
    "\n",
    "## Run only on barcodes\n",
    "If you want to run arraylib-solve only on barcodes without alignment to the reference genome use the following command:\n",
    "\n",
    "```\n",
    "arraylib-run_on_barcodes <input_directory> <experimental_design.csv> -c <number_of_cpu_cores_to_use>  -bu <upstream_sequence_of_barcodes> -bd <downstream_sequence_of_barcodes>\n",
    "```\n",
    "\n",
    "Optional parameters:\n",
    "\n",
    "* -thr threshold for local filter (e.g. a threshold of 0.05 would filter out all reads < 0.05 of the maximum read count for a given mutant)\n",
    "* -g\\_thr threshold for global filter (all reads below g_thr will be set to 0) \n",
    "\n",
    "## Output\n",
    "\n",
    "`arraylib-solve` outputs 4 files: \n",
    "* count_matrix.csv: Read counts per pool for each mutant, normalized and filtered.\n",
    "* mutant_location_summary.csv: A summary of mutants found in the well plate grid, where each row corresponds to a different mutant.\n",
    "* well_location_summary.csv: A summary of the deconvolved well plate grid, where each row corresponds to a different well.\n",
    "\n",
    "\n",
    "\n",
    "## References\n",
    "<a id=\"1\">[1]</a> \n",
    "Langmead, B. and Salzberg, S.L., 2012. Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), pp.357-359."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "spyder-env",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}