GuideMaker

GuideMaker: Software to design CRISPR-Cas guide RNA pools in non-model genomes 🦠 🧬

CRISPR-Cas systems have expanded the possibilities for gene editing in bacteria and eukaryotes. There are many excellent tools for designing the CRISPR-Cas guide RNAs for model organisms with standard Cas enzymes. GuideMaker is intended as a fast and easy-to-use design tool for atypical projects with 1) non-standard Cas enzymes, 2) non-model organisms, or 3) projects that need to design a panel of guide RNAs (gRNA) for genome-wide screens.

GuideMaker can rapidly design gRNAs for gene targets across the genome from a degenerate protospacer adjacent motif (PAM) and a GenBank file. The tool applies Hierarchical Navigable Small World (HNSW) graphs to speed up the comparison of guide RNAs, enabling the user to design gRNAs for all genes in a typical bacterial genome and PAM sequence in about 1-2 minutes on a laptop.
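To make the guide comparison concrete, the sketch below computes, for each candidate guide, its smallest Hamming distance to any other candidate using an exhaustive pairwise scan. This is only an illustration of the quantity being compared, not GuideMaker's implementation: the tool uses HNSW graphs precisely to avoid this quadratic scan, and the function names here are hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance_to_others(guides: List[str]) -> Dict[str, int]:
    """For each guide, the smallest Hamming distance to any other guide.

    Exhaustive O(n^2) comparison, shown only for illustration.
    """
    best = {g: len(g) for g in guides}
    for a, b in combinations(guides, 2):
        d = hamming(a, b)
        best[a] = min(best[a], d)
        best[b] = min(best[b], d)
    return best


candidates = ["ACGTACGT", "ACGTACGA", "TTTTTTTT"]
distances = min_distance_to_others(candidates)
# Keep guides at least 2 mismatches away from every other candidate
# (2 is the default of the --dist option shown in the help output).
unique_enough = [g for g, d in distances.items() if d >= 2]
print(unique_enough)
```

On genome-scale candidate sets the pairwise scan becomes impractical, which is the motivation for an approximate nearest-neighbor index.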

GuideMaker enables the rapid design of genome-wide CRISPR/Cas gene function studies in non-model organisms with any Cas enzyme. While GuideMaker is designed with prokaryotic genomes in mind, it can process smaller eukaryotic genomes as well. GuideMaker is available as command-line software, as a web application at https://guidemaker.app.scinet.usda.gov, and in the CyVerse Discovery Environment.

Methods to access GuideMaker

GuideMaker can be easily accessed via:

  • Web Application
  • CyVerse Discovery Environment
  • Command Line
  • Local Web Application

NOTE: Our web application runs on a small server instance and is primarily designed for the lower memory requirements of bacterial genomes. We recommend that users run larger genomes in the CyVerse Discovery Environment or run GuideMaker locally as a command-line or web browser-based application.

1. Web Application
Image of GuideMaker Web App

2. CyVerse Discovery Environment
Image of GuideMaker Web App

3. Command Line

GuideMaker can be installed from:

3.1. Bioconda (preferred method because it handles dependencies):

# Create a conda environment and install GuideMaker via Bioconda.

conda create --strict-channel-priority --override-channels --channel conda-forge --channel bioconda --channel defaults --name gmenv guidemaker

# Activate conda env
conda activate gmenv

# Test the installation
guidemaker -h

3.2. GitHub

    # Create a conda environment and install pybedtools
    conda create -n gmenv python=3.7 pybedtools=0.8.2
    conda activate gmenv

    git clone https://github.com/USDA-ARS-GBRU/GuideMaker.git
    cd GuideMaker
    pip install .

    # check if the installation works
    guidemaker -h

3.3. Docker image: available from the GitHub Container Registry


docker pull ghcr.io/usda-ars-gbru/guidemaker-nonavx:sha-9be9fe1c9dca

Dependencies

  • pybedtools
  • NMSLib
  • Biopython
  • Pandas
  • Streamlit for webapp
  • altair for plotting

Command Line Usage

usage: guidemaker [-h] --genbank GENBANK [GENBANK ...] --pamseq PAMSEQ
                  --outdir OUTDIR [--pam_orientation {5prime,3prime}]
                  [--guidelength [10-27]] [--lsr [0-27]]
                  [--dtype {hamming,leven}] [--dist [0-5]] [--before [1-500]]
                  [--into [1-500]] [--knum [2-20]] [--controls CONTROLS]
                  [--threads THREADS] [--log LOG] [--tempdir TEMPDIR]
                  [--restriction_enzyme_list [RESTRICTION_ENZYME_LIST [RESTRICTION_ENZYME_LIST ...]]]
                  [--filter_by_locus [FILTER_BY_LOCUS [FILTER_BY_LOCUS ...]]]
                  [--doench_efficiency_score] [--keeptemp] [--plot]
                  [--config CONFIG] [-V]

GuideMaker: Software to design gRNAs pools in non-model genomes and CRISPR-Cas
systems

optional arguments:
  -h, --help            show this help message and exit
  --genbank GENBANK [GENBANK ...], -i GENBANK [GENBANK ...]
                        One or more genbank .gbk or gzipped .gbk files for a
                        single genome
  --pamseq PAMSEQ, -p PAMSEQ
                        A short PAM motif to search for, it may use IUPAC
                        ambiguous alphabet
  --outdir OUTDIR, -o OUTDIR
                        The directory for data output
  --pam_orientation {5prime,3prime}, -r {5prime,3prime}
                        PAM position relative to target: 5prime:
                        [PAM][target], 3prime: [target][PAM]. For example,
                        Cas9 is 3prime. Default: '5prime'.
  --guidelength [10-27], -l [10-27]
                        Length of the guide sequence. Default: 20.
  --lsr [0-27]          Length of a seed region near the PAM site required to
                        be unique. Default: 10.
  --dtype {hamming,leven}
                        Select the distance type. Default: hamming.
  --dist [0-5]          Minimum edit distance from any other potential guide.
                        Default: 2.
  --before [1-500]      keep guides this far in front of a feature. Default:
                        100.
  --into [1-500]        keep guides this far inside (past the start site)of a
                        feature. Default: 200.
  --knum [2-20]         how many sequences similar to the guide to report.
                        Default: 5.
  --controls CONTROLS   Number or random control RNAs to generate. Default:
                        1000.
  --threads THREADS     The number of cpu threads to use. Default: 2
  --log LOG             Log file
  --tempdir TEMPDIR     The temp file directory
  --restriction_enzyme_list [RESTRICTION_ENZYME_LIST [RESTRICTION_ENZYME_LIST ...]]
                        List of sequence representing restriction enzymes.
                        Default: None.
  --filter_by_locus [FILTER_BY_LOCUS [FILTER_BY_LOCUS ...]]
                        List of locus tag. Default: None.
  --doench_efficiency_score
                        Doench et al. 2016 - only for NGG PAM: None.
  --keeptemp            Option to keep intermediate files be kept
  --plot                Option to genereate guidemaker plots
  --config CONFIG       Path to YAML formatted configuration file, default is 
                        /Users/ravinpoudel/opt/anaconda3/envs/gmenv/lib/python
                        3.7/site-packages/guidemaker/data/config_default.yaml
  -V, --version         show program's version number and exit

To run the web app locally, in terminal run:
-----------------------------------------------------------------------
streamlit run /Users/ravinpoudel/opt/anaconda3/envs/gmenv/lib/python3.7/site-
packages/guidemaker/data/app.py

Examples

Use case: Make 20-nucleotide guide sequences for SpCas9 (NGG) in the bacterium Carsonella ruddii. The length of the seed region near the PAM required to be unique in each guide is 11 nucleotides.

    guidemaker \
    -i tests/test_data/Carsonella_ruddii.gbk \
    -p NGG \
    --pam_orientation 3prime \
    --guidelength 20 \
    --lsr 11 \
    -o OUTDIR \
    --doench_efficiency_score \
    --threads 2 
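
As a rough illustration of what the `-p`, `--pam_orientation`, and `--guidelength` options describe, the sketch below expands a degenerate PAM into a regular expression and lists candidate targets next to each PAM occurrence. It is a simplified stand-in, not GuideMaker's code: it scans the forward strand only, ignores gene features and uniqueness filtering, and the function names are hypothetical.

```python
import re

# IUPAC ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def pam_regex(pam):
    """Translate a degenerate PAM such as 'NGG' into a regex fragment."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_targets(seq, pam="NGG", guidelength=20, orientation="3prime"):
    """Return (start, target) pairs on the forward strand.

    3prime: [target][PAM] (for example Cas9); 5prime: [PAM][target].
    A lookahead is used so that overlapping sites are all reported.
    """
    target = "([ACGT]{%d})" % guidelength
    if orientation == "3prime":
        pattern = "(?=" + target + pam_regex(pam) + ")"
    elif orientation == "5prime":
        pattern = "(?=" + pam_regex(pam) + target + ")"
    else:
        raise ValueError("orientation must be '5prime' or '3prime'")
    return [(m.start(1), m.group(1)) for m in re.finditer(pattern, seq.upper())]


# Toy sequence with a short guide length so the result is easy to read.
print(find_targets("ACGTAGGTT", pam="NGG", guidelength=4, orientation="3prime"))
```

A complete search would also scan the reverse complement and then keep only targets within the `--before` and `--into` windows around each feature.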

4. Running Web App locally

To run the web app locally, you first need to complete the command line installation described above.

If the path to app.py on your system differs from the one shown below, you can find it by running guidemaker --help. The full command to run the web app locally is printed at the bottom of the help output.


streamlit run /Users/admin/opt/anaconda3/envs/gmenv/lib/python3.7/site-packages/guidemaker/data/app.py --server.maxUploadSize 500
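As an alternative to reading the path from the help output, the installed location can be looked up from Python. The sketch below assumes the layout implied by the path above, with `app.py` inside the `data` directory of the installed `guidemaker` package; the helper name is hypothetical, and it returns `None` when the package or file is not found.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_app_path(package: str = "guidemaker") -> Optional[Path]:
    """Return <package dir>/data/app.py if it exists, else None.

    Assumes the package layout seen in the documented path
    (.../site-packages/guidemaker/data/app.py).
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None  # package is not installed in this environment
    app = Path(spec.origin).parent / "data" / "app.py"
    return app if app.exists() else None


app_path = find_app_path()
if app_path is None:
    print("guidemaker app.py not found in this environment")
else:
    # Same command as above, with the path filled in for this machine.
    print(f"streamlit run {app_path} --server.maxUploadSize 500")
```

Run it inside the activated conda environment (gmenv in the examples above) so that the lookup sees the right installation.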

Image of GuideMaker Web App

Using GuideMaker's results

This section provides information on how to use GuideMaker's results to create a molecular protocol for pooled CRISPR screens.

Pooled CRISPR Experiments

Experiments that target the entire genome, or many genes at once, are typically performed in pooled experiments where 100-100,000+ targets are tested simultaneously. The pooled oligonucleotides for each gRNA are cloned in one batch and used simultaneously in the designed experiment. Each gRNA sequence acts as a barcode that can be quantified with high-throughput sequencing to elucidate each target's relative importance under the experimental conditions.
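The barcode idea can be sketched in a few lines: each sequencing read is checked for a known gRNA sequence and a per-guide tally is kept. The example assumes, for simplicity, that every guide has the same length and sits at a fixed offset in the read; real amplicon designs and read processing will differ, and the function name is hypothetical.

```python
from typing import Dict, Iterable


def count_guides(reads: Iterable[str], guides: Iterable[str], offset: int = 0) -> Dict[str, int]:
    """Count exact matches of each guide at a fixed position in the reads."""
    guide_set = {g.upper() for g in guides}
    lengths = {len(g) for g in guide_set}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes all guides have the same length")
    (k,) = lengths
    counts = {g: 0 for g in guide_set}  # guides never seen stay at zero
    for read in reads:
        candidate = read[offset:offset + k].upper()
        if candidate in counts:
            counts[candidate] += 1
    return counts


reads = ["GGACGTAA", "GGACGTCC", "GGTTTTAA", "GGCCCCAA"]
guides = ["ACGT", "TTTT", "GGGG"]
print(count_guides(reads, guides, offset=2))
```

Guides with a count of zero are kept in the result because dropout from the pool is itself informative.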

Vectors for gRNA Cloning

Genome-scale CRISPR experiments require a gRNA vector amenable to high-throughput cloning, most often through Golden Gate cloning, a restriction enzyme-dependent reaction. Plasmids to express gRNA are available from Addgene at the link below, though not all of these are compatible with high-throughput cloning.

Addgene: CRISPR Plasmids - Empty gRNA Vectors

After running GuideMaker, the designed gRNA output can be downloaded and, with minor adjustments, the targets can be ordered as oligos for cloning. Pooled oligonucleotides can be purchased from several vendors, including those listed below. Pool sizes vary from 100 to over 200,000 oligonucleotides. Vendor specifications for the number of oligos, oligo length, and cost per bp vary widely. For bacterial genome-scale experiments, as of 2021, Genscript offers pool sizes of 12,472 and 91,766 with up to 79 bp per oligo for list prices of $1,600 and $4,000, respectively.

Some example vendors are:

Most pools require amplification before cloning to convert the ssDNA to dsDNA and increase the concentration for efficient cloning. Accordingly, adding a constant region at the 3' end for primer binding is recommended. Sub-pools can also be amplified by adding unique constant regions to some oligos, enabling the large-scale synthesis to be split amongst organisms or specific targets in a single organism. Because Golden Gate cloning relies on restriction enzymes, gRNA designs that contain the cognate restriction enzyme recognition site must be filtered out; GuideMaker supports this through the --restriction_enzyme_list option. A general protocol for cloning pooled gRNA from synthesized oligonucleotides from IDT is linked below, though similar workflows can be used for pools from other vendors.
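The two steps described above, dropping guides that contain a restriction site and adding constant regions, can be sketched as follows. The recognition site GGTCTC is that of BsaI, a common Golden Gate enzyme; the flanking sequences are placeholders rather than a validated design, and the function names are hypothetical. Only the guide itself is checked, so sites created at the junction with a flank would need a separate check.

```python
from typing import Iterable, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_site(guide: str, sites: Iterable[str]) -> bool:
    """True if the guide contains any site on either strand."""
    g = guide.upper()
    return any(s.upper() in g or reverse_complement(s) in g for s in sites)


def build_oligos(guides: Iterable[str], sites: List[str],
                 left_flank: str, right_flank: str) -> List[str]:
    """Drop guides containing a restriction site, then add constant regions."""
    return [left_flank + g.upper() + right_flank
            for g in guides if not has_site(g, sites)]


guides = [
    "AAAAGGTCTCAAAAAAAAAA",  # contains GGTCTC, removed
    "AAAAGAGACCAAAAAAAAAA",  # contains the reverse complement, removed
    "ACGTACGTACGTACGTACGT",  # kept
]
# Placeholder flanks for illustration only; use sequences matching your vector.
oligos = build_oligos(guides, ["GGTCTC"], left_flank="AAGC", right_flank="GTTT")
print(oligos)
```

Giving different sub-pools different flank pairs is what allows each sub-pool to be amplified separately with its own primers.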

Pooled CRISPR Data Analysis

After the experiment, the cells are collected and DNA is isolated. The target sequence is then amplified and adaptors for high-throughput sequencing are added. Several data analysis pipelines have been developed to identify target sequences over-represented or under-represented in the pool. The manuscript by Wang et al. (2019) provides a protocol for using a high-quality tool with these capabilities.
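To show what over- or under-representation means numerically, the sketch below normalizes guide counts to frequencies within each sample and reports a log2 fold change per guide. It is a bare-bones illustration with a simple pseudocount, not the statistical model used by MAGeCKFlute or any other pipeline, and the function name is hypothetical.

```python
import math
from typing import Dict


def log2_fold_changes(control: Dict[str, int], treated: Dict[str, int],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(treated frequency / control frequency) for each guide in control.

    The pseudocount avoids division by zero for guides with no reads.
    """
    n = len(control)
    c_total = sum(control.values()) + pseudocount * n
    t_total = sum(treated.get(g, 0) for g in control) + pseudocount * n
    result = {}
    for guide, c_count in control.items():
        c_freq = (c_count + pseudocount) / c_total
        t_freq = (treated.get(guide, 0) + pseudocount) / t_total
        result[guide] = math.log2(t_freq / c_freq)
    return result


control = {"guide_1": 100, "guide_2": 100}
treated = {"guide_1": 50, "guide_2": 150}
for guide, lfc in log2_fold_changes(control, treated).items():
    print(guide, round(lfc, 3))
```

Negative values indicate guides depleted under the treatment and positive values indicate enrichment; dedicated pipelines add replicate handling and significance testing on top of this.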

Citation

Wang, B., Wang, M., Zhang, W. et al. Integrative analysis of pooled CRISPR genetic screens using MAGeCKFlute. Nat Protoc 14, 756–780 (2019). https://doi.org/10.1038/s41596-018-0113-7

FAQs

Coming soon…

Reporting Errors and Suggestions

Open the GuideMaker GitHub repo, navigate to the Issues page, and submit an issue to report difficulties, errors, or suggestions for improvements. Please check the FAQs section before submitting an issue.

Image of GuideMaker Web App

Citation

Poudel R, Rodriguez LT, Reisch CR, Rivers AR. GuideMaker: Software to design CRISPR-Cas guide RNA pools in non-model genomes. 2021. In Review.

API documentation

API documentation for the module can be found here

License information

GuideMaker was created by the United States Department of Agriculture - Agricultural Research Service (USDA-ARS). As a work of the United States Government, this software is available under the CC0 1.0 Universal Public Domain Dedication (CC0 1.0).


About us

GuideMaker was developed by the USDA Agricultural Research Service, Genomics and Bioinformatics Research Unit group in Gainesville, FL led by Adam Rivers. Check out our other work at https://tinyecology.com.