The scripts are divided into two modules, one in python and one in R.
The python script `analyze_fcs_flowcal.py` and the jupyter notebook `scripts_archive/flowcal_pipeline_report.py/ipnb` are wrappers for (semi-)automated processing of flow cytometry data in a standard workflow, using the FlowCal submodule from the Tabor lab: https://github.com/taborlab/FlowCal/
Castillo-Hair, Sebastian M., et al. "FlowCal: a user-friendly, open source software tool for automatically converting flow cytometry data from arbitrary to calibrated units." ACS synthetic biology 5.7 (2016): 774-780.
Briefly, this is what the python wrapper does :
- Opens all `.fcs` files within the directory and identifies the file for the calibration beads
- Prepares the beads data for calibration into molecules of equivalent fluorescein (MEFL)
- FlowCal functions: clean up the data and convert arbitrary fluorescence units into MEFL. The cleanup includes:
  - Gating out saturated events (low and high end)
  - Density gating for cells, to remove debris. Retains the 50% of events in the highest-density region. This fraction can be changed by the user, and it is worth testing fractions of 0.3, 0.5 and 0.8 before running all the data
  - Retaining the singlet population: the top 90% of events in the FSC-A vs FSC-H plot. This excludes any clumps of cells.
- Saves the plots showing cleanup steps for all/5 random data in `.html` from the jupyter notebook.
- Outputs summary statistics of mean, median, mode to a `.csv` file
- Saves the cleaned up `.fcs` files to the `processed_data` directory. These can be analyzed by any tool of the user's choice.

To interact with each of these steps individually and test different parameters, such as the fraction retained for density gating, use the jupyter notebook `scripts_archive/adhoc_flowcal_analysis.py/ipnb`
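To get a feel for what the density gating fraction does before sweeping it in the notebook, here is a minimal pure-python sketch. It is not FlowCal's implementation (FlowCal's density gate works on a smoothed 2D histogram); the function name `density_gate`, the `bin_width` parameter and the toy events are all made up for illustration.

```python
from collections import Counter


def density_gate(events, gate_fraction=0.5, bin_width=1.0):
    """Keep the events that fall in the most populated 2D bins.

    Bins are added from most to least populated until at least
    `gate_fraction` of all events is retained (simplified: no smoothing).
    """
    if not 0 < gate_fraction <= 1:
        raise ValueError("gate_fraction must be in (0, 1]")

    def bin_of(event):
        x, y = event
        return (int(x // bin_width), int(y // bin_width))

    counts = Counter(bin_of(e) for e in events)
    target = gate_fraction * len(events)

    kept_bins, n_kept = set(), 0
    for bin_id, n in counts.most_common():
        if n_kept >= target:
            break
        kept_bins.add(bin_id)
        n_kept += n
    return [e for e in events if bin_of(e) in kept_bins]


# Hypothetical (FSC, SSC) events: a dense cluster of cells plus scattered debris
events = [(10.1, 10.2), (10.3, 10.4), (10.5, 10.1), (10.2, 10.8),
          (10.7, 10.6), (10.9, 10.3), (11.2, 10.5), (11.6, 10.1),
          (1.5, 1.5), (30.2, 2.4)]

# Sweep the fractions suggested above and compare how many events survive
for fraction in (0.3, 0.5, 0.8):
    retained = density_gate(events, gate_fraction=fraction)
    print(fraction, len(retained))
```

Because whole bins are kept, this toy version can retain somewhat more than the requested fraction. The point is only to show that a smaller fraction keeps the core of the cell cluster, while a larger one starts admitting its edges and eventually the debris.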
Note: I save the data from FlowCal for later analysis in R; users can use any other tool they wish. I made this choice because I wasn't satisfied with the analysis and plotting capabilities provided by FlowCal, and I prefer ggplot to python's plots. R also has a very good general-purpose flow cytometry ecosystem with many packages built upon the `flowCore` package, and these work on `.fcs` files without keeping them in RAM!
The R section is not fully automated yet, but it should work pretty well once you get the hang of the R commands in an hour or two. Do reach out to me through the issues section on github if you have questions.
- The R section of the pipeline uses the processed data saved by FlowCal. _If you wish, you can skip the cleanup in python and look at the raw data with the same R scripts as well._
- It attaches the sample names to wells from a 96-well layout in a google sheet/`.csv` file.
- After this, R provides commands for gating based on a single representative `.fcs` file and broadcasts the gate to all other data, using the `openCyto` package.
  - Currently I use the function `openCyto::mindensity(..)`, which draws a gate threshold at the minimum density region in 1D, so it is applicable when the sample has a bimodal distribution with two populations
  - Look at the documentation of `openCyto`'s autogating for other gating schemes in 1D and 2D, and at `flowCore` for `rectangleGate()` and `quadGate()`
- Calculates population statistics for all the data using the `flowWorkspace` package and saves the data into a `.csv` file
- Plots distributions of data as highly customizable ggplots, both with and without gating. The plots can be made with one line of code using the powerful `ggcyto` package. Note: replicate wells with the same name are merged. Example figure with lots of customizations (no gating here): ![[FACS_analysis/plots/S043_28-3-22-processed-ridge density-processed-red.png]]
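The idea behind a minimum-density gate can be illustrated in a few lines of python. This is only a rough stand-in for the concept, not what `openCyto::mindensity(..)` actually runs: it looks for the emptiest histogram bin inside a search range that you supply. The function name `min_density_threshold`, its parameters and the toy fluorescence values are all hypothetical.

```python
def min_density_threshold(values, gate_range, n_bins=6):
    """Return the centre of the least populated histogram bin in gate_range."""
    lo, hi = gate_range
    if hi <= lo or n_bins < 1:
        raise ValueError("need hi > lo and at least one bin")
    width = (hi - lo) / n_bins

    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that a value equal to `hi` lands in the last bin
            i = min(int((v - lo) / width), n_bins - 1)
            counts[i] += 1

    emptiest = counts.index(min(counts))  # first bin with the lowest count
    return lo + (emptiest + 0.5) * width


# Hypothetical bimodal sample: one population near 1, another near 5
representative = [0.9, 1.0, 1.05, 1.1, 1.2, 2.5, 4.9, 4.95, 5.0, 5.1, 5.2]
threshold = min_density_threshold(representative, gate_range=(0, 6))

# "Broadcast" the threshold from the representative sample to other samples
other_samples = {"well_A": [1.0, 1.1, 5.0], "well_B": [4.8, 5.1, 5.3]}
positive_counts = {name: sum(v > threshold for v in vals)
                   for name, vals in other_samples.items()}
print(threshold, positive_counts)
```

As with the real gate, this only makes sense when the representative sample is clearly bimodal; with a single population the "emptiest bin" is just a tail.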
- Setup git on your computer if you haven't already - git helper
- Please clone this R-python hybrid code onto your computer with the command `git clone https://github.com/ppreshant/flow_cytometry.git`, or the `ssh` version `git clone git@github.com:ppreshant/flow_cytometry.git` (which is more secure and takes a couple of extra minutes to set up, but I would recommend it - here's some help).
  - The same folder will hold your flow cytometry data and the outputs, so it can get large. Choose the folder location accordingly.
- For the first time, run these steps in R to install all the required packages
  - `install.packages('tidyverse')`; and do the same for
    - reticulate
    - BiocManager
  - Use BiocManager to install the bioconductor packages: `BiocManager::install("flowCore")`; and others
    - ggcyto
    - openCyto
- Use conda to set up the python requirements: mostly need the standard `pandas`, `matplotlib`, `numpy` etc.
  - Install miniconda: a minimal version of the package and environment manager `conda`. Use the instructions from the documentation page
  - Use the command `conda env create -f flowcal_wrappers_environment.yaml`. This will create an environment with the name `flowcal` and install all the python dependencies listed in the file into your conda environment
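Once the environment is created and activated, a quick way to confirm that the python side is ready is to check that the packages can be found. This is a small sketch, not part of the repository; the helper name `missing_modules` is made up, and the list below only contains the packages named above (the authoritative list is in `flowcal_wrappers_environment.yaml`).

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be found."""
    return [name for name in module_names if find_spec(name) is None]


# Assumed subset of the requirements; extend it to match the .yaml file
required = ["pandas", "matplotlib", "numpy"]

missing = missing_modules(required)
print("missing:", missing if missing else "none")
```

If anything is reported as missing, the most likely cause is that the `flowcal` environment is not the active one in that terminal.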
- Put your data into the `flowcyt_data` directory.
- Update the files for user_inputs for both python and R:
  - `./0.5-user_inputs.R`: for R steps
    - base_directory <- 'flowcyt_data' or 'processed_data'
    - folder_name <- '..' : the folder your individual `.fcs` files are in within the base_directory
    - file.name_input <- '..' : Use this option if you have a single `.fcs` file holding multiple data (such as from Guava machines). _After unpacking these data you will use the same name for the `folder_name` option_
    - template_source <- 'googlesheet' # use 'googlesheet' or 'excel' options depending on where you are providing the plate layout to name the wells.
  - `scripts_general_fns/g10_user_config.py`: for python steps
    - fcs_experiment_folder = '..' : the folder your individual `.fcs` files are in within the base_directory
    - density_gating_fraction = .5 ; might need to adjust
- Put sample names into the excel file `flowcyt_data/plate_layoyts.xlsx` or a google sheet. Each well with a sample will have the format `plasmid1_positive`. The value after the '_' is the `sample_category`, used to colour plots; the value before it is the `assay_variable`, which will be on the x/y-axis of the plots. The `excel` option is easier, but if you would prefer to use the `googlesheet` for naming the samples, then duplicate the `Flow cytometry layouts` tab from this sheet into your own googlesheet, and put its url in the `0-general_functions_fcs.R`/`sheeturls` for the `plate_layouts_pk` option.
- If you have a single `.fcs` file with multiple data and you want to run the FlowCal workflow, run the `# prelims` and `# load data` sections in the code `analyze_fcs.R`. This will unpack each individual well into a separate `.fcs` file in a folder. For subsequent steps, change the `folder_name` option to the name of the new folder and change `file.name_input` to be empty `''`. Now you can go ahead with the python module and come back to the R module.
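To make the sample naming convention concrete, here is a short python sketch of how a name like `plasmid1_positive` splits into its two parts, and how replicate wells sharing a name end up grouped. The R scripts do this for you; the helper names, the choice to split at the first `_`, and the toy layout below are assumptions for illustration only.

```python
from collections import defaultdict


def parse_sample_name(name):
    """Split 'assay_variable_sample_category' at the first underscore (assumed)."""
    assay_variable, separator, sample_category = name.partition("_")
    if not separator:
        raise ValueError(f"expected an '_' in sample name: {name!r}")
    return assay_variable, sample_category


def group_replicates(layout):
    """Map each sample name to the list of wells that carry it."""
    groups = defaultdict(list)
    for well, name in layout.items():
        groups[name].append(well)
    return dict(groups)


# Hypothetical slice of a 96-well layout
layout = {"A1": "plasmid1_positive",
          "A2": "plasmid1_positive",
          "B1": "plasmid2_negative"}

print(parse_sample_name("plasmid1_positive"))
print(group_replicates(layout))
```

Here wells `A1` and `A2` share a name, so they would be treated as replicates of the same sample.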
- Open a suitable terminal that works for `conda` and activate the `flowcal` environment that you created above with `conda activate flowcal`
- Launch your favorite IDE to access python. `jupyter-lab` should be installed in this environment, so type its name in the same terminal and a browser window will open
- Follow the instructions in [[#Data, and config]] above and add your directory name etc. to the config file `scripts_general_fns/g10_user_config.py`
- Open the jupyter notebook `flowcal_pipeline_report.ipnb` and execute the two cells, and your data should be ready in about 3 min! .. to be elaborated
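Before executing the notebook, it can help to confirm that your experiment folder actually contains `.fcs` files and a beads file. The sketch below is a hypothetical sanity check, not how the wrapper itself finds the beads: the function name `split_beads_and_samples` and the assumption that the beads file has "bead" in its name are both mine.

```python
def split_beads_and_samples(filenames, beads_pattern="bead"):
    """Separate .fcs filenames into (beads files, sample files).

    A file counts as beads if its name contains `beads_pattern`
    (case-insensitive); this naming rule is an assumption.
    """
    fcs_files = sorted(f for f in filenames if f.lower().endswith(".fcs"))
    beads = [f for f in fcs_files if beads_pattern.lower() in f.lower()]
    samples = [f for f in fcs_files if f not in beads]
    return beads, samples


# Hypothetical folder contents; in practice pass os.listdir(<your folder>)
names = ["A01.fcs", "A02.fcs", "beads_B12.fcs", "notes.txt"]
beads, samples = split_beads_and_samples(names)
print("beads:", beads)
print("samples:", samples)
```

An empty `beads` list would be a hint to check the folder, or the naming of the beads file, before running the calibration.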
- Ensure that the data is in the folder and the config file specific to `R`: `./0.5-user_inputs.R` is updated
- Run `source('./analyze_fcs.R')` to load the data into R
- Run `7-exploratory_data_view.R` to save an overview of all the data.
- Run `11-manual_gating_workflow.R` for gating and saving counts of populations above the gated thresholds
Do contact me if you have any questions about running this by creating an issue here
wrappers for automated processing and plotting of bacterial flow cytometry data
Copyright (C) 2023 Prashant Kalvapalle
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.