Skip to content

ImmunoScore/ImmuneNetScore

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 

Repository files navigation

ImmuneNetScore

A computational framework to profile networks of distinct immune-cells from tumor transcriptomes

ImmuneNetScore VERSION 1.0 User Perl Documentation

NAME

   ImmunoNetScore.pl  

SYNOPSIS

   Reads in an expression profile for a quantile normalized datasets in the NCBI
   GEO data base format and calculates an immune profile for both general
   immune-cell types (GITs) and distinct immune-cell types (DISTs) based on
   an integrated framework of knowledge based information for human genes 

USAGE

   $perl ImmunNetScore.pl [CASE_NAME] [CONFIG_FILE] [GEO_SERIES_MATRIX_FILE] 

DESCRIPTION

   The script analyzes a transcriptome of a complex biological sample, such as a tumor,
   and calculates immune profiles for both general immune-cell types (GITs) and distinct
   immune-cell types (DISTs) based on an integrated framework of knowledge based information
   for human genes. More detailed background information is accessible in the manuscript:
   A computational framework to profile networks of distinct immune-cells from tumor transcriptomes,
   Clancy and Hovig, 2015. 
   
   The script accesses key sourced data from './data/' directories in the repository neccessary to calculate 
   immune-cell profiles and networks of immune-cell  whose signatures expression profiles have been inferred from
   transcriptome analysis. The following are the source data in subdirectories  './data/': 
      - saturation: GIT profiles with gene saturation scores computed from all of Medline
      - aplcusters: The resltant affinity propagation clusters computed from DIST specific
                    protein-protien interaction networks
      - platform: Expression array plaform data from the GEO_series_matrix_file
   
   The script then creates a results directory with the following .TXT files detailing the GIT
   and DIST immmune profiles for the transcritpome being analyzed in './CASE_NAME':
      - general_res_CASE_NAME_subtype_scores.txt - calculated GIT scores for all samples/patients
      - specific_res_CASE_NAME_subtype_scores.txt - calculated DIST scores for all samples/patients
   
   The script creates a directory for patient/sample specific immune-cell networks in
   the './CASE_NAME/networks' directory containing the following file types:
      - GSMXXXX_immunecell_net.txt - network files
      - GSMXXXX_immunecell_associated_DIST_edges.csv - underlying DIST connections for the network

INPUT

   -CASE_NAME
      Assigned name to the study/case. A '../CASE_NAME/' directory is created, score files and
      '../CASE_NAME/results' sub-directories
      
   - CONFIG_FILE [OPTIONS]
      Each option in the configuration files has a prefix '#' followed by the option and its variabe: '#OPTION: VAR'
      - #begin
      - #case: CASE_NAME
      - #accession: GSEnnnn
      - #platform: GPLmmmm
      - #gene_platform:  Row number from GPLmmmm indicating start of ID row in the platform
      - #begin_data_platform: Row number from GPLmmmm indicating start of data row in the platform
      - #patient_id_row: Row number from GSEnnnn indicating start of patient row in the acession
      - #sample_id_row:  Row number from GSEnnnn indicating start of sample ID row in the acession
      - #saturation: value between 0.0 and 1.0. The limit of literature saturation of genes to immune-cells (default=0.9)
      - #logratio: :value between 0.0 and 1.0. The log ratio which defines network connection between two immune-cells (default=0.5)
      - ##end config
   -GEO_SERIES_MATRIX_FILE
      This format is available through the NCBI Gene Expression Omnibus (GEO) website.
      The files are typically named GSEnnnn-GPLmmm_series_matrix.txt (where nnnn is the series number and mmm is the platform number.)
      These files only include samples for a single platform.
      Files downloaded from GEO are usually compressed, with .gz extensions. They need to be decompressed (e.g. using gunzip) before use. 

OUTPUT

   - General Immune-cell Type (GIT) scores in the filename:  general_res_CASE_NAME_subtype_scores.txt
   (calculated GIT scores for all samples/patients)
      Location
         '.data/CASE_NAME/results'
      Format
         Column 1 : GEO accession numbers GSMnnnnnnn for patients/samples
         Subsequent columns are the GIT scores 
      
   - Distinct Immune-cell SubType (DIST) scores in the filename:  specific_res_CASE_NAME_subtype_scores.txt
   (calculated GIT scores for all samples/patients)
      Location
         '.data/CASE_NAME/results'
      Format
         Column 1 : GEO accession numbers GSMnnnnnnn for patients/samples
         Subsequent columns are the DIST scores
         
   - Immune cell network files in the filenmae generated for each patient/sample: GSMXXXX_immunecell_net.txt 
       Location
         '.data/CASE_NAME/results/networks'
             Format
         Column 1 : GIT node 1
         Column 2 : GIT node 2
         Column 3 : Edge directionality
            pd = directed
            pp = undirected
         Column 4 : Log ratio score
            > #logratio in CONFIG_FILE 

AUTHOR

    Trevor Clancy - trevor.clancy@rr-research.no

perl v5.8.8 2015-12-12 ImmuneNetScore VERSION 1.0

About

A computational framework to profile networks of distinct immune-cells from tumor transcriptomes

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages