
Cancer - Capstone Project

The document describes an objective to develop a comprehensive cancer disease data model providing factors involved in cancer. It outlines requirements for the data model including various data types and ability to handle ongoing changes. It also describes leveraging graph-oriented data stores, indexed document stores, and ontology-based concept definition. The data model classifies clinical variables and allows filtering cases. It provides visualizations of mutation frequencies and maps mutations to protein-coding regions.

Uploaded by

Sellam V


Objectives

Cancer is a complex multifactorial disease that affects up to 40% of people across the world.
However, many mechanisms of cancer remain unclear due to the lack of studies based on
systematic knowledge, leading to ineffective treatment and/or transmission of genetic
defects. Here, we developed a cancer disease data model to provide a comprehensive
resource covering the various factors involved.

The data model is designed to maintain data and metadata consistency, integrity, and
availability while accommodating:

• Biospecimen, clinical, and cancer genomic data and metadata
• Multiple, disparate NCI ongoing projects
• Completely new, as yet unthought of projects
• Ongoing changes and technological progress
• Frequent and complex queries from both external users and internal administrators

To meet these requirements, the design and implementation of the data model leverages:

• Flexible but robust graph-oriented data stores
• Indexed document stores for API and front end performance
• Ontology-based concept and data element definition

Data Processing steps:

• Schema-based entity and relationship validation on loading
• Properties are key-value pairs associated with an entity. Properties cannot be nested,
  which means that the value must be numerical, boolean, or a string, and cannot be
  another key-value set. Properties can be either required or optional. The following
  properties are of particular importance in constructing the cancer data model:
    o Type is a required property for all entities. Entity types
      include project, case, demographic, sample, read_group, and others.
    o System properties are properties used in system operation and
      maintenance. They cannot be modified except under special circumstances.
    o Unique keys are properties, or combinations of properties, that can be used to
      uniquely identify the entity in the database. For example, the tuple
      (combination) of [ project_id, submitter_id ] is a unique key for most entities.
• Links define relationships between entities, and the multiplicity of those relationships
  (e.g. one-to-one, one-to-many, many-to-many).
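
The property rules above can be illustrated with a minimal validation sketch. The function name, the error messages, and the choice to treat only `type` as required are assumptions made for illustration; the actual schema is not given in this document.

```python
# Hypothetical sketch: rule names follow the text above, but the function,
# messages, and constants are illustrative assumptions, not a real schema.
SCALAR_TYPES = (str, int, float, bool)
REQUIRED = {"type"}  # the text states that type is required for all entities
UNIQUE_KEY = ("project_id", "submitter_id")


def validate_entity(entity, seen_keys):
    """Return a list of validation errors for one entity (empty if valid)."""
    errors = []
    for name in sorted(REQUIRED - set(entity)):
        errors.append(f"missing required property: {name}")
    for name, value in entity.items():
        # Properties cannot be nested: reject dicts, lists, and other containers.
        if not isinstance(value, SCALAR_TYPES):
            errors.append(f"property {name!r} must be a number, boolean, or string")
    key = tuple(entity.get(part) for part in UNIQUE_KEY)
    if None not in key:
        if key in seen_keys:
            errors.append(f"duplicate unique key: {key}")
        else:
            seen_keys.add(key)
    return errors
```

Passing the same `seen_keys` set across calls lets the check detect a repeated [ project_id, submitter_id ] pair while a batch of entities is loaded.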
Directed Acyclic Graph – Interconnected entities
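
Because entities and their links are described as a directed acyclic graph, a loader could reject link definitions that introduce a cycle. The sketch below uses the standard-library `graphlib` module (Python 3.9+); the link directions shown between entity types are assumptions chosen for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical links: each entity type maps to the entity types it links to.
# The entity names come from the text; the directions are assumed.
links = {
    "case": {"project"},
    "demographic": {"case"},
    "sample": {"case"},
    "read_group": {"sample"},
}


def is_acyclic(graph):
    """Return True if the link graph contains no cycles."""
    try:
        # static_order() raises CycleError when the graph has a cycle.
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False
```
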
Users can filter by specific clinical variables, grouped into these categories:

• Demographic: Data for the characterization of the patient by means of segmenting the
  population (e.g. characterization by age, sex, race, etc.).
• Diagnoses: Data from the investigation, analysis, and recognition of the presence and
  nature of a disease, condition, or injury from expressed signs and symptoms; also, any
  scientific determination and the concise results of such an investigation.
• Treatments: Records of the administration and intention of therapeutic agents
  provided to a patient to alter the course of a pathologic process.
• Exposures: Clinically relevant patient information not immediately resulting from
  genetic predispositions.
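
As a rough illustration of how filters on these categories select a cohort, the sketch below keeps only the cases that match every chosen criterion. The case records, field names, and the `category__field` keyword convention are all hypothetical.

```python
# Hypothetical case records; field names and values are illustrative only.
cases = [
    {"case_id": "C1", "demographic": {"gender": "female"},
     "diagnoses": {"primary_site": "Breast"}, "exposures": {"smoker": False}},
    {"case_id": "C2", "demographic": {"gender": "male"},
     "diagnoses": {"primary_site": "Lung"}, "exposures": {"smoker": True}},
    {"case_id": "C3", "demographic": {"gender": "female"},
     "diagnoses": {"primary_site": "Lung"}, "exposures": {"smoker": True}},
]


def filter_cases(cases, **criteria):
    """Keep cases matching every criterion, given as category__field=value."""
    selected = []
    for case in cases:
        for key, wanted in criteria.items():
            category, field = key.split("__", 1)
            if case.get(category, {}).get(field) != wanted:
                break  # one failed criterion excludes the case
        else:
            selected.append(case)
    return selected


cohort = filter_cases(cases, demographic__gender="female", diagnoses__primary_site="Lung")
```
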

The Cases tab gives an overview of all the cases/patients who correspond to the filters chosen
(Cohort).

The top of this section contains a few pie charts with categorical information regarding the
Primary Site, Project, Disease Type, Gender, and Vital Status of the selected cases.

Below these pie charts is a tabular view of cases, which can be exported, sorted and saved
using the buttons on the right and includes the following information:

• Case ID (Submitter ID): The Case ID / submitter ID of that case/patient (e.g. a TCGA
  barcode).
• Project: The study name of the project to which the case belongs.
• Primary Site: The primary site of the cancer/project.
• Gender: The gender of the case.
• Files: The total number of files available for that case.
• Available Files per Data Category: Seven columns displaying the number of files
  available in each of the seven data categories. These link to the files for the specific
  case.
• # Mutations: The number of SSMs (simple somatic mutations) detected in that case.
• # Genes: The number of genes affected by mutations in that case.
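
The last two columns can be derived from a flat list of mutation calls. The following is a minimal sketch that assumes each SSM is available as a (case ID, gene, mutation ID) tuple; that record layout is an assumption, not something this document specifies.

```python
from collections import defaultdict

# Hypothetical SSM records as (case_id, gene, mutation_id) tuples.
ssms = [
    ("C1", "TP53", "m1"),
    ("C1", "TP53", "m2"),
    ("C1", "KRAS", "m3"),
    ("C2", "TP53", "m1"),
]


def summarize(ssms):
    """Count distinct mutations and distinct affected genes per case."""
    mutations = defaultdict(set)
    genes = defaultdict(set)
    for case_id, gene, mutation_id in ssms:
        mutations[case_id].add(mutation_id)
        genes[case_id].add(gene)
    return {
        case_id: {"mutations": len(mutations[case_id]), "genes": len(genes[case_id])}
        for case_id in mutations
    }


summary = summarize(ssms)
```
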

Note: By default, the UUID is not displayed on summary page tables. You can display the
UUID by clicking on the icon with 3 parallel lines and checking the UUID option.

Case Summary Page

The Case Summary Page displays case details including the project and disease information,
data files that are available for that case, and the experimental strategies employed.

CLINICAL AND BIOSPECIMEN INFORMATION

The page also provides clinical and biospecimen information about that case. Links to export
clinical and biospecimen information in JSON format are provided.

Some clinical record types can hold multiple records for the same case (Diagnoses, Family
Histories, Exposures, Follow-Ups, Molecular Tests). If only one record exists, the UUID of
the record is provided at the top of the corresponding tab.

If there are multiple records, they are listed as horizontal tabs.

Some record types are nested under another. For example, a Diagnosis record may
have multiple associated Treatment records, and a Follow-Up record may have multiple
associated Molecular Test records. The associated sub-records are listed in a table on the tab.
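
The nesting of sub-records and the JSON export could be modeled along these lines. This is a sketch only: the class and field names are hypothetical and do not reflect the actual export format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Treatment:
    treatment_id: str
    therapeutic_agent: str


@dataclass
class Diagnosis:
    diagnosis_id: str
    primary_diagnosis: str
    # A Diagnosis record may have multiple associated Treatment records.
    treatments: list = field(default_factory=list)


@dataclass
class Case:
    case_id: str
    diagnoses: list = field(default_factory=list)


# Hypothetical identifiers and values, for illustration only.
case = Case("C1", [
    Diagnosis("D1", "example diagnosis", [
        Treatment("T1", "agent A"),
        Treatment("T2", "agent B"),
    ]),
])

# asdict() converts nested dataclasses recursively, so the export keeps the nesting.
exported = json.dumps(asdict(case), indent=2)
```
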



Data Visualization
A table and two bar graphs show how many cases are affected by mutations and copy number
variation within the gene as a ratio and percentage. Each row/bar represents the number of
cases for each project. The final column in the table lists the number of unique mutations
observed on the gene for each project.
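
The ratio and percentage shown for each project follow directly from two counts. A minimal sketch, assuming hypothetical per-project numbers of affected and total cases:

```python
# Hypothetical per-project counts: (affected cases, total cases).
projects = {"PROJECT-A": (30, 120), "PROJECT-B": (5, 50)}


def affected_summary(projects):
    """Build one table row per project with the ratio and percentage of affected cases."""
    rows = []
    for name, (affected, total) in projects.items():
        # Guard against projects with no cases to avoid dividing by zero.
        percent = 100 * affected / total if total else 0.0
        rows.append({
            "project": name,
            "ratio": f"{affected} / {total}",
            "percent": round(percent, 2),
        })
    return rows


rows = affected_summary(projects)
```
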

PROTEIN VIEWER

Mutations and their frequency across cases are mapped to a graphical visualization of
protein-coding regions with a lollipop plot. Pfam domains are highlighted along the x-axis to
assign functionality to specific protein-coding regions. The bottom track represents a view of
the full gene length. Different transcripts can be selected by using the drop-down menu above
the plot.
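
The data behind a lollipop plot is essentially a count of cases per amino-acid position, annotated with the domain that contains each position. The sketch below prepares those points without drawing them; the positions, domain names, and coordinates are hypothetical, and real Pfam annotations would come from an external source.

```python
from collections import Counter

# Hypothetical inputs: amino-acid positions of mutations observed across cases,
# and protein domains as (name, start, end) with inclusive coordinates.
positions = [12, 12, 12, 61, 146, 61]
domains = [("Domain A", 1, 100), ("Domain B", 120, 180)]


def lollipop_points(positions, domains):
    """Return (position, frequency, domain) tuples sorted by position."""
    points = []
    for position, count in sorted(Counter(positions).items()):
        # Find the first domain whose range contains this position, if any.
        domain = next(
            (name for name, start, end in domains if start <= position <= end),
            None,
        )
        points.append((position, count, domain))
    return points


points = lollipop_points(positions, domains)
```

Each tuple corresponds to one lollipop: the position sets the x coordinate, the frequency sets the stem height, and the domain determines which highlighted region it falls in.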
