R Project

The CDC has contracted experts to analyze molecular screening data from two countries experiencing a novel disease outbreak to determine where the outbreak began and if a vaccine developed in one country would work in the other. The screening data includes information on 10 genetic markers from infected patients. Experts are asked to compile the screening data, analyze it to answer the two questions, and provide R scripts for ongoing analysis of future outbreak data.

Uploaded by

Neeck

Introduction to Biocomputing - R Project

Complete the tasks below and submit your results via a pull request on GitHub by 5 pm
Friday, December 3rd.

You have been contracted by the U.S. Centers for Disease Control and Prevention (CDC) to evaluate the dynamics of an
emerging disease outbreak and a potential response strategy. Because this outbreak is in its
early stages, details about the locations and type of disease are limited. The scenario is that we have
detected, using molecular biology screening, a novel disease-causing agent in two countries. Both countries have made
extensive screening data available, and we are interested in answers to two questions:

1. In which country (X or Y) did the disease outbreak likely begin?

2. If Country Y develops a vaccine for the disease, is it likely to work for citizens of Country X?

The molecular screening used is called “microsatellite analysis”. In this specific screen, the presence
or absence of ten markers specific to an immunologically active protein from the disease-causing agent gives
us information about the presence or absence of the disease in a patient and about the similarity or difference among
strains of the disease in a group of patients.
• If one or more of the markers are present in a patient’s sample, the patient was infected.
• The disease is caused by a bacterium that is transmitted through the air. Given the short generation
time of the bacterium and the rapid spread of the disease, we suspect the disease-causing
bacterium is evolving along its transmission path.
• The gene targeted by the microsatellite screen encodes the main protein shown to elicit an immunological
response in patients, so differences in which markers are present would indicate differences in
the protein and potentially different responses by a patient’s immune system.
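
The infection rule and the idea of comparing marker patterns between countries can be sketched in a few lines of R. This is only an illustration on invented rows: the column names marker01 to marker10, the country column, and the object names are assumptions, not part of the provided data.

```r
# Invented example rows: three patients, ten marker columns (1 = present, 0 = absent).
# The names marker01..marker10 are assumed for illustration.
markerCols <- sprintf("marker%02d", 1:10)
screens <- data.frame(
  country = c("X", "X", "Y"),
  gender  = c("female", "male", "female"),
  age     = c(34, 51, 8)
)
screens[markerCols] <- 0
screens[1, c("marker01", "marker02")] <- 1   # patient 1 carries markers 1 and 2
screens[3, "marker07"] <- 1                  # patient 3 carries marker 7

# A patient counts as infected when at least one marker is present.
screens$infected <- unname(rowSums(screens[, markerCols]) > 0)

# Among infected patients, the share carrying each marker in each country.
# Very different profiles would suggest different strains.
infectedOnly <- screens[screens$infected, c("country", markerCols)]
profileByCountry <- aggregate(. ~ country, data = infectedOnly, FUN = mean)
print(profileByCountry)
```

With real data, each row of profileByCountry gives one country's marker frequencies, which could be compared side by side or plotted.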

Details about data provided

Each country is screening a large number of patients with symptoms daily. Data from each country are
provided in text files with names of the format “screening_NNN.txt”, where NNN indicates the day of year
on which the screens in that file were conducted. Each file contains twelve columns: gender, age, and ten
columns for the ten microsatellite markers (1 means the marker was present for that patient, 0 means the
marker was absent).
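
Because the day of year is stored only in the file name, it has to be recovered from the name when files are combined. One possible way to do this is shown below; the helper name dayFromFilename is hypothetical, and the sketch assumes the names follow the “screening_NNN” pattern exactly.

```r
# Hypothetical helper: pull the day of year out of a name like "screening_120.txt".
dayFromFilename <- function(filename) {
  # basename() drops any directory part; sub() keeps only the captured digits.
  as.integer(sub("^screening_([0-9]+)\\.(txt|csv)$", "\\1", basename(filename)))
}

print(dayFromFilename("countryX/screening_120.txt"))
print(dayFromFilename(c("screening_007.csv", "screening_175.txt")))
```

A name that does not match the pattern yields NA with a warning, which makes stray files easy to spot.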

Although your primary goal is to answer the two questions above, we expect additional data of the form we
currently have access to from Country X and Y in subsequent months. Other countries may also
see the disease spread. We therefore want you to provide answers and supporting information
for the two questions above, and also code that the CDC could use for future analyses.
Specific requirements of your code are listed below.

Code requirements

The CDC requests two scripts written in the R programming language. The first script (supportingFunctions.R)
will contain a number of custom functions created to accomplish various data handling or summary
tasks. The second script (analysis.R) will use the source() function to load the functions defined in
supportingFunctions.R, compile all data into a single comma-separated value (.csv) file, process the
entire data set in order to answer the two questions above, and provide graphical evidence for your
answers. Use comments in analysis.R to explain your rationale and how the graphical evidence supports your
answers to the two questions.
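
As one possible form of graphical evidence for the first question, analysis.R could track cumulative infections per country over time. The sketch below uses invented rows; the infected column and the function name plotCumulative are assumptions, and this is only one of many reasonable approaches.

```r
# Invented compiled data: one row per screened patient.
# The `infected` column is assumed to have been derived from the marker columns.
compiled <- data.frame(
  country   = c("X", "X", "X", "Y", "Y", "X", "Y"),
  dayofYear = c(120, 120, 121, 121, 122, 122, 122),
  infected  = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

# Count infected patients for every country/day combination present in the data.
daily <- aggregate(infected ~ country + dayofYear, data = compiled, FUN = sum)
daily <- daily[order(daily$country, daily$dayofYear), ]

# Running total of infections within each country.
daily$cumulative <- ave(daily$infected, daily$country, FUN = cumsum)

# First day on which each country recorded an infected patient.
hasCase <- daily$infected > 0
firstDay <- tapply(daily$dayofYear[hasCase], daily$country[hasCase], min)
print(firstDay)

# Hypothetical plotting helper using base graphics: one line per country.
plotCumulative <- function(daily) {
  countries <- unique(daily$country)
  plot(NULL, xlim = range(daily$dayofYear), ylim = c(0, max(daily$cumulative)),
       xlab = "Day of year", ylab = "Cumulative infections")
  for (i in seq_along(countries)) {
    rows <- daily$country == countries[i]
    lines(daily$dayofYear[rows], daily$cumulative[rows], col = i)
  }
  legend("topleft", legend = countries, col = seq_along(countries), lty = 1)
}
```

Calling plotCumulative(daily) draws the curves; comments in analysis.R would then explain what the plot suggests about where the outbreak began.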

To facilitate analysis of the provided data, as well as future data, the CDC requests the following functions
(and any others you feel are necessary) be provided in supportingFunctions.R:
• Country X and Y have different traditions for the delimiter in their data files. Write a function that
converts all files in a directory with space- or tab-delimited data (.txt) into comma-separated value
files.
• Write a function to compile data from all .csv files in a directory into a single .csv file. The compiled
data should have the original twelve columns from daily data sheets, but also country and dayofYear
columns. The user should be able to choose whether they want to remove rows with NAs in any
columns, include NAs in the compiled data but be warned of their presence, or include NAs in the
compiled data without a warning.
• Write a function to summarize the compiled data set in terms of number of screens run, percent of
patients screened that were infected, male vs. female patients, and the age distribution of patients.
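
A minimal sketch of how these three functions might look is given below. All function and argument names (convertToCsv, compileData, summarizeData, naAction) are hypothetical, and the sketch assumes every file has a header row and that both countries use the same column names. The demo files are invented and use only two marker columns to stay short.

```r
# Hypothetical helper 1: rewrite every space- or tab-delimited .txt file in `dir` as .csv.
convertToCsv <- function(dir) {
  txtFiles <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  for (f in txtFiles) {
    # sep = "" splits on any run of spaces or tabs.
    dat <- read.table(f, header = TRUE, sep = "", stringsAsFactors = FALSE)
    write.csv(dat, sub("\\.txt$", ".csv", f), row.names = FALSE)
  }
  invisible(sub("\\.txt$", ".csv", txtFiles))
}

# Hypothetical helper 2: stack all screening_NNN.csv files; `dirs` is a named
# character vector whose names are used as the country labels.
compileData <- function(dirs, outFile, naAction = c("remove", "warn", "keep")) {
  naAction <- match.arg(naAction)
  pieces <- list()
  for (country in names(dirs)) {
    files <- list.files(dirs[[country]], pattern = "^screening_[0-9]+\\.csv$",
                        full.names = TRUE)
    for (f in files) {
      dat <- read.csv(f, stringsAsFactors = FALSE)
      dat$country <- country
      dat$dayofYear <- as.integer(gsub("[^0-9]", "", basename(f)))
      pieces[[length(pieces) + 1]] <- dat
    }
  }
  if (length(pieces) == 0) stop("no screening_NNN.csv files found")
  compiled <- do.call(rbind, pieces)
  hasNA <- !complete.cases(compiled)
  if (naAction == "remove") {
    compiled <- compiled[!hasNA, ]
  } else if (naAction == "warn" && any(hasNA)) {
    warning(sum(hasNA), " row(s) in the compiled data contain NA values")
  }
  write.csv(compiled, outFile, row.names = FALSE)
  invisible(compiled)
}

# Hypothetical helper 3: basic summary of the compiled data.
summarizeData <- function(compiled, markerCols) {
  infected <- rowSums(compiled[, markerCols]) > 0
  list(
    screens         = nrow(compiled),
    percentInfected = 100 * mean(infected, na.rm = TRUE),
    genderCounts    = table(compiled$gender),
    ageSummary      = summary(compiled$age)
  )
}

# Demo on invented files in a temporary directory.
root <- tempfile("screens")
dirX <- file.path(root, "countryX")
dirY <- file.path(root, "countryY")
dir.create(dirX, recursive = TRUE)
dir.create(dirY, recursive = TRUE)
writeLines(c("gender age marker01 marker02", "female 30 1 0", "male 45 0 0"),
           file.path(dirX, "screening_120.txt"))           # space-delimited
writeLines(c("gender\tage\tmarker01\tmarker02", "male\t52\t0\t1", "female\tNA\t0\t0"),
           file.path(dirY, "screening_121.txt"))           # tab-delimited

convertToCsv(dirX)
convertToCsv(dirY)
compiled <- compileData(c(X = dirX, Y = dirY), file.path(root, "allData.csv"),
                        naAction = "remove")
stats <- summarizeData(compiled, c("marker01", "marker02"))
print(stats$screens)
```

Passing naAction = "warn" or "keep" instead of "remove" covers the other two NA-handling options; match.arg() rejects any other value.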

You can work in groups of up to 3 students. You will only need to turn in one set of scripts for the whole
group via pull request on GitHub, but all group members must contribute to the final product.
To begin your work, fork the R Project repo from Stuart’s GitHub. Clone the forked repo so that you have
the required files. Be sure to commit regularly to show how you and your group members contributed to your
solutions.

Turning in your assignment via GitHub

Once you have committed all changes to your local Git repo and pushed all of those commits to the forked
repo on GitHub, you can “turn in” your assignment using a pull request. This can be done from the
GitHub repo website. When viewing the forked repo, select “Pull requests” in the upper middle of the screen,
then click the green “New pull request” button in the upper right. You’ll then see a screen with a history of
commits for you and your collaborators; select the green “Create pull request” button. In the text box next
to your user icon near the top of the page, remove whatever text is there and add “last name submission”,
substituting your own last names. Then click the green “Create pull request” button.
