Assignment 2.2

The document details the installation and loading of the 'tidyverse' package in R, including the successful unpacking of its dependencies. It also describes data manipulation steps using 'table2' to extract tuberculosis (TB) cases and population data, calculate rates, and create a plot showing changes in TB cases over time. Additionally, it highlights an error encountered when trying to select a non-existent column and provides examples of data frames used in the analysis.

Uploaded by Mohamed Romance

Assignment: 2.

> install.packages("tidyverse", dependencies = TRUE)

package ‘lazyeval’ successfully unpacked and MD5 sums checked
package ‘pkgbuild’ successfully unpacked and MD5 sums checked
package ‘rprojroot’ successfully unpacked and MD5 sums checked
package ‘diffobj’ successfully unpacked and MD5 sums checked
package ‘rex’ successfully unpacked and MD5 sums checked
package ‘Rcpp’ successfully unpacked and MD5 sums checked
package ‘brio’ successfully unpacked and MD5 sums checked
package ‘desc’ successfully unpacked and MD5 sums checked
package ‘pkgload’ successfully unpacked and MD5 sums checked
package ‘praise’ successfully unpacked and MD5 sums checked
package ‘waldo’ successfully unpacked and MD5 sums checked
package ‘covr’ successfully unpacked and MD5 sums checked
package ‘feather’ successfully unpacked and MD5 sums checked
package ‘mockr’ successfully unpacked and MD5 sums checked
package ‘testthat’ successfully unpacked and MD5 sums checked
package ‘tidyverse’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\Mohamed\AppData\Local\Temp\Rtmp46RGAG\downloaded_packages
>
> installed.packages()["tidyverse", ]

Package                "tidyverse"
LibPath                "C:/Users/Mohamed/AppData/Local/R/win-library/4.3"
Version                "2.0.0"
Priority               NA
Depends                "R (>= 3.3)"
Imports                "broom (>= 1.0.3), conflicted (>= 1.2.0), cli (>= 3.6.0),\ndbplyr (>= 2.3.0), dplyr (>= 1.1.0), dtplyr (>= 1.2.2), forcats\n(>= 1.0.0), ggplot2 (>= 3.4.1), googledrive (>= 2.0.0),\ngooglesheets4 (>= 1.0.1), haven (>= 2.5.1), hms (>= 1.1.2),\nhttr (>= 1.4.4), jsonlite (>= 1.8.4), lubridate (>= 1.9.2),\nmagrittr (>= 2.0.3), modelr (>= 0.1.10), pillar (>= 1.8.1),\npurrr (>= 1.0.1), ragg (>= 1.2.5), readr (>= 2.1.4), readxl (>=\n1.4.2), reprex (>= 2.0.2), rlang (>= 1.0.6), rstudioapi (>=\n0.14), rvest (>= 1.0.3), stringr (>= 1.5.0), tibble (>= 3.1.8),\ntidyr (>= 1.3.0), xml2 (>= 1.3.3)"
LinkingTo              NA
Suggests               "covr (>= 3.6.1), feather (>= 0.3.5), glue (>= 1.6.2), mockr\n(>= 0.2.0), knitr (>= 1.41), rmarkdown (>= 2.20), testthat (>=\n3.1.6)"
Enhances               NA
License                "MIT + file LICENSE"
License_is_FOSS        NA
License_restricts_use  NA
OS_type                NA
MD5sum                 NA
NeedsCompilation       "no"
Built                  "4.3.3"
>
> library(tidyverse)
── Attaching core tidyverse packages ─────────────────────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ───────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package to force all conflicts to become errors
Warning messages:
1: package ‘tidyverse’ was built under R version 4.3.3
2: package ‘ggplot2’ was built under R version 4.3.3
3: package ‘tibble’ was built under R version 4.3.3
4: package ‘tidyr’ was built under R version 4.3.3
5: package ‘readr’ was built under R version 4.3.3
6: package ‘purrr’ was built under R version 4.3.3
7: package ‘dplyr’ was built under R version 4.3.3
8: package ‘stringr’ was built under R version 4.3.3
9: package ‘forcats’ was built under R version 4.3.3
10: package ‘lubridate’ was built under R version 4.3.3
>
> detach("package:ggplot2", unload = TRUE)
> library(ggplot2)
Warning message:
package ‘ggplot2’ was built under R version 4.3.3
> library(dplyr)
> library(tidyr)
> # Assuming table2 has columns country, year, and cases
> tb_cases <- table2 %>%
+ filter(type == "cases") %>%
+ select(country, year, cases)
Error in `select()`:
! Can't select columns that don't exist.
✖ Column `cases` doesn't exist.
Run `rlang::last_trace()` to see where the error occurred.
> head(table2)
# A tibble: 6 × 4
country year type count
<chr> <dbl> <chr> <dbl>
1 Afghanistan 1999 cases 745
2 Afghanistan 1999 population 19987071
3 Afghanistan 2000 cases 2666
4 Afghanistan 2000 population 20595360
5 Brazil 1999 cases 37737
6 Brazil 1999 population 172006362
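
The select() error earlier in the session comes from the comment's assumption that table2 has a cases column. The head(table2) output shows the values are stored in count, with type marking each row as cases or population. A minimal sketch of one possible fix, assuming table2 is the example dataset shipped with tidyr (renaming count to cases is an illustrative choice, not something the original session does):

```r
library(dplyr)
library(tidyr)  # provides the example dataset table2

# Check which columns actually exist before selecting.
names(table2)

# Keep only the "cases" rows, then select the existing `count` column
# and rename it to `cases` (illustrative rename) in the same step.
tb_cases <- table2 %>%
  filter(type == "cases") %>%
  select(country, year, cases = count)

tb_cases
```

Calling names() or head() on a data frame first is a cheap way to confirm column names before writing a select().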
> head(table3)
# A tibble: 6 × 3
country year rate
<chr> <dbl> <chr>
1 Afghanistan 1999 745/19987071
2 Afghanistan 2000 2666/20595360
3 Brazil 1999 37737/172006362
4 Brazil 2000 80488/174504898
5 China 1999 212258/1272915272
6 China 2000 213766/1280428583
> head(table1)
# A tibble: 6 × 4
country year cases population
<chr> <dbl> <dbl> <dbl>
1 Afghanistan 1999 745 19987071
2 Afghanistan 2000 2666 20595360
3 Brazil 1999 37737 172006362
4 Brazil 2000 80488 174504898
5 China 1999 212258 1272915272
6 China 2000 213766 1280428583
> head(table4)
> head(table4a)
# A tibble: 3 × 3
country `1999` `2000`
<chr> <dbl> <dbl>
1 Afghanistan 745 2666
2 Brazil 37737 80488
3 China 212258 213766
> head(table4b)
# A tibble: 3 × 3
country `1999` `2000`
<chr> <dbl> <dbl>
1 Afghanistan 19987071 20595360
2 Brazil 172006362 174504898
3 China 1272915272 1280428583
>

a. Extract the number of TB cases per country per year from table2.
> cases <- table2 %>% filter(type == "cases")

b. Extract the matching population per country per year.
> population <- table2 %>% filter(type == "population")

c. Divide cases by population, and multiply by 10,000.
> rate <- cases %>%
+ left_join(population, by = c("country", "year"), suffix = c("_cases", "_population")) %>%
+ mutate(rate = (count_cases / count_population) * 10000)

d. Store back in the appropriate place.
> table2_rate <- rate %>%
+ select(country, year, rate)
>
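
Steps a to d can also be sketched with a reshape instead of a join. This is an alternative, not the approach used in the session above: pivot_wider() spreads type into separate cases and population columns, and pivot_longer() is one hypothetical reading of "store back", returning the rate to table2's long layout as a third type. The names table2_wide and table2_with_rate are illustrative.

```r
library(dplyr)
library(tidyr)  # provides table2, pivot_wider() and pivot_longer()

# Spread `type` into columns so cases and population sit side by side,
# then compute the rate per 10,000 people.
table2_wide <- table2 %>%
  pivot_wider(names_from = type, values_from = count) %>%
  mutate(rate = cases / population * 10000)

# Assumption: "store back" means returning to table2's long layout,
# with the rate as an extra `type` next to cases and population.
table2_with_rate <- table2_wide %>%
  pivot_longer(cols = c(cases, population, rate),
               names_to = "type", values_to = "count")

table2_with_rate
```

The wide intermediate has one row per country and year, which is the same shape as table1.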
Which representation is easiest to work with? Which is hardest? Why?

3. Re-create the plot showing change in cases over time using table2 instead of table1. What do you need to do first?
Steps
1. Filter the cases: extract the rows of table2 where type is "cases".
> cases_data <- table2 %>% filter(type == "cases")

2. Plot the data: use ggplot2 to draw one line per country showing the change in cases over time.
> library(ggplot2)
> ggplot(cases_data, aes(x = year, y = count, color = country)) +
+ geom_line() +
+ labs(title = "Change in TB Cases Over Time",
+ x = "Year", y = "Number of Cases")
>
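
Because year is stored as a double, ggplot2 may place axis breaks at fractional years. The following is a sketch of the same plot with the breaks pinned to the two observed years and a point at each observation. The grey lines and the point layer are styling assumptions, not part of the original answer.

```r
library(dplyr)
library(tidyr)    # provides table2
library(ggplot2)

# Keep only the rows holding case counts.
cases_data <- table2 %>% filter(type == "cases")

p <- ggplot(cases_data, aes(x = year, y = count)) +
  geom_line(aes(group = country), colour = "grey50") +  # one line per country
  geom_point(aes(colour = country)) +                   # mark each observation
  scale_x_continuous(breaks = c(1999, 2000)) +          # only the observed years
  labs(title = "Change in TB Cases Over Time",
       x = "Year", y = "Number of Cases")

p
```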
