Assignment: 2.
> install.packages("tidyverse", dependencies = TRUE)
package ‘lazyeval’ successfully unpacked and MD5 sums checked
package ‘pkgbuild’ successfully unpacked and MD5 sums checked
package ‘rprojroot’ successfully unpacked and MD5 sums checked
package ‘diffobj’ successfully unpacked and MD5 sums checked
package ‘rex’ successfully unpacked and MD5 sums checked
package ‘Rcpp’ successfully unpacked and MD5 sums checked
package ‘brio’ successfully unpacked and MD5 sums checked
package ‘desc’ successfully unpacked and MD5 sums checked
package ‘pkgload’ successfully unpacked and MD5 sums checked
package ‘praise’ successfully unpacked and MD5 sums checked
package ‘waldo’ successfully unpacked and MD5 sums checked
package ‘covr’ successfully unpacked and MD5 sums checked
package ‘feather’ successfully unpacked and MD5 sums checked
package ‘mockr’ successfully unpacked and MD5 sums checked
package ‘testthat’ successfully unpacked and MD5 sums checked
package ‘tidyverse’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
            C:\Users\Mohamed\AppData\Local\Temp\Rtmp46RGAG\downloaded_packages
>
> installed.packages()["tidyverse", ]
Package
"tidyverse"
LibPath
"C:/Users/Mohamed/AppData/Local/R/win-library/4.3"
Version
"2.0.0"
Priority
NA
Depends
"R (>= 3.3)"
Imports
"broom (>= 1.0.3), conflicted (>= 1.2.0), cli (>= 3.6.0),\ndbplyr (>= 2.3.0), dplyr (>= 1.1.0), dtplyr (>= 1.2.2), forcats\n(>= 1.0.0), ggplot2 (>= 3.4.1), googledrive (>= 2.0.0),\ngooglesheets4 (>= 1.0.1), haven (>= 2.5.1), hms (>= 1.1.2),\nhttr (>= 1.4.4), jsonlite (>= 1.8.4), lubridate (>= 1.9.2),\nmagrittr (>= 2.0.3), modelr (>= 0.1.10), pillar (>= 1.8.1),\npurrr (>= 1.0.1), ragg (>= 1.2.5), readr (>= 2.1.4), readxl (>=\n1.4.2), reprex (>= 2.0.2), rlang (>= 1.0.6), rstudioapi (>=\n0.14), rvest (>= 1.0.3), stringr (>= 1.5.0), tibble (>= 3.1.8),\ntidyr (>= 1.3.0), xml2 (>= 1.3.3)"
LinkingTo
NA
Suggests
"covr (>= 3.6.1), feather (>= 0.3.5), glue (>= 1.6.2), mockr\n(>= 0.2.0), knitr (>= 1.41), rmarkdown (>= 2.20), testthat (>=\n3.1.6)"
Enhances
NA
License
"MIT + file LICENSE"
License_is_FOSS
NA
License_restricts_use
NA
OS_type
NA
MD5sum
NA
NeedsCompilation
"no"
Built
"4.3.3"
>
> library(tidyverse)
── Attaching core tidyverse packages ─────────────────────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ───────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package to force all conflicts to become errors
Warning messages:
1: package ‘tidyverse’ was built under R version 4.3.3
2: package ‘ggplot2’ was built under R version 4.3.3
3: package ‘tibble’ was built under R version 4.3.3
4: package ‘tidyr’ was built under R version 4.3.3
5: package ‘readr’ was built under R version 4.3.3
6: package ‘purrr’ was built under R version 4.3.3
7: package ‘dplyr’ was built under R version 4.3.3
8: package ‘stringr’ was built under R version 4.3.3
9: package ‘forcats’ was built under R version 4.3.3
10: package ‘lubridate’ was built under R version 4.3.3
>
> detach("package:ggplot2", unload = TRUE)
> library(ggplot2)
Warning message:
package ‘ggplot2’ was built under R version 4.3.3
> library(dplyr)
> library(tidyr)
> # Assuming table2 has columns country, year, and cases
> tb_cases <- table2 %>%
+ filter(type == "cases") %>%
+ select(country, year, cases)
Error in `select()`:
! Can't select columns that don't exist.
✖ Column `cases` doesn't exist.
Run `rlang::last_trace()` to see where the error occurred.
> head(table2)
# A tibble: 6 × 4
  country      year type           count
  <chr>       <dbl> <chr>          <dbl>
1 Afghanistan  1999 cases            745
2 Afghanistan  1999 population  19987071
3 Afghanistan  2000 cases           2666
4 Afghanistan  2000 population  20595360
5 Brazil       1999 cases          37737
6 Brazil       1999 population 172006362
> head(table3)
# A tibble: 6 × 3
  country      year rate
  <chr>       <dbl> <chr>
1 Afghanistan  1999 745/19987071
2 Afghanistan  2000 2666/20595360
3 Brazil       1999 37737/172006362
4 Brazil       2000 80488/174504898
5 China        1999 212258/1272915272
6 China        2000 213766/1280428583
> head(table1)
# A tibble: 6 × 4
  country      year  cases population
  <chr>       <dbl>  <dbl>      <dbl>
1 Afghanistan  1999    745   19987071
2 Afghanistan  2000   2666   20595360
3 Brazil       1999  37737  172006362
4 Brazil       2000  80488  174504898
5 China        1999 212258 1272915272
6 China        2000 213766 1280428583
> head(table4)
> head(table4a)
# A tibble: 3 × 3
  country     `1999` `2000`
  <chr>        <dbl>  <dbl>
1 Afghanistan    745   2666
2 Brazil       37737  80488
3 China       212258 213766
> head(table4b)
# A tibble: 3 × 3
  country         `1999`     `2000`
  <chr>            <dbl>      <dbl>
1 Afghanistan   19987071   20595360
2 Brazil       172006362  174504898
3 China       1272915272 1280428583
>
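The select() call earlier in this session fails because table2 has no column named cases; the head(table2) output shows that the values are stored in count. A minimal sketch of a corrected version, assuming the intent was a country/year/cases table (tb_cases is the name used in the failed attempt):

```r
library(dplyr)
library(tidyr)  # provides the example datasets table1, table2, ...

# table2 keeps its values in `count`, so rename that column to `cases`
# while selecting instead of asking for a column that does not exist.
tb_cases <- table2 %>%
  filter(type == "cases") %>%
  select(country, year, cases = count)

tb_cases
```

select() accepts new_name = old_name pairs, so no separate rename() step is needed here.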
a. Extract the number of TB cases per country per year from table2.
> cases <- table2 %>% filter(type == "cases")
b. Extract the matching population per country per year.
> population <- table2 %>% filter(type == "population")
c. Divide cases by population, and multiply by 10,000.
> rate <- cases %>%
+ left_join(population, by = c("country", "year"), suffix = c("_cases", "_population")) %>%
+ mutate(rate = (count_cases / count_population) * 10000)
d. Store back in the appropriate place.
> table2_rate <- rate %>%
+ select(country, year, rate)
>
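Steps a to d above keep the result in a separate table. One possible reading of "store back in the appropriate place" is to append the rate to table2 itself as a third type. The sketch below assumes that reading; rate_rows and table2_with_rate are illustrative names, not part of the original session.

```r
library(dplyr)
library(tidyr)  # provides table2

cases <- table2 %>% filter(type == "cases")
population <- table2 %>% filter(type == "population")

# Build one "rate" row per country and year, in the same
# country / year / type / count layout that table2 uses.
rate_rows <- cases %>%
  left_join(population, by = c("country", "year"),
            suffix = c("_cases", "_population")) %>%
  mutate(type = "rate",
         count = count_cases / count_population * 10000) %>%
  select(country, year, type, count)

# Append the new rows and sort so each country-year group reads
# cases, population, rate.
table2_with_rate <- bind_rows(table2, rate_rows) %>%
  arrange(country, year, type)

table2_with_rate
```

Joining on country and year, rather than dividing the two count columns by row position, keeps the calculation correct even if the rows were in a different order.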
Which representation is easiest to work with? Which is hardest?
Why?
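To compare representations, the same rate can also be computed from table4a (cases) and table4b (population). This is only a sketch: table4_rate is an illustrative name, and the division assumes both tables list the countries in the same order, which the code checks first.

```r
library(dplyr)
library(tidyr)  # provides table4a and table4b

# The columns are divided element by element, so the country order
# in the two tables must match.
stopifnot(identical(table4a$country, table4b$country))

table4_rate <- tibble(
  country = table4a$country,
  `1999` = table4a$`1999` / table4b$`1999` * 10000,
  `2000` = table4a$`2000` / table4b$`2000` * 10000
)

table4_rate
```

This takes less code than the table2 route, but the year ends up in the column names rather than in a column, so every additional year needs another line.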
3. Re-create the plot showing change in cases over time using
table2 instead of table1. What do you need to do first?
Steps
1. Filter the cases: table2 mixes cases and population in one column, so first keep
only the rows where type is "cases".
> cases_data <- table2 %>% filter(type == "cases")
2. Plot the data: use ggplot2 to draw the change in cases over time, with one line
per country.
> library(ggplot2)
>
> ggplot(cases_data, aes(x = year, y = count, color = country)) +
+ geom_line() +
+ labs(title = "Change in TB Cases Over Time",
+       x = "Year", y = "Number of Cases")
>
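Because year is stored as a double, the default x axis may show fractional years such as 1999.25. The variant below is a sketch that pins the breaks to the years present in the data and adds a point per observation; the grey line colour and the "Country" legend title are styling assumptions, not requirements of the exercise.

```r
library(dplyr)
library(tidyr)    # provides table2
library(ggplot2)

cases_data <- table2 %>% filter(type == "cases")

p <- ggplot(cases_data, aes(x = year, y = count)) +
  # One grey line per country; the grouping is explicit because
  # colour is not mapped in this layer.
  geom_line(aes(group = country), colour = "grey50") +
  # Coloured points identify the countries.
  geom_point(aes(colour = country)) +
  # Show only the years that occur in the data.
  scale_x_continuous(breaks = unique(cases_data$year)) +
  labs(title = "Change in TB Cases Over Time",
       x = "Year", y = "Number of Cases", colour = "Country")

p
```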