class: center, middle, inverse, title-slide

# Recap II
### Introduction to R
Bern R Bootcamp
### June 2020

---

layout: true

<div class="my-footer">
<span style="text-align:center">
<span>
<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/by-sa.png" height=14 style="vertical-align: middle"/>
</span>
<a href="https://therbootcamp.github.io/">
<span style="padding-left:82px">
<font color="#7E7E7E"> www.therbootcamp.com </font>
</span>
</a>
<a href="https://therbootcamp.github.io/">
<font color="#7E7E7E"> Introduction to R | June 2020 </font>
</a>
</span>
</div>

---

.pull-left45[

# Statistics I

#### <high>Descriptive statistics</high> with <mono>dplyr</mono>

```r
# Group-summarise idiom
baselers %>%
  group_by(sex, eyecor) %>%
  summarise(
    N = n(),
    age_mean = mean(age),
    height_median = median(height),
    children_max = max(children)
  )
```

#### <high>Simple hypothesis tests</high> with <mono>stats</mono>

```r
# Simple hypothesis test
t.test(baselers$happiness, baselers$fitness, var.equal = TRUE)
```

]

.pull-right45[

<br><br><br>

<p align = "center">
<img src="image/null_hypothesis.png" height=430px><br>
<font style="font-size:10px">from <a href="https://xkcd.com/892/">xkcd.com</a></font>
</p>

]

---

# <mono>summarise()</mono>

.pull-left45[

Use `summarise()` to create new columns of <high>summary statistics</high>.

For tibble input, the result of `summarise()` is always a tibble.

Functions used in `summarise()` <high>must return a single value</high>.

```r
data %>%
  summarise(
    NAME = SUMMARY_FUN(A),
    NAME = SUMMARY_FUN(B),
    ...
  )
```

]

.pull-right5[

```r
# Calculate summary statistics
baselers %>%
  summarise(
    N = n(),
    age_mean = mean(age),
    height_median = median(height),
    children_max = max(children)
  )
```

```
# A tibble: 1 x 4
      N age_mean height_median children_max
  <int>    <dbl>         <dbl>        <dbl>
1 10000     44.6          171.            6
```

]

---

# <mono>group_by()</mono> + <mono>summarise()</mono>

.pull-left45[

Use `group_by()` to <high>group data</high> according to one or more columns.
Then, use `summarise()` to <high>calculate summary statistics</high> for each group.

You can include <high>one or more</high> grouping variables.

```r
data %>%
  group_by(A, B, ...) %>%
  summarise(
    NAME = SUMMARY_FUN(A),
    NAME = SUMMARY_FUN(B),
    ...
  )
```

]

.pull-right5[

```r
baselers %>%
  group_by(sex) %>%
  summarise(
    N = n(),
    age_mean = mean(age),
    income_median = median(income, na.rm = TRUE)
  )
```

]

---

# <mono>group_by()</mono> + <mono>summarise()</mono>

.pull-left45[

Use `group_by()` to <high>group data</high> according to one or more columns.

Then, use `summarise()` to <high>calculate summary statistics</high> for each group.

You can include <high>one or more</high> grouping variables.

```r
data %>%
  group_by(A, B, ...) %>%
  summarise(
    NAME = SUMMARY_FUN(A),
    NAME = SUMMARY_FUN(B),
    ...
  )
```

]

.pull-right5[

```r
baselers %>%
  group_by(sex) %>%
  summarise(
    N = n(),
    age_mean = mean(age),
    income_median = median(income, na.rm = TRUE)
  )
```

```
# A tibble: 2 x 4
  sex        N age_mean income_median
  <chr>  <int>    <dbl>         <dbl>
1 female  5000     45.4          7300
2 male    5000     43.8          7100
```

]

---

.pull-left3[

# Decision tree

]

.pull-right6[

<p align="center">
<img src="https://s3-eu-west-1.amazonaws.com/pfigshare-u-previews/98305/preview.jpg" height="580px" vspace="20">
</p>

]

---

.pull-left45[

# Plotting

```r
ggplot(data = mpg,
       mapping = aes(x = XXX, y = XXX)) +
  geom_XXX() +
  geom_XXX() +
  labs(x = "XXX",
       y = "XXX",
       title = "XXX",
       caption = "XXX") +
  theme_bw()
```

]

.pull-right45[

<br><br><br><br>

![](RecapII_files/figure-html/unnamed-chunk-10-1.png)<!-- -->

]

---

.pull-left45[

# Plotting

```r
ggplot(data = mpg,
       mapping = aes(x = cty, y = hwy)) +
  geom_point() +
  geom_smooth() +
  labs(x = "Miles per gallon in city",
       y = "Miles per gallon on highway",
       title = "MPG data",
       caption = "Source: mpg data tidyverse.") +
  theme_bw()
```

]

.pull-right45[

<br><br><br><br>

![](RecapII_files/figure-html/unnamed-chunk-12-1.png)<!-- -->

]

---

class: middle, center

<h1><a href="https://dwulff.github.io/Intro2R_Unibe/1_Data/BernRBootcamp_Day3.zip">Downloads</a></h1>

---

.pull-left35[

<br><br><br><br><br><br><br><br><br>

<p align="center">
<font size=8><hfont><high>Questions?</high></hfont></font><br>
<font size = 4><a href = "https://therbootcamp.github.io/Intro2DataScience_2018Oct/">Link to schedule</a></font>
</p>

]

.pull-right6[

<img src="https://dwulff.github.io/Intro2R_Unibe/_sessions/Welcome/image/schedule.png" height="580" align="center">

]
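
---

# <mono>group_by()</mono> + <mono>summarise()</mono>: toy example

The group-summarise idiom from this recap can also be tried without the `baselers` data. The following is a minimal sketch, assuming `dplyr` is installed; the `toy` tibble, its column values, and the name `toy_summary` are invented for illustration and are not part of the course material.

```r
library(dplyr)

# Hypothetical stand-in for `baselers`; all values are made up
toy <- tibble(
  sex    = c("female", "female", "male", "male", "male"),
  age    = c(30, 40, 20, 30, 40),
  income = c(5000, NA, 4000, 6000, 8000)
)

# One row per level of `sex`, one column per summary statistic
toy_summary <- toy %>%
  group_by(sex) %>%
  summarise(
    N = n(),                                      # group size
    age_mean = mean(age),                         # mean age per group
    income_median = median(income, na.rm = TRUE)  # median income, ignoring NA
  )

toy_summary
```

Because `income` contains an `NA` in the female group, dropping `na.rm = TRUE` would make that group's median `NA`.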