adapted from meetville.com
This is your final assignment if you want to get ECTS points for this course. You will use all we learned during the course to understand, arrange, analyse, visualise and report on a new dataset about medical appointments. If you are not into ECTS points you can still take a swing at the project - it will have some new challenges for you!
What is the dataset about? There is a large number of medical appointments where patients end up canceling or simply do not show up. This practice comes with costs to the medical system - in this assignment we will try to understand what drives patients not to show up in a sample of medical appointments from Vitoria, Espirito Santo, Brazil.
You should (at least) complete the following tasks:
Datasets into R objects.
Open your `bernrbootcamp` R project. It should already have the folders `1_Data` and `2_Code`. Make sure that the data files listed in the Datasets section are in your `1_Data` folder.
Open a new R script. At the top of the script, using comments, write your name and the date. Save it as a new file called `final_project.R` in the `bernrbootcamp` folder.
Using `library()`, load the set of packages for this practical listed in the Packages section.
Read the `appointments.csv` and the `weather.csv` files into two objects called `appointments` and `weather`, respectively. Check out the content of the two datasets in the Overview and Datasets sections.
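A minimal sketch of the import step, assuming the files are ordinary comma-separated text that `read_csv()` from readr (part of the tidyverse) can parse. The temporary toy file and its three columns are made up for illustration so that the snippet runs on its own; in the project the path would point into `1_Data`.

```r
library(readr)

# Toy stand-in for 1_Data/appointments.csv (invented rows, illustration only)
toy_path <- tempfile(fileext = ".csv")
writeLines(c("PatientId,Age,No-show", "1,34,No", "2,7,Yes"), toy_path)

# In the project this would be read_csv("1_Data/appointments.csv")
appointments <- read_csv(toy_path, show_col_types = FALSE)

# Quick look at the imported columns and their types
str(appointments)
```

Note that a column name such as `No-show` contains a hyphen, so it has to be wrapped in backticks whenever it is used in later code.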
We first want to see what variables are in the dataset (check out the Datasets section) and whether there are variables with impossible values. Generate an overview of all variables with minimum and maximum values, means and medians for all appropriate numeric variables. Correct the variable(s) that stand out and provide a rationale for what you did. Also rename `Hipertension` to `Hypertension` and `Handcap` to `Handicap`.
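One possible shape for this step, sketched on a made-up tibble rather than the real data. Treating a negative age as missing is an assumption made here for illustration; dropping the row would be an equally defensible correction, and the rationale is yours to give.

```r
library(dplyr)

# Invented toy values; only the column names follow the assignment
toy <- tibble(
  Age          = c(34, -1, 7, 115),
  Hipertension = c(0, 1, 0, 1),
  Handcap      = c(0, 0, 1, 2)
)

# Minimum, maximum, mean and median for one numeric variable
toy %>%
  summarise(
    min = min(Age), max = max(Age),
    mean = mean(Age), median = median(Age)
  )

# Assumption: a negative age is impossible, so it is set to NA here
toy_clean <- toy %>%
  mutate(Age = if_else(Age < 0, NA_real_, Age)) %>%
  rename(Hypertension = Hipertension, Handicap = Handcap)
```

Base R's `summary(toy)` gives a similar overview for all columns at once.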
How many patients showed up for their appointment(s), how many did not?
How many patients are female, how many are male? Who (`PatientId`) has the largest number of appointments? Split that analysis (largest number of appointments) by gender.
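These counting questions can be approached with `count()`, `distinct()` and `slice_max()` from dplyr. The sketch below uses an invented six-row tibble. Keep in mind that each row is an appointment, so counting rows is not the same as counting patients.

```r
library(dplyr)

# Invented toy data: one row per appointment
toy <- tibble(
  PatientId = c(1, 1, 1, 2, 3, 3),
  Gender    = c("F", "F", "F", "M", "M", "M"),
  `No-show` = c("No", "Yes", "No", "No", "No", "Yes")
)

# Appointments by attendance status
toy %>% count(`No-show`)

# Patients (not appointments) by gender
toy %>% distinct(PatientId, Gender) %>% count(Gender)

# Appointments per patient, largest first
per_patient <- toy %>% count(PatientId, Gender, sort = TRUE)

# Patient with the most appointments within each gender
top_by_gender <- per_patient %>%
  group_by(Gender) %>%
  slice_max(n, n = 1) %>%
  ungroup()
```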
Generate a simple plot with the age of all patients.
Summarise the number of patients per age in years (range: `range(appointments$Age)`) for the whole `appointments` dataset. Generate a line graph showing the distribution of patients per age. Now split this graph into two rows with Female|Male and two columns with Show|No-Show - this will give you an age distribution with four panels.
Generate a new variable `age_group` which includes four groups [1:4], where 1: ‘children’ between 0 and 18 years, 2: ‘young adults’ between 19 and 30 years, 3: ‘adults’ (31:50 years) and 4: ‘old adults’ (51:max(Age)).
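One way to build such a grouping is base R's `cut()`, sketched here on a toy vector of boundary ages. With the default `right = TRUE`, each interval includes its upper bound, so 18 falls into group 1 and 19 into group 2; `labels = FALSE` returns the integer codes 1 to 4. `case_when()` would be a reasonable alternative.

```r
library(dplyr)

# Boundary ages chosen to check the group limits
toy <- tibble(Age = c(0, 18, 19, 30, 31, 50, 51, 97))

toy <- toy %>%
  mutate(
    age_group = cut(Age, breaks = c(-Inf, 18, 30, 50, Inf), labels = FALSE)
  )

toy
```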
Plot the four age groups on the x-axis and the size of the groups on the y-axis (call this variable: number_cases). Connect the points with a line graph, separated by gender. Facet the graph into two parts, with `noshow` ‘No’ on top and `noshow` ‘Yes’ on the bottom. So ultimately you should have 4 lines, 2 in each panel.
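A sketch of this plot on random toy data, assuming the columns are called `age_group`, `Gender` and `noshow`. Mapping `Gender` to colour makes ggplot2 draw one line per gender, and `facet_wrap()` with `ncol = 1` stacks the two attendance panels.

```r
library(dplyr)
library(ggplot2)

# Random toy data, for illustration only
set.seed(2)
toy <- tibble(
  age_group = sample(1:4, 300, replace = TRUE),
  Gender    = sample(c("F", "M"), 300, replace = TRUE),
  noshow    = sample(c("No", "Yes"), 300, replace = TRUE)
)

# Size of each age group per gender and attendance status
group_sizes <- toy %>%
  count(noshow, Gender, age_group, name = "number_cases")

# Two stacked panels ("No" on top, "Yes" below), two lines in each
p <- ggplot(group_sizes,
            aes(x = age_group, y = number_cases, colour = Gender)) +
  geom_point() +
  geom_line() +
  facet_wrap(vars(noshow), ncol = 1)
```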
We want to explore the time difference between making an appointment and actually having an appointment a little more. Using the `lubridate` package, extract Year-Month-Day from `ScheduledDay` and write this date information (without time) into a new variable `ScheduledDate` (Hint: `date()`). Convert `AppointmentDay` to the date class, too. We want to get the distribution of time between the scheduled appointment date and the actual appointment day - write this difference (in days) into a new variable called `time_diff`. There are some appointments with a negative `time_diff`. We assume that these are based on typos - remove them from the dataset.
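The date handling could look roughly like this. The three toy timestamps are invented, and the sketch assumes both columns have already been parsed as date-times; if they arrive as character strings, a parser such as `ymd_hms()` is needed first, as shown.

```r
library(dplyr)
library(lubridate)

# Invented timestamps; the third row has the appointment before the booking
toy <- tibble(
  ScheduledDay   = ymd_hms(c("2016-04-29 18:38:08",
                             "2016-04-27 08:36:51",
                             "2016-05-10 10:51:53")),
  AppointmentDay = ymd_hms(c("2016-04-29 00:00:00",
                             "2016-05-03 00:00:00",
                             "2016-05-09 00:00:00"))
)

toy <- toy %>%
  mutate(
    ScheduledDate  = date(ScheduledDay),       # drop the time of day
    AppointmentDay = as_date(AppointmentDay),  # convert to Date class
    time_diff      = as.numeric(AppointmentDay - ScheduledDate)  # in days
  ) %>%
  filter(time_diff >= 0)                       # remove negative lead times
```

Subtracting two `Date` values yields a difference in days, which `as.numeric()` turns into a plain number.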
Categorize lead days - we now want to categorize the `time_diff` variable into a new variable called `lead_days` with five categories: ‘0 days’, ‘1-2 days’, ‘3-7 days’, ‘8-31 days’, ‘32+ days’.
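As with the age groups, `cut()` is one option for this categorisation. The sketch uses a toy vector of boundary values and assumes `time_diff` holds whole, non-negative days.

```r
library(dplyr)

# Boundary values chosen to check each category limit
toy <- tibble(time_diff = c(0, 1, 2, 3, 7, 8, 31, 32, 179))

toy <- toy %>%
  mutate(
    lead_days = cut(
      time_diff,
      breaks = c(-Inf, 0, 2, 7, 31, Inf),  # upper bounds are inclusive
      labels = c("0 days", "1-2 days", "3-7 days", "8-31 days", "32+ days")
    )
  )

toy %>% count(lead_days)
```

The result is a factor whose levels keep the order given in `labels`, which is convenient for plotting.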
Weather could have a strong influence on going to the doctor. The `weather` tibble provides information about the weather in Vitoria. We will join the two datasets, adding `RH2M`, `T2M` and `PRECTOT` from `weather` to `appointments`. You want to join these by `AppointmentDay` (from `appointments`) and `YYYYMMDD` (from `weather`) - Hint: `left_join()`. Call the new dataset `full_df`.
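A sketch of the join on two invented toy tibbles. How `YYYYMMDD` is stored in the real weather file is not stated here, so the numeric format and the `ymd()` conversion are assumptions; the key point is that both join columns must have the same class (here `Date`) before calling `left_join()`.

```r
library(dplyr)
library(lubridate)

# Invented toy data; the weather values are made up
appointments_toy <- tibble(
  AppointmentID  = 1:3,
  AppointmentDay = as_date(c("2016-04-29", "2016-05-03", "2016-05-03"))
)
weather_toy <- tibble(
  YYYYMMDD = c(20160429, 20160503),  # assumed numeric date format
  DOY      = c(120, 124),
  RH2M     = c(80.1, 75.4),
  T2M      = c(25.3, 24.8),
  PRECTOT  = c(0.0, 3.2)
)

# Keep only the join key and the three weather variables
weather_small <- weather_toy %>%
  mutate(YYYYMMDD = ymd(YYYYMMDD)) %>%
  select(YYYYMMDD, RH2M, T2M, PRECTOT)

# Every appointment is kept; weather columns are added where the dates match
full_toy <- appointments_toy %>%
  left_join(weather_small, by = c("AppointmentDay" = "YYYYMMDD"))

dim(full_toy)
```

A left join never drops rows from the left table, so the row count of the result should equal that of the appointments data as long as each date appears only once in the weather data.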
Check that your `full_df` data frame has the following dimensions:
Generate a new file with File - R Markdown - Document (with the default: HTML). Save this file to your `BernRBootcamp` folder.
Document your 3 central insights from this dataset. Describe the insights in your own words and support them with figures, statistics and tables produced with R code in your Markdown file `FinalReport.html`.
Generate a new file with File - R Markdown - Presentation (with the default: HTML (ioslides)). Save this file to your `BernRBootcamp` folder as `FinalPresentation.html`.
Prepare a 15-minute presentation documenting your approach and your 3 central insights from this dataset. Describe the insights in your own words and support them with figures, statistics and tables produced with R code in your Markdown file.
File | Rows | Columns | Description |
---|---|---|---|
medical_noshows.csv | 110527 | 14 | Patient shows |
weather.csv | 41 | 10 | Weather data for Vitoria, Brazil |
Variable | Description |
---|---|
PatientId | ID of a patient |
AppointmentID | ID for each appointment |
Gender | Male or Female |
ScheduledDay | The day someone called or registered the appointment, this should be before the appointment |
AppointmentDay | The day of the actual appointment, when patients have to visit the doctor |
Age | How old is the patient |
Neighbourhood | Where the patient was born |
Hipertension | True or False |
Diabetes | True or False |
Alcoholism | True or False |
Handcap | Handicapped - level 1:4, 1 lowest level |
SMS_received | True: 1 or more messages sent to the patient |
No-show | 1: No, 2: Yes |
Variable | Description |
---|---|
LON | Longitude |
LAT | Latitude |
YEAR | Year |
MM | Month |
DD | Day |
DOY | Day of Year |
YYYYMMDD | Date |
RH2M | Relative Humidity at 2 Meters |
T2M | Temperature at 2 Meters |
PRECTOT | Precipitation |
Package | Installation |
---|---|
tidyverse | install.packages("tidyverse") |
lubridate | install.packages("lubridate") |
tidylog | devtools::install_github("elbersb/tidylog") |