ggplot2
plot
Quiz 1 on Monday, September 11, in class
Lab 1 (both parts) due Tuesday, September 12, at 9 am
The table below displays data from a survey on a class of students.
What proportion of the class was in the marching band?
What proportion of those in the marching band were juniors?
What proportion were sophomores not in the marching band?
What were the dimensions of the raw data from which this table was constructed?
How would you characterize the association between these two variables?
Political affiliation and college degree status of 500 survey participants.
Which group is the largest?
What does this plot show?
R has a vast ecosystem of packages that add new functions. Any installed package can be loaded with the library() function.
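A minimal sketch of loading a package and then using one of its functions. It assumes the dplyr package is already installed; n_distinct() is a dplyr function used here only to show that the package's functions become available after library().

```r
# install.packages("dplyr")  # one-time installation, only if needed
library(dplyr)               # load the package for this R session

# n_distinct() comes from dplyr; it counts the unique values in a vector.
# mtcars is a data set built into R.
n_cyl_values <- n_distinct(mtcars$cyl)
n_cyl_values
```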
You will rarely create data by hand. Instead, you will either be
loading it in from a separate file, or
loading it from within an R package (most of ours are in stat20data).
To load data from a package, load the package with library(), then open the data frame with View(<df name>).
# A tibble: 333 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen 36.7 19.3 193 3450
5 Adelie Torgersen 39.3 20.6 190 3650
6 Adelie Torgersen 38.9 17.8 181 3625
7 Adelie Torgersen 39.2 19.6 195 4675
8 Adelie Torgersen 41.1 17.6 182 3200
9 Adelie Torgersen 38.6 21.2 191 3800
10 Adelie Torgersen 34.6 21.1 198 4400
# ℹ 323 more rows
# ℹ 2 more variables: sex <fct>, year <int>
tidyverse
The tidyverse package contains several functions used to manipulate data frames:
select(): subset columns
arrange(): sort rows
mutate(): create a new column from existing column(s)
filter(): subset rows
# A tibble: 333 × 2
species island
<fct> <fct>
1 Adelie Torgersen
2 Adelie Torgersen
3 Adelie Torgersen
4 Adelie Torgersen
5 Adelie Torgersen
6 Adelie Torgersen
7 Adelie Torgersen
8 Adelie Torgersen
9 Adelie Torgersen
10 Adelie Torgersen
# ℹ 323 more rows
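A sketch of a select() call that would produce a two-column result like the one above. To stay self-contained it uses penguins_small, a hypothetical five-row data frame copied by hand from the first rows of the printed penguins output, rather than the full data set.

```r
library(dplyr)

# Hypothetical stand-in: first five rows of the penguins output shown earlier
penguins_small <- data.frame(
  species           = rep("Adelie", 5),
  island            = rep("Torgersen", 5),
  bill_length_mm    = c(39.1, 39.5, 40.3, 36.7, 39.3),
  bill_depth_mm     = c(18.7, 17.4, 18.0, 19.3, 20.6),
  flipper_length_mm = c(181, 186, 195, 193, 190),
  body_mass_g       = c(3750, 3800, 3250, 3450, 3650)
)

# First argument is the data frame; the rest are the columns to keep
species_island <- select(penguins_small, species, island)
species_island
```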
# A tibble: 333 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Dream 32.1 15.5 188 3050
2 Adelie Dream 33.1 16.1 178 2900
3 Adelie Torgersen 33.5 19 190 3600
4 Adelie Dream 34 17.1 185 3400
5 Adelie Torgersen 34.4 18.4 184 3325
6 Adelie Biscoe 34.5 18.1 187 2900
7 Adelie Torgersen 34.6 21.1 198 4400
8 Adelie Torgersen 34.6 17.2 189 3200
9 Adelie Biscoe 35 17.9 190 3450
10 Adelie Biscoe 35 17.9 192 3725
# ℹ 323 more rows
# ℹ 2 more variables: sex <fct>, year <int>
You can sort in descending order by wrapping the variable name in desc().
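A sketch of both sort directions with arrange(). The sorting variable bill_length_mm is an assumption based on the sorted output shown above, and penguins_small is a hypothetical five-row stand-in copied from the printed penguins rows.

```r
library(dplyr)

# Hypothetical stand-in: a few columns from the first five printed rows
penguins_small <- data.frame(
  species        = rep("Adelie", 5),
  island         = rep("Torgersen", 5),
  bill_length_mm = c(39.1, 39.5, 40.3, 36.7, 39.3),
  bill_depth_mm  = c(18.7, 17.4, 18.0, 19.3, 20.6)
)

# Ascending order is the default
shortest_first <- arrange(penguins_small, bill_length_mm)

# Wrapping the variable in desc() reverses the order
longest_first <- arrange(penguins_small, desc(bill_length_mm))
longest_first
```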
# A tibble: 333 × 9
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen 36.7 19.3 193 3450
5 Adelie Torgersen 39.3 20.6 190 3650
6 Adelie Torgersen 38.9 17.8 181 3625
7 Adelie Torgersen 39.2 19.6 195 4675
8 Adelie Torgersen 41.1 17.6 182 3200
9 Adelie Torgersen 38.6 21.2 191 3800
10 Adelie Torgersen 34.6 21.1 198 4400
# ℹ 323 more rows
# ℹ 3 more variables: sex <fct>, year <int>, bill_index <dbl>
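A sketch of a mutate() call that adds a bill_index column. The formula (bill length plus bill depth) is inferred from the printed values, for example 39.1 + 18.7 = 57.8, and is an assumption rather than a confirmed definition. penguins_small is a hypothetical five-row stand-in copied from the printed rows.

```r
library(dplyr)

# Hypothetical stand-in: bill measurements from the first five printed rows
penguins_small <- data.frame(
  species        = rep("Adelie", 5),
  bill_length_mm = c(39.1, 39.5, 40.3, 36.7, 39.3),
  bill_depth_mm  = c(18.7, 17.4, 18.0, 19.3, 20.6)
)

# New column name on the left of =, the expression that builds it on the right
# (assumed formula: length + depth)
with_index <- mutate(penguins_small,
                     bill_index = bill_length_mm + bill_depth_mm)
with_index
```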
Remember that you can nest functions.
# A tibble: 333 × 1
bill_index
<dbl>
1 57.8
2 56.9
3 58.3
4 56
5 59.9
6 56.7
7 58.8
8 58.7
9 59.8
10 55.7
# ℹ 323 more rows
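A sketch of nesting: the result of mutate() is passed directly as the first argument of select(), which gives a one-column result like the one above. The bill_index formula is the same inferred assumption (length plus depth), and penguins_small is a hypothetical stand-in copied from the printed rows.

```r
library(dplyr)

# Hypothetical stand-in: bill measurements from the first five printed rows
penguins_small <- data.frame(
  species        = rep("Adelie", 5),
  bill_length_mm = c(39.1, 39.5, 40.3, 36.7, 39.3),
  bill_depth_mm  = c(18.7, 17.4, 18.0, 19.3, 20.6)
)

# The inner call runs first; its output becomes the data frame for select()
index_only <- select(
  mutate(penguins_small, bill_index = bill_length_mm + bill_depth_mm),
  bill_index
)
index_only
```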
# A tibble: 165 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.5 17.4 186 3800
2 Adelie Torgersen 40.3 18 195 3250
3 Adelie Torgersen 36.7 19.3 193 3450
4 Adelie Torgersen 38.9 17.8 181 3625
5 Adelie Torgersen 41.1 17.6 182 3200
6 Adelie Torgersen 36.6 17.8 185 3700
7 Adelie Torgersen 38.7 19 195 3450
8 Adelie Torgersen 34.4 18.4 184 3325
9 Adelie Biscoe 37.8 18.3 174 3400
10 Adelie Biscoe 35.9 19.2 189 3800
# ℹ 155 more rows
# ℹ 2 more variables: sex <fct>, year <int>
# A tibble: 165 × 1
sex
<fct>
1 female
2 female
3 female
4 female
5 female
6 female
7 female
8 female
9 female
10 female
# ℹ 155 more rows
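A sketch of filter() calls that would give results like the two above: first keeping only the rows where sex is "female", then nesting that inside select(). penguins_small is a hypothetical five-row stand-in; its sex values are inferred from which rows appear in the filtered output and are an assumption.

```r
library(dplyr)

# Hypothetical stand-in; sex values are inferred, not copied from the source
penguins_small <- data.frame(
  species        = rep("Adelie", 5),
  bill_length_mm = c(39.1, 39.5, 40.3, 36.7, 39.3),
  sex            = c("male", "female", "female", "female", "male")
)

# Keep only rows where the condition is TRUE (note the double equals sign)
females <- filter(penguins_small, sex == "female")

# Nested version: filter the rows first, then keep just the sex column
female_sex_only <- select(filter(penguins_small, sex == "female"), sex)
female_sex_only
```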
There is a built-in data set in R called mtcars that has information on cars that appeared in Motor Trend magazine. It’s already loaded and can be accessed as mtcars.
Create a slimmer data frame that only contains the columns hp and wt and save it to mtcars_slim.
Create a new column called power_to_weight that is the ratio of hp to wt. Save the three-column data frame back over mtcars_slim.
Sort the data frame in descending order by the power-to-weight ratio.
Filter the data frame so only the rows with a power-to-weight ratio greater than 45 are preserved.
Hint: look up help files!
A template for a line plot:
Where:
DATAFRAME is the name of your data frame
XVARIABLE is the name of the variable of that data frame that you want on the x-axis
YVARIABLE is the name of the variable of that data frame that you want on the y-axis
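A sketch of how such a template might be filled in, assuming the ggplot2 functions ggplot(), aes(), and geom_line(). Here DATAFRAME is economics (an example data set that ships with ggplot2), XVARIABLE is date, and YVARIABLE is unemploy; these choices are for illustration only.

```r
library(ggplot2)

# Assumed general shape of the template:
#   ggplot(DATAFRAME, aes(x = XVARIABLE, y = YVARIABLE)) +
#     geom_line()

# Filled in with the economics data set bundled with ggplot2
p <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()
p
```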
Break