Lab1_Shape.Rmd
Welcome back. This lab overviews practical aspects about the form or shape that data can take. We will use coding concepts in R that should be mostly familiar from last semester, and use this lab as an opportunity for a little bit of review. There are no readings from the textbook for this lab, but you may find the following links generally helpful:
Apologies that the videos are in two parts…I couldn’t compete with a vacuum cleaner.
The structure of a research design implies that the data it produces will have particular properties. The data can be saved in different formats, it can arrive in different shapes and sizes, and it must be transformed into specific shapes in order to conduct specific analyses. Thus, the shape of data, and the shaping of data, are part and parcel of research, from the beginning to the end. We could spend a good deal of time discussing data organization and manipulation. Most of this conversation will unfold over the semester. In the remaining part of this background section I’m going to identify a few places where data organization is important. Each of these topics could easily be expanded to a whole chapter. For now, my goal is to alert you to them. We will cover some of these topics in more detail in this lab.
Perhaps because the discipline of Psychology is so large and varied it has been slow to adopt any widespread standards for formatting data. Certainly, there are so many kinds of data that standards for one research project might not apply to another. Here are some considerations to keep in mind:
Data-shaping is a practical part of all data-analysis in R. Using scripts to handle the data from input to analysis allows a reproducible pipeline. A reproducible pipeline is helpful for you and for others. When your analysis pipeline is not reproducible, you may not be able to fix mistakes that you make. For example, if you accidentally delete something, or move something around by hand, you may not have a record of having performed that operation, and if you forget about it, you may never be able to go back and fix the error. When you use a script for analysis you never “touch” the data by hand. Instead, all actions are taken by the script. Even if your script makes a mistake, the mistake is at least identifiable and fixable. If someone else has the raw data and your analysis script, then when they input the data to your script, they should get the same analysis that you reported. The nuts and bolts of the analysis script often include many transformations of the data: the raw data is read into R, it might be saved in different variables, pre-processed in various ways, and reformatted and sliced and diced to meet the input requirements of R statistical analysis functions.
A major overarching goal for the end of our course is for you to understand how you could create your own statistical analyses customized exactly to nuances of your research designs. In other words, how to WYOR (Write Your Own Recipes) for statistics. In order to get there, it is important to recognize fundamental connections between your research design and the shape of data that will be collected in the design. In this lab, we will use R as a conceptual and practical tool to illustrate how simulated data can be created for particular designs. Once you have simulated data, you can test out your own planned analysis in advance of obtaining real data.
This is a short concept section to illustrate the concept of transformability. The basic idea is that transformable data can be arranged into different shapes and back again without losing any information. As a practical matter, many functions depend on their inputs being formatted in a particular way. Thus, data transformation is often required to get the data into shape so that it can be inputted into some function.
As a general rule, most functions for statistical tests in R require that data are organized in long-format. I personally find this convenient because it means that:
Let’s look at an example of wide data. Imagine we have five people, and we have measured how many times they check their phone, in the morning, afternoon, and evening.
wide_data <- data.frame(person = 1:5,
Morning = c(1,3,2,4,3),
Afternoon = c(3,4,5,4,7),
Evening = c(7,8,7,6,9))
knitr::kable(wide_data)
person | Morning | Afternoon | Evening |
---|---|---|---|
1 | 1 | 3 | 7 |
2 | 3 | 4 | 8 |
3 | 2 | 5 | 7 |
4 | 4 | 4 | 6 |
5 | 3 | 7 | 9 |
If we have 5 people, and collect measures three times each, then we must have 5 x 3 total cells. The wide version of this data is a 5x3 matrix with 5 rows (different people), and 3 columns (morning, afternoon, evening). This matrix has 15 cells in it, so it is capable of representing all complete cases of the data. Wide-data is a perfectly fine way to represent data, and there is nothing inherently wrong with wide-data. However, as I mentioned before, many functions in R are written with the assumption that data is shaped as long-data.
Here is an example of the same thing in long-format:
long_data <- data.frame(person = rep(1:5, each=3),
time_of_day = rep(c("Morning", "Afternoon", "Evening"),5),
counts = c(1,3,7,3,4,8,2,5,7,4,4,6,3,7,9))
knitr::kable(long_data)
person | time_of_day | counts |
---|---|---|
1 | Morning | 1 |
1 | Afternoon | 3 |
1 | Evening | 7 |
2 | Morning | 3 |
2 | Afternoon | 4 |
2 | Evening | 8 |
3 | Morning | 2 |
3 | Afternoon | 5 |
3 | Evening | 7 |
4 | Morning | 4 |
4 | Afternoon | 4 |
4 | Evening | 6 |
5 | Morning | 3 |
5 | Afternoon | 7 |
5 | Evening | 9 |
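# Alternative construction, ordered by time of day instead of by person.
# The counts must follow the same order: five Morning values, then five
# Afternoon values, then five Evening values.
person <- rep(1:5, 3)
time_of_day <- rep(c("Morning", "Afternoon", "Evening"), each = 5)
counts <- c(1,3,2,4,3, 3,4,5,4,7, 7,8,7,6,9)
test <- data.frame(person, time_of_day, counts)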
As you can see, in long-data format, the data gets really long. The rule here is that each dependent measure (e.g., one count of phone checking) is listed on its own row. There are 5 x 3 = 15 total individual measures, so there must be 15 rows in the long-form representation.
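One consequence of the one-measure-per-row rule is that row order carries no information, as long as every row keeps its own labels. Here is a minimal sketch of that idea: the same phone-checking data is built once in person order and once in time-of-day order. The object names (by_person, by_time, sort_rows) are just ones I made up for illustration.

```r
# Person-major order: all three times for person 1, then person 2, and so on
by_person <- data.frame(
  person      = rep(1:5, each = 3),
  time_of_day = rep(c("Morning", "Afternoon", "Evening"), times = 5),
  counts      = c(1, 3, 7, 3, 4, 8, 2, 5, 7, 4, 4, 6, 3, 7, 9)
)

# Time-major order: five Morning values, then Afternoon, then Evening
by_time <- data.frame(
  person      = rep(1:5, times = 3),
  time_of_day = rep(c("Morning", "Afternoon", "Evening"), each = 5),
  counts      = c(1, 3, 2, 4, 3, 3, 4, 5, 4, 7, 7, 8, 7, 6, 9)
)

# Helper: sort rows the same way and reset row names before comparing
sort_rows <- function(d) {
  d <- d[order(d$person, d$time_of_day), ]
  rownames(d) <- NULL
  d
}

# Should be TRUE: both data frames hold exactly the same 15 measures
identical(sort_rows(by_person), sort_rows(by_time))
```

The thing to watch is that the counts vector has to be typed in the same order as the labels that sit beside it. If the labels change order and the counts do not, the numbers end up attached to the wrong person and time.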
Sometimes you might receive data in wide format and need to convert it to long format. This can be accomplished in R in multiple different ways. You can write custom code to do it, or you can try using various existing functions. If you Google “wide to long in R,” you might come across a curious history of functions that have been developed for this purpose. One part of that history is that the functions keep getting re-written so that they are “more clear” about how they work. I’ll admit that I have found these functions confusing before, and usually find myself messing around with them until they do the conversion I’m looking for. In any case, here is an example of pivoting from wide to long using tidyr:
library(tidyr)
pivot_longer(data = wide_data,
cols = !person,
names_to = "time_of_day",
values_to = "counts")
#> # A tibble: 15 × 3
#> person time_of_day counts
#> <int> <chr> <dbl>
#> 1 1 Morning 1
#> 2 1 Afternoon 3
#> 3 1 Evening 7
#> 4 2 Morning 3
#> 5 2 Afternoon 4
#> 6 2 Evening 8
#> 7 3 Morning 2
#> 8 3 Afternoon 5
#> 9 3 Evening 7
#> 10 4 Morning 4
#> 11 4 Afternoon 4
#> 12 4 Evening 6
#> 13 5 Morning 3
#> 14 5 Afternoon 7
#> 15 5 Evening 9
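To connect this back to the idea of transformability, here is a small sketch of the round trip: wide to long with pivot_longer, and back to wide with pivot_wider. If no information is lost, the result should match the data we started with. The names long_again and wide_again are only for illustration.

```r
library(tidyr)

wide_data <- data.frame(person = 1:5,
                        Morning = c(1, 3, 2, 4, 3),
                        Afternoon = c(3, 4, 5, 4, 7),
                        Evening = c(7, 8, 7, 6, 9))

# Wide to long: one row per person per time of day
long_again <- pivot_longer(data = wide_data,
                           cols = !person,
                           names_to = "time_of_day",
                           values_to = "counts")

# Long back to wide: one column per time of day
wide_again <- pivot_wider(data = long_again,
                          names_from = time_of_day,
                          values_from = counts)

# Compare with the original (pivot_wider returns a tibble, so convert first)
all.equal(as.data.frame(wide_again), wide_data)
```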
The grim reality is that data that is not made by you could come in a huge number of different formats. And, once you get it into R, you may have to wrangle it into shape before you can proceed to analyze it. In this example, I will show a strange data format, and write some custom code to wrangle it into long-format. This is just to illustrate the general idea that sometimes you may have to do custom data shaping.
Consider the following format. Each subject’s phone checking count is a number separated by commas. The first number is always for morning, the second is for afternoon, and the third for evening. Individual subjects are separated by semi-colons. Thus, the first three numbers are for subject 1, and the next three are for subject 2, and so on. As you can see, all of the data from before is perfectly preserved, all on one line.
the_data<-"1,3,7;3,4,8;2,5,7;4,4,6;3,7,9"
So, we have a custom format above, and now we need to get it into long format. Unfortunately, there are no tidyverse functions made specifically for shaping weird custom data formats. So, someone has to pull some tricks out of their hat.
library(dplyr)
subjects <- unlist(strsplit(the_data, split = ";"))
subjects
#> [1] "1,3,7" "3,4,8" "2,5,7" "4,4,6" "3,7,9"
subjects <- strsplit(subjects,split=",")
subjects
#> [[1]]
#> [1] "1" "3" "7"
#>
#> [[2]]
#> [1] "3" "4" "8"
#>
#> [[3]]
#> [1] "2" "5" "7"
#>
#> [[4]]
#> [1] "4" "4" "6"
#>
#> [[5]]
#> [1] "3" "7" "9"
subjects <- t(data.frame(subjects))
subjects
#> [,1] [,2] [,3]
#> c..1....3....7.. "1" "3" "7"
#> c..3....4....8.. "3" "4" "8"
#> c..2....5....7.. "2" "5" "7"
#> c..4....4....6.. "4" "4" "6"
#> c..3....7....9.. "3" "7" "9"
colnames(subjects) <- c("Morning","Afternoon","Evening")
subjects
#> Morning Afternoon Evening
#> c..1....3....7.. "1" "3" "7"
#> c..3....4....8.. "3" "4" "8"
#> c..2....5....7.. "2" "5" "7"
#> c..4....4....6.. "4" "4" "6"
#> c..3....7....9.. "3" "7" "9"
row.names(subjects) <- 1:5
subjects <- as.data.frame(subjects) %>%
mutate(person=1:5)
pivot_longer(data = subjects,
cols = 1:3,
names_to = "time_of_day",
values_to = "counts")
#> # A tibble: 15 × 3
#> person time_of_day counts
#> <int> <chr> <chr>
#> 1 1 Morning 1
#> 2 1 Afternoon 3
#> 3 1 Evening 7
#> 4 2 Morning 3
#> 5 2 Afternoon 4
#> 6 2 Evening 8
#> 7 3 Morning 2
#> 8 3 Afternoon 5
#> 9 3 Evening 7
#> 10 4 Morning 4
#> 11 4 Afternoon 4
#> 12 4 Evening 6
#> 13 5 Morning 3
#> 14 5 Afternoon 7
#> 15 5 Evening 9
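Notice that the counts column above is character (<chr>), not numeric, because strsplit returns text. Here is one alternative sketch, in base R only, that goes straight to long format and converts the counts to numbers. It assumes every subject has exactly three values in morning, afternoon, evening order; the names per_subject and parsed_long are just for illustration.

```r
the_data <- "1,3,7;3,4,8;2,5,7;4,4,6;3,7,9"

# Split on semi-colons to get subjects, then on commas to get their values
per_subject <- strsplit(strsplit(the_data, split = ";")[[1]], split = ",")

# Assumption: three values per subject (morning, afternoon, evening)
stopifnot(all(lengths(per_subject) == 3))

# unlist() flattens person by person, which matches the labels built below
parsed_long <- data.frame(
  person      = rep(seq_along(per_subject), each = 3),
  time_of_day = rep(c("Morning", "Afternoon", "Evening"),
                    times = length(per_subject)),
  counts      = as.numeric(unlist(per_subject))
)

str(parsed_long)
```

Converting to numeric matters because functions like mean() or t.test() will not treat the text "7" as the number 7.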
The purpose of this section is to focus on the process of creating data-structures in R that have the following properties:
A one-sample t-test involves a vector of means. Here, a vector of means is created by sampling 10 values from a unit normal distribution.
dv <- rnorm(10,0,1)
t.test(dv)
#>
#> One Sample t-test
#>
#> data: dv
#> t = 0.031591, df = 9, p-value = 0.9755
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#> -0.9560349 0.9831151
#> sample estimates:
#> mean of x
#> 0.01354013
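Because rnorm draws new random numbers every time, your output will not match the numbers printed above. Here is a small sketch showing two things: set.seed makes a simulation repeatable, and the t value reported by t.test is the sample mean (minus the null value of 0) divided by the estimated standard error. The seed value is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the same numbers come out each run
dv <- rnorm(10, 0, 1)

# t = (sample mean - null value) / (sample sd / sqrt(n))
t_by_hand <- (mean(dv) - 0) / (sd(dv) / sqrt(length(dv)))

# The same statistic as computed by t.test
t_from_r <- unname(t.test(dv)$statistic)

all.equal(t_by_hand, t_from_r)
```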
Consider a design with 50 participants. Each participant takes a TRUE/FALSE quiz with 10 questions. A researcher wants to apply a one-sample t-test to test whether the participants performed better than chance.
Create example raw data that represents each subject’s answer to each question
There are 50 participants x 10 questions, so there must be 500 cells
I sample 1s and 0s from a binomial to indicate correct vs. incorrect on each question
Create a summary vector of means suitable for the t.test function
Run the t.test
subject_data <- matrix( rbinom(50*10,1,.5), ncol=10, nrow=50)
subject_means <- rowMeans(subject_data)
t.test(subject_means, mu=.5)
#>
#> One Sample t-test
#>
#> data: subject_means
#> t = 1.8518, df = 49, p-value = 0.07009
#> alternative hypothesis: true mean is not equal to 0.5
#> 95 percent confidence interval:
#> 0.4962511 0.5917489
#> sample estimates:
#> mean of x
#> 0.544
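The matrix above is a wide representation (one row per subject, one column per question). As a sketch of the same design in long format, here is one way it could be built with expand.grid and summarized with dplyr. The names quiz_long and quiz_means are made up for this illustration, and the seed is arbitrary.

```r
library(dplyr)

set.seed(2)  # arbitrary seed for repeatability

# One row per subject per question: 50 x 10 = 500 rows
quiz_long <- expand.grid(question = 1:10, subject = 1:50)
quiz_long$correct <- rbinom(nrow(quiz_long), 1, .5)

# Proportion correct for each subject
quiz_means <- quiz_long %>%
  group_by(subject) %>%
  summarize(prop_correct = mean(correct), .groups = "drop")

# Same test as before, now on the summarized long data
t.test(quiz_means$prop_correct, mu = .5)
```

If the research question is strictly directional (better than chance, rather than different from chance), t.test also has an alternative argument that can be set to "greater".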
Consider a design measuring fluctuations in weight as a function of weekday vs. weekend. Researchers have 25 people weigh themselves 5 times throughout the day on Wednesday, and 5 times throughout the day on Sunday. Create a data frame that represents this situation, and conduct a paired sample t-test.
To break this down, we will create a long data.frame with four columns: subject number, day, measurement number, and weight. How many rows must there be? There are 25 people, 5 measurements per day, and two days of measurements. In long-format, there is only one measure per row. Therefore, there are 25 x 5 x 2 = 250 rows.
I repeat each number from 1 to 25, 10 times each.
subject_number <- rep(1:25, each=10)
The day column has two levels, Wednesday vs. Sunday. Each level has to appear 5 times for each subject.
#day <- rep(c("Wednesday","Sunday"), each = 5) # makes one subject
day <- rep(rep(c("Wednesday","Sunday"), each = 5), 25)
We need a variable to represent each of the five measurements that are taken per day. Let’s call this measurement_number
measurement_number <- rep(1:5, 2*25)
We need some pretend measurements. For now, let’s just choose some random numbers from a normal distribution. We need 250 numbers.
weights <- rnorm(250, 100, 25)
Next, let’s combine all of these vectors into a data.frame
weight_data <- data.frame(subject_number,
day,
measurement_number,
weights)
head(weight_data)
#> subject_number day measurement_number weights
#> 1 1 Wednesday 1 76.33381
#> 2 1 Wednesday 2 90.21276
#> 3 1 Wednesday 3 114.99555
#> 4 1 Wednesday 4 94.17472
#> 5 1 Wednesday 5 76.91866
#> 6 1 Sunday 1 128.93150
Note, we could have defined everything inside a single data.frame:
weight_data <- data.frame(subject_number = rep(1:25, each=10),
day = rep(rep(c("Wednesday","Sunday"), each = 5), 25),
measurement_number = rep(1:5, 2*25),
weights = rnorm(250, 100, 25))
The data.frame weight_data now represents the complete shape of the data implied by the research design. However, this data is not yet ready for the t.test function. This is because the t.test function assumes that inputs will be means for each participant in each condition. So, the raw data must be summarized first. In other words, we must find the mean weight within each day that each participant was measured. We will continue to use the dplyr syntax to group and summarize data.
subject_means <- weight_data %>%
group_by(subject_number,day) %>%
summarize(mean_weight = mean(weights), .groups = "drop")
head(subject_means)
#> # A tibble: 6 × 3
#> subject_number day mean_weight
#> <int> <chr> <dbl>
#> 1 1 Sunday 97.2
#> 2 1 Wednesday 104.
#> 3 2 Sunday 102.
#> 4 2 Wednesday 100.
#> 5 3 Sunday 85.7
#> 6 3 Wednesday 83.7
Finally, we can run the t.test
t.test(mean_weight~day, paired=TRUE, data=subject_means)
#>
#> Paired t-test
#>
#> data: mean_weight by day
#> t = 3.6881, df = 24, p-value = 0.001154
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> 4.789947 16.963559
#> sample estimates:
#> mean of the differences
#> 10.87675
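A caution about the call above: depending on your version of R, the formula interface may refuse the paired argument (my understanding is that R 4.4.0 and later signal an error for this). Here is a sketch of an alternative that should not depend on that: pivot the subject means to wide format so each person has a Sunday column and a Wednesday column, then pass the two columns directly. The data is re-simulated here so the block runs on its own, and the name subject_wide is just for illustration.

```r
library(dplyr)
library(tidyr)

set.seed(3)  # arbitrary seed for repeatability
weight_data <- data.frame(subject_number = rep(1:25, each = 10),
                          day = rep(rep(c("Wednesday", "Sunday"), each = 5), 25),
                          measurement_number = rep(1:5, 2 * 25),
                          weights = rnorm(250, 100, 25))

# One row per subject, one column of mean weights per day
subject_wide <- weight_data %>%
  group_by(subject_number, day) %>%
  summarize(mean_weight = mean(weights), .groups = "drop") %>%
  pivot_wider(names_from = day, values_from = mean_weight)

# Paired test on the two columns; pairing is explicit because each row is one person
paired_out <- t.test(subject_wide$Sunday, subject_wide$Wednesday, paired = TRUE)
paired_out
```

This is also a nice example of why reshaping matters: in the wide version the pairing is carried by the rows themselves, so it does not depend on how the long data happened to be sorted.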
A researcher gives 10 subjects a recall memory test. They all read 50 words for a later memory test. After a short break, half of the participants are put in a noisy room, and the other half are put in a quiet room. They are all given a piece of paper with 50 lines and asked to write down as many words as they can remember. The raw data is coded as 1s or 0s, with 1 representing a correctly recalled word and 0 representing a word that was not correctly recalled. The researcher wants to do a t-test on the number of correctly recalled words in the noisy vs. quiet room.
Note, I switch to using a tibble here instead of a data.frame.
subjects <- rep(1:10, each = 50)
room <- rep(c("Noisy","Quiet"), each = 50*5)
words <- rep(1:50, 10)
correct <- rbinom(500,1,.5)
recall_data <- tibble(subjects,
room,
words,
correct)
recall_data
#> # A tibble: 500 × 4
#> subjects room words correct
#> <int> <chr> <int> <int>
#> 1 1 Noisy 1 0
#> 2 1 Noisy 2 1
#> 3 1 Noisy 3 0
#> 4 1 Noisy 4 0
#> 5 1 Noisy 5 1
#> 6 1 Noisy 6 1
#> 7 1 Noisy 7 0
#> 8 1 Noisy 8 1
#> 9 1 Noisy 9 1
#> 10 1 Noisy 10 1
#> # … with 490 more rows
count_data <- recall_data %>%
group_by(subjects,room) %>%
summarize(number_correct = sum(correct), .groups="drop")
count_data
#> # A tibble: 10 × 3
#> subjects room number_correct
#> <int> <chr> <int>
#> 1 1 Noisy 23
#> 2 2 Noisy 28
#> 3 3 Noisy 22
#> 4 4 Noisy 31
#> 5 5 Noisy 28
#> 6 6 Quiet 26
#> 7 7 Quiet 28
#> 8 8 Quiet 25
#> 9 9 Quiet 25
#> 10 10 Quiet 30
t.test(number_correct~room, var.equal=TRUE, data=count_data)
#>
#> Two Sample t-test
#>
#> data: number_correct by room
#> t = -0.2052, df = 8, p-value = 0.8425
#> alternative hypothesis: true difference in means between group Noisy and group Quiet is not equal to 0
#> 95 percent confidence interval:
#> -4.89523 4.09523
#> sample estimates:
#> mean in group Noisy mean in group Quiet
#> 26.4 26.8
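In the simulation above both rooms use the same recall probability (.5), so any difference is due to chance. As a sketch of how you could try out the planned analysis when there is a real difference, here the two rooms get different probabilities. The values .4 and .6 are hypothetical numbers I picked for illustration, as are the object names.

```r
library(dplyr)

set.seed(4)  # arbitrary seed for repeatability

# Hypothetical recall probabilities for each room
p_noisy <- .4
p_quiet <- .6

recall_effect <- tibble(
  subjects = rep(1:10, each = 50),
  room     = rep(c("Noisy", "Quiet"), each = 50 * 5),
  words    = rep(1:50, 10),
  # rbinom accepts a vector of probabilities, one per simulated word
  correct  = rbinom(500, 1, rep(c(p_noisy, p_quiet), each = 50 * 5))
)

count_effect <- recall_effect %>%
  group_by(subjects, room) %>%
  summarize(number_correct = sum(correct), .groups = "drop")

t.test(number_correct ~ room, var.equal = TRUE, data = count_effect)
```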
100 people write down their height in centimeters, and the day of the month they were born. Conduct a linear regression to see if day of month explains variation in height.
people <- tibble(height = rnorm(100, 90, 10),
day = sample(1:31, 100, replace=TRUE))
people
#> # A tibble: 100 × 2
#> height day
#> <dbl> <int>
#> 1 105. 7
#> 2 83.4 18
#> 3 90.5 30
#> 4 84.4 31
#> 5 102. 31
#> 6 93.8 31
#> 7 75.4 18
#> 8 76.4 14
#> 9 88.9 14
#> 10 86.3 2
#> # … with 90 more rows
lm.out <- lm(height~day, data= people)
lm.out
#>
#> Call:
#> lm(formula = height ~ day, data = people)
#>
#> Coefficients:
#> (Intercept) day
#> 89.02527 -0.04441
summary(lm.out)
#>
#> Call:
#> lm(formula = height ~ day, data = people)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -25.7829 -5.3427 0.4639 6.5575 18.5019
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 89.02527 1.81492 49.052 <2e-16 ***
#> day -0.04441 0.10074 -0.441 0.66
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 9.061 on 98 degrees of freedom
#> Multiple R-squared: 0.001979, Adjusted R-squared: -0.008205
#> F-statistic: 0.1943 on 1 and 98 DF, p-value: 0.6603
We haven’t yet covered one-way ANOVA in this course. You may already be familiar with ANOVA. For now, we can think of it as a kind of extension of t-tests to deal with single-factor designs with more than two levels. For example, consider extending the paired t-test example above. In that example, 25 people weighed themselves 5 times throughout the day on Wednesday, and 5 times throughout the day on Sunday. The independent variable is day, and it has two levels (Wednesday vs. Sunday). Why should we stop at Wednesday vs. Sunday? There are five other days of the week. Maybe each day has its own influence on weight.
Consider simulating data for a one-factor design with 7 levels, one for each day of the week. N=25, and each person measures themselves 5 times each day.
weight_data <- tibble(subject_number = rep(1:25, each=5*7),
day = rep(rep(c("S","M","T","W","Th","F","Sa"),
each = 5), 25),
measurement_number = rep(1:5, 7*25),
weights = rnorm(25*5*7, 100, 25))
As we will see in later labs on ANOVA, the analysis can be performed in one line with the aov function.
subject_means <- weight_data %>%
group_by(subject_number,day) %>%
summarize(mean_weight = mean(weights), .groups="drop")
subject_means
#> # A tibble: 175 × 3
#> subject_number day mean_weight
#> <int> <chr> <dbl>
#> 1 1 F 94.0
#> 2 1 M 102.
#> 3 1 S 125.
#> 4 1 Sa 107.
#> 5 1 T 104.
#> 6 1 Th 103.
#> 7 1 W 97.2
#> 8 2 F 102.
#> 9 2 M 101.
#> 10 2 S 92.1
#> # … with 165 more rows
aov.out <- aov(mean_weight ~ day, data = subject_means)
summary(aov.out)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> day 6 704 117.4 0.955 0.457
#> Residuals 168 20642 122.9
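A side note on the day column: because it is stored as text, the summary table above lists the days alphabetically (F, M, S, ...) rather than in calendar order. Here is a small sketch of how a factor with explicit levels could be used to control that order. The vector names are only for illustration.

```r
day_labels <- c("S", "M", "T", "W", "Th", "F", "Sa")

# A character vector sorts alphabetically
day_chr <- rep(day_labels, times = 2)
sort(unique(day_chr))

# A factor keeps whatever level order we give it
day_fct <- factor(day_chr, levels = day_labels)
levels(day_fct)
```

The level order does not change the ANOVA F test, but it does change how tables and plots are arranged, and which day R treats as the reference level when the same model is run with lm.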
And, as we will also learn in class, ANOVA and Linear Regression are fundamentally the same analysis, so we could also use the lm function and treat the analysis as a regression.
subject_means <- weight_data %>%
group_by(subject_number,day) %>%
summarize(mean_weight = mean(weights), .groups="drop")
subject_means
#> # A tibble: 175 × 3
#> subject_number day mean_weight
#> <int> <chr> <dbl>
#> 1 1 F 94.0
#> 2 1 M 102.
#> 3 1 S 125.
#> 4 1 Sa 107.
#> 5 1 T 104.
#> 6 1 Th 103.
#> 7 1 W 97.2
#> 8 2 F 102.
#> 9 2 M 101.
#> 10 2 S 92.1
#> # … with 165 more rows
lm.out <- lm(mean_weight ~ day, data = subject_means)
summary(lm.out)
#>
#> Call:
#> lm(formula = mean_weight ~ day, data = subject_means)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -28.260 -6.852 -1.305 6.015 33.166
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 101.7744 2.2169 45.908 <2e-16 ***
#> dayM -1.7951 3.1352 -0.573 0.568
#> dayS -3.1560 3.1352 -1.007 0.316
#> daySa -5.1739 3.1352 -1.650 0.101
#> dayT -0.5008 3.1352 -0.160 0.873
#> dayTh 0.6861 3.1352 0.219 0.827
#> dayW -3.8070 3.1352 -1.214 0.226
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 11.08 on 168 degrees of freedom
#> Multiple R-squared: 0.03299, Adjusted R-squared: -0.001542
#> F-statistic: 0.9553 on 6 and 168 DF, p-value: 0.4573
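If you compare the two outputs, the F value from aov and the F-statistic at the bottom of the lm summary are the same number. Here is a sketch that checks this directly on freshly simulated subject means. The data frame sim_means and the numbers used to simulate it are made up for this illustration.

```r
set.seed(5)  # arbitrary seed for repeatability

# Simulated subject means: 25 subjects x 7 days
sim_means <- data.frame(
  subject_number = rep(1:25, each = 7),
  day            = rep(c("S", "M", "T", "W", "Th", "F", "Sa"), 25),
  mean_weight    = rnorm(25 * 7, 100, 10)
)

# F for the day effect from the ANOVA table
aov_F <- summary(aov(mean_weight ~ day, data = sim_means))[[1]][["F value"]][1]

# Overall F-statistic from the regression summary
lm_F <- unname(summary(lm(mean_weight ~ day, data = sim_means))$fstatistic["value"])

all.equal(aov_F, lm_F)
```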
The one-way ANOVA extends the t-test design in terms of the number of levels of a single factor. It is also possible to run experiments with multiple independent variables, that each have multiple levels.
For example, let’s continue with the above design and make some minor changes so it becomes a factorial 7x2 design. A 7x2 design has two independent variables. The first one has 7 levels, the second one has 2 levels.
We modify the example to include a time of day factor. For example, we will have people measure their weight two times in the morning, and two times in the evening. Thus, there will be two independent variables: day (S, M, T, W, Th, F, Sa) and time of day (morning, evening). We will keep the number of subjects at 25.
weight_data <- tibble(subject_number = rep(1:25, each=4*7),
day = rep(rep(c("S","M","T","W","Th","F","Sa"),
each = 4), 25),
time_of_day = rep(c("Morning","Morning",
"Evening","Evening"),7*25),
measurement_number = rep(rep(1:2, 2), 7*25),
weights = rnorm(25*4*7, 100, 25))
subject_means <- weight_data %>%
group_by(subject_number,day, time_of_day) %>%
summarize(mean_weight = mean(weights), .groups="drop")
subject_means
#> # A tibble: 350 × 4
#> subject_number day time_of_day mean_weight
#> <int> <chr> <chr> <dbl>
#> 1 1 F Evening 83.7
#> 2 1 F Morning 99.5
#> 3 1 M Evening 98.8
#> 4 1 M Morning 107.
#> 5 1 S Evening 95.0
#> 6 1 S Morning 77.7
#> 7 1 Sa Evening 110.
#> 8 1 Sa Morning 95.3
#> 9 1 T Evening 73.4
#> 10 1 T Morning 99.9
#> # … with 340 more rows
aov.out <- aov(mean_weight ~ day*time_of_day, data = subject_means)
summary(aov.out)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> day 6 1497 249.58 0.814 0.560
#> time_of_day 1 3 3.01 0.010 0.921
#> day:time_of_day 6 1446 240.97 0.786 0.582
#> Residuals 336 103049 306.69
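With several nested rep() calls it is easy to get a label column out of step with the others. As a sketch of an alternative, expand.grid can generate every combination of the design for you, and dplyr's count can confirm that each cell has the number of measurements you intended. The name design is made up for this illustration.

```r
library(dplyr)

set.seed(6)  # arbitrary seed for repeatability

# Every combination of subject x day x time of day x measurement
design <- expand.grid(
  measurement_number = 1:2,
  time_of_day        = c("Morning", "Evening"),
  day                = c("S", "M", "T", "W", "Th", "F", "Sa"),
  subject_number     = 1:25,
  stringsAsFactors   = FALSE
)
design$weights <- rnorm(nrow(design), 100, 25)

nrow(design)  # 2 x 2 x 7 x 25 = 700 rows

# Each subject x day x time-of-day cell should hold exactly 2 measurements
cell_counts <- design %>%
  count(subject_number, day, time_of_day)
all(cell_counts$n == 2)
```

The same kind of check works on data built with rep(): if any cell count is off, the labels are not lined up the way the design says they should be.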
Just like we can treat a one-way ANOVA as a regression, we can also treat a Factorial ANOVA as a multiple regression:
subject_means <- weight_data %>%
group_by(subject_number,day, time_of_day) %>%
summarize(mean_weight = mean(weights), .groups="drop")
subject_means$day <-as.factor(subject_means$day)
subject_means$time_of_day <-as.factor(subject_means$time_of_day)
subject_means
#> # A tibble: 350 × 4
#> subject_number day time_of_day mean_weight
#> <int> <fct> <fct> <dbl>
#> 1 1 F Evening 83.7
#> 2 1 F Morning 99.5
#> 3 1 M Evening 98.8
#> 4 1 M Morning 107.
#> 5 1 S Evening 95.0
#> 6 1 S Morning 77.7
#> 7 1 Sa Evening 110.
#> 8 1 Sa Morning 95.3
#> 9 1 T Evening 73.4
#> 10 1 T Morning 99.9
#> # … with 340 more rows
lm.out <- lm(mean_weight ~ day*time_of_day, data = subject_means)
summary(lm.out)
#>
#> Call:
#> lm(formula = mean_weight ~ day * time_of_day, data = subject_means)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -45.420 -10.227 -0.060 9.608 56.573
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 99.14447 3.50253 28.307 <2e-16 ***
#> dayM -4.28104 4.95332 -0.864 0.388
#> dayS -0.89345 4.95332 -0.180 0.857
#> daySa 0.87252 4.95332 0.176 0.860
#> dayT 0.13216 4.95332 0.027 0.979
#> dayTh -1.09450 4.95332 -0.221 0.825
#> dayW 0.04107 4.95332 0.008 0.993
#> time_of_dayMorning -4.56384 4.95332 -0.921 0.358
#> dayM:time_of_dayMorning 8.00406 7.00505 1.143 0.254
#> dayS:time_of_dayMorning -0.30700 7.00505 -0.044 0.965
#> daySa:time_of_dayMorning 1.61384 7.00505 0.230 0.818
#> dayT:time_of_dayMorning 3.35989 7.00505 0.480 0.632
#> dayTh:time_of_dayMorning 6.72953 7.00505 0.961 0.337
#> dayW:time_of_dayMorning 11.24792 7.00505 1.606 0.109
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 17.51 on 336 degrees of freedom
#> Multiple R-squared: 0.0278, Adjusted R-squared: -0.009818
#> F-statistic: 0.739 on 13 and 336 DF, p-value: 0.7242
anova(lm.out)
#> Analysis of Variance Table
#>
#> Response: mean_weight
#> Df Sum Sq Mean Sq F value Pr(>F)
#> day 6 1497 249.579 0.8138 0.5598
#> time_of_day 1 3 3.011 0.0098 0.9211
#> day:time_of_day 6 1446 240.973 0.7857 0.5816
#> Residuals 336 103049 306.692
NOTE: For Spring 2022, follow the video in the semester-long project tab for setting up your new R project and GitHub repo. You will be making an .Rmd, adding it to your vignettes folder, and then displaying your work on your pkgdown website.
Your assignment instructions are the following:
Run pkgdown::build_site() so that your lab work is displayed on your website.
The Lab1_data.xlsx data file contains fake data for a 2x3x2 repeated measures design, for 10 participants. The data is in wide format. Here is the link: https://github.com/CrumpLab/rstatsmethods/raw/master/vignettes/Stats2/Lab1_data.xlsx
Your task is to convert the data to long format, and store the long-format data in a data.frame or tibble. Print out some of the long-form data in your lab1.Rmd, to show that you did make the appropriate conversion. For extra fun, show two different ways to solve the problem.
If you need to modify the Excel file by hand to help you solve the problem, that is OK; just make a note of it in your lab work.