M09-1-Principle: Data Import and Export
Note: Lecture note is available at my quarto-pub site
1 Overview
1.1 Expected Learning Outcomes
- Explain how to create a GitHub repository and collaborate with others on the same R project.
- Effectively load and look through built-in datasets in R.
- Import various types of data (csv, xlsx, and SPSS) into RStudio.
- Scrape data from the web using SelectorGadget, rvest, and the inspect function of web browsers.
- Import multiple data sets and work with them.
- Work with labelled data in R.
- Export/save output data to a local PC and push it to GitHub.
1.2 The textbook chapters to cover:
- Ch07: Data Import
- Ch20: Spreadsheets
- Ch24: Web Scraping
- Ch18: Missing values
2 Installing and loading up R Packages
3 Reading built-in Data
4 Importing Data
4.1 csv files
- You should have downloaded the data from the web: https://github.com/jaejungca/R-data-import-export
- Make sure you have the data in the same directory as the qmd file or specify the directory of the data.
4.1.1 Using base R
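A minimal sketch of the base R route, assuming a small toy file written to a temporary path instead of the course's data/StudentsPerformance.csv (the file contents and the object names `stu_base` and `stu_base_raw` are illustrative only). It shows that read.csv() returns a plain data.frame and, by default, rewrites column names that contain spaces.

```r
# Toy file standing in for data/StudentsPerformance.csv (hypothetical contents)
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("gender,math score,reading score",
             "female,72,72",
             "male,47,57"), tmp_csv)

# Default behaviour: names are converted to syntactically valid R names
stu_base <- read.csv(tmp_csv)
names(stu_base)      # "gender" "math.score" "reading.score"

# check.names = FALSE keeps the original names, spaces included
stu_base_raw <- read.csv(tmp_csv, check.names = FALSE)
names(stu_base_raw)  # "gender" "math score" "reading score"

class(stu_base)      # "data.frame" (not a tibble)
```

With the original names kept, columns must be quoted with backticks, for example stu_base_raw$`math score`.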
4.1.2 Using readr package
```{r readr package}
stu_per_readr <- read_csv("data/StudentsPerformance.csv") # from now on, prefer read_csv() over read.csv()
spec(stu_per_readr) # column specification that readr guessed
class(stu_per_readr) # a tibble, not a plain data.frame
head(stu_per_readr)
skimr::skim(stu_per_readr) # richer overview than summary()
stu_per_readr$`math score` # backticks, as inserted by RStudio's autocomplete
stu_per_readr$"math score" # same outcome
stu_per_readr$'math score' # same outcome
```
cols(
gender = col_character(),
`race/ethnicity` = col_character(),
`parental level of education` = col_character(),
lunch = col_character(),
`test preparation course` = col_character(),
`math score` = col_double(),
`reading score` = col_double(),
`writing score` = col_double()
)
[1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
Name | stu_per_readr |
Number of rows | 1000 |
Number of columns | 8 |
_______________________ | |
Column type frequency: | |
character | 5 |
numeric | 3 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
gender | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
race/ethnicity | 0 | 1 | 7 | 7 | 0 | 5 | 0 |
parental level of education | 0 | 1 | 11 | 18 | 0 | 6 | 0 |
lunch | 0 | 1 | 8 | 12 | 0 | 2 | 0 |
test preparation course | 0 | 1 | 4 | 9 | 0 | 2 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
math score | 0 | 1 | 66.09 | 15.16 | 0 | 57.00 | 66 | 77 | 100 | ▁▁▅▇▃ |
reading score | 0 | 1 | 69.17 | 14.60 | 17 | 59.00 | 70 | 79 | 100 | ▁▂▆▇▃ |
writing score | 0 | 1 | 68.05 | 15.20 | 10 | 57.75 | 69 | 79 | 100 | ▁▂▅▇▃ |
[1] 72 69 90 47 76 71 88 40 64 38 58 40 65 78 50 69 88 18
[19] 46 54 66 65 44 69 74 73 69 67 70 62 69 63 56 40 97 81
[37] 74 50 75 57 55 58 53 59 50 65 55 66 57 82 53 77 53 88
[55] 71 33 82 52 58 0 79 39 62 69 59 67 45 60 61 39 58 63
[73] 41 61 49 44 30 80 61 62 47 49 50 72 42 73 76 71 58 73
[91] 65 27 71 43 79 78 65 63 58 65 79 68 85 60 98 58 87 66
[109] 52 70 77 62 54 51 99 84 75 78 51 55 79 91 88 63 83 87
[127] 72 65 82 51 89 53 87 75 74 58 51 70 59 71 76 59 42 57
[145] 88 22 88 73 68 100 62 77 59 54 62 70 66 60 61 66 82 75
[163] 49 52 81 96 53 58 68 67 72 94 79 63 43 81 46 71 52 97
[181] 62 46 50 65 45 65 80 62 48 77 66 76 62 77 69 61 59 55
[199] 45 78 67 65 69 57 59 74 82 81 74 58 80 35 42 60 87 84
[217] 83 34 66 61 56 87 55 86 52 45 72 57 68 88 76 46 67 92
[235] 83 80 63 64 54 84 73 80 56 59 75 85 89 58 65 68 47 71
[253] 60 80 54 62 64 78 70 65 64 79 44 99 76 59 63 69 88 71
[271] 69 58 47 65 88 83 85 59 65 73 53 45 73 70 37 81 97 67
[289] 88 77 76 86 63 65 78 67 46 71 40 90 81 56 67 80 74 69
[307] 99 51 53 49 73 66 67 68 59 71 77 83 63 56 67 75 71 43
[325] 41 82 61 28 82 41 71 47 62 90 83 61 76 49 24 35 58 61
[343] 69 67 79 72 62 77 75 87 52 66 63 46 59 61 63 42 59 80
[361] 58 85 52 27 59 49 69 61 44 73 84 45 74 82 59 46 80 85
[379] 71 66 80 87 79 38 38 67 64 57 62 73 73 77 76 57 65 48
[397] 50 85 74 60 59 53 49 88 54 63 65 82 52 87 70 84 71 63
[415] 51 84 71 74 68 57 82 57 47 59 41 62 86 69 65 68 64 61
[433] 61 47 73 50 75 75 70 89 67 78 59 73 79 67 69 86 47 81
[451] 64 100 65 65 53 37 79 53 100 72 53 54 71 77 75 84 26 72
[469] 77 91 83 63 68 59 90 71 76 80 55 76 73 52 68 59 49 70
[487] 61 60 64 79 65 64 83 81 54 68 54 59 66 76 74 94 63 95
[505] 40 82 68 55 79 86 76 64 62 54 77 76 74 66 66 67 71 91
[523] 69 54 53 68 56 36 29 62 68 47 62 79 73 66 51 51 85 97
[541] 75 79 81 82 64 78 92 72 62 79 79 87 40 77 53 32 55 61
[559] 53 73 74 63 96 63 48 48 92 61 63 68 71 91 53 50 74 40
[577] 61 81 48 53 81 77 63 73 69 65 55 44 54 48 58 71 68 74
[595] 92 56 30 53 69 65 54 29 76 60 84 75 85 40 61 58 69 58
[613] 94 65 82 60 37 88 95 65 35 62 58 100 61 100 69 61 49 44
[631] 67 79 66 75 84 71 67 80 86 76 41 74 72 74 70 65 59 64
[649] 50 69 51 68 85 65 73 62 77 69 43 90 74 73 55 65 80 50
[667] 63 77 73 81 66 52 69 65 69 50 73 70 81 63 67 60 62 29
[685] 62 94 85 77 53 93 49 73 66 77 49 79 75 59 57 66 79 57
[703] 87 63 59 62 46 66 89 42 93 80 98 81 60 76 73 96 76 91
[721] 62 55 74 50 47 81 65 68 73 53 68 55 87 55 53 67 92 53
[739] 81 61 80 37 81 59 55 72 69 69 50 87 71 68 79 77 58 84
[757] 55 70 52 69 53 48 78 62 60 74 58 76 68 58 52 75 52 62
[775] 66 49 66 35 72 94 46 77 76 52 91 32 72 19 68 52 48 60
[793] 66 89 42 57 70 70 69 52 67 76 87 82 73 75 64 41 90 59
[811] 51 45 54 87 72 94 45 61 60 77 85 78 49 71 48 62 56 65
[829] 69 68 61 74 64 77 58 60 73 75 58 66 39 64 23 74 40 90
[847] 91 64 59 80 71 61 87 82 62 97 75 65 52 87 53 81 39 71
[865] 97 82 59 61 78 49 59 70 82 90 43 80 81 57 59 64 63 71
[883] 64 55 51 62 93 54 69 44 86 85 50 88 59 32 36 63 67 65
[901] 85 73 34 93 67 88 57 79 67 70 50 69 52 47 46 68 100 44
[919] 57 91 69 35 72 54 74 74 64 65 46 48 67 62 61 70 98 70
[937] 67 57 85 77 72 78 81 61 58 54 82 49 49 57 94 75 74 58
[955] 62 72 84 92 45 75 56 48 100 65 72 62 66 63 68 75 89 78
[973] 53 49 54 64 60 62 55 91 8 81 79 78 74 57 40 81 44 67
[991] 86 65 55 62 63 88 62 59 68 77
4.2 URL
4.3 Importing Excel data with readxl package
```{r readxl}
library(readxl)
sp_excel <- read_excel("data/StudentsPerformance.xlsx")
#?read_excel
sp_excel
skimr::skim(sp_excel)
```
Name | sp_excel |
Number of rows | 1000 |
Number of columns | 8 |
_______________________ | |
Column type frequency: | |
character | 5 |
numeric | 3 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
gender | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
race/ethnicity | 0 | 1 | 7 | 7 | 0 | 5 | 0 |
parental level of education | 0 | 1 | 11 | 18 | 0 | 6 | 0 |
lunch | 0 | 1 | 8 | 12 | 0 | 2 | 0 |
test preparation course | 0 | 1 | 4 | 9 | 0 | 2 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
math score | 0 | 1 | 66.09 | 15.16 | 0 | 57.00 | 66 | 77 | 100 | ▁▁▅▇▃ |
reading score | 0 | 1 | 69.17 | 14.60 | 17 | 59.00 | 70 | 79 | 100 | ▁▂▆▇▃ |
writing score | 0 | 1 | 68.05 | 15.20 | 10 | 57.75 | 69 | 79 | 100 | ▁▂▅▇▃ |
4.4 SPSS data with haven package
- labelled data
- SPSS: read_sav()
- SAS: read_sas()
- Stata: read_stata()
- SPSS:
```{r haven}
library(haven)
demo <- read_sav("data/demo.sav") # from the haven package; reads the data as a tibble
class(demo)
typeof(demo)
demo
sjPlot::view_df(demo) # view_df() comes from the sjPlot package, not haven
```
[1] "tbl_df" "tbl" "data.frame"
[1] "list"
ID | Name | Label | Values | Value Labels |
---|---|---|---|---|
1 | age | Age in years | range: 18-77 | |
2 | marital | Marital status | 0, 1 | Unmarried; Married |
3 | address | Years at current address | range: 0-56 | |
4 | income | Household income in thousands | range: 9-1116 | |
5 | inccat | Income category in thousands | 1, 2, 3, 4 | Under $25; $25 - $49; $50 - $74; $75+ |
6 | car | Price of primary vehicle | range: 4.2-99.9 | |
7 | carcat | Primary vehicle price category | 1, 2, 3 | Economy; Standard; Luxury |
8 | ed | Level of education | 1, 2, 3, 4, 5 | Did not complete high school; High school degree; Some college; College degree; Post-undergraduate degree |
9 | employ | Years with current employer | range: 0-57 | |
10 | retire | Retired | 0, 1 | No; Yes |
11 | empcat | Years with current employer | 1, 2, 3 | Less than 5; 5 to 15; More than 15 |
12 | jobsat | Job satisfaction | 1, 2, 3, 4, 5 | Highly dissatisfied; Somewhat dissatisfied; Neutral; Somewhat satisfied; Highly satisfied |
13 | gender | Gender | f, m | <output omitted> |
14 | reside | Number of people in household | range: 1-9 | |
15 | wireless | Wireless service | 0, 1 | No; Yes |
16 | multline | Multiple lines | 0, 1 | No; Yes |
17 | voice | Voice mail | 0, 1 | No; Yes |
18 | pager | Paging service | 0, 1 | No; Yes |
19 | internet | Internet | 0, 1, 8, 9 | No; Yes; Does not know; No Answer |
20 | callid | Caller ID | 0, 1 | No; Yes |
21 | callwait | Call waiting | 0, 1 | No; Yes |
22 | owntv | Owns TV | 0, 1 | No; Yes |
23 | ownvcr | Owns VCR | 0, 1 | No; Yes |
24 | owncd | Owns stereo/CD player | 0, 1 | No; Yes |
25 | ownpda | Owns PDA | 0, 1 | No; Yes |
26 | ownpc | Owns computer | 0, 1 | No; Yes |
27 | ownfax | Owns fax machine | 0, 1 | No; Yes |
28 | news | Newspaper subscription | 0, 1 | Yes; No |
29 | response | Response | 0, 1 | Yes; No |
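The value labels shown in the codebook are stored as attributes on each column. As a rough sketch of how labelled data can be handled, the example below builds a toy labelled vector that mimics the `marital` column (the vector itself is made up for illustration, not read from demo.sav) and converts it with haven's as_factor() and zap_labels().

```r
library(haven)

# Toy labelled vector mimicking the `marital` column (hypothetical values)
marital <- labelled(
  c(0, 1, 1, 0),
  labels = c(Unmarried = 0, Married = 1),
  label = "Marital status"
)

marital_fct <- as_factor(marital)   # value labels become factor levels
marital_num <- zap_labels(marital)  # drop value labels, keep the numeric codes

levels(marital_fct)     # "Unmarried" "Married"
attr(marital, "label")  # "Marital status"
```

Converting to a factor is usually the more convenient form for plotting and modelling, while the numeric codes are closer to what SPSS stores.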
4.5 Import from clipboard
Go to the following stock price site and copy the data to the clipboard.
4.5.1 For Windows users
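```{r read.table}
#| error: true
stocks <- read.delim("clipboard") # base R: reads tab-delimited text from the Windows clipboard
head(stocks)
tail(stocks)
#?read.delim # read.delim() is a wrapper around read.table() in the utils package
readr::read_delim("clipboard") # errors: readr treats "clipboard" as a file path; use readr::read_delim(readr::clipboard()) instead
```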
Error: 'clipboard' does not exist in current working directory ('C:/Users/jmjung/OneDrive - Cal Poly Pomona/Documents/Work/Jaemin/4. Teaching/DWV101/Course Content/M09-Import_Export/Principles').
4.5.2 For Mac OS users
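On macOS the special file name "clipboard" is not available, but the system command pbpaste prints the clipboard contents, so it can be read through a pipe() connection. The helper below is a hypothetical convenience wrapper (the name `read_clipboard` is not from the course material) that sketches one way to cover both systems.

```r
# Hypothetical helper: choose a clipboard source based on the operating system
read_clipboard <- function(...) {
  if (Sys.info()[["sysname"]] == "Darwin") {
    # macOS: pbpaste writes the clipboard contents to standard output
    read.delim(pipe("pbpaste"), ...)
  } else {
    # Windows: "clipboard" is a special file name understood by base R
    read.delim("clipboard", ...)
  }
}

# Run interactively after copying a table to the clipboard:
# stocks <- read_clipboard()
```

Because the result depends on whatever is on the clipboard at the time, this kind of import is best kept for interactive work rather than rendered documents.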
5 Data Cleaning & Exploratory Data Analysis
5.1 Cleaning
```{r cleaning stocks}
# examine and clean data
head(stocks)
class(stocks)
stocks <- as_tibble(stocks)
head(stocks)
glimpse(stocks)
skimr::skim(stocks)
head(stocks)
colnames(stocks)[5:6] <- c("Close", "Adj_Close")
head(stocks)
# clean names
library(janitor)
stocks <- stocks |>
  clean_names()
stocks
```
[1] "data.frame"
Rows: 0
Columns: 1
$ M09.1.Principle.Data_Import_Export.qmd <lgl>
Name | stocks |
Number of rows | 0 |
Number of columns | 1 |
_______________________ | |
Column type frequency: | |
logical | 1 |
________________________ | |
Group variables | None |
Variable type: logical
skim_variable | n_missing | complete_rate | mean | count |
---|---|---|---|---|
M09.1.Principle.Data_Import_Export.qmd | 0 | NaN | NaN | : |
5.2 Exploratory Data Analysis (EDA)
5.2.1 Automated EDA packages/functions
```{r diamonds data}
glimpse(diamonds)
skimr::skim(diamonds)
diamonds %>%
group_by(cut) %>%
skimr::skim()
```
Rows: 53,940
Columns: 10
$ carat <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23, 0.…
$ cut <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, Ver…
$ color <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, J, J, I,…
$ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI1, VS1, …
$ depth <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4, 64…
$ table <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, 62, 58…
$ price <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 340, 34…
$ x <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.00, 4.…
$ y <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05, 4.…
$ z <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39, 2.…
Name | diamonds |
Number of rows | 53940 |
Number of columns | 10 |
_______________________ | |
Column type frequency: | |
factor | 3 |
numeric | 7 |
________________________ | |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
cut | 0 | 1 | TRUE | 5 | Ide: 21551, Pre: 13791, Ver: 12082, Goo: 4906 |
color | 0 | 1 | TRUE | 7 | G: 11292, E: 9797, F: 9542, H: 8304 |
clarity | 0 | 1 | TRUE | 8 | SI1: 13065, VS2: 12258, SI2: 9194, VS1: 8171 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
carat | 0 | 1 | 0.80 | 0.47 | 0.2 | 0.40 | 0.70 | 1.04 | 5.01 | ▇▂▁▁▁ |
depth | 0 | 1 | 61.75 | 1.43 | 43.0 | 61.00 | 61.80 | 62.50 | 79.00 | ▁▁▇▁▁ |
table | 0 | 1 | 57.46 | 2.23 | 43.0 | 56.00 | 57.00 | 59.00 | 95.00 | ▁▇▁▁▁ |
price | 0 | 1 | 3932.80 | 3989.44 | 326.0 | 950.00 | 2401.00 | 5324.25 | 18823.00 | ▇▂▁▁▁ |
x | 0 | 1 | 5.73 | 1.12 | 0.0 | 4.71 | 5.70 | 6.54 | 10.74 | ▁▁▇▃▁ |
y | 0 | 1 | 5.73 | 1.14 | 0.0 | 4.72 | 5.71 | 6.54 | 58.90 | ▇▁▁▁▁ |
z | 0 | 1 | 3.54 | 0.71 | 0.0 | 2.91 | 3.53 | 4.04 | 31.80 | ▇▁▁▁▁ |
Name | Piped data |
Number of rows | 53940 |
Number of columns | 10 |
_______________________ | |
Column type frequency: | |
factor | 2 |
numeric | 7 |
________________________ | |
Group variables | cut |
Variable type: factor
skim_variable | cut | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|---|
color | Fair | 0 | 1 | TRUE | 7 | G: 314, F: 312, H: 303, E: 224 |
color | Good | 0 | 1 | TRUE | 7 | E: 933, F: 909, G: 871, H: 702 |
color | Very Good | 0 | 1 | TRUE | 7 | E: 2400, G: 2299, F: 2164, H: 1824 |
color | Premium | 0 | 1 | TRUE | 7 | G: 2924, H: 2360, E: 2337, F: 2331 |
color | Ideal | 0 | 1 | TRUE | 7 | G: 4884, E: 3903, F: 3826, H: 3115 |
clarity | Fair | 0 | 1 | TRUE | 8 | SI2: 466, SI1: 408, VS2: 261, I1: 210 |
clarity | Good | 0 | 1 | TRUE | 8 | SI1: 1560, SI2: 1081, VS2: 978, VS1: 648 |
clarity | Very Good | 0 | 1 | TRUE | 8 | SI1: 3240, VS2: 2591, SI2: 2100, VS1: 1775 |
clarity | Premium | 0 | 1 | TRUE | 8 | SI1: 3575, VS2: 3357, SI2: 2949, VS1: 1989 |
clarity | Ideal | 0 | 1 | TRUE | 8 | VS2: 5071, SI1: 4282, VS1: 3589, VVS: 2606 |
Variable type: numeric
skim_variable | cut | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|---|
carat | Fair | 0 | 1 | 1.05 | 0.52 | 0.22 | 0.70 | 1.00 | 1.20 | 5.01 | ▇▂▁▁▁ |
carat | Good | 0 | 1 | 0.85 | 0.45 | 0.23 | 0.50 | 0.82 | 1.01 | 3.01 | ▇▆▂▁▁ |
carat | Very Good | 0 | 1 | 0.81 | 0.46 | 0.20 | 0.41 | 0.71 | 1.02 | 4.00 | ▇▃▁▁▁ |
carat | Premium | 0 | 1 | 0.89 | 0.52 | 0.20 | 0.41 | 0.86 | 1.20 | 4.01 | ▇▆▁▁▁ |
carat | Ideal | 0 | 1 | 0.70 | 0.43 | 0.20 | 0.35 | 0.54 | 1.01 | 3.50 | ▇▂▁▁▁ |
depth | Fair | 0 | 1 | 64.04 | 3.64 | 43.00 | 64.40 | 65.00 | 65.90 | 79.00 | ▁▁▃▇▁ |
depth | Good | 0 | 1 | 62.37 | 2.17 | 54.30 | 61.30 | 63.40 | 63.80 | 67.00 | ▁▂▂▇▁ |
depth | Very Good | 0 | 1 | 61.82 | 1.38 | 56.80 | 60.90 | 62.10 | 62.90 | 64.90 | ▁▂▅▇▂ |
depth | Premium | 0 | 1 | 61.26 | 1.16 | 58.00 | 60.50 | 61.40 | 62.20 | 63.00 | ▁▃▆▇▇ |
depth | Ideal | 0 | 1 | 61.71 | 0.72 | 43.00 | 61.30 | 61.80 | 62.20 | 66.70 | ▁▁▁▇▆ |
table | Fair | 0 | 1 | 59.05 | 3.95 | 49.00 | 56.00 | 58.00 | 61.00 | 95.00 | ▇▆▁▁▁ |
table | Good | 0 | 1 | 58.69 | 2.85 | 51.00 | 56.00 | 58.00 | 61.00 | 66.00 | ▁▇▇▅▂ |
table | Very Good | 0 | 1 | 57.96 | 2.12 | 44.00 | 56.00 | 58.00 | 59.00 | 66.00 | ▁▁▆▇▁ |
table | Premium | 0 | 1 | 58.75 | 1.48 | 51.00 | 58.00 | 59.00 | 60.00 | 62.00 | ▁▁▁▇▃ |
table | Ideal | 0 | 1 | 55.95 | 1.25 | 43.00 | 55.00 | 56.00 | 57.00 | 63.00 | ▁▁▅▇▁ |
price | Fair | 0 | 1 | 4358.76 | 3560.39 | 337.00 | 2050.25 | 3282.00 | 5205.50 | 18574.00 | ▇▃▁▁▁ |
price | Good | 0 | 1 | 3928.86 | 3681.59 | 327.00 | 1145.00 | 3050.50 | 5028.00 | 18788.00 | ▇▃▁▁▁ |
price | Very Good | 0 | 1 | 3981.76 | 3935.86 | 336.00 | 912.00 | 2648.00 | 5372.75 | 18818.00 | ▇▃▁▁▁ |
price | Premium | 0 | 1 | 4584.26 | 4349.20 | 326.00 | 1046.00 | 3185.00 | 6296.00 | 18823.00 | ▇▃▁▁▁ |
price | Ideal | 0 | 1 | 3457.54 | 3808.40 | 326.00 | 878.00 | 1810.00 | 4678.50 | 18806.00 | ▇▂▁▁▁ |
x | Fair | 0 | 1 | 6.25 | 0.96 | 0.00 | 5.63 | 6.18 | 6.70 | 10.74 | ▁▁▇▃▁ |
x | Good | 0 | 1 | 5.84 | 1.06 | 0.00 | 5.02 | 5.98 | 6.42 | 9.44 | ▁▁▆▇▁ |
x | Very Good | 0 | 1 | 5.74 | 1.10 | 0.00 | 4.75 | 5.74 | 6.47 | 10.01 | ▁▁▇▆▁ |
x | Premium | 0 | 1 | 5.97 | 1.19 | 0.00 | 4.80 | 6.11 | 6.80 | 10.14 | ▁▁▇▇▁ |
x | Ideal | 0 | 1 | 5.51 | 1.06 | 0.00 | 4.54 | 5.25 | 6.44 | 9.65 | ▁▁▇▃▁ |
y | Fair | 0 | 1 | 6.18 | 0.96 | 0.00 | 5.57 | 6.10 | 6.64 | 10.54 | ▁▁▇▃▁ |
y | Good | 0 | 1 | 5.85 | 1.05 | 0.00 | 5.02 | 5.99 | 6.44 | 9.38 | ▁▁▆▇▁ |
y | Very Good | 0 | 1 | 5.77 | 1.10 | 0.00 | 4.77 | 5.77 | 6.51 | 9.94 | ▁▁▇▆▁ |
y | Premium | 0 | 1 | 5.94 | 1.26 | 0.00 | 4.79 | 6.06 | 6.76 | 58.90 | ▇▁▁▁▁ |
y | Ideal | 0 | 1 | 5.52 | 1.07 | 0.00 | 4.55 | 5.26 | 6.44 | 31.80 | ▇▃▁▁▁ |
z | Fair | 0 | 1 | 3.98 | 0.65 | 0.00 | 3.61 | 3.97 | 4.28 | 6.98 | ▁▁▇▃▁ |
z | Good | 0 | 1 | 3.64 | 0.65 | 0.00 | 3.07 | 3.70 | 4.03 | 5.79 | ▁▁▆▇▁ |
z | Very Good | 0 | 1 | 3.56 | 0.73 | 0.00 | 2.95 | 3.56 | 4.02 | 31.80 | ▇▁▁▁▁ |
z | Premium | 0 | 1 | 3.65 | 0.73 | 0.00 | 2.94 | 3.72 | 4.16 | 8.06 | ▁▅▇▁▁ |
z | Ideal | 0 | 1 | 3.40 | 0.66 | 0.00 | 2.80 | 3.23 | 3.98 | 6.03 | ▁▁▇▃▁ |
5.2.2 Visualize the data with group_by() and summarize() functions
```{r Read_perf}
# summary statistics by gender
stu_per_readr %>%
  group_by(gender) %>%
  summarize(
    mean_math = mean(`math score`),
    sd_math = sd(`math score`),
    mean_read = mean(`reading score`),
    sd_read = sd(`reading score`),
    corr_math_read = cor(`math score`, `reading score`)
  )
# scatter plot with one linear fit for all students
stu_per_readr %>%
  ggplot(aes(x = `math score`, y = `reading score`)) +
  geom_point(aes(color = gender)) +
  geom_smooth(method = lm)
# same plot, one panel per gender
stu_per_readr %>%
  ggplot(aes(x = `math score`, y = `reading score`)) +
  geom_point(aes(color = gender)) +
  geom_smooth(method = lm) +
  facet_wrap(~ gender)
```
6 Exporting data
Save data as csv, Excel, and SPSS data
6.1 Saving as a csv file
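A small sketch of writing a csv file with readr, assuming a toy data frame and a temporary output path (both hypothetical; in the course project the path would more likely sit under the project's data folder).

```r
library(readr)

# Toy data standing in for a cleaned data set
scores <- data.frame(
  gender = c("female", "male"),
  math_score = c(72, 47)
)

out_csv <- file.path(tempdir(), "scores.csv")  # hypothetical output path
write_csv(scores, out_csv)

# Read the file back to confirm the round trip
scores_back <- read_csv(out_csv, show_col_types = FALSE)
scores_back
```

write_csv() never writes row names, which avoids the extra unnamed column that base write.csv() adds unless row.names = FALSE is set.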
6.2 Saving as an Excel file
Two packages are available.
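The two packages are not named at this point, so the sketch below simply assumes writexl for writing (openxlsx is another commonly used option) and readxl for reading the result back. The data and the output path are hypothetical.

```r
library(writexl)
library(readxl)

# Toy data standing in for a cleaned data set
scores <- data.frame(
  gender = c("female", "male"),
  math_score = c(72, 47)
)

out_xlsx <- file.path(tempdir(), "scores.xlsx")  # hypothetical output path
write_xlsx(scores, out_xlsx)

# Read the workbook back to confirm the round trip
scores_xlsx <- read_excel(out_xlsx)
scores_xlsx
```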
6.3 Saving as an SPSS data file
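A sketch of the SPSS route with haven::write_sav(), assuming a toy tibble with one labelled column (the data, the column names, and the output path are hypothetical). SPSS variable names cannot contain spaces, so the toy columns use simple names.

```r
library(haven)
library(tibble)

# Toy data with a labelled column, similar in spirit to demo.sav
survey <- tibble(
  id = c(1, 2, 3),
  marital = labelled(c(0, 1, 1), labels = c(Unmarried = 0, Married = 1))
)

out_sav <- file.path(tempdir(), "survey.sav")  # hypothetical output path
write_sav(survey, out_sav)

# Read the file back; the value labels should survive the round trip
survey_back <- read_sav(out_sav)
attr(survey_back$marital, "labels")
```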
7 Export Charts (Save chart)
7.1 Manually
7.2 Programmatically
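One way to save a chart from code is ggplot2's ggsave(). The sketch below assumes a simple scatter plot of the first 500 rows of the built-in diamonds data and a temporary output file; the plot, the file name, and the size settings are illustrative choices, not values from the lecture.

```r
library(ggplot2)

# Illustrative plot built from the diamonds data used earlier
p <- ggplot(diamonds[1:500, ], aes(x = carat, y = price)) +
  geom_point()

out_png <- file.path(tempdir(), "carat_price.png")  # hypothetical output path

# The file extension selects the graphics device; size is in inches by default
ggsave(out_png, plot = p, width = 6, height = 4, dpi = 300)
```

Passing the plot explicitly through the plot argument avoids relying on "the last plot displayed", which makes the script safer to rerun.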
8 Web Scraping with rvest package
8.1 HTML
HTML has a hierarchical structure formed by elements, which consist of a start tag (e.g., <tag>), optional attributes (id=‘first’), an end tag (like </tag>), and contents (everything in between the start and end tag).
<html>
<head>
  <title>Page title</title>
</head>
<body>
  <h1 id='first'>A heading</h1>
  <p>Some text &amp; <b>some bold text.</b></p>
  <img src='myimg.png' width='100' height='100'>
</body>
</html>
8.1.1 Element
8.1.2 Attribute
- class
- id
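To make the element and attribute vocabulary concrete, here is a small sketch that parses a fragment similar to the page above and inspects one element. The class='title' attribute is added purely for illustration and is not part of the original example page.

```r
library(rvest)

# Fragment modelled on the example page; class='title' is an added assumption
page <- minimal_html("
  <h1 id='first' class='title'>A heading</h1>
  <img src='myimg.png' width='100' height='100'>
")

heading <- html_element(page, "h1")

html_name(heading)           # the element (tag) name
html_attr(heading, "id")     # value of the id attribute
html_attr(heading, "class")  # value of the class attribute

# All attributes of the <img> element as a named character vector
img_attrs <- html_attrs(html_element(page, "img"))
img_attrs
```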
8.2 Basics of Extracting Data
8.2.1 Find elements
- CSS is short for cascading style sheets, and is a tool for defining the visual styling of HTML documents.
- CSS includes a miniature language for selecting elements on a page called CSS selectors. CSS selectors define patterns for locating HTML elements, and are useful for scraping because they provide a concise way of describing which elements you want to extract.
- Three basic CSS selectors
  - p selects all <p> elements.
  - .title selects all elements with class “title”.
  - #title selects the element with the id attribute that equals “title”.
  - Id attributes must be unique within a document, so this will only ever select a single element.
- html_elements() = html_nodes()
- class: .xxxx
- ID: #xxxx
```{r}
library(rvest)
html <- minimal_html("
  <h1>This is a heading</h1>
  <p id='first'>This is a paragraph</p>
  <p class='important'>This is an important paragraph</p>
")
html |>
  html_element("h1") |>
  html_text()
html |>
  html_elements("p") |>
  html_text()
html |>
  html_elements(".important") |> # use . in front of class
  html_text()
html |>
  html_elements("#first") |> # use # in front of id
  html_text()
```
[1] "This is a heading"
[1] "This is a paragraph" "This is an important paragraph"
[1] "This is an important paragraph"
[1] "This is a paragraph"
8.2.2 Nesting selections
- html_elements(): When applied to a node set, html_elements() returns all matching elements beneath any of the inputs, flattening results into a new node set.
- html_element(): When applied to a node set, html_element() always returns a vector the same length as the input, using a “missing” element where needed.
In most cases, you’ll use html_elements() and html_element() together, typically using html_elements() to identify elements that will become observations, then using html_element() to find elements that will become variables.
```{r}
html <- minimal_html("
  <ul>
    <li><b>C-3PO</b> is a <i>droid</i> that weighs <span class='weight'>167 kg</span></li>
    <li><b>R4-P17</b> is a <i>droid</i></li>
    <li><b>R2-D2</b> is a <i>droid</i> that weighs <span class='weight'>96 kg</span></li>
    <li><b>Yoda</b> weighs <span class='weight'>66 kg</span></li>
  </ul>
")
characters <- html |>
  html_elements("li") # observations
characters
# names
characters |>
  html_element("b") # variables: one result per <li>; compare with html_elements() below
characters |>
  html_elements("b")
# weights
characters |>
  html_element(".weight") # keeps NA for the <li> without a weight
characters |>
  html_elements(".weight") # drops the missing one, so only 3 results
```
{xml_nodeset (4)}
[1] <li>\n<b>C-3PO</b> is a <i>droid</i> that weighs <span class="weight">167 ...
[2] <li>\n<b>R4-P17</b> is a <i>droid</i>\n</li>
[3] <li>\n<b>R2-D2</b> is a <i>droid</i> that weighs <span class="weight">96 ...
[4] <li>\n<b>Yoda</b> weighs <span class="weight">66 kg</span>\n</li>
{xml_nodeset (4)}
[1] <b>C-3PO</b>
[2] <b>R4-P17</b>
[3] <b>R2-D2</b>
[4] <b>Yoda</b>
{xml_nodeset (4)}
[1] <b>C-3PO</b>
[2] <b>R4-P17</b>
[3] <b>R2-D2</b>
[4] <b>Yoda</b>
{xml_nodeset (4)}
[1] <span class="weight">167 kg</span>
[2] NA
[3] <span class="weight">96 kg</span>
[4] <span class="weight">66 kg</span>
{xml_nodeset (3)}
[1] <span class="weight">167 kg</span>
[2] <span class="weight">96 kg</span>
[3] <span class="weight">66 kg</span>
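Because html_element() keeps one result per observation, its outputs line up and can be combined into a data frame. The sketch below repeats the list from above and assembles a tibble; the object name `characters_tbl` is made up for illustration.

```r
library(rvest)
library(tibble)

html <- minimal_html("
  <ul>
    <li><b>C-3PO</b> is a <i>droid</i> that weighs <span class='weight'>167 kg</span></li>
    <li><b>R4-P17</b> is a <i>droid</i></li>
    <li><b>R2-D2</b> is a <i>droid</i> that weighs <span class='weight'>96 kg</span></li>
    <li><b>Yoda</b> weighs <span class='weight'>66 kg</span></li>
  </ul>
")

# One <li> per observation
characters <- html |>
  html_elements("li")

# One html_element() call per variable; missing weights stay as NA
characters_tbl <- tibble(
  name = characters |> html_element("b") |> html_text2(),
  weight = characters |> html_element(".weight") |> html_text2()
)
characters_tbl
```

Using html_elements(".weight") here instead would return only three values and the columns would no longer have matching lengths.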
8.2.3 Texts with html_text2()
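A short sketch contrasting html_text() with html_text2(), assuming a made-up paragraph that contains a source line break and a <br> tag. html_text() returns the raw text as stored in the HTML source, while html_text2() tries to mimic how a browser would display it.

```r
library(rvest)

# Made-up paragraph: a line break in the source plus an explicit <br>
html <- minimal_html("<p>This is a paragraph.
  This is another sentence.<br>This should start on a new line</p>")

raw_text <- html |>
  html_element("p") |>
  html_text()   # keeps source whitespace, ignores <br>

tidy_text <- html |>
  html_element("p") |>
  html_text2()  # collapses whitespace, turns <br> into a line break

writeLines(raw_text)
writeLines(tidy_text)
```

For scraped text that will be analysed or displayed, html_text2() is usually the more convenient choice.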
8.2.4 Attributes with html_attr()
```{r}
html <- minimal_html("
<p><a href='https://en.wikipedia.org/wiki/Cat'>cats</a></p>
<p><a href='https://en.wikipedia.org/wiki/Dog'>dogs</a></p>
")
# read value of attribute
html |>
html_elements("p") |>
html_element("a") |>
html_attr("href")
# read what's surrounded by a tags
html |>
html_elements("p") |>
html_element("a") |>
html_text2()
```
[1] "https://en.wikipedia.org/wiki/Cat" "https://en.wikipedia.org/wiki/Dog"
[1] "cats" "dogs"
8.2.4.1 Practice
```{r}
html <- minimal_html("
<ul>
<li><b>C-3PO</b> is a <i>droid</i> that weighs <span class='weight'>167 kg</span></li>
<li><b>R4-P17</b> is a <i>droid</i></li>
<li><b>R2-D2</b> is a <i>droid</i> that weighs <span class='weight'>96 kg</span></li>
<li><b>Yoda</b> weighs <span class='weight'>66 kg</span></li>
</ul>
")
html |>
html_elements("li") |>
html_element("i") |>
html_text2()
# value of attribute
html |>
html_elements("li") |>
html_element("span") |>
html_attr("class")
# what's surrounded by span (which is weights)
html |>
html_elements("li") |>
html_element("span") |>
html_text()
# using class to get weights
html |>
html_elements("li") |>
html_element(".weight") |>
html_text2()
```
[1] "droid" "droid" "droid" NA
[1] "weight" NA "weight" "weight"
[1] "167 kg" NA "96 kg" "66 kg"
[1] "167 kg" NA "96 kg" "66 kg"
8.2.5 Table
- Four main elements
- <table>,
- <tr> (table row),
- <th> (table heading), and
- <td> (table data).
```{r}
html <- minimal_html("
<table class='mytable'>
<tr><th>x</th> <th>y</th></tr>
<tr><td>1.5</td> <td>2.7</td></tr>
<tr><td>4.9</td> <td>1.3</td></tr>
<tr><td>7.2</td> <td>8.1</td></tr>
</table>
")
html |>
html_element(".mytable") |>
html_table()
html |>
html_element("table") |>
html_table()
html |>
html_elements("table") |>
html_table() # same as above since there is only one table in the data
```
[[1]]
# A tibble: 3 × 2
x y
<dbl> <dbl>
1 1.5 2.7
2 4.9 1.3
3 7.2 8.1
8.3 Practice
- Goal: Extract the table, where id=“exampleTable”
```{r}
## first, read the HTML code for our example HTML page
html <- read_html('https://bit.ly/3lz6ZRe')
html %>%
html_element('#exampleTable') |>
html_table()
```
- Goal: Extract the whole paragraph for the class named “rightColumn”
```{r}
html %>%
html_element('.rightColumn p') |>
html_text2() |>
cat()
html |>
html_element(".rightColumn") |>
html_element("p") |>
html_text2() # the same as above
html %>%
html_element('div.rightColumn p') |>
html_text2() # the same as above
```
Here's another column! The main purpose of this column is just to show that you can use CSS selectors to get all elements in a specific column.
[1] "Here's another column! The main purpose of this column is just to show that you can use CSS selectors to get all elements in a specific column."
[1] "Here's another column! The main purpose of this column is just to show that you can use CSS selectors to get all elements in a specific column."
8.3.1 Selecting HTML elements
```{r}
html <- read_html('https://bit.ly/3lz6ZRe')
html
## find any <table> element
html %>% html_element('table') ## left table
html %>% html_elements('table') ## set of both tables
## find any element with class="someTable"
html %>% html_element('.someTable') ## left table
html %>% html_elements('.someTable') ## set of both tables
## find any element with id="steve"
## (only called it steve to show that id can be anything the developer chooses)
html %>% html_element('#steve') ## right table
html %>% html_elements('#steve') ## set with only the right table
## find any <tr> element with class="headerRow"
html %>% html_element('tr.headerRow') ## left table first row
html %>% html_elements('tr.headerRow') ## first rows of both tables
## find any element with class="someTable blue"
html %>% html_element('.someTable.blue') |> ## right table
html_table()
html %>% html_element('table.someTable.blue') |> ## right table
html_table()
html %>% html_element('table#steve.someTable.blue') |> ## right table
html_table()
html %>% html_element('#steve') |> ## right table
html_table()
html %>% html_element('table#steve') |> ## right table
html_table()
html %>% html_element('table.someTable.blue#steve') |> ## right table
html_table()
html %>% html_elements('.someTable.blue') |> ## set with only the right table
html_table()
```
{html_document}
<html>
[1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8 ...
[2] <body>\n\n<h2>Looking behind the curtains of an HTML page</h2>\n\n<div cl ...
{html_node}
<table class="someTable" id="exampleTable">
[1] <tr class="headerRow">\n<!-- table row --><th>First column</th ...
[2] <tr>\n<!-- table row --><td>1</td> ...
[3] <tr>\n<!-- table row --><td>4</td> ...
{xml_nodeset (2)}
[1] <table class="someTable" id="exampleTable">\n<!-- table -- ...
[2] <table class="someTable blue" id="steve">\n<tr class="headerRow">\n<th>nu ...
{html_node}
<table class="someTable" id="exampleTable">
[1] <tr class="headerRow">\n<!-- table row --><th>First column</th ...
[2] <tr>\n<!-- table row --><td>1</td> ...
[3] <tr>\n<!-- table row --><td>4</td> ...
{xml_nodeset (2)}
[1] <table class="someTable" id="exampleTable">\n<!-- table -- ...
[2] <table class="someTable blue" id="steve">\n<tr class="headerRow">\n<th>nu ...
{html_node}
<table class="someTable blue" id="steve">
[1] <tr class="headerRow">\n<th>numbers</th>\n <th>letters</th>\n ...
[2] <tr>\n<td>1</td>\n <td>A</td>\n </tr>\n
[3] <tr>\n<td>2</td>\n <td>B</td>\n </tr>\n
[4] <tr>\n<td>3</td>\n <td>C</td>\n </tr>\n
[5] <tr>\n<td>4</td>\n <td>D</td>\n </tr>\n
[6] <tr>\n<td>5</td>\n <td>E</td>\n </tr>\n
[7] <tr>\n<td>6</td>\n <td>F</td>\n </tr>
{xml_nodeset (1)}
[1] <table class="someTable blue" id="steve">\n<tr class="headerRow">\n<th>nu ...
{html_node}
<tr class="headerRow">
[1] <th>First column</th>
[2] <th>Second column</th>
[3] <th>Third column</th>
{xml_nodeset (2)}
[1] <tr class="headerRow">\n<!-- table row --><th>First column</th ...
[2] <tr class="headerRow">\n<th>numbers</th>\n <th>letters</th>\n ...
[[1]]
# A tibble: 6 × 2
numbers letters
<int> <chr>
1 1 A
2 2 B
3 3 C
4 4 D
5 5 E
6 6 F
8.3.2 Extracting data from elements
```{r}
html = 'https://bit.ly/3lz6ZRe' %>% read_html()
# extract everything in the left column
html %>% html_element('.leftColumn') %>% html_text() |>
cat()
html %>% html_element('.leftColumn') %>% html_text2() |>
cat() # the same as above
# Extract value of attributes
html |> html_elements('#exampleTable') |> html_attrs() # plural html_elements() returns a list: one named vector per match
html |> html_elements('#exampleTable') |> html_attr("id")
html |> html_elements('#exampleTable') |> html_attr("class")
html |> html_elements('.someTable') |> html_attrs()
html |> html_elements('.someTable') |> html_attr("class")
html |> html_elements('.someTable') |> html_attr("id")
# links
html %>% html_elements('a') %>% html_attr('href')
```
Left Column
This is a simple HTML document. Right click on the page and select view page source
(or something similar, depending on browser) to view the HTML source code.
Alternatively, right click on a specific element on the page and select inspect element.
This also shows the HTML code, but focused on the selected element. You should be able to fold
and unfold HTML nodes (using the triangle-like thing before the <tags>), and when you hover
your mouse over them, they should light up in the browser. Play around with this for a bit to get
a feel for exploring HTML code.
Here's a stupid table.
First column
Second column
Third column
1
2
3
4
5
6
Left Column
This is a simple HTML document. Right click on the page and select view page source (or something similar, depending on browser) to view the HTML source code.
Alternatively, right click on a specific element on the page and select inspect element. This also shows the HTML code, but focused on the selected element. You should be able to fold and unfold HTML nodes (using the triangle-like thing before the <tags>), and when you hover your mouse over them, they should light up in the browser. Play around with this for a bit to get a feel for exploring HTML code.
Here's a stupid table.
First column Second column Third column
1 2 3
4 5 6
[[1]]
class id
"someTable" "exampleTable"
[1] "exampleTable"
[1] "someTable"
[[1]]
class id
"someTable" "exampleTable"
[[2]]
class id
"someTable blue" "steve"
[1] "someTable" "someTable blue"
[1] "exampleTable" "steve"
[1] "https://blog.hubspot.com/website/how-to-inspect"
8.4 Finding the right selector
- SelectorGadget: click elements on the page to get a candidate CSS selector
- Browser developer tools: right-click an element and choose Inspect
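Whichever tool suggests the selector, it helps to confirm in R that it matches exactly the elements you want. The sketch below is only an illustration: it builds a small hypothetical page in memory with `minimal_html()` (the class and id names mirror the practice page but are assumptions here) and counts the matches for two candidate selectors.

```r
library(rvest)

# A tiny hypothetical page built in memory, so no network access is needed
page <- minimal_html('
  <div class="leftColumn"><p>Left text</p></div>
  <div class="rightColumn"><p>Right text</p><p>More right text</p></div>
  <table id="exampleTable"><tr><th>a</th></tr><tr><td>1</td></tr></table>
')

# A good selector matches the elements you want: no more and no fewer
n_right <- page |> html_elements(".rightColumn p") |> length()
n_table <- page |> html_elements("#exampleTable") |> length()

# Inspect the matched text before building the rest of the pipeline
right_text <- page |> html_elements(".rightColumn p") |> html_text2()

n_right
n_table
right_text
```

If the count is larger than expected, the selector is too broad; make it more specific (for example, add a parent class or an id) and check again.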
8.5 Starwars
- Vignette: https://rvest.tidyverse.org/articles/starwars.html
- Task: Turn this page into a 7-row data frame with the following variables: title, released (release date), director, and intro
```{r}
url <- "https://rvest.tidyverse.org/articles/starwars.html"
# observations: each <section> element holds one film
section <- url |>
read_html() |>
html_elements("section")
# variables: extract one value per section with html_element()
section |>
html_element("h2") |>
html_text2()
# title
title = section |>
html_element("h2") |>
html_text2()
# release date
released = section |>
html_element("p") |>
html_text2() |>
str_remove("Released: ") |>
parse_date()
# director
director = section |>
html_element(".director") |>
html_text2()
section |>
html_element("span") |>
html_text2()
section |>
html_element("p span") |>
html_text2()
# introduction
intro = section |>
html_element(".crawl") |>
html_text2()
section |>
html_element("div.crawl") |>
html_text2()
tibble(title, released, director, intro)
```
[1] "The Phantom Menace" "Attack of the Clones"
[3] "Revenge of the Sith" "A New Hope"
[5] "The Empire Strikes Back" "Return of the Jedi"
[7] "The Force Awakens"
[1] "George Lucas" "George Lucas" "George Lucas" "George Lucas"
[5] "Irvin Kershner" "Richard Marquand" "J. J. Abrams"
[1] "George Lucas" "George Lucas" "George Lucas" "George Lucas"
[5] "Irvin Kershner" "Richard Marquand" "J. J. Abrams"
[1] "Turmoil has engulfed the Galactic Republic. The taxation of trade routes to outlying star systems is in dispute.\n\nHoping to resolve the matter with a blockade of deadly battleships, the greedy Trade Federation has stopped all shipping to the small planet of Naboo.\n\nWhile the Congress of the Republic endlessly debates this alarming chain of events, the Supreme Chancellor has secretly dispatched two Jedi Knights, the guardians of peace and justice in the galaxy, to settle the conflict…."
[2] "There is unrest in the Galactic Senate. Several thousand solar systems have declared their intentions to leave the Republic.\n\nThis separatist movement, under the leadership of the mysterious Count Dooku, has made it difficult for the limited number of Jedi Knights to maintain peace and order in the galaxy.\n\nSenator Amidala, the former Queen of Naboo, is returning to the Galactic Senate to vote on the critical issue of creating an ARMY OF THE REPUBLIC to assist the overwhelmed Jedi…."
[3] "War! The Republic is crumbling under attacks by the ruthless Sith Lord, Count Dooku. There are heroes on both sides. Evil is everywhere.\n\nIn a stunning move, the fiendish droid leader, General Grievous, has swept into the Republic capital and kidnapped Chancellor Palpatine, leader of the Galactic Senate.\n\nAs the Separatist Droid Army attempts to flee the besieged capital with their valuable hostage, two Jedi Knights lead a desperate mission to rescue the captive Chancellor…."
[4] "It is a period of civil war. Rebel spaceships, striking from a hidden base, have won their first victory against the evil Galactic Empire.\n\nDuring the battle, Rebel spies managed to steal secret plans to the Empire’s ultimate weapon, the DEATH STAR, an armored space station with enough power to destroy an entire planet.\n\nPursued by the Empire’s sinister agents, Princess Leia races home aboard her starship, custodian of the stolen plans that can save her people and restore freedom to the galaxy…."
[5] "It is a dark time for the Rebellion. Although the Death Star has been destroyed, Imperial troops have driven the Rebel forces from their hidden base and pursued them across the galaxy.\n\nEvading the dreaded Imperial Starfleet, a group of freedom fighters led by Luke Skywalker has established a new secret base on the remote ice world of Hoth.\n\nThe evil lord Darth Vader, obsessed with finding young Skywalker, has dispatched thousands of remote probes into the far reaches of space…."
[6] "Luke Skywalker has returned to his home planet of Tatooine in an attempt to rescue his friend Han Solo from the clutches of the vile gangster Jabba the Hutt.\n\nLittle does Luke know that the GALACTIC EMPIRE has secretly begun construction on a new armored space station even more powerful than the first dreaded Death Star.\n\nWhen completed, this ultimate weapon will spell certain doom for the small band of rebels struggling to restore freedom to the galaxy…"
[7] "Luke Skywalker has vanished. In his absence, the sinister FIRST ORDER has risen from the ashes of the Empire and will not rest until Skywalker, the last Jedi, has been destroyed. With the support of the REPUBLIC, General Leia Organa leads a brave RESISTANCE. She is desperate to find her brother Luke and gain his help in restoring peace and justice to the galaxy. Leia has sent her most daring pilot on a secret mission to Jakku, where an old ally has discovered a clue to Luke’s whereabouts…."
8.6 Archived IMDB Example
From the archived IMDb Top 250 Movies list, extract the rank, title, year, rating, and number of votes: <https://web.archive.org/web/20220201012049/https://www.imdb.com/chart/top/>
```{r}
"https://web.archive.org/web/20220201012049/https://www.imdb.com/chart/top/" |> read_html() -> html
table <- html |>
html_element("table") |>
html_table()
table |> glimpse()
ratings <- table |>
select(
rank_title_year = `Rank & Title`,
rating = `IMDb Rating`
) |>
mutate(rank_title_year = str_replace_all(rank_title_year, "\n +", " ")) |>
separate_wider_regex( # tidyr
rank_title_year,
patterns = c(
rank = "\\d+", "\\. ",
title = ".+", " +\\(",
year = "\\d+", "\\)"
)
)
ratings
# rating
html |>
html_elements("tr .ratingColumn.imdbRating") |>
html_text2()
# vote counts
html |>
html_elements("td strong") |>
html_attr("title")
ratings |>
mutate(
rating_n = html |> html_elements("td strong") |> html_attr("title")
) |>
separate_wider_regex(
rating_n,
patterns = c(
"[0-9.]+ based on ",
number = "[0-9,]+",
" user ratings"
)
) |>
mutate(number = parse_number(number))
```
Rows: 250
Columns: 5
$ `` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ `Rank & Title` <chr> "1.\n The Shawshank Redemption\n (1994)", "…
$ `IMDb Rating` <dbl> 9.2, 9.1, 9.0, 9.0, 8.9, 8.9, 8.9, 8.8, 8.8, 8.8, 8.7, …
$ `Your Rating` <chr> "12345678910\n \n \n \n …
$ `` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
[1] "9.2" "9.1" "9.0" "9.0" "8.9" "8.9" "8.9" "8.8" "8.8" "8.8" "8.7" "8.7"
[13] "8.7" "8.7" "8.7" "8.7" "8.6" "8.6" "8.6" "8.6" "8.6" "8.6" "8.6" "8.6"
[25] "8.6" "8.6" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5"
[37] "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5"
[49] "8.4" "8.4" "8.4" "8.4" "8.4" "8.4" "8.4" "8.4" "8.4" "8.4" "8.4" "8.4"
[61] "8.4" "8.4" "8.4" "8.4" "8.4" "8.4" "8.4" "8.3" "8.3" "8.3" "8.3" "8.3"
[73] "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3"
[85] "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3"
[97] "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.2" "8.2" "8.2" "8.2" "8.2"
[109] "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2"
[121] "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2"
[133] "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2"
[145] "8.2" "8.2" "8.2" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1"
[157] "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1"
[169] "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1"
[181] "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1"
[193] "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1"
[205] "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1"
[217] "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.0" "8.0" "8.0" "8.0" "8.0"
[229] "8.0" "8.0" "8.0" "8.0" "8.0" "8.0" "8.0" "8.0" "8.0" "8.0" "8.0" "8.0"
[241] "8.0" "8.0" "8.0" "8.0" "8.0" "8.0" "8.0" "8.0" "8.0" "8.0"
[1] "9.2 based on 2,536,415 user ratings"
[2] "9.1 based on 1,745,675 user ratings"
[3] "9.0 based on 1,211,032 user ratings"
[4] "9.0 based on 2,486,931 user ratings"
[5] "8.9 based on 749,563 user ratings"
[6] "8.9 based on 1,295,705 user ratings"
[7] "8.9 based on 1,749,722 user ratings"
[8] "8.8 based on 1,952,864 user ratings"
[9] "8.8 based on 732,557 user ratings"
[10] "8.8 based on 1,771,245 user ratings"
[11] "8.7 based on 1,996,110 user ratings"
[12] "8.7 based on 1,957,544 user ratings"
[13] "8.7 based on 2,228,642 user ratings"
[14] "8.7 based on 1,580,899 user ratings"
[15] "8.7 based on 1,230,892 user ratings"
[16] "8.7 based on 1,830,919 user ratings"
[17] "8.6 based on 1,096,894 user ratings"
[18] "8.6 based on 970,948 user ratings"
[19] "8.6 based on 334,944 user ratings"
[20] "8.6 based on 1,556,144 user ratings"
[21] "8.6 based on 1,363,287 user ratings"
[22] "8.6 based on 733,264 user ratings"
[23] "8.6 based on 438,951 user ratings"
[24] "8.6 based on 666,605 user ratings"
[25] "8.6 based on 432,286 user ratings"
[26] "8.6 based on 1,323,820 user ratings"
[27] "8.5 based on 1,302,369 user ratings"
[28] "8.5 based on 1,681,995 user ratings"
[29] "8.5 based on 715,672 user ratings"
[30] "8.5 based on 1,234,090 user ratings"
[31] "8.5 based on 712,448 user ratings"
[32] "8.5 based on 1,108,188 user ratings"
[33] "8.5 based on 51,931 user ratings"
[34] "8.5 based on 790,771 user ratings"
[35] "8.5 based on 1,053,728 user ratings"
[36] "8.5 based on 1,140,698 user ratings"
[37] "8.5 based on 1,047,652 user ratings"
[38] "8.5 based on 644,147 user ratings"
[39] "8.5 based on 1,006,734 user ratings"
[40] "8.5 based on 233,625 user ratings"
[41] "8.5 based on 260,547 user ratings"
[42] "8.5 based on 1,084,181 user ratings"
[43] "8.5 based on 788,463 user ratings"
[44] "8.5 based on 1,431,474 user ratings"
[45] "8.5 based on 179,321 user ratings"
[46] "8.5 based on 1,266,998 user ratings"
[47] "8.5 based on 817,319 user ratings"
[48] "8.5 based on 1,273,524 user ratings"
[49] "8.4 based on 552,614 user ratings"
[50] "8.4 based on 319,732 user ratings"
[51] "8.4 based on 474,492 user ratings"
[52] "8.4 based on 250,817 user ratings"
[53] "8.4 based on 841,202 user ratings"
[54] "8.4 based on 642,211 user ratings"
[55] "8.4 based on 1,191,224 user ratings"
[56] "8.4 based on 932,634 user ratings"
[57] "8.4 based on 217,246 user ratings"
[58] "8.4 based on 1,466,464 user ratings"
[59] "8.4 based on 378,992 user ratings"
[60] "8.4 based on 190,672 user ratings"
[61] "8.4 based on 214,880 user ratings"
[62] "8.4 based on 1,066,004 user ratings"
[63] "8.4 based on 975,508 user ratings"
[64] "8.4 based on 119,927 user ratings"
[65] "8.4 based on 466,734 user ratings"
[66] "8.4 based on 968,398 user ratings"
[67] "8.4 based on 474,688 user ratings"
[68] "8.3 based on 374,677 user ratings"
[69] "8.3 based on 553,426 user ratings"
[70] "8.3 based on 1,136,562 user ratings"
[71] "8.3 based on 239,205 user ratings"
[72] "8.3 based on 458,597 user ratings"
[73] "8.3 based on 1,613,428 user ratings"
[74] "8.3 based on 691,108 user ratings"
[75] "8.3 based on 337,473 user ratings"
[76] "8.3 based on 1,005,677 user ratings"
[77] "8.3 based on 81,165 user ratings"
[78] "8.3 based on 244,486 user ratings"
[79] "8.3 based on 41,453 user ratings"
[80] "8.3 based on 377,496 user ratings"
[81] "8.3 based on 947,809 user ratings"
[82] "8.3 based on 388,750 user ratings"
[83] "8.3 based on 1,120,594 user ratings"
[84] "8.3 based on 1,004,883 user ratings"
[85] "8.3 based on 1,368,392 user ratings"
[86] "8.3 based on 922,275 user ratings"
[87] "8.3 based on 81,907 user ratings"
[88] "8.3 based on 1,006,169 user ratings"
[89] "8.3 based on 71,711 user ratings"
[90] "8.3 based on 642,698 user ratings"
[91] "8.3 based on 976,830 user ratings"
[92] "8.3 based on 185,065 user ratings"
[93] "8.3 based on 389,665 user ratings"
[94] "8.3 based on 153,379 user ratings"
[95] "8.3 based on 313,111 user ratings"
[96] "8.3 based on 429,518 user ratings"
[97] "8.3 based on 809,376 user ratings"
[98] "8.3 based on 233,653 user ratings"
[99] "8.3 based on 317,960 user ratings"
[100] "8.3 based on 966,328 user ratings"
[101] "8.3 based on 75,518 user ratings"
[102] "8.3 based on 158,667 user ratings"
[103] "8.3 based on 284,543 user ratings"
[104] "8.2 based on 122,766 user ratings"
[105] "8.2 based on 714,172 user ratings"
[106] "8.2 based on 168,419 user ratings"
[107] "8.2 based on 175,698 user ratings"
[108] "8.2 based on 177,566 user ratings"
[109] "8.2 based on 152,493 user ratings"
[110] "8.2 based on 169,613 user ratings"
[111] "8.2 based on 236,661 user ratings"
[112] "8.2 based on 121,932 user ratings"
[113] "8.2 based on 780,889 user ratings"
[114] "8.2 based on 799,840 user ratings"
[115] "8.2 based on 254,464 user ratings"
[116] "8.2 based on 796,924 user ratings"
[117] "8.2 based on 826,507 user ratings"
[118] "8.2 based on 528,206 user ratings"
[119] "8.2 based on 735,266 user ratings"
[120] "8.2 based on 308,535 user ratings"
[121] "8.2 based on 801,505 user ratings"
[122] "8.2 based on 248,181 user ratings"
[123] "8.2 based on 997,479 user ratings"
[124] "8.2 based on 30,369 user ratings"
[125] "8.2 based on 729,640 user ratings"
[126] "8.2 based on 623,585 user ratings"
[127] "8.2 based on 563,810 user ratings"
[128] "8.2 based on 121,699 user ratings"
[129] "8.2 based on 119,336 user ratings"
[130] "8.2 based on 847,420 user ratings"
[131] "8.2 based on 447,908 user ratings"
[132] "8.2 based on 163,198 user ratings"
[133] "8.2 based on 346,503 user ratings"
[134] "8.2 based on 128,053 user ratings"
[135] "8.2 based on 525,787 user ratings"
[136] "8.2 based on 258,938 user ratings"
[137] "8.2 based on 1,389,272 user ratings"
[138] "8.2 based on 398,858 user ratings"
[139] "8.2 based on 72,133 user ratings"
[140] "8.2 based on 172,314 user ratings"
[141] "8.2 based on 369,894 user ratings"
[142] "8.2 based on 1,309,309 user ratings"
[143] "8.2 based on 75,094 user ratings"
[144] "8.2 based on 559,103 user ratings"
[145] "8.2 based on 498,452 user ratings"
[146] "8.2 based on 237,967 user ratings"
[147] "8.2 based on 121,706 user ratings"
[148] "8.1 based on 648,671 user ratings"
[149] "8.1 based on 897,988 user ratings"
[150] "8.1 based on 204,179 user ratings"
[151] "8.1 based on 341,517 user ratings"
[152] "8.1 based on 315,112 user ratings"
[153] "8.1 based on 320,423 user ratings"
[154] "8.1 based on 1,232,392 user ratings"
[155] "8.1 based on 564,751 user ratings"
[156] "8.1 based on 924,770 user ratings"
[157] "8.1 based on 137,430 user ratings"
[158] "8.1 based on 170,061 user ratings"
[159] "8.1 based on 402,801 user ratings"
[160] "8.1 based on 108,330 user ratings"
[161] "8.1 based on 480,345 user ratings"
[162] "8.1 based on 179,115 user ratings"
[163] "8.1 based on 234,213 user ratings"
[164] "8.1 based on 27,096 user ratings"
[165] "8.1 based on 957,223 user ratings"
[166] "8.1 based on 1,015,649 user ratings"
[167] "8.1 based on 929,833 user ratings"
[168] "8.1 based on 104,205 user ratings"
[169] "8.1 based on 167,796 user ratings"
[170] "8.1 based on 166,961 user ratings"
[171] "8.1 based on 1,086,084 user ratings"
[172] "8.1 based on 738,384 user ratings"
[173] "8.1 based on 666,235 user ratings"
[174] "8.1 based on 655,486 user ratings"
[175] "8.1 based on 214,651 user ratings"
[176] "8.1 based on 673,092 user ratings"
[177] "8.1 based on 1,002,899 user ratings"
[178] "8.1 based on 1,068,344 user ratings"
[179] "8.1 based on 457,139 user ratings"
[180] "8.1 based on 306,362 user ratings"
[181] "8.1 based on 59,253 user ratings"
[182] "8.1 based on 150,891 user ratings"
[183] "8.1 based on 84,576 user ratings"
[184] "8.1 based on 190,747 user ratings"
[185] "8.1 based on 661,393 user ratings"
[186] "8.1 based on 129,500 user ratings"
[187] "8.1 based on 766,950 user ratings"
[188] "8.1 based on 329,782 user ratings"
[189] "8.1 based on 88,539 user ratings"
[190] "8.1 based on 113,559 user ratings"
[191] "8.1 based on 753,584 user ratings"
[192] "8.1 based on 47,411 user ratings"
[193] "8.1 based on 293,483 user ratings"
[194] "8.1 based on 173,140 user ratings"
[195] "8.1 based on 917,392 user ratings"
[196] "8.1 based on 113,180 user ratings"
[197] "8.1 based on 161,695 user ratings"
[198] "8.1 based on 169,052 user ratings"
[199] "8.1 based on 468,148 user ratings"
[200] "8.1 based on 487,407 user ratings"
[201] "8.1 based on 27,359 user ratings"
[202] "8.1 based on 933,677 user ratings"
[203] "8.1 based on 402,019 user ratings"
[204] "8.1 based on 52,624 user ratings"
[205] "8.1 based on 86,972 user ratings"
[206] "8.1 based on 353,368 user ratings"
[207] "8.1 based on 676,674 user ratings"
[208] "8.1 based on 35,134 user ratings"
[209] "8.1 based on 780,504 user ratings"
[210] "8.1 based on 462,094 user ratings"
[211] "8.1 based on 829,167 user ratings"
[212] "8.1 based on 232,297 user ratings"
[213] "8.1 based on 708,101 user ratings"
[214] "8.1 based on 951,179 user ratings"
[215] "8.1 based on 32,909 user ratings"
[216] "8.1 based on 667,630 user ratings"
[217] "8.1 based on 59,274 user ratings"
[218] "8.1 based on 387,091 user ratings"
[219] "8.1 based on 133,958 user ratings"
[220] "8.1 based on 154,752 user ratings"
[221] "8.1 based on 710,813 user ratings"
[222] "8.1 based on 70,488 user ratings"
[223] "8.1 based on 164,778 user ratings"
[224] "8.0 based on 272,839 user ratings"
[225] "8.0 based on 172,689 user ratings"
[226] "8.0 based on 92,320 user ratings"
[227] "8.0 based on 113,868 user ratings"
[228] "8.0 based on 401,608 user ratings"
[229] "8.0 based on 452,546 user ratings"
[230] "8.0 based on 869,253 user ratings"
[231] "8.0 based on 133,352 user ratings"
[232] "8.0 based on 388,191 user ratings"
[233] "8.0 based on 141,794 user ratings"
[234] "8.0 based on 347,891 user ratings"
[235] "8.0 based on 68,989 user ratings"
[236] "8.0 based on 461,996 user ratings"
[237] "8.0 based on 551,403 user ratings"
[238] "8.0 based on 235,194 user ratings"
[239] "8.0 based on 604,948 user ratings"
[240] "8.0 based on 163,901 user ratings"
[241] "8.0 based on 45,710 user ratings"
[242] "8.0 based on 253,135 user ratings"
[243] "8.0 based on 100,753 user ratings"
[244] "8.0 based on 62,481 user ratings"
[245] "8.0 based on 39,453 user ratings"
[246] "8.0 based on 58,133 user ratings"
[247] "8.0 based on 47,439 user ratings"
[248] "8.0 based on 45,058 user ratings"
[249] "8.0 based on 52,144 user ratings"
[250] "8.0 based on 416,920 user ratings"
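The `separate_wider_regex()` call used above can be easier to follow on a couple of hand-made strings. This is a minimal sketch, assuming tidyr 1.3.0 or later; the two input strings are made up to resemble the cleaned `rank_title_year` column.

```r
library(tidyr)
library(tibble)

# Hypothetical strings shaped like the cleaned rank_title_year column
toy <- tibble(raw = c("1. The Shawshank Redemption (1994)",
                      "2. The Godfather (1972)"))

toy_split <- toy |>
  separate_wider_regex(
    raw,
    patterns = c(
      rank  = "\\d+",   # named pattern: becomes a new column
      "\\. ",           # unnamed pattern: must match, then is discarded
      title = ".+",
      " +\\(",
      year  = "\\d+",
      "\\)"
    )
  )

toy_split
```

All new columns come back as character, so `rank` and `year` still need converting (for example with `as.integer()` or `parse_number()`) before any numeric work.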
8.7 Current IMDB Site Example
From the IMDb Top 250 Movies list, extract movie rankings, titles, ratings, and vote counts from https://m.imdb.com/chart/top/?ref_=nv_mv_250 and create a data frame with them.
```{r}
page <- "https://m.imdb.com/chart/top/?ref_=nv_mv_250" |> read_html()
rank_titles <- page |>
html_elements("ul") |>
html_elements("li h3") |>
html_text2()
ratings <- page |>
html_elements("ul") |>
html_elements("li .ipc-rating-star--rating") |>
html_text2()
vote_counts <- page |>
html_elements("ul") |>
html_elements("li .ipc-rating-star--voteCount") |>
html_text(trim = TRUE)
```
8.7.1 Tibbles
```{r}
top_movies <- tibble(rank_titles, ratings, vote_counts)
top_movies <- top_movies |>
separate_wider_regex(
rank_titles,
patterns = c(
ranking = "\\d+", "\\. ",
movie = ".+"
)
) |>
mutate(
ranking = as.numeric(ranking),
ratings = parse_number(ratings),
vote_counts = str_extract(vote_counts, "[0-9.]+[MK]"),
multiplier = if_else(str_detect(vote_counts, "M"), 1e6, 1e3),
vote_counts = parse_number(vote_counts) * multiplier
) |>
select(-multiplier)
top_movies
```
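The vote-count step above turns abbreviated strings into plain numbers. The sketch below isolates that logic on made-up input; the exact text the live page returns may differ, so treat the strings as assumptions.

```r
library(stringr)
library(readr)

# Hypothetical scraped strings; the live site may format these differently
votes_raw <- c("(2.9M)", "(950K)", "(1.2M)")

# Keep only the number and its suffix, e.g. "2.9M"
votes_abbr <- str_extract(votes_raw, "[0-9.]+[MK]")

# M means millions, K means thousands
multiplier <- ifelse(str_detect(votes_abbr, "M"), 1e6, 1e3)

# parse_number() drops the trailing letter and returns the numeric part
votes_num <- parse_number(votes_abbr) * multiplier
votes_num
```

A string with neither suffix would yield `NA` from `str_extract()`, so it is worth checking `sum(is.na(votes_num))` after running this on real data.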
9 NA’s
9.1 Explicit and implicit missing values
```{r}
stock <- tibble(
year = c(2020, 2020, 2020, 2020, 2021, 2021, 2021),
qtr = c( 1, 2, 3, 4, 2, 3, 4),
price = c(1.88, 0.59, 0.35, NA, 0.92, 0.17, 2.66)
)
stock
# pivot_wider() makes implicit missing values explicit (2021 Q1 appears as NA)
stock_wider <- stock |>
pivot_wider(
names_from = qtr,
values_from = price,
)
stock_wider
# in pivot_longer, missing values can be dropped
stock_wider |>
pivot_longer(
cols = -year,
names_to = "qtr",
values_to = "price",
values_drop_na = TRUE
)
```
9.1.1 complete() from the tidyr package
- Allows you to generate explicit missing values
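As a short illustration, here is `complete()` applied to the same `stock` data used in the previous chunk (the tibble is recreated so the example runs on its own). The 2021 first quarter is missing implicitly, and `complete()` adds it as a row with `NA`.

```r
library(tidyr)
library(tibble)

stock <- tibble(
  year  = c(2020, 2020, 2020, 2020, 2021, 2021, 2021),
  qtr   = c(   1,    2,    3,    4,    2,    3,    4),
  price = c(1.88, 0.59, 0.35,   NA, 0.92, 0.17, 2.66)
)

# Build every year x qtr combination; combinations absent from the data get NA
stock_complete <- stock |>
  complete(year, qtr)

stock_complete
```

The result has 8 rows: the explicit `NA` for 2020 Q4 is kept, and a new row for 2021 Q1 is added.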
9.2 Factors and empty groups in factor levels
9.2.1 table
```{r}
health <- tibble(
name = c("Ikaia", "Oletta", "Leriah", "Dashay", "Tresaun"),
smoker = factor(c("no", "no", "no", "no", "no"), levels = c("yes", "no")),
age = c(34, 88, 75, 47, 56),
)
health
health |>
count(smoker)
library(gt)
health |>
count(smoker, .drop = FALSE) |>
gt() |>
tab_header(md("**Smokers vs. Non-smokers**"))
```
Smokers vs. Non-smokers | |
---|---|
smoker | n |
yes | 0 |
no | 5 |
9.2.2 plots
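A minimal sketch of the same idea for plots, assuming the `health` tibble from the table example (recreated here so the chunk is self-contained): by default ggplot2 drops factor levels that have no observations, and `scale_x_discrete(drop = FALSE)` keeps them on the axis.

```r
library(ggplot2)
library(tibble)

health <- tibble(
  name   = c("Ikaia", "Oletta", "Leriah", "Dashay", "Tresaun"),
  smoker = factor(c("no", "no", "no", "no", "no"), levels = c("yes", "no")),
  age    = c(34, 88, 75, 47, 56)
)

# Number of observations per level; "yes" is an empty group
smoker_counts <- table(health$smoker)

# drop = FALSE keeps the empty "yes" level on the x axis
p <- ggplot(health, aes(x = smoker)) +
  geom_bar() +
  scale_x_discrete(drop = FALSE)
p
```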
9.2.3 group_by()
```{r}
health |>
group_by(smoker) |>
summarise(
n = n(),
mean_age = mean(age),
sd_age = sd(age),
min_age = min(age),
max_age = max(age)
)
# both groups
health |>
group_by(smoker, .drop = FALSE) |>
summarise(
n = n(),
mean_age = mean(age),
sd_age = sd(age),
min_age = min(age),
max_age = max(age)
)
# cf. summarise first, then complete() adds the missing "yes" level as a row of NAs
health |>
group_by(smoker) |>
summarize(
n = n(),
mean_age = mean(age),
sd_age = sd(age),
min_age = min(age),
max_age = max(age)
) |>
complete(smoker)
```
9.3 Missing value imputation using across() and replace_na()
The code below may not run as-is. Refer to the Advanced Wrangling codebook in Step 5 for examples.
replace_na() is from the tidyr package.
```{r}
airqual <- airquality %>%
as_tibble()
# identify variables with missing values
skimr::skim(airqual)
# identify data type of the variables with missing values.
glimpse(airqual)
# converting the target variables to doubles
airqual <- airqual %>%
mutate(Solar.R = as.double(Solar.R),
Ozone = as.double(Ozone))
glimpse(airqual)
# missing value imputation with a mean
airqual |>
mutate(across(Solar.R, ~ replace_na(., mean(., na.rm = TRUE))))
airqual |>
mutate(across(c(Ozone, Solar.R), ~replace_na(., mean(., na.rm = TRUE))))
airqual |>
mutate(across(where(is.double), ~replace_na(., mean(., na.rm = TRUE))))
```
Name | airqual |
Number of rows | 153 |
Number of columns | 6 |
_______________________ | |
Column type frequency: | |
numeric | 6 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Ozone | 37 | 0.76 | 42.13 | 32.99 | 1.0 | 18.00 | 31.5 | 63.25 | 168.0 | ▇▃▂▁▁ |
Solar.R | 7 | 0.95 | 185.93 | 90.06 | 7.0 | 115.75 | 205.0 | 258.75 | 334.0 | ▅▃▅▇▅ |
Wind | 0 | 1.00 | 9.96 | 3.52 | 1.7 | 7.40 | 9.7 | 11.50 | 20.7 | ▂▇▇▃▁ |
Temp | 0 | 1.00 | 77.88 | 9.47 | 56.0 | 72.00 | 79.0 | 85.00 | 97.0 | ▂▃▇▇▃ |
Month | 0 | 1.00 | 6.99 | 1.42 | 5.0 | 6.00 | 7.0 | 8.00 | 9.0 | ▇▇▇▇▇ |
Day | 0 | 1.00 | 15.80 | 8.86 | 1.0 | 8.00 | 16.0 | 23.00 | 31.0 | ▇▇▇▇▆ |
Rows: 153
Columns: 6
$ Ozone <int> 41, 36, 12, 18, NA, 28, 23, 19, 8, NA, 7, 16, 11, 14, 18, 14, …
$ Solar.R <int> 190, 118, 149, 313, NA, NA, 299, 99, 19, 194, NA, 256, 290, 27…
$ Wind <dbl> 7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6, 6.9, 9…
$ Temp <int> 67, 72, 74, 62, 56, 66, 65, 59, 61, 69, 74, 69, 66, 68, 58, 64…
$ Month <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,…
$ Day <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,…
Rows: 153
Columns: 6
$ Ozone <dbl> 41, 36, 12, 18, NA, 28, 23, 19, 8, NA, 7, 16, 11, 14, 18, 14, …
$ Solar.R <dbl> 190, 118, 149, 313, NA, NA, 299, 99, 19, 194, NA, 256, 290, 27…
$ Wind <dbl> 7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6, 6.9, 9…
$ Temp <int> 67, 72, 74, 62, 56, 66, 65, 59, 61, 69, 74, 69, 66, 68, 58, 64…
$ Month <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,…
$ Day <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,…
9.4 Using the naniar package
9.4.1 Data
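The chunks in the following subsections use a data frame named `df` that is not defined in this section. The stand-in below is an assumption, not the original data: a small tibble with sentinel numbers (-99, -98, and so on) and strings such as "N/A" that stand for missing values.

```r
library(tibble)

# Illustrative data only: all values are made up for this example
df <- tribble(
  ~name,           ~x,  ~y,              ~z,
  "N/A",           1,   "N/A",           -100,
  "N A",           3,   "NOt available", -99,
  "N / A",         NA,  "29",            -98,
  "Not Available", -99, "25",            -101,
  "John Smith",    -98, "28",            -1
)

df
```

Note that `y` is stored as character, because a column cannot mix text and numbers.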
9.4.2 replace_with_na()
- Replaces specific values with a missing value (NA).
- Pass a named list to the replace argument: each name is a variable, and each element holds the value (or values) in that variable to turn into NA.
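A minimal sketch of `replace_with_na()`, using a small made-up tibble (`scores` and its values are assumptions for illustration):

```r
library(naniar)
library(tibble)

# Hypothetical data where -99 and -98 are codes for "missing"
scores <- tibble(
  id = 1:4,
  x  = c(1, -99, 3, -98),
  z  = c(-99, 10, 20, -99)
)

# Each list name is a column; each element lists the values to replace with NA
scores_na <- scores |>
  replace_with_na(replace = list(x = c(-99, -98), z = -99))

scores_na
```

Only the columns named in the list are touched, so `id` is left unchanged.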
9.4.3 replace_with_na_all()
- Replace ALL values that meet a condition across an entire dataset.
```{r}
df |>
replace_with_na_all(condition = ~ .x == -99)
# write out all the offending strings
na_strings <- c("NA", "N A", "N / A", "N/A", "N/ A", "Not Available", "NOt available")
df |>
replace_with_na_all(condition = ~ .x %in% na_strings)
# common missing values list
common_na_numbers
common_na_strings
df |>
replace_with_na_all(condition = ~ .x %in% common_na_strings)
df |>
replace_with_na_all(condition = ~.x %in% c(common_na_strings,
common_na_numbers,
-98, -100, -101, -1))
```
[1] -9 -99 -999 -9999 9999 66 77 88
[1] "missing" "NA" "N A" "N/A" "#N/A" "NA " " NA"
[8] "N /A" "N / A" " N / A" "N / A " "na" "n a" "n/a"
[15] "na " " na" "n /a" "n / a" " a / a" "n / a " "NULL"
[22] "null" "" "\\?" "\\*" "\\."
9.4.4 replace_with_na_at()
This is similar to replace_with_na_all(), but you specify which variables the rule applies to.
It is useful when a rule should affect only a selected set of variables.
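A minimal sketch of `replace_with_na_at()`, again on made-up data (the tibble `scores` is an assumption): the rule is applied to `x` and `z` only, so the -99 in `y` is left alone.

```r
library(naniar)
library(tibble)

# Hypothetical data: -99 is a missing-value code in x and z, but a real value in y
scores <- tibble(
  x = c(1, -99, 3),
  y = c(-99, 5, 6),
  z = c(-99, 10, 20)
)

# .vars names the columns the condition is applied to
scores_at <- scores |>
  replace_with_na_at(.vars = c("x", "z"), condition = ~ .x == -99)

scores_at
```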
9.4.5 replace_with_na_if()
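`replace_with_na_if()` applies a rule only to columns that satisfy a predicate, such as `is.character`. The sketch below uses a made-up tibble (`survey` is an assumption): "N/A" is replaced in the character columns, while the numeric column is not touched.

```r
library(naniar)
library(tibble)

# Hypothetical data with "N/A" strings in the character columns
survey <- tibble(
  name  = c("N/A", "Ana", "Ben"),
  city  = c("Oslo", "N/A", "Lima"),
  score = c(-99, 4, 5)
)

# .predicate selects columns by type; condition is applied within those columns
survey_if <- survey |>
  replace_with_na_if(.predicate = is.character, condition = ~ .x %in% c("N/A"))

survey_if
```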
10 Working with Multiple External Data Sets
- Tidytuesday repo: https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-05-07
- Data dictionary
10.1 Data and setup
```{r}
library(tidyverse)
theme_set(theme_bw())
student_ratio <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-07/student_teacher_ratio.csv")
skimr::skim(student_ratio)
student_ratio %>%
arrange(desc(student_ratio)) %>%
slice_head(n =10)
student_ratio %>%
#view()
#count(indicator)
count(year, sort = TRUE)
#count(flags, sort = TRUE)
```
Name | student_ratio |
Number of rows | 5189 |
Number of columns | 8 |
_______________________ | |
Column type frequency: | |
character | 6 |
numeric | 2 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
edulit_ind | 0 | 1.00 | 7 | 9 | 0 | 7 | 0 |
indicator | 0 | 1.00 | 17 | 37 | 0 | 7 | 0 |
country_code | 0 | 1.00 | 3 | 5 | 0 | 235 | 0 |
country | 0 | 1.00 | 4 | 52 | 0 | 232 | 0 |
flag_codes | 4185 | 0.19 | 1 | 1 | 0 | 3 | 0 |
flags | 4185 | 0.19 | 14 | 23 | 0 | 3 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
year | 0 | 1.00 | 2014.41 | 1.67 | 2012.00 | 2013.00 | 2014.00 | 2016.00 | 2018.00 | ▇▅▅▅▃ |
student_ratio | 325 | 0.94 | 18.29 | 10.41 | 1.16 | 11.66 | 15.84 | 21.91 | 168.63 | ▇▁▁▁▁ |
10.2 Exploratory Data Analysis
```{r}
s_t_ratio_prim_2015 <- student_ratio %>%
filter(indicator == "Primary Education",
year == 2015,
!is.na(student_ratio))
s_t_ratio_prim_2015 %>% #count(country, sort = TRUE)
arrange(desc(student_ratio)) %>%
slice(c(1:10, seq(n() - 9, n()))) %>% #view()
mutate(country = fct_reorder(country, student_ratio)) %>%
ggplot(aes(student_ratio, country)) +
geom_point() +
expand_limits(x = 0) +
labs(title = "Countries with the highest and lowest student/teacher ratios",
y = "",
x = "Student/teacher ratio")
```
10.3 Research Question
- Research question: Is the student/teacher ratio associated with national wealth? If so, how?
10.3.1 Getting GDP, population, and region data from the WDI package
- WDI (World Development Indicators, World Bank) package:
- For details: https://cran.r-project.org/web/packages/WDI/WDI.pdf
- WDI(): A function from the WDI package that downloads the requested data using the World Bank’s API, parses the resulting XML file, and formats it in long country-year format.
- WDIsearch(): Searches the names and descriptions of available WDI series.
```{r}
#install.packages("WDI")
library(WDI)
help("WDI")

# Search the indicator catalogue by keyword
WDIsearch("gdp per capita") %>% view()
WDIsearch("Population") %>%
  #view()
  #class()
  as_tibble() %>%
  filter(str_detect(name, "^Population")) |>
  arrange(str_length(name))

# Download GDP per capita and total population for 2015;
# extra = TRUE adds country metadata such as iso3c and region
indicators <- WDI(indicator = c("NY.GDP.PCAP.CD", "SP.POP.TOTL"),
                  start = 2015, end = 2015, extra = TRUE) %>%
  as_tibble() %>%
  select(country_code = iso3c,
         region,
         per_cap_gdp = NY.GDP.PCAP.CD,
         total_pop = SP.POP.TOTL)
head(indicators)

# Check how many times each country code appears
indicators |>
  count(country_code, sort = TRUE)

# Attach the indicators to the 2015 primary-education ratios
joined_by_indicators <- s_t_ratio_prim_2015 %>%
  inner_join(indicators, by = "country_code")

# Rows still missing GDP or population after the join
joined_by_indicators |>
  filter(is.na(per_cap_gdp) | is.na(total_pop))

# Distributions on a log scale
joined_by_indicators |>
  ggplot(aes(student_ratio)) +
  geom_histogram() +
  scale_x_log10()
joined_by_indicators |>
  ggplot(aes(per_cap_gdp)) +
  geom_histogram() +
  scale_x_log10()
```
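The inner_join() in the chunk above silently drops every country whose country_code has no match in the WDI table. The following is a minimal sketch of how such losses could be inspected with anti_join(). The two small tables and their values are made up for illustration and are not taken from the course data.

```{r}
library(dplyr)

# Toy tables (hypothetical values, for illustration only)
ratios <- tibble(
  country_code  = c("AAA", "BBB", "CCC"),
  student_ratio = c(20.1, 35.4, 12.7)
)
gdp <- tibble(
  country_code = c("AAA", "BBB", "DDD"),
  per_cap_gdp  = c(1500, 800, 42000)
)

# inner_join() keeps only the keys present in both tables
matched <- inner_join(ratios, gdp, by = "country_code")

# anti_join() returns the rows of the first table with no match in the second
unmatched <- anti_join(ratios, gdp, by = "country_code")

nrow(matched)           # 2 rows survive the join
unmatched$country_code  # "CCC" is the key that was dropped
```

With the lecture objects, the same check would be anti_join(s_t_ratio_prim_2015, indicators, by = "country_code").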
10.3.2 Student ratio vs. GDP
```{r}
joined_by_indicators %>%
  ggplot(aes(per_cap_gdp, student_ratio)) +
  geom_point() +
  scale_x_log10(labels = scales::dollar) +
  scale_y_log10() +
  geom_smooth(method = lm, se = FALSE) +
  geom_text(aes(label = country), vjust = 1, hjust = 1, check_overlap = TRUE) +
  labs(x = "GDP per Capita",
       y = "Student/Teacher Ratio",
       title = "The relationship between GDP and Student/Teacher Ratio",
       subtitle = "For Year 2015")
```
10.3.3 Adding population and region to the chart
```{r}
joined_by_indicators %>%
  arrange(desc(total_pop)) %>%
  ggplot(aes(per_cap_gdp, student_ratio)) +
  geom_point(aes(size = total_pop, color = region)) +
  scale_x_log10(labels = scales::dollar) +
  scale_y_log10() +
  geom_smooth(method = lm, se = FALSE) +
  geom_text(aes(label = country), vjust = 1, hjust = 1, check_overlap = TRUE) +
  labs(x = "GDP per Capita",
       y = "Student/Teacher Ratio",
       title = "The relationship between GDP and Student/Teacher Ratio",
       subtitle = "For Year 2015")
```
10.3.4 Refining the chart
```{r}
joined_by_indicators %>%
  arrange(desc(total_pop)) %>%
  #slice_max(total_pop, n = 100) %>%
  ggplot(aes(per_cap_gdp, student_ratio)) +
  geom_point(aes(size = total_pop, color = region)) +
  scale_x_log10(labels = scales::dollar) +
  scale_y_log10() +
  geom_smooth(method = lm, se = FALSE) +
  geom_text(aes(label = country), vjust = 1, hjust = 1, check_overlap = TRUE) +
  scale_size_continuous(labels = scales::comma_format(), range = c(.5, 15)) +
  labs(x = "GDP per Capita",
       y = "Student/Teacher Ratio",
       title = "The relationship between GDP and Student/Teacher Ratio",
       subtitle = "For Year 2015",
       color = "Region",
       size = "Population")
```
The chart suggests a negative association between a country’s wealth and its student/teacher ratio: in 2015, countries with higher GDP per capita tended to have fewer students per teacher.
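A chart only shows the direction of the relationship, so it can help to put a number on it. The sketch below computes a correlation and a fitted slope on the log10 scale, matching the log-log axes used in the charts. The toy data frame is made up for illustration and does not contain the WDI or student-ratio values.

```{r}
# Toy data (hypothetical values, NOT the real country data)
toy <- data.frame(
  per_cap_gdp   = c(500, 1200, 3500, 9000, 20000, 45000, 60000),
  student_ratio = c(55, 42, 30, 22, 17, 13, 11)
)

# Pearson correlation on the log10 scale
r <- cor(log10(toy$per_cap_gdp), log10(toy$student_ratio))

# Slope of the straight line that a linear fit draws on log-log axes
fit <- lm(log10(student_ratio) ~ log10(per_cap_gdp), data = toy)
slope <- unname(coef(fit)[2])

r      # negative for these toy values
slope  # negative for these toy values
```

For the lecture data, the same calls could be run on joined_by_indicators, adding use = "complete.obs" to cor() so that rows with a missing GDP value are skipped.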
11 Working with Labelled data
11.1 Loading packages often used for labelled data
11.2 Importing SPSS data
- Blog by Martin Chan: Working with SPSS labels in R https://martinctc.github.io/blog/working-with-spss-labels-in-r/
```{r import data}
library(here)
here()
data <- read_sav(here("data", "demo.sav"))
#demo <- read_sav("data/demo.sav")
class(data)
dim(data)
head(data)
glimpse(data)
#view(data)
```
[1] "C:/Users/jmjung/OneDrive - Cal Poly Pomona/Documents/Work/Jaemin/4. Teaching/DWV101/Course Content/M09-Import_Export/Principles"
[1] "tbl_df" "tbl" "data.frame"
[1] 6400 29
Rows: 6,400
Columns: 29
$ age <dbl> 55, 56, 28, 24, 25, 45, 42, 35, 46, 34, 55, 28, 31, 42, 35, 5…
$ marital <dbl+lbl> 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, …
$ address <dbl> 12, 29, 9, 4, 2, 9, 19, 15, 26, 0, 17, 3, 9, 8, 8, 24, 1, 0, …
$ income <dbl> 72, 153, 28, 26, 23, 76, 40, 57, 24, 89, 72, 24, 40, 137, 70,…
$ inccat <dbl+lbl> 3, 4, 2, 2, 1, 4, 2, 3, 1, 4, 3, 1, 2, 4, 3, 4, 2, 2, 4, …
$ car <dbl> 36.2, 76.9, 13.7, 12.5, 11.3, 37.2, 19.8, 28.2, 12.2, 46.1, 3…
$ carcat <dbl+lbl> 3, 3, 1, 1, 1, 3, 2, 2, 1, 3, 3, 1, 2, 3, 3, 3, 2, 1, 3, …
$ ed <dbl+lbl> 1, 1, 3, 4, 2, 3, 3, 2, 1, 3, 3, 4, 4, 3, 3, 4, 3, 1, 3, …
$ employ <dbl> 23, 35, 4, 0, 5, 13, 10, 1, 11, 12, 2, 4, 0, 3, 9, 16, 0, 2, …
$ retire <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ empcat <dbl+lbl> 3, 3, 1, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 2, 3, 1, 1, 3, …
$ jobsat <dbl+lbl> 5, 4, 3, 1, 2, 2, 2, 1, 5, 4, 3, 5, 2, 1, 4, 5, 1, 4, 3, …
$ gender <chr+lbl> "f", "m", "f", "m", "m", "m", "m", "f", "f", "m", "f", "m…
$ reside <dbl> 4, 1, 3, 3, 2, 2, 1, 1, 2, 6, 2, 1, 4, 1, 3, 2, 7, 2, 1, 4, 1…
$ wireless <dbl+lbl> 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, …
$ multline <dbl+lbl> 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, …
$ voice <dbl+lbl> 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
$ pager <dbl+lbl> 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, …
$ internet <dbl+lbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, …
$ callid <dbl+lbl> 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, …
$ callwait <dbl+lbl> 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, …
$ owntv <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ ownvcr <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, …
$ owncd <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, …
$ ownpda <dbl+lbl> 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, …
$ ownpc <dbl+lbl> 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, …
$ ownfax <dbl+lbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, …
$ news <dbl+lbl> 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, …
$ response <dbl+lbl> 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, …
11.3 Wrangling with labelled data
11.3.1 Viewing data dictionary
```{r}
#install.packages("sjPlot")
#library(sjPlot)
data %>%
  sjPlot::view_df() # show a table of variable names, labels, and value labels in the Viewer
```
ID | Name | Label | Values | Value Labels |
---|---|---|---|---|
1 | age | Age in years | range: 18-77 | |
2 | marital | Marital status | 0, 1 | Unmarried; Married |
3 | address | Years at current address | range: 0-56 | |
4 | income | Household income in thousands | range: 9-1116 | |
5 | inccat | Income category in thousands | 1, 2, 3, 4 | Under $25; $25 - $49; $50 - $74; $75+ |
6 | car | Price of primary vehicle | range: 4.2-99.9 | |
7 | carcat | Primary vehicle price category | 1, 2, 3 | Economy; Standard; Luxury |
8 | ed | Level of education | 1, 2, 3, 4, 5 | Did not complete high school; High school degree; Some college; College degree; Post-undergraduate degree |
9 | employ | Years with current employer | range: 0-57 | |
10 | retire | Retired | 0, 1 | No; Yes |
11 | empcat | Years with current employer | 1, 2, 3 | Less than 5; 5 to 15; More than 15 |
12 | jobsat | Job satisfaction | 1, 2, 3, 4, 5 | Highly dissatisfied; Somewhat dissatisfied; Neutral; Somewhat satisfied; Highly satisfied |
13 | gender | Gender | f, m | <output omitted> |
14 | reside | Number of people in household | range: 1-9 | |
15 | wireless | Wireless service | 0, 1 | No; Yes |
16 | multline | Multiple lines | 0, 1 | No; Yes |
17 | voice | Voice mail | 0, 1 | No; Yes |
18 | pager | Paging service | 0, 1 | No; Yes |
19 | internet | Internet | 0, 1, 8, 9 | No; Yes; Does not know; No Answer |
20 | callid | Caller ID | 0, 1 | No; Yes |
21 | callwait | Call waiting | 0, 1 | No; Yes |
22 | owntv | Owns TV | 0, 1 | No; Yes |
23 | ownvcr | Owns VCR | 0, 1 | No; Yes |
24 | owncd | Owns stereo/CD player | 0, 1 | No; Yes |
25 | ownpda | Owns PDA | 0, 1 | No; Yes |
26 | ownpc | Owns computer | 0, 1 | No; Yes |
27 | ownfax | Owns fax machine | 0, 1 | No; Yes |
28 | news | Newspaper subscription | 0, 1 | Yes; No |
29 | response | Response | 0, 1 | Yes; No |
11.3.2 Creating a data dictionary
```{r create a dictionary}
dictionary <- labelled::generate_dictionary(data)
dictionary
dictionary %>%
  filter(variable %in% c("marital", "inccat", "carcat", "jobsat")) %>%
  select(pos, variable, label, value_labels) %>%
  view()
data %>%
  pull(marital) %>%
  class(.)
data %>%
  pull(marital) %>%
  str(.)
data %>%
  select(marital, inccat, carcat, jobsat) %>%
  head(.)
```
[1] "haven_labelled" "vctrs_vctr" "double"
dbl+lbl [1:6400] 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, ...
@ label : chr "Marital status"
@ format.spss: chr "F4.0"
@ labels : Named num [1:2] 0 1
..- attr(*, "names")= chr [1:2] "Unmarried" "Married"
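The str() output above shows that a labelled column is an ordinary double vector carrying a label attribute (the variable label) and a labels attribute (the value labels). As a minimal sketch, the same structure can be built by hand with haven::labelled(). The vector marital_toy and its values are made up for illustration.

```{r}
library(haven)

# Toy vector (hypothetical values) built the way read_sav() stores SPSS variables
marital_toy <- labelled(
  c(1, 0, 1, 1, 0),
  labels = c(Unmarried = 0, Married = 1),
  label  = "Marital status"
)

class(marital_toy)            # includes "haven_labelled"
attr(marital_toy, "label")    # variable label
attr(marital_toy, "labels")   # value labels: a named numeric vector
```

Because the labels live in attributes, the underlying codes stay numeric until the vector is explicitly converted.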
11.3.3 Applying common operations to the labelled data set
```{r Applying common operations to the data set}
data %>%
  glimpse()
skimr::skim(data)
# Evaluate if variable is of class haven_labelled.
data$marital %>%
  class()
data |> pull(marital) |> class() # same as before
class(data)
class(data$income)
class(data$inccat)
data$inccat %>%
  is.labelled()
haven::is.labelled(data$inccat)
haven::is.labelled(data$income)
haven::is.labelled(data)
# variable label
var_label(data)
var_label(data$jobsat) # the same as below
data$jobsat %>%
  attr('label')
# print value labels
data$jobsat %>%
  attr('labels')
val_labels(data$jobsat)
# print value label for one specific value
val_label(data$jobsat, 1)
data %>%
  pull(jobsat) %>%
  val_label(1)
```
Rows: 6,400
Columns: 29
$ age <dbl> 55, 56, 28, 24, 25, 45, 42, 35, 46, 34, 55, 28, 31, 42, 35, 5…
$ marital <dbl+lbl> 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, …
$ address <dbl> 12, 29, 9, 4, 2, 9, 19, 15, 26, 0, 17, 3, 9, 8, 8, 24, 1, 0, …
$ income <dbl> 72, 153, 28, 26, 23, 76, 40, 57, 24, 89, 72, 24, 40, 137, 70,…
$ inccat <dbl+lbl> 3, 4, 2, 2, 1, 4, 2, 3, 1, 4, 3, 1, 2, 4, 3, 4, 2, 2, 4, …
$ car <dbl> 36.2, 76.9, 13.7, 12.5, 11.3, 37.2, 19.8, 28.2, 12.2, 46.1, 3…
$ carcat <dbl+lbl> 3, 3, 1, 1, 1, 3, 2, 2, 1, 3, 3, 1, 2, 3, 3, 3, 2, 1, 3, …
$ ed <dbl+lbl> 1, 1, 3, 4, 2, 3, 3, 2, 1, 3, 3, 4, 4, 3, 3, 4, 3, 1, 3, …
$ employ <dbl> 23, 35, 4, 0, 5, 13, 10, 1, 11, 12, 2, 4, 0, 3, 9, 16, 0, 2, …
$ retire <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ empcat <dbl+lbl> 3, 3, 1, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 2, 3, 1, 1, 3, …
$ jobsat <dbl+lbl> 5, 4, 3, 1, 2, 2, 2, 1, 5, 4, 3, 5, 2, 1, 4, 5, 1, 4, 3, …
$ gender <chr+lbl> "f", "m", "f", "m", "m", "m", "m", "f", "f", "m", "f", "m…
$ reside <dbl> 4, 1, 3, 3, 2, 2, 1, 1, 2, 6, 2, 1, 4, 1, 3, 2, 7, 2, 1, 4, 1…
$ wireless <dbl+lbl> 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, …
$ multline <dbl+lbl> 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, …
$ voice <dbl+lbl> 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
$ pager <dbl+lbl> 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, …
$ internet <dbl+lbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, …
$ callid <dbl+lbl> 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, …
$ callwait <dbl+lbl> 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, …
$ owntv <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ ownvcr <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, …
$ owncd <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, …
$ ownpda <dbl+lbl> 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, …
$ ownpc <dbl+lbl> 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, …
$ ownfax <dbl+lbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, …
$ news <dbl+lbl> 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, …
$ response <dbl+lbl> 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, …
Name | data |
Number of rows | 6400 |
Number of columns | 29 |
_______________________ | |
Column type frequency: | |
character | 1 |
numeric | 28 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
gender | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
age | 0 | 1.00 | 42.06 | 12.29 | 18.0 | 33.0 | 41.0 | 51.0 | 77.0 | ▅▇▇▃▁ |
marital | 0 | 1.00 | 0.50 | 0.50 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | ▇▁▁▁▇ |
address | 0 | 1.00 | 11.56 | 9.94 | 0.0 | 3.0 | 9.0 | 17.0 | 56.0 | ▇▃▂▁▁ |
income | 0 | 1.00 | 69.47 | 78.72 | 9.0 | 28.0 | 45.0 | 79.0 | 1116.0 | ▇▁▁▁▁ |
inccat | 0 | 1.00 | 2.53 | 1.07 | 1.0 | 2.0 | 2.0 | 4.0 | 4.0 | ▃▇▁▃▆ |
car | 0 | 1.00 | 30.13 | 21.93 | 4.2 | 13.9 | 22.2 | 39.5 | 99.9 | ▇▃▂▂▁ |
carcat | 0 | 1.00 | 2.07 | 0.80 | 1.0 | 1.0 | 2.0 | 3.0 | 3.0 | ▆▁▇▁▇ |
ed | 0 | 1.00 | 2.59 | 1.20 | 1.0 | 2.0 | 2.0 | 4.0 | 5.0 | ▆▇▆▆▂ |
employ | 0 | 1.00 | 10.57 | 9.72 | 0.0 | 3.0 | 8.0 | 16.0 | 57.0 | ▇▃▁▁▁ |
retire | 0 | 1.00 | 0.05 | 0.21 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ▇▁▁▁▁ |
empcat | 0 | 1.00 | 1.94 | 0.79 | 1.0 | 1.0 | 2.0 | 3.0 | 3.0 | ▇▁▇▁▆ |
jobsat | 0 | 1.00 | 3.06 | 1.37 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | ▆▇▇▇▇ |
reside | 0 | 1.00 | 2.35 | 1.47 | 1.0 | 1.0 | 2.0 | 3.0 | 9.0 | ▇▃▁▁▁ |
wireless | 0 | 1.00 | 0.40 | 0.49 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | ▇▁▁▁▅ |
multline | 0 | 1.00 | 0.42 | 0.49 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | ▇▁▁▁▆ |
voice | 0 | 1.00 | 0.43 | 0.50 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | ▇▁▁▁▆ |
pager | 0 | 1.00 | 0.25 | 0.43 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ▇▁▁▁▂ |
internet | 255 | 0.96 | 0.27 | 0.44 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | ▇▁▁▁▃ |
callid | 0 | 1.00 | 0.51 | 0.50 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | ▇▁▁▁▇ |
callwait | 0 | 1.00 | 0.51 | 0.50 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | ▇▁▁▁▇ |
owntv | 0 | 1.00 | 0.99 | 0.10 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | ▁▁▁▁▇ |
ownvcr | 0 | 1.00 | 0.96 | 0.20 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | ▁▁▁▁▇ |
owncd | 0 | 1.00 | 0.97 | 0.17 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | ▁▁▁▁▇ |
ownpda | 0 | 1.00 | 0.20 | 0.40 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ▇▁▁▁▂ |
ownpc | 0 | 1.00 | 0.44 | 0.50 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | ▇▁▁▁▆ |
ownfax | 0 | 1.00 | 0.19 | 0.39 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ▇▁▁▁▂ |
news | 0 | 1.00 | 0.57 | 0.50 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | ▆▁▁▁▇ |
response | 0 | 1.00 | 0.89 | 0.31 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | ▁▁▁▁▇ |
[1] "haven_labelled" "vctrs_vctr" "double"
[1] "haven_labelled" "vctrs_vctr" "double"
[1] "tbl_df" "tbl" "data.frame"
[1] "numeric"
[1] "haven_labelled" "vctrs_vctr" "double"
[1] TRUE
[1] TRUE
[1] FALSE
[1] FALSE
$age
[1] "Age in years"
$marital
[1] "Marital status"
$address
[1] "Years at current address"
$income
[1] "Household income in thousands"
$inccat
[1] "Income category in thousands"
$car
[1] "Price of primary vehicle"
$carcat
[1] "Primary vehicle price category"
$ed
[1] "Level of education"
$employ
[1] "Years with current employer"
$retire
[1] "Retired"
$empcat
[1] "Years with current employer"
$jobsat
[1] "Job satisfaction"
$gender
[1] "Gender"
$reside
[1] "Number of people in household"
$wireless
[1] "Wireless service"
$multline
[1] "Multiple lines"
$voice
[1] "Voice mail"
$pager
[1] "Paging service"
$internet
[1] "Internet"
$callid
[1] "Caller ID"
$callwait
[1] "Call waiting"
$owntv
[1] "Owns TV"
$ownvcr
[1] "Owns VCR"
$owncd
[1] "Owns stereo/CD player"
$ownpda
[1] "Owns PDA"
$ownpc
[1] "Owns computer"
$ownfax
[1] "Owns fax machine"
$news
[1] "Newspaper subscription"
$response
[1] "Response"
[1] "Job satisfaction"
[1] "Job satisfaction"
Highly dissatisfied Somewhat dissatisfied Neutral
1 2 3
Somewhat satisfied Highly satisfied
4 5
<labelled<double>[6400]>: Job satisfaction
[1] 5 4 3 1 2 2 2 1 5 4 3 5 2 1 4 5 1 4 3 5 3 3 3 3 4 3 1 5 5 4 1 4 1 2 5 1 1
[38] 3 4 3 3 2 5 2 3 1 2 3 2 3 2 5 4 2 2 1 3 3 3 3 5 3 2 2 1 1 4 3 4 2 5 5 4 5
<output omitted>
Labels:
value label
1 Highly dissatisfied
2 Somewhat dissatisfied
3 Neutral
4 Somewhat satisfied
5 Highly satisfied
[1] "Highly dissatisfied"
[1] "Highly dissatisfied"
11.4 Converting labelled data
Note that value labels do not imply that a vector should be treated as categorical or continuous. Labelled vectors are therefore not intended to be used directly in data analysis: before modelling or plotting, convert vectors with value labels into factors or into plain numeric/character vectors.
- Labelled data cheatsheet: https://raw.githubusercontent.com/rstudio/cheatsheets/main/labelled.pdf
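As a minimal sketch of the factor conversion, the example below builds a small labelled vector and converts it with haven::as_factor(). The vector inccat_toy and its values are made up for illustration, and only haven is attached, so as_factor() here is haven's version.

```{r}
library(haven)

# Toy labelled vector (hypothetical values)
inccat_toy <- labelled(
  c(3, 4, 2, 1),
  labels = c("Under $25" = 1, "$25 - $49" = 2, "$50 - $74" = 3, "$75+" = 4)
)

# The value labels become the factor levels, ordered by the underlying codes
f <- as_factor(inccat_toy)
levels(f)
as.character(f)

# levels = "values" keeps the codes as levels; "both" shows code and label together
as_factor(inccat_toy, levels = "values")
as_factor(inccat_toy, levels = "both")
```

The default is usually what plotting and modelling functions need, because the levels are readable and keep the order of the codes.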
11.4.1 Converting to factors
- Notice how the data look after conversion.
- If we had to convert labelled data to character and then to factor, there would be a lot of wrangling to do.
- haven::as_factor() converts labelled data directly to a factor whose levels are the value labels, preserving the order of the values the labels were attached to.
```{r converting labelled data to unlabelled data}
data %>%
  select(marital, inccat, carcat, jobsat) %>%
  glimpse()
data %>%
  transmute(marital = haven::as_factor(marital),
            inccat = haven::as_factor(inccat),
            carcat = haven::as_factor(carcat)) %>%
  glimpse()
# cf: compare with the above; in this session the unqualified as_factor()
# does not use the value labels, so the numeric codes become the factor levels
data %>%
  transmute(marital = as_factor(marital),
            inccat = as_factor(inccat),
            carcat = as_factor(carcat)) %>%
  glimpse()
```
Rows: 6,400
Columns: 4
$ marital <dbl+lbl> 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1…
$ inccat <dbl+lbl> 3, 4, 2, 2, 1, 4, 2, 3, 1, 4, 3, 1, 2, 4, 3, 4, 2, 2, 4, 4…
$ carcat <dbl+lbl> 3, 3, 1, 1, 1, 3, 2, 2, 1, 3, 3, 1, 2, 3, 3, 3, 2, 1, 3, 3…
$ jobsat <dbl+lbl> 5, 4, 3, 1, 2, 2, 2, 1, 5, 4, 3, 5, 2, 1, 4, 5, 1, 4, 3, 5…
Rows: 6,400
Columns: 3
$ marital <fct> Married, Unmarried, Married, Married, Unmarried, Married, Unma…
$ inccat <fct> $50 - $74, $75+, $25 - $49, $25 - $49, Under $25, $75+, $25 - …
$ carcat <fct> Luxury, Luxury, Economy, Economy, Economy, Luxury, Standard, S…
Rows: 6,400
Columns: 3
$ marital <fct> 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0,…
$ inccat <fct> 3, 4, 2, 2, 1, 4, 2, 3, 1, 4, 3, 1, 2, 4, 3, 4, 2, 2, 4, 4, 1,…
$ carcat <fct> 3, 3, 1, 1, 1, 3, 2, 2, 1, 3, 3, 1, 2, 3, 3, 3, 2, 1, 3, 3, 1,…
11.4.1.1 Alternative: mutate_at()
```{r}
# convert selected variables to factors
data %>%
  mutate_at(.vars = vars(marital, inccat, carcat), haven::as_factor) %>%
  select(marital, inccat, carcat) %>%
  glimpse()
```
Rows: 6,400
Columns: 3
$ marital <fct> Married, Unmarried, Married, Married, Unmarried, Married, Unma…
$ inccat <fct> $50 - $74, $75+, $25 - $49, $25 - $49, Under $25, $75+, $25 - …
$ carcat <fct> Luxury, Luxury, Economy, Economy, Economy, Luxury, Standard, S…
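Since dplyr 1.0, mutate_at() has been superseded by across(). The sketch below shows one way the same conversion could be written with across() and where(), applied to every labelled column at once. The tibble toy and its values are made up for illustration and stand in for the SPSS data.

```{r}
library(dplyr)
library(haven)

# Toy data frame (hypothetical values) standing in for the SPSS data
toy <- tibble(
  marital = labelled(c(1, 0, 1), labels = c(Unmarried = 0, Married = 1)),
  carcat  = labelled(c(3, 1, 2), labels = c(Economy = 1, Standard = 2, Luxury = 3)),
  age     = c(55, 28, 24)
)

# across() + where() converts every labelled column and leaves the rest alone
toy_fct <- toy %>%
  mutate(across(where(is.labelled), haven::as_factor))

glimpse(toy_fct)
```

To convert only selected columns, as in the chunk above, across(c(marital, inccat, carcat), haven::as_factor) would play the same role as vars().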
11.4.2 Converting to doubles
Rows: 6,400
Columns: 29
$ age <dbl> 55, 56, 28, 24, 25, 45, 42, 35, 46, 34, 55, 28, 31, 42, 35, 5…
$ marital <dbl> 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0…
$ address <dbl> 12, 29, 9, 4, 2, 9, 19, 15, 26, 0, 17, 3, 9, 8, 8, 24, 1, 0, …
$ income <dbl> 72, 153, 28, 26, 23, 76, 40, 57, 24, 89, 72, 24, 40, 137, 70,…
$ inccat <dbl> 3, 4, 2, 2, 1, 4, 2, 3, 1, 4, 3, 1, 2, 4, 3, 4, 2, 2, 4, 4, 1…
$ car <dbl> 36.2, 76.9, 13.7, 12.5, 11.3, 37.2, 19.8, 28.2, 12.2, 46.1, 3…
$ carcat <dbl> 3, 3, 1, 1, 1, 3, 2, 2, 1, 3, 3, 1, 2, 3, 3, 3, 2, 1, 3, 3, 1…
$ ed <dbl> 1, 1, 3, 4, 2, 3, 3, 2, 1, 3, 3, 4, 4, 3, 3, 4, 3, 1, 3, 2, 1…
$ employ <dbl> 23, 35, 4, 0, 5, 13, 10, 1, 11, 12, 2, 4, 0, 3, 9, 16, 0, 2, …
$ retire <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ empcat <dbl> 3, 3, 1, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 2, 3, 1, 1, 3, 3, 1…
$ jobsat <dbl> 5, 4, 3, 1, 2, 2, 2, 1, 5, 4, 3, 5, 2, 1, 4, 5, 1, 4, 3, 5, 3…
$ gender <chr> "f", "m", "f", "m", "m", "m", "m", "f", "f", "m", "f", "m", "…
$ reside <dbl> 4, 1, 3, 3, 2, 2, 1, 1, 2, 6, 2, 1, 4, 1, 3, 2, 7, 2, 1, 4, 1…
$ wireless <dbl> 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0…
$ multline <dbl> 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0…
$ voice <dbl> 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0…
$ pager <dbl> 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0…
$ internet <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0…
$ callid <dbl> 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0…
$ callwait <dbl> 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0…
$ owntv <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ ownvcr <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0…
$ owncd <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0…
$ ownpda <dbl> 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0…
$ ownpc <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0…
$ ownfax <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0…
$ news <dbl> 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1…
$ response <dbl> 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1…
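The chunk that produced this output is not shown. One way to obtain plain doubles, offered here as an assumption rather than as the original code, is `haven::zap_labels()`, which drops the value labels and keeps the underlying codes. The tibble `toy` is made up for illustration.

```{r}
library(haven)
library(tibble)

# Hypothetical data: one plain column and one labelled column
toy <- tibble(
  age     = c(55, 56, 28),
  marital = labelled(c(1, 0, 1), labels = c(Unmarried = 0, Married = 1))
)

# zap_labels() accepts a whole data frame and strips labels column by column
toy_dbl <- zap_labels(toy)

class(toy$marital)      # labelled classes before
class(toy_dbl$marital)  # plain vector after
```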
11.4.3 Converting to characters
- When converting a labelled vector into a character vector of its value labels, be aware that the original values are replaced by the labels.
- The numeric codes are lost.
Rows: 6,400
Columns: 29
$ age <dbl> 55, 56, 28, 24, 25, 45, 42, 35, 46, 34, 55, 28, 31, 42, 35, 5…
$ marital <chr> "Married", "Unmarried", "Married", "Married", "Unmarried", "M…
$ address <dbl> 12, 29, 9, 4, 2, 9, 19, 15, 26, 0, 17, 3, 9, 8, 8, 24, 1, 0, …
$ income <dbl> 72, 153, 28, 26, 23, 76, 40, 57, 24, 89, 72, 24, 40, 137, 70,…
$ inccat <dbl+lbl> 3, 4, 2, 2, 1, 4, 2, 3, 1, 4, 3, 1, 2, 4, 3, 4, 2, 2, 4, …
$ car <dbl> 36.2, 76.9, 13.7, 12.5, 11.3, 37.2, 19.8, 28.2, 12.2, 46.1, 3…
$ carcat <dbl+lbl> 3, 3, 1, 1, 1, 3, 2, 2, 1, 3, 3, 1, 2, 3, 3, 3, 2, 1, 3, …
$ ed <dbl+lbl> 1, 1, 3, 4, 2, 3, 3, 2, 1, 3, 3, 4, 4, 3, 3, 4, 3, 1, 3, …
$ employ <dbl> 23, 35, 4, 0, 5, 13, 10, 1, 11, 12, 2, 4, 0, 3, 9, 16, 0, 2, …
$ retire <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ empcat <dbl+lbl> 3, 3, 1, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 2, 3, 1, 1, 3, …
$ jobsat <dbl+lbl> 5, 4, 3, 1, 2, 2, 2, 1, 5, 4, 3, 5, 2, 1, 4, 5, 1, 4, 3, …
$ gender <chr+lbl> "f", "m", "f", "m", "m", "m", "m", "f", "f", "m", "f", "m…
$ reside <dbl> 4, 1, 3, 3, 2, 2, 1, 1, 2, 6, 2, 1, 4, 1, 3, 2, 7, 2, 1, 4, 1…
$ wireless <dbl+lbl> 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, …
$ multline <dbl+lbl> 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, …
$ voice <dbl+lbl> 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
$ pager <dbl+lbl> 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, …
$ internet <dbl+lbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, …
$ callid <dbl+lbl> 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, …
$ callwait <dbl+lbl> 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, …
$ owntv <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ ownvcr <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, …
$ owncd <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, …
$ ownpda <dbl+lbl> 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, …
$ ownpc <dbl+lbl> 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, …
$ ownfax <dbl+lbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, …
$ news <dbl+lbl> 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, …
$ response <dbl+lbl> 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, …
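The output above shows only `marital` turned into character while the other columns stay labelled. The chunk itself is not shown, so the following is a sketch under assumptions: it converts through `haven::as_factor()` and then `as.character()`, and `toy` is a made-up tibble.

```{r}
library(haven)
library(dplyr)
library(tibble)

# Hypothetical labelled data with two columns
toy <- tibble(
  marital = labelled(c(1, 0, 1), labels = c(Unmarried = 0, Married = 1)),
  inccat  = labelled(c(3, 4, 2),
                     labels = c("Under $25" = 1, "$25 - $49" = 2,
                                "$50 - $74" = 3, "$75+" = 4))
)

# Convert one column to its value labels; its numeric codes are dropped,
# while the untouched column keeps codes and labels
toy_chr <- toy |>
  mutate(marital = as.character(as_factor(marital)))

toy_chr
```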
11.5 Create a new variable with a row-wise computation
- Compute each person’s total number of electronics owned.
```{r}
glimpse(data)
data %>%
  #select(starts_with("own")) %>%
  rowwise(age) %>%
  mutate(total_owns = sum(c_across(starts_with("own")))) %>%
  var_labels(total_owns = "Total Electronics Owned") %>%
  select(starts_with("own"), total_owns) %>%
  #view()
  head()
# same computation, saved and ungrouped for later use
by_row_data <- data %>%
  rowwise(age) %>%
  mutate(total_owns = sum(c_across(starts_with("own")))) %>%
  set_variable_labels(total_owns = "Total Electronics Owned") %>%
  ungroup()
by_row_data %>% #view()
  select(contains("own")) %>%
  head()
var_label(by_row_data$total_owns)
# alternative way of row-wise summation: add the columns directly
data |>
  mutate(total_elect = owntv + ownvcr + owncd + ownpda + ownpc + ownfax) |>
  select(starts_with("own"), total_elect)
# cf: not row-wise. Without rowwise(), sum() operates on all rows and
# all own* columns at once instead of computing one total per person
data |>
  mutate(total_elect = sum(across(starts_with("own")))) |>
  select(starts_with("own"), total_elect)
# cf. by_row_data
by_row_data |>
  select(contains("own"))
```
Rows: 6,400
Columns: 29
$ age <dbl> 55, 56, 28, 24, 25, 45, 42, 35, 46, 34, 55, 28, 31, 42, 35, 5…
$ marital <dbl+lbl> 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, …
$ address <dbl> 12, 29, 9, 4, 2, 9, 19, 15, 26, 0, 17, 3, 9, 8, 8, 24, 1, 0, …
$ income <dbl> 72, 153, 28, 26, 23, 76, 40, 57, 24, 89, 72, 24, 40, 137, 70,…
$ inccat <dbl+lbl> 3, 4, 2, 2, 1, 4, 2, 3, 1, 4, 3, 1, 2, 4, 3, 4, 2, 2, 4, …
$ car <dbl> 36.2, 76.9, 13.7, 12.5, 11.3, 37.2, 19.8, 28.2, 12.2, 46.1, 3…
$ carcat <dbl+lbl> 3, 3, 1, 1, 1, 3, 2, 2, 1, 3, 3, 1, 2, 3, 3, 3, 2, 1, 3, …
$ ed <dbl+lbl> 1, 1, 3, 4, 2, 3, 3, 2, 1, 3, 3, 4, 4, 3, 3, 4, 3, 1, 3, …
$ employ <dbl> 23, 35, 4, 0, 5, 13, 10, 1, 11, 12, 2, 4, 0, 3, 9, 16, 0, 2, …
$ retire <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ empcat <dbl+lbl> 3, 3, 1, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 2, 3, 1, 1, 3, …
$ jobsat <dbl+lbl> 5, 4, 3, 1, 2, 2, 2, 1, 5, 4, 3, 5, 2, 1, 4, 5, 1, 4, 3, …
$ gender <chr+lbl> "f", "m", "f", "m", "m", "m", "m", "f", "f", "m", "f", "m…
$ reside <dbl> 4, 1, 3, 3, 2, 2, 1, 1, 2, 6, 2, 1, 4, 1, 3, 2, 7, 2, 1, 4, 1…
$ wireless <dbl+lbl> 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, …
$ multline <dbl+lbl> 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, …
$ voice <dbl+lbl> 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
$ pager <dbl+lbl> 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, …
$ internet <dbl+lbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, …
$ callid <dbl+lbl> 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, …
$ callwait <dbl+lbl> 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, …
$ owntv <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ ownvcr <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, …
$ owncd <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, …
$ ownpda <dbl+lbl> 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, …
$ ownpc <dbl+lbl> 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, …
$ ownfax <dbl+lbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, …
$ news <dbl+lbl> 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, …
$ response <dbl+lbl> 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, …
[1] "Total Electronics Owned"
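A vectorised alternative to `rowwise()` is `rowSums()` over the selected columns. The sketch below uses a made-up tibble of plain 0/1 doubles; that the same call works unchanged on labelled columns is an assumption, and they may first need `haven::zap_labels()`.

```{r}
library(dplyr)
library(tibble)

# Hypothetical ownership indicators (1 = owns, 0 = does not own)
toy <- tibble(
  owntv  = c(1, 1, 0),
  ownpc  = c(0, 1, 0),
  ownfax = c(0, 1, 1)
)

# rowSums() already works row by row, so rowwise()/ungroup() is not needed
toy_total <- toy |>
  mutate(total_owns = rowSums(across(starts_with("own"))))

toy_total
```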
11.6 Tidying data
- Generate insights on consumer electronics ownership
```{r}
#library(tidyverse)
data %>%
  select(starts_with("own")) %>%
  head(10)
# Long data
data %>%
  pivot_longer(starts_with("own"),
               names_to = "owns",
               values_to = "measures") %>%
  select(age, marital, inccat, ed, owns, measures)
# Same reshaping, saved as owns_long
owns_long <- data %>%
  pivot_longer(starts_with("own"),
               names_to = "owns",
               values_to = "measures") %>%
  select(age, marital, inccat, ed, owns, measures)
```
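To see what `pivot_longer()` does to the shape of the data, here is a minimal sketch on a made-up two-person tibble (`toy` and `id` are hypothetical). Each `own*` column becomes one row per person, with the column name stored in `owns` and the cell value in `measures`.

```{r}
library(tidyr)
library(tibble)

# Hypothetical wide data: one row per person, one column per item
toy <- tibble(
  id    = 1:2,
  owntv = c(1, 1),
  ownpc = c(0, 1)
)

# Wide to long: 2 people x 2 items = 4 rows
toy_long <- toy |>
  pivot_longer(starts_with("own"),
               names_to = "owns",
               values_to = "measures")

toy_long
```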
11.7 Descriptive statistics in Tables
11.7.1 Frequencies
- For categorical data
```{r}
skimr::skim(data)
var_label(data$marital)
view_df(data)
# marital
var_label(data$marital)
data %>%
  mutate(marital = haven::as_factor(marital)) |>
  count(marital) |>
  mutate(percent = n / sum(n),
         cum_sum = cumsum(percent)) |>
  gt() |>
  tab_header("Marital Status Frequency") |>
  fmt_percent(columns = c(percent, cum_sum),
              decimals = 2) |>
  fmt_number(columns = n,
             decimals = 0)
# inccat
var_label(data$inccat)
data %>%
  mutate(inccat = haven::as_factor(inccat)) %>%
  count(inccat) %>%
  mutate(percent = n / sum(n),
         cum_sum = cumsum(percent)) |>
  gt() |>
  tab_header(title = "Income category Frequency",
             subtitle = "in thousands") |>
  fmt_percent(columns = c(percent, cum_sum),
              decimals = 2) |>
  fmt_number(columns = n,
             decimals = 0)
# carcat
var_label(data$carcat)
data |>
  mutate(carcat = haven::as_factor(carcat)) |>
  count(carcat) |>
  mutate(percent = n / sum(n),
         cum_sum = cumsum(percent)) |>
  gt() |>
  tab_header(title = "Primary vehicle price category Frequency") |>
  fmt_percent(columns = c(percent, cum_sum),
              decimals = 2) |>
  fmt_number(columns = n,
             decimals = 0)
```
Name | data |
Number of rows | 6400 |
Number of columns | 29 |
_______________________ | |
Column type frequency: | |
character | 1 |
numeric | 28 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
gender | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
age | 0 | 1.00 | 42.06 | 12.29 | 18.0 | 33.0 | 41.0 | 51.0 | 77.0 | ▅▇▇▃▁ |
marital | 0 | 1.00 | 0.50 | 0.50 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | ▇▁▁▁▇ |
address | 0 | 1.00 | 11.56 | 9.94 | 0.0 | 3.0 | 9.0 | 17.0 | 56.0 | ▇▃▂▁▁ |
income | 0 | 1.00 | 69.47 | 78.72 | 9.0 | 28.0 | 45.0 | 79.0 | 1116.0 | ▇▁▁▁▁ |
inccat | 0 | 1.00 | 2.53 | 1.07 | 1.0 | 2.0 | 2.0 | 4.0 | 4.0 | ▃▇▁▃▆ |
car | 0 | 1.00 | 30.13 | 21.93 | 4.2 | 13.9 | 22.2 | 39.5 | 99.9 | ▇▃▂▂▁ |
carcat | 0 | 1.00 | 2.07 | 0.80 | 1.0 | 1.0 | 2.0 | 3.0 | 3.0 | ▆▁▇▁▇ |
ed | 0 | 1.00 | 2.59 | 1.20 | 1.0 | 2.0 | 2.0 | 4.0 | 5.0 | ▆▇▆▆▂ |
employ | 0 | 1.00 | 10.57 | 9.72 | 0.0 | 3.0 | 8.0 | 16.0 | 57.0 | ▇▃▁▁▁ |
retire | 0 | 1.00 | 0.05 | 0.21 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ▇▁▁▁▁ |
empcat | 0 | 1.00 | 1.94 | 0.79 | 1.0 | 1.0 | 2.0 | 3.0 | 3.0 | ▇▁▇▁▆ |
jobsat | 0 | 1.00 | 3.06 | 1.37 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | ▆▇▇▇▇ |
reside | 0 | 1.00 | 2.35 | 1.47 | 1.0 | 1.0 | 2.0 | 3.0 | 9.0 | ▇▃▁▁▁ |
wireless | 0 | 1.00 | 0.40 | 0.49 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | ▇▁▁▁▅ |
multline | 0 | 1.00 | 0.42 | 0.49 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | ▇▁▁▁▆ |
voice | 0 | 1.00 | 0.43 | 0.50 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | ▇▁▁▁▆ |
pager | 0 | 1.00 | 0.25 | 0.43 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ▇▁▁▁▂ |
internet | 255 | 0.96 | 0.27 | 0.44 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | ▇▁▁▁▃ |
callid | 0 | 1.00 | 0.51 | 0.50 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | ▇▁▁▁▇ |
callwait | 0 | 1.00 | 0.51 | 0.50 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | ▇▁▁▁▇ |
owntv | 0 | 1.00 | 0.99 | 0.10 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | ▁▁▁▁▇ |
ownvcr | 0 | 1.00 | 0.96 | 0.20 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | ▁▁▁▁▇ |
owncd | 0 | 1.00 | 0.97 | 0.17 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | ▁▁▁▁▇ |
ownpda | 0 | 1.00 | 0.20 | 0.40 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ▇▁▁▁▂ |
ownpc | 0 | 1.00 | 0.44 | 0.50 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | ▇▁▁▁▆ |
ownfax | 0 | 1.00 | 0.19 | 0.39 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ▇▁▁▁▂ |
news | 0 | 1.00 | 0.57 | 0.50 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | ▆▁▁▁▇ |
response | 0 | 1.00 | 0.89 | 0.31 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | ▁▁▁▁▇ |
[1] "Marital status"
ID | Name | Label | Values | Value Labels |
---|---|---|---|---|
1 | age | Age in years | range: 18-77 | |
2 | marital | Marital status | 0, 1 | Unmarried; Married |
3 | address | Years at current address | range: 0-56 | |
4 | income | Household income in thousands | range: 9-1116 | |
5 | inccat | Income category in thousands | 1, 2, 3, 4 | Under $25; $25 - $49; $50 - $74; $75+ |
6 | car | Price of primary vehicle | range: 4.2-99.9 | |
7 | carcat | Primary vehicle price category | 1, 2, 3 | Economy; Standard; Luxury |
8 | ed | Level of education | 1, 2, 3, 4, 5 | Did not complete high school; High school degree; Some college; College degree; Post-undergraduate degree |
9 | employ | Years with current employer | range: 0-57 | |
10 | retire | Retired | 0, 1 | No; Yes |
11 | empcat | Years with current employer | 1, 2, 3 | Less than 5; 5 to 15; More than 15 |
12 | jobsat | Job satisfaction | 1, 2, 3, 4, 5 | Highly dissatisfied; Somewhat dissatisfied; Neutral; Somewhat satisfied; Highly satisfied |
13 | gender | Gender | f, m | <output omitted> |
14 | reside | Number of people in household | range: 1-9 | |
15 | wireless | Wireless service | 0, 1 | No; Yes |
16 | multline | Multiple lines | 0, 1 | No; Yes |
17 | voice | Voice mail | 0, 1 | No; Yes |
18 | pager | Paging service | 0, 1 | No; Yes |
19 | internet | Internet | 0, 1, 8, 9 | No; Yes; Does not know; No Answer |
20 | callid | Caller ID | 0, 1 | No; Yes |
21 | callwait | Call waiting | 0, 1 | No; Yes |
22 | owntv | Owns TV | 0, 1 | No; Yes |
23 | ownvcr | Owns VCR | 0, 1 | No; Yes |
24 | owncd | Owns stereo/CD player | 0, 1 | No; Yes |
25 | ownpda | Owns PDA | 0, 1 | No; Yes |
26 | ownpc | Owns computer | 0, 1 | No; Yes |
27 | ownfax | Owns fax machine | 0, 1 | No; Yes |
28 | news | Newspaper subscription | 0, 1 | Yes; No |
29 | response | Response | 0, 1 | Yes; No |
[1] "Marital status"
Marital Status Frequency
Marital status | n | percent | cum_sum |
---|---|---|---|
Unmarried | 3,224 | 50.38% | 50.38% |
Married | 3,176 | 49.62% | 100.00% |
[1] "Income category in thousands"
Income category Frequency
in thousands
Income category in thousands | n | percent | cum_sum |
---|---|---|---|
Under $25 | 1,174 | 18.34% | 18.34% |
$25 - $49 | 2,388 | 37.31% | 55.66% |
$50 - $74 | 1,120 | 17.50% | 73.16% |
$75+ | 1,718 | 26.84% | 100.00% |
[1] "Primary vehicle price category"
Primary vehicle price category Frequency
Primary vehicle price category | n | percent | cum_sum |
---|---|---|---|
Economy | 1,841 | 28.77% | 28.77% |
Standard | 2,275 | 35.55% | 64.31% |
Luxury | 2,284 | 35.69% | 100.00% |
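The three frequency tables repeat the same count, percent and cumulative-percent steps. One way to avoid the repetition is a small helper; `freq_table()` below is a hypothetical function written for this note, not part of any package, and `toy` is made-up data.

```{r}
library(dplyr)
library(tibble)

# Hypothetical helper: count, proportion and cumulative proportion of one column
freq_table <- function(df, var) {
  df |>
    count({{ var }}) |>
    mutate(percent = n / sum(n),
           cum_sum = cumsum(percent))
}

toy <- tibble(carcat = c("Economy", "Luxury", "Luxury", "Standard"))
toy_freq <- freq_table(toy, carcat)

toy_freq
```

The result is an ordinary tibble, so it can be piped into `gt()` and formatted with `fmt_percent()` in the same way as the chunks above.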
11.7.2 Percentiles
- For continuous data
```{r}
glimpse(data)
view_df(data)
data %>%
  select(age, car, employ) %>%
  # reframe() (dplyr >= 1.1.0) is meant for summaries that return several rows;
  # summarize() now warns when a summary returns more than one row per group
  reframe(
    Age =
      quantile(age, probs = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)),
    Car_price =
      quantile(car, probs = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)),
    Years_current_employer =
      quantile(employ, probs = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0))
  ) |>
  mutate(Percentile = c("10th", "20th", "30th", "40th", "50th",
                        "60th", "70th", "80th", "90th", "100th")) |>
  relocate(Percentile) |>
  gt() |>
  tab_header(title = md("**Percentiles for selected continuous variables**")) |>
  tab_source_note(
    source_note = md("**Data source:** The data is from demo.sav file, a file contained in SPSS Program")
  )
```
Rows: 6,400
Columns: 29
$ age <dbl> 55, 56, 28, 24, 25, 45, 42, 35, 46, 34, 55, 28, 31, 42, 35, 5…
$ marital <dbl+lbl> 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, …
$ address <dbl> 12, 29, 9, 4, 2, 9, 19, 15, 26, 0, 17, 3, 9, 8, 8, 24, 1, 0, …
$ income <dbl> 72, 153, 28, 26, 23, 76, 40, 57, 24, 89, 72, 24, 40, 137, 70,…
$ inccat <dbl+lbl> 3, 4, 2, 2, 1, 4, 2, 3, 1, 4, 3, 1, 2, 4, 3, 4, 2, 2, 4, …
$ car <dbl> 36.2, 76.9, 13.7, 12.5, 11.3, 37.2, 19.8, 28.2, 12.2, 46.1, 3…
$ carcat <dbl+lbl> 3, 3, 1, 1, 1, 3, 2, 2, 1, 3, 3, 1, 2, 3, 3, 3, 2, 1, 3, …
$ ed <dbl+lbl> 1, 1, 3, 4, 2, 3, 3, 2, 1, 3, 3, 4, 4, 3, 3, 4, 3, 1, 3, …
$ employ <dbl> 23, 35, 4, 0, 5, 13, 10, 1, 11, 12, 2, 4, 0, 3, 9, 16, 0, 2, …
$ retire <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ empcat <dbl+lbl> 3, 3, 1, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 2, 3, 1, 1, 3, …
$ jobsat <dbl+lbl> 5, 4, 3, 1, 2, 2, 2, 1, 5, 4, 3, 5, 2, 1, 4, 5, 1, 4, 3, …
$ gender <chr+lbl> "f", "m", "f", "m", "m", "m", "m", "f", "f", "m", "f", "m…
$ reside <dbl> 4, 1, 3, 3, 2, 2, 1, 1, 2, 6, 2, 1, 4, 1, 3, 2, 7, 2, 1, 4, 1…
$ wireless <dbl+lbl> 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, …
$ multline <dbl+lbl> 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, …
$ voice <dbl+lbl> 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
$ pager <dbl+lbl> 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, …
$ internet <dbl+lbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, …
$ callid <dbl+lbl> 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, …
$ callwait <dbl+lbl> 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, …
$ owntv <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ ownvcr <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, …
$ owncd <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, …
$ ownpda <dbl+lbl> 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, …
$ ownpc <dbl+lbl> 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, …
$ ownfax <dbl+lbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, …
$ news <dbl+lbl> 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, …
$ response <dbl+lbl> 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, …
ID | Name | Label | Values | Value Labels |
---|---|---|---|---|
1 | age | Age in years | range: 18-77 | |
2 | marital | Marital status | 0, 1 | Unmarried; Married |
3 | address | Years at current address | range: 0-56 | |
4 | income | Household income in thousands | range: 9-1116 | |
5 | inccat | Income category in thousands | 1, 2, 3, 4 | Under $25; $25 - $49; $50 - $74; $75+ |
6 | car | Price of primary vehicle | range: 4.2-99.9 | |
7 | carcat | Primary vehicle price category | 1, 2, 3 | Economy; Standard; Luxury |
8 | ed | Level of education | 1, 2, 3, 4, 5 | Did not complete high school; High school degree; Some college; College degree; Post-undergraduate degree |
9 | employ | Years with current employer | range: 0-57 | |
10 | retire | Retired | 0, 1 | No; Yes |
11 | empcat | Years with current employer | 1, 2, 3 | Less than 5; 5 to 15; More than 15 |
12 | jobsat | Job satisfaction | 1, 2, 3, 4, 5 | Highly dissatisfied; Somewhat dissatisfied; Neutral; Somewhat satisfied; Highly satisfied |
13 | gender | Gender | f, m | <output omitted> |
14 | reside | Number of people in household | range: 1-9 | |
15 | wireless | Wireless service | 0, 1 | No; Yes |
16 | multline | Multiple lines | 0, 1 | No; Yes |
17 | voice | Voice mail | 0, 1 | No; Yes |
18 | pager | Paging service | 0, 1 | No; Yes |
19 | internet | Internet | 0, 1, 8, 9 | No; Yes; Does not know; No Answer |
20 | callid | Caller ID | 0, 1 | No; Yes |
21 | callwait | Call waiting | 0, 1 | No; Yes |
22 | owntv | Owns TV | 0, 1 | No; Yes |
23 | ownvcr | Owns VCR | 0, 1 | No; Yes |
24 | owncd | Owns stereo/CD player | 0, 1 | No; Yes |
25 | ownpda | Owns PDA | 0, 1 | No; Yes |
26 | ownpc | Owns computer | 0, 1 | No; Yes |
27 | ownfax | Owns fax machine | 0, 1 | No; Yes |
28 | news | Newspaper subscription | 0, 1 | Yes; No |
29 | response | Response | 0, 1 | Yes; No |
Percentiles for selected continuous variables
Percentile | Age | Car_price | Years_current_employer |
---|---|---|---|
10th | 26 | 10.00 | 0 |
20th | 31 | 12.60 | 2 |
30th | 34 | 15.20 | 4 |
40th | 38 | 18.20 | 6 |
50th | 41 | 22.20 | 8 |
60th | 45 | 27.20 | 11 |
70th | 49 | 34.10 | 14 |
80th | 53 | 47.22 | 18 |
90th | 59 | 69.10 | 25 |
100th | 77 | 99.90 | 57 |
Data source: The data is from demo.sav file, a file contained in SPSS Program
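A percentile table has one row per requested probability, so the summary returns several rows. In dplyr 1.1.0 and later, `reframe()` is the verb intended for that case. The sketch below assumes that dplyr version and uses a made-up `toy` tibble; defining `probs` once keeps the labels and the quantiles in step.

```{r}
library(dplyr)
library(tibble)

# Hypothetical ages
toy <- tibble(age = c(20, 30, 40, 50, 60))

# Define the probabilities once and reuse them for labels and quantiles
probs <- c(0.25, 0.5, 0.75)

toy_q <- toy |>
  reframe(Percentile = paste0(probs * 100, "th"),
          Age = quantile(age, probs = probs))

toy_q
```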
11.7.3 Means/SD from Wide Data
```{r}
# means
data |>
  summarise(across(starts_with("own"), \(x) mean(x, na.rm = TRUE))) |>
  gt() |>
  tab_header(title = md("**Proportion of Participants who Own Electronics**")) |>
  fmt_percent(columns = everything(),
              decimals = 1) |>
  tab_source_note(
    source_note = md(
      "**Data source:** The data is from demo.sav file, a file contained in SPSS Program"
    )
  )
# sd
data |>
  summarise(across(starts_with("own"), \(x) sd(x, na.rm = TRUE))) |>
  gt() |>
  tab_header(title = md("**Standard Deviation of Electronics Ownership**")) |>
  fmt_percent(columns = everything(),
              decimals = 1) |>
  tab_source_note(
    source_note = md(
      "**Data source:** The data is from demo.sav file, a file contained in SPSS Program"
    )
  )
# both
data %>%
  summarize(across(starts_with("own"),
                   .fns = list(prop = mean, sd = sd),
                   .names = "{.col}_{.fn}"))
```
Proportion of Participants who Own Electronics
owntv | ownvcr | owncd | ownpda | ownpc | ownfax |
---|---|---|---|---|---|
99.0% | 96.0% | 97.0% | 20.4% | 43.9% | 18.8% |
Data source: The data is from demo.sav file, a file contained in SPSS Program
Standard Deviation of Electronics Ownership
owntv | ownvcr | owncd | ownpda | ownpc | ownfax |
---|---|---|---|---|---|
9.9% | 19.6% | 17.1% | 40.3% | 49.6% | 39.1% |
Data source: The data is from demo.sav file, a file contained in SPSS Program
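The combined `prop`/`sd` summary yields one very wide row with columns named `{column}_{function}`. As a sketch on made-up data (`toy` is hypothetical), that row can be reshaped into one row per item using the special `".value"` name in `pivot_longer()`, which turns the suffix after the separator into column names.

```{r}
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical ownership indicators
toy <- tibble(
  owntv = c(1, 1, 1, 0),
  ownpc = c(0, 1, 0, 1)
)

# One wide row: owntv_prop, owntv_sd, ownpc_prop, ownpc_sd
toy_wide <- toy |>
  summarise(across(starts_with("own"),
                   list(prop = mean, sd = sd),
                   .names = "{.col}_{.fn}"))

# One row per item, with prop and sd as separate columns
toy_tidy <- toy_wide |>
  pivot_longer(everything(),
               names_to = c("item", ".value"),
               names_sep = "_")

toy_tidy
```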
11.7.4 Means/SD from Long Data (Recommended)
```{r}
# by individual consumer electronics items
owns_long %>%
  group_by(owns) %>%
  # summarize() with named arguments replaces the superseded summarize_at()
  summarize(Proportion = mean(measures),
            sd = sd(measures)) %>%
  gt() |>
  tab_header(title = md("**Summary Statistics for Electronics Ownership**")) |>
  fmt_number(columns = c(Proportion, sd),
             decimals = 2) |>
  tab_source_note(
    source_note = md(
      "**Data source:** The data is from demo.sav file, a file contained in SPSS Program"
    )
  )
```
Summary Statistics for Electronics Ownership
owns | Proportion | sd |
---|---|---|
owncd | 0.97 | 0.17 |
ownfax | 0.19 | 0.39 |
ownpc | 0.44 | 0.50 |
ownpda | 0.20 | 0.40 |
owntv | 0.99 | 0.10 |
ownvcr | 0.96 | 0.20 |
Data source: The data is from demo.sav file, a file contained in SPSS Program
11.7.4.1 Alt.
11.8 Visualizing Likert-Type Scales
11.8.1 Inefficient way
- Don’t use the three-step approach (labelled to character, character to factor, then releveling); it takes more steps than necessary.
- Don’t factorize labelled data using factor(), as you lose the rich label information.
```{r}
var_label(data$inccat)
data %>%
  mutate(inccat = as_character(inccat),
         inccat = fct_relevel(inccat, "Under $25")) %>%
  ggplot() +
  geom_bar(aes(x = inccat, y = after_stat(prop), group = 1), fill = 'purple') +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Income categories in the demo data",
       subtitle = "Distribution of income",
       x = "Income category in thousands",
       y = "",
       caption = "")
var_label(data$carcat)
data %>%
  mutate(carcat = as_character(carcat),
         carcat = fct_relevel(carcat, c("Economy",
                                        "Standard",
                                        "Luxury"))) %>%
  ggplot() +
  geom_bar(aes(x = carcat, y = after_stat(prop), group = 1), fill = 'orange') +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Primary vehicle price category in the demo data",
       subtitle = "",
       x = "Primary vehicle price category",
       y = "",
       caption = "")
var_label(data$jobsat)
# If this prints the whole vector (as in the output below), val_labels() is probably
# coming from a masking package; labelled::val_labels() returns only the value labels
val_labels(data$jobsat)
data %>%
  ggplot() +
  geom_bar(aes(x = factor(jobsat), y = after_stat(prop), group = 1), fill = '#D35400') +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Job satisfaction in the demo data",
       subtitle = "",
       x = "Job satisfaction",
       y = "",
       # jobsat is coded 1 to 5, not 0 to 5
       caption = "1 = Highly dissatisfied, 5 = Highly satisfied")
```
[1] "Income category in thousands"
[1] "Primary vehicle price category"
[1] "Job satisfaction"
<labelled<double>[6400]>: Job satisfaction
[1] 5 4 3 1 2 2 2 1 5 4 3 5 2 1 4 5 1 4 3 5 3 3 3 3 4 3 1 5 5 4 1 4 1 2 5 1 1
[38] 3 4 3 3 2 5 2 3 1 2 3 2 3 2 5 4 2 2 1 3 3 3 3 5 3 2 2 1 1 4 3 4 2 5 5 4 5
[75] 2 3 5 5 5 3 3 4 4 2 2 5 4 3 4 4 5 4 1 3 5 1 4 2 2 2 3 3 4 3 3 2 5 4 1 4 4
[112] 2 2 2 1 4 1 1 1 5 3 2 5 1 4 3 2 2 1 3 2 1 1 2 1 3 1 3 4 1 4 3 1 5 1 4 2 2
[149] 2 5 2 2 2 4 4 5 2 2 1 2 3 4 3 1 2 3 4 2 2 2 2 1 3 4 4 1 4 4 4 2 1 4 3 2 2
[186] 3 4 5 5 5 3 2 1 5 5 2 1 3 5 3 3 5 3 4 2 4 2 1 5 2 5 3 5 4 5 3 4 2 4 4 1 1
[223] 2 2 4 1 1 5 1 3 2 1 4 5 1 3 4 4 5 5 5 5 4 2 2 4 1 4 5 5 5 4 2 5 5 1 4 5 1
[260] 5 2 4 5 4 1 3 5 5 2 3 5 1 4 5 5 4 3 4 2 4 3 3 4 3 1 2 5 2 2 3 2 5 4 2 4 3
[297] 2 2 1 1 4 1 2 4 5 5 3 1 2 4 5 4 4 4 4 5 3 4 1 5 1 4 4 3 4 1 1 4 5 5 2 3 3
[334] 3 3 2 1 5 3 2 1 4 4 3 3 3 1 5 1 4 1 5 5 1 3 5 3 5 5 3 2 2 3 4 4 5 4 5 2 1
[371] 5 5 2 3 5 2 3 3 3 3 3 4 2 1 5 4 4 3 2 1 3 5 4 3 2 4 4 4 1 4 4 1 1 1 1 5 3
[408] 3 4 4 4 2 3 1 2 2 4 4 3 5 3 3 3 5 4 3 4 5 2 3 4 4 4 1 4 2 3 2 3 5 3 5 1 2
[445] 1 5 4 1 5 5 2 5 3 4 4 5 3 4 2 4 4 4 1 3 3 5 2 3 2 3 4 3 3 4 2 2 1 3 5 4 3
[482] 3 1 3 4 1 1 1 3 5 4 2 2 4 1 4 5 3 3 1 2 4 1 3 4 5 3 3 5 4 2 3 1 2 4 3 2 4
[519] 5 3 4 5 1 3 4 2 1 5 2 1 4 2 1 5 4 3 4 2 5 4 1 3 5 4 1 2 3 2 4 2 2 4 2 4 3
[556] 2 1 1 1 3 1 4 2 4 2 4 3 1 5 5 4 2 3 4 3 1 1 4 1 1 1 1 2 3 3 4 3 1 4 4 2 5
[593] 4 2 5 4 5 4 2 2 3 4 4 4 3 3 4 5 5 2 3 1 5 5 4 3 2 1 3 5 5 3 3 1 3 4 3 3 2
[630] 3 1 5 5 4 3 1 4 2 2 2 3 1 5 3 5 1 2 4 2 3 2 2 2 4 2 3 2 2 5 2 4 3 2 4 2 4
[667] 4 1 2 3 1 5 1 4 5 3 3 2 4 4 3 3 2 5 1 4 2 5 4 2 4 1 1 4 1 2 3 2 2 5 5 3 4
[704] 5 4 5 1 5 3 4 2 2 3 3 1 2 1 3 3 3 2 2 2 3 2 4 2 5 1 5 2 4 1 4 4 4 3 2 5 4
[741] 4 2 4 4 3 2 3 4 2 5 3 1 2 3 5 5 1 3 5 2 3 4 5 5 2 2 4 3 5 2 4 4 5 3 1 3 5
[778] 1 5 5 2 2 5 4 2 2 2 3 1 3 5 3 1 3 4 2 5 3 5 5 3 2 3 4 1 5 5 1 4 1 2 2 1 2
[815] 2 5 2 3 5 2 3 2 4 2 1 3 3 2 5 3 5 3 1 3 5 4 1 3 1 5 4 2 4 3 1 1 1 4 5 3 3
[852] 1 5 2 2 4 4 1 1 2 5 4 1 4 4 2 5 1 2 4 3 3 2 1 3 5 4 2 3 3 3 2 3 3 1 4 1 3
[889] 2 3 1 3 2 1 3 5 4 3 3 1 5 3 4 4 2 2 3 5 3 5 3 5 3 4 4 3 3 3 1 1 1 4 3 5 4
[926] 2 2 3 2 5 3 1 3 5 3 2 4 1 3 5 5 1 5 5 2 4 5 4 4 2 3 3 2 1 3 5 3 2 5 4 2 2
[963] 5 3 4 3 1 1 1 1 5 4 5 5 2 5 5 2 3 4 4 4 4 4 3 1 2 5 4 4 3 4 1 4 4 5 3 3 1
[1000] 4 1 4 1 2 5 5 4 3 5 3 4 4 1 4 1 5 3 2 4 5 5 5 1 2 5 5 4 5 4 3 4 3 4 5 4 1
[1037] 5 4 4 2 2 3 1 1 2 4 3 3 5 4 4 2 2 3 2 3 2 3 5 3 5 1 2 4 1 1 3 2 5 3 3 4 5
[1074] 2 2 5 3 4 3 5 3 1 5 3 4 2 1 5 2 3 3 4 2 2 3 3 3 1 2 5 5 3 3 2 4 5 4 5 4 5
[1111] 1 4 1 2 4 3 2 5 1 3 1 1 2 3 5 3 3 1 3 1 5 3 4 2 1 3 1 2 2 5 5 4 5 1 1 4 5
[1148] 4 3 4 2 3 4 5 1 2 3 2 5 5 2 2 1 3 3 2 3 2 5 3 1 1 3 2 3 4 2 1 2 4 2 3 4 5
[1185] 3 1 4 4 5 4 4 1 5 5 5 4 2 2 5 5 2 1 2 5 2 1 4 2 4 1 5 3 1 1 2 1 2 4 4 4 1
[1222] 2 3 4 3 5 5 1 1 5 4 5 4 1 3 4 2 2 2 3 4 4 1 4 2 3 5 5 3 4 4 1 3 1 3 5 4 4
[1259] 2 1 4 5 5 2 5 2 5 4 3 1 4 5 5 4 1 4 5 5 3 1 3 4 4 2 5 3 4 5 2 2 2 4 3 2 1
[1296] 3 1 4 4 2 4 1 3 2 2 4 3 1 2 1 5 3 4 5 5 4 2 4 2 1 5 4 5 4 4 5 3 4 1 4 4 3
[1333] 3 3 4 5 5 1 3 3 4 4 3 4 1 3 1 4 4 1 1 2 5 4 2 2 2 5 2 2 3 1 3 3 2 2 4 5 2
[1370] 4 1 2 2 2 1 3 5 4 3 3 2 3 3 5 4 3 1 3 1 4 4 2 3 3 3 4 5 4 1 3 3 2 5 5 1 1
[1407] 2 2 1 2 5 3 4 4 3 2 1 4 2 2 5 4 3 3 4 1 1 5 2 1 4 2 4 1 5 5 2 4 1 1 2 3 2
[1444] 4 3 4 5 4 4 5 1 5 5 5 1 1 5 5 5 4 2 5 1 3 3 5 1 1 1 5 2 4 1 2 5 4 2 1 4 3
[1481] 1 3 5 3 2 3 5 2 1 3 2 4 4 1 5 1 3 2 5 5 5 4 4 5 2 5 4 3 1 1 1 5 3 5 2 5 2
[1518] 3 5 2 2 5 2 2 5 4 4 1 1 2 5 4 1 2 1 4 2 1 3 2 2 5 3 5 1 4 3 1 5 1 4 3 5 2
[1555] 3 1 4 2 4 4 2 2 2 5 2 1 1 2 1 4 4 2 3 5 1 4 3 3 3 5 2 3 1 5 4 5 5 2 5 2 2
[1592] 2 2 5 3 4 2 5 1 2 3 1 3 5 3 4 4 5 1 3 5 4 2 5 1 4 1 2 5 4 3 3 3 5 3 2 2 1
[1629] 1 3 4 2 4 3 2 5 4 1 1 4 4 5 5 3 1 5 4 4 2 2 3 5 4 2 2 3 5 5 2 2 1 1 2 4 2
[1666] 5 5 4 1 2 4 3 5 4 1 1 1 5 2 2 1 4 3 4 3 5 1 5 5 1 3 4 5 5 2 3 5 4 1 3 3 3
[1703] 2 2 3 5 3 3 1 1 2 5 1 5 3 2 2 3 2 5 2 3 4 1 2 4 4 4 1 5 2 5 2 2 5 1 5 1 2
[1740] 1 4 4 4 4 2 3 1 5 5 3 5 3 3 1 5 3 5 1 5 1 4 1 4 3 1 4 5 2 5 2 5 3 3 3 2 3
[1777] 2 2 1 4 4 1 4 2 4 4 1 3 4 2 3 3 3 1 1 5 3 2 3 4 1 2 3 3 5 4 2 5 1 4 2 5 2
[1814] 5 4 2 3 5 1 1 4 1 3 4 5 2 5 5 1 2 3 3 5 5 5 2 2 2 5 1 2 2 5 5 3 4 4 3 3 5
[1851] 1 5 3 2 1 2 4 2 4 2 2 1 1 5 5 1 2 4 2 2 2 1 1 1 5 5 5 1 2 1 2 5 3 5 3 5 3
[1888] 5 3 3 3 5 4 4 1 2 2 4 1 3 3 1 4 4 1 5 5 2 5 4 4 3 1 4 5 1 4 5 3 4 1 1 3 2
[1925] 5 4 5 1 1 2 4 1 1 5 4 3 5 2 4 3 5 3 4 4 3 1 5 4 1 3 2 5 4 4 4 2 3 1 4 3 1
[1962] 2 1 5 3 3 5 1 5 2 2 3 3 5 4 3 3 4 1 4 2 3 1 3 3 3 2 3 3 2 2 2 2 4 3 4 2 3
[1999] 4 2 3 2 1 5 3 1 4 4 3 4 3 5 4 4 1 1 2 2 4 3 5 3 2 4 1 4 2 2 1 5 2 5 5 4 3
[2036] 2 1 5 4 1 3 5 4 3 1 1 2 5 5 4 4 2 3 5 4 4 3 4 3 5 3 5 1 5 1 3 3 4 2 4 5 5
[2073] 5 1 5 5 1 3 3 4 2 5 4 3 1 5 2 4 3 3 5 2 3 3 1 5 4 5 3 5 4 4 3 3 5 1 1 3 3
[2110] 4 2 2 5 5 3 5 3 3 5 3 5 4 5 4 3 2 2 5 2 3 1 4 5 2 2 5 4 5 5 4 3 1 3 4 2 2
[2147] 4 2 4 1 3 4 3 2 3 5 3 2 3 4 2 2 5 2 3 2 4 2 4 2 2 2 5 4 5 5 1 4 3 5 4 1 4
[2184] 4 5 2 2 4 1 3 4 1 3 3 3 1 3 2 3 5 1 1 5 1 1 4 2 5 1 3 3 1 3 4 4 3 5 5 3 5
[2221] 4 2 2 1 5 1 3 2 2 1 5 1 1 1 3 1 1 5 5 5 2 1 2 3 4 1 3 2 2 4 3 3 4 5 5 5 1
[2258] 3 2 1 5 5 1 4 5 5 1 1 1 3 4 1 4 2 3 4 4 5 5 5 2 5 5 4 1 4 1 2 3 4 3 2 1 1
[2295] 3 2 5 3 1 4 2 5 4 5 2 1 1 5 5 5 5 5 1 5 1 2 2 1 4 3 5 1 2 3 1 5 5 1 1 5 5
[2332] 5 4 3 1 3 2 2 3 4 3 4 3 3 4 3 5 4 2 4 3 2 1 4 5 2 2 3 2 3 3 4 3 5 2 4 3 5
[2369] 3 4 4 4 1 5 4 3 5 3 4 2 5 4 1 5 4 3 3 2 5 2 3 4 3 4 4 2 4 5 3 2 2 1 3 4 1
[2406] 4 2 5 1 2 1 4 5 4 5 3 2 2 1 4 3 1 5 4 5 4 5 5 3 3 3 5 1 5 2 3 3 3 3 2 5 4
[2443] 4 4 3 2 3 1 5 3 1 5 5 5 1 1 4 1 4 4 1 1 1 4 1 3 5 4 2 5 4 3 2 3 4 2 2 1 1
[2480] 3 3 1 4 5 3 5 5 5 4 5 2 2 4 2 3 2 1 5 3 4 2 2 5 2 1 1 4 3 5 3 4 3 3 4 1 4
[2517] 2 3 2 4 4 2 1 2 5 3 2 1 1 2 3 2 3 1 5 3 2 3 2 4 1 3 3 5 3 2 1 1 1 3 2 4 2
[2554] 3 5 3 3 1 5 2 3 4 3 3 1 1 4 2 3 1 4 1 3 4 4 1 5 4 4 1 4 4 1 2 1 1 4 2 3 5
[2591] 1 5 4 3 2 5 3 2 4 4 5 4 1 3 1 1 2 3 1 5 4 5 4 5 4 2 3 4 3 5 4 4 2 2 3 1 5
[2628] 5 2 2 3 5 3 1 2 5 2 4 1 1 3 2 3 5 2 3 2 1 4 3 1 4 3 1 3 4 4 4 3 3 4 2 3 5
[2665] 3 5 5 2 2 5 4 2 3 4 1 3 4 3 1 3 1 5 4 4 2 1 5 2 4 3 2 1 5 5 5 3 1 2 4 2 3
[2702] 2 2 5 1 3 5 4 2 1 3 3 4 3 4 4 1 4 2 1 5 1 4 1 4 5 2 5 2 4 2 1 4 2 1 4 2 1
[2739] 3 5 3 3 2 2 5 5 4 1 1 3 4 3 5 1 5 3 2 5 1 4 1 4 1 1 5 3 5 2 5 4 2 1 2 1 1
[2776] 2 4 5 2 3 2 1 1 4 4 5 1 5 3 5 5 3 3 4 1 4 3 2 2 4 1 4 3 3 2 3 4 2 2 2 3 5
[2813] 1 3 5 4 1 3 2 5 4 5 4 3 3 1 5 1 2 3 1 3 4 1 3 3 4 2 4 5 4 4 3 4 2 2 5 4 2
[2850] 2 3 3 3 2 3 3 5 3 2 4 2 2 2 3 2 3 3 3 3 1 5 3 1 2 4 1 1 1 2 5 1 5 4 2 1 3
[2887] 3 3 1 4 4 2 3 3 3 4 5 2 5 2 3 4 3 2 5 3 1 1 3 5 5 3 3 3 4 4 4 5 5 2 1 5 5
[2924] 1 1 3 4 1 4 2 5 5 4 2 4 3 4 2 1 2 4 3 2 1 2 1 4 3 5 1 4 4 3 1 2 5 1 2 4 5
[2961] 5 3 2 1 4 2 5 3 1 2 3 4 1 1 4 4 4 2 2 3 3 5 2 2 2 1 5 3 3 3 2 4 5 4 2 2 5
[2998] 4 1 1 2 5 4 3 3 4 5 3 1 2 4 3 4 5 5 3 5 2 4 2 3 2 5 4 1 4 4 3 1 5 1 1 5 3
[3035] 5 2 3 3 2 5 5 4 2 4 3 1 1 1 3 5 2 3 3 1 5 2 4 1 3 1 1 1 4 1 2 2 3 1 3 1 5
[3072] 1 5 2 4 4 3 1 3 2 5 2 5 4 3 3 2 5 1 5 4 1 1 1 5 4 3 1 3 3 3 5 3 2 3 5 4 1
[3109] 1 4 3 1 3 4 3 5 5 5 2 5 4 4 4 3 2 3 4 1 4 5 3 2 5 2 4 5 2 1 1 3 2 4 2 2 4
[3146] 3 4 1 5 5 5 4 1 4 2 4 2 1 4 5 5 2 2 2 3 4 5 3 2 1 4 5 5 3 3 4 3 3 2 1 3 2
[3183] 4 2 4 3 4 2 2 3 3 5 4 3 4 3 5 4 3 1 4 2 2 3 5 5 1 2 3 4 1 4 4 2 3 5 1 1 3
[3220] 3 4 4 5 2 3 1 5 1 4 2 5 1 4 1 2 1 2 5 1 2 3 4 3 1 4 3 3 3 4 3 3 4 1 3 1 4
[3257] 2 2 2 4 4 5 1 4 1 2 4 2 2 3 5 3 3 4 3 5 2 2 4 5 1 5 1 1 5 1 2 4 4 4 5 5 1
[3294] 1 3 3 4 1 4 4 2 1 3 5 2 5 2 2 5 2 3 2 4 1 3 2 4 3 2 4 2 2 1 3 2 5 2 2 2 4
[3331] 3 2 1 4 3 2 2 3 3 3 2 3 4 1 2 4 1 2 2 4 3 2 2 2 4 4 4 2 5 3 4 2 3 4 2 1 3
[3368] 2 4 3 3 4 4 5 4 1 3 1 3 5 2 4 4 5 2 3 5 3 3 5 1 4 3 1 4 3 4 5 3 1 1 2 5 5
[3405] 5 5 1 1 3 2 3 1 3 2 4 4 4 3 1 5 4 1 1 3 3 1 4 4 2 3 3 2 4 2 3 5 5 4 4 4 4
[3442] 3 2 2 2 2 5 3 5 1 5 2 5 4 3 1 1 3 4 3 1 5 5 4 2 4 1 2 1 3 5 2 4 5 3 1 2 4
[3479] 5 4 4 4 4 4 5 3 5 1 4 4 5 4 1 4 4 4 3 5 1 1 2 3 1 1 5 5 5 2 5 1 5 5 2 1 5
[3516] 1 4 3 4 4 4 2 5 4 1 3 1 5 3 1 3 3 2 5 4 4 1 4 3 3 3 3 1 1 3 3 5 2 4 5 4 4
[3553] 2 5 1 3 2 4 2 4 3 4 1 2 5 5 3 4 4 1 1 4 4 1 3 4 1 2 3 5 2 1 5 4 2 4 3 2 1
[3590] 4 2 3 3 5 3 2 5 1 5 5 2 4 3 3 1 5 5 4 2 1 4 1 3 4 4 2 4 5 1 4 2 5 5 4 2 3
[3627] 5 1 3 2 2 4 1 3 3 4 2 1 5 3 2 2 3 3 4 3 1 4 5 5 3 4 1 4 1 3 1 2 2 5 4 4 3
[3664] 4 4 5 4 5 4 5 4 4 5 1 5 5 5 4 3 4 2 1 1 5 3 1 2 5 5 4 1 1 1 1 1 5 2 3 4 2
[3701] 4 1 1 3 5 2 5 4 2 2 4 5 5 5 4 3 5 4 1 3 4 2 5 2 1 5 4 5 5 3 3 5 1 3 1 2 2
[3738] 1 3 1 2 5 3 3 4 2 1 5 4 5 2 3 3 1 4 1 4 4 3 2 5 2 4 2 4 5 2 1 5 2 5 4 4 2
[3775] 2 3 1 2 5 5 4 3 3 4 2 3 1 5 1 4 3 4 5 4 1 3 4 2 5 1 3 3 4 2 4 5 2 3 4 5 1
[3812] 2 3 3 3 1 4 3 5 3 4 3 1 4 1 1 3 2 4 3 5 1 3 2 4 5 2 2 3 2 2 3 3 3 2 4 1 5
[3849] 2 2 3 3 4 4 1 4 1 4 4 1 1 1 4 2 3 4 3 4 2 4 1 1 3 1 4 1 2 5 4 2 2 4 3 4 1
[3886] 2 5 1 1 1 5 4 4 1 4 4 5 5 5 3 5 5 2 4 2 1 5 3 4 5 4 1 2 2 2 3 2 3 5 3 2 5
[3923] 3 2 4 5 4 4 4 3 4 5 5 5 3 3 5 2 1 2 3 5 5 4 2 3 3 2 1 1 1 5 1 1 4 3 2 2 1
[3960] 2 2 1 3 3 2 3 4 4 3 4 3 1 2 4 5 3 4 4 1 3 1 1 5 4 5 4 4 2 1 1 3 5 4 2 5 1
[3997] 3 4 5 3 2 5 5 3 4 4 4 3 5 4 1 2 2 5 3 4 1 3 3 4 5 5 4 1 3 4 3 4 3 1 5 1 3
[4034] 5 2 2 2 5 2 4 2 1 1 5 5 3 5 5 1 3 5 4 5 4 5 2 5 1 4 1 1 1 2 5 1 4 1 3 3 5
[4071] 1 3 1 1 5 5 1 4 4 4 4 2 2 1 3 5 2 4 3 3 2 1 1 4 3 4 3 5 2 3 1 5 3 4 2 5 3
[4108] 5 3 2 4 1 3 3 5 1 2 5 4 3 1 2 3 5 5 3 3 3 3 5 2 2 2 4 2 3 2 1 3 3 4 5 4 4
[4145] 3 2 5 2 2 3 3 2 4 5 2 2 2 4 4 4 4 2 4 2 3 1 3 1 1 3 4 5 5 3 4 2 2 1 3 2 3
[4182] 2 4 5 3 1 3 4 4 4 4 4 2 5 1 1 4 2 3 4 2 4 5 3 2 4 4 5 4 5 4 2 2 2 4 3 5 5
[4219] 3 1 1 5 3 3 5 4 3 3 4 1 2 4 1 3 2 3 2 5 1 4 2 3 4 1 4 1 5 4 2 3 2 5 4 5 4
[4256] 1 1 2 1 2 1 4 5 1 5 5 4 5 5 3 5 1 3 1 3 5 4 4 1 5 3 2 3 1 2 2 3 4 1 5 5 2
[4293] 3 2 3 3 5 4 4 2 2 3 1 3 4 3 1 3 1 5 5 1 3 4 2 4 5 4 3 5 3 3 2 2 2 1 5 4 1
[4330] 2 1 4 1 4 4 5 5 1 1 4 2 2 3 3 2 4 5 1 2 1 2 5 3 4 1 3 2 4 1 3 5 1 4 5 5 1
[4367] 3 2 1 1 3 5 5 4 2 3 2 3 4 1 3 4 5 1 3 5 2 4 5 4 2 1 3 5 3 2 3 4 1 5 2 2 2
[4404] 3 5 5 2 1 4 2 4 4 5 5 4 2 3 1 5 4 3 5 3 2 1 4 1 4 4 3 2 1 5 4 1 4 5 4 5 5
[4441] 2 5 3 4 3 3 1 1 5 1 2 2 4 4 2 1 1 3 4 1 4 5 4 1 3 5 3 4 5 1 4 1 2 3 4 2 3
[4478] 2 5 5 3 5 4 1 4 2 3 5 1 4 1 4 3 1 5 2 4 3 1 2 2 3 1 3 3 5 2 1 3 2 5 4 5 3
[4515] 2 2 4 5 5 3 5 2 2 2 5 1 1 4 3 5 2 4 4 5 5 4 5 2 1 3 4 5 1 3 3 4 2 2 3 5 3
[4552] 2 2 3 5 1 4 5 3 3 1 5 5 4 2 2 3 3 2 5 5 1 2 5 4 4 3 2 1 2 5 2 4 2 2 1 2 3
[4589] 5 4 3 3 1 4 4 1 4 2 4 5 2 5 3 5 1 5 3 1 2 1 2 3 3 2 4 4 3 4 4 2 1 5 4 3 2
[4626] 4 5 5 1 5 1 4 1 2 3 4 3 1 4 4 5 4 4 3 2 3 3 5 2 3 4 5 5 4 1 3 3 2 3 1 1 4
[4663] 2 3 1 4 4 1 4 5 5 5 4 3 3 4 3 4 4 3 5 5 3 4 3 1 4 2 5 1 4 4 4 1 4 2 1 5 1
[4700] 2 5 4 4 4 4 3 5 3 5 4 2 5 5 3 4 4 5 5 5 3 2 4 3 2 5 3 3 3 5 3 4 2 3 2 3 4
[4737] 3 5 4 3 3 2 3 4 1 4 3 3 4 5 4 5 4 3 1 2 5 2 2 1 2 3 2 5 5 3 4 1 3 5 5 4 5
[4774] 4 3 5 3 4 3 1 4 2 3 5 5 2 2 4 3 3 4 2 5 3 2 4 3 3 4 1 4 1 3 5 5 5 4 2 4 1
[4811] 5 2 2 3 3 4 4 5 5 1 3 3 5 1 1 3 2 5 1 1 2 5 4 2 2 2 4 2 2 5 5 4 5 4 2 3 4
[4848] 2 3 1 4 2 1 1 1 2 3 2 3 3 5 3 1 3 5 4 3 3 5 3 2 2 1 2 3 2 4 1 1 3 1 5 1 2
[4885] 4 5 4 2 1 4 2 4 3 2 4 3 2 5 4 5 4 3 4 5 1 3 2 5 4 4 2 3 4 2 5 3 4 3 4 2 1
[4922] 1 4 4 2 3 4 3 5 5 2 2 4 4 4 5 4 1 4 3 1 4 1 3 1 4 3 5 4 5 4 4 1 3 1 3 3 4
[4959] 5 4 3 5 4 3 4 4 1 2 1 3 2 1 2 1 4 1 3 2 5 3 2 4 5 3 3 2 2 2 2 4 4 4 1 2 5
[4996] 4 4 1 2 5 1 2 3 2 3 1 5 5 2 3 1 4 3 2 2 3 1 2 4 4 4 5 5 4 1 1 1 3 2 4 4 3
[5033] 4 4 3 5 5 1 3 3 5 2 5 3 1 4 1 1 2 3 4 2 4 3 4 5 3 2 4 5 2 3 5 5 2 5 3 5 4
[5070] 4 3 3 2 2 4 1 2 4 5 3 4 4 4 1 4 3 1 4 3 5 1 2 2 2 3 2 2 4 1 3 5 3 4 4 3 2
[5107] 3 1 2 1 1 1 3 3 2 3 5 3 4 2 4 4 1 3 5 5 1 2 1 1 2 2 4 2 4 2 1 2 2 4 2 2 3
[5144] 4 2 4 2 1 1 3 1 5 3 1 3 2 5 3 4 3 5 4 5 2 2 4 3 4 2 1 2 2 5 5 5 4 4 4 2 4
[5181] 3 2 2 5 5 5 3 3 2 4 5 2 3 4 3 5 4 2 2 4 5 1 3 4 1 2 2 4 2 5 4 3 4 1 3 5 2
[5218] 3 2 5 4 5 1 1 2 5 4 3 2 2 1 1 5 1 1 3 5 4 5 3 3 3 3 4 5 3 5 4 1 5 5 2 1 1
[5255] 1 5 4 2 2 3 4 3 3 5 2 5 4 5 5 4 4 2 5 2 3 4 1 4 3 1 5 2 5 5 2 4 1 2 1 2 1
[5292] 1 3 5 2 3 2 1 5 4 4 2 1 4 2 3 1 1 5 3 2 1 5 3 3 3 3 1 5 4 3 1 5 5 5 5 1 5
[5329] 5 4 2 3 3 5 4 5 5 3 2 1 3 2 2 2 3 2 4 3 3 2 2 3 4 3 4 4 5 1 2 3 5 3 5 3 3
[5366] 5 1 3 1 5 4 2 5 3 2 4 4 4 2 4 2 5 1 4 2 5 3 2 5 5 5 4 5 1 2 2 2 4 3 1 3 5
[5403] 5 4 2 1 1 4 1 3 2 5 2 1 4 1 2 2 2 2 3 5 2 5 3 1 2 4 1 3 5 4 5 5 2 3 3 4 4
[5440] 1 4 5 4 1 5 5 4 3 4 5 5 3 2 1 2 3 3 2 1 5 2 5 3 3 5 3 3 3 4 4 5 4 4 2 2 5
[5477] 4 5 2 2 5 3 1 1 2 2 3 4 2 4 2 1 3 5 5 5 1 2 4 1 4 1 1 3 1 2 3 3 1 1 4 4 1
[5514] 3 2 2 3 5 5 3 5 3 5 1 3 4 2 3 3 4 5 4 5 1 4 3 1 3 3 3 5 5 2 4 1 5 5 5 3 1
[5551] 5 5 2 2 4 4 5 2 5 1 3 3 4 2 5 1 4 5 4 1 3 3 5 5 5 3 3 5 1 1 3 5 1 5 1 3 4
[5588] 5 4 1 4 3 4 5 4 3 4 1 2 1 3 2 1 2 1 2 4 4 1 1 4 5 1 4 5 3 4 2 2 4 3 5 2 3
[5625] 1 3 3 5 2 4 5 3 2 1 4 3 3 3 3 4 3 3 1 1 4 1 1 2 5 3 4 5 4 1 2 4 2 4 1 4 1
[5662] 4 3 2 5 2 5 1 4 2 3 3 1 2 5 5 2 3 3 1 2 4 3 2 2 4 5 4 2 3 2 2 4 2 3 4 5 3
[5699] 4 1 3 5 4 5 4 2 4 2 2 1 4 5 3 4 4 4 5 2 4 2 2 3 1 2 3 2 4 1 2 1 4 2 5 2 2
[5736] 1 3 3 3 5 3 4 4 1 4 3 4 2 3 3 2 1 1 1 1 2 4 3 3 3 5 4 1 3 3 3 1 1 5 4 2 1
[5773] 1 4 1 5 4 3 3 4 3 3 1 3 3 2 5 2 4 3 5 1 4 3 3 5 4 4 2 4 4 4 1 4 5 5 5 2 2
[5810] 3 5 1 5 3 3 3 1 4 3 2 4 3 2 4 3 5 3 4 5 5 1 4 5 5 3 3 3 4 2 3 1 2 2 3 2 3
[5847] 3 4 3 1 1 2 3 4 5 4 2 5 4 3 2 4 3 2 1 4 3 2 5 1 5 4 1 5 4 4 2 4 1 4 3 1 4
[5884] 3 3 4 1 3 2 1 5 2 4 4 3 3 5 2 3 3 3 2 5 3 3 5 2 1 5 4 3 2 2 2 5 1 5 4 2 3
[5921] 3 1 1 3 4 3 2 3 2 4 3 5 2 3 1 2 4 3 2 4 2 3 1 5 3 2 4 5 5 4 2 5 4 2 4 5 2
[5958] 5 2 3 4 1 3 1 2 1 4 3 1 2 5 4 3 5 3 5 4 2 2 4 1 4 1 1 5 5 2 3 3 4 3 2 4 5
[5995] 1 2 2 3 2 4 3 2 3 5 4 2 3 4 4 1 3 5 3 4 5 2 3 4 1 4 2 5 4 3 4 3 3 4 5 5 2
[6032] 3 4 5 3 5 5 3 2 4 5 1 4 2 3 5 3 4 1 1 5 5 4 5 4 3 2 3 1 5 5 4 2 5 5 1 1 1
[6069] 4 1 3 3 5 5 2 1 1 1 4 4 4 5 1 4 4 3 4 3 4 5 5 2 4 3 4 1 2 1 3 3 2 3 4 3 1
[6106] 1 5 2 2 3 2 2 5 3 2 1 3 5 5 2 4 4 2 4 5 4 2 2 3 2 4 1 5 3 5 2 2 2 3 4 3 1
[6143] 5 4 4 2 5 4 4 4 4 5 5 2 5 3 1 4 2 2 4 3 3 1 1 4 3 2 1 4 5 3 4 5 5 5 3 3 4
[6180] 1 5 2 4 4 5 2 1 1 2 2 4 1 2 4 3 5 5 3 3 4 5 4 1 1 4 4 2 1 1 2 5 1 5 5 2 1
[6217] 4 3 2 4 3 2 3 3 2 2 5 3 2 1 1 4 4 3 4 1 4 5 4 2 3 1 4 4 3 1 1 1 1 3 5 3 1
[6254] 2 1 5 4 3 4 3 2 2 4 3 1 5 1 4 4 1 3 4 4 3 2 4 1 4 1 5 1 4 4 5 3 4 3 5 3 4
[6291] 3 2 2 2 3 2 2 4 3 4 1 5 5 2 5 5 3 5 4 4 3 2 2 4 2 3 2 1 2 2 3 1 2 2 5 3 2
[6328] 3 2 3 2 4 5 3 1 5 2 5 4 2 4 1 3 3 3 4 1 2 5 1 4 4 5 1 3 3 4 1 5 4 2 4 4 3
[6365] 4 4 5 3 1 1 5 3 2 4 5 2 2 2 4 1 5 4 2 3 5 4 3 5 1 4 2 4 3 2 5 1 1 2 5 3
Labels:
value label
1 Highly dissatisfied
2 Somewhat dissatisfied
3 Neutral
4 Somewhat satisfied
5 Highly satisfied
11.8.2 Efficient way
- To make use of the value labels stored with the data, convert each labelled variable with haven::as_factor() before plotting.
```{r old}
theme_set(theme_minimal())

# Bar chart of counts, with the value labels as categories
data %>%
  ggplot(aes(haven::as_factor(marital))) +
  geom_bar(fill='blue')

# Get the variable label to reuse it as an axis title or subtitle
var_label(data$marital)
data %>%
  ggplot(aes(x=haven::as_factor(marital), y = after_stat(prop), group = 1)) +
  geom_bar(fill = 'blue') +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Marital Status in the demo data",
       subtitle = "The number of married participants is about the same as\nthe number of unmarried participants",
       x = "Marital Status",
       y = "")

var_label(data$inccat)
data %>%
  ggplot() +
  geom_bar(aes(x= haven::as_factor(inccat), y = after_stat(prop), group = 1), fill = 'purple') +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Income categories in the demo data",
       subtitle = "Distribution of income",
       x = "Income category in thousands",
       y = "",
       caption = ""
  )

var_label(data$carcat)
data %>%
  ggplot() +
  geom_bar(aes(x= haven::as_factor(carcat),
               y = after_stat(prop), group = 1), fill = 'orange') +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Primary vehicle price category in the demo data",
       subtitle = "",
       x = "Primary vehicle price category",
       y = "",
       caption = ""
  )

var_label(data$jobsat)
val_labels(data$jobsat)
data %>%
  ggplot() +
  geom_bar(aes(x= haven::as_factor(jobsat),
               y = after_stat(prop), group = 1), fill = '#D35400') +
  scale_y_continuous(labels = scales::percent) +
  coord_flip() +
  labs(title = "Job satisfaction in the demo data",
       #subtitle = "",
       x = "Job satisfaction",
       y = "",
       caption = ""
  )
```
[1] "Marital status"
[1] "Income category in thousands"
[1] "Primary vehicle price category"
[1] "Job satisfaction"
<labelled<double>[6400]>: Job satisfaction
[1] 5 4 3 1 2 2 2 1 5 4 3 5 2 1 4 5 1 4 3 5 3 3 3 3 4 3 1 5 5 4 1 4 1 2 5 1 1
[38] 3 4 3 3 2 5 2 3 1 2 3 2 3 2 5 4 2 2 1 3 3 3 3 5 3 2 2 1 1 4 3 4 2 5 5 4 5
[75] 2 3 5 5 5 3 3 4 4 2 2 5 4 3 4 4 5 4 1 3 5 1 4 2 2 2 3 3 4 3 3 2 5 4 1 4 4
[... output truncated; the vector has 6,400 values ...]
Labels:
value label
1 Highly dissatisfied
2 Somewhat dissatisfied
3 Neutral
4 Somewhat satisfied
5 Highly satisfied
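To see in isolation what haven::as_factor() does with value labels, here is a minimal sketch on a toy vector. The object names jobsat_toy and jobsat_fct are made up for illustration; the six values are the first six printed above and the labels are copied from the Labels table.

```r
library(haven)

# Hypothetical toy vector; values and labels mirror the jobsat output above
jobsat_toy <- labelled(
  c(5, 4, 3, 1, 2, 2),
  labels = c(
    "Highly dissatisfied"   = 1,
    "Somewhat dissatisfied" = 2,
    "Neutral"               = 3,
    "Somewhat satisfied"    = 4,
    "Highly satisfied"      = 5
  ),
  label = "Job satisfaction"
)

# as_factor() replaces each code with its label; levels follow the value codes
jobsat_fct <- as_factor(jobsat_toy)
levels(jobsat_fct)
table(jobsat_fct)

# as.numeric() keeps only the codes, so the label text is no longer available
as.numeric(jobsat_toy)
```

Because the factor levels are ordered by the underlying codes, a bar chart built from jobsat_fct should show the categories in their natural order without any manual releveling.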
11.9 Visualizing Dichotomous Data
11.9.1 Visualizing with Labels
- Use haven::as_factor() from the haven package.
```{r}
# Convert multiple labelled variables to factors that carry the value labels
data %>%
  select(starts_with("own")) %>%
  mutate(across(where(haven::is.labelled), haven::as_factor)) %>% # from haven package
  head()

# Long data
owns_long_fct <- data %>%
  select(starts_with("own")) %>%
  mutate(across(where(haven::is.labelled), haven::as_factor)) %>%
  pivot_longer(everything(), names_to = "owns", values_to = "measures")
owns_long_fct

# Dodged bar plot
owns_long_fct %>%
  count(owns, measures) %>%
  ggplot(aes(x = n, y = measures, fill = owns)) +
  geom_col(position = "dodge") +
  coord_flip() +
  scale_x_continuous(labels = scales::comma) +
  labs(title = "Electronics Ownership Status",
       x = "Number of participants",
       y = "Ownership Status",
       fill = "Product Ownership")

# facet_wrap bar plot
owns_long_fct %>%
  count(owns, measures) %>%
  ggplot(aes(x = n, y = measures, fill = owns)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(. ~ owns) +
  scale_x_continuous(labels = scales::comma) +
  labs(title = "Electronics Ownership Status",
       x = "Number of participants",
       y = "",
       fill = "Product Ownership")

# facet_grid bar plot
owns_long_fct %>%
  count(owns, measures) %>%
  ggplot(aes(x = n, y = measures, fill = owns)) +
  geom_col(show.legend = FALSE) +
  facet_grid(owns ~ .) +
  scale_x_continuous(labels = scales::comma) +
  labs(title = "Electronics Ownership Status",
       x = "Number of participants",
       y = "",
       fill = "Product Ownership")
```
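The reshaping step in the chunk above can be tried on a small toy table. This is only a sketch: the tibble toy, its two columns, and the "No"/"Yes" coding (0/1) are assumptions made for illustration and are not taken from the demo data.

```r
library(dplyr)
library(tidyr)
library(haven)

# Hypothetical value labels and toy data (not the demo data)
yes_no <- c("No" = 0, "Yes" = 1)
toy <- tibble(
  owntv = labelled(c(1, 1, 0, 1), labels = yes_no),
  ownpc = labelled(c(0, 1, 0, 0), labels = yes_no)
)

# Convert every labelled column to a factor, then stack the columns
toy_long <- toy |>
  mutate(across(where(is.labelled), as_factor)) |>
  pivot_longer(everything(), names_to = "owns", values_to = "measures")

# One row per product and ownership status, ready for geom_col()
counts <- toy_long |>
  count(owns, measures)
counts
```

Converting before pivoting means the measures column already holds "No" and "Yes", so the plot axes need no extra relabelling.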
11.9.2 Visualizing with Doubles (Not recommended)
```{r data viz with values}
# Long data
owns_long_num <- data |>
  select(starts_with("own")) |>
  mutate(across(where(is.labelled), as.numeric)) |>
  pivot_longer(everything(), names_to = "owns", values_to = "measures")
owns_long_num

# Dodged bar plot
owns_long_num %>%
  count(owns, measures) %>%
  ggplot(aes(x = n, y = factor(measures), fill = owns)) +
  geom_col(position = "dodge") +
  coord_flip() +
  scale_x_continuous(labels = scales::comma) +
  labs(title = "Electronics Ownership Status",
       x = "Number of participants",
       y = "Ownership Status",
       fill = "Product Ownership")

# facet_wrap bar plot
owns_long_num %>%
  count(owns, measures) %>%
  ggplot(aes(x = n, y = factor(measures), fill = owns)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(. ~ owns) +
  scale_x_continuous(labels = scales::comma) +
  labs(title = "Ownership Status",
       x = "Number of participants",
       y = "Ownership Status",
       fill = "Product Ownership")

# facet_grid bar plot
owns_long_num %>%
  count(owns, measures) %>%
  ggplot(aes(x = n, y = factor(measures), fill = owns)) +
  geom_col(show.legend = FALSE) +
  facet_grid(owns ~ .) +
  scale_x_continuous(labels = scales::comma) +
  labs(title = "Ownership Status",
       x = "Number of participants",
       y = "Ownership Status",
       fill = "Product Ownership")
```
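11.10 Visualizing Plots with Summary Statistics
There are two ways to plot summary statistics: summarize the wide data and then reshape it, or reshape to long data and then summarize.
11.10.1 From wide data
- Summarize first, then reshape the data.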
```{r}
length(data)     # number of columns
n <- nrow(data)  # number of participants
n
data %>%
  summarize(across(starts_with("own"),
                   .fns = list(prop = mean, sd = sd),
                   .names = "{.col}_{.fn}")) %>%
  pivot_longer(everything(),
               names_to = c("set", ".value"),
               names_pattern = "(.+)_(.+)") %>%
  mutate(set = fct_reorder(set, prop)) %>%
  ggplot(aes(set, prop, fill = set)) +
  geom_col() +
  geom_text(aes(label = round(prop, 2))) +
  # n is not a column here; it is the sample size defined above
  geom_errorbar(aes(x = set,
                    ymin = prop-(1.96*sd/sqrt(n)),
                    ymax = prop+(1.96*sd/sqrt(n))),
                width = .5, color = "orange", linewidth = 1.0, alpha = 0.7) +
  scale_y_continuous(labels = percent) +
  coord_flip() +
  labs(x = "Electronics",
       y = "Proportions of those who own the said electronics",
       title = "Electronics Ownership Status",
       subtitle = "Percentages among participants") +
  guides(fill = "none")
```
[1] 29
[1] 6400
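The reshaping in the chunk above relies on the ".value" sentinel of pivot_longer(): the part of each column name matched by the second capture group becomes a new column name. A minimal sketch on toy data (the tibble toy and its 0/1 values are hypothetical):

```r
library(dplyr)
library(tidyr)

# Hypothetical 0/1 ownership indicators
toy <- tibble(
  owntv = c(1, 1, 0, 1),
  ownpc = c(0, 1, 0, 0)
)

# Step 1: one row of summaries, named <column>_<statistic>
wide <- toy |>
  summarize(across(everything(),
                   .fns = list(prop = mean, sd = sd),
                   .names = "{.col}_{.fn}"))
names(wide)

# Step 2: split each name at the underscore; the first piece goes to `set`,
# the second piece (".value") becomes the column names prop and sd
long <- wide |>
  pivot_longer(everything(),
               names_to = c("set", ".value"),
               names_pattern = "(.+)_(.+)")
long
```

The result has one row per product with prop and sd as separate columns, which is the shape geom_col() and geom_errorbar() expect.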
11.10.2 From long data (recommended)
- Reshape the data first, then summarize.
```{r}
theme_set(theme_minimal())
# Uses the numeric long data so that mean() returns the proportion of owners
owns_long_num %>%
  group_by(owns) %>%
  summarize(prop = mean(measures),
            se = sqrt(prop*(1-prop)/n()),
            n = n()) %>%
  mutate(owns = fct_reorder(owns, prop)) %>%
  ggplot(aes(owns, prop, fill = owns)) +
  geom_col() +
  geom_text(aes(label = round(prop, 2))) +
  geom_errorbar(aes(x = owns,
                    ymin = prop-(1.96*se),
                    ymax = prop+(1.96*se)),
                width = .5, color = "orange", linewidth = 1.0, alpha = 0.7) +
  scale_y_continuous(labels = percent) +
  coord_flip() +
  labs(x = "Electronics",
       y = "Proportions of those who own the said electronics",
       title = "Electronics Ownership Status",
       subtitle = "Percentages among participants") +
  guides(fill = "none")
```
11.10.2.1 Alt.
```{r}
theme_set(theme_minimal())
# Same plot, with the standard error computed from the sample standard deviation
owns_long_num %>%
  group_by(owns) %>%
  summarize(prop = mean(measures),
            sd = sd(measures),
            n = n(),
            se = sd/sqrt(n) ) %>%
  mutate(owns = fct_reorder(owns, prop)) %>%
  ggplot(aes(owns, prop, fill = owns)) +
  geom_col() +
  geom_text(aes(label = round(prop, 2))) +
  geom_errorbar(aes(x = owns,
                    ymin = prop-(1.96*se),
                    ymax = prop+(1.96*se)),
                width = .5, color = "orange", linewidth = 1.0, alpha = 0.5) +
  scale_y_continuous(labels = percent) +
  coord_flip() +
  labs(x = "Electronics",
       y = "Proportions of those who own the said electronics",
       title = "Electronics Ownership Status",
       subtitle = "Percentages among participants") +
  guides(fill = "none")
```
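The two chunks above compute the standard error in two ways: sqrt(prop*(1-prop)/n) and sd/sqrt(n). For a 0/1 variable these differ only in the denominator, because sd() divides by n - 1. A small base R sketch, using a made-up indicator vector x, shows the relationship:

```r
# Hypothetical 0/1 ownership indicator
x <- c(1, 1, 0, 1, 0, 1, 1, 0)
n <- length(x)
p <- mean(x)

# Standard error from the binomial formula
se_binom <- sqrt(p * (1 - p) / n)

# Standard error from the sample standard deviation
se_sd <- sd(x) / sqrt(n)

# For 0/1 data, sd(x)^2 equals n * p * (1 - p) / (n - 1),
# so se_sd is the binomial formula with n - 1 in the denominator
se_check <- sqrt(p * (1 - p) / (n - 1))

c(se_binom = se_binom, se_sd = se_sd, se_check = se_check)
```

With a sample of 6,400 participants the two versions should be nearly identical, so either choice gives essentially the same error bars.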
12 Appendix
12.1 Common operations with labelled data
Reference: https://community.rstudio.com/t/leveraging-labelled-data-in-r-r-views-submission/114983
I primarily use three packages for working with labelled data: haven, labelled, and sjlabelled. These three packages do have some overlap in functionality, in addition to naming schemes that differ but achieve the same objective (e.g., haven::as_factor vs sjlabelled::as_label), or naming schemes that are the same but achieve different objectives (e.g., haven::as_factor vs sjlabelled::as_factor).
To compound confusion, the concept of a label can refer to either variable or value labels. Frequently, plural function names refer to value labels, as in haven::zap_labels or labelled::remove_val_labels.
Here are operations I commonly perform on labelled data:
- Evaluate if variable is of class haven_labelled.
  - Why? Troubleshooting, exploring, mutating.
  - Function(s): haven::is.labelled()
- Convert haven_labelled variable to numeric value codes.
  - Why? To treat the variable as continuous for analysis.
    - For example, if a 1-7 rating scale imports as labelled and you want to compute a mean.
  - Function(s):
    - base::as.numeric() (strips variable of all metadata),
    - haven::zap_labels() and labelled::remove_val_labels() (remove value labels, retain other metadata)
- Convert haven_labelled variable to factor with value labels.
  - Why? To treat the variable as categorical for analysis.
  - Function(s):
    - haven::as_factor(),
    - labelled::to_factor(),
    - sjlabelled::as_label().
    - As far as I can tell, these three functions have the same result. By default, the factor levels are ordered by value codes.
- Convert variable label to variable name.
  - Why? For more informative or readable variable names.
  - Function(s):
    - sjlabelled::label_to_colnames()
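The first three operations in the list can be sketched with haven alone. The vector agree_toy is a hypothetical 1-7 rating item with labels only at the endpoints; it is not part of the demo data.

```r
library(haven)

# Hypothetical 1-7 rating item, labelled only at the endpoints
agree_toy <- labelled(
  c(1, 4, 7, 6),
  labels = c("Strongly disagree" = 1, "Strongly agree" = 7),
  label = "Agreement rating"
)

# 1. Check the class
is.labelled(agree_toy)

# 2. Treat as continuous: drop the value labels, keep the numeric codes
codes <- zap_labels(agree_toy)
is.labelled(codes)
mean(codes)

# 3. Treat as categorical: labels where they exist, codes otherwise
agree_fct <- as_factor(agree_toy)
levels(agree_fct)
```

Which conversion to use depends on the analysis: a mean only makes sense on the codes, while counts and bar charts read better with the labels.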
12.2 Joseph Larmarange reference
- Reference: “Introduction to labelled,” by Joseph Larmarange: https://cran.r-project.org/web/packages/labelled/vignettes/intro_labelled.html
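```{r practice}
library(labelled)
class(iris)
head(iris)
# Attach a variable label to a single column
var_label(iris$Sepal.Length) <- "Length of sepal"
head(iris)
# Attach labels to several columns at once with a named list
var_label(iris) <- list(Petal.Length = "Length of petal", Petal.Width = "Width of Petal")
var_label(iris$Petal.Length)
var_label(iris)
# Assigning NULL removes a variable label
var_label(iris$Sepal.Length) <- NULL
view(iris)
# List the variables together with their labels
look_for(iris)
```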
[1] "data.frame"
[1] "Length of petal"
$Sepal.Length
[1] "Length of sepal"
$Sepal.Width
NULL
$Petal.Length
[1] "Length of petal"
$Petal.Width
[1] "Width of Petal"
$Species
NULL
12.3 Converting labelled data to unlabelled data
- Comparison of various methods
```{r eval=FALSE}
data_unlabelled <- data %>%
  mutate(
    f1 = haven::as_factor(carcat),
    f2 = labelled::to_factor(carcat),
    f3 = sjlabelled::as_label(carcat),
    f4 = sjlabelled::as_factor(carcat),
    n1 = base::as.numeric(carcat),
    n2 = sjlabelled::as_numeric(carcat),
    n3 = haven::zap_labels(carcat),
    n4 = labelled::remove_val_labels(carcat)
  ) %>%
  dplyr::select(age, f1, f2, f3, f4, n1, n2, n3, n4)
data_unlabelled
```
12.4 IMDB Lowest Rated Movies
12.4.1 Movie titles
```{r}
library(rvest)
url <- "https://www.imdb.com/chart/bottom?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=4da9d9a5-d299-43f2-9c53-"
# Movie titles: narrow the selection step by step
movie <-
  read_html(url) |>
  html_elements("li") |>
  html_elements(".ipc-metadata-list-summary-item__tc") |>
  html_elements(".ipc-title__text") |>
  html_text()
# Alternatively, use a single descendant selector
read_html(url) |>
  html_elements("li .ipc-metadata-list-summary-item__tc .ipc-title__text") |>
  html_text()
```
[1] "1. Disaster Movie" "2. Manos: The Hands of Fate"
[3] "3. Birdemic: Shock and Terror" "4. Superbabies: Baby Geniuses 2"
[5] "5. Kirk Cameron's Saving Christmas" "6. The Hottie & the Nottie"
[7] "7. House of the Dead" "8. Son of the Mask"
[9] "9. Radhe" "10. Epic Movie"
[11] "11. Pledge This!" "12. Battlefield Earth"
[13] "13. Alone in the Dark" "14. Dragonball Evolution"
[15] "15. Race 3" "16. Foodfight!"
[17] "17. Going Overboard" "18. From Justin to Kelly"
[19] "19. Turks in Space" "20. Meet the Spartans"
[21] "21. Gigli" "22. Daniel the Wizard"
[23] "23. Date Movie" "24. Cats"
[25] "25. Baby Geniuses"
12.4.2 Ratings
12.4.2.1 Alt.
```{r}
# Ratings
rate <-
  read_html(url) |>
  html_elements(".ipc-rating-star.ipc-rating-star--base.ipc-rating-star--imdb.ratingGroup--imdb-rating") |>
  html_text()
class(rate)
#rate <- rate[!grepl("Rate", rate)]
#rating <- str_extract(rate, "\\d\\.\\d\\s(?=\\(\\d+K\\))") # e.g., 1.5 (32K)
rating <-
  tibble(rate) |>
  mutate(rate = str_extract(rate, "\\d\\.\\d\\s(?=\\(\\d+K\\))")) # lookahead: keep the rating only when it is followed by a vote count
rating
```
[1] "character"
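The lookahead in the pattern above can be tested without scraping. The sketch below uses a made-up vector rate_toy whose format follows the comment in the chunk ("1.5 (32K)"); the variant pattern, which moves the whitespace inside the lookahead so that the match has no trailing space, is a suggestion and not the pattern used above.

```r
library(stringr)

# Hypothetical scraped strings; "Rate" stands in for a non-rating element
rate_toy <- c("1.5 (32K)", "2.1 (181K)", "Rate")

# Pattern from the chunk above: the match includes the whitespace after the rating
with_space <- str_extract(rate_toy, "\\d\\.\\d\\s(?=\\(\\d+K\\))")

# Variant: the whitespace is part of the lookahead, so only the digits are returned
digits_only <- str_extract(rate_toy, "\\d\\.\\d(?=\\s\\(\\d+K\\))")

with_space
digits_only
as.numeric(digits_only)
```

Strings that do not fit the pattern return NA, which makes it easy to spot and drop elements that are not ratings.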
12.4.3 Vote counts
- CSS class used for the vote counts: ipc-rating-star--voteCount
```{r}
# Vote counts
vote_counts <-
  read_html(url) |>
  html_elements("li") |>
  html_elements(".ipc-rating-star--voteCount") |>
  html_text(trim = TRUE)
vote_counts
# Alternatively, use a single descendant selector
read_html(url) |>
  html_elements("li .ipc-metadata-list-summary-item__tc .ipc-rating-star--voteCount") |>
  html_text(trim = TRUE)
```
[1] "(96K)" "(38K)" "(26K)" "(32K)" "(17K)" "(39K)" "(39K)" "(61K)"
[9] "(181K)" "(111K)" "(19K)" "(84K)" "(48K)" "(83K)" "(49K)" "(12K)"
[17] "(15K)" "(27K)" "(17K)" "(113K)" "(51K)" "(15K)" "(63K)" "(58K)"
[25] "(28K)"
[1] "(96K)" "(38K)" "(26K)" "(32K)" "(17K)" "(39K)" "(39K)" "(61K)"
[9] "(181K)" "(111K)" "(19K)" "(84K)" "(48K)" "(83K)" "(49K)" "(12K)"
[17] "(15K)" "(27K)" "(17K)" "(113K)" "(51K)" "(15K)" "(63K)" "(58K)"
[25] "(28K)"
12.4.4 Creating a data frame
```{r}
# Create a data frame
# `ratings` is assumed to be a vector with one rating per movie, taken from the ratings step above
worst_movies <-
  tibble(movie, ratings, vote_counts)
worst_movies
worst_movies <-
  worst_movies |>
  mutate(vote_counts = str_extract(vote_counts, "\\d+K")) |>
  mutate(movies = str_extract(movie, "(?<=\\d\\.\\s).+")) |> # lookbehind: keep the text preceded by the rank, e.g. "1. "
  select(-movie) |>
  relocate(movies, ratings)
# Sort by the ratings column of the data frame
sorted_worst_movies <-
  worst_movies |>
  arrange(ratings)
sorted_worst_movies
```
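The cleaning steps in this chunk can be checked offline on a two-row toy table. The titles and vote counts below are copied from the output shown earlier; the ratings values are hypothetical placeholders, and converting the vote counts to numbers is an optional extra step that the chunk above does not perform.

```r
library(dplyr)
library(stringr)

# Toy input; the ratings are made-up placeholder values
toy_movies <- tibble(
  movie       = c("10. Epic Movie", "1. Disaster Movie"),
  ratings     = c("2.4", "2.1"),
  vote_counts = c("(111K)", "(96K)")
)

cleaned <- toy_movies |>
  mutate(
    # Lookbehind: keep everything preceded by "<digit>. "
    movies  = str_extract(movie, "(?<=\\d\\.\\s).+"),
    # Optional: turn "(96K)" into 96000 so the column can be sorted numerically
    votes   = as.numeric(str_extract(vote_counts, "\\d+")) * 1000,
    ratings = as.numeric(ratings)
  ) |>
  select(movies, ratings, votes) |>
  arrange(ratings)
cleaned
```

Storing ratings and votes as numbers matters for sorting: as character strings, "10.0" would sort before "2.1".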
12.5 Selecting descendants (children, children’s children, etc.)
```{r}
# Read the page once and reuse it
page <- 'https://en.wikipedia.org/wiki/Hyperlink' %>%
  read_html()
# All links on the page
page %>%
  html_elements('a') %>%
  length()
# Links that are descendants of the element with id "content"
page %>%
  html_elements('#content a') %>%
  length()
# Same result: select the parent first, then its descendant links
page %>%
  html_element('#content') %>%
  html_elements('a') %>%
  length()
```
[1] 517
[1] 454
[1] 454
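The same comparison can be reproduced without a network connection by building a tiny page with rvest::minimal_html(). The HTML snippet and the object name page_toy are made up for illustration.

```r
library(rvest)

# Hypothetical page: two links inside #content (one nested in <p>), one outside
page_toy <- minimal_html('
  <div id="content">
    <p><a href="#a">A</a></p>
    <a href="#b">B</a>
  </div>
  <a href="#c">C</a>
')

# All links on the page
n_all <- page_toy |> html_elements("a") |> length()

# Descendant selector: links at any depth inside #content
n_desc <- page_toy |> html_elements("#content a") |> length()

# Equivalent two-step version: select the parent, then search within it
n_chain <- page_toy |> html_element("#content") |> html_elements("a") |> length()

c(all = n_all, descendant = n_desc, chained = n_chain)
```

The space in "#content a" matches links at any depth, which is why the link nested inside the paragraph is counted along with the direct child.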
13 Further resources and references
How to link GitHub and RStudio: https://www.cpp.edu/cba/customer-insights-lab/news/event/install-git.shtml
David Robinson’s screencast: https://www.youtube.com/watch?v=NoUHdrailxA
“Conversion semantics,” by Hadley Wickham: https://haven.tidyverse.org/articles/semantics.html
“Leveraging labelled data in R: Embracing SPSS, SAS, and Stata data sets with the haven, labelled, and sjlabelled packages”: https://community.rstudio.com/t/leveraging-labelled-data-in-r-r-views-submission/114983
“Introduction to labelled,” by Joseph Larmarange: https://cran.r-project.org/web/packages/labelled/vignettes/intro_labelled.html
kableExtra vignette: https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html
Kasper Welbers’s “Web Scraping with rvest”: https://github.com/ccs-amsterdam/r-course-material/blob/master/tutorials/rvest.md