M09-1-Principle: Data Import and Export

Author

Jae Jung

Published

April 4, 2025

Note

The lecture note is available at my quarto-pub site.

1 Overview

1.1 Expected Learning Outcomes

  1. Explain how to create a GitHub repository and collaborate with others on the same R project.
  2. Load and explore built-in datasets in R.
  3. Import various types of data (csv, xlsx, and SPSS) into RStudio.
  4. Scrape data from the web using SelectorGadget, rvest, and the inspect function of web browsers.
  5. Import multiple data sets and work with them.
  6. Work with labelled data in R.
  7. Export/save output data to a local PC and push it to GitHub.

1.2 The textbook chapters to cover:

  • Ch07: Data Import
  • Ch20: Spreadsheets
  • Ch24: Web Scraping
  • Ch18: Missing values

2 Installing and loading up R Packages

```{r Load up packages}
library(tidyverse)
library(sjPlot)
```

3 Reading built-in Data

```{r r datasets}
# Look up all datasets in the loaded package

#data()

# Getting information about mtcars dataset.
#help(mtcars)
head(mtcars)
```
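As a small sketch of how to look through built-in data programmatically, the block below lists the datasets shipped with the base `datasets` package and inspects the structure of `mtcars`. It uses only base R; the object name `ds` is just an illustrative choice.

```r
# data() returns an object whose $results matrix has one row per dataset
ds <- data(package = "datasets")$results
head(ds[, c("Item", "Title")])   # dataset names and short descriptions

# Compact overview of one dataset: dimensions, column types, first values
str(mtcars)
dim(mtcars)
```

`str()` is often a quicker first look than `head()` because it shows every column's type in one screen.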

4 Importing Data

4.1 csv files

4.1.1 Using base R

```{r utils}
#?read.csv
stu_per_base <- read.csv("data/StudentsPerformance.csv") # Not as good as read_csv

class(stu_per_base)
head(stu_per_base)
```
[1] "data.frame"
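One practical difference between `read.csv()` and readr's `read_csv()` is how column names with spaces are handled. The sketch below writes a tiny, hypothetical file that mimics the column style of StudentsPerformance.csv (the values are made up for illustration) and reads it both ways.

```r
library(readr)

# Hypothetical two-row file with spaces in the column names
tmp <- tempfile(fileext = ".csv")
writeLines(c("gender,math score,reading score",
             "female,72,72",
             "male,47,57"), tmp)

base_df  <- read.csv(tmp)                          # base R: returns a data.frame
readr_df <- read_csv(tmp, show_col_types = FALSE)  # readr: returns a tibble

names(base_df)   # spaces become dots because check.names = TRUE by default
names(readr_df)  # original names are kept, so backticks are needed to refer to them
```

If the base R behavior is not wanted, `read.csv(tmp, check.names = FALSE)` keeps the original names as well.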

4.1.2 Using readr package

```{r readr package}
stu_per_readr <- read_csv("data/StudentsPerformance.csv") # from now on, prefer read_csv() over read.csv()

spec(stu_per_readr)
class(stu_per_readr)
head(stu_per_readr)

skimr::skim(stu_per_readr) #  better than summary() function

stu_per_readr$`math score` # backticks: what RStudio's autocomplete inserts
stu_per_readr$"math score" # double quotes: same result
stu_per_readr$'math score' # single quotes: same result
```
cols(
  gender = col_character(),
  `race/ethnicity` = col_character(),
  `parental level of education` = col_character(),
  lunch = col_character(),
  `test preparation course` = col_character(),
  `math score` = col_double(),
  `reading score` = col_double(),
  `writing score` = col_double()
)
[1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame" 
Data summary
Name stu_per_readr
Number of rows 1000
Number of columns 8
_______________________
Column type frequency:
character 5
numeric 3
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
gender 0 1 4 6 0 2 0
race/ethnicity 0 1 7 7 0 5 0
parental level of education 0 1 11 18 0 6 0
lunch 0 1 8 12 0 2 0
test preparation course 0 1 4 9 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
math score 0 1 66.09 15.16 0 57.00 66 77 100 ▁▁▅▇▃
reading score 0 1 69.17 14.60 17 59.00 70 79 100 ▁▂▆▇▃
writing score 0 1 68.05 15.20 10 57.75 69 79 100 ▁▂▅▇▃
   [1]  72  69  90  47  76  71  88  40  64  38  58  40  65  78  50  69  88  18
  [19]  46  54  66  65  44  69  74  73  69  67  70  62  69  63  56  40  97  81
  [37]  74  50  75  57  55  58  53  59  50  65  55  66  57  82  53  77  53  88
  [55]  71  33  82  52  58   0  79  39  62  69  59  67  45  60  61  39  58  63
  [73]  41  61  49  44  30  80  61  62  47  49  50  72  42  73  76  71  58  73
  [91]  65  27  71  43  79  78  65  63  58  65  79  68  85  60  98  58  87  66
 [109]  52  70  77  62  54  51  99  84  75  78  51  55  79  91  88  63  83  87
 [127]  72  65  82  51  89  53  87  75  74  58  51  70  59  71  76  59  42  57
 [145]  88  22  88  73  68 100  62  77  59  54  62  70  66  60  61  66  82  75
 [163]  49  52  81  96  53  58  68  67  72  94  79  63  43  81  46  71  52  97
 [181]  62  46  50  65  45  65  80  62  48  77  66  76  62  77  69  61  59  55
 [199]  45  78  67  65  69  57  59  74  82  81  74  58  80  35  42  60  87  84
 [217]  83  34  66  61  56  87  55  86  52  45  72  57  68  88  76  46  67  92
 [235]  83  80  63  64  54  84  73  80  56  59  75  85  89  58  65  68  47  71
 [253]  60  80  54  62  64  78  70  65  64  79  44  99  76  59  63  69  88  71
 [271]  69  58  47  65  88  83  85  59  65  73  53  45  73  70  37  81  97  67
 [289]  88  77  76  86  63  65  78  67  46  71  40  90  81  56  67  80  74  69
 [307]  99  51  53  49  73  66  67  68  59  71  77  83  63  56  67  75  71  43
 [325]  41  82  61  28  82  41  71  47  62  90  83  61  76  49  24  35  58  61
 [343]  69  67  79  72  62  77  75  87  52  66  63  46  59  61  63  42  59  80
 [361]  58  85  52  27  59  49  69  61  44  73  84  45  74  82  59  46  80  85
 [379]  71  66  80  87  79  38  38  67  64  57  62  73  73  77  76  57  65  48
 [397]  50  85  74  60  59  53  49  88  54  63  65  82  52  87  70  84  71  63
 [415]  51  84  71  74  68  57  82  57  47  59  41  62  86  69  65  68  64  61
 [433]  61  47  73  50  75  75  70  89  67  78  59  73  79  67  69  86  47  81
 [451]  64 100  65  65  53  37  79  53 100  72  53  54  71  77  75  84  26  72
 [469]  77  91  83  63  68  59  90  71  76  80  55  76  73  52  68  59  49  70
 [487]  61  60  64  79  65  64  83  81  54  68  54  59  66  76  74  94  63  95
 [505]  40  82  68  55  79  86  76  64  62  54  77  76  74  66  66  67  71  91
 [523]  69  54  53  68  56  36  29  62  68  47  62  79  73  66  51  51  85  97
 [541]  75  79  81  82  64  78  92  72  62  79  79  87  40  77  53  32  55  61
 [559]  53  73  74  63  96  63  48  48  92  61  63  68  71  91  53  50  74  40
 [577]  61  81  48  53  81  77  63  73  69  65  55  44  54  48  58  71  68  74
 [595]  92  56  30  53  69  65  54  29  76  60  84  75  85  40  61  58  69  58
 [613]  94  65  82  60  37  88  95  65  35  62  58 100  61 100  69  61  49  44
 [631]  67  79  66  75  84  71  67  80  86  76  41  74  72  74  70  65  59  64
 [649]  50  69  51  68  85  65  73  62  77  69  43  90  74  73  55  65  80  50
 [667]  63  77  73  81  66  52  69  65  69  50  73  70  81  63  67  60  62  29
 [685]  62  94  85  77  53  93  49  73  66  77  49  79  75  59  57  66  79  57
 [703]  87  63  59  62  46  66  89  42  93  80  98  81  60  76  73  96  76  91
 [721]  62  55  74  50  47  81  65  68  73  53  68  55  87  55  53  67  92  53
 [739]  81  61  80  37  81  59  55  72  69  69  50  87  71  68  79  77  58  84
 [757]  55  70  52  69  53  48  78  62  60  74  58  76  68  58  52  75  52  62
 [775]  66  49  66  35  72  94  46  77  76  52  91  32  72  19  68  52  48  60
 [793]  66  89  42  57  70  70  69  52  67  76  87  82  73  75  64  41  90  59
 [811]  51  45  54  87  72  94  45  61  60  77  85  78  49  71  48  62  56  65
 [829]  69  68  61  74  64  77  58  60  73  75  58  66  39  64  23  74  40  90
 [847]  91  64  59  80  71  61  87  82  62  97  75  65  52  87  53  81  39  71
 [865]  97  82  59  61  78  49  59  70  82  90  43  80  81  57  59  64  63  71
 [883]  64  55  51  62  93  54  69  44  86  85  50  88  59  32  36  63  67  65
 [901]  85  73  34  93  67  88  57  79  67  70  50  69  52  47  46  68 100  44
 [919]  57  91  69  35  72  54  74  74  64  65  46  48  67  62  61  70  98  70
 [937]  67  57  85  77  72  78  81  61  58  54  82  49  49  57  94  75  74  58
 [955]  62  72  84  92  45  75  56  48 100  65  72  62  66  63  68  75  89  78
 [973]  53  49  54  64  60  62  55  91   8  81  79  78  74  57  40  81  44  67
 [991]  86  65  55  62  63  88  62  59  68  77

4.2 URL

```{r}
url_perf <- read.csv("https://raw.githubusercontent.com/jaejungca/R-data-import-export/refs/heads/main/data/StudentsPerformance.csv")

url_perf <- read_csv("https://raw.githubusercontent.com/jaejungca/R-data-import-export/refs/heads/main/data/StudentsPerformance.csv")

url_perf
```

4.3 Importing Excel data with readxl package

```{r readxl}
library(readxl)

sp_excel <- read_excel("data/StudentsPerformance.xlsx") 

#?read_excel

sp_excel
skimr::skim(sp_excel)
```
Data summary
Name sp_excel
Number of rows 1000
Number of columns 8
_______________________
Column type frequency:
character 5
numeric 3
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
gender 0 1 4 6 0 2 0
race/ethnicity 0 1 7 7 0 5 0
parental level of education 0 1 11 18 0 6 0
lunch 0 1 8 12 0 2 0
test preparation course 0 1 4 9 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
math score 0 1 66.09 15.16 0 57.00 66 77 100 ▁▁▅▇▃
reading score 0 1 69.17 14.60 17 59.00 70 79 100 ▁▂▆▇▃
writing score 0 1 68.05 15.20 10 57.75 69 79 100 ▁▂▅▇▃
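Excel workbooks often contain several sheets, so it can help to list them before importing. The following is a minimal sketch that uses the example workbook bundled with readxl (`readxl_example("datasets.xlsx")`) rather than the course file, so it runs without any local data.

```r
library(readxl)

# Example workbook that ships with the readxl package
path <- readxl_example("datasets.xlsx")

excel_sheets(path)                     # names of all sheets in the workbook

# Read one sheet by name and limit the number of data rows
mt <- read_excel(path, sheet = "mtcars", n_max = 5)

# Read only a rectangular cell range (header row plus three data rows)
iris_part <- read_excel(path, sheet = "iris", range = "A1:C4")
```

The same `sheet` and `range` arguments should apply to `data/StudentsPerformance.xlsx` if that workbook has more than one sheet.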

4.4 SPSS data with haven package

  • labelled data
    • SPSS: read_sav()
    • SAS: read_sas()
    • Stata: read_stata()
```{r haven import}
library(haven)

demo <- read_sav("data/demo.sav") # from the haven package; reads the data as a tibble.
class(demo)
typeof(demo)

demo

view_df(demo)
```
[1] "tbl_df"     "tbl"        "data.frame"
[1] "list"
Data frame: demo

ID | Name | Label | Values | Value Labels
1 | age | Age in years | range: 18-77 |
2 | marital | Marital status | 0, 1 | Unmarried, Married
3 | address | Years at current address | range: 0-56 |
4 | income | Household income in thousands | range: 9-1116 |
5 | inccat | Income category in thousands | 1, 2, 3, 4 | Under $25, $25 - $49, $50 - $74, $75+
6 | car | Price of primary vehicle | range: 4.2-99.9 |
7 | carcat | Primary vehicle price category | 1, 2, 3 | Economy, Standard, Luxury
8 | ed | Level of education | 1, 2, 3, 4, 5 | Did not complete high school, High school degree, Some college, College degree, Post-undergraduate degree
9 | employ | Years with current employer | range: 0-57 |
10 | retire | Retired | 0, 1 | No, Yes
11 | empcat | Years with current employer | 1, 2, 3 | Less than 5, 5 to 15, More than 15
12 | jobsat | Job satisfaction | 1, 2, 3, 4, 5 | Highly dissatisfied, Somewhat dissatisfied, Neutral, Somewhat satisfied, Highly satisfied
13 | gender | Gender | f, m | <output omitted>
14 | reside | Number of people in household | range: 1-9 |
15 | wireless | Wireless service | 0, 1 | No, Yes
16 | multline | Multiple lines | 0, 1 | No, Yes
17 | voice | Voice mail | 0, 1 | No, Yes
18 | pager | Paging service | 0, 1 | No, Yes
19 | internet | Internet | 0, 1, 8, 9 | No, Yes, Does not know, No Answer
20 | callid | Caller ID | 0, 1 | No, Yes
21 | callwait | Call waiting | 0, 1 | No, Yes
22 | owntv | Owns TV | 0, 1 | No, Yes
23 | ownvcr | Owns VCR | 0, 1 | No, Yes
24 | owncd | Owns stereo/CD player | 0, 1 | No, Yes
25 | ownpda | Owns PDA | 0, 1 | No, Yes
26 | ownpc | Owns computer | 0, 1 | No, Yes
27 | ownfax | Owns fax machine | 0, 1 | No, Yes
28 | news | Newspaper subscription | 0, 1 | Yes, No
29 | response | Response | 0, 1 | Yes, No
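To make the idea of labelled data concrete, here is a minimal sketch that builds a small labelled vector by hand with haven and converts it. The vector `marital` and its values are hypothetical stand-ins, not the column from demo.sav.

```r
library(haven)

# Hypothetical labelled vector: numeric codes plus value labels and a variable label
marital <- labelled(
  c(0, 1, 1, 0),
  labels = c(Unmarried = 0, Married = 1),
  label  = "Marital status"
)

attr(marital, "label")    # variable label
attr(marital, "labels")   # value labels (named numeric vector)

as_factor(marital)        # replace codes with their labels (factor)
zap_labels(marital)       # drop the value labels and keep the plain numeric codes
```

Converting with `as_factor()` is usually convenient before plotting or tabulating, while `zap_labels()` suits cases where only the numeric codes are needed.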

4.5 Import from clipboard

Go to the following site, open the stock price page, and copy the data table to the clipboard.

4.5.1 For Windows users

```{r read.table}
#| error: true
stocks <- read.delim("clipboard")
head(stocks)
tail(stocks)
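#?read.delim # read.delim() is in the utils package (a wrapper around read.table())

# readr treats "clipboard" as a file path, so this call fails (see the error below);
# with readr, readr::read_delim(readr::clipboard()) reads the clipboard instead
readr::read_delim("clipboard")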
```
Error: 'clipboard' does not exist in current working directory ('C:/Users/jmjung/OneDrive - Cal Poly Pomona/Documents/Work/Jaemin/4. Teaching/DWV101/Course Content/M09-Import_Export/Principles').

4.5.2 For Mac OS users

```{r}
#| error: true
stocks2 <- read.delim(pipe("pbpaste")) # pbpaste is the Mac equivalent of clipboard on Windows.
```
Error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input
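The "no lines available in input" error above typically means the clipboard was empty when the chunk ran. When clipboard access is unreliable (for example, while rendering a document), one workaround is to paste the copied text into a string and parse it with the `text` argument. This is a sketch only; the tab-separated values below are made up for illustration.

```r
# Hypothetical tab-separated text, as it might look after copying a small table
pasted <- "Date\tOpen\tClose\n2025-04-01\t10.5\t11.0\n2025-04-02\t11.0\t10.8"

# read.delim() passes `text` on to read.table(), so no file or clipboard is needed
stocks_demo <- read.delim(text = pasted)

stocks_demo
str(stocks_demo)
```

This keeps the example reproducible, since the data travel with the script instead of depending on whatever is currently on the clipboard.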

5 Data Cleaning & Exploratory Data Analysis

5.1 Cleaning

```{r cleaning stocks}
# examine and clean data
head(stocks)
class(stocks)

stocks <- as_tibble(stocks)
head(stocks)

glimpse(stocks)
skimr::skim(stocks)
head(stocks)

colnames(stocks)[5:6] <- c("Close", "Adj_Close")
head(stocks)

# clean names
library(janitor)
stocks <- stocks |> 
  clean_names()

stocks
```
[1] "data.frame"
Rows: 0
Columns: 1
$ M09.1.Principle.Data_Import_Export.qmd <lgl> 
Data summary
Name stocks
Number of rows 0
Number of columns 1
_______________________
Column type frequency:
logical 1
________________________
Group variables None

Variable type: logical

skim_variable n_missing complete_rate mean count
M09.1.Principle.Data_Import_Export.qmd 0 NaN NaN :
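Because the clipboard contents vary from run to run, the effect of `clean_names()` is easier to see on a small, fixed example. The data frame below is hypothetical; only the column names matter.

```r
library(janitor)

# Hypothetical data with spreadsheet-style column names
raw <- data.frame(a = c(10.1, 10.4), b = c(10.0, 10.2))
names(raw) <- c("Open Price", "Adj Close")

cleaned <- clean_names(raw)   # lower-case, snake_case names

names(raw)
names(cleaned)
```

After cleaning, the columns can be referenced without backticks, for example `cleaned$adj_close`.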

5.2 Exploratory Data Analysis (EDA)

5.2.1 Automated EDA packages/functions

```{r diamonds data}
glimpse(diamonds)

skimr::skim(diamonds)

diamonds %>% 
  group_by(cut) %>% 
  skimr::skim()
```
Rows: 53,940
Columns: 10
$ carat   <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23, 0.…
$ cut     <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, Ver…
$ color   <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, J, J, I,…
$ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI1, VS1, …
$ depth   <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4, 64…
$ table   <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, 62, 58…
$ price   <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 340, 34…
$ x       <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.00, 4.…
$ y       <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05, 4.…
$ z       <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39, 2.…
Data summary
Name diamonds
Number of rows 53940
Number of columns 10
_______________________
Column type frequency:
factor 3
numeric 7
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
cut 0 1 TRUE 5 Ide: 21551, Pre: 13791, Ver: 12082, Goo: 4906
color 0 1 TRUE 7 G: 11292, E: 9797, F: 9542, H: 8304
clarity 0 1 TRUE 8 SI1: 13065, VS2: 12258, SI2: 9194, VS1: 8171

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
carat 0 1 0.80 0.47 0.2 0.40 0.70 1.04 5.01 ▇▂▁▁▁
depth 0 1 61.75 1.43 43.0 61.00 61.80 62.50 79.00 ▁▁▇▁▁
table 0 1 57.46 2.23 43.0 56.00 57.00 59.00 95.00 ▁▇▁▁▁
price 0 1 3932.80 3989.44 326.0 950.00 2401.00 5324.25 18823.00 ▇▂▁▁▁
x 0 1 5.73 1.12 0.0 4.71 5.70 6.54 10.74 ▁▁▇▃▁
y 0 1 5.73 1.14 0.0 4.72 5.71 6.54 58.90 ▇▁▁▁▁
z 0 1 3.54 0.71 0.0 2.91 3.53 4.04 31.80 ▇▁▁▁▁
Data summary
Name Piped data
Number of rows 53940
Number of columns 10
_______________________
Column type frequency:
factor 2
numeric 7
________________________
Group variables cut

Variable type: factor

skim_variable cut n_missing complete_rate ordered n_unique top_counts
color Fair 0 1 TRUE 7 G: 314, F: 312, H: 303, E: 224
color Good 0 1 TRUE 7 E: 933, F: 909, G: 871, H: 702
color Very Good 0 1 TRUE 7 E: 2400, G: 2299, F: 2164, H: 1824
color Premium 0 1 TRUE 7 G: 2924, H: 2360, E: 2337, F: 2331
color Ideal 0 1 TRUE 7 G: 4884, E: 3903, F: 3826, H: 3115
clarity Fair 0 1 TRUE 8 SI2: 466, SI1: 408, VS2: 261, I1: 210
clarity Good 0 1 TRUE 8 SI1: 1560, SI2: 1081, VS2: 978, VS1: 648
clarity Very Good 0 1 TRUE 8 SI1: 3240, VS2: 2591, SI2: 2100, VS1: 1775
clarity Premium 0 1 TRUE 8 SI1: 3575, VS2: 3357, SI2: 2949, VS1: 1989
clarity Ideal 0 1 TRUE 8 VS2: 5071, SI1: 4282, VS1: 3589, VVS: 2606

Variable type: numeric

skim_variable cut n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
carat Fair 0 1 1.05 0.52 0.22 0.70 1.00 1.20 5.01 ▇▂▁▁▁
carat Good 0 1 0.85 0.45 0.23 0.50 0.82 1.01 3.01 ▇▆▂▁▁
carat Very Good 0 1 0.81 0.46 0.20 0.41 0.71 1.02 4.00 ▇▃▁▁▁
carat Premium 0 1 0.89 0.52 0.20 0.41 0.86 1.20 4.01 ▇▆▁▁▁
carat Ideal 0 1 0.70 0.43 0.20 0.35 0.54 1.01 3.50 ▇▂▁▁▁
depth Fair 0 1 64.04 3.64 43.00 64.40 65.00 65.90 79.00 ▁▁▃▇▁
depth Good 0 1 62.37 2.17 54.30 61.30 63.40 63.80 67.00 ▁▂▂▇▁
depth Very Good 0 1 61.82 1.38 56.80 60.90 62.10 62.90 64.90 ▁▂▅▇▂
depth Premium 0 1 61.26 1.16 58.00 60.50 61.40 62.20 63.00 ▁▃▆▇▇
depth Ideal 0 1 61.71 0.72 43.00 61.30 61.80 62.20 66.70 ▁▁▁▇▆
table Fair 0 1 59.05 3.95 49.00 56.00 58.00 61.00 95.00 ▇▆▁▁▁
table Good 0 1 58.69 2.85 51.00 56.00 58.00 61.00 66.00 ▁▇▇▅▂
table Very Good 0 1 57.96 2.12 44.00 56.00 58.00 59.00 66.00 ▁▁▆▇▁
table Premium 0 1 58.75 1.48 51.00 58.00 59.00 60.00 62.00 ▁▁▁▇▃
table Ideal 0 1 55.95 1.25 43.00 55.00 56.00 57.00 63.00 ▁▁▅▇▁
price Fair 0 1 4358.76 3560.39 337.00 2050.25 3282.00 5205.50 18574.00 ▇▃▁▁▁
price Good 0 1 3928.86 3681.59 327.00 1145.00 3050.50 5028.00 18788.00 ▇▃▁▁▁
price Very Good 0 1 3981.76 3935.86 336.00 912.00 2648.00 5372.75 18818.00 ▇▃▁▁▁
price Premium 0 1 4584.26 4349.20 326.00 1046.00 3185.00 6296.00 18823.00 ▇▃▁▁▁
price Ideal 0 1 3457.54 3808.40 326.00 878.00 1810.00 4678.50 18806.00 ▇▂▁▁▁
x Fair 0 1 6.25 0.96 0.00 5.63 6.18 6.70 10.74 ▁▁▇▃▁
x Good 0 1 5.84 1.06 0.00 5.02 5.98 6.42 9.44 ▁▁▆▇▁
x Very Good 0 1 5.74 1.10 0.00 4.75 5.74 6.47 10.01 ▁▁▇▆▁
x Premium 0 1 5.97 1.19 0.00 4.80 6.11 6.80 10.14 ▁▁▇▇▁
x Ideal 0 1 5.51 1.06 0.00 4.54 5.25 6.44 9.65 ▁▁▇▃▁
y Fair 0 1 6.18 0.96 0.00 5.57 6.10 6.64 10.54 ▁▁▇▃▁
y Good 0 1 5.85 1.05 0.00 5.02 5.99 6.44 9.38 ▁▁▆▇▁
y Very Good 0 1 5.77 1.10 0.00 4.77 5.77 6.51 9.94 ▁▁▇▆▁
y Premium 0 1 5.94 1.26 0.00 4.79 6.06 6.76 58.90 ▇▁▁▁▁
y Ideal 0 1 5.52 1.07 0.00 4.55 5.26 6.44 31.80 ▇▃▁▁▁
z Fair 0 1 3.98 0.65 0.00 3.61 3.97 4.28 6.98 ▁▁▇▃▁
z Good 0 1 3.64 0.65 0.00 3.07 3.70 4.03 5.79 ▁▁▆▇▁
z Very Good 0 1 3.56 0.73 0.00 2.95 3.56 4.02 31.80 ▇▁▁▁▁
z Premium 0 1 3.65 0.73 0.00 2.94 3.72 4.16 8.06 ▁▅▇▁▁
z Ideal 0 1 3.40 0.66 0.00 2.80 3.23 3.98 6.03 ▁▁▇▃▁

5.2.2 Summarize and visualize the data with group_by() and summarize()

```{r Read_perf}
stu_per_readr %>% 
  group_by(gender) %>% 
  summarize(mean_math = mean(`math score`),
            sd_math = sd(`math score`),
            mean_read = mean(`reading score`),
            sd_read = sd(`reading score`),
            corr_math_read = cor(`math score`, `reading score`)
            )

stu_per_readr %>% 
  ggplot(aes(x = `math score`, y = `reading score`))+
  geom_point(aes(color = gender)) +
  geom_smooth(method = lm)

stu_per_readr %>% 
  ggplot(aes(x = `math score`, y = `reading score`))+
  geom_point(aes(color = gender)) +
  geom_smooth(method = lm) +
  facet_wrap(~ gender)
```

6 Exporting data

Save data as csv, Excel, and SPSS files.

6.1 Saving as a csv file

```{r two ways}
write.csv(stocks, "data/stocks.csv")
#help("write.csv")

write_csv(stocks, "data/stocks1.csv")
```
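The two functions differ in one detail that often surprises people: `write.csv()` writes row names as an extra first column by default, while `write_csv()` does not. The sketch below shows this with a hypothetical data frame and temporary files.

```r
library(readr)

# Hypothetical data; tickers and prices are made up
df <- data.frame(ticker = c("AAA", "BBB"), close = c(10.5, 20.25))

f_base  <- tempfile(fileext = ".csv")
f_readr <- tempfile(fileext = ".csv")

write.csv(df, f_base)    # adds an unnamed row-name column
write_csv(df, f_readr)   # writes only the data columns

readLines(f_base, n = 1)   # header starts with an empty quoted field
readLines(f_readr, n = 1)  # header is just the column names
```

With base R, `write.csv(df, f_base, row.names = FALSE)` avoids the extra column.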

6.2 Saving as an Excel file

Two packages are available: xlsx and writexl.

```{r xlsx}
#install.packages("xlsx")
library(xlsx)
write.xlsx(stocks, "data/stocks1.xlsx")
```
```{r writexl}
#install.packages('writexl')
writexl::write_xlsx(stocks, "data/stocks2.xlsx")
```
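As a sketch of a common follow-up task, `write_xlsx()` also accepts a named list of data frames and writes each element to its own sheet. The sheet names `cars` and `flowers` are arbitrary choices for this example, and the file goes to a temporary path rather than `data/`.

```r
library(writexl)
library(readxl)

path <- tempfile(fileext = ".xlsx")

# Each list element becomes one sheet, named after the list element
write_xlsx(list(cars = head(mtcars), flowers = head(iris)), path)

excel_sheets(path)                    # confirm the sheet names
read_excel(path, sheet = "flowers")   # read one sheet back
```

Note that writexl does not need Java, whereas the xlsx package relies on rJava, which may matter when a Java installation is not available.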

6.3 Saving as an SPSS file

```{r haven}
write_sav(stocks, "data/stocks.sav") # with haven package
```
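One reason to export to SPSS format is that value labels survive the round trip. The following sketch, using a hypothetical two-column data frame and a temporary file, writes a labelled column with `write_sav()` and reads it back.

```r
library(haven)

# Hypothetical data with one labelled column
df <- data.frame(id = 1:3)
df$retire <- labelled(c(0, 1, 0), labels = c(No = 0, Yes = 1), label = "Retired")

path <- tempfile(fileext = ".sav")
write_sav(df, path)

back <- read_sav(path)
attr(back$retire, "label")    # variable label
attr(back$retire, "labels")   # value labels
```

A csv export of the same data would keep only the numeric codes, so the labels would need to be documented separately.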

7 Export Charts (Save chart)

7.1 Manually

```{r Export Charts}
hist(mtcars$mpg) # save chart manually
```

7.2 Programmatically

```{r}
mtcars %>% 
  ggplot(aes(mpg)) +
  geom_histogram()

ggsave(filename = "images/mpg_hist.jpeg", plot = last_plot()) # plot = last_plot() is the default, so it can be omitted
```
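For reproducible figures it is often safer to pass the plot object and the output size explicitly instead of relying on the last plot and the current device size. The sketch below assumes a PNG file in a temporary folder; the size and resolution values are arbitrary examples.

```r
library(ggplot2)

# Store the plot in an object so ggsave() does not depend on last_plot()
p <- ggplot(mtcars, aes(mpg)) +
  geom_histogram(bins = 10)

out <- file.path(tempdir(), "mpg_hist.png")

# Width and height are in inches here; dpi controls the pixel resolution
ggsave(filename = out, plot = p, width = 6, height = 4, units = "in", dpi = 300)

file.exists(out)
```

When saving into a project folder such as `images/`, the folder has to exist first; `dir.create("images", showWarnings = FALSE)` is one way to ensure that.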

8 Web Scraping with rvest package

8.1 HTML

HTML has a hierarchical structure formed by elements, which consist of a start tag (e.g., <tag>), optional attributes (id='first'), an end tag (like </tag>), and contents (everything in between the start and end tag).

<html>
<head>
  <title>Page title</title>
</head>
<body>
  <h1 id='first'>A heading</h1>
  <p>Some text &amp; <b>some bold text.</b></p>
  <img src='myimg.png' width='100' height='100'>
</body>
</html>

8.1.1 Element

<p>
  Hi! My <b>name</b> is Hadley.
</p>

8.1.2 Attribute

  • class
  • id

8.2 Basics of Extracting Data

8.2.1 Find elements

  • CSS is short for cascading style sheets, and is a tool for defining the visual styling of HTML documents.
  • CSS includes a miniature language for selecting elements on a page called CSS selectors.
  • CSS selectors define patterns for locating HTML elements, and are useful for scraping because they provide a concise way of describing which elements you want to extract.
  • Three basic CSS selectors
    • p selects all <p> elements.
    • .title selects all elements with class “title”.
    • #title selects the element with the id attribute that equals “title”. Id attributes must be unique within a document, so this will only ever select a single element.
  • html_elements() is the current name for html_nodes().
  • Class: .xxxx
  • ID: #xxxx
```{r}
library(rvest)
html <- minimal_html("
  <h1>This is a heading</h1>
  <p id='first'>This is a paragraph</p>
  <p class='important'>This is an important paragraph</p>
")

html %>% html_element("h1") |>
  html_text()

html %>% html_elements("p") |> 
  html_text()

html %>% html_elements(".important")|> # use . in front of class
  html_text()

html %>% html_elements("#first") |>  # use # in front of id
  html_text()
```
[1] "This is a heading"
[1] "This is a paragraph"            "This is an important paragraph"
[1] "This is an important paragraph"
[1] "This is a paragraph"

8.2.2 Nesting selections

  • html_elements()
    • When applied to a node set, html_elements() returns all matching elements beneath any of the inputs, flattening results into a new node set.
  • html_element()
    • When applied to a node set, html_element() always returns a vector the same length as the input, using a “missing” element where needed.

In most cases, you’ll use html_elements() and html_element() together, typically using html_elements() to identify elements that will become observations then using html_element() to find elements that will become variables.

```{r}
html <- minimal_html("
  <ul>
    <li><b>C-3PO</b> is a <i>droid</i> that weighs <span class='weight'>167 kg</span></li>
    <li><b>R4-P17</b> is a <i>droid</i></li>
    <li><b>R2-D2</b> is a <i>droid</i> that weighs <span class='weight'>96 kg</span></li>
    <li><b>Yoda</b> weighs <span class='weight'>66 kg</span></li>
  </ul>
  ")
characters <- html |> 
  html_elements("li") # observations
characters

# names

characters |> 
  html_element("b") # variables; compare with html_elements() below

characters |> 
  html_elements("b")

# weights

characters |> 
  html_element(".weight")

characters |> 
  html_elements(".weight")
```
{xml_nodeset (4)}
[1] <li>\n<b>C-3PO</b> is a <i>droid</i> that weighs <span class="weight">167 ...
[2] <li>\n<b>R4-P17</b> is a <i>droid</i>\n</li>
[3] <li>\n<b>R2-D2</b> is a <i>droid</i> that weighs <span class="weight">96  ...
[4] <li>\n<b>Yoda</b> weighs <span class="weight">66 kg</span>\n</li>
{xml_nodeset (4)}
[1] <b>C-3PO</b>
[2] <b>R4-P17</b>
[3] <b>R2-D2</b>
[4] <b>Yoda</b>
{xml_nodeset (4)}
[1] <b>C-3PO</b>
[2] <b>R4-P17</b>
[3] <b>R2-D2</b>
[4] <b>Yoda</b>
{xml_nodeset (4)}
[1] <span class="weight">167 kg</span>
[2] NA
[3] <span class="weight">96 kg</span>
[4] <span class="weight">66 kg</span>
{xml_nodeset (3)}
[1] <span class="weight">167 kg</span>
[2] <span class="weight">96 kg</span>
[3] <span class="weight">66 kg</span>

8.2.3 Texts with html_text2()

```{r}
characters |> 
  html_element("b") |> 
  html_text2()

characters |> 
  html_element(".weight") |> 
  html_text2()
```
[1] "C-3PO"  "R4-P17" "R2-D2"  "Yoda"  
[1] "167 kg" NA       "96 kg"  "66 kg" 
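Putting the pieces together, the extracted text vectors can be combined into a data frame in which each `li` is a row (observation) and each selected element is a column (variable). This is a sketch under the assumption that every weight is written as a number followed by " kg"; the names `starwars_df` and `weight_kg` are illustrative.

```r
library(rvest)

html <- minimal_html("
  <ul>
    <li><b>C-3PO</b> is a <i>droid</i> that weighs <span class='weight'>167 kg</span></li>
    <li><b>R4-P17</b> is a <i>droid</i></li>
    <li><b>R2-D2</b> is a <i>droid</i> that weighs <span class='weight'>96 kg</span></li>
    <li><b>Yoda</b> weighs <span class='weight'>66 kg</span></li>
  </ul>
")

characters <- html |> html_elements("li")   # one node per observation

# html_element() returns one result per li, so both vectors have length 4
name   <- characters |> html_element("b") |> html_text2()
weight <- characters |> html_element(".weight") |> html_text2()

starwars_df <- data.frame(
  name      = name,
  weight_kg = as.numeric(sub(" kg", "", weight, fixed = TRUE))  # strip the unit; NA stays NA
)
starwars_df
```

Using `html_elements(".weight")` here instead would return only three values and break the alignment between names and weights.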

8.2.4 Attributes with html_attr()

```{r}
html <- minimal_html("
  <p><a href='https://en.wikipedia.org/wiki/Cat'>cats</a></p>
  <p><a href='https://en.wikipedia.org/wiki/Dog'>dogs</a></p>
")

# read value of attribute
html |> 
  html_elements("p") |> 
  html_element("a") |> 
  html_attr("href")

# read the text enclosed by the <a> tags
html |> 
  html_elements("p") |> 
  html_element("a") |> 
  html_text2()
```
[1] "https://en.wikipedia.org/wiki/Cat" "https://en.wikipedia.org/wiki/Dog"
[1] "cats" "dogs"

8.2.4.1 Practice

```{r}
html <- minimal_html("
  <ul>
    <li><b>C-3PO</b> is a <i>droid</i> that weighs <span class='weight'>167 kg</span></li>
    <li><b>R4-P17</b> is a <i>droid</i></li>
    <li><b>R2-D2</b> is a <i>droid</i> that weighs <span class='weight'>96 kg</span></li>
    <li><b>Yoda</b> weighs <span class='weight'>66 kg</span></li>
  </ul>
  ")

html |> 
  html_elements("li") |> 
  html_element("i") |> 
  html_text2()

# value of attribute
html |> 
  html_elements("li") |>
  html_element("span") |> 
  html_attr("class")

# text enclosed by each <span> (the weights)
html |> 
  html_elements("li") |> 
  html_element("span") |>
  html_text()

# using class to get weights
html |> 
  html_elements("li") |>  
  html_element(".weight") |> 
  html_text2()
```
[1] "droid" "droid" "droid" NA     
[1] "weight" NA       "weight" "weight"
[1] "167 kg" NA       "96 kg"  "66 kg" 
[1] "167 kg" NA       "96 kg"  "66 kg" 

8.2.5 Table

  • Four main elements
    - <table>, 
    - <tr> (table row), 
    - <th> (table heading), and 
    - <td> (table data).
```{r}
html <- minimal_html("
  <table class='mytable'>
    <tr><th>x</th>   <th>y</th></tr>
    <tr><td>1.5</td> <td>2.7</td></tr>
    <tr><td>4.9</td> <td>1.3</td></tr>
    <tr><td>7.2</td> <td>8.1</td></tr>
  </table>
  ")
html |> 
  html_element(".mytable") |> 
  html_table()

html |> 
  html_element("table") |> 
  html_table()


html |> 
  html_elements("table") |> 
  html_table() # a list holding the same tibble, since the page has only one table
```
[[1]]
# A tibble: 3 × 2
      x     y
  <dbl> <dbl>
1   1.5   2.7
2   4.9   1.3
3   7.2   8.1
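
The [[1]] in the output above comes from the html_elements() call: applied to a set of nodes, html_table() returns a list of tibbles, while applied to a single node it returns one tibble. A short sketch of the difference, assuming a trimmed copy of the table above; one_table and all_tables are illustrative names.

```{r}
library(rvest)

html <- minimal_html("
  <table class='mytable'>
    <tr><th>x</th>   <th>y</th></tr>
    <tr><td>1.5</td> <td>2.7</td></tr>
    <tr><td>4.9</td> <td>1.3</td></tr>
  </table>
")

# Single node in, single tibble out
one_table <- html |> html_element("table") |> html_table()

# Node set in, list of tibbles out (length 1 here)
all_tables <- html |> html_elements("table") |> html_table()

is.data.frame(one_table)
length(all_tables)

# Pull the tibble out of the list before using it as a data frame
all_tables[[1]]
```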

8.3 Practice

  • Goal: Extract the table, where id=“exampleTable”
```{r}
## first, read the HTML code for our example HTML page
html <- read_html('https://bit.ly/3lz6ZRe')


html %>% 
  html_element('#exampleTable') |>
  html_table()
```
  • Goal: Extract the whole paragraph for the class named “rightColumn”
```{r}
html %>% 
  html_element('.rightColumn p') |> 
  html_text2() |> 
  cat()

html |> 
  html_element(".rightColumn") |> 
  html_element("p") |> 
  html_text2() # the same as above

html %>% 
  html_element('div.rightColumn p') |> 
  html_text2() # the same as above
```
Here's another column! The main purpose of this column is just to show that you can use CSS selectors to get all elements in a specific column.
[1] "Here's another column! The main purpose of this column is just to show that you can use CSS selectors to get all elements in a specific column."
[1] "Here's another column! The main purpose of this column is just to show that you can use CSS selectors to get all elements in a specific column."

8.3.1 Selecting HTML elements

```{r}
html = 'https://bit.ly/3lz6ZRe' %>% read_html()
html

## find any <table> element
html %>% html_element('table')            ## left table 
html %>% html_elements('table')           ## set of both tables

## find any element with class="someTable"
html %>% html_element('.someTable')       ## left table
html %>% html_elements('.someTable')      ## set of both tables

## find any element with id="steve" 
## (only called it steve to show that id can be anything the developer chooses)
html %>% html_element('#steve')           ## right table 
html %>% html_elements('#steve')          ## set with only the right table 

## find any <tr> element with class="headerRow"
html %>% html_element('tr.headerRow')     ## left table first row
html %>% html_elements('tr.headerRow')    ## first rows of both tables

## find any element with class="someTable blue"
html %>% html_element('.someTable.blue') |>  ## right table   
  html_table()

html %>% html_element('table.someTable.blue') |>  ## right table   
  html_table()

html %>% html_element('table#steve.someTable.blue') |>  ## right table   
  html_table()

html %>% html_element('#steve') |>  ## right table   
  html_table()

html %>% html_element('table#steve') |>  ## right table   
  html_table()

html %>% html_element('table.someTable.blue#steve') |>  ## right table   
  html_table()

html %>% html_elements('.someTable.blue') |>  ## set with only the right table    
  html_table()
```
{html_document}
<html>
[1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8 ...
[2] <body>\n\n<h2>Looking behind the curtains of an HTML page</h2>\n\n<div cl ...
{html_node}
<table class="someTable" id="exampleTable">
[1] <tr class="headerRow">\n<!--    table row         --><th>First column</th ...
[2] <tr>\n<!--    table row         --><td>1</td>                             ...
[3] <tr>\n<!--    table row         --><td>4</td>                             ...
{xml_nodeset (2)}
[1] <table class="someTable" id="exampleTable">\n<!-- table                -- ...
[2] <table class="someTable blue" id="steve">\n<tr class="headerRow">\n<th>nu ...
{html_node}
<table class="someTable" id="exampleTable">
[1] <tr class="headerRow">\n<!--    table row         --><th>First column</th ...
[2] <tr>\n<!--    table row         --><td>1</td>                             ...
[3] <tr>\n<!--    table row         --><td>4</td>                             ...
{xml_nodeset (2)}
[1] <table class="someTable" id="exampleTable">\n<!-- table                -- ...
[2] <table class="someTable blue" id="steve">\n<tr class="headerRow">\n<th>nu ...
{html_node}
<table class="someTable blue" id="steve">
[1] <tr class="headerRow">\n<th>numbers</th>\n        <th>letters</th>\n      ...
[2] <tr>\n<td>1</td>\n        <td>A</td>\n      </tr>\n
[3] <tr>\n<td>2</td>\n        <td>B</td>\n      </tr>\n
[4] <tr>\n<td>3</td>\n        <td>C</td>\n      </tr>\n
[5] <tr>\n<td>4</td>\n        <td>D</td>\n      </tr>\n
[6] <tr>\n<td>5</td>\n        <td>E</td>\n      </tr>\n
[7] <tr>\n<td>6</td>\n        <td>F</td>\n      </tr>
{xml_nodeset (1)}
[1] <table class="someTable blue" id="steve">\n<tr class="headerRow">\n<th>nu ...
{html_node}
<tr class="headerRow">
[1] <th>First column</th>
[2] <th>Second column</th>
[3] <th>Third column</th>
{xml_nodeset (2)}
[1] <tr class="headerRow">\n<!--    table row         --><th>First column</th ...
[2] <tr class="headerRow">\n<th>numbers</th>\n        <th>letters</th>\n      ...
[[1]]
# A tibble: 6 × 2
  numbers letters
    <int> <chr>  
1       1 A      
2       2 B      
3       3 C      
4       4 D      
5       5 E      
6       6 F      

8.3.2 Extracting data from elements

```{r}
html = 'https://bit.ly/3lz6ZRe' %>% read_html()

# extract everything in the left column

html %>% html_element('.leftColumn') %>% html_text() |> 
  cat()

html %>% html_element('.leftColumn') %>% html_text2() |> 
  cat() # the same as above

# Extract value of attributes

html |> html_elements('#exampleTable') |>  html_attrs() # all attributes; the singular html_attr() needs an attribute name
html |> html_elements('#exampleTable') |>  html_attr("id")
html |> html_elements('#exampleTable') |>  html_attr("class")

html |> html_elements('.someTable') |> html_attrs()
html |> html_elements('.someTable') |> html_attr("class")
html |> html_elements('.someTable') |> html_attr("id")

# links
html %>% html_elements('a') %>% html_attr('href')
```

    Left Column

    This is a simple HTML document. Right click on the page and select view page source 
       (or something similar, depending on browser) to view the HTML source code.
    
    Alternatively, right click on a specific element on the page and select inspect element. 
       This also shows the HTML code, but focused on the selected element. You should be able to fold 
       and unfold HTML nodes (using the triangle-like thing before the <tags>), and when you hover 
       your mouse over them, they should light up in the browser. Play around with this for a bit to get 
       a feel for exploring HTML code.

    Here's a stupid table.
    
    First column                         
        Second column                        
        Third column                         
      1                                    
        2                                    
        3                                    
      4                                    
        5                                    
        6                                    
      Left Column

This is a simple HTML document. Right click on the page and select view page source (or something similar, depending on browser) to view the HTML source code.

Alternatively, right click on a specific element on the page and select inspect element. This also shows the HTML code, but focused on the selected element. You should be able to fold and unfold HTML nodes (using the triangle-like thing before the <tags>), and when you hover your mouse over them, they should light up in the browser. Play around with this for a bit to get a feel for exploring HTML code.

Here's a stupid table.

First column    Second column   Third column    
1   2   3   
4   5   6   
[[1]]
         class             id 
   "someTable" "exampleTable" 

[1] "exampleTable"
[1] "someTable"
[[1]]
         class             id 
   "someTable" "exampleTable" 

[[2]]
           class               id 
"someTable blue"          "steve" 

[1] "someTable"      "someTable blue"
[1] "exampleTable" "steve"       
[1] "https://blog.hubspot.com/website/how-to-inspect"

8.4 Finding the right selector

  • SelectorGadget
  • The browser's developer tools (right click, then Inspect)
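
Whichever tool suggests a selector, it helps to confirm in R that it matches the expected number of elements before building columns from it. The block below is a rough sketch of such a check on a made-up page; the class names movie and score, and the page content, are hypothetical.

```{r}
library(rvest)

# A made-up page standing in for a real site
page <- minimal_html("
  <div class='movie'><h3>1. First film</h3><span class='score'>9.2</span></div>
  <div class='movie'><h3>2. Second film</h3><span class='score'>9.1</span></div>
")

# Candidate selectors, e.g. as suggested by SelectorGadget or Inspect
titles <- page |> html_elements(".movie h3") |> html_text2()
scores <- page |> html_elements(".movie .score") |> html_text2()

# The two vectors should have the same length before they go into one tibble
length(titles)
length(scores)
```

If the counts differ, the selector is probably too broad or too narrow, or some observations lack the element, in which case html_element() on each observation is the safer route.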

8.5 Star Wars

Task Objective
```{r}
url <- "https://rvest.tidyverse.org/articles/starwars.html"

# observation
section <- url |> 
  read_html() |> 
  html_elements("section")

# variable
section |> 
  html_element("h2") |> 
  html_text2()

# title
title = section |> 
  html_element("h2") |> 
  html_text2()

# release date
released = section |> 
  html_element("p") |> 
  html_text2() |> 
  str_remove("Released: ") |> 
  parse_date() 

# director

director = section |> 
  html_element(".director") |> 
  html_text2()

section |> 
  html_element("span") |> 
  html_text2()

section |> 
  html_element("p span") |> 
  html_text2()

# introduction

intro = section |> 
  html_element(".crawl") |> 
  html_text2()

section |> 
  html_element("div.crawl") |> 
  html_text2()

tibble(title, released, director, intro)
```
[1] "The Phantom Menace"      "Attack of the Clones"   
[3] "Revenge of the Sith"     "A New Hope"             
[5] "The Empire Strikes Back" "Return of the Jedi"     
[7] "The Force Awakens"      
[1] "George Lucas"     "George Lucas"     "George Lucas"     "George Lucas"    
[5] "Irvin Kershner"   "Richard Marquand" "J. J. Abrams"    
[1] "George Lucas"     "George Lucas"     "George Lucas"     "George Lucas"    
[5] "Irvin Kershner"   "Richard Marquand" "J. J. Abrams"    
[1] "Turmoil has engulfed the Galactic Republic. The taxation of trade routes to outlying star systems is in dispute.\n\nHoping to resolve the matter with a blockade of deadly battleships, the greedy Trade Federation has stopped all shipping to the small planet of Naboo.\n\nWhile the Congress of the Republic endlessly debates this alarming chain of events, the Supreme Chancellor has secretly dispatched two Jedi Knights, the guardians of peace and justice in the galaxy, to settle the conflict…."           
[2] "There is unrest in the Galactic Senate. Several thousand solar systems have declared their intentions to leave the Republic.\n\nThis separatist movement, under the leadership of the mysterious Count Dooku, has made it difficult for the limited number of Jedi Knights to maintain peace and order in the galaxy.\n\nSenator Amidala, the former Queen of Naboo, is returning to the Galactic Senate to vote on the critical issue of creating an ARMY OF THE REPUBLIC to assist the overwhelmed Jedi…."             
[3] "War! The Republic is crumbling under attacks by the ruthless Sith Lord, Count Dooku. There are heroes on both sides. Evil is everywhere.\n\nIn a stunning move, the fiendish droid leader, General Grievous, has swept into the Republic capital and kidnapped Chancellor Palpatine, leader of the Galactic Senate.\n\nAs the Separatist Droid Army attempts to flee the besieged capital with their valuable hostage, two Jedi Knights lead a desperate mission to rescue the captive Chancellor…."                     
[4] "It is a period of civil war. Rebel spaceships, striking from a hidden base, have won their first victory against the evil Galactic Empire.\n\nDuring the battle, Rebel spies managed to steal secret plans to the Empire’s ultimate weapon, the DEATH STAR, an armored space station with enough power to destroy an entire planet.\n\nPursued by the Empire’s sinister agents, Princess Leia races home aboard her starship, custodian of the stolen plans that can save her people and restore freedom to the galaxy…."
[5] "It is a dark time for the Rebellion. Although the Death Star has been destroyed, Imperial troops have driven the Rebel forces from their hidden base and pursued them across the galaxy.\n\nEvading the dreaded Imperial Starfleet, a group of freedom fighters led by Luke Skywalker has established a new secret base on the remote ice world of Hoth.\n\nThe evil lord Darth Vader, obsessed with finding young Skywalker, has dispatched thousands of remote probes into the far reaches of space…."                 
[6] "Luke Skywalker has returned to his home planet of Tatooine in an attempt to rescue his friend Han Solo from the clutches of the vile gangster Jabba the Hutt.\n\nLittle does Luke know that the GALACTIC EMPIRE has secretly begun construction on a new armored space station even more powerful than the first dreaded Death Star.\n\nWhen completed, this ultimate weapon will spell certain doom for the small band of rebels struggling to restore freedom to the galaxy…"                                          
[7] "Luke Skywalker has vanished. In his absence, the sinister FIRST ORDER has risen from the ashes of the Empire and will not rest until Skywalker, the last Jedi, has been destroyed. With the support of the REPUBLIC, General Leia Organa leads a brave RESISTANCE. She is desperate to find her brother Luke and gain his help in restoring peace and justice to the galaxy. Leia has sent her most daring pilot on a secret mission to Jakku, where an old ally has discovered a clue to Luke’s whereabouts…."          

8.6 Archived IMDB Example

Goals

From the IMDB Top 250 Movies list, extract the ranking, title, year, rating, and number of votes. <https://web.archive.org/web/20220201012049/https://www.imdb.com/chart/top/>

```{r}
html <- read_html("https://web.archive.org/web/20220201012049/https://www.imdb.com/chart/top/")

table <- html |> 
  html_element("table") |> 
  html_table()

table |> glimpse()

ratings <- table |> 
  select(
    rank_title_year = `Rank & Title`,
    rating = `IMDb Rating`
  ) |> 
  mutate(rank_title_year = str_replace_all(rank_title_year, "\n +", " ")) |> 
  separate_wider_regex( # tidyr
    rank_title_year,
    patterns = c(
      rank = "\\d+", "\\. ",
      title = ".+", " +\\(",
      year = "\\d+", "\\)"
    )
  )
ratings

# rating

html |> 
  html_elements("tr .ratingColumn.imdbRating") |> 
  html_text2()

# vote counts
html |> 
  html_elements("td strong") |> 
  html_attr("title")

ratings |> 
  mutate(
    rating_n = html |>  html_elements("td strong") |> html_attr("title")
  ) |> 
  separate_wider_regex(
    rating_n, 
    patterns = c(
      "[0-9.]+ based on ",
      number = "[0-9,]+",
      " user ratings"
    )
  ) |> 
  mutate(number = parse_number(number))
```
Rows: 250
Columns: 5
$ ``             <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ `Rank & Title` <chr> "1.\n      The Shawshank Redemption\n        (1994)", "…
$ `IMDb Rating`  <dbl> 9.2, 9.1, 9.0, 9.0, 8.9, 8.9, 8.9, 8.8, 8.8, 8.8, 8.7, …
$ `Your Rating`  <chr> "12345678910\n        \n        \n            \n       …
$ ``             <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
  [1] "9.2" "9.1" "9.0" "9.0" "8.9" "8.9" "8.9" "8.8" "8.8" "8.8" "8.7" "8.7"
 [13] "8.7" "8.7" "8.7" "8.7" "8.6" "8.6" "8.6" "8.6" "8.6" "8.6" "8.6" "8.6"
 [25] "8.6" "8.6" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5"
 [37] "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5"
 [49] "8.4" "8.4" "8.4" "8.4" "8.4" "8.4" "8.4" "8.4" "8.4" "8.4" "8.4" "8.4"
 [61] "8.4" "8.4" "8.4" "8.4" "8.4" "8.4" "8.4" "8.3" "8.3" "8.3" "8.3" "8.3"
 [73] "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3"
 [85] "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3"
 [97] "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.2" "8.2" "8.2" "8.2" "8.2"
[109] "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2"
[121] "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2"
[133] "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2"
[145] "8.2" "8.2" "8.2" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1"
[157] "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1"
[169] "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1"
[181] "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1"
[193] "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1"
[205] "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1"
[217] "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.0" "8.0" "8.0" "8.0" "8.0"
[229] "8.0" "8.0" "8.0" "8.0" "8.0" "8.0" "8.0" "8.0" "8.0" "8.0" "8.0" "8.0"
[241] "8.0" "8.0" "8.0" "8.0" "8.0" "8.0" "8.0" "8.0" "8.0" "8.0"
  [1] "9.2 based on 2,536,415 user ratings"
  [2] "9.1 based on 1,745,675 user ratings"
  [3] "9.0 based on 1,211,032 user ratings"
  [4] "9.0 based on 2,486,931 user ratings"
  [5] "8.9 based on 749,563 user ratings"  
  [6] "8.9 based on 1,295,705 user ratings"
  [7] "8.9 based on 1,749,722 user ratings"
  [8] "8.8 based on 1,952,864 user ratings"
  [9] "8.8 based on 732,557 user ratings"  
 [10] "8.8 based on 1,771,245 user ratings"
 [11] "8.7 based on 1,996,110 user ratings"
 [12] "8.7 based on 1,957,544 user ratings"
 [13] "8.7 based on 2,228,642 user ratings"
 [14] "8.7 based on 1,580,899 user ratings"
 [15] "8.7 based on 1,230,892 user ratings"
 [16] "8.7 based on 1,830,919 user ratings"
 [17] "8.6 based on 1,096,894 user ratings"
 [18] "8.6 based on 970,948 user ratings"  
 [19] "8.6 based on 334,944 user ratings"  
 [20] "8.6 based on 1,556,144 user ratings"
 [21] "8.6 based on 1,363,287 user ratings"
 [22] "8.6 based on 733,264 user ratings"  
 [23] "8.6 based on 438,951 user ratings"  
 [24] "8.6 based on 666,605 user ratings"  
 [25] "8.6 based on 432,286 user ratings"  
 [26] "8.6 based on 1,323,820 user ratings"
 [27] "8.5 based on 1,302,369 user ratings"
 [28] "8.5 based on 1,681,995 user ratings"
 [29] "8.5 based on 715,672 user ratings"  
 [30] "8.5 based on 1,234,090 user ratings"
 [31] "8.5 based on 712,448 user ratings"  
 [32] "8.5 based on 1,108,188 user ratings"
 [33] "8.5 based on 51,931 user ratings"   
 [34] "8.5 based on 790,771 user ratings"  
 [35] "8.5 based on 1,053,728 user ratings"
 [36] "8.5 based on 1,140,698 user ratings"
 [37] "8.5 based on 1,047,652 user ratings"
 [38] "8.5 based on 644,147 user ratings"  
 [39] "8.5 based on 1,006,734 user ratings"
 [40] "8.5 based on 233,625 user ratings"  
 [41] "8.5 based on 260,547 user ratings"  
 [42] "8.5 based on 1,084,181 user ratings"
 [43] "8.5 based on 788,463 user ratings"  
 [44] "8.5 based on 1,431,474 user ratings"
 [45] "8.5 based on 179,321 user ratings"  
 [46] "8.5 based on 1,266,998 user ratings"
 [47] "8.5 based on 817,319 user ratings"  
 [48] "8.5 based on 1,273,524 user ratings"
 [49] "8.4 based on 552,614 user ratings"  
 [50] "8.4 based on 319,732 user ratings"  
 [51] "8.4 based on 474,492 user ratings"  
 [52] "8.4 based on 250,817 user ratings"  
 [53] "8.4 based on 841,202 user ratings"  
 [54] "8.4 based on 642,211 user ratings"  
 [55] "8.4 based on 1,191,224 user ratings"
 [56] "8.4 based on 932,634 user ratings"  
 [57] "8.4 based on 217,246 user ratings"  
 [58] "8.4 based on 1,466,464 user ratings"
 [59] "8.4 based on 378,992 user ratings"  
 [60] "8.4 based on 190,672 user ratings"  
 [61] "8.4 based on 214,880 user ratings"  
 [62] "8.4 based on 1,066,004 user ratings"
 [63] "8.4 based on 975,508 user ratings"  
 [64] "8.4 based on 119,927 user ratings"  
 [65] "8.4 based on 466,734 user ratings"  
 [66] "8.4 based on 968,398 user ratings"  
 [67] "8.4 based on 474,688 user ratings"  
 [68] "8.3 based on 374,677 user ratings"  
 [69] "8.3 based on 553,426 user ratings"  
 [70] "8.3 based on 1,136,562 user ratings"
 [71] "8.3 based on 239,205 user ratings"  
 [72] "8.3 based on 458,597 user ratings"  
 [73] "8.3 based on 1,613,428 user ratings"
 [74] "8.3 based on 691,108 user ratings"  
 [75] "8.3 based on 337,473 user ratings"  
 [76] "8.3 based on 1,005,677 user ratings"
 [77] "8.3 based on 81,165 user ratings"   
 [78] "8.3 based on 244,486 user ratings"  
 [79] "8.3 based on 41,453 user ratings"   
 [80] "8.3 based on 377,496 user ratings"  
 [81] "8.3 based on 947,809 user ratings"  
 [82] "8.3 based on 388,750 user ratings"  
 [83] "8.3 based on 1,120,594 user ratings"
 [84] "8.3 based on 1,004,883 user ratings"
 [85] "8.3 based on 1,368,392 user ratings"
 [86] "8.3 based on 922,275 user ratings"  
 [87] "8.3 based on 81,907 user ratings"   
 [88] "8.3 based on 1,006,169 user ratings"
 [89] "8.3 based on 71,711 user ratings"   
 [90] "8.3 based on 642,698 user ratings"  
 [91] "8.3 based on 976,830 user ratings"  
 [92] "8.3 based on 185,065 user ratings"  
 [93] "8.3 based on 389,665 user ratings"  
 [94] "8.3 based on 153,379 user ratings"  
 [95] "8.3 based on 313,111 user ratings"  
 [96] "8.3 based on 429,518 user ratings"  
 [97] "8.3 based on 809,376 user ratings"  
 [98] "8.3 based on 233,653 user ratings"  
 [99] "8.3 based on 317,960 user ratings"  
[100] "8.3 based on 966,328 user ratings"  
[101] "8.3 based on 75,518 user ratings"   
[102] "8.3 based on 158,667 user ratings"  
[103] "8.3 based on 284,543 user ratings"  
[104] "8.2 based on 122,766 user ratings"  
[105] "8.2 based on 714,172 user ratings"  
[106] "8.2 based on 168,419 user ratings"  
[107] "8.2 based on 175,698 user ratings"  
[108] "8.2 based on 177,566 user ratings"  
[109] "8.2 based on 152,493 user ratings"  
[110] "8.2 based on 169,613 user ratings"  
[111] "8.2 based on 236,661 user ratings"  
[112] "8.2 based on 121,932 user ratings"  
[113] "8.2 based on 780,889 user ratings"  
[114] "8.2 based on 799,840 user ratings"  
[115] "8.2 based on 254,464 user ratings"  
[116] "8.2 based on 796,924 user ratings"  
[117] "8.2 based on 826,507 user ratings"  
[118] "8.2 based on 528,206 user ratings"  
[119] "8.2 based on 735,266 user ratings"  
[120] "8.2 based on 308,535 user ratings"  
[121] "8.2 based on 801,505 user ratings"  
[122] "8.2 based on 248,181 user ratings"  
[123] "8.2 based on 997,479 user ratings"  
[124] "8.2 based on 30,369 user ratings"   
[125] "8.2 based on 729,640 user ratings"  
[126] "8.2 based on 623,585 user ratings"  
[127] "8.2 based on 563,810 user ratings"  
[128] "8.2 based on 121,699 user ratings"  
[129] "8.2 based on 119,336 user ratings"  
[130] "8.2 based on 847,420 user ratings"  
[131] "8.2 based on 447,908 user ratings"  
[132] "8.2 based on 163,198 user ratings"  
[133] "8.2 based on 346,503 user ratings"  
[134] "8.2 based on 128,053 user ratings"  
[135] "8.2 based on 525,787 user ratings"  
[136] "8.2 based on 258,938 user ratings"  
[137] "8.2 based on 1,389,272 user ratings"
[138] "8.2 based on 398,858 user ratings"  
[139] "8.2 based on 72,133 user ratings"   
[140] "8.2 based on 172,314 user ratings"  
[141] "8.2 based on 369,894 user ratings"  
[142] "8.2 based on 1,309,309 user ratings"
[143] "8.2 based on 75,094 user ratings"   
[144] "8.2 based on 559,103 user ratings"  
[145] "8.2 based on 498,452 user ratings"  
[146] "8.2 based on 237,967 user ratings"  
[147] "8.2 based on 121,706 user ratings"  
[148] "8.1 based on 648,671 user ratings"  
[149] "8.1 based on 897,988 user ratings"  
[150] "8.1 based on 204,179 user ratings"  
[151] "8.1 based on 341,517 user ratings"  
[152] "8.1 based on 315,112 user ratings"  
[153] "8.1 based on 320,423 user ratings"  
[154] "8.1 based on 1,232,392 user ratings"
[155] "8.1 based on 564,751 user ratings"  
[156] "8.1 based on 924,770 user ratings"  
[157] "8.1 based on 137,430 user ratings"  
[158] "8.1 based on 170,061 user ratings"  
[159] "8.1 based on 402,801 user ratings"  
[160] "8.1 based on 108,330 user ratings"  
[161] "8.1 based on 480,345 user ratings"  
[162] "8.1 based on 179,115 user ratings"  
[163] "8.1 based on 234,213 user ratings"  
[164] "8.1 based on 27,096 user ratings"   
[165] "8.1 based on 957,223 user ratings"  
[166] "8.1 based on 1,015,649 user ratings"
[167] "8.1 based on 929,833 user ratings"  
[168] "8.1 based on 104,205 user ratings"  
[169] "8.1 based on 167,796 user ratings"  
[170] "8.1 based on 166,961 user ratings"  
[171] "8.1 based on 1,086,084 user ratings"
[172] "8.1 based on 738,384 user ratings"  
[173] "8.1 based on 666,235 user ratings"  
[174] "8.1 based on 655,486 user ratings"  
[175] "8.1 based on 214,651 user ratings"  
[176] "8.1 based on 673,092 user ratings"  
[177] "8.1 based on 1,002,899 user ratings"
[178] "8.1 based on 1,068,344 user ratings"
[179] "8.1 based on 457,139 user ratings"  
[180] "8.1 based on 306,362 user ratings"  
[181] "8.1 based on 59,253 user ratings"   
[182] "8.1 based on 150,891 user ratings"  
[183] "8.1 based on 84,576 user ratings"   
[184] "8.1 based on 190,747 user ratings"  
[185] "8.1 based on 661,393 user ratings"  
[186] "8.1 based on 129,500 user ratings"  
[187] "8.1 based on 766,950 user ratings"  
[188] "8.1 based on 329,782 user ratings"  
[189] "8.1 based on 88,539 user ratings"   
[190] "8.1 based on 113,559 user ratings"  
[191] "8.1 based on 753,584 user ratings"  
[192] "8.1 based on 47,411 user ratings"   
[193] "8.1 based on 293,483 user ratings"  
[194] "8.1 based on 173,140 user ratings"  
[195] "8.1 based on 917,392 user ratings"  
[196] "8.1 based on 113,180 user ratings"  
[197] "8.1 based on 161,695 user ratings"  
[198] "8.1 based on 169,052 user ratings"  
[199] "8.1 based on 468,148 user ratings"  
[200] "8.1 based on 487,407 user ratings"  
[201] "8.1 based on 27,359 user ratings"   
[202] "8.1 based on 933,677 user ratings"  
[203] "8.1 based on 402,019 user ratings"  
[204] "8.1 based on 52,624 user ratings"   
[205] "8.1 based on 86,972 user ratings"   
[206] "8.1 based on 353,368 user ratings"  
[207] "8.1 based on 676,674 user ratings"  
[208] "8.1 based on 35,134 user ratings"   
[209] "8.1 based on 780,504 user ratings"  
[210] "8.1 based on 462,094 user ratings"  
[211] "8.1 based on 829,167 user ratings"  
[212] "8.1 based on 232,297 user ratings"  
[213] "8.1 based on 708,101 user ratings"  
[214] "8.1 based on 951,179 user ratings"  
[215] "8.1 based on 32,909 user ratings"   
[216] "8.1 based on 667,630 user ratings"  
[217] "8.1 based on 59,274 user ratings"   
[218] "8.1 based on 387,091 user ratings"  
[219] "8.1 based on 133,958 user ratings"  
[220] "8.1 based on 154,752 user ratings"  
[221] "8.1 based on 710,813 user ratings"  
[222] "8.1 based on 70,488 user ratings"   
[223] "8.1 based on 164,778 user ratings"  
[224] "8.0 based on 272,839 user ratings"  
[225] "8.0 based on 172,689 user ratings"  
[226] "8.0 based on 92,320 user ratings"   
[227] "8.0 based on 113,868 user ratings"  
[228] "8.0 based on 401,608 user ratings"  
[229] "8.0 based on 452,546 user ratings"  
[230] "8.0 based on 869,253 user ratings"  
[231] "8.0 based on 133,352 user ratings"  
[232] "8.0 based on 388,191 user ratings"  
[233] "8.0 based on 141,794 user ratings"  
[234] "8.0 based on 347,891 user ratings"  
[235] "8.0 based on 68,989 user ratings"   
[236] "8.0 based on 461,996 user ratings"  
[237] "8.0 based on 551,403 user ratings"  
[238] "8.0 based on 235,194 user ratings"  
[239] "8.0 based on 604,948 user ratings"  
[240] "8.0 based on 163,901 user ratings"  
[241] "8.0 based on 45,710 user ratings"   
[242] "8.0 based on 253,135 user ratings"  
[243] "8.0 based on 100,753 user ratings"  
[244] "8.0 based on 62,481 user ratings"   
[245] "8.0 based on 39,453 user ratings"   
[246] "8.0 based on 58,133 user ratings"   
[247] "8.0 based on 47,439 user ratings"   
[248] "8.0 based on 45,058 user ratings"   
[249] "8.0 based on 52,144 user ratings"   
[250] "8.0 based on 416,920 user ratings"  
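
The separate_wider_regex() step in the chunk above can be tried offline on a couple of strings. This is a minimal sketch, assuming tidyr 1.3.0 or later: the first string is the raw Rank & Title value shown in the glimpse() output, the second is a hypothetical film added only for illustration. Named patterns become columns, and unnamed patterns are matched and then discarded.

```{r}
library(tibble)
library(dplyr)
library(stringr)
library(tidyr)

films <- tibble(
  rank_title_year = c(
    "1.\n      The Shawshank Redemption\n        (1994)",
    "2.\n      A Hypothetical Film\n        (2001)"   # made-up row
  )
)

parsed <- films |> 
  # collapse each newline plus indentation into a single space
  mutate(rank_title_year = str_replace_all(rank_title_year, "\n +", " ")) |> 
  separate_wider_regex(
    rank_title_year,
    patterns = c(
      rank = "\\d+", "\\. ",    # keep the digits, drop the dot and space
      title = ".+", " +\\(",    # keep the title, drop the opening bracket
      year = "\\d+", "\\)"      # keep the year, drop the closing bracket
    )
  )
parsed
```

All three new columns are character vectors, so rank and year still need parse_number() or as.integer() before any arithmetic.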

8.7 Current IMDB Site Example

Goals

From the IMDB Top 250 Movies list, extract the movie rankings, titles, ratings, and vote counts from the website https://m.imdb.com/chart/top/?ref_=nv_mv_250 and create a data frame with them.

```{r}
page <- "https://m.imdb.com/chart/top/?ref_=nv_mv_250" |> read_html()

rank_titles <- page |> 
  html_elements("ul") |> 
  html_elements("li h3") |> 
  html_text2()


ratings <- page |> 
  html_elements("ul") |> 
  html_elements("li .ipc-rating-star--rating") |> 
  html_text2() 
  
vote_counts <- page |> 
  html_elements("ul") |> 
  html_elements("li .ipc-rating-star--voteCount") |> 
  html_text(trim = TRUE) 
```

8.7.1 Tibbles

```{r}
top_movies <- tibble(rank_titles, ratings, vote_counts)

top_movies <- top_movies |> 
  separate_wider_regex(
    rank_titles,
    patterns = c(
      ranking = "\\d+", "\\. ",
      movie = ".+"
    )
  ) |> 
  mutate(
    ranking = as.numeric(ranking),
    ratings = parse_number(ratings),
    vote_counts = str_extract(vote_counts, "[0-9.]+[MK]"),
    multiplier = if_else(str_detect(vote_counts, "M"), 1e6, 1e3),
    vote_counts = parse_number(vote_counts) * multiplier
  ) |> 
   select(-multiplier)

top_movies
```
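
The vote-count conversion in the mutate() call above can be checked on its own. The sketch below assumes the counts arrive as abbreviated strings with an M or K suffix, which is what the regular expression in the chunk implies; the two input strings are hypothetical.

```{r}
library(stringr)
library(readr)

# Hypothetical inputs in the abbreviated style the regex expects
raw_votes <- c("(2.9M)", "(950K)")

# Keep only the number and its suffix, e.g. "2.9M"
abbrev <- str_extract(raw_votes, "[0-9.]+[MK]")

# M means millions, K means thousands
multiplier <- ifelse(str_detect(abbrev, "M"), 1e6, 1e3)

# parse_number() drops the suffix and keeps the numeric part
votes <- parse_number(abbrev) * multiplier
votes
```

A count with no suffix would not match the pattern and would become NA, so the regular expression may need loosening for lists that include less popular titles.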

9 NA’s

9.1 Explicit and implicit missing values

```{r}
stock <- tibble(
  year  = c(2020, 2020, 2020, 2020, 2021, 2021, 2021),
  qtr   = c(   1,    2,    3,    4,    2,    3,    4),
  price = c(1.88, 0.59, 0.35,   NA, 0.92, 0.17, 2.66)
)

stock

# pivot_wider() makes implicit missing values explicit
stock_wider <- stock |> 
  pivot_wider(
    names_from = qtr,
    values_from = price
  )
stock_wider

# in pivot_longer, missing values can be dropped
stock_wider |> 
  pivot_longer(
    cols = -year, 
    names_to = "qtr",
    values_to = "price",
    values_drop_na = TRUE
  )
```

9.1.1 complete()

complete(), from the tidyr package, generates explicit missing values by filling in every combination of the variables you supply.

```{r}
stock |> 
  complete(year, qtr)

stock |> 
  complete(year = 2018:2021, qtr)
```
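
A quick way to see what complete() adds is to compare row counts and NA counts before and after. The sketch below rebuilds the stock tibble from section 9.1 so it runs on its own; completed is an illustrative name.

```{r}
library(tibble)
library(tidyr)

stock <- tibble(
  year  = c(2020, 2020, 2020, 2020, 2021, 2021, 2021),
  qtr   = c(   1,    2,    3,    4,    2,    3,    4),
  price = c(1.88, 0.59, 0.35,   NA, 0.92, 0.17, 2.66)
)

# Every year-quarter combination; the missing 2021 Q1 row appears with price = NA
completed <- stock |> complete(year, qtr)

nrow(stock)                    # 7 rows: 2021 Q1 is implicitly missing
nrow(completed)                # 8 rows: 2 years x 4 quarters
sum(is.na(completed$price))    # 2: the explicit NA (2020 Q4) plus the new one (2021 Q1)
```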

9.2 Factors and empty groups in factor levels

9.2.1 table

```{r}
health <- tibble(
  name   = c("Ikaia", "Oletta", "Leriah", "Dashay", "Tresaun"),
  smoker = factor(c("no", "no", "no", "no", "no"), levels = c("yes", "no")),
  age    = c(34, 88, 75, 47, 56),
)

health

health |> 
  count(smoker)

library(gt)
health |> 
  count(smoker, .drop = FALSE) |> 
  gt() |> 
  tab_header(md("**Smokers vs. Non-smokers**"))
```
Smokers vs. Non-smokers
smoker n
yes 0
no 5

9.2.2 plots

```{r}
health |> 
  count(smoker, .drop = FALSE) |> 
  ggplot(aes(smoker, n)) +
  geom_col()

health |> 
  ggplot(aes(smoker)) +
  geom_bar() +
  scale_x_discrete(drop = FALSE)
```

9.2.3 group_by()

```{r}
health |> 
  group_by(smoker) |> 
  summarise(
    n = n(),
    mean_age = mean(age),
    sd_age = sd(age),
    min_age = min(age),
    max_age = max(age)
  )

# both groups
health |> 
  group_by(smoker, .drop = FALSE) |> 
  summarise(
    n = n(),
    mean_age = mean(age),
    sd_age = sd(age),
    min_age = min(age),
    max_age = max(age)
  )

# cf. complete() after summarize(): the empty level comes back, but every summary (including n) is NA
health |> 
  group_by(smoker) |> 
  summarize(
    n = n(),
    mean_age = mean(age),
    sd_age = sd(age),
    min_age = min(age),
    max_age = max(age)
  ) |> 
  complete(smoker)
```

9.3 Missing value imputation using across() and replace_na()

The code below may not run as written. See the Advanced Wrangling codebook in Step 5 for working examples.

  • replace_na() from tidyr
```{r}
airqual <- airquality %>% 
  as_tibble()

# identify variables with missing values
skimr::skim(airqual)

# identify data type of the variables with missing values.
glimpse(airqual)

# converting the target variables to doubles
airqual <- airqual %>% 
  mutate(Solar.R = as.double(Solar.R),
         Ozone = as.double(Ozone))

glimpse(airqual)

# missing value imputation with a mean

airqual |>  
  mutate(across(Solar.R, ~ replace_na(., mean(., na.rm = TRUE))))

airqual |> 
  mutate(across(c(Ozone, Solar.R), ~replace_na(., mean(., na.rm = TRUE))))

airqual |> 
  mutate(across(where(is.double), ~replace_na(., mean(., na.rm = TRUE))))
```
Data summary
Name airqual
Number of rows 153
Number of columns 6
_______________________
Column type frequency:
numeric 6
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Ozone 37 0.76 42.13 32.99 1.0 18.00 31.5 63.25 168.0 ▇▃▂▁▁
Solar.R 7 0.95 185.93 90.06 7.0 115.75 205.0 258.75 334.0 ▅▃▅▇▅
Wind 0 1.00 9.96 3.52 1.7 7.40 9.7 11.50 20.7 ▂▇▇▃▁
Temp 0 1.00 77.88 9.47 56.0 72.00 79.0 85.00 97.0 ▂▃▇▇▃
Month 0 1.00 6.99 1.42 5.0 6.00 7.0 8.00 9.0 ▇▇▇▇▇
Day 0 1.00 15.80 8.86 1.0 8.00 16.0 23.00 31.0 ▇▇▇▇▆
Rows: 153
Columns: 6
$ Ozone   <int> 41, 36, 12, 18, NA, 28, 23, 19, 8, NA, 7, 16, 11, 14, 18, 14, …
$ Solar.R <int> 190, 118, 149, 313, NA, NA, 299, 99, 19, 194, NA, 256, 290, 27…
$ Wind    <dbl> 7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6, 6.9, 9…
$ Temp    <int> 67, 72, 74, 62, 56, 66, 65, 59, 61, 69, 74, 69, 66, 68, 58, 64…
$ Month   <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,…
$ Day     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,…
Rows: 153
Columns: 6
$ Ozone   <dbl> 41, 36, 12, 18, NA, 28, 23, 19, 8, NA, 7, 16, 11, 14, 18, 14, …
$ Solar.R <dbl> 190, 118, 149, 313, NA, NA, 299, 99, 19, 194, NA, 256, 290, 27…
$ Wind    <dbl> 7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6, 6.9, 9…
$ Temp    <int> 67, 72, 74, 62, 56, 66, 65, 59, 61, 69, 74, 69, 66, 68, 58, 64…
$ Month   <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,…
$ Day     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,…
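The conversion to double matters because the column mean is usually not a whole number, and recent tidyr versions are expected to refuse to put a fractional replacement into an integer column. The sketch below is a compact version of the same steps on the built-in `airquality` data, assuming only dplyr and tidyr are attached. The object name `imputed` is introduced here for illustration.

```{r}
library(dplyr)
library(tidyr)

imputed <- airquality |>
  as_tibble() |>
  # integer columns cannot hold a fractional mean, so convert first
  mutate(across(c(Ozone, Solar.R), as.double)) |>
  # replace each NA with the mean of the observed values in that column
  mutate(across(c(Ozone, Solar.R), ~ replace_na(.x, mean(.x, na.rm = TRUE))))

# No missing values should remain in any column
colSums(is.na(imputed))

# Mean imputation leaves the column mean unchanged
all.equal(mean(imputed$Ozone), mean(airquality$Ozone, na.rm = TRUE))
```

Mean imputation keeps the column mean but shrinks its variance, so treat it as a quick fix rather than a general recommendation.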

9.4 Using the naniar package

9.4.1 Data

```{r}
df <- tibble::tribble(
  ~name,           ~x,  ~y,              ~z,  
  "N/A",           1,   "N/A",           -100, 
  "N A",           3,   "NOt available", -99,
  "N / A",         NA,  "29",              -98,
  "Not Available", -99, "25",              -101,
  "John Smith",    -98, "28",              -1)

df
```

9.4.2 replace_with_na()

  • Replace an element with a missing value.
  • To do so, pass a named list to the replace argument: each name is a variable, and each element gives the value(s) in that variable to replace with NA.
```{r}
library(naniar)

df |> 
  replace_with_na(replace = list(x = -99))

df |> 
  replace_with_na(replace = list(x = c(-99, -98)))

df |> 
  replace_with_na(replace = list(x = c(-99, -98),
                                 z = c(-99, -98)))
```
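To make the named-list rule concrete, the sketch below compares `replace_with_na()` with a hand-written dplyr version on a small tibble. `toy`, `via_naniar`, and `via_dplyr` are hypothetical names used only here, and the dplyr code is an equivalent for this case, not naniar's internal implementation.

```{r}
library(dplyr)
library(naniar)

# Hypothetical data with sentinel codes in both columns
toy <- tibble(x = c(1, -99, -98), z = c(-99, 5, -98))

# Only x is named in the list, so only x is changed
via_naniar <- toy |>
  replace_with_na(replace = list(x = c(-99, -98)))

# The same rule written by hand for a numeric column
via_dplyr <- toy |>
  mutate(x = if_else(x %in% c(-99, -98), NA_real_, x))

identical(via_naniar$x, via_dplyr$x)
via_naniar
```

Because `z` is not named in the list, its `-99` and `-98` values are left untouched.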

9.4.3 replace_with_na_all()

  • Replace ALL values that meet a condition across an entire dataset.
```{r}
df |>  
  replace_with_na_all(condition = ~ .x == -99)

# write out all the offending strings
na_strings <- c("NA", "N A", "N / A", "N/A", "N/ A", "Not Available", "NOt available")

df |> 
  replace_with_na_all(condition = ~ .x %in% na_strings)

# common missing values list
common_na_numbers
common_na_strings

df |> 
  replace_with_na_all(condition = ~ .x %in% common_na_strings)

df |>  
  replace_with_na_all(condition = ~.x %in% c(common_na_strings, 
                                             common_na_numbers,
                                             -98, -100, -101, -1))
```
[1]    -9   -99  -999 -9999  9999    66    77    88
 [1] "missing" "NA"      "N A"     "N/A"     "#N/A"    "NA "     " NA"    
 [8] "N /A"    "N / A"   " N / A"  "N / A "  "na"      "n a"     "n/a"    
[15] "na "     " na"     "n /a"    "n / a"   " a / a"  "n / a "  "NULL"   
[22] "null"    ""        "\\?"     "\\*"     "\\."    
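When the offending strings are known before import, they can also be turned into NA while the file is read, using the `na` argument of `read_csv()`. The sketch below is a minimal illustration, assuming readr 2.0 or later so that `I()` can mark a string as literal CSV data. The inline data and the names `raw` and `clean` are made up for this example.

```{r}
library(readr)

# Hypothetical CSV content supplied inline instead of a file path
raw <- I("name,x,y\nN/A,1,N/A\nJohn Smith,-99,28\n")

# Every string listed in `na` is read as a missing value
clean <- read_csv(raw, na = c("", "NA", "N/A", "-99"), show_col_types = FALSE)

clean
```

Handling the codes at import also lets readr guess `y` as a numeric column, since the `"N/A"` entry no longer forces it to character.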

9.4.4 replace_with_na_at()

  • This is similar to replace_with_na_all(), but here you specify which variables the rule applies to.

  • This is useful when a rule should affect only a selected set of variables.

```{r}
df |> 
  replace_with_na_at(.vars = c("x","z"),
                     condition = ~ .x == -99)

df %>% 
  replace_with_na_at(.vars = c("x", "z"),
                     condition = ~ .x < 0)
exp(-1)
exp(0)
exp(1)

df %>% 
  replace_with_na_at(.vars = c("x", "z"),
                     condition = ~ exp(.x) < 1)
```
[1] 0.3678794
[1] 1
[1] 2.718282
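The `exp()` calls above show why the last two conditions select the same values: `exp(v) < 1` holds exactly when `v < 0`. The sketch below checks this and then writes the same column-restricted rule with dplyr's `across()`. It uses a hypothetical tibble `toy` and is an equivalent for numeric columns, not naniar's own code.

```{r}
library(dplyr)

# exp(v) < 1 and v < 0 flag the same elements
v <- c(-101, -99, -1, 0, 1, 3)
identical(exp(v) < 1, v < 0)

# Hypothetical data: y also contains a negative value
toy <- tibble(x = c(1, 3, -99), y = c(-5, 2, 4), z = c(-100, -1, 7))

# Apply the rule to x and z only; y is left alone
cleaned <- toy |>
  mutate(across(c(x, z), ~ if_else(.x < 0, NA_real_, .x)))

cleaned
```

The negative value in `y` survives because `y` is not among the selected columns.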

9.4.5 replace_with_na_if()

```{r}
df |> 
  replace_with_na_if(.predicate = is.character,
                     condition = ~ .x %in% na_strings)
```

10 Working with Multiple External Data Sets

10.1 Data and setup

```{r}
library(tidyverse)

theme_set(theme_bw())

student_ratio <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-07/student_teacher_ratio.csv")

skimr::skim(student_ratio)

student_ratio %>% 
  arrange(desc(student_ratio)) %>% 
  slice_head(n =10)

student_ratio %>% 
  #view()
  #count(indicator)
  count(year, sort = TRUE)
  #count(flags, sort = TRUE)
```
Data summary
Name student_ratio
Number of rows 5189
Number of columns 8
_______________________
Column type frequency:
character 6
numeric 2
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
edulit_ind 0 1.00 7 9 0 7 0
indicator 0 1.00 17 37 0 7 0
country_code 0 1.00 3 5 0 235 0
country 0 1.00 4 52 0 232 0
flag_codes 4185 0.19 1 1 0 3 0
flags 4185 0.19 14 23 0 3 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
year 0 1.00 2014.41 1.67 2012.00 2013.00 2014.00 2016.00 2018.00 ▇▅▅▅▃
student_ratio 325 0.94 18.29 10.41 1.16 11.66 15.84 21.91 168.63 ▇▁▁▁▁

10.2 Exploratory Data Analysis

```{r}
s_t_ratio_prim_2015 <- student_ratio %>%
  filter(indicator == "Primary Education",
         year == 2015,
         !is.na(student_ratio))

s_t_ratio_prim_2015 %>% #count(country, sort = TRUE)
  arrange(desc(student_ratio)) %>%
  slice(c(1:10, seq(n() - 9, n()))) %>% #view()
  mutate(country = fct_reorder(country, student_ratio)) %>%
  ggplot(aes(student_ratio, country)) +
  geom_point() +
  expand_limits(x = 0) +
  labs(title = "Countries with the highest and lowest student/teacher ratios",
       y = "",
       x = "Student/teacher ratio")
```

10.3 Research Question

  • Research question: Is the student/teacher ratio associated with national wealth? If so, how?

10.3.1 Getting GDP, population, and region data from the WDI package

  • WDI (World Development Indicators, World Bank) package:
  • For details: https://cran.r-project.org/web/packages/WDI/WDI.pdf
  • WDI(): a function from the WDI package that downloads the requested data through the World Bank’s API, parses the resulting XML file, and formats it in long country-year format.
  • WDIsearch(): searches the names and descriptions of available WDI series.
```{r}
#install.packages("WDI")
library(WDI)
help("WDI")

WDIsearch("gdp per capita") %>% view()

WDIsearch("Population") %>% 
  #view()
  #class()
  as_tibble() %>% 
  filter(str_detect(name, "^Population")) |> 
  arrange(str_length(name))


indicators <- WDI(indicator = c("NY.GDP.PCAP.CD", "SP.POP.TOTL"),
                  start = 2015, end = 2015, extra = TRUE) %>% 
  as_tibble() %>% 
  select(country_code = iso3c,
         region,
         per_cap_gdp = NY.GDP.PCAP.CD,
         total_pop = SP.POP.TOTL) 

head(indicators)

indicators |> 
  count(country_code, sort = TRUE)

joined_by_indicators <- s_t_ratio_prim_2015 %>% 
  inner_join(indicators, by = "country_code") 

joined_by_indicators |> 
  filter(is.na(per_cap_gdp) | is.na(total_pop))

joined_by_indicators |> 
  ggplot(aes(student_ratio)) +
  geom_histogram() +
  scale_x_log10()

joined_by_indicators |>  
  ggplot(aes(per_cap_gdp)) +
  geom_histogram() +
  scale_x_log10()
```
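Because `inner_join()` silently drops rows without a match, it helps to check which country codes fail to join. The sketch below uses two small hypothetical tibbles (`ratios` and `wealth`, with made-up codes and values) instead of the downloaded data. With the real objects, the same check would be `anti_join(s_t_ratio_prim_2015, indicators, by = "country_code")`.

```{r}
library(dplyr)

# Hypothetical stand-ins for the two data sources
ratios <- tibble(country_code = c("AAA", "BBB", "CCC"),
                 student_ratio = c(15, 30, 22))
wealth <- tibble(country_code = c("AAA", "BBB", "DDD"),
                 per_cap_gdp = c(40000, 1500, 9000))

# Rows present in both tables
joined <- inner_join(ratios, wealth, by = "country_code")

# Rows of `ratios` that have no partner in `wealth`
unmatched <- anti_join(ratios, wealth, by = "country_code")

joined
unmatched
```

An empty `unmatched` result would mean the inner join lost no rows from the left-hand table.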

10.3.2 Student ratio vs. gdp

```{r}
joined_by_indicators %>% 
  ggplot(aes(per_cap_gdp, student_ratio))+
  geom_point() +
  scale_x_log10(labels = scales::dollar) +
  scale_y_log10() +
  geom_smooth(method = lm, se = FALSE) +
  geom_text(aes(label = country), vjust = 1, hjust = 1, check_overlap = TRUE) +
  labs(x = "GDP per Capita",
       y = "Student/Teacher Ratio",
       title = "The relationship between GDP and Student/Teacher Ratio",
       subtitle = "For Year 2015")
```

10.3.3 Adding population and region to the chart

```{r}
joined_by_indicators %>% 
  arrange(desc(total_pop)) %>% 
  ggplot(aes(per_cap_gdp, student_ratio)) +
  geom_point(aes(size = total_pop, color = region)) +
  scale_x_log10(labels = scales::dollar) +
  scale_y_log10() +
  geom_smooth(method = lm, se = FALSE) +
  geom_text(aes(label = country), vjust = 1, hjust = 1, check_overlap = TRUE) +
  labs(x = "GDP per Capita",
       y = "Student/Teacher Ratio",
       title = "The relationship between GDP and Student/Teacher Ratio",
       subtitle = "For Year 2015")
```

10.3.4 Refining the chart

```{r}
joined_by_indicators %>% 
  arrange(desc(total_pop)) %>% 
  #slice_max(total_pop, n = 100) %>% 
  ggplot(aes(per_cap_gdp, student_ratio))+
  geom_point(aes(size = total_pop, color = region)) +
  scale_x_log10(labels = scales::dollar) +
  scale_y_log10() +
  geom_smooth(method = lm, se = FALSE) +
  geom_text(aes(label = country), vjust = 1, hjust = 1, check_overlap = TRUE) +
  scale_size_continuous(labels = scales::comma_format(), range = c(.5, 15)) +
  labs(x = "GDP per Capita",
       y = "Student/Teacher Ratio",
       title = "The relationship between GDP and Student/Teacher Ratio",
       subtitle = "For Year 2015",
       color = "Region",
       size = "Population")
```

Note

This suggests a negative correlation between a country’s wealth (GDP per capita) and its student/teacher ratio: wealthier countries tend to have fewer students per teacher.
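A chart shows the direction of a relationship, and a correlation coefficient or a fitted slope puts a number on it. The sketch below does both on the log10 scale used in the plots. The data frame `toy` holds made-up values purely for illustration, not results from the WDI download. With the real data, the same calls would use `joined_by_indicators` after dropping rows with missing GDP.

```{r}
# Hypothetical values, not real country data
toy <- data.frame(
  per_cap_gdp   = c(500, 1500, 5000, 15000, 45000),
  student_ratio = c(45, 35, 24, 16, 12)
)

# Correlation on the same log10 scales as the chart axes
r <- cor(log10(toy$per_cap_gdp), log10(toy$student_ratio))
r

# Slope of the line that geom_smooth(method = lm) would draw on log-log axes
fit <- lm(log10(student_ratio) ~ log10(per_cap_gdp), data = toy)
coef(fit)
```

A negative `r` and a negative slope both correspond to the downward-sloping line in the chart.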

11 Working with Labelled data

11.1 Loading packages often used for labelled data

```{r}
library(haven)      # import .sav files ----  
library(labelled)   # tools for labelled data ----
library(sjlabelled) # more tools for labelled data ----
library(scales)
library(sjPlot)     # creates variable table
library(gt)
```

11.2 Importing SPSS data

```{r import data}
library(here)
here()
data <- read_sav(here("data", "demo.sav"))

#demo <- read_sav("data/demo.sav")

class(data)
dim(data)
head(data)
glimpse(data)
#view(data)
```
[1] "C:/Users/jmjung/OneDrive - Cal Poly Pomona/Documents/Work/Jaemin/4. Teaching/DWV101/Course Content/M09-Import_Export/Principles"
[1] "tbl_df"     "tbl"        "data.frame"
[1] 6400   29
Rows: 6,400
Columns: 29
$ age      <dbl> 55, 56, 28, 24, 25, 45, 42, 35, 46, 34, 55, 28, 31, 42, 35, 5…
$ marital  <dbl+lbl> 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, …
$ address  <dbl> 12, 29, 9, 4, 2, 9, 19, 15, 26, 0, 17, 3, 9, 8, 8, 24, 1, 0, …
$ income   <dbl> 72, 153, 28, 26, 23, 76, 40, 57, 24, 89, 72, 24, 40, 137, 70,…
$ inccat   <dbl+lbl> 3, 4, 2, 2, 1, 4, 2, 3, 1, 4, 3, 1, 2, 4, 3, 4, 2, 2, 4, …
$ car      <dbl> 36.2, 76.9, 13.7, 12.5, 11.3, 37.2, 19.8, 28.2, 12.2, 46.1, 3…
$ carcat   <dbl+lbl> 3, 3, 1, 1, 1, 3, 2, 2, 1, 3, 3, 1, 2, 3, 3, 3, 2, 1, 3, …
$ ed       <dbl+lbl> 1, 1, 3, 4, 2, 3, 3, 2, 1, 3, 3, 4, 4, 3, 3, 4, 3, 1, 3, …
$ employ   <dbl> 23, 35, 4, 0, 5, 13, 10, 1, 11, 12, 2, 4, 0, 3, 9, 16, 0, 2, …
$ retire   <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ empcat   <dbl+lbl> 3, 3, 1, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 2, 3, 1, 1, 3, …
$ jobsat   <dbl+lbl> 5, 4, 3, 1, 2, 2, 2, 1, 5, 4, 3, 5, 2, 1, 4, 5, 1, 4, 3, …
$ gender   <chr+lbl> "f", "m", "f", "m", "m", "m", "m", "f", "f", "m", "f", "m…
$ reside   <dbl> 4, 1, 3, 3, 2, 2, 1, 1, 2, 6, 2, 1, 4, 1, 3, 2, 7, 2, 1, 4, 1…
$ wireless <dbl+lbl> 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, …
$ multline <dbl+lbl> 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, …
$ voice    <dbl+lbl> 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
$ pager    <dbl+lbl> 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, …
$ internet <dbl+lbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, …
$ callid   <dbl+lbl> 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, …
$ callwait <dbl+lbl> 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, …
$ owntv    <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ ownvcr   <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, …
$ owncd    <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, …
$ ownpda   <dbl+lbl> 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, …
$ ownpc    <dbl+lbl> 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, …
$ ownfax   <dbl+lbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, …
$ news     <dbl+lbl> 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, …
$ response <dbl+lbl> 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, …

11.3 Wrangling with labelled data

11.3.1 Viewing data dictionary

```{r}
#install.packages("sjPlot")
#library(sjPlot) 
data %>% 
  sjPlot::view_df() # show a table of variable names, variable labels, and value labels in the viewer
```
Data frame: .
ID  Name      Label                           Values           Value Labels
1   age       Age in years                    range: 18-77
2   marital   Marital status                  0; 1             Unmarried; Married
3   address   Years at current address        range: 0-56
4   income    Household income in thousands   range: 9-1116
5   inccat    Income category in thousands    1; 2; 3; 4       Under $25; $25 - $49; $50 - $74; $75+
6   car       Price of primary vehicle        range: 4.2-99.9
7   carcat    Primary vehicle price category  1; 2; 3          Economy; Standard; Luxury
8   ed        Level of education              1; 2; 3; 4; 5    Did not complete high school; High school degree; Some college; College degree; Post-undergraduate degree
9   employ    Years with current employer     range: 0-57
10  retire    Retired                         0; 1             No; Yes
11  empcat    Years with current employer     1; 2; 3          Less than 5; 5 to 15; More than 15
12  jobsat    Job satisfaction                1; 2; 3; 4; 5    Highly dissatisfied; Somewhat dissatisfied; Neutral; Somewhat satisfied; Highly satisfied
13  gender    Gender                          f; m             <output omitted>
14  reside    Number of people in household   range: 1-9
15  wireless  Wireless service                0; 1             No; Yes
16  multline  Multiple lines                  0; 1             No; Yes
17  voice     Voice mail                      0; 1             No; Yes
18  pager     Paging service                  0; 1             No; Yes
19  internet  Internet                        0; 1; 8; 9       No; Yes; Does not know; No Answer
20  callid    Caller ID                       0; 1             No; Yes
21  callwait  Call waiting                    0; 1             No; Yes
22  owntv     Owns TV                         0; 1             No; Yes
23  ownvcr    Owns VCR                        0; 1             No; Yes
24  owncd     Owns stereo/CD player           0; 1             No; Yes
25  ownpda    Owns PDA                        0; 1             No; Yes
26  ownpc     Owns computer                   0; 1             No; Yes
27  ownfax    Owns fax machine                0; 1             No; Yes
28  news      Newspaper subscription          0; 1             Yes; No
29  response  Response                        0; 1             Yes; No

11.3.2 Creating a data dictionary

```{r create a dictionary}
dictionary <- labelled::generate_dictionary(data)

dictionary

dictionary %>% 
  filter(variable %in% c("marital", "inccat", "carcat", "jobsat")) %>% 
  select(pos, variable, label, value_labels) %>% 
  view()

data %>% 
  pull(marital) %>% 
  class(.)

data %>% 
  pull(marital) %>% 
  str(.)

data %>% 
  select(marital, inccat, carcat, jobsat) %>% 
  head(.)
```
[1] "haven_labelled" "vctrs_vctr"     "double"        
 dbl+lbl [1:6400] 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, ...
 @ label      : chr "Marital status"
 @ format.spss: chr "F4.0"
 @ labels     : Named num [1:2] 0 1
  ..- attr(*, "names")= chr [1:2] "Unmarried" "Married"
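The `str()` output shows that a labelled column is an ordinary numeric vector with extra attributes: a variable label and a set of value labels. The sketch below builds such a vector by hand with `haven::labelled()` and converts it. `marital_toy` is a hypothetical three-value stand-in, not a column from `demo.sav`.

```{r}
library(haven)

# Hypothetical labelled vector mirroring the structure of `marital`
marital_toy <- labelled(
  c(1, 0, 1),
  labels = c(Unmarried = 0, Married = 1),
  label  = "Marital status"
)

class(marital_toy)            # includes "haven_labelled"
attr(marital_toy, "label")    # variable label
attr(marital_toy, "labels")   # value labels

# Convert codes to a factor that uses the value labels as levels
marital_fct <- as_factor(marital_toy)
marital_fct

# Or drop the value labels and keep the plain numeric codes
marital_num <- zap_labels(marital_toy)
is.labelled(marital_num)
```

Converting with `as_factor()` is usually the convenient form for tables and plots, while the numeric codes stay useful when the original coding matters.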

11.3.3 Applying common operations to the labelled data set

```{r Applying common operations to the data set}
data %>% 
  glimpse()

skimr::skim(data)

# Evaluate if variable is of class haven_labelled.
data$marital %>% 
  class() 
data |> pull(marital) |> class() # same as before

class(data)
class(data$income)
class(data$inccat)

data$inccat %>% 
  is.labelled()

haven::is.labelled(data$inccat)
haven::is.labelled(data$income)
haven::is.labelled(data)

# variable label

var_label(data)

var_label(data$jobsat) # the same as below

data$jobsat %>% 
  attr('label') 

# print value labels
data$jobsat %>% 
  attr('labels') 

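# Note: sjlabelled was attached after labelled, so this call likely resolves to
# sjlabelled::val_labels() and prints the whole vector (see the output below).
# labelled::val_labels(data$jobsat) prints only the value labels.
val_labels(data$jobsat)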

# print value label for one specific value
val_label(data$jobsat, 1)

data %>% 
  pull(jobsat) %>% 
  val_label(1)
```
Rows: 6,400
Columns: 29
$ age      <dbl> 55, 56, 28, 24, 25, 45, 42, 35, 46, 34, 55, 28, 31, 42, 35, 5…
$ marital  <dbl+lbl> 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, …
$ address  <dbl> 12, 29, 9, 4, 2, 9, 19, 15, 26, 0, 17, 3, 9, 8, 8, 24, 1, 0, …
$ income   <dbl> 72, 153, 28, 26, 23, 76, 40, 57, 24, 89, 72, 24, 40, 137, 70,…
$ inccat   <dbl+lbl> 3, 4, 2, 2, 1, 4, 2, 3, 1, 4, 3, 1, 2, 4, 3, 4, 2, 2, 4, …
$ car      <dbl> 36.2, 76.9, 13.7, 12.5, 11.3, 37.2, 19.8, 28.2, 12.2, 46.1, 3…
$ carcat   <dbl+lbl> 3, 3, 1, 1, 1, 3, 2, 2, 1, 3, 3, 1, 2, 3, 3, 3, 2, 1, 3, …
$ ed       <dbl+lbl> 1, 1, 3, 4, 2, 3, 3, 2, 1, 3, 3, 4, 4, 3, 3, 4, 3, 1, 3, …
$ employ   <dbl> 23, 35, 4, 0, 5, 13, 10, 1, 11, 12, 2, 4, 0, 3, 9, 16, 0, 2, …
$ retire   <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ empcat   <dbl+lbl> 3, 3, 1, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 2, 3, 1, 1, 3, …
$ jobsat   <dbl+lbl> 5, 4, 3, 1, 2, 2, 2, 1, 5, 4, 3, 5, 2, 1, 4, 5, 1, 4, 3, …
$ gender   <chr+lbl> "f", "m", "f", "m", "m", "m", "m", "f", "f", "m", "f", "m…
$ reside   <dbl> 4, 1, 3, 3, 2, 2, 1, 1, 2, 6, 2, 1, 4, 1, 3, 2, 7, 2, 1, 4, 1…
$ wireless <dbl+lbl> 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, …
$ multline <dbl+lbl> 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, …
$ voice    <dbl+lbl> 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
$ pager    <dbl+lbl> 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, …
$ internet <dbl+lbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, …
$ callid   <dbl+lbl> 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, …
$ callwait <dbl+lbl> 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, …
$ owntv    <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ ownvcr   <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, …
$ owncd    <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, …
$ ownpda   <dbl+lbl> 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, …
$ ownpc    <dbl+lbl> 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, …
$ ownfax   <dbl+lbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, …
$ news     <dbl+lbl> 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, …
$ response <dbl+lbl> 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, …
Data summary
Name data
Number of rows 6400
Number of columns 29
_______________________
Column type frequency:
character 1
numeric 28
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
gender 0 1 1 1 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
age 0 1.00 42.06 12.29 18.0 33.0 41.0 51.0 77.0 ▅▇▇▃▁
marital 0 1.00 0.50 0.50 0.0 0.0 0.0 1.0 1.0 ▇▁▁▁▇
address 0 1.00 11.56 9.94 0.0 3.0 9.0 17.0 56.0 ▇▃▂▁▁
income 0 1.00 69.47 78.72 9.0 28.0 45.0 79.0 1116.0 ▇▁▁▁▁
inccat 0 1.00 2.53 1.07 1.0 2.0 2.0 4.0 4.0 ▃▇▁▃▆
car 0 1.00 30.13 21.93 4.2 13.9 22.2 39.5 99.9 ▇▃▂▂▁
carcat 0 1.00 2.07 0.80 1.0 1.0 2.0 3.0 3.0 ▆▁▇▁▇
ed 0 1.00 2.59 1.20 1.0 2.0 2.0 4.0 5.0 ▆▇▆▆▂
employ 0 1.00 10.57 9.72 0.0 3.0 8.0 16.0 57.0 ▇▃▁▁▁
retire 0 1.00 0.05 0.21 0.0 0.0 0.0 0.0 1.0 ▇▁▁▁▁
empcat 0 1.00 1.94 0.79 1.0 1.0 2.0 3.0 3.0 ▇▁▇▁▆
jobsat 0 1.00 3.06 1.37 1.0 2.0 3.0 4.0 5.0 ▆▇▇▇▇
reside 0 1.00 2.35 1.47 1.0 1.0 2.0 3.0 9.0 ▇▃▁▁▁
wireless 0 1.00 0.40 0.49 0.0 0.0 0.0 1.0 1.0 ▇▁▁▁▅
multline 0 1.00 0.42 0.49 0.0 0.0 0.0 1.0 1.0 ▇▁▁▁▆
voice 0 1.00 0.43 0.50 0.0 0.0 0.0 1.0 1.0 ▇▁▁▁▆
pager 0 1.00 0.25 0.43 0.0 0.0 0.0 0.0 1.0 ▇▁▁▁▂
internet 255 0.96 0.27 0.44 0.0 0.0 0.0 1.0 1.0 ▇▁▁▁▃
callid 0 1.00 0.51 0.50 0.0 0.0 1.0 1.0 1.0 ▇▁▁▁▇
callwait 0 1.00 0.51 0.50 0.0 0.0 1.0 1.0 1.0 ▇▁▁▁▇
owntv 0 1.00 0.99 0.10 0.0 1.0 1.0 1.0 1.0 ▁▁▁▁▇
ownvcr 0 1.00 0.96 0.20 0.0 1.0 1.0 1.0 1.0 ▁▁▁▁▇
owncd 0 1.00 0.97 0.17 0.0 1.0 1.0 1.0 1.0 ▁▁▁▁▇
ownpda 0 1.00 0.20 0.40 0.0 0.0 0.0 0.0 1.0 ▇▁▁▁▂
ownpc 0 1.00 0.44 0.50 0.0 0.0 0.0 1.0 1.0 ▇▁▁▁▆
ownfax 0 1.00 0.19 0.39 0.0 0.0 0.0 0.0 1.0 ▇▁▁▁▂
news 0 1.00 0.57 0.50 0.0 0.0 1.0 1.0 1.0 ▆▁▁▁▇
response 0 1.00 0.89 0.31 0.0 1.0 1.0 1.0 1.0 ▁▁▁▁▇
[1] "haven_labelled" "vctrs_vctr"     "double"        
[1] "haven_labelled" "vctrs_vctr"     "double"        
[1] "tbl_df"     "tbl"        "data.frame"
[1] "numeric"
[1] "haven_labelled" "vctrs_vctr"     "double"        
[1] TRUE
[1] TRUE
[1] FALSE
[1] FALSE
$age
[1] "Age in years"

$marital
[1] "Marital status"

$address
[1] "Years at current address"

$income
[1] "Household income in thousands"

$inccat
[1] "Income category in thousands"

$car
[1] "Price of primary vehicle"

$carcat
[1] "Primary vehicle price category"

$ed
[1] "Level of education"

$employ
[1] "Years with current employer"

$retire
[1] "Retired"

$empcat
[1] "Years with current employer"

$jobsat
[1] "Job satisfaction"

$gender
[1] "Gender"

$reside
[1] "Number of people in household"

$wireless
[1] "Wireless service"

$multline
[1] "Multiple lines"

$voice
[1] "Voice mail"

$pager
[1] "Paging service"

$internet
[1] "Internet"

$callid
[1] "Caller ID"

$callwait
[1] "Call waiting"

$owntv
[1] "Owns TV"

$ownvcr
[1] "Owns VCR"

$owncd
[1] "Owns stereo/CD player"

$ownpda
[1] "Owns PDA"

$ownpc
[1] "Owns computer"

$ownfax
[1] "Owns fax machine"

$news
[1] "Newspaper subscription"

$response
[1] "Response"

[1] "Job satisfaction"
[1] "Job satisfaction"
  Highly dissatisfied Somewhat dissatisfied               Neutral 
                    1                     2                     3 
   Somewhat satisfied      Highly satisfied 
                    4                     5 
<labelled<double>[6400]>: Job satisfaction
   [1] 5 4 3 1 2 2 2 1 5 4 3 5 2 1 4 5 1 4 3 5 3 3 3 3 4 3 1 5 5 4 1 4 1 2 5 1 1
<output omitted>
[5773] 1 4 1 5 4 3 3 4 3 3 1 3 3 2 5 2 4 3 5 1 4 3 3 5 4 4 2 4 4 4 1 4 5 5 5 2 2
[5810] 3 5 1 5 3 3 3 1 4 3 2 4 3 2 4 3 5 3 4 5 5 1 4 5 5 3 3 3 4 2 3 1 2 2 3 2 3
[5847] 3 4 3 1 1 2 3 4 5 4 2 5 4 3 2 4 3 2 1 4 3 2 5 1 5 4 1 5 4 4 2 4 1 4 3 1 4
[5884] 3 3 4 1 3 2 1 5 2 4 4 3 3 5 2 3 3 3 2 5 3 3 5 2 1 5 4 3 2 2 2 5 1 5 4 2 3
[5921] 3 1 1 3 4 3 2 3 2 4 3 5 2 3 1 2 4 3 2 4 2 3 1 5 3 2 4 5 5 4 2 5 4 2 4 5 2
[5958] 5 2 3 4 1 3 1 2 1 4 3 1 2 5 4 3 5 3 5 4 2 2 4 1 4 1 1 5 5 2 3 3 4 3 2 4 5
[5995] 1 2 2 3 2 4 3 2 3 5 4 2 3 4 4 1 3 5 3 4 5 2 3 4 1 4 2 5 4 3 4 3 3 4 5 5 2
[6032] 3 4 5 3 5 5 3 2 4 5 1 4 2 3 5 3 4 1 1 5 5 4 5 4 3 2 3 1 5 5 4 2 5 5 1 1 1
[6069] 4 1 3 3 5 5 2 1 1 1 4 4 4 5 1 4 4 3 4 3 4 5 5 2 4 3 4 1 2 1 3 3 2 3 4 3 1
[6106] 1 5 2 2 3 2 2 5 3 2 1 3 5 5 2 4 4 2 4 5 4 2 2 3 2 4 1 5 3 5 2 2 2 3 4 3 1
[6143] 5 4 4 2 5 4 4 4 4 5 5 2 5 3 1 4 2 2 4 3 3 1 1 4 3 2 1 4 5 3 4 5 5 5 3 3 4
[6180] 1 5 2 4 4 5 2 1 1 2 2 4 1 2 4 3 5 5 3 3 4 5 4 1 1 4 4 2 1 1 2 5 1 5 5 2 1
[6217] 4 3 2 4 3 2 3 3 2 2 5 3 2 1 1 4 4 3 4 1 4 5 4 2 3 1 4 4 3 1 1 1 1 3 5 3 1
[6254] 2 1 5 4 3 4 3 2 2 4 3 1 5 1 4 4 1 3 4 4 3 2 4 1 4 1 5 1 4 4 5 3 4 3 5 3 4
[6291] 3 2 2 2 3 2 2 4 3 4 1 5 5 2 5 5 3 5 4 4 3 2 2 4 2 3 2 1 2 2 3 1 2 2 5 3 2
[6328] 3 2 3 2 4 5 3 1 5 2 5 4 2 4 1 3 3 3 4 1 2 5 1 4 4 5 1 3 3 4 1 5 4 2 4 4 3
[6365] 4 4 5 3 1 1 5 3 2 4 5 2 2 2 4 1 5 4 2 3 5 4 3 5 1 4 2 4 3 2 5 1 1 2 5 3

Labels:
 value                 label
     1   Highly dissatisfied
     2 Somewhat dissatisfied
     3               Neutral
     4    Somewhat satisfied
     5      Highly satisfied
[1] "Highly dissatisfied"
[1] "Highly dissatisfied"

11.4 Converting labelled data

  • Note that value labels do not imply that a vector should be treated as categorical or as continuous.

  • Therefore, value labels are not intended to be used for data analysis. For example, before performing modeling or plotting, you should convert vectors with value labels into factors or into classic numeric/character vectors.

  • Labelled data cheatsheet: https://raw.githubusercontent.com/rstudio/cheatsheets/main/labelled.pdf
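
The point above can be sketched with a small made-up vector. This is a minimal illustration, assuming the haven package is installed; `satisfaction` and its labels are hypothetical and are not taken from demo.sav.

```r
library(haven)

# Hypothetical toy vector: codes 1-3 with value labels attached as metadata
satisfaction <- labelled(
  c(1, 3, 2, 3, 1),
  labels = c("Low" = 1, "Medium" = 2, "High" = 3),
  label = "Satisfaction (toy example)"
)

# Treat the variable as categorical: the value labels become factor levels
satisfaction_fct <- as_factor(satisfaction)

# Treat the variable as numeric: drop the value labels and keep the codes
satisfaction_num <- zap_labels(satisfaction)

levels(satisfaction_fct)
is.labelled(satisfaction_num)
```

Which conversion to use depends on how the variable will be analyzed, not on the presence of labels.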

11.4.1 Converting to factors

  • Compare the data before and after conversion.
  • Converting labelled data to character and then to factor takes a lot of extra wrangling.
  • haven::as_factor() converts labelled data directly to a factor whose levels are the value labels, preserving the order of the underlying values.
```{r converting labelled data to unlabelled data}
data %>% 
  select(marital, inccat, carcat, jobsat) %>% 
  glimpse()

data %>% 
  transmute(marital = haven::as_factor(marital),
            inccat = haven::as_factor(inccat),
            carcat = haven::as_factor(carcat)) %>% 
  glimpse()

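# cf. as_factor() without the haven:: prefix resolves to forcats::as_factor() here:
# the levels are the numeric codes, not the value labels (compare with the above)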
data %>% 
  transmute(marital = as_factor(marital),
            inccat = as_factor(inccat),
            carcat = as_factor(carcat)) %>% 
  glimpse()
```
Rows: 6,400
Columns: 4
$ marital <dbl+lbl> 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1…
$ inccat  <dbl+lbl> 3, 4, 2, 2, 1, 4, 2, 3, 1, 4, 3, 1, 2, 4, 3, 4, 2, 2, 4, 4…
$ carcat  <dbl+lbl> 3, 3, 1, 1, 1, 3, 2, 2, 1, 3, 3, 1, 2, 3, 3, 3, 2, 1, 3, 3…
$ jobsat  <dbl+lbl> 5, 4, 3, 1, 2, 2, 2, 1, 5, 4, 3, 5, 2, 1, 4, 5, 1, 4, 3, 5…
Rows: 6,400
Columns: 3
$ marital <fct> Married, Unmarried, Married, Married, Unmarried, Married, Unma…
$ inccat  <fct> $50 - $74, $75+, $25 - $49, $25 - $49, Under $25, $75+, $25 - …
$ carcat  <fct> Luxury, Luxury, Economy, Economy, Economy, Luxury, Standard, S…
Rows: 6,400
Columns: 3
$ marital <fct> 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0,…
$ inccat  <fct> 3, 4, 2, 2, 1, 4, 2, 3, 1, 4, 3, 1, 2, 4, 3, 4, 2, 2, 4, 4, 1,…
$ carcat  <fct> 3, 3, 1, 1, 1, 3, 2, 2, 1, 3, 3, 1, 2, 3, 3, 3, 2, 1, 3, 3, 1,…
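
The difference in level order can be seen on a toy vector. This is a sketch under the assumption that haven is installed; `price_cat` is a hypothetical stand-in for a variable such as carcat.

```r
library(haven)

# Hypothetical toy vector (not from demo.sav)
price_cat <- labelled(
  c(3, 1, 2, 1),
  labels = c("Economy" = 1, "Standard" = 2, "Luxury" = 3)
)

# Direct conversion: levels follow the numeric codes (1, 2, 3)
direct <- as_factor(price_cat)

# Detour through character: factor() sorts the levels alphabetically
detour <- factor(as.character(as_factor(price_cat)))

# Optional: request an ordered factor for ordinal scales
direct_ord <- as_factor(price_cat, ordered = TRUE)

levels(direct)  # Economy, Standard, Luxury
levels(detour)  # Economy, Luxury, Standard
```

With the detour, the intended order has to be restored by hand (for example with fct_relevel()), which is the extra wrangling mentioned above.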

11.4.1.1 Alternative

```{r}
# convert selected variables to factors
data %>% 
  mutate(across(c(marital, inccat, carcat), haven::as_factor)) %>% # across() supersedes mutate_at()
  select(marital, inccat, carcat) %>% 
  glimpse()
```
Rows: 6,400
Columns: 3
$ marital <fct> Married, Unmarried, Married, Married, Unmarried, Married, Unma…
$ inccat  <fct> $50 - $74, $75+, $25 - $49, $25 - $49, Under $25, $75+, $25 - …
$ carcat  <fct> Luxury, Luxury, Economy, Economy, Economy, Luxury, Standard, S…

11.4.2 Converting to doubles

```{r}
data %>% 
  mutate(across(where(haven::is.labelled), remove_val_labels)) %>% 
  glimpse()
```
Rows: 6,400
Columns: 29
$ age      <dbl> 55, 56, 28, 24, 25, 45, 42, 35, 46, 34, 55, 28, 31, 42, 35, 5…
$ marital  <dbl> 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0…
$ address  <dbl> 12, 29, 9, 4, 2, 9, 19, 15, 26, 0, 17, 3, 9, 8, 8, 24, 1, 0, …
$ income   <dbl> 72, 153, 28, 26, 23, 76, 40, 57, 24, 89, 72, 24, 40, 137, 70,…
$ inccat   <dbl> 3, 4, 2, 2, 1, 4, 2, 3, 1, 4, 3, 1, 2, 4, 3, 4, 2, 2, 4, 4, 1…
$ car      <dbl> 36.2, 76.9, 13.7, 12.5, 11.3, 37.2, 19.8, 28.2, 12.2, 46.1, 3…
$ carcat   <dbl> 3, 3, 1, 1, 1, 3, 2, 2, 1, 3, 3, 1, 2, 3, 3, 3, 2, 1, 3, 3, 1…
$ ed       <dbl> 1, 1, 3, 4, 2, 3, 3, 2, 1, 3, 3, 4, 4, 3, 3, 4, 3, 1, 3, 2, 1…
$ employ   <dbl> 23, 35, 4, 0, 5, 13, 10, 1, 11, 12, 2, 4, 0, 3, 9, 16, 0, 2, …
$ retire   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ empcat   <dbl> 3, 3, 1, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 2, 3, 1, 1, 3, 3, 1…
$ jobsat   <dbl> 5, 4, 3, 1, 2, 2, 2, 1, 5, 4, 3, 5, 2, 1, 4, 5, 1, 4, 3, 5, 3…
$ gender   <chr> "f", "m", "f", "m", "m", "m", "m", "f", "f", "m", "f", "m", "…
$ reside   <dbl> 4, 1, 3, 3, 2, 2, 1, 1, 2, 6, 2, 1, 4, 1, 3, 2, 7, 2, 1, 4, 1…
$ wireless <dbl> 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0…
$ multline <dbl> 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0…
$ voice    <dbl> 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0…
$ pager    <dbl> 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0…
$ internet <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0…
$ callid   <dbl> 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0…
$ callwait <dbl> 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0…
$ owntv    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ ownvcr   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0…
$ owncd    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0…
$ ownpda   <dbl> 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0…
$ ownpc    <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0…
$ ownfax   <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0…
$ news     <dbl> 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1…
$ response <dbl> 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1…

11.4.3 Converting to characters

  • Converting a labelled vector to a character vector replaces the original values with their value labels.
  • The underlying numeric codes are lost, so keep a copy if you still need them.
```{r}
data %>% 
  as_character(marital) %>% 
  #select(marital) %>%
  glimpse()
```
Rows: 6,400
Columns: 29
$ age      <dbl> 55, 56, 28, 24, 25, 45, 42, 35, 46, 34, 55, 28, 31, 42, 35, 5…
$ marital  <chr> "Married", "Unmarried", "Married", "Married", "Unmarried", "M…
$ address  <dbl> 12, 29, 9, 4, 2, 9, 19, 15, 26, 0, 17, 3, 9, 8, 8, 24, 1, 0, …
$ income   <dbl> 72, 153, 28, 26, 23, 76, 40, 57, 24, 89, 72, 24, 40, 137, 70,…
$ inccat   <dbl+lbl> 3, 4, 2, 2, 1, 4, 2, 3, 1, 4, 3, 1, 2, 4, 3, 4, 2, 2, 4, …
$ car      <dbl> 36.2, 76.9, 13.7, 12.5, 11.3, 37.2, 19.8, 28.2, 12.2, 46.1, 3…
$ carcat   <dbl+lbl> 3, 3, 1, 1, 1, 3, 2, 2, 1, 3, 3, 1, 2, 3, 3, 3, 2, 1, 3, …
$ ed       <dbl+lbl> 1, 1, 3, 4, 2, 3, 3, 2, 1, 3, 3, 4, 4, 3, 3, 4, 3, 1, 3, …
$ employ   <dbl> 23, 35, 4, 0, 5, 13, 10, 1, 11, 12, 2, 4, 0, 3, 9, 16, 0, 2, …
$ retire   <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ empcat   <dbl+lbl> 3, 3, 1, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 2, 3, 1, 1, 3, …
$ jobsat   <dbl+lbl> 5, 4, 3, 1, 2, 2, 2, 1, 5, 4, 3, 5, 2, 1, 4, 5, 1, 4, 3, …
$ gender   <chr+lbl> "f", "m", "f", "m", "m", "m", "m", "f", "f", "m", "f", "m…
$ reside   <dbl> 4, 1, 3, 3, 2, 2, 1, 1, 2, 6, 2, 1, 4, 1, 3, 2, 7, 2, 1, 4, 1…
$ wireless <dbl+lbl> 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, …
$ multline <dbl+lbl> 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, …
$ voice    <dbl+lbl> 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
$ pager    <dbl+lbl> 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, …
$ internet <dbl+lbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, …
$ callid   <dbl+lbl> 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, …
$ callwait <dbl+lbl> 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, …
$ owntv    <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ ownvcr   <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, …
$ owncd    <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, …
$ ownpda   <dbl+lbl> 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, …
$ ownpc    <dbl+lbl> 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, …
$ ownfax   <dbl+lbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, …
$ news     <dbl+lbl> 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, …
$ response <dbl+lbl> 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, …
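
One way to keep the codes while still getting text labels is to store them in a separate column first. This is a minimal sketch on hypothetical toy data, using haven functions rather than the `as_character()` call above.

```r
library(dplyr)
library(haven)

# Hypothetical toy data (not from demo.sav)
toy <- tibble(
  marital = labelled(c(1, 0, 1), labels = c("Unmarried" = 0, "Married" = 1))
)

toy_chr <- toy %>%
  mutate(
    marital_code = zap_labels(marital),             # keep the original codes
    marital      = as.character(as_factor(marital)) # value labels as plain text
  )

toy_chr
```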

11.5 Create a new variable with a row-wise computation

  • Compute each person’s total number of electronics owned.
```{r}
glimpse(data)

data %>% 
  #select(starts_with("own")) %>% 
  rowwise(age) %>% 
  mutate(total_owns = sum(c_across(starts_with("own")))) %>% 
  var_labels(total_owns = "Total Electronics Owned") %>% 
  select(starts_with("own"), total_owns) %>% 
  #view()
  head()

by_row_data <- data %>% 
  rowwise(age) %>% 
  mutate(total_owns = sum(c_across(starts_with("own")))) %>% 
  set_variable_labels(total_owns = "Total Electronics Owned") %>% 
  ungroup()

by_row_data %>% #view()
  select(contains("own")) %>% 
  head()

var_label(by_row_data$total_owns)

# alternative way of rowwise summation
data |> 
  mutate(total_elect = owntv + ownvcr + owncd + ownpda + ownpc + ownfax) |> 
  select(starts_with("own"), total_elect)

# cf. without rowwise(), sum() adds every own* cell into one grand total that is repeated on each row
data |> 
  mutate(total_elect = sum(across(starts_with("own")))) |> 
  select(starts_with("own"), total_elect)

# cf. by_row_data
by_row_data |> 
  select(contains("own"))
```
Rows: 6,400
Columns: 29
$ age      <dbl> 55, 56, 28, 24, 25, 45, 42, 35, 46, 34, 55, 28, 31, 42, 35, 5…
$ marital  <dbl+lbl> 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, …
$ address  <dbl> 12, 29, 9, 4, 2, 9, 19, 15, 26, 0, 17, 3, 9, 8, 8, 24, 1, 0, …
$ income   <dbl> 72, 153, 28, 26, 23, 76, 40, 57, 24, 89, 72, 24, 40, 137, 70,…
$ inccat   <dbl+lbl> 3, 4, 2, 2, 1, 4, 2, 3, 1, 4, 3, 1, 2, 4, 3, 4, 2, 2, 4, …
$ car      <dbl> 36.2, 76.9, 13.7, 12.5, 11.3, 37.2, 19.8, 28.2, 12.2, 46.1, 3…
$ carcat   <dbl+lbl> 3, 3, 1, 1, 1, 3, 2, 2, 1, 3, 3, 1, 2, 3, 3, 3, 2, 1, 3, …
$ ed       <dbl+lbl> 1, 1, 3, 4, 2, 3, 3, 2, 1, 3, 3, 4, 4, 3, 3, 4, 3, 1, 3, …
$ employ   <dbl> 23, 35, 4, 0, 5, 13, 10, 1, 11, 12, 2, 4, 0, 3, 9, 16, 0, 2, …
$ retire   <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ empcat   <dbl+lbl> 3, 3, 1, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 2, 3, 1, 1, 3, …
$ jobsat   <dbl+lbl> 5, 4, 3, 1, 2, 2, 2, 1, 5, 4, 3, 5, 2, 1, 4, 5, 1, 4, 3, …
$ gender   <chr+lbl> "f", "m", "f", "m", "m", "m", "m", "f", "f", "m", "f", "m…
$ reside   <dbl> 4, 1, 3, 3, 2, 2, 1, 1, 2, 6, 2, 1, 4, 1, 3, 2, 7, 2, 1, 4, 1…
$ wireless <dbl+lbl> 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, …
$ multline <dbl+lbl> 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, …
$ voice    <dbl+lbl> 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
$ pager    <dbl+lbl> 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, …
$ internet <dbl+lbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, …
$ callid   <dbl+lbl> 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, …
$ callwait <dbl+lbl> 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, …
$ owntv    <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ ownvcr   <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, …
$ owncd    <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, …
$ ownpda   <dbl+lbl> 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, …
$ ownpc    <dbl+lbl> 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, …
$ ownfax   <dbl+lbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, …
$ news     <dbl+lbl> 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, …
$ response <dbl+lbl> 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, …
[1] "Total Electronics Owned"
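
The three approaches in the chunk above can be compared on a toy tibble where the totals are easy to check by hand. The `toy` data and column values are hypothetical; `rowSums()` is added here as a vectorized alternative.

```r
library(dplyr)

# Hypothetical toy data: three people, three ownership indicators
toy <- tibble(
  id     = 1:3,
  owntv  = c(1, 1, 0),
  ownpc  = c(0, 1, 0),
  ownfax = c(1, 1, 0)
)

# Row-wise: one total per person
per_person <- toy %>%
  rowwise() %>%
  mutate(total_owns = sum(c_across(starts_with("own")))) %>%
  ungroup()

# Vectorized equivalent: rowSums() over the selected columns
per_person_fast <- toy %>%
  mutate(total_owns = rowSums(across(starts_with("own"))))

# Without rowwise(), sum() adds up every cell and recycles one grand total
grand_total <- toy %>%
  mutate(total_owns = sum(across(starts_with("own"))))

per_person$total_owns
grand_total$total_owns
```

On large data, the `rowSums()` version is usually faster than `rowwise()` because it avoids evaluating the expression once per row.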

11.6 Tidying data

  • Generate insights on consumer electronics ownership
```{r}
#library(tidyverse)

data %>% 
  select(starts_with("own")) %>% 
  head(10)

# Long data
data %>% 
  pivot_longer(starts_with("own"), names_to = "owns",
               values_to = "measures"
               ) %>% 
  select(age, marital, inccat, ed, owns, measures) 

owns_long <- data %>% 
  pivot_longer(starts_with("own"), names_to = "owns",
               values_to = "measures"
               ) %>% 
  select(age, marital, inccat, ed, owns, measures) 
```
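
Once the data is in long form, ownership insights reduce to a grouped summary. The sketch below uses a hypothetical toy tibble and the same `owns` and `measures` column names as above; the `share` and `n_owners` names are illustrative.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: four people, two ownership indicators
toy <- tibble(
  id    = 1:4,
  owntv = c(1, 1, 1, 0),
  ownpc = c(0, 1, 0, 0)
)

# One row per person and item
toy_long <- toy %>%
  pivot_longer(starts_with("own"), names_to = "owns", values_to = "measures")

# Share and count of owners for each item
ownership_rate <- toy_long %>%
  group_by(owns) %>%
  summarise(share = mean(measures), n_owners = sum(measures), .groups = "drop")

ownership_rate
```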

11.7 Descriptive statistics in Tables

```{r}
library(gt)
```

11.7.1 Frequencies

  • For categorical data
```{r}
skimr::skim(data)

var_label(data$marital)

view_df(data)

# marital

var_label(data$marital)

data %>% 
  mutate(marital = haven::as_factor(marital)) |> 
  count(marital) |> 
  mutate(percent = n/sum(n),
         cum_sum = cumsum(percent)) |> 
  gt() |> 
  tab_header("Marital Status Frequency") |> 
  fmt_percent(columns = c(percent, cum_sum),
              decimals = 2
              ) |> 
  fmt_number(columns = n,
             decimals = 0)


# inccat

var_label(data$inccat)

data %>% 
  mutate(inccat = haven::as_factor(inccat)) %>% 
  count(inccat) %>% 
  mutate(percent = n/sum(n),
         cum_sum = cumsum(percent)
         ) |> 
  gt() |> 
  tab_header(title = "Income category Frequency",
             subtitle = "in thousands") |> 
  fmt_percent(columns = c(percent, cum_sum),
              decimals = 2
              ) |> 
  fmt_number(columns = n,
             decimals = 0) 


# carcat

var_label(data$carcat)

data |> 
  mutate(carcat = haven::as_factor(carcat)) |> 
  count(carcat) |> 
  mutate(percent = n/sum(n),
         cum_sum = cumsum(percent)) |> 
  gt() |> 
  tab_header(title = "Primary vehicle price category Frequency") |> 
  fmt_percent(columns = c(percent, cum_sum),
              decimals = 2
              ) |> 
  fmt_number(columns = n,
             decimals = 0) 
```
Data summary
Name data
Number of rows 6400
Number of columns 29
_______________________
Column type frequency:
character 1
numeric 28
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
gender 0 1 1 1 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
age 0 1.00 42.06 12.29 18.0 33.0 41.0 51.0 77.0 ▅▇▇▃▁
marital 0 1.00 0.50 0.50 0.0 0.0 0.0 1.0 1.0 ▇▁▁▁▇
address 0 1.00 11.56 9.94 0.0 3.0 9.0 17.0 56.0 ▇▃▂▁▁
income 0 1.00 69.47 78.72 9.0 28.0 45.0 79.0 1116.0 ▇▁▁▁▁
inccat 0 1.00 2.53 1.07 1.0 2.0 2.0 4.0 4.0 ▃▇▁▃▆
car 0 1.00 30.13 21.93 4.2 13.9 22.2 39.5 99.9 ▇▃▂▂▁
carcat 0 1.00 2.07 0.80 1.0 1.0 2.0 3.0 3.0 ▆▁▇▁▇
ed 0 1.00 2.59 1.20 1.0 2.0 2.0 4.0 5.0 ▆▇▆▆▂
employ 0 1.00 10.57 9.72 0.0 3.0 8.0 16.0 57.0 ▇▃▁▁▁
retire 0 1.00 0.05 0.21 0.0 0.0 0.0 0.0 1.0 ▇▁▁▁▁
empcat 0 1.00 1.94 0.79 1.0 1.0 2.0 3.0 3.0 ▇▁▇▁▆
jobsat 0 1.00 3.06 1.37 1.0 2.0 3.0 4.0 5.0 ▆▇▇▇▇
reside 0 1.00 2.35 1.47 1.0 1.0 2.0 3.0 9.0 ▇▃▁▁▁
wireless 0 1.00 0.40 0.49 0.0 0.0 0.0 1.0 1.0 ▇▁▁▁▅
multline 0 1.00 0.42 0.49 0.0 0.0 0.0 1.0 1.0 ▇▁▁▁▆
voice 0 1.00 0.43 0.50 0.0 0.0 0.0 1.0 1.0 ▇▁▁▁▆
pager 0 1.00 0.25 0.43 0.0 0.0 0.0 0.0 1.0 ▇▁▁▁▂
internet 255 0.96 0.27 0.44 0.0 0.0 0.0 1.0 1.0 ▇▁▁▁▃
callid 0 1.00 0.51 0.50 0.0 0.0 1.0 1.0 1.0 ▇▁▁▁▇
callwait 0 1.00 0.51 0.50 0.0 0.0 1.0 1.0 1.0 ▇▁▁▁▇
owntv 0 1.00 0.99 0.10 0.0 1.0 1.0 1.0 1.0 ▁▁▁▁▇
ownvcr 0 1.00 0.96 0.20 0.0 1.0 1.0 1.0 1.0 ▁▁▁▁▇
owncd 0 1.00 0.97 0.17 0.0 1.0 1.0 1.0 1.0 ▁▁▁▁▇
ownpda 0 1.00 0.20 0.40 0.0 0.0 0.0 0.0 1.0 ▇▁▁▁▂
ownpc 0 1.00 0.44 0.50 0.0 0.0 0.0 1.0 1.0 ▇▁▁▁▆
ownfax 0 1.00 0.19 0.39 0.0 0.0 0.0 0.0 1.0 ▇▁▁▁▂
news 0 1.00 0.57 0.50 0.0 0.0 1.0 1.0 1.0 ▆▁▁▁▇
response 0 1.00 0.89 0.31 0.0 1.0 1.0 1.0 1.0 ▁▁▁▁▇
[1] "Marital status"
Data frame: data
ID | Name     | Label                          | Values = Value Labels
1  | age      | Age in years                   | range: 18-77
2  | marital  | Marital status                 | 0 = Unmarried; 1 = Married
3  | address  | Years at current address       | range: 0-56
4  | income   | Household income in thousands  | range: 9-1116
5  | inccat   | Income category in thousands   | 1 = Under $25; 2 = $25 - $49; 3 = $50 - $74; 4 = $75+
6  | car      | Price of primary vehicle       | range: 4.2-99.9
7  | carcat   | Primary vehicle price category | 1 = Economy; 2 = Standard; 3 = Luxury
8  | ed       | Level of education             | 1 = Did not complete high school; 2 = High school degree; 3 = Some college; 4 = College degree; 5 = Post-undergraduate degree
9  | employ   | Years with current employer    | range: 0-57
10 | retire   | Retired                        | 0 = No; 1 = Yes
11 | empcat   | Years with current employer    | 1 = Less than 5; 2 = 5 to 15; 3 = More than 15
12 | jobsat   | Job satisfaction               | 1 = Highly dissatisfied; 2 = Somewhat dissatisfied; 3 = Neutral; 4 = Somewhat satisfied; 5 = Highly satisfied
13 | gender   | Gender                         | f, m (value labels: <output omitted>)
14 | reside   | Number of people in household  | range: 1-9
15 | wireless | Wireless service               | 0 = No; 1 = Yes
16 | multline | Multiple lines                 | 0 = No; 1 = Yes
17 | voice    | Voice mail                     | 0 = No; 1 = Yes
18 | pager    | Paging service                 | 0 = No; 1 = Yes
19 | internet | Internet                       | 0 = No; 1 = Yes; 8 = Does not know; 9 = No Answer
20 | callid   | Caller ID                      | 0 = No; 1 = Yes
21 | callwait | Call waiting                   | 0 = No; 1 = Yes
22 | owntv    | Owns TV                        | 0 = No; 1 = Yes
23 | ownvcr   | Owns VCR                       | 0 = No; 1 = Yes
24 | owncd    | Owns stereo/CD player          | 0 = No; 1 = Yes
25 | ownpda   | Owns PDA                       | 0 = No; 1 = Yes
26 | ownpc    | Owns computer                  | 0 = No; 1 = Yes
27 | ownfax   | Owns fax machine               | 0 = No; 1 = Yes
28 | news     | Newspaper subscription         | 0 = Yes; 1 = No
29 | response | Response                       | 0 = Yes; 1 = No
[1] "Marital status"
Marital Status Frequency
Marital status | n     | percent | cum_sum
Unmarried      | 3,224 | 50.38%  | 50.38%
Married        | 3,176 | 49.62%  | 100.00%
[1] "Income category in thousands"
Income category Frequency
in thousands
Income category in thousands | n     | percent | cum_sum
Under $25                    | 1,174 | 18.34%  | 18.34%
$25 - $49                    | 2,388 | 37.31%  | 55.66%
$50 - $74                    | 1,120 | 17.50%  | 73.16%
$75+                         | 1,718 | 26.84%  | 100.00%
[1] "Primary vehicle price category"
Primary vehicle price category Frequency
Primary vehicle price category | n     | percent | cum_sum
Economy                        | 1,841 | 28.77%  | 28.77%
Standard                       | 2,275 | 35.55%  | 64.31%
Luxury                         | 2,284 | 35.69%  | 100.00%
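
The three frequency tables repeat the same count, percent, and cumulative percent steps, so they could be wrapped in a small helper. The `freq_table()` function below is hypothetical (it is not part of the lecture code) and stops before the gt formatting; the toy data is made up.

```r
library(dplyr)

# Hypothetical helper: frequency table for one categorical column
freq_table <- function(df, var) {
  df %>%
    count({{ var }}) %>%
    mutate(
      percent = n / sum(n),      # share of each category
      cum_sum = cumsum(percent)  # running share in level order
    )
}

# Hypothetical toy data with an explicit level order
toy <- tibble(carcat = factor(
  c("Economy", "Luxury", "Economy", "Standard"),
  levels = c("Economy", "Standard", "Luxury")
))

toy_freq <- freq_table(toy, carcat)
toy_freq
```

The result can be piped into `gt()` and the `fmt_percent()` calls used above; with labelled data, convert the column with `haven::as_factor()` before calling the helper.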

11.7.2 Percentiles

  • For continuous data
```{r}
glimpse(data)
view_df(data)

data %>% 
  select(age, car, employ) %>% 
  reframe( # reframe() (dplyr >= 1.1.0) allows a summary that returns more than one row
    Age = 
      quantile(age, probs = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)),
    Car_price = 
      quantile(car, probs = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)),
    Years_current_employer = 
      quantile(employ, probs = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0))
    ) |> 
  mutate(Percentile = c("10th",
                        "20th", 
                        "30th",
                        "40th",
                        "50th",
                        "60th",
                        "70th",
                        "80th",
                        "90th",
                        "100th")) |> 
  relocate(Percentile) |> 
  gt() |> 
  tab_header(title = md("**Percentiles for selected continuous variables**")) |> 
  tab_source_note(
    source_note = md("**Data source:** The data is from demo.sav file, a file contained in SPSS Program")
  )
```
Rows: 6,400
Columns: 29
$ age      <dbl> 55, 56, 28, 24, 25, 45, 42, 35, 46, 34, 55, 28, 31, 42, 35, 5…
$ marital  <dbl+lbl> 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, …
$ address  <dbl> 12, 29, 9, 4, 2, 9, 19, 15, 26, 0, 17, 3, 9, 8, 8, 24, 1, 0, …
$ income   <dbl> 72, 153, 28, 26, 23, 76, 40, 57, 24, 89, 72, 24, 40, 137, 70,…
$ inccat   <dbl+lbl> 3, 4, 2, 2, 1, 4, 2, 3, 1, 4, 3, 1, 2, 4, 3, 4, 2, 2, 4, …
$ car      <dbl> 36.2, 76.9, 13.7, 12.5, 11.3, 37.2, 19.8, 28.2, 12.2, 46.1, 3…
$ carcat   <dbl+lbl> 3, 3, 1, 1, 1, 3, 2, 2, 1, 3, 3, 1, 2, 3, 3, 3, 2, 1, 3, …
$ ed       <dbl+lbl> 1, 1, 3, 4, 2, 3, 3, 2, 1, 3, 3, 4, 4, 3, 3, 4, 3, 1, 3, …
$ employ   <dbl> 23, 35, 4, 0, 5, 13, 10, 1, 11, 12, 2, 4, 0, 3, 9, 16, 0, 2, …
$ retire   <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ empcat   <dbl+lbl> 3, 3, 1, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 2, 3, 1, 1, 3, …
$ jobsat   <dbl+lbl> 5, 4, 3, 1, 2, 2, 2, 1, 5, 4, 3, 5, 2, 1, 4, 5, 1, 4, 3, …
$ gender   <chr+lbl> "f", "m", "f", "m", "m", "m", "m", "f", "f", "m", "f", "m…
$ reside   <dbl> 4, 1, 3, 3, 2, 2, 1, 1, 2, 6, 2, 1, 4, 1, 3, 2, 7, 2, 1, 4, 1…
$ wireless <dbl+lbl> 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, …
$ multline <dbl+lbl> 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, …
$ voice    <dbl+lbl> 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
$ pager    <dbl+lbl> 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, …
$ internet <dbl+lbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, …
$ callid   <dbl+lbl> 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, …
$ callwait <dbl+lbl> 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, …
$ owntv    <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ ownvcr   <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, …
$ owncd    <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, …
$ ownpda   <dbl+lbl> 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, …
$ ownpc    <dbl+lbl> 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, …
$ ownfax   <dbl+lbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, …
$ news     <dbl+lbl> 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, …
$ response <dbl+lbl> 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, …
Percentiles for selected continuous variables
Percentile Age Car_price Years_current_employer
10th 26 10.00 0
20th 31 12.60 2
30th 34 15.20 4
40th 38 18.20 6
50th 41 22.20 8
60th 45 27.20 11
70th 49 34.10 14
80th 53 47.22 18
90th 59 69.10 25
100th 77 99.90 57
Data source: The data is from demo.sav file, a file contained in SPSS Program
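
A compact variant of the percentile table, assuming dplyr 1.1.0 or later for `reframe()`. The toy data, the `p` vector, and the choice of quartiles are illustrative; the percentile labels are built from `p` instead of being typed by hand.

```r
library(dplyr)

# Hypothetical toy data with easy-to-check quantiles
toy <- tibble(
  age = c(20, 30, 40, 50, 60),
  car = c(10, 20, 30, 40, 50)
)

p <- c(0.25, 0.5, 0.75)

percentiles <- toy %>%
  reframe(
    Percentile = paste0(p * 100, "th"),
    across(c(age, car), \(x) quantile(x, probs = p))
  )

percentiles
```

Deriving the labels from `p` keeps the label column in sync when the set of percentiles changes.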

11.7.3 Means and standard deviations from wide data

```{r}
# means
data |> 
  summarise(across(starts_with("own"), \(x) mean(x, na.rm = TRUE))) |> 
  gt() |> 
  tab_header(title = md("**Proportion of Participants who Own Electronics**")
             ) |> 
  fmt_percent(columns = everything(),
              decimals = 1) |> 
  tab_source_note(
    source_note = md(
      "**Data source:** The data is from demo.sav file, a file contained in SPSS Program"
      )
  )
 

# sd
data |> 
  summarise(across(starts_with("own"), \(x) sd(x, na.rm = TRUE)
                   )
            ) |> 
  gt() |> 
  tab_header(title = md("**Standard Deviation of Electronics Ownership**")
             ) |> 
  fmt_percent(columns = everything(),
              decimals = 1) |> 
  tab_source_note(
    source_note = md(
      "**Data source:** The data is from demo.sav file, a file contained in SPSS Program"
      )
  )  

# both
data %>% 
  summarize(across(starts_with("own"), 
                   .fns =  list(prop = mean, sd = sd),
                   .names = "{.col}_{.fn}")) 
```
Proportion of Participants who Own Electronics
owntv ownvcr owncd ownpda ownpc ownfax
99.0% 96.0% 97.0% 20.4% 43.9% 18.8%
Data source: The data is from demo.sav file, a file contained in SPSS Program
Standard Deviation of Electronics Ownership
owntv ownvcr owncd ownpda ownpc ownfax
9.9% 19.6% 17.1% 40.3% 49.6% 39.1%
Data source: The data is from demo.sav file, a file contained in SPSS Program
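
The combined summary in the last call produces one very wide row (owntv_prop, owntv_sd, and so on). A possible follow-up, sketched on hypothetical toy data, reshapes that row into one line per item using the `.value` sentinel of `pivot_longer()`.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two ownership indicators
toy <- tibble(
  owntv = c(1, 1, 1, 0),
  ownpc = c(0, 1, 0, 1)
)

# One wide row with a proportion and an SD per item
wide_summary <- toy %>%
  summarise(across(starts_with("own"),
                   list(prop = mean, sd = sd),
                   .names = "{.col}_{.fn}"))

# Split names such as "owntv_prop" into an item column and prop/sd columns
tidy_summary <- wide_summary %>%
  pivot_longer(everything(),
               names_to = c("item", ".value"),
               names_sep = "_")

tidy_summary
```

This relies on the column names containing exactly one underscore, which holds for the own* variables used here.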

11.8 Visualizing Likert-Type Scales

11.8.1 Inefficient way

  • Avoid the three-step route (labelled to character, character to factor, then releveling): it is longer, and the level order has to be restored by hand.
  • Avoid converting labelled data with factor(), because the value labels are lost and only the numeric codes remain.
```{r}
var_label(data$inccat)

data %>%
  mutate(inccat = as_character(inccat),
         inccat = fct_relevel(inccat, "Under $25")) %>% 
  ggplot() +
  geom_bar(aes(x= inccat, y = after_stat(prop), group = 1), fill = 'purple') +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Income categories in the demo data",
       subtitle = "Distribution of income",
       x = "Income category in thousands",
       y = "",
       caption = ""
         )

var_label(data$carcat)

data %>%
  mutate(carcat = as_character(carcat),
         carcat = fct_relevel(carcat, c("Economy", 
                                        "Standard",
                                        "Luxury"))) %>% 
  ggplot() +
  geom_bar(aes(x= carcat, y = after_stat(prop), group = 1), fill = 'orange') +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Primary vehicle price category in the demo data",
       subtitle = "",
       x = "Primary vehicle price category",
       y = "",
       caption = ""
         )

var_label(data$jobsat)
val_labels(data$jobsat)

data %>%
  ggplot() +
  geom_bar(aes(x= factor(jobsat), y = after_stat(prop), group = 1), fill = '#D35400') +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Job satisfaction in the demo data",
       subtitle = "",
       x = "Job satisfaction",
       y = "",
       caption = "1 = Highly dissatisfied, 5 = Highly satisfied"
         )
```
[1] "Income category in thousands"
[1] "Primary vehicle price category"
[1] "Job satisfaction"
<labelled<double>[6400]>: Job satisfaction
   [1] 5 4 3 1 2 2 2 1 5 4 3 5 2 1 4 5 1 4 3 5 3 3 3 3 4 3 1 5 5 4 1 4 1 2 5 1 1
  [38] 3 4 3 3 2 5 2 3 1 2 3 2 3 2 5 4 2 2 1 3 3 3 3 5 3 2 2 1 1 4 3 4 2 5 5 4 5
  [75] 2 3 5 5 5 3 3 4 4 2 2 5 4 3 4 4 5 4 1 3 5 1 4 2 2 2 3 3 4 3 3 2 5 4 1 4 4
 [112] 2 2 2 1 4 1 1 1 5 3 2 5 1 4 3 2 2 1 3 2 1 1 2 1 3 1 3 4 1 4 3 1 5 1 4 2 2
 [149] 2 5 2 2 2 4 4 5 2 2 1 2 3 4 3 1 2 3 4 2 2 2 2 1 3 4 4 1 4 4 4 2 1 4 3 2 2
 [186] 3 4 5 5 5 3 2 1 5 5 2 1 3 5 3 3 5 3 4 2 4 2 1 5 2 5 3 5 4 5 3 4 2 4 4 1 1
 [223] 2 2 4 1 1 5 1 3 2 1 4 5 1 3 4 4 5 5 5 5 4 2 2 4 1 4 5 5 5 4 2 5 5 1 4 5 1
 [260] 5 2 4 5 4 1 3 5 5 2 3 5 1 4 5 5 4 3 4 2 4 3 3 4 3 1 2 5 2 2 3 2 5 4 2 4 3
 [297] 2 2 1 1 4 1 2 4 5 5 3 1 2 4 5 4 4 4 4 5 3 4 1 5 1 4 4 3 4 1 1 4 5 5 2 3 3
 [334] 3 3 2 1 5 3 2 1 4 4 3 3 3 1 5 1 4 1 5 5 1 3 5 3 5 5 3 2 2 3 4 4 5 4 5 2 1
 [371] 5 5 2 3 5 2 3 3 3 3 3 4 2 1 5 4 4 3 2 1 3 5 4 3 2 4 4 4 1 4 4 1 1 1 1 5 3
 [408] 3 4 4 4 2 3 1 2 2 4 4 3 5 3 3 3 5 4 3 4 5 2 3 4 4 4 1 4 2 3 2 3 5 3 5 1 2
 [445] 1 5 4 1 5 5 2 5 3 4 4 5 3 4 2 4 4 4 1 3 3 5 2 3 2 3 4 3 3 4 2 2 1 3 5 4 3
 [482] 3 1 3 4 1 1 1 3 5 4 2 2 4 1 4 5 3 3 1 2 4 1 3 4 5 3 3 5 4 2 3 1 2 4 3 2 4
 [519] 5 3 4 5 1 3 4 2 1 5 2 1 4 2 1 5 4 3 4 2 5 4 1 3 5 4 1 2 3 2 4 2 2 4 2 4 3
 [556] 2 1 1 1 3 1 4 2 4 2 4 3 1 5 5 4 2 3 4 3 1 1 4 1 1 1 1 2 3 3 4 3 1 4 4 2 5
 [593] 4 2 5 4 5 4 2 2 3 4 4 4 3 3 4 5 5 2 3 1 5 5 4 3 2 1 3 5 5 3 3 1 3 4 3 3 2
 [630] 3 1 5 5 4 3 1 4 2 2 2 3 1 5 3 5 1 2 4 2 3 2 2 2 4 2 3 2 2 5 2 4 3 2 4 2 4
 [667] 4 1 2 3 1 5 1 4 5 3 3 2 4 4 3 3 2 5 1 4 2 5 4 2 4 1 1 4 1 2 3 2 2 5 5 3 4
 [704] 5 4 5 1 5 3 4 2 2 3 3 1 2 1 3 3 3 2 2 2 3 2 4 2 5 1 5 2 4 1 4 4 4 3 2 5 4
 [741] 4 2 4 4 3 2 3 4 2 5 3 1 2 3 5 5 1 3 5 2 3 4 5 5 2 2 4 3 5 2 4 4 5 3 1 3 5
 [778] 1 5 5 2 2 5 4 2 2 2 3 1 3 5 3 1 3 4 2 5 3 5 5 3 2 3 4 1 5 5 1 4 1 2 2 1 2
 [815] 2 5 2 3 5 2 3 2 4 2 1 3 3 2 5 3 5 3 1 3 5 4 1 3 1 5 4 2 4 3 1 1 1 4 5 3 3
 [852] 1 5 2 2 4 4 1 1 2 5 4 1 4 4 2 5 1 2 4 3 3 2 1 3 5 4 2 3 3 3 2 3 3 1 4 1 3
 [889] 2 3 1 3 2 1 3 5 4 3 3 1 5 3 4 4 2 2 3 5 3 5 3 5 3 4 4 3 3 3 1 1 1 4 3 5 4
 [926] 2 2 3 2 5 3 1 3 5 3 2 4 1 3 5 5 1 5 5 2 4 5 4 4 2 3 3 2 1 3 5 3 2 5 4 2 2
 [963] 5 3 4 3 1 1 1 1 5 4 5 5 2 5 5 2 3 4 4 4 4 4 3 1 2 5 4 4 3 4 1 4 4 5 3 3 1
[1000] 4 1 4 1 2 5 5 4 3 5 3 4 4 1 4 1 5 3 2 4 5 5 5 1 2 5 5 4 5 4 3 4 3 4 5 4 1
[1037] 5 4 4 2 2 3 1 1 2 4 3 3 5 4 4 2 2 3 2 3 2 3 5 3 5 1 2 4 1 1 3 2 5 3 3 4 5
[1074] 2 2 5 3 4 3 5 3 1 5 3 4 2 1 5 2 3 3 4 2 2 3 3 3 1 2 5 5 3 3 2 4 5 4 5 4 5
[1111] 1 4 1 2 4 3 2 5 1 3 1 1 2 3 5 3 3 1 3 1 5 3 4 2 1 3 1 2 2 5 5 4 5 1 1 4 5
[1148] 4 3 4 2 3 4 5 1 2 3 2 5 5 2 2 1 3 3 2 3 2 5 3 1 1 3 2 3 4 2 1 2 4 2 3 4 5
[1185] 3 1 4 4 5 4 4 1 5 5 5 4 2 2 5 5 2 1 2 5 2 1 4 2 4 1 5 3 1 1 2 1 2 4 4 4 1
[1222] 2 3 4 3 5 5 1 1 5 4 5 4 1 3 4 2 2 2 3 4 4 1 4 2 3 5 5 3 4 4 1 3 1 3 5 4 4
[1259] 2 1 4 5 5 2 5 2 5 4 3 1 4 5 5 4 1 4 5 5 3 1 3 4 4 2 5 3 4 5 2 2 2 4 3 2 1
[1296] 3 1 4 4 2 4 1 3 2 2 4 3 1 2 1 5 3 4 5 5 4 2 4 2 1 5 4 5 4 4 5 3 4 1 4 4 3
[1333] 3 3 4 5 5 1 3 3 4 4 3 4 1 3 1 4 4 1 1 2 5 4 2 2 2 5 2 2 3 1 3 3 2 2 4 5 2
[1370] 4 1 2 2 2 1 3 5 4 3 3 2 3 3 5 4 3 1 3 1 4 4 2 3 3 3 4 5 4 1 3 3 2 5 5 1 1
[1407] 2 2 1 2 5 3 4 4 3 2 1 4 2 2 5 4 3 3 4 1 1 5 2 1 4 2 4 1 5 5 2 4 1 1 2 3 2
[1444] 4 3 4 5 4 4 5 1 5 5 5 1 1 5 5 5 4 2 5 1 3 3 5 1 1 1 5 2 4 1 2 5 4 2 1 4 3
[1481] 1 3 5 3 2 3 5 2 1 3 2 4 4 1 5 1 3 2 5 5 5 4 4 5 2 5 4 3 1 1 1 5 3 5 2 5 2
[1518] 3 5 2 2 5 2 2 5 4 4 1 1 2 5 4 1 2 1 4 2 1 3 2 2 5 3 5 1 4 3 1 5 1 4 3 5 2
[1555] 3 1 4 2 4 4 2 2 2 5 2 1 1 2 1 4 4 2 3 5 1 4 3 3 3 5 2 3 1 5 4 5 5 2 5 2 2
[1592] 2 2 5 3 4 2 5 1 2 3 1 3 5 3 4 4 5 1 3 5 4 2 5 1 4 1 2 5 4 3 3 3 5 3 2 2 1
[1629] 1 3 4 2 4 3 2 5 4 1 1 4 4 5 5 3 1 5 4 4 2 2 3 5 4 2 2 3 5 5 2 2 1 1 2 4 2
[1666] 5 5 4 1 2 4 3 5 4 1 1 1 5 2 2 1 4 3 4 3 5 1 5 5 1 3 4 5 5 2 3 5 4 1 3 3 3
[1703] 2 2 3 5 3 3 1 1 2 5 1 5 3 2 2 3 2 5 2 3 4 1 2 4 4 4 1 5 2 5 2 2 5 1 5 1 2
[1740] 1 4 4 4 4 2 3 1 5 5 3 5 3 3 1 5 3 5 1 5 1 4 1 4 3 1 4 5 2 5 2 5 3 3 3 2 3
[1777] 2 2 1 4 4 1 4 2 4 4 1 3 4 2 3 3 3 1 1 5 3 2 3 4 1 2 3 3 5 4 2 5 1 4 2 5 2
[1814] 5 4 2 3 5 1 1 4 1 3 4 5 2 5 5 1 2 3 3 5 5 5 2 2 2 5 1 2 2 5 5 3 4 4 3 3 5
[1851] 1 5 3 2 1 2 4 2 4 2 2 1 1 5 5 1 2 4 2 2 2 1 1 1 5 5 5 1 2 1 2 5 3 5 3 5 3
[1888] 5 3 3 3 5 4 4 1 2 2 4 1 3 3 1 4 4 1 5 5 2 5 4 4 3 1 4 5 1 4 5 3 4 1 1 3 2
[1925] 5 4 5 1 1 2 4 1 1 5 4 3 5 2 4 3 5 3 4 4 3 1 5 4 1 3 2 5 4 4 4 2 3 1 4 3 1
[1962] 2 1 5 3 3 5 1 5 2 2 3 3 5 4 3 3 4 1 4 2 3 1 3 3 3 2 3 3 2 2 2 2 4 3 4 2 3
[1999] 4 2 3 2 1 5 3 1 4 4 3 4 3 5 4 4 1 1 2 2 4 3 5 3 2 4 1 4 2 2 1 5 2 5 5 4 3
[2036] 2 1 5 4 1 3 5 4 3 1 1 2 5 5 4 4 2 3 5 4 4 3 4 3 5 3 5 1 5 1 3 3 4 2 4 5 5
[2073] 5 1 5 5 1 3 3 4 2 5 4 3 1 5 2 4 3 3 5 2 3 3 1 5 4 5 3 5 4 4 3 3 5 1 1 3 3
[2110] 4 2 2 5 5 3 5 3 3 5 3 5 4 5 4 3 2 2 5 2 3 1 4 5 2 2 5 4 5 5 4 3 1 3 4 2 2
[2147] 4 2 4 1 3 4 3 2 3 5 3 2 3 4 2 2 5 2 3 2 4 2 4 2 2 2 5 4 5 5 1 4 3 5 4 1 4
[2184] 4 5 2 2 4 1 3 4 1 3 3 3 1 3 2 3 5 1 1 5 1 1 4 2 5 1 3 3 1 3 4 4 3 5 5 3 5
[2221] 4 2 2 1 5 1 3 2 2 1 5 1 1 1 3 1 1 5 5 5 2 1 2 3 4 1 3 2 2 4 3 3 4 5 5 5 1
[2258] 3 2 1 5 5 1 4 5 5 1 1 1 3 4 1 4 2 3 4 4 5 5 5 2 5 5 4 1 4 1 2 3 4 3 2 1 1
[2295] 3 2 5 3 1 4 2 5 4 5 2 1 1 5 5 5 5 5 1 5 1 2 2 1 4 3 5 1 2 3 1 5 5 1 1 5 5
[2332] 5 4 3 1 3 2 2 3 4 3 4 3 3 4 3 5 4 2 4 3 2 1 4 5 2 2 3 2 3 3 4 3 5 2 4 3 5
[2369] 3 4 4 4 1 5 4 3 5 3 4 2 5 4 1 5 4 3 3 2 5 2 3 4 3 4 4 2 4 5 3 2 2 1 3 4 1
[2406] 4 2 5 1 2 1 4 5 4 5 3 2 2 1 4 3 1 5 4 5 4 5 5 3 3 3 5 1 5 2 3 3 3 3 2 5 4
[2443] 4 4 3 2 3 1 5 3 1 5 5 5 1 1 4 1 4 4 1 1 1 4 1 3 5 4 2 5 4 3 2 3 4 2 2 1 1
[2480] 3 3 1 4 5 3 5 5 5 4 5 2 2 4 2 3 2 1 5 3 4 2 2 5 2 1 1 4 3 5 3 4 3 3 4 1 4
[2517] 2 3 2 4 4 2 1 2 5 3 2 1 1 2 3 2 3 1 5 3 2 3 2 4 1 3 3 5 3 2 1 1 1 3 2 4 2
[2554] 3 5 3 3 1 5 2 3 4 3 3 1 1 4 2 3 1 4 1 3 4 4 1 5 4 4 1 4 4 1 2 1 1 4 2 3 5
[2591] 1 5 4 3 2 5 3 2 4 4 5 4 1 3 1 1 2 3 1 5 4 5 4 5 4 2 3 4 3 5 4 4 2 2 3 1 5
[2628] 5 2 2 3 5 3 1 2 5 2 4 1 1 3 2 3 5 2 3 2 1 4 3 1 4 3 1 3 4 4 4 3 3 4 2 3 5
[2665] 3 5 5 2 2 5 4 2 3 4 1 3 4 3 1 3 1 5 4 4 2 1 5 2 4 3 2 1 5 5 5 3 1 2 4 2 3
[2702] 2 2 5 1 3 5 4 2 1 3 3 4 3 4 4 1 4 2 1 5 1 4 1 4 5 2 5 2 4 2 1 4 2 1 4 2 1
[2739] 3 5 3 3 2 2 5 5 4 1 1 3 4 3 5 1 5 3 2 5 1 4 1 4 1 1 5 3 5 2 5 4 2 1 2 1 1
[2776] 2 4 5 2 3 2 1 1 4 4 5 1 5 3 5 5 3 3 4 1 4 3 2 2 4 1 4 3 3 2 3 4 2 2 2 3 5
[2813] 1 3 5 4 1 3 2 5 4 5 4 3 3 1 5 1 2 3 1 3 4 1 3 3 4 2 4 5 4 4 3 4 2 2 5 4 2
[2850] 2 3 3 3 2 3 3 5 3 2 4 2 2 2 3 2 3 3 3 3 1 5 3 1 2 4 1 1 1 2 5 1 5 4 2 1 3
[2887] 3 3 1 4 4 2 3 3 3 4 5 2 5 2 3 4 3 2 5 3 1 1 3 5 5 3 3 3 4 4 4 5 5 2 1 5 5
[2924] 1 1 3 4 1 4 2 5 5 4 2 4 3 4 2 1 2 4 3 2 1 2 1 4 3 5 1 4 4 3 1 2 5 1 2 4 5
[2961] 5 3 2 1 4 2 5 3 1 2 3 4 1 1 4 4 4 2 2 3 3 5 2 2 2 1 5 3 3 3 2 4 5 4 2 2 5
[2998] 4 1 1 2 5 4 3 3 4 5 3 1 2 4 3 4 5 5 3 5 2 4 2 3 2 5 4 1 4 4 3 1 5 1 1 5 3
[3035] 5 2 3 3 2 5 5 4 2 4 3 1 1 1 3 5 2 3 3 1 5 2 4 1 3 1 1 1 4 1 2 2 3 1 3 1 5
[3072] 1 5 2 4 4 3 1 3 2 5 2 5 4 3 3 2 5 1 5 4 1 1 1 5 4 3 1 3 3 3 5 3 2 3 5 4 1
[3109] 1 4 3 1 3 4 3 5 5 5 2 5 4 4 4 3 2 3 4 1 4 5 3 2 5 2 4 5 2 1 1 3 2 4 2 2 4
[3146] 3 4 1 5 5 5 4 1 4 2 4 2 1 4 5 5 2 2 2 3 4 5 3 2 1 4 5 5 3 3 4 3 3 2 1 3 2
[3183] 4 2 4 3 4 2 2 3 3 5 4 3 4 3 5 4 3 1 4 2 2 3 5 5 1 2 3 4 1 4 4 2 3 5 1 1 3
[3220] 3 4 4 5 2 3 1 5 1 4 2 5 1 4 1 2 1 2 5 1 2 3 4 3 1 4 3 3 3 4 3 3 4 1 3 1 4
[3257] 2 2 2 4 4 5 1 4 1 2 4 2 2 3 5 3 3 4 3 5 2 2 4 5 1 5 1 1 5 1 2 4 4 4 5 5 1
[3294] 1 3 3 4 1 4 4 2 1 3 5 2 5 2 2 5 2 3 2 4 1 3 2 4 3 2 4 2 2 1 3 2 5 2 2 2 4
[3331] 3 2 1 4 3 2 2 3 3 3 2 3 4 1 2 4 1 2 2 4 3 2 2 2 4 4 4 2 5 3 4 2 3 4 2 1 3
[3368] 2 4 3 3 4 4 5 4 1 3 1 3 5 2 4 4 5 2 3 5 3 3 5 1 4 3 1 4 3 4 5 3 1 1 2 5 5
[3405] 5 5 1 1 3 2 3 1 3 2 4 4 4 3 1 5 4 1 1 3 3 1 4 4 2 3 3 2 4 2 3 5 5 4 4 4 4
[3442] 3 2 2 2 2 5 3 5 1 5 2 5 4 3 1 1 3 4 3 1 5 5 4 2 4 1 2 1 3 5 2 4 5 3 1 2 4
[3479] 5 4 4 4 4 4 5 3 5 1 4 4 5 4 1 4 4 4 3 5 1 1 2 3 1 1 5 5 5 2 5 1 5 5 2 1 5
[3516] 1 4 3 4 4 4 2 5 4 1 3 1 5 3 1 3 3 2 5 4 4 1 4 3 3 3 3 1 1 3 3 5 2 4 5 4 4
[3553] 2 5 1 3 2 4 2 4 3 4 1 2 5 5 3 4 4 1 1 4 4 1 3 4 1 2 3 5 2 1 5 4 2 4 3 2 1
[3590] 4 2 3 3 5 3 2 5 1 5 5 2 4 3 3 1 5 5 4 2 1 4 1 3 4 4 2 4 5 1 4 2 5 5 4 2 3
[3627] 5 1 3 2 2 4 1 3 3 4 2 1 5 3 2 2 3 3 4 3 1 4 5 5 3 4 1 4 1 3 1 2 2 5 4 4 3
[3664] 4 4 5 4 5 4 5 4 4 5 1 5 5 5 4 3 4 2 1 1 5 3 1 2 5 5 4 1 1 1 1 1 5 2 3 4 2
[3701] 4 1 1 3 5 2 5 4 2 2 4 5 5 5 4 3 5 4 1 3 4 2 5 2 1 5 4 5 5 3 3 5 1 3 1 2 2
[3738] 1 3 1 2 5 3 3 4 2 1 5 4 5 2 3 3 1 4 1 4 4 3 2 5 2 4 2 4 5 2 1 5 2 5 4 4 2
[3775] 2 3 1 2 5 5 4 3 3 4 2 3 1 5 1 4 3 4 5 4 1 3 4 2 5 1 3 3 4 2 4 5 2 3 4 5 1
[3812] 2 3 3 3 1 4 3 5 3 4 3 1 4 1 1 3 2 4 3 5 1 3 2 4 5 2 2 3 2 2 3 3 3 2 4 1 5
[3849] 2 2 3 3 4 4 1 4 1 4 4 1 1 1 4 2 3 4 3 4 2 4 1 1 3 1 4 1 2 5 4 2 2 4 3 4 1
[3886] 2 5 1 1 1 5 4 4 1 4 4 5 5 5 3 5 5 2 4 2 1 5 3 4 5 4 1 2 2 2 3 2 3 5 3 2 5
[3923] 3 2 4 5 4 4 4 3 4 5 5 5 3 3 5 2 1 2 3 5 5 4 2 3 3 2 1 1 1 5 1 1 4 3 2 2 1
[3960] 2 2 1 3 3 2 3 4 4 3 4 3 1 2 4 5 3 4 4 1 3 1 1 5 4 5 4 4 2 1 1 3 5 4 2 5 1
[3997] 3 4 5 3 2 5 5 3 4 4 4 3 5 4 1 2 2 5 3 4 1 3 3 4 5 5 4 1 3 4 3 4 3 1 5 1 3
[4034] 5 2 2 2 5 2 4 2 1 1 5 5 3 5 5 1 3 5 4 5 4 5 2 5 1 4 1 1 1 2 5 1 4 1 3 3 5
[4071] 1 3 1 1 5 5 1 4 4 4 4 2 2 1 3 5 2 4 3 3 2 1 1 4 3 4 3 5 2 3 1 5 3 4 2 5 3
[4108] 5 3 2 4 1 3 3 5 1 2 5 4 3 1 2 3 5 5 3 3 3 3 5 2 2 2 4 2 3 2 1 3 3 4 5 4 4
[4145] 3 2 5 2 2 3 3 2 4 5 2 2 2 4 4 4 4 2 4 2 3 1 3 1 1 3 4 5 5 3 4 2 2 1 3 2 3
[4182] 2 4 5 3 1 3 4 4 4 4 4 2 5 1 1 4 2 3 4 2 4 5 3 2 4 4 5 4 5 4 2 2 2 4 3 5 5
[4219] 3 1 1 5 3 3 5 4 3 3 4 1 2 4 1 3 2 3 2 5 1 4 2 3 4 1 4 1 5 4 2 3 2 5 4 5 4
[4256] 1 1 2 1 2 1 4 5 1 5 5 4 5 5 3 5 1 3 1 3 5 4 4 1 5 3 2 3 1 2 2 3 4 1 5 5 2
[4293] 3 2 3 3 5 4 4 2 2 3 1 3 4 3 1 3 1 5 5 1 3 4 2 4 5 4 3 5 3 3 2 2 2 1 5 4 1
[4330] 2 1 4 1 4 4 5 5 1 1 4 2 2 3 3 2 4 5 1 2 1 2 5 3 4 1 3 2 4 1 3 5 1 4 5 5 1
[4367] 3 2 1 1 3 5 5 4 2 3 2 3 4 1 3 4 5 1 3 5 2 4 5 4 2 1 3 5 3 2 3 4 1 5 2 2 2
[4404] 3 5 5 2 1 4 2 4 4 5 5 4 2 3 1 5 4 3 5 3 2 1 4 1 4 4 3 2 1 5 4 1 4 5 4 5 5
[4441] 2 5 3 4 3 3 1 1 5 1 2 2 4 4 2 1 1 3 4 1 4 5 4 1 3 5 3 4 5 1 4 1 2 3 4 2 3
[4478] 2 5 5 3 5 4 1 4 2 3 5 1 4 1 4 3 1 5 2 4 3 1 2 2 3 1 3 3 5 2 1 3 2 5 4 5 3
[4515] 2 2 4 5 5 3 5 2 2 2 5 1 1 4 3 5 2 4 4 5 5 4 5 2 1 3 4 5 1 3 3 4 2 2 3 5 3
[4552] 2 2 3 5 1 4 5 3 3 1 5 5 4 2 2 3 3 2 5 5 1 2 5 4 4 3 2 1 2 5 2 4 2 2 1 2 3
[4589] 5 4 3 3 1 4 4 1 4 2 4 5 2 5 3 5 1 5 3 1 2 1 2 3 3 2 4 4 3 4 4 2 1 5 4 3 2
[4626] 4 5 5 1 5 1 4 1 2 3 4 3 1 4 4 5 4 4 3 2 3 3 5 2 3 4 5 5 4 1 3 3 2 3 1 1 4
[4663] 2 3 1 4 4 1 4 5 5 5 4 3 3 4 3 4 4 3 5 5 3 4 3 1 4 2 5 1 4 4 4 1 4 2 1 5 1
[4700] 2 5 4 4 4 4 3 5 3 5 4 2 5 5 3 4 4 5 5 5 3 2 4 3 2 5 3 3 3 5 3 4 2 3 2 3 4
[4737] 3 5 4 3 3 2 3 4 1 4 3 3 4 5 4 5 4 3 1 2 5 2 2 1 2 3 2 5 5 3 4 1 3 5 5 4 5
[4774] 4 3 5 3 4 3 1 4 2 3 5 5 2 2 4 3 3 4 2 5 3 2 4 3 3 4 1 4 1 3 5 5 5 4 2 4 1
[4811] 5 2 2 3 3 4 4 5 5 1 3 3 5 1 1 3 2 5 1 1 2 5 4 2 2 2 4 2 2 5 5 4 5 4 2 3 4
[4848] 2 3 1 4 2 1 1 1 2 3 2 3 3 5 3 1 3 5 4 3 3 5 3 2 2 1 2 3 2 4 1 1 3 1 5 1 2
[4885] 4 5 4 2 1 4 2 4 3 2 4 3 2 5 4 5 4 3 4 5 1 3 2 5 4 4 2 3 4 2 5 3 4 3 4 2 1
[4922] 1 4 4 2 3 4 3 5 5 2 2 4 4 4 5 4 1 4 3 1 4 1 3 1 4 3 5 4 5 4 4 1 3 1 3 3 4
[4959] 5 4 3 5 4 3 4 4 1 2 1 3 2 1 2 1 4 1 3 2 5 3 2 4 5 3 3 2 2 2 2 4 4 4 1 2 5
[4996] 4 4 1 2 5 1 2 3 2 3 1 5 5 2 3 1 4 3 2 2 3 1 2 4 4 4 5 5 4 1 1 1 3 2 4 4 3
[5033] 4 4 3 5 5 1 3 3 5 2 5 3 1 4 1 1 2 3 4 2 4 3 4 5 3 2 4 5 2 3 5 5 2 5 3 5 4
[5070] 4 3 3 2 2 4 1 2 4 5 3 4 4 4 1 4 3 1 4 3 5 1 2 2 2 3 2 2 4 1 3 5 3 4 4 3 2
[5107] 3 1 2 1 1 1 3 3 2 3 5 3 4 2 4 4 1 3 5 5 1 2 1 1 2 2 4 2 4 2 1 2 2 4 2 2 3
[5144] 4 2 4 2 1 1 3 1 5 3 1 3 2 5 3 4 3 5 4 5 2 2 4 3 4 2 1 2 2 5 5 5 4 4 4 2 4
[5181] 3 2 2 5 5 5 3 3 2 4 5 2 3 4 3 5 4 2 2 4 5 1 3 4 1 2 2 4 2 5 4 3 4 1 3 5 2
[5218] 3 2 5 4 5 1 1 2 5 4 3 2 2 1 1 5 1 1 3 5 4 5 3 3 3 3 4 5 3 5 4 1 5 5 2 1 1
[5255] 1 5 4 2 2 3 4 3 3 5 2 5 4 5 5 4 4 2 5 2 3 4 1 4 3 1 5 2 5 5 2 4 1 2 1 2 1
[5292] 1 3 5 2 3 2 1 5 4 4 2 1 4 2 3 1 1 5 3 2 1 5 3 3 3 3 1 5 4 3 1 5 5 5 5 1 5
[5329] 5 4 2 3 3 5 4 5 5 3 2 1 3 2 2 2 3 2 4 3 3 2 2 3 4 3 4 4 5 1 2 3 5 3 5 3 3
[5366] 5 1 3 1 5 4 2 5 3 2 4 4 4 2 4 2 5 1 4 2 5 3 2 5 5 5 4 5 1 2 2 2 4 3 1 3 5
[5403] 5 4 2 1 1 4 1 3 2 5 2 1 4 1 2 2 2 2 3 5 2 5 3 1 2 4 1 3 5 4 5 5 2 3 3 4 4
[5440] 1 4 5 4 1 5 5 4 3 4 5 5 3 2 1 2 3 3 2 1 5 2 5 3 3 5 3 3 3 4 4 5 4 4 2 2 5
[5477] 4 5 2 2 5 3 1 1 2 2 3 4 2 4 2 1 3 5 5 5 1 2 4 1 4 1 1 3 1 2 3 3 1 1 4 4 1
[5514] 3 2 2 3 5 5 3 5 3 5 1 3 4 2 3 3 4 5 4 5 1 4 3 1 3 3 3 5 5 2 4 1 5 5 5 3 1
[5551] 5 5 2 2 4 4 5 2 5 1 3 3 4 2 5 1 4 5 4 1 3 3 5 5 5 3 3 5 1 1 3 5 1 5 1 3 4
[5588] 5 4 1 4 3 4 5 4 3 4 1 2 1 3 2 1 2 1 2 4 4 1 1 4 5 1 4 5 3 4 2 2 4 3 5 2 3
[5625] 1 3 3 5 2 4 5 3 2 1 4 3 3 3 3 4 3 3 1 1 4 1 1 2 5 3 4 5 4 1 2 4 2 4 1 4 1
[5662] 4 3 2 5 2 5 1 4 2 3 3 1 2 5 5 2 3 3 1 2 4 3 2 2 4 5 4 2 3 2 2 4 2 3 4 5 3
[5699] 4 1 3 5 4 5 4 2 4 2 2 1 4 5 3 4 4 4 5 2 4 2 2 3 1 2 3 2 4 1 2 1 4 2 5 2 2
[5736] 1 3 3 3 5 3 4 4 1 4 3 4 2 3 3 2 1 1 1 1 2 4 3 3 3 5 4 1 3 3 3 1 1 5 4 2 1
[5773] 1 4 1 5 4 3 3 4 3 3 1 3 3 2 5 2 4 3 5 1 4 3 3 5 4 4 2 4 4 4 1 4 5 5 5 2 2
[5810] 3 5 1 5 3 3 3 1 4 3 2 4 3 2 4 3 5 3 4 5 5 1 4 5 5 3 3 3 4 2 3 1 2 2 3 2 3
[5847] 3 4 3 1 1 2 3 4 5 4 2 5 4 3 2 4 3 2 1 4 3 2 5 1 5 4 1 5 4 4 2 4 1 4 3 1 4
[5884] 3 3 4 1 3 2 1 5 2 4 4 3 3 5 2 3 3 3 2 5 3 3 5 2 1 5 4 3 2 2 2 5 1 5 4 2 3
[5921] 3 1 1 3 4 3 2 3 2 4 3 5 2 3 1 2 4 3 2 4 2 3 1 5 3 2 4 5 5 4 2 5 4 2 4 5 2
[5958] 5 2 3 4 1 3 1 2 1 4 3 1 2 5 4 3 5 3 5 4 2 2 4 1 4 1 1 5 5 2 3 3 4 3 2 4 5
[5995] 1 2 2 3 2 4 3 2 3 5 4 2 3 4 4 1 3 5 3 4 5 2 3 4 1 4 2 5 4 3 4 3 3 4 5 5 2
[6032] 3 4 5 3 5 5 3 2 4 5 1 4 2 3 5 3 4 1 1 5 5 4 5 4 3 2 3 1 5 5 4 2 5 5 1 1 1
[6069] 4 1 3 3 5 5 2 1 1 1 4 4 4 5 1 4 4 3 4 3 4 5 5 2 4 3 4 1 2 1 3 3 2 3 4 3 1
[6106] 1 5 2 2 3 2 2 5 3 2 1 3 5 5 2 4 4 2 4 5 4 2 2 3 2 4 1 5 3 5 2 2 2 3 4 3 1
[6143] 5 4 4 2 5 4 4 4 4 5 5 2 5 3 1 4 2 2 4 3 3 1 1 4 3 2 1 4 5 3 4 5 5 5 3 3 4
[6180] 1 5 2 4 4 5 2 1 1 2 2 4 1 2 4 3 5 5 3 3 4 5 4 1 1 4 4 2 1 1 2 5 1 5 5 2 1
[6217] 4 3 2 4 3 2 3 3 2 2 5 3 2 1 1 4 4 3 4 1 4 5 4 2 3 1 4 4 3 1 1 1 1 3 5 3 1
[6254] 2 1 5 4 3 4 3 2 2 4 3 1 5 1 4 4 1 3 4 4 3 2 4 1 4 1 5 1 4 4 5 3 4 3 5 3 4
[6291] 3 2 2 2 3 2 2 4 3 4 1 5 5 2 5 5 3 5 4 4 3 2 2 4 2 3 2 1 2 2 3 1 2 2 5 3 2
[6328] 3 2 3 2 4 5 3 1 5 2 5 4 2 4 1 3 3 3 4 1 2 5 1 4 4 5 1 3 3 4 1 5 4 2 4 4 3
[6365] 4 4 5 3 1 1 5 3 2 4 5 2 2 2 4 1 5 4 2 3 5 4 3 5 1 4 2 4 3 2 5 1 1 2 5 3

Labels:
 value                 label
     1   Highly dissatisfied
     2 Somewhat dissatisfied
     3               Neutral
     4    Somewhat satisfied
     5      Highly satisfied

11.8.2 Efficient way

  • To use the rich information stored in value labels, convert the variable with haven::as_factor().
```{r old}
theme_set(theme_minimal())

data %>%
  ggplot(aes(haven::as_factor(marital))) +
  geom_bar(fill='blue')

# get label of the variable to use it as a subtitle of the chart.
var_label(data$marital)

data %>%
  ggplot(aes(x=haven::as_factor(marital), y = after_stat(prop), group = 1)) +
  geom_bar(fill = 'blue') +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Marital Status in the demo data",
       subtitle = "The number of married participants is about the same as \nthe number of unmarried participants",
       x = "Marital Status",
       y = "")
       

var_label(data$inccat)

data %>%
  ggplot() +
  geom_bar(aes(x= haven::as_factor(inccat), y = after_stat(prop), group = 1), fill = 'purple') +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Income categories in the demo data",
       subtitle = "Distribution of income",
       x = "Income category in thousands",
       y = "",
       caption = ""
         )

var_label(data$carcat)

data %>%
  ggplot() +
  geom_bar(aes(x= haven::as_factor(carcat), 
               y = after_stat(prop), group = 1), fill = 'orange') +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Primary vehicle price category in the demo data",
       subtitle = "",
       x = "Primary vehicle price category",
       y = "",
       caption = ""
         )

var_label(data$jobsat)
val_labels(data$jobsat)

data %>%
  ggplot() +
  geom_bar(aes(x= haven::as_factor(jobsat), 
               y = after_stat(prop), group = 1), fill = '#D35400') +
  scale_y_continuous(labels = scales::percent) +
  coord_flip() +
  labs(title = "Job satisfaction in the demo data",
       #subtitle = "",
       x = "Job satisfaction",
       y = "",
       caption = ""
         )
```
[1] "Marital status"
[1] "Income category in thousands"
[1] "Primary vehicle price category"
[1] "Job satisfaction"
<labelled<double>[6400]>: Job satisfaction
   [1] 5 4 3 1 2 2 2 1 5 4 3 5 2 1 4 5 1 4 3 5 3 3 3 3 4 3 1 5 5 4 1 4 1 2 5 1 1
  [38] 3 4 3 3 2 5 2 3 1 2 3 2 3 2 5 4 2 2 1 3 3 3 3 5 3 2 2 1 1 4 3 4 2 5 5 4 5
  [75] 2 3 5 5 5 3 3 4 4 2 2 5 4 3 4 4 5 4 1 3 5 1 4 2 2 2 3 3 4 3 3 2 5 4 1 4 4
 [112] 2 2 2 1 4 1 1 1 5 3 2 5 1 4 3 2 2 1 3 2 1 1 2 1 3 1 3 4 1 4 3 1 5 1 4 2 2
 [149] 2 5 2 2 2 4 4 5 2 2 1 2 3 4 3 1 2 3 4 2 2 2 2 1 3 4 4 1 4 4 4 2 1 4 3 2 2
 [186] 3 4 5 5 5 3 2 1 5 5 2 1 3 5 3 3 5 3 4 2 4 2 1 5 2 5 3 5 4 5 3 4 2 4 4 1 1
 [223] 2 2 4 1 1 5 1 3 2 1 4 5 1 3 4 4 5 5 5 5 4 2 2 4 1 4 5 5 5 4 2 5 5 1 4 5 1
 [260] 5 2 4 5 4 1 3 5 5 2 3 5 1 4 5 5 4 3 4 2 4 3 3 4 3 1 2 5 2 2 3 2 5 4 2 4 3
 [297] 2 2 1 1 4 1 2 4 5 5 3 1 2 4 5 4 4 4 4 5 3 4 1 5 1 4 4 3 4 1 1 4 5 5 2 3 3
 [334] 3 3 2 1 5 3 2 1 4 4 3 3 3 1 5 1 4 1 5 5 1 3 5 3 5 5 3 2 2 3 4 4 5 4 5 2 1
 [371] ... (values 371 to 6400 omitted; they repeat the jobsat output printed above)

Labels:
 value                 label
     1   Highly dissatisfied
     2 Somewhat dissatisfied
     3               Neutral
     4    Somewhat satisfied
     5      Highly satisfied
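
The conversion used in the plots above can be seen in isolation on a small vector. The following is a minimal sketch, assuming a hand-made toy vector `sat` (hypothetical, not part of the demo data) built with haven::labelled(). It shows how as_factor() turns numeric codes into labelled factor levels, which is what lets geom_bar() display "Neutral" instead of 2.

```{r}
# Hypothetical toy vector (not from the demo data): codes 1-3 with value labels
sat <- haven::labelled(
  c(1, 3, 2, 3, 3),
  labels = c("Dissatisfied" = 1, "Neutral" = 2, "Satisfied" = 3),
  label = "Toy satisfaction item"
)

# Tabulating the raw codes shows only numbers
table(haven::zap_labels(sat))

# After as_factor(), the value labels become the factor levels, ordered by code
sat_fct <- haven::as_factor(sat)
levels(sat_fct)
table(sat_fct)
```

Because the levels follow the order of the value codes, an ordinal scale keeps its natural order on the axis without any manual releveling.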

11.9 Visualizing Dichotomous Data

11.9.1 Visualizing with Labels

  • Use as_factor() from the haven package to convert several labelled variables at once.
```{r}
# Converting multiple labelled variables to factors with labels
data %>% 
  select(starts_with("own")) %>% 
  mutate(across(where(haven::is.labelled), haven::as_factor)) %>% # from haven package
  head()
  
# Long data
owns_long_fct <- data %>% 
  select(starts_with("own")) %>% 
  mutate(across(where(haven::is.labelled), haven::as_factor)) %>% 
  pivot_longer(everything(), names_to = "owns", values_to = "measures") 

owns_long_fct

# Dodged barplot
owns_long_fct %>% 
  count(owns, measures)  %>% 
  ggplot(aes(x = n, y = measures, fill = owns)) +
  geom_col(position = "dodge") +
  coord_flip()+ 
  scale_x_continuous(labels = scales::comma) +
  labs(title = "Electronics Ownership Status",
       x = "# of participants who own the said products",
       y = "Ownership Status",
       fill = "Product Ownership")

# Facet_wrap barplot
owns_long_fct %>% 
  count(owns, measures) %>% 
  ggplot(aes(x = n, y = measures, fill = owns)) +
  geom_col(show.legend = FALSE)+
  facet_wrap(. ~ owns)+ 
  scale_x_continuous(labels = scales::comma) +
  labs(title = "Electronics Ownership Status",
       x = "# of participants who own the said products",
       y = "",
       fill = "Product Ownership")

# Facet_grid barplot
owns_long_fct %>% 
  count(owns, measures) %>% 
  ggplot(aes(x = n, y = measures, fill = owns)) +
  geom_col(show.legend = FALSE)+
  facet_grid(owns ~ .)+ 
  scale_x_continuous(labels = scales::comma) +
  labs(title = "Electronics Ownership Status",
       x = "# of participants who own the said products",
       y = "",
       fill = "Product Ownership")
```
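
The reshaping step is easier to follow on a tiny data set. The sketch below assumes a hypothetical tibble `own_toy` with two made-up 0/1 ownership variables (the names `owntv` and `ownpda` are used only for illustration). It applies the same convert, pivot, and count sequence as the chunk above.

```{r}
library(dplyr)
library(tidyr)

# Hypothetical toy data: two 0/1 ownership variables with value labels
own_toy <- tibble(
  owntv  = haven::labelled(c(1, 1, 0, 1), labels = c(No = 0, Yes = 1)),
  ownpda = haven::labelled(c(0, 0, 1, 0), labels = c(No = 0, Yes = 1))
)

# Convert every labelled column to a factor, then stack the columns
own_toy_long <- own_toy %>%
  mutate(across(where(haven::is.labelled), haven::as_factor)) %>%
  pivot_longer(everything(), names_to = "owns", values_to = "measures")

# One row per product and answer, ready for geom_col()
own_toy_counts <- own_toy_long %>%
  count(owns, measures)

own_toy_counts
```

Converting to factors before pivoting means the stacked `measures` column holds the readable answers ("No", "Yes") rather than the underlying codes.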

11.10 Visualizing Plots with Summary Stat

There are two ways to do so.

11.10.1 From wide data

  • Summarize first, then reshape the data.
```{r}
length(data)           # number of columns (variables), not rows
n <- length(data$age)  # number of participants; used below for the standard error
n

data %>% 
  # mean of a 0/1 variable is the proportion of owners
  summarize(across(starts_with("own"), 
                   .fns =  list(prop = mean, sd = sd),
                   .names = "{.col}_{.fn}")) %>% 
  # split names such as "owntv_prop" into set = "owntv" and a column named "prop"
  pivot_longer(everything(),
               names_to = c("set", ".value"),
               names_pattern = "(.+)_(.+)") %>% 
  mutate(set = fct_reorder(set, prop)) %>% 
  ggplot(aes(set, prop, fill = set)) +
  geom_col() +
  geom_text(aes(label = round(prop, 2))) +
  # approximate 95% confidence interval: proportion +/- 1.96 standard errors
  geom_errorbar(aes(x = set, 
                    ymin = prop-(1.96*sd/sqrt(n)), 
                    ymax = prop+(1.96*sd/sqrt(n))), 
                width = .5, color = "orange", linewidth = 1.0, alpha = 0.7) +
  scale_y_continuous(labels = scales::percent) + 
  coord_flip() +
  labs(x = "Electronics",
       y = "Proportions of those who own the said electronics",
       title = "Electronics Ownership Status",
       subtitle = "Percentages among participants") +
  guides(fill = "none")
```
[1] 29
[1] 6400
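
The error bars in the chart come from the usual normal-approximation interval, proportion ± 1.96 × sd / sqrt(n). As a minimal sketch of that arithmetic, assuming a made-up 0/1 vector `owns_toy` of ten participants (hypothetical, not the demo data):

```{r}
# Hypothetical 0/1 ownership indicator for ten participants (1 = owns the product)
owns_toy <- c(1, 0, 1, 1, 0, 1, 0, 1, 1, 0)

n_obs <- length(owns_toy)            # sample size
prop  <- mean(owns_toy)              # proportion of owners (mean of a 0/1 variable)
se    <- sd(owns_toy) / sqrt(n_obs)  # standard error of the mean

# Approximate 95% confidence interval: estimate +/- 1.96 standard errors
ci <- c(lower = prop - 1.96 * se, upper = prop + 1.96 * se)
round(c(prop = prop, ci), 3)
```

With only ten observations the interval is wide. With the 6,400 participants in the demo data, the same formula gives much narrower bars because the standard error shrinks with sqrt(n).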

12 Appendix

12.1 Common operations with labelled data

  • Reference: https://community.rstudio.com/t/leveraging-labelled-data-in-r-r-views-submission/114983

  • I primarily use three packages for working with labelled data: haven, labelled, and sjlabelled. These three packages do have some overlap in functionality, in addition to naming schemes that differ but achieve the same objective (e.g., haven::as_factor vs sjlabelled::as_label), or naming schemes that are the same but achieve different objectives (e.g., haven::as_factor vs sjlabelled::as_factor).

  • To compound confusion, the concept of a label can refer to either variable or value labels. Frequently, plural function names refer to value labels, as in haven::zap_labels or labelled::remove_val_labels.

Here are operations I commonly perform on labelled data:

  • Evaluate if variable is of class haven_labelled.
    • Why? Troubleshooting, exploring, mutating.
    • Function(s): haven::is.labelled()
  • Convert haven_labelled variable to numeric value codes.
    • Why? To treat the variable as continuous for analysis.
    • For example, if a 1-7 rating scale imports as labelled and you want to compute a mean.
    • Function(s):
      • base::as.numeric() (strips the variable of all metadata),
      • haven::zap_labels(), and
      • labelled::remove_val_labels() (both remove value labels but retain other metadata)
  • Convert haven_labelled variable to factor with value labels.
    • Why? To treat the variable as categorical for analysis.
    • Function(s):
      • haven::as_factor(),
      • labelled::to_factor(),
      • sjlabelled::as_label().
    • As far as I can tell, these three functions have the same result. By default, the factor levels are ordered by value codes.
  • Convert variable label to variable name.
    • Why? For more informative or readable variable names.
    • Function(s):
      • sjlabelled::label_to_colnames()
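
The operations listed above can be tried on a small made-up vector. This is a minimal sketch, assuming the haven and labelled packages are installed; the satisfaction variable and its labels are hypothetical and are not part of the course data.

```{r}
library(haven)
library(labelled)

# Hypothetical 1-3 rating item with value labels and a variable label
satisfaction <- haven::labelled(
  c(1, 3, 2, 3),
  labels = c(Low = 1, Medium = 2, High = 3),
  label = "Overall satisfaction"
)

# Check the class before converting
haven::is.labelled(satisfaction)

# Treat as continuous: drop the value labels, then compute a mean
codes <- haven::zap_labels(satisfaction)
mean(codes)

# Treat as categorical: factor whose levels are the value labels
sat_factor <- haven::as_factor(satisfaction)
levels(sat_factor)

# labelled::to_factor() is the counterpart from the labelled package
levels(labelled::to_factor(satisfaction))
```

Which conversion to use depends on whether the analysis treats the variable as continuous (numeric codes) or categorical (factor).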

12.2 Joseph Larmarange reference

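```{r practice}
library(labelled)
class(iris)
head(iris)

# Attach a variable label to a single column
var_label(iris$Sepal.Length) <- "Length of sepal"
head(iris)

# Label several columns at once with a named list
var_label(iris) <- list(Petal.Length = "Length of petal", Petal.Width = "Width of Petal")
var_label(iris$Petal.Length)
var_label(iris)

# Assigning NULL removes a variable label
var_label(iris$Sepal.Length) <- NULL
view(iris)

# Search variable names and labels
look_for(iris)
```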
[1] "data.frame"
[1] "Length of petal"
$Sepal.Length
[1] "Length of sepal"

$Sepal.Width
NULL

$Petal.Length
[1] "Length of petal"

$Petal.Width
[1] "Width of Petal"

$Species
NULL

12.3 Converting labelled data to unlabelled data

  • Comparison of various methods
```{r  eval=FALSE}
data_unlabelled <- data %>% 
  mutate(
    f1 = haven::as_factor(carcat),
    f2 = labelled::to_factor(carcat),
    f3 = sjlabelled::as_label(carcat),
    f4 = sjlabelled::as_factor(carcat),
    n1 = base::as.numeric(carcat),
    n2 = sjlabelled::as_numeric(carcat),
    n3 = haven::zap_labels(carcat),
    n4 = labelled::remove_val_labels(carcat)
  ) %>% 
  dplyr::select(age, f1, f2, f3, f4, n1, n2, n3, n4)

data_unlabelled
```

12.4 IMDB Lowest Rated Movies

12.4.1 Movie titles

```{r}
library(rvest)

url <- "https://www.imdb.com/chart/bottom?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=4da9d9a5-d299-43f2-9c53-"

# movie titles
movie <- 
  read_html(url) |> 
  html_elements("li") |> 
  html_elements(".ipc-metadata-list-summary-item__tc") |> 
  html_elements(".ipc-title__text") |> 
  html_text()

## alternatively,
read_html(url) |> 
  html_elements("li .ipc-metadata-list-summary-item__tc .ipc-title__text") |> 
  html_text()
```
 [1] "1. Disaster Movie"                  "2. Manos: The Hands of Fate"       
 [3] "3. Birdemic: Shock and Terror"      "4. Superbabies: Baby Geniuses 2"   
 [5] "5. Kirk Cameron's Saving Christmas" "6. The Hottie & the Nottie"        
 [7] "7. House of the Dead"               "8. Son of the Mask"                
 [9] "9. Radhe"                           "10. Epic Movie"                    
[11] "11. Pledge This!"                   "12. Battlefield Earth"             
[13] "13. Alone in the Dark"              "14. Dragonball Evolution"          
[15] "15. Race 3"                         "16. Foodfight!"                    
[17] "17. Going Overboard"                "18. From Justin to Kelly"          
[19] "19. Turks in Space"                 "20. Meet the Spartans"             
[21] "21. Gigli"                          "22. Daniel the Wizard"             
[23] "23. Date Movie"                     "24. Cats"                          
[25] "25. Baby Geniuses"                 

12.4.2 Ratings

```{r}
ratings <- read_html(url) |> 
  html_elements("li") |> 
  html_elements(".ipc-metadata-list-summary-item__tc") |> 
  html_elements(".ipc-rating-star--rating") |> 
  html_text()
```

12.4.2.1 Alternative approach

```{r}
# ratings
rate <- 
  read_html(url) |> 
  html_elements(".ipc-rating-star.ipc-rating-star--base.ipc-rating-star--imdb.ratingGroup--imdb-rating") |> 
  html_text()
class(rate)

#rate <- rate[!grepl("Rate", rate)]
#rating <- str_extract(rate, "\\d\\.\\d\\s(?=\\(\\d+K\\))") #1.5 (32K)

rating <- 
  tibble(rate) |> 
  mutate(rate = str_extract(rate, "\\d\\.\\d\\s(?=\\(\\d+K\\))")) # lookahead: match only when followed by a vote count such as "(32K)"

rating
```
[1] "character"

12.4.3 Vote counts

The vote counts are in elements with the class ipc-rating-star--voteCount.

```{r}
# vote counts
vote_counts <- 
  read_html(url) |> 
  html_elements("li") |> 
  html_elements(".ipc-rating-star--voteCount") |> 
  html_text(trim = TRUE) 
vote_counts

## Alternatively,
read_html(url) |> 
  html_elements("li .ipc-metadata-list-summary-item__tc .ipc-rating-star--voteCount") |> 
  html_text(trim = TRUE) 
```
 [1] "(96K)"  "(38K)"  "(26K)"  "(32K)"  "(17K)"  "(39K)"  "(39K)"  "(61K)" 
 [9] "(181K)" "(111K)" "(19K)"  "(84K)"  "(48K)"  "(83K)"  "(49K)"  "(12K)" 
[17] "(15K)"  "(27K)"  "(17K)"  "(113K)" "(51K)"  "(15K)"  "(63K)"  "(58K)" 
[25] "(28K)" 
 [1] "(96K)"  "(38K)"  "(26K)"  "(32K)"  "(17K)"  "(39K)"  "(39K)"  "(61K)" 
 [9] "(181K)" "(111K)" "(19K)"  "(84K)"  "(48K)"  "(83K)"  "(49K)"  "(12K)" 
[17] "(15K)"  "(27K)"  "(17K)"  "(113K)" "(51K)"  "(15K)"  "(63K)"  "(58K)" 
[25] "(28K)" 

12.4.4 Creating a data frame

```{r}
# create a data frame
worst_movies <-
  tibble(movie, ratings, vote_counts)
worst_movies

worst_movies <-
  worst_movies |> 
  mutate(vote_counts = str_extract(vote_counts, "\\d+K")) |> 
  mutate(movies = str_extract(movie, "(?<=\\d\\.\\s).+")) |> # lookbehind: keep the text preceded by "<digit>. "
  select(-movie) |> 
  relocate(movies, ratings) 

sorted_worst_movies <- 
  worst_movies |> 
  arrange(ratings) # sort by the ratings column; rate is not a column of worst_movies

sorted_worst_movies
```
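
The lookaround patterns used above can be tried offline on a few hard-coded strings. The sketch below is illustrative only: raw_movie, raw_rate, and raw_votes are hypothetical stand-ins for scraped text (the rating values are made up), and the conversion to numeric columns is an extra step that is not in the code above, added on the assumption that a numeric sort is wanted.

```{r}
library(stringr)
library(tibble)
library(dplyr)

# Hypothetical stand-ins for scraped strings; the ratings are made up
raw_movie <- c("1. Disaster Movie", "10. Epic Movie", "24. Cats")
raw_rate  <- c("2.1 (96K)", "2.4 (111K)", "2.8 (58K)")
raw_votes <- c("(96K)", "(111K)", "(58K)")

demo <- tibble(raw_movie, raw_rate, raw_votes) |>
  mutate(
    # Lookbehind: keep the text preceded by "<digit>. "
    movies = str_extract(raw_movie, "(?<=\\d\\.\\s).+"),
    # Lookahead: keep the rating only when followed by " (<digits>K)"
    ratings = as.numeric(str_extract(raw_rate, "\\d\\.\\d(?=\\s\\(\\d+K\\))")),
    # Turn "(96K)" into 96000
    votes = as.numeric(str_extract(raw_votes, "\\d+")) * 1000
  ) |>
  select(movies, ratings, votes) |>
  arrange(ratings)

demo
```

Placing the whitespace inside the lookahead, rather than in the matched part, avoids a trailing space in the extracted rating, so as.numeric() receives a clean string.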

12.5 Selecting descendants (children, children’s children, etc.)

```{r}
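# All links anywhere on the page
'https://en.wikipedia.org/wiki/Hyperlink' %>%
  read_html() %>%
  html_elements('a') %>%
  length()

# Only links that are descendants of the element with id "content"
'https://en.wikipedia.org/wiki/Hyperlink' %>%
  read_html() %>%
  html_elements('#content a') %>%
  length()

# Same result by chaining: select #content first, then the links inside it
'https://en.wikipedia.org/wiki/Hyperlink' %>%
  read_html() %>%
  html_element('#content') %>%
  html_elements('a') %>%
  length()
```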
```
[1] 517
[1] 454
[1] 454
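
The same comparison can be reproduced without a network connection on a small inline page. This is a minimal sketch using rvest::minimal_html(); the HTML snippet is hypothetical and only serves to show how a descendant selector differs from a direct-child selector.

```{r}
library(rvest)

# Hypothetical page: one link outside #content, two nested inside it
page <- minimal_html('
  <a href="/home">Home</a>
  <div id="content">
    <p><a href="/a">A</a></p>
    <ul><li><a href="/b">B</a></li></ul>
  </div>
')

# All links on the page
n_all <- page |> html_elements("a") |> length()

# Descendants of #content at any depth (space combinator)
n_desc <- page |> html_elements("#content a") |> length()

# Same selection by chaining html_element() and html_elements()
n_chained <- page |> html_element("#content") |> html_elements("a") |> length()

# Direct children only (> combinator): both links sit deeper, so none match
n_child <- page |> html_elements("#content > a") |> length()

c(all = n_all, descendants = n_desc, chained = n_chained, children = n_child)
```

The space combinator matches links at any depth inside #content, whereas > matches only immediate children.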

13 Further resources and references