```{r loading up packages}
library(tidyverse) # mega package containing 8 packages
library(gt)
library(gtsummary)
library(sjPlot)
```
Summary table with gtsummary
Preparation for MSDM CEP
1 Loading Packages
2 gtsummary package
3 Experimental Design:
Trial data
trial: data frame with 200 rows, one row per patient
trt
: Chemotherapy Treatment
age
: Age
marker
: Marker Level (ng/mL): the concentration of a specific protein or substance, measured in nanograms per milliliter (ng/mL), that can be detected in a blood sample and may indicate the presence or progression of cancer.
stage
: T Stage: The T stage in cancer describes the size and extent of the primary tumor. It is part of the TNM staging system, which is the most common way to stage cancer.
T stage | Meaning |
---|---|
T0 | No evidence of a tumor |
T1 | A small tumor |
T2 | A larger tumor that has grown into nearby tissue |
T3 | A larger tumor that has grown into nearby tissue |
T4 | A larger or more advanced tumor that has grown into nearby tissue |
TX | The tumor cannot be measured |
Tis | The tumor is still within the confines of the normal glands and cannot metastasize |
grade
: Grade: describes how abnormal the cancer cells look under a microscope compared to normal cells, with higher grades indicating more rapid growth and a greater likelihood of spread.
response
: Tumor Response
death
: Patient Died
ttdeath
: Months to Death/Censor
3.1 tbl_summary()
- Designed for Descriptive Statistics
```{r}
# data()
trial
class(trial)
glimpse(trial)
view_df(trial)
trial2 <- trial |>
  select(trt, age, grade, response)
trial2 |>
  tbl_summary()
```
# A tibble: 200 × 8
trt age marker stage grade response death ttdeath
<chr> <dbl> <dbl> <fct> <fct> <int> <int> <dbl>
1 Drug A 23 0.16 T1 II 0 0 24
2 Drug B 9 1.11 T2 I 1 0 24
3 Drug A 31 0.277 T1 II 0 0 24
4 Drug A NA 2.07 T3 III 1 1 17.6
5 Drug A 51 2.77 T4 III 1 1 16.4
6 Drug B 39 0.613 T4 I 0 1 15.6
7 Drug A 37 0.354 T1 II 0 0 24
8 Drug A 32 1.74 T1 I 0 1 18.4
9 Drug A 31 0.144 T1 II 0 0 24
10 Drug B 34 0.205 T3 I 0 1 10.5
# ℹ 190 more rows
[1] "tbl_df" "tbl" "data.frame"
Rows: 200
Columns: 8
$ trt <chr> "Drug A", "Drug B", "Drug A", "Drug A", "Drug A", "Drug B", "…
$ age <dbl> 23, 9, 31, NA, 51, 39, 37, 32, 31, 34, 42, 63, 54, 21, 48, 71…
$ marker <dbl> 0.160, 1.107, 0.277, 2.067, 2.767, 0.613, 0.354, 1.739, 0.144…
$ stage <fct> T1, T2, T1, T3, T4, T4, T1, T1, T1, T3, T1, T3, T4, T4, T1, T…
$ grade <fct> II, I, II, III, III, I, II, I, II, I, III, I, III, I, I, III,…
$ response <int> 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0…
$ death <int> 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0…
$ ttdeath <dbl> 24.00, 24.00, 24.00, 17.64, 16.43, 15.64, 24.00, 18.43, 24.00…
ID | Name | Label | Values | Value Labels |
---|---|---|---|---|
1 | trt | Chemotherapy Treatment | <output omitted> | |
2 | age | Age | range: 6-83 | |
3 | marker | Marker Level (ng/mL) | range: 0.0-3.9 | |
4 | stage | T Stage | T1; T2; T3; T4 | |
5 | grade | Grade | I; II; III | |
6 | response | Tumor Response | range: 0-1 | |
7 | death | Patient Died | range: 0-1 | |
8 | ttdeath | Months to Death/Censor | range: 3.5-24.0 | |
Characteristic | N = 200¹ |
---|---|
Chemotherapy Treatment | |
Drug A | 98 (49%) |
Drug B | 102 (51%) |
Age | 47 (38, 57) |
Unknown | 11 |
Grade | |
I | 68 (34%) |
II | 68 (34%) |
III | 64 (32%) |
Tumor Response | 61 (32%) |
Unknown | 7 |
1 n (%); Median (Q1, Q3) |
4 Differences between groups
4.1 Wilcoxon Rank sum test
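- tbl_summary() reports the median (Q1, Q3) for continuous variables by default; for a two-group comparison, add_p() pairs this default with the Wilcoxon rank sum test.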
```{r}
trial |>
  select(trt, age, marker, ttdeath) |>
  tbl_summary(by = trt) |>
  add_p()
# modifying
trial |>
  select(trt, age, marker, ttdeath) |>
  tbl_summary(by = trt) |>
  add_p(
    pvalue_fun = label_style_pvalue(digits = 2)
  ) |>
  add_overall() # add overall statistics
```
Characteristic | Drug A, N = 98¹ | Drug B, N = 102¹ | p-value² |
---|---|---|---|
Age | 46 (37, 60) | 48 (39, 56) | 0.7 |
Unknown | 7 | 4 | |
Marker Level (ng/mL) | 0.84 (0.23, 1.60) | 0.52 (0.18, 1.21) | 0.085 |
Unknown | 6 | 4 | |
Months to Death/Censor | 23.5 (17.4, 24.0) | 21.2 (14.5, 24.0) | 0.14 |
1 Median (Q1, Q3) | |||
2 Wilcoxon rank sum test |
Characteristic | Overall, N = 200¹ | Drug A, N = 98¹ | Drug B, N = 102¹ | p-value² |
---|---|---|---|---|
Age | 47 (38, 57) | 46 (37, 60) | 48 (39, 56) | 0.72 |
Unknown | 11 | 7 | 4 | |
Marker Level (ng/mL) | 0.64 (0.22, 1.41) | 0.84 (0.23, 1.60) | 0.52 (0.18, 1.21) | 0.085 |
Unknown | 10 | 6 | 4 | |
Months to Death/Censor | 22.4 (15.9, 24.0) | 23.5 (17.4, 24.0) | 21.2 (14.5, 24.0) | 0.14 |
1 Median (Q1, Q3) | ||||
2 Wilcoxon rank sum test |
4.2 Welch Two Sample t-test
```{r}
# be careful: showing mean (SD) does not change the test used by add_p()
trial |>
  select(trt, age, marker, ttdeath) |>
  tbl_summary(
    by = trt,
    statistic = list(
      c(age, marker, ttdeath) ~ "{mean} ({sd})"),
    missing = "no"
  ) |>
  add_p() # uses Wilcoxon rank sum test
trial |>
  select(trt, age, marker, ttdeath) |>
  tbl_summary(
    by = trt,
    statistic = list(
      c(age, marker, ttdeath) ~ "{mean} ({sd})"),
    missing = "no"
  ) |>
  add_difference(
    pvalue_fun = label_style_pvalue(digits = 2)
  ) |> # Welch two-sample t-test
  #modify_caption("**Table 1. Effectiveness of Drugs**") |>
  as_gt() |>
  tab_header(title = md("**Table 1. Effectiveness of Drugs**"),
             subtitle = "Patient Characteristics")
```
Characteristic | Drug A, N = 98¹ | Drug B, N = 102¹ | p-value² |
---|---|---|---|
Age | 47 (15) | 47 (14) | 0.7 |
Marker Level (ng/mL) | 1.02 (0.89) | 0.82 (0.83) | 0.085 |
Months to Death/Censor | 20.2 (5.0) | 19.0 (5.5) | 0.14 |
1 Mean (SD) | |||
2 Wilcoxon rank sum test |
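As a rough cross-check on what a Welch two-sample t-test does, the sketch below recomputes the age comparison from summary statistics alone, using base R only. The means and SDs are the two-decimal values that appear in other tables of these notes (47.01 (14.71) and 47.45 (14.01)). The group sizes are an assumption: 98 - 7 = 91 and 102 - 4 = 98 non-missing ages. Because the inputs are rounded, the result can only approximate what add_difference() reports.

```r
# Summary statistics for age by treatment arm, copied from the tables in these notes.
# Assumption: n = arm size minus the rows reported as "Unknown".
mean_a <- 47.01; sd_a <- 14.71; n_a <- 91   # Drug A
mean_b <- 47.45; sd_b <- 14.01; n_b <- 98   # Drug B

# Welch standard error: the two group variances are NOT pooled
var_a <- sd_a^2 / n_a
var_b <- sd_b^2 / n_b
se_diff <- sqrt(var_a + var_b)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (var_a + var_b)^2 /
  (var_a^2 / (n_a - 1) + var_b^2 / (n_b - 1))

diff_means <- mean_a - mean_b
t_stat     <- diff_means / se_diff
p_value    <- 2 * pt(-abs(t_stat), df = df_welch)
ci         <- diff_means + c(-1, 1) * qt(0.975, df = df_welch) * se_diff

round(c(difference = diff_means, t = t_stat, p = p_value), 2)
round(ci, 2)
```

With these inputs the t statistic comes out near -0.21 and the p-value near 0.83, which is in line with the t-test results shown for age in section 6.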
4.3 Chi-square with tbl_summary
```{r}
trial |>
  select(trt, stage, grade) |>
  tbl_summary(by = trt) |>
  add_p(pvalue_fun = label_style_pvalue(digits = 2)) |>
  add_overall()
```
Characteristic | Overall, N = 200¹ | Drug A, N = 98¹ | Drug B, N = 102¹ | p-value² |
---|---|---|---|---|
T Stage | 0.87 | |||
T1 | 53 (27%) | 28 (29%) | 25 (25%) | |
T2 | 54 (27%) | 25 (26%) | 29 (28%) | |
T3 | 43 (22%) | 22 (22%) | 21 (21%) | |
T4 | 50 (25%) | 23 (23%) | 27 (26%) | |
Grade | 0.87 | |||
I | 68 (34%) | 35 (36%) | 33 (32%) | |
II | 68 (34%) | 32 (33%) | 36 (35%) | |
III | 64 (32%) | 31 (32%) | 33 (32%) | |
1 n (%) | ||||
2 Pearson’s Chi-squared test |
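To see where the T Stage p-value comes from, the following minimal sketch feeds the counts from the table above into base R's chisq.test(). The matrix is typed in by hand from the Drug A and Drug B columns; the object names are illustrative only.

```r
# Counts copied from the T Stage rows of the table above
stage_by_trt <- matrix(
  c(28, 25, 22, 23,   # Drug A: T1, T2, T3, T4
    25, 29, 21, 27),  # Drug B: T1, T2, T3, T4
  nrow = 2, byrow = TRUE,
  dimnames = list(trt   = c("Drug A", "Drug B"),
                  stage = c("T1", "T2", "T3", "T4"))
)

# Pearson's chi-squared test of independence (2 x 4 table, 3 degrees of freedom)
chi <- chisq.test(stage_by_trt)

chi$expected            # expected counts under independence
round(chi$p.value, 2)   # compare with the p-value in the T Stage row
```

The rounded p-value should agree with the 0.87 shown for T Stage, since add_p() reports Pearson's chi-squared test for this variable.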
4.4 Customizing tbl_summary() output
```{r}
trial2 |>
  tbl_summary(
    by = trt,
    type = age ~ "continuous2",
    statistic =
      list(age ~ c("{mean} ({sd})", "{min}, {max}"),
           response ~ "{n}/{N} ({p}%)"),
    label = grade ~ "Pathological Tumor Grade",
    digits = age ~ 2
  ) |>
  add_p(pvalue_fun = label_style_pvalue(digits = 2)) |>
  add_q(method = "bonferroni") # p-values adjusted for multiple comparison
```
Characteristic | Drug A, N = 98¹ | Drug B, N = 102¹ | p-value² | q-value³ |
---|---|---|---|---|
Age | 0.72 | >0.99 | ||
Mean (SD) | 47.01 (14.71) | 47.45 (14.01) | ||
Min, Max | 6.00, 78.00 | 9.00, 83.00 | ||
Unknown | 7 | 4 | ||
Pathological Tumor Grade | 0.87 | >0.99 | ||
I | 35 (36%) | 33 (32%) | ||
II | 32 (33%) | 36 (35%) | ||
III | 31 (32%) | 33 (32%) | ||
Tumor Response | 28/95 (29%) | 33/98 (34%) | 0.53 | >0.99 |
Unknown | 3 | 4 | ||
1 n (%); n/N (%) | ||||
2 Wilcoxon rank sum test; Pearson’s Chi-squared test | ||||
3 Bonferroni correction for multiple testing |
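The q-values above can be reproduced approximately with base R's p.adjust(). The sketch below uses the rounded p-values from the table (0.72, 0.87, 0.53), so it illustrates the correction itself rather than gtsummary's internal computation.

```r
# Rounded p-values copied from the table above
p_values <- c(age = 0.72, grade = 0.87, response = 0.53)

# Bonferroni: multiply each p-value by the number of tests, capped at 1
q_values <- p.adjust(p_values, method = "bonferroni")
q_values
```

With three tests, every p-value here exceeds 1/3, so each adjusted value is capped at 1, which the table displays as >0.99.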
4.4.1 as_gt()
- How to combine gtsummary with gt objects
```{r}
trial |>
  select(trt, marker, response) |>
  tbl_summary(by = trt)
trial |>
  select(trt, marker, response) |>
  tbl_summary(
    by = trt,
    statistic = list(
      marker ~ "{mean} ({sd})",
      response ~ "{p}%"),
    missing = "no"
  ) |>
  add_difference(
    pvalue_fun = label_style_pvalue(digits = 2)
  ) |>
  #modify_caption("**Table 1. Effectiveness of Drugs**") |>
  as_gt() |>
  tab_header(title = md("**Table 1. Effectiveness of Drugs**"),
             subtitle = "Patient Characteristics")
```
Characteristic | Drug A, N = 98¹ | Drug B, N = 102¹ |
---|---|---|
Marker Level (ng/mL) | 0.84 (0.23, 1.60) | 0.52 (0.18, 1.21) |
Unknown | 6 | 4 |
Tumor Response | 28 (29%) | 33 (34%) |
Unknown | 3 | 4 |
1 Median (Q1, Q3); n (%) |
4.5 Chi-Square with tbl_cross()
- Cross-tabulation with trial data
```{r}
trial |>
  tbl_cross(
    row = trt,
    col = stage,
    percent = "column"
  ) |>
  add_p() |>
  bold_labels()
# cf
trial |>
  select(trt, stage) |>
  tbl_summary(by = stage) |>
  add_p() |>
  bold_labels()
```
 | T Stage: T1 | T Stage: T2 | T Stage: T3 | T Stage: T4 | Total | p-value¹ |
---|---|---|---|---|---|---|
Chemotherapy Treatment | | | | | | 0.9 |
Drug A | 28 (53%) | 25 (46%) | 22 (51%) | 23 (46%) | 98 (49%) | |
Drug B | 25 (47%) | 29 (54%) | 21 (49%) | 27 (54%) | 102 (51%) | |
Total | 53 (100%) | 54 (100%) | 43 (100%) | 50 (100%) | 200 (100%) | |
1 Pearson’s Chi-squared test |
Characteristic | T1, N = 53¹ | T2, N = 54¹ | T3, N = 43¹ | T4, N = 50¹ | p-value² |
---|---|---|---|---|---|
Chemotherapy Treatment | 0.9 | ||||
Drug A | 28 (53%) | 25 (46%) | 22 (51%) | 23 (46%) | |
Drug B | 25 (47%) | 29 (54%) | 21 (49%) | 27 (54%) | |
1 n (%) | |||||
2 Pearson’s Chi-squared test |
Tip
- A chi-square test can be done with either tbl_summary() or tbl_cross(), but the latter produces a more attractive table.
- The former is better when the table includes multiple types of statistical tests.
5 Survey Data
5.1 tbl_summary()
```{r}
gss_cat
class(gss_cat) # built-in data from the forcats package
view_df(gss_cat)
glimpse(gss_cat)
gss_cat |>
  tbl_summary()
# by race
gss_cat |>
  #count(race)
  mutate(race = fct_drop(race)) |>
  tbl_summary(by = race)
```
# A tibble: 21,483 × 9
year marital age race rincome partyid relig denom tvhours
<int> <fct> <int> <fct> <fct> <fct> <fct> <fct> <int>
1 2000 Never married 26 White $8000 to 9999 Ind,near … Prot… Sout… 12
2 2000 Divorced 48 White $8000 to 9999 Not str r… Prot… Bapt… NA
3 2000 Widowed 67 White Not applicable Independe… Prot… No d… 2
4 2000 Never married 39 White Not applicable Ind,near … Orth… Not … 4
5 2000 Divorced 25 White Not applicable Not str d… None Not … 1
6 2000 Married 25 White $20000 - 24999 Strong de… Prot… Sout… NA
7 2000 Never married 36 White $25000 or more Not str r… Chri… Not … 3
8 2000 Divorced 44 White $7000 to 7999 Ind,near … Prot… Luth… NA
9 2000 Married 44 White $25000 or more Not str d… Prot… Other 0
10 2000 Married 47 White $25000 or more Strong re… Prot… Sout… 3
# ℹ 21,473 more rows
[1] "tbl_df" "tbl" "data.frame"
ID | Name | Label | Values | Value Labels |
---|---|---|---|---|
1 | year | | range: 2000-2014 | |
2 | marital | | No answer; Never married; Separated; Divorced; Widowed; Married | |
3 | age | | range: 18-89 | |
4 | race | | Other; Black; White; Not applicable | |
5 | rincome | | No answer; Don't know; Refused; $25000 or more; $20000 - 24999; $15000 - 19999; $10000 - 14999; $8000 to 9999; $7000 to 7999; $6000 to 6999; $5000 to 5999; $4000 to 4999; $3000 to 3999; $1000 to 2999; Lt $1000; <... truncated> | |
6 | partyid | | No answer; Don't know; Other party; Strong republican; Not str republican; Ind,near rep; Independent; Ind,near dem; Not str democrat; Strong democrat | |
7 | relig | | No answer; Don't know; Inter-nondenominational; Native american; Christian; Orthodox-christian; Moslem/islam; Other eastern; Hinduism; Buddhism; Other; None; Jewish; Catholic; Protestant; <... truncated> | |
8 | denom | | No answer; Don't know; No denomination; Other; Episcopal; Presbyterian-dk wh; Presbyterian, merged; Other presbyterian; United pres ch in us; Presbyterian c in us; Lutheran-dk which; Evangelical luth; Other lutheran; Wi evan luth synod; Lutheran-mo synod; <... truncated> | |
9 | tvhours | | range: 0-24 | |
Rows: 21,483
Columns: 9
$ year <int> 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 20…
$ marital <fct> Never married, Divorced, Widowed, Never married, Divorced, Mar…
$ age <int> 26, 48, 67, 39, 25, 25, 36, 44, 44, 47, 53, 52, 52, 51, 52, 40…
$ race <fct> White, White, White, White, White, White, White, White, White,…
$ rincome <fct> $8000 to 9999, $8000 to 9999, Not applicable, Not applicable, …
$ partyid <fct> "Ind,near rep", "Not str republican", "Independent", "Ind,near…
$ relig <fct> Protestant, Protestant, Protestant, Orthodox-christian, None, …
$ denom <fct> "Southern baptist", "Baptist-dk which", "No denomination", "No…
$ tvhours <int> 12, NA, 2, 4, 1, NA, 3, NA, 0, 3, 2, NA, 1, NA, 1, 7, NA, 3, 3…
Characteristic | N = 21,483¹ |
---|---|
year | |
2000 | 2,817 (13%) |
2002 | 2,765 (13%) |
2004 | 2,812 (13%) |
2006 | 4,510 (21%) |
2008 | 2,023 (9.4%) |
2010 | 2,044 (9.5%) |
2012 | 1,974 (9.2%) |
2014 | 2,538 (12%) |
marital | |
No answer | 17 (<0.1%) |
Never married | 5,416 (25%) |
Separated | 743 (3.5%) |
Divorced | 3,383 (16%) |
Widowed | 1,807 (8.4%) |
Married | 10,117 (47%) |
age | 46 (33, 59) |
Unknown | 76 |
race | |
Other | 1,959 (9.1%) |
Black | 3,129 (15%) |
White | 16,395 (76%) |
Not applicable | 0 (0%) |
rincome | |
No answer | 183 (0.9%) |
Don't know | 267 (1.2%) |
Refused | 975 (4.5%) |
$25000 or more | 7,363 (34%) |
$20000 - 24999 | 1,283 (6.0%) |
$15000 - 19999 | 1,048 (4.9%) |
$10000 - 14999 | 1,168 (5.4%) |
$8000 to 9999 | 340 (1.6%) |
$7000 to 7999 | 188 (0.9%) |
$6000 to 6999 | 215 (1.0%) |
$5000 to 5999 | 227 (1.1%) |
$4000 to 4999 | 226 (1.1%) |
$3000 to 3999 | 276 (1.3%) |
$1000 to 2999 | 395 (1.8%) |
Lt $1000 | 286 (1.3%) |
Not applicable | 7,043 (33%) |
partyid | |
No answer | 154 (0.7%) |
Don't know | 1 (<0.1%) |
Other party | 393 (1.8%) |
Strong republican | 2,314 (11%) |
Not str republican | 3,032 (14%) |
Ind,near rep | 1,791 (8.3%) |
Independent | 4,119 (19%) |
Ind,near dem | 2,499 (12%) |
Not str democrat | 3,690 (17%) |
Strong democrat | 3,490 (16%) |
relig | |
No answer | 93 (0.4%) |
Don't know | 15 (<0.1%) |
Inter-nondenominational | 109 (0.5%) |
Native american | 23 (0.1%) |
Christian | 689 (3.2%) |
Orthodox-christian | 95 (0.4%) |
Moslem/islam | 104 (0.5%) |
Other eastern | 32 (0.1%) |
Hinduism | 71 (0.3%) |
Buddhism | 147 (0.7%) |
Other | 224 (1.0%) |
None | 3,523 (16%) |
Jewish | 388 (1.8%) |
Catholic | 5,124 (24%) |
Protestant | 10,846 (50%) |
Not applicable | 0 (0%) |
denom | |
No answer | 117 (0.5%) |
Don't know | 52 (0.2%) |
No denomination | 1,683 (7.8%) |
Other | 2,534 (12%) |
Episcopal | 397 (1.8%) |
Presbyterian-dk wh | 244 (1.1%) |
Presbyterian, merged | 67 (0.3%) |
Other presbyterian | 47 (0.2%) |
United pres ch in us | 110 (0.5%) |
Presbyterian c in us | 104 (0.5%) |
Lutheran-dk which | 267 (1.2%) |
Evangelical luth | 122 (0.6%) |
Other lutheran | 30 (0.1%) |
Wi evan luth synod | 71 (0.3%) |
Lutheran-mo synod | 212 (1.0%) |
Luth ch in america | 71 (0.3%) |
Am lutheran | 146 (0.7%) |
Methodist-dk which | 239 (1.1%) |
Other methodist | 33 (0.2%) |
United methodist | 1,067 (5.0%) |
Afr meth ep zion | 32 (0.1%) |
Afr meth episcopal | 77 (0.4%) |
Baptist-dk which | 1,457 (6.8%) |
Other baptists | 213 (1.0%) |
Southern baptist | 1,536 (7.1%) |
Nat bapt conv usa | 40 (0.2%) |
Nat bapt conv of am | 76 (0.4%) |
Am bapt ch in usa | 130 (0.6%) |
Am baptist asso | 237 (1.1%) |
Not applicable | 10,072 (47%) |
tvhours | 2 (1, 4) |
Unknown | 10,146 |
1 n (%); Median (Q1, Q3) |
Characteristic | Other, N = 1,959¹ | Black, N = 3,129¹ | White, N = 16,395¹ |
---|---|---|---|
year | |||
2000 | 175 (8.9%) | 429 (14%) | 2,213 (13%) |
2002 | 167 (8.5%) | 410 (13%) | 2,188 (13%) |
2004 | 201 (10%) | 377 (12%) | 2,234 (14%) |
2006 | 592 (30%) | 634 (20%) | 3,284 (20%) |
2008 | 183 (9.3%) | 281 (9.0%) | 1,559 (9.5%) |
2010 | 183 (9.3%) | 311 (9.9%) | 1,550 (9.5%) |
2012 | 196 (10%) | 301 (9.6%) | 1,477 (9.0%) |
2014 | 262 (13%) | 386 (12%) | 1,890 (12%) |
marital | |||
No answer | 2 (0.1%) | 2 (<0.1%) | 13 (<0.1%) |
Never married | 633 (32%) | 1,305 (42%) | 3,478 (21%) |
Separated | 110 (5.6%) | 196 (6.3%) | 437 (2.7%) |
Divorced | 212 (11%) | 495 (16%) | 2,676 (16%) |
Widowed | 70 (3.6%) | 262 (8.4%) | 1,475 (9.0%) |
Married | 932 (48%) | 869 (28%) | 8,316 (51%) |
age | 37 (29, 48) | 42 (31, 55) | 48 (35, 61) |
Unknown | 8 | 14 | 54 |
rincome | |||
No answer | 14 (0.7%) | 35 (1.1%) | 134 (0.8%) |
Don't know | 45 (2.3%) | 45 (1.4%) | 177 (1.1%) |
Refused | 92 (4.7%) | 150 (4.8%) | 733 (4.5%) |
$25000 or more | 621 (32%) | 886 (28%) | 5,856 (36%) |
$20000 - 24999 | 112 (5.7%) | 220 (7.0%) | 951 (5.8%) |
$15000 - 19999 | 134 (6.8%) | 180 (5.8%) | 734 (4.5%) |
$10000 - 14999 | 126 (6.4%) | 210 (6.7%) | 832 (5.1%) |
$8000 to 9999 | 41 (2.1%) | 56 (1.8%) | 243 (1.5%) |
$7000 to 7999 | 24 (1.2%) | 27 (0.9%) | 137 (0.8%) |
$6000 to 6999 | 26 (1.3%) | 35 (1.1%) | 154 (0.9%) |
$5000 to 5999 | 27 (1.4%) | 40 (1.3%) | 160 (1.0%) |
$4000 to 4999 | 34 (1.7%) | 38 (1.2%) | 154 (0.9%) |
$3000 to 3999 | 35 (1.8%) | 59 (1.9%) | 182 (1.1%) |
$1000 to 2999 | 47 (2.4%) | 71 (2.3%) | 277 (1.7%) |
Lt $1000 | 36 (1.8%) | 51 (1.6%) | 199 (1.2%) |
Not applicable | 545 (28%) | 1,026 (33%) | 5,472 (33%) |
partyid | |||
No answer | 25 (1.3%) | 36 (1.2%) | 93 (0.6%) |
Don't know | 0 (0%) | 0 (0%) | 1 (<0.1%) |
Other party | 22 (1.1%) | 22 (0.7%) | 349 (2.1%) |
Strong republican | 81 (4.1%) | 56 (1.8%) | 2,177 (13%) |
Not str republican | 156 (8.0%) | 88 (2.8%) | 2,788 (17%) |
Ind,near rep | 118 (6.0%) | 92 (2.9%) | 1,581 (9.6%) |
Independent | 612 (31%) | 491 (16%) | 3,016 (18%) |
Ind,near dem | 285 (15%) | 352 (11%) | 1,862 (11%) |
Not str democrat | 437 (22%) | 746 (24%) | 2,507 (15%) |
Strong democrat | 223 (11%) | 1,246 (40%) | 2,021 (12%) |
relig | |||
No answer | 14 (0.7%) | 16 (0.5%) | 63 (0.4%) |
Don't know | 3 (0.2%) | 3 (<0.1%) | 9 (<0.1%) |
Inter-nondenominational | 2 (0.1%) | 29 (0.9%) | 78 (0.5%) |
Native american | 16 (0.8%) | 0 (0%) | 7 (<0.1%) |
Christian | 74 (3.8%) | 141 (4.5%) | 474 (2.9%) |
Orthodox-christian | 1 (<0.1%) | 2 (<0.1%) | 92 (0.6%) |
Moslem/islam | 42 (2.1%) | 35 (1.1%) | 27 (0.2%) |
Other eastern | 10 (0.5%) | 2 (<0.1%) | 20 (0.1%) |
Hinduism | 62 (3.2%) | 1 (<0.1%) | 8 (<0.1%) |
Buddhism | 72 (3.7%) | 10 (0.3%) | 65 (0.4%) |
Other | 29 (1.5%) | 18 (0.6%) | 177 (1.1%) |
None | 323 (16%) | 384 (12%) | 2,816 (17%) |
Jewish | 8 (0.4%) | 10 (0.3%) | 370 (2.3%) |
Catholic | 916 (47%) | 207 (6.6%) | 4,001 (24%) |
Protestant | 387 (20%) | 2,271 (73%) | 8,188 (50%) |
Not applicable | 0 (0%) | 0 (0%) | 0 (0%) |
denom | |||
No answer | 14 (0.7%) | 17 (0.5%) | 86 (0.5%) |
Don't know | 6 (0.3%) | 15 (0.5%) | 31 (0.2%) |
No denomination | 99 (5.1%) | 240 (7.7%) | 1,344 (8.2%) |
Other | 180 (9.2%) | 468 (15%) | 1,886 (12%) |
Episcopal | 9 (0.5%) | 38 (1.2%) | 350 (2.1%) |
Presbyterian-dk wh | 15 (0.8%) | 8 (0.3%) | 221 (1.3%) |
Presbyterian, merged | 1 (<0.1%) | 2 (<0.1%) | 64 (0.4%) |
Other presbyterian | 2 (0.1%) | 2 (<0.1%) | 43 (0.3%) |
United pres ch in us | 2 (0.1%) | 6 (0.2%) | 102 (0.6%) |
Presbyterian c in us | 6 (0.3%) | 5 (0.2%) | 93 (0.6%) |
Lutheran-dk which | 6 (0.3%) | 6 (0.2%) | 255 (1.6%) |
Evangelical luth | 1 (<0.1%) | 2 (<0.1%) | 119 (0.7%) |
Other lutheran | 0 (0%) | 0 (0%) | 30 (0.2%) |
Wi evan luth synod | 0 (0%) | 1 (<0.1%) | 70 (0.4%) |
Lutheran-mo synod | 2 (0.1%) | 2 (<0.1%) | 208 (1.3%) |
Luth ch in america | 2 (0.1%) | 2 (<0.1%) | 67 (0.4%) |
Am lutheran | 3 (0.2%) | 5 (0.2%) | 138 (0.8%) |
Methodist-dk which | 3 (0.2%) | 35 (1.1%) | 201 (1.2%) |
Other methodist | 2 (0.1%) | 5 (0.2%) | 26 (0.2%) |
United methodist | 11 (0.6%) | 49 (1.6%) | 1,007 (6.1%) |
Afr meth ep zion | 0 (0%) | 31 (1.0%) | 1 (<0.1%) |
Afr meth episcopal | 0 (0%) | 76 (2.4%) | 1 (<0.1%) |
Baptist-dk which | 37 (1.9%) | 697 (22%) | 723 (4.4%) |
Other baptists | 5 (0.3%) | 47 (1.5%) | 161 (1.0%) |
Southern baptist | 30 (1.5%) | 355 (11%) | 1,151 (7.0%) |
Nat bapt conv usa | 1 (<0.1%) | 35 (1.1%) | 4 (<0.1%) |
Nat bapt conv of am | 1 (<0.1%) | 58 (1.9%) | 17 (0.1%) |
Am bapt ch in usa | 4 (0.2%) | 76 (2.4%) | 50 (0.3%) |
Am baptist asso | 6 (0.3%) | 97 (3.1%) | 134 (0.8%) |
Not applicable | 1,511 (77%) | 749 (24%) | 7,812 (48%) |
tvhours | 2 (1, 4) | 3 (2, 5) | 2 (1, 4) |
Unknown | 932 | 1,429 | 7,785 |
1 n (%); Median (Q1, Q3) |
5.2 tbl_cross()
```{r}
levels(gss_cat$race)
unique(gss_cat$race)
gss_cat_md <- gss_cat |>
  mutate(race = fct_drop(race))
levels(gss_cat_md$race)
levels(gss_cat_md$marital)
## cross-tab
gss_cat_md |>
  filter(marital != "No answer") |> # reduce the number of cells
  mutate(marital = fct_drop(marital)) |>
  tbl_cross(
    row = marital,
    col = race,
    percent = "column",
    missing = "no"
  ) |>
  add_p() |>
  bold_labels()
```
[1] "Other" "Black" "White" "Not applicable"
[1] White Black Other
Levels: Other Black White Not applicable
[1] "Other" "Black" "White"
[1] "No answer" "Never married" "Separated" "Divorced"
[5] "Widowed" "Married"
 | race: Other | race: Black | race: White | Total | p-value¹ |
---|---|---|---|---|---|
marital | | | | | <0.001 |
Never married | 633 (32%) | 1,305 (42%) | 3,478 (21%) | 5,416 (25%) | |
Separated | 110 (5.6%) | 196 (6.3%) | 437 (2.7%) | 743 (3.5%) | |
Divorced | 212 (11%) | 495 (16%) | 2,676 (16%) | 3,383 (16%) | |
Widowed | 70 (3.6%) | 262 (8.4%) | 1,475 (9.0%) | 1,807 (8.4%) | |
Married | 932 (48%) | 869 (28%) | 8,316 (51%) | 10,117 (47%) | |
Total | 1,957 (100%) | 3,127 (100%) | 16,382 (100%) | 21,466 (100%) | |
1 Pearson’s Chi-squared test |
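The reason for dropping unused factor levels before cross-tabulating can be shown with a small base R sketch. It uses a hand-made toy factor (not the gss_cat data) and base R's droplevels(), which for this purpose behaves like forcats::fct_drop(): a level that is declared but never observed would otherwise appear as an empty row or column.

```r
# Toy factor: "Not applicable" is a declared level but never occurs in the data
race <- factor(
  c("White", "Black", "Other", "White"),
  levels = c("Other", "Black", "White", "Not applicable")
)

table(race)            # the unused level shows up with a count of 0

# Drop levels that have no observations
race_dropped <- droplevels(race)
levels(race_dropped)
table(race_dropped)    # the empty cell is gone
```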
6 More on tbl_summary()
6.1 Modifying tbl_summary() function arguments
```{r}
trial2 <- trial |>
  select(trt, age, grade)
trial2 |>
  tbl_summary(by = trt)
trial2 |>
  tbl_summary(
    by = trt,
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} / {N} ({p}%)"
    ),
    digits = all_continuous() ~ 2,
    label = grade ~ "Tumor Grade",
    missing_text = "(Missing)"
  ) |>
  add_p() |>
  add_overall()
```
Characteristic | Drug A, N = 98¹ | Drug B, N = 102¹ |
---|---|---|
Age | 46 (37, 60) | 48 (39, 56) |
Unknown | 7 | 4 |
Grade | ||
I | 35 (36%) | 33 (32%) |
II | 32 (33%) | 36 (35%) |
III | 31 (32%) | 33 (32%) |
1 Median (Q1, Q3); n (%) |
Characteristic | Overall, N = 200¹ | Drug A, N = 98¹ | Drug B, N = 102¹ | p-value² |
---|---|---|---|---|
Age | 47.24 (14.31) | 47.01 (14.71) | 47.45 (14.01) | 0.7 |
(Missing) | 11 | 7 | 4 | |
Tumor Grade | 0.9 | |||
I | 68 / 200 (34%) | 35 / 98 (36%) | 33 / 102 (32%) | |
II | 68 / 200 (34%) | 32 / 98 (33%) | 36 / 102 (35%) | |
III | 64 / 200 (32%) | 31 / 98 (32%) | 33 / 102 (32%) | |
1 Mean (SD); n / N (%) | ||||
2 Wilcoxon rank sum test; Pearson’s Chi-squared test |
6.2 Formatting tables with tbl_summary() functions
```{r}
glimpse(trial2)
# Customizing table
trial |>
  select(trt, age, grade, response) |>
  #filter(!is.na(age)) |>
  tbl_summary(
    by = trt,
    missing = "no") |>
  #show_header_names()
  add_p(pvalue_fun = label_style_pvalue(digits = 2)) |>
  add_overall() |>
  add_n() |>
  add_stat_label(label = all_categorical() ~ "No. (%)") |>
  modify_header(label ~ "**Variables**") |>
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Treatment Received**") |>
  modify_footnote(
    all_stat_cols() ~ "Median(IQR) or Frequency (%)"
  ) |>
  modify_caption("**Table 1. Patient Characteristics**") |>
  bold_labels()
```
Rows: 200
Columns: 3
$ trt <chr> "Drug A", "Drug B", "Drug A", "Drug A", "Drug A", "Drug B", "Dru…
$ age <dbl> 23, 9, 31, NA, 51, 39, 37, 32, 31, 34, 42, 63, 54, 21, 48, 71, 3…
$ grade <fct> II, I, II, III, III, I, II, I, II, I, III, I, III, I, I, III, II…
Variables | N | Overall, N = 200¹ | Treatment Received: Drug A, N = 98¹ | Treatment Received: Drug B, N = 102¹ | p-value² |
---|---|---|---|---|---|
Age, Median (Q1, Q3) | 189 | 47 (38, 57) | 46 (37, 60) | 48 (39, 56) | 0.72 |
Grade, No. (%) | 200 | 0.87 | |||
I | 68 (34%) | 35 (36%) | 33 (32%) | ||
II | 68 (34%) | 32 (33%) | 36 (35%) | ||
III | 64 (32%) | 31 (32%) | 33 (32%) | ||
Tumor Response, No. (%) | 193 | 61 (32%) | 28 (29%) | 33 (34%) | 0.53 |
1 Median(IQR) or Frequency (%) | |||||
2 Wilcoxon rank sum test; Pearson’s Chi-squared test |
6.3 t-test
```{r}
# function: returns a formatted t statistic and p-value for one variable
my_ttest2 <- function(data, variable, by, ...) {
  t.test(data[[variable]] ~ as.factor(data[[by]])) |>
    broom::tidy() %>%
    dplyr::mutate(
      stat = glue::glue("t={style_sigfig(statistic)}, {style_pvalue(p.value, prepend_p = TRUE)}")
    ) %>%
    dplyr::pull(stat)
}
# t-test
trial |>
  select(age, marker, trt) |>
  tbl_summary(
    by = trt,
    missing = "no"
  ) |>
  add_stat(fns = everything() ~ my_ttest2) |>
  modify_header(add_stat_1 = "**Treatment Comparison**")
# add_difference
trial |>
  select(age, marker, trt) |>
  tbl_summary(
    by = trt,
    missing = "no"
  ) |>
  add_difference()
# change default stat to mean (sd)
trial |>
  select(age, marker, trt) |>
  tbl_summary(
    by = trt,
    missing = "no",
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} / {N} ({p}%)"
    )
  ) |>
  add_difference()
```
Characteristic | Drug A, N = 98¹ | Drug B, N = 102¹ | Treatment Comparison |
---|---|---|---|
Age | 46 (37, 60) | 48 (39, 56) | t=-0.21, p=0.8 |
Marker Level (ng/mL) | 0.84 (0.23, 1.60) | 0.52 (0.18, 1.21) | t=1.6, p=0.12 |
1 Median (Q1, Q3) |
6.4 add_difference()
- t-tests for continuous variables
- test for equality of proportions
```{r}
trial |>
  select(trt, age, marker, response, death) %>%
  tbl_summary(
    by = trt,
    statistic =
      list(
        all_continuous() ~ "{mean} ({sd})",
        all_dichotomous() ~ "{p}%"
      ),
    missing = "no"
  ) |>
  add_n() |>
  add_difference()
## controlling decimal points
trial |>
  select(trt, age, marker, response, death) %>%
  tbl_summary(
    by = trt,
    statistic =
      list(
        all_continuous() ~ "{mean} ({sd})",
        all_dichotomous() ~ "{p}%"
      ),
    digits = list(all_continuous() ~ 2,
                  all_dichotomous() ~ 2),
    missing = "no"
  ) |>
  add_n() |>
  add_difference() |>
  modify_fmt_fun(
    c(conf.low, conf.high) ~ label_style_number(digits = 2),
    p.value = label_style_pvalue(digits = 2)
  )
```
Characteristic | N | Drug A, N = 98¹ | Drug B, N = 102¹ | Difference² | 95% CI² | p-value² |
---|---|---|---|---|---|---|
Age | 189 | 47 (15) | 47 (14) | -0.44 | -4.6, 3.7 | 0.8 |
Marker Level (ng/mL) | 190 | 1.02 (0.89) | 0.82 (0.83) | 0.20 | -0.05, 0.44 | 0.12 |
Tumor Response | 193 | 29% | 34% | -4.2% | -18%, 9.9% | 0.6 |
Patient Died | 200 | 53% | 59% | -5.8% | -21%, 9.0% | 0.5 |
Abbreviation: CI = Confidence Interval | ||||||
1 Mean (SD); % | ||||||
2 Welch Two Sample t-test; 2-sample test for equality of proportions with continuity correction |
Characteristic | N | Drug A, N = 98¹ | Drug B, N = 102¹ | Difference² | 95% CI² | p-value² |
---|---|---|---|---|---|---|
Age | 189 | 47.01 (14.71) | 47.45 (14.01) | -0.44 | -4.57, 3.69 | 0.83 |
Marker Level (ng/mL) | 190 | 1.02 (0.89) | 0.82 (0.83) | 0.20 | -0.05, 0.44 | 0.12 |
Tumor Response | 193 | 29.47% | 33.67% | -4.2% | -0.18, 0.10 | 0.64 |
Patient Died | 200 | 53.06% | 58.82% | -5.8% | -0.21, 0.09 | 0.50 |
Abbreviation: CI = Confidence Interval | ||||||
1 Mean (SD); % | ||||||
2 Welch Two Sample t-test; 2-sample test for equality of proportions with continuity correction |
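For dichotomous variables the footnote names a two-sample test for equality of proportions with continuity correction. A minimal base R sketch of that test for Tumor Response follows, using the counts reported in section 4.4 (28/95 responders on Drug A, 33/98 on Drug B). It is a stand-alone illustration with prop.test(), not a reproduction of gtsummary's internals.

```r
# Responders and non-missing patients per arm, copied from the n/N column in section 4.4
responders  <- c(drug_a = 28, drug_b = 33)
non_missing <- c(drug_a = 95, drug_b = 98)

# Two-sample test for equality of proportions (continuity correction is the default)
res <- prop.test(x = responders, n = non_missing)

# Difference in proportions (Drug A minus Drug B), as a percentage
diff_prop <- unname(res$estimate[1] - res$estimate[2])
round(100 * diff_prop, 1)

# 95% confidence interval for the difference and the p-value
round(100 * res$conf.int, 1)
round(res$p.value, 2)
```

The difference of about -4.2 percentage points, the interval of roughly -18% to 9.9%, and the p-value near 0.64 line up with the Tumor Response row above.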
7 ANCOVA Table
```{r}
# ANCOVA adjusted for grade and stage
trial |>
  #select(trt, age, marker, grade, stage)
  tbl_summary(
    by = trt,
    statistic = list(all_continuous() ~ "{mean} ({sd})"),
    missing = "no",
    include = c(age, marker, ttdeath, trt)
  ) |>
  add_n() |>
  add_difference(adj.vars = c(grade, stage))
```
Characteristic | N | Drug A, N = 98¹ | Drug B, N = 102¹ | Adjusted Difference² | 95% CI² | p-value² |
---|---|---|---|---|---|---|
Age | 189 | 47 (15) | 47 (14) | -0.36 | -4.5, 3.8 | 0.9 |
Marker Level (ng/mL) | 190 | 1.02 (0.89) | 0.82 (0.83) | 0.19 | -0.05, 0.43 | 0.12 |
Months to Death/Censor | 200 | 20.2 (5.0) | 19.0 (5.5) | 1.0 | -0.38, 2.5 | 0.15 |
Abbreviation: CI = Confidence Interval | ||||||
1 Mean (SD) | ||||||
2 ANCOVA |
8 Regression model with tbl_regression()
8.1 Traditional Logistic Model
```{r}
m1 <- trial |> glm(
  response ~ age + stage,
  data = _,
  family = binomial(link = "logit")
)
summary(m1)
```
Call:
glm(formula = response ~ age + stage, family = binomial(link = "logit"),
data = trial)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.48622 0.62023 -2.396 0.0166 *
age 0.01939 0.01147 1.691 0.0909 .
stageT2 -0.54143 0.44000 -1.231 0.2185
stageT3 -0.05953 0.45042 -0.132 0.8948
stageT4 -0.23109 0.44823 -0.516 0.6062
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 228.58 on 182 degrees of freedom
Residual deviance: 223.93 on 178 degrees of freedom
(17 observations deleted due to missingness)
AIC: 233.93
Number of Fisher Scoring iterations: 4
8.2 Table Using tbl_regression
```{r}
m1 |>
  tbl_regression() # same result as the above
# customize
m1 |>
  tbl_regression(
    exponentiate = TRUE,
    pvalue_fun = label_style_pvalue(digits = 2)
  ) |>
  add_global_p() |>
  bold_p(t = 0.10) |>
  bold_labels() |>
  add_glance_table(
    include = c(nobs, logLik, AIC, BIC)
  )
```
Characteristic | log(OR) | 95% CI | p-value |
---|---|---|---|
Age | 0.02 | 0.00, 0.04 | 0.091 |
T Stage | |||
T1 | — | — | |
T2 | -0.54 | -1.4, 0.31 | 0.2 |
T3 | -0.06 | -0.95, 0.82 | 0.9 |
T4 | -0.23 | -1.1, 0.64 | 0.6 |
Abbreviations: CI = Confidence Interval, OR = Odds Ratio |
Characteristic | OR | 95% CI | p-value |
---|---|---|---|
Age | 1.02 | 1.00, 1.04 | 0.087 |
T Stage | 0.62 | ||
T1 | — | — | |
T2 | 0.58 | 0.24, 1.37 | |
T3 | 0.94 | 0.39, 2.28 | |
T4 | 0.79 | 0.33, 1.90 | |
No. Obs. | 183 | ||
Log-likelihood | -112 | ||
AIC | 234 | ||
BIC | 250 | ||
Abbreviations: CI = Confidence Interval, OR = Odds Ratio |
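The link between the two tables above is simply exponentiation: an odds ratio is exp() of the log-odds coefficient. The short sketch below applies this to the estimates printed by summary(m1) in section 8.1. The values are typed in by hand, and confidence intervals are left out.

```r
# Log-odds coefficients copied from the summary(m1) output in section 8.1
log_odds <- c(
  age     =  0.01939,
  stageT2 = -0.54143,
  stageT3 = -0.05953,
  stageT4 = -0.23109
)

# exponentiate = TRUE reports exp(coefficient), i.e. the odds ratio
odds_ratios <- exp(log_odds)
round(odds_ratios, 2)
```

Rounded to two decimals these give 1.02, 0.58, 0.94 and 0.79, the OR column of the customized table.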
8.3 Multiple regression with OLS and tbl_regression()
```{r}
mpg_model <- mpg |>
  lm(hwy ~ displ + cyl + drv, data = _)
mpg_model |>
  summary() # same as below
mpg_model |>
  tbl_regression() |>
  add_n() |>
  add_vif() |>
  bold_labels()
```
Call:
lm(formula = hwy ~ displ + cyl + drv, data = mpg)
Residuals:
Min 1Q Median 3Q Max
-8.7095 -2.0282 -0.1297 1.3760 13.8110
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 33.0915 1.0306 32.108 < 2e-16 ***
displ -1.1245 0.4614 -2.437 0.0156 *
cyl -1.4526 0.3334 -4.357 1.99e-05 ***
drvf 5.0446 0.5134 9.826 < 2e-16 ***
drvr 4.8851 0.7116 6.864 6.20e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.968 on 229 degrees of freedom
Multiple R-squared: 0.7559, Adjusted R-squared: 0.7516
F-statistic: 177.2 on 4 and 229 DF, p-value: < 2.2e-16
Characteristic | N | Beta | 95% CI | p-value | GVIF | Adjusted GVIF¹ |
---|---|---|---|---|---|---|
displ | 234 | -1.1 | -2.0, -0.22 | 0.016 | 9.4 | 3.1 |
cyl | 234 | -1.5 | -2.1, -0.80 | <0.001 | 7.6 | 2.8 |
drv | 234 | 2.0 | 1.2 | |||
4 | — | — | ||||
f | 5.0 | 4.0, 6.1 | <0.001 | |||
r | 4.9 | 3.5, 6.3 | <0.001 | |||
Abbreviations: CI = Confidence Interval, GVIF = Generalized Variance Inflation Factor | ||||||
1 GVIF^[1/(2*df)] |
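The footnote formula GVIF^[1/(2*df)] can be checked by hand. The sketch below applies it to the rounded GVIF values in the table; the degrees of freedom are an assumption read off the model (1 each for the numeric predictors displ and cyl, 2 for the three-level factor drv).

```r
# Rounded GVIF values copied from the table above
gvif <- c(displ = 9.4, cyl = 7.6, drv = 2.0)

# Assumed degrees of freedom per term: 1 for numeric predictors, levels - 1 for drv
dfs <- c(displ = 1, cyl = 1, drv = 2)

# Adjusted GVIF makes terms with different df comparable
adjusted_gvif <- gvif^(1 / (2 * dfs))
round(adjusted_gvif, 1)
```

For a term with one degree of freedom the adjustment is just the square root of the GVIF, which is why displ and cyl drop from 9.4 and 7.6 to about 3.1 and 2.8.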
9 Univariate Regression
- Univariate regression is the same as simple regression.
Univariate regression analyzes the relationship between one dependent variable and one independent variable, while “regression” generally refers to a broader class of statistical methods that examine relationships between a dependent variable and one or more independent variables.
The tbl_uvregression() function is a wrapper for tbl_regression() and, as a result, accepts nearly identical function arguments.
9.1 Dichotomous DV
```{r}
trial |>
  tbl_uvregression(
    method = glm,
    y = response,
    include = c(age, stage),
    method.args = list(family = binomial),
    exponentiate = TRUE,
    pvalue_fun = label_style_pvalue(digits = 2)
  ) |>
  add_global_p() |>
  add_q() |>
  bold_p(t = 0.10, q = TRUE) |>
  bold_labels()
```
Characteristic | N | OR | 95% CI | p-value | q-value¹ |
---|---|---|---|---|---|
Age | 183 | 1.02 | 1.00, 1.04 | 0.091 | 0.18 |
T Stage | 193 | 0.58 | 0.58 | ||
T1 | — | — | |||
T2 | 0.63 | 0.27, 1.46 | |||
T3 | 1.13 | 0.48, 2.68 | |||
T4 | 0.83 | 0.36, 1.92 | |||
Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||||
1 False discovery rate correction for multiple testing |
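The q-values in this table follow the false discovery rate correction named in the footnote. A minimal sketch with base R's p.adjust() on the two rounded p-values from the table (0.091 for age, 0.58 for T stage) shows the arithmetic; it illustrates the method and makes no claim about how add_q() is implemented.

```r
# Rounded p-values copied from the table above
p_values <- c(age = 0.091, stage = 0.58)

# "fdr" is the Benjamini-Hochberg adjustment: with two tests, the smaller
# p-value is multiplied by 2 / 1 and the larger by 2 / 2
q_values <- p.adjust(p_values, method = "fdr")
round(q_values, 2)
```

This gives 0.18 and 0.58, matching the q-value column.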
9.2 Continuous DV
```{r}
mpg |>
  tbl_uvregression(
    method = lm,
    y = hwy,
    include = c(displ, cyl, drv),
    pvalue_fun = label_style_pvalue(digits = 2)
  )
# cf
mpg |>
  lm(hwy ~ displ, data = _) |>
  summary()
mpg |>
  lm(hwy ~ cyl, data = _) |>
  summary()
mpg |>
  lm(hwy ~ drv, data = _) |>
  summary()
```
Characteristic | N | Beta | 95% CI | p-value |
---|---|---|---|---|
displ | 234 | -3.5 | -3.9, -3.1 | <0.001 |
cyl | 234 | -2.8 | -3.1, -2.5 | <0.001 |
drv | 234 | |||
4 | — | — | ||
f | 9.0 | 7.9, 10 | <0.001 | |
r | 1.8 | 0.03, 3.6 | 0.047 | |
Abbreviation: CI = Confidence Interval |
Call:
lm(formula = hwy ~ displ, data = mpg)
Residuals:
Min 1Q Median 3Q Max
-7.1039 -2.1646 -0.2242 2.0589 15.0105
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 35.6977 0.7204 49.55 <2e-16 ***
displ -3.5306 0.1945 -18.15 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.836 on 232 degrees of freedom
Multiple R-squared: 0.5868, Adjusted R-squared: 0.585
F-statistic: 329.5 on 1 and 232 DF, p-value: < 2.2e-16
Call:
lm(formula = hwy ~ cyl, data = mpg)
Residuals:
Min 1Q Median 3Q Max
-8.7579 -2.4968 0.2421 2.4379 15.2421
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.0190 0.9591 41.72 <2e-16 ***
cyl -2.8153 0.1571 -17.92 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.865 on 232 degrees of freedom
Multiple R-squared: 0.5805, Adjusted R-squared: 0.5787
F-statistic: 321.1 on 1 and 232 DF, p-value: < 2.2e-16
Call:
lm(formula = hwy ~ drv, data = mpg)
Residuals:
Min 1Q Median 3Q Max
-11.160 -2.175 -1.000 1.960 15.840
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 19.1748 0.4037 47.501 <2e-16 ***
drvf 8.9856 0.5668 15.852 <2e-16 ***
drvr 1.8252 0.9134 1.998 0.0469 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.097 on 231 degrees of freedom
Multiple R-squared: 0.5307, Adjusted R-squared: 0.5266
F-statistic: 130.6 on 2 and 231 DF, p-value: < 2.2e-16
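The three separate lm() calls above can also be written as a loop, which makes the comparison with the tbl_uvregression() table easier to automate. This is a sketch in base R: reformulate() builds each formula from a predictor name, and the names predictors and uv_fits are chosen for this example only.

```{r}
library(ggplot2) # provides the mpg data set

predictors <- c("displ", "cyl", "drv")

# Fit hwy ~ <predictor> once per predictor
uv_fits <- lapply(predictors, function(x) {
  lm(reformulate(x, response = "hwy"), data = mpg)
})
names(uv_fits) <- predictors

# Point estimates with 95% confidence intervals, one model at a time
lapply(uv_fits, function(fit) cbind(estimate = coef(fit), confint(fit)))
```

The slope estimates and interval bounds should match the Beta and 95% CI columns of the table up to rounding; the intercepts are not shown by tbl_uvregression().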
10 Combining tables in columns or rows
- tbl_merge(): combines tables side by side, in columns
- tbl_stack(): combines tables one above the other, in rows
```{r}
r1 <- mpg |>
  lm(hwy ~ displ + cyl + drv, data = _) |>
  tbl_regression() |>
  add_n() |>
  add_vif() |>
  bold_labels()

r2 <- mpg |>
  lm(cty ~ displ + cyl + drv, data = _) |>
  tbl_regression() |>
  add_n() |>
  add_vif() |>
  bold_labels()

# Side by side, first with default spanners, then with custom ones
tbl_merge(list(r1, r2))
tbl_merge(list(r1, r2), tab_spanner = c("Highway Mileage", "City Mileage"))

# One above the other, first without and then with group headers
tbl_stack(list(r1, r2))
tbl_stack(list(r1, r2), group_header = c("Highway Mileage", "City Mileage"))
```
Characteristic | Table 1 | | | | | | Table 2 | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
| | N | Beta | 95% CI | p-value | GVIF | Adjusted GVIF¹ | N | Beta | 95% CI | p-value | GVIF | Adjusted GVIF¹ |
displ | 234 | -1.1 | -2.0, -0.22 | 0.016 | 9.4 | 3.1 | 234 | -0.74 | -1.4, -0.06 | 0.032 | 9.4 | 3.1 |
cyl | 234 | -1.5 | -2.1, -0.80 | <0.001 | 7.6 | 2.8 | 234 | -1.3 | -1.8, -0.81 | <0.001 | 7.6 | 2.8 |
drv | 234 | | | | 2.0 | 1.2 | 234 | | | | 2.0 | 1.2 |
4 | | — | — | | | | | — | — | | | |
f | | 5.0 | 4.0, 6.1 | <0.001 | | | | 2.5 | 1.7, 3.3 | <0.001 | | |
r | | 4.9 | 3.5, 6.3 | <0.001 | | | | 2.2 | 1.1, 3.2 | <0.001 | | |

Abbreviations: CI = Confidence Interval, GVIF = Generalized Variance Inflation Factor
¹ GVIF^[1/(2*df)]

Characteristic | Highway Mileage | | | | | | City Mileage | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
| | N | Beta | 95% CI | p-value | GVIF | Adjusted GVIF¹ | N | Beta | 95% CI | p-value | GVIF | Adjusted GVIF¹ |
displ | 234 | -1.1 | -2.0, -0.22 | 0.016 | 9.4 | 3.1 | 234 | -0.74 | -1.4, -0.06 | 0.032 | 9.4 | 3.1 |
cyl | 234 | -1.5 | -2.1, -0.80 | <0.001 | 7.6 | 2.8 | 234 | -1.3 | -1.8, -0.81 | <0.001 | 7.6 | 2.8 |
drv | 234 | | | | 2.0 | 1.2 | 234 | | | | 2.0 | 1.2 |
4 | | — | — | | | | | — | — | | | |
f | | 5.0 | 4.0, 6.1 | <0.001 | | | | 2.5 | 1.7, 3.3 | <0.001 | | |
r | | 4.9 | 3.5, 6.3 | <0.001 | | | | 2.2 | 1.1, 3.2 | <0.001 | | |

Abbreviations: CI = Confidence Interval, GVIF = Generalized Variance Inflation Factor
¹ GVIF^[1/(2*df)]

Characteristic | N | Beta | 95% CI | p-value | GVIF | Adjusted GVIF¹ |
---|---|---|---|---|---|---|
displ | 234 | -1.1 | -2.0, -0.22 | 0.016 | 9.4 | 3.1 |
cyl | 234 | -1.5 | -2.1, -0.80 | <0.001 | 7.6 | 2.8 |
drv | 234 | | | | 2.0 | 1.2 |
4 | | — | — | | | |
f | | 5.0 | 4.0, 6.1 | <0.001 | | |
r | | 4.9 | 3.5, 6.3 | <0.001 | | |
displ | 234 | -0.74 | -1.4, -0.06 | 0.032 | 9.4 | 3.1 |
cyl | 234 | -1.3 | -1.8, -0.81 | <0.001 | 7.6 | 2.8 |
drv | 234 | | | | 2.0 | 1.2 |
4 | | — | — | | | |
f | | 2.5 | 1.7, 3.3 | <0.001 | | |
r | | 2.2 | 1.1, 3.2 | <0.001 | | |

Abbreviations: CI = Confidence Interval, GVIF = Generalized Variance Inflation Factor
¹ GVIF^[1/(2*df)]

Characteristic | N | Beta | 95% CI | p-value | GVIF | Adjusted GVIF¹ |
---|---|---|---|---|---|---|
Highway Mileage | | | | | | |
displ | 234 | -1.1 | -2.0, -0.22 | 0.016 | 9.4 | 3.1 |
cyl | 234 | -1.5 | -2.1, -0.80 | <0.001 | 7.6 | 2.8 |
drv | 234 | | | | 2.0 | 1.2 |
4 | | — | — | | | |
f | | 5.0 | 4.0, 6.1 | <0.001 | | |
r | | 4.9 | 3.5, 6.3 | <0.001 | | |
City Mileage | | | | | | |
displ | 234 | -0.74 | -1.4, -0.06 | 0.032 | 9.4 | 3.1 |
cyl | 234 | -1.3 | -1.8, -0.81 | <0.001 | 7.6 | 2.8 |
drv | 234 | | | | 2.0 | 1.2 |
4 | | — | — | | | |
f | | 2.5 | 1.7, 3.3 | <0.001 | | |
r | | 2.2 | 1.1, 3.2 | <0.001 | | |

Abbreviations: CI = Confidence Interval, GVIF = Generalized Variance Inflation Factor
¹ GVIF^[1/(2*df)]
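A common use of tbl_merge() is to put univariable and multivariable results for the same outcome next to each other. The sketch below does this with the trial data from the earlier sections; the choice of age and grade as predictors and the names uv_tbl, mv_tbl, and uv_mv_tbl are assumptions made for this illustration.

```{r}
library(gtsummary)

# Univariable models: one logistic regression per predictor
uv_tbl <- trial |>
  tbl_uvregression(
    method = glm,
    y = response,
    include = c(age, grade),
    method.args = list(family = binomial),
    exponentiate = TRUE
  )

# Multivariable model: both predictors in one logistic regression
mv_tbl <- glm(response ~ age + grade, data = trial, family = binomial) |>
  tbl_regression(exponentiate = TRUE)

# Columns of the two tables are placed side by side under their own spanners
uv_mv_tbl <- tbl_merge(
  list(uv_tbl, mv_tbl),
  tab_spanner = c("Univariable", "Multivariable")
)
uv_mv_tbl
```

Rows are matched by variable, so both tables should describe the same predictors for the merged layout to line up.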
11 Themes
```{r}
# With the default theme
mpg |>
  lm(hwy ~ displ + cyl + drv, data = _) |>
  tbl_regression() |>
  add_n() |>
  add_vif() |>
  bold_labels()

# Journal of the American Medical Association (JAMA) theme
theme_gtsummary_journal(journal = "jama")
mpg |>
  lm(hwy ~ displ + cyl + drv, data = _) |>
  tbl_regression() |>
  add_n() |>
  add_vif() |>
  bold_labels()
reset_gtsummary_theme()

# The Quarterly Journal of Economics theme
theme_gtsummary_journal(journal = "qjecon")
mpg |>
  lm(hwy ~ displ + cyl + drv, data = _) |>
  tbl_regression() |>
  add_n() |>
  add_vif() |>
  bold_labels()
# Restore the default so later tables are not affected
reset_gtsummary_theme()
```
Characteristic | N | Beta | 95% CI | p-value | GVIF | Adjusted GVIF¹ |
---|---|---|---|---|---|---|
displ | 234 | -1.1 | -2.0, -0.22 | 0.016 | 9.4 | 3.1 |
cyl | 234 | -1.5 | -2.1, -0.80 | <0.001 | 7.6 | 2.8 |
drv | 234 | | | | 2.0 | 1.2 |
4 | | — | — | | | |
f | | 5.0 | 4.0, 6.1 | <0.001 | | |
r | | 4.9 | 3.5, 6.3 | <0.001 | | |

Abbreviations: CI = Confidence Interval, GVIF = Generalized Variance Inflation Factor
¹ GVIF^[1/(2*df)]

Characteristic | N | Beta (95% CI) | p-value | GVIF | Adjusted GVIF¹ |
---|---|---|---|---|---|
displ | 234 | -1.1 (-2.0 to -0.22) | 0.016 | 9.4 | 3.1 |
cyl | 234 | -1.5 (-2.1 to -0.80) | <0.001 | 7.6 | 2.8 |
drv | 234 | | | 2.0 | 1.2 |
4 | | — | | | |
f | | 5.0 (4.0 to 6.1) | <0.001 | | |
r | | 4.9 (3.5 to 6.3) | <0.001 | | |

Abbreviations: CI = Confidence Interval, GVIF = Generalized Variance Inflation Factor
¹ GVIF^[1/(2*df)]

Characteristic | N | Beta (SE)¹ | GVIF | Adjusted GVIF² |
---|---|---|---|---|
displ | 234 | -1.1* (0.461) | 9.4 | 3.1 |
cyl | 234 | -1.5*** (0.333) | 7.6 | 2.8 |
drv | 234 | | 2.0 | 1.2 |
4 | | — | | |
f | | 5.0*** (0.513) | | |
r | | 4.9*** (0.712) | | |

Abbreviations: GVIF = Generalized Variance Inflation Factor, SE = Standard Error
¹ *p<0.05; **p<0.01; ***p<0.001
² GVIF^[1/(2*df)]
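Because a theme stays active until reset_gtsummary_theme() is called, it is easy to leave one switched on by accident. One way to avoid this is to scope a theme to a single table. The sketch below assumes a gtsummary version that provides with_gtsummary_theme() and the set_theme argument of theme_gtsummary_journal(); the names fit and jama_tbl are chosen for this example.

```{r}
library(gtsummary)
library(ggplot2) # provides the mpg data set

fit <- lm(hwy ~ displ + cyl + drv, data = mpg)

# set_theme = FALSE returns the theme as a list instead of activating it;
# with_gtsummary_theme() applies it only while the expression is evaluated
jama_tbl <- with_gtsummary_theme(
  theme_gtsummary_journal(journal = "jama", set_theme = FALSE),
  tbl_regression(fit) |>
    add_n() |>
    bold_labels()
)
jama_tbl
```

Tables built after this call use whatever theme was active before it, so no explicit reset is needed.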
12 References
- RStudio Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
- R4DS Book: https://r4ds.had.co.nz/data-visualisation.html
- gtsummary’s cross tab: https://www.danieldsjoberg.com/gtsummary/articles/tbl_summary.html
- gtsummary’s regression table: https://www.danieldsjoberg.com/gtsummary/articles/tbl_regression.html
- Daniel Sjoberg’s presentation: https://www.danieldsjoberg.com/gtsummary-weill-cornell-presentation/