1 Loading Packages

```{r loading up packages}
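library(tidyverse) # meta-package that attaches the core tidyverse packages
library(gt)        # general-purpose table formatting
library(gtsummary) # summary and regression tables; provides the trial data
library(sjPlot)    # provides view_df()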
```

2 gtsummary package

Tip

gtsummary: https://www.danieldsjoberg.com/gtsummary/articles/tbl_summary.html

3 Experimental Design:

Trial data

trial: data frame with 200 rows, one row per patient

trt: Chemotherapy Treatment
age: Age
marker: Marker Level (ng/mL): the concentration of a specific protein or substance, measured in nanograms per milliliter (ng/mL), that can be detected in a blood sample and may indicate the presence or progression of cancer.
stage: T Stage: The T stage in cancer describes the size and extent of the primary tumor. It’s part of the TNM staging system, which is the most common way to stage cancer.

| T stage | Meaning |
|---|---|
| T0 | No evidence of a tumor |
| T1 | A small tumor |
| T2 | A larger tumor that has grown into nearby tissue |
| T3 | A larger tumor that has grown into nearby tissue |
| T4 | A larger or more advanced tumor that has grown into nearby tissue |
| TX | The tumor cannot be measured |
| Tis | The tumor is still within the confines of the normal glands and cannot metastasize |

grade: Grade: describes how abnormal the cancer cells look under a microscope compared to normal cells, with higher grades indicating more rapid growth and a greater likelihood of spread.
response: Tumor Response
death: Patient Died
ttdeath: Months to Death/Censor

3.1 `tbl_summary()`

Designed for Descriptive Statistics

```{r}
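# data()            # lists the datasets available in the loaded packages
trial               # print the data
class(trial)        # a tibble, which is also a data frame
glimpse(trial)      # column types and first values
view_df(trial)      # sjPlot: variable names, labels, and value ranges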

trial2 <- trial |> 
  select(trt, age, grade, response)

trial2 |> 
  tbl_summary()
```

# A tibble: 200 × 8
   trt      age marker stage grade response death ttdeath
   <chr>  <dbl>  <dbl> <fct> <fct>    <int> <int>   <dbl>
 1 Drug A    23  0.16  T1    II           0     0    24  
 2 Drug B     9  1.11  T2    I            1     0    24  
 3 Drug A    31  0.277 T1    II           0     0    24  
 4 Drug A    NA  2.07  T3    III          1     1    17.6
 5 Drug A    51  2.77  T4    III          1     1    16.4
 6 Drug B    39  0.613 T4    I            0     1    15.6
 7 Drug A    37  0.354 T1    II           0     0    24  
 8 Drug A    32  1.74  T1    I            0     1    18.4
 9 Drug A    31  0.144 T1    II           0     0    24  
10 Drug B    34  0.205 T3    I            0     1    10.5
# ℹ 190 more rows
[1] "tbl_df"     "tbl"        "data.frame"
Rows: 200
Columns: 8
$ trt      <chr> "Drug A", "Drug B", "Drug A", "Drug A", "Drug A", "Drug B", "…
$ age      <dbl> 23, 9, 31, NA, 51, 39, 37, 32, 31, 34, 42, 63, 54, 21, 48, 71…
$ marker   <dbl> 0.160, 1.107, 0.277, 2.067, 2.767, 0.613, 0.354, 1.739, 0.144…
$ stage    <fct> T1, T2, T1, T3, T4, T4, T1, T1, T1, T3, T1, T3, T4, T4, T1, T…
$ grade    <fct> II, I, II, III, III, I, II, I, II, I, III, I, III, I, I, III,…
$ response <int> 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0…
$ death    <int> 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0…
$ ttdeath  <dbl> 24.00, 24.00, 24.00, 17.64, 16.43, 15.64, 24.00, 18.43, 24.00…

Data frame: trial
ID	Name	Label	Values	Value Labels
1	trt	Chemotherapy Treatment		<output omitted>
2	age	Age	range: 6-83
3	marker	Marker Level (ng/mL)	range: 0.0-3.9
4	stage	T Stage		T1 T2 T3 T4
5	grade	Grade		I II III
6	response	Tumor Response	range: 0-1
7	death	Patient Died	range: 0-1
8	ttdeath	Months to Death/Censor	range: 3.5-24.0

Characteristic	N = 200¹
Chemotherapy Treatment
Drug A	98 (49%)
Drug B	102 (51%)
Age	47 (38, 57)
Unknown	11
Grade
I	68 (34%)
II	68 (34%)
III	64 (32%)
Tumor Response	61 (32%)
Unknown	7
¹ n (%); Median (Q1, Q3)
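The defaults shown in the table above can be cross-checked with base R. This is a minimal sketch, assuming only that gtsummary is installed so that `trial` is available; the object names (`age_median`, `age_missing`, `n_resp`, `pct_resp`) are illustrative and not part of any package.

```{r}
library(gtsummary)  # provides the trial data

# Continuous variable: median of the non-missing values,
# with missing values counted separately ("Unknown" row)
age_median  <- median(trial$age, na.rm = TRUE)
age_missing <- sum(is.na(trial$age))

# Dichotomous 0/1 variable: count and percentage of 1s among non-missing rows
n_resp   <- sum(trial$response, na.rm = TRUE)
pct_resp <- 100 * mean(trial$response, na.rm = TRUE)

c(age_median = age_median, age_missing = age_missing,
  n_resp = n_resp, pct_resp = round(pct_resp))
```

Note that the percentage for Tumor Response is computed over the non-missing rows, which is why the 7 unknown values are reported on their own line.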

4 Differences between groups

4.1 Wilcoxon Rank sum test

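By default, `tbl_summary()` summarizes continuous variables with the median (Q1, Q3), and `add_p()` compares the two treatment groups with the Wilcoxon rank sum test.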

```{r}
trial |> 
  select(trt, age, marker, ttdeath) |> 
  tbl_summary(by = trt) |> 
  add_p() 

# customize: two-digit p-values plus an overall column
trial |> 
  select(trt, age, marker, ttdeath) |>
  tbl_summary(by = trt) |> 
  add_p(
    pvalue_fun = label_style_pvalue(digits = 2)
    ) |> 
  add_overall() # add overall statistics
```

Characteristic	Drug A N = 98¹	Drug B N = 102¹	p-value²
Age	46 (37, 60)	48 (39, 56)	0.7
Unknown	7	4
Marker Level (ng/mL)	0.84 (0.23, 1.60)	0.52 (0.18, 1.21)	0.085
Unknown	6	4
Months to Death/Censor	23.5 (17.4, 24.0)	21.2 (14.5, 24.0)	0.14
¹ Median (Q1, Q3)
² Wilcoxon rank sum test

Characteristic	Overall N = 200¹	Drug A N = 98¹	Drug B N = 102¹	p-value²
Age	47 (38, 57)	46 (37, 60)	48 (39, 56)	0.72
Unknown	11	7	4
Marker Level (ng/mL)	0.64 (0.22, 1.41)	0.84 (0.23, 1.60)	0.52 (0.18, 1.21)	0.085
Unknown	10	6	4
Months to Death/Censor	22.4 (15.9, 24.0)	23.5 (17.4, 24.0)	21.2 (14.5, 24.0)	0.14
¹ Median (Q1, Q3)
² Wilcoxon rank sum test
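To see where the p-value for Age comes from, the same test can be run directly with base R. This is a sketch under the assumption that `add_p()` calls `wilcox.test()` with its default settings, which is consistent with the footnote above; `w_age` is an illustrative name.

```{r}
library(gtsummary)  # provides the trial data

# Wilcoxon rank sum test of age between the two treatment arms;
# rows with missing age are dropped by the formula interface
w_age <- wilcox.test(age ~ trt, data = trial)
w_age$p.value
```

The result should agree with the Age row of the tables above once rounded to two digits.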

4.2 Welch Two Sample t-test

```{r}
# be careful: showing mean (SD) does not change the test used by add_p()
trial |> 
  select(trt, age, marker, ttdeath) |> 
  tbl_summary(
    by = trt,
    statistic = list(
      c(age, marker, ttdeath) ~ "{mean} ({sd})"),
    missing = "no"
  ) |> 
  add_p() # uses Wilcoxon rank sum test

trial |> 
  select(trt, age, marker, ttdeath) |> 
  tbl_summary(
    by = trt,
    statistic = list(
      c(age, marker, ttdeath) ~ "{mean} ({sd})"),
    missing = "no"
  ) |> 
  add_difference(
    pvalue_fun = label_style_pvalue(digits = 2)
    ) |> # Welch two-sample t-test
  #modify_caption("**Table 1. Effectiveness of Drugs**") |> 
  as_gt() |> 
  tab_header(title = md("**Table 1. Effectiveness of Drugs**"),
             subtitle = "Patient Characteristics")
```

Characteristic	Drug A N = 98¹	Drug B N = 102¹	p-value²
Age	47 (15)	47 (14)	0.7
Marker Level (ng/mL)	1.02 (0.89)	0.82 (0.83)	0.085
Months to Death/Censor	20.2 (5.0)	19.0 (5.5)	0.14
¹ Mean (SD)
² Wilcoxon rank sum test
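The table above still reports a Wilcoxon test even though it displays means. As an alternative to `add_difference()`, the test can be requested explicitly through the `test` argument of `add_p()`. A minimal sketch; `tbl_t` is an illustrative name.

```{r}
library(gtsummary)

tbl_t <- trial |>
  tbl_summary(
    by = trt,
    include = c(age, marker, ttdeath),
    statistic = all_continuous() ~ "{mean} ({sd})",
    missing = "no"
  ) |>
  # request a t-test for every continuous variable instead of the default
  add_p(
    test = all_continuous() ~ "t.test",
    pvalue_fun = label_style_pvalue(digits = 2)
  )

tbl_t
```

With this call the footnote should name a t-test, and the displayed statistic and the test then describe the same quantity (the mean).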

4.3 Chi-square with `tbl_summary`

```{r}
trial |> 
  select(trt, stage, grade) |> 
  tbl_summary(by = trt) |> 
  add_p(pvalue_fun = label_style_pvalue(digits = 2)) |> 
  add_overall()
```

Characteristic	Overall N = 200¹	Drug A N = 98¹	Drug B N = 102¹	p-value²
T Stage				0.87
T1	53 (27%)	28 (29%)	25 (25%)
T2	54 (27%)	25 (26%)	29 (28%)
T3	43 (22%)	22 (22%)	21 (21%)
T4	50 (25%)	23 (23%)	27 (26%)
Grade				0.87
I	68 (34%)	35 (36%)	33 (32%)
II	68 (34%)	32 (33%)	36 (35%)
III	64 (32%)	31 (32%)	33 (32%)
¹ n (%)
² Pearson’s Chi-squared test
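The Grade p-value can be reproduced from the underlying contingency table. A minimal base R sketch, assuming gtsummary is installed for the `trial` data; `grade_by_trt` and `chi_grade` are illustrative names.

```{r}
library(gtsummary)  # provides the trial data

# 2 x 3 table of treatment by grade (no missing values in either variable)
grade_by_trt <- table(trial$trt, trial$grade)
grade_by_trt

# Pearson's chi-squared test; the continuity correction only applies to 2 x 2 tables
chi_grade <- chisq.test(grade_by_trt)
chi_grade$p.value
```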

4.4 Customizing `tbl_summary()` output

```{r}
trial2 |> 
  tbl_summary(
    by = trt,
    type = age ~ "continuous2",
    statistic = 
      list(age ~ c("{mean} ({sd})", "{min}, {max}"),
           response ~ "{n}/{N} ({p}%)"),
    label = grade ~ "Pathological Tumor Grade",
    digits = age ~ 2
    ) |> 
  add_p(pvalue_fun = label_style_pvalue(digits = 2)) |> 
  add_q(method = "bonferroni") # p-values adjusted for multiple comparisons
```

Characteristic	Drug A N = 98¹	Drug B N = 102¹	p-value²	q-value³
Age			0.72	>0.99
Mean (SD)	47.01 (14.71)	47.45 (14.01)
Min, Max	6.00, 78.00	9.00, 83.00
Unknown	7	4
Pathological Tumor Grade			0.87	>0.99
I	35 (36%)	33 (32%)
II	32 (33%)	36 (35%)
III	31 (32%)	33 (32%)
Tumor Response	28/95 (29%)	33/98 (34%)	0.53	>0.99
Unknown	3	4
¹ n (%); n/N (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test
³ Bonferroni correction for multiple testing
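All three q-values are displayed as ">0.99" because the Bonferroni correction multiplies each p-value by the number of tests and caps the result at 1. A small sketch using the rounded p-values from the table above (so the inputs are approximate); `p_raw` and `q_bonf` are illustrative names.

```{r}
# rounded p-values copied from the table above
p_raw <- c(age = 0.72, grade = 0.87, response = 0.53)

# Bonferroni: min(1, p * number_of_tests)
q_bonf <- p.adjust(p_raw, method = "bonferroni")
q_bonf
```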

4.4.1 `as_gt()`

`as_gt()` converts a gtsummary table into a gt object, so that gt functions such as `tab_header()` can then be applied to it.

```{r}
trial |> 
  select(trt, marker, response) |> 
  tbl_summary(by = trt)

trial |> 
  select(trt, marker, response) |> 
  tbl_summary(
    by = trt,
    statistic = list(
      marker ~ "{mean} ({sd})",
      response ~ "{p}%"),
    missing = "no"
  ) |> 
  add_difference(
    pvalue_fun = label_style_pvalue(digits = 2)
  ) |> 
  #modify_caption("**Table 1. Effectiveness of Drugs**") |> 
  as_gt() |> 
  tab_header(title = md("**Table 1. Effectiveness of Drugs**"),
             subtitle = "Patient Characteristics")
```

Characteristic	Drug A N = 98¹	Drug B N = 102¹
Marker Level (ng/mL)	0.84 (0.23, 1.60)	0.52 (0.18, 1.21)
Unknown	6	4
Tumor Response	28 (29%)	33 (34%)
Unknown	3	4
¹ Median (Q1, Q3); n (%)
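The conversion step can be made visible by checking the class before and after `as_gt()`. A minimal sketch; `tbl` and `gt_tbl` are illustrative names, and the source note text is only an example.

```{r}
library(gtsummary)
library(gt)

tbl <- trial |>
  tbl_summary(by = trt, include = c(marker, response))
class(tbl)     # a gtsummary object: gtsummary functions apply

gt_tbl <- tbl |>
  as_gt() |>   # from here on, only gt functions apply
  tab_source_note(source_note = "Data: gtsummary::trial")
class(gt_tbl)  # a gt object
```

Because gtsummary functions expect a gtsummary object, it is usually simplest to finish all gtsummary modifications first and call `as_gt()` last.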

4.5 Chi-Square with `tbl_cross()`

Cross-tabulation with trial data

```{r}
trial |> 
  tbl_cross(
    row = trt,
    col = stage,
    percent = "column"
  ) |> 
  add_p() |> 
  bold_labels()

# compare: the same test with tbl_summary()

trial |> 
  select(trt, stage) |> 
  tbl_summary(by = stage) |> 
  add_p() |> 
  bold_labels()
```

	T Stage				Total	p-value¹
	T1	T2	T3	T4	Total	p-value¹
Chemotherapy Treatment						0.9
Drug A	28 (53%)	25 (46%)	22 (51%)	23 (46%)	98 (49%)
Drug B	25 (47%)	29 (54%)	21 (49%)	27 (54%)	102 (51%)
Total	53 (100%)	54 (100%)	43 (100%)	50 (100%)	200 (100%)
¹ Pearson’s Chi-squared test

Characteristic	T1 N = 53¹	T2 N = 54¹	T3 N = 43¹	T4 N = 50¹	p-value²
Chemotherapy Treatment					0.9
Drug A	28 (53%)	25 (46%)	22 (51%)	23 (46%)
Drug B	25 (47%)	29 (54%)	21 (49%)	27 (54%)
¹ n (%)
² Pearson’s Chi-squared test

Tip

A chi-square test can be done with either `tbl_summary()` or `tbl_cross()`, but the latter produces a more attractive table.
The former is better when the table includes multiple types of statistical tests.
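The column percentages requested with `percent = "column"` can be checked with base R. A minimal sketch, assuming gtsummary is installed for the `trial` data; `counts` and `col_pct` are illustrative names.

```{r}
library(gtsummary)  # provides the trial data

# cell counts: treatment (rows) by T stage (columns)
counts <- table(trial$trt, trial$stage)

# margin = 2 divides each cell by its column total
col_pct <- round(100 * prop.table(counts, margin = 2))
col_pct
```

Each column sums to 100%, so the percentages answer "within a given T stage, what share received each drug?".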

5 Survey Data

5.1 `tbl_summary()`

```{r}
gss_cat
class(gss_cat) # builtin data from forcats package
view_df(gss_cat)

glimpse(gss_cat)
gss_cat |> 
  tbl_summary()

# by race

gss_cat |> 
  #count(race)
  mutate(race = fct_drop(race)) |> 
  tbl_summary(by = race) 
```

# A tibble: 21,483 × 9
    year marital         age race  rincome        partyid    relig denom tvhours
   <int> <fct>         <int> <fct> <fct>          <fct>      <fct> <fct>   <int>
 1  2000 Never married    26 White $8000 to 9999  Ind,near … Prot… Sout…      12
 2  2000 Divorced         48 White $8000 to 9999  Not str r… Prot… Bapt…      NA
 3  2000 Widowed          67 White Not applicable Independe… Prot… No d…       2
 4  2000 Never married    39 White Not applicable Ind,near … Orth… Not …       4
 5  2000 Divorced         25 White Not applicable Not str d… None  Not …       1
 6  2000 Married          25 White $20000 - 24999 Strong de… Prot… Sout…      NA
 7  2000 Never married    36 White $25000 or more Not str r… Chri… Not …       3
 8  2000 Divorced         44 White $7000 to 7999  Ind,near … Prot… Luth…      NA
 9  2000 Married          44 White $25000 or more Not str d… Prot… Other       0
10  2000 Married          47 White $25000 or more Strong re… Prot… Sout…       3
# ℹ 21,473 more rows
[1] "tbl_df"     "tbl"        "data.frame"

Data frame: gss_cat
ID	Name	Label	Values	Value Labels
1	year		range: 2000-2014
2	marital			No answer Never married Separated Divorced Widowed Married
3	age		range: 18-89
4	race			Other Black White Not applicable
5	rincome			No answer Don't know Refused $25000 or more $20000 - 24999 $15000 - 19999 $10000 - 14999 $8000 to 9999 $7000 to 7999 $6000 to 6999 $5000 to 5999 $4000 to 4999 $3000 to 3999 $1000 to 2999 Lt $1000 <... truncated>
6	partyid			No answer Don't know Other party Strong republican Not str republican Ind,near rep Independent Ind,near dem Not str democrat Strong democrat
7	relig			No answer Don't know Inter-nondenominational Native american Christian Orthodox-christian Moslem/islam Other eastern Hinduism Buddhism Other None Jewish Catholic Protestant <... truncated>
8	denom			No answer Don't know No denomination Other Episcopal Presbyterian-dk wh Presbyterian, merged Other presbyterian United pres ch in us Presbyterian c in us Lutheran-dk which Evangelical luth Other lutheran Wi evan luth synod Lutheran-mo synod <... truncated>
9	tvhours		range: 0-24

Rows: 21,483
Columns: 9
$ year    <int> 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 20…
$ marital <fct> Never married, Divorced, Widowed, Never married, Divorced, Mar…
$ age     <int> 26, 48, 67, 39, 25, 25, 36, 44, 44, 47, 53, 52, 52, 51, 52, 40…
$ race    <fct> White, White, White, White, White, White, White, White, White,…
$ rincome <fct> $8000 to 9999, $8000 to 9999, Not applicable, Not applicable, …
$ partyid <fct> "Ind,near rep", "Not str republican", "Independent", "Ind,near…
$ relig   <fct> Protestant, Protestant, Protestant, Orthodox-christian, None, …
$ denom   <fct> "Southern baptist", "Baptist-dk which", "No denomination", "No…
$ tvhours <int> 12, NA, 2, 4, 1, NA, 3, NA, 0, 3, 2, NA, 1, NA, 1, 7, NA, 3, 3…

Characteristic	N = 21,483¹
year
2000	2,817 (13%)
2002	2,765 (13%)
2004	2,812 (13%)
2006	4,510 (21%)
2008	2,023 (9.4%)
2010	2,044 (9.5%)
2012	1,974 (9.2%)
2014	2,538 (12%)
marital
No answer	17 (<0.1%)
Never married	5,416 (25%)
Separated	743 (3.5%)
Divorced	3,383 (16%)
Widowed	1,807 (8.4%)
Married	10,117 (47%)
age	46 (33, 59)
Unknown	76
race
Other	1,959 (9.1%)
Black	3,129 (15%)
White	16,395 (76%)
Not applicable	0 (0%)
rincome
No answer	183 (0.9%)
Don't know	267 (1.2%)
Refused	975 (4.5%)
$25000 or more	7,363 (34%)
$20000 - 24999	1,283 (6.0%)
$15000 - 19999	1,048 (4.9%)
$10000 - 14999	1,168 (5.4%)
$8000 to 9999	340 (1.6%)
$7000 to 7999	188 (0.9%)
$6000 to 6999	215 (1.0%)
$5000 to 5999	227 (1.1%)
$4000 to 4999	226 (1.1%)
$3000 to 3999	276 (1.3%)
$1000 to 2999	395 (1.8%)
Lt $1000	286 (1.3%)
Not applicable	7,043 (33%)
partyid
No answer	154 (0.7%)
Don't know	1 (<0.1%)
Other party	393 (1.8%)
Strong republican	2,314 (11%)
Not str republican	3,032 (14%)
Ind,near rep	1,791 (8.3%)
Independent	4,119 (19%)
Ind,near dem	2,499 (12%)
Not str democrat	3,690 (17%)
Strong democrat	3,490 (16%)
relig
No answer	93 (0.4%)
Don't know	15 (<0.1%)
Inter-nondenominational	109 (0.5%)
Native american	23 (0.1%)
Christian	689 (3.2%)
Orthodox-christian	95 (0.4%)
Moslem/islam	104 (0.5%)
Other eastern	32 (0.1%)
Hinduism	71 (0.3%)
Buddhism	147 (0.7%)
Other	224 (1.0%)
None	3,523 (16%)
Jewish	388 (1.8%)
Catholic	5,124 (24%)
Protestant	10,846 (50%)
Not applicable	0 (0%)
denom
No answer	117 (0.5%)
Don't know	52 (0.2%)
No denomination	1,683 (7.8%)
Other	2,534 (12%)
Episcopal	397 (1.8%)
Presbyterian-dk wh	244 (1.1%)
Presbyterian, merged	67 (0.3%)
Other presbyterian	47 (0.2%)
United pres ch in us	110 (0.5%)
Presbyterian c in us	104 (0.5%)
Lutheran-dk which	267 (1.2%)
Evangelical luth	122 (0.6%)
Other lutheran	30 (0.1%)
Wi evan luth synod	71 (0.3%)
Lutheran-mo synod	212 (1.0%)
Luth ch in america	71 (0.3%)
Am lutheran	146 (0.7%)
Methodist-dk which	239 (1.1%)
Other methodist	33 (0.2%)
United methodist	1,067 (5.0%)
Afr meth ep zion	32 (0.1%)
Afr meth episcopal	77 (0.4%)
Baptist-dk which	1,457 (6.8%)
Other baptists	213 (1.0%)
Southern baptist	1,536 (7.1%)
Nat bapt conv usa	40 (0.2%)
Nat bapt conv of am	76 (0.4%)
Am bapt ch in usa	130 (0.6%)
Am baptist asso	237 (1.1%)
Not applicable	10,072 (47%)
tvhours	2 (1, 4)
Unknown	10,146
¹ n (%); Median (Q1, Q3)

Characteristic	Other N = 1,959¹	Black N = 3,129¹	White N = 16,395¹
year
2000	175 (8.9%)	429 (14%)	2,213 (13%)
2002	167 (8.5%)	410 (13%)	2,188 (13%)
2004	201 (10%)	377 (12%)	2,234 (14%)
2006	592 (30%)	634 (20%)	3,284 (20%)
2008	183 (9.3%)	281 (9.0%)	1,559 (9.5%)
2010	183 (9.3%)	311 (9.9%)	1,550 (9.5%)
2012	196 (10%)	301 (9.6%)	1,477 (9.0%)
2014	262 (13%)	386 (12%)	1,890 (12%)
marital
No answer	2 (0.1%)	2 (<0.1%)	13 (<0.1%)
Never married	633 (32%)	1,305 (42%)	3,478 (21%)
Separated	110 (5.6%)	196 (6.3%)	437 (2.7%)
Divorced	212 (11%)	495 (16%)	2,676 (16%)
Widowed	70 (3.6%)	262 (8.4%)	1,475 (9.0%)
Married	932 (48%)	869 (28%)	8,316 (51%)
age	37 (29, 48)	42 (31, 55)	48 (35, 61)
Unknown	8	14	54
rincome
No answer	14 (0.7%)	35 (1.1%)	134 (0.8%)
Don't know	45 (2.3%)	45 (1.4%)	177 (1.1%)
Refused	92 (4.7%)	150 (4.8%)	733 (4.5%)
$25000 or more	621 (32%)	886 (28%)	5,856 (36%)
$20000 - 24999	112 (5.7%)	220 (7.0%)	951 (5.8%)
$15000 - 19999	134 (6.8%)	180 (5.8%)	734 (4.5%)
$10000 - 14999	126 (6.4%)	210 (6.7%)	832 (5.1%)
$8000 to 9999	41 (2.1%)	56 (1.8%)	243 (1.5%)
$7000 to 7999	24 (1.2%)	27 (0.9%)	137 (0.8%)
$6000 to 6999	26 (1.3%)	35 (1.1%)	154 (0.9%)
$5000 to 5999	27 (1.4%)	40 (1.3%)	160 (1.0%)
$4000 to 4999	34 (1.7%)	38 (1.2%)	154 (0.9%)
$3000 to 3999	35 (1.8%)	59 (1.9%)	182 (1.1%)
$1000 to 2999	47 (2.4%)	71 (2.3%)	277 (1.7%)
Lt $1000	36 (1.8%)	51 (1.6%)	199 (1.2%)
Not applicable	545 (28%)	1,026 (33%)	5,472 (33%)
partyid
No answer	25 (1.3%)	36 (1.2%)	93 (0.6%)
Don't know	0 (0%)	0 (0%)	1 (<0.1%)
Other party	22 (1.1%)	22 (0.7%)	349 (2.1%)
Strong republican	81 (4.1%)	56 (1.8%)	2,177 (13%)
Not str republican	156 (8.0%)	88 (2.8%)	2,788 (17%)
Ind,near rep	118 (6.0%)	92 (2.9%)	1,581 (9.6%)
Independent	612 (31%)	491 (16%)	3,016 (18%)
Ind,near dem	285 (15%)	352 (11%)	1,862 (11%)
Not str democrat	437 (22%)	746 (24%)	2,507 (15%)
Strong democrat	223 (11%)	1,246 (40%)	2,021 (12%)
relig
No answer	14 (0.7%)	16 (0.5%)	63 (0.4%)
Don't know	3 (0.2%)	3 (<0.1%)	9 (<0.1%)
Inter-nondenominational	2 (0.1%)	29 (0.9%)	78 (0.5%)
Native american	16 (0.8%)	0 (0%)	7 (<0.1%)
Christian	74 (3.8%)	141 (4.5%)	474 (2.9%)
Orthodox-christian	1 (<0.1%)	2 (<0.1%)	92 (0.6%)
Moslem/islam	42 (2.1%)	35 (1.1%)	27 (0.2%)
Other eastern	10 (0.5%)	2 (<0.1%)	20 (0.1%)
Hinduism	62 (3.2%)	1 (<0.1%)	8 (<0.1%)
Buddhism	72 (3.7%)	10 (0.3%)	65 (0.4%)
Other	29 (1.5%)	18 (0.6%)	177 (1.1%)
None	323 (16%)	384 (12%)	2,816 (17%)
Jewish	8 (0.4%)	10 (0.3%)	370 (2.3%)
Catholic	916 (47%)	207 (6.6%)	4,001 (24%)
Protestant	387 (20%)	2,271 (73%)	8,188 (50%)
Not applicable	0 (0%)	0 (0%)	0 (0%)
denom
No answer	14 (0.7%)	17 (0.5%)	86 (0.5%)
Don't know	6 (0.3%)	15 (0.5%)	31 (0.2%)
No denomination	99 (5.1%)	240 (7.7%)	1,344 (8.2%)
Other	180 (9.2%)	468 (15%)	1,886 (12%)
Episcopal	9 (0.5%)	38 (1.2%)	350 (2.1%)
Presbyterian-dk wh	15 (0.8%)	8 (0.3%)	221 (1.3%)
Presbyterian, merged	1 (<0.1%)	2 (<0.1%)	64 (0.4%)
Other presbyterian	2 (0.1%)	2 (<0.1%)	43 (0.3%)
United pres ch in us	2 (0.1%)	6 (0.2%)	102 (0.6%)
Presbyterian c in us	6 (0.3%)	5 (0.2%)	93 (0.6%)
Lutheran-dk which	6 (0.3%)	6 (0.2%)	255 (1.6%)
Evangelical luth	1 (<0.1%)	2 (<0.1%)	119 (0.7%)
Other lutheran	0 (0%)	0 (0%)	30 (0.2%)
Wi evan luth synod	0 (0%)	1 (<0.1%)	70 (0.4%)
Lutheran-mo synod	2 (0.1%)	2 (<0.1%)	208 (1.3%)
Luth ch in america	2 (0.1%)	2 (<0.1%)	67 (0.4%)
Am lutheran	3 (0.2%)	5 (0.2%)	138 (0.8%)
Methodist-dk which	3 (0.2%)	35 (1.1%)	201 (1.2%)
Other methodist	2 (0.1%)	5 (0.2%)	26 (0.2%)
United methodist	11 (0.6%)	49 (1.6%)	1,007 (6.1%)
Afr meth ep zion	0 (0%)	31 (1.0%)	1 (<0.1%)
Afr meth episcopal	0 (0%)	76 (2.4%)	1 (<0.1%)
Baptist-dk which	37 (1.9%)	697 (22%)	723 (4.4%)
Other baptists	5 (0.3%)	47 (1.5%)	161 (1.0%)
Southern baptist	30 (1.5%)	355 (11%)	1,151 (7.0%)
Nat bapt conv usa	1 (<0.1%)	35 (1.1%)	4 (<0.1%)
Nat bapt conv of am	1 (<0.1%)	58 (1.9%)	17 (0.1%)
Am bapt ch in usa	4 (0.2%)	76 (2.4%)	50 (0.3%)
Am baptist asso	6 (0.3%)	97 (3.1%)	134 (0.8%)
Not applicable	1,511 (77%)	749 (24%)	7,812 (48%)
tvhours	2 (1, 4)	3 (2, 5)	2 (1, 4)
Unknown	932	1,429	7,785
¹ n (%); Median (Q1, Q3)

5.2 `tbl_cross()`

```{r}
levels(gss_cat$race)
unique(gss_cat$race)

gss_cat_md <- gss_cat |> 
  mutate(race = fct_drop(race)) 

levels(gss_cat_md$race)
levels(gss_cat_md$marital)

## cross-tab
gss_cat_md |> 
  filter(marital != "No answer") |> # reduce the number of cells
  mutate(marital = fct_drop(marital)) |> 
  tbl_cross(
    row = marital,
    col = race,
    percent = "column",
    missing = "no"
  ) |> 
  add_p() |> 
  bold_labels()
```

[1] "Other"          "Black"          "White"          "Not applicable"
[1] White Black Other
Levels: Other Black White Not applicable
[1] "Other" "Black" "White"
[1] "No answer"     "Never married" "Separated"     "Divorced"     
[5] "Widowed"       "Married"

	race			Total	p-value¹
	Other	Black	White	Total	p-value¹
marital					<0.001
Never married	633 (32%)	1,305 (42%)	3,478 (21%)	5,416 (25%)
Separated	110 (5.6%)	196 (6.3%)	437 (2.7%)	743 (3.5%)
Divorced	212 (11%)	495 (16%)	2,676 (16%)	3,383 (16%)
Widowed	70 (3.6%)	262 (8.4%)	1,475 (9.0%)	1,807 (8.4%)
Married	932 (48%)	869 (28%)	8,316 (51%)	10,117 (47%)
Total	1,957 (100%)	3,127 (100%)	16,382 (100%)	21,466 (100%)
¹ Pearson’s Chi-squared test
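Filtering rows does not remove a factor level, which is why `fct_drop()` follows `filter()` in the chunk above. A minimal sketch of that behavior; `answered` is an illustrative name.

```{r}
library(dplyr)
library(forcats)  # provides gss_cat and fct_drop()

answered <- gss_cat |>
  filter(marital != "No answer")

nrow(answered)                       # rows left after filtering
nlevels(answered$marital)            # "No answer" is still a level, now with zero rows
nlevels(fct_drop(answered$marital))  # unused level removed
```

Without `fct_drop()`, the empty level would still appear as a row of zeros in the cross-tabulation.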

6 More on `tbl_summary()`

6.1 Modifying `tbl_summary()` function arguments

```{r}
trial2 <- trial |> 
  select(trt, age, grade)

trial2 |> 
  tbl_summary(by = trt)

trial2 |> 
  tbl_summary(
    by = trt,
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} / {N} ({p}%)"
    ),
    digits = all_continuous() ~ 2,
    label = grade ~ "Tumor Grade",
    missing_text = "(Missing)"
  ) |> 
  add_p() |> 
  add_overall()
```

Characteristic	Drug A N = 98¹	Drug B N = 102¹
Age	46 (37, 60)	48 (39, 56)
Unknown	7	4
Grade
I	35 (36%)	33 (32%)
II	32 (33%)	36 (35%)
III	31 (32%)	33 (32%)
¹ Median (Q1, Q3); n (%)

Characteristic	Overall N = 200¹	Drug A N = 98¹	Drug B N = 102¹	p-value²
Age	47.24 (14.31)	47.01 (14.71)	47.45 (14.01)	0.7
(Missing)	11	7	4
Tumor Grade				0.9
I	68 / 200 (34%)	35 / 98 (36%)	33 / 102 (32%)
II	68 / 200 (34%)	32 / 98 (33%)	36 / 102 (35%)
III	64 / 200 (32%)	31 / 98 (32%)	33 / 102 (32%)
¹ Mean (SD); n / N (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test

6.2 Formatting table with `tbl_summary()` functions

```{r}
glimpse(trial2)

# Customizing table
trial |> 
  select(trt, age, grade, response) |> 
  #filter(!is.na(age)) |> 
  tbl_summary(
    by = trt,
    missing = "no") |> 
  #show_header_names()
  add_p(pvalue_fun = label_style_pvalue(digits = 2)) |> 
  add_overall() |> 
  add_n() |> 
  add_stat_label(label = all_categorical() ~ "No. (%)") |> 
  modify_header(label ~ "**Variables**") |> 
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Treatment Received**") |> 
  modify_footnote(
    all_stat_cols() ~ "Median(IQR) or Frequency (%)"
  ) |> 
  modify_caption("**Table 1. Patient Characteristics**") |> 
  bold_labels() 
```

Rows: 200
Columns: 3
$ trt   <chr> "Drug A", "Drug B", "Drug A", "Drug A", "Drug A", "Drug B", "Dru…
$ age   <dbl> 23, 9, 31, NA, 51, 39, 37, 32, 31, 34, 42, 63, 54, 21, 48, 71, 3…
$ grade <fct> II, I, II, III, III, I, II, I, II, I, III, I, III, I, I, III, II…

**Table 1. Patient Characteristics**
Variables	N	Overall N = 200¹	Treatment Received		p-value²
Variables	N	Overall N = 200¹	Drug A N = 98¹	Drug B N = 102¹	p-value²
Age, Median (Q1, Q3)	189	47 (38, 57)	46 (37, 60)	48 (39, 56)	0.72
Grade, No. (%)	200				0.87
I		68 (34%)	35 (36%)	33 (32%)
II		68 (34%)	32 (33%)	36 (35%)
III		64 (32%)	31 (32%)	33 (32%)
Tumor Response, No. (%)	193	61 (32%)	28 (29%)	33 (34%)	0.53
¹ Median(IQR) or Frequency (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test
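The column names used in `modify_spanning_header()` (`stat_1`, `stat_2`) come from the table itself, and `show_header_names()` lists them. A minimal sketch; `tbl` and `header_cols` are illustrative names, and reading `table_styling$header` directly relies on the internal structure of a gtsummary object, which is an assumption here.

```{r}
library(gtsummary)

tbl <- trial |>
  tbl_summary(by = trt, include = c(age, grade), missing = "no")

# prints the internal column names next to their current header labels
show_header_names(tbl)

# the same names, pulled from the object (internal structure, may change)
header_cols <- tbl$table_styling$header$column
header_cols
```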

6.3 t-test

```{r}
# custom statistic for add_stat(): Welch t-test (the t.test() default),
# returned as one formatted string per variable
my_ttest2 <- function(data, variable, by, ...) {
  t.test(data[[variable]] ~ as.factor(data[[by]])) |>
    broom::tidy() |>
    dplyr::mutate(
      stat = glue::glue("t={style_sigfig(statistic)}, {style_pvalue(p.value, prepend_p = TRUE)}")
    ) |>
    dplyr::pull(stat)
}

# t-test
trial |> 
  select(age, marker, trt) |> 
  tbl_summary(
    by = trt,
    missing = "no"
  ) |> 
  add_stat(fns = everything() ~ my_ttest2) |> 
  modify_header(add_stat_1 = "**Treatment Comparison**")

# add_difference
trial |> 
  select(age, marker, trt) |> 
  tbl_summary(
    by = trt,
    missing = "no"
  ) |> 
  add_difference()

# change default stat to mean (sd)
trial |> 
  select(age, marker, trt) |> 
  tbl_summary(
    by = trt,
    missing = "no",
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} / {N} ({p}%)"
    )
  ) |> 
  add_difference()
```

Characteristic	Drug A N = 98¹	Drug B N = 102¹	Treatment Comparison
Age	46 (37, 60)	48 (39, 56)	t=-0.21, p=0.8
Marker Level (ng/mL)	0.84 (0.23, 1.60)	0.52 (0.18, 1.21)	t=1.6, p=0.12
¹ Median (Q1, Q3)
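The "Treatment Comparison" column can be verified by running the underlying test once by hand. A minimal base R sketch for the marker variable, assuming gtsummary is installed for the `trial` data; `tt_marker` is an illustrative name.

```{r}
library(gtsummary)  # provides the trial data

# t.test() defaults to the Welch test (unequal variances)
tt_marker <- t.test(marker ~ trt, data = trial)

tt_marker$statistic  # t statistic
tt_marker$p.value    # two-sided p-value
tt_marker$estimate   # the two group means
```

Note that the table displays medians while the added column tests a difference in means, so the two columns describe different quantities unless the statistic is changed to mean (SD).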

6.4 `add_difference()`

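`add_difference()` adds the difference between the two groups, with a confidence interval and p-value. By default it uses:
t-tests for continuous variables
tests for equality of proportions for dichotomous variables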

```{r}
trial |>
  select(trt, age, marker, response, death) |>
  tbl_summary(
    by = trt,
    statistic =
      list(
        all_continuous() ~ "{mean} ({sd})",
        all_dichotomous() ~ "{p}%"
      ),
    missing = "no"
  ) |>
  add_n() |>
  add_difference()

## controlling decimal points
trial |>
  select(trt, age, marker, response, death) |>
  tbl_summary(
    by = trt,
    statistic =
      list(
        all_continuous() ~ "{mean} ({sd})",
        all_dichotomous() ~ "{p}%"
      ),
    digits = list(all_continuous() ~ 2,
                  all_dichotomous() ~ 2),
    missing = "no"
  ) |>
  add_n() |>
  add_difference() |>
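  # format CI bounds and p-values with two decimals; note that the CI bounds of
  # the proportion rows are then shown as proportions rather than percentages
  modify_fmt_fun(
    c(conf.low, conf.high) ~ label_style_number(digits = 2),
    p.value = label_style_pvalue(digits = 2)
  )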
```

Characteristic	N	Drug A N = 98¹	Drug B N = 102¹	Difference²	95% CI²	p-value²
Age	189	47 (15)	47 (14)	-0.44	-4.6, 3.7	0.8
Marker Level (ng/mL)	190	1.02 (0.89)	0.82 (0.83)	0.20	-0.05, 0.44	0.12
Tumor Response	193	29%	34%	-4.2%	-18%, 9.9%	0.6
Patient Died	200	53%	59%	-5.8%	-21%, 9.0%	0.5
Abbreviation: CI = Confidence Interval
¹ Mean (SD); %
² Welch Two Sample t-test; 2-sample test for equality of proportions with continuity correction

Characteristic	N	Drug A N = 98¹	Drug B N = 102¹	Difference²	95% CI²	p-value²
Age	189	47.01 (14.71)	47.45 (14.01)	-0.44	-4.57, 3.69	0.83
Marker Level (ng/mL)	190	1.02 (0.89)	0.82 (0.83)	0.20	-0.05, 0.44	0.12
Tumor Response	193	29.47%	33.67%	-4.2%	-0.18, 0.10	0.64
Patient Died	200	53.06%	58.82%	-5.8%	-0.21, 0.09	0.50
Abbreviation: CI = Confidence Interval
¹ Mean (SD); %
² Welch Two Sample t-test; 2-sample test for equality of proportions with continuity correction
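The Tumor Response row can be reproduced with `prop.test()`, which is the test named in the footnote. A minimal base R sketch, assuming gtsummary is installed for the `trial` data; `resp`, `x`, `n`, and `pt` are illustrative names.

```{r}
library(gtsummary)  # provides the trial data

# keep patients with a recorded response
resp <- trial[!is.na(trial$response), ]

# responders and group sizes per arm (Drug A first, then Drug B)
x <- as.vector(tapply(resp$response, resp$trt, sum))
n <- as.vector(tapply(resp$response, resp$trt, length))

# two-sample test for equality of proportions, continuity-corrected by default
pt <- prop.test(x = x, n = n)

unname(pt$estimate[1] - pt$estimate[2])  # difference in proportions, Drug A minus Drug B
pt$conf.int                              # 95% CI on the proportion scale
pt$p.value
```

The confidence interval is returned on the proportion scale, which is the scale shown in the second table above after the number formatting was overridden.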

7 ANCOVA Table

```{r}
# ANCOVA adjusted for grade and stage
trial |>
  #select(trt, age, marker, grade, stage)
  tbl_summary(
    by = trt,
    statistic = list(all_continuous() ~ "{mean} ({sd})"),
    missing = "no",
    include = c(age, marker, ttdeath, trt)
  ) |>
  add_n() |>
  add_difference(adj.vars = c(grade, stage))
```

Characteristic	N	Drug A N = 98¹	Drug B N = 102¹	Adjusted Difference²	95% CI²	p-value²
Age	189	47 (15)	47 (14)	-0.36	-4.5, 3.8	0.9
Marker Level (ng/mL)	190	1.02 (0.89)	0.82 (0.83)	0.19	-0.05, 0.43	0.12
Months to Death/Censor	200	20.2 (5.0)	19.0 (5.5)	1.0	-0.38, 2.5	0.15
Abbreviation: CI = Confidence Interval
¹ Mean (SD)
² ANCOVA
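An ANCOVA of this kind is a linear model with the treatment indicator plus the adjustment variables, so the adjusted difference for Age can be approximated with `lm()`. This is a sketch under the assumption that the table reports Drug A minus Drug B, as the unadjusted differences earlier suggest; `fit_age` and `adj_diff` are illustrative names.

```{r}
library(gtsummary)  # provides the trial data

# age modeled on treatment, adjusting for grade and stage;
# rows with missing age are dropped automatically
fit_age <- lm(age ~ trt + grade + stage, data = trial)

# the coefficient is Drug B minus Drug A (Drug A is the reference level),
# so the sign is flipped to express Drug A minus Drug B
adj_diff <- -coef(fit_age)[["trtDrug B"]]
adj_diff

# 95% CI for the coefficient, on the Drug B minus Drug A scale
confint(fit_age)["trtDrug B", ]
```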

8 Regression model with `tbl_regression()`

8.1 Traditional Logistic Model

```{r}
m1 <- trial |> glm(
  response ~ age + stage,
  data = _,
  family = binomial(link = "logit")
)

summary(m1)
```


Call:
glm(formula = response ~ age + stage, family = binomial(link = "logit"), 
    data = trial)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept) -1.48622    0.62023  -2.396   0.0166 *
age          0.01939    0.01147   1.691   0.0909 .
stageT2     -0.54143    0.44000  -1.231   0.2185  
stageT3     -0.05953    0.45042  -0.132   0.8948  
stageT4     -0.23109    0.44823  -0.516   0.6062  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 228.58  on 182  degrees of freedom
Residual deviance: 223.93  on 178  degrees of freedom
  (17 observations deleted due to missingness)
AIC: 233.93

Number of Fisher Scoring iterations: 4

8.2 Table Using `tbl_regression`

```{r}
m1 |> 
  tbl_regression() # same coefficients as summary(m1), on the log-odds scale

# customize
m1 |> 
  tbl_regression(
    exponentiate = TRUE,
    pvalue_fun = label_style_pvalue(digits = 2)
  ) |> 
  add_global_p() |> 
  bold_p(t = 0.10) |> 
  bold_labels() |> 
  add_glance_table(
    include = c(nobs, logLik, AIC, BIC)
  )
```

Characteristic	log(OR)	95% CI	p-value
Age	0.02	0.00, 0.04	0.091
T Stage
T1	—	—
T2	-0.54	-1.4, 0.31	0.2
T3	-0.06	-0.95, 0.82	0.9
T4	-0.23	-1.1, 0.64	0.6
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

Characteristic	OR	95% CI	p-value
Age	1.02	1.00, 1.04	0.087
T Stage			0.62
T1	—	—
T2	0.58	0.24, 1.37
T3	0.94	0.39, 2.28
T4	0.79	0.33, 1.90
No. Obs.	183
Log-likelihood	-112
AIC	234
BIC	250
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
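The only difference between the two tables' estimates is the scale: `exponentiate = TRUE` reports `exp()` of the log-odds coefficients. A minimal sketch that refits the same model so the chunk is self-contained; `log_or` and `odds_ratio` are illustrative names.

```{r}
library(gtsummary)  # provides the trial data

m1 <- glm(response ~ age + stage, data = trial,
          family = binomial(link = "logit"))

log_or     <- coef(m1)     # log-odds scale, as in summary(m1)
odds_ratio <- exp(log_or)  # odds-ratio scale, as with exponentiate = TRUE

round(cbind(log_or, odds_ratio), 2)
```

An odds ratio below 1 (for example T2 versus the reference T1) corresponds to a negative log-odds coefficient.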

8.3 Multiple regression with OLS

`tbl_regression()` also works with a linear model fitted by `lm()`.

```{r}
mpg_model <- mpg |> 
  lm(hwy ~ displ + cyl + drv, data = _)

mpg_model |> 
  summary() # same coefficients as the table below

mpg_model |> 
  tbl_regression() |> 
  add_n() |> 
  add_vif() |> 
  bold_labels()
```


Call:
lm(formula = hwy ~ displ + cyl + drv, data = mpg)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.7095 -2.0282 -0.1297  1.3760 13.8110 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  33.0915     1.0306  32.108  < 2e-16 ***
displ        -1.1245     0.4614  -2.437   0.0156 *  
cyl          -1.4526     0.3334  -4.357 1.99e-05 ***
drvf          5.0446     0.5134   9.826  < 2e-16 ***
drvr          4.8851     0.7116   6.864 6.20e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.968 on 229 degrees of freedom
Multiple R-squared:  0.7559,    Adjusted R-squared:  0.7516 
F-statistic: 177.2 on 4 and 229 DF,  p-value: < 2.2e-16

Characteristic	N	Beta	95% CI	p-value	GVIF	Adjusted GVIF¹
displ	234	-1.1	-2.0, -0.22	0.016	9.4	3.1
cyl	234	-1.5	-2.1, -0.80	<0.001	7.6	2.8
drv	234				2.0	1.2
4		—	—
f		5.0	4.0, 6.1	<0.001
r		4.9	3.5, 6.3	<0.001
Abbreviations: CI = Confidence Interval, GVIF = Generalized Variance Inflation Factor
¹ GVIF^[1/(2*df)]
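The footnote formula can be applied by hand to see how the adjusted GVIF column is derived. A minimal sketch, assuming the car and ggplot2 packages are installed (`car::vif()` is the function I assume `add_vif()` builds on); `v` and `adj_gvif` are illustrative names.

```{r}
mpg_model <- lm(hwy ~ displ + cyl + drv, data = ggplot2::mpg)

# with a multi-level term such as drv, vif() returns GVIF, Df, and the adjusted value
v <- car::vif(mpg_model)

# footnote formula: GVIF^(1 / (2 * Df)); Df is 1 for displ and cyl, 2 for drv
adj_gvif <- v[, "GVIF"]^(1 / (2 * v[, "Df"]))

round(cbind(GVIF = v[, "GVIF"], adjusted = adj_gvif), 1)
```

The adjustment makes terms with different degrees of freedom comparable; for a one-degree-of-freedom term it is simply the square root of the GVIF.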

9 Univariate Regression

Univariate regression is the same as simple regression: it analyzes the relationship between one dependent variable and one independent variable. “Regression” more generally refers to a broader class of statistical methods that examine relationships between a dependent variable and one or more independent variables.

`tbl_uvregression()` fits a separate model for each variable listed in `include`. It is a wrapper for `tbl_regression()` and, as a result, accepts nearly identical function arguments.

9.1 Dichotomous DV

```{r}
trial |>
  tbl_uvregression(
    method = glm,
    y = response,
    include = c(age, stage),
    method.args = list(family = binomial),
    exponentiate = TRUE,
    pvalue_fun = label_style_pvalue(digits = 2)
  ) |>
  add_global_p() |> 
  add_q() |> 
  bold_p(t = 0.10, q = TRUE) |> 
  bold_labels()
```

Characteristic	N	OR	95% CI	p-value	q-value¹
Age	183	1.02	1.00, 1.04	0.091	0.18
T Stage	193			0.58	0.58
T1		—	—
T2		0.63	0.27, 1.46
T3		1.13	0.48, 2.68
T4		0.83	0.36, 1.92
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
¹ False discovery rate correction for multiple testing
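What the table above summarizes can be sketched by fitting one logistic model per predictor and then adjusting the p-values. This is an illustration of the idea, not gtsummary's internal code; `predictors`, `fits`, and `q_fdr` are illustrative names, and the p-values passed to `p.adjust()` are the rounded ones from the table.

```{r}
library(gtsummary)  # provides the trial data

predictors <- c("age", "stage")

# one model per predictor: response ~ age, response ~ stage
fits <- lapply(predictors, function(x) {
  glm(reformulate(x, response = "response"), data = trial, family = binomial)
})
names(fits) <- predictors

# odds ratios without the intercept
lapply(fits, function(f) round(exp(coef(f))[-1], 2))

# false discovery rate (Benjamini-Hochberg) adjustment of the two p-values
q_fdr <- p.adjust(c(age = 0.091, stage = 0.58), method = "fdr")
q_fdr
```

Because each model contains a single predictor, the stage odds ratios differ slightly from those of the multivariable model in section 8.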

9.2 Continuous DV

```{r}
mpg |> 
  tbl_uvregression(
    method = lm,
    y = hwy,
    include = c(displ, cyl, drv),
    pvalue_fun = label_style_pvalue(digits = 2)
  ) 

# compare: fit each simple regression separately
mpg |> 
  lm(hwy ~ displ, data = _) |> 
  summary()

mpg |> 
  lm(hwy ~ cyl, data = _) |> 
  summary()

mpg |> 
  lm(hwy ~ drv, data = _) |> 
  summary()
```

Characteristic	N	Beta	95% CI	p-value
displ	234	-3.5	-3.9, -3.1	<0.001
cyl	234	-2.8	-3.1, -2.5	<0.001
drv	234
4		—	—
f		9.0	7.9, 10	<0.001
r		1.8	0.03, 3.6	0.047
Abbreviation: CI = Confidence Interval


Call:
lm(formula = hwy ~ displ, data = mpg)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.1039 -2.1646 -0.2242  2.0589 15.0105 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  35.6977     0.7204   49.55   <2e-16 ***
displ        -3.5306     0.1945  -18.15   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.836 on 232 degrees of freedom
Multiple R-squared:  0.5868,    Adjusted R-squared:  0.585 
F-statistic: 329.5 on 1 and 232 DF,  p-value: < 2.2e-16


Call:
lm(formula = hwy ~ cyl, data = mpg)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.7579 -2.4968  0.2421  2.4379 15.2421 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  40.0190     0.9591   41.72   <2e-16 ***
cyl          -2.8153     0.1571  -17.92   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.865 on 232 degrees of freedom
Multiple R-squared:  0.5805,    Adjusted R-squared:  0.5787 
F-statistic: 321.1 on 1 and 232 DF,  p-value: < 2.2e-16


Call:
lm(formula = hwy ~ drv, data = mpg)

Residuals:
    Min      1Q  Median      3Q     Max 
-11.160  -2.175  -1.000   1.960  15.840 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  19.1748     0.4037  47.501   <2e-16 ***
drvf          8.9856     0.5668  15.852   <2e-16 ***
drvr          1.8252     0.9134   1.998   0.0469 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.097 on 231 degrees of freedom
Multiple R-squared:  0.5307,    Adjusted R-squared:  0.5266 
F-statistic: 130.6 on 2 and 231 DF,  p-value: < 2.2e-16

10 Combining tables in columns or rows

`tbl_merge()`: combines tables side by side (in columns)
`tbl_stack()`: combines tables one below the other (in rows)
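The difference between the two layouts shows up in the shape of the combined object. A minimal sketch with two small models, assuming ggplot2 is installed for the `mpg` data; `t_hwy`, `t_cty`, `merged`, and `stacked` are illustrative names, and inspecting `table_body` relies on the internal structure of gtsummary objects.

```{r}
library(gtsummary)

mpg <- ggplot2::mpg

t_hwy <- lm(hwy ~ displ + drv, data = mpg) |> tbl_regression()
t_cty <- lm(cty ~ displ + drv, data = mpg) |> tbl_regression()

# side by side: one set of rows, one column group per table
merged <- tbl_merge(list(t_hwy, t_cty),
                    tab_spanner = c("**Highway**", "**City**"))

# one below the other: rows of the second table follow the first
stacked <- tbl_stack(list(t_hwy, t_cty),
                     group_header = c("Highway", "City"))

# compare row counts of the underlying data
c(single  = nrow(t_hwy$table_body),
  merged  = nrow(merged$table_body),
  stacked = nrow(stacked$table_body))
```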

```{r}
r1 <- mpg |> 
  lm(hwy ~ displ + cyl + drv, data = _) |> 
  tbl_regression() |> 
  add_n() |> 
  add_vif() |> 
  bold_labels()

r2 <- mpg |> 
  lm(cty ~ displ + cyl + drv, data = _) |> 
  tbl_regression() |> 
  add_n() |> 
  add_vif() |> 
  bold_labels()

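# Side by side: the default spanner headers are "Table 1" and "Table 2"
tbl_merge(list(r1, r2))
# Side by side with descriptive spanner headers
tbl_merge(list(r1, r2), tab_spanner = c("Highway Mileage", "City Mileage"))

# Stacked: without group headers the two models are hard to tell apart
tbl_stack(list(r1, r2))
# Stacked with a group header above each model
tbl_stack(list(r1, r2), group_header = c("Highway Mileage", "City Mileage"))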
```

Characteristic	Table 1						Table 2
Characteristic	N	Beta	95% CI	p-value	GVIF	Adjusted GVIF¹	N	Beta	95% CI	p-value	GVIF	Adjusted GVIF¹
displ	234	-1.1	-2.0, -0.22	0.016	9.4	3.1	234	-0.74	-1.4, -0.06	0.032	9.4	3.1
cyl	234	-1.5	-2.1, -0.80	<0.001	7.6	2.8	234	-1.3	-1.8, -0.81	<0.001	7.6	2.8
drv	234				2.0	1.2	234				2.0	1.2
4		—	—					—	—
f		5.0	4.0, 6.1	<0.001				2.5	1.7, 3.3	<0.001
r		4.9	3.5, 6.3	<0.001				2.2	1.1, 3.2	<0.001
Abbreviations: CI = Confidence Interval, GVIF = Generalized Variance Inflation Factor
¹ GVIF^[1/(2*df)]

Characteristic	Highway Mileage						City Mileage
Characteristic	N	Beta	95% CI	p-value	GVIF	Adjusted GVIF¹	N	Beta	95% CI	p-value	GVIF	Adjusted GVIF¹
displ	234	-1.1	-2.0, -0.22	0.016	9.4	3.1	234	-0.74	-1.4, -0.06	0.032	9.4	3.1
cyl	234	-1.5	-2.1, -0.80	<0.001	7.6	2.8	234	-1.3	-1.8, -0.81	<0.001	7.6	2.8
drv	234				2.0	1.2	234				2.0	1.2
4		—	—					—	—
f		5.0	4.0, 6.1	<0.001				2.5	1.7, 3.3	<0.001
r		4.9	3.5, 6.3	<0.001				2.2	1.1, 3.2	<0.001
Abbreviations: CI = Confidence Interval, GVIF = Generalized Variance Inflation Factor
¹ GVIF^[1/(2*df)]

Characteristic	N	Beta	95% CI	p-value	GVIF	Adjusted GVIF¹
displ	234	-1.1	-2.0, -0.22	0.016	9.4	3.1
cyl	234	-1.5	-2.1, -0.80	<0.001	7.6	2.8
drv	234				2.0	1.2
4		—	—
f		5.0	4.0, 6.1	<0.001
r		4.9	3.5, 6.3	<0.001
displ	234	-0.74	-1.4, -0.06	0.032	9.4	3.1
cyl	234	-1.3	-1.8, -0.81	<0.001	7.6	2.8
drv	234				2.0	1.2
4		—	—
f		2.5	1.7, 3.3	<0.001
r		2.2	1.1, 3.2	<0.001
Abbreviations: CI = Confidence Interval, GVIF = Generalized Variance Inflation Factor
¹ GVIF^[1/(2*df)]

Characteristic	N	Beta	95% CI	p-value	GVIF	Adjusted GVIF¹
Highway Mileage
displ	234	-1.1	-2.0, -0.22	0.016	9.4	3.1
cyl	234	-1.5	-2.1, -0.80	<0.001	7.6	2.8
drv	234				2.0	1.2
4		—	—
f		5.0	4.0, 6.1	<0.001
r		4.9	3.5, 6.3	<0.001
City Mileage
displ	234	-0.74	-1.4, -0.06	0.032	9.4	3.1
cyl	234	-1.3	-1.8, -0.81	<0.001	7.6	2.8
drv	234				2.0	1.2
4		—	—
f		2.5	1.7, 3.3	<0.001
r		2.2	1.1, 3.2	<0.001
Abbreviations: CI = Confidence Interval, GVIF = Generalized Variance Inflation Factor
¹ GVIF^[1/(2*df)]
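
The footnote `GVIF^[1/(2*df)]` describes how the "Adjusted GVIF" column is derived from the "GVIF" column. As a rough check in base R, assuming `df` is the number of coefficients each term contributes (1 for `displ` and `cyl`, 2 for the three-level factor `drv`), the rounded values from the table reproduce the adjusted column. The vector names `gvif`, `term_df`, and `adjusted_gvif` are made up for this example.

```{r}
# GVIF values as printed (already rounded) in the tables above
gvif <- c(displ = 9.4, cyl = 7.6, drv = 2.0)

# Degrees of freedom per term: drv has three levels (4, f, r), so 2 coefficients
term_df <- c(displ = 1, cyl = 1, drv = 2)

# Adjusted GVIF = GVIF^(1 / (2 * df)); for df = 1 this is simply sqrt(GVIF)
adjusted_gvif <- gvif^(1 / (2 * term_df))

round(adjusted_gvif, 1)
```

This gives 3.1, 2.8, and 1.2, matching the "Adjusted GVIF" column. The adjustment puts terms with different numbers of coefficients on a comparable scale.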

11 Themes

```{r}
# With a default theme

mpg |> 
  lm(hwy ~ displ + cyl + drv, data = _) |> 
  tbl_regression() |> 
  add_n() |> 
  add_vif() |> 
  bold_labels()

# Journal of the American Medical Association (JAMA) theme
theme_gtsummary_journal(journal = "jama")

mpg |> 
  lm(hwy ~ displ + cyl + drv, data = _) |> 
  tbl_regression() |> 
  add_n() |> 
  add_vif() |> 
  bold_labels()

reset_gtsummary_theme()

# The Quarterly Journal of Economics
theme_gtsummary_journal(journal = "qjecon")

mpg |> 
  lm(hwy ~ displ + cyl + drv, data = _) |> 
  tbl_regression() |> 
  add_n() |> 
  add_vif() |> 
  bold_labels()

# Restore the default theme so later tables are not affected
reset_gtsummary_theme()
```

Characteristic	N	Beta	95% CI	p-value	GVIF	Adjusted GVIF¹
displ	234	-1.1	-2.0, -0.22	0.016	9.4	3.1
cyl	234	-1.5	-2.1, -0.80	<0.001	7.6	2.8
drv	234				2.0	1.2
4		—	—
f		5.0	4.0, 6.1	<0.001
r		4.9	3.5, 6.3	<0.001
Abbreviations: CI = Confidence Interval, GVIF = Generalized Variance Inflation Factor
¹ GVIF^[1/(2*df)]

Characteristic	N	Beta (95% CI)	p-value	GVIF	Adjusted GVIF¹
displ	234	-1.1 (-2.0 to -0.22)	0.016	9.4	3.1
cyl	234	-1.5 (-2.1 to -0.80)	<0.001	7.6	2.8
drv	234			2.0	1.2
4		—
f		5.0 (4.0 to 6.1)	<0.001
r		4.9 (3.5 to 6.3)	<0.001
Abbreviations: CI = Confidence Interval, GVIF = Generalized Variance Inflation Factor
¹ GVIF^[1/(2*df)]

Characteristic	N	Beta (SE)¹	GVIF	Adjusted GVIF²
displ	234	-1.1* (0.461)	9.4	3.1
cyl	234	-1.5*** (0.333)	7.6	2.8
drv	234		2.0	1.2
4		—
f		5.0*** (0.513)
r		4.9*** (0.712)
Abbreviations: GVIF = Generalized Variance Inflation Factor, SE = Standard Error
¹ *p<0.05; **p<0.01; ***p<0.001
² GVIF^[1/(2*df)]
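
In the Quarterly Journal of Economics table, the p-value column is replaced by stars attached to each coefficient, using the thresholds p<0.05 (one star), p<0.01 (two stars), and p<0.001 (three stars). The sketch below shows that mapping in base R. `p_to_stars()` is a hypothetical helper written for this example, not a gtsummary function, and it is not how the theme is implemented internally.

```{r}
# Hypothetical helper (not part of gtsummary): map p-values to significance stars
p_to_stars <- function(p) {
  stars <- c("***", "**", "*", "")
  # findInterval() counts how many thresholds are <= p:
  # 0 -> p < 0.001, 1 -> p < 0.01, 2 -> p < 0.05, 3 -> not significant
  stars[findInterval(p, c(0.001, 0.01, 0.05)) + 1]
}

# 0.016 is the p-value for displ in the default-theme table;
# 0.0004 and 0.20 are made-up values for illustration only
p_to_stars(c(0.016, 0.0004, 0.20))

# Attach the stars to a coefficient, as the theme does in the Beta column
paste0("-1.1", p_to_stars(0.016))
```

The first call returns `"*"`, `"***"`, and `""`; the second returns `"-1.1*"`, the same label shown for `displ` in the table above.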

12 References

RStudio Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
R4DS Book: https://r4ds.had.co.nz/data-visualisation.html
gtsummary’s cross tab: https://www.danieldsjoberg.com/gtsummary/articles/tbl_summary.html
gtsummary’s regression table: https://www.danieldsjoberg.com/gtsummary/articles/tbl_regression.html
Daniel Sjoberg’s presentation: https://www.danieldsjoberg.com/gtsummary-weill-cornell-presentation/

