16: Additive & multiplicative models

Author

Thiago Cerqueira Silva

Q1

Paperwork question

Some notes: Additive scale in this practical can be seem as “risk difference”, while multiplicative as “risk ratio”. Don’t get confused with additive from “generalized additive models”

Q2

This question uses data from a retrospective cohort study conducted in southern Africa investigating the risk factors for tuberculosis (TB) among male gold miners (contained in goldmine.dta). We want to examine the joint effect of HIV status (hiv coded 0 for negative and 1 for positive) and silicosis (silic coded 0 for none, 1 for possible, 2 for early, and 3 for advanced) on the incidence of TB (tb coded 0 for negative and 1 for positive). Follow-up is assessed using entry and exit.

Show the code

library(haven) 
# create table
library(gtsummary)
# data wrangling and other functions
library(tidyverse)
# if you want to have a output similar to stata
library(tidylog)

# Limit significant digits to 3, remove scientific notation (most of the time, scipen=999 removes all time)
options(digits = 4, scipen = 9)

Show the code

gold <- read_dta("datasets/GOLDMINE.DTA")|> 
  mutate(across(where(is.labelled),as_factor))

**Outcome*: - tb (tubercolosis diagnosis, 0/1)

**Exposures*: - hiv (HIV status at entry: 0/1) - silic (silicosis grad: 0 = “none”, 1 = “possible”, 2 = “early”, 3 = “advanced”)

**Time*: -entry (date of entry, formatted as number of days) -exit (date of exit, formatted as number of days) -dob (date of birth, formatted as number of days)

Create follow-up time (years) and drop missing data in silicosis

Show the code

gold <- gold |> 
  mutate(silic = as.factor(silic)) |> 
  mutate(fup_y = (exit-entry)/365.25)

gold_2 <- gold |> 
  drop_na(silic)

Use the strate and streg (or glm) commands to examine the crude effect of HIV status on TB.

Show the code

mod1 <- glm(tb ~ hiv + offset(log(fup_y)) , data=gold_2, family=poisson)
tbl1 <- tbl_regression(mod1, exponentiate=T)
mod2 <- glm(tb ~ silic+ offset(log(fup_y)) , data=gold_2, family=poisson)
tbl2 <- tbl_regression(mod2, exponentiate=T)
tbl_merge(list(tbl1,tbl2),
          tab_spanner = c("**HIV**","**Silicosis**"))

Characteristic	HIV			Silicosis
Characteristic	IRR	95% CI	p-value	IRR	95% CI	p-value
HIV status at entry	4.59	3.48, 6.10	<0.001
silic
0				—	—
1				1.83	1.29, 2.54	<0.001
2				3.53	2.13, 5.53	<0.001
3				3.58	2.31, 5.35	<0.001
Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

Are there any biological reasons why you might expect the effects of HIV status and silicosis on TB incidence to combine multiplicatively or additively?

Answer

One might expect the effects of HIV and silicosis on TB to combine additively because they have two separate pathways leading to active tuberculosis: HIV tends to cause TB through immunosuppression, whereas silicosis causes TB through lung impairment caused by prolonged silica dust exposure.

Examine the joint effects of HIV status and silicosis on TB in a multiplicative Poisson model. Is there evidence of an interaction on the multiplicative scale?

Show the code

# gold_2_agg <- gold_2 |> 
#   group_by(hiv,silic) |> 
#   summarise(n_tb = sum(tb),
#       pyears = sum(fup_y))
# 
# mod3 <- glm(n_tb ~hiv+ silic +offset(log(pyears)), data=gold_2_agg, family=poisson)
# mod31 <- glm(n_tb ~hiv* silic +offset(log(pyears)), data=gold_2_agg, family=poisson)

mod3 <- glm(tb ~hiv+ silic+ offset(log(fup_y)) , data=gold_2, family=poisson)
mod31 <- glm(tb ~hiv*silic+ offset(log(fup_y)) , data=gold_2, family=poisson)

tbl1 <- tbl_regression(mod3, exponentiate=T)
tbl2 <- tbl_regression(mod31, exponentiate=T)
tbl_merge(list(tbl1,tbl2),
          tab_spanner = c("No interaction","Interaction"))

Characteristic	No interaction			Interaction
Characteristic	IRR	95% CI	p-value	IRR	95% CI	p-value
HIV status at entry	4.82	3.65, 6.41	<0.001	4.85	3.33, 7.16	<0.001
silic
0	—	—		—	—
1	1.98	1.40, 2.76	<0.001	1.84	1.02, 3.19	0.035
2	3.57	2.16, 5.60	<0.001	2.58	0.89, 5.94	0.046
3	4.12	2.65, 6.15	<0.001	5.34	2.86, 9.46	<0.001
HIV status at entry * silic
HIV status at entry * 1				1.12	0.56, 2.31	0.7
HIV status at entry * 2				1.59	0.57, 5.18	0.4
HIV status at entry * 3				0.60	0.25, 1.40	0.2
Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

Show the code

anova(mod3,mod31)

Resid. Df	Resid. Dev	Df	Deviance	Pr(>Chi)
3943	1423	NA	NA	NA
3940	1420	3	2.786	0.4259

Answer

The LRT for interaction gives a p-value of 0.43, hence there is no evidence of an interaction on the multiplicative scale. The adjusted RRs for HIV status and silicosis are given in the model above; both effects are strong after adjusting for one another. We have no evidence to suggest that these effects do not combine multiplicatively, so there is an estimated 20-fold increase in the rate of developing TB among those HIV-positive who have advanced silicosis compared to those HIV-negative without silicosis (RR=4.82x4.12=19.86).

Coding explanation

The code in Stata first aggregate the dataset, it is not strictly necessary.

Show the code

cat("Expected/Observed number of TB cases")

Expected/Observed number of TB cases

Show the code

gold_2 |>
  mutate(
    predicted_cases = predict(mod3, type = "response")
  ) |>
  group_by(hiv, silic) |>
  summarise(
    observed = sum(tb),
    predicted = sum(predicted_cases)
  )

hiv	silic	observed	predicted
0	0	40	40.127
0	1	17	18.329
0	2	5	6.956
0	3	15	11.589
1	0	78	77.873
1	1	30	28.671
1	2	15	13.044
1	3	12	15.411

For all covariates, the observed number of TB events is similar to the expected number under the multiplicative model.

Examine the joint effects of HIV infection and silicosis on TB in an additive rate model. Is there evidence of an interaction on the additive scale?

The answer to part (iv) suggests that the multiplicative model is appropriate and we could stop here. There is no evidence of an interaction on the multiplicative scale and for two out of the three categories of silicosis, the interaction was positive rather than negative. Hence, there is no statistical evidence to suggest additive effects. Let’s try fitting the additive model anyway, on the grounds that it is biologically plausible.

Show the code

# Create rate variable
gold_2 <- gold_2 |> 
  mutate(rate = tb/fup_y)

# use the poisson distribution with identity link
mod4 <- glm(rate ~hiv+ silic , data=gold_2, family=poisson(link="identity"),
            weights = fup_y)
mod41 <- glm(rate ~hiv*silic , data=gold_2, family=poisson(link="identity"),
            weights = fup_y)

anova(mod4,mod41)

Resid. Df	Resid. Dev	Df	Deviance	Pr(>Chi)
3943	1437	NA	NA	NA
3940	1420	3	16.82	0.0008

Answer

The LRT is given by 16.82 [2*(-955.979- -964.390)] which is a chi-square with 3 df, giving p=0.0008 and so there is strong evidence of interaction on the additive scale. Note that all three interaction terms are positive and the “main effects” are also positive (so all RRs>1), suggesting that the effects are more than additive (likely multiplicative).

Show the code

# tbl_regression(mod3, exponentiate=T)
# tbl_regression(mod3, exponentiate=T)

gold_2 |> 
  mutate(
    predicted_cases = predict(mod4,type = "response")
  ) |> group_by(hiv, silic) |> 
  summarise(observed = sum(tb),
            predicted= sum(predicted_cases*fup_y))

hiv	silic	observed	predicted
0	0	40	37.645
0	1	17	19.847
0	2	5	8.870
0	3	15	17.008
1	0	78	92.356
1	1	30	20.802
1	2	15	7.070
1	3	12	8.402

Answer

For some covariates, the observed number of TB events differs markedly from the expected number under the additive model. In summary, the strong evidence for an interaction (p=0.0008) and the discrepancies between the observed and expected values for the additive model suggest that the effects of HIV and silicosis do not combine additively.