library(data.table)
<- fread("https://raw.githubusercontent.com/conedatascience/covid-data/master/data/timeseries/waste-water.csv")
dat
<- dat[grepl(pattern = "Greens", wwtp_name)]
o
:=as.numeric(date_new)]
o[,date_n:= log10(sars_cov2_normalized)]
o[,log_copies <- o[!is.na(log_copies)]
o
plot(log_copies~date_new, o, type = "l", adj = 0,
xlab = '', main = "SARS-CoV-2 RNA Copies - Greensboro NC")
NCDHHS has been participating in waste-water surveillance for SARS-CoV-2. Unfortunately, there is some delay in the data (posted weekly and what is posted is generally a week old or more).
I just want to explore if there is any predictive power. First, I’ll pull the data down and plot it for Greensboro, NC.
Now I’m going to get into dangerous territory and fit a Bayesian spline to these data and then predict out to look at the trend.
library(mgcv)
Loading required package: nlme
This is mgcv 1.8-36. For overview type 'help("mgcv-package")'.
<- bam(log_copies ~ s(date_n, k = 7, bs = "cs"), data = o) fit
Warning in seq.default(0, 1, length = nk): partial argument match of 'length' to
'length.out'
Warning: partial match of 'scale.est' to 'scale.estimated'
Warning in model.matrix.default(Terms[[i]], mf, contrasts = oc): partial
argument match of 'contrasts' to 'contrasts.arg'
plot(fit)
Warning in seq.default(min(raw), max(raw), length = n): partial argument match
of 'length' to 'length.out'
Ok, now let’s do that prediction:
<- data.frame(date = seq.Date(min(o$date_new),
pred_matrix length.out = 365, by = "day"))
$date_n <- as.numeric(pred_matrix$date)
pred_matrix
$pred <- predict(fit, newdata = pred_matrix) pred_matrix
Warning in model.matrix.default(Terms[[i]], mf, contrasts = oc): partial
argument match of 'contrasts' to 'contrasts.arg'
plot(log_copies~date_new, o, type = "l",
xlim = c(min(o$date_new), Sys.Date()+14),
xlab = '', main = "SARS-CoV-2 RNA Copies - Greensboro NC")
lines(pred_matrix$date, pred_matrix$pred, col = "red")
So it appears that there will be an increasing amount of RNA in the wastewater in Greensboro.
<- nccovid::get_covid_state(select_county = "Guilford",
guilford_cases reporting_adj = TRUE)
Using: cone as the data source
Last date available: 2022-05-28
setDT(guilford_cases)
For fun (because of the reporting delay) I will plot the rolling average cases on this same plot. We can see that the cases did in fact increase, but much more rapidly than our projection would have suggested.
<- guilford_cases[date>=min(o$date_new)]
guilford_cases
plot(log_copies~date_new, o, type = "l",
xlim = c(min(o$date_new), Sys.Date()+14),
xlab = '', main = "SARS-CoV-2 RNA Copies - Greensboro NC")
lines(pred_matrix$date, pred_matrix$pred, col = "red")
par(new = TRUE)
plot(cases_daily_roll_sum~date, guilford_cases, xlab = "", ylab = "",
xlim = c(min(o$date_new), Sys.Date()+14),axes = FALSE)
In this next section I was curious if there was a strong cross-correlation with a particular lag. In theory it would be nice if we could say that we see RNA copies increasing and that gives us an alert some period before we see cases. This way health systems could prepare.
<- list()
cor_list for(i in 1:30){
<- copy(o)[,lag_cases:=shift(x = cases_new_cens_per10k,
d
i, type = "lead")][!is.na(lag_cases)]
<- with(d, cor(lag_cases,log_copies,
cor_list[[i]] method = "spearman"))
}
<- do.call(rbind, cor_list)
overall_cor plot(1:30, overall_cor, main = "Analysis of Lags Between RNA Copies and Cases",
ylab = "Spearman Correlation", xlab = "Lag Number")
abline(h = 0, lty ="dashed")
For this analysis it seems that there isn’t any large warnings…with the highest correlation being a 1 day lead. However there could be a rough 1-4 day advanced warning. Nothing too long term warning as what others have suggested.
Reuse
Citation
@online{dewitt2022,
author = {Michael DeWitt},
title = {Waste {Water} {Monitoring} and {COVID-19}},
date = {2022-01-01},
url = {https://michaeldewittjr.com/programming/2021-11-22-waste-water-monitoring-and-covid-19},
langid = {en}
}