| Type: | Package | 
| Title: | The Self-Controlled Case Series Method | 
| Version: | 1.7 | 
| Date: | 2024-04-01 | 
| Author: | Yonas Ghebremichael Weldeselassie, Heather Whitaker, Paddy Farrington | 
| Maintainer: | "Yonas Ghebremichael Weldeselassie" <yonas.weldeselassie@open.ac.uk> | 
| Description: | Various self-controlled case series models used to investigate associations between time-varying exposures such as vaccines or other drugs or non drug exposures and an adverse event can be fitted. Detailed information on the self-controlled case series method and its extensions with more examples can be found in Farrington, P., Whitaker, H., and Ghebremichael Weldeselassie, Y. (2018, ISBN: 978-1-4987-8159-6. Self-controlled Case Series studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press) and https://sccs-studies.info/index.html. | 
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] | 
| Depends: | R (≥ 2.14.0) | 
| Imports: | survival, splines, corpcor, fda, R.methodsS3, gnm | 
| LazyData: | yes | 
| Packaged: | 2024-04-01 08:43:38 UTC; yw399 | 
| NeedsCompilation: | no | 
| Repository: | CRAN | 
| Date/Publication: | 2024-04-01 12:10:11 UTC | 
The Self-Controlled Case Series (SCCS) Method
Description
Fits the self-controlled case series model used to investigate the association between a time-varying exposures such as vaccines or other drugs and an adverse event. Some extensions of the SCCS method can be fitted with this package.
Note: Ages and times describing the start and end of observation, exposure times, and event times, should be expressed as integers. In most of the datasets, units of time are days; other choices may be appropriate according to context, but must be given as integers. Ages appearing as covariates need not be integers.
Details
| Package: | SCCS | 
| Type: | Package | 
| Version: | 1.7 | 
| Date: | 2024-04-01 | 
| License: | "GPL-2" | 
Author(s)
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington
Maintainer: Yonas Ghebremichael-Weldeselassie <y.weldeselassie@warwick.ac.uk>
References
https://sccs-studies.info/index.html
Farrington, P., Whitaker, H., and Ghebremichael-Weldeselassie, Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
Data on NSAID and antidepressant exposure and first GI bleed
Description
The data comprise ages in days at first gastro-intestinal (GI) bleed, treatments with non-steroidal anti-inflammatory drugs (NSAIDs), and treatments with antidepressants (SDs). There are 1000 simulated cases based on Tata et al (2005).
Usage
addatFormat
A data frame containing 3628 rows and 8 columns. The column names are 'case' (individual identifier), 'sta' (age on the first day of the observation period), 'end' (age on last day of the observation period), 'bleed' (age at first GI bleed), 'ns' (age at first day of NSAID treatment episode), 'endns' (age at last day of NSAID treatment episode), 'ad' (age at first day of AD treatment episode), 'endad' (age on last day of AD treatment episode). The data are in format 'stack' (see dataformat for details) with different exposure episodes on different rows.
References
Tata, L.J., Fortun, P. J., Hubbard, R. B., Smeeth, L., Hawkey, C. J., Smith, P., Whitaker, H. J., Farrington, C. P., Card, T. R. and West J. (2005). Does concurrent prescription of selective serotonin reuptake inhibitors and non-steroidal anti-inflammatory drugs substantially increase the risk of upper gastrointestinal bleeding? Alimentary Pharmacology and Therapeutics 22, 175–181.
Data on antidiabetics and fractures
Description
The data comprise ages in days at first fracture and start of treatment with a thiazolidinedione antidiabetic. There are 2000 simulated cases based on Douglas et al (2009). Observation is curtailed at the end of treatment.
Usage
adidatFormat
A data frame containing 2000 rows and 6 columns. The column names are 'case' (individual identifier), 'frac' (age at fracture), 'sta' (age on the first day of the observation period), 'end' (age on last day of the observation period), 'adi' (age at first day of antidiabetic treatment), 'type' (fracture type: 1 for foot/ankle/wrist/hand, 2 for hip, 3 for spine).
References
Douglas, I. J., Evans, S. J., Pocock, S. and Smeeth L. (2009). The risk of fractures associated with thiazolidinediones: A self-controlled case-series study. PLoS Medicine 6 (9), e1000154.
Data on MMR and aseptic meningitis
Description
The data comprise ages in days at measles, mumps and rubella (MMR) vaccination and hospital admission for aseptic meningitis. There are 10 admissions in 10 children.
Usage
amdatFormat
A data frame containing 10 rows and 5 columns. The column names are 'case' (individual identifier), 'sta' (age on first day of the observation period), 'end' (age on last day of the observation period), 'am' (age at admission for aseptic meningitis, with a confirmed diagnosis of viral meningitis), and 'mmr' (age at mmr vaccination).
Source
Whitaker, H. J., Farrington, C. P., Spiessens, B., and Musonda, P. (2006). Tutorial in biostatistics: The self-controlled case series method. Statistics in Medicine 25, 1768–1797.
References
Miller, E., Goldacre, M., Pugh, S., Colville, A., Farrington, C.P., Flower, A., Nash, J., MacFarlane, L. and Tettmar, R. Risk of aseptic meningitis after measles, mumps and rubella vaccine in UK children (1993). The Lancet 341, 979–982.
Data on antipsychotics and stroke
Description
The data comprise ages in days at first stroke and treatments with antipsychotics. There are 2000 simulated cases based on Douglas and Smeeth (2008), including 1500 without dementia and 500 with dementia.
Usage
apdatFormat
A data frame containing 11792 rows and 8 columns. The column names are 'case' (individual identifier), 'sta' (age on the first day of the observation period), 'end' (age on last day of the observation period), 'stro' (age at first stroke), 'ap' (age at first day of antipsychotic treatment episode), 'endap' (age at last day of antipsychotic treatment episode), 'cen' (1 if observation ended before the end of the study, 0 otherwise), 'dem' (0 if the patient does not have dementia, 1 if the patient has dementia). The data are in format 'stack' with different exposure episodes on different rows.
References
Douglas, I. J. and Smeeth L. (2008). Exposure to antipsychotics and risk of stroke: Self-controlled case series study. British Medical Journal 337, a1227.
Data on MMR vaccine and autism
Description
The data comprise ages in days at measles mumps rubella (MMR) vaccination and autism diagnosis. There are 350 simulated cases based on Taylor et al (1999).
Usage
autdatFormat
A data frame containing 350 rows and 5 columns. The column names are 'case' (individual identifier), 'diag' (age at autism diagnosis), 'sta' (age on the first day of the observation period), 'end' (age on last day of the observation period), 'mmr' (age at MMR vaccination.
References
Taylor, B., Miller, E., Farrington, C. P., Petropoulos, M.C., Favot-Mayaud, I., Li, J., and Waight, P. A. (1999). Autism and measles, mumps and rubella vaccine: No epidemiological evidence for a causal association. Lancet 353, 2026-2029.
Data on blood pressure and headaches
Description
The data comprise systolic and diastolic blood pressures taken twice a day for seven days, and start of headache in the next period. There are 71 headaches in 64 cases.
Usage
bpdatFormat
A data frame containing 896 rows and 6 columns. The column names are 'case' (individual identifier), 'dow' (day of week, coded 1 for Monday,...,7 for Sunday), 'time' (time of reading, coded 1 for am, 2 for pm), 'sys' (systolic blood pressure), 'dia' (diastolic blood pressure), 'head' (1 if a headache started in the period up to the next blood pressure reading, or within 12 hours of the last reading).
References
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
Data on bupropion and sudden death
Description
The data comprise simulated data on bupropion and sudden death. There are 121 cases. The start of observation coincides with age at first bupropion prescription; the nominal end of observation is age on 11 November 2003 (day 1136). Ages are in days.
Usage
bupdatFormat
A data frame containing 121 rows and 4 columns. The column names are 'case' (individual identifier), 'date' (date of first bupropion prescription, day 0 = 1 October 2000), 'bup' (age at first bupropion prescription), 'death' (age at sudden death).
References
Hubbard R., Lewis S., West J., Smith C., Godfrey C., Smeeth L., Farrington P. and Britton J. (2005). Bupropion and the risk of sudden death: a self-controlled case series analysis using The Health Improvement Network. Thorax 60, 848-850.
Data on DTP and convulsions
Description
The data comprise ages in days at measles, mumps and rubella (MMR) vaccination, Haemophilus influenzae type b (Hib) booster or catch-up vaccination, and febrile convulsion. There are 2435 convulsions in 2201 children. The ages have been jittered.
Usage
condatFormat
A data frame containing 2435 rows and 11 columns. The column names are 'case' (individual identifier), 'sta' (age on first day of the observation period), 'end' (age on last day of the observation period), 'conv' (age at convulsion), 'hib' (age at Hib booster/catch-up vaccination), 'mmr' (age at MMR vaccination), 'sex' (1 for males, 2 for females), 'gap' (days from convulsion to next convulsion within the same case, or to end of observation), 'cen' (0 if last admission for this case, 1 otherwise), 'rec' (within-case event number), 'ngrp' (1 if case has a unique event, 2 if case has 2+ events).
References
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
Data on DTP and convulsions
Description
The data comprise ages in days at diphteria, tetanus and pertussis (DTP) vaccination and febrile convulsion. There are 1379 convulsions in 1214 children. The ages have been jittered.
Usage
dtpdatFormat
A data frame containing 1379 rows and 8 columns. The column names are 'case' (individual identifier), 'sta' (age on first day of the observation period), 'end' (age on last day of the observation period), 'conv' (age at convulsion), 'dtp' (age at first dose of DTP), 'dtpd2' (age at second dose of DTP), 'dtpd3' (age at third dose of DTP), 'sex' (1 for males, 2 for females).
References
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
SCCS with event-dependent exposure
Description
One of the assumptions of the self-controlled case series model is that occurence of an event does not affect subsequent exposure. This function fits the modified SCCS model when the assumption is not satisfied, see Farrington et al (2009). This modified method assumes that no exposure is possible following a unique event. It requires that exposure is of a fixed finite length and that the end of observation that would have applied in the absence of the event is known.
Usage
eventdepenexp(indiv, astart, aend, aevent, adrug, aedrug, expogrp=0,
             sameexpopar=T,agegrp=NULL, dataformat="stack", verbose=F,
             tolerance=1e-8,itermax=100, data)
Arguments
| indiv | a vector of individual identifiers of cases | 
| astart | a vector of ages at which the observation periods start | 
| aend | a vector of ages at end of observation periods, that would have applied in the absence of the event | 
| aevent | a vector of ages at event (one event per case) | 
| adrug | a vector of ages at which exposure starts or a matrix if there are multiple episodes of the same exposure type ( | 
| aedrug | a vector of ages at which exposure-related risk ends or a matrix if there are multiple episodes of the same exposure type. The dimension of  | 
| expogrp | a vector of days to the start of exposure-related risk, counted from  | 
| sameexpopar | a logical value. If TRUE (the default) no dose effect is assumed: the same exposure parameters are used for multiple doses/episodes of the same exposure type presented in  | 
| agegrp | a vector of cut points for the age groups where each value represents the start of an age catagory. The first element in the vector is the start of the second age group. The first age group starts at the minimum of  | 
| dataformat | the way the input data are assembled. It accepts "multi" or "stack" (the default), where "multi" refers to a data assembled with one row representing one event and "stack" refers to a data frame where repeated exposures of the same type are stacked in one column. In the "multi"  | 
| verbose | a logical value indicating whether information about the iterations should be printed. Default is FALSE | 
| tolerance | the convergence tolerance when estimating the parameters. Defaults to 1e-8. | 
| itermax | maximum number of iterations. 100 is the default. | 
| data | a data frame containing the input data. The data should be in 'stack' or 'multi' (see  | 
Details
This model fits a SCCS model with event-dependent exposures.
Value
Relative incidence estimates along with their 95% confidence intervals.
Author(s)
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
References
Farrington, C. P., Whitaker H.J., and Hocine M.N. (2009). Case series analysis for censored, perturbed or curtailed post-event exposures. Biotatistics, 10(1), 3-16.
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
See Also
Examples
library(SCCS)
# Analysis of rotavirus vaccination and intussusception data
# Model 1: Three doses of the same vaccine exposure OPV (OPV, OPV2 and opv3), 
# only one risk period [adrug, aedrug]
   rot.mod1 <- eventdepenexp(indiv=case, astart=sta, aend=end,
                         aevent=intus, adrug=cbind(rv,rvd2), 
                         expogrp=1,aedrug=cbind(rv+21,rvd2+21), 
                         agegrp=seq(56,168,14), dataformat="multi", data=rotdat)
   rot.mod1
# Model 2: Two doses with two riks periods, 1-7 and 8-21 
   rot.mod2 <- eventdepenexp(indiv=case, astart=sta, aend=end,
                            aevent=intus, adrug=cbind(rv,rvd2),
                            aedrug=cbind(rv+21,rvd2+21), expogrp=c(1,8),
                            agegrp=seq(56,168,14), dataformat="multi", 
                            data=rotdat)   
   rot.mod2
SCCS with event-dependent observation periods
Description
One of the assumptions of the self-controlled case series models is that the observation period for each individual is independent of event times. If an event increases the risk of death, such as myocardial infraction or stroke, this assumption is violated. This function fits the modified SCCS model when the assumption is not satisfied i.e ages at end of observation periods might depend on age at event as outlined in Farrington et al (2011).
Usage
eventdepenobs(formula, indiv, astart, aend, aevent, adrug, aedrug, censor,
              expogrp = list(), washout = list(), sameexpopar = list(), 
              agegrp = NULL, dataformat="stack", covariates=NULL, 
              regress=F, initval=rep(0.1, 7), data)
Arguments
| formula | a model formula. The dependent variable should always be "event" e.g. event ~ itp. If age effects are included in the model, the word 'age' must be used in the formula, e.g event ~ itp + age. | 
| indiv | a vector of individual identifiers of cases | 
| astart | a vector of ages at which the observation periods start | 
| aend | a vector of ages at end of observation periods | 
| aevent | a vector of ages at event (one event per case) | 
| adrug | a list of vectors of ages at start of exposures or a list of matrices if the exposures have multiple episodes ( | 
| aedrug | a list of vectors of ages at which exposure-related risk ends or a list of matrices if there are multiple episodes (repeat exposures in different columns) of the same exposure type. The dimension of each item of  | 
| censor | a vector of indicators for whether an observation periods were censored (1 = observation period ended early, 0 = fully observed). | 
| expogrp | list of vectors of days to the start of exposure-related risk, counted from  | 
| washout | list of vectors with days on start of washout periods counted from  | 
| sameexpopar | a vector of logical values. If TRUE (the default) no dose effect is assumed, the same exposure parameters are used for multiple doses/episodes of the same exposure type presented in  | 
| agegrp | a vector of cut points for the age groups where each value represents the start of an age catagory. The first element in the vector is the start of the second age group. The first age group starts at the minimum of  | 
| dataformat | the way the input data are assembled. It accepts "multi" or "stack" (the default), where "multi" refers to a data assembled with one row representing one event and "stack" refers to a data frame where repeated exposures of the same exposure type are stacked in one column. In the "multi" dataformat different episodes of the same exposure type are recorded as separate columns in the dataframe. | 
| covariates | list of covariates believed to affect the age at censoring (age at end of observation period) (e.g. covariates = gender). | 
| regress | logical,  | 
| initval | a vector of intial values used in fitting the weight functions. These are given in the order of: 1. Log mean of the exponential component 2. Intercept of the EG/EW log mean function  3. Intercept of the EG/EW log shape function 4. Intercept of the logit mixing probability function 5. Regression parameter of the G/W log mean functions, if regress = T 6. Regression parameter of the G/W log shape function, if regress=T 7. Regression parameter of the G/W logit mixing probability function, if regress=T. When  | 
| data | a data frame containing the input data. The data should be in 'stack' or 'multi' (see  | 
Details
This model is suitable when the event increases the risk of death, such as myocardial infraction (MI) or stroke. It is not suitable when the event itself is death. Four models are fitted to the interval between the age at end of observation and the event date, these are detailed in section 5.4 of Farrington et al (2011). The model with the lowest AIC is selected, and used to estimate weights that replace interval lengths in the model formula. This modification allows unbiased estimates of the exposure effect to be estimated, while age effects take on a different interpretation as they include the thinning effect of censoring.
Value
| summary | exposure related relative incidence estimates along with their 95% confidence intervals, age related relative incidence estimates and estimates of interactions with covariates if there are any. | 
| modelfit | model fit of the 4 different weight functions and their AIC values. | 
Author(s)
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
References
Farrington, C. P., Anaya-Izquierdo, A., Whitaker, H. J., Hocine, M.N., Douglas, I., and Smeeth, L. (2011). Self-Controlled case series analysis With event-Dependent observation periods. Journal of the American Statistical Association 106 (494), 417–426.
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
Examples
 library(SCCS)
 
 # Nicotine replacement therapy and myocardial infarction (MI)
 # With no age effect included
 
 nrt.mod <- eventdepenobs(event~nrt, indiv=case, astart=nrt,
             aend=act, aevent=mi, adrug=nrt, aedrug=nrt+28,
             censor=cen, expogrp=c(0,8,15,22), agegrp=NULL,
             data=nrtdat)
 
 
 # Respiratory tract infections and MI
 # Age effect included
 # intial values provided and there are two risk periods
 
 uni <- (1-duplicated(midat$case))
 ageq <- floor(quantile(midat$mi[uni==1], seq(0.1,0.9,0.1), names=FALSE))
                   # age groups
 
 mi.mod <- eventdepenobs(event~rti+age, indiv=case, astart=sta,
                         aend=end, aevent=mi, adrug=rti, aedrug=rti+14,
                         expogrp=c(0,8), agegrp=ageq, censor=cen, data=midat,
                         initval=rep(1.1,4))
                         
 mi.mod
 
                         
 Data on multitype convulsions and MMR
Description
The data comprise ages in days at MMR vaccine and convusion in children aged 366 to 730 days of age. The convulsions are two types: febrile or non-febrile. There are 988 events in 894 cases.
Usage
febdatFormat
A data frame containing 988 rows and 7 columns. The column names are 'case' (individual identifier), 'conv' (age at convulsion), 'sta' (age at start of observation period), 'end' (age at end of observation period), 'mmr' (age at MMR vaccine), 'sex' (coded 1 for males, 2 for females), 'type' (coded 1 for non-febrile convulsion, 2 for febrile convulsion).
References
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
Formatting data
Description
Reformats the data based on age and/or season and exposure groups prior to fitting SCCS model.
Usage
formatdata(indiv, astart, aend, aevent, adrug, aedrug, expogrp = list(),
           washout = list(), sameexpopar = list(), agegrp = NULL, 
           seasongrp=NULL, dob=NULL, cov = cbind(), dataformat="stack", data)
Arguments
| indiv | a vector of individual identifiers of cases | 
| astart | a vector of ages at which the observation periods start | 
| aend | a vector of ages at end of observation periods | 
| aevent | a vector of ages at event, an individual can experience multiple events | 
| adrug | a list of vectors of ages at start of exposures or a list of matrices if the exposures have multiple episodes ( | 
| aedrug | a list of vectors of ages at which exposure-related risk ends or a list of matrices if there are multiple episodes (repeat exposures in different columns) of the same exposure type. The dimension of each item of  | 
| expogrp | list of vectors of days to the start of exposure-related risk, counted from  | 
| washout | list of vectors with days on start of washout periods counted from  | 
| sameexpopar | a vector of logical values. If TRUE (the default) no dose effect is assumed, the same  exposure parameters are used for multiple doses/episodes of the same exposure type presented in  | 
| agegrp | a vector of cut points of the age groups where each value represents the start of an age catagory. The first element in the vector is the start of the second age group. The first age group starts at  | 
| seasongrp | a vector of cut points for seasonal effects. The values should be given in ddmm format, representing the first days of each season group. The seasonal effect is a factor, the reference level being the time interval starting at the earliest date in  | 
| dob | a vector of birth dates of the cases, in ddmmyyyy format. They are used if seasonal effects are included in the model. The default  | 
| cov | a vector (or a matrix if there are multiple) of fixed covariates. The default is NULL where no covariates are included. | 
| dataformat | the way the input data are assembled. It accepts "multi" or "stack" (the default), where "multi" refers to a data assembled with one row representing one event and "stack" refers to a data frame where repeated exposures of the same type are stack in one column. In the "multi" dataformat different episodes of the same exposure type are recorded as separate columns in the dataframe. | 
| data | a data frame containing the input data. The data should be in 'stack' or 'multi' (see  | 
Value
a data frame containing the following columns:
| indivL | an identfier for each individual event. | 
| event | indicator for presence of an event within an interval. "1" where an event occured, "0" otherwise. | 
| age | factor for age groups. | 
| Season | a factor for season if  | 
| exposures | factors for exposure status of each exposure type. "0" for baseline/control periods, "1" for the first risk period. "1" for subsequent exposure risk periods if sameexpopar=TRUE, or increasing factor levels for each subsequent exposure if sameexpopar = FALSE. Indicators for washout periods (if there are any) are also included here. The column names of these factors are the same as the column names of the exposures in  | 
| interval | length of interval. Needed for offsets within the model. | 
There are also columns for eventday (day of adverse event), lower (day a period starts), upper (day a period ends), indiv (original individual indentifier), aevent, astart, aend and any covariates included in cov. 
Author(s)
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
References
Whitaker, H. J., Farrington, C. P., Spiessens, B., and Musonda, P. (2006). Tutorial in biostatistics: The self-controlled case series method. Statistics in Medicine 25, 1768–1797.
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
Examples
# MMR vaccine and ITP data
# A single exposure with three risk periods and no age groups included
 itp.dat1 <- formatdata(indiv=case, astart=sta, aend=end,
                      aevent=itp, adrug=mmr, aedrug=mmr+42,
                      expogrp=c(0,15,29), 
                      data=itpdat)
 itp.dat1
# A single exposure with three risk periods and six age groups
 itp.dat2 <- formatdata(indiv=case, astart=sta, aend=end,
                      aevent=itp, adrug=mmr, aedrug=mmr+42,
                      expogrp=c(0,15,29), agegrp=c(427,488,549,610,671),
                      data=itpdat)
 itp.dat2
Data on influenza vaccine and GBS
Description
The data comprise days (day 1 is 1st October 2010) at seasonal influenza vaccination and at onset of Guillain-Barre Syndrome (GBS) in Italy, gathered in the 2010-2011 influenza season. There are 174 cases from Galeotti (2013). Times have been jittered.
Usage
gbsdatFormat
A data frame containing 174 rows and 6 columns. The column names are 'case' (individual identifier), 'sta' (the first day of the observation period), 'end' (the last day of the observation period), 'gbs' (day of GBS onset), 'flu' (day of influenza vaccination), 'sage' (age in years on day 1, 1st October 2010).
References
Galeotti, F., M. Massari, R. D'Alessandro, E. Beghi, A. Chio, G. Logroscino, G. Filippini, M. D. Benedetti, M. Pugliatti, C. Santiccio, and R. Raschetti (2013). Risk of Guillain-Barre syndrome after 2010-2011 influenza vaccination. European Journal of Epidemiology 28 (5), 433-444.
Data on NSAID and first GI bleed
Description
The data comprise ages in days at first gastro-intestinal (GI) bleed and treatments with non-steroidal anti-inflammatory drugs (NSAIDs). There are 838 simulated cases based on Tata et al (2005).
Usage
gidatFormat
A data frame containing 2920 rows and 6 columns. The column names are 'case' (individual identifier), 'sta' (age on the first day of the observation period), 'end' (age on last day of the observation period), 'bleed' (age at first GI bleed), 'ns' (age at first day of NSAID treatment episode), 'endns' (age at last day of NSAID treatment episode). The data are in format 'stack' with different exposure episodes on different rows.
References
Tata, L. J., Fortun, P. J., Hubbard, R. B., Smeeth, L., Hawkey, C. J., Smith, P., Whitaker, H. J., Farrington, C. P., Card, T. R., and West J. (2005). Does concurrent prescription of selective serotonin reuptake inhibitors and non-steroidal anti-inammatory drugs substantially increase the risk of upper gastrointestinal bleeding? Alimentary Pharmacology and Therapeutics 22, 175–181.
Data on DTP, Hib and convulsions
Description
The data comprise ages at diphteria, tetanus and pertussis (DTP) vaccination, Haemophilus influenzae type b (Hib) vaccination, and febrile convulsion. There are 1378 convulsions in 1213 children. The ages have been jittered.
Usage
hibdatFormat
A data frame containing 1378 rows and 11 columns. The column names are 'case' (individual identifier), 'sta' (age on first day of the observation period), 'end' (age on last day of the observation period), 'conv' (age at convulsion), 'dtp' (age at first dose of DTP), 'dtpd2' (age at second dose of DTP), 'dtpd3' (age at third dose of DTP), 'hib' (age at first dose of Hib), 'hibd2' (age at second dose of Hib), 'hibd3' (age at third dose of hib), 'sex' (1 for males, 2 for females).
Data on antidepressants and hip fracture
Description
The data comprise ages in days at first treatment with antidepressant and first hip fracture. There are 1000 simulated cases based on Hubbard et al (2003).
Usage
hipdatFormat
A data frame containing 1000 rows and 6 columns. The column names are 'case' (individual identifier), 'frac' (age at first hip fracture), 'sta' (age on first day of the observation period), 'end' (age on last day of the observation period), 'ad' (age at start of first treatment with antidepressant), 'endad' (age at end of first antidepressant treatment).
References
Hubbard, R., Farrington, C.p., Smith, C., Smeeth, L., and Tattersfield A. (2003). Exposure to tricyclic and selective serotonin reuptake inhibitor antidepressants and the risk of hip fracture. American Journal of Epidemiology 158 (1), 77–84.
Data on intussusception and OPV
Description
The data comprise ages in days at intussusception and oral polio vaccine (OPV) from Cuba. There are 273 intussusceptions in 273 children. The ages have been jittered.
Usage
intdatFormat
A data frame containing 273 rows and 7 columns. The column names are 'case' (individual identifier), 'intus' (age at intussusception), 'sta' (age on first day of the observation period), 'end' (age on last day of the observation period), 'opv' (age at first dose of OPV), 'opvd2' (age at second dose of OPV), 'dob' (date of birth, in ddmmyyyy format).
References
Sardinas, M.A.G., Cardenas, A.z., Marie, G.c., Santiago, M.A., Sanchez, M.V., and Farrington C. P. (2001). Lack of association between intussusception and oral polio vaccine in Cuban children. European Journal of Epidemiology 17 (8), 783-787.
Integral of I-splines
Description
Evaluates design matrix for integrals of I-splines and integrals of the integrals. The function evaluates first, second and third integrals of I-splines.
Usage
integrateIspline(x, knots1, m, int)
Arguments
| x | a numeric vector of values at which to evaluate the integrals of I-spline functions | 
| knots1 | a numeric vector of interior knot positions with non-decreasing values | 
| m | a positive integer giving the order of the spline function. | 
| int | a positive integer (1, 2, or 3) for first, second or third integral of an I-spline function. | 
Value
A matrix with length of (x) rows and length of (knots1) - m columns.
Author(s)
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
References
Ghebremichael-Weldeselassie Y. (2014). Smooth risk functions for self-cotrolled case series models. PhD thesis, The Open University.
Ghebremichael-Weldeselassie, Y., Whitaker, H. J., Farrington, C. P. (2015). Spline-based self controlled case series method. Statistics in Medicine 33:639-649.
Data on MMR and ITP
Description
The data comprise ages in days at measles, mumps and rubella (MMR) vaccination and hospital admission for idiopathic thrombocytopaenic purpura (ITP). There are 44 admissions in 35 children.
Usage
itpdatFormat
A data frame containing 44 rows and 9 columns. The column names are 'case' (individual identifier), 'sta' (age on first day of the observation period), 'end' (age on last day of the observation period), 'itp' (age at admission for ITP), 'mmr' (age at mmr vaccination), 'sex' (1 for males, 2 for females), 'gap' (days from ITP admission to next ITP admission within the same case, or to end of observation), 'cen' (0 if last admission for this case, 1 otherwise), 'rec' (within-case event number).
Source
Whitaker, H. J., Farrington, C. P., Spiessens, B., and Musonda, P. (2006). Tutorial in biostatistics: The self-controlled case series method. Statistics in Medicine 25, 1768–1797.
References
Miller, E., Waight, P., Farrington, P., Andrews, N., Stowe, J., and Taylor B. (2001). Idiopathic thrombocytopenic purpura and MMR vaccine. Archives of Disease in Childhood 84, 227–229.
Likelihood ratio test for SCCS models
Description
The function performs the likelihood ratio test for SCCS models that are nested (up to combining of multinomial categories).
Usage
lrtsccs(model1, model2)
Arguments
| model1 | an object fitted by the SCCS method e.g  | 
| model2 | an object fitted by the SCCS method e.g  | 
Value
likelihood ratio test statistic, degrees of freedom and p-value.
Author(s)
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
References
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
Examples
 itp.mod1 <- standardsccs(event~mmr+age, indiv=case, astart=sta,aend=end,
                          aevent=itp, adrug=mmr, aedrug=mmr+42, expogrp=c(0,15,29),
                          agegrp=c(427,488,549,610,671), data=itpdat)
                          
 
 itp.mod2 <- standardsccs(event~age, indiv=case, astart=sta,
                         aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42,
                         expogrp=c(0,15,29), agegrp=c(427,488,549,610,671),
                         data=itpdat)
                         
 itp.mod3 <- standardsccs(event~mmr + age, indiv=case, astart=sta,
                          aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42,
                          agegrp=c(427,488,549,610,671), data=itpdat)
                        
# Compare itp.mod1 a model with both age and exposure (mmr) and itpmod2 a model
# with only age effect 
   lrtsccs(itp.mod1,itp.mod2)
   
# Compare itp.mod1 a model with both age and 3 exposure categories and itpmod3 
# a model with age and only one exposure category  
   lrtsccs(itp.mod3,itp.mod1)  # order of the objects doesn't matter
   
   Data on respiratory tract infections and myocardial infarction
Description
These simulated data comprise ages in days at respiratory tract infection and first myocardial infarction in 1000 cases.
Usage
midatFormat
A data frame containing 2187 rows and 6 columns. The column names are 'case' (individual identifier), 'sta' (age on first day of the observation period), 'end' (age on last day of the observation period), 'mi' (age at first myocardial infarction), 'rti' (age at respiratory tract infection), 'cen' (the censoring indicator, equal to 1 if the observation period is censored, 0 if not). The data are in format "stack".
References
Smeeth L., Thomas, S. L., Hall, A. J., Hubbard, R., Farrington, P., Vallance, P. (2004). Risk of myocardial infarction and stroke after acute infection or vaccination. New England Journal of Medicine 351, 2611-2618.
Spline-based non parametric SCCS method
Description
Fits a spline-based non parametric SCCS model where both the exposure related relative incidence and age related relative incidence functions are represented by spline functions; that is, linear combinations of M-splines.
Usage
nonparasccs(indiv, astart, aend, aevent, adrug, aedrug, kn1=12, kn2=12,
            sp1=NULL, sp2=NULL, data)
Arguments
| indiv | a vector of individual identifiers of cases. | 
| astart | a vector of ages at start of observation periods. | 
| aend | a vector of ages at end of observation periods. | 
| aevent | a vector of ages at event, an individual can experience multiple events. | 
| adrug | a vector of ages at which exposure related risk period starts. | 
| aedrug | a vector of ages at which exposure-related risk ends. | 
| kn1 | an integer >= 5 representing the number of interior knots used to define the M-spline basis functions which are related to the age specific relative incidence function, usually between 8 and 12 knots is sufficient. It defaults to 12 knots. | 
| kn2 | a an integer >= 5 representing the number of interior knots used to define the M-spline basis functions which are related to the exposure specific relative incidence function, usually between 8 and 12 knots is sufficient. The default value is 12. | 
| sp1 | smoothing parameter value for age related relative incidence function. It defaults to "NULL" where the smoothing parameter is obtained automatically using an approximate cross-validation method. The value of "sp1" must be a number greater or equal to 0. | 
| sp2 | smoothing parameter value for exposure related relative incidence function. It defaults to "NULL" where the smoothing paramter is obtained automatically using an approximate cross-validation method. The value of "sp1" must be a number greater or equal to 0. | 
| data | A data frame containing the input data. | 
Details
The smoothing parameters for the age and exposure related relative incidence functions are chosen using a cross-validation method. To visualize the exposure-related relative incidence function, use the plot function.
Value
Relative incidence estimates along with their 95% confidence intervals.
| estimates | exposure related relative incidence estimates at each point of time since start of exposure until the maximum difference between the start and end of exposure. | 
| timesinceexposure | time units since the start of exposure. | 
| lci | lower confidence limits of the exposure related relative incidence estimates. | 
| uci | upper confidence limits of the exposure related relative incidence estimates. | 
Author(s)
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
References
Ghebremichael-Weldeselassie, Y., Whitaker, H. J., Farrington, C. P. (2016). Flexible modelling of vaccine effects in self-controlled case series models. Biometrical Journal, 58(3):607-622.
Ghebremichael-Weldeselassie, Y., Whitaker, H. J., Farrington, C. P. (2017). Spline-based self controlled case series method. Statistics in Medicine 33:639-649.
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
See Also
Examples
# ITP and MMR data 
  itp.mod <- nonparasccs(indiv=case, astart=sta, aend=end,
                         aevent=itp, adrug=mmr, aedrug=mmr+42, sp1=28000, sp2=1200,
                         data=itpdat)
  itp.mod
  
# Plot the exposure and age related relative incidence functions
  plot(itp.mod)
  
Data on NRT and MI
Description
The data comprise ages in days at first treatment with nicotine replacement therapy (NRT) and first subsequent myocardial infarction (MI). There are 141 simulated cases based on Tata et al (2005).
Usage
nrtdatFormat
A data frame containing 141 rows and 7 columns. The column names are 'case' (individual identifier), 'nrt' (age at initiation of NRT treatment), 'mi' (age at first subsequent MI), 'end' (age at end of nominal observation period, nrt + 365), 'act' (age at earliest of end and actual end of observation period), 'cage' (centred age, in years, at NRT), 'cen' (1 if act = end, 0 if act < end).
References
Hubbard, R., Lewis, S., Smith, C., Godfrey, C., Smeeth, L., Farrington, P., and Britton J. (2005). Use of nicotine replacement therapy and the risk of acute myocardial infarction, stroke and death. Tobacco Control 14, 416-421.
Data on OPV and intussusception
Description
The data comprise ages in days at oral polio vaccine (OPV) and first hospital admission for intussusception in children between the ages of 27 and 365 days. There are 207 cases.
Usage
opvdatFormat
A data frame containing 207 rows and 8 columns. The column names are 'case' (individual identifier), 'sta' (age on first day of the observation period), 'end' (age on last day of the observation period), 'intus' (age at admission for intussusception), 'opv' (age at first dose of OPV), 'opvd2' (age at second dose of OPV), 'opvd3' (age at third dose of OPV), 'sex' (1 for males, 2 for females).
Source
Whitaker, H. J., Farrington, C. P., Spiessens, B., and Musonda, P. (2006). Tutorial in biostatistics: The self-controlled case series method. Statistics in Medicine 25, 1768–1797.
References
Andrews N., Miller, E., Waight, P., Farrington, P., Crowcroft, N., Stowe, J., and Taylor B. (2002). Does oral polio vaccine cause intussusception in infants? Evidence from a sequence of three self-controlled case series studies in the United Kingdom. European Journal of Epidemiology 17, 701–706.
Data on asthma admissions and air pollution
Description
The data are a daily time series of hospital admissions for asthma in Nottingham, with levels (in micrograms per cubic metre) of particulate matter less than 10 micrometres in diameter (PM10 levels).
Usage
pmdatFormat
A data frame containing 2922 rows and 3 columns. The column names are 'day' (day of observation, 1 to 2922), 'asma' (number of asthma admissions on that day), and 'pm10' (PM10 level on that day). The data are in time series format, for SCCS analysis using generalised linear models.
References
Farrington, C. P. and Whitaker H. J. (2006). Semiparametric analysis of case series data (with discussion). Journal of the Royal Statistical Society, Series C 55 (5), 553-594.
Quantitative exposures in self controlled case series method
Description
This function fits the starndard SCCS model where the exposures are measured on a continuous scale.
Usage
quantsccs(formula, indiv, event, data)
Arguments
| formula | model formula. The dependent variale should always be "event" e.g. event ~ expo, where expo is an exposure measured at each time unit. | 
| indiv | a vector of individual identifiers of cases | 
| event | number of events occuring at each time unit. | 
| data | a data frame containing the input data. Data are assembled one line per time unit of observation. | 
Details
In this method exposures are measured at successive time points within the observation period for each case. And number of events experienced by each case at each time point are recorded.
Value
Relative incidence estimates along with their 95% confidence intervals.
Author(s)
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
References
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
See Also
Examples
# Headaches and blood pressure data. sys and dia (continuoous exposures) are systolic and
# diastolic blood pressure measurements respectively
 bp.mod <- quantsccs(event~sys+dia, indiv=case, event=head,
                      data=bpdat)
 bp.mod
Data on Rotavirus vaccine and intussusception
Description
The data comprise ages in days at rotavirus vaccine (RV) and first symptoms of intussusception in children between the ages of 42 and 183 days. There are 566 cases. Ages have been jittered.
Usage
rotdatFormat
A data frame containing 566 rows and 6 columns. The column names are 'case' (individual identifier), 'sta' (age on first day of the observation period), 'end' (age on last day of the observation period), 'intus' (age at first symptom of intussusception), 'rv' (age at first dose of RV), 'rvd2' (age at second dose of RV).
References
Stowe J., Andrews, N., Ladhani, S., and Miller E. (2016). The risk of intussusception following monovalent rotavirus vaccination in England: A self-controlled case-series evaluation. Vaccine 34, 3684-3689.
Data on RSV and ambient temperature
Description
The data are a weekly time series of counts of respiratory syncytial virus (RSV) isolates from England and Wales and average temperatures (in degrees Celsius) in Central England.
Usage
rsvdatFormat
A data frame containing 364 rows and 5 columns. The column names are 'year' (year of report, 1996 to 2003), 'week' (week of the year, numbered 1 to 52), 'win' (4-week window, numbered 1 to 91), 'temp' (average daily temperature for the week prior to the current week), and 'rsv' (count of RSV isolates for the current week). The data are in time series format, for SCCS analysis using generalised linear models.
References
Whitaker, H. J., Hocine, N. and Farrington C. P. (2007). On case-crossover methods for environmental time series data. Environmetrics 18 (2), 157-171.
Sample size calculation in SCCS
Description
The function calculates the sample size required for an SCCS analysis.
Usage
samplesize(eexpo, risk, astart, aend, p, alpha=0.05, power=0.8, eage=NULL,
          agegrp=NULL)
Arguments
| eexpo | the design value of the exposure related relative incidence. | 
| risk | a positive number showing the duration the risk period. It should not be greater than the duration of the shortest age group, if age groups are specified. | 
| astart | age at start of the observation period. It is an integer greater or equal to 0, that is the same start of observation for all cases. | 
| aend | age at end of an observation period. It is an integer greater than 0, that is the same end of observation for all cases. | 
| p | a vector/scalar of proportions of exposed in each age group, the default is p=1 where there are no age effects and all cases are exposed. | 
| alpha | level of significance e.g 0.05. | 
| power | the power required to detect the relative incidence specified in  | 
| eage | age related relative incidence parameters, the reference group is the first one. | 
| agegrp | a vector of cut points of the age groups where each value represents the start of an age category. The first element in the vector is the start of the second age group. The first age group starts at the minimum of  | 
Value
a sample size is produced.
Author(s)
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
References
Musonda, P., Farrington, C. P., and Whitaker, H. (2006). Sample sizes for self-controlled case series studies. Statistics in Medicine 25, 2618–2631.
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
Examples
# Sample size for exposure RI = 2.5 with 21 days risk period, 
# all cases exposed. The level of significance is 0.05 
# with 80% power. The sample size of when p=1 is: 
   ss1 <- samplesize(eexpo=2.5, risk=21, astart=366, aend=730,
                 p=1, alpha=0.05, power=0.8)
   ss1
# the sample size of events from cases exposed or not when 75% of the 
# population are exposed
   ss2 <- samplesize(eexpo=2.5, risk=21, astart=366, aend=730,
                 p=0.75, alpha=0.05, power=0.8)
   ss2
# Sample size when age effect is included and the proportions of the  
# target exposed population which are exposed in each age group
# are p=c(0.50,0.35,0.1,0.05): 
   ss3 <- samplesize(eexpo=2.5, risk=21, astart=366, aend=730,
                 p=c(0.50,0.35,0.1,0.05), alpha=0.05, power=0.8,
                 eage=c(1.2,1.6,2.0), agegrp=c(457,548,639))
   ss3
 
# Suppose that the sample is from the entire population with 75% exposed,
# then p=0.75*c(0.50,0.35,0.1,0.05)  
  ss4 <- samplesize(eexpo=2.5, risk=21, astart=366, aend=730,
                 p=0.75*c(0.50,0.35,0.1,0.05), alpha=0.05, power=0.8,
                 eage=c(1.2,1.6,2.0), agegrp=c(457,548,639)) 
  ss4 
 
Semiparametric self-controlled case series method
Description
The function fits the semiparametric self-controlled case series method where the age effect is left unspecified, as published in Farrington and Whitaker (2006).
Usage
semisccs(formula, indiv, astart, aend, aevent, adrug, aedrug, expogrp = list(), 
            washout = list(), sameexpopar = list(), dataformat="stack", data)
Arguments
| formula | model formula. The dependent variable should always be "event" e.g. event ~ itp. There is no need to specify age in the model formula. | 
| indiv | a vector of individual identifiers of cases. | 
| astart | a vector of ages at which the observation periods start. | 
| aend | a vector of ages at end of observation periods. | 
| aevent | a vector of ages at event, an individual can experience multiple events. | 
| adrug | a list of vectors of ages at start of exposures or a list of matrices if the exposures have multiple episodes ( | 
| aedrug | a list of vectors of ages at which exposure-related risk ends or a list of matrices if there are multiple episodes (repeat exposures in different columns) of the same exposure type. The dimension of each item of  | 
| expogrp | list of vectors of days to the start of exposure-related risk, counted from  | 
| washout | list of vectors with days to start of washout periods counted from  | 
| sameexpopar | a vector of logical values. If TRUE (the default) no dose effect is assumed: the same exposure parameters are used for multiple doses of the same exposure type, presented in  | 
| dataformat | the way the input data are assembled. It accepts "multi" or "stack" (the default), where "multi" refers to a data assembled with one row representing one event and "stack" refers to a data frame where repeated exposures of the same type are stack in one column. In the "multi" dataformat different episodes of the same type are recorded as separate columns in the dataframe. | 
| data | a data frame containing the input data. The data should be in 'stack' or 'multi' (see  | 
Details
In the standard SCCS method both age and exposure effects are modelled using step functions. However, mis-specification of age groups in the standard SCCS may lead to bias in the exposure related relative incidence estimates. In the semiparametric SCCS no age groups are pre-specified. A parameter for each day an event occurred is fitted, which means that this method is only suitable for small to medium sized data sets. An alternative for large data sets is provided by smoothagesccs.
Value
The function returns age and exposure related relative incidence estimates along with 95% confidence limits.
Author(s)
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
References
Farrington, C. P., Whitaker, H. J. (2006). Semiparametric analysis of case series data. Applied Statistics, 55(5): 553–594.
Farrington, P., Whitaker, H., and Ghebremichael-Weldeselassie, Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
See Also
standardsccs, smoothagesccs, smoothexposccs
Examples
# Example 1
# Semiparametric model for the ITP and MMR vaccine data
itp.mod1 <- semisccs(event~mmr, indiv=case, astart=sta,
                      aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42,
                      expogrp=c(0,15,29), data=itpdat)
itp.mod1
# Example 2
# Data on itp and mmr vaccine
# Sex and mmr interaction included 
itp.mod2 <- semisccs(event~factor(sex)*mmr, indiv=case,
                      astart=sta, aend=end, aevent=itp, adrug=mmr,
                      aedrug=mmr+42, expogrp=c(0,15,29), data=itpdat)
itp.mod2
Data on hexavalent vaccine and sudden infant death syndrome (SIDS)
Description
These simulated data comprise ages in days at hexavalent vaccination and SIDS in 300 cases.
Usage
siddatFormat
A data frame containing 300 rows and 5 columns. The column names are 'case' (individual identifier), 'sta' (age on first day of the observation period), 'end' (age on last day of the nominal observation period), 'sids' (age at SIDS), 'hex' (age at first hexavalent vaccination), 'hexd2' (age at second dose), 'hexd3' (age at third dose).
References
Kuhnert R., Hecker, H., Poethko-Muller, C., Schlaud, M., Vennemann, M., Whitaker, H.J., and Farrington C. P. (2011). A modified self-controlled case series method to examine association between multidose vaccinations and death. Statistics in Medicine 30, 666-677.
Simulation of SCCS data
Description
This function creates a simulated SCCS data set with given design parameters, and can be used to generate cases with observation and risk periods of different durations, multiple risk periods, repeated exposures, and washout periods.
Usage
simulatesccsdata(nindivs, astart, aend, adrug, aedrug, expogrp=c(0), eexpo,
                washout=NULL, ewashout=NULL, agegrp=NULL, eage=NULL)
Arguments
| nindivs | a positive integer: number of cases to be generated (1 event per case). | 
| astart | age at start of an observation period. It is a single number if the same start of observation for all cases is required or a vector of length equal to  | 
| aend | age at end of the observation period. A single number for the same end of observation periods for all cases or a vector to allow for different end of observation periods. | 
| adrug | a vector (of length  | 
| aedrug | a vector of ages at which exposure-related risk ends or a matrix if there are multiple exposures. The number of columns of  | 
| expogrp | a vectors of days to the start of exposure-related risk, counted from  | 
| eexpo | a vector of exposure-related relative incidences. | 
| washout | a vector of days to start of washout periods counted from  | 
| ewashout | a vector of true relative incidence values associated with washout periods; it defaults to NULL when washout=NULL. | 
| agegrp | cut points of age groups, defaults to NULL (i.e no age effect included). These are given as the day of an age category starts, the first age category starts at the minimum of  | 
| eage | a vector of age-related relative incidences. The default is NULL where there is no age effect i.e agegrp = NULL. If age-specific relative incidences are from a continuous function  | 
Details
The true relative incidences related to age and exposure could be generated from discrete or continuous distributions.
Value
A data frame with columns "indiv" = individual identifier, "astart" = age on the day observation period starts, "adrug" = age on the day exposure starts, "aedrug" = age at the end of exposure related risk period, "aend" = age at the end of observation period, and "aevent" = age on the day of outcome event.
Author(s)
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
References
Farrington, P., Whitaker, H., and Ghebremichael-Weldeselassie, Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
Examples
# Simulate data where all the cases have same start and end of 
# observation periods and no age effect 
set.seed(4321)
arisk <- round(runif(110,366,730)) # ages at start of exposure
simdata <- simulatesccsdata(nindivs=110, astart=366, aend=730,
                            adrug=arisk, aedrug=arisk+20, eexpo=2.5)
                            
simdata
Spline-based semiparametric SCCS, smooth age
Description
Fits a semiparametric SCCS model with smooth age effect, where the age related relative incidence function is represented by spline function; that is, linear combinations of M-splines. The exposure related relative incidence function is represented by step functions. One exposure group can be included.
Usage
smoothagesccs(indiv, astart, aend, aevent, adrug, aedrug, expogrp = 0,
              washout = NULL, kn=12, sp = NULL, data)
Arguments
| indiv | a vector of individual identifiers of cases. | 
| astart | a vector of ages at which observation periods start. | 
| aend | a vector of ages at end of observation periods. | 
| aevent | a vector of ages at event, an individual can experience multiple events. | 
| adrug | a vector of ages at which exposure starts, only a single exposure type can be included. | 
| aedrug | a vector of ages at which the exposure-related risk periods end. | 
| expogrp | a vector of days to the start of exposure-related risk, counted from  | 
| washout | a vector of days to start of washout periods counted from  | 
| kn | an integer >=5 representing the number of interior knots used to define the M-spline basis functions which are related to the age specific relative incidence function, usually between 8 and 12 knots is sufficient. It defaults to 12 knots. | 
| sp | smoothing parameter value. It defaults to "auto" where the smoothing paramter is obtained automatically using a cross-validation method. The value of "sp" must be a number greater or equal to 0. | 
| data | a data frame containing the input data. The data are assembled one line per event. | 
Details
The standard SCCS represents the age and exposure effects by piecewise constant step functions, however mis-specification of age group cut points might lead to biased estimates of the exposure related relative incidences. The semiparametric SCCS model, semisccs, has numerical challenges when the number of cases is large. This splined-based semiparametric SCCS model with smooth age effect avoids these limitations of the standard and semiparametric SCCS models. The smoothing parameter for the age-related relative incidence function is chosen by an approximate cross-validation method. The method is outlined in Ghebremichael-Weldeselassie et al (2014). 
Value
Relative incidence estimates along with their 95% confidence limits.
| coef | log of the exposure related relative incidence estimates. | 
| se | standard errors of the log of exposure related relative incidence estimates. | 
| age | age related relative incidences at each day between the minimum age at start of observation and maximum age at end of observation periods. | 
| ageaxis | sequence of ages between the minimum age at start of observations and maximum age at end of observation periods corresponding to the age related relative incidences. | 
| smoothingpara | smoothing parameter chosen by maximizing an approximate cross-validation score or given as an argument in the function | 
| cv | cross-validation score | 
Author(s)
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
References
Ghebremichael-Weldeselassie, Y., Whitaker, H. J., Farrington, C. P. (2015). Self-controlled case series method with smooth age effect. Statistics in Medicine, 33(4), 639-649.
Farrington, P., Whitaker, H., and Ghebremichael-Weldeselassie, Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
See Also
Examples
# Fit the SCCS model with smooth age effect to the itp data and plot age effect.  
itp.mod <- smoothagesccs(indiv=case, astart=sta,aend=end, aevent=itp,
                           adrug=mmr, aedrug=mmr+42, expogrp=c(0,15,29), sp=2800,
                           data=itpdat)
itp.mod
plot(itp.mod)
Spline-based semiparametric SCCS, smooth exposure
Description
Fits a spline-based SCCS model where the exposure-related relative incidence function is represented by a spline function, that is a linear combination of M-splines, and the age effects are represented by a piecewise constant function.
Usage
smoothexposccs(indiv, astart, aend, aevent, adrug, aedrug, agegrp, kn=12, 
               sp = NULL, data)
Arguments
| indiv | a vector of individual identifiers of cases. | 
| astart | a vector of ages at which the observation periods start. | 
| aend | a vector of ages at end of observation periods. | 
| aevent | a vector of ages at event (outcome of interest), an individual can experience multiple events. | 
| adrug | a vector of ages at which exposure related risk period starts. | 
| aedrug | a vector of ages at which exposure related risk period ends. | 
| agegrp | a vector of cut points for the age groups where each value represents the start of an age catagory. The first element in the vector is the start of the second age group. The first age group starts at the minimum of  | 
| kn | number of interior knots >=5 used to define the M-spline basis functions, usually between 8 and 12 knots is sufficient. The default is 12. | 
| sp | smoothing parameter value. It defaults to "auto" where the smoothing paramter is obtained automatically using a cross validation method. The value of "sp" must be a number greater or equal to 0. | 
| data | a data frame containing the input data. | 
Details
The standardsccs, semisccs and smoothagesccs use piecewise constant step functions to model the exposure effect. However mis-specification of exposure group cut points might result in biased estimates. This method represents exposure related relative incidence function by a spline function.
Value
Relative incidence estimates along with their 95% confidence limits. Varaince-covariance matrix can also be obtained.
| estimates | exposure related relative incidence estimates at each point of time since start of exposure until the maximum duration of exposure. | 
| lci | lower confidence limits of the exposure-related relative incidence estimates. | 
| uci | upper confidence limits of the exposure-related relative incidence estimates. | 
Author(s)
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
References
Ghebremichael-Weldeselassie, Y., Whitaker, H. J., Farrington, C. P. (2015). Flexible modelling of vaccine effects in self-controlled case series models 25, 1768–1797.
Farrington, P., Whitaker, H., and Ghebremichael-Weldeselassie, Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
See Also
Examples
library(SCCS)
# Fit smooth exposure SCCS to MMR vaccine and itp 
 itp.mod1 <- smoothexposccs(sp=10, indiv=case, astart=sta, aend=end,
                           aevent=itp, adrug=mmr, aedrug=mmr+42,
                           agegrp=c(427, 488, 549, 610, 671), data=itpdat)
 
 itp.mod1
 
 plot(itp.mod1)
 
The standard SCCS method
Description
Fits the standard SCCS model where age and exposure effects are represented by piecewise constant functions.
Usage
standardsccs(formula, indiv, astart, aend, aevent, adrug, aedrug,
            expogrp = list(), washout = list(), sameexpopar = list(),
            agegrp = NULL, seasongrp=NULL, dob=NULL, dataformat="stack",
            data)
Arguments
| formula | model formula. The dependent variable should always be "event" e.g. event ~ itp. If age and/or season effects are included, they should always be included as 'age' and 'season', e.g. event ~ itp + age. | 
| indiv | a vector of individual identifiers of cases. | 
| astart | a vector of ages at which the observation periods start. | 
| aend | a vector of ages at end of observation periods. | 
| aevent | a vector of ages at event; an individual can experience multiple events | 
| adrug | a list of vectors of ages at start of exposures or a list of matrices if the exposures have multiple episodes ( | 
| aedrug | a list of vectors of ages at which exposure-related risk ends or a list of matrices if there are multiple episodes (repeat exposures in different columns) of the same exposure type. The dimension of each item of  | 
| expogrp | list of vectors of days to the start of exposure-related risk, counted from  | 
| washout | list of vectors with days to start of washout periods counted from  | 
| sameexpopar | a vector of logical values. If TRUE (the default) no dose effect is assumed: the same exposure parameters are used for multiple doses/episodes of the same exposure type, presented in  | 
| agegrp | a vector of cut points of the age groups where each value represents the start of an age category. The first element in the vector is the start of the second age group. The first age group starts at the minimum of  | 
| seasongrp | a vector of cut points for seasonal effects. The values should be given in ddmm format, representing the first days of each season group. The seasonal effect is a factor, the reference level being the time interval starting at the earliest date in  | 
| dob | a vector of birth dates of the cases, in ddmmyyyy format. They are used if seasonal effects are included in the model. The default  | 
| dataformat | the way the input data are assembled. It accepts "multi" or "stack" (the default), where "multi" refers to a data assembled with one row representing one event and "stack" refers to a data frame where repeated exposures of the same type are stack in one column. In the "multi" dataformat different episodes of the same exposure type are recorded as separate columns in the dataframe. | 
| data | a data frame containing the input data. The data should be in 'stack' or 'multi' (see  | 
Details
In the standard SCCS model, originally described in Farrington (1995), age and exposure effects are represented by step functions. Suppose that individual i has n_i events, n_{ijk} occurring in age group j and exposure group k and that the s^{{th}} event falls within age group j_s and exposure group k_s. The SCCS likelihood contribution for individual i is
l_i = \frac{\prod_{s=1}^{n_i} \exp(\alpha_{j_s} + \beta_{k_s}) }{\Big(\sum_{j=1}^J\sum_{k=1}^K\exp(\alpha_j + \beta_k) e_{ijk}\Big)^{n_i}}.
This is a multinomial likelihood with index n_i, responses n_{ijk} and probabilities
p_{iuv} = \frac{\exp(\alpha_{u} + \beta_{v})e_{iuv} }{\sum_{j=1}^J \sum_{k=1}^K\exp(\alpha_j + \beta_k) e_{ijk}}.
The standard SCCS likelihood is equivalent to a product multinomial likelihood.
The SCCS likelihood is equivalent to that of the conditional logistic model for 1:M matched case-control studies: see Breslow and Day (1980), Chapter 7. This means that SCCS models can be fit using conditional logistic regression, with a factor for each individual event. 
The SCCS R package maximises the likelihood using clogit. Each event is assigned an identifier (indivL), and direct estimation is avoided using the option strata(indivL).
Value
Relative incidence estimates along with their 95% confidence limits.
Author(s)
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
References
Farrington, C.P. (1995). Relative incidence estimation from case series for vaccine safety evaluation. Biometrics 51, 228–235.
Breslow, N. E. and Day, N. E. (1980). Statistical Methods in Cancer Research, volume I: The analysis of case-control studies. IARC Publications No.32.
Farrington, P., Whitaker, H., and Ghebremichael-Weldeselassie, Y. (2018). Self-controlled Case Series studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
See Also
Examples
# Single exposure-related risk period with no age effect
  itp.mod1 <- standardsccs(event~mmr, indiv=case, astart=sta, aend=end,
                        aevent=itp, adrug=mmr, aedrug=mmr+42, data=itpdat)
  itp.mod1
# Single exposure-related risk period and age effect included
  itp.mod2 <- standardsccs(event~mmr+age, indiv=case, astart=sta, aend=end,
                        aevent=itp, adrug=mmr, aedrug=mmr+42,
                        agegrp=c(427,488,549,610,671), data=itpdat)
  itp.mod2
# Multiple risk periods and age effect included  
  
  itp.mod3 <- standardsccs(event~mmr+age, indiv=case, astart=sta, aend=end,
                        aevent=itp, adrug=mmr, aedrug=mmr+42, expogrp=c(0,15,29),
                        agegrp=c(427,488,549,610,671), data=itpdat)
  itp.mod3
# Multiple risk periods, washout periods and age effects
   ageq <- floor(quantile(hipdat$frac, seq(0.05,0.95,0.05), 
                         names=FALSE)) # Age group 
                                      # cut points
   
   hip.mod1 <- standardsccs(event~ad+age, indiv=case, astart=sta, aend=end,
                           aevent=frac, adrug=ad, aedrug=endad, expogrp=c(0,15,43),
                           washout=c(1,92,182), agegrp=ageq, data=hipdat)
# Multiple/repeat exposures of the same exposure type, dataformat="stack" 
   ageq <- floor(quantile(gidat$bleed[duplicated(gidat$case)==0], 
                seq(0.025,0.975,0.025), names=FALSE))
   gi.mod1 <- standardsccs(event~ns+relevel(age,ref=21), indiv=case, astart=sta,
                         aend=end, aevent=bleed, adrug=ns, aedrug=endns,
                         agegrp=ageq, dataformat="stack", data=gidat)
   gi.mod1
   
#  Multiple doses of a vaccine each with different parameter estimates (sameexpopar=F)
  
   
   ageg <- c(57,85,113,141,169,197,225,253,281,309,337) # age group cut points
   
   dtp.mod2 <- standardsccs(event~dtp+age, indiv=case, astart=sta, aend=end,
                           aevent=conv, adrug=cbind(dtp,dtpd2, dtpd3), 
                           aedrug=cbind(dtp+14,dtpd2+14,dtpd3+14), 
                           expogrp=c(0,4,8),agegrp=ageg, dataformat="multi", 
                           sameexpopar=FALSE, data=dtpdat)
   dtp.mod2
   
# Multiple exposure types 
  ageg <- seq(387,707,20) # Age group cut points
 con.mod <- standardsccs(event~hib+mmr+age, indiv=case, astart=sta, aend=end,
                          aevent=conv, adrug=cbind(hib,mmr), aedrug=cbind(hib+14,mmr+14), 
                          expogrp=list(c(0,8), c(0,8)), agegrp=ageg, data=condat)
  con.mod
   
# Multiple doses/episodes of several exposure types, the doses of each exposure type
# have same paramter
   ageg <- c(57,85,113,141,169,197,225,253,281,309,337)   # age group cut points 
   hib.mod1 <- standardsccs(event~dtp+hib+age, indiv=case, astart=sta, 
                            aend=end, aevent=conv, 
                           adrug=list(cbind(dtp, dtpd2,dtpd3),
                                       cbind(hib,hibd2,hibd3)),
                            aedrug=list(cbind(dtp+14,dtpd2+14,dtpd3+14),
                                        cbind(hib+14,hibd2+14,hibd3+14)),
                            expogrp=list(c(0,4,8),c(0,8)),agegrp=ageg, 
                            dataformat="multi", data=hibdat)
                            
   hib.mod1                   
# Multiple doses/episodes of several exposure types, the doses of "dtp" 
# different parameters and the doses of the second exposure hib have
# same paramters
   ageg <- c(57,85,113,141,169,197,225,253,281,309,337)
                    # age group cut points 
   hib.mod2 <- standardsccs(event~dtp+hib+age, 
                                   indiv=case, astart=sta, aend=end,
                        aevent=conv, adrug=list(cbind(dtp,dtpd2,dtpd3), 
                                      cbind(hib,hibd2,hibd3)), 
                        aedrug=list(cbind(dtp+3,dtpd2+3,dtpd3+3),
                                      cbind(hib+7,hibd2+7,hibd3+7)),
                        sameexpopar=c(FALSE,TRUE), agegrp=ageg, 
                        dataformat="multi", data=hibdat)
   hib.mod2
   
# Season included in a model 
   month <- c(0101,0102,0103,0104,0105,0106,0107,0108,0109,0110,0111,0112) 
   # season cutpoints 
   
   int.mod <- standardsccs(event~opv+age+season, indiv=case, astart=sta,
                          aend=end, aevent=intus, adrug=cbind(opv,opvd2),
                          aedrug=cbind(opv+42,opvd2+42), expogrp=c(0,15,29),
                          agegrp=seq(30,330,30), seasongrp=month,dob=dob, 
                          dataformat="multi", data=intdat)
   int.mod