1. What is GGIR

A brief overview

GGIR is an R-package primarily designed to process multi-day raw accelerometer data for physical activity, sleep, and circadian rhythm research. The term raw refers to data being expressed in m/s² or gravitational acceleration as opposed to the previous generation of accelerometers which stored data in accelerometer brand-specific units. Despite the focus on the raw data, GGIR also offers functionality to process previous-generation accelerometer data. The signal processing for raw data includes automatic calibration to gravity, detection of abnormally high values, imputation of raw-level time gaps (only for specific sensor brands), and calculation of orientation angle and average magnitude acceleration based on a variety of metrics. Next, the signal processing for both raw and previous-generation data continue with the detection of non-wear and epoch-level imputation. Finally, GGIR uses all this information to describe the data at multiple time resolutions on data quality and data summary metrics that could be interpreted as estimates of physical activity, inactivity, sleep, and circadian rhythm. The time resolutions of GGIR output are:

Per recording, which typically matches to one participant ID.
Per sequence of recordings with matching participant IDs, which are optionally appended. For details on how to use this, see documentation on parameter maxRecordingInterval and documentation on how to use parameters as explained further down.
Calendar day with an option to specify the day border timing with midnight as default.
Night defined from noon-noon, unless the person woke up after noon, which is when the sleep algorithm will re-focus on the window 6pm-6pm.
Day segment, which can be defined via code to indicate timing of the segments within a day standardised for all recordings and days in a dataset or via diary file, where the segment definition is allowed to vary per recording and day.
Window between Waking-up to Waking-up the following day.
Window between Sleep onset to sleep onset the following day.
Epoch by epoch time series, where an epoch length is set by the user at for example 5 seconds.

Key strengths

GGIR has a permissive open-source software license to maximise re-use and collaboration.
GGIR is applicable to data from multiple sensor brands and file formats.
GGIR facilitates sleep, physical activity, and circadian rhythm research.
GGIR has extensive quantitative output designed for use in quantitative research.
GGIR is designed to be computationally efficient such as the option to store and re-use intermediate milestone data and the option to process multiple files in parallel on the same computer as discussed in Chapter 2.
GGIR is designed to be accessible for new users without experience in R or programming. GGIR requires only one function call and comes with elaborate open-access documentation. Additionally, paid training courses are offered to maximise the opportunity for users to learn about GGIR and for us to learn from users.
GGIR has had over a dozen code contributors.
GGIR has been available on the CRAN archive, meaning that it meets CRAN standards and that each release has gone through a series of automated checks.
We have public email list (google group) for users to reach out to each other and the maintainers.
Over hundreds of publications have used GGIR, which has been a powerful way to identify problems and improve the code and it has provided us with a wide range of reference values.

History

An elaborate reflection on GGIR’s first 10 years of existence can be found in this blog post. In short, GGIR evolved from a series of R scripts used in research around 2010-2012 to a first release in 2013. A key factor for the growth of GGIR has been its adoption by the research community and the willingness of a variety of researchers to invest in GGIR either in terms of time investment or financially. GGIR would not be what it is without all their efforts.

Philosophy behind GGIR

Flexible and accessible

The research field is highly heterogeneous in: the choice of sensor brand, data format, the study protocols used, and the research questions it tries to answer. At the same time many within the field lack the time or skills to write their own custom data processing software. GGIR aims to be flexible to handle all these different scenarios and at the same time remain accessible to those who lack time or skills to write software. Further, we hope GGIR is of use to those without the financial resources for commercial software, although we would like stress that we are not a charity and depend on paid or unpaid voluntary efforts from contributors.

Algorithm design

The philosophy behind the algorithms as implemented in GGIR is that biologically explainable (heuristic) approaches to data analyses are preferable over purely data-driven approaches, unless there is no other way. The idea is that to advance knowledge in this field of research, it is essential to have an understanding of the causal relation between the phenomena being observed (e.g. body movement), the way an (acceleration) sensor works, what we do with the data produced, and how we interpret the data. In contrast, data-driven methods look by design for optimal correlation with data and are not or much less concerned with such causal links. Identical to how correlation is not necessarily equal to causation in health research, it can also confound the process of measurement. Some examples: We may be able to capture the differences in acceleration that correlate with different activity types or different levels of energy expenditure but that does not mean that we actually measure the activity type or the energy expenditure level. Ignoring such aspects can easily lead to overestimating the value of accelerometers to measure those constructs and to underestimate the value of an accelerometer to capture acceleration as a useful measure of behaviour if appropriately used and interpreted.

A second problem with data-driven methods is that they heavily depend on the availability of reliable criterion methods. It is argued that such reliable criterion methods do not exist:

Indirect calorimetry and the indicators of energy metabolism that can be derived from it are unable to account for the activity type specific role of body weight on energy metabolism. This makes it impossible to make a standardised comparison of the energy cost of different activity types across individuals that differ in body weight. See also reflections in this blog post.
Polysomnography offers a physiological definition of sleep that is impossible to capture with a movement sensor forcing us to simplify our definition of ‘sleep’ towards a definition a movement sensor can capture, by which ‘validation’ with polysomnography becomes somewhat meaningless as we already know that we are no longer measuring the same construct.
Activity types are ambiguous to define given the high number of ways they can be performed. This introduces a fundamental level of uncertainty about the robustness of models outside the datasets and context they were developed in.

As a result, it is essential to put strong emphasis on algorithms that have descriptive value on their own regardless of whether they offer a high correlation with supposed criterion methods.

Permissive open-source license

It may sound obvious to some that research software is open-source, but in the fields of physical activity and sleep research, this is far from the accepted approach. GGIR is one of the very few research tools in this field that has a permissive license aimed to maximise its potential for re-use and collaboration.

Documentation structure and origin

We have structured the chapters in line with the GGIR training course we have been organising in recent years. The documentation that existed before was a collection of ad-hoc written paragraphs, that lacked a clear overarching structure and narrative. As a result, it was difficult to use the documentation in the training course. Further, we also wanted to provide a good level of documentation for those who do not follow the course or want to refresh their understanding of GGIR.

The documentation is mainly written in a narrative style where we have tried to explain both the theory and practice of all GGIR functionalities. Everything you need to type in your R script is highlighted like this.

This documentation is not intended as an academic review: We only cite publications to clarify the origin of algorithms and we only discuss what is part of GGIR.

Finally, the first version of this documentation was sponsored by Accelting with the commitment that this will remain available as free open-access documentation. However, things like this are much easier to maintain as a community: We would be grateful for your help to improve the documentation either by giving feedback, pull requests (for those who know how to do it), or financially.