This vignette describes the general workflow for processing NMR data using the {nmrrr} package.
This package can be used for batch processing and analysis of NMR (nuclear magnetic resonance) data, including combining and cleaning spectral data, assigning compound classes to the peaks, and calculating relative contributions of the compound classes.
This package will not perform corrections on raw spectral data (e.g., phase correction, baseline correction, peak picking). These steps must be done prior to using {nmrrr}, using the appropriate software (e.g., MNova, TopSpin).
For tips on processing NMR data in MNova/MestreNova, check out the repository wiki.
Currently, this package can handle data generated from MNova and TopSpin software. Because of the different file formats, users must specify the method when using the functions.
A note on the data used here
This example uses data from the kfp_hysteresis dataset included with the {nmrrr} package. This is a subset of the data reported in Patel et al. 2021, representing samples subjected to drought and flood treatments. 1H solution-state NMR was performed on extracts reconstituted in DMSO-d6. The raw spectra were processed and cleaned using MNova, and the spectra and peaks were exported as .csv files. We use the bin set from Clemente et al. 2012 for compound classification.
This dataset contains (a) spectra data and (b) peaks data (peaks picked in MNova).
library(nmrrr)
library(ggplot2)
theme_set(theme_bw()) # set the default ggplot theme
SPECTRA_FILES <- system.file("extdata", "kfp_hysteresis", "spectra_mnova", package = "nmrrr")
PEAKS_FILES <- system.file("extdata", "kfp_hysteresis", "peaks_mnova_multiple", package = "nmrrr")
nmr_import_spectra()
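This function imports the spectra files found in the specified path and combines them into a single dataframe, with one column each for ppm, intensity, and sampleID.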
spectra_df <- nmr_import_spectra(path = SPECTRA_FILES,
                                 method = "mnova")
str(spectra_df)
#> tibble [65,589 × 3] (S3: tbl_df/tbl/data.frame)
#> $ ppm : num [1:65589] 0.00392 0.00422 0.00453 0.00483 0.00514 ...
#> $ intensity: num [1:65589] 0.00202 0.00202 0.00203 0.00203 0.00204 ...
#> $ sampleID : chr [1:65589] "29" "29" "29" "29" ...
The user may clean the dataframe further according to specific needs, for instance by keeping only a certain range of ppm shifts. For this dataset, we include only points between 0 and 10 ppm.
spectra_df <- subset(spectra_df, ppm >= 0 & ppm <= 10)
str(spectra_df)
#> tibble [65,440 × 3] (S3: tbl_df/tbl/data.frame)
#> $ ppm : num [1:65440] 0.00392 0.00422 0.00453 0.00483 0.00514 ...
#> $ intensity: num [1:65440] 0.00202 0.00202 0.00203 0.00203 0.00204 ...
#> $ sampleID : chr [1:65440] "29" "29" "29" "29" ...
nmr_plot_spectra()
This function will plot all the spectra present in the spectra_df dataframe. The spectra will be stacked and offset vertically (this can be customized).
Set the position of the bin labels using label_position = ...
Set the vertical offset between spectra using stagger = ...
This function builds on ggplot2 capabilities, and the plot can therefore be customized using ggplot2 nomenclature.
nmr_plot_spectra(dat = spectra_df,
binset = bins_Clemente2012,
label_position = 5,
mapping = aes(x = ppm,
y = intensity,
group = sampleID,
color = sampleID),
stagger = 0.5) +
# OPTIONAL PARAMETERS/LAYERS
geom_rect(aes(xmin = 2, xmax = 4, ymin = 0, ymax = 5.5),
fill = "white", color = NA, alpha = 0.8)+
labs(subtitle = "binset: Clemente et al. 2012")+
ylim(0, 5.5)
Notes:
For more information on the bin sets, see ?bins_Clemente2012 or vignette("nmrrr_binsets").
nmr_assign_bins()
This function will assign bins/compound classes to the peaks based on the preferred bin set.
This package provides bin sets for DMSO-d6, D2O, and MeOD solvents. Users can choose from the available options, or can import their own preferred bin set. See vignette("nmrrr_binsets") for more details.
spectra_bins <- nmr_assign_bins(dat = spectra_df,
                                binset = bins_Clemente2012)
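The idea behind bin assignment can be sketched in base R. This is only an illustration, not the implementation used by nmr_assign_bins(): the toy binset, the ppm values, the helper assign_group(), and the choice to treat the lower limit as inclusive and the upper limit as exclusive are all assumptions made for this example.

```r
# Hypothetical toy binset with the same kind of columns as the package binsets
toy_bins <- data.frame(
  group = c("aliphatic1", "aliphatic2", "oalkyl"),
  start = c(0.3, 1.3, 2.9),
  stop  = c(1.3, 2.2, 4.1),
  stringsAsFactors = FALSE
)

# Hypothetical ppm shifts to classify
ppm <- c(0.1, 0.8, 1.5, 2.5, 3.0)

# For one ppm value, return the group whose [start, stop) interval contains it.
# The inclusive lower limit is an assumption of this sketch.
assign_group <- function(x, bins) {
  hit <- which(x >= bins$start & x < bins$stop)
  if (length(hit) == 0) NA_character_ else bins$group[hit[1]]
}

toy_assigned <- data.frame(
  ppm   = ppm,
  group = vapply(ppm, assign_group, character(1), bins = toy_bins),
  stringsAsFactors = FALSE
)
print(toy_assigned)
```

Points that fall outside every bin (here 0.1 and 2.5 ppm) receive NA, which is one reason a user might want to filter the binned data before calculating relative abundance.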
Note: The user may want to apply additional filtering steps to remove certain flagged data points, e.g., impurities or weak peaks.
For this current dataset, because of the strong influence of water peaks in the o-alkyl region, we exclude that region from our calculations.
spectra_bins <- subset(spectra_bins, group != "oalkyl")
nmr_relabund()
Method 1: Integrating area under the curve from processed spectra files
relabund_integration <- nmr_relabund(dat = spectra_bins,
                                     method = "AUC")
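To illustrate what integrating the area under the curve involves, here is a minimal base R sketch that uses the trapezoidal rule. Whether nmr_relabund() uses this exact rule is an assumption. The toy spectrum, the helper trapezoid_area(), and reporting the result as a percentage are also hypothetical choices made for this example.

```r
# Hypothetical toy spectrum for a single sample, already assigned to bins
toy_spectrum <- data.frame(
  ppm       = c(0, 1, 2, 3, 4, 5),
  intensity = c(0, 2, 0, 0, 6, 0),
  group     = c("aliphatic", "aliphatic", "aliphatic",
                "aromatic", "aromatic", "aromatic"),
  stringsAsFactors = FALSE
)

# Trapezoidal rule: sum of (width of each step) x (mean height of each step)
trapezoid_area <- function(x, y) {
  ord <- order(x)
  x <- x[ord]
  y <- y[ord]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Area per group, then each group's share of the total area (as a percentage)
areas <- sapply(split(toy_spectrum, toy_spectrum$group),
                function(d) trapezoid_area(d$ppm, d$intensity))
toy_relabund <- 100 * areas / sum(areas)
print(toy_relabund)
```

In this toy case the aliphatic triangle has area 2 and the aromatic triangle has area 6, so the two groups contribute 25% and 75% of the total.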
Method 2: Calculating from peaks data
This method is specific to MNova-processed data. Users may pick peaks within MNova and export these as a table. In this case, users can simply add the area counts for each peak to calculate the relative contribution of the peak/bin type to the total area.
The peaks data can be exported one of two ways, giving two different formats of data files (“single columns” and “multiple columns”); this package can handle both versions. More details can be found in the repository wiki.
For both types, however, we first need to import and combine the files, then assign bin classes, and then add the areas.
peaks_df <- nmr_import_peaks(path = PEAKS_FILES,
                             method = "multiple columns")
str(peaks_df)
#> tibble [207 × 10] (S3: tbl_df/tbl/data.frame)
#> $ Obs : int [1:207] 1 2 3 4 5 6 7 8 9 10 ...
#> $ ppm : num [1:207] 15.37 14.27 14.09 7.08 7.07 ...
#> $ Intensity : num [1:207] 0 0 0 0.3 0 0.1 0.5 0 0 0 ...
#> $ Width : num [1:207] 0.63 0.61 0.71 43.83 15.82 ...
#> $ Area : num [1:207] 0.08 0.07 0.07 179.46 3.01 ...
#> $ Type : chr [1:207] "Artifact" "Artifact" "Artifact" "Compound" ...
#> $ Flags : chr [1:207] "Weak" "Weak" "Weak" "None" ...
#> $ Impurity/Compound: chr [1:207] NA NA NA NA ...
#> $ Annotation : chr [1:207] "" "" "" "" ...
#> $ sampleID : chr [1:207] "29" "29" "29" "29" ...
The columns we care about the most are ppm and Area. There are additional columns that provide flags for the peaks identified (e.g., Type == "Artifact"/"Compound"/"Solvent", Flags == "Weak"/"None"). These can be filtered by the user as needed.
peaks_df <- subset(peaks_df, Type == "Compound")
peaks_bins <- nmr_assign_bins(dat = peaks_df,
                              binset = bins_Clemente2012)
For this current dataset, because of the strong influence of water peaks in the o-alkyl region, we exclude that region from our calculations.
peaks_bins <- subset(peaks_bins, group != "oalkyl")
relabund_peaks <- nmr_relabund(dat = peaks_bins,
                               method = "peaks")
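The arithmetic of the peaks method (add the peak areas within each bin, then divide by the total area of the sample) can be sketched in base R. This is a hedged illustration rather than the package's own code: the toy peaks table and the percentage scaling are assumptions made for this example.

```r
# Hypothetical toy peaks table, already assigned to bins
toy_peaks <- data.frame(
  sampleID = c("s1", "s1", "s1", "s2", "s2"),
  group    = c("aliphatic1", "aliphatic1", "aromatic", "aliphatic1", "aromatic"),
  Area     = c(10, 30, 60, 5, 15),
  stringsAsFactors = FALSE
)

# Step 1: add the areas of all peaks within each sample and group
toy_sums <- aggregate(Area ~ sampleID + group, data = toy_peaks, FUN = sum)

# Step 2: divide each group's area by the total area of its sample
sample_total <- ave(toy_sums$Area, toy_sums$sampleID, FUN = sum)
toy_sums$relabund <- 100 * toy_sums$Area / sample_total

# Sort for readability
toy_sums <- toy_sums[order(toy_sums$sampleID, toy_sums$group), ]
print(toy_sums)
```

Within each sample the relabund values add up to 100, so samples with very different total areas can still be compared on the same scale.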
Users may then plot the relative abundance data using stacked bar plots, for example:
ggplot(relabund_integration,
aes(x = sampleID, y = relabund, fill = group))+
geom_bar(stat = "identity")+
labs(title = "Relative abundance by AUC",
subtitle = "binset: Clemente et al. 2012")
A note on the data used here
This example uses data from the amp_burnseverity dataset included with the {nmrrr} package. This is a subset of the data available in Greiger et al. 2022, representing vegetation samples that were experimentally burnt in an open air burn table. Solid-state cross-polarization (CP) 13C NMR was performed on these samples. The raw spectra were processed and cleaned by scaling to mass using SIMPSON, and the spectra were batch-exported as a single .csv file. We use the SS bin set from Clemente et al. 2012 for compound classification.
This dataset contains one .csv file with all the samples. This file cannot be processed with {nmrrr} in its current form, but we can import it and convert it to long form, after which it is compatible with the {nmrrr} functions.
Here, we provide the workflow to demonstrate how to use additional formats with the {nmrrr} package. The first step is to bring the data into a format that is compatible with {nmrrr} functions, i.e., long-form data, with one column each for ppm, intensity, and sampleID.
This workflow makes use of {tidyverse} functions, but users may use other preferred packages and functions to get the same results.
library(tidyverse)
SS_FILE <- system.file("extdata", "amp_burnseverity", "spectra_wide.csv", package = "nmrrr")
ss_data <- read.csv(SS_FILE)
## Make long form and do additional cleaning if needed.
ss_data_long =
  ss_data %>%
  pivot_longer(-ppm,
               names_to = "sampleID",
               values_to = "intensity") %>%
  arrange(sampleID, ppm)
ss_data_long = subset(ss_data_long, ppm >= 0 & ppm <= 250)
ss_data_long = subset(ss_data_long, intensity >= 0)
## Assign bins
data_long_bins = nmr_assign_bins(dat = ss_data_long,
                                 binset = bins_ss_Clemente2012)
## Plot spectra
nmr_plot_spectra(dat = data_long_bins,
binset = bins_ss_Clemente2012,
mapping = aes(x = ppm, y = intensity,
group = sampleID,
color = sampleID),
stagger = 15,
label_position = 70)+
theme(axis.text.y = element_blank())+
xlim(210, 0)
## Calculate relative abundance
data_relabund = nmr_relabund(dat = data_long_bins,
                             method = "AUC")
ggplot(data = data_relabund,
aes(x = sampleID,
y = relabund,
fill = group))+
geom_bar(stat = "identity")
Users may import their own binsets, if they do not wish to use the binsets provided with the {nmrrr} package. Binsets are simply dataframes, and therefore can be imported from any .csv, .txt, Excel file, or similar.
The binset dataframe must have columns:
number - Serial number of the group
group - Shortened name of the group. This column is used to label the groups, and will be seen in legends, tables, etc.
start - Lower limit (ppm shift) of the bin
stop - Upper limit (ppm shift) of the bin
description - Optional column, with full-length description of the group
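As a rough sketch of how a custom binset could be prepared, the base R example below builds a dataframe with the columns listed above, checks it, and round-trips it through a .csv file. The bin names, ppm limits, and the validation checks are hypothetical and only illustrate the expected shape; they do not describe a published bin set.

```r
# Hypothetical custom binset (names and limits are made up for illustration)
my_bins <- data.frame(
  number      = 1:3,
  group       = c("binA", "binB", "binC"),
  start       = c(0.0, 2.0, 6.0),
  stop        = c(2.0, 6.0, 10.0),
  description = c("first example region",
                  "second example region",
                  "third example region"),
  stringsAsFactors = FALSE
)

# Basic sanity checks before using the binset
required_cols <- c("number", "group", "start", "stop")
stopifnot(all(required_cols %in% names(my_bins)))  # required columns present
stopifnot(all(my_bins$start < my_bins$stop))       # lower limit below upper limit

# Save to and re-import from a .csv file, as a user might do with their own file
bins_path <- tempfile(fileext = ".csv")
write.csv(my_bins, bins_path, row.names = FALSE)
my_bins_imported <- read.csv(bins_path, stringsAsFactors = FALSE)
print(my_bins_imported)
```

A dataframe prepared this way could then presumably be supplied through the binset argument, e.g. binset = my_bins_imported, in place of the binsets provided with the package.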
Below is an example of the binset format required.
bins_Clemente2012
#> # A tibble: 6 × 5
#> number group start stop description
#> <int> <chr> <dbl> <dbl> <chr>
#> 1 1 aliphatic1 0.3 1.3 aliphatic methyl and methylene
#> 2 2 aliphatic2 1.3 2.2 aliphatic methyl and methylene near O and N
#> 3 3 oalkyl 2.9 4.1 O-alkyl, mainly from carbs and lignin
#> 4 4 alphah 4.1 4.8 alpha-H from proteins
#> 5 5 aromatic 6.2 7.8 aromatic, from lignin and proteins
#> 6 6 amide 7.8 8.4 amide from proteins