| Title: | Analysis of Sex Differences in Omics Data for Complex Diseases |
| Version: | 0.1.1 |
| Maintainer: | Enrico Glaab <enrico.glaab@uni.lu> |
| Description: | Tools to analyze sex differences in omics data for complex diseases. It includes functions for differential expression analysis using the 'limma' method <doi:10.1093/nar/gkv007>, interaction testing between sex and disease, pathway enrichment with 'clusterProfiler' <doi:10.1089/omi.2011.0118>, and gene regulatory network (GRN) construction and analysis using 'igraph'. The package enables a reproducible workflow from raw data processing to biological interpretation. |
| Depends: | R (≥ 3.6) |
| Imports: | limma, igraph, edgeR, Seurat, SeuratObject, clusterProfiler, org.Hs.eg.db, ReactomePA, data.table, ggplot2, tidyr, grid, ggraph, dplyr, ggrepel, scales, Rcpp, methods |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Suggests: | R.utils, DT, gridExtra, knitr, htmltools, kableExtra, rmarkdown, stringr |
| LinkingTo: | Rcpp |
| VignetteBuilder: | knitr, rmarkdown |
| NeedsCompilation: | yes |
| Packaged: | 2025-11-10 15:07:41 UTC; mohamed.soudy |
| Author: | Enrico Glaab [aut, cre], Sophie Le Bars [aut], Mohamed Soudy [aut], Murodzhon Akhmedov [cph] |
| Repository: | CRAN |
| Date/Publication: | 2025-11-13 21:20:08 UTC |
XYomics: Analysis of Sex Differences in Omics Data
Description
The **XYomics** package provides functions for performing differential expression analysis, pathway enrichment, gene regulatory network analysis, and comprehensive report generation for omics data.
Details
This package is designed to integrate various omics analyses (e.g., functional omics and single-cell data) with advanced visualization and reporting tools.
Author(s)
Maintainer: Enrico Glaab enrico.glaab@uni.lu
Authors:
Sophie Le Bars sophie.lebars@uni.lu
Mohamed Soudy mohamed.soudy@uni.lu
Other contributors:
Murodzhon Akhmedov [contributor, copyright holder]
Prize-collecting Steiner Forest (PCSF)
Description
PCSF returns a subnetwork obtained by solving the PCSF on the given interaction network.
Usage
PCSF(ppi, terminals, w = 2, b = 1, mu = 5e-04, dummies)
Arguments
ppi |
An interaction network, an igraph object. |
terminals |
A list of terminal genes with prizes to be analyzed in the PCSF context.
A named |
w |
A |
b |
A |
mu |
A |
dummies |
A list of nodes that are to be connected to the root of the tree. If missing, the root will be connected to all terminals. |
Details
The PCSF is a well-known problem in graph theory.
Given an undirected graph G = (V, E), where the vertices are labeled with prizes
p_{v} and the edges are labeled with costs c_{e} > 0, the goal is to identify
a subnetwork G' = (V', E') with a forest structure. The target is to minimize
the total edge costs in E', the total node prizes left out of V', and the
number of trees in G'. This is equivalent to minimization of the following
objective function:
F(G') = \sum_{e \in E'} c_{e} + \beta \sum_{v \notin V'} p_{v} + \omega k
where k is the number of trees in the forest, which is regulated by the parameter \omega.
The parameter \beta is used to tune the prizes of nodes.
This optimization problem nicely maps onto the problem of finding differentially
enriched subnetworks in the cell protein-protein interaction (PPI) network.
The vertices of interaction network correspond to genes or proteins, and edges
represent the interactions among them. We can assign prizes
to vertices based on measurements of differential expression, copy number, or
mutation, and costs to edges based on confidence scores for those intra-cellular
interactions from experimental observation, yielding a proper input to the PCSF
problem. Vertices that are assigned a prize are referred to as terminal nodes,
whereas the vertices which are not observed in patient data are not assigned a
prize and are called Steiner nodes. After scoring the interactome, the
PCSF is used to detect a relevant subnetwork (forest), which corresponds to a
portion of the interactome, where many genes are highly correlated in terms of
their functions and may regulate the differentially active biological process
of interest. The PCSF aims to identify neighborhoods in interaction networks
potentially belonging to the key dysregulated pathways of a disease.
To avoid a bias towards hub nodes of the PPI network appearing in the PCSF
solution, we penalize the prizes of Steiner nodes according to their degree
in the PPI network, regulated by the parameter \mu:
p'_{v} = p_{v} - \mu*degree(v)
The parameter \mu also affects the total number of Steiner nodes in the solution.
The higher the value of \mu, the smaller the number of Steiner nodes in the subnetwork,
and vice versa. Based on our previous analysis, the recommended range of \mu
for biological networks is between 1e-4 and 5e-2. Users can choose values that
yield subnetworks whose vertex sets have a desirable Steiner/terminal
node ratio and average Steiner/terminal in-degree ratio
in the template interaction network.
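The objective function and the hub penalization above can be illustrated on a toy network. The following is a minimal base-R sketch, not the package's solver: the network, prizes, and the helper `pcsf_objective()` are hypothetical, \beta and \omega are set to 1 and 2 by analogy with the defaults of `b` and `w`, and the penalty formula is applied to every vertex exactly as written (the package may restrict it to Steiner nodes). For readability, the objective is evaluated on the unpenalized prizes.

```r
# Toy undirected network; all names and numbers are hypothetical.
edges <- data.frame(
  from = c("A", "A", "B", "C", "D"),
  to   = c("B", "C", "C", "D", "E"),
  cost = c(0.2, 0.4, 0.1, 0.3, 0.5),
  stringsAsFactors = FALSE
)
prizes <- c(A = 1.0, B = 0.0, C = 0.8, D = 0.0, E = 0.6)

# Vertex degree in the full network, in the same order as `prizes`.
deg <- as.numeric(table(factor(c(edges$from, edges$to), levels = names(prizes))))
names(deg) <- names(prizes)

# Hub penalization: p'_v = p_v - mu * degree(v).
mu <- 5e-4
penalized <- prizes - mu * deg

# Objective F(G') for a candidate forest described by the row indices of its
# edges, the names of its vertices, and its number of trees.
pcsf_objective <- function(kept_edges, kept_vertices, n_trees,
                           edges, prizes, beta = 1, omega = 2) {
  edge_cost  <- sum(edges$cost[kept_edges])
  left_out   <- setdiff(names(prizes), kept_vertices)
  prize_loss <- beta * sum(prizes[left_out])
  edge_cost + prize_loss + omega * n_trees
}

# Candidate 1: the single edge A-C (terminal E is left out).
f_small <- pcsf_objective(2, c("A", "C"), 1, edges, prizes)
# Candidate 2: the path A-C-D-E (E is collected via Steiner node D).
f_large <- pcsf_objective(c(2, 4, 5), c("A", "C", "D", "E"), 1, edges, prizes)
```

In this toy case the smaller tree has the lower objective value, because the extra edge costs needed to reach E exceed the prize gained. Raising \beta or lowering the edge costs would reverse that trade-off.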
Value
The final subnetwork obtained by the PCSF. It returns an igraph object with the node prize and edge cost attributes.
Author(s)
Murodzhon Akhmedov
References
Akhmedov M., LeNail A., Bertoni F., Kwee I., Fraenkel E., and Montemanni R. (2017) A Fast Prize-Collecting Steiner Forest Algorithm for Functional Analyses in Biological Networks. Lecture Notes in Computer Science, to appear.
Internal function call_sr
Description
This function is internally used to solve the PCST.
Usage
call_sr(from, to, cost, node_names, node_prizes)
Arguments
from |
A |
to |
A |
cost |
A |
node_names |
A |
node_prizes |
A |
Value
R object (SEXP) which is converted from C++ List.
Author(s)
Murodzhon Akhmedov
Compute sex-specific differentially expressed genes (DEGs) per category
Description
Identifies male-specific, female-specific, sex-dimorphic, and sex-neutral DEGs from differential expression results.
Usage
categorize_sex_sc(
male_degs,
female_degs,
target_fdr = 0.05,
exclude_pval = 0.5,
min_abs_logfc = 0.25
)
Arguments
male_degs |
Data frame containing male differential expression results from one specific cell-type or bulk dataset. |
female_degs |
Data frame containing female differential expression results from one specific cell-type or bulk dataset. |
target_fdr |
Numeric. FDR threshold for significance. |
exclude_pval |
Numeric. P-value threshold for excluding genes in opposite sex. |
min_abs_logfc |
Numeric. Minimum absolute log2 fold change threshold. |
Value
Data frame containing categorized DEGs with associated statistics.
Perform Pathway Enrichment Analysis for Pre-Categorized Differentially Expressed Genes (DEGs)
Description
This function performs pathway enrichment analysis for differentially expressed genes (DEGs), which are already categorized into different types (e.g., Dimorphic, Neutral, Sex-specific) via the 'categorize_sex_sc' function. The function analyzes their enrichment in KEGG, GO, or Reactome pathways.
Usage
categorized_enrich_sc(
DEGs_category,
enrichment_db = "KEGG",
organism = "hsa",
org_db = org.Hs.eg.db,
pvalueCutoff = 0.05,
qvalueCutoff = 0.2
)
Arguments
DEGs_category |
Data frame containing gene symbols and their corresponding DEG types. Must include columns 'DEG_Type' (DEGs categories) and 'Gene_Symbols'. |
enrichment_db |
Character string specifying the enrichment database to use: "KEGG", "GO", or "REACTOME" (default: "KEGG"). |
organism |
Character string representing the organism code. For KEGG enrichment, use "hsa" (default). For Reactome enrichment, use "human". |
org_db |
OrgDb annotation database for the organism (e.g., org.Hs.eg.db). |
pvalueCutoff |
Numeric value specifying the p-value cutoff for statistical significance (default: 0.05). |
qvalueCutoff |
Numeric value specifying the q-value cutoff for multiple testing correction (default: 0.2). |
Details
- The input DEGs are already categorized by the 'categorize_sex_sc' function.
- For GO enrichment, an appropriate OrgDb object (e.g., org.Hs.eg.db for humans) must be available.
- For KEGG and Reactome enrichment, gene symbols are first converted to ENTREZ IDs.
- Requires the 'clusterProfiler' package for enrichment analysis.
- Ensures appropriate error handling for missing genes or database issues.
Value
A named list of enriched pathways for each DEG category, structured as a data frame.
Construct Protein-protein interaction Network using Prize-Collecting Steiner Forest
Description
Constructs a condition-specific gene regulatory network based on differential expression results using the PCSF algorithm.
Usage
construct_ppi_pcsf(
g,
prizes,
w = 2,
b = 1,
mu = 5e-04,
seed = 1,
min_nodes = 1
)
Arguments
g |
An igraph object representing the base network. |
prizes |
A named numeric vector of gene scores (prizes). Names must match vertex names in g. |
w |
Numeric. Edge cost scaling weight. Default is 2. |
b |
Numeric. Balance between prizes and edge costs. Default is 1. |
mu |
Numeric. Trade-off parameter for sparsity. Default is 5e-04. |
seed |
Integer. Random seed. Default is 1. |
min_nodes |
Integer. Minimum number of nodes in subnetwork. Default is 1. |
Value
An igraph object representing the extracted subnetwork. Returns NULL invisibly if no prize genes are present, the subnetwork is too small, or the PCSF algorithm fails.
Convert Data Frame to enrichResult
Description
Converts a data frame containing enrichment results into a clusterProfiler enrichResult object. Assumes the data frame has columns: ID, geneID, pvalue, and optionally p.adjust.
Usage
convertdf2enr(df, pvalueCutoff = 0.1, pAdjustMethod = "BH")
Arguments
df |
Data frame containing enrichment results. |
pvalueCutoff |
Numeric. P-value cutoff for the enrichment object (default: 0.1). |
pAdjustMethod |
Character string specifying the p-value adjustment method (default: "BH"). |
Value
An enrichResult object compatible with clusterProfiler plotting functions.
Generate Boxplots for Expression Data
Description
Creates boxplots to visualize expression differences across conditions and genders.
Usage
generate_boxplot(
x,
index,
phenotype,
gender,
title = "Expression Boxplot",
xlab = "Conditions",
ylab = "Expression Level"
)
Arguments
x |
Expression data matrix. |
index |
Numeric vector indicating which features (rows) to plot. |
phenotype |
Vector of phenotype labels. |
gender |
Vector of gender labels. |
title |
Title for the plot. |
xlab |
Label for the x-axis. |
ylab |
Label for the y-axis. |
Value
A boxplot is generated.
Generate a Comprehensive Analysis Report
Description
This function creates an integrated report that combines key analysis outputs,
Usage
generate_cat_report(
results_cat = results_cat,
enrichment_cat = results_cat,
grn_object = grn_object,
output_file = "cat_analysis_report.html",
output_dir = tempdir(),
template_path = NULL,
quiet = TRUE
)
Arguments
results_cat |
A data frame or list containing differential expression results. |
enrichment_cat |
A list with enrichment objects (e.g., BP, MF, KEGG, and optionally GSEA results). |
grn_object |
An igraph object representing the gene regulatory network (e.g., from PCSF analysis). |
output_file |
Character. The desired name (and optionally path) for the rendered report (default: "cat_analysis_report.html"). |
output_dir |
Character. Output directory to save the report to. |
template_path |
Character. Path to the R Markdown template file. If |
quiet |
Logical. If |
Value
A character string with the path to the rendered report.
Generate a Comprehensive Analysis Report
Description
Creates an integrated HTML report combining differential expression results, enrichment analyses (GO, KEGG, GSEA), and gene regulatory network (GRN) data. Uses a parameterized R Markdown template for rendering.
Usage
generate_report(
de_results,
enrichment_results,
grn_object,
output_file = "analysis_report.html",
template_path = NULL,
params_list = list(),
quiet = TRUE
)
Arguments
de_results |
Data frame or list with differential expression results. |
enrichment_results |
List of enrichment results (e.g., BP, MF, KEGG, GSEA). |
grn_object |
An igraph object of the gene regulatory network. |
output_file |
Output report name (default: "analysis_report.html"). |
template_path |
Path to the R Markdown template. If NULL, uses the built-in template. |
params_list |
Named list of extra parameters passed to the R Markdown report. |
quiet |
Logical; if TRUE (default), rendering is quiet. |
Value
Character string with the path to the rendered report.
Download and Process STRING Protein-Protein Interaction Network
Description
Downloads and processes the STRING protein-protein interaction network, converting it to a simplified igraph object. The function downloads the network from STRING database, filters interactions by confidence score, converts STRING IDs to ENTREZ IDs, and returns the largest connected component as an undirected graph.
Usage
get_string_network(
organism = "9606",
score_threshold = 700,
use_default = TRUE
)
Arguments
organism |
Character string specifying the NCBI taxonomy identifier. Default is "9606" (Homo sapiens). |
score_threshold |
Numeric value between 0 and 1000 specifying the minimum combined score threshold for including interactions. Default is 700. |
use_default |
Logical. If TRUE (default), returns the default network (organism 9606, score threshold 700). |
Details
The function performs the following steps:
Downloads protein interactions from STRING database
Filters interactions based on combined score
Downloads and processes STRING ID to ENTREZ ID mappings
Creates an igraph object with filtered interactions
Removes self-loops and multiple edges
Extracts the largest connected component
Value
An igraph object representing the largest connected component of the filtered STRING network, with the following properties:
Undirected edges
No self-loops
No multiple edges
Edge weights (1000 - combined_score)
Vertex names as ENTREZ IDs
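The processing steps listed in Details can be sketched on a small in-memory table. This is an illustrative example under stated assumptions, not the function's implementation: the table `ppi` and its column names are hypothetical stand-ins for a downloaded STRING file, the download and the STRING-to-ENTREZ mapping steps are omitted (the toy IDs are treated as if already mapped), and taking the minimum weight when merging duplicate edges is an arbitrary choice.

```r
library(igraph)

# Hypothetical toy interaction table standing in for a STRING download.
ppi <- data.frame(
  protein1       = c("1", "1", "2", "3", "3", "5", "6"),
  protein2       = c("2", "2", "3", "3", "4", "6", "7"),
  combined_score = c(900, 900, 750, 999, 800, 950, 400),
  stringsAsFactors = FALSE
)
score_threshold <- 700

# Filter by combined score and derive edge weights (1000 - combined_score).
kept <- ppi[ppi$combined_score >= score_threshold, ]
kept$weight <- 1000 - kept$combined_score

# Build an undirected graph, then drop self-loops and duplicate edges.
g <- graph_from_data_frame(kept[, c("protein1", "protein2", "weight")],
                           directed = FALSE)
g <- simplify(g, remove.multiple = TRUE, remove.loops = TRUE,
              edge.attr.comb = "min")

# Keep only the largest connected component.
comp    <- components(g)
largest <- which.max(comp$csize)
g_lcc   <- induced_subgraph(g, which(comp$membership == largest))
```

Because the weight decreases as the confidence score increases, high-confidence interactions become cheap edges, which matches the edge-cost convention used by the PCSF.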
Identify sex-specific and sex-dimorphic genes
Description
This function identifies truly sex-specific and sex-dimorphic genes by analyzing differential expression results from both sexes.
Usage
identify_sex_specific_genes(
male_results,
female_results,
target_fdr = 0.05,
exclude_fdr = 0.5
)
Arguments
male_results |
Data frame of differential expression results for males (from differential_expression). |
female_results |
Data frame of differential expression results for females (from differential_expression). |
target_fdr |
Numeric. FDR threshold for significant differential expression (default: 0.05). |
exclude_fdr |
Numeric. FDR threshold for excluding effects in the opposite sex (default: 0.5). |
Details
This function implements a two-step approach to identify sex-specific effects:
1. Identifies genes significantly affected in one sex (target_fdr)
2. Confirms lack of effect in the other sex (exclude_fdr)
Additionally identifies genes with opposite (dimorphic) or same (shared) effects in both sexes.
Value
A data frame with identified genes categorized as:
- male-specific: significant in males, not significant in females
- female-specific: significant in females, not significant in males
- sex-dimorphic: significant in both sexes with opposite effects
- sex-shared: significant in both sexes with same direction
Including columns for gene IDs, logFC values, and FDR values for both sexes.
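The four categories can be reproduced on toy data with a few logical masks. The sketch below is a hedged base-R illustration of the two-step rule, not the package's code: the helper `classify_sex_effects()`, the column names `gene`, `logFC`, and `FDR`, and the use of strict inequalities are all assumptions made for the example.

```r
# Hypothetical per-sex differential expression results.
male <- data.frame(
  gene  = c("G1", "G2", "G3", "G4", "G5"),
  logFC = c(1.2, 0.1, 0.9, -0.8, 0.05),
  FDR   = c(0.01, 0.80, 0.02, 0.03, 0.90),
  stringsAsFactors = FALSE
)
female <- data.frame(
  gene  = c("G1", "G2", "G3", "G4", "G5"),
  logFC = c(0.1, -1.5, -1.1, -0.7, 0.02),
  FDR   = c(0.70, 0.001, 0.01, 0.04, 0.95),
  stringsAsFactors = FALSE
)

classify_sex_effects <- function(male, female,
                                 target_fdr = 0.05, exclude_fdr = 0.5) {
  m <- merge(male, female, by = "gene", suffixes = c("_male", "_female"))
  sig_m    <- m$FDR_male < target_fdr      # step 1: effect in males
  sig_f    <- m$FDR_female < target_fdr    # step 1: effect in females
  null_m   <- m$FDR_male > exclude_fdr     # step 2: no effect in males
  null_f   <- m$FDR_female > exclude_fdr   # step 2: no effect in females
  same_dir <- sign(m$logFC_male) == sign(m$logFC_female)

  m$category <- NA_character_
  m$category[sig_m & null_f]            <- "male-specific"
  m$category[sig_f & null_m]            <- "female-specific"
  m$category[sig_m & sig_f & !same_dir] <- "sex-dimorphic"
  m$category[sig_m & sig_f & same_dir]  <- "sex-shared"
  m[!is.na(m$category), ]
}

res <- classify_sex_effects(male, female)
```

Genes whose FDR in the opposite sex falls between `target_fdr` and `exclude_fdr` are left uncategorized in this sketch, which is the point of using a separate, looser exclusion threshold: absence of significance alone is not treated as evidence of no effect.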
Improved Pathway Enrichment Analysis
Description
Performs pathway enrichment analysis on a set of sex-biased genes using clusterProfiler.
Usage
improved_pathway_enrichment(
gene_list,
enrichment_db = "KEGG",
organism = "hsa",
org_db = org.Hs.eg.db,
pvalueCutoff = 0.05,
qvalueCutoff = 0.2
)
Arguments
gene_list |
A character vector of gene identifiers. |
enrichment_db |
Character string specifying the database for enrichment. Options include "KEGG", "GO", and "Reactome". Default is "KEGG". |
organism |
Character string specifying the organism code (e.g., "hsa" for human). |
org_db |
OrgDb annotation database for the organism (e.g., org.Hs.eg.db). |
pvalueCutoff |
Numeric. P-value cutoff for enrichment (default: 0.05). |
qvalueCutoff |
Numeric. Q-value cutoff for enrichment (default: 0.2). |
Value
An enrichment result object.
Plot a Condition-Specific protein-protein interaction network with DEG Annotations
Description
Visualizes a gene regulatory or protein–protein interaction network for a given cell type and differential expression group. Nodes are sized and colored by degree, and key hub genes are optionally annotated with barplots of their log fold-changes in each sex.
Usage
plot_network(g, cell_type, DEG_type, result_categories)
Arguments
g |
An 'igraph' object representing the gene or protein interaction network. |
cell_type |
Character string. The cell type label used in the plot title. |
DEG_type |
Character string. The differential expression category to visualize (e.g., '"sex-dimorphic"'). |
result_categories |
A 'data.frame' or tibble containing at least the columns: '"DEG_Type"', '"Gene_Symbols"', '"Male_avg_logFC"', and '"Female_avg_logFC"'. |
Value
A 'ggplot' object representing the visualized network.
Perform Sex-Phenotype Interaction Analysis for Bulk Data (Interaction Term)
Description
This function performs a formal interaction analysis on bulk expression data to identify genes whose expression is significantly modulated by the interaction between sex and a given phenotype/condition. It uses a linear model with a multiplicative interaction term ('phenotype * sex').
Usage
sex_interaction_analysis_bulk(
x,
phenotype,
gender,
phenotype_labels = c("WT", "TG"),
sex_labels = c("F", "M")
)
Arguments
x |
A numeric matrix of expression data (features x samples). |
phenotype |
A character or factor vector indicating the condition for each sample. |
gender |
A character or factor vector indicating the sex for each sample. |
phenotype_labels |
Character vector. Labels for phenotype groups (default: c("WT", "TG")). |
sex_labels |
Character vector. Labels for sexes (default: c("F", "M")). |
Details
This function constructs a design matrix that includes a formal interaction term between the phenotype and sex (e.g., '~ phenotype * sex'). It then uses 'limma' to test for genes where the effect of the phenotype differs significantly between sexes. This is a statistically rigorous approach to identify sex-modulated genes.
Value
A data frame with differential expression statistics for the interaction term, including logFC, t-statistic, P-value, and adjusted P-value.
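To make the interaction term concrete, the sketch below builds a `~ phenotype * sex` design matrix in base R and fits it to one simulated gene with `lm()`. It is an illustration only: the simulated data and variable names are hypothetical, `lm()` stands in for the per-gene linear model that 'limma' fits, and the coefficient name shown here follows from the factor levels chosen in the example, not necessarily from the package's internal naming.

```r
set.seed(1)

# Balanced toy design: 3 samples per phenotype-by-sex cell.
phenotype <- factor(rep(c("WT", "TG"), each = 6), levels = c("WT", "TG"))
sex       <- factor(rep(c("F", "M"), times = 6), levels = c("F", "M"))

# Main effects plus the phenotype-by-sex interaction column.
design <- model.matrix(~ phenotype * sex)

# Simulated gene: expression rises by 2 units only in TG males, so the
# phenotype effect differs between sexes.
y <- 5 + 2 * (phenotype == "TG" & sex == "M") + rnorm(12, sd = 0.1)

fit <- lm(y ~ phenotype * sex)
interaction_est <- unname(coef(fit)["phenotypeTG:sexM"])
```

With F and WT as reference levels, the interaction coefficient estimates (TG minus WT in males) minus (TG minus WT in females), i.e. the difference in phenotype effect between the sexes. In a 'limma' workflow, the same design matrix would typically be passed to `lmFit()` and the interaction column selected via the `coef` argument of `topTable()`.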
Perform Sex-Phenotype Interaction Analysis for Single-Cell Data
Description
Performs differential difference analysis for a given cell type to identify genes modulated by sex-phenotype interactions using limma.
Usage
sex_interaction_analysis_sc(
seurat_obj,
target_cell_type,
sex_col = "sex",
phenotype_col = "status",
celltype_col = "cell_type",
min_logfc = 0.25,
fdr_threshold = 0.05,
sex_labels = c("F", "M"),
phenotype_labels = c("WT", "TG")
)
Arguments
seurat_obj |
A Seurat object. |
target_cell_type |
Character. Cell type to analyze. |
sex_col |
Character. Column name for sex (default "sex"). |
phenotype_col |
Character. Column name for phenotype (default "status"). |
celltype_col |
Character. Column name for cell type (default "cell_type"). |
min_logfc |
Numeric. Minimum absolute log fold change (default 0.25). |
fdr_threshold |
Numeric. FDR threshold for significance (default 0.05). |
sex_labels |
Character vector of sex labels (default c("F","M")). |
phenotype_labels |
Character vector of phenotype groups (default c("WT","TG")). |
Value
A list with complete DE results, significant results, and summary statistics.
Perform differential expression analysis within each sex
Description
This function identifies differentially expressed genes between conditions separately for each sex using a linear modeling approach.
Usage
sex_stratified_analysis_bulk(
x,
phenotype,
gender,
analysis_type = c("male", "female")
)
Arguments
x |
A numeric matrix of expression data (features × samples). |
phenotype |
A vector indicating condition labels for each sample. |
gender |
A vector indicating gender for each sample. Labels must start with "f" (female) and "m" (male). |
analysis_type |
Character. Type of analysis to perform: "dimorphic" (difference in differences), "female" (female condition effect), or "male" (male condition effect). Default is "dimorphic". |
Details
This function performs differential expression analysis within each sex separately. For male analysis, it compares conditions within males. For female analysis, it compares conditions within females. For dimorphic analysis, it tests for difference in condition effects between sexes. Note: To identify truly sex-specific genes, use the output of this function as input for identify_sex_specific_genes().
Value
A data frame with differential expression statistics including logFC, AveExpr, t-statistic, P-value, and adjusted P-value.
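The within-sex comparison can be sketched with plain group means. This is a deliberately simplified stand-in for the linear model the function fits: the helper `within_sex_logfc()` and the toy matrix are hypothetical, no test statistics or P-values are computed, and the only convention borrowed from the documentation is that gender labels start with "f" or "m".

```r
# Toy log-expression matrix (features x samples); all values are hypothetical.
x <- matrix(
  c(5, 7, 5, 7, 5, 5, 5, 5,   # geneA: changes in females only
    3, 3, 3, 3, 4, 2, 4, 2),  # geneB: changes in males only
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), paste0("s", 1:8))
)
phenotype <- rep(c("control", "disease"), times = 4)
gender    <- rep(c("female", "male"), each = 4)

# Difference in mean log-expression between the two conditions, computed
# only on the samples whose gender label starts with `prefix`.
within_sex_logfc <- function(x, phenotype, gender, prefix) {
  keep <- grepl(paste0("^", prefix), tolower(gender))
  ph   <- factor(phenotype[keep])
  xs   <- x[, keep, drop = FALSE]
  rowMeans(xs[, ph == levels(ph)[2], drop = FALSE]) -
    rowMeans(xs[, ph == levels(ph)[1], drop = FALSE])
}

female_lfc <- within_sex_logfc(x, phenotype, gender, "f")
male_lfc   <- within_sex_logfc(x, phenotype, gender, "m")
```

Here geneA shows an effect only in females and geneB only in males. Per-sex tables of this kind, with proper statistics attached, are what identify_sex_specific_genes() expects as input.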
Compute sex-specific differentially expressed genes (DEGs)
Description
Identifies differentially expressed genes (DEGs) separately for male and female samples within different cell types using the Seurat package. Compares gene expression between control and perturbed groups in each sex.
Usage
sex_stratified_analysis_sc(
seurat_obj,
sex_column = "sex",
phenotype_column = "status",
celltype_column = "cell_type",
sex_labels_vector = c("F", "M"),
min_logfc = 0.25,
phenotype_labels_vector = c("WT", "TG"),
method = "wilcox"
)
Arguments
seurat_obj |
Seurat object containing the single-cell data. |
sex_column |
Character. Column name in metadata for sex (default "sex"). |
phenotype_column |
Character. Column name in metadata for phenotype (default "status"). |
celltype_column |
Character. Column name in metadata for cell type (default "cell_type"). |
sex_labels_vector |
Character vector of sex labels (default c("F","M")). |
min_logfc |
Numeric. Minimum absolute log fold change threshold (default 0.25). |
phenotype_labels_vector |
Character vector of phenotype groups (default c("WT","TG")). |
method |
Character. Statistical test to use for differential expression (default "wilcox"). |
Value
A list with male and female DEGs results.
Visualize Gene Regulatory Network with Pie Charts
Description
Plots a network with nodes represented by pie charts that display male and female effects.
Usage
visualize_network(
g,
female_res,
male_res,
vertex.size = 5,
vertex.label.cex = 0.8,
...
)
Arguments
g |
An igraph network object. |
female_res |
Differential expression results for females. |
male_res |
Differential expression results for males. |
vertex.size |
Size of the network nodes. |
vertex.label.cex |
Text size for vertex labels. |
... |
Additional graphical parameters. |
Value
The modified igraph object with visualization attributes.