---
title: "Getting Started with perumammals"
author: "Paul Efren Santos Andrade"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with perumammals}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(perumammals)
```

## Introduction

The **perumammals** package provides tools for working with Peru's mammalian biodiversity. It includes a curated taxonomic backbone based on Pacheco et al. (2021), the most comprehensive and up-to-date synthesis of Peruvian mammal diversity.

This vignette will show you how to:

- Install and load the package
- Access the mammal species data
- Validate species names
- Identify endemic species
- Explore species by ecoregion
- Work with taxonomic families

## Installation

You can install the development version of perumammals from GitHub:
```{r install, eval=FALSE}
# Using pak (recommended)
pak::pak("PaulESantos/perumammals")

# Or using remotes
remotes::install_github("PaulESantos/perumammals")
```

## Loading the package
```{r setup}
library(perumammals)
```

## Available dataset

The main dataset included in the package is the species list provided as an appendix in Pacheco et al. (2021):

```{r datasets}
# Main species backbone
data(peru_mammals)
head(peru_mammals)
```

## Basic name validation

The core function `validate_peru_mammals()` checks if species names are present in the Peruvian mammal checklist:
```{r validate-basic}
# A vector of species names to validate
species_list <- c(
  "Puma concolor",           # Valid name
  "Tremarctos ornatus",      # Valid name
  "Panthera onca",           # Valid name
  "Lycalopex sechurae",      # Valid name
  "Odocoileus virginianus",  # Valid name
  "Puma concolar"            # Misspelled
)

results <- validate_peru_mammals(species_list)
results
```

## Quick checks with wrapper functions

### Check if species occur in Peru
```{r is-peru}
# Returns TRUE/FALSE
is_peru_mammal(species_list)
```

### Identify endemic species
```{r endemic}
# Check which species are endemic to Peru
# (this redefines species_list; the data frame example below reuses it)
species_list <- c(
  "Thomasomys notatus",
  "Tremarctos ornatus",
  "Eptesicus mochica",
  "Puma concolar"            # Misspelled
)

is_endemic_peru(species_list)

# Recode the endemic status into custom labels
endemic_status <- ifelse(
  is_endemic_peru(species_list) == "Endemic to Peru",
  "Endemic",
  "Not endemic"
)
endemic_status
```

### Check match quality

```{r match-quality}
# Get match quality levels
match_quality_peru(species_list)
```

## Working with data frames

The validation functions integrate smoothly with data frames and the tidyverse:
```{r dataframe, warning=FALSE, message=FALSE}
library(dplyr)

# Create a sample dataset
my_data <- tibble(
  species = species_list,
  abundance = c(5, 3, 2, 8)
)

# Add validation results
my_data_validated <- my_data |> 
  mutate(
    in_peru = is_peru_mammal(species),
    endemic = is_endemic_peru(species),
    match_quality = match_quality_peru(species)
  )

my_data_validated
```

## Exploring taxonomic families

### List all families
```{r families}
# Get summary of all families
families <- pm_list_families()
families

# Families with highest species richness
families |> 
  arrange(desc(n_species)) |> 
  head(10)
```

### Filter by family

```{r family-filter}
# Summary for the leaf-nosed bats (family Phyllostomidae)
pm_list_families() |> 
  filter(family == "Phyllostomidae")

# Species list for the same family
pm_species(family = "Phyllostomidae")
```

## Analyzing endemic species

### Get endemic species list
```{r endemic-list}
# List all endemic species
endemic_mammals <- pm_species(endemic = TRUE)
endemic_mammals

# Endemic species by family
endemic_mammals |> 
  group_by(family) |> 
  summarise(n_species = n_distinct(scientific_name)) |> 
  arrange(desc(n_species)) |> 
  head(10)
```

### Endemic species by ecoregion
```{r endemic-ecoregion}
# Compare endemism across ecoregions
endemic_rate <- pm_list_ecoregions(include_endemic = TRUE)
endemic_rate

# Endemic species in Yungas
pm_by_ecoregion(ecoregion = "YUN", endemic = TRUE)
```
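One way to read an endemism comparison is as the share of each ecoregion's species that are endemic. The base R sketch below computes that share from a small, made-up long-format table (`toy_occurrences`, one row per species and ecoregion, with a logical `endemic` column). The table, its column names, and the ecoregion codes `E1` and `E2` are hypothetical, and the output of `pm_list_ecoregions()` may be structured differently.

```r
# Hypothetical long-format table: one row per species-ecoregion pair
toy_occurrences <- data.frame(
  scientific_name = c("Species A", "Species A", "Species B",
                      "Species C", "Species D", "Species E"),
  ecoregion = c("E1", "E2", "E1", "E1", "E2", "E2"),
  endemic = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
)

# Remove repeated rows so each species counts once per ecoregion
toy_occurrences <- unique(toy_occurrences)

# Distinct species and endemic species per ecoregion
n_species <- tapply(toy_occurrences$scientific_name, toy_occurrences$ecoregion,
                    function(x) length(unique(x)))
n_endemic <- tapply(toy_occurrences$endemic, toy_occurrences$ecoregion, sum)

endemic_summary <- data.frame(
  ecoregion = names(n_species),
  n_species = as.vector(n_species),
  n_endemic = as.vector(n_endemic[names(n_species)])
)

# Percentage of each ecoregion's species that are endemic
endemic_summary$pct_endemic <- round(
  100 * endemic_summary$n_endemic / endemic_summary$n_species, 1
)
endemic_summary
```

With dplyr, the same summary could be written with `group_by()` and `summarise()` on whichever species-by-ecoregion table is at hand.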

## Ecoregion analysis

### Species distribution across ecoregions

```{r ecoregion-dist}
# Count species per ecoregion
pm_list_ecoregions()
```

### Species with widest distribution

```{r wide-distribution}
# Ten species recorded in the most ecoregions (ties are kept)
peru_mammals_ecoregions |> 
  count(scientific_name, name = "n_ecoregions") |> 
  slice_max(n_ecoregions, n = 10)
```
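The count above equals the number of ecoregions per species only if the table holds at most one row per species and ecoregion. If a long-format table might contain repeated pairs, counting rows would overstate how widespread a species is. The base R sketch below uses a hypothetical `toy_occurrences` table with placeholder species and ecoregion codes; it de-duplicates the pairs first and then ranks species by the number of distinct ecoregions.

```r
# Hypothetical long-format table; "Species A" has a repeated E1 row
toy_occurrences <- data.frame(
  scientific_name = c("Species A", "Species A", "Species A", "Species A",
                      "Species B", "Species C", "Species C"),
  ecoregion = c("E1", "E1", "E2", "E3", "E1", "E1", "E2")
)

# Keep one row per species-ecoregion pair before counting
pairs <- unique(toy_occurrences[, c("scientific_name", "ecoregion")])

# Distinct ecoregions per species, most widespread first
n_ecoregions <- sort(table(pairs$scientific_name), decreasing = TRUE)
head(n_ecoregions, 10)
```

In a dplyr pipeline, the equivalent safeguard is a `distinct()` call on the species and ecoregion columns before `count()`.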

## Practical examples

### Example 1: Data cleaning workflow
```{r cleaning}
# Messy species list from field observations
field_data <- tibble(
  location = c("Manu", "Tambopata", "Paracas", "Cusco", "Lima"),
  species_name = c(
    "puma concolor",           # lowercase
    "Tremarctos ornatu",       # missing 's'
    "Otaria flavescens",       # marine mammal
    "Lycalopex sechure",       # missing 'a'
    "Unknown bat"              # invalid
  ),
  count = c(2, 1, 15, 3, 8)
)

# Validate and clean
field_data_clean <- field_data |> 
  mutate(
    # Validate names
    validated = validate_peru_mammals(species_name)$Matched.Name,
    # Check if in Peru
    in_checklist = is_peru_mammal(species_name),
    # Get match quality
    quality = match_quality_peru(species_name)
  )

field_data_clean
```
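Some clean-up can also happen before validation. The sketch below defines a hypothetical helper, `standardize_name()` (not part of perumammals), that collapses extra whitespace and applies binomial capitalization (genus capitalized, epithet lowercase) using only base R. It assumes two-part scientific names: because it lowercases everything after the first letter, it would also lowercase any author name appended to the input.

```r
# Hypothetical helper (not exported by perumammals): tidy raw binomials
standardize_name <- function(x) {
  # Collapse runs of whitespace, trim the ends, and lowercase everything
  x <- tolower(trimws(gsub("\\s+", " ", x)))
  # Capitalize the first letter (the genus initial)
  out <- paste0(toupper(substring(x, 1, 1)), substring(x, 2))
  # Propagate missing values (paste0() would turn NA into the string "NANA")
  out[is.na(x)] <- NA_character_
  out
}

raw_names <- c("puma concolor", "  Tremarctos   ornatus ", "LYCALOPEX SECHURAE")
standardize_name(raw_names)
```

For example, `field_data$species_name <- standardize_name(field_data$species_name)` would turn `"puma concolor"` into `"Puma concolor"` before the names reach `validate_peru_mammals()`.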

### Example 2: Endemic species summary

```{r endemic-summary}
# Get all endemic mammals
endemic_species <- pm_species(endemic = TRUE)
endemic_species

# Total endemic species by order
endemic_species |> 
  count(order, name = "n_endemic") |> 
  arrange(desc(n_endemic))
```

### Example 3: Ecoregion-specific analysis

```{r ecoregion-analysis}
# Focus on Selva Baja (Amazon lowlands)
selva_baja_species <- pm_by_ecoregion(ecoregion = "SB")
selva_baja_species

# Endemic species in Selva Baja
pm_by_ecoregion(ecoregion = "SB", endemic = TRUE) |> 
  count(family, name = "n_species") |> 
  arrange(desc(n_species))
```

## Advanced: Fuzzy matching details

The validation algorithm uses a hierarchical matching approach:

1. **Exact match**: the name matches an accepted name exactly
2. **Genus + fuzzy species**: exact genus, species epithet with minor spelling differences
3. **Fuzzy genus + exact species**: exact species epithet, genus with minor spelling differences
4. **Double fuzzy**: both genus and species epithet with minor spelling differences
5. **No match**: no acceptable match found

```{r fuzzy-details}
# Examples of different match levels
test_names <- c(
  "Puma concolor",              # Level: Exact
  "Tremarctos ornatus Cuvier",  # Name with author appended
  "Lycalopex sechure",          # Level: Genus + fuzzy species
  "Lyclopex sechurae",          # Level: Fuzzy genus + exact species
  "Panthera onca"               # Level: Exact
)

validate_peru_mammals(test_names) |> 
  select(Orig.Name, Matched.Name, matched)
```
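The hierarchy described above can be illustrated with a self-contained base R sketch. This is not the package's implementation: it compares genus and epithet separately with `utils::adist()` (generalized Levenshtein edit distance) against a hypothetical four-name `backbone`, uses an arbitrary cut-off of `max_dist = 2` edits, and returns the first level whose test finds a candidate. The helper `match_name()` is likewise hypothetical.

```r
# Hypothetical mini-backbone for illustration (not the package data)
backbone <- c("Puma concolor", "Tremarctos ornatus",
              "Lycalopex sechurae", "Panthera onca")

# Return the first matching level, in the order of the hierarchy above,
# plus the closest backbone name found at that level
match_name <- function(name, backbone, max_dist = 2) {
  parts <- strsplit(backbone, " ", fixed = TRUE)
  genera <- vapply(parts, `[`, character(1), 1)
  epithets <- vapply(parts, `[`, character(1), 2)

  # Only the first two words are compared; anything after is ignored
  query <- strsplit(trimws(name), "\\s+")[[1]]
  d_gen <- drop(adist(query[1], genera))   # edit distance to each genus
  d_sp <- drop(adist(query[2], epithets))  # edit distance to each epithet

  rules <- list(
    "Exact"                       = d_gen == 0 & d_sp == 0,
    "Genus + fuzzy species"       = d_gen == 0 & d_sp <= max_dist,
    "Fuzzy genus + exact species" = d_gen <= max_dist & d_sp == 0,
    "Double fuzzy"                = d_gen <= max_dist & d_sp <= max_dist
  )
  for (level in names(rules)) {
    hits <- which(rules[[level]])  # which() drops NA comparisons
    if (length(hits) > 0) {
      best <- hits[which.min(d_gen[hits] + d_sp[hits])]
      return(c(level = level, matched = backbone[best]))
    }
  }
  c(level = "No match", matched = NA_character_)
}

test_names <- c("Puma concolor", "Lycalopex sechure", "Lyclopex sechurae",
                "Pumma concolar", "Unknown bat")
t(vapply(test_names, match_name, character(2), backbone = backbone))
```

For `"Pumma concolar"` both parts are one edit away from `"Puma concolor"`, so only the double fuzzy level succeeds. A fixed edit cut-off is a simplification: two edits is a large change for a short epithet such as `onca`, so a real matcher might scale the threshold with name length. The sketch is also case-sensitive, so `"puma concolor"` would be treated as a fuzzy genus here.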

## Citation

When using this package, please cite both the package and the source data:
```{r citation, eval=FALSE}
citation("perumammals")
```

**Package citation:**
Santos Andrade, P. E., & Gonzales Guillen, F. N. (2025). perumammals: Taxonomic Backbone and Name Validation Tools for Mammals of Peru.

**Data source:**
Pacheco, V., Cadenillas, R., Zeballos, H., Hurtado, C. M., Ruelas, D., & Pari, A. (2021). Lista actualizada de la diversidad de los mamíferos del Perú y una propuesta para su actualización. *Revista Peruana de Biología*, 28(special issue), e21019. https://doi.org/10.15381/rpb.v28i4.21019
