When pulling schema structure directly from database, you may decide which schema information should be saved in the configuration yaml file. The proper configuration defined with set_faker_opts
should be passed to faker_opts
parameters of schema_source
function:
<- source_schema(
schema source = conn,
schema = "public",
faker_opts = set_faker_opts(...)
)
DataFakeR currently offers two configuration types:
The current version of DataFakeR package supports five types (R target types) of columns:
Each column-type configuration is done by setting:
set_faker_opts(opt_pull_<type> = opt_pull_<type>(...))
The possible configurable parameters are (with supported types):
values (all types)
Should column unique values be sourced? If so the ones are stored as an array withing values parameter.max_uniq_to_pull (all types)
Pull unique values only when the distinct number of them is less than provided value. The parameter prevents for sourcing large amount of values to configuration file for example when dealing with ids column.nchar (character)
Should maximum number of characters in column be pulled? Is so stored as nchar parameter in configuration YAML file.range (numeric, integer, date)
Should column range be sourced? Is so stored as range parameter in configuration YAML file.na_ratio (all types)
Should column source ratio of NA values existing in the original column?The information stored by the above parameters may then be used in the simulation methods.
The default parameters can be accessed respectively from default_faker_opts
object, for example:
character columns:
$opt_pull_character
default_faker_opts#> $values
#> [1] TRUE
#>
#> $max_uniq_to_pull
#> [1] 10
#>
#> $nchar
#> [1] TRUE
#>
#> $na_ratio
#> [1] TRUE
#>
#> $levels_ratio
#> [1] TRUE
means, by default we save in the existing column values only when number of its unique values is less than 10. We will be also storing maximum number of character for strings in column.
integer columns:
$opt_pull_integer
default_faker_opts#> $values
#> [1] TRUE
#>
#> $max_uniq_to_pull
#> [1] 10
#>
#> $range
#> [1] TRUE
#>
#> $na_ratio
#> [1] TRUE
#>
#> $levels_ratio
#> [1] FALSE
means the same for sourcing possible values as for character type, more to that we will source the column values range.
Such configuration for sample book authors table, may result with the below structure:
ID | Author | Digest |
---|---|---|
1 | Miss Madelyn Crist MD | Digest A |
2 | Merritt Gislason IV | Digest A |
3 | Linton Botsford | Digest A |
4 | Isam Bins-Shanahan | Digest A |
5 | Ora Stark | Digest A |
6 | Priscila Auer | Digest A |
7 | Ms. Addie Grady DDS | Digest B |
8 | Dr. Wayman Halvorson V | Digest B |
9 | Kesha Legros | Digest B |
10 | Gay Hoppe | Digest B |
11 | Yolanda Greenholt | Digest B |
authors:
columns:
ID:
type: serial
unique: true
not_null: true
default: na.integer
range: [1, 11]
author:
type: varchar
unique: true
not_null: true
default: na.character
nchar: 23
digest:
type: varchar
unique: false
not_null: true
default: na.character
values: [Digest A, Digest B]
nchar: 8
type
, unique
, not_null
and default
are always sourced,nchar = 23
means max string length in the author column was 23 characters.If we want to not source range
and nchar
information just precise:
<- set_faker_opts(
my_opts opt_pull_integer = opt_pull_integer(range = FALSE),
opt_pull_character = opt_pull_character(nchar = FALSE)
)
and pass my_opts
to faker_opts
parameter of schema_source
function.
Can be achieved by specifying opt_pull_table
option with the method of the same name.
In the current version of DataFakeR package only one parameter (nrows
) can be configured, with the three values:
exact
- the number of rows of original table will be stored (and saved in configuration as nrows
value),ratio
- the ratio of rows for table across all the tables in schema will be stored (and saved in configuration as nrows
value),none
- number of rows will not be sourced.Such information can be further used to define number of rows in simulated table (see simulation options).
Note: In the future DataFakeR releases the option to define custom parameters will be enabled.