
Detect character structure from datasets

Usage

detect_chars_structure_datasets(
  datasets_folderpath,
  considered_extensions,
  patterns,
  output_filepath = file.path(datasets_folderpath, paste0("detect_chars_structure_",
    basename(datasets_folderpath), ".rds")),
  get_output_in_session = TRUE
)

Arguments

datasets_folderpath

Character 1L. Path of the folder containing the datasets to process. The datasets must be at the root of this folder.

considered_extensions

Character. File extensions of the datasets to consider. Each extension must be one supported by the rio package.

patterns

Character. Patterns to detect across the variables of the datasets. Regular expressions are supported.

output_filepath

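Character 1L. Path of the output RDS file. Defaults to a file named detect_chars_structure_<folder name>.rds inside datasets_folderpath.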

get_output_in_session

Logical 1L. If TRUE, the function returns a list in which each element holds the pattern detection details for one of the considered datasets.
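
To illustrate what a value of patterns such as "(?i)college" matches, here is a minimal sketch using base R's grepl(). The sample values are hypothetical, and the use of grepl() with perl = TRUE is an assumption made for illustration; this page does not state which matching engine the function uses internally.

```r
# Hypothetical values of one character variable in a dataset.
values <- c("Boston College", "COLLEGE of Arts", "High school", NA)

# "(?i)" turns on case-insensitive matching; perl = TRUE selects the PCRE
# engine, which supports this inline flag. NA values yield FALSE.
hits <- grepl("(?i)college", values, perl = TRUE)

# Several patterns can be checked in turn, giving one logical vector per pattern.
patterns <- c("(?i)college", "^High")
hits_by_pattern <- lapply(patterns, grepl, x = values, perl = TRUE)
```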

Value

If get_output_in_session is TRUE, a named list of data frames (one per dataset file), each with columns var (variable name), any_defined_structure (logical), and examples (character). The list is also saved as an RDS file at output_filepath. If get_output_in_session is FALSE, the function returns NULL invisibly and is called for its side effect of writing the RDS file.
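
The returned structure can be explored with base R alone. The sketch below builds a hypothetical stand-in for the list (the dataset name "students.xlsx" and its contents are invented for illustration), writes it to an RDS file, reads it back, and keeps only the variables in which a pattern was detected.

```r
# Hypothetical stand-in for the function's return value: a named list with
# one data frame per dataset file, using the columns described above.
detect <- list(
  "students.xlsx" = data.frame(
    var = c("school_name", "age"),
    any_defined_structure = c(TRUE, FALSE),
    examples = c("Boston College", NA_character_),
    stringsAsFactors = FALSE
  )
)

# Round-trip through an RDS file, mirroring what is written at output_filepath.
outfile <- file.path(tempdir(), "detect_sketch.rds")
saveRDS(detect, outfile)
restored <- readRDS(outfile)

# Keep only the variables where a pattern was detected, per dataset.
matched <- lapply(
  restored,
  function(df) df[df$any_defined_structure, , drop = FALSE]
)
matched_vars <- matched[["students.xlsx"]]$var

unlink(outfile)
```

The same readRDS() call applies when get_output_in_session is FALSE and only the file on disk is available.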

Examples

mydir <- system.file("detect_chars_structure_datasets", package = "scrutr")
outfile <- file.path(tempdir(), "detect_college.rds")

detect <- detect_chars_structure_datasets(
  datasets_folderpath = mydir,
  considered_extensions = "xlsx",
  patterns = "(?i)college",
  output_filepath = outfile,
  get_output_in_session = TRUE)

# head(lapply(detect, head))

file.exists(outfile)
#> [1] TRUE
unlink(outfile)