This function calculates the Information Value (IV) for the selected numeric predictor variables in the dataset, given a specified outcome variable. The Information Value provides a measure of the predictive power of each variable in relation to the outcome variable, which can be useful in feature selection for predictive modeling.
Usage
create_IV(
data,
predictors = NULL,
outcome,
bins = 5,
siglevel = 0.05,
exc_sig = FALSE,
return = "plot"
)
Arguments
- data
A Person Query dataset in the form of a data frame.
- predictors
A character vector specifying the columns to be used as predictors. Defaults to NULL, where all numeric vectors in the data will be used as predictors.
- outcome
String specifying the column name for a binary variable, containing only the values 1 or 0.
- bins
Number of bins to use, defaults to 5.
- siglevel
Significance level to use in comparing populations for the outcomes, defaults to 0.05
- exc_sig
Logical value determining whether to exclude values where the p-value lies below what is set at
siglevel
. Defaults toFALSE
, where p-value calculation does not happen altogether.- return
String specifying what to return. This must be one of the following strings:
"plot"
"summary"
"list"
"plot-WOE"
"IV"
See
Value
for more information.
Value
A different output is returned depending on the value passed to the return
argument:
"plot"
: 'ggplot' object. A bar plot showing the IV value of the top (maximum 12) variables."summary"
: data frame. A summary table for the metric."list"
: list. A list of outputs for all the input variables."plot-WOE"
: A list of 'ggplot' objects that show the WOE for each predictor used in the model."IV"
returns a list object which mirrors the return inInformation::create_infotables()
.
Details
This is a wrapper around wpa::create_IV()
.
Examples
# Return a summary table of IV
pq_data %>%
dplyr::mutate(X = ifelse(Internal_network_size > 40, 1, 0)) %>%
create_IV(outcome = "X",
predictors = c("Email_hours",
"Meeting_hours",
"Chat_hours"),
return = "plot")
# Return summary
pq_data %>%
dplyr::mutate(X = ifelse(Internal_network_size > 40, 1, 0)) %>%
create_IV(outcome = "X",
predictors = c("Email_hours", "Meeting_hours"),
return = "summary")
#> Variable IV
#> 1 Email_hours 1.7810969
#> 2 Meeting_hours 0.9445303