[Experimental]

Analyse a person-to-person (P2P) network query, with multiple visualisation and analysis output options. Pass a data frame containing a person-to-person query and return a network visualisation. Options are available for community detection using either the Louvain or the Leiden algorithm.

network_p2p(
  data,
  hrvar = "Organization",
  display = "hrvar",
  return = "plot",
  path = paste0("network_p2p_", display),
  desc_hrvar = c("Organization", "LevelDesignation", "FunctionType"),
  bg_fill = "#FFFFFF",
  font_col = "grey20",
  legend_pos = "bottom",
  palette = "rainbow",
  node_alpha = 0.7,
  edge_alpha = 1,
  res = 0.5,
  seed = 1,
  algorithm = "mds",
  size_threshold = 5000,
  weight = "StrongTieScore"
)

Arguments

data

Data frame containing a person-to-person query.

hrvar

String containing the label for the HR attribute.

display

String determining whether and how communities are computed. Valid values include:

  • "hrvar" (default): compute analysis or visuals without computing communities.

  • "louvain": compute analysis or visuals with community detection, using the Louvain algorithm.

  • "leiden": compute analysis or visuals with community detection, using the Leiden algorithm. This requires all prerequisites of the leiden package to be installed, including Python and reticulate.

return

String specifying what output to return. This must be one of the following strings:

  • 'plot' (default)

  • 'sankey'

  • 'table'

  • 'data'

  • 'describe'

  • 'network'

See Value for more information.

path

File path for saving the PDF output. Defaults to a timestamped path based on the current parameters. Set to NULL to print the plot to the console instead of saving it.

desc_hrvar

Character vector of length 3 containing the HR attributes to use when returning the "describe" output. See network_describe().

bg_fill

String to specify background fill colour.

font_col

String to specify font and link colour.

legend_pos

String to specify the position of the legend. Defaults to "bottom". See ggplot2::theme(). This applies to both the 'ggraph' and the fast plotting methods. Valid inputs include:

  • "bottom"

  • "top"

  • "left"

  • "right"

palette

Function for generating a colour palette with a single argument n. Uses "rainbow" by default.
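The default "rainbow" corresponds to grDevices::rainbow(), which takes the number of colours n as its first argument. The sketch below defines a hypothetical alternative, my_palette(), with the same single-argument shape. Whether network_p2p() expects the function itself or its name as a string is not stated here, so treat how you pass it as an assumption to verify.

```r
# Hypothetical custom palette: takes a single argument n and returns n colours,
# the same shape as grDevices::rainbow()
my_palette <- function(n) {
  grDevices::hcl.colors(n, palette = "Dark 3")
}

# Look up the default palette by its name and call it with n = 6
default_cols <- do.call("rainbow", list(n = 6))

# The custom palette also returns one colour per group
custom_cols <- my_palette(6)
```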

node_alpha

A numeric value between 0 and 1 to specify the transparency of the nodes. Defaults to 0.7.

edge_alpha

A numeric value between 0 and 1 to specify the transparency of the edges (only for 'ggraph' mode). Defaults to 1.
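In R, an alpha value like these is typically stored as the last two hex digits of a "#RRGGBBAA" colour string. As a standalone illustration (not necessarily how network_p2p() applies it internally), grDevices::adjustcolor() shows the effect of the defaults node_alpha = 0.7 and edge_alpha = 1 on example colours:

```r
# Apply the default node and edge alpha values to example colours
node_col <- grDevices::adjustcolor("steelblue", alpha.f = 0.7)  # semi-transparent
edge_col <- grDevices::adjustcolor("grey20", alpha.f = 1)       # fully opaque

# The alpha channel is the last two hex digits of "#RRGGBBAA"
node_alpha_hex <- substr(node_col, 8, 9)
edge_alpha_hex <- substr(edge_col, 8, 9)
```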

res

Resolution parameter to be passed to leiden::leiden(). Defaults to 0.5.
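For intuition, a resolution parameter usually scales the null-model term of the quality function that community detection optimises. The sketch below assumes the resolution-weighted modularity of the Reichardt-Bornholdt configuration model, which, to the best of our knowledge, is the default partition type of leiden::leiden() (up to a constant factor). The helper modularity_res() and the toy graph are hypothetical, and the code does not call leiden::leiden().

```r
# Hypothetical helper: modularity with a resolution parameter gamma,
# Q = (1 / 2m) * sum_ij [A_ij - gamma * k_i * k_j / (2m)] * delta(c_i, c_j)
modularity_res <- function(A, membership, gamma = 1) {
  k <- rowSums(A)                               # node degrees
  two_m <- sum(A)                               # 2m for an undirected graph
  same <- outer(membership, membership, "==")   # delta(c_i, c_j)
  sum((A - gamma * outer(k, k) / two_m) * same) / two_m
}

# Toy graph: two triangles (1-2-3 and 4-5-6) joined by the edge 3-4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
A <- matrix(0, 6, 6)
A[edges] <- 1
A[edges[, 2:1]] <- 1

by_triangle <- c(1, 1, 1, 2, 2, 2)   # one community per triangle
one_group <- rep(1, 6)               # everything in a single community

q_split_1 <- modularity_res(A, by_triangle, gamma = 1)     # 5/14
q_single_1 <- modularity_res(A, one_group, gamma = 1)      # 0
q_split_low <- modularity_res(A, by_triangle, gamma = 0.2) # about 0.757
q_single_low <- modularity_res(A, one_group, gamma = 0.2)  # 0.8
```

Under these assumptions, at gamma = 1 splitting the triangles scores higher than a single community, while at gamma = 0.2 the single community wins. This is why lowering res tends to produce fewer, larger communities.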

seed

Seed for the random number generator, passed either to set.seed() when the Louvain algorithm is used or to leiden::leiden() when the Leiden algorithm is used, so that community assignments are reproducible across runs. Only applicable when display is set to "louvain" or "leiden".
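Base R alone is enough to show why fixing the seed makes results repeatable. The sketch below uses sample() as a stand-in for a random node ordering of the kind a Louvain run may depend on. It does not call network_p2p() or any community detection function.

```r
# Draw a random node order twice with the same seed
set.seed(1)
first_order <- sample(10)

set.seed(1)
second_order <- sample(10)

# Same seed, same random draws, so the two orders match
same_order <- identical(first_order, second_order)
```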

algorithm

String to specify the node placement algorithm to be used. Defaults to "mds" for the deterministic multi-dimensional scaling of nodes. See https://rdrr.io/cran/ggraph/man/layout_tbl_graph_igraph.html for a full list of options.
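Why "mds" is deterministic can be sketched in base R, assuming it follows the classical approach of igraph::layout_with_mds(): compute all-pairs shortest-path distances, then embed them in two dimensions with stats::cmdscale(). The toy graph below is hypothetical.

```r
# Toy graph: two triangles (1-2-3 and 4-5-6) joined by the edge 3-4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
n <- 6

# Start with direct hops only: 0 on the diagonal, 1 for edges, Inf otherwise
D <- matrix(Inf, n, n)
diag(D) <- 0
D[edges] <- 1
D[edges[, 2:1]] <- 1

# Floyd-Warshall: allow each node k in turn as an intermediate stop
for (k in seq_len(n)) {
  D <- pmin(D, outer(D[, k], D[k, ], "+"))
}

# Classical MDS: 2-D coordinates whose distances approximate D
coords <- stats::cmdscale(D, k = 2)

# No random initialisation is involved, so a second run gives the same layout
coords_again <- stats::cmdscale(D, k = 2)
```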

size_threshold

Numeric value representing the maximum number of edges before network_p2p() switches to a more efficient, but less elegant, plotting method (native igraph). Defaults to 5000. Set to 0 to always use the fast plotting method, and to Inf to always use the default plotting method (with 'ggraph').
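The switching rule reduces to a single comparison on the edge count. The helper below is hypothetical and assumes a strict greater-than comparison, because the exact comparison used inside network_p2p() is not stated here.

```r
# Hypothetical helper: should the fast (native igraph) plotting method be used?
use_fast_plot <- function(n_edges, size_threshold = 5000) {
  n_edges > size_threshold
}

use_fast_plot(1200)                        # FALSE: small network, ggraph
use_fast_plot(1200, size_threshold = 0)    # TRUE: 0 forces the fast method
use_fast_plot(1e6, size_threshold = Inf)   # FALSE: Inf always keeps ggraph
```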

weight

String to specify which column to use as weights for the network. Defaults to "StrongTieScore". To create a graph without weights, supply NULL to this argument.
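To see what the weight column changes, compare unweighted degree with weighted degree (node strength) on a toy edge list. The from and to column names and the scores are hypothetical. Only StrongTieScore comes from this argument's default. The sketch uses base R only, with no graph package.

```r
# Hypothetical edge list with a StrongTieScore weight per edge
edges <- data.frame(
  from = c("A", "A", "B"),
  to = c("B", "C", "C"),
  StrongTieScore = c(0.9, 0.1, 0.5),
  stringsAsFactors = FALSE
)

nodes <- sort(unique(c(edges$from, edges$to)))
ends <- factor(c(edges$from, edges$to), levels = nodes)  # both endpoints of each edge
w <- rep(edges$StrongTieScore, times = 2)                 # weight for each endpoint

degree <- table(ends)              # like weight = NULL: every edge counts as 1
strength <- tapply(w, ends, sum)   # like weight = "StrongTieScore": sum of weights
```

All three nodes have the same degree, but B carries the strongest ties. A weighted graph keeps that distinction and an unweighted one discards it.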

Value

A different output is returned depending on the value passed to the return argument:

  • 'plot': return a network plot.

  • 'sankey': return a Sankey plot combining communities and the HR attribute. This is only valid if a community detection method is selected via display.

  • 'table': return a vertex summary table with counts by community and HR attribute.

  • 'data': return a vertex data frame that matches vertices with communities and HR attributes.

  • 'describe': return a list of data frames which describe each of the identified communities. The first data frame is a summary table of all the communities. This is only valid if a community detection method is selected via display.

  • 'network': return an 'igraph' object.
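To show roughly how the 'data' and 'table' outputs relate, the sketch below cross-tabulates the first four vertex rows of the 'data' example in Examples with base R's table(). The exact layout of the 'table' output may differ. This is an illustration, not its implementation.

```r
# First four vertex rows from the 'data' example output (Louvain communities)
vertices <- data.frame(
  name = c("SIM_ID_1", "SIM_ID_2", "SIM_ID_3", "SIM_ID_5"),
  Organization = c("Org F", "Org F", "Org E", "Org C"),
  cluster = c("1", "1", "2", "2"),
  stringsAsFactors = FALSE
)

# Count vertices in each combination of community and HR attribute
counts <- table(cluster = vertices$cluster, Organization = vertices$Organization)
```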

Running Leiden communities

Running Leiden community detection requires the Python dependencies of the leiden package to be installed. Once they are, you can run the following:

# Return a network plot to console, coloured by Leiden communities
p2p_data %>%
  network_p2p(display = "leiden",
              path = NULL,
              return = "plot")

When installing the 'leiden' package, you may be required to install the Python libraries 'python-igraph' and 'leidenalg'. You can install them with:

reticulate::py_install("python-igraph")
reticulate::py_install("leidenalg")
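Before choosing display = "leiden", you may want to check that these prerequisites are in place and fall back to Louvain if they are not. leiden_ready() below is a hypothetical helper that does not belong to any package. It relies only on requireNamespace() and reticulate::py_module_available(), and it assumes the Python modules are importable as igraph and leidenalg.

```r
# Hypothetical helper: are the leiden package and its Python modules available?
leiden_ready <- function() {
  requireNamespace("leiden", quietly = TRUE) &&
    requireNamespace("reticulate", quietly = TRUE) &&
    reticulate::py_module_available("igraph") &&     # provided by python-igraph
    reticulate::py_module_available("leidenalg")
}

# Fall back to Louvain communities when the Leiden prerequisites are missing
display_mode <- if (leiden_ready()) "leiden" else "louvain"
```

display_mode can then be passed as the display argument of network_p2p().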

Examples

# Simulate a small person-to-person dataset
p2p_data <- p2p_data_sim(size = 50)

# Return a network plot to console, coloured by hrvar
p2p_data %>%
  network_p2p(display = "hrvar",
              path = NULL,
              return = "plot")


# Return a network plot to console, coloured by Louvain communities
p2p_data %>%
  network_p2p(display = "louvain",
              path = NULL,
              return = "plot")



# Return a network plot to console
# Coloured by HR attribute
# Using the Large Graph Layout (lgl) algorithm
# Force the use of fast plotting method
p2p_data %>%
  network_p2p(display = "hrvar",
              path = NULL,
              return = "plot",
              algorithm = "lgl",
              size_threshold = 0)
#> Using fast plot method due to large network size...


# Return a data frame matching HR variable and communities to nodes
# Using Louvain communities
p2p_data %>%
  network_p2p(display = "louvain",
              return = "data",
              algorithm = "fr")
#> # A tibble: 50 × 3
#>    name      Organization cluster
#>    <chr>     <chr>        <chr>  
#>  1 SIM_ID_1  Org F        1      
#>  2 SIM_ID_2  Org F        1      
#>  3 SIM_ID_3  Org E        2      
#>  4 SIM_ID_5  Org C        2      
#>  5 SIM_ID_6  Org B        2      
#>  6 SIM_ID_7  Org A        2      
#>  7 SIM_ID_8  Org D        2      
#>  8 SIM_ID_9  Org E        2      
#>  9 SIM_ID_10 Org C        2      
#> 10 SIM_ID_11 Org F        2      
#> # … with 40 more rows