Calculate sizes of unions and intersections between each pair of sets

getSetIntersections(df, setNames, idName)

Arguments

df

A data frame indicating set membership

setNames

A character vector of set names

idName

A string giving the name of the ID variable that identifies each item

Value

A data frame with variables:

  • set1 and set2: the pair of sets being compared

  • Ninter and Nunion: number of items in the intersection and union of the pair of sets

  • N1 and N2: number of items in each set

  • prop1 and prop2: proportion of items in each set that are also members of the other set (prop1 = Ninter / N1, prop2 = Ninter / N2)

  • prop: proportion of items from either set that are members of both sets (prop = Ninter / Nunion)

  • variables with suffix .pred (e.g. Ninter.pred): predictions assuming marginal independence of sets

  • variables with suffix .error (e.g. Ninter.error): difference between predictions and observations

  • variables with suffix .relError (e.g. Ninter.relError): error relative to observations
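To make these definitions concrete, the following base R sketch recomputes the quantities for a single pair of sets, using the counts from the Action/Adventure row of the example output further down. The formula used for the prediction and the sign of the error are assumptions based on the usual meaning of marginal independence; they are not confirmed to be what getSetIntersections computes internally.

```r
# Counts copied from the Action/Adventure row of the example output
Ntotal <- 27278
N1 <- 3520
N2 <- 2329
Ninter <- 972

# Inclusion-exclusion gives the size of the union
Nunion <- N1 + N2 - Ninter

# Proportions as defined in the list above
prop1 <- Ninter / N1
prop2 <- Ninter / N2
prop  <- Ninter / Nunion

# ASSUMPTION: under marginal independence P(set1 and set2) = P(set1) * P(set2),
# so the expected intersection size is Ntotal * (N1 / Ntotal) * (N2 / Ntotal)
Ninter.pred <- N1 * N2 / Ntotal

# ASSUMPTION: error is taken as prediction minus observation,
# and the relative error divides by the observed value
Ninter.error <- Ninter.pred - Ninter
Ninter.relError <- Ninter.error / Ninter

round(c(Nunion = Nunion, prop = prop, prop1 = prop1, prop2 = prop2), 3)
```

With these counts the recomputed Nunion, prop, prop1 and prop2 agree with the printed row (4877, 0.199, 0.276, 0.417). Under the independence assumption sketched here, the predicted intersection is roughly 300 items, well below the 972 observed.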

Details

The input data frame should contain one row per item and one binary variable per set, indicating whether the item is a member of that set. The setNames argument should give the names of these binary indicator columns in the data frame.
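As an illustration of the expected input layout, here is a minimal sketch with a hypothetical toy data frame (the names toy, toySets, itemId, A, B and C are invented for this example). It also computes pairwise intersection and union sizes directly in base R, which can serve as a sanity check; this is not necessarily how getSetIntersections is implemented.

```r
# Hypothetical toy data: one row per item, one 0/1 column per set
toy <- data.frame(
  itemId = 1:6,
  A = c(1, 1, 1, 0, 0, 0),
  B = c(1, 1, 0, 1, 0, 0),
  C = c(0, 0, 0, 1, 1, 0)
)
toySets <- c("A", "B", "C")

# Every set column should be a binary membership indicator
stopifnot(all(unlist(toy[toySets]) %in% c(0, 1)))

# Cross-product of the indicator matrix:
# Ninter[i, j] counts the items that belong to both set i and set j
m <- as.matrix(toy[toySets])
Ninter <- crossprod(m)

# The diagonal holds the set sizes; unions follow by inclusion-exclusion
N <- diag(Ninter)
Nunion <- outer(N, N, "+") - Ninter

Ninter
Nunion

# With the package loaded, the corresponding call would be:
# getSetIntersections(toy, toySets, "itemId")
```

Each set is also paired with itself, so the diagonal of Ninter gives the set sizes. This matches the example output, where the Action/Action row has Ninter, Nunion, N1 and N2 all equal.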

Examples

# Define set names
data("movieSets")
setNames <- colnames(movieSets[,-c(1:8)])

# Calculate set sizes
getSetIntersections(movieSets, setNames, "movieId")
#> # A tibble: 361 x 20
#>    set1  set2  Ntotal Ninter Nunion    N1    N2 prop    prop1   prop2
#>    <fct> <fct>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>   <dbl>   <dbl>
#>  1 Acti~ Acti~  27278   3520   3520  3520  3520 1       1       1
#>  2 Acti~ Adve~  27278    972   4877  3520  2329 0.199   0.276   0.417
#>  3 Acti~ Anim~  27278    198   4349  3520  1027 0.0455  0.0562  0.193
#>  4 Acti~ Chil~  27278    102   4557  3520  1139 0.0224  0.0290  0.0896
#>  5 Acti~ Come~  27278    719  11175  3520  8374 0.0643  0.204   0.0859
#>  6 Acti~ Crime  27278    767   5692  3520  2939 0.135   0.218   0.261
#>  7 Acti~ Docu~  27278     11   5980  3520  2471 0.00184 0.00312 0.00445
#>  8 Acti~ Drama  27278   1203  15661  3520 13344 0.0768  0.342   0.0902
#>  9 Acti~ Fant~  27278    325   4607  3520  1412 0.0705  0.0923  0.230
#> 10 Acti~ Film~  27278     13   3837  3520   330 0.00339 0.00369 0.0394
#> # ... with 351 more rows, and 10 more variables: Ninter.pred <dbl>,
#> #   Nunion.pred <dbl>, prop.pred <dbl>, prop1.pred <dbl>, Ninter.error <dbl>,
#> #   prop.error <dbl>, prop1.error <dbl>, Ninter.relError <dbl>,
#> #   prop.relError <dbl>, prop1.relError <dbl>