pymnet.sampling.esu.sample_multilayer_subgraphs_esu

pymnet.sampling.esu.sample_multilayer_subgraphs_esu(network, results, sizes=None, intersections=None, nnodes=None, nlayers=None, p=None, seed=None, intersection_type='strict', copy_network=True, custom_check_function=None)

A one-aspect multilayer version of the Rand-EnumerateSubgraphs (Rand-ESU) algorithm introduced by Wernicke [1].

Uniformly samples induced subgraphs of the form [nodelist][layerlist] which fulfill the given requirements. Each subgraph is sampled at probability product(p), where p is the parameter p. If all entries in p are 1, all such induced subgraphs in the network are found.

Parameters:
Multiple parameters can be given by the user, some of which are always required,
some of which are sometimes required, and some of which are mutually
exclusive. There are multiple functionalities to this function, and the choice
is done based on the parameters passed by the user. For a description of all
of them, see section Notes.
networkMultilayerNetwork

The multilayer network to be analyzed.

resultslist or callable

The method of outputting the found induced subgraphs. If a list, then the induced subgraphs are appended to it as ([nodelist],[layerlist]) tuples. The results list or the [nodelist] or [layerlist] lists are not guaranteed to be in any specific order. If a callable, when an acceptable induced subgraph is found, this callable is called with the argument ([nodelist],[layerlist]) (that is, one argument which is a tuple of two lists). The callable should therefore take only one required parameter in the form of a tuple. If you want to pass more parameters to the callable, do so via e.g. an anonymous function.

sizeslist of ints > 0

How many nodes should be on each layer of an acceptable induced subgraph. One integer for each layer of an acceptable subgraph.

intersectionslist of ints >= 0 or Nones, or int

How many nodes should be shared between sets of layers in an acceptable induced subgraph. If list, if an entry in the list is None, any number of shared nodes is accepted. The order of the intersections is taken to follow the order of layers in sizes, with two-layer intersections being listed first, then three-layer intersections, etc. If int, then this is taken to mean the intersection between ALL layers, and the other intersections can be anything. If this is an int with value x, it is equivalent to being a list with [None,None,…,None,x]. For more details, see section “Constructing the requirements” in the documentation of the function default_check_reqs.

nnodesint

How many nodes an acceptable subgraph should have. If not provided and sizes and intersections are provided, it will be calculated based on the sizes and intersections parameters. Required if there are Nones in intersections or if intersection_type is not “strict”. If you cannot guarantee the correctness of this number, do not use this parameter.

nlayersint

How many layers an acceptable subgraph should have. If not provided and sizes and intersections are provided, it will be calculated based on the sizes and intersections requirements. If you cannot guarantee the correctness of this number, do not use this parameter.

plist of floats 0 <= p <= 1

List of sampling probabilities at each depth. If None, p = 1 for each depth is used. The probability of sampling a given induced subgraph is the product of the elements of p. It is up to the user to provide a p of correct length to match the depth at which desired induced subgraphs are found. If you know how many nodes and layers an acceptable induced subgraph should have (nnodes and nlayers, respectively), you can calculate the length of p by: len(p) = nnodes - 1 + nlayers - 1 + 1. This formula follows from the fact that finding an induced subgraph starts from a nodelayer (the + 1), and then each node and each layer have to be added one at a time to the nodelist and layerlist, respectively (nnodes - 1 and nlayers - 1, respectively). Starting from a nodelayer means that both nodelist and layerlist are of length 1 when the expansion of the subgraph is started, hence the - 1’s.

seedint, str, bytes or bytearray

Seed for Rand-ESU.

intersection_typestring, “strict” or “less_or_equal”

If intersection_type is “strict”, all intersections must be exactly equal to entries in the intersections parameter. If intersection_type is “less_or_equal”, an intersection is allowed to be less than or equal to the corresponding entry in the intersections parameter. Usage is case-sensitive.

copy_networkboolean

Determines whether the network is copied at the beginning of execution. If True (default), the network is copied and the copy is modified during the execution (the original network is not modified). If False, the network is not copied, and the network is NOT modified during the execution. The copying takes more memory but results in faster execution times - the default value (True) is the recommended setting. The modification of the copy does not affect the edges in the induced subgraphs that are passed to the check function. During the execution, if this parameter is True, as starting nodelayers are iterated through in their numerical order, after a nodelayer has been iterated over all edges leading to it are removed (at this point, it is impossible to reach the nodelayer from subsequent starting nodelayers in any case). Therefore, if you use a custom_check_function whose return value depends also on the edges OUTSIDE the induced subgraph to be tested, set this parameter to False.

custom_check_functioncallable or None

If not None, this will be used to determine whether an induced subgraph is okay or not (instead of one of the built-in check functions). The algorithm finds induced subgraphs which have the given nnodes and nlayers, and which have a path spanning the induced subgraph (but are not necessarily connected). The algorithm then passes these to the check function, which determines whether the subgraph is acceptable or not. The arguments that are passed to your custom check function are the network, the nodelist of the induced subgraph, and the layerlist of the induced subgraph (three parameters, in this order). Your check function should therefore accept exactly three parameters. If you want to pass more parameters to the check function, do so via e.g. an anonymous function. If copy_network is True, the passed network is the copy of the network, which might have edges removed OUTSIDE of the induced subgraph (the edges inside the induced subgraph are identical to the original network’s). The function should return True or False (the subgraph is acceptable or not acceptable, respectively). When this parameter is not None, you must also specify nnodes and nlayers.

Notes

There are multiple functionalities built-in, and determining which is used is done by checking which parameters have been given by the user.

If you want to find induced subgraphs (ISs) that have a specified number of nodes on each layer and have specific intersections between layers, provide:

  • network

  • results

  • sizes

  • intersections as list without Nones

If you want to find ISs that have a specific number of nodes on each layer and have some specific intersections between layers, and some intersections that can be of any cardinality, provide:

  • network

  • results

  • sizes

  • intersections as list with Nones in the elements where intersection cardinalities can be anything (even all elements can be Nones)

  • nnodes

If you want to find ISs that have a specific number of nodes on each layer and have intersections that have at most specific cardinalities, provide:

  • network

  • results

  • sizes

  • intersections as list without Nones

  • nnodes

  • intersection_type = “less_or_equal”

If you want to find ISs that have a specific number of nodes on each layer and have some specific intersections that have at most specific cardinalities, and some intersections that can be of any cardinality, provide:

  • network

  • results

  • sizes

  • intersections as list with Nones in the elements where intersection cardinalities can be anything (even all intersections can be Nones)

  • nnodes

  • intersection_type = “less_or_equal”

If you want to find ISs that have a specific number of nodes on each layer and have a specific intersection between ALL layers (the other intersections can be anything), provide:

  • network

  • results

  • sizes

  • nnodes

  • intersections as int

If you want to find ISs that have a specific number of nodes and a specific number of layers, provide:

  • network

  • results

  • nnodes

  • nlayers

If you want to define your own function to determine when an IS is acceptable, provide:

  • network

  • results

  • nnodes

  • nlayers

  • custom_check_function

For all of the above uses, if you don’t want to find all ISs but only sample a portion of them, also provide:

  • p

Of the above uses, the first five use the default_check_reqs function for checking subgraph validity, the sixth uses the relaxed_check_reqs function, and the seventh uses the user-supplied checking function.

The documentation of the functions sampling.reqs.default_check_reqs, sampling.reqs.default_calculate_required_lengths and sampling.reqs.relaxed_check_reqs offer more insight into what are considered acceptable induced subgraphs in different cases in the functionalities described in this section. You should read these if you are not sure what you want to do or how to do it after reading this documentation.

References

[1] “A Faster Algorithm for Detecting Network Motifs”, S. Wernicke, WABI. Vol. 3692, pp. 165-177. Springer 2005.

Examples

Suppose we have the multilayer network N:

(1,'X')----(2,'X')----(3,'X')
              |
              |
           (2,'Y')

where (a,b) are nodelayer tuples with a = node identity and b = layer identity. After calling

>>> results = []
>>> sample_multilayer_subgraphs_esu(N,results,[2,1],[1])

the results list looks like [([1,2],[‘X’,’Y’]),([2,3],[‘X’,’Y’])] (or some other order of tuples and [nodelist] and [layerlist] inside the tuples, since the output is not guaranteed to be in any specific order).

After calling

>>> results = []
>>> sample_multilayer_subgraphs_esu(N,results,nnodes=3,nlayers=1)

the results list looks like [([1,2,3],[‘X’])] (or again, some other ordering).