Permutations for significance testing

In this example, we will work on another plant and pollination network. As in the previous example, we will download it and turn it into an adjacency matrix.

Because there will be some randomizations involved, we will start Julia with julia -p n, where n is the number of CPUs to run on. If you have a 4-core machine, then julia -p 3 is fine. This will not give excellent scaling because there is some overhead when using a small number of cores, but it will still cut the computing time.

Brim has to be loaded on all workers, so it should be imported with

@everywhere using Brim

Have a look at the Parallel Computing page of Julia's documentation if you need more information.

download(
  "https://www.nceas.ucsb.edu/interactionweb/data/plant_pollinator/text/ino_matr_f.txt",
  "ino.txt"
)
@everywhere A = map((x) -> x>0?1:0, int(readdlm("ino.txt")))

The first step is to optimize its modularity. Since this is a small network, we will use a random starting partition:

@everywhere rndbrim = (A) -> A |> partition_random |> recursive_brim!
ino_partitions = pmap(rndbrim, [A for i in 1:1000])

For reference, on my machine (runing three workers), the pmap (parallel) version runs 1000 replicates within 2 seconds, and the map (serial) does the same thing in 4 seconds. The gain in time becomes greater when using more replicates (or more cores...).

We can then filter only the partitions giving the best modularity:

ino_q = map(Q, ino_partitions)
best_partition_id = filter((x) -> ino_q[x] == maximum(ino_q), 1:length(ino_q))[1]
ino_mod = ino_partitions[best_partition_id]

The modularity of this partition (Q(ino_mod)) should be around 0.39 -- which is low, and may not be significant.

We will test this using permutations of the original matrix.

To generate a single randomization keeping the rows and columns marginals, we can use:

A_rnd = null_preserve_marginals(A)

This can take a few seconds to run (around 8 sec. on my machine). We will need to do this in parallel, which is not difficult:

random_matrices = pmap(null_preserve_marginals, [A for i in 1:50])

Note that I used 50 replicates for the sake of not spending minutes waiting for the permutations to finish. It is of course unreasonably low.

Now, we can apply the same steps as above, to measure the distribution of modularities in the shuffled matrices. Because we have not assigned random_matrices everywhere, we will not use pmap:

expected_q = map(Q, map(rndbrim, random_matrices))

This array will have the expected distribution of Q, for this network, under the null hypothesis that interactions are randomly distributed (this is a way to test the impact of the degree distribution on the modularity). There are different ways to get estimates of the p-value, but an easy one is to measure the proportion of randomized matrices that are less modular than the original one:

approx_pval = maximum([sum(expected_q .>= Q(ino_mod))/length(expected_q) 1/length(expected_q)])

Note that the p-value cannot be lower than 1/m, where m is the number of shuffled matrices tested. In this example, this should return a value around 0.02. This indicates that the modular structure of this network, although not strong, is a significant deviation from the random expectation.