18 Apr 2011

Multivariate Repeated Measurements with adonis():

Lately I had to figure out how to do a repeated measures (or mixed effects) analysis on multivariate (species) data. Here I share code for a computation in R with the adonis function of the vegan package. Credit goes to Gavin Simpson providing most of the important pieces of the below code in R-Help.

The design:
We have multivariate species data, sampled at different sites (n = 6) at 3 points in time (N = 24).

The hypothesis:
H0: Species composition is the same across time. The permutation will produce the H0-population by a restricted randomization of time points.
H1: Species composition differs between time points

The analysis:
With repeated measurements one has to allow for the temporal structure and for the dependency within repeated measures. In a pedantic manner one would account for temporal structure by permuting all time-points in the same way also keeping the ordering of time-points. With 3 time-points this would give you only 3 permutations!

But we may relax this very restrictive assumption and say it doesn't matter that all samples at t1 were taken at one time and the ones of t2, etc. at another. Still there would be the repeated measures and the fact that the samples were taken in a temporal sequence. So how to deal with this?

(1) Having in mind that repeated measures on one sample are not independent, we will not permute across these samples.

(2) Now to the temporal correlation within repeated samples: The ordering of time-points is another type of dependency: the ordering is 1, 2, 3 - we will change this ordering only to 3, 1, 2 or 2, 3, 1, This permutation is referred to as toroidal shift which is keeping the "neighborhood" of values intact.

Eventually, we have 3 permutations per site, with 6 sites - this yields 3^6 = 729 permutations overall. Thus, a minimal p-value of 1/729 = 0.0014 can be obtained. I'd say that's good enough!

### species matrix with 20 species abundances (mean = 50, sd = 10)
### one time variable, with 3 timepoints, which should be tested
### and a factor denoting sites that were repeatedly sampled (site)
 
## Load packages
require(vegan)
 
### Data:
sp <- matrix(rnorm(3 * 6 * 20, 50, 10), nrow = 3 * 6, ncol = 20,
         dimnames = list(1:18, paste("Sp", 1:20, sep = "")))
 
time <- as.ordered(rep(1:3, 6))
site <- gl(6, 3)
cbind(site, time, sp)
 
### add time effect at timepoint 3,
### this will effect will be tested by adonis():
sp_1 <- sp
sp_1[time==3,] <-  sp[time==3,] + rnorm(20, 10, 1)
cbind(site, time, sp_1)
 
### choose which species set to test:
test_sp <- sp_1
 
### computing the true R2-value
 
### (btw, using dist() defaults to euclidean distance):
print(fit <- adonis(dist(test_sp) ~ time, permutations=1))
 
### number of perms
B <- 1999
 
### setting up frame which will be populated by
### random r2 values:
pop <- rep(NA, B + 1)
 
### the first entry will be the true r2:
pop[1] <- fit$aov.tab[1, 5]
 
### set up a "permControl" object:
### we turn off mirroring as time should only flow in one direction
ctrl <- permControl(strata = site, within = Within(type = "series", mirror = FALSE))
 
### Number of observations:
nobs <- nrow(test_sp)
 
### check permutation (...rows represent the sample id):
### ..they are ok!
### within in each repeated sample (= sites) timepoints are shuffled,
### with keeping the sequence intact (e.g., for  site 1: 1,2,3 - 2,3,1 - 3,2,1)
shuffle(nobs, control = ctrl)
 
### loop:
### in adonis(...) you need to put permutations = 1, otherwise 
### adonis will not run
set.seed(123)
for(i in 2:(B+1)){
     idx <- shuffle(nobs, control = ctrl)
     fit.rand <- adonis(dist(test_sp) ~ time[idx], permutations = 1)
     pop[i] <- fit.rand$aov.tab[1, 5]
}
 
### get the p-value:
print(pval <- sum(pop >= pop[1]) / (B + 1))
### [1] 0.0035
 
### the sign. p-value supports the H1 (->there is a time effect).
### ..and the fact that samples are not iid is allowed by
### the customized perms - so this p-value is trustworthy as opposed
### to tests not acknowledging dependency of data points..
 
### test sp set without an effect:
### replace test_sp with sp set without effect:
test_sp <- sp
 
### now re-run the script and see the result:
### it is insign. - as expected:
 
### setting up frame which will be populated by
### random r2 values:
pop <- rep(NA, B + 1)
 
### computing the true R2-value:
print(fit <- adonis(dist(test_sp) ~ time, permutations = 1))
 
### the first entry will be the true r2:
pop[1] <- fit$aov.tab[1, 5]
 
### run the loop:
set.seed(123)
for(i in 2:(B+1)){
     idx <- shuffle(nobs, control = ctrl)
     fit.rand <- adonis(dist(test_sp) ~ time[idx], permutations = 1)
     pop[i] <- fit.rand$aov.tab[1, 5]
}
print(pval <- sum(pop >= pop[1]) / (B + 1))
### [1] 0.701
 
## make a histogram to see random R2-values and the true one:
hist(pop, xlab = "Population R2")
abline(v = pop[1], col = 2, lty = 3)
text(0.08, 300, paste("true R2,\np = ", pval, sep = ""))

To cite package ‘vegan’ in publications use:

Jari Oksanen, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre,
Peter R. Minchin, R. B. O'Hara, Gavin L. Simpson, Peter Solymos, M.
Henry H. Stevens and Helene Wagner (2011). vegan: Community Ecology
Package. R package version 2.0-2.
http://CRAN.R-project.org/package=vegan

33 comments :

  1. Hi there. I'm interested in this approach, but am having a hard time finding information on it. It seems that there are a number of papers out there that just use the regular PERMANOVA for data sets with repeated sampling. Are these just poor analyses?

    Also, it would be great if you could provide the output that you see in the text so that readers can confirm that what they are doing is coming out right.

    ReplyDelete
    Replies
    1. hi anonymous,

      not allowing for spatial/temporal correlation will yield anti-conservative p-values and there is higher chance of false significant effects than suggest by this p-values. i'd say that *is* poor analysis.

      i did add some output as proposed by you.

      hth, kay

      Delete
  2. If you have 729 unique permutations, why did you use 1999 permutations? Why not all possible 729?

    ReplyDelete
    Replies
    1. ..you're right: using the whole set of enumartions, what would be referred to as "exact test", would be best. But, with as much as 1999 permutation you yield a good approximation of the null-distribution. And, for the case being considered, I suppose that is all ok.

      Delete
  3. Hi, I need to perform a repeated measures permanova based on a bi-factorial sampling design.

    Can you help me?

    Cheers.

    Ronei.

    ReplyDelete
    Replies
    1. depends.. you can mail to the address in the footer.

      Delete
    2. Hi Kay

      I need to perform a rmPERMANOVA based on a bi-factorial (as bellow). Can you help me? In the script I lose my self in the "for"...

      Cheers, João

      birds fishes aphyo.1 aphyo.2 aphyo.3 mfore.1 mfore.2 mfore.3
      without native 75 59 59 75 61 54
      with native 75 58 60 75 69 69
      without non-native 75 68 63 75 69 61
      without native 75 58 59 75 63 55

      Delete
    3. ..send me the whole dataset and a short description and I'll see.

      Delete
  4. Hi- I am analyzing genetic distance matrices from samples taken from a three-factor sampling design, with repeated samplings over time. I have horrible problems with imbalance, as I had no way to control whether or not I actually got my (relatively rare) target organism at each sampling. Because I have had a hard time finding information about how to deal with violations of the assumptions of balance I am slashing back my data set to only the combinations that I can get balance for.

    Do I need balance in the number of instances of samplings per site as well, or have I some wiggle room in the repeated measures portion of the analysis?

    Thanks for your input!

    ReplyDelete
    Replies
    1. I'd say as long as your cell sizes (also for random factors, like your sites) exceed a minimum of about 5-10 (?) samples the test applies, given that you're testing the appropiate hypotheses..

      I'd advise you to address the R-mailing list (r-sig-eco) for a more profound answer on that question. See i.e. this: https://stat.ethz.ch/pipermail/r-help/2008-April/159252.html

      Delete
  5. Hi Kay,

    This is a very cool. I´m have the p value, but how can we retrieve the F and degrees of freedom.


    Sorry if it is a simple question, but my knowledge on statistic is very basic

    ReplyDelete
  6. Hi Kay,

    Sorry if this is a messy question for you. I have a species matrix with 11 species abundances, one time variable (weeks) with 6 timepoints, sites that were repeatedly sampled (site), replicate 3 per site. My null hypothesis is that the site didn't influence the species abundances

    I have some basic background in statistics, but I am quite new to R. I have tried unsuccessfully
    to modify your code to work with my data. Could you suggest me on how to proceed?

    Best regards

    Fernando

    Example of my matrix:
    Site Replicate week Sp1 Sp2 Sp3 Sp4 Sp5 Sp6 Sp7 Sp8 Sp9 Sp10 Sp11
    SS a 1 0 9 0 0 0 0 0 0 0 0 9
    SS b 1 0 6 0 0 1 0 0 0 1 0 6
    SS c 1 0 11 2 0 0 0 0 0 0 0 13
    SS d 1 0 9 1 0 0 0 0 0 0 0 10
    Mh a 1 0 3 0 0 2 1 0 0 0 0 3
    Mh b 1 0 3 1 0 0 0 0 0 0 0 4
    ........

    ReplyDelete
    Replies
    1. Hi there,

      If you`re solely interested in differences between sites I'd suppose you should average abundances over the timepoints, reducing the dataset from 6*3*N(Sites) to 3*N(Sites).. BTW, what species are we talking about, and, why did you repeat over weeks in the first place?

      HTH,
      Kay

      Delete
  7. Hi Kay,

    thanks a lot for the script as it does help to reduce the amount of analyses to run and is just what I was looking for. However, I have one question: In lines 18 to 20 you have the following code:

    sp_1 <- sp
    sp_1[time==3,] <- sp[time==3,] + rnorm(20, 10, 1)
    cbind(site, time, sp_1)

    In the middle line you add further random numbers to the original data set. These random numbers are needed just in your case or are the supposed to be added in my adapted script as well? What is the reason behind adding these random numbers?
    I'm sorry if it sounds like s silly question, am just trying to understand what I am actually doing here.

    Kind regards,
    Suse

    ReplyDelete
    Replies
    1. Suse,
      as commented in the script, this is just to add an artificial effect to showcase in the test..

      Regards,
      Kay

      Delete
  8. Dear KAY,
    If I run a MANOVA can I get the significance of each variable in the left part of the model by applying a summary.aov function.
    Did you know whether it is possible to obtain this output in a PERMANOVA using the adonis function.
    Thanks

    ReplyDelete
    Replies
    1. Ok! But the output of the adonis function doesn't show the significance of each variable, solely the total effect in the multivariate space.
      Tks.

      Delete
    2. Yes, for the significance of single variables on the left side ("species") there are no tests in adonis. That is simply because it is dealing with dissimilarity matrices rather than the single variables/species.

      Delete
  9. Hi Kay,

    I am having trouble understanding how you add the time effect at timepoint 3. I understand the assignment,
    sp_1[time==3,] <- sp[time==3,] + rnorm(20, 10, 1)
    but you never use that line again (time==3) but go back to sp_1 which includes everything, and then rename it test_sp which to me seems exactly the same as your original sp.

    What am I missing?



    ReplyDelete
    Replies
    1. Hi e,

      'sp' has no time effect, whereas for 'sp_1' some effect is added at time-level '3'. I run the test twice, in both cases using 'test_sp' - for the first test 'sp' is assigned to 'test_sp' and in the second test it is 'sp_1' that is assigned to 'test_sp'..

      HTH, Kay

      Delete
  10. I'm not sure I totally get this. Stupid me.

    First, easy question:
    why would you use euclidian distances on species abundance data? This is just for demonstration, right?

    Second: If I'm not only looking at the effect of time, but also at the effect of a treatment (in split-plot design, e.g.), how would I need to control for it?

    This would make no sense, I guess:

    fit <- adonis(distance ~ treatment+time, permutations=1, strata=treatment)

    ReplyDelete
    Replies
    1. Euclidean was used just for convenience. However, there's nothing wrong with it in general - there are of course some situations where it is not appropiate to use this distance measure, but that's the case for any chosen metric.

      If you want to test for a fixed factor, controlling for a second random factor (time) things get really tricky and I'd suppose you post to vegan-help or r-sig-eco. However, I'm afraid there were several questions on this topic before and to my knowledge none of them were answered clearly.

      Delete
    2. Thanks for the clarification. Months have gone by, and I tried hard.

      For instance, I aimed to hunt down every single thread on r-sig-ecology eben remotely connected, and also had a lengthy conversation with an editor of a Journal and author of a book on applied MV stats. Apparently, there is no clear solution to the problem.

      After some consideration, I now perform a db-RDA on my treatment factor while partialling out the sampling variation. Jari (i.e., vegantutor.pdf) literally indicated that this kind of treatment is somewhat related to a mixed effects model: "These conditioning variables typi-
      cally are “random” or background variables, and their effect is removed from the analysis based on “fixed” or interesting variables."

      I can use MRPP to attach a p-value to the data, or use adonis() and it's pseudo-F ratios and p-values. Both approaches are unable to deal with my design, I gather.

      My current opinion is it's better than nothing, but far from satisfying.
      I'd be interested in your views, if you find the time.

      Delete
    3. Hi @rne!

      ..Your story reminds me of the issues I ran into when writing my paper in which I also had a mixed model with mv-data. There are no clear answers to ones questions, and the more you gather the more complicated it gets..

      Please let my clarify that I'm nor a statistican, neither a specialist for mv data analysis - I'm 'just' an interested user who went a little further than most other people who blindly follow recipes for testing their data.

      The RDA approach with patialling out 'random factors' or 'unwanted factors' seems perfectly fine for me - however, I was told that many reviewers dislike multivariate tests - I guess that's because most people don't realls grasp those tests. So, be advised to really get to know what your doing - otherwise you will have your trouble defending your approach.. Anyways, this would also apply for a PERMANOVA approach. But I guess the latter method is somewhat easier to explain.

      Good luck,
      Kay

      Delete
  11. Kay, I still struggle. Using a real-life dataset - 24 plots (actually, 12 times two treatments, but that's another complication), sampled at four times, would I have to use 4^24 restricted permutations to check for a sampling time effect?

    This does not seem right. Nor is it feasible.






    ReplyDelete
    Replies
    1. Hi there,
      To my understanding the very same procedure applies (neglecting your treatments) - however, I tried to reproduce your example and noticed that my code ist not uptodate. There were several changes in the vegan package (permControl) in the meantime. I'm afraid I don't find the time to adjust the example for the moment. Please consider to head to the suited mailing lists..
      Best,
      Kay

      Delete
  12. Hi kay
    Thanks for taking the time to figure this out... I applied this approach to my own data (and I dare say it worked quite well!) and I would like to include this as part of the analysis in a paper I am writing... any suggestions on how to refer to this approach and cite it (apart from the vegan citation, of course)? I am also afraid many reviewers would be sceptical about this, but I really like this approach and the results I got out of this do seem to make sense with the rest of my analysis, so it would be a pity not to include it... any pointers would be appreciated! Thanks again, Laura

    ReplyDelete
    Replies
    1. Hi Laura!

      I'd cite Marti Andersons' fundamental papers on PERMANOVA, which is the method applied in the adonis-function, and Efron & Tibshirani: An Introduction to the Bootstrap, for general considerations regarding the applied permutation scheme (look for IID, which stands for independent and identical distributed).

      Regards,
      Kay

      Delete
    2. Hi Kay!
      Thanks for your quick reply. I applied CAP (Canonical Analysis of Principal Coordinates) after Anderson and Willis, 2003 on the rest of my data - so I definitely have the stack of papers on PERMANOVA as well. Much appreciate the pointer towards Efron & Tibshirani as I am just familiarizing myself with permutations and the "custom permutation scheme" was indeed the part I needed some citations for! Thanks again and keep up the good work
      Laura

      Delete
  13. Hi kay
    thanks again! One extra thing I noticed only just now: while going over my method again, I tried using the option "strata" within adonis, and it seems to be doing something very similar to your script (except the permutation is done over the F values rather than the R2 values). Am I wrong? And if that is indeed the case that both method constrain the permutations in the "correct" way , would using strata work as well, you reckon? Cheers again, Laura

    ReplyDelete
    Replies
    1. Laura, the "strata" argument restrains permutations to be carried out only within and not across strata. We use this above in our control object, for the samples taken at one site. However, in a repeated measurements approach this alone doesn't suffice - that's the reason for the other customizations in the above script! Best, Kay

      Delete