This is an example for a permutation test on stratified samples with repeated measurements. Samples are interdependent firstly because they come from several sites and secondly because the sampling was repeated a second time. That is samples of the same sites are dependent and sample t1 and sample t2, taken from the very same places are dependent.

What I want to test is whether there is a difference between timepoint one (t1) and two (t2) or not. A hypothesis could be: the average difference t1-t2 is sign. larger than zero (a one-sided test). Another hypothesis could be: the average difference is sign. different from zero, either larger or smaller (a two-sided test).

If you deal with a distribution of your data that ordinary Linear Mixed Models (LMMs) or Generalized LMMs (GLMMs) can handle you should vote for this option - but sometimes you deal with awkard data and permutation tests may the only thing to bail you out...

What I want to test is whether there is a difference between timepoint one (t1) and two (t2) or not. A hypothesis could be: the average difference t1-t2 is sign. larger than zero (a one-sided test). Another hypothesis could be: the average difference is sign. different from zero, either larger or smaller (a two-sided test).

If you deal with a distribution of your data that ordinary Linear Mixed Models (LMMs) or Generalized LMMs (GLMMs) can handle you should vote for this option - but sometimes you deal with awkard data and permutation tests may the only thing to bail you out...

library(vegan)

### the data:

sites <- sort(rep(letters[1:8],6))

tm <- rep(1:2,24)

pairs <- gl(24, 2)

n <- length(sites)

id <- 1:n

print(data <- data.frame(id, sites, tm, pairs))

### set up a Control object:

ctrl <- permControl(strata = sites, type = "series")

### check permutation scheme -

### by permuting the ids of the pair vector with the above

### control we assure that time points are either changed

### or unchanged within each stratum - this allows for the

### dependency within strata - the nr. of possible permutations

### is 2^n(strata) which is 2^8 = 256, meaning that the smallest

### p-value that we will be able to obtain is 1/256 = 0.0039

### in this frame you can retry and see that what i described above

### really happens:

data.frame(id, sites, pairs, tm, tm_p = tm[permuted.index2(n, control = ctrl)])

### now what i want to test is pair differences - i will

### yield differences by aggregating pairs, i will use the

### aggregate function with FUN = sum, to yield differences

### i set up a "sign" variable, with the negative sign aligned

### to the second time point, that is a positive difference

### means a decrease with time and vice versa:

sign <- rep(c(1,-1), 24)

### this will be a simulated dependent variable with a time-effect

### at the 2nd time point var is decreased

var <- rnorm(48, 10, 2)

var <- ifelse(tm == 2, var * 0.7, var * 1)

### in a frame:

data.frame(id, sites, pairs, tm, var)

### x will be the differences:

### you see, the added effect simulates a decrease

print(true <- aggregate(x = var*sign, by = list(sites = sites, pairs = pairs), FUN = sum))

### and now the actual permutation comes in -

### we permute the sign and than aggregate which

### gives the differences of permuted pairs

aggregate(x = var*sign[permuted.index2(n, control = ctrl)],

by = list(pairs = pairs), FUN = sum)

### and now we repeat the above step and record the mean differences

### which will give us a null-population - this we will compare to

### the mean of the observed differences

m.true <- mean(true$x)

### set number of perms

B <- 1999

### setting up frame for the null-population of differences:

pop <- rep(NA, B + 1)

pop[1] <- m.true

for(i in 2:(B+1)){

perm <- aggregate(x = var*sign[permuted.index2(n, control = ctrl)],

by = list(pairs = pairs), FUN = sum)

pop[i] <- mean(perm$x)

}

### show the null-population and the observed difference:

hist(pop, xlab = "Differences")

abline(v = pop[1], col = "red", lty = 3)

text(x = pop[1]*0.9, y = 150, srt = 90, "the true Difference")

### depending on your hypothesis you will take the one-sided p-value:

print(pval_1 <- sum(pop >= pop[1]) / (B + 1))

### or the two-sided, which simply is:

print(pval_2 <- sum(abs(pop) >= abs(pop[1])) / (B + 1))

### the one-sided test would be appropiate if your hypothesis

### was taht the observed difference is sign. larger than the null.

### the two-sided is appropiate if your hypothesis was that

### the obs. diff. is different from the null - that is, it could be larger

### or smaller than the null.

### You can check the result without time effect by resetting

### the var like beneath, then re-run the test:

var <- rnorm(48, 10, 2)

To cite package ‘vegan’ in publications use:

Jari Oksanen, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre, R.

B. O'Hara, Gavin L. Simpson, Peter Solymos, M. Henry H. Stevens and

Helene Wagner (2011). vegan: Community Ecology Package. R package

version 1.17-9. http://CRAN.R-project.org/package=vegan

## No comments :

## Post a Comment