If I were even willing to accept this study on its face, this is my “shorter” interpretation of its conclusion:

“We did a mega-analysis of 17 genome scan studies of genetic loci for schizophrenia. We found 7 genetic loci correlating to schizophrenia. Of those, 5 were NOT FOUND IN ANY OF THE ORIGINAL 17 STUDIES, NOR IN ANY PREVIOUS STUDY. The remaining 2 were found in SOME of the studies. Thus, any correlations other than those two found in any of the studies have been effectively negated.”
Thus, even if you accept the study, its conclusion points to the fact that almost all of the findings in the studies it analyzes were, in fact, statistical artifacts. Interesting. More interesting still, the authors did not seem to be aware of this fact, or didn't think it was pertinent.
Moreover, like many meta-analyses (or mega-analyses, as the case may be), the study itself uses some rather complicated statistical analysis that I think is already questionable; I discuss that briefly below the fold. It will cost you $32 to check out the full study for yourself. I paid it so you don't have to...
We examined the role of common genetic variation in schizophrenia in a genome-wide association study of substantial size:

…

The combined stage 1 and 2 analysis yielded genome-wide significant associations with schizophrenia for seven loci, five of which are new
So we already might wonder why we have five associations that have never appeared before in three decades of performing such studies. Only two associations have ever appeared before in a study, and those two have never been consistently replicated. Thus, it seems quite possible from the first sentence of the abstract that we are just dealing with randomly generated data and the associations are just what you might expect by chance.
Now, let’s move on to the study itself. The first sentence of the study is a good tipoff that we are headed into dubious statistical machinations:

“In stage 1, we conducted a mega-analysis combining genome-wide association study (GWAS) data from 17 separate studies”
So we know we are not working completely blindly. The authors have chosen 17 separate studies for which they already effectively know the outcome. They do not say why they chose these 17 studies or whether there were other studies that they chose not to include in the “mega-analysis.” Thus there is immediate potential for stacking the mega-analysis with studies that are more conducive to the desired results. (For the record, I am not accusing the authors of doing this deliberately. Quite the contrary, I believe such decisions are usually related to the unconscious bias of those performing the study.) Of course, you want to show that the different studies are consistent for such a mega-analysis, and they try to do so here:
We tested for association using logistic regression of imputed dosages with sample identifiers and three principal components as covariates to minimize inflation in significance testing caused by population stratification. The quantile-quantile plot (Supplementary Fig. 1) deviated from the null distribution with a population stratification inflation factor of λ = 1.23. However, λ1000, a metric that standardizes the degree of inflation by sample size, was only 1.02, similar to that observed in other GWAS meta-analyses [2,3]. This deviation persisted despite comprehensive quality control and inclusion of up to 20 principal components (Supplementary Fig. 1). Thus, we interpret this deviation as indicative of a large number of weakly associated SNPs consistent with polygenic inheritance.
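For readers unfamiliar with these two metrics, here is a minimal sketch of how λ and λ1000 are typically computed. The λ1000 formula follows the standard sample-size rescaling convention; the case/control counts below are hypothetical round numbers for illustration, not figures taken from the study:

```python
import numpy as np
from scipy.stats import chi2

def genomic_inflation(chisq_stats):
    """Genomic inflation factor lambda: the median observed chi-square
    statistic divided by the median of the null chi-square(1)
    distribution (about 0.4549)."""
    return np.median(chisq_stats) / chi2.ppf(0.5, df=1)

def lambda_1000(lam, n_cases, n_controls):
    """Rescale lambda to an equivalent study of 1,000 cases and 1,000
    controls, so inflation can be compared across studies of
    different sizes."""
    return 1 + (lam - 1) * (1 / n_cases + 1 / n_controls) / (2 / 1000)

# Null simulation: with no inflation, lambda should be close to 1.
rng = np.random.default_rng(0)
null_stats = rng.chisquare(df=1, size=100_000)
lam_null = genomic_inflation(null_stats)

# With tens of thousands of subjects (hypothetical counts), even a
# sizeable lambda rescales to nearly 1:
lam_rescaled = lambda_1000(1.23, n_cases=9_000, n_controls=12_000)
```

Note that λ1000 mechanically shrinks toward 1 as sample size grows, which is exactly how switching metrics makes a λ of 1.23 look like an unremarkable 1.02.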
We need to get at what they are claiming here regarding a quantile-quantile plot. Here is what a quantile-quantile plot is designed to measure. The q-q plot is used to answer the following questions:

- Do two data sets come from populations with a common distribution?
- Do two data sets have common location and scale?
- Do two data sets have similar distributional shapes?
- Do two data sets have similar tail behavior?
So, it appears right off the bat that the quantile-quantile plot indicated a problem: it does not show that the populations of these studies come from a common distribution.
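As a concrete illustration (a minimal sketch of the general technique, not the authors' code), a GWAS-style q-q plot compares the observed -log10 p-values against the quantiles expected under the null; a systematic drift of the points above the diagonal is the deviation that an inflation factor like λ summarizes:

```python
import numpy as np

def qq_points(pvalues):
    """Observed vs. expected -log10(p) quantiles for a GWAS q-q plot.
    Under the null, points lie on the diagonal; systematic upward drift
    indicates inflation (stratification, artifacts, or true signal)."""
    p = np.sort(np.asarray(pvalues))                    # ascending p-values
    n = len(p)
    expected = -np.log10((np.arange(1, n + 1) - 0.5) / n)
    observed = -np.log10(p)                             # both descending, aligned
    return expected, observed

# Uniform p-values (a well-behaved null): observed tracks expected.
rng = np.random.default_rng(1)
exp_q, obs_q = qq_points(rng.uniform(size=50_000))
median_gap = float(np.median(obs_q - exp_q))            # near zero under the null
```

Plotting `obs_q` against `exp_q` gives the familiar figure; the study's Supplementary Figure 1 is a plot of this kind, and its reported deviation means the observed quantiles sit above that diagonal.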
How is this shored up? Well, they change the metric and assume, circularly, that the deviation was due to weakly associated polygenic inheritance. Polygenic inheritance, weak or not, has never actually been demonstrated to be true. It is an assumption of many scientists, driven by the inability to find specific 1:1 correlations between genes and mental disorders. In fact, the whole reason for doing a mega-analysis such as this is to find these elusive, weakly associated SNPs, since no individual studies to date have been able to do so. Thus, they are effectively saying that the mega-analysis, designed to find weakly associated SNPs for polygenic inheritance, deviates from the null because of weakly associated polygenic inheritance. If you doubt that such “weakly associated polygenic inheritance” is factual, you can already rule out the validity of the study. Of course, that is too easy, so I will continue for those who need more convincing…
What we can see already is that, when they run into an inconvenient statistical truth, they can find a way to “re-examine” the data so that this truth goes away. This happens again in the next paragraph:
We also examined 298 ancestry-informative markers (AIMs) that reflect European-ancestry population substructure [5]. Unadjusted analyses showed greater inflation in the test statistics than we saw for all markers (AIMs λ = 2.26 compared to all markers λ = 1.56). After inclusion of principal components, the distributions of the test statistics did not differ between AIMs (λ = 1.18) and all markers (λ = 1.23), a result inconsistent with population stratification explaining the residual deviation seen in Supplementary Figure 1.
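To make the adjustment they describe concrete, here is a hedged simulation sketch, entirely my own construction and not the authors' method: two subpopulations differ in both allele frequency and disease rate, which inflates null test statistics until an ancestry covariate (a known indicator here, standing in for an estimated principal component) is added to the logistic regression:

```python
import numpy as np

def logistic_fit(X, y, iters=25):
    """Plain logistic regression via Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        H = (X * W[:, None]).T @ X              # Hessian
        beta += np.linalg.solve(H, X.T @ (y - p))
    return beta

def wald_chisq(X, y, j=1):
    """Wald chi-square statistic for coefficient j (the genotype term)."""
    beta = logistic_fit(X, y)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p * (1.0 - p)
    cov = np.linalg.inv((X * W[:, None]).T @ X)
    return beta[j] ** 2 / cov[j, j]

rng = np.random.default_rng(2)
n = 2000
pop = rng.integers(0, 2, n)                  # ancestry indicator (crude "PC")
freq = np.where(pop == 1, 0.4, 0.2)          # allele frequency differs by ancestry
risk = np.where(pop == 1, 0.6, 0.4)          # so does disease rate
ones = np.ones(n)

stats_raw, stats_adj = [], []
for _ in range(200):                         # 200 SNPs with NO true effect
    g = rng.binomial(2, freq).astype(float)  # genotype tied only to ancestry
    y = rng.binomial(1, risk).astype(float)  # phenotype tied only to ancestry
    stats_raw.append(wald_chisq(np.column_stack([ones, g]), y))
    stats_adj.append(wald_chisq(np.column_stack([ones, g, pop]), y))

MEDIAN_CHI2_1 = 0.4549                       # median of chi-square(1)
lam_raw = np.median(stats_raw) / MEDIAN_CHI2_1
lam_adj = np.median(stats_adj) / MEDIAN_CHI2_1
```

With no true genotype-phenotype effects at all, the unadjusted λ comes out well above 1 purely from stratification, and adding the ancestry covariate pulls it back toward 1. That is the behavior the authors rely on when they argue that a residual λ of 1.23, surviving up to 20 principal components, must be something other than stratification.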
It appears that they needed to double down on the problem already noted: the generally accepted statistical analysis was not giving them the distribution they wanted. This is the kind of after-the-fact statistical machination available when you have no real hypothesis stated for your study. It’s “Let’s do a mega-analysis, see what happens, then find ways to interpret the data to meet our bias.”

Okay, so you are bored and you are saying, “Steve, why beat a dead horse? They are just trying to show that real-world distributions are not going to match up to statistical analysis, and adjustments need to be made (even if after the fact).” I could go on, but since the study really doesn't show much of anything, what is the point?