
Identification of SNP Interactions Using Logic Regression
Holger Schwender
Collaborative Research Center
SFB 475, University of Dortmund, Germany
Logic regression [2] is a relatively new classification method that
attempts to predict the case-control status of an observation based on
Boolean combinations of binary variables, i.e. logic expressions such as "A
or B but not C (are TRUE)." This procedure has been successfully applied to
SNP (Single Nucleotide Polymorphism) data [1]. However, the resulting logic
expression can be hard to interpret, particularly if it contains a
large number of variables. I will show how such a logic expression can be
transformed into a representation that is easy to interpret. I will
further demonstrate the practical usefulness of this representation and how
it can be used to identify SNPs and combinations of SNPs that might
explain the case-control status. Finally, I will propose two methods
for measuring the importance of the (combinations of) variables.
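
To make the notion of a logic expression concrete, the following minimal Python sketch evaluates one reading of "A or B but not C", namely (A or B) and not C, over all combinations of three binary variables. It also checks that this rule can be rewritten as an OR of AND-terms without changing its predictions. The function names and the choice of an OR-of-ANDs form are assumptions made for illustration only; the abstract does not specify which representation is used in the talk.

```python
from itertools import product


def logic_expression(a, b, c):
    """One reading of "A or B but not C": (A or B) and not C."""
    return (a or b) and not c


def as_or_of_and_terms(a, b, c):
    """The same rule rewritten as an OR of AND-terms (assumed illustration)."""
    return (a and not c) or (b and not c)


# Enumerate all 8 combinations of the three binary variables.
inputs = list(product([False, True], repeat=3))
truth_table = [(a, b, c, logic_expression(a, b, c)) for a, b, c in inputs]

# Both forms agree on every input, so they describe the same prediction rule.
equivalent = all(
    logic_expression(a, b, c) == as_or_of_and_terms(a, b, c)
    for a, b, c in inputs
)

# Number of variable combinations for which the expression is TRUE.
n_true = sum(1 for *_, y in truth_table if y)

for a, b, c, y in truth_table:
    print(f"A={a!s:5} B={b!s:5} C={c!s:5} -> {y}")
```

In the rewritten form, each AND-term (for example "A and not C") can be read on its own as a candidate combination of variables, which is one way a long expression might become easier to inspect.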
References
[1] Kooperberg, C., Ruczinski, I., LeBlanc, M., Hsu, L. (2001), "Sequence
Analysis using Logic Regression", Genetic Epidemiology, 21, 626-631.
[2] Ruczinski, I., Kooperberg, C., LeBlanc, M. (2003), "Logic Regression",
Journal of Computational and Graphical Statistics, 12(3), 475-511.
