Hotel Estoril Eden, Monte Estoril,
5-8 October 2005



Identification of SNP-Interactions Using Logic Regression

Holger Schwender
Collaborative Research Center SFB 475, University of Dortmund, Germany

Logic regression [2] is a relatively new classification method that attempts to predict the case-control status of an observation from Boolean combinations of binary variables, i.e., logic expressions such as "A or B but not C (are TRUE)." This procedure has been successfully applied to SNP (Single Nucleotide Polymorphism) data [1]. The resulting logic expression, however, can be hard to interpret, particularly if it contains a large number of variables.
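
As a minimal illustration of what such a logic expression does, the sketch below evaluates the rule from the text on a few made-up observations. The data are hypothetical, and reading "A or B but not C" as "(A or B) and not C" is an assumption; the sketch shows only how one fixed expression is evaluated, not how logic regression searches for it.

```python
# Hypothetical toy data: each entry is one observation with three binary
# variables (for example, dichotomized SNPs). The names A, B, C follow the
# example expression in the text.
observations = [
    {"A": True,  "B": False, "C": False},
    {"A": False, "B": True,  "C": True},
    {"A": False, "B": False, "C": False},
    {"A": True,  "B": True,  "C": False},
]


def logic_rule(obs):
    """Evaluate "(A or B) and not C" for a single observation."""
    return (obs["A"] or obs["B"]) and not obs["C"]


# An observation is predicted to be a case when the expression is TRUE.
predicted_case = [logic_rule(obs) for obs in observations]
print(predicted_case)
```

Here the first and fourth observations satisfy the expression and would be predicted as cases.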

I will show how such a logic expression can be modified to obtain a representation that is easy to interpret. I will further demonstrate the practical usefulness of this representation and how it can be used to identify SNPs and combinations of SNPs that might explain the case-control status. Finally, I will propose two methods for measuring the importance of the variables and of their combinations.
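
The abstract does not name the representation. One standard candidate, assumed here purely for illustration, is a disjunctive normal form (an OR of AND-combinations), in which each AND-combination can be read as a candidate interaction. The sketch below builds the canonical (not minimized) form from a truth table; the function name `to_dnf` and the example rule are hypothetical.

```python
from itertools import product


def to_dnf(rule, names):
    """Return the canonical DNF of `rule` as a list of conjunctions.

    Each conjunction is a tuple of literals such as ("A", "not C"); the
    expression is TRUE exactly when at least one conjunction is TRUE.
    """
    terms = []
    # Enumerate every TRUE/FALSE assignment of the variables.
    for values in product([True, False], repeat=len(names)):
        assignment = dict(zip(names, values))
        if rule(assignment):
            terms.append(tuple(
                name if value else f"not {name}"
                for name, value in assignment.items()
            ))
    return terms


def rule(v):
    # Assumed reading of "A or B but not C".
    return (v["A"] or v["B"]) and not v["C"]


dnf = to_dnf(rule, ["A", "B", "C"])
dnf_text = " or ".join("(" + " and ".join(term) + ")" for term in dnf)
print(dnf_text)
```

The truth-table approach grows exponentially with the number of variables and does not shorten the conjunctions, so it is suitable only for small expressions like this one.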

[1] Kooperberg, C., Ruczinski, I., LeBlanc, M., Hsu, L. (2001), "Sequence Analysis using Logic Regression", Genetic Epidemiology, 21, 626-631.
[2] Ruczinski, I., Kooperberg, C., LeBlanc, M. (2003), "Logic Regression", Journal of Computational and Graphical Statistics, 12(3), 475-511.