SPIDA 2012: Summer Programme in Data Analysis

Description of Topics

SPIDA Program May 24 - June 1, 2012
Date	Topic	Instructor
Thursday May 24th	Introduction to the R Statistical Computing Environment: Workshop	John Fox
Friday May 25th	Review of Linear Models	John Fox
Saturday May 26th	Generalized Linear Models	John Fox

Monday May 28th	Mixed Models for Hierarchical Data	Georges Monette
Tuesday May 29th	Mixed Models for Longitudinal Data	Georges Monette
Wednesday May 30th	Further Topics in Mixed Models	Georges Monette
Thursday May 31st	Non-Linear Mixed Models and Introduction to Generalized Linear Mixed Models	Georges Monette
Friday June 1st	Generalized Linear Mixed Models: Applications and Inference	Georges Monette

May 24th
Workshop: Introduction to the R Statistical Computing Environment

The R statistical programming language and computing environment has become the de facto standard among statisticians for writing statistical software, and it has made substantial inroads in the social and behavioural sciences. R is a free, open-source implementation of the S language and is available for Windows, Mac OS X, and Unix/Linux systems. There is also a commercial implementation of S, called S-PLUS, but it has been eclipsed by R. The workshop assumes no previous experience with R.

This workshop will provide a basic introduction to R, in effect using R as a statistical package. Topics to be covered include getting started with R, drawing standard statistical graphs, fitting statistical models in R, and working with data in R. As time permits, participants will also be briefly introduced to R programming and to customized R graphics.

May 25-26, 2012
Linear and Generalized Linear Models
by John Fox

Linear and generalized linear models encompass the most commonly used statistical methods in social research, and provide the basis for most other statistical methods, such as the mixed-effects models taken up in the second part of this year's SPIDA. This workshop provides a review of applied linear regression analysis ("linear models"), followed by an introduction to generalized linear models, with an emphasis on models for categorical response variables (e.g., logit models) and for count data (e.g., Poisson regression and log-linear models). Particular attention will be paid to "diagnostic" methods for determining whether a fitted linear or generalized linear model adequately represents the data, and to the interpretation of results, including methods for visualizing models. The workshop will also cover the implementation of linear and generalized linear models in R. The workshop assumes that participants have previously been exposed to applied linear regression analysis and are at least generally familiar with logistic regression. Each day there will be a lab session in which participants have an opportunity to apply the material covered in that day's lecture, using R to conduct analyses of datasets provided by the instructor.

The text for the materials covered in this three-day workshop is as follows:
Fox, John and Sanford Weisberg (2011), An R Companion to Applied Regression, Second Edition, Thousand Oaks, CA: Sage.

May 25th
Review of Linear Models

Multiple regression analysis
Dummy-variable regression
The elliptical geometry of regression
Principle of marginality and "effect displays" for complex models with interactions
Regression "diagnostics" for unusual data, non-constant error variance, and nonlinearity; how to correct these problems
Implementation of linear models and associated methods in R
Introduction to maximum likelihood estimation and Bayesian inference

May 26th
Generalized Linear Models (GLMs)

The structure of GLMs: linear predictors, distributional families, and link functions
How the familiar linear, logit, and probit models fit into the GLM framework
Poisson regression models for count data; handling overdispersed count data
Time permitting: log-linear models for contingency tables
Diagnostics for GLMs
Implementation of generalized linear models in R

May 28-June 1, 2012
Linear, Non-Linear, and Generalized Linear Mixed Models
for Multilevel and Longitudinal Data
by Georges Monette

Linear models and their extension to generalized linear models are powerful tools for the analysis of data with observations that are statistically independent. Mixed models generalize these tools so they can be used with the complex data structures that are increasingly common in modern research. In fact, it is the recent development of these methods that makes the efficient analysis of complex data feasible.

We start with an intuitive visual development of the basic theory of mixed models. We then show how to apply the theory in practice. We will work with linear mixed models (LMEs), non-linear mixed models (NLMEs), generalized linear mixed models (GLMMs), and multivariate GLMMs, using packages (nlme, lme4, MCMCglmm and others) available in the R statistical programming environment. R is very well suited to this kind of analysis because it has strong capabilities for the graphical visualization of data and models, and for simulation, which are both important adjuncts to mixed modelling.

Data structures too complex for analysis with traditional methods arise in two major ways: multilevel data (e.g., students within classes within schools) and longitudinal data, in which each subject is measured on a potentially varying number of occasions. In contrast with traditional methods such as repeated-measures analysis, mixed models allow for the analysis of very messy multilevel and longitudinal data. In addition, mixed models can be used to model flexible functions of time using non-parametric splines. GLMMs can be used to fit longitudinal models with realistic assumptions about the response distribution, as in zero-inflated models, hurdle models, and overdispersed models.

The extension of generalized linear models (GLMs) to GLMMs is considerably more complex than the extension of linear models to LMEs. The last day of the course will be devoted to methods of inference that work with GLMMs. We will learn how to apply Bayesian inference and Markov chain Monte Carlo (MCMC) methods to GLMMs.

By the end of the course, participants should have a working knowledge of mixed models and be able to apply them to a broad class of settings including, for example, the analysis of growth curves in accelerated longitudinal designs, such as those used in Statistics Canada’s longitudinal surveys like the Canadian National Population Health Survey (NPHS). You will also see many examples of specialized functions written in R for the analysis of mixed models and you should be able to design and program your own simple tools in R.

Prospective participants are encouraged to download R from http://cran.r-project.org and to explore the language and its facilities for statistical modelling.

The text for the materials covered in this five-day workshop is as follows:
Snijders, Tom A.B. and Roel J. Bosker (2012), Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling, Second Edition, Thousand Oaks, CA: Sage.

May 28th
Mixed Models for Hierarchical Data

A multilevel example
Fixed effects models for hierarchical data
Visualizing multiple fits: data space versus beta space
Random effects
Hierarchical models
Formulating hierarchical models as mixed models
Anatomy of mixed models: from the simplest to the largest
Estimation of fixed effects, inference for fixed effects
Formulating research questions as linear hypotheses
Prediction of random effects
Variance of random effects, estimation and inference

May 29th
Mixed Models for Longitudinal Data

Modeling dependency in time
"G side" versus "R side" variance models: overlapping and distinct functions. What is the difference between marginal and subject-specific models for linear models?
Modeling interesting functions of time: polynomial, periodic, discontinuous, flexible splines
Assumptions and diagnostics for mixed models, remedial measures

May 30th
Further Topics in Mixed Models

Interpreting random slopes and the variance of random effects, the variance function, components of variance models
Model selection, REML versus ML
Contextual variables: contextual effects versus compositional effects
Disentangling age-period-cohort effects
Causal inference with longitudinal data, generalized propensity score
Dealing with missing data, selection models, pattern mixture models, imputation
What is R-squared for multilevel models?
Using mixed models for non-parametric splines
Power and design for multilevel models

May 31st
Non-Linear Mixed Models and Introduction to Generalized Linear Mixed Models

Non-linear mixed models: applications, asymptotic functions of time
When should you transform the data to make the model linear, and when should you analyze the original data with a non-linear model?
GLMMs for multilevel logistic regression using glmmPQL and glmer.
GLMMs for multilevel count data using glmmPQL and glmer. Overdispersion, diagnostics and remedies.

June 1st
Generalized Linear Mixed Models: Applications and Inference

GLMMs for more complex models using MCMCglmm. Specifying models in MCMCglmm, priors for MCMC
Comparison of fitting methods: quasi-likelihood, likelihood. Why MCMC?
Multivariate GLMMs: zero-inflated models
Convergence diagnostics and how to improve convergence
Marginal versus subject-specific models and other issues of interpretation
Presenting the model graphically
Review
