Description of Topics



May 24th
Workshop: Introduction to the R Statistical Computing Environment

The R statistical programming language and computing environment has become the de facto standard for writing statistical software among statisticians and has made substantial inroads in the social and behavioural sciences. R is a free, open-source implementation of the S language, and is available for Windows, Mac OS X, and Unix/Linux systems. There is also a commercial implementation of S called S-PLUS, but it has been eclipsed by R. The workshop assumes no previous experience with R.

This workshop will provide a basic overview of and introduction to R – in effect, using R as a statistical package. Topics to be covered include getting started with R; drawing standard statistical graphs; statistical models in R; and data in R. In addition, and as time permits, participants will be briefly introduced to R programming and to customized R graphics.


May 25-26, 2012
Linear and Generalized Linear Models
by John Fox

Linear and generalized linear models encompass the most commonly used statistical methods in social research, and provide the basis for most other statistical methods, such as the mixed-effects models taken up in the second part of this year's SPIDA. This workshop provides a review of applied linear regression analysis ("linear models"), followed by an introduction to generalized linear models, with an emphasis on models for categorical response variables (e.g., logit models) and for count data (e.g., Poisson regression and log-linear models). Particular attention will be paid to "diagnostic" methods for determining whether a fitted linear or generalized linear model adequately represents the data, and to the interpretation of results, including methods for visualizing models. The workshop will also cover the implementation of linear and generalized linear models in R. The workshop assumes that participants have previously been exposed to applied linear regression analysis and are at least generally familiar with logistic regression. Each day there will be a lab session in which participants have an opportunity to apply the material covered in that day's lecture, using R to conduct analyses of datasets provided by the instructor.

The text for the materials covered in this three-day Workshop is as follows:
Fox, John and Sanford Weisberg (2011), An R Companion to Applied Regression, Second Edition, Thousand Oaks, Ca.: Sage.

May 25th
Review of Linear Models

  • Multiple regression analysis
  • Dummy-variable regression
  • The elliptical geometry of regression
  • Principle of marginality and "effect displays" for complex models with interactions
  • Regression "diagnostics" for unusual data, non-constant error variance, and nonlinearity; how to correct these problems
  • Implementation of linear models and associated methods in R
  • Introduction to maximum likelihood estimation and Bayesian inference

May 26th
Generalized Linear Models (GLMs)

  • The structure of GLMs: linear predictors, distributional families, and link functions
  • How the familiar linear, logit, and probit models fit into the GLM framework
  • Poisson regression models for count data; handling over-dispersed count data
  • Time permitting: Loglinear models for contingency tables
  • Diagnostics for GLMs
  • Implementation of generalized linear models in R
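
As a rough illustration of the GLM structure listed above (a linear predictor passed through an inverse link function to give the mean of the response), here is a minimal sketch. It is written in Python using only the standard library, not in R as used in the workshop. The function names and the coefficient values are hypothetical, chosen only for this example, and no model fitting is shown.

```python
import math


def linear_predictor(coefs, x):
    """Return eta = b0 + b1*x1 + ... + bk*xk (coefs[0] is the intercept)."""
    return coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))


def inv_identity(eta):
    """Identity link: the linear model, mean = eta."""
    return eta


def inv_logit(eta):
    """Inverse logit link, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def inv_probit(eta):
    """Inverse probit link: the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def inv_log(eta):
    """Inverse log link, as used in Poisson regression: mean = exp(eta)."""
    return math.exp(eta)


INVERSE_LINKS = {
    "identity": inv_identity,
    "logit": inv_logit,
    "probit": inv_probit,
    "log": inv_log,
}


def glm_mean(coefs, x, link):
    """Mean of the response implied by the coefficients, predictors, and link."""
    return INVERSE_LINKS[link](linear_predictor(coefs, x))


# Hypothetical coefficients (intercept, slope) and a single predictor value.
coefs = [0.0, 1.0]
for link in ("identity", "logit", "probit", "log"):
    print(link, glm_mean(coefs, [0.0], link))
```

The same linear predictor yields different fitted means depending on the link, which is how the linear, logit, probit, and Poisson models share one framework.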

May 28-June 1, 2012
Linear, Non-Linear, and Generalized Linear Mixed Models
for Multilevel and Longitudinal Data
by Georges Monette

Linear models and their extension to generalized linear models are powerful tools for the analysis of data with observations that are statistically independent. Mixed models generalize these tools so they can be used with the complex data structures that are increasingly common in modern research. In fact, it is the recent development of these methods that makes the efficient analysis of complex data feasible.

We start with an intuitive visual development of the basic theory of mixed models. We then show how to apply the theory in practice. We will work with linear mixed models (LMEs), non-linear mixed models (NLMEs), generalized linear mixed models (GLMMs), and multivariate GLMMs, using packages (nlme, lme4, MCMCglmm and others) available in the R statistical programming environment. R is very well suited to this kind of analysis because it has strong capabilities for the graphical visualization of data and models, and for simulation, which are both important adjuncts to mixed modelling.

Data structures too complex for analysis with traditional methods arise in two major ways: multilevel data (e.g., students within classes within schools) and longitudinal data in which each subject is measured on a potentially varying number of occasions. In contrast with traditional methods such as repeated measures analysis, mixed models allow for the analysis of very messy multilevel and longitudinal data. In addition, mixed models can be used to model flexible functions of time using non-parametric splines. GLMMs can be used to fit longitudinal models with realistic assumptions for response distributions, such as zero-inflated models, hurdle models, and overdispersed models.

The extension of generalized linear models (GLMs) to GLMMs is more complex than the extension of linear models to LMEs. The last day of the course will be devoted to methods of inference that work with GLMMs. We will learn how to apply Bayesian inference and Markov chain Monte Carlo (MCMC) methods to GLMMs.

By the end of the course, participants should have a working knowledge of mixed models and be able to apply them to a broad class of settings including, for example, the analysis of growth curves in accelerated longitudinal designs, such as those used in Statistics Canada’s longitudinal surveys like the Canadian National Population Health Survey (NPHS). Participants will also see many examples of specialized functions written in R for the analysis of mixed models and should be able to design and program their own simple tools in R.

Prospective participants are encouraged to download R from http://cran.r-project.org and to explore the language and its facilities for statistical modelling.

The text for the materials covered in this five-day Workshop is as follows:
Snijders, Tom A.B. and Roel J. Bosker (2012), Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling, Second Edition, Thousand Oaks, Ca.: Sage.

May 28th
Mixed Models for Hierarchical Data

  • A multilevel example
  • Fixed effects models for hierarchical data
  • Visualizing multiple fits: data space versus beta space
  • Random effects
  • Hierarchical models
  • Formulating hierarchical models as mixed models
  • Anatomy of mixed models: from the simplest to the largest
  • Estimation of fixed effects, inference for fixed effects
  • Formulating research questions as linear hypotheses
  • Prediction of random effects
  • Variance of random effects, estimation and inference
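
To give a sense of what "prediction of random effects" involves, the sketch below computes the shrinkage prediction for a random-intercept model: the predicted group effect is the deviation of the group mean from the overall mean, multiplied by a weight that increases with group size. It is written in Python with the standard library only, not in R as used in the workshop. It assumes that the overall mean and both variance components are already known (in practice they are estimated), and all names and numbers are hypothetical.

```python
def shrinkage_weight(n_j, tau2, sigma2):
    """Weight given to a group's own mean.

    n_j    : number of observations in the group
    tau2   : between-group variance (variance of the random intercepts)
    sigma2 : within-group (residual) variance
    """
    return tau2 / (tau2 + sigma2 / n_j)


def predict_random_intercept(group_values, mu, tau2, sigma2):
    """Shrinkage prediction of one group's random intercept.

    Assumes mu, tau2, and sigma2 are known; this is a simplification
    made for illustration only.
    """
    n_j = len(group_values)
    group_mean = sum(group_values) / n_j
    return shrinkage_weight(n_j, tau2, sigma2) * (group_mean - mu)


# Hypothetical values: overall mean 50, between variance 4, within variance 16.
mu, tau2, sigma2 = 50.0, 4.0, 16.0

small_group = [58.0, 62.0]   # 2 observations, group mean 60
large_group = [60.0] * 16    # 16 observations, group mean 60

print(predict_random_intercept(small_group, mu, tau2, sigma2))
print(predict_random_intercept(large_group, mu, tau2, sigma2))
```

Both groups have the same mean, but the prediction for the small group is pulled more strongly toward zero because its mean is a less reliable estimate of the group's true level.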

May 29th
Mixed Models for Longitudinal Data

  • Modeling dependency in time
  • "G side" versus "R side" variance models: overlapping and distinct functions. What is the difference between marginal and subject-specific models for linear models?
  • Modeling interesting functions of time: polynomial, periodic, discontinuous, flexible splines
  • Assumptions and diagnostics for mixed models, remedial measures

May 30th
Further Topics in Mixed Models

  • Interpreting random slopes and the variance of random effects, the variance function, components of variance models
  • Model selection, REML versus ML
  • Contextual variables: contextual effects versus compositional effects
  • Disentangling age-period-cohort effects
  • Causal inference with longitudinal data, generalized propensity score
  • Dealing with missing data, selection models, pattern mixture models, imputation
  • What is R-squared for multilevel models?
  • Using mixed models for non-parametric splines
  • Power and design for multilevel models

May 31st
Non-Linear Mixed Models and Introduction to Generalized Linear Mixed Models

  • Non-linear mixed models: applications, asymptotic functions of time
  • When should you transform data to make the model linear or analyze the original data with a non-linear model?
  • GLMMs for multilevel logistic regression using glmmPQL and glmer.
  • GLMMs for multilevel count data using glmmPQL and glmer. Overdispersion, diagnostics and remedies.

June 1st
Generalized Linear Mixed Models: Applications and Inference

  • GLMMs for more complex models using MCMCglmm. Specifying models in MCMCglmm, priors for MCMC
  • Comparison of fitting methods: quasi-likelihood, likelihood. Why MCMC?
  • Multivariate GLMMs: zero-inflated models
  • Convergence diagnostics and how to improve convergence
  • Marginal versus subject-specific models and other issues of interpretation
  • Presenting the model graphically
  • Review