Description of Topics
The R statistical programming language and computing environment has become the de facto standard for writing statistical software among statisticians and has made substantial inroads in the social and behavioural sciences. R is a free, open-source implementation of the S language and is available for Windows, Mac OS X, and Unix/Linux systems. There is also a commercial implementation of S, called S-PLUS, but it has been eclipsed by R. The workshop assumes no previous experience with R.
This workshop will provide a basic overview of and introduction to R, in effect using R as a statistical package. Topics to be covered include getting started with R, drawing standard statistical graphs, statistical models in R, and data in R. In addition, and as time permits, participants will be briefly introduced to R programming and to customized R graphics.
Linear and generalized linear models encompass the most commonly used statistical methods in social research, and provide the basis for most other statistical methods, such as the mixed-effects models taken up in the second part of this year's SPIDA. This workshop provides a review of applied linear regression analysis ("linear models"), followed by an introduction to generalized linear models, with an emphasis on models for categorical response variables (e.g., logit models) and for count data (e.g., Poisson regression and log-linear models). Particular attention will be paid to "diagnostic" methods for determining whether a fitted linear or generalized linear model adequately represents the data, and to the interpretation of results, including methods for visualizing models. The workshop will also cover the implementation of linear and generalized linear models in R. The workshop assumes that participants have previously been exposed to applied linear regression analysis and are at least generally familiar with logistic regression. Each day there will be a lab session in which participants have an opportunity to apply the material covered in that day's lecture, using R to conduct analyses of datasets provided by the instructor.
The text for the materials covered in this three-day Workshop is as follows:
Linear models and their extension to generalized linear models are powerful tools for the analysis of data with observations that are statistically independent. Mixed models generalize these tools so they can be used with the complex data structures that are increasingly common in modern research. In fact, it is the recent development of these methods that makes the efficient analysis of complex data feasible.
We start with an intuitive visual development of the basic theory of mixed models. We then show how to apply the theory in practice. We will work with linear mixed models (LMEs), non-linear mixed models (NLMEs), generalized linear mixed models (GLMMs), and multivariate GLMMs, using packages (nlme, lme4, MCMCglmm and others) available in the R statistical programming environment. R is very well suited to this kind of analysis because it has strong capabilities for the graphical visualization of data and models, and for simulation, which are both important adjuncts to mixed modelling.
Data structures too complex for analysis with traditional methods arise in two major ways: multilevel data (e.g., students within classes within schools) and longitudinal data, in which each subject is measured on a potentially varying number of occasions. In contrast with traditional methods such as repeated-measures analysis, mixed models allow for the analysis of very messy (for example, unbalanced or irregularly timed) multilevel and longitudinal data. In addition, mixed models can be used to model flexible functions of time using non-parametric splines. GLMMs can be used to fit longitudinal models with realistic assumptions about the distribution of the response, such as zero-inflated, hurdle, and overdispersed models.
The extension of generalized linear models (GLMs) to GLMMs is more complex than the extension of linear models to LMEs. The last day of the course will be devoted to methods of inference that work with GLMMs: we will learn how to apply Bayesian inference and Markov chain Monte Carlo (MCMC) methods to GLMMs.
By the end of the course, participants should have a working knowledge of mixed models and be able to apply them in a broad range of settings, including, for example, the analysis of growth curves in accelerated longitudinal designs such as those used in Statistics Canada’s longitudinal surveys (e.g., the Canadian National Population Health Survey, NPHS). Participants will also see many examples of specialized functions written in R for the analysis of mixed models and should be able to design and program their own simple tools in R.
Prospective participants are encouraged to download R from http://cran.r-project.org and to explore the language and its facilities for statistical modelling.
The text for the materials covered in this five-day Workshop is as follows: