4th Nordic and Baltic Stata Users Group meeting 2011
Scientific Program
When: Friday, November 11, 2011
Where: CMB, Berzelius väg 21, Solna Campus, Karolinska Institutet
Hosted by: Unit of Biostatistics, IMM, Karolinska Institutet
Registration:
To register for the meeting, please send an e-mail to metrika@metrika.se containing your name, affiliation, and contact details. Attendance is free.
Schedule
08:45-09:00 Introduction and welcome
09:00-09:25 Double robust estimators
Arvid Sjölander
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Sweden
Nicola Orsini
Unit of Biostatistics and Nutritional Epidemiology, Institute of Environmental Medicine, Karolinska Institutet, Sweden
The aim of epidemiological research is typically to estimate the association between a particular exposure and a particular outcome, adjusted for a set of additional covariates. This is commonly done by fitting a regression model for the outcome, given exposure and covariates. If the regression model is misspecified, then the resulting estimator may be inconsistent. Recently, a new class of estimators has been developed: so-called "doubly robust" (DR) estimators. These estimators utilize two regression models: one for the outcome and one for the exposure. A DR estimator is consistent if either model is correct, not necessarily both. Thus, DR estimators give the analyst two chances, instead of only one, to make valid inference. In this presentation we describe a new package for Stata, which implements the most common DR estimators.
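The "two chances" property can be illustrated with a minimal sketch. It is written in Python rather than Stata and is not the package described in the abstract. It uses augmented inverse probability weighting (AIPW), one common DR estimator, on simulated data. The data-generating values, the function names, and the choice of a deliberately wrong outcome model are all assumptions made for illustration.

```python
import random


def simulate(n, seed=11):
    """Simulated data (assumed setup): z confounds exposure a and outcome y.

    The true exposure effect on y is 1.0.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < 0.2 + 0.6 * z else 0
        y = 1.0 * a + 2.0 * z + rng.gauss(0.0, 1.0)
        rows.append((z, a, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def aipw(rows):
    """Return (doubly robust estimate, naive difference in means)."""
    # Outcome model, deliberately misspecified: it ignores the confounder z.
    m1 = mean(y for _, a, y in rows if a == 1)
    m0 = mean(y for _, a, y in rows if a == 0)
    # Exposure model, correctly specified: P(a = 1 | z) estimated within z.
    p = {z: mean(a for zz, a, _ in rows if zz == z) for z in (0, 1)}
    # AIPW: outcome-model contrast plus inverse-probability-weighted residuals.
    terms = [
        m1 - m0 + a * (y - m1) / p[z] - (1 - a) * (y - m0) / (1 - p[z])
        for z, a, y in rows
    ]
    return mean(terms), m1 - m0


rows = simulate(20000)
dr_estimate, naive_estimate = aipw(rows)
print(round(dr_estimate, 2), round(naive_estimate, 2))
```

In this setup the naive difference in means is confounded by z, whereas the AIPW estimate should land close to the true effect of 1.0, because the exposure model is correct even though the outcome model is not.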
09:25-09:50 A command for Laplace regression
Nicola Orsini
Unit of Biostatistics and Nutritional Epidemiology, Institute of Environmental Medicine, Karolinska Institutet, Sweden
We present an estimation command for Laplace regression to model conditional quantiles of a response variable given a set of covariates. Unlike the official -qreg- command, the -laplace- command can take into account censored data. We illustrate its applicability and use through examples from health-related fields.
09:50-10:15 Time to dementia onset: competing risk analysis with Laplace regression
Giola Santoni, Debora Rizzuto, Laura Fratiglioni
Aging Research Center, Karolinska Institutet, Sweden
We want to quantify the protective effect of education on time to dementia onset in longitudinal data from a population study. We consider drop-out due to death of the subject as a competing event of the outcome of interest. We show an adaptation of the Laplace regression method to the case of competing risk analysis. Among highly educated people, the first 20% to develop dementia do so 2.5 years (p<.01) later than their counterparts with a low education level. The effect on all-cause mortality is negligible. We show that the results derived through Laplace regression are comparable to those derived with the Stata command -stcrreg-.
10:15-10:30 An example of competing risk analysis using stcompet and stcrreg
Christel Häggström
Umeå University, Sweden
Competing risk analysis in epidemiology is of special importance in survival analysis when studying the elderly, and also when the exposure is related to early death. In a cohort study we investigated the association between metabolic factors (obesity, hypertension, high glucose levels, etc.) and prostate cancer (with mean age at diagnosis of 70 years). Using these data, I will present the analysis in which we plotted cumulative incidence curves to visualize the risk of prostate cancer in comparison to the competing risk, all-cause mortality, for different levels of metabolic factors using the Stata commands -stcompet- and -stpepemori-. We also used Fine and Gray regression, command -stcrreg-, to calculate subdistribution hazard ratios for both prostate cancer incidence and all-cause mortality.
10:30-11:00 Coffee break
11:00-12:00 Chained equations and more in multiple imputation in Stata 12
Yulia Marchenko
StataCorp LP
I present the new Stata 12 command, mi impute chained, to perform multivariate imputation using chained equations (ICE), also known as sequential regression imputation. ICE is a flexible imputation technique for imputing various types of data. The variable-by-variable specification of ICE allows you to impute variables of different types by choosing the appropriate method for each variable from several univariate imputation methods. Variables can have an arbitrary missing-data pattern. By specifying a separate model for each variable, you can incorporate certain important characteristics, such as ranges and restrictions within a subset, specific to each variable. I also describe other new features in multiple imputation in Stata 12.
12:00-12:25 Multiple imputation with quantile imputation
Matteo Bottai
Unit of Biostatistics, Institute of Environmental Medicine, Karolinska Institutet, Sweden
Multiple imputation is an increasingly popular approach for the analysis of data with missing observations. It is implemented in the -mi- suite of Stata commands. We present a new Stata command for imputation of missing values based on prediction of conditional quantiles of missing observations given the observed data. It does not require making distributional assumptions and can be applied to impute dependent, bounded, censored and count data.
12:25-13:30 Lunch break
13:30-13:55 Simulating complex survival data
Michael J. Crowther, Paul C Lambert
Department of Health Sciences, University of Leicester, Leicester, United Kingdom
Simulation studies are essential for understanding and evaluating both current and new statistical models. When simulating survival times, an exponential or Weibull baseline hazard is often assumed, which can be too simplistic and lack biological plausibility in many situations. I will describe a new user-written command, -survsim-, which allows the user to simulate survival times from 2-component mixture models, allowing much more flexibility in the underlying hazard. Standard parametric models can also be used, including the exponential, Weibull and Gompertz. Furthermore, survival times can be simulated from the all-cause distribution of cause-specific hazards for competing risks. A multinomial distribution is used to create the event indicator, whereby the probability of experiencing each event at a simulated time t is the cause-specific hazard divided by the all-cause hazard evaluated at time t. Baseline covariates and non-proportional hazards can be included in all scenarios. Finally, I will discuss the complex extension of simulating joint longitudinal and survival data.
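The competing-risks step can be sketched as follows. This is an illustrative Python sketch, not -survsim- syntax. It assumes constant (exponential) cause-specific hazards, so the all-cause hazard is simply their sum and the event probabilities do not depend on t. The function name and hazard values are hypothetical.

```python
import random


def simulate_competing_risks(n, hazards, seed=2011):
    """Simulate (time, cause) pairs under constant cause-specific hazards.

    Assumption for illustration: each cause k has a constant hazard
    hazards[k-1], so the all-cause hazard is their sum.
    """
    rng = random.Random(seed)
    all_cause = sum(hazards)
    # Probability of each cause at the event time: cause-specific hazard
    # divided by the all-cause hazard (constant in t in this simple case).
    probs = [h / all_cause for h in hazards]
    causes = list(range(1, len(hazards) + 1))
    data = []
    for _ in range(n):
        t = rng.expovariate(all_cause)                 # all-cause event time
        cause = rng.choices(causes, weights=probs)[0]  # multinomial draw
        data.append((t, cause))
    return data


sample = simulate_competing_risks(20000, [0.10, 0.30])
share_cause1 = sum(1 for _, c in sample if c == 1) / len(sample)
mean_time = sum(t for t, _ in sample) / len(sample)
print(round(share_cause1, 3), round(mean_time, 3))
```

With hazards 0.10 and 0.30, about a quarter of events should be of cause 1 and the mean event time should be near 1/0.40 = 2.5. With time-varying hazards, the probabilities would have to be evaluated at each simulated time t.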
13:55-14:10 Quantiles of the survival time from Inverse Probability Weighted Kaplan-Meier estimates
Andrea Discacciati
Unit of Biostatistics and Nutritional Epidemiology, Institute of Environmental Medicine, Karolinska Institutet, Sweden
The official Stata command -stci- indirectly estimates quantiles of the survival time for different exposure levels from the Kaplan-Meier estimates. However, -stci- does not take into account possible confounding effects. Therefore, we introduce a new Stata command, -stqkm-, that indirectly estimates quantiles of the survival time from Inverse Probability Weighted Kaplan-Meier estimates. Confidence intervals for the quantile estimates are obtained using the bootstrap method. We present a simulation study to assess the performance of the -stqkm- command in the presence of confounding, and a case study.
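As a rough illustration of the idea, the Python sketch below computes a weighted Kaplan-Meier curve and reads a quantile off it. It is not the -stqkm- implementation. The toy data and weights are made up, and the quantile convention used (the smallest time at which the survival estimate drops to 1 - p or below) is one common choice, assumed here. In practice the weights would be inverse probabilities of the observed exposure from a fitted exposure model, and confidence intervals would come from the bootstrap, neither of which is shown.

```python
def weighted_km_quantile(times, events, weights, p):
    """Return the p-th quantile of survival time from a weighted KM curve.

    times   : follow-up times
    events  : 1 if the event was observed, 0 if censored
    weights : subject weights (hypothetical values in this sketch)
    Returns None if the survival curve never reaches 1 - p.
    """
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # Weighted number at risk just before t and weighted events at t.
        at_risk = sum(w for s, w in zip(times, weights) if s >= t)
        failed = sum(
            w for s, e, w in zip(times, events, weights) if s == t and e == 1
        )
        survival *= 1.0 - failed / at_risk
        if survival <= 1.0 - p:
            return t
    return None


times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 1]

# Unit weights reproduce the ordinary Kaplan-Meier estimate.
median_unweighted = weighted_km_quantile(times, events, [1, 1, 1, 1, 1], 0.5)
# Up-weighting the last subject shifts the estimated median.
median_weighted = weighted_km_quantile(times, events, [1, 1, 1, 1, 4], 0.5)
print(median_unweighted, median_weighted)
```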
14:10-14:25 Methods for Projecting Cancer Incidence
Mark J Rutherford, Paul C Lambert, John R Thompson
Department of Health Sciences, University of Leicester, Leicester, United Kingdom
Age-period-cohort models provide a useful method for modelling cancer incidence and mortality rates. There is great interest in estimating the rates of disease at given future time-points in order that plans can be made for the provision of the required future services. In the setting of using age-period-cohort models incorporating restricted cubic splines, a new technique for projecting incidence is proposed. The method is validated via a comparison with existing methods in the setting of Finnish cancer registry data. The reasons for the improvements seen for the newly proposed method are twofold. Firstly, improvements are seen due to the finer splitting of the timescale to give a more continuous estimate of the incidence rate. Secondly, the new method uses more recent trends to dictate the future projections than previously proposed methods. The output will be produced via a user-written command -apcfit-. The functionality of the command will be illustrated throughout the talk.
The talk will comprise an introduction to the use of restricted cubic splines for model fitting before describing their use for age-period-cohort models. A description of the new method for projecting cancer incidence will be given prior to showing the results of the application of the method to Finnish Cancer Registry data. The talk will conclude with a description of the potential problems and issues when making projections.
14:25-14:40 Using meta-analysis to inform the design of subsequent studies
Sally R. Hinchliffe, Michael J. Crowther, Alison Donald and Alex J. Sutton
Department of Health Sciences, University of Leicester, Leicester, United Kingdom
This work describes a suite of programs (-metasim-, -metapow-, -metapowplot-) which enable the user to estimate the probability that the conclusions of a meta-analysis will change with the inclusion of a new study(ies), as described previously by Sutton et al. (Sutton, Cooper, Jones, Lambert, & Thompson, 2007). Using the -metasim- program, a simulation approach to estimating the effects in future studies is taken. The method assumes that the effect sizes of future studies are consistent with those observed previously, as represented by the current meta-analysis. Both the contexts of two-arm randomised controlled trials and studies of diagnostic test accuracy are considered for a variety of outcome measures. Calculations are possible under both fixed and random effect assumptions, and several approaches to inference, including statistical significance and limits of clinical significance, are possible. Calculations for specific sample sizes can be conducted (using -metapow-), and plots, akin to traditional power curves, indicating the probability that a new study(ies) will change inferences for a range of sample sizes can be produced (using -metapowplot-). Finally, plots of the simulation results are overlaid on a previously described macro -extfunnel- which can help to intuitively explain the results of such calculations of sample size.
We hope the macro will be useful to i) trialists who want to assess the impact potential new trials will have on the overall evidence base; and ii) meta-analysts who want to assess the robustness of the current meta-analysis to the inclusion of future data.
14:40-15:05 Using Stata for agent-based simulations
Peter Hedström
Institute for Futures Studies, Stockholm, Sweden
Thomas Grund
ETH, Zürich, Switzerland
Agent-based modeling (ABM) is an analytical tool that is becoming increasingly important in the social sciences. The core idea behind ABM is to use computational models to analyze the macro or aggregate-level outcomes that groups of agents, in interaction with one another, bring about. In this presentation I briefly discuss why ABM is important and show how Stata can be used for such analyses. A suite of programs is presented. Some of these are used for generating, visualizing and/or measuring various properties of the networks within which the agents are embedded, and others are used for analyzing the collective outcomes that agents are likely to bring about when embedded in such networks.
15:05-15:20 Coffee break
15:20-15:45 Comparing observed and theoretical distributions
Maarten L. Buis
Institut fuer Soziologie, Universitaet Tuebingen, Germany
The aim of this talk is to introduce graphical tools for comparing the distribution of a variable in your dataset with a theoretical probability distribution, like the normal distribution or the Poisson distribution. It will consist of two parts. In the first part I will consider univariate distributions, with a particular emphasis on hanging and suspended rootograms (-hangroot-). Looking at univariate distributions is not very common in a lot of (sub-(sub-))disciplines, but there are situations where this can be very useful. For example, if we have a count of accidents and we want to know whether these are occurring randomly, then we can compare this variable with a Poisson distribution. Another example would be simulations, where it is often the case that parameters or test statistics should follow a certain distribution when the model that is being checked is working as expected. The second part of the talk will focus on the more common situation where models assume a certain distribution for the explained/dependent/y variable and estimate how one or more parameters, often the mean, change when one or more explanatory/independent/x variables change. The challenge now is that the dependent variable no longer follows the theoretical distribution, but rather a mixture of these theoretical distributions. In the case of a linear regression, we can circumvent this difficulty by looking at the residuals, which should follow a normal distribution. However, this does not generalize to other models. In this part I will show how to graphically compare the distribution of the dependent variable with the theoretical mixture distribution. The focus will be on a trick to sample new dependent variables under the assumption that the model is true. Graphing the distribution of the actual dependent variable together with these sampled variables will give an idea of whether deviations from the theoretical distribution could have occurred by chance or not.
This idea will be applied to checking the distributional assumption in beta regression (-betafit-) and to choosing between different parametric survival models (-streg-).
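The univariate accident-count example can be made concrete with a short Python sketch. It is not -hangroot- and it draws no graph. It only tabulates observed counts against the frequencies expected under a Poisson distribution with the sample mean, together with the difference on the square-root scale, which is the quantity a rootogram displays. The function name and the toy counts are hypothetical.

```python
import math
from collections import Counter


def poisson_root_table(counts):
    """Tabulate observed vs. Poisson-expected frequencies.

    Returns (lam, rows), where lam is the sample mean and each row is
    (k, observed, expected, sqrt(observed) - sqrt(expected)).
    """
    n = len(counts)
    lam = sum(counts) / n  # maximum likelihood estimate of the Poisson mean
    observed = Counter(counts)
    rows = []
    for k in range(max(counts) + 1):
        expected = n * math.exp(-lam) * lam ** k / math.factorial(k)
        obs = observed.get(k, 0)
        rows.append((k, obs, expected, math.sqrt(obs) - math.sqrt(expected)))
    return lam, rows


accidents = [0, 1, 1, 2, 0, 3, 1, 0, 2, 1]  # made-up counts
lam, table = poisson_root_table(accidents)
for k, obs, expected, gap in table:
    print(k, obs, round(expected, 2), round(gap, 2))
```

Gaps close to zero across all values of k suggest the counts are compatible with a Poisson distribution. Judging how large a gap is "too large" still requires a reference such as the sampling trick described in the abstract.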
15:45-16:00 Taking the pain out of looping and storing
Patrick Royston
MRC Clinical Trials Unit, United Kingdom
Quite a common task in Stata is to run some sequence of commands under the control of a looping parameter, and store the corresponding results in one or more new variables. Over the years, I have written many such loops, some of greater complexity than others. I finally became fed up with it and decided to write a simple command to automate the repetitive parts. The result is -looprun-, which I shall describe in the talk.
16:00-17:00 Structural equation modeling for those who think they don't care
Vince Wiggins
StataCorp LP
We will discuss SEM (structural equation modeling), not from the perspective of the models for which it is most often used—measurement models, confirmatory factor analysis, and the like—but from the perspective of how it can extend other estimators. From a wide range of choices, we will focus on extensions of mixed models (random and fixed-effects regression). Extensions include conditional effects (not completely random), endogenous covariates, and others.
17:00-17:15 Wishes and Grumbles
The Nordic and Baltic Stata Users Group Organizing Committee
Nicola Orsini
Unit of Biostatistics, The National Institute of Environmental Medicine, Karolinska Institutet
nicola.orsini@ki.se
Matteo Bottai
Unit of Biostatistics, The National Institute of Environmental Medicine, Karolinska Institutet
matteo.bottai@ki.se
Peter Hedström
Nuffield College, Oxford University
peter.hedstrom@nuffield.ox.ac.uk