make_CAMSIS - A resource for deriving CAMSIS measures with Stata
First published Autumn 2009, Paul Lambert
- Making CAMSIS with Stata
Software alternatives
There is a steadily expanding tradition of constructing scales of occupational ranking from patterns of social interaction distance - see our bibliographical review page for many relevant references. Since the first endeavours in the 1960s, social scientists have generated such scales in a number of alternative software packages. Examples that we are aware of include:
Scales affiliated to the CAMSIS website:
- In the original 'Cambridge scale' derivations (Stewart, Prandy and Blackburn 1980; Prandy 1990), a multi-dimensional scaling programme called MINISSA was used (with adaptations to accommodate the matrix size involved - see Stewart et al., 1980: 37)
- For the wider CAMSIS project (2000-present), Ken Prandy, Paul Lambert and colleagues wrote programmes, and provided usage instructions, exploiting a combination of the SPSS and lEM (Vermunt, 1997) packages using their respective correspondence analysis and RC-II association model routines.
- Additionally, working within the CAMSIS tradition:
- As documented on this page, Paul Lambert increasingly uses Stata for either the entire scale derivation process, or for the data manipulation precursors to an analysis in lEM
- Stephen McTaggart, working with data for New Zealand, has written SAS programmes to undertake correspondence analysis routines replicating the SPSS and lEM approaches documented on the CAMSIS website
Other published scales of social interaction distance:
- Chan and Goldthorpe (2004) programmed in R to develop scales for the UK based upon RC-II association models for friendship and marriage patterns
- Rytina (1992) programmed in GAUSS to develop scales for the US based upon RC-II association models for intergenerational mobility patterns
- Laumann and Guttman (1966) used smallest space analysis algorithms developed by Guttman (1968) and Lingoes (e.g. 1966) to develop a scale for the US based upon pattern analysis of friendship patterns
- MacDonald (1972) used MDSCAL to develop scales for the UK based upon multi-dimensional scaling of intergenerational mobility patterns
Elsewhere on our website, we give extended instructions on generating CAMSIS scales from occupational data by using a combination of SPSS and lEM (CAMSIS scale construction guide in SPSS and lEM). On this page, however, we concentrate upon using Stata for CAMSIS scale construction.
Stata
Within the field of social statistics and social survey data analysis, Stata has emerged as a very popular choice of analytical software, valued for combining extensive functionality for both basic and advanced statistical analysis with tools for data construction and management (see Treiman, 2009). If you are interested, I have written many Stata training materials for other projects related to CAMSIS, which are available at our websites http://www.longitudinal.stir.ac.uk/Stata_support.html and http://www.dames.org.uk/workshops/stir09/dm_stata_august09.html. Stata is also popular because of its capacity for documentation and replication (Long, 2009), and its accessibility for complex automated programming.
All of these features, but particularly its capacity for automated programming, make Stata an attractive choice for the sequence of activities involved in constructing CAMSIS scales. Indeed, since around 2007, I have found that I primarily use Stata for generating new CAMSIS scales.
Stata is useful for generating CAMSIS scales because it supports Correspondence Analysis on highly sparse tables, and because it simplifies many of the data management tasks associated with producing CAMSIS scales (e.g. linking multiple data files, standardising variables and distributing scale scores across files). Stata is not, however, flawless for the purpose, because (to the best of my knowledge) it doesn't support routines for RC-II association models (and their associated design matrix specification) which can handle large sparse two-way tables. Association model estimations with linked design matrices are more attractive for CAMSIS scales as they give us a better controlled statistical estimation. However, in my experience, equivalent results can be achieved in a correspondence analysis by carefully excluding relevant combinations of occupations from the analysis. Thus the trade-off in using Stata for CAMSIS scale estimates is usually one worth making.
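The idea of running a correspondence analysis on a sparse table while leaving out selected combinations of occupations can be sketched as follows. This is an illustration in Python rather than Stata, and it is not the code used to derive any CAMSIS scale: it uses reciprocal averaging, a simple iterative way of obtaining first-dimension correspondence analysis scores. The function name, the dictionary layout and the toy counts are all hypothetical.

```python
def first_dimension_scores(pairs, excluded=frozenset(), iterations=200):
    """First-dimension scores for a sparse two-way table of occupational pairs.

    pairs    : dict mapping (row_occupation, column_occupation) -> count
    excluded : cells (e.g. 'pseudo-diagonals') to leave out of the analysis
    Returns (row_scores, column_scores); the sign of the dimension is arbitrary.
    """
    # Only non-empty, non-excluded cells are stored, so sparse tables stay small.
    cells = {k: n for k, n in pairs.items() if n > 0 and k not in excluded}
    row_total, col_total = {}, {}
    for (r, c), n in cells.items():
        row_total[r] = row_total.get(r, 0.0) + n
        col_total[c] = col_total.get(c, 0.0) + n
    grand_total = sum(col_total.values())

    # Arbitrary non-constant starting values for the column scores.
    col = {c: float(i) for i, c in enumerate(sorted(col_total))}
    row = {}
    for _ in range(iterations):
        # Each row score becomes the weighted mean of its partners' column scores.
        row = dict.fromkeys(row_total, 0.0)
        for (r, c), n in cells.items():
            row[r] += n * col[c] / row_total[r]
        # Each column score becomes the weighted mean of its partners' row scores.
        new_col = dict.fromkeys(col_total, 0.0)
        for (r, c), n in cells.items():
            new_col[c] += n * row[r] / col_total[c]
        # Centre and rescale (weighted by column totals) to drop the trivial
        # constant solution and stop the scores shrinking towards zero.
        mean = sum(col_total[c] * new_col[c] for c in col_total) / grand_total
        var = sum(col_total[c] * (new_col[c] - mean) ** 2
                  for c in col_total) / grand_total
        if var == 0:
            raise ValueError("no association left in the table")
        col = {c: (new_col[c] - mean) / var ** 0.5 for c in col_total}
    return row, col


# Toy 4 x 4 table: hypothetical unit groups 1-4, counts fall with social distance.
counts_by_gap = {0: 40, 1: 15, 2: 3, 3: 1}
plain = {(i, j): counts_by_gap[abs(i - j)]
         for i in range(1, 5) for j in range(1, 5)}
row, col = first_dimension_scores(plain)

# Pretend two cells are 'pseudo-diagonals' with inflated counts, and exclude them.
inflated = dict(plain)
inflated[(1, 3)] = inflated[(3, 1)] = 500
row_x, col_x = first_dimension_scores(inflated, excluded={(1, 3), (3, 1)})
```

Because excluded cells are dropped before any totals are computed, an inflated count in an excluded cell has no influence on the resulting scores, which is the behaviour that the exclusion of relevant combinations relies upon.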
Manually generating CAMSIS scales using Correspondence Analysis in Stata
As an example, linked below is a partially annotated Stata command file which covers the entire 'workflow' of producing a CAMSIS scale for Romania using 2002 census data obtained from the IPUMS International project (www.ipums.org). The end product of this analysis is available for download from http://www.camsis.stir.ac.uk/versions.html#Romania.
http://www.camsis.stir.ac.uk/make_camsis/romania_example/camsis_romania_construction_example.do
Subfiles used within the above command file are available at: http://www.camsis.stir.ac.uk/make_camsis/romania_example/
The example file illustrates the process of opening up data on pairs of social interactions; organising it so that it is suitable for correspondence analysis; performing a sequence of correspondence analyses, gradually removing small numbers of 'pseudo-diagonal' combinations; settling on a final model and exporting the results of that model; then re-standardising and distributing the results of that model.
The example file is not perfectly annotated and may even contain a few errors. In principle, the example file should be fully replicable - anybody interested should be able to access the same IPUMS micro-data and re-run the analysis. In practice, there may be some unanticipated features which prevent this (comments welcome).
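The final 're-standardising and distributing' step can be illustrated with a short sketch, written in Python rather than Stata and not taken from the example command file. It assumes that the raw scores are rescaled to a mean of 50 and a standard deviation of 15, weighted by the number of people in each occupation; those target values, and all names and numbers below, are assumptions made for illustration.

```python
def restandardise(scores, counts, mean=50.0, sd=15.0):
    """Rescale raw scores to a chosen mean and SD, weighting by occupation size."""
    total = sum(counts[occ] for occ in scores)
    raw_mean = sum(counts[occ] * scores[occ] for occ in scores) / total
    raw_var = sum(counts[occ] * (scores[occ] - raw_mean) ** 2
                  for occ in scores) / total
    return {occ: mean + sd * (scores[occ] - raw_mean) / raw_var ** 0.5
            for occ in scores}


# Hypothetical raw first-dimension scores and numbers of people per occupation.
raw_scores = {"A": -1.2, "B": 0.1, "C": 0.9}
people_per_occupation = {"A": 200, "B": 500, "C": 300}
scale = restandardise(raw_scores, people_per_occupation)

# 'Distributing' the scale: attach each person's occupational score to their record.
people = [{"id": 1, "occ": "B"}, {"id": 2, "occ": "C"}, {"id": 3, "occ": "Z"}]
for person in people:
    # Occupations that were not scored (here "Z") receive None.
    person["camsis"] = scale.get(person["occ"])
```

Weighting by occupation size means the scale has the chosen mean and spread across people rather than across occupational categories.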
Automatically generating CAMSIS scales using Correspondence Analysis in Stata
In Stata, I have written macros that can automate the entire CAMSIS scale construction process (or the large majority of it). To apply these, the user needs to specify their input micro-data file, a list of 'pseudo-diagonals' if relevant, and the locations where they want their derived scales to be saved. The macros also come in a few different forms, depending upon the underlying occupational unit groups being used.
Intro
The occupational unit group scheme on which a CAMSIS scale is being prepared is of course quite consequential. More detailed unit group schemes generally make for a harder analysis (there may be more problems associated with 'pseudo-diagonality', and recoding of sparse categories), but they also give, substantively, a better end product (because they can be sensitive to finer differences in the occupational structure).
For some widely used occupational unit group schemes, we have generated our own data resources relevant to CAMSIS scale derivations. These include recommended recodes for sparse occupational unit categories, and other summary data. When available, these resources are listed below.
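To give a flavour of what recoding sparse occupational unit categories can involve, here is a minimal sketch in Python (not Stata). It does not reproduce our recommended recodes: it simply assumes four-character hierarchical codes, in which replacing the last character with "0" gives a broader parent group, and a hypothetical minimum of 30 cases per category. The function name and the example codes are likewise hypothetical.

```python
from collections import Counter


def recode_sparse(codes, min_count=30):
    """Collapse rarely observed unit groups into their broader parent group."""
    counts = Counter(codes)
    recoded = []
    for code in codes:
        if counts[code] < min_count:
            # Assumed convention: a trailing "0" denotes the parent group,
            # so "1231" becomes "1230".
            code = code[:-1] + "0"
        recoded.append(code)
    return recoded


# Hypothetical occupational codes for 77 people.
observed = ["1231"] * 2 + ["1232"] * 40 + ["2411"] * 35
recoded = recode_sparse(observed)
```

This makes a single pass only, so a parent group may itself remain sparse afterwards; in practice, choosing which categories to merge usually needs case-by-case judgement.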
ISCO88
Link: Description of ISCO88
HISCO
Link: Description of HISCO
{more examples and explanatory information to follow}
Additional comments: CAMSIS scale construction in Stata
References
Bibliographical references
Websites