City University of New York
By Dean Savage, Department of Sociology, Queens College-CUNY
URL: http://www.soc.qc.edu
The advent of the World Wide Web and of tools such as Mosaic and Lynx presents an opportunity to disseminate research results and research tools to new groups of users. The sociology department at Queens College-CUNY offers an example. NSF funding for our computer laboratories provided an environment in which it became possible to develop new research tools and do new types of research, and the World Wide Web now provides the means to reach groups of researchers and users, both inside and outside CUNY, who were not easy to reach by traditional means. Queens College-CUNY can be described as a low-end testbed: if it can be done here in a social science department, it is possible in a very wide range of environments. We will use one ongoing research project and two software development projects as examples of uses of the web which are likely to become increasingly common.
A Data Extraction Program for the General Social Surveys: Several projects are currently underway to provide data extraction capability for the General Social Surveys on mainframes, workstations, or PCs. Since the complete file for this widely used, nationally representative dataset contains more than 2000 variables, it is customary to create data extracts prior to analysis. The need for simplified data extraction has been evident for some time, and is finally receiving serious attention. We announce here a completed effort in the form of an extraction program for use with the complete GSS dataset on PCs running DOS, and thus intended for the broadest and least skilled user audience. This group includes almost all undergraduate social science students and the not insubstantial share of faculty members and graduate students who have either forgotten their mainframe skills or never learned any in the first place. Several design criteria influenced our choices in writing the program: it had to be simple enough for utter novices to use; it had to create extracts which could be used with more than one statistical package; and it had to keep cost and delivery expenses to the absolute minimum. Extract is a freeware program written by Jesse Reichler and Dean Savage which permits novices to browse and select from the 2,173 variables used in the nineteen annual surveys included in the database. The GSS codebook has been integrated into the program so users can examine the text of individual questions before making a selection. We wrote Extract to produce program and data files for SPSS, SAS, dBASE, and ASCII. In the event that users do not have any of these programs, Extract also writes files for use with QStats, a data exploration and analysis package designed for ease of use and heavily oriented toward graphics. Also written by Jesse Reichler and Dean Savage, QStats is available as freeware from the same location. The data are available in single-year compressed files.
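The kind of extraction described above can be illustrated with a minimal sketch. This is not the Extract program itself, whose internals are not described here; it assumes a fixed-width source file, and the column layout, sample records, and output file name `extract.dat` are invented for the illustration. It selects a few variables from each record, re-packs them into a narrower ASCII record, and writes a matching SPSS `DATA LIST` command.

```python
from typing import NamedTuple


class Variable(NamedTuple):
    name: str   # variable mnemonic shown to the user
    start: int  # 1-based first column in the source record
    width: int  # number of columns the field occupies


# Hypothetical layout for illustration only; NOT the real GSS column positions.
CODEBOOK = [
    Variable("YEAR", 1, 4),
    Variable("ID", 5, 4),
    Variable("AGE", 9, 2),
    Variable("SEX", 11, 1),
]


def extract(records, codebook, wanted):
    """Return (data_lines, spss_syntax) for the variables named in `wanted`."""
    by_name = {v.name: v for v in codebook}
    chosen = [by_name[name] for name in wanted]  # raises KeyError on unknown names

    # Cut each selected field out of the wide record and re-pack the fields.
    data_lines = [
        "".join(rec[v.start - 1 : v.start - 1 + v.width] for v in chosen)
        for rec in records
    ]

    # Describe the new, narrower layout so a statistics package can read it.
    specs, col = [], 1
    for v in chosen:
        end = col + v.width - 1
        specs.append(f"{v.name} {col}-{end}" if v.width > 1 else f"{v.name} {col}")
        col = end + 1
    syntax = "DATA LIST FILE='extract.dat' FIXED\n  / " + " ".join(specs) + "."
    return data_lines, syntax


# Two invented records in the hypothetical layout above.
SAMPLE_RECORDS = ["19930001451", "19930002322"]

if __name__ == "__main__":
    lines, syntax = extract(SAMPLE_RECORDS, CODEBOOK, ["YEAR", "AGE", "SEX"])
    print("\n".join(lines))
    print(syntax)
```

Writing a plain ASCII data file together with a small program file for each package is one simple way to serve more than one statistical package from the same extract; only the program file needs to change per package.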
The software was initially developed for dissemination via CDROM -- an outlet we are still pursuing -- but it became clear that both the software and data could be readily and immediately distributed via the web within the twenty-one campuses of the City University. Some users will prefer to obtain the entire set of data files for local use with Extract, while others may prefer to do extracts on a remote server. We are conducting CUNY-wide faculty workshops in November to accommodate both groups. Since the complete GSS data files in compressed form amount to some 20-odd megabytes, they are perhaps best retrieved by users outside our system from alternate sources with greater bandwidth, but we intend to support such retrieval on a trial basis. Once the CDROM version has been distributed, new year files could readily be distributed via the web. This would resolve one of the annoying problems in the current mode of dissemination for the General Social Surveys: to acquire a new year, it is necessary to reacquire the entire file, which at 70-plus megabytes is too large for some researchers to handle easily.
A Machine-Searchable Version of the GSS Annotated Bibliography: The ninth edition of The Annotated Bibliography of Papers Using the General Social Surveys, 1972-1993, by Tom Smith and Bradley W. Arnold (Chicago: NORC, 1994), is an invaluable research tool for users of the General Social Survey. It contains some 2500 references and abstracts of works which have used the GSS, and indicates which variables were used in each work. Researchers can review the progress of research in a field by consulting this bibliography. Although it is organized alphabetically by author's last name, it has been somewhat awkward to use and has been underutilized. As a part of the CDROM project described above, Jesse Reichler and Dean Savage have written a search engine, QSearch, which can search the annotated bibliography by keyword and rank the results in several ways. The results can be printed or saved to a file. Both QSearch and the associated bibliographical database (GSSBIB9) are available via the web as freeware. Distribution in machine-searchable form will increase the impact of this very useful resource. It will also provide a simpler and more rapid distribution path for new editions, parallel to the hard copy version. The next edition -- the tenth -- will add four hundred new references and is scheduled for release by Tom Smith and Bradley Arnold in November of 1994.
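Keyword search with ranked results, as described above, can be sketched in a few lines. This is an illustration under stated assumptions, not QSearch's actual method: the entries are invented sample data rather than records from GSSBIB9, and the two ranking options (number of keyword hits, author name) are our own guesses at what "several ways" might include.

```python
import re
from collections import Counter

# Invented sample entries for illustration; not taken from the bibliography.
ENTRIES = [
    {"author": "Gamma",
     "abstract": "Trends in religious attendance and religious belief.",
     "variables": ["ATTEND", "RELIG"]},
    {"author": "Beta",
     "abstract": "Happiness and marital status.",
     "variables": ["HAPPY", "MARITAL"]},
    {"author": "Alpha",
     "abstract": "Religious affiliation and happiness.",
     "variables": ["RELIG", "HAPPY"]},
]


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(entries, query, rank_by="hits"):
    """Return entries whose abstract contains any query word, ranked."""
    terms = set(tokenize(query))
    results = []
    for entry in entries:
        counts = Counter(tokenize(entry["abstract"]))
        hits = sum(counts[t] for t in terms)  # total occurrences of query words
        if hits:
            results.append((hits, entry))

    if rank_by == "hits":
        results.sort(key=lambda pair: -pair[0])            # most hits first
    elif rank_by == "author":
        results.sort(key=lambda pair: pair[1]["author"])   # alphabetical
    else:
        raise ValueError(f"unknown ranking: {rank_by!r}")
    return [entry for _, entry in results]


if __name__ == "__main__":
    for entry in search(ENTRIES, "religious"):
        print(entry["author"], "-", entry["abstract"])
```

Because each entry also records the variables used in the work, the same loop could match on variable names instead of abstract words; that variation is left out to keep the sketch short.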
Author's email address: savage@qcvaxa.acc.qc.edu