Social Science Uses of Mosaic at the

City University of New York

By Dean Savage, Department of Sociology, Queens College-CUNY

URL: http://www.soc.qc.edu


Abstract:

Three social science research and software projects, undertaken for reasons other than distribution on the WWW, are described. Once the WWW became available, it became the preferred distribution channel, both for use within the CUNY system and externally. The projects include a demonstration archive of GIS maps of New York City done for The New York Times and derived from analyses of county- and tract-level census data; a data extraction program for use with the General Social Surveys which integrates the questions from the GSS codebook and writes output for SPSS, SAS, and other software packages; and a machine-searchable version of the ninth edition of Tom Smith and Bradley Arnold's Annotated Bibliography of Papers Using the General Social Surveys, 1972-1993.

The advent of the World Wide Web and tools such as Mosaic and Lynx presents the opportunity for dissemination of research results and research tools to new groups of users. The sociology department at Queens College-CUNY offers an example. NSF funding for our computer laboratories provided an environment in which it has become possible to develop new research tools and do new types of research, and now the World Wide Web provides the means for dissemination to groups of researchers and users both inside and outside CUNY who were not easy to reach by traditional means. Queens College-CUNY can be described as a low-end testbed: if it can be done here in a social science department, it is possible in a very wide range of environments. We will use one ongoing research project and two software development projects as examples of uses of the web which are likely to become increasingly common.


Geographic information systems presentation of results of demographic research on New York City: Funding provided to Andrew Beveridge by The New York Times has resulted in the acquisition of extensive experience in analysis and GIS presentation of census data on the county, zip code, and tract level for counties in the New York City metropolitan area. The result for The Times has been a series of articles and accompanying maps which have appeared during the past year and a half. This has permitted the accumulation of an archive of maps of social scientific interest. A typical example might be dot density maps of the residential locations of new Asian immigrants and how these have changed over time. By examining the 1970, 1980, and 1990 census data for the county of Queens, it is possible to see how the new Korean immigration has both grown and come to be concentrated in particular areas. Although such results have been available in the past in tabular form, there are several advantages to presentation and dissemination of an archive of map images. Information is conveyed more effectively in map form -- the spread and concentration of new waves of immigration along subway lines, for example, simply cannot be conveyed as well by means of text or numbers. A second advantage is that the needs in both equipment and time for creation of map images remain substantial enough that most casual researchers will not use maps if they have to create them. An archive will be especially useful for this group. Certain archives may become known as locations where maps on particular themes will be available. While these will not serve all purposes, they will frequently help to sharpen the focus of research done by others. Finally, the dissemination of results via the web provides a source which is intermediate to traditional newspaper publication (in The New York Times, in this case) and publication in scholarly venues. The latter is so slow that an instantly and easily accessible archive has some strong advantages. In this example as in many others, the web has the effect of decentralizing the sources of information by providing access to specialized research results.

A Data Extraction Program for the General Social Surveys: Several projects are currently underway to provide data extraction capability for the General Social Surveys on mainframes, workstations or PCs. Since the complete file for this widely-used representative national dataset contains more than 2000 variables, it is customary to create data extracts prior to analysis. The need for simplified data extraction has been evident for some time, and is finally receiving serious attention. We announce here a completed effort in the form of an extraction program for use with the complete GSS dataset on PCs running DOS, and thus intended for the broadest and least skilled user audience. This group includes almost all undergraduate social science students and that not insubstantial share of faculty members and graduate students who have either forgotten their mainframe skills or never learned any in the first place. Several design criteria influenced our choices in writing the program: it had to be simple enough for utter novices to use; it had to create extracts which could be used with more than one statistical package; and it had to keep cost and delivery expenses to the absolute minimum. Extract is a freeware program written by Jesse Reichler and Dean Savage which permits novices to browse and select from the 2,173 variables used in the nineteen annual surveys included in the database. The GSS codebook has been integrated into the program so users can examine the text of individual questions before making a selection. We wrote Extract to produce program and data files for SPSS, SAS, dBASE, and ASCII. In the event that users do not have any of these programs, Extract also writes files for use with QStats, a data exploration and analysis package designed for ease of use and heavily oriented toward graphics. Also written by Jesse Reichler and Dean Savage, QStats is available as freeware from the same location. The data are available in single year compressed files. Extract was initially developed for dissemination via CDROM -- an outlet we are still pursuing -- but it became clear that both the software and data could be readily and immediately distributed over the web within the twenty-one campuses of the City University. Some users will prefer to obtain the entire set of data files for local use with Extract, while others may prefer to do extracts on a remote server. In November we are conducting CUNY-wide faculty workshops to accommodate both groups. Since the complete GSS data files in compressed form amount to some 20-odd megabytes, they are perhaps best retrieved by users outside our system from alternate sources with greater bandwidth, but we intend to support such retrieval on a trial basis. Once the CDROM version has been distributed, it is easy to imagine distributing new year files via the web. This would resolve one of the annoying problems in the current mode of dissemination for the General Social Surveys: to acquire a new year, it is necessary to reacquire the entire file, which at 70-plus megabytes is too large for some researchers to handle easily.
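The kind of extract described above can be illustrated with a minimal sketch in Python. It is not the Extract program itself, whose internals and file layouts are not described here. The tiny codebook, the column widths, the function name build_extract, and the file name extract.dat are all assumptions made for illustration, and the SPSS commands shown are only a plausible fixed-column setup.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical mini-codebook: variable name -> (column width, question text).
# The real GSS codebook is far larger; these entries are illustrative only.
CODEBOOK: Dict[str, Tuple[int, str]] = {
    "AGE": (2, "Age of respondent"),
    "SEX": (1, "Sex of respondent"),
    "EDUC": (2, "Highest year of school completed"),
}


def build_extract(
    rows: Sequence[Dict[str, int]],
    selected: Sequence[str],
    data_file: str = "extract.dat",  # assumed file name
) -> Tuple[List[str], str]:
    """Return fixed-width data lines and an SPSS setup for the selected variables."""
    # One line per respondent, each value right-justified in its column.
    data_lines = [
        "".join(str(row[name]).rjust(CODEBOOK[name][0]) for name in selected)
        for row in rows
    ]

    # Work out the column range each selected variable occupies.
    specs: List[str] = []
    labels: List[str] = []
    start = 1
    for name in selected:
        width, question = CODEBOOK[name]
        end = start + width - 1
        columns = str(start) if width == 1 else f"{start}-{end}"
        specs.append(f"{name} {columns}")
        labels.append(f"{name} '{question}'")
        start = end + 1

    syntax = (
        f"DATA LIST FILE='{data_file}' FIXED\n"
        f"  / {' '.join(specs)}.\n"
        "VARIABLE LABELS\n  "
        + "\n  /".join(labels)
        + ".\n"
    )
    return data_lines, syntax


if __name__ == "__main__":
    sample = [{"AGE": 34, "SEX": 2, "EDUC": 16}, {"AGE": 7, "SEX": 1, "EDUC": 12}]
    lines, setup = build_extract(sample, ["AGE", "EDUC"])
    print("\n".join(lines))
    print(setup)
```

Keeping the data in plain fixed-width ASCII and generating a separate setup file per package is one way a single extract could serve several statistical packages; only the setup text would differ between them.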

A Machine-Searchable Version of the GSS Annotated Bibliography: The ninth edition of The Annotated Bibliography of Papers Using the General Social Surveys, 1972-1993, by Tom Smith and Bradley W. Arnold (Chicago: NORC, 1994), is an invaluable research tool for users of the General Social Surveys. It contains some 2500 references and abstracts of works which have used the GSS, and indicates which variables were used in each work. Researchers can review the progress of research in a field by consulting this bibliography. Organized alphabetically by author's last name, however, it has been somewhat awkward to use and has been underutilized. As a part of the CDROM project described above, Jesse Reichler and Dean Savage have written a search engine, QSearch, which can search the annotated bibliography by keyword and rank the results in several ways. The results can be printed or saved to a file. Both QSearch and the associated bibliographical database (GSSBIB9) are available via the web as freeware. Distribution in machine-searchable form will increase the impact of this very useful resource. It will also provide a simpler and more rapid distribution path for new editions, parallel to the hard copy version. The next edition -- the tenth -- will add four hundred new references and is scheduled for release by Tom Smith and Bradley Arnold in November of 1994.
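A keyword search with ranked results can be sketched in a few lines of Python. This is not QSearch, and the ranking criteria QSearch actually offers are not specified here. The two rankings shown (number of keyword hits, and publication year) are assumptions, as are the function names, and the sample entries are made-up placeholders, not records from the bibliography.

```python
import re
from typing import Dict, List, Sequence, Tuple, Union

Entry = Dict[str, Union[str, int]]

# Made-up placeholder entries; NOT taken from the actual GSS bibliography.
ENTRIES: List[Entry] = [
    {"author": "Author A", "year": 1985,
     "abstract": "Trends in religious attendance and religious belief."},
    {"author": "Author B", "year": 1991,
     "abstract": "Education and attitudes toward religious groups."},
    {"author": "Author C", "year": 1978,
     "abstract": "Occupational prestige and income."},
]


def count_hits(text: str, keyword: str) -> int:
    """Count whole-word, case-insensitive occurrences of keyword in text."""
    pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def search(entries: Sequence[Entry], keywords: Sequence[str],
           rank_by: str = "hits") -> List[Entry]:
    """Return entries whose abstract contains every keyword, ranked as requested."""
    if not keywords:
        return []
    matches: List[Tuple[int, Entry]] = []
    for entry in entries:
        counts = [count_hits(str(entry["abstract"]), k) for k in keywords]
        if all(counts):  # every keyword must appear at least once
            matches.append((sum(counts), entry))
    if rank_by == "hits":    # assumed ranking: most keyword occurrences first
        matches.sort(key=lambda m: m[0], reverse=True)
    elif rank_by == "year":  # assumed ranking: most recent work first
        matches.sort(key=lambda m: int(m[1]["year"]), reverse=True)
    else:
        raise ValueError(f"unknown ranking: {rank_by}")
    return [entry for _, entry in matches]


if __name__ == "__main__":
    for entry in search(ENTRIES, ["religious"], rank_by="hits"):
        print(entry["author"], entry["year"])
```

Requiring every keyword to match keeps result lists short for a bibliography of this size; a looser "any keyword" match would be an equally reasonable choice.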


Dean Savage is Associate Professor of Sociology at Queens College of the City University of New York. His recent research has been in the areas of the sociology of higher education and software development. The software development projects, all in collaboration with Jesse Reichler (now of the University of Illinois at Urbana-Champaign), have resulted in the Extract and QSearch programs described in the paper and also a data management program (QData) and a data exploration program (QStats). All are available as freeware on the WWW.

Author's email address: savage@qcvaxa.acc.qc.edu