This document introduces students to SPSS for Windows. Another excellent website that discusses SPSS has been developed by Raynald Levesque (Montreal, Canada).
A debt of gratitude is due to those responsible for the following website, which was used as the basis for the present webpage: http://www.indiana.edu/~statmath/stat/spss/win/giant.html (University Information Technology Services, Stat/Math Center, Indiana University, 410 N. Park Ave, Bloomington, IN 47408). This is one of the best websites for learning how to use SPSS for Windows; be sure to visit it!
What is SPSS?
SPSS is a comprehensive and flexible statistical analysis and data management system. SPSS can take data from almost any type of file and use them to generate tabulated reports, charts, plots of distributions and trends, and descriptive statistics, and to conduct complex statistical analyses. SPSS is available on several platforms, including Windows 3.x, Windows 95, Windows 10, Windows NT, Macintosh, and Unix systems.
What is SPSS for Windows?
SPSS for Windows brings the full power of the mainframe version of SPSS to the personal computer environment. It will enable you to perform many analyses on your PC that were once possible only on much larger machines. SPSS for Windows provides a user interface that makes statistical analysis more intuitive for all levels of users. Simple menus and dialog box selections make it possible to perform complex analyses without typing a single line of command syntax. The built-in SPSS Data Editor offers a simple and efficient spreadsheet-like utility for entering data and browsing the working data file. High-resolution, presentation-quality charts and plots can be created and edited. Using the SPSS Viewer, you can handle the output with greater flexibility. SPSS for Windows also reads data files from a variety of file formats including Excel, dBASE, Lotus, and SAS.
Windows in SPSS for Windows
There are six types of windows that you will see in SPSS for Windows:
Data Editor window. This window displays the contents of the data file. You may create new data files, or modify existing ones with Data Editor. The Data Editor window opens automatically when you start a new SPSS session.
Viewer window. The Viewer window displays the statistical results, tables, and charts from the analysis you performed (e.g., descriptive statistics, correlations, plots, charts). A Viewer window opens automatically when you run a procedure that generates output. In the Viewer window, you can edit, move, delete, and copy your results in a Windows Explorer-like environment.
Pivot Table Editor window. Output displayed in pivot tables can be modified in many ways with the Pivot Table Editor. You can edit text, swap data in rows and columns, add color, create multidimensional tables, and selectively hide and show results.
Chart Editor window. You can modify and save high-resolution charts and plots by invoking the Chart Editor for a certain chart (by double-clicking the chart) in a Viewer window. You can change the colors, select different type fonts or sizes, switch the horizontal and vertical axes, rotate 3-D scatterplots, and even change the chart type.
Text Output Editor window. Text output not displayed in pivot tables can be modified with the Text Output Editor. You can edit the output and change font characteristics (type, style, color, size).
Syntax Editor window. You can paste your dialog box selections into a Syntax Editor window, where your selections appear in the form of command syntax. You can then edit the command syntax to utilize special features of SPSS not available through dialog boxes. If you are familiar with SPSS software under other operating systems (e.g., Unix), you can open up a Syntax Editor window and enter SPSS commands exactly as you did under those platforms and execute the job. You can open multiple syntax windows and the contents of a Syntax Editor window may be saved for later use.
If you have more than one open Viewer window, output is routed to the designated Viewer window. If you have more than one open Syntax Editor window, command syntax is pasted into the designated Syntax Editor window. (The Paste feature is explained later.) The designated windows are indicated by an exclamation point (!) in the status bar at the bottom of each SPSS window. You can change the designated window at any time by selecting it (making it active) and clicking the highlighted pushbutton on the toolbar. The active window is the currently selected window, which appears in the foreground. An active window does not become the designated window until you instruct SPSS to make it one (by clicking the icon on the toolbar).
Menus in SPSS for Windows
Many of the tasks you want to perform with SPSS start with menu selections. Each window in SPSS has its own menu bar with menu selections appropriate for that window type. The Data Editor window, for example, has its own menu bar with an associated toolbar.
Most menus are common to all windows; others are found only in certain types of windows. The Pivot Table Editor, Chart Editor, Text Output Editor, and Syntax Editor each have window-specific menus.
Toolbars in SPSS for Windows
Each SPSS window has its own toolbar that provides quick and easy access to common tasks. ToolTips provide a brief description of each tool when you put the mouse pointer on the tool. For example, when the mouse pointer rests on the run icon in the Syntax Editor toolbar, a ToolTip describing that tool appears.
Status Bar in SPSS for Windows
A status bar at the bottom of the SPSS application window indicates the current status of the SPSS processor. If the processor is running a command, it displays the command name and a case counter indicating the current case number being processed. When you first begin an SPSS session, the status bar displays the message Starting SPSS Processor. When SPSS is ready, the message changes to SPSS Processor is ready. The status bar also provides information such as command status, filter status, weight status, and split file status. In a Viewer window, for example, the status bar can show that the current Viewer window is the designated output window and that SPSS is ready to run.
The examples given in this document are for working with SPSS for Windows. This document is not intended to replace vendor-supplied SPSS documents. It is assumed that the reader is familiar with the fundamentals of Windows 95 / NT. It is also assumed that the reader has basic statistical knowledge. The term SPSS refers to the software or command language, since the basic command structure is the same across all platforms for SPSS products.
Organizing your data for analysis
Suppose you have three test scores collected from a class of 10 students (5 males and 5 females) during a semester. Each student was assigned an identification number. For each student, you have an identification number, the student's gender, and scores for test one, test two, and test three (the full data set is displayed toward the end of this section). Your first task is to present the data in a form acceptable to SPSS for processing.
SPSS uses data organized in rows and columns. Cases are represented in rows and variables are represented in columns.
A case contains information for one unit of analysis (e.g., a person, an animal, a machine). Variables are information collected for each case, such as name, score, age, income, educational level. In the above chart, there are two cases and four variables. When data in files are arranged in rows and columns (as shown above), it is called cases-by-variables, or rectangular data files.
In SPSS, variables are named with eight or fewer characters. They must begin with a letter, although the remaining characters can be any letter, any digit, or certain symbols (@, #, _, or $). Blanks and special characters such as &, !, ?, ', and * cannot be used in a variable name. Variable names are not case sensitive. The following reserved keywords cannot be used as variable names:
ALL NE EQ TO LE LT BY OR GT AND NOT GE WITH
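The naming rules above can be checked mechanically. The following is a minimal sketch in Python (not part of SPSS), assuming only the rules stated in this section: eight or fewer characters, a leading letter, the listed symbols, and the reserved keywords. The function name is_valid_name is hypothetical and chosen for illustration.

```python
import re

# Reserved keywords listed in the text; they cannot be used as variable names.
RESERVED = {"ALL", "NE", "EQ", "TO", "LE", "LT", "BY", "OR", "GT",
            "AND", "NOT", "GE", "WITH"}

# A letter first, then up to seven more letters, digits, or @ # _ $.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9@#_$]{0,7}")


def is_valid_name(name: str) -> bool:
    """Return True if `name` follows the naming rules described above."""
    if NAME_PATTERN.fullmatch(name) is None:
        return False
    # Names are not case sensitive, so compare in upper case.
    return name.upper() not in RESERVED
```

Under these rules, test1 and id are accepted, while 1test (leading digit), with (reserved keyword), and a&b (special character) are rejected.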
Most variables are either numeric (e.g., 12, 93.23) or character/string/alphanumeric (e.g., F, f, john) in type. The maximum width for numeric variables is 40 characters, and the maximum number of decimal positions is 16. String variables with a defined width of eight or fewer characters are short strings; those with more than eight characters (up to 255 characters) are long strings. Short string variables can be used in many SPSS procedures. You may leave a blank for any missing numeric values or enter a user-missing value (e.g., 9, 999). However, for string values a blank is considered a valid value. You may choose to enter a user-missing value (e.g., x, xxx, na) for missing short string variables, but long string variables cannot have user-missing values.
There is no defined limit to the number of variables that can be contained in an SPSS data file. In tests from a prompted session, SPSS for Windows has processed up to 32000 variables. However, there is a Windows limitation of approximately 4500 variables that can be accessed through dialog boxes. In files with more than 4500 variables, only the first 4500 will appear on dialog box source lists. The number of cases that can be processed through SPSS for Windows depends on the availability of hard disk space on your PC at the time of processing. Refer to the last part of this document for details on hardware requirements for using SPSS for Windows.
Following the conventions above, let us assign names for the variables in our data set: id, sex, test1, test2, and test3. Once the variables are named according to SPSS conventions, it is good practice to prepare a code book with details of the data layout. Following is a code book for the data under discussion. Note that this step is meant to present your data in an organized fashion; it is not mandatory for data analysis. A code book becomes especially handy when dealing with a large number of variables. A short sample data set, like the following, may not need a code book, but one is included for illustration.
var. name   width   columns   var. type   var. labels
id          2       1-2       N           identification no.
sex         1       4         A           student gender (f, m)
test1       2       6-7       N           test one score
test2       2       9-10      N           test two score
test3       2       12-13     N           test three score
In the above code book, width stands for the number of fields/columns taken by each variable. For example, the value for variable id takes a maximum of two fields since the highest identification number in our example is going to be 10. The value for variable sex takes a maximum of one field, and so on. Columns stands for the column number/s or location on a given line where the value for each variable can be found by SPSS. Var. type indicates whether a given variable is numeric (N) or alphanumeric (A). In our example, sex is the only alphanumeric variable coded as f for female, m for male.
The next issue is entering your data into the computer. There are several options. You may create a data file using one of your favorite text editors or word processing packages (e.g., WordPerfect, MS-Word). Files created using word processing software should be saved in text format before you try to read them into an SPSS session. You may enter your data into a spreadsheet or database package (e.g., Lotus 123, Excel, dBASE) and read it directly into SPSS for Windows. Finally, you may enter the data directly into the spreadsheet-like Data Editor of SPSS for Windows. In this document we are going to examine two of the above data entry methods: using a text editor/word processor, and using the Data Editor of SPSS for Windows.
Using an editor/word processor to enter data
Let us first look into the steps for using a text editor or word processor for entering data. Note that if you have a data set with a limited number of variables, you may want to use the SPSS Data Editor to enter your data. However, this example is for illustration purposes. Open up your editor session, or word processing session, and enter the variable values into appropriate columns as outlined in the code book. If you are using a word processor, make sure to save your data in text format. Your completed data file will appear as follows. (Note: The first line is included as a column marker line and is not part of the data. It must be removed before saving or using the data for analysis.)
12345678901234567890
01 f 83 85 91
02 f 65 72 68
03 f 90 94 90
04 f 87 80 82
05 f 78 86 80
06 m 60 74 64
07 m 88 96 92
08 m 84 79 82
09 m 90 87 93
10 m 76 73 70
Save the data as a text file named grade.dat on a floppy disk or on the hard drive. If you are saving the file on a time-sharing computer, you need to download the file to the local computer.
Notice in the above data layout that one blank space is left after each variable, as specified in the code book. Leaving a space between variable values is optional. For example, you may choose to enter the data as follows:
01f838591
02f657268
03f909490
04f878082
05f788680
06m607464
07m889692
Whichever style (format) you choose, as long as you convey the format correctly to SPSS, it should not have any impact on the analysis. In the above layout, each case/observation has only one line (record) of data. In another situation you may have multiple records per observation.
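To see why the layout does not matter as long as the column positions are conveyed correctly, consider the following minimal sketch in Python (not SPSS syntax). It cuts one record into variables by column position, once for the spaced layout from the code book and once for the packed layout. The function read_fixed and the two layout lists are hypothetical names introduced for illustration, and the packed column positions are inferred from the packed example above.

```python
# Column layouts as (name, first column, last column, type), 1-based and
# inclusive, as in the code book. "N" is numeric, "A" is alphanumeric.
SPACED_LAYOUT = [("id", 1, 2, "N"), ("sex", 4, 4, "A"),
                 ("test1", 6, 7, "N"), ("test2", 9, 10, "N"),
                 ("test3", 12, 13, "N")]
PACKED_LAYOUT = [("id", 1, 2, "N"), ("sex", 3, 3, "A"),
                 ("test1", 4, 5, "N"), ("test2", 6, 7, "N"),
                 ("test3", 8, 9, "N")]


def read_fixed(line, layout):
    """Cut one record into variables using the given column positions."""
    case = {}
    for name, first, last, var_type in layout:
        field = line[first - 1:last]
        case[name] = int(field) if var_type == "N" else field
    return case


spaced = read_fixed("01 f 83 85 91", SPACED_LAYOUT)
packed = read_fixed("01f838591", PACKED_LAYOUT)
```

Both calls yield the same case: the only thing that changes between the two styles is the column specification, not the values that are read.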
Using the SPSS Data Editor for entering data
Suppose you want to use the SPSS for Windows features for data entry. In that case, you enter data directly into the SPSS spreadsheet-like Data Editor. This is convenient if you have only a small number of variables. The first step is to enter the data into the Data Editor window by opening an SPSS for Windows session. You will define your variables, variable type (e.g., numeric, alphanumeric), number of decimal places, and any other necessary attributes while you are entering the data. In this mode of data entry, you must define each variable in the Data Editor. You cannot define a group of variables (e.g., Q1 to Q10) using the Data Editor. To define a group of variables, without individually specifying them, you would use the Syntax window.
Let us start an SPSS for Windows session to enter the above data set. If you are using your own PC, start Windows and launch SPSS. If you are using a PC in a UITS Student Technology Center:
This opens the SPSS Data Editor window (titled Untitled). The Data Editor window contains the menu bar, which you use to open files, choose statistical procedures, create graphs, etc. When you start an SPSS session, the Data Editor window always opens first.
You are ready to enter your data once the Data Editor window appears. The first step is to enter the variable names that will appear as the top row of the data file. When you start the session, the top row of the Data Editor window contains a dimmed var as the title of every column, indicating that no data are present. In our sample data set, discussed above, there are five variables named earlier as id, sex, test1, test2, and test3. Let us now enter these variable names into the Data Editor.
To define the first variable, id, move the cursor to any cell in the first column and:
SPSS considers all variables as numeric variables by default. Since id is a numeric variable you do not have to redefine the variable type for id. However, you may want to change the current format for decimal places.
The variable id appears as the title of the first column. Now let us define the second variable, sex.
Since the variable sex is a character type variable you want to change the numeric format type to string format. To change the format type:
The Define Variable Type dialog box appears with Numeric selected.
The variable sex appears as the title of the 2nd column. Define the remaining three numeric variables, test1, test2, and test3, the same way the variable id was defined. But, remember to move to the respective columns before defining them.
Now enter the data given below, pressing [Tab] or the right arrow key after each entry. After entering the last variable value for case number one, use the arrow key to move the cursor to the beginning of the next line. Continue the process until all the data are entered.
The data will be saved as an SPSS format file which is readable only by SPSS for Windows. Note that the data file, grade.dat, you saved earlier and the file, sample1.sav, you saved now are in different formats.
Even after saving the data file, the data will still be displayed on your screen. If not, select sample1-SPSS Data Editor from the Window menu.
Creating a Command file to read in your data
In many instances, you may have an external ASCII data file made available to you for analysis, just like the data, grade.dat, we discussed earlier. In such a situation, you do not have to enter your data again into the Data Editor. You can direct SPSS to read the file from the SPSS Syntax Editor window.
Suppose you want to read the file, grade.dat, into SPSS from a Syntax Editor window and create a system file -- as you did with the previous data (sample1.sav) by using the Data Editor. Creating a command file is a faster way to define your variables, especially if you have a large number of variables. You may create a command file using your favorite editor or word processor and then read it into a Syntax Editor window or open a Syntax Editor window and type in the command lines.
To read your already created command file into a Syntax Editor window
In the following example we are opening a new Syntax Editor window to enter the following command lines.
When the Syntax Editor window appears, type:
DATA LIST FILE='A:\GRADE.DAT' FIXED
  / id 1-2 sex 4 (A) test1 6-7 test2 9-10 test3 12-13.
EXECUTE.
SAVE OUTFILE='A:\SAMPLE1.SAV'.
The command file will read the specified variable values from the data file, grade.dat, on drive A, and create a system file, sample1.sav, on drive A. Make sure you specify the pathname appropriately indicating the location of the external data file and where the newly created file is to be written. However, you do not have to save a system file to do the analysis. This means the last line is optional for data analysis. Every time you run the above lines, SPSS does create an active file stored in the computer's memory. However, for large data sets, it will save processing time if you save it as a system file and access it for analysis.
In the above command lines, DATA LIST defines a raw data file by assigning names and formats to each variable in the file. The data can be in fixed format (values for the same variable are always entered in the same location on the same record for each case) or in freefield format (values for consecutive variables are not in particular columns but are entered one after the other, separated by blanks or commas). In our example, we used the fixed format. FIXED is the default if no format is specified. That is, in our example we did not have to use the FIXED keyword, but it is included for the sake of illustration. The only string variable in the data is sex, which is identified with an (A) after the variable name and column location.
We do not have any numeric variables with decimal places. SPSS assumes that decimal points are explicitly coded in the data file. If there are no decimal points, the numeric variables are assumed to be integers. To indicate noninteger values for data that have not been coded with decimal points, specify the implied number of decimal places in parentheses after the variable name and column location as in gpa 16-18 (2). This means the variable gpa is in columns 16-18 and is recorded as, for example, 389, and it will be assigned 3.89 by SPSS.
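The implied-decimal rule amounts to dividing the recorded digits by a power of ten. The following minimal sketch in Python (not SPSS syntax) illustrates the arithmetic for the gpa example. The function read_numeric is a hypothetical name, and the sketch assumes that a decimal point coded in the data takes precedence over the implied number of decimal places.

```python
def read_numeric(field: str, implied_decimals: int = 0) -> float:
    """Convert a fixed-column field to a number.

    Assumption of this sketch: a decimal point coded in the data takes
    precedence over the implied number of decimal places.
    """
    text = field.strip()
    if "." in text:
        return float(text)
    # No coded decimal point: shift the digits by the implied places.
    return int(text) / 10 ** implied_decimals


gpa = read_numeric("389", implied_decimals=2)
```

Here the field 389 with two implied decimal places is read as 3.89, while a field with no implied decimals, such as 83, is read as a whole number.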
In the above example your data are being read from an external file, grade.dat, on drive A. Still another option is to keep the data within the command file. In such an instance you direct SPSS to read your inline data from the command lines with the BEGIN DATA and END DATA commands. In this mode of data input you omit the FILE subcommand from the DATA LIST command. The BEGIN DATA command follows the DATA LIST command, and the END DATA command follows the last line of data. All procedure commands should come after the END DATA command, but transformation commands can be specified before BEGIN DATA. For example, if you want to read the above data file, grade.dat, as inline data, you should modify the above command lines as follows:
DATA LIST FREE / id * sex (a) test1 test2 test3.
BEGIN DATA.
01 f 83 85 91
02 f 65 72 68
03 f 90 94 90
04 f 87 80 82
05 f 78 86 80
06 m 60 74 64
07 m 88 96 92
08 m 84 79 82
09 m 90 87 93
10 m 76 73 70
END DATA.
EXECUTE.
SAVE OUTFILE='A:\SAMPLE1.SAV'.
In this example, we used a FREE format data layout for illustration. Each variable value is separated by a blank space. Since we are using free format, the column specification after each variable is dropped. Note that the variable sex is a one-character string variable. In freefield format, when you specify a string format, that format applies to all preceding variables. This means SPSS would read both id and sex with the string format. To avoid this, place an asterisk (*) after the variable id to convey that id must be read with the default numeric format. FIXED format can also be used with inline data. You may type the above lines into a Syntax Editor window, or read a text file with inline data into a Syntax Editor window and execute it as explained above. Keeping data inline may not be an efficient option when you have a large number of data lines.
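The idea behind freefield reading can be sketched in a few lines of Python (not SPSS syntax): values are taken one after the other, separated by blanks or commas, and assigned to the variables in order. The names read_free, VARIABLES, and STRING_VARIABLES are hypothetical, and the comma-separated third record is a variation on the sample data added only to show the second separator. This is an illustration of the concept, not a description of how SPSS is implemented.

```python
import re

# Variable names in order; the set marks which ones are strings, like (a).
VARIABLES = ["id", "sex", "test1", "test2", "test3"]
STRING_VARIABLES = {"sex"}

INLINE_DATA = """\
01 f 83 85 91
02 f 65 72 68
03,f,90,94,90
"""


def read_free(text, variables, string_variables):
    """Assign blank- or comma-separated values to variables in order."""
    values = [v for v in re.split(r"[\s,]+", text) if v]
    cases = []
    for start in range(0, len(values), len(variables)):
        chunk = values[start:start + len(variables)]
        case = {}
        for name, value in zip(variables, chunk):
            # String variables keep their text; all others become numbers.
            case[name] = value if name in string_variables else float(value)
        cases.append(case)
    return cases


cases = read_free(INLINE_DATA, VARIABLES, STRING_VARIABLES)
```

Because no column positions are involved, the same three cases result whether the values are separated by blanks or by commas.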
In the above sections, we discussed several options for entering data into the computer before data analysis. You may choose any of them for your data entry.
Selecting to display variable name or label within variable list boxes in dialogs
Within variable list boxes in dialogs, you can display either the variable name or the entire variable label (up to 256 characters).
Suppose that you have the data set, sample1.sav, still displayed on your screen. If not, select sample1-SPSS Data Editor from the Window menu. (If you used the Syntax Editor window to read in your data file, grade.dat, then also make the sample1-SPSS Data Editor window active.)
The next step is to run some basic statistical analysis with the data you entered. The commands you use to perform statistical analysis are developed by simply pointing and clicking the mouse to appropriate menu options. This frees you from typing in your command lines.
However, you may paste the command selections you made to a Syntax Editor window. The command lines you paste to the Syntax Editor window may be edited and used for subsequent analysis, or saved for later use. Use the Paste pushbutton to paste your dialog box selections into a Syntax Editor window. If you don't have an open Syntax Editor window, one opens automatically the first time you paste from a dialog box. Click the Paste button only if you want to view the command lines you generated. Once you click the Paste pushbutton the dialog selections are pasted to the Syntax Editor window, and this window becomes active. To execute the pasted command lines, highlight them and click run. You can always get back to the Data Editor window by selecting sample1-SPSS Data Editor from the Window menu.
Before computing the descriptive statistics, we want to calculate the mean score from the three tests for each student. To compute the mean score:
A new column titled average will be displayed in the Data Editor window with the values of the mean score for each case. The number of decimal places in a newly created variable can be tailored by selecting Edit/Options/Data/Display format for new numeric variables prior to creating new variables. This display format setting affects the formats of all new subsequent numeric variables.
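The computation itself is simple arithmetic: for each case, add the three test scores and divide by the number of scores. The following minimal sketch in Python (not SPSS syntax) shows it for the first two students in the sample data. Skipping missing scores, represented here by None, is an assumption of this sketch about how a mean over several scores should treat missing values.

```python
from statistics import mean

# Test scores for the first two students in the sample data.
cases = [
    {"id": 1, "test1": 83, "test2": 85, "test3": 91},
    {"id": 2, "test1": 65, "test2": 72, "test3": 68},
]

for case in cases:
    scores = [case["test1"], case["test2"], case["test3"]]
    # Skip missing scores (None here) and average the rest.
    valid = [s for s in scores if s is not None]
    case["average"] = mean(valid) if valid else None
```

For the first student this gives (83 + 85 + 91) / 3, or about 86.33, which is the value you should see in the average column for case 1.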
Now that we have computed a new variable, average, let us run a few descriptive statistics on the variables in the data set. Of the variables in our data set, sex is a categorical variable, and test1, test2, test3, and average are continuous variables. We want to run a FREQUENCIES procedure for the categorical variable, and a DESCRIPTIVES procedure for the continuous variables. The FREQUENCIES procedure produces tables of frequency counts and percentages of the values of individual variables. It is also used to obtain frequencies and graphical displays (e.g., bar charts) for categorical variables and to obtain statistics (e.g., median, percentiles) and graphical displays (e.g., histograms) for continuous variables. The DESCRIPTIVES procedure computes univariate statistics (e.g., mean, standard deviation, minimum, maximum) for numeric variables. Because it does not sort values into a frequency table, DESCRIPTIVES is an efficient way of computing descriptive statistics. Other procedures that display descriptive statistics include MEANS, FREQUENCIES, and EXAMINE.
To run the FREQUENCIES procedure:
Now the selected variable appears in a box on the right and disappears from the left box. Note that when a variable is highlighted in the left box, the arrow button is pointed right for you to complete the selection. When a variable is highlighted in the right box, the arrow button is pointed left to enable you to deselect a variable (by clicking the button) if necessary. If you need additional statistics besides the frequency count, click the Statistics... button at the bottom of the screen. When the Statistics... dialog box appears, make appropriate selections and click Continue. In this instance, we are interested only in frequency counts.
The output appears on the Viewer screen.
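A frequency table is just a count of each distinct value together with its share of all cases. As a rough check on the output, the following minimal sketch in Python (not SPSS syntax) builds such a table for sex from the sample data. The name frequency_table is hypothetical.

```python
from collections import Counter

# Values of sex for the ten students in the sample data.
sex = ["f", "f", "f", "f", "f", "m", "m", "m", "m", "m"]

counts = Counter(sex)
total = len(sex)
# Each row: (value, frequency, percent of all cases).
frequency_table = [(value, n, 100 * n / total)
                   for value, n in sorted(counts.items())]
```

With five females and five males, each value has a frequency of 5 and accounts for 50 percent of the cases.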
Our next task is to run the DESCRIPTIVES procedure on the four continuous variables in the data set.
A dialog box appears. Names of all the numeric variables in the data set appear on the left side of the dialog box.
Now the selected variables appear in the box on the right and disappear from the box on the left.
The mean, standard deviation, minimum, and maximum are displayed by default. The variables are displayed, by default, in the order in which you selected them. Click Options... for other statistics and display order.
The output will be displayed on the Viewer screen.
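To make the default statistics concrete, the following minimal sketch in Python (not SPSS syntax) computes them for test1 from the sample data. It assumes the standard deviation is the sample standard deviation (n - 1 in the denominator), which is what Python's statistics.stdev computes. The name summary is hypothetical.

```python
from statistics import mean, stdev

# test1 scores for the ten students in the sample data.
test1 = [83, 65, 90, 87, 78, 60, 88, 84, 90, 76]

summary = {
    "n": len(test1),
    "mean": mean(test1),
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    "std_dev": stdev(test1),
    "minimum": min(test1),
    "maximum": max(test1),
}
```

For test1 the mean is 80.1, the minimum is 60, and the maximum is 90; you can compare these against the corresponding row of the Viewer output.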
Suppose you want to obtain the above results for males and females separately. The MEANS procedure displays means, standard deviations, and group counts for dependent variables based on grouping variables. In our data set sex is the grouping variable and test1, test2, test3, and average are the dependent variables. Before proceeding with the analysis let us clear the output window to better view the output from the MEANS procedure and later print it out. To do so, make the Viewer window active (select Output1 - SPSS Viewer from the Window menu).
To run the MEANS procedure:
The output will be displayed on the Viewer screen.
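Conceptually, a table of means by group splits the cases by the grouping variable and then summarizes each group separately. The following minimal sketch in Python (not SPSS syntax) does this for test1 by sex, using the sample data. The names rows, groups, and by_sex are hypothetical, and the standard deviation is again assumed to be the sample standard deviation.

```python
from collections import defaultdict
from statistics import mean, stdev

# (sex, test1) pairs from the sample data.
rows = [("f", 83), ("f", 65), ("f", 90), ("f", 87), ("f", 78),
        ("m", 60), ("m", 88), ("m", 84), ("m", 90), ("m", 76)]

# Collect the scores of each group under its value of sex.
groups = defaultdict(list)
for sex, score in rows:
    groups[sex].append(score)

# One row per group: mean, count, and standard deviation.
by_sex = {sex: {"mean": mean(scores), "count": len(scores),
                "std_dev": stdev(scores)}
          for sex, scores in groups.items()}
```

For test1, the mean is 80.6 for the five females and 79.6 for the five males.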
There may be other situations in which you want to select a specific category of cases from a grouping variable (e.g., ethnic background, socio-economic status, education). To do so, choose Data/Select Cases... to select the cases you want and do the analysis (e.g., from the grouping variable educate, select cases without a college degree). However, make sure you reset your data if you want to include all the cases for subsequent data analysis. If not, only the selected cases will appear in subsequent analysis. To reset your data choose Data/Select Cases.../All Cases, and click OK.
Working with output
When you run a procedure in SPSS, the results are displayed in the Viewer window in the order in which the procedures were run. In this window, you can easily navigate to whichever part of output you want to see. You can also manipulate the output and create a document that contains precisely the output you want, arranged and formatted appropriately. You can use the Viewer to:
The Viewer is divided into two panes. The left pane contains an outline view of the output contents. The right pane contains statistical tables, charts, and text output. You can use the scroll bars to browse the results, or you can click an item in the outline to go directly to the corresponding table or chart.
Suppose you want to copy the Descriptives table into another Windows application, such as a word processing program or a spreadsheet.
Manipulating pivot tables
Much of the output in SPSS is presented in tables that can be pivoted interactively. You can rearrange the rows, columns, and layers. To edit a pivot table, double-click it to activate the Pivot Table Editor. Alternatively, click the right mouse button on the pivot table and choose SPSS Pivot Table Object/Open from the context menu; the pivot table will then be ready to edit in its own separate Pivot Table Editor window. The second method is especially useful for viewing and editing a wide and long table that otherwise cannot be viewed at full scale.
Printing the output
Once you are satisfied with your analysis you may want to obtain a hard copy of the output. You may print the entire output on the Viewer window, or delete the sections you do not want before you print. Or you can save the output to a diskette or hard drive and print it later. In this case, let us print the entire output which is on the Viewer window. It is assumed that there is a printer attached to your PC or you are working from a Student Technology Center. Make sure that you are at the Viewer window by selecting Output1-SPSS Viewer from the Window menu. If you open multiple Viewer windows, select the output you want to print:
The contents of the output window will be directed to the printer. To save paper, choose the Infinite option for Length. You can also control the page width by changing Width. For some procedures, however, some statistics are displayed only in wide format.
Reviewing what you have done
In SPSS for Windows most tasks can be accomplished simply by pointing and clicking the mouse. The data file is automatically read from the Data Editor window. So far, you have used the menu-driven dialog box interface for statistical analysis. There is little opportunity to examine what goes on behind the scenes. If you want to understand how the SPSS command language works, paste the commands you've used to a Syntax Editor window and examine them.
Below are the SPSS command lines used in our analysis. The command lines were pasted to the Syntax Editor window. (Note that the lines given below were edited to choose the most relevant lines for our discussion.)
COMPUTE average = mean (test1,test2,test3).
EXECUTE.
FREQUENCIES VARIABLES=sex.
DESCRIPTIVES VARIABLES=average test1 test2 test3
  /STATISTICS=MEAN STDDEV MIN MAX.
MEANS TABLES=average test1 test2 test3 BY sex
  /CELLS MEAN COUNT STDDEV.
A quick glance at the command file reveals that SPSS commands begin in column 1 with a command keyword and continue on as many lines as needed. Continuation lines are indented at least one column. Each command keyword is followed by command specifications separated by at least one space. Some specifications form subcommands. In some cases, a slash (/) is required to separate subcommands. Each SPSS command should be terminated by a period (.). SPSS commands may be entered in upper or lower case. However, you might notice that all keywords and specifications pasted from the dialog box, except for the variable names, are in upper case. The variable names appear in lower case because that is how they were entered into the Data Editor. SPSS translates all keywords and names to upper case before processing, but it preserves lower and upper case within labels and strings.
The first command we executed was the COMPUTE command. This creates new numeric variables or modifies the values of existing string or numeric variables. In our example, we created a new variable average which was the mean of the three test scores. The EXECUTE command forces the data to be read and executes the transformations that precede it in the command sequence. EXECUTE is designed for use with transformation commands and other SPSS commands such as ADD FILES, MATCH FILES, WRITE, etc., which do not read data and are not executed unless followed by a data-reading procedure.
The next command is the FREQUENCIES command. The VARIABLES specification is followed by the name of the selected variable. The DESCRIPTIVES and MEANS commands, in our example, have subcommands separated by a slash. Subcommands are used for additional specifications in locating data, handling data, and formatting the output display.
Optional command lines
We could have added a few optional command lines to the example above to make it more readable. The VARIABLE LABELS command assigns extended descriptive labels to variables. Since variable names are limited to eight characters, VARIABLE LABELS is useful for documentation purposes. The basic specification is a variable name followed by the associated label in apostrophes or quotes. A variable label can be up to 256 characters long, although most procedures print fewer characters. The syntax for the command is:
VARIABLE LABELS test1 'score for test1'.
Another optional command is the VALUE LABELS command, which assigns value labels for categories in a variable. The basic specification is a variable name and the individual values with their assigned labels. The syntax for the command is:
VALUE LABELS sex 'f' 'female' 'm' 'male'/
  q1 to q5 1 'agree' 2 'undecided' 3 'disagree'.
Each value label can be up to 60 characters long, although most procedures do not display that many characters. The value labels for each variable are separated by a slash (/). Value labels are not available for long string variables (longer than 8 characters).
During a Windows session, variable labels and value labels can also be assigned from the Define Variable dialog box discussed earlier in this document. From the Define Variable dialog box, click the Labels pushbutton and enter the variable label and value labels.
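Whichever route you use, it can be worth confirming that the labels were stored. The following is a minimal sketch that assumes the variables sex and test1 from our first example, with sex coded as 'f' and 'm'. DISPLAY DICTIONARY lists the dictionary information for the named variables.

```spss
VARIABLE LABELS test1 'score for test1'.
VALUE LABELS sex 'f' 'female' 'm' 'male'.
* List the dictionary entries to confirm that the labels were stored.
DISPLAY DICTIONARY /VARIABLES=sex test1.
```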
Using SPSS for Windows for further data analysis
So far, we've used SPSS to develop a basic idea of how SPSS for Windows works. The next step is to examine a few other data analysis techniques (CORRELATIONS, REGRESSION, T-TEST, ANOVA). All the statistical procedures available under a mini or mainframe version of SPSS are available from SPSS for Windows. Refer to the vendor documentation for the most complete information.
Now we will turn to another data set with more variables and cases. In this example, you will read into the SPSS session an ASCII data file, clas.dat, that was created with a word processor and saved as a text file. The data, collected from 40 middle school students, contain 26 variables including the following:
The first four variables (id, sex, exp, school) are background variables. The variable sex has two levels (M=male, F=female). Exp (prior computer experience) has three levels (1=less than one year, 2=1-2 years, 3=more than 2 years), and school (type of school system) has three levels (1=rural, 2=city, 3=suburban, matching the value labels in the command file below). The next 20 variables (C1..C10, M1..M10) are Likert-type responses to computer opinion and math anxiety surveys. The remaining variables (mathscor, compscor) are scores on the math test and computer test.
Creating a program to read the data file
Let us assume that the data file, clas.dat, is on drive A. At this point, the fastest way to read these data into SPSS for Windows is to use the Syntax window. You can do this in either of two ways. You may open a Syntax Editor window (File/New/Syntax) and type in the following lines. Alternatively, you may create a command file containing the following lines with a word processor or editor and then read it into the Syntax Editor window (choose File/Open and select Syntax(*.sps) as the file type in the Open File dialog box). Suppose the following command lines are stored (in text format) in a file, clas.sps, on drive A.
DATA LIST FILE='a:\clas.dat'
  /id 1-2 sex 3 (A) exp 4 school 5 c1 to c10 6-15
   m1 to m10 16-25 mathscor 26-27 compscor 28-29.
MISSING VALUES mathscor compscor (99).
RECODE c3 c5 c6 c10 m3 m7 m8 m9 (1=5) (2=4) (3=3) (4=2) (5=1).
RECODE sex ('M'=1) ('F'=2) INTO nsex. /* char var into numeric var
COMPUTE compopi=SUM (c1 TO c10). /* find sum of 10 items using sum function
COMPUTE mathatti=m1+m2+m3+m4+m5+m6+m7+m8+m9+m10. /* adding each item
VARIABLE LABELS id 'Student Identification'
  sex 'Student Gender'
  exp 'Yrs of Comp Experience'
  school 'School Representing'
  mathscor 'Score in Mathematics'
  compscor 'Score in Computer Science'
  compopi 'Total for Comp Survey'
  mathatti 'Total for Math Atti Scale'.
VALUE LABELS sex 'M' 'Male' 'F' 'Female'/
  exp 1 'Up to 1 yr' 2 '2 years' 3 '3 or more'/
  school 1 'Rural' 2 'City' 3 'Suburban'/
  c1 TO c10 1 'Strongly Disagree' 2 'Disagree' 3 'Undecided' 4 'Agree' 5 'Strongly Agree'/
  m1 TO m10 1 'Strongly Disagree' 2 'Disagree' 3 'Undecided' 4 'Agree' 5 'Strongly Agree'/
  nsex 1 'Male' 2 'Female'.
EXECUTE.
Use the mouse to highlight the command lines and click Run. The command lines will be executed and an active SPSS file will be created. Select Window/Untitled - SPSS Data Editor to see the data file you just read in. Save the data file as an SPSS system file to drive A or to a hard drive.
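The same checks and the save step can also be written as commands. The lines below are a sketch only: the variable names come from the command file above, while the output file name clas.sav is a hypothetical choice, not one given in this document.

```spss
* Print the first five cases to confirm the columns were read as intended.
LIST VARIABLES=id sex nsex exp school mathscor compscor
  /CASES=FROM 1 TO 5.
* Check the recoded gender variable against the original.
CROSSTABS /TABLES=sex BY nsex.
* Save the active file as an SPSS system file under a hypothetical name.
SAVE OUTFILE='a:\clas.sav'.
```

If the column positions in DATA LIST were wrong, the listing of the first few cases is usually where the problem shows up first.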
A correlation analysis is performed to quantify the strength of association between two numeric variables. In the following task we will perform Pearson correlation analysis. The variables used in the analysis are compopi, mathatti, mathscor and compscor.
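For readers who prefer the Syntax Editor, the analysis can be requested with the CORRELATIONS command. The sketch below assumes the four variables created by the command file above. The PRINT and MISSING settings shown are common choices and are an assumption here, not necessarily the ones used for the output in this document.

```spss
* Pearson correlations among the four summary variables.
CORRELATIONS
  /VARIABLES=compopi mathatti mathscor compscor
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
```

PAIRWISE uses every case that has valid values on both variables in a pair, so the number of cases can differ from cell to cell, which is why the matrix reports it.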
A symmetric matrix with Pearson correlation as given below will be displayed on the screen. Along with Pearson r, the number of cases and probability values are also displayed.
Simple Linear Regression
A correlation coefficient tells you that some sort of relation exists between the variables, but it does not tell you much more than that. For example, a correlation of 1.0 means that all points fall exactly on a straight line, but it says nothing about the form of the relation between the variables. When the observations are not perfectly correlated, many different lines may be drawn through the data. To select the line that lies as close as possible to the points, you use regression analysis, which is based on the least-squares principle. In the following task you will perform a simple regression analysis with compscor as the dependent variable and mathscor as the independent variable.
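In command form, this analysis might look like the sketch below. The variable names are those defined earlier. The choice of statistics is an assumption made for illustration and may not match the dialog box settings behind the output in this document.

```spss
* Simple linear regression of compscor on mathscor.
REGRESSION
  /STATISTICS COEFF R ANOVA
  /DEPENDENT compscor
  /METHOD=ENTER mathscor.
```

COEFF requests the intercept and slope of the least-squares line, R the model summary, and ANOVA the overall test of the regression.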
The output will now be displayed on the screen as shown below:
The t-test is a data analysis procedure for testing the hypothesis that two population means are equal. SPSS can compute independent (not related) and dependent (related) t-tests. For independent t-tests, you must have a grouping variable with exactly two values (e.g., male and female, pass and fail). The variable may be either numeric or character. Suppose you have a grouping variable with more than two categories. You may use the RECODE (Transform/Recode) command to collapse the categories into two groups. For example, a variable, exp, has 3 categories. You want to collapse this into two categories (1 = < 1 yr. exp, 2 = one or more yrs.) and create a new variable, newexp. The syntax is:
recode exp (1 = 1) (2,3 = 2) into newexp.
execute.
RECODE is a powerful SPSS command for data transformation with both numeric and string variables.
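After a recode it is good practice to label the new variable and confirm that every old value landed in the intended group. The following is a minimal sketch. The label texts are illustrative wording, not taken from the original data documentation.

```spss
* Collapse three experience levels into two, keeping the original variable.
RECODE exp (1=1) (2,3=2) INTO newexp.
VALUE LABELS newexp 1 'Less than 1 yr' 2 '1 yr or more'.
* Cross-tabulate old against new to verify the recoding.
CROSSTABS /TABLES=exp BY newexp.
```

Because CROSSTABS is a procedure, it reads the data and carries out the pending RECODE, so no separate EXECUTE is needed in this sketch.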
In the following task, we will perform an independent t-test. The test variables are mathscor and compscor, and the grouping variable is newexp.
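The equivalent command is T-TEST with the GROUPS subcommand. The sketch below assumes that newexp has been created by the RECODE command shown earlier. The two values in parentheses identify the groups to compare.

```spss
* Independent-samples t-test comparing the two experience groups.
T-TEST GROUPS=newexp(1 2)
  /VARIABLES=mathscor compscor
  /MISSING=ANALYSIS.
```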
The output will now be displayed on the screen as shown below:
A t-test with two related variables is performed using the Paired-Samples T-Test from the Statistics/Compare Means menu. The paired T-test is applicable for data collected in a pre-post (before and after) kind of situation.
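In command form, a paired test uses the PAIRS subcommand. The clas.dat data set has no pre-post measurements, so the variable names pretest and posttest below are hypothetical and serve only to show the form of the command.

```spss
* Paired-samples t-test on two hypothetical variables, pretest and posttest.
T-TEST PAIRS=pretest WITH posttest (PAIRED)
  /MISSING=ANALYSIS.
```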
Analysis of Variance
The statistical technique used to test the null hypothesis that several population means are equal is called analysis of variance. It is called that because it examines the variability in the sample and, based on that variability, determines whether there is reason to believe the population means are not equal. The statistical test for the null hypothesis that all of the groups have the same mean in the population is based on the ratio of the between-group variability estimate to the within-group variability estimate, called the F statistic. A significant F value only tells you that the population means are probably not all equal. It does not tell you which pairs of groups appear to have different means. To pinpoint exactly where the differences are, multiple comparisons may be performed.
In the following exercise you will perform a One-Way ANOVA with compopi as the dependent variable, and exp as the factor variable.
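The same analysis can be run with the ONEWAY command. In the sketch below, the Tukey test on the POSTHOC subcommand is an assumption chosen for illustration. This document does not say which multiple comparison procedure was selected in the dialog box.

```spss
* One-way ANOVA of compopi across the three experience levels.
ONEWAY compopi BY exp
  /STATISTICS DESCRIPTIVES
  /MISSING ANALYSIS
  /POSTHOC=TUKEY ALPHA(.05).
```

The POSTHOC subcommand adds pairwise comparisons to the output, which is how you find out which group means differ once the overall F is significant.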
The output will be displayed on the screen as shown below:
Post Hoc Tests
SPSS users under other operating systems
If you are familiar with SPSS under other operating systems, working with SPSS for Windows will be an easy transition. SPSS for Windows is basically the same as the SPSS products on Unix and mainframe computers. A job that ran successfully on a mainframe computer will run under SPSS for Windows with one point to watch: in SPSS for Windows, every command must be terminated by a period, whereas on mainframe computers the period is optional. By opening a Syntax Editor window, you may type in commands just as you would at the command prompt on a time-sharing computer. You may also read an existing command file into a Syntax Editor window. An SPSS system file created under another operating system must be converted to export format before it can be read into SPSS for Windows.
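The conversion mentioned above is done with the EXPORT and IMPORT commands. The lines below are a sketch under stated assumptions: the file names clas.por and clas.sav are hypothetical, and the portable file is assumed to have been transferred to drive A before the IMPORT step.

```spss
* On the original system, write the active file in export format.
EXPORT OUTFILE='clas.por'.

* In SPSS for Windows, read the export file and save it as a system file.
IMPORT FILE='a:\clas.por'.
SAVE OUTFILE='a:\clas.sav'.
```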
Working memory allocation
Working memory is allocated as needed during the execution of most commands in SPSS for Windows. However, there are a few procedures that take all of the available workspace at the beginning of execution. Among the procedures that may require all of the available workspace during execution are Frequencies, Crosstabs, Means, and Nonparametric Tests. If you get a message stating that you should change the workspace allocation, increase the special memory workspace limit. The default amount of working memory is 1024K. You can change this setting as follows:
To decide on a new value, use the information that is displayed in the output window before the out-of-memory message. After you are finished with the procedure, you should probably reduce the limit to its default amount, since an increased workspace allocation may reduce performance under certain circumstances.
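In releases that support it, the workspace can also be inspected and changed from the Syntax Editor with the SHOW and SET commands. The value 2048 below is an arbitrary illustration, not a recommendation. Choose a value based on the message in your own output.

```spss
* Display the current workspace allocation, in kilobytes.
SHOW WORKSPACE.
* Raise the limit before rerunning the procedure that ran out of memory.
SET WORKSPACE=2048.
```

Once the procedure has finished, another SET WORKSPACE command with the value reported earlier by SHOW returns the limit to where it was.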
Hardware and software requirements
The material covered in this document illustrates some of the basic features of SPSS for Windows. Examining additional features of SPSS for Windows is beyond the scope of this document. For further help, refer to SPSS for Windows documents.
SPSS for Windows documents
The basic documents for SPSS for Windows are:
Documentation for other add-on modules (e.g., Tables, Trends, Categories, Lisrel) is also available. Documents may be purchased through the local bookstore or ordered directly from Prentice Hall (800/374-1200). Those who wish to contact SPSS, Inc. directly for information on SPSS for Windows may call 800/543-2185.