Test Construction, Item Analysis, Reliability, & Validity Exercise

Background Information

The data are in a file named PSY621.XLS. To save it to your local machine, right-click the file and choose the "Save Target As" option.

 

These data come from a group of nearly 500 subjects who completed a 122-item survey, which is available here.  The survey actually comprises four questionnaires:

1. Perceptual Aberration (PA): taps body image aberrations and distortions.

2. Magical Ideation (MI): taps "magical thinking" and other beliefs that would not be endorsed by many people in this culture.

3. Impulsive Nonconformity (NC): taps impulsivity with special disregard for social norms.

4. Infrequency (IF): a six-item scale designed to identify careless responders and malingerers.

 

If the item was endorsed "true", the code is 1; if "false", the code is 0.  Missing data (omitted items) are indicated by spaces.  You will need to decide how to handle missing data.

 

Most items are keyed "True"; that is, endorsing true gives the person one point toward the total score, and endorsing false does not.  Some items, however, are keyed "False": endorsing false gives the person one point toward the total score, and endorsing true does not.  The following items are keyed false and therefore must be recoded: 4, 6, 14, 26, 27, 28, 35, 38, 40, 46, 48, 49, 52, 53, 54, 56, 58, 59, 76, 84, 101, 103, 104, 106, 112.
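Recoding a false-keyed item amounts to replacing the raw code x with 1 - x. The following is a minimal sketch in Python, assuming each respondent's raw responses are held in a dict keyed by item number, with None standing for an omitted item. The function names and the data layout are illustrative assumptions, not part of the assignment.

```python
# Items keyed "False", copied from the list above.
FALSE_KEYED = {4, 6, 14, 26, 27, 28, 35, 38, 40, 46, 48, 49, 52, 53, 54,
               56, 58, 59, 76, 84, 101, 103, 104, 106, 112}


def score_item(item_number, response):
    """Return the keyed score (0 or 1) for one raw response.

    `response` is 1 (true), 0 (false), or None (omitted). Omitted
    responses are passed through as None so the caller decides how
    to handle missing data.
    """
    if response is None:
        return None
    return 1 - response if item_number in FALSE_KEYED else response


def score_respondent(raw):
    """Key a whole respondent: {item_number: raw response} -> keyed scores."""
    return {item: score_item(item, resp) for item, resp in raw.items()}
```

Passing missing responses through unchanged keeps the recoding step separate from whatever missing-data rule you later adopt.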

 

The following items are on each scale (also indicated on the attached questionnaire):

 

Perceptual Aberration: 2, 3, 5, 7, 9, 13, 16, 18, 22, 23, 26, 29, 31, 34, 37, 39, 41, 44, 47, 48, 52, 55, 59, 60, 66, 70, 71, 77, 94, 96, 98, 102, 108, 109, 118.

 

Magical Ideation: 6, 8, 10, 11, 25, 27, 36, 46, 50, 51, 54, 56, 61, 65, 67, 73, 75, 76, 78, 79, 81, 83, 88, 92, 104, 116, 117, 119, 120, 122.

 

Impulsive Nonconformity: 1, 4, 12, 14, 15, 17, 19, 20, 24, 28, 30, 32, 35, 38, 40, 42, 43, 45, 49, 53, 57, 58, 63, 68, 69, 72, 74, 80, 84, 85, 86, 87, 89, 90, 91, 93, 95, 99, 100, 101, 103, 105, 106, 107, 110, 111, 112, 113, 114, 115, 121.

 

Infrequency: 21, 33, 62, 64, 82, 97.
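The item lists above assign each of the 122 items to exactly one scale. As a hedged sketch, the memberships can be written down once and checked mechanically before any totals are computed. The helper names are illustrative, and the rule of returning no total when any item is omitted is only one possible way to handle missing data, which the assignment leaves to you.

```python
# Scale membership, copied from the item lists above.
SCALES = {
    "PA": [2, 3, 5, 7, 9, 13, 16, 18, 22, 23, 26, 29, 31, 34, 37, 39, 41, 44,
           47, 48, 52, 55, 59, 60, 66, 70, 71, 77, 94, 96, 98, 102, 108, 109,
           118],
    "MI": [6, 8, 10, 11, 25, 27, 36, 46, 50, 51, 54, 56, 61, 65, 67, 73, 75,
           76, 78, 79, 81, 83, 88, 92, 104, 116, 117, 119, 120, 122],
    "NC": [1, 4, 12, 14, 15, 17, 19, 20, 24, 28, 30, 32, 35, 38, 40, 42, 43,
           45, 49, 53, 57, 58, 63, 68, 69, 72, 74, 80, 84, 85, 86, 87, 89, 90,
           91, 93, 95, 99, 100, 101, 103, 105, 106, 107, 110, 111, 112, 113,
           114, 115, 121],
    "IF": [21, 33, 62, 64, 82, 97],
}


def check_partition(scales, n_items=122):
    """True if every item 1..n_items belongs to exactly one scale."""
    assigned = sorted(i for items in scales.values() for i in items)
    return assigned == list(range(1, n_items + 1))


def scale_total(keyed, items):
    """Sum keyed 0/1 scores over the items of one scale.

    `keyed` maps item number -> keyed score, with None for an omitted item.
    ASSUMPTION: returns None when any item on the scale is missing; this is
    one possible missing-data rule, not the required one.
    """
    scores = [keyed.get(i) for i in items]
    if any(s is None for s in scores):
        return None
    return sum(scores)
```

The totals assume the false-keyed items have already been recoded, so that every keyed score of 1 counts toward the scale.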

 

Before starting the assignment, also determine whether you will exclude individuals based on aberrant scores on the Infrequency scale, and justify your decision and approach to exclusion.
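Whatever exclusion rule you settle on, it helps to apply it in one explicit step so that the excluded cases can be reported. A minimal sketch, assuming Infrequency totals are already computed and that higher totals indicate more aberrant responding; the cutoff is a hypothetical parameter that you must choose and justify, and flagging unscorable respondents is likewise an assumption.

```python
def split_by_infrequency(if_totals, cutoff):
    """Partition respondent ids into (keep, drop) by Infrequency total.

    `if_totals` maps respondent id -> Infrequency total, or None when the
    scale could not be scored. `cutoff` is HYPOTHETICAL: respondents with a
    total >= cutoff are dropped. ASSUMPTION: unscorable respondents are
    dropped as well.
    """
    keep, drop = [], []
    for rid, total in if_totals.items():
        if total is None or total >= cutoff:
            drop.append(rid)
        else:
            keep.append(rid)
    return keep, drop
```

Returning both lists makes it easy to state how many subjects were excluded and why.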

 

Assignment

 

First, decide which subjects to include and exclude, and justify your decision.  Then, using the dataset of included subjects, address the following items.  For each item below, include all relevant printouts; when performing calculations by hand, include the formula you chose and show your calculations.
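The printout items below (scale intercorrelations, the item statistics p and rit, and the internal consistency rtt) can be produced in any statistics package. As a hedged illustration of what those printouts contain, here is a sketch in plain Python that assumes complete, already-keyed 0/1 data. Two choices in it are assumptions rather than requirements: rit is computed as the corrected item-total correlation (the item against the total of the remaining items), and rtt is computed with KR-20, which is only one of the methods you might deem appropriate.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists (nan if no variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def correlation_matrix(totals):
    """`totals` maps scale name -> list of totals, same respondent order."""
    return {a: {b: pearson_r(totals[a], totals[b]) for b in totals}
            for a in totals}


def item_statistics(rows):
    """Return [(p, r_it), ...], one pair per item of a single scale.

    `rows` is a list of respondents, each a list of keyed 0/1 scores.
    p is the proportion earning the point. ASSUMPTION: r_it is the
    corrected item-total correlation (item vs. total of the other items).
    """
    totals = [sum(row) for row in rows]
    stats = []
    for j in range(len(rows[0])):
        item = [row[j] for row in rows]
        rest = [t - x for t, x in zip(totals, item)]
        stats.append((sum(item) / len(item), pearson_r(item, rest)))
    return stats


def kr20(rows):
    """Kuder-Richardson 20 for dichotomous items (population variances)."""
    n, k = len(rows), len(rows[0])
    totals = [sum(row) for row in rows]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in rows) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_t)


# Toy data only (4 respondents x 3 items); NOT taken from PSY621.XLS.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
toy_stats = item_statistics(toy)
toy_kr20 = kr20(toy)  # 0.75 for this toy matrix
```

The corrected form of rit keeps an item from inflating its own correlation with the total, which matters most on short scales such as Infrequency.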

 

1.     Compute the total score for each of the four scales, then obtain and print out the intercorrelation matrix for these four scales (a 4x4 matrix of correlations).  This is a single-method multitrait matrix.  What does this tell you about the homogeneity or heterogeneity of the constructs tapped by these scales?

 

2.     What are the standard errors of these correlations (assuming that r is an unbiased estimate of ρ)?

 

3.     Compute and print out the item statistics (p and rit).  Identify poor items on the printout, and indicate in general what you decided would constitute a bad item.  Articulate your rationale for your decision about the criterion/criteria for a bad item.

 

4.     Compute and print out the internal consistency (rtt) of each of the four scales by whatever method you deem appropriate (justify your deeming).

 

5.     Fred scored 22 on the Perceptual Aberration Scale; Clyde scored 13.  Is the difference between their scores statistically significant at the .05 level?

 

6.     What do you estimate the reliability of the difference (rDD) between the Perceptual Aberration and Magical Ideation Scales to be?

 

7.     By hand, recompute the correlation matrix from question 1, correcting for attenuation by the unreliability of each scale.  Now, what would you say about the homogeneity or heterogeneity of the constructs tapped by these four scales?

 

8.     By hand, compute the estimated reliability of each scale if you doubled the length of the scale by adding items comparable to these existing items in terms of their psychometric properties and content area.

 

9.     Exclude the items you identified as poor on each of the scales (excluding the Infrequency scale).  Then recompute the internal consistency estimates (rtt) of each of these three scales by the method you already deemed appropriate.

 

10. Now assume that you add new items of comparable quality to the retained items, and that you add as many items to each scale as you excluded in item 9 above.  By hand, what would you estimate the internal consistency (rtt) of each of these three new-and-improved scales to be?
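The by-hand items above (2, 5, 6, 7, 8, and 10) rest on a handful of standard classical-test-theory formulas. The sketch below writes them out in Python so that hand arithmetic can be checked. It is not a prescribed solution: the function names are illustrative, the standard error of r uses the common large-sample approximation (1 - r^2) / sqrt(N - 1), and the reliability-of-difference formula assumes the two scales have equal variances (for example, after standardizing). If you choose different formulas, substitute them and say why.

```python
from math import sqrt


def se_of_r(r, n):
    """Item 2: large-sample standard error of r, (1 - r^2) / sqrt(n - 1)."""
    return (1 - r ** 2) / sqrt(n - 1)


def difference_z(score_a, score_b, sd, rtt):
    """Item 5: z for the difference between two people's scores on one scale.

    SEM = sd * sqrt(1 - rtt); the SE of a difference is SEM * sqrt(2).
    |z| > 1.96 is significant at the two-tailed .05 level.
    """
    se_diff = sd * sqrt(1 - rtt) * sqrt(2)
    return (score_a - score_b) / se_diff


def reliability_of_difference(rxx, ryy, rxy):
    """Item 6: r_DD for X - Y, ASSUMING the two scales have equal variances."""
    return ((rxx + ryy) / 2 - rxy) / (1 - rxy)


def disattenuate(rxy, rxx, ryy):
    """Item 7: correlation corrected for unreliability in both scales."""
    return rxy / sqrt(rxx * ryy)


def spearman_brown(rtt, k):
    """Items 8 and 10: projected reliability at k times the current length."""
    return k * rtt / (1 + (k - 1) * rtt)


def length_factor(n_retained, n_added):
    """Item 10: k is the new length divided by the retained length."""
    return (n_retained + n_added) / n_retained
```

For item 8 the lengthening factor is k = 2. For item 10 it is generally not a whole number, because the comparison is between the shortened scale from item 9 and that scale restored to its original length. The scale standard deviation and the reliabilities fed into these functions come from your own analyses of the included subjects.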