Example: Confidence Intervals
This 40-year-old male with ten years of education sustained a traumatic brain injury in 1990. He was tested by a psychologist six months after his injury (T1). A year later he sustained a second traumatic brain injury and was again tested by a psychologist approximately six months later (T2).
WAIS-R Composite   T1 SS   90% CI Low   90% CI High   T2 SS
VIQ                94      88           100           82
PIQ                87      79           96            93
VC                 94      87           101           89
PO                 95      84           107           105
FFD                92      83           103           75
VIQ, PIQ, VC, and PO at the second testing (following injury 2) fall within the 90% confidence interval predicted from the initial testing following injury 1. Given the psychometric properties of the WAIS-R, these retest scores are therefore within the range expected from the first testing (90% confidence).
FFD at initial testing had a standard score of 92. A 90% retest confidence band is placed around this score using the standard error of prediction, giving the range of scores that are likely to result from a second testing given the psychometric properties of the test. As can be seen, the FFD standard score of 75 at the second testing falls outside this range. The most likely explanation is that the drop results from the second injury.
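The retest band described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to build the original table. It assumes the predicted true score is M + r(X - M) and the standard error of prediction is SD * sqrt(1 - r^2), with M = 100 and SD = 15. The reliability of .97 used for VIQ is an assumed value chosen for illustration, since the coefficients behind the table are not stated here, and the function name `retest_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def retest_interval(score, reliability, mean=100.0, sd=15.0, confidence=0.90):
    """Return (low, high) for a retest confidence band around the predicted true score."""
    # Predicted true score: the obtained score regressed toward the mean.
    predicted_true = mean + reliability * (score - mean)
    # Standard error of prediction (assumed form: SD * sqrt(1 - r^2)).
    se_prediction = sd * sqrt(1.0 - reliability ** 2)
    # Two-tailed z value for the requested confidence level (about 1.645 for 90%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return predicted_true - z * se_prediction, predicted_true + z * se_prediction


# VIQ of 94 at T1, with an assumed (illustrative) reliability of .97.
low, high = retest_interval(94, 0.97)
print(round(low), round(high))
```

Under these assumptions the band rounds to 88 to 100, which agrees with the VIQ row of the table.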
Test-Retest Issues
There are essentially two ways to evaluate whether or not a clinically significant change in scores has occurred.
1. Place confidence intervals around the predicted true score using the standard error of prediction. A retest score that falls within this confidence band could be attributed to the properties of the test. This technique requires only the test scores and then subsequent comparison with retest data.
(a) Note that in the tables supplied the standard error of prediction (SEp) was calculated using the reliability coefficient at test and NOT the test-retest reliability. If the test-retest reliability is used in the formula, the resulting interval applies only to data with a similar test-retest interval. If the retest interval is similar to that of the standardisation sample, then an SEp calculated using test-retest reliability would be appropriate.
(b) Note that the calculation of confidence intervals here does not take into account obvious systematic errors such as practice effects or fatigue. If the magnitude of these effects is known, it would be appropriate to subtract them from the test-retest difference score, i.e. X2 - X1 - practice, where X1 is the test score and X2 the retest score.
The problem with these considerations is that they both diminish the likelihood of detecting a true change, i.e. they increase the probability of a Type II error.
2. Use the formula for calculating the significance of difference scores. This method requires both test and retest scores. Note also that we are comparing the difference between predicted true scores at test and retest. As with 1(b), it would also be appropriate to remove practice effects if these are known.
Significance
To determine whether the difference between two scores is significant, calculate the standard error of the difference between the two tests using:
$$SE_{diff} = \sqrt{SE_{E1}^2 + SE_{E2}^2}$$
where $SE_{E1}$ and $SE_{E2}$ are the standard errors of estimate at test and retest. Multiplying this value by the z-score corresponding to the desired significance level (e.g. 1.96 for the .05 level) generates the critical difference value. Equivalently, dividing the difference between the two scores by the standard error of the difference gives a z-score that can then be evaluated for significance. Note that predicted true scores are used to determine the difference, and standard errors of estimate are used in the standard error of difference formula.
$$z = \frac{T'_1 - T'_2}{SE_{diff}}$$
where $T'_1$ and $T'_2$ are the predicted true scores at test and retest.
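The significance procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the standard error of estimate is taken as SD * sqrt(r(1 - r)), the predicted true score as M + r(X - M), and the reliability of .97 is an assumed value. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def predicted_true(score, reliability, mean=100.0):
    """Predicted true score: the obtained score regressed toward the mean."""
    return mean + reliability * (score - mean)


def se_estimate(reliability, sd=15.0):
    """Standard error of estimate (assumed form: SD * sqrt(r * (1 - r)))."""
    return sd * sqrt(reliability * (1.0 - reliability))


def difference_z(score1, score2, r1, r2, mean=100.0, sd=15.0):
    """Return (z, two-tailed p) for the difference between two predicted true scores."""
    # Standard error of the difference from the two standard errors of estimate.
    se_diff = sqrt(se_estimate(r1, sd) ** 2 + se_estimate(r2, sd) ** 2)
    z = (predicted_true(score1, r1, mean) - predicted_true(score2, r2, mean)) / se_diff
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Illustrative inputs: scores of 94 and 82, assumed reliability .97 on both occasions.
z, p = difference_z(94, 82, 0.97, 0.97)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A z-score larger in absolute value than 1.96 corresponds to a difference that is significant at the .05 level (two-tailed).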
Abnormality
This is probably the most important of the computational procedures, as we are more interested in meaningful change than in significant differences. The difference between this procedure and the significance formula is that here we are interested in the standardisation sample specifically. Critical abnormality levels are calculated from the standard deviation of the difference between two scores,
$$SD_{diff} = \sqrt{SD_X^2 + SD_Y^2 - 2 r_{XY} SD_X SD_Y}$$
which for Wechsler Composite scores (SD = 15) simplifies to
$$SD_{diff} = 15\sqrt{2 - 2 r_{XY}}$$
and for Wechsler subtest age-scaled scores (SD = 3) simplifies to
$$SD_{diff} = 3\sqrt{2 - 2 r_{XY}}$$
where $r_{XY}$ is the correlation between the two scores in the standardisation sample. This standard deviation is multiplied by the z-score corresponding to the desired frequency level (1.96 for 5%). Note that this value is two-tailed, so for a unidirectional test you would use a correspondingly more rigorous value (such as 2.5%). Again, as with significance, you can divide the actual difference (no true scores here) by the standard deviation of the difference and look up the frequency associated with the calculated z-score.
$$z = \frac{X - Y}{SD_{diff}}$$
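As a rough sketch of the abnormality calculation described above, the following Python computes the standard deviation of the difference, the critical difference for a given frequency level, and the expected frequency of an observed difference. The correlation of .75 between the two scores is a hypothetical value used only for illustration, and the function names are likewise assumptions.

```python
from math import sqrt
from statistics import NormalDist


def sd_of_difference(r_xy, sd_x=15.0, sd_y=15.0):
    """Standard deviation of the difference between two correlated scores."""
    return sqrt(sd_x ** 2 + sd_y ** 2 - 2.0 * r_xy * sd_x * sd_y)


def critical_abnormal_difference(r_xy, frequency=0.05, sd=15.0):
    """Absolute difference exceeded by `frequency` of the sample (two-tailed)."""
    z = NormalDist().inv_cdf(1.0 - frequency / 2.0)  # about 1.96 for 5%
    return z * sd_of_difference(r_xy, sd, sd)


def difference_frequency(x, y, r_xy, sd=15.0):
    """Two-tailed proportion of the sample expected to show a difference this large."""
    z = abs(x - y) / sd_of_difference(r_xy, sd, sd)
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Hypothetical inputs: two composite scores correlating .75 in the standardisation sample.
print(round(critical_abnormal_difference(0.75), 1))
print(round(difference_frequency(94, 87, 0.75), 2))
```

Passing `sd=3.0` gives the corresponding values for subtest age-scaled scores. Note that obtained scores, not predicted true scores, are used here.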
WAIS-III SUBTESTS
No. of Subtests: 14
Verbal Subtests: (Number of Items)
Information (28)
Digit Span (15)
Vocabulary (33)
Arithmetic (20)
Comprehension (18)
Similarities (19)
Letter-Number Sequencing (7)
Performance Subtests:
Picture Completion (25)
Picture Arrangement (11)
Block Design (14)
Digit Symbol-Coding (133)
Matrix Reasoning (26)
Symbol Search (60)
Optional Subtests: Object Assembly (4)
IQ Composites:
Verbal IQ
Performance IQ
Full Scale IQ
Factor Indices:
Verbal Comprehension: Information, Vocabulary, Similarities
Perceptual Organisation: Block Design, Picture Completion, Matrix Reasoning
Working Memory: Digit Span, Arithmetic, Letter-Number Sequencing
Processing Speed: Digit Symbol-Coding, Symbol Search
Note: Picture Arrangement, Comprehension, and Object Assembly do not contribute to WAIS-III Index scores.
WAIS-III RELIABILITY AVERAGED ACROSS ALL AGE GROUPS
Subtest / Composite         Internal Consistency   Test-Retest (30-54 yr.)
Information                 .91                    .94
Digit Span                  .90                    .83
Vocabulary                  .93                    .94
Arithmetic                  .88                    .87
Comprehension               .84                    .81
Similarities                .86                    .88
Letter-Number Sequencing    .82                    .78
Picture Completion          .83                    .79
Picture Arrangement         .74                    .73
Block Design                .86                    .88
Object Assembly             .70                    .78
Digit Symbol                .84                    .84
Matrix Reasoning            .90                    .75
Symbol Search               .77                    .82
Verbal IQ                   .97                    .96
Performance IQ              .94                    .90
Full Scale IQ               .98                    .96
Verbal Comprehension        .96                    .95
Perceptual Organisation     .93                    .88
FFD/Working Memory          .94                    .90
Processing Speed            .88                    .88
On the WAIS-III, mean retest scores for the 30-54 age group (retested 2-12 weeks later) are higher than at initial testing by:
Verbal IQ 2.0 points
Performance IQ 8.3 points
Full Scale IQ 5.1 points
Verbal Comprehension 2.1 points
Perceptual Organisation 7.4 points
Working Memory 3.1 points
Processing Speed 4.6 points
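If these mean gains are taken as an estimate of the practice effect, a retest difference can be adjusted as in the minimal sketch below. It assumes the gains apply to the case at hand, which strictly holds only for a similar age group and a 2-12 week retest interval. The function name and the example scores are hypothetical.

```python
# Mean WAIS-III retest gains (points) for the 30-54 age group, 2-12 week interval,
# copied from the list above.
MEAN_RETEST_GAIN = {
    "Verbal IQ": 2.0,
    "Performance IQ": 8.3,
    "Full Scale IQ": 5.1,
    "Verbal Comprehension": 2.1,
    "Perceptual Organisation": 7.4,
    "Working Memory": 3.1,
    "Processing Speed": 4.6,
}


def practice_adjusted_change(test_score, retest_score, composite):
    """Retest minus test, with the mean practice gain for that composite removed."""
    return (retest_score - test_score) - MEAN_RETEST_GAIN[composite]


# Hypothetical scores: Performance IQ of 100 at test and 106 at retest.
print(round(practice_adjusted_change(100, 106, "Performance IQ"), 1))
```

In this example a raw gain of 6 points becomes an adjusted change of -2.3 points, i.e. less improvement than practice alone would be expected to produce.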
STRUCTURE OF THE WMS-III
SUBTESTS
Information & Orientation
Logical Memory I, II, Recognition
Faces I & II
Verbal Paired Associates I, II, Recognition
Family Pictures I & II
Word Lists I, II, Recognition
Visual Reproduction I, II, Recognition
Letter-Number Sequencing
Spatial Span
Mental Control
Digit Span
Note: Information & Orientation, Word Lists, Visual Reproduction, Digit Span, and Mental Control do not contribute to composites.
PRIMARY INDICES
IMMEDIATE MEMORY
Auditory Immediate: Logical Memory I, Verbal Paired Associates I
Visual Immediate: Faces I, Family Pictures I
GENERAL MEMORY
Auditory Delayed: Logical Memory II, Verbal Paired Associates II
Visual Delayed: Faces II, Family Pictures II
Auditory Recognition Delayed: Logical Memory II, Verbal Paired Associates II
WORKING MEMORY
Letter-Number Sequencing, Spatial Span
AUDITORY PROCESS COMPOSITES
Single-Trial Learning: Logical Memory I, Verbal Paired Associates I
Learning Slope: Logical Memory I, Verbal Paired Associates I
Retention: Logical Memory I, Logical Memory II, Verbal Paired Associates I, Verbal Paired Associates II
Retrieval: Logical Memory II, Verbal Paired Associates II
COMPARABLE RELIABILITY COEFFICIENTS FOR WMS-III SUBTESTS AND INDICES
                              Internal Consistency   Test-Retest
Primary Subtests:
Logical Memory I              .88                    .74
Faces I                       .74                    .70
Verbal Paired Associates I    .93                    .81
Family Pictures I             .81                    .63
Letter-Number Sequencing      .82                    .71
Spatial Span                  .79                    .71
Logical Memory II             .79                    .76
Faces II                      .74                    .63
Verbal Paired Associates II   .83                    .77
Family Pictures II            .84                    .68
Supplementary Subtests:
Mental Control                .87                    .77
Word Lists I                  .79                    .61
Word Lists II                 .80                    .62
Visual Reproduction I         .79                    .60
Visual Reproduction II        .77                    .60
Index Scores:
Auditory Immediate            .93                    .85
Visual Immediate              .82                    .77
Immediate Memory              .91                    .85
Auditory Delayed              .87                    .83
Visual Delayed                .83                    .75
Aud. Rec. Delayed             .74                    .62
General Memory                .91                    .87
Working Memory                .86                    .79
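The choice of reliability coefficient matters for the width of a retest confidence band. The sketch below compares the standard error of prediction obtained from the two coefficients for three of the index scores in the table above. It assumes the form SEp = SD * sqrt(1 - r^2) and an index-score standard deviation of 15. The function and variable names are hypothetical.

```python
from math import sqrt

# (internal consistency, test-retest) coefficients copied from the table above.
WMS3_RELIABILITY = {
    "Immediate Memory": (0.91, 0.85),
    "General Memory": (0.91, 0.87),
    "Working Memory": (0.86, 0.79),
}


def se_prediction(reliability, sd=15.0):
    """Standard error of prediction (assumed form: SD * sqrt(1 - r^2))."""
    return sd * sqrt(1.0 - reliability ** 2)


for index, (internal, retest) in WMS3_RELIABILITY.items():
    print(f"{index}: SEp {se_prediction(internal):.2f} (internal consistency), "
          f"{se_prediction(retest):.2f} (test-retest)")
```

Because the test-retest coefficients are lower, they yield a larger standard error of prediction and therefore a wider band, which makes a true change harder to detect.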