What follows is a written version of the material presented in lectures and at the residential schools, designed to help you master this material.
Three types
of information are presented: the introductory background material for the
case; information regarding the tests that have been utilized in the
assessment; and the testing results along with the standardization tables
necessary to interpret them. I shall proceed in the order in which the
information is presented in the assignment. The first thing we need to consider
is the introduction to the assignment and what it reveals about this young man:
· He is a 28-year-old white male.
· He is in the last year of his B.Sc. in Biology – based on this we would estimate his number of years of formal education as 14.
· Prior to the accident he was getting A’s and HD’s.
· Following the accident and his return to university he was achieving C and B grades.
Note that the assessments
are conducted in the context of his university education, i.e. prior to his
return to his studies and approximately a year later when he is considering
graduate studies. As you would expect the impact of his injuries upon his
ability to engage in tertiary education and the challenges he will face in
doing so, will be uppermost in his mind.
Regarding the accident:
· He was involved in a motor vehicle accident.
· He lost consciousness and remained so for 28 hours.
· Period of retrograde amnesia is 10 minutes.
· Period of post-traumatic amnesia (PTA) is around three days.
· Glasgow Coma Scale of 7/15.
The table below is adapted
from the 1985 Central Nervous System Trauma Status Report, edited by Donald P.
Becker and John T. Povlishock and prepared for the National Institute of
Neurological and Communicative Disorders and Stroke, National Institutes of
Health, USA. It indicates categories for defining head injury severity.
Based upon these criteria Mr. Gould’s traumatic brain injury would be
classified as Severe (LOC=28 hours, GCS = 7, PTA = 72 hours).
CATEGORIES DEFINING HEAD INJURY SEVERITY
1) FATAL injuries.
2) SEVERE injuries where LOC* and/or PTA* occurred for more than 24 hours, or a cerebral contusion, laceration, or intracranial hematoma was present. GCS* = 3-8.
3) MODERATE injuries where LOC and/or PTA occurred for more than 30 minutes but less than 24 hours, and/or a skull fracture was noted. GCS = 9-12.
4) MILD injuries where LOC and/or PTA occurred, but for less than 30 minutes, with no skull fracture, cerebral contusion, laceration, or intracranial hematoma. GCS = 13-15.
5) TRIVIAL injuries without LOC or PTA, with no skull fracture, cerebral contusion, laceration, or intracranial hematoma.
*LOC – Loss of Consciousness
*PTA – Post-traumatic Amnesia
*GCS – Glasgow Coma Scale
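The severity categories above can be sketched as a simple decision rule. This is only an illustration, not a clinical tool: the class and function names are hypothetical, and two assumptions go beyond the table. First, the sketch takes the most severe category that any single criterion supports. Second, it treats a duration of exactly 24 hours as Moderate, because the table does not say where that boundary falls. Fatal injuries are left out.

```python
from dataclasses import dataclass


@dataclass
class HeadInjury:
    loc_hours: float              # duration of loss of consciousness, in hours
    pta_hours: float              # duration of post-traumatic amnesia, in hours
    gcs: int                      # Glasgow Coma Scale score (3 to 15)
    skull_fracture: bool = False
    brain_lesion: bool = False    # contusion, laceration, or intracranial hematoma


def classify_severity(injury: HeadInjury) -> str:
    """Return the most severe non-fatal category supported by any criterion.

    Assumption: one qualifying criterion is enough to place the injury
    in a category; the source table does not spell this out.
    """
    longest = max(injury.loc_hours, injury.pta_hours)
    if longest > 24 or injury.brain_lesion or injury.gcs <= 8:
        return "SEVERE"
    if longest > 0.5 or injury.skull_fracture or injury.gcs <= 12:
        return "MODERATE"
    if longest > 0:
        return "MILD"
    return "TRIVIAL"


# Mr. Gould: LOC = 28 hours, PTA = 72 hours, GCS = 7
gould = HeadInjury(loc_hours=28, pta_hours=72, gcs=7)
print(classify_severity(gould))  # SEVERE
```

In Mr. Gould's case all three indicators point the same way, so the assumption about conflicting criteria does not affect the result.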
He reports symptoms related to:
· Headaches
· Memory
· Concentration
· Word-finding
The data for only two tests have been provided. This would constitute a poor assessment if these were the only tests administered but the data presented is sufficient to enable you to test the hypotheses generated and get a taste for the psychological assessment interpretative process.
The Wechsler Adult Intelligence Scale – 3rd Edition (WAIS-III) and the Wechsler Memory Scale – 3rd edition (WMS-III) have been administered and some of the test scores from the initial and follow-up assessments have been provided. Data has also been provided from the Wechsler Test of Adult Reading (WTAR), which can be combined with demographic data to predict WAIS-III and WMS-III scores. Examination of this measure is not strictly necessary to complete the assignment (although the values it predicts certainly are), but it would commonly be employed in cases of this type to estimate premorbid or expected level of cognitive functioning.
The WTAR consists of progressively more difficult words which must be read aloud by the test-taker. Reading tests have been commonly employed to estimate IQ scores. As the words are roughly in order of reading grade level, those who have not been exposed to certain words will not know how to pronounce them. Thus performance on the test is a measure of the degree to which you have been exposed to infrequent or irregular words in the English language, something which is highly correlated with the number of years of formal education one has had. Some of the original studies with measures of this type have indicated that, perhaps with the exception of severe impairment, declines in cognitive functioning are NOT accompanied by declines in performance on reading tests. Many clinicians therefore use the measure as a way of examining word knowledge, with the expectation that whatever problem you may have has had little impact upon test performance. This is a roundabout way of saying that it is often used as an estimate of your abilities before you began suffering from your current difficulty. Individuals hypothesized to be currently experiencing a decline in cognitive functioning could be recognized as such by comparing where they should be, based upon their WTAR + Demographic scores, with where they are now! More about this later!
THE WAIS-III
The WAIS-III is a cognitive/intellectual battery of tests
designed to assess a variety of cognitive domains. Most notable in the battery
is the omission of tests that evaluate memory beyond that of short-term recall.
The battery consists of 14 subtests, only 13 of which are customarily
administered. The Object Assembly subtest has been included in the third
edition of the battery to maintain consistency with its predecessor (the
WAIS-R) and to permit the substitution of this test for a spoiled Block Design
performance. In the case that you will be analyzing no data for Object Assembly
has been provided.
The nature of the subtests will be discussed shortly, but
analysis of the WAIS-III focuses primarily around composite scores. Composite
scores are combinations of subtests according to a particular theoretical
framework. The two types of composites on the WAIS-III are IQ scores (which are
based upon an historical framework) and Index scores (which are based upon an
empirical framework). One of the things that can be somewhat confusing about
the WAIS-III is that subtests are combined in ways that are similar but not
identical for the two frameworks described. Let’s consider the IQ framework
first.
David Wechsler, the developer of the Wechsler batteries, conceived of his test as a measure of overall cognitive/intellectual ability in the form of a score that included all of the subtests on the battery, the Full Scale Intelligence Quotient (FSIQ). This FSIQ could be divided into two sub-categories: the Verbal Intelligence Quotient (VIQ), which encompassed those measures that were administered verbally and required verbal responses, and the Performance Intelligence Quotient (PIQ), which encompassed those measures that were administered visually and usually required a written, pointing, or object-manipulation response. So essentially, FSIQ represents an overall indication of performance and VIQ and PIQ are subdivisions that relate to the mode of input and output of information. This framework has been retained in the WAIS-III with IQ scores computed from the measures that have been traditionally used for this purpose.
Factor analysis of the WAIS-III, however, reveals that rather than a dichotomous verbal and visual structure to the battery, four abilities or constructs are actually measured by the test. This factor structure forms the basis of the four Index scores. The Verbal Comprehension Index (VCI) consists of those subtests that best measure verbal comprehension and expression. The Perceptual Organization Index (POI) consists of those subtests that best measure an individual’s ability to process complex visual information and solve problems – in some ways a non-verbal reasoning measure. The Working Memory Index (WMI) consists of those subtests that best measure attentional abilities and the degree to which an individual can efficiently perform mental operations. The Processing Speed Index (PSI) consists of the subtests that best measure speeded visual information processing - these measures all involve scanning of visual information and the rapid writing of responses on a page.
As you are well aware, the more items that are included on a scale the higher the reliability is likely to be. Examination of the numbers of subtests that are included in each of the IQ and Index scores (in the table below) would appropriately suggest that higher internal consistency is found in composites that contain more subtests. Our approach to analyzing psychological test data is to proceed in a hierarchical fashion from the most reliable measures and work our way down to the least reliable measures. This means beginning with FSIQ, proceeding to VIQ and PIQ, then to VCI, POI, WMI, and PSI, and finally to the individual subtest level should this prove necessary.
The following is a general description of each of the WAIS-III subtests:
Vocabulary
This subtest
presents 33 words orally to the test-taker and requires them to supply a
dictionary style definition. It is the most reliable subtest in the scale and
is the best measure of g (69% of its variance). It contributes to FSIQ, VIQ,
and VCI composites and is generally considered a measure of word knowledge.
This measure is heavily influenced by formal education and literacy.
Similarities
This subtest consists of 19 word pairs. The test-taker’s
task is to indicate how the two words are similar. More points are awarded for
a more abstract relationship. It contributes to FSIQ, VIQ, and VCI composites and is generally considered to be a
measure of relational word knowledge. This measure is strongly influenced by
education and literacy.
STRUCTURE OF WAIS-III COMPOSITE SCORES
No. of Subtests: 14 (number of items in parentheses)
VIQ Subtests: Information (28), Digit Span (15), Vocabulary (33), Arithmetic (20), Comprehension (18), Similarities (19)
PIQ Subtests: Picture Completion (25), Picture Arrangement (11), Block Design (14), Digit Symbol-Coding (133), Matrix Reasoning (26)
FSIQ = VIQ subtests + PIQ subtests
Factor Indices:
Verbal Comprehension: Information (28), Vocabulary (33), Similarities (19)
Perceptual Organisation: Picture Completion (25), Block Design (14), Matrix Reasoning (26)
Working Memory: Digit Span (15), Arithmetic (20), Letter-Number Sequencing (7)
Processing Speed: Digit Symbol-Coding (133), Symbol Search (60)
Arithmetic
This subtest contains 20 items requiring progressively
more demanding mental arithmetic. Factor analytic studies indicate that, despite
the arithmetic content, this is most commonly a measure of attention or
working memory. This does not mean that people with dyscalculia or specific
learning disability in arithmetic will not perform poorly on this measure. The
content of the subtest requires mastery of relatively simple mathematical
procedures such as percentages, averages, and probability. Consequently the
influence of education on this subtest is greater than for the other measures
that contribute to the Working Memory Index. This measure is included in the
computation of FSIQ, VIQ, and WMI composites.
Digit Span
This subtest also contributes to FSIQ, VIQ and WMI
composites. The Digit Span task is the prototypical immediate memory or
attentional task. On Digits Forwards, test-takers are required to repeat up to
eight-digit sequences back in correct order as they were presented. The Digits
Backwards task requires the digit sequences to be repeated back in reverse
order. It is this second task that is, perhaps, more appropriately termed
working memory as mental juggling of the number sequence is required to
successfully complete the task. Characteristically, individuals recall on
average 2 more digits forwards than backwards. Less than 4% of the standardization
sample recalled more digits backwards than forwards.
Information
This 28 item subtest measures general knowledge through a
broad range of questions about science, literature, geography, and historical
events. This measure is highly correlated with educational achievement. This
subtest contributes to FSIQ, VIQ, and VCI.
Comprehension
This 18 item
subtest asks questions about social knowledge and awareness of socially
appropriate behaviours and responses. A critical issue with this test is that
it does not ask what you would do in a particular situation but rather what
should you do. Similarly it asks not why you think something is so, but what we
are taught are the reasons behind issues such as taxation, or the importance of
a free-press. This subtest contributes to FSIQ, and VIQ. It does not contribute
to any factor indices.
Letter-Number Sequencing
This subtest contains seven
items with three trials for each item. In some ways it is similar to Digits
Forwards from the Digit Span subtest. In each trial the examinee is read a
series of numbers and letters that have been placed in a random order. The
examinee is required to reorder the numbers and letters and repeat them back in
the correct ascending sequence with numbers first followed by letters. This
subtest contributes only to the WMI composite. It does not contribute to any IQ
scores.
Picture Completion
This subtest consists of 25
colour drawings of objects, people, and scenes where an important element is missing.
The examinee is required to indicate what important element is missing from the
picture. This subtest contributes to FSIQ, PIQ, and POI composites.
Digit Symbol-Coding
This subtest consists of a
maximum of 133 items. The examinee is presented with a table containing the
numbers 1 through 9 and symbols (simple line drawings) that are associated with
each number. A template which contains 133 numbers (1 through 9) in a random
sequence where the numbers are presented but the associated symbols have been
omitted is presented to the examinee. Beginning with the first item, the
examinee must fill in, in sequence, as many of the symbols that go with each number as possible in a
two-minute period. This subtest contributes to FSIQ, PIQ, and PSI composites.
Block Design
This subtest utilizes up to
9 blocks each with 2 red surfaces, 2 white surfaces, and 2 half red/half white
surfaces. These blocks are employed by the examinee to replicate a design
presented as a two-dimensional picture. Examinees manipulate the blocks and put
them together in such a way so as to produce the same design with the top
surfaces. The designs become progressively more complex and go from requiring 4
blocks to all 9 blocks in order to replicate the design. This subtest
contributes to FSIQ, PIQ, and POI composites.
Picture Arrangement
This subtest consists of 11
series of line drawings that when placed in the correct order tell a story
(allegedly humorous, but clients seldom laugh). The drawings are laid out in
front of the examinee in an incorrect order and the examinee’s task is to
reorder them in the correct arrangement. This subtest contributes to FSIQ, and
PIQ composites. It does not contribute to any factor indices.
Symbol Search
This subtest was taken directly from the WISC-III and requires the examinee to look at two symbols and determine whether either symbol is present in a sequence of five symbols. There are 60 items each with two target and five test symbols. The examinee is required to verify the presence or absence of the symbols in as many items as they can in two minutes. This subtest contributes only to the PSI composite. It does not contribute to any IQ scores.
When interpreting
WAIS-III, WMS-III, or any psychological test data for that matter we proceed
from the most reliable measures to the least reliable measures. This approach
protects the clinician from being biased by an interesting, unusual, abnormal
BUT unreliable finding. The most reliable measures are less prone to random
variations and our decisions are more likely to be accurate if we base our
interpretations on them. Now it’s time for me to get on to my soapbox. I will
make a statement now that will seem radical but is eminently defensible through
logic and basic psychometric principles. Measures of intelligence have no
meaning in a clinical evaluation. This is not because the concept of
intelligence does not exist (although I have seen little evidence to support it
in my career) but rather that the measurement of intelligence necessarily
presupposes that the individual being assessed is normal! Said another way,
when you know that someone is normal then a test of intelligence may indicate
that individual’s ranking relative to other normal individuals. This is not,
however, the case when we use these tests in the clinical setting. Clinically,
the Wechsler scales utilize tests of different cognitive abilities that are of
interest to the clinician. In this context, scores such as Full Scale
Intelligence Quotient (FSIQ), Verbal IQ (VIQ), and Performance IQ (PIQ) cannot
be construed as measuring intelligence in any way. To believe so would be to
infer that a blind person is of low intelligence (VIQ = 100, PIQ = 47, FSIQ
=74) (when it’s just that they can’t see any of the Performance subtests), a
deaf person is of low intelligence (VIQ=48, PIQ= 100, FSIQ = 68) (when it’s
just that they can’t hear the Verbal subtests), and a dead person is just
profoundly intellectually impaired (VIQ = 48, PIQ = 47, FSIQ = 45). This last
little gem arises because, although the dead individual makes no responses to
any of the questions and so receives a raw score of 0 on every subtest, the
scaled score associated with a raw score of 0 is 1 (surprisingly not 0). With
11 subtests constituting FSIQ, a dead person (or your favourite piece of lint
if you prefer) receives a sum of scaled scores of 11, which corresponds to a
FSIQ of 45.
So, having convinced
you of the evils of the pragmatics of intelligence theory, what do FSIQ, VIQ,
and PIQ mean? Good question. In the context of clinical assessment FSIQ
represents overall performance on the majority of subtests. It functions
essentially as a grand mean. It is of value to us because it has the highest
reliability and has the potential for representing overall performance on the
test. However, just as a grand mean may not actually be representative, we look
to subgrouping of tests to determine whether the FSIQ is representative or not.
VIQ and PIQ divide the subtests into two groupings based upon their input and
output modalities. VIQ subtests are all administered verbally by the tester and
answered verbally by the test-taker. PIQ subtests are all administered using
visual stimuli and the test-taker can respond by pointing, writing, or
manipulating objects. Two PIQ subtests may also involve giving verbal responses
(Picture Completion and Picture Arrangement) but the scoring of each of these
tests does not rely on a verbal response. These subgroupings are historical
rather than empirical. By this I mean that the existence of VIQ and PIQ as
scores was based upon David Wechsler’s belief that these divisions would be
meaningful and not based upon research findings that support this distinction.
Empirical studies (factor analyses) of the WAIS-III in fact support four
composites or indices and these form the true basis of interpreting the
WAIS-III in the clinical setting. What is lucky for us is that the four
constructs, in principle, reflect a subgrouping of each of VIQ and PIQ into two
smaller divisions. I say, in principle, because in each case one of the
subtests omitted from the IQ score is included in an index score (Letter Number
Sequencing is omitted from VIQ but is part of WMI, and Symbol Search is omitted
from PIQ but is part of PSI).
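The overlap between the IQ framework and the Index framework can be checked with a small sketch. The subtest groupings are copied from the structure table earlier in this section; the variable and function names are hypothetical and exist only for illustration.

```python
# Subtest membership of each WAIS-III composite, as listed in the structure table.
VIQ = {"Information", "Digit Span", "Vocabulary",
       "Arithmetic", "Comprehension", "Similarities"}
PIQ = {"Picture Completion", "Picture Arrangement", "Block Design",
       "Digit Symbol-Coding", "Matrix Reasoning"}
FSIQ = VIQ | PIQ  # FSIQ = VIQ subtests + PIQ subtests

INDICES = {
    "VCI": {"Information", "Vocabulary", "Similarities"},
    "POI": {"Picture Completion", "Block Design", "Matrix Reasoning"},
    "WMI": {"Digit Span", "Arithmetic", "Letter-Number Sequencing"},
    "PSI": {"Digit Symbol-Coding", "Symbol Search"},
}


def index_only_subtests():
    """Subtests that feed an Index score but no IQ score."""
    in_an_index = set().union(*INDICES.values())
    return sorted(in_an_index - FSIQ)


def iq_only_subtests():
    """Subtests that feed an IQ score but no Index score."""
    in_an_index = set().union(*INDICES.values())
    return sorted(FSIQ - in_an_index)


print(len(FSIQ))              # 11
print(index_only_subtests())  # ['Letter-Number Sequencing', 'Symbol Search']
print(iq_only_subtests())     # ['Comprehension', 'Picture Arrangement']
```

The two lists show why the Index scores subdivide VIQ and PIQ only "in principle": two subtests sit in an Index but not in an IQ score, and two sit in an IQ score but not in any Index.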
So in proceeding from
the most reliable to the least reliable measures we begin with FSIQ, proceed to
VIQ and PIQ, and then to the Index scores VCI, POI, WMI, and PSI. After, and
only after, that you can proceed to consider individual subtests if they
address relevant hypotheses or in order to examine the integrity of index
scores.
The last two columns of the table below contain
the 90% confidence interval for a score on that measure at retest. The Lo and
Hi columns indicate the lower and upper limits of the range. So, because Mr.
Gould obtained a FSIQ of 108 at his initial testing, we can be 90% sure that
his score at retest should fall somewhere between 103 and 113. The implication of
this is that if we tested Mr. Gould some time later and he got a FSIQ of less
than 103 or more than 113 we would have direct evidence that his score has
changed (declined if less than 103, or improved if greater than 113). As you
can see these last two columns would only be used if a second testing had been
performed – more on this later!
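| IQ Score | Score | Percentile | Test 90% CI Lo | Test 90% CI Hi | Retest 90% CI Lo | Retest 90% CI Hi |
| Verbal | 111 | 77 | 107 | 115 | 105 | 117 |
| Performance | 104 | 61 | 98 | 109 | 95 | 112 |
| Full Scale | 108 | 70 | 104 | 111 | 103 | 113 |
| Index Score | Score | Percentile | Test 90% CI Lo | Test 90% CI Hi | Retest 90% CI Lo | Retest 90% CI Hi |
| POI | 109 | 73 | 102 | 114 | 99 | 117 |
| WMI | 95 | 37 | 90 | 101 | 87 | 104 |
| PSI | 81 | 10 | 76 | 91 | 72 | 95 |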
So, with the information
above we can describe Mr. Gould’s performances on the three IQ measures and
four index scores of the WAIS-III indicated in the table above. Remember that
only other psychologists are going to know all this jargon, so rather than
referring to VCI you would talk about a measure of his verbal comprehension and
expressive abilities. Rather than WMI you would say something like a measure of
attention, concentration, and the ability to efficiently perform mental
operations.
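The retest logic described for the last two columns of the table can be sketched as follows. The intervals are copied from the table; the function name and the retest scores in the usage lines are hypothetical and are not Mr. Gould's actual follow-up results.

```python
# Retest 90% confidence intervals (Lo, Hi) copied from the table of
# Mr. Gould's initial scores.
RETEST_CI = {
    "VIQ": (105, 117),
    "PIQ": (95, 112),
    "FSIQ": (103, 113),
    "POI": (99, 117),
    "WMI": (87, 104),
    "PSI": (72, 95),
}


def retest_change(measure: str, retest_score: int) -> str:
    """Compare a retest score with the 90% retest interval for that measure."""
    lo, hi = RETEST_CI[measure]
    if retest_score < lo:
        return "declined"
    if retest_score > hi:
        return "improved"
    return "no reliable change"


# Hypothetical retest scores, for illustration only.
print(retest_change("FSIQ", 100))  # declined
print(retest_change("FSIQ", 110))  # no reliable change
print(retest_change("PSI", 98))    # improved
```

A score that lands exactly on a limit is treated here as falling inside the interval, which matches the wording "less than 103 or more than 113".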
Now the next
topic relates to the detection of abnormality in the profile of scores and uses
the two tables below. We have already discussed the idea of FSIQ reflecting Mr.
Gould’s overall level of functioning. To test this idea we need to consider
that FSIQ may not be made up of 11 homogeneous test scores but rather
systematic differences that average out to the FSIQ score. The first way of
addressing this is to examine the subsets of FSIQ, VIQ and PIQ. If VIQ and PIQ
differ significantly, then FSIQ does not represent a homogeneous level of
functioning. VIQ for Mr. Gould is 111, and PIQ is 104. The difference between
these two measures is 7 (111 − 104). If we consult the table of significant
differences between WAIS-III composite scores for Mr. Gould’s age group we find
that a difference of 8.27 would be needed for significance at p=.05. The 7 point difference
is less than this so we can assert that VIQ and PIQ do not differ
significantly.
Differences Between IQ Scores and Between Index Scores Required for Statistical Significance at the .05 Level for the 25 to 29 Age Group
| | VIQ – PIQ | VCI – POI | VCI – WMI | POI – PSI | VCI – PSI | POI – WMI | WMI – PSI |
| p=.05 | 8.27 | 9.01 | 9.67 | 12.66 | 12.26 | 10.17 | 13.14 |
But what does this
mean? A test of significance is testing the likelihood that the two scores came
from the same distribution. Do not be swayed by the apparent size of the
difference. Just looking at the significance table for the different WAIS-III
composites shows differing values from 8.27 to 13.14. These values are
different because of the differences in reliability of the measures involved.
The more reliable two measures are, and the more inter-correlated they are,
the smaller the difference needed for significance. Just remember that good
reliability results in things being more easily detected and poor reliability
results in things being much harder to detect. Back to our original question
here, what does it mean that VIQ and PIQ are significantly different? The easiest
way to understand this is to turn it around. What is the null hypothesis? That
VIQ and PIQ are identical (no difference between them). A significant
difference means we reject the null and infer that the two numbers are not the
same. Nothing more, nothing less. Knowing that VIQ is not significantly
different from PIQ means that we can infer that Mr. Gould has similar
verbal and visual/graphomotor abilities (remember VIQ and PIQ reflect the
modalities of the tests).
Disappointed? Hoping
for more? What you really probably wanted to know was whether or not the
difference is clinically meaningful! This is not addressed by significance
(although this is not always true) but rather by abnormality. We address
abnormality by doing a “head-count”. This is the testing equivalent of asking
“OK! Hands up all those people who did …”. How common is a 7 point difference
between VIQ and PIQ in a person who has a FSIQ in the Average range? That question can be answered by the table
below: 27.3% of the standardization sample.
One-Tailed Frequencies of Differences Between WAIS-III IQ and Index Scores for Individuals with FSIQ of 90 to 109
| | Difference | Frequency |
| VIQ – PIQ | 7 | 27.3% |
| VCI – POI | 7 | 29.2% |
| VCI – WMI | 21 | 4.8% |
| POI – PSI | 28 | 2.3% |
| VCI – PSI | 35 | 1.0% |
| POI – WMI | 14 | 13.7% |
| WMI – PSI | 14 | 16.3% |
This raises the
next big question – how rare is rare? All clinicians who use psychological tests
have to ultimately make a decision about this. I can tell you what I do and
why. Other decisions are not wrong but like anything else in life there are
consequences to what we decide. For tests of significance I use p<.05 as the
standard for a statistically significant difference with a two-tailed test. For
abnormality, I consider that anything that occurs with a frequency of 1 in 20
or less (5%) is sufficiently rare to call the behaviour abnormal. I will
confess, however that analyses that have a frequency of 6-10% are of particular
interest to me. I am not permitted to change my criterion when I have a
behaviour that occurs with a frequency between 6 to 10%, but what I can do is
keep it in the back of my mind and pay particular attention to any opportunities
that may arise to test the hypothesis (i.e. is it abnormal or normal). One
other comment here, abnormality refers only to infrequency or rarity of the
behaviour, it does not tell you whether or not the behaviour is impaired. For
example, an accountant is likely to be highly skilled at mental arithmetic so
his score on the WAIS-III Arithmetic subtest is likely to be abnormally higher
than other scores. This abnormality will be detected during data analysis, but
does not signal impairment. Some abnormalities are good – we call them skills,
some abnormalities are bad – we call them deficits. You will need to determine
when an abnormality is signaling a skill versus an impairment. The final
criterion is the confidence interval applied to test scores. I use 90%
confidence because I am happy to have an error rate of 5% at either end of the
distribution. Too lax a criterion (68%) will result in too narrow a band of
scores while too strict a criterion (99%) will result in a band that is too
wide to be of use (i.e. I can be 100% sure that the FSIQ score you got on the
test was somewhere between 45 and 155! Well, duh!).
A second point. Some people (not me, but you know, “other
people”) might ask: If we care about abnormality so much, why don’t we go straight
to those tables and leave out the test of significance? Ahhhh, the impetuosity
of youth! Remember, the score we obtain on the test is only a sample of
behaviour from the individual. It is no more necessarily characteristic of that
individual than your blood pressure at midday is representative of your blood
pressure overall. There will certainly be a relationship, but this is why we
have confidence intervals – to tell us the likely range in which a person’s
score is likely to fall when we have only one observation. So when we ask if a
VIQ of 111 and a PIQ of 104 are significantly different we are implicitly
taking into account the less than perfect reliability of our measurement.
Although this does not make it easier, it is understandable if we stop talking
about scores as if they are real things. The question we are really asking is:
How likely is it that a VIQ which I am 90% sure falls somewhere between 107 and
115 is the same level of performance as a PIQ which I am 90% sure falls
somewhere between 98 and 109? Can you see that the assumption that the real
difference between VIQ of 111 and PIQ of 104 is 7 must be tested? Since this
difference is less than that necessary to meet our criterion of significance we
accept the null and assert that for all intents and purposes the difference is
0 (i.e. we cannot reject the null hypothesis). So what does this have to do
with the point? When we look up how frequent a difference is (like 7) it is
based upon the assumption that 7 is the right number to reflect the difference!
The significance test, in this case for VIQ and PIQ, tells us that 7 is not the
best difference and it could well be 0. Consequently when the test of
significance indicates a non-significant finding you do not know what
difference value to look up – it certainly is not 7 and probably should be 0!
When we do find a significant difference as with VCI and WMI in the case of Mr.
Gould then the 21 point difference (which exceeds the critical difference of
9.67 necessary for significance) is assumed to be the best estimate of the
difference between these two measures, and we happily look up the frequency of
a difference of 21 between VCI and WMI and find that it is rare - 4.8%! What
this all leads to is a simple rule: If the difference is significant then look up
the frequency of the difference! If the difference is not significant DO NOT
look up the frequency of the difference – it is not abnormal! You will note
that despite this rule I have still supplied you with the frequencies for all
differences – this is to prevent you from assuming that all of the information I
provide is meaningful!
We now repeat this process for the Index scores of the
WAIS-III. Six comparisons can be made: VCI with POI, VCI with WMI, VCI with
PSI, POI with WMI, POI with PSI, and WMI with PSI. Essentially we are looking
for significant differences to indicate where performance levels differ and
then for those, AND ONLY THOSE, statistically significant differences we then
examine the frequency of the difference in order to detect abnormal
differences. Note that many of the comparisons are statistically significant,
indicating that this individual has distinctly different performance levels for
each of the cognitive domains assessed on the WAIS-III: verbal comprehension,
visual organization, attention/concentration, and speed of information
processing. Notice also that many of the comparisons with WMI and PSI are
associated with frequencies of less than 5% in the standardization sample. This
is revealing a consistent pattern – Mr. Gould’s attentional abilities (WMI) and
processing speed (PSI) are abnormally lower than his verbal (VCI) and visual
organisational abilities (POI).
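The two-step rule (test significance first, and look up the frequency only for significant differences) can be sketched for all seven comparisons. The critical values, observed differences, and frequencies are copied from the two tables above. The function name and the output labels are hypothetical, and the 5% cut-off is the abnormality criterion adopted in this discussion rather than a fixed standard.

```python
# Critical differences at p = .05 for the 25 to 29 age group.
CRITICAL = {
    "VIQ-PIQ": 8.27, "VCI-POI": 9.01, "VCI-WMI": 9.67, "POI-PSI": 12.66,
    "VCI-PSI": 12.26, "POI-WMI": 10.17, "WMI-PSI": 13.14,
}

# Observed difference and its frequency (%) in the standardization sample
# for individuals with FSIQ of 90 to 109.
OBSERVED = {
    "VIQ-PIQ": (7, 27.3), "VCI-POI": (7, 29.2), "VCI-WMI": (21, 4.8),
    "POI-PSI": (28, 2.3), "VCI-PSI": (35, 1.0), "POI-WMI": (14, 13.7),
    "WMI-PSI": (14, 16.3),
}

ABNORMAL_CUTOFF = 5.0  # frequency of 1 in 20 or less counts as abnormal


def evaluate(pair: str) -> str:
    """Apply the rule: significance first, frequency only if significant."""
    difference, frequency = OBSERVED[pair]
    if abs(difference) < CRITICAL[pair]:
        # Not significant: the frequency is deliberately not consulted.
        return "not significant"
    if frequency <= ABNORMAL_CUTOFF:
        return "significant and abnormal"
    return "significant but not abnormal"


for pair in OBSERVED:
    print(pair, "->", evaluate(pair))
```

Run on these values, the three comparisons flagged as both significant and abnormal are VCI with WMI, POI with PSI, and VCI with PSI. POI with WMI and WMI with PSI are significant but occur too often in the standardization sample to meet the 5% criterion.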
The next table, below, can be used to examine the
individual subtests for relative strengths and weaknesses. The reasoning behind
this analysis goes as follows. A relative strength is a performance on a
subtest that is comparatively higher than other subtests. Similarly a relative
weakness can be seen on those measures that are comparatively lower relative to
other measures. Each subtest is usually compared to the mean of subtests. The
question is how do we compute the mean? This goes back to the discussion
regarding the representativeness of FSIQ. Simply put, if there is no difference
between VIQ and PIQ (as in this case) then the average of all the subtests
administered can be used. If there is a difference between VIQ and PIQ then
separate means must be computed for verbal and performance subtests.
Differences Between Single Subtest Scaled Scores and Mean Scaled Score at the .05 Level of Statistical Significance and Magnitude of Difference Found in 5% of the Standardisation Sample
| Subtest | Verbal Subtests p<.05 | Verbal Subtests 5% | Performance Subtests p<.05 | Performance Subtests 5% | All Subtests p<.05 | All Subtests 5% |
| VO | 2.10 | 3.00 | | | 2.30 | 3.38 |
| SI | 2.77 | 3.29 | | | 3.12 | 3.69 |
| AR | 2.63 | 3.57 | | | 2.95 | 3.85 |
| DS | 2.40 | 4.43 | | | 2.67 | 4.62 |
| IN | 2.34 | 3.29 | | | 2.59 | 3.69 |
| CO | 2.96 | 3.57 | | | 3.35 | 3.58 |
| LNS | 3.16 | 4.29 | | | 3.60 | 4.38 |
| PC | | | 3.16 | 3.86 | 3.46 | 4.31 |
| CD | | | 3.04 | 4.29 | 3.31 | 4.46 |
| BD | | | 2.94 | 3.71 | 3.19 | 3.92 |
| MR | | | 2.60 | 3.71 | 2.75 | 3.85 |
| PA | | | 3.75 | 4.14 | 4.19 | 4.46 |
| SS | | | 3.54 | 3.86 | 3.93 | 4.23 |
The mean for the thirteen subtests is 10.6. We now compare each subtest with this overall mean. Subtracting the mean from each subtest score yields a pattern of positive and negative difference scores. A positive value means that the subtest is above the individual’s mean score, while a negative value indicates a lower performance. Comparing these differences with the values in the table above indicates which subtests differ significantly from the mean. Note that the two rightmost columns are used in this case because VIQ and PIQ do not differ significantly. The first thing you are looking for is whether or not each score differs significantly from its mean, indicated by an absolute difference greater than or equal to the cut-score in the table (we are only using column 6 here to detect significant differences). Of the measures that differ significantly from the mean, those with positive differences are “relative strengths (S)” and those with negative differences are “relative weaknesses (W)”. These findings can be used to describe where Mr. Gould’s strengths and weaknesses lie in terms of the behaviours assessed by each subtest. You can also examine column 7 to see how unusual or infrequent a significant difference is.
For example, the Information subtest has a scaled score of 16, which is 5.4 points above the mean of the subtests. This difference is significant, as 5.4 is greater than the critical value indicated for p<.05 in the table (2.59). This indicates that for Mr. Gould the Information subtest is a relative strength. Looking at column 7 we can see that 5% of the standardization sample had differences between Information and the mean of 3.69 or greater. From this we can infer that Mr. Gould’s difference of 5.4 is highly unusual and would not be expected to occur as a result of normal random variation.
| Verbal Subtests | SS | SS-Mn | S/W | Performance Subtests | SS | SS-Mn | S/W |
|---|---|---|---|---|---|---|---|
| Vocabulary | 13 | 2.4 | S | Picture Completion | 13 | 2.4 | |
| Similarities | 10 | -0.6 | | Digit Symbol-Coding | 7 | -3.6 | W |
| Arithmetic | 11 | 0.4 | | Block Design | 12 | 1.4 | |
| Digit Span | 9 | -1.6 | | Matrix Reasoning | 10 | -0.6 | |
| Information | 16 | 5.4 | S | Picture Arrangement | 11 | 0.4 | |
| Comprehension | 12 | 1.4 | | Symbol Search | 6 | -4.6 | W |
| Lett.-Num. Seq. | 8 | -2.6 | | | | | |
| Mean for All Subtests | 10.6 | | | | | | |
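If you want to check the arithmetic behind the S/W column, the steps can be sketched in a few lines of Python. This is only an illustration and not part of the WAIS-III scoring materials: the function and variable names are hypothetical, the abbreviations follow the critical-value table (CD is Digit Symbol-Coding), and the cut-scores are the "All Subtests" p<.05 values from column 6.

```python
# Scaled scores for Mr. Gould's thirteen subtests (first testing).
SCALED = {"VO": 13, "SI": 10, "AR": 11, "DS": 9, "IN": 16, "CO": 12, "LNS": 8,
          "PC": 13, "CD": 7, "BD": 12, "MR": 10, "PA": 11, "SS": 6}

# "All Subtests" critical values at p<.05 (column 6 of the critical-value table).
CRITICAL_05 = {"VO": 2.30, "SI": 3.12, "AR": 2.95, "DS": 2.67, "IN": 2.59,
               "CO": 3.35, "LNS": 3.60, "PC": 3.46, "CD": 3.31, "BD": 3.19,
               "MR": 2.75, "PA": 4.19, "SS": 3.93}


def relative_strengths_weaknesses(scaled, critical):
    """Return the mean (to one decimal) and, per subtest, (difference, flag)."""
    mean = round(sum(scaled.values()) / len(scaled), 1)
    results = {}
    for name, score in scaled.items():
        diff = round(score - mean, 1)
        if abs(diff) >= critical[name]:
            flag = "S" if diff > 0 else "W"  # significant: strength or weakness
        else:
            flag = ""                        # not significantly different
        results[name] = (diff, flag)
    return mean, results


mean, results = relative_strengths_weaknesses(SCALED, CRITICAL_05)
for name, (diff, flag) in results.items():
    print(f"{name:4s} {diff:+.1f} {flag}")
```

Note that Vocabulary and Picture Completion have the same difference (2.4) but only Vocabulary is flagged, because each subtest has its own cut-score.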
A note about the subtest scores. The table below reproduces the subtest scores provided for Mr. Gould. Each subtest is named and two numbers are provided. In the first column of numbers the acronym SS in this case stands for Scaled Score. Scaled scores have a mean of 10 and a standard deviation of 3 and are adjusted for the age of the examinee (usually termed age scaled scores). The second column of numbers indicates the percentile rank of the scaled score. For example, Mr. Gould’s score on the Vocabulary subtest was 13, which indicates that his knowledge of the meaning of words is as good as or better than 84% of people of his age.
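| Verbal Subtests | SS | Percentile | Performance Subtests | SS | Percentile |
|---|---|---|---|---|---|
| Vocabulary | 13 | 84 | Picture Completion | 13 | 84 |
| Arithmetic | 11 | 63 | Block Design | 12 | 75 |
| Digit Span | 9 | 37 | Matrix Reasoning | 10 | 50 |
| Information | 16 | 98 | Picture Arrangement | 11 | 63 |
| Comprehension | 12 | 75 | Symbol Search | 6 | 9 |
| Lett.-Num. Seq. | 8 | 25 | | | |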
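The link between a scaled score and its percentile rank can be approximated from the normal curve. The sketch below assumes that scaled scores are normally distributed with the mean of 10 and standard deviation of 3 described above. The published percentiles come from the test's norm tables, so treat this as an approximation rather than the official conversion. The function name is hypothetical.

```python
from statistics import NormalDist


def percentile_rank(score, mean=10.0, sd=3.0):
    """Approximate percentile rank of a score under a normal distribution."""
    return round(100 * NormalDist(mean, sd).cdf(score))


# Vocabulary (13) sits one standard deviation above the mean of 10.
print(percentile_rank(13))  # 84
print(percentile_rank(6))   # 9 (Symbol Search)
```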
The next table asks about the status
of Mr. Gould’s WAIS-III scores in a different way. While the previous analyses
have focused on the level of performance amongst his cognitive abilities, the
next analysis asks the question “Based on what we know about you, where should
your scores lie?” Based on certain types of information, the clinician attempts
to construct a pattern of hypothetical scores which represent what the person
would look like if they had no disorder or condition, or in other words what
the person would have looked like before the accident. You should immediately
recognize that if Mr. Gould had been tested on the WAIS-III and WMS-III before
his accident we would not need to do this at all. We would compare his
current scores directly with the earlier testing. Unfortunately for psychologists,
people do not routinely engage in psychological testing just in case they may
have an accident (although the idea certainly has some merit!). Such
circumstances are viewed as “pure gold” or “manna from heaven” when they occur,
but in the vast majority of cases we are forced to estimate what a person’s
scores would likely have been before the accident. This is called estimating
premorbid levels of functioning.
I have tested the client and need a
value from his past that will accurately tell me what he was like before his
difficulties. There are a number of ways of getting such values. The first is
to use information that is not related to the pathology such as demographic
characteristics of the individual. Such an approach asks the question “What
should your FSIQ be given that you are a male in your 20’s engaging in
university level education?” This approach estimates premorbid functioning using
demographic variables alone.
A second approach, the
one used in this case, combines demographics with a test that we believe is unlikely
to be affected by whatever is wrong with Mr. Gould. Although this is a current
measure we are using it to “guesstimate” what he would have looked like before
his difficulties. For this approach we use the Wechsler Test of Adult Reading
(WTAR) along with demographic variables to estimate Mr. Gould’s WAIS-III
performance (and later his WMS-III performance). So, Mr. Gould’s score on the
WTAR along with his gender, age, and educational level estimate that his VIQ
should be 113. His obtained score of 111 does not differ significantly from
this estimate (for a difference to be significant it would have to be 8.1 or
larger) indicating that his current VIQ falls where we would expect it to.
Remember you don’t need to look up frequencies for non-significant differences.
Accordingly only WMI and PSI are of interest. The 16 point difference between
Mr. Gould’s actual WMI score of 95 and
his predicted score of 111 is significant and somewhat rare, with differences
of this magnitude found in only 5% to 9% of the standardization sample. The 23
point difference between his actual PSI of 81 and his expected value of 104 is
significant and extremely unusual, occurring in only 1% of the standardization
sample. This analysis further supports the contention that Mr. Gould’s WMI and
PSI scores are abnormally low and are likely to reflect cognitive impairment.
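| Measure | Obtained | Predicted | Difference | p=.05 | Frequency |
|---|---|---|---|---|---|
| Full Scale | 108 | 112 | -4 | 7.4 | 25-49% |
| Verbal Comprehension | 116 | 111 | 5 | 8.7 | 75-90% |
| Perceptual Organisation | 109 | 109 | 0 | 8.4 | 50-74% |
| Processing Speed | 81 | 104 | -23 | 11.0 | 1% |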
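The decision rule used in this comparison is simple enough to write down. The sketch below takes an obtained score, the WTAR + demographics prediction, and the critical value, and reports the difference and whether it reaches significance ("8.1 or larger" in the VIQ example). The function name is hypothetical. The frequencies still have to be looked up in the normative tables, so they are not computed here.

```python
def compare_with_predicted(obtained, predicted, critical):
    """Return (obtained - predicted, True if the difference is significant)."""
    diff = obtained - predicted
    return diff, abs(diff) >= critical


# Values taken from the text and table above.
print(compare_with_predicted(111, 113, 8.1))   # VIQ: (-2, False)
print(compare_with_predicted(108, 112, 7.4))   # FSIQ: (-4, False)
print(compare_with_predicted(81, 104, 11.0))   # PSI: (-23, True)
```

Only the significant differences (here PSI) then need their frequencies looked up.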
The third edition of the Wechsler Memory Scale contains a number of subtests designed to assess both short-term and long-term memory functioning. The nomenclature of measures on this test can be confusing and mixes terms from classical and modern memory theory. Unlike the WAIS-III there is no one overall measure comparable to FSIQ, although there are measures similar to VIQ and PIQ. In this description of the test I will focus only on those measures necessary to generate the respective Index scores and will not include the optional subtests.
The WMS-III essentially measures three abilities: the ability to recall information shortly after its presentation (Immediate Memory), the ability to recall this same information after a 20 to 30 minute delay (General Memory), and the ability to attend and concentrate (Working Memory). Within each of the first two measures (Immediate and General Memory) there are subdivisions that are based upon whether or not the test is verbally or visually administered and whether or not the examinee had to recall or only recognize the information that was presented. In the case of this assignment these subdivisions are not relevant to either the analysis or interpretation. They will be discussed here for the sake of thoroughness.
The Immediate Memory Index consists of two verbal subtests (comprising Auditory Immediate) and two visual subtests (comprising Visual Immediate). The first verbal subtest, Logical Memory, involves the presentation of two stories which are then to be repeated back by the examinee in as much detail as possible. The second story is presented twice and recall is tested to get a gross measure of learning. The second verbal subtest, Verbal Paired Associates, presents 8 pairs of words which are read aloud to the examinee. Each word pair is an uncommon pairing of words (such as flower-paperclip) and recall is tested by the examiner supplying the first word of the pair (flower) and the examinee must supply the second word of the pair (paperclip). Four trials are administered with the examiner rereading the list of word pairs each time and testing cued recall.
The two visual subtests are Faces and Family Pictures. In
Faces, the examinee views 24 photographs of faces and is then asked to
determine which 24 out of a further 48 faces they have seen before. The Family
Pictures subtest introduces the examinee to seven characters (Grandmother,
grandfather, father, mother, son, daughter, and dog – I know, I know, blatant
discrimination against cat people!). The examinee is then shown four scenes in
which different family members are doing different things in different parts of
the picture (Maybe this is why they didn’t use a cat – the cat would be doing
the same thing in all four pictures –
nothing!). After seeing all four scenes, the examinee is required to indicate
who was in each picture, their location, and what they were doing.
The General Memory Index consists of the delayed recall
trials (administered approximately 20 to 30 minutes after the immediate recall)
of these same subtests. Auditory Delayed consists of delayed recall of the
Logical Memory stories and the Verbal Paired Associates. Visual Delayed
consists of the delayed trials of the Faces and Family Pictures tests. A third
measure Auditory Recognition Delayed consists of the recognition trials of
Logical Memory and Verbal Paired Associates that are presented after their
delayed recall. General Memory Index is therefore made up of Auditory Delayed,
Visual Delayed, and Auditory Recognition Delayed measures.
The Working Memory Index consists of the Letter-Number
Sequencing subtest of the WAIS-III and a visual span task called Spatial Span.
This task is very much like digit span except that rather than repeating
forwards or backwards number sequences read by the examiner, the examinee taps
out (forwards or backwards) a series of patterns tapped out by the examiner on
a board with ten blocks in various positions. One thing to note is that the
Letter-Number Sequencing score used in this Working Memory Index is not from a
second administration of the test but rather is exactly the same number as was
used in the WAIS-III. I know it seems insane, but the reasons are too complicated
to discuss here – suffice it to say that the WMI on the WAIS-III and the WMI on
the WMS-III are far from independent assessments of the same construct.
Mr. Gould’s scores on the WMS-III were 89 for the immediate recall of information, 77 for the recall of information after a 20 to 30 minute time delay, and 91 for his ability to attend and concentrate. These performances were as good as or better than 23%, 6%, and 27% of people of his age and corresponded to performances in the low average, below average, and average ranges respectively.
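| Index | Score | Percentile | Test 90% CI Lo | Test 90% CI Hi | Retest 90% CI Lo | Retest 90% CI Hi |
|---|---|---|---|---|---|---|
| Immediate Memory | 89 | 23 | 83 | 97 | 80 | 100 |
| General Memory | 77 | 6 | 72 | 86 | 69 | 89 |
| Working Memory | 91 | 27 | 84 | 100 | 80 | 105 |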
There are three comparisons that can be made here, the most important of which is the comparison of IM with GM. This addresses the issue of whether or not Mr. Gould’s memory performances are detrimentally affected by increasing the delay between presentation and recall. There is a 12 point difference between Mr. Gould’s immediate (IM) and delayed (GM) recall. This difference is significant (critical value of 11.8) and abnormal (estimated to occur in 4.4% of the standardization sample). This indicates that Mr. Gould’s ability to recall information after a time delay is abnormally poor relative to his ability to recall information when it is first presented. Of the comparisons of IM and GM with Working Memory (WM), only GM versus WM yields a significant difference (14 points against a critical value of 13.5). This difference, while significant, is not considered abnormal, as such discrepancies are found in almost 19% of the standardization sample.
Differences Between WMS-III Primary Index Scores for Statistical Significance at the .05 Level of Significance for the 25 to 29 Age Group
| | IM – GM | IM – WM | GM – WM |
|---|---|---|---|
| p=.05 | 11.8 | 13.5 | 13.5 |
Frequencies of Differences Between WMS-III Primary Index Scores
| Comparison | Difference | Frequency |
|---|---|---|
| IM – GM | 12 | 4.4% |
| IM – WM | -2 | 46.3% |
| GM – WM | -14 | 18.9% |
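The two-step logic applied to these index comparisons (first "is it significant?", then "is it unusual?") can be sketched as follows. The critical values and frequencies are copied from the tables above. The 5% cut-off for calling a difference abnormal is an assumption based on the criterion used in these notes, and the function and variable names are hypothetical.

```python
# First-testing WMS-III index scores reported in the text.
SCORES = {"IM": 89, "GM": 77, "WM": 91}

# Critical values (p = .05) and frequencies (%) for each pair, from the tables above.
CRITICAL = {("IM", "GM"): 11.8, ("IM", "WM"): 13.5, ("GM", "WM"): 13.5}
FREQUENCY = {("IM", "GM"): 4.4, ("IM", "WM"): 46.3, ("GM", "WM"): 18.9}


def index_discrepancies(scores, critical, frequency, rare_below=5.0):
    """For each pair return (difference, significant?, abnormal?)."""
    results = {}
    for pair, cutoff in critical.items():
        first, second = pair
        diff = scores[first] - scores[second]
        significant = abs(diff) >= cutoff
        # Frequencies only matter once a difference is significant.
        abnormal = significant and frequency[pair] < rare_below
        results[pair] = (diff, significant, abnormal)
    return results


discrepancies = index_discrepancies(SCORES, CRITICAL, FREQUENCY)
for pair, outcome in discrepancies.items():
    print(pair, outcome)
```

With these inputs, IM – GM is both significant and abnormal, GM – WM is significant but not abnormal, and IM – WM is neither.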
While these findings strongly indicate
impairment in delayed recall (GM) there is another method that we can use to
determine whether or not his immediate memory is where it should be. We can
address this issue in two ways: first, by asking whether or not Mr. Gould’s
WMS-III scores are normal for a man with his intellectual abilities (as
indicated by his FSIQ), and second, by asking what we would expect a man of his
age, education, and reading ability to obtain on the WMS-III. The tables below
(again altered to remove the unnecessary comparisons) address these questions.
Based upon Mr. Gould’s FSIQ of 108 we
would predict IM, GM, and WM scores of 105 each. He
actually obtained 89, 77, and 91 for these measures, yielding differences of 16,
28, and 14 between estimated and obtained scores respectively. Only the GM and
WM differences are significant and indicate an abnormally low GM (frequency of
1-2%) and a suspiciously low WM (frequency of 10%). This supports the conclusion
that Mr. Gould’s scores on delayed memory are abnormally low for a man of his
cognitive/intellectual abilities.
Comparisons of WAIS-III and WMS-III Composites Using Predicted Difference Method (Based Upon a FSIQ of 108)
| WMS-III Index | Predicted | Obtained | Difference | p=.05 | Frequency |
|---|---|---|---|---|---|
| Immediate Memory | 105 | 89 | 16 | 17.7 | 10-15% |
| General Memory | 105 | 77 | 28 | 16.9 | 1-2% |
| Working Memory | 105 | 91 | 14 | 9.1 | 10% |
The second approach uses the
WTAR+demographics to predict expected WMS-III scores. Based upon his WTAR, age,
gender, and education we expect him to have an IM score of 102, GM score of
104, and WM score of 108. Comparison with his actual scores of 89, 77, and 91
yields differences of -13, -27, and -17 respectively. While all three scores are
significantly lower than would be expected, only GM is clearly abnormal (by my
criteria with a frequency of 2-4%), while WM is suspiciously low (frequency of
5-9%).
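| WMS-III Index | Obtained | Predicted | Difference | p=.05 | Frequency |
|---|---|---|---|---|---|
| Immediate Memory | 89 | 102 | -13 | 9.0 | 10-24% |
| General Memory | 77 | 104 | -27 | 9.0 | 2-4% |
| Working Memory | 91 | 108 | -17 | 11.5 | 5-9% |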
We can now look
at the testing from eleven months later. This analysis is the most
straightforward but it requires you to again work with information from
multiple tables. Again for simplicity’s sake I will remove the irrelevant
measures.
Now this is where those retest confidence intervals from
the first testing come into the analysis. The retest confidence interval allows
us to determine whether or not scores on the second testing have deteriorated,
stayed the same, or improved. You will remember that this determination is
critical to our differentiating among the alternative hypotheses.
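| Measure | Score | Percentile |
|---|---|---|
| Verbal | 115 | 84 |
| Full Scale | 113 | 81 |
| POI | 118 | 88 |
| WMI | 99 | 47 |
| PSI | 88 | 21 |
| Immediate Memory | 98 | 45 |
| Working Memory | 96 | 39 |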
We will now generate a table that combines the information from our first testing and second testing. We need the 90% RETEST confidence bands from the FIRST testing, the actual scores from the SECOND testing and then consideration of where these scores fall relative to the retest bands. Scores that fall within the retest band are unchanged. Those below it, have declined, those above it have improved.
This analysis indicates that Verbal Comprehension, Perceptual Organisation, and General Memory have improved to a degree where it cannot be attributed to random variation in test scores.
| WAIS-III/WMS-III Measures | 90% Retest CI Lo | 90% Retest CI Hi | Retest Score | Status |
|---|---|---|---|---|
| Verbal IQ | 105 | 117 | 115 | Unchanged |
| Performance IQ | 95 | 112 | 110 | Unchanged |
| Full Scale IQ | 103 | 113 | 113 | Improved |
| VCI | 108 | 122 | 122 | Improved |
| POI | 99 | 117 | 118 | Improved |
| WMI | 87 | 104 | 99 | Unchanged |
| PSI | 72 | 95 | 88 | Unchanged |
| Immediate Memory | 80 | 100 | 98 | Unchanged |
| General Memory | 69 | 89 | 91 | Improved |
| Working Memory | 80 | 105 | 96 | Unchanged |
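The rule for reading the Status column can be sketched in a few lines. The function name is hypothetical, and one detail is an assumption: the sketch follows the rule as stated (within the band is unchanged), so a score sitting exactly on a band limit counts as inside the band. The table above labels the two scores that land exactly on the upper limit (Full Scale IQ and VCI) as improved, so how to treat boundary scores is a judgement call.

```python
def retest_status(score, lo, hi):
    """Classify a retest score against the 90% retest band from the first testing."""
    if score > hi:
        return "Improved"
    if score < lo:
        return "Declined"
    return "Unchanged"  # assumption: scores on a band limit count as inside


print(retest_status(91, 69, 89))   # General Memory: Improved
print(retest_status(99, 87, 104))  # WMI: Unchanged
```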
To better understand the implications of these changes, let’s take a
look at the retest WAIS-III and WMS-III
data.
| Measure | Score | Percentile |
|---|---|---|
| Full Scale | 113 | 81 |
| WMI | 99 | 47 |
| PSI | 88 | 21 |
VIQ for Mr. Gould
on retest is 115, and PIQ is 110. The difference between these two measures is
5 (115 - 110). The table of significant differences below indicates that an 8.27
difference would be significant at p=.05. As the 5 point difference is less
than this, we can assert that VIQ and PIQ do not differ significantly.
Differences Between IQ Scores and Between Index Scores Required for Statistical Significance at the .05 Level for the 25 to 29 Age Group
| | VIQ – PIQ | VCI – POI | VCI – WMI | POI – PSI | VCI – PSI | POI – WMI | WMI – PSI |
|---|---|---|---|---|---|---|---|
| p=.05 | 8.27 | 9.01 | 9.67 | 12.66 | 12.26 | 10.17 | 13.14 |
What about the
Index Scores?
| Comparison | Difference | Frequency |
|---|---|---|
| VCI – POI | 4 (ns) | 40.0% |
| VCI – WMI | 23 (sig) | 3.9% |
| POI – PSI | 30 (sig) | 2.5% |
| VCI – PSI | 34 (sig) | 1.5% |
| POI – WMI | 19 (sig) | 8.8% |
| WMI – PSI | 11 (ns) | 23.1% |
Clearly, Mr.
Gould’s attentional abilities (WMI) and processing speed (PSI) are abnormally
low relative to his verbal abilities. This ties in with the retest comparisons
in that his VCI improved from test to retest but his WMI and PSI did not. If
you think about it, this suggests that although we did not detect it in the
test data, there must have been some lower than normal scores in VCI that have
now recovered on retest (take a look at Similarities and remember Mr. Gould’s
concern at not being able to do this test as well as he expected). By the way,
there are analyses which would have picked this up in the test data, but I
thought you already had enough to deal with, without adding another three or
four analyses to the assignment. No, don’t thank me, it’s all part of the
service! Similarly, the POI – PSI discrepancy is abnormally large (found in only
2.5% of the standardization sample), indicating that Mr. Gould’s
processing speed (PSI) is still abnormally low relative to his other visual
abilities (POI). As with VCI, POI also improved between test and retest and for
similar reasons (take a look at Matrix Reasoning!).
So, we can tell a
number of things at this point. Initially Mr. Gould demonstrated problems with
his attentional abilities and processing speed. His verbal and perceptual
organizational abilities appeared to fall within the range that would be
expected for this man. Retest data indicate that his attentional abilities and
processing speed have not substantially improved although his verbal and visual
processing scores have (primarily due to improvement in the two abstract
reasoning/problem solving tasks).
What about
memory? The tables below indicate that there are now no significant differences
between Mr. Gould’s immediate recall (IM), delayed recall (GM), and attentional
abilities (WM). This suggests that Mr. Gould’s ability to retain information
over a thirty minute period is commensurate with his ability to retain it shortly after presentation. This is also
consistent with the test-retest finding that Mr. Gould’s delayed recall has
improved.
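| WMS-III Index | Score | Percentile |
|---|---|---|
| Immediate Memory | 98 | 45 |
| General Memory | 91 | 27 |
| Working Memory | 96 | 39 |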
Differences Between WMS-III Primary Index Scores for Statistical Significance at the .05 Level of Significance for the 25 to 29 Age Group
| | IM – GM | IM – WM | GM – WM |
|---|---|---|---|
| p=.05 | 11.8 | 13.5 | 13.5 |
Frequencies of Differences Between WMS-III Primary Index Scores
| Comparison | Difference | Frequency |
|---|---|---|
| IM – GM | 7 (ns) | 17.1% |
| IM – WM | 2 (ns) | 46.3% |
| GM – WM | -5 (ns) | 39.1% |
To understand how
his memory now is relative to his intellectual abilities we need to compare IQ
with memory scores. Note that because Mr. Gould’s FSIQ has gone up from 108 to
113 we must now use a different normative table:
Comparisons of WAIS-III and WMS-III Composites Using Predicted Difference Method (Based Upon a FSIQ of 113).
| WMS-III Index | Predicted | Obtained | Difference | p=.05 | Frequency |
|---|---|---|---|---|---|
| Immediate Memory | 107 | 98 | 9 | 17.7 | 20-25% |
| General Memory | 108 | 91 | 17 | 16.9 | 5-10% |
| Working Memory | 109 | 96 | 13 | 9.1 | 10-15% |
Notice how,
despite the improvement in GM from test to retest, it is still abnormally low
compared to his other intellectual abilities: the 17 point difference between
predicted and obtained GM exceeds the critical value of 16.9.
All that is left
now is to write an interpretation of this data. Unfortunately, since this is
what the assignment is about I cannot help you further. In thinking about the
data and Mr. Gould’s plight, consider what sort of question each analysis is capable
of addressing:
·
The percentiles of each of the subtests and indices permit you to
describe Mr. Gould’s current level of functioning.
·
The WAIS-III subtest analysis permits you to detect abilities or
performances which are particularly strong or weak for Mr. Gould.
·
The analysis of WAIS-III indices permits you to determine whether or
not his cognitive abilities are related in the way that they are in most
people, or whether there is something unusual about his pattern of abilities.
·
The analysis of WMS-III indices asks the same of the memory indices –
“Are they put together in the way that they are for most people?”
·
The analysis comparing WAIS-III with WMS-III permits you to determine
whether or not Mr. Gould’s memory abilities are consistent with his other
intellectual abilities.
·
The analyses involving the WTAR + demographics permit you to answer
the question “Where should Mr. Gould’s scores be?” Comparison of these values with those he actually obtained
provides insight as to which abilities may have been detrimentally affected
by his injuries.
·
The Retest Confidence Intervals permit you to examine which measures
have changed over the eleven month interval between tests.
·
The integration of all of these analyses will assist in constructing a
picture of not only what Mr. Gould’s cognitive abilities were like three months
after his motor vehicle accident and again eleven months later, but also how
these abilities may have changed over time. Recognition of these strengths and
weaknesses will be critical in advising Mr. Gould on his educational and career
aspirations.
This is a
demanding assignment because it asks you to grasp a lot of information to which
you have not previously been exposed. However, this is a good model of how 21st
century assessors go about analysing and interpreting psychological test data.
Before you feel too hard done by, consider the small number of variables that
you have to consider in this case from only two tests. In a routine assessment
in my own clinical practice more than twenty tests are customarily administered
each with many scales and subscales. The process you have learned here is
essentially the same as that applied to analyse twenty tests or two hundred
variables; it just becomes progressively more complicated with the addition of
more measures. The other warning I should give is that this case has been
structured and simplified to illustrate certain principles. Most psychological
assessments are highly complex and full closure is seldom achieved. The process
is the same as in this case, but the outcome is rarely as neat.
This concludes
the ancillary materials provided to assist you in completing assignment 2.
Enjoy!
Dr. Graeme Senior
Senior Lecturer
Department of Psychology
University of Southern Queensland