Three types of
information have been presented that you must process and learn how to utilize
in order to successfully complete this assignment. What follows is a written
version of the material presented in lectures and the residential schools
designed to achieve this goal of aiding you in mastering this material.
The three types
of information are the introductory background material for the case,
information regarding the tests that have been utilized in the assessment, and
the testing results along with the standardization tables necessary to
interpret them. I shall proceed in the order in which the information is
presented in the assignment. As the introduction to the assignment tells us:
“Dr. Alvarez is a
62-year-old academic who for the past few years has been concerned about his
failing memory.”
Okay, let’s start
here! What have we learned so far? In addition to learning that our client is a
male in his 60s with a high level of education, we are getting our first
indication of the reason for referral: memory problems. This does not mean
that he must necessarily have memory problems but rather that his perception of
his difficulties is related to memory functioning.
“He has had a long and distinguished career
as an academic and has taught English Literature in the Faculty of Arts at the
local university for the last thirty-two years.”
On the basis of
this statement I would have an expectation that his verbal abilities should be
at least above average.
“He has found
life to be considerably less enjoyable since his wife passed away just over
eighteen months ago, but has found some solace in his continued love of
teaching and the enjoyment he still gets from reading the great classics.”
This raises
issues and concerns such as bereavement and depression, but also suggests that
despite whatever grief he may be experiencing, work is still rewarding for him.
“Twelve years ago, Dr. Alvarez was the
primary caretaker of his ailing father who died of Alzheimer’s disease. He
recalls vividly that the first indication he and his wife had that anything was
wrong with his father was a persistent
and worsening problem with memory. His father had been an intelligent man who,
although poorly educated, had passed on his joy of reading to his son. Dr.
Alvarez watched while his father, a robust vital man with a deep appreciation
for life, deteriorated into a dependent, frail and confused man before his
death. Dr. Alvarez now reports many of the difficulties he observed in his
father and is of the opinion that he too may have Alzheimer’s disease.
It is for this
reason that he has been referred for neuropsychological assessment. Dr. Alvarez
was cooperative and appeared highly motivated to perform well. He found the
verbal subtests of the WAIS-III particularly easy and made no pronunciation
errors at all on the NART-2. You have not been provided with data from his
psychosocial assessment but the results of this testing indicated clinically
high levels of depression for which he is taking antidepressant medication.”
Ah-hah! The plot
thickens. Let’s think about what may be going on here. We do not engage in
speculation at this point with the intent of prejudging the case, but rather to
consider alternative hypotheses and ensure that we examine each appropriately.
Dr. Alvarez has been experiencing difficulties with his memory. Let’s consider
some possibilities.
1.
The first that comes to mind is that Dr. Alvarez’ abilities are
unimpaired. Many individuals experience, with increasing age, changes in their
life that they may misinterpret as cognitive impairment. After I have taught
the same unit more than three or four times I have a distinct feeling that
my audience has already heard all the things that I have to say. I find it
increasingly difficult to separate something that I have said many times before
from the groups that I have said it to. I compensate behaviourally by often
saying to my classes “stop me if I have already told you this”. It would not be
uncommon for individuals less aware of
how memory functions to mistake this for pathological forgetting and be
concerned about not being able to remember to whom they said what! This,
however, is a natural consequence of the increasing interference among memories that have been
laid down in the same way and in the same context. The more times I teach the
material the greater confusion I will have as to exactly where and when that
information was presented. Assessment of my memory should reveal no impairment
with my ability to learn novel material.
2.
The second possibility is that Dr. Alvarez has specialist knowledge
regarding Alzheimer’s disease that most people do not, having been his father’s
primary caretaker through his last few years. Dr. Alvarez may be accurately
recognizing the similarities between his father’s symptoms and his own. This
hypothesis essentially suggests that Dr. Alvarez has a dementing condition such
as Alzheimer’s disease.
3.
Related to the second hypothesis is the possibility that there are
little to no meaningful similarities between his father’s memory problems and
his own. This may be a case of a little knowledge being a bad thing. Dr.
Alvarez may be seeing similarities that are not relevant to his condition and
his fear and anxiety about ending up like his father may be causing the
difficulties he is experiencing.
4.
Another psychosocial issue we need to consider is the potential impact
of depression. He lost his beloved wife only a year-and-a-half ago and suffered
the witnessing of his father’s deterioration and eventual death. Depression
would certainly be an understandable reaction to these events. The case history
informs us that Dr. Alvarez is taking antidepressant medication, supporting the
presence of depressive symptomatology and raising another associated hypothesis.
5.
The medication Dr. Alvarez is taking to treat his depression could also
be impacting upon his cognitive functioning.
6.
The other major consideration that comes to mind is that of
Age-Associated Memory Impairment. This condition was discovered primarily
through longitudinal studies designed to detect early signs of Alzheimer’s
disease. These studies, by the way, also established that the old idea of
senility – cognitive deterioration as a normal consequence of advancing age –
is not the common lot of aged men and women. In these longitudinal studies
some people were found to demonstrate over time little change in their
cognitive status – normal aging. Some showed a decline in overall cognitive
functioning consistent with dementing disorders such as Alzheimer’s Disease. A
third group however demonstrated declines in memory functioning with increasing
age but no changes in other areas of cognitive functioning. This last group
seemed to show an age-related decline in memory only which has been subsequently
referred to as age-associated memory impairment. It should be stressed that
these individuals do not demonstrate the normal range effects of memory
associated with different age cohorts. Rather there is direct evidence for
impairment relative to others of their age but this seems to be restricted
solely to memory functioning.
So, why are we considering
these alternatives? They make nice stories (I particularly liked the one that
suggested that the examiner of this unit may be dementing!) but surely we need
evidence to determine which hypothesis is most likely! EXACTLY! The above
hypotheses serve little purpose if we cannot consider how we would expect them
to impact upon psychological tests. Before considering that, though, we should
review the last paragraph in the case introduction:
“Eighteen months
following the first assessment, Dr. Alvarez was assessed again using the
WAIS-III and WMS-III. Psychosocial testing at this time revealed a substantial
reduction in depressive symptomatology. This finding was consistent with his
physician’s observation of improvement in his mood and subsequent reduction in
dosage of his medication.”
Okay, let’s think about this. Eighteen months
later Dr. Alvarez’ depressive symptomatology has reduced, interesting! But of
even greater interest is that this corresponds to his physician’s observations
of improved mood and REDUCTION IN DOSAGE OF HIS MEDICATION. (I’m not sure but I
think this might be important later on!!).
Let us now consider the likely impact of the respective hypotheses on test performance. It is understood that at this point in your training you may feel ill-equipped to do this. We all have to start somewhere, and in this case I will be giving you lots of prompts and guides (No, no need to thank me! It comes from a deep-seated commitment on my part to . . . erh, hmmm! But I digress!). The hypotheses will be based solely around cognitive/intellectual (read WAIS-III) and memory (read WMS-III) assessments.
· Hypothesis 1 was that Dr. Alvarez’ abilities are unimpaired. If this is the case then we would expect that his first assessment results would fall in the normal range, be consistent with pre-morbid estimates, and that on retest there would be no change detected.
· Hypothesis 2 was that Dr. Alvarez has a dementing condition. In terms of psychological assessment of cognitive functioning, dementia is indicated where evidence exists for a significant decline in overall cognitive/intellectual functioning with specific cognitive deficits in at least one cognitive domain, usually memory. In this case we would expect that on initial testing there would be evidence for both memory and other cognitive/intellectual impairment and that on retest these scores would indicate further decline (There is much room for variation here but for the purposes of this assignment I am maximizing the differences between each of the hypotheses). If Dr. Alvarez is experiencing the ongoing effects of a dementing disorder, then as the disease progresses so too will his abilities decline.
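· Hypothesis 3 proposes, as with Hypothesis 1, that there is essentially no cognitive impairment. The predicted test results are therefore the same: normal at first testing and unchanged on retest.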
· Hypotheses 4 and 5 are related and propose that any cognitive deficits he may have are a result of his depression or the medication taken to treat it. If this is the case then we would expect to see impairment on the first testing that improves on the second testing. This improvement is predicted based upon the test data indicating a reduction in depressive symptomatology and the physician’s reduction in the amount of medication Dr. Alvarez is taking. If Dr. Alvarez’ difficulties are due to his depression and/or its treatment, then as this condition alleviates and medication is reduced his cognitive functioning should improve.
· Hypothesis 6 proposes that Dr. Alvarez is experiencing age-associated memory impairment (AAMI). This condition would predict that Dr. Alvarez’ overall cognitive/intellectual functioning would be normal on both first and second testing and that only memory deficits should be indicated in his profile which would in all likelihood continue to worsen by the second testing.
These hypotheses and their impact on WAIS-III and WMS-III are summarized in the table below. Again, please understand that there is tremendous variation in how these different conditions actually alter test performances. This is an idealized representation that is designed to match the data that you have been asked to analyse.
| Hypothesis | First Testing: WAIS-III | First Testing: WMS-III | Second Testing: WAIS-III | Second Testing: WMS-III |
|---|---|---|---|---|
| 1. Normal | Normal | Normal | Normal | Normal |
| 2. Dementia | Impaired | Impaired | More Impaired | More Impaired |
| 3. Anxiety | Normal | Normal | Normal | Normal |
| 4&5. Depression | Impaired | Impaired | Improved | Improved |
| 6. AAMI | Normal | Impaired | Normal | More Impaired |
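The idealized table can also be read as a lookup: each hypothesis predicts one pattern of four outcomes, and an observed pattern either matches it or does not. The following is a minimal sketch of that logic, not part of the assignment materials. The names `PREDICTIONS` and `consistent_hypotheses` are hypothetical, and the example pattern is made up for illustration and is not Dr. Alvarez's result.

```python
# Idealized predictions copied from the summary table.
# Each tuple is (WAIS-III first, WMS-III first, WAIS-III second, WMS-III second).
PREDICTIONS = {
    "1. Normal": ("Normal", "Normal", "Normal", "Normal"),
    "2. Dementia": ("Impaired", "Impaired", "More Impaired", "More Impaired"),
    "3. Anxiety": ("Normal", "Normal", "Normal", "Normal"),
    "4&5. Depression": ("Impaired", "Impaired", "Improved", "Improved"),
    "6. AAMI": ("Normal", "Impaired", "Normal", "More Impaired"),
}


def consistent_hypotheses(observed):
    """Return the hypotheses whose predicted pattern equals the observed one."""
    observed = tuple(observed)
    return [name for name, pattern in PREDICTIONS.items() if pattern == observed]


# Hypothetical observed pattern, for illustration only.
example = ("Normal", "Impaired", "Normal", "More Impaired")
print(consistent_hypotheses(example))
```

Note that Hypotheses 1 and 3 predict the same pattern, so WAIS-III and WMS-III results alone cannot separate them.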
Alright, now we’re getting down to the nitty-gritty. In the data presented to you there are direct or oblique references to four tests that were administered to Dr. Alvarez. The data for only two tests have been provided but we should, nonetheless, consider the implications of all four. As I have indicated in the assignment this would constitute a poor assessment if these were the only tests administered, but the data presented are sufficient to enable you to test the hypotheses generated and get a taste for the marvelous detective game that psychological assessment can be.
The Wechsler Adult Intelligence Scale – 3rd Edition (WAIS-III) and the Wechsler Memory Scale – 3rd Edition (WMS-III) have been administered and some of the test scores from the initial and follow-up assessments have been provided. The data from his psychosocial assessment (which you have not been provided with) came from a test called the Minnesota Multiphasic Personality Inventory – 2nd Edition, which includes 567 true-false questions that the respondent has to answer. The responses can be combined into more than 250 scales that can be examined for indications of psychopathology such as depression, anxiety, physical complaints, social isolation, etc. To make you learn this test would constitute cruel and unusual punishment (and is reserved for 4th year and Master’s students). Suffice it to say that this test revealed a large number of depressive symptoms, a finding corroborated by his ongoing treatment for depression. It is important to note that eighteen months later this test indicates a substantial reduction in the number of depressive symptoms, which again is corroborated by independent medical evaluation and a reduction in antidepressant medication.
The second test referred to is the NART-2. Examination of this measure is not strictly necessary to complete the assignment, but this measure would be commonly employed in cases of this type. The NART-2 consists of irregular words whose pronunciation does not conform to standard rules of English pronunciation. For example, the word “drachm” would likely be pronounced as “drak-him” by those unfamiliar with the word. Those who have been exposed to it know how to pronounce it, “dram”, albeit often with a somewhat inebriated slur. Thus performance on the test is a measure of the degree to which you have been exposed to infrequent and irregular words in the English language, something which is highly correlated with the number of years of formal education one has had.
Some of the original studies with this measure have indicated that, perhaps with the exception of severe impairment, declines in cognitive functioning are NOT accompanied by declines in performance on this test. Many clinicians, therefore, use the measure as a way of examining word knowledge with the expectation that whatever problem you may have has had little impact upon test performance. This is a roundabout (but technically accurate) way of saying that it is often used as an estimate of your abilities before you began suffering from your current difficulty. An individual hypothesized to be currently experiencing a decline in cognitive functioning could be so recognized by comparing where they should be, based upon their NART-2 score, with where they are now! More about this later!
THE WAIS-III
The WAIS-III is a cognitive/intellectual battery of tests
designed to assess a variety of cognitive domains. Most notable in the battery
is the omission of tests that evaluate memory beyond that of short-term recall.
The battery consists of 14 subtests, only 13 of which are customarily administered.
The Object Assembly subtest has been included in the third edition of the
battery to maintain consistency with its predecessor (the WAIS-R) and to permit
the substitution of this test for a spoiled Block Design performance. In the
case that you will be analyzing no data for Object Assembly has been provided.
The nature of the subtests will be discussed shortly, but
analysis of the WAIS-III focuses primarily around composite scores. Composite
scores are combinations of subtests according to a particular theoretical
framework. The two types of composites on the WAIS-III are IQ scores (which are
based upon an historical framework) and Index scores (which are based upon an
empirical framework). One of the things that can be somewhat confusing about the
WAIS-III is that subtests are combined in ways that are similar but not
identical for the two frameworks described. Let’s consider the IQ framework
first.
David Wechsler, the developer of the Wechsler batteries, conceived of his test as a measure of overall cognitive/intellectual ability in the form of a single score, FSIQ, that included all of the subtests on the battery. This FSIQ could be divided into two sub-categories, VIQ (which encompassed those measures that were administered verbally and required verbal responses) and PIQ (which encompassed those measures that were administered visually and usually required a written, pointing, or object-manipulation response). So essentially, FSIQ represents an overall indication of performance and VIQ and PIQ are subdivisions that relate to the mode of input and output of information. This framework has been retained in the WAIS-III with IQ scores computed from the measures that have been traditionally used for this purpose.
Factor analysis of the WAIS-III, however, reveals that rather than a dichotomous verbal and visual structure to the battery, four abilities or constructs are actually measured by the test. This factor structure forms the basis of the four Index scores. VCI (Verbal Comprehension) consists of those subtests that best measure verbal comprehension and expression. POI (Perceptual Organization) consists of those subtests that best measure an individual’s ability to process complex visual information and solve problems – in some ways a non-verbal reasoning measure. WMI (Working Memory) consists of those subtests that best measure attentional abilities and the degree to which an individual can efficiently perform mental operations. PSI (Processing Speed) consists of the subtests that best measure speeded visual information processing - these measures all involve scanning of visual information and the rapid writing of responses on a page.
As you are well aware, the more items that are included on a scale the higher the reliability is likely to be. Examination of the numbers of subtests that are included in each of the IQ and Index scores (in the table below) would appropriately suggest that higher internal consistency is found in composites that contain more subtests. Our approach to analyzing psychological test data is to proceed in a hierarchical fashion from the most reliable measures and work our way down to the least reliable measures. This means beginning with FSIQ, proceeding to VIQ and PIQ, then to VCI, POI, WMI, and PSI, and finally to the individual subtest level should this prove necessary.
STRUCTURE OF WAIS-III COMPOSITE SCORES
No. of Subtests: 14
VIQ Subtests (Number of Items):
· Information (28)
· Digit Span (15)
· Vocabulary (33)
· Arithmetic (20)
· Comprehension (18)
· Similarities (19)
PIQ Subtests:
· Picture Completion (25)
· Picture Arrangement (11)
· Block Design (14)
· Digit Symbol-Coding (133)
· Matrix Reasoning (26)
FSIQ = VIQ subtests + PIQ subtests
Factor Indices:
Verbal Comprehension
· Information (28)
· Vocabulary (33)
· Similarities (19)
Perceptual Organisation
· Picture Completion (25)
· Block Design (14)
· Matrix Reasoning (26)
Working Memory
· Digit Span (15)
· Arithmetic (20)
· Letter-Number Sequencing (7)
Processing Speed
· Digit Symbol-Coding (133)
· Symbol Search (60)
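Because the IQ and Index frameworks overlap without being identical, it can help to write the structure out explicitly and let the computer do the comparison. The sketch below encodes the memberships listed above; the names `COMPOSITES`, `index_only_subtests`, and `iq_only_subtests` are illustrative assumptions and not official WAIS-III terminology.

```python
# Composite membership as listed in the structure table.
VIQ = ["Information", "Digit Span", "Vocabulary", "Arithmetic",
       "Comprehension", "Similarities"]
PIQ = ["Picture Completion", "Picture Arrangement", "Block Design",
       "Digit Symbol-Coding", "Matrix Reasoning"]

COMPOSITES = {
    "FSIQ": VIQ + PIQ,
    "VIQ": VIQ,
    "PIQ": PIQ,
    "VCI": ["Information", "Vocabulary", "Similarities"],
    "POI": ["Picture Completion", "Block Design", "Matrix Reasoning"],
    "WMI": ["Digit Span", "Arithmetic", "Letter-Number Sequencing"],
    "PSI": ["Digit Symbol-Coding", "Symbol Search"],
}
INDEX_NAMES = ("VCI", "POI", "WMI", "PSI")


def _index_subtests():
    """All subtests that contribute to at least one factor index."""
    return {s for name in INDEX_NAMES for s in COMPOSITES[name]}


def index_only_subtests():
    """Subtests that feed an index score but no IQ score."""
    return sorted(_index_subtests() - set(COMPOSITES["FSIQ"]))


def iq_only_subtests():
    """Subtests that feed an IQ score but no index score."""
    return sorted(set(COMPOSITES["FSIQ"]) - _index_subtests())


# Number of subtests per composite, in the hierarchical order of interpretation.
for name in ("FSIQ", "VIQ", "PIQ") + INDEX_NAMES:
    print(name, len(COMPOSITES[name]))
print("Index only:", index_only_subtests())
print("IQ only:", iq_only_subtests())
```

The subtest counts fall from FSIQ through VIQ and PIQ to the indices, which is the ordering used when working from the most reliable to the least reliable scores.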
The following is a general
description of each of the WAIS-III subtests:
Vocabulary
This subtest presents 33 words orally to the test-taker and requires them
to supply a dictionary style definition. It is the most reliable subtest in the
scale and is the best measure of g (69% of its variance). It contributes to
FSIQ, VIQ, and VCI composites and is generally considered a measure of word
knowledge. This measure is heavily influenced by formal education and literacy.
Similarities
This subtest consists of 19 word
pairs. The test-taker’s task is to indicate how the two words are similar. More
points are awarded for a more abstract relationship. It contributes to
FSIQ, VIQ, and VCI composites and is
generally considered to be a measure of relational word knowledge. This measure
is strongly influenced by education and literacy.
Arithmetic
This subtest contains 20 items
requiring progressively more demanding mental arithmetic. Factor analytic
studies indicate that, despite the arithmetic content, this is most commonly
a measure of attention or working memory. This does not mean that people with
dyscalculia or specific learning disability in arithmetic will not perform
poorly on this measure. The content of the subtest requires mastery of
relatively simple mathematical procedures such as percentages, averages, and
probability. Consequently the influence of education on this subtest is greater
than for the other measures that contribute to the Working Memory Index. This
measure is included in the computation of FSIQ, VIQ, and WMI composites.
Digit Span
This subtest also contributes to
FSIQ, VIQ and WMI composites. The Digit Span task is the prototypical immediate
memory or attentional task. On Digits Forwards, test-takers are required to
repeat up to eight-digit sequences back in correct order as they were
presented. The Digits Backwards task requires the digit sequences to be
repeated back in reverse order. It is this second task that is, perhaps, more
appropriately termed working memory as mental juggling of the number sequence
is required to successfully complete the task. Characteristically, individuals
recall on average 2 more digits forwards than backwards. Less than 4% of the
standardization sample recalled more digits backwards than forwards.
Information
This 28 item subtest measures
general knowledge through a broad range of questions about science, literature,
geography, and historical events. This measure is highly correlated with
educational achievement. This subtest contributes to FSIQ, VIQ, and VCI.
Comprehension
This 18 item subtest asks questions about social knowledge and awareness
of socially appropriate behaviours and responses. A critical issue with this test
is that it does not ask what you would do in a particular situation but rather
what should you do. Similarly it asks not why you think something is so, but
what we are taught are the reasons behind issues such as taxation, or the
importance of a free press. This subtest contributes to FSIQ and VIQ. It does
not contribute to any factor indices.
Letter-Number Sequencing
This
subtest contains seven items with three trials for each item. In some ways it
is similar to Digits Forwards from the Digit Span subtest. In each trial the
examinee is read a series of numbers and letters that have been placed in a
random order. The examinee is required to reorder the numbers and letters and
repeat them back in the correct ascending sequence with numbers first followed by
letters. This subtest contributes only to the WMI composite. It does not
contribute to any IQ scores.
Picture Completion
This
subtest consists of 25 colour drawings of objects, people, and scenes where an
important element is missing. The examinee is required to indicate what
important element is missing from the picture. This subtest contributes to
FSIQ, PIQ, and POI composites.
Digit Symbol-Coding
This
subtest consists of a maximum of 133 items. The examinee is presented with a
table containing the numbers 1 through 9 and symbols (simple line drawings)
that are associated with each number. A template which contains 133 numbers (1
through 9) in a random sequence where the numbers are presented but the
associated symbols have been omitted is presented to the examinee. Beginning
with the first item, the examinee must fill in, in sequence, as many of the symbols that go with
each number as possible in a two-minute period. This subtest contributes to
FSIQ, PIQ, and PSI composites.
Block Design
This
subtest utilizes up to 9 blocks each with 2 red surfaces, 2 white surfaces, and
2 half red/half white surfaces. These blocks are employed by the examinee to
replicate a design presented as a two-dimensional picture. Examinees manipulate
the blocks and put them together in such a way so as to produce the same design
with the top surfaces. The designs become progressively more complex and go
from requiring 4 blocks to all 9 blocks in order to replicate the design. This
subtest contributes to FSIQ, PIQ, and POI composites.
Picture Arrangement
This
subtest consists of 11 series of line drawings that when placed in the correct
order tell a story (allegedly humorous, but clients seldom laugh). The drawings
are laid out in front of the examinee in an incorrect order and the examinee’s
task is to reorder them in the correct arrangement. This subtest contributes to
FSIQ and PIQ composites. It does not contribute to any factor indices.
Symbol Search
This
subtest was taken directly from the WISC-III and requires the examinee to look
at two symbols and determine whether either symbol is present in a sequence of
five symbols. There are 60 items each with two target and five test symbols.
The examinee is required to verify the presence or absence of the symbols in as
many items as they can in two minutes. This subtest contributes only to the PSI
composite. It does not contribute to any IQ scores.
When interpreting
WAIS-III, WMS-III, or any psychological test data for that matter we proceed
from the most reliable measures to the least reliable measures. This approach
protects the clinician from being biased by an interesting, unusual, abnormal
BUT unreliable finding. The most reliable measures are less prone to random
variations and our decisions are more likely to be accurate if we base our
interpretations on them. Now it’s time for me to get on to my soapbox. I will
make a statement now that will seem radical but is eminently defensible through
logic and basic psychometric principles. Measures of intelligence have no
meaning in a clinical evaluation. This is not because the concept of
intelligence does not exist (although I have seen little evidence to support it
in my career) but rather that the measurement of intelligence necessarily
presupposes that the individual being assessed is normal! Said another way,
when you know that someone is normal then a test of intelligence may indicate
that individual’s ranking relative to other normal individuals. This is not,
however, the case when we use these tests in the clinical setting. Clinically,
the Wechsler scales utilize tests of different cognitive abilities that are of
interest to the clinician. In this context, scores such as Full Scale
Intelligence Quotient (FSIQ), Verbal IQ (VIQ), and Performance IQ (PIQ) cannot
be construed as measuring intelligence in any way. To believe so would be to
infer that a blind person is of low intelligence (VIQ = 100, PIQ = 47, FSIQ
= 74) (when it’s just that they can’t see any of the Performance subtests), a
deaf person is of low intelligence (VIQ = 48, PIQ = 100, FSIQ = 68) (when it’s
just that they can’t hear the Verbal subtests), and a dead person is just
profoundly intellectually impaired (VIQ = 48, PIQ = 47, FSIQ = 45). This last
little gem is a consequence of the fact that, although the dead individual
makes no responses to any of the questions and so receives a raw score of 0 on every
subtest, the scaled score associated with a raw score of 0 is 1 (surprisingly
not 0). With 11 subtests constituting FSIQ, a dead person (or your favourite
piece of lint if you prefer) receives a sum of scaled scores of 11 which
corresponds to a FSIQ of 45.
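The arithmetic behind the floor effect is short enough to write out. This is only a sketch of the sum described above; the constant and function names are hypothetical, and the final conversion from a sum of scaled scores of 11 to a FSIQ of 45 comes from the test's norms tables as quoted in the text, so it is not computed here.

```python
# Per the text: a raw score of 0 still converts to a scaled score of 1.
MIN_SCALED_SCORE = 1
# Per the text: 11 subtests contribute to FSIQ.
FSIQ_SUBTEST_COUNT = 11


def floor_sum_of_scaled_scores(n_subtests=FSIQ_SUBTEST_COUNT,
                               floor=MIN_SCALED_SCORE):
    """Lowest possible sum of scaled scores when every subtest is at the floor."""
    return n_subtests * floor


print(floor_sum_of_scaled_scores())
```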
So, having convinced
you of the evils of the pragmatics of intelligence theory, what do FSIQ, VIQ,
and PIQ mean? Good question. In the context of clinical assessment FSIQ
represents overall performance on the majority of subtests. It functions
essentially as a grand mean. It is of value to us because it has the highest
reliability and has the potential for representing overall performance on the
test. However, just as a grand mean may not actually be representative, we look
to subgrouping of tests to determine whether the FSIQ is representative or not.
VIQ and PIQ divide the subtests into two groupings based upon their input and
output modalities. VIQ subtests are all administered verbally by the tester and
answered verbally by the test-taker. PIQ subtests are all administered using
visual stimuli and the test-taker can respond by pointing, writing, or
manipulating objects. Two PIQ subtests may also involve giving verbal responses
(Picture Completion and Picture Arrangement) but the scoring of each of these
tests does not rely on a verbal response. These subgroupings are historical
rather than empirical. By this I mean that the existence of VIQ and PIQ as
scores was based upon David Wechsler’s belief that these divisions would be
meaningful and not based upon research findings that support this distinction.
Empirical studies (factor analyses) of the WAIS-III in fact support four
composites or indices and these form the true basis of interpreting the
WAIS-III in the clinical setting. What is lucky for us is that the four
constructs, in principle, reflect a subgrouping of each of VIQ and PIQ into two
smaller divisions. I say “in principle” because in each case one of the subtests
omitted from the IQ score is included in an index score (Letter-Number
Sequencing is omitted from VIQ but is part of WMI, and Symbol Search is omitted
from PIQ but is part of PSI).
So in proceeding from
the most reliable to the least reliable measures we begin with FSIQ, proceed to
VIQ and PIQ, and then to the Index scores VCI, POI, WMI, and PSI. After, and
only after, that you can proceed to consider individual subtests if they
address relevant hypotheses or in order to examine the integrity of index
scores.
The last two columns contain
the 90% confidence interval for a score on that measure at retest. The Lo and
Hi columns indicate the lower and upper limits of the range. So we can be 90%
sure that because Dr. Alvarez got a FSIQ of 138 at his initial testing that
upon retest it will fall somewhere between 132 and 142. The implication of this
is that if we tested Dr. Alvarez some time later and he got a FSIQ of less than
132 or more than 142 we would have direct evidence that his score has changed
(declined if less than 132, or improved if greater than 142). As you can see
these last two columns would only be used if a second testing had been
performed – more on this later!
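| Scale | Score | Percentile | Test 90% CI Lo | Test 90% CI Hi | Retest 90% CI Lo | Retest 90% CI Hi |
|---|---|---|---|---|---|---|
| Verbal | 144 | 99.8 | 139 | 147 | 137 | 149 |
| Performance | 121 | 92 | 114 | 125 | 111 | 128 |
| Full Scale | 138 | 99 | 134 | 141 | 132 | 142 |
| Indices | | | | | | |
| POI | 123 | 94 | 115 | 127 | 112 | 130 |
| WMI | 117 | 87 | 110 | 122 | 108 | 124 |
| PSI | 99 | 47 | 92 | 107 | 87 | 111 |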
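The retest interval works as a simple decision rule: a retest score below the lower limit suggests decline, one above the upper limit suggests improvement, and anything in between gives no evidence of change. Here is a minimal sketch using the retest limits from the table. The function name `classify_retest` and the retest scores passed to it are hypothetical and chosen only to show the three outcomes.

```python
# Retest 90% confidence limits (Lo, Hi) taken from the table above.
RETEST_CI = {
    "VIQ": (137, 149),
    "PIQ": (111, 128),
    "FSIQ": (132, 142),
    "POI": (112, 130),
    "WMI": (108, 124),
    "PSI": (87, 111),
}


def classify_retest(scale, retest_score):
    """Compare a retest score with the 90% retest interval for that scale."""
    lo, hi = RETEST_CI[scale]
    if retest_score < lo:
        return "declined"
    if retest_score > hi:
        return "improved"
    return "no reliable change"


# Hypothetical retest scores, for illustration only.
for score in (130, 140, 145):
    print("FSIQ", score, classify_retest("FSIQ", score))
```

Scores exactly at a limit are treated as unchanged, following the wording "less than 132 or more than 142".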
So, with the
information above we can describe Dr. Alvarez’ performances on the three IQ
measures and four index scores of the WAIS-III indicated in the table above.
Remember that only other psychologists are going to know all this jargon, so
rather than referring to VCI you would talk about a measure of his verbal
comprehension and expressive abilities. Rather than WMI you would say something
like a measure of attention, concentration, and the ability to efficiently
perform mental operations.
Now the next
topic relates to the detection of abnormality in the profile of scores and uses
the two tables below. We have already discussed the idea of FSIQ reflecting Dr.
Alvarez’ overall level of functioning. To test this idea we need to consider
that FSIQ may not be made up of 11 homogeneous test scores but rather
systematic differences that average out to the FSIQ score. The first way of
addressing this is to examine the subsets of FSIQ, VIQ and PIQ. If VIQ and PIQ
differ significantly, then FSIQ does not represent a homogeneous level of
functioning. VIQ for Dr. Alvarez is 144, and PIQ is 121. The difference between
these two measures is 144 – 121 = 23. If we consult the table of significant
differences between WAIS-III composite scores for Dr. Alvarez’ age group we
find that a 7.90 difference would be significant at p=.05. The 23 point
difference is much larger than this so we can confidently assert that VIQ and
PIQ differ significantly.
Differences Between IQ Scores and Between Index Scores Required for Statistical Significance at the .05 Level of Significance for the 55 to 64 Age Group
Level | VIQ – PIQ | VCI – POI | VCI – WMI | POI – PSI | VCI – PSI | POI – WMI | WMI – PSI
p=.05 | 7.90 | 8.54 | 9.08 | 11.53 | 10.91 | 9.81 | 11.93
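The significance check just described is nothing more than comparing an absolute difference with a tabled critical value. A minimal Python sketch, assuming the critical values from the table above; the dictionary and function names are illustrative only and do not come from any WAIS-III scoring software, and treating a difference exactly equal to the critical value as significant is my assumption:

```python
# Critical values (p = .05, ages 55 to 64) copied from the table above.
CRITICAL_05 = {
    "VIQ-PIQ": 7.90,
    "VCI-POI": 8.54,
    "VCI-WMI": 9.08,
    "POI-PSI": 11.53,
    "VCI-PSI": 10.91,
    "POI-WMI": 9.81,
    "WMI-PSI": 11.93,
}


def is_significant(comparison, difference):
    """True when the absolute difference meets or exceeds the critical value."""
    return abs(difference) >= CRITICAL_05[comparison]


viq, piq = 144, 121
print(is_significant("VIQ-PIQ", viq - piq))  # 23 points against 7.90 -> True
print(is_significant("POI-WMI", 6))          # 6 points against 9.81 -> False
```

Note that the sign of the difference is ignored for the significance decision; the direction only matters afterwards, when you describe which ability is the higher one.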
But what does
this mean? A test of significance is testing the likelihood that the two scores
came from the same distribution. Do not be swayed by the apparent size of the
difference. The significance table for the different WAIS-III
composites shows critical values ranging from 7.90 to 11.93. These values
differ because of differences in the reliability of the measures involved.
The more reliable two measures are, and the more intercorrelated they are,
the smaller the difference needed for significance. Just remember that good
reliability results in things being more easily detected and poor reliability
results in things being much harder to detect. Back to our original question
here, what does it mean that VIQ and PIQ are significantly different? The
easiest way to understand this is to turn it around. What is the null
hypothesis? That VIQ and PIQ are identical (no difference between them). A
significant difference means we reject the null and infer that the two numbers
are not the same. Nothing more, nothing less. Knowing that VIQ is significantly
different from PIQ and knowing that VIQ is larger than PIQ means that we can
infer that Dr. Alvarez has better verbal abilities than visual/graphomotor
abilities (remember VIQ and PIQ reflect the modalities of the tests).
Disappointed?
Hoping for more? What you probably wanted to know was whether or not the
difference is clinically meaningful! As a rule this is addressed not by
significance but by abnormality. We address
abnormality by doing a “head-count”. This is the testing equivalent of asking
“OK! Hands up all those people who did …”. How common is a 23 point difference
between VIQ and PIQ in a person who has a FSIQ in the very superior range?
That question can be answered by the table
below – 4% of the standardization sample.
Frequencies of Differences Between WAIS-III IQ and Index Scores for Individuals with FSIQ > 120
Comparison | Difference | Frequency
VIQ – PIQ | 23 | 4.0%
VCI – POI | 27 | 2.9%
VCI – WMI | 33 | 0.8%
POI – PSI | 24 | 6.9%
VCI – PSI | 51 | <1.3%
POI – WMI | 6 | 34.3%
WMI – PSI | 21 | 12.6%
This raises the
next big question – how rare is rare? All clinicians who use psychological
tests have to ultimately make a decision about this. I can tell you what I do
and why. Other decisions are not wrong but like anything else in life there are
consequences to what we decide. For tests of significance I use p<.05 as the
standard for a statistically significant difference with a two-tailed test. For
abnormality, I consider that anything that occurs with a frequency of 1 in 20
or less (5%) is sufficiently rare to call the behaviour abnormal. I will
confess, however, that findings with a frequency of 6-10% are of particular
interest to me. I am not permitted to change my criterion when I have a
behaviour that occurs with a frequency between 6 to 10%, but what I can do is
keep it in the back of my mind and pay particular attention to any opportunities
that may arise to test the hypothesis (i.e. is it abnormal or normal). One
other comment here, abnormality refers only to infrequency or rarity of the
behaviour, it does not tell you whether or not the behaviour is impaired. For
example, an accountant is likely to be highly skilled at mental arithmetic so
his score on the WAIS-III Arithmetic subtest is likely to be abnormally higher
than other scores. This abnormality will be detected during data analysis, but
does not signal impairment. Some abnormalities are good – we call them skills;
some abnormalities are bad – we call them deficits. You will need to determine
when an abnormality is signaling a skill versus an impairment. The final
criterion is the confidence interval applied to test scores. I use 90%
confidence because I am happy to have an error rate of 5% at either end of the
distribution. Too lax a criterion (68%) will result in too narrow a band of
scores while too strict a criterion (99%) will result in a band that is too
wide to be of use (i.e. I can be 100% sure that the score you got on the test
was one of the scores that you can get on the test! Well, duh!).
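The abnormality rule I have just described can be written down in a few lines. This is only a sketch of my own decision rule, not a published standard: the function name and labels are made up for illustration, and I have assumed that anything above 5% and up to 10% goes on the "keep an eye on it" list.

```python
def classify_frequency(percent):
    """Label a base rate (percentage of the standardization sample).

    Cut-offs follow the rule described above: 5% or less is abnormal;
    anything above 5% and up to 10% is not abnormal but is worth watching
    (the exact boundaries of this band are an assumption).
    """
    if percent <= 5.0:
        return "abnormal"
    if percent <= 10.0:
        return "watch"
    return "common"


# Frequencies taken from the WAIS-III frequency table earlier in this handout.
print(classify_frequency(4.0))   # VIQ - PIQ -> abnormal
print(classify_frequency(6.9))   # POI - PSI -> watch
print(classify_frequency(34.3))  # POI - WMI -> common
```

Remember that "abnormal" here means only "rare"; the function says nothing about whether the rare finding is a skill or a deficit.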
So we have a significant and abnormal difference between
VIQ and PIQ. VIQ is better than the 99th %ile and PIQ is at the 92nd
%ile. Does this look like impairment? We would not really be surprised that an
English professor would have abnormally high verbal abilities.
We now repeat this process for the Index scores of the
WAIS-III. Six comparisons can be made: VCI with POI, VCI with WMI, VCI with
PSI, POI with WMI, POI with PSI, and WMI with PSI. Essentially we are looking
for significant differences to indicate where performance levels differ and
then for those statistically significant differences we then examine the
frequency of the difference in order to detect abnormal differences. Note that
all but one of the comparisons (POI with WMI) are statistically significant,
again indicating that this individual has distinctly different performance
levels for each of the cognitive domains assessed on the WAIS-III: verbal
comprehension, visual organization, attention/concentration, and speed of
information processing. Notice also that only comparisons with VCI are
associated with frequencies of less than 5% in the standardization sample. This
is revealing a consistent pattern – the only thing abnormal about Dr. Alvarez’
cognitive performances on the WAIS-III is his extraordinarily high
verbal ability, which is consistent with his occupation.
The next table, below, can be used to examine the
individual subtests for relative strengths and weaknesses. The reasoning behind
this analysis goes as follows. A relative strength is a performance on a
subtest that is comparatively higher than other subtests. Similarly a relative
weakness is a performance on a subtest that is comparatively lower than
other subtests. Each subtest is usually compared to the mean of the subtests. The
question is how do we compute the mean? This goes back to the discussion
regarding the representativeness of FSIQ. Simply put, if there is no difference
between VIQ and PIQ then the average of all the subtests administered can be
used. If there is a difference between VIQ and PIQ (as there is in this case)
then separate means must be computed for verbal and performance subtests.
Differences Between Single Subtest Scaled Scores and Mean Scaled Score at the .05 Level of Statistical Significance and Magnitude of Difference Found in 5% of the Standardisation Sample
Subtest | Verbal Subtests p<.05 | Verbal Subtests 5% | Performance Subtests p<.05 | Performance Subtests 5% | All Subtests p<.05 | All Subtests 5%
VO | 2.10 | 3.00 | | | 2.30 | 3.38
SI | 2.77 | 3.29 | | | 3.12 | 3.69
AR | 2.63 | 3.57 | | | 2.95 | 3.85
DS | 2.40 | 4.43 | | | 2.67 | 4.62
IN | 2.34 | 3.29 | | | 2.59 | 3.69
CO | 2.96 | 3.57 | | | 3.35 | 3.58
LNS | 3.16 | 4.29 | | | 3.60 | 4.38
PC | | | 3.16 | 3.86 | 3.46 | 4.31
CD | | | 3.04 | 4.29 | 3.31 | 4.46
BD | | | 2.94 | 3.71 | 3.19 | 3.92
MR | | | 2.60 | 3.71 | 2.75 | 3.85
PA | | | 3.75 | 4.14 | 4.19 | 4.46
SS | | | 3.54 | 3.86 | 3.93 | 4.23
The mean for the seven verbal subtests is 15.7. The mean for the six performance subtests is 12.7. We now compare each subtest with its own mean. Subtracting the verbal mean from each of the verbal subtests and the performance mean from each of the performance subtests yields a pattern of positive and negative difference scores. A positive value means that the subtest is above that individual’s mean score, while a negative value indicates a lower performance. Comparing these differences with those in the table above indicates which are significantly different from their respective means. Note that the two rightmost columns are not used in this case because VIQ and PIQ differ significantly. The first thing you are looking for is whether or not each number differs significantly from its mean, indicated by an absolute difference greater than or equal to the cut-scores indicated in the table (we are only using columns 2 and 4 here to detect significant differences). For those measures that are significantly different from their respective means, those that are positive are “relative strengths (S)” and those that are negative are “relative weaknesses (W)”. These findings can be used to describe where Dr. Alvarez’ strengths and weaknesses lie in terms of the behaviours assessed by each subtest.
Verbal Subtests | SS | SS-Mn | S/W | Performance Subtests | SS | SS-Mn | S/W
Vocabulary | 18 | 2.3 | S | Picture Completion | 14 | 1.3 |
Similarities | 18 | 2.3 | | Digit Symbol-Coding | 9 | -3.7 | W
Arithmetic | 15 | -0.7 | | Block Design | 12 | -0.7 |
Digit Span | 12 | -3.7 | W | Matrix Reasoning | 15 | 2.3 |
Information | 18 | 2.3 | | Picture Arrangement | 15 | 2.3 |
Comprehension | 17 | 1.3 | | Symbol Search | 11 | -1.7 |
Lett.-Num. Seq. | 12 | -3.7 | W | | | |
Mean | 15.7 | | | Mean | 12.7 | |
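If you would like to check the strengths and weaknesses analysis yourself, the arithmetic can be sketched as follows. The scaled scores and p<.05 cut-scores are those given in this handout; I am assuming the usual WAIS-III abbreviations when matching cut-scores to subtest names (for example CD for Digit Symbol-Coding), and the function and variable names are purely illustrative.

```python
# subtest: (scaled score, p<.05 cut-score against the verbal mean)
VERBAL = {
    "Vocabulary": (18, 2.10),
    "Similarities": (18, 2.77),
    "Arithmetic": (15, 2.63),
    "Digit Span": (12, 2.40),
    "Information": (18, 2.34),
    "Comprehension": (17, 2.96),
    "Lett.-Num. Seq.": (12, 3.16),
}
# subtest: (scaled score, p<.05 cut-score against the performance mean)
PERFORMANCE = {
    "Picture Completion": (14, 3.16),
    "Digit Symbol-Coding": (9, 3.04),
    "Block Design": (12, 2.94),
    "Matrix Reasoning": (15, 2.60),
    "Picture Arrangement": (15, 3.75),
    "Symbol Search": (11, 3.54),
}


def strengths_and_weaknesses(subtests):
    """Return the rounded mean scaled score and the subtests flagged S or W."""
    mean = sum(score for score, _ in subtests.values()) / len(subtests)
    flags = {}
    for name, (score, cut_score) in subtests.items():
        difference = score - mean
        if abs(difference) >= cut_score:
            flags[name] = "S" if difference > 0 else "W"
    return round(mean, 1), flags


print(strengths_and_weaknesses(VERBAL))
print(strengths_and_weaknesses(PERFORMANCE))
```

The sketch uses the unrounded mean for the subtraction; with these scores that gives the same flags as working from the rounded means of 15.7 and 12.7. Notice that Similarities and Information, although just as far above the verbal mean as Vocabulary, are not flagged because their cut-scores are larger.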
A note about the subtest scores. The table below reproduces the subtest scores provided for Dr. Alvarez. Each subtest is named and two numbers are provided. In the first column of numbers the acronym SS in this case stands for Scaled Score. Scaled scores have a mean of 10 and a standard deviation of 3 and are adjusted for the age of the examinee (usually termed age scaled scores). The second column of numbers indicates the percentile rank of the scaled score. For example, Dr. Alvarez’ score on the Vocabulary subtest was 18, which indicates that his knowledge of the meaning of words is as good as or better than 99% of people of his age.
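Verbal Subtests | SS | %ile | Performance Subtests | SS | %ile
Vocabulary | 18 | 99 | Picture Completion | 14 | 91
Arithmetic | 15 | 95 | Block Design | 12 | 75
Digit Span | 12 | 75 | Matrix Reasoning | 15 | 95
Information | 18 | 99 | Picture Arrangement | 15 | 95
Comprehension | 17 | 99 | Symbol Search | 11 | 63
Lett.-Num. Seq. | 12 | 75 | | |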
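The percentile ranks above come from the published norms, but you can approximate them from the mean and standard deviation of the scale. A minimal sketch, assuming scores are normally distributed: scaled scores have a mean of 10 and SD of 3 as stated above, while the mean of 100 and SD of 15 for IQ and Index scores is the standard Wechsler metric and is my assumption here, as is the function name.

```python
from statistics import NormalDist


def percentile_rank(score, mean, sd):
    """Approximate percentile rank of a score under a normal distribution."""
    return 100 * NormalDist(mean, sd).cdf(score)


print(round(percentile_rank(14, 10, 3)))     # Picture Completion -> 91
print(round(percentile_rank(12, 10, 3)))     # Digit Span -> 75
print(round(percentile_rank(121, 100, 15)))  # Performance IQ -> 92
```

For these scores the approximation agrees with the tabled percentiles. At the extremes the two can differ slightly, so in a report always quote the values from the norms tables rather than a calculated figure.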
The next table adds a little to the
confusion and is strictly not necessary in the current evaluation. One of the
hypotheses in this case is that Dr. Alvarez could be experiencing difficulties
associated with a dementing disorder. Since such a condition requires the
demonstration of a decline in cognitive/intellectual functioning this would be
technically possible only in the situation where you have tested the person
twice (in order to show a decline). Nonetheless, clinicians will want to evaluate
this hypothesis as best they can on the first assessment. Even though a second
testing has not been conducted yet, the clinician still needs two numbers to
examine for a change in level of functioning. A second testing would provide
the second value. On the first testing, clinicians attempt to “back-generate” a
hypothetical earlier value which represents where the person would be if they
had no disorder or condition. This is called estimating premorbid levels of
functioning. Think of it this way: I have tested the client and need a value
from his past that will accurately tell me what he was like before his
difficulties. There are two primary ways of getting such values. The first is
to use information that is not related to the pathology such as demographic
characteristics of the individual. Such an approach asks the question “What
should your FSIQ be given that you are a male in your 60s with doctoral level
education and employment as an academic?” The first set of figures in the table
below addresses this issue. For example, based upon Dr. Alvarez’ demographic
characteristics we would estimate that his FSIQ should be 120 – actually we are
90% sure that the score will be somewhere between 102 and 138. Dr. Alvarez
actually got a FSIQ of 138 which is within this expected range. Consequently
there is little evidence to suggest that Dr. Alvarez’ FSIQ has gone down from
where we would expect it to have been.
The second set of
figures is derived from the test that was discussed earlier and represents the
other main approach to estimating premorbid abilities. In this approach we use
a test that we believe is unlikely to be affected by whatever is wrong with Dr.
Alvarez. Although this is a current measure we are using it to “guesstimate”
what he would have looked like before his difficulties. This approach is much
more restricted in terms of what you can actually predict. Since the NART is a
word pronunciation test it makes sense that we can only estimate verbal
abilities. So, Dr. Alvarez’ score on the NART-2 estimates his premorbid VIQ to
be 122 (90% sure that it falls somewhere between 108 and 136). Interestingly
Dr. Alvarez’ actual VIQ is 144 which is outside that 90% confidence band.
Remember that a current score LOWER than the 90% confidence interval would
signal decline or deterioration. This is not the case here. Dr. Alvarez’ verbal
abilities are higher than we would expect for most people of his age which
merely indicates the skill in verbal ability that we have already detected
(although it does supply confirmatory evidence from another source of
information).
Measure | Estimate | 90% Confidence Band Lo | 90% Confidence Band Hi
Verbal | 125 | 108 | 142
Performance | 125 | 103 | 147
Full Scale | 120 | 102 | 138
Verbal Comprehension | 126 | 109 | 144
Perceptual Organisation | 127 | 103 | 151
Working Memory | 124 | 100 | 149
Processing Speed | 117 | 95 | 138
WAIS-III VIQ | 122 | 108 | 136
WAIS-III VCI | 119 | 106 | 132
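The logic of comparing an obtained score with a premorbid estimate can be sketched in a few lines. This is an illustration only: the function name and the wording of the labels are mine, and I have treated the end points of the band as falling inside it, which matches the way the FSIQ of 138 was handled above.

```python
def compare_with_estimate(obtained, low, high):
    """Place an obtained score relative to a 90% premorbid confidence band."""
    if obtained < low:
        return "below band: consistent with decline"
    if obtained > high:
        return "above band: higher than estimated"
    return "within band: no evidence of decline"


# Obtained FSIQ against the demographic estimate (102 to 138).
print(compare_with_estimate(138, 102, 138))
# Obtained VIQ against the NART-2 estimate (108 to 136).
print(compare_with_estimate(144, 108, 136))
```

Only the first outcome, a current score below the band, would count as evidence of deterioration. A score above the band tells you the estimate was conservative for this person, nothing more.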
The third edition of the Wechsler Memory Scale (WMS-III) contains a number of subtests designed to assess both short-term and long-term memory functioning. The nomenclature of measures on this test can be confusing and mixes terms from classical and modern memory theory. Unlike the WAIS-III there is no one overall measure comparable to FSIQ, although there are measures similar to VIQ and PIQ. In this description of the test I will focus only on those measures necessary to generate the respective Index scores and will not include the optional subtests.
The WMS-III essentially measures three abilities: the ability to recall information shortly after its presentation (Immediate Memory), the ability to recall this same information after a 20 to 30 minute delay (General Memory), and the ability to attend and concentrate (Working Memory). Within each of the first two measures (Immediate and General Memory) there are subdivisions that are based upon whether or not the test is verbally or visually administered and whether or not the examinee had to recall or only recognize the information that was presented. In the case of this assignment these subdivisions are not relevant to either the analysis or interpretation. They will be discussed here for the sake of thoroughness.
The Immediate Memory Index consists of two verbal subtests (comprising Auditory Immediate) and two visual subtests (comprising Visual Immediate). The first verbal subtest, Logical Memory, involves the presentation of two stories which are then to be repeated back by the examinee in as great detail as possible. The second story is presented a second time and recall is tested to get a gross measure of learning. The second verbal subtest, Verbal Paired Associates, presents 8 pairs of words which are read aloud to the examinee. Each word pair is an uncommon pairing of words (such as flower-paperclip) and recall is tested by the examiner supplying the first word of the pair (flower), after which the examinee must supply the second word of the pair (paperclip). Four trials are administered, with the examiner rereading the list of word pairs each time and testing cued recall.
The two visual subtests are Faces and Family Pictures. In
Faces, the examinee views 24 photographs of faces and is then asked to
determine which 24 out of a further 48 faces they have seen before. The Family
Pictures subtest introduces the examinee to seven characters (Grandmother,
grandfather, father, mother, son, daughter, and dog – I know, I know, blatant
discrimination against cat people!). The examinee is then shown four scenes in
which different family members are doing different things in different
parts of the picture. After seeing all four scenes, the examinee is required to
indicate who was in each picture, where they were, and what they were doing.
The General Memory Index consists of the delayed recall
trials (administered approximately 20 to 30 minutes after the immediate recall)
of these same subtests. Auditory Delayed consists of delayed recall of the
Logical Memory stories and the Verbal Paired Associates. Visual Delayed
consists of the delayed trials of the Faces and Family Pictures tests. A third
measure Auditory Recognition Delayed consists of the recognition trials of
Logical Memory and Verbal Paired Associates that are presented after their
delayed recall. General Memory Index is therefore made up of Auditory Delayed,
Visual Delayed, and Auditory Recognition Delayed measures.
The Working Memory Index consists of the Letter-Number
Sequencing subtest of the WAIS-III and a visual span task called Spatial Span.
This task is very much like digit span except that rather than repeating
forwards or backwards number sequences read by the examiner, the examinee taps
out (forwards or backwards) a series of patterns tapped out by the examiner on
a board with ten blocks in various positions. One thing to note is that the
Letter-Number Sequencing score used in this Working Memory Index is not from a
second administration of the test but rather is exactly the same number as was
used in the WAIS-III. I know it seems insane, but the reasons are too complicated
to discuss here – suffice it to say that the WMI on the WAIS-III and the WMI on
the WMS-III are far from independent assessments of the same construct.
Dr. Alvarez’ scores on the WMS-III were 98 for the immediate recall of information, 81 for the recall of information after a 20 to 30 minute time delay, and 115 for his ability to attend and concentrate. These performances were as good as or better than 45%, 10%, and 84% of people of his age and corresponded to performances in the average, low average, and high average ranges respectively.
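Measure | Score | %ile | Test 90% CI Lo | Test 90% CI Hi | Retest 90% CI Lo | Retest 90% CI Hi
Immediate Memory | 98 | 45 | 92 | 105 | 88 | 108
General Memory | 81 | 10 | 76 | 89 | 72 | 94
Working Memory | 115 | 84 | 105 | 121 | 101 | 125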
There are three comparisons that can be made here, the most important of which is the IM comparison with GM. This addresses the issue of whether or not Dr. Alvarez’ memory performances are detrimentally affected by increasing the delay between presentation and recall. There is a 17 point difference between Dr. Alvarez’ immediate (IM) and delayed (GM) recall. This difference is significant (critical value of 12.8) and abnormal (estimated to occur in approximately 1% of the standardization sample). This indicates that Dr. Alvarez’ ability to recall information after a time delay is abnormally poor relative to his ability to recall information when it is first presented. The comparisons of IM and GM with Working Memory (WM) also indicate significant differences but only an abnormal difference for delayed recall (GM). Note that the difference between IM and WM, while significant, is found in roughly 1 in every 6 people (15.5%). This indicates that while Dr. Alvarez’ immediate recall may not be unusual for a man of his attentional abilities, his delayed recall is.
Differences Between WMS-III Primary Index Scores for Statistical Significance at the .05 Level of Significance for the 55 to 64 Age Group
Level | IM – GM | IM – WM | GM – WM
p=.05 | 12.8 | 13.8 | 14.1
Frequencies of Differences Between WMS-III Primary Index Scores
Comparison | Difference | Frequency
IM – GM | 17 | 1.1%
IM – WM | -17 | 15.5%
GM – WM | -34 | 1.7%
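The two-step logic used for the WMS-III comparisons (first significance, then abnormality) can be sketched as below. The index scores, critical values and frequencies are those given in the tables above; the names are illustrative, and the 5% cut-off for abnormality is my own criterion as described earlier.

```python
SCORES = {"IM": 98, "GM": 81, "WM": 115}
CRITICAL_05 = {("IM", "GM"): 12.8, ("IM", "WM"): 13.8, ("GM", "WM"): 14.1}
FREQUENCY = {("IM", "GM"): 1.1, ("IM", "WM"): 15.5, ("GM", "WM"): 1.7}


def compare(first, second):
    """Return (difference, significant, abnormal) for a pair of index scores."""
    difference = SCORES[first] - SCORES[second]
    significant = abs(difference) >= CRITICAL_05[(first, second)]
    # Frequency is only consulted for differences that are already significant.
    abnormal = significant and FREQUENCY[(first, second)] <= 5.0
    return difference, significant, abnormal


for pair in CRITICAL_05:
    print(pair, compare(*pair))
```

All three differences are significant, but only the two involving GM are also abnormal, which is exactly the pattern described in the paragraph above.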
While these findings strongly indicate impairment in
delayed recall (GM) there is another method that we can use to determine
whether or not his immediate memory is appropriate. Consider that this man is a
university professor – one might expect that having a good memory would be
something that would be needed in being an effective scholar and teacher. We
address this issue by asking whether or not Dr. Alvarez’ WMS-III scores are
normal for a man with his intellectual abilities (as indicated by his FSIQ).
The table below (again altered to remove the unnecessary comparisons) addresses
this question. Based upon Dr. Alvarez’ FSIQ score we would expect that his
predicted IM and GM scores would be 122 and 123 respectively. He actually
obtained 98 and 81 for these measures yielding differences of 24 and 42 points
between estimated and obtained scores respectively. Both of these differences
are significant and abnormal. This indicates that Dr. Alvarez’ scores on memory
testing are abnormally low for a man of his cognitive/intellectual abilities.
Comparisons of WAIS-III and WMS-III Composites Using Predicted Difference Method (Based Upon a FSIQ of 138)
WMS-III Index | Predicted | Obtained | Difference | p=.05 | Frequency
Immediate Memory | 122 | 98 | 24 | 17.1 | 3%
General Memory | 123 | 81 | 42 | 15.2 | <1%
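The predicted difference method follows the same pattern, except that the comparison value is the score predicted from FSIQ rather than another obtained score. A brief sketch using the figures in the table above; the names are illustrative, the tabled frequency of "<1%" is entered as 1.0 (an upper bound), and the 5% cut-off is again my own criterion.

```python
# index: (predicted from FSIQ of 138, obtained, critical value at p = .05, frequency %)
WMS_VS_FSIQ = {
    "Immediate Memory": (122, 98, 17.1, 3.0),
    "General Memory": (123, 81, 15.2, 1.0),  # "<1%" entered as an upper bound
}


def predicted_difference(predicted, obtained, critical, frequency):
    """Return (difference, significant, abnormal) for one WMS-III index."""
    difference = predicted - obtained
    significant = abs(difference) >= critical
    abnormal = significant and frequency <= 5.0
    return difference, significant, abnormal


for index, row in WMS_VS_FSIQ.items():
    print(index, predicted_difference(*row))
```

A positive difference here means the obtained memory score is lower than the score predicted from intellectual ability.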
We can now look
at the testing from eighteen months later. This analysis is the most
straightforward but it requires you to work with information from multiple
tables. Again for simplicity’s sake I will remove the irrelevant WMS-III
measures.
Now this is where those retest confidence intervals from
the first testing come into the analysis. The retest confidence interval allows
us to determine whether or not scores on the second testing have deteriorated,
stayed the same, or improved. You will remember that this determination is
critical to our differentiating among the alternative hypotheses.
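WAIS-III Measure | Score | %ile | WMS-III Measure | Score | %ile
Verbal | 138 | 99 | Immediate Memory | 86 | 18
Full Scale | 132 | 98 | Working Memory | 108 | 70
POI | 118 | 88 | | |
WMI | 115 | 84 | | |
PSI | 103 | 58 | | |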
We will now generate a table that combines the information from our first testing and second testing. We need the 90% RETEST confidence bands from the FIRST testing, the actual scores from the SECOND testing and then consideration of where these scores fall relative to the retest bands. Scores that fall within the retest band are unchanged. Those below it have declined; those above it have improved.
WAIS-III/WMS-III Measures | 90% RETEST CI Lo | 90% RETEST CI Hi | Retest Score | Status
Verbal IQ | 137 | 149 | 138 | Unchanged
Performance IQ | 111 | 128 | 117 | Unchanged
Full Scale IQ | 132 | 142 | 132 | Unchanged
VCI | 141 | 155 | 145 | Unchanged
POI | 112 | 130 | 118 | Unchanged
WMI | 108 | 124 | 115 | Unchanged
PSI | 87 | 111 | 103 | Unchanged
Immediate Memory | 88 | 108 | 86 | Declined
General Memory | 72 | 94 | 69 | Declined
Working Memory | 101 | 125 | 108 | Unchanged
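The Status column is generated by a very simple rule, which can be sketched as follows. The bands and scores are copied from the table above; the names are illustrative, and scores equal to either end of the band are counted as unchanged, as was done for the Full Scale IQ of 132.

```python
# measure: (retest band low, retest band high, score at second testing)
RETEST = {
    "Verbal IQ": (137, 149, 138),
    "Performance IQ": (111, 128, 117),
    "Full Scale IQ": (132, 142, 132),
    "VCI": (141, 155, 145),
    "POI": (112, 130, 118),
    "WMI": (108, 124, 115),
    "PSI": (87, 111, 103),
    "Immediate Memory": (88, 108, 86),
    "General Memory": (72, 94, 69),
    "Working Memory": (101, 125, 108),
}


def retest_status(low, high, score):
    """Classify a second-testing score against the first-testing retest band."""
    if score < low:
        return "Declined"
    if score > high:
        return "Improved"
    return "Unchanged"


statuses = {measure: retest_status(*row) for measure, row in RETEST.items()}
changed = {m: s for m, s in statuses.items() if s != "Unchanged"}
print(changed)
```

Running this leaves only Immediate Memory and General Memory in the list of changed measures, both as declines.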
This analysis indicates that only immediate memory and delayed recall (General Memory) measures have declined, all other measures have remained unchanged. Remember I said earlier that I keep in mind those differences that are not infrequent enough to be abnormal (<5%) but close enough to worry about (<10%). This was the case with Dr. Alvarez’ processing speed index from the WAIS-III. The question I was asking myself was whether this was a normal score for this man, or perhaps an abnormal score that isn’t quite bad enough YET for me to detect it. Examination of this score on retest resolves my concerns. PSI was clearly normal range variation as it has remained unchanged on the second testing. If it too had declined then this would suggest impairment in another cognitive domain.
All that is
really left now is to go back to look at how we expected the different
hypotheses to manifest on psychological testing. This will help us determine
what is the most likely explanation. I will not do this for you (I have to
leave some fun for you!).
This is a
demanding assignment because it asks you to grasp a lot of information to which
you have not been previously exposed. However, this is a good model of exactly
how 21st century assessors go about analysing and interpreting
psychological test data. Before you feel too hard done by, consider the small
number of variables that you have to consider in this case from only two tests.
In a routine assessment in my own clinical practice more than twenty tests are
customarily administered each with many scales and subscales. The process you
have learned here is identical to that used to analyse twenty tests or two
hundred variables – it is just more complicated. The other warning I should
give is that this case has been structured and oversimplified. It is not
that cases as straightforward as this one never occur, but they
are few and far between. Most cases requiring psychological assessment are
highly complex and full closure is seldom achieved. The process is the same as
in this case, but the outcome is rarely as neat.
This concludes
the ancillary materials provided to assist you in completing assignment 2.
Enjoy!
Dr. Graeme Senior
Senior Lecturer
Department of Psychology
University of Southern Queensland