Shipley Institute of Living Scale

 


The Shipley Institute of Living Scale (SILS) was developed in 1940 by Walter Shipley and is designed to assess general intellectual functioning in adults and adolescents. It is one of the oldest tests to still be administered in its original form.

 

The SILS is a self-administered test and consists of two subtests:

 

The Vocabulary subtest:

                                 This subtest consists of 40 multiple-choice questions in which the respondent is asked to choose which of four words is closest in meaning to a target word. Administration time for the subtest is 10 minutes. The Vocabulary subtest relies on verbal skills, which include reading ability, verbal comprehension, acquired knowledge, long-term memory, and concept formation.

 

The Abstraction subtest:

                                This subtest consists of 20 questions, each presenting a sequence of numbers, letters, or words with the final element omitted. The respondent is required to complete each of the sequences. Administration time for the subtest is 10 minutes. The Abstraction subtest relies more heavily on attentional abilities, letter, word, and number concept formation, abstract thinking, cognitive flexibility, analysis and synthesis, processing speed, long-term memory, and specific vocabulary and arithmetic skills.

 

                The rationale for the test is that pathology does not affect all of an individual's cognitive abilities equally. Verbal abilities such as word knowledge are less vulnerable to the influences of many pathologies. In contrast, abstract reasoning is believed to be much more vulnerable to a wide variety of pathologies. This use of the SILS reflects the test's potential for detecting the presence of intellectual deterioration. Another role for which the SILS is commonly employed is the assessment of general intellectual ability. The test has the advantage of being self-administered, appropriate for group or individual settings, and quick to complete.

 

                Note that these two roles are virtually mutually exclusive. The test cannot both measure general intellectual ability and detect cognitive deterioration simultaneously. So while the SILS still enjoys widespread use, it is not necessarily used in different settings for the same reasons.

 

                The test is currently considered appropriate for adolescents (aged 14 years or older) and adults. It is not indicated for children, individuals who are poorly motivated to perform (there are no validity indicators on the test), the mentally retarded, those with profound cognitive deterioration, or those with severe psychological disturbance. Because the test relies heavily on linguistic abilities, it is likely that individuals would need to be fluent in English, although no minimum reading level is indicated. When using the test to assess general intellectual functioning, the limited range of the regression equation for estimating IQ suggests that the test is inaccurate for those with extremely high or extremely low IQs.

 

                The test requires little training to administer. However, the nature of the scoring and analysis procedures would require graduate training in psychological assessment. The test is administered in two parts with a maximum completion time of ten minutes for each part. The Vocabulary subtest is completed first and then followed by the Abstraction subtest.

 

Six scores can be generated and form the basis for the analysis system:

1.        a Vocabulary score - this is computed from the total number of correct responses out of 40. As this test involves multiple-choice responses the respondent may have attained some correct responses by guessing. The number of items that are not completed is divided by four and added to the raw score total. Any fractions are rounded to the nearest whole number. This is performed as a correction factor for guessing under the assumption that had the respondent made guesses on these omitted items, they would get, on average, 1 in 4 correct. In order to interpret this score it is converted to a T-score (with a mean of 50 and a standard deviation of 10) using normative tables, which adjust for the respondent’s age.

2.        an Abstraction score - this is computed from the total number of correct responses on the 20 items of this subtest. This total is then multiplied by 2, presumably so that the raw scores for both the Vocabulary and Abstraction subtests range from 0 to 40. As with the Vocabulary subtest, raw scores are converted to T-scores for interpretation against a standardisation sample.

3.        a Total score - this score is computed by summing the Vocabulary and Abstraction raw scores and converted into a T-score using age-based normative tables.

4.        a Conceptual Quotient - this score is based upon Shipley's original concept of examining the ratio of the Abstraction score to the mental age of the individual as estimated by the Vocabulary score. A conversion table is utilised to generate the ratio, which is multiplied by 100 to eliminate decimals. This measure should NOT be read like a Wechsler deviation IQ. A CQ of 100 corresponds to a ratio of 1.0, indicating that the individual's abstract reasoning is consistent with their mental age as estimated by their Vocabulary score. CQs higher than 100 indicate abstract thinking in advance of one's years, and CQs lower than 100 indicate increasingly poor abstract reasoning abilities for one's age. The CQ is rarely used, simply because the concept of mental age has fallen somewhat from grace. After all, an 18-year-old with the abstracting abilities of a 14-year-old may mean something, but what does it mean for a 38-year-old to have the abstracting abilities of a 30-year-old?

5.        an Abstraction Quotient - this is similar to the CQ except that both age and education are accounted for in the conversion.

6.        an estimated WAIS or WAIS-R Full Scale IQ - conversion tables are available to estimate Full Scale IQ for either the WAIS or WAIS-R from the Total raw score. This conversion takes into account the age of the respondent.
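
The arithmetic behind the first three raw scores can be sketched in a few lines of Python. This is only an illustration of the rules described above, not the publisher's scoring program. The function names are hypothetical, and because the description above does not say how an exact half is rounded, the sketch assumes halves round up. Conversion to T-scores, CQ, AQ, and IQ estimates depends on the manual's tables and is not attempted here.

```python
VOCAB_ITEMS = 40        # items on the Vocabulary subtest
ABSTRACTION_ITEMS = 20  # items on the Abstraction subtest


def vocabulary_raw(correct: int, omitted: int) -> int:
    """Correct responses plus one quarter of the omitted items.

    The result is rounded to the nearest whole number. Assumption:
    an exact half (e.g. 30.5) rounds up; the source does not say.
    """
    if correct < 0 or omitted < 0 or correct + omitted > VOCAB_ITEMS:
        raise ValueError("correct + omitted must lie between 0 and 40")
    # Integer form of floor(correct + omitted / 4 + 0.5)
    return (4 * correct + omitted + 2) // 4


def abstraction_raw(correct: int) -> int:
    """Correct responses multiplied by 2, giving a 0 to 40 range."""
    if not 0 <= correct <= ABSTRACTION_ITEMS:
        raise ValueError("correct must lie between 0 and 20")
    return 2 * correct


def total_raw(vocabulary: int, abstraction: int) -> int:
    """Sum of the Vocabulary and Abstraction raw scores."""
    return vocabulary + abstraction


# Hypothetical respondent: 30 Vocabulary items correct, 4 omitted,
# and 15 Abstraction items correct.
v = vocabulary_raw(correct=30, omitted=4)   # 30 + 4/4 = 31
a = abstraction_raw(correct=15)             # 15 * 2 = 30
print(v, a, total_raw(v, a))                # prints: 31 30 61
```

Incorrect answers need no separate handling: only correct and omitted items enter the Vocabulary correction.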

 

Computer scoring is available, as is a report designed to generate hypotheses and aid in interpretation.

 

                In interpreting the results of the SILS caution must be taken to ensure that appropriate hypotheses are being tested. A common use of the test is to detect intellectual deterioration by finding a disproportionately low abstraction score relative to the vocabulary score. However, as the Abstraction subtest is generally harder than the Vocabulary subtest, a number of potential confounding variables need to be considered, such as:

1.        Language difficulties

2.        Low level of formal education

3.        Severe intellectual impairment

4.        Age-related shifts in verbal and abstract thinking

 

One of the things that should never be forgotten is that the SILS is far from a comprehensive multidimensional test of intelligence. It relies heavily upon verbal abilities and skills often acquired through formal education. Other factors that are known to impact upon test performance are cultural and socio-economic differences, motivation, and age. The impact of education and age can be addressed with the age and education adjusted normative data. However, the interaction of these variables with socio-economic and motivational concerns must still be considered.

 

The first level of interpretation is performed by examining T-scores for Vocabulary, Abstraction, and Total scores. The T-scores permit comparison of test performance with individuals of a comparable age to the test-taker. These scores can also be represented as percentiles, which indicate the percentage of the comparison population estimated to have scored at or below the obtained score. Confidence intervals can be computed for each of the measures using the standard error of measurement.
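
As a rough sketch of these two conversions, the Python below turns a T-score into a percentile and builds a confidence interval from a standard error of measurement. It assumes T-scores are normally distributed in the comparison group (mean 50, standard deviation 10); in practice the manual's tables should be used. The function names and the example values (a T-score of 60 and an SEM of 4.0 in the same units) are hypothetical.

```python
from statistics import NormalDist

T_MEAN = 50.0  # mean of the T-score scale
T_SD = 10.0    # standard deviation of the T-score scale


def t_score_percentile(t_score: float) -> float:
    """Percentage of a normal comparison group scoring at or below t_score."""
    return 100.0 * NormalDist(mu=T_MEAN, sigma=T_SD).cdf(t_score)


def confidence_interval(score: float, sem: float, level: float = 0.95):
    """Symmetric interval: score +/- z * SEM, with z taken from the normal curve.

    score and sem must be expressed in the same units.
    """
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    margin = z * sem
    return score - margin, score + margin


# Hypothetical example values, not taken from the manual.
print(round(t_score_percentile(60.0), 1))
low, high = confidence_interval(60.0, sem=4.0)
print(round(low, 1), round(high, 1))
```

A wider interval simply reflects a larger SEM, so the same observed score supports weaker conclusions on a less reliable measure.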

 

The interpretative strategy for the SILS is as follows:

1.        Evaluate the Vocabulary, Abstraction, and Total scores independently and in relation to one another. Examine Vocabulary and Abstraction scores for unusual or rare combinations.

2.        Examine the CQ for indications of impairment. Shipley described these cut-off scores and interpretations in 1940:

          CQ          Interpretation
          >90         Normal
          85-90       Slightly suspicious
          80-85       Moderately suspicious
          75-80       Quite suspicious
          70-75       Very suspicious
          <70         Probably pathological

 

3.        Calculate the AQ

4.        Compute WAIS or WAIS-R Full Scale IQ estimate

5.        Integrate these results with other sources of information.

6.        Make recommendations that take into account the limitations of the SILS.
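
Shipley's 1940 CQ cut-offs from step 2 can be written as a simple lookup. The sketch below is illustrative only. The published bands share their end points (85 appears in both 85-90 and 80-85), so it assumes that a value on a shared boundary belongs to the more suspicious band, which matches ">90" being Normal; that tie-breaking rule is an assumption, not something stated in the source. The function name is hypothetical.

```python
def classify_cq(cq: float) -> str:
    """Map a Conceptual Quotient onto Shipley's 1940 descriptive bands.

    Assumption: shared boundary values (90, 85, 80, 75) fall into the
    more suspicious band. A CQ of exactly 70 is "Very suspicious",
    because only values below 70 are "Probably pathological".
    """
    if cq > 90:
        return "Normal"
    if cq > 85:
        return "Slightly suspicious"
    if cq > 80:
        return "Moderately suspicious"
    if cq > 75:
        return "Quite suspicious"
    if cq >= 70:
        return "Very suspicious"
    return "Probably pathological"


for value in (100, 88, 72, 65):
    print(value, classify_cq(value))
```

The labels describe degrees of suspicion only; as step 5 notes, they need to be integrated with other sources of information.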

Development of the Scale:

                The SILS was developed with the intent of being self-administered. Vocabulary and Abstraction subtests were created as Shipley viewed measures of verbal ability as the best measures of premorbid functioning and viewed abstract reasoning tasks as more effective than memory tasks in their sensitivity to impairment. The multiple-choice format for Vocabulary and no more than five letters or numbers required in an Abstraction answer were chosen to keep the response requirements minimally arduous. The number of items was deliberately kept small in order to keep administration time brief.

                Preliminary Vocabulary and Abstraction items were chosen to sample different levels of vocabulary and reasoning skill. A sample of 462 students divided into high school freshmen, high school juniors and seniors, and college upperclassmen was used to develop the scales. Analyses were conducted to derive the 20 Abstraction items and 40 Vocabulary items that best discriminated between the three groups. The items were ordered in terms of increasing difficulty, and for every Abstraction item, two Vocabulary items of equivalent difficulty were selected.

 

                The original normative sample consisted of 542 students in grades 4 to 8, 257 high school students, and 217 college students. This normative sample clearly consists of young adults with a bias towards the youngest age groups in the distribution. Such a sample would be unable to represent older individuals or reflect age-associated changes expected in Abstraction scores.

 

A revised normative sample was developed in the 1970s using 290 psychiatric patients. The mean age of this sample was 34.9 years and the average WAIS Full Scale IQ was 104.3. The sample consisted of equal numbers of males and females. This is an extremely unusual normative sample, and the author justifies its use as being representative of the types of clinical populations commonly assessed with the SILS. He also argues that its means and standard deviations are characteristic of those in many of the other published studies using the SILS. The choice of normative sample is perhaps one of the greatest weaknesses of the SILS and limits its effective use in other clinical and normal settings.

 

Reliability:

                Split-half reliabilities were used to evaluate the internal consistency of the SILS subtests. The manual reports that corrected split-half reliabilities, computed in 1940 from a sample of 322 army recruits, were .87 for Vocabulary, .89 for Abstraction, and .92 for the Total score.

                Test-retest reliabilities are reported from four studies, which consisted only of undergraduates and female student nurses. Retest intervals ranged from 4 to 16 weeks (median 12 weeks). The test-retest reliabilities ranged from .31 to .77 for Vocabulary (median = .60), .47 to .88 for Abstraction (median = .66), and .62 to .82 for Total score (median = .78).

                Standard error of measurement (SEM) statistics also provide a way of examining the reliability of a measure, with smaller SEMs indicative of more reliable measures. The SEMs reported in the manual for Vocabulary, Abstraction, and Total raw scores were 3.8, 5.9, and 6.6, respectively.
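
For readers who want to see how such statistics are usually derived, the sketch below gives two classical test theory formulas in Python. The Spearman-Brown formula is the customary correction applied to a split-half correlation, but these notes do not state which correction the manual used, so treating it as the manual's method is an assumption. The SEM formula, SD multiplied by the square root of (1 minus reliability), is the standard definition. Function names and example values are hypothetical.

```python
import math


def spearman_brown(half_test_r: float) -> float:
    """Estimate full-length reliability from the correlation of two halves.

    Assumption: this is the correction behind "corrected split-half";
    the source does not name the formula.
    """
    if not -1.0 < half_test_r <= 1.0:
        raise ValueError("half_test_r must be in the interval (-1, 1]")
    return 2.0 * half_test_r / (1.0 + half_test_r)


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical SEM: SD * sqrt(1 - reliability)."""
    if sd < 0.0 or not 0.0 <= reliability <= 1.0:
        raise ValueError("sd must be >= 0 and reliability within [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical values chosen for easy checking by hand.
print(round(spearman_brown(0.5), 3))             # 2 * 0.5 / 1.5
print(standard_error_of_measurement(10.0, 0.75))  # 10 * sqrt(0.25)
```

The second function makes the inverse relationship explicit: for a fixed standard deviation, higher reliability always yields a smaller SEM.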

 

Validity:

                Content validity refers to the adequacy in sampling from a specific content domain. This is not evaluated in the manual. Appeals to content validity are made with reference to the careful selection of items, their demonstrated ordering in increasing difficulty, and their ability to discriminate between high school freshmen, high school juniors and seniors, and college students.

 

                Validity issues with regard to this test revolve primarily around whether or not it can perform in the way intended: either detecting cognitive deterioration or estimating verbal intelligence. The former is primarily examined through the use of the CQ as an impairment index. Shipley's original studies compared his normative sample with samples of private and state hospital psychiatric patients. The finding of decreasing CQs with increasing diagnostic severity was viewed as evidence for criterion-related validity. The manual also discusses the use of the Shipley with groups such as alcoholics, brain-injured individuals, psychiatric patients, and even prisoners.

 

There is also a detailed description of how to examine unusualness of combinations of Vocabulary and Abstraction scores. This section is well written and the method is interesting but cannot be viewed as a validation per se.

 

                The role of the Shipley as a brief test of intelligence is evaluated by the standard (and somewhat tired) method of demonstrating high correlations with other tests of intelligence. (Of course this could mean they are all equally bad, not necessarily equally good, but excuse me, my bias is showing!) The table below summarises the median correlations of the Shipley with a variety of intelligence measures. These correlations are derived from 20 studies assessing mostly psychiatric patients and undergraduates.

 

 

          Shipley Correlated with                            Median r
          Army General Classification Test                   .77
          California Short-Form Test of Mental Maturity      .68
          Quick Word Test                                    .68
          Raven's Progressive Matrices                       .72
          Revised Beta Exam                                  .55
          Slosson Intelligence Test                          .59
          Wechsler Adult Intelligence Scale                  .79
          Wechsler Adult Intelligence Scale – Revised        .80
          Wechsler-Bellevue                                  .77
          Wide Range Vocabulary Test                         .73

 

 

Concluding Comment: 

                As can be seen, the Shipley was built upon sound and relevant principles with clear goals and guidelines. Based upon the validity, reliability, and particularly standardisation data, however, it appears to have a substantial way to go before it can demonstrate its utility.

 


                I am not going to discuss the implications of the strengths and weaknesses presented above for the use of the test. However, you should consider the limitations of this test in your answer to the compulsory examination question in the final.

 

Cheers,

 

Graeme Senior

 


Monday, 22 October 2001
© 2001 by Graeme Senior, Ph.D.
Senior Lecturer
Department of Psychology
University of Southern Queensland
Toowoomba, QLD 4350
Australia