Pearson

Assessments for Specialized Education Needs

 




 


Glossary of Terms




Ability Testing:
    The use of standardized tests to evaluate the current performance of a person in some defined domain of cognitive, psychomotor, or physical functioning.

Achievement Testing:

    A test to evaluate the extent of knowledge or skill attained by a test taker in a content domain in which the test taker had received instruction.

Age Equivalent:

    The chronological age in a defined population for which a given score is the median (middle) score. Thus, if children 10 years and 6 months of age have a median score of 17 on a test, the score 17 is said to have an age equivalent of 10-6 for that population.
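
    As a minimal sketch of the idea above, the following Python function looks up the age group whose median score is closest to a given score. The function name, the norm-sample data, and the nearest-median rule are assumptions made for illustration; published norm tables typically interpolate between age groups.

```python
from statistics import median

def age_equivalent(score, scores_by_age):
    """Return the age label whose median score is closest to `score`.

    `scores_by_age` maps an age label such as "10-6" (years-months) to the
    scores earned by children of that age in a hypothetical norm sample.
    """
    medians = {age: median(scores) for age, scores in scores_by_age.items()}
    return min(medians, key=lambda age: abs(medians[age] - score))

# Hypothetical norm sample, invented for this example only.
norm_sample = {
    "10-0": [12, 14, 15, 16, 18],   # median 15
    "10-6": [14, 16, 17, 18, 20],   # median 17
    "11-0": [16, 18, 19, 21, 22],   # median 19
}

print(age_equivalent(17, norm_sample))  # 10-6
```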

Alternate Forms:

    Two or more versions of a test that are considered interchangeable, in that they measure the same constructs in the same ways, are intended for the same purposes, and are administered using the same directions. Alternate forms is a generic term used to refer to any of three categories. Parallel forms have equal raw score means, equal standard deviations, equal error structures, and equal correlations with other measures for any given population. Equivalent forms do not have the statistical similarity of parallel forms, but the dissimilarities in raw score statistics are compensated for in the conversions to derived scores or in form-specific norm tables. Comparable forms are highly similar in content, but the degree of statistical similarity has not been demonstrated.

Analytic Scoring:

    A method of scoring in which each critical dimension of performance is judged and scored separately, and the resultant values are combined for an overall score. In some instances, scores on the separate dimensions may also be used in interpreting performance. See holistic scoring.

Aptitude Test:

    A test that estimates future performance on other tasks not necessarily having evident similarity to the test tasks. Aptitude tests are often aimed at indicating an individual's readiness to learn or to develop proficiency in some particular area if education or training is provided. Aptitude tests sometimes do not differ in form or substance from achievement tests, but may differ in use and interpretation. See also ability test and achievement test.

Battery:

    A group of several tests standardized on the same sample population so that results on the several tests are comparable. (Sometimes loosely applied to any group of tests administered together, even though not standardized on the same subjects.) The most common test batteries are those of school achievement, which include subtests in the separate learning areas.

Cognitive Assessment:

    The process of systematically gathering test scores and related data in order to make judgments about an individual's ability to perform various mental activities involved in the processing, acquisition, retention, conceptualization, and organization of sensory, perceptual, verbal, spatial, and psychomotor information.

Composite Score:

    A score that combines several scores according to a specified formula.

Construct:

    The concept or the characteristic that a test is designed to measure.

Construct Domain:

    The set of interrelated attributes (e.g., behaviors, attitudes, values) that are included under a construct's label. A test typically samples from this construct domain.

Constructed Response Item:

    An exercise for which examinees must create their own responses or products rather than choose a response from an enumerated set. Short-answer items require a few words or a number as an answer, whereas extended-response items require at least a few sentences.

Criterion-Referenced Test:

    A test that allows its users to make score interpretations in relation to a functional performance level, as distinguished from those interpretations that are made in relation to the performance of others. Examples of criterion-referenced interpretations include comparison to cut scores, interpretations based on expectancy tables, and domain-referenced score interpretations.

Derived Score:

    A score to which raw scores are converted by numerical transformation (e.g., conversion of raw scores to percentile ranks or standard scores).

Decile:

    Any one of the nine points (scores) that divide a distribution into ten parts, each containing one-tenth of all the scores or cases; every tenth percentile. The first decile is the 10th percentile, the eighth decile is the 80th percentile, and so on.
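
    A short sketch of computing the nine decile points in Python, assuming a hypothetical set of scores from 1 to 99. The standard-library function statistics.quantiles is used here with its default interpolation method; other software may place the cut points slightly differently.

```python
from statistics import quantiles

# Hypothetical score distribution: one case at each score from 1 to 99.
scores = list(range(1, 100))

# n=10 asks for the nine cut points that split the data into ten equal parts.
deciles = quantiles(scores, n=10)

print(len(deciles))   # 9
print(deciles[0])     # 10.0  (first decile = 10th percentile)
print(deciles[7])     # 80.0  (eighth decile = 80th percentile)
```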

Diagnostic Test:

    A test used to "diagnose" or analyze; that is, to locate an individual's specific areas of weakness or strength, to determine the nature of his weaknesses or deficiencies, and, wherever possible, to suggest their cause. Such a test yields measures of the components or subparts of some larger body of information or skill. Diagnostic achievement tests are most commonly prepared for the skill subjects.

Expected Growth:

    The average amount of change in test scores that occurs over a specified time interval for individuals with certain individual characteristics such as age or grade level.

Factor:

  1. Any variable, real or hypothetical, that is an aspect of a concept or construct.
  2. In measurement theory, a statistical dimension defined by factor analysis.
  3. In mental measurement, a hypothetical trait, ability, or component of ability that underlies and influences performance on two or more tests and hence causes scores on tests to be correlated. The term "factor" strictly refers to a theoretical variable, derived by the process of factor analysis from a table of interrelations among tests. However, it is also used to denote the psychological interpretation given to the variable - i.e., the mental trait assumed to be represented by the variable, as verbal ability, numerical ability, etc.

Factor Analysis:

    Any of several statistical methods of describing the interrelationships of a set of variables by statistically deriving new variables, called factors, that are fewer in number than the original set of variables. Factor analysis reveals how much of the variation in each of the original measures arises from, or is associated with, each of the hypothetical factors. Factor analysis has contributed to an understanding of the organization or components of intelligence, aptitudes, and personality; and it has pointed the way to the development of "purer" tests of several components.

Grade Equivalent:

    The school grade level for a given population for which a given score is the median score in that population. Grade Equivalent scores are useful primarily because of three characteristics: 1) they indicate the developmental level of the pupil's performance, 2) they may be averaged for the purpose of making group comparisons, and 3) they are suitable for measuring growth. For example, if a student obtains a grade equivalent score of 6.3 on a math test we would say that his raw score is equivalent to the average raw score obtained by students in the norm group who were in their third month of the sixth grade. A grade equivalent score does not equate to performance in the classroom. Grade equivalents are the first step in the further analysis of raw data. All subsequent statistics are directly related to the grade equivalent.

Holistic Scoring:

    A method of obtaining a score on a test, or a test item, based on a judgement of overall performance using specified criteria. In holistic scoring, raters evaluate the effectiveness of responses in terms of a set of overall descriptions of categories relevant for responses to the task -- be it a written response, an oral response, or some other performance task (i.e., constructed response). The scoring process is holistic in that the score assigned to an examinee's performance reflects the overall effectiveness of the examinee response.

Normal Curve Equivalents (NCE):

    Normal Curve Equivalents are normalized standard scores with a mean of 50 and a standard deviation of 21.06. The range of NCEs is from a score of 1 corresponding to a percentile rank of 1.0 to a score of 99 corresponding to a percentile rank of 99.0. NCEs have little direct normative meaning to the typical user. To interpret NCEs it is necessary to relate them to other status scores based on a single reference group such as percentile ranks or stanines. For those who are accustomed to interpreting stanines, NCEs may be thought of as roughly equivalent to stanines to one decimal place. For example, an NCE of 73 may be interpreted as a stanine of 7.3. The main advantage of NCEs is that they are derived through the use of comparable procedures by the publishers of the various tests used in federal projects. NCEs used in federal evaluation must be based on empirically established norms for a particular grade and time of year. This leads to standardization and comparability of reporting procedures. This does not mean that results from different test batteries are interchangeable, however. Tests differ in content, and norms are based on different samples tested at different points in time.
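
    The relationship between percentile ranks and NCEs can be sketched in Python as follows. This assumes the usual construction of a normalized standard score: the percentile rank is converted to a normal deviate (z), which is then rescaled to a mean of 50 and a standard deviation of 21.06. The function name is hypothetical, and actual published conversions come from each test's empirical norm tables.

```python
from statistics import NormalDist

NCE_MEAN = 50.0
NCE_SD = 21.06

def percentile_rank_to_nce(percentile_rank):
    """Convert a percentile rank (strictly between 0 and 100) to an NCE."""
    # Normal deviate that has `percentile_rank` percent of the area below it.
    z = NormalDist().inv_cdf(percentile_rank / 100)
    return NCE_MEAN + NCE_SD * z

for pr in (1, 25, 50, 75, 99):
    print(pr, round(percentile_rank_to_nce(pr), 1))
```

    With these constants, percentile ranks of 1, 50, and 99 map to NCEs of approximately 1, 50, and 99, which is why the standard deviation is set to 21.06.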

Normalized Standard Score:

    A derived test score in which a numerical transformation has been chosen so that the score distribution closely approximates a normal distribution, for some specific population.

Norm-Referenced Test Interpretation:

    A score interpretation based on a comparison of a test taker's performance to the performance of other people in a specified reference population.

Norms:

    Statistics or tabular data that summarize the distribution of test performance for one or more specified groups, such as test takers of various ages or grades. Norms are usually designed to represent some larger population, such as test takers throughout the country. The group of examinees represented by the norms is referred to as the reference population.

Objective Mastery:

    Objective mastery scores are generally associated with criterion-referenced testing, though many norm-referenced tests also report this information. Items are written to measure particular objectives. If enough items measuring a specific objective are answered correctly, then mastery of that objective is concluded. Some norm-referenced tests are written in a criterion-referenced mode so that categories of objectives can be measured. The degrees of mastery of these category objectives are reported as objective mastery.

    All three terms (norm-referenced, criterion-referenced, and objective-based) have been used as adjectives applied to tests, purposes, and interpretations. Even though most criterion-referenced interpretations involve the use of skill or item norms, subjective standards are also important. Differences in ability levels within groups of pupils call for different standards and expectations. Discrepancies between expected and actual performance should be evaluated and interpreted in light of local visions for developing the particular skill.

    It should also be noted that the difficulty of a given item depends not only on the inherent difficulty of the skill tested, but also on 1) the level of mastery required by the item; 2) the setting in which the item was placed; 3) the attractiveness of the distractors, etc. For example, an item that 80% of the students in a given school answer correctly may represent a skill that is extremely important for all pupils and that should require immediate attention. On the other hand, an item that 40% of the pupils answer correctly may represent a difficult concept, with an item norm of 30% or so, that only the most able and talented pupils should be expected to master.

Out-of-Level Testing:

    Administering a test that is designed primarily for people of an age or grade level above or below that of the test taker.

Percentile Rank:

    Most commonly, the percentage of scores in a specified distribution that fall below the point at which a given score lies. Sometimes the percentage is defined to include scores that fall at the point; sometimes the percentage is defined to include half of the scores at the point. Percentile ranks indicate the status or relative standing of a pupil in comparison to other pupils. The percentile rank tells the percent of pupils in a particular norm group who obtain lower scores; thus, for example, if Ann earns a percentile rank of 70 on a particular test, it means she scored better than 70 percent of the pupils in the norm group and 30 percent scored as well as or better than she did. The scale goes from 1 to 99 percent. If three points are used to divide the scale into four equal quarters, the points are called quartiles: quartile one, quartile two, and quartile three. Quartiles are points, not areas. A score does not fall in a quartile. A score can be above, at, or below a quartile. A score can be between two quartiles. There is no fourth quartile. Many people have the misconception that since there are four quarters there should also be four quartiles, but this is not the case. Quartiles are points, not areas, so there are four areas divided by the three quartiles, but there are not four quartiles.
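
    Quartiles on the percentile scale, from high to low:
        99th percentile
        75th percentile (third quartile)
        50th percentile (second quartile, the median)
        25th percentile (first quartile)
        1st percentile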
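
    The three ways of counting scores at the point itself can be compared with a small Python sketch. The function name, the `ties` option names, and the score list are hypothetical and chosen only for illustration.

```python
def percentile_rank(score, scores, ties="below"):
    """Percentage of `scores` counted as falling below `score`.

    ties="below"        counts only scores strictly below the point
    ties="at_or_below"  also counts the scores that fall at the point
    ties="half"         counts half of the scores that fall at the point
    """
    below = sum(1 for s in scores if s < score)
    at = sum(1 for s in scores if s == score)
    if ties == "below":
        counted = below
    elif ties == "at_or_below":
        counted = below + at
    elif ties == "half":
        counted = below + at / 2
    else:
        raise ValueError(f"unknown ties rule: {ties!r}")
    return 100 * counted / len(scores)

# Hypothetical norm group of ten pupils.
norm_group = [10, 12, 12, 15, 15, 15, 18, 20, 22, 25]

print(percentile_rank(15, norm_group, ties="below"))        # 30.0
print(percentile_rank(15, norm_group, ties="at_or_below"))  # 60.0
print(percentile_rank(15, norm_group, ties="half"))         # 45.0
```

    The spread between 30 and 60 for the same score shows why it matters which definition a test's norm tables use, especially when many pupils earn the same score.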

Power Test:

    A test intended to measure level of performance unaffected by speed of response; hence one in which there is either no time limit or a very generous one. Items are usually arranged in order of increasing difficulty.

Profile:

    A graphic representation of an individual's scores (or their relative magnitudes) on several tests (or subtests) that employ a single standard scale. See also battery.

Raw Score:

    A raw score is the number of items answered correctly on a given test. For example, if a test had 59 items and the student got 23 correct the raw score would be 23. Raw scores by themselves have little or no meaning. Raw scores are converted to 1) developmental scores such as grade equivalents or 2) status scores such as percentile rank, normal curve equivalents, or stanines in order to be interpreted meaningfully.

Reference Population:

    The population of test takers represented by test norms. The sample on which the test norms are based must permit accurate estimation of the test score distribution for the reference population. The reference population may be defined in terms of examinee age, grade, or clinical status at time of testing, or other characteristics.

Scaled Score:

    A scaled score is a score derived from the original raw score on a test. A scaled score carries mathematical properties that allow these scores to be examined in a variety of ways. Generally, it can be said that the scaled scores have a wide range and are equally intervaled. There is the same distance from one scaled score unit to the next across the entire scale. However, this does not mean that there is an equal scaled score interval between two raw score units.

Score:

    Any specific number resulting from the assessment of an individual; a generic term applied for convenience to such diverse measures as test scores, estimates of latent variables, production counts, absence records, course grades, ratings, and so forth.

Speed Test:

    A test in which performance is measured by the number of tasks performed in a given time. Examples are tests of typing speed and reading speed. Also, a test scored for accuracy where the test taker works under time pressure.

Speededness:

    A test characteristic, dictated by the test's time limits, that results in a test taker's score being dependent on the rate at which work is performed as well as the correctness of the responses. The term is not used to describe tests of speed. Speededness is often an undesirable characteristic.

Standard Score:

    A type of derived score such that the distribution of these scores for a specified population has convenient, known values for the mean and standard deviation. The term is sometimes used to signify a mean of 0.0 and a standard deviation of 1.0.
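
    A minimal Python sketch of deriving standard scores from raw scores, assuming a simple linear transformation. The function name and the raw scores are hypothetical, and the target mean of 100 and standard deviation of 15 in the second call are just one example of convenient values.

```python
from statistics import mean, pstdev

def to_standard_scores(raw_scores, target_mean=0.0, target_sd=1.0):
    """Linearly rescale raw scores to a chosen mean and standard deviation."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # population SD of the reference group
    return [target_mean + target_sd * (x - m) / sd for x in raw_scores]

# Hypothetical raw scores with mean 5 and standard deviation 2.
raw = [2, 4, 4, 4, 5, 5, 7, 9]

z_scores = to_standard_scores(raw)            # mean 0.0, SD 1.0
scaled = to_standard_scores(raw, 100, 15)     # mean 100, SD 15

print([round(z, 2) for z in z_scores])
print([round(s, 1) for s in scaled])
```

    A linear transformation like this one preserves the shape of the raw score distribution; it does not normalize it.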

Standardization:

  1. In test administration, maintaining a constant testing environment and conducting the test according to detailed rules and specifications, so that testing conditions are the same for all test takers.
  2. In test development, establishing scoring norms based on the test performance of a representative sample of individuals with which the test is intended to be used.
  3. In statistical analysis, transforming a variable so that its standard deviation is 1.0 for some specified population or sample.

Stanine Scores:

    Stanine scores are normalized standard scores with a range of 1 to 9, a mean of five, and a standard deviation of two. Like percentile ranks, they are status scores within a particular norm group. The first stanine is the lowest scoring group and the 9th stanine is the highest scoring group. Advocates of stanine reporting cite the fact that the single-digit scale is simple and convenient to use and that its use minimizes the apparent importance of small score differences. On the other hand, the stanine scale may be regarded as unnecessarily coarse, particularly for relatively reliable tests. For example, all pupils scoring between the 40th and 60th percentiles are assigned a stanine of 5. However, a pupil scoring at the 59th percentile, which is in stanine 5, is probably much more similar in achievement level to a pupil scoring at the 61st percentile, stanine 6, than to one at the 41st, stanine 5. In some instances the width of the stanine band exceeds the standard error of measurement. Another reservation about the use of stanine scores is that there is evidence that skills development in the elementary schools is more variable in subjects such as reading, in which the pupils have many opportunities for advancing "on their own," than it is in subjects such as mathematics, in which pupil progress is more rigidly controlled through placement of concepts and processes in the curriculum. The distribution of percentages from low to high is as follows:

Distribution of Percentages from low to high

    Stanine    Percentage of cases    Description
    1          4%                     Low
    2          7%                     Low
    3          12%                    Low Average
    4          17%                    Average
    5          20%                    Average
    6          17%                    Average
    7          12%                    High Average
    8          7%                     High
    9          4%                     High
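
    The percentages above determine where each stanine begins on the percentile scale. The following Python sketch accumulates them into cut points and assigns a stanine to a percentile rank. The function name is hypothetical, and the handling of a percentile rank that falls exactly on a cut point (assigned to the higher stanine) is an assumption; published conversion tables may treat boundaries differently.

```python
from bisect import bisect_right
from itertools import accumulate

# Percentage of cases in stanines 1 through 9.
STANINE_PERCENTAGES = [4, 7, 12, 17, 20, 17, 12, 7, 4]

# Cumulative upper bounds of stanines 1 through 8: 4, 11, 23, 40, 60, 77, 89, 96.
CUT_POINTS = list(accumulate(STANINE_PERCENTAGES))[:-1]

def percentile_rank_to_stanine(percentile_rank):
    """Map a percentile rank (1 to 99) to a stanine (1 to 9)."""
    # Count how many cut points lie at or below the rank, then shift to 1..9.
    return bisect_right(CUT_POINTS, percentile_rank) + 1

for pr in (41, 59, 61):
    print(pr, percentile_rank_to_stanine(pr))
# 41 5
# 59 5
# 61 6
```

    This reproduces the example in the definition: the 41st and 59th percentiles share stanine 5, while the 61st percentile falls in stanine 6.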

Test Modification:

    Changes made in the content, format, and/or administration procedure of a test in order to accommodate test takers who are unable to take the original test under standard test conditions.

Timed Tests:

    A test administered to a test taker who is allotted a strictly prescribed amount of time to respond to the test.

True Score:

    In classical test theory, the average of the scores that would be earned by an individual on an unlimited number of perfectly parallel forms of the same test. In item response theory, the error-free value of test taker proficiency, usually symbolized by θ (theta).

Validation:

    The process through which the validity of the proposed interpretation of test scores is investigated.
