Pearson

Assessments for Specialized Education Needs

 




 


Statistical Terms




Arithmetic Mean:

    A kind of average usually referred to as the mean. It is obtained by dividing the sum of a set of scores by their number.
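    The definition translates directly into code. This is a minimal Python sketch. The function name `arithmetic_mean` and the sample scores are illustrative only and are not taken from any Pearson tool.

```python
def arithmetic_mean(scores):
    """Sum of a set of scores divided by their number (N)."""
    if not scores:
        raise ValueError("the mean is undefined for an empty set of scores")
    return sum(scores) / len(scores)

print(arithmetic_mean([85, 90, 78, 92, 80]))  # 85.0
```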

Average:

    A general term applied to the various measures of central tendency. The three most widely used averages are the arithmetic mean (mean), the median, and the mode. When the term "average" is used without specifying the type, it most likely refers to the arithmetic mean.

Central Tendency:

    A measure of central tendency provides a single score that best represents a group of scores; the "trend" of a group of measures as indicated by some type of average, usually the mean or the median.

Confidence Interval:

    A sample-based estimate, expressed as an interval or range of values, within which the true or target population value is expected to lie, with a specified level of confidence given as a percentage.
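    As an illustration, the sketch below computes an approximate confidence interval for a population mean. It assumes a normal approximation, where a critical value of z = 1.96 corresponds to roughly 95 percent confidence. For small samples a t critical value would normally be used instead. The function name and the scores are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(scores, z=1.96):
    """Approximate confidence interval for a population mean (normal approximation).

    z = 1.96 gives roughly 95 percent confidence; a t critical value is usually
    preferred for small samples.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = statistics.mean(scores)
    # Standard error of the mean: sample SD (N - 1 denominator) divided by sqrt(N)
    se = statistics.stdev(scores) / math.sqrt(n)
    return mean - z * se, mean + z * se

low, high = mean_confidence_interval([98, 102, 100, 104, 96])
print(f"95% CI: {low:.2f} to {high:.2f}")  # 95% CI: 97.23 to 102.77
```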

Construct Validity:

    A term used to indicate that the test scores are to be interpreted as indicating the test taker's standing on the psychological construct measured by the test. A construct is a theoretical variable inferred from multiple types of evidence, which might include the interrelations of the test scores with other variables, internal test structure, observations of response processes, and the content of the test. In the current Standards, all test scores are viewed as measures of some construct, so the phrase "construct validity" is redundant with "validity." The validity argument establishes the construct validity of a test.

Content Validity:

    A term used in the 1974 Standards to refer to a kind or aspect of validity that was "required when the test user wishes to estimate how an individual performs in the universe of situations the test is intended to represent" (p. 28). In the 1985 Standards, the term was changed to content-related evidence emphasizing that it referred to one type of evidence within a unitary conception of validity. In the current Standards, this type of evidence is characterized as "evidence based on test content."

Correlation:

    The tendency for certain values or levels of one variable to occur with particular values or levels of another variable.

Correlation Coefficient:

    A measure of association between two variables that can range from -1.00 (perfect negative relationship) to 0 (no relationship) to +1.00 (perfect positive relationship).
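    The glossary does not name a specific coefficient. The most common one is the Pearson product-moment correlation, sketched below in Python as one example. The function name `pearson_r` is illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two paired lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two scores each")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Sum of cross-products of deviations, and the two sums of squared deviations
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
```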

Distribution (Frequency Distribution):

    A tabulation of the scores (or other attributes) of a group of individuals to show the number (frequency) of each score, or of those within the range of each interval.
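    A frequency distribution can be tabulated either by individual score or by interval. The sketch below does both. It assumes integer scores, and the interval width of 5 in the usage line is an arbitrary illustrative choice.

```python
from collections import Counter

def frequency_distribution(scores, interval=None):
    """Count how many scores fall at each value, or within each interval.

    With an interval width (e.g. 5), integer scores are grouped into bins
    starting at multiples of that width: 70-74, 75-79, and so on.
    """
    if interval is None:
        return dict(sorted(Counter(scores).items()))
    bins = Counter((s // interval) * interval for s in scores)
    return {f"{lo}-{lo + interval - 1}": n for lo, n in sorted(bins.items())}

scores = [72, 75, 75, 78, 81, 81, 81, 84]
print(frequency_distribution(scores))              # {72: 1, 75: 2, 78: 1, 81: 3, 84: 1}
print(frequency_distribution(scores, interval=5))  # {'70-74': 1, '75-79': 3, '80-84': 4}
```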

Factor:

  1. Any variable, real or hypothetical, that is an aspect of a concept or construct.
  2. In measurement theory, a statistical dimension defined by factor analysis.
  3. In mental measurement, a hypothetical trait, ability, or component of ability that underlies and influences performance on two or more tests and hence causes scores on those tests to be correlated. The term "factor" strictly refers to a theoretical variable, derived by the process of factor analysis from a table of interrelations among tests. However, it is also used to denote the psychological interpretation given to the variable (i.e., the mental trait assumed to be represented by the variable, such as verbal ability or numerical ability).

Factor Analysis:

    Any of several statistical methods of describing the interrelationships of a set of variables by statistically deriving new variables, called factors, that are fewer in number than the original set of variables. Factor analysis reveals how much of the variation in each of the original measures arises from, or is associated with, each of the hypothetical factors. Factor analysis has contributed to an understanding of the organization or components of intelligence, aptitudes, and personality; and it has pointed the way to the development of "purer" tests of several components.

Generalizability Coefficient:

    A reliability index encompassing one or more independent sources of error. It is formed as the ratio of (a) the sum of variances that are considered components of test score variance in the setting under study to (b) the foregoing sum plus the weighted sum of variances attributable to various error sources in this setting. Such indices, which arise from the application of generalizability theory, are typically interpreted in the same manner as reliability coefficients.
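    The ratio described above can be written as a small function. This sketch assumes that variance components have already been estimated, for example from a generalizability study. The weights are an assumption of this example: dividing an error component by the number of raters or occasions averaged over (a weight of 1/n) is a common choice. The function name and the numbers are hypothetical.

```python
def generalizability_coefficient(universe_variances, error_variances, weights):
    """(a) / ((a) + weighted error), where (a) is the sum of universe-score variances.

    universe_variances: variance components counted as "true" score variance.
    error_variances: variance components attributable to error sources.
    weights: one multiplier per error component (e.g. 1/n when averaging over n raters).
    """
    if len(error_variances) != len(weights):
        raise ValueError("one weight is needed per error variance component")
    signal = sum(universe_variances)
    error = sum(w * v for w, v in zip(weights, error_variances))
    return signal / (signal + error)

# Hypothetical: person variance 8.0; residual error 4.0 averaged over 2 raters
print(generalizability_coefficient([8.0], [4.0], [1 / 2]))  # 0.8
```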

Internal Consistency Coefficient:

    An index of the reliability of test scores derived from the statistical interrelationships among item responses or among scores on separate parts of a test.
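    The glossary does not name a particular internal consistency coefficient. Cronbach's coefficient alpha is one widely used example, and the sketch below uses it for illustration. The data layout (one row per test taker, one column per item) and the function name are assumptions.

```python
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha = k/(k-1) * (1 - sum of item variances / total-score variance).

    item_scores: one row per test taker, one column per item.
    Population variances are used throughout; the N vs. N - 1 choice cancels in the ratio.
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*item_scores))  # transpose: one tuple of scores per item
    item_var_sum = sum(statistics.pvariance(col) for col in items)
    total_var = statistics.pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - item_var_sum / total_var)

responses = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]  # 4 test takers, 3 items
print(cronbach_alpha(responses))  # 0.75
```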

Median (Md):

    The middle score in a distribution or set of ranked scores; the point (score) that divides the group into two equal parts; the 50th percentile. Half of the scores are below the median and half above it, except when the median itself is one of the obtained scores.
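    For ungrouped scores, the median can be found by ranking the scores and taking the middle one. This sketch assumes the common convention of using the midpoint of the two middle scores when N is even, which is also what Python's `statistics.median` does. Interpolation formulas for grouped frequency distributions are not covered here.

```python
def median(scores):
    """Middle score of the ranked scores; with an even N, the midpoint of the two middle scores."""
    if not scores:
        raise ValueError("the median is undefined for an empty set of scores")
    ranked = sorted(scores)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

print(median([70, 85, 90, 60, 75]))      # 75
print(median([70, 85, 90, 60, 75, 95]))  # 80.0
```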

Mode:

    The score or value that occurs most frequently in a distribution.
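    The mode is found by counting how often each score occurs. In this sketch, ties are all returned, because a distribution can have more than one most frequent score. That handling is a design choice of this example, and Python's `statistics.multimode` behaves similarly.

```python
from collections import Counter

def modes(scores):
    """Return every score tied for the highest frequency, in ascending order."""
    if not scores:
        raise ValueError("the mode is undefined for an empty set of scores")
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(s for s, c in counts.items() if c == top)

print(modes([3, 5, 5, 7, 9]))     # [5]
print(modes([3, 3, 5, 7, 7, 9]))  # [3, 7]
```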

N:

    The symbol commonly used to represent the number of cases in a group.

p-value:

    The p-value is a test item statistic that represents the proportion of students in a particular population group who answered that item correctly. A p-value may be calculated for a national standardization sample or for a class- or school-level population to which a test has been administered. The p-value is calculated by dividing the number of correct responses on an item by the total number of students tested. It may be expressed as a decimal value or as a percentage (by multiplying the decimal value by 100).
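    The calculation described above is a single division. This sketch assumes item responses are scored 1 (correct) or 0 (incorrect), one entry per student tested. The function name and data are illustrative.

```python
def item_p_value(responses):
    """Proportion of students answering an item correctly (1 = correct, 0 = incorrect)."""
    if not responses:
        raise ValueError("no responses recorded for this item")
    return sum(responses) / len(responses)

p = item_p_value([1, 0, 1, 1, 0, 1, 1, 1])  # 6 correct out of 8 students
print(p, f"{p * 100:.1f}%")  # 0.75 75.0%
```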

Predictive Validity:

    A term used in the 1974 Standards to refer to a type of "criterion-related validity" that applies "when one wishes to infer from a test score an individual's most probable standing on some other variable called a criterion" (p. 26). In the 1985 Standards, the term criterion-related validity was changed to criterion-related evidence, emphasizing that it referred to one type of evidence within a unitary conception of validity. The current Standards document refers to "evidence based on relations to other variables" that include "test-criterion relationships." Predictive evidence indicates how accurately test data can predict criterion scores that are obtained at a later time.

Reliability:

    The degree to which test scores for a group of test takers are consistent over repeated applications of a measurement procedure and hence are inferred to be dependable and repeatable for an individual test taker; the degree to which scores are free of errors of measurement for a given group.

Reliability Coefficient:

    A coefficient of correlation between two administrations of a test. The conditions of administration may involve variation in test forms, raters or scorers, or the passage of time. These and other changes in conditions give rise to qualifying adjectives used to describe the particular coefficient, e.g., parallel-form reliability, rater reliability, test-retest reliability, etc.

Scale:

  1. The system of numbers, and their units, by which a value is reported on some dimension of measurement. Length can be reported in the English system of feet and inches or in the metric system of meters and centimeters.
  2. In testing, scale sometimes refers to the set of items or subtests used in the measurement and is distinguished from a test in the type of characteristic being measured. One speaks of a test of verbal ability, but a scale of extroversion-introversion.

Scaling:

    The process of creating a scale or a scaled score. Scaling may enhance test score interpretation by placing scores from different tests or test forms onto a common scale or by producing scale scores designed to support criterion-referenced or norm-referenced score interpretations.

Split-Halves Reliability Coefficient:

    An internal consistency coefficient obtained by using half the items on the test to yield one score and the other half of the items to yield a second, independent score. The correlation between the scores on these two half-tests, adjusted via the Spearman-Brown formula, provides an estimate of the alternate-form reliability of the total test. The Spearman-Brown formula is a formula derived within classical test theory that projects the reliability of a shortened or lengthened test from the reliability of a test of specified length.
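    The procedure above can be sketched in Python. One assumption here is the way the halves are formed: an odd-even split, which is a common choice, although the glossary only says "half the items." Another is that the half-test correlation is a Pearson correlation. The Spearman-Brown step uses its standard form, where a test lengthened by a factor n has projected reliability n·r / (1 + (n − 1)·r). All function names are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two paired lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r, length_factor=2.0):
    """Projected reliability of a test length_factor times as long as the one yielding r."""
    return length_factor * r / (1 + (length_factor - 1) * r)

def split_half_reliability(item_scores):
    """Odd-even split-half coefficient, stepped up to full test length.

    item_scores: one row per test taker, one column per item (e.g. 1 = correct, 0 = incorrect).
    """
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    # Each half is half as long as the full test, so step up by a factor of 2
    return spearman_brown(r_half, 2.0)

print(round(spearman_brown(0.6), 2))  # 0.75: a half-test r of .60 projects to .75
```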

Standard Deviation (S.D.):

    A measure of the variability or dispersion of a distribution of scores, and the most widely used measure of dispersion for a frequency distribution. It is equal to the positive square root of the population variance. The more closely the scores cluster around the mean, the smaller the standard deviation. For a normal distribution, approximately two thirds (68.3 percent) of the scores lie within the range from one S.D. below the mean to one S.D. above the mean. Computation of the S.D. is based on the squared deviation of each score from the mean. The S.D. is sometimes called "sigma" and is represented by the symbol σ.
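    The steps behind the S.D. are to square each score's deviation from the mean, average those squares to get the variance, and take the positive square root. The sketch below follows those steps and uses the population (divide-by-N) form that matches the definition above. Python's `statistics.pstdev` gives the same result, while `statistics.stdev` divides by N − 1 for a sample estimate. The function name is illustrative.

```python
import math

def population_sd(scores):
    """Positive square root of the population variance (mean squared deviation from the mean)."""
    n = len(scores)
    if n == 0:
        raise ValueError("the S.D. is undefined for an empty set of scores")
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / n
    return math.sqrt(variance)

print(population_sd([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0 (mean 5, variance 4)
```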

Standard Error of Measurement:

    The standard deviation of an individual's observed scores from repeated administrations of a test (or parallel forms of a test) under identical conditions. Because such data cannot generally be collected, the standard error of measurement is usually estimated from group data.
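    One common classical test theory estimate from group data combines the group's standard deviation and a reliability coefficient as SEM = SD × √(1 − reliability). The sketch below implements that formula. The SD of 15 and reliability of .91 are hypothetical values chosen for illustration.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical estimate from group data: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

sem = standard_error_of_measurement(sd=15, reliability=0.91)
print(round(sem, 2))  # 4.5
```

    The estimate shrinks toward zero as reliability approaches 1.0, which matches the idea that a perfectly reliable test would give the same score on every administration.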

T-Score:

    A derived score on a scale having a mean score of 50 units and a standard deviation of 10 units.

Test-Retest Reliability:

    A reliability coefficient obtained by administering the same test a second time to the same group after a time interval and correlating the two sets of scores.

Test-Retest Reliability Coefficient:

    A type of reliability coefficient obtained by administering the same test a second time, after a short interval, and correlating the two sets of scores. "Same test" was originally understood to mean identical content, i.e., the same form; currently, however, the term "test-retest" is also used to describe the administration of different forms of the same test, in which case this reliability coefficient becomes the same as the alternate-form coefficient. In either case, (1) fluctuations over time and in the testing situation and (2) any effect of the first test upon the second are involved. When the time interval between the two testings is considerable, such as several months, a test-retest reliability coefficient reflects not only the consistency of measurement provided by the test but also the stability of the examinee trait being measured.

Validity:

    The degree to which accumulated evidence and theory support specific interpretations of test scores entailed by proposed uses of a test. The capacity of a measuring instrument to predict what it was designed to predict; stated most often in terms of the correlation between values on the instrument and measures of performance on some criterion.

Variance:

    A measure of variability; the average squared deviation from the mean; the square of the standard deviation.

Z-score:

    A type of standard score scale in which the mean equals zero and the standard deviation equals one unit for the group used in defining the scale.
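    A z-score is computed by subtracting the group mean from a raw score and dividing by the group standard deviation. The T-score defined earlier in this glossary is the linear transform T = 50 + 10z. The sketch below computes both. It assumes the defining group's S.D. is the population (divide-by-N) value, and the raw scores are illustrative.

```python
import statistics

def z_scores(scores):
    """Standardize each score: (score - group mean) / group S.D. (population form)."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        raise ValueError("z-scores are undefined when all scores are equal")
    return [(x - mean) / sd for x in scores]

def t_scores(scores):
    """Rescale z-scores to a mean of 50 and a standard deviation of 10."""
    return [50 + 10 * z for z in z_scores(scores)]

raw = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, S.D. 2
print(z_scores(raw))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(t_scores(raw))  # [35.0, 45.0, 45.0, 45.0, 50.0, 50.0, 60.0, 70.0]
```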





 
AGS Assessments are now part of Pearson's Assessment group, a business of Pearson Education.

© 2006, Pearson Education, Inc. or its affiliates. All rights reserved.