GLOSSARY OF FREQUENTLY USED TESTING TERMS

Assessment Framework: contains the specific design elements of all test questions and the Content Category Clusters. The ATOL assessment framework can be found here.

Achievement Levels: designate levels of student performance based on the professional judgment of a panel of educators. ATOL has three achievement levels: Below Mastery, Mastery, and Honors.

Constructed Response: a type of test item that requires the student to enter a digit, number, word or sentence.

Content Category Cluster: a group of related test questions, reported as the number of points earned in each group. These clusters, or strands, of related statements describe what students should know and be able to do.

Developmental Scale: a test scale designed to measure a student’s annual progress from one grade to the next on the same scale. The ATOL does not use a Developmental Scale Score, but the CELLA does.

Standards Based Test: a test that measures how well a student has learned specific content area skills.

Mean: the average of a group of scores.

Median: the midpoint of a group of scores. One half of the scores fall above the median; the other half fall below the median.
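
The Mean and Median definitions above can be illustrated with a short sketch. The scores are hypothetical, chosen only to show the arithmetic, and are not drawn from any ATOL or CELLA data.

```python
from statistics import mean, median

# Hypothetical scores for a group of six students (illustration only).
scores = [62, 75, 75, 81, 90, 97]

# Mean: the sum of the scores divided by the number of scores.
average = mean(scores)

# Median: the midpoint of the sorted scores. With an even number of
# scores, it is the average of the two middle values (75 and 81 here).
midpoint = median(scores)

# Half of the scores fall above the median and half fall below it.
above = sum(1 for s in scores if s > midpoint)
below = sum(1 for s in scores if s < midpoint)

print(average, midpoint, above, below)
```

In this example the mean (80) and the median (78) differ slightly, because the mean reflects every score while the median depends only on the middle of the distribution.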

Multiple Choice: a type of test question that presents a student with several options from which to choose the correct response.

Norm Referenced: a test that compares a student’s performance against how other students in a norm group performed on the test. Examples include off-the-shelf commercial tests such as the Stanford Achievement Test (SAT).

Open Response: see Constructed Response.

Percentile Rank: indicates the percentage of a reference group obtaining scores equal to or less than the score achieved by an individual. This rank indicates the relative standing of one student in comparison to students in the same grade who took the test.
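
A minimal sketch of the Percentile Rank calculation, following the "equal to or less than" wording of the definition above. The function name percentile_rank and the reference scores are hypothetical, and testing programs may use a slightly different convention (for example, counting only half of the tied scores).

```python
def percentile_rank(score, reference_scores):
    """Percentage of the reference group scoring at or below `score`."""
    if not reference_scores:
        raise ValueError("reference group must not be empty")
    at_or_below = sum(1 for s in reference_scores if s <= score)
    return 100.0 * at_or_below / len(reference_scores)


# Hypothetical reference group of ten students in the same grade.
reference = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

# Six of the ten scores are equal to or less than 80.
rank = percentile_rank(80, reference)
print(rank)
```

Here a student scoring 80 has a percentile rank of 60, meaning the student scored as well as or better than 60 percent of the reference group.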

Reliability: the quality of a test that indicates the degree to which a test consistently measures a particular trait. For example, reliability can be measured by administering a test to the same group of individuals on two different occasions. If the test is reliable, the results from both administrations will be consistent.
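
One common way to quantify the consistency between two administrations is the correlation between the two sets of scores. The sketch below assumes a Pearson correlation as that measure; the glossary does not specify a statistic, and the function name pearson_r and all scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cross / (spread_x * spread_y)


# Hypothetical scores for the same five students on two occasions.
first_administration = [70, 82, 65, 90, 77]
second_administration = [72, 80, 68, 91, 75]

r = pearson_r(first_administration, second_administration)
print(round(r, 3))
```

A value close to 1 indicates that students kept nearly the same relative standing across both administrations, which is what the definition describes as consistent results.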

Rubric: a set of rules used to evaluate a student’s response to a constructed response item.

Scale Score: a raw score that has been converted to a scale. Scale scores are suitable for comparisons across different test levels or test forms of the same subject area.
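
The idea of converting raw scores so that different forms can be compared might be sketched as a lookup table. Everything below is hypothetical: the form names, the raw scores, the scale values, and the function to_scale_score are invented for illustration, and actual conversions are set by the testing program.

```python
# Hypothetical raw-to-scale conversion tables for two forms of one test.
# Form B is assumed to be slightly harder, so the same raw score maps to
# a higher scale score than on Form A.
CONVERSION = {
    "Form A": {10: 300, 11: 310, 12: 320},
    "Form B": {10: 310, 11: 320, 12: 330},
}


def to_scale_score(form, raw_score):
    """Look up the scale score for a raw score on the given form."""
    return CONVERSION[form][raw_score]


# Different raw scores on different forms can represent the same
# level of performance once both are placed on the common scale.
score_a = to_scale_score("Form A", 12)
score_b = to_scale_score("Form B", 11)
print(score_a, score_b)
```

In this sketch, 12 raw points on Form A and 11 raw points on Form B both convert to a scale score of 320, so the two students can be compared directly even though they took different forms.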

Selected Response: see Multiple Choice.

Short Response Item: an item that requires students to write a response or show a solution.

Standardized Test: a test in which the directions, time limits, materials, and scoring procedures are designed to remain constant each time the test is administered in order to ensure comparability of scores. Standardized tests can be standards based, criterion referenced, or norm referenced. The ATOL and the CELLA are standardized.

Validity: the quality of a test that indicates the degree to which a test actually measures what it is intended to measure. In order for a test to be valid, it must first be reliable.