By Mika Hoffman, director of assessment at Michigan Language Assessment

In the field of assessment, the concept of validity refers to whether an assessment accurately measures the intended construct. In lay terms, a valid assessment measures what it’s supposed to measure.

A reading comprehension test is considered valid if answering correctly requires understanding the passage itself, rather than general knowledge of the passage’s topic.

Consider a challenging scientific passage about honey bee behavior. If the test asks where honey bees collect pollen and the answer is “flowers,” test takers who did not genuinely understand the text could still respond correctly based on their prior knowledge about bees and flowers. Ensuring that reading comprehension questions measure comprehension of the passage, and nothing else, is important; however, validity encompasses much more.

Validity and Decision Making

Ultimately, validity concerns the appropriateness of using test scores to inform specific decisions about individuals. 

Returning to the reading comprehension example: 

A passage requiring genuine comprehension but focused on a simple topic such as “My Family,” listing names and relationships, would be inadequate if it were the sole basis for university admission decisions. Many individuals, including beginning English learners, could perform well on such basic tasks but might still lack the skills required for university-level reading assignments.

While this example may be extreme, it highlights a key principle: validity is not a simple designation achieved by a specific score or certification. Instead, test users must assess how well a given score supports their specific decision-making process. Validity exists along a continuum; scores generally offer varying degrees of support rather than absolute validity or invalidity.

Appropriate and Inappropriate Uses of Scores 

Returning to the “My Family” example, a single-passage test might effectively help a teacher determine whether to proceed to the next instructional unit or review family-related vocabulary. It could also indicate individual students’ progress or need for extra support. However, such limited testing would not provide sufficient data for broader decisions, such as determining outcomes for a semester-long course, because a course of that length covers many topics and each of them would need to be assessed.

Consider a more realistic scenario: A reading comprehension test includes multiple passages covering various university-relevant topics, testing grammar, vocabulary, main ideas, details, and implicit concepts, with questions strictly requiring textual comprehension rather than general background knowledge. Would such a test provide valid data for university admission decisions?

The answer depends on scoring methods and score interpretations. Whether scores are reported as numbers (50-100) or as letter grades (A-F), it must be clear what each score says about the language proficiency needed for university study.

Scoring, Interpretation, and Validity

Test providers typically offer guidelines to interpret scores. Michigan Language Assessment, for instance, aligns scores to the Common European Framework of Reference (CEFR), which describes practical language competencies. Ensuring that scores accurately reflect the corresponding CEFR levels is vital, as is maintaining reliability, meaning consistent results for test takers of similar ability and across repeated test administrations. At Michigan Language Assessment, we do considerable work to make sure that the alignment between our scores and CEFR levels is accurate.

Decision-Making and Score Precision

For universities determining whether English learners require additional language instruction, understanding test scores in CEFR terms is beneficial. However, some institutions might need more precise benchmarks within CEFR levels. Scores that provide finer distinctions may thus enhance decision-making validity.
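As a rough illustration of why finer distinctions can matter, the short Python sketch below assumes a hypothetical 0-100 score scale. The cut scores, the applicant scores, and the institution's cut of 70 are all invented for this example and do not describe any real test, including ours.

```python
from bisect import bisect_right

# HYPOTHETICAL cut scores, invented for illustration only.
# Each pair is (lowest score in the band, band label).
# Scores are assumed to lie between 0 and 100.
CEFR_CUTS = [(0, "below B1"), (40, "B1"), (60, "B2"), (80, "C1")]


def cefr_band(score):
    """Return the band whose lower cut score is the highest one <= score."""
    lower_bounds = [cut for cut, _ in CEFR_CUTS]
    index = bisect_right(lower_bounds, score) - 1
    return CEFR_CUTS[index][1]


def needs_language_support(score, institution_cut):
    """Apply a hypothetical institution-specific cut on the same scale."""
    return score < institution_cut


# Two invented applicants who land in the same CEFR band.
applicants = {"Applicant A": 62, "Applicant B": 78}
for name, score in applicants.items():
    band = cefr_band(score)
    support = needs_language_support(score, institution_cut=70)
    print(f"{name}: score {score}, band {band}, needs support: {support}")
```

In this sketch both applicants are reported at the same CEFR level, so a decision based on the level alone would treat them identically. The scale score lets an institution place its own benchmark inside the level and separate the two.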

In conclusion, validity is a comprehensive concept: it depends on alignment among test content, scoring reliability, interpretive clarity, and the decision-making context in which the scores are used.