As the world becomes increasingly interconnected, the need for reliable and valid language assessments has never been more crucial. Behind the scenes of these high-stakes tests lies a complex and fascinating world, one that Mika Hoffman, Director of Assessment at Michigan Language Assessment, has navigated in several different contexts.

In a recent interview, Hoffman shared her insights into the evolving landscape of language assessment, the challenges of ensuring authenticity and fairness, and the unique approach that sets Michigan Language Assessment apart.

The Journey to Assessment

Hoffman’s journey into the field of language assessment began with a degree in linguistics, a passion for learning foreign languages, and a fascination with the underlying logic and rules of language. “Linguistics is all about what it means to learn a language and what underlies it,” she explains. “I was drawn to the precision of language, the ambiguity, and what things really mean.”

Initially contemplating a career teaching English as a second language, Hoffman spent a year teaching in France but soon realized that theoretical linguistics provided a better fit for her interests. A stint at Educational Testing Service (ETS) followed, where her passion for the precision of language found a perfect home in developing items for the GRE, GMAT, and TOEFL exams. Hoffman was then offered a position with the U.S. Department of Defense, where she remained for 11 years.

“The combination of looking at language, what it means, how to measure things, and how you know when somebody knows something: these are all very interesting questions for me,” Hoffman said.

Linking Past and Present

While testing professionals are always looking for ways to measure more effectively, it takes time for advances to take hold.

“Change does not always come quickly in the world of high-stakes assessment,” Hoffman said. “Part of the reason behind this is that it’s important for scores to be comparable.”

While change may not be rapid, it does happen. One of the key changes Hoffman has observed in the industry is the shift toward computer-based testing. Gone are the days of uniform paper tests; today, tests often involve complex algorithms that generate unique forms for each test taker, sometimes adapting to the individual’s responses during the test. She noted that automation has been introduced into every aspect of assessment. This evolution isn’t just about convenience; it enhances the accuracy and efficiency of assessments.

“The advent of computer-based testing has opened up new possibilities for the way you actually do the testing,” she said.

From linear on-the-fly testing (LOFT), which assembles a unique test form for each test taker, to computer-adaptive testing (CAT), which tailors the difficulty of questions to the individual test taker’s level, technology has transformed the landscape of language assessment.
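To make the adaptive idea concrete, here is a minimal sketch of a CAT loop under a one-parameter (Rasch) model. Everything in it is an illustrative assumption (the toy item bank, the simple up-and-down ability update, the step sizes); it is not a description of how Michigan Language Assessment or any particular test selects items.

```python
import math
import random

def rasch_prob(theta, b):
    """Probability that a test taker of ability theta answers an item
    of difficulty b correctly, under the one-parameter (Rasch) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pick_next_item(theta, item_bank, administered):
    """Pick the unadministered item whose difficulty is closest to the
    current ability estimate -- the most informative next question."""
    remaining = [item for item in item_bank if item["id"] not in administered]
    return min(remaining, key=lambda item: abs(item["b"] - theta))

def run_cat(item_bank, answer_fn, n_items=10):
    """Administer n_items adaptively: raise the ability estimate after a
    correct answer, lower it after an incorrect one, in shrinking steps."""
    theta, step = 0.0, 1.0
    administered = set()
    for _ in range(n_items):
        item = pick_next_item(theta, item_bank, administered)
        administered.add(item["id"])
        correct = answer_fn(item)        # deliver the item, observe response
        theta += step if correct else -step
        step = max(step * 0.7, 0.2)      # damp later updates
    return theta

# Toy usage: a bank of 25 items with difficulties from -3 to +3, and a
# simulated test taker whose true ability is 1.2.
bank = [{"id": n, "b": b / 4.0} for n, b in enumerate(range(-12, 13))]

def simulate(item):
    """Simulated responses from a test taker of true ability 1.2."""
    return random.random() < rasch_prob(1.2, item["b"])

print(round(run_cat(bank, simulate), 2))
```

Operational CAT engines typically replace the crude up-and-down update with maximum-likelihood or Bayesian ability estimation, but the selection principle, choosing the item most informative at the current estimate, is the same.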

However, with these advancements come new challenges. Hoffman emphasizes the importance of ensuring that the test-taking experience is seamless and intuitive, so that candidates can focus on demonstrating their language proficiency rather than navigating the technology.

“We have to be very careful about making changes, because we need to do all the statistical work to see whether the scores are going to mean the same or what we can do to give a number that is constant, even though the test underneath has changed,” Hoffman said.

Advancements in Computer-Based Testing

The shift toward computer-based testing has also brought about other changes, such as the possibility of remote testing and the development of new question types. Hoffman explained that remote testing has introduced new challenges around effective proctoring and identity verification, but the technology is constantly improving to address these concerns.

Additionally, the move to computer-based testing has opened up new possibilities for the types of tasks that can be included in the assessment.

For example, computer-based testing makes it possible to administer speaking tests via computer, providing greater flexibility and consistency. Human raters’ training and scoring reliability are easier to control when responses are recorded and managed centrally. Writing assessments also benefit from this shift, as the problem of illegible handwriting disappears when candidates type their responses.

Praising this shift, Hoffman mentioned, “We don’t have to worry about training people to be able to deliver the questions and do the rating at the same time. We can have a much more reliable rating because we’re getting a trained pool of raters who are scoring consistently, and not just every three or four months in one big round.”

Maintaining Authenticity in Testing

The huge increase in the availability of recorded sources raises the question of using authentic materials on tests.

Achieving authenticity in language assessment involves creating tasks that genuinely reflect real-world language use. Hoffman explained that assessing listening comprehension, for example, demands careful balancing. While authentic audio from the Internet or on-the-street recordings may reflect language as it is actually used, such recordings often lack the context that allows them to be used fairly in an assessment. Thus, Michigan Language Assessment recreates these authentic sources in a way that ensures fairness and comprehensibility while preserving their real-world essence.

“When we present a listening passage or reading passage, we put everything through a lot of layers of review. One of the things our reviewers ask is, ‘Is this the kind of conversation that could actually happen in real life?’” Hoffman said.

For speaking and writing tests, contextual relevance is paramount. Tasks are designed to simulate real-life scenarios test takers are likely to encounter, such as writing a proposal or having a conversation about everyday topics. This approach not only gauges mechanical grammar and vocabulary skills but also measures practical language use.

Measuring Difficulty and Proficiency Levels

When determining the difficulty of test items, Michigan Language Assessment employs both straightforward and sophisticated statistical methods. Checking the percentage of correct answers among a diverse group of test takers provides initial insights. Item Response Theory (IRT), a more advanced method, takes into account each participant’s overall proficiency and response pattern, ensuring a more nuanced understanding of item difficulty.

“When we in the testing business talk about difficulty, we talk about difficulty as a purely objective statistic, as in, is it hard for people to get the correct answer?” Hoffman said.
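The two views of difficulty she distinguishes can be illustrated with a small sketch. The response matrix below is invented for demonstration, and the one-parameter Rasch model stands in for the fuller IRT machinery; in practice, model parameters are calibrated on large samples rather than assumed.

```python
import math

# Five test takers' scored responses to the same five items (1 = correct).
responses = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 1],
]

# Classical difficulty: an item's "p-value" is simply the proportion of
# test takers who answered it correctly (higher = easier).
n_takers = len(responses)
p_values = [sum(row[j] for row in responses) / n_takers
            for j in range(len(responses[0]))]
print("p-values per item:", p_values)   # first item -> 0.8 (easy)

# IRT (Rasch) view: the probability of a correct answer depends on the gap
# between a person's ability theta and the item's difficulty b, so item
# difficulty is estimated jointly with every test taker's ability rather
# than read straight off the raw percentages.
def rasch_prob(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

print(rasch_prob(theta=0.5, b=-0.3))    # able person, easy item -> ~0.69
```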

However, a question’s difficulty is not its only important characteristic. It also needs to be considered in terms of the level of language proficiency required to answer it correctly, and making this determination involves expert judgment, not just raw statistics.

Drawing on descriptors from standards like the Common European Framework of Reference for Languages (CEFR) and ensuring that raters interpret them consistently is a meticulous process. Regular norming sessions and comprehensive training ensure that raters maintain uniformity in their judgments, which is crucial for the credibility of the test.

“The level, as opposed to difficulty, is a really complex measurement, and that is what takes people lots of training and constant norming discussions,” Hoffman said.
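The interview does not say which statistics are used to monitor this, but rater consistency of the kind norming sessions aim for is commonly quantified with chance-corrected agreement measures such as Cohen’s kappa. A minimal sketch, with invented ratings:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters beyond chance: 1.0 is perfect
    agreement, 0.0 is no better than chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[lvl] * counts_b[lvl] for lvl in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two raters assigning CEFR levels to the same ten writing samples.
a = ["B1", "B2", "B2", "C1", "B1", "B2", "A2", "B2", "C1", "B1"]
b = ["B1", "B2", "B1", "C1", "B1", "B2", "A2", "B2", "B2", "B1"]
print(round(cohens_kappa(a, b), 2))     # ~0.71: substantial agreement
```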

Validity and Effective Use

In high-stakes language testing, ensuring validity revolves around the appropriate use of test scores. Validity isn’t a binary; it’s about the degree to which test scores produce meaningful, actionable insights about a test taker’s language abilities. Hoffman emphasized, for example, that a test designed to measure general proficiency will not provide information about mastery of specialized professional jargon.

“Some people often talk about whether a test is valid, and we, in assessment, prefer to talk about the degree of validity of test score use,” Hoffman clarified.

Michigan Language Assessment is committed to transparency and clarity in score reporting to help stakeholders make informed decisions. Offering resources like research reports and user guides, the organization facilitates a deeper understanding of what specific scores represent.

Continuous Improvement and Stakeholder Feedback

Stakeholder feedback is crucial in refining the testing experience. Whether it’s simplifying the digital interface for ease of navigation or addressing misinterpretations of scores, listening to and acting on user feedback ensures the tests remain relevant and fair. While structural changes to test content require thorough consideration and time, responding to usability concerns can significantly enhance the overall test-taking experience.

Hoffman reflected on this aspect: “We can talk about how we scale our scores, and we can provide better information about how to interpret the scores. It’s usually a bit trickier when we get suggestions about things that should actually be tested, as we need to ensure consistency over time.”

Advice for Aspiring Professionals

For those interested in a career in high-stakes assessment, Hoffman suggests starting with foundational knowledge. This includes gaining an understanding of statistics and exploring research reports from various testing organizations. Internships provide practical insights and can help determine whether this field is a good fit.

“Take statistics and read some of the research reports that testing companies have up on their website,” Hoffman advised.

While it may not be a career many actively seek out initially, the lure lies in the central question of any assessment: how to reliably measure what someone knows. This challenge drives continuous innovation and learning.

Final Thoughts

Reflecting on her time at Michigan Language Assessment, Hoffman highlighted the organization’s unique commitment to empathy. Ensuring tests fairly reflect every individual’s true capabilities, acknowledging diverse backgrounds, and maintaining constant communication with stakeholders are all prominent parts of the team’s mission.

“One of the great values of Michigan Language Assessment is that we really see everybody as human. We’re constantly asking, ‘Is this going to be fair to our test takers?’” Hoffman said.

By treating each test taker and stakeholder with care and consideration, Michigan Language Assessment advances the field of language assessment, contributing to more accurate and fair measures of language proficiency.

If you are intrigued by the world of language assessment and want to understand more about it, Michigan Language Assessment offers a wealth of resources in the research section of our website.