Assessment Validity

John Winkley, AlphaPlus’ Director of Sales, shares his thoughts on assessment validity.

In this short blog, I want to talk about two aspects of assessment that have come up in discussions with clients at this most difficult of times, and have reminded me of my reasons for taking up assessment as a career, 15 years ago. Back in the 2000s, I was working at BTL, helping develop on-screen testing systems. I found I was increasingly interested in the tests that ran on those systems – why were they designed the way they were? What distinguished a good test from a bad one? I joined AlphaPlus to learn more about the answers to those questions. To this day, AlphaPlus’ motto is “We help organisations make their educational assessments better”.

I think there is a commonly held view among the general public that good assessments rely largely on writing really good questions. Now that tests are often offered on demand, I think there is also public recognition of the importance of content balance across the many parallel test forms that support a rolling test programme, and perhaps even of the importance of test statistics. It’s hard to argue with many of those points.

However, as an assessment industry, I think we’ve done a poor job of raising public awareness of the bigger issues in assessment. When I started working in assessment, there was relatively little mention of “validity” as part of day-to-day assessment design. Thanks to the work of Ofqual and others, validity-driven design underpins many of today’s assessments. Nevertheless, in our work at AlphaPlus, we still encounter validity issues in high-stakes assessments. Validity is about the fitness for purpose of an assessment: how much can we trust its results when we use them for a particular purpose, such as deciding who passes and fails an entry test to a profession, or producing a rank order of candidates taking a test for awarding grades? We know that deciding on the purpose of a test is really important, because test designers can’t make tests do everything.

Take, for example, a test designed for entry to a profession (we work on quite a few of these). The purpose here is usually clear: don’t pass any candidates who don’t meet the minimum standard of competence. The work that follows involves various approaches to identifying and articulating what minimum competence actually is (not always easy in this complex human world), and then writing test questions (or tasks) that differentiate effectively at that boundary. Such a test will have relatively little to challenge the most capable candidates, and the very weakest, those who shouldn’t really be taking the test yet, may score almost no marks. Nevertheless, we frequently have conversations about whether the results of such a test could be used for some form of rank order (identifying the very best candidates, for example), despite the fact that there’s not much clarity about what “best” means.

There’s nothing wrong in principle with assessments serving more than one purpose. But not all purposes are compatible. As an assessment industry today, I think we understand this well, and our test design and assurance work has improved significantly over recent decades: we serve the needs of the whole candidature better, reflecting the diversity of our society and better meeting the needs of those with access requirements, often driven by the more detailed and easier-to-access data coming from our testing systems. But we haven’t done a great job of communicating what matters in assessments to our stakeholders. We still frequently encounter decision makers choosing what to do with assessments without what we assessment professionals would consider an adequate understanding of what the tests can do (and, more importantly, what they can’t). In engineering terms, tests are often pushed beyond their design spec because the nature and importance of that design spec isn’t known to the decision makers. “What we’ve got here is failure to communicate”: I see this largely as a challenge for assessment professionals, to make sure our voice is heard in the places that matter.

As you might expect, AlphaPlus is currently working with a number of awarding organisations within the school sector to support this summer’s awarding of what Ofqual terms “calculated results” (results based on partial completion and/or statistical and other predictions of outcomes). In these extraordinary times, this type of work is clearly essential, although the creation of those calculated results will be quite challenging in many cases, as will explaining the extent to which they are to be trusted. Two things have puzzled me about the way this important work is being specified and delivered. Firstly, the public debate has been quite limited: for example, whether or not students should aim to “resit” GCSEs in the first part of year 12; what universities should do differently (if anything) regarding entry decisions; and how the 2020 assessment arrangements are likely to affect different parts of our school communities, such as disadvantaged students, male and female students, and the late bloomers versus the steady workers. There is also the question of whether not requiring “resitting” effectively changes the purpose of (for example) GCSEs and the concepts of fairness around them, including their use as the driver of accountability measures in secondary schools. Perhaps the lack of debate demonstrates a deep trust in the national systems at times of crisis; I hope it does. I also think the debate has focused on “the system” and what will be done at national level, rather than on the options, needs, expectations and rights of individual learners.

What has also puzzled me in this limited public debate is the lack of assessment voices: professionals in the assessment community explaining to the public what they’ll be doing and why it’s for the best. The debate has largely been taken up by politicians and school leaders. Maybe I’m not looking in the right places, but I would like to hear more from the national leaders of assessment at this time, particularly on how we’ll ensure the system doesn’t just respond to the crisis by keeping the large educational wheels turning, but also ensures that individual learners aren’t disadvantaged. As I mentioned earlier, AlphaPlus’ motto is “We help organisations make their educational assessments better”; I’d like to think “better” means better for candidates, first and foremost.