Inclusion & accessibility in assessment design

No matter who is sitting an assessment – whether they are learners in a school or candidates for a professional assessment – it is crucial that inclusion and accessibility are considered throughout the assessment development process. As assessment developers, we must ensure that the assessment tests candidates’ ability in the assessment topic, and not their ability to access the assessment or other extraneous factors such as reading ability or cultural knowledge.

Equality, diversity and inclusion within a validity-driven assessment design approach

Validity is about ensuring that an assessment properly and fully measures the knowledge and skills it is supposed to and nothing else. Equality, diversity and inclusion (EDI) presents particular challenges to validity because:

  1. At a fundamental level, failure to avoid discrimination on the grounds of any protected characteristic is illegal and a major validity threat. While avoiding this type of discrimination is now relatively routine, we need to ensure the necessary checks of content are made as part of the editing process. Many clients we work with are concerned about the fairness of their assessments for the highly diverse candidature that takes them. Statistical measures can identify potential unfairness, but only after the tests have been taken – good design from the outset minimises the incidence of unfairness at the point the tests are written.
  2. Context is important in setting questions/tasks that are sufficiently challenging and discriminating (particularly around the pass threshold, in the case of minimum competence assessments). Learners come from diverse cultural backgrounds, creating a risk that candidates struggle to fully understand the context of a question, which may prevent them from showing their capability. These issues need to be balanced carefully against the requirement, included in many competency frameworks, that candidates must be able to apply their knowledge to unfamiliar circumstances.
  3. Similar challenges apply to language. Where tests are not supposed to assess the ability to speak or read English (i.e. that capability is not part of the test specification), care must be taken that tests do not become proxy English tests and that slower readers are not disadvantaged by time limits.

Validity-based development aims to explicitly provide detailed, defensible evidence to show an assessment can be trusted for its intended purpose, and this evidence is produced in parallel with the assessment development process itself. A key feature is that with each subsequent step in development and delivery, validity can only be preserved – it cannot be “improved downstream”. Careful evidence-based design in the early stages is therefore critical. We achieve EDI in our projects as follows:

  • A diverse and representative approach to the domain (not just one author writing alone for a subtopic, but a team effort with review at every stage) plus EDI monitoring and (if necessary) targeted recruitment for author and editorial teams
  • Strong, effective and meaningful training at the outset in what EDI means in the context of the project, ensuring the team is aware of all public-duty legal requirements. These legal requirements are only a basic framework, and our ambitions for EDI in the assessments are typically higher.
  • Assessment design training:
    • addresses (1) above (the law, protected characteristics and how these manifest in test content) and (2) (context) by establishing a range of diverse and relatively neutral contexts to be used within the assessments.
    • addresses (3) above by establishing the formal reading and speaking level for the assessment (a minimal sketch of an automated reading-level check is shown after this list)
  • Defining content checking protocols to be followed by the buddy-pair group and content advisory group
  • Supporting a best-practice approach to Special Access Requirements, i.e. providing tests in accessible formats.
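
To illustrate how a formal reading level can be checked in practice, the sketch below tests a draft item against a target level. It is a minimal example only, assuming the third-party Python package textstat; the target grade and the sample item are invented for illustration rather than taken from any project.

import textstat

TARGET_GRADE = 6  # illustrative target reading level, not a project value


def check_reading_level(item_text: str, target_grade: float = TARGET_GRADE) -> bool:
    """Return True if the item's estimated reading level is at or below the target."""
    grade = textstat.flesch_kincaid_grade(item_text)
    print(f"Flesch-Kincaid grade: {grade:.1f} (target <= {target_grade})")
    return grade <= target_grade


if __name__ == "__main__":
    item = ("A shop sells pencils in packs of 12. "
            "How many pencils are in 5 packs?")
    if not check_reading_level(item):
        print("Refer item back to the author: reading level above target.")

A check like this sits alongside, not in place of, editorial review, since readability formulas say nothing about cultural context or ambiguity.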

Inclusivity in terms of accessibility

Firstly, the content development should follow universal design principles and be written from the start to be as inclusive as possible. Factors here can sometimes be in tension: for example, it is good practice to use images rather than text where possible to help learners with disabilities such as dyslexia, but images are more difficult for visually impaired learners. Universal design principles also mean that questions should be as short, clear and to-the-point as possible.

Equal access to the full curriculum wherever possible is an important part of inclusivity. In reading assessments, this may mean that all reading materials are also provided in Braille (the medium in which severely visually impaired or blind learners access the curriculum). This may be done using Braille devices, such as a BrailleNote Touch, or by providing booklets of Braille materials. In some areas of numeracy, for example data handling, reading information from charts is a significant element of the curriculum. It is generally not possible for visually impaired learners to do this on screen (although some Braille devices support it to a degree), so in some cases it may be necessary to provide hard-copy tactile diagrams.

Although we design in accessibility, both in the content and the technology, we find there is no substitute for proofreading accessibility – at both content and UI level – by real accessibility experts. In the Welsh national adaptive assessments (below), we worked closely with WAVIE (Welsh Association for Visually Impaired Educators) throughout the project and conducted trials in a number of special schools with learners with a range of behavioural and physical disabilities.

Accessibility in computer-based testing

On-screen testing can provide an opportunity to make assessments more accessible than on paper (e.g. for learners with learning disabilities), but only if the assessments are designed correctly. For example, we have worked with special schools in the development of the tests for Wales, and our trialling concluded that presenting one question on screen at a time, with a clean interface, was less distracting for some learners than a double-page spread of questions in a paper-based booklet.

The assessments must be designed to:

  • Be as short as possible while still giving an adequately reliable judgement about the student in all the necessary areas
  • Be as simple to use as possible so that the process itself is straightforward – this includes the processes for logging in and accessing results, as well as the assessment itself.  In some cases, validity reviews look at the broader processes by which candidates are able to access the assessments (public information, hurdles to be overcome prior to the assessment, accessibility of assessment locations, etc.)
  • Include questions that are straightforward to access beyond the challenge intrinsic to the particular skill being assessed, e.g. reading skills should not be a burden in assessing numeracy, and high-level IT skills should not be a requirement for accessing online assessments. Because almost all tests require a degree of reading, and all on-screen tests require an element of IT competency, we find that tests generally specify a minimum reading level and IT competency as part of the assessment requirements/competency framework.
  • Work on devices that the students are familiar with, e.g. if a student regularly uses a tablet in their classroom practice, they should be assessed on tablet.

Images can aid engagement, but they can also make assessments less accessible if they take up too much of the screen and force scrolling; they should only be used where necessary.

Developing evidential proof of EDI

The above approaches are essentially “recipe ingredients” – activities designed to ensure the tests are fair. But the “proof of the pudding” is in how the tests perform in practice. There are various sources of EDI evidence outlined in our validity framework, but two key measures are:

  1. Statistical analysis (as part of wider psychometric test performance management) to discover any evidence of EDI bias in the tests: it is possible to isolate differences in candidates’ performance that are due to factors outside the test’s scope. This Differential Item Functioning (DIF) analysis, along with other analyses, represents international best practice in demonstrating test fairness, and we undertake DIF analysis for a number of clients. In reality, we often find that where an item shows evidence of differential performance, it is difficult on review to identify any element of the item that suggests bias. This reflects the reality of inclusive design – the design inputs should consider EDI carefully, but there is no guarantee they will work fully, or that any problems can be clearly identified or fixed once the tests are in use. A minimal sketch of one common DIF check appears after this list.
  2. Candidate post-test surveys, which can provide valuable insight.
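
As an illustration of the first measure, the sketch below implements a Mantel-Haenszel check, one common DIF statistic, for a single dichotomously scored item, matching candidates on banded total score. The data layout and column names (group, item_correct, total_score) are assumptions for the example; the source does not specify which DIF statistic is used for any particular client.

import pandas as pd


def mantel_haenszel_dif(df: pd.DataFrame, n_bands: int = 5):
    """Return the MH pooled odds ratio and chi-square for a focal vs reference
    group on one item, stratifying candidates by banded total score."""
    df = df.copy()
    df["band"] = pd.qcut(df["total_score"], n_bands, duplicates="drop")

    num_or = den_or = 0.0        # pooled odds-ratio numerator / denominator
    obs_a = exp_a = var_a = 0.0  # components of the MH chi-square

    for _, band in df.groupby("band", observed=True):
        ref = band[band["group"] == "reference"]
        foc = band[band["group"] == "focal"]
        a = (ref["item_correct"] == 1).sum()  # reference, correct
        b = (ref["item_correct"] == 0).sum()  # reference, incorrect
        c = (foc["item_correct"] == 1).sum()  # focal, correct
        d = (foc["item_correct"] == 0).sum()  # focal, incorrect
        n = a + b + c + d
        if n < 2:
            continue  # skip near-empty score bands
        num_or += a * d / n
        den_or += b * c / n
        obs_a += a
        exp_a += (a + b) * (a + c) / n
        var_a += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))

    if den_or == 0 or var_a == 0:
        raise ValueError("Insufficient data for a Mantel-Haenszel estimate")
    odds_ratio = num_or / den_or
    chi_square = (abs(obs_a - exp_a) - 0.5) ** 2 / var_a  # continuity-corrected
    return odds_ratio, chi_square

A pooled odds ratio close to 1 and a chi-square below the 5% critical value for one degree of freedom (about 3.84) suggest no evidence of DIF on that item; flagged items then go to content review, where (as noted above) a clear cause is often hard to find.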

This is one part of our recommended validity-based test development approach – designing EDI in, and then proving (as far as possible) that the design delivers EDI in practice.

Examples of AlphaPlus work on inclusion

Solicitors Regulation Authority (SRA)

Diversity in the legal profession has received much attention, motivated primarily by concerns about equality and access to talent, and in response to pressure from policymakers and media. The SRA embarked on a major consultation programme to introduce a common assessment that all trainee solicitors would take before qualifying (including UK and internationally trained candidates): the Solicitors Qualifying Examination (SQE).

AlphaPlus undertook two major pieces of work for this reform:

  • A logic-model-based approach to capturing the outcomes the reform aims to achieve (i.e. how the new programme will deliver better EDI) and the mechanisms for doing so. This involved a large validity study and test design recommendations. Report: https://www.sra.org.uk/documents/SRA/research/Alphaplus.pdf

Both elements of this work had EDI, particularly ethnic and cultural diversity (within the context of test validity and reliability), at their heart.

Wales National Test Programme

In AlphaPlus’ current work with the Welsh Government on the Welsh national personalised assessments, we have made sure that the assessments have been designed with learners’ access in mind.

Once these national tests move online, all of the assessments are delivered on-screen – there are no paper versions, no large-font versions and no fully Braille-printed tests (although we do provide some Braille booklets to support the on-screen assessments). Every child who is cognitively able to take the assessment takes it on-screen, with access support where needed, either in the form of technology or from an adult.

The Welsh assessments are fully WCAG 2.0 AA compliant and accredited by the Digital Accessibility Centre. We place accessibility experts within the software development teams and work closely with a partner organisation, Pia, in preparing test content.

More information on accessibility in the Welsh assessments can be found here.

BAME work for GPhC

AlphaPlus works with professional examining bodies that are often concerned about scoring differences between candidate subgroups (especially overseas candidates). Working with GPhC, we have conducted regression modelling to identify background factors that may explain scoring on professional examinations; this work has been presented at the European Board of Medical Assessors conference. Multiple regression analysis of this type (a minimal sketch appears after the list below) allows us to:

  • Identify background factors which are significantly associated with scoring.
  • Compare relative effects of significant factors, one to another.
  • Quantify the extent to which the model explains the variance in the data (or whether other factors, not captured by our variables, may explain variance in scores).
  • Quantify how accurately a pass or fail can be predicted from the background variables in our model.
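
The sketch below shows the shape of such a model using statsmodels. The background variables (training route, first language, number of previous attempts) and the pass/fail flag are illustrative assumptions, not the factors actually used in the GPhC work.

import pandas as pd
import statsmodels.formula.api as smf


def fit_background_models(df: pd.DataFrame):
    """Fit a linear model of score and a logistic model of pass/fail on
    candidates' background variables."""
    # Linear model: which background factors are significantly associated with
    # score, what are their relative effects, and how much variance is
    # explained (R-squared)?
    ols = smf.ols(
        "score ~ C(training_route) + C(first_language) + attempts", data=df
    ).fit()
    print(ols.summary())

    # Logistic model: how well can a pass (passed = 1) or fail (passed = 0)
    # be predicted from the same background variables?
    logit = smf.logit(
        "passed ~ C(training_route) + C(first_language) + attempts", data=df
    ).fit()
    print(logit.summary())
    return ols, logit

The OLS output speaks to the first three points (significance, relative effect sizes and R-squared); the logistic model speaks to the last, and comparing its predictions with actual outcomes on held-out candidates quantifies how well pass/fail can be predicted.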