Developing valid questions

GL Assessment has a track record of over 35 years of designing and developing valid, age-appropriate tests and assessment items (questions). Our admissions tests are typically constructed from the following subject areas:

  • Verbal reasoning
  • Non-verbal reasoning
  • Spatial reasoning
  • English
  • Mathematics

Designing, developing and trialling test items that are valid and reliable is critically important to the admissions testing process, so that schools, parents and candidates can trust the outcomes of these tests.

We are painstaking in our approach to item development: the process takes a minimum of 15 months before an item is finally verified and passed as ready for use in tests.

We develop well over 2,000 new items annually, including new item types; each of the subject areas listed above has a range of item types. Initially, we specify the items required and then commission their development from leading experts in the field.

The items are scrutinised for appropriateness and bias. We work hard to remove bias, including gender, socio-economic and ethnic bias, and to confirm that the test items are age-appropriate. In addition, there are quality control checks to ensure that the item works and that it has construct validity, i.e. that it measures what it is designed to measure. For example, if there are distractors (alternative answers), how plausible are they? Is there only one possible correct answer? How similar are the answer and the distractors? Once two independent reviewers have checked the item, it is ready for the next stage.

The final step in ensuring that items are valid is trialling. For admissions tests, the trials are undertaken by Year 7 pupils to see whether, in practice, items differentiate effectively, avoid gender bias and are pitched correctly for the age group. The statistics from the trial are then analysed at item level: for example, facility (difficulty), discrimination (the extent to which a question separates overall strong candidates from overall weak ones), gender bias and distractor analysis. Questions are then used only if they pass two industry-standard statistical methods, Classical Item Analysis and Item Response Theory.
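To make two of these statistics concrete, here is a minimal sketch of how facility and discrimination can be computed in a classical item analysis. It is an illustration only and does not represent GL Assessment's actual analysis: the function names, the trial data and the choice of a point-biserial correlation against the rest-of-test score as the discrimination index are all assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def item_statistics(responses):
    """Compute facility and discrimination for each item.

    responses: one list per candidate, holding 1 (correct) or 0 (incorrect)
    for each item. Hypothetical data layout, assumed for this sketch.
    """
    n_items = len(responses[0])
    stats = []
    for i in range(n_items):
        item_scores = [row[i] for row in responses]
        # Total on the remaining items, so the item is not correlated with itself.
        rest_totals = [sum(row) - row[i] for row in responses]
        stats.append({
            "item": i,
            # Facility: proportion of candidates answering correctly.
            "facility": sum(item_scores) / len(item_scores),
            # Discrimination: point-biserial correlation with the rest score.
            "discrimination": pearson(item_scores, rest_totals),
        })
    return stats


# Made-up trial results for four candidates and four items.
TRIAL = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
]

if __name__ == "__main__":
    for row in item_statistics(TRIAL):
        print(row)
```

In this made-up data the last item is answered correctly by every candidate, so its facility is 1.0 and it cannot separate strong candidates from weak ones; an item like that would normally be flagged for review in a trial.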

Once passed, the items are uploaded to our secure item database, from which new tests are constructed. The item database currently contains in excess of 25,000 items. We systematically review the database to ensure the items remain age-appropriate for children at the start of Year 6 and, whenever the National Curriculum is revised, in line with its expectations.

So you can see that developing robust items is a complex process, and one that is undertaken with the utmost care and rigour to ensure consistency, validity and reliability.