Practitioner Toolkit: Working with Adult English Language Learners

Assessment Validity, Reliability, and Appropriateness

The assessments used for program accountability must be valid, reliable, and appropriate. This requirement has raised an important question for the field: What features make an assessment valid, reliable, and appropriate? (For a more detailed discussion of these issues, see American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999.)

What makes an assessment valid?

An assessment is valid when the test, or other instrument, assesses what it is intended to measure, and when the assessment results are used only for the purposes for which the instrument was designed (Messick, 1989). This view takes into account both the validity of the test itself and the use of the test scores; a test's validity depends on what it is used for, in what contexts, and for what purposes. For assessments used to fulfill NRS requirements, the following questions are important:

Does learner performance match the NRS descriptors?
How well does the test demonstrate learner progress?
How indicative of program quality are learner performances on the assessment?

Any assessment used for NRS purposes is valid only if the inferences made about the learners on the basis of the test scores can be related to the NRS descriptors, that is, to what the learners can do (their proficiency). The assessment also must be sensitive enough to learner gains to show progress, since the quality of programs is to be judged by learner performance on the assessment.

What makes an assessment reliable?

An assessment is reliable if scores are consistent when the test is repeated with the same individuals or groups. For example, if a learner takes a test once, takes it again an hour later, and perhaps takes it a third time an hour after that, the learner should get about the same score each time, provided nothing else has changed.
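
One common way to put a number on this kind of consistency is to correlate the scores from two sittings of the same test (often called test-retest reliability). The sketch below is only an illustration, not a procedure prescribed by the NRS or by any test developer: the function name `pearson_r` and the learner scores are hypothetical, and the Pearson correlation is just one of several statistics that could be used. A value close to 1.0 suggests that learners were ranked and scored consistently across the two sittings.

```python
from math import sqrt


def pearson_r(first, second):
    """Return the Pearson correlation between two lists of paired scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two score lists of equal length (at least 2 learners)")
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    # Sum of the products of each learner's deviations from the two means.
    covariance = sum((a - mean_first) * (b - mean_second)
                     for a, b in zip(first, second))
    spread_first = sum((a - mean_first) ** 2 for a in first)
    spread_second = sum((b - mean_second) ** 2 for b in second)
    if spread_first == 0 or spread_second == 0:
        raise ValueError("scores must vary within each sitting")
    return covariance / sqrt(spread_first * spread_second)


# Hypothetical scores for five learners who took the same test twice,
# an hour apart, with nothing else changed between sittings.
first_sitting = [210, 225, 198, 240, 217]
second_sitting = [212, 223, 201, 238, 215]

r = pearson_r(first_sitting, second_sitting)
print(f"Test-retest correlation: {r:.3f}")
```

A high correlation alone does not show that the test is valid or appropriate; it only indicates that the scores are stable under repeated administration.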

Test reliability can be affected by a number of factors: the test itself, the test administrator, the person who does the scoring, the testing procedures, the conditions under which the test is administered, or even the examinee. For example, an examinee might be feeling great the day of the pre-test but facing a family crisis on the day of the post-test.

Who has responsibility for ensuring that an assessment is reliable? The developers of the assessment must demonstrate that reliability can be achieved. Program staff using the assessment must administer it in the ways it is designed to be administered. Programs need to train the individuals who will administer the test so that it will be administered appropriately each time it is used, and they need to monitor its administration and scoring. Programs also must ensure that enough time (or hours of instruction) has passed for learners to show gains.