Resources
Glossary

| Term | Definition |
| --- | --- |
| Accountability | Responsibility for educational outcomes; these outcomes are often measured through standardized testing |
| Achievement test | A test that measures how well a student has reached the objectives of a specific course or program |
| ACTFL proficiency levels | Guidelines developed by the American Council on the Teaching of Foreign Languages (ACTFL) that describe language performance |
| Alternative assessment | Non-traditional forms of assessment; may include portfolios, observations, work samples, or group projects |
| Analytic scoring | Method of scoring or rating that assigns separate scores for different aspects of a student’s performance |
| Aptitude test | Test which measures a student’s talent for learning language; predicts future performance |
| Assessment | An ongoing process of setting clear goals for student learning and measuring progress toward these goals |
| Assessment literacy | Knowledge about and a thorough understanding of myriad assessment practices, especially by educators |
| Authenticity | How well a test reflects real-life situations |
| Cloze test | Test that measures comprehension by asking students to fill in missing words from a passage |
| Computer-adaptive test | Computer-based test that adapts to the test-taker’s performance and presents easier or more difficult tasks based on previous answers |
| Construct | What a test measures |
| Construct validity | How well a test measures what it is supposed to measure |
| Content validity | How well the content of a test reflects the construct that the test is measuring |
| Criterion-referenced | Scores interpreted with respect to standards or a theory of language; everyone can get a high score |
| Cutoff score | On a criterion-referenced test, the minimum score a student must receive to demonstrate a determined level |
| Diagnostic test | Test that identifies a student’s strengths and weaknesses |
| Direct testing | Testing method that closely matches the construct being measured |
| Discrete test | Test focused on specific language skills |
| Evaluation | Making decisions based on the results of assessment |
| Face validity | Non-technical term that refers to how fair, reasonable, and authentic people perceive a test to be |
| Formative assessment | An assessment used during the course of instruction to provide feedback to the teacher and learner about the learner’s progress toward desired educational outcomes; the results of formative assessments are often used in planning subsequent instruction |
| High-stakes test | Assessment that is used to make critical decisions with consequences for one or more stakeholders in the assessment process; an admissions test that determines the course of a student’s academic future and a test used for accountability and linked to funding are both examples of high-stakes tests |
| Holistic scoring | Method of rating an assessment based on general descriptions of performance at specified levels; while a holistic scoring rubric may take into account performance along several dimensions (e.g., fluency, grammatical accuracy, and word choice for oral language), one overall score that best represents the examinee’s performance is assigned |
| Impact | The positive or negative effects of testing |
| Indirect testing | A method of testing that measures abilities related to the construct being tested, rather than the construct itself |
| Input | The materials (presented aurally and visually) that an examinee receives as part of the test tasks |
| Integrative test | Test that addresses multiple language skills, sometimes in the same task |
| Multiple choice test | Test in which examinees demonstrate knowledge, skill, or ability by selecting a response from a list of possible answers |
| Needs assessment | Inquiry into the current state of knowledge, resources, or practice with the intent of taking action, making a decision, or providing a service with the results |
| Norm-referenced | Scores interpreted with respect to other examinees; some must score high, some low |
| Off-the-shelf | Commercially available test which can be purchased by an educational institution or individual user and administered at the discretion of the individual user |
| Parallel forms | Two or more tests with different questions that measure the same underlying skill and whose difficulty levels have been determined to be equivalent; scores from parallel versions of a test can be compared with one another |
| Percentile | Range of measures from 1 to 99 used to compare examinees with one another; an examinee who scored in the 80th percentile placed higher than 80% of test takers |
| Performance assessment | Assessment which requires the examinee to demonstrate knowledge or skill through activities that are often direct, active, and hands-on, such as giving a speech, performing a skit, or producing an artistic product |
| Placement test | Test whose results are used to assign students to classes designed for learners at a particular level |
| Practicality | Feasibility of a test given the available materials, funding, time, expertise, and staff |
| Proficiency test | Test of ability in a defined area of language; the area may be narrowly defined (e.g., English for airline pilots) or broader (e.g., social and academic language). Proficiency tests are not tied to a specific curriculum or course and are often contrasted with achievement tests |
| Program evaluation | Process of collecting data from multiple sources about an instructional program or intervention and making a decision about the success of the program based on this information; the evaluation could target both the process and outcomes of the program |
| Raw score | Student’s total number of correct responses on a test |
| Reliability | Consistency of scores/results |
| Scale score | Score that allows test results to be compared across students; in standardized testing, raw scores are often converted to scale scores |
| Scoring method | Describes how scoring is accomplished (e.g., machine-scored, hand-scored, centrally scored, locally scored) |
| Scoring process | Describes the procedures used to obtain a test score, e.g., counting the number correct or scoring holistically or analytically according to established guidelines, a scale, or a rubric |
| Self-assessment | Personal rating of language ability according to specified criteria |
| Skills test | Test focusing on a specific domain of language use, e.g., listening, reading, writing, or speaking (interactive or presentational) |
| Stakeholders | Persons involved with or invested in the testing process, e.g., test takers, administrators, parents, and teachers/instructors |
| Standardized test | Test with fixed content and equivalent parallel forms that is administered and scored under standard conditions and has been field-tested for validity and reliability |
| Subscore | Score that represents student performance in a particular domain or part of a test |
| Summative assessment | Outcome-based use of assessments, often for decisions such as grading, program evaluation, tracking, or accountability |
| Test accommodation | “Any change to a test or testing situation that addresses a unique need of the student but does not alter the construct being measured” (Center for Equity and Excellence in Education, 2006) |
| Test administration | Delivery of the test items/directions to the test-takers |
| Test development | Process of creating a test. Hughes (2003) outlines the steps of test development: (1) state the goals of the test; (2) write test specifications; (3) write and revise items; (4) try out items with native speakers and accept/reject items; (5) pilot with non-native speakers whose backgrounds are similar to those of the intended test-takers; (6) analyze the trials and make necessary revisions; (7) calibrate scales; (8) validate; (9) write the test administrator handbook and test materials; (10) train staff as appropriate |
| Test format | Mode and organization of the test; test structure (e.g., multiple choice, short answer) |
| Test items | Tasks, questions, or prompts to which test-takers respond |
| Test materials | Materials used in administering and taking the test |
| Test purpose | What you want to learn from the test results |
| Testing | The valid and reliable measurement of language for context-specific purposes |
| Validity | A judgment about whether a test is appropriate for a specific group and purpose; it includes considerations such as whether the test really measures what you think it is measuring, whether the results are similar to examinees’ performance on other tests or in class or real-world activities, and whether the use of test results has the intended effects |
| Washback | Effects of a test on teachers’ and students’ actions; washback can be positive (expected, beneficial) or negative (unexpected, harmful) |
Online Resources
- CAL Digests: A collection of brief reports on assessment and other relevant topics
- Language Testing Resources: An online reference guide to language testing resources, open to all
- International Language Testing Association (ILTA) Guidelines for Practice
- Virtual Assessment Center: Learning modules about language assessment, from CARLA
- Classroom Assessment Literacy Inventory: Adapted questionnaire to measure level of competence in testing and assessment
- National Clearinghouse for English Language Acquisition & Language Instruction Educational Programs (NCELA): Information and resources related to support for English learners
- Assessment books and training resources from the SERVE Center at UNC Greensboro
- Assessment Literacy: Video by Rick Stiggins
- Michigan Assessment Consortium: Free webinar on assessment by Rick Stiggins
Print Resources
Book with in-depth information on measurement, language test uses and methods, reliability, and validity
- Bachman, L., & Palmer, A. (2010). Language assessment in practice. Oxford: Oxford University Press.
A practical guide to developing your own classroom assessments
- Brown, H. D., & Abeywickrama, P. (2010). Language assessment: Principles and classroom practices. White Plains, NY: Pearson Education.
A book which provides a thorough but accessible overview of foundational concepts in language testing
- Hughes, A. (2003). Testing for language teachers (2nd ed.). Cambridge: Cambridge University Press.
A handbook that explains the principles of backward design for classroom assessment
- Wiggins, G., & McTighe, J. (2005). Understanding by design (2nd ed.). Alexandria, VA: Association for Supervision and Curriculum Development.