Student Assessment and Program Evaluation
Reform leaders addressed several factors critical to the long-term viability of the benchmark assessment program.
First, to be most effective, the benchmark assessment program needed to be implemented “at scale” within the district. This posed a challenge for several reasons, including the time necessary to administer, score, report, and discuss benchmark assessment findings at the classroom, school, and district levels. In addition, the district lacked an electronic reporting portal that could share valid, easy-to-understand results with school staff in a timely and customizable way. OMS staff collaborated with the district’s efforts to create an online portal suited to this reporting function. This involved buffering the detrimental impacts of one district-supported platform (GrowNet) at the elementary level while drawing on the beneficial aspects of another (IMPACT CIM) and minimizing its unconstructive components. At the high school level, reform leaders worked with IDS and external assessment staff to develop a logistical system (including administration, scoring, scanning, and reporting functionality) that would support teachers’ instructional decision-making and could be scaled across the district. These efforts yielded mixed results.
Second, developing and implementing a coherent assessment system at scale required several different skill sets. The district was fortunate to be able to collaborate with external vendors and university assessment experts to develop an assessment system that was vertically coherent across levels of the system. Additionally, OMS was able to work with the developers of the district-supported instructional materials both on revisions to the assessments embedded in those materials and on creating appropriate items and forms for the elementary mathematics benchmark assessments. At the high school level, OMS worked with IDS developers to create end-of-course IDS summative assessments. This approach differs from more traditional adoptions, in which assessments are either bought directly off the shelf (and often are not aligned to classroom instruction or cannot easily be used by teachers) or developed independently by teachers (often producing assessments that are not valid or reliable, or whose psychometric properties have not been established).
Third, to maintain the sustainability of the assessments, test “item pools” need to be replenished over time. Benchmark test items are reusable to a greater extent than high-stakes assessment items, but they still need to be replaced periodically. As part of the development process, a few new items were embedded within each administration of the assessment so they could be field-tested and used on future administrations. Long-term sustainability also requires training materials that can be modified and used by a wide variety of stakeholders. This was accomplished by scoring and publicly distributing examples of real, student-generated work on the assessments, so that stakeholders, including teachers and parents, could see concrete examples of what was expected of students and how students performed. Finally, by developing and distributing training modules and materials, the district attempted to address sustainability in assessment training. Given the size of the district, the variability among presenters, and the range of assessment literacy among district staff, this has remained a continuous challenge, but one the reform planners considered essential if the benchmark assessment system was to meet the formative needs of teachers and students rather than simply become another summative assessment, albeit one administered more frequently than the once-a-year high-stakes NCLB assessment.
The idea of using assessment for learning rather than assessment of learning, and of using assessments in a non-high-stakes manner, represented a substantial shift in thinking for many district stakeholders. Making that shift required buy-in to the very idea of formative assessment, and such buy-in would in turn enhance the prospect of long-term sustainability of the assessment system.
Creating buy-in across the district was especially challenging, given the site-based control in CPS schools. Individual schools had wide latitude to decide (1) if and when they administered non-high-stakes assessments, (2) whether time was provided for school staff to collaborate on the evaluation and scoring of student work products and assessments, and (3) how to use the assessment results.
The process for developing the benchmark assessment program helped create the buy-in necessary to overcome these challenges. Because OMS involved teachers and school administrators in developing both the benchmark assessment instruments and the process, the program reflected input from actual users. This created a program that addressed teachers’ and administrators’ real-world needs, provided a vehicle for educating participants about assessment literacy and the different uses of assessments for different purposes, and contributed to school buy-in.
Also, by designing the scoring process as a collaborative effort within each school and throughout the school year, reform leaders created buy-in from school staff, who were substantially involved in developing and refining that process. In this way, assessment practices were designed to be deeply intertwined with instructional practices and thus to continue throughout the year rather than occur as a one-time, end-of-year event. When this is done well, so that practitioners can see the link between ongoing assessment and instructional practices, it generally leads to greater buy-in for the use of the assessments.
Using and sharing evaluation results
Formative program evaluation efforts can only be effective when results are considered in a timely, relevant fashion. Formative findings are especially valuable when they are included in program planning and management discussions.
To facilitate this usage, the Evaluation Specialist was part of the OMS leadership team that met weekly. He brought data and insights to these OMS “lead team” meetings from weekly meetings with the external evaluators, as well as from internal evaluation efforts. The evaluation teams, both internal and external, wrote reports with formative feedback in mind, first delivering “interim” reports and filing summative reports later. Both evaluation teams also engaged in dialogues with OMS staff to provide formative feedback from the reports’ findings. This proved useful both to those making program decisions and to the evaluators themselves: by talking with the program managers and staff, evaluators could better gauge what types of data the decision makers would find helpful in their next round of decisions.
Finally, reform leaders sought to increase the sustainability of the program evaluation efforts by sharing all evaluation reports with a broad range of stakeholders, regardless of how favorably the reports described the programs. The methods for sharing evaluation results varied from formal presentations to informal discussions, and included both formal technical reports and more informal interim briefs. Evaluation reports and papers are available to the public online through a district website.