Introduction
Research on class size is important because its findings have largely concluded that smaller class size leads to increases in student achievement, helps to close the minority-majority achievement gap, and has several other long lasting benefits. This paper reviews the literature on three major experiments concerning the relationship between smaller class sizes and student achievement. The review explores the methodology of different approaches to class size reduction, the benefits of smaller class sizes, and the range of conclusions drawn on the relationship between smaller class sizes and student achievement, including those that are supportive, those that are critical, and those that identify a need for additional research. The three major class size reduction initiatives are:
- Student-Teacher Achievement Ratio (STAR) project
- Student Achievement Guarantee in Education (SAGE) project
- California Class Size Reduction Program (CSRP)
The literature concerning the different approaches used to evaluate the impact of smaller class size on student achievement in each of the three class size reduction initiatives is reviewed including STAR’s use of a randomized, longitudinal experiment, SAGE’s targeting of students in low income or poverty level school districts, and California’s statewide implementation. The design, findings, and limitations of each of these studies are evaluated as discussed in the literature.
Although this review focuses on the three major class size reduction initiatives, the landmark STAR project not only set the standard against which other studies have been evaluated but also prompted other class size reduction initiatives, including Wisconsin’s SAGE project and California’s CSRP, that tried to replicate its results. The STAR project is the leading longitudinal class size reduction experiment, conducted over ten years in Tennessee to determine not only the relationship between smaller class sizes and student achievement but also the long term effects. Tennessee’s STAR project, Wisconsin’s SAGE project, and California’s CSRP are the largest class size reduction initiatives, which makes their conclusions some of the most credible.
Although a majority of the body of research on the effectiveness of class size, including Finn and Achilles (1999), Konstantopoulos (2009), and Schanzenbach (2007), has found that smaller class size improves student achievement, and many small class size advocates promote a nationwide application, this conclusion has its critics. Critics such as Hedges, Laine, and Greenwald (1994), Hoxby (2000), Prais (1996), Hanushek (1989), and Hanushek (1996) contend that class size reduction programs are too expensive, create certified teacher shortages, create classroom shortages, or have flaws in their methodology and, therefore, that different and more cost effective ways of improving student achievement should be considered. The research that shares this perspective of class size reduction programs is reviewed. The paper concludes with a review of the literature that provides insights into the possible underlying reasons why small class size improves student achievement.
A large portion of the class size reduction research has demonstrated that when qualified teachers teach students in smaller classes, those students learn more. According to Finn, Fulton, Nye, and Zaharias (1992), Konstantopoulos and Chung (2009), and Finn and Achilles (1999), these students also retain this advantage over students who attend larger classes. The same body of research has shown how smaller class sizes help to significantly close the achievement gap between minority and majority students. According to Nye, Hedges, and Konstantopoulos (2000a), Konstantopoulos (2009), and Schanzenbach (2007), smaller class size seems not only to increase achievement for all students but also to benefit most those students who are minorities, are eligible to receive free or reduced-price lunches, or attend urban schools in low income districts.
Krueger and Whitmore (2001) identified a number of other small class size benefits for at-risk students including how small class sizes narrow the achievement gap, reduce grade retention, decrease behavioral problems, reduce truancy, and increase graduation rates. While Fletcher (2009) found evidence of increased participation in high school sports, extracurricular activities, honors, and advanced placement courses among students who attended smaller class sizes during kindergarten through third grade, he did not find evidence of higher rates of attending college.
In terms of this literature review, however, it is important to understand two essential elements of the STAR, SAGE, and California CSRP class size reduction initiatives. First, these three major initiatives were not specifically designed to determine how smaller class size improved student achievement but rather to determine whether smaller class size improved student achievement. Second, it is important to distinguish between what is meant by class size and a student-teacher ratio. Class size is the number of students a teacher instructs in a classroom at a point in time. A student-teacher ratio is different because it is a school’s total student enrollment divided by the number of its full time teachers.
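The distinction between class size and student-teacher ratio can be made concrete with a short calculation. The following is a minimal sketch in Python that applies the two definitions given above to a purely hypothetical school; the enrollment figures, the number of teachers, and the function names are illustrative assumptions and are not drawn from the STAR, SAGE, or CSRP data.

```python
def average_class_size(class_rosters):
    """Mean number of students a teacher instructs in a classroom at a point in time."""
    return sum(class_rosters) / len(class_rosters)


def student_teacher_ratio(total_enrollment, full_time_teachers):
    """A school's total enrollment divided by the number of its full time teachers."""
    return total_enrollment / full_time_teachers


# Hypothetical school (assumed numbers): 12 classrooms of 25 students each,
# staffed by 20 full time teachers (12 classroom teachers plus 8 specialists
# who do not lead a regular classroom).
class_rosters = [25] * 12
total_enrollment = sum(class_rosters)
full_time_teachers = 20

print(average_class_size(class_rosters))                            # 25.0
print(student_teacher_ratio(total_enrollment, full_time_teachers))  # 15.0
```

In this hypothetical school the student-teacher ratio is 15 even though every classroom holds 25 students, which is why the two measures cannot be used interchangeably.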
Student-Teacher Achievement Ratio (STAR) project
The leading experiment is the STAR project, the landmark longitudinal class size reduction initiative conducted over ten years in Tennessee to determine not only the relationship between smaller class sizes and student achievement but also the long term effects. The STAR project focused on the impact of smaller class size in kindergarten through third grade. Along with Wisconsin’s SAGE project and California’s CSRP, the STAR project is one of the few very large scale class size experiments, which makes its data some of the most widely studied.
Glass and Smith (1979) performed a meta-analysis of the major class size reduction experiments to determine the relationship between class size and student achievement which helped to establish the foundation for the STAR project. Glass and Smith (1979) focused primarily on comparisons of student performance in class sizes of less than 15 versus classes with more than 20 students and concluded that smaller class size tended to improve student achievement significantly. Although Slavin (1989) found a less significant relationship between class size and student achievement when he analyzed Glass and Smith’s study, many evaluations of the STAR project’s findings were consistent with those of Glass and Smith. Also, Slavin (1989) argued that class size reduction projects such as STAR are too expensive relative to the degree of improvement in student achievement and that less expensive methods of improving student achievement should be considered. In another meta-analysis, Hedges et al. (1994) concluded that the larger is the scale of the class size reduction experiment the greater is the expense.
Three Phases
The STAR project was conducted in three phases. The first phase was performed over four years. It followed Project Prime Time, a similar two year initiative conducted on a much smaller scale in which class sizes were reduced from 25 to 18 in kindergarten through second grade in a small number of Indiana schools. In Project Prime Time, Finn and Achilles (1990) found that students in smaller classes scored higher on standardized tests, had fewer behavioral problems, and performed at higher levels in mathematics and reading than those in larger classes. Because Project Prime Time did not randomly assign students to treatment groups, it was criticized for a lack of internal validity by many researchers including Blatchford, Goldstein, and Mortimore (1998), Goldstein and Blatchford (1998), and Grissmer (1999). Project Prime Time’s lack of randomization highlighted the need for a large scale randomized experiment to evaluate the impact of small class size on student achievement, which led ultimately to the STAR project. Konstantopoulos (2009) and Finn and Achilles (1990) found that the STAR project showed that students in smaller classes from kindergarten through third grade demonstrated significantly improved achievement as compared to those in larger classes after four years of the experiment. Konstantopoulos (2009), however, focused more on the impact of the teacher, particularly on poverty level, minority, and female students in the STAR project, and found that the teacher had the most impact on improving the achievement of students coming from poverty level homes, with minimal effects on minority and female students.
STAR’s second phase, a three year observational study called the Lasting Benefits Study, found that the benefits of smaller class sizes continued into the later grades. In terms of the Lasting Benefits Study, Achilles, Nye, Zaharias, and Fulton (1993), Konstantopoulos and Chung (2009), Finn et al. (1992), Finn and Achilles (1999), Nye, Hedges, and Konstantopoulos (2001a), and Konstantopoulos (2009) concluded that even after the students returned to larger classes in the fourth through eighth grades those students who had attended smaller class sizes for their first three or four years maintained an advantage over students who had attended the larger classes from kindergarten through third grade. Finn, Gerber, Achilles, and Boyd-Zaharias (2001) found that the benefits of smaller class sizes from kindergarten through third grade were significant and lasted throughout high school.
Although Konstantopoulos and Chung (2009) found evidence that the duration of the benefits was a function of the number of years a student attended smaller classes and that the benefits lasted beyond eighth grade, Fletcher (2009) found that the benefits of smaller class size diminished with time. Still, Nye, Hedges, and Konstantopoulos (2001b) found that the duration of the benefits of smaller classes lasted beyond ninth grade. The students who attended smaller class sizes in kindergarten through third grade, therefore, continued to outperform those who had attended larger classes according to most researchers. Major assessments of the Lasting Benefits Study including those conducted by Nye, Hedges, and Konstantopoulos (2000b), Konstantopoulos and Chung (2009), and Finn and Achilles (1999) supported STAR’s earlier findings that minority students benefited the most from having smaller class sizes.
The STAR project’s third phase, called Project Challenge, was conducted over three years and placed all of the kindergarten through third grade students of Tennessee’s 17 most economically challenged school districts into small classes. Nye, Hedges, and Konstantopoulos (1999) assessed Project Challenge and found improvements in student achievement similar to those of the STAR project’s first phase. As a result of having smaller class sizes, Nye, Achilles, Zaharias, and Fulton (1993) concluded that the students in these 17 districts had raised their performance levels for reading and mathematics.
But unlike Project Prime Time, which was attacked because the students were not randomly assigned, the STAR project was a randomized experiment of kindergarten through third grade class size conducted in Tennessee, as Krueger (1999) and Schanzenbach (2007) explained. In the STAR project, the researchers used random assignment of the students and teachers to control the treatment. The researchers compared the groups’ post treatment performance on the dependent variable using statistical analyses of student test results. Because the researchers used random assignment, the statistically significant differences that they found enabled them to conclude that smaller class size was the cause of the higher achievement levels, which assured them that the effect was real. That is, researchers such as Finn and Achilles (1990) and Finn and Achilles (1999) were able to conclude that the treatment caused the differences in outcomes and that the results were internally valid because the project’s use of random assignment had eliminated other possible explanations for the experiment’s outcomes.
Finn et al. (2001), Finn and Achilles (1990), Finn and Achilles (1999), and Schanzenbach (2007) explained how the STAR project’s use of randomization was likely to lead to internally valid results because it only manipulated class size and excluded other factors from influencing the selection and assignment of students into treatment and control groups. Thus, any significant differences in performance among the different groups would be a function of the difference in class size. Both standardized and curriculum based tests including the Stanford Achievement Test and the Tennessee Basic Skills First Test were employed to determine the performance of approximately 11,600 students in inner city, suburban, urban, and rural school districts in STAR’s three phases. The tests assessed the students’ reading, mathematics, and basic study skills.
Public schools statewide were invited to volunteer for the project based on their ability and willingness to follow the experiment’s protocol according to Finn et al. (2001), Schanzenbach (2007), Finn and Achilles (1990), and Finn and Achilles (1999). The 76 schools that were selected to participate represented all areas of the state as well as all of the state’s ethnic and racial groups. While teachers were randomly assigned by grade to classes of one of three different sizes so that they were not able to choose their classrooms, the students were randomly assigned to either the treatment group or one of two control groups, a no treatment group or an alternative treatment group, so that each class was composed of a roughly equivalent student population according to Krueger (1999) and Nye et al. (1993).
Schanzenbach (2007), Nye et al. (1993), Finn and Achilles (1990), and Finn and Achilles (1999) explained how students were randomly assigned by grade to classes of one of three different sizes:
- Treatment group: Class size of 13 to 17 students
- Control groups:
  - No treatment group: Regular class size of 22 to 25 students
  - Alternative treatment group: Regular class size of 22 to 25 students plus a teacher’s aide
Each year students were assessed and their performance was compared to the performance of students in the other groups according to their grade level, school district location, socioeconomic status, ethnicity, race, gender, and number of years of participation in the experiment.
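The random assignment of students to the three class types described above can be illustrated with a short sketch. The Python code below is a simplified illustration only, not the STAR project’s actual procedure: the function name, the group labels, the seed, and the round-robin dealing of shuffled students are all assumptions made for the example, and it ignores the fact that small and regular classes held different numbers of students.

```python
import random
from collections import Counter

# Group labels are illustrative names for the three class types in the text.
GROUPS = ["small", "regular", "regular_with_aide"]


def assign_students(student_ids, seed=0):
    """Randomly assign each student to one of the three class types.

    Students are shuffled and then dealt round-robin, so no student
    characteristic can influence which group a student lands in.
    """
    rng = random.Random(seed)  # fixed seed only to make the example repeatable
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    return {sid: GROUPS[i % len(GROUPS)] for i, sid in enumerate(shuffled)}


# Hypothetical grade cohort of 60 students in one school.
assignment = assign_students(range(60))
print(Counter(assignment.values()))
```

Because assignment depends only on the shuffle, any preexisting differences among students should be spread roughly evenly across the three groups, which is the property that supports the internal validity claims discussed above.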
Findings
Most researchers seem to share the view advanced by Konstantopoulos (2009), Finn and Achilles (1990), Nye et al. (1999), Schanzenbach (2007), and Word, Johnston, Bain, and Fulton (1994) that the STAR project found that as class size was reduced student achievement improved. These researchers found the STAR project demonstrated that students who were enrolled in small classes beginning with kindergarten and continuing through third grade were significantly more likely than their counterparts who attended larger classes to:
- Demonstrate better reading and mathematics skills
- Complete more advanced mathematics, science, and English courses
- Complete high school
- Graduate high school on time
- Graduate high school with honors
- Have lower rates of truancy
- Avoid grade retention
- Have fewer behavioral problems
Also, teachers of the smaller classes in grades kindergarten through third grade reported that they spent more time teaching, provided more individualized or differentiated instruction, had more interactions per child, and were better able to minimize disruptive student behavior as reported by Addonizio and Phelps (2000), Achilles et al. (1993), and Cooper (1989).
Finn and Achilles (1990) summarized the STAR project’s findings as to how the treatment group, the group with a class size of 13 to 17 students, had higher levels of performance than the students in either of the control groups who attended larger classes of 22 to 25 students. Finn and Achilles (1999) found statistically significant improvements in student achievement not only among those students who attended smaller classes from kindergarten through third grade but also among those who attended smaller classes for less than four years, even if the student had a smaller class size for only one or two years. While Finn and Achilles (1990) found long lasting benefits of smaller classes, Krueger and Whitmore (2001) compared ninth grade students who had had at least one year of smaller class sizes from kindergarten through third grade with their ninth grade counterparts who had attended larger class sizes and found that those ninth grade students who had attended small classes performed better in mathematics. Also, Finn and Achilles (1999) stated that, based on their assessment of the STAR project, the treatment group improved at almost twice the rate extrapolated from Glass and Smith’s (1979) meta-analysis.
Nye et al. (2000b) and Nye et al. (2001a) further explained that the improvements in student achievement for those students in the treatment group were significant throughout the kindergarten through third grades but the most significant differences in performance versus the control groups were identified during the first two years of school. That is, the more years a student attended smaller classes the greater was his/her improvement in performance. Hanushek (1999a) and Hanushek (1999b), however, reported less significant effects in evaluations of the STAR project, which led to questions about the project’s implementation methodology, including whether the STAR project maintained random assignment throughout the entire experiment given the impact of attrition and the way new students were introduced to the experiment. The lack of research into how attrition reduced the experiment’s random assignment and, therefore, affected the findings of the STAR project is reflected by a corresponding gap in the literature.
Research on the relationship between smaller class sizes and student achievement has demonstrated how smaller class sizes help to significantly close the achievement gap among minority and majority students. Although Word et al. (1994) found that the students in the treatment group outperformed the corresponding students in the control groups, minority students gained more than other students. Achilles et al. (1993), Konstantopoulos (2009), Nye et al. (1993), Konstantopoulos and Chung (2009), and Nye, Hedges, and Konstantopoulos (2004) echo the findings of Word et al. (1994) that smaller class size not only increased achievement for all students but also benefited most those students who are minorities, eligible to receive free or reduced-price lunches, or attend urban schools in low income districts. Krueger and Whitmore (2001), however, were more specific in concluding that for these at-risk students, small class sizes narrowed the achievement gap, reduced grade retention, decreased behavioral problems, reduced truancy, and increased graduation rates. But Fletcher (2009) did not find the same level of significance for Krueger and Whitmore’s (2001) results.
Word et al. (1994) found that minority, low income, and urban students achieved their most significant improvements during the first two years of having smaller class size rather than building cumulatively every year for four years. But Konstantopoulos (2009), Nye et al. (2001b), Konstantopoulos and Chung (2009), and Nye et al. (2004) found the most significant improvements for students spending four years in the experiment. Finn et al. (2001), however, found significant short term as well as intermediate term gains that continued through high school especially for minority students which seemed to combine the conclusions of Word et al. (1994) with those of Konstantopoulos (2009), Nye et al. (2001b), Konstantopoulos and Chung (2009), and Nye et al. (2004).
Among the researchers discussing the long term impact of smaller class size on student achievement, Achilles et al. (1993), Finn and Achilles (1999), Nye et al. (2001b), Konstantopoulos and Chung (2009), Nye et al. (2001a), and Nye et al. (1999) are at the forefront of those concluding that students in the smaller class sizes retained their advantage over other students who attended larger classes when students returned to larger classes beginning in fourth grade. Moreover, these researchers found that the advantages of smaller class size continued generally throughout high school. Konstantopoulos and Chung (2009) found that all students who attended smaller classes continued to outperform those students who had attended larger classes in mathematics, language arts, and science tests through eighth grade. Echoing Konstantopoulos and Chung (2009), Nye et al. (2001b) found that students who attended smaller classes continued to outperform those students who had attended larger classes in mathematics, language arts, and science tests through ninth grade. Nye et al. (2000a), Word et al. (1994), Nye et al. (2004), and Nye et al. (1999) also found that while minority, low income, and urban students who attended smaller classes achieved more significant gains than their majority student counterparts during the STAR experiment, these differences in achievement continued throughout high school although the differences narrowed over time.
Limitations
The STAR project was the longest longitudinal randomized experiment concentrating on the impact of smaller class size on student achievement and many researchers agreed with its conclusion of a significant internally valid relationship. A number of researchers, however, including Hanushek and Raymond (2005), Hanushek (1999a), Hanushek (1999b), Goldstein and Blatchford (1998), Hoxby (2000), Prais (1996), and Mitchell, Beach, and Badarak (1991) raised questions or challenged certain aspects of the project.
Although Hoxby (2000), Prais (1996), Hanushek and Raymond (2005), Hanushek (1999a), Hanushek (1999b), and Mitchell et al. (1991) acknowledged that students and teachers were randomly assigned to the treatment groups’ classrooms, they faulted the experiment for its lack of information concerning the actual process through which the STAR project assigned teachers to classrooms and whether the STAR project tried to control for any differences in teacher quality. These researchers criticized the STAR project’s random assignment because of the ways in which teacher attitudes and behaviors as well as student expectations for achievement may have been influenced by their assignment to treatment groups. That is, these researchers argued that because the teachers and students understood the class size to which they were assigned, their attitudes and expectations based on their assignment may have influenced the outcome of the experiment.
The challenges raised by Hanushek and Raymond (2005), Hanushek (1999a), and Hanushek (1999b) concerning whether the STAR project actually proved a direct and causal relationship between smaller class size and corresponding increases in student achievement, however, seem to have been based on an inappropriate comparison. That is, Hanushek seems to have confused the definitions of class size and student-teacher ratio. The STAR project concentrated on determining a causal effect between reductions in class sizes and corresponding increases in student achievement based on class size, which is defined as the number of students a teacher instructs in a classroom at a point in time. Hanushek based his analysis on student-teacher ratios instead of class sizes, and a student-teacher ratio differs significantly from what is meant by class size because it is a school’s total student enrollment divided by the number of its full time teachers. Hanushek’s use of student-teacher ratios seems to have undermined his argument.
Schanzenbach (2007), Word et al. (1994), Finn and Achilles (1990), and Finn and Achilles (1999) reported how teachers and students were randomly assigned by grade level to classes of one of three different sizes including the treatment group of 13 to 17 students, a no treatment group of 22 to 25 students, and an alternative treatment group of 22 to 25 students plus a teacher’s aide. But critics including Hoxby (2000), Prais (1996), Hanushek and Raymond (2005), Hanushek (1999a), Hanushek (1999b), and Mitchell et al. (1991) concluded that if the STAR project had been more of a parametric experiment then the project could have used many more and different class sizes instead of such a small number of groups, which would have most likely led to the discovery of key break points. The importance of a parametric experiment is that it facilitates the identification of those specific class sizes or break points at which the marginal benefit of adding one more student equals the marginal cost of adding the incremental student and where if one more student were to be added then the marginal cost would exceed the marginal benefit. That is, it identifies the specific number of students in a class for which student achievement is maximized and if an incremental student were to be added then the incremental gain in student achievement would not be as significant. Using a parametric experimental design, therefore, could have enabled the STAR project to possibly determine the optimal class size for maximizing student achievement according to some of the experiment’s critics.
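The break point logic that the critics describe can be sketched in a few lines of Python. The sketch below assumes that a parametric experiment has already produced, for each class size, an estimate of the marginal benefit and the marginal cost of adding that student; the function name and every number in the example are hypothetical placeholders in arbitrary units and are not estimates from STAR or any other study.

```python
def find_break_point(marginal_benefit, marginal_cost):
    """Return the class size n at which adding the n-th student is still
    worthwhile (marginal benefit >= marginal cost) but adding one more
    student is not. Returns None if no such size is observed.
    """
    sizes = sorted(marginal_benefit)
    for n, next_n in zip(sizes, sizes[1:]):
        worthwhile_now = marginal_benefit[n] >= marginal_cost[n]
        worthwhile_next = marginal_benefit[next_n] >= marginal_cost[next_n]
        if worthwhile_now and not worthwhile_next:
            return n
    return None


# Hypothetical schedules for class sizes 13 through 25 (assumed values):
# a constant marginal benefit and a marginal cost that rises with class size.
marginal_benefit = {n: 10.0 for n in range(13, 26)}
marginal_cost = {n: float(n - 8) for n in range(13, 26)}

print(find_break_point(marginal_benefit, marginal_cost))  # 18
```

With only one treatment size and one regular size, as in STAR, schedules like these cannot be estimated at each intermediate class size, which is the substance of the critics’ objection.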
Although researchers such as Hanushek and Raymond (2005), Hanushek (1999a), Hanushek (1999b), Goldstein and Blatchford (1998), Hoxby (2000), Prais (1996), and Mitchell et al. (1991) concluded that the STAR project’s findings are difficult to generalize because of its use of just one treatment group and two control groups when using more and different treatment and control groups would have been better, these critics raised additional questions concerning whether the experiment’s findings are internally valid. Goldstein and Blatchford (1998), Hanushek (1999a), Hanushek (1999b), and Hanushek and Raymond (2005) in particular criticized the STAR project because of the attrition it experienced. These critics questioned the experiment’s validity because they reported that only 48% of all students who entered the experiment in kindergarten remained in it for four years and that how the new students who replaced those who dropped out were selected and assigned was unknown. The STAR project’s validity is challenged by these critics because its randomization was reduced during the experiment.
Hanushek and Raymond (2005), Hanushek (1999a), Hanushek (1999b), Goldstein and Blatchford (1998), Hoxby (2000), Prais (1996), and Mitchell et al. (1991) enlarged their critique by determining that the replacement students were not assessed properly for such factors as preexisting differences in academic achievement upon entering the experiment. They also determined that test scores were lower among the dropouts than for those students who remained in the study. In addition, certain students were shifted among the control groups in the second year of the study to balance the control group classes for racial and gender differences, which led these critics to conclude that the STAR project’s results may be less significant. Hanushek and Raymond (2005), Hanushek (1999a), Hanushek (1999b), Goldstein and Blatchford (1998), Hoxby (2000), Prais (1996), and Mitchell et al. (1991) generally concluded that selection bias might have been a factor as a result of the attrition from the treatment and control groups as well as the movement of students among the control groups.
Student Achievement Guarantee in Education (SAGE) project
Wisconsin’s SAGE project attempted to replicate the findings of the STAR project. But in contrast to STAR, the SAGE project implemented a statewide class size reduction initiative that targeted students from disadvantaged homes to determine the relationship between smaller class sizes and student achievement as well as the long term effects of smaller classes. Molnar, Smith, Zahorik, Palmer, Halbach, and Ehrle (1999) reported that SAGE was a five year class size reduction experiment begun in the 1996-1997 school year to improve student achievement among students in low income or poverty level school districts by reducing class size in these schools to no more than 15. The SAGE program was phased in over three years with kindergarten and first grade in the first year, second grade in the second year, and third grade in the third year. Although only low income school districts were included initially in the experiment, by 2000 all other school districts could apply to the program.
Molnar, Smith, and Zahorik (2000) and Molnar et al. (1999) reported that the SAGE project reduced kindergarten through third grade class sizes only in schools referred to as SAGE schools in one of the following ways depending on the configuration of the classroom:
- A single classroom with a class size of 15
- A shared space classroom with a temporary wall dividing the classroom into two classrooms or sections with each classroom or section having 15 students
- A dual teacher classroom in which two teachers taught 30 students
- A “floating teacher” classroom of 16 to 20 students except during reading, language arts, and mathematics classes when a second teacher was added
Non-SAGE schools maintained existing larger class sizes.
Although non-SAGE comparison schools were not paired or linked with SAGE schools, the non-SAGE comparison schools were located within the same school district. Although the non-SAGE comparison schools had larger or what were considered to be more traditional class sizes for the respective school district, the non-SAGE comparison schools were similar in terms of their grade levels, enrollment, socioeconomic status, ethnicity, race, and gender. But Blatchford et al. (1998), Goldstein and Blatchford (1998), and Grissmer (1999) challenged the validity of the comparisons between class sizes of 15 in SAGE schools and class sizes in non-SAGE schools that varied greatly, in contrast to the STAR project, which limited its comparison to a treatment group class size of 13 to 17 students and a control group class size of 22 to 25 students.
Blatchford et al. (1998), Goldstein and Blatchford (1998), and Grissmer (1999) criticized how the four different configurations of a class size of 15 within the SAGE schools were incorrectly considered to be the same for comparison purposes with class sizes in non-SAGE schools. The lack of research into how the four different configurations of class size used within the SAGE schools not only were significantly different from one another but also affected the outcomes of the SAGE project is reflected by a corresponding gap in the literature.
Although reducing class sizes to 15 was the major component of SAGE, the project had three other features. Molnar et al. (1999) reported that a second component was the implementation of a rigorous curriculum that met state standards in every SAGE school. The third component of SAGE was the implementation of an educator and staff professional development program. The fourth component of SAGE was the implementation of what was called the lighted schoolhouse program. Molnar et al. (1999) and Graue and Oen (2009) explained that the lighted schoolhouse program included before and after school programs. But Blatchford et al. (1998), Goldstein and Blatchford (1998), and Grissmer (1999) criticized how the simultaneous implementation of three other components besides class size reduction might have affected SAGE’s outcomes. The lack of research into how the implementation of these four components might have affected the outcomes of the SAGE project is reflected by a corresponding gap in the literature.
Molnar et al. (1999) reported that SAGE used a quasi-experimental design with a treatment group and a control group that were pretested and posttested. SAGE used pretest scores to try to determine and reduce selection bias. Also, Webb, Meyer, Gamoran, and Jianbin (2004) reported that comparisons of kindergarten through third graders’ test scores on the Comprehensive Test of Basic Skills (CTBS) in SAGE and non-SAGE schools were controlled for family income, racial composition, and ethnicity.
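One simple way to see how pretest scores help in a design without random assignment is to compare gains rather than raw posttest scores. The Python sketch below is an illustration of that general idea only; it is not the analysis that the SAGE evaluators performed, and the function names and all scores are hypothetical values invented for the example.

```python
from statistics import mean


def mean_gain(pretest, posttest):
    """Average posttest-minus-pretest gain for one group of students."""
    return mean(post - pre for pre, post in zip(pretest, posttest))


def gain_difference(treat_pre, treat_post, comp_pre, comp_post):
    """Treatment-group mean gain minus comparison-group mean gain."""
    return mean_gain(treat_pre, treat_post) - mean_gain(comp_pre, comp_post)


# Hypothetical scores (assumed values): the treatment group starts lower
# on the pretest than the comparison group.
treat_pre, treat_post = [40, 45, 50], [60, 63, 72]
comp_pre, comp_post = [45, 50, 55], [60, 64, 71]

print(gain_difference(treat_pre, treat_post, comp_pre, comp_post))  # 5
```

Comparing gains removes the initial difference between the groups from the estimate, although it cannot rule out every form of selection bias the way random assignment can.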
Findings
The results of the SAGE project largely confirmed the findings of the STAR project. Molnar et al. (1999) concluded that students in the SAGE project who attended small classes beginning with kindergarten and continuing through third grade significantly improved their academic achievement and that the benefits were greater for students from low income or poverty level families. Students in SAGE schools scored higher than their counterparts in non-SAGE schools on the CTBS reading, language arts, and mathematics tests as well as on the overall CTBS results. Webb et al. (2004) reported that the favorable outcome based on socioeconomic status was significant because the SAGE project was designed to target students in low income school districts.
The increases in achievement for minority students were also greater than those of their majority counterparts, which was among the findings that the SAGE project shared with the STAR project. While Molnar et al. (2000), Graue and Oen (2009), and Molnar et al. (1999) found improvements in student achievement during each of the four years, Blatchford et al. (1998), Goldstein and Blatchford (1998), and Grissmer (1999) differed in finding that first graders in SAGE schools had lower pretest but higher posttest scores than their non-SAGE counterparts and that the most significant increases in achievement were made in the first year, even though Grissmer (1999) reported that the gains continued beyond third grade.
Cooper (1989), Molnar et al. (2000), Graue and Oen (2009), and Molnar et al. (1999) found increased student attention and teacher concentration on individual students as a result of smaller class size, similar to the STAR project’s results as reported by Schanzenbach (2007), Finn and Achilles (1990), Finn and Achilles (1999), and Addonizio and Phelps (2000). These researchers reported that teachers of the smaller classes in kindergarten through third grade said that they spent more time educating, provided more differentiated instruction, had more interactions per child, and were better able to minimize disruptive student behavior. But Graue, Raucher, and Sherfinski (2009) concluded that the increases in student achievement were largely due to the combination of students having smaller class sizes with increased student attention and teacher concentration on individual students.
Limitations
Blatchford et al. (1998), Goldstein and Blatchford (1998), and Grissmer (1999) criticized the SAGE project primarily because SAGE used a quasi-experimental design in which the teachers and students were not randomly assigned. These researchers criticized SAGE for not employing matched pair SAGE and non-SAGE comparison schools. Also, classroom composition changed annually in both the SAGE and non-SAGE schools. These critics concluded that the SAGE project suffered from potential selection bias, which reduced its internal validity, because neither the way school districts decided whether a school would be a SAGE school nor the method of selecting students for smaller class sizes within SAGE schools was known. Blatchford et al. (1998) and Goldstein and Blatchford (1998) concluded that SAGE schools might have had a favorable predisposition for treatment and that this possible predisposition might have biased outcomes so that the treatment seemed more effective. The SAGE project, unlike STAR, performed no long term evaluation beyond fourth grade of the effect of class size reduction.
Blatchford et al. (1998), Goldstein and Blatchford (1998), and Grissmer (1999) criticized the SAGE project because its three other components besides class size reduction, namely the rigorous curriculum that met state standards, the professional development program, and the lighted schoolhouse program, were not uniformly implemented. The extent to which these three factors may have contributed to the study’s outcomes is unknown, which makes it unclear if class size reduction by itself caused the outcomes.
California Class Size Reduction Program (CSRP)
The California CSRP attempted to replicate the findings of the STAR project through what remains the largest class size reduction initiative. But in contrast to the STAR project, the California CSRP implemented a statewide class size reduction initiative, phased in over four years but not longitudinal, to determine the relationship between smaller class sizes and student achievement. Stecher and Borhnstedt (2000) and Mitchell and Mitchell (1999) reported that the California CSRP used a quasi-experimental design to study the impact of smaller class sizes on student achievement using two nonrandomized comparison groups but no control group. The first nonrandomized comparison group, considered to be the smaller or reduced class, used a class size of not more than 20 students while the second, or larger, class used a class size of more than 20 students. Student achievement was measured using the Stanford Achievement Tests for reading, language arts, and mathematics.
Findings
Although the results of the California CSRP were largely consistent with the findings of the STAR project, the impact of smaller class size on student achievement across all groups and especially in terms of closing the minority-majority achievement gap was considered to be not as significant. While Stecher and Borhnstedt (2000) found some increases in student achievement, they faulted the initiative’s quasi-experimental design which used two treatment groups but no control group for preventing significant conclusions from being drawn on the relationship of smaller class size and student achievement. Also, Stecher and Borhnstedt (2000) could not determine how a class size of 20 was determined to be the pivot point.
When Mitchell and Mitchell (1999) assessed the California CSRP, they controlled for a student’s family income, gender, racial composition, and ethnicity. Mitchell and Mitchell (1999) concluded that there were small but significant increases in student achievement across different groups once they controlled for these factors. Mitchell and Mitchell (1999) also concluded that they could not rule out the possibility that other factors in addition to smaller class size could have led to the increases in student achievement which echoed the findings of Stecher and Borhnstedt (2000).
The findings of the STAR and SAGE projects were echoed by Stecher and Borhnstedt (2000), Mitchell and Mitchell (1999), Jepsen and Rivkin (2009), and Funkhouser (2009) in their assessment of the California CSRP when they found increased student attention and teacher concentration on individual students as a result of smaller class size. These researchers reported that teachers of the smaller classes in kindergarten through third grade said that they spent more time teaching, provided more differentiated instruction especially for mathematics and language arts, had more interactions per child, and were better able to minimize disruptive student behavior.
But Jepsen and Rivkin (2009) found that the large scale of the California CSRP led to a shortage of certified teachers statewide that was most acute in low income school districts. Many schools serving at-risk and low income students, therefore, had a disproportionate number of teachers who were not certified, lacked previous teaching experience, or had an emergency teacher certification. This degradation of teacher qualifications may have offset some of the increases in student achievement that would otherwise have occurred especially in low income schools according to Jepsen and Rivkin (2009). Also, Funkhouser (2009) concluded that the increases in student achievement might have been not only more significant but also more readily attributed to smaller class sizes if it were not for other programs such as a revised statewide curriculum, more standardized testing, and new statewide student grade advancement standards that were implemented simultaneously.
Limitations
Mitchell and Mitchell (1999) and Stecher and Borhnstedt (2000) found that the California CSRP differed from the STAR project because its large scale caused the initiative to become extremely expensive, particularly in terms of increased aggregate school spending for more teachers and additional classroom facilities. Although Mitchell and Mitchell (1999) and Stecher and Borhnstedt (2000) concluded that the relatively small increases in student achievement were due at least in part to the reduction of class size, they faulted the California CSRP’s quasi-experimental design primarily because it employed two nonrandomized comparison groups but no control group, did not explain why class sizes of either fewer or more than twenty students were the basis for the treatment groups, and performed no pretesting. Also, Mitchell and Mitchell (1999) and Stecher and Borhnstedt (2000) faulted the California CSRP for not explaining how students were assigned to the treatment groups. This led critics to conclude that the California CSRP suffered from a history threat to internal validity.
Mitchell and Mitchell (1999) and Stecher and Borhnstedt (2000) found that California schools typically had class sizes of approximately 30 students before the implementation of the California CSRP. Because there was no pretest, these critics reported that it cannot be determined whether student performance improved, since posttest scores had no baseline for comparison. They also concluded that the initiative’s use of treatment groups that were either below or above 20 students resulted in classes that were closer in size to the larger classes of the STAR project, which might have caused smaller increases in student achievement than were found in the STAR project. In conclusion, Mitchell and Mitchell (1999), Jepsen and Rivkin (2009), and Stecher and Borhnstedt (2000) could not determine if the smaller increases in student achievement than those that were found in the STAR project were a function of using larger treatment groups than those of the STAR project or of other factors such as the program’s quasi-experimental design, the shortage of certified teachers, and the simultaneous implementation of other statewide educational programs. The California CSRP, therefore, did not replicate the STAR project, which was a controlled randomized longitudinal experiment.
Questions for Future Research
Although the majority of the class size research, including Tennessee’s STAR project, Wisconsin’s SAGE project, and California’s CSRP, has largely concluded that smaller class size leads to increases in student achievement, helps to close the minority-majority achievement gap, and has several other long lasting benefits, it not only demonstrates the need for more and different kinds of research but also raises questions. How long should a student remain in a reduced class size before returning to a class of regular size? One question is whether the increases in student achievement resulting from class size reduction initiatives are maximized in the earlier grades such as kindergarten through third grade or whether class size reductions in subsequent grades perhaps through high school would generate similar gains. If it could be proven that incremental increases in student achievement resulted from reducing class sizes from kindergarten through high school, would school districts be able to raise the necessary funds to pay for the additional teachers, support personnel, classrooms, and other facilities?
Do the benefits of smaller classes persist or diminish over time? Future studies could investigate the extent to which increases in student achievement continue into later grades, whether the gains diminish over time, and the points at which the increases in achievement begin to diminish. If the research proved that the increases in student achievement are maximized from kindergarten through third grade but begin to decline significantly thereafter, then the benefits of smaller classes might not be generalizable to higher grade levels and implementing class size reductions from fourth grade through high school might not be as cost effective.
What is the optimal class size for maximizing student achievement? The smaller class sizes in the STAR project were 13 to 17 students while the larger class sizes were 22 to 25 students. While the SAGE project used smaller class sizes of 15 or fewer and a variety of larger ones, the California CSRP used class sizes of not more than 20 as well as various class sizes of more than 20. Although smaller class size has been found to result in increases in student achievement, the optimal class size is not well defined. Also, future research could focus on whether smaller class size affects different groups differently. Future research could investigate the extent to which different groups, such as students who are poor performers, come from low income families, belong to a minority group, or are English language learners, might require different class sizes to maximize their achievement.
Although studies, including the STAR project, SAGE project, and California CSRP, have determined that class size reduction initiatives led to increases in student achievement, more research is needed to determine what other factors might contribute to increases in student achievement besides smaller class size. Research could be performed to determine the impact of having highly qualified teachers teach students in smaller classes. This factor may have influenced the outcomes of the California CSRP in which the lack of certified teachers may have significantly offset the benefits of smaller class size. Additional research could be conducted to determine the impact that different state teacher education, training, and certification requirements have on the ability of teachers in different states to influence student achievement. It would be helpful to learn through future research what unique teaching methods are best suited for small as well as large class sizes and how to train teachers to use these methods effectively in the classroom.
This review reveals that certain essential elements of smaller class sizes merit more research including the nature of small class sizes and whether the small class size effect is a start-up or one-time phenomenon. In terms of the nature of class size, a number of questions were raised by this review. Are there inherent advantages of smallness or having a smaller size for student achievement? That is, research could try to determine if smaller class sizes have certain contextual advantages over larger class sizes.
Another subject for future research is to determine the extent to which the small class size effect is a one-time or start-up phenomenon. Class size initiatives have concentrated primarily on evaluating the possible linkage to how students perform in kindergarten through third grade. Research is needed to determine whether smaller class sizes may work best because students are young and have just entered school. Researchers, therefore, must account for the fact that in the existing studies the smaller class size effect cannot be separated from a student’s age. Also, it is possible that class size reductions realize the most significant increases in student achievement during the early grades because students are just beginning the school socialization process. One way to assess the possible start-up phenomenon would be to conduct class size reduction initiatives in the fourth through twelfth grades and compare the results to those from class size reduction initiatives from kindergarten through third grade.
Conclusion
The review of the literature concerning the research into three major experiments that concentrated on the relationship between smaller class sizes and student achievement, STAR project, SAGE project, and California CSRP, demonstrates that having a smaller class size not only increases student achievement but also helps to significantly minimize the achievement gap among different groups of students and has several other long lasting benefits. While the apparent advantages of smallness or small class size in improving student achievement may not be surprising to educators, administrators, students, and parents as well as to the many researchers who share this view, this literature review finds support for more research to address the challenges raised by the critics of the major experiments of the relationship between smaller class sizes and student achievement.
Having fewer students in the classroom enables the teacher to dedicate more time to each child. Consequently, students pay more attention to class work and participate more in academics. Because the students are more involved with their studies, they learn more and behave better. The literature, therefore, is consistent with the conclusion that test scores are significantly higher for students who attend small classes. Based primarily upon the findings of the STAR project as well as the SAGE project and California CSRP, the overall conclusion in the literature that having a smaller class size increases student achievement seems to outweigh the doubts that students taught in small classes, especially minority and low income students, enjoy significant and lasting educational advantages.
The review of the literature concerning the research into three major experiments of the relationship between smaller class sizes and student achievement suggests that the social context of smaller class size may have a favorable influence on student performance. That is, there may be inherent advantages of smallness that foster a greater sense of community than is possible in larger class sizes. Smaller class sizes, therefore, may generate what Fischel (2002, p. 5) defines as classroom based “community specific social capital” which contributes to improving student performance. Fischel (2002, p. 2) cites Boozer and Rouse’s (2001) assessment of how “bigger classes (more student-consumers) do detract, at plausible margins, from the education of others” in his assessment of the adverse impact of larger class sizes on student performance. Small class sizes, therefore, seem to generate increases in student achievement that are inversely related to their size.
Smaller class size may improve student achievement because of its social ecology. The provision of education through small class sizes in which the students and teachers are able to get to know and understand one another better may facilitate increases in student achievement while according to Fischel (2002, p. 1) it “reduces the transaction costs of the provision of true local public goods such as education” by increasing graduation rates and reducing truancy. The communal capital of smaller class sizes, therefore, generates more frequent and meaningful face-to-face student-teacher interactions. These more frequent one-on-one student-teacher interactions may create an improved learning process in which student performance improves as a result of smaller class size. Increases in student achievement, therefore, may be a function of the social ecology of smaller class size.
References
Achilles, C. M., Nye, B. A., Zaharias, J. B., & Fulton, B. D. (1993). The Lasting Benefits Study (LBS) in grades 4 and 5 (1990-1991): A legacy from Tennessee’s four-year (K-3) class-size study (1985-1989), Project Star. Greensboro, North Carolina: North Carolina Association for Research in Education.
Addonizio, M. F. & Phelps, J. L. (2000). Class size and student performance: A framework for policy analysis. Journal of Education Finance, 26(2), 135-156.
Blatchford, P., Goldstein, H., & Mortimore, P. (1998). Research on class size effects: A critique of methods and a way forward. International Journal of Educational Research, 29, 691-710.
Boozer, M. A. & Rouse, C. (2001). Intraschool variation in class size: Patterns and implications. Journal of Urban Economics, 50(1), 163-189.
Cooper, H. M. (1989). Does reducing student-to-teacher ratios affect achievement? Educational Psychologist, 24(1), 79-98.
Finn, J. D. & Achilles, C. M. (1990). Answers and questions about class size: A statewide experiment. American Educational Research Journal, 27(3), 557-577.
Finn, J. D. & Achilles, C. M. (1999). Tennessee’s class size study: Findings, implications, misconceptions. Educational Evaluation and Policy Analysis, 21(2), 97-109.
Finn, J. D., Fulton, B. D., Nye, B. A., & Zaharias, J. B. (1992). Carry-over effects of small classes. Peabody Education Journal, 67, 75-84.
Finn, J. D., Gerber, S. B., Achilles, C. M., & Boyd-Zaharias, J. (2001). The enduring effects of small classes. Teachers College Record, 103, 145-183.
Fischel, W. A. (2002). An Economic Case against Vouchers: Why Local Public Schools Are a Local Public Good (Dartmouth Economics Department Working Paper 02-01). Dartmouth College: Dartmouth College Economics Department.
Fletcher, J. M. (2009). Is identification with school the key component in the “black box” of educational outcomes? Evidence from a randomized experiment. Economics of Education Review, 28(6), 662-671.
Funkhouser, E. (2009). The effect of kindergarten classroom size reduction on second grade achievement: Evidence from California. Economics of Education Review, 28(3), 403-414.
Glass, G. & Smith, M. L. (1979). Meta-analysis of research on class size and achievement. Educational Evaluation and Policy Analysis, 1, 2-16.
Goldstein, H. & Blatchford, P. (1998). Class size and educational achievement: A review of methodology with particular reference to study design. British Educational Research Journal, 24(3), 255-268.
Graue, M. E. & Oen, D. (2009). You just feed them with a long-handled spoon: Families evaluate their experiences in a class size reduction reform. Educational Policy, 23(5), 685-713.
Graue, M. E., Raucher, E., & Sherfinski, M. (2009). The synergy of class size reduction and classroom quality. The Elementary School Journal, 110(2), 178-201.
Grissmer, D. (1999). Class size effects: Assessing the evidence, its policy implications, and future research agendas. Educational Evaluation and Policy Analysis, 21(2), 231-248.
Hanushek, E. A. (1989). The impact of differential school expenditures on school performance. Educational Researcher, 18(4), 45-65.
Hanushek, E. A. (1996). A more complete picture of school resource policies. Review of Educational Research, 66(3), 397-409.
Hanushek, E. A. (1999a). The evidence on class size. In Mayer, S. E. & Peterson, P. E. (Eds.), Earning and Learning: How Schools Matter (pp. 131-168). Washington, DC: Brookings Institution Press.
Hanushek, E. A. (1999b). Some findings from an independent investigation of the Tennessee STAR experiment and from other investigations of class size effects. Educational Evaluation and Policy Analysis, 21(2), 143-163.
Hanushek, E. A. & Raymond, M. E. (2005). Does school accountability lead to improved student performance? Journal of Policy Analysis and Management, 24(2), 297-327.
Hedges, L. V., Laine, R. D., & Greenwald, R. (1994). Does money matter? A meta-analysis of the effects of differential school inputs on student outcomes. Educational Researcher, 23(3), 5-14.
Hoxby, C. M. (2000). The effects of class size on student achievement: New evidence from population variation. Quarterly Journal of Economics, 115(3), 1239-1285.
Jepsen, C. & Rivkin, S. (2009). Class size reduction and student achievement: The potential tradeoff between teacher quality and class size. Journal of Human Resources, 44(1), 223-250.
Konstantopoulos, S. (2009). Effects of teachers on minority and disadvantaged students’ achievement in the early grades. Elementary School Journal, 110(1), 92-113.
Konstantopoulos, S. & Chung, V. (2009). What are long-term effects of smaller classes on the achievement gap? Evidence from the Lasting Benefits Study. American Journal of Education, 116(1), 125-154.
Krueger, A. B. (1999). Experimental estimates of education production functions. Quarterly Journal of Economics, 114(2), 497-532.
Krueger, A. B. & Whitmore, D. M. (2001). Would Smaller Classes Help Close the Black-White Achievement Gap? Princeton, New Jersey: Princeton University Press.
Mitchell, D. E., Beach, S. A., & Baduruk, G. (1991). Modelling the Relationship between Achievement and Class Size: A Re-analysis of the Tennessee Project STAR Data. Riverside, CA: University of California, California Educational Research Cooperative.
Mitchell, D. E. & Mitchell, R. E. (1999). The impact of California’s Class Size Reduction initiative on student achievement: Detailed findings from eight school districts. Riverside, CA: University of California, California Educational Research Cooperative.
Molnar, A., Smith, P., Zahorik, J., Palmer, A., Halbach, A., & Ehrle, K. (1999). Evaluating the SAGE program: A pilot program in targeted pupil-teacher reduction in Wisconsin. Educational Evaluation and Policy Analysis, 21(2), 167-177.
Molnar, A., Smith, P., & Zahorik, J. (2000). 1999-2000 Evaluation Results of the Student Achievement Guarantee in Education (SAGE) Program. Milwaukee, Wisconsin: Center for Urban Initiatives and Research, University of Wisconsin.
Nye, B. A., Achilles, C. M., Zaharias, J. B., & Fulton, B. D. (1993). Project Challenge third-year summary report: An initial evaluation of the Tennessee Department of Education “At Risk” Student/Teacher Ratio Reduction Project in seventeen counties 1989-90 through 1991-92. Nashville: Center of Excellence for Research in Basic Skills, College of Education, Tennessee State University, Tennessee State University Press.
Nye, B. A., Hedges, L. V., & Konstantopoulos, S. (1999). The long-term effects of small classes: A five-year follow-up of the Tennessee class size experiment. Educational Evaluation and Policy Analysis, 21(2), 127-142.
Nye, B. A., Hedges, L. V., & Konstantopoulos, S. (2000a). Do minorities and the disadvantaged benefit more from small classes? Evidence from the Tennessee class size experiment. American Journal of Education, 109, 1-26.
Nye, B. A., Hedges, L. V., & Konstantopoulos, S. (2000b). The effects of small classes on academic achievement: The results of the Tennessee class size experiment. American Educational Research Journal, 37(1), 123-151.
Nye, B. A., Hedges, L. V., & Konstantopoulos, S. (2001a). Are effects of small classes cumulative? Evidence from a Tennessee experiment. Journal of Educational Research, 94, 336-345.
Nye, B. A., Hedges, L. V., & Konstantopoulos, S. (2001b). The long-term effects of small classes in early grades: Lasting benefits of mathematics achievement at grade 9. Journal of Experimental Education, 69, 245-257.
Nye, B. A., Hedges, L. V., & Konstantopoulos, S. (2004). Do minorities experience larger lasting benefits from small classes? Journal of Educational Research, 98, 94-100.
Prais, S. J. (1996). Class size and learning: The Tennessee experiment – what follows? Oxford Review of Education, 22(4), 399-414.
Schanzenbach, D. W. (2007). What have we learned from Project STAR? Brookings Papers on Education Policy, 205-228.
Slavin, R. E. (1989). Class size and student achievement: Small effects of small classes. Educational Psychologist, 24, 99-110.
Stecher, B. M. & Borhnstedt, G. W. (2000). Class size reductions in California: The 1998-99 evaluation findings. Sacramento, California: California Department of Education.
Webb, N. L., Meyer, R. H., Gamoran, A., & Jianbin, F. (2004). Participation in the Student Achievement Guarantee in Education (SAGE) Program and performance on state assessments at grade 3 and grade 4 for three cohorts of students. Milwaukee, Wisconsin: Wisconsin Center for Education and Research.
Word, E. R., Johnston, J., Bain, H. P., & Fulton, B. D. (1994). The State of Tennessee’s Student-Teacher Achievement Ratio (STAR) Project: Technical Report 1985-1990. Nashville, Tennessee: Tennessee State Department of Education.