Prime Highlights
- Quill.org is partnering with Leanlab Education and Learning Commons in a $2.8 million initiative to improve the quality and reliability of AI-powered literacy tools used in schools.
- The project aims to ensure AI-generated feedback and reading materials meet research-backed instructional standards trusted by educators.
Key Facts
- The funding will support the creation of open datasets, testing methods, and scoring tools to evaluate the quality of AI-generated feedback and reading passages for grades 3–12.
- All datasets, tools, and evaluation frameworks developed through the initiative will be released publicly for use by schools and education technology developers.
Background
Quill.org has confirmed its role in a $2.8 million initiative aimed at improving the quality and reliability of AI-powered literacy tools used in schools. The nonprofit will work with Leanlab Education and Learning Commons to study whether AI-generated feedback and reading material meet research-backed classroom standards.
The announcement follows comments made by Quill.org CEO and co-founder Peter Gault on LinkedIn, where he raised concerns about the uneven quality of AI tools marketed to educators. He said that while artificial intelligence can help teachers give faster and more detailed feedback, many current products fail to meet high instructional standards.
Quill.org provides free writing and literacy tools that millions of students use across the US. Leanlab Education supports schools in testing education technology in real classrooms. Learning Commons, backed by the Chan Zuckerberg Initiative, builds open systems that connect learning science with AI development.
The funding will support three connected projects focused on creating open datasets, testing methods, and scoring tools that measure the quality of AI-generated feedback and reading passages. The goal is to ensure that AI-generated content aligns with the learning rubrics teachers already trust.
As part of the effort, Quill.org and Leanlab will build a large dataset of anonymised student writing. Researchers will tag the samples to show what strong feedback looks like in practice. This dataset will allow developers to test whether their tools follow proven writing instruction methods.
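The article does not spell out how developers would run such a test, but one plausible approach is to compare a tool's feedback labels against the researcher tags and report an agreement rate. The sketch below is illustrative only; the field names and labels are assumptions, not the project's published schema.

```python
# Hypothetical sketch: scoring an AI feedback tool against a researcher-tagged dataset.
# Field names ("sample_id", "researcher_label") and label values are illustrative;
# the initiative has not published its data schema or scoring method.
import json
from collections import Counter

def agreement_report(dataset_path: str, tool_labels: dict[str, str]) -> dict:
    """Compare a tool's feedback labels with researcher tags and report agreement."""
    total = 0
    matches = 0
    confusion = Counter()
    with open(dataset_path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)          # one tagged writing sample per line
            gold = record["researcher_label"]  # e.g. "specific_actionable" vs "vague"
            pred = tool_labels.get(record["sample_id"])
            if pred is None:
                continue                       # tool produced no feedback for this sample
            total += 1
            matches += (pred == gold)
            confusion[(gold, pred)] += 1
    return {
        "samples_scored": total,
        "agreement": matches / total if total else 0.0,
        "confusion": dict(confusion),
    }
```

A developer could run a report like this after each model update to see whether the tool's feedback still tracks what researchers tagged as strong practice.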
Another part of the project will improve tools that check the difficulty level of reading passages for students in grades 3–12. These tools will help AI systems judge whether the material they produce fits the intended grade level.
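As one illustration of what a passage-leveling check can look like, the sketch below applies the classic Flesch-Kincaid grade-level formula with a rough syllable heuristic. The project's own leveling tools are not public and will likely use more sophisticated measures; this only shows the idea of mapping a passage to an approximate grade band.

```python
# Rough sketch of one classic readability check: the Flesch-Kincaid grade level.
import re

def count_syllables(word: str) -> int:
    """Crude syllable estimate: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    """Estimate the U.S. grade level of a passage from sentence and word length."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59

passage = "The fox ran across the field. It was looking for food before the storm."
print(f"Estimated grade level: {flesch_kincaid_grade(passage):.1f}")
```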
All tools, datasets, and testing methods created under the project will be released to the public, giving schools and developers a shared way to judge AI classroom products.