The next generation of tutoring will depend less on smoother dialogue than on deeper educational design

By Kirk Vanacore

A chatbot may sound like a tutor, but how do we make it teach like one?

Generative AI has made it suddenly easy to build a digital tutor that talks. That sentence would have sounded implausible not long ago. For years, most tutoring software behaved more like an adaptive worksheet than a conversational partner. It could present a problem, check an answer, perhaps offer a hint, and move on. Today, by contrast, a student can type a question into a chat window or speak into a microphone and receive an instant response that feels personal, fluent, and responsive. The change is dramatic enough that it is tempting to think the hardest part is over.

It is not. The future of AI tutoring will not be decided by which systems sound most human, but by which are built on a serious understanding of how tutoring actually helps people learn.
That is the bracing message of The Path to Conversational AI Tutors. We argue that conversational fluency is not the same thing as tutoring expertise and that the field risks confusing a new interface with a solved educational problem. If AI tutors are going to matter, they must be designed around what decades of research have already shown about why human tutoring works and what earlier intelligent tutoring systems got right.

Generative AI provides plenty of reasons to push toward novel designs in educational technology. But speed carries risks. Education technology has a habit of skirting familiar problems only to rediscover them in new packaging. AI now makes it possible to generate endless explanations, examples, prompts, and encouragement. But the central challenge of tutoring was never generating more words. It was, and still is, applying the right pedagogical and motivational strategies to a student's needs at a given moment.

Fortunately, this is not unexplored ground. Researchers in education, the learning sciences, and AI in education have spent decades studying these problems, developing useful frameworks, and testing designs that can inform the next generation of conversational tutors.

What Makes Human Tutoring Work

Educational technologists have long looked to human tutors as the gold standard for educational interventions. Human tutors are effective not because they constantly explain, but because they know when not to. They scaffold, probe, and cue students toward correct thinking. They notice when a student is on the verge of insight and when that student is simply stuck. They recalibrate based on new information. They know when to challenge a student and when to offer empathetic support. They help a learner stay in the productive space between boredom and overload. Good tutoring is not a stream of polished responses. It is disciplined interaction in the service of learning.

Tutoring is, at its core, a form of guided cognition carried out through conversation. Dialogue matters, but not because dialogue is inherently magical. It matters because it can surface student thinking, prompt self-explanation, support metacognition, sustain engagement, and build the trust that makes learning possible. Research on human tutoring shows that expert tutors often rely on subtle moves rather than immediate correction: a leading question, a prompt to reflect, a cue that helps the learner repair their own reasoning. When students reach an impasse, tutors may become more direct, offering explanation or demonstration, but even then the goal is not to take over the thinking. It is to keep the learner doing the intellectual work while receiving just enough support to move forward. Conversation, in other words, is an effective medium for tutoring, but never a substitute for sound pedagogy.

This is where the real design work begins. The challenge is not simply to make tutors more conversational, but to decide what prior strengths to preserve, what new affordances to embrace, what dimensions of learning to place at the center, and what questions still demand careful study. We organize that challenge with a simple framework: keep, change, center, and study.

Keep Legacy Intelligent Tutoring System Strengths

This distinction between conversation and pedagogy helps explain why we do not treat generative AI as a clean break from the past. The future of conversational tutoring should be built partly from old parts. This is a useful corrective to the now-familiar story that large language models have rendered prior educational technology obsolete. They have not. Earlier intelligent tutoring systems addressed problems that remain central to tutoring: estimating what a student knows, selecting what they should do next, identifying misconceptions, and detecting when persistence has become confusion, frustration, or disengagement.

The point is not to preserve legacy systems out of sentiment. It is to preserve the capabilities they developed. Knowledge tracing, knowledge graphs, affect detection, and the classic inner-loop/outer-loop architecture may not sound glamorous next to today’s conversational AI, but they provide exactly what free-form dialogue lacks on its own: memory, constraint, and instructional direction. The inner loop helps a tutor respond productively within a problem; the outer loop helps it choose the next problem, concept, or level of challenge. Together, these structures keep tutoring aligned with learning rather than mere responsiveness. A tutor needs to know more than how to answer. It needs a grounded account of what the student is ready for next.
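The division of labor between the two loops can be sketched in code. The sketch below is purely illustrative: every class, method, and update rule here is invented for the example (a crude stand-in for real knowledge tracing), not drawn from any particular tutoring system.

```python
# Illustrative sketch of the classic inner-loop/outer-loop tutoring split.
# All names and the mastery-update rule are hypothetical, for exposition only.
from dataclasses import dataclass, field

@dataclass
class StudentModel:
    """Tracks estimated mastery per skill (a stand-in for knowledge tracing)."""
    mastery: dict = field(default_factory=dict)  # skill -> estimated P(known)

    def update(self, skill: str, correct: bool) -> None:
        # A deliberately crude update rule in place of real knowledge tracing.
        p = self.mastery.get(skill, 0.3)
        self.mastery[skill] = min(1.0, p + 0.2) if correct else max(0.0, p - 0.1)

def inner_loop(model: StudentModel, skill: str, attempts: list) -> None:
    """Within one problem: respond to each step, updating the student model."""
    for correct in attempts:
        model.update(skill, correct)
        if not correct:
            # A real inner loop would choose a hint, prompt, or worked step here.
            pass

def outer_loop(model: StudentModel, skills: list) -> str:
    """Across problems: pick the least-mastered skill to practice next."""
    return min(skills, key=lambda s: model.mastery.get(s, 0.0))
```

The point of the sketch is the separation of concerns: the inner loop reacts to what just happened inside a problem, while the outer loop consults the accumulated student model to decide what should happen next. For example, after `inner_loop(model, "fractions", [False, True, True])`, calling `outer_loop(model, ["fractions", "decimals"])` steers practice to the unvisited skill.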

Changing Content, Feedback, and Scaffolding Delivery

Of course, generative AI does make some real changes possible. A conversational tutor can create examples on demand, adapt explanations to a student's background, ask follow-up questions in natural language, and respond flexibly rather than pulling from a fixed script. Earlier systems were often powerful but rigid. Newer ones can be more open-ended, more conversational, and more responsive to the path a student is actually taking through a problem. That matters.

One important change is the ability to generate bespoke instructional content in real time. Instead of relying only on prewritten problem sets, hints, or examples, a conversational tutor can produce new material tailored to a learner’s interests, prior knowledge, or immediate difficulty. Another is the possibility of dialogic scaffolding: not simply telling students whether they are right or wrong, but probing their reasoning, asking them to clarify, and adjusting support as their thinking unfolds. In principle, this moves tutoring closer to the responsiveness of a skilled human tutor.

But the point is not to treat generative fluency as if it were pedagogy. A model that can produce endless explanations can still misread what a student knows, give unhelpful guidance, or respond plausibly without advancing learning. The opportunity is not just more language. It is the chance to make tutoring more adaptive, more immediate, and more dialogic, provided those capabilities are disciplined by sound educational design.

Centering Meaning-Making, Agency, and Student Reasoning

The most exciting opportunity in conversational tutoring is not simply greater flexibility. It is the chance to place meaning-making, student agency, and reasoning at the center of the interaction. The paper argues that natural language dialogue creates opportunities to engage not just with whether a student is correct, but with what the student is trying to say, how the student is making sense of an idea, and where that reasoning begins to break down.

This is a meaningful shift. Earlier systems were often designed to evaluate answers, deliver feedback, and move learners through a sequence. Conversational tutors can potentially do something richer: they can ask students to explain, clarify, revise, and reflect. They can work with partial understanding rather than waiting for a finished response. They can create more room for student choice in how a problem is approached, while also giving the tutor more access to the learner’s thinking in progress.

That does not make conversation inherently educational. But it does create the possibility of tutoring that is less about response selection and more about helping learners construct meaning, exercise agency, and develop stronger habits of reasoning.

Study Efficacy, Student Experience, and Human-AI Collaboration

The last of those four imperatives, study, may be the most important. The current wave of AI tutoring tools has produced plenty of demos and no shortage of enthusiasm, but much less evidence than the excitement might suggest. The paper reviews emerging systems and some promising early findings, yet its tone is appropriately restrained. We still do not know enough about whether these systems consistently improve learning, how students experience them over time, or how conversational tutors should interact with teachers and other forms of human support.

Those are not minor open questions. A system may be fluent without being effective, engaging at first without sustaining productive use, or useful in isolation but poorly integrated into classrooms and tutoring programs. If conversational AI tutors are going to matter, they will need more than technical benchmarks or compelling transcripts. They will need careful study of outcomes, student perceptions, and the division of labor between AI and humans. In education, the real test is not whether the system can respond. It is whether the response helps someone learn.

Conclusion

Education has always been vulnerable to what might be called performance theater: systems that look impressive in use, feel modern in practice, and produce testimonials long before they produce convincing evidence of learning. Generative AI intensifies that risk because it is so good at sounding helpful. But students do not learn because a system is eloquent. They learn because the interaction changes what they notice, how they reason, what they attempt, and whether they persist. A tutoring system worth trusting has to be judged there.

This is why the paper matters beyond tutoring. It offers a broader lesson for AI in education: a model’s capability is not the same as an educational design. Large language models can generate plausible instructional moves, but plausibility is not the same thing as timing, diagnosis, or pedagogical fit. The question is not whether AI can produce tutor-like language. The question is whether AI can be embedded in systems that reliably support learning processes.

The strongest takeaway is also the simplest. The path to conversational AI tutors does not run mainly through bigger models or smoother prose. It runs through the old, stubborn questions educational research has always had to answer: What does the learner understand? What kind of support is needed now? What should happen next? And did any of it actually improve learning?

Until AI tutors can answer those questions well, talking like a tutor will remain easier than being one.