Karlsson et al. (2020) (strong evidence: meta-analysis) showed that for acute low back pain, exercise therapy genuinely works, suggesting that structured, evidence-based intervention is a key part of recovery. As artificial intelligence gets smarter, so does the buzz around AI therapy bots - digital companions designed to offer mental health support. The promise of accessible care anytime is exciting, but like any powerful tool, we need to know exactly what these bots are good for and, perhaps more importantly, where their limitations lie. Understanding this boundary is crucial for both patients and technology developers.
What are the proven boundaries of AI support in therapy?
When we talk about AI therapy bots, we are essentially talking about sophisticated computer programs mimicking human interaction. The core question researchers are grappling with is: can an algorithm truly replicate the nuanced, messy, empathetic connection of a human therapist? The evidence suggests that while AI excels at certain structured tasks, it hits walls when deep, subjective human understanding is required. To understand these boundaries, it helps to look at what experts have historically warned us about when developing artificial intelligence. Dreyfus and Dreyfus (1992) (preliminary) laid out foundational concepts about what artificial experts can and cannot do, suggesting that true expertise involves contextual judgment that current systems struggle with. They noted that while AI can process vast amounts of data, it lacks the lived experience that informs human judgment.
In the area of mental health, the concept of "listening" is key. Solution Focused Brief Therapy, as described in a 2012 paper, emphasizes what the client can do, shifting focus away from deficits. An AI bot can certainly prompt you with goal-setting questions, which is helpful. However, a human therapist's ability to interpret subtle shifts in tone, body language (if video is involved), or unspoken emotional context remains a significant hurdle for current technology. The bot can process the words you type, but the subtext is often where the magic - and the danger - lies.
We also see parallels in other fields of science. Steane (2018) (preliminary) provided a useful framework for understanding what science itself can and cannot do. Science is excellent at identifying correlations and testing hypotheses, but it cannot predict every single outcome with 100% certainty, especially when complex biological systems are involved. Similarly, when looking at physical therapy, Karlsson et al. (2020) (strong evidence: meta-analysis) confirmed that structured interventions like exercise are highly effective for acute low back pain, demonstrating that evidence-based protocols work. This suggests that AI is best used to deliver these protocols - the structured, measurable parts of care - rather than replacing the overall therapeutic relationship.
Another helpful analogy comes from agriculture. When considering what insecticides can and cannot do to prevent potato viruses (2015), the research highlights that while chemicals can target specific threats, they cannot account for every possible environmental variable or complex biological interaction. AI bots face a similar challenge: they are trained on existing data. If a user presents a unique emotional crisis or a combination of symptoms not well-represented in their training data, the bot might offer a generalized, unhelpful, or even slightly inaccurate response. Govender (2023) (preliminary) directly addresses this, cautioning that understanding AI's capabilities requires acknowledging its inherent limitations - it is a tool for information processing, not a consciousness.
Furthermore, the concept of mediation, as discussed by Kratz (2024) (preliminary), shows that the process of communication is as important as the content. A human therapist mediates the conversation by guiding it, challenging assumptions gently, and building rapport. While an AI can simulate mediation by asking follow-up questions, it lacks the genuine, reciprocal vulnerability that builds true therapeutic alliance. Smith (1999) (strong evidence: meta-analysis), in reviewing meta-analyses, emphasized the need to critically evaluate the evidence base itself, reminding us that even seemingly strong data summaries require careful interpretation. For AI, this means we must treat its output as highly informed suggestions, not as definitive diagnoses or cures.
What does the evidence suggest about AI's supportive role?
The consensus emerging from the research is that AI bots are powerful assistants and educational tools, not replacements for human care. Their strength lies in accessibility and consistency. For instance, if a user needs reminders to practice deep breathing exercises or wants a structured journaling prompt based on Cognitive Behavioral Therapy (CBT) principles, an AI bot can deliver this reliably, 24/7. This is a massive improvement over waiting for a human appointment.
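As a small illustration of that consistency, a bot can rotate journaling prompts deterministically by date, so the same check-in arrives every day without any human effort. This is a hedged sketch with a hypothetical prompt bank and function name, not a real bot API; a real deployment would draw on a clinician-reviewed CBT curriculum:

```python
from datetime import date
from typing import Optional

# Hypothetical prompt bank, for illustration only.
CBT_PROMPTS = [
    "Describe one situation today that triggered a strong emotion.",
    "What thought went through your mind in that moment?",
    "How strongly did you believe that thought, from 0 to 100?",
    "What would you say to a friend who had the same thought?",
]

def daily_prompt(today: Optional[date] = None) -> str:
    """Pick the day's prompt deterministically, so repeat check-ins
    within one day always show the same prompt."""
    today = today or date.today()
    return CBT_PROMPTS[today.toordinal() % len(CBT_PROMPTS)]

if __name__ == "__main__":
    print(daily_prompt())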
The evidence points toward AI being excellent at psychoeducation. It can explain concepts like the fight-or-flight response or the basics of mindfulness in a way that is immediately digestible, much like a well-designed educational module. This aligns with the idea that structured learning, like the physical routines proven effective for back pain (Karlsson et al., 2020), benefits immensely from consistent, repetitive guidance.
However, we must be careful not to overstate its emotional intelligence. The 2012 paper on Solution Focused Brief Therapy reminds us that the client's agency - their ability to identify their own strengths and goals - is paramount. An AI can prompt this self-discovery, but the feeling of that breakthrough, the moment of realization that comes from human connection, is something the current technology cannot replicate. The bot can provide the scaffolding, but the human must do the building.
In summary, the research paints a picture of a highly capable digital assistant. It can track moods, deliver psychoeducational content, and keep users accountable to self-care routines. But when the issue moves from "What steps should I take?" to "How do I feel right now, in this messy, unquantifiable moment?", the evidence suggests that the nuanced, embodied wisdom of a trained human professional remains irreplaceable. We are at a point where AI enhances the process of care, but it hasn't yet mastered the art of care.
Practical Application: Structuring AI-Assisted Support
For AI therapy bots to be most effective, their use should be highly structured, mimicking established therapeutic protocols rather than functioning as open-ended conversational companions. One promising area of application is in managing mild to moderate anxiety or depressive symptoms through Cognitive Behavioral Therapy (CBT) techniques. A structured protocol could involve daily check-ins, administered over a defined period, such as six to eight weeks.
The protocol would begin with a Needs Assessment Phase (Week 1): the bot would prompt the user to track mood, identify triggers, and log negative automatic thoughts (NATs) for one week. This initial data collection is crucial for establishing a baseline.

The Skill Building Phase (Weeks 2-5) would follow. The bot would guide the user through specific exercises, such as thought-challenging worksheets, where the user inputs a NAT and the bot responds with Socratic questioning (e.g., "What evidence supports this thought?" or "What is an alternative perspective?"). These interactions should occur daily, ideally in two 15-minute sessions - one in the morning to set intentions and one in the evening for review.

The Maintenance and Relapse Prevention Phase (Weeks 6-8) would reduce the frequency of direct intervention while increasing the complexity of self-guided tasks. The bot might deliver weekly journaling prompts related to high-risk situations or teach advanced distress tolerance skills, requiring the user to practice them independently between sessions. Consistency in timing (e.g., 8:00 AM and 8:00 PM) and duration (totaling 30 minutes of active engagement per day) helps build routine, which is a therapeutic goal in itself.
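To make the schedule concrete, here is a minimal sketch of the three phases encoded as plain data a scheduler could consume, plus a lookup helper. The class, field, and function names are assumptions made for illustration; only the phase names, week ranges, session counts, and times come from the protocol above:

```python
from dataclasses import dataclass

@dataclass
class Phase:
    name: str
    weeks: range            # protocol weeks covered by this phase
    sessions_per_day: int   # each active session is ~15 minutes
    session_times: tuple    # consistent timing helps build routine

PROTOCOL = [
    Phase("Needs Assessment", range(1, 2), 2, ("08:00", "20:00")),
    Phase("Skill Building", range(2, 6), 2, ("08:00", "20:00")),
    Phase("Maintenance & Relapse Prevention", range(6, 9), 1, ("20:00",)),
]

# Socratic follow-ups used in the thought-challenging exercises.
SOCRATIC_QUESTIONS = [
    "What evidence supports this thought?",
    "What evidence contradicts it?",
    "What is an alternative perspective?",
]

def phase_for_week(week: int) -> Phase:
    """Look up which phase a given protocol week falls into."""
    for phase in PROTOCOL:
        if week in phase.weeks:
            return phase
    raise ValueError(f"Week {week} is outside the 8-week protocol")

if __name__ == "__main__":
    print(phase_for_week(3).name)  # -> Skill Building
```

Keeping the protocol as data rather than logic means a clinician could adjust week ranges or session times without touching the bot's code.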
Furthermore, for individuals managing chronic conditions like insomnia, the bot could be programmed to deliver biofeedback-style relaxation scripts or guided progressive muscle relaxation (PMR) exercises at a set time, such as 30 minutes before the intended bedtime, ensuring the intervention is timed to maximize its physiological impact.
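The timing rule itself is simple arithmetic. A minimal sketch, assuming the user's intended bedtime is stored as an "HH:MM" string (the function name is hypothetical):

```python
from datetime import datetime, timedelta

def pmr_delivery_time(bedtime: str, lead_minutes: int = 30) -> str:
    """Compute when to deliver the PMR script: a fixed lead time
    before the user's intended bedtime (times given as 'HH:MM')."""
    bed = datetime.strptime(bedtime, "%H:%M")
    return (bed - timedelta(minutes=lead_minutes)).strftime("%H:%M")

if __name__ == "__main__":
    print(pmr_delivery_time("23:00"))  # -> 22:30
```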
What Remains Uncertain
Despite the utility in structured skill practice, the current evidence overwhelmingly cautions against viewing AI bots as replacements for human care. A critical limitation is the inability of current models to accurately assess nuanced emotional states, particularly those involving complex trauma or acute suicidality. While bots can be programmed with safety protocols - such as recognizing keywords and immediately providing crisis hotline numbers - they lack the intuition, empathy, and contextual understanding derived from years of human clinical experience. They cannot read the subtle non-verbal cues that a human therapist relies upon.
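To illustrate just how blunt such safety protocols are, here is a minimal keyword screen of the kind described above. Everything in it is an illustrative placeholder; a deployed system would need clinically validated detection and human escalation paths, and the comments flag the obvious failure modes:

```python
from typing import Optional

# Illustrative placeholders only: real crisis detection needs
# clinically validated models and human oversight, not keyword lists.
CRISIS_KEYWORDS = {"suicide", "kill myself", "end it all", "self-harm"}

CRISIS_RESPONSE = (
    "It sounds like you may be in crisis. Please contact a crisis line "
    "now (for example, 988 in the US) or your local emergency number."
)

def screen_message(message: str) -> Optional[str]:
    """Return a crisis response if any keyword matches, else None.
    Note the failure modes: negations ('I would never...') still
    trigger, and novel phrasings slip through entirely."""
    text = message.lower()
    if any(keyword in text for keyword in CRISIS_KEYWORDS):
        return CRISIS_RESPONSE
    return None

if __name__ == "__main__":
    print(screen_message("I want to end it all tonight"))
```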
Another significant unknown is the potential for dependency. Over-reliance on the bot for emotional regulation could stunt the development of the user's own internal coping mechanisms. The "therapeutic alliance," the bedrock of successful therapy, is inherently relational and requires mutual vulnerability, something an algorithm cannot genuinely replicate. Research is urgently needed to define the optimal "scaffolding" withdrawal rate - the point at which the bot must step back to force the user to take ownership of the therapeutic work. Furthermore, the ethical landscape remains murky; questions regarding data privacy, algorithmic bias in diagnosis or suggestion, and accountability when the bot provides flawed advice require far more rigorous, longitudinal study before widespread, unsupervised deployment can be ethically justified.
References
- Karlsson M, Bergenheim A, Larsson MEH (2020). Effects of exercise therapy in patients with acute low back pain: a systematic review of systematic reviews. Systematic Reviews. DOI
- Smith D (1999). Response to Stuart et al.: Shooting the messengers - what meta-analysis can and cannot do. Schizophrenia Research. DOI
- Govender K (2023). Coming to Terms with What AI Can and Cannot Do. Age of Agency. DOI
- (2012). Listening with a constructive ear: what the client can do, not what they cannot do. Solution Focused Brief Therapy. DOI
- Dreyfus H, Dreyfus S (1992). What artificial experts can and cannot do. AI & Society. DOI
- Steane A (2018). What Science Can and Cannot Do. Oxford Scholarship Online. DOI
- (2015). Preventing the Spread of Potato Viruses: What Insecticides Can and Cannot Do. Grow: Plant Health Exchange. DOI
- Kratz F (2024). Mediation and Decomposition Analysis: Why We Cannot Do What We Think We Can Do, and How Causal Medi. DOI
