Learning to speak in foreign languages is hard. Speech shadowing has been rising as a proven way to practice speaking, which asks a learner to listen and repeat a native speech template as simultaneously as possible. However, shadowing can be hard to do because learners can frequently fail to follow the speech, and unintentionally interrupt a practice session. Worse, as a technical way to evaluate shadowing performance in real-time has not been established, no automated solutions are available to help. In this paper, we propose a technical framework with context-dependent speech recognition to evaluate shadowing in real-time. We propose a shadowing tutor system called WithYou, which can automatically adjust the playback and the difficulty of a speech template when learners fail, so shadowing becomes smooth and tailored. Results from a user study show that WithYou provides greater speech improvements (14%) than the conventional method (2.7%) with a lower cognitive load.
Figure 1. a) A learner (blue) tries to shadow a template speech (orange) by repeating what he listened to from an earphone as simultaneously as possible, but he fails. b) The template speech detects this failure and automatically rewinds the speech after adding some pauses. This is controlled by the computer in remote without any manual operations. c) The practice session resumes with lower difficulty, and the learner follows well this time.

