This article examines the popular view that gestures played a crucial role in the emergence of human language. Language evolution is frequently understood as a transition from a system in which signals (whether vocal or manual) have fixed meanings and are used asymmetrically by senders and receivers, through specific cognitive and neurological changes, to a system in which signals are (1) flexibly referential, i.e., can stand for a variety of ideas, and (2) intersubjective, i.e., can be used equally in production and comprehension with any member of the community. The function assigned to gestures in gesture-first theories is to provide a first version of this more advanced, open-ended communication in the form of spontaneous pantomimes, which initiate a subsequent expansion of the system, its conventionalization, and eventually a switch to the vocal modality. In the present article, I examine a particular theory which claims that pantomime was enabled by changes within the system of complex action recognition and imitation. I argue that while the theory is promising, the notion of pantomime it employs presupposes two sophisticated abilities that are themselves left unexplained: symbolization and intentional communication. I point out two ways to remedy the situation, namely, constructing a leaner understanding of pantomime or supplementing the theory with an explanation for the emergence of these abilities. In this article, I pursue a third option: identifying an alternative mechanism that can lead to a suitably complex language precursor while avoiding pantomime and its problematic cognitive bases altogether. This mechanism is ontogenetic ritualization, a well-known process responsible for the development of gestures in non-human primates.
I outline the possibility that, when placed in appropriate sociocultural circumstances in which complementary actions around objects are required, this process can lead to signals that are modestly referential and intersubjective.