Nanchang Hangkong University, Nanchang, China
1 Introduction
Online short dramas are a new format in online film and television, generally referring to online dramas whose episodes last no more than ten minutes each. They follow the shift of film and television content toward online and mobile viewing, and are characterized by short length, concise content and compact plots. Such works integrate symbolic resources such as text, pictures and sounds, and convey information to the audience through the deliberate arrangement and layout of multiple modalities. Escape from the British Museum is an online short drama independently produced by the self-media bloggers “Jianbing Guozai” and “Xiatian Meimei,” and released on Douyin, Bilibili, Weibo and other video platforms on August 30, 2023. Using anthropomorphism, the drama tells the story of a Chinese jade pot, personified as a girl, who escapes from the British Museum and meets Zhang Yong’an, a Chinese reporter in the UK. Through her encounter with her “family,” she recognizes her identity and returns to China with a letter of protection. The three episodes follow the arc of fleeing, returning, and reading the letter that the Chinese cultural relics in the British Museum entrusted the small jade pot to bring back to China. Seen through the jade pot’s eyes, the drama tells of the cultural relics’ century-long longing for home, stirring the audience’s feelings of family and country and a collective empathy for the return of national treasures. As of May 2025, the drama had accumulated close to 600 million views and more than 20 million likes across platforms, and related topics had generated an extremely high volume of online discussion and reading.
After the drama’s release, calls for the return of cultural relics held by the British Museum grew louder and louder: not only did Chinese media rush to report on it, but foreign media such as the BBC and The Telegraph also published articles about the drama one after another. The drama’s success, however, lies not only in its enormous traffic. It uses language, images, sounds, actions and other means to integrate linguistic modalities (metaphors, referents, emotional vocabulary), visual and auditory modalities (music, rhythm, intonation) and other multi-sensory and symbolic resources; through multimodal collaborative interaction, it builds a powerful emotional mechanism that moves the audience, offering lessons for the creation and dissemination of online short dramas. Guided by the multimodal discourse analysis framework proposed by Zhang Delu (2009), this study therefore explores the collaborative interaction among modalities and the emotion-construction mechanism it supports at four levels: culture, context, content and expression.
Barthes was one of the earliest researchers of multimodal discourse analysis. His 1977 paper “Rhetoric of the image” explored how images are interconnected with language and how the two work together to express meaning. Kress & van Leeuwen (1996) later proposed the theory of visual grammar, which examined the relationship between modality and media and explained how multimodality conveys meaning in visual images, color grammar and newspaper layout design. Royce (2002) turned to second language classroom teaching, studying multimodal synergy in that setting, and proposed that different semiotic modes are complementary in multimodal discourse. Guijarro & Sanz (2008) conducted a multimodal analysis of picture books to determine the extent to which visual and linguistic components create meaning. Sihura (2019) analyzed the film Frozen, but explored only linguistic modalities, without a comprehensive examination of visual and auditory modalities. Sari & Noverino (2021) analyzed an advertisement from the perspectives of gesture, vision, language, space and hearing, highlighting the multimodal diversity of television commercials.
Chinese scholars have produced fruitful results in multimodal discourse analysis in recent years. Hu Zhuanglin published “Multimodalization in Social Semiotics Research” in 2007. He not only introduced computer semiotics but also analyzed the difference between multimodal semiotics and multimedia semiotics, emphasizing that the multimodalization of social semiotics is emerging and that cultivating multimodal literacy is particularly important (Hu, 2007). In the same year, Zhu Yongsheng (2007) gave a comprehensive account of multimodal discourse analysis, covering its origin, definition, nature and theoretical basis as well as its research content, methods and significance, laying a foundation for later research. Zhang Delu (2009) further explored the relationships among modalities and constructed a more systematic theoretical framework for multimodal discourse analysis. Xu Guohong (2010) examined a university English teacher’s classroom from the perspective of multimodal discourse analysis, focusing on the scaffolding function of the teacher’s non-verbal behavior and on whether verbal and non-verbal behaviors complement each other in providing scaffolding. Later, Zhao Weiping (2017) used multimodal discourse analysis theory to study the feasibility of the flipped classroom. Other scholars have examined film subtitle translation (Yin, 2020) and movie posters (Liu, 2024) from the perspective of multimodal discourse analysis.
In sum, most existing research focuses on multimodality theory, the relationship between multimodality and media, and the application of multimodality in teaching, picture books, films and similar genres. By contrast, little research addresses short videos, let alone online short dramas, a genre that has emerged only in recent years.
Because online short dramas are a new form of film and television drama that has risen to prominence in China only over the past decade, academic research on them has only recently begun to develop. The number of articles is small, no dedicated studies of online short dramas have been found in overseas literature, and relevant findings are mostly concentrated on short videos in general. Research on online short dramas can be traced back to 2008; with the rapid development of information technology and multimedia in recent years, online short dramas have gradually entered public view and attracted scholarly attention. Wu Qian (2019) explored the innovations of micro-short dramas and their implications for television media. Zhang Han (2022) proposed narrative strategies for micro-short dramas in the new media environment, covering narrative structure, narrative discourse, narrative focus, lens language and editing syntax. Wen Jianwen (2022) analyzed the reasons for the popularity of online short dramas and how they should develop in the future. Li Feiran (2023) proposed that, to achieve high-quality development, online short dramas should move toward content expansion, narrative refinement and market integration.
Because Escape from the British Museum was released only in August 2023, research on it is still limited and focuses mainly on the drama’s cultural identity, empathy, communication and collective memory (Shen, 2024; Liu & Xie, 2025).
In recent years, multimodal discourse analysis has been widely used in communication studies and film and television research, but research on short videos and online short dramas is still in its infancy. Although Zhang Delu’s multimodal discourse analysis framework provides a systematic theoretical basis for multimodal research, its application to short videos and online short dramas still shows the following gaps. First, concerning the particular nature of the genre: short videos and online short dramas are brief, spread rapidly and reach wide audiences, yet current multimodal research concentrates mostly on films, advertisements and picture books, and the multimodal synergy mechanism of short videos and online short dramas has received little attention. For example, the fast pace and fragmented circulation of short videos pose new challenges for multimodal collaboration, but these have not been adequately studied. Second, concerning the dynamics of multimodal collaboration: although Zhang Delu’s framework offers an analytical path through four levels (culture, context, content and expression), existing research mostly remains static and lacks in-depth discussion of how multimodal collaboration unfolds over time. For example, how dynamic means such as camera switching and changes in musical rhythm enhance emotional resonance has not been systematically studied. Third, at the content level, although the effects of color saturation and musical mode on emotional resonance have been described qualitatively, quantitative analysis is lacking. For example, current research on how specific color saturation values affect audiences’ emotional responses is mostly limited to qualitative description and lacks precise quantitative data.
Fourth, concerning the emotional resonance mechanism: multimodal collaboration can move the audience from cognitive identification to emotional identification and complete an emotion-behavior closed loop through social practices such as likes and retweets. However, most existing research stays at the surface of emotional identification and does not examine in depth how it is translated into concrete behaviors (such as participating in cultural relic protection). By addressing these gaps, this study seeks to extend the application of multimodal discourse analysis frameworks to short videos and online short dramas and to provide a firmer theoretical foundation and practical guidance for research on cultural communication and emotional resonance.
As a form of dynamic discourse, an online short drama conveys its content and emotions through the layout and arrangement of its modalities. The comprehensive framework for multimodal discourse analysis proposed by Zhang Delu (2009) divides multimodal discourse into four levels: cultural, contextual, content and expression. Following this framework, this article analyzes the ideology and genre of the drama at the cultural level, and its context of situation, namely field, tenor and mode, at the contextual level. At the content level, drawing mainly on representational, interactive and compositional meaning in visual grammar, it explores the synergy between visual and auditory modalities and how it conveys feelings of family and country. The expression level examines the linguistic and non-verbal media the drama employs (see Figure 1).
Figure 1 Analysis framework (Zhang, 2009)
Family and country feelings refer to an individual’s emotional identification with, and cultural belonging to, the family and the country, involving deep attachment and a sense of responsibility toward the nation. In Escape from the British Museum, multiple modalities interact to evoke the audience’s emotional resonance and cultural identity through the display of collective memory, cultural symbols and national cultural relics, stimulating a strong desire for the return and protection of cultural relics and conveying feelings of family and country.
The cultural level refers to the ideology formed by people’s ways of thinking, philosophies, living habits and unspoken social rules, together with the communicative procedures or structural potential through which this ideology is realized; the latter is known as genre (Zhang, 2009). To a certain extent, it reflects the function of discourse as a cultural carrier under the joint action of multiple modalities. In this study, it manifests in two ways. On the one hand, Escape from the British Museum uses anthropomorphism to tell the story of a Chinese thin-bodied jade pot with an entwined-branch pattern that takes human form and hopes to return to China. It conveys the expectation that cultural relics will return, evokes Chinese people’s collective memory of and renewed attention to cultural relics, and makes the audience resonate emotionally. On the other hand, in terms of genre, Escape from the British Museum adopts the innovative form of the micro-short drama. Unlike earlier emotional short dramas, it focuses on the return of cultural relics and reaches people who are active online, arousing the Chinese public’s concern for and expectation of the return of cultural relics, and it carries a strong sense of family, country and national identity.
The contextual level refers mainly to the context of situation (Zhang, 2023). According to systemic functional linguistics (Halliday, 1978), the context of situation consists of three parts: (1) field: what is happening and what the subject matter is; (2) tenor: who is taking part in the communication and how they are related; (3) mode: what medium and channel the communication uses.
The tenor of Escape from the British Museum can be summarized as an emotional appeal and a co-construction of responsibility between “blood compatriots” and “guardians of family and country.” The participants in the drama include represented participants and interactive participants. The represented participants include the anthropomorphic jade pot, the journalist Zhang Yong’an, Peking Opera performers, a sugar-blowing stall owner, the flag-raising guards in Tian’anmen Square, and others. The interactive participants are the producers and the audience. Together they form a variety of interpersonal relationships, such as those between the anthropomorphic jade pot and Zhang Yong’an, the sugar-blowing stall owner and a customer, and the producers and the audience. The narrators are the anthropomorphized lost cultural relic (Xiaoyuhu) and the Chinese journalist Zhang Yong’an, while the addressees are the Chinese viewers in front of the screen, Chinese people at home and abroad, and the potential international public. In the resulting social relationship, the jade pot’s line “black-eyed and yellow-skinned people who can understand me are family members” instantly draws the audience into a “blood community,” upgrading the usual “audience-character” relationship to one between a “lost relative” and a “receiver”; Zhang Yong’an moves from spectator to entrusted guardian, symbolizing the Chinese people’s shift from “knowing” to “taking responsibility.” The mode of the drama includes auditory modalities such as dialogue and background music and visual modalities such as scenes, images and colors, which interact to convey information and meaning.
Building on Halliday’s systemic functional grammar, Kress & van Leeuwen (1996) proposed visual grammar, with representational, interactive and compositional meaning corresponding to Halliday’s three metafunctions: the ideational, interpersonal and textual functions. Representational meaning refers to the depiction of objective things and their relations in the external world, including real-world people, things and places as well as the activities of the inner world; it can be divided into narrative representation and conceptual representation (Sun & Zhao, 2023). Interactive meaning refers to the relationship among the image maker, the world depicted in the image and the viewer. Compositional meaning refers to the integration of representational and interactive meaning into a coherent whole. Visual grammar made it possible for social semiotics to extend language-based analysis to visual images, and the theoretical knowledge and frameworks it established have greatly advanced multimodal discourse analysis at the social level (Liu, 2024).
Representational meaning in the online short drama Escape from the British Museum is illustrated in Figure 2:
Figure 2 The seated wooden carving Guanyin of the Five Dynasties, screenshot from the Escape from the British Museum
(Episode 3, Timecode 5:53, Creators: Jianbing Guozai, Xiatian Meimei 2023)
In terms of visual modality, both narrative and conceptual representation appear in the drama. When the British Museum announces the disappearance of cultural relics, the Chinese journalist Zhang Yong’an walks into the museum and sees many historic relics, such as a Chinese phoenix crown and the seated wooden Guanyin of the Five Dynasties (Figure 2). The announcement reports the real event of “the loss of the Chinese thin-bodied jade pot with an entwined-branch pattern,” establishes a causal chain of “lost object-time-place,” and motivates the jade pot’s subsequent actions. Zhang Yong’an’s entry into the museum is shown in continuous shots of the reporter pushing the door, walking, and raising his eyes to look, objectively depicting the concrete act of “a Chinese reporter entering the British Museum” and laying the groundwork for the ensuing gaze at the relics and emotional identification. The phoenix crown symbolizes the ritual system of emperors and empresses and Chinese imperial power, while the seated wooden Guanyin of the Five Dynasties (Figure 2) embodies Buddhist compassion and Eastern aesthetics. Together they constitute the conceptual category of “Chinese cultural heritage” and form a static “occupied-displayed” relationship with the display cases and museum space, conceptually revealing the separation of cultural relics from their homeland and providing a cognitive schema for the drama’s theme of “cultural relics returning home.”
Interactive meaning concerns the relationship among the maker of the image, what the image represents and the viewer, and is realized mainly through four elements: contact, distance, attitude and modality (Kress & van Leeuwen, 2006). Contact refers to the imaginary relationship established between represented participants and viewers through the direction of gaze in the image.
Figure 3 Eye contact screenshot from Escape from the British Museum
(Episode 2, Timecode 3:38, Creators: Jianbing Guozai, Xiatian Meimei 2023)
Distance refers to the distance between the represented participants and the viewer. In the drama, the producers generally use frontal, eye-level perspectives and mostly close-ups and medium close-ups, which place participants and viewers in an equal relationship, heighten the audience’s sense of immersion and intimacy, and make resonance easier. Modality is expressed through color brightness and saturation. Most frames in the drama are brightly colored, dominated by blue, green and other highly saturated colors, which recreate real life and create an overall warm and comfortable atmosphere.
At the level of interactive meaning, Escape from the British Museum draws the audience into a “family” emotional community through the four links of contact, distance, perspective and modality (Figure 3). First, the small jade pot makes her debut in a frontal close-up, looking directly into the lens with highlights in her pupils, forming a typical “demand” type of contact (Kress & van Leeuwen, 2006) that addresses a request to the audience in front of the screen: “Please take me back to my hometown in China.” Second, numerous close-ups and medium close-ups at a social distance of 0.5-1.5 meters, combined with eye-level camera positions, put the characters and the audience on the same level and create an equal, intimate, “family-style” viewing position. Third, color is strategically used as a modal resource. When the jade pot appears, saturation reaches 85% in the HSV (Hue, Saturation, Value) color model, and the color temperature of 5200K, between warm and cool, creates a sense of innocence and trustworthiness. When Zhang Yong’an hesitates over whether to report it, saturation plummets to 45% and the color temperature cools to 6500K, the temperature of cool white light or daylight; the low-modality gray and blue tones convey his moral conflict and invite the audience into the ethical dilemma. At the end, although the group portrait of cultural relics in the British Museum returns to high saturation, brightness is reduced by 10%, forming a compound modality of “magnificence + shadow.” This not only displays the glory of Chinese craftsmanship but also recalls the lingering trauma of cultural relics lost to Britain, leading the audience from individual emotion to a call to action for the return of Chinese cultural relics lost overseas.
In short, through the dynamic combination of direct gaze, close-ups, eye-level views and saturation, the drama transforms “viewing” cultural relics into “searching for relatives,” making the audience the receivers of “separated family members” and completing the emotional anchoring of interactive meaning.
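The saturation and brightness figures above could, in principle, be checked against the frames themselves. The paper does not state how they were measured, so the following is only a minimal Python sketch under stated assumptions: each frame is assumed to be available as a list of 8-bit RGB pixel tuples (for example, sampled from an exported screenshot), and the helper names `mean_saturation_and_value` and `relative_change` are hypothetical. It uses the standard-library `colorsys.rgb_to_hsv` to average HSV saturation and value (brightness) over pixels; it does not estimate color temperature.

```python
import colorsys
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]  # assumed: 8-bit channels in the range 0-255


def mean_saturation_and_value(pixels: Iterable[RGB]) -> Tuple[float, float]:
    """Hypothetical helper: mean HSV saturation and value of RGB pixels, in percent."""
    total_s = 0.0
    total_v = 0.0
    n = 0
    for r, g, b in pixels:
        # colorsys expects each channel scaled to [0, 1]; it returns (h, s, v)
        _, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        total_s += s
        total_v += v
        n += 1
    if n == 0:
        raise ValueError("no pixels supplied")
    return 100 * total_s / n, 100 * total_v / n


def relative_change(before: float, after: float) -> float:
    """Hypothetical helper: percentage change, e.g. -10.0 for a 10% drop."""
    return 100 * (after - before) / before


# Invented toy "frames" (not measurements from the drama): the second frame
# scales every channel by 0.9, which keeps saturation and lowers value by 10%.
bright = [(0, 200, 120), (30, 180, 90)]
dim = [(0, 180, 108), (27, 162, 81)]
s1, v1 = mean_saturation_and_value(bright)
s2, v2 = mean_saturation_and_value(dim)
print(f"saturation {s1:.1f}% -> {s2:.1f}%, value change {relative_change(v1, v2):.1f}%")
# prints: saturation 91.7% -> 91.7%, value change -10.0%
```

Averaging per-pixel saturation and value is only one simple choice; a fuller analysis might weight regions of interest or compare histograms, and would need real frame data rather than the invented pixels used here.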
Compositional meaning refers to the overall layout of the image: the information value, framing and salience of different symbolic resources produce different effects. Figure 4 shows frames from the drama with clear compositional significance.
Figure 4 Tea shop courtyard and flag-raising procession screenshot from Escape from the British Museum
(Episode 3, Timecodes 2:17 and 3:07, Creators: Jianbing Guozai, Xiatian Meimei 2023)
The two frames in Figure 4 juxtapose “family security” and “national respect,” reflecting by contrast the “homeless” plight of lost cultural relics and activating the deep structure of family and country feelings. The tea shop courtyard is crowded with tea drinkers, closely packed bamboo chairs, steaming tea and leisurely neighbors, highlighting the everyday foundation of national peace and prosperity (guotai min’an). The flag-raising shot reinforces the national symbol, and the blurred foreground crowd becomes a collective “group portrait,” implying that the country holds the highest value in the people’s hearts. Across the two frames, the compositional logic moves from “the warmth of life” to “the supremacy of the country,” jointly encoding the narrative of “family security and national respect.” The Chinese cultural relics held in the UK form the “absent” counterpart, with no tea table to rest beside and no flag to look up to. Through this scene of a “complete family and country,” the drama turns the audience’s emotions back on the relics’ plight, sublimating the demand for the return of China’s overseas cultural relics into collective will and action; here, compositional meaning and feelings of family and country resonate.
“At the formal level, the formal characteristics of different modalities are interrelated and jointly realize the meaning of discourse” (Zhang, 2009). Linguistic modalities do not simply interpret pictures; through lexical choice, metaphorical construction and prosodic rhythm, they form “point-to-point” complementary and non-complementary synergies with visual and auditory modes.
(1) Complementary reinforcing synergy. On the visual level, the camera takes a long shot of the museum interior, with warm top light and highly saturated emerald green highlighting the magnificence of the Buddha statue. On the auditory level, the guzheng prelude uses a flowing 6/8 pattern at 72 BPM, creating a sense of time stretching out like “water waves returning.” On the linguistic level, the voice-over line “Reunion, all contained within a single Bodhi leaf” employs the Buddhist metaphor of “One Leaf Bodhi” to cast cultural relics as the subject of enlightenment, semantically echoing the top-down “Buddha’s light” in the shot. At the end of the line, “between” is delivered with a falling-rising tone (↘↗) synchronized with the guzheng glissando, realizing a three-way coupling of metaphor, light and shadow, and melody, and intensifying the emotional peak of the “soul returning home.”
(2) Complementary extending synergy. The camera cuts to a close-up of the Ming Dynasty dragon-patterned pottery brick (Figure 5). The voice-over delivers three parallel lines beginning “sick bones are separated, and it has been a hundred years away from home...,” each 7-9 words long and forming a 2-3-2 rhythm that matches the “push in-hold-pull out” movement of the camera. At the same time, negative semantic fields such as “sick bones” and “broken souls” counter the visual “quiet beauty” and turn it “poignant,” extending the emotional value in the opposite direction.
(3) Non-complementary independent synergy. At the opening, a pure black screen combined with a broadcast voice announcing “The disappearance of the jade pot in the British Museum” constitutes a “zero vision” mode: the linguistic mode supplies information in an official register (a police report), while the visual mode is deliberately absent, separating information from emotion. Here, language does not explain the image; instead, the “invisible” becomes the visual content itself, guiding the audience to associate the “disappearance” with the colonial history of cultural relic loss and opening a cognitive gap for the later themes of home and country.
(4) Cross-modal synergy within language. As the subtitle “Next stop, Guijia Road” appears, the background music shifts immediately from the A minor of the previous passage to C major, so that the linguistic sign “Guijia” (returning home) and the brightness of the major key form an audio-linguistic metaphor.
In summary, linguistic modalities are precisely coupled with visual color, musical mode and camera movement through metaphorical mapping, prosodic alignment and register shifts. They can directly strengthen the emotion of the image, extend it in the opposite direction, or independently open information gaps, and together with vision and hearing they weave the discourse network of “cultural relics returning home.”
Figure 5 Ming Dynasty dragon-patterned pottery brick screenshot from Escape from the British Museum
(Episode 3, Timecode 5:30, Creators: Jianbing Guozai, Xiatian Meimei 2023)
Zhang Delu (2009) divides the expression level into verbal and non-verbal media, whose interplay can amplify emotion. In terms of verbal media, Xiaoyuhu chokes up when she first meets Zhang Yong’an: “Family, I have been wandering outside for a long time, I am lost, and I don’t know how to find my way home.” The form of address “family” anchors identity through a kinship term, instantly upgrading the audience from “bystanders” to “lost relatives” and activating kinship emotions. In the finale, the lost cultural relics cry “I want to go home” in unison, with differentiated timbres, intonations and rhythms, forming a polyphonic chorus that heightens the ritual effect of the collective “group portrait” returning home and brings feelings of family and country to an emotional boiling point.
Non-verbal media perform “embodying” and “scene-setting” functions. Close-ups of the jade pot’s reddened eyes and slightly trembling hands translate the state of being lost expressed in language into visible “physical trauma.” Zhang Yong’an’s posture shifts from initial sideways avoidance to a frontal embrace, signaling that he has accepted responsibility. A low-angle shot of the Tian’anmen flag raising is juxtaposed with an overhead shot of the tea stall and market, vertically integrating “country” and “home” and giving emotion concrete real-world coordinates. Combined with online elements such as a mobile phone live-streaming interface and flying bullet comments (danmaku), the task of “bringing cultural relics home” extends from the plot into an online action viewers can join immediately off-screen, seamlessly connecting plot emotion with audience practice. Verbal media express emotions directly, while non-verbal media let emotions be “seen, touched and participated in”; complementing each other, the two drive the layered progression and cross-screen diffusion of family and country feelings.
Based on Zhang Delu’s (2009) multimodal discourse analysis framework, this study has systematically examined modal synergy and the construction of family and country feelings in Escape from the British Museum at four levels: culture, context, content and expression. The results show that the cultural level embeds “cultural relics returning home” in the collective memory of the Chinese nation, giving the drama a shared emotional schema. The contextual level grounds the narrative in everyday life through “kinship terms + ordinary civilian scenes.” The content level, through the multimodal combination of “highly saturated warm colors + direct eye contact + major-key melody,” turns the cultural relics’ desire to return home into perceptible audio-visual stimuli, effectively promoting the audience’s emotional resonance. At the expression level, the kinship term “family” anchors identity: the linguistic mode appeals directly to identity, while the non-verbal mode (close-ups, eye-level shots, flag raising) provides embodied evidence. The resonance of linguistic and non-verbal modalities moves the audience from cognitive identification to emotional commitment. The four levels build on one another to reach an emotional peak, which then connects with the audience’s social practices (likes, retweets, topic participation) to complete the emotion-behavior closed loop (see Figure 6). This mechanism offers an operable and testable model for emotional narrative in online micro-dramas and a reference for future cross-platform and cross-cultural comparative research.
Figure 6 Multimodal emotional resonance model for online film and television creation
As a popular and innovative mode of communication, online short dramas should set a clear direction of development and take up the baton of spreading traditional Chinese culture, stimulating national identity and cultivating feelings of family and country through high-quality works. Based on multimodal discourse analysis, this paper has interpreted the feelings of home and country in Escape from the British Museum at the four levels of culture, context, content and expression, and has analyzed how the synergy among visual, auditory and linguistic modalities conveys the drama’s meaning. The drama’s success is attributable not only to its high viewership and significant public engagement but also to its excellent production and delicate emotional expression.
Future research could compare online short dramas across different platforms and cultural contexts to further verify the cross-cultural effectiveness of modal synergy. For online short dramas to achieve sustainable development and broader impact, creators need to explore more "Chinese expressions" that integrate technology and emotion, so that every frame, every line of dialogue and every beat of music can become an important link in telling China's stories well and consolidating national identity.
The screenshots from the video Escape from the British Museum used in this paper are solely for multimodal analysis in academic research and not for profit-making purposes. If any copyright issue arises, the rights holders are kindly requested to contact the authors promptly for removal or modification.
[1] Guijarro, J. M., & Sanz, M. J. P. (2008). Compositional, interpersonal, and representational meanings in a children's narrative: A multimodal discourse analysis. Journal of Pragmatics, 40(9), 1601–1619.
[2] Halliday, M. A. K. (1978). Language as social semiotic: The social interpretation of language and meaning. London: Edward Arnold.
[3] Hu, Z. L. (2007). Multimodalization in social semiotics research. Language Teaching and Research, 1(1), 1–10.
[4] Kress, G. R., & Van Leeuwen, T. (1996). Reading images: The grammar of visual design. London: Routledge.
[5] Kress, G. R., & Van Leeuwen, T. (2006). Reading images: The grammar of visual design (2nd ed.). London: Routledge.
[6] Li, F. R. (2023). Content expansion, narrative refinement, and market integration: Path selection for the high-quality development of domestic online short dramas. Film Literature, (9), 74–79.
[7] Liu, Y. J., & Xie, T. (2025). A study on the empathy transmission mechanism of online short dramas from the perspective of collective memory theory: Taking Escape from the British Museum as an example. News World, (3), 84–86.
[8] Liu, Y. N. (2024). Multimodal discourse analysis under visual grammar theory: Based on the poster of the movie "All or Nothing". Ancient and Modern Cultural Creation, (18), 126–128.
[9] Royce, T. (2002). Multimodality in the TESOL classroom: Exploring visual-verbal synergy. TESOL Quarterly, 36(2), 191–205.
[10] Sari, V. W., & Noverino, R. A. (2021). Multimodal discourse analysis in Pantene advertisement. International Journal of Linguistics, Literature and Translation, 4(10), 21–30.
[11] Shen, Y. Q. (2024). Research on empathy transmission of online short drama Escape from the British Museum. New Media Research, (2), 81–83.
[12] Sihura, M. (2019). Transitivity process in Frozen movie: A study of systemic functional grammar. International Journal of Systemic Functional Linguistics, 2(2), 79–85.
[13] Sun, J., & Zhao, Y. (2023). Multimodal discourse analysis of family and country feelings in “Fighting the Epidemic with One Heart”. Contemporary Foreign Language Research, (6), 78–89.
[14] Wen, J. W. (2022). Thoughts on the causes and development of the popularity of online short dramas. Media, (1), 35–38.
[15] Wu, Q. (2019). Micro-short dramas: The innovation of short videos and their inspiration for TV media: An analysis based on Tencent Yoo Video. Media, (22), 54–57.
[16] Xu, G. H. (2010). “Words” and “deeds” in teachers’ classrooms: A multimodal discourse analysis of a college English intensive reading class. Journal of University of Science and Technology Beijing (Social Sciences Edition), (4), 7–11.
[17] Yin, M. M. (2020). Research on subtitle translation strategies of French classic literary films from the perspective of multimodal discourse analysis. Contemporary Cinema, (2), 164–168.
[18] Zhang, D. L. (2009). Exploration of the comprehensive theoretical framework of multimodal discourse analysis. Chinese Foreign Languages, (1), 24–30.
[19] Zhang, D. L. (2023). Research on modal fusion mode in multimodal discourse construction. Modern Foreign Languages, (4), 439–451.
[20] Zhang, H. (2022). Attractiveness: The narrative strategy of micro-short dramas in the new media environment. Film Literature, (20), 23–27.
[21] Zhao, W. P. (2017). Research on the theoretical framework of multimodal discourse analysis in flipped classroom. China Adult Education, (14), 96–98.
[22] Zhu, Y. S. (2007). Theoretical basis and research method of multimodal discourse analysis. Journal of Foreign Languages, (5), 82–86.