1. University of Shanghai for Science and Technology, Shanghai; 2. The Hong Kong University of Science and Technology (Guangzhou), Guangzhou; 3. Higher Education Press, Beijing
Artificial Intelligence-Generated Content (AIGC) has already penetrated deeply into culture and commerce. The domain of visual graphics is no exception. Using textual prompts, rough sketches, samples, and structural references, generative models have learned to produce a wide variety of visual outputs. These include illustrations, cinematic backgrounds, advertising media, UI layouts, concept art, and brand identity systems. No longer viewed as a minor technical advance, AIGC has become substantially more significant within the broader visual landscape.
It is not simply a matter of speed. Visual form does real communicative work: it attracts attention, organizes information, and engages audiences. As the next generation of tools emerges, it also begins to influence conceptual thinking. A producer can now go from a phrase to a specific graphic in a matter of seconds, which narrows the gap between concept and image and alters the pace of production. Where professionals once spent considerable time committing to a few choices, they can now explore many options simultaneously, refine prompts more frequently, or involve colleagues earlier.
At the same time, as more people use AIGC, significant concerns arise. When images originate from systems trained on vast historical image repositories, how can we verify their authenticity? If the style of a famous artist can be replicated, at what point does imitation cease to be inspiration? Furthermore, to what extent should a designer be held accountable for producing derivative or homogeneous illustrations? This cannot be viewed simply as technical progress; rather, it is also a social and ethical issue.
Currently, two opposing views exist. One argues that AIGC democratizes creativity, making production widely accessible. The opposite view sees it as a risk to authors, the labor market, and the arts. Although both perspectives have merit, they overlook a crucial point. We must determine the role AIGC plays in design projects — its strengths, its limitations, and who should supervise it while critically evaluating its outputs.
This article examines AIGC in visual aesthetics. It begins with the technical groundwork, which has advanced considerably, then discusses key application areas in visual design, including branding, advertising, interface design, illustration, and concept art. It then analyzes how AIGC transforms creative workflows and design processes, followed by an examination of major challenges and ethical questions. Finally, it considers how industry practice and design education might adapt as generative tools expand into multimodal media. The core argument is that AIGC does not replace human creativity but rather establishes a new system of visual production, one that transforms workflows and demands new forms of design thinking.
The current development of AIGC in visual design rests on a longer history of generative image modeling. One of the earliest breakthroughs was the introduction of Generative Adversarial Networks (GANs), proposed by Goodfellow et al. (Goodfellow et al., 2014). GANs consist of a generator and a discriminator trained in opposition to one another: the generator produces synthetic data intended to appear real, while the discriminator attempts to distinguish synthetic samples from authentic ones. Through this adversarial process, the generator progressively improves its ability to produce convincing outputs.
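The adversarial setup can be illustrated with a dependency-free toy sketch, in which the "images" are single numbers drawn near a target value and both networks shrink to one-parameter functions. All names, hyperparameters, and the analytic gradients below are illustrative simplifications for this toy case, not drawn from the original paper.

```python
import math
import random

random.seed(0)

REAL_MEAN = 3.0  # the "data distribution": numbers near 3

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Discriminator D(x) = sigmoid(w*x + c): probability that x is "real".
w, c = 0.1, 0.0
# Generator G(z) = z + g: shifts input noise toward the data distribution.
g = 0.0

lr = 0.05
for step in range(2000):
    real = [REAL_MEAN + random.gauss(0, 0.1) for _ in range(32)]
    fake = [random.gauss(0, 1.0) + g for _ in range(32)]

    # Discriminator update: ascend log D(real) + log(1 - D(fake)).
    gw = gc = 0.0
    for x in real:
        d = sigmoid(w * x + c)
        gw += (1 - d) * x        # d/dw log D(x)
        gc += (1 - d)            # d/dc log D(x)
    for x in fake:
        d = sigmoid(w * x + c)
        gw += -d * x             # d/dw log(1 - D(x))
        gc += -d                 # d/dc log(1 - D(x))
    w += lr * gw / 64
    c += lr * gc / 64

    # Generator update: ascend log D(G(z)) (the non-saturating loss).
    gg = 0.0
    for x in fake:
        d = sigmoid(w * x + c)
        gg += (1 - d) * w        # d/dg log D(z + g)
    g += lr * gg / 32

# After training, the generator's learned shift g has drifted toward
# REAL_MEAN: the adversarial signal alone taught it where the data lives.
```

The alternating updates are the essence of the framework: the discriminator sharpens its test for authenticity, and the generator exploits whatever the discriminator currently rewards.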
The significance of GANs is that they showed that AI could generate new visuals, not merely classify or retrieve existing images. Applications include face synthesis, style transfer, high-resolution imagery, and cross-domain mappings. For visual design, GANs suggest that machine learning can be integrated into the creative process, not just used as a diagnostic tool.
GANs nonetheless have well-known limitations: training is unstable, results can be hard to reproduce, and mode collapse can sharply narrow output diversity, which matters for creative tasks. Design also demands more than realism; designers seek diversity, flexibility, meaning, and aesthetic fit. Commercial or social visualization is not primarily about photorealism; it must serve particular communicative purposes.
A considerable change occurred after diffusion models emerged. In 2020, Ho, Jain, and Abbeel published Denoising Diffusion Probabilistic Models (Ho et al., 2020). In this framework, a model learns to reverse a gradual noising process: training corrupts images with noise step by step, and generation runs the learned denoising in the opposite direction, starting from pure random noise and refining it until a coherent image emerges. These models are easier to optimize than adversarial approaches and produce high-quality, diverse samples.
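The forward half of this process needs no ML framework at all: a noise schedule determines how much of the clean signal survives at each step. The sketch below uses a linear schedule and a one-pixel "image"; the schedule constants are typical illustrative values, not canonical ones.

```python
import math
import random

random.seed(0)

T = 1000
# A linear beta schedule: small noise increments early, larger ones late.
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1 - b for b in betas]

# alpha_bar[t] is the cumulative product of alphas up to step t; it tells
# us what fraction of the original signal remains at step t.
alpha_bar = []
prod = 1.0
for a in alphas:
    prod *= a
    alpha_bar.append(prod)

def q_sample(x0, t):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = random.gauss(0, 1)
    return math.sqrt(alpha_bar[t]) * x0 + math.sqrt(1 - alpha_bar[t]) * eps

x0 = 1.0  # a one-pixel "image"
early = q_sample(x0, 5)      # almost untouched: mostly signal
late = q_sample(x0, T - 1)   # essentially pure noise

# Training teaches a network to predict eps from x_t; generation then
# inverts the corruption, stepping from t = T-1 back to t = 0.
```

Because `alpha_bar` decays almost to zero by the final step, the model's starting point at inference really is indistinguishable from random noise, which is what lets generation begin from nothing.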
Latent diffusion methods subsequently improved the usability of diffusion architectures. Rombach et al. showed that diffusion could be performed over compressed latent representations instead of full-resolution pixels (Rombach et al., 2022). This advance matters because it makes models like Stable Diffusion far cheaper to run and easier to deploy. For visual creation, diffusion models are valuable not only for their output quality but also for their compatibility with tasks such as inpainting, outpainting, and iterative image editing.
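Why diffusion suits inpainting can be shown schematically: at every reverse step, the known region of the image is re-imposed, so the generated region is forced to stay consistent with it. In the sketch below, the trained denoiser is replaced by a stand-in function (a real system would run one reverse diffusion step there), and the "image" is a 1-D list; everything is an illustrative simplification of the masking logic, not an actual model.

```python
import random

random.seed(0)

# A 1-D "image": the left half is known content to keep,
# the right half is a hole to fill.
image = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
mask  = [1,   1,   1,   1,   0,   0,   0,   0]  # 1 = keep as-is

def fake_denoise(x, steps):
    """Stand-in for a trained denoiser: gently pulls each value toward a
    target. In a real system this would be one learned reverse step."""
    return [v + (0.8 - v) * (1.0 / steps) for v in x]

T = 50
x = [random.gauss(0, 1) for _ in image]  # start from pure noise
for t in range(T):
    x = fake_denoise(x, T)
    # Composite: re-impose the known pixels at every step, so the
    # generated hole is continually conditioned on them.
    x = [image[i] if mask[i] else x[i] for i in range(len(x))]

# The known region survives exactly; only the hole was synthesized.
```

The per-step composite is the key design choice: re-imposing the known pixels once at the end would let the generated region drift, whereas doing it every step keeps the two regions coherent throughout denoising.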
Diffusion models have spread widely in recent years, and multimodal alignment has improved considerably. CLIP represents a significant step in this direction, as introduced by Radford et al., who used large-scale internet datasets to create shared embeddings for vision and language (Radford et al., 2021). CLIP enables better alignment between textual descriptions and visual content.
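The shared-embedding idea reduces, at its core, to measuring the angle between an image vector and candidate text vectors in one space. The toy below uses made-up 4-dimensional embeddings, not outputs of a real CLIP model, purely to illustrate the retrieval mechanics.

```python
import math

def cosine(u, v):
    """Cosine similarity: 1.0 when two embeddings point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Made-up embeddings standing in for CLIP's joint vision-language space.
image_emb = [0.9, 0.1, 0.0, 0.2]
captions = {
    "a minimalist logo": [0.8, 0.2, 0.1, 0.1],
    "a baroque oil painting": [0.1, 0.9, 0.3, 0.0],
    "a neon cyberpunk street": [0.0, 0.1, 0.9, 0.4],
}

scores = {text: cosine(image_emb, emb) for text, emb in captions.items()}
best = max(scores, key=scores.get)
# best -> "a minimalist logo": the caption whose embedding is most
# aligned with the image embedding.
```

Text-to-image systems exploit the same geometry in reverse: a prompt's embedding becomes a target that generated images are steered toward.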
This capability differs markedly from traditional design workflows. Design professionals have always worked from verbal requirements: an advertisement must be polished yet accessible, a product label legible while remaining inviting, a newspaper page conveying measured urgency. Such language can now be translated directly into candidate imagery. This changes how a design brief is formulated, because a designer can quickly test whether a system has understood a verbalized idea.
Ramesh et al. further advanced this direction in Hierarchical Text-Conditional Image Generation with CLIP Latents, building systems that generate high-fidelity images from text and respond even to subtle cues such as spatial relationships or narrative tone (Ramesh et al., 2022). This transformed how linguistic elements integrate into artistic production. Text is no longer merely an adjunct to the artwork; it becomes a means of constructing the image.
Proficiency in prompting has emerged from guided generation practices. Efficient prompting goes beyond adding phrases; it requires understanding how the model processes descriptive focus, stylistic cues, structural suggestions, and emphases. Designers now provide more detailed instructions regarding topics, tones, perspectives, formats, and visual elements. The practice has shifted toward compositional thinking, creative direction, and software customization.
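The compositional thinking described above can be made concrete as a structured prompt, where each field corresponds to one axis the designer controls. The field names and example values below are illustrative, not a standard prompt schema.

```python
from dataclasses import dataclass

@dataclass
class Prompt:
    """One axis of creative control per field (illustrative names)."""
    subject: str       # what is depicted
    style: str         # stylistic cues
    lighting: str      # tonal / atmospheric cues
    composition: str   # structural suggestions
    medium: str        # rendering medium

    def render(self) -> str:
        """Join the axes into the flat string a text-to-image model reads."""
        return ", ".join([self.subject, self.style, self.lighting,
                          self.composition, self.medium])

draft = Prompt(
    subject="a coastal cafe logo",
    style="flat minimalist vector",
    lighting="soft morning light",
    composition="centered emblem",
    medium="digital illustration",
)
prompt_text = draft.render()
```

Treating the prompt as a structure rather than a sentence makes iteration systematic: a designer can vary one axis (say, `lighting`) while holding the others fixed, which is exactly the kind of controlled exploration the text describes.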
Neural style transfer marked a significant advance in the synergy between images and computational processes. Gatys, Ecker, and Bethge demonstrated that neural architectures could separate and then recombine the content and style of an image: its structure on one hand, its texture and brushwork on the other (Gatys et al., 2016). Although different from modern large-scale text-to-image generation, this approach treats style as something quantifiable and transferable.
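In that work, "style" is quantified through Gram matrices: channel-by-channel correlations of a network's feature maps, compared between two images. The sketch below computes this statistic directly, using tiny made-up "feature maps" in place of real convolutional activations.

```python
def gram(features):
    """Gram matrix of a feature map given as a list of channels (each a
    flat list of activations). G[i][j] is the inner product of channels
    i and j: the texture statistic Gatys et al. use to represent style."""
    n = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j]))
             for j in range(n)] for i in range(n)]

def style_loss(f_a, f_b):
    """Mean squared difference between two Gram matrices; style transfer
    minimizes this while a separate content loss preserves structure."""
    ga, gb = gram(f_a), gram(f_b)
    n = len(ga)
    return sum((ga[i][j] - gb[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)

# Tiny 2-channel "feature maps" (made-up numbers, not network outputs).
style_ref = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
candidate = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
different = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]

# Matching textures give zero style loss; a different texture does not.
```

Because the Gram matrix discards where activations occur and keeps only how channels co-vary, it captures texture independently of layout, which is what makes style separable from content in the first place.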
Visuals must remain highly communicative: corporate logos, creative elements, magazine layouts, and promotional graphics each convey a distinct atmosphere. Style transfer and subsequent image processing systems enable designers to explore many different aesthetics easily without committing to a single output. An artist could quickly investigate various visual styles — painterly, photographic, or hybrid — with considerable accuracy.
Subsequently, image-to-image architectures have advanced significantly. Sketches, wireframes, rough references, and low-fidelity compositions can be transformed into more polished visuals. This approach works effectively within creative workflows where concepts, rather than finished graphics, serve as the starting point. AI tools are valuable because they convert transitional materials into tangible reference points for discussion.
However, AI-assisted stylistic exploration does not guarantee better results. An output may be novel yet strategically pointless or culturally inappropriate for wider use. Style is only one dimension of aesthetics, and the fact that the technology performs the transformation does not relieve the designer of critical judgment.
A significant shift is underway: as AIGC implementation matures, mere visual appeal is no longer sufficient. Across industries, consistency is essential — layouts must remain coherent, branding continuous, and changes incremental.
Zhang, Rao, and Agrawala introduced ControlNet, which enables diffusion models to synthesize outputs conditioned on structural inputs such as pose maps, sketches, segmentation masks, or depth maps (Zhang et al., 2023). This matters for creative use cases: it allows diverse outputs while maintaining consistency, since users can preserve a design layout or a character's pose while experimenting with various styles and moods.
Furthermore, related approaches allow controlled modifications to objects, labels, and variants across repeated elements. Many creative contexts demand consistency: a brand identity, a coherent visual sequence, or a narrative atmosphere must be reproducible across assets. Without such consistency, it is difficult to accept AI-generated outputs as parts of a coherent whole.
From a more general perspective, governance remains necessary: the potential of AIGC will not materialize simply because larger and better models exist. If, however, we adequately manage, constrain, and refine the generative framework, it can reliably produce high-quality imagery. Creatives tend to prefer AI that is not fully autonomous but serves instead as a controllable, explorable tool.
Brand identity creation is one of the fields where AIGC has made a considerable impact. The process typically begins with relatively vague notions of persona, maxims, and distinctiveness. Clients typically expect brands to be creative, reliable, sophisticated, engaging, and distinctive. Translating these expectations into tangible outputs requires extensive conceptual research.
The practical appeal is speed of exploration: participants can rapidly generate a wide range of mood boards, key visuals, packaging directions, and emblem patterns from a few curated references, basic sketches, and their own creative intuition. This allows a team to evaluate a broader set of possibilities more efficiently while getting an early glimpse of how things might ultimately take shape.
Visualization also helps communicate ideas effectively, since people respond more favorably to concepts they can see. AIGC output can thus serve as a graphical medium for communication: rendering an organization as sharp, natural, current, traditional, sophisticated, or streamlined in images accelerates the initial stage for any new brand.
However, there are also limitations. A strong brand is not merely about aesthetic appeal. It stands for something that should remain visible across different contexts. Logotypes, text styles, grids, symbols, palettes, and kinetic actions each require careful design. AIGC offers a space for brand exploration rather than a definitive framework. The transition from generated images to repeatable design systems still requires human expertise.
The influence of AIGC on UI/UX is more limited. Current generative models cannot meet the demands of interface design, such as accessibility and responsive elements. Nevertheless, AIGC is increasingly being integrated into digital product development processes.
A common use is generating a large volume of visual assets rapidly: tutorial screenshots, banner graphics, placeholder icons, and themed background resources. Such elements do not generally explain how a product works, but they strongly shape how people perceive a platform, and they are easy to generate and vary in style.
Design teams use AIGC to articulate initial concepts before full interface development. A team may require a design to be aesthetically pleasing, engaging, technically oriented, and contemporary. The visuals generated help distill the essence of the conceptual stage, shaping both the language and mood of digital product development.
However, AIGC is still far from being able to prototype functional interface components. A human must still design the layout, information hierarchy, copy, and step-by-step interactions that make an interface work. For now, it should be treated as a complement for visual exploration, not a replacement for the structured methods of UX.
In advertising, developments are moving rapidly. The aim is commercial-grade production: generating substantial visual resources while continuously adapting them to different audiences and contexts, and identifying the most effective creative direction for high-quality work.
AIGC allows creative professionals to envision outcomes more concretely. By grounding outputs in conceptual briefs, verbal descriptions, and intended mood, teams can discuss layouts as tangible artifacts. This helps art directors, copywriters, and strategists explore a wide range of visual possibilities, moving beyond abstract discussion toward concrete imagery.
This is highly beneficial for pitching and presentations, as businesses constantly seek diverse alternatives for their clients. AIGC enhances perception and may facilitate greater creative output. It changes not only the speed of creation but also the way commercial designs are argued and presented.
In promotion and marketing, a range of AIGC tools are often used to deliver the same core message across different atmospheres, settings, compositions, and sensory cues. Social media and online promotion play a crucial role here, as companies typically need a high volume of visual assets that maintain a cohesive look while offering subtle variations.
However, this power raises ethical concerns. Synthetic visuals can mislead public opinion and manipulate emotional responses. AIGC can produce deceptive content, blurring the boundary between documented reality and algorithmic fabrication. This necessitates scrutiny of AI-generated advertisements and a clear line between imagination and reality.
The same dynamic is evident in illustration and concept art, which involve iterative processes, atmosphere, symbolism, and speculation. AIGC is accelerating all of these.
For example, AIGC can help convey a central idea. Editorial illustrators and visual storytellers can use generated drafts to experiment with symbolism, color, mood, and composition before finalizing their work. In such cases, AIGC serves not as a replacement but as a support tool.
For the creative aspects of games, cinema, and entertainment, the role of AIGC is similarly significant. Teams require extensive exploration in areas such as character modeling, costume design, and object selection. Throughout the preparation phase, AIGC generates these assets rapidly, allowing experts to select preferred options from the set.
At the same time, this is an area where people are most reluctant to adopt generative AI. Many artists and creative professionals worry that AIGC might take over tasks requiring substantial expertise. More likely, however, is job transformation rather than obsolescence: while broad ideation and variation may be mechanizable, work that requires systemic coherence, effective narration, correct iconography, and refined judgment will continue to require human participation.
Another pressing concern is the standardization of aesthetics. Much of what is produced begins to look remarkably similar — cinematic lighting, prettified renders, typical fantasy or lifestyle visuals. Over-reliance on such patterns risks narrowing the diversity of visual heritage, leaving work feeling unremarkable. Therefore, artists and producers must harness AIGC’s efficiency without slipping into homogenized uniformity.
The main impact of AIGC on visual aesthetics is likely to be the reorganization of workflow sequences. Traditionally, images evolve from sketches through step-by-step refinement toward completion, which means only a handful of directions can ever be explored: given the time and expertise each iteration requires, most possibilities are never visualized at all.
AIGC lowers the cost of visualization. Creatives can express their ideas more quickly and consider many options. The process transforms concepts into tangible visuals. Rather than adhering to a single path, they explore widely divergent variations in organization, atmosphere, and symbolism.
Designers may also become more imaginative when working with open-ended tasks. However, the nature of creative output is also changing. Greater quantity does not equate to quality; much of the output may be only partially original or incoherent. Hence, we need to consider evaluation criteria during production. The emphasis shifts from making to selecting.
To understand the innovation of AIGC, it is worth considering its role as externalized ideation. Oppenlaender argues that text-to-image systems can support creativity by helping users surface visual possibilities that might not have been consciously available at the beginning of a task (Oppenlaender, 2022). This does not mean the AI is creative in the same way human designers are.
Creative processes have always relied on drafting, documentation, mood boards, visual samples, and collaborative critiques. With AIGC, a unique “generate” operation allows designers to produce a much larger range of possible visuals stemming from their initial inspiration, rather than searching for a specific reference. Having a mixed or ambiguous aesthetic goal can still be productive.
AIGC does not aim at correctness; its objective is simply to produce a plausible response. Yet in doing so, it may yield striking compositions, unexpected color associations, or half-formed results that serve as tentative starting points. Humans bring the intelligence to the process, engaging in dialogue with these synthetic ambiguities.
The process remains effective only if the designer maintains a critical perspective. Without selection, novelty is merely noise. AIGC multiplies possibilities but cannot determine which of them is better.
With the emergence of AIGC, prompt engineering has become a new competence in creative fields. In graphic design, prompting means working out the phrases and structures a system needs in order to produce particular kinds of ideas and compositions. This involves not only wording but also perspective, emotion, medium, material choices, styles, and priorities.
Creatives need a range of capabilities: understanding how visual choices affect outcomes, how wording steers a model's biases and emphases, and how to express core ideas within constraints through iterative evaluation. As a result, prompting becomes a highly dynamic form of creativity.
But prompts are only the entry point: many pipelines also require reference frames, guidance settings, benchmarks, base graphics, first drafts, masks, and successive composition refinements. Real creative production means integrating inputs from diverse sources, so prompt engineering is merely one part of a larger human-machine workflow. It is not a magic route to finished visual assets.
With the increase in dependence on AIGC, a significant concern arises: how do we determine who made AI art? Design authorship cannot be determined simply at the moment of image creation. The issue involves stating the problem, setting objectives, choosing among alternatives, strengthening weaknesses, and managing final distribution.
AIGC may produce unpredictable, entirely fabricated visuals, but the model itself has no intention and bears no accountability. It simply outputs what its training allows: generative capacity, not human creativity.
It is important to recognize that even a well-designed artifact entails accountability. Even where using generated images is permissible, the person who publishes them may face legal or professional consequences. Responsibility for distributed imagery has always rested with human decision-makers, and AIGC does not change that.
But AIGC redefines jobs. Routine visualization will become less common as paid work, while conceptual organization, creative direction, evaluation, and management will grow in importance. This changes how design is taught and learned, especially for skills once acquired through extensive manual exploration. In short, it changes more than tools: it changes professions and which skills are dominant.
The main legal issue concerning AIGC is copyright. Many existing models are trained on very large amounts of copyrighted artistic and visual material, and the legal status of this practice remains a subject of ongoing global debate. Generated outputs, moreover, may closely echo the stylistic cadence or aesthetic spirit of particular existing works.
From a creative perspective, this is a practical concern. Advertising requires clear licensing boundaries to ensure accountability. Any concealment of a graphic’s origin during commercial promotion, distribution, or brand interaction poses legal risks. Moreover, the lack of a clear legal framework directly affects artists’ ability to protect their innovative works.
AIGC also raises ethical concerns. Generative models can mimic the visual aesthetics of existing artists, cultural heritage, and other sources. Thus, users can appropriate the style and principles of a creative work without knowing its origin or owner.
Creation is intrinsically linked to legacy and cultural context. If a visual tradition is reduced to a mere prompt, its output becomes a rootless derivative. The situation is more concerning when marginalized visual languages are appropriated by dominant cultures through industrialized generation without consent or authorization.
Moreover, because generative models learn from vast amounts of online imagery, they inherit its biases. Cetinic and She observe that AI systems reproduce visual and cultural patterns from their training data even without explicit direction (Cetinic & She, 2022). Outputs can therefore perpetuate stereotypes concerning gender, social status, ethnicity, occupation, and other subjects.
Bias also shapes aesthetic defaults: these models often privilege Eurocentric ideas of what counts as luxurious, elegant, urbane, or innovative, gradually reinforcing a narrow point of view.
Such biases also produce aesthetic uniformity. AIGC creates many images, but they often share the same texture, illumination, layout, and mood, so greater quantity does not imply a wider variety of visual heritage and ideas. We must prevent machine-made sameness from diluting the diversity of visual culture.
The impact of AIGC on design labor remains unclear, yet it is significant enough to warrant attention. Certain design tasks can probably be carried out faster and more cheaply with machine assistance, making some kinds of foundational production work less important. At the same time, it may place greater emphasis on more developed competencies, including ideation, systemic comprehension, thematization, and aesthetic judgment.
Another problem is deskilling. Practitioners may lose their sensibilities and techniques if they depend entirely on AI for ideas or compositions. They may question whether their manual skills remain relevant: can craft and design labor still produce lasting forms of knowledge in an era when things are made faster than ever before?
The challenges outlined above — from copyright ambiguity and style imitation to bias, aesthetic homogenization, and labor displacement — point to a fundamental transformation that extends beyond individual technical issues. As Epstein et al. argue, understanding the full impact of generative AI on creativity and society requires new interdisciplinary inquiry that spans culture, economics, law, algorithms, and the evolving interaction between technology and human creators (Epstein et al., 2023). The following section explores strategic responses across these interconnected dimensions.
As producers, we must act with honesty and transparency when treating AIGC as a vast source of visual assets. This requires workflows that keep generated assets consistent across iterations, with refinements applied deliberately over time. It also becomes essential to have better knowledge of training dataset origins, licenses, and content screening.
The question for creatives is therefore not only whether a system produces appealing images, but whether it holds up when modification, documentation, and a degree of responsibility are required.
In matters of regulation, employment, and professional standards, governance must adopt structured frameworks. These should establish rules on ethically sourced training data, disclosure when synthetic output is sold, labeling of real versus generated imagery where needed, and review of sensitive material. Departments, publishing houses, scholarly bodies, and creative agencies, for instance, should have internal guidelines for using generative tools.
Without such guidelines, AIGC would remain technologically advanced but lack ethical grounding and systemic integrity.
Educational frameworks must evolve while maintaining rigor. Students need to understand the generative paradigm: its biases, its limitations, and the need to rigorously evaluate whatever it produces. Fluency in composition, typography, visual persuasion, storytelling, and investigative design remains necessary.
Teaching this is much more than demonstrating a trick. Rather than simply telling, educators should show how design takes shape through both manual, tool-based actions and conceptual, generative processes.
Although this discussion has focused on image creation, AIGC is likely to develop rapidly toward video, 3D, and interactive multimodal design. For future creative practice, these technologies could blur the distinctions between static imagery, motion, audio, and environments, while making it possible to generate practically any amount of any medium.
This means the strategic dimension of design will become significantly more important. Instead of crafting static objects, designers will create continuously evolving interaction environments across diverse media and technologies. Therefore, the ability to convey meaning, maintain internal coherence, and analytically appraise across different domains could become our most important professional abilities.
AIGC is becoming an important catalyst in the transformation of visual aesthetics. It is more than the automation of graphics; it encompasses ideas, concepts, structures, methods, and the translation of language into form. It also supports rapid iteration and thorough exploration of many different styles, offering new perspectives for professionals in branding, marketing, interface design, and illustration.
In contrast, it also reveals important issues: intellectual property disputes, stylistic imitations, biased portrayals, and aesthetic homogenization. All these issues demonstrate that generative AI is not neutral in content production. Rather, it is a socio-technical system that transforms the nature and production of visual culture.
The main issue is not whether AI can make images. Rather, it is how creators and organizations choose to use them. Human creativity is still needed, not because AI cannot produce coherent outputs, but because design involves intention, context, and consequence. Thus, AIGC cannot take the place of the designer; instead, it relocates where the designer's judgment is exercised.
These developments, combined with deeper ethical and pedagogical reflection, will give rise to more robust professional roles in the future. Proper use of AIGC can enhance creative output. Conversely, misuse may reduce visual diversity while increasing media saturation. Looking forward, the goal is not merely technological improvement but the active construction of human-AI collaboration that balances aesthetics and ethics.
[1] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27, 2672−2680.
[2] Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840−6851.
[3] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10684−10695.
[4] Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., ... & Sutskever, I. (2021). Learning transferable visual models from natural language supervision. Proceedings of the 38th International Conference on Machine Learning, 8748−8763.
[5] Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125.
[6] Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2414−2423.
[7] Zhang, L., Rao, A., & Agrawala, M. (2023). Adding conditional control to text-to-image diffusion models. Proceedings of the IEEE/CVF International Conference on Computer Vision, 3836−3847.
[8] Oppenlaender, J. (2022). The creativity of text-to-image generation. Proceedings of the 25th International Academic Mindtrek Conference, 192−202.
[9] Cetinic, E., & She, J. (2022). Understanding and creating art with AI: Review and outlook. ACM Transactions on Multimedia Computing, Communications, and Applications, 18(2), 66.
[10] Epstein, Z., Hertzmann, A., Herman, L., Leach, N., Maher, M. L., Resnick, M., ... & Schroeder, J. (2023). Art and the science of generative AI. Science, 380(6650), 1110−1111.