Essence of Creativity: Future of Creative Work

Exploring the intersection of AI-generated content and human creativity. Analysis of creative workflows, multimodal interactions, and the future of content creation in the AI era.

Is it creative to screenshot someone else’s video and caption it with other people’s comments? This seemingly simple question hits every creator making rent from content: if AI can remix, analyze, and generate at scale, what’s left that’s genuinely yours?

Through building AI content tools, I’ve discovered the real opportunity isn’t AI replacing creators—it’s AI helping creators understand the massive amounts of content data around them to find genuinely fresh angles. Think of it as having a research team that can analyze millions of posts, comments, and engagement patterns in seconds, then surface the insights that lead to truly original work.

TikTok Video Interface

TikTok's content creation interface: Where creators combine video, audio, and engagement elements to build viral content

Note (Nov 4th, 2025): I was able to speak with the head of strategy for MrBeast, and (perhaps unsurprisingly) this is exactly what they do and what makes their content so successful: they look for outliers in massive amounts of data, finding videos that genuinely spark viewers’ interest, even in a different domain. Think of a Minecraft simulation with 100 players on each island (one male, one female) being recreated with actual human participants.

TikTok Comment Analysis

Comment analytics revealing audience engagement patterns and content resonance across different demographics

Content Component Structure Table

Content structure breakdown: How platforms organize multimodal content across video, interaction data, and comments

As this breakdown shows, what usually gets abstracted away into tabular blobs of text actually carries rich information about how content is presented and interacted with. The visual elements and on-screen text grab a user’s attention, while the audio provides the voiceover narrating the message (or sometimes just relevant background music). The title and description carry detailed information about the video, while the comments section, often overlooked in current processing workflows, is a goldmine of user interaction and feedback.

The primary comments act as tier 1 opinions, with replies and likes serving as interaction trackers: if users share an opinion, they usually hit the like button rather than posting the same thing again. By linking the content of the video to actual interactions (both tier 1 and tier 2), we get audience polling that is timelier and richer than what surveys, or really any other collection method, could deliver in the same window.
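To make that concrete, here’s a minimal sketch of how tier 1 opinions and tier 2 interactions could be rolled up into a weighted poll; the field names and labels are hypothetical, and in practice the opinion labels would come from a classifier:

```python
from collections import Counter

def poll_opinions(comments):
    """Roll tier 1 comments and their tier 2 interactions into a weighted
    poll: a like usually stands in for posting the same opinion again."""
    tally = Counter()
    for c in comments:
        # weight = the commenter plus everyone who liked or replied
        weight = 1 + c["likes"] + len(c["replies"])
        tally[c["opinion"]] += weight
    return tally.most_common()

# Hypothetical pre-labeled comments; labels would come from a classifier.
comments = [
    {"opinion": "loves_the_hook", "likes": 240, "replies": ["same!", "fr"]},
    {"opinion": "audio_too_loud", "likes": 35, "replies": []},
]
print(poll_opinions(comments))  # [('loves_the_hook', 243), ('audio_too_loud', 36)]
```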

AI isn’t becoming creative—it’s becoming the ultimate creative research assistant. While generative AI struggles to produce truly fresh perspectives, it excels at helping us understand information and generate new insights that lead to genuinely creative work.

What Constitutes Creative Work?

To understand AI’s role in creativity, we need to establish clear boundaries around what constitutes creative work. Consider the common practice of taking screenshots from viral videos and adding captions from popular comments. While this involves some editing, it’s essentially sophisticated copying that accelerates content diffusion while reducing the economic returns of original creation—what economist Schumpeter called “creative destruction” in reverse.

Understanding how platforms structure multimodal content helps us see the complexity involved in creative work.

Real creativity is about choosing a unique perspective. Content with contrast or conflict naturally captures our attention—think of viral TikToks that expose workplace absurdities or Twitter threads that challenge conventional wisdom. But thoughtful, empathetic content is equally creative: the YouTube essayist who helps you understand your own anxiety, or the LinkedIn post that perfectly articulates what you’ve been feeling about remote work.

Here’s how I think about the creative ecosystem: there’s production (generating new content) and diffusion (deriving from or spreading existing content). AI’s sweet spot isn’t in either category alone—it’s in helping us understand massive amounts of information to find genuinely new angles and insights.

Right now, creating professional-quality video requires hours of shooting, editing, and post-production. As AI gets better at handling video, audio, and text together, we’re heading toward a world where that same video could be produced in minutes. This changes everything about the creative economy, making tools that help you find new inspiration increasingly valuable. Through this creative assistance, we can achieve two main effects:

Creative Assistance Effects

Two primary effects of AI creative assistance: Inspiration acquisition and content derivation

  1. Inspiration acquisition: Accelerating original content production by collapsing the draft → iterate loop
  2. Content derivation: Accelerating the diffusion of quality creative work across formats and channels

Content Understanding for Enhanced Generation

How can we make language models produce outputs that meet our expectations? This challenge breaks down into two distinct problems: (1) we don’t know what our ideal output looks like, and (2) we know what we want, but the language model doesn’t understand us.

Most teams focus on the second problem through a toolkit of techniques: model alignment (training AI to follow human preferences), prompting (crafting better instructions), few-shot learning (training AI with just a few examples), retrieval-augmented generation or RAG (helping AI access specific databases), fine-tuning (customizing AI for specific tasks), and memory systems (helping AI remember context across conversations). But companies are rapidly commoditizing these approaches—many solutions are open-sourced, which explains why so many generative products deliver roughly comparable results.
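As a rough illustration of how a few of those techniques compose in practice, here’s a minimal sketch of assembling few-shot examples and retrieved context into a single prompt; the function and data shapes are placeholders, not any particular product’s API:

```python
def build_prompt(task, few_shot_examples, retrieved_docs):
    """Compose prompting, few-shot learning, and RAG into one request.
    Alignment, fine-tuning, and memory live elsewhere: in the model
    weights and in whatever populates retrieved_docs."""
    shots = "\n\n".join(
        f"Input: {ex['input']}\nOutput: {ex['output']}"
        for ex in few_shot_examples
    )
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    return (
        f"Relevant context:\n{context}\n\n"
        f"Examples:\n{shots}\n\n"
        f"Input: {task}\nOutput:"
    )

print(build_prompt(
    "write a hook for a desk-stretch video",
    [{"input": "hook for meal prep", "output": "Five lunches, one pan, zero thinking."}],
    ["Audience: office workers, 25-34", "Top comment theme: 'my back hurts'"],
))
```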

AI Workflow Types and Capabilities

Different types of AI workflows: From basic generation to iterative collaboration

The real differentiation lies in how we adapt engineering and data processing to specific business scenarios—and more importantly, in solving the first problem: helping users understand what they actually want to create.

August 2025: I was recently asked what the role of a PM is in building technical products, especially given how much of the model-side work is handled by technical teams. The short answer, I think, is understanding the product and making sense of data in relation to how it is communicated and what it expresses. Anyone can deal with numbers; it’s harder to actually understand the people those numbers represent. In the concrete case of multimodal content understanding, that means preserving the very granularity that makes this data so valuable, and proposing technical solutions: modality alignment, weighted clustering, agent triage, content rewriting, and so on.

Brand Understanding and AI Integration

Brand Intelligence Platform

Brand intelligence platform: AI systems learning to understand brand voice, visual identity, and content guidelines for contextual generation

The evolution toward brand-aware AI represents a significant shift in content generation capabilities. Instead of producing generic output that requires extensive human editing, these systems can understand context—what works for a luxury brand versus a startup, what tone resonates with different demographics, what visual styles align with brand guidelines.

November 2025: I saw the recent release of Google Pomelli, and I think it’s a great example of how a general-purpose technology moves from research and public beta to a grounded, applied case that delivers real time savings and value. Like Typeface, it essentially creates a brand kit that frees users from prompting repeatedly, and from often not knowing how to describe their style effectively.

AI Brand Training Interface

AI brand training interface: How multimodal brand kits teach AI systems to generate content that aligns with specific brand requirements

The training process involves feeding AI systems examples of successful brand content across multiple modalities—text, images, videos, and audio. The system learns not just what the brand says, but how it says it, what visual elements it uses, and what emotional tone it maintains. This creates AI that can generate content that feels authentically on-brand without constant human oversight.

Returning to the first problem—“I don’t know what output I want”—this stems from a lack of content understanding. Good script writing requires more than just hooks (“You won’t believe what happens next”), unique selling propositions (USPs), and calls-to-action (CTAs)—it needs a clear angle: content that resonates with the audience, fits the context, and achieves its purpose.

Some products are building brand kits or audience profiles to guide more specific content generation through manually defined style rules or user personas. While these types of configurations will probably become standard, the real breakthrough would be connecting insight data with generation without requiring manual setup every time.

Understanding User Needs

Looking at the creative technology landscape, every category—ad aggregation, competitor tracking, brand insights, performance analysis, content generation—has 3-4 companies offering basically the same thing. The data products feel traditional, while the AI products often just add ChatGPT integrations to existing workflows.

The real opportunity lies in acquiring more granular data and creating smoother interactions. Instead of isolated tools, imagine connecting the entire creative production process where you can participate and adjust at each stage—from initial research through final publication.

Here’s a simple way to think about product value: user value = new experience - old experience - replacement cost. Most products built on foundational language models with minor tweaks deliver limited incremental value. Users still need to craft personalized prompts, and outputs almost always require multiple rounds of editing before they’re usable.

So how do we increase incremental value? The answer isn’t just better AI—it’s better workflows.

User-Friendly Workflows

Currently, creators mostly call upon individual capabilities or data, but single capabilities are insufficient for full-process script/video generation. Building workflows can help users connect various AI capabilities, reducing friction between tool switches.

The concept of “workflows, not skills” addresses user needs: many users currently need 5-10 AI capabilities to complete their creative work, with most capabilities being disconnected and requiring frequent switching. By establishing a clear workflow, users can more efficiently call upon relevant tools to complete their creative work.

I used to think that simply connecting AI capabilities constitutes a workflow, but that’s like saying a toolbox is the same as knowing how to build a house. What we call “Language UI” is actually “Prompt UI”—it differs from true language interaction by missing the context and shared understanding present in human conversation.

Think about the difference: you can tell a colleague “make this more engaging” and they understand your brand, audience, and context. With ChatGPT, you need to write a novel-length prompt every single time explaining who you are, what you’re building, and what “engaging” means in your specific context.

The future workflow tools will have human-like elements—they’ll ask follow-up questions, remember previous conversations, and understand your specific goals without you having to explain everything from scratch. Current prompting is probably transitional; eventually, we’ll eliminate the need for context-heavy prompts by building AI that understands your context and generates appropriate guidance automatically.

Multimodal Interaction and Content Ecosystem

Finally, let’s discuss modality. Given the characteristics of different modalities (text - easily editable, images - non-linear, video - linear), different scenarios should use different modalities. The same user may need different interactions in different contexts.

Understanding Content Through Data Visualization

The first layer of multimodal content understanding goes beyond traditional analytics. Rather than just tracking views and likes, the most interesting insights come from clustering comments and opinion spread by category—product feedback, creator engagement, emotional responses. This granular analysis reveals patterns in audience sentiment and helps creators understand not just what performs well, but why it resonates with different audience segments.
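A minimal sketch of that kind of comment clustering, assuming an off-the-shelf sentence-embedding model and scikit-learn (the model choice and example comments are illustrative, not a production pipeline):

```python
# pip install sentence-transformers scikit-learn
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("all-MiniLM-L6-v2")
comments = [
    "the editing on this is insane",
    "where can I buy the hoodie?",
    "this made me tear up honestly",
    "link to the product please",
]
embeddings = model.encode(comments)

# Group comments into k opinion clusters; in practice k is tuned, and the
# clusters are then named (product feedback, emotional response, ...).
kmeans = KMeans(n_clusters=2, n_init="auto", random_state=0).fit(embeddings)
for comment, label in zip(comments, kmeans.labels_):
    print(label, comment)
```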

Vector Visualization Analysis

Vector visualization analysis: AI-powered semantic mapping revealing hidden relationships between content themes and audience preferences

Content Analytics Dashboard

Brand intelligence extraction: Analyzing fine-grained insights from audience feedback and engagement patterns

But the real magic happens in semantic analysis. Vector embeddings can reveal hidden relationships between content themes that humans might miss. For example, videos about “productivity tips” might cluster surprisingly close to “cooking tutorials” because both satisfy the same underlying need for life optimization. This kind of insight helps creators find unexpected angles and untapped niches.
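A toy version of that cross-theme check might look like the following; the themes and the interpretation are illustrative, and real analysis would compare many themes at once:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
a = model.encode("productivity tips", convert_to_tensor=True)
b = model.encode("cooking tutorials", convert_to_tensor=True)

# A surprisingly high similarity between nominally unrelated themes hints
# at a shared underlying audience need (here: life optimization).
print(util.cos_sim(a, b).item())
```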

Content Performance Metrics

Content performance metrics: Comprehensive analysis tracking engagement patterns, reach optimization, and conversion effectiveness across content types

The final piece is comprehensive performance analysis that connects creative decisions to business outcomes. This isn’t just about vanity metrics—it’s about understanding which content patterns lead to sustainable audience growth, higher conversion rates, and long-term creator success. When you can see these patterns clearly, you can make more informed creative decisions.

Switching between modal forms (long/short/mixed) and modal types (text/image/audio/video) will become easier, essentially giving the same content applicability across different scenarios. Users aren’t just people; they’re collections of needs. I might read text at the office because of setting constraints, watch videos while waiting in line with nothing to do, and listen to audio while driving or commuting. The same content may need three modalities (text/video/audio) connected based on the scenario. This can be refined further: people speed up reading or listening for higher information intake. Finding ways to adapt the same content to different scenarios without increasing creation costs is another interesting challenge.

Case Study: Voice Synthesis

Take voice synthesis as an example. Technically, this technology is already quite mature—you can clone a voice with just a few minutes of audio. Yet when most people think about AI voice cloning, they imagine phone scams. Sure, there are fun projects like AI David Attenborough narrating random videos, or OpenAI’s GPT-4o launch event that briefly simulated Samantha’s voice from “Her.” But the most creative use I’ve seen comes from short video creators.

I recently discovered a creator called “Yi Tou Jue Lv” who makes derivative content based on “In the Name of the People” (a 2017 Chinese political drama). Their videos consistently get 500K+ views by doing something brilliant: they take original footage but replace all the narration with AI-synthesized character voices speaking internal monologues and psychological commentary. The result feels like getting inside the characters’ heads in a way the original show never offered.

What makes this work is the creator’s deep understanding of the characters combined with AI’s ability to generate consistent, high-quality voice synthesis. They’re not just copying—they’re creating a completely new layer of interpretation that audiences can’t get anywhere else.

Contrasting Audio & Text

It’s fascinating how differently our brains process audio and text. When we read, we’re essentially interacting with a graphical user interface—scanning, jumping between sections, processing information at our own pace. We’ve evolved sophisticated tools for text: highlighting, bookmarking, section headers, and search functions. Yet despite these advantages, text can feel less engaging than a good conversation.

Podcast Interface WeChat Audio Saving Interface

Audio interfaces evolution: From podcast consumption (left) to social audio saving (right)—platforms adapting to our increasingly audio-first content behaviors and the need to preserve valuable conversations

Speaking, in contrast, is inherently linear and social. There’s something about the human voice that keeps us present—the subtle shifts in tone, the natural pauses, the back-and-forth rhythm. It’s why we can stay engaged in a podcast while walking (and multitask), yet reading typically demands our full attention.

This contrast reveals something deeper about how we process information. Text excels at conveying complex ideas—we can revisit difficult passages, cross-reference concepts, and process at our own speed. Audio shines in maintaining engagement and conveying emotion, even if the content itself is relatively simple. Perhaps the future lies not in choosing between these mediums, but in finding ways to combine their strengths. Imagine an interface that preserves the natural flow of conversation while adding the structural advantages of text—where you could navigate both temporally and conceptually, maintaining both engagement and comprehension.

Conclusion

Death of the Author by Roland Barthes

Roland Barthes' "Death of the Author": When content is created, interpretation rights transfer to the audience

As Roland Barthes suggested with “The Death of the Author,” once content is created, interpretation rights transfer to the audience. We see this everywhere today—YouTube channels that analyze every Marvel movie, TikTok accounts that remix old TV shows, podcast networks that dissect every episode of popular series.

With improvements in AI voice synthesis, character generation, and content manipulation, we’re approaching a future where derivative works based on original intellectual properties can achieve professional quality while satisfying different interpretations and imaginations. The “Yi Tou Jue Lv” example I mentioned earlier is just the beginning.

These perspectives might all exist in the original work, but each remix offers a different angle, providing audiences with unique experiences. There’s still massive amounts of content that people want to see but isn’t available on any platform. Maybe creativity’s next evolution isn’t about generating entirely new content—it’s about intelligently remixing and reinterpreting what already exists to better satisfy what audiences actually want.

Final Thoughts

While generative AI capabilities evolve rapidly, human nature changes slowly. We overestimate technology’s short-term creative impact (AI won’t replace human creativity next year), but underestimate how fundamentally it will change creative workflows (it’s already happening in ways we’re only beginning to understand).

Making probabilistic models truly creative remains challenging yet fascinating work. The future lies not in AI replacing human creativity, but in building systems that amplify our ability to understand, synthesize, and create meaning from the infinite streams of content around us. That’s the creative challenge I’ll continue working on.


Appendix

This article was originally developed as a presentation and shared internally with TikTok team members during my time there. The content has been adapted for public publication and adjusted to remove potentially sensitive information while preserving the core insights about AI and creativity.

All views expressed in this article are my own and do not represent the official positions or strategies of TikTok or any other organization.

Multi-modal Creative Ad Generation

Development of TikTok Symphony Assistant - an AI-powered creative tool for generating ad scripts and video content. Agentic workflows and interface optimization for automated creative generation.

The advertising industry’s AI tools problem isn’t about generation quality—it’s about workflow integration. Most creative AI products today offer impressive individual capabilities but fail catastrophically when creators try to chain them together into actual work processes. A typical ad campaign might require juggling five to ten different AI tools for concept development, script writing, visual generation, voice synthesis, and editing, with creators constantly context-switching between platforms that don’t understand each other. The result? AI tools that promise ten-fold efficiency but deliver ten-fold frustration instead.

During my internship building TikTok Symphony Assistant, I learned why the future of creative AI isn’t about better models, but about better agent workflows that understand how creativity actually happens. The challenge isn’t technical—it’s cultural and systemic. Professional creators need to feel like they’re directing the AI, not being replaced by it. This means building systems where AI handles the tedious execution while creators focus on strategy, brand voice, and creative direction. The goal is augmentation that preserves creative agency rather than automation that eliminates it.

The TikTok Symphony Assistant represents a significant step toward solving this integration problem, leveraging sophisticated agentic workflows to streamline creative processes from ideation through execution. Rather than offering another standalone tool, Symphony Assistant demonstrates how AI can enhance existing creative workflows by understanding the context and continuity that professional campaigns require. The platform is accessible at https://ads.tiktok.com/business/copilot/standalone and serves as a practical case study for how agent-based systems can transform traditional advertising workflows.

Credits: TikTok Creative Team

Building Agentic Workflows

From LLMs to Agents

Leading AI companies now recognize that the transition from large language models to agent-based systems represents a fundamental shift in how we approach complex creative tasks. Traditional LLMs excel at generating individual pieces of content but struggle with the multi-step coordination that professional creative work demands. Agent systems solve this by introducing AI that can break down complex creative briefs, plan multi-step campaigns, and automatically route tasks to specialized tools while maintaining context throughout the entire process.

The key insight driving this evolution is that large language models deliver not just tools, but actual work results at specific stages of creative processes. Rather than asking creators to become prompt engineers, effective agent systems understand creative workflows and can execute specific roles within them. Application deployment becomes a matter of providing models with specific contexts and clear behavioral standards that align with professional creative workflows. The understanding and reasoning capabilities of LLMs can be applied to various creative scenarios, but success requires packaging general capabilities as abilities needed for specific positions or processes, overlaying domain expertise with general intelligence.

However, the promise of ten-fold efficiency gains remains largely unfulfilled because most AI tools still require creators to adapt their workflows rather than the AI adapting to how creative work actually happens. These workflows aren’t merely a presentation of parallel capabilities running in isolation, but rather seamless integrations where creators can jump in at any step to provide feedback, make adjustments, or take creative control. The real challenge isn’t building smarter AI—it’s building AI that preserves creative agency while eliminating the tedious, repetitive tasks that consume most creators’ time.

To understand what effective creative AI workflows look like in practice, let’s examine Typeface—a billion-dollar startup that’s closest to solving this integration problem. Their approach reveals both the promise and the remaining challenges in building AI that actually enhances creative work rather than replacing it.

The fundamental insight that drove our approach at TikTok was recognizing that successful AI applications require a workflow perspective that considers the entire creative process rather than optimizing individual tasks in isolation. Instead of asking “How can AI help with script writing?” we asked “How can AI understand the complete journey from creative brief to final campaign delivery?” This shift in thinking leads to very different product decisions. Rather than building another chatbot that generates scripts, we focused on building an intelligent coordinator that understands how scripts fit into broader campaigns, how they need to align with brand guidelines, and how they connect to visual concepts and distribution strategies.

The key questions that guided our Symphony Assistant development were practical and workflow-centered: What parts of daily creative workflows can be effectively enhanced by AI without disrupting the creative process? If AI systems need to process enterprise creative data, what value does this data provide at different stages of the creative business? Where does AI assistance sit most naturally in the creative value chain? In current creative operational models, which specific handoffs and transitions could be most effectively streamlined with intelligent automation? These questions helped us move beyond generic AI capabilities toward purpose-built creative intelligence.

Industry Consensus: Task-Specific Models and Architecture Evolution

Leading AI companies have converged on several key architectural approaches that directly informed our work on TikTok Symphony Assistant. Companies like Anthropic, AI21 Labs, and others now prioritize task-specific models and Mixture of Experts (MoE) architectures, a significant evolution from general-purpose language models. This shift reflects the recognition that creative workflows benefit more from specialized intelligence than from generalized capability.

Think of Mixture of Experts like a creative agency where different specialists handle different aspects of a campaign. Instead of one generalist AI doing everything poorly, you have separate “experts” for script writing, visual concepts, brand voice consistency, and audience targeting—all coordinated by an intelligent router that knows which expert to consult for each task. This approach dramatically improves both the quality of individual outputs and the coherence of the overall campaign, while reducing the computational resources required compared to scaling a single massive model.

For creative applications like Symphony Assistant, MoE architecture enables the system to develop deep expertise in different aspects of content creation while maintaining overall campaign coherence. Rather than asking a general-purpose model to switch context constantly between writing scripts and understanding visual concepts, we route different creative challenges to models specifically trained for those domains.

Our implementation assigns input creative data to different expert networks based on the creative challenge type. Each expert returns specialized outputs optimized for their domain—audience-appropriate script writing, brand-compliant visual concepts, or platform-specific content adaptations. The final output emerges as a coordinated combination that ensures both specialization and coherence throughout campaign development.

The key innovation lies in organizing expert networks around actual creative roles rather than technical divisions. For Symphony Assistant, we created experts that mirror how creative teams organize: audience psychology and messaging strategy, brand voice and tone consistency, platform-specific content requirements, and visual-text integration. This approach required training each expert on carefully curated datasets representing high-quality examples of their creative specialty, allowing deep domain expertise rather than shallow general competency.
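To illustrate the idea (and only the idea: this is a toy dispatcher, not the Symphony Assistant implementation), here’s a sketch of routing tasks to role-based experts while passing shared campaign context; the expert names, routing rule, and handler bodies are hypothetical stand-ins for specialized models:

```python
# Toy role-based expert routing; every name here is a hypothetical stand-in.
def script_expert(brief, context):
    return f"script draft for {brief!r} in {context['brand_tone']} tone"

def visual_expert(brief, context):
    return f"storyboard concept for {brief!r} using {context['palette']}"

EXPERTS = {"script": script_expert, "visual": visual_expert}

def route(task):
    """In a real MoE, a learned gating network scores every expert;
    here we route on an explicit creative-role tag."""
    return EXPERTS[task["creative_role"]]

def run_step(task, campaign_context):
    # The shared campaign context travels with every call, so per-expert
    # specialization doesn't break overall coherence.
    return route(task)(task["brief"], campaign_context)

context = {"brand_tone": "dry, self-aware", "palette": "neon on black"}
print(run_step({"creative_role": "script", "brief": "15s sports drink hook"}, context))
print(run_step({"creative_role": "visual", "brief": "15s sports drink hook"}, context))
```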

Long Context Windows Enable Sophisticated Routing

The development of longer context windows, exemplified by Gemini 1.5’s one million token capacity, has fundamentally changed what’s possible in creative AI applications. Extended context windows solve one of the most persistent problems in creative work: maintaining consistency and coherence across complex, multi-faceted campaigns. Jeff Dean’s presentation at the Gemini 1.5 Hackathon at AGI House highlighted how these extended context windows enable more sophisticated in-context learning and more effective Mixture of Experts architectures, allowing AI systems to understand not just individual creative tasks but the broader strategic context that informs every creative decision.

For creative applications like Symphony Assistant, longer context windows mean the system can maintain awareness of entire creative briefs, comprehensive brand guidelines, detailed audience research, and complete campaign contexts throughout the generation process. This eliminates the frustrating experience of AI systems that “forget” crucial brand requirements or campaign objectives partway through content creation. Instead of forcing creators to constantly re-specify context, the system maintains a persistent understanding of the creative project’s goals, constraints, and requirements across every interaction.

Practical Implementation of AI Routing Systems

Real-world implementations of intelligent routing concepts can be seen in platforms like Writesonic, which uses GPT Router for intelligent model selection during content generation. The GPT Router system demonstrates how smooth coordination of multiple specialized models—including OpenAI’s GPT series, Anthropic’s Claude, Microsoft’s Azure models, and image generation models like DALL-E and Stable Diffusion—can dramatically speed up responses while ensuring reliability and consistency across different types of creative tasks.

This approach directly influenced our architecture decisions for Symphony Assistant, where different creative challenges benefit from different specialized models and different computational approaches. Script writing might route to a model optimized for conversational language and narrative structure, while visual concept development routes to models that understand visual composition and brand aesthetics. Platform optimization for TikTok versus LinkedIn requires entirely different understanding of audience behavior and content format requirements, so these tasks benefit from specialists trained on platform-specific data and success patterns.

How Agents Can Help Creators Achieve 10x Efficiency

The advertising and marketing industry represents one of the most promising applications for AI-driven workflow automation. Currently, creators typically juggle eight to ten different AI tools to produce a complete video campaign. This fragmented approach creates significant friction and context-switching overhead that negates many of the efficiency benefits AI should provide.

A typical video creation workflow demonstrates this challenge perfectly. Creators start with concept design in Midjourney, move to script and storyboard development in ChatGPT, generate visual assets using multiple image generation platforms, create video content through services like Runway or Pika, add dialogue and narration via ElevenLabs, incorporate sound effects and music from platforms like Suno, enhance video quality through Topaz Video, and finally handle subtitles and editing in CapCut or similar tools. Each transition requires re-establishing context and manually ensuring consistency across platforms.

Improving Agent User Experience

Effective creative AI systems must address four fundamental user experience challenges that consistently emerge in professional creative workflows.

Personalized Memory & Style Customization becomes essential because adjusting generation style through prompts before each generation is both time-consuming and unpredictable. Professional creators need comprehensive generation rules that ensure consistent output quality without repeated manual adjustments. Typeface’s Brand Kit exemplifies this approach by allowing creators to establish persistent brand guidelines that inform every generation.

Rewind & Edit functionality addresses the reality that agent chaining accuracy decreases progressively through multi-step workflows. Human-in-the-loop processes allow creators to regenerate or fine-tune content at each step, ensuring final generation quality meets professional standards. Typeface’s Projects feature demonstrates this principle by including Magic Prompt assistance and seamless regeneration capabilities.

Choose from Variations recognizes that creators require options to make informed decisions about their content. Traditional generation processes force users to refresh entirely when dissatisfied with outputs, creating inefficiency. Providing multiple variations in single generations significantly improves user experience and creative flexibility.

Workflows, Not Skills addresses the core problem that creators currently need five to ten disconnected AI capabilities to complete advertising video creation. Most tools require frequent platform switching and context re-establishment. Effective creative AI systems present all capabilities at appropriate workflow stages, enabling efficient tool invocation without breaking creative flow.
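Here’s a minimal sketch of how the “Rewind & Edit” and “Choose from Variations” principles might look as a checkpointed pipeline; `generate()` is a stand-in for any model call, and the stage names are hypothetical:

```python
# generate() is a stand-in for any model call; stage names are hypothetical.
def generate(stage: str, upstream: str, variant: int) -> str:
    return f"{stage}(v{variant}) <- {upstream}"

class Pipeline:
    def __init__(self, stages: list[str]):
        self.stages = stages
        self.checkpoints: dict[str, str] = {}  # the chosen output per stage

    def run_stage(self, stage: str, n_variations: int = 3) -> list[str]:
        # "Choose from Variations": always offer options, never one take.
        upstream = self.checkpoints.get(self._prev(stage), "brief")
        return [generate(stage, upstream, i) for i in range(n_variations)]

    def choose(self, stage: str, output: str) -> None:
        # "Rewind & Edit": locking a checkpoint lets any downstream stage
        # be rerun without touching what came before it.
        self.checkpoints[stage] = output

    def _prev(self, stage: str) -> str:
        i = self.stages.index(stage)
        return self.stages[i - 1] if i else ""

pipe = Pipeline(["concept", "script", "storyboard"])
options = pipe.run_stage("concept")  # pick from variations...
pipe.choose("concept", options[0])   # ...checkpoint the winner...
print(pipe.run_stage("script"))      # ...and only then move downstream
```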

Typeface: Blueprint for Integrated Creative AI

Typeface serves as the closest current example of effective creative AI workflow integration, having raised $165 million to reach a $1 billion valuation by solving the fundamental coordination problem in creative AI tools.

The platform demonstrates successful implementation of the four essential user experience principles. Their Brand Kit system allows creators to establish comprehensive brand guidelines including image styles, color palettes, and brand voice analysis. The Projects interface provides a Google Doc-like experience where creators can seamlessly invoke different AI capabilities without losing context. Their Template Library offers workflow-specific starting points that understand creative intent rather than just generating generic content.

Most significantly, Typeface’s integration strategy eliminates cross-platform collaboration friction through native connections with Microsoft Dynamics 365, Salesforce Marketing, Google BigQuery, Google Workspace, and Microsoft Teams. This approach recognizes that effective creative AI must work within existing professional workflows rather than requiring creators to adopt entirely new platforms.

Summary

The evolution of AI-powered creative tools reveals a clear trajectory from isolated capabilities toward integrated workflow solutions. Current marketing-focused products successfully integrate multiple stages of the creation process, providing workflow-like experiences that reduce cross-platform collaboration friction through strategic external integrations. However, the most successful implementations go beyond simply chaining various capabilities together—they require thoughtful GUI process specifications that understand how creative work actually happens.

The key insight from analyzing platforms like Typeface, Symphony Assistant, and similar tools is that workflows must be designed around creative intent rather than technical capability. Effective creative AI systems understand the relationships between different creative decisions, maintain context across complex campaigns, and preserve creative agency while automating repetitive tasks. The future of creative AI lies not in building more powerful individual models, but in building more intelligent coordination systems that understand how different types of creative intelligence need to work together to produce professional-quality campaigns.


Added Nov 11th: Case Study of Pomelli - Progress and Limitations in Brand Kit Creation

Pomelli Landing Page Pomelli’s landing page showcases a visually appealing interface for generating on-brand content, positioned as a Google Labs experimental project for business content creation.

Pomelli represents a good step forward in creating a brand kit framework for creative content generation, demonstrating several advances in user experience design for AI-powered marketing tools. However, the platform reveals key limitations that highlight ongoing challenges in the space: limited context awareness and an overemphasis on generalization at the expense of domain-specific optimization.

Pomelli Campaign Interface The campaign creation interface emphasizes simplicity with a central prompt area and “Suggest Ideas” functionality, but the disclaimer “Pomelli can make mistakes, so double-check it” reveals underlying reliability concerns.

Strengths: Brand Identity Framework

Pomelli’s most significant contribution lies in its systematic approach to brand identity capture and application. The platform successfully implements several key principles we identified in our analysis of effective creative AI tools:

Personalized Memory & Style Customization: Like Typeface’s Brand Kit, Pomelli allows users to establish comprehensive brand guidelines that persist across content generation sessions. This addresses the fundamental user frustration of having to re-specify brand requirements for each creative task.

Workflow Integration: The platform demonstrates understanding that effective creative AI tools must integrate seamlessly into existing creative processes rather than requiring users to adapt their workflows to the tool’s limitations.

Pomelli Business DNA Setup The “Business DNA” setup process captures comprehensive brand information including logos, fonts, color palettes, taglines, and brand values, demonstrating a systematic approach to brand identity integration.

Despite these strengths, Pomelli doesn’t seem to support consistent generation of poster content. [Section to be continued]

The evolution from tools like Typeface through Pomelli to platforms like Symphony Assistant demonstrates the rapid maturation of creative AI, but also reveals that the most significant challenges lie not in generating content, but in generating the right content for specific contexts, audiences, and objectives.

Why Context is Everything in AI Content Generation

Why context is the critical factor in AI content quality - moving beyond single-prompt expectations to rich, contextual content generation that adapts to individual needs and cognitive patterns.

I spent a lot of time at TikTok watching AI-generated ad scripts fail in instructive ways. The model could write; the words were coherent. Yet the outputs were genuinely worse than what a mid-level human copywriter would produce, not because the language was bad but because the context was thin.

The problem wasn’t capability. It was input. Someone would write a one-line prompt — “write an ad for our sports drink targeting Gen Z” — and expect something good. That’s not how it works. A good copywriter working that brief would spend twenty minutes asking clarifying questions before writing a word. The AI just started writing, because nobody told it what it needed to know.

This isn’t a model problem. It’s a workflow problem. The quality of any AI content output is roughly determined by how much context went in. Not necessarily more words — relevant, specific context. The target user’s exact situation. What they’ve seen before. What they’re likely to be skeptical about. The tone the brand has established. What worked last time and why.

When you provide that kind of context, the outputs change dramatically. Not because the model got smarter, but because you gave it something to reason about instead of asking it to generate from nothing.

The way I came to think about this: a single prompt is like asking someone to cook you dinner without telling them what you’re hungry for, what’s in the fridge, or whether you have any restrictions. A good cook can make something, but probably not the thing you wanted. Constraining the problem — specifying the constraints — is itself the skill.

This has implications for how you build content products. Most of the early AI content tools were essentially prompt UIs. They put a text box in front of you and sent whatever you typed to the model. That design assumes the user knows how to specify what they need, which is rarely true. The better products recognize that the user doesn’t know how to prompt well, and that their job is to extract context through the interface rather than expecting users to supply it through text.

The TikTok work made this concrete for me. The ads that performed well were the ones where we’d pulled signals from actual audience behavior — what they’d engaged with, what they’d scrolled past, the specific language patterns that appeared in comments from different segments — and built those signals into the generation context. The model wasn’t doing anything magical. It was just working with richer information.
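As a sketch of what “building signals into the generation context” means (every field here is a hypothetical stand-in, not TikTok’s actual schema):

```python
# Context-first generation: most of the work happens before the model call.
def build_generation_context(brief: str, signals: dict) -> str:
    return "\n".join([
        f"Brief: {brief}",
        f"Audience has engaged with: {', '.join(signals['engaged_topics'])}",
        f"Audience tends to scroll past: {', '.join(signals['ignored_formats'])}",
        f"Language patterns from their comments: {', '.join(signals['comment_phrases'])}",
        f"Established brand tone: {signals['brand_tone']}",
        "Write the ad so it speaks to this specific audience.",
    ])

context = build_generation_context(
    "ad for our sports drink targeting Gen Z",
    {
        "engaged_topics": ["gym fails", "study-with-me"],
        "ignored_formats": ["celebrity endorsements"],
        "comment_phrases": ["lowkey need this", "the way I ran here"],
        "brand_tone": "dry, self-aware humor",
    },
)
print(context)  # this, not the one-line prompt, is what goes to the model
```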

What changes when you get this right is that the outputs start feeling like they were written for someone, not at someone. That’s the actual difference between content that converts and content that doesn’t. The technology can get you to “competent.” Context is what gets you to “resonant.”

I think about this now every time I see a demo of a content generation product. The demo prompt is always fully specified. The user always knows exactly what they want and describes it perfectly. Real users don’t work that way. The gap between demo and reality is almost entirely a context gap.