Multi-agent LLM Systems
Developing capable multi-agent systems for complex reasoning and human-AI collaboration.
Extracting meaningful insights from unstructured multi-modal content.
Leveraging synthesized insights for enhanced content generation.
Things keeping me busy on weekends
December 28, 2024
Interactive chat interface with multiple AI agents, enabling dynamic conversation flows and specialized problem-solving capabilities.
December 21, 2024
Content recommendation system leveraging embedding similarity to deliver personalized recommendations based on user profiles.
December 7, 2024
Interactive learning aids for reading comprehension and engagement.
November 6, 2024
Crowdlistening transforms large-scale social conversations into actionable insight by integrating LLM reasoning with the expanding capabilities of the Model Context Protocol (MCP). While extracting quantitative patterns from real-time data is already a rewarding task, our focus is not just on analyzing content at scale, but on conducting original research directly from raw social data, generating insights that haven’t yet appeared in established reporting.
Deep research features provide professional-looking research reports, yet the contents are far from original, as they’re drawn from articles already indexable on the internet and paraphrased with LLMs. However, much of the internet’s data exists in unstructured formats - TikTok videos, comments, and metadata, for example. Too much content is generated every day for there to be existing articles written about it all, and when such articles are published, they’re often already outdated. When you consider multimodal data, metadata, and connections between data points, these are precisely the types of information that could yield genuinely interesting and useful insights.
I’ve been thinking about this problem while working at TikTok, enabling better social listening through more fine-grained insights extracted using multi-modal/LLM-based approaches. In October 2024, I started developing early conceptions of Crowdlistening, focusing on multi-modal content understanding for TikTok videos. Although deep research features like GPT Researcher and Stanford Oval Storm existed, it wasn’t intuitive to integrate unstructured data processing capabilities into their workflows.
I paused Crowdlistening in Winter Quarter due to other commitments, but during this time, Anthropic released the Model Context Protocol (MCP). I’ve recently picked the project back up after following progress in this field, and I believe this presents an interesting avenue for product innovation: deep research features are significantly enhanced by the growing ecosystem of MCP servers (the same agentic workflows perform much better when they can rely on APIs whose capabilities have improved markedly in recent months).
What I’m particularly interested in exploring and building with Crowdlistening is the ability to extract actionable insights from large volumes of unstructured or semi-structured data, form linkages between them, and perhaps even test hypotheses, enabling effective research at scale. We started with TikTok data as a prototyping ground given my familiarity with the medium, but I can see this quickly extending to any type of unstructured data available on the web.
Brands today face a fundamental paradox: they need broad insights from vast amounts of social data, yet require the detailed understanding typically only available through limited case studies. Current solutions offer either abstracted metrics that require tedious manual interpretation, expensive and limited content screening that can’t scale, or surface-level sentiment analysis that misses nuanced opinions. Crowdlistening bridges this gap by combining the scale of algorithmic analysis with the depth of human-like comprehension. This addresses the first challenge identified in “Essence of Creativity” - helping users understand massive amounts of information and generate meaningful insights when they “don’t know what output they want.”
The rationale behind Crowdlistening’s multi-modal technical architecture stems from the fundamental challenge of extracting truly valuable insights from the vast and varied landscape of online conversations. Traditional methods often fall short because they either focus on structured data or analyze individual modalities (text, video, audio) in isolation. This approach misses the rich context and nuanced understanding that arises from the interplay between different forms of content and engagement. For example, a viral TikTok video’s impact is not solely determined by its visual content but also by its accompanying audio, captions, user comments, and engagement metrics like likes and shares.
Crowdlistening’s design directly tackles this limitation by integrating embedding-based topic modeling and LLM deep research capabilities to process and understand this multi-faceted data. Embedding-based topic modeling efficiently identifies key themes across massive datasets, while the LLM’s deep reasoning capabilities can then analyze these themes within the context of various modalities. This dual approach allows for a layered analysis, examining both the primary content and the subsequent engagement it generates. By processing video, audio, text, and engagement metrics in a unified system, Crowdlistening can generate insights that reflect not just what is being said, but how it’s being said, the surrounding context, and the audience’s multifaceted response. This comprehensive understanding is crucial for overcoming the “insight paradox” and delivering truly actionable intelligence that goes beyond surface-level sentiment or abstracted metrics. Ultimately, this multi-modal design is essential for achieving the core goal of Crowdlistening: to conduct original research directly from raw social data and uncover emerging trends and nuanced opinions that would be invisible to single-mode analysis systems.
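To make the dual approach concrete, here is a minimal sketch of the pattern (illustrative only, not Crowdlistening's production pipeline): posts are embedded and clustered into coarse themes, and each theme is then handed to an LLM for deeper analysis. The `llm_summarize` helper is a hypothetical stand-in for the deep-research call.

```python
# Illustrative sketch: embedding-based topic modeling followed by per-theme LLM analysis.
from collections import defaultdict
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

def discover_themes(posts: list[str], n_themes: int = 8) -> dict[int, list[str]]:
    """Group raw posts (captions, transcripts, comments) into coarse themes."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model
    embeddings = model.encode(posts, normalize_embeddings=True)
    labels = KMeans(n_clusters=n_themes, n_init=10).fit_predict(embeddings)
    themes = defaultdict(list)
    for post, label in zip(posts, labels):
        themes[label].append(post)
    return themes

def llm_summarize(theme_posts: list[str]) -> str:
    """Hypothetical stand-in for the LLM deep-research step."""
    raise NotImplementedError("plug in an LLM client here")

def analyze(posts: list[str]) -> dict[int, str]:
    # Cap the number of posts per theme to keep the LLM context manageable.
    return {label: llm_summarize(samples[:50])
            for label, samples in discover_themes(posts).items()}
```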
The platform provides granular breakdowns of content performance and audience reactions. As shown in our analysis results page, users can explore specific themes, track sentiment over time, and identify the most engaging content types. This helps brands understand not just what is being said, but why certain content resonates with their audience.
The opinion analysis feature goes beyond simple positive/negative sentiment to categorize specific viewpoints and concerns. This allows brands to understand the nuanced perspectives their audience holds, helping them craft more targeted and effective messaging.
We have integrated Model Context Protocols (MCPs) - an emerging standard that simplifies how LLMs interact with specialized tools and data sources. Rather than simple API calls, MCPs provide structured interfaces for LLMs to access specialized capabilities while maintaining context awareness throughout the analysis process.
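To illustrate what such a structured interface looks like in practice, here is a minimal sketch of a single analysis tool exposed over MCP, assuming the official MCP Python SDK's FastMCP helper; the tool name, data connector, and classifier are hypothetical stubs, not Crowdlistening's actual server.

```python
# Hypothetical MCP server exposing one analytical capability.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("crowdlistening-demo")

def fetch_posts(topic: str, limit: int) -> list[str]:
    """Stub standing in for a real social-data connector."""
    return [f"sample post about {topic}"] * limit

def classify_sentiment(post: str) -> str:
    """Stub standing in for a real sentiment model."""
    return "neutral"

@mcp.tool()
def analyze_sentiment(topic: str, max_posts: int = 200) -> dict[str, float]:
    """Return the share of each sentiment label among recent posts on a topic."""
    posts = fetch_posts(topic, max_posts)
    labels = [classify_sentiment(p) for p in posts]
    return {label: labels.count(label) / len(labels) for label in set(labels)}

if __name__ == "__main__":
    mcp.run()  # an MCP client (e.g. the Claude app) can now discover and call this tool
```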
As shown here, when a user submits a research question, the system dynamically determines which analytical capabilities to deploy. The Claude interface serves as the orchestration layer, identifying relevant MCP tools to activate and calling them sequentially:
This MCP-driven approach creates a dramatic efficiency improvement, reducing complex social media analysis from weeks to minutes while maintaining remarkable analytical depth.
To demonstrate Crowdlistening’s capabilities, we conducted a comprehensive analysis of public sentiment regarding Trump’s tariff policies. This serves as an excellent test case due to its complexity, polarizing nature, and economic impact.
When a user inputs the query about Trump’s tariff policies, our system activates the appropriate MCP tools in sequence. First, it gathers factual background information on the policies themselves, as shown below:
This background research provides context on what the current tariff policies are, including the 10% baseline tariff on all imports that took effect in April 2025, plus the higher “reciprocal” tariffs on countries with which the US has trade deficits (34% for China, 20% for the EU, and 24% for Japan).
Next, the system analyzes public opinion on these policies by examining social media content. The analysis reveals highly polarized reactions, categorized into three main perspectives:
The sentiment analysis dashboard shows that opinions on Trump’s tariff policies are distributed as 38% supportive, 42% critical, and 20% neutral or mixed. This visualization helps brands and researchers quickly understand the overall public response landscape.
One of the most valuable outputs is our projected economic impact analysis. This data visualization clearly presents the concrete financial implications of these policies across multiple domains:
The analysis shows an estimated $1,300 annual cost increase per US household, a projected 0.8% reduction in long-run US GDP, significant auto price increases ($3,000 for US vehicles, $6,000 for imports), and warnings about market volatility.
Beyond simple pro/con sentiment, our opinion analysis feature categorizes specific viewpoints with remarkable granularity. For instance, when examining comments on related content, we can identify nuanced perspectives and their prevalence:
This example shows how our system can identify several different comment themes, including positive views of content creators (37.5%), appreciation for intelligent discussion (25%), and concerns about media echo chambers (12.5%). This level of nuanced understanding would be impossible through traditional keyword or basic sentiment analysis.
Our solution has been validated through interviews with major brands like L’Oreal, confirming we drastically cut the time and cost of social media analysis. Crowdlistening enables:
We believe Model Context Protocols represent the future of specialized LLM applications. As shown in our implementation, MCPs provide a structured way for language models to interact with specialized tools and data sources while maintaining context awareness throughout the analysis process.
This approach is likely to become standard in LLM application development given how effectively it bridges the gap between general-purpose AI and domain-specific functionality. We anticipate seeing more MCP clients (interaction surfaces like Claude’s interface) emerge as this paradigm gains traction.
For social media analysis specifically, this approach creates a fascinating dynamic where AI-driven insights can actually lead structured reporting in terms of timeliness and depth. By processing and analyzing unstructured social data at scale, we can identify emerging trends and public sentiment shifts before they’re covered in traditional reporting.
Crowdlistening represents the next evolution in social listening tools - moving beyond counting mentions to truly understanding conversations at scale. By transforming social media chatter into structured insights, we’re helping brands make more informed decisions faster than ever before.
As noted in “Essence of Creativity,” the real value in AI-powered tools comes not just from generating content, but from helping users find new perspectives and insights. Our platform serves as both an inspiration acquisition tool (accelerating original content production) and a content understanding tool (helping brands better comprehend their audience). By connecting insight data with generation capabilities, we’re creating the kind of breakthrough product that bridges the gap between understanding and action.
Credits: This project was developed in collaboration with Madison Bratley, whose expertise in journalism and social media analysis was instrumental in conceptualizing how this technology could transform research methodologies. Additional contributions from Violet Liu in providing valuable usability feedback for our early prototype. I would also like to acknowledge Zhengjin, Cathy, Roy, Ruiwan, Qiping, Tongming, and other members on the Creative team at TikTok, who I’ve discussed early conceptions of this idea with.
August 20, 2024
Analyze thousands of TikTok videos to provide actionable trends & insights for key agencies. (Worked on multi-modal content understanding) To be released on TikTok Creative Center (https://ads.tiktok.com/business/creativecenter/pc/en)
Credits: TikTok Creative Team
In the rapidly evolving space of AI-driven creative tools, we’re witnessing a significant transition from general-purpose large language models to specialized, task-specific agent systems. This shift represents a fundamental change in how AI approaches creative work, particularly in advertising and marketing.
While many current GenAI applications focus heavily on content generation capabilities, the true creative bottleneck often isn’t in the generation step itself. Rather, it lies in the quality of insights that inform and guide the creative process. Without meaningful data and analysis, even the most sophisticated generation tools produce generic, uninspired content.
This blog explores how data insight products are evolving alongside generative AI technologies, and how their integration could fundamentally transform content creation workflows.
The limitations of general-purpose large language models have become increasingly apparent when handling complex creative tasks. To address issues like hallucinations and improve task completion capabilities, the industry has largely reached a consensus around agent-based approaches.
Different types of agent workflows have emerged to address specific needs. Non-agentic workflows generate content linearly without backtracking, suitable for straightforward tasks. Reflection-based systems introduce iterative improvement cycles where the AI criticizes and refines its own outputs. Tool use capabilities enable function calls and web browsing for enhanced research capabilities.
More advanced systems implement planning algorithms that decompose complex tasks into manageable steps, similar to how human creators break down projects. At the frontier, multi-agent collaboration enables specialized AI agents to work together, each handling different aspects of a complex creative process.
This evolution toward more sophisticated agent architectures reflects a growing understanding that creative work isn’t linear—it requires iteration, refinement, and the ability to leverage different capabilities at different stages of the process.
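As a rough illustration of the reflection pattern described above (a generic sketch, not any particular product's implementation, with a hypothetical `call_llm` wrapper), the generate-critique-refine loop can be as simple as:

```python
# Generic reflection loop: generate a draft, critique it, refine, repeat.
def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around whichever model API you use."""
    raise NotImplementedError("plug in an LLM client here")

def reflect_and_refine(task: str, max_rounds: int = 3) -> str:
    draft = call_llm(f"Complete this creative task:\n{task}")
    for _ in range(max_rounds):
        critique = call_llm(f"Critique this draft for the task '{task}':\n{draft}")
        if "no further changes" in critique.lower():
            break  # the critic is satisfied; stop iterating
        draft = call_llm(
            f"Revise the draft to address the critique.\nDraft:\n{draft}\n\nCritique:\n{critique}"
        )
    return draft
```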
One of the key limitations in current AI creative products is their focus on isolated capabilities rather than integrated workflows. In the advertising and marketing industry, there’s a high concentration of AI tools, but most provide only single functions or partial capabilities.
A content creator typically needs to move through multiple stages: gathering insights, analyzing competitors, developing concepts, generating scripts, creating visual assets, and optimizing the final product. Currently, this requires juggling multiple disconnected tools, manually transferring context between them, and piecing together a cohesive workflow.
Users don’t simply need better individual tools—they need comprehensive workflows that connect these steps seamlessly. The value proposition shifts from “what can this AI do?” to “how does this AI fit into my creative process?” This represents a fundamental shift in how we should design and evaluate AI creative systems.
The current market for insight products shows several distinct categories, each addressing different aspects of the creative process. Here’s a structured analysis of the landscape:
Products in this category focus on collecting, organizing, and analyzing existing advertisements across platforms. Pipi Ads maintains a library of over 20 million TikTok ads with extensive filtering capabilities, allowing users to study successful campaigns and identify trending approaches. Foreplay offers a more workflow-oriented solution, enabling users to save ads from multiple platforms, organize them with custom tags, and build creative briefs based on existing successful content.
The value proposition of these tools centers on learning from what already works. By studying high-performing ads, creators can identify patterns and strategies that resonate with specific audiences. However, most of these tools stop at the analysis stage without directly connecting insights to content generation.
Tools like Social Peta, Big Spy, and Story Clash provide deeper analysis of competitive activities. Social Peta offers insights into content distribution across 69 countries and 70 networks, analyzing multimedia types and dimensions. Big Spy enables cross-network ad searching with multiple filters, while Story Clash specializes in TikTok influencer tracking and performance analysis.
The competitive analysis market has grown substantially with the rise of social media advertising, with new players continuously entering the space to address specialized niches and platforms. These tools typically provide dashboard interfaces with various filters for monitoring competitor strategies, but most lack direct integration with content creation workflows.
Social listening platforms like Sprinklr, Exolyt, and Keyhole monitor brand mentions and sentiment across social channels. These tools analyze both posts and comments, providing valuable data on how audiences perceive brands and their content. Sprinklr offers comprehensive post and comment analysis with sentiment tracking, while Exolyt specializes in TikTok-specific insights, comparing brand content with user-generated content.
Keyhole delivers profile analytics, social trend monitoring, and campaign tracking. These tools excel at capturing the audience’s voice and identifying shifts in perception, but typically require significant manual analysis to translate these insights into actionable creative strategies.
Platforms such as Social Insider, Motion App, and RivalQ focus on analyzing ad performance metrics. These tools help marketers understand what content performs best, with detailed analytics on engagement, conversion, and return on investment. By identifying high-performing content patterns, these tools can inform future creative decisions.
However, there remains a significant gap between identifying what works and automatically generating new content based on those insights. Most performance analysis tools remain separated from content creation workflows, requiring manual interpretation and application of insights.
Several standout products illustrate different approaches to the insight-generation challenge:
TikBuddy focuses exclusively on TikTok analytics, offering creator rankings by category, follower count, and growth rate. The tool provides comprehensive account performance monitoring and video data analysis through a convenient Chrome extension.
Its specialized focus allows for deeper platform-specific insights, but its utility is limited to a single platform and doesn’t extend to content creation. Users must manually apply any insights gained to their creative process.
Foreplay stands out for its more integrated workflow approach. The platform enables users to collect ad content across platforms, preserve it even after platform deletion, and organize it with tags and categories. Its brief creation tools facilitate the transition from insight to execution, with support for brand information and specific generation requirements.
The platform’s AI storyboard generator creates hooks and develops scripts based on collected insights. Foreplay also integrates discovery features organized by community, brand, and experts, alongside competitor monitoring capabilities.
This approach begins to bridge the gap between insights and creation, though the integration remains partial rather than fully automated.
Keyhole exemplifies the analytics-focused approach, tracking keywords and brand mentions with temporal context. The platform offers detailed post analysis, influencer identification, trending topics visualization, and profile analytics with optimization recommendations.
Its strength lies in comprehensive data collection and visualization, but like many analytics platforms, it requires significant human interpretation to translate insights into creative decisions.
The data insight landscape continues to evolve rapidly, with several notable innovations emerging:
Open source projects like Vanna are revolutionizing text-to-SQL capabilities, making database querying more accessible to non-technical users. These tools enable creators to extract specific insights from complex datasets without specialized database knowledge.
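The underlying text-to-SQL pattern these tools implement is roughly the following (a generic sketch with a hypothetical `call_llm` helper, not Vanna's actual API): the schema and the question go into a prompt, the model returns SQL, and the SQL is executed against the database.

```python
# Generic text-to-SQL sketch: prompt an LLM with the schema and a question, run the result.
import sqlite3

def call_llm(prompt: str) -> str:
    """Hypothetical LLM wrapper."""
    raise NotImplementedError("plug in an LLM client here")

def ask_database(question: str, db_path: str) -> list[tuple]:
    conn = sqlite3.connect(db_path)
    schema = "\n".join(
        row[0] for row in conn.execute("SELECT sql FROM sqlite_master WHERE sql IS NOT NULL")
    )
    sql = call_llm(
        f"Schema:\n{schema}\n\nWrite a single SQL query that answers: {question}\n"
        "Return only the SQL."
    )
    return conn.execute(sql).fetchall()
```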
Recent startups are developing interactive data dashboards that visualize complex datasets in more intuitive ways, allowing for easier pattern identification and insight extraction. These tools employ advanced visualization techniques to make data more accessible and actionable.
User feedback aggregation tools are also gaining traction, automatically summarizing and categorizing customer sentiment from reviews and comments. These systems can identify common themes and concerns, providing valuable input for content creators looking to address audience needs.
The most promising innovations focus on reducing the cognitive load required to extract meaningful insights from data, making the path from analysis to action more direct and intuitive.
The next evolution in creative AI tools will likely center on high-quality content generation based on data insights. Current GenAI applications often produce unnecessary content redundancy—different from hallucinations, but equally problematic for effective communication.
The real creative barrier isn’t typically in the generation process itself, but in the prompts—the insights that inform decision-making. When using agent-based systems, the quality of instructions and background information directly impacts the output quality.
For example, advanced AI systems can now decompose complex goals like “How can a lifestyle channel creator get 1,000 subscribers on YouTube?” into specific tasks: analyzing successful channels, generating targeted content ideas, and implementing optimization strategies. However, the quality of these recommendations depends entirely on the AI’s access to relevant, accurate data about what actually works.
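A bare-bones version of that decomposition step might look like the sketch below (illustrative; `call_llm` is a hypothetical wrapper, and real planners add validation, tool use, and retries):

```python
# Sketch of goal decomposition: ask the model for a task list, then address each task.
import json

def call_llm(prompt: str) -> str:
    """Hypothetical LLM wrapper."""
    raise NotImplementedError("plug in an LLM client here")

def plan(goal: str) -> list[str]:
    raw = call_llm(f"Break this goal into 3-6 concrete tasks, as a JSON list of strings:\n{goal}")
    return json.loads(raw)

def run(goal: str) -> list[str]:
    return [
        call_llm(f"Goal: {goal}\nTask: {task}\nProduce a concrete, data-backed recommendation.")
        for task in plan(goal)
    ]
```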
By leveraging sophisticated content analysis, we can identify truly effective patterns in high-performing content. Multimodal understanding can reveal why certain creative approaches resonate with specific audiences, providing creators with concrete, unique insights rather than generic advice.
The future lies in connecting these insights directly to the generation process—using what we know works as the foundation for creating new content that maintains brand uniqueness while leveraging proven patterns.
Compared to other GenAI creative tools, insight products place greater emphasis on data quality and quantity. The next leap in AI-generated content quality will likely come from precise generation guided by robust insight data.
The most promising opportunity lies in creating systems that can automatically analyze successful content across platforms, extract meaningful patterns from this analysis, and directly translate these insights into generation guidance. This approach would produce highly targeted content that leverages proven patterns while maintaining brand distinctiveness.
As we move forward, the focus will shift from mere information generation toward sophisticated information synthesis—providing not just content, but content informed by actionable insights derived from real-world performance data. Organizations that successfully integrate insight gathering with content generation will gain a significant competitive advantage in an increasingly crowded digital landscape.
The future belongs not to those with the most powerful generative models, but to those who can effectively transform data into creative insight, and insight into compelling content.
May 20, 2024
Leverage generative AI capabilities for creative script ideation and video ad creation. (Worked on agentic workflows and interface optimization) https://ads.tiktok.com/business/copilot/standalone?locale=en&deviceType=pc
Credits: TikTok Creative Team
The transition from LLMs to Agents has become a consensus in the AI community, representing an improvement in complex task execution capabilities. However, helping users fully utilize Agent capabilities to achieve tenfold efficiency gains requires careful workflow design. These workflows aren’t merely a presentation of parallel capabilities, but seamless integrations with human-in-the-loop quality assurance. This document uses Typeface as a reference to explain why a clear primary workflow is necessary, as well as design approaches for functional extensions.
Google held its Google Cloud Next conference from April 9-11, announcing products like Google Vids, Gemini, Vertex AI, and related updates.
From a consumer product perspective, despite Google releasing many products, they were relatively superficial (Google Vids, Workspace AI, etc.). Examples like their Sales Agent demonstration were awkward in workflow, similar to Amazon Rufus. However, the enhanced data insight capabilities enabled by long context windows are becoming a confirmed trend.
From a business product perspective, while Google showcased many Agent applications built on Gemini and Vertex AI and emphasized their powerful functionality, they glossed over the difficulties of actual deployment. Currently, both large tech companies and traditional businesses face challenges in implementing truly effective workflows.
LLMs deliver not just tools, but work results at specific stages of a process. Application deployment can be viewed as providing models with specific contexts and clear behavioral standards. The understanding and reasoning capabilities of LLMs can be applied to various scenarios; packaging general capabilities as abilities needed for specific positions or processes involves overlaying domain expertise with general intelligence.
We should look beyond ChatBot and Agent dimensions to view applications from a Workflow perspective. What parts of daily workflows can be taken over by LLMs? If large models need to process certain enterprise data, what value does this data provide in the business? Where does it sit in the value chain? In the current operational model, which links could be replaced with large models?
Content that has reached consensus:
Most companies (AI21 Labs, Anthropic, etc.) are now developing task-specific models and Mixture of Experts (MoE) architectures. The MoE architecture has been widely applied in natural language processing, computer vision, speech recognition, and other fields. It can improve model flexibility and scalability while reducing parameters and computational requirements, thereby enhancing model efficiency and generalization ability (Mixture of Experts Explained).
The MoE (Mixture of Experts) architecture is a deep learning model structure composed of multiple expert networks, each responsible for handling a specific task or subset of the data. Input data is routed to different expert networks for processing, each returning its own output, and the final output is a weighted sum of all expert outputs.
The core idea is to break a large, complex task into multiple smaller, simpler tasks handled by different expert networks, which improves flexibility and scalability while reducing parameters and computational requirements.
Implementing an MoE architecture typically requires the following steps:
Define expert networks: First, define multiple expert networks, each responsible for handling specific tasks or datasets. These expert networks can be different deep learning models such as CNNs, RNNs, etc.
Train expert networks: Use labeled training data to train each expert network to obtain weights and parameters.
Allocate data: During training, input data needs to be allocated to different expert networks for processing. Data allocation methods can be random, task-based, data-based, etc.
Summarize results: Weight and sum the output results of each expert network to get the final output.
Train the model: Use labeled training data to train the entire MoE architecture to obtain final model weights and parameters.
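Putting those steps into code, a minimal dense-gated MoE layer might look like the PyTorch sketch below (illustrative only; production MoE layers typically use sparse top-k routing and load-balancing losses):

```python
# Minimal dense-gated Mixture-of-Experts layer: every expert runs, and outputs are
# combined with learned gate weights. Illustrative sketch, not a production design.
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])
        self.gate = nn.Linear(d_model, n_experts)  # learns how much to trust each expert

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(x), dim=-1)                      # (batch, n_experts)
        expert_outs = torch.stack([expert(x) for expert in self.experts],
                                  dim=-1)                                  # (batch, d_model, n_experts)
        return (expert_outs * weights.unsqueeze(-2)).sum(dim=-1)           # weighted sum of experts
```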
At the Gemini 1.5 Hackathon at AGI House, Jeff Dean noted the significant aspects of Gemini 1.5: 1 Million context window, which opens up new capabilities with in-context learning, and the MoE (Mixture of Experts) architecture.
Writesonic (https://writesonic.com) uses GPT Router for LLM Routing during AI Model Selection.
GPT Router (https://github.com/Writesonic/GPTRouter) allows smooth management of multiple LLMs (OpenAI, Anthropic, Azure) and Image Models (Dall-E, SDXL), speeds up responses, and ensures non-stop reliability.
```python
from gpt_router.client import GPTRouterClient
from gpt_router.models import ModelGenerationRequest, GenerationParams
from gpt_router.enums import ModelsEnum, ProvidersEnum

# Connect to a deployed GPT Router instance.
client = GPTRouterClient(base_url='your_base_url', api_key='your_api_key')

messages = [
    {"role": "user", "content": "Write me a short poem"},
]
prompt_params = GenerationParams(messages=messages)

# Request generation from Claude Instant 1.2 via the Anthropic provider;
# `order` sets the fallback priority when several models are listed.
claude2_request = ModelGenerationRequest(
    model_name=ModelsEnum.CLAUDE_INSTANT_12,
    provider_name=ProvidersEnum.ANTHROPIC.value,
    order=1,
    prompt_params=prompt_params,
)

response = client.generate(ordered_generation_requests=[claude2_request])
print(response.choices[0].text)
```
Content that is still not determined:
What constitutes a reasonable workflow remains to be determined. Some scenarios, like Amazon Rufus shopping guidance (where users need to converse before selecting products), differ significantly from existing user workflows and fail to provide efficiency improvements. -Verge
Many companies conducting needs validation are choosing customer profiles too similar to themselves or their friends, so the authenticity of these needs remains questionable. Additionally, existing AI product business models are trending toward price wars at the foundational level, with unclear differentiation at the application layer. -Google Ventures
AutoGPT represents the vision of accessible AI for everyone, to use and build upon. Their mission is to provide tools so users can focus on what matters. https://github.com/Significant-Gravitas/AutoGPT
A GPT-based autonomous agent that conducts comprehensive online research on any given topic. https://github.com/assafelovic/gpt-researcher
The advertising and marketing industry is one of the business sectors where AIGC is most widely applied. AI products are available for various stages, from initial market analysis to brainstorming, personalized guidance, ad copywriting, and video production. These products aim to reduce content production costs and accelerate creative implementation. However, most current products offer only single or partial functions and cannot complete the entire video creation process from scratch.
Concept Design: Midjourney
Script + Storyboard: ChatGPT
AI Image Generation: Midjourney, Stable Diffusion, D3
AI Video: Runway, Pika, Pixverse, Morph Studio
Dialogue + Narration: Eleven Labs, Ruisheng
Sound Effects + Music: SUNO, UDIO, AUDIOGEN
Video Enhancement: Topaz Video
Subtitles + Editing: CapCut, JianYing
User Need: Adjusting generation style through prompts before each generation is time-consuming and unpredictable. A comprehensive set of generation rules can help ensure that generated content consistently meets user needs, avoiding repeated adjustments. Example: Typeface Brand Kit
User Need: From a probability perspective, accuracy compounds across chained agent steps; for example, if each step is 90% reliable, a five-step chain succeeds only about 59% of the time (0.9^5 ≈ 0.59). Setting up human-in-the-loop checkpoints lets users regenerate or fine-tune after each step, helping ensure final generation quality. Example: Typeface Projects (also includes Magic Prompt to assist with prompt generation)
User Need: Users want options. In existing generation processes, if users are dissatisfied with generated content, they need to refresh the generation, which is inefficient. Providing multiple options in a single generation can improve user experience. Example: Typeface Image Generator (also supports favoriting)
User Need: Currently, some users need to use 5-10 AI capabilities to complete advertising video creation. Most capabilities are disconnected, requiring frequent switching. By establishing a clear workflow, users can more efficiently invoke relevant tools to complete their creation. Example: Typeface Workflow (all capabilities presented at the appropriate stages)
Typeface was founded in May 2022, based in San Francisco. In February 2023, it received $65 million in Series A funding from Lightspeed Venture Partners, GV, Menlo Ventures, and M12. In July 2023, it completed a $100 million Series B round led by Salesforce Ventures, with Lightspeed Venture Partners, Madrona, GV (Google Ventures), Menlo Ventures, and M12 (Microsoft’s venture fund) participating. To date, Typeface has raised a total of $165 million, with a post-investment valuation of $1 billion. (Product positioning: 10x content factory)
Multiple Agent calls centered around the core document editing experience.
When users log into the Typeface homepage, they see four core functions in the left toolbar (Projects, Templates, Brands, Audience). The main page shows corresponding workflow options (Create a product shot, generate some text, etc.). The Getting Started Guide at the bottom of the main page provides guidance videos for certain use cases (Set up brand kit, repurpose videos into text) to help users understand how to invoke various capabilities.
When users click to enter the Brands page, they can set up multiple Brand generation rules, divided into 3 items:
When users click to enter the Projects page, they see a Google Doc-like interface storing multiple projects. Each project opens to a main document page with a resizable input bar at the bottom. Clicking the input bar presents options:
Additionally, users can select Refine to adjust generation language and tone (fixed options).
After clicking Create an image, users enter the image editing page with six integrated functions on the left: “Add, select, extend, lighting, color, effects, adobe express.” Users can generate and adjust images directly and favorite preferred generations.
The difference from Create an image is that Product shot includes specific products, while image isn’t necessarily product-related.
After clicking Generate text, users enter a prompt input field. Clicking the settings icon in the upper right allows setting Target Audience and Brand Kit. After generation, users can further adjust the prompt for a second generation, and selected content appears in the Project docs.
Typeface offers various generation templates. Users can search and select from the Template library, which adjusts the input box according to the content template, like TikTok Script.
When generating content, users can select user profiles and set Age Range, Gender, Interest or preference, and Spending behavior (with fixed options).
These integrations allow users to create in their familiar workspaces, avoiding the friction of cross-platform collaboration.
https://www.typeface.ai/product/integrations
Integration allows marketers to generate personalized content directly within Dynamics 365 Customer Insights, enhancing productivity and return on investment.
Users can generate multiple personalized creatives for audience-targeted campaigns, support scaling tailored content, and create variations for different target audiences.
Users can define audience segments using customer intelligence from BigQuery data on ads, sales, customers, and products to generate a complete suite of custom content for every audience and channel in minutes.
Users can streamline content workflows from their favorite apps and access content drafts within Google Drive, refine, rework, or write from scratch, and share with other stakeholders for quick approvals.
Create content in Teams using Typeface’s templates and repurpose materials or create new content. Make quick edits, such as improving writing, shortening text, and changing tone, all within the Teams chat environment.
Workflows are not just about having various capabilities in the creation process, but also about chaining them together with appropriate GUI process specifications. Current marketing-focused products mostly integrate multiple stages of the creation process, providing workflow-like experiences for users, and reducing cross-platform collaboration friction through external integrations.
August 20, 2023
AI Homework helper with advanced reasoning and visualization for all school subjects. (Worked on LLM Reasoning) https://www.gauthmath.com/
Credits: Lexi Ling, Gauth Team
June 15, 2023
Authors: Terry Chen, Allyson Lee
Effective coaching in project-based learning environments is critical for developing students’ self-regulation skills, yet scaling high-quality coaching remains a challenge. This paper presents an LLM-enhanced coaching system designed to support project-based learning by connecting peers who are struggling with the same regulation gap and by helping coaches identify regulation gaps and generate tailored practice suggestions. Our system integrates vector-based semantic matching with LLM-generated regulation gap categorizations for Context Assessment Plan (CAP) notes. Results demonstrate that our system effectively retrieves relevant coaching cases, reducing the cognitive burden on mentors while maintaining high-quality, context-aware feedback.
Training college students to tackle complex, open-ended innovation work requires developing strong regulation skills for self-directed work. Coaches guide the development of these regulation skills, helping students develop cognitive, motivational, emotional, and strategic behaviors needed to problem solve and reach desired outcomes. However, coaches face significant challenges in providing personalized guidance to multiple student teams.
Existing AI-based project management tools help track tasks but fail to capture nuanced ways students approach their work. Large Language Models (LLMs) show promise in analyzing text-based interactions and generating structured feedback, but their application to coaching remains underexplored.
To address these issues, we propose utilizing LLMs to develop and integrate three key technical innovations. First, Peer Connections facilitate connections between students with similar challenges. Second, Coaching Reflections help coaches analyze patterns and improve their practice through identifying regulation gaps. Finally, Practice Suggestions adapt similar cases to new situations.
Our system is built around a novel codebook consisting of regulation gap definitions and examples gathered across learning science literature. The codebook categorizes student regulation gaps in a tiered approach:
Our codebook includes three primary categories. Cognitive skills relate to approaching problems with unknown answers. Metacognitive skills involve planning, help-seeking, collaboration, and reflection. Emotional aspects cover dispositions toward self and learning that affect motivation.
The tier 2 categories provide more specific regulation gaps. These include representing problem and solution spaces, assessing risks, and critical thinking and argumentation. Additionally, they cover forming feasible plans, planning effective iterations, addressing fears and anxieties, and embracing challenges and learning.
Our system combines semantic similarity search with LLM-based analysis in a retrieval-augmented generation approach. The process begins when student regulation notes are pre-processed with metadata on tier 1 and tier 2 regulation gaps. These notes are then encoded into text embeddings, after which a vector database retrieves the most similar historical cases. Finally, an LLM (DeepSeek) generates structured responses including a diagnosis of potential regulation gaps, practice suggestions targeted to these gaps, and references to similar historical cases.
This grounds LLM suggestions in actual coaching experiences rather than generic advice, improving the relevance and actionability of recommendations.
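A condensed sketch of this retrieval-augmented flow is shown below; the embedding model, vector store, and prompt are illustrative stand-ins rather than the system's exact implementation.

```python
# Illustrative retrieval-augmented flow: embed the new note, retrieve similar past
# cases, and ask the LLM for a grounded diagnosis and practice suggestion.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model

def retrieve_similar(new_note: str, past_notes: list[str], k: int = 5) -> list[str]:
    corpus = encoder.encode(past_notes, normalize_embeddings=True)
    query = encoder.encode([new_note], normalize_embeddings=True)[0]
    top = np.argsort(corpus @ query)[::-1][:k]  # cosine similarity via normalized dot products
    return [past_notes[i] for i in top]

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around the LLM (the system described here uses DeepSeek)."""
    raise NotImplementedError("plug in an LLM client here")

def coach(new_note: str, past_notes: list[str]) -> str:
    cases = "\n---\n".join(retrieve_similar(new_note, past_notes))
    return call_llm(
        "Given the student's note and similar past coaching cases, identify the likely "
        "regulation gap and suggest a targeted practice.\n"
        f"Note:\n{new_note}\n\nSimilar cases:\n{cases}"
    )
```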
We developed and tested three approaches to match students with similar regulation challenges:
The Baseline Semantic Approach uses vector embeddings to find similar cases based on textual similarity. The Weighted Semantic Similarity approach separates and weights regulation gap description (0.7) from contextual information (0.3). Our Hybrid LLM-Codebook Approach combines semantic matching with LLM-generated metadata using our regulation codebook.
The hybrid approach proved most effective, assigning the highest weight (0.5) to tier 2 categories and lower weights to tier 1 categories (0.1) and text content (0.2 each for gap text and context).
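Expressed as a formula, the hybrid score is simply a weighted combination of metadata matches and text similarities; the sketch below uses the weights reported above, with `text_sim` as a hypothetical embedding-similarity helper.

```python
# Hybrid similarity: codebook metadata matches plus text-embedding similarity,
# using the weights reported above (0.5 / 0.1 / 0.2 / 0.2).
def text_sim(a: str, b: str) -> float:
    """Hypothetical helper: cosine similarity between text embeddings of a and b."""
    raise NotImplementedError("plug in an embedding model here")

def hybrid_score(query: dict, candidate: dict) -> float:
    return (
        0.5 * float(query["tier2"] == candidate["tier2"])
        + 0.1 * float(query["tier1"] == candidate["tier1"])
        + 0.2 * text_sim(query["gap_text"], candidate["gap_text"])
        + 0.2 * text_sim(query["context"], candidate["context"])
    )
```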
We evaluated each model against the same three notes, analyzing the top 5 returned similar notes. The semantic matching performed well when addressing cognitive and metacognitive gaps with repetitive terminology but struggled with emotional regulation gaps. The LLM-codebook approach showed promise in accurately identifying regulation gaps but was computationally intensive. The hybrid model consistently and efficiently identified notes with the same regulation gap while maintaining contextual similarity.
Our system effectively bridges the gap between human expertise and AI capabilities in coaching contexts. Key takeaways include Hybrid AI-Driven Case Retrieval, where combining LLM-driven metadata tagging with traditional semantic matching enables precision in retrieving relevant coaching cases, and Structured Codebooks for Domain-Specific AI, where our tiered classification system grounds LLM-based reasoning in expert-validated pedagogical frameworks.
Future work will focus on several areas of improvement. We plan to improve clarity of writing in notes and collect more data through alternative sources. Additionally, we aim to develop sub-categorized codebooks with specific examples and reasoning chains. Finally, we will explore more sophisticated reasoning methods like external knowledge bases or memory systems.
This research contributes to the broader field of AI-enhanced education and human-AI collaboration, offering insights into how AI can augment expert-driven mentoring in complex, open-ended learning settings.