The Agent Economy: Tokens as Bandwidth, Apps for Agents

A reasoned view of the agent economy: what is changing, where agent-mediated software is headed, and what product teams should build for next.

Thesis role: Macro frame — what is changing in software as agents become the operational layer.

The Interface Shift: Humans → Agents → Tools

As agents become the operational layer between people and tools, value shifts from UI access to reliable delegation. As token costs continue to fall, more work moves into agent-mediated execution. We already see this in practice: operating-layer agents can run app actions, and systems like OpenClaw can coordinate research, synthesis, and production workflows that previously required manual tool switching. In many workflows, humans are no longer the direct interface to every tool; agents increasingly sit in the middle.

This changes the product bottleneck. The hard part is not only generation speed, but ambiguity delegation: how to carry constraints, trade-offs, and rationale from human intent into executable action.

Delegating Ambiguity, Not Just Tasks

In human teams, incomplete requirements are often repaired through shared context and follow-up conversation. Agent workflows are less forgiving. If intent is implicit, distributed, or contradictory, agents can execute quickly but drift strategically.

The practical implication is that agent-era systems must support reasoning transfer, not just command transfer. Teams need mechanisms that preserve upstream judgment—why this trade-off, why this constraint, why now—so delegated execution remains aligned.

What Winning Systems Must Provide

In an agent-mediated economy, durable advantage comes from systems that can reason over unstructured human signal, distill decision-grade constraints, and package full-context artifacts that survive handoffs across product, engineering, and go-to-market functions.

This is the foundation for the rest of the CrowdListen thesis: if agents become primary operators, then context quality and ambiguity handling become first-order product design variables.

Turning Audience Insight Into Agent-Ready Specs

CrowdListen converts raw user conversation into structured product insight and agent-ready PM specs, preserving intent from feedback signal to executable task.

Thesis role: System design — how CrowdListen converts ambiguity into agent-ready execution context.

The problem: customer feedback is fragmented and unstructured, while PRDs written for humans often fail when passed to coding agents because intent and context are lost. CrowdListen preserves and operationalizes intent by synthesizing user signal into agent-ready PM specs that decompose features into executable tasks with full context.

CrowdListen Overview

Why this product exists: ambiguity breaks execution

In the agent economy, ambiguity compounds cost quickly. Teams can ingest massive volumes of comments, videos, and feedback and still miss the decision signal. Dashboards may look complete, but at prioritization time the core question remains: which pain points are durable, which are transient, and what should be built now.

This is where execution quality breaks down. If ambiguity is not reduced early, it propagates through planning, handoff, and implementation. Agents can execute quickly, but they execute what they are given. When context is fuzzy, speed amplifies misalignment. CrowdListen is designed to reduce that ambiguity before work is delegated, so the transition from signal to decision to execution remains grounded in evidence.

Product Suite Overview

Product Suite

CrowdListen is built around a specific execution failure we kept seeing: feedback is fragmented across channels, synthesis is manual, and intent gets lost between research, planning, and delivery. Most product organizations can collect signal but cannot preserve user context through implementation. That is where quality drops: requirements become abstract, handoffs become lossy, and coding agents produce work that is technically correct but strategically misaligned.

We designed the product as one connected operating flow instead of disconnected tools. Feed captures and structures raw audience signal, Workspace turns that signal into evidence-backed product direction, and Tasks routes scoped work to agents with enough context to execute reliably. The system is not trying to generate more artifacts; it is trying to preserve decision quality from first observation to shipped output.

Feed

Feed: Cross-channel Signal Intake

Feed consolidates cross-channel conversation into a single signal layer, including pain points, feature requests, objections, and recurring workarounds. Instead of relying on mentions or keyword counts, it clusters meaning and tracks persistence over time, so teams can distinguish temporary noise from durable demand. The output is a structured view of what users are actually asking for, in their own language, with enough specificity to drive product decisions.
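The clustering-plus-persistence idea above can be sketched in a few lines. This is a toy illustration, not the production pipeline: token overlap stands in for real embedding similarity, and all names (`cluster`, `persistence_weeks`) are hypothetical, not the CrowdListen API.

```python
from datetime import date

def similarity(a: str, b: str) -> float:
    """Jaccard overlap over lowercase tokens; a placeholder for cosine
    similarity between sentence embeddings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def cluster(items, threshold=0.3):
    """Greedy single-pass clustering: attach each (text, date) item to the
    first cluster whose representative is similar enough, else start a new
    cluster."""
    clusters = []  # list of (representative_text, members)
    for text, day in items:
        for rep, members in clusters:
            if similarity(text, rep) >= threshold:
                members.append((text, day))
                break
        else:
            clusters.append((text, [(text, day)]))
    return clusters

def persistence_weeks(members) -> int:
    """Count distinct ISO (year, week) pairs a theme appears in: durable
    demand recurs across many weeks, transient noise in one."""
    return len({day.isocalendar()[:2] for _, day in members})
```

The key output is not mention volume but recurrence: a theme seen in many distinct weeks is durable demand, one seen in a single burst is likely noise.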

Workspace

Workspace: Insight Development

Workspace is where teams turn raw signal into decisions. It supports conversational exploration of the problem space, helps validate hypotheses against evidence, and produces richer product artifacts than traditional summary docs. The emphasis is not on writing longer PRDs; it is on preserving rationale, constraints, and user context so that every decision remains traceable back to real audience behavior.

Tasks

Tasks: From Insight to Execution

Tasks closes the loop between product intent and implementation. It decomposes specs into executable tasks and routes them to coding agents with project context intact, which reduces drift between what the team intended and what actually gets built. This is the layer that turns CrowdListen from an analysis product into an execution system: user signal becomes prioritized work, and prioritized work becomes shipped outcomes with accountability.
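A minimal sketch of the decomposition idea, assuming a simple schema (these dataclass names are illustrative, not the real CrowdListen data model): the point is that shared upstream context travels with every task, so the handoff to a coding agent is not lossy.

```python
from dataclasses import dataclass

@dataclass
class SpecContext:
    rationale: str     # why this work exists
    constraints: list  # non-negotiables the agent must respect
    evidence: list     # pointers back to the originating user signal

@dataclass
class AgentTask:
    title: str
    acceptance_criteria: list
    context: SpecContext  # full upstream context rides along with the task

def decompose(spec_title, rationale, constraints, evidence, subtasks):
    """Split one spec into executable tasks, attaching the shared context
    to every task instead of flattening intent into one-line titles."""
    ctx = SpecContext(rationale, constraints, evidence)
    return [AgentTask(f"{spec_title}: {name}", criteria, ctx)
            for name, criteria in subtasks]
```

The design choice worth noting: context is duplicated onto each task rather than referenced by ID, so an agent receiving a single task still sees the why, not just the what.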

Why this structure matters

Agent-Ready Analysis Canvas

The thesis is simple: in an agent-driven product economy, the bottleneck is no longer writing code, it is preserving intent. Teams that can carry user context through every handoff will iterate faster and ship better decisions, while teams that lose context will scale misalignment. CrowdListen is designed to be the PM layer for agents by translating audience insight into agent-ready specifications that remain grounded in evidence.

Tooling for agents: MCPs and skills that connect directly to CrowdListen

A large part of our execution runs through an agent integration layer. In practice, we treat CrowdListen as tooling for agents rather than a dashboard humans manually operate. We expose MCPs and agent skills so agents can directly access features, pull evidence, and convert findings into work artifacts.
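The shape of "tooling for agents" can be illustrated with a dependency-free registry: product features become named, described tools an agent can discover and invoke, rather than screens a human clicks through. In production this role is played by an MCP server; the registry below is only an analogue, and the tool name and canned data are hypothetical.

```python
TOOLS = {}

def tool(name, description):
    """Register a function as an agent-callable tool, with a description
    the agent reads when deciding what to invoke."""
    def register(fn):
        TOOLS[name] = {"description": description, "fn": fn}
        return fn
    return register

@tool("list_pain_points", "Return clustered pain points with evidence counts")
def list_pain_points(min_mentions: int = 5):
    # Canned data standing in for a real Feed query.
    data = [("csv export fails", 42), ("slow dashboard", 7), ("dark mode", 3)]
    return [p for p in data if p[1] >= min_mentions]

def invoke(name, **kwargs):
    """What an agent runtime does after choosing a tool by its description."""
    return TOOLS[name]["fn"](**kwargs)
```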

This shows up in two concrete patterns.

CrowdListen Docs & Operating Surface

1) Product management for agents (delegating ambiguity)

The first pattern is delegating ambiguity to agents while keeping intent intact. Agents ingest signals across channels, connect dots between recurring pain points and feature requests, and surface structured opportunities that can be acted on immediately. Instead of handing agents vague summaries, we route evidence-backed context and constraints so they can turn fragmented conversation into actionable feature proposals.

A meaningful share of this workflow now runs through our agent integration layer: source ingestion, synthesis, and conversion into agent-ready specs/tasks. The objective is not more reporting; it is reducing ambiguity between signal, decision, and execution.

2) Actionable insights for agents, with agents

The second pattern is the insight loop itself: turning broad social data into detailed, operational insight that agents can directly use. This is the practical form of the insight paradox. Teams need scale and depth at the same time, but most tools force a tradeoff between high-volume aggregation and high-fidelity interpretation.

CrowdListen is designed to close that gap by combining large-scale signal capture with structured synthesis that preserves nuance. In practice, this means agents can move from fragmented discussion to usable decisions with less manual translation and less context loss.

Early validation supports this direction. In enterprise conversations, including teams such as L’Oréal, we repeatedly saw the same outcome: when synthesis is structured and context is preserved, teams reduce analysis overhead, respond to shifts earlier, and make faster product calls with higher confidence. The key benefit is not just speed; it is converting audience evidence into better execution decisions without losing fidelity across the workflow.

Feature Extraction & Multimodal Content Understanding

Fine-grained social listening for understanding crowd conversations: a framework for turning fragmented, high-volume social discussion into weighted signal and agent-ready product action.

Thesis role: Technical substrate — how multimodal signal becomes decision-grade input for agent execution.

The Data Problem

The most relevant market signal increasingly lives in unstructured web discourse: short-form video, comments, threads, and community discussion. This data is rich and current, but difficult to interpret at scale because meaning is distributed across modalities and interaction context.

Two Implementation Branches

The first branch is flatten-to-text: ASR/OCR plus comments and metadata are merged, then NER/keyword extraction is applied. This is efficient, but it collapses structure too early. Tier-one source content and tier-two reaction signal become mixed, making weighting and causality blurry.

The second branch is direct multimodal model pipelines. This improves semantic coverage, but often increases latency/cost and still under-specifies platform-aware structure. Better models help, but do not automatically solve representation quality.

What Is Missing

The missing layer is structured interpretation: explicit handling of signal hierarchy, evidentiary weighting, and cross-modal relationships. Humans do this naturally when consuming content; agents need it encoded.
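One way to encode that structured interpretation is to keep tier and modality on every signal and derive evidentiary weight from them, rather than flattening early. A minimal sketch; the tier weights and log damping are illustrative assumptions, not calibrated values.

```python
import math
from dataclasses import dataclass

@dataclass
class Signal:
    text: str
    tier: int        # 1 = source content (ASR/OCR), 2 = reaction (comments)
    modality: str    # "speech", "text_overlay", "comment", ...
    engagement: int  # likes/replies attached to this signal

# Assumed weights: reactions corroborate a theme, they do not originate it.
TIER_WEIGHT = {1: 1.0, 2: 0.4}

def evidence_weight(sig: Signal) -> float:
    """Weight grows slowly with engagement (log damping) so one viral
    comment cannot outvote the source content."""
    return TIER_WEIGHT[sig.tier] * (1 + math.log1p(sig.engagement))

def theme_weight(signals) -> float:
    """Total evidentiary weight behind a theme, traceable per signal."""
    return sum(evidence_weight(s) for s in signals)
```

Because each signal keeps its tier, the weighting is auditable: a downstream spec can cite exactly which source statements and which reactions contributed, and how much.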

Feature Extraction as Decision Infrastructure

Feature extraction should be treated as decision infrastructure, not preprocessing. The output must be traceable and actionable: prioritized themes, weighted evidence, trade-offs, and constraints that can be passed forward into full-context specs.

This is how multimodal content understanding becomes useful in the agent economy: not as summary generation, but as a reliable substrate for ambiguity resolution and downstream execution.

Business Model and Competitive Position in an Agent Economy

A core economic shift is happening from SaaS priced per human seat to systems where agents increasingly function as operational headcount. In that world, charging mainly for interface access becomes less defensible than charging for completed work. The more relevant unit is successful task throughput: how many useful synthesis, decision, and execution tasks the system actually completes with quality.

Because baseline model runtime costs continue to compress, raw model access becomes less differentiated over time. What differentiates value is outcome conversion: whether the system can reliably transform noisy, multimodal input into high-quality requirements and downstream execution artifacts. That supports a consumption model tied to completed tasks or validated outcomes rather than static seat count.
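The contrast between seat pricing and outcome pricing is simple arithmetic; a back-of-envelope sketch with made-up numbers:

```python
def seat_revenue(seats, price_per_seat):
    """Classic SaaS: revenue scales with human headcount."""
    return seats * price_per_seat

def outcome_revenue(tasks_attempted, success_rate, price_per_completed_task):
    """Consumption model: only tasks completed with acceptable quality are
    billable, so revenue scales with reliability, not headcount."""
    return tasks_attempted * success_rate * price_per_completed_task
```

The implication in the text follows directly: under outcome pricing, improving success rate is a direct revenue lever, which is exactly why outcome conversion, not model access, becomes the differentiator.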

On competition, foundation-model companies and large platforms will likely dominate broad planning layers and general agent orchestration. The strategic gap is the intent-preservation layer: converting unstructured, multimodal human signal into context that agents can execute against without drift. This is less about who has the biggest model and more about who owns the strongest translation layer from discourse to decision.

Over time, each company will accumulate its own delegation assets: constraint libraries, brand logic, quality thresholds, decision history, and reusable context specs. Those assets become compounding infrastructure for agent execution and may be defensibility points if integrated deeply into workflow and governance. In that framing, feature extraction is not a reporting feature; it is the front end of a results-oriented delegation system.

CrowdListen (Archived v2025-11-09)

Transform large-scale social conversations into actionable insights. Understand crowd sentiment, track emerging opinions, and identify key narratives.

Archived snapshot (pulled from commit a6ddeef, 2025-11-09). Historical version preserved from the earlier crowdlistening.md lineage.

CrowdListen Homepage

From Content Aggregation to Original Research (crowdlisten.com)

Crowdlistening transforms large-scale social conversations into actionable insight by integrating LLM reasoning with extensive Model Context Protocol (MCP) capabilities. While being able to quantitatively analyze large volumes of data is already an interesting task, our focus is not just on content analysis at scale, but rather conducting original research directly from raw social data, generating insights that haven’t yet appeared in established reporting.

Deep research features provide professional-looking research reports, yet the contents are far from original, as they’re drawn from articles already indexable on the internet and paraphrased with LLMs. However, much of the internet’s data exists in unstructured formats - TikTok videos, comments, and metadata, for example. Too much content is generated every day for there to be existing articles written about it all, and when such articles are published, they’re often already outdated. When you consider multimodal data, metadata, and connections between data points, these are precisely the types of information that could yield genuinely interesting and useful insights.

I’ve been thinking about this problem while working at TikTok, enabling better social listening through more fine-grained insights extracted using multi-modal/LLM-based approaches. In October, I started developing early conceptions of Crowdlistening, focusing on multi-modal content understanding for TikTok videos. Although deep research features like GPT Researcher and Stanford OVAL STORM existed, it wasn’t intuitive to integrate unstructured data processing capabilities into their workflows.

I paused Crowdlistening in Winter Quarter due to other commitments, but during this time, Anthropic released the Model Context Protocol (MCP). I’ve recently gotten back on track following progress in this field, and I believe this presents an interesting avenue for product innovation - deep research features are significantly enhanced by the growing ecosystem of MCP servers (the same agentic workflows perform much better given they rely on APIs, whose capabilities have improved over recent months).

What I’m particularly interested in exploring and building with Crowdlistening is the ability to extract actionable insights from large volumes of unstructured or semi-structured data, forming linkages, and perhaps even testing hypotheses to enable effective research at scale. We started with TikTok data as a prototype ground given my familiarity with the medium, but I could quickly see this covering any type of unstructured data available on the web.

Product Suite Overview

Product Suite

CrowdListen has evolved into a comprehensive suite of AI-powered products designed to address different aspects of social intelligence and content strategy. The Analyze product serves as our core offering, enabling users to discover what people really think about any topic through sophisticated AI-powered sentiment analysis and opinion mining capabilities. This goes beyond simple positive/negative categorization to understand nuanced perspectives, emotional context, and the underlying reasons behind audience reactions.

Our Research product focuses on real-time social media sentiment analysis and trend detection, particularly across Chinese platforms where traditional Western tools often fall short. This capability is crucial for brands and researchers who need to understand global conversations and cultural nuances that might be missed by region-specific tools.

The Predict product represents our foray into predictive analytics, allowing users to test content variations and predict audience engagement before publishing. Using AI simulation technology, teams can experiment with different messaging approaches and understand likely audience reactions without the risk and cost of live testing.

Finally, our Insights+ product caters to enterprise users and power analysts who need advanced analytics capabilities and custom reporting features. This tier provides the depth and customization necessary for organizations making strategic decisions based on social intelligence data.

The Insight Paradox

Insight Paradox

Brands today face a fundamental paradox: they need broad insights from vast amounts of social data, yet require the detailed understanding typically only available through limited case studies. Current solutions offer either abstracted metrics that require tedious manual interpretation, expensive and limited content screening that can’t scale, or surface-level sentiment analysis that misses nuanced opinions. Crowdlistening bridges this gap by combining the scale of algorithmic analysis with the depth of human-like comprehension. This addresses the first challenge identified in “Essence of Creativity” - helping users understand massive amounts of information and generate meaningful insights when they “don’t know what output they want.”

Technical Architecture: Multi-Modal by Design

The rationale behind Crowdlistening’s multi-modal technical architecture stems from the fundamental challenge of extracting truly valuable insights from the vast and varied landscape of online conversations. Traditional methods often fall short because they either focus on structured data or analyze individual modalities (text, video, audio) in isolation. This approach misses the rich context and nuanced understanding that arises from the interplay between different forms of content and engagement. For example, a viral TikTok video’s impact is not solely determined by its visual content but also by its accompanying audio, captions, user comments, and engagement metrics like likes and shares.

Analysis Page

Crowdlistening’s design directly tackles this limitation by integrating embedding-based topic modeling and LLM deep research capabilities to process and understand this multi-faceted data. Embedding-based topic modeling efficiently identifies key themes across massive datasets, while the LLM’s deep reasoning capabilities can then analyze these themes within the context of various modalities.
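The layering described above can be sketched as a two-stage pipeline: a cheap embedding/clustering pass narrows massive volume to a handful of themes, and only then does an expensive LLM pass reason over each theme. The character-trigram embedding below is a trivial stand-in for a real sentence-embedding model; the design point is one LLM call per cluster, not per item.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in embedding: character trigram counts (a real system would
    use a sentence embedding model)."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def topic_model(texts, threshold=0.25):
    """Greedy clustering on embedding similarity. Each resulting cluster is
    a theme; the downstream LLM reasons over themes, which is what keeps
    deep analysis affordable at scale."""
    clusters = []
    for t in texts:
        v = embed(t)
        for c in clusters:
            if cosine(v, c["centroid"]) >= threshold:
                c["texts"].append(t)
                break
        else:
            clusters.append({"centroid": v, "texts": [t]})
    return clusters
```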

This dual approach allows for a layered analysis, examining both the primary content and the subsequent engagement it generates. By processing video, audio, text, and engagement metrics in a unified system, Crowdlistening can generate insights that reflect not just what is being said, but how it’s being said, the surrounding context, and the audience’s multifaceted response. This comprehensive understanding is crucial for overcoming the “insight paradox” and delivering truly actionable intelligence that goes beyond surface-level sentiment or abstracted metrics. Ultimately, this multi-modal design is essential for achieving the core goal of Crowdlistening: to conduct original research directly from raw social data and uncover emerging trends and nuanced opinions that would be invisible to single-mode analysis systems.

Detailed Analysis Capabilities

The platform provides granular breakdowns of content performance and audience reactions. As shown in our analysis results page, users can explore specific themes, track sentiment over time, and identify the most engaging content types. This helps brands understand not just what is being said, but why certain content resonates with their audience.

Analysis Results

The opinion analysis feature goes beyond simple positive/negative sentiment to categorize specific viewpoints and concerns. This allows brands to understand the nuanced perspectives their audience holds, helping them craft more targeted and effective messaging.

Opinion Analysis

Advanced Research Infrastructure

Research Command Center

CrowdListen’s research infrastructure is built around a sophisticated orchestration system that coordinates multiple specialized AI engines. The Research Command Center provides users with a unified interface to launch complex analysis workflows while monitoring the progress of different analytical engines in real-time.

Our system utilizes the BettaFish Control Surface, which orchestrates various AI engines including the Insight Engine for sentiment analysis, Media Engine for multimodal content processing, Query Engine for information retrieval, and Report Engine for generating executive-ready reports. This modular architecture allows for scalable analysis that can adapt to different research requirements.
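The modular orchestration idea can be sketched as a control surface executing a declarative plan over named engines. The engine names mirror the ones described above, but the routing logic, stub behaviors, and shared-state shape are assumptions for illustration only.

```python
# Each engine reads and extends a shared request state; stubs stand in for
# the real retrieval, multimodal, analysis, and reporting components.
ENGINES = {
    "query":   lambda req: {**req, "documents": ["doc1", "doc2"]},
    "media":   lambda req: {**req, "media_features": "extracted"},
    "insight": lambda req: {**req, "sentiment": "mixed"},
    "report":  lambda req: {**req, "report": f"Report on {req['topic']}"},
}

def run_workflow(topic, plan):
    """Execute a declarative plan: each step names an engine, and every
    engine receives the accumulated state from the steps before it."""
    state = {"topic": topic}
    for engine in plan:
        state = ENGINES[engine](state)
    return state
```

Because the plan is data, not code, the system can choose different engine combinations per query, which is the "automatically determines which analytical capabilities to deploy" behavior described below.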

Research Interface

The research interface enables users to input complex queries and optionally upload analysis templates to guide the investigation. The system then automatically determines which analytical capabilities to deploy, processing everything from web search and specialized platform data collection to multi-layered content analysis and synthesis.

This integrated approach represents a significant advancement over traditional social media monitoring tools, enabling researchers to conduct comprehensive investigations that would typically require weeks of manual work in a matter of minutes while maintaining the depth and rigor of human-led research.

Case Study: Google NotebookLM Analysis

To demonstrate Crowdlistening’s capabilities in product intelligence, we conducted a comprehensive analysis of user sentiment regarding Google’s NotebookLM tool. This case study showcases our platform’s ability to extract nuanced insights about emerging AI tools and understand user adoption patterns.

NotebookLM Analysis

When analyzing user sentiment around NotebookLM, our system provided a comprehensive overview showing that customer feedback indicates NotebookLM is effective for information synthesis and content generation, particularly in educational settings. However, users express concerns about the lack of persistent chat history, word count limits, and potential biases in the auto-generated podcast feature. Approximately 56% of users have a positive sentiment, praising its summarization capabilities and educational applications, while 34% express negative sentiment due to usability issues and accuracy concerns.

Theme Analysis

Our thematic analysis reveals that Information Synthesis and Summarization is the most discussed topic, with 100 mentions representing 33.39% of all conversations. The sentiment breakdown shows overwhelmingly positive feedback for this core functionality, with users particularly appreciating the tool’s ability to synthesize information from uploaded documents and aid in quick comprehension and analysis.

The detailed sentiment analysis shows specific user opinions, including praise for NotebookLM’s effectiveness in summarizing and synthesizing information from uploaded documents, its utility for creating study guides and educational materials, and its ability to provide citations for generated information to help users verify accuracy and build trust in the tool’s output.

Source Analysis

Our analysis draws from 31 sources across 25 unique domains, indicating a moderate level of source diversity at 81%. The sources encompass various types including blogs, news outlets, and other platforms, offering a mix of perspectives. This comprehensive source analysis helps validate the reliability and breadth of our insights.
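The diversity figure above is consistent with unique domains over total sources; a one-line check under that assumption:

```python
def source_diversity(unique_domains: int, total_sources: int) -> int:
    """Source diversity as the percentage of sources that come from
    distinct domains, rounded to the nearest whole percent."""
    return round(100 * unique_domains / total_sources)
```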

Related Topics

The platform also identifies related research opportunities, suggesting additional analysis areas such as specific research or writing challenges that NotebookLM helps users overcome, how effectively it addresses information overload, the biggest frustrations users encounter, and whether it has improved research workflows. This demonstrates our system’s ability to not only analyze current sentiment but also identify strategic research directions.

Content Predictor: AI-Powered Engagement Forecasting

Content Predictor

One of our most innovative features is the Content Predictor, which allows users to test content variations and predict audience engagement before publishing. This tool represents a significant advancement in social media strategy, enabling teams to experiment with different messaging approaches without the traditional risks and costs associated with live testing.

The Content Predictor uses a sophisticated three-step workflow. Users begin by generating multiple versions of their content, allowing our AI to create variations optimized for specific platforms like Twitter, Instagram, or LinkedIn. Next, the system runs engagement simulations using AI-powered user reactions that model realistic audience behavior patterns. Finally, users can view detailed simulation results and select the most promising content variations based on predicted performance metrics.
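The three-step workflow can be sketched end to end. Both the variant generator and the reaction model here are trivial stand-ins for the LLM-backed components, and every name is illustrative rather than the product's actual API.

```python
import random

def generate_variants(draft, platforms):
    # Step 1. A real system would have an LLM rewrite the draft per
    # platform; here we just tag it.
    return [f"[{p}] {draft}" for p in platforms]

def simulate_engagement(variant, n_simulated_users=100):
    """Step 2. Toy reaction model: average the scored reactions of
    simulated users, seeded on the variant text so runs are reproducible."""
    rng = random.Random(sum(map(ord, variant)))
    return sum(rng.random() for _ in range(n_simulated_users)) / n_simulated_users

def pick_best(draft, platforms):
    """Step 3. Rank variants by predicted engagement and keep the winner."""
    variants = generate_variants(draft, platforms)
    return max(variants, key=simulate_engagement)
```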

This capability is particularly valuable for brands and content creators who need to maximize the impact of their social media presence. Rather than relying on intuition or conducting expensive A/B tests with real audiences, teams can now validate their content strategies in a controlled environment before committing to publication. The system considers factors such as platform-specific audience behaviors, trending topics, and historical engagement patterns to provide accurate predictions.

The Content Predictor exemplifies our broader mission of transforming social media from a reactive medium to a strategic tool where decisions are informed by data and predictive intelligence rather than guesswork.

Validation and Impact

Our solution has been validated through interviews with major brands such as L’Oréal, confirming that it drastically cuts the time and cost of social media analysis. Crowdlistening enables:

  • Rapid response to emerging trends
  • Deep understanding of consumer sentiment across demographics
  • Identification of microtrends before they become mainstream
  • Competitive intelligence at unprecedented scale

The Future of MCP-Driven Research

We believe Model Context Protocols represent the future of specialized LLM applications. As shown in our implementation, MCPs provide a structured way for language models to interact with specialized tools and data sources while maintaining context awareness throughout the analysis process.

This approach is likely to become standard in LLM application development given how effectively it bridges the gap between general-purpose AI and domain-specific functionality. We anticipate seeing more MCP clients (interaction surfaces like Claude’s interface) emerge as this paradigm gains traction.

For social media analysis specifically, this approach creates a fascinating dynamic where AI-driven insights can actually lead structured reporting in terms of timeliness and depth. By processing and analyzing unstructured social data at scale, we can identify emerging trends and public sentiment shifts before they’re covered in traditional reporting.

Credits

This project was developed in collaboration with Madison Bratley, whose expertise in journalism and social media analysis was instrumental in conceptualizing how this technology could transform research methodologies. Violet Liu also contributed valuable usability feedback on our early prototype. I would also like to acknowledge Zhengjin, Cathy, Roy, Ruiwan, Qiping, Tongming and other members of the Creative team at TikTok, with whom I discussed early conceptions of this idea.

On Social Intelligence

Crowdlistening represents the next evolution in social listening tools - moving beyond counting mentions to truly understanding conversations at scale. By transforming social media chatter into structured insights, we’re helping brands make more informed decisions faster than ever before.

As noted in “Essence of Creativity,” the real value in AI-powered tools comes not just from generating content, but from helping users find new perspectives and insights. Our platform serves as both an inspiration acquisition tool (accelerating original content production) and a content understanding tool (helping brands better comprehend their audience). By connecting insight data with generation capabilities, we’re creating the kind of breakthrough product that bridges the gap between understanding and action.


📋 Version History

v1.1 • Oct 25, 2025 • Updated Title


CrowdListen (Archived v2025-09-01)

Understanding collective intelligence and social dynamics. Why crowd psychology matters for product builders and how human need for belonging drives engagement patterns.

Archived snapshot (Sep 2025 era; source commit d3e4a51 dated 2025-09-03). Preserved as an in-between version between the May and Oct thesis snapshots.

In a world dominated by expert opinions and algorithm-driven content, there’s something fundamentally human about wanting to know what others think. Whether we admit it or not, we’re drawn to understand the collective mindset.

There’s wisdom in crowds. While large groups may not always converge on absolute truths (in fact, many truthful views begin as contrarian positions), they provide something equally valuable: comfort and context. Being part of a group, understanding its thoughts and values, creates a sense of safety and belonging that’s deeply wired into our social nature. Even when we disagree with mainstream opinions, understanding them helps us navigate social landscapes and provides reference points for our own thinking. This isn’t mere conformity—it’s about contextualizing our experiences within the broader human narrative.

Our information ecosystem has evolved in two problematic directions. On one side, mainstream media delivers curated “expert views” that often miss nuance. On the other, recommendation algorithms trap us in personalized echo chambers that reinforce existing beliefs.

What’s missing? The authentic, unfiltered perspective of the crowd.

Comment sections, forums, and face-to-face conversations provide windows into what people actually think—unmediated by gatekeepers or algorithms. These spaces, though sometimes chaotic, offer genuine insights that both experts and algorithms frequently miss.

This is where Crowdlistening enters the picture. Rather than filtering out the noise of crowd perspectives, Crowdlistening aims to extract meaningful patterns and insights from collective thought. It’s about amplifying voices without homogenizing them. By understanding what people collectively think—their concerns, insights, and experiences—we can build products, services, and communities that truly resonate. The crowd isn’t always right, but it’s always worth listening to.

When we learn to listen to crowds effectively, we gain access to a type of distributed intelligence that no single expert or algorithm can match. In our increasingly fragmented information landscape, this skill becomes not just valuable but essential.

WIP

CrowdListen (Archived v2025-08-29)

How CrowdListen started: from content aggregation ideas to original research workflows over unstructured social data.

Archived snapshot (v2025-08-29). Early CrowdListen thesis draft preserved for lineage.

Crowdlisten transforms large-scale social conversations into actionable insight by integrating LLM reasoning with extensive model context protocol (MCP) capabilities. While being able to quantitatively analyze large volumes of data is already an interesting task, our focus is not just on content analysis at scale, but rather conducting original research directly from raw social data, generating insights that haven’t yet appeared in established reporting.

Deep research features provide professional-looking research reports, yet the contents are far from original, as they’re drawn from articles already indexable on the internet and paraphrased with LLMs. However, much of the internet’s data exists in unstructured formats - TikTok videos, comments, and metadata, for example. Too much content is generated every day for there to be existing articles written about it all, and when such articles are published, they’re often already outdated. When you consider multimodal data, metadata, and connections between data points, these are precisely the types of information that could yield genuinely interesting and useful insights.

I’ve been thinking about this problem while working at TikTok, enabling better social listening through more fine-grained insights extracted using multi-modal/LLM-based approaches. In October, I started developing early conceptions of Crowdlisten, focusing on multi-modal content understanding for TikTok videos. Although deep research features like GPT Researcher and Stanford OVAL STORM existed, it wasn’t intuitive to integrate unstructured data processing capabilities into their workflows.

I paused Crowdlisten in Winter Quarter due to other commitments, but during this time, Anthropic released the Model Context Protocol (MCP). I’ve recently gotten back on track following progress in this field, and I believe this presents an interesting avenue for product innovation - deep research features are significantly enhanced by the growing ecosystem of MCP servers (the same agentic workflows perform much better given they rely on APIs, whose capabilities have improved over recent months).

What I’m particularly interested in exploring and building with Crowdlisten is the ability to extract actionable insights from large volumes of unstructured or semi-structured data, forming linkages, and perhaps even testing hypotheses to enable effective research at scale. We started with TikTok data as a prototype ground given my familiarity with the medium, but I could quickly see this covering any type of unstructured data available on the web.