This is the origin story behind CrowdListen. Before the current product architecture, there was a simpler but important question: can we move from generic content aggregation to genuine original research directly on raw social data?
Crowdlisten transforms large-scale social conversations into actionable insight by integrating LLM reasoning with extensive model context protocol (MCP) capabilities. While being able to quantitatively analyze large volumes of data is already an interesting task, our focus is not just on content analysis at scale, but rather conducting original research directly from raw social data, generating insights that haven’t yet appeared in established reporting.
Deep research features provide professional-looking research reports, yet the contents are far from original, as they’re drawn from articles already indexable on the internet and paraphrased with LLMs. However, much of the internet’s data exists in unstructured formats - TikTok videos, comments, and metadata, for example. Too much content is generated every day for there to be existing articles written about it all, and when such articles are published, they’re often already outdated. When you consider multimodal data, metadata, and connections between data points, these are precisely the types of information that could yield genuinely interesting and useful insights.
I’ve been thinking about this problem while working at TikTok, enabling better social listening through more fine-grained insights extracted using multi-modal/LLM-based approaches. In October, I started developing early conceptions of Crowdlisten, focusing on multi-modal content understanding for TikTok videos. Although deep research features like GPT Researcher and Stanford Oval Storm existed, it wasn’t intuitive to integrate unstructured data processing capabilities into their workflows.
I paused Crowdlisten in Winter Quarter due to other commitments, but during this time, Anthropic released the Model Context Protocol (MCP). I’ve recently gotten back on track following progress in this field, and I believe this presents an interesting avenue for product innovation - deep research features are significantly enhanced by the growing ecosystem of MCP servers (the same agentic workflows perform much better given they rely on APIs, whose capabilities have improved over recent months).
What I’m particularly interested in exploring and building with Crowdlisten is the ability to extract actionable insights from large volumes of unstructured or semi-structured data, forming linkages, and perhaps even testing hypotheses to enable effective research at scale. We started with TikTok data as a prototype ground given my familiarity with the medium, but I could quickly see this covering any type of unstructured data available on the web.