Year in Review

2024 seemed to pass very quickly, in part because I had interesting work to occupy my time. Early in the year, after expanding Cogno and gaining some traction, work slowly stalled: though we did code and talk to customers, the cycles were few and far between. The focus on multi-agent systems was (as of writing this) a step in the right direction, yet we did not find a niche in which to tackle sales conversion improvement.

One crucial lesson I learned as a first-time founder was the importance of being hands-on with coding the product. Having oversight of product development isn’t sufficient – you need to be directly involved in building to achieve faster iteration cycles and truly understand the risks in shipping. This insight came later than I would have liked but proved invaluable.

While I had actually considered taking a leave of absence to work on “the next big thing” coming into college, this year I decided that dropping out was not what I wanted at this stage of life – I still desired the complete college experience with its structure, humanities knowledge, friendships, and Chicago’s unique energy.

In March, after more than a year of working on Cogno, I accepted an opportunity at TikTok to build agent systems for Creative Copilots and Insights. I’m immensely thankful to Zhengjin and Caoye for the invitation. Though cutting short my long-anticipated vacation in Puerto Rico was not what I had planned for, the learning experience proved far more valuable – not to mention attending Nvidia GTC and Google Next in Vegas.

Working at TikTok offered a fascinating blend of familiar and novel experiences. While similar to ByteDance in some respects, collaborating with colleagues from multiple countries and being immersed in Silicon Valley provided distinct advantages. The work on insight extraction from multimodal data opened new perspectives on how LLMs interact with different content types. Whether probabilistic models like LLMs can truly be creative remains open for debate, yet their outputs can undeniably surprise us.

My time at TikTok allowed me to witness firsthand how the field of generative AI is evolving across modalities – text, image, video, and audio – revealing new possibilities for product innovation as these technologies mature. I’ve come to believe that the future of generative AI lies in combining reasoning with synthesis (perhaps with greater emphasis on the latter) and multimodal output.

Later in the year, going to Beijing and returning to school felt like a blur. Yet I found myself enjoying academics more than anticipated this quarter, partly due to great teammates and LLM-focused project-based classes. The academic environment offered a focused intensity often diluted in industry settings.

This recognition led me to Crowdlistening, which addresses the abundance of unstructured, multimodal content across social media platforms. By extracting meaningful insights from these underanalyzed sources, we can deliver value to users seeking to understand these complex information streams. I’m now helping build AI features for a stealth startup, so I’m less sure how Crowdlistening will evolve, but I still view it as an interesting product.

I’m still figuring things out, but 2024 has undoubtedly been a year of exploration and growth – one for which I’m deeply grateful. I look forward to seeing how these parallel paths evolve in the coming year.

Courage to be last

Reflecting on 2024 (and 2023 for the context of Cogno), among my list of failed projects, very few failed due to lack of innovation. Since I began working with LLMs in fall 2022, there has been an abundance of interesting GenAI technologies to experiment with. It started with “domain-specific prompting/finetuning” and data flywheels (though even now, no one really knows what these look like in action). By spring 2023, the focus shifted to LLMs as agents, exemplified by the Generative Agents paper, Microsoft AutoGen, and a few open-source projects like MetaGPT. At Cogno, we also built multi-agent systems, integrating various function-calling features and agent collaboration for complex task reasoning. Everyone built; few created value (Glean focused on enterprise search, while Moveworks created value through API actions, and in neither case do I believe agents mattered). Founders encouraged each other’s enthusiasm, while investors rushed to learn the latest buzzwords in LLM technology (‘prompt engineering’ and ‘function calls’ sounded less sexy than ‘agents’).

Being first to market rarely matters – people won’t remember you. What matters is creating defensible moats or developing critical elements that lead to unfair advantages. While Google’s technology investment in Android can be considered ‘not just building a moat, but scorching the earth for 250 miles around the castle,’ most companies’ self-described technological differentiation is merely self-flattery and a feeble attempt to impress tech-enthusiast investors. Technology truly matters only when it can solve seemingly insurmountable challenges or optimize costs and operations. In every other situation, the focus should be on building sustainable advantages that ensure long-term survival. [Thoughts WIP]