This is the 16th year we’ve been teaching the Stanford Lean LaunchPad class. This year, from the first hour of the first class, we realized we were seeing something extraordinary happen. It was both the end of one era and the beginning of a new one.
Teams showed up to the first day of class with MVPs (Minimum Viable Products) that looked like finished products, the kind previous classes had taken weeks or months to build. After the class, as the instructors sat processing what had just happened, we realized there’s no going back.
I’ve been writing about how AI is going to change startups, but the shock of seeing 8 teams actually implementing it was mind-blowing. And not a single team thought they were doing anything extraordinary.
I’ve been getting more and more curious about the risk from Anthropic’s Claude Mythos Preview. So I pulled the system card, a whoppingly inefficient 244-page document that devotes just seven pages to the claim that the model is too dangerous to release. In fact, the 23MB of PDF I had to download was 20MB of wasted time and space. Compressing the PDF to 3MB meant I lost exactly nothing.
Foreshadowing, I guess.
Spoiler alert: the crucial seven pages out of 244 do not contain the word “fuzzer” once. That’s like a seven-page vacation brochure for Hawaii that leaves out the word “beaches.”
Also, the crucial seven pages out of 244 do not contain the expected acronyms CVSS, CWE or CVE, and they do not have a comparison baseline, an independent reproduction, or the word “thousands.” I’ll get back to all of that in a minute.
The flagship demonstration turns out to be like the ending of The Wizard of Oz: a sorry disappointment about a model weaponizing two bugs that a different model found, in software the vendor had already patched, in a test environment with the browser sandbox and defense-in-depth mitigations stripped out. Anthropic failed, and somehow the story was flipped into a warning about its success.
The landscape of AI is not merely filled with news. It is filled with teams. You have the doomers, the accelerationists, the skeptics, the it’s-a-bubble oracles, the anti-bubble counter-oracles, and so on. It would be convenient for my sanity (and, perhaps, the sanity of my readers) if I simply joined one team and never took off the jersey. But I don’t think any of the aforementioned tribes has a monopoly on good arguments. I think the doomers are right about the risks of the technology, the accelerationists are right about its promise, and the skeptics are right that both the doomers and the accelerationists can overstate their cases.
My first session with Claude Code was practically magical. I was speaking to my computer, telling it in natural language what I wanted it to do, and it was able to just do it. It did (and still does) feel like a completely new form of input, a new way to control my machine. I have misgivings about using AI in this way, but I still think this is a great tool for sufficiently low-level tasks. I’m waiting eagerly for the day that I can spin up a local LLM that can perform this function as well as Claude Code does.
I'm as anti-genAI as it gets. And yet, this past month, I have used generative coding to complete a project. It works. I hated making it.
Retrieval-Augmented Generation (RAG) has become the dominant paradigm for grounding Large Language Model (LLM) agents in domain-specific knowledge. The standard approach requires selecting an embedding model, designing a chunking strategy, deploying a vector database, maintaining indexes, and performing approximate nearest neighbor (ANN) search at query time. We argue that for domain-specific knowledge grounding, where the vocabulary is predictable and the corpus is bounded, this entire stack is unnecessary. We present Knowledge Search, a two-layer retrieval system composed of (1) grep with contextual line windows and (2) cat of pre-structured fallback files. Deployed in production across 20 specialized LLM agents serving three knowledge domains (Traditional Chinese Medicine, Christian spiritual classics, and U.S. civics), our approach achieves 100% retrieval accuracy with sub-10ms latency, zero preprocessing, zero additional memory footprint, and zero infrastructure dependencies.
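The two-layer idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Knowledge Search implementation: the corpus is assumed to be a directory of plain-text `*.txt` files, the grep step is emulated in pure Python (case-insensitive, literal match, overlapping windows merged as `grep -C` does) instead of shelling out to grep, and the names `grep_with_context`, `knowledge_search` and `fallback_path` are hypothetical.

```python
import re
from pathlib import Path


def grep_with_context(path: Path, pattern: str, context: int = 2) -> list[str]:
    """Layer 1 (hypothetical helper): emulate `grep -i -C <context>` on one file."""
    lines = path.read_text(encoding="utf-8").splitlines()
    regex = re.compile(pattern, re.IGNORECASE)
    windows: list[list[str]] = []
    last_end = -1
    for i, line in enumerate(lines):
        if regex.search(line):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            if windows and start <= last_end:
                # Overlapping or adjacent window: extend the previous one
                windows[-1].extend(lines[last_end:end])
            else:
                windows.append(lines[start:end])
            last_end = end
    return ["\n".join(w) for w in windows]


def knowledge_search(corpus_dir: str, query: str, fallback_path: str,
                     context: int = 2) -> str:
    """Return grep-style context windows, or the fallback file if nothing matches."""
    hits = []
    for path in sorted(Path(corpus_dir).glob("*.txt")):
        # re.escape: treat the query as a literal string (an assumption; grep
        # itself would interpret it as a regular expression)
        for window in grep_with_context(path, re.escape(query), context):
            hits.append(f"--- {path.name}\n{window}")
    if hits:
        return "\n".join(hits)
    # Layer 2: no match, so "cat" a pre-structured fallback file
    return Path(fallback_path).read_text(encoding="utf-8")


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        Path(d, "civics.txt").write_text(
            "Article I\nThe Senate is composed of two senators per state.\nArticle II\n",
            encoding="utf-8",
        )
        fallback = Path(d, "overview.md")
        fallback.write_text("# Overview\nGeneral notes.\n", encoding="utf-8")
        print(knowledge_search(d, "senate", str(fallback), context=1))
```

Keeping the fallback file outside the `*.txt` glob means it is only ever read whole, never grepped, which mirrors the separation between the two layers.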
Scientists and educators are concerned about students using artificial intelligence to shortcut their learning. But there are also opportunities, especially when it comes to teaching neuroscience students how to code.
I do not think it will shock anyone to learn that big tech is aggressively pushing AI products. But the extent to which they have done so might. The sheer ubiquity of AI means that we take for granted the countless ways, many of them invisible, that these products and features are foisted on us, and how Silicon Valley companies have systematically designed and deployed AI products onto their existing platforms in an effort to accelerate adoption.
The role of the IC (Individual Contributor) is evolving fast, and AI is accelerating the shift. As AI tools become deeply integrated into development workflows, many engineers find themselves stepping into responsibilities once reserved for engineering managers. This isn’t a hypothetical trend: it’s already happening in high-performing teams.
With the advent of Llama 2, running strong LLMs locally has increasingly become a reality. Its accuracy approaches that of OpenAI's GPT-3.5, which is good enough for many use cases.
In this article, we will explore how we can use Llama 2 for topic modeling without passing every single document to the model. Instead, we are going to leverage BERTopic, a modular topic modeling technique that can use any LLM to fine-tune topic representations.
An LLM is not a black box but a machine learning model (based on neural networks) that predicts the next token given the input prompt and the tokens it has already generated.
How does it capture the context of the input? Multi-head attention lets the model focus on the important words in the input sentence relative to the other tokens. If you’re interested in the mathematics, you can read the blog below.
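To make “focusing on important words” concrete, here is a minimal pure-Python sketch of scaled dot-product attention, softmax(QKᵀ/√d_k)·V, which is the operation each head computes. It is illustrative only: a real transformer gives every head its own learned projection matrices for Q, K and V, whereas this sketch (as a loudly labeled simplification) simply slices the embedding into per-head chunks and uses them directly. The function names and toy vectors are hypothetical.

```python
import math

Vector = list[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: list[Vector], keys: list[Vector],
              values: list[Vector]) -> list[Vector]:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


def multi_head_attention(x: list[Vector], num_heads: int) -> list[Vector]:
    """Self-attention with num_heads heads.

    Simplification (assumption): each head uses a contiguous slice of the
    embedding as Q, K and V instead of learned projection matrices.
    """
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("embedding size must be divisible by num_heads")
    d_head = d_model // num_heads
    per_head = []
    for h in range(num_heads):
        chunk = [row[h * d_head:(h + 1) * d_head] for row in x]
        per_head.append(attention(chunk, chunk, chunk))
    # Concatenate the heads' outputs token by token
    return [[val for h in range(num_heads) for val in per_head[h][t]]
            for t in range(len(x))]


if __name__ == "__main__":
    tokens = [[1.0, 0.0, 0.5, 0.2], [0.0, 1.0, 0.1, 0.9], [1.0, 1.0, 0.0, 0.0]]
    for row in multi_head_attention(tokens, num_heads=2):
        print([round(v, 3) for v in row])
```

Splitting the embedding across heads lets each head attend to a different subspace of the input, which is the intuition behind using several heads rather than one.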
In recent years, large-scale transformer-based language models have become the pinnacle of neural networks used in NLP tasks. They grow in scale and complexity every month, but training such models requires millions of dollars, the best experts, and years of development. That’s why only major IT companies have access to this state-of-the-art technology. However, researchers and developers all over the world need access to these solutions. Without new research, the growth of these models could wane. The only way to avoid this is by sharing best practices with the developer community.
We’ve been using the YaLM family of language models in our Alice voice assistant and Yandex Search for more than a year now.
GPT-3, or Generative Pre-trained Transformer 3, is an AI model from OpenAI that takes text from the user and writes a lot more for them.
And freaking heck, am I impressed at what folks have managed to build around the GPT-3 technology.
