Ecosystem and Trends of LLMs
Foundation
It occurred to me one day that today's Large Language Models (LLMs) have a stack strangely similar to those of their predecessors, PC operating systems and cloud systems. Drawn out, the structure looks like the diagram below:
In fact, the LLM is the operating system of the future. In this paradigm:
- The GPU is the computing engine of the LLM; this one is easy to understand.
- Prompting is the programming language of LLMs: to get an LLM to do useful work, you have to interact with it effectively through prompts, and the quality of your code (prompt) determines the quality of your output. ChatGPT Plugins are another good example: they are apps running on the LLM, and in essence they are prepackaged prompts that perform specific tasks (see the first sketch after this list).
- The context window and vector DB are the working memory of the LLM OS. For any LLM-based app to be useful, it needs up-to-date context to provide relevant information or services to users. The problem with this memory architecture is that it is tiny compared to that of real computers: at roughly 4 bytes per token, GPT-4's 8k-token context is equivalent to about 32KB of memory. Incredibly small. If we want the LLM to have 16GB of memory, we need a context length of about 4 billion tokens (see the retrieval sketch after this list).
- Fine-tuning is the disk storage of the LLM, because what the model learns becomes part of its persistent state.
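
Here is a minimal sketch of the "prompting is the programming language" idea: a prepackaged prompt acting like a small program, the way a plugin does. The template, its fields, and the helper name are hypothetical, purely for illustration.

```python
# A prepackaged prompt treated as a reusable "program" (hypothetical example).
SUMMARIZE_PLUGIN = """You are a meeting-notes assistant.
Summarize the following transcript in {num_bullets} bullet points,
then list action items with owners.

Transcript:
{transcript}
"""

def build_summarize_prompt(transcript: str, num_bullets: int = 5) -> str:
    """'Compile' the plugin: fill the template with user inputs.
    The returned string is what would be sent to the LLM."""
    return SUMMARIZE_PLUGIN.format(num_bullets=num_bullets, transcript=transcript)

if __name__ == "__main__":
    print(build_summarize_prompt("Alice: let's ship Friday. Bob: I'll write the docs."))
```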
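
And a back-of-the-envelope sketch of the working-memory analogy, assuming roughly 4 bytes per token (an assumed conversion factor, not an exact figure), plus a toy in-memory vector store standing in for a real vector DB:

```python
from math import sqrt

BYTES_PER_TOKEN = 4  # assumed conversion factor for the analogy

def context_as_bytes(tokens: int) -> int:
    return tokens * BYTES_PER_TOKEN

print(context_as_bytes(8_000))           # 32,000 bytes ~= 32KB for an 8k context
print(16 * 2**30 // BYTES_PER_TOKEN)     # ~4.3 billion tokens needed for 16GB

# Toy "vector DB": retrieve only the most relevant snippets and pack those
# into the small context window, instead of the whole knowledge base.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

store = [  # (embedding, text) pairs; embeddings are made up for illustration
    ([0.9, 0.1], "Refund policy: 30 days with receipt."),
    ([0.1, 0.9], "Office hours: Mon-Fri, 9am-5pm."),
]

def retrieve(query_embedding, k=1):
    ranked = sorted(store, key=lambda entry: -cosine(entry[0], query_embedding))
    return [text for _, text in ranked[:k]]

context = retrieve([0.8, 0.2])  # would be prepended to the prompt as working memory
print(context)
```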
Other Thoughts and Insights:
- Tooling explosion. Tools like PromptPerfect, Scale Spellbook, and LangChain are the modern-day IDEs, like IntelliJ and Visual Studio Code. But they are not enough; we still need more tools for 1/ prompt evaluation and 2/ fine-tuning evaluation across different use cases (a minimal sketch of such an evaluation harness follows this list).
- Distillation and self-hosting vs. leveraging large commercial models. The two extremes of the LLM world are 1/ every startup hosting its own stripped-down version of an open-source model and fine-tuning it, or 2/ every startup using a large commercial model (like GPT-4) and simply paying the token usage fee. Both carry immense implications for the future.
- Every LLM is slightly different [1] because they are trained differently, and each will require different prompting methods to be most effective. This means the tooling explosion (point #1 above) is going to be even bigger and more important.
- Growth of context length. LLM context lengths have been growing from 8k to 32k to 100k (Anthropic's Claude) and now 5M tokens (magic.dev). They will continue to grow, perhaps one day reaching 4 billion tokens.
- Improvements in memory size (context length) and disk storage (fine-tuning) will continue together as the marginal cost shifts between them.
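
As a minimal sketch of the prompt-evaluation tooling argued for above: run several candidate prompt templates against a small labeled test set and score them. The model call is stubbed out; in practice it would hit a real LLM API, and the test cases and scoring rule here are hypothetical placeholders.

```python
from typing import Callable

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, used only so the sketch runs."""
    return "positive" if "love" in prompt else "negative"

def evaluate_prompt(template: str, cases: list[tuple[str, str]],
                    llm: Callable[[str], str] = fake_llm) -> float:
    """Run each test input through the prompt template and report accuracy."""
    hits = 0
    for text, expected in cases:
        output = llm(template.format(text=text))
        hits += int(expected.lower() in output.lower())
    return hits / len(cases)

cases = [("I love this product", "positive"), ("This is terrible", "negative")]
candidates = [
    "Classify the sentiment of: {text}. Answer positive or negative.",
    "Is the following review positive or negative? {text}",
]
for template in candidates:
    print(f"{evaluate_prompt(template, cases):.2f}  {template}")
```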
In Closing
One day, we will each have an LLM fine-tuned to ourselves, just like a personal laptop, and all of our interactions with it will be stored in its long-term storage, because personalization will bring far more value to users.
[1] https://twitter.com/_jasonwei/status/1661781745015066624?s=20