My Takeaways from Andrej Karpathy's "How I Use LLMs"
Today I watched a really interesting video by Andrej Karpathy, a computer scientist and AI researcher who was a founding member of OpenAI. The video went into a nice level of detail about the ways Andrej makes use of Large Language Models (LLMs). It gave me some insight into how LLMs work, but it mainly inspired me to think about how I can use LLMs more effectively while staying aware of their limitations.
The Video
My Takeaways
I’ve tried to summarise the main things that I have taken from the video. This is my first time learning about many of the concepts here, so I hope I may be forgiven if I haven’t quite understood something fully.
The Context Window
I understand the context window to work like the LLM's short-term memory. Our interface with an LLM is sending and receiving tokens: chunks of information that the LLM processes. The data could be text, audio, parts of an image, etc. The tokens that are sent and received are added to the context window, and each interaction builds upon it.
$$\text{context\_window} = \text{context\_window} + \text{new\_tokens}$$
When you interact with an LLM through chat, the model doesn't actually remember previous messages in the conversation. Instead, the entire conversation and your new message are sent to the LLM each time. The model "remembers" by having this contextual information packaged with the new input. I was quite surprised by this, but I think it makes sense. It also explains why LLMs might have trouble with a sudden change of topic and why it's better to start a new chat. If you don't, the input for your next query will be cluttered with the context from your previous unrelated conversation.
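This statelessness can be sketched in a few lines of Python. Here `call_llm` is a hypothetical stand-in for a real model API; the point is that the entire message history is re-sent on every turn:

```python
def call_llm(messages):
    # Pretend model: a real API call would go here. This stub just
    # reports how many messages it received, to show that the model
    # only "knows" what is packaged into each request.
    return f"(model saw {len(messages)} messages)"

# The context window is just the accumulated list of messages.
context_window = []

def send(user_text):
    # Each turn, the ENTIRE conversation plus the new message is sent;
    # the model itself is stateless between calls.
    context_window.append({"role": "user", "content": user_text})
    reply = call_llm(context_window)
    context_window.append({"role": "assistant", "content": reply})
    return reply

send("Hello!")           # the model sees 1 message
send("What did I say?")  # the model sees 3 messages: the full history
```

This also makes the "start a new chat" advice concrete: a new chat is simply an empty `context_window`, so the model isn't forced to wade through stale, unrelated tokens.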
Pre-Training
Pre-training is a complex and resource-intensive process that involves gathering vast amounts of text data and refining it into a model that can recognise and generate patterns. Rather than simply compressing knowledge, the model learns statistical relationships between words, enabling it to predict the next token in a sequence. While this can create the illusion of understanding, the model does not "remember" or "reason" in the way humans do—it recognises patterns based on probability rather than true comprehension. As a result, it excels at recalling frequently encountered information but struggles with rare or nuanced details, much like how humans forget infrequent experiences over time.
As pre-training is so expensive, it is done infrequently. This means that the model will have something called a knowledge cutoff. For example, if a model was pre-trained in November 2024, it will have no ‘knowledge’ of events or information on the internet after that point. You therefore cannot ask an LLM product (that relies solely on its model’s pre-training) to answer questions about current events.
Post-Training
The post-training phase of a large language model (LLM) refines its capabilities beyond the initial pre-training stage. It's where each LLM product gets its unique traits. This phase typically includes fine-tuning and reinforcement learning from human feedback (RLHF) to align the model’s responses with human values, improve accuracy, and reduce biases. RLHF uses human annotators to rank model responses, guiding the model toward more helpful, coherent, and ethical outputs. Post-training also involves safety measures, such as filtering harmful content and mitigating biases, making the model more reliable for real-world applications.
Unlike pre-training, which is computationally intensive and infrequent, post-training can be done more regularly to adapt the model to user needs and evolving requirements.
Reinforcement Learning - Thinking
Thinking models are further refined through reinforcement learning, allowing them to develop more sophisticated "thinking strategies" that enhance their problem-solving abilities. This process involves iterative feedback loops, where the model evaluates different reasoning paths and learns to prioritise the most effective approaches. As a result, these models demonstrate significant improvements in complex tasks such as mathematics, coding, and logical reasoning. By continuously optimising their decision-making processes, they become more adept at breaking down problems, identifying patterns, and generating precise, well-structured solutions.
Accessing Tools
One limitation of LLMs is their knowledge cutoff, meaning they lack awareness of recent events and updates beyond their training data. Additionally, they may struggle with niche or less prominent information that wasn’t widely covered during training.
A solution to this is equipping LLMs with tools that allow them to retrieve real-time information. For example, integrating an internet search tool enables the LLM to pull in the latest search results, incorporating them into its context window (working memory) to generate more accurate and relevant responses.
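As a rough sketch of that retrieval idea: the search results are simply prepended to the prompt, so they land in the context window alongside the question. `web_search` here is a hypothetical stand-in for a real search tool:

```python
def web_search(query):
    # Stand-in for a real search tool; a production system would call
    # an actual search API and return the top result snippets.
    return [f"Snippet 1 about {query}", f"Snippet 2 about {query}"]

def build_prompt(question):
    snippets = web_search(question)
    context = "\n".join(snippets)
    # The model now answers from the retrieved text sitting in its
    # context window, rather than from (possibly stale) pre-training.
    return (f"Use the following search results:\n{context}\n\n"
            f"Question: {question}")
```

The model's "knowledge" of current events is therefore only as fresh as whatever the tool fetched into the window on that turn.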
When using a large language model (LLM) for maths, it’s important to be sceptical of its answers. While LLMs can recall basic mathematical facts, they do not inherently perform calculations or verify their outputs. Instead, they generate responses based on patterns, sometimes producing numbers that seem plausible but are actually incorrect—this is known as hallucination.
To overcome this limitation, LLMs can be equipped with tools such as a Python interpreter. If the model can recognise when it should rely on such a tool and effectively use it to compute the answer, the result is far more reliable. By offloading mathematical operations to a dedicated computation engine, we get accurate and verifiable solutions to mathematical problems.
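A toy version of that offloading decision might look like the following. The routing rule and the `compute:` prefix are my own invention for illustration; a real LLM API would return a structured tool call instead. The calculator uses Python's `ast` module so it evaluates plain arithmetic without executing arbitrary code:

```python
import ast
import operator

# Map AST operator nodes to actual arithmetic functions.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow}

def safe_eval(expr):
    """Evaluate a plain arithmetic expression without running arbitrary code."""
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval").body)

def answer(question):
    # A real LLM would emit a structured tool call; here we hard-wire
    # the decision that arithmetic is offloaded rather than guessed.
    if question.startswith("compute:"):
        expr = question.removeprefix("compute:").strip()
        result = safe_eval(expr)  # exact computation, not pattern-matching
        return f"The answer is {result}."
    return "I would answer from my training data here."
```

The important part is the division of labour: the model decides *when* to use the tool, and the tool guarantees the arithmetic is actually correct.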
This, once again, feels quite human. Just as we offload challenging tasks that we deem beyond our remit to dedicated tools, so too can an LLM source answers from more robust places.
Outside of the video, I've been following the ‘buzz’ around the Model Context Protocol (MCP), which was open-sourced by Anthropic in 2024. Anthropic describes MCP as a “universal, open standard for connecting AI systems with data sources, replacing fragmented integrations with a single protocol.” Essentially, it aims to standardise how large language models (LLMs) interact with external tools, data sources, and services.
The idea behind MCP is to create a seamless framework that allows LLMs to access a variety of tools (like the internet, a Python interpreter, or other external APIs) in a consistent way. This would streamline integrations and make it easier for developers to connect LLM-based products with diverse data sources and services without having to build custom solutions for each one.
There are already MCP registries, such as Smithery, which aggregate and provide a pool of integrations for various tools. This creates an ecosystem where LLMs can easily interface with tools they otherwise might not be able to access natively.
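To illustrate the core idea (this is my own minimal sketch, not the real MCP SDK or wire protocol): every tool exposes itself through one uniform describe-and-call interface, so a client can discover and invoke any registered tool the same way, regardless of what it does underneath:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str  # shown to the model so it knows when to use the tool
    handler: Callable[[str], str]

# A registry plays the role of an MCP-style server's tool listing.
REGISTRY: dict[str, Tool] = {}

def register(tool: Tool):
    REGISTRY[tool.name] = tool

def list_tools():
    # The client advertises available tools to the model in one format.
    return [(t.name, t.description) for t in REGISTRY.values()]

def call_tool(name: str, argument: str) -> str:
    # One calling convention, whatever the tool does internally.
    return REGISTRY[name].handler(argument)

register(Tool("echo", "Repeat the input back", lambda s: s))
```

The appeal of a standard like MCP is exactly this shape: developers implement the interface once per tool, and any compliant LLM client can then use it without bespoke glue code.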
Vibe Coding with Cursor
This was my first exposure to the Cursor text editor. It looks like a fork of VS Code with a bounty of features that allow you to make effective use of AI in programming.
I was most interested in Andrej's example of developing a tic-tac-toe game using ‘vibe coding’. No code was written by Andrej himself; control was handed entirely (in this case) to the editor and the AI. I will admit, it was quite scary to see the impressive result: a fully functioning browser-based game with neat animations and sounds.
This is definitely something that I would like to explore more. The only thing I am uncertain of is the cost of one of these programming sessions; I am yet to find an example that gives an impression of this, though I imagine it would not be extortionate.
Final Thoughts
The video has given me a great sense of the various ways I could utilise LLMs in my day-to-day life, and has enhanced my overall understanding of how these models work. Going forward, I’d like to explore more with the Cursor text editor and give vibe coding a try, provided the cost isn’t prohibitively expensive for simple exploration.