Part 1
Think of an app that’s like a parrot. But this isn’t any ordinary parrot. It doesn’t just copy what it hears. Instead, it can come up with a wide range of clear and well-structured responses. And instead of eating bird food, it’s been ‘fed’ enormous amounts of text from across the internet. This is what Large Language Models (LLMs) like ChatGPT or GPT-4 are about.

These LLMs work using two main parts: a parameters file and a run file. Think of the parameters file as the parrot’s memory of phrases it’s learned. It holds the ‘weights’ of the neural network, which are like the strengths of the connections in the parrot’s brain. These weights are learned during training on a vast corpus of internet text, and they are the core of what the model knows: the digital food our parrot has eaten, sourced from across the internet.
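If you’re curious what a parameters file actually contains, here is a minimal sketch, assuming PyTorch: it is nothing more than a collection of named arrays of numbers. The file name “parameters.pt” is hypothetical, and real checkpoints are often split across many files, but the idea is the same.

```python
# A minimal sketch (assuming PyTorch) of what a parameters file holds:
# nothing but named blocks of numbers, the learned "connection strengths".
# The file name "parameters.pt" is hypothetical.
import torch

weights = torch.load("parameters.pt", map_location="cpu")  # dict: name -> tensor

for name, tensor in weights.items():
    # Each entry is just a large array of floating-point numbers
    # learned during training: the parrot's "memory".
    print(name, tuple(tensor.shape))
```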
The run file is akin to the parrot’s voice box. It takes the ‘weights’ or learned phrases from the parameters file and uses them to form sentences. This file sets up the structure of the neural network, which is like the parrot’s brain. It then performs a forward pass of the network, which is a process of turning inputs (like prompts or questions) into outputs (the parrot’s responses). In essence, it’s the part of the system that allows our digital parrot to ‘speak’.
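Here is a toy sketch of what a run file does, again assuming PyTorch. The two-layer network, the sizes, and the file path are stand-ins for the real thing, but the shape of the job is the same: build the network, load the weights, and run a forward pass from prompt to output scores.

```python
# A toy sketch of a "run file": define the network structure (the parrot's
# brain), load the learned weights, and do a forward pass from input tokens
# to output scores. Sizes, ids, and the file path are illustrative only.
import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 64  # toy sizes, far smaller than a real LLM

model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),  # token ids -> vectors
    nn.Linear(embed_dim, vocab_size),     # vectors -> a score for every possible next token
)
# model.load_state_dict(torch.load("parameters.pt"))  # load weights from the (hypothetical) parameters file

prompt = torch.tensor([[3, 17, 42]])  # a prompt, already turned into token ids
logits = model(prompt)                # the forward pass: inputs -> output scores
print(logits.shape)                   # torch.Size([1, 3, 1000])
```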
Training these digital parrots is no small feat. It’s akin to teaching a real parrot to understand and mimic human language, but on a much larger scale. The process runs on a GPU cluster: many machines packed with graphics processors working in parallel to churn through a huge slice of the internet. It’s a time-consuming and costly process, but the result is a model capable of generating articulate and coherent responses to a wide array of prompts.
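To make that concrete, here is a toy sketch of the training loop at the heart of that process, assuming PyTorch. The random “text”, the tiny model, and the hundred steps stand in for trillions of real tokens and weeks of compute on a GPU cluster.

```python
# A toy sketch of the training loop: show the model text, ask it to guess
# each next token, and nudge the weights when it's wrong. Everything here
# is scaled down to run on a laptop in seconds.
import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 64
model = nn.Sequential(nn.Embedding(vocab_size, embed_dim),
                      nn.Linear(embed_dim, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (1, 32))   # stand-in for real internet text
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # the target is simply the *next* token

for step in range(100):                          # real training runs for weeks, not 100 steps
    logits = model(inputs)                       # a score for every possible next token
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()                              # nudge the connection strengths (weights)
    optimizer.step()
```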
The magic of these LLMs lies in their ability to predict the next word in a sequence. This seemingly simple task is actually a powerful learning mechanism. To guess the next word well, the model has to pick up on the context of the sentence, the subject of the conversation, the tone of the writing, and much more. It’s not just predicting a word; it’s compressing an enormous amount of knowledge about language and the world it describes.
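As a toy illustration of next-word prediction, here is a sketch with a made-up five-word vocabulary and made-up model scores; a real model does the same thing over a vocabulary of tens of thousands of tokens.

```python
# A toy sketch of next-word prediction: the model's output scores are turned
# into probabilities, and the most likely word becomes the prediction.
# The vocabulary and scores below are invented for illustration.
import torch
import torch.nn.functional as F

# Pretend the model has read "the parrot sat on the ..." and produced
# one score per word in a tiny vocabulary.
vocab  = ["cat", "perch", "moon", "sofa", "cloud"]
logits = torch.tensor([0.4, 2.1, -1.0, 0.7, -0.5])  # made-up model output

probs = F.softmax(logits, dim=-1)                   # scores -> probabilities that sum to 1
for word, p in zip(vocab, probs.tolist()):
    print(f"{word:>6}: {p:.2f}")
print("prediction:", vocab[int(torch.argmax(probs))])  # "perch"
```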
In essence, LLMs are like digital parrots that have read the entire internet. They don’t understand the world in the way humans do, but they can mimic human-like text based on the patterns they’ve learned. This makes them powerful tools for a wide range of tasks, from writing articles to answering questions, and much more.

About Sharad Jain
Sharad Jain is an AI Engineer and Data Scientist specializing in enterprise-scale generative AI and NLP. Currently leading AI initiatives at Autoscreen.ai, he has developed ACRUE frameworks and optimized LLM performance at scale. Previously at Meta, Autodesk, and WithJoy.com, he brings extensive experience in machine learning, data analytics, and building scalable AI systems. He holds an MS in Business Analytics from UC Davis.