How to Train Your Model

February 5, 2025


One of my favorite films growing up was “How to Train your Dragon”, so when they announced the live action feature, I was pretty excited to re-watch the original. Curiously, I started drawing some parallels from this fantastic universe to our own reality—especially in the world of business, where AI is increasingly playing a pivotal role.


Here, AI models are dragons: incredibly useful and capable of revolutionizing products and services in undeniably positive ways—but they remain a vague and terrifying entity to most people.


Like in the movie, the biggest reason people fear AI is because they don’t fully understand it. Yet as a business owner or founder, you don’t necessarily need to master every technical detail; you just need a clear idea of what AI can do for you and how to “steer” it in a beneficial way. So today, I’m donning my viking helmet and diving deep to break down the process of creating an AI model from scratch into a simple, easy-to-read guide! 


This is “How to Train your Model.”


Before machine learning engineers even begin writing code, they need to decide what kind of problem they want their model to solve. Do they want an AI with computer vision that can identify or generate images? Do they want a system capable of listening to and understanding audio clips? Maybe they want an LLM (or large language model) that can read and generate text? If you’re a founder hoping to create better customer support or new content-generation tools, that might be your focus. Regardless, one of the first steps is deciding whether it’s a classification (categorizing things) or regression (predicting numbers) problem.


For simplicity, we’ll go with a text model, as it’s the most widely used form of AI today. LLMs are next-word prediction engines: the AI writes a sentence word by word by predicting which word makes the most sense to come next in the sequence. You can think of it like a child learning to finish your sentences: if you say, “The sky is ___,” the model guesses “blue” (with high probability) or “purple” (with lower probability).
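
If you want to see that guessing game in action, here’s a minimal sketch using the open-source Hugging Face transformers library and the small GPT-2 model (my choice purely for illustration). It prints the model’s top guesses for the next word:

```python
# A minimal sketch of next-word prediction, assuming the Hugging Face
# "transformers" library and the small open GPT-2 model
# (pip install torch transformers).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The sky is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # a score for every word in the vocabulary

# Turn the scores at the last position into probabilities and show the top 5 guesses
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {p.item():.1%}")
```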


But how does the model know that “blue” is a better guess for the color of the sky than “purple”? Well, it has to “read” about it! You need to provide the model with a library of sentences it can treat as examples of correctly sequenced words. This is your “training data.” As the model reads it, it adjusts billions of internal settings, called “parameters,” which store what it has learned. This is where you can start to customize things to your business! If you want to build an LLM with some kind of specialty (like fiction writing for a storytelling product), you could fill the training data primarily with samples of fiction. However, the general rule is that the more data you provide, the better the predictions will be. That’s why OpenAI trains its chat models on a huge slice of the public internet.


Now your model is ready to graduate from reading to writing. But imagine if you were an LLM trying to predict/write the next word in a sentence, and you had to weigh every candidate word against the patterns in billions of words of training data. It would probably take a human around 100,000 years to sort through and calculate which word is the best choice. AI can do these calculations in under a second thanks to the Transformer, an architecture that enables “parallel processing.”


Before Transformers arrived on the scene in 2017, many AI models read text one word at a time—like scanning a page from left to right. Transformers changed the game by letting models look at all words simultaneously, supercharging their ability to understand context as they are “reading” in real time! This is how it works:


Tokenization
Think of this as chopping up sentences into bite-sized pieces (tokens). For instance, “ChatGPT” might be split into [“Chat”, “G”, “PT”]. Kind of like we give children vocabulary lists before they start learning to read strings of words together as sentences.
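
As a quick illustration, here’s what that chopping looks like with GPT-2’s tokenizer from the Hugging Face transformers library (the exact pieces vary from model to model):

```python
# A small sketch of tokenization using GPT-2's tokenizer as an example;
# other models split text differently.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
print(tokenizer.tokenize("ChatGPT predicts the next word"))
# Something like: ['Chat', 'G', 'PT', 'Ġpredicts', 'Ġthe', 'Ġnext', 'Ġword']
# (the 'Ġ' marks a leading space in GPT-2's vocabulary)
```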

Embeddings
Each token is then turned into a list of numbers (a vector). This is basically like translating English into “computer language.” For example, the token “blue” might have a numerical fingerprint like [0.23, -1.02, 0.77…]. By reading these numbers, the AI can understand that blue is a color and not something else, like a feeling or object.
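
Here’s a toy sketch of that translation step using PyTorch; the sizes and numbers are made up for illustration, since a real model learns its own values during training:

```python
# A toy embedding lookup with PyTorch. The vector values start out random here;
# a real model learns them during training.
import torch
import torch.nn as nn

vocab_size, embedding_dim = 50_000, 8        # toy sizes; real models use hundreds of dimensions
embedding = nn.Embedding(vocab_size, embedding_dim)

token_id = torch.tensor([4242])              # pretend this is the ID for the token "blue"
print(embedding(token_id))                   # -> a vector like [0.23, -1.02, 0.77, ...]
```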

Attention Mechanism
This is the real wizardry that makes Transformers so powerful. Instead of focusing on one word at a time, the model “attends” to all tokens at once. If the sentence is “I went to the bank to deposit money,” the model can use the surrounding words to figure out that “bank” means a financial institution, not a riverbank.
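
Under the hood, attention boils down to a few matrix multiplications. Here’s a bare-bones sketch of scaled dot-product attention in PyTorch, with toy sizes and random weights chosen purely for illustration:

```python
# A bare-bones sketch of scaled dot-product attention (toy sizes, random weights).
import torch
import torch.nn.functional as F

seq_len, d_model = 8, 16                      # 8 tokens, each a 16-number vector
x = torch.randn(seq_len, d_model)             # embeddings for "I went to the bank to deposit money"

W_q, W_k, W_v = (torch.randn(d_model, d_model) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v           # queries, keys, values for every token at once

scores = Q @ K.T / d_model ** 0.5             # how strongly each token relates to every other
weights = F.softmax(scores, dim=-1)           # each row becomes attention weights that sum to 1
output = weights @ V                          # each token's new, context-aware representation
print(weights[4])                             # e.g. how the token "bank" weighs its neighbors
```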

Back-Propagation
Finally, back-propagation is like a model going through a “study session”: once it sees how far off its initial predictions were, it goes back through these layers and adjusts their parameters to reduce future errors. This continuous loop of trial and error gradually teaches the AI to make better predictions.
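
To make the “study session” concrete, here’s a toy training loop in PyTorch with a single adjustable parameter; real models run the same guess-measure-adjust cycle over billions of parameters:

```python
# A toy back-propagation loop: guess, measure the error, adjust, repeat.
import torch

w = torch.tensor(0.0, requires_grad=True)     # one adjustable parameter
optimizer = torch.optim.SGD([w], lr=0.1)
target = 3.0                                  # the "right answer" hiding in the training data

for step in range(100):
    prediction = w                            # the model's current guess
    loss = (prediction - target) ** 2         # how far off the guess was
    optimizer.zero_grad()
    loss.backward()                           # back-propagation: trace the error backwards
    optimizer.step()                          # nudge the parameter to reduce future error

print(w.item())                               # ends up very close to 3.0
```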


But how is it even physically possible for so many calculations to be done at once? Well, transformers are run on many different GPUs (processing chips) all working in parallel. How fast your model performs calculations depends on how many GPUs you are running: this is called your computational scale.


Okay, let’s move on to fine-tuning your model. This is the step that most developers—and many businesses—start with. The most common practice is to pay for access to a pre-trained model from a company like OpenAI, and then add your own customized fine-tuning to make it suit your products or services. But there’s still some general fine-tuning needed on a base-level AI dragon—er, pre-trained LLM—to make it useful for humans. Yes, it can generate text, but can it handle sensitive topics without breathing fire or giving clueless answers? Enter Fine-Tuning.
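
As a rough sketch of what that customization step can look like, here’s a supervised fine-tuning run using the Hugging Face transformers and datasets libraries. The file name my_support_chats.txt is a placeholder for whatever text your business actually has, and GPT-2 stands in for whichever pre-trained model you license or download:

```python
# A hedged sketch of fine-tuning a pre-trained model on your own text.
# Assumes: pip install torch transformers datasets, plus a plain-text file
# "my_support_chats.txt" (a placeholder name) containing your domain data.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

dataset = load_dataset("text", data_files={"train": "my_support_chats.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="my_fine_tuned_model",
                           num_train_epochs=1, per_device_train_batch_size=2),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```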


Reinforcement Learning from Human Feedback (RLHF) is basically what it sounds like: humans read the AI’s answers and give feedback on whether each response was good or bad.


Each time your AI spits out an answer, human testers rate it: “Is this helpful? Is it rude? Is it even correct?” The model then adjusts its parameters (its “attitude,” if you will) to do better next time. Over multiple rounds, you end up with an AI that not only answers questions properly but also avoids destructive behaviors and nonsensical statements.
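
Full RLHF involves training a separate reward model and a reinforcement-learning step, which is beyond this guide, but the core feedback loop can be sketched very simply: collect human ratings, keep the responses people liked, and fine-tune on those. The data below is hypothetical:

```python
# A deliberately simplified stand-in for the RLHF feedback loop: filter
# responses by human rating, then reuse them as fine-tuning data.
# (Real RLHF trains a reward model and optimizes against it with RL.)
rated_examples = [
    {"prompt": "How do I reset my password?",
     "response": "Click 'Forgot password' on the login page and follow the email link.",
     "rating": 5},
    {"prompt": "How do I reset my password?",
     "response": "Figure it out yourself.",
     "rating": 1},
]

approved = [ex for ex in rated_examples if ex["rating"] >= 4]     # human feedback as a filter
training_text = [ex["prompt"] + "\n" + ex["response"] for ex in approved]
print(training_text)   # feed these back into a fine-tuning run like the one sketched earlier
```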


Now that your AI dragon is well-trained, how do you set it free into the world so your customers or team can start using it? How do you deploy an AI online so that it is accessible and beneficial?


You could use “Cloud Deployment.” Platforms like AWS, Google Cloud, or Azure provide the giant “arena” for your LLM to stretch its wings. It’s scalable (meaning you can handle millions of user requests without your model crashing) and relatively straightforward to set up.
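
To give a flavor of what “deploying to the cloud” actually involves, here’s a minimal sketch of wrapping a model in a web endpoint with FastAPI; the same file could run on AWS, Google Cloud, or Azure. The /generate route and library choices are my own illustration, not a prescribed setup:

```python
# A minimal serving sketch. Assumes: pip install fastapi uvicorn transformers torch.
# Save as app.py and run with: uvicorn app:app --host 0.0.0.0 --port 8000
from fastapi import FastAPI
from transformers import pipeline

app = FastAPI()
generator = pipeline("text-generation", model="gpt2")   # swap in your own fine-tuned model

@app.post("/generate")
def generate(prompt: str):
    result = generator(prompt, max_new_tokens=50)
    return {"completion": result[0]["generated_text"]}
```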


You can also try “Edge Deployment.” For smaller or more specialized tasks (like an AI that runs on your phone without needing the internet), you can use lightweight versions of your model. Think of it like sending a little messenger dragon that can navigate tight spaces rather than a massive one that needs a huge flight path.

The potential of AI is incredibly exciting, but there are some important drawbacks to consider when testing and deploying generative technology. 


LLMs aren’t perfect. They don’t “know” facts—they predict plausible text. Given conflicting or insufficient data, they might invent answers out of thin air. This phenomenon is called a “hallucination.”


But there are many ways to combat this issue. Firstly, you can enlist the help of a fact-checking tool. Try to pair your LLM with reliable databases or knowledge repositories. If your model tries to say “Fish can fly,” you’ll want to have it cross-check that claim before finalizing the answer.
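
In its simplest form, that cross-check is just a lookup against a source you trust before the answer goes out the door. The snippet below is a toy illustration; the “facts” dictionary stands in for a real knowledge base or retrieval system:

```python
# A toy cross-check against a trusted reference before showing an answer to users.
# Real systems retrieve from curated databases or documents instead of a dict.
TRUSTED_FACTS = {
    "fish": "Fish live in water; apart from a few gliding species, they do not fly.",
}

def cross_check(claim: str) -> str:
    for topic, reference in TRUSTED_FACTS.items():
        if topic in claim.lower():
            return f"Claim: {claim!r}\nReference: {reference}"
    return f"Claim: {claim!r}\nNo reference found; flag for human review."

print(cross_check("Fish can fly"))
```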


It’s also important to keep an eye on your temperature settings. A high temperature can make your AI more creative but also more likely to produce weird or incorrect statements. Lowering it can keep the AI grounded—but may also make it sound dull. You should consider the objective of your AI service and how it will interact with customers when deciding on your settings.
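
Here’s a quick way to feel the difference yourself, again using the Hugging Face pipeline and GPT-2 as stand-ins for whatever model you actually deploy:

```python
# Comparing a low and a high temperature on the same prompt.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

for temperature in (0.2, 1.5):
    out = generator("Our product helps you", do_sample=True,
                    temperature=temperature, max_new_tokens=20)
    print(f"temperature={temperature}: {out[0]['generated_text']}")
```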


Training an LLM is a lot like guiding a dragon from hatchling to hero: it starts off clumsy, but with the right data, architecture, and fine-tuning, it can become a powerful ally. Sure, it requires huge computing resources (those GPUs don’t come cheap), but as we mentioned earlier, you can always start from a pre-trained model (like GPT-4) and skip the earliest and most expensive steps.


And that’s it! AI may look like a fearsome dragon, but once you understand its biology (or, in technical terms, its architecture), it becomes a powerful, trainable ally. For founders and business owners, the key is recognizing which part of the process matters most to your goals—whether it’s fine-tuning for customer support, deploying in the cloud for scalability, or creating an edge solution for on-device intelligence. Happy training!


Next Steps

  1. Experiment with Open-Source Models: Check out repositories like Hugging Face for free model downloads.

  2. Learn Prompt Engineering: Simple instructions like “Explain this to me like I’m five” or “Write me a poem about neural networks” can drastically change the output.

  3. Explore Neural Network Visualizations: Try the TensorFlow Playground to watch layers of a small network learn in real time.


Further Learning

  • Research Paper: “Attention Is All You Need” for a deep dive into the original Transformer blueprint.

  • Tutorials: Google Colab has hands-on guides to help you fine-tune your own model.

Brinlee Kidd - Junior Developer @ Bedrock