Your own Chef: how to install and run your own AI LLM

Introduction: The brain on your desk

If you have followed technology over the last two years, you have surely heard about AI, ChatGPT, Gemini and the technology that ties them all together: the LLM. But what exactly is an LLM? And why would you want one living on your Windows PC or Mac, taking up precious disk space, instead of using the optimized versions that giants such as Google or OpenAI offer in the cloud?

What is an LLM (Large Language Model)?

Let’s demystify this. A Large Language Model (LLM) is not a conscious digital brain. It doesn’t “think” like a human, it doesn’t “understand” emotions, and it certainly doesn’t “know” that chocolate is delicious.

Think of an LLM as the most sophisticated and supercharged autocomplete system on the planet. It is a statistical system. It has been fed a digital diet that consists of a large slice of the Internet (books, articles, forums, code, etc.). Through this massive training, it did not learn “concepts”, but patterns.

It became incredibly good at predicting the next word (or, more technically, the next token) in a sequence.

Quick Analogy: If I say “The sky is…”, your brain fills in “blue”. If I say “To make a chocolate cake, I need flour, eggs, and…”, an LLM does the same, but at a scale of billions of parameters, predicting the most likely next words based on the data it was trained on.

When you ask it to “write a poem about a sad robot,” it feels no sadness. It just calculates that, statistically, after the word “sad,” words like “lonely,” “cold,” and “metal” have a high probability of appearing. It is a “stochastic parrot” of an almost magical level.
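
To make “predicting the next word” concrete, here is a toy sketch in Python. It is not how Llama works internally (real LLMs use neural networks with billions of parameters). It simply counts which word follows which in a tiny made-up corpus, and the names train_bigrams and predict_next are invented for this illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it in the text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# A tiny made-up "training set".
corpus = "the sky is blue . the sea is blue . the grass is green ."
model = train_bigrams(corpus)
print(predict_next(model, "is"))  # "blue" follows "is" twice, "green" only once
```

The “model” answers “blue” only because that pattern was the most common one in its data, which is the same idea as the analogy above, just at a microscopic scale.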

Models such as the Llama family (created by Meta, Facebook’s parent company) are top examples of these LLMs.

The Digital Dilemma: Advantages of a Local LLM vs Cloud

You already use LLMs in the cloud. ChatGPT, Gemini (formerly Bard), Claude, Microsoft’s Copilot. They’re powerful, easy to use, and (mostly) free or with affordable subscriptions.

So, why go to the effort of running a local model on your Mac?

The answer boils down to three pillars: Privacy, Control, and Cost (or lack thereof).

  1. Absolute Privacy:
    • Cloud: Every prompt you write – “how to cure this rash?”, “ideas to fire my boss”, “financial analysis of my business” – is sent to servers you don’t control. Depending on the provider and your settings, this data may be used to train future models or reviewed by humans, and in the event of a data breach it can be exposed.
    • Local: When you run an LLM on your Mac, nothing leaves your machine. Zero. The prompt is processed by your own chip (an M1/M2/M3/M4/M5) and the response is generated locally. You can review confidential documents, write your secret diary or ask for embarrassing medical advice with the assurance that no one is ‘peeking’.
  2. Control and Experimentation:
    • Cloud: You are using a ‘filtered’ product. Companies apply strong guardrails to prevent the model from saying offensive, illegal, or controversial things.
    • Local: You’re in charge. You can run “uncensored” models that will answer any question. You can adjust technical parameters (such as “temperature”, which controls how creative the output is) that the cloud versions often hide. It’s your personal playground.
  3. No Continuous Costs and Offline Access:
    • Cloud: The most powerful models (GPT-4, Claude 3 Opus) require monthly subscriptions. And if your internet fails, your “external” AI disappears.
    • Local: It’s free (Ollama is open-source and the model weights are free to download). The only cost is the hardware you already have and electricity. And it works perfectly on a plane, in a café without Wi-Fi or in the middle of the countryside.
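
The “temperature” setting mentioned under Control and Experimentation can be pictured with a few lines of Python. This is only a sketch of the general idea: the model's raw scores for each candidate word are divided by the temperature before being turned into probabilities. The scores below are made up, and real runtimes usually combine temperature with other sampling settings.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; assumes temperature > 0."""
    scaled = [s / temperature for s in scores]
    biggest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - biggest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three candidate next words.
candidates = ["lonely", "cold", "happy"]
scores = [2.0, 1.5, 0.5]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(scores, t)
    print(t, dict(zip(candidates, (round(p, 3) for p in probs))))
```

A low temperature concentrates almost all the probability on the top-scoring word (predictable output), while a high temperature spreads it out (more surprising output).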

The Disadvantage? Local models are generally less powerful than those of the trillion-dollar cloud giants (although the gap is closing fast) and require some basic technical knowledge to get started. That’s what the rest of this article is for.

Part 1: The Arsenal – Preparing your Mac

First of all, an important correction: you are not going to “create” an LLM from scratch. Training a model like Llama 3 costs millions of dollars in hardware and energy.

What we are going to do is “run” (inference) a pre-trained LLM on our machine.

Requirements:

  • A Mac: Ideally, any Mac with an Apple Silicon chip (M1, M2, M3, M4, or M5). These chips are fantastic for running LLMs because of their unified memory architecture (shared RAM and VRAM). An old Intel Mac may work, but it will be painfully slow.
  • Disk Space: The models are big. A “small” model (like Llama 3 8B) takes about 5GB. Larger models take tens of GB.
  • RAM: The more the better. 16GB is a great starting point, 8GB works for smaller models.
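
The disk and RAM figures follow from simple arithmetic: the size of a model is roughly its number of parameters times the bits stored per parameter. The sketch below assumes the downloaded model is compressed (“quantized”) to about 4.5 bits per weight. That figure is an assumption for illustration, the real value depends on the quantization used, and file overhead is ignored.

```python
def model_size_gb(parameters, bits_per_weight):
    """Approximate model size in gigabytes (1 GB = 10**9 bytes)."""
    return parameters * bits_per_weight / 8 / 1e9

eight_billion = 8e9  # an "8B" model

print(model_size_gb(eight_billion, 16))   # uncompressed 16-bit weights
print(model_size_gb(eight_billion, 4.5))  # assumed ~4.5 bits per weight
```

The second result lands in the same range as the roughly 5GB quoted above, and the model has to fit in memory while it runs, which is why RAM matters as much as disk space.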

The Magic Tool: Ollama

Forget about compiling complex code or spending days setting up Python environments. The community has created a brilliant tool that makes this process ridiculously easy on Mac: Ollama.

Ollama is an LLM runner. It takes care of installation and model management, and provides a simple local server for interacting with the models.

Step 1: Install Ollama

  1. Go to the official site: ollama.com
  2. Click on the macOS download button.
  3. You will download a .dmg file. Open it and drag Ollama to your Applications folder.
  4. Launch the app. You will see a small llama icon in your menu bar. It’s installed.

Step 2: Open the Terminal

Yes, we will have to use the Terminal. Don’t be afraid. That’s the “cockpit” of your Mac. Go to Applications > Utilities > Terminal and open it.

Step 3: Download (Pull) the Llama 3 Model

Llama 3 is a recent and powerful open model from Meta. Ollama provides access to it with a simple command. In the Terminal, type:

ollama pull llama3

What happens now? Ollama is downloading Llama 3 (the 8B model, or 8 billion parameters, by default) from its repository. This may take a few minutes and will take up about 4.7GB.

Step 4: Run the Model and Start Chatting

As soon as the download is finished, you can talk to it. Type:

ollama run llama3

Your Terminal prompt will change to >>>. That’s it. You can now talk directly to a cutting-edge AI that is running entirely on your computer.

You can test it with something like “Hello! Who are you?”. The response will be generated locally. To leave, type /bye.
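
The >>> prompt is not the only way in. The local server that Ollama runs in the background can also be called from your own scripts. Below is a minimal sketch, assuming Ollama's default address http://localhost:11434 and its /api/generate endpoint with the model, prompt, stream and options fields. The names build_request and ask are invented for this example, and the final call is commented out because it only works while the Ollama app is running and llama3 has been downloaded.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default address

def build_request(prompt, model="llama3", temperature=None):
    """Build the JSON body for a single, non-streaming generation request."""
    body = {"model": model, "prompt": prompt, "stream": False}
    if temperature is not None:
        body["options"] = {"temperature": temperature}
    return body

def ask(prompt, **kwargs):
    """Send the prompt to the local Ollama server and return its text reply."""
    data = json.dumps(build_request(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

# Needs the Ollama app running locally:
# print(ask("Hello! Who are you?", temperature=0.7))
```

Note that the address is localhost: the request goes from your script to your own machine and nowhere else.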

Part 2: The cooking challenge – Hunt for the Chocolate Cake

Now, let’s get down to business. We have a powerful but ‘raw’ LLM on our machine. We want a chocolate cake recipe.

Attempt 1: The Lazy Request

Open your Terminal and run ollama run llama3. When >>> appears, type:

Prompt 1: give me a chocolate cake recipe

Likely Outcome: Llama 3 will dutifully give you a recipe. It will probably be a standard American recipe (cups and spoons), correct but without ‘soul’. It will be functional, but generic. This is because your request was vague.

Attempt 2: Prompt Engineering (the Secret)

An LLM is not a crystal ball; it is a pattern engine that follows instructions. The quality of the output (the response) depends largely on the quality of the input (your request, or prompt).

Let’s be specific. Let’s use Prompt Engineering.

Let’s start the session again (or just continue it). This time, we’re going to give the model context, a persona, constraints, and an output format.

Prompt 2 (The Good Prompt): Act as a world-renowned pastry chef with a passion for rich and decadent desserts. Your audience is an amateur cook, so be clear and encouraging. I need your *best* recipe for a chocolate cake that's incredibly moist, dense (almost like a fudge, but still a cake) and with a deep cocoa flavor, not too sweet. Please provide the recipe with the following rules: 1. Complete ingredient list, using metric measurements (grams and ml). 2. Step-by-step instructions, very clear. 3. The cooking time and oven temperature (in Celsius). 4. A "chef's trick" at the end to ensure that the cake turns out perfect.

Probable Outcome: The answer will be drastically different. The model will put on the persona (“Ah, mon ami! Absolutely!”). It will focus on “moist” and “dense”, perhaps suggesting ingredients such as sour cream, yoghurt or coffee (which enhances chocolate). The measurements will be in grams. The instructions will be detailed. And the chef’s trick? It might suggest “don’t overmix the flour” or “use Dutch-processed cocoa.”
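
The recipe behind Prompt 2 (persona, audience, task, rules) is regular enough to turn into a small helper. The sketch below is just one way to do it; build_prompt and its parameter names are invented for this example, and the wording is a shortened version of the prompt above.

```python
def build_prompt(persona, audience, task, rules):
    """Assemble a structured prompt from its four ingredients."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return (
        f"Act as {persona}.\n"
        f"Your audience is {audience}.\n"
        f"{task}\n"
        f"Please follow these rules:\n{numbered}"
    )

prompt = build_prompt(
    persona="a world-renowned pastry chef",
    audience="an amateur cook, so be clear and encouraging",
    task="I need your best recipe for a moist, dense chocolate cake.",
    rules=[
        "Complete ingredient list in metric measurements (grams and ml).",
        "Clear step-by-step instructions.",
        "Baking time and oven temperature in Celsius.",
        "A chef's trick at the end.",
    ],
)
print(prompt)
```

Swapping the persona or the rules gives you a new well-formed prompt without rewriting the whole thing each time.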

Attempt 3: Iteration (The Refinement)

A handy feature is that the model keeps context within the same session, just as the cloud chatbots do. You don’t need to start from scratch.

Prompt 3 (Follow-up): Excellent! But I forgot to say: my aunt is celiac. How can I adapt this recipe to be gluten-free? What flours do you recommend and do the proportions change?

Llama 3 (which has the previous conversation in its short-term “memory”) will now take the specific recipe it just gave you and modify it for your new needs, suggesting gluten-free flour blends (almond, rice, etc.) and adjusting the liquids if necessary.

And all of this happened without a single byte of your culinary preference leaving your Mac.
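
The short-term “memory” used in Attempt 3 is less mysterious than it sounds: the whole conversation so far is handed back to the model with every new question. The sketch below shows that bookkeeping, assuming the message format of Ollama's /api/chat endpoint (a list of messages with role and content fields). The helper names are invented for this illustration and nothing is actually sent.

```python
def add_turn(history, role, content):
    """Return a new history with one more message appended."""
    return history + [{"role": role, "content": content}]

def build_chat_request(history, model="llama3"):
    """Body for a non-streaming chat request (assumed /api/chat format)."""
    return {"model": model, "messages": history, "stream": False}

history = []
history = add_turn(history, "user", "Give me your best chocolate cake recipe.")
history = add_turn(history, "assistant", "(the recipe the model returned)")
history = add_turn(history, "user", "My aunt is celiac. How do I make it gluten-free?")

request_body = build_chat_request(history)
print(len(request_body["messages"]))  # 3: the follow-up travels with everything before it
```

This is also why the “memory” ends with the session: once the history is thrown away, the model has nothing to look back on.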

Part 3: The Dark Side – Risks, Security and Ethics

Running an LLM locally is fantastic, but it’s not all rosy. It removes the safety barriers of the cloud, which means that the responsibility shifts from companies (Google, OpenAI) to… you.

1. Hallucinations: The Lying (and Confident) AI

LLMs “hallucinate”. This is a technical term for when AI invents facts, sources, or details but presents them with absolute confidence.

  • In the Cake Example: Llama 3 may “hallucinate” a wrong oven temperature (250°C instead of 180°C) or suggest 100g of baking powder instead of 10g. The cake will not only be bad; it will be a disaster.
  • In the Real World: This is dangerous when you ask for medical, legal, or financial advice.
  • Moral of the Story: CHECK EVERYTHING. Never blindly rely on an LLM for critical factual information. Use it for creativity, drafts, and suggestions, but do your own checking.
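
Part of “checking everything” can be as simple as asking whether a number is even in a believable range. Here is a minimal sketch of that idea. The ranges are rough values chosen only for this illustration, not baking advice, and the field names oven_celsius and leavening_g are made up.

```python
# Plausible ranges chosen for this illustration only; adjust to your own recipes.
PLAUSIBLE = {
    "oven_celsius": (140, 220),
    "leavening_g": (2, 20),
}

def suspicious_values(recipe):
    """Return the names of values that fall outside their plausible range."""
    flagged = []
    for name, value in recipe.items():
        low, high = PLAUSIBLE.get(name, (float("-inf"), float("inf")))
        if not low <= value <= high:
            flagged.append(name)
    return flagged

print(suspicious_values({"oven_celsius": 250, "leavening_g": 100}))  # both flagged
print(suspicious_values({"oven_celsius": 180, "leavening_g": 10}))   # nothing flagged
```

A check like this catches only the obvious blunders. A value can sit comfortably inside the range and still be wrong, so it complements a trusted source rather than replacing it.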

2. Systemic Bias: Garbage In…

The LLM has been trained on the Internet. The Internet is full of prejudices, stereotypes, racism, and sexism. The LLM is a mirror of this.

  • In the Cake Example: If you ask for a “grandmother’s cake recipe”, the model may fall back on gender stereotypes. If you ask for a “recipe from an exotic country”, it may return cultural clichés.
  • In the Real World: These biases can influence hiring decisions (if used to review CVs), medical diagnoses, or court rulings (if used as a legal assistant).
  • Moral of the Story: Be aware that AI is not “objective.” It reflects the biases of the data it was fed.

3. Security: The Modern Trojan Horse

  • Your Local Risk: What happens if you download a model (perhaps from a less reputable site than Ollama) that has been maliciously “fine-tuned”? It could theoretically be designed to scan your computer for sensitive files or give you malicious code when you ask for help programming.
  • Your Responsibility (Ethics): Llama 3 in Ollama has some safety barriers. But there are models on the internet (“uncensored” models) that don’t have any.
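
One basic precaution when a model file comes from somewhere other than Ollama is to compare its checksum with the one published by the source, when the source publishes one. The sketch below assumes a SHA-256 hash is available; the function names are invented for this example. It only proves the file is the one the publisher released, not that the model itself is harmless.

```python
import hashlib

def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so multi-gigabyte models don't fill up RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(path, expected_hex):
    """True if the file's SHA-256 equals the hash published by the source."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

You would call matches_published_hash with the path to the downloaded file and the hash copied from the publisher's page, and refuse to load the model if it returns False.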

4. Ethics and Misuse (The Elephant in the Room)

This brings us to the most critical point. A local LLM, without filters, is a “dual-use” tool.

  • Cloud (ChatGPT): If you ask ChatGPT, “Write me a convincing phishing email to steal passwords from Bank X,” it will refuse, citing its safety policies.
  • Local (Model Without Filters): If you make the same request to an uncensored local model, it may happily write the perfect email, suggesting social engineering tactics and even creating the HTML code.

The same tool that helps you write the perfect cake recipe can be used by a malicious actor to generate mass disinformation, write code for malware, create detailed plans for nefarious activities, or generate hate propaganda.

When you run a model locally, you are circumventing the (few) safeguards that the industry has tried to implement.

Conclusion

Having an LLM like Llama 3 running on your Mac is a liberating experience. It’s a glimpse into a future where powerful AI is a personal, private, user-controlled tool, not a service rented from mega-corporations.

It’s incredibly useful for drafting emails, coding, summarizing texts, translating, or yes, perfecting recipes.

But like any powerful tool — be it a hammer or a supercomputer — it has no morality of its own. Its usefulness and its danger depend entirely on the hands that operate it.

Now, go make that chocolate cake. But, just in case, check the amount of salt in a real cookbook.

Pedro Coelho
