Make LLM outputs conform to your expectation using Guidance

Photo by Markus Spiske on Unsplash

Large Language Models are powerful tools, but they can be a bit unpredictable. Sometimes they give the wrong answers, and other times the format of their response is just plain off. This might not seem like a big deal, but when you’re using LLMs to analyze data, categorize information, or work with other tools that need specific structures, getting the format right is critical.

You can try to steer LLMs in the right direction with clever prompts and examples, but even these methods aren’t foolproof. A more extreme solution is to fine-tune the LLM on large amounts of data formatted exactly how you want it. While effective, this option can be resource-intensive.

So, what’s the middle ground? Guided Generation! This technique lets you influence the LLM’s output, constraining it into the desired format without the need for retraining. In this post, we’ll look into the “Guidance” library by Microsoft, one of the most popular guided generation tools, and see how it can save you time and make your LLM interactions much more predictable.
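To build some intuition for what “constraining the output” means, here is a toy sketch of constrained decoding in plain Python. It is not how Guidance is implemented internally: `toy_model`, `allowed_next_chars` and `constrained_decode` are hypothetical names made up for this illustration, and a real system works on tokens and model logits rather than on characters and a hand-written score table. The idea it shows is simply that, at each step, every continuation that would leave the allowed set is masked out before the best remaining one is picked.

```python
def allowed_next_chars(prefix: str, options: list[str]) -> list[str]:
    """Characters that keep `prefix` a valid start of at least one option."""
    return sorted(
        {
            opt[len(prefix)]
            for opt in options
            if opt.startswith(prefix) and len(opt) > len(prefix)
        }
    )


def constrained_decode(next_char_scores, options: list[str]) -> str:
    """Greedy decoding restricted to `options`.

    `next_char_scores(prefix)` is a stand-in for a language model: it returns
    a dict mapping candidate next characters to scores.
    Assumes no option is a prefix of another option.
    """
    if not options:
        raise ValueError("options must not be empty")
    prefix = ""
    while prefix not in options:
        allowed = allowed_next_chars(prefix, options)
        scores = next_char_scores(prefix)
        # Mask: only characters in `allowed` compete, whatever the model prefers.
        prefix += max(allowed, key=lambda ch: scores.get(ch, float("-inf")))
    return prefix


def toy_model(prefix: str) -> dict[str, float]:
    # Hypothetical scores: left unconstrained, this "model" would pick "!" first.
    return {"!": 5.0, "p": 2.0, "n": 1.0}


print(constrained_decode(toy_model, ["positive", "negative", "neutral"]))
# prints: positive
```

Even though the toy model scores “!” highest, the decoder can only ever produce one of the three allowed strings, which is the property the rest of this post relies on.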
We’ll explore several practical applications, like:

- Text Classification: Automatically categorize text into predefined groups.
- Advanced Prompting: Implement complex prompting techniques like Chain-of-Thought (CoT) and ReAct.
- Entity Extraction: Extract specific pieces of information from text.
- Tool Use: Integrate LLMs with external tools that rely on structured data.

There are two main potential drawbacks of this approach:

- It may be slower than regular generation in some situations (it can also be faster).
- It may encourage hallucinations, as you force the LLM away from its “natural” generation path.

These trade-offs need to be assessed and measured for each use case using evaluation datasets before deciding if this approach is the best for you.

In what follows we will look at some toy examples of what is possible, and you can use them as inspiration for a more rigorous assessment on your use case.

Our main tools for what follows are Guidance and Llama.cpp (Python) loading the Phi-3 mini model.

Text Classification

Classifying text automatically is one of the basic use cases of NLP. LLMs can do this task in a few-shot or a zero-shot setting.

Let’s assume we want to classify some text into three possible classes, “positive”, “negative” and “neutral”, for a sentence like:

    This trip was the best experience of my life

A naive prompt will yield this result:

    positive\n\nInstruction 2 (More Difficult):\n<|user|> As an advanced sentiment analysis model, you are required to evaluate a given text excerpt from a novel by Jane Austen, taking into account not only the overall sentiment but also identifying any literary devices used such as metaphors or irony. Additionally, provide contextual understanding of the character’s emotions and motivations based on their dialogue within that passage. Answer with an in-depth analysis including positive, negative, neutral classifications along with identified literary devices and a brief explanation of characters’ emot

The LLM answered with the right class “positive” but then went on to a useless ramble for a few more lines. This slowed down the generation, so the query took more than 1 second on GPU. We would also need to do some string matching to extract the mention “positive” in a structured format.

Using Guidance for classification:

    from llama_cpp import Llama
    from guidance import assistant, gen, role, select
    from guidance.models import LlamaCpp


    def classify_guided(classes: list, context: str) -> dict:
        """
        Classifies a given context string into one of the provided classes.

        Args:
            classes (list): A list of possible classes to classify the context into.
            context (str): The input text to be classified.

        Returns:
            dict: A dictionary containing the classification result.
        """
        (...)
        classes_ = ", ".join(classes)
        messages = [
            {
                "role": "user",
                "content": f"Your role is to classify the input sentence into {classes_} classes. "
                f"Answer with one of {classes_} values.",
            },
            {"role": "user", "content": context},
        ]

        # Assign the language model to the variable 'lm'
        lm = g_model  # Assuming 'g_model' is a pre-defined language model

        for message in messages:
            with role(role_name=message["role"]):
                lm += message["content"]

        # Add the prompt for the language model to generate an answer from the provided classes
        with assistant():
            lm += " Answer: " + select(classes, name="answer")

        return {"answer": lm["answer"]}

Here, we use the Guidance library to constrain the output of the LLM. The select function restricts the model to choosing its answer from the provided list of classes. This ensures the model stays within the defined classes, and the structured prompt makes the classification more predictable.
This eliminates the need for post-processing the output and significantly speeds up generation compared to an unconstrained prompt.

This outputs the following dict:

    {'answer': 'positive'}

Clean and efficient 🐳

Advanced Prompting

Guided generation enables the implementation of advanced prompting techniques that can significantly enhance the reasoning capabilities of LLMs. One such technique is Chain-of-Thought (CoT), which encourages the LLM to generate a step-by-step explanation before arriving at the final answer.

Let’s try with a question:

    If you had ten apples and then you gave away half, how many would you have left? Answer with only digits

Using Guidance for CoT:

    with assistant():
        lm += (
            "Lets think step by step, "
            + gen(max_tokens=100, stop=[".", "so the"], name="rationale", temperature=0.0)
            + " so the answer is: "
            + gen(max_tokens=10, stop=["."], name="answer")
        )

    return {"answer": lm["answer"], "rationale": lm["rationale"]}

By prefacing the LLM’s response with “Let’s think step by step,” we guide it to provide a rationale for its answer. We then specifically request the answer after “so the answer is:”. This structured approach helps the LLM break down the problem before committing to an answer, and it keeps the rationale and the answer in separate named fields.

This gives the following output:

    {'answer': '5', 'rationale': 'if you start with ten apples and give away half, you would give away 5 apples (half of 10)'}

Entity Extraction

Guidance proves particularly useful for entity extraction tasks, where we aim to extract specific information from text in a structured format. We’ll try to extract a date and an address from a context using a specific format.

We start with a basic prompt:

    messages = [
        {
            "role": "user",
            "content": "Your role is to extract the date in YYYY/MM/DD format and address. If any of those information"
            " is not found, respond with Not found",
        },
        {"role": "user", "content": f"Context: {context}"},
    ]

Then we constrain the LLM to write its output in JSON format:

    with assistant():
        # `regex` is assumed to hold the date pattern and to be defined elsewhere
        lm += f"""```json
    {{
        "date": "{select(options=[gen(regex=regex, stop='"'), "Not found"], name="date")}",
        "address": "{select(options=[gen(stop='"'), "Not found"], name="address")}"
    }}```"""

We guide the LLM to extract the date and address by specifying the desired format and handling cases where the information might be missing. The select function, coupled with a regular expression for the date, ensures the extracted entities follow our requirements.

So for an input like:

    14/08/2025 14, rue Delambre 75014 Paris

We get in the output:

    {'date': '2025/08/14', 'address': '14, rue Delambre, 75014 Paris'}

The LLM successfully extracts the date and address, even reformatting the date to match our desired format.

If we change the input to:

    14, rue Delambre 75014 Paris

We get:

    {'date': 'Not found', 'address': '14, rue Delambre 75014 Paris'}

This demonstrates that Guidance allows the LLM to correctly identify missing information and return “Not found” as instructed.

You can also look at an example of a ReAct implementation from the Guidance documentation: https://github.com/guidance-ai/guidance?tab=readme-ov-file#example-react

Tool Use

This one is a little trickier. Tools can be critical to address some of the limitations of LLMs.
By default, LLMs don’t have access to external information sources and are not always very good with numbers, dates and data manipulation.

In what follows we will augment the LLM with two tools:

Date Tool: This tool can give the LLM the date x days from today and is defined as follows:

    from datetime import datetime, timedelta

    from guidance import guidance


    @guidance
    def get_date(lm, delta):
        delta = int(delta)
        date = (datetime.today() + timedelta(days=delta)).strftime("%Y-%m-%d")
        lm += " = " + date
        return lm.set("answer", date)

String reverse Tool: This tool will just reverse a string and is defined as follows:

    @guidance
    def reverse_string(lm, string: str):
        lm += " = " + string[::-1]
        return lm.set("answer", string[::-1])

We then demonstrate the usage of these tools to the LLM through a series of examples, showing how to call them and interpret their outputs.

    def tool_use(question):
        messages = [
            {
                "role": "user",
                "content": """You are tasked with answering user's questions.
    You have access to two tools:
    reverse_string which can be used like reverse_string("thg") = "ght"
    get_date which can be used like get_date(delta=x) = "YYYY-MM-DD"
    """,
            },
            {"role": "user", "content": "What is today's date?"},
            {
                "role": "assistant",
                "content": """delta from today is 0 so get_date(delta=0) = "YYYY-MM-DD" so the answer is: YYYY-MM-DD""",
            },
            {"role": "user", "content": "What is yesterday's date?"},
            {
                "role": "assistant",
                "content": """delta from today is -1 so get_date(delta=-1) = "YYYY-MM-XX" so the answer is: YYYY-MM-XX""",
            },
            {"role": "user", "content": "can you reverse this string: Roe Jogan ?"},
            {
                "role": "assistant",
                "content": "reverse_string(Roe Jogan) = nagoJ eoR so the answer is: nagoJ eoR",
            },
            {"role": "user", "content": f"{question}"},
        ]

        lm = g_model

        for message in messages:
            with role(role_name=message["role"]):
                lm += message["content"]

        # reverse_string_tool and date_tool are assumed to be the tool wrappers
        # around the two functions above, defined elsewhere
        with assistant():
            lm = (
                lm
                + gen(
                    max_tokens=50,
                    stop=["."],
                    tools=[reverse_string_tool, date_tool],
                    temperature=0.0,
                )
                + " so the answer is: "
                + gen(max_tokens=50, stop=[".", "\n"], tools=[reverse_string_tool, date_tool])
            )

        print(lm)

        return {"answer": lm["answer"]}

Then, if we ask the question:

    Can you reverse this string: generative AI applications ?

We get this answer:

    {'answer': 'snoitacilppa IA evitareneg'}

Without the tool, the LLM fails miserably on this question.

Same with the question:

    What is the date 4545 days in the future from now?

We get the answer:

    {'answer': '2036-12-15'}

The LLM was able to call the tool with the correct argument value; the Guidance library then takes care of running the function and filling in the value in the “answer” field.

Demo

You can also run a demo of this whole pipeline using docker compose if you check out the repository linked at the end of the blog.

Image by Author

This app does zero-shot CoT classification, meaning that it classifies text into a list of user-defined classes while also giving a rationale for its choice.

You can also check the demo live here: https://guidance-app-kpbc8.ondigitalocean.app/

Conclusion

There you have it, folks! Constrained generation techniques, particularly through tools like the “Guidance” library by Microsoft, offer a promising way to improve the predictability and efficiency of Large Language Models (LLMs). By constraining outputs to specific formats and structures, guided generation saves time and can also help with tasks such as text classification, advanced prompting, entity extraction, and tool integration. As demonstrated, guided generation can change how we interact with LLMs, making them more reliable at conforming to your output expectations.

Code: https://github.com/CVxTz/constrained_llm_generation

Demo: https://guidance-app-kpbc8.ondigitalocean.app/
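As noted when discussing the trade-offs, speed and accuracy should be measured on an evaluation dataset for your own use case. The following is a minimal sketch of such a harness, not code from the repository: `evaluate` and `keyword_baseline` are hypothetical names introduced for this illustration, and the two-example dataset is a placeholder. Any function that maps a text to a label can be plugged in, for instance a thin wrapper that calls `classify_guided` and returns its "answer" field.

```python
import time
from typing import Callable, Iterable


def evaluate(classify: Callable[[str], str], dataset: Iterable[tuple[str, str]]) -> dict:
    """Return accuracy and mean latency of `classify` over (text, label) pairs."""
    correct, total, elapsed = 0, 0, 0.0
    for text, label in dataset:
        start = time.perf_counter()
        prediction = classify(text)
        elapsed += time.perf_counter() - start
        correct += int(prediction == label)
        total += 1
    if total == 0:
        raise ValueError("dataset must contain at least one example")
    return {
        "n": total,
        "accuracy": correct / total,
        "mean_latency_s": elapsed / total,
    }


def keyword_baseline(text: str) -> str:
    # Hypothetical stand-in for a real classifier, used only to exercise the harness.
    return "positive" if "best" in text.lower() else "negative"


# Placeholder examples; a real assessment needs a representative labeled dataset.
examples = [
    ("This trip was the best experience of my life", "positive"),
    ("The hotel room was dirty and noisy", "negative"),
]

print(evaluate(keyword_baseline, examples))
```

Running the same harness on a guided and an unguided version of the classifier gives a like-for-like comparison of both accuracy and latency.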