Large language model (LLM) finetuning is a way to enhance the performance of pretrained LLMs on specific tasks or domains, with the aim of achieving improved inference quality with limited resources. Finetuning is crucial for domain-specific applications where pretrained models lack the necessary context, taxonomy, or specialized knowledge. This blog post delves into different finetuning options, including unsupervised, supervised, and instruction-based methods, and discusses the appropriate use case for each. We also discuss advanced techniques for updating pretrained LLM weights, such as full finetuning, adapter-based tuning, and parameter-efficient finetuning, each with distinct advantages and limitations. These techniques enable LLMs to adapt more effectively to new tasks, balancing efficiency with performance depending on the approach chosen.
When trying to apply an LLM-based solution to a business problem, customers often ask whether to finetune the model or optimize the prompt. The answer depends on the complexity of the problem, the size of the dataset, the level of accuracy expected from the system, and the associated budget. You can solve a plethora of business problems by crafting the prompt carefully with the simplest zero-shot approach, which includes no examples. You can solve many more with few-shot or in-context learning approaches, where the prompt contains one or more examples for the LLM to learn from so that it can generate a similar response. You can view these approaches as run-time optimization, where no LLM weights are modified. Despite being simple, easy to implement, and effective, prompt-based techniques don't always work.
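To make the distinction concrete, the following Python sketch contrasts a zero-shot prompt with a few-shot prompt for a sentiment-classification task. The task, labels, and review text are illustrative assumptions rather than part of any specific product API; the point is that only the prompt text changes between the two approaches, never the model weights.

```python
# Zero-shot: the model relies entirely on its pretrained knowledge.
zero_shot_prompt = (
    "Classify the sentiment of the following review as Positive or Negative.\n"
    "Review: The checkout process was painless and delivery was fast.\n"
    "Sentiment:"
)

# Few-shot (in-context learning): the same request, preceded by worked
# examples the model can imitate. No weights are updated; only the prompt
# text changes.
few_shot_prompt = (
    "Classify the sentiment of each review as Positive or Negative.\n\n"
    "Review: The battery died after two days.\n"
    "Sentiment: Negative\n\n"
    "Review: Support resolved my issue in minutes.\n"
    "Sentiment: Positive\n\n"
    "Review: The checkout process was painless and delivery was fast.\n"
    "Sentiment:"
)

# Either prompt would be sent, unchanged, to your LLM endpoint of choice.
print(zero_shot_prompt)
print(few_shot_prompt)
```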
Despite the extensive data used to train LLMs, finetuning is necessary, particularly for domain-specific applications. While pretrained LLMs excel at capturing general language patterns and semantics from vast corpora, their effectiveness on specific tasks within specialized domains can be significantly enhanced through finetuning.
Consider the following use cases in this context:
Because this domain data is unseen during pretraining, pretrained LLMs often fall short of expectations due to their inability to comprehend the intricacies of medical data. This shortfall can lead to inaccurate summaries that have the potential to negatively impact patient care.
In both use cases, generic pretrained LLMs lack specialized domain knowledge and can't produce optimal output. Finetuning on targeted datasets within these specific domains bridges this gap, leading to significant improvements in accuracy and effectiveness.
LLM finetuning comes in several varieties, distinguished by the structure of the training dataset, including the following examples:

- Unsupervised finetuning: Continued training on unlabeled, domain-specific text so the model absorbs the vocabulary, style, and semantics of the target domain.
- Supervised finetuning: Training on labeled input-output pairs, such as documents and their summaries, so the model learns to perform a specific task.
- Instruction-based finetuning: Training on instruction-response pairs so the model learns to follow natural-language instructions across a range of tasks.
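As a rough illustration of how the training data differs across these varieties, the records below sketch what a single example might look like in each case. The field names and contents are hypothetical, not a prescribed schema.

```python
# Unsupervised finetuning: raw domain text, no labels.
unsupervised_example = {
    "text": "The patient presented with acute dyspnea and was started on ..."
}

# Supervised finetuning: an input paired with the desired output.
supervised_example = {
    "input": "Discharge note: The patient presented with acute dyspnea ...",
    "output": "Summary: Patient admitted for acute dyspnea; treated and discharged stable.",
}

# Instruction-based finetuning: an explicit instruction plus the expected response.
instruction_example = {
    "instruction": "Summarize the following discharge note for the attending physician.",
    "input": "The patient presented with acute dyspnea ...",
    "response": "Patient admitted for acute dyspnea; treated and discharged stable.",
}
```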
In the previous section, we explored various methodologies for finetuning LLMs based on the structure of the training dataset. This section dives into the techniques used to update the weights of pretrained LLMs. LLM weights are the parameters learned by the model during training; they determine how input data is processed and transformed into meaningful output, and they form the core of the model's language understanding. Adjusting these weights during finetuning is crucial because it enables the model to better capture the underlying patterns and complexities present in the task data, ultimately optimizing its performance toward the desired objectives.
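As a concrete point of reference, the following minimal sketch uses the open-source Hugging Face transformers and peft libraries (our choice for illustration; the techniques themselves are toolkit-agnostic) to contrast full finetuning, where every weight is trainable, with a parameter-efficient approach such as LoRA, where only small adapter matrices are trained while the pretrained weights stay frozen.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Load a small pretrained model (illustrative choice).
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Full finetuning: every parameter would be updated by the optimizer.
total_params = sum(p.numel() for p in model.parameters())
print(f"Full finetuning trains all {total_params:,} parameters.")

# Parameter-efficient finetuning (LoRA): freeze the base model and train
# only low-rank adapter matrices injected into the attention layers.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # rank of the adapter matrices
    lora_alpha=16,              # adapter scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's attention projection layer
)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # typically well under 1% of the total
```

Either variant can then be trained with a standard training loop; the choice mainly trades adaptation capacity against memory and compute.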
The traditional finetuning techniques described here don't always guarantee human-preferred outputs. Advanced techniques such as reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO) can bridge this gap. We explore these topics in depth in a future article.
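Both RLHF and DPO rely on preference data rather than a single gold answer per prompt. As a hedged illustration (the field names are hypothetical, not a required schema), a single preference record might look like this:

```python
# One preference pair: the "chosen" response is the one humans preferred.
preference_example = {
    "prompt": "Summarize this discharge note for the patient in plain language.",
    "chosen": "You were treated for shortness of breath and are recovering well; "
              "use your inhaler twice daily and follow up in two weeks.",
    "rejected": "Pt adm. w/ acute dyspnea, tx bronchodilators, d/c stable.",
}
```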
Finetuning LLMs presents a flexible and powerful way to tailor advanced AI tools to meet specific business or research needs. By using different finetuning methods—whether unsupervised, supervised, or instruction-based—organizations can significantly enhance the applicability and accuracy of LLMs in specialized domains. Techniques such as full finetuning, adapter-based tuning, and parameter-efficient tuning further refine this customization process, allowing for a targeted approach that maximizes performance while minimizing resource consumption. Ultimately, understanding and applying these techniques can transform a general-purpose LLM into a specialized tool that drives innovation and efficiency in any field.
Read part 1 of this 5-part blog series - "Navigating the frontier: Key considerations for developing a generative AI integration strategy for the enterprise"
Read part 2 of this 5-part blog series - "Comprehensive tactics for optimizing large language models for your application"
Read part 3 of this 5-part blog series - "Beginner’s Guide to Engineering Prompts for LLMs"
Look out for part 5 of this series on retrieval-augmented generation (RAG).
For more information, see the following resources:
Sandip Ghoshal is a Principal Applied Scientist with Oracle Cloud Infrastructure's Generative AI Group. As part of the Gen-AI Sciences team, Sandip implements state-of-the-art machine learning algorithms, builds prototypes, and explores conceptually new solutions for OCI products and customers. Before OCI, Sandip led the machine learning division of Oracle Content Management and was a founder of Oracle Smart Content.
Outside of machine learning, Sandip loves hiking; if you don't hear back from him right away, there's a good chance he's scaling a peak or trekking through a remote trail.
Sid Padgaonkar is a Senior Director with OCI's Strategic Customers Group. Sid is focused on generative AI product incubations, outbound product management, and GTM strategy.