Using Google’s Open-Source LLM "Gemma" on Google Colab

Introduction

Hello, I’m T.T., a QA Engineer.

In the world of AI, generative AI models like ChatGPT have been gaining traction, and large language models (LLMs) for text generation are increasingly being used in professional settings.

The open-source LLM introduced in this article plays a crucial role in the advancement of AI technology. While closed LLMs like ChatGPT are highly capable, they can be costly for research purposes such as developing and validating LLM-based tools. Open-source LLMs, on the other hand, are free to use, making them a flexible option for startups and budget-constrained projects. Additionally, Google Colab allows users to experiment with open-source LLMs without needing a high-performance computing environment.

This article explains how to use Google’s open-source LLM "Gemma" on Google Colab. Gemma is compact compared with models such as GPT-3 and GPT-4, yet still delivers strong performance.

Overview of Gemma

Gemma is a lightweight, cutting-edge open-source large language model (LLM) developed by Google. Created by Google DeepMind and other Google teams, its name comes from the Latin word for "gem". In June 2024, Google released the latest version, Gemma 2.

Features and Benefits of Gemma

  • Open-source: Anyone can freely download and use Gemma.
  • Lightweight: Available in three sizes – Gemma 2 2B, Gemma 2 9B, and Gemma 2 27B. The smallest, Gemma 2 2B, balances a lightweight design with high performance, making it suitable for mobile devices and embedded systems.
  • Commercial use: Gemma may be used commercially under the Gemma Terms of Use, provided users agree to and comply with them (the reference implementation code is released under the Apache License 2.0).
  • Cost-efficient: Being free to use, it is accessible to startups and budget-limited organizations.
  • Broad compatibility: Gemma runs on various environments, including Google Colab, making it easy to experiment with.
  • Security-focused: Personal and sensitive data is excluded from its training dataset, reducing the risk of the model reproducing private information.
  • High performance: Gemma achieves strong benchmark results across key AI performance tests.

Setting Up Google Colab for Gemma

Now, let's explore how to set up Google Colab to run Gemma.

About Google Colab

Google Colab (Colaboratory) allows users to write and execute Python code directly in a web browser. Colab provides access to GPUs, enabling high-speed execution of resource-intensive tasks like AI processing, all for free. Some key features include (a short example follows the list):

  • Minimal setup required – No need to install dependencies manually.
  • Easy to use – Intuitive interface and straightforward execution.
  • Free access to GPUs – Perform heavy computations without cost.
  • Simple sharing – Easily collaborate with others by sharing notebooks.
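To illustrate, a Colab code cell executes ordinary Python directly, and shell commands run when prefixed with "!". A minimal sketch of a first cell:

# A first cell: plain Python runs with no setup.
import sys
print(sys.version)  # Python version provided by the Colab runtime

# Shell commands also work when prefixed with "!":
!pip --version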

Limitations to Keep in Mind

While Google Colab is highly convenient, the free plan comes with certain restrictions:

  • Resets after 90 minutes of inactivity – if a session is left idle, it is terminated.
  • Maximum 12-hour runtime – sessions are terminated after 12 hours, even while active.
  • GPU usage limits – heavy GPU use may temporarily restrict access (the CPU remains available).

Upgrading to a paid plan removes these limitations, allowing longer sessions, faster GPUs, and more memory. If the free plan feels restrictive, consider upgrading.

Additionally, when using Google Colab, avoid embedding sensitive information (e.g., API keys or passwords) directly in your code. Instead, use the "secrets" feature for secure handling of confidential data. A detailed explanation on how to use it will be provided later.


Getting Started

Now, let's set up Google Colab and start using Gemma. The first step is to create a new notebook (workspace).

1. Log in to your Google account.

2. Go to the official Google Colab site: https://colab.google/

3. Click "New Notebook" to create a new workspace.

A new, empty notebook will then open.

4. Select [Edit] → [Notebook settings] in the new notebook, and set "T4 GPU" as the hardware accelerator. If you leave it as "CPU", running Gemma will be extremely slow, so make sure to change this setting.
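After switching the accelerator, it is worth confirming that a GPU is actually attached. A minimal check you can run in a code cell (assuming a GPU runtime; `nvidia-smi` is available there):

# Verify that a CUDA GPU is attached to the runtime.
!nvidia-smi

import torch
print(torch.cuda.is_available())      # True on a T4 runtime
print(torch.cuda.get_device_name(0))  # e.g. Tesla T4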

5. To use Gemma, you need an API Token from Kaggle. First, visit the following site and click [Request Access] to register on Kaggle and agree to the Gemma Terms of Use: https://www.kaggle.com/models/google/gemma/

Once Kaggle confirms your acceptance, the registration is complete.

6. Next, generate an API Token. Go to the account settings page, navigate to the [API] section, and click "Create New API Token." The downloaded JSON file contains your username and key (API Token), which will be used later.
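The downloaded kaggle.json is a small JSON file with exactly two fields. If you prefer to inspect it programmatically rather than in a text editor, a minimal sketch (assuming the file is in the current working directory):

import json

# kaggle.json has the form {"username": "...", "key": "..."}.
with open('kaggle.json') as f:
    creds = json.load(f)

print(creds['username'])  # your Kaggle username
# Avoid printing creds['key']; treat the API token like a password.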

With this, the preparation is complete.

Gemma Setup Steps

Now, let's run Gemma! We'll follow the instructions from "Gemma in PyTorch."

1. Register API Credentials as Secrets

First, register the username and key (API Token) obtained earlier as secrets.

  • Click "Add New Secret."
  • Add KAGGLE_USERNAME and KAGGLE_KEY.
  • Enter your username and API Token in their respective fields.

2. Load Secret Information in the Program

Next, load the secret information into the program.

  1. Click [+ Code] to add a new code cell.
  2. Enter the following program code:
  3. After entering the code, press "Shift + Enter" to execute it.
  4. This will run the program in the current cell and automatically select the next cell.

(From now on, all program executions will be done using "Shift + Enter.")

import os
from google.colab import userdata  # `userdata` reads secrets registered in Colab

# Copy the Kaggle credentials from Colab secrets into environment
# variables, where the Kaggle tooling (kagglehub) expects to find them.
os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME')
os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_KEY')

Now, KAGGLE_USERNAME and KAGGLE_KEY can be used within the program. Hard-coding the username and key (API token) directly in a notebook is poor practice for security reasons, so be sure to use the secret management feature instead.
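As a quick check that the secrets were read correctly, you can verify the environment variables are populated without echoing the token itself (a minimal sketch):

# Confirm the credentials are set, without revealing the key.
assert os.environ.get('KAGGLE_USERNAME'), 'KAGGLE_USERNAME is not set'
assert os.environ.get('KAGGLE_KEY'), 'KAGGLE_KEY is not set'
print('Kaggle credentials loaded for:', os.environ['KAGGLE_USERNAME'])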

Next, install the necessary libraries to run Gemma.

!pip install -q -U torch immutabledict sentencepiece
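The download step below also relies on the kagglehub package. It comes preinstalled on recent Colab runtimes, but if the import fails on your runtime (an assumption worth checking), it can be installed the same way:

!pip install -q -U kagglehub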

Next, download the model weights. Here we select "2b-it," the instruction-tuned (IT) version of the Gemma 2 2B model. The 9B and 27B models could not be run on the free tier of Google Colab due to memory limitations.

# Choose variant and machine type
VARIANT = '2b-it' #@param ['2b', '2b-it', '9b', '9b-it', '27b', '27b-it']
MACHINE_TYPE = 'cuda' #@param ['cuda', 'cpu']

# Derive the gemma_pytorch config name from the variant;
# Gemma 2 2B uses the '2b-v2' config.
CONFIG = VARIANT.split('-')[0]
if CONFIG == '2b':
  CONFIG = '2b-v2'

import os
import kagglehub

# Load model weights
weights_dir = kagglehub.model_download(f'google/gemma-2/pyTorch/gemma-2-{VARIANT}')

# Ensure that the tokenizer is present
tokenizer_path = os.path.join(weights_dir, 'tokenizer.model')
assert os.path.isfile(tokenizer_path), 'Tokenizer not found!'

# Ensure that the checkpoint is present
ckpt_path = os.path.join(weights_dir, 'model.ckpt')
assert os.path.isfile(ckpt_path), 'PyTorch checkpoint not found!'
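If you want to see what was fetched, listing the weights directory is a harmless sanity check (a minimal sketch):

# Inspect the downloaded artifacts (tokenizer and checkpoint).
print(weights_dir)
print(os.listdir(weights_dir))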

Next, download the model implementation (the gemma_pytorch repository).

# NOTE: The "installation" is just cloning the repo.
!git clone https://github.com/google/gemma_pytorch.git
import sys

sys.path.append('gemma_pytorch')
from gemma.config import GemmaConfig, get_model_config
from gemma.model import GemmaForCausalLM
from gemma.tokenizer import Tokenizer
import contextlib
import os
import torch

Setting up the model.

# Set up model config.
model_config = get_model_config(CONFIG)
model_config.tokenizer = tokenizer_path
model_config.quant = 'quant' in VARIANT

# Instantiate the model and load the weights.
torch.set_default_dtype(model_config.get_dtype())
device = torch.device(MACHINE_TYPE)
model = GemmaForCausalLM(model_config)
model.load_weights(ckpt_path)
model = model.to(device).eval()
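One detail worth knowing before prompting: the instruction-tuned ("-it") variants are trained on chat-style turns, and the "Gemma in PyTorch" guide wraps prompts in turn markers before calling generate. A sketch along those lines (check the guide for the exact template, which may differ slightly):

# The "-it" variants expect chat-style turn markers around the prompt.
USER_CHAT_TEMPLATE = '<start_of_turn>user\n{prompt}<end_of_turn>\n'

chat_prompt = (
    USER_CHAT_TEMPLATE.format(prompt='Tell me a fun fact about gems.')
    + '<start_of_turn>model\n'
)
print(model.generate(chat_prompt, device=device, output_len=100))

This also explains the <end_of_turn> marker that appears in the sample output below.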

Running a Sample Test

Let's input a specific prompt (a command for the LLM) and check its response. We will have the model summarize the introductory explanation from Wikipedia's article on "Artificial Intelligence."

Note: In the example below, parts of the text are omitted, but in actual execution, the full Wikipedia text is used.

prompt = """
Summarize the following text in 100 characters.  
Artificial intelligence(AI) refers to a field of computer science that studies "intelligence" using the concepts of "computation" and the tool of "computers."

  (omitted)

"We can only see a little ahead, but we know there is still much to be done."
""".strip()

# Generate sample
model.generate(
    prompt,
    device=device,
    output_len=4096,
)
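For reference, the speed comparison reported later can be reproduced by timing the same call with the standard library (a minimal sketch):

import time

start = time.time()
result = model.generate(prompt, device=device, output_len=4096)
print(result)
print(f'Generation took {time.time() - start:.1f} seconds')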

The summary result is as follows:

Gemma 2's output:

AI development is progressing rapidly, and while further advancements are highly possible, there is still no clear answer on how to predict its future evolution. <end_of_turn>

Comparison with ChatGPT-4o

We compared Gemma 2's output with that of ChatGPT-4o, a large-scale commercial model.

ChatGPT-4o's output:

"Artificial intelligence (AI) is a field of computer science that studies intelligence using computers, enabling tasks such as language understanding and reasoning. AI has historically made significant progress, but there are still challenges ahead for its future."

Comparison of Gemma 2 and ChatGPT-4o Output

Comparing the two outputs, both models produce high-quality summaries, and it is difficult to name a clear winner: each condenses the information effectively and offers useful insight. In processing speed, however, ChatGPT-4o was dramatically faster in our informal measurements below, most likely reflecting differences in the underlying hardware. Each model has its strengths, and choosing one based on the specific use case will maximize the value of the AI.

  • Gemma 2 on Colab (free T4 GPU): about 3 minutes
  • ChatGPT-4o: about 5 seconds

Practical Usability

Now that we have successfully run Gemma, let's consider its practicality for real-world applications. As an open-source LLM, it can be particularly useful in the following scenarios:

  • Reducing operational costs and enabling low-cost experimental implementations
  • Leveraging LLMs within highly secure environments without external data exposure
  • Automating and optimizing tasks in specific domains while ensuring stable operation on internal servers
  • Summarizing internal documents efficiently

However, there are some challenges to consider:

Performance

Open-source models often have slightly lower performance compared to commercial LLMs. This is particularly noticeable in generative tasks such as creative writing or code generation, where commercial models tend to produce more refined results.

Scalability

Deploying large-scale open-source LLMs in production requires significant computational resources, which can increase operational costs. Additionally, open-source models are often less optimized than commercial alternatives, requiring performance tuning for efficient use.

Domain-Specific Applications

Open-source models are typically trained on general datasets and may not be specialized for specific domains (e.g., legal or medical fields). Adapting them to specialized fields requires additional datasets and fine-tuning. In contrast, proprietary models are often pre-optimized for specific industries, incorporating tailored training data and parameter adjustments. This allows businesses and institutions to leverage high-precision models while maintaining necessary security and privacy measures.

Conclusion

This blog post demonstrated that advanced AI models like Gemma can be easily tried out on Google Colab. Beyond Gemma, there are many other open-source LLMs, such as Llama 3 and BLOOM, each offering unique strengths and capabilities for AI applications.

With the rapid advancement of AI, we are entering an era where high-performance AI tools are accessible to everyone. The democratization of AI technology enables individuals and small businesses to develop and utilize AI-powered services and tools with ease. Our company also offers AI-driven services such as AI Technical Code Review and AI Debugging, which significantly enhance development efficiency and product quality. As AI technology continues to evolve, we can expect even more innovative services that will bring new possibilities to our daily lives and business operations.

This article was originally published in Japanese on Sqripts and has been translated with minor edits for clarity.
