
January 26, 2024

How to Build an LLM Model Using Google Gemini API

Author: Edwin Lisowski, CSO & Co-Founder

Reading time: 11 minutes


Multimodal AI is revolutionizing the AI landscape. In just a few years, the world has shifted from simple, analytics-based AI models that once seemed to be the epitome of technological advancement to large language models (LLMs) that can handle a myriad of tasks, sometimes even rivaling human creativity. What’s even more impressive is that these models can streamline the development of advanced AI applications.

Take the Gemini API, for instance. Developed with multimodal capabilities, the LLM can handle various data types, facilitating the development of more advanced, multimodal large language models that will shape the future of AI utilization across the board.

This article will help you create unique large language models using the Gemini API. But first, here’s some helpful information to get you started.


What is Gemini AI?

Gemini is the latest addition to Google DeepMind’s family of LLMs. It is significantly larger and better performing than the company’s previous flagship model, PaLM, which lagged behind comparable models in its category.

That said, Gemini boasts a wide range of capabilities, most notable of which is its multimodality, which enables it to handle various data types, including text, images, audio, and video. It also performs quite well in tasks related to physics, math, code, and other technical fields. In fact, Gemini outperformed OpenAI’s GPT-4 in these and several other areas. [1]

Gemini is currently available through integrations with Google’s Pixel 8, Google Bard, and the Gemini API. According to Google, the company plans to make it available on other Google service platforms in the near future.

Besides its multimodality, the other thing that sets Gemini AI apart from other similar models is its flexibility and scalability. It comes in three sizes, enabling seamless utilization in several platforms and architectures, including data centers and mobile devices. The various sizes or iterations include:

Google Gemini model sizes. Image credits: Google

Gemini Nano

The Nano model is the smallest in the Gemini series. It is designed to run on smartphones like the Google Pixel 8, where it can perform several on-device tasks that don’t require a connection to external servers. Currently, the Nano model can perform simple tasks such as text summarization and suggesting replies in chat applications. [2]

Gemini Pro

Gemini Pro is much larger than the Nano model. As such, it cannot run on-device; instead, it runs in Google’s data centers and is currently integrated into Google Bard. Its more complex architecture and larger training dataset enable it to handle tasks that require understanding complex queries and fast response times.

Gemini Ultra

Gemini Ultra is still under development and thus not available for public use. However, Google describes it as its most capable model, exceeding the current state of the art (SoTA) on 30 of the 32 academic benchmarks widely used in LLM research and development. [3]

According to Google, Gemini Ultra is designed for widespread use in complex tasks. The model is scheduled for release after the completion of its current testing phase. The company has not yet announced a release date, but it is expected soon.

Read more about: Google Gemini: How Can It Be Used?

Getting started with Gemini

The flexibility and performance provided by Gemini’s multimodal capabilities make it one of the most versatile tools for building Large Language Models. In the sections below, we will explore how you can develop a unique LLM using the Gemini API platform, including Langchain interactions and how to leverage the platform effectively for various use cases.

What you need to start with Gemini:

  1. Google API key
  2. Python (version 3.10 or higher)
  3. Python package installer (pip)
  4. A code editor of your choice, such as PyCharm or VS Code

Building LLMs with Gemini API

The Gemini Pro API is set up in a way that enables you to build deployable LLMs seamlessly in just a few simple steps. Here’s how to do it:

Step 1: Set Up Your Development Environment

Like with any other AI project, you first need to create a new directory for your project, then navigate to it using your command prompt or terminal as shown below.

mkdir LLM_Project
cd LLM_Project

Step 2: Install Dependencies

Once you have your development environment up and running, you need to set up the dependencies you need to develop your LLM. These dependencies include:

  • Google’s google-generativeai library for interacting with the Gemini Pro API
  • The langchain-google-genai library, which adds support for the Gemini LLM
  • The Streamlit web framework for creating a chat interface for the model

You can install all of them with a single command:

pip install google-generativeai langchain-google-genai streamlit pillow

Alternatively, you can create a virtual environment to install the required libraries and manage your project dependencies as shown below:

python -m venv venv
source venv/bin/activate  # Linux/macOS
venv\Scripts\activate     # Windows

Whichever installation method you choose, create an application file (app.py) in your code editor; you can then run the application locally with Streamlit, as shown below, before proceeding to step 3.
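Launching the app locally then takes a single command (this assumes your application file is named app.py):

streamlit run app.py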

Step 3: Configure the Free API Key and Initialize the Gemini Model

Google currently offers a free API key through Google MakerSuite. After obtaining it, store it in an environment variable; in this example, we’ll name it “GOOGLE_API_KEY”. To work with the Gemini model, import Google’s genai library, configure it with the API key, and initialize the model. Here’s how to do it:

import os
import google.generativeai as genai

os.environ['GOOGLE_API_KEY'] = "Your API Key"
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-pro')

Generating text with Gemini LLM

Once you have your Gemini LLM model up and running, you can start utilizing it for text generation. For this, you first need to import the Markdown class from IPython. Importing the Markdown class helps display generated output in a Markdown format.[5]

Additionally, to create a working model, you also need to ‘call’ the GenerativeModel class from genai, then the GenerativeModel.generate_content() function to facilitate the processing of user queries so that the model can generate an appropriate response.

Here’s how to do it:

from IPython.display import Markdown
model = genai.GenerativeModel('gemini-pro')
response = model.generate_content("List 5 planets each with an interesting fact")
Markdown(response.text)

There are currently two models available in the Gemini LLM series:

1. Gemini Pro

Gemini Pro is a unimodal, text-based model. This means that it can only process text inputs and generate output in textual format. The model has an input context length of 30,720 tokens and can generate an output of up to 2,048 tokens, making it suitable for creating chat applications.

2. Gemini Pro Vision

The Gemini Pro Vision model is a multimodal model capable of processing both text and image inputs. [6] It closely resembles OpenAI’s GPT-4 with vision, with the most significant difference lying in their context lengths. Gemini Pro Vision has an input context length of 12,288 tokens and can generate an output of up to 4,096 tokens.
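If you want to verify these limits for the library version you’re using, the google-generativeai package can list each model’s advertised token limits. A minimal sketch (assuming the API key is already configured as in step 3):

import google.generativeai as genai

# Print the advertised input/output token limits for each available model
for m in genai.list_models():
    if 'generateContent' in m.supported_generation_methods:
        print(m.name, m.input_token_limit, m.output_token_limit)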

Depending on your specific use case, you can ask the model (Gemini Pro) to generate text by posing a simple question such as ‘Name the eight planets of the solar system’, or, with the vision model, input an image and ask questions about it. For instance, you can input an image of a cat and ask the model to recognize it, then take your queries a step further by asking it to elaborate on its answer.

Using Gemini to generate multiple responses from a single prompt

The Gemini API does not refer to its responses as outputs; instead, it uses the term candidates. According to Google, the model can generate multiple candidates from a single prompt, giving you the chance to choose the best answer from several options.

Unfortunately, this capability has not yet been enabled in the API, which means you can only generate a single candidate per query. On the upside, upcoming updates may allow multiple responses from a single query.
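You can still see the candidate structure on the response object itself. A short sketch, reusing the model from earlier:

# The response wraps a list of candidates; for now it contains exactly one
response = model.generate_content("Suggest three names for a space-themed cafe")
for candidate in response.candidates:
    print(candidate.content.parts[0].text)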

Exploring Gemini’s multimodal capabilities

As a multimodal model, the Gemini LLM can process and analyze different types of input data. In this instance, we’ll test its chat function with an image query. That said, it is important to note that only the Gemini Pro Vision model can handle image inputs.

Here’s an example of how you can use the Gemini LLM to process and analyze an image input.

  1. Use the PIL library to load an image.
  2. Create a new vision model, ‘gemini-pro-vision’, using the GenerativeModel class.
  3. Using the GenerativeModel.generate_content() function, pass in the image along with the necessary textual prompt.

Here’s a simple code for this operation:

import PIL.Image
image = PIL.Image.open('random_image.jpg')
vision_model = genai.GenerativeModel('gemini-pro-vision')
response = vision_model.generate_content(["Write a 100-word story from the picture", image])
Markdown(response.text)

The code above prompts the model to generate a short story based on the image. However, you can take it a step further by prompting the model to return a JavaScript Object Notation (JSON) response.

In this instance, the model will inspect the objects in the image and deliver an appropriate response. For example, if you input an image of a busy street, the model may be able to name the objects in the image, including people, vehicles, and other elements.
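One simple way to nudge the vision model toward JSON is to ask for it explicitly in the prompt. The sketch below reuses vision_model and image from the example above; the suggested schema is illustrative, not an API guarantee, and since the model returns plain text the parse can fail:

import json

# Ask the vision model to describe the image as machine-readable JSON
prompt = ('List the objects you can see in this picture as a JSON array, '
          'e.g. [{"object": "car", "count": 2}]. Return only the JSON.')
response = vision_model.generate_content([prompt, image])

try:
    print(json.loads(response.text))
except json.JSONDecodeError:
    # The model strayed from pure JSON; fall back to the raw text
    print(response.text)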

Using Gemini LLM’s chat version

Besides the regular text generation model described above, the Gemini LLM also offers a dedicated chat version, touted to generate coherent, human-like responses. You can launch the chat version of Gemini Pro using the code below:

chat_model = genai.GenerativeModel('gemini-pro')
chat = chat_model.start_chat(history=[])

Note the difference in code between generating plain text and launching the chat version: instead of calling GenerativeModel.generate_content() directly, you call GenerativeModel.start_chat() to open a chat session and then send prompts with chat.send_message().

As with the GPT-4 model, you can ask the Gemini LLM to elaborate further on the response provided or tweak it in a way that suits your preference. You can also inspect chat.history to track all previous messages in the chat session.
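Here is a short sketch of such a session, including a follow-up question that relies on the stored history:

# Ask a question, then a follow-up that only makes sense in context
response = chat.send_message("Which is the largest planet in the solar system?")
print(response.text)

response = chat.send_message("How many moons does it have?")
print(response.text)

# Review the full conversation so far
for message in chat.history:
    print(message.role, ':', message.parts[0].text)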

Langchain and Gemini integration

Langchain gained widespread popularity for its integration with the OpenAI API, which facilitates the seamless development of chatbots and other LLMs. Shortly after Gemini was released, Langchain provided improvements for integrating Gemini into its platform.[7] Here’s how to get started with Gemini on the Langchain platform.

  1. Create an LLM by passing your preferred Gemini model name to the ChatGoogleGenerativeAI class
  2. Call the invoke() function with your prompt to generate a response
  3. Access the generated text via response.content

Here’s a simple example code for the operation:

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-pro")
response = llm.invoke("Explain Quantum Computing in 50 words?")
print(response.content)

Creating a ChatGPT Clone With Streamlit and Gemini

Building a simple ChatGPT-like application with Gemini LLM and Streamlit is pretty straightforward. Here’s how to do it:

  1. Import the streamlit, google.generativeai, and os libraries
  2. Set up your Google Gemini API key and configure the library to interact with the Gemini Pro model
  3. Create a GenerativeModel object for Gemini Pro
  4. Initialize chat session history so that you can store and load chat conversations
  5. Create a chat_input widget to provide a ‘window’ where you can type queries (see the sketch below)
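Here is a minimal sketch of such an app. Save it as app.py and launch it with streamlit run app.py; it assumes GOOGLE_API_KEY is already set in your environment as in step 3:

import os

import google.generativeai as genai
import streamlit as st

# Configure the Gemini model with the API key from the environment
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-pro')

st.title("Gemini Chat")

# Initialize chat history so conversations persist across Streamlit reruns
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay previous messages
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# The chat_input widget is the 'window' where you type queries
if prompt := st.chat_input("Ask Gemini something"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    response = model.generate_content(prompt)
    st.session_state.messages.append({"role": "assistant", "content": response.text})
    with st.chat_message("assistant"):
        st.markdown(response.text)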

Final thoughts

The Gemini PRO API is poised to revolutionize the AI landscape. Its multimodal capabilities make it perfectly suited for developing advanced LLMs that can process and analyze various data types. Like the OpenAI API, the Gemini API has multiple integration capabilities with numerous platforms, including Streamlit and Langchain, that further streamline the model development process.

Currently, developers can only work with the Gemini Pro Vision and Gemini Pro models to develop LLMs. However, the Gemini Ultra model is scheduled for release and will present further development capabilities, perhaps even surpassing OpenAI’s API.

References

[1] Google DeepMind. Gemini. URL: https://deepmind.google/technologies/gemini/. Accessed on January 12, 2024

[2] Google Store. Gemini Nano Now Running on Pixel 8 Pro. URL: https://store.google.com/intl/en/ideas/articles/pixel-feature-drop-december-2023/. Accessed on January 12, 2024

[3] The Verge. Google’s Bard Chatbot is Getting Way Better Thanks to Gemini. URL: https://www.theverge.com/2023/12/6/23989744/google-bard-gemini-model-chatbot-ai. Accessed on January 13, 2024

[4] Google Blog. Introducing Gemini: Our Largest and Most Capable AI Model. URL: https://blog.google/technology/ai/google-gemini-ai/. Accessed on January 13, 2024

[5] Markdown Guide. Getting Started. URL: https://bit.ly/3SsfmPf. Accessed on January 13, 2024

[6] PyImageSearch. Introduction to Gemini Pro Vision. URL: https://pyimagesearch.com/2024/01/01/introduction-to-gemini-pro-vision/. Accessed on January 13, 2024

[7] Protecto.ai. This Week in AI – LangChain Integrates Gemini Pro and More. URL: https://bit.ly/3SfwnLr. Accessed on January 13, 2024


