Generate Code, Answer Queries, and Translate Text with New NVIDIA AI Foundation Models

This week's Model Monday release features the NVIDIA-optimized Code Llama, Kosmos-2, and SeamlessM4T models, which you can experience directly from your browser.

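With NVIDIA AI Foundation Models and Endpoints, you can access a curated set of community and NVIDIA-built generative AI models to experience, customize, and deploy in enterprise applications.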

Code Llama 70B

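Meta's Code Llama 70B is the latest state-of-the-art code LLM, specialized for code generation. It builds on the Llama 2 model and offers improved performance and adaptability. The model can generate code from natural language, translate code between programming languages, write unit tests, and assist with debugging.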

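Code Llama 70B has a large context length of 100K tokens. This lets it process and generate longer, more complex code, which matters for more comprehensive code generation and for better performance on complex coding tasks. The open-source model can be used for a variety of applications, including code translation, summarization, documentation, analysis, and debugging.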
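The following is a rough illustration of working within the 100K-token context window. The sketch builds a chat-style request for a code-translation task and runs a crude size check before the request is sent. It rests on two assumptions. First, it assumes the Code Llama 70B endpoint accepts the same `messages`, `temperature`, `top_p`, and `max_tokens` fields as the Kosmos-2 request shown later in this post. Second, it uses a hypothetical rule of thumb of about four characters per token; real token counts depend on the model's tokenizer. The helper names are illustrative.

```python
# Hypothetical sketch: assumes a chat-style payload like the Kosmos-2 request
# in this post; the field names for Code Llama 70B are an assumption.
CONTEXT_TOKENS = 100_000   # context length stated for Code Llama 70B
CHARS_PER_TOKEN = 4        # crude heuristic, not the model's real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def build_code_translation_payload(source_code: str, src_lang: str,
                                   dst_lang: str, max_tokens: int = 1024) -> dict:
    """Build a chat request asking the model to translate code between languages."""
    prompt = (
        f"Translate the following {src_lang} code to {dst_lang}.\n\n"
        f"{source_code}"
    )
    # Reserve room for the generated output inside the context window
    if estimate_tokens(prompt) + max_tokens > CONTEXT_TOKENS:
        raise ValueError("Prompt is likely too long for the model's context window")
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
        "top_p": 0.7,
        "max_tokens": max_tokens,
    }


code_payload = build_code_translation_payload("def add(a, b):\n    return a + b",
                                              "Python", "Rust")
print(code_payload["messages"][0]["content"])
```

The character-based estimate is only a guard against obviously oversized inputs. For precise budgeting, count tokens with the model's own tokenizer.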

The Code Llama 70B model, optimized with NVIDIA TensorRT-LLM, is available from the NGC catalog.

Screenshot of the Code Llama 70B model running in the NGC catalog user interface, where the model understands natural language and generates code that can help software developers increase productivity. — *Figure 1. The Code Llama 70B model generating code from natural language*

Kosmos-2

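Kosmos-2, the latest multimodal large language model (MLLM) from Microsoft Research, brings a significant advance in visual perception with language models. It links language elements, such as words or phrases in the input or output, to specific parts of an image using bounding boxes. This enables tasks such as visual grounding, grounded question answering, multimodal referring, and image captioning.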

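Kosmos-2 builds on Kosmos-1, which supports perceiving multimodal input and in-context learning. Kosmos-2 is trained on GrIT, a web-scale dataset of grounded image-text pairs. GrIT includes text spans and bounding boxes that link specific regions in an image to the relevant text. Figure 2 shows these capabilities.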

The Kosmos-2 model running in the NGC catalog user interface, where the model identifies various objects in the image, places bounding boxes around them, and then generates appropriate responses for the user prompts. — *Figure 2. The Kosmos-2 model adding bounding boxes to an image and generating responses to prompts*

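Compared with previous MLLMs aimed at similar goals, Kosmos-2 performs strongly on zero-shot phrase grounding and referring expression comprehension on popular academic benchmark datasets. If you are an AI developer looking to push the boundaries of multimodal perception with large language models (LLMs), Kosmos-2 is a strong option to explore.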

SeamlessM4T

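SeamlessM4T, developed by Meta, is a multimodal foundation model that translates both speech and text. It simplifies the process of overcoming communication barriers for enterprises. This fosters knowledge exchange and supports international business efforts in today's global economy.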

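The model family supports automatic speech recognition (ASR), speech-to-text translation, and text-to-text translation for nearly 100 languages. The models also support code-switching, so multilingual speakers can communicate seamlessly when they naturally change languages during a conversation.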

NVIDIA optimized the SeamlessM4T text-to-text model. Figure 3 shows the model translating user input into Spanish.

Screenshot of the SeamlessM4T model running in the NGC catalog user interface, where the model translates text to different languages. Here, it is translating user input from English to Spanish. — *Figure 3. Example of the SeamlessM4T model translating user input from English to Spanish*

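The model has many enterprise use cases, including enabling seamless interactions with international customers and partners. In customer service, real-time translation of customer queries and support responses ensures effective communication across language barriers. Translation also supports collaboration among global teams working on shared projects.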

Kosmos-2 user interface

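You can experience Kosmos-2 directly in your browser through a simple user interface in the NVIDIA NGC catalog. Find the Kosmos-2 playground in the NGC catalog, enter your prompt, and see the results generated by the model running on a fully accelerated stack. Video 1 shows the NVIDIA AI Foundation Models interface using Kosmos-2 to answer user prompts about an image.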

Video 1. Kosmos-2 running in the NVIDIA AI Foundation Models playground

Kosmos-2 API

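You can also test the model through the API. Sign in to the NGC catalog and use NVIDIA cloud credits to experience the models at scale by connecting your application to the API endpoint.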

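The following example uses Python to call the API and visualize the results. Make sure you have a virtual environment with Jupyter Notebook set up. If needed, install the requests, Pillow (PIL), and IPython modules with pip: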

!pip install requests ipython pillow

Step 1: Obtain an NGC catalog API key

On the NGC catalog API tab, select Generate Key. You will be prompted to sign up or sign in.

Image showing how to navigate to the API tab in the Playground view. — *Figure 4. Generating the Kosmos-2 API key*

Next, set the API key in your code:

# Will be used to issue requests to the endpoint
API_KEY = "nvapi-xxxx"
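Hardcoding the key works for a quick test. A common alternative is to read it from an environment variable so the key stays out of the notebook. This is a minimal sketch with two hypothetical choices. The variable name `NGC_API_KEY` is illustrative, and the `nvapi-` prefix check only mirrors the placeholder shown above.

```python
import os


def load_api_key(var_name: str = "NGC_API_KEY") -> str:
    """Read the API key from an environment variable (hypothetical name)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    # The placeholder key in this post starts with "nvapi-"; warn instead of rejecting
    if not key.startswith("nvapi-"):
        print(f"Warning: {var_name} does not start with 'nvapi-'")
    return key


os.environ["NGC_API_KEY"] = "nvapi-xxxx"  # demo only; set this in your shell instead
API_KEY = load_api_key()
```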

Step 2: Encode the image in Base64 format

To provide an image as input in your request, you must encode it in Base64 format. This example uses an image from the COYO-700M image-text pair dataset.

import os
import base64

# Fetch an example image
!wget -cO - https://www.boredart.com//wp-content/uploads/2014/06/Beautiful-Pictures-From-the-Shores-of-the-Mythical-Land-421.jpg > scenery.png

# Encode the image in base64
with open(os.path.join(os.getcwd(), "scenery.png"), "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read())

# Optionally, visualize the image
from IPython import display
display.Image(base64.b64decode(encoded_string))
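The URL above points to a `.jpg` file, but the file is saved as `scenery.png`. The request in the next step then labels the data as `image/png`. If you want the MIME type in the data URI to match the actual bytes, the following sketch sniffs the PNG and JPEG file signatures and builds the URI from them. The helper names are hypothetical, and only these two formats are handled. This post does not state whether the endpoint accepts `image/jpeg`, so treat that as an assumption to verify.

```python
import base64

# File signatures ("magic numbers") for the two formats handled here
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_image_mime(data: bytes) -> str:
    """Return the MIME type implied by the leading bytes of an image file."""
    if data.startswith(PNG_SIGNATURE):
        return "image/png"
    if data.startswith(JPEG_SIGNATURE):
        return "image/jpeg"
    raise ValueError("Unsupported or unrecognized image format")


def to_data_uri(data: bytes) -> str:
    """Encode raw image bytes as a Base64 data URI with a matching MIME type."""
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{sniff_image_mime(data)};base64,{encoded}"


# Usage with the downloaded file:
# with open("scenery.png", "rb") as f:
#     uri = to_data_uri(f.read())
```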

An image depicting a mythical landscape. — *Figure 5. Input image containing multiple visual objects for the Kosmos-2 model*

Step 3: Send an inference request

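The Kosmos-2 model can perform tasks such as visual grounding, grounded question answering, multimodal referring, and grounded image captioning. Special tokens in the prompt determine which task it performs. For example, the special token <grounding> tells the model to link certain phrases in the text to sections of the image. In the model's output, these phrases are marked with <phrase> tokens, and the API returns them as entities with their bounding boxes.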

import requests

invoke_url = "https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/0bcd1a8c-451f-4b12-b7f0-64b4781190d1"
fetch_url_format = "https://api.nvcf.nvidia.com/v2/nvcf/pexec/status/"

headers = {
    "Authorization": "Bearer {}".format(API_KEY),
    "Accept": "application/json",
}

payload = {
    "messages": [
        {
            # Prompt text followed by the Base64-encoded image as an inline <img> tag
            "content": "This scenery<img src=\"data:image/png;base64,{}\" />".format(encoded_string.decode('UTF-8')),
            "role": "user"
        }
    ],
    "bounding_boxes": True,
    "temperature": 0.2,
    "top_p": 0.7,
    "max_tokens": 1024
}

# Re-use connections
session = requests.Session()

response = session.post(invoke_url, headers=headers, json=payload)

# HTTP 202 means the request is still pending: poll the status endpoint
while response.status_code == 202:
    request_id = response.headers.get("NVCF-REQID")
    fetch_url = fetch_url_format + request_id
    response = session.get(fetch_url, headers=headers)

response.raise_for_status()
response_body = response.json()

response_body
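The loop above polls until the request leaves the 202 (pending) state, but it has no upper bound on the number of polls. The following sketch shows a bounded variant. The `max_polls` and `delay_s` parameters are hypothetical additions, not part of the API. The function accepts any `requests.Session`-like object, so it can be exercised without network access.

```python
import time


def wait_for_result(session, response, fetch_url_format, headers,
                    max_polls=60, delay_s=1.0):
    """Poll the status URL while the response is 202, at most max_polls times."""
    polls = 0
    while response.status_code == 202:
        if polls >= max_polls:
            raise TimeoutError("Request still pending after max_polls status checks")
        request_id = response.headers.get("NVCF-REQID")
        response = session.get(fetch_url_format + request_id, headers=headers)
        polls += 1
        if response.status_code == 202 and delay_s > 0:
            time.sleep(delay_s)  # pause briefly before polling again
    return response


# Usage, replacing the while loop above:
# response = wait_for_result(session, response, fetch_url_format, headers)
# response.raise_for_status()
```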

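In Kosmos-2, a grounded expression is represented as a Markdown-style link, [text span](bounding boxes), where the bounding boxes are given as sequences of coordinates. The API returns a response in the format shown below. It includes the output text, the bounding box coordinates for the phrases in the completion, and some additional metadata.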

{'id': 'cfbda798-7567-4409-ba55-6ba9a10294fb',
 'choices': [{'index': 0,
   'message': {'role': 'assistant',
    'content': 'is a fantasy landscape with a tree and a temple by the lake',
    'entities': [{'phrase': 'a tree',
      'bboxes': [[0.359375, 0.015625, 0.765625, 0.796875]]},
     {'phrase': 'a temple',
      'bboxes': [[0.078125, 0.421875, 0.234375, 0.890625]]},
     {'phrase': 'the lake',
      'bboxes': [[0.203125, 0.765625, 0.828125, 0.953125]]}]},
   'finish_reason': 'stop'}],
 'usage': {'completion_tokens': 32, 'prompt_tokens': 70, 'total_tokens': 102}}

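Given the image and the prompt "This scenery," the Kosmos-2 API generated the completion "is a fantasy landscape with a tree and a temple by the lake." It also generated bounding box coordinates for the phrases "a tree," "a temple," and "the lake."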
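You can relate the response back to the Markdown link representation mentioned above, in which each text span is linked to its bounding boxes. The following sketch rebuilds the completion and inserts each entity's boxes after the first occurrence of its phrase. It assumes the entities are listed in the order they appear in the text. The `[phrase](x1, y1, x2, y2)` layout is an illustrative choice; this post does not show the exact textual format the model uses internally.

```python
def to_markdown_links(message: dict) -> str:
    """Rewrite a completion so each grounded phrase becomes [phrase](box; box...)."""
    text = message["content"]
    out, cursor = [], 0
    for entity in message.get("entities", []):
        phrase = entity["phrase"]
        start = text.find(phrase, cursor)
        if start == -1:
            continue  # phrase not found after the cursor; leave the text unchanged
        boxes = "; ".join(", ".join(f"{v:g}" for v in box) for box in entity["bboxes"])
        out.append(text[cursor:start])
        out.append(f"[{phrase}]({boxes})")
        cursor = start + len(phrase)
    out.append(text[cursor:])
    return "".join(out)


# Same structure as response_body['choices'][0]['message'] in the example above
message = {
    "content": "is a fantasy landscape with a tree and a temple by the lake",
    "entities": [
        {"phrase": "a tree", "bboxes": [[0.359375, 0.015625, 0.765625, 0.796875]]},
        {"phrase": "a temple", "bboxes": [[0.078125, 0.421875, 0.234375, 0.890625]]},
        {"phrase": "the lake", "bboxes": [[0.203125, 0.765625, 0.828125, 0.953125]]},
    ],
}
print(to_markdown_links(message))
```

In a notebook, you can pass `response_body['choices'][0]['message']` directly instead of the literal dictionary.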

Step 4: Visualize the output

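If you think of the image as a 2D grid laid out on an X-Y plane, each bounding box is defined by [Top X, Top Y] (the top-left corner) and [Bottom X, Bottom Y] (the bottom-right corner). The coordinates are expressed as fractions (0 to 1) of the image's total width and height, respectively. To visualize them, render the boxes on the image: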

from PIL import Image, ImageDraw, ImageFont
from IPython.display import display

# Extract the entities from the message body.
# This includes a list of phrases and corresponding bounding boxes
data = response_body['choices'][0]['message']['entities']

# Load your image
image = Image.open(os.path.join(os.getcwd(), "scenery.png"))
draw = ImageDraw.Draw(image)
width, height = image.size

# Set font type and size for phrases
# (font_variant() needs a FreeType font; recent Pillow versions return one from load_default())
font = ImageFont.load_default().font_variant(size=10)

def get_textbox_dim(text_string, font):
    '''
    Calculate the width and height of a text string with a given font.
    '''
    _, descent = font.getmetrics()
    text_width = font.getmask(text_string).getbbox()[2]
    text_height = font.getmask(text_string).getbbox()[3] + descent

    return (text_width, text_height)

# Iterate through all entities, and draw each phrase & box
for item in data:
    phrase = item['phrase']
    for bbox in item['bboxes']:
        # Convert normalized (0-1) coordinates to pixel coordinates
        x1, y1, x2, y2 = bbox
        x1, x2 = x1 * width, x2 * width
        y1, y2 = y1 * height, y2 * height

        # Draw the bounding box for entities
        draw.rectangle([(x1, y1), (x2, y2)], outline="red", width=2)

        # Determine size of the text for background
        text_size = get_textbox_dim(phrase, font)

        # Draw text background
        draw.rectangle([x1, y1, x1 + text_size[0], y1 + text_size[1]], fill="black")

        # Draw the phrase
        draw.text((x1, y1), phrase, fill="white", font=font)

# Display the image in Jupyter Notebook
display(image)
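The coordinate conversion inside the loop above can be factored into a small helper that is easy to test on its own. This is a sketch that assumes boxes are `[x1, y1, x2, y2]` fractions of the image size, as described above. The helper name is hypothetical. Clamping to the image bounds is an extra safeguard that the original code does not include.

```python
def bbox_to_pixels(bbox, width, height):
    """Convert a normalized [x1, y1, x2, y2] box to pixel coordinates clamped to the image."""
    x1, y1, x2, y2 = bbox

    def clamp(value, upper):
        # Keep the coordinate within [0, upper]
        return min(max(value, 0.0), upper)

    return (
        clamp(x1 * width, width),
        clamp(y1 * height, height),
        clamp(x2 * width, width),
        clamp(y2 * height, height),
    )


# The "a tree" box from the response above, on a hypothetical 640x480 image
print(bbox_to_pixels([0.359375, 0.015625, 0.765625, 0.796875], 640, 480))
# → (230.0, 7.5, 490.0, 382.5)
```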

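Figure 6 visualizes the Kosmos-2 output for the given image and the prompt "This scenery." The bounding boxes highlight the entities in the completion "is a fantasy landscape with a tree and a temple by the lake." The example shows how Kosmos-2 can describe an image and link specific phrases to the corresponding regions in it.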

The figure illustrates how Kosmos-2 can link certain phrases in generated text to sections in the image. In this case, Kosmos-2 successfully identified and linked key elements in the image: a tree, a temple, and the lake. — *Figure 6. Visualized bounding boxes from the Kosmos-2 output*

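Similarly, you can enclose specific phrases in the prompt with <phrase> and </phrase> tokens. This directs Kosmos-2 to focus on those phrases and relate them to the image in comprehension or question-answering tasks. An example prompt for visual question answering is "<grounding> Question: What color are the <phrase>leaves on the tree</phrase>? Answer:", to which the model responds "red."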
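The grounded question-answering prompt above combines a task token, `<phrase>` tags, and the question text. The following helper assembles that prompt string and appends the image as an inline `<img>` data URI. It is a sketch. The token spellings follow this post. Appending the image after the text, the data-URI form of the `src` attribute, and the helper's name are all assumptions.

```python
def build_grounded_vqa_prompt(question_template: str, phrase: str,
                              image_b64: str, mime: str = "image/png") -> str:
    """Build a Kosmos-2 style grounded VQA prompt with an inline image (hypothetical helper).

    question_template must contain '{phrase}', which is replaced by the phrase
    wrapped in <phrase>...</phrase> tokens.
    """
    tagged = f"<phrase>{phrase}</phrase>"
    question = question_template.format(phrase=tagged)
    text = f"<grounding> Question: {question} Answer:"
    # Append the Base64-encoded image as an inline <img> tag
    return f'{text}<img src="data:{mime};base64,{image_b64}" />'


prompt = build_grounded_vqa_prompt("What color are the {phrase}?",
                                   "leaves on the tree", "AAAA")
print(prompt)
```

In practice, you would pass `encoded_string.decode('UTF-8')` as `image_b64` and place the resulting string in the `content` field of the request payload.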

Get started

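NVIDIA Triton Inference Server lets you run models on any GPU or CPU. It is open-source software that standardizes AI model deployment and execution across every workload. Triton is part of NVIDIA AI Enterprise, an end-to-end AI runtime software platform. NVIDIA AI Enterprise is designed to accelerate the data science pipeline and streamline the development and deployment of production-grade generative AI applications.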

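NVIDIA AI Enterprise provides security, support, stability, and manageability. These improve the productivity of AI teams, reduce the total cost of AI infrastructure, and ensure a smooth transition from proof of concept (POC) to production. Security, reliability, and enterprise support are critical once AI models are ready to be deployed for business operations.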

Try the Kosmos-2 and SeamlessM4T models through the user interface or the API. If these models fit your application, optimize them with NVIDIA TensorRT-LLM.

If you are building an enterprise application, sign up for an NVIDIA AI Enterprise trial to get the support you need to take your application to production.
