In the rapidly evolving landscape of AI-driven applications, re-ranking has emerged as a pivotal technique for improving the precision and relevance of enterprise search results. By applying advanced machine learning models, re-ranking refines the initial search output to better match user intent and context, significantly improving the effectiveness of semantic search. This raises user satisfaction by delivering more accurate, contextually relevant results, while also improving conversion and engagement metrics.
Re-ranking also plays a crucial role in optimizing retrieval-augmented generation (RAG) pipelines, where it ensures that large language models (LLMs) work with the most pertinent, highest-quality information. This dual benefit, improving both semantic search and RAG pipelines, makes re-ranking an indispensable tool for enterprises that want to deliver superior search experiences and stay competitive.
This post uses the NVIDIA NeMo Retriever reranking NIM. It is a LoRA fine-tuned version of Mistral-7B that uses only the first 16 layers of the model for higher throughput. The last embedding output by the decoder model serves as the pooling strategy, and a binary classification head is fine-tuned for the ranking task.
What is re-ranking?
Re-ranking is a technique that uses the advanced language understanding of LLMs to improve the relevance of search results.
Initially, a set of candidate documents or passages is retrieved with traditional information retrieval methods such as BM25 or vector similarity search. These candidates are then fed to an LLM, which analyzes the semantic relevance between the query and each document. The LLM assigns relevance scores, so the documents can be re-ordered to put the most pertinent ones first.
This process significantly improves the quality of search results because it goes beyond simple keyword matching to understand the context and meaning of both the query and the documents. Re-ranking is typically used as a second stage after a fast initial retrieval step, ensuring that only the most relevant documents are presented to the user. It can also merge results from multiple data sources and integrate into a RAG pipeline, further ensuring that the context is ideally tuned for the specific query.
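Conceptually, a re-ranker is a scoring function applied to every (query, passage) pair, followed by a sort. The sketch below only illustrates that idea; score_pair is a hypothetical placeholder for whatever cross-encoder or LLM-based scorer is used, not the NIM's actual implementation.
from typing import Callable, List

def rerank(query: str, candidates: List[str],
           score_pair: Callable[[str, str], float], top_n: int = 5) -> List[str]:
    # Score each candidate passage against the query; a higher score means the
    # passage is more likely to contain the answer. Keep the best top_n.
    scored = [(score_pair(query, passage), passage) for passage in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:top_n]]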
To access the NVIDIA NeMo Retriever collection of world-class information retrieval microservices, see the NVIDIA API Catalog.
Tutorial prerequisites
To make the best use of this tutorial, you need a basic knowledge of LLM inference pipelines.
Setup
To get started, create a free account with the NVIDIA API Catalog and follow these steps:
Select any model.
Select Python, Get API Key.
Save the generated key as NVIDIA_API_KEY.
From there, you should have access to the endpoints.
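With the key saved, one common way to make it available to the LangChain clients used below is through the NVIDIA_API_KEY environment variable. The following snippet is a minimal sketch of that setup, not part of the original walkthrough:
import os
from getpass import getpass

# Prompt for the key generated in the API Catalog so it never gets hard-coded;
# the NVIDIA LangChain integrations read it from NVIDIA_API_KEY.
if not os.environ.get("NVIDIA_API_KEY"):
    os.environ["NVIDIA_API_KEY"] = getpass("Enter your NVIDIA API key: ")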
Now install LangChain, NVIDIA AI Endpoints, and FAISS:
pip install langchain
pip install langchain_nvidia_ai_endpoints
pip install faiss-gpu
Load relevant documents
This example uses a recent NVIDIA paper about multimodal LLMs, VILA: On Pre-training for Visual Language Models. The single PDF is used for all the examples in this post, but the code can easily be extended to load multiple documents.
from langchain_community.document_loaders import PyPDFLoader
document = PyPDFLoader("2312.07533v4.pdf").load()
Split into chunks
Next, split the documents into separate chunks.
Pay close attention to the chunk_size parameter in TextSplitter. Setting the right chunk size is critical for RAG performance, because much of a RAG pipeline's success depends on the retrieval step finding the right context for generation. The retrieval step typically examines smaller chunks of the original text rather than entire documents.
The entire prompt (the retrieved chunks plus the user query) must fit within the LLM's context window. Chunks that are too large may not fit alongside the query, while chunks that are too small may not carry enough context to be useful. The best chunk size depends on the content and the LLM; as a rule of thumb, 100-600 tokens per chunk works well for many models.
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=200)
texts = text_splitter.split_documents(document)
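As an optional sanity check on the chunk_size choice (not part of the original walkthrough), it helps to look at how many chunks were produced and how long they are. RecursiveCharacterTextSplitter measures chunk_size in characters, so the lengths reported below should stay at or under 800.
# Quick inspection of the chunking result: chunk count and character lengths.
lengths = [len(chunk.page_content) for chunk in texts]
print(f"{len(texts)} chunks; min/avg/max length: "
      f"{min(lengths)}/{sum(lengths) // len(lengths)}/{max(lengths)} characters")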

Generate embeddings
Next, generate embeddings for the document chunks by using NVIDIA AI Foundation Endpoints for LangChain, which call the hosted embedding model's /embed endpoint, and save them to an offline vector store for later retrieval.
This example stores the embeddings in FAISS, a library for efficient similarity search and clustering of dense vectors. It contains algorithms that can search sets of vectors of any size, up to collections that may not fit in RAM.
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
from langchain_community.vectorstores import FAISS
embeddings = NVIDIAEmbeddings()
db = FAISS.from_documents(texts, embeddings)
Create a basic retriever
Now create a basic retriever over the vector store and search for the chunks most relevant to the query. Without any re-ranking, this code returns the 45 chunks that are most semantically relevant to the query:
retriever = db.as_retriever(search_kwargs={"k": 45})
query = "Where is the A100 GPU used?"
docs = retriever.invoke(query)
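Before adding re-ranking, it can be useful to glance at what plain vector similarity search returns. This optional snippet (not part of the original walkthrough) prints the first few retrieved chunks:
# Inspect the top few chunks returned by vector similarity search alone.
for i, doc in enumerate(docs[:3]):
    print(f"--- chunk {i} ---")
    print(doc.page_content[:300])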
Add a re-ranking step
Now add a re-ranking step with the NeMo Retriever reranking NIM. It is a GPU-accelerated microservice optimized for estimating the probability that a given passage contains the information needed to answer a question. Re-ranking ensures that the chunks most likely to answer the query rise to the top of the retrieved results.
The NIM has out-of-the-box LangChain support, so it can be dropped into the retrieval pipeline built in the previous steps with only a few lines of code.
from langchain_nvidia_ai_endpoints import NVIDIARerank
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
reranker = NVIDIARerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=reranker, base_retriever=retriever
)
reranked_chunks = compression_retriever.invoke(query)
The reranking NIM moves the chunk most relevant to the query to the top of the results; that chunk comes from the paper's training-cost section and mentions the A100 GPU as follows:
Table 10. The SFT blend we used during the ablation study.
B. Training Cost
We perform training on 16 A100 GPU nodes, each node
has 8 GPUs. The training hours for each stage of the 7B
model are: projector initialization: 4 hours; visual language
pre-training: 30 hours; visual instruction-tuning: 6 hours.
The training corresponds to a total of 5.1k GPU hours. Most
of the computation is spent on the pre-training stage.
We have not performed training throughput optimizations
like sample packing [ 32] or sample length clustering. We
believe we can reduce at least 30% of the training time with
proper optimization. We also notice that the training time is
much longer as we used a high image resolution of 336 ×336
(corresponding to 576 tokens/image). We should be able to
Combine results from multiple data sources
Beyond improving accuracy for a single data source, re-ranking can be used to combine chunks from multiple data sources in a RAG pipeline.
Consider a pipeline that draws on data from a semantic store, such as the one in the earlier example, as well as a BM25 store. Each store is queried independently and returns the results that it considers highly relevant. Working out the overall relevance of the combined results is where re-ranking comes into play.
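The bm25_docs used in the next code example can come from any keyword-based store. As one possible setup (an assumption here, not shown in the original example), LangChain's BM25Retriever from langchain_community can be built over the same chunks; it requires the rank_bm25 package:
# One possible way to produce bm25_docs: a BM25 retriever over the same chunks.
# Requires: pip install rank_bm25
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(texts)
bm25_retriever.k = 45                     # return the 45 best keyword matches
bm25_docs = bm25_retriever.invoke(query)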
The following code example combines the results of the semantic search and the BM25 search. The results in combined_docs are ordered by the reranking NIM according to their relevance to the query.
all_docs = docs + bm25_docs
reranker.top_n = 5
combined_docs = reranker.compress_documents(query=query, documents=all_docs)
For details on setting up the BM25 store, see the complete notebook in the /langchain-ai/langchain-nvidia GitHub repo.
Connect to a RAG pipeline
In addition to using re-ranking on its own, you can add it to a RAG pipeline to further improve responses by making sure that they are grounded in the chunks most relevant to the original query.

In this case, connect the compression_retriever object from the previous step to the RAG pipeline.
from langchain.chains import RetrievalQA
from langchain_nvidia_ai_endpoints import ChatNVIDIA
chain = RetrievalQA.from_chain_type(
    llm=ChatNVIDIA(temperature=0), retriever=compression_retriever
)
result = chain({"query": query})
print(result.get("result"))
The RAG pipeline now uses the correct top-ranked chunk and summarizes the main insights in its answer:
The A100 GPU is used for training the 7B model in the supervised
fine-tuning/instruction tuning ablation study. The training is
performed on 16 A100 GPU nodes, with each node having 8 GPUs. The
training hours for each stage of the 7B model are: projector
initialization: 4 hours; visual language pre-training: 30 hours;
and visual instruction-tuning: 6 hours. The total training time
corresponds to 5.1k GPU hours, with most of the computation being
spent on the pre-training stage. The training time could potentially
be reduced by at least 30% with proper optimization. The high image
resolution of 336 ×336 used in the training corresponds to 576
tokens/image.
Conclusion
RAG has emerged as a powerful approach that combines the strengths of LLMs and dense vector representations. By using dense vector representations, RAG models can scale efficiently, making them well suited to large-scale enterprise applications such as multilingual customer-service chatbots and code-generation agents.
As LLMs continue to evolve, it is clear that RAG will play an increasingly important role in driving innovation and delivering intelligent systems that can understand and generate human-like language.
When building your own RAG pipeline, it is important to split the vector store documents into chunks correctly by optimizing the chunk size for your content, and to select an LLM with a suitable context length. In some cases, complex chains of multiple LLMs may be required. To optimize RAG performance and measure its success, use a collection of robust evaluators and metrics.
For more information about additional models and chains, see NVIDIA AI LangChain endpoints.