In the rapidly evolving landscape of AI-driven applications, re-ranking has emerged as a pivotal technique for improving the precision and relevance of enterprise search results. By applying advanced machine learning models, re-ranking refines the initial search output to better match user intent and context, significantly improving the effectiveness of semantic search. This raises user satisfaction by delivering more accurate, contextually relevant results, while also improving conversion and engagement metrics.
Re-ranking also plays a crucial role in optimizing retrieval-augmented generation (RAG) pipelines, where it ensures that large language models (LLMs) work with the most pertinent, highest-quality information. This dual benefit, improving both semantic search and RAG pipelines, makes re-ranking an indispensable tool for enterprises that want to deliver superior search experiences and stay competitive.
This post uses the NVIDIA NeMo Retriever reranking NIM. It is a LoRA fine-tuned version of Mistral-7B that uses only the first 16 layers of the model for higher throughput. The last embedding output by the decoder model serves as the pooling strategy, and a binary classification head is fine-tuned for the ranking task.
What is re-ranking?
Re-ranking is a technique that uses the advanced language understanding of LLMs to improve the relevance of search results.
Initially, a set of candidate documents or passages is retrieved with traditional information retrieval methods such as BM25 or vector similarity search. These candidates are then fed to an LLM, which analyzes the semantic relevance between the query and each document. The LLM assigns relevance scores, so the documents can be re-ordered to put the most pertinent ones first.
This process significantly improves the quality of search results because it goes beyond simple keyword matching to understand the context and meaning of both the query and the documents. Re-ranking is typically used as a second stage after a fast initial retrieval step, ensuring that only the most relevant documents are presented to the user. It can also merge results from multiple data sources and integrate into a RAG pipeline, further ensuring that the context is ideally tuned for the specific query.
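Conceptually, a re-ranker is a scoring function applied to every (query, passage) pair, followed by a sort. The sketch below only illustrates that idea; score_pair is a hypothetical placeholder for whatever cross-encoder or LLM-based scorer is used, not the NIM's actual implementation.
from typing import Callable, List

def rerank(query: str, candidates: List[str],
           score_pair: Callable[[str, str], float], top_n: int = 5) -> List[str]:
    # Score each candidate passage against the query; a higher score means the
    # passage is more likely to contain the answer. Keep the best top_n.
    scored = [(score_pair(query, passage), passage) for passage in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:top_n]]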
To access the NVIDIA NeMo Retriever collection of world-class information retrieval microservices, see the NVIDIA API Catalog.
Tutorial prerequisites
To make the best use of this tutorial, you need a basic knowledge of LLM inference pipelines.
Setup
To get started, create a free account with the NVIDIA API Catalog and follow these steps:
Select any model.
Select Python, Get API Key.
Save the generated key as NVIDIA_API_KEY.
From there, you should have access to the endpoints.
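With the key saved, one common way to make it available to the LangChain clients used below is through the NVIDIA_API_KEY environment variable. The following snippet is a minimal sketch of that setup, not part of the original walkthrough:
import os
from getpass import getpass

# Prompt for the key generated in the API Catalog so it never gets hard-coded;
# the NVIDIA LangChain integrations read it from NVIDIA_API_KEY.
if not os.environ.get("NVIDIA_API_KEY"):
    os.environ["NVIDIA_API_KEY"] = getpass("Enter your NVIDIA API key: ")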
Now install LangChain, NVIDIA AI Endpoints, and FAISS:
pip install langchain
pip install langchain_nvidia_ai_endpoints
pip install faiss-gpu
Load relevant documents
This example uses a recent NVIDIA paper about multimodal LLMs, VILA: On Pre-training for Visual Language Models. The single PDF is used for all the examples in this post, but the code can easily be extended to load multiple documents.
from langchain_community.document_loaders import PyPDFLoader
document = PyPDFLoader("2312.07533v4.pdf").load()
Split into chunks
Next, split the documents into separate chunks.
Pay close attention to the chunk_size parameter in TextSplitter. Setting the right chunk size is critical for RAG performance, because much of a RAG pipeline's success depends on the retrieval step finding the right context for generation. The retrieval step typically examines smaller chunks of the original text rather than entire documents.
The entire prompt (the retrieved chunks plus the user query) must fit within the LLM's context window. Chunks that are too large may not fit alongside the query, while chunks that are too small may not carry enough context to be useful. The best chunk size depends on the content and the LLM; as a rule of thumb, 100-600 tokens per chunk works well for many models.
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=200)
texts = text_splitter.split_documents(document)
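As an optional sanity check on the chunk_size choice (not part of the original walkthrough), it helps to look at how many chunks were produced and how long they are. RecursiveCharacterTextSplitter measures chunk_size in characters, so the lengths reported below should stay at or under 800.
# Quick inspection of the chunking result: chunk count and character lengths.
lengths = [len(chunk.page_content) for chunk in texts]
print(f"{len(texts)} chunks; min/avg/max length: "
      f"{min(lengths)}/{sum(lengths) // len(lengths)}/{max(lengths)} characters")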

Generate embeddings
Next, generate embeddings for the document chunks by using NVIDIA AI Foundation Endpoints for LangChain, which call the hosted embedding model's /embed endpoint, and save them to an offline vector store for later retrieval.
This example stores the embeddings in FAISS, a library for efficient similarity search and clustering of dense vectors. It contains algorithms that can search sets of vectors of any size, up to collections that may not fit in RAM.
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
from langchain_community.vectorstores import FAISS
embeddings = NVIDIAEmbeddings()
db = FAISS.from_documents(texts, embeddings)
Create a basic retriever
Now create a basic retriever over the vector store and search for the chunks most relevant to the query. Without any re-ranking, this code returns the 45 chunks that are most semantically relevant to the query:
retriever = db.as_retriever(search_kwargs={"k": 45})
query = "Where is the A100 GPU used?"
docs = retriever.invoke(query)
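Before adding re-ranking, it can be useful to glance at what plain vector similarity search returns. This optional snippet (not part of the original walkthrough) prints the first few retrieved chunks:
# Inspect the top few chunks returned by vector similarity search alone.
for i, doc in enumerate(docs[:3]):
    print(f"--- chunk {i} ---")
    print(doc.page_content[:300])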
Add a re-ranking step
Now add a re-ranking step with the NeMo Retriever reranking NIM. It is a GPU-accelerated microservice optimized for estimating the probability that a given passage contains the information needed to answer a question. Re-ranking ensures that the chunks most likely to answer the query rise to the top of the retrieved results.
The NIM has out-of-the-box LangChain support, so it can be dropped into the retrieval pipeline built in the previous steps with only a few lines of code.
from langchain_nvidia_ai_endpoints import NVIDIARerank
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
reranker = NVIDIARerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=reranker, base_retriever=retriever
)
reranked_chunks = compression_retriever.invoke(query)
The reranking NIM moves the chunk most relevant to the query to the top of the results; that chunk comes from the paper's training-cost section and mentions the A100 GPU as follows:
Table 10. The SFT blend we used during the ablation study.
B. Training Cost
We perform training on 16 A100 GPU nodes, each node
has 8 GPUs. The training hours for each stage of the 7B
model are: projector initialization: 4 hours; visual language
pre-training: 30 hours; visual instruction-tuning: 6 hours.
The training corresponds to a total of 5.1k GPU hours. Most
of the computation is spent on the pre-training stage.
We have not performed training throughput optimizations
like sample packing [ 32] or sample length clustering. We
believe we can reduce at least 30% of the training time with
proper optimization. We also notice that the training time is
much longer as we used a high image resolution of 336 ×336
(corresponding to 576 tokens/image). We should be able to
Combine results from multiple data sources
Beyond improving accuracy for a single data source, re-ranking can be used to combine chunks from multiple data sources in a RAG pipeline.
Consider a pipeline that draws on data from a semantic store, such as the one in the earlier example, as well as a BM25 store. Each store is queried independently and returns the results that it considers highly relevant. Working out the overall relevance of the combined results is where re-ranking comes into play.
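The bm25_docs used in the next code example can come from any keyword-based store. As one possible setup (an assumption here, not shown in the original example), LangChain's BM25Retriever from langchain_community can be built over the same chunks; it requires the rank_bm25 package:
# One possible way to produce bm25_docs: a BM25 retriever over the same chunks.
# Requires: pip install rank_bm25
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(texts)
bm25_retriever.k = 45                     # return the 45 best keyword matches
bm25_docs = bm25_retriever.invoke(query)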
The following code example combines the results of the semantic search and the BM25 search. The results in combined_docs are ordered by the reranking NIM according to their relevance to the query.
all_docs = docs + bm25_docs
reranker.top_n = 5
combined_docs = reranker.compress_documents(query=query, documents=all_docs)
For details on setting up the BM25 store, see the complete notebook in the /langchain-ai/langchain-nvidia GitHub repo.
Connect to a RAG pipeline
In addition to using re-ranking on its own, you can add it to a RAG pipeline to further improve responses by making sure that they are grounded in the chunks most relevant to the original query.

In this case, connect the compression_retriever object from the previous step to the RAG pipeline.
from langchain.chains import RetrievalQA
from langchain_nvidia_ai_endpoints import ChatNVIDIA
chain = RetrievalQA.from_chain_type(
    llm=ChatNVIDIA(temperature=0), retriever=compression_retriever
)
result = chain({"query": query})
print(result.get("result"))
The RAG pipeline now uses the correct top-ranked chunk and summarizes the main insights in its answer:
The A100 GPU is used for training the 7B model in the supervised
fine-tuning/instruction tuning ablation study. The training is
performed on 16 A100 GPU nodes, with each node having 8 GPUs. The
training hours for each stage of the 7B model are: projector
initialization: 4 hours; visual language pre-training: 30 hours;
and visual instruction-tuning: 6 hours. The total training time
corresponds to 5.1k GPU hours, with most of the computation being
spent on the pre-training stage. The training time could potentially
be reduced by at least 30% with proper optimization. The high image
resolution of 336 ×336 used in the training corresponds to 576
tokens/image.
Conclusion
RAG has emerged as a powerful approach that combines the strengths of LLMs and dense vector representations. By using dense vector representations, RAG models can scale efficiently, making them well suited to large-scale enterprise applications such as multilingual customer-service chatbots and code-generation agents.
As LLMs continue to evolve, it is clear that RAG will play an increasingly important role in driving innovation and delivering intelligent systems that can understand and generate human-like language.
When building your own RAG pipeline, it is important to split the vector store documents into chunks correctly by optimizing the chunk size for your content, and to select an LLM with a suitable context length. In some cases, complex chains of multiple LLMs may be required. To optimize RAG performance and measure its success, use a collection of robust evaluators and metrics.
For more information about additional models and chains, see NVIDIA AI LangChain endpoints.