We now have all the data available in the right places:

  1. Select a Vector DB
  2. Build a RAG System
  3. Generate Schema-compliant Embeddings
  4. Build embeddings and insert into DB

I have indexed a research paper and can now ask the system a question about it. The process is fairly simple:

  1. Define the question to ask
  2. Build an embedding vector from it and query the database for the most relevant chunks
  3. Embed those chunks in the prompt as available context
  4. Ask the LLM the question using the assembled prompt

At the end, I expect an answer grounded in the retrieved data.

Setting up the question

The paper is about environment-based research, in particular simulating scenarios for potential futures. My question is therefore quite straightforward:

QUESTION = "What is the practical use of the research?"

I have the system prompt defined as follows:

SYSTEM_PROMPT = """
  Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.
"""

Generate the embedding and query the database

I already have a function called emb_text(), defined in the earlier embeddings step, so I just call it:

from embeddings import emb_text

question_embedding = emb_text(QUESTION)
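
For reference, emb_text() could look something like the sketch below. This is only an illustration assuming a sentence-transformers model; the actual implementation (and model choice) is whatever was used during indexing, because the question must be embedded with the same model as the chunks.

# Sketch only: emb_text() must use the same embedding model that was used
# to index the chunks. The model name below is a hypothetical example.
from sentence_transformers import SentenceTransformer

_model = SentenceTransformer("all-MiniLM-L6-v2")

def emb_text(text: str) -> list[float]:
    # Encode the text into a dense vector and return it as a plain list,
    # ready to be passed to the Milvus search call.
    return _model.encode(text).tolist()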

Then we run the search against the database:

from milvus_init import get_client

# Search!
client = get_client()

search_result = client.search(
    collection_name=COLLECTION_NAME,  # the collection created in the earlier indexing step
    data=[question_embedding],
    limit=7,
    search_params={"metric_type": "IP", "params": {}},  # inner product distance
    output_fields=["text", "title"],  # return the text and title fields
)

We limit the number of results to 7; you can tweak this depending on how much context you want to feed into the prompt.
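
The get_client() helper comes from the earlier setup step. As a rough sketch (assuming Milvus Lite with a local database file; a standalone server would use an http://host:19530 URI instead), it might be as simple as:

# Sketch of milvus_init.py; the real version was written in the setup step.
from pymilvus import MilvusClient

COLLECTION_NAME = "papers"  # hypothetical name; use whatever was created earlier

def get_client() -> MilvusClient:
    # Milvus Lite keeps everything in a local file; point the URI at a
    # server address if you run Milvus standalone.
    return MilvusClient(uri="./milvus.db")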

For further investigation, I'm saving the search result into a file:

import json
import pathlib

# Save the raw search results, and the retrieved chunks with their scores,
# for later inspection
pathlib.Path("embed-question.json").write_text(json.dumps(search_result))
retrieved_lines_with_distances = [
    (res["entity"]["text"]["text"], res["distance"]) for res in search_result[0]
]
pathlib.Path("embed-question-text.json").write_text(json.dumps(retrieved_lines_with_distances, indent=4))

The file looks something like this:

[
    [
        "Name: Dr. Laura Oana Petrov and Prof. Dr. Nobukazu Nakagoshi\nInstitution: Graduate School for International Development and Cooperation Division of\nDevelopment Science, Hiroshima University, Higashi\u0096Hiroshima, Japan\nContact address: [email protected]\nFig. 1. The location of study area\nFig. 2. Land use maps of the study area, between 1968-1995\n13\n\n13\n\n6 2 2\n4\n3\n2\n\n3\n16\nLegend\n63\n6\nResident\nScho\n65\n2 6 Commerc\nIndustri\n66\nRoa\n2\nOpen\n1968 Par\nRecreati\n1988 Villa\nOth\n1995 Constructi\nPaddy\nBefore paddy\nOrcha\nGrassla\nFore\nWat\n\n\n\n\nFig. 3. Change of land use proportions in the greenbelt of\nHiroshima City,between 1968-1995\nFig. 4. The dynamics of land use in Hiroshima greenbelt,\nbetween 1968-1988\nFig. 5. The dynamics of land use in Hiroshima greenbelt,\nbetween 1988-1995",
        222.7347869873047
    ],
    ...
]

where the first component is the retrieved chunk and the second is the similarity score (higher means more relevant).
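
With the IP metric, that score is simply the dot product between the question embedding and the chunk embedding, which is why larger values indicate a closer match. A tiny numpy illustration (toy vectors, not real embeddings):

import numpy as np

q = np.array([0.2, 0.9, 0.1])      # toy question embedding
close = np.array([0.3, 0.8, 0.0])  # chunk similar to the question
far = np.array([0.9, 0.1, 0.4])    # unrelated chunk

print(np.dot(q, close))  # 0.78 -> higher score, more relevant
print(np.dot(q, far))    # 0.31 -> lower score, less relevant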

Build the prompt

Once we have the results, we can build the prompt. It needs two parts: a context assembled from the chunks retrieved above, and the question we already defined. The context is:

# Build the context for the prompt
context = "\n".join(
    [line_with_distance[0] for line_with_distance in retrieved_lines_with_distances]
)
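
Since the search also returned the title field, you could optionally prefix each chunk with its title to give the model a bit more grounding. A small variation (the field access mirrors the result structure above; adjust it to your schema):

context = "\n\n".join(
    f"{res['entity'].get('title', '')}\n{res['entity']['text']['text']}"
    for res in search_result[0]
)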

The user prompt I have defined is:

USER_PROMPT = f"""
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{QUESTION}
</question>
"""

Ask the question

Now we have a prompt that contains all the information we need, so we can ask the system the question. I'm using Ollama with llama3:latest (you can swap in other models such as Mistral). The code is quite straightforward:

# Get it into the chat
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOllama(model="llama3:latest", base_url="http://10.0.0.35:11434")
prompt = ChatPromptTemplate.from_messages(
    messages=[("system", SYSTEM_PROMPT), ("human", USER_PROMPT)]
)

chain = prompt | llm | StrOutputParser()
print("-------------------------")
print(chain.invoke({}))
print("-------------------------")

The result looks like this:

Based on the provided context, the practical use of the research appears to be the application of greenbelt planning concept to manage and sustainably develop urban-rural landscapes. The study aims to provide a better understanding of historical landscape changes and their possible effects on present-day landscape changes, with a focus on finding sustainable land-use patterns.

The research highlights the importance of monitoring land-use changes and relating them to national policies, regional developments, and agricultural decision-making. By analyzing the dynamics of land-use change in Hiroshima's greenbelt, the study aims to contribute to the development of effective spatial policies that balance urban and rural landscape needs.

Practically, the findings can be used to inform policymakers, planners, and stakeholders involved in urban-rural development, helping them make more informed decisions about land-use management. The research can also serve as a model for other cities facing similar challenges, providing insights into how greenbelt planning can be effectively implemented to promote sustainable development.

This result looks OK to me, but it's up to you to read the research paper in question and judge how relevant the answer really is.

Note: The research paper I tried the system with is:

Laura Oana Petrov and Nobukazu Nakagoshi
The Use of GIS for Assessing Sustainable Development of Urban Regions in Japan
The Case Study, Hiroshima City

Is it any good?

Well, for a single document (as a toy project) it's good enough ™. The more important aspect is that it's fast, runs locally on your machine (if you have an Nvidia GPU), and lets you easily experiment with and fine-tune the RAG and prompt aspects.
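
To make that experimentation easier, the pieces above can be folded into a small helper so you can fire off different questions quickly. This is just a sketch reusing the same emb_text(), get_client(), SYSTEM_PROMPT and COLLECTION_NAME as above; the entity field access again mirrors the schema from the indexing step.

from embeddings import emb_text
from milvus_init import get_client
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# COLLECTION_NAME and SYSTEM_PROMPT are the same names used throughout the article.

def ask(question: str, limit: int = 7) -> str:
    # Retrieve the most relevant chunks for the question.
    client = get_client()
    hits = client.search(
        collection_name=COLLECTION_NAME,
        data=[emb_text(question)],
        limit=limit,
        search_params={"metric_type": "IP", "params": {}},
        output_fields=["text"],
    )
    context = "\n".join(hit["entity"]["text"]["text"] for hit in hits[0])

    # Assemble the prompt and send it to the model.
    user_prompt = f"""
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""
    llm = ChatOllama(model="llama3:latest", base_url="http://10.0.0.35:11434")
    prompt = ChatPromptTemplate.from_messages(
        messages=[("system", SYSTEM_PROMPT), ("human", user_prompt)]
    )
    return (prompt | llm | StrOutputParser()).invoke({})

print(ask("What is the practical use of the research?"))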