Vibe Coding - Add a field to a Qdrant collection

I created a Qdrant collection from one of my repositories, with the following payload fields:
file_name - the source code file name (the path, actually)
branch - the git repository branch that I indexed
repository_name - the git repository name
chunk - the text chunk
I forgot to add a language field to specify the programming language of the file/chunk. So, what better way to do it than vibe coding?
I fired up my Claude client and entered:
knowing the collection has a field "file_name" containing the file name, change the above function to add a field named "language" with the programming language extracted from the file name. The languages recognized should be java, javascript, typescript, markdown, html, xml, json. For all other languages, default to "unknown"
And I got:
from qdrant_client import QdrantClient
import os

# Initialize the Qdrant client
# Replace with your connection details if needed
client = QdrantClient(host="localhost", port=6333)

# Name of the collection to update
collection_name = "your_collection_name"


def detect_language_from_filename(filename):
    """
    Detect programming language from file name based on extension.

    Args:
        filename (str): Name of the file

    Returns:
        str: Detected language or "unknown"
    """
    # Get the extension (any leading path is ignored), lower-cased and without the dot
    _, ext = os.path.splitext(filename)
    ext = ext.lower().lstrip('.')

    # Map extensions to languages
    language_map = {
        'java': 'java',
        'js': 'javascript',
        'jsx': 'javascript',
        'ts': 'typescript',
        'tsx': 'typescript',
        'md': 'markdown',
        'markdown': 'markdown',
        'html': 'html',
        'htm': 'html',
        'xml': 'xml',
        'json': 'json'
    }
    return language_map.get(ext, "unknown")

def add_language_field_to_collection(collection_name):
    """
    Add a 'language' field to all points in a Qdrant collection,
    based on the existing 'file_name' field.

    Args:
        collection_name (str): Name of the collection to update
    """
    # Get the number of points in the collection (used for reporting only)
    collection_info = client.get_collection(collection_name=collection_name)
    points_count = collection_info.points_count

    # Set batch size for processing
    batch_size = 100
    print(f"Adding 'language' field to {points_count} points in collection '{collection_name}'")

    # Process in batches to avoid memory issues with large collections.
    # scroll() pages by point ID, not by numeric position, so follow the
    # next-page offset it returns instead of counting in steps of batch_size.
    next_offset = None
    processed = 0
    while True:
        # Get a batch of points with their file_name payload
        points, next_offset = client.scroll(
            collection_name=collection_name,
            limit=batch_size,
            offset=next_offset,
            with_payload=["file_name"]
        )
        if not points:
            break

        # Process each point to determine language and prepare updates
        for point in points:
            payload = point.payload or {}
            # Extract file_name from payload
            if "file_name" in payload:
                language = detect_language_from_filename(payload["file_name"])
                # Update the point with the language field
                client.set_payload(
                    collection_name=collection_name,
                    payload={"language": language},
                    points=[point.id]
                )

        processed += len(points)
        print(f"Updated {len(points)} points (total so far: {processed})")

        # No next-page offset means this was the last batch
        if next_offset is None:
            break

    print(f"Field 'language' successfully added to all points in collection '{collection_name}'")


# Execute the update
add_language_field_to_collection(collection_name)
The detect_language_from_filename() function looks at the file name's extension and returns the programming language for recognised extensions, or "unknown" for everything else.
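To see how the extension lookup behaves on a few edge cases, here is a minimal standalone sketch. It uses a trimmed-down copy of the mapping, and both the helper name detect_language and the sample paths are made up for illustration, so it runs without a Qdrant connection.

```python
import os

# Trimmed copy of the extension map, for illustration only
LANGUAGE_MAP = {
    "java": "java",
    "js": "javascript",
    "ts": "typescript",
    "md": "markdown",
    "json": "json",
}


def detect_language(filename):
    """Return the language for a file name, or "unknown"."""
    _, ext = os.path.splitext(filename)
    return LANGUAGE_MAP.get(ext.lower().lstrip("."), "unknown")


# Hypothetical sample paths
samples = [
    "src/main/java/App.java",  # directories are ignored, only the extension counts
    "README.MD",               # the extension is lower-cased before the lookup
    "types.d.ts",              # only the last extension is used
    "Dockerfile",              # no extension at all
    ".gitignore",              # dotfile: splitext() reports no extension
    "build.gradle",            # extension that is not in the map
]
for path in samples:
    print(f"{path} -> {detect_language(path)}")
```

The last three samples all fall back to "unknown", which is worth keeping in mind if the repository contains many build or config files.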
The add_language_field_to_collection() function iterates through all points in the collection and adds the language field to the payload.
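The generated code calls set_payload once per point. Since set_payload accepts a list of point IDs, one possible refinement is to group the IDs of a batch by language and issue one call per language. The sketch below is an assumption on my side, not part of what Claude produced: group_ids_by_language and tag_batch are hypothetical helper names, client is expected to be a QdrantClient, and detect is any callable such as detect_language_from_filename.

```python
from collections import defaultdict


def group_ids_by_language(points, detect):
    """Map each language to the IDs of the points that should receive it."""
    groups = defaultdict(list)
    for point in points:
        payload = point.payload or {}
        # Points without a file_name are skipped, as in the original loop
        if "file_name" in payload:
            groups[detect(payload["file_name"])].append(point.id)
    return dict(groups)


def tag_batch(client, collection_name, points, detect):
    """Issue one set_payload call per language found in the batch."""
    for language, ids in group_ids_by_language(points, detect).items():
        client.set_payload(
            collection_name=collection_name,
            payload={"language": language},
            points=ids,
        )
```

With a batch of 100 points this means at most eight requests (seven languages plus "unknown") instead of 100, which should matter mostly on large collections or remote servers.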
Note: The code just adds the field, and it could be extended to add an index for the new field. I didn't do it because I'm still testing to see how useful the RAG database is in my use case.
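If the field turns out to be useful for filtering, adding the index could look roughly like the sketch below. It assumes a keyword (exact match) index is what you want for the language values; create_language_index is a hypothetical helper name, and client is expected to be the QdrantClient from the script.

```python
def create_language_index(client, collection_name):
    """Create a keyword payload index on the "language" field."""
    client.create_payload_index(
        collection_name=collection_name,
        field_name="language",
        field_schema="keyword",  # exact-match index, assumed to fit values like "java"
    )


# Usage, with the client and collection_name from the script:
# create_language_index(client, collection_name)
```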
I verified the code and tested it on a dummy collection, so it works. 😄
HTH,