Bitbucket - Get Repository Information

In the previous post, we wrote helper functions that let us fetch multi-page responses from the Bitbucket server. Now we can use them to retrieve repository information.
Prerequisites
As in the previous post, we need the requests Python package. Once we have that, we can initialise the repository data:
BITBUCKET_URL = "https://bitbucket_repository/rest/api"
BITBUCKET_API_1_0 = "1.0"
BITBUCKET_API_2_0 = "2.0"
REPO_OWNER = "repo_owner"
REPO_SLUG = "your-repository-name"
ACCESS_TOKEN = "my_access_token_here"
HEADERS = {
    "Authorization": f"Bearer {ACCESS_TOKEN}"
}
Note: You need to replace the BITBUCKET_URL and ACCESS_TOKEN with your own values, because I'm not going to give you mine :)
We can also use the repository owner (REPO_OWNER) and the REPO_SLUG to point to a repository for testing.
Get Repositories in a Project
As I want to go through all repositories, I wrote a function to list all repositories in a project:
import json

def get_repositories_in_project(project: str):
    # get_all_pages is the pagination helper from the previous post
    project_url = f"{BITBUCKET_URL}/{BITBUCKET_API_1_0}/projects/{project}/repos"
    repos = get_all_pages(project_url)
    print(f"Number of repos in [{project}] =", len(repos))
    # Collect the repository slugs in alphabetical order
    list_of_repo_slugs = sorted(repo['slug'] for repo in repos)
    # Save the list so it can be inspected or reused later
    with open("projects.json", "wt") as f:
        f.write(json.dumps(list_of_repo_slugs, indent=2))
    return list_of_repo_slugs
Once we have the list, we can process the repositories one by one.
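To try the extract-and-sort step without a Bitbucket server, here is a minimal sketch that runs it on a hand-written sample. The sample entries and the extract_sorted_slugs name are made up for illustration; only the 'slug' key comes from the function above.

```python
# Hypothetical sample of the list get_all_pages might return for a project.
# Only the 'slug' key is used by the function in this post.
sample_repos = [
    {"slug": "web-frontend", "name": "Web Frontend"},
    {"slug": "api-server", "name": "API Server"},
    {"slug": "docs", "name": "Docs"},
]


def extract_sorted_slugs(repos):
    """Return the repository slugs in alphabetical order."""
    return sorted(repo["slug"] for repo in repos)


slugs = extract_sorted_slugs(sample_repos)

# Process the repositories one by one
for slug in slugs:
    print("Would process:", slug)
```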
Get Branches
The first step in processing a repository is to determine which branch to scan. If you have consistent naming and you know that all repositories have, say, main and develop branches, you can just pick the most relevant one. The only thing to note is that the script, as written, picks the same branch for every repository (e.g. develop). You can maintain a list manually (e.g. a JSON dict) and integrate it into your script, so it picks a specific branch for each repository. It's up to you :)
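If you go down the per-repository route, one possible shape is a small JSON mapping from repository slug to branch name. This is only a sketch: the slugs, the BRANCH_OVERRIDES_JSON name and the pick_branch helper are all hypothetical and not part of the original script.

```python
import json

# Hypothetical overrides: repository slug -> branch to scan.
# In practice this could live in its own .json file.
BRANCH_OVERRIDES_JSON = """
{
  "legacy-billing": "master",
  "web-frontend": "main"
}
"""


def pick_branch(repo_slug, overrides, fallback="develop"):
    """Return the override for repo_slug if there is one, else the fallback branch."""
    return overrides.get(repo_slug, fallback)


overrides = json.loads(BRANCH_OVERRIDES_JSON)
for slug in ["legacy-billing", "web-frontend", "api-server"]:
    print(slug, "->", pick_branch(slug, overrides))
```

Repositories without an entry fall back to the shared branch, so the mapping only has to list the exceptions.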
I have a helper function to list all branches:
def get_all_branches(project_key, repo_slug):
    url_branch = f"{BITBUCKET_URL}/latest/projects/{project_key}/repos/{repo_slug}/branches"
    # Fetch every page of the branch list
    return get_all_pages(url_branch)
I then have a simple heuristic to get the name of the branch to retrieve:
def calculate_main_branch(project_key, repo_slug, default_branch:str="develop"):
    # Endpoint that lists the branches of the repository
    url_branch = f"{BITBUCKET_URL}/latest/projects/{project_key}/repos/{repo_slug}/branches"
    print(url_branch)
    # Fetch every page of the branch list
    branches_info = get_all_pages(url_branch)
    main_branch = None
    if branches_info is not None:
        print(f"Project has {len(branches_info)} branches")
        # array_to_dict and has_branch are helper functions not shown in this post
        dict_of_branches = array_to_dict(branches_info, "displayId")
        # Look for "main" first, then fall back to "master"
        main_branch = has_branch(dict_of_branches, "main")
        if main_branch is None:
            main_branch = has_branch(dict_of_branches, "master")
        if not main_branch:
            print("Main branch not found.")
            return None
        else:
            main_branch = main_branch["displayId"]
        # Check if we have a preferred branch
        if default_branch is not None:
            for branch in branches_info:
                if branch['displayId'] == default_branch:
                    main_branch = default_branch
    return main_branch
In practice, it first checks that the repository has a main or master branch, and returns None if it has neither. If one of them exists, it then looks for the default_branch. If it finds it, it picks that as the branch to scan. Otherwise, it returns main or master, whichever it found (main takes precedence).
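The array_to_dict and has_branch helpers are not shown in this post, so the sketch below is only a guess at what they might look like, inferred from how they are called. The choose_branch function is also hypothetical: it repeats the same selection order without the HTTP call, so the logic can be tried on hand-written branch lists.

```python
def array_to_dict(items, key):
    """Assumed behaviour: index a list of dicts by the value stored under `key`."""
    return {item[key]: item for item in items}


def has_branch(branches_by_name, name):
    """Assumed behaviour: return the branch entry called `name`, or None."""
    return branches_by_name.get(name)


def choose_branch(branches_info, default_branch="develop"):
    """Same selection order as calculate_main_branch, minus the server request."""
    if not branches_info:
        return None
    by_name = array_to_dict(branches_info, "displayId")
    # Require "main" or "master" to exist, preferring "main"
    fallback = has_branch(by_name, "main")
    if fallback is None:
        fallback = has_branch(by_name, "master")
    if fallback is None:
        return None
    # The preferred branch wins whenever it is present
    if default_branch is not None and has_branch(by_name, default_branch) is not None:
        return default_branch
    return fallback["displayId"]


print(choose_branch([{"displayId": "main"}, {"displayId": "develop"}]))
print(choose_branch([{"displayId": "master"}]))
```

Note that a repository with only a develop branch yields None here, which mirrors the function above: main or master must exist before the preferred branch is considered.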
In the next blog entry we will scan a repository.
HTH,
PS: This is part of the RAG with Continue.dev series.