Bitbucket - Get Repository Information

In the previous post, we wrote helper functions that let us fetch multi-page responses from the Bitbucket server. Now we can use them to get various pieces of repository information.

Prerequisites

As in the previous post, we need the requests Python package (json, used further down, ships with the standard library). Once we have that, we can initialise the repository data:

import json

BITBUCKET_URL = "https://bitbucket_repository/rest/api"
BITBUCKET_API_1_0 = "1.0"
BITBUCKET_API_2_0 = "2.0"
REPO_OWNER = "repo_owner"
REPO_SLUG = "your-repository-name"
ACCESS_TOKEN = "my_access_token_here"

HEADERS = {
    "Authorization": f"Bearer {ACCESS_TOKEN}"
}

Note: You need to replace the BITBUCKET_URL and ACCESS_TOKEN with your own values, because I'm not going to give you mine :)

REPO_OWNER and REPO_SLUG point to a single repository, which is handy for testing.
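
For readers without the previous post at hand, here is a rough sketch of the paging logic that a helper like get_all_pages relies on. It is not the original helper: the name collect_all_pages and the fetch_page callable are hypothetical, and the sketch assumes the usual Bitbucket Server paged response with values, isLastPage and nextPageStart fields.

```python
from typing import Any, Callable, Dict, List


def collect_all_pages(fetch_page: Callable[[int], Dict[str, Any]]) -> List[Any]:
    """Gather the items of every page (hypothetical stand-in for get_all_pages).

    fetch_page(start) is assumed to return one decoded JSON page.
    """
    items: List[Any] = []
    start = 0
    while True:
        page = fetch_page(start)
        # Each page carries its items under "values"
        items.extend(page.get("values", []))
        # Stop when the server flags the last page (or the flag is missing)
        if page.get("isLastPage", True):
            break
        # Otherwise continue from the offset the server suggests
        start = page["nextPageStart"]
    return items
```

With requests, fetch_page could be something like lambda start: requests.get(url, headers=HEADERS, params={"start": start}).json().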

Get Repositories in a Project

As I want to go through all repositories, I wrote a function to list all repositories in a project:

def get_repositories_in_project(project: str):
    # Endpoint that lists the repositories of a project
    project_url = f"{BITBUCKET_URL}/{BITBUCKET_API_1_0}/projects/{project}/repos"

    # Each returned entry describes one repository
    repos = get_all_pages(project_url)

    print(f"Number of repos in [{project}] =", len(repos))

    # Keep only the slugs, sorted alphabetically
    list_of_repo_slugs = sorted(repo["slug"] for repo in repos)

    # Save the list so it can be inspected later
    with open("projects.json", "wt") as f:
        f.write(json.dumps(list_of_repo_slugs, indent=2))

    return list_of_repo_slugs

Once we have the list, we can process the repositories one by one.
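
The one-by-one processing can be sketched as a small loop. The pick_branches name and the choose_branch callable are hypothetical; in this post, choose_branch would wrap the branch-selection function from the next section.

```python
from typing import Callable, Dict, List, Optional


def pick_branches(
    repo_slugs: List[str],
    choose_branch: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Map each repository slug to the branch to scan (hypothetical helper)."""
    selected: Dict[str, str] = {}
    for slug in repo_slugs:
        branch = choose_branch(slug)
        if branch is None:
            # Nothing to scan, e.g. an empty repository
            print(f"Skipping [{slug}]: no usable branch")
            continue
        selected[slug] = branch
    return selected
```

For a project key such as "PROJ" (a placeholder), the call could look like pick_branches(get_repositories_in_project("PROJ"), lambda slug: calculate_main_branch("PROJ", slug)).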

Get Branches

The first step in processing a repository is to determine which branch to scan. If your naming is consistent and you know that all repositories have, say, main and develop branches, you can just pick the most relevant one. The only thing to note is that the script, as written, picks the same branch for every repository (e.g. develop). If you need a specific branch per repository, you can maintain a mapping manually (e.g. a JSON dict of repository slug to branch name) and integrate it into your script. It's up to you :)
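
A manual per-repository mapping could look like the sketch below. The repository slugs, the branch names and the preferred_branch helper are all made up for illustration; only the idea of a JSON dict comes from the text above.

```python
import json
from typing import Dict, Optional

# Hypothetical mapping: repository slug -> branch to scan
BRANCH_OVERRIDES_JSON = """
{
  "billing-service": "main",
  "legacy-portal": "release"
}
"""


def preferred_branch(
    repo_slug: str,
    overrides: Dict[str, str],
    fallback: Optional[str] = "develop",
) -> Optional[str]:
    """Return the branch configured for repo_slug, or the fallback."""
    return overrides.get(repo_slug, fallback)


overrides = json.loads(BRANCH_OVERRIDES_JSON)
print(preferred_branch("billing-service", overrides))
print(preferred_branch("some-other-repo", overrides))
```

The result can then be passed as the default_branch argument of the calculate_main_branch function shown later in this post.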

I have a helper function to list all branches:

def get_all_branches(project_key, repo_slug):
    # Endpoint that lists the branches of a repository
    url_branch = f"{BITBUCKET_URL}/latest/projects/{project_key}/repos/{repo_slug}/branches"

    # Retrieve all branches, following the paged responses
    return get_all_pages(url_branch)
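
As a quick illustration of what can be done with the returned list, the sketch below extracts the branch names. The sample entries are invented and trimmed down to two fields; the branch_names helper is hypothetical, and only the displayId key is taken from the code in this post.

```python
from typing import Any, Dict, List


def branch_names(branches: List[Dict[str, Any]]) -> List[str]:
    """Return the sorted display names of the given branch entries."""
    return sorted(branch["displayId"] for branch in branches)


# Invented sample shaped like the entries get_all_branches is expected to return
sample_branches = [
    {"id": "refs/heads/main", "displayId": "main"},
    {"id": "refs/heads/develop", "displayId": "develop"},
]
print(branch_names(sample_branches))
```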

I then have a simple heuristic to get the name of the branch to retrieve:

def calculate_main_branch(project_key, repo_slug, default_branch: str = "develop"):
    # Endpoint that lists the branches of the repository
    url_branch = f"{BITBUCKET_URL}/latest/projects/{project_key}/repos/{repo_slug}/branches"
    print(url_branch)

    # Retrieve every branch, following the paged responses
    branches_info = get_all_pages(url_branch)

    # A repository without branches (e.g. an empty one) cannot be scanned
    if not branches_info:
        print("Main branch not found.")
        return None

    print(f"Repository has {len(branches_info)} branches")

    # Prefer the requested branch if the repository has it
    if default_branch is not None:
        for branch in branches_info:
            if branch["displayId"] == default_branch:
                return default_branch

    # Otherwise fall back to main, then master
    dict_of_branches = array_to_dict(branches_info, "displayId")
    main_branch = has_branch(dict_of_branches, "main")
    if main_branch is None:
        main_branch = has_branch(dict_of_branches, "master")

    if not main_branch:
        print("Main branch not found.")
        return None

    return main_branch["displayId"]

In practice, the function first checks that the repository is valid (it has at least one branch). If so, it looks for default_branch and, if it finds it, picks it as the branch to scan. Otherwise, it falls back to main or master.
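
The function relies on two helpers, array_to_dict and has_branch, which are not listed in this post. The sketch below is a guess at their behaviour based on how they are called: array_to_dict is assumed to index a list of dicts by a key, and has_branch is assumed to return the matching branch entry or None.

```python
from typing import Any, Dict, List, Optional


def array_to_dict(items: List[Dict[str, Any]], key: str) -> Dict[Any, Dict[str, Any]]:
    """Index a list of dicts by the value stored under key (assumed behaviour)."""
    return {item[key]: item for item in items if key in item}


def has_branch(branches: Dict[Any, Dict[str, Any]], name: str) -> Optional[Dict[str, Any]]:
    """Return the branch entry called name, or None if missing (assumed behaviour)."""
    return branches.get(name)


# Invented sample: only displayId matters for the lookup
sample = [{"displayId": "master"}, {"displayId": "feature/login"}]
by_name = array_to_dict(sample, "displayId")
print(has_branch(by_name, "master"))
print(has_branch(by_name, "main"))
```

Indexing the branches once keeps each lookup to a single dictionary access, which is convenient when several candidate names are tried in turn.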

In the next blog entry we will scan a repository.

HTH,

PS: This is part of the RAG with Continue.dev series.