A practical guide on how to manage a collection of code snippets as a single, easy-to-maintain library collection. I will utilise Python package namespacing, managed in a Git mono-repository.
Are you, your team, or your organisation suffering from constantly copying fragments of code between projects? Do the copies diverge quickly? Do you find an increasing need to reuse snippets of code in multiple places as your project/team/business grows? Then this article is for you. I will primarily focus on a mono-repo library collection, but I will present other technologies as well. Hopefully, this will make choosing the most appropriate one for your situation easier.
A library collection (hereafter library) is a set of code pieces, wrapped in packages (hereafter sub-packages), that can be distributed individually and reused in multiple projects.
TL;DR: If you’re here just for the code, have a look at the example Git repository.
Typical candidates for sub-packaging:
- Constants, schemas, company policies
- Utility functions extending libraries
- Development tools
Motivation
The ultimate drive behind librarification is lowering the maintenance cost, which is affected by several closely related properties.
Breaking changes
A breaking change occurs when you make a backward-incompatible change, such as removing or renaming a function, a function parameter, or a package, or changing a function’s behaviour. In Python, packages are versioned using PEP 440 and frequently comply with Semantic Versioning. It is advisable to use the same for your own library.
To keep the maintenance cost as low as possible, you want to reduce the number of breaking changes.
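For illustration, here is how typical changes map to version bumps under Semantic Versioning (the function names are made up):

```python
# company_utils 1.2.4 exposes this public function:
def get_logger(name):
    ...

# Renaming it is backward incompatible - callers of get_logger() break,
# so the major version must be bumped: 1.2.4 -> 2.0.0
def create_logger(name):
    ...

# Adding an optional parameter keeps existing calls working,
# so a minor bump is enough: 1.2.4 -> 1.3.0
def get_logger(name, level=None):
    ...
```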
Dependencies
The more code your library accumulates, the more likely it is for breaking changes to occur. But not every breaking change affects the whole library. Let’s say your library looks like this:
```
company_utils/
    __init__.py
    constants.py
    flask_utils.py
    logging_utils.py
```
- `constants.py` contains only constants, without any dependencies.
- `logging_utils.py` depends only on the Python logging library.
- `flask_utils.py` contains several utility functions used with the Flask web framework and depends on it (see the sketch below).
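For instance, `flask_utils.py` might contain a helper like this (a made-up example that shows where the Flask dependency comes from):

```python
# company_utils/flask_utils.py (hypothetical content)
from flask import Flask


def create_app(name: str) -> Flask:
    """Create a Flask app pre-configured with company-wide defaults."""
    app = Flask(name)
    app.config["JSON_SORT_KEYS"] = False  # an example company-wide default
    return app
```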
If you need to remove an obsolete constant from `company_utils.constants`, you need to bump the major number of your library version, e.g. `1.2.4` -> `2.0.0`. This tells the users of the library: „hey, you need to check what has changed and modify your code”. However, there is no need to modify anything if you don’t use `company_utils.constants`. Maybe you use only `company_utils.logging_utils` and the change is not breaking for you.
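A concrete (hypothetical) example of such a removal:

```python
# company_utils/constants.py in 1.2.4
RETRY_LIMIT = 3
OLD_API_URL = "https://api.example.com/v1"  # obsolete

# company_utils/constants.py in 2.0.0 - removing OLD_API_URL is the breaking change
RETRY_LIMIT = 3
```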
The example above illustrates how unnecessary breaking-change checks at the points of use increase the maintenance cost.
Domain separation
Multiple repositories
Building on top of the previous chapter, it may seem trivial to just split the different domains into separate packages hosted in individual repositories. However, this increases the maintenance cost again.
Now you need to maintain all development tooling in multiple repositories. This may include CI configuration, linter settings, documentation, build scripts, etc. The actual code can be as small as a single file. Therefore, the cost of keeping multiple repositories up to date will likely outweigh any benefit gained from the split.
Extras
If multiple packages in individual repositories are not the answer, what about everything in a single repository? Popular packaging and distribution tools, such as setuptools, Pipenv and Poetry, allow declaring „extras” – optional features with their own dependencies. You could treat your library as a package and the sub-packages as extras.
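As a sketch of what this could look like with setuptools (the version pins are illustrative, not from the example repository):

```python
# setup.py - a minimal sketch of the "extras" approach
from setuptools import find_packages, setup

setup(
    name="company_utils",
    version="2.0.0",
    packages=find_packages(),
    install_requires=[],  # the base package has no mandatory dependencies
    extras_require={
        "constants": [],                       # no external dependencies
        "logging_utils": [],                   # standard library only
        "flask_utils": ["flask>=1.0.2,<2.0"],  # installed only when requested
    },
)
```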
You would install such a library as:
pip install "company_utils[logging_utils,constants]==2.0.0"
The library no longer brings in a number of unused dependencies. However, this approach still has many of the drawbacks:
- The entire library uses a single version
- Code is distributed even when not used
- `import company_utils.flask_utils` will not show any errors in your IDE but will fail on execution because the `flask` dependency is not installed
- Nothing prevents cross-sub-package dependencies
Package namespacing
Another option is to use package namespacing, namely the native/implicit namespace packages as defined in PEP 420. Both setuptools and Poetry support package namespacing.
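To illustrate what PEP 420 provides: once several independently built distributions share the `company_utils` namespace (and none of them ships a `company_utils/__init__.py`), Python merges them at import time. A hypothetical session, assuming two of the sub-packages are installed:

```python
# company_utils.constants and company_utils.logging_utils are installed
# as two separate distributions:
import company_utils.constants       # provided by one distribution
import company_utils.logging_utils   # provided by another

# The namespace package itself has no code; Python stitches the installed
# portions together, as the merged search path shows:
print(company_utils.__path__)
```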
The documentation is vague on how namespacing helps and how to use it for multiple sub-packages. As it turns out, package namespacing is not designed to work in a single repository out of the box; attempting to do so results in a mono-repo. The key missing information is that each namespaced package needs its own build script, which must live outside of the package. This is tricky in a mono-repo because you cannot easily have multiple `setup.py`/`pyproject.toml` files in the same folder.
File structure examples
Multiple `setup-*.py` files in a single repository (setuptools):

```
setup-constants.py      # Each setup-*.py must explicitly include one sub-package
setup-flask_utils.py
setup-logging_utils.py
company_utils/          # No __init__.py here.
    constants/          # Sub-packages have __init__.py.
        __init__.py
        constants.py
    flask_utils/
        __init__.py
        flask_utils.py
    logging_utils/
        __init__.py
        logging_utils.py
```
One directory per sub-package, each with its own `setup.py` (setuptools):

```
company_utils.constants/
    setup.py            # All setup.py differ only in the package name
    src/
        company_utils/
            constants/
                __init__.py
                constants.py
company_utils.flask_utils/
    setup.py
    src/
        company_utils/
            flask_utils/
                __init__.py
                flask_utils.py
company_utils.logging_utils/
    setup.py
    src/
        company_utils/
            logging_utils/
                __init__.py
                logging_utils.py
```
One directory per sub-package, each with its own `pyproject.toml` (Poetry):

```
company_utils.constants/
    pyproject.toml
    src/
        constants/
            __init__.py
            constants.py
company_utils.flask_utils/
    pyproject.toml
    src/
        flask_utils/
            __init__.py
            flask_utils.py
company_utils.logging_utils/
    pyproject.toml
    src/
        logging_utils/
            __init__.py
            logging_utils.py
```
You can see that having multiple `setup.py` or `pyproject.toml` files is ugly and increases the maintenance cost by introducing duplication. A better solution is suggested in the next chapter.
Low maintenance namespacing solution
It’s time to tie together the information from the previous chapters. We are aiming for a solution with a constant maintenance cost, independent of the number of sub-packages. The resulting solution allows versioning and distributing its sub-packages independently. Package namespacing provides an easy way to find and import them.
To summarize the approach, we will replace duplication with iteration.
Build tools
For building the sub-packages, we will use `setuptools`, as it offers higher flexibility: `setup.py` is just a Python script. We will parametrize it to get rid of the need for multiple files. Additionally, we will capture each sub-package’s requirements in a `requirements.txt` file. We will also keep the version of each sub-package in `__version__` of each `src/<NAMESPACE>/<SUB-PACKAGE>/__init__.py` file.
```
setup.py
src/
    company_utils/      # No __init__.py here.
        constants/      # Sub-packages have __init__.py.
            __init__.py
            constants.py
            requirements.txt
        flask_utils/
            __init__.py
            flask_utils.py
            requirements.txt
        logging_utils/
            __init__.py
            logging_utils.py
            requirements.txt
```
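Each sub-package’s `__init__.py` then carries its own version, for example:

```python
# src/company_utils/logging_utils/__init__.py
__version__ = "1.0.0"
```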
The single parametrized `setup.py` then looks like this:

```python
import importlib
import os
import sys
from os.path import join
from typing import List

from setuptools import find_namespace_packages, setup


def parse_requirements_txt(filename: str) -> List[str]:
    with open(filename) as fd:
        return list(filter(lambda line: bool(line.strip()), fd.read().splitlines()))


def get_sub_package(packages_path: str) -> str:
    """Read and remove the `--package <PACKAGE NAME>` switch from the command line."""
    package_cmd = "--package"
    packages = os.listdir(packages_path)
    available_packages = ", ".join(packages)
    if package_cmd not in sys.argv:
        raise RuntimeError(
            f"Specify which package to build with '{package_cmd} <PACKAGE NAME>'. "
            f"Available packages are: {available_packages}"
        )
    index = sys.argv.index(package_cmd)
    sys.argv.pop(index)  # Removes the switch
    package = sys.argv.pop(index)  # Returns the element after the switch
    if package not in packages:
        raise RuntimeError(
            f"Unknown package '{package}'. Available packages are: {available_packages}"
        )
    return package


def get_version(sub_package: str) -> str:
    return importlib.import_module(f"{SOURCES_ROOT}.{NAMESPACE}.{sub_package}").__version__


SOURCES_ROOT = "src"
NAMESPACE = "company_utils"
PACKAGES_PATH = join(SOURCES_ROOT, NAMESPACE)
SUB_PACKAGE = get_sub_package(PACKAGES_PATH)
NAMESPACED_PACKAGE_NAME = f"{NAMESPACE}.{SUB_PACKAGE}"

setup(
    name=NAMESPACED_PACKAGE_NAME,
    version=get_version(SUB_PACKAGE),
    # See https://setuptools.readthedocs.io/en/latest/setuptools.html#find-namespace-packages
    package_dir={"": SOURCES_ROOT},
    packages=find_namespace_packages(
        where=SOURCES_ROOT, include=[NAMESPACED_PACKAGE_NAME]
    ),
    include_package_data=True,
    zip_safe=False,
    install_requires=parse_requirements_txt(
        join(PACKAGES_PATH, SUB_PACKAGE, "requirements.txt")
    ),
)
```
```
# src/company_utils/flask_utils/requirements.txt - the others are empty
flask>=1.0.2,<2.0
```
Running the build for each sub-package produces independently versioned wheels, for example:

```
company_utils.constants-1.0.0-py3-none-any.whl
company_utils.flask_utils-1.0.0-py3-none-any.whl
company_utils.logging_utils-1.0.0-py3-none-any.whl
```
We will also need something to build all the packages, as `setup.py` builds only one at a time. Personally, I like to automate tasks with PyInvoke, but any Python/Bash/other script will do as well.
```python
import os
import shutil

from invoke import task

# For `install_subpackage_dependencies`, see the next section.
# It prevents import errors when resolving the sub-package version number.
@task(pre=[install_subpackage_dependencies])
def build(ctx):
    for package in os.listdir("src/company_utils"):
        print(f"Building '{package}' package")
        print("Cleanup the 'build' directory")
        shutil.rmtree("build", ignore_errors=True)
        ctx.run(
            f"export PYTHONPATH=src\npython setup.py bdist_wheel --package {package}",
            pty=True,
        )
```
With this setup, we can build all packages at once and push them to a package registry (PyPI, PackageCloud, etc.).
Local development
You may have noticed that having many `requirements.txt` files doesn’t make local development developer-friendly. How are you going to install all those requirements to avoid import errors? And how will you keep them up to date? Let us add another automation task that installs all the `src/<NAMESPACE>/<SUB-PACKAGE>/requirements.txt` files.
```python
import os

from invoke import task

from tasks.utils import PROJECT_INFO, print_header


@task
def install_subpackage_dependencies(ctx, name=None):
    """
    Args:
        ctx (invoke.Context): Context
        name (Optional[str]): Name of the sub-package for which to collect and install
            dependencies. If not specified, all sub-packages will be used.
    """
    print("Uninstalling previous dependencies")
    ctx.run("pipenv clean", pty=True)

    print("Installing new dependencies")
    packages = os.listdir(PROJECT_INFO.namespace_directory) if name is None else [name]
    for package in packages:
        print_header(package, level=3)
        requirements_file_path = PROJECT_INFO.namespace_directory / package / "requirements.txt"
        ctx.run(f"pipenv run pip install -r {requirements_file_path}", echo=True)
```
This task will install either all available dependencies or only those of the selected sub-package. You can also see that `pipenv` is being called: I recommend using Pipenv or Poetry to manage your development dependencies and virtual Python environment.
How would this look in practice? You would use `pipenv sync -d` or `poetry install` for the dev dependencies, and `pipenv run inv install_subpackage_dependencies` or `poetry run inv install_subpackage_dependencies` for the sub-package dependencies.
Continuous integration
Another problem you may have noticed is that installing all sub-package dependencies at once prevents tests from discovering imports of other sub-packages’ dependencies. For example, if you import `flask` in `company_utils.constants`, it will work locally but fail once the library is installed. Continuous integration (CI) comes to help! The „cross-import” scenario should be rare. Therefore, you can leave it to fail in a CI pipeline and keep a lot of complexity out of the local development environment. CI will be the quality gate.
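If you prefer to catch cross-imports statically, without installing anything, a supplementary guard is possible. This is my own sketch, not part of the referenced repository, and it assumes the `src/company_utils` layout from above:

```python
# tests/test_no_cross_imports.py - a hypothetical static guard against
# one sub-package importing from another
import ast
from pathlib import Path

NAMESPACE = "company_utils"
NAMESPACE_DIR = Path("src") / NAMESPACE
SUB_PACKAGES = sorted(p.name for p in NAMESPACE_DIR.iterdir() if p.is_dir())


def iter_imported_modules(source_file: Path):
    """Yield the dotted module names imported by a Python source file."""
    tree = ast.parse(source_file.read_text())
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            yield from (alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            yield node.module


def test_no_cross_sub_package_imports():
    for sub_package in SUB_PACKAGES:
        for source_file in (NAMESPACE_DIR / sub_package).rglob("*.py"):
            for module in iter_imported_modules(source_file):
                for other in SUB_PACKAGES:
                    assert not (
                        other != sub_package
                        and module.startswith(f"{NAMESPACE}.{other}")
                    ), f"{source_file} imports {module} from another sub-package"
```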
Hopefully, the CI solution of your choice allows parametrization of jobs (such as CircleCI matrix jobs). Each parameter in this case will be the name of a sub-package. Since you want to target specific sub-packages, it is also a good idea to split your tests into folders named after the sub-packages. Then the pipeline could look like:
```
install pipenv
pipenv clean                   # run if you cache dependencies
pipenv install --dev --deploy  # makes sure Pipfile.lock is up to date
pipenv run inv install_subpackage_dependencies --name ${sub_package}
# Run any tests you like - only one sub-package's dependencies are now present
```
Versioning, semantic releases
As mentioned before, the version number is kept in `__version__` of each `src/<NAMESPACE>/<SUB-PACKAGE>/__init__.py` file. If you are wondering how to keep a changelog or automate versioning with semantic releases, I will describe it in a future blog post. For now, you can have a look at these resources for inspiration, and at the small sketch after the list:
- Lerna: A JavaScript tool for managing projects with multiple packages
- This example change log of a mono-repo project using Lerna
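In the meantime, here is a small sketch of my own (assuming the layout above) that reports each sub-package’s current version — a handy building block for any release automation:

```python
# list_versions.py - a hypothetical helper; run with PYTHONPATH=src so the
# namespace is importable. Importing a sub-package requires its dependencies
# to be installed (see the install task above).
import importlib
import os

NAMESPACE = "company_utils"

for sub_package in sorted(os.listdir(os.path.join("src", NAMESPACE))):
    module = importlib.import_module(f"{NAMESPACE}.{sub_package}")
    print(f"{NAMESPACE}.{sub_package}=={module.__version__}")
```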
Summary
Maintaining a collection of libraries can save a lot of development time. However, due to the lack of direct support in commonly used build tools, it also has a small upfront cost of developing your own tooling around it. Hopefully, this article has helped you decide whether the investment is worth the potential gains, or even to implement a similar solution on your own.
All of the examples above are available as a working project in this Git repository.