
Package namespacing for Python library collection


A practical guide to managing a collection of code snippets as a single, easy-to-maintain library collection. I will use Python package namespacing, managed in a Git mono-repository.

Are you, your team or your organisation suffering from constant copying of code fragments between projects? Do the copies diverge quickly? Do you increasingly need to reuse snippets of code in multiple places as your project, team or business grows? Then this article is for you. I will focus primarily on a mono-repo library collection, but I will present other approaches as well. Hopefully, this will make choosing the most appropriate one for your situation easier.

A library collection (hereafter "library") is a set of code pieces wrapped in packages (hereafter "sub-packages") that are individually distributable and can be reused in multiple projects.

TL;DR: If you’re here just for an example, have a look at an example Git repository.

Typical candidates for sub-packaging:

  • Constants, schemas, company policies
  • Utility functions extending libraries
  • Development tools

Motivation

The ultimate driver behind librarification is lowering the maintenance cost, which is affected by several closely related properties.

Breaking changes

A breaking change occurs when you make a backward-incompatible change, such as removing or renaming a function, a function parameter or a package, or changing a function’s behaviour. In Python, package versions follow PEP 440 and frequently comply with Semantic Versioning. It is advisable to use the same for your own library.

To keep the maintenance cost as low as possible, you want to reduce the number of breaking changes.

Dependencies

The more code your library accumulates, the more likely breaking changes are to occur. But not every breaking change affects the whole library. Let’s say your library looks like this:

company_utils
    __init__.py
    constants.py
    flask_utils.py
    logging_utils.py

  • constants.py contains only constants without any dependencies.
  • logging_utils.py depends only on the Python logging library.
  • flask_utils.py contains several utility functions used with the Flask web framework and depends on importing it.

If you need to remove an obsolete constant from company_utils.constants, you need to bump the major number of your library version, for example 1.2.4 -> 2.0.0. This tells users of the library: "hey, you need to check what has changed and modify your code". However, there is no need to modify the code if you don’t use company_utils.constants. Maybe you only use company_utils.logging_utils and the change is not breaking for you.

The example above illustrates how unnecessary checks for breaking changes at the points of use increase the maintenance cost.
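To make the cost concrete, here is a minimal sketch of the check a consumer (or an upgrade bot) effectively performs. It assumes plain MAJOR.MINOR.PATCH version strings; the helper names `parse_version` and `is_breaking_upgrade` are hypothetical and not part of any library.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a plain MAJOR.MINOR.PATCH string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_breaking_upgrade(installed: str, candidate: str) -> bool:
    """Return True when the candidate bumps the major version number."""
    return parse_version(candidate)[0] > parse_version(installed)[0]


# Removing a constant forces 1.2.4 -> 2.0.0 for the whole library ...
needs_review = is_breaking_upgrade("1.2.4", "2.0.0")
# ... while a backward-compatible release does not trigger a review.
safe_upgrade = is_breaking_upgrade("1.2.4", "1.3.0")
```

With a single version for the whole library, every consumer gets `needs_review`, including those who only import company_utils.logging_utils.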

Domain separation

Multiple repositories

Building on the previous chapter, it may seem trivial to just split the different domains into separate packages hosted in individual repositories. However, this increases the maintenance cost again.

Now you need to maintain all development tooling in multiple repositories. This may include CI configuration, linter settings, documentation, build scripts etc. The actual code can be as small as a single file. Therefore, the cost of keeping multiple repositories up to date will likely outweigh any benefit gained from the split.

Extras

If multiple packages in individual repositories are not the answer, what about everything in a single repository? Popular packaging and distribution tools, such as setuptools, Pipenv and Poetry, allow declaring "extras": optional features with their own dependencies. You could treat your library as a package and the sub-packages as extras.

You would install such a library as:

pip install "company_utils[logging_utils,constants]==2.0.0"

The library no longer brings in a number of unused dependencies. However, this approach still has several drawbacks:

  • The entire library uses a single version
  • Code is distributed even when not used
  • import company_utils.flask_utils will not show any errors in your IDE but will fail on execution because the flask dependency is not installed
  • Nothing prevents cross sub-package dependencies
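The extras approach could be sketched as follows. This is a hypothetical fragment of a single setup.py, not taken from the example repository; `extras_require` is the real setuptools keyword, while `EXTRAS_REQUIRE`, `SETUP_KWARGS` and `requirements_for` are names made up for illustration.

```python
# Hypothetical fragment of a single setup.py exposing sub-packages as extras.
EXTRAS_REQUIRE = {
    "constants": [],
    "logging_utils": [],
    "flask_utils": ["flask>=1.0.2,<2.0"],
}

SETUP_KWARGS = dict(
    name="company_utils",
    version="2.0.0",  # one version for the whole library
    packages=["company_utils"],  # all code ships regardless of chosen extras
    install_requires=[],
    extras_require=EXTRAS_REQUIRE,
)


def requirements_for(extras):
    """Dependencies added by e.g. company_utils[logging_utils,constants]."""
    selected = []
    for extra in extras:
        selected.extend(SETUP_KWARGS["extras_require"][extra])
    return selected


# In a real setup.py this would end with: setuptools.setup(**SETUP_KWARGS)
```

Note how the fragment mirrors the drawbacks listed above: there is exactly one `version`, and `packages` does not depend on which extras the user picked.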

Package namespacing

Another option is to use package namespacing, namely the native (implicit) namespace packages defined in PEP 420. Both setuptools and Poetry support package namespacing.

The documentation is vague on how namespacing helps and how to use it for multiple sub-packages. As it turns out, package namespacing is not designed to work in a single repository out of the box; attempting to do so results in a mono-repo. The key missing piece of information is that each namespaced package needs its own build script, which must live outside the package. This is tricky in a mono-repo because you cannot easily have multiple setup.py/pyproject.toml files in the same folder.
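To see what PEP 420 gives you at import time, the following self-contained sketch simulates two separately installed distributions in temporary directories. The directory names (`dist_constants`, `dist_logging_utils`) and version numbers are made up for the demonstration; the only requirement is that company_utils/ itself has no __init__.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_distribution(root: Path, sub_package: str, version: str) -> None:
    """Create <root>/company_utils/<sub_package>/__init__.py (no namespace __init__.py)."""
    package_dir = root / "company_utils" / sub_package
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text(f'__version__ = "{version}"\n')


# Two independent "installations", each with its own version.
workspace = Path(tempfile.mkdtemp())
for name, version in [("constants", "1.0.0"), ("logging_utils", "2.3.1")]:
    dist_root = workspace / f"dist_{name}"
    make_distribution(dist_root, name, version)
    sys.path.insert(0, str(dist_root))

importlib.invalidate_caches()
constants = importlib.import_module("company_utils.constants")
logging_utils = importlib.import_module("company_utils.logging_utils")
namespace = importlib.import_module("company_utils")

# The namespace spans both directories and has no file of its own.
namespace_locations = list(namespace.__path__)
```

Both sub-packages are importable under the shared company_utils prefix even though they live in different directories and carry different versions.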

File structure examples

Flat layout with one setup script per sub-package in the repository root:

setup-constants.py  # Each setup-*.py must explicitly include one sub-package
setup-flask_utils.py
setup-logging_utils.py
company_utils/
    # No __init__.py here.
    constants/
        # Sub-packages have __init__.py.
        __init__.py
        constants.py
    flask_utils/
        __init__.py
        flask_utils.py
    logging_utils/
        __init__.py
        logging_utils.py

One folder per sub-package, each with its own setup.py:

company_utils.constants/
    setup.py  # All setup.py differ only in the package name
    src/
        company_utils/
            constants/
                __init__.py
                constants.py
company_utils.flask_utils/
    setup.py
    src/
        company_utils/
            flask_utils/
                __init__.py
                flask_utils.py
company_utils.logging_utils/
    setup.py
    src/
        company_utils/
            logging_utils/
                __init__.py
                logging_utils.py

One folder per sub-package, each with its own pyproject.toml:

company_utils.constants/
    pyproject.toml
    src/
        constants/
            __init__.py
            constants.py
company_utils.flask_utils/
    pyproject.toml
    src/
        flask_utils/
            __init__.py
            flask_utils.py
company_utils.logging_utils/
    pyproject.toml
    src/
        logging_utils/
            __init__.py
            logging_utils.py

You can see that having multiple setup or pyproject files is ugly and increases the maintenance cost by introducing duplication. A better solution is suggested in the next chapter.

Low maintenance namespacing solution

It’s time to tie together the information from the previous chapters. We are aiming for a solution with a constant maintenance cost, independent of the number of sub-packages. The resulting solution allows its sub-packages to be versioned and distributed independently. Package namespacing provides an easy way to find and import them.

To summarize the approach, we will replace duplication with iteration.

Build tools

For building the sub-packages, we will use setuptools as it offers more flexibility: setup.py is just a Python script, so we can parametrize it to get rid of the need for multiple files. Additionally, we will capture the requirements of each sub-package in a requirements.txt file. We will also keep the version of each sub-package in the __version__ attribute of its src/<NAMESPACE>/<SUB-PACKAGE>/__init__.py file.

setup.py
src/
    company_utils/
        # No __init__.py here.
        constants/
            # Sub-packages have __init__.py.
            __init__.py
            constants.py
            requirements.txt
        flask_utils/
            __init__.py
            flask_utils.py
            requirements.txt
        logging_utils/
            __init__.py
            logging_utils.py
            requirements.txt

The parametrized setup.py:

import importlib
import os
import sys
from os.path import join
from typing import List

from setuptools import find_namespace_packages, setup

def parse_requirements_txt(filename: str) -> List[str]:
    with open(filename) as fd:
        return list(filter(lambda line: bool(line.strip()), fd.read().splitlines()))

def get_sub_package(packages_path: str) -> str:
    package_cmd = "--package"
    packages = os.listdir(packages_path)
    available_packages = ", ".join(packages)

    if package_cmd not in sys.argv:
        raise RuntimeError(
            f"Specify which package to build with '{package_cmd} <PACKAGE NAME>'. "
            f"Available packages are: {available_packages}"
        )

    index = sys.argv.index(package_cmd)
    sys.argv.pop(index)  # Removes the switch
    package = sys.argv.pop(index)  # Returns the element after the switch
    if package not in packages:
        raise RuntimeError(
            f"Unknown package '{package}'. Available packages are: {available_packages}"
        )
    return package

def get_version(sub_package: str) -> str:
    # Import the sub-package from the source tree and read its __version__.
    return importlib.import_module(f"{SOURCES_ROOT}.{NAMESPACE}.{sub_package}").__version__

SOURCES_ROOT = "src"
NAMESPACE = "company_utils"
PACKAGES_PATH = join(SOURCES_ROOT, NAMESPACE)
SUB_PACKAGE = get_sub_package(PACKAGES_PATH)
NAMESPACED_PACKAGE_NAME = f"{NAMESPACE}.{SUB_PACKAGE}"

setup(
    name=NAMESPACED_PACKAGE_NAME,
    version=get_version(SUB_PACKAGE),
    # See https://setuptools.readthedocs.io/en/latest/setuptools.html#find-namespace-packages
    package_dir={"": SOURCES_ROOT},
    packages=find_namespace_packages(
        where=SOURCES_ROOT, include=[NAMESPACED_PACKAGE_NAME]
    ),
    include_package_data=True,
    zip_safe=False,
    install_requires=parse_requirements_txt(
        join(PACKAGES_PATH, SUB_PACKAGE, "requirements.txt")
    ),
)
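
Only flask_utils has dependencies; the other requirements.txt files are empty:

# src/company_utils/flask_utils/requirements.txt
flask>=1.0.2,<2.0

Building each sub-package produces one wheel per sub-package:

company_utils.constants-1.0.0-py3-none-any.whl
company_utils.flask_utils-1.0.0-py3-none-any.whl
company_utils.logging_utils-1.0.0-py3-none-any.whl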
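The `--package` handling in setup.py can be exercised in isolation. The sketch below is a variation on `get_sub_package`, not the author's code: it takes the argument list as a parameter instead of mutating `sys.argv`, and it also reports a switch that is not followed by a value. The function name `extract_package_option` is hypothetical.

```python
from typing import List, Sequence, Tuple


def extract_package_option(
    argv: Sequence[str], packages: Sequence[str], switch: str = "--package"
) -> Tuple[str, List[str]]:
    """Return the selected package and argv with the switch and its value removed."""
    remaining = list(argv)
    available = ", ".join(packages)
    if switch not in remaining:
        raise RuntimeError(f"Specify '{switch} <PACKAGE NAME>'. Available: {available}")

    index = remaining.index(switch)
    remaining.pop(index)  # drop the switch itself
    if index >= len(remaining):
        raise RuntimeError(f"'{switch}' must be followed by a package name")
    package = remaining.pop(index)  # the element that followed the switch
    if package not in packages:
        raise RuntimeError(f"Unknown package '{package}'. Available: {available}")
    return package, remaining


package, rest = extract_package_option(
    ["setup.py", "bdist_wheel", "--package", "flask_utils"],
    ["constants", "flask_utils", "logging_utils"],
)
```

The remaining arguments are what setuptools should see, which is why the switch and its value have to be stripped before `setup()` is called.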

We will also need something to build all the packages, as setup.py builds only one at a time. Personally, I like to automate tasks with PyInvoke, but any Python, Bash or other script will do as well.

import os
import shutil

from invoke import task

# For `install_subpackage_dependencies`, see the next section.
# It prevents import errors when resolving the sub-package version number.
@task(pre=[install_subpackage_dependencies])
def build(ctx):
    for package in os.listdir("src/company_utils"):
        print(f"Building '{package}' package")
        print("Cleanup the 'build' directory")
        shutil.rmtree("build", ignore_errors=True)
        # PYTHONPATH=src makes the sources importable while setup.py runs.
        ctx.run(
            f"PYTHONPATH=src python setup.py bdist_wheel --package {package}",
            pty=True,
        )

With this setup, we can build all packages at once and push them to a package registry (PyPI, PackageCloud, etc.).
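Because the version is resolved by importing the sub-package, its dependencies must be installed at build time. An alternative, shown here only as a sketch and not as the approach used in the example repository, is to read `__version__` from the file as text. It assumes the version is a simple string literal assigned at the start of a line; `read_version` and `VERSION_PATTERN` are hypothetical names.

```python
import re
import tempfile
from pathlib import Path

# Matches lines such as: __version__ = "1.4.2"
VERSION_PATTERN = re.compile(r"""^__version__\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def read_version(init_file: Path) -> str:
    """Extract __version__ from an __init__.py without importing it."""
    match = VERSION_PATTERN.search(init_file.read_text())
    if match is None:
        raise RuntimeError(f"No __version__ found in {init_file}")
    return match.group(1)


# Demonstration with a throwaway file whose import would fail at build time.
demo_init = Path(tempfile.mkdtemp()) / "__init__.py"
demo_init.write_text(
    'import some_dependency_not_installed_at_build_time\n__version__ = "1.4.2"\n'
)
demo_version = read_version(demo_init)
```

The trade-off is that computed versions (for example, ones assembled from several variables) are not supported, whereas the import-based approach handles them.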

Local development

You may have noticed that having many requirements.txt files doesn’t make local development developer-friendly. How are you going to install all those requirements to avoid import errors? And how will you keep them up to date?

Let us add another automation task to install all the src/<NAMESPACE>/<SUB-PACKAGE>/requirements.txt files.

import os

from invoke import task

# PROJECT_INFO and print_header are helpers from the project's own tasks package.
from tasks.utils import PROJECT_INFO, print_header

@task
def install_subpackage_dependencies(ctx, name=None):
    """
    Install the requirements of one or all sub-packages into the virtual environment.

    Args:
        ctx (invoke.Context): Context
        name (Optional[str]): Name of sub-package for which to collect and install dependencies.
            If not specified, all sub-packages will be used.
    """
    print("Uninstalling previous dependencies")
    ctx.run("pipenv clean", pty=True)

    print("Installing new dependencies")
    packages = os.listdir(PROJECT_INFO.namespace_directory) if name is None else [name]
    for package in packages:
        print_header(package, level=3)
        requirements_file_path = PROJECT_INFO.namespace_directory / package / "requirements.txt"
        ctx.run(f"pipenv run pip install -r {requirements_file_path}", echo=True)

This task will install either all available dependencies or the dependencies of a selected sub-package. You can also see that pipenv is being called. I recommend using Pipenv or Poetry to manage your development dependencies and your Python virtual environment.

What would this look like in practice? You would use pipenv sync -d or poetry install for dev dependencies, and pipenv run inv install_subpackage_dependencies or poetry run inv install_subpackage_dependencies for sub-package dependencies.
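The file discovery behind such a task can be sketched without Invoke or Pipenv. The function below is an illustrative assumption (the name `requirement_files` does not come from the example repository); unlike a plain `os.listdir`, it skips entries that are not directories and sub-packages without a requirements.txt.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def requirement_files(namespace_dir: Path, name: Optional[str] = None) -> List[Path]:
    """List requirements.txt files for one sub-package, or for all of them."""
    if name is not None:
        candidates = [namespace_dir / name]
    else:
        candidates = sorted(path for path in namespace_dir.iterdir() if path.is_dir())
    return [
        candidate / "requirements.txt"
        for candidate in candidates
        if (candidate / "requirements.txt").is_file()
    ]


# Demonstration on a throwaway directory tree.
demo_root = Path(tempfile.mkdtemp()) / "src" / "company_utils"
for sub_package in ("constants", "flask_utils", "__pycache__"):
    (demo_root / sub_package).mkdir(parents=True)
(demo_root / "constants" / "requirements.txt").write_text("")
(demo_root / "flask_utils" / "requirements.txt").write_text("flask>=1.0.2,<2.0\n")

all_files = requirement_files(demo_root)
single_file = requirement_files(demo_root, name="flask_utils")
```

Each returned path can then be passed to `pip install -r`, as the task above does.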

Continuous integration

Another problem you may have noticed is that installing the dependencies of all sub-packages prevents tests from detecting imports of dependencies that belong to other sub-packages. For example, if you import flask in company_utils.constants, it will work locally but fail once the sub-package is installed on its own. Continuous integration (CI) comes to the rescue! The "cross-import" scenario should be rare. Therefore, you can let it fail in a CI pipeline instead and keep a lot of complexity out of the local development environment. CI will be the quality gate.

Hopefully, the CI solution of your choice allows parametrization of jobs (such as CircleCI matrix jobs). Each parameter in this case will be the name of a sub-package. Since you want to target specific sub-packages, it is also a good idea to split your tests into folders named after the sub-packages. The pipeline could then look like this:

install pipenv
pipenv clean # run if you cache dependencies
pipenv install --dev --deploy  # makes sure Pipfile.lock is up to date
pipenv run inv install_subpackage_dependencies --name ${sub_package}
# Run any tests you like - only the dependencies of one sub-package are now installed
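As a complement to the per-sub-package CI job, a static check can flag cross-imports before the tests even run. This is not part of the approach described in this article; it is a rough sketch that assumes the import name equals the distribution name (true for flask, not true in general) and that standard-library and namespace modules are passed in via the hypothetical `allowed` parameter.

```python
import ast
import re
from typing import Iterable, Set


def imported_top_level_modules(source: str) -> Set[str]:
    """Collect top-level module names from absolute imports in the source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules.add(node.module.split(".")[0])
    return modules


def declared_distributions(requirement_lines: Iterable[str]) -> Set[str]:
    """Extract distribution names from requirements.txt lines."""
    names = set()
    for line in requirement_lines:
        line = line.split("#")[0].strip()  # drop comments and blank lines
        if line:
            names.add(re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0].lower())
    return names


def undeclared_imports(source, requirement_lines, allowed=frozenset()):
    """Imports that are neither declared as requirements nor explicitly allowed."""
    declared = declared_distributions(requirement_lines)
    return imported_top_level_modules(source) - declared - set(allowed)


source = "import logging\nimport flask\nfrom company_utils import constants\n"
allowed = {"logging", "company_utils"}
problems = undeclared_imports(source, [], allowed)
no_problems = undeclared_imports(source, ["flask>=1.0.2,<2.0"], allowed)
```

Here `problems` flags flask for a sub-package whose requirements.txt is empty, which is the situation the CI job would otherwise surface as an ImportError.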

Versioning, semantic releases

As mentioned before, the version number is kept in __version__ of each src/<NAMESPACE>/<SUB-PACKAGE>/__init__.py file. If you are wondering how to keep a changelog or automate versioning with semantic releases, I will describe it in a future blog post. For now, you can have a look at these resources for inspiration:

Summary

Maintaining a collection of libraries can save a lot of development time. However, due to the lack of direct support in the commonly used build tools, it also has a small upfront cost of developing your own tasks around it. Hopefully, this article has helped you to see whether the investment is worth the potential gains, or even to implement a similar solution on your own.

All the examples above have a working example in this Git repository.


Comments

2 responses to “Package namespacing for Python library collection”

  1. Fantastic work. Thanks a lot.

    Would you suggest using version.py files instead of specifying __version__ in __init__.py files of sub-packages?

    1. Hi Eren. I’m not aware of any specific convention so use anything that is intuitive for you. I personally use the __init__.py file because it is always there but if you want to make it more explicit and visible, version.py sounds like a good idea as well.
