A practical guide on how to manage a collection of code snippets as a single, easy-to-maintain library collection. I will utilise Python package namespacing, managed in a Git mono-repository.
Are you, your team, or your organisation suffering from constantly copying fragments of code between projects? Do the copies diverge quickly? Do you find an increasing need to reuse snippets of code in multiple places as your project/team/business grows? Then this article is for you. I will primarily focus on a mono-repo library collection, but I will present other technologies as well. Hopefully, this will make choosing the most appropriate one for your situation easier.
A library collection (hereafter library) is a set of code pieces, wrapped in packages (hereafter sub-packages), that can be distributed individually and reused in multiple projects.
TL;DR: If you’re here just for the code, have a look at the example Git repository.
Typical candidates for sub-packaging:
- Constants, schemas, company policies
- Utility functions extending libraries
- Development tools
Motivation
The ultimate drive behind librarification is lowering the maintenance cost, which is affected by several closely related properties.
Breaking changes
A breaking change occurs when you make a backward-incompatible change, such as removing or renaming a function, a function parameter, or a package, or changing a function’s behaviour. In Python, packages are versioned using PEP 440 and frequently comply with Semantic Versioning. It is advisable to use the same for your own library.
To keep the maintenance cost as low as possible, you want to reduce the number of breaking changes.
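For illustration, here is how typical changes map to version bumps under Semantic Versioning (the function names are made up):

```python
# company_utils 1.2.4 exposes this public function:
def get_logger(name):
    ...

# Renaming it is backward incompatible - callers of get_logger() break,
# so the major version must be bumped: 1.2.4 -> 2.0.0
def create_logger(name):
    ...

# Adding an optional parameter keeps existing calls working,
# so a minor bump is enough: 1.2.4 -> 1.3.0
def get_logger(name, level=None):
    ...
```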
Dependencies
The more code your library accumulates, the more likely it is for breaking changes to occur. But not every breaking change affects the whole library. Let’s say your library looks like this:
```
company_utils/
    __init__.py
    constants.py
    flask_utils.py
    logging_utils.py
```
- `constants.py` contains only constants, without any dependencies.
- `logging_utils.py` depends only on the Python logging library.
- `flask_utils.py` contains several utility functions used with the Flask web framework and depends on it (see the sketch below).
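For instance, `flask_utils.py` might contain a helper like this (a made-up example that shows where the Flask dependency comes from):

```python
# company_utils/flask_utils.py (hypothetical content)
from flask import Flask


def create_app(name: str) -> Flask:
    """Create a Flask app pre-configured with company-wide defaults."""
    app = Flask(name)
    app.config["JSON_SORT_KEYS"] = False  # an example company-wide default
    return app
```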
If you need to remove an obsolete constant from `company_utils.constants`, you need to bump the major number of your library version, e.g. `1.2.4` -> `2.0.0`. This tells the users of the library: „hey, you need to check what has changed and modify your code”. However, there is no need to modify anything if you don’t use `company_utils.constants`. Maybe you use only `company_utils.logging_utils` and the change is not breaking for you.
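A concrete (hypothetical) example of such a removal:

```python
# company_utils/constants.py in 1.2.4
RETRY_LIMIT = 3
OLD_API_URL = "https://api.example.com/v1"  # obsolete

# company_utils/constants.py in 2.0.0 - removing OLD_API_URL is the breaking change
RETRY_LIMIT = 3
```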
The example above illustrates how unnecessary breaking-change checks at the points of use increase the maintenance cost.
Domain separation
Multiple repositories
Building on top of the previous chapter, it may seem trivial to just split the different domains into separate packages hosted in individual repositories. However, this increases the maintenance cost again.
Now you need to maintain all development tooling in multiple repositories. This may include CI configuration, linter settings, documentation, build scripts, etc. The actual code can be as small as a single file. Therefore, the cost of keeping multiple repositories up to date will likely outweigh any benefit gained from the split.
Extras
If multiple packages in individual repositories are not the answer, what about everything in a single repository? Popular packaging and distribution tools, such as setuptools, Pipenv and Poetry, allow declaring „extras” – optional features with their own dependencies. You could treat your library as a package and the sub-packages as extras.
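As a sketch of what this could look like with setuptools (the version pins are illustrative, not from the example repository):

```python
# setup.py - a minimal sketch of the "extras" approach
from setuptools import find_packages, setup

setup(
    name="company_utils",
    version="2.0.0",
    packages=find_packages(),
    install_requires=[],  # the base package has no mandatory dependencies
    extras_require={
        "constants": [],                       # no external dependencies
        "logging_utils": [],                   # standard library only
        "flask_utils": ["flask>=1.0.2,<2.0"],  # installed only when requested
    },
)
```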
You would install such a library as:
pip install "company_utils[logging_utils,constants]==2.0.0"
The library no longer brings in a number of unused dependencies. However, this approach still has many of the drawbacks:
- The entire library uses a single version
- Code is distributed even when not used
- `import company_utils.flask_utils` will not show any errors in your IDE but will fail on execution because the `flask` dependency is not installed
- Nothing prevents cross-sub-package dependencies
Package namespacing
Another option is to use package namespacing, namely the native/implicit namespace packages as defined in PEP 420. Both setuptools and Poetry support package namespacing.
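To illustrate what PEP 420 provides: once several independently built distributions share the `company_utils` namespace (and none of them ships a `company_utils/__init__.py`), Python merges them at import time. A hypothetical session, assuming two of the sub-packages are installed:

```python
# company_utils.constants and company_utils.logging_utils are installed
# as two separate distributions:
import company_utils.constants       # provided by one distribution
import company_utils.logging_utils   # provided by another

# The namespace package itself has no code; Python stitches the installed
# portions together, as the merged search path shows:
print(company_utils.__path__)
```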
The documentation is vague on how namespacing helps and how to use it for multiple sub-packages. As it turns out, package namespacing is not designed to work in a single repository out of the box; attempting to do so results in a mono-repo. The key missing information is that each namespaced package needs its own build script, which must live outside of the package. This is tricky in a mono-repo because you cannot easily have multiple `setup.py`/`pyproject.toml` files in the same folder.
File structure examples
Multiple `setup-*.py` files in a single repository (setuptools):

```
setup-constants.py      # Each setup-*.py must explicitly include one sub-package
setup-flask_utils.py
setup-logging_utils.py
company_utils/          # No __init__.py here.
    constants/          # Sub-packages have __init__.py.
        __init__.py
        constants.py
    flask_utils/
        __init__.py
        flask_utils.py
    logging_utils/
        __init__.py
        logging_utils.py
```
One directory per sub-package, each with its own `setup.py` (setuptools):

```
company_utils.constants/
    setup.py            # All setup.py differ only in the package name
    src/
        company_utils/
            constants/
                __init__.py
                constants.py
company_utils.flask_utils/
    setup.py
    src/
        company_utils/
            flask_utils/
                __init__.py
                flask_utils.py
company_utils.logging_utils/
    setup.py
    src/
        company_utils/
            logging_utils/
                __init__.py
                logging_utils.py
```
One directory per sub-package, each with its own `pyproject.toml` (Poetry):

```
company_utils.constants/
    pyproject.toml
    src/
        constants/
            __init__.py
            constants.py
company_utils.flask_utils/
    pyproject.toml
    src/
        flask_utils/
            __init__.py
            flask_utils.py
company_utils.logging_utils/
    pyproject.toml
    src/
        logging_utils/
            __init__.py
            logging_utils.py
```
You can see that having multiple `setup.py` or `pyproject.toml` files is ugly and increases the maintenance cost by introducing duplication. A better solution is suggested in the next chapter.
Low maintenance namespacing solution
It’s time to tie together the information from the previous chapters. We are aiming for a solution with a constant maintenance cost, independent of the number of sub-packages. The resulting solution allows versioning and distributing its sub-packages independently. Package namespacing provides an easy way to find and import them.
To summarize the approach, we will replace duplication with iteration.
Build tools
For building the sub-packages, we will use `setuptools`, as it offers higher flexibility: `setup.py` is just a Python script. We will parametrize it to get rid of the need for multiple files. Additionally, we will capture each sub-package’s requirements in a `requirements.txt` file. We will also keep the version of each sub-package in `__version__` of each `src/<NAMESPACE>/<SUB-PACKAGE>/__init__.py` file.
```
setup.py
src/
    company_utils/      # No __init__.py here.
        constants/      # Sub-packages have __init__.py.
            __init__.py
            constants.py
            requirements.txt
        flask_utils/
            __init__.py
            flask_utils.py
            requirements.txt
        logging_utils/
            __init__.py
            logging_utils.py
            requirements.txt
```
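Each sub-package’s `__init__.py` then carries its own version, for example:

```python
# src/company_utils/logging_utils/__init__.py
__version__ = "1.0.0"
```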
The single parametrized `setup.py` then looks like this:

```python
import importlib
import os
import sys
from os.path import join
from typing import List

from setuptools import find_namespace_packages, setup


def parse_requirements_txt(filename: str) -> List[str]:
    with open(filename) as fd:
        return list(filter(lambda line: bool(line.strip()), fd.read().splitlines()))


def get_sub_package(packages_path: str) -> str:
    """Read and remove the `--package <PACKAGE NAME>` switch from the command line."""
    package_cmd = "--package"
    packages = os.listdir(packages_path)
    available_packages = ", ".join(packages)
    if package_cmd not in sys.argv:
        raise RuntimeError(
            f"Specify which package to build with '{package_cmd} <PACKAGE NAME>'. "
            f"Available packages are: {available_packages}"
        )
    index = sys.argv.index(package_cmd)
    sys.argv.pop(index)  # Removes the switch
    package = sys.argv.pop(index)  # Returns the element after the switch
    if package not in packages:
        raise RuntimeError(
            f"Unknown package '{package}'. Available packages are: {available_packages}"
        )
    return package


def get_version(sub_package: str) -> str:
    return importlib.import_module(f"{SOURCES_ROOT}.{NAMESPACE}.{sub_package}").__version__


SOURCES_ROOT = "src"
NAMESPACE = "company_utils"
PACKAGES_PATH = join(SOURCES_ROOT, NAMESPACE)
SUB_PACKAGE = get_sub_package(PACKAGES_PATH)
NAMESPACED_PACKAGE_NAME = f"{NAMESPACE}.{SUB_PACKAGE}"

setup(
    name=NAMESPACED_PACKAGE_NAME,
    version=get_version(SUB_PACKAGE),
    # See https://setuptools.readthedocs.io/en/latest/setuptools.html#find-namespace-packages
    package_dir={"": SOURCES_ROOT},
    packages=find_namespace_packages(
        where=SOURCES_ROOT, include=[NAMESPACED_PACKAGE_NAME]
    ),
    include_package_data=True,
    zip_safe=False,
    install_requires=parse_requirements_txt(
        join(PACKAGES_PATH, SUB_PACKAGE, "requirements.txt")
    ),
)
```
```
# src/company_utils/flask_utils/requirements.txt - the others are empty
flask>=1.0.2,<2.0
```
Running the build for each sub-package produces independently versioned wheels, for example:

```
company_utils.constants-1.0.0-py3-none-any.whl
company_utils.flask_utils-1.0.0-py3-none-any.whl
company_utils.logging_utils-1.0.0-py3-none-any.whl
```
We will also need something to build all the packages, as `setup.py` builds only one at a time. Personally, I like to automate tasks with PyInvoke, but any Python/Bash/other script will do as well.
```python
import os
import shutil

from invoke import task

# For `install_subpackage_dependencies`, see the next section.
# It prevents import errors when resolving the sub-package version number.
@task(pre=[install_subpackage_dependencies])
def build(ctx):
    for package in os.listdir("src/company_utils"):
        print(f"Building '{package}' package")
        print("Cleanup the 'build' directory")
        shutil.rmtree("build", ignore_errors=True)
        ctx.run(
            f"export PYTHONPATH=src\npython setup.py bdist_wheel --package {package}",
            pty=True,
        )
```
With this setup, we can build all packages at once and push them to a package registry (PyPI, PackageCloud, etc.).
Local development
You may have noticed that having many `requirements.txt` files doesn’t make local development developer-friendly. How are you going to install all those requirements to avoid import errors? And how will you keep them up to date? Let us add another automation task that installs all the `src/<NAMESPACE>/<SUB-PACKAGE>/requirements.txt` files.
```python
import os

from invoke import task

from tasks.utils import PROJECT_INFO, print_header


@task
def install_subpackage_dependencies(ctx, name=None):
    """
    Args:
        ctx (invoke.Context): Context
        name (Optional[str]): Name of the sub-package for which to collect and install
            dependencies. If not specified, all sub-packages will be used.
    """
    print("Uninstalling previous dependencies")
    ctx.run("pipenv clean", pty=True)

    print("Installing new dependencies")
    packages = os.listdir(PROJECT_INFO.namespace_directory) if name is None else [name]
    for package in packages:
        print_header(package, level=3)
        requirements_file_path = PROJECT_INFO.namespace_directory / package / "requirements.txt"
        ctx.run(f"pipenv run pip install -r {requirements_file_path}", echo=True)
```
This task will install either all available dependencies or only those of the selected sub-package. You can also see that `pipenv` is being called: I recommend using Pipenv or Poetry to manage your development dependencies and virtual Python environment.
How would this look in practice? You would use `pipenv sync -d` or `poetry install` for the dev dependencies, and `pipenv run inv install_subpackage_dependencies` or `poetry run inv install_subpackage_dependencies` for the sub-package dependencies.
Continuous integration
Another problem you may have noticed is that installing all sub-package dependencies at once prevents tests from discovering imports of other sub-packages’ dependencies. For example, if you import `flask` in `company_utils.constants`, it will work locally but fail once the library is installed. Continuous integration (CI) comes to help! The „cross-import” scenario should be rare. Therefore, you can leave it to fail in a CI pipeline and keep a lot of complexity out of the local development environment. CI will be the quality gate.
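If you prefer to catch cross-imports statically, without installing anything, a supplementary guard is possible. This is my own sketch, not part of the referenced repository, and it assumes the `src/company_utils` layout from above:

```python
# tests/test_no_cross_imports.py - a hypothetical static guard against
# one sub-package importing from another
import ast
from pathlib import Path

NAMESPACE = "company_utils"
NAMESPACE_DIR = Path("src") / NAMESPACE
SUB_PACKAGES = sorted(p.name for p in NAMESPACE_DIR.iterdir() if p.is_dir())


def iter_imported_modules(source_file: Path):
    """Yield the dotted module names imported by a Python source file."""
    tree = ast.parse(source_file.read_text())
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            yield from (alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            yield node.module


def test_no_cross_sub_package_imports():
    for sub_package in SUB_PACKAGES:
        for source_file in (NAMESPACE_DIR / sub_package).rglob("*.py"):
            for module in iter_imported_modules(source_file):
                for other in SUB_PACKAGES:
                    assert not (
                        other != sub_package
                        and module.startswith(f"{NAMESPACE}.{other}")
                    ), f"{source_file} imports {module} from another sub-package"
```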
Hopefully, the CI solution of your choice allows parametrization of jobs (such as CircleCI matrix jobs). Each parameter in this case will be the name of a sub-package. Since you want to target specific sub-packages, it is also a good idea to split your tests into folders named after the sub-packages. Then the pipeline could look like:
```
install pipenv
pipenv clean                   # run if you cache dependencies
pipenv install --dev --deploy  # makes sure Pipfile.lock is up to date
pipenv run inv install_subpackage_dependencies --name ${sub_package}
# Run any tests you like - only one sub-package's dependencies are now present
```
Versioning, semantic releases
As mentioned before, the version number is kept in `__version__` of each `src/<NAMESPACE>/<SUB-PACKAGE>/__init__.py` file. If you are wondering how to keep a changelog or automate versioning with semantic releases, I will describe it in a future blog post. For now, you can have a look at these resources for inspiration, and at the small sketch after the list:
- Lerna: A JavaScript tool for managing projects with multiple packages
- This example change log of a mono-repo project using Lerna
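In the meantime, here is a small sketch of my own (assuming the layout above) that reports each sub-package’s current version — a handy building block for any release automation:

```python
# list_versions.py - a hypothetical helper; run with PYTHONPATH=src so the
# namespace is importable. Importing a sub-package requires its dependencies
# to be installed (see the install task above).
import importlib
import os

NAMESPACE = "company_utils"

for sub_package in sorted(os.listdir(os.path.join("src", NAMESPACE))):
    module = importlib.import_module(f"{NAMESPACE}.{sub_package}")
    print(f"{NAMESPACE}.{sub_package}=={module.__version__}")
```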
Summary
Maintaining a collection of libraries can save a lot of development time. However, due to the lack of direct support in commonly used build tools, it also has a small upfront cost of developing your own tooling around it. Hopefully, this article has helped you decide whether the investment is worth the potential gains, or even to implement a similar solution on your own.
All of the examples above are available as a working project in this Git repository.