Verdict: It's a Match!

Introduction

Second blog post since forever, and I have no idea what this one should be about. The main problem is I'm writing this as I finish my workout, so I don't have much going on in terms of setup, but that's fine.

After the rant in the last post, I would like to aim for a more informative post this time. Something with more "meat", so to speak. It's not much, but perhaps I'll share my experience using Python's match statement, introduced in Python 3.10.

The Primal Age

In a not so distant past, writing code for checking the structure and types of objects was a huge pain in Python. Here is a real-world example of an object schema:

{
  "job_type": "training",
  "train_config": "some config",
  "min_examples": 10,
}

Intuitively, the job_type value can be one of several predefined values, and based on the job_type, we would like to invoke a custom flow, or branch of code if you will. How would one normally handle this in Python?

def handle_job(job: dict):
  job_type = job.pop("job_type")
  if job_type == "training":
    train_job(**job)
  elif ...:
    ...

Looks fine to me, what do you think?

The main concern that screams danger at me is the full trust we place in job actually being a dict that has the key job_type. Granted, fixing the potential KeyError is easy: just add a default and you're golden. Regarding the type assurance, you have to have some level of trust, especially if this is an internal function that you fully own.

This code is actually perfectly fine, and I wouldn't necessarily bark at it during a code review.
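For completeness, the "add a default" fix mentioned above might look like this; train_job here is a stand-in handler, not the real one:

```python
def train_job(**kwargs):
    # stand-in for the real training flow; just echoes what it received
    return ("training", kwargs)

def handle_job(job: dict):
    # pop with a default avoids the KeyError when "job_type" is missing
    job_type = job.pop("job_type", None)
    if job_type == "training":
        return train_job(**job)
    raise ValueError(f"unknown job_type: {job_type!r}")
```

The explicit ValueError at the end also makes the "unknown job" path a deliberate decision rather than a silent fall-through.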

Let’s take another, rather similar example:

# snippet from a function that takes an SQLAlchemy orm type and a "dynamic filter",
# and returns a SQLAlchemy filter expression.
#
# e.g.
# ("date", ("2022-01-01", "2023-01-01")) -> orm_type.date.between(date1, date2)
#
if isinstance(filter_, (list, tuple)):
  if len(filter_) == 2 and isinstance(filter_[0], str) and isinstance(filter_[1], (list, tuple)):
    return getattr(orm_type, filter_[0]).between(filter_[1][0], filter_[1][1])
  elif ...:
    ...
else:
  return filter_

If you want to go ahead and gouge your eyeballs out, please, be my guest. I started this project when my dependencies required me to use Python 3.8, and then soon after Python 3.9. I remember how badly I wanted to “matchify” this code…

I mean, even if one is more inclined to make it self-documenting, it may look something like this:

if isinstance(filter_, (list, tuple)):
  key = filter_[0]
  is_between_expr = len(filter_) == 2 and isinstance(filter_[1], (tuple, list))
  if is_between_expr:
    ...
else:
  return filter_

Yikes, doesn’t look any better to me, honestly.

Eye Candy

match filter_:
    case (str() as key, (start, end)):
        return getattr(orm_type, key).between(start, end)
    case (str() as key, str() as value) if re.match(r"^%[^%]+%$", value):
        return getattr(orm_type, key).like(value)
    case (str() as key, value):
        return getattr(orm_type, key) == value
    case (str() as key, str() as operation, _ as value):
        op = getattr(operator, operation)
        return op(getattr(orm_type, key), value)
    case _:
        return filter_

Huge sigh of relief! Not only is the code succinct, safe, and expressive, it's also self-documenting. Keep in mind this code is deep within the code base, after Pydantic has done its validation on the input, so no need for us to be pedantic about it (pun intended). Yet we really do need to check types and structure to properly determine what to execute.

I can already hear you shouting from behind the screen:

"Actually, the problem here is the spec itself. It's too vague."

Fair enough. If we had passed the actual operation as some sort of OP_CODE type of thing, parsing would definitely be less hairy. To be completely honest, this code is not used anymore, lol. But back when I needed a quick, simple spec to manage SQL filters dynamically, match had my back.

Match Madness

For the final piece of this post, I present you with an actually useful use case for the match statement:

# Recursively parse an SQLAlchemy ORM type, mapping everything to primitive types.
match obj:
    case dict():
        return {
            _to_dict_or_list(k, exclude_refs): _to_dict_or_list(v, exclude_refs)
            for k, v in obj.items()
            if not isinstance(k, str) or not k.startswith("_")
        }
    case tuple() | sa.engine.Row() if hasattr(obj, "_fields"):
        return {
            _to_dict_or_list(k, exclude_refs): _to_dict_or_list(v, exclude_refs)
            for k, v in zip(obj._fields, obj)
        }  # noqa
    case list() | tuple():
        return list(_to_dict_or_list(v, exclude_refs) for v in obj)
    case Enum():
        return obj.value
    case _:
        return copy.deepcopy(obj)

Honestly, this looks like a monster. The value of this code will also greatly diminish once we migrate to SQLAlchemy 2.0 with Mapped attributes, but the main use case still holds true.

The idea is that working with arbitrary data structures, with various shapes and sizes, can get out of hand very quickly. Being able to break down the conditions that need to be met into readable statements goes a long way. This code could use a bit of cleaning as well, perhaps using local functions to keep each case body as simple as possible, so there is still room to improve beyond just using match.

Conclusion

I am all for Python's steady march towards the functional paradigm. The fewer side effects, mutations, inheritance hierarchies, and other monsters you can avoid, the better off we'll all be. I just wish pipes would land soon; they can't come soon enough, that's for sure.