`tuple` Reveals Your Intent

Brief

tuple is framed as the less potent brother of list. In long-living code, however, limiting functionality is an asset. This article shows where list-ridden code would be better of using a tuple.

Context

Very often, I bump onto code that happens to work, by pure merit of the code author, but that doesn't give me an easy time to realize why.

Very often, this has to do with over-achieving data types with many bells and whistles, that allow all sorts of things to go wrong: dict objects passed around to convey data that has a fixed structure, list objects that happen to not be appended to, inserted to, sliced from, sorted, int objects that can be multiplied, added to, ... etc....

In any professional project, code has to be reviewed, maintained and debugged. This code is read an order of magnitude more often than it is written.

So in order to save the reader some time, we can tell them about the objects we create by choosing the proper type for them. In this post, I want to make the case for using tuple over other data structures.

The Right Frame of Mind

When looking for this, I bump on mostly discussions that say what you cannot do with a tuple, and most of the time this is seen as a liability (not always). This is a frame rooted in the comfort of the code author to choose a data type that can do everything they would ever need.

I claim that Not Allowing Unintended Stuff is an asset of the code: it makes it easy for the code reader to limit the possibilities where the program can go wrong. I want you to look at tuple from the frame of code reviewing and maintenance, and starts from the comfort of the reader.

When to Shun list

Try to reason about this code:

def main():
  inputs = [1, 2, -3, 4, -5]
  results = process(inputs)
  tabulate(inputs, results)

From the example, you may conclude we're creating a sort of table that maps a list of input values onto a list of results.

But what can you really tell about the correctness?

How can you tell that tabulate receives [1, 2, 3, 4, 5] as its inputs argument?

Yes: you'll need to check what the process function does. It may be implemented as

def process(inputs):
  # all my inputs are mine!
  originals = tuple(inputs)
  inputs.sort(reverse=True, key=abs)
  total = sum(
      multiprocess_pool.map(
          sum,
          more_itertools.chunked(inputs, 3)))
  return tuple(i/total for i in originals)

Not too far fetched, though this is a trivial example. In production code, the functions would be larger, more complex, and farther apart.

Now imagine this code would have used tuples:

def main():
  inputs = (1, 2, 3, 4, 5)
  results = process(inputs)
  tabulate(inputs, results)

This time, there is no way that the tabulate function could receive anything but (1, 2, 3, 4, 5). It becomes trivial to see that the inputs object cannot be mutated. How neat is that!?

When to Shun dict

The idea of limiting the possible uses of a construct is not limited to array-like structures. On the contrary: it is applicable to whatever code you're designing.

Another trivial design decision, now we're at Python 3.9+, is to no longer use the overly versatile dictionaries that have hidden the true meaning of your program.

def generate_data():
  return tuple({
    "temperature °C": float(tc),
    "time (ms)": float(tms),
    "location": loc
  } for tc, tms, loc in csv_rows("datapoints.csv"))

# In a totally different module, we find the consumer:

def print_location_temperature(data):
  get_loc = lambda s: s["location"]
  for loc, samples in group_by(data, key=get_loc):
    sum_t = sum(s["temperature"] for s in samples)
    avg_t = sum_t / len(data)
    print(f"average temperature in {loc}: {avg_t}")

Of course there must be a bug.

Can you spot it? You know where I'm heading, right? I'm nagging about dictionaries holding structured data. The s["temperature"] will surely raise KeyError!

So far for the review? Nope. But admit you were (or at least, I would have been) distracted by checking for key lookups.

What if we could ask the static checker to do this for us? Not with dicts, it won't, but by giving it dataclasses! This totally makes sense, because we know the structure of the data upfront, and we want to have two pieces of code using the same structure - exactly what dataclass is for!

@dataclass(frozen=True)  # reap benefits of immutable data
class Sample:
  temperature_C: float
  time_ms: float
  location_id: str


def generate_data() -> Iterable[Sample]:
  return Sample(
          temperature_C=float(tc),
          time_ms=float(tms),
          location_id=loc)
      for tc, tms, loc in csv_rows("datapoints.csv"))

# In a totally different module, we find the consumer:

def print_location_temperature(data: Iterable[Sample]):
  get_loc = lambda s: s.location_id
  for loc, samples in group_by(data, key=get_loc):
    sum_t = sum(s.temperature_C for s in samples)
    avg_t = sum_t / len(data)
    print(f"average temperature in {loc}: {avg_t}")

Now you we can focus on the logic of the function, since tools like mypy can check for attribute errors. (I bet you're smarter than me and long spotted that the len(data) should have been len(samples)).

Generalizing the Idea

The functionality we care about should be easy to verify.

Using immutable data, and statically check-able constructs, can greatly help with that, because it gives the reader language-guaranteed certainties.

I'm glad I did get this off my chest. And I'm happy you agree ;).

Want More?

This post is mostly about the syntactic level; the ideas here can be lifted to the whole world of software design.

Learn about strong, meaningful objects as opposed to primitive-obsession

Learn about the Repository pattern in Domain Driven Design, I learned about the Repository pattern producing guaranteed-valid Domain Objects.

And I might want to write some more about this, too.