Julia Duimovich

I'm a software developer, I do PyCon related things, and I blog, occasionally.

Debugging discovery

When you’ve been wrestling with a bug mentally for a while, finally figuring it out is a great feeling. Almost 2 years ago, I wrote an implementation of the Tortoise-Hare algorithm for cycle detection in singly linked lists.

This was just to see how the algorithm worked, but I couldn’t figure out why it didn’t always work perfectly. I was super stuck. So, I wrote out a (somewhat overkill) set of tests to help figure out what was wrong. The problem was that occasionally, the code would hit the max recursion depth (Python’s standard limit is 1000) and crash. This wasn’t happening consistently enough to be obvious. Not to mention, the code was noticeably slow. Yeah, it’s a recursive algorithm, but I was using test cases with a max of 6 items in the list. Why was the performance so terrible?

My linked lists were implemented with dictionaries as nodes, with a key for the data, and a key pointing to the next node.

Well, a week ago, I noticed that the loop I was running had this condition:

while slow!=fast:

Where slow and fast are the 2 “pointers” that represent the tortoise and hare. It turns out that comparing 2 cyclical objects isn’t very stack friendly. Basically, because every node of a linked list is an object, the .__eq__ method must be called for each node, then thrown on the stack. This occurs until we either find out that they’re not equal or we crash the stack or we hit the end of a linked list (the list has to be cycle-free for this to happen, so it’s not a particularly useful scenario).

The solution? Use .id() to compare the two objects. Since we’re trying to find out whether the 2 pointers have managed to point to the same place, this works perfectly for our purposes. Since id is guaranteed to be a unique and consistent int while the object exists, this is a great choice.

So now, the previous condition reads as

while id(slow) != id(fast):

This is obviously way faster than performing comparisons for each node in the list- comparing 2 ints is cheap and easy.

As a side note, it was very interesting to see online that some people really like writing their own classes and prefer using them to implement linked lists, rather than using the flexible data structures that are Python objects. I like my way. The custom object method has the benefit of being able to define your own custom dunder methods so you theoretically could fix the above problem by writing your own eq methods instead. Could be useful, just depends on what you’re doing! Luckily, in this case, there’s a good solution for my method with basically no new code needed.

+1 to my list of mental debugging checks.