Does pair programming work in data science?
Pair programming is common in coding. But does it work in data science? And with remote work? Our data scientists gave it a whirl.
Pair programming defined
In pair programming, two programmers work side by side at a single workstation. Like a rally team, they have separate but key tasks. One serves as the driver, writing code, while the other, the navigator, checks the driver’s work. The roles should be swapped frequently. This is a key agile technique used at Proekspert to speed the onboarding of new employees, and it’s especially efficient in an era of remote work.
Two heads better than one
At the beginning of October, Kaspar Hollo joined the data science team as our newest member and I took on the role of tech buddy. And since Proekspert encourages pair programming as an onboarding technique for new engineers, doubly so in a time of remote work, it was the right time to try it out.
I have done a lot of pair data science in my scientific career, but mostly using “point and click” data analysis software, rather than developing the software myself. I searched for a few articles about how to go about this activity, finding some (like this one and another one). These gave some useful tips and warnings about benefits and dangers, but they reminded me of cookbook instructions, rather than an evidence-based guide.
In my experience, data science tasks can be so diverse and intermixed that general rules have a high likelihood of being oversimplified. So, I created my own guidelines for a working session:
- Define a goal you want to achieve.
- The “driver” always writes code and constantly explains what they are doing. They alert their partner (who “rides shotgun”) when they get stuck.
- The person riding shotgun keeps an eye on the code, checking for mistakes, bad coding style, etc. The shotgun position stands ready to brainstorm if the driver gets stuck, and can split off to perform quick searches for algorithms, or write key pieces of code.
In most pair programming sessions, I teamed up with Kaspar. We’d worked on a traffic sign detection and localization project together. We used a single computer (at the office), worked in Jupyter Lab, and used Python.
Session 1: Coding with Kaspar
During a four-hour session I took the driver’s role. I felt the number of times I got stuck was reduced due to being able to immediately brainstorm. The many relevant questions from Kaspar riding shotgun helped me examine the problem from different angles (quite literally, as the task was about triangulations in 3D space). Since Kaspar has taught Python to thousands of Estonians through MOOCs, I also learned a few coding tips and tricks and a few shortcuts that I didn’t know before.
There were multiple positives to the whole session:
- As the driver, it was easier to ignore the blinking Teams icon and focus on the technical task. This saved a lot of time switching between tasks.
- If running a piece of code took longer than a few seconds but less than a few minutes, I could fill that time brainstorming the next mini-task, explaining some tech-buddy things, or just cracking jokes while we waited.
- Since two people need breaks at different times, we took shorter breaks more often than I usually do. This is good for the eyes, posture, etc.
Subsequent sessions: reversing roles
In subsequent sessions, we tried reversing the driver and shotgun roles and even changed roles multiple times during a single session. The sessions continued to be useful and we were able to discuss and immediately test a variety of hypotheses. In one four-hour session we managed to complete most of an entire list of tasks for a two-week sprint. This was only possible thanks to being able to continuously discuss the topic and next steps.
A data science pair session with Tanel
After working with Kaspar, veteran data scientist Tanel Peet and I paired up for a data science session.
Although pair work is suggested for new people, our task was tricky and pairing up seemed useful. We’d found that classical image-based tracking algorithms were not working well, so we decided to move to 3D ray tracking. Since Tanel had been working on tracking, and I had worked on localization, and we now needed to mix both algorithms, the knowledge from both sides proved invaluable.
We started with a two-hour brainstorming session where we discussed most aspects of the problem and thought deeply about the underlying mathematics. We discovered that the natural data structure for the task was a set of undirected graphs, represented by the corresponding adjacency matrix: each isolated graph (connected component) corresponds to a single track, and tracking comes down to finding rules that decide whether there is an edge between two nodes. It would have been tricky to reach that conclusion alone, since the data structure of the original tracking algorithm was a bounding box (rectangle) on an image, and the data structure of localization was a list of 3D vectors. It takes quite a shakeup to think outside the (bounding) box, and the pair session did it for us.
After the brainstorming session, we found that the actual implementation of the idea did not benefit much from working on the same code. So, instead, I worked on the implementation, and Tanel worked on visualization to check whether the set of rules and thresholds we produced was working.
This kind of separation is not how pair programming is supposed to work, but flexible rules served us well in this case. The visualizations were extremely useful for developing the tracking rules, and writing the rules was not difficult alone, once the natural data structure had been found during our brainstorm. By the evening, the algorithm was good enough that it made mistakes only in about 10% of cases (the previous tracking had around 20-30% mistakes) and around half of the mistakes in the prediction of the new algorithm ended up being mistakes in the ground-truth, instead. For me, this session proved that pair data science can be a useful concept not only for newcomers but for data scientists of all levels.
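The graph formulation from the brainstorm can be illustrated with a minimal sketch. It is not the project's code: it assumes each node is a single 3D detection with a frame number, and the edge rule (a hypothetical `max_dist` distance threshold between detections in nearby frames, controlled by a hypothetical `max_frame_gap`) is invented purely for illustration. The real rules and thresholds are not described here.

```python
from itertools import combinations
from math import dist


def build_adjacency(detections, max_dist=1.0, max_frame_gap=1):
    """Return a symmetric 0/1 adjacency matrix as a list of lists.

    detections: list of (frame, (x, y, z)) tuples.
    The edge rule is a made-up placeholder, not the project's actual rule.
    """
    n = len(detections)
    adj = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        (frame_i, pos_i), (frame_j, pos_j) = detections[i], detections[j]
        # Hypothetical rule: link detections from different, nearby frames
        # whose 3D positions are close to each other.
        close_in_time = 0 < abs(frame_i - frame_j) <= max_frame_gap
        close_in_space = dist(pos_i, pos_j) <= max_dist
        if close_in_time and close_in_space:
            adj[i][j] = adj[j][i] = 1  # undirected graph: keep it symmetric
    return adj


def extract_tracks(adj):
    """Group node indices into connected components; each one is a track."""
    seen = set()
    tracks = []
    for start in range(len(adj)):
        if start in seen:
            continue
        # Depth-first search over the adjacency matrix.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour, connected in enumerate(adj[node]):
                if connected and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        tracks.append(sorted(component))
    return tracks


# Toy data: two objects seen over a few frames (values are invented).
detections = [
    (0, (0.0, 0.0, 5.0)),
    (1, (0.2, 0.0, 4.8)),
    (2, (0.4, 0.1, 4.6)),
    (0, (10.0, 0.0, 5.0)),
    (1, (10.1, 0.0, 4.9)),
]
adjacency = build_adjacency(detections)
tracks = extract_tracks(adjacency)
print(tracks)
```

With this toy data, the first three detections end up in one track and the last two in another. The point of the structure is that changing the tracking behaviour only means changing the edge rule in `build_adjacency`; extracting the tracks stays the same.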
Given what we learned, I’d suggest another rule for the next sessions: Take time to think whether the session is more productive than two people working solo. If it is not, then consider stopping the session.
An interview with newcomer Kaspar Hollo
Proekspert’s agile and people coach Kadri Daljajev interviews onboardee Kaspar Hollo about his experience.
Kadri: Did you have any previous experience with pair programming?
Kaspar: This was not the first time that I had used pair programming, since most of the projects during my master’s studies were done in pairs, and professors encouraged us to try pair programming. I didn’t see any advantages at first and had my doubts; I thought that it would only overcomplicate things and slow the process down. It turned out to be quite the opposite, especially when a deadline was approaching, and we had to push through. Pair programming forces people to talk through the problems, allows them to brainstorm solutions at a moment’s notice, and it also helps to keep the pair from distractions. Frequent switching of the driver also allows for mini-breaks while maintaining a high tempo.
How did you find the experience?
I’ve had a couple of interesting pair programming sessions at Proekspert with my tech buddy Tõnis Laasfeld, some sessions live and some via Teams. While working with Teams the driver shared his screen, and that’s it. The driver needs nothing more than advice and ideas from his shotgun position.
What was most beneficial?
Since I joined the team and company in the middle of an ongoing project, I would say that the pair programming sessions served first and foremost as a good introduction to the project code. I was able to ask questions without feeling like I’m interrupting my tech buddy. Afterward, it was much easier for me to solve problems in that particular part of the code-base by myself.
What else did you like about it?
I enjoyed the social aspect of the pair programming sessions. Even though I already knew my tech buddy Tõnis from university, I would still say that I got to know him a little better: how he approaches problems, or how many good math- and programming-related jokes one person can possibly make in a short period of time (he really pushed the boundaries on that one). Learning a couple of new tricks here and there was a real positive.
How did you prepare for the sessions?
We defined very clear goals about what we wanted to achieve and what must be done to achieve them. This helped us keep track of progress and decide when to end each session. We also tried to keep the schedule relatively empty on these days so that there wouldn’t be things that could potentially disturb the pair programming sessions.
What advice would you give to the next newcomer?
Three things. One: Keep an open mind about pair programming. I also had my doubts in the beginning, but it managed to positively surprise me. Two: Ask a lot of questions. They might lead to better solutions or eliminate misconceptions. And three: Try to get to know the person you are programming with. The sessions offer a good opportunity for that.
Would you recommend it?
I would say that pair programming is a useful method that helps kickstart onboarding for new employees.