Introduction and Papers
Patients who require a kidney transplant may have a family member or other loved one willing to donate a kidney to them, but the two may not be medically compatible. Such incompatible patient/donor pairs can be matched with each other: one pair's donor agrees to donate to the other pair's patient, and vice versa. These donations often have to happen simultaneously so that neither donor can back out once their own patient has received a kidney, since the surgery involves some level of risk for the donor.
There is a similar situation for liver donors, although the liver market is much smaller than the kidney market. Recent medical innovations have led to new techniques for liver donation that further de-risk the procedure for the donor by having the recipient receive smaller portions of two donor livers rather than a larger portion of one. The market for these dual-donor organ transplants has been studied in a paper by Ergin, Sonmez, and Unver, which establishes various properties of how paired donations would work in such markets under different circumstances.
Furthermore, increasing the size of the exchange cycles (from 2 pairs to 3 pairs, for example) can introduce more matches and help more people obtain donations, as described in a paper by Saidman, Roth, and Sonmez. While more pairs can be added to a cycle, the requirement that surgeries happen simultaneously complicates things: a cycle of n pairs means 2n surgeries taking place at the same time.
I also recently read a paper about an "AI Economist" for optimal economic policy design. The paper, which focuses on tax policy, has an outer agent (the economist/planner) that tries to design an effective taxation mechanism by optimizing total surplus, while the inner agents (individuals and firms) take actions within this environment to optimize their own surplus. Through reinforcement learning, this "AI Economist" is able to derive the same optimal tax policy as theoretical approaches.
In real life, the kidney donation problem can be viewed as a dynamic graph problem, with people entering and leaving (edges being formed and severed) over time as they are added to the transplant list, get matched for donation, or, unfortunately, become too sick to accept a transplant or die. In Matching in Dynamic Imbalanced Markets, this dynamic problem is studied, and it is shown that greedy matching (matching a patient as soon as they enter the market) is the most effective approach in the 2-pair kidney exchange market, contradicting the approach in some nations of waiting for a period of time and then matching everyone in the market at once.
I tried coming up with a Pareto-efficient matching mechanism for the dual-donor problem, mixing the kidney market with the dual-donor liver market, but my algorithm fell apart quickly, and a few small simulations made it apparent that even if I could derive an optimal solution, it wouldn't provide enough surplus to actually matter: the participants in this new market would be the people who failed to find a match in the regular kidney and liver markets. The rest of this post focuses on the dynamic paired kidney matching problem.
RL (Attempt 1)
In "Matching in Dynamic Imbalanced Markets", they outlined a simulation, that, instead of relying on simulating the medical attributes of a patient (such as blood type), created two classes of individuals: hard to match and easy to match. They demonstrated results based on empirical data and simulation results that showed that a greedy policy was superior, but there were some environment/configurations where the batching method was more optimal (important to consider for other nations/markets). The paper also summarizes a "patient" policy proposed in this paper, which works by waiting for the patient to be on the verge of no longer being eligible for a transplant before trying to find a match for them (unless they get matched earlier).
After I created a simulation environment in Farama Gymnasium, the successor to the famous OpenAI Gym library, I tried various algorithms in this environment. Since only cycles of two patient/donor pairs were being analyzed, it was easy to construct a graph with an edge between every mutually compatible pair of nodes and then compute the maximum proportion of pairs that could be matched. Comparing the greedy approach to this retrospectively optimal solution showed that there was still room for improvement.
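Because 2-cycles are just edges in an undirected compatibility graph, the retrospective optimum can be computed with an off-the-shelf maximum matching. Below is a minimal sketch with networkx, reusing the `edges` list from the simulation sketch above; note that this hindsight bound ignores the time windows during which agents are actually present in the pool, so it is an upper bound rather than an achievable target.

```python
import networkx as nx

def retrospective_optimum(num_agents, edges):
    """Maximum number of agents that could be matched into 2-cycles in hindsight."""
    G = nx.Graph()
    G.add_nodes_from(range(num_agents))
    G.add_edges_from(edges)
    # On an unweighted graph, max_weight_matching with maxcardinality=True
    # returns a maximum-cardinality matching.
    matching = nx.max_weight_matching(G, maxcardinality=True)
    return 2 * len(matching)  # each matched edge covers two patient/donor pairs

# e.g. retrospective_optimum(len(pool), edges) / len(pool) is the maximal
# proportion of pairs that could have been matched with full hindsight.
```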
For the model, I trained a Graph Attention Network-based model. Since the key attribute of each individual was the number of potential matches (i.e., the node's degree in the graph), I used a single embedding to represent each participant, plus a time-based embedding to capture temporal dynamics. The model produced three outputs: (1) which individuals to prioritize for immediate matching, (2) whether those priority individuals should be matched at that step, and (3) whether the entire graph should be processed for maximum matches.
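For concreteness, a stripped-down version of that kind of architecture might look like the PyTorch Geometric sketch below; the layer sizes, the way the degree and time features are embedded, and the three output heads are my guesses at the structure described above, not the exact model from the repository.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv, global_mean_pool

class MatchingPolicy(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        # Node features: (degree, time in pool), embedded into a hidden vector.
        self.embed = nn.Linear(2, hidden)
        self.gat1 = GATConv(hidden, hidden, heads=4, concat=False)
        self.gat2 = GATConv(hidden, hidden, heads=4, concat=False)
        # Head 1: per-node score for being a "priority" individual.
        self.priority_head = nn.Linear(hidden, 1)
        # Head 2: graph-level decision to match the priority individuals now.
        self.match_now_head = nn.Linear(hidden, 1)
        # Head 3: graph-level decision to run a full maximum matching.
        self.match_all_head = nn.Linear(hidden, 1)

    def forward(self, x, edge_index, batch):
        h = torch.relu(self.embed(x))
        h = torch.relu(self.gat1(h, edge_index))
        h = torch.relu(self.gat2(h, edge_index))
        node_priority = self.priority_head(h).squeeze(-1)  # logit per node
        g = global_mean_pool(h, batch)                     # pooled graph embedding
        match_now = self.match_now_head(g).squeeze(-1)     # logit per graph
        match_all = self.match_all_head(g).squeeze(-1)     # logit per graph
        return node_priority, match_now, match_all
```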
In retrospect, there were probably too many free parameters in letting the model decide at every step whether or not it should run the matching algorithm; instead, I should have restricted the action space so that matching could only be triggered right after a new patient entered. Simplifying the action space of an RL model is usually a good way to improve results, especially if the model is small. With this model, I wasn't able to get optimal results. Even after training it to reproduce the output of the greedy algorithm as a starting point (imitation learning), performance started to drop off during further training, and it was clear the model wasn't converging on a good solution. I don't think training the RL policy to match the greedy algorithm was a good idea anyway, simply because of how simple the greedy algorithm is.
Mixing greedy and batching
Instead of choosing between batching at fixed intervals or greedy matching, I experimented with a hybrid approach: matching easy-to-match individuals periodically while continuously attempting to match hard-to-match individuals. I implemented and tested all three algorithms (greedy, batching, and the hybrid), comparing their performance against the optimal solution over 100 simulations with 500, 1000, and 2000 total agents. The hyperparameters were drawn from the Ashlagi paper's simulation setup.
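A minimal sketch of that hybrid rule is below; `BATCH_INTERVAL` is a placeholder rather than a tuned value, and `edge_set` is assumed to hold the feasible 2-cycles (e.g. built from the `edges` list in the earlier simulation sketch).

```python
BATCH_INTERVAL = 20  # hypothetical: run a batch match every 20 arrivals

def hybrid_policy(arrivals, edge_set):
    """Greedy matching for hard-to-match agents, periodic batching for everyone else.

    `edge_set` contains frozensets of agent ids that form feasible 2-cycles.
    """
    pool, matched = [], []

    def try_match(agent, candidates):
        for other in candidates:
            if frozenset((agent["id"], other["id"])) in edge_set:
                return other
        return None

    for t, agent in enumerate(arrivals):
        if agent["type"] == "hard":
            # Hard-to-match agents are matched greedily, as soon as they arrive.
            partner = try_match(agent, pool)
            if partner is not None:
                matched.append((agent["id"], partner["id"]))
                pool.remove(partner)
                continue
        pool.append(agent)

        if (t + 1) % BATCH_INTERVAL == 0:
            # Periodic batch: sweep the waiting pool and pair up whoever we can.
            remaining = []
            for a in pool:
                partner = try_match(a, remaining)
                if partner is not None:
                    matched.append((a["id"], partner["id"]))
                    remaining.remove(partner)
                else:
                    remaining.append(a)
            pool = remaining
    return matched
```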
The results are shown in the bar plots below:
However, this method's improvements pale in comparison to the theoretical maximum matched pairs that I found earlier:
These results suggest that while current methods are effective, there remains room for improvement—particularly as the number of agents increases, causing both greedy and batch-based methods to diverge further from the optimal solution.
Conclusion
In a future update, I plan to further explore reinforcement learning to determine optimal matching policies. Specifically, I will investigate models better suited for homogeneous graphs. Since this is my first serious project involving reinforcement learning and graph embeddings, I hope my commit history reflects my learning process. The code is available on GitHub (vdaita/organ-donation).
Exploring larger cycles (e.g., three-pair exchanges) may also improve results. While comparing a greedy baseline to alternative approaches is straightforward, determining the theoretical maximum matches becomes increasingly difficult as cycle sizes grow, especially given the constraint that all surgeries must be scheduled simultaneously. I initially attempted using reinforcement learning to find the maximum number of cycles in a graph, but the large action space made it infeasible to outperform a simple greedy cycle-finding algorithm within a reasonable training time.
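For reference, a greedy cycle-finding baseline of that kind can be as simple as repeatedly taking any short directed cycle and removing its pairs from the pool. Here is one possible sketch with networkx (the `length_bound` argument to `simple_cycles` requires networkx 3.1 or newer); it is an illustration of the baseline idea, not the exact code from the repository.

```python
import networkx as nx

def greedy_cycle_cover(digraph, max_len=3):
    """Greedily pick vertex-disjoint directed cycles of length <= max_len."""
    G = digraph.copy()
    chosen = []
    while True:
        # simple_cycles with a length bound requires networkx >= 3.1.
        cycle = next(nx.simple_cycles(G, length_bound=max_len), None)
        if cycle is None:
            break
        chosen.append(cycle)
        # All pairs in a chosen cycle are removed: their surgeries happen together.
        G.remove_nodes_from(cycle)
    return chosen
```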
LLM Usage
Portions of this post were edited with gpt-4o for clarity.