inc-join Python/PySpark library released
Incremental join
During my time at ABN, one of the most complex topics was joining big datasets that were incrementally refreshed. The complexity comes from the fact that data might arrive late, or not at all. There is a tradeoff here between completeness and performance: the more data you use in your join, the more complete it will […]
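
To make the tradeoff concrete, here is a minimal sketch in plain Python. It is not the inc-join API: the function `windowed_join`, the `lookback_days` parameter, and the sample rows are all hypothetical, made up for illustration. It assumes each row is a `(key, arrival_date)` pair and that today's increment of one dataset is joined against a limited window of the other.

```python
from datetime import date, timedelta

# Hypothetical sample data: (key, arrival_date) pairs.
RUN_DATE = date(2024, 1, 10)
left = [(1, RUN_DATE), (2, RUN_DATE), (3, RUN_DATE), (4, RUN_DATE)]
right = [
    (1, date(2024, 1, 10)),   # arrived on the run date
    (2, date(2024, 1, 7)),    # 3 days away from its left-side match
    (3, date(2024, 1, 1)),    # 9 days away from its left-side match
    (99, date(2024, 1, 5)),   # unrelated row that still has to be scanned
]                             # key 4 has no match at all


def windowed_join(left, right, run_date, lookback_days):
    """Join today's left increment against a limited window of the right side.

    Returns the matched keys and the number of right-side rows scanned,
    which stands in for the cost of the join.
    """
    earliest = run_date - timedelta(days=lookback_days)
    left_today = [key for key, arrived in left if arrived == run_date]
    right_window = {key for key, arrived in right if earliest <= arrived <= run_date}
    matched = sorted(key for key in left_today if key in right_window)
    return matched, len(right_window)


if __name__ == "__main__":
    for days in (0, 3, 10):
        matched, scanned = windowed_join(left, right, RUN_DATE, days)
        print(f"lookback={days:>2} days: matched={matched}, right rows scanned={scanned}")
```

A wider window matches more keys but scans more rows, and no window width recovers key 4, whose counterpart never arrives.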
