Conquer Duplicate IDs: Master Left, Right, Inner, & Full Joins in Google Sheets

Have you ever struggled with joining tables in Google Sheets that have duplicate IDs? The basic joins in Google Sheets assume that one table has unique IDs while the other may or may not. But what if both tables have duplicates? Don’t worry, we’ve got you covered with a powerful combination of formulas that will handle any scenario.

All-Weather Formulas for All Join Types

Introducing a skill-boosting adventure that will help you master left, right, inner, and full joins without worrying about duplicate IDs! This isn’t just joining; it’s a journey filled with never-before-seen Google Sheets tricks.

Prerequisites:

  • The unique ID column must be the first column in both tables.
  • Physical data ranges (not formulas).

While the core formula for all joins remains consistent, slight adjustments are needed depending on the join type. Full joins have an extra step, but the first three types share similar modifications.

Conquering Duplicate IDs: Sample Data for All Join Types in Google Sheets

Have duplicate IDs wreaked havoc on your Google Sheets joins? Fear not! We’ve got you covered with “all-weather” formulas that will conquer Left, Right, Inner, and Full Joins, even with duplicate IDs in your unique identifier column.

Our example involves two tables: Customer Data (with columns for Account ID, Company, and City) and Vehicle Information (with columns for Account ID and Model). Both tables contain duplicates in the unique identifier column (Account ID).

In the first table, some companies appear in multiple cities, reflecting branch locations or overlapping territories. Meanwhile, the second table reveals instances where a single company owns several vehicles. Adding to the complexity, some Account IDs exist in one table but not the other, further challenging traditional join methods.

Unlike the formulas presented in my previous ‘join’ tutorials, the formulas in this tutorial require physical tables, and the ID column must be the first column in both tables. This adjustment ensures compatibility with scenarios involving duplicate IDs.

However, the formulas provided here are versatile and can be used confidently for all join types in Google Sheets. They are designed to handle various situations, making them reliable solutions for your data joining needs.

Click here to copy my sample sheet with examples and formulas.

Handling Duplicate IDs in Left Joins on Source Tables

A left join combines all records from the left table with the matching records from the right table. To handle duplicate IDs in left joins, we’ll use a powerful key combination of REDUCE, FILTER, and OFFSET.

Formula:

=LET( lt_id_field, A1:A7, lt_range, A1:C7, rt_id_field, E1:E7, rt_range_x_id, F1:F7, merge, REDUCE("", lt_id_field, LAMBDA(a, v, VSTACK(a, IFNA( HSTACK( OFFSET(v, 0, 0, 1, COLUMNS(lt_range)), FILTER(rt_range_x_id, rt_id_field=v) ), OFFSET(v, 0, 0, 1, COLUMNS(lt_range))))) ), IFNA(FILTER(merge, CHOOSECOLS(merge, 1)<>"")) ) 

Result:

Left Join with Duplicate IDs in Both Tables in Google Sheets

To use this formula, adjust the range references to match your table. Replace the ranges A:A7, A1:C7, E1:E7, and F1:F7 in the formula with the corresponding ranges in your table. No other changes are required.

This formula combines the rows from both tables based on the matching Account IDs, even if there are duplicates. It handles situations with multiple matches in the right table by replicating the entire left table row for each match.

Handling Duplicate IDs in Right Joins on Source Tables

A right join combines all records from the right table with the matching records from the left table. To handle duplicate IDs in right joins, you can use the same formula as the left join with slight adjustments in the range references.

Formula:

=LET( rt_id_field, E1:E7, rt_range, E1:F7, lt_id_field, A1:A7, lt_range_x_id, B1:C7, merge, REDUCE("", rt_id_field, LAMBDA(a, v, VSTACK(a, IFNA( HSTACK( OFFSET(v, 0, 0, 1, COLUMNS(rt_range)), FILTER(lt_range_x_id, lt_id_field=v) ), OFFSET(v, 0, 0, 1, COLUMNS(rt_range))))) ), IFNA(FILTER(merge, CHOOSECOLS(merge, 1)<>"")) ) 

Result:

Right Join with Duplicate IDs in Both Tables in Google Sheets

Adjust the range references in the formula to match your table. Use the same names and ranges as in the left join formula, but swap the lt and rt range names.

This formula is specifically designed for right joins and handles duplicate IDs in the unique identifier fields.

Handling Duplicate IDs in Inner Joins on Source Tables

An inner join combines only the matching records from both tables. You can use the left join or right join formula for an inner join with a minor change in the formula_expression part.

Replace CHOOSECOLS(merge, 1)<>"") in the formula expression with CHOOSECOLS(merge, -1)<>"").

In the left and right joins, the formula expression filters out blank rows if the first column in the ‘merge’ is blank. This removes the first empty row added during REDUCE. In an inner join, if any cell in the last column is blank, it means it’s either the previously added blank row or a mismatching row in either of the tables.

Here is the left join formula converted to an inner join:

Formula:

=LET( lt_id_field, A1:A7, lt_range, A1:C7, rt_id_field, E1:E7, rt_range_x_id, F1:F7, merge, REDUCE("", lt_id_field, LAMBDA(a, v, VSTACK(a, IFNA( HSTACK( OFFSET(v, 0, 0, 1, COLUMNS(lt_range)), FILTER(rt_range_x_id, rt_id_field=v) ), OFFSET(v, 0, 0, 1, COLUMNS(lt_range))))) ), IFNA(FILTER(merge, CHOOSECOLS(merge, -1)<>"")) )

Result:

Inner Join with Duplicate IDs in Both Tables in Google Sheets

This formula performs an inner join, returning only the matching records from both tables, including duplicate Account IDs.

Handling Duplicate IDs in Full Joins on Source Tables

A full join combines all records from both tables. For a full join, you can use the left join formula. However, an additional step is required to include the mismatching records from the right table.

Formula:

=LET( lt_id_field, A1:A7, lt_range, A1:C7, rt_id_field, E1:E7, rt_range_x_id, F1:F7, merge, REDUCE("", lt_id_field, LAMBDA(a, v, VSTACK(a, IFNA( HSTACK( OFFSET(v, 0, 0, 1, COLUMNS(lt_range)), FILTER(rt_range_x_id, rt_id_field=v) ), OFFSET(v, 0, 0, 1, COLUMNS(lt_range))))) ), missing, IFERROR(FILTER( HSTACK(rt_id_field, SEQUENCE(1, COLUMNS(lt_range)-1)/0, rt_range_x_id), IFNA(XMATCH(rt_id_field, lt_id_field))="" )), VSTACK(IFNA(FILTER(merge, CHOOSECOLS(merge, 1)<>"")), missing) )

Result:

Full Join with Duplicate IDs in Both Tables in Google Sheets

This formula combines the left join formula with an extra step to filter the mismatching records from the right table. It performs a full join, returning all records from both tables, including duplicate Account IDs.

Additional Notes

All the provided formulas are flexible and easily adaptable to your data ranges, even for beginners. However, understanding the formula structure may require careful attention and could be intimidating for novice users.

Please note that these formulas utilize REDUCE and OFFSET functions, which can impact performance when working with large datasets. While I have avoided nested lambdas to enhance performance, you may still experience some performance issues.

If you wish to eliminate duplicate records in the processed data, you can use the UNIQUE function. Additionally, the SORTN function can be employed to remove duplicates based on any column.

Finally, if your data contains a date column, the resulting table may display a date value column instead. You can revert it to the date format using the Format > Number > Date command.

For merging tables using QUERY, please check out this resource: Merge Two Tables in Google Sheets – The Ultimate Guide.

Now that you have these all-weather formulas at your disposal, join tables in Google Sheets like a pro! Happy joining with Crawlan.com!

Related posts