Joining data is an inescapable and essential component of data work. Whether you are an economist or data analyst, a data engineer or data scientist, you regularly need to combine information from different data frames. However, more complex joins can result in computationally intensive operations where mistakes are difficult to detect. The {joyn} R package solves these problems by allowing efficient and flexible data joins as well as user-friendly checks and join validation features.
Stata users will find {joyn} particularly intuitive. Economists and data analysts accustomed to the robust functionality of Stata's `merge` command often find the transition to joining data in R frustrating. While R's base `merge()` function offers basic joining capabilities, it lacks the intuitive interface and comprehensive features that Stata users appreciate in the `merge` command. Existing R packages like {data.table}, {dplyr}, and {collapse} provide powerful alternatives, some even surpassing Stata with unique functionalities. However, a crucial gap remains: data join validation.
{joyn} fills this gap by bridging the two worlds of R and Stata and joining the strengths of each. With the recent release of {joyn} version 0.2.0 for R, users, regardless of their proficiency or enthusiasm for R, can now access and benefit from its new features. By offering intuitive join handling tools, validation, and informative reports, {joyn} ensures precise and well-informed results that enhance data joining in R while also remaining intuitive for Stata users to navigate.
The real beauty of {joyn}
Flexibility in Join Types:
{joyn} offers users the flexibility to select their preferred join type ("left", "right", "full", or "inner"). By default, {joyn} performs a full join to ensure inclusion of all observations.
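As a rough illustration of the join types described above, the sketch below uses the `joyn()` function with its `keep` argument. The two data frames and their column names are made up for this example, and the match type is stated explicitly for clarity.

```r
library(joyn)

# Two small, hypothetical data frames sharing the key column `id`
x <- data.frame(id = c(1, 2, 3), income = c(100, 200, 300))
y <- data.frame(id = c(2, 3, 4), region = c("north", "south", "east"))

# Default join type: a full join that keeps observations from both tables
full <- joyn(x, y, by = "id", match_type = "1:1")

# Keep only the rows of x, or only the rows matched in both tables
left  <- joyn(x, y, by = "id", match_type = "1:1", keep = "left")
inner <- joyn(x, y, by = "id", match_type = "1:1", keep = "inner")
```

Comparing `nrow(full)`, `nrow(left)`, and `nrow(inner)` is a quick way to see how each join type changes the set of observations that is retained.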
Easy Variable Handling:
{joyn} facilitates variable handling from both data frames, addressing issues such as duplicate variable names. Users can choose to update values automatically, retain both variables with unique suffixes, or selectively include specific variables from one data frame only.
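One way to bring in only selected variables from the right-hand table is the `y_vars_to_keep` argument of `joyn()`. The following is a minimal sketch; the data frames and the columns `region` and `pop` are hypothetical and exist only for illustration.

```r
library(joyn)

# Hypothetical inputs: y carries two extra columns, but only one is wanted
x <- data.frame(id = c(1, 2, 3), income = c(100, 200, 300))
y <- data.frame(
  id     = c(1, 2, 3),
  region = c("north", "south", "east"),
  pop    = c(10, 20, 30)
)

# Bring over `region` from y and leave `pop` behind
res <- joyn(x, y, by = "id", match_type = "1:1", y_vars_to_keep = "region")

names(res)
```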
Match Type Awareness:
Users can specify multiple keys as well as the match type: one-to-one, one-to-many, many-to-one, or many-to-many. {joyn} also checks whether the specified match type is consistent with the given keys and returns information that helps users choose the right specification. In contrast to other R packages, {joyn} performs a one-to-one join by default. This is the most restrictive match type, and it protects users from the unexpected results that can arise from the many-to-many matching that other R packages use by default.
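The match type check can be sketched as follows. The data are hypothetical, and the example assumes that `joyn()` stops with an error when the declared match type is inconsistent with the keys; `tryCatch()` captures that condition so the script keeps running.

```r
library(joyn)

# Hypothetical data: `id` is repeated in x, so it does not uniquely identify rows
x <- data.frame(id = c(1, 1, 2), t = c(1, 2, 1), value = c(10, 11, 20))
y <- data.frame(id = c(1, 2), region = c("north", "south"))

# A one-to-one join on `id` is inconsistent with x, so it should be rejected
# (assumption: the rejection is signalled as an R error)
attempt <- tryCatch(
  joyn(x, y, by = "id", match_type = "1:1"),
  error = function(e) e
)

# Declaring the relationship as many-to-one matches the structure of the data
ok <- joyn(x, y, by = "id", match_type = "m:1")
```

Stating the relationship up front turns a silent row multiplication into an explicit failure, which is usually easier to debug than an unexpectedly large result.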
Instant Feedback:
{joyn} improves the join process with instant feedback: a summary table detailing the merge, a reporting variable tracking the status of each row, and several types of messages. For example, depending on the options selected, the report variable (see figure below) identifies each row's origin (whether it came from the left or the right input data frame) and highlights any updates made to the values of the columns from the left data frame by those from the right. Additionally, its messaging system is both preventive (e.g., flagging issues such as unmatched observations or missing variables) and informative (e.g., reporting the time spent in execution).
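A small sketch of the reporting variable, using made-up data: the `reportvar` argument names the status column (here `status`, chosen for illustration) and `update_NAs = TRUE` asks {joyn} to fill missing values in the left table from the right one. The exact labels that appear in the report column are not shown here; tabulating the column is an easy way to inspect them.

```r
library(joyn)

# Hypothetical inputs sharing both the key `id` and the variable `income`
x <- data.frame(id = c(1, 2, 3), income = c(100, NA, 300))
y <- data.frame(id = c(2, 3, 4), income = c(250, 350, 400))

# Fill missing values of income in x from y and track each row's status
res <- joyn(
  x, y,
  by         = "id",
  match_type = "1:1",
  update_NAs = TRUE,
  reportvar  = "status"
)

# Count rows by reported status
table(res$status)
```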
Familiar Syntax:
{joyn} also acts as a wrapper: it includes functions that resemble the usability of base R, {data.table} and {dplyr} while also incorporating the additional features that characterize {joyn}.
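For readers used to {dplyr} or base R syntax, the wrappers can be sketched as below. The data are hypothetical, and the calls assume that {joyn} exports `left_join()` and `merge()` with arguments mirroring their {dplyr} and base R counterparts; the functions are namespaced so they do not clash with other packages that may be attached.

```r
# Hypothetical inputs
x <- data.frame(id = c(1, 2, 3), income = c(100, 200, 300))
y <- data.frame(id = c(2, 3, 4), region = c("north", "south", "east"))

# dplyr-style left join, with the relationship stated explicitly
res_dplyr <- joyn::left_join(x, y, by = "id", relationship = "one-to-one")

# base-R-style merge that keeps all rows of x
res_base <- joyn::merge(x, y, by = "id", all.x = TRUE)
```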
An important caveat
While {joyn} does strive for efficiency, it does not prioritize speed above all else. Its comprehensive join checks and detailed reporting slow down performance slightly, but to offset this, {joyn} uses the fastest joining alternatives available in the R community, namely {data.table} and {collapse}. As a result, {joyn} sets itself apart as a tool that allows users to integrate data frames confidently and effectively: the benefits of error prevention and valuable insights make {joyn} a reliable choice for joining tasks.
To get started
Take the first step towards leveraging {joyn} version 0.2.0 by installing it directly from CRAN.
Use the command `install.packages("joyn")`, and then refer to its website [https://randrescastaneda.github.io/joyn/] for further information on its functionalities.
The authors gratefully acknowledge financial support from the UK Government through the Data and Evidence for Tackling Extreme Poverty (DEEP) Research Program.