fpaR: A Business Intelligence Toolkit for Financial Planning & Analysis (FP&A)

Introduction

fpaR is a collection of business intelligence tools designed to simplify common financial planning and analysis (FP&A) tasks such as time intelligence calculations, customer, vendor or product segmentation, and factor/variance analysis.

The package is inspired by best practices from a collection of blogs, books, industry research, and hands-on work experience, consolidating frequently performed business analyses into a fast, efficient, and reusable framework.

In particular, the time intelligence functions are heavily inspired by PowerBI DAX functions.

Under the hood, these functions are built upon the great foundations of:

fpaR is designed to seamlessly work with either tibbles or modern databases (DuckDB, Snowflake, SQLite, etc) with a unified syntax.

Even if you are working with tibbles, most functions are optimized to leverage DuckDB for superior speed and performance.¹

By default most fpaR function returns a lazy DBI object which you can return as a tibble with dplyr::collect()

Key features & benefits

Unified syntax regardless if your data is in a tibble or a database
Scale your data with duckdb to optimize your calculations
Instant clarity as every function summarizes its transformation actions so that you can understand and validate the results

Installation

Install the development version of fpaR from GitHub:

# Install using pak or install.package()

pak::pak("alejandrohagan/fpaR")

What is in fpaR?

There are 3 main categories of functions:

Time intelligence (Ready for testing and feedback)
Segmentation strategies (work in progress)
Factor Analysis (work in progress)

Time intelligence

fpaR provides readily available functions for most time intelligence analysis such as Year-over-Year (YoY), Month-to-Date (MTD), and Current Year-to-Date over Previous Year-to-Date (YoYTD) analysis.

These functions are designed to quickly answer most common time intelligence related analysis in a consistent, fast and transparent way.

Key benefits:

Auto-fill missing dates: Ensures no missing periods in your datasets so that right period comparisons are performed
Flexible calendar options: Handle comparisons based on a standard or 5-5-4 fiscal calendar to accommodate different reporting frameworks
Period imbalance indicator: When comparing periods with dates imbalance, the time intelligence functions will alert you to the type and number of period imbalances to ensure you are aware of misleading likewise comparisons

Below is the full list of time intelligence functions:

short_name	description	shift	aggregate	compare
YoY	Full Year over Year			X
YTD	Year-to-Date		X
PYTD	Prior Year-to-Date amount	X	X
YoYTD	Current Year-to-Date over Prior Year-to-Date	X	X	X
YTDOPY	Year-to-Date over Full Previous Year	X	X	X
QoQ	Full Quarter over Quarter			X
QTD	Quarter-to-Date		X
PQTD	Prior Quarter-to-Date	X	X
QOQTD	Quarter-over-Quarter-to-Date	X	X	X
QTDOPQ	Quarter-to-Date over Full Previous Quarter	X	X	X
MTD	Month-to-Date		X
MoM	Full Month over Full Month			X
MoMTD	Current Month-to-Date over Prior Month-to-Date	X	X	X
PMTD	Prior Month’s MTD amount	X	X
MTDOPM	Month-to-Date over Full Previous Month	X	X	X
WTD	Week-to-Date		X
WoW	Full Week over Full Week			X
WoWTD	Current Week-to-Date over Prior Week-to-Date	X	X	X
PWTD	Prior Week-to-Date	X	X
ATD	cumlaitve total from inception to date		x
DoD	Full Day over Full Day			X

How to use fpaR?

Time Intelligence

When you execute a time intelligence function, it will return a ti class object with a custom print method that explains what the function is doing and a summary of transformation steps and the calendar attributes.

sales |> 
   mtd(.date=order_date,.value = margin,calendar_type = "standard")

── Month-to-date ───────────────────────────────────────────────────────────────

Function: `mtd` was executed

── Description: ──

This creates a daily `cumsum()` of the current month margin from the start of
the standard calendar month to the end of the month

── Calendar: ──

• The calendar aggregated order_date to the day time unit
• A standard calendar is created with 0 groups
• Calendar ranges from 2021-05-18 to 2024-04-20
• 222 days were missing and replaced with 0
• New date column date, year and month was created from order_date

── Actions: ──

✔Aggregate

✖Shift

✖Compare

✖Proportion Of Total

✖Count Distinct

── Next Steps: ──

• Use `calculate()` to return the results

────────────────────────────────────────────────────────────────────────────────

This prints a summary of the function’s actions,details the calendar’s attributes, summarizes the main transformation steps and lists out possible next actions.

To return a tibble of results, pass the ti object through to calculate().

sales |>                                                                
   mtd(.date=order_date,.value = margin,calendar_type = "standard") |>  
   calculate()

# Source:     SQL [?? x 7]
# Database:   DuckDB v1.1.3 [hagan@Linux 6.12.10-76061203-generic:R 4.4.3//tmp/RtmpLBktPR/file4b8fe61bc9464]
# Ordered by: date
    year month date       margin missing_date_indicator mtd_margin
   <dbl> <dbl> <date>      <dbl>                  <dbl>      <dbl>
 1  2022     2 2022-02-26 17891.                      0    219599.
 2  2022     2 2022-02-27     0                       1    219599.
 3  2022     2 2022-02-28  7647.                      0    227246.
 4  2021     7 2021-07-29     0                       1     48746.
 5  2021     7 2021-07-30     0                       1     48746.
 6  2021     7 2021-07-31  4070.                      0     52816.
 7  2022     2 2022-02-01  7032.                      0      7032.
 8  2022     2 2022-02-02  8778.                      0     15810.
 9  2022     2 2022-02-03 10359.                      0     26169.
10  2022     2 2022-02-04  7337.                      0     33506.
# ℹ more rows
# ℹ 1 more variable: days_in_current_period <dbl>

If you using a tibble, under the hood, fpaR is converting your data to a duckdb database.

If your data is in a database, the package will leverage dbplyr to execute all the calculations.

Either case use dplyr::collect() to return your results to a local tibble.

sales |>                                                                
   mtd(.date=order_date,.value = margin,calendar_type = "standard") |>  
   calculate() |>                                                       
   dplyr::collect() |>                                               
   head(10)

# A tibble: 10 × 7
    year month date       margin missing_date_indicator mtd_margin
   <dbl> <dbl> <date>      <dbl>                  <dbl>      <dbl>
 1  2023     7 2023-07-23     0                       1     69606.
 2  2023     7 2023-07-24   107.                      0     69713.
 3  2023     7 2023-07-25   566.                      0     70279.
 4  2023     7 2023-07-26  1331.                      0     71610.
 5  2023     7 2023-07-27  4201.                      0     75811.
 6  2023     7 2023-07-28  4508.                      0     80319.
 7  2023     7 2023-07-29  6999.                      0     87319.
 8  2023     7 2023-07-30     0                       1     87319.
 9  2023     7 2023-07-31     0                       1     87319.
10  2023    12 2023-12-01  2158.                      0      2158.
# ℹ 1 more variable: days_in_current_period <dbl>

What if you need the analysis at the group level?

Simply pass through the groups that you want with dplyr::group_by() and time intelligence function will create a custom calendar for each group level.

This will calculate a complete calendar ensuring each group has a complete calendar with no missing dates.

sales |>   
   dplyr::group_by(customer_key,store_key) |>  
   yoy(.date=order_date,.value = margin,calendar_type = "standard")

── Year over year ──────────────────────────────────────────────────────────────

Function: `yoy` was executed

── Description: ──

This creates a full year `sum()` of the previous year margin and compares it
with the full year `sum()` current year margin from the start of the standard
calendar year to the end of the year

── Calendar: ──

• The calendar aggregated order_date to the year time unit
• A standard calendar is created with 2 groups
• Calendar ranges from 2021-05-18 to 2024-04-20
• 222 days were missing and replaced with 0
• New date column date and year was created from order_date

── Actions: ──

✔Aggregate

✔Shift 1 year

✔Compare previous year

✖Proportion Of Total

✖Count Distinct

customer_key and store_key groups are in the table

── Next Steps: ──

• Use `calculate()` to return the results

────────────────────────────────────────────────────────────────────────────────

The functions will work with your database even if you don’t have write permission by creatively leveraging CTEs to create interim tables.

Why do we need this package when we have lubridate?

Lubridate is an excellent package and is heavily used by the package. The issue isn’t lubridate but rather issues you may not be aware of in your package.

Time-based comparisons, such as Year-over-Year (YoY), Quarter-over-Quarter (QoQ), and Month-to-Date (MTD), are common for tracking business performance. However, they come with challenges:

Many datasets do not have continuous dates, especially if data is recorded only on business days or for active transactions
Period imbalances between periods (Eg. the different number of days between February vs. January) can create misleading analysis or trends
Your analysis may need to reference a non-standard calendar such as a 5-5-4, 4-4-5, or 13 month calendar
Your data may be in excel sheets, csv or databases and you need to inter-operable framework to switch between all your data types

Issue 1: Continuous Dates

Referencing the table below, if we use dplyr::lag() to compare Day-over-Day (DoD) revenue, we would be missing 2024-01-02, 2024-01-04, and 2024-01-05 which will lead to incorrect answers or trends.

order_date	margin
2024-01-01	1200
2024-01-03	1100
2024-01-06	1300
2024-01-07	900
2024-01-08	1200
2024-01-09	850
2024-01-11	1450

To correct this, fpaR will automatically complete your calendar for each group for the missing periods to ensure there are no missing periods when calculating trends.

Issue 2

Issue 3

Datasets

This package leverages the Contoso package for its analysis. The contoso datasets are fictional business transaction of Contoso toy company which are helpful for business intelligence related analysis

This is an active work-in-progress, and feedback, testing, and contributions are welcome!

Future capabilities

Segmentation Strategies

Provides functions to segment and categorize your data into meaningful business categories.

Example Segmentation Methods:

ABC Classification – Categorizing products/customers based on revenue contribution
Cohort Analysis - Categorize your data by a shared time based attribute to track over time
New vs. Returning – Distinguishing first-time buyers from repeat customers
K-means Clustering – Grouping data points based on patterns
UMAP (Uniform Manifold Approximation and Projection) – Dimensionaltiy reduction for clustering

Factor / Variation Analysis

Breaks down revenue or cost changes into price, volume, and mix effects.

Use Cases:

Analyzing revenue growth due to price increases vs. increased sales volume
Measuring the impact of product mix changes on profitability.

Additional references and inspirations

PeerChristensen’s Cohort Package

README