md

My Project

Overview

This is a collection of utilities to help with the management, administration and navigation of duckdb database either locally on your computer or in the cloud via motherduck

Database management is incredibly easy in R with fantastic packages such as DBI and dbplyr, however some databases have specific extensions or utilities that are aren’t readily accessible via this packages

{md} pack simplifies these common database administration task with easy to understand syntax. {md} is built upon DBI and returns a lazy DBI object so that you can further fully integrate your data with dbplyr

Future ambition

Eventually, I’ll use the learning from this package to create a meta DB utilities package so that regardless if you’re in snowflake, DuckDB, Redshift, etc you will have generalized functions that work across your database types

This is very much work in progress – I’ll eventually transition to the R7 object system but just want to get some usage first before deciding on the architecture and structure.

Please create an issue if you have any comments or requests or reach out if you have any feedback.

Overview of functions

Below is a quick overview of the functions available in the package.

Functions that help you manage your connection and database metadata

Functions that help you see what is in your databases

Functions that will help you read data into duckdb or motherduck

Functions that will help you create or replace databases, scehems, tables or views

Functions to help you manage motherduck users, tokens, and instance settings

Functions to help you understand your data

Functions to help your read data

What do I need to use this?

[!NOTE]

Whats the difference between Motherduck and Duckdb?

Lets see the package in action

Create a duckdb instance and Connect to your motherduck account

When creating a duckdb database, you have three options

  1. A in-memory based instance that exists in your local computer
  2. A file based instance that exists in your local computer
  3. A cloud-based instance through motherduck

To create options 1 or 2 you can simply use either the duckdb or the duckplyr

To connect to your motherduck account you can still reference the above packages but need to do few additional steps or alternatively you use the connect_to_motherduck() function and pass through your token variable name1 and optional configuration options

con_md <- connect_to_motherduck("MOTHERDUCK_TOKEN")
── Extension Load & Install Report ─────────────────────────────────────────────

Installed and loaded 1 extension: motherduck

Use `list_extensions()` to list extensions, status and their descriptions

Use `install_extensions()` to install new duckdb extensions

See <https://duckdb.org/docs/stable/extensions/overview.html> for more
information

✔ You are connected to MotherDuck

This will return a connection and print statement indicating if connection status.

At any time you can validate your connection status with validate_md_connection_status()

validate_md_connection_status(con_md)
✔ You are connected to MotherDuck

[!NOTE]

how to create a motherduck account and access token?

  1. Go to motherduck and create an account, free options are available
  2. Go to your user name in the top right, click settings then click access tokens
  3. Click create token and then name your token and copy the token code
  4. You will need this token to access your account
  5. If you want to access it via R then simplest way is to save your access code as a variable in your r environment
  6. Simply leverage the usethis function edit_r_environ() to set your access code to a variable and save it – this is one time activity
  7. To check if your correctly saved your variable then you can use the Sys.getenv(“var_name”) with “var_name” the named you assigned your access token to
  8. Going forward, if you want to access your token you don’t need to re-type the access token, simply remember your variable name

When connecting to motherduck there are a number of configuration options available, you can reference them via the md::db_config which will pull a list of options and their default values

To change these, simply edit the configuration options you want and then pass the list as an argument connect_to_motherduck() or duckdb() if connecting locally

You can see the full list of duckdb configuration options here or alternatively you can use list_settings() to see your current configuration options.

config <- md::db_config #<1> get list of default configuration options


config$allow_community_extensions <- "true" #<2> change a default option

con_md <- connect_to_motherduck("MOTHERDUCK_TOKEN",config = config) #<3> pass the modified list to your connection
list_setting(con_md)

Congratulations, you’ve set up your motherduck database!

If you’re new to databases, it will be helpful to have a basic understanding of database management - don’t worry the basics are straight forward and won’t overwhelm you, below are a list of resources I found helpful.

Please see the {md} package down website to see additional documentation on how to use the functions and motherduck

  1. recommend you use MOTHERDUCK_TOKEN as your variable name