This is a collection of utilities to help with the management, administration and navigation of duckdb database either locally on your computer or in the cloud via motherduck
Database management is incredibly easy in R with fantastic packages such as DBI and dbplyr, however some databases have specific extensions or utilities that are aren’t readily accessible via this packages
{md} pack simplifies these common database administration task with easy to understand syntax. {md} is built upon DBI and returns a lazy DBI object so that you can further fully integrate your data with dbplyr
Eventually, I’ll use the learning from this package to create a meta DB utilities package so that regardless if you’re in snowflake, DuckDB, Redshift, etc you will have generalized functions that work across your database types
This is very much work in progress – I’ll eventually transition to the R7 object system but just want to get some usage first before deciding on the architecture and structure.
Please create an issue if you have any comments or requests or reach out if you have any feedback.
Below is a quick overview of the functions available in the package.
connect_to_motherduck()
will leverage your motherduck token to
connect you to your motherduck instance (it will install the
motherduck extension if not already present)install_extensions()
will install various duckdb extensions from the
official repository or if you set the flag, a community repositoryload_extensions()
will load a duckdb extensions either from an
official repositoryvalidate_md_connection_statu()
will validate your motherduck
connection statusvalidate_extension_install_status()
will validate if an extension
has been successfully installedvalidate_extension_load_status()
will validate if an extension has
been successfully loadedpwd()
prints the current database that you are “in”cd()
will change your “root” database so any execution functions are
relative to that databaselist_database()
list the databases and their metadatalist_schema()
list the schemas and their metadatalist_table()
list the tables and their metadatalist_view()
list the views and their metadataread_httpfs()
will read httpfs file formatsread_parquet()
will read parquet file formatsread_excel()
will read excel files formatscreate_or_replace_database()
will take R data and create a database
with your datacreate_or_replace_schemas()
will take R data and create a schema
with your datacreate_or_replace_view()
will take R data and create a views with
your datacreate_or_replace_table()
will take R data and create a tables with
your datadrop_table()
will delete a table from your databasesdrop_database()
will delete a databaselist_md_active_accounts()
list users with active duckling instances
(note: will not list inactive users)list_md_user_instance()
list a user’s instance settingslist_md_user_tokens()
list user’s tokensshow_current_user()
show your current user namecreate_md_user()
create user or service account in your organizationdelete_md_user()
delete a user or service accountcreate_md_access_token()
create an access token for a userdelete_md_access_token()
delete an access token for a userconfigure_md_user_settings()
configure a user’s instance settingssummary()
will summarize your table or view’s dataread_excel()
will read an excel file formats directly to duckdb or
motherduckread_httpfs()
will read httpfs formats directly to duckdb or
motherduckusethis::edit_r_environ()
[!NOTE]
Whats the difference between Motherduck and Duckdb?
Duckdb is a database that you can deploy and run either temporary or permanently in your computer. If you run it via your local computer, it is only available on your computer
Motherduck is a cloud based deployment of duckdb which means you can save your data in the cloud or access it locally
Most core functions in this package work for both motherduck or duckdb database
It is more of a question if you want you data to be access only locally on your computer or if you want to be able to access it remotely via the cloud
When creating a duckdb database, you have three options
To create options 1 or 2 you can simply use either the duckdb or the duckplyr
To connect to your motherduck account you can still reference the above
packages but need to do few additional steps or alternatively you use
the connect_to_motherduck()
function and pass through your token
variable name1 and optional configuration options
con_md <- connect_to_motherduck("MOTHERDUCK_TOKEN")
── Extension Load & Install Report ─────────────────────────────────────────────
Installed and loaded 1 extension: motherduck
Use `list_extensions()` to list extensions, status and their descriptions
Use `install_extensions()` to install new duckdb extensions
See <https://duckdb.org/docs/stable/extensions/overview.html> for more
information
✔ You are connected to MotherDuck
This will return a connection and print statement indicating if connection status.
At any time you can validate your connection status with
validate_md_connection_status()
validate_md_connection_status(con_md)
✔ You are connected to MotherDuck
[!NOTE]
how to create a motherduck account and access token?
- Go to motherduck and create an account, free options are available
- Go to your user name in the top right, click settings then click access tokens
- Click create token and then name your token and copy the token code
- You will need this token to access your account
- If you want to access it via R then simplest way is to save your access code as a variable in your r environment
- Simply leverage the usethis function
edit_r_environ()
to set your access code to a variable and save it – this is one time activity- To check if your correctly saved your variable then you can use the Sys.getenv(“var_name”) with “var_name” the named you assigned your access token to
- Going forward, if you want to access your token you don’t need to re-type the access token, simply remember your variable name
First you will need a motherduck account, which has both free and paid tiers
Once you’ve created an account, simply, go to your settings and click ‘Access Tokens’ under your ‘Integrations’
Keep this secure and safe as this lets you connect to your online database to read or write data
Open R and use the
usethis::edit_r_environ()
function to put your motherduck token as a variable in your enviornment profile
- MOTHERDUCK_TOKEN=‘tokenID’
From there you can use the
connect_to_motherduck("MOTHERDUCK_TOKEN")
This will use the DBI library to create a connection to your mother duck instance
When connecting to motherduck there are a number of configuration
options available, you can reference them via the md::db_config
which
will pull a list of options and their default values
To change these, simply edit the configuration options you want and then
pass the list as an argument connect_to_motherduck()
or duckdb()
if
connecting locally
You can see the full list of duckdb configuration options
here or
alternatively you can use list_settings()
to see your current
configuration options.
config <- md::db_config #<1> get list of default configuration options
config$allow_community_extensions <- "true" #<2> change a default option
con_md <- connect_to_motherduck("MOTHERDUCK_TOKEN",config = config) #<3> pass the modified list to your connection
list_setting(con_md)
Congratulations, you’ve set up your motherduck database!
If you’re new to databases, it will be helpful to have a basic understanding of database management - don’t worry the basics are straight forward and won’t overwhelm you, below are a list of resources I found helpful.
Please see the {md} package down website to see additional documentation on how to use the functions and motherduck
recommend you use MOTHERDUCK_TOKEN
as your variable name ↩