Loading required package: DBI
61 DuckDB
DuckDB is a popular and powerful “in-process SQL OLAP database management system”
It provides APIs in multiple languages, including R, Python, and Julia. 
 In practical terms, it provides a very efficient way to read, query, and write big data in R, which includes datasets that are bigger than memory. 
 For example, it allows lazy-reading of large datasets with optional filtering, selecting, and grouping operations before reading into memory. 
Chapter 60 includes examples using duckplyr
This section is a work in progress.
61.1 Installation
install.packages("duckdb")61.2 DuckDB API
DuckDB uses the R Database Interface (DBI). Learn more about using DBI here.
61.3 Setup Connection
By default, duckdb()’s dbdir argument defaults to DBDIR_MEMORY, which creates a non-persistent, in-memory object.
61.4 Settings
dbExecute(con, "PRAGMA enable_progress_bar;")[1] 0
61.5 Read CSV
You can directly read into data.frame, which may be best for smaller datasets:
ir <- dbGetQuery(con, 
    "SELECT * FROM read_csv_auto('./iris.csv');"
)
head(ir)  Sepal_Length Sepal_Width Petal_Length Petal_Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
For larger datasets, you can create a DuckDB table that can be queried without loading the dataset into memory:
dbExecute(con,
    "CREATE TABLE iris AS SELECT * FROM './iris.csv';"
)[1] 150
61.5.1 List tables
dbGetQuery(con, "PRAGMA show_tables;")  name
1 iris
Using the R API:
dbListTables(con)[1] "iris"
61.6 Filter
dbGetQuery(con,
    "SELECT * FROM iris WHERE Species in ('setosa', 'versicolor')") |> head()  Sepal_Length Sepal_Width Petal_Length Petal_Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
dbExecute(con,
    "
    CREATE TABLE ir AS SELECT * FROM 
    read_csv_auto('/Users/egenn/icloud/Data/iris.csv', 
    delim=',', header=True);
    "
)[1] 150
61.6.1 Fetch data to data.frame
ir = dbGetQuery(con, "SELECT * from iris")
head(ir)  Sepal_Length Sepal_Width Petal_Length Petal_Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
class(ir)[1] "data.frame"
61.6.2 Fetch data to data.table:
ir = dbGetQuery(con, "SELECT * from iris")61.7 Close connection
When you are done working with the database, you should close the connection:
dbDisconnect(con)