35 Reference Semantics

```r
library(data.table)
# Print each column's class (e.g. <num>) under its name
options(datatable.print.class = TRUE)
```
This chapter introduces the concept of reference semantics, which is used by the data.table package.
When you create an object in R, it is stored at some location in memory. The address() function in the data.table package returns the memory address of an object.
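As a minimal sketch (assuming data.table is installed; x, y and z are arbitrary names), address() works on any R object, not just tables. Assigning an existing object to a second name does not create a new object, whereas building a vector from scratch does, even when the contents are equal:

```r
library(data.table)

x <- c(1, 2, 3)
y <- x            # y is bound to the same object as x
z <- c(1, 2, 3)   # z is a new object that happens to have equal contents

address(x)                          # a character string; its value differs between sessions
identical(address(x), address(y))   # TRUE: one object, two names
identical(address(x), address(z))   # FALSE: two distinct objects
```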
35.1 Add column to data.frame
Let’s create a simple data.frame, df1:
```r
df1 <- data.frame(
  ID = c(8001, 8002),
  GCS = c(15, 13)
)
```

…and print its address:
```r
address(df1)
#> [1] "0x10f209788"
```
Right now, we don’t care what the actual address is (it will differ between sessions and machines); we only want to track when it changes.
Let’s add a new column to df1:
```r
df1[["HR"]] <- c(80, 90)
```

…and print its address:
```r
address(df1)
#> [1] "0x10eaf0048"
```
The address has changed, even though we’re still working on the “same” df1 object: R created a new object with the extra column and bound the name df1 to it.
35.2 Add column to data.table
Let’s create a simple data.table, dt1:
```r
dt1 <- data.table(
  ID = c(8001, 8002),
  GCS = c(15, 13)
)
```

…and print its address:
```r
address(dt1)
#> [1] "0x10d3eac00"
```
Let’s add a new column to dt1 in-place:
```r
dt1[, HR := c(80, 90)]
```

…and print its address:
```r
address(dt1)
#> [1] "0x10d3eac00"
```
The address remains the same.
What if we had used the data.frame syntax (which still works on a data.table) instead?
```r
dt1[["HR_too"]] <- c(80, 90)
address(dt1)
#> [1] "0x10e6e0f38"
```
The address indeed changes, just like with data.frames.
Making copies of large objects can be time-consuming and memory-intensive. So far, we have seen that modifying a data.table by reference (with :=) changes the object in place and does not create a new copy.
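The := operator is not limited to one column. In this minimal sketch, RR and SpO2 are hypothetical extra vital-sign columns added for illustration: a character vector of names on the left and a list of values on the right add both columns in a single by-reference update, so the address stays the same.

```r
library(data.table)

dt <- data.table(ID = c(8001, 8002), GCS = c(15, 13))
addr_before <- address(dt)

# Add two (hypothetical) columns in one by-reference update
dt[, c("RR", "SpO2") := list(c(16, 18), c(98, 95))]

identical(addr_before, address(dt))  # TRUE: dt was modified in place
names(dt)                            # "ID" "GCS" "RR" "SpO2"
```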
35.3 Caution with reference semantics
So far, so good: we are starting to see one reason why data.table is efficient. One very important thing to keep in mind, however, is that when you do want a copy of a data.table, e.g. to create a different version of it, you must use data.table’s copy().
Let’s see why.
35.3.1 Copying a data.frame
Let’s remind ourselves of the contents and address of df1:
```r
df1
#>     ID GCS HR
#> 1 8001  15 80
#> 2 8002  13 90
address(df1)
#> [1] "0x10eaf0048"
```
To make a copy of df1, we can simply assign it to a new object:
```r
df2 <- df1
df2
#>     ID GCS HR
#> 1 8001  15 80
#> 2 8002  13 90
address(df2)
#> [1] "0x10eaf0048"
```
The address of df2 is the same as that of df1, which means both names point to the same object in memory. R only makes an actual copy once one of them is modified (copy-on-modify).
As we’ve already seen, if we edit df2, its address will change:
```r
df2[1, 3] <- 75
df2
#>     ID GCS HR
#> 1 8001  15 75
#> 2 8002  13 90
address(df2)
#> [1] "0x10f22f158"
```
The contents and address of df2 have changed, but df1 remains the same, as you might expect:
```r
df1
#>     ID GCS HR
#> 1 8001  15 80
#> 2 8002  13 90
address(df1)
#> [1] "0x10eaf0048"
```
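This copy-on-modify behavior is not specific to data.frames; it applies to ordinary R objects as well. A minimal sketch on a plain numeric vector (x and y are arbitrary names), using data.table’s address() only to observe what happens:

```r
library(data.table)

x <- c(1, 2, 3)
y <- x                                             # no copy yet: both names share one object
shared <- identical(address(x), address(y))        # TRUE

y[1] <- 10                                         # R copies the shared object before editing it
still_shared <- identical(address(x), address(y))  # FALSE

x  # unchanged: 1 2 3
y  # the edited copy: 10 2 3
```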
35.3.2 Copying a data.table
Let’s remind ourselves of the contents and address of dt1:
```r
dt1
#>       ID   GCS    HR HR_too
#>    <num> <num> <num>  <num>
#> 1:  8001    15    80     80
#> 2:  8002    13    90     90
address(dt1)
#> [1] "0x10e6e0f38"
```
Let’s see what happens if we assign dt1 to a new object:
```r
dt2 <- dt1
dt2
#>       ID   GCS    HR HR_too
#>    <num> <num> <num>  <num>
#> 1:  8001    15    80     80
#> 2:  8002    13    90     90
address(dt2)
#> [1] "0x10e6e0f38"
```
So far it’s the same as with data.frames.
Let’s see what happens if we edit dt2 by reference:
```r
dt2[1, HR := 75]
dt2
#>       ID   GCS    HR HR_too
#>    <num> <num> <num>  <num>
#> 1:  8001    15    75     80
#> 2:  8002    13    90     90
address(dt2)
#> [1] "0x10e6e0f38"
```
Now let’s recheck dt1:
```r
dt1
#>       ID   GCS    HR HR_too
#>    <num> <num> <num>  <num>
#> 1:  8001    15    75     80
#> 2:  8002    13    90     90
```
dt1 has changed as well, because dt1 and dt2 are still pointing to the same object in memory!
This is crucial to remember: an edit made through one name silently changes the table seen through the other.
When you want to make a copy of a data.table, you must use the copy() function.
Let’s see what happens if we use copy():
```r
dt3 <- copy(dt1)
dt3
#>       ID   GCS    HR HR_too
#>    <num> <num> <num>  <num>
#> 1:  8001    15    75     80
#> 2:  8002    13    90     90
address(dt1)
#> [1] "0x10e6e0f38"
address(dt3)
#> [1] "0x10eaa8000"
```
dt3 and dt1 are pointing to different objects in memory, so editing one does not affect the other.
```r
dt3[1, HR := 100]
dt3
#>       ID   GCS    HR HR_too
#>    <num> <num> <num>  <num>
#> 1:  8001    15   100     80
#> 2:  8002    13    90     90
dt1
#>       ID   GCS    HR HR_too
#>    <num> <num> <num>  <num>
#> 1:  8001    15    75     80
#> 2:  8002    13    90     90
```
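Reference semantics also cross function boundaries: a data.table passed to a function is not copied, so := inside the function changes the caller’s table. The sketch below contrasts this with working on a copy(). The helpers flag_low_gcs() and flag_low_gcs_copy(), the low_gcs column and the GCS < 15 cut-off are all hypothetical, made up for illustration.

```r
library(data.table)

# Hypothetical helper: flags GCS below 15 by reference.
# Because := modifies its input in place, the caller's table changes too.
flag_low_gcs <- function(dt) {
  dt[, low_gcs := GCS < 15]
  invisible(dt)
}

# Hypothetical safer variant: edits a copy, so the caller's table is untouched.
flag_low_gcs_copy <- function(dt) {
  out <- copy(dt)
  out[, low_gcs := GCS < 15]
  out
}

patients <- data.table(ID = c(8001, 8002), GCS = c(15, 13))

flagged <- flag_low_gcs_copy(patients)
untouched <- !("low_gcs" %in% names(patients))  # TRUE: copy() protected the original

flag_low_gcs(patients)
modified <- "low_gcs" %in% names(patients)      # TRUE: patients was changed in place
```

Which variant fits depends on whether the side effect is intended; data.table’s own by-reference functions, such as setnames() and setkey(), use a set prefix to make that explicit.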
35.4 Summary
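| Operation | data.frame | data.table |
|---|---|---|
| df[["col"]] <- x | Creates copy | Creates copy |
| dt[, col := x] | N/A | Modifies in-place |
| df2 <- df1, then modify df2 | df1 unchanged | - |
| dt2 <- dt1, then modify dt2 by reference | - | dt1 also changes! |
| dt3 <- copy(dt1), then modify dt3 by reference | - | dt1 unchanged |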