Returning memory to the OS

I was running the following test with the ffi pack and noticed some interesting behavior in tcmalloc:

  • In the code below I am using the test_mode.pl file included with the ffi pack; rss(R) unifies R with the resident set size (RSS) of the process, in bytes.
1 ?- module(test_mode).
true.

test_mode: 2 ?- rss(Rstart).
Rstart = 19898368.

test_mode: 3 ?- S=1 00 000 000,rss(R0),c_alloc(P,long[S]),rss(R1),garbage_collect_atoms,rss(R2),RssDiff is R2 - $Rstart, format('~3I bytes above initial rss.',[RssDiff]).
802_115_584 bytes above initial rss.
S = 100000000,
R0 = 19955712,
P = test_mode:<C long[100000000]>(0x5573e7346000),
R1 = R2, R2 = 822013952,
RssDiff = 802115584,
Rstart = 19898368.

test_mode: 4 ?- S=1 00 000 000,rss(R0),c_alloc(P,long[S]),rss(R1),garbage_collect_atoms,rss(R2),RssDiff is R2 - $Rstart, format('~3I bytes above initial rss.',[RssDiff]).
1_602_072_576 bytes above initial rss.
S = 100000000,
R0 = 822054912,
P = test_mode:<C long[100000000]>(0x557416e38000),
R1 = R2, R2 = 1621970944,
RssDiff = 1602072576,
Rstart = 19898368.

test_mode: 4 ?- S=1 00 000 000,rss(R0),c_alloc(P,long[S]),rss(R1),garbage_collect_atoms,rss(R2),RssDiff is R2 - $Rstart, format('~3I bytes above initial rss.',[RssDiff]).
2_401_095_680 bytes above initial rss.
S = 100000000,
R0 = 1621078016,
P = test_mode:<C long[100000000]>(0x55744692a000),
R1 = R2, R2 = 2420994048,
RssDiff = 2401095680,
Rstart = 19898368.

test_mode: 4 ?- S=1 00 000 000,rss(R0),c_alloc(P,long[S]),rss(R1),garbage_collect_atoms,rss(R2),RssDiff is R2 - $Rstart, format('~3I bytes above initial rss.',[RssDiff]).
2_401_148_928 bytes above initial rss.
S = 100000000,
R0 = R1, R1 = R2, R2 = 2421047296,
P = test_mode:<C long[100000000]>(0x5573e7346000),
RssDiff = 2401148928,
Rstart = 19898368.

test_mode: 4 ?- S=1 00 000 000,rss(R0),c_alloc(P,long[S]),rss(R1),garbage_collect_atoms,rss(R2),RssDiff is R2 - $Rstart, format('~3I bytes above initial rss.',[RssDiff]).
2_401_148_928 bytes above initial rss.
S = 100000000,
R0 = R1, R1 = R2, R2 = 2421047296,
P = test_mode:<C long[100000000]>(0x557416e38000),
RssDiff = 2401148928,
Rstart = 19898368.

test_mode: 4 ?- S=1 00 000 000,rss(R0),c_alloc(P,long[S]),rss(R1),garbage_collect_atoms,rss(R2),RssDiff is R2 - $Rstart, format('~3I bytes above initial rss.',[RssDiff]).
2_401_181_696 bytes above initial rss.
S = 100000000,
R0 = R1, R1 = R2, R2 = 2421080064,
P = test_mode:<C long[100000000]>(0x55744692a000),
RssDiff = 2401181696,
Rstart = 19898368.

test_mode: 4 ?- S=1 00 000 000,rss(R0),c_alloc(P,long[S]),rss(R1),garbage_collect_atoms,rss(R2),RssDiff is R2 - $Rstart, format('~3I bytes above initial rss.',[RssDiff]).
2_401_148_928 bytes above initial rss.
S = 100000000,
R0 = R1, R1 = R2, R2 = 2421047296,
P = test_mode:<C long[100000000]>(0x5573e7346000),
RssDiff = 2401148928,
Rstart = 19898368.

test_mode: 4 ?- S=1 00 000 000,rss(R0),c_alloc(P,long[S]),rss(R1),garbage_collect_atoms,rss(R2),RssDiff is R2 - $Rstart, format('~3I bytes above initial rss.',[RssDiff]).
2_401_148_928 bytes above initial rss.
S = 100000000,
R0 = R1, R1 = R2, R2 = 2421047296,
P = test_mode:<C long[100000000]>(0x557416e38000),
RssDiff = 2401148928,
Rstart = 19898368.

test_mode: 4 ?- S=1 00 000 000,rss(R0),c_alloc(P,long[S]),rss(R1),garbage_collect_atoms,rss(R2),RssDiff is R2 - $Rstart, format('~3I bytes above initial rss.',[RssDiff]).
2_401_181_696 bytes above initial rss.
S = 100000000,
R0 = R1, R1 = R2, R2 = 2421080064,
P = test_mode:<C long[100000000]>(0x55744692a000),
RssDiff = 2401181696,
Rstart = 19898368.

test_mode: 4 ?- S=1 00 000 000,rss(R0),c_alloc(P,long[S]),rss(R1),garbage_collect_atoms,rss(R2),RssDiff is R2 - $Rstart, format('~3I bytes above initial rss.',[RssDiff]).
2_401_181_696 bytes above initial rss.
S = 100000000,
R0 = R1, R1 = R2, R2 = 2421080064,
P = test_mode:<C long[100000000]>(0x5573e7346000),
RssDiff = 2401181696,
Rstart = 19898368.

As you can see, there is no memory leak, but tcmalloc holds on to about 2.4 GB. RSS first grows by the 0.8 GB that is actually needed, then to 1.6 GB, and then to 2.4 GB, presumably for performance reasons. From then on it keeps the 2.4 GB for itself and reuses part of it (the 0.8 GB actually needed) on each subsequent call.
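A quick sanity check of the numbers in the transcript, assuming `long` is 8 bytes (as on 64-bit Linux). The leftover term on each RssDiff line is just whatever growth remains after subtracting whole arrays; the transcript does not say what it is. The last line takes the differences between the first three P addresses:

$$
\begin{aligned}
\text{array size} &= 10^{8} \times 8\ \text{B} = 800\,000\,000\ \text{B} \approx 0.8\ \text{GB} \;(\approx 762.9\ \text{MiB})\\[4pt]
\text{RssDiff}_{1} &= 802\,115\,584 = 1 \times 800\,000\,000 + 2\,115\,584\\
\text{RssDiff}_{2} &= 1\,602\,072\,576 = 2 \times 800\,000\,000 + 2\,072\,576\\
\text{RssDiff}_{3} &= 2\,401\,095\,680 = 3 \times 800\,000\,000 + 1\,095\,680\\[4pt]
\texttt{0x557416e38000} - \texttt{0x5573e7346000} &= \texttt{0x55744692a000} - \texttt{0x557416e38000}\\
&= \texttt{0x2FAF2000} = 800\,006\,144\ \text{B} = \left\lceil \tfrac{800\,000\,000}{8192} \right\rceil \times 8192
\end{aligned}
$$

So the three plateaus are exactly one, two, and three arrays' worth of memory, and the three blocks sit back to back. Each block is 800,006,144 bytes apart from the next. That spacing is consistent with each request being rounded up to whole 8 KiB pages, though treating 8 KiB as tcmalloc's page size here is an assumption. From the fourth call on, P only cycles through those same three addresses, which matches the "reuses part of it" observation.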

Is there a way to tell tcmalloc or malloc to return the memory to the OS?