Previously, the following paper was mentioned in [Erlang 0126]:

On Preserving Term Sharing in the Erlang Virtual Machine
Address: http://user.it.uu.se/~kostis/papers/erlang12_sharing.pdf
Abstract: In this paper we describe our experiences and argue through examples why flattening terms during copying is not a good idea for a language like Erlang. More importantly, we propose a sharing preserving copying mechanism for Erlang/OTP and describe a publicly available complete implementation of this mechanism.
The problem of term sharing is not new. The Efficiency Guide already mentions it in section 4.2 "Constructing Binaries" [LINK] and section 8.2 "Loss of Sharing" [LINK] (which again illustrates how the Erlang documentation is organized: one topic is often dispersed across multiple documents, and reading it requires patience). Nor is the problem limited to the binary data type; loss of term sharing is a general issue. The Guide names the scenarios that force a copy: sending a term to another process, or inserting it into an ETS table. The relevant passage is excerpted below:
Loss of Sharing

Shared sub-terms are not preserved when a term is sent to another process, passed as the initial process arguments in the spawn call, or stored in an ETS table. That is an optimization. Most applications do not send messages with shared sub-terms.
During data copying, Erlang traverses the data twice. The first traversal computes the flat size (function size_object in erts/emulator/beam/copy.c) and allocates the corresponding amount of memory. The second traversal performs the actual copy (function copy_struct in erts/emulator/beam/copy.c).
First, let's write some simple code to demonstrate erts_debug:size/1 and erts_debug:flat_size/1, which will be used frequently below.
s3(L) ->
    L2 = [L,L,L,L],
    {{erts_debug:size(L), erts_debug:flat_size(L)},
     {erts_debug:size(L2), erts_debug:flat_size(L2)}}.

9> d:s3([1,2,3,4,5,6]).
{{12,12},{20,56}}
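A quick sanity check of those numbers (a sketch, assuming the usual heap layout: each cons cell costs 2 words, and small integers are immediates stored directly in the cell; the function name s4 is my own, not from the original):

```erlang
%% L  = [1,2,3,4,5,6]        : 6 cons cells            = 12 words
%% L2 = [L,L,L,L] (shared)   : 4 cons cells + one L    =  8 + 12   = 20 words
%% L2 flattened              : 4 cons cells + four L's =  8 + 4*12 = 56 words
s4() ->
    L  = [1,2,3,4,5,6],
    L2 = [L,L,L,L],
    {erts_debug:size(L2), erts_debug:flat_size(L2)}.   %% {20,56}
```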
The following shell session demonstrates all three scenarios: spawn, message sending, and ETS insertion.
Eshell V6.0 (abort with ^G)
1> L = [1,2,3,4,5,6,7,8,9,10].
[1,2,3,4,5,6,7,8,9,10]
2> L2 = [L,L,L,L,L,L].
[[1,2,3,4,5,6,7,8,9,10],
 [1,2,3,4,5,6,7,8,9,10],
 [1,2,3,4,5,6,7,8,9,10],
 [1,2,3,4,5,6,7,8,9,10],
 [1,2,3,4,5,6,7,8,9,10],
 [1,2,3,4,5,6,7,8,9,10]]
3> erts_debug:size(L2).
32
4> erts_debug:flat_size(L2).
132
5> spawn(fun () -> receive Data -> io:format("~p", [erts_debug:size(Data)]) end end).
<0.39.0>
6> v(5) ! L2.
132[[1,2,3,4,5,6,7,8,9,10],
 [1,2,3,4,5,6,7,8,9,10],
 [1,2,3,4,5,6,7,8,9,10],
 [1,2,3,4,5,6,7,8,9,10],
 [1,2,3,4,5,6,7,8,9,10],
 [1,2,3,4,5,6,7,8,9,10]]
7> erts_debug:size(L2).
32
8> ets:new(test, [named_table]).
test
9> ets:insert(test, {1,L2}).
true
10> ets:lookup(test, 1).
[{1,[[1,2,3,4,5,6,7,8,9,10],
     [1,2,3,4,5,6,7,8,9,10],
     [1,2,3,4,5,6,7,8,9,10],
     [1,2,3,4,5,6,7,8,9,10],
     [1,2,3,4,5,6,7,8,9,10],
     [1,2,3,4,5,6,7,8,9,10]]}]
11> [{1,Data}] = v(10).
[{1,[[1,2,3,4,5,6,7,8,9,10],
     [1,2,3,4,5,6,7,8,9,10],
     [1,2,3,4,5,6,7,8,9,10],
     [1,2,3,4,5,6,7,8,9,10],
     [1,2,3,4,5,6,7,8,9,10],
     [1,2,3,4,5,6,7,8,9,10]]}]
12> Data.
[[1,2,3,4,5,6,7,8,9,10],
 [1,2,3,4,5,6,7,8,9,10],
 [1,2,3,4,5,6,7,8,9,10],
 [1,2,3,4,5,6,7,8,9,10],
 [1,2,3,4,5,6,7,8,9,10],
 [1,2,3,4,5,6,7,8,9,10]]
13> erts_debug:size(Data).
132
14> spawn(d, test, [L2]).
132<0.54.0>

where d:test/1 is defined as:

test(Data) -> io:format("~p", [erts_debug:size(Data)]).
In addition to the situations above, there are subtler situations that can also trigger this data expansion. Take, for example, the example designed in the paper cited above:
show_printing_may_be_bad() ->
    F = fun (N) ->
            T = now(),
            L = mklist(N),
            S = erts_debug:size(L),
            io:format("mklist(~w), size ~w, ", [N, S]),
            io:format("is ~P, ", [L, 2]),    %% BAD !!!
            D = timer:now_diff(now(), T),
            io:format("in ~.3f sec.~n", [D/1000000])
        end,
    lists:foreach(F, [10, 20, 22, 24, 26, 28, 30]).

mklist(0) -> 0;
mklist(M) ->
    X = mklist(M-1),
    [X, X].
Run the code twice: once with the line io:format("is ~P, ", [L, 2]), %% BAD !!! deleted, and once as-is. The results on my machine are as follows:
With the BAD line deleted:

Eshell V6.0 (abort with ^G)
1> d:show_printing_may_be_bad().
mklist(10), size 40, in 0.001 sec.
mklist(20), size 80, in 0.000 sec.
mklist(22), size 88, in 0.000 sec.
mklist(24), size 96, in 0.000 sec.
mklist(26), size 104, in 0.000 sec.
mklist(28), size 112, in 0.000 sec.
mklist(30), size 120, in 0.000 sec.
ok

With the BAD line kept:

Eshell V6.0 (abort with ^G)
1> d:show_printing_may_be_bad().
mklist(10), size 40, is [[...]|...], in 0.001 sec.
mklist(20), size 80, is [[...]|...], in 0.110 sec.
mklist(22), size 88, is [[...]|...], in 0.421 sec.
mklist(24), size 96, is [[...]|...], in 43.105 sec.
mklist(26), size 104,
Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 3280272216 bytes of memory (of type "heap").
rlwrap: warning: erl killed by SIGABRT.
rlwrap has not crashed, but for transparency,
it will now kill itself (without dumping core) with the same signal
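The blow-up is easy to quantify. Writing S(M) for the shared size and F(M) for the flat size of mklist(M), in words (a back-of-the-envelope sketch assuming 2 words per cons cell; flat_words/1 is my own name, not from the paper):

```erlang
%% mklist(M) builds [X, X] where X = mklist(M-1): 2 cons cells = 4 words
%% per level, and both cells point at the *same* X.
%%   shared: S(M) = S(M-1) + 4    =>  S(M) = 4*M          (mklist(30) -> 120)
%%   flat:   F(M) = 2*F(M-1) + 4  =>  F(M) = 4*(2^M - 1)
%% So mklist(24) flattens to ~6.7e7 words (~0.5 GB on a 64-bit VM: the
%% 43-second case), and mklist(26) to ~2.7e8 words, which together with
%% heap-growth overhead is what makes the allocation above fail.
flat_words(M) -> 4 * ((1 bsl M) - 1).
```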
Clearly, the version with that line not only takes far longer to execute, it also requires a huge amount of memory.
Why does this happen? It is exactly the "loss of sharing" mentioned above: the term gets expanded (flattened). We covered io:format before ([Erlang 0041] Details of io:format [LINK]): in Erlang/OTP, I/O is implemented by sending an I/O request to an I/O server. An io:format call actually sends an I/O request message, term included, to the I/O server, which handles the rest. So although L is printed merely as "[[...]|...]", the flattening and copying have already been triggered when the message was sent.
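One way to sidestep the trap when debugging is to ship only cheap summary values to the I/O server instead of the term itself. A minimal sketch (dbg_term/2 is a hypothetical helper of my own, not from the source):

```erlang
%% dbg_term/2: instead of passing Term to io:format (which sends it to
%% the I/O server and flattens it during the message copy), compute its
%% sharing-aware size locally and send only that small integer.
dbg_term(Name, Term) ->
    io:format("~s: size ~w words~n", [Name, erts_debug:size(Term)]).
```

Note that erts_debug:size/1 is sharing-aware, so the traversal stays cheap even for terms like mklist(30) whose flattened form would be astronomically large.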
This problem is well concealed, so it is worth stripping all debug io:format output via macro options when generating a release.
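A common way to do that is a conditional-compilation macro (a sketch; the debug flag and ?DBG name are my own choices, not from the source):

```erlang
%% Compile with erlc -Ddebug to keep the output; without the flag the
%% io:format calls vanish from the release build entirely.
-ifdef(debug).
-define(DBG(Fmt, Args), io:format(Fmt, Args)).
-else.
-define(DBG(Fmt, Args), ok).
-endif.
```

Call sites then use ?DBG("state: ~p~n", [State]) instead of io:format directly.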
[Erlang 0127] term sharing in Erlang/OTP