zlib memory leak


zlib memory leak

dmkolesnikov
Hello,

I’ve run into an interesting issue with zlib on OTP 18.2.1; I have not checked other releases yet.
I have a file with about 30GB of compressed data, which expands to 300GB of UTF-8 text.
The producer of the file claims that standard gzip is used. The file header [1] is:
1F 8B 08 04 00 00 00 00 00 00 24 03

My decompression program is very simple [2]: it reads 64K binary chunks from the file and inflates them using zlib:inflate(…).

At some point, inflate does not return and the VM's binary memory grows without bound until the VM crashes.
The crash is reproducible every time with my file. The file is not corrupted: gzip is able to check its integrity and inflate the data. The file becomes readable by my program if it is inflated and then deflated again using command-line gzip. The header of the readable file is:
1F 8B 08 00 6E EC E6 56 00 03 D4 BD
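
The two headers can be compared field by field against the layout in [1]. The following is only a minimal sketch, written in Python as a language-neutral illustration; the function name describe_gzip_header and the dictionary keys are hypothetical. Per RFC 1952, the fourth byte is the FLG byte, and bit 0x04 (FEXTRA) means an extra field follows the fixed 10-byte header, starting with a two-byte little-endian length. The sketch shows that FEXTRA is set in the failing file and not in the re-compressed one; whether this difference is related to the hang is not established here.

```python
import struct

# Flag bits of the FLG byte, as defined in RFC 1952 (section 2.3.1).
FLAG_NAMES = {0x01: "FTEXT", 0x02: "FHCRC", 0x04: "FEXTRA",
              0x08: "FNAME", 0x10: "FCOMMENT"}

def describe_gzip_header(hex_bytes):
    """Decode the fixed 10-byte gzip header (plus XLEN, if present)."""
    raw = bytes.fromhex(hex_bytes)
    # ID1+ID2, CM, FLG, MTIME (little-endian), XFL, OS
    magic, method, flg, mtime, xfl, os_id = struct.unpack("<2sBBIBB", raw[:10])
    if magic != b"\x1f\x8b":
        raise ValueError("not a gzip header")
    info = {
        "method": method,
        "flags": [name for bit, name in sorted(FLAG_NAMES.items()) if flg & bit],
        "mtime": mtime,
        "os": os_id,
    }
    # With FEXTRA set, the next two bytes are XLEN, the extra field length.
    if flg & 0x04 and len(raw) >= 12:
        (info["xlen"],) = struct.unpack("<H", raw[10:12])
    return info

failing = describe_gzip_header("1F 8B 08 04 00 00 00 00 00 00 24 03")
readable = describe_gzip_header("1F 8B 08 00 6E EC E6 56 00 03 D4 BD")
print(failing)
print(readable)
```

In the second header the last two bytes (D4 BD) are already compressed data, since no optional fields are flagged.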

I am finding it hard to debug this issue further into zlib and understand the root cause.
Do you have any suggestions?

Best Regards,
Dmitry

P.S. The file I am talking about contains confidential data and cannot be disclosed to the community.

Reference:
[1] http://www.zlib.org/rfc-gzip.html
[2] https://github.com/fogfish/feta/blob/master/src/gz.erl
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: zlib memory leak

Kirill Ratkin
Hi Dmitry,

To check your situation I created a simple demo for your gz module (https://gist.github.com/kvratkin/ec4736aaf65a79bc4469).
As far as I can see, a file of about 1 GB is unzipped properly, with no errors or VM crashes. Maybe I am using your module incorrectly...
Unfortunately I can't test a file as huge as yours, but I'll try a 10 GB file soon.

P.S. I use Debian Jessie, 64-bit, 1 virtual CPU, and Erlang installed from the Erlang Solutions repo.

2016-03-14 23:25 GMT+03:00 Dmitry Kolesnikov <[hidden email]>:

Re: zlib memory leak

Сергей Прохоров-2
In reply to this post by dmkolesnikov
Maybe your gzip file is some kind of sparse file at some point, so that even a single 64K compressed chunk inflates to a huge uncompressed value?

Does VM memory grow linearly during the program's runtime, or does it jump at some moment?

P.S.: what is this `stdio` module?
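
One way to test the hypothesis above is to cap how much output each inflate call may produce, so that a single highly compressible chunk cannot blow up memory. The sketch below shows the idea in Python rather than Erlang, purely as a language-neutral illustration. The names inflate_bounded and max_out are hypothetical, and it assumes the input is a single gzip member delivered as an iterable of byte chunks.

```python
import zlib

def inflate_bounded(chunks, max_out=64 * 1024):
    """Yield decompressed pieces, asking for at most max_out bytes per call."""
    d = zlib.decompressobj(wbits=31)  # 31 = expect a gzip header and trailer
    for chunk in chunks:
        data = chunk
        while data:
            # Produce at most max_out bytes; input that was not consumed
            # is kept in d.unconsumed_tail and fed back in the next round.
            out = d.decompress(data, max_out)
            if out:
                yield out
            data = d.unconsumed_tail
        if d.eof:
            break  # end of the first gzip member
    # Emit anything still buffered inside the decompressor.
    tail = d.flush()
    if tail:
        yield tail
```

If memory stays flat with bounded output but the number of pieces per input chunk is very large at some point in the file, that would support the "huge expansion" explanation; if the call still never returns, the cause is more likely elsewhere.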


Re: zlib memory leak

dmkolesnikov
No, this file is a UTF-8 text file; there are no sparse chunks and the data in it is correct. I am able to use my program to read this file if it is re-compressed with command-line gzip. I suspect that something is wrong with zlib. BTW, I built OTP from source on a Mac; I need to check it on Linux.

Best Regards,
Dmitry


> On Mar 15, 2016, at 6:47 PM, Сергей Прохоров <[hidden email]> wrote:
>
> Maybe your gzip file is some kind of sparse file at some point, so that even a single 64K compressed chunk inflates to a huge uncompressed value?
> In that case you may try http://erlang.org/doc/man/zlib.html#inflateChunk-2
>
> Does VM memory grow linearly during the program's runtime, or does it jump at some moment?
>
> P.S.: what is this `stdio` module?