
Setting the XFL and OS flags with the Python gzip module

Overview

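While using the gzip module, I found that the gzip header produced by Python 3.6 differs from the one produced by Java. After reading up on the gzip format and the Python 3.6 source (Lib/gzip.py), I found that the module offers no way to set the XFL or OS flags.
The Python 3.9 source does set XFL (derived from compresslevel), but the OS flag is still hardcoded to 0xff. In the end, a manual patch is still needed.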

Header format

See RFC 1952 for details.

```
+---+---+---+---+---+---+---+---+---+---+
|ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
+---+---+---+---+---+---+---+---+---+---+

XFL (eXtra FLags)
   These flags are available for use by specific compression
   methods. The "deflate" method (CM = 8) sets these flags as
   follows:

      XFL = 2 - compressor used maximum compression,
                slowest algorithm
      XFL = 4 - compressor used fastest algorithm

OS (Operating System)
   This identifies the type of file system on which compression
   took place. This may be useful in determining end-of-line
   convention for text files. The currently defined values are
   as follows:

        0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
        1 - Amiga
        2 - VMS (or OpenVMS)
        3 - Unix
        4 - VM/CMS
        5 - Atari TOS
        6 - HPFS filesystem (OS/2, NT)
        7 - Macintosh
        8 - Z-System
        9 - CP/M
       10 - TOPS-20
       11 - NTFS filesystem (NT)
       12 - QDOS
       13 - Acorn RISCOS
      255 - unknown
```
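You can check what a given writer actually put into these fields (for example, when comparing Python and Java output) by decoding the fixed 10-byte header with `struct`. The sketch below is a minimal illustration. `parse_gzip_header` and `OS_NAMES` are illustrative names, not part of the gzip module. Passing `mtime=` to `gzip.compress` assumes Python 3.8 or later.

```python
import gzip
import struct

# OS byte values as listed in RFC 1952
OS_NAMES = {
    0: "FAT filesystem (MS-DOS, OS/2, NT/Win32)",
    1: "Amiga",
    2: "VMS (or OpenVMS)",
    3: "Unix",
    4: "VM/CMS",
    5: "Atari TOS",
    6: "HPFS filesystem (OS/2, NT)",
    7: "Macintosh",
    8: "Z-System",
    9: "CP/M",
    10: "TOPS-20",
    11: "NTFS filesystem (NT)",
    12: "QDOS",
    13: "Acorn RISCOS",
    255: "unknown",
}


def parse_gzip_header(data: bytes) -> dict:
    """Decode the fixed 10-byte header at the start of a gzip stream."""
    if len(data) < 10:
        raise ValueError("need at least 10 bytes")
    # ID1, ID2, CM, FLG: one byte each; MTIME: little-endian uint32;
    # XFL, OS: one byte each ("<" means no padding, 10 bytes in total).
    id1, id2, cm, flg, mtime, xfl, os_flag = struct.unpack("<BBBBIBB", data[:10])
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip stream")
    return {
        "CM": cm,
        "FLG": flg,
        "MTIME": mtime,
        "XFL": xfl,
        "OS": os_flag,
        "OS_NAME": OS_NAMES.get(os_flag, "undefined"),
    }
```

For example, `parse_gzip_header(gzip.compress(b"hello", mtime=1))` reports OS 255 ("unknown") for output written by Python's gzip module. To compare against a file written by Java, pass that file's raw bytes instead.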

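In short, XFL signals the compression level that was used, and OS identifies the operating system (file system) on which compression took place.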

Source code

gzip.py_write_gzip_header函数:

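```python
# XFL: derived from the compression level
if compresslevel == _COMPRESS_LEVEL_BEST:
    xfl = b'\002'
elif compresslevel == _COMPRESS_LEVEL_FAST:
    xfl = b'\004'
else:
    xfl = b'\000'
self.fileobj.write(xfl)
# OS: always 0xff ("unknown"), with no parameter to change it
self.fileobj.write(b'\377')
# Optional FNAME field, zero-terminated
if fname:
    self.fileobj.write(fname + b'\000')
```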

As you can see, the OS flag is hardcoded as b'\377' (that is, b'\xff', or 255, "unknown").
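You can observe the behavior of the excerpt directly by writing the same data at several compression levels and reading back bytes 8 (XFL) and 9 (OS). The sketch below assumes Python 3.9 or later, where XFL depends on compresslevel. `header_flags` is a hypothetical helper name.

```python
import gzip
import io


def header_flags(compresslevel: int) -> tuple:
    """Return the (XFL, OS) bytes of a stream written by gzip.GzipFile."""
    buf = io.BytesIO()
    # A fixed mtime keeps the output reproducible; it does not affect XFL or OS.
    with gzip.GzipFile(fileobj=buf, mode="wb",
                       compresslevel=compresslevel, mtime=0) as gz:
        gz.write(b"hello")
    data = buf.getvalue()
    return data[8], data[9]  # byte 8 = XFL, byte 9 = OS


for level in (1, 6, 9):
    print(level, header_flags(level))
```

Following the branches in the excerpt, XFL should be 4 for level 1, 2 for level 9 and 0 for level 6. OS should stay 255 at every level.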

To set the OS flag to any other value, you have to patch it manually.
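One way to get the same result without editing Lib/gzip.py is to let the gzip module write its header as usual and then overwrite byte 8 (XFL) and byte 9 (OS) afterwards. This post-processing is a swapped-in alternative to patching the library, not the author's patch. The sketch covers both in-memory data and a seekable file object. `patch_gzip_header` and `gzip_to_fileobj` are hypothetical helper names.

```python
import gzip
import io

# Byte offsets inside the fixed 10-byte gzip header (RFC 1952).
XFL_OFFSET = 8
OS_OFFSET = 9
FHCRC = 0x02  # FLG bit 1: a CRC16 of the header follows the optional fields


def patch_gzip_header(data: bytes, xfl=None, os_flag=None) -> bytes:
    """Return a copy of gzip data whose first header has XFL and/or OS replaced."""
    if len(data) < 10 or data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    if data[3] & FHCRC:
        # Changing header bytes would invalidate the stored header CRC.
        raise ValueError("FHCRC is set; header CRC would need recomputing")
    out = bytearray(data)
    if xfl is not None:
        out[XFL_OFFSET] = xfl
    if os_flag is not None:
        out[OS_OFFSET] = os_flag
    return bytes(out)


def gzip_to_fileobj(fileobj, payload: bytes, os_flag: int,
                    compresslevel: int = 9) -> None:
    """Compress payload into a seekable binary file object, then rewrite the OS byte.

    Example: with open("out.gz", "wb") as f: gzip_to_fileobj(f, data, os_flag=3)
    """
    start = fileobj.tell()  # GzipFile writes its header at the current position
    with gzip.GzipFile(fileobj=fileobj, mode="wb",
                       compresslevel=compresslevel) as gz:
        gz.write(payload)
    end = fileobj.tell()
    # GzipFile does not close a caller-supplied fileobj, so it is still usable here.
    fileobj.seek(start + OS_OFFSET)
    fileobj.write(bytes([os_flag]))
    fileobj.seek(end)  # leave the position just after the gzip trailer
```

This approach does not rely on the private `_write_gzip_header` method, whose signature differs between Python 3.6 and 3.9. Python's own reader skips the XFL and OS bytes when decompressing, so the patched output still decompresses normally.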