I just came across a fun little program, so I am reposting it here. Original address: https://www.ttlsa.com/python/determine-file-type-by-the-file-header/ (will be removed on request if it infringes).
============================== Divider Line ==============================
If a server accepts uploads, the uploaded files need to be filtered; otherwise you invite all kinds of webshells and database-dump attacks.
import struct

# Extension labels. The original snippet referenced these names without
# defining them; the string values here are assumed.
EXT_RAR = "rar"
EXT_ZIP = "zip"

# Supported file types.
# Signatures are written as hex strings so that the number of header bytes
# can be derived from the string length (two hex characters per byte).
# Header lengths differ by format: as few as 2 characters, as many as 8.
def typelist():
    return {
        "52617221": EXT_RAR,
        "504B0304": EXT_ZIP,
    }

# Convert a sequence of byte values to an uppercase hex string.
def bytes2hex(data):
    hexstr = ""
    for b in data:
        t = "%x" % b
        if len(t) % 2:
            hexstr += "0"  # pad single-digit values to two characters
        hexstr += t
    return hexstr.upper()

# Get the file type.
def filetype(filename):
    tl = typelist()
    ftype = "unknown"
    with open(filename, "rb") as binfile:  # the file must be read in binary mode
        for hcode in tl.keys():
            numofbytes = len(hcode) // 2  # how many bytes to read
            binfile.seek(0)  # go back to the start of the file before each read, or the next read continues where the last one stopped
            header = binfile.read(numofbytes)
            if len(header) < numofbytes:
                continue  # the file is shorter than this signature
            hbytes = struct.unpack_from("B" * numofbytes, header)  # one "B" means one byte
            f_hcode = bytes2hex(hbytes)
            if f_hcode == hcode:
                ftype = tl[hcode]
                break
    return ftype

if __name__ == "__main__":
    print(filetype("your-file-path"))  # replace with the path of the file to check
File headers for common file formats
File format                   File header (hex)
JPEG (jpg)                    FFD8FF
PNG (png)                     89504E47
GIF (gif)                     47494638
TIFF (tif)                    49492A00
Windows Bitmap (bmp)          424D
CAD (dwg)                     41433130
Adobe Photoshop (psd)         38425053
Rich Text Format (rtf)        7B5C727466
XML (xml)                     3C3F786D6C
HTML (html)                   68746D6C3E
Email [thorough only] (eml)   44656C69766572792D646174653A
Outlook Express (dbx)         CFAD12FEC5FD746F
Outlook (pst)                 2142444E
MS Word/Excel (xls.or.doc)    D0CF11E0
MS Access (mdb)               5374616E64617264204A
WordPerfect (wpd)             FF575043
Postscript (eps.or.ps)        252150532D41646F6265
Adobe Acrobat (pdf)           255044462D312E
Quicken (qdf)                 AC9EBD8F
Windows Password (pwl)        E3828596
ZIP Archive (zip)             504B0304
RAR Archive (rar)             52617221
Wave (wav)                    57415645
AVI (avi)                     41564920
Real Audio (ram)              2E7261FD
Real Media (rm)               2E524D46
MPEG (mpg)                    000001BA
MPEG (mpg)                    000001B3
Quicktime (mov)               6D6F6F76
Windows Media (asf)           3026B2758E66CF11
MIDI (mid)                    4D546864
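The table can be plugged into the same kind of lookup as the program above. Here is a minimal sketch, assuming Python 3. The names SIGNATURES, match_header and sniff are hypothetical and not from the original post, and only a handful of rows from the table are included.

```python
# Hypothetical lookup built from a few rows of the table above.
# Keys are hex signatures, values are short labels chosen for this sketch.
SIGNATURES = {
    "FFD8FF": "jpg",
    "89504E47": "png",
    "47494638": "gif",
    "255044462D312E": "pdf",
    "504B0304": "zip",
    "52617221": "rar",
}

# Length in bytes of the longest signature; no more than this needs to be read.
MAX_LEN = max(len(h) for h in SIGNATURES) // 2


def match_header(head):
    """Return the label whose signature starts `head`, or "unknown"."""
    for hexsig, label in SIGNATURES.items():
        if head.startswith(bytes.fromhex(hexsig)):
            return label
    return "unknown"


def sniff(path):
    """Read the first bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:  # binary mode, as in the original program
        return match_header(f.read(MAX_LEN))
```

Reading the header once and comparing it with startswith avoids the repeated seek(0) and also copes with files shorter than a signature, which then simply fail to match.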
============================== Divider Line ==============================
In other words, some of the files uploaded to a server may be malicious files whose extension has been changed to disguise them. In that case you can check the file header to see whether the file really is the type its extension claims, and let it through only if it is.
[Repost] Determining file type by file header in Python
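The extension check described above might be sketched as follows. This is only an illustration under assumptions: EXPECTED, header_matches and extension_matches_header are hypothetical names, only three types from the table are covered, and any extension that is not listed is rejected.

```python
import os

# Hypothetical allow-list: extension -> acceptable hex signatures from the table.
EXPECTED = {
    ".jpg": ["FFD8FF"],
    ".png": ["89504E47"],
    ".zip": ["504B0304"],
}


def header_matches(ext, head):
    """True if `head` starts with a signature registered for extension `ext`."""
    sigs = EXPECTED.get(ext.lower())
    if sigs is None:
        return False  # assumption of this sketch: unlisted extensions are rejected
    return any(head.startswith(bytes.fromhex(s)) for s in sigs)


def extension_matches_header(path):
    """Compare the extension of the file at `path` with its first bytes."""
    ext = os.path.splitext(path)[1]
    with open(path, "rb") as f:
        head = f.read(16)  # long enough for every signature listed above
    return header_matches(ext, head)
```

A matching header only shows that the first bytes look right; it says nothing about the rest of the file, so this is best treated as one filter among several.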