Determine the file type based on the file header data

Source: Internet
Author: User
Tags: dbx

Suppose a file has an unknown or incorrect extension. It is a normal, non-empty file that would work once the extension is corrected. How can you determine its type?
Even when the extension is missing or has been changed, the file header (the first few bytes of the file) still identifies the type. Open the file in hexadecimal mode in an editor such as UltraEdit and inspect the header. The following are common file header values, in hexadecimal:
JPEG (JPG), file header: ffd8ff
PNG (PNG), file header: 89504e47
GIF, file header: 47494638
TIFF (TIF), file header: 49492a00
Windows Bitmap (BMP), file header: 424d
CAD (DWG), file header: 41433130
Adobe Photoshop (PSD), file header: 38425053
Rich Text Format (RTF), file header: 7b5c727466
XML (XML), file header: 3c3f786d6c
HTML (HTML), file header: 68746d6c3e
Email [thorough only] (EML), file header: 44656c69766572792d646174653a
Outlook Express (DBX), file header: cfad12fec5fd746f
Outlook (PST), file header: 2142444e
MS Word/Excel (XLS or DOC), file header: d0cf11e0
MS Access (MDB), file header: 5374616e64617264204a
Wordperfect (WPD), file header: ff575043
PostScript (EPS or PS), file header: 252150532d41646f6265
Adobe Acrobat (PDF), file header: 255044462d312e
Quicken (qdf), file header: ac9ebd8f
Windows Password (pwl), file header: e3828596
ZIP Archive (ZIP), file header: 504b0304
RAR Archive (RAR), file header: 52617221
Wave (WAV), file header: 52494646 (RIFF), followed by 57415645 at byte offset 8
AVI (AVI), file header: 52494646 (RIFF), followed by 41564920 at byte offset 8
Real Audio (RAM), file header: 2e7261fd
Real Media (RM), file header: 2e524d46
MPEG (MPG), file header: 000001ba
MPEG (MPG), file header: 000001b3
QuickTime (mov), file header: 6d6f6f76
Windows Media (ASF), file header: 3026b2758e66cf11
MIDI (MID), file header: 4d546864
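If no hex editor is at hand, the header can also be printed with a few lines of Java. The following is only a minimal sketch and is not from the original article; the class name HeaderDump and the method readHeaderHex are hypothetical names chosen for illustration. It reads the first bytes of a file and returns them as lowercase hex so they can be compared against the list above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeaderDump {

    /** Returns up to {@code count} leading bytes of the file as lowercase hex. */
    public static String readHeaderHex(Path file, int count) throws IOException {
        byte[] buf = new byte[count];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may deliver fewer bytes than requested, so loop until full or EOF
            while (total < count) {
                int n = in.read(buf, total, count - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < total; i++) {
            sb.append(String.format("%02x", buf[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Demo only: create a temporary file that starts with the PNG header bytes
        Path tmp = Files.createTempFile("header-demo", ".bin");
        Files.write(tmp, new byte[] {(byte) 0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a});
        System.out.println(readHeaderHex(tmp, 4)); // prints 89504e47
        Files.delete(tmp);
    }
}
```

Reading in a loop matters because a single read() call is allowed to return fewer bytes than requested, even when the file is long enough.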

The following Java code, found on the Internet, determines the file type from the file header.

package com;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class FileType {

    // Key: leading bytes of the file as lowercase hex; value: file extension
    public static final Map<String, String> FILE_TYPE_MAP = new HashMap<String, String>();

    private FileType() {
    }

    static {
        getAllFileType(); // initialize file type information
    }

    /**
     * Description: [getAllFileType, common file header information]
     *
     * The header values for html, css, eml, flv, mp4, wav, avi, ini, exe, jsp,
     * sql, bat, properties and wps were lost or garbled in the source listing
     * and are therefore not registered here.
     */
    private static void getAllFileType() {
        FILE_TYPE_MAP.put("ffd8ffe000104a464946", "jpg"); // JPEG (JPG)
        FILE_TYPE_MAP.put("89504e470d0a1a0a0000", "png"); // PNG (PNG)
        FILE_TYPE_MAP.put("47494638", "gif"); // GIF (GIF)
        FILE_TYPE_MAP.put("49492a00", "tif"); // TIFF (TIF)
        FILE_TYPE_MAP.put("424d", "bmp"); // 16-color, 256-color and 24-bit bitmaps (BMP)
        FILE_TYPE_MAP.put("41433130313500000000", "dwg"); // CAD (DWG)
        FILE_TYPE_MAP.put("3c21646f637479706520", "htm"); // HTM (HTM)
        FILE_TYPE_MAP.put("696b2e71623d696b2e71", "js"); // JS
        FILE_TYPE_MAP.put("7b5c727466315c616e73", "rtf"); // Rich Text Format (RTF)
        FILE_TYPE_MAP.put("38425053000100000000", "psd"); // Photoshop (PSD)
        // Note: Word, Excel and MSI files share this header
        FILE_TYPE_MAP.put("d0cf11e0a1b11ae10000", "doc"); // MS Word/Excel
        FILE_TYPE_MAP.put("5374616e64617264204a", "mdb"); // MS Access (MDB)
        FILE_TYPE_MAP.put("252150532d41646f6265", "ps"); // PostScript (PS)
        FILE_TYPE_MAP.put("255044462d312e350d0a", "pdf"); // Adobe Acrobat (PDF)
        FILE_TYPE_MAP.put("2e524d46", "rmvb"); // RMVB and RM are the same
        FILE_TYPE_MAP.put("49443303000000002176", "mp3");
        FILE_TYPE_MAP.put("000001ba210001000180", "mpg");
        FILE_TYPE_MAP.put("3026b2758e66cf11", "wmv"); // same as ASF
        FILE_TYPE_MAP.put("4d546864000000060001", "mid"); // MIDI (MID)
        FILE_TYPE_MAP.put("504b0304140000000800", "zip");
        FILE_TYPE_MAP.put("52617221", "rar");
        FILE_TYPE_MAP.put("504b03040a0000000000", "jar");
        FILE_TYPE_MAP.put("4d616e69666573742d56", "mf"); // MF file
        FILE_TYPE_MAP.put("3c3f786d6c", "xml"); // XML file
        FILE_TYPE_MAP.put("7061636b616765207765", "java"); // Java file
        FILE_TYPE_MAP.put("1f8b0800000000000000", "gz"); // GZ file
        FILE_TYPE_MAP.put("cafebabe0000002e0041", "class"); // class file
        FILE_TYPE_MAP.put("49545346030000006000", "chm"); // CHM file
        FILE_TYPE_MAP.put("04000000010000001300", "mxp"); // MXP file
        FILE_TYPE_MAP.put("504b0304140006000800", "docx"); // docx file
        FILE_TYPE_MAP.put("6431303a637265617465", "torrent");
        FILE_TYPE_MAP.put("6d6f6f76", "mov"); // QuickTime (MOV)
        FILE_TYPE_MAP.put("ff575043", "wpd"); // WordPerfect (WPD)
        FILE_TYPE_MAP.put("cfad12fec5fd746f", "dbx"); // Outlook Express (DBX)
        FILE_TYPE_MAP.put("2142444e", "pst"); // Outlook (PST)
        FILE_TYPE_MAP.put("ac9ebd8f", "qdf"); // Quicken (QDF)
        FILE_TYPE_MAP.put("e3828596", "pwl"); // Windows Password (PWL)
        FILE_TYPE_MAP.put("2e7261fd", "ram"); // Real Audio (RAM)
    }

    /**
     * Convert the file header bytes to a lowercase hex string
     * @param src
     * @return
     */
    public static String bytesToHexString(byte[] src) {
        if (src == null || src.length <= 0) {
            return null;
        }
        StringBuilder stringBuilder = new StringBuilder();
        for (int i = 0; i < src.length; i++) {
            int v = src[i] & 0xff;
            String hv = Integer.toHexString(v);
            if (hv.length() < 2) {
                stringBuilder.append(0); // pad single-digit values to two digits
            }
            stringBuilder.append(hv);
        }
        return stringBuilder.toString();
    }

    /**
     * Determine the file type based on the file header
     * @param filePath
     * @return the extension, or null if the header is unknown or the file cannot be read
     */
    public static String getFileType(String filePath) {
        String res = null;
        // try-with-resources closes the stream, which the original listing never did
        try (FileInputStream is = new FileInputStream(filePath)) {
            byte[] b = new byte[10];
            int n = is.read(b, 0, b.length);
            if (n <= 0) {
                return null; // empty file
            }
            String fileCode = bytesToHexString(Arrays.copyOf(b, n));
            System.out.println(fileCode);
            // Comparing in both directions also works when a key in the map is
            // shorter than the bytes read, but it is a little slower than a direct lookup
            for (Map.Entry<String, String> entry : FILE_TYPE_MAP.entrySet()) {
                String key = entry.getKey().toLowerCase();
                if (key.startsWith(fileCode) || fileCode.startsWith(key)) {
                    res = entry.getValue();
                    break;
                }
            }
        } catch (IOException e) { // also covers FileNotFoundException
            e.printStackTrace();
        }
        return res;
    }

    public static void main(String[] args) throws Exception {
        String type = getFileType("C:/test/EEE.WMV");
        System.out.println("EEE.WMV: " + type);
        System.out.println();

        type = getFileType("C:/test/350996.wav");
        System.out.println("350996.wav: " + type);
        System.out.println();
    }
}
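One limitation of comparing only a prefix of the first 10 bytes: WAV and AVI files are both RIFF containers. Their first four bytes are 52494646 ("RIFF"), and the marker that separates them (57415645 "WAVE" or 41564920 "AVI ") sits at byte offset 8, so the last two bytes of that marker fall outside a 10-byte read. The sketch below shows one possible way to check a signature at an offset. It is an illustration only, not part of the original code, and the class name RiffSniffer and method riffType are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class RiffSniffer {

    /**
     * Returns "wav" or "avi" for a RIFF header, or null if the bytes are not
     * a RIFF container of one of those two types. Needs at least 12 bytes.
     */
    public static String riffType(byte[] header) {
        if (header == null || header.length < 12) {
            return null;
        }
        // Bytes 0..3 name the container
        String container = new String(header, 0, 4, StandardCharsets.US_ASCII);
        if (!"RIFF".equals(container)) {
            return null;
        }
        // Bytes 4..7 hold the chunk size; bytes 8..11 hold the form type
        String form = new String(header, 8, 4, StandardCharsets.US_ASCII);
        switch (form) {
            case "WAVE":
                return "wav";
            case "AVI ":
                return "avi";
            default:
                return null;
        }
    }

    public static void main(String[] args) {
        // Synthetic header for demonstration: "RIFF", 4 size bytes, "WAVE", "fmt "
        byte[] wav = "RIFF\0\0\0\0WAVEfmt ".getBytes(StandardCharsets.US_ASCII);
        System.out.println(riffType(wav)); // prints wav
    }
}
```

To use something like this alongside getFileType, the read buffer would have to grow from 10 to at least 12 bytes.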

Source: http://blog.csdn.net/songylwq/article/details/6139753
