Determine file types based on file header data

Source: Internet
Author: User

Suppose you have a file whose extension is unknown, missing, or wrong. If it is an ordinary, non-empty file that would work normally once its extension is corrected, how do you determine what type of file it is?
Even when the extension is unknown or has been changed, the file header (the first few bytes of the file) usually still identifies the file type. Open the file in a hex editor such as UltraEdit (in hexadecimal mode) and look at the first bytes. The following are the file headers, in hexadecimal, of common file types:
JPEG (jpg), file header: ffd8ff
PNG (PNG), file header: 89504E47
GIF (GIF), file header: 47494638
TIFF (TIF), file header: 49492a00
Windows Bitmap (BMP), file header: 424D
CAD (DWG), file header: 41433130
Adobe Photoshop (PSD), file header: 38425053
Rich Text Format (RTF), file header: 7b5c727466
XML (XML), file header: 3c3f786d6c
HTML (HTML), file header: 68746d6c3e
Email [Thorough only] (EML), file header: 44656c69766572792d646174653a
Outlook Express (DBX), file header: cfad12fec5fd746f
Outlook (PST), file header: 2142444E
MS Word/Excel (xls or doc), file header: d0cf11e0
MS Access (MDB), file header: 5374616e64617264204a
WordPerfect (WPD), file header: FF575043
PostScript (eps or ps), file header: 252150532d41646f6265
Adobe Acrobat (pdf), file header: 255044462d312e
Quicken (QDF), file header: ac9ebd8f
Windows Password (PWL), file header: E3828596
Zip Archive (Zip), file header: 504b0304
rar Archive (RAR), file header: 52617221
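Wave (WAV), file header: 52494646 (ASCII "RIFF"), followed at offset 8 by 57415645 (ASCII "WAVE")
Avi (AVI), file header: 52494646 (ASCII "RIFF"), followed at offset 8 by 41564920 (ASCII "AVI ")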
Real Audio (RAM), file header: 2E7261FD
Real Media (RM), file header: 2e524d46
MPEG (MPG), file header: 000001BA
MPEG (MPG), file header: 000001b3
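Quicktime (mov), file header: 6d6f6f76 (ASCII "moov"; this is an atom type that follows a 4-byte size field, so it appears at offset 4 rather than offset 0)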
Windows Media (ASF), file header: 3026b2758e66cf11
MIDI (mid), file header: 4d546864
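
If no hex editor is at hand, the same inspection can be scripted. The following is a minimal sketch and not part of the original article: the class name `HeaderDump` and the method `readHeaderHex` are hypothetical, and it assumes Java 9 or later for `InputStream.readNBytes(byte[], int, int)`. It prints the first bytes of a file as lowercase hex so they can be compared with the list above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not from the original article.
final class HeaderDump {

    /**
     * Returns the first {@code count} bytes of the file as lowercase hex.
     * If the file is shorter than {@code count}, only the available bytes are returned.
     */
    static String readHeaderHex(Path file, int count) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[count];
            // readNBytes keeps reading until the buffer is full or the stream ends.
            int n = in.readNBytes(buf, 0, count);
            StringBuilder hex = new StringBuilder(n * 2);
            for (int i = 0; i < n; i++) {
                hex.append(String.format("%02x", buf[i] & 0xFF));
            }
            return hex.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HeaderDump <file>");
            return;
        }
        // Eight bytes are enough to cover most signatures in the list.
        System.out.println(readHeaderHex(Paths.get(args[0]), 8));
    }
}
```

Run it as `java HeaderDump <file>` and compare the printed prefix with the signatures in the list.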

Below is Java code, found online, that determines the file type from the file header:

package com;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class FileType {

    // Maps file header bytes (as a hex string) to a file extension.
    public static final Map<String, String> FILE_TYPE_MAP = new HashMap<String, String>();

    private FileType() {}

    static {
        getAllFileType(); // Initialize file type information
    }

    /**
     * Description: [getAllFileType, common file header information]
     * Most keys are the first 10 bytes of a sample file, so some of them
     * contain bytes that differ from file to file.
     */
    private static void getAllFileType() {
        FILE_TYPE_MAP.put("ffd8ffe000104a464946", "jpg"); // JPEG (jpg)
        FILE_TYPE_MAP.put("89504e470d0a1a0a0000", "png"); // PNG (png)
        FILE_TYPE_MAP.put("47494638396126026f01", "gif"); // GIF (gif)
        FILE_TYPE_MAP.put("49492a00227105008037", "tif"); // TIFF (tif)
        FILE_TYPE_MAP.put("424d228c010000000000", "bmp"); // 16-color bitmap (bmp)
        FILE_TYPE_MAP.put("424d8240090000000000", "bmp"); // 24-bit bitmap (bmp)
        FILE_TYPE_MAP.put("424d8e1b030000000000", "bmp"); // 256-color bitmap (bmp)
        FILE_TYPE_MAP.put("41433130313500000000", "dwg"); // CAD (dwg)
        FILE_TYPE_MAP.put("3c21444f435459504520", "html"); // HTML (html), "<!DOCTYPE "
        FILE_TYPE_MAP.put("3c21646f637479706520", "htm"); // HTM (htm), "<!doctype "
        FILE_TYPE_MAP.put("48544d4c207b0d0a0942", "css"); // CSS (text from one sample file)
        FILE_TYPE_MAP.put("696b2e71623d696b2e71", "js"); // JS (text from one sample file)
        FILE_TYPE_MAP.put("7b5c727466315c616e73", "rtf"); // Rich Text Format (rtf)
        FILE_TYPE_MAP.put("38425053000100000000", "psd"); // Photoshop (psd)
        FILE_TYPE_MAP.put("46726f6d3a203d3f6762", "eml"); // Email [Outlook Express 6] (eml)
        // Word, Excel, MSI, Visio (vsd), and WPS (wps text, et spreadsheet, dps presentation)
        // files all share this header. The original put the same key three times
        // ("doc", "vsd", "wps"), which only overwrote the earlier values, so one entry is kept.
        FILE_TYPE_MAP.put("d0cf11e0a1b11ae10000", "doc");
        FILE_TYPE_MAP.put("5374616e64617264204a", "mdb"); // MS Access (mdb)
        FILE_TYPE_MAP.put("252150532d41646f6265", "ps"); // PostScript (ps)
        FILE_TYPE_MAP.put("255044462d312e350d0a", "pdf"); // Adobe Acrobat (pdf)
        FILE_TYPE_MAP.put("2e524d46000000120001", "rmvb"); // rmvb and rm are the same
        FILE_TYPE_MAP.put("464c5601050000000900", "flv"); // flv and f4v are the same
        FILE_TYPE_MAP.put("00000020667479706d70", "mp4");
        FILE_TYPE_MAP.put("49443303000000002176", "mp3");
        FILE_TYPE_MAP.put("000001ba210001000180", "mpg"); // MPEG (mpg)
        FILE_TYPE_MAP.put("3026b2758e66cf11a6d9", "wmv"); // wmv and asf are the same
        FILE_TYPE_MAP.put("52494646e27807005741", "wav"); // Wave (wav)
        FILE_TYPE_MAP.put("52494646d07d60074156", "avi");
        FILE_TYPE_MAP.put("4d546864000000060001", "mid"); // MIDI (mid)
        FILE_TYPE_MAP.put("504b0304140000000800", "zip");
        FILE_TYPE_MAP.put("526172211a0700cf9073", "rar");
        FILE_TYPE_MAP.put("235468697320636f6e66", "ini");
        FILE_TYPE_MAP.put("504b03040a0000000000", "jar");
        FILE_TYPE_MAP.put("4d5a9000030000000400", "exe"); // executable file
        FILE_TYPE_MAP.put("3c25402070616765206c", "jsp"); // jsp file
        FILE_TYPE_MAP.put("4d616e69666573742d56", "mf"); // mf file
        FILE_TYPE_MAP.put("3c3f786d6c2076657273", "xml"); // xml file
        FILE_TYPE_MAP.put("494e5345525420494e54", "sql"); // sql file
        FILE_TYPE_MAP.put("7061636b616765207765", "java"); // java file
        FILE_TYPE_MAP.put("406563686f206f66660d", "bat"); // bat file
        FILE_TYPE_MAP.put("1f8b0800000000000000", "gz"); // gz file
        FILE_TYPE_MAP.put("6c6f67346a2e726f6f74", "properties"); // properties file
        FILE_TYPE_MAP.put("cafebabe0000002e0041", "class"); // class file
        FILE_TYPE_MAP.put("49545346030000006000", "chm"); // chm file
        FILE_TYPE_MAP.put("04000000010000001300", "mxp"); // mxp file
        FILE_TYPE_MAP.put("504b0304140006000800", "docx"); // docx file
        FILE_TYPE_MAP.put("6431303a637265617465", "torrent");

        // Shorter signatures from the list of common file headers
        FILE_TYPE_MAP.put("6d6f6f76", "mov"); // Quicktime (mov)
        FILE_TYPE_MAP.put("FF575043", "wpd"); // WordPerfect (wpd)
        FILE_TYPE_MAP.put("cfad12fec5fd746f", "dbx"); // Outlook Express (dbx)
        FILE_TYPE_MAP.put("2142444E", "pst"); // Outlook (pst)
        FILE_TYPE_MAP.put("ac9ebd8f", "qdf"); // Quicken (qdf)
        FILE_TYPE_MAP.put("E3828596", "pwl"); // Windows Password (pwl)
        FILE_TYPE_MAP.put("2E7261FD", "ram"); // Real Audio (ram)
    }

    /**
     * Convert the header bytes of the uploaded file to a lowercase hex string
     * @param src the header bytes
     * @return the hex string, or null if src is null or empty
     */
    public static String bytesToHexString(byte[] src) {
        if (src == null || src.length <= 0) {
            return null;
        }
        StringBuilder stringBuilder = new StringBuilder();
        for (int i = 0; i < src.length; i++) {
            int v = src[i] & 0xFF;
            String hv = Integer.toHexString(v);
            if (hv.length() < 2) {
                stringBuilder.append(0); // pad single-digit values to two characters
            }
            stringBuilder.append(hv);
        }
        return stringBuilder.toString();
    }

    /**
     * Determine the file type from the file header of the specified file
     * @param filePath path of the file to inspect
     * @return the matching extension, or null if the file cannot be read or nothing matches
     */
    public static String getFileType(String filePath) {
        String res = null;
        // try-with-resources closes the stream, which the original never did
        try (FileInputStream is = new FileInputStream(filePath)) {
            byte[] b = new byte[10];
            int n = is.read(b, 0, b.length);
            if (n <= 0) {
                return null; // empty file: there is no header to compare
            }
            String fileCode = bytesToHexString(Arrays.copyOf(b, n));

            System.out.println(fileCode);

            // Comparing prefixes in both directions also works when a key in the map
            // is shorter than the header that was read, but it scans the whole map,
            // so it is relatively slow.
            for (String key : FILE_TYPE_MAP.keySet()) {
                if (key.toLowerCase().startsWith(fileCode.toLowerCase())
                        || fileCode.toLowerCase().startsWith(key.toLowerCase())) {
                    res = FILE_TYPE_MAP.get(key);
                    break;
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return res;
    }

    public static void main(String[] args) throws Exception {

        String type = getFileType("c:/test/eee.WMV");
        System.out.println("eee.WMV: " + type);
        System.out.println();

        type = getFileType("c:/test/350996.wav");
        System.out.println("350996.wav: " + type);
        System.out.println();

    }
}
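
Many of the 10-byte keys in the class above include bytes that vary from file to file (for example, the image dimensions that follow "GIF89a"), so a lookup can miss perfectly valid files. The following is a minimal alternative sketch, not the original author's method: it compares only the short signatures from the list at the top of this article and picks the longest one that matches. The class name `MagicSniffer` and the method `sniff` are hypothetical. WAV and AVI are handled separately because both start with "RIFF" and differ only in the four bytes at offset 8.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class, not from the original article.
final class MagicSniffer {

    // Short signatures copied from the list of common file headers (lowercase hex).
    private static final Map<String, String> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("ffd8ff", "jpg");
        SIGNATURES.put("89504e47", "png");
        SIGNATURES.put("47494638", "gif");
        SIGNATURES.put("49492a00", "tif");
        SIGNATURES.put("424d", "bmp");
        SIGNATURES.put("255044462d312e", "pdf");
        SIGNATURES.put("504b0304", "zip");
        SIGNATURES.put("52617221", "rar");
        SIGNATURES.put("4d546864", "mid");
        SIGNATURES.put("d0cf11e0", "doc");
    }

    /** Returns an extension for the given leading bytes, or null if nothing matches. */
    static String sniff(byte[] head) {
        // RIFF containers: "RIFF" at offset 0, the form type at offset 8.
        if (head.length >= 12 && ascii(head, 0, 4).equals("RIFF")) {
            String form = ascii(head, 8, 4);
            if (form.equals("WAVE")) return "wav";
            if (form.equals("AVI ")) return "avi";
        }
        String hex = toHex(head);
        String best = null;
        int bestLength = 0;
        // Keep the longest matching signature so a more specific entry wins.
        for (Map.Entry<String, String> entry : SIGNATURES.entrySet()) {
            String signature = entry.getKey();
            if (hex.startsWith(signature) && signature.length() > bestLength) {
                best = entry.getValue();
                bestLength = signature.length();
            }
        }
        return best;
    }

    private static String ascii(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.US_ASCII);
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }
}
```

Pass the first 12 or more bytes of a file to `MagicSniffer.sniff`. Formats that share a container, such as zip, jar, and docx, still cannot be told apart by the header alone and need a look inside the archive.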
