Java: Image Text Recognition with the Baidu API (English Supported)

Last updated: 2018-08-09 Source: Internet

Uploaded by: User


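PS:

Based on Java 1.8.
Dependency management: Maven.
Before using the API, you need to obtain the API_KEY and SECRET_KEY for your project. Both are required when calling the API, because they are used to generate the access_token.
How to obtain them: apply for a "General Text Recognition" (通用文字識別) project in the Baidu Developer Center, and the keys will be issued to you.
With the preparation done, we can start on image recognition.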
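
The two keys above are secrets, so it can help to keep them out of the source code. The following is a minimal sketch, not part of the original article, that reads them from environment variables. The class name BaiduCredentials and the variable names BAIDU_API_KEY and BAIDU_SECRET_KEY are assumptions chosen for illustration.

```java
import java.util.Map;

/** Holds the API Key and Secret Key (hypothetical helper, not part of any Baidu SDK). */
final class BaiduCredentials {
    final String apiKey;
    final String secretKey;

    BaiduCredentials(String apiKey, String secretKey) {
        this.apiKey = apiKey;
        this.secretKey = secretKey;
    }

    /**
     * Reads both keys from the given environment map.
     * The variable names BAIDU_API_KEY and BAIDU_SECRET_KEY are assumptions.
     */
    static BaiduCredentials fromEnv(Map<String, String> env) {
        return new BaiduCredentials(require(env, "BAIDU_API_KEY"), require(env, "BAIDU_SECRET_KEY"));
    }

    /** Returns the trimmed value, or fails fast when the variable is missing or blank. */
    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        BaiduCredentials c = BaiduCredentials.fromEnv(System.getenv());
        // Print only the length, never the key itself
        System.out.println("API Key loaded, length " + c.apiKey.length());
    }
}
```

The loaded values could then be passed in wherever the article's code uses the "**" placeholders.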

1. Prepare the pom file

<!-- https://mvnrepository.com/artifact/com.alibaba/fastjson -->
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.46</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient -->
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.5</version>
</dependency>

2. Obtain the access_token

package com.wsk.netty.check;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

/**
 * Obtains the access token.
 *
 * @Author : WuShukai
 * @Date :2018/2/12 10:04
 */
public class AuthService {

    /**
     * Obtains the access token.
     * @return the access_token value. Example of the full response:
     * {
     * "access_token": "24.460da4889caad24cccdb1fea17221975.2592000.1491995545.282335-1234567",
     * "expires_in": 2592000
     * }
     */
    public static String getAuth() {
        // API Key from the Baidu console; replace with your own
        String clientId = "**";
        // Secret Key from the Baidu console; replace with your own
        String clientSecret = "**";
        return getAuth(clientId, clientSecret);
    }

    /**
     * Obtains the API access token.
     * The token is valid for a limited time and must be managed by the caller;
     * request a new one when it expires.
     * @param ak - API Key obtained from the Baidu Cloud console
     * @param sk - Secret Key obtained from the Baidu Cloud console
     * @return access_token, for example:
     * "24.460da4889caad24cccdb1fea17221975.2592000.1491995545.282335-1234567"
     */
    private static String getAuth(String ak, String sk) {
        // Token endpoint
        String authHost = "https://aip.baidubce.com/oauth/2.0/token?";
        String getAccessTokenUrl = authHost
                // 1. grant_type is a fixed parameter
                + "grant_type=client_credentials"
                // 2. API Key from the console
                + "&client_id=" + ak
                // 3. Secret Key from the console
                + "&client_secret=" + sk;
        try {
            URL realUrl = new URL(getAccessTokenUrl);
            // Open the connection to the URL
            HttpURLConnection connection = (HttpURLConnection) realUrl.openConnection();
            connection.setRequestMethod("GET");
            connection.connect();
            // Get all response header fields
            Map<String, List<String>> map = connection.getHeaderFields();
            // Print every response header field
            for (String key : map.keySet()) {
                System.err.println(key + "--->" + map.get(key));
            }
            // Read the response body; try-with-resources closes the reader
            StringBuilder result = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    result.append(line);
                }
            }
            // Print the raw response
            System.err.println("result:" + result);
            // Parse with fastjson, the JSON library declared in the pom file
            JSONObject jsonObject = JSON.parseObject(result.toString());
            return jsonObject.getString("access_token");
        } catch (Exception e) {
            System.err.println("Failed to obtain the token!");
            e.printStackTrace(System.err);
        }
        return null;
    }

    public static void main(String[] args) {
        getAuth();
    }
}
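
The Javadoc above notes that the token is valid for a limited time and has to be managed by the caller. One way to do that, sketched below under stated assumptions, is to cache the token and fetch a new one shortly before its expires_in lifetime runs out. The class TokenCache, its fetcher and clock parameters, and the 60-second safety margin are all hypothetical; they are not part of the original code or of Baidu's API.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Caches an access token and refreshes it shortly before it expires (hypothetical helper). */
final class TokenCache {

    /** A token together with its lifetime in seconds, as reported by expires_in. */
    static final class Token {
        final String value;
        final long expiresInSeconds;

        Token(String value, long expiresInSeconds) {
            this.value = value;
            this.expiresInSeconds = expiresInSeconds;
        }
    }

    // Assumed margin: refresh one minute before the reported expiry
    private static final long SAFETY_MARGIN_MILLIS = 60_000L;

    private final Supplier<Token> fetcher;   // e.g. a call to the token endpoint
    private final LongSupplier clockMillis;  // injectable clock, so the logic can be tested
    private String cached;
    private long refreshAtMillis;

    TokenCache(Supplier<Token> fetcher, LongSupplier clockMillis) {
        this.fetcher = fetcher;
        this.clockMillis = clockMillis;
    }

    /** Returns the cached token, fetching a new one if none is cached or it is about to expire. */
    synchronized String get() {
        long now = clockMillis.getAsLong();
        if (cached == null || now >= refreshAtMillis) {
            Token fresh = fetcher.get();
            cached = fresh.value;
            refreshAtMillis = now + fresh.expiresInSeconds * 1000L - SAFETY_MARGIN_MILLIS;
        }
        return cached;
    }

    public static void main(String[] args) {
        // Stand-in fetcher; a real one would request the token over HTTP
        TokenCache cache = new TokenCache(
                () -> new Token("demo-token", 2592000L), System::currentTimeMillis);
        System.out.println(cache.get());
        System.out.println(cache.get()); // served from the cache, no second fetch
    }
}
```

To use this with AuthService, the fetcher would need to return both access_token and expires_in from the token response rather than the token string alone.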

3. Write a utility class that Base64-encodes the image and then URL-encodes the result

package com.wsk.netty.check;

import java.io.IOException;
import java.net.URLEncoder;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

/**
 * Base64-encodes an image and then URL-encodes the result.
 * @Author : WuShukai
 * @Date :2018/2/12 10:43
 */
public class BaseImg64 {

    /**
     * Converts a local image to a Base64 string and URL-encodes it.
     * @param imgPath path of the local image
     * @return the image as Base64, then URL-encoded
     * @throws IOException if the image cannot be read
     */
    public static String getImageStrFromPath(String imgPath) throws IOException {
        // Read the whole image into a byte array
        byte[] data = Files.readAllBytes(Paths.get(imgPath));
        // Base64-encode the bytes; java.util.Base64 replaces the internal sun.misc.BASE64Encoder
        String base64 = Base64.getEncoder().encodeToString(data);
        // URL-encode so that '+', '/' and '=' survive the form-encoded request body
        return URLEncoder.encode(base64, "UTF-8");
    }
}
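
To see why the Base64 string is URL-encoded as well: Base64 output can contain '+', '/' and '=', and a '+' in a form-encoded body would be decoded as a space. The small sketch below uses only the JDK to show both steps on two sample bytes. The class name Base64UrlDemo and the sample bytes are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Base64;

/** Demonstrates Base64 encoding followed by URL encoding (illustrative only). */
final class Base64UrlDemo {

    /** Base64-encodes the bytes, then URL-encodes the result for use in a form body. */
    static String encodeForForm(byte[] data) {
        String base64 = Base64.getEncoder().encodeToString(data);
        try {
            return URLEncoder.encode(base64, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xfb, (byte) 0xff};
        System.out.println(Base64.getEncoder().encodeToString(sample)); // prints +/8=
        System.out.println(encodeForForm(sample));                      // prints %2B%2F8%3D
    }
}
```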

4. Write the method that calls the Baidu API and returns the recognition result

package com.wsk.netty.check;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

/**
 * Image text recognition.
 *
 * @Author : WuShukai
 * @Date :2018/2/12 10:25
 */
public class Check {

    // The token is fetched once, when this class is loaded
    private static final String POST_URL = "https://aip.baidubce.com/rest/2.0/ocr/v1/general_basic?access_token=" + AuthService.getAuth();

    /**
     * Recognizes the text in a local image.
     *
     * @param path path of the local image
     * @return recognition result in JSON format
     * @throws URISyntaxException if the request URI is invalid
     * @throws IOException        on I/O errors
     */
    public static String checkFile(String path) throws URISyntaxException, IOException {
        File file = new File(path);
        if (!file.exists()) {
            throw new FileNotFoundException("Image does not exist: " + path);
        }
        String image = BaseImg64.getImageStrFromPath(path);
        String param = "image=" + image;
        return post(param);
    }

    /**
     * Recognizes the text in an image given by URL.
     *
     * @param url URL of the image
     * @return recognition result in JSON format
     */
    public static String checkUrl(String url) throws IOException, URISyntaxException {
        // URL-encode the value so characters such as '%' and '&' survive the form body
        String param = "url=" + URLEncoder.encode(url, "UTF-8");
        return post(param);
    }

    /**
     * Sends either the url or the image parameter for text recognition.
     *
     * @param param form body, either url=... or image=...
     * @return recognition result, or null if the status code is not 200
     * @throws URISyntaxException if the request URI is invalid
     * @throws IOException        on I/O errors
     */
    private static String post(String param) throws URISyntaxException, IOException {
        // Build the POST request; try-with-resources closes the client and the response
        try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
            HttpPost post = new HttpPost();
            URI url = new URI(POST_URL);
            post.setURI(url);
            // The Content-Type must be application/x-www-form-urlencoded:
            // the body is one long string and cannot be sent in parts
            post.setHeader("Content-Type", "application/x-www-form-urlencoded");
            StringEntity entity = new StringEntity(param);
            post.setEntity(entity);
            try (CloseableHttpResponse response = httpClient.execute(post)) {
                System.out.println(response.toString());
                if (response.getStatusLine().getStatusCode() == 200) {
                    // Read the JSON string returned by the server
                    String str = EntityUtils.toString(response.getEntity(), "UTF-8");
                    System.out.println(str);
                    return str;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String path = "E:\\find.png";
        try {
            long now = System.currentTimeMillis();
            checkFile(path);
            checkUrl("https://gss3.bdstatic.com/-Po3dSag_xI4khGkpoWK1HF6hhy/baike/c0%3Dbaike80%2C5%2C5%2C80%2C26/sign=08c05c0e8444ebf8797c6c6db890bc4f/fc1f4134970a304e46bfc5f7d2c8a786c9175c19.jpg");
            System.out.println("Elapsed: " + (System.currentTimeMillis() - now) / 1000 + "s");
        } catch (URISyntaxException | IOException e) {
            e.printStackTrace();
        }
    }
}
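
The post method above sends a single name=value pair as the request body. If several parameters are ever needed, the same application/x-www-form-urlencoded body can be assembled from a map, with every name and value URL-encoded. This is a minimal sketch using only the JDK; the class name FormBody is an assumption, and the sample URL is a placeholder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds an application/x-www-form-urlencoded body from a map (hypothetical helper). */
final class FormBody {

    /** Joins the entries as name=value pairs separated by '&', URL-encoding both sides. */
    static String encode(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(urlEncode(entry.getKey())).append('=').append(urlEncode(entry.getValue()));
        }
        return body.toString();
    }

    private static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<>();
        params.put("url", "https://example.com/a.jpg?x=1&y=2"); // placeholder image URL
        System.out.println(encode(params));
        // prints url=https%3A%2F%2Fexample.com%2Fa.jpg%3Fx%3D1%26y%3D2
    }
}
```

Because this helper encodes every value itself, it should be given the raw Base64 string rather than a value that has already been URL-encoded; otherwise the value would be encoded twice.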

5. Recognition results (only local image recognition was tested)

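Chinese:

Result:

Conclusion

This test was run with Postman, because the JSON printed in the IDEA console is hard to read.
The request took about 1 s. Although the recognition rate is high, the result still falls short in places. For example, the fifth line of the result contains only “我是遜尼”, while a long run of text in the original image was not recognized.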
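
The conclusion above mentions that raw JSON in a console is hard to read. A minimal sketch of an indenter that makes such output easier to scan follows. It uses only the JDK and assumes its input is already valid JSON. The class name JsonIndenter is made up, and the sample string in main is a generic placeholder, not real API output.

```java
/** Re-indents a compact JSON string for console output (illustrative sketch, not a validator). */
final class JsonIndenter {

    static String indent(String json) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                // Copy string contents verbatim, tracking backslash escapes
                out.append(c);
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            switch (c) {
                case '"':
                    inString = true;
                    out.append(c);
                    break;
                case '{':
                case '[':
                    out.append(c);
                    newLine(out, ++depth);
                    break;
                case '}':
                case ']':
                    newLine(out, --depth);
                    out.append(c);
                    break;
                case ',':
                    out.append(c);
                    newLine(out, depth);
                    break;
                case ':':
                    out.append(": ");
                    break;
                default:
                    // Drop existing whitespace outside strings; keep everything else
                    if (!Character.isWhitespace(c)) {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    private static void newLine(StringBuilder out, int depth) {
        out.append('\n');
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    public static void main(String[] args) {
        System.out.println(indent("{\"a\":[1,2],\"b\":\"x,y\"}"));
    }
}
```

A JSON library's own pretty-printing is likely the more robust choice in real code; this hand-rolled version only shows the idea without extra dependencies.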

English:

Result:

Conclusion

For images that contain only English, the result is quite satisfactory: recognition is fast and accurate.

Mixed Chinese and English:

Result:

Conclusion

The result here is also quite satisfactory. Baidu's recognition deserves real praise.

Detailed documentation: http://ai.baidu.com/docs#/OCR-API/e1bd77f3


test4j image text recognition tutorial: http://blog.csdn.net/wsk1103/article/details/54173282


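This article was originally written in Chinese and published on aliyun.com. An English version is also available, for informational purposes only. This website makes no representation or warranty of any kind, express or implied, as to the accuracy, completeness, or reliability of the article or of any translation of it. If you have any concerns or complaints about this article, please email info-contact@alibabacloud.com with a detailed description. Staff will contact you within 5 working days, and the infringing content will be removed once it has been verified.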
