Natural Language Processing Task Data Sets

Source: Internet
Author: User
Tags: knowledge base

Keywords: NLP, data set

AI Challenger English-Chinese translation evaluation

Applicable field: Machine translation

Described as the largest English-Chinese bilingual data set for spoken English. It provides more than 10 million English-Chinese sentence pairs. All sentence pairs are manually checked, so the data set is assured in terms of size, relevance, and quality.

Training set: 10,000,000 sentences
Validation set (simultaneous interpretation): 934 sentences
Validation set (text translation): 8,000 sentences

https://challenger.ai/datasets/translation

UN Parallel Corpus (United Nations Parallel Corpus)

Applicable field: Machine translation

The United Nations Parallel Corpus consists of official United Nations records and other parliamentary documents that have entered the public domain. The corpus contains text written and manually translated between 1990 and 2014, and it includes sentence-level alignments.

The corpus aims to provide multilingual language resources to facilitate research and progress in natural language processing tasks such as machine translation. For ease of use, it also provides ready-made bilingual texts for specific language pairs and a six-language parallel subcorpus.

Description: https://conferences.unite.un.org/UNCorpus/zh#introduction

Download: https://conferences.unite.un.org/UNCorpus/zh/DownloadOverview

(not currently downloaded)
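Sentence-aligned parallel text is often consumed as two files in which line i of one language corresponds to line i of the other. The sketch below assumes that layout (it is not confirmed here for the UN download; check the corpus documentation) and pairs the lines up. The function name `read_bitext` and the sample sentences are made up for illustration.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Optional, Tuple


def read_bitext(
    src_lines: Iterable[str], tgt_lines: Iterable[str]
) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned inputs.

    Assumption: line i of the source corresponds to line i of the target.
    Pairs where either side is blank are skipped.
    """
    src: Optional[str]
    tgt: Optional[str]
    for src, tgt in zip_longest(src_lines, tgt_lines):
        if src is None or tgt is None:
            # One input ran out before the other, so the alignment is broken.
            raise ValueError("source and target have different numbers of lines")
        src, tgt = src.strip(), tgt.strip()
        if src and tgt:
            yield src, tgt


# Invented toy data standing in for two aligned files.
en = ["The meeting was called to order.\n", "\n", "The agenda was adopted.\n"]
zh = ["会议开始。\n", "\n", "议程获得通过。\n"]
pairs = list(read_bitext(en, zh))
for source, target in pairs:
    print(source, "|||", target)
```

With real files, the two arguments can be open file handles, since a file object iterates line by line.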

2nd International Chinese Word Segmentation Bakeoff

Applicable field: Chinese word segmentation

This directory contains the training, test, and gold-standard data used in the 2nd International Chinese Word Segmentation Bakeoff.

http://sighan.cs.uchicago.edu/bakeoff2005/
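Gold-standard data makes it possible to score a segmenter's output. A minimal sketch follows, assuming each sentence is stored as words separated by whitespace; it computes word-level precision, recall, and F1 by comparing character spans. This is a common way to compare segmentations, not necessarily the bakeoff's official scoring script, and the function names and the example sentence are illustrative.

```python
from typing import List, Set, Tuple


def word_spans(segmented: str) -> Set[Tuple[int, int]]:
    """Turn a whitespace-segmented sentence into (start, end) character spans."""
    spans = set()
    start = 0
    for word in segmented.split():
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def score(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Return word-level (precision, recall, F1) of pred against gold.

    A predicted word counts as correct only if both of its boundaries
    match a gold word exactly.
    """
    correct = n_gold = n_pred = 0
    for gold_sent, pred_sent in zip(gold, pred):
        gold_spans = word_spans(gold_sent)
        pred_spans = word_spans(pred_sent)
        correct += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example: the prediction merges two gold words into one.
gold = ["我 喜欢 自然 语言 处理"]
pred = ["我 喜欢 自然语言 处理"]
precision, recall, f1 = score(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this example three of the four predicted words match the five gold words, so precision is 3/4 and recall is 3/5.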

20 Newsgroups

Applicable field: Text classification

The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups.

http://qwone.com/~jason/20Newsgroups/
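One quick way to check how evenly the documents are spread is to count files per newsgroup. The sketch below assumes the archive has been unpacked into one subdirectory per newsgroup with one file per document; verify this against the actual download. The helper `count_documents` is hypothetical, and the demo builds a tiny placeholder tree rather than reading the real data.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_documents(root: Path) -> Counter:
    """Count files in each immediate subdirectory of root.

    Assumption: each subdirectory is one newsgroup and each file in it
    is one document.
    """
    counts: Counter = Counter()
    for group_dir in sorted(root.iterdir()):
        if group_dir.is_dir():
            counts[group_dir.name] = sum(
                1 for path in group_dir.iterdir() if path.is_file()
            )
    return counts


# Demo on a small placeholder tree (not the real data set).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for group, n_docs in [("sci.space", 3), ("rec.autos", 2)]:
        (root / group).mkdir()
        for i in range(n_docs):
            (root / group / str(i)).write_text("placeholder", encoding="utf-8")
    counts = count_documents(root)

for group, n in counts.most_common():
    print(group, n)
```

On the real data, pass the path of the unpacked directory instead of the temporary one.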

NLPCC 2017 News Headline Categorization

Applicable field: Text classification

http://tcci.ccf.org.cn/conference/2017/taskdata.php

Reuters-21578 Text Categorization Collection

Applicable field: Text classification

This is a collection of documents that appeared on the Reuters newswire in 1987. The documents were assembled and indexed with categories.

Http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html

Whole-web news data (SogouCA)

Applicable fields: Text classification, event detection and tracking, new word discovery, named entity recognition, automatic summarization

News data collected from a number of news sites between June and July 2012, covering 18 channels including domestic, international, sports, social, and entertainment news. URL and body text are provided.

http://www.sogou.com/labs/resource/ca.php

CMU World Wide Knowledge Base (Web->KB) Project

Applicable field: Knowledge extraction

To develop a probabilistic, symbolic knowledge base that mirrors the content of the World Wide Web. If successful, this would make text information on the web available in computer-understandable form, enabling much more sophisticated information retrieval and problem solving.

http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
