Using an N-gram language model in NLP: setting up the environment for completing English cloze tests

Source: Internet
Author: User
Tags: nltk


This article describes how to set up the environment for an NLP project that uses an N-gram language model, the XING_NLP project I forked on GitHub. It was originally written in the project's Readme.md. It was my first time using the GitHub wiki and I thought it was worth a try, but the formatting came out very messy and I was not satisfied with it, so I am recording it here on Blog Park (cnblogs) first, until my GitHub blog is up and running.

1. Operating system:

As a programmer, Linux is the natural choice; Ubuntu, CentOS and similar distributions all work. I use CentOS 7.3. With CentOS 6.5 I ran into all kinds of errors, so I recommend installing a recent Linux release, meaning one released after 2016.
The related issues are covered below.

2. Environment setup:

All of the following steps are best run as the root user.

2.1 Anaconda (Python 2.7 version)

Here is a download link on the Tsinghua University open-source mirror:
[Latest Anaconda Python 2.7 installer on the Tsinghua mirror](https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda2-4.4.0-Linux-x86_64.sh)
Installation:
bash Anaconda2-4.4.0-Linux-x86_64.sh
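If you let the installer add Anaconda to your PATH, it is worth confirming in a new shell that `python` now refers to the Anaconda 2.7 interpreter rather than the system Python. A minimal sketch, assuming that Anaconda builds embed the word "Anaconda" in `sys.version` (as the banner in 2.2 shows); the helper name `is_anaconda_py27` is hypothetical:

```python
import sys

def is_anaconda_py27(version_info, version_string):
    """Return True if the interpreter looks like Anaconda's Python 2.7 build.

    Assumption: Anaconda embeds "Anaconda" in sys.version, e.g.
    "2.7.13 |Anaconda 4.4.0 (64-bit)| (default, Dec 20 2016, 23:09:15)".
    """
    return tuple(version_info[:2]) == (2, 7) and "Anaconda" in version_string

if __name__ == "__main__":
    ok = is_anaconda_py27(sys.version_info, sys.version)
    # %-formatting keeps the script usable on both Python 2.7 and 3.
    print("Interpreter: %s" % sys.executable)
    print("Anaconda Python 2.7: %s" % ("yes" if ok else "no"))
```

Run it with the `python` on your PATH; if it prints "no", the shell is still picking up the system interpreter.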

2.2 Installing NLTK

pip install nltk
After the installation is complete, download NLTK's punkt package:
# ipython
Python 2.7.13 |Anaconda 4.4.0 (64-bit)| (default, Dec 20 2016, 23:09:15)
Type "copyright", "credits" or "license" for more information.

IPython 5.3.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import nltk

In [2]: nltk.download()
NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> d

Download which package (l=list; x=cancel)?
  Identifier> punkt
    Downloading package punkt to /root/nltk_data...
      Package punkt is already up-to-date!

---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Because I had already installed it, the downloader reports that it is up to date.
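On a headless server the same download can be done from a script instead of the interactive menu. A minimal sketch, assuming NLTK's `nltk.data.find` (which raises `LookupError` when a resource is missing) and `nltk.download(..., quiet=True)`; the wrapper name `ensure_punkt` is hypothetical:

```python
def ensure_punkt(download=True):
    """Make sure NLTK's punkt tokenizer models are installed.

    Returns True if punkt is available afterwards, False otherwise
    (including when NLTK itself is not installed).
    """
    try:
        import nltk
    except ImportError:
        return False
    try:
        # Raises LookupError if punkt is not in any nltk_data search path.
        nltk.data.find("tokenizers/punkt")
        return True
    except LookupError:
        if not download:
            return False
        # Same effect as choosing d) and typing "punkt" in the downloader menu.
        return bool(nltk.download("punkt", quiet=True))
```

Calling `ensure_punkt()` once at the start of a script replaces the interactive session; `ensure_punkt(download=False)` only checks.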

2.3 Installing KenLM

This is the focus of this article, and by far the most troublesome part.
The link below is KenLM's official page on installing the dependencies. Read it and follow it; if anything is unclear, keep reading here:
[KenLM official instructions for installing dependencies](https://kheafield.com/code/kenlm/dependencies/)
Note: be sure to check the GCC version (gcc -v) before installing KenLM. If the compiler is too old to support C++11, the build fails with the following error:
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I. -I/root/anaconda2/include/python2.7 -c util/float_to_string.cc -o build/temp.linux-x86_64-2.7/util/float_to_string.o -O3 -DNDEBUG -DKENLM_MAX_ORDER=6 -std=c++11 -DHAVE_ZLIB
cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
cc1plus: error: unrecognized command line option "-std=c++11"
error: command 'gcc' failed with exit status 1

----------------------------------------
Command "/root/anaconda2/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-ndhckc-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-gp5mep-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-ndhckc-build/
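The key line is `unrecognized command line option "-std=c++11"`. As far as I know, GCC accepts that spelling from version 4.7 onward, while older releases (CentOS 6 ships GCC 4.4) only know `-std=c++0x`, which fits the CentOS 6.5 trouble mentioned in section 1. A minimal sketch that checks the installed compiler before building; the 4.7 threshold is my assumption based on the GCC release notes, and the helper names are hypothetical:

```python
import re
import subprocess

# Assumption: GCC recognizes -std=c++11 starting with 4.7 (earlier: -std=c++0x).
MIN_GCC = (4, 7)

def parse_gcc_version(text):
    """Extract (major, minor) from output such as '4.8.5' or '7'."""
    m = re.match(r"\s*(\d+)(?:\.(\d+))?", text)
    if m is None:
        raise ValueError("unrecognized gcc version: %r" % text)
    return int(m.group(1)), int(m.group(2) or 0)

def supports_std_cxx11(version_text):
    """True if this GCC version should accept -std=c++11."""
    return parse_gcc_version(version_text) >= MIN_GCC

def installed_gcc_version():
    """Ask the gcc on PATH for its version, or return None if gcc is missing."""
    try:
        out = subprocess.check_output(["gcc", "-dumpversion"])
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.decode("ascii", "replace").strip()

if __name__ == "__main__":
    version = installed_gcc_version()
    if version is None:
        print("gcc not found")
    elif supports_std_cxx11(version):
        print("gcc %s: -std=c++11 should be accepted" % version)
    else:
        print("gcc %s is too old for -std=c++11; upgrade GCC first" % version)
```

`gcc -dumpversion` is used instead of `gcc -v` because it prints only the version number (just the major number on newer GCC releases, which the parser handles).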

2.3.1 Installing the dependencies:

CMake, xz, zlib, bzip2, Boost (Boost must be installed last, because it depends on the preceding packages)
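Before starting, it can save time to see which of these dependencies are already present. A minimal sketch, assuming the libraries can be located with `ctypes.util.find_library` (which looks for shared libraries such as `libz.so`, so a static-only install may be reported as missing) and that xz provides `liblzma`; the `DEPENDENCIES` mapping, including the choice of `boost_program_options` as the Boost probe, is my own and not from the KenLM docs:

```python
import ctypes.util
import os

# Hypothetical mapping, in installation order (Boost last):
# ("bin", name) = executable on PATH, ("lib", name) = shared library libNAME.so
DEPENDENCIES = [
    ("cmake", ("bin", "cmake")),
    ("xz", ("lib", "lzma")),
    ("zlib", ("lib", "z")),
    ("bzip2", ("lib", "bz2")),
    ("boost", ("lib", "boost_program_options")),
]

def which(program):
    """Minimal PATH lookup that works on both Python 2.7 and 3."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        path = os.path.join(directory, program)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None

def check_dependencies(deps=DEPENDENCIES):
    """Return a list of (name, found) pairs, keeping the installation order."""
    results = []
    for name, (kind, target) in deps:
        if kind == "bin":
            found = which(target) is not None
        else:
            found = ctypes.util.find_library(target) is not None
        results.append((name, found))
    return results

if __name__ == "__main__":
    for name, found in check_dependencies():
        print("%-6s %s" % (name, "found" if found else "MISSING"))
```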
(1) CMake: I am on CentOS and already had it, installed with yum; compiling it from source (download the .tar.gz or .tar.bz2 archive) also works.
yum install cmake
(2) xz: download the latest release from the official website; the address below was the latest at the time of writing.
wget http://tukaani.org/xz/xz-5.2.2.tar.gz
tar xzvf xz-5.2.2.tar.gz
cd xz-5.2.2
./configure
make
make install
(3) zlib:
wget http://zlib.net/zlib-1.2.8.tar.gz
tar xzf zlib-1.2.8.tar.gz
cd zlib-1.2.8
./configure
make
make install
(4) bzip2:
wget http://www.bzip.org/1.0.6/bzip2-1.0.6.tar.gz
tar xzvf bzip2-1.0.6.tar.gz
cd bzip2-1.0.6/
make
make install
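Steps (2) to (4) follow the same download, extract, (configure,) make, make install pattern, so they can be scripted. A minimal sketch that reuses the versions and URLs above; it only prints the plan unless `dry_run=False` is passed, and the `PACKAGES` table and function names are hypothetical:

```python
import subprocess

# Versions and URLs copied from steps (2) to (4); the table layout is hypothetical.
# Fields: source directory name, tarball URL, whether the package ships ./configure.
PACKAGES = [
    ("xz-5.2.2", "http://tukaani.org/xz/xz-5.2.2.tar.gz", True),
    ("zlib-1.2.8", "http://zlib.net/zlib-1.2.8.tar.gz", True),
    ("bzip2-1.0.6", "http://www.bzip.org/1.0.6/bzip2-1.0.6.tar.gz", False),
]

def build_plan(packages=PACKAGES):
    """Return (working_dir, argv) pairs for every step, in order.

    working_dir is None for steps run in the download directory and the
    unpacked source directory for the configure/make steps.
    """
    plan = []
    for name, url, has_configure in packages:
        plan.append((None, ["wget", url]))
        plan.append((None, ["tar", "xzf", name + ".tar.gz"]))
        if has_configure:
            plan.append((name, ["./configure"]))  # bzip2 has no configure script
        plan.append((name, ["make"]))
        plan.append((name, ["make", "install"]))
    return plan

def run_plan(plan, dry_run=True):
    """Print each step; execute it too when dry_run is False."""
    for cwd, argv in plan:
        print("[%s] %s" % (cwd or ".", " ".join(argv)))
        if not dry_run:
            # check_call raises on the first failing step (make install needs root).
            subprocess.check_call(argv, cwd=cwd)

if __name__ == "__main__":
    run_plan(build_plan())  # pass dry_run=False to actually build
```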
(5) Boost:
wget https://dl.bintray.com/boostorg/release/1.64.0/source/boost_1_64_0.tar.bz2
tar xjf boost_1_64_0.tar.bz2
cd boost_1_64_0
./bootstrap.sh
./b2 install
Note: when I compiled and installed Boost from source this way, I ran into the following problem and did not know how to fix it. I am posting it here so we can try to solve it together:
...failed gcc.link.dll bin.v2/libs/iostreams/build/gcc-4.8.5/release/threading-multi/libboost_iostreams.so.1.64.0...

...found 14723 targets...
...updating 3 targets...
gcc.link.dll bin.v2/libs/iostreams/build/gcc-4.8.5/release/threading-multi/libboost_iostreams.so.1.64.0
/usr/bin/ld: /usr/local/lib/libbz2.a(bzlib.o): relocation R_X86_64_32S against `BZ2_crc32Table' can not be used when making a shared object; recompile with -fPIC
/usr/local/lib/libbz2.a: error adding symbols: Bad value
collect2: error: ld returned 1 exit status

"g++" -o "bin.v2/libs/iostreams/build/gcc-4.8.5/release/threading-multi/libboost_iostreams.so.1.64.0" -Wl,-h -Wl,libboost_iostreams.so.1.64.0 -shared -Wl,--start-group "bin.v2/libs/iostreams/build/gcc-4.8.5/release/threading-multi/file_descriptor.o" "bin.v2/libs/iostreams/build/gcc-4.8.5/release/threading-multi/mapped_file.o" "bin.v2/libs/iostreams/build/gcc-4.8.5/release/threading-multi/bzip2.o" "bin.v2/libs/iostreams/build/gcc-4.8.5/release/threading-multi/gzip.o" "bin.v2/libs/iostreams/build/gcc-4.8.5/release/threading-multi/zlib.o" -Wl,-Bstatic -Wl,-Bdynamic -lz -lbz2 -lrt -Wl,--end-group -pthread -m64

...failed gcc.link.dll bin.v2/libs/iostreams/build/gcc-4.8.5/release/threading-multi/libboost_iostreams.so.1.64.0...
...skipped <pstage/lib>libboost_iostreams.so.1.64.0 for lack of <pbin.v2/libs/iostreams/build/gcc-4.8.5/release/threading-multi>libboost_iostreams.so.1.64.0...
...skipped <pstage/lib>libboost_iostreams.so for lack of <pstage/lib>libboost_iostreams.so.1.64.0...
...failed updating 1 target...
...skipped 2 targets...
So how did I finally get it working? I took the simplest route and installed Boost with yum:
yum install -y boost boost-devel boost-doc
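For reference, the linker message in the Boost log points at the cause: `/usr/local/lib/libbz2.a` is the static library installed in step (4), and it was compiled without `-fPIC`, so it cannot be linked into the shared library `libboost_iostreams.so`. Installing Boost with yum sidesteps this. A commonly suggested alternative, not tried in this article, is to rebuild bzip2 with position-independent code (`make clean`, then `make CFLAGS="-fPIC -O2 -g"` and `make install`) before building Boost again. The sketch below scans a build log for this kind of error and reports which archive needs rebuilding; the regular expression is an assumption based on the GNU ld message shown above, and the function name is hypothetical:

```python
import re

# Assumed shape of the GNU ld message, as in the Boost log above:
#   /usr/local/lib/libbz2.a(bzlib.o): relocation R_X86_64_32S against ... recompile with -fPIC
FPIC_ERROR = re.compile(
    r"(?P<archive>\S+\.a)\((?P<member>[^)]+)\): relocation (?P<reloc>\S+) "
    r"against .*recompile with -fPIC"
)

def find_non_pic_archives(log_text):
    """Return (archive, member, relocation) tuples for each -fPIC link error."""
    hits = []
    for line in log_text.splitlines():
        m = FPIC_ERROR.search(line)
        if m:
            hits.append((m.group("archive"), m.group("member"), m.group("reloc")))
    return hits
```

Each archive it reports is a static library that has to be rebuilt with `-fPIC` (or removed so the linker falls back to a shared version).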

2.3.2 Installing KenLM:

With the dependencies above installed successfully, this step is straightforward:
wget http://kheafield.com/code/kenlm.tar.gz
tar xzf kenlm.tar.gz
cd kenlm
mkdir build
cd build
cmake ..
make
Done!
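With KenLM built, the environment is ready for the actual task: filling a cloze blank by picking the candidate that makes the sentence most probable under an N-gram model. The sketch below illustrates the idea with a tiny add-one-smoothed bigram model in pure Python so it runs anywhere; the toy corpus, the `___` blank marker and all function names are hypothetical. With a real KenLM model, only `score` would change: the kenlm Python module (apparently the package whose pip build fails in the log in 2.3) provides `kenlm.Model(path).score(sentence)`, which returns a log10 probability.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over lowercased, whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def score(model, sentence):
    """Log10 probability of a sentence under an add-one smoothed bigram model."""
    unigrams, bigrams = model
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, word)] + 1.0) / (unigrams[prev] + vocab_size)
        total += math.log10(p)
    return total

def fill_cloze(model, template, candidates, blank="___"):
    """Return the candidate whose filled-in sentence scores highest."""
    return max(candidates, key=lambda c: score(model, template.replace(blank, c)))

corpus = [
    "she drinks a cup of tea every morning",
    "he drinks a cup of coffee",
    "they read a book every evening",
]
model = train_bigram(corpus)
print(fill_cloze(model, "she drinks a ___ of tea", ["cup", "book", "car"]))  # cup
```

Because every candidate shares the rest of the sentence, only the n-grams touching the blank change the ranking; "a cup of" has been seen twice, so "cup" wins.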
