Reference articles
http://lxsay.com/archives/269
Installing PyLucene 6.2, 6.4, or 6.5 on Windows
Posted on 2017-02-16 by Chiccs
Update 2017.07: Added support for Python 3
This is probably the first article on the Internet that details how to install a recent version of PyLucene on Windows. A Windows installation package used to be provided for PyLucene 4, but none is available for the newer versions.
The articles on the web claiming that PyLucene can only be built as a 32-bit version on Windows are baseless.
This article may be reprinted and modified, but please credit the original source: lxsay.com
This article also applies to 32-bit Windows systems, as long as the software installed below is replaced with its 32-bit version.
Prerequisites:
- Install a Python environment (Python 2.7.13 is used here as an example; Python 3 also works). Anaconda is recommended because it bundles libraries such as Gensim, NumPy, and SciPy that are difficult to install on Windows.
- Install JDK 1.8 (64-bit) and configure its environment variables (recommended guide: http://www.cnblogs.com/shinge/p/5500002.html). Also configure the environment variables for the JRE: the %JAVA_HOME%\jre\bin\server directory must be added to the PATH environment variable as well.
- Install VC for Python 2.7 (www.microsoft.com/en-hk/download/details.aspx?id=44266). For Python 3, install Visual C++ Build Tools instead (http://landinghub.visualstudio.com/visual-cpp-build-tools).
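The JDK prerequisite can be checked with a small script before any compilation starts. The following is a minimal sketch, not part of PyLucene or JCC: the function names check_prerequisites and python_bitness are made up for illustration, and the script only assumes the jre\bin\server layout described above.

```python
import os
import struct


def python_bitness():
    """Return 32 or 64 depending on the running interpreter."""
    return struct.calcsize("P") * 8  # pointer size in bits


def check_prerequisites(env):
    """Return a list of problems found in env; an empty list means OK.

    env is a mapping of environment variables, for example os.environ.
    """
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return ["JAVA_HOME is not set"]

    # Directory that should be on PATH so that jvm.dll can be located.
    jvm_dir = os.path.join(java_home, "jre", "bin", "server")
    entries = [
        entry.strip().rstrip("\\/").lower()
        for entry in env.get("PATH", "").split(os.pathsep)
        if entry.strip()
    ]
    problems = []
    if jvm_dir.rstrip("\\/").lower() not in entries:
        problems.append("%s is not on PATH" % jvm_dir)
    return problems


if __name__ == "__main__":
    print("Python is %d-bit" % python_bitness())
    for problem in check_prerequisites(os.environ):
        print("problem:", problem)
```

The bitness is printed because Python, the JDK, and the compiler should all be the same architecture (64-bit in this article).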
Step 1. Install Apache Ant and configure its environment variable: add the bin directory under the Ant installation directory to the PATH environment variable. Note that this is not the same as the path to the ant executable that is configured later in the Makefile.
Step 2. Install Cygwin (used to run Linux commands on Windows). During installation, change the "Devel" item from "Default" to "Install" in the package selection window. Then add the bin directory under the Cygwin64 installation directory to the PATH environment variable, and restart the computer for the change to take effect.
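After the restart, it may be worth confirming that the commands from Steps 1 and 2 are actually reachable. This is a small sketch assuming Python 3 (shutil.which is not available in Python 2.7); missing_tools is a hypothetical helper name, and the tool list is only the two commands the later steps invoke.

```python
import shutil


def missing_tools(names):
    """Return the command names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # "ant" comes from Step 1, "make" from the Cygwin Devel packages in Step 2.
    for tool in missing_tools(["ant", "make"]):
        print("not on PATH:", tool)
```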
Step 3. Download and unzip the PyLucene source code (the extracted folder is usually named pylucene-6.4.1 or similar).
Step 4. Go into the jcc folder inside the PyLucene folder, open setup.py, and in the CFLAGS setting append the "/bigobj" flag to the 'win32' entry, as shown below:
CFLAGS = {
    'darwin': ['-fno-strict-aliasing', '-Wno-write-strings',
               '-mmacosx-version-min=10.5'],
    'ipod': ['-Wno-write-strings'],
    'linux2': ['-fno-strict-aliasing', '-Wno-write-strings'],
    'sunos5': ['-features=iddollar',
               '-erroff=badargtypel2w,wbadinitl,wvarhidemem'],
    'win32': ["/EHsc", "/D_CRT_SECURE_NO_WARNINGS", "/bigobj"],  # MSVC 9 (2008)
    'mingw32': ['-fno-strict-aliasing', '-Wno-write-strings'],
    'freebsd7': ['-fno-strict-aliasing', '-Wno-write-strings'],
}
Without the "/bigobj" flag, compiling PyLucene fails with a message that the file is too large for the compiler to handle.
Update: PyLucene 6.5 has resolved this issue, so the flag no longer needs to be added.
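The rule in Step 4 and its update can be written down as a small function. This is an illustrative sketch only: with_bigobj is a made-up name, it works on a copy of a CFLAGS-style dictionary rather than editing setup.py, and the (6, 5) cutoff is taken from the update note above.

```python
def with_bigobj(cflags, pylucene_version):
    """Return a copy of a CFLAGS-style dict, adding "/bigobj" to 'win32'.

    pylucene_version is a tuple such as (6, 4, 1). The flag is only added
    for versions before 6.5, which no longer need it.
    """
    patched = {platform: list(flags) for platform, flags in cflags.items()}
    if pylucene_version < (6, 5):
        win32_flags = patched.setdefault("win32", [])
        if "/bigobj" not in win32_flags:  # keep the edit idempotent
            win32_flags.append("/bigobj")
    return patched


if __name__ == "__main__":
    cflags = {"win32": ["/EHsc", "/D_CRT_SECURE_NO_WARNINGS"]}
    print(with_bigobj(cflags, (6, 4, 1))["win32"])
    print(with_bigobj(cflags, (6, 5, 0))["win32"])
```

Returning a copy keeps the original dictionary unchanged, so the function can be called repeatedly without stacking duplicate flags.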
Step 5. Open a command line in the jcc folder and run python setup.py build. If there are no errors, run python setup.py install to install JCC, then restart your computer.
Step 6. Go to the pylucene-6.2.0 folder (the name depends on the version you are installing, for example pylucene-6.4.1) and edit the Makefile.
First comment out the default configuration by adding a '#' at the start of each line:
# Mac OS X 10.12 (64-bit Python 2.7, Java 1.8)
PREFIX_PYTHON=/Users/vajda/apache/pylucene/_install
ANT=/Users/vajda/tmp/apache-ant-1.9.3/bin/ant
PYTHON=$(PREFIX_PYTHON)/bin/python
JCC=$(PYTHON) -m jcc.__main__ --shared --arch x86_64
NUM_FILES=8
Then insert
PREFIX_PYTHON=d:/progra~2/anaconda2
ANT=d:/apache-ant-1.9.7/bin/ant
JAVA_HOME=c:/progra~1/java/jdk1.8.0_101
PYTHON=$(PREFIX_PYTHON)/python.exe
#JCC=$(PYTHON) -m jcc --shared --find-jvm-dll
JCC=$(PYTHON) -m jcc --shared
NUM_FILES=8
Update: in PyLucene 6.5, JCC no longer accepts the --shared parameter, so the line should be changed to JCC=$(PYTHON) -m jcc; otherwise the installation fails with an error.
PREFIX_PYTHON is the Python installation directory. If you created a new Anaconda environment and are compiling inside it, set PREFIX_PYTHON to the root of that environment (typically anaconda2/envs/<environment name>).
ANT is the path to the ant executable itself (note: the executable, not the installation directory).
NUM_FILES specifies how many pieces the intermediate file is split into, which lets the compiler cope with large files, but changing it to other values may cause errors. The better approach is therefore to pass the "/bigobj" flag to VC as described in Step 4.
Also note that the Makefile does not accept paths containing spaces. If a path contains spaces, use its short form (also called the DOS path, as in the configuration above), which can be looked up from the command line.
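Paths with spaces are easy to overlook, so the Makefile can be scanned for them before running make. The sketch below is an assumption-laden helper, not part of PyLucene: variables_with_spaces is a hypothetical name, and it only looks at the three path variables used in the configuration above.

```python
import re

# Matches simple "NAME=value" assignments as used in the configuration above.
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*?)\s*$")


def variables_with_spaces(makefile_text,
                          names=("PREFIX_PYTHON", "ANT", "JAVA_HOME")):
    """Return {name: value} for path variables whose value contains a space.

    Commented-out lines (starting with '#') are ignored.
    """
    offenders = {}
    for line in makefile_text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = ASSIGNMENT.match(line)
        if match and match.group(1) in names and " " in match.group(2):
            offenders[match.group(1)] = match.group(2)
    return offenders


if __name__ == "__main__":
    sample = (
        "PREFIX_PYTHON=D:/Program Files (x86)/Anaconda2\n"
        "ANT=d:/apache-ant-1.9.7/bin/ant\n"
    )
    for name, value in variables_with_spaces(sample).items():
        print("%s contains a space: %s" % (name, value))
```

On Windows, the short form of a directory name can be listed with dir /x in the parent directory.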
Step 7. Run make to compile (note: this step does not work unless Cygwin64 was installed and configured earlier).
Step 8. After compiling, run make test. Because of some bugs in the project, a few tests that check PyLucene exceptions fail, but this is not a serious problem.
Step 9. Run make install to install.
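Once make install has finished, a quick smoke test is to import the lucene module and start the JVM. The following is a minimal sketch: pylucene_version is a made-up wrapper name, while lucene.initVM, lucene.getVMEnv, and lucene.VERSION are the PyLucene calls it relies on. It returns None when the module cannot be imported, which is also what happens in the "cannot find DLL" situation described under Problems.

```python
def pylucene_version():
    """Return the Lucene version string, or None if PyLucene cannot be imported."""
    try:
        import lucene
    except ImportError:
        return None
    # The JVM has to be started once per process before Lucene classes are used.
    if lucene.getVMEnv() is None:
        lucene.initVM(vmargs=["-Djava.awt.headless=true"])
    return lucene.VERSION


if __name__ == "__main__":
    version = pylucene_version()
    if version is None:
        print("PyLucene could not be imported")
    else:
        print("Lucene " + version)
```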
Problems:
1. When compiling and installing JCC, MSVC is not used and Python always compiles with GCC instead. To compile with Visual C++, edit distutils.cfg under Lib\distutils in the Python installation directory so that it contains the following:
[build]
compiler=msvc
[build_ext]
compiler=msvc
This essentially modifies the setuptools configuration file.
The second option is to pass the compiler on the command line when running setup.py, that is, python setup.py build --compiler=msvc. However, this approach fails later when compiling and installing PyLucene, because make has no way to tell Python which C++ compiler to use.
2. make test reports error 123. On Linux, this happens because Lucene's test code uses an exception from an older version of Python; changing WindowsError to OSError in test_PyLucene.py fixes it. The cause is the same on Windows, but more files are involved and changing them all is tedious, so I simply ignored the failures.
3. make reports that from jcc import _jcc cannot find a DLL. The reason is that jvm.dll cannot be found; add the JRE directory containing jvm.dll (such as jre\bin\server) to the PATH environment variable. The same error can also occur if the JDK system variables were not set before JCC was compiled and installed. Restarting the computer after setting the environment variables is recommended. If the error persists, edit setup.py in the jcc directory and change 'JAVAHOME' to the actual JDK directory path.
4. Indexes built with PyLucene 6.2 and 6.4 are compatible with each other, but an index built with PyLucene 4 cannot be used with version 6.2 or later (it must first be processed with the IndexUpgrader tool). The same goes for program code, because many functions and definitions were removed in the newer versions.
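The distutils.cfg change from problem 1 can also be made from a script. This is a sketch under a few assumptions: it needs Python 3 (the module is called ConfigParser in Python 2), write_compiler_cfg is a hypothetical helper name, and the caller supplies the path to distutils.cfg.

```python
import configparser


def write_compiler_cfg(cfg_path, compiler="msvc"):
    """Set compiler=<compiler> in the [build] and [build_ext] sections."""
    config = configparser.ConfigParser()
    config.read(cfg_path)  # keeps existing settings; a missing file is ignored
    for section in ("build", "build_ext"):
        if not config.has_section(section):
            config.add_section(section)
        config.set(section, "compiler", compiler)
    with open(cfg_path, "w") as handle:
        config.write(handle)
```

It would be called with the full path of the file, for example the distutils.cfg under Lib\distutils of the Python installation being used.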
The problem I ran into was this: in
C:\Python27\Lib\site-packages\JCC-3.0-py2.7-win-amd64.egg\jcc\windows.py
the add_jvm_dll_directory_to_path function
could not find jvm.dll.
Workaround:
Hard-code dll_path:
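# os and get_jvm_dll_directory are already available inside windows.py.
def add_jvm_dll_directory_to_path():
    path = os.environ['Path'].split(os.pathsep)
    dll_path = get_jvm_dll_directory()
    # Hard-coded override. PATH entries are directories, so this is the
    # folder that contains jvm.dll rather than the file itself. The raw
    # string keeps '\b' in '\bin' from being read as a backspace.
    dll_path = r'D:\programs\java\jdk1.8.0\jre\bin\server'
    if dll_path is not None:
        path.append(dll_path)
        os.environ['Path'] = os.pathsep.join(path)
        return True
    raise ValueError("jvm.dll could not be found")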
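Instead of hard-coding the directory, it could be derived from the JDK location. The sketch below is not JCC's implementation: find_jvm_dll_directory is a hypothetical helper, and the second candidate directory is an assumption about JRE-only installations.

```python
import os


def find_jvm_dll_directory(java_home):
    """Return the first candidate directory under java_home holding jvm.dll."""
    candidates = [
        # Layout used by the JDK 1.8 installation in this article.
        os.path.join(java_home, "jre", "bin", "server"),
        # Assumption: layout when java_home points at a JRE directly.
        os.path.join(java_home, "bin", "server"),
    ]
    for directory in candidates:
        if os.path.isfile(os.path.join(directory, "jvm.dll")):
            return directory
    return None


if __name__ == "__main__":
    java_home = os.environ.get("JAVA_HOME", "")
    print(find_jvm_dll_directory(java_home))
```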
Another problem I ran into:
java.lang.ExceptionInInitializerError
Caused by: java.lang.RuntimeException: WARNING: Can not find lexical dictionary directory! This would cause unpredictable exceptions in your application! Refer to the manual to download the dictionaries.
    at org.apache.lucene.analysis.cn.smart.AnalyzerProfile.init(AnalyzerProfile.java:73)
Workaround reference:
http://blog.csdn.net/dsbatigol/article/details/14448151
The cause is that smartcn was added without the corresponding configuration being in place.
Open the Makefile in the PyLucene root directory
and find this line:
#JARS+=$(SMARTCN_JAR) # smart chinese analyzer
Remove the leading # to uncomment it.
Then find the existing --exclude options
and add this line next to them:
--exclude org.apache.lucene.analysis.cn.smart.AnalyzerProfile
Run make and make install again to overwrite the existing installation, and you are done!
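The first Makefile edit, uncommenting the smartcn line, can be scripted. This is a sketch only: uncomment_smartcn is a made-up name, it works on the Makefile text passed in, and it deliberately does not attempt to add the --exclude option, which still has to be placed by hand.

```python
import re

# Matches the commented-out smartcn line shown above.
SMARTCN_LINE = re.compile(r"^#[ \t]*(JARS\+=\$\(SMARTCN_JAR\).*)$", re.MULTILINE)


def uncomment_smartcn(makefile_text):
    """Remove the leading '#' from the JARS+=$(SMARTCN_JAR) line, if present."""
    return SMARTCN_LINE.sub(r"\1", makefile_text)


if __name__ == "__main__":
    line = "#JARS+=$(SMARTCN_JAR) # smart chinese analyzer\n"
    print(uncomment_smartcn(line), end="")
```

Running it on an already uncommented Makefile leaves the text unchanged.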
In summary: when you run into a problem, start by analyzing the error message.