I have recently been reading Programming Collective Intelligence in my spare time. Here I record some problems I ran into while working through the book's accompanying code:
(Note: the page numbers below refer to the English edition, not the photocopied reprint.)
Chapter 1 Introduction to Collective Intelligence:
Nothing much to say here; just skim it.
Chapter 2 Making Recommendations:
1. The source code in the book is written for Python 2.x, whereas in 3.x print is a function, so its arguments need parentheses;
2. The book was published a long time ago, and the dataset on p42 is built from the RSS feeds provided by del.icio.us,
whose interfaces have changed in many places since. I skipped that experiment and did the MovieLens one instead. The data used in the book is available at:
http://grouplens.org/datasets/movielens/;
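As a rough sketch of what loading the MovieLens data can look like under Python 3 (with print() called as a function): this assumes the MovieLens 100K layout, where each u.item line is pipe-separated and starts with the movie id and title, and each u.data line is tab-separated as user id, movie id, rating, timestamp. The function name load_movielens and the sample lines are made up for illustration and are not taken from the book.

```python
def load_movielens(item_lines, data_lines):
    """Build {user_id: {movie_title: rating}} from MovieLens-100K-style lines."""
    # Assumed u.item layout: movie_id|title|... (pipe-separated)
    movies = {}
    for line in item_lines:
        movie_id, title = line.rstrip("\n").split("|")[:2]
        movies[movie_id] = title

    # Assumed u.data layout: user_id<TAB>movie_id<TAB>rating<TAB>timestamp
    prefs = {}
    for line in data_lines:
        user, movie_id, rating, _timestamp = line.rstrip("\n").split("\t")
        prefs.setdefault(user, {})[movies[movie_id]] = float(rating)
    return prefs


# Illustrative sample lines only (not real dataset rows).
sample_items = [
    "1|Toy Story (1995)|01-Jan-1995||",
    "2|GoldenEye (1995)|01-Jan-1995||",
]
sample_data = [
    "196\t1\t3\t881250949",
    "196\t2\t4\t881250950",
    "22\t1\t5\t878887116",
]

prefs = load_movielens(sample_items, sample_data)
print(prefs["196"])  # Python 3: print is a function, so parentheses are required
```

With the real files, the same function should accept open file objects in place of the lists; u.item in the 100K release is not UTF-8, so opening it with encoding="latin-1" is probably needed.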
Chapter 3 Discovering Groups:
This is again a dataset-construction problem. The main focus of our study is the data processing, not the data collection, so we can download the data files directly.
We will
replace it with segaran. To install PIL, do not use easy_install or pip install; instead, directly download the
executable installer for your platform. Here I downloaded PIL-1.1.7.win32-py2.7.exe and installed it directly;
Chapter 4 Searching and Ranking:
1. SQLite ships with Python 2.x, so in most cases you do not need to install it yourself.
Look under Python/lib/ in the installation path; if sqlite3 is there, it can be used directly. The corresponding import statement can be changed to import sqlite3 as sqlite.
2. On p85, the addlinkref function is not introduced. If you need it, look it up in the book's source package.
3. In the calculatepagerank(self, iterations) function, the code that initializes the pagerank table differs between the printed book and the book's source package; the version printed in the book is
the more efficient of the two.
4. Differences in code output:
A. The searchindex.db file is about 27 MB in the book but about 22 MB when I generated it myself, mainly because some links are no longer valid.
B. On p103, before training, the getresult function outputs values of about 0.076; after several rounds of training the outputs differ from the book's because the dataset is different,
which is normal.
C. Note that when creating the database and tables, you must first run the function that generates the hidden-layer nodes, generatehiddennode(wordids, urlids)
In nn.py
In searchengine.py
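To make points 1 and 3 above more concrete, here is a minimal sketch using the bundled sqlite3 module imported under the name sqlite. It shows two ways a pagerank table could be initialized with a score of 1.0 per URL: a row-by-row loop and a single INSERT ... SELECT statement. The table layout (urllist, pagerank with urlid and score), the helper names and the example URLs are assumptions for illustration; no claim is made here about which variant is the book's and which is the source package's.

```python
import sqlite3 as sqlite  # bundled with Python, no separate install needed

con = sqlite.connect(":memory:")  # throwaway in-memory database for the demo
con.execute("create table urllist(url)")
con.executemany(
    "insert into urllist(url) values (?)",
    [("http://a.example",), ("http://b.example",), ("http://c.example",)],
)


def reset_pagerank_table(con):
    # Hypothetical helper: start from an empty pagerank table each time.
    con.execute("drop table if exists pagerank")
    con.execute("create table pagerank(urlid primary key, score)")


def init_pagerank_loop(con):
    # Variant 1: one INSERT per URL, driven from Python.
    reset_pagerank_table(con)
    rows = con.execute("select rowid from urllist").fetchall()
    for (urlid,) in rows:
        con.execute("insert into pagerank(urlid, score) values (?, 1.0)", (urlid,))
    con.commit()


def init_pagerank_bulk(con):
    # Variant 2: a single INSERT ... SELECT executed inside SQLite.
    reset_pagerank_table(con)
    con.execute("insert into pagerank select rowid, 1.0 from urllist")
    con.commit()


query = "select urlid, score from pagerank order by urlid"
init_pagerank_loop(con)
loop_rows = con.execute(query).fetchall()
init_pagerank_bulk(con)
bulk_rows = con.execute(query).fetchall()
print(loop_rows == bulk_rows)
```

Both variants leave the table in the same state; the single-statement form avoids one Python-level call per row, which is the kind of difference that tends to matter on a crawl with many URLs.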
Programming Collective Intelligence: errata (Chapters 1-4)