Simple testing and use of the PHP machine learning library php-ml

Source: Internet
Author: User
Tags autoload

php-ml is a machine learning library written in PHP. Python and C++ offer far more machine learning libraries, but most of them are somewhat complicated, and their configuration drives many newcomers to despair. php-ml does not offer particularly advanced algorithms, but it covers the basics of machine learning, such as classification, which is enough for a small company like ours to do simple data analysis and prediction.

In a project we should pursue cost-effectiveness rather than chase efficiency and accuracy to excess. Some algorithms and libraries look very powerful, but if we want to go live quickly and our technical staff have no machine learning experience, complicated code and configuration will drag the project down. For a simple machine learning application, the cost of learning complex libraries and algorithms is clearly too high. And when the project runs into strange problems, can we solve them? What if the requirements change? I believe everyone has had this experience: the program suddenly reports an error and you have no idea why. You search Google or Baidu and find a single question that matches, asked five or ten years ago, with no replies...

Therefore, it is necessary to choose the simplest, most efficient, and most cost-effective approach. php-ml is not slow (switch to PHP 7), and its accuracy is good. After all, the algorithms are the same, and PHP itself is implemented in C. The blogger is all too familiar with performance comparisons between Python, Java, and PHP. If you really want performance, develop in C. If you really want the widest scope of application, use C, or even assembly...

First, we need to download the library before using it. It is available on GitHub (https://github.com/php-ai/php-ml). We recommend installing it with Composer instead (composer require php-ai/php-ml), which downloads the library and configures autoloading automatically.

After the download is complete, we can look at the library's documentation, which consists of a few simple examples that are easy to understand. We can try creating a file and running them ourselves. Next, let's test with real data. One dataset is the Iris dataset. The source of the other has been lost, so I don't know what the data represents...

The Iris data contains three different classes of flower:

The unknown dataset uses a comma (,) as the decimal separator, which must be handled before any computation:

We process the unknown dataset first. Since we do not know its name, we call the file data.txt. This dataset can be drawn as an x-y line chart, so we start by plotting the original data. Because the X axis is relatively long, we only need to see the approximate shape:

The chart is drawn with the PHP jpgraph library. The code is as follows:

    <?php
    include_once './src/jpgraph.php';
    include_once './src/jpgraph_line.php';

    $g = new Graph(1920, 1080); // jpgraph plotting setup
    $g->SetScale("textint");
    $g->title->Set('data');

    // file processing
    $file = fopen('data.txt', 'r');
    $labels = array();
    while (($line = fgets($file)) !== false) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines, such as a trailing newline at the end of the file
        }
        $data = explode(' ', $line); // assumes the two columns are separated by a space
        $data[1] = str_replace(',', '.', $data[1]); // turn the decimal comma into a decimal point
        $labels[(int)$data[0]] = (float)$data[1]; // store as key => value so we can sort by key
    }
    fclose($file);

    ksort($labels); // sort by key (the x value)
    $x = array(); // x-axis data
    $y = array(); // y-axis data
    foreach ($labels as $key => $value) {
        array_push($x, $key);
        array_push($y, $value);
    }

    $linePlot = new LinePlot($y);
    $g->xaxis->SetTickLabels($x);
    $linePlot->SetLegend('data');
    $g->Add($linePlot);
    $g->Stroke();
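The comma-to-point conversion in the listing above can be isolated into a small helper, which makes it easier to check on its own. The sketch below is only an illustration: parseDecimalCommaLine is a hypothetical name, the whitespace-separated layout is an assumption, and the sample rows are made-up values rather than the contents of data.txt.

```php
<?php
// Hypothetical helper (not part of php-ml or jpgraph): parse one "x y" line
// whose y value uses a decimal comma, for example "12 3,75".
function parseDecimalCommaLine(string $line): ?array
{
    $parts = preg_split('/\s+/', trim($line));
    if (count($parts) < 2) {
        return null; // blank or malformed line
    }
    return [(int)$parts[0], (float)str_replace(',', '.', $parts[1])];
}

// Made-up rows, deliberately unsorted and containing one blank line.
$rows = ["3 1,5", "1 0,25", "", "2 10"];
$points = [];
foreach ($rows as $row) {
    $parsed = parseDecimalCommaLine($row);
    if ($parsed !== null) {
        $points[$parsed[0]] = $parsed[1]; // x value as key, y value as float
    }
}
ksort($points); // order by x, as the plotting script does
print_r($points);
```

Returning null for blank lines keeps a trailing newline in the file from turning into a bogus point at x = 0.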

With this chart of the source data for comparison, we can move on to learning. We use LeastSquares from php-ml. The predictions are saved to a file so that we can draw a comparison chart. The learning code is as follows:

    <?php
    require 'vendor/autoload.php';

    use Phpml\Regression\LeastSquares;

    $file = fopen('data.txt', 'r');
    $samples = array();
    $labels = array();
    $i = 0;
    while (($line = fgets($file)) !== false) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines, such as a trailing newline at the end of the file
        }
        $data = explode(' ', $line); // assumes the two columns are separated by a space
        $samples[$i][0] = (int)$data[0]; // each sample is an array with one feature (x)
        $data[1] = str_replace(',', '.', $data[1]); // decimal comma to decimal point
        $labels[$i] = (float)$data[1];
        $i++;
    }
    fclose($file);

    $regression = new LeastSquares();
    $regression->train($samples, $labels);

    // $a holds the x values of the processed original data, used here for testing
    // (the same 40 values as the $x array in the plotting script).
    $a = [0,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,20,22,23,24,25,26,27,29,30,31,37,40,41,45,48,53,55,57,60,61,108,124];
    for ($i = 0; $i < count($a); $i++) {
        // append each prediction to the output file, one value per line
        file_put_contents("putput.txt", $regression->predict([$a[$i]]) . "\n", FILE_APPEND);
    }
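To make clearer what a least-squares regression on a single x feature computes, here is a minimal sketch in plain PHP. It is not php-ml's implementation: fitLine and predictLine are hypothetical names, and the four sample points are made up so that they lie exactly on y = 2x + 1.

```php
<?php
// Illustrative only: fit y = intercept + slope * x by ordinary least squares.
function fitLine(array $xs, array $ys): array
{
    $n = count($xs);
    $meanX = array_sum($xs) / $n;
    $meanY = array_sum($ys) / $n;
    $sxy = 0.0; // sum of (x - meanX) * (y - meanY)
    $sxx = 0.0; // sum of (x - meanX)^2
    for ($i = 0; $i < $n; $i++) {
        $sxy += ($xs[$i] - $meanX) * ($ys[$i] - $meanY);
        $sxx += ($xs[$i] - $meanX) ** 2;
    }
    $slope = $sxy / $sxx;
    return ['slope' => $slope, 'intercept' => $meanY - $slope * $meanX];
}

function predictLine(array $model, float $x): float
{
    return $model['intercept'] + $model['slope'] * $x;
}

// Made-up points that lie exactly on y = 2x + 1.
$model = fitLine([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0]);
echo predictLine($model, 4), "\n";
```

A model of this form can only produce a straight line in x, so it can follow the overall trend of a jagged series but cannot reproduce the individual peaks and dips.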

Then we read the predictions stored in the file and draw a chart. Here is the final result first:

The code is as follows:

    <?php
    include_once './src/jpgraph.php';
    include_once './src/jpgraph_line.php';

    $g = new Graph(1920, 1080);
    $g->SetScale("textint");
    $g->title->Set('data');

    // read the predictions written by the learning script, one value per line
    $file = fopen('putput.txt', 'r');
    $y = array();
    while (($line = fgets($file)) !== false) {
        if (trim($line) === '') {
            continue; // skip blank lines so no extra zero point is plotted
        }
        $y[] = (float)$line;
    }
    fclose($file);

    $x = [0,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,20,22,23,24,25,26,27,29,30,31,37,40,41,45,48,53,55,57,60,61,108,124];

    $linePlot = new LinePlot($y);
    $g->xaxis->SetTickLabels($x);
    $linePlot->SetLegend('data');
    $g->Add($linePlot);
    $g->Stroke();

The two charts still differ considerably, especially in the regions where the original curve is heavily jagged. However, this is only 40 data points, after all, and we can see that the overall trend of the two charts is consistent. With this kind of learning, any general-purpose library gives very low accuracy when the amount of data is small. High precision requires a large amount of data, and if that requirement is not met, no library will help. In machine learning practice, therefore, the truly hard part is not technical problems such as low precision or complicated configuration, but insufficient or low-quality data (too much useless data within a dataset). Data pre-processing before learning is also necessary.
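Comparing two charts by eye is subjective, so it can help to express the gap between the original values and the predictions as a single number. The sketch below computes the root mean squared error in plain PHP. The function name is hypothetical and the numbers are made-up examples, not the values from data.txt or putput.txt.

```php
<?php
// Illustrative helper: root mean squared error between two equally long series.
function rootMeanSquaredError(array $actual, array $predicted): float
{
    if (count($actual) !== count($predicted) || count($actual) === 0) {
        throw new InvalidArgumentException('Arrays must be non-empty and of equal length.');
    }
    $sum = 0.0;
    foreach ($actual as $i => $value) {
        $sum += ($value - $predicted[$i]) ** 2; // squared error for this point
    }
    return sqrt($sum / count($actual));
}

// Made-up example: two of three predictions are exact, one is off by 2.
echo rootMeanSquaredError([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), "\n";
```

Because the errors are squared before averaging, a few large misses (such as the jagged regions) weigh more heavily than many small ones.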

 

 

Next, we test the Iris data, which has three classes in total. Since we downloaded the data as CSV, we can use php-ml's own CSV dataset class to read the file. This is a classification problem, so we chose the SVC algorithm provided by the library. Name the data file iris.csv. The code is as follows:

    <?php
    require 'vendor/autoload.php';

    use Phpml\Classification\SVC;
    use Phpml\SupportVectorMachine\Kernel;
    use Phpml\Dataset\CsvDataset;

    $dataset = new CsvDataset('iris.csv', 4, false); // 4 feature columns, no heading row
    $classifier = new SVC(Kernel::LINEAR, $cost = 1000);
    $classifier->train($dataset->getSamples(), $dataset->getTargets());

    echo $classifier->predict([$argv[1], $argv[2], $argv[3], $argv[4]]); // $argv holds the command-line arguments, which makes this script convenient to debug from the command line

Simple, isn't it? Just 12 lines of code are enough. Next, let's test it. According to the Iris data shown earlier, an input of 5 3.3 1.4 0.2 should produce Iris-setosa. Let's take a look:

The check shows that when we input a row from the original data, we get the correct result. But what if we enter data that is not in the original dataset? Let's test two groups:

Judging from the two data screenshots posted earlier, the values we entered do not exist in the dataset, yet the classifications look reasonable on preliminary inspection.
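Spot checks on a few inputs show that the classifier behaves sensibly, but holding back part of the data for testing gives a number that can be compared between runs. The sketch below shows the idea in plain PHP so that it runs without the library. holdOutSplit and accuracy are hypothetical helpers, and the samples and labels are made-up toy values, not rows from iris.csv. php-ml has its own cross-validation and metric classes as well; see its documentation for their exact usage.

```php
<?php
// Hypothetical helpers for illustration; not part of php-ml.

// Shuffle the rows and move a fraction of them into a test set.
function holdOutSplit(array $samples, array $targets, float $testRatio, int $seed): array
{
    $indices = array_keys($samples);
    mt_srand($seed); // fixed seed so the split is repeatable
    shuffle($indices);
    $testCount = (int) round(count($indices) * $testRatio);
    $split = ['trainSamples' => [], 'trainTargets' => [], 'testSamples' => [], 'testTargets' => []];
    foreach ($indices as $position => $index) {
        $prefix = $position < $testCount ? 'test' : 'train';
        $split[$prefix . 'Samples'][] = $samples[$index];
        $split[$prefix . 'Targets'][] = $targets[$index];
    }
    return $split;
}

// Fraction of predictions that match the actual labels.
function accuracy(array $actual, array $predicted): float
{
    $correct = 0;
    foreach ($actual as $i => $label) {
        if ($label === $predicted[$i]) {
            $correct++;
        }
    }
    return count($actual) > 0 ? $correct / count($actual) : 0.0;
}

// Made-up toy data with two features per sample.
$samples = [[5.0, 3.3], [6.1, 2.8], [5.1, 3.5], [6.3, 2.9]];
$targets = ['a', 'b', 'a', 'b'];
$split = holdOutSplit($samples, $targets, 0.25, 42);
echo count($split['trainSamples']), ' train / ', count($split['testSamples']), " test\n";

// Two of three made-up predictions match the actual labels.
echo accuracy(['a', 'b', 'a'], ['a', 'b', 'b']), "\n";
```

In practice the training part would go to the classifier's train method and the test part to predict, with the predictions then passed to the accuracy function.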

Therefore, this machine learning library is sufficient for most people. Most of those who look down on one library or another and talk endlessly about performance are not experts themselves. The real experts are busy making money or doing academic research. We should spend our time getting familiar with the algorithms and understanding their principles, rather than just talking about them. Of course, this library is not recommended for large projects, only for small or personal ones.

jpgraph depends only on the GD library, so it can be used as soon as it is downloaded and included. Most of the code above goes into drawing charts and initial data processing. Because the library is well encapsulated, the learning code itself is not complicated. If you need the complete code or the test datasets, leave a comment or send a private message, and I will send the complete code as a compressed archive (the blog platform's file upload support is too limited). The blogger is also still learning, together with everyone.

 
