NumPy and sklearn both provide random data generation. We can generate data suited to a particular model ourselves, use it to practice cleaning and transforming data, and then choose a model and algorithm to fit it and make predictions.
1. NumPy random data generation API
NumPy is better suited to generating simple sampled data. Its APIs live in the numpy.random module, and the commonly used ones are:
(1). rand(d0, d1, ..., dn) generates an array of shape d0 x d1 x ... x dn whose values are uniformly distributed in the half-open interval [0, 1).
(2). randn(d0, d1, ..., dn) also generates an array of shape d0 x d1 x ... x dn, but its values follow the standard normal distribution N(0, 1).
To sample from the normal distribution N(μ, σ²) instead, apply σx + μ to each value x generated by randn.
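A minimal sketch of rand and randn, including the σx + μ rescaling described above. The seed and the values μ = 5, σ = 2 are arbitrary choices made only for illustration.

```python
import numpy as np

np.random.seed(0)  # fix the seed so the run is reproducible

# (1) rand: uniform values in [0, 1), shape 2 x 3
u = np.random.rand(2, 3)
print(u.shape)  # (2, 3)

# (2) randn: standard normal N(0, 1) values, shape 2 x 3
z = np.random.randn(2, 3)

# Rescale standard normal samples to N(mu, sigma^2) with sigma * x + mu
mu, sigma = 5.0, 2.0
x = sigma * np.random.randn(100000) + mu
print(x.mean(), x.std())  # sample mean and std should be close to 5.0 and 2.0
```

np.random.normal(loc=mu, scale=sigma, size=n) draws from the same distribution directly; the explicit rescaling is shown here because it is the transformation the text describes.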
(3). randint(low[, high, size]) generates random integers with the given size, which can be an integer, a matrix shape, or a tensor shape. The values lie in the half-open interval [low, high); if high is omitted, they lie in [0, low).
For example, np.random.randint(3, size=[2,3,4]) returns an array of shape 2x3x4 whose values are integers in [0, 3), so the largest possible value is 2.
For example, np.random.randint(3, 6, size=[2,3]) returns an array of shape 2x3 whose values are integers in [3, 6).
(4). random_integers(low[, high, size]) is similar to randint, except that its values lie in the closed interval [low, high]. It has been deprecated since NumPy 1.11; randint(low, high + 1) covers the same range.
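The half-open bounds of randint are easy to get wrong, so here is a short sketch of the examples above. The seed is an arbitrary choice. Because random_integers is deprecated, the closed interval [low, high] is obtained here by passing high + 1 to randint; this substitution is our choice for the sketch, not something from the original text.

```python
import numpy as np

np.random.seed(0)

# randint(high, size=...): integers in [0, high), here 0, 1 or 2
a = np.random.randint(3, size=(2, 3, 4))
print(a.shape)  # (2, 3, 4)

# randint(low, high, size=...): integers in [3, 6), here 3, 4 or 5
b = np.random.randint(3, 6, size=(2, 3))

# Closed interval [low, high], the range random_integers(low, high) would give:
# shift the exclusive upper bound of randint by one instead
low, high = 3, 6
c = np.random.randint(low, high + 1, size=1000)
print(sorted(set(c.tolist())))  # with 1000 draws, every value from 3 to 6 should appear
```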
(5). random_sample([size]) returns random floats in the half-open interval [0.0, 1.0). For another interval [a, b), transform the result as (b - a) * random_sample([size]) + a.
For example, (5 - 2) * np.random.random_sample(3) + 2 returns 3 random numbers in [2, 5).
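A minimal sketch of the [a, b) transformation above, with a = 2 and b = 5 as in the example; the seed is arbitrary. np.random.uniform is shown alongside as an existing NumPy function that performs the same mapping in one call.

```python
import numpy as np

np.random.seed(0)

# random_sample: uniform floats in [0.0, 1.0)
s = np.random.random_sample(3)

# Map [0, 1) onto [a, b) with (b - a) * s + a, here [2, 5)
a, b = 2, 5
r = (b - a) * np.random.random_sample(3) + a
print(r)  # three floats, each in [2, 5)

# np.random.uniform(low, high, size) applies the same transformation directly
r2 = np.random.uniform(a, b, size=3)
```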
2. Introduction to the sklearn random data generation API
sklearn's random data generators live in the sklearn.datasets module. Compared with NumPy, they can generate data tailored to a particular type of machine learning model. The commonly used APIs are:
(1). Use make_regression to generate data for regression models.
(2). Use make_hastie_10_2, make_classification or make_multilabel_classification to generate data for classification models.
(3). Use make_blobs to generate data for clustering models.
(4). Use make_gaussian_quantiles to generate normally distributed data divided into groups by quantiles.
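A minimal sketch that calls each generator listed above from sklearn.datasets. The sample counts, feature counts, class counts and random_state values are arbitrary illustration choices, not recommendations from the original text.

```python
import numpy as np
from sklearn.datasets import (
    make_blobs,
    make_classification,
    make_gaussian_quantiles,
    make_hastie_10_2,
    make_multilabel_classification,
    make_regression,
)

# (1) Regression: 100 samples, 1 feature, Gaussian noise added to the target
X_reg, y_reg = make_regression(n_samples=100, n_features=1, noise=10.0, random_state=0)

# (2) Classification
# Hastie et al. binary problem: 10 features, labels -1 / +1
X_h, y_h = make_hastie_10_2(n_samples=1000, random_state=0)
# 3 classes, 2 informative features, one cluster per class
X_clf, y_clf = make_classification(
    n_samples=400, n_features=2, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, n_classes=3, random_state=0,
)
# Multi-label: Y_ml is a 0/1 indicator matrix with one column per label
X_ml, Y_ml = make_multilabel_classification(
    n_samples=100, n_features=5, n_classes=3, random_state=0
)

# (3) Clustering: 3 Gaussian blobs in 2-D
X_blob, y_blob = make_blobs(n_samples=300, n_features=2, centers=3, random_state=0)
print(np.bincount(y_blob))  # [100 100 100]: samples are split evenly across centers

# (4) Gaussian samples labeled into 3 groups by quantiles of distance from the mean
X_q, y_q = make_gaussian_quantiles(n_samples=300, n_features=2, n_classes=3, random_state=0)
```

Each call returns a feature matrix and its targets, so the output can be passed straight to an estimator's fit method.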
Generation of random numbers in machine learning algorithms