Integer factorization (repost)
Compared with primality testing, factorization cannot be done nearly as fast. Unlike primality testing, factorization still has no known polynomial-time algorithm, and this is the basis for the security of the RSA public-key cryptosystem.
Given the gap in difficulty between the two problems, it is best to confirm that the target integer really is composite before trying to factor it. Otherwise a great deal of factoring effort is likely to be spent doing the job of a primality test, which is like using a sledgehammer to crack a nut.
Factorization methods fall into two categories: general-purpose methods and special-purpose methods. One usually first tries a special-purpose method that exploits the particular form of the number. If the target number has no such special form, one falls back on a general-purpose method. The former is, of course, often much faster than the latter.
Here we focus on the general-purpose methods of factorization, and we always use $n$ to denote the target number to be factored. Special-purpose methods can be added later as needed.
- Trial Division
- Euclid Algorithm
- Pollard $p-1$ Method
- Pollard $\rho$ Method
- Square Form Factorization (SQUFOF)
- Continued Fraction Method (CFRAC)
- Lenstra Elliptic Curve Method (ECM)
- Quadratic Sieve (QS)
  - Single Polynomial Quadratic Sieve (SPQS)
  - Multiple Polynomial Quadratic Sieve (MPQS)
- Number Field Sieve (NFS)
Trial Division
Trial division is the first step in both primality testing and factorization. There are two strategies for it:
- Use enough space to store the primes used as trial divisors (the storage can be quite compact, for example a 0-1 vector packed into a large integer).
- Avoid spending space on storing all the necessary primes. In this case a subroutine is needed to generate primes quickly, or one simply uses 2, 3, and the integers of the form $6k \pm 1$ as trial divisors.
It is rare for a number to have no small factor. For example, by Mertens' theorem [1], the proportion of odd numbers with no prime factor below $B$ is
$$\prod_{3 \le p \le B}\left(1 - \frac{1}{p}\right) \approx \frac{2e^{-\gamma}}{\ln B} \approx \frac{1.12}{\ln B}.$$
It follows that 76% of all odd numbers have a prime factor smaller than 100, and the odd numbers with no prime factor below $10^8$ make up only 6.1% [2]. Therefore, in most cases the second trial division strategy is sufficient, and it is also the simplest to implement.
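The second strategy can be sketched in a few lines of Python. This is only an illustration: the function name `trial_division` and the default `bound` are assumptions, not part of the original text. It strips the factors 2 and 3 first and then tries only the divisors of the form $6k \pm 1$.

```python
def trial_division(n, bound=10**6):
    """Strip prime factors below `bound` from n using divisors 2, 3 and 6k±1.

    Returns (factors, cofactor): the prime factors found (with multiplicity)
    and the part of n that is still unfactored (1 if nothing is left).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    factors = []
    for d in (2, 3):
        while n % d == 0:
            factors.append(d)
            n //= d
    d, step = 5, 2  # 5, 7, 11, 13, 17, 19, 23, 25, ...: the integers 6k±1
    while d < bound and d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += step
        step = 6 - step  # alternate between +2 and +4
    if n > 1 and d * d > n:
        # No divisor up to sqrt(n) was found, so the remaining n is prime.
        factors.append(n)
        n = 1
    return factors, n
```

Some candidates such as 25 or 35 are composite, but they can never divide the remaining cofactor because their prime factors have already been removed, so the only cost is a few wasted divisions.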
Euclid Algorithm
The Euclid algorithm is also a very simple tool for factorization. Precompute the product $P_1$ of all primes smaller than 100, then apply the Euclid algorithm to $P_1$ and the target number $n$ to obtain $\gcd(P_1, n)$. Factoring this common factor further gives the prime factors of $n$ below 100. We can likewise precompute the products $P_2$, $P_3$ of the primes from 100 to 200, from 200 to 300, and so on. This is essentially an implementation of trial division. When $n$ is very large, multi-precision arithmetic must be used every time $n$ is divided, so repeated trial division is very time-consuming. The Euclid approach needs only a very small number of multi-precision operations, and the final decomposition can be completed in machine precision, which improves efficiency.
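As a rough sketch of this idea (the helper names `primes_in` and `small_factors_by_gcd` are hypothetical), one multi-precision gcd with the product of a block of primes reveals which primes of that block divide $n$. The remaining work is done on the small common factor.

```python
from math import gcd, prod

def primes_in(lo, hi):
    """Primes p with lo <= p < hi, found with a simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * hi
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(hi ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, hi, i)))
    return [p for p in range(lo, hi) if sieve[p]]

def small_factors_by_gcd(n, lo=0, hi=100):
    """Return the primes in [lo, hi) that divide n, using a single gcd with n."""
    block = primes_in(lo, hi)
    g = gcd(n, prod(block))  # the only operation involving the large n
    # g is a product of distinct primes from the block, so it is small
    # and can be split with cheap divisions.
    return [p for p in block if g % p == 0]
```

The gcd only tells us which primes occur, not their multiplicities. Those can be recovered afterwards by dividing $n$ by each prime found.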
Pollard $p-1$ Method
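The Pollard $p-1$ method was proposed by Pollard in 1974. The basic idea is as follows. Let $p$ be a prime factor of $n$. By Fermat's little theorem, $a^{p-1} \equiv 1 \pmod{p}$ whenever $\gcd(a, p) = 1$, so $\gcd(a^{p-1} - 1, n)$ may be a nontrivial factor of $n$. The problem, of course, is that we do not know what $p-1$ is. A reasonable assumption is that $p-1$ is smooth, for example that all prime factors of $p-1$ are contained in a factor base. We then look for an exponent $M$ that "covers" $p-1$, that is, $(p-1) \mid M$, so that $a^M \equiv 1 \pmod{p}$. We can then switch to computing $\gcd(a^M - 1, n)$ to obtain a nontrivial factor. For example, if the upper bound on the prime factors is $B$, one can simply take $M = B!$ or the least common multiple $M = \operatorname{lcm}(1, 2, \ldots, B)$.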
One version of the Pollard $p-1$ method is given below:
Algorithm 1
(Pollard $p-1$ method)
- Set the upper bound $B$ of the prime factor search, and generate the table of primes $p_1, p_2, \ldots$ corresponding to the prime powers among $2, 3, \ldots, B$, that is, $2, 3, 2, 5, 7, 2, 3, 11, 13, 2, \ldots$
- Choose a random positive integer $a$ and compute successively
\begin{align*}
b_1 &= a \\
b_{i+1} &\equiv b_i^{p_i} \pmod{n}, \quad i = 1, 2, \ldots
\end{align*}
- Periodically (for example, whenever $i$ is a multiple of 20) check $d = \gcd(b_i - 1, n)$. If $d > 1$, a factor $d$ of $n$ has been obtained. Otherwise, continue the recurrence of the previous step.
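The steps of Algorithm 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the names `prime_power_table` and `pollard_p_minus_1` are hypothetical, the base `a` defaults to 2 instead of being chosen at random so that the result is reproducible, and the sketch returns `None` when the gcd is trivial or equal to $n$.

```python
from math import gcd

def smallest_prime_factor(k):
    d = 2
    while d * d <= k:
        if k % d == 0:
            return d
        d += 1
    return k

def prime_power_table(B):
    """For k = 2..B, list the prime p whenever k is a power of p."""
    table = []
    for k in range(2, B + 1):
        p = smallest_prime_factor(k)
        m = k
        while m % p == 0:
            m //= p
        if m == 1:  # k = p^e, so p appears once more in the table
            table.append(p)
    return table

def pollard_p_minus_1(n, B=1000, a=2, check_every=20):
    """Return a nontrivial factor of n, or None if this (B, a) fails."""
    b = a % n
    for i, p in enumerate(prime_power_table(B), start=1):
        b = pow(b, p, n)  # b_{i+1} = b_i^{p_i} mod n
        if i % check_every == 0:
            d = gcd(b - 1, n)
            if 1 < d < n:
                return d
            if d == n:  # all prime factors of n appeared at once
                return None
    d = gcd(b - 1, n)
    return d if 1 < d < n else None
```

For instance, with $n = 299 = 13 \cdot 23$ and $B = 5$ the accumulated exponent is $2 \cdot 3 \cdot 2 \cdot 5 = 60$, a multiple of $13 - 1 = 12$ but not of the order of 2 modulo 23, so the factor 13 is split off.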
Note 1
The smaller a prime is, the higher the power to which it may appear in the factorization of $p-1$, so small primes (for example, 2 and 3) should appear repeatedly in the table. The way the table is generated in the first step of the algorithm takes this into account (in effect, the final result computed is $a^{\operatorname{lcm}(1, 2, \ldots, B)} \bmod n$).
Note 2
In rare cases $\gcd(b_i - 1, n) = n$, that is, all the prime factors of $n$ appear at the same time. When this happens, one can change the timing of the periodic check or choose another $a$.
Note 3
The similar Williams $p+1$ method relies on $p+1$ having only small factors. There the famous Lucas sequences replace modular exponentiation, and the multiplicative group of $\mathbb{F}_{p^2}$ ($\mathbb{F}_{p^2}$ is the quadratic extension of $\mathbb{F}_p$) replaces the multiplicative group $\mathbb{F}_p^*$. The relationship between the Pollard $p-1$ method and the Williams $p+1$ method is therefore just like the relationship, in primality testing, between Lehmer's $n-1$ test and the Lucas $n+1$ test. For details, refer to [3].
Note 4
In practice, $B$ is generally taken to be around […].
Note 5
The time complexity of the Pollard $p-1$ method is […], where […] is a positive number [4].
Pollard $\rho$ Method
At present, almost all practical factoring methods are probabilistic algorithms. The goal is an algorithm that computes some $x$ for which $\gcd(x, n) > 1$ holds with high probability (the greatest common divisor itself can be computed quickly). The Pollard $p-1$ method above is one example, and the Pollard $\rho$ method is no exception.
The Pollard $\rho$ method was proposed by Pollard in 1975. It comes from an interesting fact: among approximately $c\sqrt{p}$ randomly selected integers ($c$ is a constant), there is a high probability that two of them are congruent modulo $p$. In practice, we can use a congruential recurrence sequence $x_{i+1} \equiv f(x_i) \pmod{n}$ to generate pseudo-random numbers, where $f$ is a mapping $f: \mathbb{Z}_n \to \mathbb{Z}_n$. Let $p$ be a prime factor of $n$. Once we find $x_i \equiv x_j \pmod{p}$, computing $\gcd(x_i - x_j, n)$ may give a nontrivial factor of $n$.
Because $\mathbb{Z}_p$ is finite, the first-order recurrence sequence defined above, taken modulo $p$, must eventually become periodic (and its trajectory looks like the Greek letter $\rho$). Let the length of the initial non-periodic part be $\mu$ and the length of the cycle be $\lambda$. The famous Floyd algorithm finds two repeated elements of the sequence efficiently, in $O(\mu + \lambda)$ steps, using only constant storage.
Algorithm 2
(Floyd)
- Starting with $i = 1$, check whether $x_i = x_{2i}$.
- If they are equal, terminate. Otherwise, set $i \leftarrow i + 1$ and repeat the check.
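Proof
(Correctness of the algorithm)
It suffices to prove that there exists an $i \ge 1$ with $x_i = x_{2i}$. For a sequence $x_0, x_1, \ldots$ with non-periodic part of length $\mu$ and cycle length $\lambda$, the equality $x_i = x_{2i}$ is equivalent to $i \ge \mu$ and $\lambda \mid i$. Therefore $i$ can be taken to be the smallest positive multiple of $\lambda$ that is at least $\mu$, and this $i$ is at most $\mu + \lambda$. There is no need to store all the $x_i$: it is enough to keep the current $x_i$ and $x_{2i}$ and to update them recursively by $x_{i+1} = f(x_i)$, $x_{2(i+1)} = f(f(x_{2i}))$.
□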
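A minimal sketch of Floyd's cycle detection for an arbitrary map `f` follows (the function name `floyd` is an assumption made for illustration). It keeps only two values, the "slow" one $x_i$ and the "fast" one $x_{2i}$.

```python
def floyd(f, x0):
    """Return the smallest i >= 1 with x_i == x_{2i}, where x_{k+1} = f(x_k)."""
    slow, fast, i = f(x0), f(f(x0)), 1   # x_1 and x_2
    while slow != fast:
        slow = f(slow)                   # x_i     -> x_{i+1}
        fast = f(f(fast))                # x_{2i}  -> x_{2i+2}
        i += 1
    return i
```

For $f(x) = x^2 + 1 \bmod 255$ and $x_0 = 3$ the sequence is $3, 10, 101, 2, 5, 26, 167, 95, 101, \ldots$, so the non-periodic part has length 2 and the cycle has length 6. The first index with $x_i = x_{2i}$ is the first multiple of 6 that is at least 2, namely $i = 6$.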
The $f$ most frequently used in practice is $f(x) = x^2 + c$. A quadratic recurrence sequence provides enough randomness, and it is also very simple to compute.
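Algorithm 3
(Pollard $\rho$)
- Randomly select $x_0$ and $c$.
- Compute successively $x_{i+1} \equiv x_i^2 + c \pmod{n}$.
- Compute $x_{2i}$ alongside $x_i$.
- Periodically (for example, for every $i$) check $d = \gcd(x_{2i} - x_i, n)$. If $d \ne 1$, the algorithm terminates. Otherwise, continue from the second step.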
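The following sketch combines the quadratic recurrence with Floyd's cycle detection. The function name `pollard_rho` and the fixed defaults `x0 = 2`, `c = 1` are assumptions chosen for reproducibility. The text calls for random choices, and a caller would retry with other values when `None` is returned.

```python
from math import gcd

def pollard_rho(n, x0=2, c=1):
    """Return a nontrivial factor of the composite n, or None on failure."""
    x = y = x0
    while True:
        x = (x * x + c) % n          # x_i
        y = (y * y + c) % n
        y = (y * y + c) % n          # x_{2i}: two steps per iteration
        d = gcd(x - y, n)
        if d == n:                   # the cycle closed modulo every prime factor at once
            return None
        if d > 1:
            return d
```

With $n = 8051 = 83 \cdot 97$ the pairs $(x_i, x_{2i})$ are $(5, 26)$, $(26, 7474)$, $(677, 871)$, and the third gcd gives the factor 97.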
Note 6
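When $n$ is large, checking $\gcd(x_{2i} - x_i, n)$ for every $i$ may take a lot of time. Since our goal is only to obtain a nontrivial factor of $n$, we can instead accumulate the product $Q_k \equiv \prod_{i=1}^{k} (x_{2i} - x_i) \pmod{n}$ and check $\gcd(Q_k, n)$ only at intervals, which reduces the number of gcd computations.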
Note 7
As with the Pollard $p-1$ method, the greatest common divisor computed may turn out to be $n$ itself. In this case the detection interval can be changed, or one can simply choose a different $x_0$ and $c$ and compute again.
Note 8
The time complexity of the Pollard $\rho$ method is $O(n^{1/4})$ [4]. In fact, the complexity depends on the least prime factor $p$ of $n$ and is about $O(\sqrt{p})$, so the method is particularly effective at splitting off small factors.
Note 9
In 1980, Brent [5] proposed an improvement of the Pollard $\rho$ method. When factoring integers, the improved method is about 24% faster on average. The improvement targets the Floyd algorithm (Algorithm 2). The Floyd algorithm has to compute many terms of the sequence more than once. Brent's improvement, given below, needs no repeated computation, yet it finds repeated elements just as effectively and still uses only constant storage.
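Algorithm 4
(Brent improvement)
- Set $i = 1$ and $y = x_1$. If $x_2 = y$, the algorithm terminates.
- Whenever $i$ is a power of 2, that is, $i = 2^k$, set $y = x_i$ and take $j = i + 1, \ldots, 2i$ in turn, checking whether $x_j = y$. If they are equal, the algorithm terminates. Otherwise, continue with $i \leftarrow 2i$.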
Proof
(Correctness of the algorithm)
Note that during the algorithm the difference $j - i$ runs through all positive integers in turn. Repeating the argument for Algorithm 2 gives the result.
□
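Brent's variant can be sketched for an arbitrary map `f` as follows (the function name `brent` and the returned pair are assumptions made for illustration). Each term of the sequence is computed exactly once, and only the saved value $y = x_{2^k}$ and the current term are kept.

```python
def brent(f, x0):
    """Return (i, j) with i a power of 2, i < j <= 2i and x_j == x_i."""
    i = 1
    y = f(x0)                 # y = x_1
    x = y
    while True:
        for j in range(i + 1, 2 * i + 1):
            x = f(x)          # x = x_j, each term computed once
            if x == y:
                return i, j
        y = x                 # now x = x_{2i}: save it and double i
        i *= 2
```

For $f(x) = x^2 + 1 \bmod 255$ and $x_0 = 3$ (non-periodic part of length 2, cycle length 6), the sketch stops at $(i, j) = (8, 14)$, and $j - i = 6$ is exactly the cycle length.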
Square Form Factorization (SQUFOF)
Square form factorization is an algorithm developed by Shanks about thirty years ago, but it was never formally published [6]. Although SQUFOF has complexity $O(n^{1/4})$ and is therefore also an exponential algorithm (CFRAC, ECM, and QS introduced below are all subexponential), it still has its own advantages. On the one hand, the algorithm is very concise and elegant and is easy to implement (it can even be implemented on a pocket calculator). On the other hand, it is still the fastest method for factoring integers in the range $10^{10}$ to $10^{18}$.
SQUFOF depends on an analysis of the structure of quadratic fields. Here we only describe the algorithm and omit the proof. For details, refer to [6]:
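Algorithm 5
(SQUFOF)
Let $n$ be a composite number. The algorithm outputs a nontrivial factor of $n$.
- Set $P_0 = \lfloor \sqrt{n} \rfloor$, $Q_0 = 1$, $Q_1 = n - P_0^2$.
- Compute successively $b_i = \left\lfloor \dfrac{P_0 + P_{i-1}}{Q_i} \right\rfloor$, $P_i = b_i Q_i - P_{i-1}$, $Q_{i+1} = Q_{i-1} + b_i (P_{i-1} - P_i)$, until $Q_{i+1}$ is a perfect square ($i + 1$ even).
- Compute $b_0' = \left\lfloor \dfrac{P_0 - P_i}{\sqrt{Q_{i+1}}} \right\rfloor$, $P_0' = b_0' \sqrt{Q_{i+1}} + P_i$, $Q_0' = \sqrt{Q_{i+1}}$, $Q_1' = \dfrac{n - P_0'^2}{Q_0'}$.
- Repeat the calculation in step 2 with these new starting values until $P_{i+1} = P_i$, and output $\gcd(n, P_i)$.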
Continued Fraction method (cfrac)
The continued fraction method was proposed by Morrison and Brillhart in 1975. They used this method to successfully factor the Fermat number $F_7 = 2^{128} + 1$. It is based on the following simple fact: if $x^2 \equiv y^2 \pmod{n}$ and $x \not\equiv \pm y \pmod{n}$, then $\gcd(x + y, n)$ is a nontrivial factor of $n$.
Of course, finding such $x, y$ cannot rely on luck alone. The CFRAC method constructs a set of congruences
$$x_k^2 \equiv (-1)^{e_{0k}} p_1^{e_{1k}} \cdots p_m^{e_{mk}} \pmod{n}, \tag{1}$$
where the $p_i$ are all small primes in the factor base $\{p_1, \ldots, p_m\}$. If enough such congruences are found (say $N$ of them, with $N > m + 1$), then Gaussian elimination over the binary field $\mathbb{F}_2$ yields combination coefficients $\varepsilon_k \in \{0, 1\}$, not all zero, such that
$$\sum_{k=1}^{N} \varepsilon_k (e_{0k}, e_{1k}, \ldots, e_{mk}) \equiv (0, 0, \ldots, 0) \pmod{2}.$$
Write $\sum_{k=1}^{N} \varepsilon_k (e_{0k}, e_{1k}, \ldots, e_{mk}) = 2 (v_0, v_1, \ldots, v_m)$. If we let
$$x = \prod_{k=1}^{N} x_k^{\varepsilon_k}, \quad y = (-1)^{v_0} \prod_{i=1}^{m} p_i^{v_i}, \tag{2}$$
then we have the required $x^2 \equiv y^2 \pmod{n}$.
How can we construct so many congruences? We know that the continued fraction expansion gives good rational approximations to the quadratic irrational $\sqrt{kn}$ ($k$ is a positive integer). Let $P_i / Q_i$ be its convergents. Then the absolute value of $P_i^2 - kn Q_i^2$ is less than $2\sqrt{kn}$, so it can probably be factored over the factor base, and $P_i^2 \equiv P_i^2 - kn Q_i^2 \pmod{n}$ gives a congruence of the form (1) that we expect.
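The following sketch illustrates how the continued fraction expansion of $\sqrt{n}$ (multiplier $k = 1$) produces small residues. The function name `cfrac_relations` and its return format are assumptions for illustration, and the sketch stops short of factoring the residues over a factor base or doing the linear algebra. It uses the standard integer recurrence for the expansion of $\sqrt{n}$ and the identity $P_j^2 - n Q_j^2 = (-1)^{j+1} d_{j+1}$, where $d_{j+1}$ is the denominator of the next complete quotient.

```python
from math import isqrt

def cfrac_relations(n, count):
    """Return `count` pairs (P mod n, t) with P^2 ≡ t (mod n) and |t| < 2*sqrt(n).

    P runs over the numerators of the convergents of sqrt(n); n must not be
    a perfect square.
    """
    a0 = isqrt(n)
    if a0 * a0 == n:
        raise ValueError("n is a perfect square")
    m, d, a = 0, 1, a0            # complete quotient (sqrt(n) + m) / d
    p_prev, p = 1, a0 % n         # numerators P_{-1} and P_0 (mod n)
    relations = []
    for j in range(count):
        m = d * a - m
        d = (n - m * m) // d      # exact division
        a = (a0 + m) // d
        t = d if j % 2 else -d    # P_j^2 - n*Q_j^2 = (-1)^(j+1) * d
        relations.append((p, t))
        p_prev, p = p, (a * p + p_prev) % n
    return relations
```

For $n = 13$ the first three relations are $3^2 \equiv -4$, $4^2 \equiv 3$ and $7^2 \equiv -3 \pmod{13}$. The residues stay below $2\sqrt{n}$ in absolute value however large the numerators become, which is what makes them likely to factor over a small factor base.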
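Algorithm 6
(CFRAC method)
- Select an appropriate multiplier $k$ (usually 1; when the period of the continued fraction expansion is too short to generate enough congruences, select another $k$), and set the factor base $\{p_1, \ldots, p_m\}$ so that $kn$ is a quadratic residue modulo every $p_i$.
- Compute the continued fraction expansion of $\sqrt{kn}$, which yields a series of convergents $P_i / Q_i$.
- Compute $t_i = P_i^2 - kn Q_i^2$ and try to factor $t_i$ over the factor base. If the factorization succeeds, a congruence of the form (1) is obtained.
- When enough congruences have been obtained (more than $m + 1$ suffices), $x$ and $y$ in (2) are obtained by Gaussian elimination.
- If $x \not\equiv \pm y \pmod{n}$, output the nontrivial factor $\gcd(x + y, n)$ of $n$.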
Note 10
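Because $t_i = P_i^2 - kn Q_i^2$, if $p \mid t_i$ then necessarily $P_i^2 \equiv kn Q_i^2 \pmod{p}$, so $kn$ must be a quadratic residue modulo $p$. Hence only primes satisfying this condition need to be selected for the factor base in the first step.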
Note 11
The continued fraction computation needs only the four basic arithmetic operations, and the Gaussian elimination can be accelerated by special sparse matrix algorithms. Therefore the most time-consuming part of CFRAC is factoring the residues $P_i^2 - kn Q_i^2$. When factoring one of them takes too long, one can simply give up and turn to the next congruence.
Note 12
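The time complexity of the CFRAC method is $L_n[1/2, \sqrt{2}]$ (see [4]), where the $L$ notation is defined as follows:
$$L_n[\alpha, c] = \exp\left((c + o(1)) (\ln n)^{\alpha} (\ln \ln n)^{1 - \alpha}\right).$$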
Note 13
The idea of searching for congruences of the form (1) in order to factor comes from Dixon [8]. He chose the $x_k$ directly at random and then factored $x_k^2 \bmod n$ over the factor base. The complexity of the algorithm is $L_n[1/2, 2\sqrt{2}]$. It is also the first general-purpose integer factorization method proved to run in subexponential time.
Lenstra Elliptic Curve Method (ECM)
Factorization ultimately comes down to finding an $x$ such that $1 < \gcd(x, n) < n$. The key is to improve the success rate of the search. The Pollard $p-1$ method computes $a^M - 1$ to improve the success rate, essentially working in the group $\mathbb{F}_p^*$. The elliptic curve method instead considers the groups of random elliptic curves over finite fields. Since there are many different elliptic curves to choose from, the ECM method is much more efficient than the Pollard $p-1$ method. So far it is the third fastest factorization method, behind only the number field sieve and the quadratic sieve.
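First, we give the definition of an elliptic curve over a field:
Definition 1
(Elliptic curve over a field $K$)
Let $K$ be a field whose characteristic is not 2 or 3, let $x^3 + ax + b$ ($a, b \in K$) have no square factor, and let $O$ denote the point at infinity. Then
$$E = \{(x, y) \in K^2 : y^2 = x^3 + ax + b\} \cup \{O\}$$
is called an elliptic curve over $K$.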
Note 14
Having no square factor is equivalent to the discriminant condition $4a^3 + 27b^2 \ne 0$, that is, the elliptic curve is non-singular and has no "cusp".
An elliptic curve is both an algebraic curve and an additive group:
Definition 2
(Addition on an elliptic curve)
Let $E$ be an elliptic curve and $P, Q \in E$. The straight line $L$ through $P$ and $Q$ meets $E$ at a third point $R$. Let $\bar{R}$ denote the reflection of $R$ about the $x$-axis. Define the addition $P + Q = \bar{R}$.
There are three other special conventions:
- If $P = Q$, the line $L$ is taken to be the tangent at $P$;
- If $Q = \bar{P}$, then define $P + Q$ to be the point at infinity $O$;
- If $Q = O$, then define $P + O = P$.
A simple calculation gives the following:
Proposition 1
(Explicit expression of addition)
Let $P = (x_1, y_1)$, $Q = (x_2, y_2)$, $P + Q = (x_3, y_3)$. Then
$$\begin{cases}
x_3 = \lambda^2 - x_1 - x_2 \\
y_3 = \lambda (x_1 - x_3) - y_1
\end{cases} \tag{3}$$
Where
/Lambda =
/Begin {cases}