Problem scenario:
The company has more than 100 million records to submit to a third-party platform. The platform requires the XML file format (the file can be compressed to gz for the upload), and it also requires all of the data to be written into a single XML file, which cannot be split up.
My approach is to write the data into a $dom object and finally call $dom->save($xmlFile). This uses far too much memory, and writing more than 100 million products also takes a very long time.
Are there any better suggestions for reducing the memory usage and shortening the job's execution time?
Thank you very much ~~~
Reply content:
Maybe try this idea: http://phpedia.net/1v2knpye
- Virtualize part of the memory as a disk (a RAM disk) and write the file onto that RAM disk.
- Write to the file as each piece of data arrives, instead of fetching all of the data first.
- If the XML structure is not complex, string concatenation is usually much faster than using an XML library to export the data.
- Try another language; PHP is very slow at file I/O.
With DOM, all of the data has to be held in memory.
I think building the XML directly with strings and appending each finished piece to the file will be much faster.
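A minimal sketch of that idea, assuming the data can be read in chunks (fetchChunks() and the field names below are placeholders, not part of the original post):

<?php
// Sketch: build each record as a string and append it to the file in chunks,
// so memory is bounded by the current chunk rather than the whole document.
// fetchChunks() and the column names are placeholders for your own data source.
$xmlFile = 'products.xml';

file_put_contents($xmlFile, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<products>\n");

foreach (fetchChunks() as $rows) {              // e.g. 1000 rows per chunk
    $buffer = '';
    foreach ($rows as $row) {
        $buffer .= '<product>'
                 . '<id>' . (int) $row['id'] . '</id>'
                 . '<name>' . htmlspecialchars($row['name'], ENT_XML1) . '</name>'
                 . "</product>\n";
    }
    file_put_contents($xmlFile, $buffer, FILE_APPEND);
    // Pointing $xmlFile at a RAM-backed path such as /dev/shm/products.xml
    // (Linux tmpfs) is one way to realize the "memory disk" idea above.
}

file_put_contents($xmlFile, "</products>\n", FILE_APPEND);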
Why not consider the JSON format? It has advantages over XML for parsing, storage, and transmission.
PHP can also convert between the JSON and XML formats.
However, the cost of that conversion for more than 100 million records has not been tested.
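For illustration only, a common idiom for the XML-to-JSON direction combines SimpleXML with json_encode; note that it parses the entire document in memory, so it does not help with the memory problem described above:

<?php
// Illustration only: XML -> array -> JSON via SimpleXML + json_encode.
// This loads the whole document into memory, so it is unsuitable for a
// 100-million-record file; it is shown here just to make the claim concrete.
$xml   = simplexml_load_string(file_get_contents('products.xml'));
$array = json_decode(json_encode($xml), true);   // plain PHP array
$json  = json_encode($array);                    // JSON string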
In this case, it is not necessary to finish building the whole XML file before doing anything else.
It is worth considering a streaming model: you do not have to wait until the entire XML file has been generated before uploading; you can upload while the file is still being generated.
Gzip is also a pure stream compressor (bytes come out as bytes go in), so gzip can simply be inserted into this stream (see the sketch below).
Whatever the submission method, 100 million records will test both the quality of the network connection and the processing capacity of the other party's API platform, so first try to find a batch submission method.
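A minimal sketch of streaming the XML straight into a .gz file as it is generated, assuming a placeholder chunked data source (fetchChunks() and the element names are not from the original post):

<?php
// Sketch: generate the XML chunk by chunk and push it through gzip on the fly,
// so neither the uncompressed XML nor the full dataset ever sits in memory.
$gz = gzopen('products.xml.gz', 'wb6');          // 'wb6' = write binary, compression level 6

gzwrite($gz, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<products>\n");

foreach (fetchChunks() as $rows) {               // placeholder chunked read from the database
    $buffer = '';
    foreach ($rows as $row) {
        $buffer .= '<product><id>' . (int) $row['id'] . '</id></product>' . "\n";
    }
    gzwrite($gz, $buffer);                       // compressed incrementally as it is written
}

gzwrite($gz, "</products>\n");
gzclose($gz);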
Aren't there XMLReader and XMLWriter? They let you read and write an XML document node by node, without loading everything at once.
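A minimal XMLWriter sketch that streams nodes straight to disk instead of building a DOM in memory (the chunk source and field names are placeholders):

<?php
// Sketch: XMLWriter writes each node out as soon as it is produced,
// so memory use stays flat no matter how many records are emitted.
$writer = new XMLWriter();
$writer->openURI('products.xml');                // write to the file directly, not to memory
$writer->startDocument('1.0', 'UTF-8');
$writer->startElement('products');

foreach (fetchChunks() as $rows) {               // placeholder chunked data source
    foreach ($rows as $row) {
        $writer->startElement('product');
        $writer->writeElement('id', (string) $row['id']);
        $writer->writeElement('name', $row['name']);   // XMLWriter escapes the text for you
        $writer->endElement();
    }
    $writer->flush();                            // flush the buffer to the file after each chunk
}

$writer->endElement();
$writer->endDocument();
$writer->flush();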
file_put_contents($fileName, $contents, FILE_APPEND);
There is no problem with APPEND; web server logs are much larger than this file.
I have already done this for a feed file much larger than yours. I use PHP to splice the XML directly with strings, without any XML class.
Simply put, cut the large task into small tasks and process them in batches; then problems such as running out of memory do not arise.
The method is as follows:
$total — the total number of records
$batch — how many records are processed per pass
ceil($total / $batch) — the number of passes
Trigger the passes one after another with AJAX requests; each time a batch of data has been processed, append it to the file:
file_put_contents($fileName, $contents, FILE_APPEND); — as @tohilary mentioned above, this is the method I use. Generating a gigabyte-scale file in one run is fine, and I also wrote a progress bar for it. The process looks like this:
1. Each time, use LIMIT 0,100 to fetch data (always take the first 100 unprocessed records). Because processed rows are marked (see step 2), the offset never grows, so pulling data from the database does not slow down from deep paging. [This step is very fast]
2. Use 100 threads + InnoDB transactions to mark the rows that have been handled, e.g. isWrite = 1 (meaning the row has been written to a file).
In this way, 100 * 100 = 10,000 rows are handled per round, batch after batch, each batch written out with file_put_contents($fileName, $contents, FILE_APPEND), producing the node files. [This step takes a while...]
3. As the last step, merge the node files generated above into one large XML file. You can use a shell command: copy /b *.xml all.xml [This step is fast]
For the transaction details, see concrete use cases of MySQL InnoDB transactions in real business code; a rough sketch of one pass follows below.
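A rough sketch of one pass of that loop, under the assumptions above ($pdo, the products table, and the isWrite column are taken from the description; adjust to your own schema):

<?php
// Sketch of one pass: fetch 100 unwritten rows, dump them into a node file,
// and mark them inside an InnoDB transaction so an interrupted pass can be retried.
// $pdo is assumed to be an existing PDO connection to the MySQL database.
$pdo->beginTransaction();

$rows = $pdo->query(
    'SELECT id, name FROM products WHERE isWrite = 0 ORDER BY id LIMIT 100 FOR UPDATE'
)->fetchAll(PDO::FETCH_ASSOC);

if ($rows) {
    $contents = '';
    $ids      = [];
    foreach ($rows as $row) {
        $contents .= '<product><id>' . (int) $row['id'] . '</id></product>' . "\n";
        $ids[]     = (int) $row['id'];
    }
    file_put_contents('node_' . $ids[0] . '.xml', $contents);   // one node file per pass

    $pdo->exec('UPDATE products SET isWrite = 1 WHERE id IN (' . implode(',', $ids) . ')');
}

$pdo->commit();

// Final merge (step 3): on Windows `copy /b *.xml all.xml`;
// the Linux equivalent would be `cat node_*.xml > all.xml`.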
Otherwise I would be worried that your memory usage is too high and everything becomes slow.