Talk about the cloud and big data projects we have done over the years

After more than ten years of work, we have done many distributed computing, parallel computing, in-memory computing, and mass data processing projects; by today's classification, they all fall under cloud computing / big data. Today I will talk about three of those projects, just three.

The first was an order for video transcoding from a video-sharing site; I won't name the site, to avoid the appearance of advertising. Their situation was this: videos on the site are played in the web page in MP4 format, but uploads arrive in all kinds of formats, so every uploaded video has to be converted to MP4. Anyone who has tried transcoding video on their own computer knows that a video of around 100 MB usually takes more than 20 minutes (on a Pentium IV CPU). To improve transcoding efficiency and give users near-real-time results, a video must be split into multiple segments as soon as it arrives, the segments scattered across multiple computers for transcoding, and then all the transcoded segments collected and reassembled into a new video. I suspect video sites today still use the same routine. Our team got its start in streaming media, so the work was not difficult and we finished it quickly. With enough machines, a video could be transcoded in close to real time. Later we went a step further and began transcoding while the user was still uploading, converting each segment as it was received, so that the moment the upload finished, our transcoding was finished too; the user could immediately watch the transcoded video, with essentially zero delay, and with stability and efficiency exceeding what the client had asked for. The client was satisfied, then gave us a CDN project, and we still have a cooperative relationship today.
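The split / scatter / gather / reassemble flow described above can be sketched in a few lines. This is a minimal illustration, not the system we built: the segment format, the `transcode_segment` stub, and the worker count are all assumptions, and a real pipeline would shell out to an external transcoder (such as ffmpeg) per segment, which is why a thread pool suffices.

```python
# Sketch of the split / parallel-transcode / reassemble pipeline.
# transcode_segment is a placeholder: a real system would invoke an
# external tool (e.g. ffmpeg) to convert the segment's bytes to MP4.
from concurrent.futures import ThreadPoolExecutor


def transcode_segment(segment):
    """Stand-in for per-segment transcoding (any format -> MP4)."""
    data, index = segment
    return (index, data.upper())  # uppercasing stands in for conversion


def parallel_transcode(segments, workers=4):
    # Scatter segments across workers, gather the results, then reorder
    # by original index so the output reassembles in the right order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(transcode_segment, segments))
    results.sort(key=lambda r: r[0])
    return "".join(data for _, data in results)


# Usage: split a pretend upload into 2-byte segments tagged with offsets.
example = [("abcdefgh"[i:i + 2], i) for i in range(0, 8, 2)]
merged = parallel_transcode(example)  # segments merge back in upload order
```

The index carried alongside each segment is what makes the "transcode as segments arrive" variant possible: segments can finish out of order on different machines, and the final sort restores the stream.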
The second is a bit sensitive. As a paper, the project could be titled "On the trajectory and impact point of a moving aerial target." It was assigned by XXX; in essence it was about shooting down missiles with shells, presumably to see whether it could be done at all and to demonstrate technical feasibility. At the time we were all still in one unit. Because our field of research has nothing to do with aerodynamics, we were basically illiterate in that subject, knowing only a limited set of terms and mathematical formulas; but that suited XXX's requirements, since we did not need to understand the advanced theory, and they sent people to work on the project with us. Our job was to take the data collected from a large number of sensors (they never made this clear, but any earthling could tell the things were radar), including wind speed and direction at each altitude, surface readings, and various other indicators (these instantaneous three-dimensional readings add up to a very large volume of data), distribute it across a big pile of computers (100 nodes were prepared for testing), and produce the computed result quickly; it had to be faster than the target moving through the air. This is really much like what Spark and Storm do today: rely on large memory, high-performance CPUs, and a high-speed network, skip the bottleneck of the hard disk, and compute fast. The project took a long time, mostly spent in cycles of communicating with the collaborators, improving, then communicating and improving again. When it finally finished, tests were reportedly run and the feasibility was judged very high; the top leadership was satisfied and even held a celebration banquet, though I never saw it or drank any of it. As for how they actually did the calculation, I never understood it in the end; I asked once and was told it was a secret.
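The in-memory scatter/gather pattern described above can be sketched as follows. Everything here is assumed for illustration: the reading format (a pair of wind components) and the aggregate (a mean wind vector) are stand-ins, since the real model was a secret even to us. The point is the shape of the computation: each node reduces its share of readings to a tiny partial result in RAM, and only those partials cross the network, with the disk never touched.

```python
# Scatter/gather sketch of the "compute faster than the target moves" job:
# partition the sensor readings across nodes, reduce each partition in
# memory, and combine the small partial results at the end.
from concurrent.futures import ThreadPoolExecutor


def partial_sum(partition):
    # Each node collapses its share of readings into one small tuple.
    sx = sum(r[0] for r in partition)
    sy = sum(r[1] for r in partition)
    return sx, sy, len(partition)


def mean_wind(readings, nodes=4):
    # Round-robin partitioning stands in for distributing data to nodes.
    chunks = [readings[i::nodes] for i in range(nodes)]
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(partial_sum, chunks))
    sx = sum(p[0] for p in partials)
    sy = sum(p[1] for p in partials)
    n = sum(p[2] for p in partials)
    return sx / n, sy / n


# Usage: four hypothetical (wind_x, wind_y) readings reduced in parallel.
avg = mean_wind([(1, 2), (3, 4), (5, 6), (7, 8)])
```

This is exactly the reduce shape that Spark and Storm schedule automatically; here the pool of threads plays the role of the 100 test nodes.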
The third, involving the "two barrels of oil" (the big state oil companies), concerned shale gas. The background: a large natural gas field had been found in Fuling, Sichuan (the place also known for its preserved mustard). Since, by natural law, oil and gas are symbiotic, the oil company collected shale samples to analyze whether there were oil resources underground in the area and, if so, whether they were worth extracting. An analysis program already existed, developed by the Exxon Mobil oil company and running on a CRAY-1 vector machine. That machine was a supercomputer of the 1970s and 80s and by now belongs in a museum, and some of the underlying data-analysis theory was also outdated and needed adjustment. The oil company was unhappy that the program was too slow: a single computation took anywhere from several hours to several days to produce results. This was again a collaboration: the oil company supplied technical experts familiar with production, and we supplied the parallel-computing code farmers; after about six months of cooperation it was completed. This time we used a new parallel algorithm, which improved efficiency by a large margin. Like the missile project before it, the whole thing began silently and ended silently, and we never learned what came after.