Store logs with MongoDB

Source: Internet
Author: User

Recently we have been rethinking our architecture. One problem that still plagues our business systems is logging and log statistics. The situation is roughly as follows:

  1. We have many modules. Their log formats are similar, but each module writes to its own servers and directories.
  2. The logs contain a lot of data in key => value format.
  3. After a feature launches, the PM or whoever requested it asks for statistics and reports to track its usage. PMs generally do not write programs, so most of this statistics work lands on RD.
  4. The value of these statistics and reports declines over time. Eventually they have no value and nobody looks at them, yet the statistics programs keep running and need day-to-day maintenance, even after everyone has forgotten where they are deployed.
  5. Log storage takes up space, so old logs have to be deleted regularly.
  6. There are many mirrored web servers, each producing its own copy of the logs, and these have to be merged during processing.
  7. The web servers are sometimes reshuffled, and when a web server is retired its logs are usually lost.
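
Points 2 and 6 can be illustrated with a small sketch. It assumes a hypothetical line format, a timestamp followed by space-separated key=value pairs; the real format of these logs is not given here, and the names parse_log_line and merge_server_logs are invented for the example.

```python
import heapq


def parse_log_line(line):
    """Turn 'TIMESTAMP key=value key=value ...' into a dict (assumed format)."""
    timestamp, _, rest = line.strip().partition(" ")
    record = {"ts": timestamp}
    for pair in rest.split():
        key, sep, value = pair.partition("=")
        if sep:  # skip tokens that are not key=value
            record[key] = value
    return record


def merge_server_logs(*server_logs):
    """Merge per-server record lists, each already sorted by 'ts', into one stream."""
    return list(heapq.merge(*server_logs, key=lambda record: record["ts"]))


# Sample lines are made up; one list per web server.
web1 = [parse_log_line("2010-03-01T10:00:01 module=search uid=42 action=query")]
web2 = [parse_log_line("2010-03-01T10:00:00 module=login uid=7 action=submit")]
merged = merge_server_logs(web1, web2)
```

Because each record is already a dictionary with arbitrary keys, it could be stored as a document without first designing a table for it.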

I am lazy, and once my own work is done I tend to resent requests for this kind of statistics. Data mining generally depends on inspiration, so the requirements keep changing: today one number is needed, tomorrow another. The ideal situation would be that the PM writes the SQL and RD just loads all the data into a database. That often worked when a very capable PM was still on our team, but it no longer does. Another problem is that a relational database is not schema-free: the format has to be designed in advance, which cannot keep up with changing requirements.

Log statistics usually have the following features:

  1. Large data volume, possibly gigabytes of business data per day
  2. Frequent writes and infrequent reads (almost every page view generates several log records)
  3. Statistics can run as batch tasks and need not be real-time
  4. Absolute data consistency is not required

Given these characteristics, MongoDB is a suitable choice because:

  1. Schema free. You can add the required fields at any time.
  2. Good horizontal scalability, which eases worries about running out of storage space.
  3. Writes can be asynchronous, so logging need not add to request response time.
  4. A collection can be given a fixed size (a capped collection), for example 100 GB. Once the collection is full, MongoDB reuses the space by overwriting the oldest documents in insertion order.
  5. It supports general query conditions and aggregation and provides a JavaScript shell. PMs who are interested in data analysis can write statistics scripts themselves, which finally frees RD from this work.
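
The behaviour of a fixed-size collection in point 4 can be illustrated without a running server. The sketch below is a plain-Python simulation, not MongoDB's API: it bounds the collection by document count rather than by bytes, and the class name CappedLog is invented for the example. What it shares with a capped collection is that, once full, each insert discards the oldest document in insertion order.

```python
from collections import deque


class CappedLog:
    """In-memory stand-in for a capped collection (bounded by count, not bytes)."""

    def __init__(self, max_docs):
        # deque(maxlen=N) drops the oldest item automatically when full
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        self._docs.append(doc)

    def find(self, **conditions):
        """Return documents whose fields equal all given conditions, oldest first."""
        return [
            doc for doc in self._docs
            if all(doc.get(k) == v for k, v in conditions.items())
        ]


log = CappedLog(max_docs=3)
for i in range(5):
    # documents need not share a schema; odd-numbered ones carry an extra field
    doc = {"seq": i, "module": "search"}
    if i % 2:
        doc["latency_ms"] = 10 * i
    log.insert(doc)

# Only the three most recently inserted documents are still present.
remaining = [doc["seq"] for doc in log.find()]
```

The practical consequence for log storage is that nobody has to schedule deletion jobs: old entries fall out as new ones arrive.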

Cultivating RD's product awareness is a good thing, but counting product usage data does not really interest RD. Our department previously had a product that collected data from the various product lines, recorded it in a database, and presented reports, but its flexibility was very low. First, both sides had to agree on interfaces; second, we still had to compute the statistics ourselves, so it only saved effort on data presentation.

The current idea is to build a MongoDB cluster that stores business log data centrally, and then build a platform on top of MongoDB to handle general statistics requirements. Users would write tasks that run on the platform, all in one language, JavaScript. For a relatively small amount of data this is a good solution (compared with the logs on the retrieval side, our business systems produce little data; a few gigabytes in a day already counts as a lot). The main purpose is to solve the maintenance and management problems.
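
A statistics task of the kind described above might look like the following sketch. It is written in Python against plain dictionaries, rather than in JavaScript against a live cluster, so that it can run on its own; the field names (ts, feature) and the function daily_feature_counts are assumptions for illustration, not part of any existing platform.

```python
from collections import Counter


def daily_feature_counts(docs):
    """Count log documents per (day, feature); docs missing either field are skipped."""
    counts = Counter()
    for doc in docs:
        ts, feature = doc.get("ts"), doc.get("feature")
        if ts is None or feature is None:
            continue
        day = ts[:10]  # assumes ISO-8601 timestamps such as '2010-03-01T10:00:00'
        counts[(day, feature)] += 1
    return counts


# Made-up sample documents; note that they do not all have the same fields.
docs = [
    {"ts": "2010-03-01T10:00:00", "feature": "export", "uid": "7"},
    {"ts": "2010-03-01T11:30:00", "feature": "export", "uid": "42"},
    {"ts": "2010-03-02T09:15:00", "feature": "share"},
    {"ts": "2010-03-02T09:20:00"},  # no 'feature' field: ignored
]
report = daily_feature_counts(docs)
```

A task this small is easy to discard once the report stops being useful, which addresses the problem of forgotten statistics programs that keep running.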
