Passerby A: Brother Ginger Juice, I hear your column is selling really well?
Not bad, thanks for the kind words.
Passerby A: So you made a little money and ran off? The last time I saw you post an article feels like n years ago.
I didn't run, I didn't run. I'm working day and night on the manuscript for "Network Workers 2.0 promotion strategy-0 basic Primer Ansible/python".
Passerby A: You really know how to put on an act....
After pondering for a long time, all I could come up with was this rather silly opening with some advertising mixed in. I've been busy with the column lately and haven't had time to update the blog, so my apologies, friends.
Today, I want to talk to you about a topic that has been with our trade for many years.
SNMP is dead!
Whoa, whoa, wait, don't rush to throw your axe at me, brother. I'm not the one saying this; other people are.
I know you have a deep relationship with SNMP, and your whole network is monitored by SNMP.
If it went on strike, your boss would probably show up at your door, and you would probably have to carry a laptop to the server room on your wedding night.
But whether you like it or not, plenty of people have already turned against SNMP and are bent on killing it.
The purpose of my coming today is to tell you what other people think of SNMP and how they plan to kill it.
After all, there is no smoke without fire; there must be a reason.
Let's not be too quick to object, and see whether what they say makes sense.
If it doesn't, throw the axe then.
First: Inaccurate data
SNMP is a query-based model: the network management system periodically sends SNMP query messages to network devices or servers, asking:
Hey, how are you?
What's your situation? How much interface traffic, how much CPU, what's the memory usage, and so on?
Just like the dorm matron doing her rounds in college: every so often, I'll come by and pester you.
But these queries have a time interval. Typically we configure 5 minutes, that is, 300 seconds.
If you are looking at a whole day, or a few hours, 5 minutes is really short.
So everything looks fine and perfect.
But not always. Take a traffic monitoring platform built on something like Cacti as an example.
Say a customer complains that the network was slow during a certain period, with packet loss.
The engineer checks the monitoring platform: no problem, the interface traffic on our monitoring platform is very stable.
No congestion in sight.
So at this point, is the customer being unreasonable, or is the engineer lying?
In fact, both of them are right.
Let's look at:
(Ginger Juice, you really need to work on your Windows Paint skills. This is not just ordinarily ugly.)
In the figure, the green line is the bandwidth usage the monitoring system thinks it sees, the yellow line at the top represents the interface bandwidth, and the fluctuating line represents the real-time traffic.
I probably don't have to spell it out; you have likely figured it out.
Yes: SNMP got its first value when it queried 5 minutes ago, and the second query happened to return the same value as the first.
So from SNMP's point of view, the occupied interface bandwidth did not change at all within those 5 minutes.
But real user traffic is as choppy as waves.
You never know when a burst of data will show up, and "burst" means exactly that: it is not sustained, it appears suddenly and briefly.
However, this burst of traffic will still cause the network interface to drop packets.
See, for example, the several spikes in the figure.
But inside the monitoring system, all is calm and peaceful.
The example above is a bit extreme, since a perfectly flat traffic line on a monitoring platform is unlikely.
But a smooth line that hides sudden bursts of traffic does actually happen.
For example, here is another case:
In the figure, the blue line, unfortunately, is still the SNMP query data.
And the red line is the data spit out by a certain monitoring protocol.
You can see that the red line is very close to the real traffic.
Where the red line is circled in bold, some failure caused traffic to plummet.
However, the periodic SNMP query does not see these details.
In its eyes, it is always a silky smooth line.
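To make the averaging effect concrete, here is a minimal Python sketch. The link speed, the steady rate, and the burst positions are all made-up numbers for illustration, not measurements from the figures. It shows how a poller that only sees the counter difference over 300 seconds reports a modest average while the link was briefly saturated.

```python
# Hypothetical numbers: a 1000 Mbit/s link observed over one 5-minute polling window.
LINK_MBPS = 1000
POLL_INTERVAL_S = 300


def simulate_traffic():
    """Return 300 per-second rates in Mbit/s: steady 200, plus three 5-second line-rate bursts."""
    rates = [200] * POLL_INTERVAL_S
    for start in (60, 150, 240):
        for t in range(start, start + 5):
            rates[t] = LINK_MBPS
    return rates


def polled_average(rates):
    """What a poller derives from two counter reads: total traffic divided by elapsed time."""
    total_megabits = sum(rates)  # each sample covers exactly one second
    return total_megabits / len(rates)


rates = simulate_traffic()
avg = polled_average(rates)
peak = max(rates)
saturated_seconds = sum(1 for r in rates if r >= LINK_MBPS)

print(f"5-minute average seen by the poller: {avg:.0f} Mbit/s")
print(f"Real peak: {peak} Mbit/s, seconds at line rate: {saturated_seconds}")
```

In this toy window the poller sees 240 Mbit/s on a 1000 Mbit/s link, so the graph looks healthy even though the interface spent 15 seconds at line rate, which is exactly when packets get dropped.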
Second: Thankless effort
As mentioned above, because SNMP queries periodically, a lot of details get missed.
Some of you are already curling your lips into a wicked smile.
Isn't that easy to solve? Just shorten the SNMP query interval.
Say, 1 minute; or if you want to go all out, 30 seconds works too.
This is what they call "the boss moves his mouth, the workers run their legs off".
I'm sure many ops friends have seen the CPU of a network device spike on a regular schedule.
It is very regular, and lasts just a few minutes each time.
And as it happens, the network management server is perfectly in sync with it; the two resonate.
You go high, I go high.
Check it out, and it's just one process: SNMP.
Needless to say, either there are too many monitoring systems, with one system querying one part and another system querying another part,
and the network device simply can't take it.
Or there is only one monitoring system, but it queries too many things, so every query basically turns the device upside down.
Since all these queries are handled by the network device's routing engine, how could the CPU not be high?
Therefore, you cannot make the queries too frequent.
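A quick back-of-the-envelope calculation shows why shortening the interval hurts. This is only a sketch: the device count and OID count are invented, and it assumes one request per OID, which ignores any batching of several values into one request.

```python
def queries_per_second(devices, oids_per_device, interval_s):
    """Average requests per second the poller must issue, assuming one request per OID."""
    return devices * oids_per_device / interval_s


# Hypothetical estate: 1000 devices with 500 polled OIDs each.
for interval in (300, 60, 30):
    rate = queries_per_second(1000, 500, interval)
    print(f"interval {interval:>3}s -> {rate:,.0f} requests/s")
```

Going from 300 seconds to 30 seconds multiplies the load by ten on both sides: the poller has to send ten times as many requests, and every routing engine has to answer ten times as often.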
Third: Not reliable
So much for SNMP queries; SNMP trap messages have problems too.
In general, we use UDP to carry SNMP messages, and you know what UDP is like.
When nothing goes wrong, fine. When something does go wrong, it drops the packet on the spot, and the key point is that it doesn't even tell you the packet was lost. Questionable character.
For most protocols that's OK, but not for SNMP traps.
If an interface goes down, the network device sends the trap once and only once; that trap message is the one and only shot.
And UDP drops it without mercy.
After the loss, the network device dusts itself off and says: well, I sent it.
The network management system says: I didn't see it, I don't know anything.
Who's the unlucky one in the end?
The ops engineer, need I say.
So the network world has its own state-owned-enterprise style of passing the buck, too.
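The finger-pointing above can be simulated in a few lines of Python. This is a toy model, not real SNMP code: the 5% loss rate is invented, `send_with_ack` is a hypothetical sender that resends until something gets through, and the model ignores the possibility that the acknowledgement itself is lost.

```python
import random


def deliver(loss_rate, rng):
    """Simulate one UDP datagram: True if it arrives, False if it is silently dropped."""
    return rng.random() >= loss_rate


def send_trap_once(loss_rate, rng):
    """Fire-and-forget, as with a trap: one attempt, no feedback to the sender."""
    return deliver(loss_rate, rng)


def send_with_ack(loss_rate, rng, max_attempts=5):
    """Hypothetical acknowledged sender: retry until one attempt arrives or attempts run out."""
    return any(deliver(loss_rate, rng) for _ in range(max_attempts))


rng = random.Random(42)  # fixed seed so the run is repeatable
events = 10_000
lost_once = sum(not send_trap_once(0.05, rng) for _ in range(events))
lost_acked = sum(not send_with_ack(0.05, rng) for _ in range(events))

print(f"fire-and-forget: {lost_once} of {events} events never reached the manager")
print(f"ack and retry:   {lost_acked} of {events} events never reached the manager")
```

With a single unacknowledged datagram per event, roughly one event in twenty simply vanishes, and neither side ever finds out.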
Here is another problem I ran into myself: a monitoring platform server managing thousands of devices at the same time.
SNMP trap messages from different time periods flood into the monitoring platform server, but when its internal SNMP process handles them, a bug in some open source software limits concurrency, so the traps queue up in the server's internal software queue, waiting to be processed.
Then a funny scene appeared: a network device had gone down 2 hours earlier, while the monitoring staff in the management center were happily eating hot pot and singing songs, until someone rushed into the office and said, "Our network is down, what's going on?"
Nothing's wrong! Look at the monitoring platform: all green lights, how beautiful.
Two hours later, someone finally shouted: the device is down!
That brings us back to the question itself: suppose an important interface goes down right now. How do you catch it with SNMP?
A. Adjust the query interval to query every second?
B. Wait for the SNMP trap message?
Of the two above, which one would you choose?
Fourth: Not fully compatible
Have you ever run into the following scenario:
First thing in the morning, before doing anything else, you are already on Baidu.
Keyword: MIB library for such-and-such device?
Or, keyword: SNMP OID to query such-and-such value on such-and-such device.
These things are really annoying.
And how did it get solved in the end?
Alas, how else? Type commands on the command line and collect the output.
If you can program, write a program to run the commands and collect the output.
If you are the boss, find an engineer who can write code to write a program that runs the commands and collects the output.
Fifth: The inhuman OID values
Let me ask you a question: do you know what this is?
Answer: an SNMP OID value.
Which OID value?
If you say: this refers to the interface operational state in IF-MIB, ifOperStatus,
congratulations, you qualify for a visit to the research center for abnormal humans.
I'm sure you have played with snmpwalk: walk a device and all you get is a pile of inhuman language, dense numbers.
How can you feel good working with that?
I dare not say more; the more I say, the more enemies I make. After all, many people, me included, still rely on SNMP, and if you treat it badly it may well go on strike on you.
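To show what "translating from inhuman language" looks like, here is a small Python sketch. The three column OIDs and the status codes come from the standard IF-MIB; the `describe` helper and the sample value are made up for illustration, and a real tool would load the full MIB instead of a hand-written table.

```python
# A few well-known IF-MIB column OIDs and their symbolic names.
IF_MIB_NAMES = {
    "1.3.6.1.2.1.2.2.1.2": "ifDescr",
    "1.3.6.1.2.1.2.2.1.7": "ifAdminStatus",
    "1.3.6.1.2.1.2.2.1.8": "ifOperStatus",
}

# The first three values IF-MIB defines for ifOperStatus.
IF_OPER_STATUS = {1: "up", 2: "down", 3: "testing"}


def describe(oid, value):
    """Translate a numeric OID instance and value into something a human can read."""
    column, _, index = oid.rpartition(".")  # last number is the interface index
    name = IF_MIB_NAMES.get(column, column)  # fall back to the raw numbers if unknown
    if name == "ifOperStatus":
        value = IF_OPER_STATUS.get(value, value)
    return f"{name}.{index} = {value}"


print(describe("1.3.6.1.2.1.2.2.1.8.3", 2))
```

Without the lookup table, all you have is `1.3.6.1.2.1.2.2.1.8.3 = 2`, which is why so many mornings start with a search engine.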
In summary, SNMP has indeed hit a bottleneck in today's network environment,
especially now that networks keep growing in scale.
So the line should read:
SNMP is still alive, but in fact it is already dead.
What to do?
Change from pull to push.
Can we change the angle and turn the traditional approach, where the monitoring system "pulls" data from the network devices, into one where the network devices actively "push" data to the monitoring system?
For example, collecting device state with SNMP is a pull method, known as a query.
This forces the network device to respond passively: it does not know when an SNMP query will fly in, and when one does, it has to allocate resources to process it.
However, looked at from a different angle, if the device reports proactively, the problem is solved.
With active reporting, the network device holds the initiative, and the developer can tune the device's resource utilization and load according to actual operating conditions.
To make this easier to read, here is a simple comparison of the two:
Needless to say, after this head-to-head, apart from flexibility, where passive query wins, the active "push" approach has a huge advantage in every other respect.
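The difference between the two models fits in a toy Python sketch. The `Device` class, the traffic values, and the 3-second polling interval are all invented for illustration and do not represent any real protocol; the point is only who decides when data moves.

```python
class Device:
    """Toy network device whose traffic rate changes every second."""

    def __init__(self):
        self.subscribers = []  # callbacks registered by collectors (push mode)
        self.rate = 0

    def tick(self, rate):
        """Advance one second; in push mode the device reports on its own schedule."""
        self.rate = rate
        for callback in self.subscribers:
            callback(rate)

    def query(self):
        """Pull mode: answer whenever the manager happens to ask."""
        return self.rate


traffic = [200, 200, 1000, 200, 200, 200]  # hypothetical Mbit/s, one value per second

# Pull: the manager polls every 3 seconds and sees only what is current at poll time.
pull_device, pulled = Device(), []
for second, rate in enumerate(traffic):
    pull_device.tick(rate)
    if second % 3 == 0:
        pulled.append(pull_device.query())

# Push: the device streams every sample to a subscribed collector.
push_device, pushed = Device(), []
push_device.subscribers.append(pushed.append)
for rate in traffic:
    push_device.tick(rate)

print("pulled:", pulled)
print("pushed:", pushed)
```

The poller ends up with two identical samples and never sees the burst in the third second, while the subscribed collector receives every value, including the burst.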
Future trend: Streaming Telemetry
The name sounds very cool: streaming telemetry.
In fact, simply put, it is a way of implementing the "push" described above.
How do you do this "push" efficiently?
Streaming telemetry has the following features:
1. Data reporting based on the data plane
Traditional SNMP, whether query or trap, is handled by the routing engine, that is, the control plane.
However, with vendor support, streaming telemetry can embed code at the line card ASIC level and export real-time data directly from the line card.
The data exported by the line card is sent at line rate, leaving the routing engine in the upper layer free to focus on protocol processing and route computation.
As shown in the following:
2. High scalability
Building on the data plane approach in point 1, the scalability of streaming telemetry is greatly enhanced.
For example, the graph below is a CPU utilization graph (device model unknown).
In general, CPU utilization hovers around 8%.
Yet this device is configured with streaming telemetry active reporting.
Guess what it has been reporting?
Here's the data:
- Reported once every 15 seconds
- Over 60 types of indicators reported
- Contains more than 500 reporting types
- 176 10-Gigabit interfaces: input and output statistics, error counts, QoS queue counters.
- Each interface covers both IPv4 and IPv6 data types.
- And finally, byte and packet counts for 200 MPLS LSPs.
Too scary; next to this, SNMP looks instantly feeble.
Remember the red line in the earlier figure, the data spit out by "a certain protocol"?
Needless to say, you have already figured it out.
That was the data streaming telemetry spit out.
3. Natural fit for DevOps and operations automation
Thanks to two advantages, streaming telemetry plugs naturally into currently popular technologies such as operations automation.
On the one hand, the data collected by a streaming telemetry monitoring platform is close to real-time, so DevOps and operations automation engineers can do many different things with it, such as automatically adjusting data forwarding paths based on current traffic data and SDN.
On the other hand, the data formats and models used by streaming telemetry are today's popular standards, such as JSON, NETCONF, and YANG models.
So, simply put, this is a tool and technology that adapts to the times.
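To show why a standard format matters to automation folks, here is a small Python sketch that reads one telemetry message as JSON. The message layout, the device name, and the interface name are entirely hypothetical; real messages depend on the vendor and the transport. The paths are written in the style of a YANG model path purely for illustration.

```python
import json

# Hypothetical telemetry message; real layouts depend on the vendor and transport.
message = """
{
  "device": "edge-router-1",
  "timestamp": 1700000000,
  "updates": [
    {"path": "/interfaces/interface[name=xe-0/0/0]/state/counters/in-octets", "value": 123456789},
    {"path": "/interfaces/interface[name=xe-0/0/0]/state/oper-status", "value": "UP"}
  ]
}
"""


def flatten(raw):
    """Turn one JSON message into a {path: value} dictionary for easy lookup."""
    record = json.loads(raw)
    return {update["path"]: update["value"] for update in record["updates"]}


metrics = flatten(message)
for path, value in metrics.items():
    print(path, "=", value)
```

Compare this with the wall of numbers from snmpwalk: the path tells you what the value means, and any language with a JSON library can consume it without hunting for a MIB file.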
Currently there are two options for streaming telemetry technology.
One is sFlow.
The other is OpenConfig Telemetry.
(Already deployed at Google: 30% of vendor devices have streaming telemetry turned on, with millions of updates per second.)
Both of the above have been adopted by many vendors.
For example, both can be configured on Cisco and Juniper devices.
Interested friends can go and read the official configuration documents.
This article is just a first heads-up.
If you are interested in sFlow or OpenConfig,
please leave a comment, and my next article will go into the details.
Having said so much, let me finish with some personal thoughts.
In the last 5-6 years, the computer networking industry has changed tremendously.
All kinds of new technologies are emerging, blossoming, and competing.
And as I keep coming into contact with these new technologies, I am not only moved; more importantly, I feel a constant sense of crisis.
Therefore, I hope to build a small information bridge with my limited time and energy. Whether what holds you back is English or something else, let's work together toward the coming future.
By the way, a small promotion:
What if you don't know what JSON, NETCONF, and YANG models are?
What if you want to learn automation?
Or you just want to find a group of like-minded good xxx (the original word was "buddies"; xxx is the sanitized version) to talk about network technology with, instead of joining a group that goes dead after a while.
Well, I think my column, "Network Workers 2.0 promotion strategy-0 basic Primer Ansible/python" will meet all of your above requirements.
Join us and meet the future.
Finally, I'll end with a lyric from Cui Jian: "It's not that I don't understand, it's that the world changes fast". Happy National Day.
SNMP is dead: Streaming Telemetry technology