Nine o'clock in the evening | How do I parse Web Access logs using Python?

Source: Internet
Author: User
Tags: log

Topic: How to parse Web access logs using Python

Contents

    • Python basics

      • String, dictionary, file, time

    • Web access logs

    • Hands-on practice

    • Questions

Main lecturer: KK

KK is a multi-language engineer who loves open source technology and enjoys picking up new skills. KK has 5 years of PHP and Python project development experience and has led teams through a number of small and medium-sized projects. KK has a strong interest in security, cloud, and related areas, and is experienced in web security development, performance optimization, and distributed application development and design. Conscientious and willing to share, KK is currently a lecturer for the Python practical class at 51reboot.com.

Every language has its use cases. A language is only suitable or unsuitable for a task, not good or bad. A language is a tool for describing how a computer should work; the ideas and algorithms behind the code are the foundation and the real focus.

String

A string represents text such as a name or a sentence.

A string is written by enclosing characters in single quotes, double quotes, triple single quotes, or triple double quotes.

Common string methods

    • split: split a string on a delimiter into a list

    • format: build a formatted string
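
As a minimal sketch of the two methods listed above, the snippet below splits a comma-separated string and then builds a formatted sentence from the result. The sample values and variable names are made up for illustration.

```python
# A sample string; the value is made up for illustration.
line = "python,java,go"

# split() cuts the string at each delimiter and returns a list.
parts = line.split(",")

# format() fills the {} placeholders with the given values, in order.
summary = "{} languages, first is {}".format(len(parts), parts[0])

# Triple quotes allow a string to span several lines.
multi = """first line
second line"""

print(parts)
print(summary)
print(multi.split("\n"))
```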

Dictionary definition
    • Definition

      • Enclosed in curly braces

      • Each element has the form key: value

      • Elements are separated by commas
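
The definition rules above can be sketched as follows. The dictionary contents (HTTP status codes and their descriptions) are only an illustrative choice.

```python
# A dict literal: curly braces, key: value pairs, commas between elements.
status_text = {200: "OK", 404: "Not Found", 500: "Internal Server Error"}

# Look up a value by its key.
ok = status_text[200]

# Assigning to a key adds a new element (or overwrites an existing one).
status_text[301] = "Moved Permanently"

# get() returns a default value instead of raising KeyError for a missing key.
unknown = status_text.get(999, "unknown")

print(ok, unknown, len(status_text))
```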
Practice

First, count the number of occurrences of each element in the list

languages = ['python', 'java', 'python', 'c', 'c++', 'go', 'c#', 'c++', 'lisp', 'c', 'javascript', 'java', 'python', 'matlab', 'python', 'go', 'java']

Tips:

Store the result in a dict in the form element: count. Walk the list from left to right and check whether each element is already in the dict. If it is not, store the element with a count of 1; otherwise add 1 to the count already stored for that element.
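
One possible solution that follows the tip step by step is sketched below. The function name `count_elements` and the shortened sample list are assumptions made for this example, not part of the original exercise.

```python
def count_elements(items):
    """Count how often each element occurs, using a plain dict."""
    counts = {}
    for item in items:
        if item not in counts:
            # First time this element is seen: start its count at 1.
            counts[item] = 1
        else:
            # Seen before: add 1 to the stored count.
            counts[item] += 1
    return counts


# A shortened sample list, used only to keep the example small.
sample = ["python", "java", "python", "go", "java", "python"]
result = count_elements(sample)
print(result)
```

The standard library's `collections.Counter` does the same job in one call; the explicit loop is shown because it mirrors the tip.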

Second, count the number of occurrences of each English letter in an article

    • article = "I was not delivered unto this world in defeat, nor does failure course in my veins. I am not a sheep waiting to being prodded by my shepherd. I am a lion and I refuse to talk, to walk, to sleep with the sheep. I'll hear not those who weep and complain, for their disease are contagious. Let them join the sheep. The slaughterhouse of failure is not my destiny."

    • Tip: determine whether each character is an English letter

    • if (element >= 'a' and element <= 'z') or (element >= 'A' and element <= 'Z')
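
A minimal sketch of this exercise, assuming that upper-case and lower-case letters are counted separately (the exercise does not say either way). The function name `count_letters` and the short sample sentence are made up for illustration.

```python
def count_letters(text):
    """Count each English letter in text; other characters are skipped."""
    counts = {}
    for ch in text:
        # Keep only a-z and A-Z; spaces, digits and punctuation are ignored.
        if ("a" <= ch <= "z") or ("A" <= ch <= "Z"):
            counts[ch] = counts.get(ch, 0) + 1
    return counts


sample = "I am a lion."
letter_counts = count_letters(sample)
print(letter_counts)
```

To count case-insensitively instead, call `count_letters(text.lower())`.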

Dictionary keys
    • A key must be an immutable data type

      • Number

        • Integer

        • Floating-point number

      • String

      • Boolean

      • List: not allowed

      • Tuple

        • Child elements must also be immutable: ("a", "b")

        • ("a", ["b"]): not allowed

      • Dictionary: not allowed
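
The key rules above can be checked directly in Python, which raises `TypeError` when a mutable (unhashable) object is used as a key. The helper `is_valid_key` below is a hypothetical name introduced for this sketch.

```python
# Immutable values work as keys.
valid = {
    2: "integer key",
    1.5: "float key",
    "name": "string key",
    True: "boolean key",
    ("a", "b"): "tuple key with immutable children",
}


def is_valid_key(candidate):
    """Return True if candidate can be used as a dictionary key."""
    try:
        {candidate: None}
    except TypeError:
        # Lists, dicts, and tuples containing them are unhashable.
        return False
    return True


print(is_valid_key(("a", "b")))    # tuple of strings
print(is_valid_key(["a"]))         # list
print(is_valid_key(("a", ["b"])))  # tuple containing a list
print(is_valid_key({}))            # dictionary
```

Note that `True` and `1` compare equal in Python, so they would share a single key; the example uses `2` to avoid that collision.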
What are the functions of the dictionary

File
    • The steps for working with a Word file on your computer

      • Find the file on the corresponding drive

      • Double-click to open the file (choosing a tool to view it with)

      • View or edit the file contents

      • Save the file if you edited its contents

      • Close the file
File operations
    • Open a file
      fhandler = open(path, mode, ...)
      • path is the file path
      • mode combines the open mode and the file type

      Mode    Open mode
      r       Read (default)
      w       Write
      x       Create and write
      a       Append
      r+      Read and write
      w+      Write and read
      x+      Create, write, and read
      a+      Append and read

      Mode    File type
      t       Text (default)
      b       Binary
    • Close a file
      fhandler.close()
    • Traverse the file contents
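
The open, close, and traverse steps above can be sketched as follows. The file name `demo.log` and its contents are made up, and a temporary directory is used only so the example does not touch real files.

```python
import os
import tempfile

# Hypothetical file name; a temporary directory keeps the example self-contained.
path = os.path.join(tempfile.mkdtemp(), "demo.log")

# "w" creates or truncates the file for writing (text is the default file type).
fhandler = open(path, "w")
fhandler.write("first line\nsecond line\n")
fhandler.close()

# "a" appends to the end of the existing file.
fhandler = open(path, "a")
fhandler.write("third line\n")
fhandler.close()

# Traverse the contents line by line; "with" closes the file automatically.
lines = []
with open(path, "r") as fhandler:
    for line in fhandler:
        lines.append(line.rstrip("\n"))

print(lines)
```

Iterating over the file object reads one line at a time, which matters for large log files that should not be loaded into memory at once.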
Time

Web Access Logs
    • A web access log is the record, written by the web server, of the requests made to a website

    • Log properties

      • When the request was made

      • Who made it

      • With what tool

      • By what method

      • What resource was accessed

      • What the result was (status code / size of the returned data)
Web access log formats
    • Common Log Format

127.0.0.1 - - [14/May/2017:12:45:29 +0800] "GET /index.html HTTP/1.1" 200 4286

Remote host IP, -, -, request time and time zone, method, resource, protocol, status code, bytes sent

    • Combined Log Format

127.0.0.1 - - [14/May/2017:12:51:13 +0800] "GET /index.html HTTP/1.1" 200 4286 "http://127.0.0.1/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36"

Remote host IP, -, -, request time and time zone, method, resource, protocol, status code, bytes sent, referer, browser information (user agent)
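
A minimal sketch of parsing one line in either format with a regular expression. It assumes the usual spacing and capitalization written by Apache-style servers (for example `GET /index.html HTTP/1.1` and `14/May/2017`); the pattern, the field names, and the function `parse_line` are assumptions made for this example, not the lecturer's code.

```python
import re
from datetime import datetime

# The trailing referer / user-agent pair is optional, so the same pattern
# accepts both the Common and the Combined Log Format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Example timestamp: 14/May/2017:12:45:29 +0800
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # Servers write "-" when no bytes were sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


common = '127.0.0.1 - - [14/May/2017:12:45:29 +0800] "GET /index.html HTTP/1.1" 200 4286'
record = parse_line(common)
print(record["ip"], record["time"].date(), record["status"], record["size"])
```

Lines that do not fit the pattern, such as malformed requests, return `None` so the caller can skip or log them.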

Web access log example

Hands-on practice
    • Compute the following statistics

      • Geo-location looked up from the IP

      • Traffic summed over the lines of each day's log, and total traffic (the sum of the daily traffic)

      • Number of occurrences of each status code

      • Number of distinct IPs per day, and the total number of distinct IPs (is this the sum of the daily distinct-IP counts?)

      • Number of log lines per day, and the total number of log lines (the sum of the daily line counts)

      • Hits per day and total hits

      • Visitors per day and total visitors

      • Overall status code distribution

      • Daily traffic size and total traffic size

      • Geographical distribution of visits and the TOP20 by number of visits
Run

Analysis
    • Statistics by day

      • Number of log lines per day

      • Number of visits per IP per day

      • Number of visitors per day = size of the set of IPs seen that day

      • Number of occurrences of each status code per day

      • Total traffic per day
    • Overall statistics

      • Total log lines = sum of the number of log lines per day

      • Total number of visitors = size of the set of all IPs seen
    • Geographical distribution

      • Sort all IPs by number of visits and take the TOP20

      • Look up the location for each IP
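
The analysis above can be sketched as one pass over already-parsed records. This is an illustration under stated assumptions, not the lecturer's code: each record is assumed to be a `(day, ip, status, size)` tuple, the function name `summarize` and the sample data are made up, and the IP-to-location lookup is left out because it needs an external IP database.

```python
def summarize(records):
    """records: iterable of (day, ip, status, size) tuples."""
    daily = {}
    for day, ip, status, size in records:
        stats = daily.setdefault(
            day, {"lines": 0, "ip_hits": {}, "status": {}, "traffic": 0}
        )
        stats["lines"] += 1
        stats["ip_hits"][ip] = stats["ip_hits"].get(ip, 0) + 1
        stats["status"][status] = stats["status"].get(status, 0) + 1
        stats["traffic"] += size

    # Overall statistics are built from the per-day results.
    total = {"lines": 0, "traffic": 0, "ip_hits": {}}
    for stats in daily.values():
        total["lines"] += stats["lines"]
        total["traffic"] += stats["traffic"]
        for ip, hits in stats["ip_hits"].items():
            total["ip_hits"][ip] = total["ip_hits"].get(ip, 0) + hits
    # Total visitors = number of distinct IPs over all days.
    total["visitors"] = len(total["ip_hits"])

    # TOP20 IPs by number of visits.
    top = sorted(total["ip_hits"].items(), key=lambda kv: kv[1], reverse=True)[:20]
    return daily, total, top


# Made-up sample records.
records = [
    ("2017-05-14", "10.0.0.1", 200, 4286),
    ("2017-05-14", "10.0.0.2", 404, 512),
    ("2017-05-14", "10.0.0.1", 200, 4286),
    ("2017-05-15", "10.0.0.1", 200, 100),
]
daily, total, top = summarize(records)
print(daily["2017-05-14"]["lines"], total["visitors"], top)
```

In this sample the daily visitor counts are 2 and 1, but the total number of visitors is 2, not 3: an IP that appears on several days is counted once overall, which is why the total is taken from the set of all IPs rather than from the sum of the daily counts.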
Code

Daily statistics

Overall statistics

Regional statistics

Print results

What else can I do?
    • Count the number of visits per URL per day

    • Access and traffic trend graph for the last 24 hours (at 5 to 10 minute granularity)

    • Daily browser distribution chart

    • Daily distribution chart of accessed documents

    • Daily traffic statistics for static files such as JS, CSS, and images

    • ......

    • Web charts: pie charts, line graphs, histograms, maps

    • ......

    • Attack detection using supervised machine learning
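
As one illustration of the trend idea in the list above, the sketch below groups requests into fixed 5-minute buckets and counts hits and traffic per bucket. The function names `bucket_start` and `trend` and the sample records are assumptions made for this example; filtering to the last 24 hours and drawing the graph are left out.

```python
from datetime import datetime


def bucket_start(moment, minutes=5):
    """Round a datetime down to the start of its N-minute bucket."""
    return moment.replace(
        minute=moment.minute - moment.minute % minutes, second=0, microsecond=0
    )


def trend(records, minutes=5):
    """records: iterable of (datetime, size) pairs.

    Returns a time-ordered list of (bucket_start, (hits, traffic)).
    """
    buckets = {}
    for moment, size in records:
        key = bucket_start(moment, minutes)
        hits, traffic = buckets.get(key, (0, 0))
        buckets[key] = (hits + 1, traffic + size)
    return sorted(buckets.items())


# Made-up sample records.
records = [
    (datetime(2017, 5, 14, 12, 45, 29), 4286),
    (datetime(2017, 5, 14, 12, 47, 10), 512),
    (datetime(2017, 5, 14, 12, 51, 13), 4286),
]
points = trend(records)
for start, (hits, traffic) in points:
    print(start, hits, traffic)
```

The resulting list of points can be handed to any charting tool to draw the access and traffic curves.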

Online live sharing
How to sign up: add the assistant (Xiao Yue) at 1902433859 with the note "Open Class" to join the live sharing group
