Python Interview Questions (III)

Source: Internet
Author: User

1. What is the difference between POST and GET?
  1. According to the HTTP specification, GET is generally used to retrieve or query a resource, so it should be safe (it does not change server state) and idempotent. POST is typically used to create or update a resource.
  2. GET passes its data in the URL query string, which is part of the request line. POST passes its data in the request body.
  3. GET can carry less data, because browsers and servers limit URL length. POST can carry a large amount of data; the body size is generally not restricted by default.
  4. GET is less private than POST: its parameters appear in the URL, in browser history, and in server logs. Neither method encrypts anything by itself, so confidential data also needs HTTPS. GET is often slightly more efficient to execute than POST.
Recommendations:
  1. Because GET exposes its parameters, submit confidential information with POST.
  2. Use GET for data queries, and POST for adding, modifying, or deleting data.
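Where the data travels in each method can be sketched with the standard library alone. The endpoint https://example.com/search and the parameter names are placeholders, not part of the original answer, and nothing is sent over the network: the code only builds the two request objects so they can be compared.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and parameters, used only for illustration.
BASE_URL = "https://example.com/search"
params = {"q": "python", "page": 1}


def build_get(url, params):
    # GET: the parameters are encoded into the URL query string.
    return Request(url + "?" + urlencode(params), method="GET")


def build_post(url, params):
    # POST: the same parameters travel in the request body as bytes.
    return Request(url, data=urlencode(params).encode("utf-8"), method="POST")


get_req = build_get(BASE_URL, params)
post_req = build_post(BASE_URL, params)

# Inspect where the data ended up in each request object.
print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

The GET request has no body and a longer URL, while the POST request keeps the URL clean and carries the encoded parameters in its data attribute.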
2. What is the difference between HTTP and HTTPS protocols?
HTTP (Hypertext Transfer Protocol) carries information between a web browser and a web server. It sends content in clear text and provides no data encryption, so anyone who intercepts the traffic between the browser and the server can read it directly. That makes plain HTTP unsuitable for important, sensitive information such as credit card numbers, passwords, and payment verification codes.
HTTPS was created to close this security gap. To secure data in transit, HTTPS adds the SSL (Secure Sockets Layer) protocol underneath HTTP. SSL relies on certificates to verify the identity of the server and encrypts the communication between the browser and the server, so an attacker who captures the traffic cannot read it. This protects both the site's and the user's information.
The main differences:
  1. HTTPS requires a certificate issued by a CA (certificate authority). Free certificates are comparatively few, so this often costs money.
  2. HTTP transmits information in clear text; HTTPS is a secure, SSL-encrypted transport protocol.
  3. HTTP and HTTPS connect in different ways and use different default ports: 80 for HTTP and 443 for HTTPS.
  4. An HTTP connection is simple and stateless. HTTPS is built from SSL plus HTTP and provides encrypted transmission and identity authentication, which makes it more secure than HTTP.
3. What is the relationship between a domain name and an IP address, and how can you see all the IP addresses a domain name resolves to?
There are millions of hosts on the Internet. To tell them apart, each host is assigned a special address as its identity, called an IP address. Because an IP address is purely numeric and hard to remember, DNS (Domain Name System) was introduced on the Internet. When you type a domain name, the request first reaches a server that provides name resolution for that domain, which resolves the name to the IP address of the corresponding website. This process is called domain name resolution.
To see the IP addresses behind a domain name, you can use:
  1. ping (shows one resolved address)
  2. nslookup
  3. Online webmaster tools, and so on.
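The same lookup can be done from Python. The following is a minimal sketch using the standard socket module; the function name resolve_all is made up for this example, and the result depends on whatever the local resolver returns, so it may not list every address a large site uses.

```python
import socket


def resolve_all(hostname):
    """Return the unique IP addresses the system resolver reports for hostname."""
    # Restricting proto to TCP avoids one duplicate entry per socket type.
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    addresses = []
    for family, socktype, proto, canonname, sockaddr in infos:
        ip = sockaddr[0]  # first field is the address for both IPv4 and IPv6
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Calling resolve_all with a domain name returns a list of IPv4 and IPv6 address strings, much like the address section of nslookup output.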
4. What does the Keep-alive field do in the HTTP protocol header?

HTTP uses a request-response model. In normal (non-keep-alive) mode, the client and server create a new connection for every request/response and disconnect as soon as the exchange completes (HTTP is a connectionless protocol). In keep-alive mode (also known as persistent connection or connection reuse), the client-to-server connection stays open, so a subsequent request to the same server avoids establishing or re-establishing a connection. Keep-alive reduces the number of TCP connections that have to be set up, which also means fewer connections in the TIME_WAIT state. This improves performance and raises the throughput of the HTTP server (httpd), because fewer TCP connections mean fewer system calls such as socket accept() and close().
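Connection reuse can be observed locally without touching the Internet. The sketch below is an illustration under stated assumptions: it starts a throwaway HTTP/1.1 server on the loopback interface (Handler and demo_keep_alive are names invented for this example), sends several requests over one http.client connection, and records the client-side port each time. If the connection is really reused, only one port is ever seen.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    # HTTP/1.1 keeps the connection open unless one side asks to close it.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        # Content-Length lets the client know where the response ends
        # without the server having to close the connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def demo_keep_alive(n_requests=3):
    """Send n_requests over one connection; return the set of client ports used."""
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    client_ports = set()
    try:
        for _ in range(n_requests):
            conn.request("GET", "/")
            response = conn.getresponse()
            response.read()  # drain the body before reusing the connection
            client_ports.add(conn.sock.getsockname()[1])
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return client_ports
```

Opening a fresh HTTPConnection per request instead would normally produce a different client port each time, which is the extra connection setup that keep-alive avoids.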
5. What is the robots protocol?
The robots protocol (also known as the crawler protocol, crawler rules, or robot protocol) is implemented by the robots.txt file. Through it, a site tells search engines which pages may be crawled and which may not. The robots protocol is a code of conduct for the web, intended to protect site data and sensitive information and to keep users' personal information and privacy from being violated. Because it is not enforced, it relies on search engines to comply voluntarily.
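A crawler that wants to comply can check the rules with the standard urllib.robotparser module. In this sketch the robots.txt content, the crawler name MyCrawler, and the example.com URLs are all hypothetical; the rules are parsed from a string so no network access is needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    # parse() accepts the file's lines, so the rules can come from any source.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)
blocked = parser.can_fetch("MyCrawler", "https://example.com/private/data.html")
allowed = parser.can_fetch("MyCrawler", "https://example.com/index.html")
print(blocked, allowed)
```

For a live site, RobotFileParser also offers set_url() and read() to download the file before calling can_fetch().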
6. List several common relational and non-relational databases (at least two of each type).
Relational: Oracle, MySQL, SQL Server, DB2
Non-relational: Redis, MongoDB, Cassandra
7. What is a memory leak? How do you avoid it?
A memory leak is a situation in which a program, through negligence or error, fails to release memory that is no longer in use. The memory does not physically disappear; rather, the application allocates some memory and then, because of a design error, loses control of it, so the memory is wasted. The consequences range from the program slowing down to the system crashing.
In Python, a circular reference between objects that define __del__() is the classic culprit: before Python 3.4 the garbage collector could not reclaim such cycles. Removing a reference with del obj when you no longer need an object helps prevent leaks. The gc module gives details about objects that could not be reclaimed, and sys.getrefcount(obj) returns an object's reference count, which you can check for unexpectedly high values. The result is one higher than you might expect because the call itself temporarily holds a reference, so it is never 0 for an object you can still name.

Python's memory management mechanism

1. Reference counting
Python tracks objects in memory with an internal reference counter that records how many references point to each object. The count is created along with the object. When the count drops to 0, the object is no longer needed and becomes eligible for reclamation.
The reference count increases when:
  1. The object is created: x = 4
  2. Another alias is created: y = x
  3. The object is passed as an argument to a function: foo(x)
  4. The object becomes an element of a container: a = [1, x, '33']
The reference count decreases when:
  1. A local reference leaves its scope, for example when foo(x) above returns.
  2. An alias of the object is explicitly destroyed: del x or del y
  3. An alias of the object is rebound to another object: x = 789
  4. The object is removed from a container: myList.remove(x)
  5. The container itself is destroyed: del myList, or the container leaves its scope.

2. Garbage collection
  1. Reference counting: every object has an ob_refcnt field that holds its reference count. When an object ..., ob_refcnt is incremented; when a reference to the object is deleted, ob_refcnt is decremented. When ob_refcnt reaches zero, the object's memory is freed.
  2. Mark and sweep: this resolves circular references. It runs on demand: when free memory runs out, the collector starts from the references held in registers and on the stack, traverses the objects reachable from them and marks each one, then frees the objects left unmarked.
  3. Generational collection: this improves the efficiency of garbage collection. Memory is divided into sets according to how long objects have survived. Each set is called a "generation", and the frequency of collection decreases as a generation's survival time increases. Python defines three generations by default; the higher the generation number, the longer its objects have survived.

3. Memory pool mechanism
In Python, most allocations are for small blocks of memory, which leads to a large number of malloc and free calls. To manage the allocation and release of small blocks, Python introduces a memory pool mechanism called pymalloc. It keeps memory that is no longer in use in the pool instead of returning it to the operating system.
  1. When a request is smaller than 256 bytes, PyObject_Malloc allocates from the memory pool; when it is larger than 256 bytes, PyObject_Malloc falls back to malloc. (The threshold is 512 bytes in current Python 3 releases.) The default can be changed by modifying the Python source code.
  2. Python objects such as integers, floating-point numbers, and lists each have their own private memory pools, which are not shared between types. If you allocate and free a large number of integers, the memory cached for those integers cannot be reassigned to floating-point numbers.
8. List several commonly used DOM parsing projects or plug-ins.
xml, libxml2, lxml, XPath
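As a small illustration of the xml package and XPath-style queries from this list, the sketch below uses the standard library's xml.etree.ElementTree, which supports a limited subset of XPath. The document content and the function names book_titles and title_by_id are invented for the example; lxml offers a similar interface with full XPath support.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only for illustration.
XML_TEXT = """\
<library>
  <book id="1"><title>First Title</title></book>
  <book id="2"><title>Second Title</title></book>
</library>
"""


def book_titles(xml_text):
    root = ET.fromstring(xml_text)
    # ".//book/title" selects every <title> directly under any <book>.
    return [node.text for node in root.findall(".//book/title")]


def title_by_id(xml_text, book_id):
    root = ET.fromstring(xml_text)
    # The [@id='...'] predicate filters elements by attribute value.
    node = root.find(f".//book[@id='{book_id}']/title")
    return None if node is None else node.text
```

find() returns None when nothing matches, which is why title_by_id checks the result before reading its text.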
9. What are the common anti-crawler mechanisms?
  1. Header-based anti-crawling: the site checks the request headers. Countermeasure: forge the headers.
  2. Behavior-based anti-crawling: the site watches how the client behaves. Countermeasure: vary the way data is crawled and simulate the behavior of ordinary users.
  3. Dynamic-page-based anti-crawling: the site loads its data dynamically. Countermeasure: trace the Ajax requests sent to the server and simulate those requests.
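The first two countermeasures can be sketched as follows. This is a minimal sketch, not a complete crawler: the header values, the example.com URLs, and the function names are hypothetical, and the fetch function is supplied by the caller so the code runs without network access.

```python
import random
import time
from urllib.request import Request

# Hypothetical header values; a real crawler would copy them from a browser.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://example.com/",
}


def build_request(url, headers=BROWSER_HEADERS):
    # Attach browser-like headers instead of urllib's default User-Agent.
    return Request(url, headers=dict(headers))


def polite_delay(min_s=1.0, max_s=3.0):
    # Randomized pause length so requests are not evenly spaced.
    return random.uniform(min_s, max_s)


def crawl(urls, fetch, sleep=time.sleep):
    # fetch is a caller-supplied function that takes a Request object,
    # for example urllib.request.urlopen.
    results = []
    for url in urls:
        results.append(fetch(build_request(url)))
        sleep(polite_delay())
    return results
```

Passing fetch and sleep as parameters keeps the pacing logic separate from the network code, which also makes it easy to try out with stand-in functions.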
10. How do you improve crawl efficiency?
  1. Crawling: take advantage of asynchronous I/O.
  2. Processing: use a message queue to build a producer-consumer model.

1. Stair problems
1.1 Given a staircase climbed from the bottom, where each move can cover 1 to n steps, how many ways are there to reach the top?
1.2 Given a staircase climbed from the bottom, where each move can cover 1 step or 2 steps, how many ways are there to reach the top?
1.3 Given a staircase climbed from the bottom, where each move can cover 1, 2, or 3 steps, how many ways are there to reach the top?

    # 1. Each move covers 1 or 2 steps.
    # With one step there is only one way: f(1) = 1.
    # With two steps there are two ways (one step at a time, or both at once): f(2) = 2.
    # With n > 2 steps: if the first move covers 1 step, n-1 steps remain, giving f(n-1) ways;
    # if the first move covers 2 steps, n-2 steps remain, giving f(n-2) ways.
    # So f(n) = f(n-1) + f(n-2).
    def walk_stairs(stairs):
        if stairs == 1:
            return 1
        if stairs == 2:
            return 2
        return walk_stairs(stairs - 1) + walk_stairs(stairs - 2)

    # 2. Each move covers 1 to n steps.
    # 1 step:  1 way
    # 2 steps: 2 ways
    # 3 steps: 4 ways
    # 4 steps: 8 ways
    # n steps: 2^(n-1) ways (each of the n-1 intermediate steps is either landed on or skipped)

    # 3. Each move covers 1, 2, or 3 steps.
    # 1 step:  2^(1-1) = 1 way
    # 2 steps: 2^(2-1) = 2 ways
    # 3 steps: 2^(3-1) = 4 ways
    # For n > 3: f(n) = f(n-1) + f(n-2) + f(n-3)
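All three variants follow the same recurrence, differing only in the largest move allowed. The function below is a sketch that generalizes them with a bottom-up table; the name count_ways and its max_step parameter are introduced here for illustration and are not part of the original answer. It also avoids the exponential running time of the plain recursive version.

```python
def count_ways(stairs, max_step):
    """Number of ways to climb `stairs` steps taking 1..max_step steps per move."""
    ways = [0] * (stairs + 1)
    ways[0] = 1  # one way to stand at the bottom: make no moves
    for i in range(1, stairs + 1):
        # The last move covered `step` steps, so it started at i - step.
        for step in range(1, min(i, max_step) + 1):
            ways[i] += ways[i - step]
    return ways[stairs]
```

With max_step=2 this reproduces f(n) = f(n-1) + f(n-2), with max_step=3 the three-term recurrence, and with max_step equal to the number of stairs it yields 2^(n-1).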
2. Given a character array whose characters come from a-z and 1-9, for example A B C 4 B 2 A C 1 1 3, find the first character that occurs only once.

    str_list = ['A', 'B', 'C', 4, 'B', 2, 'A', 'C', 1, 1, 3]

    def find_only_one(alist):
        # Return the first element that occurs exactly once, or None if there is none.
        for item in alist:
            if alist.count(item) == 1:
                return item
        return None
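Calling list.count inside the loop rescans the list for every element, which is quadratic in the worst case. A possible alternative, sketched here with the hypothetical name first_unique, counts everything once with collections.Counter and then makes a second pass in the original order.

```python
from collections import Counter


def first_unique(items):
    """Return the first element that occurs exactly once, or None."""
    counts = Counter(items)  # one pass to count every element
    for item in items:       # second pass preserves the original order
        if counts[item] == 1:
            return item
    return None
```

This requires the elements to be hashable, which holds for the strings and integers in the example list.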
3. Given an HTML text string, extract the href link address from this a tag: <a href="Prompt me this link address">sflkj</a>

    from bs4 import BeautifulSoup

    text = "<a href='Prompt me this link address'>sflkj</a>"
    the_html = BeautifulSoup(text, features='lxml')
    print(the_html.find('a').attrs['href'])
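If third-party packages are not available, the same extraction can be approximated with the standard library's html.parser. This is a sketch under that assumption; the class HrefCollector and the function extract_hrefs are names made up for the example.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(html_text):
    collector = HrefCollector()
    collector.feed(html_text)
    collector.close()
    return collector.hrefs
```

Unlike find('a'), which returns only the first match, extract_hrefs returns every link in document order.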
4. The following is single-threaded code; please rewrite it to be multi-threaded:

    start = "http://google.com"
    queue = [start]
    visited = {start}
    while queue:
        url = queue.pop(0)
        print(url)
        for next_url in extract_url(url):
            if next_url not in visited:
                queue.append(next_url)
                visited.add(next_url)
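Answer:

    from concurrent.futures import ThreadPoolExecutor

    start = "http://google.com"
    queue = [start]
    visited = {start}
    pool = ThreadPoolExecutor(10)

    def func(url):
        for next_url in extract_url(url):
            if next_url not in visited:
                queue.append(next_url)
                visited.add(next_url)

    while queue:
        url = queue.pop(0)
        pool.submit(func, url)
    pool.shutdown(wait=True)

Note: as written, the while loop can exit as soon as the queue is momentarily empty, before the worker threads have appended newly found URLs, and visited is checked and updated from several threads without a lock.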
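A thread-pool version has two things to watch: the main loop must not finish while workers are still discovering links, and the visited set is shared between threads. The following is one possible sketch that handles both by waiting on futures and guarding the set with a lock. The function crawl, its parameters, and the LINKS graph are assumptions made for illustration; extract_url is passed in so that a stand-in can replace real page fetching.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from threading import Lock


def crawl(start, extract_url, max_workers=10):
    """Visit every URL reachable from start; return the set of visited URLs."""
    visited = {start}
    lock = Lock()

    def visit(url):
        # Runs in a worker thread: return only the links not seen before.
        fresh = []
        for next_url in extract_url(url):
            with lock:  # the check-and-add must be atomic across threads
                if next_url not in visited:
                    visited.add(next_url)
                    fresh.append(next_url)
        return fresh

    with ThreadPoolExecutor(max_workers) as pool:
        pending = {pool.submit(visit, start)}
        while pending:
            # Block until at least one task finishes, then schedule its links.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                for next_url in future.result():
                    pending.add(pool.submit(visit, next_url))
    return visited


# Hypothetical link graph standing in for real pages.
LINKS = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
result = crawl("a", lambda url: LINKS.get(url, []))
print(sorted(result))
```

The loop ends only when no task is pending, so every discovered URL is processed regardless of how the threads are scheduled.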

  
