A rookie wants to play with search engines too: the story of me and search

Source: Internet
Author: User

1.1 Drawn to Java, starting from crawlers

This section was meant to be an article about Jobsearch, a simple search engine (it really is simple; experts can just pass by). But let me ramble for a while first, while I warm up.

If you had asked me a year ago what I did, I would have told you without hesitation that I was a C#er. I first came into contact with C# during the summer vacation of my first year, and over the next two years I started my life as a programmer with it: I used it to write my first website, to earn the first money of my life, and to stand out, favored by teachers and appreciated by classmates. Although the contest among Java, C#, and C++ has never been settled, I have always believed that any language, learned in enough depth, can lead to success.

If nothing unexpected had happened, I would have kept learning C# (in fact, I prefer to say .NET), gone off to an internship early in my senior year, and taken a C# job. But that is not how it went. At the end of my sophomore year, our studio took on a project: a company wanted to build a knowledge base, part of which was a document library like Baidu Library. It had no resources at hand and hoped we could help with the collection. To put it bluntly, we were to build a crawler for collecting documents.

At that time we had no idea what the word "crawler" meant. The teacher only gave us a program, written by a senior student, that crawled Amazon's book categories and information, and required a prototype within two weeks. There was no way out, so we braced ourselves and got on with it. I was full of energy then, checking materials, writing code, and testing all day. After two weeks of hard work the first version finally came out, and I even made an icon for it. The program ran on the server for a month over the winter vacation. When we came back after the New Year, we were pleasantly surprised by how many documents it had downloaded, even though it had crashed on the seventh day of running (this looks quite poor now: Jobsearch took less than 15 minutes to fetch 2K+).

Six months later, we were able to deliver more than 300 million valid documents each month. Unfortunately, the company soon suspended the service, reportedly because it had almost enough documents (it is said that payment was based on the number of documents delivered, and that they were also crawling themselves).

In this project I saw the charm of crawlers: it felt really good to take the massive amount of information on the Internet as your own (some people kept suggesting that I crawl adult sites; of course, as a gentleman, I declined). At that time something called the "cloud" was popular in China, followed by big data processing and data mining. Around the same time, the IT giant Google withdrew from mainland China, and the athlete Deng Yaping launched a People's Search. Even Xinhua News Agency and China Mobile called on the great gods and launched Pangu Search. I learned about the full-text search tool Lucene, open-source crawlers such as Heritrix and Larbin, and even the complete search engine Nutch. I also came to understand how much a search engine like Baidu or Google has to do to complete a seemingly simple query.

I think I was fated to meet these technologies: I fell in love with them as soon as I came into contact with them. Because of the inherent limitations of C#'s environment, there was not much mature open-source software on the .NET platform, so I tried to write a search engine in C# modeled on Java's open-source projects. It ended in failure: my skills were not up to it, yet I was in a rush to succeed. It is very painful to want something and not know how to get it. I read a lot of technical blogs, which were either brief overviews (those I understood) or very in-depth analyses (those I did not). I also read two related Chinese books, and felt I would have been better off reading blogs.

In this way, although I picked up a lot, my technical level improved little. By then half of my junior year had passed, and many people were starting to think about their future direction. I also began to worry that with C# it might be hard to find a job in this area. Since C# would not work, I would learn Java; at least I could learn to use the open-source software and study it. In addition, one teacher's words were very enlightening: your career goal should be to become an expert in a field, not an expert in a language. And so I gradually became a Javaer.

1.2 A gorgeous turn, a gloomy ending

This is exactly what worries me now: relying on my C# foundation (which was itself only average; I had basically forgotten reflection and proxies), I never read through the Java basics carefully. I thought the two languages were about the same and that I could just look things up when I got stuck. Writing code this way was easy, but the biggest drawback was a weak foundation: I did not understand the design behind many Java features. Even so, writing and reading code was no problem. During this period I learned Lucene and Heritrix. I was far from mastering them, but basic use was fine. The timing happened to be right, so my hands began to itch and I wanted some practice. I wrote a video-capture program for Letv, and the Jobsearch I am about to introduce.

I wrote the Letv video-capture program mainly because the campus network was so slow at the time, while Letv was very fast (packet capture showed that it had local servers, possibly on the education network). So I wanted to capture Letv's videos and set up a video server on campus. I managed to capture the pages and the video information, but got stuck on the videos themselves: the video address could be obtained directly by packet capture, but I could not work out how the playback page obtained it. I only knew that the address had to be contained in a string of garbled characters. I spent two weeks without finishing the analysis, and finally gave up because there were too many chores. Later, with nothing to do, I analyzed it again and suddenly found a base64 function in the JS file calendar referenced on the page ... sure enough, it was Base64 encoding. By then, however, Letv was no longer the strong option: a company had started rolling out an on-campus video client in colleges and universities. (Haha, this proved to me once again that the most powerful anti-download method is encrypting the video file itself. A few days ago I helped an outside company test a system and found that the videos on it were all plain links. We looked down on that; they could at least learn from Letv, whose approach would stop a lot of people.)
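The "string of garbled characters" can be illustrated with a minimal Java sketch. Everything in it is an assumption made for illustration: the class name `Base64AddressDemo`, the example URL, and the idea that the address was encoded as plain UTF-8 Base64 with nothing else applied. It is not Letv's actual scheme, only a demonstration of why Base64 hides nothing once you recognize it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64AddressDemo {

    // Encode a plain address the way a page script might before embedding it.
    // (Hypothetical helper; the real page's encoding steps are not known.)
    public static String encodeAddress(String address) {
        byte[] raw = address.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(raw);
    }

    // Decode a Base64 string found in the page back into readable text.
    public static String decodeAddress(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical video address, not a real Letv URL.
        String address = "http://example.com/video/123.flv";

        String garbled = encodeAddress(address);
        System.out.println("As seen in the page: " + garbled);
        System.out.println("After decoding:      " + decodeAddress(garbled));
    }
}
```

Because Base64 is an encoding and not encryption, anyone who spots it can reverse it with the standard library (`java.util.Base64`, available since Java 8), which is why the note above about real file encryption holds.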

Since the first project was stillborn, I could only start another. It was just when the class ahead of us was graduating and looking for jobs, and I wanted to crawl all the information from the mainstream recruitment websites and campus recruitment pages and build a search engine. Wouldn't that make it convenient for us to look up information when we graduated the following year? It could also take care of the course project along the way. Facts proved that the earlier accumulation was useful: it was finished in just three days. I was very proud at the time (this is the Jobsearch to be introduced), although it only covered 51job, and it crawled the list pages directly instead of crawling automatically from the home page.

I had planned to finish this program later, but certain things led me onto the road of the postgraduate entrance exam.

1.3 Taking the postgraduate entrance exam for the sake of a dream

In the second semester of my junior year, two friends who were good at technology decided, one after the other, to take the postgraduate entrance exam for the same reason: they hoped to do technical R&D after graduation and keep doing it. Given the current situation in China, the places where you can focus on R&D after graduation without worrying about your livelihood are basically only the large Internet companies and some research institutes. For a student at an ordinary second-tier school who is not especially gifted and cannot chew on mathematics and algorithms around the clock, getting in is difficult. (I want to state specifically that I am not saying postgraduates are necessarily better than undergraduates, but some things need fundamentals. Take search technology: the algorithms involved in word segmentation, clustering, ranking, retrieval, filtering, and traversal all require good mathematics and algorithm skills, and I think many people cannot even solve a simple statistical analysis problem when they graduate. Besides, whether we accept it or not, a job focused on algorithms usually requires a postgraduate degree or above.) It can be done by working very hard, but the postgraduate entrance exam is definitely a shortcut. I also realized that learning to use that open-source software and writing a few simple crawlers (a while ago I was chatting in a technical group and noticed something interesting: almost everyone working with Java seems to have written a crawler) is not enough to build a real search engine.

And so, with my family's support, I set out on the road of the postgraduate entrance exam. But this road was a tearful one, and maybe not only because of the studying. I had long been in the habit of visiting technical forums every day, and my hands itched if I went a few days without writing code. So I took on a small technical website, which ended up eating a lot of time (XX's interface changed every day; fortunately, the final pay was decent). It is also worth mentioning that I led a few people in playing LOL every day, which at least filled a gap in my university life. The final result was that none of us got in. One of them abruptly packed his suitcase and went north to do C++ (he has actually done really well; it is said he went straight to 7K).

1.4 Fallen and back up again, the dream remains

After the exam I had planned to go out and look for work, but I got dragged into a project: a management system with no technical difficulty, which nevertheless took a lot of time and tied up three people. During this time I was in low spirits: I had not been admitted, and I had missed the best time to look for a job. A few technically strong classmates went job hunting (they had plenty of practical project experience and no trouble writing ordinary programs) and hit a wall: they were tripped up by many basic questions. I asked myself honestly and found that I was not necessarily better than them. Around this time, the classmate mentioned above came back to finish up at school. Chatting with him, I found that he was not very satisfied with his company. The main reason was that he felt his colleagues were not very strong and the technical atmosphere was lacking. He also felt that the most important thing is the foundation: with training, almost anyone can write ordinary programs, but only a good foundation lets you develop to a higher level.

In fact, it was quite a blow at first. We had learned C# together, yet a big gap had opened between us, and the reason was that I had been too impetuous. Still, once you find your shortcomings, you can make up for them. None of this really has much to do with search, but I will stick with search all the way, even if it can only be a hobby.

As I said to a friend a few days ago: I am on my own now, and the only people I worry about are my family. As long as they are healthy, all that remains is for me to graduate and set out. As long as the dream is not shattered, the dream remains!
