Recent discussions on crowdfunding projects

Source: Internet
Author: User
Tags: openkm

OpenKM

I'd like to ask: is there an open source file management system where everyone can upload files, but only the administrator can download other people's files?

I don't know whether OpenKM can do this.

OpenKM is an open source electronic document management system. It adapts well to both large companies and small and medium-sized enterprises, and it offers a more flexible and less expensive alternative for handling knowledge management.

The interface is as follows:

[Image: screenshot of the OpenKM interface]

Zongtui Project

Project Address:
(Distributed crawler) http://git.oschina.net/zongtui/zongtui-webcrawler
(Deduplication filter) https://git.oschina.net/zongtui/zongtui-filter
(Text classifier) https://git.oschina.net/zongtui/zongtui-classifier
(Documentation) https://git.oschina.net/zongtui/zongtui-doc


Also, a recommended article: Deep Learning vs. Machine Learning vs. Pattern Recognition

http://www.itd4j.com/cloudcomputing/15538.html

Automating deployment

Is there a recommended automated deployment tool for Java?

Sometimes modifying just a few files means repackaging, releasing, and restarting all over again, which is a lot of trouble. Is there a better way?

Jenkins is an open source software project designed to provide an open, easy-to-use platform that makes continuous integration of software possible.
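As a rough illustration of how it removes the manual repackage-release-restart cycle, a minimal Jenkins declarative pipeline could rebuild and redeploy on every change. The polling interval, the Maven build, and the deploy.sh script below are assumptions for the sketch, not details from the original discussion.

// Jenkinsfile: a minimal sketch, assuming a Maven project and a deploy script available to the build node
pipeline {
    agent any
    triggers {
        pollSCM('H/5 * * * *')             // check the repository for changes roughly every 5 minutes
    }
    stages {
        stage('Build') {
            steps {
                sh 'mvn -B clean package'  // repackage the application
            }
        }
        stage('Deploy') {
            steps {
                sh './deploy.sh target/app.jar'  // hypothetical script: copy the jar and restart the service
            }
        }
    }
}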

Cloud Crawl

The idea is that if a client has, say, 1000 items to crawl, it can submit them to the server, and the server then assigns them to other clients to crawl. This gives relatively high overall performance and also helps get around per-IP limits.

Whatever a user wants to crawl, the users do the crawling themselves; the server is only responsible for receiving tasks, assigning tasks, and returning results.

In effect, it amounts to a free IP pool.
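A minimal sketch of that division of labor, assuming a hypothetical TaskServer class; none of these names or methods come from the Zongtui code, they only illustrate the submit / assign / report split.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: the server only receives tasks, hands them out, and collects results.
// The actual crawling happens on the clients, each using its own IP.
public class TaskServer {

    private final Queue<String> pendingUrls = new ConcurrentLinkedQueue<>();
    private final Map<String, String> results = new ConcurrentHashMap<>();

    /** A client submits the URLs it wants crawled (e.g. its 1000 items). */
    public void submit(List<String> urls) {
        pendingUrls.addAll(urls);
    }

    /** Another client asks for a batch of work; the server only assigns, it never crawls. */
    public List<String> assign(int batchSize) {
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            String url = pendingUrls.poll();
            if (url == null) break;
            batch.add(url);
        }
        return batch;
    }

    /** The crawling client reports the fetched page back to the server. */
    public void report(String url, String pageContent) {
        results.put(url, pageContent);
    }
}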

New Project Architecture

After discussion, the project's new architecture has been revised as follows:

[Image: diagram of the new project architecture]

This way, the focus becomes how to integrate existing crawlers, because there are already plenty of crawlers out there and there is no need to build something new.

The core idea, for reference:

[Image: diagram of the core idea]

The next processing steps

1. Crawl pages according to the configured rules;

2. Define the storage scheme for the pages;

3. Analyze content attributes from the page material;

4. Generate results from the content attributes;

5. Learn from the results;

6. Generate content from the results (see the sketch after this list);
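A rough sketch of these six stages as Java interfaces; the names are purely illustrative and not taken from the Zongtui repositories.

import java.util.List;
import java.util.Map;

// Illustrative only: one small interface per stage of the pipeline.
interface Crawler       { List<String> crawl(String rule); }             // 1. crawl pages by rules
interface PageStore     { void save(String url, String page); }          // 2. store the raw pages
interface Analyzer      { Map<String, String> analyze(String page); }    // 3. extract content attributes
interface ResultBuilder { String build(Map<String, String> attrs); }     // 4. generate results from attributes
interface Learner       { void learn(List<String> results); }            // 5. learn from the results
interface Generator     { String generate(List<String> results); }       // 6. generate new content from results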

To explain why we should use existing crawlers instead of writing our own, let me give two examples.

1. WebMagic


As far as I know, the author has been working on it for two years and has run into basically every kind of problem already. There is no need to walk that road again; if something is missing, we can help improve it through the extension interfaces it exposes, or implement that part ourselves. As for performance concerns, as far as I know there is no authoritative domestic comparison of the various crawlers.
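For context, WebMagic's usual entry point is to implement a PageProcessor and hand it to a Spider. The sketch below uses the library's public API, but the start URL, the XPath, and the link regex are placeholder values, not something from this discussion.

import us.codecraft.webmagic.Page;
import us.codecraft.webmagic.Site;
import us.codecraft.webmagic.Spider;
import us.codecraft.webmagic.processor.PageProcessor;

// Minimal WebMagic crawler: extract page titles and follow blog links (placeholder URL and patterns).
public class BlogProcessor implements PageProcessor {

    private final Site site = Site.me().setRetryTimes(3).setSleepTime(1000);

    @Override
    public void process(Page page) {
        page.putField("title", page.getHtml().xpath("//title/text()").toString());
        page.addTargetRequests(page.getHtml().links().regex(".*/blog/\\d+").all());
    }

    @Override
    public Site getSite() {
        return site;
    }

    public static void main(String[] args) {
        Spider.create(new BlogProcessor())
              .addUrl("http://my.oschina.net/flashsword/blog")   // placeholder start URL
              .thread(5)
              .run();
    }
}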

2. Nutch is an open source search engine implemented in Java. It provides all the tools we need to run our own search engine, including full-text search and a web crawler.

Nutch's founder is Doug Cutting, who is also the founder of the Lucene, Hadoop, and Avro open source projects.

I think there are only a couple of possible reasons not to use it:

1. The use case doesn't fit.

2. We don't understand it and haven't taken the time to understand it.

So I don't think it is necessary to reinvent the wheel.

Personalized recommendations

The user-facing side is about data presentation. My understanding is that the main work on the content side is: collection, curation, recommendation, tags, scores (multiple), recommendations, likes, dislikes, reply counts, and type (image-and-text, video, text, Weibo, etc.).

The user side is much more advanced: individual relationship profiles, relationship profiles for different social circles, tag scores by main audience segment, age, gender, occupation, special events, tag scores for liked content, tag scores for collected content, and tag scores for other content interactions (negative values or other scores).

The recommendation engine's main work: match content against the user's tag scores, combined with geographic location (current and usual), the current time of day (morning, midday, afternoon, evening), the current date (holiday, weekend), and time-sensitive hot topics when choosing tags. This is the SNS approach; Toutiao is now largely driven by user relationships and the associated user data.
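As a toy illustration of "match content against the user's tag scores", a content item's score could simply be the sum of the user's weights for its tags, with context (holiday/weekend, location) applied as multipliers. The weights and boost factors below are invented for the example and are not the project's actual formula.

import java.util.List;
import java.util.Map;

// Toy illustration: score = sum of the user's tag weights over the content's tags,
// multiplied by simple context boosts. All numbers are invented for the example.
public class TagMatcher {

    public static double score(Map<String, Double> userTagScores,
                               List<String> contentTags,
                               boolean isHolidayOrWeekend,
                               boolean matchesUserLocation) {
        double base = 0.0;
        for (String tag : contentTags) {
            base += userTagScores.getOrDefault(tag, 0.0);   // unknown tags contribute nothing
        }
        double boost = 1.0;
        if (isHolidayOrWeekend)  boost *= 1.2;   // invented holiday/weekend boost
        if (matchesUserLocation) boost *= 1.1;   // invented location boost
        return base * boost;
    }

    public static void main(String[] args) {
        Map<String, Double> user = Map.of("java", 2.0, "crawler", 1.5, "sports", -0.5);
        double s = score(user, List.of("java", "crawler"), true, false);
        System.out.println(s);   // (2.0 + 1.5) * 1.2 = 4.2
    }
}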

As long as we complete the preliminary recommendation function, the rest basically relies on the operations side accumulating data. Without enough data, accuracy cannot be guaranteed. For example, Yidian Zixun has by now moved over almost all of its content, but its recommendations are still quite poor; it mainly relies on editors picking headline recommendations and manually adding a score boost. Otherwise the recommendations would feel rather stiff.

This article is from the "Skyme" blog; please be sure to keep this source: http://skyme.blog.51cto.com/447319/1640887
