In big data analytics and mining, which programming languages are most used?

Source: Internet
Author: User


Tim Roy, I was here, too. (9 people agree)

Update to the answer:

I recommended R earlier, but later found it hard going at times, possibly because my own technique was not up to it. I would still recommend Python.

Python is not limited to data analysis; it has many other uses that can help broaden your horizons. As an introductory language, its simplicity, strict indentation, and rich third-party libraries also help beginners get off to a good start.

See also: "What good books on data analysis and data mining are worth recommending?" (Book Recommendation)
The books recommended by Shaoda are worth consulting. Many excellent textbooks use Python as their programming tool; for example, the most classic textbooks on machine learning and natural language processing all bear the Python mark.

Edited on 2016-07-07

Wang Z, Life is short, and it's wonderful.

Actually, JavaScript is good:
30% faster than Python for analysis
More efficient than Java for business development

Posted yesterday at 08:12

Fang, deep technical home, and the line and view ... (2 people agree)

Java. First, its ecosystem is an advantage; second, it handles computation and management on large-scale clusters of heterogeneous machines.

Posted on 2015-01-12

Nssimacer, CSU undergraduate/UCAS/yard animal/Wang Bobo ... (8 people agree)

This semester I took a "Data Mining" class. The teacher's recommended programming language was C++, followed by Java. We ourselves mostly use Python and Java, so I will only discuss Python and Java here; I have not used R so far. Personal experience:
    1. Python and Java are relatively quick to pick up and do not demand much programming skill; programming efficiency is high, and you can build a prototype more "elegantly" and faster;
    2. In terms of performance, Java's JIT optimization brings it close to C++;
    3. Both are highly portable across platforms;
    4. Python has more specialized and comprehensive library support for data mining, which is another reason for its high programming efficiency;
    5. Java has a natural advantage when dealing with Hadoop/Spark-based big data services, since Hadoop's support for Java is the most comprehensive.
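
As an illustration of the "quick prototype" point above, here is a minimal sketch that is not part of the original answer: it counts item pairs that frequently occur together, a simplified first step of association-rule mining, using only the Python standard library. The function name `frequent_pairs` and the sample baskets are invented for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that occur together in at least min_support transactions."""
    counts = Counter()
    for items in transactions:
        # Deduplicate and sort so each pair is counted once per transaction
        # and always appears in the same order.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical shopping baskets, invented for this example.
baskets = [
    ["milk", "bread", "eggs"],
    ["milk", "bread"],
    ["bread", "eggs"],
    ["milk", "eggs", "bread"],
]

result = frequent_pairs(baskets, min_support=3)
print(result)
```

A real data mining task would more likely rely on a dedicated library, but a prototype of this size is often enough to check whether an idea is worth pursuing.
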
Portal: Nine programming languages for big data processing - data mining and data analysis

Posted on 2015-01-10

Sun Minze, Programmer (5 people agree)

First of all, the Linux shell and awk, used as data processing and statistics tools, can cover more than 80% of data analysis needs, and they also run noticeably more efficiently than Python.
But Python's rich third-party libraries, including NumPy and pandas, make development easier and make processing more standardized.

Posted on 2015-01-13

Jay (4 people agree)

Every industry faces a flood of information. With browsing records and purchase data from tens of thousands of customers, processing the data in Excel is unrealistic; Excel falls far short of other statistical software in functionality. You should have some understanding of the following languages:
The advantage of the R language is that it is simple and easy to get started with.
Python combines R's capacity for fast, complex data mining with a more pragmatic language, and it quickly became mainstream. Python is simpler and more intuitive to learn than R, its ecosystem has grown incredibly fast in recent years, and it is more powerful than R in statistical analysis.
Most of today's data science is done in R, Python, Java, Matlab and SAS, but there are still gaps to fill, and the newcomer Julia has spotted this pain point. Julia is a high-level, incredibly fast and expressive language. It is much faster than R, has the potential to handle larger data than Python, and is easy to get started with.
Java does not have the same visualization capabilities as R and Python, nor is it the best tool for statistical modeling, but if you need to move beyond an earlier prototype and build a large system, Java is usually your most basic choice.

Scala is another JVM-based language. Like Java, it is increasingly the tool of choice for large-scale machine learning or high-level algorithms. It is expressive and capable of building reliable systems.

Matlab has stood the test of time. Despite its very high price, it is widely used in certain niches, including research-intensive machine learning, signal processing, and image recognition.


Posted on 2016-01-12

Sail, Discover beauty, share beauty. (13 people agree)

Data analysis requires being good with tools. There are currently 56 popular visualization tools around the world; let's look at what they are.

1. Excel
Excel, as an entry-level tool, is ideal for quickly analyzing data and for creating charts for internal use. However, its range of colors, lines, and styles is limited, which makes it difficult to create charts that meet the needs of professional publications and websites.

2. Google Chart API
Google Chart provides a perfect way to visualize data, offering a large number of ready-made chart types, from simple line charts to complex hierarchical tree maps. It also has built-in animations and user interaction controls.

3. D3
D3 (Data-Driven Documents) is another JavaScript library that supports SVG rendering. Beyond line and bar charts, D3 can produce a large number of complex chart styles, such as Voronoi diagrams, tree diagrams, circular clusters, and word clouds.

4. R
R is a language and environment mainly used for statistical analysis and graphics. Although R is primarily used for statistical analysis or for developing statistics-related software, it is also used for matrix computation. Its analysis speed is comparable to GNU Octave and even the commercial software Matlab.

If you need to make infographics rather than just data visualizations, http/ is one of the most popular options.

6. Processing
Processing is a signature tool for data visualization. You just write some simple code, which is then compiled into Java. Processing can run on almost any platform.

7. Leaflet
Leaflet is an open-source JavaScript library for developing mobile-friendly interactive maps.

8. OpenLayers
OpenLayers is probably the most reliable of all map libraries. Although its documentation is not perfect and the learning curve is steep, for specific tasks OpenLayers can provide special tools that are not available in other map libraries.

9. Polymaps
Polymaps is a map library aimed primarily at data visualization users. Polymaps is unique in how it styles maps, using an approach similar to CSS selectors.

10. Charting Fonts
Charting fonts combine symbols with typefaces (turning symbols into fonts) to create attractive vector icons.

11. Gephi
Gephi is a tool for visualizing social graph data. It can not only handle large-scale datasets but also serve as a visual network exploration platform for building dynamic, hierarchical data graphs.

12. CartoDB
CartoDB is a site not to be missed. It makes it very easy to link tabular data with maps, and for that purpose CartoDB is the best choice.

13. Weka
Weka is an excellent tool for classifying and clustering large amounts of data based on attributes. It is not only a powerful tool for data analysis but can also generate some simple charts.

14. NodeBox
NodeBox is an application for creating two-dimensional graphics and visualizations on OS X. You need to know Python to use it. NodeBox is similar to Processing but lacks Processing's interactive features.

15. Kartograph
Kartograph creates interactive maps without needing any map provider such as Google Maps. It consists of two libraries: a Python library that takes spatial data in open formats or from PostGIS, applies a vector projection, and outputs SVG; and a JavaScript library that turns that SVG data into interactive maps.

16. Modest Maps
Modest Maps is a small map library. Combined with extension libraries such as Wax, Modest Maps instantly becomes a powerful mapping tool.

17. Tangle
Tangle is an interactive tool for exploring, playing with, and instantly seeing updates to a document.

18. Crossfilter
Crossfilter is both a chart and an interactive graphical user interface applet: when you adjust the input range in one chart, the data in the other linked charts changes as well.
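
The linked-chart behavior described here can be sketched in a few lines. The following is only an illustration in Python, not Crossfilter's actual (JavaScript) API: the records, the function name `category_counts`, and the `amount_range` parameter are all assumptions made up for this example. Selecting a range on an "amount" chart changes the counts that a second, per-category chart would display.

```python
from collections import Counter

# Hypothetical records, invented for this example.
records = [
    {"amount": 5, "category": "food"},
    {"amount": 20, "category": "food"},
    {"amount": 35, "category": "travel"},
    {"amount": 50, "category": "travel"},
    {"amount": 12, "category": "books"},
]


def category_counts(rows, amount_range=None):
    """Count rows per category, optionally keeping only rows whose amount
    falls inside the inclusive (low, high) range selected in another chart."""
    if amount_range is not None:
        low, high = amount_range
        rows = [r for r in rows if low <= r["amount"] <= high]
    return dict(Counter(r["category"] for r in rows))


# Data for the category chart before and after a range is selected
# on the amount chart.
unfiltered = category_counts(records)
filtered = category_counts(records, amount_range=(10, 40))
print(unfiltered)
print(filtered)
```

The sketch recomputes the counts from scratch on every selection, which is fine for small data; it makes no attempt to reproduce how Crossfilter itself keeps such updates fast.
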

19. Raphael
Raphael is a JavaScript library for creating charts and graphics. Its biggest difference from other libraries is that its output formats are limited to SVG and VML.

20. jsDraw2DX
jsDraw2DX is a standard JavaScript library for creating any type of interactive SVG graphics. It can produce shapes including lines, rectangles, polygons, ellipses, arcs, and more.

21. Pizza Pie Charts
Pizza Pie Charts is a responsive pie chart library based on Adobe's Snap SVG framework. It uses HTML markup and CSS instead of JavaScript objects, which makes integration easier.

22. FusionCharts Suite XT
FusionCharts Suite XT is a cross-platform, cross-browser JavaScript charting component that gives you a delightful JavaScript charting experience. It is the most comprehensive charting solution, with 90+ chart types and numerous interactive features including 3D, various gauges, tooltips, drill-down, zoom and scroll, and more. It has complete documentation and ready-made demos to help you create charts quickly.

23. iCharts
iCharts provides a hosted solution for creating and presenting compelling charts. There are many different kinds of charts to choose from, each of which is fully customizable to fit the theme of the site. iCharts has interactive elements and can pull data from Google Docs, Excel spreadsheets, and other sources.

24. Modest Maps
Modest Maps is a lightweight, extensible, customizable, and free map display library that helps developers add interactive maps to their own projects.

25. Raw
Raw is built on the popular D3.js library and supports many chart types, such as bubble charts, maps, and ring charts. It lets data sets be transferred, copied, pasted, dragged, and deleted, and allows us to customize the view and hierarchy.

26. Springy
Springy is cool and simple in design. It provides an abstraction for graph manipulation and layout calculation, and supports Canvas, SVG, WebGL, and HTML elements.

27. Bonsai
Bonsai uses SVG as output to generate graphics and animations, and has a very complete graphics API that makes it easier to work with graphical effects. It also supports effects such as gradients and filters (grayscale, blur, opacity).

28. Cube
Cube is an open-source system for visualizing time series data, built on MongoDB, Node.js, and D3.js. Users can use it to build real-time visualizations of metrics for internal dashboards.

29. Gantti
Gantti is an open-source PHP class that helps users generate Gantt charts on the fly. Charts are created without JavaScript, using a pure HTML and CSS3 implementation. The default chart output looks good, but users can customize the style (via a SASS style sheet).

30. Smoothie Charts
Smoothie Charts is a very small charting library for dynamic streaming data. It displays real-time data streams pushed over a WebSocket. Smoothie Charts supports only Chrome and Safari, and does not support scale labels or pie charts; it is good at displaying streaming data.

31. Flot
Flot is an excellent line-chart plotting library that supports all canvas-enabled browsers (mainstream browsers such as Firefox, IE, and Chrome currently support it).

32. Tableau Public
Tableau Public is a desktop visualization tool that enables users to create their own data visualizations and publish interactive data visualizations to web pages.

33. Many Eyes
Many Eyes is a web application for creating, sharing, and discussing visualizations of user-uploaded data.

34. AnyChart
AnyChart is a flexible, cross-browser, cross-platform charting solution based on Flash/JavaScript (HTML5). In addition to charting, it also offers paid interactive charts and gauges.

35. Dundas Chart
Dundas Chart is an industry-leading .NET chart control. Microsoft acquired it in 2009 and integrated part of the charting product's functionality into Visual Studio.

36. TimeFlow
TimeFlow Analytical Timeline is a visualization tool for temporal data. It is currently available as an alpha version, so you may well find bugs. It provides several views: timeline, calendar, histogram, table, and so on.

37. Protovis
Protovis is a JavaScript tool for generating visual charts.

38. Choosel
Choosel is an extensible, modular Google Web Toolkit framework that can be used to create web-based visualization platforms that integrate data workbenches and infographics.

39. Zoho Reports
Zoho Reports supports a wide range of features to meet different users' individual needs, including SQL queries and a spreadsheet-like interface.

40. Quantum GIS (QGIS)
Quantum GIS (QGIS) is a user-friendly, open-source GIS client that supports the visualization, management, editing, and analysis of data and the production of printed maps.

41. NodeXL
The main function of NodeXL is social network visualization.

42. OpenStreetMap
OpenStreetMap is a map of the world, built by people like you, that is free to use under an open license.

43. OpenHeatMap
OpenHeatMap is easy to use and allows users to upload data, create maps, and exchange information. It can convert data (such as a Google Spreadsheet) into an interactive map application and share it online.

44. Circos
Circos was originally used mainly to visualize genomic sequence data and has since been applied in many fields, such as analyzing the relationships between characters in film and television works or analyzing the sources and flows of a logistics company's orders. Most relational data can be visualized with Circos.

45. Impure
Impure is a visual programming language designed to collect and process visual information.

46. Polymaps
Polymaps creates dynamic, interactive maps based on vectors and tiles.

47. Rickshaw
Rickshaw is a D3.js-based library for creating interactive time series charts.

48. Sigma.js
Sigma.js is an open-source lightweight library for displaying interactive static and dynamic graphs.

49. Timeline
Timeline shows events on a timeline, so users can see at a glance what they did and when.

50. BirdEye
BirdEye is a Declarative Visual Analytics library, a community project that aims to advance the design and development of open-source information visualization and visual analytics for Adobe Flex. The library is based on a declarative data set and lets users create multi-dimensional data visualization interfaces to analyze and present information.

51. Arbor.js
Arbor.js provides an efficient force-directed layout algorithm, abstractions for graph organization, and screen refresh handling.

52. Highchart.js
Highchart.js is a chart library written in pure JavaScript that provides an easy way to add interactive charts to your website or web application. It currently supports chart types such as line and spline charts.

53. Paper.js
Paper.js is an open-source vector graphics scripting framework that runs on the HTML5 Canvas. It is easy for beginners to learn, and also offers plenty of professional features for intermediate and advanced users.

54. Visualize Free
Visualize Free is a visual analysis tool built on a high-end commercial product from InetSoft. It can be used to sift through multivariate data and spot trends, or to slice and dice data using simple point-and-click methods.

55. GeoCommons
GeoCommons enables users to build rich interactive visualization applications to solve problems, even if they have no experience with traditional maps. You can visualize real social data, or the more than 50,000 open-source data sets saved on GeoCommons, on a map, create interactive visual analytics, and embed your work in websites, blogs, or social networks.

56. ECharts

Anyone who regularly uses open-source software should be familiar with ECharts; if you are not, that is fine too. But you should know that the Baidu big data products featured in CCTV's large-scale reports, both during last Spring Festival and more recently, such as Baidu Migration, Baidu Sinan, and Baidu Big Data Forecast, all implement their data visualization with ECharts.

At a time when foreign big data visualization companies such as Tableau, Datawatch, and Platfora are pushing into China, Chinese developers launched ECharts and made it open source. From this point of view, China's big data industry does not lag behind North America. ECharts also lets us see the future of big data visualization in China; thanks go to ECharts and the ECharts team.

Traditional data visualization tools simply combine data and offer the user different presentation methods for discovering relationships within it. In recent years, with the arrival of the cloud and big data era, data visualization products can no longer settle for extracting, summarizing, and simply displaying the data in a data warehouse. New data visualization products must meet the big data needs created by the explosive growth of the Internet: they must quickly collect, filter, analyze, summarize, and present the information decision makers need, and update it in real time as new data arrives. Therefore, in the age of big data, data visualization tools must have the following features:

(1) Real-time: the tool must keep up with the explosive growth in data volume, collect and analyze data quickly, and update the information in real time;

(2) Simple operation: the tool must be quick to develop with and easy to operate, to keep pace with the fast-changing information of the Internet era;

(3) Richer display: the tool needs richer presentation methods that fully meet the multi-dimensional requirements of data presentation;

(4) Support for multiple data integration methods: data sources are not limited to databases; the tool should support team collaboration data, data warehouses, text, and other sources, and be able to display results over the Internet.

Data visualization is still a new field that is developing rapidly, and demand for data visualization development, research, and analysis keeps growing in countries such as the United States. Enterprises obtain data visualization capabilities mainly through two kinds of tools: programming and non-programming. Mainstream programming tools fall into three types. For data visualization from an artistic perspective, a typical tool is Processing.js, a programming language for artists. From the perspective of statistics and data processing, R is a typical tool that handles both data analysis and graphics. For something in between, which handles data processing while also paying attention to presentation, D3.js is a good choice. JavaScript-based data visualization tools like D3.js are better suited to displaying data interactively on the Internet.



