Why does data analysis generally use Java instead of the Hadoop, Flume, and Hive APIs to process related services?
Reply content:
Why does data analysis generally use Java instead of the Hadoop, Flume, and Hive APIs to process related services?
Isn't SQL used directly for data analysis, as with traditional relational databases?
If you want to analyze web server logs, PHP can do it. file() reads a file into an array with one element per line; you can then get the content of each column by splitting the line or by regular-expression matching. If the file is large, run the split command on it first and then process the pieces.
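The approach described above (read the log line by line, then pull the columns out with a regular expression) can be sketched roughly as follows. This is only an illustration, written in Python rather than PHP. It assumes the log is in Common Log Format; the pattern, the field names, and the parse_line / parse_log helpers are hypothetical names made up for this example.

```python
import re
from typing import Iterable, Iterator, Optional

# Assumed layout (Common Log Format), for example:
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line: str) -> Optional[dict]:
    """Return the named columns of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def parse_log(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per well-formed line, silently skipping malformed ones."""
    for line in lines:
        record = parse_line(line)
        if record is not None:
            yield record
```

An open file object can be passed straight to parse_log. Because Python reads the file one line at a time, the whole file is never held in memory, so a large log does not have to be split first in this variant.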
Personally, I think few people or companies actually face a "Big Data" scenario in which the data no longer fits in a database.
For some text data, using cat/find/grep/awk/sed/sort/uniq/cut/wc/split/xargs directly in the Linux shell is also a quick approach.
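A typical use of those tools is counting how often each value of one column occurs, in the style of an awk, sort, and uniq -c pipeline. For comparison, here is a minimal sketch of the same counting step in Python. The function name top_values and its parameters are hypothetical, and whitespace-separated columns are assumed.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_values(lines: Iterable[str], field: int = 0, n: int = 10) -> List[Tuple[str, int]]:
    """Count the values in one whitespace-separated column and return the n most common."""
    counts: Counter = Counter()
    for line in lines:
        columns = line.split()      # split on runs of whitespace, like awk's default
        if len(columns) > field:    # skip blank lines and lines missing that column
            counts[columns[field]] += 1
    return counts.most_common(n)
```

For a one-off question the shell pipeline is usually quicker to type. A script like this is mainly useful when the counting is one step in a longer analysis.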
Java is the most widely used language in commercial settings, so it naturally has more solutions for data analysis.
I don't think PHP does well outside of web development.
The Hadoop suite is a distributed computing framework. Most data analysis is performed on a single machine and does not need a distributed cluster to provide the computation.
Hadoop was developed in Java, and the earliest versions of Hadoop seem to have supported only Java and C/C++ (please correct me if this is wrong).
I think it comes down mostly to language and history. If Hadoop and similar tools had been developed in PHP, more data analysis would probably be done in PHP today.