The Internet as a whole is moving from the IT (information technology) era into the DT (data technology) era, and big data technology is opening the door to that world for businesses and the public alike. Today, "big data" means more than a measure of data volume. It marks a new era in the development of information technology. It stands for the challenges that explosive data growth poses to traditional computing and information technology, and for the new technologies and methods that big data processing requires. It also stands for the new inventions, services and development opportunities that big data analytics and applications bring. To help readers understand big data better, the cloud community has translated the Awesome Big Data resource list from GitHub for reference. The list collects big data frameworks, papers and other practical resources.

Resource list

Relational database management systems (RDBMS)
MySQL: the world's most popular open-source database.
PostgreSQL: the world's most advanced open-source database.
Oracle Database: object-relational database management system.

Frameworks
Apache Hadoop: distributed processing framework that combines MapReduce (parallel processing), YARN (job scheduling) and HDFS (distributed file system).
Tigon: high-throughput, real-time stream processing framework.
Distributed programming
AddThis Hydra: distributed data processing and storage system originally developed at AddThis.
AMPLab SIMR: runs Spark on Hadoop MapReduce v1.
Apache Beam: unified model and set of language-specific SDKs for defining and executing data processing workflows.
Apache Crunch: simple Java API for tasks such as joins and data aggregation that are tedious to implement on plain MapReduce.
Apache DataFu: collection of user-defined functions for Hadoop and Pig, developed by LinkedIn.
Apache Flink: high-performance runtime with automatic program optimization.
Apache Gora: in-memory data model and persistence framework.
Apache Hama: BSP (bulk synchronous parallel) computing framework.
Apache MapReduce: programming model for processing large data sets on a cluster with a parallel, distributed algorithm.
Apache Pig: high-level language for expressing data analysis programs on Hadoop.
Apache REEF: Retainable Evaluator Execution Framework, which simplifies and unifies the lower layers of big data systems.
Apache S4: stream processing framework, an implementation of S4.
Apache Spark: in-memory cluster computing framework.
Apache Spark Streaming: stream processing framework, part of Spark.
Apache Storm: stream processing framework from Twitter, which can also run on YARN.
Apache Samza: stream processing framework based on Kafka and YARN.
Apache Tez: YARN-based framework for executing complex DAGs (directed acyclic graphs) of tasks.
Apache Twill: YARN-based abstraction that reduces the complexity of developing distributed applications.
Cascalog: data processing and querying library.
Cheetah: high-performance, custom data warehouse on top of MapReduce.
Concurrent Cascading: data management and analytics framework on Hadoop.
Damballa Parkour: MapReduce library for Clojure.
Datasalt Pangool: alternative MapReduce paradigm.
DataTorrent StrAM: real-time engine for distributed, asynchronous, real-time in-memory big data computation, designed to run in as unblocked a way as possible with minimal overhead and performance impact.
Facebook Corona: Hadoop enhancement that removes the single point of failure.
Facebook Peregrine: MapReduce framework.
Facebook Scuba: distributed in-memory data store.
Google Dataflow: framework for creating data pipelines that support analytics.
Netflix PigPen: MapReduce for Clojure that compiles to Apache Pig.
Nokia Disco: MapReduce framework developed by Nokia for collecting, transforming and analyzing data.
Google MapReduce: MapReduce framework.
Google MillWheel: fault-tolerant stream processing framework.
JAQL: declarative programming language for working with structured, semi-structured and unstructured data.
Kite: set of libraries, tools, examples and documentation that make it easier to build systems on the Hadoop ecosystem.
Metamarkets Druid: framework for real-time analysis of large data sets.
Onyx: distributed cloud computing.
Pinterest Pinlater: asynchronous job execution system.
Pydoop: Python MapReduce and HDFS API for Hadoop.
Rackerlabs Blueflood: multi-tenant distributed metric processing system.
Stratosphere: general-purpose cluster computing framework.
Streamdrill: computes the activity of event streams over different time windows and finds the most active items.
Tuktu: easy-to-use platform for batch processing and stream computing, built with Scala, Akka and Play.
Twitter Scalding: Scala library for MapReduce jobs, built on Cascading.
Twitter Summingbird: streaming MapReduce with Scalding and Storm, from Twitter.
Twitter TSAR: time-series aggregator from Twitter.
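Several entries above (Apache MapReduce, Google MapReduce, Hadoop) rest on the same map, shuffle and reduce pattern. Below is a minimal single-process sketch of that pattern applied to word counting. The function names are hypothetical. Real frameworks distribute each phase across a cluster, spill intermediate data to disk and recover from failures, and none of that is modeled here.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every whitespace-separated, lowercased word."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key (the framework does this in practice)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(intermediate)
    # Sorting keys mimics the sorted order in which reducers receive them.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data beats ideas"]))
    # prints {'beats': 1, 'big': 2, 'data': 2, 'ideas': 2}
```

Each of the three phases only sees its own inputs, so map calls can run on different machines and reduce calls can run per key. That independence is what the frameworks in this section exploit.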
Distributed file systems
Apache HDFS: a way to store large files across multiple machines.
BeeGFS: formerly FhGFS, a parallel distributed file system.
Ceph Filesystem: software storage platform design.
Disco DDFS: distributed file system.
Facebook Haystack: object storage system.
Google Colossus: distributed file system (GFS2).
Google GFS: distributed file system.
Google Megastore: scalable, highly available storage.
GridGain: GGFS, a Hadoop-compatible in-memory file system.
Lustre file system: high-performance distributed file system.
Quantcast File System (QFS): open-source distributed file system.
Red Hat GlusterFS: scale-out network-attached storage (NAS) file system.
SeaweedFS: simple, highly scalable distributed file system.
Alluxio: reliable file sharing at memory speed across cluster frameworks.
Tahoe-LAFS: decentralized cloud storage system.

Document data model
Actian Versant: commercial object-oriented database management system.
Crate Data: open-source, massively scalable data store that requires zero administration.
Facebook Apollo: Facebook's Paxos-like NoSQL database.
JumboDB: Hadoop-based document-oriented data store.
LinkedIn Espresso: scalable, document-oriented NoSQL data store.
MarkLogic: schema-agnostic enterprise NoSQL database technology.
MongoDB: document-oriented database system.
RavenDB: transactional, open-source document database.
RethinkDB: document database that supports queries such as table joins and group by.

Key-map data model
Note: the industry uses the term "column database" for two different things, which causes some confusion. Some of the systems listed here are distributed, persistent databases built around the "key-map" data model: all data has (possibly composite) keys, and each key is associated with a map of key-value pairs.
In some systems, several such value maps can be associated with one key; these maps are called "column families" (and the keys inside a value map are called "columns"). Another group of technologies that can also be called "column databases" differs from the first in how it stores data on disk or in memory. Instead of the traditional approach, in which all the values for a given key are stored next to each other row by row, these systems store all the values of a column next to each other, so retrieving every value of a given column no longer requires reading through whole rows. The first group is referred to here as the "key-map data model"; the boundary between it and the key-value data model is rather blurry. The second group is defined more by storage format than by data model and is listed under column databases. To learn more about the distinction, read Daniel Abadi's blog post "Distinguishing Two Major Types of Column-Stores".
Apache Accumulo: distributed key/value store built on Hadoop.
Apache Cassandra: column-oriented distributed data store inspired by BigTable.
Apache HBase: column-oriented distributed data store inspired by BigTable.
Facebook HydraBase: HBase derivative developed by Facebook.
Google BigTable: column-oriented distributed data store.
Google Cloud Datastore: fully managed, schemaless database for storing non-relational data, built on BigTable.
Hypertable: column-oriented distributed data store inspired by BigTable.
InfiniDB: accessed through a MySQL interface; uses massively parallel processing to parallelize queries.
Tephra: transactions for HBase.
Twitter Manhattan: Twitter's real-time, multi-tenant distributed database.

Key-value data model
Aerospike: flash-optimized, in-memory NoSQL database. Open source; "the server code is written in C (not Java or Erlang) and precisely tuned to avoid context switching and memory copying."
Amazon DynamoDB: distributed key/value store, an implementation of the Dynamo paper.
Edis: protocol-compatible server replacement for Redis.
ElephantDB: distributed database specialized in exporting data from Hadoop.
EventStore: distributed time-series database.
GridDB: suited to storing sensor data as time series.
LinkedIn Krati: simple persistent data store with low latency and high throughput.
LinkedIn Voldemort: distributed key/value storage system.
Oracle NoSQL Database: distributed key-value database from Oracle Corporation.
Redis: in-memory key-value data store.
Riak: decentralized data store.
Storehaus: Twitter's library for working with asynchronous key-value stores.
Tarantool: efficient NoSQL database and Lua application server.
TiKV: distributed key-value database inspired by Google Spanner and HBase, written in Rust.
TreodeDB: replicated, sharded key-value store that provides multi-row atomic writes.

Graph data model
Apache Giraph: Pregel implementation based on Hadoop.
Apache Spark Bagel: Pregel implementation, part of Spark.
ArangoDB: multi-model distributed database.
Dgraph: scalable, distributed, low-latency, high-throughput graph database that aims to provide Google-production-level scale and throughput, with latency low enough to serve real-time user queries over terabytes of structured data.
Facebook TAO: distributed data store used widely at Facebook to store and serve the social graph.
GCHQ Gaffer: framework from GCHQ that makes it easy to store large-scale graphs in which nodes and edges carry statistics.
Google Cayley: open-source graph database.
Google Pregel: graph processing framework.
GraphLab PowerGraph: core C++ GraphLab API plus a collection of high-performance machine learning and data mining toolkits built on it.
GraphX: resilient distributed graph system on Spark.
Gremlin: graph traversal language.
Infovore: RDF-centric map/reduce framework.
Intel GraphBuilder: tools for building large-scale graphs on Hadoop.
MapGraph: massively parallel graph processing on GPUs.
Neo4j: graph database written entirely in Java.
OrientDB: document and graph database.
Phoebus: framework for large-scale graph processing.
Titan: distributed graph database built on Cassandra.
Twitter FlockDB: distributed graph database.

NewSQL databases
Actian Ingres: commercially supported, open-source SQL relational database management system.
Amazon Redshift: data warehouse service based on PostgreSQL.
BayesDB: statistics-oriented SQL database.
CitusDB: scales out PostgreSQL through sharding and replication.
Cockroach: scalable, geo-replicated, transactional data store.
Datomic: distributed database designed for building scalable, flexible, intelligent applications.
FoundationDB: distributed database inspired by F1.
Google F1: distributed SQL database built on Spanner.
Google Spanner: globally distributed semi-relational database.
H-Store: experimental main-memory, parallel database management system optimized for online transaction processing (OLTP) applications.
Haeinsa: linearly scalable multi-row, multi-table transaction library for HBase, based on Percolator.
HandlerSocket: NoSQL plugin for MySQL/MariaDB.
InfiniSQL: infinitely scalable RDBMS.
MemSQL: in-memory SQL database with an optimized flash-based column store.
NuoDB: SQL/ACID-compliant distributed database.
Oracle TimesTen In-Memory Database: in-memory, persistent and recoverable relational database management system.
Pivotal GemFire XD: low-latency, in-memory distributed SQL data store that provides a SQL interface to in-memory table data and can persist it to HDFS.
SAP HANA: in-memory, column-oriented relational database management system.
SenseiDB: distributed, real-time, semi-structured database.
Sky: flexible, high-performance analytics database for behavioral data.
SymmetricDS: open-source software for file and database synchronization.
MapD: GPU in-memory database, as well as a big data analytics and visualization platform.
TiDB: distributed SQL database inspired by the design of Google F1.
VoltDB: claims to be the fastest in-memory database.

Column databases
Note: see the comments under the key-map data model.
Columnar Storage: explains what a column store is and when to use one.
Actian Vector: column-oriented analytic database.
C-Store: column-oriented DBMS.
MonetDB: column-store database.
Parquet: columnar storage format for Hadoop.
Pivotal Greenplum: purpose-built, dedicated analytic data warehouse that offers a columnar engine as well as a traditional row-based one.
Vertica: designed to manage large, fast-growing volumes of data and to deliver very fast query performance when used as a data warehouse.
Google BigQuery: Google's cloud offering, backed by its pioneering work on Dremel.
Amazon Redshift: Amazon's cloud offering, also based on a columnar data store backend.

Time-series databases
Cube: uses MongoDB to store time-series data.
Axibase Time Series Database: distributed time-series database on top of HBase, with a built-in rule engine, data forecasting and visualization.
Heroic: scalable time-series database based on Cassandra and Elasticsearch.
InfluxDB: distributed time-series database.
KairosDB: similar to OpenTSDB but built on Cassandra.
OpenTSDB: distributed time-series database on HBase.
Prometheus: time-series database and service monitoring system.
Newts: time-series database based on Apache Cassandra.
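The note on the two meanings of "column database" can be made concrete with a toy in-memory layout. This sketch uses a hypothetical three-row table and contrasts a row-oriented layout, a column-oriented layout, and a BigTable/HBase-style key-map (row key, then column family, then column, then value). It is a deliberate simplification: real column stores add compression, encoding and on-disk blocks, and the family names used here are made up.

```python
from typing import Any

# Hypothetical example table: three records with the same three fields.
rows: list[dict[str, Any]] = [
    {"id": 1, "city": "Oslo", "temp": 3.5},
    {"id": 2, "city": "Lima", "temp": 19.0},
    {"id": 3, "city": "Pune", "temp": 27.5},
]


def to_row_layout(records: list[dict[str, Any]]) -> list[Any]:
    """Row store: all values of one record sit next to each other."""
    columns = list(records[0])
    return [rec[col] for rec in records for col in columns]


def to_column_layout(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Column store: all values of one column sit next to each other."""
    columns = list(records[0])
    return {col: [rec[col] for rec in records] for col in columns}


def to_key_map(
    records: list[dict[str, Any]], families: dict[str, list[str]]
) -> dict[str, dict[str, dict[str, Any]]]:
    """Key-map model: row key -> column family -> column -> value.

    This is a logical data model; it says nothing about physical layout.
    """
    return {
        str(rec["id"]): {
            family: {col: rec[col] for col in cols}
            for family, cols in families.items()
        }
        for rec in records
    }


def column_mean_row_layout(flat: list[Any], n_cols: int, col_index: int) -> float:
    # A row layout must stride over every record to collect one field.
    values = flat[col_index::n_cols]
    return sum(values) / len(values)


def column_mean_column_layout(layout: dict[str, list[Any]], col: str) -> float:
    # A column layout reads one contiguous list and nothing else.
    values = layout[col]
    return sum(values) / len(values)


if __name__ == "__main__":
    print(to_row_layout(rows))
    print(to_column_layout(rows))
    print(to_key_map(rows, {"info": ["city"], "metrics": ["temp"]}))
    print(column_mean_row_layout(to_row_layout(rows), 3, 2))
    print(column_mean_column_layout(to_column_layout(rows), "temp"))
```

Both averaging functions return the same number. The difference is how much unrelated data each layout has to touch to get there, which is the property column databases are built around.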
SQL-like processing
Actian SQL for Hadoop: high-performance interactive SQL access to all Hadoop data.
Apache Drill: interactive analysis framework inspired by Dremel.
Apache HCatalog: table and storage management layer for Hadoop.
Apache Hive: SQL-like data warehouse system for Hadoop.
Apache Optiq: framework for efficient query translation, including queries over heterogeneous and federated data.
Apache Phoenix: SQL driver for HBase.
Cloudera Impala: interactive analysis framework inspired by Dremel.
Concurrent Lingual: SQL-like query language for Cascading.
Datasalt Splout SQL: full SQL query engine for large datasets.
Facebook PrestoDB: distributed SQL query engine.
Google BigQuery: interactive analysis framework, an implementation of Dremel.
Pivotal HAWQ: SQL-like data warehouse system for Hadoop.
RainstorDB: database for storing petabyte-scale volumes of structured and semi-structured data.
Spark Catalyst: query optimization framework for Spark and Shark.
SparkSQL: manipulates structured data using Spark.
Splice Machine: full-featured SQL RDBMS on Hadoop with ACID transactions.
Stinger: interactive query for Hive.
Tajo: distributed data warehouse system on Hadoop.
Trafodion: enterprise-class SQL-on-HBase solution for big data transactional or operational workloads.
Data ingestion
Amazon Kinesis: real-time processing of large-scale data streams.
Apache Chukwa: data collection system.
Apache Flume: service for managing large volumes of log data.
Apache Kafka: distributed publish-subscribe messaging system.
Apache Sqoop: tool for transferring data between Hadoop and structured data stores.
Cloudera Morphlines: framework that helps with ETL into Solr, HBase and HDFS.
Facebook Scribe: streamed log data aggregator.
Fluentd: tool for collecting events and logs.
Google Photon: distributed system for joining multiple data streams in real time with high scalability and low latency.
Heka: open-source stream processing software system.
HIHO: framework for connecting disparate data sources with Hadoop.
Kestrel: distributed message queue system.
LinkedIn Databus: stream of change-capture events for a database.
LinkedIn Kamikaze: utility package for compressing sorted integer arrays.
LinkedIn Elephant: log aggregator and dashboard.
Logstash: tool for managing events and logs.
Netflix Suro: log aggregator like Storm and Samza, based on Chukwa.
Pinterest Secor: service that persists Kafka logs.
LinkedIn Gobblin: LinkedIn's universal data ingestion framework.
Skizze: sketch data store that uses probabilistic data structures to handle counting, sketching and related problems.
StreamSets Data Collector: continuous big data ingestion infrastructure with a simple-to-use IDE.
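Skizze is described above as using probabilistic data structures for counting. One widely used structure of that kind is the Count-Min sketch. The sketch below is a generic illustration, not Skizze's API or implementation. The width, the depth and the SHA-256-based per-row hashing are assumptions chosen for readability rather than speed.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counts in fixed memory (generic Count-Min sketch).

    With non-negative increments, estimate() never under-counts; it may
    over-count when different items collide in every row.
    """

    def __init__(self, width: int = 256, depth: int = 4) -> None:
        if width <= 0 or depth <= 0:
            raise ValueError("width and depth must be positive")
        self.width = width
        self.depth = depth
        # depth rows of width counters, all starting at zero.
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Salt the hash with the row number so each row behaves like a
        # separate hash function (a simplifying assumption).
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        if count < 0:
            raise ValueError("count must be non-negative")
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # The smallest counter is the one least inflated by collisions.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


if __name__ == "__main__":
    sketch = CountMinSketch(width=64, depth=4)
    for word in "to be or not to be".split():
        sketch.add(word)
    print(sketch.estimate("to"), sketch.estimate("be"), sketch.estimate("or"))
```

Memory use is fixed at width times depth counters no matter how many distinct items arrive. That fixed cost is the trade this family of structures makes against exactness.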
Service programming
Akka Toolkit: runtime for distributed, fault-tolerant, event-driven applications on the JVM.
Apache Avro: data serialization system.
Apache Curator: Java library for Apache ZooKeeper.
Apache Karaf: OSGi runtime that runs on top of any OSGi framework.
Apache Thrift: framework for building binary protocols.
Apache ZooKeeper: centralized service for process management.
Google Chubby: lock service for loosely coupled distributed systems.
LinkedIn Norbert: cluster manager.
OpenMPI: message passing framework.
Serf: decentralized solution for service discovery and orchestration.
Spotify Luigi: Python package for building complex pipelines of batch jobs; it handles dependency resolution, workflow management, visualization, failure handling, command-line integration and more.
Spring XD: distributed, extensible system for data ingestion, real-time analytics, batch processing and data export.
Twitter Elephant Bird: libraries for working with LZO-compressed data.
Twitter Finagle: asynchronous network stack for the JVM.

Scheduling
Apache Aurora: service scheduler that runs on top of Apache Mesos.
Apache Falcon: data management framework.
Apache Oozie: workflow job scheduler.
Chronos: distributed, fault-tolerant scheduler.
LinkedIn Azkaban: batch workflow job scheduler.
Schedoscope: Scala DSL for agile scheduling of Hadoop jobs.
Sparrow: scheduling platform.
Airflow: platform for programmatically authoring, scheduling and monitoring workflows.
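Workflow tools such as Luigi, Oozie, Azkaban and Airflow all resolve job dependencies before anything runs. A minimal way to express that step is a topological sort with Python's standard `graphlib` module (Python 3.9+), assuming a hypothetical pipeline described as a mapping from each job to the jobs it depends on. This is not a claim about how any of the listed schedulers works internally.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each job maps to the set of jobs it depends on.
pipeline: dict[str, set[str]] = {
    "ingest_logs": set(),
    "ingest_orders": set(),
    "clean_logs": {"ingest_logs"},
    "join": {"clean_logs", "ingest_orders"},
    "report": {"join"},
}


def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into waves; every job in a wave has all its dependencies done."""
    sorter = TopologicalSorter(deps)
    try:
        sorter.prepare()  # raises CycleError if the graph has a cycle
    except CycleError as exc:
        # exc.args[1] holds the nodes that form the detected cycle.
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs that could run in parallel now
        waves.append(ready)
        sorter.done(*ready)  # assume every job in the wave succeeded
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(pipeline), start=1):
        print(f"wave {i}: {', '.join(wave)}")
```

Grouping ready jobs into waves, rather than producing one flat order, shows which jobs a scheduler could run in parallel. A real scheduler would call `done()` only when each job actually finishes.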
Machine learning
Apache Mahout: machine learning library for Hadoop.
brain: neural networks in JavaScript.
Cloudera Oryx: real-time, large-scale machine learning.
Concurrent Pattern: machine learning library for Cascading.
ConvNetJS: machine learning in JavaScript; trains convolutional neural networks (or ordinary ones) in the browser.
Decider: flexible and extensible machine learning in Ruby.
ENCOG: machine learning framework that supports a variety of advanced algorithms, plus classes for normalizing and processing data.
etcML: machine learning text classification.
Etsy Conjecture: scalable machine learning in Scalding.
Google Sibyl: large-scale machine learning system at Google.
GraphLab Create: machine learning platform for Python, with a broad collection of ML toolkits, data engineering and deployment tools.
H2O: statistical machine learning and math runtime for Hadoop.
MLbase: distributed machine learning libraries for the BDAS stack.
MLPNeuralNet: fast multilayer perceptron neural network library for iOS and Mac OS X.
MonkeyLearn: makes text mining easier by extracting and classifying data from text.
NuPIC: Numenta Platform for Intelligent Computing, a brain-inspired machine intelligence platform based on a biologically accurate neural network algorithm, the cortical learning algorithm.
PredictionIO: machine learning server built on Hadoop, Mahout and Cascading.
SAMOA: distributed streaming machine learning framework.
scikit-learn: machine learning in Python.
Spark MLlib: Spark's implementation of some common machine learning (ML) functionality.
Vowpal Wabbit: learning system sponsored by Microsoft and Yahoo.
WEKA: machine learning software suite.
BIDMach: CPU and GPU-accelerated machine learning library.
Benchmarking
Apache Hadoop Benchmarking: micro-benchmarks for testing Hadoop performance.
Berkeley SWIM Benchmark: realistic big data workload benchmark.
Intel HiBench: Hadoop benchmark suite.
PUMA Benchmarking: benchmark suite for MapReduce applications.
Yahoo Gridmix3: Hadoop cluster benchmarking from the Yahoo engineering team.

Security
Apache Knox Gateway: single point of secure access to a Hadoop cluster.
Apache Sentry: security module for data stored in Hadoop.

System deployment
Apache Ambari: operational framework for Hadoop management.
Apache Bigtop: deployment framework for the Hadoop ecosystem.
Apache Helix: cluster management framework.
Apache Mesos: cluster manager.
Apache Slider: YARN application for deploying existing distributed applications on YARN.
Apache Whirr: set of libraries for running cloud services.
Apache YARN: cluster manager.
Brooklyn: library that simplifies application deployment and management.
Buildoop: Groovy-based build tool, similar to Apache Bigtop.
Cloudera HUE: web application for interacting with Hadoop.
Facebook Prism: multi-datacenter replication system.
Google Borg: job scheduling and monitoring system.
Google Omega: job scheduling and monitoring system.
Hortonworks HOYA: application that deploys HBase clusters on YARN.
Marathon: Mesos framework for long-running services.

Applications
Adobe Spindle: next-generation web analytics with Scala, Spark and Parquet.
Apache Kiji: framework for real-time data collection and analysis based on HBase.
Apache Nutch: open-source web crawler.
Apache OODT: captures, processes and shares data for NASA's scientific archives.
Apache Tika: content analysis toolkit.
Argus: time-series monitoring and alerting platform.
Countly: open-source mobile and web analytics platform based on Node.js and MongoDB.
Domino: run, schedule, share and deploy models with no infrastructure to manage.
Eclipse BIRT: Eclipse-based reporting system.
Eventhub: open-source event analytics platform.
Hermes: asynchronous message broker built on Kafka.
HIPI Library: API for performing image processing tasks on Hadoop's MapReduce.
Hunk: Splunk analytics for Hadoop.
Imhotep: large-scale analytics platform.
MADlib: in-database data analysis library for RDBMSs.
Kylin: open-source distributed analytics engine from eBay.
PivotalR: R on Pivotal HD/HAWQ and PostgreSQL.
Qubole: auto-scaling Hadoop clusters with built-in data connectors.
Sense: cloud platform for data science and big data analytics.
SnappyData: distributed in-memory data store for real-time operational analytics that combines stream analytics, OLTP (online transaction processing) and OLAP (online analytical processing) in a single integrated Spark-based cluster.
Snowplow: enterprise-strength web and event analytics, powered by Hadoop, Kinesis, Redshift and Postgres.
SparkR: R frontend for Spark.
Splunk: analyzer for machine-generated data.
Sumo Logic: cloud-based analyzer for machine-generated data.
Talend: unified open-source environment for YARN, Hadoop, HBase, Hive, HCatalog and Pig.
Warp: example-based query tool that leverages big data (OS X app).
Search engines and frameworks
Apache Lucene: search engine library.
Apache Solr: search platform built on Apache Lucene.
Elasticsearch: search and analytics engine based on Apache Lucene.
Enigma.io: freemium, robust web app for exploring, filtering, analyzing, searching and exporting large datasets collected from the web.
Facebook Unicorn: social graph search platform.
Google Caffeine: continuous indexing system.
Google Percolator: continuous indexing system.
TeraGoogle: large search index.
HBase Coprocessor: implementation of Percolator, part of HBase.
Lily HBase Indexer: quickly and easily search any content stored in HBase.
LinkedIn Bobo: faceted search implementation written entirely in Java, an extension of Apache Lucene.
LinkedIn Cleo: flexible software library for rapid development of partial, out-of-order and real-time typeahead search.
LinkedIn Galene: LinkedIn's search architecture.
LinkedIn Zoie: real-time search/indexing system written in Java.
Sphinx Search Server: full-text search engine.

MySQL forks and evolved products
Amazon RDS: MySQL database in Amazon's cloud.
Drizzle: evolution of MySQL 6.0.
Google Cloud SQL: MySQL database in Google's cloud.
MariaDB: enhanced, drop-in replacement for MySQL.
MySQL Cluster: MySQL implementation that uses the NDB Cluster storage engine.
Percona Server: enhanced, drop-in replacement for MySQL.
ProxySQL: high-performance proxy for MySQL.
TokuDB: storage engine for MySQL and MariaDB.
WebScaleSQL: collaboration among engineers from several companies that face similar challenges running MySQL.

PostgreSQL forks and evolved products
HadoopDB: hybrid of MapReduce and DBMS.
IBM Netezza: high-performance data warehouse appliance.
Postgres-XL: scalable open-source database cluster based on PostgreSQL.
RecDB: open-source recommendation engine built entirely inside PostgreSQL.
Stado: open-source MPP database system, intended only for data warehouse and data mart applications.
Yahoo Everest: multi-petabyte database/MPP derived from PostgreSQL.

Memcached forks and evolved products
Facebook McDipper: key/value cache for flash storage.
Facebook Memcached: fork of Memcached.
Twemproxy: fast, lightweight proxy for Memcached and Redis.
Twitter Fatcache: key/value cache for flash storage.
Twitter Twemcache: fork of Memcached.

Embedded databases
Actian PSQL: ACID-compliant DBMS developed by Pervasive Software, optimized for embedding in applications.
BerkeleyDB: software library that provides a high-performance embedded database for key/value data.
HanoiDB: Erlang LSM B-tree storage.
LevelDB: fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
LMDB: ultra-fast, ultra-compact embedded key-value data store developed by Symas.
RocksDB: embeddable persistent key-value store for fast storage, based on LevelDB.
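LevelDB is described above as providing an ordered mapping from string keys to string values. The sketch below shows what "ordered" buys you, namely efficient range scans over a key prefix. The class and method names are hypothetical, and it has none of LevelDB's persistence, LSM-tree structure or compaction. It only models the logical interface with a sorted list and the standard `bisect` module.

```python
import bisect
from typing import Iterator, Optional


class OrderedKV:
    """In-memory sorted map from str keys to str values (no persistence, no LSM tree)."""

    def __init__(self) -> None:
        self._keys: list[str] = []  # kept sorted at all times
        self._values: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._values.get(key)

    def delete(self, key: str) -> None:
        if key in self._values:
            del self._values[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def scan(self, start: str, end: str) -> Iterator[tuple[str, str]]:
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


if __name__ == "__main__":
    kv = OrderedKV()
    for k in ["user:2", "user:1", "order:9", "user:10"]:
        kv.put(k, k.upper())
    # "user;" sorts just after every key starting with "user:", since ';' follows ':'.
    for key, value in kv.scan("user:", "user;"):
        print(key, value)
```

Because keys compare as plain strings, "user:10" sorts before "user:2". Applications that need numeric order usually zero-pad or binary-encode such key components.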
Business intelligence
BIME Analytics: business intelligence cloud platform.
Chartio: lean business intelligence platform for visualizing and exploring data.
datapine: cloud-based self-service business intelligence tool.
Jaspersoft: powerful business intelligence suite.
Jedox Palo: customizable business intelligence platform.
Microsoft: business intelligence software and platform.
MicroStrategy: software platform for business intelligence, mobile intelligence and web applications.
Pentaho: business intelligence platform.
Qlik: business intelligence and analytics platform.
Saiku: open-source analytics platform.
SpagoBI: open-source business intelligence platform.
Tableau: business intelligence platform.
Zoomdata: big data analytics.
Jethrodata: interactive big data analytics.

Data visualization
Airpal: web UI for PrestoDB.
Arbor: graph visualization library using web workers and jQuery.
Banana: visualizes logs and time-stamped data stored in Solr; a port of Kibana.
Bokeh: powerful interactive Python visualization library that targets modern web browsers for presentation. It provides elegant, concise construction of novel graphics in the style of D3.js and extends that capability with high-performance interactivity over very large or streaming datasets.
C3: reusable chart library based on D3.
CartoDB: open-source or freemium hosting for geospatial databases, with powerful front-end editing capabilities and an API.
chartd: responsive, retina-compatible charts with just an img tag.
Chart.js: open-source HTML5 charts for visualization.
Chartist.js: another open-source HTML5 chart library.
Crossfilter: JavaScript library for exploring large multivariate datasets in the browser; works well with dc.js and d3.js.
Cubism: JavaScript library for time-series visualization.
Cytoscape: JavaScript library for visualizing complex networks.
DC.js: dimensional charting built to work natively with Crossfilter and rendered with d3.js; well suited to connecting charts and additional metadata to hover events in D3.
D3: JavaScript library for manipulating documents.
D3.compose: builds complex, data-driven visualizations from reusable charts and components.
D3Plus: set of fairly powerful reusable charts and styles for d3.js.
ECharts: enterprise charts from Baidu.
Envisionjs: dynamic HTML5 visualization.
FnordMetric: write SQL queries that return SVG charts rather than tables.
Freeboard: open-source real-time dashboard builder for IoT and other web mashups.
Gephi: award-winning open-source platform for visualizing and manipulating large graphs and network connections; like Photoshop, but for graphs. Available for Windows and Mac OS X.
Google Charts: simple charting API.
Grafana: Graphite dashboard frontend, editor and graph composer.
Graphite: scalable real-time graphing.
Highcharts: simple and flexible charting API.
IPython: provides a rich architecture for interactive computing.
Kibana: visualizes logs and time-stamped data.
Matplotlib: plotting with Python.
MetricsGraphics.js: D3-based library optimized for time-series data.
NVD3: chart components for d3.js.
Peity: progressive SVG bar, line and pie charts.
Plot.ly: easy-to-use web service for quickly creating complex charts, from heatmaps to histograms; data can be uploaded through Plotly's online spreadsheet to create and style charts.
Plotly.js: the open-source JavaScript graphing library that powers Plotly.
Recline: simple but powerful library for building data applications in pure JavaScript and HTML.
Redash: open-source platform for querying and visualizing data.
Shiny: web application framework for R.
Sigma.js: JavaScript library dedicated to graph drawing.
Vega: visualization grammar.
Zeppelin: notebook-style collaborative data analysis.
Zing Charts: JavaScript charting library for big data.

IoT and sensors
TempoIQ: cloud-based sensor analytics.
2lemetry: Internet of Things platform.
PubNub: data stream network.
ThingWorx: platform that enables enterprises to rapidly build and run connected applications.
IFTTT: innovative internet service often described as a "web automation" tool; the name stands for "if this then that".
Evrythng: public IoT platform that makes many of the everyday products around you smarter.

Recommended articles
NoSQL Comparison: comparison of Cassandra, MongoDB, CouchDB, Redis, Riak, HBase, Couchbase, Neo4j, Hypertable, ElasticSearch, Accumulo, VoltDB and Scalaris.
Big Data Benchmark: benchmark of Redshift, Hive, Shark, Impala and Stinger/Tez.
The Big Data Successor of the Spreadsheet: argues that the successor to the spreadsheet should be big data.

Papers

2015-2016
2015 - Facebook - One Trillion Edges: Graph Processing at Facebook-Scale.

2013-2014
2014 - Stanford - Mining of Massive Datasets.
2013 - AMPLab - Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices.
2013 - AMPLab - MLbase: A Distributed Machine-learning System.
2013 - AMPLab - Shark: SQL and Rich Analytics at Scale.
2013 - AMPLab - GraphX: A Resilient Distributed Graph System on Spark.
2013 - Google - HyperLogLog in Practice: Algorithmic Engineering of a State of the Art Cardinality Estimation Algorithm.
2013 - Microsoft - Scalable Progressive Analytics on Big Data in the Cloud.
(Scalable analytics for big data in the cloud) 2013-metamarkets-druid:a real-time analytical data Store. (Druid: Real-time Analytics data storage) 2013-google-online, asynchronous Schema change in F1. (F1 in-line, asynchronous mode transition) 2013-GOOGLE-F1:A distributed SQL Database that Scales. (F1: Distributed SQL Database) 2013-Google-millwheel:fault-tolerant Stream processing at the Internet scale. (Millwheel: Fault-tolerant streaming at the internet scale) 2013-facebook-scuba:diving into Data at Facebook. (Scuba: Deep into Facebook's data world) 2013-facebook-unicorn:a System for searching the social Graph. (Unicorn: A system for searching social graphs) 2013-facebook-scaling Memcache at Facebook. (Facebook enhancements to Memcache scalability) 2011-2012 2012-twitter-the Unified Logging Infrastructure for Data Analytics at Twitt Er. (Unified logging Infrastructure for Twitter data analysis) 2012-amplab–blink and It's done:interactive Queries on Very Large data. (Blink and its completion: Interactive query for Hyper-scale data) 2012-amplab–fast and Interactive Analytics over Hadoop data with Spark. (Fast, interactive analysis of Hadoop data on spark) 2012-amplab–shark:fast data analytics Using coarse-grained distributed Memory. (Shark: Fast data analysis using coarse-grained distributed memory) 2012-microsoft–paxos replicated state machines as the Basis of a high-performance data Store . (Paxos's replication state machine-the foundation for high-performance data storage) 2012-microsoft–paxos made Parallel. (Paxos algorithm implements parallelism) 2012-amplab–blinkdb:blinkdb:queries with bounded Errors and bounded Response times on Very Large Data. (Query of finite error and bounded response time in ultra-large scale data) 2012-google–processing a trillion cells per mouse click. (Each click processes a trillion cell) 2012-google–spanner:google's globally-distributed Database. 
(Spanner: Google's Global distributed database) 2011-amplab–scarlett:coping with skewed popularity Content in MapReduce Clusters. (Scarlett: Response to biased content in mapreduce clusters) 2011-amplab–mesos:a Platform for fine-grained Resource sharing in the Data Center. (Mesos: A platform for fine-grained resource sharing in the data center) 2011-google–megastore:providing scalable, highly Available Storage for Interactive Services. ( Megastore: Provides scalable, highly available storage for interactive services) 2001-2010 2010-facebook-finding a needle in Haystack:facebook ' s photo storage. (Explore the nuances of haystack: Facebook image storage) 2010-amplab-spark:cluster Computing with working sets. (Spark: Cluster calculation on a workgroup) 2010-google-storage Architecture and challenges. (Storage architecture and challenges) 2010-google-pregel:a System for large-scale Graph processing. (Pregel: A large-scale graphics processing system) 2010-google-large-scale Incremental Processing Using distributed transactions and noti?cations base of percolator and caffeine. 2010-google-dremel:interactive Analysis of Web-scale Datasets (large-scale incremental processing using distributed transactions and notifications based on percolator and caffeine platforms). (Interactive analysis of Dremel:web scale datasets) 2010-yahoo-s4:distributed Stream Computing Platform. (S4: Distributed streaming computing platform) 2009-hadoopdb:an architectural Hybrid of MapReduce and DBMS Technologies for analytical workloads. (Hybrid MapReduce and DBMS technology used to analyze workloads for the architecture) 2008-AMPLAB-CHUKWA:A large-scale monitoring system. (Chukwa: Large surveillance System) 2007-amazon-dynamo:amazon ' s highly Available key-value Store. (Dynamo: Amazon's highly available key value store) 2006-google-the Chubby Lock Service for loosely-coupled distributed systems. 
(lock service for loosely coupled distributed systems) 2006-GOOGLE-BIGTABLE:A distributed Storage System for structured Data. (Bigtable: Distributed storage System for structured data) 2004-google-mapreduce:simplied processing on Large Clusters. (MapReduce: Simplified data processing on large clusters) 2003-google-the Google File System. (Google File system)