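Leave it to overseas developers: one of them has gone and made Redis support SQL across the board... A fierce life needs no explanation!
This post is a translation of his blog post. There are still a few places I do not fully understand, so corrections are welcome!
Original English post: http://jaksprats.wordpress.com/2010/09/28/introducing-redisql-the-lightning-fast-polyglot/
------------------------------ (fancy divider) ------------------------------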
For about a year, I have been using the NOSQL datastore redis in various web-serving environments, as a very fast backend to store and retrieve key-value data and data that best fits in lists, sets, and hash-tables. In addition to redis, my backend also employed mysql, because some data fits much better in a relational table. Getting certain types of data to fit into redis data objects would have added to the complexity of the system, and in some cases it’s simply not doable. BUT, I hated having 2 data-stores, especially when one (mysql) is fundamentally slower; this created an imbalance in how my code was architected. The mysql calls can take orders of magnitude longer to execute, which is exacerbated when traffic surges. So I wrote Redisql, which is an extension of redis that also supports a large subset of SQL. The idea was to have a single roof to house both relational data and redis data, with both types of data exhibiting similar lookup/insert latencies under similar concurrency levels, i.e. a balanced backend.
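I have stuck with the NOSQL datastore redis for about a year now. In all sorts of web-serving environments it is a super fast backend for storing and retrieving key-value data, as well as data that fits best in lists, sets, and hash-tables. Besides redis, my backend also used MySQL, because some data just works better in relational tables. Forcing certain kinds of data into redis data objects would have made the system more complex, and in some cases it simply cannot be done. But I hated running two datastores, especially when one of them (mysql) is fundamentally slower, which made my code design feel thoroughly unharmonious. Mysql calls can take orders of magnitude longer to execute, and traffic surges make that worse. So I wrote an extended version of redis, Redisql, which supports a large subset of SQL. The idea was to house relational data and redis data under one roof, with both types of data showing similar lookup and insert latencies at similar concurrency levels: in other words, a harmonious backend.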
Redisql supports all redis data types and functionality (as it’s an extension of redis) and it also supports SQL SELECT/INSERT/UPDATE/DELETE (including joins, range-queries, multiple indices, etc…) -> lots of SQL, short of stuff like nested joins and Datawarehousing functionality (e.g. FOREIGN KEY CONSTRAINTS). So using a Redisql library (in your environment’s native language), you can either call redis operations on redis data objects or SQL operations on relational tables; it’s all in one server accessed from one library. Redisql morph commands convert relational tables (including range query and join results) into sets of redis data objects. They can also convert the results of redis commands on redis data objects into relational tables. Denormalization from relational tables to sets of redis hash-tables is possible, as is normalization from sets of redis hash-tables (or sets of redis keys) into relational tables. Data can be reordered and shuffled into the data structure (relational table, list, set, hash-table, OR ordered-set) that best fits your use cases, and the archiving of redis data objects into relational tables is made possible.
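Redisql supports all redis data types and functionality (since it is an extension of redis), and it also supports SQL SELECT/INSERT/UPDATE/DELETE (including joins, range queries, multiple indices, and so on): lots of SQL, stopping short of things like nested joins and data-warehousing functionality (e.g. foreign key constraints). So with a Redisql library (in your environment's native language), you can call redis operations on redis data objects or SQL operations on relational tables, all in one server accessed from one library. Redisql's morph commands convert relational tables (including range-query and join results) into sets of redis data objects. They can also convert the results of redis commands on redis data objects into relational tables. Denormalizing relational tables into sets of redis hash-tables is possible, and so is normalizing sets of redis hash-tables (or sets of redis keys) into relational tables. Data can be reordered and stuffed into whichever data structure (relational table, list, set, hash-table, or ordered-set) best fits your use case, and archiving redis data objects into relational tables becomes possible too.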
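To make the denormalization/normalization idea concrete, here is a minimal sketch in plain Python. It does not use Redisql or its morph commands (their syntax is not shown in this post); the `users` table, the `user:<id>` key scheme, and the function names are all assumptions made up for illustration. Dicts stand in for redis hash-tables and tuples stand in for relational rows.

```python
from typing import Dict, List, Tuple

# Hypothetical "users" table: each row is (id, name, city).
Row = Tuple[int, str, str]
COLUMNS = ("id", "name", "city")


def denormalize(rows: List[Row]) -> Dict[str, Dict[str, str]]:
    """Turn relational rows into one hash-table per row, keyed like 'user:1'."""
    hashes: Dict[str, Dict[str, str]] = {}
    for row in rows:
        key = f"user:{row[0]}"  # assumed key scheme, not a Redisql convention
        # Every column except the primary key becomes a field in the hash.
        hashes[key] = {col: str(val) for col, val in zip(COLUMNS[1:], row[1:])}
    return hashes


def normalize(hashes: Dict[str, Dict[str, str]]) -> List[Row]:
    """Rebuild relational rows from the per-row hash-tables."""
    rows: List[Row] = []
    for key, fields in hashes.items():
        pk = int(key.split(":", 1)[1])  # recover the primary key from the key name
        rows.append((pk, fields["name"], fields["city"]))
    return sorted(rows)


if __name__ == "__main__":
    table = [(1, "alice", "Berlin"), (2, "bob", "Osaka")]
    as_hashes = denormalize(table)
    print(as_hashes)
    print(normalize(as_hashes) == table)
```

The round trip only works here because the key name carries the primary key; a real system would need some equivalent convention to map hashes back to rows.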
Not only is all the data under a single data roof in Redisql, but the lookup/insert speeds are uniform: you can predict the speed of a SET, an INSERT, an LPOP, a SELECT range query … so application code runs w/o kinks (no unexpected bizarro waits due to mysql table locks -> that lock up an apache thread -> that decrease the performance of a single machine -> which creates an imbalance in the cluster).
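Redisql does not just put all the data in one container; lookup and insert speeds are uniform as well, so you can predict the speed of a SET, an INSERT, an LPOP, or a SELECT range query ... and application code runs as expected (no surprise waits on mysql table locks -> that lock up an apache thread -> that degrade a single machine -> that unbalance the cluster).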
Uniform data access patterns between front-end and back-end can fundamentally change how application code behaves. On a 3.0GHz CPU core, Redis SET/GET run at 110K/s and Redisql INSERT/SELECT run at 95K/s, both w/ sub millisecond mean-latencies, so all of a sudden the application server can fetch data from the datastore w/ truly minimal delay. The oh-so-common bottleneck: “I/O between app-server and datastore” is cut to a bare minimum, which can even push the bottleneck back into the app-servers, and that’s great news as app-servers are dead simple (e.g. add server) to scale horizontally. Redisql is an event-driven non-blocking asynchronous-I/O in-memory database, which I have dubbed an Evented Relational Database, for brevity’s sake.
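Uniform data access patterns between front-end and back-end can fundamentally change how application code behaves. On one 3.0GHz CPU core, redis reaches 110K SET/GET per second and Redisql reaches 95K INSERT/SELECT per second, both with sub-millisecond mean latencies, which means the application server can fetch data from the datastore with truly minimal delay. The common bottleneck, “I/O between app-server and datastore”, is cut to a minimum and can even be pushed back into the app-servers, which is fantastic because app-servers are dead simple to scale horizontally (e.g. add a server). Redisql is an event-driven, non-blocking, asynchronous-I/O in-memory database; for brevity I call it an Evented Relational Database.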
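As a rough sanity check on those figures, the throughput numbers can be turned into an average time budget per operation on one core. This is only back-of-the-envelope arithmetic on the two numbers quoted above (110K/s and 95K/s); the function name is made up, and the result is not the latency a client would observe, since that also includes network round trips and queueing.

```python
def mean_service_time_us(ops_per_second: float) -> float:
    """Average time per operation, in microseconds, at a given throughput."""
    return 1_000_000 / ops_per_second


# Throughput figures quoted in the post for one 3.0GHz core.
REDIS_OPS = 110_000    # redis SET/GET per second
REDISQL_OPS = 95_000   # Redisql INSERT/SELECT per second

if __name__ == "__main__":
    print(f"redis:   {mean_service_time_us(REDIS_OPS):.2f} us per op")    # 9.09
    print(f"redisql: {mean_service_time_us(REDISQL_OPS):.2f} us per op")  # 10.53
    # Redisql's SQL path runs at roughly this fraction of plain redis throughput.
    print(f"ratio:   {REDISQL_OPS / REDIS_OPS:.0%}")                      # 86%
```

Both budgets are around ten microseconds, which is consistent with the claim that SQL operations and plain redis operations sit in the same latency class.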
During the development of Redisql, it became evident that optimizing the number of bytes a row occupied was an incredibly important metric, as Redisql is an In-Memory database (w/ disk persistence snapshotting). Unlike redis, Redisql can function if you go into swap space, but this should be done w/ extreme care. Redisql has lots of memory optimisations. It has been written from the ground up to allow you to put as much data as is possible into your machine’s RAM. Relational table per-row overhead is minimal and TEXT columns are stored in compressed form, when possible (using algorithms w/ negligible performance hits). Analogous to providing predictable request latencies at high concurrency levels, Redisql gives predictable memory usage overhead for data storage and provides detailed per-table, per-index memory usage via the SQL DESC command, as well as per row memory usage via the “INSERT … RETURN SIZE” command. The predictability of Redisql, which translates into tweakability for the seasoned programmer, changes the traditional programming landscape where the datastore is slower than the app-server.
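During Redisql's development, it became clear that, for an in-memory database (with persistence through disk snapshots), optimizing the number of bytes each row occupies is an extremely important metric. Unlike redis, Redisql can keep functioning if you go into swap space, but you have to be very careful. Redisql uses a lot of memory optimizations and was written from the ground up to let you put as much data as possible into your machine's RAM. Per-row overhead in relational tables is minimal, and TEXT columns are stored compressed when possible (using algorithms with negligible performance cost). Just as it provides predictable request latencies at high concurrency, Redisql gives predictable memory overhead for data storage, with detailed per-table and per-index memory usage via the SQL DESC command and per-row memory usage via the “INSERT … RETURN SIZE” command. Redisql's predictability, which a seasoned programmer can turn into tweakability, frees programmers from the traditional clutches of a datastore that is slower than the app-server.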
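The "compressed when possible" idea can be sketched as follows. The post does not say which algorithm Redisql uses, so `zlib` here is purely an assumption, as are the one-byte flag layout and the function names. The sketch only shows the general technique: keep the compressed form when it is actually smaller, otherwise store the raw bytes.

```python
import zlib

RAW = b"\x00"         # flag byte: payload is stored uncompressed
COMPRESSED = b"\x01"  # flag byte: payload is zlib-compressed


def encode_text(value: str) -> bytes:
    """Store a TEXT value compressed only if that saves space."""
    raw = value.encode("utf-8")
    packed = zlib.compress(raw)
    if len(packed) < len(raw):
        return COMPRESSED + packed
    return RAW + raw  # short strings usually grow under compression


def decode_text(stored: bytes) -> str:
    """Invert encode_text, using the leading flag byte to pick the path."""
    flag, payload = stored[:1], stored[1:]
    if flag == COMPRESSED:
        payload = zlib.decompress(payload)
    return payload.decode("utf-8")


if __name__ == "__main__":
    short, long_text = "hi", "redis and sql under one roof. " * 50
    for text in (short, long_text):
        stored = encode_text(text)
        assert decode_text(stored) == text
        print(len(text.encode("utf-8")), "->", len(stored), "bytes stored")
```

Very short values stay uncompressed because the compression header alone is larger than the value, which is one reason a per-row size report such as the one mentioned above is useful when tuning.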
Redisql is architected to handle the c10K problem, so it is world class in terms of networking speed AND all of Redisql’s data is in RAM, so there are no hard disk seeks to engineer around, you get all your data in a predictably FAST manner AND you can pack a lot of data into RAM as Redisql aggressively minimizes memory usage AND Redisql combines SQL and NOSQL under one roof, unifying them w/ commands to morph data betwixt them … the sum of these parts, when integrated correctly w/ a fast app-server architecture, is unbeatable as a dynamic web page serving platform with low latency at high concurrency.
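Redisql is designed to handle the c10K problem, so its networking speed is top-notch, and all of its data is in RAM, so hard-disk seeks can be banished to the cold palace. You get your data predictably fast, you can pack a lot of data in because Redisql aggressively minimizes memory usage, and Redisql puts SQL and NOSQL in the same container, unified by commands that morph data between them. Integrated correctly with a fast app-server architecture, the sum of these parts is unbeatable as a low-latency, high-concurrency platform for serving dynamic web pages.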
The goal of Redisql is to be the complete datastore solution for applications that require the fastest data lookups/inserts possible. Pairing Redisql w/ an event driven language like Node.js, Ruby Eventmachine, or Twisted Python, should yield a dynamic web page serving platform capable of unheard of low latency at high concurrency, which when paired w/ intelligent client side programming, could process user events in the browser quickly enough to finally realize the browser as an applications platform.
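The goal of Redisql is to be the complete datastore solution for applications that need the fastest possible lookups and inserts. Pairing Redisql with an event-driven language or framework such as Node.js, Ruby Eventmachine, or Twisted Python should produce a dynamic web page serving platform with low latency at high concurrency the likes of which has not been seen in the 500 years before or the 500 years after (translator's note: as Sister Feng would say). Paired with intelligent client-side programming, it could process user events in the browser quickly enough to finally realize the browser as an applications platform.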
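The event-driven pairing described above can be illustrated with Python's standard `asyncio` module. This is only a sketch of the programming model: `FakeStore` is an in-memory stand-in invented for this example, not a Redisql client, and `handle_request` and `serve_many` are hypothetical names. The point is that each request yields to the event loop whenever it would wait on the datastore, so many requests can be in flight on a single thread.

```python
import asyncio
from typing import Dict, List, Optional


class FakeStore:
    """In-memory stand-in for a datastore; NOT a Redisql client."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    async def set(self, key: str, value: str) -> None:
        await asyncio.sleep(0)  # yield to the event loop, as a network call would
        self._data[key] = value

    async def get(self, key: str) -> Optional[str]:
        await asyncio.sleep(0)
        return self._data.get(key)


async def handle_request(store: FakeStore, user_id: int) -> Optional[str]:
    """One 'web request': write a value, then read it back."""
    await store.set(f"user:{user_id}", f"name-{user_id}")
    return await store.get(f"user:{user_id}")


async def serve_many(n: int) -> List[Optional[str]]:
    """Run n requests concurrently on one event loop and one thread."""
    store = FakeStore()
    return await asyncio.gather(*(handle_request(store, i) for i in range(n)))


if __name__ == "__main__":
    print(asyncio.run(serve_many(3)))  # ['name-0', 'name-1', 'name-2']
```

`asyncio.gather` returns results in the order the requests were submitted, even though the requests interleave while they run.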
Redisql, the polyglot that speaks SQL and redis, was written to be the Evented Relational Database, the missing piece in the 100% event driven architecture spanning from browser to app-server to database-server and back.
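Redisql, the polyglot that speaks both SQL and redis, was written to be the Evented Relational Database: the missing piece in a 100% event-driven architecture that runs from browser to app-server to database-server and back (translator's note: I was not sure what this sentence meant).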