Recently, Clay.io's Zoli Kahan began writing a "10X" series of posts, in which he shares how a small team can support Clay.io's large-scale applications. The first installment is an inventory of the technology Clay.io uses.
The following is a translation of the original post.
Cloud
CloudFlare
CloudFlare handles our DNS and acts as a caching proxy layer that shields us from DDoS attacks; it also takes care of SSL.
Amazon EC2 + VPC + NAT Server
Essentially all of Clay.io's servers run on Amazon EC2, mostly medium and large instances. We use Amazon VPC to host the servers on a private network that cannot be reached from the outside world. To get into that private network we run a NAT server, which also doubles as the VPN endpoint for accessing our internal network.
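As an illustration only (this is not part of Clay.io's tooling), the sketch below uses boto3 to list the instances inside such a VPC; the region and VPC id are hypothetical placeholders, and it assumes AWS credentials are already configured.

```python
# Hedged sketch: list EC2 instances inside a private VPC (ids are hypothetical).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
resp = ec2.describe_instances(
    Filters=[{"Name": "vpc-id", "Values": ["vpc-0123456789abcdef0"]}]
)
for reservation in resp["Reservations"]:
    for inst in reservation["Instances"]:
        # Instances on the private subnets have only private IPs; their
        # outbound traffic leaves through the NAT server.
        print(inst["InstanceId"], inst["InstanceType"], inst.get("PrivateIpAddress"))
```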
Amazon S3
We use Amazon S3 as the backend for our CDN, and it hosts all of Clay.io's static content. For security and performance reasons we serve it from a separate, cookie-less domain name, CDN.WTF.
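As a rough sketch of how static assets can be pushed to such a bucket with long-lived cache headers (the bucket name and paths are hypothetical, and this assumes boto3 with AWS credentials configured, not Clay.io's actual deploy scripts):

```python
# Hedged sketch: upload a static asset to the CDN-backed S3 bucket.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    "build/app.js",                      # local build artifact
    "clay-static-assets",                # hypothetical bucket behind the CDN domain
    "assets/app.js",
    ExtraArgs={
        "ContentType": "application/javascript",
        "CacheControl": "public, max-age=31536000",  # static content caches aggressively
    },
)
```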
HAProxy
HAProxy is a very high-performance reverse proxy that we use to route traffic between our different services. Because of the nature of the Clay.io platform and the legacy content it still has to support, this routing layer matters a great deal; I will go into it in depth in future posts.
At the moment Clay.io runs a single HAProxy server on an m3.medium instance. As traffic grows it will inevitably need an upgrade, and we also plan to put an Amazon ELB in front of it so we can scale out horizontally as needed.
Application Servers -- Docker
Docker is used to manage Linux containers, which are similar to lightweight virtual machines (though with weaker isolation and security guarantees). The benefit of Docker is that code can be packaged once and then run or moved to any server without worrying about the underlying host, as we will cover in detail in future posts.
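A minimal sketch of that idea using the Docker SDK for Python is below; the image name, port mapping, and environment variables are hypothetical, not Clay.io's actual services.

```python
# Hedged sketch: start a packaged service container on any Docker host.
import docker

client = docker.from_env()

# The same image that passed CI can be started unchanged on any host.
container = client.containers.run(
    "registry.example.com/clay/api:latest",   # hypothetical image
    detach=True,
    ports={"8080/tcp": 8080},
    environment={"NODE_ENV": "production"},
)
print(container.id, container.status)
```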
Demo Application Servers -- Docker
At Clay.io the demo environment mirrors the application servers and runs exactly the same Docker images as production. This setup is important for catching problems before they cause disruption or downtime in the production system.
Data storage
MySQL
MySQL is a battle-hardened relational SQL database, and most of Clay.io's data currently lives in a master-slave MySQL cluster. One master node and two slave nodes serve most user queries. We may eventually need to shard the master, but hopefully not too soon, since we are also evaluating other databases.
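A minimal sketch of that read/write split (this is not Clay.io's actual code; hostnames and credentials are hypothetical, and it assumes the PyMySQL driver):

```python
# Hedged sketch: send reads to the slave nodes, writes to the master.
import random

import pymysql

MASTER = {"host": "mysql-master.internal", "user": "app",
          "password": "secret", "database": "clay"}
SLAVES = [
    {"host": "mysql-slave-1.internal", "user": "app",
     "password": "secret", "database": "clay"},
    {"host": "mysql-slave-2.internal", "user": "app",
     "password": "secret", "database": "clay"},
]

def _connect(cfg):
    return pymysql.connect(cursorclass=pymysql.cursors.DictCursor, **cfg)

def read(sql, params=()):
    # User-facing queries are spread across the two slave nodes.
    conn = _connect(random.choice(SLAVES))
    try:
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()
    finally:
        conn.close()

def write(sql, params=()):
    # All writes go to the single master node.
    conn = _connect(MASTER)
    try:
        with conn.cursor() as cur:
            cur.execute(sql, params)
        conn.commit()
    finally:
        conn.close()
```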
Logstash
Logstash is a log aggregation tool whose output we analyze with Kibana. It currently collects essentially all of Clay.io's application logs so that they are on hand when something goes wrong. Thanks to Kibana we no longer have to SSH into individual machines to inspect logs.
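One common way to feed such a pipeline is to have each service emit one JSON document per log line for a Logstash agent to pick up. The sketch below uses only the Python standard library; the logger name and field names are illustrative, not Clay.io's actual format.

```python
# Hedged sketch: structured JSON log lines that Logstash can ingest.
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("clay.api")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("game session started")   # one JSON document per line
```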
MongoDB
MongoDB is a NoSQL document-store database. We currently use MongoDB for some of our developer endpoints and for our A/B testing framework, Flak Cannon.
Memcached
Memcached is a key-value store, quite similar to Redis. We use it mainly in some of Clay.io's legacy web applications to cache MySQL query results. As the system evolves it will no doubt be replaced entirely by Redis.
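The pattern in that legacy code is essentially cache-aside. A minimal sketch, assuming a local memcached on the default port and the pymemcache client; the key scheme and query function are hypothetical:

```python
# Hedged sketch: cache-aside for MySQL query results in memcached.
import json

from pymemcache.client.base import Client

cache = Client(("127.0.0.1", 11211))

def get_game(game_id, fetch_from_mysql):
    key = "game:%d" % game_id
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                 # cache hit: skip MySQL entirely
    row = fetch_from_mysql(game_id)               # cache miss: run the MySQL query
    cache.set(key, json.dumps(row).encode("utf-8"), expire=300)  # keep for 5 minutes
    return row
```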
DevOps
Ansible
Ansible is our tool of choice for server management. It is easy for most developers to learn and use, and it lets us automate much of the day-to-day DevOps work without needing a dedicated operations team.
Other services
GitHub
GitHub -- needless to say, an excellent source control service.
Uptime Robot
Uptime Robot is a free monitoring service that we point at our health-check endpoints and public pages. If anything goes down, it emails and texts us within five minutes.
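For context, a health-check endpoint of the kind such a monitor polls can be as small as the sketch below (standard library only; the path and port are hypothetical, not Clay.io's actual endpoint):

```python
# Hedged sketch: a tiny HTTP endpoint an external monitor can poll.
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthcheck":
            body = b"ok"
            self.send_response(200)        # the monitor treats 200 as "up"
        else:
            body = b"not found"
            self.send_response(404)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```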
Drone.io
Drone.io is a continuous integration service that we use to run the test suites for our various projects. It is very similar to TravisCI, and an open-source, self-hosted version was recently released.
Docker Registry
We currently use the official Docker registry to manage our Docker containers; it is essentially GitHub, but for Docker images.
New Relic
New Relic is a server and application monitoring service. We use it primarily for server-level monitoring and for alerts when disk or memory usage runs high.
Google Analytics
Google Analytics is our main site analytics and tracking tool. For tracking site-specific features we rely on its custom events.
Google Apps
Google Apps provides email for our Clay.io domain and a shared Google Drive setup for the organization.
LastPass
LastPass is a password management service that lets us easily share company credentials across the whole team.
Future
While we are happy with the current setup, we expect to upgrade parts of the system over the coming months. Much of the original infrastructure was put together without a great deal of design time, and those are exactly the places most likely to become bottlenecks as the system grows.
Kubernetes has shown tremendous potential for managing Docker containers at scale, and it is one of the projects we are tracking most closely. Once it matures, it will very likely become part of our production environment. Amazon Glacier is another future goal, for database backups. RethinkDB, while still immature, has potential that cannot be ignored; as with Docker, we will follow the project's development, and it could be a good candidate to replace MySQL down the road.
Source: How Clay.io Built their 10x Architecture Using AWS, Docker, HAProxy, and Lots More (translated by Dong Yang; edited by Zhong Hao)