Ensuring high availability in the cloud: a look at Auth0's multi-cloud architecture across providers


Auth0 is an "identity as a service" start-up, and also a heavy user of cloud services. For them, a service outage means that many customer applications cannot log users in, so availability is critical. Recently, Auth0 engineering director Jose Romaniello shared the multi-cloud, cross-provider architecture that let them ride out a wide-ranging Microsoft Azure outage.

The following is the translation:

Auth0 is an "identity as a service" start-up that lets users ignore the underlying infrastructure and adds authentication, authorization, and single sign-on capabilities to applications on any kind of stack, such as mobile, web, and native.

For most applications, authentication is critical. Therefore, we designed for multiple levels of redundancy, one of which is the hosting provider. Auth0 can run anywhere: in a private cloud, in a public cloud, or on physical hosts in a leased data center. Given this, Auth0 also uses multiple cloud service providers and replicates across several geographically separate regions.

This article is a brief introduction to the infrastructure behind app.auth0.com and the strategies used to keep it highly available.

Core Service Architecture

The architecture of core services is actually very simple:

- Front-end servers: several X-Large VMs running Ubuntu on Microsoft Azure.
- Storage: MongoDB, running on X-Large VMs with dedicated storage optimizations.
- Service routing inside a node: Nginx.

All Auth0 components are replicated on every node, and every node is identical.

Multi-Cloud / High Availability


[Figure: multi-cloud architecture]


Not long ago, Azure suffered a global outage that lasted for hours. During that time, the Auth0 system's HA strategy kicked in and the service switched over to AWS smoothly.

- Most of the service runs primarily on Microsoft Azure (IaaS), with standby secondary nodes on AWS.
- The Auth0 system uses Route53 with a failover routing policy and a TTL of 60 seconds. A Route53 health check monitors the primary data center; if it stops responding (3 failed probes, 10 seconds apart), the DNS entry is quickly switched to the secondary data center. So for Auth0, the maximum failover time does not exceed 2 minutes. (A configuration sketch follows this list.)
- Puppet is deployed on every push to master. Using Puppet keeps the configuration/deployment process independent of any cloud provider's specifics. The Puppet master runs on our build server (currently TeamCity).
- MongoDB is replicated to the secondary data center, and the secondary data center is configured as read-only.
- We replicate all of the configuration needed to log in, including application information, connections, users, and so on. We do not replicate transactional data such as tokens and logs, so when a failover happens a few log records may be lost. We expect to solve this with real-time replication across Azure and AWS.
- We customized Chaos Monkey to test the resiliency of our infrastructure; the code is at https://github.com/auth0/chaos-mona
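As a rough illustration of the failover setup described above, the following sketch configures a Route53 health check and a failover record set with the AWS SDK for JavaScript (v3). The hosted-zone ID, domain names, and IP addresses are placeholders, not Auth0's actual values.

```typescript
// Sketch of a Route53 failover setup similar to the one described above.
// Zone ID, domain and endpoint addresses are hypothetical placeholders.
import {
  Route53Client,
  CreateHealthCheckCommand,
  ChangeResourceRecordSetsCommand,
} from "@aws-sdk/client-route-53";

const route53 = new Route53Client({});

async function configureFailover() {
  // Health check against the primary (Azure) data center:
  // 3 consecutive failures, probed every 10 seconds.
  const health = await route53.send(
    new CreateHealthCheckCommand({
      CallerReference: `primary-dc-${Date.now()}`,
      HealthCheckConfig: {
        Type: "HTTPS",
        FullyQualifiedDomainName: "primary.example-auth.com", // placeholder
        Port: 443,
        ResourcePath: "/health",
        RequestInterval: 10,
        FailureThreshold: 3,
      },
    })
  );

  // Failover record set with a 60-second TTL: PRIMARY points at Azure,
  // SECONDARY at the standby AWS deployment.
  await route53.send(
    new ChangeResourceRecordSetsCommand({
      HostedZoneId: "ZEXAMPLE123", // placeholder
      ChangeBatch: {
        Changes: [
          {
            Action: "UPSERT",
            ResourceRecordSet: {
              Name: "app.example-auth.com",
              Type: "A",
              SetIdentifier: "primary-azure",
              Failover: "PRIMARY",
              TTL: 60,
              HealthCheckId: health.HealthCheck!.Id,
              ResourceRecords: [{ Value: "203.0.113.10" }], // placeholder IP
            },
          },
          {
            Action: "UPSERT",
            ResourceRecordSet: {
              Name: "app.example-auth.com",
              Type: "A",
              SetIdentifier: "secondary-aws",
              Failover: "SECONDARY",
              TTL: 60,
              ResourceRecords: [{ Value: "203.0.113.20" }], // placeholder IP
            },
          },
        ],
      },
    })
  );
}

configureFailover().catch(console.error);
```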


Automated Testing

We have 1000+ unit and integration tests. We use Saucelabs for cross-browser (desktop/mobile) testing of Lock and the JavaScript login widget. We use PhantomJS/CasperJS for integration testing. All of these tests run before a build is pushed to production.

CDN

Auth0's CDN use case is very simple: we need it to serve our JS library and other configuration (allowed providers and so on). Assets and configuration data are uploaded to S3. Our custom domain (https://cdn.auth0.com) therefore has to support TLS. In the end we chose to build our own CDN. Along the way we tried three well-known CDN providers and ran into various problems:

- When we started, we did not yet have our own CDN domain, so we tried the first CDN service. Around that time we realized we needed TLS on our own domain, and this CDN was expensive if you wanted custom SSL with a custom domain name. We also hit configuration problems when combining it with S3 and gzip: S3 cannot serve both versions of the same file (zipped and non-zipped), and this CDN had no content-negotiation mechanism, so it broke for some browsers (see the sketch after this list). So we moved on to another CDN service.
- The second CDN service was very cheap, but we ran into too many problems, some of which we never found the root cause of. The provider offered only remote support, so we spent a lot of time hunting for answers ourselves. Some of the problems seemed to be caused by S3, some seemed to be routing issues; we had a bit of everything. Eventually we concluded we were only saving money, not solving problems, so we switched CDN again.
- The third CDN service is used by GitHub and other high-load applications, so we assumed it would meet our needs. However, our requirements turned out to be very different from GitHub's: if the CDN goes down, a GitHub user just can't see a README image, but for an identity-as-a-service provider, the applications that depend on us can no longer log users in.

In the end we built our own CDN using Nginx, Varnish, and S3. It is hosted in every AWS region we use and so far it has run very well, without any downtime. We use Route53 latency-based routing.
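The content-negotiation gap mentioned above can be illustrated with a minimal origin that picks the gzipped or plain object from S3 based on the request's Accept-Encoding header. This is only a sketch: Auth0's actual CDN is built on Nginx and Varnish, and the bucket and object names below are made up.

```typescript
// Minimal illustration of the content negotiation missing from the first CDN:
// pick a gzipped or plain object from S3 based on the Accept-Encoding header.
// The bucket name and object keys are placeholders.
import { createServer } from "node:http";
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});
const BUCKET = "example-cdn-assets"; // placeholder bucket name

createServer(async (req, res) => {
  const key = (req.url ?? "/").replace(/^\//, "") || "auth0.js";
  const wantsGzip = /\bgzip\b/.test(String(req.headers["accept-encoding"] ?? ""));

  try {
    // Store both "auth0.js" and "auth0.js.gz" in S3 and choose per request.
    const object = await s3.send(
      new GetObjectCommand({ Bucket: BUCKET, Key: wantsGzip ? `${key}.gz` : key })
    );
    const bytes = await object.Body!.transformToByteArray();
    res.writeHead(200, {
      "Content-Type": "application/javascript",
      ...(wantsGzip ? { "Content-Encoding": "gzip" } : {}),
      Vary: "Accept-Encoding", // lets downstream caches keep both variants
    });
    res.end(bytes);
  } catch {
    res.writeHead(404);
    res.end("not found");
  }
}).listen(8080);
```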

Sandbox (for running untrusted code)

One feature of Auth0 is that it allows customers to run custom code as part of the login transaction: users can write their own validation rules as needed. We also maintain a public repository of commonly used rules.
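A custom rule of the kind described above might look like the following. The (user, context, callback) shape follows Auth0's rule model; the domain restriction itself is just a hypothetical tenant policy, not one of the shared rules.

```typescript
// A sketch of the kind of custom validation rule a tenant might attach to a
// login transaction. The domain restriction is a made-up example policy.
function allowOnlyCorporateEmails(user: any, context: any, callback: Function) {
  const allowedDomain = "example.com"; // hypothetical policy
  const emailDomain = String(user.email || "").split("@")[1];

  if (emailDomain !== allowedDomain) {
    // Fail the login transaction.
    return callback(new Error("Access restricted to example.com accounts"));
  }

  // Let the transaction continue unchanged.
  callback(null, user, context);
}
```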

The sandbox is built from CoreOS, Docker, and a few other components. Docker instances are allocated to tenants from an instance pool as needed; each tenant gets a dedicated Docker instance, which is recycled according to an idle-time policy. A controller enforces the recycle policy, and a proxy server routes each request to the corresponding container. A minimal sketch of such a controller follows.
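The controller and recycle policy might be structured roughly like this. The use of dockerode, the image name, and the timeouts are assumptions for illustration, not Auth0's actual implementation.

```typescript
// Rough sketch of a per-tenant container pool with an idle-time recycle
// policy, in the spirit of the controller described above.
import Docker from "dockerode";

const docker = new Docker();
const IDLE_TIMEOUT_MS = 10 * 60 * 1000; // assumed: recycle after 10 idle minutes

// tenantId -> { container, lastUsed }
const sandboxes = new Map<string, { container: Docker.Container; lastUsed: number }>();

// Get (or lazily create) the dedicated container for a tenant.
export async function getSandbox(tenantId: string): Promise<Docker.Container> {
  const existing = sandboxes.get(tenantId);
  if (existing) {
    existing.lastUsed = Date.now();
    return existing.container;
  }

  const container = await docker.createContainer({
    Image: "example/rules-sandbox", // hypothetical image
    Labels: { tenant: tenantId },
  });
  await container.start();
  sandboxes.set(tenantId, { container, lastUsed: Date.now() });
  return container;
}

// Recycle policy: periodically stop and remove containers that sat idle too long.
setInterval(async () => {
  const now = Date.now();
  for (const [tenantId, sandbox] of sandboxes) {
    if (now - sandbox.lastUsed > IDLE_TIMEOUT_MS) {
      sandboxes.delete(tenantId);
      await sandbox.container.stop().catch(() => undefined);
      await sandbox.container.remove().catch(() => undefined);
    }
  }
}, 60 * 1000);
```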



More details on the sandbox design are available in the November 2014 talk video and slides.

Monitoring

At first we used Pingdom (and we have not completely dropped it), but we then found that we needed to build our own health-check system (one that can run arbitrary health tests written in Node.js). The following run across all of our AWS regions:

The checks use a sandbox developed specifically for this service. We invoke the sandbox through an HTTP API, sending the Node.js health-check script in an HTTP POST. All components are monitored, as well as end-to-end flows such as a complete login transaction.
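A synthetic login check of the kind posted to that sandbox might look like the sketch below. The endpoint, tenant, and credentials are placeholders; the article does not describe the sandbox's exact API.

```typescript
// Sketch of a synthetic login health check. Endpoint and credentials are placeholders.
const LOGIN_URL = "https://example-tenant.example-auth.com/oauth/token"; // placeholder

async function checkLogin(): Promise<void> {
  const started = Date.now();
  const res = await fetch(LOGIN_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      grant_type: "password",
      username: "healthcheck@example.com", // synthetic test user
      password: process.env.HEALTHCHECK_PASSWORD,
      client_id: process.env.HEALTHCHECK_CLIENT_ID,
    }),
  });

  if (!res.ok) {
    throw new Error(`login check failed: HTTP ${res.status}`);
  }
  console.log(`login check ok in ${Date.now() - started}ms`);
}

checkLogin().catch((err) => {
  console.error(err);
  process.exit(1); // a non-zero exit marks the check as failed
});
```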



If a health check fails, we are notified through Slack. We have two Slack channels, #p1 and #p2. If the error occurs once, the message goes to #p2; if it occurs twice in a row, the alert goes out through #p1 and all of our devops engineers receive an SMS (via Twilio).
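That escalation could be wired up roughly as follows. The incoming-webhook URLs, phone numbers, and the choice of the twilio client library are assumptions for illustration.

```typescript
// Rough sketch of the #p2 -> #p1 escalation with SMS, as described above.
// Webhook URLs, numbers and credentials are placeholders.
import twilio from "twilio";

const P2_WEBHOOK = process.env.SLACK_P2_WEBHOOK!; // incoming webhook for #p2
const P1_WEBHOOK = process.env.SLACK_P1_WEBHOOK!; // incoming webhook for #p1
const smsClient = twilio(process.env.TWILIO_SID, process.env.TWILIO_TOKEN);

const consecutiveFailures = new Map<string, number>();

async function notifySlack(webhook: string, text: string): Promise<void> {
  await fetch(webhook, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
}

export async function reportFailure(checkName: string): Promise<void> {
  const failures = (consecutiveFailures.get(checkName) ?? 0) + 1;
  consecutiveFailures.set(checkName, failures);

  if (failures === 1) {
    // First failure: a heads-up in #p2.
    await notifySlack(P2_WEBHOOK, `health check failed: ${checkName}`);
  } else {
    // Two in a row: page #p1 and text the on-call engineers.
    await notifySlack(P1_WEBHOOK, `health check failing repeatedly: ${checkName}`);
    await smsClient.messages.create({
      to: "+15550100",   // placeholder on-call number
      from: "+15550199", // placeholder Twilio number
      body: `ALERT: ${checkName} is failing`,
    });
  }
}

export function reportSuccess(checkName: string): void {
  consecutiveFailures.set(checkName, 0);
}
```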


We use statsd to get better performance metrics and send the metrics to Librato.
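Emitting those metrics might look like the snippet below; hot-shots is used here only as an example statsd client, and the metric names are made up.

```typescript
// Sketch of emitting login metrics to statsd (which then forwards to Librato).
import StatsD from "hot-shots";

const statsd = new StatsD({ host: "localhost", port: 8125, prefix: "auth0." });

export function recordLogin(success: boolean, durationMs: number): void {
  statsd.increment(success ? "logins.succeeded" : "logins.failed");
  statsd.timing("logins.duration", durationMs);
}
```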


We also set alerts on derived metrics, that is, on how much a value increases or decreases over a period of time. For example, we have a derived metric on logins: if derivative(logins) exceeds a threshold X, an alert is sent to Slack.
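One way to read that rule is as a window-over-window rate check, sketched below. The window size, threshold, and drop-oriented direction are assumptions; the article only says an alert fires when the derivative crosses a threshold.

```typescript
// Sketch of a derived-metric alert: compare the login count in the current
// window with the previous one and alert when the drop exceeds a threshold.
const WINDOW_MS = 60 * 1000;
const MAX_DROP_RATIO = 0.5; // assumed: alert if logins fall by more than 50%

let previousWindow = 0;
let currentWindow = 0;

export function countLogin(): void {
  currentWindow += 1;
}

setInterval(async () => {
  if (previousWindow > 0) {
    const derivative = (currentWindow - previousWindow) / previousWindow;
    if (derivative < -MAX_DROP_RATIO) {
      // Placeholder Slack incoming webhook.
      await fetch(process.env.SLACK_P1_WEBHOOK!, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          text: `logins dropped ${Math.round(-derivative * 100)}% in the last minute`,
        }),
      });
    }
  }
  previousWindow = currentWindow;
  currentWindow = 0;
}, WINDOW_MS);
```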


Finally, we use New Relic to monitor infrastructure components.


For logging we use Elasticsearch, Logstash, and Kibana. We store the Nginx and MongoDB logs here, and we also parse the MongoDB logs with Logstash to identify slow queries.
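As a stand-in for that Logstash filter, the sketch below scans a mongod plain-text log and reports operations over a threshold. The log path, the assumption that slow operations end with their duration (e.g. "... 152ms"), and the threshold are all illustrative.

```typescript
// Illustrative stand-in for the Logstash filter that flags slow MongoDB queries.
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

const SLOW_MS = 100; // assumed slow-query threshold
const rl = createInterface({ input: createReadStream("/var/log/mongodb/mongod.log") });

rl.on("line", (line) => {
  // Older mongod text logs end slow operations with their duration, e.g. "... 152ms".
  const match = line.match(/(\d+)ms\s*$/);
  if (match && Number(match[1]) >= SLOW_MS) {
    console.log(`slow query (${match[1]}ms): ${line}`);
  }
});
```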

Website

All related sites, such as auth0.com and the blog, are completely decoupled from the application and its runtime; they run on an Ubuntu + Docker VM.

Future

- We are migrating to CoreOS and Docker. We have long wanted to move to a model where we manage the cluster as a whole instead of doing configuration management node by node, and Docker helps with part of that through image-based deployments.
- MongoDB cross-data-center replication between AWS and Azure; we are currently testing the latency.
- For all search-related features we are migrating to Elasticsearch, because MongoDB does not perform well there given our multi-tenant characteristics.

Original link: Auth0 Architecture - Running in Multiple Cloud Providers and Regions (translated by Dongyang, edited by Zhonghao)

