While the term cloud computing is not new (Amazon started providing its cloud services in 2006), it has been a real buzzword since 2008, when cloud services from Google and Amazon gained public attention. Google's App Engine enables users to build and host Web applications on Google's infrastructure.
Together with S3, Amazon Web Services also includes the Elastic Compute Cloud (EC2) computing Web service, which hosts applications on Amazon's infrastructure. Other companies are getting ready to compete with Amazon and Google, including Microsoft with Azure; even Sun Microsystems, whose cloud offering has not yet been formally marketed, wants a piece. IBM, for example, recently announced that it will make certain products available for developers to use in Amazon EC2 environments.
The Amazon Simple Storage Service (S3) is a publicly available service that Web application developers can use to store digital assets, including pictures, videos, music, and documents. S3 provides a RESTful API for interacting with the service programmatically. In this article, you'll learn how to use the open source JetS3t library to store and retrieve data with Amazon's S3 cloud service.
Introduction to the S3 cloud platform
A cloud is an abstract concept that represents a loosely connected group of computers that perform a task or service together, as if they were a single entity. The architecture behind this concept is also abstract: each cloud provider is free to design its products as it sees fit. Software as a service (SaaS) is a related concept in which the cloud provides a service to the user. Cloud models can reduce costs for users because they do not need to purchase software and hardware to run it: the service provider already supplies the necessary components.
Take Amazon's S3 product, for example. As the name suggests, this is a public service that enables Web developers to store digital assets (such as pictures, videos, music, and documents) for use in their applications. When you use S3, it looks like a single machine on the Internet with a hard drive that contains your digital assets. In fact, it involves a number of geographically distributed machines that contain the digital assets (or portions of them). Amazon also handles all the complexity of servicing requests to store and retrieve data. You pay only a small fee (about 15 cents per GB per month) to store data on Amazon's servers, and 1 dollar to transfer data over Amazon's servers.
Rather than reinventing the wheel, Amazon's S3 service exposes a RESTful API, enabling you to access S3 from any language that supports HTTP communication. The JetS3t project is an open source Java library that abstracts away the details of working with S3's RESTful API, exposing it as ordinary Java methods and classes. The less code you write, the better, isn't it? It pays to make the most of other people's work. As you'll see in this article, JetS3t makes working with S3 from the Java language simpler and considerably more efficient.
Introduction to the S3 application model
Theoretically, S3 is a global storage area network (SAN) that appears as one oversized hard disk where you can store and retrieve digital assets. Technically, however, Amazon's architecture is somewhat different. The assets that you store and retrieve through S3 are called objects. Objects are stored in a bucket (storage segment). You can use a hard disk analogy: an object is like a file, and a bucket is like a folder (or directory). As with hard disks, objects and buckets can be located through Uniform Resource Identifiers (URIs).
For example, on my hard drive, I have a file called whitepaper.pdf, which is located in a folder called documents in my home directory. Accordingly, the URI of the PDF file is /home/aglover/documents/whitepaper.pdf. In S3, URIs are a bit different. First, buckets can only be top-level: they cannot be nested the way folders (or directories) can on a hard disk. Second, bucket names must follow Internet naming rules: no dash next to a period, no underscores in the name, and so on. Finally, a bucket name must be unique across all of S3, because the bucket name becomes part of a public URI within Amazon's domain (s3.amazonaws.com). (The good news is that each account can contain only 100 buckets, so don't worry that someone else is taking all the good names.)
The bucket is the root of the URI in S3. In other words, the name of the bucket is part of the URI that points to an object in S3. For example, if I have a bucket named agdocs and an object named whitepaper.pdf, the URI will be http://agdocs.s3.amazonaws.com/whitepaper.pdf.
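The naming rules and URI layout described above can be sketched in a few lines of plain Java. This is only an illustration, not part of JetS3t: the class and method names are hypothetical, and the check covers just the two naming rules mentioned in the text (no underscores, no dash next to a period), whereas S3 itself enforces more.

```java
// Hypothetical helper, for illustration only; not part of JetS3t or any AWS SDK.
class S3UriSketch {

    // Checks ONLY the two rules named in the article text.
    // Assumption: real S3 validation is stricter than this.
    static boolean passesNamingRulesFromText(String bucket) {
        boolean hasUnderscore = bucket.contains("_");
        boolean dashNextToPeriod = bucket.contains("-.") || bucket.contains(".-");
        return !hasUnderscore && !dashNextToPeriod;
    }

    // Builds the public URI: the bucket name is the first label of the host name,
    // and the object key is the path.
    // Assumption: the key contains no characters that need URL escaping.
    static String objectUri(String bucket, String key) {
        if (!passesNamingRulesFromText(bucket)) {
            throw new IllegalArgumentException("Bucket name breaks a naming rule: " + bucket);
        }
        return "http://" + bucket + ".s3.amazonaws.com/" + key;
    }

    public static void main(String[] args) {
        // prints http://agdocs.s3.amazonaws.com/whitepaper.pdf
        System.out.println(objectUri("agdocs", "whitepaper.pdf"));
    }
}
```

Because the bucket name ends up in a host name, a name such as my_docs is rejected before a URI is ever built.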
S3 also lets you specify the owner and permissions of buckets and objects, just as you would for files and folders on a hard disk. When you define an object or bucket in S3, you can specify an access control policy that states who can access your S3 assets and how (for example, with read and write permissions). Accordingly, you can provide access to your objects in many ways, and using the RESTful API is just one of them.
A little DNS magic means that users never have to see S3 asset URLs. With Domain Name System (DNS) and CNAME (short for canonical name) records, you can map a more customized URL to an S3 URL. In this way, you hide the fact that you (or your application) rely on S3!
Start using S3 and JetS3t
To start using S3, you need an account. S3 is not free, so you must provide Amazon with a means of payment (such as a credit card number) when you create the account. Do not worry: there is no setup fee, and you pay only for what you use. For the examples in this article, you will pay less than 1 dollar.
While creating the account, you also need to create credentials: an access key and a secret key (similar to a user name and password). (You can also obtain an X.509 certificate; however, you need it only if you use Amazon's SOAP API.) As with any access information, you must keep your secret key safe, because anyone who accesses S3 with your credentials does so at your expense. For the same reason, whenever you create a bucket or object, the default behavior is to make all content private, and you must explicitly grant access to the outside world.
With the access key and secret key, you can download JetS3t and use it to interact with S3 through the RESTful API.
Logging in to S3 programmatically through JetS3t takes two steps. First, you create an AWSCredentials object, and then you pass it to an S3Service object. The AWSCredentials object is very simple: it takes the access key and secret key as Strings. S3Service is actually an abstract type. Because S3 provides both a RESTful API and a SOAP API, the JetS3t library can provide two implementations: RestS3Service and SoapS3Service. For the purposes of this article (and indeed for most S3 work), the simplicity of the RESTful API makes it a good choice.
Creating a connected RestS3Service instance is simple, as shown in Listing 1:
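Listing 1. Create a JetS3t RestS3Service instance
import org.jets3t.service.impl.rest.httpclient.RestS3Service
import org.jets3t.service.security.AWSCredentials

// Placeholder values; substitute your own access key and secret key
def awsAccessKey = "blahblah"
def awsSecretKey = "blah-blah"
def awsCredentials = new AWSCredentials(awsAccessKey, awsSecretKey)
def s3service = new RestS3Service(awsCredentials)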
Now you can do some interesting things: for example, create a bucket, add a movie to it, and then get a time-limited URL for it. Actually, that sounds like a business process, doesn't it? It is the business process of releasing a limited asset, such as a movie.
Create a bucket
For a fictional movie business, I'll create a bucket called bc50i. With the help of JetS3t, the process is simple. The S3Service type gives you several options. I prefer the getOrCreateBucket call, as shown in Listing 2. As the name suggests, calling this method either returns an existing bucket (represented as an instance of the S3Bucket type) or creates a new bucket in S3.
Detailed steps on the S3 platform
Listing 2. Create a bucket on an S3 server
def bucket = s3service.getOrCreateBucket("bc50i")
Don't be fooled by my simple code example. The JetS3t library is quite extensive. For example, you can quickly determine how many buckets you have: simply call listAllBuckets on an S3Service instance. The method returns an array of S3Bucket instances. You can ask any bucket instance for its name and the date it was created. More importantly, you can control the associated permissions through JetS3t's AccessControlList type. For example, I can take the bc50i bucket instance and allow anyone to read and write to it publicly, as shown in Listing 3:
Listing 3. Modify the access control list for a bucket
// Apply the canned public read/write ACL to the bucket instance
bucket.acl = AccessControlList.REST_CANNED_PUBLIC_READ_WRITE
Of course, you can also delete buckets at will through the API. Amazon even allows you to specify a location when you create a bucket. Amazon hides the complexity of where the data is actually stored, but you can tell Amazon to place the bucket (and all the objects in it) in the United States or in Europe (the currently available options).
Adding objects to a bucket
Creating an S3 object with the JetS3t API is as simple as manipulating a bucket. The JetS3t library is also smart enough to take care of the content types of the files stored in an S3 bucket. For example, I want to upload a video, nerfwars2.mp4, to S3 so that users can watch it for a limited time. Creating an S3 object is as simple as creating an ordinary java.io.File and associating an S3Object with a bucket, as shown in Listing 4:
Listing 4. Create an S3 object
def s3obj = new S3Object(bucket, new File("/path/to/nerfwars2.mp4"))
After you initialize the S3Object with a file and a bucket, all you have to do is upload it with the putObject method, as shown in Listing 5:
Listing 5. Uploading the video
s3service.putObject(bucket, s3obj)
Listing 5 completes the upload. Now that the movie is on Amazon's servers, its key is its file name. Of course, you can override the name and call the object something else if you need to. In fact, the JetS3t API (and the Amazon S3 RESTful API) exposes a lot of information that you can set when you create objects. As you already know, you can provide access control lists. Any object in S3 can also carry additional metadata, which the API allows you to create. You can then query any object for that metadata through the S3 API (and, by extension, JetS3t).
Create the URL of an object
So far, my S3 instance has a bucket that contains the movie. In fact, my movie can be reached at the following URI: http://bc50i.s3.amazonaws.com/nerfwars2.mp4. But only I can get to it. (In this case, I can access it only programmatically, because the default access control on all content denies any unauthorized access.) My goal is to give selected users a way to view the new movie (for a limited amount of time) until I start charging for access (which S3 can also help with).
Creating a public URL is a handy feature provided by S3; in fact, with S3 you can create a public URL that works only for a period of time (for example, 24 hours). For the movie I just stored on the S3 server, I'll create a URL that is valid for 48 hours. Then I will give the URL to the selected users so that they can download and watch the movie (assuming they download it within two days).
Handling files with a validity period
To create a time-sensitive URL for an S3 object, you can use JetS3t's createSignedGetUrl method, which is a static method of the S3Service type. The method takes a bucket name, an object key (in this case, the movie name, remember?), some credentials (in the form of a JetS3t AWSCredentials object), and an expiration date. If you know the bucket and object key you need, you can quickly get the URL with the Groovy code in Listing 6:
Listing 6. Create a time-sensitive URL
// Groovy's Date + 2 yields a date two days (48 hours) from now
def now = new Date()
def url = S3Service.createSignedGetUrl(
  bucket.getName(), s3obj.key, awsCredentials, now + 2)
With Groovy, I can easily specify an expiration date 48 hours away with the + 2 syntax. The resulting URL looks like this (all on one line):
https://bc50i.s3.amazonaws.com/nerfwars2.mp4?AWSAccessKeyId=1asd06A5MR2&Expires=1234738280&Signature=rZvk8Gkms=
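The query parameters in that URL carry an access key ID, an expiry time in seconds since the Unix epoch, and a signature. As a rough sketch of how a URL of this shape is assembled under Amazon's legacy query-string authentication scheme (an HMAC-SHA1 over a short description of the request), the plain-Java code below builds one by hand. The class name, method names, and placeholder credentials are assumptions made for illustration; in practice JetS3t's createSignedGetUrl does this work for you.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration of legacy S3 query-string signing; not JetS3t code.
class SignedUrlSketch {

    // Expiry is expressed in seconds since the epoch, so 48 hours adds 48 * 3600.
    static long expiryAfterHours(long nowEpochSeconds, int hours) {
        return nowEpochSeconds + hours * 3600L;
    }

    // Assumption: the object key needs no URL escaping and no extra headers are signed.
    static String signedGetUrl(String bucket, String key, String accessKey,
                               String secretKey, long expiresEpochSeconds) {
        // Verb, empty Content-MD5, empty Content-Type, expiry, then the resource path.
        String stringToSign = "GET\n\n\n" + expiresEpochSeconds + "\n/" + bucket + "/" + key;
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] raw = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
            String signature = Base64.getEncoder().encodeToString(raw);
            return "https://" + bucket + ".s3.amazonaws.com/" + key
                    + "?AWSAccessKeyId=" + accessKey
                    + "&Expires=" + expiresEpochSeconds
                    + "&Signature=" + URLEncoder.encode(signature, "UTF-8");
        } catch (Exception e) {
            throw new IllegalStateException("Could not sign URL", e);
        }
    }

    public static void main(String[] args) {
        long expires = expiryAfterHours(System.currentTimeMillis() / 1000L, 48);
        // Placeholder credentials, not real keys.
        System.out.println(signedGetUrl("bc50i", "nerfwars2.mp4",
                "AKIDEXAMPLE", "not-a-real-secret", expires));
    }
}
```

Because the secret key goes only into the HMAC and never into the URL itself, you can hand the URL to selected users without exposing your credentials.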
S3 can help a great deal if your bandwidth and storage requirements are not constant. For example, think about the business model I'm demonstrating, in which movies are released at specific times of the year. In a traditional storage model, you would need to buy a lot of rack space (or provide the hardware and the pipes leading to it) for demand that is likely to be high at release time and relatively low afterward. You could not pay according to what you actually need. With S3, the model is pay-as-you-go: the company pays for storage and bandwidth only when they are needed. More importantly, S3's security features let you further specify when people can download a video, and even who can download it.
It is easy to implement these requirements with S3. At a high level, creating a restricted public download for a movie takes 4 steps:
1. Log in to S3.
2. Create a bucket.
3. Add the desired video (or object) to the bucket.
4. Create a time-sensitive URL that points to the video.
That's it!
Postscript: the convenient pay-as-you-go model
Compared with the traditional storage model, the pay-as-you-go model of S3 has many obvious advantages. For example, to store my music collection on a hard disk, I had to spend 130 dollars in advance on a 500GB storage unit. I don't have 500GB of data to store, so I effectively paid about 25 cents per GB (cheap as that is) for space I didn't need. I also need to maintain the device and pay the electricity bill. If I use Amazon, I don't need to pay 130 dollars in advance for hardware. I just pay 10 cents per GB, with no cost for managing and maintaining storage hardware.
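The arithmetic behind that comparison can be sketched as follows. The 130-dollar, 500GB, and 10-cent figures come from the text; the 50GB collection size is purely an assumption for illustration, the 10-cent rate is read as a monthly charge, and transfer fees, electricity, and maintenance are ignored. Amounts are kept in whole cents to avoid floating-point rounding.

```java
// Hypothetical cost comparison; the 50 GB collection size is an assumed example value.
class StorageCostSketch {

    // Up-front cost per GB of capacity, in cents (integer division).
    static long centsPerGb(long priceCents, long capacityGb) {
        return priceCents / capacityGb;
    }

    // Monthly pay-as-you-go cost in cents for the data actually stored.
    static long monthlyCostCents(long storedGb, long centsPerGbMonth) {
        return storedGb * centsPerGbMonth;
    }

    // Number of months until cumulative pay-as-you-go spending reaches the up-front price.
    static long monthsToMatchUpfront(long priceCents, long storedGb, long centsPerGbMonth) {
        long monthly = monthlyCostCents(storedGb, centsPerGbMonth);
        return (priceCents + monthly - 1) / monthly; // ceiling division
    }

    public static void main(String[] args) {
        long driveCents = 13000; // 130 dollars, from the text
        long driveGb = 500;      // from the text
        long rateCents = 10;     // 10 cents per GB, from the text (assumed monthly)
        long storedGb = 50;      // ASSUMPTION: size of the music collection

        System.out.println(centsPerGb(driveCents, driveGb));           // prints 26
        System.out.println(monthlyCostCents(storedGb, rateCents));     // prints 500
        System.out.println(monthsToMatchUpfront(driveCents, storedGb, rateCents)); // prints 26
    }
}
```

Under these assumptions, the drive works out to 26 cents per GB of capacity, and storing 50GB on a pay-as-you-go basis takes 26 months to add up to the price of the drive.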
Now consider the benefits of using these services across an enterprise. Take, for example, Twitter storing the pictures for 1 million user accounts on S3. By paying per use, Twitter does not have to spend a lot of money on hardware infrastructure to store and serve pictures, and it does not need to pay for the staff and parts required to configure and maintain that infrastructure.
The benefits of the cloud go further than that. You also get low latency and high availability. Because the assets stored in Amazon's cloud are distributed around the world, content can be served faster to each location. More importantly, because your assets are spread across many machines, your data remains highly available when some machines (or parts of the network) go down.
In short, the benefits of Amazon S3 are simple: low cost, high availability, and security. Unless you are a SAN expert and prefer to maintain hardware assets to store your data, Amazon can probably do it better than you can. Why spend money up front on hardware when budgets are tight (and don't forget that your hardware will depreciate over time)?