Running a Drupal website on Amazon EC2
UPDATE: Fixed a small typo (mksf.xfs should have been mkfs.xfs)
As I described in my previous post, this website is currently running on Amazon's Elastic Compute Cloud (EC2). The underlying architecture of the site is based on the Drupal content management system, something that I've also described previously. I was not completely happy with the performance of my previous host (Media Temple), and Amazon EC2 promises to give you the ability to host a virtual machine running whatever you want within Amazon's data centers. You get access to the bandwidth and processing power of a huge online business, but you only pay for what you use.
In my limited testing so far, EC2 flies as a web host and appears to be able to scale for traffic spikes. It also provides a number of unique features, such as incremental snapshots of your data that are stored on S3 and the capability of creating throwaway clones of your server for doing development and testing.
About Amazon Web Services
I won't recount the history and philosophy of Amazon's Web Services (for that, I recommend visiting their site, their blog, or the RightScale blog), but I'd like to at least introduce them and describe why I think they are hugely important. Many web developers are familiar with Amazon's Simple Storage Service (S3), which lets you store gigabytes of data reliably in a globe-spanning system and serve that data to users on the cheap. Less well known is the Elastic Compute Cloud (EC2).
EC2 is a service whereby you can create virtual machines and run them in one of Amazon's data centers. These virtual machines take a slice of the data center's hardware and simulate a physical server. You can choose from a variety of machine configurations that have different processing powers, memory configurations, and virtual hard drive sizes. Once you have picked the class of your machine, you start it by loading an Amazon Machine Image (AMI). You can pick from one of the many public AMIs, most of them based on Linux, or any custom ones you may have created.
The greatest advantage of EC2 is the power it provides you with. The classes of hardware you can choose from start at a pretty powerful level and only increase from there. Your EC2 virtual machine sits on Amazon's data pipe, and thus has tremendous bandwidth available to it. You can create as many machines as you want and start or stop them on demand. You are only billed for the computing time and bandwidth you use.
To date, EC2 has been limited in terms of what you can use it for, due to a lack of persistent storage. You can create a custom virtual machine image that runs on EC2, but that machine runs as if it is in RAM, where all data or changes since the image started are lost when it terminates. Termination can be caused by a software shutdown or by a failure of some sort at the data center. While these failures are infrequent, the possibility of data loss on EC2 made it impractical to host websites on the service. More common applications included large-scale image processing, searches of massive DNA sequence databases, and other computationally intensive tasks that only needed to be fed initial data and to produce a result of some kind.
That changed on August 20 with the release of Amazon's Elastic Block Store (EBS). With EBS, you can create persistent virtual drives that look like network-attached storage and that will exist after your EC2 instances have terminated. This enables the operation of EC2-based websites where the core configuration of the server is set up ahead of time and saved as a custom AMI, and the data that changes (logs, databases, HTML files, uploads) is all stored on the persistent drive.
EBS volumes are on RAID-equivalent hardware within a data center, but you will want to create even more reliable backups for critical data. Another new feature introduced with EBS was snapshots, where the EBS volumes can be set up to do automated incremental backups to an S3 bucket. Not only does this provide very dependable storage for the site's data, but the time-based snapshots also give you the ability to roll back your site to specific points in time. This is very similar to Mac OS X Leopard's Time Machine backup feature and is a unique and useful capability.
One thing I haven't mentioned is cost. I did say that you will only be billed for what you use. For an entry-level virtual machine (also called an instance), you are billed $0.10 for every hour it is operational. For a web server that is on all the time, that comes to $72 per month. Bandwidth is billed at $0.17 per GB downloaded, with volume discounts available if you pass 10 TB per month. An EBS store will run you $0.10 per GB per month. Amazon has a site where you can estimate the cost of running an EC2 instance. Basically, it's a flat rate of $72 per month, with a small amount on top that scales very nicely with load. This is a little more costly than some virtual private server services, such as Media Temple's (dv) Base at $50 per month, but the fact that you get access to something as powerful as Amazon's data centers and network connections might make up the difference and more.
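To make the arithmetic above concrete, here is a minimal Python sketch of a monthly estimate. It assumes the 2008 prices quoted in this post, a 720-hour month, and flat per-GB rates. It ignores volume discounts and any other charges, so treat it as a back-of-the-envelope check rather than a billing calculator. The function name monthly_cost is purely illustrative.

```python
# Rough monthly EC2 cost estimate using the prices quoted in this post.
# Assumptions: 720 billable hours per month, flat per-GB rates,
# and no volume discounts or other charges.
HOURLY_RATE = 0.10      # m1.small instance, $ per hour
BANDWIDTH_RATE = 0.17   # $ per GB transferred out
EBS_RATE = 0.10         # $ per GB-month of EBS storage
HOURS_PER_MONTH = 24 * 30

def monthly_cost(gb_out: float, ebs_gb: float, hours: float = HOURS_PER_MONTH) -> float:
    """Return the estimated monthly bill in dollars, rounded to cents."""
    compute = hours * HOURLY_RATE
    bandwidth = gb_out * BANDWIDTH_RATE
    storage = ebs_gb * EBS_RATE
    return round(compute + bandwidth + storage, 2)

if __name__ == "__main__":
    # An always-on server sending 10 GB per month, with a 1 GB EBS volume.
    print(monthly_cost(gb_out=10, ebs_gb=1))
```

For a small site, the $72 compute charge dominates, and bandwidth and storage add only a dollar or two.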
Setting up your AWS account and tools
If you are still interested in the service, you'll first need to set up an account with Amazon. If you already have a normal shopping account with Amazon, you can simply add the charges for the web services to that. Once you have an Amazon account, go to the AWS site and sign up for the web services. After you've signed up, go to the AWS Access Identifiers page and write down your Access Key ID and Secret Access Key. While you're on that page, create a new X.509 certificate by following their instructions. Download that certificate and your private key. You'll need all these identifiers later.
After the initial signup, you'll need to go to the EC2 page and click on "Sign Up For This Web Service". This will allow you to use EC2 and it will also sign you up for S3. Don't worry, you won't be charged anything until you actually start using the services.
Next, you'll need the best currently available graphical interface to EC2, Elasticfox. Elasticfox is a Firefox plugin that gives you control over all aspects of EC2. In Firefox, visit the link above and install the plugin. To open Elasticfox within Firefox, select the Tools | Elasticfox menu item.
Before you can use Elasticfox, you'll need to configure your Amazon Web Services credentials. Click on the Credentials button in the upper left of the screen, which should bring up a dialog where you can enter your Access Key and Secret Access Key and give your account a name. Enter those, click Add and then click Close.
The next step is to configure a key pair for use when starting up instances. This public / private key pair will allow you to log in as root to a new instance generated off of a public machine image without the use of a password. Click on the KeyPairs tab and then on the green "Create a new key pair" button. Give this new key pair a name and choose a location to save your private key. Click on the Tools icon in the upper right to set the location of the key on your local file system. Change the SSH Key Template field to be something like
changing the middle part of the path to reflect where you actually placed the key file.
That should take care of setting up your tools; now it's time to configure an instance.
Setting up a security group
The first step is to set up a security group. A security group acts like a configuration file for a firewall. It lets you set which ports are open to the world and which are closed.
Click on the Security Groups tab, then on the green plus button to add a new group. Give it a name and description. Select that new group (as this is a browser plugin, you may need to hit the blue refresh icon to see the new group in the list).
By default, all ports are closed, so we'll want to open up the ones we'll use. In the "Group Permissions" section of the window, click on the green checkmark to bring up a dialog asking which ports you'd like to open. Choose the TCP/IP protocol, set a port range of 22 to 22, and click Add. This will open up port 22, necessary if you want to have SSH access to your instances. Repeat the process for port 80 (HTTP) and, if desired, port 443 (HTTPS).
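A security group is a default-deny whitelist: a connection gets through only if some rule explicitly matches its protocol, port, and source address. The Python sketch below models that behavior for the group built above. It is a conceptual illustration only, not Amazon's API. The Rule type, WEB_GROUP, and is_allowed are hypothetical names, and the 0.0.0.0/0 source range ("anyone") is an assumption about how the rules were entered.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    protocol: str   # "tcp", "udp", or "icmp"
    from_port: int
    to_port: int
    cidr: str       # allowed source range; "0.0.0.0/0" means anyone

# Hypothetical group matching the steps above: SSH, HTTP and HTTPS open to all.
WEB_GROUP = [
    Rule("tcp", 22, 22, "0.0.0.0/0"),
    Rule("tcp", 80, 80, "0.0.0.0/0"),
    Rule("tcp", 443, 443, "0.0.0.0/0"),
]

def is_allowed(rules, protocol: str, port: int, source_ip: str) -> bool:
    """Default deny: traffic passes only if at least one rule matches it."""
    addr = ipaddress.ip_address(source_ip)
    return any(
        r.protocol == protocol
        and r.from_port <= port <= r.to_port
        and addr in ipaddress.ip_network(r.cidr)
        for r in rules
    )
```

Each click of Add in the Group Permissions dialog corresponds roughly to one Rule entry. Any port you never add, such as MySQL's 3306, stays closed.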
Creating a persistent storage volume
As I said earlier, the addition of persistent storage to EC2 is what makes hosting a website on the service practical. To create a persistent store, first click on the Volumes and Snapshots tab, then click on the green plus button. Choose a size for your volume (I went with 1 GB, because I didn't have a lot of data to store) and an availability zone.
Availability zones are basically Amazon data centers. Currently there are three of them, all on the east coast of the U.S. Remember which availability zone you picked here, because you'll need to select that same zone when starting your instance. An instance can only attach a persistent storage volume that exists in the same availability zone.
Starting and configuring your instance
With all the preparatory work out of the way, it's time to start up a virtual machine and customize it to meet our needs. Click on the AMIs and Instances tab to be brought to the main control panel for starting and monitoring your compute instances. First, we'll need to pick a public image to start from. You can create one from scratch if you want, but that's beyond the scope of this walkthrough. If you do not see a list of AMIs under the Machine Images section, click on the blue refresh button and wait a few seconds for the list to load. The image I chose was one of those contributed by RightScale, a company that provides EC2 services and has contributed quite a bit back to the community. The specific image is ami-d8a347b1, a 32-bit image based on CentOS 5. This particular image is pretty stripped down and should be a good starting point for a no-frills web server.
Find the image in the list using the search function, click to select it, and click on the green Launch Instance(s) button. A dialog will appear to let you configure the virtual hardware characteristics of this instance. Choose an instance type of m1.small, the least powerful of all the hardware types. Select your keypair by name in the dropdown list, and select the same availability zone as you had chosen for your persistent storage volume. Click on the default security group on the right side of the dialog and move it to the left, while moving your custom security group to the right to assign it to this instance. When everything is set, click on Launch.
Your new instance will now boot. This should take a minute or two. You'll need to keep checking on the status of your instance by clicking on the blue refresh icon under the Your Instances section of the screen. Once the status changes from Pending to Running, you're ready to start using the new virtual machine.
The first thing you'll want to do with this machine is to SSH into it. Click to highlight your instance and click on the Open SSH Connection button. If you properly set the location of your private key as described above, a terminal window should open and you should be connected to your virtual machine as the root user. You can safely ignore the warning message that appears; it is caused by a minor bug in this particular public image.
Now it's time to start installing the packages you'll need to configure a full Drupal site. Execute the following commands to install PHP and MySQL:
yum -y install php
yum -y install php-gd
yum -y install php-mbstring
yum -y install mysql
yum -y install mysql-server
yum -y install php-mysql
The XFS filesystem seems to be the preferred choice for the persistent storage, due to its ability to freeze the filesystem during a snapshot operation, and you'll need the following to be installed to take advantage of that:
yum -y install kmod-xfs.i686
yum -y install xfsdump.i386
yum -y install xfsprogs.i386
yum -y install dmapi
If you want SSL on your web server, you may want to run the following:
yum -y install mod_ssl
If you're like me, you'll want to create a non-root user to have on hand for day-to-day use. To create that user and set its password, enter the following:
adduser [username]
passwd [username]
I wanted to do password-based SFTP transfers using that new user because I really like to use Transmit. To enable password-based authentication for users, edit the SSH daemon's configuration file, /etc/ssh/sshd_config, and change the appropriate line to read
PasswordAuthentication yes
while leaving password authentication disabled for the root user. Restart the SSH daemon using the following command:
/etc/rc.d/init.d/sshd restart
I also wanted to install eAccelerator to speed up PHP on the server, but the yum install of it failed with dependency errors. Therefore, to install it, you'll need to download php-eaccelerator-0.9.5.2-2.el5.remi.i386.rpm and SFTP it over to the server. Once on the server, install it using
rpm -ivh php-eaccelerator-0.9.5.2-2.el5.remi.i386.rpm
Attaching the persistent storage volume
With the base configuration of the image now the way we want it, it's time to attach the persistent storage volume. Go back to Elasticfox and select the Volumes and Snapshots tab. Select the volume you created and click on the green checkmark to attach it to your running instance. A dialog will appear asking you to select the instance to attach this to (there should only be the one in the pull-down list) and a device path. Enter a path of /dev/sdh for this volume and proceed. Your volume should shortly be attached to your instance.
Switch back to the instance to format and mount the volume. Run the following commands to create an XFS filesystem on the device, and create a mount point for it at /persistent.
mkfs.xfs /dev/sdh
mkdir /persistent
Edit /etc/fstab and insert the following line at the end of that file:
/dev/sdh /persistent xfs defaults 0 0
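Each /etc/fstab entry is made up of whitespace-separated fields, and it helps to know what each one in the line above means. The Python sketch below splits an entry into named fields and documents them. It is a simplification: it requires all six fields, although fstab also tolerates omitting the last two. FstabEntry and parse_fstab_line are hypothetical names used only for illustration.

```python
from typing import NamedTuple

class FstabEntry(NamedTuple):
    device: str       # block device; here the EBS volume attached as /dev/sdh
    mount_point: str  # where the filesystem appears; here /persistent
    fs_type: str      # filesystem type created by mkfs.xfs
    options: str      # mount options; "defaults" is the standard option set
    dump: int         # 0 = ignored by the dump backup utility
    pass_no: int      # fsck order at boot; 0 = never checked

def parse_fstab_line(line: str) -> FstabEntry:
    """Split one six-field /etc/fstab entry into named fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    device, mount_point, fs_type, options, dump, pass_no = fields
    return FstabEntry(device, mount_point, fs_type, options, int(dump), int(pass_no))

if __name__ == "__main__":
    print(parse_fstab_line("/dev/sdh /persistent xfs defaults 0 0"))
```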
Now you can mount the persistent store volume at the path /persistent using the following command:
mount /persistent
Anything placed in the /persistent directory will survive sudden termination of the running instance, so you'll want to place log files, databases, the files that define your site architecture, and anything else you don't want to lose in that directory.
The first step is to move users' home directories. You can do this either by moving specific users' directories or by moving the /home directory to /persistent, then using the ln -s command to symbolically link the new location to its old place on the file system.
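The move-then-symlink approach just described can be sketched in Python as follows. This is only an illustration of the idea, roughly equivalent to "mv /home /persistent" followed by "ln -s /persistent/home /home". The function name relocate_with_symlink is hypothetical, and on a real server you would run the step as root while no users are logged in.

```python
import os
import shutil

def relocate_with_symlink(old_path: str, new_parent: str) -> str:
    """Move old_path into new_parent, then symlink the old path to the new one.

    Roughly what `mv /home /persistent && ln -s /persistent/home /home` does.
    """
    old_path = old_path.rstrip("/")
    new_path = os.path.join(new_parent, os.path.basename(old_path))
    if os.path.exists(new_path):
        # Refuse to clobber (or silently nest inside) an existing directory.
        raise FileExistsError(new_path)
    shutil.move(old_path, new_path)   # copies across filesystems if needed
    os.symlink(new_path, old_path)    # old location now points to the new one
    return new_path
```

Programs that still refer to /home keep working, because the symlink transparently redirects them to the copy on the persistent volume.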
Next, you will want to move your log files. I chose to move my Apache server logs to the persistent store. To do this, I created a /persistent/log/http directory to store these logs (and a /persistent/log/https for the SSL logs). To point Apache's logging facilities to this directory, you'll need to edit /etc/httpd/conf/httpd.conf and change the following lines:
ErrorLog /persistent/log/http/error_log
CustomLog /persistent/log/http/access_log combined
Finally, the MySQL database should be pointed to the persistent store. To do this, create a /persistent/mysql directory and edit the following line in /etc/my.cnf:
datadir = /persistent/mysql
With the persistent store volume in place, it is time to install Drupal. To do so, go to the /persistent directory and download and extract the latest version of Drupal (6.4 as of this writing) using the following commands:
wget http://ftp.drupal.org/files/projects/drupal-6.4.tar.gz
tar -zxvf drupal-6.4.tar.gz
This should leave you with a drupal-6.4 directory. Rename this to html and point Apache to it by editing /etc/httpd/conf/httpd.conf again and changing the following lines:
DocumentRoot "/persistent/html"
<Directory "/persistent/html">
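If you prefer to script the extract-and-rename steps above, here is a Python sketch using only the standard library's tarfile module. It assumes the tarball has already been downloaded and has a single top-level drupal-6.4 directory, as described above. The function name and default arguments are hypothetical. It uses tarfile's "data" extraction filter when the running Python supports it.

```python
import os
import tarfile

def install_drupal_tarball(tarball: str, dest_dir: str,
                           top_dir: str = "drupal-6.4",
                           docroot_name: str = "html") -> str:
    """Extract a Drupal release tarball into dest_dir and rename its top-level
    directory to the document root name. Returns the new document root path."""
    with tarfile.open(tarball, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths, "..", and other unsafe members.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    docroot = os.path.join(dest_dir, docroot_name)
    os.rename(os.path.join(dest_dir, top_dir), docroot)
    return docroot
```

Calling install_drupal_tarball("/persistent/drupal-6.4.tar.gz", "/persistent") would leave the files in /persistent/html, which is where the DocumentRoot above points.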
You will need to set up a cron job to perform automated updates of your Drupal site's search index and other maintenance tasks. To do that, copy the /persistent/html/scripts/cron-lynx.sh to /etc/cron.daily. Edit that file to replace the sample domain name with your own, save it, and make it executable using chmod +x.
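As far as I can tell, the cron-lynx.sh script does nothing more than request your site's cron.php page with lynx, and that request is what triggers Drupal's maintenance tasks. The Python sketch below shows the same idea without lynx. The function names are hypothetical, and it assumes cron.php sits directly under your site's base URL. Note the trailing-slash handling: urljoin would otherwise drop the last path segment for sites installed in a subdirectory.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

def cron_url(site_base: str) -> str:
    """Build the URL of Drupal's cron.php from a site's base URL."""
    # urljoin replaces the last path segment unless the base ends in "/",
    # so normalise first (matters for sites installed in a subdirectory).
    if not site_base.endswith("/"):
        site_base += "/"
    return urljoin(site_base, "cron.php")

def run_drupal_cron(site_base: str, timeout: float = 60.0) -> int:
    """Request cron.php once and return the HTTP status code."""
    with urlopen(cron_url(site_base), timeout=timeout) as resp:
        return resp.status
```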
A database will need to be created in MySQL for use with Drupal. Before you can do that, make sure the MySQL server is running using the following command:
/etc/rc.d/init.d/mysqld start
For security, it's a good idea to set a password for the root MySQL user using the following command:
mysqladmin -u root password [password]
Next, create a database for your Drupal installation and a user that can access that database. You can use the following to create a database named "drupal" (feel free to change the name) and give a user named "drupal", connecting from the local machine, full access to it:
mysqladmin -u root -p create drupal
mysql -u root -p -e "GRANT ALL PRIVILEGES ON drupal.* TO 'drupal'@'localhost' IDENTIFIED BY '[password]';"
Finally, direct Drupal to use that database by editing /persistent/html/sites/default/settings.php to change the following line (you may need to make it writeable first, save it, then remove the writeable bit):
$db_url = 'mysql://drupal:[password]@localhost/drupal';
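One thing to watch for in $db_url is a password containing characters such as @ or /, which would break the user:password@host structure of the URL. As far as I know, Drupal 6 URL-decodes the username and password it parses out of $db_url, so percent-encoding them should be safe. That behavior is worth verifying on your own install. The Python sketch below builds and parses such a URL. The helper names are hypothetical.

```python
from urllib.parse import quote, unquote, urlsplit

def build_db_url(user: str, password: str, host: str, database: str) -> str:
    """Assemble a Drupal 6 style $db_url, percent-encoding the credentials."""
    # safe="" makes quote() encode "/" and "@" too; left raw, they would
    # break the user:password@host/database structure of the URL.
    return "mysql://{}:{}@{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, database)

def parse_db_url(url: str) -> dict:
    """Split a $db_url back into its decoded parts."""
    parts = urlsplit(url)
    return {
        "user": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
        "host": parts.hostname,
        "database": parts.path.lstrip("/"),
    }
```

For example, build_db_url("drupal", "p@ss/word", "localhost", "drupal") produces mysql://drupal:p%40ss%2Fword@localhost/drupal.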
Once that's all set up, your Drupal installation should be good to go. If you had an existing Drupal database from another site, you could import it from an SQL dump using:
mysql -u root -p drupal < drupal.sql
Otherwise, you can set up a fresh Drupal site by loading the installation page. Your site is currently accessible to the outside world only via a special name that will resolve to its dynamic IP address within Amazon's data center. To obtain that name, go to Elasticfox and double-click on your running instance. Within the dialog that appears, copy the Public DNS Name and paste it into your web browser. Add the path element /install.php and load the resulting page. The remainder of the setup required for Drupal is beyond the scope of this guide, but I'd point you to my previous post about Drupal, as well as to the main Drupal.org site, for more information.
Tuning MySQL and Apache for Drupal
You can spend a long while tweaking your server to run Drupal in an optimal fashion, but I'd like to share some of the settings that seem to work for me right now. Most of these were arrived at through trial-and-error with my simple site, and may not scale to your particular setup, so take them with a grain of salt.
First, let's look at Apache optimizations. One setting that I've found reduces transfer sizes is Gzip compression of the pages that Apache serves. To enable it, add the following to the end of your /etc/httpd/conf/httpd.conf file:
SetOutputFilter DEFLATE
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\.0 no-gzip
BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html
SetEnvIfNoCase Request_URI \
    \.(?:gif|jpe?g|png)$ no-gzip dont-vary
Header append Vary User-Agent env=!dont-vary
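The SetEnvIfNoCase line excludes images from compression. GIF, JPEG and PNG files are already compressed, so gzipping them costs CPU and saves almost nothing, while repetitive HTML shrinks dramatically. The Python sketch below checks URIs against the same pattern and measures how well a sample of markup compresses. It deliberately ignores the BrowserMatch exceptions for old browsers. The function names and the sample HTML are hypothetical.

```python
import gzip
import re

# Same pattern as the SetEnvIfNoCase line; IGNORECASE mirrors "NoCase".
NO_GZIP_URI = re.compile(r"\.(?:gif|jpe?g|png)$", re.IGNORECASE)

def should_compress(request_uri: str) -> bool:
    """True if the rules above would gzip this URI (old-browser rules ignored)."""
    return NO_GZIP_URI.search(request_uri) is None

def compression_ratio(body: bytes) -> float:
    """Compressed size as a fraction of the original size."""
    return len(gzip.compress(body)) / len(body)

if __name__ == "__main__":
    html = b'<div class="node"><p>Hello, Drupal</p></div>\n' * 200
    print(should_compress("/node/1"), should_compress("/files/logo.PNG"))
    print(f"compressed to {compression_ratio(html):.1%} of original size")
```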
I also noticed a significant performance improvement caused by changing the following line:
For Drupal's Clean URLs to work, the following line needs to be changed under the <Directory "/persistent/html"> section, so that Apache honors the rewrite rules in Drupal's .htaccess file:
AllowOverride All
Related to Apache tuning are PHP settings. Edit the following lines in /etc/php.ini:
max_execution_time = 60
max_input_time = 120
memory_limit = 128M
To apply these settings, restart the Apache server using:
/etc/rc.d/init.d/httpd restart
MySQL tuning is tricky and is site-dependent. I'll just list the current settings in my /etc/my.cnf file and let you decide if they'll work for your case:
[mysqld_safe]
log-error = /var/log/mysqld.log
pid-file = /var/run/mysqld/mysqld.pid

[mysqld]
datadir = /persistent/mysql
socket = /var/lib/mysql/mysql.sock
user = mysql
old_passwords = 1
query_cache_limit = 12M
query_cache_size = 32M
query_cache_type = 1
max_connections = 60
key_buffer_size = 24M
bulk_insert_buffer_size = 24M
max_heap_table_size = 40M
read_buffer_size = 2M
read_rnd_buffer_size = 16M
myisam_sort_buffer_size = 32M
sort_buffer_size = 2M
table_cache = 1024
thread_cache_size = 64
tmp_table_size = 40M
join_buffer_size = 1M
wait_timeout = 60
connect_timeout = 20
interactive_timeout = 120
thread_concurrency = 4
max_allowed_packet = 50M
thread_stack = 128K
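A common rule of thumb for sanity-checking settings like these is to add the global buffers to max_connections times the per-thread buffers. The result is a rough worst case that real servers rarely reach, and it leaves out items such as temporary tables. The Python sketch below applies that heuristic to the [mysqld] section of a simple key = value my.cnf. Which buffers count as "per thread" is my simplification, and the function names are hypothetical.

```python
import configparser

UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def to_bytes(value: str) -> int:
    """Convert MySQL size values such as '24M' or '128K' to bytes."""
    value = value.strip()
    suffix = value[-1].upper()
    if suffix in UNITS:
        return int(value[:-1]) * UNITS[suffix]
    return int(value)

def worst_case_memory(my_cnf_text: str) -> int:
    """Rough upper bound: global buffers + max_connections * per-thread buffers."""
    cfg = configparser.ConfigParser(allow_no_value=True, interpolation=None, strict=False)
    cfg.read_string(my_cnf_text)
    m = cfg["mysqld"]
    global_buffers = to_bytes(m["key_buffer_size"]) + to_bytes(m["query_cache_size"])
    per_thread = sum(to_bytes(m[k]) for k in (
        "sort_buffer_size", "read_buffer_size", "read_rnd_buffer_size",
        "join_buffer_size", "thread_stack"))
    return global_buffers + int(m["max_connections"]) * per_thread
```

For the settings above, this works out to roughly 1.3 GiB. That figure is worth comparing against the 1.7 GB of memory on an m1.small instance, since Apache and PHP need room as well.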
Again, to apply these settings, restart the MySQL server using:
/etc/rc.d/init.d/mysqld restart
Setting up persistent store snapshotting
With all of your website's content now stored on the persistent volume, it's time to set up automatic snapshotting of that volume. As mentioned before, one of the unique features of Amazon's Elastic Block Store on EC2 is the capability to do incremental snapshots that are stored on S3. This provides a secure offsite backup of the important data on your website and lets you roll back to the state of your server at the time of any of the snapshots. For example, if your site was hacked two days ago but you only found out about it now, you could restore your site to the state it was in just before that.
The first step in setting this up is to install the latest binaries for Amazon's EC2 AMI and API tools. The latest EC2 API tools can be downloaded from this page and the latest AMI tools can be grabbed from here. I downloaded the Zip files, uploaded them to the running instance, and unzipped them. For the particular AMI that I started from, there was a /home/ec2 directory that contained older versions of the tools. I deleted the /home/ec2/bin and /home/ec2/lib directories and replaced them with the contents of the bin and lib directories from those two Zip files.
You will need to have the X.509 certificate and private key (you downloaded these when setting up your Amazon Web Services account) on your instance, so upload those now. Create a /home/ec2/certs directory and move these pk-*.pem and cert-*.pem files there. In your /root/.bashrc file, add the following lines to make sure that the EC2 tools know where to find your certificate and key:
export EC2_CERT=/home/ec2/certs/[certificate name].pem
export EC2_PRIVATE_KEY=/home/ec2/certs/[private key name].pem
The backup script that will run every hour will need to lock the MySQL database during the snapshot process, so create a /root/.my.cnf file that has the following format:
[client]
user=root
password=[password]
I use two scripts, with cron calling one which in turn calls the other. The first is called takesnapshot and should be downloaded and placed in /etc/cron.hourly. You will need to edit this file to insert the volume ID of your persistent store. This ID can be found in Elasticfox under the Volumes and Snapshots tab. Finally, make this script executable using chmod +x.
The second script is called ec2-snapshot-xfs-mysql.pl and is a modified version of the one Eric Hammond created for his tutorial here. This one does all the heavy lifting by locking the MySQL database (to ensure that the snapshotted database will be in a workable state upon a restore) and by freezing the XFS filesystem of the volume during the snapshot process. Move this script to /usr/bin, edit it to point to the proper file names of your X.509 certificate and private key, and make it executable.
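The order of operations is the important part of that script: lock the tables, freeze the filesystem, request the snapshot, then thaw and unlock even if something goes wrong. The Python sketch below shows that ordering. It is not the ec2-snapshot-xfs-mysql.pl script itself. The lock_db, unlock_db and run callables are hypothetical stand-ins. In practice, the lock and unlock would need to share a single MySQL session, because FLUSH TABLES WITH READ LOCK is released when its connection closes. run might wrap subprocess.run(cmd, check=True) around the xfs_freeze and ec2-create-snapshot commands.

```python
from typing import Callable, List

def snapshot_consistently(
    volume_id: str,
    mount_point: str,
    lock_db: Callable[[], None],
    unlock_db: Callable[[], None],
    run: Callable[[List[str]], None],
) -> None:
    """Lock MySQL, freeze XFS, request an EBS snapshot, then undo both.

    The nested try/finally blocks make sure the filesystem is thawed and the
    tables are unlocked even if the snapshot request fails.
    """
    lock_db()                                  # e.g. FLUSH TABLES WITH READ LOCK
    try:
        run(["xfs_freeze", "-f", mount_point])  # block writes to the volume
        try:
            run(["ec2-create-snapshot", volume_id])
        finally:
            run(["xfs_freeze", "-u", mount_point])  # always thaw
    finally:
        unlock_db()                            # e.g. UNLOCK TABLES
```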
With all this in place, you should be able to test the snapshot process by manually running the takesnapshot script. If it runs without errors, go to Elasticfox and refresh the Snapshots section of the Volumes and Snapshots tab. Your new snapshot should appear in the list.
Creating and attaching an Elastic IP Address
This site is now fully operational, so it is time to give it a publicly accessible static IP address. Amazon offers what are called Elastic IP Addresses. These are static IP addresses that you can requisition on the fly and attach to any of your running instances. This means that your outside DNS records can point to a single static IP address, while you remain free to switch that address between different instances within EC2. This is extremely useful for development: you might clone your existing site from a snapshot, try out a new design on the clone, and, if the design works, simply move the Elastic IP Address over to the development server to make it live.
Creating and assigning an IP address is simple. Return to Elasticfox and click on the Elastic IPs tab. Within this tab, click on the green plus button to allocate a new address. Unfortunately, Elasticfox does not give you an easy drop-down menu for selecting the ID of your running instance, so go to the AMIs and Instances tab and copy down that ID. Return to the Elastic IPs tab, select the IP address, and click the green button to associate this IP with an instance. Enter the instance ID that you copied and proceed.
It takes a few minutes for the assignment to propagate through Amazon's routers, but once the process is done you should be able to see your new web site at that static IP address. You can then set up your DNS records to point to this new address.
Bundling and uploading your custom AMI
Before we are finished, you should wrap up your changes to the virtual machine and create a new custom image. Even though your website's data is protected on the persistent store volume, all the configuration changes you've made to the base machine will be reset upon termination of the running instance. To preserve them, you'll need to save them in a new AMI that can be started at any point in the future.
To do this, first shut down MySQL and Apache and unmount your persistent store using the following commands:
/etc/rc.d/init.d/mysqld stop
/etc/rc.d/init.d/httpd stop
umount /persistent
Go to Elasticfox and retrieve your Owner ID from the running instance. Copy that and paste it within the following command, which creates the new AMI:
ec2-bundle-vol --fstab /etc/fstab -c /home/ec2/certs/[certificate] -k /home/ec2/certs/[private key] -u [Owner ID]
This will create the image in the /tmp directory, but that image still needs to be uploaded to S3. Upload it using the following command:
ec2-upload-bundle -b [S3 bucket name] -m /tmp/image.manifest.xml -a [Access Key ID] -s [Secret Access Key]
where the S3 bucket name must be globally unique. It can be the name of an S3 bucket you already own or a new one, in which case the bucket will be created (if the name is available).
You will need to register this new AMI with Elasticfox by going to the AMIs and Instances tab and clicking the green plus button under the Machine Images section. The AMI manifest path that it will ask for is your S3 bucket's name followed by /image.manifest.xml. Elasticfox should add your AMI to the list of public ones (it will be marked "private"). If you don't see it right away, you can do a search for a substring within the name of your bucket.
As a final test, start a new instance based on this custom AMI while your original instance is still running. If the new instance boots to a running state, SSH into it to make sure that everything is operational. If so, shut down the old instance, detach the persistent store volume and disassociate the Elastic IP from it, then attach the volume and associate the IP with the new instance. Mount /persistent on the new instance and start up the MySQL and Apache servers. Your new website should now be complete and running well on EC2.
Conclusion and additional resources
Thank you for reading this far. I'm sorry that this turned into a far longer post than I had intended. I may have gone overboard on the detail, but I hope that this was of some use to you in either starting out with EC2 or learning a bit more about what it offers you.
Unfortunately, as you can see, these services are currently very intricate to set up and are aimed at developers, not casual users. Elasticfox, as incredibly impressive a tool as it is, is still limited by the fact that it's a Firefox browser extension and not a full desktop application. I'm sure that the brilliant engineers at Amazon and / or members of the AWS community will soon be designing tools to allow the average user to take advantage of EC2 and their other services. Amazon is sitting on core technology that I believe will have a tremendous impact on the Web over the next 5-10 years as it becomes accessible to more users.
For more information, I recommend reading the "Running MySQL on Amazon EC2 with Elastic Block Store" tutorial by Eric Hammond and the Amazon EC2 Developers Guide. Amazon also has a number of tutorials and other documentation available at their site, along with a reasonably active forum.