Running a Drupal website on Amazon EC2

UPDATE: Fixed a small typo (mksf.xfs should have been mkfs.xfs)

As I described in my previous post, this website is currently running on Amazon's Elastic Compute Cloud (EC2). The underlying architecture of the site is based on the Drupal content management system, something that I've also described previously. I was not completely happy with the performance of my previous host (Media Temple), and Amazon EC2 promises to give you the ability to host a virtual machine running whatever you want within Amazon's data centers. You get access to the bandwidth and processing power of a huge online business, but you only pay for what you use.

In my limited testing so far, EC2 flies as a web host and appears to be able to scale for traffic spikes. It also provides a number of unique features, such as incremental snapshots of your data that are stored on S3 and the capability of creating throwaway clones of your server for doing development and testing.

About Amazon Web Services

I won't recount the history and philosophy of Amazon's Web Services (for that, I recommend visiting their site, their blog, or the RightScale blog), but I'd like to at least introduce them and describe why I think they are hugely important. Many web developers are familiar with Amazon's Simple Storage Service (S3), which lets you store gigabytes of data reliably in a globe-spanning system and serve that data to users on the cheap. Less well known is the Elastic Compute Cloud (EC2).

EC2 is a service whereby you can create virtual machines and run them on one of Amazon's data centers. These virtual machines take a slice of that center and simulate the hardware of a physical server. You can choose from a variety of machine configurations that have different processing powers, memory configurations, and virtual hard drive sizes. Once you have picked the class of your machine, you start it by loading an Amazon Machine Image (AMI). You can pick from one of the many public AMIs, most of them based on Linux, or any custom ones you may have created.

The greatest advantage of EC2 is the power it provides you with. The classes of hardware you can choose from start at a pretty powerful level and only increase from there. Your EC2 virtual machine sits on Amazon's data pipe, and thus has tremendous bandwidth available to it. You can create as many machines as you want and start or stop them on demand. You are only billed for the computing time and bandwidth you use.

To date, EC2 has been limited in terms of what you can use it for, due to a lack of persistent storage. You can create a custom virtual machine image that runs on EC2, but that machine runs as if it is in RAM, where all data or changes since the image started are lost when it terminates. Termination can be caused by a software shutdown or by a failure of some sort at the data center. While these failures are infrequent, the possibility of data loss on EC2 made it impractical to host websites on the service. More common applications included large-scale image processing, searches of massive DNA sequence databases, and other computationally intensive tasks that only needed to be fed initial data and to produce a result of some kind.

That changed August 20 with the release of Amazon's Elastic Block Store (EBS). With EBS, you can create persistent virtual drives that look like network-attached storage and that will exist after your EC2 instances have terminated. This enables the operation of EC2-based websites where the core configuration of the server is set up ahead of time and saved as a custom AMI, and the data that changes (logs, databases, HTML files, uploads) is all stored on the persistent drive.

EBS volumes are on RAID-equivalent hardware within a data center, but you will want to create even more reliable backups for critical data. Another new feature introduced with EBS was snapshots, where the EBS volumes can be set up to do automated incremental backups to an S3 bucket. Not only does this provide very dependable storage for the site's data, but the time-based snapshots also give you the ability to roll back your site to specific points in time. This is very similar to Mac OS X Leopard's Time Machine backup feature and is a unique and useful capability.

One thing I haven't mentioned is cost. I did say that you will only be billed for what you use. For an entry-level virtual machine (also called an instance), you are billed $0.10 for every hour it is operational. For a web server that is on all the time, that comes to $72 per month. Bandwidth is billed at $0.17 per GB downloaded, with volume discounts available if you pass 10 TB per month. An EBS store will run you $0.10 per GB per month. Amazon has a site where you can estimate the cost of running an EC2 instance. Basically, it's a flat rate of $72 per month, with a small amount on top that scales very nicely with load. This is a little more costly than some virtual private server services, such as Media Temple's (dv) Base at $50 per month, but the fact that you get access to something as powerful as Amazon's data centers and network connections might make up the difference and more.
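
To make the arithmetic concrete, here is a small sketch that uses only the prices quoted above. The function name and its three arguments are my own, and it assumes a 30-day month and ignores volume discounts and any S3 charges for snapshots.

```shell
# Rough monthly cost estimate using the prices quoted in this post.
# Arguments (hypothetical names): hours running, GB downloaded, GB of EBS storage.
estimate_monthly_cost() {
  awk -v hours="$1" -v gb_out="$2" -v gb_ebs="$3" 'BEGIN {
    instance  = hours  * 0.10   # entry-level instance-hours
    bandwidth = gb_out * 0.17   # data downloaded from the server
    storage   = gb_ebs * 0.10   # EBS storage per GB per month
    printf "%.2f\n", instance + bandwidth + storage
  }'
}

# A server running all month (30 days x 24 hours), serving 10 GB, with a 1 GB volume:
estimate_monthly_cost 720 10 1
```

With those inputs the instance itself accounts for $72.00 of the total, which is why I think of the bill as a flat rate plus a small variable part.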

Setting up your AWS account and tools

If you are still interested in the service, you'll first need to set up an account with Amazon. If you already have a normal shopping account with Amazon, you can simply add the charges for the web services to that. Once you have an Amazon account, go to the AWS site and sign up for the web services. After you've signed up, go to the AWS Access Identifiers page and write down your Access Key ID and Secret Access Key. While you're on that page, create a new X.509 certificate by following their instructions. Download that certificate and your private key. You'll need all these identifiers later.

After the initial signup, you'll need to go to the EC2 page and click on "Sign Up For This Web Service". This will allow you to use EC2 and it will also sign you up for S3. Don't worry, you won't be charged anything until you actually start using the services.

Next, you'll need the best currently available graphical interface to EC2, Elasticfox. Elasticfox is a Firefox plugin that gives you control over all aspects of EC2. In Firefox, visit the link above and install the plugin. To open Elasticfox within Firefox, select the Tools | Elasticfox menu item.

Before you can use Elasticfox, you'll need to configure your Amazon Web Services credentials. Click on the Credentials button in the upper left of the screen, which should bring up a dialog where you can enter your Access Key and Secret Access Key and give your account a name. Enter those, click Add and then click Close.

The next step is to configure a key pair for use when starting up instances. This public / private key pair will allow you to log in as root to a new instance generated off of a public machine image without the use of a password. Click on the KeyPairs tab and then on the green "Create a new key pair" button. Give this new key pair a name and choose a location to save your private key. Click on the Tools icon in the upper right to set the location of the key on your local file system. Change the SSH Key Template field to be something like

${home}/Documents/Website/EC2/${keyname}.pem

changing the middle part of the path to reflect where you actually placed the key file.

That should take care of setting up your tools. Now it's time to configure an instance.

Setting up a security group

The first step is to set up a security group. A security group acts like a configuration file for a firewall. It lets you set which ports are open to the world and which are closed.

Click on the Security Groups tab, then on the green plus button to add a new group. Give it a name and description. Select that new group (as this is a browser plugin, you may need to hit the blue refresh icon to see the new group in the list).

By default, all ports are closed, so we'll want to open up the ones we'll use. In the "Group Permissions" section of the window, click on the green checkmark to bring up a dialog asking which ports you'd like to open. Choose the TCP/IP protocol, set a port range of 22 to 22, and click Add. This will open up port 22, necessary if you want to have SSH access to your instances. Repeat the process for port 80 (HTTP) and, if desired, port 443 (HTTPS).

Creating a persistent storage volume

As I said earlier, the addition of persistent storage to EC2 is what makes hosting a website on the service practical. To create a persistent store, first click on the Volumes and Snapshots tab, then click on the green plus button. Choose a size for your volume (I went with 1 GB, because I didn't have a lot of data to store) and an availability zone.

Availability zones are basically Amazon data centers. Currently there are three of them, all on the east coast of the U.S. Remember what availability zone you picked here, because you'll need to select that same zone when starting your instance. An instance can only mount a persistent storage volume existing within the same data center.

Starting and configuring your instance

With all the preparatory work out of the way, it's time to start up a virtual machine and customize it to meet our needs. Click on the AMIs and Instances tab to be brought to the main control panel for starting and monitoring your compute instances. First, we'll need to pick a public image to start from. You can create one from scratch if you want, but that's beyond the scope of this walkthrough. If you do not see a list of AMIs under the Machine Images section, click on the blue refresh button and wait a few seconds for the list to load. The image I chose was one of those contributed by RightScale, a company that provides EC2 services and has contributed quite a bit back to the community. The specific image is ami-d8a347b1, a 32-bit image based on CentOS 5. This particular image is pretty stripped down and should be a good starting point for a no-frills web server.

Find the image in the list using the search function, click to select it, and click on the green Launch Instance(s) button. A dialog will appear to let you configure the virtual hardware characteristics of this instance. Choose an instance type of m1.small, the least powerful of all the hardware types. Select your keypair by name in the dropdown list, and select the same availability zone as you had chosen for your persistent storage volume. Click on the default security group on the right side of the dialog and move it to the left, while moving your custom security group to the right to assign it to this instance. When everything is set, click on Launch.

Your new instance will now boot. This should take a minute or two. You'll need to keep checking on the status of your instance by clicking on the blue refresh icon under the Your Instances section of the screen. Once the status changes from Pending to Running, you're ready to start using the new virtual machine.

The first thing you'll want to do with this machine is to SSH into it. Click to highlight your instance and click on the Open SSH Connection button. If you properly set the location of your private key as described above, a terminal window should open and you should be connected to your virtual machine as the root user. The warning message you get means nothing. It is due to a slight bug in this particular public image.
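
If the Open SSH Connection button does not work on your setup, connecting by hand from a terminal should be roughly equivalent. The sketch below assumes you run it on your own computer; connect_hint is a hypothetical helper name, and the key path and host name are placeholders for your own values. It tightens the permissions on the key first, because ssh refuses private keys that other users can read.

```shell
# Tighten permissions on a private key and print the matching ssh command.
# connect_hint is a hypothetical helper; both arguments are placeholders.
connect_hint() {
  key_file="$1"
  public_dns="$2"
  [ -f "$key_file" ] || { echo "key not found: $key_file" >&2; return 1; }
  chmod 600 "$key_file"   # ssh rejects keys that other users can read
  echo "ssh -i $key_file root@$public_dns"
}

# Example (substitute your own key file and the Public DNS Name shown in Elasticfox):
# connect_hint "$HOME/Documents/Website/EC2/[keyname].pem" "[Public DNS Name]"
```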

Now it's time to start installing the packages you'll need to configure a full Drupal site. Execute the following commands to install PHP and MySQL:

yum -y install php
yum -y install php-gd
yum -y install php-mbstring
yum -y install mysql
yum -y install mysql-server
yum -y install php-mysql

The XFS filesystem seems to be the preferred choice for the persistent storage, due to its ability to freeze the filesystem during a snapshot operation, and you'll need the following to be installed to take advantage of that:

yum -y install kmod-xfs.i686 
yum -y install xfsdump.i386 
yum -y install xfsprogs.i386 
yum -y install dmapi

If you want SSL on your web server, you may want to run the following:

yum -y install mod_ssl

If you're like me, you'll want to create a non-root user to have on hand for day-to-day use. To create that user and set its password, enter the following:

adduser [username]
passwd [username]

I wanted to do password-based SFTP transfers using that new user because I really like to use Transmit. To enable password-based authentication for users, execute

nano /etc/ssh/sshd_config

to edit the SSH configuration and change the appropriate line to read

PasswordAuthentication yes

while leaving password authentication disabled for the root user. Restart the SSH daemon using the following command:

/etc/rc.d/init.d/sshd restart
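
If you would rather script this edit than open an editor, something along these lines should work. It is only a sketch: enable_password_auth is a hypothetical helper name, it touches nothing but the PasswordAuthentication directive, and it assumes the file has no Match blocks at the end that an appended line could fall into.

```shell
# Set PasswordAuthentication to "yes" in an sshd_config-style file.
# enable_password_auth is a hypothetical helper; it keeps a .bak copy of the original.
enable_password_auth() {
  config_file="$1"
  cp "$config_file" "$config_file.bak" || return 1
  # Rewrite any active PasswordAuthentication line...
  sed 's/^[[:space:]]*PasswordAuthentication[[:space:]].*/PasswordAuthentication yes/' \
    "$config_file.bak" > "$config_file"
  # ...or append one if the file had no active line at all.
  grep -q '^PasswordAuthentication yes' "$config_file" ||
    echo 'PasswordAuthentication yes' >> "$config_file"
}

# On the instance, as root:
# enable_password_auth /etc/ssh/sshd_config && /etc/rc.d/init.d/sshd restart
```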

I also wanted to install eAccelerator to speed up PHP on the server, but the yum install of it failed with dependency errors. Therefore, to install it, you'll need to download php-eaccelerator-0.9.5.2-2.el5.remi.i386.rpm and SFTP it over to the server. Once on the server, install it using

rpm -ivh php-eaccelerator-0.9.5.2-2.el5.remi.i386.rpm

Attaching the persistent storage volume

With the base configuration of the image now how we want it, it's time to attach the persistent storage volume. Go back to Elasticfox and select the Volumes and Snapshots tab. Select the volume you created and click on the green checkmark to attach it to your running instance. A dialog will appear asking for you to select the instance to attach this to (there should only be the one in the pull-down list) and a device path. Enter a path of /dev/sdh for this volume and proceed. Your volume should shortly be attached to your instance.

Switch back to the instance to format and mount the volume. Run the following commands to create an XFS filesystem on the device, and create a mount point for it at /persistent.

mkfs.xfs /dev/sdh
mkdir /persistent

Edit /etc/fstab and insert the following line at the end of that file:

/dev/sdh /persistent xfs defaults 0 0

Now you can mount the persistent store volume at the path /persistent using the following command:

mount /persistent
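
If you expect to repeat these steps on more than one instance, the fstab edit can be made safe to run twice. The following is a minimal sketch with a hypothetical helper name, add_fstab_entry; it writes the same line shown above and skips the write when the device is already listed.

```shell
# Append an fstab entry for the volume unless the device is already listed.
# add_fstab_entry is a hypothetical helper; the file defaults to /etc/fstab.
add_fstab_entry() {
  device="$1"
  mount_point="$2"
  fstab_file="${3:-/etc/fstab}"
  if grep -q "^${device}[[:space:]]" "$fstab_file"; then
    echo "$device is already listed in $fstab_file"
  else
    echo "$device $mount_point xfs defaults 0 0" >> "$fstab_file"
  fi
}

# On the instance, as root; df confirms that the volume is really mounted:
# add_fstab_entry /dev/sdh /persistent && mount /persistent && df -h /persistent
```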

Anything placed in the /persistent directory will survive sudden termination of the running instance, so you'll want to place log files, databases, the files that define your site architecture, and anything else you don't want to lose in that directory.

The first step is to move users' home directories. You can do this either by moving specific users' directories or by moving the /home directory to /persistent, then using the ln -s command to symbolically link the new location to its old place on the file system.
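
The move-and-link step might look like the sketch below. The helper name relocate_to_volume is hypothetical; it refuses to run if the old path is already a symbolic link or if the destination already exists, so that running it a second time does no harm.

```shell
# Move a directory onto the persistent volume and leave a symlink at the old path.
# relocate_to_volume is a hypothetical helper name.
relocate_to_volume() {
  old_path="$1"
  new_path="$2"
  [ -d "$old_path" ] && [ ! -L "$old_path" ] || { echo "$old_path is not a plain directory" >&2; return 1; }
  [ ! -e "$new_path" ] || { echo "$new_path already exists" >&2; return 1; }
  mv "$old_path" "$new_path" && ln -s "$new_path" "$old_path"
}

# On the instance, as root, with no other users logged in:
# relocate_to_volume /home /persistent/home
```

Keep in mind that anything under the old path becomes unavailable whenever the volume is unmounted.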

Next, you will want to move your log files. I chose to move my Apache server logs to the persistent store. To do this, I created a /persistent/log/http directory to store these logs (and a /persistent/log/https for the SSL logs). To point Apache's logging facilities to this directory, you'll need to edit /etc/httpd/conf/httpd.conf and change the following lines:

ErrorLog /persistent/log/http/error_log
CustomLog /persistent/log/http/access_log combined

Finally, the MySQL database should be pointed to the persistent store. To do this, create a /persistent/mysql directory and edit the following line in /etc/my.cnf:

[mysqld]
datadir=/persistent/mysql

Installing Drupal

With the persistent store volume in place, it is time to install Drupal. To do so, go to the /persistent directory and download and extract the latest version of Drupal (6.4 as of this writing) using the following commands:

wget http://ftp.drupal.org/files/projects/drupal-6.4.tar.gz
tar -zxvf drupal-6.4.tar.gz

This should leave you with a drupal-6.4 directory. Rename this to html and point Apache to it by editing /etc/httpd/conf/httpd.conf again and changing the following lines:

DocumentRoot "/persistent/html"
<Directory "/persistent/html">

You will need to set up a cron job to perform automated updates of your Drupal site's search index and other maintenance tasks. To do that, copy the /persistent/html/scripts/cron-lynx.sh to /etc/cron.daily. Edit that file to replace the sample domain name with your own, save it, and make it executable using chmod +x.
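
Those three steps (copy, edit, make executable) can be combined as in the sketch below. The helper name install_drupal_cron is hypothetical, and I am assuming that the sample script uses example.com as its placeholder domain, so check the resulting file before relying on it.

```shell
# Copy Drupal's sample cron script into place and point it at your own site.
# install_drupal_cron is a hypothetical helper; it assumes the sample script
# uses "example.com" as its placeholder domain.
install_drupal_cron() {
  sample_script="$1"
  cron_dir="$2"
  site_domain="$3"
  target="$cron_dir/$(basename "$sample_script")"
  sed "s/example\.com/$site_domain/g" "$sample_script" > "$target" &&
    chmod +x "$target"
}

# On the instance, as root:
# install_drupal_cron /persistent/html/scripts/cron-lynx.sh /etc/cron.daily [your domain]
```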

A database will need to be created and installed in MySQL for use with Drupal. Before you can do that, make sure the MySQL database is running using the following command:

/etc/rc.d/init.d/mysqld restart

For security, it's a good idea to set a password for the root MySQL user using the command

mysqladmin -u root password [password]

Next, create a database for your Drupal installation and a user that can access that database. You can use the following to create a database named "drupal" (feel free to change the name) and give the user "drupal" running on the local computer access:

mysqladmin -u root -p create drupal
mysql -u root -p -e "grant all privileges on drupal.* to 'drupal'@'localhost' identified by '[password]';"

Finally, direct Drupal to use that database by editing /persistent/html/sites/default/settings.php to change the following line (you may need to make it writeable first, save it, then remove the writeable bit):

$db_url = 'mysql://drupal:[password]@localhost/drupal';
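
One thing to watch for: because $db_url is written as a URL, a password containing characters such as @, : or / can confuse the parser. As far as I recall, the comments in Drupal's default settings.php suggest escaping such characters as hex codes. The sketch below does that for the characters I believe matter; the helper name urlencode_db_part is hypothetical, and the character list is my assumption rather than an exhaustive one.

```shell
# Percent-encode characters that would otherwise be read as URL delimiters
# when they appear in the user name, password, or database name of $db_url.
# urlencode_db_part is a hypothetical helper; the character list is an assumption.
urlencode_db_part() {
  printf '%s' "$1" | sed \
    -e 's/%/%25/g' \
    -e 's/:/%3a/g' \
    -e 's,/,%2f,g' \
    -e 's/@/%40/g' \
    -e 's/+/%2b/g' \
    -e 's/(/%28/g' \
    -e 's/)/%29/g' \
    -e 's/?/%3f/g' \
    -e 's/=/%3d/g' \
    -e 's/&/%26/g'
}

# A password of p@ss:w/rd would be written into settings.php as:
urlencode_db_part 'p@ss:w/rd'
```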

Once that's all set up, your Drupal installation should be good to go. If you had an existing Drupal database from another site, you could import from an SQL dump using

mysql -u root -p drupal < drupal.sql

Otherwise, you can set up a fresh Drupal site by loading the installation page. Your site is currently accessible to the outside world only via a special name that will resolve to its dynamic IP address within Amazon's data center. To obtain that name, go to Elasticfox and double-click on your running instance. Within the dialog that appears, copy the Public DNS Name and paste it into your web browser. Add the path element /install.php and load the resulting page. The remainder of the setup required for Drupal is beyond the scope of this guide, but I direct you to my previous post about Drupal, as well as the main Drupal.org site for more information.

Tuning MySQL and Apache for Drupal

You can spend a long while tweaking your server to run Drupal in an optimal fashion, but I'd like to share some of the settings that seem to work for me right now. Most of these were arrived at through trial-and-error with my simple site, and may not scale to your particular setup, so take them with a grain of salt.

First, let's look into Apache optimizations. A setting that I've found to help reduce transfer sizes on your site is to enable Gzip compression of all pages that Apache serves. To do this, you can add the following to the end of your /etc/httpd/conf/httpd.conf file:

SetOutputFilter DEFLATE
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html
SetEnvIfNoCase Request_URI \
\.(?:gif|jpe?g|png)$ no-gzip dont-vary
Header append Vary User-Agent env=!dont-vary

I also noticed a significant performance improvement caused by changing the following line:

KeepAlive On

For Drupal's Clean URLs to work, the following line needs to be changed under the <Directory "/persistent/html"> section:

AllowOverride All

Related to Apache tuning are PHP settings. Edit the following lines in /etc/php.ini:

max_execution_time = 60
max_input_time = 120
memory_limit = 128M

To apply these settings, restart the Apache server using

/etc/rc.d/init.d/httpd restart

MySQL tuning is tricky and is site-dependent. I'll just list the current settings in my /etc/my.cnf file and let you decide if they'll work for your case:

[mysqld_safe]
log-error = /var/log/mysqld.log
pid-file = /var/run/mysqld/mysqld.pid
 
[mysqld]
datadir = /persistent/mysql
socket = /var/lib/mysql/mysql.sock
user = mysql
old_passwords = 1
query_cache_limit = 12M
query_cache_size = 32M
query_cache_type = 1
max_connections = 60
key_buffer_size = 24M
bulk_insert_buffer_size = 24M
max_heap_table_size = 40M
read_buffer_size = 2M
read_rnd_buffer_size = 16M
myisam_sort_buffer_size = 32M
sort_buffer_size = 2M
table_cache = 1024                   
thread_cache_size = 64
tmp_table_size = 40M
join_buffer_size = 1M
wait_timeout = 60
connect_timeout = 20
interactive_timeout = 120
thread_concurrency = 4
max_allowed_packet = 50M
thread_stack = 128K

Again, to apply these settings, restart the MySQL server using

/etc/rc.d/init.d/mysqld restart
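
Before copying these values, it is worth checking how much memory they could claim in the worst case. A commonly used rule of thumb (an approximation, not something MySQL guarantees) adds the global buffers to max_connections times the per-connection buffers. The sketch below applies it to the settings listed above; the function name is hypothetical and all figures are in MB.

```shell
# Rough worst-case MySQL memory use for the my.cnf values listed above:
# global buffers + max_connections * per-connection buffers (all in MB).
# This is a rule-of-thumb approximation; real usage is usually far lower.
estimate_mysql_memory_mb() {
  awk 'BEGIN {
    global_mb   = 24 + 32                 # key_buffer_size + query_cache_size
    per_conn_mb = 2 + 16 + 2 + 1 + 0.125  # read, read_rnd, sort, join buffers + thread_stack
    connections = 60                      # max_connections
    printf "%.1f\n", global_mb + connections * per_conn_mb
  }'
}

estimate_mysql_memory_mb
```

For comparison, the small instance type was listed with about 1.7 GB of memory at the time, so these settings leave little headroom if every connection were to use its full buffers at once. Lowering max_connections or read_rnd_buffer_size is the quickest way to bring the figure down.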

Setting up persistent store snapshotting

With all of your website's content now stored on the persistent volume, it's time to set up automatic snapshotting of that volume. As mentioned before, one of the unique features of Amazon's Elastic Block Store on EC2 is the capability to do incremental snapshots that are stored on S3. This provides a secure offsite backup of the important data on your website and lets you roll back to the state of your server at the time of any of the snapshots. For example, if your site was hacked two days ago but you only found out about it now, you could restore your site to the state it was in just before that.

The first step in setting this up is to install the latest binaries for Amazon's EC2 AMI and API tools. The latest EC2 API tools can be downloaded from this page and the latest AMI tools can be grabbed from here. I downloaded the Zip files, uploaded them to the running instance, and unzipped them. For the particular AMI that I started from, there was a /home/ec2 directory that contained older versions of the tools. I deleted the /home/ec2/bin and /home/ec2/lib directories and replaced them with the contents of the bin and lib directories from those two Zip files.

You will need to have the X.509 certificate and private key (you downloaded these when setting up your Amazon Web Services account) on your instance, so upload those now. Create a /home/ec2/certs directory and move these pk-*.pem and cert-*.pem files there. In your /root/.bashrc file, add the following lines to make sure that the EC2 tools know where to find your certificate and key:

export EC2_CERT=/root/[certificate name].pem
export EC2_PRIVATE_KEY=/root/[private key name].pem
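
The API tools are Java programs, so besides the two variables above they also expect EC2_HOME and JAVA_HOME to be set (the public image may already do this for you). The sketch below is a quick sanity check you can run after logging in again; check_ec2_env is a hypothetical helper name.

```shell
# Report which of the variables the EC2 command-line tools rely on are unset,
# and whether the certificate and key files they point to actually exist.
# check_ec2_env is a hypothetical helper name.
check_ec2_env() {
  status=0
  for name in EC2_HOME JAVA_HOME EC2_CERT EC2_PRIVATE_KEY; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set"
      status=1
    fi
  done
  for file in "${EC2_CERT:-}" "${EC2_PRIVATE_KEY:-}"; do
    if [ -n "$file" ] && [ ! -f "$file" ]; then
      echo "missing file: $file"
      status=1
    fi
  done
  return $status
}

# On the instance, as root:
# check_ec2_env && echo "EC2 tool environment looks complete"
```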

The backup script that will run every hour will need to lock the MySQL database during the snapshot process, so create a /root/.my.cnf file that has the following format:

[client]
user=root
password=[password]

I use two scripts, with cron calling one which in turn calls the other. The first is called takesnapshot and should be downloaded and placed in /etc/cron.hourly. You will need to edit this file to insert the volume ID of your persistent store. This ID can be found in Elasticfox under the Volumes and Snapshots tab. Finally, make this script executable using chmod +x.

The second script is called ec2-snapshot-xfs-mysql.pl and is a modified version of the one Eric Hammond created for his tutorial here. This one does all the heavy lifting by locking the MySQL database (to ensure that the snapshotted database will be in a workable state upon a restore) and by freezing the XFS filesystem of the volume during the snapshot process. Move this script to /usr/bin, edit it to point to the proper file names of your X.509 certificate and private key, and make it executable.
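
To show the order of operations, the sketch below prints the statements for a single consistent snapshot. It is based on the manual procedure in Eric Hammond's tutorial as I understand it, not on the contents of the Perl script itself. Everything has to happen inside one mysql session, because the read lock is released as soon as the client disconnects, which is why the shell commands are issued through the mysql client's system command. The helper name snapshot_statements is hypothetical.

```shell
# Print the mysql client statements for one consistent snapshot:
# lock the tables, freeze XFS, snapshot, unfreeze, unlock.
# snapshot_statements is a hypothetical helper; mount point and volume ID are arguments.
snapshot_statements() {
  mount_point="$1"
  volume_id="$2"
  cat <<EOF
FLUSH TABLES WITH READ LOCK;
system xfs_freeze -f $mount_point
system ec2-create-snapshot $volume_id
system xfs_freeze -u $mount_point
UNLOCK TABLES;
EOF
}

# Print the statements, then paste them into a mysql session opened as root:
# snapshot_statements /persistent [volume ID]
```

The filesystem is only frozen for as long as the snapshot request takes to return, so the site should not be unresponsive for more than a moment.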

With all this in place, you should be able to test the snapshot process by manually running the takesnapshot script. If it runs without errors, go to Elasticfox and refresh the Snapshots section of the Volumes and Snapshots tab. Your new snapshot should appear in the list.

Creating and attaching an Elastic IP Address

This site is now fully operational, so it is time to give it a publicly accessible static IP address. Amazon offers what are called Elastic IP Addresses. These are static IP addresses that you can requisition on the fly and attach to any of your running instances. This means that you can have a static IP address that your outside DNS records will point to, but be able to switch it between different instances within EC2. This is extremely useful for development, where you might want to clone your existing site off of a snapshot, try out a new design, and if that design works you would simply switch over the Elastic IP Address to point to the development server to make it live.

Creating and assigning an IP address is simple. Return to Elasticfox and click on the Elastic IPs tab. Within this tab, click on the green plus button to allocate a new address. Unfortunately, Elasticfox does not give you an easy drop-down menu for selecting the ID of your running instance, so go to the AMIs and Instances tab and copy down that ID. Return to the Elastic IPs, select the IP address, and click the green button to associate this IP with an instance. Enter in the instance ID that you wrote down and proceed.

It takes a few minutes for the assignment to propagate through Amazon's routers, but once the process is done you should be able to see your new web site at that static IP address. You can then set up your DNS records to point to this new address.

Bundling and uploading your custom AMI

Before we are finished, you should wrap up your changes to the virtual machine and create a new custom image. Even though your website's data is protected on the persistent store volume, all the configuration changes you've made to the base machine will be reset upon termination of the running instance. To preserve them, you'll need to save them in a new AMI that can be started at any point in the future.

To do this, first shut down MySQL and Apache and unmount your persistent store using the following commands:

/etc/rc.d/init.d/mysqld stop
/etc/rc.d/init.d/httpd stop
umount /persistent

Go to Elasticfox and retrieve your Owner ID from the running instance. Copy that and paste it within the following command, which creates the new AMI:

ec2-bundle-vol --fstab /etc/fstab -c /home/ec2/certs/[certificate] -k /home/ec2/certs/[private key] -u [Owner ID]

This will create the image in the /tmp directory, but that image still needs to be uploaded to S3. Upload it using the following command:

ec2-upload-bundle -b [S3 bucket name] -m /tmp/image.manifest.xml -a [Access Key ID] -s [Secret Access Key]

where the S3 bucket name is a globally unique identifier. It can be the name of an S3 bucket you already use or a new one, in which case the bucket will be created (if the name is available).

You will need to register this new AMI with Elasticfox by going to the AMIs and Instances tab and clicking the green plus button under the Machine Images section. The AMI manifest path that it will ask for is your S3 bucket's name followed by /image.manifest.xml. Elasticfox should add your AMI to the list of public ones (it will be marked "private"). If you don't see it right away, you can do a search for a substring within the name of your bucket.

As a final test, start a new instance based on this custom AMI while your original instance is running. If this new image boots to a running state, SSH into it to make sure that everything is operational. If so, shut down the old instance, dissociate the persistent store volume and Elastic IP from it, and associate them both with the new instance. Mount the /persistent directory on the new instance and start up the MySQL and Apache servers. Your new website should now be complete and running well on EC2.

Conclusion and additional resources

Thank you for reading this far. I'm sorry that this turned into a far longer post than I had intended. I may have gone overboard on the detail, but I hope that this was of some use to you in either starting out with EC2 or learning a bit more about what it offers you.

Unfortunately, you can see that these services are currently very intricate to set up and are aimed at developers, not casual users. Elasticfox, as incredibly impressive a tool as it is, is still limited by the fact that it's a Firefox browser extension and not a full desktop application. I'm sure that the brilliant engineers at Amazon and / or members of the AWS community soon will be designing tools to allow the average user to take advantage of EC2 and their other services. Amazon is sitting on core technology that I believe will have a tremendous impact on the Web over the next 5-10 years as it becomes accessible to more users.

For more information, I recommend reading the "Running MySQL on Amazon EC2 with Elastic Block Store" tutorial by Eric Hammond and the Amazon EC2 Developers Guide. Amazon also has a number of tutorials and other documentation available at their site, along with a reasonably active forum.

Comments

I needed to convert the .pem security key to a .ppk using puttygen.exe.

Just imported the .pem file, then saved the private key and told Elasticfox to use that.

Otherwise, rocking tutorial ! Thanks so much !

B

A great little post, very informative! It's people like you who make my life easy :) I've been looking for this for quite a while!

Hi Brad, you have written an excellent article for the EC2 newbies who want to migrate their Drupal to EC2. In fact, after reading your article I have enough courage to go on EC2.

Can you also add the email server configuration here, which is essential for Drupal?

As I recall, I didn't need to do anything special to get outbound email working on the site. The configuration steps in the post should take care of that.

However, it's come to my attention that there is a significant problem with sending email from EC2. It appears that the entire EC2 address space has been placed on some spam blacklists, which means that many people will not even see any email generated at your site. (If anyone has created a user account here and not received a confirmation email, send me a message at contact@sunsetlakesoftware.com and I'll fix it.)

The best workaround I could find to this problem is to use an external SMTP server and install the SMTP Authentication Support Drupal module. Once you have that installed on your Drupal site, you'll need to download phpMailer and install its contents to the phpMailer directory within the /sites/all/modules/smtp directory. Finally, you should configure the modules to send mail through your external SMTP server (in my case, I use Google Apps for Domains, so I created a separate mail account for my Drupal site and have all mail go through that). If everything is working properly, your site will now be sending email through that external server and it will be more likely to get through to your users.

Hopefully, a technical solution for the problem of EC2-originated email being marked as spam will be found soon.

Great post Brad! Your guide on persistent storage really helped clarify things for me. I did however have a question for when it came to bundling the AMI. From what I have read on your post and the AWS EC2 dev guide, I have noticed two ways to bundle an image:

ec2-bundle-vol -d /mnt -k /mnt/pk-AZWXJADBEQJKDLN3VTVWYSCLB7FIFXB7.pem -c /mnt/cert-AZWXJADBEQJKDLN3VTVWYSCLB7FIFXB7.pem -u 570945761316 -r i386 -p sampleimage

ec2-bundle-vol --fstab /etc/fstab -c /mnt/cert-AZWXJADBEQJKDLN3VTVWYSCLB7FIFXB7.pem -k /mnt/pk-AZWXJADBEQJKDLN3VTVWYSCLB7FIFXB7.pem -u 570945761316

Why would I use your method over the top method which came from the EC2 documentation? In your method, what does --fstab /etc/fstab do? Why is that needed? Also are the -r and -p parameters optional?

Lastly...what is the difference between ec2-bundle-image and ec2-bundle-vol?

Thanks!

The -r and -p are optional. I should probably use the -p to give the AMI a unique prefix, but I'm usually happy with the default "image".

As far as the --fstab /etc/fstab, that makes sure that the custom fstab you've created is also bundled within the image. Otherwise, it's ignored and an fstab with only the default mount points is used for the AMI. Because we've created the custom /persistent mount point, we'll want to make sure that we don't have to re-edit the fstab if we need to spawn a new instance.

Thanks Brad! I think I can exclude this since I don't want to maintain information about my persistent store within the AMI.

Brad,

I have been researching EC2 for some time, and I must say, this is one of the better tutorials out there. Thanks a ton.

We are experimenting with EC2 and AWS for our redesign; our main goal is scalability during busy events.

I have grasped the EC2 and EBS concepts rather well. My two questions are:

1. Should each instance have its own EBS volume assigned, or just those in the same availability zone? Also, how can you handle DNS load balancing if not all instances have an elastic IP? (I know you are only allowed 5.)

2. How can we replicate the EBS volume across multiple availability zones? Say we use one instance as our main Drupal administration server; how do we replicate the data down to the other instances? Is this done using scripts, or is there something available out there?

Thanks in advance, and great tutorial again. I will let you know how this all works out.

Say we have

Glad that you found it useful.

Each EBS volume can only be attached to one instance, so you need to create a new volume for each instance. As far as replication across availability zones, that will require some custom scripting. I believe the RightScale people mentioned possible solutions for this during their excellent EBS webinar, the slides for which can be found at http://www.rightscale.com/lp/ebs_webinar.php .

DNS load balancing is not something I've spent much time on (it will be a long while before I have to worry about that kind of traffic), but I have heard that Amazon will be providing a solution for this sometime soon.

Thanks for your help Brad,

Hey Brad,

One more quick question.

I followed your tutorial and got everything running great. However, I have issues when trying to save my bundle.

1. The ec2-bundle-vol command fails because I have unmounted my drive, and the certs are no longer available. I use the one that is located in the /root directory and it seems the image is created successfully, but the size is still the same as my drive that is now unmounted (10GB).

2. When I go back to remount my data to restore the instance, I get the /persistent directory with nothing in it. So I have to start all over again.

Just curious if you could help me out. Feel free to email me, cg@tennis.com

Thanks

Yes, you need to make sure that the EC2 certificate and key are located somewhere on your instance, not on your EBS volume, so that they are accessible when you unmount your persistent data store for AMI generation.
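A quick way to confirm this before unmounting is to resolve where the files really live, since a symbolic link can make a path look local when it actually points into the EBS volume. This is only a sketch: lives_under is a hypothetical helper name, and it assumes a readlink that supports -f (as the GNU coreutils version does).

```shell
# Succeeds (exit 0) if path $1, after following symlinks, resolves to a
# location inside directory $2. Exits 1 if it does not, 2 if a path
# cannot be resolved at all.
lives_under() {
  target=$(readlink -f "$1") || return 2
  base=$(readlink -f "$2") || return 2
  case "$target/" in
    "$base"/*) return 0 ;;   # resolved path starts with the base directory
    *) return 1 ;;
  esac
}
```

For example, lives_under /home/ec2/certs /persistent && echo "move the certs off the EBS volume first" would warn you if /home/ec2 had been relocated to the persistent store.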

I'm a little confused about what you're asking in the second question. After AMI generation, did you try to remount the /persistent mount point and have nothing show up or are you asking about a new instance created by using that custom AMI? I don't see any reason for the former to occur, as long as your /etc/fstab is set up correctly and your EBS volume is still attached to your instance at /dev/sdh. For the latter, check to make sure that a new EBS volume has been created based off of a snapshot of the old volume and that it is attached to your new instance at /dev/sdh (also, that it is in the same availability zone). You can also try unmounting the drive, removing the /persistent directory, recreating the /persistent directory, and mounting it again.

Yes, I have figured all of this out. It seems I was a little confused.

I have created my image and uploaded it, and I also have auto snapshots working. I have successfully launched several instances using my image, converted my snapshot into an EBS volume, and attached it to the instance. So all is well.

Thanks again,

Hi there,
So far everything is going fine. But I have existing data and want to transfer that and my files/images etc. over to the instance. I am unable to FTP over to the instance to send my existing files. How do I set this up without using Transmit? I am running on a Linux machine, but when I log in to the instance and do a "rpm -qa | grep ftpd" I get nothing.

Thanks,
RA

I would strongly recommend against FTP for transferring files to your instance for all the standard reasons (passwords sent in cleartext, etc.). Instead, you should use something like

scp [file or directory] [user]@[hostname]:[destination on server]

to copy files to your server via SSH at the command line.

In addition, there are plenty of graphical clients on Linux that support SFTP. KDE's own Konqueror file browser supported SFTP via sftp:// links, the last time I used it.

Hi Brad,

Simply amazing.. every line is precious!!! Thanks very much..

I have one question: how do the incremental snapshots that we take work? How can we roll back to a date? Are the snapshots that we see in the list readily attachable to an IP?

Thanks
Sree

Each snapshot can be used to create an Elastic Block Store volume that has within it the contents of that snapshot. For example, I can clone my web server in the state it was an hour ago by choosing a snapshot from an hour ago, creating a new volume from that snapshot, starting a new instance off of my server AMI, and attaching the volume to that instance. This is great for testing new designs on throwaway servers.

You can also detach the EBS volume from your running public instance and connect a volume created from a snapshot, of course stopping Apache, MySQL, etc. beforehand and starting them afterwards, but it's probably cleaner to create a whole new instance and attach the volume to that.
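Sketched with the EC2 API command line tools, the cloning procedure looks roughly like this. The helper names (run, start_clone, attach_clone) and all of the IDs are placeholders for illustration, and /dev/sdh is assumed to be the device your fstab expects. By default the helpers only print the commands; set DRY_RUN=0 to actually execute them.

```shell
# Print each command instead of executing it, unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$@"; fi
}

# Step 1: create a volume from the chosen snapshot and boot a fresh
# instance from the server AMI, both in the same availability zone.
# usage: start_clone SNAPSHOT_ID AMI_ID ZONE KEYPAIR_NAME
start_clone() {
  run ec2-create-volume --snapshot "$1" -z "$3"
  run ec2-run-instances "$2" -k "$4" -z "$3"
}

# Step 2: once the volume is available and the instance is running,
# attach the volume at the device the instance's fstab refers to.
# usage: attach_clone VOLUME_ID INSTANCE_ID
attach_clone() {
  run ec2-attach-volume "$1" -i "$2" -d /dev/sdh
}
```

You would call start_clone with your snapshot ID, AMI ID, zone, and keypair name, note the volume and instance IDs that the tools report, and then pass those to attach_clone once both are ready.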

Hi, this is a very useful reference! Thank you for writing this up.
Just a few followup questions on the Bundling and Uploading Custom AMI section:

1. In the ec2-bundle-vol command, what do I use as values for [certificate] and [private key]?

2. The /home/ec2/certs directory does not exist - should this be created before running the command?

3. Is it okay to do this on /home/ec2/certs/[certificate] and /home/ec2/certs/[private key] even if /home/ec2 is a symbolic link to /persistent/home/ec2 and /persistent is unmounted from the previous step?

Also:

I could not install php-eaccelerator based on your instructions above. The URL http://rpm.pbone.net/index.php3/stat/4/idpl/6807655/com/php-eaccelerator... indicates that php-eaccelerator-0.9.5.2-2.el5.remi.i386.rpm is obsolete and the file is no longer available for download.

I tried the other newer RPM versions from that site, but they complained about dependencies:

warning: php-eaccelerator-0.9.5.2-2.el5.i386.rpm: Header V3 DSA signature: NOKEY, key ID 217521f6
error: Failed dependencies:
php-zend-abi = 20050922 is needed by php-eaccelerator-0.9.5.2-2.el5.i386

I'm using the same AMI you suggested in this tutorial. How can I get php-eaccelerator installed in this image?

Thanks again!

In regards to the certificate and key, the relevant part of the instructions is

Quote:
You will need to have the X.509 certificate and private key (you downloaded these when setting up your Amazon Web Services account) on your instance, so upload those now. Create a /home/ec2/certs directory and move these pk-*.pem and cert-*.pem files there. In your /root/.bashrc file, add the following lines to make sure that the EC2 tools know where to find your certificate and key:

[certificate] = cert-xxx.pem
[private key] = pk-xxx.pem

For the way that I describe the process, no, you would not want to have /home/ec2 residing on the persistent store, as your certificate and key still need to be available during the AMI creation process.

As far as the eAccelerator RPM goes, if it's no longer available at that link you can download it from my server here.

Thanks for the quick response. I realized I skipped the steps on uploading the certs from the previous snapshot section. I got the snapshots working as well as the AMI bundling, this tutorial is really great!

Just one small problem: when running the takesnapshot script, I get the following error message at the end, although the snapshot generates fine:

Use of uninitialized value in concatenation (.) or string at /usr/bin/ec2-snapshot-xfs-mysql.pl line 64.
Use of uninitialized value in concatenation (.) or string at /usr/bin/ec2-snapshot-xfs-mysql.pl line 64.

Seems rather harmless as the script is still able to generate the snapshot, but if you know how to fix this, that would be great! Thanks again!

IT's SIMPLY A GREAT POST ! Thanks Brad

I still didn't get it to work on Amazon. I'm using Wordpress now.

Great article, thanks Brad! I'm writing an image processing script for work and plan to use EC2 so I can scale up and down (by cloning) to match demand. If I put all my logfiles on an EBS and mount it (as above), can two or more EC2 instances have the volume mounted and be writing to the logfiles at the same time?

Thanks a lot! Adam

Unfortunately, no. Each volume can only be connected to one instance.

Others have worked around this by replicating a snapshot of their volume for each new instance, but that won't work for any case where data needs to remain in sync. It looks like the best solution for that would be to have a central database instance that all the others communicate with.

Thanks a lot for the clear tutorial Brad, however I do have one question, you wrote :

"leaving password authentication disabled for the root user" --> how do you tell sshd_config to do this?

Right now this is my setting:

PasswordAuthentication yes
UsePAM yes
PermitRootLogin without-password

Just like you, I want to use SFTP, but I don't want root to be able to log in over SSH with a password. Thanks in advance, Brad!

Those should be the correct settings to do what you describe. Is this not working on your instance?

THANK YOU VERY MUCH!

Your article was extremely informative, clear and well-written. I was able to follow through and get an instance up & running without prior EC2 experience (although with Unix experience). I greatly appreciate your contribution, and will refer my colleagues to read your article.

With best regards,
Rao

Thank you for this post, Brad. This is by far the most impressive collection of information on EC2 and webserving in general that I have found anywhere. I followed your directions to the T for one client, and have now adapted them for an Ubuntu-based Rightscale image.

My only suggestion would be to mention that the default location for the EC2 tools directory in these images, /home/ec2, should be moved to someplace not on /persistent, and you'll need to update the environment variables in /root/.bashrc to reflect that.

For other people who like Ubuntu/APT-based systems rather than RPM, at least on the 8.04 Rightscale image, you'll need to "apt-get install ca-certificates" as part of the setup to get EC2 tools working when you go to upload the bundle. That one stumped me for a while.

Good pointers. I believe that I've since moved my tools directory, as you describe. I've only played around with the CentOS image, but I've heard that there are one or two things you need to do to tweak the Ubuntu images to get Drupal to work well. Abraham Williams pointed out this fix for issues that he was having with placing MySQL on EBS with Ubuntu, as well.

Hello,

thanks for your article. This is most certainly very informative. I do have a question though -- how much were you looking at approximately on a monthly basis for hosting drupal? I'm planning to run a drupal site and your input would be of great help.

thanks again
Lawrence

Right now, depending on how much I'm experimenting with cloned instances, my hosting fees on Amazon run about $72-78 per month. This is a bit high compared to other services, like the Media Temple Gridservice that I migrated from, but for me it has been worth it. The performance, incremental backup, and experimental capabilities EC2 provides are incredibly useful.

Brad --

As many others have said, thanks for great post on EC2. One note on the pricing -- just recently, Amazon offered a $300/year payment option with drastically reduced hourly rates for a running instance. If you know you're going to have your instance up continuously, it's break even after about 5 months, and represents about 40% savings over their normal pricing on a yearly basis.

We're using EC2 for a number of different high-volume production websites, and it works brilliantly. Fast, reliable, flexible, and far less expensive than our co-location provider.

I did have one question -- one of our smaller sites uses WordPress, and as of 2.6 or 2.7 it offers automatic upgrades and automatic plugin updates. I have this working on a number of other non-EC2 sites (and it's great), but I am having trouble getting it working on EC2. I installed vsFTP and opened up the requisite ports on the firewall (the EC2 one), but it's reporting that it cannot find the Plugin (I am sure it's not a permissions thing). Have you used this feature on EC2?

Thank you for these detailed instructions! This is just what I need.

Hi

I read somewhere that when taking a snapshot, Amazon basically freezes a copy of your volume (i.e. from its replication store, not the working one) and does the snapshot from this. So freezing MySQL and the filesystem is not needed.

I believe it must be true, as it is unworkable to take snapshots on a live system that you pause. On a 20GB volume that is almost full, the snapshot can take some time to create. I don't think anyone wants their system down for X minutes of every hour. That's a hefty percentage.

We basically need to get Amazon to confirm this, I think, as I have seen many posts about using the more unstable XFS filesystem because it allows this FS lock.

Fantastic article though - I used it religiously when setting up my system.

Cheers

Ian

I go off of the advice of experts like Eric Hammond who say that Amazon does not freeze the state of your EBS for the snapshot, so you need to do it yourself using a filesystem like XFS. If you want an official response, I suggest posting in the Amazon EC2 forum. Many of the in-house personnel hang out there and tend to respond quickly. You might also search their archives to see if this has been asked before (I bet it has).

Also

Just read that snapshots are incremental differences, so snapshots every hour should actually be very quick unless a lot of data is added in that time (possible).

So what happens if you delete previous snapshots? Does the system then recover that in later snapshots?

Worth a mention in your article also.

Ian

Yes, snapshots are incremental. As to speed, that depends on how much has changed in the interim.

If you delete an old snapshot (which you need to do because you will hit their limit for the number of concurrent snapshots allowed), Amazon intelligently saves the blocks of data which were unchanged between the deleted snapshot and the next saved snapshot. It's a really cool setup.
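If you want to automate that cleanup, a filter along these lines could pick out the snapshots to delete. It is a sketch, not a tested production script: snapshots_to_prune is a made-up name, and it assumes ec2-describe-snapshots prints lines whose first five fields are the word SNAPSHOT, the snapshot ID, the volume ID, the status, and the start time, which you should verify against the output of your version of the tools.

```shell
# Read `ec2-describe-snapshots` output on stdin and print the IDs of all
# completed snapshots of volume $1 except the newest $2.
# Assumed input columns: SNAPSHOT <snap-id> <vol-id> <status> <start-time> ...
snapshots_to_prune() {
  volume_id=$1
  keep=$2
  awk -v vol="$volume_id" \
      '$1 == "SNAPSHOT" && $3 == vol && $4 == "completed" { print $5, $2 }' |
    sort -r |                                  # newest start time first
    awk -v keep="$keep" 'NR > keep { print $2 }'   # everything past the newest N
}
```

One possible use would be ec2-describe-snapshots | snapshots_to_prune vol-xxxxxxxx 24 | xargs -n 1 ec2-delete-snapshot, where vol-xxxxxxxx stands in for your own volume ID. Review the list of IDs by hand before piping it into the delete command.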

Hi Brad,

Just a quick note to say thanks. I've been procrastinating getting my site up and running for dev on EC2 because I thought the learning curve was going to be steep. Your jump start not only got me up and running quickly but also provided good areas for further investigation in the amazon docs while allowing me to start development.

Thanks very much!

Regards,

Roland Rodriguez
BenchTime Software

Amazing article!

I just had some problems creating a snapshot for a volume in the EU region.

For this just change:

ec2-create-snapshot  $volume_id`;

to

ec2-create-snapshot --region eu-west-1 $volume_id`;

in your ec2-snapshot-xfs-mysql.pl file

All the best!

Congrats for this step-by-step howto, very useful for newbies like me in the cloud...

By the way, there is a missing step that I can't get working: moving users' home directories after setting up persistent storage.

I'm using the latest official Ubuntu EC2 AMI (with "ubuntu" as the user) and I try to:
sudo mkdir /persistent/home
sudo mv /home/ubuntu /persistent/home/ubuntu
sudo ln -s /persistent/home/ubuntu /home/ubuntu

Issue #1 : sudo: cannot get working directory (after ln)
Issue #2 : because /home/ubuntu and all the grant files have been moved, if I lose the connection, I can no longer SSH to the server and I have to launch a new instance :s

Great guide! It works... except persistent snapshotting(
When I try to run script manually (su -c "/usr/bin/ec2-snapshot-xfs-mysql.pl /persistent vol-be9e7bd7 > /persistent/logfromcron.log") I have the message: Client.MalformedSOAPSignature: Invalid SOAP Signature. Failed to check signature with X.509 cert

I tried to recreate x.509 certificate from AWS Console, then uploaded it instead of old version - nothing changed(

P.S. My instance is running on the EU-WEST location.

What's wrong?

I've just tried replace "ec2-create-snapshot $volume_id`;" with "ec2-create-snapshot --region eu-west-1 $volume_id`;" in ec2-snapshot-xfs-mysql.pl - nothing changed((( I still have "Client.MalformedSOAPSignature: Invalid SOAP Signature. Failed to check signature with X.509 cert" message(((((

Hi, I just read this, so this may be late for you, but for others perhaps,

I had this error too, and I found that it was because I confused the keypair with the private key. The private key is the key you get when you create the certificate at the amazon.com account page. The keypairs are keys you generate to log in via SSH etc.

The error you got might have come from trying to use the keypair pem file instead of your private key file. You will then get that error, since Amazon checks whether the private key file you pass matches your certificate file. It's misleading, since the certificate file itself is correct.

So, in short, try using the private keyfile instead of the keypair file.
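One way to check this before running the tools is to compare the public key embedded in the certificate with the one derived from the key file. This is a sketch that assumes the openssl command line tool is installed; cert_matches_key is a hypothetical helper name, and the file names in the usage line are placeholders.

```shell
# Succeeds (exit 0) if the private key in file $2 belongs to the X.509
# certificate in file $1. Exits 1 on a mismatch, 2 if either file
# cannot be parsed by openssl.
# usage: cert_matches_key CERT_PEM KEY_PEM
cert_matches_key() {
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 2
  key_pub=$(openssl pkey -in "$2" -pubout) || return 2
  [ "$cert_pub" = "$key_pub" ]
}
```

For example, cert_matches_key cert-xxx.pem pk-xxx.pem && echo "match". An SSH keypair file passed as the second argument will not match the certificate.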

Great post. However, when I tried, as you suggested, to test out the newly created AMI, I encountered some problems with the MySQL setup. I am not able to bring up Drupal (the site offline page came up) and I could not log on MySQL with root. One thing strange is that, when I first launched the instance and before mounting the volume, there is already a mysql directory under /persistent.

Your help would be greatly appreciated.

With the new AMI, I also have difficulty detaching the volume or, sometimes, even unmounting /persistent. The device is always busy.

Problem solved. Thanks!

Hi, you said to lock the database in order to do a snapshot. How can this be feasible on a production site?
Doesn't it mean the site won't be writable for the duration?

I've operated my site in this manner and not noted any problems with hangs or lost data. I can't comment on how this would affect a higher-volume site. This may be another one of those questions that needs to be posed within Amazon's forums.
