How To Create a Remote Shared Git Repository

So GitHub doesn’t work for you? This article is about setting up a private, remote and shared Git repository for you and your team. I’m using Ubuntu 10.10 and 10.04, as well as Fedora Linux (just replace apt-get with yum), and it should work on other distros with minor modifications.

I’m not going to tell you what Git is, and I’m not going to teach you how to use it. What I will do is guide you through the steps of setting up a Git repository on your VPS and giving you and your team access to that repository over SSH. You do have to be familiar with the Git command-line client, so don’t come back and comment if you can’t get it right using TortoiseGit or the like. So, without further ado.

Step 1: Preparation & Accounts Setup

This is one step you won’t want to skip when setting up new repositories. Figure out what you want to achieve, then do some initial setup on your VPS (let’s call this the Git server) and your local machines.

Let’s keep it simple for this tutorial — we’ve got a small development team: John, Andrew and Robert; and we’ve got a project for which we’d like a Git repo. So first of all, make sure that john, andrew and robert have SSH access to the Git server. Let’s also keep them in a group called developers — this is the group whose members will have read and write access to the contents of the repository.

$ sudo groupadd developers
$ sudo useradd -G developers -d /home/john -m -s /bin/bash john
$ sudo useradd -G developers -d /home/andrew -m -s /bin/bash andrew
$ sudo useradd -G developers -d /home/robert -m -s /bin/bash robert

This will create the three users and add them to the developers group. Go ahead and set some initial passwords for your teammates:

$ sudo passwd john
$ sudo passwd andrew
$ sudo passwd robert

And make sure that they can now log in to your VPS over SSH.
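
You can also double-check that everybody ended up in the developers group with the id command (the numeric IDs below are just an example; yours will differ):

$ id john
# uid=1001(john) gid=1001(john) groups=1001(john),1005(developers)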

Connecting to the Git Server Without a Password

The next step is optional, but it really saves you quite some time. I’m talking about SSH auto-logins, which will prevent the Git server from asking John, Andrew and Robert for their passwords every time they’d like to push or pull something. I’ve listed the commands for John only, so repeat the same for everybody else on the team. Lines that begin with the hash symbol # are either comments or command-line output; you don’t have to type them.

So, get on John’s computer and start by browsing to the .ssh directory (create it if it doesn’t exist) and generating an RSA keypair. More about RSA keys on Wikipedia.

$ cd ~/.ssh
$ ssh-keygen -t rsa -C "john@yourcompany.com"
# Generating public/private rsa key pair.
# (text omitted)
# Your public key has been saved in /home/john/.ssh/id_rsa.pub

Great! Now let’s look at the contents of the public key; it should look something like the following:

$ cat id_rsa.pub
# ssh-rsa AAAAb4kzaC1 (text omitted) 86n3iEEQ78cPVazr john@yourcompany.com

Now copy the contents of that file to your clipboard; we’ll need to write it to a file on the Git server. Start by logging on to the Git server as john (assuming it’s git.yourcompany.com), proceed to the user’s .ssh directory and paste your public key into a file called authorized_keys (create it if it doesn’t exist). Save the file and close it, check that the contents have been written, and finally disconnect from the Git server.

$ ssh john@git.yourcompany.com
# Password: ...
$ cd ~/.ssh
$ vi authorized_keys
$ cat authorized_keys
# ssh-rsa AAAAb4kzaC1 (text omitted) 86n3iEEQ78cPVazr john@yourcompany.com
$ exit
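
By the way, if the ssh-copy-id utility is available on John’s machine, it can do the copy-and-append for you in one step, creating the remote authorized_keys file if needed:

$ ssh-copy-id john@git.yourcompany.com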

If you’ve done everything right, you should now be able to log on to the Git server from John’s computer without having to input a password:

$ ssh john@git.yourcompany.com
# Linux git.yourcompany.com 2.6.18-028stab069.5 #1 SMP Tue May 18 17:26:16 MSD 2010 x86_64 GNU/Linux

Repeat this for Andrew and Robert, then proceed to the next step.

Step 2: Setting Up a Shared Git Repository

This step should be done by somebody who has sudo access to the server, or by the root user. The latter is more dangerous, although it saves you from typing sudo every time ;)

So, let’s assume there’s a manager user account with sudo access and the /home/manager home directory. We’ll create our Git repositories there; the manager will own them, while the developers will have read and write access. Start by creating a directory called repositories and another one inside it called project.git, which will host our first project. Initialize a bare repository with the shared flag inside the project.git directory and then change the group of all the files and directories to developers:

$ cd ~
$ mkdir -p repositories/project.git
$ cd repositories/project.git
$ git init --bare --shared=group
# Initialized empty shared Git repository in repositories/project.git/
$ ls
# branches  config  description  HEAD  hooks  info  objects  refs

While in the project.git directory, use sudo to run the chgrp command to change the group of all the files. Use the -R flag to run this recursively:

$ sudo chgrp -R developers .
$ ls -l
# drwxrwsr-x 2 manager developers 4096 2011-01-19 13:38 branches
# -rw-rw-r-- 1 manager developers  126 2011-01-19 13:38 config
# (text omitted)
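
If you want to make sure the shared flag actually stuck, you can peek at the repository’s config file. Depending on your Git version the setting is stored as 1 or as group, so expect something like:

$ cat config
# [core]
# (text omitted)
#   sharedrepository = 1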

Okay, so now all your teammates have read and write access to your repository since they’re all in the developers group. Let’s proceed with setting up symbolic links and cloning the repository for the crew so they can finally start working.

Step 3: Accessing the Shared Git Repository

As mentioned earlier, everybody in the developers group has read and write access to the project.git repository, and yes, they can go ahead and clone it from where it is right now, i.e. repositories/project.git in the manager’s home directory. To make it more convenient, though, let’s set up symbolic links for all our developers that point to the repository, and then see how they can easily access it from their local machines.

$ sudo ln -s /home/manager/repositories/project.git/ /home/john/
$ sudo ln -s /home/manager/repositories/project.git/ /home/andrew/
$ sudo ln -s /home/manager/repositories/project.git/ /home/robert/

Done! John, Andrew and Robert can now access the repository without having to move away from their home directories. Now let’s get back to John’s (local) computer and try to clone the project into John’s work directory:

$ mkdir -p ~/work/project/
$ cd ~/work/project/
$ git clone john@git.yourcompany.com:project.git .
# Initialized empty Git repository in ~/work/project/.git/
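
The symbolic link is purely for convenience, by the way; cloning with the full path to the repository works just as well:

$ git clone john@git.yourcompany.com:/home/manager/repositories/project.git .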

Let’s go ahead and add a readme file, commit it and push it to our server:

$ echo "Hello" > readme.txt
$ git add readme.txt
$ git commit -m "Adding a readme file"
$ git push origin master
# Push output ...

So if you’ve reached this step and successfully pushed your first commit to your remote Git server, you have done everything right, congratulations! You can now go ahead and clone the project for Andrew and Robert; I’m pretty sure they can’t wait to start pushing and pulling ;)

One more tip: suppose your local username is john, you’re cloning a Git repository from git.yourcompany.com, and your username on git.yourcompany.com is john as well. In that case you can omit typing john@ when cloning, so it’s simply:

$ git clone git.yourcompany.com:project.git .

This can save you some time if you’re working on several projects and have new ones every day. If your usernames don’t match, you have to prefix the server host with your username followed by the @ symbol. This is more of an SSH trick and not really related to Git, but it comes in handy when working with Git projects over SSH.
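
Another option, if the usernames don’t match, is to tell SSH once which user to use for that host. Add something like this to ~/.ssh/config on your local machine (create the file if it doesn’t exist) and you can drop the john@ prefix for good:

Host git.yourcompany.com
    User john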

Conclusion

I know there are quite a lot of Git tutorials out there, but what I really wanted to show here is the fact that you can actually save some money. That’s right, save money.

I’ve been working with freelancers and small tech companies all over the world, and yes, everybody simply loves GitHub. In fact I love it too, but public repositories are not always a solution for some of us. We sometimes work on commercial projects, secret and NDA’d ones. We sometimes have ideas of our own that we wouldn’t like to share with the rest of the world.

There are private repositories on GitHub, but they start from $7/mo and you’ll have to get the $12 plan in order to have more than one collaborator, which gives you 10 projects. Yes, hosted is good, you get security and everything, an awesome control panel, issue tracking and much more. But! Sometimes you just don’t care!

Sometimes you’re already running a $30/mo VPS with some hosting company, so why pay more for Git hosting? Right, it may not be too easy running the steps above at first, but once you get used to them, you’ll be init’ing, cloning, pulling and pushing Git repos like crazy ;)

Thanks so much for reading, and of course sharing. Feel free to bug me with questions via the comments section. Cheers!

Installing Python 2.5 on Ubuntu Linux 10.10

If you’ve been working on App Engine and you’ve noticed that some stuff works on your development server but not in production, it may be related to the different versions of Python. Recent Linux releases, including Ubuntu 10.04 and 10.10, come with Python 2.6 pre-installed, but Google App Engine still runs Python 2.5 (an issue has been created to add Python 2.6 support, so make sure you vote it up).

Their roadmap mentions nothing about upgrading. So in order to make your development server look more like your production servers, you’ll have to get Python 2.5, which is not that trivial at first.

Fortunately, Felix Krull has published an archive of new and old Python packages, so let’s use that source to get Python 2.5 running on a new Ubuntu box:

sudo add-apt-repository ppa:fkrull/deadsnakes
sudo apt-get update
sudo apt-get install python2.5

Yup, that was easy! Let’s now see if both Python 2.5 and Python 2.6 are available:

$ python2.5
Python 2.5.5 (r255:77872, Sep 14 2010, 15:51:01)

$ python
Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56)

All done! Oh and don’t forget to launch your App Engine development server using python2.5 (installing it is not enough):

$ python2.5 dev_appserver.py .

As a bonus to this post, I’d like to share my way of working with App Engine, not in terms of code, but in terms of library organization. If you’re writing code for App Engine you’re probably working on more than one project at a time, hence you’ll need to use the SDK in more than one place.

So instead of copying it around, replacing Python packages and so on, simply move the google_appengine folder to /usr/share, and in every App Engine project create a symbolic link called .gae that points to that location. The SDK will automatically locate all the Google libraries, and the development server is easy to launch:

$ ln -s /usr/share/google_appengine/ .gae
$ python2.5 .gae/dev_appserver.py .

Don’t forget the dot at the end, since it tells the SDK which project to launch. And make sure you don’t push the .gae directory to your source control ;) Happy coding!

Upgrading Django on Ubuntu Linux

As I wrote on Twitter a couple of times, I’ve been exploring the world of Django over the last few weeks. I’m quite impressed with the framework, although there are some things I’m not yet used to. This post is a short snippet for the Ubuntu users who are struggling with upgrading to the latest Django package. I’m not sure about other Linux distros, but the latest Ubuntu installs Django 1.1, which is quite old nowadays. So if you’ve installed Django the following way on Ubuntu:

sudo apt-get install python-django

You’re probably running an outdated version too. To fix this, grab a newer release as a tarball from the Django download section and follow the install instructions, which basically come down to running the Python setup tools; a rough sketch follows.
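
For reference, the tarball install goes something like this (the version number below is just an example; use whichever release you actually downloaded):

tar xzf Django-1.2.3.tar.gz
cd Django-1.2.3
sudo python setup.py install

Once the new version is installed, there’s just one minor issue left: you have to remove your old Django installation, as it has a higher priority for Python. Use the following to remove the old version: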

sudo apt-get remove python-django

Also note that Django files may be left somewhere around /usr/local/lib/python2.6/dist-packages/, so make sure you remove the outdated versions from there too. Then run the Python interpreter and print out the current version of Django. Make sure that you see something similar:

import django
django.VERSION
# Outputs: (1, 2, 3, 'final', 0), hurray!

Driving the (ve) Server at Media Temple

It’s been a few weeks since Media Temple launched their new (ve) Server, and I’ve been testing it out for a few days now. I’m actually hosting my blog there to experience some real traffic load, and my first impressions are awesome!

I started off with the simplest 512 MB server and transferred a few websites to the new platform. I’m not too used to the Ubuntu Linux operating system, but I found my way around quickly. They do have other operating system options, but Ubuntu is the one they recommend. The first few tests showed that my load time decreased dramatically compared to my Amazon EC2 instance, which I was quite happy with. The next step was to run a few load tests using the Apache Benchmark tool (ab), and very soon I realized that I was getting quite a few failed requests, memory shortages and other strange stuff.

Media Temple’s (ve) servers are hosted on the Virtuozzo platform by Parallels, and after browsing their documentation I found out that there’s no swap space available for Virtuozzo containers. They do allow around 80% of burstable RAM (so you get around 1 GB when running 512 MB) but when that runs out, you’re left with nothing, not even some swap space on your hard drive. Some heavy load tests showed 30% request failure, which is quite horrible.

Media Temple doesn’t give much information on the new platform via the support system, and in response to memory-shortage questions in their user forums they advise you to upgrade, of course! Well, I wouldn’t like to upgrade just to run a couple of load tests, and what about Digg traffic? Should I predict that and upgrade before the spike, then downgrade again to save some cash? Of course not.

A good option I found here is to tune Apache a little bit and reduce its resource limits. This will not increase performance, but it may guarantee a 100% fail-safe workflow. We wouldn’t like our users to see a blank page (or a memory shortage error) when a spike hits; we’d rather have them wait a bit longer and still get the requested page. The settings mostly depend on what software you’re running, which services, and the RAM available in your container.

You might want to reduce the KeepAliveTimeout in your Apache settings (mine’s now set to 5), and the rest is up to the mpm_prefork module. You’ll have to modify your settings and then run some tests until you’re comfortable with the results. Mine are the following:

<IfModule mpm_prefork_module>
    StartServers 3
    MinSpareServers 2
    MaxSpareServers 5
    MaxClients 10
    MaxRequestsPerChild 0
</IfModule>
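
After changing these values, it’s a good idea to check the configuration and restart Apache so they take effect; on Ubuntu that’s roughly:

$ sudo apache2ctl configtest
$ sudo /etc/init.d/apache2 restart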

This is on a 512 MB container (with roughly 400 MB more burstable). An Apache Benchmark test showed 100 concurrent (simultaneous) requests completing in 26 seconds with 0% failed requests; that makes 3.84 requests per second, which is quite good. For comparison, the same test run against the mashable.com website took 30 seconds at 3.32 requests per second, and of course 0% failure. Also check out the other Apache MPMs, which could give good results too.
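
For the curious, the ab invocation for a test like that is something along these lines (the URL is just a placeholder for whatever page you’re hammering):

$ ab -n 100 -c 100 http://yourdomain.com/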

This definitely requires more fine-tuning, and if the page load time becomes too high then yes, there is a reason to upgrade, but don’t forget about other performance tricks such as CDNs, gzip (deflate) and others. When you’re done with Apache, proceed to MySQL and PHP fine-tuning; there are some tricks there too that will give you some extra speed and performance.

I’ll keep playing around with this server, plus I’ve purchased a 1 GB (ve) this morning, so there are quite a lot of tests still to be run. Anyway, if you’re looking for a good, high-performance VPS, then Media Temple is definitely a choice to consider. For only $30/mo you can get quite a good-looking virtual server, and it’s more interesting than their old dedicated virtual servers (although still in beta). Cheers, and don’t forget to retweet this post ;)

Cloud Tips: Amazon EC2 & Rejected Email

A few weeks ago I set up my email in /etc/aliases for the root user (and the others) and started to actually read my root email from time to time (I wonder why I never did that before). Anyway, what bugged me straight away is that some emails were being rejected and not delivered, yielding the following errors (I removed some numbers):

Deferred: 450 4.7.1 : Helo command rejected: Host not found
421 invalid sender domain 'domU.compute-1.internal' (misconfigured dns?)

And some others that looked alike. Tonnes of them, every four hours! Emails to other addresses were delivered fine, though. I had WordPress notification messages delivered to my email and never lost a message, and I also tried sending out a few using the mail command via SSH; everything was okay. For a second I thought that maybe those addresses were simply invalid, but wouldn’t the server reply with an “Invalid recipient” error? Probably. Here’s what I got from the Amazon Web Services support forums:

It seems that some remote mail servers complain about your server
identifying itself in the SMTP dialogue as domU.compute-1.internal,
while its external name is ec2.compute-1.amazonaws.com

Makes total sense. Perhaps some servers do try to check where the e-mail is coming from, and of course the .internal domain is unresolvable (hence the “misconfigured dns” error). I had to identify the server with an external, resolvable name, so I copied the external hostname into the /etc/mailname file. Well, it’s been a week now and I haven’t received any more delivery errors, so that must have worked.
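
If you’d rather not copy the hostname by hand, the instance’s public DNS name is available from the EC2 metadata service, so something like this (run on the instance itself) should populate /etc/mailname for you:

curl -s http://169.254.169.254/latest/meta-data/public-hostname | sudo tee /etc/mailname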

Cloud Tips: Automatic Backups to S3

In a previous post about backing up EC2 MySQL to an Amazon S3 bucket we covered dumping MySQL datasets, compressing them and uploading them to S3. After a few weeks of test-driving the shell script, I came up with a new version that checks, fixes and optimizes all tables before generating the dump. This is pretty important, as mysqldump will fail on any step that causes an error (data corruption, crashed tables, etc.), which means the archive you upload to S3 could be incomplete or corrupt. Here’s the script:

filename=mysql.`date +%Y-%m-%d`.sql.gz
echo Checking, Fixing and Optimizing all tables
mysqlcheck -u username -ppassword --auto-repair --check --optimize --all-databases
echo Generating MySQL Dump: ${filename}
mysqldump -u username -ppassword --all-databases | gzip -c9 > /tmp/${filename}
echo Uploading ${filename} to S3 bucket
php /ebs/data/s3-php/upload.php ${filename}
echo Removing local ${filename}
rm -f /tmp/${filename}
echo Complete

There you go. If you remember my previous example, I stored the temporary backup file on Amazon EBS (Elastic Block Storage), which is not really appropriate. Amazon charges for EBS storage, reads and writes, so why the extra cost? Dump everything into your temp folder on EC2 and remove it afterwards. Don't forget to make changes in your upload.php script ($local_dir settings). Also, just as a personal note, and for people who didn't figure out how to upload archives with data to S3, here's another version of the script which takes your public_html (www, htdocs, etc.) directory, archives it, compresses it and uploads it to an Amazon S3 bucket:

filename=data.`date +%Y-%m-%d`.tar.gz
echo Collecting data
tar -czf /tmp/${filename} /ebs/home/yourusername/www
echo Uploading ${filename} to S3 bucket
php /ebs/data/s3-php/upload.php ${filename}
echo Removing local ${filename}
rm -f /tmp/${filename}
echo Complete
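
To make these backups truly automatic, drop the script into cron. Assuming you saved it as /ebs/data/backup.sh (the path is just an example), a crontab entry like this runs it every night at 3 am:

0 3 * * * /bin/sh /ebs/data/backup.sh > /dev/null 2>&1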

Oh, and have you noticed? Amazon has changed the design a little bit, and woah! They’ve finally changed the way they show the Secret Access Key, so it no longer comes with a trailing space character! Congrats Amazon, it took you only a few months.

FTP Breaking on FEAT (vsftpd on Fedora Core 8)

It’s been a while since I connected via FTP to my Amazon EC2 instance running Fedora Core 8, and when I tried connecting there today, for no particular reason: badaboom! Strange though, it worked fine about a month ago; I was able to upload and download files, but this time I got a little crash. On one version of the FileZilla FTP client I received a simple “Unable to connect” error. On a newer version I noticed that the FEAT (features list) command was breaking the connection, so I googled that.

People say that the server is broken, but they don’t mention any tips on how to fix it. I logged on via SSH and restarted the vsftpd daemon, with no luck. Then I tried connecting to localhost via FTP (over SSH) using the ftp command. I got a connection, the LS and CWD commands worked just fine, and I was able to see the files. But when I sent a FEAT command, I got an “invalid command” error. Hmm?

Somebody on the Ubuntu forums mentioned that it’s an encoding issue: the client is unable to handle UTF-8 even though the server runs only UTF-8. Does that make any sense? Guess not. Well, before you go digging into your encoding settings and messing up your configuration files, or shutting down the server and starting a new instance (I’m on Amazon EC2), you might want to try this fix.

I have no idea how it got there, but in my /etc/vsftpd.conf I found a new strange line saying:

connect_from_port_20=YES

For one second there I thought that was fair enough. But hey, wasn’t FTP supposed to work on port 21? Right. Comment out that line, restart your vsftpd daemon (service vsftpd restart) and voila! Worked for me.
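
In other words, the fix boils down to commenting out that line and bouncing the daemon, something like:

# in /etc/vsftpd.conf
#connect_from_port_20=YES

service vsftpd restart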

I still think it’s strange though.. Ghosts? ;)