Pixelastic

You can clip our wings but we will always remember what it was like to fly.

Posts tagged with "mercurial"

From no versioning to git

When I started working as a professional web developer, I wasn't using any versioning system. When I needed to add a new feature to an existing website, I just coded it and then uploaded it by FTP.

Sometimes it turned out that the feature was a failure and that I needed to get back to a previous version. Most of the time, I remembered the old code and could just type it back. Other times I couldn't, and getting the website back to its previous state was a nightmare.

So whenever I felt a feature could go wrong, I started creating manual backups of files before editing them, naming them with -version2 and such. It worked OK at first, but after some time it was impossible, even for me, to tell the different versions apart.

I also started to notice, reading other developers on the web, that there was something called a version-control system. I thought I needed to use one, to keep track of the different versions of my files.

I wasn't exactly sure which one was right for me, so I asked a few friends. Most of them told me they were using Subversion at work, and that it had a few shortcomings but was still better than nothing. Another friend told me that he was using Mercurial, both for personal projects and at work, and that his whole team was actually versioning everything with it and was very happy about it.

First steps with mercurial

So I thought I should start using Mercurial. If I have to learn a new tool, I'd rather take one that works well and where I have a friend able to help me on my first steps.

Back in those days, I was still using Windows so I downloaded and installed Mercurial and its GUI version, TortoiseHg. I was quite happy with it the instant I installed it. I could just "commit" (which was a new word for me) my changes and know for sure that they were safe and that I could get back to previous versions of my files easily.

I was committing all my changes at the end of each day, writing a long commit message listing everything I had done in that given day. Then, the next morning I could just re-read the previous commit message and remember what I was doing. The commit message worked a bit like a todo list for what needed to be done.

I was mostly using Mercurial as a backup system. I even opened an account on BitBucket where I uploaded my changes every day so that, in case my laptop crashed, I still had backups of everything online.

A little note about FTP

Using Mercurial had another beneficial side effect on my productivity. It made me discover how backward FTP upload was. Before using Mercurial, when I needed to add changes to a website, I connected to the server with FTP and transferred every changed file from my local machine to the server.

Of course, I had to remember which files I changed, and upload them one by one. This was tedious. I had to move from directory to directory before copying the files. I can't tell you how many times I had a bug in production that I couldn't replicate on my local machine, just because I either forgot to upload a file or uploaded it in the wrong place.

Sometimes, just to be sure to upload all the files to their correct places, I simply uploaded the full website directory and let the FTP software decide whether each file needed to be updated. But this took ages, as the software had to send a request to compare the modification timestamp of each file.

And if only two files had changed, I had to wait for the FTP client to scan every file between the first and the second one before the second got uploaded. This too resulted in bug reports from customers that I couldn't replicate, even in production, because by the time I got to check the website, the missing file that was causing the error had been uploaded and everything worked perfectly for me. In the end, it created a list of "ghost bugs": I was never sure whether they were real bugs or just created by the FTP upload lag.

Mercurial got rid of all this crap because it was clever enough to know which files had changed since the last deployment, and even which parts of the files had changed, sending only the difference. This resulted in much faster uploads and no more errors from missing files. Also, I got bonus security points for transferring through SSL and not clear-text FTP.

Sorry for this little FTP digression. Anyway, as you remember, I was using Mercurial as a backup system more than a version-control system. Committing once a day became cumbersome as I wanted finer control over what I committed. I ended up committing twice a day, or even every time I took a break, but then I realized it made much more sense to commit based on features and not on time.

No more Mercurial

And that's when Mercurial started to get in the way. I often worked on a feature and while doing that, spotted tiny bugs or typos that I fixed while working on the big feature. I wanted to commit only related changes together, and didn't manage to do that well with Mercurial.

I guess this is because I did not really understand how Mercurial worked back in those days, but to be honest I do not understand it better now. Whenever a tracked file is modified in Mercurial, it is ready to be added to the next commit. But when you create a new file or delete one, you have to tell Mercurial that you want this add/delete to be registered in the next commit.

As I said, I sometimes just fixed small bugs or typos in files that did not have much to do with the feature I was working on at that moment. I would have liked to be able to commit only these files, and not the others. I never really managed to do it correctly with Mercurial, so in the end I was still committing more files than I wanted to.

That's when I considered trying git instead of Mercurial. I did some googling on "git vs Mercurial" to see how they differ, and the general consensus seemed to be that git is more low-level than Mercurial. Git has a plethora of commands, most of which you'll never use, while Mercurial is focused on a workflow for the end user that works well. Also, git allows the user to rewrite the history of a repository.

After reading all this I was convinced Mercurial was the right tool for me. I was still struggling with my current version system, so I didn't want to try one that was even harder to understand. Plus, rewriting history? I don't want that to happen in real life, and I sure don't want that to happen with my files either.

So I dug deeper into Mercurial, trying to understand it better, to grok it, but no, really, I still couldn't make it behave the way I'd like to work with my files, so I finally decided to give git a try.

First steps with git

There was also this thing named GitHub. All the cool kids were on GitHub, I too wanted to participate in those big open source projects, and I felt like I was held back by the tool I was using.

In my first hours of using git I managed to do what I had struggled to do with Mercurial for so long. I could easily choose which files to commit or not and split my work into several commits. Of course, I could never have done it without Google. Git man pages are gibberish, and the list of commands and options is so vast I felt lost. Even when I found what I needed, I wondered why the git developers chose to give them such abstract names without any apparent cohesion across the complete git API.
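
To give a rough idea of what splitting work into several commits can look like, here is a minimal sketch in a throwaway repository. The file names, commit messages and demo identity are made up for the example, and it is only one way of doing it.

```shell
#!/usr/bin/env bash
# Sketch: two unrelated edits in the working copy, committed separately.
# File names and the demo identity below are made up for this example.
split_commits_demo() (
  set -e
  command -v git >/dev/null 2>&1 || { echo "git not installed, skipping"; exit 0; }
  repo=$(mktemp -d)
  cd "$repo"
  git init -q .
  git config user.name "Demo"
  git config user.email "demo@example.com"

  echo "feature v1" > feature.txt
  echo "teh readme" > readme.txt
  git add feature.txt readme.txt
  git commit -q -m "Initial import"

  # Work on the feature, and fix an unrelated typo along the way
  echo "feature v2" > feature.txt
  echo "the readme" > readme.txt

  # Stage and commit only the typo fix...
  git add readme.txt
  git commit -q -m "Fix typo in readme"
  # ...then the feature on its own
  git add feature.txt
  git commit -q -m "Rework feature"

  # One subject line per commit, newest first
  git log --format='%s'
)

split_commits_demo
```

When the unrelated changes live in the same file, git add -p goes one step further and lets you pick individual hunks to stage.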

I had to create aliases for my day-to-day tasks with git so as not to have to remember and type all these crazy commands, but in the end it turned out OK. I keep running into problems, but none that the help of Google, StackOverflow and the incredibly rich git API can't solve.

I'm now versioning any new project with git and have even converted some old projects from Mercurial to git. I know I'm still in the early stages of learning git (I've only been using branches extensively for a few weeks) but I really enjoy it a lot.

What's next?

I've done all my git training by myself, while in New Zealand for a year, away from work and just for fun. I know that when I get back to Paris and start looking for a new job, I'd like to find a place where the day-to-day workflow involves git, because I really want to see how branching and merging help people work together to build bigger things.

Working on Wednesday #9 : Mercurial

I'm feeling like I'm getting more and more behind schedule for what I intended to do at first. I still haven't tried Rails more than that and have gone on different learning side projects.

I've been reading Mercurial: The Definitive Guide for the past few days to get a correct grasp of the software.

I've been using Mercurial for the past two years, but through a GUI and without using any "advanced" features. I never branched a project, and always worked alone.

Commands

Now that I'm working on a Linux machine every day, I can use hg through the command line.

hg commit -Am "commit message" is the same as hg addremove; hg commit -m "commit message".

hg rollback will remove the last local commit. Useful if you forgot files in the commit, or if you made a typo in your commit message.

hg revert can revert a file or set of files to the state they were in at the last commit. This can also cancel an hg add or hg remove.

hg backout can "forget" a commit in the history. It will not really forget the commit (ie. will not let you alter the history). Instead, it will create a new commit where the specified changeset is removed (through a merge). It can easily backout the tip, but may involve more merge work if we want to backout an old changeset.

Automation

Also, I've learned about two great tools of Mercurial.

hg bisect lets you isolate the commit in your history where you introduced a specific bug. You write a piece of code that, given a changeset, returns true or false based on the bug's presence, and hg bisect will cleverly search the history to find the revision that introduced the bug.

Hooks were also very interesting. One can script automatic commands to run on specific hg commands like commit, pull or push, or even before those commands, to refuse them if something does not work as expected.

The classical examples were running a build process after a commit, or refusing a commit if no bug id was specified or if the tests didn't pass. Another use case would be to push changes to a remote server on commit.
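
A possible [hooks] section covering these examples could look like the sketch below, written to a scratch file so it can be inspected safely. The names after the dot (bugid, tests, autopush), the run_tests.sh script and the "bug 123" message format are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: write an example [hooks] section to a scratch hgrc file.
# Hook suffixes, run_tests.sh and the "bug <number>" convention are made up.
hooks_file=$(mktemp)
cat > "$hooks_file" <<'EOF'
[hooks]
# Refuse the commit if its message does not mention a bug id
pretxncommit.bugid = hg log -r $HG_NODE --template '{desc}' | grep -qi 'bug [0-9]'
# Refuse the commit if the tests fail (hypothetical script)
pretxncommit.tests = ./run_tests.sh
# After a successful commit, push to the default remote
commit.autopush = hg push
EOF

cat "$hooks_file"
```

To try it for real, the lines would go in the .hg/hgrc file of a repository. A pretxncommit hook that exits with a non-zero status makes Mercurial roll the commit back.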

Git

Why am I learning Mercurial while all the cool guys are using git?

Well, I've read a lot of articles comparing hg to git. What I've read the most is that git is an awesome toolbox that lets you do whatever you want with your version control, through its 100+ tools.

On the other hand, Mercurial is far easier to learn and has built-in commands for the day-to-day work. As I was already quite familiar with Mercurial, I'm sticking with it, but I know that I'll eventually learn git too.

Differences between Mercurial and Subversion

People at my current job are using Subversion, and our project will be tracked using it. I had never used Subversion before; the only versioning system I had ever used was Mercurial. Moving from one to the other meant changing a lot of the reflexes I had.
Here is a little list of changes, mostly as a reminder for myself.

Directories

Mercurial uses a single .hg directory at the root of your project to store all your project history while Subversion adds multiple .svn directories in each directory, to track history changes to that directory only.

I prefer the Mercurial approach: it keeps all changes centralized in one place. You can simply remove the .hg directory to turn your versioned project into a stand-alone one.

The Subversion approach, on the other hand, litters your app with countless hidden directories, making copying and pasting a real pain.

Centralised vs Distributed

Subversion is centralised while Mercurial is distributed.
As far as I understand the difference, it means that Subversion uses one main directory to store the versioned project and can deploy (export) a specific revision at any time.

That revision does not hold any history information; it is only a copy of your project at a given time.

On the other hand, each Mercurial repository holds both the current public version and all the history. You do not have to deploy anything anywhere, just update your current repo with data from one of the revisions.

Tortoise

I am using both TortoiseSVN and TortoiseHg. When you commit with TortoiseSVN, it displays the list of files that were updated since the last commit. If you added new files, they won't show unless "Show unversioned files" is checked.

In TortoiseHg, all new files are automatically shown in the commit window, as well as a diff. It allows me to easily see what changes were made, and helps me write my commit message.
I really like the TortoiseHg vision better.

Tracking directories

Subversion can track empty directories, just by adding them. Mercurial can't. You have to add an empty file in each of them to allow tracking.
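
One way to automate the empty-file trick is sketched below. The .empty file name is arbitrary, the add_placeholders function is a made-up helper, and the demo directory layout only exists for the example.

```shell
#!/usr/bin/env bash
# Sketch: drop a placeholder file into every empty directory so that
# Mercurial has something to track. The ".empty" name is arbitrary.
add_placeholders() {
  # $1: root of the working copy (defaults to the current directory)
  find "${1:-.}" -type d -empty ! -path '*/.hg/*' \
    -exec sh -c 'touch "$1/.empty"' _ {} \;
}

# Demo on a scratch directory
demo=$(mktemp -d)
mkdir -p "$demo/app/tmp/cache" "$demo/app/webroot"
echo "hello" > "$demo/app/webroot/index.php"

add_placeholders "$demo"
find "$demo" -name .empty
```

Only app/tmp/cache gets a placeholder here, as app/webroot already contains a file. The placeholders still have to be added and committed afterwards.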

Also, when doing a commit in a specific directory in Mercurial, it will commit the whole repo, while with Subversion it will only commit the current directory. I can see the benefits of both and I'm not quite sure which is better.

Subversion will allow me to commit one special feature by committing only one directory, but the TortoiseHg integration helps me commit more easily no matter where I'm browsing.

Ignoring files

I have some files in my project I don't want to track (like auto-generated cache files). In Mercurial, all I need to do is edit the .hgignore file with regexps. The syntax can be a little strange sometimes, and it took me a while to understand it correctly, but it definitely works.
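
For illustration, here is what a small regexp-based .hgignore could look like, with a rough preview of which paths it would match. The patterns and paths are invented for the example, and the preview relies on grep -E, which only approximates Mercurial's Python regexps for simple patterns like these.

```shell
#!/usr/bin/env bash
# Sketch: an example .hgignore using regexp syntax, written to a scratch dir.
# The patterns are illustrative only.
demo=$(mktemp -d)
cat > "$demo/.hgignore" <<'EOF'
syntax: regexp
# anything under app/tmp/cache/
^app/tmp/cache/
# any file ending in .log, wherever it lives
\.log$
EOF

# Rough preview with grep -E (an approximation of what hg would do)
would_ignore() {
  # $1: path relative to the repository root
  grep -v -e '^#' -e '^syntax:' "$demo/.hgignore" > "$demo/patterns"
  printf '%s\n' "$1" | grep -q -E -f "$demo/patterns"
}

for path in app/tmp/cache/cake_core app/debug.log app/models/user.php; do
  if would_ignore "$path"; then
    echo "ignored: $path"
  else
    echo "tracked: $path"
  fi
done
```

Patterns are matched against the path relative to the repository root and are not anchored unless they start with ^, which is why \.log$ catches log files in any directory.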

On Subversion, I can add files to the ignore list so they don't show as "unversioned", but I can also add a svn:ignore property to a specific directory to set patterns for files that I don't want to track. The end result is the same, but the way Mercurial handles it with one file is more appealing to me.

Conclusion

Having all the history in one directory and all the ignore rules in one file seems a better approach to me. I guess in some respects Mercurial is still more centralised than Subversion.

My new backup strategy for 2011

My computer had been getting slower and slower over the past few days. And I realized my automatic backup had not backed anything up for the past month.

And I realized I had different versions of the same files on my 2 laptops...

Well, it seems I have to do some cleaning up.

Synchronizing paperwork

I started by cleaning up my Dropbox folder. I removed shared folders with past clients, and created a "Paperwork" folder where I put all my invoices, contracts and general paperwork.

I also added my private KeePass file as well as other info I may need to access anywhere, anytime.

KeePass allows me to store all my login/password credentials in a secure way (protected by a master password). It is really useful to have this file on all my computers (and mobile phone).

Dropbox is excellent for storing simple files, that you need everywhere. Being able to access invoices and contracts even from my mobile phone proved quite valuable when meeting clients.

Hard backup of personal files

I've also changed my scheduled backups of personal files. I bought an Acronis True Image last year, and reconfigured it today.

I have a hard drive whose sole purpose is to save backups. I scheduled a backup for the first of each month to save my system state, my applications' configuration, and my personal files (photos, saved games, writings, etc.).

I manually started all these backups to have a clean start. I also forced the backup to restart with a whole new file every 6 months (as opposed to using the incremental backup).

Backing up my music and movies

I did not spend too much time figuring out how to save my hundreds of GB of music and movies. I rarely watch the same movie twice, so losing them won't affect me too much.

I occasionally re-watch series, though, but as most of my friends have the same tastes as I, I could very easily get them back from them, or download them (again).

Regarding music, well, I have quite a big collection, but most of it is already "backed up" on my portable mp3 player.

Automatic synchronizing with BitBucket

For my day job, I now always version my files using Mercurial. BitBucket offers unlimited storage and unlimited public repositories. Private repos are limited to 5 users. As I'm mostly alone on projects that should stay private, this seems the best deal I could find.

Mercurial being a versioning system, I get all the benefits of a backup here: being able to revert to previous versions, update it whenever I want and access it from anywhere.

I wrote a custom Hg hook on commit to automatically push my repos to BitBucket at least once a day (I'll post the code in a future post).

MySQL Backup

I used to back up MySQL databases on my work computer using a Windows app. This was slowing down my computer on every boot, and the backup only ran when I was working, not when I was on vacation.

Today, I wanted something a lot more flexible, so I set a cronjob on my main host coupled with a slightly edited autoMySQLBackup script.

This will automatically run every day at midnight and make a local save (with daily, weekly and monthly rotation) of all my clients' databases. Logs are saved on disk and gzipped, and will also be sent to a special backup@pixelastic.com mail address (stored on GMail).
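
To make the rotation idea concrete, here is a small sketch of how a daily/weekly/monthly slot could be picked. This is not the actual autoMySQLBackup code: the rule (monthly on the 1st, weekly on Sundays, daily otherwise), the backup_slot helper, the file naming and the script path in the crontab comment are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch of a daily/weekly/monthly rotation decision.
# Assumed rule: monthly on the 1st, weekly on Sundays, daily otherwise.
#
# A crontab entry running a (hypothetical) script every day at midnight:
#   0 0 * * * /path/to/backup-script.sh
backup_slot() {
  # $1: day of month (01-31), $2: day of week (1-7, 7 = Sunday)
  if [ "$1" = "01" ]; then
    echo monthly
  elif [ "$2" = "7" ]; then
    echo weekly
  else
    echo daily
  fi
}

slot=$(backup_slot "$(date +%d)" "$(date +%u)")
echo "backups/$slot/clients_$(date +%Y-%m-%d).sql.gz"
```

Each slot can then keep its own number of old files, so the daily folder stays small while the monthly one goes further back in time.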

This way I am sure to have my mysql backups on two different hosts, with daily and automatic saves, that I can access from anywhere if anything goes wrong.

Conclusion

It took me almost two full days to get the right tools, configure them and write my custom scripts, but now it is seamlessly integrated with my daily workflow. This is a weight off my shoulders: I know I can safely work as usual and my files are saved and easily accessible.

Using nested subrepos with Mercurial and TortoiseHg

Nowadays, when I'm developing a new website, I almost always end up using bits and pieces of the previous websites I've done. All my websites are based on the same framework (cakePHP) that I have extended with my own CMS (Caracole, more on that later).

Caracole is made of several little plugins, each one of them focusing on a simple task (like handling 404 errors, adding a recycle bin, draftable elements, SEO-friendly url, and so on).

I've also uploaded each one of these plugins to BitBucket, allowing me to easily commit changes and clone new versions from one project to another.

But very often, when working on a specific project, using a specific plugin I think that I can update the plugin (be it either by adding a new feature or fixing a bug I've discovered). In that case, I want my changes to be added to both the plugin (on BitBucket) and the project I'm working on at the moment.

To do that, I had to struggle my way through Mercurial, because nested repositories (called subrepos) are not a trivial setup.

Setting up a subrepo with Mercurial

Let me show you the classical and easy way to achieve that:

First, let's say you have your main repo. You go in the directory where you want to add your subrepo and you create it using either hg init or hg clone.

You then go back to your main repo root and edit the .hgsub file (if you don't have this file yet, just create it). Add the following line to the .hgsub:

path/to/your/subrepo = path/to/your/subrepo

Now, on every subsequent commit, Hg will be aware that your repo is holding a subrepo. If you omit this line, Hg will not allow you to commit, complaining about a repo inside another repo.

You can now safely commit your main repo, or your subrepo independently.

Now, let's see the edge case.

Changing a classical sub directory into a subrepo

The classical example above is what you can find in the Mercurial help pages. It wasn't that helpful for me because my setup was a little different and it was causing Hg a lot of trouble.

I was not creating a new subrepo, nor cloning a new one. I had a sub directory of my main app that I wanted to change into a subrepo. My sub directory was named 'myplugin' and I had a repo of that name hosted on BitBucket.

So I tried to delete my existing 'myplugin' directory, and clone the 'myplugin' from BitBucket, edit the .hgsub and commit but Hg aborted the operation, complaining about the repo in repo file structure.

After a lot of testing, and cries for help, I finally managed to get it to work. The workflow is almost the same, with one little new step.

Deleting the 'myplugin' folder wasn't enough. I had to tell Mercurial to completely remove these files from its index. Using TortoiseHg, I was able to do that by right clicking on the folder, and then choosing 'TortoiseHg > Remove Files'. Then I had to commit those changes, officially telling Mercurial to forget these files, and putting it in a state where those files aren't there at all.

Then only was I able to clone my repo from BitBucket, edit the .hgsub file and commit my main repo.

cakePHP deployment with Mercurial on Dreamhost

I now use Mercurial on my daily work flow and have set up some methods on my dev machine to ease the pain of installing mercurial and make it work on any new webserver.

Here are some snippets that automate all that stuff. You may have to change a thing or two to accommodate your own setup.

First, I create a custom .bashrc file that I will put on the webserver, and add the following function to it:

hgInstall() {
# Keep downloaded sources in one place
mkdir -p ~/.packages/src
cd ~/.packages/src
# Download, unpack and install Mercurial 1.2 under ~/.packages
wget http://www.selenic.com/mercurial/release/mercurial-1.2.tar.gz
tar xvzf mercurial-1.2.tar.gz
cd mercurial-1.2
# $HOME rather than ~, which the shell may not expand after "="
python setup.py install --home="$HOME/.packages"

# Default user, and disable the extensions Dreamhost tries to load
echo -e "[ui]\nusername = Pixelastic <tim@mailastic.com>" >> ~/.hgrc
echo -e "[extensions]\nhgext/hbisect=!\nhgext.imerge=!" >> ~/.hgrc

# No reload needed: hg reads ~/.hgrc every time it runs
cd ~/
hg version
}

Let me explain. I first create a directory to store the packages I will download (in this example I will only download one package, but as I don't like to have files all around my server, I just keep them in this place). I then download Mercurial 1.2 into this new directory, unpack it and install it.

The next step is configuring the default user and working around a Dreamhost quirk where Mercurial tries to load non-existing extensions (hgext/hbisect and hgext.imerge). Mercurial reads .hgrc each time it runs, so there is nothing to reload (sourcing it from the shell would only produce errors, as it is not a shell script); I just get back to the home directory and display hg version.

That's almost done, I also have to edit the .bash_profile and add the following lines

export PYTHONPATH=~/.packages/lib/python
export PATH=~/.packages/bin:$PATH

Ok, so this method will download, install and configure Hg on the Dreamhost server. That's all very well, but I had to manually set up the .bashrc, so let's see if we can automate that as well.

Now, I'm editing my .zsh_aliases on my local machine (or your .bash_aliases if you're using bash) to add the following method

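dreamhost() {
# $1: user@domain of the Dreamhost account
# Upload the shell config, then the ssh keys, and register the keys
scp ~/Documents/Config/Dreamhost/.bashrc ~/Documents/Config/Dreamhost/.bash_profile "$1":~/
ssh "$1" '. ~/.bashrc'
scp ~/.ssh/id_rsa.pub ~/Documents/Config/Dreamhost/.ssh/xpsfixe.pub "$1":~/
ssh "$1" 'addKeys'
# Upload the cache clearing script and make it executable
scp ~/Documents/Config/Dreamhost/cakeClearCache.sh "$1":~/
ssh "$1" 'chmod +x ~/cakeClearCache.sh'
# Install Mercurial, then open an interactive session
ssh "$1" 'hgInstall'
ssh "$1"
}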

Ok, so this one is a little more complex. You have to call this method with one parameter: the user@domain credentials used to connect to your Dreamhost server. It will upload files (using scp) from your local machine to the server and then run some commands on the server using ssh.

First it will upload the local versions of .bashrc and .bash_profile that are sitting on your dev machine and "reload" the .bashrc, allowing you to use the previously defined hgInstall directly in the shell.

Then, it will upload your ssh key(s) to the server and add them to the list of allowed keys (more on that later, just skip the addKeys line for now.)

The next step is uploading (and giving the correct chmod) a special script that will clear cakePHP cache (more on that later too)

And the final step is calling the previously explained hgInstall method. So the only thing you have to do is put this method in your .zsh_aliases (and the corresponding keys, .bashrc and scripts in their corresponding places), then run dreamhost user@domain and Hg will be installed on your server.

So now let me get back a little on the two details I skipped. The first is the key stuff. What I'm doing is uploading your ssh key(s) to the server and then calling addKeys. It will authorize those keys to connect using ssh without having to type login/pass on each request. Here is the addKeys code (you have to put it in your .bashrc file and modify the filename to your own)

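addKeys() {
# -p: do not fail if ~/.ssh already exists
mkdir -p ~/.ssh
# Append the uploaded keys to the list of authorized keys, then clean up
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
cat ~/xpsfixe.pub >> ~/.ssh/authorized_keys
rm ~/id_rsa.pub
rm ~/xpsfixe.pub
# sshd may refuse the keys if these permissions are too open
chmod go-w ~
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
}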

It will basically create the .ssh dir and authorized_keys file with your keys info. It will then delete the files and set the correct chmod.
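
Running addKeys several times appends the same keys again and again. A possible variant that only adds a key when it is missing is sketched below. The add_key_once name, its arguments and the fake key used in the demo are made up for the example, and it assumes one key per file.

```shell
#!/usr/bin/env bash
# Sketch: append a public key to authorized_keys only if it is not there yet.
# add_key_once and its arguments are illustrative names.
add_key_once() {
  # $1: public key file (one key), $2: .ssh directory (defaults to ~/.ssh)
  ssh_dir=${2:-"$HOME/.ssh"}
  mkdir -p "$ssh_dir"
  touch "$ssh_dir/authorized_keys"
  # -x whole line, -F fixed string: exact match on the key line
  if ! grep -qxF -e "$(cat "$1")" "$ssh_dir/authorized_keys"; then
    cat "$1" >> "$ssh_dir/authorized_keys"
  fi
  chmod 700 "$ssh_dir"
  chmod 600 "$ssh_dir/authorized_keys"
}

# Demo in a scratch directory, with a fake key
demo=$(mktemp -d)
echo "ssh-rsa AAAAexample demo@laptop" > "$demo/id_rsa.pub"
add_key_once "$demo/id_rsa.pub" "$demo/.ssh"
add_key_once "$demo/id_rsa.pub" "$demo/.ssh"
wc -l < "$demo/.ssh/authorized_keys"
```

Calling it twice with the same key leaves a single line in authorized_keys.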

And the second part was about that cache clearing thing. When you update your app using Hg, you do not want to update the cache files created by Cake, as they contain file path references that are likely to differ between your test and prod environments and would surely break your whole app. So, you set ignore rules for them in the .hgignore, like the following:

syntax:glob
app/tmp/cache/cake_*
app/tmp/cache/views/*.php
app/tmp/cache/models/cake_*
app/tmp/cache/persistent/cake_*

It works fine almost all the time, but it sometimes leads to errors as the cache is not regenerated after each hg update. Sometimes you have to alter a model schema or the way a value is stored in cache, and if you don't clear your cache, it can yield unexpected results as the data will be wrongly parsed and used.

So what I did to avoid that was to create a script that will clear the cache for you. Here is the code (you have to be inside the project dir for this to work):

cd app/tmp/cache
rm -f cake_*
rm -f views/*\.php
rm -f models/cake_*
rm -f persistent/cake_*
cd ../../../

It will remove all the cache files generated by Cake that could interfere after an update. You just have to wrap those lines in a method in your .bashrc (mine is called cakeClearCache) and execute it after each update or when you have caching issues.
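
Such a wrapper could look like the sketch below. It runs in a subshell so the caller's working directory is left alone, and it stops if the cache directory is missing instead of deleting files somewhere else. The optional project root argument and the demo files are additions made for the example.

```shell
#!/usr/bin/env bash
# Sketch of the wrapper: same rm commands, run from the cache directory.
# The optional argument (project root) is an addition for this example.
cakeClearCache() (
  # $1: project root (defaults to the current directory)
  cd "${1:-.}/app/tmp/cache" || exit 1
  rm -f cake_*
  rm -f views/*.php
  rm -f models/cake_*
  rm -f persistent/cake_*
)

# Demo on a scratch project
demo=$(mktemp -d)
mkdir -p "$demo/app/tmp/cache/views" "$demo/app/tmp/cache/models" \
         "$demo/app/tmp/cache/persistent"
touch "$demo/app/tmp/cache/cake_core" \
      "$demo/app/tmp/cache/views/home.php" \
      "$demo/app/tmp/cache/models/cake_model_default" \
      "$demo/app/tmp/cache/keep.txt"

cakeClearCache "$demo"
ls "$demo/app/tmp/cache"
```

After the call, only keep.txt and the three sub directories remain in the cache directory.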

Well, I think you guessed that I did not stop here. Manually applying the method after each update can be a little tedious. So I put the previous code in a file named cakeClearCache.sh (you may have spotted that I uploaded this file in the dreamhost() method earlier). I also added the following lines to my /project/.hg/hgrc on my server (if you don't have this file, just create it; it's a project-based hg configuration file):

[hooks]
update = ~/cakeClearCache.sh

It means that every time an hg update is done, the specified script is fired. That's really fine for us: it means that the cache will be cleared on each update. Sounds good.

One last thing to do was creating the hgrc file automatically. That's why I created the following method (add it to the .bashrc file in the server). It is just a wrapper that will create the hgrc file after doing an hg init

hgInitStart() {
hg init
echo -e "[hooks]\nupdate = ~/cakeClearCache.sh" >> ./.hg/hgrc
}

So instead of doing hg init, just do hgInitStart. You can then start cloning your project here.

And one last thing, I also created a method that will set correct chmod to app/tmp and app/webroot/files

cakeCorrectChmod() {
chmod 777 ./app/tmp -R
chmod 777 ./app/webroot/files -R
}

And I created a wrapper around it, to call just after having cloned the project, that will update it and set the correct chmods:

hgInitEnd() {
hg update tip
cakeCorrectChmod
}

That's all. I bet anyone slightly more experienced in shell scripting could do better than that, but as I have struggled a little to get this right I thought I could share it.