Thursday, 31 January 2013

9 core components every PHP application should have to scale


As I find myself working on more and more software applications within different development teams, I'm sometimes amazed at how such large systems are missing some of the most fundamental components. The reasons vary, but they usually include time pressure, limited developer experience, or a lack of solution planning.

Below is a list of the core components I really believe should exist in any well-architected application. They will help your application scale when the time comes.

I'll start with the most important.

1. Multiple environment support

An application should support at least the following environments:
  • Development;
  • Staging;
  • Production;
Many applications don't provide this support, but it's a simple component that will allow you to:
  • Easily bring on more developers without hacking configs.
  • Implement new staging environments (eg: for new features, or for new QA teams).
  • Easily set up a new environment for unit testing.
  • Be in a good position when you are ready to incorporate continuous integration.

2. Decouple your configuration from your application

In any application I like to see the configuration decoupled from the PHP code. You can look into separating your configuration into .ini or .yml files (or .xml if you really want to). For example:
  • config.ini
  • config_dev.ini
  • config_prod.ini
Your bootstrap will then load the correct config based on which environment you are running your application from.
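
As a rough sketch of what that bootstrap step could look like, you could load config.ini first and then overlay the environment-specific file on top of it. The APP_ENV variable name, the loadConfig() and currentEnvironment() helpers and the idea of falling back to "prod" are assumptions made for illustration, not something a particular framework gives you:

```php
<?php
declare(strict_types=1);

/**
 * Load config.ini, then overlay config_<env>.ini on top of it.
 * The file naming follows the list above; everything else is a sketch.
 */
function loadConfig(string $dir, string $env): array
{
    $baseFile = $dir . '/config.ini';
    $base = is_file($baseFile) ? parse_ini_file($baseFile, true) : false;
    if ($base === false) {
        throw new RuntimeException('Unable to read ' . $baseFile);
    }

    $override = [];
    $envFile = $dir . '/config_' . $env . '.ini';
    if (is_file($envFile)) {
        $parsed = parse_ini_file($envFile, true);
        if ($parsed === false) {
            throw new RuntimeException('Unable to read ' . $envFile);
        }
        $override = $parsed;
    }

    // Environment values win; anything not overridden falls through from the base file.
    return array_replace_recursive($base, $override);
}

/**
 * Work out which environment we are running in.
 * APP_ENV is an assumed variable name; default to "prod" so a missing
 * variable never switches a live server into development settings.
 */
function currentEnvironment(): string
{
    $env = getenv('APP_ENV');

    return ($env === false || $env === '') ? 'prod' : $env;
}
```

Your bootstrap would then call something like loadConfig(__DIR__ . '/config', currentEnvironment()) once and pass the resulting array to whatever needs it.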

3. Use version control

I have to admit, I have never worked with an application that isn't under version control. But I have heard horror stories about developers manually versioning files like this:
  • index.html
  • index1.html
  • index2.html
I have even heard of developers SSH-ing onto the production servers, opening up vim and making modifications directly on production! This is called "Cowboy Coding", and you won't earn much respect from other software engineers if you are caught doing it :-)

Take this a step further, and have a branching strategy. For example:
  • What branch is the development stream?
  • What branch is the bug fix stream?
  • What branch is used on the staging environment?
  • What branch mimics production?
I highly recommend Git (after being converted a few years ago from SVN). It provides local branching, local commits and so much more than SVN.

If you use Git and are looking for a branching strategy, take a look at this post - our team implemented a variation of it very successfully.

http://nvie.com/posts/a-successful-git-branching-model/

4. Have a single repository (or use submodules in Git, or externals in SVN)

I have (unfortunately) worked on a few applications that utilise 2 or more repositories, and then require you to create a collection of magical symlinks between them to get the application up and running.

It's actually a very expensive process (time-wise) for an engineering team to maintain multiple repositories that rely on each other (I speak from experience here), and they usually cause a lot of code duplication. Try to stick to a single repository. However, if you need to separate your code for reuse (which is great), take advantage of Git submodules or SVN externals - that's what they are there for.

5. Have an error logging component available

I have always had the mentality that if you cannot log errors (or activities) within a component, then you shouldn't be building it. Being able to monitor errors or user activity is what allows you to then improve the component or feature. Although this next section relates to error logging, you could quite easily architect the same component to log more than just errors (like user activity tracking).

Two problems I often see in PHP applications are:
  • No centralized logging component.
  • The overuse of error_log().
So what problems can error_log() cause?
  • Every environment will most likely store its logs in a different location (usually based on the OS or its specific Apache/PHP setup). This makes it really hard to tail error logs when you always have to go on the hunt for them.
  • Some environments will output to screen, some to an error log file - no consistency.
  • It's hard/hacky to turn off logging for certain scenarios (unit testing, or staging environments) if you're using error_log().
  • It just doesn't scale. What happens when your application goes from 1 server to 3 servers? All of the error logs are on their own individual servers, and you will need additional software to aggregate them into a single location.
  • What if your logging requirements change? You suddenly want to log the user's IP address, or the user's ID? How can you intercept the call cleanly to add this information?
Now, if you implement (or have available to you) a single logging component, you can do the following:
  • Log to a centralized location, whether that is a database table, or log server.
  • You can scale to 'n' number of servers yet still have a centralized logging location.
  • You create a single entry point to log errors/warnings to:
    $logger->logErr('This is an error');
    $logger->logWarn('This is a warning');
  • As all logs go through a single method, you can easily attach more information in the future when required.
  • If this logging component logs to a database, you can easily make this data available within an admin area, which can lead to faster debugging when something goes wrong.
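
To make that a bit more concrete, here is a minimal sketch of such a component, assuming PHP 7.4 or later. Only logErr() and logWarn() come from the example above. The LogWriter interface, the two writers and the context array are hypothetical names made up for illustration; in a real application the writer would insert into a database table or send to a log server.

```php
<?php
declare(strict_types=1);

/** Where log records end up. Swap the implementation per environment. */
interface LogWriter
{
    public function write(array $record): void;
}

/** Keeps records in memory; stands in here for a database or log-server writer. */
final class InMemoryWriter implements LogWriter
{
    public array $records = [];

    public function write(array $record): void
    {
        $this->records[] = $record;
    }
}

/** Discards everything, e.g. to switch logging off while unit testing. */
final class NullWriter implements LogWriter
{
    public function write(array $record): void
    {
    }
}

final class Logger
{
    private LogWriter $writer;
    private array $context;

    /** $context holds extra fields (user ID, IP, ...) attached to every record. */
    public function __construct(LogWriter $writer, array $context = [])
    {
        $this->writer = $writer;
        $this->context = $context;
    }

    public function logErr(string $message): void
    {
        $this->log('error', $message);
    }

    public function logWarn(string $message): void
    {
        $this->log('warning', $message);
    }

    /** The single entry point every log call goes through. */
    private function log(string $level, string $message): void
    {
        $this->writer->write([
            'level'   => $level,
            'message' => $message,
            'time'    => date('c'),
            'context' => $this->context,
        ]);
    }
}

$writer = new InMemoryWriter();
$logger = new Logger($writer, ['user_id' => 42, 'ip' => '203.0.113.7']);
$logger->logErr('This is an error');
$logger->logWarn('This is a warning');
```

Because the application only ever talks to Logger, changing where the logs go (or what extra information they carry) means touching one class rather than hunting down every error_log() call.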

6. Use asynchronous solutions (or a simple queuing component)

No matter what application you work on, there will come a time when you don't want to have to wait for a certain process to finish before returning a response to the user, for example:
  • Sending an email.
  • Making a web service call.
  • Running a memory and CPU-intensive report.
I'm not going to recommend building a complex queuing system. If you need one have a look at:
What I will recommend, in its simplest form, is being able to queue emails for sending. When sending an email, your application needs to:

  • Build the email.
  • Connect to the email server.
  • Send the email content.
  • Then wait for a response.
This could take up to a few seconds - time your end user shouldn't have to spend waiting.

From a very high level, look at saving this information to a database table, allowing your application to return a response to the user almost instantaneously. Then all you need is a simple cron job or daemon to monitor the database table and process all of the unprocessed records. If you build this component, always try to think about how it could be used in the future for something other than sending emails.
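
A minimal sketch of that idea, assuming PDO with SQLite purely so the example is self-contained (the email_queue table, its columns and the EmailQueue class are hypothetical names, and the CREATE TABLE syntax would need adjusting for MySQL or PostgreSQL):

```php
<?php
declare(strict_types=1);

final class EmailQueue
{
    private PDO $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
        $this->db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        // Created here only to keep the sketch self-contained;
        // normally this would live in a schema migration.
        $this->db->exec(
            'CREATE TABLE IF NOT EXISTS email_queue (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                recipient TEXT NOT NULL,
                subject TEXT NOT NULL,
                body TEXT NOT NULL,
                processed_at TEXT NULL
            )'
        );
    }

    /** Called during the web request: one INSERT, no waiting on the mail server. */
    public function enqueue(string $recipient, string $subject, string $body): void
    {
        $stmt = $this->db->prepare(
            'INSERT INTO email_queue (recipient, subject, body) VALUES (?, ?, ?)'
        );
        $stmt->execute([$recipient, $subject, $body]);
    }

    /**
     * Called from the cron job or daemon. Hands every unprocessed record
     * to $send, marks it as processed and returns how many were handled.
     */
    public function processPending(callable $send): int
    {
        $rows = $this->db
            ->query('SELECT id, recipient, subject, body FROM email_queue
                     WHERE processed_at IS NULL ORDER BY id')
            ->fetchAll(PDO::FETCH_ASSOC);

        $mark = $this->db->prepare('UPDATE email_queue SET processed_at = ? WHERE id = ?');
        foreach ($rows as $row) {
            $send($row['recipient'], $row['subject'], $row['body']);
            $mark->execute([date('c'), $row['id']]);
        }

        return count($rows);
    }
}
```

The request calls enqueue() and returns straight away, while the worker calls processPending() with whatever callable actually sends the mail. This sketch ignores retries and assumes only one worker runs at a time; with several workers you would need some form of record locking.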

7. Support for multiple database adapters

As your application's traffic and processing requirements increase, you will find that a single database may start to struggle under the load.

Now imagine that when your application was first architected, the software engineer had this in mind and provided 2 (or more) database adapters:
  • One for writing (to the master).
  • One (or multiple) for reading (from a read-only slave, or even a load balancer sitting in front of many read-only slaves).
How easy would it be to suddenly start migrating the heavy read-only SQL requests to your read-only slave, which in turn would dramatically reduce the load on your master database? You could even go one step further by abstracting the multiple adapters away from the application, so your library classes automatically direct queries to either the master or a slave depending on the query being executed.

Again a very simple component that can really come in handy when you need to start scaling your application quickly.
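
As a sketch of that last step (directing queries automatically), something along these lines could work. The DatabaseRouter class and its method names are hypothetical, and deciding by checking whether the statement starts with SELECT is a deliberately naive assumption:

```php
<?php
declare(strict_types=1);

final class DatabaseRouter
{
    private PDO $master;
    /** @var PDO[] */
    private array $slaves;

    /** @param PDO[] $slaves read-only connections; may be empty */
    public function __construct(PDO $master, array $slaves = [])
    {
        $this->master = $master;
        $this->slaves = array_values($slaves);
    }

    /** Pick the connection a statement should run on. */
    public function adapterFor(string $sql): PDO
    {
        $isRead = preg_match('/^\s*SELECT\b/i', $sql) === 1;

        if ($isRead && $this->slaves !== []) {
            // Spread reads across the available slaves at random.
            return $this->slaves[array_rand($this->slaves)];
        }

        // Writes, and reads when no slave is configured, go to the master.
        return $this->master;
    }

    /** Prepare and execute a statement on whichever connection fits. */
    public function execute(string $sql, array $params = []): PDOStatement
    {
        $stmt = $this->adapterFor($sql)->prepare($sql);
        $stmt->execute($params);

        return $stmt;
    }
}
```

On day one you can construct it with only the master connection, and the application code doesn't change when slaves are added later. A real implementation would also need to send SELECT ... FOR UPDATE and anything inside a transaction to the master, and allow for replication lag when reading back data that was just written.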

8. Support translations

Decoupling your textual content into translation files will give you the following benefits:
  • Non-developers can maintain copy on your site without relying on your engineering team.
  • The phrases you use all the time for messages, headings etc. will now be in a single place, helping to reduce duplication.
  • There is a single point to update textual copy within your application.
  • When the time comes to support new languages, you already have the infrastructure within your application.

9. Unit testing environment support

It's an interesting subject. In my experience, some developers love it and some developers don't want to know about it. In my opinion, that's because some of the developers pushing for unit testing push too hard, too fast; sometimes all you can hear is "TDD this", "TDD that".

Let me first state that I'm not a firm believer in TDD. In practice:
  • It can take a lot of time to maintain;
  • You really need the support of the entire software engineering team;
  • Developers can end up writing tests just for the sake of code coverage.
When you start writing more test code than actual application code, it rings alarm bells in my head. Remember, the more code you write, whether it's application or test code, the more time you will need when you have to refactor in the future.

The middle ground..
  • Unit test the core components (payments, registration, messaging etc.).
  • At least unit test the critical path.
  • Having support for unit testing within your application means developers can easily write a few tests to validate a new component.
  • Unit testing code can give you and your manager confidence in your components.
  • Whether you like it or not, you will most likely work with a developer or two who realize the importance of automated testing - be prepared.
  • Include unit tests within your task estimates (they are part of delivering the entire feature). If you break them out into a separate task, they can more easily be dropped by product owners or managers.

The last bit..

Some of you may believe this is over-engineering. If you are working without a framework then, to be honest, it probably is. The better question is, why aren't you using a framework?

Setting up a base project with all or most of the above components shouldn't take any longer than a week. Once it's set up, if it's well architected so that each component is completely decoupled from every other component, you can easily drop it into any new project and you're ready to go.

You want to create just enough structure in your application to prepare yourself for scaling, but not so much that you spend 3 months setting it all up.

Lastly, all of the reasons above are why I use the Symfony 2 PHP framework. It has most of the components I've described above in a standard install. Using bundles within Symfony 2 gives you the decoupling of components and the ability to import bundles into new projects without copying and pasting code. If you haven't looked at the Symfony 2 framework, I highly recommend you do.

I hope you learned something from my blog post. If you have any questions, let me know.

Dion Beetson
Founder of www.ackwired.com
Linked in: www.linkedin.com/in/dionbeetson
Twitter: www.twitter.com/dionbeetson
Website: www.dionbeetson.com
