(Miles Johnson)

Docsinated

I'm quite ashamed. I work constantly on all my PHP codebases for myself and everyone that uses them... but... I rarely update my actual documentation. After multiple emails from frustrated developers letting me know my docs make no sense and are quite outdated, I went on an update spree. I've spent the last few months updating all my codebases to a final stable release so that I may cease development on all of them (I need to free up some time). During this process I closed any outstanding issues and bugs, tagged new versions and have been recently converting all my CakePHP scripts to use the latest 2.0 version. Alongside that, I took the time to update all the documentation and changelogs on this site; the docs will always reference the latest version and any old docs have been removed.

I've always had a place in my heart for my stand-alone PHP scripts as they are very simplistic implementations of what I needed back in the day. It's quite heartening to get emails from new developers letting me know these scripts have taught them. Here's the current list:

TypeConverter
Decoda
Databasic
Formation
Gears
Compression
Numword
Resession
Statsburner

As for my CakePHP codebases, I will keep those updated if any bugs come up, but will not be adding any new features for the most part. Since CakePHP 2 was released, I had to spend some time updating all my projects. I created new branches for the old 1.3 codebase and turned the master branch into the new 2.0 repository. Doing this allows me to keep both CakePHP versions up to date in parallel. It also allowed me to tag new major versions for all my projects. Below are my current projects (with possibly more to come):

Forum
Uploader
AutoLogin
AjaxHandler
Feeds
SpamBlocker
RedirectRoute
CacheKill

I hope all of you using my documentation find it useful. If you ever find something incorrect or not easily explained, be sure to shoot me an email and I will tackle it as soon as possible! Thanks for using my code :]

RFC proposal for getters and setters

If you haven't been following PHP development lately, then you have been missing out. Recently, there was a vote on the PHP mailing lists about adding short syntax for arrays (à la Javascript), yet the devs voted against it with childish excuses. And then there was this one guy who forked the PHP project and patched it with speed improvements and features the users have been wanting (which I completely agree with). You can also view the PHP RFC Wiki for the list of *possible* features and the ones that were denied. As you can see, there is much happening in the PHP community, but nothing to show for it (yet).

However, my post today will be on the RFC suggestion for built-in getters/setters. To keep it blunt, I really dislike the C# approach... it's just not very PHP. It seems odd to have floating curly blocks with a "get" and "set" in them, with no real defined scope block. On top of that, the "property" keyword is way too complicated for what it is trying to achieve. The one thing I do agree with, though, is the readonly modifier. My suggestion is loosely based on the syntax of the Traceur Compiler by Google (they use get/set keywords instead of function, within the class).

class FooBar {
	public $value;
	protected readonly $_readOnly;
	protected static readonly $_static;
	public get value() {
		return $this->value;
	}
	final public set value($value) {
		$this->value = $value;
	}
	public get readOnly() {
		return $this->_readOnly;
	}
	public static get static() {
		return self::$_static;
	}
	public function noop() {
		return;
	}
}

Admittedly, my suggestion is a bit more verbose than the C# variant, and pretty similar to regular getValue() and setValue() methods, but there are a few key differences.

Method Naming

Technically, are they still considered methods? Regardless, when you are writing getters and setters, you should use the words "get" or "set" in place of "function". This dictates to the class that these methods should be used anytime a property is being read or written to. On top of this functionality, the visibility modifiers are in effect (public, protected, private). This allows you to write to protected properties using a public setter, or read from private properties with a protected getter (while in the class scope of course). The final and static keywords work exactly the same as well. Below is a quick example.

$foo = new FooBar();
$foo->value = 'setter'; // calls set::value()
$foo->readOnly = 'readonly'; // throws an error/exception
FooBar::$static = 'static'; // throws an error/exception
echo $foo->value; // calls get::value()
echo $foo->readOnly; // calls get::readOnly()
echo FooBar::$static; // calls get::static() statically

Getters and setters are not required, but when implemented, they are automatically triggered. If a property is public, without a getter/setter, then getting/setting a value works like it normally would. The major difference with this proposal is allowing the getting/setting of non-public properties, and never having to write getValue() or setValue() (you just modify the property directly like the example above).
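For comparison, here is a rough sketch of how similar behaviour can be approximated in PHP today with the __get() and __set() magic methods. This is not the proposed syntax; the class FooBarSketch and the getValue(), setValue() and getReadOnly() methods are names made up for this illustration. Magic methods only fire for inaccessible properties, so the properties are protected here.

```php
<?php
// Hypothetical emulation of the proposed get/set keywords using magic methods.
// FooBarSketch and its accessor method names are illustrative only.
class FooBarSketch {
	protected $value;
	protected $readOnly = 'fixed';

	// Stands in for "public get value()"
	protected function getValue() {
		return $this->value;
	}

	// Stands in for "public set value($value)"
	protected function setValue($value) {
		$this->value = $value;
	}

	// Stands in for "public get readOnly()"; having no setter makes it read-only
	protected function getReadOnly() {
		return $this->readOnly;
	}

	// Called by PHP when an inaccessible property is read
	public function __get($name) {
		$method = 'get' . ucfirst($name);
		if (method_exists($this, $method)) {
			return $this->$method();
		}
		throw new LogicException('No getter for ' . $name);
	}

	// Called by PHP when an inaccessible property is written
	public function __set($name, $value) {
		$method = 'set' . ucfirst($name);
		if (method_exists($this, $method)) {
			$this->$method($value);
			return;
		}
		throw new LogicException('Cannot write to ' . $name);
	}
}

$foo = new FooBarSketch();
$foo->value = 'setter';      // routed through setValue()
echo $foo->value, "\n";      // routed through getValue()
echo $foo->readOnly, "\n";   // routed through getReadOnly()
try {
	$foo->readOnly = 'nope'; // no setReadOnly() exists, so this throws
} catch (LogicException $e) {
	echo $e->getMessage(), "\n";
}
```

The obvious downside, and part of why a native syntax is appealing, is that every access goes through string-based method lookups and nothing is enforced until runtime.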

Read Only

One of the features within the original proposal that I did like was the readonly keyword. This keyword can be applied to any class property to set it into a read-only state, which basically disallows the use of a set method. It also disallows setting a value to the property directly, using the old functionality. But this sounds like the final keyword, right? Technically yes; the major difference is that you can overwrite a readonly value in a sub-class, but not a final one.

Abstract and Interfaces

These could also be used with abstract classes and interfaces, like so.

interface FooBar {
	public get value();
	public set value();
}
abstract class FooBar {
	protected $value;
	abstract protected get value();
	abstract protected set value();
}

Now this is just a personal preference and style, and is something I have been thinking about lately (I have ideas for other RFCs as well), so don't expect this to actually happen! I also didn't get too in depth, for example, when are magic methods called during this process? I will leave those out unless for some odd reason this makes it in (heh). Let me know what you think!

Old School PHP Scripts: Numword, the number to word converter

The Numword class (via Github) will rarely find a use, but its creation was primarily for fun. A friend of mine asked me if there was a PHP function that will turn a number into its word equivalent (for example, 100 becomes one-hundred). As none existed, I felt like this would be a fun task to attempt, and so the Numword class was born. Numword supports the basic range of numbers and the ability to convert up to centillion (which is mind-blowingly large).

The easiest way to convert a number is to use the single() method. This method accepts a single number argument and returns the word equivalent. You may also use the multiple() method, which accepts an array of numbers. Do note, however, that large numbers must be passed as a string, or else it will blow up because of PHP's 32-bit integers.

// one-thousand, two-hundred thirty-four
Numword::single(1234);
// eight-billion, two-hundred thirty-four-million, seven-hundred eighty-thousand, two-hundred thirty-four
Numword::single('8234780234');
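To give an idea of why passing a string helps, here is a small sketch of one way to do the conversion by splitting the digits into groups of three. This is not the Numword source; the functions wordsUnderThousand() and numberToWords() are made up for this example, and it only goes up to billions.

```php
<?php
// Hypothetical sketch, not the Numword implementation.
// Converts 0-999 into words, using the same hyphenated style as the examples above.
function wordsUnderThousand($n) {
	$digits = array('zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine');
	$teens = array('ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen');
	$tens = array(2 => 'twenty', 3 => 'thirty', 4 => 'forty', 5 => 'fifty', 6 => 'sixty', 7 => 'seventy', 8 => 'eighty', 9 => 'ninety');
	$hundreds = (int) ($n / 100);
	$rest = $n % 100;
	$parts = array();
	if ($hundreds > 0) {
		$parts[] = $digits[$hundreds] . '-hundred';
	}
	if ($rest >= 20) {
		$word = $tens[(int) ($rest / 10)];
		if ($rest % 10 > 0) {
			$word .= '-' . $digits[$rest % 10];
		}
		$parts[] = $word;
	} else if ($rest >= 10) {
		$parts[] = $teens[$rest - 10];
	} else if ($rest > 0 || $hundreds === 0) {
		$parts[] = $digits[$rest];
	}
	return implode(' ', $parts);
}

// Accepts an integer or a string of digits, so large values never become floats.
function numberToWords($number) {
	$number = (string) $number;
	if (!ctype_digit($number)) {
		throw new InvalidArgumentException('Expected a string of digits');
	}
	$number = ltrim($number, '0');
	if ($number === '') {
		return 'zero';
	}
	$scales = array('', 'thousand', 'million', 'billion');
	// Split from the right into groups of three digits, most significant group first
	$groups = array_reverse(array_map('strrev', str_split(strrev($number), 3)));
	$count = count($groups);
	if ($count > count($scales)) {
		throw new InvalidArgumentException('This sketch only goes up to billions');
	}
	$out = array();
	foreach ($groups as $i => $group) {
		$value = (int) $group;
		if ($value === 0) {
			continue;
		}
		$scale = $scales[$count - 1 - $i];
		$words = wordsUnderThousand($value);
		$out[] = ($scale === '') ? $words : $words . '-' . $scale;
	}
	return implode(', ', $out);
}

echo numberToWords(1234), "\n";
echo numberToWords('8234780234'), "\n";
```

Because each group of three digits is converted separately, no single integer ever has to hold the full value.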

Some other convenient methods are block() and currency(). The block() method will parse out any numbers within a string of text and convert them, while the currency() method is self-explanatory: it converts currency.

// I am twenty-five years, fifteen days and sixty-two minutes years old.
Numword::block('I am 25 years, 15 days and 62 minutes years old.');
// one-thousand, three-hundred thirty-seven dollar(s) & fifteen cent(s)
Numword::currency('$1,337.15');

Awesome, right? Furthermore, the currency() method is rather smart, in that it parses out the dollar sign, commas, and periods depending on the current locale based on setlocale(). You can also translate the strings used in currency() by passing an array as the second argument. But before we do that, let's go over translating the whole class.

Translating the strings in Numword is extremely easy, but also tedious. If you only need to translate for a single language, then you can overwrite the static properties. If you need to translate for multiple languages (user language selection system), then you will still need to overwrite the properties, but create some kind of system to know which language to use and when (possibly via includes). Here's an example translation into German, zero through nine respectively.

Numword::$digits = array('null', 'eins', 'zwei', 'drei', 'vier', 'fünf', 'sechs', 'sieben', 'acht', 'neun');

And to translate the currency strings, you can do something like:

Numword::currency('£48,530.38', array('dollar' => 'pound(s)', 'cent' => 'pence'));

Numword isn't as extensible as I would like, but since it is merely a fun project, heavy translation and locale awareness settings aren't needed. You can always base your own class on Numword :). Hope you enjoyed!

Refactoring is fun

For over a year now I have been eager to redo the backend (built on CakePHP) of this site. I kept putting it off until last week, when I was fixing bugs and realized it would be more beneficial to just rebuild the whole site. I wanted to redo both the PHP and the database tables, as the old tables were from a previous installation. Here's just a quick list of things I wanted to change:

Separate blog tags into their own table and set up a HABTM (was a column in each entry)
Remove ENUM fields from the database tables and use class constants
Use slugs for all dynamic URLs
Use the model's counterCache system instead of a count find()
Add a blog series system (example)
Fix the bugs in my comments and contact forms
Rebuild the code/script section and remove the old "versioning" system (since I use Github now)
Build an admin panel

So to begin this huge task, I created new database tables based on the architecture I wanted. Once done, I created all the models for each of these new tables (and made sure to keep the old models for importing). The next step was to create CakePHP shells that use the old models to generate the new data structure and save it into the new tables (while keeping legacy IDs intact). This database change fixed the dislikes I had with the old table columns (by removing ENUMs), added the slug and ID fields where necessary, and removed the old and useless tables I no longer have a need for. First task complete.

Now that the fun step was over, it was time to refactor all the old controllers and views. Most of the old controllers were re-usable (like blog, pages, comments and feeds); all I had to do was make sure the new column names were being used and add support for new features (blog series, etc). The most time consuming part in this process was splitting up the old resources controller into two new controllers: code and snippets. Since the code section of my site was re-built from the ground up, these controllers also had to be rebuilt. The new code structure only uses 3 tables compared to the previous 5, win! However, I still had a problem with old legacy URLs. The solution I went with was to allow the old URLs to redirect to the new ones using a jump controller (which also powers my tiny URL system), as well as allowing the URLs to work with a slug or ID (very important). For example, all of these links lead to the same place.

http://milesj.me/code/cakephp/forum
http://milesj.me/resources/script/forum-plugin
http://milesj.me/c/13
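The slug-or-ID lookup behind those links can be sketched in plain PHP. This is only an illustration, not the actual jump controller: the $entries array, the legacySlug field and the functions resolveEntry() and canonicalUrl() are all assumptions, and treating 13 as the entry's ID is a guess based on the short URL above.

```php
<?php
// Hypothetical data; in the real site this would come from the database.
$entries = array(
	array('id' => 13, 'slug' => 'forum', 'legacySlug' => 'forum-plugin'),
	array('id' => 14, 'slug' => 'uploader', 'legacySlug' => 'uploader-plugin'),
);

// Find an entry by numeric ID, current slug or legacy slug.
function resolveEntry(array $entries, $key) {
	$key = (string) $key;
	foreach ($entries as $entry) {
		$matchesId = ctype_digit($key) && $entry['id'] === (int) $key;
		$matchesSlug = ($entry['slug'] === $key || $entry['legacySlug'] === $key);
		if ($matchesId || $matchesSlug) {
			return $entry;
		}
	}
	return null;
}

// Every variation redirects to a single canonical URL.
function canonicalUrl(array $entry) {
	return '/code/cakephp/' . $entry['slug'];
}

foreach (array('forum', 'forum-plugin', 13) as $key) {
	echo canonicalUrl(resolveEntry($entries, $key)), "\n";
}
```

Redirecting everything to one canonical URL keeps old bookmarks working without serving the same page at three addresses.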

At this point, I was extremely pleased with the refactoring process. The next and last step was to create an admin panel system for all types of content on the site (I didn't have one before). I decided to place all of this code within a plugin, as I didn't want to litter my controllers with admin_ methods and it gave me more control of security and management as it was self-contained. I was expecting this process to take weeks to finish, but I completed it in less than 8 hours, win again! I used the technique of having a single view template for both the add() and edit() methods and was able to re-use a lot of code (DRY for the win). I highly suggest this approach for anyone who needs an admin system.

All in all, the process wasn't as back breaking and time consuming as estimated. I basically rebuilt the whole site in under 2 weeks, working about 1 hour a day. If you are interested, here's a quick list of all the changes.

Importing of old data into new database tables
Refactor of old models, controllers and views
Moving tags into a HABTM table and model
Adding slugs to all URLs
New blog archiving system for date ranges, tags and topics
New blog series feature
Adding counterCache for comments and tags
Adding a jump controller to deal with legacy URLs and tiny URLs
Splitting of old resources controller into code and snippets
Rebuilding the code packages and versioning system

I wouldn't doubt it if I forgot something! But what's next, you ask? The worst part of all: updating my documentation. Now that will take some time.

Naming your cache keys

Everyone caches, that's a pretty well known fact. However, the problem I always seemed to have was how to properly name my cache keys. After much trial and tribulation, I believe I have found a great way to properly name cache keys. To make things easy, my keys usually follow this format.

<model|table>__<function|method>[-<params>]

To clear up some confusion, it goes as follows. The first word of your cache key should be your model name (or database table name), as most cached data relates to a database query result. The model name is followed by a double underscore, which is then followed by the function/method name (which helps to identify exactly where the cache is set), which is then followed by multiple parameters (optional). Here's a quick example:

public function getUserProfile($id) {
	$cacheKey = __CLASS__ .'__'. __FUNCTION__ .'-'. $id;
	// Check the cache or query the database
	// Cache the query result with the key
	// Return the result
}

The $cacheKey above would become: User__getUserProfile-1337, assuming the user's ID is 1337. Pretty easy right? Besides the verbosity that it takes to write these constants, it works rather well (unless you want to write the method and class manually). You may also have noticed that I used __FUNCTION__ over __METHOD__ -- this was on purpose. The main reasoning is that __METHOD__ returns the class and method name, like User::getUserProfile, while __FUNCTION__ just returns the method name.
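Here is a fuller sketch of that method with the commented steps filled in. A static array stands in for a real cache engine and the "query" is faked, so the class name UserSketch, the $queries counter and the constants() method are only there for illustration.

```php
<?php
// Minimal sketch: an in-memory array stands in for a real cache backend.
class UserSketch {
	protected static $cache = array();
	public $queries = 0;

	public function getUserProfile($id) {
		$cacheKey = __CLASS__ . '__' . __FUNCTION__ . '-' . $id;
		// Check the cache first
		if (isset(self::$cache[$cacheKey])) {
			return self::$cache[$cacheKey];
		}
		// Pretend to query the database
		$this->queries++;
		$result = array('id' => $id, 'username' => 'user' . $id);
		// Cache the result with the key and return it
		self::$cache[$cacheKey] = $result;
		return $result;
	}

	// Shows the difference between the two magic constants
	public function constants() {
		return array(__FUNCTION__, __METHOD__);
	}

	public static function cachedKeys() {
		return array_keys(self::$cache);
	}
}

$user = new UserSketch();
$user->getUserProfile(1337);
$user->getUserProfile(1337); // second call is served from the cache
print_r(UserSketch::cachedKeys()); // contains UserSketch__getUserProfile-1337
print_r($user->constants());       // constants and UserSketch::constants
```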

The example above will work in most cases, but there are other cases where something more creative is needed. The main difficulty is how to deal with array'd options. There are a few ways of dealing with that. The first is checking to see if an ID or limit is present and, if so, using that as the unique value. If none of the options in the array are unique, you can implode/serialize the array and run an md5() on the string to create a unique value.

User::getTotalActive();
// User__getTotalActive
Topic::getPopularTopics($limit);
// Topic__getPopularTopics-15
Forum::getLatestActivity($id, $limit);
// Forum__getLatestActivity-1-15
Post::getAllByUser(array('user_id' => $user_id, 'limit' => $limit));
// Post__getAllByUser-1-15
User::searchUsers(array('orderBy' => 'username', 'orderDir' => 'DESC'));
// User__searchUsers-fcff339541b2240017e8d8b697b50f8b
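The rules above could be wrapped into a small helper along these lines. It is only a sketch: buildCacheKey() is a made-up function, and the decision to treat "id", "limit" and anything ending in "_id" as unique values is an assumption.

```php
<?php
// Hypothetical helper; buildCacheKey() is not part of any framework.
function buildCacheKey($model, $method, array $params = array()) {
	$parts = array();
	foreach ($params as $param) {
		if (!is_array($param)) {
			// Scalar arguments (IDs, limits) are used as-is
			$parts[] = $param;
			continue;
		}
		// Prefer IDs and limits when the options array contains them
		$unique = array();
		foreach ($param as $name => $value) {
			if ($name === 'id' || $name === 'limit' || substr($name, -3) === '_id') {
				$unique[] = $value;
			}
		}
		if ($unique) {
			$parts[] = implode('-', $unique);
		} else {
			// Otherwise hash the whole array; sort first so key order does not matter
			ksort($param);
			$parts[] = md5(serialize($param));
		}
	}
	$key = $model . '__' . $method;
	return $parts ? $key . '-' . implode('-', $parts) : $key;
}

echo buildCacheKey('User', 'getTotalActive'), "\n";
echo buildCacheKey('Forum', 'getLatestActivity', array(1, 15)), "\n";
echo buildCacheKey('Post', 'getAllByUser', array(array('user_id' => 1, 'limit' => 15))), "\n";
echo buildCacheKey('User', 'searchUsers', array(array('orderBy' => 'username', 'orderDir' => 'DESC'))), "\n";
```

Sorting the options before hashing means two calls with the same options in a different order share one cache entry.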

In most cases an ID or query limit can be used as a unique identifier. If you have another way that you name your cache keys or an example where creating the key can be difficult, be sure to tell us about it!

How useful is the new ?: operator?

As with everyone else excited about PHP 5.3, I was really looking forward to developing in it. I was especially excited to use the new shorthand ternary operator (?:). This removes the redundant middle expression and instead returns the left-hand value itself if it evaluates to true. But after much testing and trying to implement it in interesting ways, the shorthand ternary just isn't as useful as you would hope. The primary problem is that the left-most expression is still evaluated directly, so there is no room for an isset() check on a variable or index that may not exist. Below is my test case.

error_reporting(E_ALL | E_STRICT);
class Ternary {
	private $__data = array('key' => 'value');
	public function get($key, $default = null) {
		return $this->__data[$key] ?: $default;
	}
}
$test = new Ternary();
var_dump($test->get('key')); echo '<br>';
var_dump($test->get('test')); echo '<br>';
var_dump($test->get('')); echo '<br>';
var_dump($test->get(false)); echo '<br>';
var_dump($test->get(null)); echo '<br>';

This test works for the most part; the value or null is always returned. However, the problem is that this technique throws notice errors. You can easily avoid this by turning off notice errors, but that's bad practice. Here is the result after running the test.

string(5) "value"
Notice: Undefined index: test in C:\xampp\htdocs\scripts\index.php on line 9
NULL
Notice: Undefined index: in C:\xampp\htdocs\scripts\index.php on line 9
NULL
Notice: Undefined offset: 0 in C:\xampp\htdocs\scripts\index.php on line 9
NULL
Notice: Undefined index: in C:\xampp\htdocs\scripts\index.php on line 9
NULL

I was hoping the new shorthand ternary would internally run an isset() and evaluate automatically, but it looks like it does not. So now we are still stuck with the old verbose way of doing things.

return isset($this->__data[$key]) ? $this->__data[$key] : $default;

Is there a reason why the PHP devs chose not to run an isset automatically? Or am I doing something wrong here? More information on this would be helpful, because I believe the operator would be multitudes more useful if it worked like I suggested.
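For reference, here is a sketch that puts the verbose isset() version next to the null coalescing operator (??), which was added in PHP 7.0 and performs an isset-style check without raising a notice. The class TernarySketch and the 'empty' key are made up for this example, and the second method needs PHP 7.0 or newer.

```php
<?php
// Hypothetical class for comparison; requires PHP 7.0+ because of the ?? operator.
class TernarySketch {
	private $data = array('key' => 'value', 'empty' => '');

	// The verbose version: checks existence first, so no notice is raised
	public function get($key, $default = null) {
		return isset($this->data[$key]) ? $this->data[$key] : $default;
	}

	// Null coalescing: same existence check, without repeating the expression
	public function getCoalesce($key, $default = null) {
		return $this->data[$key] ?? $default;
	}
}

$test = new TernarySketch();
var_dump($test->get('key'));                  // string(5) "value"
var_dump($test->get('test'));                 // NULL, and no notice
var_dump($test->getCoalesce('test', 'none')); // string(4) "none"
var_dump($test->getCoalesce('empty', 'none')); // string(0) ""
```

Note the difference in the last line: ?: tests whether the value is truthy, so it would replace an empty string with the default, while isset() and ?? only test that the value exists and is not null.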

Using Closures as callbacks within loops

In jQuery (and other Javascript frameworks) it is quite common to use closures (I refer to them as callback functions) to loop over arrays or objects. Even though it's a slow process and is much more efficient to use the built-in for loop, it got me thinking. Why not try and use the new Closure class in PHP 5.3 and see how well it performs within a loop? Suffice to say, I got some really really interesting results. Before I get into the details, here is the test script I wrote (the Benchmark class is merely a class I have written in the past).

<?php $data = range(0, 1000);
$clean = array();
// Plain function: the "public" modifier is only valid on class methods
function loop($array, Closure $closure) {
	if (!empty($array)) {
		foreach ($array as $key => $value) {
			$closure($key, $value);
		}
	}
}
Benchmark::start('loop');
foreach ($data as $key => $value) {
	$clean[$key] = $value;
}
// Note: without "use (&$clean)" this closure writes to its own local $clean, not the outer array
loop($data, function($key, $value) {
	$clean[$key] = $value;
});
Benchmark::stop('loop');
echo Benchmark::display('loop'); ?>

I didn't get too in depth with my test cases and simply used Firefox and page refresh to get my results. I am running PHP 5.3.1 on a Windows 7 XAMPP installation with Apache and no caching. For benchmarking I was using microtime(true) and memory_get_usage().

I began testing with 4 different cases, each of which changed the size of the $data array. I started with 1000 iterations, then 5000, then 10000 and lastly 100000. I would comment out the foreach/loop sections and run them one at a time (of course), and ran each test about 5 times to gather an average. Here are the results.

foreach:
1000	Time: 0.0010 / Memory: 137128 (Max: 689160)
5000	Time: 0.0052 / Memory: 706488 (Max: 1258528)
10000	Time: 0.0097 / Memory: 1412048 (Max: 1964120)
100000	Time: 0.0545 / Memory: 13849568 (Max: 14401656)
closure:
1000	Time: 0.0027 / Memory: 84984 (Max: 688832)
5000	Time: 0.0144 / Memory: 433672 (Max: 1258192)
10000	Time: 0.0267 / Memory: 866448 (Max: 1963744)
100000	Time: 0.1223 / Memory: 8525216 (Max: 14401256)

The first thing you will notice is the time it took to interpret the page. On average, using a closure as a callback within a loop takes 2-3x longer to process. However, the interesting thing is that the reported memory usage is around 40% lower while using a closure than doing a foreach, yet the max allocated is nearly identical. I knew what the outcome would be before I even started it -- Javascript closures are the same way. Regardless, it was a fun experiment, and if anyone knows more about this, please shed some light on this topic for the rest of us!
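One detail worth checking in a test like this is variable scope. Unlike Javascript, a PHP closure does not see variables from the surrounding scope unless they are imported with use, and it needs a reference (&) to modify them. The sketch below shows the difference; loopSketch() and the variable names are made up for the example. This may also account for part of the lower memory figure above, though that is a guess rather than something measured here.

```php
<?php
// Hypothetical helper mirroring the loop() function from the test script.
function loopSketch(array $array, Closure $closure) {
	foreach ($array as $key => $value) {
		$closure($key, $value);
	}
}

$data = range(0, 4);

// Without "use": $withoutUse inside the closure is a new local variable
$withoutUse = array();
loopSketch($data, function($key, $value) {
	$withoutUse[$key] = $value;
});

// With "use (&...)": the closure writes to the outer array by reference
$withUse = array();
loopSketch($data, function($key, $value) use (&$withUse) {
	$withUse[$key] = $value;
});

echo count($withoutUse), "\n"; // 0
echo count($withUse), "\n";    // 5
```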

But in closing I can sadly say that no, you should not be using a closure for looping; just stick to the old-fashioned, tried and true foreach or for loop.

Getting the page height with jQuery

I was recently adding an overlay (or modal) feature to one of my sites. This overlay feature required a blackout mask (when your webpage gets covered with an opaque black layer) to be shown over the content. Writing the CSS for this was a breeze, and writing the jQuery (or Javascript) for this was even easier. However, I did run into one little snag.

In current browsers I could do a position fixed on the blackout and define width and height at 100%. In outdated browsers (Internet Explorer), however, position fixed does not work properly, so I had to resort to Javascript and position absolute. This was easy, as I would grab the height of the page and apply it to the blackout CSS. During my testing I noticed that the blackout would only fill the viewport of the page, and the second you started scrolling you would see where the blackout cut off.

After some browsing of the jQuery API I discovered my problem. It seems that the window and document objects return different heights in jQuery. Window returns the height of the viewport (everything within view) while document returns the height of the HTML page (the whole page, including hidden content that requires scrolling). Furthermore, window and document cannot use outerHeight() or innerHeight(); they must simply use height().

$(window).height(); // returns height of browser viewport
$(document).height(); // returns height of HTML document

Here is how I remember which is which: document returns the height of the DOM (document object model) and window returns the height of the viewport (viewing out a window). I know many of you may have already known this, but it slipped past me and was quite a genius implementation on behalf of the jQuery team.

The end of the HtmlHelper

For the past two years, I have gotten pretty close with the HtmlHelper. It has been there for me, making my life easier than before. But that time has come to an end, so sorry HtmlHelper, you are just too much of a burden nowadays. The HtmlHelper has been an amazing convenience by automatically building my anchor links, by creating images with the correct path, or linking my CSS from within separate views, and much much more. But why do all this when you can simply write the HTML yourself?

I am not sure why I didn't notice this sooner; I was probably just stoked on developing with CakePHP so I wanted to do everything the CakePHP way. Lately however, I have noticed that the HtmlHelper really isn't needed that much. I can only think of a few cases where it is needed: linking stylesheets/javascript dynamically, building breadcrumbs and creating routes. Everything else is just consuming PHP logic and processing time to render HTML, which you can simply write yourself in the first place and bypass the PHP interpreter.

Linking Routes

The primary use of the helper, but why not just use url() instead? By doing that you don't have to deal with the hassle of nesting your array of attributes, or escaping variables into the method call. You also don't have to worry about Cake being overzealous and escaping all your data. Take these examples; they deliver the same result.

// With the helper
<?php echo $this->Html->link('Anchor Link', array('controller' => 'news', 'action' => 'index'), array('title' => 'Anchor Title')); ?>

// Without the helper
<a href="<?php echo $this->Html->url(array('controller' => 'news', 'action' => 'index')); ?>" title="Anchor Title">Anchor Link</a>
// Or pure HTML if your routes never change
<a href="/news" title="Anchor Title">Anchor Link</a>

Doctypes and meta tags

Another example of using PHP to render HTML, when you can just write HTML. This gets even easier with HTML5.

<?php echo $html->docType('xhtml-trans');
echo $html->meta('keywords', 'miles johnson php mysql design code developement developer web production creation coding functions tutorials methods scripts packages open source cakephp cake bake controller component model behavior view helper'); ?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
<meta name="keywords" content="miles johnson php mysql design code developement developer web production creation coding functions tutorials methods scripts packages open source cakephp cake bake controller component model behavior view helper" />

Images

This one I am still 50/50 on. It helps by determining the base URL, detecting asset timestamps and appending the alt attribute. But this of course can still be written normally as well. I will let you decide.

<?php echo $this->Html->image('logo.png'); ?>

<img src="/img/logo.png" alt="" />

Divs, paragraphs, tables, lists, script blocks

All of these should never be used in the view or layout. Using PHP to render basic HTML tags like this is absurd in my opinion.

So when should the helper be used?

Like I stated above, there are only a few times when the helper should be used. The first is asset linking (stylesheets and javascript). This allows you to include an asset from within any view, which is then output within $scripts_for_layout. Why is this so awesome? Simple, you can have a specific stylesheet for a specific page, without having to include it on all pages.

Secondly is building breadcrumbs. From within your view you can define the "top level" or "trailing" crumb, and within your layout you can define the base crumbs. This allows you to add multiple levels of crumbs within different layers of views. A quick example, which would give you the trail of: Blog -> Archives -> Blog Title.

// In the view
$this->Html->addCrumb('Archive', array('controller' => 'blog', 'action' => 'archive'));
$this->Html->addCrumb($blog['Blog']['title'], array('controller' => 'blog', 'action' => 'read', $blog['Blog']['id']));
// In the layout
$this->Html->addCrumb('Blog', array('controller' => 'blog', 'action' => 'index'));

Lastly, the primary reason that the HtmlHelper has all these convenience methods is so that you can use them within other helpers. Since it's impossible to render HTML within PHP without string concatenation, the HtmlHelper gives other helpers the ability to render HTML easily without all the hardship. That is the primary purpose of this helper.
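To show the kind of string concatenation a helper saves you from, here is a tiny stand-alone sketch of a tag builder. It is not how the HtmlHelper is implemented; renderTag() is a made-up function for illustration.

```php
<?php
// Hypothetical tag builder; shows the concatenation a helper hides from you.
function renderTag($name, array $attributes = array(), $content = null) {
	$html = '<' . $name;
	foreach ($attributes as $key => $value) {
		// Escape attribute values so quotes and ampersands cannot break the markup
		$html .= ' ' . $key . '="' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '"';
	}
	// No content means a self-closing tag
	if ($content === null) {
		return $html . ' />';
	}
	return $html . '>' . $content . '</' . $name . '>';
}

echo renderTag('a', array('href' => '/news', 'title' => 'Anchor Title'), 'Anchor Link'), "\n";
echo renderTag('img', array('src' => '/img/logo.png', 'alt' => '')), "\n";
```

Inside another helper, calling something like this is far cleaner than hand-building the string each time, which is exactly where such methods earn their keep.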

Now all of this is a personal opinion of course, but since I didn't realize most of this for a while, I thought some of you might not have either. Nor is this belittling the CakePHP dev team, as they have done an awesome job so far, so thank you! So take it how you wish and code how you like. Enjoy!

Using the session within models

This is something that everyone wants to do, but is afraid breaks the MVC paradigm. Theoretically, the session should be a model, seeing as how it represents data and manages adds, edits, deletes, etc. Regardless, it's a much easier approach to use the session within the model directly, instead of having to pass it as an argument within each method call. Other developers who have attempted this task either try to import the SessionComponent or use $_SESSION directly.

If you use the component, then you are using the class outside of its scope (a controller helper). If you use the $_SESSION global, then you don't have the fancy Cake dot notation access (Auth.User.id, etc) as well as its session management and security. But don't worry, Cake comes packaged with this powerful class called CakeSession, which both the SessionComponent and helper extend. Merely instantiate this class within your AppModel and you are set.

// Import the class
App::import('Core', 'CakeSession');
class AppModel extends Model {
	// Instantiate in constructor
	public function __construct($id = false, $table = null, $ds = null) {
		parent::__construct($id, $table, $ds);
		$this->Session = new CakeSession();
	}
}
// Using it within a method of another model
$user_id = $this->Session->read('Auth.User.id');
Now you have control of the session within the model, bundled with Cake's awesome session management.
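If you are curious what the dot notation buys you over the raw $_SESSION array, the idea can be sketched in a few lines of plain PHP. This is not CakeSession's implementation; readPath() and the sample $session array are made up for illustration.

```php
<?php
// Hypothetical sketch of dot notation access; not how CakeSession is implemented.
function readPath(array $data, $path, $default = null) {
	foreach (explode('.', $path) as $segment) {
		// Stop as soon as a level is missing or is not an array
		if (!is_array($data) || !array_key_exists($segment, $data)) {
			return $default;
		}
		$data = $data[$segment];
	}
	return $data;
}

// Stand-in for the session contents
$session = array('Auth' => array('User' => array('id' => 1337, 'username' => 'miles')));

var_dump(readPath($session, 'Auth.User.id'));          // int(1337)
var_dump(readPath($session, 'Auth.User.email', null)); // NULL
```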