Changelogs: Decoda v4.0.0-beta

A new version of Decoda has been released, version 4.0.0-beta. Please download the new tag or view the documentation. If you have any questions, be sure to send me an email or comment on this post. If you run into any problems, be sure to report an issue on the GitHub repository.

Version: 4.0.0-beta
Tested On: PHP 5.3
Requires: PHP 5.3
Commit Hash: f960a1b4f4064d48894f4f7438cce7eed3a9381c
Changes:

  • Refactored for PHP 5.3 namespaces and moved all classes to namespaced folders
  • Added unit tests for all classes
  • Added configuration paths to define custom lookup locations
  • Added a new option for Filters: childrenBlacklist and renamed children to childrenWhitelist
  • Added a new option for Filters: persistContent
  • Added a new config for QuoteFilters: dateFormat
  • Added a new config for BlockFilters: spoilerToggle
  • Added a new config for UrlFilter: protocols
  • Added a new config for CensorHook: suffix
  • Added new configs for EmoticonHook: path, extension
  • Added support for self-closing tags
  • Added a global blacklist to Decoda using Decoda::blacklist()
  • Fixed incorrectly nested tags
  • Fixed child and parent hierarchies
  • Fixed CRLF conversion problems
  • Merged Filter options alias and map into mapAttributes
  • Moved all class constants to Decoda base
  • Refactored all Filter regex patterns
  • Refactored all the template HTML classes
  • Removed all global constants except for DECODA
  • Removed Decoda::nl2br()
  • Removed Filter option testNoDefault
  • Renamed Filter option key to tag
  • Renamed Filter option tag to htmlTag
  • Renamed Filter option type to displayType
  • Renamed Filter option allowed to allowedTypes
  • Renamed Decoda::disableFilters() to resetFilters()
  • Renamed Decoda::disableHooks() to resetHooks()
  • Updated doc blocks and examples
  • Updated EmailFilter and UrlFilter to use filter_var()
  • Updated Decoda to throw exceptions when necessary

Changelogs: Decoda v3.4

A new version of Decoda has been released, version 3.4. Please download the new tag or view the documentation. If you have any questions, be sure to send me an email or comment on this post. If you run into any problems, be sure to report an issue on the GitHub repository.

Version: 3.4
Tested On: PHP 5.3
Requires: PHP 5.2
Commit Hash: 6fb3cdb9a906d720687d87e4b6b3b4533cd5af7f
Changes:

  • Added Composer support
  • Added an alias property to filters allowing attribute alias names for tags

Changelogs: Decoda v3.3

A new version of Decoda has been released, version 3.3. Please download the new tag or view the documentation. If you have any questions, be sure to send me an email or comment on this post. If you run into any problems, be sure to report an issue on the GitHub repository.

Version: 3.3
Tested On: PHP 5.3
Requires: PHP 5.2
Commit Hash: 710b972a367d908a33d198134417a30ef418b54b
Changes:

  • Added DecodaFilter::setupHooks() to allow filters to initialize hook dependencies
  • Added DecodaHook::setupFilters() to allow hooks to initialize filter dependencies
  • Added CodeHook (CodeFilter dependency) that stops emoticons from being processed in code blocks [Issue #9]
  • Check for class or interface during autoload [Issue #10]
  • Made HTML escaping a boolean setting [Issue #11]
  • Switched CensorHook::afterParse() to beforeParse()

Docsinated

I'm quite ashamed. I work constantly on all my PHP codebases for myself and everyone that uses them... but... I rarely update my actual documentation. After multiple emails from frustrated developers letting me know my docs make no sense and are quite outdated, I went on an update spree. I've spent the last few months updating all my codebases to a final stable release so that I may cease development on all of them (I need to free up some time). During this process I closed any outstanding issues and bugs, tagged new versions and have been recently converting all my CakePHP scripts to use the latest 2.0 version. Alongside that, I took the time to update all the documentation and changelogs on this site; the docs will always reference the latest version and any old docs have been removed.

I've always had a place in my heart for my stand-alone PHP scripts as they are very simplistic implementations of what I needed back in the day. It's quite heartening to get emails from new developers letting me know these scripts have taught them. Here's the current list:

As for my CakePHP codebases, I will keep those updated if any bugs come up, but will not be adding any new features for the most part. Since CakePHP 2 was released, I had to spend some time updating all my projects. I created new branches for the old 1.3 codebase and turned the master branch into the new 2.0 repository. Doing this allows me to keep both CakePHP versions up to date in parallel. It also allowed me to tag new major versions for all my projects. Below are my current projects (with possibly more to come):

I hope all of you using my documentation find it useful. If you ever find something incorrect or not easily explained, be sure to shoot me an email and I will tackle it as soon as possible! Thanks for using my code :]

RFC proposal for getters and setters

If you haven't been following PHP development lately, then you have been missing out. Recently, there was a vote on the PHP mailing lists about adding short syntax for arrays (à la JavaScript), yet the devs voted against it with childish excuses. And then there was this one guy who forked the PHP project and patched it with speed improvements and features the users have been wanting (which I completely agree with). You can also view the PHP RFC Wiki for the list of *possible* features and the ones that were denied. As you can see, there is much happening in the PHP community, but nothing to show for it (yet).

However, my post today will be on the RFC suggestion for built-in getters/setters. To keep it blunt, I really dislike the C# approach... it's just not very PHP. It just seems odd to have floating curly blocks with a "get" and "set" in them, with no real defined scope block. On top of that, the "property" keyword is way too complicated for what it is trying to achieve. The one thing I do agree with, though, is the readonly modifier. My suggestion is loosely based on the syntax of Google's Traceur Compiler (they use get/set keywords instead of function, within the class).

class FooBar {

	public $value;

	protected readonly $_readOnly;
	
	protected static readonly $_static;

	public get value() {
		return $this->value;
	}

	final public set value($value) {
		$this->value = $value;
	}

	public get readOnly() {
		return $this->_readOnly;
	}
	
	public static get static() {
		return self::$_static;
	}

	public function noop() {
		return;
	}

}

Admittedly, my suggestion is a bit more verbose than the C# variant, and pretty similar to regular getValue() and setValue() methods, but there are a few key differences.

Method Naming

Technically, are they still considered methods? Regardless, when you are writing getters and setters, you should use the words "get" or "set" in place of "function". This tells the class that these methods should be used any time a property is being read or written to. On top of this functionality, the visibility modifiers are in effect (public, protected, private). This allows you to write to protected properties using a public setter, or read from private properties with a protected getter (while in the class scope, of course). The final and static keywords work exactly the same as well. Below is a quick example.

$foo = new FooBar();

$foo->value = 'setter'; // calls set::value()
$foo->readOnly = 'readonly'; // throws an error/exception
FooBar::$static = 'static'; // throws an error/exception

echo $foo->value; // calls get::value()
echo $foo->readOnly; // calls get::readOnly()
echo FooBar::$static; // calls get::static() statically

Getters and setters are not required, but when implemented, they are automatically triggered. If a property is public, without a getter/setter, then getting/setting a value works like it normally would. The major difference with this proposal is allowing the getting/setting of non-public properties, and never having to write getValue() or setValue() (you just modify the property directly like the example above).
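The get/set syntax above is only a proposal, so none of it runs today. For comparison, roughly the same calling style can be approximated with the existing __get() and __set() magic methods. This is a sketch under assumptions, not part of the proposal: the FooBarEmulated class, its underscore-prefixed properties, and the getX()/setX() lookup convention are all made up for illustration.

```php
<?php
// Hypothetical emulation of the proposed behaviour using magic methods.
// __get()/__set() only fire for properties that are not accessible from
// the calling scope, which is why the real storage uses other names.
class FooBarEmulated {

	protected $_value;

	protected $_readOnly = 'fixed';

	// Route reads like $foo->value to a matching getValue() method
	public function __get($name) {
		$method = 'get' . ucfirst($name);

		if (method_exists($this, $method)) {
			return $this->$method();
		}

		throw new Exception("No getter defined for {$name}");
	}

	// Route writes like $foo->value = 'x' to a matching setValue() method
	public function __set($name, $value) {
		$method = 'set' . ucfirst($name);

		if (method_exists($this, $method)) {
			$this->$method($value);
			return;
		}

		// No setter means the property behaves as read-only
		throw new Exception("Cannot write to {$name}");
	}

	protected function getValue() {
		return $this->_value;
	}

	protected function setValue($value) {
		$this->_value = $value;
	}

	protected function getReadOnly() {
		return $this->_readOnly;
	}

}
```

With this in place, assigning to $foo->value goes through setValue(), reading $foo->readOnly goes through getReadOnly(), and assigning to $foo->readOnly throws because no setter exists. It is noticeably more boilerplate than the proposed syntax, which is rather the point of the proposal.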

Read Only

One of the features in the original proposal that I did like was the readonly keyword. This keyword can be applied to any class property to put it into a read-only state, which basically disallows the use of a set method. It also disallows assigning a value to the property directly, using the old functionality. But this sounds like the final keyword, right? Technically yes; the major difference is that a readonly value can be overwritten in a sub-class, while a final one cannot.

Abstract and Interfaces

These could also be used with abstract classes and interfaces, like so.

interface FooBar {
	public get value();
	public set value($value);
}

abstract class FooBar {
	protected $value;
	
	abstract protected get value();
	abstract protected set value($value);
}

Now this is just a personal preference and style, and is something I have been thinking about lately (I have ideas for other RFCs as well), so don't expect this to actually happen! I also didn't get too in depth, for example, when are magic methods called during this process? I will leave those out unless for some odd reason this makes it in (heh). Let me know what you think!

Old School PHP Scripts: Numword, the number to word converter

The Numword class (via GitHub) will rarely find a use, but its creation was primarily for fun. A friend of mine asked me if there was a PHP function that would turn a number into its word equivalent (for example, 100 becomes one-hundred). As none existed, I felt like this would be a fun task to attempt, and so the Numword class was born. Numword supports the basic range of numbers and the ability to convert up to centillion (which is mind-blowingly large).

The easiest way to convert a number is to use the single() method. This method accepts a single number argument and returns the word equivalent. You may also use the multiple() method, which accepts an array of numbers. Do note, however, that large numbers must be passed as a string, else it will blow up because of PHP's 32-bit integers.

// one-thousand, two-hundred thirty-four
Numword::single(1234);

// eight-billion, two-hundred thirty-four-million, seven-hundred eighty-thousand, two-hundred thirty-four
Numword::single('8234780234');
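To give a feel for what a conversion like this involves, here is a rough sketch that handles only 0 through 999. It is not Numword's actual implementation: the sketchNumberToWords() function and its lookup arrays are hypothetical, and the hyphenated output simply imitates the format shown above.

```php
<?php
// Hypothetical helper, not part of Numword: converts 0-999 into words.
function sketchNumberToWords($number) {
	$digits = array('zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine');
	$teens = array('ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen');
	$tens = array(2 => 'twenty', 3 => 'thirty', 4 => 'forty', 5 => 'fifty', 6 => 'sixty', 7 => 'seventy', 8 => 'eighty', 9 => 'ninety');

	$number = (int) $number;

	if ($number < 0 || $number > 999) {
		throw new InvalidArgumentException('This sketch only supports 0 through 999');
	}

	if ($number < 10) {
		return $digits[$number];
	}

	if ($number < 20) {
		return $teens[$number - 10];
	}

	if ($number < 100) {
		// Tens word, plus a hyphenated digit when there is a remainder
		$word = $tens[(int) ($number / 10)];
		$rest = $number % 10;

		return $rest ? $word . '-' . $digits[$rest] : $word;
	}

	// Hundreds word, then recurse for whatever is left below one hundred
	$word = $digits[(int) ($number / 100)] . '-hundred';
	$rest = $number % 100;

	return $rest ? $word . ' ' . sketchNumberToWords($rest) : $word;
}

echo sketchNumberToWords(234); // two-hundred thirty-four
```

Larger numbers would repeat this step for each group of three digits and append a scale word (thousand, million and so on), which is also why working on a string makes sense once the value no longer fits in an integer.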

Some other convenient methods are block() and currency(). The block() method will parse out any numbers within a string of text and convert them, while the currency() method is self-explanatory: it converts currency.

// I am twenty-five years, fifteen days and sixty-two minutes years old.
Numword::block('I am 25 years, 15 days and 62 minutes years old.');

// one-thousand, three-hundred thirty-seven dollar(s) & fifteen cent(s)
Numword::currency('$1,337.15');

Awesome, right? Furthermore, the currency() method is rather smart, in that it parses out the dollar sign, commas, and periods depending on the current locale set via setlocale(). You can also translate the strings used in currency() by passing an array as the second argument. But before we do that, let's go over translating the whole class.

Translating the strings in Numword is extremely easy, but also tedious. If you only need to translate for a single language, then you can overwrite the static properties. If you need to translate for multiple languages (a user language selection system), then you will still need to overwrite the properties, but also create some kind of system to know which language to use and when (possibly via includes). Here's an example translation into German; zero through nine respectively.

Numword::$digits = array('null', 'eins', 'zwei', 'drei', 'vier', 'fünf', 'sechs', 'sieben', 'acht', 'neun');
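For the multiple-language case, one possible arrangement is a translation map keyed by language code, applied before any conversion runs. Everything here other than the Numword::$digits property is an assumption: the Numword class below is only a stand-in so the sketch runs on its own, and $translations and applyNumwordLanguage() are hypothetical names.

```php
<?php
// Stand-in for the real class so this sketch is self-contained; only the
// static $digits property is taken from the post.
class Numword {
	public static $digits = array('zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine');
}

// Hypothetical translation map; each entry could also live in its own include file
$translations = array(
	'de' => array(
		'digits' => array('null', 'eins', 'zwei', 'drei', 'vier', 'fünf', 'sechs', 'sieben', 'acht', 'neun')
	)
);

// Hypothetical helper: overwrite the static strings for the chosen language
function applyNumwordLanguage($language, array $translations) {
	if (!isset($translations[$language])) {
		return false; // Unknown language, keep the current strings
	}

	Numword::$digits = $translations[$language]['digits'];

	return true;
}

applyNumwordLanguage('de', $translations);
```

The language code would typically come from the user's settings or session, and a real version would overwrite every translatable property rather than just the digits.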

And to translate the currency strings, you can do something like:

Numword::currency('£48,530.38', array('dollar' => 'pound(s)', 'cent' => 'pence'));

Numword isn't as extensible as I would like, but since it is merely a fun project, heavy translation and locale awareness settings aren't needed. You can always base your own class on Numword :). Hope you enjoyed!

Naming your cache keys

Everyone caches; that's a pretty well-known fact. However, the problem I always seemed to have was how to properly name my cache keys. After much trial and error, I believe I have found a great way to name them. To make things easy, my keys usually follow this format.

<model|table>__<function|method>[-<params>]

To clear up some confusion, it goes as follows. The first word of your cache key should be your model name (or database table name), as most cached data relates to a database query result. The model name is followed by a double underscore, which is then followed by the function/method name (which helps to identify exactly where the cache is set), which is then followed by multiple parameters (optional). Here's a quick example:

public function getUserProfile($id) {
	$cacheKey = __CLASS__ .'__'. __FUNCTION__ .'-'. $id;

	// Check the cache or query the database
	// Cache the query result with the key
	// Return the result
}

The $cacheKey above would become: User__getUserProfile-1337, assuming the user's ID is 1337. Pretty easy right? Besides the verbosity that it takes to write these constants, it works rather well (unless you want to write the method and class manually). You may also have noticed that I used __FUNCTION__ over __METHOD__ -- this was on purpose. The main reasoning is that __METHOD__ returns the class and method name, like User::getUserProfile, while __FUNCTION__ just returns the method name.
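To make the three commented steps concrete, here is a minimal sketch of the full method. The static array standing in for a cache engine and the _fetchFromDatabase() method are assumptions for illustration; swap in whatever cache and query layer you actually use.

```php
<?php
// Hypothetical model: the array-backed cache and the fake query below
// stand in for a real cache engine and a real database call.
class User {

	protected static $_cache = array();

	// Counts how often the "database" was hit, for demonstration only
	public $queries = 0;

	public function getUserProfile($id) {
		$cacheKey = __CLASS__ . '__' . __FUNCTION__ . '-' . $id;

		// Check the cache first
		if (isset(self::$_cache[$cacheKey])) {
			return self::$_cache[$cacheKey];
		}

		// Cache miss: run the query and store the result under the key
		$result = $this->_fetchFromDatabase($id);
		self::$_cache[$cacheKey] = $result;

		return $result;
	}

	protected function _fetchFromDatabase($id) {
		$this->queries++;

		return array('id' => $id, 'username' => 'user' . $id);
	}

}

$user = new User();
$user->getUserProfile(1337); // Stored as User__getUserProfile-1337
$user->getUserProfile(1337); // Served from the cache, no second query
```

Because the key is derived from the class and method, looking at a cache entry tells you exactly which method wrote it.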

The example above will work in most cases, but there are other cases where something more creative is needed. The main difficulty is how to deal with options passed as an array. There are a few ways of dealing with that; the first is checking to see if an ID or limit is present and, if so, using that as the unique value. If none of the options in the array are unique, you can implode/serialize the array and run an md5() on the string to create a unique value.

User::getTotalActive();
// User__getTotalActive

Topic::getPopularTopics($limit);
// Topic__getPopularTopics-15

Forum::getLatestActivity($id, $limit);
// Forum__getLatestActivity-1-15

Post::getAllByUser(array('user_id' => $user_id, 'limit' => $limit));
// Post__getAllByUser-1-15

User::searchUsers(array('orderBy' => 'username', 'orderDir' => 'DESC'));
// User__searchUsers-fcff339541b2240017e8d8b697b50f8b
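The naming rules can also be folded into a small helper. This is a sketch, assuming a hypothetical buildCacheKey() function: scalar parameters are appended as-is, and array parameters are serialized and hashed as described above. Sorting the array by key first is my own addition, so that the same options in a different order produce the same key.

```php
<?php
// Hypothetical helper: builds "<model>__<method>[-<params>]"
function buildCacheKey($model, $method, array $params = array()) {
	$key = $model . '__' . $method;
	$parts = array();

	foreach ($params as $param) {
		if (is_array($param)) {
			// Sort by key so option order does not change the hash
			ksort($param);
			$parts[] = md5(serialize($param));
		} else {
			$parts[] = $param;
		}
	}

	if ($parts) {
		$key .= '-' . implode('-', $parts);
	}

	return $key;
}

echo buildCacheKey('User', 'getTotalActive'); // User__getTotalActive
echo buildCacheKey('Forum', 'getLatestActivity', array(1, 15)); // Forum__getLatestActivity-1-15
```

Passing an options array such as array('orderBy' => 'username', 'orderDir' => 'DESC') as one of the parameters yields the method prefix followed by a 32-character md5 hash.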

In most cases an ID or query limit can be used as a unique identifier. If you have another way that you name your cache keys or an example where creating the key can be difficult, be sure to tell us about it!

How useful is the new ?: operator?

Like everyone else excited about PHP 5.3, I was really looking forward to developing in it. I was especially excited to use the new shorthand ternary operator (?:). This would remove the redundant middle expression of returning the variable, and instead would return the left-hand value itself if it evaluated to true. But after much testing and trying to implement it in interesting ways, the shorthand ternary just isn't as useful as you would hope. The primary problem is that the left-most expression is evaluated as-is, so the variable or index must already exist; you can't wrap it in an isset() check, because then the shorthand would return the boolean result of isset() rather than the value. Below is my test case.

error_reporting(E_ALL | E_STRICT);

class Ternary {
	private $__data = array('key' => 'value');

	public function get($key, $default = null) {
		return $this->__data[$key] ?: $default;
	}
}

$test = new Ternary();

var_dump($test->get('key')); echo '<br>';
var_dump($test->get('test')); echo '<br>';
var_dump($test->get('')); echo '<br>';
var_dump($test->get(false)); echo '<br>';
var_dump($test->get(null)); echo '<br>';

This test works for the most part: the value or null is always returned. However, the problem is that this technique throws notice errors; here is the result after running the test. You can easily avoid this by turning off notice errors, but that's bad practice.

string(5) "value"

Notice: Undefined index: test in C:\xampp\htdocs\scripts\index.php on line 9
NULL

Notice: Undefined index: in C:\xampp\htdocs\scripts\index.php on line 9
NULL

Notice: Undefined offset: 0 in C:\xampp\htdocs\scripts\index.php on line 9
NULL

Notice: Undefined index: in C:\xampp\htdocs\scripts\index.php on line 9
NULL

I was hoping the new shorthand ternary would internally run an isset() and evaluate automatically, but it looks like it does not. So now we are still stuck with the old verbose way of doing things.

return isset($this->__data[$key]) ? $this->__data[$key] : $default;

Is there a reason why the PHP devs chose not to run an isset automatically? Or am I doing something wrong here? More information on this would be helpful, because I believe the operator would be multitudes more useful if it worked like I suggested.
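Until something better comes along, one way to at least keep the verbose check in a single place is to hide it behind the accessor. The sketch below is a reworked version of the test class (the SafeTernary name and the extra 'empty' entry are assumptions for illustration). Note that it is not quite equivalent to ?: since isset() only cares whether the index exists and is not null, so falsy values such as an empty string are returned rather than replaced by the default.

```php
<?php
error_reporting(E_ALL | E_STRICT);

// Hypothetical variant of the Ternary test class using an explicit isset()
class SafeTernary {
	private $__data = array('key' => 'value', 'empty' => '');

	public function get($key, $default = null) {
		// isset() never raises a notice for a missing index
		return isset($this->__data[$key]) ? $this->__data[$key] : $default;
	}
}

$test = new SafeTernary();

var_dump($test->get('key'));              // string(5) "value"
var_dump($test->get('test'));             // NULL, without a notice
var_dump($test->get('test', 'fallback')); // string(8) "fallback"
```

For what it's worth, PHP 7.0 later added the null coalescing operator (??), which does perform the implicit isset() check, so on those versions the body of get() can be written as return $this->__data[$key] ?? $default;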

Using Closures as callbacks within loops

In jQuery (and other JavaScript frameworks) it is quite common to use closures (I refer to them as callback functions) to loop over arrays or objects. Even though it's a slow process and it is much more efficient to use the built-in for loop, it got me thinking. Why not try and use the new Closure class in PHP 5.3 and see how well it performs within a loop? Suffice it to say, I got some really interesting results. Before I get into the details, here is the test script I wrote (the Benchmark class is merely a class I have written in the past).

<?php
$data = range(0, 1000);
$clean = array();

// Calls the closure once for every key/value pair in the array
function loop($array, Closure $closure) {
	if (!empty($array)) {
		foreach ($array as $key => $value) {
			$closure($key, $value);
		}
	}
}

Benchmark::start('loop');

// Test 1: plain foreach
foreach ($data as $key => $value) {
	$clean[$key] = $value;
}

// Test 2: closure callback (no "use" clause, so this $clean is local to the closure)
loop($data, function($key, $value) {
	$clean[$key] = $value;
});

Benchmark::stop('loop');
echo Benchmark::display('loop'); ?>

I didn't get too in depth with my test cases and simply used Firefox and page refresh to get my results. I am running PHP 5.3.1 on a Windows 7 XAMPP installation with Apache and no caching. For benchmarking I was using microtime(true) and memory_get_usage().
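The Benchmark class itself isn't shown in this post, so here is a rough stand-in for anyone who wants to repeat the test. SketchBenchmark and its internals are assumptions, not the original class; it only wraps microtime(true), memory_get_usage() and memory_get_peak_usage(), and formats the output like the results listed further down.

```php
<?php
// Hypothetical stand-in for the Benchmark class used in the test script
class SketchBenchmark {

	protected static $_marks = array();

	// Record the starting time and memory for a named benchmark
	public static function start($key) {
		self::$_marks[$key] = array(
			'startTime' => microtime(true),
			'startMemory' => memory_get_usage()
		);
	}

	// Store elapsed time, memory growth and the peak allocation
	public static function stop($key) {
		$mark = self::$_marks[$key];

		self::$_marks[$key]['time'] = microtime(true) - $mark['startTime'];
		self::$_marks[$key]['memory'] = memory_get_usage() - $mark['startMemory'];
		self::$_marks[$key]['peak'] = memory_get_peak_usage();
	}

	public static function display($key) {
		$mark = self::$_marks[$key];

		return sprintf('Time: %.4f / Memory: %d (Max: %d)', $mark['time'], $mark['memory'], $mark['peak']);
	}

}

SketchBenchmark::start('loop');

$clean = array();

foreach (range(0, 1000) as $key => $value) {
	$clean[$key] = $value;
}

SketchBenchmark::stop('loop');
echo SketchBenchmark::display('loop');
```

The numbers it prints will differ from machine to machine and from PHP version to PHP version, so treat them as relative rather than absolute.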

I began testing with 4 different cases, each of which changed the size of the $data array. I started with 1000 iterations, then 5000, then 10000 and lastly 100000. I would comment out the foreach/loop sections and run them one at a time (of course), and ran each test about 5 times to gather an average. Here are the results.

foreach:
1000	Time: 0.0010 / Memory: 137128 (Max: 689160)
5000	Time: 0.0052 / Memory: 706488 (Max: 1258528)
10000	Time: 0.0097 / Memory: 1412048 (Max: 1964120)
100000	Time: 0.0545 / Memory: 13849568 (Max: 14401656)

closure:
1000	Time: 0.0027 / Memory: 84984 (Max: 688832)
5000	Time: 0.0144 / Memory: 433672 (Max: 1258192)
10000	Time: 0.0267 / Memory: 866448 (Max: 1963744)
100000	Time: 0.1223 / Memory: 8525216 (Max: 14401256)

The first thing you will notice is the time it took to interpret the page. On average, using a closure as a callback within a loop takes 2-3x longer to process. However, the interesting thing is that the measured memory usage is around 40% lower with a closure than with a foreach, yet the max allocated is nearly identical. I knew what the outcome would be before I even started, as JavaScript closures are the same way. Regardless, it was a fun experiment, and if anyone knows more about this, please shed some light on the topic for the rest of us!
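One detail that may explain part of the memory difference: PHP closures do not see variables from the surrounding scope unless they are imported with a use clause. In the test script the closure has no use clause, so its $clean is a fresh local variable on every call and the outer array is never filled. The sketch below (the $withoutUse and $withUse counters are just illustrative names) shows the difference; whether this accounts for the whole 40% is an assumption I have not benchmarked.

```php
<?php
// Calls the closure once for every key/value pair in the array
function loop(array $array, Closure $closure) {
	foreach ($array as $key => $value) {
		$closure($key, $value);
	}
}

$data = range(0, 1000);

// Without "use", $clean inside the closure is a new local variable
// that is thrown away after each call
$clean = array();

loop($data, function($key, $value) {
	$clean[$key] = $value;
});

$withoutUse = count($clean);

// With "use (&$clean)", the closure writes to the outer array by reference
$clean = array();

loop($data, function($key, $value) use (&$clean) {
	$clean[$key] = $value;
});

$withUse = count($clean);

echo $withoutUse . ' vs ' . $withUse; // 0 vs 1001
```

A fairer comparison with the foreach version would use the second form, since that is the one that actually builds the same array.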

But in closing, I can sadly say that no, you should not be using a closure for looping; just stick to the old-fashioned, tried and true foreach or for loop.