Stripping HTML automatically from your data

About a week ago I talked about automatically sanitizing your data before its saved. Now I want to talk about automatically stripping HTML from your data before its saved, which is good practice. Personally, I hate saving any type of HTML to a database, thats why I prefer a BB code type system for this website. To strip all tags from your data, add this method to your AppModel.

/**
 * Strip all html tags from an array
 *
 * @param array $data
 * @return array
 */
public function cleanHtml($data) {
	if (is_array($data)) {
		foreach ($data as $key => $var) {
			$data[$key] = $this->cleanHtml($var);
		}
	} else {
		$data = Sanitize::html($data, true);
	}
	return $data;
}

Pretty simple right? The next and final step is to add it to AppModel::beforeSave(). In the next example, I will use the code snippet from my previous related article. Once you have done this your are finished, now go give it a test drive.

function beforeSave() {
	if (!empty($this->data) && $this->cleanData === true) {
		$connection = (!empty($this->useDbConfig)) ? $this->useDbConfig : 'default';
		$this->data = Sanitize::clean($this->data, array('connection' => $connection, 'escape' => false));
		$this->data = $this->cleanHtml($this->data);
	}
	return true;
}

Automatically sanitizing data with beforeSave()

So if you are like me and hate having to sanitize or clean your data manually within each action, and was hoping there was an easier way, there is. Simple combine the magic of Model::beforeSave() and the powerful strength of Sanitize::clean().

function beforeSave() {
	if (!empty($this->data) && $this->cleanData === true) {
		$connection = (!empty($this->useDbConfig)) ? $this->useDbConfig : 'default';
		$this->data = Sanitize::clean($this->data, array('connection' => $connection, 'escape' => false));
	}
	return true;
}

The previous code will attempt to clean all data before it is saved. Secondly it will convert HTML, it will not strip tags completely. So if you do not want HTML in your database, you will need to add some extra functionality and set encode to false in the clean() options.

But that's not it, were not finished just yet. You may have noticed a $cleanData variable and are probably wondering what it does. This is a custom property that should be placed in your AppModel and IS NOT a CakePHP property. By placing it in the AppModel we will receive no error notices and all data will be cleaned, additionally you can disable cleaning in certain models by setting the property to false in the respective model.

public $cleanData = true;
Known Errors

So far this has worked smoothly, except for the following exception:

- Serialized arrays will be escaped incorrectly and will break when trying to unserialize(), simply set $cleanData to false to not escape the serialized arrays.

- When escape is set to true, all data will have slashes added on top of the slashes already added with the Model class, so its best to turn escaping off.