An easy way to efficiently insert, update, and work with large amounts of data using CakePHP.
My company uses CakePHP for most of our applications. However, we were running into efficiency issues when working with large amounts of data.
It’s not uncommon for us to insert (or update) hundreds of thousands of rows with a single process. Additionally, we needed an efficient way to work with those hundreds of thousands of pieces of data.
So, after some investigation I narrowed our efficiency problem to CakePHP sending data to the database, one row at a time. This works great out of the box,
but will really slow things down once large amounts of data come into play. I remedied this by creating this behavior that allows a model to have a “bundle” of objects.
This bundle is stored in memory. Upon saving the bundle, all of the model objects are inserted into the database as a bulk insert, 100,000 items per insert by default.
Additionally, this behavior allows CakePHP find results to be returned in the form of a hashed array. The user can specify a ‘key’, which will serve as the key of the returned associative array.
- CakePHP 1.3
- PHP 5.2+
- MySQL
- Download this: https://github.com/jarriett/cake_big_data/zipball/master
- Unzip the downloaded file
- Copy the resulting folder to app/plugins
- If debugging is enabled, PHP notices are generated and logged. If this behavior is being used for very large amounts of data, the log files can grow very quickly due to the generated PHP notices.
Have your model use the behavior:
<?php
App::Import('Model', 'BigDataModel');
class Frog extends BigDataModel
{
var $actsAs = array('BigData');
}
?>
Now to insert 100,000 rows to the database in one database call, do the following:
for($i=0;$i <= 100000; $i++)
{
$frog = array();
$frog['Frog']['color'] = 'green';
$frog['Frog']['name'] = $i . " Froggy";
$frog['Frog']['unique_name'] = md5(mktime());
$frog['Frog']['species_id'] = 7;
$this->Frog->addToBundle($frog);
}
$this->Frog->saveBundle();
//* Note: If a unique key exists on the database table, by default any rows matching the unique key will be updated,
To fetch a hashed result set from the database, call the fetchHashedResult() function as demonstrated below:
$frog_hash = $this->Frog->fetchHashedResult(array('conditions' => array('Frog.species_id' => 7), 'fields' => array('Frog.name', 'Frog.color'), 'key' => array('Frog.name', 'Frog.color')));
The previous function call returns an associative array, where each object’s key is + . If you would like to md5 the key, can add the ‘useHash’ => true value to the parameter array.
- Jarriett K. Robinson [jarriett (at) gmail.com] , http://github.com/jarriett
Licensed under the MIT License Redistributions of files must retain the above copyright notice.