Collections
Cora's Collection class is meant to be a substitute for arrays and gives you much more control over your data. You can either store items of the same type (recommended) or items of mixed types (which requires some care). Under the hood it tries to optimize the fetching and counting of data.
Introduction
A few quick examples to spark interest
Cora Collections are:
- Fast (speed comparable to standard arrays)
- Flexible (many different types of syntax you can use)
- Easy (things just work the way you'd expect)
Example #1 - Smooth Object to Collection Transitions
Rather than force developers to use a PSR-11 style get() method to access items, I wanted to do something that would flow better when working with data (although you can still use get() if you want).
Role in this example is a simple object with a name property and an associated access level you can set through the constructor. The second argument passed to our Collection tells it to index the items using each Role's name. This allows us to easily access the data in a semantically friendly way:
$roles = new \Cora\Collection([
new Role('user', 1),
new Role('admin', 2),
new Role('developer', 3)
], 'name');
echo $roles->admin->accessLevel; // Returns 2 - Using Object syntax
echo $roles->get('admin')->accessLevel; // Returns 2 - Using PSR11 standard syntax
Under the hood, both approaches above do the same thing, but the first hides the fact that a function call is happening in order to provide a more seamless experience. To really see the power of this approach it's necessary to wire up multiple pieces of data together, which we'll do next. This starts to get at the heart of why this Collection class was designed, which was to provide an innovative approach to manipulating highly interconnected data using Cora's ORM.
// Setup some data we can play with
$users = new \Cora\Collection([
(object) ['name'=>'bob', 'age'=>32],
(object) ['name'=>'jim', 'age'=>62],
(object) ['name'=>'luann', 'age'=>58]
], 'name');
$roles = new \Cora\Collection([
(object) ['name'=>'user', 'accessLevel'=>1],
(object) ['name'=>'admin', 'accessLevel'=>2],
(object) ['name'=>'developer', 'accessLevel'=>3]
], 'name');
// Setup relationship between one user and another
$users->bob->father = $users->jim;
// Setup relationship between roles and users
$roles->admin->peopleWithThisPermission = new \Cora\Collection([$users->bob, $users->luann], 'name');
// Show how we can chain stuff together (exaggerated example)
echo $roles->admin->peopleWithThisPermission->bob->father->name; // Returns "jim"
echo $roles->get('admin')->peopleWithThisPermission->get('bob')->father->name; // Returns "jim"
// loop (more typical usage example)
foreach ($roles->admin->peopleWithThisPermission as $user) {
echo $user->name; // Returns "bob" and "luann"
}
The object syntax provided by the Cora Collection class allows you to write code that feels cohesive when interweaving calls to collections of objects and individual objects together.
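To make the "same thing under the hood" idea concrete, here is a minimal sketch of how object syntax can be routed to a get() method using PHP's __get magic method. The KeyedBag class is a hypothetical stand-in written for this illustration; it is not Cora's actual implementation.

```php
<?php
// Hypothetical stand-in for illustration; not Cora's actual implementation.
class KeyedBag
{
    /** @var array<string, mixed> items indexed by the chosen key */
    private array $items = [];

    public function __construct(array $items, string $key)
    {
        foreach ($items as $item) {
            // Index each object by the value of its $key property.
            $this->items[$item->$key] = $item;
        }
    }

    // Explicit, PSR-11 style accessor.
    public function get(string $name)
    {
        return $this->items[$name] ?? null;
    }

    // Magic accessor: $bag->admin is routed here and delegates to get().
    public function __get(string $name)
    {
        return $this->get($name);
    }
}

$roles = new KeyedBag([
    (object) ['name' => 'user', 'accessLevel' => 1],
    (object) ['name' => 'admin', 'accessLevel' => 2],
], 'name');

echo $roles->admin->accessLevel, "\n";        // prints 2
echo $roles->get('admin')->accessLevel, "\n"; // prints 2
```

Both lines reach the same stored object; the object syntax simply hides the method call.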
Example #2 - Direct Offset Assignment
Another thing you may find refreshing about the Cora Collection class is the way in which you can add things to the collection. While other collection classes may require you to call "push()" or "add()" to add items, Cora Collections give you the option to do direct assignment when it makes sense.
$items = new \Cora\Collection();
$items->thing1 = 'A Hat';
$items->thing2 = 'Sunglasses';
echo $items->thing1; // Returns "A Hat";
foreach ($items as $item) {
echo $item; // Returns both items
}
If you read our Dependency Injection documentation, you'll find that the underlying container is actually a derivative of this Collection class.
$app = new \Cora\Collection();
$app->resource1 = function($app) {
$user = new \stdClass();
$user->name = 'Bob';
return $user;
};
$app->resource2 = function($app) {
return ['item1', 'item2'];
};
// Simulating a 3rd resource that has a dependency on Resource #2
$app->resource3 = function($app) {
return new \Cora\Collection($app->resource2);
};
echo $app->resource1->name; // Returns "Bob";
echo $app->resource1()->name; // Returns "Bob";
foreach ($app->resource3 as $item) {
echo $item; // Returns "item1" and "item2"
}
Note that by default the Collection will attempt to execute Closures when they are the resource fetched unless you tell it not to. That's why the example above has the same result with and without the parentheses. That might seem a little strange at first, but the reason is once again to facilitate a smooth object syntax:
// Friendly syntax. Returns 'item1'
echo $app->resource3[0];
echo $app->resource3->off0;
// Less friendly
echo $app->resource3()[0];
echo $app->resource3()->off0;
When a Closure is executed by the collection, an instance of the collection is passed in as the first argument. That's how resource 3 was able to have a dependency on another Collection item. If you don't want this behavior (the auto-execution of Closures and passing in first argument) you can tell the collection not to execute them via:
$collection->returnClosure(true); // Don't execute closure automatically
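As a rough sketch of the behavior described above, the following shows one way a container could auto-execute Closures, pass itself in as the first argument, and honor a "return the Closure instead" switch. MiniContainer and its internals are hypothetical names made up for this illustration; only the returnClosure(true) call mirrors the documented API.

```php
<?php
// Hypothetical sketch; the class and property names are illustrative only.
class MiniContainer
{
    private array $items = [];
    private bool $returnClosure = false;

    public function __set(string $name, $value): void
    {
        $this->items[$name] = $value;
    }

    public function __get(string $name)
    {
        $item = $this->items[$name] ?? null;
        // Run Closures unless told otherwise, passing the container itself
        // so one resource can pull in another.
        if ($item instanceof \Closure && !$this->returnClosure) {
            return $item($this);
        }
        return $item;
    }

    // Method-call syntax ($c->resource()) resolves the same way.
    public function __call(string $name, array $args)
    {
        return $this->__get($name);
    }

    public function returnClosure(bool $flag): void
    {
        $this->returnClosure = $flag;
    }
}

$app = new MiniContainer();
$app->resource2 = function ($app) {
    return ['item1', 'item2'];
};
$app->resource3 = function ($app) {
    return count($app->resource2); // depends on resource2
};

echo $app->resource3, "\n";   // prints 2
echo $app->resource3(), "\n"; // prints 2

$app->returnClosure(true);
var_dump($app->resource3 instanceof \Closure); // bool(true)
```

With the flag set, the raw Closure comes back and it is up to the caller to invoke it.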
For more on Dependency Injection you can read our article regarding it.
Example #3 - Inheritance
It's possible to create sub-collections within a collection and allow the sub collections to access the items present in the parent. To do so you just pass the parent collection in via the constructor.
// Create parent collection
$app = new \Cora\Collection();
// Define resource on parent
$app->resource1 = function($app) {
return ['item1', 'item2'];
};
// Create child collection, passing parent in constructor
$app->subfolder = new \Cora\Collection($app);
// Define resource in child that depends on resource in parent
$app->subfolder->resource3 = function($app) {
return new \Cora\Collection($app->resource1);
};
foreach ($app->subfolder->resource3 as $item) {
echo $item; // Returns "item1" and "item2"
}
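The parent lookup can be pictured as a simple fallback: check the child first, then ask the parent. The sketch below assumes that model; ScopedBag is a hypothetical class for illustration and says nothing definitive about how Cora resolves parents internally.

```php
<?php
// Hypothetical sketch of parent fallback; not Cora's real code.
class ScopedBag
{
    private array $items = [];
    private ?ScopedBag $parent;

    public function __construct(?ScopedBag $parent = null)
    {
        $this->parent = $parent;
    }

    public function __set(string $name, $value): void
    {
        $this->items[$name] = $value;
    }

    public function __get(string $name)
    {
        if (array_key_exists($name, $this->items)) {
            return $this->items[$name];
        }
        // Not found locally: ask the parent, if there is one.
        return $this->parent !== null ? $this->parent->__get($name) : null;
    }
}

$parent = new ScopedBag();
$parent->resource1 = ['item1', 'item2'];

$child = new ScopedBag($parent);
$child->local = 'only in child';

echo implode(', ', $child->resource1), "\n"; // prints item1, item2
var_dump($parent->local);                    // NULL
```

Note the lookup only goes one way: the child can see the parent's items, but the parent cannot see the child's.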
Speed Tests
The following tests were done using Cora model objects as the items. The standard PHP sort test used "usort". Laravel and Cora use the same algorithm for sorting, so no comparison between them was necessary for sorting.
Adding 30000 items to:
Standard PHP Array = 0.0063 seconds
stdClass object = 0.0172 seconds
Laravel Collection = 0.0988 seconds
Cora Collection = 0.1331 seconds
Accessing all 30000 items:
Standard PHP Array = 0.0047 seconds
Cora Collection = 0.0823 seconds
Laravel Collection = 0.1005 seconds
Counting all 30000 items 30000 times:
Standard PHP Array = 0.0237 seconds
Cora Collection = 0.0324 seconds
Laravel Collection = 0.0577 seconds
Sorting all 30000 items:
Cora Collection = 0.4270 seconds
Standard PHP Array = 0.6603 seconds
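If you want to run a comparable measurement yourself, a timing harness along these lines is one option. This is only a sketch: timeIt is a hypothetical helper, the items are plain integers rather than Cora model objects, and it only covers the plain array and stdClass cases, so the numbers will not match the figures above.

```php
<?php
// Hypothetical timing harness; results depend on hardware and PHP version.
function timeIt(callable $fn): float
{
    $start = hrtime(true);               // monotonic clock, nanoseconds
    $fn();
    return (hrtime(true) - $start) / 1e9; // convert to seconds
}

$n = 30000;

// Add $n items to a standard PHP array.
$arrayTime = timeIt(function () use ($n) {
    $items = [];
    for ($i = 0; $i < $n; $i++) {
        $items["k$i"] = $i;
    }
});

// Add $n items to a stdClass object as dynamic properties.
$objectTime = timeIt(function () use ($n) {
    $items = new \stdClass();
    for ($i = 0; $i < $n; $i++) {
        $items->{"k$i"} = $i;
    }
});

printf("Standard PHP Array = %.4f seconds\n", $arrayTime);
printf("stdClass object    = %.4f seconds\n", $objectTime);
```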
Create From Simple Array
How to turn a basic array of items into a Collection object
$c = new \Cora\Collection([1,2,3,4,5]);
echo $c[4]; // Returns "5". Standard array access format.
echo $c->off4; // Returns "5". Object access syntax.
echo $c->get(4); // Returns "5". Required by PSR-11.
echo $c->offsetGet(4); // Returns "5". Required by ArrayAccess interface.
echo $c[5]; // Returns null. No such offset.
Notice that trying to access an offset that doesn't exist returns "null" rather than throw an Exception. This was a design choice with the Collection class that has tradeoffs and could potentially trip developers up. Make sure to check for a potentially null return as necessary when using this class.
You probably noticed that there's a bunch of different ways to access the same data via offsets. This is because different interfaces this class tries to be compatible with use different terminology. This first example was just for completeness; in the rest of this document we'll only show examples that feel appropriate.
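Since a missing offset yields null instead of an exception, calling code usually needs a guard. The sketch below uses a hypothetical NullSafeList class built on PHP's ArrayAccess interface to imitate that null-on-miss behavior, then shows two ways to check for it. It is a stand-in for illustration, not the Cora class itself.

```php
<?php
// Hypothetical stand-in showing null-on-miss with PHP's ArrayAccess interface.
class NullSafeList implements \ArrayAccess
{
    private array $items;

    public function __construct(array $items = [])
    {
        $this->items = $items;
    }

    public function offsetExists($offset): bool
    {
        return isset($this->items[$offset]);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        // Missing offsets yield null instead of raising an error.
        return $this->items[$offset] ?? null;
    }

    public function offsetSet($offset, $value): void
    {
        if ($offset === null) {
            $this->items[] = $value;
        } else {
            $this->items[$offset] = $value;
        }
    }

    public function offsetUnset($offset): void
    {
        unset($this->items[$offset]);
    }
}

$c = new NullSafeList([1, 2, 3, 4, 5]);

$value = $c[5];                 // null: there is no offset 5
if ($value === null) {
    echo "No item at offset 5\n";
}
echo $c[9] ?? 'fallback', "\n"; // prints fallback
```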
Create From Associative Array
How to turn an associative array into a Collection object
$c = new \Cora\Collection(["one" => 1, "two" => 2, "three" => 3, "four" => 4, "five" => 5]);
echo $c[4]; // Returns "5". Standard array access format.
echo $c["five"]; // Returns "5". Associative array access format.
echo $c->off4; // Returns "5". Object access format.
echo $c->five; // Returns "5". Direct Object access format.
Notice you can use either the numerical offset or the associative array key to get the value.
Both Object style access options work well. How they differ internally is covered in the section on performance optimizations.
Create From Object Array
How to turn a basic array of objects into a Collection object
Below you'll find the most basic example of loading objects into a Collection. However, this does NOT show off the cool features of Collections. See the Introduction or the "Why Object Syntax" sections for examples that are much neater.
$rolesList = new \Cora\Collection([
new \Models\Role('User'),
new \Models\Role('Admin'),
new \Models\Role('Developer')
]);
echo $rolesList[1]->name; // Returns "Admin"
echo $rolesList->off1->name; // Returns "Admin"
Using Data Keys
$c = new \Cora\Collection([
["name" => "Jake", "age" => 33],
["name" => "Bob", "age" => 42]
], 'name');
echo $c['Bob']['age']; // Returns "42". Standard array access format.
echo $c->Bob['age']; // Returns "42". Object access syntax.
When dealing with associative arrays or objects, you can optionally pass in a second argument to a Collection to make the items accessible by that identifier. In the example above, we used "name" as the key and so were able to grab Bob by his name directly.
If the key is not unique, the last item using that key will be the one returned.
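The "last item wins" rule falls out naturally if you picture the indexing step as a plain loop that writes each item into an array under its key. The helper below, indexByKey, is a hypothetical function written for this illustration and is not part of Cora.

```php
<?php
// Hypothetical helper: index rows (arrays or objects) by one of their fields.
function indexByKey(array $items, string $key): array
{
    $indexed = [];
    foreach ($items as $item) {
        $id = is_object($item) ? $item->$key : $item[$key];
        // A later item with the same key overwrites the earlier one.
        $indexed[$id] = $item;
    }
    return $indexed;
}

$people = indexByKey([
    ['name' => 'Jake', 'age' => 33],
    ['name' => 'Bob',  'age' => 42],
    ['name' => 'Bob',  'age' => 17], // duplicate key
], 'name');

echo $people['Jake']['age'], "\n"; // prints 33
echo $people['Bob']['age'], "\n";  // prints 17 (last "Bob" wins)
echo count($people), "\n";         // prints 2
```

If your data can contain duplicates that you need to keep, pick a key that is unique per item, such as an id.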
Utility Methods
map
filter
where
Performance Optimizations
The tricks used to make Cora Collections fast
Earlier we had an example (shown again below) in which there were two different ways to access some data in a Collection using an object style syntax.
$collection = new \Cora\Collection(["one" => 1, "two" => 2, "three" => 3, "four" => 4, "five" => 5]);
echo $collection->off4; // Returns "5". Object access format.
echo $collection->five; // Returns "5". Direct Object access format.
This provides a great opportunity to discuss the internal performance optimizations used in the Collection class, why both these options work, and the internal difference between them. In the examples below we'll use this collection as our starting point.
Optimization #1: Fast Item Retrieval
First off, let's do a var_dump on this collection. Don't let this overwhelm you; we'll walk through it below.
object(Cora\Collection)[121]
protected 'parent' => boolean false
protected 'signature' =>
object(stdClass)[122]
protected 'signaturesToSingletons' =>
object(stdClass)[124]
protected 'singleton' =>
object(stdClass)[123]
public 'one' => int 1
public 'two' => int 2
public 'three' => int 3
public 'four' => int 4
public 'five' => int 5
protected 'content' =>
array (size=5)
'one' => int 1
'two' => int 2
'three' => int 3
'four' => int 4
'five' => int 5
protected 'contentKeys' =>
array (size=5)
0 => string 'one' (length=3)
1 => string 'two' (length=3)
2 => string 'three' (length=5)
3 => string 'four' (length=4)
4 => string 'five' (length=4)
protected 'size' => int 5
protected 'contentModified' => boolean false
protected 'returnClosure' => boolean false
protected 'sortDirection' => boolean false
protected 'sortKey' => boolean false
Alright, so the Collection class stores resources in one of two data members: Singleton or Signature. Both of those variables are instantiated as PHP stdClass objects (generic objects you can store stuff within). The Singleton object stores any resources which are not Closures, so that includes numbers, strings, arrays, objects (excluding Closures), etc. In other words, most stuff you would throw in a Collection ends up there. The Signature object is exclusively for storing Closures (the name comes from the idea that it holds the "signatures" for how to create objects if you're using this class as a dependency injection container).
We can see from our example collection that we didn't define any Closures, so the Signature object is empty, but we did add some numbers to our collection, which are being stored in the Singleton object.
protected 'signature' =>
object(stdClass)[122]
protected 'singleton' =>
object(stdClass)[123]
public 'one' => int 1
public 'two' => int 2
public 'three' => int 3
public 'four' => int 4
public 'five' => int 5
When we access item 5 via its offset name:
echo $collection->five;
the Collection class just ends up passing the request directly to the stdClass object. Access to object properties in PHP is done using a hash table and access times are in O(1) for efficiency. If you aren't familiar with Big-O notation, this just means the access time is constant no matter how many items you throw in your collection. It doesn't matter if you have a million items in the collection or five, the access time is the same.
However, the following work slightly differently:
echo $collection->off4;
echo $collection[4];
echo $collection->get(4);
When dealing with a numerical offset, it's our desire to have the same level of efficiency as using the offset name. If we had a very large collection, what we don't want to happen is having to traverse through a million records to access offset 1,000,000. To accomplish this, the class will (only when necessary) calculate the offset keys. You might have noticed this in the original var_dump:
protected 'contentKeys' =>
array (size=5)
0 => string 'one' (length=3)
1 => string 'two' (length=3)
2 => string 'three' (length=5)
3 => string 'four' (length=4)
4 => string 'five' (length=4)
When we request an item from the collection using a numerical offset, the class first grabs the offset name using a hash lookup on this contentKeys array. The code looks like this:
return isset($this->contentKeys[$num]) ? $this->contentKeys[$num] : null;
So in the case of grabbing item five out of the collection using numerical offset 4, it grabs the key from contentKeys, then uses that to do the direct access for the actual item. This means it has to do two hash lookups to grab the result, but the efficiency stays in O(1). This is how access times are kept low and comparable to standard arrays.
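The two-step lookup can be sketched with plain arrays. In the example below, $content and $contentKeys are named after the fields in the var_dump, while getByOffset is a hypothetical helper written for illustration. It is a simplification of the idea, not the class's actual code.

```php
<?php
// Illustrative only: plain arrays standing in for the collection's internals.
$content = ['one' => 1, 'two' => 2, 'three' => 3, 'four' => 4, 'five' => 5];

// Position => key table, built once in O(n).
$contentKeys = array_keys($content);

// Hypothetical helper mirroring the two-step lookup described above.
function getByOffset(array $content, array $contentKeys, int $num)
{
    // Lookup 1: numeric offset to key name.
    $key = isset($contentKeys[$num]) ? $contentKeys[$num] : null;
    if ($key === null) {
        return null;
    }
    // Lookup 2: key name to value.
    return $content[$key];
}

echo getByOffset($content, $contentKeys, 4), "\n"; // prints 5
var_dump(getByOffset($content, $contentKeys, 10)); // NULL
```

Both steps are hash lookups, so the cost does not grow with the size of the collection once the key table exists.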
Of course, this means we have to compute some data any time the content of our Collection changes. Which brings us to the next optimization...
Optimization #2: Data Computed Only When Needed
So without going into all the details, it's been established so far that there's computed data stored within a Collection instance. One of the pieces of computed data is the contentKeys array that enables efficient lookup of collection items no matter if you use a key or an offset number. What's important is that this table gets computed only when necessary. Let's execute the following code:
// Add a new item to our collection
$collection->six = 6;
var_dump($collection);
// Retrieve the new item
echo $collection->six;
var_dump($collection);
If you look at the var_dump output of both of the above, you'll see that neither action changed the contentKeys array! This is despite the fact that we added a new item to our collection:
protected 'singleton' =>
object(stdClass)[119]
public 'one' => int 1
public 'two' => int 2
public 'three' => int 3
public 'four' => int 4
public 'five' => int 5
public 'six' => int 6
protected 'contentKeys' =>
array (size=5)
0 => string 'one' (length=3)
1 => string 'two' (length=3)
2 => string 'three' (length=5)
3 => string 'four' (length=4)
4 => string 'five' (length=4)
The reason for this is that the keys are NOT recomputed automatically when you add a new item to the collection. Same goes for grabbing the item by its name. Neither of those actions required use of the key lookup array, and so the work to recompute it wasn't bothered with. When a collection is modified, it simply sets a flag internally, and when an action is taken that requires data be recomputed, THEN it will do it.
Let's try grabbing the new item 6 by numerical offset. That should cause the keys to get recomputed.
echo $collection[5];
Now looking at a fresh var_dump of our collection it looks like this:
protected 'singleton' =>
object(stdClass)[119]
public 'one' => int 1
public 'two' => int 2
public 'three' => int 3
public 'four' => int 4
public 'five' => int 5
public 'six' => int 6
protected 'contentKeys' =>
array (size=6)
0 => string 'one' (length=3)
1 => string 'two' (length=3)
2 => string 'three' (length=5)
3 => string 'four' (length=4)
4 => string 'five' (length=4)
5 => string 'six' (length=3)
And sure enough, there's our new item in the keys array.
So to summarize, computed data is only recomputed when it's actually needed, and that work is skipped whenever possible for efficiency's sake.
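The flag-then-recompute pattern described in this section can be sketched in a few lines. LazyKeys below is a hypothetical class for illustration; the property names echo the var_dump, and the public $rebuilds counter exists only to make the laziness visible.

```php
<?php
// Hypothetical sketch of lazy key recomputation with a "modified" flag.
class LazyKeys
{
    private array $content = [];
    private array $contentKeys = [];
    private bool $contentModified = false;
    public int $rebuilds = 0; // counts recomputations, for demonstration

    public function set(string $key, $value): void
    {
        $this->content[$key] = $value;
        $this->contentModified = true; // just flag it; no recompute yet
    }

    public function getByKey(string $key)
    {
        return $this->content[$key] ?? null; // never touches contentKeys
    }

    public function getByOffset(int $num)
    {
        if ($this->contentModified) {
            // Recompute only now that the key table is actually needed.
            $this->contentKeys = array_keys($this->content);
            $this->contentModified = false;
            $this->rebuilds++;
        }
        $key = $this->contentKeys[$num] ?? null;
        return $key === null ? null : $this->content[$key];
    }
}

$c = new LazyKeys();
$c->set('one', 1);
$c->set('two', 2);
echo $c->getByKey('two'), ' ', $c->rebuilds, "\n"; // prints 2 0
echo $c->getByOffset(1), ' ', $c->rebuilds, "\n";  // prints 2 1
echo $c->getByOffset(0), ' ', $c->rebuilds, "\n";  // prints 1 1
```

Two writes and a keyed read trigger no rebuild at all; the first numeric read triggers exactly one, and the second reuses it.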
Warning About Mixing Closures With Other Types
When Closures are mixed with other values, the combined "content" array does not preserve the order in which the items were added, because Closures and non-Closure values (Singletons) are stored separately. As a result, a numeric offset such as $object->off5 might not give you the expected item. When mixing Closures with solid values, use named offsets such as $object->repository instead.
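To see why numeric offsets become unreliable here, consider what happens when two separate stores are combined into one list. The sketch below assumes the combined list is built by appending one store after the other. That ordering is an assumption made for illustration, so do not rely on it matching Cora's actual ordering.

```php
<?php
// Assumption for illustration: values and Closures live in separate stores,
// and a combined list is built by appending one store after the other.
$singleton = new \stdClass(); // non-Closure values
$signature = new \stdClass(); // Closures

// Items added in the order: a, b (Closure), c
$singleton->a = 'first';
$signature->b = function () {
    return 'second';
};
$singleton->c = 'third';

// Combined view: insertion order across the two stores is lost.
$content = array_merge((array) $singleton, (array) $signature);

print_r(array_keys($content)); // keys come out as a, c, b (not a, b, c)
```

Named access is unaffected, which is why it is the safer choice for mixed collections.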
Non-Existent Resources
This behavior has changed slightly in release 2.6. In the past, asking for a resource out of a collection would return NULL if it wasn't present; this applied whether you used named offset or function syntax.
How it used to work:
$c = new \Cora\Collection();
$c->item1 = 'Foo';
$c->item2 = 'Bar';
echo $c->item3; // Returns NULL
echo $c->item3(); // Returns NULL
if ($c->count()) {
echo 'There are items!';
}
Notice that "count()", since it's a function on the Collection class, was intelligently NOT treated like a resource offset. The problem with this setup was that when you made a typo such as $c->countt() you would get no exception letting you know you goofed up. Instead it would think you were trying to access a resource and just return null without errors. This was a big programmer headache and a bad design choice.
This has now been changed so that non-function calls to non-existent resources will still return null, but function calls will throw an exception.
How it works now:
$c = new \Cora\Collection();
$c->item1 = 'Foo';
$c->item2 = 'Bar';
echo $c->item2; // Returns "Bar"
echo $c->item2(); // Returns "Bar"
echo $c->item3; // Returns NULL
echo $c->item3(); // Throws an Exception!
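One way to get this split behavior in PHP is to handle property access and method calls in separate magic methods. The sketch below does that with a hypothetical StrictBag class. The exception type (BadMethodCallException) and message are assumptions chosen for the example, since the text above only says that an exception is thrown.

```php
<?php
// Hypothetical sketch; names and exception type are illustrative assumptions.
class StrictBag
{
    private array $items = [];

    public function __set(string $name, $value): void
    {
        $this->items[$name] = $value;
    }

    // Property-style access stays forgiving: missing resources yield null.
    public function __get(string $name)
    {
        return $this->items[$name] ?? null;
    }

    // Function-style access is strict: a missing resource is treated as a bug.
    public function __call(string $name, array $args)
    {
        if (!array_key_exists($name, $this->items)) {
            throw new \BadMethodCallException("No method or resource named '$name'");
        }
        return $this->items[$name];
    }

    public function count(): int
    {
        return count($this->items);
    }
}

$c = new StrictBag();
$c->item1 = 'Foo';

var_dump($c->item2);    // NULL
echo $c->item1(), "\n"; // prints Foo
try {
    $c->countt();       // typo for count()
} catch (\BadMethodCallException $e) {
    echo 'Caught: ', $e->getMessage(), "\n";
}
```

Real methods such as count() never reach __call, so only genuinely unknown names trigger the exception.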