A funny array_unique story

Here is a funny story about an “error” I thought I was dealing with recently in PHP arrays. Assume you have a simple array (elements are keyed by incremental numbers, the default assignment when you use $a[] = 1) with many elements (let’s say 10,000) whose unique values you wish to save in your database for later use. So how did I choose to implement this? I ran my array through PHP’s array_unique, serialized the result and stored it in the database for later!

Everything works fine so far, so the time comes to retrieve the value from the database and process the array. I do a count just to see how many elements are in my array, and the count returns, let’s say, 4,000 elements. Still everything is OK. I think this is normal: the original array had 10,000 elements, but it is perfectly normal for it to contain many duplicates, so after removing the duplicates I got down to 4,000 elements.

Then, for some reason, I chose to actually print the values in the array, and I got a nice loooong list of elements scrolling down my screen. As the values are printed I notice that some of the array keys are much greater than 4,000, some even close to 6,000, so I do my count again, but the count still returns 4,000. At this point I started to think that this was a bit weird, and started running a number of tests with serialize and unserialize, with different types of data, with different loop constructs and so on, just to see what was going on with my values...

Did you get it? Some things in programming are so simple, yet it is so easy (at least for me) to get wrapped up in function definitions, data descriptions and so on that there are times when you lose sight of the simple logical solutions that were right there in front of your eyes the whole time.

Where did I go wrong? Simple: when you run an array through array_unique you do get back an array that contains only the unique values of the original, but each value keeps the key of its first occurrence in the original array. The result is not reindexed, so the keys can have gaps and the largest key can be much greater than the number of elements. Consider the following piece of code:

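<?php

   $a = array();

   // Appending with [] assigns the incremental keys 0 to 5
   $a[] = 100;
   $a[] = 200;
   $a[] = 300;
   $a[] = 200;
   $a[] = 300;
   $a[] = 400;

   print_r($a);

   // Remove duplicates; the surviving values keep their original keys
   $a = array_unique($a);

   print_r($a);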

And the output of the code:

    Array
    (
        [0] => 100
        [1] => 200
        [2] => 300
        [3] => 200
        [4] => 300
        [5] => 400
    )

    Array
    (
        [0] => 100
        [1] => 200
        [2] => 300
        [5] => 400
    )

So, as you can see, the value 200 was included twice in the original array but only once in the second array, with a key equal to 1, and the last element of the second array has a key of 5 even though the array holds only 4 elements.
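
If the gaps in the keys are unwanted, one possible way to get rid of them is to pass the result through array_values, which returns the values under fresh keys starting from 0. The following is only a minimal sketch that reuses the sample data from above; the variable names $unique, $reindexed and $restored are made up for illustration and are not part of my original code.

```php
<?php

// Same sample data as in the example above.
$a = array(100, 200, 300, 200, 300, 400);

$unique = array_unique($a);

// The element count and the largest key disagree because of the gap.
echo count($unique), "\n";            // 4
echo max(array_keys($unique)), "\n";  // 5

// array_values() discards the old keys and numbers the values from 0.
$reindexed = array_values($unique);
print_r(array_keys($reindexed));      // keys are now 0, 1, 2, 3

// serialize()/unserialize() keep the keys as they are, so the gap
// survives a round trip through the database.
$restored = unserialize(serialize($unique));
var_dump($restored === $unique);      // bool(true)
```

Whether to reindex before or after storing is a matter of taste; doing it right after array_unique simply means the serialized value in the database already has clean keys.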
