Friday, August 17, 2012

Using Multi-Column Data with D3 Part 1

In my previous tutorial, I showed how to import data from a CSV file or a text file. However, oftentimes the type of data that you need to use has multiple columns, like so:

Column 1 Title, Column 2 Title, Column 3 Title
1, 2, 3
4, 5, 6

rather than being in a single column format, like this:

1
2
3
4
5
6

One common reason for having multi-column data is that the data can be split into categories and you need to show visually which category each piece of data is in, through colors, spacing, positioning, highlighting, font size, etc.

In this tutorial, I'm just going to show you how to reconstruct the data by accessing it in memory, as that's not the simplest thing for a beginner to do. Once we're able to access individual pieces of data, we'll be able to create multi-column/multi-category graphs, filter out certain categories of data, color them, etc.


The person who ordered this tutorial gave me an example CSV file that I could use. After fixing a few typos, I decided that I might as well. :)

You can download this test CSV file here:

http://thecodingwebsite.com/tutorials/d3/multicolumn/data.csv

Also, for my own testing, I duplicated this file and took out most of the contents so that its data can easily fit within a page without scrolling - you can download that here:

http://thecodingwebsite.com/tutorials/d3/multicolumn/tinydata.csv

Let's briefly take a look at how it's set up. Unlike in the previous tutorial, this CSV file has what's called a "header line" at the top:

Site Type,Media Type,Data 1,Data 2,Data 3

This header line is the structure that the rest of the file will use. Every row of data will have its own "Site Type", "Media Type", "Data 1", "Data 2", and "Data 3" properties.
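To make the role of the header line concrete, here's a rough sketch of what a parser does with it: the titles from the first line become the property names on every row. This is a simplified stand-in written for illustration, not D3's actual code. The `parseSimpleCsv` name is made up, and the sketch assumes there are no quoted fields or commas inside values.

```javascript
// Simplified stand-in for a CSV parser that understands a header line.
// Assumption: no quoted fields or embedded commas (real parsers handle those).
function parseSimpleCsv(text)
{
    var lines = text.split("\n").filter(function(line)
    {
        return line.length > 0;   // skip blank lines
    });

    var titles = lines[0].split(",");   // the header line

    return lines.slice(1).map(function(line)
    {
        var cells = line.split(",");
        var row = {};

        titles.forEach(function(title, i)
        {
            row[title] = cells[i];   // every value stays a string
        });

        return row;
    });
}

var text = "Site Type,Media Type,Data 1,Data 2,Data 3\n"
         + "Category 1,Section 1,25.33333333,6.666666667,33\n"
         + "Category 1,Section 1,30,17,29.33333333\n";

var data = parseSimpleCsv(text);

console.log(data.length);            // 2
console.log(data[0]["Site Type"]);   // Category 1
console.log(data[1]["Data 3"]);      // 29.33333333
```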

The "Site Type" and "Media Type" properties are for storing strings of text, like "Category 1" and "Section 1". On the other hand, "Data 1-3" are used for storing numerical values, like "25.33333333", "30", and ".5". There's no special indicator of this in the file - that's just how it was set up, and we, the coders, need to recognize that and code the graph accordingly.

So, the bottom line is: "Site Type" and "Media Type" are going to be used to filter/categorize the data, and "Data 1-3" are going to be the actual values of data displayed in our graph.
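Here's a minimal sketch of that split in practice: one column picks the rows and another supplies the numbers. The rows are written by hand, shaped like parsed CSV rows, and the "Category 2" and "Section 2" values are made up for illustration. Note that the values arrive as strings, so the sketch converts them with a unary plus.

```javascript
// Hypothetical rows, shaped like parsed lines of the CSV file.
var data = [
    { "Site Type": "Category 1", "Media Type": "Section 1", "Data 1": "25.33333333" },
    { "Site Type": "Category 1", "Media Type": "Section 2", "Data 1": "30" },
    { "Site Type": "Category 2", "Media Type": "Section 1", "Data 1": ".5" }
];

// Keep only the rows in one category...
var categoryOne = data.filter(function(d)
{
    return d["Site Type"] === "Category 1";
});

// ...and pull out the values to display, converted from strings to numbers.
var values = categoryOne.map(function(d)
{
    return +d["Data 1"];
});

console.log(values.length);          // 2
console.log(typeof values[0]);       // number

// Without the conversion, "+" would glue strings together instead of adding.
console.log(data[1]["Data 1"] + 1);  // 301
console.log(values[1] + 1);          // 31
```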


Because the file has a header line this time, we can just use the "csv" function (rather than the "text" function), and the data will arrive already parsed. So, this is going to be our starting point:

<html>

<head>

<script type="text/javascript" src="d3.v2.min.js"></script>

<script type="text/javascript">

 window.onload = function()
 {
  d3.csv("data.csv", function(data)
  {
      //Do something here.
  });
 };

</script>

</head>

<body>

</body>

</html>


When I was first experimenting with D3, this was the first thing I made (note: it's not any good - it's only for demonstration purposes):

http://thecodingwebsite.com/tutorials/d3/multicolumn/d3multicolumnfail.html

  d3.csv("data.csv", function(data)
  {
   //Add data to the graph and call enter.
   var dataEnter = d3.select("body").selectAll("p").data(data).enter();
   
   dataEnter.append("span").html(function(d)
   {
    return d + "<br>";
   });
  });

Well that was a fail (not that I really expected it to work right away)! The resulting page looks like this:

[object Object]
[object Object]
[object Object]
...

I did everything like I was supposed to: I started with D3, selected the body of the page, selected all of the "p" elements (even though none exist yet), added the data from the CSV file, and called "enter" on it. Then, for each piece of data I added a new "span" element to the page and made each span's contents be the piece of data with a new line afterwards.

So... Why is it doing that? Let's take a look at the first two lines of the CSV file after the header line:

Category 1,Section 1,25.33333333,6.666666667,33
Category 1,Section 1,30,17,29.33333333

If you'll remember from the previous tutorial, the data is split by rows, not columns. This means that the first piece of data is all of this:

Category 1,Section 1,25.33333333,6.666666667,33

If D3 were supposed to import the CSV file as a plain text file and just split the data by rows only, then we would expect the first piece of data to be "Category 1,Section 1,25.33333333,6.666666667,33". However, it imports the CSV file as a CSV file, like it should. So, these are the actual contents of the first piece of data:

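{"Site Type": "Category 1", "Media Type": "Section 1", "Data 1": "25.33333333", "Data 2": "6.666666667", "Data 3": "33"}

In other words, the first piece of data in "data" is not a single string but an object holding one value per column, each stored under its column title. Turning an object directly into text is what produces "[object Object]".

Now wait a minute... Isn't the "data" variable itself an array? Yes, it is! "data" is an array of these row objects, so it works much like a two-dimensional (multidimensional) array: first you pick a row, then you pick a column within it.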
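The "[object Object]" text is what JavaScript produces when an object is joined onto a string, which is a useful clue about what each row really is. Here's a small sketch using a hand-written row in place of the real file; the sample object is an assumption about the row's shape, matching how the values are read by column title later on.

```javascript
// Hypothetical first row, shaped like one parsed line of the CSV file.
var d = {
    "Site Type": "Category 1",
    "Media Type": "Section 1",
    "Data 1": "25.33333333",
    "Data 2": "6.666666667",
    "Data 3": "33"
};

// Joining an object onto a string calls its default toString(),
// which knows nothing about the object's contents.
console.log(d + "<br>");          // [object Object]<br>

// JSON.stringify shows the actual properties instead.
console.log(JSON.stringify(d));
// {"Site Type":"Category 1","Media Type":"Section 1","Data 1":"25.33333333","Data 2":"6.666666667","Data 3":"33"}

// The outer collection is an ordinary array; each element is an object like d.
var data = [d];
console.log(Array.isArray(data));     // true
console.log(Array.isArray(data[0]));  // false
```

Logging a row with JSON.stringify (or inspecting it in the browser console) is a quick way to check what a callback actually received.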

In order to explain this better I will correct the code that I showed you earlier:

http://thecodingwebsite.com/tutorials/d3/multicolumn/d3multicolumnsuccess.html

  d3.csv("data.csv", function(data)
  {
   //Add data to the graph and call enter.
   var dataEnter = d3.select("body").selectAll("p").data(data).enter();
   
   dataEnter.append("span").html(function(d)
   {
    return d["Site Type"] + "," + d["Media Type"] + "," + d["Data 1"] + "," + d["Data 2"]
                + "," + d["Data 3"] + "<br>";
   });
  });

and bam... We have a working replica of our data! In case this isn't already starting to make sense to you, I will simplify it even further. This:

d["Site Type"]

is equal to "Category 1" for the first piece (row) of data. This:

d["Media Type"]

is equal to "Section 1" for the first piece of data. These:

d["Data 1"]
d["Data 2"]
d["Data 3"] 

are equal to "25.33333333", "6.666666667", and "33" for the first piece of data.

Finally, I simply add a comma in between each of these and return the resulting string.
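If the file has a lot of columns, writing out every d["..."] by hand gets tedious. One possible alternative, sketched here with a made-up `rowToLine` helper and a hand-written sample row, is to keep the column titles in an array and build the line from that:

```javascript
// The column titles, in the order they should appear in the output.
var columns = ["Site Type", "Media Type", "Data 1", "Data 2", "Data 3"];

// Illustrative helper (not part of D3): looks up each column in the row
// and joins the values with commas.
function rowToLine(d)
{
    return columns.map(function(title)
    {
        return d[title];
    }).join(",");
}

// Hypothetical row, shaped like the first parsed line of the CSV file.
var d = {
    "Site Type": "Category 1",
    "Media Type": "Section 1",
    "Data 1": "25.33333333",
    "Data 2": "6.666666667",
    "Data 3": "33"
};

console.log(rowToLine(d));  // Category 1,Section 1,25.33333333,6.666666667,33
```

Inside the "html" callback, this would be used as return rowToLine(d) + "<br>"; and changing the order or number of columns only means editing the "columns" array.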


What we're doing here is using "d", the variable representing each row in the "data" array, as an associative array (in JavaScript terms, an object with named properties). In order to access a particular column in a row, you simply pass in the title (in quotes) of that column, exactly as it is given in the "header line" at the top of the CSV file.
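A few details of that lookup are easy to trip over, so here's a short sketch. The row is written by hand, and "Total" is a made-up one-word column title used only to show dot notation.

```javascript
// Hypothetical row; "Total" is a made-up one-word column for illustration.
var d = { "Site Type": "Category 1", "Total": "42" };

// Bracket notation works for any title, including ones with spaces.
console.log(d["Site Type"]);   // Category 1
console.log(d["Total"]);       // 42

// Dot notation only works when the title is a valid identifier.
console.log(d.Total);          // 42

// Titles are case-sensitive and must match the header exactly.
console.log(d["site type"]);   // undefined

// Rows are looked up by column title, not by column number.
console.log(d[0]);             // undefined

// The title can also come from a variable.
var title = "Site Type";
console.log(d[title]);         // Category 1
```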

In case you need to know more information about JavaScript arrays, here's a good place to start:

http://www.w3schools.com/js/js_obj_array.asp

Also, even though you're using JavaScript, I highly recommend you read this tutorial on arrays in PHP... Other than some syntax differences, all of the information on this page can be very helpful to you for understanding numeric, associative, and multidimensional arrays:

http://www.w3schools.com/php/php_arrays.asp

That's it! Now you know the purpose of the header line, what a multidimensional array and an associative array are, and how to use multi-column/multi-category data in D3. Good luck! :)

21 comments:

  1. Hey, love the tutorials you've put up. d3 is actually approachable now!

    I had a question about how you access the relevant data through headers:

    Right now, you access it by name, for example, as d["Site Type"]. Is there a way to access the data through the index number such as d[0], d[1] etc. if I have a lot of headers?

    I tried d[0], or d[0][0] since I think it is how you access multidimensional arrays but I get 'undefined' as the output.

    Please help a noob out. Thanks!

    Replies
    1. Thank you!


      It seems to me as though you're kind of missing the point: by using the "enter" function, d[0], d[1], d[2], etc. are all being referenced automatically, one after another.

      Let's say your data is structured like this, however:

      Index, Name
      0, Bill
      1, Jim
      2, Amy
      3, Bob
      etc...

      To access each of the names you would first call the "enter" function and then use d["Index"] to access the Index value ("0", "1", "2", etc.) and d["Name"] to access each individual name ("Bill", "Jim", "Amy", etc.). Basically, you don't have to set up a loop to go through every value - the "enter" function is doing that for you.

      Take note of course that you don't even need to use the "Index" column or d["Index"] in this scenario, because as I pointed out in my very first D3 tutorial:

      http://thecodingtutorials.blogspot.com/2012/07/introduction-to-d3.html

      you can just use the second parameter in the function (I called it "i") to retrieve the index value.


      Now, let's say that you don't want it to loop through every piece of data for whatever reason: in this case, you can just use the "Index" property as I showed you above.

      Hope that helps!

      - Andrew

  2. i have a tsv with following data:

    Weeks NumberOfTickets Under_SLA BreachedSLA
    Week1 13 4 9
    Week2 18 3 15
    Week3 12 9 3
    Week4 17 6 11

    for this i have used
    series = [
    [13, 18, 12, 17],
    [4, 3, 9, 6],
    [9, 15, 3, 11]
    ];
    code.

    i am trying with the code

    d3.tsv("radar.tsv", function (data) {
    series = data.map(function (d) { return [+d["NumberOfTickets"], +d["Under_SLA"], +d["BreachedSLA"]]; });
    });

    but its not working.

    ..please help me in finding correct code

    Replies
    1. I think what it looks like you might want is this:

      d3.tsv("radar.tsv", function (data)
      {
      series = new Array();

      data.forEach(function(r)
      {
      //Square brackets make an array; the unary plus converts each string to a number.
      var nextRow = [ +r["NumberOfTickets"], +r["Under_SLA"], +r["BreachedSLA"] ];
      series.push(nextRow);
      });
      });

      I hope that helps!

      - Andrew

  3. Hi Andrew. I tried this code, but it's not working for me. I am sharing the full code with you; please help me get the output I asked about in my previous question.
    code as follows:

    var series,
    Age,
    minVal,
    maxVal,
    w = 400,
    h = 400,
    vizPadding = {
    top: 30,
    right: 0,
    bottom: 30,
    left: 0
    },
    radius,
    radiusLength,
    ruleColor = "#CCC";

    var loadViz = function(){
    loadData();
    buildBase();
    setScales();
    addAxes();
    draw();
    };

    var loadData = function () {
    var randomFromTo = function randomFromTo(from, to) {
    // return Math.floor(Math.random() * (to - from + 1) + from);

    };



    series = [
    [13,18,12,17],
    [4,3,9,6],
    [9,15,3,11]
    ];



    Age = [];

    for (i = 0; i < 5; i += 1) {
    //series[0][i] = randomFromTo(0,10 );
    // series[1][i] = randomFromTo(10, 25);
    // series[2][i] = randomFromTo(15, 30);




    switch (i) {
    case 0: Age[i] = "Week 1"; break;
    case 1: Age[i] = "Week 2"; break;
    case 2: Age[i] = "Week 3"; break;
    case 3: Age[i] = "Week 4"; break;
    case 4: break;
    } //in case we want to do different formatting

    }

    mergedArr = series[0].concat(series[1]).concat(series[2]);

    minVal = d3.min(mergedArr);
    maxVal = d3.max(mergedArr);
    //give 25% of range as buffer to top
    maxVal = maxVal + ((maxVal - minVal) * 0.25);
    minVal = 0;

    //to complete the radial lines
    for (i = 0; i < series.length ; i += 1) {
    series[i].push(series[i][0]);
    }
    };

    Replies
    1. Try posting just the relevant code in a much smaller post and I will look at it.

  4. This comment has been removed by the author.

    Replies
    1. Your CSV file needs column titles; look at the very top of the post for an example.

      Don't forget that this is part 1 of 2: http://thecodingtutorials.blogspot.com/2012/09/using-multi-column-data-with-d3-part-2.html

      - Andrew

  5. Sorry to have wasted your time. I didn't see part 2 before posting.

  6. Andrew,

    I have been playing with this code for a bit and have got it all but working for me. I have no trouble reading the data, however my csv file goes like this:

    name,number,target
    Bob,2,5
    Greg,3,
    Max,2,

    and when i use the code below to see whats happening, it shows me it has lines with no value.

    dataEnter.append("body").html(function(d){
    targetNum = d["target"];
    console.log(targetNum);
    });

    What i'm trying to do is, make your graphs maxData variable read from the csv target column. The Graph:(http://thecodingtutorials.blogspot.com.au/2012/07/using-csv-and-text-files-with-d3.html)

    Is there a way of doing this??

    Replies
    1. Try using 3 commas per line.

      - Andrew

    2. I found the how to get the reply i wanted. This did the trick thanks

      console.log(data[0]["Target"]);

      Now i just need to pass the var to the other functions.... I would use a global var for this correct??

    3. Never mind i just add the "data[0]["Target"]" part to the code i wanted to use it, and magically everything works... thanks heap Andrew. your code and tutorials are amazing

  7. Thanks for your excellent, didactic tutorials, Andrew.

  8. Hi Andrew,
    I've been scouring the web and came across this helpful page, but wondered if you could help me with a specific data question...

    I want to look at Yelp reviews for multiple business listings by month, and create an object like:

    obj 1 = {
    office: OfficeName1
    reviews: {
    Jan: 3
    Feb: 5 ...
    Dec: 20
    }
    Market: cityName
    }

    and I have a CSV File that is currently formatted like so:

    Office, Market, Jan, Feb, ... Dec.

    How do I make the months fall into the bigger category of "Number of Reviews by Month" or ("Reviews" as set in my example Object)?

    Thanks so much for your help!!!

    Best,
    Sarah

    Replies
    1. You'll have to do that manually inside of your append function.

      This sounds like your question might be non-D3-related, in which case these links might help:

      http://stackoverflow.com/questions/1877864/how-to-construct-a-json-object-using-info-from-several-javascript-arrays
      http://www.w3schools.com/json/

  9. Brilliant tutorial mate, simple very good if u are a beginner in D3
