To visualize data, we first need to gather the data that pertains to us and to our study, which means understanding what data we need. In addition, we need to answer the following questions:
- From where are we getting the data?
- How are we going to collect the data?
- Who will be providing the data to us?
- When can the data be obtained?
Before we gather data, Kirk (2016) states that we have to “establish some general criteria” for what data is relevant and what is not, and for what data we need and what we do not. Answering this question helps us gather only the data that is required. It also reduces the effort of building and interpreting the visualization, because we work only with the necessary data. To know where the data will come from, we need to determine whether we are working with primary or secondary data. Primary data collection involves a lot of effort because we gather the data ourselves, whereas secondary data already exists, which makes it easier to supplement with any additional information we need.
As for how we collect the data, it is easier when the data is already collected digitally than when we have to forage for it. One form of foraging is extracting data from PDF files. Web scraping tools also help in extracting both structured and unstructured data that has been published on the web. In some cases the data is provided by stakeholders or clients; in others it can be downloaded from the web. There are also third-party services that provide data depending on our needs. The timing of the data matters as well when it depends on other factors, such as surveys that must be completed before we can extract it. These are the initial questions that need to be answered in order to obtain the raw data that will be used for further understanding.
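As a rough illustration of the web scraping mentioned above, the sketch below pulls the rows of an HTML table into a list using only Python's standard library. The `TableExtractor` class, the `extract_table` function and the `SAMPLE_HTML` fragment are all hypothetical, invented for this example. A real page would first have to be downloaded and would likely need more careful handling.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_table(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page fragment standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>Responses</th></tr>
  <tr><td>Norway</td><td>120</td></tr>
  <tr><td>Chile</td><td>95</td></tr>
</table>
"""

rows = extract_table(SAMPLE_HTML)
header, records = rows[0], rows[1:]
```

Here `header` holds the column names and `records` holds the remaining rows as raw strings, which would still need cleaning and type conversion before any visualization work.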
Kirk, A. (2016). Data Visualisation: A Handbook for Data Driven Design. SAGE Publications.