The software you use for producing tables needs to analyse data in any standard format used in market research. I am sometimes surprised at the limitations of many tabulation software packages and at how much time staff spend preparing data before they can produce even one table. Nowadays, you want to be able to handle data from different data collection platforms and be productive as quickly as possible. This article discusses the problems you can encounter and how MRDCL offers time-saving, practical solutions.
Data collection platforms: some observations
There is an ever-increasing number of data collection platforms, particularly for online surveys. They range from basic, inexpensive tools to high-ticket products that offer huge flexibility for collecting data. However, price does not always correlate with the quality and flexibility of the data you can export for use in your favoured tabulation software. Fortunately, unlike many other tabulation packages, the powerful MRDCL scripted tabulation software comes with all the tools you need to handle most of the data that data collection platforms output. Let’s start by listing the types of data you may encounter and then dig a little deeper into the problems that may arise.
The data formats that MRDCL can handle
MRDCL handles nine different data formats. It reads all of them efficiently and includes tools to deal with weaknesses often found in exports from data collection platforms.
a) ASCII data format
A lot of market research data comes in this format. ASCII stands for American Standard Code for Information Interchange. In practice, an ASCII data file usually holds a single line of numbers or text per respondent, with each position (column) in the line storing a specific piece of data. A survey of 100 respondents would therefore have 100 lines of data. Most tabulation programs limit the length of line they can read; MRDCL’s limit is a generous 300,000 characters. (If that is not enough, see Multiple ASCII lines below.) MRDCL has no limit on the number of lines in the file; other software may have limitations.
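To make the idea of position-based fields concrete, here is a minimal Python sketch (not MRDCL syntax) that slices one fixed-width ASCII line into fields. The column layout in `LAYOUT` and the field names are hypothetical; in practice they would come from the data map supplied with the file.

```python
# Hypothetical data map: (field name, start column, end column), 1-based and inclusive.
LAYOUT = [
    ("resp_id", 1, 4),
    ("gender", 5, 5),
    ("age", 6, 7),
    ("q1", 8, 8),
]


def parse_fixed_width(line: str, layout=LAYOUT) -> dict[str, str]:
    """Slice one respondent's line into fields by column position."""
    record = {}
    for name, start, end in layout:
        # Convert 1-based inclusive columns to a Python slice; strip padding spaces.
        record[name] = line[start - 1:end].strip()
    return record


for line in ["00011342", "0002265 "]:
    print(parse_fixed_width(line))
```

A blank position, as in the second line's `q1`, comes back as an empty string, which usually means the question was not answered.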
b) CSV data
Alongside ASCII data, CSV is another common format for market research data. Fields are separated by commas and, as with ASCII data, each line holds the data for one respondent. Unlike ASCII data, a field’s position in the line can differ from record to record, because a value of 5 takes one character while a value of 100 takes three. MRDCL requires the first line of the file to be a header with a unique name for each field; including such a header is standard practice. MRDCL has no limit on the number of lines in the file; other software may have limitations.
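The following Python sketch, using the standard `csv` module rather than anything MRDCL-specific, shows why the header line matters: fields are located by name, so it does not matter that `5` and `100` occupy different numbers of characters. The sample data and field names are hypothetical.

```python
import csv
import io

# Hypothetical CSV export: the first line is a header with a unique name per field.
DATA = """resp_id,age,spend
1,5,100
2,34,7
"""


def read_csv_records(text: str) -> list[dict[str, str]]:
    """Read CSV text into one dictionary per respondent, keyed by header names."""
    reader = csv.DictReader(io.StringIO(text))
    names = reader.fieldnames or []
    # Duplicate names would make later columns overwrite earlier ones, so reject them.
    if len(names) != len(set(names)):
        raise ValueError("CSV header contains duplicate field names")
    return list(reader)


for row in read_csv_records(DATA):
    print(row["resp_id"], row["age"], row["spend"])
```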
c) Excel XLSX files
Some market research data comes as one worksheet in an Excel file, including XLSX, XLSM, XLSB, XLS and similar variants. Some tabulation software packages can read or import this data; with others, you must first convert it to CSV, which is easy to do from Excel or Google Sheets. MRDCL can read data in this format as it stands. It can also read data from multiple worksheets, in parallel if necessary.
d) Binary data
Binary data files contain data that cannot be read in a text editor or in spreadsheet software such as Excel; you need specialist software to decipher them. Binary files are less common nowadays. Their principal benefit is that they can store multi-coded data in packed form, reducing disk space, but this is rarely a serious consideration today. There are many subtly different binary data varieties; MRDCL reads about ten different variants and has no limitations. Most tabulation software cannot handle binary data.
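As an illustration of what packing multi-coded data means, this Python sketch stores each possible answer code as one bit. It is a hypothetical scheme for explanation only and does not reproduce any particular binary format that MRDCL reads.

```python
def pack_codes(codes: set[int], n_codes: int) -> bytes:
    """Pack a multi-coded answer (codes 1..n_codes) into bits, one bit per code."""
    value = 0
    for code in codes:
        if not 1 <= code <= n_codes:
            raise ValueError(f"code {code} out of range")
        value |= 1 << (code - 1)  # bit 0 holds code 1, bit 1 holds code 2, ...
    n_bytes = (n_codes + 7) // 8  # round up to whole bytes
    return value.to_bytes(n_bytes, "little")


def unpack_codes(packed: bytes, n_codes: int) -> set[int]:
    """Recover the set of codes whose bits are switched on."""
    value = int.from_bytes(packed, "little")
    return {code for code in range(1, n_codes + 1) if value & (1 << (code - 1))}


packed = pack_codes({1, 3, 10}, n_codes=12)
print(len(packed), sorted(unpack_codes(packed, 12)))
```

Twelve possible codes fit into two bytes here, whereas storing them as twelve 0/1 characters in an ASCII file would take twelve bytes; the price is that the file is no longer human-readable.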
e) MDB files
MRDCL can also read Access MDB files. The same comments about XLSX files apply. Most tabulation software cannot read MDB files.
f) Multiple ASCII lines
MRDCL’s limit for ASCII files is 300,000 characters per line. Although this is generous, you can still exceed it. More commonly, some data collection platforms limit the number of characters they can output per line. In these cases, you may want to output several ASCII data lines for each respondent, for example three lines per respondent. Unlike almost all other tabulation software, MRDCL can process an unlimited number of ASCII data lines per respondent.
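A minimal Python sketch of reassembling multi-line records, assuming (hypothetically) a fixed three lines per respondent with the respondent ID repeated in the first four columns of every line. Other exports may identify lines differently; the check that all lines in a group share one ID is what guards against misaligned data.

```python
LINES_PER_RESPONDENT = 3  # hypothetical: the platform split each record into three lines
ID_WIDTH = 4              # hypothetical: columns 1-4 of every line repeat the respondent ID


def group_records(lines: list[str], per_resp: int = LINES_PER_RESPONDENT) -> dict[str, str]:
    """Join each respondent's lines into one logical record, keyed by respondent ID."""
    if len(lines) % per_resp != 0:
        raise ValueError("line count is not a multiple of lines per respondent")
    records = {}
    for i in range(0, len(lines), per_resp):
        chunk = lines[i:i + per_resp]
        ids = {line[:ID_WIDTH] for line in chunk}
        if len(ids) != 1:
            raise ValueError(f"lines {i + 1}-{i + per_resp} belong to different respondents")
        resp_id = ids.pop()
        # Drop the repeated ID from each line and concatenate the remaining data.
        records[resp_id] = "".join(line[ID_WIDTH:] for line in chunk)
    return records


lines = ["0001AB", "0001CD", "0001EF", "0002GH", "0002IJ", "0002KL"]
print(group_records(lines))
```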
g) Quantum data format
Quantum is a particular type of binary data. It is listed here only because some MRDCL users have legacy data from older systems such as Quantum; MRDCL can read this data directly. Otherwise, the format is largely redundant.
h) Triple-S data format
Triple-S data differs from all the previous data formats in an important way: it contains metadata (the variable definitions and texts) as well as the respondent data. Because Triple-S is well structured for market research surveys, importing it into MRDCL generates MRDCL scripts and ASCII data files. Many other tabulation software packages have import tools for Triple-S; those without the facility are, in my view, badly lacking.
i) SPSS data format
Like Triple-S data, SPSS SAV files provide metadata. However, SPSS files are not always well structured, mainly because the separate variables that make up a multi-response question are not connected to each other. MRDCL Central has a full set of tools to manage the import of SPSS files, even badly structured ones. The import produces an efficient MRDCL script and data that are ready to analyse immediately. We are not aware of any other tabulation software that handles SPSS files so effectively.
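To show what reconnecting a multi-response question involves, here is a Python sketch that groups separate 0/1 variables back into one multi-coded answer. It assumes a hypothetical naming convention (`Q5_1`, `Q5_2`, ...) for the dichotomy variables; real SPSS files vary, which is why the import needs tools to manage it.

```python
import re

# Hypothetical SPSS-style record: one 0/1 variable per answer option, linked only by name.
respondent = {"Q5_1": 1, "Q5_2": 0, "Q5_3": 1, "Q6": 4}

# Hypothetical convention: question name, underscore, answer code.
PATTERN = re.compile(r"^(?P<question>[A-Za-z]+\d+)_(?P<code>\d+)$")


def group_multi_response(record: dict[str, int]) -> dict[str, list[int]]:
    """Reconnect dichotomy variables such as Q5_1..Q5_3 into one multi-coded answer."""
    groups: dict[str, list[int]] = {}
    for name, value in record.items():
        match = PATTERN.match(name)
        if match is None:
            continue  # not part of a multi-response set (e.g. single-coded Q6)
        # setdefault keeps the question even when no option was ticked.
        codes = groups.setdefault(match.group("question"), [])
        if value == 1:
            codes.append(int(match.group("code")))
    return {question: sorted(codes) for question, codes in groups.items()}


print(group_multi_response(respondent))
```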
Four other important data considerations
1) Multiple data files
Many tabulation software platforms do not allow you to hold data in multiple files, so all the data must first be combined into one file. This restriction is particularly inconvenient for a tracking study where each wave’s data is in a separate file. MRDCL allows you to use as many files as you wish and, optionally, to store them in lists with markers that indicate the period, sample type or any other differentiator dynamically. This capability is important when handling tracking studies or multi-country studies with local differences, even minor ones.
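The following Python sketch illustrates the file-list idea under stated assumptions: each wave is a hypothetical CSV file, and its markers (here `wave`) are attached to every record read from that file, so the period can be used in tables without merging the files first. It is not MRDCL’s file-list syntax.

```python
import csv
import tempfile
from pathlib import Path


def read_waves(file_list: list[tuple[Path, dict[str, str]]]) -> list[dict[str, str]]:
    """Read every file in the list and tag each record with that file's markers."""
    records = []
    for path, markers in file_list:
        with path.open(newline="") as f:
            for row in csv.DictReader(f):
                row.update(markers)  # e.g. {"wave": "1"} or {"country": "UK"}
                records.append(row)
    return records


# Usage with two hypothetical wave files written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    wave1 = Path(tmp, "wave1.csv")
    wave1.write_text("resp_id,q1\n1,3\n2,5\n")
    wave2 = Path(tmp, "wave2.csv")
    wave2.write_text("resp_id,q1\n1,4\n")
    data = read_waves([(wave1, {"wave": "1"}), (wave2, {"wave": "2"})])

for row in data:
    print(row)
```

Adding a new wave then means adding one entry to the list rather than rebuilding a combined file.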
2) Multiple streams of data
MRDCL allows you to read from up to four streams simultaneously. You could read data from an Excel spreadsheet containing 50,000 customers alongside survey data from 1,000 respondents sampled from the customer database, matching each respondent to their customer record as you process the data. The data files can be in different formats. Few products support this level of sophistication.
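A minimal Python sketch of matching two streams: survey respondents are looked up in a customer index by a shared ID. The field names (`cust_id`, `region`, `segment`) and the tiny data sets are hypothetical stand-ins for the customer spreadsheet and the survey file.

```python
# Hypothetical stream 1: customer database, indexed by customer ID.
customers = {
    "C001": {"region": "North", "segment": "Gold"},
    "C002": {"region": "South", "segment": "Silver"},
    "C003": {"region": "North", "segment": "Bronze"},
}

# Hypothetical stream 2: survey respondents sampled from that database.
survey = [
    {"cust_id": "C003", "satisfaction": 4},
    {"cust_id": "C001", "satisfaction": 5},
]


def match_streams(survey_rows: list[dict], customer_index: dict[str, dict]) -> list[dict]:
    """Attach each respondent's customer-database fields, matched on customer ID."""
    matched = []
    for row in survey_rows:
        extra = customer_index.get(row["cust_id"])
        if extra is None:
            raise KeyError(f"respondent {row['cust_id']} not found in customer database")
        matched.append({**row, **extra})
    return matched


for row in match_streams(survey, customers):
    print(row)
```

Holding the customer data as a dictionary keeps each lookup fast, so the approach scales to a large database, and a respondent with no matching customer is reported rather than silently dropped.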
3) Hierarchical data
Hierarchical data is a term for data with relationships between different levels within the data file. For example, you might survey doctors and ask each doctor about one to five of their patients. Similarly, you might survey respondents and ask them about their last five hotel stays. In both cases, there is a one-to-many relationship in the data, so you need to produce tables based on either doctors or patients, and either respondents or stays. Most tabulation software cannot handle this level of complexity. Some software can process this data, but only clumsily; for example, you might have to add together the data from five stays to produce the analysis you require. MRDCL has all the tools you need to process such data efficiently.
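This Python sketch, with hypothetical data, shows why keeping the hierarchy matters: the same data produce a doctor-based count and a patient-based count, and each patient can still be analysed by their doctor’s specialty without adding patient columns together.

```python
from collections import Counter

# Hypothetical one-to-many data: each doctor reports on one to five patients.
doctors = [
    {"doctor_id": 1, "specialty": "GP", "patients": [{"treated": "A"}, {"treated": "B"}]},
    {"doctor_id": 2, "specialty": "Cardiology", "patients": [{"treated": "A"}]},
    {"doctor_id": 3, "specialty": "GP",
     "patients": [{"treated": "A"}, {"treated": "A"}, {"treated": "B"}]},
]

# Doctor-level table: the base is doctors, one count per doctor.
by_specialty = Counter(d["specialty"] for d in doctors)

# Patient-level table: the base is patients, each carrying its doctor's attributes.
by_treatment_and_specialty = Counter(
    (p["treated"], d["specialty"]) for d in doctors for p in d["patients"]
)

print(by_specialty)                # base: 3 doctors
print(by_treatment_and_specialty)  # base: 6 patients
```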
4) Tracking studies: questionnaire changes
If you run a tracking study, there is a high chance that the questionnaire will change at some point. Typical changes include adding questions, removing questions, modifying the answer lists of questions and changing the data layout. Some tabulation software products cannot overcome these hurdles or, if they can, require users to write lengthy recodes to align the data into a common format. This inflexibility is time-consuming and prone to error. MRDCL allows you to process data in different formats from as many files as you need. Without this facility, tracking studies usually become more complex and time-consuming to handle as time passes.
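A minimal Python sketch of aligning waves to a common format, assuming a hypothetical change between waves: wave 1’s `q3` became `brand_aware` in wave 2, a new brand was inserted so that “Other” moved from code 3 to code 4, `q7` was dropped and `q9` was added. This illustrates the general approach, not how MRDCL does it internally.

```python
# Hypothetical common layout, based on the latest wave.
COMMON_FIELDS = ["resp_id", "brand_aware", "q9"]

# Hypothetical per-wave mappings: variable renames, then code recodes.
WAVE_MAPS = {
    1: {"rename": {"q3": "brand_aware"}, "recode": {"brand_aware": {1: 1, 2: 2, 3: 4}}},
    2: {"rename": {}, "recode": {}},
}


def to_common(record: dict, wave: int) -> dict:
    """Convert one respondent's record from a given wave into the common layout."""
    spec = WAVE_MAPS[wave]
    renamed = {spec["rename"].get(name, name): value for name, value in record.items()}
    for field, codes in spec["recode"].items():
        if renamed.get(field) is not None:
            # An unmapped code raises KeyError, so it cannot slip through silently.
            renamed[field] = codes[renamed[field]]
    # Keep only the common fields; questions not asked in this wave become None.
    return {field: renamed.get(field) for field in COMMON_FIELDS}


print(to_common({"resp_id": 7, "q3": 3, "q7": 2}, wave=1))
print(to_common({"resp_id": 8, "brand_aware": 3, "q9": 1}, wave=2))
```

Because each wave has its own mapping, older waves’ data files can stay exactly as they were delivered when the questionnaire changes again.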
Ease of use is important
Although this article has not covered every type of data or every problem you may encounter, I hope it highlights the importance of being able to read data of any kind, and to do so easily. If staff need to spend time preparing or recoding data files to work around the tabulation software’s restrictions, it not only wastes time but also increases the risk of error. In the case of tracking studies, it can mean that projects become more and more complex over their lifespan. If this article has highlighted any problems you face, please talk to me, and I will be pleased to share my ideas.