Work with Survey Data Queries

A survey can contain a large amount of data, and clients may run into limitations when they read all or part of it. For such workflows, we recommend the query approach, which divides the requested information into several reasonably sized chunks.

The query workflow always starts with a call to the POST query method, which creates a new query. In this call you can specify a record filter and the variables you are interested in. Once a query is created, you can request information about the query itself, the variable schema, and the response data by calling the following GET methods. Only data records can be split into chunks; the variable schema is always provided as a whole.

By combining these methods, you can orchestrate various workflows. As a rule, a workflow reads the variable schema first and then reads the data records chunk by chunk in a loop. After each read iteration, you can check the status of the query by calling the GET query method. In simple cases you can skip reading the schema, checking the query status, or both, for example when your program works with a predefined set of variables. After all the data records have been read, the query is closed automatically after a while. Note that the GET data method may return a variable number of records. For more details, see the topic "Get Data method".

The examples below describe a response data query, but the same can be applied to a respondent data query.

Create a Query

Parameters for this endpoint include the following:

  • variables - a comma-separated list of variable names. Here you can list names of: stand-alone variables, fields of compound questions (<question id>.<answer code>) and whole compound questions.

  • variableTemplateId - the id of a Variable Template created in Survey Designer.

  • filterExpression - a Studio filter expression (see Filter expression). If not set, all records are returned.

  • format - the format in which the query data is returned. Default = "x-ndjson".

  • keepAlive - the keep-alive timeout for the query, in milliseconds. This parameter is optional. Default value = 60000 (1 minute).

Formats:

  • x-ndjson - streams records as a JSON array.

  • tsv+zip - writes a ZIP archive of TSV files (one per level). The delimiter is a tab. No data is available before the dataflow has completed.

  • csv+zip - writes a ZIP archive of CSV files (one per level). The delimiter is a comma. No data is available before the dataflow has completed.

Request
POST  https://<host>/v1/surveys/p1231234/responses/data/query?variables=q1,q1&filterExpression=response:status='complete'&keepAlive=30000 HTTP/1.1
Accept: application/json
Authorization: Bearer <access_token>
Response
{
    "queryId": 12345,
    "queryStatus": "preparing",
    "targetSurveyId": "p1231234",
    "links": {
        "self": "https://<host>/v1/surveys/p1231234/responses/data/query/12345"
    }
}
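
As a minimal sketch, the request URL shown above can be assembled in Python with the standard library, so that the filter expression is percent-encoded correctly. The helper name build_create_query_url and its argument names are illustrative assumptions; only the path and the query parameter names come from this page.

```python
from urllib.parse import urlencode


def build_create_query_url(host, survey_id, variables=None,
                           filter_expression=None, keep_alive_ms=None,
                           data_format=None):
    """Build the URL for POST .../responses/data/query.

    Hypothetical helper: only the path and parameter names come from
    the documentation; optional parameters are omitted when not given.
    """
    params = {}
    if variables:
        # The API expects a comma-separated list of variable names.
        params["variables"] = ",".join(variables)
    if filter_expression:
        params["filterExpression"] = filter_expression
    if data_format:
        params["format"] = data_format
    if keep_alive_ms is not None:
        params["keepAlive"] = str(keep_alive_ms)
    base = f"https://{host}/v1/surveys/{survey_id}/responses/data/query"
    # urlencode percent-encodes characters such as ':', '=' and quotes.
    return f"{base}?{urlencode(params)}" if params else base
```

The returned URL can then be sent with any HTTP client, together with the Accept and Authorization headers shown in the request above.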

Get a Query

This method gets an existing query for the surveyId and queryId specified as the path parameters. It has no query parameters. Use it to check the status of the query. See the topic "Query object" for details.

Request
GET  https://<host>/v1/surveys/p1231234/responses/data/query/12345 HTTP/1.1
Accept: application/json
Authorization: Bearer <access_token>
Response
{
    "queryId": 12345,
    "queryStatus": "started",
    "targetSurveyId": "p1231234",
    "links": {
        "self": "https://<host>/v1/surveys/p1231234/responses/data/query/12345"
    }
}
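
A status check is typically repeated until the query reaches the state the client is waiting for. The sketch below shows one way to do this in Python. It assumes a caller-supplied get_query function (hypothetical) that performs the GET request above and returns the parsed JSON body. The set of possible queryStatus values is described in the "Query object" topic and is not reproduced here, so the target status is left as a parameter.

```python
import time


def wait_for_status(get_query, target_status, poll_seconds=1.0,
                    max_polls=60, sleep=time.sleep):
    """Poll a query until its queryStatus equals target_status.

    get_query is a caller-supplied function (hypothetical) that performs
    GET .../query/<queryId> and returns the parsed JSON body as a dict.
    Returns the matching query object, or raises TimeoutError.
    """
    for attempt in range(max_polls):
        query = get_query()
        if query.get("queryStatus") == target_status:
            return query
        if attempt < max_polls - 1:
            # Pause between polls; no pause after the final attempt.
            sleep(poll_seconds)
    raise TimeoutError(
        f"query did not reach status {target_status!r} "
        f"after {max_polls} polls"
    )
```

The poll interval and the poll limit are arbitrary defaults chosen for illustration; in practice they should stay well below the keepAlive timeout set when the query was created.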

Get a Query Schema

This method gets the survey response data schema for the surveyId and queryId specified as the path parameters. It has no query parameters.

The variable schema contains all the necessary information about the requested variables, such as variable id, type, labels, and predefined answers. For more details, see "Understanding Response Data Schemas".

Note that the schema is fixed within a single query, so there is no need to request it more than once. You can even omit this call if you already have this information, for example if your program always works with a predefined set of variables.
Request
GET  https://<host>/v1/surveys/p1231234/responses/data/query/12345/schema HTTP/1.1
Accept: application/json
Authorization: Bearer <access_token>
Response
{
    "version": "1",
    "defaultConfirmitLanguageId": 9,
    "languages": [
        { "confirmitLanguageId": 9 }
    ],
    "root": {
        "name": "response",
        "keys": [
            {
                "name": "responseid",
                "variableType": "numeric",
                "isSystemVariable": true
            }
        ],
        "variables": [
            {
                "name": "q1",
                "variableType": "numeric",
                "isSystemVariable": false,
                "precision": 2,
                "scale": 0
            }
        ]
    }
}
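
To illustrate how a client might use the schema, the following Python sketch lists the key and variable names found at the root level of a schema like the one above. The function name is hypothetical, and the sketch only looks at the "root" object shown in this example; schemas with further levels are covered in "Understanding Response Data Schemas" and are not handled here.

```python
def schema_variable_names(schema):
    """Return (key_names, variable_names) for the root level of a schema.

    Hypothetical helper: it reads only the "keys" and "variables" lists
    of the "root" object, as shown in the sample response.
    """
    root = schema.get("root", {})
    key_names = [key["name"] for key in root.get("keys", [])]
    variable_names = [var["name"] for var in root.get("variables", [])]
    return key_names, variable_names


# Usage with a trimmed copy of the sample response above.
sample_schema = {
    "root": {
        "name": "response",
        "keys": [{"name": "responseid", "variableType": "numeric"}],
        "variables": [{"name": "q1", "variableType": "numeric"}],
    }
}
print(schema_variable_names(sample_schema))  # (['responseid'], ['q1'])
```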

Get Query Data

This method gets the data created by the query. The response differs depending on the format specified for the query.

x-ndjson: Gets the next portion of data records as a JSON array for the surveyId and queryId specified as the path parameters.

This method has one optional query parameter:

maxSize - maximum number of records to be returned within this portion. Default value = 100.

This method is meant to be called repeatedly until all the requested records have been read. To read all the requested records, check the response code after each call. Keep calling the method while the response code is 206, which means that the current portion of records was read successfully but more records remain.

Response code 200 means the query is completed and all the requested records have been read. The query will be automatically closed after a while.

The Public API reads data in a way that minimizes latency on both the client and the server side. As a result, data chunks contain a variable number of records. The maxSize parameter only sets an upper limit on the number of records in a chunk. You must take this into account when processing records. In other words, to find out whether any records are left, do not rely on the number of records in the current chunk; check the response code instead.
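
The read loop described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference client: fetch_chunk is a caller-supplied function (hypothetical) that performs the GET data request with the given maxSize and returns the HTTP status code together with the parsed list of records.

```python
def read_all_records(fetch_chunk, max_size=100):
    """Yield every record of an x-ndjson query, chunk by chunk.

    fetch_chunk(max_size) is a caller-supplied function (hypothetical)
    that performs GET .../query/<queryId>/data?maxSize=<max_size> and
    returns a (status_code, records) tuple.
    """
    while True:
        status_code, records = fetch_chunk(max_size)
        if status_code not in (200, 206):
            raise RuntimeError(f"unexpected status code {status_code}")
        # A chunk may be empty or shorter than max_size; that alone
        # does not mean the query is finished.
        yield from records
        if status_code == 200:
            # 200 means all requested records have been read.
            return
```

The loop stops only on response code 200 and never on the size of a chunk. A production client would probably also pause briefly after an empty 206 chunk instead of calling again immediately; that detail is left out to keep the sketch short.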

tsv+zip/csv+zip: Gets the finished ZIP archive containing all the data, written into one text file per level. The archive is not available before the query status is complete.
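
Once the archive has been downloaded, its files can be read with the Python standard library. The sketch below parses every file in the archive as delimited text. The function name is hypothetical, and two details are assumptions that this page does not confirm: that the files are UTF-8 encoded, and that the member names inside the archive can be used as table names.

```python
import csv
import io
import zipfile


def read_delimited_zip(zip_bytes, delimiter="\t"):
    """Parse every file in a zip archive as delimited text.

    Returns {member_name: rows}, where each row is a list of strings.
    Assumption: files are UTF-8 encoded (a leading BOM is tolerated).
    Pass delimiter="," for the csv+zip format.
    """
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            with archive.open(name) as member:
                text = io.TextIOWrapper(member, encoding="utf-8-sig",
                                        newline="")
                tables[name] = list(csv.reader(text, delimiter=delimiter))
    return tables
```

Rows are returned as raw lists of strings because the layout of each file, including whether it starts with a header row, is not specified here.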

Request
GET  https://<host>/v1/surveys/p1231234/responses/data/query/12345/data HTTP/1.1
Accept: application/json
Authorization: Bearer <access_token>
Response (x-ndjson)
[
   {
      "responseid": 1,
      "respid": 1,
      "status": "complete",
      "last_touched": "2020-01-21T16:09:40.817+01:00",
      "q1": 1
   },
   {
      "responseid": 2,
      "respid": 2,
      "status": "complete",
      "last_touched": "2020-01-21T16:09:40.817+01:00",
      "q1": 2
   }
]
Response (text zip)
---
HTTP/1.1 200 OK
Content-Disposition: attachment; filename="p637851556603.zip";
Content-Type: application/zip
<zipped content>

Table 1. Status Codes

Code  Description
200   The request has succeeded. Some survey data records for the specified query might be returned. The query is completed. The next GET data call will not return more records.
206   The request has succeeded. Some survey data for the specified query might be returned. If no data is returned, the specified query has no more records at the moment. The query is still active; run the request again for more records.
400   Bad request. The query parameters are not valid.
401   Unauthorized.
404   Not found. The survey or the query was not found.

Filter Expressions

Use the filterExpression parameter to filter the set of records you want to get from the API. See Create Filter Expressions for more information on syntax and usage.

Response Compression

You can turn on response compression for responses with a body length above 1000 bytes.

To enable compression, the client should add the "Accept-Encoding: gzip" request header. A compressed response contains the "Content-Encoding: gzip" header.
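
On the client side, a compressed body has to be decompressed before it is parsed. Some HTTP client libraries do this transparently; where that is not the case, the step can be sketched in Python as below. The function name and the plain-dict representation of the response headers are assumptions made for illustration.

```python
import gzip
import json


def decode_json_body(body, headers):
    """Decode a JSON response body, un-gzipping it when needed.

    body is the raw response bytes; headers is a plain dict of response
    headers. Header names are matched case-insensitively.
    """
    encoding = ""
    for name, value in headers.items():
        if name.lower() == "content-encoding":
            encoding = value.strip().lower()
    if encoding == "gzip":
        # The server compressed the body because the request carried
        # "Accept-Encoding: gzip".
        body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))
```

Checking the Content-Encoding header rather than assuming compression matters here, because responses with a body below the size threshold are returned uncompressed.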

Request
GET  https://<host>/v1/surveys/p1231234/responses/data/query/12345/data HTTP/1.1
Accept: application/json
Accept-Encoding: gzip
Authorization: Bearer <access_token>
Response
OK
Content-Type: application/json
Content-Encoding: gzip
<zipped data>