Preventing ODK submissions from prehistory
When collecting survey data, accurate timestamps are essential for establishing chronology and validating the data, yet they are often difficult to guarantee. In this post, I'll walk through a strategy I've used to get accurate timestamps with ODK Collect.
Motivation and context
In a previous job, we used ODK Collect as the data collection and validation component of long-lasting insecticidal net distribution campaigns funded by The Global Fund. These campaigns were door-to-door distributions of bednets organized at the provincial level. Each was accompanied by a short ODK form to record basic household information, compute the correct number of nets, record the number of nets distributed, and capture other information. The survey data was recorded entirely offline. Submissions were sent to the ODK Central server every day or every other day, depending on WiFi availability at the health zone offices where the phones were recharged.
The campaigns were intended to be brief, typically lasting two weeks for distribution across the province. In practice, however, they frequently extended over several months due to the complexity of the supply chain and challenges in last-mile delivery in areas with inadequate or nonexistent infrastructure. Additionally, net distribution campaigns were often phased campaigns. Data collectors and net distributors in each health zone would be trained and deployed independently and therefore some health zones would start or finish their campaigns earlier than others. Timestamps were an important source of data validation in this context since they not only indicated when the form was recorded, but also provided hints about which health zone may have submitted the data.
During early campaigns, a significant portion of our forms were submitted with dates in January 1970. These timestamps occurred because the phones would have their batteries removed or swapped, and the internal clock would reset to the Unix epoch on first boot. The automatic timestamps for form start and completion would then all be relative to the epoch rather than anchored in actual time. Additionally, if the form contained a date selection, the default value of the date widget is the phone's current date, so data collectors could simply continue through the form while recording aberrant data.
Detect and block submissions from prehistory
The solution to this issue is to block all data entry if the form detects a date that is before the start date of the data collection campaign. You can do this by setting a note type of question to be required. When the phone date is wrong, the first screen presented to the data collector will be an error message telling them that their phone date is incorrect and that they need to update their phone settings. Once done, the form will allow data collection to proceed.
I did this using an external configuration CSV file, distributed with the form, that contains the compile date of the form. If the phone's current date was before the compile date, the form would present the blocking error message. If not, the form would silently proceed. Here is the preamble in the survey tab of the form XLSX.
type | name | label | required | relevant | calculation |
---|---|---|---|---|---|
start | start | | | | |
end | end | | | | |
deviceid | device_id | | | | |
calculate | creation_date | | | | number(pulldata('configuration', 'value', 'key', 'creation_date')) |
calculate | current_date | | | | decimal-date-time(now()) |
calculate | creation_date_display | | | | date(${creation_date}) |
calculate | current_date_display | | | | date(int(${current_date})) |
note | ecran_accueil | Bienvenue sur le « Formulaire des Enregistrements pour le Dénombrement et Distribution des MILDA » | no | ${creation_date} <= ${current_date} | |
note | ecran_accueil_blocked | Bienvenue sur le « Formulaire des Enregistrements pour le Dénombrement et Distribution des MILDA » La date sur votre téléphone est fausse! Veuillez définir la date correcte pour continuer. La date sur le téléphone est ${current_date_display}. | yes | ${creation_date} > ${current_date} | |
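The survey worksheet. In English, the welcome label reads "Welcome to the 'Registration Form for the Enumeration and Distribution of LLINs'", and the blocked note adds: "The date on your phone is wrong! Please set the correct date to continue. The date on the phone is ${current_date_display}."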
key | value |
---|---|
creation_date | 18691 |
The configuration.csv file contents
How does this work? The first three lines are not important: they are the typical ODK preamble recording the start time, end time, and device identifier. The magic happens in the creation_date variable. This value is pulled from the configuration.csv file using ODK's pulldata() function. It is expressed as days since the Unix epoch so that it can be compared directly with current_date, which decimal-date-time(now()) also expresses in days. On Linux, you could get this value with the following command:
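#!/bin/bash
# Print today's date as whole days since the Unix epoch (requires GNU date).
# The subtracted term is local midnight on 1970-01-01, so the result follows this machine's timezone.
echo $(( ($(date +%s) - $(date -d '1970-01-01 00:00:00' +%s)) / 86400 ))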
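To avoid copying the number by hand, the same calculation can write the whole attachment. This is only a minimal sketch: it assumes GNU date and a comma-separated file with the key and value columns shown above. The function names days_since_epoch and write_config are made up for illustration and are not part of ODK. It counts days in UTC, so near midnight it can differ by one from a local-time calculation.

```shell
#!/bin/bash
# Sketch: generate configuration.csv holding today's date as whole days
# since the Unix epoch. Function names are illustrative only.

days_since_epoch() {
  # date +%s counts seconds since 1970-01-01 00:00:00 UTC; a day is 86400 seconds.
  echo $(( $(date +%s) / 86400 ))
}

write_config() {
  local out="$1"
  {
    printf 'key,value\n'
    printf 'creation_date,%s\n' "$(days_since_epoch)"
  } > "$out"
}

write_config configuration.csv
cat configuration.csv
```

Running this just before publishing a form version keeps the attachment's creation_date in step with the release.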
The creation_date is compared to the current_date in the relevant column of the two notes. If the current_date is before the creation_date, the second note, ecran_accueil_blocked, is shown to the user, instructing them to correct the date on their phone. Importantly, the required field is set to yes for this note, effectively blocking data collectors from continuing through the form. Otherwise, the ordinary welcome note ecran_accueil is shown and the form proceeds.
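The comparison can be tried outside ODK to see which note a given clock setting would trigger. The sketch below mirrors the two relevant expressions from the survey worksheet. The helper which_note is hypothetical and not an ODK function, and it assumes creation_date is a whole number and current_date is non-negative.

```shell
#!/bin/bash
# Sketch of the two relevant expressions, evaluated outside ODK.
# which_note is a hypothetical helper, not an ODK function.

which_note() {
  local creation_date="$1"   # whole days since the epoch, from configuration.csv
  local current_date="$2"    # possibly fractional days, like decimal-date-time(now())

  # Bash arithmetic is integer-only. For a whole-number creation_date and a
  # non-negative current_date, comparing against the truncated value is equivalent.
  local current_days="${current_date%.*}"
  current_days="${current_days:-0}"

  if (( creation_date <= current_days )); then
    echo "ecran_accueil"           # relevant: ${creation_date} <= ${current_date}
  else
    echo "ecran_accueil_blocked"   # relevant: ${creation_date} > ${current_date}
  fi
}

which_note 18691 0.02       # clock reset to the epoch: prints ecran_accueil_blocked
which_note 18691 18700.5    # clock set after the form was compiled: prints ecran_accueil
```

Exactly one of the two notes is relevant for any clock value, so the collector always sees either the normal welcome screen or the blocking one.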
But what are current_date_display and creation_date_display for? These values simply provide a nicer user experience: they convert the raw day counts into readable dates, so the error message can show the data collector the date their phone currently reports. If the form simply refused to accept data without more context, the data collector might assume something broke in ODK. Shown exactly what the form sees, they have everything they need to fix the issue.
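When preparing the CSV, it can help to check that a day count maps to the date you expect, which is roughly what the display calculations do inside the form. This is a sketch that assumes GNU date and works in UTC. The helper name days_to_date is hypothetical, and ODK's own date() conversion on a device may format the result differently.

```shell
#!/bin/bash
# Sketch: turn a days-since-epoch count back into a calendar date (GNU date, UTC).
# days_to_date is a hypothetical helper name.

days_to_date() {
  local days="$1"
  # "@N" tells GNU date to read N as seconds since the Unix epoch.
  date -u -d "@$(( days * 86400 ))" +%Y-%m-%d
}

days_to_date 0        # prints 1970-01-01
days_to_date 18691    # prints 2021-03-05
```

The second call uses the creation_date from the configuration.csv example, which corresponds to 5 March 2021.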
FAQs
Why put this value in a separate CSV? Why not embed it directly in the form itself?
You could embed the value directly in the form itself. However, this becomes a hassle at scale for two reasons.
Firstly, it is much easier to safely update a small CSV file than a complex ODK form. If you accidentally remove a brace during an update, your form may not compile, jeopardizing the whole campaign.
Secondly, there are specific conditions for updating a live ODK form in the field, and changing the survey portion of the form carries the risk of needing to redeploy your form. By keeping everything in an attached CSV file, you can redeploy the same form with an updated attachment and version number without having to ask data collectors to pull a new form. Updated form versions are pushed out and installed on data collection devices automatically.