Add script to extract and transform uploaded Timelapse data #71
Comments
There is one additional type of metadata missing from what we can get out of Timelapse (at least, for this partner), which is metadata about the camera traps / locations themselves: e.g. a camera ID, GPS coordinates... In the data above, the only place to glean this is from the So as it stands, we will need to join this Timelapse data with an external dataset. While working with this partner, to streamline this process, @mmckown set up a KoboToolbox form with the following questions for them:
As part of a Maybe we should come up with a generic template of this form, and recommend it as part of our GuardianConnector-supported workflow for working with camera trap data and Timelapse? (@abfleishman note that this mirrors what you were thinking about in the discussions about how to get bioacoustic sensor metadata from the client)
I think this "SearchTerms" and the entire ImageSetTable are specific to Timelapse and may not be useful to capture separately.
One thought is around why we are ingesting these data: to store long term, to allow analysis/visualization, etc. If one of the things we want to support is allowing folks to retrieve something from GC that would allow them to open a "project" in Timelapse again, then it seems valuable to know what is needed to "import" a project into Timelapse. (All the important info might be in the template file + the CSV export. IDK.)
Yes, that does seem interesting. I think there is probably other info that we want to capture beyond Matthew's short form, but that is from a science standpoint. And I bet there is a bunch of info that community members might want that I would not think of.
Good question. I think the answer is likely to be "yes", if we ask. The same question comes up about CoMapeo data all the time (where the answer is decisively "no"). Per page 91 of the Timelapse Reference Guide, it is possible to import a CSV, but only when already working with a So, if we decide to store the
Since Version 2.3.2.0 (released in June 2024), Timelapse supports tracking of Project- and Station-level metadata (see: Timelapse Metadata Guide and release notes). This appears to allow recording much of the deployment metadata (e.g., coordinates, camera trap name, etc.) directly within Timelapse, enabling it to serve as a singular source of truth. This could reduce redundancy, minimize potential points of failure, and help avoid edge cases associated with managing multiple upstream data sources.

For example, one concern I have with using KoboToolbox for metadata collection is the need to customize forms to accommodate bespoke metadata needs. The KoboToolbox form builder UI can sometimes change field names during customization, and it's not always obvious to non-technical users that this has occurred.

However, this raises the question of how such metadata is actually inserted into a Timelapse project. In our experience, much of this information is collected in the field using pen and paper along with a handheld GPS. Is the intended workflow for users to enter this data into Timelapse manually afterward, when they get back from the field? If so, this introduces the possibility of errors, such as missing digits in lat/lon values.

Is there a way to streamline or optimize this metadata collection workflow? Perhaps we could still use KoboToolbox (or similar) and merge the data into the Timelapse database using a script? I'm planning to reach out to the Timelapse team to get their input.
Also, just documenting @abfleishman's suggestion (corroborated by a colleague) to utilize the GBIF recommendations for standardized camera trap metadata. I am guessing that the most relevant section for our consideration is the list of column naming conventions in 3.4.1. Column naming.
Zooming out: I think I'm leaning toward taking a different approach to this script, now that we know Timelapse supports Project, Station, and Deployment metadata—and that the columns for all of these (much like the image annotation data itself) can be customized at will. We're likely better off treating this script as not expecting any predefined structure—just as we do for all connector scripts except We do know there are specific tables (or CSV exports) for

If we try to enforce a rigid schema for Timelapse data, we're likely to encounter mismatches—where users have unknowingly modified a structured Timelapse template to suit their use case—leading to ingestion errors.

One downside to this approach is the risk of missing data needed for front-end visualization. For instance, a station's latitude/longitude—which in Timelapse are stored in generic

An alternative could be a semi-structured approach, as we use in e.g. the CoMapeo script. There, we do know that lat/lon fields exist under specific names. But in Timelapse, all fields are user-editable—including renaming latitude/longitude to localized translations—so there's no reliable way to identify them without introspecting values heuristically, which is fraught.

Other opinions welcome.
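To illustrate the schema-less approach, here is a minimal sketch (the function name is hypothetical, not an existing connector helper) that copies whatever columns a Timelapse `DataTable` happens to have into a destination SQLite database, discovering the structure at runtime rather than assuming a template:

```python
import sqlite3

def copy_table_dynamic(src_db: str, dest_db: str, table: str = "DataTable") -> int:
    """Copy a Timelapse table into a destination SQLite db without
    assuming any column names beyond the table name itself."""
    src = sqlite3.connect(src_db)
    dest = sqlite3.connect(dest_db)
    try:
        # Discover the columns the user's template actually produced.
        cols = [row[1] for row in src.execute(f'PRAGMA table_info("{table}")')]
        # No declared column types, so SQLite preserves source values as-is.
        col_defs = ", ".join(f'"{c}"' for c in cols)
        dest.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
        rows = src.execute(f'SELECT * FROM "{table}"').fetchall()
        placeholders = ", ".join("?" for _ in cols)
        dest.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
        dest.commit()
        return len(rows)
    finally:
        src.close()
        dest.close()
```

A real connector script would also need to decide on destination column types and handle re-runs, but the core point is that nothing here hard-codes a Timelapse template.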
Yikes. If the upstream data format is known and consistent, then I see no need to accept open-ended data; however, if it's true that all fields are renamable, then we have no choice but to accept anything that's given. Here's an edge case: what if today the data has a column "latitude" but tomorrow it has a column "breedtegraad"? It's one thing to allow a user-defined schema; it's another to allow a dynamic schema. One option might be to have the Windmill script take as another input the (user-edited) names of the cols representing lon and lat. Editing the Timelapse schema then requires also editing the script run params.
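One way to sketch that suggestion: the script takes the user-edited column names as run parameters, so renaming latitude to e.g. "breedtegraad" in Timelapse only requires updating the params, not the code (function and parameter names below are illustrative assumptions):

```python
import sqlite3

def extract_coordinates(ddb_path: str, lat_col: str, lon_col: str,
                        table: str = "DataTable") -> list:
    """Read user-specified lat/lon columns from a Timelapse SQLite db.

    lat_col and lon_col are passed in as script run parameters, since
    Timelapse lets users rename (or localize) these fields at will.
    """
    conn = sqlite3.connect(ddb_path)
    try:
        cur = conn.execute(f'SELECT "{lat_col}", "{lon_col}" FROM "{table}"')
        return cur.fetchall()
    finally:
        conn.close()
```

If the named columns don't exist, `sqlite3` raises an `OperationalError`, which at least surfaces the schema drift at ingestion time instead of silently dropping coordinates.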
Good idea, thanks. There might be quite a few params to define like this for the envisioned FE described in ConservationMetrics/gc-explorer#46. But I'll give this some thought. |
Feature Request
Add a script that takes a path for Timelapse data (in ZIP file format), transforms the data to write to a database, and copies attachments to the datalake.
The script should have an optional "delete?" boolean, similar to the Locus Map script.
One key question is what kind of tabular data we want to ingest: a CSV export, or the raw contents of the Timelapse SQLite database(s).
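Whichever tabular format we pick, the mechanics described above (unpack a ZIP, locate databases/CSVs, identify attachments) could be sketched roughly as follows. The function name, return shape, and optional delete flag are illustrative assumptions, not a final Windmill signature:

```python
import tempfile
import zipfile
from pathlib import Path

def ingest_timelapse_zip(zip_path: str, delete: bool = False) -> dict:
    """Unpack an uploaded Timelapse ZIP and locate its SQLite
    databases, CSV exports, and media attachments."""
    workdir = Path(tempfile.mkdtemp())
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(workdir)
    files = list(workdir.rglob("*"))
    result = {
        "databases": [str(p) for p in files if p.suffix in (".tdb", ".ddb")],
        "csvs": [str(p) for p in files if p.suffix == ".csv"],
        "media": [str(p) for p in files
                  if p.suffix.lower() in (".jpg", ".jpeg", ".png", ".avi", ".mp4")],
    }
    if delete:
        # Mirror the Locus Map script's optional cleanup behavior.
        Path(zip_path).unlink()
    return result
```

From there, the databases/CSVs would be transformed and written to the warehouse, and the media copied to the datalake.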
Timelapse data structure
Note: Sample camera trap assets for a CMI partner (requires authorization to access) here: https://drive.google.com/drive/u/2/folders/1dWk2g3tK5tE5jM7hUfPuLBH7YHdtlHuy
In Timelapse, the `.tdb` database stores the project template and metadata, and the `.ddb` database stores the annotations as applied to camera trap media. They are both SQLite dbs. Lastly, it is possible to export Timelapse annotation data as a CSV.

Let's dig into all three.
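As a quick way to verify this structure against any uploaded asset, both files can be opened with Python's built-in `sqlite3` module and their tables listed:

```python
import sqlite3

def list_tables(db_path: str) -> list:
    """List the tables in a Timelapse .tdb or .ddb SQLite database."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [r[0] for r in rows]
    finally:
        conn.close()
```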
.tdb

Within the `.tdb` file, there are three tables: `TemplateInfo`, `TemplateTable`, and `sqlite_sequence`. Only the `TemplateTable` is of interest for ingestion; the other two contain non-relevant file metadata.

TemplateTable
The columns in the `TemplateTable` are standardized:

- `Id` (int): primary key
- `ControlOrder` (int): not 100% sure, but I'm guessing the order in which the fields appear in Timelapse.
- `SpreadsheetOrder` (int): not 100% sure, but I'm guessing the order in which the fields appear in a CSV export.
- `Type` (text): the type of template field, e.g. File, RelativePath, DateTime, DeleteFlag, FixedChoice, Counter, Note, Flag
- `DefaultValue` (text): default value for the field
- `Label` (text): the user-facing label, e.g. "Tënon_Soort" (type of animal in Wayana)
- `DataLabel` (text): the system name for the field, e.g. "Tenon_Soort"
- `Tooltip` (text): user-facing tip about the data field, e.g. "welke soort dier/tenon mekpale" (Dutch/Wayana for "which kind of animal")
- `TXTBOXWIDTH`: the width of the data field in Timelapse
- `Copyable` (text): true/false value for whether the field can be copied or not.
- `Visible` (text): true/false value for whether the field is hidden or not.
- `List` (text): if it's a `FixedChoice` field, then the list of choices will be stored here in the following way:
- `ExportToCSV` (text): true/false value for whether the field will be included in a CSV export.

.ddb
Within the `.ddb` file, there are five tables (I'm not sure if these are standard or specific to the CMI partner asset): `DataTable`, `ImageSetTable`, `MarkersTable`, `TemplateTable`, and `sqlite_sequence`.

TemplateTable
This table is actually a duplicate of the homonymous table in the `.tdb`! So we may not need to work with the `.tdb` database at all.

ImageSetTable
This table seems to store metadata about the root folder of the project, and the number of media files annotated:

- `Id` (int): primary key
- `RootFolder` (text): the name of the folder where the template and media files are stored. For example, this might be "Recordings per month"
- `Log` (text): unsure; it's empty in the partner asset
- `Row` (text): number value, but it doesn't seem to match the number of rows in `DataTable`. So not sure what this refers to.
- `VersionCompatibily` (text) (not a typo): the version of Timelapse used, for example, "2.3.1.0"
- `SortTerms` (text): an array composed of terms used... for sorting.
- `SearchTerms` (text): object composed of a `SearchTerms` array specifying search capabilities for each term (I provide just one example below), and then a bunch of key/value pairs of which I'm not sure what they do within Timelapse, or if they are useful to store. @abfleishman might know.

MarkersTable
Not sure if this table has any value for ingestion, or what it does for that matter. For the partner asset, there are two columns, and empty rows:

- `id` (integer): primary key
- `Tapsik_aantal` (text): this is for a Counter field type, so perhaps this table is added by default for counter fields, and the user did not leverage whatever functionality it offers in Timelapse.

DataTable
This table stores all of the annotations.

There is an `id` (integer) primary key column. The rest of the columns are the values of the `DataLabel` column for each row in `TemplateTable`.

For example, the linked asset has the following columns (the parenthetical here represents the Timelapse field type in `TemplateTable`):

- `id` (integer)
- `File` (text)
- `RelativePath` (text)
- `DateTime` (datetime)
- `DeleteFlag` (text)
- `Tenon_Soort` (text)
- `Tapsik_aantal` (text)
- `Takahe_eitop` (text)
- `Ipokan_foto` (text)
- `Enijika_tenei` (text)

Reviewing the Timelapse Template Guide, the following columns are mandatory and the `DataLabel` cannot be changed:

- `File`
- `RelativePath`
- `DateTime`
- `DeleteFlag`
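Since only these four `DataLabel`s are guaranteed, an ingestion script could introspect whatever columns a given `DataTable` has and verify just the mandatory ones. A sketch (helper names are hypothetical; error handling omitted):

```python
import sqlite3

# Per the Timelapse Template Guide, these DataLabels cannot be renamed.
MANDATORY_COLUMNS = {"File", "RelativePath", "DateTime", "DeleteFlag"}

def data_table_columns(ddb_path: str) -> list:
    """Return DataTable's column names via PRAGMA introspection."""
    conn = sqlite3.connect(ddb_path)
    try:
        return [row[1] for row in conn.execute('PRAGMA table_info("DataTable")')]
    finally:
        conn.close()

def missing_mandatory(columns) -> set:
    """Any mandatory Timelapse columns absent from the table."""
    return MANDATORY_COLUMNS - set(columns)
```

Everything beyond the mandatory set would be ingested as-is, in keeping with the schema-less approach discussed in the comments.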
CSV export
In addition to accessing the raw `.tdb` and `.ddb` files, it is also possible to have the user export their data to a CSV file from within Timelapse. The menu option is called Export all data (folder data, image/video data) to a CSV file.

This file yields only the annotations, and the root folder. In other words, the same data as the `DataTable` from the `.ddb`, supplemented by the `RootFolder` column from the `ImageSetTable` from the `.ddb`. For example (some names changed for privacy)...

What kind of data do we want to store for the partner?
The CSV seems to really capture the bulk of what a user might want out of their Timelapse annotations. But having dug into the contents of the `.ddb` above, the key question is: is there anything in the Timelapse SQLite databases that we might want to store for the partner that isn't already provided in the CSV export?

For example, I can imagine that the contents of `ImageSetTable` -> `SearchTerms` might be useful for analysis purposes. But maybe not. (Again, @abfleishman, curious to hear your take on this.)

Furthermore, for reasons of provenance it might be good to store more of the Timelapse metadata in as high-fidelity a form as possible. One really useful piece of metadata that is not included in this CSV is `ImageSetTable` -> `VersionCompatibily` (sic), which stores the actual version of Timelapse that was used for annotating. And perhaps even the field properties (like default values, or text box width?) might be useful someday... or they are just noise.

One non-trivial advantage of a CSV export is that it is an easier thing to ask for. Many users are familiar with the workflow of exporting a file from a software program, and then finding that file (e.g., in their Desktop or Downloads folder) to send or upload it somewhere. Getting the SQLite database files will require the partner to use Windows Explorer to find their Timelapse root folder. Or... do they need to do that anyway to get the images/videos?