The telemetry data archive is a critical component of satellite ground systems. It provides historical data for engineering analysis, supporting satellite anomaly troubleshooting, data trending, and the generation of operational reports. Telemetry archives are also essential for satellite digital twins, as they supply the datasets used to train telemetry-based data models. Because this training occurs in real-time or near-real-time environments, it must be highly efficient. Achieving this efficiency requires not only optimized training algorithms for telemetry datasets, but also high-performance interfaces to the telemetry archives themselves, which have become a key bottleneck in improving data modeling efficiency for satellite digital twins [1].
This technical note presents a database schema for optimizing telemetry data archives, thereby improving the efficiency of the interface between data training algorithms and the telemetry archive. Telemetry data downlinked from a satellite are compacted into data streams as frames or packets. Each frame or packet includes a time tag in its header and sensor data values at specific bit locations, as defined in the telemetry database for a given mission. The bit length of each sensor value is tailored to carry sufficient information while allowing more sensor data to be packed into a single telemetry packet.
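As a minimal sketch of the de-commutation step described above, the function below extracts one unsigned sensor value from a packed frame given its bit offset and bit length. The layout (offsets, lengths, and the example frame bytes) is hypothetical; in practice these come from the mission's telemetry database.

```python
def extract_field(frame: bytes, bit_offset: int, bit_length: int) -> int:
    """Return the unsigned integer packed at bits [bit_offset, bit_offset + bit_length)."""
    # Treat the whole frame as one big-endian integer, then shift and mask.
    as_int = int.from_bytes(frame, "big")
    total_bits = len(frame) * 8
    shift = total_bits - bit_offset - bit_length
    return (as_int >> shift) & ((1 << bit_length) - 1)

# Hypothetical 2-byte frame: a 6-bit sensor value starts at bit offset 2.
frame = bytes([0b10110100, 0b01100000])
value = extract_field(frame, 2, 6)   # -> 0b110100 == 52
```

The same routine works for any bit-aligned field, which is why per-value bit lengths can be tailored per mnemonic without wasting packet space.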
Telemetry data archives are generally implemented using two main approaches. The first approach stores telemetry data streams as files that must be de-commutated during data retrieval, which imposes additional computational overhead for both de-commutation and search. The second approach leverages database servers—particularly time-series databases such as InfluxDB[2] or Tiger Data[3]—which provide high performance for data ingestion and retrieval. In this case, telemetry databases typically use a common schema of time tag, mnemonic ID, and value. However, de-commutating the telemetry stream into this schema often leads to a substantial increase in data volume, even when additional compression schemes are applied within the database.
The new schema for the telemetry database takes into account the following characteristics of telemetry datasets. First, telemetry is time-series data with a generally fixed sampling rate; at 1 Hz, for example, the period between data points is 1 second. Second, the bit length of most telemetry data values is one byte or less. In addition, discrete telemetry data points that represent sensor operation status are mostly constant for extended periods of time. These characteristics leave considerable room for data compression to reduce data volume.
In the proposed database schema, telemetry data are stored in chunks. Each chunk contains compressed telemetry data values over an extended period, such as one orbital period for low Earth orbit (LEO) satellites or every two hours for geosynchronous satellites. The optimized telemetry archive schema consists of three variables:
{time: REAL, int_id: INTEGER, values: BLOB}
This schema is designed for standard databases such as SQLite or PostgreSQL. The time field represents the start time of a data chunk, which covers an extended period. Because the sampling rate is fixed, individual time tags for each data point can be reconstructed during data retrieval. The mnemonic ID in telemetry data is typically a string, and for some missions it can be very long due to specific naming conventions. Mapping each mnemonic string to an integer ID in the telemetry database reduces the overall data volume. This mapping can be easily implemented using a static data file or a simple lookup table within the telemetry database.
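A minimal SQLite sketch of this schema, including a mnemonic lookup table, might look as follows. Table, column, and mnemonic names are illustrative, not taken from the note; `values` must be quoted because it is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mnemonics (
    int_id   INTEGER PRIMARY KEY,      -- compact integer ID
    mnemonic TEXT UNIQUE NOT NULL      -- long mission-specific string
);
CREATE TABLE telemetry_chunks (
    time     REAL    NOT NULL,         -- start time of the chunk (epoch seconds)
    int_id   INTEGER NOT NULL REFERENCES mnemonics(int_id),
    "values" BLOB    NOT NULL,         -- compressed data points for the period
    PRIMARY KEY (time, int_id)
);
""")

# Hypothetical mnemonic mapped to its integer ID via the lookup table.
conn.execute("INSERT INTO mnemonics (mnemonic) VALUES (?)",
             ("ACS_GYRO_1_TEMP_DEGC",))
int_id = conn.execute("SELECT int_id FROM mnemonics WHERE mnemonic = ?",
                      ("ACS_GYRO_1_TEMP_DEGC",)).fetchone()[0]
```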
The values column stores compressed binary images of data points over an extended period, whose duration depends on the operational requirements of the telemetry archive. Periods ranging from 1 hour 40 minutes (approximately one orbit for a low Earth orbit satellite) to 2 hours (for a geosynchronous satellite) are sufficient to enable effective data compression. At a 1 Hz sampling rate, each chunk for these periods contains roughly 6,000 to 7,000 data points. For some mnemonics, such as those used to determine satellite attitude and support high spatial resolution in remote sensing images, the sampling rate may be higher. In this schema, the sampling rate is implicit and can be obtained from the telemetry database; it is essential during data retrieval for reconstructing the time tags of each telemetry data point.

The initial test was performed on the telemetry data of a sun-synchronous LEO satellite with an orbital period of 101 minutes. There are about 1,400 active mnemonics during daily operations, and the data volume of the packet images before de-commutation is 700 MB per day. SQLite was used for the telemetry archive; it is a lightweight database and provides an ideal testing environment for a telemetry archive with the new schema. A straightforward ingest with mnemonic ID, time tag, and value for each data point leads to a data volume of 11 GB per day. Data compression uses the zstandard package in Python, which offers fast compression with high compression ratios. The chunked schema reduces the data volume to about 370 MB per day, a factor of roughly 1.9 reduction relative to the packet files before de-commutation, and about a factor of 30 reduction relative to the 11 GB per day of the uncompressed time-tag-and-value database.
The chunked schema for the telemetry database also makes retrieval of specific mnemonics more efficient: the data volume returned by a query is at least a factor of 2 smaller than with the standard time-tag-and-value schema, while the flexibility and speed of database search are retained.
The chunked schema is only feasible if the database supports storing binary images using a BLOB (binary large object) data type. In our earlier implementation of the telemetry data analytics tool[4], we used InfluxDB[2] as the time-series database. However, InfluxDB does not support a BLOB data type and therefore cannot accommodate a chunked schema for telemetry data. In contrast, BLOB support provides additional flexibility for representing the outputs of data training, since telemetry data patterns vary and the number of model parameters differs across datasets. Relational databases such as SQLite and PostgreSQL can be used to implement a telemetry database with the chunked schema, and time-series–oriented relational databases like TimescaleDB[3] also support BLOBs for this purpose.
In summary, the chunked schema for telemetry databases in satellite ground systems provides substantial reductions in data volume and significant improvements in data retrieval efficiency. This is particularly important for data training in satellite digital twins, which require frequent queries and retrieval of large telemetry datasets.
References:
[1] Zhenping Li, "Satellite Digital Twins", Journal of Satellite Operations & Communicator, July 2024.
[2] InfluxDB web site.
[3] Tiger Data web site.
[4] Zhenping Li, "Revolutionize Satellite Health and Safety Monitoring with Advanced Intelligent Monitoring System", ASRC Federal White Paper, August 2023.