Decrease data usage with Blue hardware
#1
Hi, 

I've been following Blitzortung as an observer for a little while, and I have the opportunity to add a sensor in central QLD, Australia. This would improve coverage of Papua New Guinea, Noumea, and north-east Australia, where no sensors exist. My question, however, is whether the data usage can be decreased. The network currently available uses a 3G backhaul with a 6 GB monthly allowance shared by 15+ users. We will likely be moving to a satellite connection, but this will also be limited, perhaps 30-40 GB/month. I see the Red board can pump out 3 GB+ a day during intense storms, and being close to the tropics during the wet season, this site could expect to see some of the most intense electrical storms on the planet.

Is there a way to compress the data and lower the usage?
#2
Hi,

Yes, the high amount of traffic is an issue that we want to solve, e.g. by transmitting only usable data/channels. Work on this is currently in progress. It's independent of the system (RED/BLUE), as it only relates to the data transmission. A new (beta) firmware will be out very soon which will address even more issues. Please be patient.

Best regards
Tobias
Stations: 538, 1534, 1712, 2034, 2219
#3
Is the data uncompressed? Compression could be an option.
#4
I'd estimate that lossless compression couldn't save much more than 50% of the traffic, which probably wouldn't be enough for anakaine. However, it's possible to lower the gain, detect fewer strikes, and thus have less traffic.

But another option came to mind: maybe in exceptional cases it's possible to transmit only the timestamp and some parameters instead of the whole wavelet.
Tobi or Egon would have to confirm whether that's an option, though, since the waveform may be taken into account in the calculation at some point.
Stations: 233
#5
(2016-01-14, 10:06)Steph Wrote: I'd estimate that lossless compression couldn't save much more than 50% of the traffic, which probably wouldn't be enough for anakaine. However, it's possible to lower the gain, detect fewer strikes, and thus have less traffic.
It's even less, maybe 10 to 20%. It's binary data, like an exe or jpeg, which cannot be compressed very much.

(2016-01-14, 10:06)Steph Wrote: But another option came to mind: maybe in exceptional cases it's possible to transmit only the timestamp and some parameters instead of the whole wavelet.
Tobi or Egon would have to confirm whether that's an option, though, since the waveform may be taken into account in the calculation at some point.
Yes, that's what I'm currently working on, but the timestamp itself is not enough for accurate location calculation.
Stations: 538, 1534, 1712, 2034, 2219
#6
(2016-01-14, 13:36)Tobi Wrote: It's even less, maybe 10 to 20%. It's binary data, like an exe or jpeg, which cannot be compressed very much.

I agree, if you use a simple "dictionary"-type compression on the original signal.
But if you store only the deltas between consecutive samples you can save some bits, since there is not much change from one sample to the next.

For example: if I store the signal from my green station it's 256 bytes; with default gzip it's 170 bytes.
But if I take the delta between consecutive samples I only need 2 or 3 bits per sample, which gives 64-96 bytes even before any further compression.

For red stations, or blue with higher frequencies, it's probably bigger, but I still think 50% is achievable, or even more if you find a way to use a dynamic number of bits per sample.
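
To make the delta idea concrete, here is a minimal sketch. It assumes 8-bit samples and uses a synthetic, slowly varying waveform rather than real station data; the function names are made up for this illustration and are not part of any firmware.

```python
import math

def delta_encode(samples):
    """Keep the first sample, then store only the change to the next one."""
    out = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - prev)
    return out

def delta_decode(deltas):
    """Rebuild the original samples with a running sum."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

def bits_needed(values):
    """Smallest two's-complement width that can hold every value."""
    bits = 1
    while not all(-(1 << (bits - 1)) <= v < (1 << (bits - 1)) for v in values):
        bits += 1
    return bits

# Synthetic 256-sample, 8-bit waveform: one slow sine cycle (an assumption,
# chosen only to mimic "not much change between the samples").
signal = [128 + round(100 * math.sin(2 * math.pi * n / 256)) for n in range(256)]
deltas = delta_encode(signal)

assert delta_decode(deltas) == signal      # lossless round trip

width = bits_needed(deltas[1:])            # the first sample is sent as-is
packed_bytes = 1 + math.ceil(width * (len(signal) - 1) / 8)
print(width, packed_bytes)
```

With this synthetic input the deltas fit in 3 bits, so 256 raw bytes shrink to under 100 bytes. Real signals with steeper edges would need wider deltas, which is where a dynamic number of bits per sample would help.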
Stations: 233
#7
To drop in my two cents: I have to agree with Steph, a delta encoder has a good chance of performing well if you actually send a waveform.

Several other stream compression methods could potentially perform quite well too. There is also the question of what fidelity is necessary: if you know what you really need in the signal, you could build a short period of it out of a series of predetermined orthogonal elements, linearly combined to match the signal with an acceptable error. Sending that combination could be a lot "cheaper" in terms of data than the current method. The question, of course, is how feasible it would be to do that on the fly with the existing hardware.

There are three issues with most methods I can think of though:
a) Server load during busy moments.
b) Packet loss: the moment you lose one packet, you would probably have to throw away the rest of the data unless you add redundancy.
c) Latency: if you hold on to your data longer you could gain more, but you lose the quick response time.
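
As a rough illustration of the "orthogonal elements" idea, the sketch below projects a window of samples onto a cosine (DCT-II) basis and keeps only the first few coefficients. The choice of basis, the window length and the synthetic pulse are all assumptions made for this example; nothing here reflects what the stations actually do.

```python
import math

N = 64  # samples per signal window (assumed)

def basis(k, n):
    """Orthonormal DCT-II basis function k evaluated at sample n."""
    scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return scale * math.cos(math.pi / N * (n + 0.5) * k)

def analyse(x, keep):
    """Project x onto the first `keep` basis functions."""
    return [sum(x[n] * basis(k, n) for n in range(N)) for k in range(keep)]

def synthesise(coeffs):
    """Rebuild a signal as a linear combination of basis functions."""
    return [sum(c * basis(k, n) for k, c in enumerate(coeffs)) for n in range(N)]

def rms_error(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

# Synthetic smooth pulse (a Gaussian bump), not real station data.
x = [100 * math.exp(-((n - 32) / 10) ** 2) for n in range(N)]

full = synthesise(analyse(x, N))     # all 64 coefficients: exact up to rounding
short = synthesise(analyse(x, 12))   # 12 coefficients: an approximation
print(rms_error(x, full), rms_error(x, short))
```

The station would send the kept coefficients instead of the samples. How many coefficients are needed depends entirely on the fidelity the locating algorithm requires, which is the open question raised above.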
#8
I was thinking QuickLZ, LZO, LZ4, LZ4-HC, or zlib for the compression. My first concern would be how much data needs to be compressed and how much CPU power the Blitzortung receivers have. Would they be able to compress the data fast enough? The second concern is the power needed to decompress at the server end. I agree some data is not very compressible, and some is very easily compressible. I am still waiting for System Blue to launch so I can participate. Having not seen the data myself, I cannot judge. Compression may not even be a worthwhile option.
#9
The majority of the data transmitted is 'junk': high gains, many useless skywaves and noise signals, local disturbers (which I am plagued with).
The goal is to eliminate as much junk as possible. Ideally, perhaps, the network would work with stations covering a radius of <1000 km, and not attempting to detect signals in other hemispheres. This requires station density and optimizations. Many stations send too much noise, including mine. As more stations come online, stations such as mine will be reducing their antennas and gain settings. Some of those stations are quite clean, in a good environment, and do very well at longer range. Many do not.
There will be a push for smaller antennas, less 'distance' capability, better quality of data, etc., since long-distance detection is NOT the way the system is envisioned. These are not 'stand-alone' systems, but must participate as a 'cell' in a network to be effective.
For example, a local station should normally go into interference mode and quit transmitting during 'nearby' storms, while the rest of the network picks up the data.
The TOA / TOGA system considers the whole pulse train, the frequencies and 'respective energy' contained in the impulse, and not just the discharge pulse timing at trigger, or triggering by the 1st or 2nd skywave signal. Therefore 'recreating' or 'interpolating' those zero-crossing iterations from a 'sampling' would likely result in 'distortion' of the quality control and stroke information. The system wants as much of the complete stroke, with real data, as clean as possible.


Stations: 689, 791, 1439, 3020
#10
First, I agree on trying to get the amount of data fed to the server down as much as possible. Not everyone has an unlimited connection, and not everyone, myself included, has a high-speed connection either.
Could one idea be some kind of optional "low traffic" setting that reduces the amount of data fed to a minimum, even if the hit rate decreases a bit?

Quote: As more stations come on line, those such as I will be reducing our antennas and gain settings.

That sounds good only on paper. Any such thing needs to be enforced from the server end if you really want it applied; a "can you please lower your gain" request will never work. It can be seen clearly on the weather networks (like EWN, which I admin) that not everyone cares about their stations.
Stations: 1600
#11
(2016-01-16, 15:06)weatherc Wrote:
Quote: ... Any such thing needs to be enforced from the server end ...

... yep!
... done to a limited extent already with the forced "interference mode" changes in System Red ...


Stations: 689, 791, 1439, 3020
#12
(2016-01-16, 09:16)kevinmcc Wrote: I was thinking QuickLZ, LZO, LZ4, LZ4-HC, or zlib for the compression. My first concern would be how much data needs to be compressed and how much CPU power the Blitzortung receivers have. Would they be able to compress the data fast enough? The second concern is the power needed to decompress at the server end. I agree some data is not very compressible, and some is very easily compressible. I am still waiting for System Blue to launch so I can participate. Having not seen the data myself, I cannot judge. Compression may not even be a worthwhile option.

I see a significant issue with those choices. These compression algorithms only become efficient if you build up data in a buffer for a while. It would be quite efficient if you, let's say, built up data for an hour at a time and then sent it all in bulk, but that would delay the detection significantly, while I presume the target is to minimize latency. You really want stream compression algorithms for this sort of job. There are a few near-ideal candidates, but these are bad to use in practice for data you *must have* due to their total lack of redundancy. If you're willing to sacrifice a few detections you could probably get away with it, though.
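
The buffering trade-off can be sketched with Python's standard zlib module. The packet layout below (a 4-byte counter followed by 64 samples) is purely hypothetical, and the synthetic packets are deliberately similar to one another, so the gap between the two strategies will be larger here than on real station data.

```python
import math
import zlib

def make_packet(i):
    """Hypothetical 68-byte packet: 4-byte counter + 64 8-bit samples."""
    samples = bytes(128 + round(60 * math.sin(0.2 * n + i)) for n in range(64))
    return i.to_bytes(4, "big") + samples

packets = [make_packet(i) for i in range(500)]
raw = sum(len(p) for p in packets)

# Low latency: every packet is compressed and sent on its own.
per_packet = sum(len(zlib.compress(p)) for p in packets)

# High latency: wait until all packets are buffered, then compress once.
bulk = len(zlib.compress(b"".join(packets)))

print(raw, per_packet, bulk)
```

Compressing each small packet separately gains little or nothing, because the compressor has no history to refer back to and adds a fixed header and checksum every time. The single bulk pass comes out much smaller, at the cost of holding the data back.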
 

(2016-01-16, 12:51)Cutty Wrote: The majority of the data transmitted is 'junk': high gains, many useless skywaves and noise signals, local disturbers (which I am plagued with).
The goal is to eliminate as much junk as possible. Ideally, perhaps, the network would work with stations covering a radius of <1000 km, and not attempting to detect signals in other hemispheres. This requires station density and optimizations. Many stations send too much noise, including mine. As more stations come online, stations such as mine will be reducing their antennas and gain settings. Some of those stations are quite clean, in a good environment, and do very well at longer range. Many do not.
There will be a push for smaller antennas, less 'distance' capability, better quality of data, etc., since long-distance detection is NOT the way the system is envisioned. These are not 'stand-alone' systems, but must participate as a 'cell' in a network to be effective.
For example, a local station should normally go into interference mode and quit transmitting during 'nearby' storms, while the rest of the network picks up the data.
The TOA / TOGA system considers the whole pulse train, the frequencies and 'respective energy' contained in the impulse, and not just the discharge pulse timing at trigger, or triggering by the 1st or 2nd skywave signal. Therefore 'recreating' or 'interpolating' those zero-crossing iterations from a 'sampling' would likely result in 'distortion' of the quality control and stroke information. The system wants as much of the complete stroke, with real data, as clean as possible.

If the complete waveform is considered, it would indeed be tricky to decompose the signal into a series of orthogonal symbols unless you took the algorithm into account, though I doubt anyone is insane enough to actually consider doing that. I suppose more could be gained by introducing filtering to remove local noise and false signals in that case. Then again, a few gigabytes per day on a residential connection isn't really that much anymore in this day and age, with 100+ Mbps readily available in many regions. I think it's more of an issue on the receiving server end these days.
#13
I live in central Victoria (Australia) and am somewhat bandwidth constrained. I keep the gain down but still manage to pick up a lot of strokes in the 500 to 1000 km range.
It occurs to me that most of the junk signals are due to powerline noise (50/60 Hz), electric fences, UHF military transmissions, and the odd axle welding plant. If the waveform is converted to the frequency domain (FFT), it is very easy to design a very narrow-band filter to get rid of the junk, then forward the FFT to the server for processing. I am not sure how this would work with the stroke-picking algorithms, but it may be worth considering.
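
A minimal sketch of such a frequency-domain notch, using a plain DFT from the standard library so that it runs without extra packages. The window length, the disturber frequency (placed exactly on one DFT bin) and the pulse shape are assumptions for the example. A real disturber would rarely sit exactly on a bin, and mains hum in particular may be far below the resolution of a short window.

```python
import cmath
import math

N = 128          # samples per window (assumed)
HUM_BIN = 10     # DFT bin of the narrow-band disturber (assumed known)

def dft(x):
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N).real
            for n in range(N)]

def notch(x, bin_index):
    """Zero one frequency bin and its mirror image, then return to the time domain."""
    X = dft(x)
    X[bin_index] = 0
    X[N - bin_index] = 0
    return idft(X)

def rms(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / N)

# Synthetic data: a short pulse plus a steady sine-wave disturber.
pulse = [100 * math.exp(-((n - 64) / 4) ** 2) for n in range(N)]
hum = [50 * math.sin(2 * math.pi * HUM_BIN * n / N) for n in range(N)]
noisy = [p + h for p, h in zip(pulse, hum)]

cleaned = notch(noisy, HUM_BIN)
print(rms(noisy, pulse), rms(cleaned, pulse))
```

The notch also removes the small part of the pulse that falls into the same bin, so the cleaned signal is close to the pulse but not identical to it. Whether that loss matters to the stroke-picking algorithms is exactly the uncertainty mentioned above.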
Stations: 919
#14
The next firmware comes with a simple stream compression (lzfx). It can save about 10-20% of the data, measured with my station. The compression is done on the whole UDP packet, so it's an additional size reduction, independent of the other methods, which affect single signals only. Other implementations mentioned by kevin should do the same. However, some of them need much more memory, like zlib, which needs more memory than the CPU actually has.

Additionally, there's now an encoding of each signal which cuts out the noise by transmitting just the min/max amplitude and the length of the noise (3 bytes instead of 64 max.). Of course this is not lossless; we lose some information, but if guessed correctly, it's just noise. It can save up to 90% of the data, but also nothing if the signal is above the noise level. In thunderstorms there's often just a narrow peak in the middle with nothing but noise before and after. A lot of data would be saved here.
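
One way such a noise summary could look is sketched below. The byte layout, the baseline of 128 and the noise threshold are all assumptions for illustration; the actual firmware format is not described in this thread.

```python
def encode_noise_run(samples):
    """Summarise a run of noise as (min, max, length): 3 bytes for up to 255 samples."""
    return bytes([min(samples), max(samples), len(samples)])

def encode_signal(samples, baseline=128, noise_level=3):
    """Replace leading and trailing noise with 3-byte summaries, keep the rest.

    Hypothetical layout: head summary, tail summary, then the raw samples
    in between. An empty run is written as three zero bytes.
    """
    def is_noise(s):
        return abs(s - baseline) <= noise_level

    start = 0
    while start < len(samples) and is_noise(samples[start]):
        start += 1
    end = len(samples)
    while end > start and is_noise(samples[end - 1]):
        end -= 1

    head, body, tail = samples[:start], samples[start:end], samples[end:]
    out = b""
    for run in (head, tail):
        out += encode_noise_run(run) if run else bytes([0, 0, 0])
    return out + bytes(body)

# Synthetic 64-sample signal: noise, a narrow peak in the middle, noise again.
signal = [127, 129] * 14 + [140, 170, 210, 240, 235, 200, 160, 138] + [129, 127] * 14
encoded = encode_signal(signal)
print(len(signal), len(encoded))
```

In this example 64 samples become 14 bytes. A signal that stays above the noise level from start to finish would gain nothing and would even grow by the six summary bytes in this particular layout.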

Another encoding which saves the differences of consecutive samples will be added later. It will even work in combination with the noise filter from above.

There are already filters integrated, like the spike filter, which doesn't send signals consisting of just a spike, mainly caused by on-board interference from the digital part. Additionally, signals with too low an amplitude won't be transmitted at all.

Of course a filter or an encoding based on deeper signal inspection or FFT would be great; this has been the plan since we have had System RED. But this needs a lot of time for testing and research. Additionally, our CPU is not fast enough to check all signals all the time, even with the integrated DSP, which can speed up such calculations.


There will be a beta firmware for RED systems with the implementations above very soon.
Stations: 538, 1534, 1712, 2034, 2219
#15
(2016-01-19, 21:20)Tobi Wrote: Additionally there's now an encoding of each signal which cuts out the noise by transmitting just the min/max amplitude and the length of the noise (3 bytes instead of 64 max.).
Is this mandatory? I'd like to see my entire signal.
Stations: 233
#16
(2016-01-17, 10:21)Bart Wrote:
(2016-01-16, 09:16)kevinmcc Wrote: I was thinking QuickLZ, LZO, LZ4, LZ4-HC, or zlib for the compression. My first concern would be how much data needs to be compressed and how much CPU power the Blitzortung receivers have. Would they be able to compress the data fast enough? The second concern is the power needed to decompress at the server end. I agree some data is not very compressible, and some is very easily compressible. I am still waiting for System Blue to launch so I can participate. Having not seen the data myself, I cannot judge. Compression may not even be a worthwhile option.

I see a significant issue with those choices. These compression algorithms only become efficient if you build up data in a buffer for a while. It would be quite efficient if you, let's say, built up data for an hour at a time and then sent it all in bulk, but that would delay the detection significantly, while I presume the target is to minimize latency. You really want stream compression algorithms for this sort of job. There are a few near-ideal candidates, but these are bad to use in practice for data you *must have* due to their total lack of redundancy. If you're willing to sacrifice a few detections you could probably get away with it, though.

You are assuming that you need a large collection of data to build a good dictionary for compression. If you use a predefined dictionary, you do not need time to build one, as is done with typical compression methods. I am going to assume the Blitzortung data is quite predictable and in a predefined format, so that a good dictionary could be devised ahead of time. With a predefined dictionary, the data can be compressed and sent much more quickly while still being very efficient in both time and size.
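
For what it's worth, zlib supports preset dictionaries directly through the `zdict` argument of `compressobj` and `decompressobj`. The sketch below uses an invented text-like packet and an invented dictionary purely to show the mechanism. Real packets are binary, so the saving shown here should not be read as a prediction.

```python
import zlib

# Hypothetical shared dictionary: byte strings expected to recur in packets.
# Station and server must both hold exactly the same bytes ahead of time.
PRESET_DICT = b"station=;gain=;channels=;samples=;timestamp=;"

def compress(packet):
    c = zlib.compressobj(level=9, zdict=PRESET_DICT)
    return c.compress(packet) + c.flush()

def decompress(blob):
    d = zlib.decompressobj(zdict=PRESET_DICT)
    return d.decompress(blob) + d.flush()

# Invented example packet, not the real Blitzortung format.
packet = b"station=233;gain=4;channels=2;samples=256;timestamp=1452938160;"
with_dict = compress(packet)
without_dict = zlib.compress(packet, 9)

assert decompress(with_dict) == packet   # round trip needs the same dictionary
print(len(packet), len(without_dict), len(with_dict))
```

Each packet is compressed on its own, so no buffering is needed and a lost packet does not affect the next one. The gain depends entirely on how much of a packet the dictionary can predict.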
#17
@Steph: During the testing phase the compression/encoding can only be enabled by the server, otherwise we might get into trouble if something does not work as expected. Later we should have a configuration option for each station where the user can choose between "traffic saving", "standard" and "whole signals". The server will try to respect this choice unless there are important (temporary) reasons for a different method.

Regarding compression: we have almost only binary data with no repetitions, and stream compression is not very efficient with such data. It's nice to have this compression, but it's not that important. The current solution is extremely simple and so far it seems to work well.
Stations: 538, 1534, 1712, 2034, 2219
#18
Delta encoding is also simple and no signal is lost.

Example signal from my station:
First column is the raw signal, second column the delta, third column the delta-delta. After the first two samples, 4 bits per sample are enough in this case (not just in this cutout, but in the entire signal). Plus you can still run it through zlib afterwards.


Code:
101        
98    -3    
97    -1    2
96    -1    0
98     2    3
101    3    1
106    5    2
112    6    1
120    8    2
128    8    0
137    9    1
145    8   -1
153    8    0
159    6   -2
163    4   -2
165    2   -2
166    1   -1
166    0   -1
163   -3   -3
160   -3    0
156   -4   -1
151   -5   -1
146   -5    0
140   -6   -1
135   -5    1
130   -5    0
126   -4    1
122   -4    0
118   -4    0
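
The table above can be reproduced and packed with a few lines of Python. This is only a sketch: the 4-bit packing layout (two values per byte, first two samples sent raw) is an assumption chosen for the example.

```python
# Raw samples copied from the first column of the table above.
raw = [101, 98, 97, 96, 98, 101, 106, 112, 120, 128, 137, 145, 153, 159, 163,
       165, 166, 166, 163, 160, 156, 151, 146, 140, 135, 130, 126, 122, 118]

def diff(values):
    """Difference between each value and the one before it."""
    return [b - a for a, b in zip(values, values[1:])]

delta = diff(raw)        # second column
delta2 = diff(delta)     # third column

def pack_nibbles(values):
    """Pack signed values in -8..7 two per byte (4-bit two's complement)."""
    assert all(-8 <= v <= 7 for v in values)
    nibbles = [v & 0xF for v in values]
    if len(nibbles) % 2:
        nibbles.append(0)            # pad to a whole byte
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))

def unpack_nibbles(data, count):
    out = []
    for byte in data:
        for nib in (byte >> 4, byte & 0xF):
            out.append(nib - 16 if nib >= 8 else nib)
    return out[:count]

# Send the first two samples as-is, everything else as 4-bit delta-deltas.
encoded = bytes(raw[:2]) + pack_nibbles(delta2)

# Decoder: rebuild the deltas, then the samples, with running sums.
samples = list(encoded[:2])
step = samples[1] - samples[0]
for dd in unpack_nibbles(encoded[2:], len(raw) - 2):
    step += dd
    samples.append(samples[-1] + step)

assert samples == raw
print(len(raw), len(encoded))
```

For this cutout, 29 bytes become 16 and the decoder gets every sample back exactly. Because each value depends on all the ones before it, a single corrupted byte would distort the rest of the signal, so the encoding is best kept within one packet.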
Stations: 233
#19
@Steph: Of course the delta encoding will be added. I hope very soon. :-)
Stations: 538, 1534, 1712, 2034, 2219
#20
(2016-01-12, 22:32)Tobi Wrote: Hi,

Yes, the high amount of traffic is an issue that we want to solve, e.g. by transmitting only usable data/channels. Work on this is currently in progress. It's independent of the system (RED/BLUE), as it only relates to the data transmission. A new (beta) firmware will be out very soon which will address even more issues. Please be patient.

Best regards
Tobias

I'd like to see this. I have a Sophos UTM as my firewall, which gives pretty good stats on all my network devices. I would like to see what the new firmware can achieve in lowering data transmission, not that it's even an issue for me.
Regards Simon
https://www.conligwx.org
Stations: 1283