In my last post I was happy to report that I managed to get the USB interface to work. This interface has since proved to be extremely valuable in software development and testing. While the device is taking measurements you can look at the results (or intermediate results) at your PC in real time. You can even log large amounts of data to a .csv file and inspect the results in Excel.

But let’s look at the developements in a bit more detail. Last time the device was sending pulses and capturing some of the resulting zero-crossings but not much more than that. And in a not very structured way.

**Measuring amplitude**

Now it’s not only capturing the times when the zero-crossings occur but also measures the amplitude of the half-waves in between. In order to do that an interrupt is generated at every zero-crossing capture. Inside that interrupt service routine time of the zero-crossing is saved (as previously) and the ADC is started. In order to measure the amplitude at its maximum (or minimum in case of a negative half-wave), the ADC first samples the input for a 6.25 microseconds (which corresponds to a quarter-wave at 40kHz). I’ve chosen the sampling period slightly shorter to compensate for the time from when the zero-crossing occured unil the ADC is started. After the sampling period the internal sample-and-hold amplifier switches from sample to hold and the actual analog-to-digital conversion takes place.

The PIC’s datasheet advertises a maximum sampling rate of 1 million samples per second. Beware, though, that this requires separate Vref+ and Vref- connections and doesn’t work in differential mode anyway. In my configuration without separate analog references and differential measurements the maximum sampling rate drops to 400k samples per second. And the reported conversion time doesn’t include the sampling time yet. I wasn’t aware of that but luckily the PIC’s ADC is still more than fast enough for what I’m doing here.

**Finding the absolute phase**

Capturing the zero-crossings is relatively easy and the results are very precise. But there’s a challenge: there are hundreds of zero crossings for each burst of pulses sent. How do you know which zero-crossings to use?

The Arduino-based anemometer used an analog envelope detector to solve this problem. After averaging plenty of samples this works most of the time. But even after many attempts to tweek the analog circuit the envelope detector approach was never as reliable as I wanted it to be.

My standalone anemometer has an entirely different approach. Being able to digitally measure the amplitude of each half-wave we can tackle the challenge in software. We have a series of measurements and we have an idea of that the signal we are looking for looks like. Looking the scope screenshot below you could easily and reliably determine where the maximum amplitude occurs. That’s what we need to teach the software to do.

One could, of course, just loop through the all the amplitudes and find the maximum. But there are a lot of similar amplitudes and so any noise would make the result unreliable.

I did a bit of reading on digital signal processing (DSP) and realized that this is a classic DSP problem. The solution is a matched filter. You take the signal you are looking for (also known as *kernel* or* template*) and multiply it sample by sample with the captured signal and sum up the results. Do that for every possible (or reasonable) overlap of signal and kernel and find the position where the result reaches its maximum.

If you’re new to the subject (like I am), Steven W. Smith’s Digital Signal Processing is a great place to start. It’s even available online for free at dspguide.com. The method is also referred as convolution or correlation depending on who you ask. This wikipedia article gives a nice introduction to the subject.

**Implementing the matched filter**

I’ve parameterized the software to capture 34 zero crossings as well as the n=33 amplitudes in between them. The kernel consists of k=17 data points with a peak in the middle.

There are n-k+1=17 possibilities to entirely overlap the kernel with the signal. We can skip all the odd numbers where a negative kernel value would be multiplied with a positive sample value and vice versa. This leaves us with 9 possible positions to chose from.

For each position we need to do 17 multiplications and 16 additions. And we need to do that for every single measurement, i.e. 512 times per second. That sounds like a lot of work and it probably is. Luckily, since this is a very common DSP task, microcontrollers like the PIC32 have a special instruction for this. It’s called MAC – Multiply Accumulate and executes in a single clock cycle. The result is just amazing. The corresponding ISR takes less than 30 microseconds, including some housekeeping tasks as can be seen in the screenshot below.

Once the maximum amplitude is found, the 16 zero-crossings around it (i.e. the 8 before and the 8 after) are summed up and the sum is saved as the result of that measurement.

**Averaging measurements**

The goal is to report wind speed and temperature four times per second. Since there are 512 measurements per second we can use up to 128 individual measurements (32 in each direction) to do our calculations.

So we have plenty of data to identify and exclude outliers and do some averaging of the remaining samples. Robust statistics is the key word here, click here for the corresponding wikipedia article.

For now I’m doing something reasonably simple: First the 32 results per direction are sorted. The 12 lowest and 12 highest results are ignored and only the 8 results in the middle are summed up.

This might seem wasteful but it makes the average very robust against outliers and still results in 16 x 8 = 128 zero-crossings being averaged. The resolution of each zero-crossing is equal to 1 / 48MHz or about 20.83 nanoseconds. Summing up 128 of them gives us a resolution of far below a nanosecond. Beware however that resolution doesn not imply accuracy.

As a last step, the resulting sum is multiplied with 125 and then divided by 768 to make the unit 1 nanosecond. So every 250 milliseconds we finally have 4 time-of-flight measurements with a resolution of 1 nanosecond. That’s what we will use to calculate the winds speed as well as wind direction and temperature.

This sorting and averaging step is a bit more costly in terms of computation time. It takes around 600 microseconds to complete (see below). But unlike the convolution, it only has to be performed 4 times per second so this is more than fine.

**Calculating wind speed**

We are finally in a position to calculate the actual wind speed. There are various ways of doing this. For now I’ve just done the simple approach of taking the difference in time-of-flight, assuming room temperature and solving for wind speed. This means solving a quadratic equation for which I have resorted to floating point math and using <math.h> for the square root function.

I don’t have my wind tunnel any more so doing any testing was difficult. But one thing was soon obvious: at zero wind speed, the time-of-flight measurements match extremely well and are extrordinary stable from measurement to measurement. Also, looking at the intermediate results (i.e. the 128 single measurements before averaging) shows that there are basically no outliers present. I could randomly pick a measurement and still get a very reasonable value.

Something seems to be systematically wrong with the first of the 128 measurements but I haven’t had time to look into this. Otherwise, the results are impressively stable. And I’m only using a relatively primitive kernel for the matched filter.

**USB data logging**

As I’ve mentioned at the beginning, the USB interface lets me do some serious data logging even at this early stage of development. Here’s a list of what I can do by sending a character to the device from a USB terminal.

- ‘Z’: Get all the 34 zero-crossings of a single measurement
- ‘A’: Get all 33 amplitudes of a single measurement
- ‘F’: Get a full single measurement, i.e. 34 zero-crossings plus 33 amplitudes
- ‘T’: Get all the 4 x 32 = 128 time-of-flights for a measurement series
- ‘R’: Get the 4 averaged time-of-flights as well as wind speed and temperature

Some versions of these commands also let you specify the direction you’re interested in as well as how many sets of data you want to receive. This makes it easy to log large amounts of data that you can use to test possible algorithms on your PC before you implement them on the PIC.

**Next steps**

The next step is to get the I2C digipot working so I can control the amplification in software. My idea is to aim for a maximum amplitude of around 3 volts independent of wind speed and so on.

There’s also plenty of work to do to improve the algorithm that calculates wind speed and temperature. And then I also have to implement the I2C and SPI interfaces that let the anemometer communicate to other embedded devices like an Arduino or Raspberry Pi.

Having used floating point math and <math.h> I’m also running out of flash memory. I’m currently using 93% of flash (32kB total) and 52% of ram (8kB total). There will be a slightly revised board (Rev B) that uses a PIC32MX250 which is identical to the MX220 used here but has four times as much flash and ram.

That’s it for today. On the overview page you find the software at the stage of developement just described as well as some examples of logged data (all at zero wind speed) as .csv files.

The next post on this project is here: Part 24.

The technique you are applying is also called “Cross-correlation”. ( convolution is a different mathematical beast). I know it’s very effective so I considered this possibility when starting this project but I didn’t have a good micro to do that and your analog approach was more fun. I plan on buying a Teensy 3.2 ( 72 MHz, 256 kB of flash and 64 kB of ram) in the future, it should be able to perform well so maybe I’ll try that.

Hi Antiath,

You seem to be quite familiar and experienced with both the analog as well as the digital aspects of this project. Would be nice to see your designs and solutions to the various challenges…

Ho if it looks like I actually know what I’m talking about, It’s not the case at all. For this particular bit of digital signal processing, I studied that during my Master degree and it’s still quite fresh in my head but that’s all. For the rest, I’m not really more knowlegeable than your average electronic hobbyist. I don’t even have a working prototype for now. Still trying to improve the S/N ratio and get those pesky crosstalk issues out of the way. Now, the circuit is mostly a combination of your ideas (mainly the amplifier part), Carl’s ideas (for the driving side) and some tweaks I made to accomodate with what I have.

So I looked a bit more at your code and something puzzles me : how did you construct your kernel ? Is it a sin wave modulated with a guessed gaussian bell ? or did you average a large number of multiple signals to get a “mean” kernel ?

Also, I see in the code that you tried using the original square wave as a kernel ( the sequence of -1 and 1). That’s what I would have done so, did it worked ?

Hi Antiath

Yes, I did average a number of observations to get some idea what the bell shape should look like. The USB port makes it easy to export that kind of data to a CSV file.

But I did manually smooth it somewhat after that. The bell shape looks quite different for each pair of transducers. That’s why the latest version of the software keeps 4 distinct kernels, one for each direction.

cheers

lukas

Hi Lukas,

I am trying to realize a similar project and ran into trouble on the analog electronics side: very noisy results. I decided to use your design, and was wondering how you determine the “absolute” time of flight.

When I read your code correctly and ignore the averaging by summing subsequent zero crossings, it seems that the time of flight is simply the time from start to the zero crossing of the highest measured amplitude.

This value is corrected by the time of the pulses + 4.5 in the calculateAverageTimeOfFlight() function.

The 4.5 is a “magic number” to me, can you explain me what is the background of it? Is this come kind of tuning to get to the absolute travel time of the sound?

If not, how are you sure that the time to zero crossing with the highest amplitude – 4.5 pulses is the absolute time of flight?

Sorry if this information is somewhere else in your very detailed descriptions. I haven’t had the time yet to read them all 🙂

Regards

Hi Raymond

The absolute phase is found by doing a convolution of the received signal amplitudes with a filter kernel. The current kernel is very basic: it has its maximum at the center and then falls off to both sides. So yes, it basically finds the maximum amplitude.

In my tests I then noticed that the output did not coincide with the phase I was looking for so that 4.5-pulse-factor came in. There’s nothing magic about it. A better way to deal with this would probably be to adjust the kernel so that it finds the right phase without this 4.5 pulse shift.

lukas

Hi, I’ve been working again on my prototype lately after a long pause. The electronic design seems to be robust enough now and I need to finish the code. And so, I have a question for you.

After you computed the correlation/convolution, you sum up 16 time of flight around the position of the maximum.

You say you do that to get a high resolution unit of time ( on the order of ns) but I don’t really understand why summing successive tof from the same packet of pulses will get you something other than a really big number.

Ha ! I think I understand. Tell me if I’m wrong please : you count on the fact that you will subtract two time of flight from opposite directions but you sum the tof before that because it doesn’t matter mathematically . In the end, it’s like you calculate a difference of 32 tof ( 16 per direction) wich gives 16 substractions resulting in a value that should be close to each other. You average those 16 results AND you do all that multiple times and average even more.

Hi Antiath

I just replied to your other comment. Yes, summing up is just part of the averaging. Difference of sums or sum of differences, in the end it’s all the same.

How you process your ToF results to obtain wind speeds and potentially temperature is another matter. You have 4 ToF measurements (already averaged in this case) and 2 or 3 unknowns…

Hi Antiath

Each zero-crossing I add up is a noisy measurement of the relative phase shift. By adding up those noisy measurements i essentially average them. In theory, all the measurements are precisely 25us apart but in practice that number varies in the range of, say, 100ns. I don’t divide by n (as you usually would when averaging) but just change the unit. The unit of a single measurement is 20.83ns (1/48Mhz) so the sum of 16 measurements has the unit 1.3ns and so forth.

I don’t know much about DSP but quite at the beginning you are told that adding noise can increase the accuracy of a series of digital measurements. It sounds counter-intuitive at first but makes total sense once you think about it. In the end it is the presence of noise that allows us to measure something with a resolution higher than our native resolution of 20.8ns.

Yes thank you, it’s clearer now.

What you are talking about is the technique called “oversampling and averaging” (http://www.atmel.com/images/doc8003.pdf). This works only if the signal is changing over time (while the noise is not). In your case, you are dividing your time resolution by 16 wich corresponds to adding 4 bits to the resolution ( equation 3-2 of the document). And those 16 measurements correspond to changing (increasing) values of tof. So, in principle, the technique seems to apply.

What I understood was something more similar to the removal of noise trough differential signaling like with LVDS. You have two measurements opposite to each other, with noise introduced similarly on the two sides. By taking the difference between the two, the noise cancels itself while the signal remains, providing more accuracy (not more resolution). It works for LVDS because the two traces are close to each other so that external noise do apply similarly to them. Here, the analogy may not really apply because two opposite measurements are not taken at the same time.

It’s tricky :/

Okay after correcting some software issues, I could get an idea of the basic performance. For now I’m evaluating the delay of a measurement compared to the kernel ( wich should be 0 without wind) via cross- correlation. As you also noticed, we do get a weird delay even without wind. It depends on the axis and direction but I can get as much as 150 microseconds of false delay wich corresponds to 6 full pulsations. It seems consistent so it’s may be just a simple question of correcting in software but I still think it is quite odd. I’m averaging 30 zero-crossings of the same packet so I basicaly get 0.4 microseconds steps and I can already measure 5-10 km/h wind without much trouble. I can have roughly 60 measurements per second with the current timing and per sensor, so I should have enough time to oversample nicely and also get a reasonnable average.

I’d like to resolve the no wind delay but I have no idea why it happens. I use quite larger kernels (an average of 50 samples with 30 zero-crossings ) so I don’t think it’s the issue.

Hi Antiath

Cool to hear you’re making progress. I’m considering doing the convolution on the difference in amplitude instead of the amplitude directly. The difference should be positive until it suddenly changes sign. That might be easier to detect and be less dependent on the the signal strenght.

For the odd but consistant delay I don’t really have an explanation…

Hi Lukas!

First of all great job you did here!

I am trying to replicate your work, and I am currently testing the last version of the anemometer, but I am a bit puzzled since I get some “overflow”-like values when I use the external I2C. Sometimes when I point my wind to NS it works but then I get overflow on EW, or if I point towards EW i just get most of the times the weird overflow looking values. For example

Measurements:

NS 61820

EW 64468

46149

0

Hi Christian. Hard to tell from a distance what’s the issue. To me it looks like some pulses have never been received so the PIC waits for them to happen until some overflow occurs. Do you have the possibility to look at the comparator output with a scope?

How did you try to replicate it? Did you build your own board? Photos, please…

Lukas