Tuesday, July 22, 2014

Sound Source Localization Assumptions

I've been reading a number of research papers related to sound source localization as I attempt to build such a system (e.g. for traffic jam vs. open road classification), especially papers about Time Difference of Arrival (TDoA) estimation, combining evidence from multiple microphone pairs, and calibrating the locations of the microphones in an array.  Some of those papers have been clear about their assumptions, which helps in understanding the limits of their designs.

The approach I'm taking involves digitizing the microphone inputs and computing the cross-correlation of the signals recorded by a pair of microphones at various time delays (e.g. how does the sound that arrived at microphone 1 compare to the sound that arrived at microphone 2 delayed by 1 ms, or delayed by 3 samples, etc.).  Let's see what assumptions I'm making...
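As a concrete (if naive) sketch of that comparison, here is how one pair's TDoA might be estimated by brute force over candidate sample delays; real systems usually use refinements such as GCC-PHAT, and all of the names here are mine, not from any paper:

```python
import numpy as np

def tdoa_estimate(mic1, mic2, max_lag):
    """Estimate the delay (in samples) of mic2 relative to mic1 by
    brute-force normalized cross-correlation over candidate lags.
    Positive result: the sound arrived at mic2 later than at mic1."""
    best_lag, best_score = 0, -np.inf
    n = min(len(mic1), len(mic2))
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = mic1[:n - lag], mic2[lag:n]   # mic2 delayed by `lag`
        else:
            a, b = mic1[-lag:n], mic2[:n + lag]  # mic1 delayed by `-lag`
        # Normalized dot product, so loud and quiet windows compare fairly.
        score = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Applied to a signal and a delayed copy of itself, this recovers the delay exactly; with real recordings, noise and reverberation smear the correlation peak.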
  1. The speed of sound is constant across the region of interest (sources to microphones) over the short term (e.g. 30 minutes). In reality, the speed of sound varies with temperature, humidity, and air pressure; wind also increases the effective speed downwind and decreases it upwind.
  2. For planar sound sources (e.g. for cars on surface roads, for fairly flat ground), we have at least 3 microphones in the plane of the sound.
  3. For non-planar sources (e.g. planes, people sitting and standing in a conference room), we have at least 4 microphones in a sensible non-planar arrangement; for example, 4 microphones in a tetrahedral arrangement gives us the best omni-directional resolution, because poor resolution from one pair (e.g. when the sound source is on the same line as the two microphones, but not between them) is compensated for by the other pairs.
  4. Not all microphones need to be paired up (e.g. we might have 4 microphones in one array, making up 6 pairs, and then another 3 microphones in another array, contributing another 3 pairs), so not all constraints below apply to every possible microphone pair, just those pairs used for TDoA estimation.
  5. For my "traffic detection, at a distance" project (the far-field), I want to be able to distinguish between cars in the two lanes of the road (coming and going from the congested intersection), out to a distance of around 250 meters.  The road width is about 25 feet (measured from an aerial image), so a separation of around 2 meters would be enough to distinguish the two directions.  That amounts to an angular resolution of about 0.5 degrees... pretty darn small.  I've not yet done the calculations to show whether this is feasible.
  6. For near-field applications (e.g. aiming a camera at a speaker in a conference room), we'd like to be able to distinguish objects 30cm apart, out to around 5m, or an angular resolution of about 3.5 degrees.
  7. The microphones in each pair are not too near each other, else our position resolution will be diminished, because the maximum TDoA is small. For example, with microphones 1 inch apart, the maximum TDoA is about 74 microseconds; with sound recorded at CD rate (44,100 samples per second, about 23 microseconds per sample), a sound can arrive at the second microphone at most 3 samples later than at the first, so at best we can say that a sound arrived from one of 7 (3*2 + 1) "directions" (broadly defined) relative to the microphone pair.
  8. The microphones in each pair have similar frequency response, else it is hard to compare their signals to compute TDoA.
  9. The microphones in each pair have similar response to sounds arriving from any direction; this is actually very hard to achieve, as microphones typically have some directionality to their response, even those sold as omni-directional. If we have omni-directional microphones that all have the same orientation, then a far-away source will appear to be coming from essentially the same direction, relative to that orientation, for all of them; but the sound from a nearby source might arrive from very different directions relative to each microphone's orientation, which can change both the amplitude response and the frequency response.
  10. The analog to digital conversion (ADC) of the microphones is at the same sample rate for all microphones.
  11. The microphone through ADC path has similar response for both microphones in a pair (e.g. they have the same gain and electrical noise).
  12. The signal of interest is large enough by the time it is picked up by the microphones and converted to digital to be distinguished from noise (i.e. the signal-to-noise ratio is high enough, whatever that is).
  13. The sampled data is aligned (i.e. we have a straightforward means to get all the samples recorded "at the same time").  For my experiments so far this was easy because I'm using a 4-channel recorder, but for a more complete prototype I'll need a new solution.  My friend Bent recommends, based on his personal experience, the Motu 8pre, which has 8 microphone inputs; even better, he is willing to loan me his for some experiments!
I'm sure I'm making even more assumptions than I've included here (and some of these are really "desired features").
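For the record, here is the arithmetic behind assumptions 5 through 7 as a small Python sketch; it assumes a speed of sound of about 343 m/s (roughly 20°C), and the function names are mine:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at ~20°C; varies with temperature, etc.
SAMPLE_RATE = 44100     # CD rate, ~23 microseconds per sample

def max_tdoa_seconds(mic_spacing_m):
    """Largest possible TDoA for a pair: sound traveling along the
    line through both microphones."""
    return mic_spacing_m / SPEED_OF_SOUND

def max_tdoa_samples(mic_spacing_m, sample_rate=SAMPLE_RATE):
    """Largest TDoA in whole samples at the given rate."""
    return int(max_tdoa_seconds(mic_spacing_m) * sample_rate)

def angular_resolution_deg(separation_m, distance_m):
    """Angle subtended by two sources `separation_m` apart seen from
    `distance_m` away (e.g. the two lanes at 250 m)."""
    return math.degrees(2 * math.atan(separation_m / (2 * distance_m)))
```

Running the numbers: 1-inch spacing gives a maximum TDoA of about 74 microseconds, or 3 samples at CD rate; 2 m of separation at 250 m subtends about 0.46 degrees; 30 cm at 5 m subtends about 3.4 degrees.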

Friday, July 18, 2014

Mirror, mirror, on the wall, is there a fairer route in the land?

Some years ago I was often in a traffic jam approaching a T-junction near my house. I had a choice of taking the direct path, or taking a detour on to a circuitous route before reaching the potential traffic jam; naturally, I couldn't see the traffic jam before having to make that decision. This was before I had a smartphone with Google Maps, and before the traffic data in Google Maps was detailed enough to include that local backup.

This led me to ask this question: could I build a system, installed on my property, able to determine whether or not there was a traffic jam on the direct route? Such a system could publish this data to the web, and I would examine it before leaving work, saving myself anywhere from a couple of minutes to 15.  Not earth-shaking, but it would be nice.

If there was a direct line of sight from a window in my house to the road, I could just install a camera, and upload photos every few minutes, or on demand.  Of course, there isn't.

However, standing in my yard I can hear the road noise of fast-moving vehicles, and the rumble of idling trucks in the distance, though I'm not usually sure what direction that rumble comes from.  So, could I build a system using microphones placed outside the house (mounted on the wall, or maybe in the yard) that could compute the directions from which the multiple sounds are arriving, or even better the locations of the sound sources?  And having such an ability, could I then classify different types of sounds, such as idling cars in a traffic jam, a single idling truck at a loading dock, and vehicles moving quickly, preferably with enough spatial resolution to distinguish the slow-moving cars approaching the T-junction from those quickly moving away?

I decided to try working on this idea early last year, thinking of it as a passive acoustic sonar, though I've since learned this is often referred to as acoustic localization or multilateration.

My friend Bent loaned me a couple of microphones, which I used as the left and right channels of a 3-microphone array, the center microphone being the average of the 2 internal microphones of a TASCAM 4-channel recorder. Bent and I recorded ourselves talking as we walked back and forth along a line parallel to the line of the 3 microphones, perhaps 10 feet away. I then wrote some Python code to read the WAV files and perform cross-correlation of the channel pairs using numpy, but was dismayed at the results. The stereo effect was very evident when I listened to the left and right channels, but the results from my program showed nothing intelligible at all. I worked on it for a bit, but dropped the effort in favor of skiing.

After the end of ski season this year, Bent and I reviewed the code, and discovered that I just wasn't using numpy correctly. Sigh. After fixing those mistakes, we got this image:

The horizontal axis is time (12 seconds in total, with 10ms windows), and the vertical axis is the offset (in samples) between the sliding windows of samples from the left and right microphones, with the value being the cross-correlation between the normalized windows. I used matplotlib to render the results as an image with its default color map, where blue indicates low correlation and red indicates high correlation.  The bright line across the image is where there is zero offset between the windows of samples (i.e. a high correlation value there implies there was a sound source on the line perpendicular to the mid-point of the line between the two microphones).
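In case it helps, here is a sketch of that computation (not my exact code): slide 10 ms windows across both channels, normalize each window, and correlate at every candidate offset; 44.1 kHz audio is assumed:

```python
import numpy as np

def correlogram(left, right, sample_rate=44100, window_ms=10, max_lag=20):
    """For each 10 ms window, compute the normalized cross-correlation
    between the left and right channels at offsets -max_lag..+max_lag.
    Returns an array of shape (2*max_lag + 1, num_windows), suitable
    for matplotlib's imshow."""
    win = int(sample_rate * window_ms / 1000)
    num_windows = (min(len(left), len(right)) - 2 * max_lag) // win
    out = np.zeros((2 * max_lag + 1, num_windows))
    for w in range(num_windows):
        start = max_lag + w * win   # leave room for negative offsets
        a = left[start:start + win]
        a = (a - a.mean()) / (a.std() + 1e-12)
        for i, lag in enumerate(range(-max_lag, max_lag + 1)):
            b = right[start + lag:start + lag + win]
            b = (b - b.mean()) / (b.std() + 1e-12)
            out[i, w] = np.dot(a, b) / win   # correlation coefficient
    return out
```

A constant delay between the channels shows up as a bright horizontal line at that offset, which is exactly the structure visible in the image above.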

More to come.

Tuesday, November 22, 2011

Creating a Basic Isometric Map

Last year Christian Weber wrote a helpful introductory article showing how to use CSS to lay out tiles for an isometric map. However, the screen shot and demo looked a little odd; note the black V below, where you can see "underneath" a tile.

The problems were partly caused by placing tiles adjacent to each other when they didn't have matching sides (i.e. a side that is all grass should butt up against an all grass side of another tile).  In addition, the CSS styles for the tiles had the wrong offsets for identifying the positions of the sprites.  The code in the article compensated for the style errors with some "corrections", which of course made it more confusing.

I've posted a corrected version here.  It has a table indicating the type of each tile side, allowing for the generation of a random map where each tile's neighbors are appropriate, such that there are no gaps.  For example:
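The side-type table idea can be sketched like this (in Python for brevity; the tile names and side types here are invented for illustration, not those of the demo):

```python
import random

# Hypothetical tile set: every combination of "grass"/"water" on the four
# sides (north, east, south, west). The real demo's table maps actual
# sprite names to side types; these names are made up.
SIDES = ("grass", "water")
TILE_SIDES = {
    n[0] + e[0] + s[0] + w[0]: (n, e, s, w)
    for n in SIDES for e in SIDES for s in SIDES for w in SIDES
}

def random_map(rows, cols, rng=random):
    """Fill the grid row by row, choosing each tile at random from those
    whose north and west sides match the already-placed neighbors, so
    the finished map has no mismatched seams."""
    grid = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            candidates = [
                name for name, (n, e, s, w) in TILE_SIDES.items()
                if (c == 0 or TILE_SIDES[grid[r][c - 1]][1] == w)  # west neighbor's east side
                and (r == 0 or TILE_SIDES[grid[r - 1][c]][2] == n)  # north neighbor's south side
            ]
            grid[r][c] = rng.choice(candidates)
    return grid
```

With a complete tile set like this one, the greedy fill never dead-ends; a sparser set of sprites would need backtracking or a smarter constraint solver.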

Sunday, May 8, 2011

Writing a Windows Service in Java

I've got a couple of Java programs that I want to leave running permanently on my laptop, so I set about creating a Windows Service. I investigated several of the alternatives (Java Service Wrapper, Yet Another Java Service Wrapper, etc.), but having recently discovered Java Native Access (JNA), I decided to see if I could produce a fairly lightweight solution.

JNA (and libffi on which it depends) provides a means to dynamically create a bridge between Java and native libraries, a feature I've been wanting for years. I'd used JNI in the past, but I find it rather brittle. I've been pleasantly surprised to find that even when I made various mistakes with JNA, the JVM didn't crash; instead, JNA threw relatively meaningful exceptions, such as when it was unable to bridge the gap between a call from Windows to a Java method I wrote to accept that call (i.e. the argument types I used weren't appropriate). For example, the Windows Service API requires the service to implement a ServiceMain function which will be called by Windows:
VOID WINAPI ServiceMain(
  __in DWORD dwArgc,
  __in LPTSTR *lpszArgv
);
That lpszArgv is a pointer to an array of pointers to strings that will be passed to the ServiceMain function. I wanted to define a type, ReceiveStringArray, extending Pointer, as the type of lpszArgv in Java. Unfortunately, JNA could not bridge the gap from the native arguments to ReceiveStringArray, so I had to fall back to using JNA's Pointer as its type; fortunately, Pointer has a method, getStringArray, that handles exactly the translation needed here.
interface SERVICE_MAIN_FUNCTION extends StdCallCallback {
  /**
   * ServiceMain is the main method of the service. It should return only
   * once the service is stopped.
   * @param dwArgc The number of strings pointed to by argv.
   * @param argv A pointer to an array (of length dwArgc)
   *             of pointers to strings.
   */
  void ServiceMain(int dwArgc, Pointer argv);
}
It may be that I was missing some appropriate constructor in ReceiveStringArray, or that a JNA TypeMapper was needed to handle initializing a ReceiveStringArray in this situation.

Another interesting challenge I had was that the first of my callback functions, ServiceMain, was being called at the expected time, but it was also being called when I expected a different callback function to be called instead. I'm not certain, but I think this is because JNA doesn't really care about the name of the method in the interface; there must be a single method, and that is the one that will be called. I had created two interfaces, but only a single class implementing them both. I suspected that I needed to have a separate class (or instance?) to receive each type of callback.

A typical Windows Service is basically just a program that runs in the background, with no direct interaction with the user, with a few extra required interactions with Windows. When the program is started it must call StartServiceCtrlDispatcher, passing in a table of services to be started (typically just one). The dispatcher starts another thread to run ServiceMain, then it dispatches various Windows events to the service (e.g. when Windows is shutting down), and returns once the service stops.
import com.sun.jna.Native;
import com.sun.jna.Structure;
import com.sun.jna.platform.win32.Advapi32;
import com.sun.jna.win32.W32APIOptions;

public interface ExtendedAdvapi32 extends Advapi32 {
  ExtendedAdvapi32 INSTANCE = (ExtendedAdvapi32) Native.loadLibrary(
      "Advapi32", ExtendedAdvapi32.class, W32APIOptions.UNICODE_OPTIONS);

  class SERVICE_TABLE_ENTRY extends Structure {
    public String serviceName;
    public SERVICE_MAIN_FUNCTION serviceProc;
  }

  boolean StartServiceCtrlDispatcher(SERVICE_TABLE_ENTRY[] lpServiceTable);
}
The API doesn't include a length parameter to tell the dispatcher how many services are implemented by the program. Instead, the last entry in the table must have NULL pointers. Therefore, while the program doesn't need (at this point) the name of the service it is implementing, the serviceName field of the real entry must not be NULL; instead, set it to an empty string. For example:
ExtendedAdvapi32.SERVICE_TABLE_ENTRY entry =
    new ExtendedAdvapi32.SERVICE_TABLE_ENTRY();
entry.serviceName = "";
entry.serviceProc = someServiceMainFunction;
// toArray(2) allocates a contiguous array of 2 entries; the second is
// left zeroed, serving as the NULL terminator.
ExtendedAdvapi32.SERVICE_TABLE_ENTRY[] serviceTable =
    (SERVICE_TABLE_ENTRY[]) entry.toArray(2);
boolean result =
    ExtendedAdvapi32.INSTANCE.StartServiceCtrlDispatcher(serviceTable);

The ServiceMain callback function is invoked on another thread from the main thread, and shouldn't return until the service stops (usually when Windows is shutting down, but it can also be stopped and started via a control panel). The function is passed an array of strings (as an argc and argv, just like a C program's main). The first string in the array is the name of the service. The following elements of the array are the "Start Parameters". These can be set in the service's Properties dialog box.
ServiceMain needs to call RegisterServiceCtrlHandlerEx to provide the service control dispatcher with a function to be invoked when Windows notifies the service of events (e.g. when Windows is shutting down, a user logs in or out, or there is a change in connected hardware).
public interface ExtendedAdvapi32 extends Advapi32 {
  interface HandlerEx extends StdCallCallback {
    int serviceControlHandler(int serviceControlCode, int eventType,
                              Pointer eventData, Pointer context);
  }

  class SERVICE_STATUS_HANDLE extends HANDLE {
    public SERVICE_STATUS_HANDLE() { }
    public SERVICE_STATUS_HANDLE(Pointer p) { super(p); }
  }

  SERVICE_STATUS_HANDLE RegisterServiceCtrlHandlerEx(
      String serviceName, HandlerEx handler, Object context);
}
Next, ServiceMain must call SetServiceStatus to inform Windows that the service is running, and of the types of event notifications the service wants to receive.
public interface ExtendedAdvapi32 extends Advapi32 {
  static final int SERVICE_WIN32_OWN_PROCESS = 0x00000010;

  class SERVICE_STATUS extends Structure {
    public int serviceType = SERVICE_WIN32_OWN_PROCESS;
    public int currentState = 0;
    public int controlsAccepted = 0;
    public int win32ExitCode = W32Errors.NO_ERROR;
    public int serviceSpecificExitCode = 0;
    public int checkPoint = 0;
    public int waitHint = 0;
  }

  boolean SetServiceStatus(SERVICE_STATUS_HANDLE serviceStatusHandle,
                           SERVICE_STATUS serviceStatus);
}
For example:
public interface ExtendedAdvapi32 extends Advapi32 {
  static final int SERVICE_RUNNING = 0x00000004;
  static final int SERVICE_ACCEPT_SHUTDOWN = 0x00000004;
  static final int SERVICE_ACCEPT_STOP = 0x00000001;
}

SERVICE_STATUS serviceStatus = new SERVICE_STATUS();
serviceStatus.currentState = ExtendedAdvapi32.SERVICE_RUNNING;
serviceStatus.controlsAccepted = (
    ExtendedAdvapi32.SERVICE_ACCEPT_STOP |
    ExtendedAdvapi32.SERVICE_ACCEPT_SHUTDOWN);
ExtendedAdvapi32.INSTANCE.SetServiceStatus(serviceStatusHandle, serviceStatus);
At this point the service can do its job, but it must also have some way to be notified that it is time to stop (e.g. via a flag shared between the ServiceMain thread and the service control handler). When ServiceMain learns that it needs to stop, it must tell Windows that it has stopped before returning.
public interface ExtendedAdvapi32 extends Advapi32 {
  static final int SERVICE_STOPPED = 0x00000001;
}

serviceStatus.currentState = ExtendedAdvapi32.SERVICE_STOPPED;
serviceStatus.controlsAccepted = 0; // Accept no more notifications.
ExtendedAdvapi32.INSTANCE.SetServiceStatus(serviceStatusHandle, serviceStatus);

Service Control Handler (HandlerEx)
The service control dispatcher calls the service's control handler when the requested events occur. To support just shutdown and stop events, this suffices:
public interface ExtendedAdvapi32 extends Advapi32 {
  static final int SERVICE_CONTROL_SHUTDOWN = 0x00000005;
  static final int SERVICE_CONTROL_STOP = 0x00000001;

  // Must return NO_ERROR for this, not ERROR_CALL_NOT_IMPLEMENTED.
  static final int SERVICE_CONTROL_INTERROGATE = 0x00000004;
}

public int serviceControlHandler(int serviceControlCode, int eventType,
                                 Pointer eventData, Pointer context) {
  switch (serviceControlCode) {
  case ExtendedAdvapi32.SERVICE_CONTROL_INTERROGATE:
    return W32Errors.NO_ERROR;
  case ExtendedAdvapi32.SERVICE_CONTROL_SHUTDOWN:
  case ExtendedAdvapi32.SERVICE_CONTROL_STOP:
    // TODO Signal ServiceMain to stop.
    return W32Errors.NO_ERROR;
  default:
    return W32Errors.ERROR_CALL_NOT_IMPLEMENTED;
  }
}

Encapsulating the Windows API
To avoid polluting the 'pure' Java with all of the above, I defined the following simple interface that my services would implement (which I can use on other operating systems):
public interface ISimpleService {
  int run(String[] args);
  void stop();
}
The return value from run could (in a slightly more complicated solution) be used to set the SERVICE_STATUS.serviceSpecificExitCode field.

These two classes are used to invoke the methods of ISimpleService:
class SimpleServiceControlHandler implements HandlerEx {
  private final ISimpleService service;

  public SimpleServiceControlHandler(ISimpleService service) {
    this.service = service;
  }

  public int serviceControlHandler(int serviceControlCode, int eventType,
                                   Pointer eventData, Pointer context) {
    switch (serviceControlCode) {
    case ExtendedAdvapi32.SERVICE_CONTROL_INTERROGATE:
      return W32Errors.NO_ERROR;
    case ExtendedAdvapi32.SERVICE_CONTROL_SHUTDOWN:
    case ExtendedAdvapi32.SERVICE_CONTROL_STOP:
      service.stop();  // Invoke the ISimpleService method.
      return W32Errors.NO_ERROR;
    default:
      return W32Errors.ERROR_CALL_NOT_IMPLEMENTED;
    }
  }
}

class SimpleServiceMain implements SERVICE_MAIN_FUNCTION {
  private final ISimpleService simpleService;
  private final SimpleServiceControlHandler handler;
  private SERVICE_STATUS_HANDLE serviceStatusHandle;

  public SimpleServiceMain(ISimpleService simpleService,
                           SimpleServiceControlHandler handler) {
    this.simpleService = simpleService;
    this.handler = handler;
  }

  public void ServiceMain(int argc, Pointer argv) {
    if (argc < 1 || argv == null) {
      return;  // Missing the service name.
    }
    try {
      String[] args = argv.getStringArray(0, argc, true);
      String serviceName = args[0];
      String[] startParameters = Arrays.copyOfRange(args, 1, args.length);
      serviceStatusHandle =
          ExtendedAdvapi32.INSTANCE.RegisterServiceCtrlHandlerEx(
              serviceName, handler, null);
      setServiceStatus(
          ExtendedAdvapi32.SERVICE_RUNNING,
          ExtendedAdvapi32.SERVICE_ACCEPT_STOP |
          ExtendedAdvapi32.SERVICE_ACCEPT_SHUTDOWN);
      simpleService.run(startParameters);
    } finally {
      setServiceStatus(ExtendedAdvapi32.SERVICE_STOPPED, 0);
    }
  }

  private void setServiceStatus(int currentState, int controlsAccepted) {
    SERVICE_STATUS serviceStatus = new SERVICE_STATUS();
    serviceStatus.currentState = currentState;
    serviceStatus.controlsAccepted = controlsAccepted;
    ExtendedAdvapi32.INSTANCE.SetServiceStatus(
        serviceStatusHandle, serviceStatus);
  }
}
And a function to create instances and start the service control dispatcher:
public static void runSimpleService(ISimpleService service) {
  SimpleServiceControlHandler handler =
      new SimpleServiceControlHandler(service);
  SimpleServiceMain serviceMain =
      new SimpleServiceMain(service, handler);
  ExtendedAdvapi32.SERVICE_TABLE_ENTRY entry =
      new ExtendedAdvapi32.SERVICE_TABLE_ENTRY();
  entry.serviceName = "";
  entry.serviceProc = serviceMain;
  SERVICE_TABLE_ENTRY[] serviceTable =
      (SERVICE_TABLE_ENTRY[]) entry.toArray(2);
  ExtendedAdvapi32.INSTANCE.StartServiceCtrlDispatcher(serviceTable);
}

Example Service
Here is a trivial service that just waits to be stopped.
public class WindowsServiceHandlerExample implements ISimpleService {
  private final CountDownLatch latch = new CountDownLatch(1);
  public int run(String[] args) {
    try {
      latch.await();  // Wait until stop() is called.
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return 0;
  }
  public void stop() {
    latch.countDown();
  }
  public static void main(String[] args) {
    WindowsServiceUtil.runSimpleService(new WindowsServiceHandlerExample());
  }
}

Saturday, February 26, 2011

Distributing Work

During November and December of last year I developed my checkers program to the point where I could run all three of the experiments conducted by Fogel and Chellapilla.  The longest ran for 5 days on my 2 processor laptop, which indicates that it'll take months to run some of the experiments I have in mind.

Since I'm doing this exercise for fun, and I like building "systems", I decided to make a system for distributing work (games to be played) to other machines in the house.  I've developed a somewhat complicated Java web app (using embedded Jetty) that manages a jar repository and runs (in subprocesses) Java programs whose descriptions are uploaded.  The description consists of jar names and jar hashes (to avoid running the wrong version of a jar as I'm developing), a main class name, and command line.  The Java program is provided with a work area in the file system, to which the remote client can upload any necessary files prior to starting the program, and from which the remote client can download any files after the program is done, including files containing the output of the stdout and stderr streams.

I say "somewhat complicated" because I didn't make the decision to use embedded Jetty until late in the process, and if I'd made it earlier, I would have used more of Jetty's facilities to handle the file management.

To simplify creating the description of a Java program, I created a mechanism for walking the current process's classpath, creating jars for unpacked entries (e.g. the bin folders of my Eclipse projects), and uploading them to the various servers.

Next up I'm working on how to transfer units of work from the main program to the workers, and how to get the responses back.  My current plan is to have the main program run an embedded web server (Jetty) with a servlet that a worker will contact to get a unit of work (e.g. two checkers players to compete), and will then re-contact with the result of the unit of work (e.g. the game outcome).

This mechanism is certainly sufficient to the task, but feels a bit awkward.  I think I'd like some variant of a ThreadPoolExecutor that supports distributed threads, but I've not located nor invented such a beast.  Sigh.

More later.

Wednesday, December 29, 2010

Yes, Virginia, Testing is Important!

In my zeal to optimize my checkers playing program, I included a number of optimizations in the search algorithm: minimax with alpha-beta pruning, plus a memory of previous partial search trees to assist in ordering of future searches, and to skip evaluations when the results are known.  Too many optimizations, as it happens.

I wrote quite a few JUnit tests for the basic checkers board manipulation code, but wasn't sure how to go about testing the search algorithm. I wrote some very basic tests that the pruning was happening as expected, but left it at that.

I then wrote a system for evolving the neural networks, and repeated the evolution experiments described by Fogel and Chellapilla. Again, I wasn't quite sure how to evaluate the results, as I'm not yet ready to play hundreds of games online using the best evolved network as my evaluation function.

So, I studied the last generation's networks and the games they played, and noticed that they'd all played the same first move... and the same first reply... and the same next reply! The first three moves were all the same.  Wow! Did they all stumble upon the best opening moves?  Or perhaps all of the surviving networks were descended from a relatively recent network that made that choice?

I decided to confirm that the initial generation of networks was much more random in its play.  Unfortunately, upon reviewing their games, I found they too played the same first three moves.  Sigh.  Back to the drawing board.

To track this down, I tried shuffling the legal move choices presented to the search algorithm, but that made no difference.  Next, I implemented minimax and compared its choices to those of my enhanced alpha-beta search. As you might expect, they didn't agree.  So, I implemented a basic alpha-beta search, which did agree with minimax.  In the course of debugging, I found several mistakes in the over-optimized alpha-beta implementation, and I don't think I've found them all.  Fortunately, testing will help me figure out when I've got it wrong, and hopefully the next 1000+ generations of evolution will be more fruitful.
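That debugging strategy generalizes into a test worth keeping around: on randomly generated game trees, an alpha-beta search must return exactly the same value as plain minimax. Here's a sketch (simplified to complete binary trees with random leaf values; this is not my checkers code):

```python
import random

def minimax(values, depth, index, maximizing):
    """Plain minimax over a complete binary tree whose leaves are `values`."""
    if depth == 0:
        return values[index]
    children = (minimax(values, depth - 1, 2 * index + i, not maximizing)
                for i in (0, 1))
    return max(children) if maximizing else min(children)

def alphabeta(values, depth, index, maximizing,
              alpha=float("-inf"), beta=float("inf")):
    """Minimax with alpha-beta pruning; must agree with minimax()."""
    if depth == 0:
        return values[index]
    if maximizing:
        best = float("-inf")
        for i in (0, 1):
            best = max(best, alphabeta(values, depth - 1, 2 * index + i,
                                       False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # beta cutoff
        return best
    best = float("inf")
    for i in (0, 1):
        best = min(best, alphabeta(values, depth - 1, 2 * index + i,
                                   True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # alpha cutoff
    return best

# Property test: on random trees, the pruned search must agree with minimax.
rng = random.Random(0)
for _ in range(100):
    leaves = [rng.randint(-10, 10) for _ in range(2 ** 6)]
    assert minimax(leaves, 6, 0, True) == alphabeta(leaves, 6, 0, True)
```

The same harness extends naturally to each added optimization (move ordering, transposition memory): compare the optimized search against the plain one on many random positions, not just a handful of hand-picked cases.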

Saturday, December 4, 2010

Evaluating a Checker Board with a Feed Forward Neural Network

With two-player zero-sum games such as checkers and chess, it is common to build computer players based on the idea of searching the game tree to select moves using some variant of the minimax search algorithm and an evaluation function.  The evaluation function takes a game board as input and returns a numeric score as output, where the score is highest when the board is a win for the player, and the score is lowest when the board is a loss for the player.  For checkers, an extremely simple example of an evaluation function just adds up the number of pieces the player has, and subtracts the number of pieces the opponent has.

When using a neural network to evaluate a checkers board, we need some representation of the checkers board and the pieces on it.  Fogel and Chellapilla chose to map the board to an array of 32 floating point numbers, where each entry had one of five values:

  • +K  Player's King
  • +1  Player's Man
  •  0  Empty
  • -1  Opponent's Man
  • -K  Opponent's King

K was an evolvable value, initialized to 2.0.

In addition to the above, in their second and third experiments, they computed a value directly from the inputs (i.e. without weights): they summed the 32 floating point inputs, to come up with a value that reflects the piece differential between the two players.
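Here is a sketch of that encoding in Python (the helper names and the piece-labeling scheme are mine; only the +K/+1/0/-1/-K mapping, with K evolvable and initialized to 2.0, and the summed extra input are from the paper):

```python
# The 32 playable squares are indexed 0..31; `pieces` maps a square index
# to one of "man", "king" (the player's) or "-man", "-king" (the
# opponent's). These labels are invented for illustration.

def encode_board(pieces, k=2.0):
    """Return the 32 network inputs for a position: +k for the player's
    kings, +1 for men, 0 for empty squares, -1 / -k for the opponent's."""
    values = {"man": 1.0, "king": k, "-man": -1.0, "-king": -k}
    return [values.get(pieces.get(sq), 0.0) for sq in range(32)]

def piece_differential(inputs):
    """The extra, unweighted input used in the second and third
    experiments: the sum of the 32 board inputs."""
    return sum(inputs)
```

Note that because kings contribute ±K, the "piece differential" input is really a material differential with kings weighted by the evolved K.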

We can imagine other mappings.  For example, we might have several binary inputs per square, such as:

  • 3 bits: black, white, king; at most one of black or white would be set, and king is only set if one of the other two is set.
  • 4 bits: empty, black, white, king; at most one of empty, black or white would be set, and king is only set if black or white is set.
  • 4 bits: black king, black, white, white king; at most one would be set at a time.
  • 5 bits: black king, black, empty, white, white king; exactly one would be set at a time.

Fogel and Chellapilla chose a search depth of 4 ply (2 moves by each player), but extended the depth whenever a move by a player was "forced".  Unfortunately I found their description of how this was handled a bit vague:
In addition, when forced moves were involved, the search depth was extended (let f be the number of forced moves) because in these situations the player has no real decision to make. The ply depth was extended by steps of two, up to the smallest even number that was greater than or equal to the number of forced moves f that occurred along that branch. If the extended ply search produced more forced moves then the ply was once again increased in a similar fashion. Furthermore, if the final board position was left in an "active" state, where the player has a forced jump, the depth was once again incremented by two ply. Maintaining an even depth along each branch of the search tree ensured that the boards were evaluated after the opponent had an opportunity to respond to the player's move.
It wasn't clear to me whether this test was performed only on the moves of the player, or also on the moves of the opponent.  Furthermore, it wasn't clear what the definition of "forced move" was.  Did it mean that the player to move had only jumps to choose among (i.e. possibly more than one); or that the player to move had only a single possible move, whether a simple move or a jump; or that the player to move had only a single jump available?

My best guess is that whenever the player, and not the opponent, had only jumps available (whether they had a choice of jumps or not), the search depth on that branch of the game tree was increased by 2 ply.

UPDATE 2010-12-09: Dr. Fogel was kind enough to respond to my questions about forced jumps, and he recalls that this meant the player whose choice it was had only a single move, and thus had no real choice.  This applied for either player in the game tree, and it didn't matter whether the move was a simple move or was a jump.
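Given Dr. Fogel's clarification, the rule might be sketched like this (this is a guess at the mechanics, not their code; all helper functions here are assumed placeholders; the net effect is 2 extra ply whenever the side to move has no real choice):

```python
def extended_search(board, side, depth, evaluate, legal_moves, apply_move):
    """Negamax sketch with the forced-move extension from the update:
    when the side to move has exactly one legal move (simple move or
    jump), that branch is extended by 2 ply, since no real decision is
    being made there. `evaluate`, `legal_moves`, and `apply_move` are
    assumed game-specific helpers, not from the paper."""
    moves = legal_moves(board, side)
    if depth <= 0 or not moves:
        return evaluate(board, side)
    if len(moves) == 1:
        depth += 2  # forced move: no real decision, so look deeper
    best = float("-inf")
    for move in moves:
        child = apply_move(board, move)
        best = max(best, -extended_search(child, -side, depth - 1,
                                          evaluate, legal_moves, apply_move))
    return best
```

Since a forced node still consumes one ply of depth, adding 2 at that node leaves the branch a net 2 ply deeper than it would otherwise be, which matches "extended by steps of two" and keeps the even-depth property when forced moves come in pairs.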