Persephone blog
Network computing: data

Introduction - 21 April 2020

This project is a spinoff of the Andromeda project, and is thus another simple investigation of specific properties of signals and electronics that might be useful to computation. In this case, it is a look at correlation between signals which are distributed across multiple hardware platforms. This project makes use of sparse signals and signal groups, which have been described in previous posts. Eventually, it will also make use of complex numbers and elliptical states, although these might better be termed "rectangular states", as they will be represented by binary, rather than continuous, signals.

As always, this project consists of software and hardware. The software is written in C, for Linux on an Intel i5 desktop computer. The hardware consists of resistors, capacitors, and op-amps as necessary, and an Arduino Mega 2560 microcontroller board. This board is in the same family as the others I currently have (all are Atmel AVR chips), but has more RAM, ports/pins, etc. The device is also programmed in C:

The current working setup, which is very simple, is intended to collect baseline data to which other experiments can be compared later this summer, and to develop an experimental and data-processing procedure that can adapt to evolving hardware and circumstances over several years. Currently, the data consists of collections of bitstrings of 0's and 1's, which become 0 and 5 volts in the electronics, with bit widths (durations) of about 1 ms. "In the field," these signals would probably be several hundred to several thousand bits long, but at present I am using strings of only up to a hundred bits.

Small example

This small example uses a collection of signals that are 12 bits long, and which all have exactly 6 bits set to 1 (i.e. they are all 50% ON). The input file for this test looks like:

// psyche002.txt : persephone
// 4/23/20
// Copyright Sky Coyote 2020

id 1
"This is a baseline test of the psyche and post programs.
"100 12-bit 50% set signals.
"Computer random 50% set guess, random 50% cued, random 50% sent to hardware.
"4 classes.

expmode 0 0.5
hwmode 2 0.5
cuemode 2 0.5
guessmode 1 0
nbits 12
bit_width_us 1240
ntrials 100

bg_color 0.0 0.2 0.0 1.0

This file tells the psyche program to sequentially create 100 random 12-bit strings with 50% of their bits set, to display (cue) each string only 50% of the time at random, to send each string to the external hardware 50% of the time at random, and to generate an additional random string (a local "guess") for each. The 100 pairs of strings are saved as output for further analysis. This experiment has 4 classes: no-cue+no-hardware, cued+no-hardware, no-cue+hardware, and cued+hardware.
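
For reference, here is a minimal sketch of one way to generate a random string with exactly half of its bits set (the function name and the use of rand() are my assumptions; the actual psyche code may differ):

#include <stdio.h>
#include <stdlib.h>

/* Make an nbits signal with exactly non bits ON: set the first
   non bits, then Fisher-Yates shuffle the whole array. */
void make_signal(int *bits, int nbits, int non)
{
    for (int i = 0; i < nbits; i++)
        bits[i] = (i < non) ? 1 : 0;
    for (int i = nbits - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        int t = bits[i]; bits[i] = bits[j]; bits[j] = t;
    }
}

int main(void)
{
    int bits[12];
    srand(42);                      /* arbitrary seed */
    make_signal(bits, 12, 6);
    for (int i = 0; i < 12; i++)
        putchar(bits[i] ? '1' : '0');
    putchar('\n');
    return 0;
}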

Below is an example of the computer display for a cued trial. The "remote" string is at the top; the "local" or "guess" string is below it. In this particular experiment, the computer generates both the remote and local strings:


Next is an example of the computer display for an uncued trial. The remote string is not displayed, possibly because it is not available on the local device. Cued and uncued trials can be intermixed in the same experiment:


At present, the hardware and electronics consist of just the microcontroller board and an LED. The looped binary signal is played through the LED over and over, although it is generally too fast to see each bit (12 bits at 1250 usec each = 0.015 sec, or 66.666 Hz). There is no visible indication on the computer display as to whether the signal was actually sent to the hardware or not (although when it is, the LED lights up and traces appear on the scope):
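
The playback itself is simple. Below is a hypothetical Arduino-style sketch; the pin numbers, trigger polarity, and stored string are my assumptions, not the actual program:

/* Loop a bitstring through an LED, with a trigger pulse at the
   start of each period (pin assignments are hypothetical). */
const int LED = 13, TRIG = 12;
const int NBITS = 12;
const unsigned long BIT_US = 1240;
const char *bits = "000111001110";

void setup()
{
    pinMode(LED, OUTPUT);
    pinMode(TRIG, OUTPUT);
    digitalWrite(TRIG, HIGH);
}

void loop()
{
    digitalWrite(TRIG, LOW);        /* downward stroke marks period start */
    for (int i = 0; i < NBITS; i++) {
        digitalWrite(LED, bits[i] == '1' ? HIGH : LOW);
        delayMicroseconds(BIT_US);
        if (i == 0)
            digitalWrite(TRIG, HIGH);
    }
}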


Here are some close-ups of signals on the scope. The lower trace is the trigger signal; the upper trace is data. Each period begins with the downward stroke of the trigger:

Each pair of signals (remote and guess) is saved in a data file:

id              1
"This is a baseline test of the psyche and post programs.
"100 12-bit 50% set signals.
"Computer random 50% set guess, random 50% cued, random 50% sent to hardware.
"4 classes.
nbits          12
bit_width_us 1240
ntrials       100
expmode         0.000     0.500
hwmode          2.000     0.500
cuemode         2.000     0.500
guessmode       1.000     0.000
trial           0
  remote        6.000     1.000     0.000     0.500     0.522     000111001110
  guess         6.000     1.000     0.000     0.500     0.522     000101001111
  hwmode        1.000     0.000
  cuemode       0.000     0.000
  guessmode     1.000     0.000
trial           1
  remote        6.000     1.000     0.000     0.500     0.522     001011110100
  guess         6.000     1.000     0.000     0.500     0.522     010011110010
  hwmode        0.000     0.000
  cuemode       1.000     0.000
  guessmode     1.000     0.000
trial           2
  remote        6.000     1.000     0.000     0.500     0.522     000101111100
  guess         6.000     1.000     0.000     0.500     0.522     111100000110
  hwmode        0.000     0.000
  cuemode       1.000     0.000
  guessmode     1.000     0.000
etc...

10 experiments of 100 trials each were performed in this way. To evaluate each trial, the similarity between the remote string and the local string is calculated as follows:

1. For each corresponding pair of bits (A and B) in the strings, 
   the partial raw similarity is given by:
   
              bit A = 0  bit A = 1
              ---------  ---------
   bit B = 0:    0.0       -0.5
   bit B = 1:   -0.5        1.0

2. For each shift alignment i of the two strings, where i goes 
   from 0 to nbits - 1, these partial values are summed over all 
   bit pairs to yield a raw similarity(i).
   
3. The total similarity score is taken as the maximum of these 
   values, divided by the number of bits which are ON 
   (giving a score from 0.0 to 1.0).
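
A minimal C sketch of this scoring (the function names are mine; the actual post program presumably differs in detail):

#include <stdio.h>

#define NBITS 12

/* Partial raw similarity for one pair of bits:
   both ON = +1.0, mismatch = -0.5, both OFF = 0.0. */
static double partial(int a, int b)
{
    if (a && b) return 1.0;
    if (a || b) return -0.5;
    return 0.0;
}

/* Total similarity: maximum raw similarity over all circular
   shifts, normalized by the number of ON bits (non). */
double similarity(const int *remote, const int *guess, int nbits, int non)
{
    double best = -1.0e9;
    for (int i = 0; i < nbits; i++) {          /* each shift alignment */
        double raw = 0.0;
        for (int j = 0; j < nbits; j++)        /* each bit pair */
            raw += partial(remote[j], guess[(j + i) % nbits]);
        if (raw > best) best = raw;
    }
    return best / non;
}

int main(void)
{
    /* the remote/guess pair from trial 0 of the data file above */
    int remote[NBITS] = {0,0,0,1,1,1,0,0,1,1,1,0};
    int guess[NBITS]  = {0,0,0,1,0,1,0,0,1,1,1,1};
    printf("similarity = %.3f\n", similarity(remote, guess, NBITS, 6));
    return 0;
}

For that pair, the best alignment matches 5 of the 6 ON bits, for a raw score of 4.0, so this prints 0.667.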

The similarity score is calculated for each trial of each experiment. In the output below, the trial number, class, maximum raw score, and normalized similarity score are shown on each line. Strings with a similarity of 1.0 are an exact match at some shift. Statistics for each class in each individual file, and for all files, are summarized at the bottom: the number of trials, then the maximum, minimum, mean, and standard deviation of the raw score and of the normalized similarity (two groups of four numbers):

************ output/psyche_2020-0425-113406.data.txt
  0:   0    2.000    0.333
  1:   1    2.000    0.333
  2:   2    4.000    0.667
  3:   0    2.000    0.333
  4:   1    2.000    0.333
  5:   3    2.000    0.333
  6:   3    2.000    0.333
  7:   1    2.000    0.333
  8:   0    4.000    0.667
  9:   1    2.000    0.333
 10:   0    6.000    1.000
 11:   0    2.000    0.333
...etc...
 97:   3    2.000    0.333
 98:   2    2.000    0.333
 99:   1    2.000    0.333
--------------------------------------------  -----------------------------------
  0:  25    6.000    2.000    3.120    1.166     1.000    0.333    0.520    0.194
  1:  26    4.000    0.000    2.231    0.863     0.667    0.000    0.372    0.144
  2:  24    4.000    0.000    3.000    1.180     0.667    0.000    0.500    0.197
  3:  25    4.000    2.000    2.800    1.000     0.667    0.333    0.467    0.167
--------------------------------------------  -----------------------------------
  0: 251    6.000    0.000    2.821    1.049     1.000    0.000    0.470    0.175
  1: 249    6.000    0.000    2.787    1.073     1.000    0.000    0.465    0.179
  2: 257    6.000    0.000    2.957    1.091     1.000    0.000    0.493    0.182
  3: 243    6.000    0.000    2.848    1.071     1.000    0.000    0.475    0.178

Here is a plot of all results for all 4 classes (red = no-cue+no-hw, blue = cue+no-hw, yellow = no-cue+hw, green = cue+hw). The mean and standard deviation for each class are shown graphically at the left of each plot:



Although there are 4096 different 12-bit numbers, there are only 352 different 12-bit signals, due to shift duplication (i.e. a signal shifted circularly left or right by one or more bits is the same signal: this is just time displacement). Below are some of them. The bit pattern is shown at left, then the number of bits ON and the corresponding percent ON, and then a list of the numbers having that particular bit pattern (and all of its shifted versions). The line at the bottom shows the number of distinct signals (groups) having from 0 to 12 bits ON:

ngroups = 352
   0:  0 0 0 0 0 0 0 0 0 0 0 0     0  0.000     0
   1:  0 0 0 0 0 0 0 0 0 0 0 1     1  0.083     1   2   4   8  16  32  64 128 256 512 1024 2048
   2:  0 0 0 0 0 0 0 0 0 0 1 1     2  0.167     3   6  12  24  48  96 192 384 768 1536 2049 3072
   3:  0 0 0 0 0 0 0 0 0 1 0 1     2  0.167     5  10  20  40  80 160 320 640 1025 1280 2050 2560
   4:  0 0 0 0 0 0 0 0 0 1 1 1     3  0.250     7  14  28  56 112 224 448 896 1792 2051 3073 3584
   5:  0 0 0 0 0 0 0 0 1 0 0 1     2  0.167     9  18  36  72 144 288 513 576 1026 1152 2052 2304
   6:  0 0 0 0 0 0 0 0 1 0 1 1     3  0.250    11  22  44  88 176 352 704 1408 1537 2053 2816 3074
   7:  0 0 0 0 0 0 0 0 1 1 0 1     3  0.250    13  26  52 104 208 416 832 1027 1664 2054 2561 3328
   8:  0 0 0 0 0 0 0 0 1 1 1 1     4  0.333    15  30  60 120 240 480 960 1920 2055 3075 3585 3840
   9:  0 0 0 0 0 0 0 1 0 0 0 1     2  0.167    17  34  68 136 257 272 514 544 1028 1088 2056 2176
  10:  0 0 0 0 0 0 0 1 0 0 1 1     3  0.250    19  38  76 152 304 608 769 1216 1538 2057 2432 3076
...etc...
 345:  0 1 1 0 1 1 1 1 1 1 1 1    10  0.833   1791 2043 2943 3069 3519 3582 3807 3951 4023 4059 4077 4086
 346:  0 1 1 1 0 1 1 1 0 1 1 1     9  0.750   1911 3003 3549 3822
 347:  0 1 1 1 0 1 1 1 1 1 1 1    10  0.833   1919 2039 3007 3067 3551 3581 3823 3838 3959 4027 4061 4078
 348:  0 1 1 1 1 0 1 1 1 1 1 1    10  0.833   1983 2031 3039 3063 3567 3579 3831 3837 3963 3966 4029 4062
 349:  0 1 1 1 1 1 0 1 1 1 1 1    10  0.833   2015 3055 3575 3835 3965 4030
 350:  0 1 1 1 1 1 1 1 1 1 1 1    11  0.917   2047 3071 3583 3839 3967 4031 4063 4079 4087 4091 4093 4094
 351:  1 1 1 1 1 1 1 1 1 1 1 1    12  1.000   4095

 12   352     1     1     6    19    43    66    80    66    43    19     6     1     1
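
The enumeration itself can be done by brute force: reduce each 12-bit number to the smallest value among all of its rotations, and count the distinct minima. A sketch:

#include <stdio.h>

#define NBITS 12
#define NVALS (1 << NBITS)

/* Canonical representative: the smallest value over all rotations. */
unsigned canon(unsigned v)
{
    unsigned best = v;
    for (int i = 1; i < NBITS; i++) {
        v = ((v >> 1) | (v << (NBITS - 1))) & (NVALS - 1);
        if (v < best) best = v;
    }
    return best;
}

int main(void)
{
    static char seen[NVALS];
    int ngroups = 0;
    for (unsigned v = 0; v < NVALS; v++) {
        unsigned c = canon(v);
        if (!seen[c]) { seen[c] = 1; ngroups++; }
    }
    printf("ngroups = %d\n", ngroups);    /* prints ngroups = 352 */
    return 0;
}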

If the groups above are limited to signals with exactly 6 bits ON, then there are only 80 different ones:

ngroups = 80
   0:  0 0 0 0 0 0 1 1 1 1 1 1     6  0.500    63 126 252 504 1008 2016 2079 3087 3591 3843 3969 4032
   1:  0 0 0 0 0 1 0 1 1 1 1 1     6  0.500    95 190 380 760 1520 1985 2095 3040 3095 3595 3845 3970
   2:  0 0 0 0 0 1 1 0 1 1 1 1     6  0.500   111 222 444 888 1776 1923 2103 3009 3099 3552 3597 3846
   3:  0 0 0 0 0 1 1 1 0 1 1 1     6  0.500   119 238 476 952 1799 1904 2107 2947 3101 3521 3598 3808
   4:  0 0 0 0 0 1 1 1 1 0 1 1     6  0.500   123 246 492 984 1551 1968 2109 2823 3102 3459 3777 3936
   5:  0 0 0 0 0 1 1 1 1 1 0 1     6  0.500   125 250 500 1000 1055 2000 2110 2575 3335 3715 3905 4000
   6:  0 0 0 0 1 0 0 1 1 1 1 1     6  0.500   159 318 636 993 1272 1986 2127 2544 3111 3603 3849 3972
   7:  0 0 0 0 1 0 1 0 1 1 1 1     6  0.500   175 350 700 1400 1505 1925 2135 2800 3010 3115 3605 3850
   8:  0 0 0 0 1 0 1 1 0 1 1 1     6  0.500   183 366 732 1464 1761 1803 2139 2928 2949 3117 3522 3606
   9:  0 0 0 0 1 0 1 1 1 0 1 1     6  0.500   187 374 748 1496 1559 1889 2141 2827 2992 3118 3461 3778
  10:  0 0 0 0 1 0 1 1 1 1 0 1     6  0.500   189 378 756 1071 1512 1953 2142 2583 3024 3339 3717 3906
...etc...
  75:  0 0 1 1 0 0 1 1 0 0 1 1     6  0.500   819 1638 2457 3276
  76:  0 0 1 1 0 0 1 1 0 1 0 1     6  0.500   821 851 1229 1331 1642 1702 2458 2473 2662 2713 3284 3404
  77:  0 0 1 1 0 1 0 0 1 1 0 1     6  0.500   845 1235 1690 2470 2665 3380
  78:  0 0 1 1 0 1 0 1 0 1 0 1     6  0.500   853 1237 1333 1357 1363 1706 2474 2666 2714 2726 2729 3412
  79:  0 1 0 1 0 1 0 1 0 1 0 1     6  0.500   1365 2730

 12    80     0     0     0     0     0     0    80     0     0     0     0     0     0

In this example, there is not an infinite or even a large number of possible strings to choose from: there are only 80, and all of the remote and local strings generated above will be one or another of these. Therefore, we can evaluate how good any guess is by comparing it to all other possible guesses. For each remote string, there are 80 different guesses, each of which yields a specific similarity score. We can separate these guesses into 3 groups and count them (a sketch of the counting follows the list):

  1. Those with a similarity score lower than the actual guess,
  2. Those with a similarity score equal to that of the actual guess,
  3. Those with a similarity score greater than the actual guess.
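
In code, the counting might look like the following (this assumes the similarity() function sketched earlier and an array holding the 80 group representatives):

/* Classify every possible guess against one remote string. groups[]
   holds the 80 distinct 6-ON 12-bit signals; sim is the score of
   the guess actually made in the trial. */
void count_tuple(const int groups[80][12], const int *remote, double sim,
                 int *worse, int *same, int *better)
{
    *worse = *same = *better = 0;
    for (int g = 0; g < 80; g++) {
        double s = similarity(remote, groups[g], 12, 6);
        if (s < sim)      (*worse)++;
        else if (s > sim) (*better)++;
        else              (*same)++;
    }
}

Note that the "same" count includes the actual guess itself.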

For each of the trials shown above, this yields a set of three numbers indicating how many other guesses are worse than, the same as, or better than the current guess:

************ output/psyche_2020-0425-113406.data.txt
  0:   0   2.000 0.333   4  52  24
  1:   1   2.000 0.333   1  46  33
  2:   2   4.000 0.667  44  35   1
  3:   0   2.000 0.333   1  45  34
  4:   1   2.000 0.333   0  43  37
  5:   3   2.000 0.333   1  47  32
  6:   3   2.000 0.333   1  49  30
  7:   1   2.000 0.333   6  56  18
  8:   0   4.000 0.667  43  36   1
  9:   1   2.000 0.333   0  50  30
 10:   0   6.000 1.000  79   1   0
 11:   0   2.000 0.333   1  46  33
 12:   0   2.000 0.333   0  43  37
 13:   3   4.000 0.667  48  31   1
...etc...
 95:   2   4.000 0.667  47  32   1
 96:   3   2.000 0.333   1  46  33
 97:   3   2.000 0.333   1  47  32
 98:   2   2.000 0.333   1  47  32
 99:   1   2.000 0.333   1  43  36

For a perfect match, the three numbers are n - 1 (all other guesses are worse), 1 (this guess), and 0 (no better guesses). These numbers are then divided by the total number of groups (= 80), and used as the x, y, and z coordinates of a 3d plot. The colors correspond to the 4 trial classes described above:



Since n1 + n2 + n3 = 80, after this normalization all points lie on a plane which passes through the three points (1, 0, 0), (0, 1, 0), and (0, 0, 1) at the ends of the axes. However, the points do not evenly fill this 2d region, but are confined to a narrow 1d swath. This swath contains:

  1. The "best" point, near the end of the x axis, such that x is almost = 1.0, y = 1 / n, and z = 0.0.
  2. The "worst" point, near the end of the z axis, such that x = 0.0, y = 1 / n, and z is almost = 1.0.
  3. Other points with intermediate x and z coordinates, indicating varying numbers of superior and inferior guesses, but with y values indicating that they share the same similarity score with several other guesses. This isn't too surprising from the data above, since the raw similarity scores are just 0, 2, 4, and 6 bits.

One interesting thing to note is that although there were 1000 trials in the experiment, I count only 37 different points in the plot, so there are many duplicates; for a group of 80 strings, there are 80x80 = 6400 possible combinations. Below left is a plot of the similarity scores for each of these combinations. Each row represents a different remote string, and each column a different guess. Colors correspond to similarity scores ranging from 0.0 (red) to 1.0 (yellow). Pairs of strings located at red points have a similarity of 0.0, and are therefore orthogonal by this definition. The plot is symmetric, and the diagonal is all 1.0 (where the guess = remote). The plot at right shows the locations of the 1000 trials from the experiment above, in which there are 926 different pairs:



Tuples for {n1, n2, n3} can be calculated for the entire 80x80 string set, and are shown below. In this case I count only 42 different points. For example, the whole diagonal of the plot above maps to the single "best" point near the end of the x axis:



There are 352 different 12-bit signals, and they can all be compared to one another in a single 352x352 = 123904 point plot. Two color maps are shown below. At left, similarity scores greater than zero are in shades of red, zero is black, and negative scores are in blue. Similarity scores can be negative when the two strings have different numbers of ON bits (for example, a string with all bits ON and another with all bits OFF have a score of -n / 2.0, since they differ at every bit). At right, scores greater than zero are in hues from blue to yellow, zero is black, and negative values are red:



Below is the entire 12-bit tuple set in 3d. The y axis has been expanded by 4x, as there are now fewer strings with equal similarity scores. Color indicates the number of ON bits in the remote string, from blue (0) to red (12):



Larger example

This example uses a collection of signals that are 100 bits long, and which all have exactly 50 bits set to 1 (i.e. they are all 50% ON). The input file for this test looks like:

// psyche003.txt : persephone
// 5/3/20
// Copyright Sky Coyote 2020

id 2
"This is a baseline test of the psyche and post programs.
"100 100-bit 50% set signals.
"Computer random 50% set guess, random 50% cued, random 50% sent to hardware.
"4 classes.

expmode 0 0.5
hwmode 2 0.5
cuemode 2 0.5
guessmode 1 0
nbits 100
bit_width_us 965
ntrials 100

bg_color 0.0 0.2 0.0 1.0

This experiment also has 4 classes: no-cue+no-hardware, cued+no-hardware, no-cue+hardware, and cued+hardware.

Below is an example of the computer display for a cued trial:

Here is an example of the computer display for an uncued trial:

Here are some close-ups of signals on the scope:

Pairs of signals are saved in data files:

id              2
"This is a baseline test of the psyche and post programs.
"100 100-bit 50% set signals.
"Computer random 50% set guess, random 50% cued, random 50% sent to hardware.
"4 classes.
nbits         100
bit_width_us  965
ntrials       100
expmode         0.000     0.500
hwmode          2.000     0.500
cuemode         2.000     0.500
guessmode       1.000     0.000
trial           0
  remote       50.000     1.000     0.000     0.500     0.503     0101000010000111010010111111110010010111010010111110111000110001111110110101010000100001010110000100
  guess        50.000     1.000     0.000     0.500     0.503     1010100010111001110001110000000110001000011010000111100101110101101110101110000111010011110000111101
  hwmode        0.000     0.000
  cuemode       1.000     0.000
  guessmode     1.000     0.000
trial           1
  remote       50.000     1.000     0.000     0.500     0.503     1010000101011111000101110001011001111000011001000001001000100110100101011110010110110011010111010111
  guess        50.000     1.000     0.000     0.500     0.503     0000100101000110110001100101011011111001110111011011110010010010110101101101101000001001010000010111
  hwmode        1.000     0.000
  cuemode       1.000     0.000
  guessmode     1.000     0.000
trial           2
  remote       50.000     1.000     0.000     0.500     0.503     0010101001011001010011010001000101001000100001010011010110111011100101010101100011111000111110110111
  guess        50.000     1.000     0.000     0.500     0.503     0001011000111100011010101100111101000100101011100000011100010101110110001110100001001001010101111111
  hwmode        1.000     0.000
  cuemode       1.000     0.000
  guessmode     1.000     0.000
trial           3
  remote       50.000     1.000     0.000     0.500     0.503     0011010110011011010001011001100011011011111010101001011011010001001101101000000100101100101111010001
  guess        50.000     1.000     0.000     0.500     0.503     1010010111001101111011100010111111101101010011111011101100001000001100000000001111010011000010001001
  hwmode        0.000     0.000
  cuemode       1.000     0.000
  guessmode     1.000     0.000
etc...

10 experiments of 100 trials each were performed in this way. The similarity scores for each trial, and for each class, look like:

************ output/psyche_2020-0505-133457.data.txt
  0:   3   14.000    0.280
  1:   1   16.000    0.320
  2:   0   12.000    0.240
  3:   1   16.000    0.320
  4:   0   12.000    0.240
  5:   3   18.000    0.360
  6:   3   12.000    0.240
  7:   1   12.000    0.240
  8:   3   16.000    0.320
  9:   1   10.000    0.200
 10:   1   12.000    0.240
...etc...
 95:   3   12.000    0.240
 96:   3   10.000    0.200
 97:   1   12.000    0.240
 98:   3   12.000    0.240
 99:   3   14.000    0.280
--------------------------------------------  -----------------------------------
  0:  28   20.000   10.000   12.857    2.272     0.400    0.200    0.257    0.045
  1:  29   18.000   10.000   12.897    2.110     0.360    0.200    0.258    0.042
  2:  19   20.000    8.000   12.947    2.857     0.400    0.160    0.259    0.057
  3:  24   18.000   10.000   12.583    2.320     0.360    0.200    0.252    0.046
--------------------------------------------  -----------------------------------
  0: 250   20.000    8.000   12.456    2.158     0.400    0.160    0.249    0.043
  1: 265   20.000    8.000   12.747    2.111     0.400    0.160    0.255    0.042
  2: 223   20.000    8.000   12.422    2.186     0.400    0.160    0.248    0.044
  3: 262   20.000    8.000   12.298    2.091     0.400    0.160    0.246    0.042

Here is a plot of all results:



The table below shows the distribution of signal groups for 0 to 20 bits. The second column is the total number of groups for n bits, and the additional columns are the numbers of subgroups having from 0 to n bits ON. The maximum count is always for half of all bits ON:

  0     0     0
  1     2     1     1
  2     3     1     1     1
  3     4     1     1     1     1
  4     6     1     1     2     1     1
  5     8     1     1     2     2     1     1
  6    14     1     1     3     4     3     1     1
  7    20     1     1     3     5     5     3     1     1
  8    36     1     1     4     7    10     7     4     1     1
  9    60     1     1     4    10    14    14    10     4     1     1
 10   108     1     1     5    12    22    26    22    12     5     1     1
 11   188     1     1     5    15    30    42    42    30    15     5     1     1
 12   352     1     1     6    19    43    66    80    66    43    19     6     1     1
 13   632     1     1     6    22    55    99   132   132    99    55    22     6     1     1
 14  1182     1     1     7    26    73   143   217   246   217   143    73    26     7     1     1
 15  2192     1     1     7    31    91   201   335   429   429   335   201    91    31     7     1     1
 16  4116     1     1     8    35   116   273   504   715   810   715   504   273   116    35     8     1     1
 17  7712     1     1     8    40   140   364   728  1144  1430  1430  1144   728   364   140    40     8     1     1
 18 14602     1     1     9    46   172   476  1038  1768  2438  2704  2438  1768  1038   476   172    46     9     1     1
 19 27596     1     1     9    51   204   612  1428  2652  3978  4862  4862  3978  2652  1428   612   204    51     9     1     1
 20 52488     1     1    10    57   245   776  1944  3876  6310  8398  9252  8398  6310  3876  1944   776   245    57    10     1     1

While there are 4096 different 12-bit numbers, there are 1.268x10^30 100-bit numbers (see table below). Calculating the number of distinct signal groups by enumeration takes longer with more bits. For example, the calculation for 20 bits took 42 minutes, so there is no practical way to enumerate the groups of much longer strings. However, the number of distinct signals is bounded from below by (2^n)/n, and it is very close to this value at 20 bits, so this is probably a good way to estimate the number of 100-bit groups as well: slightly more than 1.268x10^28. The fraction of groups with 50% of bits ON falls to 0.176 at 20 bits, and the natural log of this fraction begins to approach a straight line. Using the average of the last two slopes of this log-line, the log fraction, the fraction, and then the number of 50% ON signals can be estimated: about 2.469x10^26 for 100 bits. That's still a pretty big number, and more than any computer can enumerate for quite some time to come. So, it isn't possible to generate all 50% ON 100-bit strings for comparison, as was done with the 12-bit strings above.

nbits        2^n    (2^n)/n    ngroups     n50pct     n50/ng        ln()       slope
-----  ---------  ---------  ---------  ---------  ---------  ----------  ----------
    2          4        2.0          3          1      0.333      -1.099
    4         16        4.0          6          2      0.333      -1.099    0.000000
    6         64       10.7         14          4      0.286      -1.253   -0.077075
    8        256       32.0         36         10      0.278      -1.281   -0.014085
   10       1024      102.4        108         26      0.241      -1.424   -0.071550
   12       4096      341.3        352         80      0.227      -1.482   -0.028785
   14      16384     1170.3       1182        246      0.208      -1.570   -0.044014
   16      65536     4096.0       4116        810      0.197      -1.626   -0.027986
   18     262144    14563.6      14602       2704      0.185      -1.686   -0.030412
   20    1048576    52428.8      52488       9252      0.176      -1.736   -0.024659

   20  1.049e+06  5.243e+04  5.243e+04  9.242e+03  1.763e-01  -1.736e+00  -2.754e-02
   30  1.074e+09  3.579e+07  3.579e+07  4.790e+06  1.338e-01  -2.011e+00  -2.754e-02
   40  1.100e+12  2.749e+10  2.749e+10  2.793e+09  1.016e-01  -2.286e+00  -2.754e-02
   50  1.126e+15  2.252e+13  2.252e+13  1.738e+12  7.716e-02  -2.562e+00  -2.754e-02
   60  1.153e+18  1.922e+16  1.922e+16  1.126e+15  5.859e-02  -2.837e+00  -2.754e-02
   70  1.181e+21  1.687e+19  1.687e+19  7.503e+17  4.449e-02  -3.113e+00  -2.754e-02
   80  1.209e+24  1.511e+22  1.511e+22  5.105e+20  3.378e-02  -3.388e+00  -2.754e-02
   90  1.238e+27  1.375e+25  1.375e+25  3.528e+23  2.565e-02  -3.663e+00  -2.754e-02
  100  1.268e+30  1.268e+28  1.268e+28  2.469e+26  1.948e-02  -3.939e+00  -2.754e-02
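
Incidentally, the totals in the ngroups column (though not the 50% ON subtotals that the extrapolation above actually needs) can be cross-checked in closed form with the standard necklace-counting identity: ngroups(n) = (1/n) * sum, over each divisor d of n, of phi(d) * 2^(n/d), where phi is Euler's totient. A sketch:

#include <stdio.h>

/* Euler's totient, by trial division. */
unsigned long long phi(unsigned long long n)
{
    unsigned long long r = n;
    for (unsigned long long p = 2; p * p <= n; p++)
        if (n % p == 0) {
            while (n % p == 0) n /= p;
            r -= r / p;
        }
    if (n > 1) r -= r / n;
    return r;
}

/* Number of distinct n-bit signals under circular shift.
   Exact in 64 bits up to about n = 62. */
unsigned long long ngroups(unsigned long long n)
{
    unsigned long long sum = 0;
    for (unsigned long long d = 1; d <= n; d++)
        if (n % d == 0)
            sum += phi(d) * (1ULL << (n / d));
    return sum / n;
}

int main(void)
{
    printf("%llu %llu\n", ngroups(12), ngroups(20));  /* 352 52488 */
    return 0;
}

This reproduces the brute-force counts instantly (352 at 12 bits, 52488 at 20 bits), although the extrapolation above is still used here for the 50% ON counts.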

We can only compare finite, and fairly small, subsets of all 100-bit 50% ON strings. The collection of remote and guess strings from all trials of an experiment is one such subset. In this case, after duplicates (if any) are removed, all remote strings can be compared to all guesses. Here is a plot of the similarity of all remote strings (down the y axis) and all guess strings (across the x axis), in colors from blue to red:



This plot looks fairly random, especially when compared to the much more structured similarity plots shown above for 12-bit strings. The specific set of {remote, guess} pairs for the trials of the experiment lies along the diagonal of the plot, which looks no different from the rest (in the 12-bit plot the diagonal is a maximum). To see if there is "hidden structure", I tried sorting this array (along both the y and x axes) according to the numerical value of each string (at left below), and according to the trial class (at right). There are other ways to rearrange the rows and columns, but these are two obvious ones. You can load each plot into a separate browser tab and switch back and forth to see that, although they are different, both appear random as well:



One surprise is the 3d plot of similarity count tuples. In the plots below, the x axis is the number of guesses worse than the actual guess, the y axis is the number of guesses with equal similarity, and the z axis is the number of guesses that are better. Color indicates the class. I have no idea what is causing the clustering. Maybe it's a software problem, but I doubt it. I would have expected a more continuous distribution than in the 12-bit case above:



The two views below demonstrate that all points lie in a plane oblique to all three axes:



Results like those shown above are obtained, in character, from any 1000-element set of random 100-bit signals with 50% ON. Below are 3 examples of the similarity of all strings(j) to all strings(i). The plots have maxima along the diagonal, where string(j) is matched with itself, but appear random elsewhere. The plot at left is unsorted, the plot at center is sorted by the numerical value of each string, and the plot at right has 2000x2000 elements rather than 1000x1000:



Below are 3d tuples for 1000 signals (left and center) and 2000 signals (right). The effect of increasing the number of signals is to make the clusters tighter:



If a subset of all 100-bit signals, with any percentage of bits ON, is self-compared, plots like the following are obtained. Two different color maps are shown (blue to black to red, and red to black to blue to yellow):



If these plots are sorted according to numerical value, they also show structure similar to the 12-bit case. Again, the black areas show the locations of pairs of orthogonal signals:



Below are 3d tuples for several 1000-signal subsets of all 100-bit signals. The y axis has been magnified by 10x. The color is the number of ON bits of string(j) (0 to 100, from blue to red):



Another example

This example uses a collection of signals that are 32 bits long, and which all have exactly 16 bits set to 1. The experiment is the same as before, except that now, when a trial is cued (again, about 50% of the time), the similarity score between the remote string and the current guess is shown in addition to the remote string itself. The operator can then choose to reload (or not) another random guess, which might have a greater similarity. By doing this, the scores for the cued classes (with and without hardware) can be deliberately, if slightly, inflated. Here is an example of a cued trial:

By reloading the guess one or more times, a better similarity score can usually be obtained:

Here is an uncued trial:

Here are some results. The stats at the bottom show 6 classes (the 4 previous ones plus 2 new ones), of which 4 appear in this experiment. The entries from left to right are the number of trials, then the maximum, minimum, mean, and standard deviation of the raw scores (at left) and of the scores normalized by the number of ON bits (at right). Two of the classes show significantly higher mean similarity scores, and smaller standard deviations:

************ output/051420/psyche_2020-0514-143132.data.txt
  0:   2    8.000    0.500
  1:   5    8.000    0.500
  2:   2    8.000    0.500
  3:   2   10.000    0.625
  4:   3    6.000    0.375
  5:   0    6.000    0.375
  6:   0    6.000    0.375
  7:   5   10.000    0.625
  8:   0   10.000    0.625
  9:   3    6.000    0.375
 10:   5    8.000    0.500
...etc...
 91:   0    6.000    0.375
 92:   5    8.000    0.500
 93:   0    8.000    0.500
 94:   0    4.000    0.250
 95:   0    6.000    0.375
 96:   2    8.000    0.500
 97:   3    4.000    0.250
 98:   2    8.000    0.500
 99:   0    6.000    0.375
--------------------------------------------  -----------------------------------
  0:  31   10.000    4.000    5.935    1.315     0.625    0.250    0.371    0.082
  1:   0    0.000    0.000    0.000    0.000     0.000    0.000    0.000    0.000
  2:  22   10.000    6.000    7.909    0.971     0.625    0.375    0.494    0.061
  3:  26    8.000    4.000    5.923    1.324     0.500    0.250    0.370    0.083
  4:   0    0.000    0.000    0.000    0.000     0.000    0.000    0.000    0.000
  5:  21   10.000    8.000    8.381    0.805     0.625    0.500    0.524    0.050
--------------------------------------------  -----------------------------------
  0: 245   10.000    4.000    5.780    1.281     0.625    0.250    0.361    0.080
  1:   0    0.000    0.000    0.000    0.000     0.000    0.000    0.000    0.000
  2: 260   10.000    4.000    8.108    0.717     0.625    0.250    0.507    0.045
  3: 249   12.000    2.000    6.016    1.586     0.750    0.125    0.376    0.099
  4:   0    0.000    0.000    0.000    0.000     0.000    0.000    0.000    0.000
  5: 246   10.000    8.000    8.187    0.583     0.625    0.500    0.512    0.036

Here is a plot of all 1000 trials. The cued entries are blue and green:



Below are plots of the similarities of all remote strings (down the y axis) and all guesses (across the x axis) in colors from blue to red. The left plot is unsorted (strings are in the order they occur in the experiment), the center plot is sorted by string numerical value, and the right plot is sorted by trial class (maintaining the sequential order within each class):



There are now visible spatial correlations along some of the rows and columns (lighter or darker narrow horizontal and vertical regions). This means that some of the remote strings are especially easy/hard to guess (horizontal regions), and that some of the guesses are especially good/bad for all of the signals (vertical regions).

However, I'm currently interested in serial correlation between trials, or "runs of luck", which should show up along the diagonals of the plots. To detect this, each 2d array was first made binary according to whether each element(j, i) was equal to or greater than a threshold value (in this case 8.0), and then convolved with a 5x5 filter having the value 1.0 along its diagonal and -0.25 everywhere else. The filter values sum to 0.0, and the filter output is positive for array elements which are part of a run of higher-than-normal scores along the diagonal. The last plot (sorted by class) clearly shows the correlation which was artificially added to the second and fourth classes:
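
A sketch of this threshold-and-filter step (the array size N and the function name are mine):

#define N 1000   /* hypothetical: one score per {row, column} pair */

/* Threshold the score array, then convolve with a 5x5 kernel having
   1.0 on its diagonal and -0.25 elsewhere (the kernel sums to zero).
   A 2-element border of out is left unwritten. */
void diag_filter(const double in[N][N], double out[N][N], double thresh)
{
    static double b[N][N];
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            b[j][i] = (in[j][i] >= thresh) ? 1.0 : 0.0;

    for (int j = 2; j < N - 2; j++)
        for (int i = 2; i < N - 2; i++) {
            double s = 0.0;
            for (int dj = -2; dj <= 2; dj++)
                for (int di = -2; di <= 2; di++)
                    s += b[j + dj][i + di] * ((dj == di) ? 1.0 : -0.25);
            out[j][i] = s;
        }
}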



These results can be further enhanced by rescaling and then running the filter a second time. Note that there is always some nominal random background correlation, sorted or otherwise:



Here is a plot of the {worse, same, better} tuples for this experiment. It's pretty clear from the color distribution that two of the classes occur predominantly in the "better" clusters near the end of the x axis. So, there are at least 3 ways to evaluate this experiment: (1) the mean and standard deviation of each class, (2) features in the sorted 2d similarity plot, and (3) clustering in the 3d similarity plot:



There are 2^32 = 4294967296 different 32-bit numbers, and at least 2^32 / 32 = 134217728 different signals. By the extrapolation above, about 12.67% of these, or roughly 17 million, have 50% of their bits ON. While this number is not too big to enumerate, it is too big to allow calculating the similarity of every signal with every other signal (17,000,000 x 17,000,000 = 2.89x10^14 combinations). So, again, subsets of 1000 signals (with only 1 million combinations) will be used instead. Here is a self-similarity plot of one of these subsets:



Here are 3d similarity tuples for three of these subsets:



Here are sorted similarity plots for a 1000 element subset of all 32-bit signals (0-100% ON):



And here are 3d similarity tuples for several of these subsets:



Future work:

Next time: foo.

19 May 2020

©Sky Coyote 2020