PICK, a program for analyzing contra dances
-- Stan Swanson, January 2013; License: GPL


PICK helps in the process of picking dances with specific moves or with a desired set of moves. It transforms dances from a dance card format to a spreadsheet format, one line per dance. Information available in columns of the spreadsheet includes title, author, type (improper, proper, etc.), a summary, and statistics columns definable by the user. Possible statistics columns include lists of who swings, who allemandes, direction of circles or stars (R,L), a count of wave figures, type of heys, out of minor set moves, etc. The order of occurrence of a figure in a dance can be tabulated. An estimate of the piece count has been attempted. The dance cards are transformed into shortened symbols which are user definable.

Usage and Output
      (default input dance file: "dances.txt", output: "out.csv")
      (example input dance file: "select.dances", example output: "select.csv")

PICK is a command line program. If the default file names are used for dances, dictionary, and analysis, only the program name need be typed,

"> pick"       (">" is the system command prompt)

An alternative dance file can be specified as a single argument:

"> pick mydances.in",

or copied to the default file "dances.txt", or referenced by a link. Other options can be specified, "> pick [options]". These options include "-h" to get help, and will be discussed below.

Sample input files for dances.txt, dance-dict.txt, and dance-stat.txt are furnished with the distribution, and their formats are discussed in greater detail below. The most important output file is "out.csv", which is in the portable comma-separated values (CSV) format and can be read into most spreadsheets (e.g. Open Office, Excel). The optional information output file "info.out" contains five-line listings for each dance, along with two summaries and debug and diagnostic messages (helpful if the interpretation seems weird). "info.out" reproduces the counts of output symbols seen on the terminal screen as the program finishes. These counts give you an idea of the frequency of figures and qualifiers (who, direction, etc.) in a collection of dances.

We now discuss the presentation of the analysis as seen in the sample spreadsheet "select.csv", generated from "select.dances" using "select-dict.txt" and "select-stat.txt". We comment on the columns seen in "select.csv" under the column headers following "title" and "author".

"type" gives a single character code for the dance type (I P B ...)
and an indication of double or triple progression (2 3)

"summary" is the dance reduced to an abbreviated code (usually between 40 and 100 characters). Figures are separated by commas within a 16 beat section. Semicolons separate A1: from A2: and so on. The persons doing the figure are given first in caps (P = partner, N = neighbor, 1 = actives, W = ladies, etc.), then a lower case abbreviation for the figure (sw = swing, a = allemande, o = circle, * = star, etc.), followed by the direction (-L = left, -R = right, -x = across, etc.), and finally a digit giving the amount of turning in quarters ( 3 = 3/4, 6 = 6/4 (1 1/2), etc.). These abbreviations are given in the file dance-dict.txt and are user definable.


Click this link for summaries of "select.dances".

The seventeen columns between the explicitly blank columns headed by "_" and before the columns headed by "pieces" contain the position(s) in the dance of the figures indicated by the column headings ("sw a o oR ... out rare"). These columns were generated with the statistic "order". Note that sometimes a number is absent from the sequence in a given row. This may happen if a pseudo phrase does not contain a figure, or if the figure has not been specified in any of the "order" statements.

"sw" says where swings occur. For example. in "The Baby Rose", "' 1 4" tells us that there are two swings in the dance, one at the beginning.

"a" says whether and where allemandes occur.

"o" and "*" are for circles and stars.

"oR" is a more specific indication for "circle right". See " select-stat.txt " for how this was done.
... ... ...

"out" are moves out of the minor set, like "contra corners", shadow interactions, a chain or "R&L trhu" on a diagonal.

"rare" attempts to catalog infrequent moves like "contra corners", "petronella", "mad robin", or "orbit".

"pieces" Two columns, the first of which reproduces any explicit estimate of the piece count from the title line ("#..."). The second is pick's calculation according to the scheme discussed in the section "STATISTICS and ANALYSIS".


Click this link for positions of moves in "select.dances".

Following the "pieces" columns are other statistics on the dance figures, illustrating the statistics "qualify count first-char text short group".

"swing" tells which persons swing (P = partners, N = neighbors, S = shadow, etc.). Uses statistic "qualify".

"allem" tabulates who does the allemande (M = men, W = ladies, ...).

"circle" and "star" indicates the direction (L = left, R = right).

"ch_rl_pr" counts the number chains (ladies or otherwise), R&L's, and promenades. Uses statistic "count"

"lines" counts LLFB, or just LL.

"D/R" counts the traditional "down 4 in line", "return", etc. These can be hard to determine from dance card descriptions. More discussion of this problem later.

"dosido" gives dosido (d), gypsy (g), and seesaw (z). Uses statistic "first-char".

"hey" indicates the type of hey (h = full, 2 = half, g = gypsy, ... ).

"wave" counts the appearance of the string "wav" in the dance description.

"rare" indicates the appearance of specific figures (e.g. bfy = butterfly whirl, pet = petronella, cc = contra corners). Uses statistic "text".

"out** " attempts to flag out of minor set interactions by looking for shadow (S), diagonal (\), contra corners (c), grand (g) as in grand chain or grand right and left. We have found this sort of figure particularly troubling for beginners.

"hwro" uses the "group" statistic which reports whether specified columns are non-blank. (1 = hey, 2 = wave, 3 = D/R, 4 = out**, 0 = none of these, 12 = both hey and wave, etc.). Sorting on this column results in relatively large groupings of "similar" dances. Simple, more or less glossary, dances are indicated by a "0".

"*cD" is another group reporting stars (1 = *), (2 = chain, r&l, or promenade), and (3 = D/R [down and return] ).

"synopsis" is another single character representation of figures specified in the "short" statistic. Unspecified (presumably low frequency) figures appear as "X".

"title" repeats the title for reference with the statistics columns.


Click this link to see these statistics applied to "select.dances".

"AA BB AABB" [appears optionally with the command line flag "-e"] An attempt to give sorting keys for the unique figures in A1+A2, B1+B2, and the dance, coded as the first character of the figure name. Some ambiguity occurs. It was hoped that these sorts would give groups of dances with similar moves and structures, but the reality is that they are more diverse than that. Groupings are better found with the analysis key "group" to be discussed below.

"0" the last column is the original order, so that a sort on this column will restore the spreadsheet to its original dance card sequence.



INPUT FORMAT FOR DANCES
      (default file: dances.txt)
      (example file: "select.dances" )

The format of dances in the input file "dances.txt" is easily convertible from files specifying figures for A1:, A2:, B1:, and B2:. The program works fairly well with existing dance cards (tested on several collections of 130, 200, and 440 dances). In some cases the syntax can be sharpened to help with the analysis. Of particular difficulty are waves and balances, and the traditional "down 4 in line" with a variety of return possibilities. If the cards are in MS Word format, they can be exported in .txt format and massaged slightly with a text editor. A comparison of the summaries with the original input helps to proofread dance cards, as will an examination of the alphabetized file "occ.out" generated by the "-w" option.

Title, author, and dance type information is expected by the program to be on a single line starting with the character "=". Title and author are separated by a space delimited dash, " - ". Two words are taken from the author field, then the rest of the line is scanned for words like "proper, improper, mixer, circle, triplet, Sicilian, longways, x face x, square", and for double or triple progression. Capital letters are preserved in the title and author, but ignored for type and progression. Some abbreviations are recognized (imp prop prog). We have added an option for "#nn" on this line, where "nn" is the piece count, since estimation of "nn" is hard.
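The title line rules above can be sketched in a few lines of Python. This is only an illustration of the description, not the code in pick.c. The function name "parse_title_line" and the set "KNOWN_TYPES" are hypothetical, and the sketch does not try to reproduce the single character type codes or the "x face x" pattern.

```python
import re

# Hypothetical subset of the type words and abbreviations named in the text.
KNOWN_TYPES = {"proper", "improper", "imp", "prop", "mixer", "circle",
               "triplet", "sicilian", "longways", "square"}

def parse_title_line(line):
    """Split '=Title - Author Name rest...' into its parts (a sketch)."""
    if not line.startswith("="):
        return None                                  # not a title line
    title, _, remainder = line[1:].partition(" - ")  # space delimited dash
    words = remainder.split()
    author = " ".join(words[:2])                     # two words for the author
    rest = [w.lower() for w in words[2:]]            # case ignored for the rest
    pieces = None
    for w in rest:
        m = re.fullmatch(r"#(\d{1,2})[a-z]*", w)     # optional "#nn" piece count
        if m:
            pieces = int(m.group(1))
    types = [w for w in rest if w in KNOWN_TYPES]
    return {"title": title.strip(), "author": author,
            "types": types, "pieces": pieces}

print(parse_title_line("=Sample Dance - Anony Mous       improper"))
```

Whole words are compared here so that "improper" is not mistaken for "proper"; how PICK itself resolves that overlap is not stated in this document.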

Other lines in the file describe either dance figures or are ignored as comments. Comment lines start with the character "%" in the first column. Lines between the title lines and A1: and after the B2: sequence are also treated as comments, even without the "%".

The sequence of lines containing A1:, A2:, B1:, and B2: is taken as dance information, until broken by a blank line or a comment line following B2:. The section markers A1:, A2:, B1:, and B2: must have three characters, the first being "A" or "B" (in either case), the second being a digit ("1" or "2"), and the last being a punctuation character.
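The three-character rule for section markers (a letter A or B in either case, a digit 1 or 2, then punctuation) maps naturally onto a regular expression. The following Python sketch is an assumption about how such a check could look; "split_marker" is a hypothetical name and the real program may accept or reject slightly different input.

```python
import re

# Letter A/B in either case, digit 1 or 2, then one punctuation character.
MARKER = re.compile(r"^\s*([AaBb])([12])([^\w\s])")

def split_marker(line):
    """Return (section, figures) for a section line, or None otherwise."""
    m = MARKER.match(line)
    if m is None:
        return None
    section = m.group(1).upper() + m.group(2)
    return section, line[m.end():].strip()

print(split_marker("A1:   circle left 1x; ladies chain (to partner)"))
```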

Within the dance description, instructions or hints enclosed within parentheses "(...)" are ignored. The description is broken into pseudo phrases delineated by commas, semicolons, or ends of lines. Words are separated by white space or by one of the separators " , ; end-of-line ( ) ". Other punctuation is left as written, with the possibility of being incorporated into strings used in the analysis (e.g. "3/4" "R&L" ). For example,

=Sample Dance - Anony Mous       improper
A1:   circle left 1x; ladies chain (to partner)
A2:   hey (ladies start, R shoulder)
B1:   partner balance and swing
B2:   circle left 3/4
    balance the ring, pass thru (to next neighbor)

is summarized as

o-L4, W ch; hey; P bal_sw; o-L3, ring_b, pass

Note that the amount of a circle, star, or allemande is given by the number of fourths: "4" means 4/4 or once around, "3" means 3/4.
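As an illustration of how a summary entry can be read back, here is a small Python sketch that decodes one entry into who, figure, direction, and amount of turn. The pattern is an assumption based on the sample summary above (capitals or digits for "who", a space, a lower case figure code, an optional "-direction", an optional digit of quarters). A customized dictionary may produce symbols that this pattern does not cover, and "decode_entry" is a hypothetical name.

```python
import re

ENTRY = re.compile(
    r"^(?:(?P<who>[A-Z0-9]+) )?"        # optional persons, e.g. P, N, W, 1
    r"(?P<figure>[a-z*_&]+)"            # figure code, e.g. sw, o, *, r&l
    r"(?:-(?P<direction>[A-Za-z]))?"    # optional direction, e.g. -L, -R, -x
    r"(?P<quarters>\d)?$"               # optional amount in quarters
)

def decode_entry(entry):
    """Decode one summary entry such as 'o-L3' (a sketch)."""
    m = ENTRY.match(entry.strip())
    if m is None:
        return None
    quarters = m.group("quarters")
    return {
        "who": m.group("who"),
        "figure": m.group("figure"),
        "direction": m.group("direction"),
        "turns": int(quarters) / 4 if quarters else None,  # 3 -> 0.75 turn
    }

print(decode_entry("o-L3"))
print(decode_entry("P bal_sw"))
```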

THE DANCE DICTIONARY
      (default file: dance-dict.txt)
      (example file: "select-dict.txt")

The words or word fragments which are extracted from the input text are specified in the file "dance-dict.txt". A sample dictionary is furnished, but it can be customized by the user. To get a count of the unique "words" in a dance file, the program has the option "-w" which counts separate words and fragments, putting the information in "words.out" and "occ.out". Separating characters are space, tab, end-of-line, comma, semicolon, and parentheses. Punctuation other than separating characters and the comment flag "%" is left as written and can be part of a word (or string). For example: colon (e.g. A1:), slash (e.g. 3/4), ampersand (e.g. R&L).

The general structure of a dictionary line is

outsymbol # category = insymbol[s]

e.g.

P # who = partner ptr p % multiple inputs all go to "P"

sw # action = swing sw

<thr # part = thr % "thr" matches through, thru (and three !!)

Comments follow the symbol "%" either at the start of a line or later. Input symbols are in lower case, since the dance descriptions are converted to lower case. Some output symbols have been capitalized to highlight them, notably P, N, ... (who) and R, L, ... (direction). Choice of the output symbols (and input symbols) is left up to the writer of the dictionary.
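The dictionary line structure is simple enough to read with a few string operations. The Python sketch below is one possible reading of the format described above and is not taken from pick.c. The name "parse_dict_line" is hypothetical. It relies on "#" and "=" being flanked by white space.

```python
def parse_dict_line(line):
    """Parse 'outsymbol # category = insymbol[s]' (a sketch)."""
    body = line.split("%", 1)[0]          # drop a comment, wherever it starts
    tokens = body.split()
    if len(tokens) < 4 or tokens[1] != "#" or tokens[3] != "=":
        return None                       # blank, comment, or malformed line
    return {"out": tokens[0], "category": tokens[2], "inputs": tokens[4:]}

print(parse_dict_line('P # who = partner ptr p % multiple inputs all go to "P"'))
print(parse_dict_line("<thr # part = thr % matches through, thru"))
```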

Categories currently defined are "who", "direction", "how-much", "action", "part", "combo", and "ignore".

"who" specifies the persons involved,

"direction" includes right, left, up, down, across, along, forward, back, diagonal.

"how-much" gives the amount of turn for allemandes, circles, stars (in 1/4 increments).

"action" is the designator for a major dance figure (e.g. swing, circle, wave) and

"part" modifies an action, or may be one of several words used together to specify an action. Usually parts will be combined with other symbols with "combo", and replaced with a single symbol designating an action.



When several words are needed to specify a figure, these are given in a "combo" line. All the symbols to the right of "=" must be present for substitution of the outsymbol to the left of "#". The symbols on the right must be previously defined output names of various input symbols. For example

r&l # combo = R L <thr % reduces "right and left thru" to "r&l"
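A "combo" substitution can be pictured as follows. This Python sketch replaces the required symbols with the combined symbol only when all of them are present in a pseudo phrase. Placing the new symbol where the first required symbol stood is an assumption of the sketch, as is the name "apply_combo"; the program may order its output differently.

```python
def apply_combo(symbols, out_symbol, required):
    """Replace all 'required' symbols by 'out_symbol' if every one is present."""
    if not all(r in symbols for r in required):
        return list(symbols)              # combo does not apply
    result = []
    placed = False
    for s in symbols:
        if s in required:
            if not placed:                # emit the combined symbol once
                result.append(out_symbol)
                placed = True
        else:
            result.append(s)
    return result

# "right and left thru" has already been translated to R, L, <thr
print(apply_combo(["R", "L", "<thr"], "r&l", ["R", "L", "<thr"]))
```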

The trick in setting up the dictionary is to find word fragments which will match all of the possible words and abbreviations for a figure or modifier on the dance card. For example, "thr" will match both "thru" and "through", but also the word "three", which rarely occurs in a dance description. "sw" will match "swing" and "sw", but also "swat" as in "swat the flea". "ll" will match "LL" (for "long lines") but also "all" and "hall".

To circumvent possible unwanted multiple matches, the longer fragments are tested first, and any single character fragments must match a single character in the dance description. Finally, there is the "ignore" directive which specifies particular strings to ignore. The words "start" or "starting" are sometimes used in a hint for who begins a hey. These words are matched by the fragment "star", which is used to search for stars. To fix this, either edit the dance, putting the hint inside parentheses, or use:

?st # ignore = start

The symbol "?st" will appear in a list at the end of the run with a count of the times "start" was ignored.
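The matching rules in this section (longer fragments first, single characters only as whole words, and "ignore" entries such as "start") can be sketched in Python. The table below is a hypothetical fragment of a dictionary and "match_word" is a hypothetical name. Treating a fragment as matching anywhere inside a word follows the "ll" matches "hall" remark above.

```python
# Hypothetical fragment table: input fragment -> output symbol.
TABLE = {"star": "*", "start": "?st", "sw": "sw", "thr": "<thr", "p": "P"}

def match_word(word, table=TABLE):
    """Return the output symbol for a lower-case word, or None (a sketch)."""
    for fragment in sorted(table, key=len, reverse=True):   # longest first
        if len(fragment) == 1:
            if word == fragment:          # single characters must stand alone
                return table[fragment]
        elif fragment in word:            # longer fragments match inside words
            return table[fragment]
    return None

print(match_word("starting"))   # caught by the longer "start" before "star"
print(match_word("stars"))
```

Because "start" is longer than "star", it is tested first, so the hint word is diverted to "?st" and does not count as a star.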

To help in the design or revision of the dictionary, the command line option "-w" is furnished to generate lists of strings (words and fragments) used in a collection of dances (i.e. a dance archive). One list, "words.out", is sorted by frequency of occurrence. The second list, "occ.out", is in alphabetical order.
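For readers who want to experiment outside the program, the two lists can be approximated with a short Python sketch. It uses the separating characters listed earlier in this section. Lower-casing the text and breaking ties alphabetically are assumptions of the sketch, and "word_lists" is a hypothetical name; the exact layout of "words.out" and "occ.out" is not reproduced here.

```python
import re
from collections import Counter

def word_lists(text):
    """Return (by_frequency, alphabetical) lists of (word, count) pairs."""
    # Separators: space, tab, end-of-line, comma, semicolon, parentheses.
    words = [w for w in re.split(r"[ \t\n,;()]+", text.lower()) if w]
    counts = Counter(words)
    by_frequency = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    alphabetical = sorted(counts.items())
    return by_frequency, alphabetical

sample = "A1: circle left 3/4, swing (partner)\nA2: circle right; swing"
freq, alpha = word_lists(sample)
print(freq[:2])
print(alpha[:2])
```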


STATISTICS and ANALYSIS
      (default statistics file: dance-stat.txt)
      (example file: "select-stat.txt", with output "select.csv")

The analysis can be customized by the user in the file dance-stat.txt. Options include counting the number of times a move (or symbol) occurs, listing either the first character or full output text of special moves, tabulating which of a specified set of symbols "qualifies" an action, noting which groups of moves occur, and in what sequence the moves occur.

General form:

col_header # statistic = output_symbol[s]

As in the dictionary lines, the "#" and "=" must be flanked by white space (space or tab), and symbols to the right of "=" must be separated by spaces.

"statistic" can be "count", "first-char", "text", "short", "qualify", "group", "order", or "pieces". All of the symbols to the right of "=" must have been previously defined in the dictionary file.


Examples from file "select-stat.txt" which give some of the results discussed above in the section on "Usage and Output"

wave # count = wv

Counts the number of occurrences of the symbol "wv" in the dance. The symbol "wv" has previously been translated from the fragment "wav" as specified in the dictionary file:

wv # action = wav

Tip: you can introduce a blank column with the line " _ # count =".

swing # qualify = sw > P N 1 2 M W

Individual pseudo phrases (delimited by separators) are checked for the action "sw" and the occurrence of one of the person designators "P" ... "W". A list of the first characters of any of the symbols to the right of ">" appearing in the dance is put into the spreadsheet cell for that dance.
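The "qualify" statistic can be pictured with the following Python sketch. Each pseudo phrase is represented as a list of output symbols. Listing each qualifier once, in the order it is first seen, is an assumption of the sketch, and "qualify" as a function name is hypothetical.

```python
def qualify(phrases, action, qualifiers):
    """Collect first characters of qualifiers that accompany 'action'."""
    found = []
    for symbols in phrases:               # one list of symbols per pseudo phrase
        if action not in symbols:
            continue
        for q in qualifiers:
            if q in symbols and q[0] not in found:
                found.append(q[0])
    return "".join(found)

dance = [["N", "bal", "sw"], ["o", "L", "3"], ["P", "sw"]]
print(qualify(dance, "sw", ["P", "N", "1", "2", "M", "W"]))
```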

out** # first-char = S \ cc gr-ch gr-rl gr-RL

We take the occurrence of "shadow", "diagonal", "contra corners" or a "grand chain" or "grand right and left" as indicating an out of the minor set interaction (there may be other markers). We find these figures hard for beginners.

hwro # group = 31 32 33 34

Reports whether the columns designated on the right are non-blank. Column 31 is non-blank if there is a hey, column 32 is for waves, column 33 is for "rare" movements, and column 34 is for the out-of-minor-set figures.
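A sketch of the "group" idea in Python follows. It reports the positions (1, 2, 3, ...) of the listed columns that are non-blank, or "0" when all are blank, as in the description of the "hwro" column. The row is modeled as a mapping from column number to cell text, which is an assumption of the sketch, as is the function name "group".

```python
def group(row, columns):
    """Return e.g. '12' if the 1st and 2nd listed columns are non-blank."""
    digits = "".join(
        str(position)
        for position, column in enumerate(columns, start=1)
        if row.get(column, "").strip()    # blank or missing cells are skipped
    )
    return digits or "0"

row = {31: "h", 32: "2", 33: "", 34: ""}
print(group(row, [31, 32, 33, 34]))
```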

A new statistic "order" has been added, which records the position in the dance where a figure occurs. The syntax is similar to that for "qualify":

swing # order = sw > sw % records when any swing occurs

o* # order = o * > o * % occurrences of circles and stars

oR # order = o > R % only circle rights

The first part of "select-stat.txt" gives the position or sequence of various moves in the dances, using the "order" statistic. The "position" in the dance is only approximate, since the descriptions do not adhere to a rigid format from which 8 beat phrases can be deduced. The second part illustrates various counts and other statistics.

Strain boundaries (16 beats) are defined by the sections A1:, etc. But the number of pseudo moves or figures within a strain is not restricted (e.g. wave balances and allemandes, hey hints, down and return (with turns)). We have taken the divisions indicated by commas, semicolons, and line ends to define pseudo phrases. We count these from the beginning of the dance. The first move will always be "1", and others will be in sequence. Occasionally a pseudo phrase will not contain an "action" and will not appear in the sequence. Some moves like "balance and swing" will be counted as one move, even though they take 16 beats.
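The counting scheme can be sketched in Python as follows. Pseudo phrases are numbered from the start of the dance, and a number is recorded when a phrase contains one of the symbols left of ">" together with one of the symbols right of ">". That reading of the "order" syntax is inferred from the three examples above and is an assumption, as is the function name "order".

```python
def order(phrases, actions, qualifiers):
    """Return the positions of pseudo phrases matching an 'order' line."""
    positions = []
    for number, symbols in enumerate(phrases, start=1):   # count every phrase
        has_action = any(a in symbols for a in actions)
        has_qualifier = any(q in symbols for q in qualifiers)
        if has_action and has_qualifier:
            positions.append(str(number))
    return " ".join(positions)

dance = [["N", "bal", "sw"], ["o", "L", "3"], [], ["P", "sw"], ["o", "R", "4"]]
print(order(dance, ["sw"], ["sw"]))        # swing # order = sw > sw
print(order(dance, ["o"], ["R"]))          # oR # order = o > R
```

The empty third phrase still consumes a number, which is why a position can be missing from the sequence in a row of the spreadsheet.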

Another requested statistic is piece count, similar to that given in Zesty Contras by Larry Jennings. He states "piece count is subjective and dependent on the locale". Moreover, we have the problems discussed above with the "order" statistic, and problems of combining identical or similar moves (such as a Ladies chain; Ladies chain back). A preliminary implementation is given with the statistic "pieces".

pc # pieces = ret % plus other flags

This produces two columns. The first contains any title line strings in the format "#nnxxx" as in Zesty Contras. The "#" is mandatory, "nn" is a one or two digit number, and "xxx" are any additional letter codes, to a limit of ten characters. The second column contains our naive estimate of the piece count of the dance. Any strains with a single figure (e.g. balance and swing, hey) have one piece. Any strains with only one type of figure (e.g. circle L, circle R) are also one additional piece. Finally, any strain containing any of the flags to the right of "=" counts as one piece. In this example, "ret" signifies "return" so that "down 4-in-line, turn alone; return, bend the line" counts as one piece, even though it nominally has four.

RUNNING and COMPILING the PROGRAM

In addition to "> pick" with all default file names and "> pick dances", other options can be specified. All output goes to the default files: out.csv, info.out, words.out, and occ.out. These default output file names cannot be changed from the command line. The resulting files must be renamed to preserve them. I still STRONGLY recommend making backup copies of your dance archive. Command line:

"> pick [options]".

These options include

"-h" to get help and exit.

"-s your_statistics_file" reads an alternative statistics file.

"-d your_dictionary_file" reads an alternative dictionary.

"-i" prints additional parsing and debug information to "info.out".

"-w" tabulates word/fragments, and writes sorted output to default files "words.out" and "occ.out". Useful for proof reading the dance file and for modifying your dictionary.

"-e" enables experimental features. At this writing, this is the "AA BB AABB" single character figure lists as discussed above.

"-b [nn]" prints detailed parsing information for the first "nn" lines of input. Output to "info.out".


The command line "> pick [options] dances" also works and should pose no danger to the file "dances", since one cannot redefine output files.

Alternatively, one can copy or rename input files to the default names,
or use soft links, "> ln -s mydict.txt dance-dict.txt", in Linux or Mac OS.

[What is the equivalent in Windows???]

--------------------

The distribution can be downloaded as a single ZIP file or as individual files. It is more thoroughly described in the README file. The distribution includes executables for Linux (Fedora), Mac OS 10.6 (Snow Leopard), and Windows (compiled with gcc in MinGW, which run on 2000, XP, Vista Home, and Windows 7 Starter), along with the compilable code in C.

Compilation is straightforward: "> gcc -g -o pick pick.c".

There are a number of spreadsheet tricks involving sorting of the entire sheet (or parts thereof), cutting, and pasting, which I am still learning. Suggestions are solicited, along with comments on the program and on the dictionary and statistics files.