Michael Weber: Random Bits and Pieces

...Please don't assume Lisp is only useful for Animation and Graphics, AI, Bioinformatics, B2B and E-Commerce, Data Mining, EDA/Semiconductor applications, Expert Systems, Finance, Intelligent Agents, Knowledge Management, Mechanical CAD, Modeling and Simulation, Natural Language, Optimization, Research, Risk Analysis, Scheduling, Telecom, and Web Authoring just because these are the only things they happened to list.

Kent M. Pitman

A #lisp conversation sparked the cunning idea to write a diff-like algorithm for s-expressions.

It was mainly an exercise to see how hard it would be, since Xophe asserted it might not be trivial.

Example session:

DIFF-SEXP> (diff-sexp  
	  	'(DEFUN F (X) (+ (* X 2) 1)) 
	  	'(DEFUN F (X) (- (* X 2) 3 1)))
((DEFUN F (X) (|#-new| + |#+new| - (* X 2) |#+new| 3 1)))

The algorithm is based on Levenshtein-like edit distance calculations. It tries to minimize the number of atoms in the result tree (also counting the edit conditionals |#-new|, |#+new|). This can be observed in the example below: instead of blindly adding more edit conditionals, the algorithm backs up and replaces the whole subexpression.

DIFF-SEXP> (diff-sexp  
	  	'(DEFUN F (X) (+ (* X 2) 4 1)) 
	  	'(DEFUN F (X) (- (* X 2) 5 3 1)))
((DEFUN F (X) (|#-new| + |#+new| - (* X 2) |#-new| 4 |#+new| 5 |#+new| 3 1)))

DIFF-SEXP> (diff-sexp  
	  	'(DEFUN F (X) (+ (* X 2) 4 4 1)) 
	  	'(DEFUN F (X) (- (* X 2) 5 5 3 1)))
((DEFUN F (X) |#-new| (+ (* X 2) 4 4 1) |#+new| (- (* X 2) 5 5 3 1)))

The library is called diff-sexp (caveat emptor: should be regarded as prototype). Currently it handles updates, deletions and insertions, but does not take advantage of moved subtrees. However, for production use, e.g. inside Climacs, it needs to handle more details.

Update: Thanks to Lemonodor for the URL to the original discussion.