Indexing with CL
Inspired by Gleicon's post that uses Ferret (Ruby and C) to index the Linux kernel, I decided to do the same in CL.
This code uses Montezuma, a port of Ferret to Common Lisp (it is 100% CL):
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :montezuma)
  (require :cl-fad))          ; walk-directory comes from cl-fad

(defpackage :montezuma-test
  (:use :cl)
  (:export #:add-dir-to-index #:search-index))

(in-package :montezuma-test)

;; maybe this isn't a fast way to read a file
(defun slurp-file (filename)
  (with-open-file (stream filename :direction :input)
    (let* ((seq (make-string (file-length stream)))
           ;; FILE-LENGTH counts bytes; with a multi-byte encoding fewer
           ;; characters may be read, so keep only what READ-SEQUENCE filled
           (end (read-sequence seq stream)))
      (subseq seq 0 end))))

(defparameter *index*
  (make-instance 'montezuma:index :path "/tmp/montezuma-test"))

(defun add-dir-to-index (dir-name)
  (cl-fad:walk-directory
   dir-name
   #'(lambda (file)
       ;; skip files that cannot be read or indexed
       (ignore-errors
        (montezuma:add-document-to-index
         *index*
         `(("file" . ,(princ-to-string file))
           ("content" . ,(slurp-file file))))))))

(defun search-index (keyword)
  (montezuma:search-each
   *index*
   (concatenate 'string "content:" keyword)
   #'(lambda (doc score)
       (format t "~a score: ~a~n"
               (montezuma:document-value (montezuma:get-document *index* doc) "file")
               score))))
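add-dir-to-index hands every file under the directory to Montezuma, so the test below first prunes the tree with find. The following is a minimal sketch of an alternative that leaves the tree intact. It defines a hypothetical predicate, c-source-p, which is not part of the original code and accepts only .c and .h files. The sketch uses only standard CL and assumes the file extension is what PATHNAME-TYPE returns.

```lisp
;; Hypothetical helper (not in the original code): true when FILE
;; has a .c or .h extension. PATHNAME-TYPE returns the extension as
;; a string, or NIL when the file has none (e.g. "Makefile").
(defun c-source-p (file)
  (and (member (pathname-type file) '("c" "h") :test #'equal)
       t))
```

This assumes the installed cl-fad's walk-directory accepts a :test keyword. If it does, the predicate could be passed as :test #'c-source-p in the call inside add-dir-to-index.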
A small test (only with the .c and .h files):
$ wget http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.1.tar.bz2
$ tar jxf linux-2.6.23.1.tar.bz2
$ find linux-2.6.23.1 -type f -not \( -name "*.c" -o -name "*.h" \) -exec rm {} \;
$ find linux-2.6.23.1 -type f | wc -l
18438
$ du -hs linux-2.6.23.1
257M linux-2.6.23.1
$ sbcl --noinform --no-linedit
* (load (compile-file "montezuma-test.lisp"))
....
T
* (time (montezuma-test:add-dir-to-index "/home/lucindo/linux-2.6.23.1"))
Heap exhausted during garbage collection: 264 bytes available, 520 requested.
Gen StaPg UbSta LaSta LUbSt Boxed Unboxed LB LUB !move Alloc Waste Trig WP GCs Mem-age
0: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
1: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
2: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
3: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
4: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
5: 73866 73908 0 0 63308 1533 87 186 0 266268432 438512 232094792 0 3 0.8641
6: 0 0 0 0 5781 0 0 0 0 23678976 0 2000000 5628 0 0.0000
Total bytes allocated=536069368
fatal error encountered in SBCL pid 6239(tid 3085203120):
Heap exhausted, game over.
LDB monitor
ldb> quit
The machine has 512M of RAM. SBCL couldn't handle it.
Source: montezuma-test.lisp