Markov, Search, Mo’ Testing

May 11, 2007

I finally got reconnected with a random blogger I blogged about several years ago. His last entry was made in 2004, and I thought he had decided to walk away and quit. Not that I know this person really well, but he seemed happy living a retired life outside the country, adjusted to the economic and social conditions he wrote about at the time. So I wondered if he was still out there blogging, interpreting the things around him according to his own philosophical views. I tried Google without luck; searching on his nickname didn’t return anything except his old blogsite. I tried running his text through the Markov Chain algorithm to see what words stood out from the pack, then used those to query the net. Nothing came back. So I finally gave up the search and decided to just send him an email. It was a shot in the dark. “Maybe this guy is already dead?” I asked myself. I got a reply from him saying he has had a new blogsite up and running since 2004. So I scratched my head again, sat down, and started figuring out what went wrong in the attempt to use Markov Chain along with other algorithms to track him down. Now that I’ve reconnected, it’ll be nice to read his opinions again. Maybe I’ll use his work to refine this simple application I’m trying to complete.

Midrange Setup Using PC Thin Clients

In the olden days, big iron was the only game in town and IBM supplied these monsters. Unix, on the other hand, came on midrange machines sporting a nice little OS. I remember it took me quite a while to figure out crontab, though. I’m sure the programmer who wrote that was stoned. A little man page was all there was to go on to carry out the task. Being macho with a man page was not what I would consider fun.

Looking back, it would probably be nice to re-create that environment I once worked in, a throwback to the days of the old AT&T SVR4 environment, by putting together a bunch of old PCs as thin clients connected to an Ubuntu server. I hear the LTSP project is into this kind of stuff, so I’ll head over there and look for more info. At the moment, the current setup is just desktops connected to a hub.

Desktop Webservice

It looks like the idea of implementing a webservice specifically for a desktop could become a hit. I really like the idea of connecting to a desktop as if it were just another website with a webservice. It achieves a new level of simplification: factoring out the differences and keeping only the common parts really makes sense. Another interesting piece was putting Avahi to work alongside the webservice. The two working together could mean more toys to play with.

GEGL

Nice, I just got my first dive into GEGL on the Edgy distro.

NLP Markov: Adding Genitive, Dative And Ablative Cases

May 8, 2007

The genitive, dative and ablative are grammatical cases that mark how a noun relates to other nouns and verbs. Constructs in these cases make up the bulk of the written and coded literature published on the internet. Buried in these paragraphs are grammatical cases supporting an idea, an intangible concept, an emotional artifact.

Classifying these embedded grammatical constructs requires a good number of tools still waiting to be developed. There are probably tools unknown to me at the time of this writing, and they will turn up sometime later.

One approach is the book catalog method, which hasn’t been designed and implemented yet. This approach involves deconstructing a document into several basic forms, for example outline and timeline formats, then placing them in a catalog. Each node in the timeline is a taggable item to which elements can attach.

The Markov Chain algorithm has proven to be very useful in this regard, as I was able to set up a probability function for a given lexeme. Granularity is currently at the word level, and it would be interesting to see whether the algorithm performs well at the phrase level, where these three grammatical cases become significant.
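To make the word-level probability function a bit more concrete, here is a minimal sketch in Python. It is not my actual application; the function names (`build_transitions`, `next_word_probability`) and the sample sentence are made up for illustration, and it assumes plain whitespace splitting is good enough to get at the lexemes.

```python
from collections import Counter, defaultdict


def build_transitions(text):
    """Count word-to-next-word transitions in `text` (word-level granularity)."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def next_word_probability(counts, word, candidate):
    """Estimate P(candidate | word) from the raw transition counts."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    if total == 0:
        return 0.0  # the word was never seen with a follower
    return followers[candidate] / total


# Hypothetical sample text, chosen only to show the mechanics.
sample = "the book of the teacher was given to the student by the teacher"
transitions = build_transitions(sample)
print(next_word_probability(transitions, "the", "teacher"))
```

In the sample, “the” is followed by “book”, “teacher”, “student” and “teacher” again, so the estimate for “teacher” after “the” is two out of four. The counts are raw frequencies with no smoothing, which is about as simple as a first-order chain gets.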

One possible example of a Markov Chain implementation is the Google Toolbar. Not too long ago I noticed a new behavior was added to the search dropdown box. I type in a word and a list of related words appears below. These related words, I would imagine, are generated by a Markov Chain. What I found interesting about the list was the X number of words per row. I can’t tell what that implies, but it must mean something big and grand.

There is still a disconnect between the output of a Markov algorithm and “phrase-ology”, in the sense that a Markov algorithm depends on another variable which is not identified at the moment.

Solving for the unknown would mean extending the parameters to N to cover a subset of the entire phrase. Tagging may become optional in this case, though.
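One way to read “extending the parameters to N” is to let the chain’s state be the last N words instead of a single word. The sketch below does just that, under that assumption; the names (`build_ngram_transitions`, `most_likely_next`) and the sample text are hypothetical and only there to show the idea.

```python
from collections import Counter, defaultdict


def build_ngram_transitions(text, n):
    """Map each run of `n` consecutive words to counts of the word that follows."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for i in range(len(words) - n):
        state = tuple(words[i:i + n])
        counts[state][words[i + n]] += 1
    return counts


def most_likely_next(counts, phrase):
    """Return the most frequent continuation of `phrase`, or None if unseen."""
    state = tuple(phrase.lower().split())
    followers = counts.get(state)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical sample text with a repeated genitive-style phrase ("of the ...").
sample = "the book of the teacher was given to the student of the teacher"
model = build_ngram_transitions(sample, n=2)
print(most_likely_next(model, "of the"))
```

With N set to 2, the state “of the” covers a small slice of the phrase, and the chain can tell what tends to follow it without any tagging. A larger N captures more of the phrase but needs a lot more text before the counts mean anything.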

This is definitely an exciting experiment worth looking into. The question is whether to simply apply the Markov algorithm as a one-brush-fits-all solution, or to treat it as a tool in a toolbox, combining it with other methods to achieve a finer, richer solution.

———

  1. Advanced Natural Language Processing