One Terabyte Load-And-Query Perf Test

May 7, 2007

Let me start with how much space one terabyte actually is. According to this webpage, one terabyte is about 1024 gigabytes; so that would be roughly four 256GB disks if you buy them from BestBuy, or two of the 512GB disks currently selling at Fry’s. For a home enthusiast/hobbyist, two 512GB disks would be the right choice for a homebrew PC box. Stick them in the two bays, plug in the cables, and let Ubuntu take care of the rest. That’s pretty much it, got it squared away.
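The arithmetic above is easy to check with a quick sketch (capacities in binary gigabytes; the helper name is mine, just for illustration):

```python
# Back-of-the-envelope check: how many disks add up to one terabyte?
TB_IN_GB = 1024  # one terabyte ~ 1024 GB


def disks_needed(disk_size_gb, target_gb=TB_IN_GB):
    """Number of whole disks needed to reach the target capacity."""
    return -(-target_gb // disk_size_gb)  # ceiling division


print(disks_needed(256))  # -> 4 (four 256GB disks)
print(disks_needed(512))  # -> 2 (two 512GB disks)
```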

512GB Disk

My next point is about data sourcing. The web is basically a network of data: a vast collection of whatever-you-want-to-call-it, just sitting there. Imagine the wealth of information you can pull from that vast sea, ocean, or even celestial data space. It’s mind-blowing just thinking about that volume of data. So yes, the data is up there waiting to be mined. That is the source: a pure, unadulterated, wide-open wealth of knowledge right at your doorstep.

Here’s my third point, which ties the first paragraph to the second: a data processing system on your home machine. All I have is one terabyte of empty space, waiting to be populated with data collected from the Great Web. I’m guessing one terabyte is enough to perform a simple experiment and generate a very interesting report, which may or may not have any value to anyone except me. The resulting output definitely has huge potential, because I believe in this truism: “the perfect data is the one you have never seen yet.” Casting a big, wide net over the web and hauling the catch into a one-terabyte space for processing will surely capture that hidden gem. The most important part of the process is this thing called synthesis, which refines the haul into a cleaner version.
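The crawl–store–synthesize loop described above can be sketched roughly like this. Everything here is illustrative only: `fetch` is a stub standing in for a real crawler, and all names are mine, not an actual toolchain.

```python
# Rough shape of the experiment: cast a net, haul data into local
# storage, then synthesize a refined version.


def fetch(url):
    """Stub crawler: a real version would pull the page over HTTP."""
    return f"<html>contents of {url}</html>"


def synthesize(store):
    """Refine the raw pages into something cleaner -- here, just sizes."""
    return {url: len(page) for url, page in store.items()}


store = {}  # stand-in for the one-terabyte disk
for url in ["http://example.org/a", "http://example.org/b"]:
    store[url] = fetch(url)

report = synthesize(store)
print(report)
```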

This is my closing for this entry. Some of the tools are already in place; I got them working yesterday, enough to proceed and carry on to the next level of testing. Although, I may have to cough up some dough for the 1024GB of disk, as it is not cheap; a 500GB disk is still pretty expensive compared to a 160GB one. It is definitely quite an investment for the small experiment I’d like to perform. Drive and Redland will be the ones doing the heavy lifting.
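Redland is an RDF triple store, so the load-and-query test boils down to bulk-loading statements and then querying them. As a stand-in for the real thing, here is a minimal sketch over a SQLite-backed triple table; the schema and sample data are invented for illustration and are not Redland’s actual storage layout.

```python
import sqlite3

# Minimal load-and-query sketch: a plain triple table standing in for a
# real RDF store like Redland.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")

# Load phase: bulk-insert collected statements.
triples = [
    ("page1", "links-to", "page2"),
    ("page1", "has-size", "2048"),
    ("page2", "links-to", "page1"),
]
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)

# Query phase: which pages does page1 point at?
rows = conn.execute(
    "SELECT object FROM triples WHERE subject = ? AND predicate = ?",
    ("page1", "links-to"),
).fetchall()
print([r[0] for r in rows])  # -> ['page2']
```

At one-terabyte scale, the interesting numbers would be the load throughput and the query latency of this same shape of workload.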


X11 Sim

May 2, 2007

Several weeks ago, an interesting question popped into my head: is it possible to simulate the X protocol using a programming language other than C? I found that question interesting, and it got stuck in my head for several days. Recently, I had some time to look into it and wrote some sample code. I used the X11R6 documentation, which is quite good, for setting up the storage definitions.
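The X protocol is just bytes over a socket, so any language with byte packing can speak it. As an example, the connection setup request a client sends first can be built like this; the field layout below follows the X11 protocol encoding (little-endian, no authorization data):

```python
import struct


def build_setup_request():
    """X11 connection setup request, little-endian, no authorization.

    Layout (from the X protocol encoding):
      1 byte  byte-order ('l' = 0x6C for little-endian)
      1 byte  unused
      2 bytes protocol-major-version (11)
      2 bytes protocol-minor-version (0)
      2 bytes length of authorization-protocol-name (0 here)
      2 bytes length of authorization-protocol-data (0 here)
      2 bytes unused
    """
    return struct.pack("<BxHHHH2x", 0x6C, 11, 0, 0, 0)


req = build_setup_request()
print(len(req), req[:1])  # -> 12 b'l'
```

Writing this request to the X server’s socket (and parsing the reply) is the first step of any non-C client.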

Here is another interesting document covering X11R6 performance, titled X Window System Network Performance. I went over the doc and read the numbers; many of the items described make interesting points, for example, the number of requests made from inside Nautilus. These numbers could serve as a baseline to compare against numbers generated by the test code, which may shed some light on whether binary code is really faster than interpreted code. It should also point out where the delays are coming from.

Another item worth looking into is getting performance numbers for sync and async calls between client and server, to determine where the penalties are in terms of code and algorithm complexity.
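The sync-versus-async comparison can be prototyped without a real X server. Below is a toy harness over a local socket pair: the synchronous mode waits for each reply before sending the next request, while the pipelined mode fires all requests first and then drains the replies. The one-byte “protocol” and all names are invented for illustration.

```python
import socket
import threading
import time


def echo_server(sock, n):
    """Echo n one-byte requests back to the client."""
    for _ in range(n):
        sock.sendall(sock.recv(1))


def run(pipelined, n=100):
    """Send n requests; return (replies received, elapsed seconds)."""
    client, server = socket.socketpair()
    t = threading.Thread(target=echo_server, args=(server, n))
    t.start()
    start = time.perf_counter()
    got = 0
    if pipelined:
        for _ in range(n):
            client.sendall(b"x")      # fire all requests first...
        while got < n:
            got += len(client.recv(4096))  # ...then drain the replies
    else:
        for _ in range(n):
            client.sendall(b"x")      # one request...
            client.recv(1)            # ...one round-trip wait
            got += 1
    elapsed = time.perf_counter() - start
    t.join()
    client.close()
    server.close()
    return got, elapsed


sync_got, sync_t = run(pipelined=False)
pipe_got, pipe_t = run(pipelined=True)
print(sync_got, pipe_got)  # -> 100 100
```

Over a real network the synchronous mode pays one round trip per request, which is exactly the kind of penalty worth measuring.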

What’s exciting about this exercise is the data gathering part. Capturing data from a sample run would allow anyone (me) to see patterns and probably gain insight into what types of structure contribute to building efficient systems.

Public Domain In The Current Copyright Regime

The plea made by the developers of NeoSmart will definitely be turned down or simply ignored. Why, you ask? There are several reasons that make taking some MS technologies cross-platform an impossibility.

  1. Cross-platform means all machines running different kinds of operating systems. Some of these OSes run proprietary and open source applications. Moreover, software tools are tied to different licenses, which dictate to users what actions are considered legal or illegal. MS tools would have to play with these cross-platform tools, which may cause legal problems.
  2. MS is at the core a desktop OS. The OS was designed to function as a desktop connected to the network. The last thing MS would do is re-design the machine as a citizen of a much larger system; MS doesn’t believe in the model where they are just a part of a bigger system. They’re a company offering a set of solutions, with tools entirely developed in-house and stamped with the MS seal of approval. These tools are by definition MS-centric.
  3. MS would never place their jewels in the public domain in order to please everybody. That would be anathema. The current copyright regime has a loophole that will ensure their technology stays protected for as long as they can hold on to it.