Index of /data/

Name               Last Modified          Size   Type
Parent Directory/                         -      Directory
crawl/             2008-Oct-07 15:51:00   -      Directory
load/              2008-Sep-19 11:30:00   -      Directory
parse/             2008-Oct-23 20:33:07   -      Directory
README.txt         2008-Jul-23 18:08:10   1.6K   text/plain

This is the data directory for watchdog.net. Here you'll find the 
raw data dumps for everything that powers our website. It's 
available at:

   http://watchdog.net/data
   rsync://watchdog.net/data

We've broken things down into three categories, corresponding to 
the output of the three different types of software we use to import 
our data:

  crawl/
    These are the raw files we've gotten from our data sources.
    We're republishing them here to make them easier to find and
    to reduce the load on our providers.

  parse/
    This is the output from our parser in .njs (netstring-encoded
    JSON) format. If you're not messing with a parser, this is 
    probably what you want -- we've done all the work of extracting
    all the interesting data for you.
    
    The .njs format is kind of weird, but we find it works best for
    these kinds of data files. The idea is pretty simple: data is
    stored in dictionaries/hashtables/objects, one for each item
    of interest. The objects are encoded as JSON, then listed
    using netstrings. A netstring is just an integer that's the 
    length of a string, a colon, the string, and a comma. So an 
    njs file might look like:
    
        14:{"monkeys": 5},14:{"monkeys": 2},
    
    See also http://watchdog.jottit.com/standards
    
    If the files are large we'll often gzip them to save space.
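
    To make the format concrete, here is a minimal Python sketch of
    reading and writing .njs data. It is an illustration, not the
    watchdog parser itself. The names iter_njs, encode_njs and
    open_njs are made up for this example. It assumes the length
    prefix counts bytes of UTF-8 encoded JSON, and that gzipped
    files can be recognized by a ".gz" suffix.

```python
import gzip
import io
import json


def iter_njs(stream):
    """Yield one decoded JSON object per netstring in a binary stream."""
    while True:
        # Collect the decimal length prefix, byte by byte, up to the colon.
        digits = b""
        while True:
            ch = stream.read(1)
            if ch == b"":
                if digits.strip():
                    raise ValueError("truncated length prefix")
                return  # clean end of data (trailing whitespace is tolerated)
            if ch == b":":
                break
            digits += ch
        length = int(digits.decode("ascii"))

        # Read exactly `length` bytes of payload (assumed to be UTF-8 JSON).
        payload = stream.read(length)
        if len(payload) != length:
            raise ValueError("truncated payload")

        # Every netstring ends with a comma.
        if stream.read(1) != b",":
            raise ValueError("missing trailing comma")

        yield json.loads(payload.decode("utf-8"))


def encode_njs(objects):
    """Encode an iterable of JSON-serializable objects as .njs bytes."""
    chunks = []
    for obj in objects:
        body = json.dumps(obj).encode("utf-8")
        chunks.append(str(len(body)).encode("ascii") + b":" + body + b",")
    return b"".join(chunks)


def open_njs(path):
    """Open an .njs file for binary reading.

    Assumption: gzipped files carry a ".gz" suffix.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")


if __name__ == "__main__":
    sample = b'14:{"monkeys": 5},14:{"monkeys": 2},'
    for item in iter_njs(io.BytesIO(sample)):
        print(item["monkeys"])
```

    For a real file, something like iter_njs(open_njs(path)) should
    walk through the items one at a time without loading the whole
    file into memory.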

  load/
    Here is where we keep the finished, aligned, normalized data
    ready for import into our databases. If you just want to run 
    the watchdog software, or something based on it, then this
    is all you need.

Got it? If there are any questions, don't hesitate to contact us:

the watchdog team
info@watchdog.net