Index of /data/
| Name | Last Modified | Size | Type |
| crawl/ | 2008-Oct-07 15:51:00 | - | Directory |
| load/ | 2008-Sep-19 11:30:00 | - | Directory |
| parse/ | 2008-Oct-23 20:33:07 | - | Directory |
| README.txt | 2008-Jul-23 18:08:10 | 1.6K | text/plain |
This is the data directory for watchdog.net. Here you'll find the
raw data dumps for everything that powers our website. The
directory is available at:
http://watchdog.net/data
rsync://watchdog.net/data
We've broken things down into three categories, corresponding to
the output of the three different types of software we use to import
our data:
crawl/
These are the raw files we've gotten from our data sources.
We're republishing them here to make them easier to find and
to reduce the load on our providers.
parse/
This is the output from our parser in .njs (netstring-encoded
JSON) format. If you're not messing with a parser, this is
probably what you want -- we've done all the work of extracting
all the interesting data for you.
The .njs format is kind of weird, but we find it works best for
these kinds of data files. The idea is pretty simple: data is
stored in dictionaries/hashtables/objects, one for each item
of interest. The objects are encoded in JSON format then listed
using netstrings. A netstring is just an integer that's the
length of a string, a colon, the string, and a comma. So an
njs file might look like:
14:{"monkeys": 5},14:{"monkeys": 2},
See also http://watchdog.jottit.com/standards
If the files are large we'll often gzip them to save space.
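As a rough illustration of the format described above, here is a minimal Python sketch of an .njs reader. It is not the watchdog team's own code. The names iter_njs and read_njs are made up for this example. It assumes the length prefix counts bytes of UTF-8 encoded JSON (as in the general netstring convention), that nothing separates one netstring from the next, and that gzipped files carry a .gz suffix; none of these is stated in this README.

```python
import gzip
import json


def iter_njs(data: bytes):
    """Yield one decoded JSON object per netstring found in `data`."""
    pos = 0
    while pos < len(data):
        colon = data.index(b":", pos)      # the length prefix ends at the colon
        length = int(data[pos:colon])      # payload length (assumed to be bytes)
        start = colon + 1
        end = start + length
        if data[end:end + 1] != b",":      # every netstring ends with a comma
            raise ValueError(f"missing trailing comma at byte {end}")
        yield json.loads(data[start:end].decode("utf-8"))
        pos = end + 1


def read_njs(path):
    """Load every object from an .njs file; a .gz suffix (assumed) means gzip."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as f:
        return list(iter_njs(f.read()))


# The sample from this README, parsed from an in-memory byte string.
sample = b'14:{"monkeys": 5},14:{"monkeys": 2},'
records = list(iter_njs(sample))
print(records)  # [{'monkeys': 5}, {'monkeys': 2}]
```

For a file on disk, something like read_njs("some_file.njs.gz") should return a list of dictionaries, where the file name is only a placeholder. The sketch reads the whole file into memory, so very large dumps would want a streaming variant.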
load/
Here is where we keep the finished, aligned, normalized data
ready for import into our databases. If you just want to run
the watchdog software, or something based on it, then this
is all you need.
Got it? If there are any questions, don't hesitate to contact us:
the watchdog team
info@watchdog.net