This is the data directory for watchdog.net. Here you'll find the raw data
dumps for everything that powers our website. It's available at:

  http://watchdog.net/data
  rsync://watchdog.net/data

We've broken things down into three categories, corresponding to the output
of the three different types of software we use to import our data:

crawl/
  These are the raw files we've gotten from our data sources. We're
  republishing them here to make them easier to find and to reduce the
  load on our providers.

parse/
  This is the output from our parser in .njs (netstring-encoded JSON)
  format. If you're not messing with a parser, this is probably what you
  want; we've done all the work of extracting the interesting data for you.

  The .njs format is kind of weird, but we find it works best for these
  kinds of data files. The idea is pretty simple: data is stored in
  dictionaries/hashtables/objects, one for each item of interest. Each
  object is encoded in JSON format, and the objects are then listed one
  after another as netstrings. A netstring is just the length of a string
  in bytes (as a decimal integer), a colon, the string itself, and a comma.
  So an .njs file might look like:

    14:{"monkeys": 5},14:{"monkeys": 2},

  See also http://watchdog.jottit.com/standards

  If the files are large, we'll often gzip them to save space.

load/
  Here is where we keep the finished, aligned, normalized data, ready for
  import into our databases. If you just want to run the watchdog software,
  or something based on it, then this is all you need.

Got it? If there are any questions, don't hesitate to contact us:

  the watchdog team
  info@watchdog.net
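If you want to read or write the .njs files from the parse/ directory yourself, here is a minimal Python sketch of the format described in this README: JSON objects, each wrapped as a netstring (length, colon, data, comma). It is not the watchdog team's own code. The helper names encode_njs, decode_njs and read_njs_file are hypothetical, and the sketch assumes the JSON payloads are UTF-8 and that gzipped files carry a .gz suffix.

```python
import gzip
import json
from typing import Any, Iterable, Iterator


def encode_njs(items: Iterable[Any]) -> bytes:
    """Hypothetical helper: JSON-encode each item and wrap it as a netstring."""
    out = bytearray()
    for item in items:
        data = json.dumps(item).encode("utf-8")  # assumption: UTF-8 JSON payloads
        # netstring = <length in bytes>:<data>,
        out += str(len(data)).encode("ascii") + b":" + data + b","
    return bytes(out)


def decode_njs(buf: bytes) -> Iterator[Any]:
    """Hypothetical helper: yield one decoded JSON object per netstring in buf."""
    pos = 0
    while pos < len(buf):
        colon = buf.index(b":", pos)       # end of the decimal length prefix
        length = int(buf[pos:colon])       # payload length in bytes
        start = colon + 1
        end = start + length
        if buf[end:end + 1] != b",":       # every netstring must end in a comma
            raise ValueError(f"malformed netstring at byte {pos}")
        yield json.loads(buf[start:end].decode("utf-8"))
        pos = end + 1


def read_njs_file(path: str) -> list:
    """Hypothetical helper: read an .njs file, assuming gzipped ones end in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        return list(decode_njs(f.read()))
```

For example, encode_njs([{"monkeys": 5}, {"monkeys": 2}]) produces the same bytes as the sample line above, and decode_njs turns them back into the two dictionaries. The length prefix lets a reader jump straight to the end of each record without scanning the JSON for delimiters.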