Changelog History

v0.9.0.rc2
November 16, 2016

v0.9.0.rc1
November 13, 2016

v0.8.0 Changes
February 19, 2016
A completely re-architected version of DeepDive is here.
Now the system compiles an execution plan ahead of time, checkpoints at a much finer granularity, and gives users full visibility and control of the execution, so any part of the computation can be flexibly repeated, resumed, or optimized later.
The new architecture naturally enforces modularity and extensibility, which enables us to innovate most parts independently without having to understand every possible combination of the entire code.
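The idea of a precompiled plan with fine-grained checkpoints can be pictured with a small sketch. This is only an illustration in Python, not DeepDive's implementation: the step names (`load`, `extract`, `ground`, `infer`) and the functions `plan` and `run` are hypothetical, and the `done` set stands in for whatever checkpoint markers the real system keeps.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan(deps, target):
    """Return the steps needed to reach `target`, dependencies first.

    `deps` maps each step to the steps it depends on (hypothetical names).
    """
    needed, stack = set(), [target]
    while stack:
        step = stack.pop()
        if step not in needed:
            needed.add(step)
            stack.extend(deps.get(step, ()))
    # Restrict the graph to the needed steps, then order it.
    sub = {step: list(deps.get(step, ())) for step in needed}
    return list(TopologicalSorter(sub).static_order())


def run(deps, target, done, execute):
    """Execute only the steps that are not checkpointed in `done`.

    Each step is marked done right after it finishes, so an interrupted
    run can be resumed without repeating completed work.
    """
    ran = []
    for step in plan(deps, target):
        if step in done:
            continue  # checkpoint hit: skip this step
        execute(step)
        done.add(step)
        ran.append(step)
    return ran
```

With `deps = {"extract": ["load"], "ground": ["extract"], "infer": ["ground"]}` and `done = {"load", "extract"}`, calling `run(deps, "infer", done, print)` executes only `ground` and `infer`. Removing a step from `done` makes the next run repeat it; a fuller version would also invalidate everything downstream of that step.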
The abstraction layers that encapsulate database operations and compute resources are now clearly established, giving a stable foundation for future extensions that support more types of database engines and compute clusters, such as Hadoop/YARN and ones with traditional job schedulers.
As a byproduct of this redesign, notable performance improvements are now observed:
- The database drivers show more than 20x higher throughput (2MB/s -> 50MB/s, per connection) with zero storage footprint by streaming data in and out of UDFs.
- The grounded factor graphs save up to 100x storage space (12GB -> 180MB) by employing compression during the factor graph's grounding and loading, incurring less than 10% overhead in time (400s -> 460s, measuring only the dumping and loading, hence a much smaller fraction in practice).
See the issues and pull requests for this milestone on GitHub (most notably #445) for further details.
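To illustrate what streaming data in and out of a UDF means, here is a minimal sketch of a row-at-a-time TSV filter. It is an assumption-laden illustration rather than DeepDive's driver or decorator API: `stream_tsv` and `word_lengths` are hypothetical names, and TSV escape sequences (such as `\N` for NULL) are ignored for brevity.

```python
def stream_tsv(lines, udf):
    """Lazily apply `udf` to each TSV row and yield TSV output lines.

    Rows are processed one at a time, so nothing is buffered or written
    to intermediate storage.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        for out in udf(*fields):
            yield "\t".join(str(value) for value in out) + "\n"


def word_lengths(doc_id, text):
    """Hypothetical UDF: emit one (doc_id, word, length) row per word."""
    for word in text.split():
        yield doc_id, word, len(word)
```

Such a filter could be wired between standard input and output with `sys.stdout.writelines(stream_tsv(sys.stdin, word_lengths))`, which keeps memory use independent of the input size.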

New commands and features
An array of new commands has been added to deepdive, and existing ones, such as deepdive initdb and deepdive run, have been rewritten.
- deepdive compile
- deepdive plan
- deepdive do
- deepdive redo
- deepdive mark
- deepdive done
- deepdive model
- deepdive create
- deepdive load
- deepdive unload
- deepdive query
- deepdive db
- deepdive check
- deepdive compute
- @tsv_extractor, @returns Python decorators for parsing and formatting in UDFs.

Interactive tools
The bundled Mindbender can now automatically construct a search and browsing interface from DDlog annotations.
Documentation for Dashboard has been added.
- mindbender search
- mindbender dashboard
- mindbender snapshot
- mindbender tagger

Miscellaneous
- deepdive whereis

To learn more about an individual deepdive COMMAND, use the deepdive help command:
    deepdive help COMMAND

Dropped and deprecated features
The Scala code base has been completely dropped and rewritten in Bash and jq.
Many superfluous features have been dropped or deprecated for future removal, as summarized below:
- All extractor styles other than tsv_extractor, sql_extractor, and cmd_extractor have been dropped, namely:
    - plpy_extractor
    - piggy_extractor
    - json_extractor
- Manually writing deepdive.conf is strongly discouraged, as filling in more fields such as dependencies: and input_relations: became mandatory. Rewriting them in DDlog is strongly recommended.
- Database configuration in deepdive.db.default is completely ignored. db.url must be used instead.
- deepdive.extraction.extractors.*.input in deepdive.conf should always be SQL queries. TSV(filename.tsv) or CSV(filename.csv) is no longer supported.

v0.8-STABLE Changes
February 25, 2016
This automatically updated release includes the latest fixes made to the last v0.8.x release.

v0.7.1 Changes
September 28, 2015
- Adds better support for applications written in DDlog. deepdive run now runs DDlog-based applications (app.ddlog).
- Makes the PL/Python extension no longer necessary for PostgreSQL. It is still needed for Greenplum and Postgres-XL.
- The deepdive sql eval command now supports format=json.
- Adds deepdive load command for loading TSV and CSV data.
- Adds deepdive help command for quick usage instructions for the deepdive command.
- Includes the latest Mindbender with the Search GUI for browsing data produced by DeepDive.
- Adds various bug fixes and improvements.

v0.7.0 Changes
July 13, 2015
- Provides a new command-line interface deepdive with a new standard DeepDive application layout.
    - No more installation/configuration complication: Users run everything through the single deepdive command, and everything just works in any environment. The only possible failure mode is not being able to run the deepdive command, e.g., by not setting up the PATH environment correctly.
    - No more pathname/environment clutter in apps: repeated settings for DEEPDIVE_HOME, APP_HOME, PYTHONPATH, LD_LIBRARY_PATH, PGHOST, PGPORT, ... in run.sh, env.sh, env_local.sh, env_db.sh, etc. are gone. Path names (e.g., extractor udf) in application.conf are all relative to the application root, and brittle relative paths are no longer used in any of the examples.
    - Clear separation of app code from infrastructure code, as well as source code from object code: No more confusing the deepdive source tree with binary/executable/shared-library distribution or temporary/log/output directories.
    - Binary releases can be built with make package.
- Here is a summary of changes visible to users:
    - Application settings are now kept in the deepdive.conf file instead of application.conf.
    - Database settings are now specified by putting everything (host, port, user, password, database name) into a single URL in the file db.url.
    - Path names (e.g., extractor udf) in deepdive.conf are all relative to the application root unless they are absolute paths.
    - SQL queries against the database can be run easily with the deepdive sql command when run under an application.
    - Database schema is now put in the file schema.sql, and optional initial data loading can be done by a script input/init.sh. Keeping input data under input/ is recommended.
    - By passing the pipeline name as an extra argument to the deepdive run command, different pipelines can be run very easily: No more application.conf editing.
    - Logs and outputs are placed under the application root, under snapshot/.
- Adds piggy extractor that replaces the now deprecated plpy extractor.
- Includes the latest DDlog compiler with extended syntax support for writing more real-world applications.
- Includes the latest Mindbender with Dashboard GUI for producing summary reports after each DeepDive run and interactively analyzing data products.

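As a rough illustration of how a single URL can carry all of the database settings listed for db.url above (host, port, user, password, database name), the sketch below splits such a URL with Python's standard library. The URL shown, the `parse_db_url` helper, and the key names are assumptions for illustration only; this is not how DeepDive itself reads db.url.

```python
from urllib.parse import urlsplit


def parse_db_url(url):
    """Split a database URL into its connection settings.

    Hypothetical helper: the returned key names are illustrative.
    """
    parts = urlsplit(url.strip())
    return {
        "driver": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


# A made-up URL of the kind a db.url file might contain.
settings = parse_db_url("postgresql://alice:secret@localhost:5432/deepdive_app")
```

Keeping the settings in one URL means there is a single value to change when an application moves to another database, instead of several environment variables.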
v0.6.0 Changes
June 17, 2015
- Adds DDlog for writing applications in Datalog-like syntax.
- Adds support for incremental development cycles.
- Adds preliminary support for Postgres-XL backend.
- Simplifies installation on Ubuntu and Mac with a quick installer that takes care of all dependencies.
- Drops maintenance of the AMI in favor of the new quick installer.
- Fixes sampler correctness issues.
- Drops "FeatureStatsView" view due to performance issues.
- Corrects various issues.
- Starts using Semantic Versioning for consistent and meaningful version numbers for all future releases.

v0.05-RELEASE Changes
February 09, 2015
Changelog for release 0.0.5-alpha (02/08/2015)
- Added support for building Docker images for DeepDive. See the README.md for more.
- Added SQL "FeatureStatsView" view. Populated with feature statistics; useful for debugging.
- Added a few fixes to Greenplum docs.
- Added parallel Greenplum loading for extractor data.
- A few miscellaneous bug fixes.

v0.04.1-RELEASE Changes
November 25, 2014
Changelog for release 0.0.4.1-alpha (11/25/2014)
This release focuses mostly on bug fixing and minor new features.
- Improve handling of failures in extractors and inference rules.
- Add support for running tests on Greenplum.
- Add support for -q, --quiet in the DimmWitted sampler. This allows reducing the verbosity of the output.
- Remove some dead code.
- Fix a small bug in the spouse_example test.