Releases: ropensci/targets
Content addressable storage
targets 1.8.0
- Wrap `tar_watch()` UI module in `bslib::page()` (#1302, @kwbyron-lilly).
- Remove `callr_function` from the `tar_make_as_job()` argument list.
- Ensure `storage = "worker"` is respected when the process of storing an object generates an error (#1304, @multimeric).
- Default to the `_targets.R` pattern in `tar_branches()` (#1306, @multimeric, @mattwarkentin).
- Remove superfluous functions and globals from metadata with `tar_prune()` (#1312, @benzipperer).
- Change the default `workspace_on_error` option to `TRUE` (#1310, @hadley).
- Enhance and organize the `error = "stop"` error message.
- Avoid saving a file in `_targets/objects` for `error = "null"`. Instead, switch to a special `"null"` storage format class if `error` is `"null"` and the target throws an error. This should allow users to more freely create new formats with `tar_format()` without worrying about how to handle `NULL` objects created by `error = "null"`.
- Implement `format = "auto"` (#1311, @hadley).
- Replace `pingr` dependency with `base::socketConnection()` for local URL utilities (#1317, #1318, @Adafede).
- Implement `tar_repository_cas()`, `tar_repository_cas_local()`, and `tar_repository_cas_local_gc()` for content-addressable storage (#1232, #1314, @noamross).
- Add `tar_format_get()` to make implementing CAS systems easier.
- Implement `error = "trim"` in `tar_target()` and `tar_option_set()` (#1310, #1311, @hadley).
- Use the file system type to decide whether to trust time stamps (#1315, @hadley, @gaborcsardi).
- Deprecate `format = "file_fast"` in favor of the above (#1315).
- Deprecate `trust_object_timestamps` in favor of the more unified `trust_timestamps` in `tar_option_set()` (#1315).
- Print storage size of each target in verbose reporters (#1337, @psychelzh).
- Combine help files of `tar_target()` and `tar_target_raw()`. Same with `tar_load()` and `tar_load_raw()`.
- Add a `substitute` argument to `tar_format()` to make it easier to write custom storage formats without metaprogramming.
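As a rough illustration of how the new 1.8.0 options might be combined, here is a minimal `_targets.R` sketch. The target names (`raw`, `total`) and their commands are hypothetical, and passing `format = "auto"` through `tar_option_set()` rather than per target is an assumption about typical usage, not something these notes state.

```r
# _targets.R (illustrative sketch; target names and commands are made up)
library(targets)

tar_option_set(
  error = "trim",             # error mode added in 1.8.0
  format = "auto",            # storage format added in 1.8.0
  workspace_on_error = TRUE   # already the default as of 1.8.0; shown for clarity
)

list(
  tar_target(raw, data.frame(x = 1:3)),  # hypothetical upstream target
  tar_target(total, sum(raw$x))          # hypothetical downstream target
)
```

The same `error` and `format` values can also be set on individual targets through `tar_target()` when only part of a pipeline should use them.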
bslib and speed
targets 1.7.1
- Use `bslib` in `tar_watch()`.
- Speed up `target_upstream_edges()` and `pipeline_upstream_edges()` by avoiding data frames until the last minute (17% speedup for certain kinds of large pipelines).
- Automatically set `as_job` to `FALSE` in `tar_make()` if `rstudioapi` and/or RStudio is not available.
secretbase
targets 1.7.0
Invalidating changes
- Use `secretbase::siphash13()` instead of `digest(algo = "xxhash64", serializationVersion = 3)` so hashes of in-memory objects no longer depend on serialization version 3 headers (#1244, @shikokuchuo). Unfortunately, pipelines built with earlier versions of `targets` will need to rerun.
Other improvements
- Ensure patterns marshal properly (#1266, #1264, ropensci/geotargets#52, @Aariq, @njtierney).
- Inform and prompt the user when the pipeline was built with an old version of `targets` and changes to the package will cause the current work to rerun (#1244). For the `tar_make*()` functions, `utils::menu()` prompts the user to give people a chance to downgrade if necessary.
- For type safety in the internal database class, read all columns as character vectors in `data.table::fread()`, then convert them to the correct types afterwards.
- Add a new `tar_resources_custom_format()` function which can pass environment variables to customize the behavior of custom `tar_format()` storage formats (#1263, #1232, @Aariq, @noamross).
- Only marshal dependencies if actually sending the target to a parallel worker.
Custom descriptions
targets 1.6.0
- Modernize `extras` in `tar_renv()`.
- `tar_target()` gains a `description` argument for free-form text describing what the target is about (#1230, #1235, #1236, @tjmahr).
- `tar_visnetwork()`, `tar_glimpse()`, `tar_network()`, `tar_mermaid()`, and `tar_manifest()` now optionally show target descriptions (#1230, #1235, #1236, @tjmahr).
- `tar_described_as()` is a new wrapper around `tidyselect::any_of()` to select specific subsets of targets based on the description rather than the name (#1136, #1196, @noamross, @mattmoo).
- Fix the documentation of the `names` argument (nudge users toward `tidyselect` expressions).
- Make assertions on the pipeline process more robust (to check if two processes are trying to access the same data store).
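The `description` argument and `tar_described_as()` from the list above might be used together roughly as follows. This is only a sketch: the target names, commands, and description strings are hypothetical, and the selection calls are left commented out because they need an existing project directory to run.

```r
# _targets.R (illustrative sketch; names and descriptions are made up)
library(targets)

list(
  tar_target(
    raw,
    data.frame(x = 1:3),
    description = "raw survey data"   # free-form text, new in 1.6.0
  ),
  tar_target(
    total,
    sum(raw$x),
    description = "summary of the survey"
  )
)

# From an interactive session in the project directory, targets could then
# be selected by description instead of by name, for example:
# tar_manifest(names = tar_described_as(starts_with("raw")))
# tar_make(names = tar_described_as(starts_with("raw")))
```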
CRAN patch
targets 1.5.1
- Avoid `arrow`-related CRAN check NOTE.
- `use_targets()` only writes the `_targets.R` script. The `run.sh` and `run.R` scripts are superseded by the `as_job` argument of `tar_make()`. Users not using the RStudio IDE can call `tar_make()` with `callr_function = callr::r_bg` to run the pipeline as a background process. `tar_make_clustermq()` and `tar_make_future()` are superseded in favor of `tar_make(use_crew = TRUE)`, so template files are no longer written for the former automatically.
{secretbase} RNG seeds, smarter task dispatch, tar_make(as_job = TRUE)
targets 1.5.0
Invalidating changes
Because of the changes below, upgrading to this version of targets will unavoidably invalidate previously built targets in existing pipelines. Your pipeline code should still work, but any targets you ran before will most likely need to rerun after the upgrade.
- In `tar_seed_create()`, use `secretbase::sha3(x = TARGET_NAME, bits = 32L, convert = NA)` to generate target seeds that are more resistant to overlapping RNG streams (#1139, @shikokuchuo). The previous approach used a less rigorous combination of `digest::digest(algo = "sha512")` and `digest::digest2int()`.
Other improvements
- Update the documentation of the `deployment` argument of `tar_target()` to reflect the advent of `crew` (#1208, @psychelzh).
- Unset `cli.num_colors` on exit in `tar_error()` and `tar_warning()` (#1210, @dipterix).
- Do not try to access `seconds_timeout` if the `crew` controller is actually a controller group (#1207, wlandau/crew.cluster#35, @stemangiola, @drejom).
- `tar_make()` gains an `as_job` argument to optionally run a `targets` pipeline as an RStudio job.
- Bump required `igraph` version to 2.0.0 because `igraph::get.edgelist()` was deprecated in favor of `igraph::as_edgelist()`.
- Do not dispatch targets to backlogged `crew` controllers (or controller groups) (#1220). Use the new `push_backlog()` and `pop_backlog()` `crew` methods to make this smooth.
- Make the debugger message more generic (#1223, @eliocamp).
- Throw an early and informative error from `tar_make()` if there is already a `targets` pipeline running on a local process on the same local data store. The local process is detected using the process ID and time stamp from `tar_process()` (with a 1.01-second tolerance for the time stamp).
- Remove `pkgload::load_all()` warning (#1218). Tried using `.__DEVTOOLS__` but it interferes with reverse dependencies.
- Add documentation and an assertion in `tar_target_raw()` to let users know that `iteration = "group"` is invalid for dynamic targets (ones with `pattern = map(...)` etc.; #1226, @bmfazio).
Small fixes
targets 1.4.1
- Print "errored pipeline" when at least one target errors.
- Bump minimum `clustermq` version to 0.9.2.
- Repair the `tar_debug_instructions()` tips for when commands are long.
- Do not look for dependencies of primitive functions (#1200, @smwindecker, @joelnitta).
AWS/crew efficiency, random number safety
targets 1.4.0
Invalidating changes
Because of the changes below, upgrading to this version of targets will unavoidably invalidate previously built targets in existing pipelines. Your pipeline code should still work, but any targets you ran before will most likely need to rerun after the upgrade.
- Use SHA512 during the creation of target-specific pseudo-random number generator seeds (#1139). This change decreases the risk of overlapping/correlated random number generator streams. See the "RNG overlap" section of the `tar_seed_create()` help file for details and justification. Unfortunately, this change will invalidate all currently built targets because the seeds will be different. To avoid rerunning your whole pipeline, set `cue = tar_cue(seed = FALSE)` in `tar_target()`.
- For cloud storage: instead of the hash of the local file, use the ETag for AWS S3 targets and the MD5 hash for GCP GCS targets (#1172). Sanitize with `targets:::digest_chr64()` in both cases before storing the result in the metadata.
- For a cloud target to be truly up to date, the hash in the metadata now needs to match the current object in the bucket, not the version recorded in the metadata (#1172). In other words, `targets` now tries to ensure that the up-to-date data objects in the cloud are in their newest versions. So if you roll back the metadata to an older version, you will still be able to access historical data versions with e.g. `tar_read()`, but the pipeline will no longer be up to date.
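The seed-related workaround mentioned above (`cue = tar_cue(seed = FALSE)`) could look like the following in a pipeline script. The target name `simulation` and its command are hypothetical, and whether to set the cue per target or globally is a judgment call that depends on the project.

```r
# _targets.R (illustrative sketch; the target and its command are made up)
library(targets)

list(
  tar_target(
    simulation,
    rnorm(10),
    # Ask targets not to treat a changed seed alone as a reason to rerun.
    cue = tar_cue(seed = FALSE)
  )
)

# A pipeline-wide alternative, placed before the target list:
# tar_option_set(cue = tar_cue(seed = FALSE))
```

Note that with this cue the stored results keep whatever seeds they were originally built with, so it trades the rerun for continued use of the older seeds.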
Other changes to seeds
- Add a new exported function `tar_seed_create()` which creates target-specific pseudo-random number generator seeds.
- Add an "RNG overlap" section in the `tar_seed_create()` help file to justify and defend how `targets` and `tarchetypes` approach pseudo-random numbers.
- Add function `tar_seed_set()` which sets a seed and sets all the RNG algorithms to their defaults in the R installation of the user. Each target now uses the `tar_seed_set()` function to set its seed before running its R command (#1139).
- Deprecate `tar_seed()` in favor of the new `tar_seed_get()` function.
Other cloud storage improvements
- For all cloud targets, check hashes in batched LIST requests instead of individual HEAD requests (#1172). Dramatically speeds up the process of checking if cloud targets are up to date.
- For AWS S3 targets, `tar_delete()`, `tar_destroy()`, and `tar_prune()` now use efficient batched calls to `delete_objects()` instead of costly individual calls to `delete_object()` (#1171).
- Add a new `verbose` argument to `tar_delete()`, `tar_destroy()`, and `tar_prune()`.
- Add a new `batch_size` argument to `tar_delete()`, `tar_destroy()`, and `tar_prune()`.
- Add new arguments `page_size` and `verbose` to `tar_resources_aws()` (#1172).
- Add a new `tar_unversion()` function to remove version IDs from the metadata of cloud targets. This makes it easier to interact with just the current version of each target, as opposed to the version ID recorded in the local metadata.
Other improvements
- Migrate to the changes in `clustermq` 0.9.0 (@mschubert).
- In progress statuses, change "started" to "dispatched" and change "built" to "completed" (#1192).
- Deprecate `tar_started()` in favor of `tar_dispatched()` (#1192).
- Deprecate `tar_built()` in favor of `tar_completed()` (#1192).
- Console messages from reporters say "dispatched" and "completed" instead of "started" and "built" (#1192).
- The `crew` scheduling algorithm no longer waits on saturated controllers, and targets that are ready are greedily dispatched to `crew` even if all workers are busy (#1182, #1192). To appropriately set expectations for users, reporters print "dispatched (pending)" instead of "dispatched" if the task load is backlogged at the moment.
- In the `crew` scheduling algorithm, waiting for tasks is now a truly event-driven process and consumes 5-10x less CPU resources (#1183). Only the auto-scaling of workers uses polling (with an inexpensive default polling interval of 0.5 seconds, configurable through `seconds_interval` in the controller).
- Simplify stored target tracebacks.
- Print the traceback on error.
CRAN patch
targets 1.3.2
- Try to fix function help files for CRAN.
Cloud metadata fixes
targets 1.3.1
- Add `tar_config_projects()` and `tar_config_yaml()` (#1153, @psychelzh).
- Apply error modes to `builder_wait_correct_hash()` in `target_conclude.tar_builder()` (#1154, @gadenbuie).
- Remove duplicated error message from `builder_error_null()`.
- Allow `tar_meta_upload()` and `tar_meta_download()` to avoid errors if one or more metadata files do not exist. Add a new argument `strict` to control error behavior.
- Add new arguments `meta`, `progress`, `process`, and `crew` to control individual metadata files in `tar_meta_upload()`, `tar_meta_download()`, `tar_meta_sync()`, and `tar_meta_delete()`.
- Avoid newly deprecated arguments and functions in `crew` 0.5.0.9003 (https://github.com/wlandau/crew/issues/131).
- Allow `tar_read()` etc. inside a pipeline whenever it uses a different data store (#1158, @MilesMcBain).
- Set `seed = FALSE` in `future::future()` (#1166, @svraka).
- Add a new `physics` argument to `tar_visnetwork()` and `tar_glimpse()` (#925, @Bdblodgett-usgs).