feature(examples): Add JBoard connector #459
base: main
Conversation
🧹 Python Code Quality Check: ✅ No issues found in Python files. This comment is auto-updated with every commit.
Pull Request Overview
This PR adds a new Jboard connector to sync employer profiles, job categories, and alert subscription data from the Jboard API. The connector implements memory-efficient streaming patterns with pagination, incremental sync support using timestamp-based cursors, and comprehensive error handling with retry logic.
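The streaming pagination pattern described above could look roughly like the sketch below. The function name, endpoint path, and response fields (`data`, `meta.last_page`) are assumptions for illustration and are not taken from the connector code in this PR.

```python
import requests


def stream_employers(base_url, headers, per_page=100):
    """Illustrative generator-based pagination sketch.

    The endpoint path and the response shape ("data", "meta.last_page")
    are assumptions for illustration, not taken from the connector code.
    """
    page = 1
    while True:
        params = {"per_page": per_page, "page": page}
        response = requests.get(
            f"{base_url}/employers", headers=headers, params=params, timeout=30
        )
        response.raise_for_status()
        payload = response.json()
        # Yield records one at a time so large result sets are never held in memory.
        for record in payload.get("data", []):
            yield record
        # Stop once the pagination metadata reports the final page.
        if page >= payload.get("meta", {}).get("last_page", page):
            break
        page += 1
```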
Key changes:
- Implements three-table sync (employers, categories, alert_subscriptions) with generator-based pagination
- Provides Bearer token authentication with configurable timeout and retry settings
- Uses incremental checkpointing for employers data with state management
- Includes comprehensive retry logic with exponential backoff and jitter for rate limiting
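The retry logic mentioned in the last item might follow a pattern like the following sketch; the function name, defaults, and status-code handling are illustrative assumptions rather than the connector's actual implementation.

```python
import random
import time

import requests


def request_with_retry(url, headers, params, max_attempts=3, timeout=30):
    """Illustrative retry loop with exponential backoff and jitter.

    The function name, defaults, and status-code handling are assumptions
    for illustration, not the connector's actual implementation.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            response = requests.get(url, headers=headers, params=params, timeout=timeout)
            # Retry only rate limiting (429) and transient server errors (5xx).
            if response.status_code == 429 or response.status_code >= 500:
                if attempt == max_attempts:
                    response.raise_for_status()
                # Exponential backoff (1s, 2s, 4s, ...) plus random jitter.
                time.sleep((2 ** (attempt - 1)) + random.uniform(0, 1))
                continue
            # Permanent client errors (401, 403, 404, ...) fail immediately.
            response.raise_for_status()
            return response.json()
        except (requests.exceptions.Timeout, requests.exceptions.ConnectionError):
            if attempt == max_attempts:
                raise
            time.sleep((2 ** (attempt - 1)) + random.uniform(0, 1))
```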
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 24 comments.
| File | Description |
|---|---|
| connectors/jboard/connector.py | Main connector implementation with API integration, data fetching, transformation, and sync orchestration |
| connectors/jboard/configuration.json | Configuration template defining API credentials and connector parameters |
| connectors/jboard/README.md | Documentation describing connector features, configuration, authentication, and data schema |
| README.md | Updated root README to add Jboard connector to the examples list |
```python
checkpoint_state = {
    "last_sync_time": record.get(
        "updated_at", datetime.now(timezone.utc).isoformat()
    ),
```
Copilot AI (Nov 20, 2025)
State progression logic issue. The checkpoint state at lines 510-512 uses record.get("updated_at"), which represents the timestamp of the current record being processed, not necessarily the latest timestamp in the batch. This can lead to data being skipped if records are not processed in chronological order.
Instead, track the maximum updated_at across all processed records in the batch:
```python
max_updated_at = record.get("updated_at", datetime.now(timezone.utc).isoformat())
if max_updated_at > checkpoint_state.get("last_sync_time", ""):
    checkpoint_state["last_sync_time"] = max_updated_at
```

This ensures the state advances correctly and no data is missed in subsequent syncs.
The connector includes several additional files to support functionality, testing, and deployment:

- `requirements.txt` – Python dependency specification for Jboard API integration and connector requirements including faker for mock testing.
Copilot AI (Nov 20, 2025)
Reference to non-existent requirements.txt file. Line 98 mentions "requirements.txt – Python dependency specification for Jboard API integration and connector requirements including faker for mock testing", but no requirements.txt file exists in the connector directory.
Either:
- Remove this reference from the "Additional files" section if no requirements.txt is needed
- Add the requirements.txt file if faker or other dependencies are actually needed for testing
Based on the connector code, no additional dependencies are needed, so this reference should be removed.
Suggested change: remove this line from the README.
| "per_page": max_records, | ||
| "page": 1, | ||
| } | ||
|
|
Copilot AI (Nov 20, 2025)
Incremental sync not implemented for alert_subscriptions. The get_alert_subscriptions function accepts last_sync_time parameter but never uses it to filter data (unlike get_employers which uses it at line 333). This means alert_subscriptions will always perform a full sync even when incremental sync is enabled.
Add the incremental sync filter after line 429:
```python
# Add incremental sync filters if last_sync_time provided
if last_sync_time:
    params["created_at_from"] = last_sync_time
```
```python
except Exception as e:
    log.severe(f"Sync failed: {str(e)}")
    raise RuntimeError(f"Failed to sync data: {str(e)}")
```
Copilot AI (Nov 20, 2025)
Generic exception catching without proper error classification. The except Exception as e: block at line 560 catches all exceptions indiscriminately, including permanent errors that shouldn't be retried. This can lead to unnecessary retry attempts for non-transient failures.
Consider catching specific exceptions and handling them appropriately:
```python
except requests.exceptions.HTTPError as e:
    if e.response.status_code in [401, 403, 404]:
        # Permanent errors - fail immediately
        log.severe(f"Authentication or resource error: {str(e)}")
        raise
    else:
        # Transient errors - allow retry
        log.severe(f"HTTP error during sync: {str(e)}")
        raise RuntimeError(f"Failed to sync data: {str(e)}")
except (requests.exceptions.Timeout, requests.exceptions.ConnectionError) as e:
    log.severe(f"Network error during sync: {str(e)}")
    raise RuntimeError(f"Failed to sync data: {str(e)}")
except Exception as e:
    log.severe(f"Unexpected error during sync: {str(e)}")
    raise
```

This provides better visibility into error types and allows for appropriate handling.
Suggested change:

```diff
-except Exception as e:
-    log.severe(f"Sync failed: {str(e)}")
-    raise RuntimeError(f"Failed to sync data: {str(e)}")
+except requests.exceptions.HTTPError as e:
+    status_code = e.response.status_code if e.response is not None else None
+    if status_code in [401, 403, 404]:
+        log.severe(f"Authentication or resource error: {str(e)}")
+        raise
+    else:
+        log.severe(f"HTTP error during sync: {str(e)}")
+        raise RuntimeError(f"Failed to sync data: {str(e)}")
+except (requests.exceptions.Timeout, requests.exceptions.ConnectionError) as e:
+    log.severe(f"Network error during sync: {str(e)}")
+    raise RuntimeError(f"Failed to sync data: {str(e)}")
+except Exception as e:
+    log.severe(f"Unexpected error during sync: {str(e)}")
+    raise
```
```python
        RuntimeError: If all retry attempts fail or unexpected errors occur.
        requests.exceptions.RequestException: For unrecoverable HTTP errors.
    """
    url = f"{__API_ENDPOINT}{endpoint}"
```
Copilot AI (Nov 20, 2025)
Incorrect API endpoint construction. The code defines __API_VERSION = "v1" but doesn't use it when building the URL. This creates a maintainability issue if the API version needs to change.
Change line 181 to use the version constant:

```diff
-    url = f"{__API_ENDPOINT}{endpoint}"
+    url = f"{__API_ENDPOINT}/{__API_VERSION}{endpoint}"
```

This ensures the version constant is actually used and the URL is correctly formed.
## Configuration file

```json
{
  "api_key": "<YOUR_JBOARD_API_KEY>",
  "sync_frequency_hours": "<YOUR_JBOARD_API_SYNC_FREQUENCY_HOURS>",
  "initial_sync_days": "<YOUR_JBOARD_API_INITIAL_SYNC_DAYS>",
  "max_records_per_page": "<YOUR_JBOARD_API_MAX_RECORDS_PER_PAGE>",
  "request_timeout_seconds": "<YOUR_JBOARD_API_REQUEST_TIMEOUT_SECONDS>",
  "retry_attempts": "<YOUR_JBOARD_API_RETRY_ATTEMPTS>",
  "enable_incremental_sync": "<YOUR_JBOARD_API_ENABLE_INCREMENTAL_SYNC>",
  "enable_debug_logging": "<YOUR_JBOARD_API_ENABLE_DEBUG_LOGGING>"
}
```
Copilot AI (Nov 20, 2025)
Missing note about not checking configuration.json into version control. According to the coding guidelines, the Configuration file section must explicitly mention that configuration.json should not be versioned.
Add after the JSON block:
```markdown
Note: Ensure that the `configuration.json` file is not checked into version control to protect sensitive information.
```

## Additional files

The connector includes several additional files to support functionality, testing, and deployment:

- `requirements.txt` – Python dependency specification for Jboard API integration and connector requirements including faker for mock testing.
- `configuration.json` – Configuration template for API credentials and connector parameters (should be excluded from version control).

## Additional considerations
Copilot AI (Nov 20, 2025)
Incorrect statement about requirements.txt. The README states "This connector does not require any additional packages" (line 51), but line 98 mentions "requirements.txt – Python dependency specification for Jboard API integration and connector requirements including faker for mock testing."
These statements contradict each other. If there's a requirements.txt file with faker, then the connector does have additional dependencies. Clarify which is correct and update accordingly.
Suggested change, replacing the "Additional files" section above:

## Requirements file

The connector requires the `faker` package for mock testing and development purposes.

```
faker
```

Note: The `fivetran_connector_sdk:latest` and `requests:latest` packages are pre-installed in the Fivetran environment. To avoid dependency conflicts, do not declare them in your `requirements.txt`.

## Additional files

The connector includes several additional files to support functionality, testing, and deployment:

- `requirements.txt` – Python dependency specification for mock testing and development.
- `configuration.json` – Configuration template for API credentials and connector parameters (should be excluded from version control).
```python
# The 'upsert' operation is used to insert or update data in the destination table.
# The op.upsert method is called with two arguments:
# - The first argument is the name of the table to upsert the data into.
# - The second argument is a dictionary containing the data to be upserted,
op.upsert(table="employers", data=record)
```
Copilot AI (Nov 20, 2025)
Incorrect comment format before op.upsert(). According to the coding guidelines, the required comment before EVERY op.upsert() call should be:
```python
# The 'upsert' operation is used to insert or update data in the destination table.
# The first argument is the name of the destination table.
# The second argument is a dictionary containing the record to be upserted.
op.upsert(table="employers", data=record)
```

The current comment includes extra details about "The op.upsert method is called with two arguments:" which is not part of the standard format. Use the exact format specified in the guidelines.
| """ ADD YOUR SOURCE-SPECIFIC IMPORTS HERE | ||
| Example: import pandas, boto3, etc. | ||
| Add comment for each import to explain its purpose for users to follow. | ||
| """ |
Copilot AI (Nov 20, 2025)
Template comment not removed. The comment block at lines 31-34 is placeholder text from the template and should be removed. This connector doesn't need any additional source-specific imports beyond the standard ones already included.
| """ ADD YOUR SOURCE-SPECIFIC IMPORTS HERE | |
| Example: import pandas, boto3, etc. | |
| Add comment for each import to explain its purpose for users to follow. | |
| """ |
| """ | ||
| Main synchronization function that fetches and processes data from the Jboard API. | ||
| This function orchestrates the entire sync process using memory-efficient streaming patterns. | ||
| Args: | ||
| configuration: Configuration dictionary containing API credentials and settings. | ||
| state: State dictionary containing sync cursors and checkpoints from previous runs. | ||
| Raises: | ||
| RuntimeError: If sync fails due to API errors or configuration issues. | ||
| """ |
Copilot AI (Nov 20, 2025)
Incorrect update function docstring. According to the coding guidelines, the update function must use the exact required docstring format:
```python
def update(configuration: dict, state: dict):
    """
    Define the update function which lets you configure how your connector fetches data.
    See the technical reference documentation for more details on the update function:
    https://fivetran.com/docs/connectors/connector-sdk/technical-reference#update
    Args:
        configuration: a dictionary that holds the configuration settings for the connector.
        state: a dictionary that holds the state of the connector.
    """
```

Replace the current docstring with this exact format.
varundhall left a comment:
Same as #256 (review)
fivetran-chinmayichandrasekar left a comment:
Left a few suggestions. Thanks.
Page-based pagination with automatic page traversal (refer to `get_employers`, `get_categories`, and `get_alert_subscriptions` functions). The connector uses `page` and `per_page` parameters to fetch data in configurable chunks. Generator-based processing prevents memory accumulation for large datasets by yielding individual records. Processes pages sequentially while yielding individual records for immediate processing, with pagination metadata used to determine when all data has been fetched.

## Data handling

Employer, category, and alert subscription data is mapped from Jboard API's format to normalized database columns (refer to the `__map_employer_data`, `__map_category_data`, and `__map_alert_subscription_data` functions). Nested objects like tags arrays are serialized to JSON strings, and all timestamps are converted to UTC format for consistency.
Suggested change: format `tags` as inline code in the sentence above.
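The serialization and timestamp normalization described in that sentence might look roughly like the sketch below; the helper name and exact field handling are assumptions, with field names following the ALERT_SUBSCRIPTIONS columns listed next.

```python
import json
from datetime import datetime, timezone


def map_alert_subscription(raw):
    """Illustrative mapping sketch; the connector's actual field handling may differ."""

    def to_utc(value):
        # Normalize an ISO-8601 timestamp to UTC; pass through missing values.
        if not value:
            return None
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
        return parsed.astimezone(timezone.utc).isoformat()

    return {
        "id": raw.get("id"),
        "email": raw.get("email"),
        "category_id": raw.get("category_id"),
        # Nested arrays such as tags are serialized to JSON strings.
        "tags": json.dumps(raw.get("tags", [])),
        "is_active": raw.get("is_active"),
        "created_at": to_utc(raw.get("created_at")),
        "updated_at": to_utc(raw.get("updated_at")),
        "sync_timestamp": datetime.now(timezone.utc).isoformat(),
    }
```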
**EMPLOYERS**: `id`, `name`, `description`, `website`, `logo_url`, `featured`, `source`, `created_at`, `updated_at`, `have_posted_jobs`, `have_a_logo`, `sync_timestamp`

**CATEGORIES**: `id`, `name`, `description`, `parent_id`, `sort_order`, `is_active`, `created_at`, `updated_at`, `sync_timestamp`

**ALERT_SUBSCRIPTIONS**: `id`, `email`, `query`, `location`, `search_radius`, `remote_work_only`, `category_id`, `job_type`, `tags`, `is_active`, `created_at`, `updated_at`, `sync_timestamp`
Suggested change: drop the bold formatting from the EMPLOYERS, CATEGORIES, and ALERT_SUBSCRIPTIONS table names.
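For reference, the three tables above are typically declared in the connector's `schema` function along the following lines; the primary keys are an assumption based on the `id` columns and are not confirmed from the PR.

```python
def schema(configuration: dict):
    """Illustrative schema declaration for the three tables listed above.

    The primary keys are an assumption based on the `id` columns; the actual
    connector may also declare explicit column types.
    """
    return [
        {"table": "employers", "primary_key": ["id"]},
        {"table": "categories", "primary_key": ["id"]},
        {"table": "alert_subscriptions", "primary_key": ["id"]},
    ]
```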
- [ibm_informix_using_ibm_db](https://github.com/fivetran/fivetran_connector_sdk/tree/main/connectors/ibm_informix_using_ibm_db) - This example shows how to connect and sync data from IBM Informix using Connector SDK. This example uses the `ibm_db` library to connect to the Informix database and fetch data.
- [influx_db](https://github.com/fivetran/fivetran_connector_sdk/tree/main/connectors/influx_db) - This example shows how to sync data from InfluxDB using Connector SDK. It uses the `influxdb3_python` library to connect to InfluxDB and fetch time-series data from a specified measurement.
- [iterate](https://github.com/fivetran/fivetran_connector_sdk/tree/main/connectors/iterate) - This example shows how to sync NPS survey data from the Iterate REST API and load it into your destination using Connector SDK. The connector fetches NPS surveys and their individual responses, providing complete survey analytics data for downstream analysis.
- [jboard](https://github.com/fivetran/fivetran_connector_sdk/tree/main/connectors/jboard) - This example shows how to sync employers, job categories, and alert subscriptions data from Jboard API to your destination warehouse. You need to provide your Jboard API key for this example to work.
Suggested change: write "from the Jboard API" rather than "from Jboard API" in the jboard entry.
Jboard Connector
Created: 2025-01-31
Business Owner: Talent Acquisition & Recruitment Operations Team
Technical Owner: Data Engineering Team
Last Updated: 2025-01-31
Business Context
Technical Context
Operational Context
API-Specific Details
- `/employers` - Employer profiles, company information, and job posting metadata
- `/categories` - Job categories with hierarchical structure and organization
- `/alert_subscriptions` - User job alert subscriptions with search criteria and preferences

Data Schema Overview
Data Replication Expectations
- `created_at_from` filters

Operational Requirements
Rate Limiting Strategy
Data Quality Considerations
Integration Points
Disaster Recovery
Compliance & Security
Performance Optimization
Troubleshooting Guide
Checklist
Some tips and links to help validate your PR:
- Validate the connector locally with the `fivetran debug` command.