By now, you will be aware that Dalet Flex is a media management platform, and as such it must deal with large files and long-running operations. Certain actions, such as transcoding and publishing, may take a long time to run. Some actions also require access to resources to get work done. For example, a transcode action requires access to a transcode resource. Although Dalet Flex could run an action immediately at your request, this could cause resource contention and make the user interface or API slow to respond. It would also limit how actions can be used, as you may want a specific action to run at a specific time in the future. For example, you might want to publish an asset to YouTube next Sunday at 09:00.
For all of these reasons, actions are run inside jobs. These jobs are managed by Dalet Flex's internal job scheduler. The job scheduler is responsible for handling the storage, retrieval, prioritization, and execution of jobs.
A job is a unit of work that comprises an action (which defines the work) and additional properties such as start time, priority, status, and owner. Jobs run as transactions, which means they either complete, or fail and roll back to their original state. A job runs in its own context, which means it is completely isolated from all other jobs and can be run on any Dalet Flex node in a Dalet Flex cluster that supports this job type. A job contains all the information it needs to run in isolation, including configuration information. This information is stored in the job's "context". This is important, as it means that if a job fails, it can be retried.
Job Properties
- Start Time: The time that the job has been scheduled to begin execution. There are a few scenarios in which the actual start time of a job may be delayed. This could happen if:
  - There are other higher priority jobs that require execution.
  - The resources that are required for the job are contended.
- End Time: The time that the job completed execution (assuming it has completed).
- Duration: The time the job took to run.
- Priority: The priority of the job e.g. Lowest, Low, Normal, High, Highest.
- Owner: The user that owns the job.
- Status: The current status of the job e.g. Scheduled, Running, Failed, Completed.
- Retries: The number of times that the job has been retried. A job that has failed can be retried.
- Action: This is the action (defined work) that the job is carrying out on your behalf.
- Configuration: When a job is created, the configuration is copied from the action to the job. This means that if a job fails, you can amend the configuration for a specific job and then retry it.
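The properties above can be pictured as a simple record. The following is a minimal sketch only, not the Dalet Flex data model: the class and function names (`Job`, `Priority`, `create_job`, `retry_with`), the numeric priority values, and the assumption that a retried job returns to Scheduled are all hypothetical. It illustrates why copying the configuration from the action to the job allows a single job to be amended and retried without touching the action.

```python
import copy
from dataclasses import dataclass
from datetime import datetime
from enum import IntEnum
from typing import Any, Dict, Optional


class Priority(IntEnum):
    """Priority levels named in the text; the numeric values are assumed."""
    LOWEST = 1
    LOW = 2
    NORMAL = 3
    HIGH = 4
    HIGHEST = 5


@dataclass
class Job:
    action: str                             # name of the action the job carries out
    owner: str                              # user that owns the job
    configuration: Dict[str, Any]           # the job's own copy of the action configuration
    priority: Priority = Priority.NORMAL
    status: str = "Created"
    start_time: Optional[datetime] = None   # scheduled start; None until scheduled
    end_time: Optional[datetime] = None     # set once the job has completed
    retries: int = 0


def create_job(action: str, action_config: Dict[str, Any], owner: str) -> Job:
    """Copy the action's configuration so the job can be amended independently."""
    return Job(action=action, owner=owner,
               configuration=copy.deepcopy(action_config))


def retry_with(job: Job, **changes: Any) -> Job:
    """Amend this job's configuration only, then record the retry."""
    job.configuration.update(changes)
    job.retries += 1
    job.status = "Scheduled"  # assumption: a retried job is scheduled again
    return job
```

For example, after `job = create_job("transcode", {"bitrate_kbps": 5000}, "alice")`, calling `retry_with(job, bitrate_kbps=8000)` changes only the job's copy; the dictionary that stands in for the action's configuration keeps its original value.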
Job States
- Created: This state is applied to a job when it has been created but has not yet been scheduled. A job can be created in three ways: as part of a workflow node executing, as a result of a user action, or as a result of an API call. When a job is first created, it is saved to Dalet Flex's internal database. At this point it is considered created. Dalet Flex's job scheduler ignores jobs that do not have a scheduled start time.
- Scheduled: When a job is allocated a start time it is considered scheduled. Once a start time has been set, the job scheduler will periodically check to see whether it is due to be run.
- Pending: A job is pending when its start time has arrived (the current time is equal to or later than the start time) but it has not yet been added to the Dalet Flex job scheduler's internal queue because the queue is full.
- Queued: When a job is queued, it means that it has been added to the Dalet Flex job scheduler's internal queue and will execute as soon as the required resources become available.
- Waiting for Lock: Some jobs require exclusive access to a Dalet Flex object such as an asset. The Waiting for Lock state indicates that the job is due to run, but that it is waiting for a lock. In order for the job to run safely against an asset, it must first obtain an exclusive lock. If the job cannot obtain a lock when it begins running, its state is set to Waiting for Lock and it is added to a lock queue. When the job that owns the lock relinquishes it (because that job has completed), the lock is given to the next job in the queue.
- Running: This state indicates that a job has been started by the Dalet Flex job scheduler and is currently running. This implies that the code inside the associated action is being run.
- Timed Out: If the action associated with a job has a time-out value set and the job has been running longer than the time-out period, the job is set to Timed Out. When a job begins working on an action that has a time-out period, it sets a timer for the duration of that period. If the time-out period is reached before the job has completed, the job scheduler sets the job status to Timed Out. This does not (as might be expected) stop the job running; it merely sets an indicator that the job has been running longer than expected. This becomes powerful when an event filter is set up for timed out jobs, because users can then be made aware that something may be wrong. Time-outs are particularly useful when a job connects to a third-party system, for example a transcode server: a time-out may provide an early warning that the transcoder is not performing as expected and may have an internal fault. Please note that Dalet Flex never cancels running jobs, for the simple reason that a job stopped midway could be left in an illegal state and corrupt data or other external systems.
- Failed: A failed job can be either retried or cancelled.
- Cancelled: Once a job has been cancelled this cannot be changed. As a result the job cannot be retried, scheduled etc.
- Completed: A completed job has successfully completed execution. Once a job has completed it cannot be retried, scheduled etc.
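The states above can be read as a small state machine. The sketch below is an interpretation of the descriptions, not a documented Dalet Flex API. The text states only that a failed job can be retried or cancelled and that Cancelled and Completed are final. The rest of the transition table, the assumption that a retry returns the job to Scheduled, and all names (`JobState`, `TRANSITIONS`, `can_transition`, `flag_if_timed_out`) are hypothetical.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class JobState(Enum):
    CREATED = "Created"
    SCHEDULED = "Scheduled"
    PENDING = "Pending"
    QUEUED = "Queued"
    WAITING_FOR_LOCK = "Waiting for Lock"
    RUNNING = "Running"
    TIMED_OUT = "Timed Out"
    FAILED = "Failed"
    CANCELLED = "Cancelled"
    COMPLETED = "Completed"


S = JobState

# Transition table inferred from the state descriptions (an assumption).
TRANSITIONS = {
    S.CREATED: {S.SCHEDULED},
    S.SCHEDULED: {S.PENDING, S.QUEUED},
    S.PENDING: {S.QUEUED},
    S.QUEUED: {S.RUNNING, S.WAITING_FOR_LOCK},
    S.WAITING_FOR_LOCK: {S.RUNNING},
    S.RUNNING: {S.TIMED_OUT, S.FAILED, S.COMPLETED},
    S.TIMED_OUT: {S.FAILED, S.COMPLETED},  # a timed out job keeps running
    S.FAILED: {S.SCHEDULED, S.CANCELLED},  # retry (assumed to reschedule) or cancel
    S.CANCELLED: set(),                    # final: cannot be retried or scheduled
    S.COMPLETED: set(),                    # final: cannot be retried or scheduled
}


def can_transition(current: JobState, target: JobState) -> bool:
    """Return True if the inferred table allows moving from current to target."""
    return target in TRANSITIONS[current]


def flag_if_timed_out(state: JobState, started_at: datetime,
                      timeout: Optional[timedelta], now: datetime) -> JobState:
    """Mark a running job as Timed Out without stopping it.

    Only the returned state changes; nothing here interrupts the job.
    """
    if state is S.RUNNING and timeout is not None and now - started_at > timeout:
        return S.TIMED_OUT
    return state
```

Under these assumptions, `can_transition(JobState.FAILED, JobState.CANCELLED)` is allowed, while any transition out of `JobState.COMPLETED` or `JobState.CANCELLED` is rejected.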
Group Jobs
A group job is a job that comprises one or more member (child) jobs. A group job structure provides a way of grouping together logically related jobs. A group job is created when an action that offers group member execution is executed against an asset group. When this happens, Dalet Flex's job scheduler creates a group job, and then creates a sub-job for each member (child) contained in the asset group. The sub-jobs are all scheduled at the same time, but each runs as a separate job.
The state of a group job is managed by Dalet Flex's job scheduler in the following way:
- If one or more child jobs are running: The group job is set to running.
- If one or more child jobs fail: The group job is set to failed. Retrying the group job will retry only the failed jobs.
- If one or more child jobs time out: The group job is set to timed out. Retrying the group job will retry only the timed out jobs.
- If all child jobs complete: The group job is set to completed.
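The rules above can be sketched as a function that derives the group state from the child states. The text does not say which rule wins when several apply at once (for example, one child failed while another is still running), so the precedence used here (failed, then timed out, then running) is an assumption, as are the function names `group_job_state` and `children_to_retry`.

```python
from typing import Dict, Iterable, List, Optional


def group_job_state(child_states: Iterable[str]) -> Optional[str]:
    """Derive a group job's state from its children.

    Returns None when none of the listed rules applies,
    for example when every child is still scheduled.
    """
    states = list(child_states)
    if not states:
        return None
    if "Failed" in states:        # assumed to take precedence
        return "Failed"
    if "Timed Out" in states:
        return "Timed Out"
    if "Running" in states:
        return "Running"
    if all(state == "Completed" for state in states):
        return "Completed"
    return None


def children_to_retry(children: Dict[str, str]) -> List[str]:
    """Return the child job IDs that a retry of the group job would rerun."""
    group_state = group_job_state(children.values())
    if group_state not in ("Failed", "Timed Out"):
        return []
    # Only the children in the same state as the group are retried.
    return [job_id for job_id, state in children.items() if state == group_state]
```

For instance, `children_to_retry({"a": "Completed", "b": "Failed", "c": "Failed"})` selects only `b` and `c`, leaving the completed child untouched.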
Timed Action Jobs
When a timed action runs, it is run as a timed action job in the same way that jobs and group jobs are run. Although a timed action executes at regular intervals rather than once, as normal jobs do, it is always associated with a single job instance. This job instance is then run according to the intervals configured in the timed action. Timed action jobs can be viewed in the Job Dashboard and the Job Details sections of the Dalet Flex console.
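To illustrate the idea of a single job instance that is run repeatedly, here is a minimal sketch. The class `TimedActionJob`, its fields, and the `tick` method are hypothetical and do not reflect how Dalet Flex implements timed actions. The point is only that the same job ID is reused for every run, with the next run moved forward by the configured interval.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class TimedActionJob:
    job_id: int                  # the single job instance tied to the timed action
    interval: timedelta          # interval configured in the timed action
    next_run: datetime           # when the job is next due
    run_history: List[datetime] = field(default_factory=list)

    def tick(self, now: datetime) -> bool:
        """Run the job if it is due, then move next_run forward by one interval."""
        if now < self.next_run:
            return False
        self.run_history.append(now)    # stands in for executing the action
        self.next_run += self.interval
        return True
```

A scheduler loop would call `tick` periodically with the current time. Each successful run is recorded against the same `job_id` rather than creating a new job.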