  1. camunda BPM
  2. CAM-5284

I can use a long error message with an External Task

    Details

    • Type: Feature Request
    • Status: Closed
    • Priority: L3 - Default
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.6.0, 7.6.0-alpha3
    • Component/s: engine
    • Labels:

      Description

      While working on a new model, our service tasks generated errors with backtraces that were too large to be stored in the various message-related fields in the database. The engine crashed very ungracefully when that happened. Our fix was to change the table definitions so that message fields are defined as `text` instead of `varchar(4000)`.
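The schema change described above can be sketched as a small script that rewrites the column type in a DDL file before the database is created. This is only an illustration: the sample table and column names are hypothetical, and the real create script may spell the type differently.

```python
import re

# Matches `varchar(4000)` in any letter case, tolerating spaces around the number.
VARCHAR_4000 = re.compile(r"\bvarchar\s*\(\s*4000\s*\)", re.IGNORECASE)


def widen_message_columns(ddl: str) -> str:
    """Return the DDL with every varchar(4000) column type replaced by text."""
    return VARCHAR_4000.sub("text", ddl)


# Hypothetical table definition, used only to show the effect of the rewrite.
sample = """create table EXAMPLE_TASK (
    ID_ varchar(64),
    ERROR_MSG_ varchar(4000)
);"""

patched = widen_message_columns(sample)
print(patched)
```

Only columns declared with exactly 4000 characters are touched, so shorter identifier columns keep their bounded type.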

      I'm not sure if that is really the best fix. It might be better to have a limit and enforce it by the engine. In any case, running into a database error causes trouble that requires an engine restart, which should be avoided.

      I tried to attach our changed `sql/create/postgres_engine_7.4.1-ee.sql` file for your consideration, but JIRA gave me the error message "No project could be found with id '10330'". Something on JIRA's end seems to be broken for file uploads. The only change that I made was the column type change as described above.

      1. catalina.out
        11 kB
        Hans Hübner
      2. postgres_engine_7.4.1-ee.sql
        26 kB
        Hans Hübner

        Issue Links

          Activity

          hans.huebner@lambdawerk.com Hans Hübner created issue -
          gimbel Robert Gimbel made changes -
          Field Original Value New Value
          Project camunda BPM Community Extensions [ 10330 ] camunda BPM [ 10230 ]
          Key EXT-56 CAM-5284
          gimbel Robert Gimbel added a comment -

          Hi Hans,

          Thank you for your feedback.

          You raised the issue in our Extensions project, but I think it is related to the camunda platform directly, so I moved it there.

          Can you please try to attach the file again?

          Thanks
          Robert

          hans.huebner@lambdawerk.com Hans Hübner made changes -
          Attachment postgres_engine_7.4.1-ee.sql [ 21366 ]
          hans.huebner@lambdawerk.com Hans Hübner added a comment -

          Attaching the file worked now, see above. I could not file the issue into the camunda BPM project, probably I don't have permission for that?

          hans.huebner@lambdawerk.com Hans Hübner added a comment -

          This problem occurs in other places as well, see the attached backtrace. We're going to modify the Postgres schema for our needs, but it seems to me that the problem should be addressed at a different level because updating the database schema won't be a good option once an application is deployed to production.

          hans.huebner@lambdawerk.com Hans Hübner made changes -
          Attachment catalina.out [ 21368 ]
          meyer Daniel Meyer made changes -
          Assignee Daniel Meyer [ meyer ]
          meyer Daniel Meyer added a comment -

          Hi Hans,

          > In any case, running into a database error causes trouble that requires an engine restart, which should be avoided.

          That should not be the case. Why did you have to restart the process engine?

          Back to the core of the issue:
          I gathered from the stacktrace that you are implementing an external task. The external task feature is quite new (introduced with 7.4). We will polish it based on user feedback. What we could do is employ the same pattern for External task that is also used for Jobs:

          • The message is a short(er) description of the error and has some maximum size.
          • A larger error trace can be provided as well. This has no size restriction and is stored in a separate table (ACT_GE_BYTEARRAY).
            In addition, the API could verify the maximum length of the string instead of relying on the database.

          This is how it is done for jobs and we could do it in the same way for external tasks.
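A minimal sketch of the pattern described above, assuming a 4000-character limit for the short message (taken from the `varchar(4000)` columns discussed in this issue). The `FailureStore` class and the `report_failure` function are hypothetical stand-ins and not the engine's API; the point is only that the length check happens in application code and the unbounded trace is kept separately.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

MAX_MESSAGE_LENGTH = 4000  # assumed limit, mirrors the varchar(4000) columns


class MessageTooLongError(ValueError):
    """Raised by the application layer before anything reaches the database."""


@dataclass
class FailureStore:
    """Hypothetical in-memory stand-in for the two storage locations."""
    messages: Dict[str, str] = field(default_factory=dict)   # bounded column
    details: Dict[str, bytes] = field(default_factory=dict)  # unbounded byte array


def report_failure(store: FailureStore, task_id: str, message: str,
                   details: Optional[str] = None) -> None:
    """Store a short message and, optionally, an unbounded error trace."""
    if len(message) > MAX_MESSAGE_LENGTH:
        raise MessageTooLongError(
            f"message has {len(message)} characters, limit is {MAX_MESSAGE_LENGTH}"
        )
    store.messages[task_id] = message
    if details is not None:
        store.details[task_id] = details.encode("utf-8")


store = FailureStore()
report_failure(store, "task-1", "command failed", details="stack trace line\n" * 1000)
```

With this split, a caller that passes an overlong message gets an application-level error it can handle, while a full stack trace of any size still fits in the details.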

          All the best,
          Daniel

          hans.huebner@lambdawerk.com Hans Hübner added a comment - - edited

          Daniel,

          thank you for getting back. We are in fact using external tasks, and the
          problem with columns of insufficient length occurred in different
          situations. When the first incident occurred, the engine completely lost
          its mind and all subsequent database operations failed. I agree with you
          that this should not happen, but it did. Niall looked on the screen
          together with me and he can confirm.

          I'm not quite as sure whether this problem can safely be addressed on a
          case-by-case basis. If your code does not know how large the underlying
          database columns are, for all fields, it is always possible that something
          slips through and causes unrecoverable errors in your persistence layer.
          For that reason, I would suggest that those columns that do not have
          a well-defined bounded maximum length in your application code be defined
          as being of unlimited size in the database layout. That is at least what I
          have done now, and as the issue occurred with different tables and columns,
          I feel much safer this way until you can confirm that the application code
          makes sure that column length overflows do not occur.

          Thanks,
          Hans

          meyer Daniel Meyer added a comment -

          Hi Hans,

          thank you for getting back to us.

          > When the first incident occurred, the engine completely lost its mind and all subsequent database operations failed

          Would it be possible for you to reproduce this in a unit test?
          Then we can reproduce the problem and comment on it in a better way.

          As an aside: camunda is architected in a way that, in theory, this cannot happen. But I want to check...

          Thanks,
          Daniel

          hans.huebner@lambdawerk.com Hans Hübner added a comment -

          Hi Daniel,

          I am unable to reproduce the complete crash, but if I encounter it again, I will let you know. It seemed to be a follow-on problem to the original issue caused by the lack of space in a column. I am currently working with a modified schema that does not have a size restriction on the various message fields (which I prefer to truncating the message anyway), so it is unlikely that I'm going to run into this by accident now.

          We can live with using a non-standard schema for now, yet this is only a temporary solution and we'd hope that this bug be fixed for external tasks so that we can safely upgrade our database schema from your scripts.

          meyer Daniel Meyer added a comment -

          Hi Hans,

          we can validate the max-length of strings in the Java code instead of only in the DB. We will discuss internally whether and when we will do this.

          Concerning the statement:

          > When the first incident occurred, the engine completely lost its mind and all subsequent database operations failed.

          The "org.postgresql.util.PSQLException: ERROR: value too long for type character varying(255)" will only roll back the current transaction. It will not have any effect on subsequent transactions. What could have happened is that some other component, such as your custom infrastructure for performing external tasks, repeatedly retried submitting the same overlong error message. After the fix I proposed above, you would still see many exceptions, but they would be ProcessEngineExceptions instead of database errors.
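The claim that a length error only affects the current transaction can be illustrated with a small, self-contained sketch. It uses SQLite rather than PostgreSQL, and because SQLite does not enforce `varchar` lengths, a CHECK constraint stands in for the "value too long" error; the table and column names are hypothetical. It shows the general transactional behaviour only and says nothing about what happened in the reported incident.

```python
import sqlite3

# In-memory database; the CHECK constraint emulates a varying(255) column limit.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE failure_log ("
    " id INTEGER PRIMARY KEY,"
    " msg TEXT CHECK (length(msg) <= 255))"
)


def store_message(msg: str) -> bool:
    """Insert one message in its own transaction; report whether it was stored."""
    try:
        with conn:  # commits on success, rolls back if the insert raises
            conn.execute("INSERT INTO failure_log (msg) VALUES (?)", (msg,))
        return True
    except sqlite3.IntegrityError:
        return False


# The overlong message fails; the following, independent transaction succeeds.
results = [store_message("x" * 300), store_message("short message")]
row_count = conn.execute("SELECT COUNT(*) FROM failure_log").fetchone()[0]
```

A client that keeps resubmitting the same overlong message would see the first outcome on every attempt, which can look like a persistent failure even though each transaction is independent.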

          How did you fetch and complete the external tasks?

          Does that make sense to you?

          All the best,
          Daniel

          hans.huebner@lambdawerk.com Hans Hübner added a comment - - edited

          Hi Daniel,

          it is unfortunate that we have not kept the original error message when the engine completely crashed. The message was rather scary and not the same as the ones that we've been seeing for the overly long column values. Sorry about this.

          With respect to external task fetching and completion, we use a sequence like this:

          1. GET /external-task to select a topic that we want to work on
          2. POST /external-task/fetchAndLock to actually fetch and lock the task that we've selected
          3. GET /process-instance/<process-instance-id>/variables to get the list of variables of the process (we don't know them in advance so we can't supply the list in the fetchAndLock call)
          4. POST /external-task/<external-task-id>/complete to complete the task
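The four calls above can be sketched as request builders. This is an illustration under assumptions: `BASE_URL`, the worker id, the lock duration and the helper names are hypothetical, and the JSON field names (`workerId`, `maxTasks`, `topics`, `topicName`, `lockDuration`) should be checked against the REST API documentation for the engine version in use. The requests are only constructed here; passing one to `urllib.request.urlopen` would actually send it.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8080/engine-rest"  # assumed deployment URL


def _post(path: str, payload: dict) -> Request:
    """Build a JSON POST request against the assumed base URL."""
    return Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_tasks_request() -> Request:
    # Step 1: list external tasks to pick a topic to work on.
    return Request(BASE_URL + "/external-task", method="GET")


def fetch_and_lock_request(worker_id: str, topic: str, lock_ms: int = 60000) -> Request:
    # Step 2: fetch and lock one task for the chosen topic; no variables requested.
    return _post("/external-task/fetchAndLock", {
        "workerId": worker_id,
        "maxTasks": 1,
        "topics": [{"topicName": topic, "lockDuration": lock_ms}],
    })


def variables_request(process_instance_id: str) -> Request:
    # Step 3: read all variables of the process instance.
    return Request(f"{BASE_URL}/process-instance/{process_instance_id}/variables",
                   method="GET")


def complete_request(task_id: str, worker_id: str) -> Request:
    # Step 4: complete the locked task.
    return _post(f"/external-task/{task_id}/complete", {"workerId": worker_id})
```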

          The reason why we're using additional GETs is that we use the topic name as configuration parameter for the external task. Our external task handler executes shell commands, and the topic begins with "shell" and then has the command to execute appended. Likewise, as we do not know what variables the shell command wants to look at, we are not sending any variables in the fetchAndLock call, but rather fetch all process variables and make them available to the command in the environment.
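A rough sketch of such a handler follows. The comment only says that the topic begins with "shell" and has the command appended, so the `shell:` separator is an assumption, as are all function names. The variable layout (`{"name": {"value": ...}}`) follows the shape returned by the variables endpoint as far as I know and should be verified.

```python
import os
import subprocess
from typing import Dict

TOPIC_PREFIX = "shell:"  # assumed separator between "shell" and the command


def command_from_topic(topic: str) -> str:
    """Extract the shell command that is appended to the topic name."""
    if not topic.startswith(TOPIC_PREFIX):
        raise ValueError(f"not a shell topic: {topic!r}")
    return topic[len(TOPIC_PREFIX):].strip()


def environment_from_variables(variables: Dict[str, dict]) -> Dict[str, str]:
    """Copy the current environment and add each process variable as a string."""
    env = dict(os.environ)
    for name, entry in variables.items():
        value = entry.get("value")
        if value is not None:  # skip null variables, they have no string form
            env[name] = str(value)
    return env


def run_task(topic: str, variables: Dict[str, dict]) -> subprocess.CompletedProcess:
    """Run the topic's command with all process variables in its environment."""
    return subprocess.run(
        command_from_topic(topic),
        shell=True,
        env=environment_from_variables(variables),
        capture_output=True,
        text=True,
    )
```

The captured `stderr` of a failing command is exactly the kind of output that can exceed a bounded message column, which is how a handler like this runs into the limit discussed in this issue.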

          We'd prefer to be able to annotate the model with task parameters (i.e. have "shell" be the topic name, put the command to execute into a named external task parameter and maybe also be able to have other task parameters that'd allow us to choose one of several executors for the task), but that does not seem to be possible from either a modeler or a model perspective at this point.

          Thanks,
          Hans

          hans.huebner@lambdawerk.com Hans Hübner added a comment -

          Hello,

          are there any plans to change the `varchar(4000)`s in the Postgres schema to `text`? We're currently doing that manually, and I see no reason why one would want error messages and other textual information to be cut at an arbitrary size.

          Thanks,
          Hans

          matthijs.burke Matthijs Burke added a comment -

          Good morning Hans,

          Thanks for your input on this issue.

          A short question: would you like us to move this issue to our support project?
          The difference between the Support project and the Camunda BPM project is that when raising issues in the Camunda BPM project, they are not subject to the agreed SLAs and they can be viewed by all users. In contrast, issues raised in the Support project can only be seen by your authorized support contacts and us. You can find more information in our documentation.
          If you would like us to move this issue, please let us know.

          Thank you and best regards,
          Mat

          hans.huebner@lambdawerk.com Hans Hübner added a comment -

          Hi Mat,

          I frankly don't care so much where this request is being tracked, and it is also not an urgent matter in that we're currently changing the database schema creation files manually before creating the Camunda BPM database. It is simply an annoyance and something that will be disturbing in production, when either the engine crashes because it wants to write an overlong message or when someone tries to diagnose a problem, finding that the error message has been cut after 4000 characters.

          Thanks,
          Hans

          matthijs.burke Matthijs Burke made changes -
          Link This issue is related to SUPPORT-2426 [ SUPPORT-2426 ]
          matthijs.burke Matthijs Burke added a comment -

          Good morning Hans,

          we have raised a separate issue in our Support project and have linked it to this issue: SUPPORT-2426. We will take a deeper look into this in the context of our product support and will respond in the Support issue.

          Thank you and best regards,
          Mat

          michael.schoettes Michael Schoettes made changes -
          Link This issue is related to SUPPORT-2426 [ SUPPORT-2426 ]
          michael.schoettes Michael Schoettes made changes -
          Link This issue is depended on by SUPPORT-2426 [ SUPPORT-2426 ]
          michael.schoettes Michael Schoettes made changes -
          Issue Type Bug Report [ 1 ] Feature Request [ 2 ]
          meyer Daniel Meyer made changes -
          Summary Postgres schema does not contain enough space for larger errors I can use a long error message with an External Task
          michael.schoettes Michael Schoettes made changes -
          Labels SUPPORT
          michael.schoettes Michael Schoettes made changes -
          Component/s engine [ 11656 ]
          meyer Daniel Meyer made changes -
          Assignee Daniel Meyer [ meyer ] Askar Akhmerov [ askar.akhmerov ]
          gimbel Robert Gimbel made changes -
          Fix Version/s 7.6.0 [ 14490 ]
          askar.akhmerov Askar Akhmerov made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          askar.akhmerov Askar Akhmerov made changes -
          Status In Progress [ 3 ] Open [ 1 ]
          askar.akhmerov Askar Akhmerov made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          askar.akhmerov Askar Akhmerov made changes -
          Remote Link This issue links to "Page (camunda confluence)" [ 10952 ]
          askar.akhmerov Askar Akhmerov made changes -
          Status In Progress [ 3 ] Resolved [ 5 ]
          Original Estimate 0 minutes [ 0 ]
          Remaining Estimate 0 minutes [ 0 ]
          Resolution Fixed [ 1 ]
          askar.akhmerov Askar Akhmerov made changes -
          Assignee Askar Akhmerov [ askar.akhmerov ] Philipp Ossler [ philipp.ossler ]
          askar.akhmerov Askar Akhmerov made changes -
          Status Resolved [ 5 ] In Progress [ 3 ]
          askar.akhmerov Askar Akhmerov added a comment -

          multi tenancy test is missing

          askar.akhmerov Askar Akhmerov made changes -
          Assignee Philipp Ossler [ philipp.ossler ] Askar Akhmerov [ askar.akhmerov ]
          askar.akhmerov Askar Akhmerov added a comment -

          added missing test

          askar.akhmerov Askar Akhmerov made changes -
          Status In Progress [ 3 ] Resolved [ 5 ]
          Assignee Askar Akhmerov [ askar.akhmerov ] Philipp Ossler [ philipp.ossler ]
          philipp.ossler Philipp Ossler made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          philipp.ossler Philipp Ossler made changes -
          Assignee Philipp Ossler [ philipp.ossler ] Askar Akhmerov [ askar.akhmerov ]
          askar.akhmerov Askar Akhmerov made changes -
          Status Reopened [ 4 ] In Progress [ 3 ]
          askar.akhmerov Askar Akhmerov made changes -
          Status In Progress [ 3 ] Resolved [ 5 ]
          Assignee Askar Akhmerov [ askar.akhmerov ] Philipp Ossler [ philipp.ossler ]
          Resolution Fixed [ 1 ]
          philipp.ossler Philipp Ossler made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Assignee Philipp Ossler [ philipp.ossler ] Askar Akhmerov [ askar.akhmerov ]
          hans.huebner@lambdawerk.com Hans Hübner added a comment -

          I don't get to see much of what you're doing, but I would like to point out that by "long error message" I mean something which can include a complete Java stack trace, and that easily amounts to a few kilobytes.

          askar.akhmerov Askar Akhmerov made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Assignee Askar Akhmerov [ askar.akhmerov ] Philipp Ossler [ philipp.ossler ]
          Resolution Fixed [ 1 ]
          philipp.ossler Philipp Ossler made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Assignee Philipp Ossler [ philipp.ossler ]
          askar.akhmerov Askar Akhmerov made changes -
          Remote Link This issue links to "Page (camunda confluence)" [ 10952 ] This issue links to "Page (camunda confluence)" [ 10952 ]
          matthijs.burke Matthijs Burke made changes -
          Link This issue is related to SUPPORT-2578 [ SUPPORT-2578 ]
          gimbel Robert Gimbel made changes -
          Fix Version/s 7.6.0-alpha3 [ 14609 ]
          michael.schoettes Michael Schoettes made changes -
          Link This issue is related to SUPPORT-2578 [ SUPPORT-2578 ]
          michael.schoettes Michael Schoettes made changes -
          Link This issue is depended on by SUPPORT-2578 [ SUPPORT-2578 ]
          tassilo.weidner Tassilo Weidner made changes -
          Link This issue is related to CAM-8832 [ CAM-8832 ]
          kerstin.hebel Kerstin Hebel made changes -
          Remote Link This issue links to "Page (camunda confluence)" [ 10952 ]
          thorben.lindhauer Thorben Lindhauer made changes -
          Workflow camunda BPM [ 37365 ] Backup_camunda BPM [ 61043 ]

            People

            • Assignee:
              Unassigned
              Reporter:
              hans.huebner@lambdawerk.com Hans Hübner
            • Votes:
              0
              Watchers:
              8
