Overview
This article explains how to correctly troubleshoot issues with tiles. The issues are mainly with two types of actions: Viewing a tile and its contents, and Removing/Reinstalling a tile.
Workflow
Instructions
Each blue rectangle represents a troubleshooting procedure and links to a section in this article:
- Reproduce the issue
- Clear the browser cache
- Investigate the browser error console
- Troubleshoot network tab
- Build the edit link manually
- Troubleshoot tiles via the edit page
- Troubleshoot error logs
- Troubleshoot the HTTP error code
- Troubleshoot in the database
- Troubleshoot in the source code
- Open JIRA ticket
- Troubleshoot Corrupt Image Related Errors
- Enable Debug Logging and Uninstall
- Check for Orphan Tiles
- Delete Orphan Tiles from DB
- Investigate the Cause and Recreate the Tile
- Verify the behavior directly in the Document
- Troubleshoot the News Stream
- Regenerate Javascript
- Clear Application Cache and Rolling Restart
- Verify and Close Ticket
Reproduce the Issue
As a first step, verify whether you can reproduce the customer's reported issue on your end. When authorized, connect to their environment and attempt to reproduce the issue from your own browser. This can help verify whether the issue is specific to their browser cache or potentially isolated to their internal network.
Clear the browser cache
If you are not able to reproduce the behavior reported within your own browser, suggest that the customer clear their browser's cache and cookies to ensure that these are not a source for the reported behavior.
- Clear cache & cookies in Google Chrome
- Clear your browsing history in Safari on Mac
- How to clear the Firefox cache
- View and delete browser history in Microsoft Edge
Investigate the browser error console
The next step in the troubleshooting process is checking the errors in the browser console.
The browser console can be accessed in Chrome by pressing F12 → Console tab, and in other browsers in similar ways (consult your browser’s documentation).
While reproducing the issue, watch the Console for any errors.
Both “Warning” and “Error” level messages may be useful.
Examples of common errors:
- CORB
- HTTP 404
- Javascript errors
- HTTP 401
As shown in the examples above, HTTP 400 series errors may appear both in warnings and errors, depending on the portion of code where the error has occurred.
If you find a specific HTTP error, proceed with troubleshooting the HTTP error code.
Javascript errors like the one shown (Cannot read property ‘style’ of null) have a broader impact as the execution of javascript code usually stops in this case, unless the exception is properly handled.
If you find errors, please check in other browsers. Make sure the browser used by the customer is the same as you are testing with, and make sure it is in the list of supported browsers for your Jive version.
Example:
After checking the console, if the page loads, you can proceed to check the network tab in the same browser interface for network errors. For various reasons, network errors do not always appear as errors in the browser console.
If the page does not load, proceed to Troubleshoot tiles via the edit page. If, for any reason, you cannot reach the Edit link, you can build it manually by following Build the edit link manually.
Troubleshoot network tab
The network tab of the developer’s console can be accessed in Chrome by pressing F12 → Network tab.
Ordering by “Status” is an easy way to find HTTP 400-Series and 500-Series errors.
Hovering over the first column quickly shows the endpoint:
Clicking a single request gives access to several valuable details about it (in Chrome, you always need to click on the first column for this to work, which is quite counterintuitive).
The “Headers” sub-tab contains the request (scroll down to the bottom for the Payload):
In this example, the API call tries to get the userBadge.
The “Response” tab contains the response:
The “Preview” tab will contain an HTML-rendered version of the response, which can be useful sometimes:
When you find an HTTP error code, proceed with the section Troubleshoot the HTTP error code.
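When a page issues many requests, it can be easier to export the Network tab as a HAR file (Chrome offers an "Export HAR" option) and filter it offline. The following is a minimal sketch under that assumption; the function name `failed_requests`, the sample URLs, and the file name in the comment are illustrative and not part of Jive.

```python
import json

def failed_requests(har, min_status=400):
    """Return (status, method, url) tuples for HAR entries with status >= min_status."""
    failures = []
    for entry in har.get("log", {}).get("entries", []):
        status = entry["response"]["status"]
        if status >= min_status:
            request = entry["request"]
            failures.append((status, request["method"], request["url"]))
    # Sorting by status mirrors ordering the "Status" column in the Network tab.
    return sorted(failures)

# Hypothetical sample data. In practice you would load an exported file, e.g.:
#   har = json.load(open("export.har", encoding="utf-8"))
sample_har = json.loads("""
{"log": {"entries": [
  {"request": {"method": "GET", "url": "https://instance.jiveon.com/api/core/v3/example"},
   "response": {"status": 404}},
  {"request": {"method": "GET", "url": "https://instance.jiveon.com/welcome"},
   "response": {"status": 200}},
  {"request": {"method": "POST", "url": "https://instance.jiveon.com/social/rpc"},
   "response": {"status": 401}}
]}}
""")

for status, method, url in failed_requests(sample_har):
    print(status, method, url)
```

Passing `min_status=500` restricts the list to server-side errors only.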
Build the edit link manually
Sometimes it is not possible to access the Edit link directly.
The Edit link is shown only once the page itself has rendered, so if the page fails to render entirely, you may not be able to reach the Edit link at all.
In order to manually build the edit link, perform the following steps:
- For the activity page:
- Append /api/v3 to the end of the URL of the space, sub-space, or group
- In the received JSON, the container id is found in the first line:
- At the very end, you will also find the typeCode:
- Build the Edit link as:
https://instance.jiveon.com/edit-place.jspa?containerType=T&containerID=N
Where N is the id you found at the first step, and T is the typeCode.
- For example, in this case it would be:
https://jivedemo-learning4trilogy.jiveon.com/edit-place.jspa?containerType=700&containerID=1266
- For the other pages:
- Find the page id by inspecting the page link:
- Build the Edit link as https://instance.jiveon.com/edit-page.jspa?pageId=N
Where N is the page id found in the previous step.
If you are able to access the page through the newly created link, proceed with the section Troubleshoot tiles via the edit page. Otherwise, proceed with Troubleshoot error logs.
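The link-building steps above can be sketched as two small helpers. This is only an illustration: the function names are hypothetical, and the page id 42 is a made-up value. The place example reuses the typeCode and container id shown in this section.

```python
from urllib.parse import urlencode

def place_edit_url(base_url, type_code, container_id):
    """Build the edit link for an activity page (space, sub-space, or group)."""
    query = urlencode({"containerType": type_code, "containerID": container_id})
    return f"{base_url.rstrip('/')}/edit-place.jspa?{query}"

def page_edit_url(base_url, page_id):
    """Build the edit link for any other page, given its page id."""
    query = urlencode({"pageId": page_id})
    return f"{base_url.rstrip('/')}/edit-page.jspa?{query}"

# Values taken from the example in this section.
print(place_edit_url("https://jivedemo-learning4trilogy.jiveon.com", 700, 1266))

# Hypothetical page id, for illustration only.
print(page_edit_url("https://instance.jiveon.com", 42))
```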
Troubleshoot tiles via the edit page
Once you reach the edit page, perform the following actions:
- Investigate the browser console (as described in Investigate the browser error console) and the network tab (as described in Troubleshoot network tab) - the reason is that sometimes, when in Edit mode, you can see more errors than what appears normally on the page.
- Search for broken images or broken configurations
- Try to edit and save the configuration of each tile
- With the Customer's approval, try removing tiles one at a time, saving the page, reproducing the error, and adding it back. Make sure to take note and a screenshot of each tile’s settings before removing it.
- Using this method, in most cases, you should be able to narrow down to the problematic tile
- Once you find the tile that has problems, look at the content or items configured inside it. Search for:
- Broken images
- Broken links
- Special characters
- External vs internal links
- Content that is too big or too small compared to the container in which it should fit (for example, big images and very long texts)
- Again, proceed by removing or adding content/items inside the tile, until you find what causes the issue
Troubleshoot error logs
When you encounter a server-side issue (not due to client-side javascript errors), you should be able to correlate to a log entry.
By investigating the logs (Kibana or sbs.log, access.log, and similar) you should be able to identify the specific error caused by the Tile.
Note: this is true only if the error is server-side - if the error is on the client-side (for example, a javascript or CORB error) you will not be able to find anything in the logs.
For example, searching for the Authentication Error shown in the example above, the one about “https://community.aurea.com/social/rpc?st=default%3AP…”, on Kibana it is possible to search for:
- status:401 AND community.aurea.com AND POST
In cases like this one, do not limit the search results to the specific instance, but remove the instance filter:
In this way, you would be able to discover errors like this:
In this example, the instance id is different because this is the cloud-frontdoor instance, which is generating a (wrong) 401 answer and thus generating the error.
If you find a specific error code, make sure to read the entire error message, as it may contain hints about what exactly is failing, and then proceed to Troubleshoot the HTTP error code.
Troubleshoot the HTTP error code
Once you get an HTTP Error code, you can consult several resources on the Internet to understand what the code exactly means. Here is a quick reference for you, with some additional Jive-specific information:
- 400 bad request: malformed request - uncommon, malformed HTTP request
- 401 unauthorized: possible authorization/authentication issue - this is often generated by the CDN when it thinks you are accessing resources without authorization, and also from several API endpoints when the auth token expired
- 403 forbidden: server-side forbidden (for example, a forbidden directory) - this is unusual in Jive, and it might be generated by reverse proxies and not from the Jive application server
- 404 not found: the resource is no longer available - typical Jive error. The resource is not there, either because it is not there anymore, or because of a malformed URL
- 500 internal server error: generic error at the server level - this is often caused by unhandled exceptions, which cause the code to fail without a specific error
- 502 bad gateway: the loadbalancer/reverse-proxy received an invalid response from the relevant server - most of the time for Jive it means that there is some outage, or that the call is pointing to the wrong endpoint (for HTTP 502, the latter is more likely to be the case)
- 503 service unavailable: the relevant server is temporarily unable to handle the request - most of the time for Jive it means that there is some outage, or that the call is pointing to the wrong endpoint (for HTTP 503, the former is more likely to be the case)
- 504 gateway timeout: the loadbalancer/reverse-proxy did not receive a response from the relevant server in time - usually an outage or a non-existing endpoint
Note: given the number of layers between the various components, the distinction between 502, 503 and 504 is difficult and often not relevant.
In theory, 502 means “the server replies, but in an incorrect way”, 503 means “the server is reachable, but cannot handle the call right now”, and 504 means “the server did not reply in time”.
But in a scenario in which A calls B, and B calls C, it could happen that C is down, B's call to C times out (a 504 condition), and B in turn replies to A with a different code, for example, 503.
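If you triage many requests at once, the quick reference above can be kept as a small lookup table. The sketch below only restates the notes from this section; the names `HTTP_HINTS` and `describe_status` are hypothetical, and the fallback messages are generic summaries rather than Jive-specific guidance.

```python
# Short summaries of the quick reference in this section.
HTTP_HINTS = {
    400: "Bad request: malformed HTTP request (uncommon).",
    401: "Unauthorized: possible auth issue, often from the CDN or an expired auth token.",
    403: "Forbidden: unusual in Jive, may come from a reverse proxy.",
    404: "Not found: resource missing or malformed URL (typical Jive error).",
    500: "Internal server error: often an unhandled exception.",
    502: "Bad gateway: outage or, more likely, a call to the wrong endpoint.",
    503: "Service unavailable: wrong endpoint or, more likely, an outage.",
    504: "Gateway timeout: usually an outage or a non-existing endpoint.",
}

def describe_status(code):
    """Return a troubleshooting hint for an HTTP status code."""
    if code in HTTP_HINTS:
        return HTTP_HINTS[code]
    if 400 <= code < 500:
        return "Other 4xx: inspect the request in the network tab."
    if 500 <= code < 600:
        return "Other 5xx: look for a matching entry in the error logs."
    return "Not an error status."

for code in (401, 502, 200):
    print(code, describe_status(code))
```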
Troubleshoot in the database
When you exhaust all other possible options, the error could be caused by some data inconsistency.
The most important tables involved are jivetileinstance, jivetilepage, and jivetile.
You can start from the following query and customize it to your needs:
SELECT to_timestamp(jti.modificationdate/1000) AS tile_modification_date,
to_timestamp(jtp.modificationdate/1000) AS page_modification_date,
jti.parentobjectid,
jti.instanceorder,
jti.tileid,
jti.tileinstanceid,
jti.instancecolumn,
jt.tilename,
jtp.pagetype,
jtp.displayname,
jtp.name,
jtp.tilepageid
FROM jivetileinstance jti
JOIN jivetile jt ON jt.tileid = jti.tileid
JOIN jivetilepage jtp ON jti.parentobjectid = jtp.tilepageid
ORDER BY jti.modificationdate DESC LIMIT 100;
Beware that not all databases support the “LIMIT” option (the standard for Jive AWS Cloud, PostgreSQL, supports it). If you do not put any LIMIT, make sure to put a WHERE clause.
Sample output:
From this output, you can understand what tiles are configured in a page.
You also get the pageid, which might be useful in the section Build the edit link manually.
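The query divides `modificationdate` by 1000 because the column holds milliseconds since the Unix epoch. If you need to convert a raw value outside the database, a minimal sketch could look like the following; the function name is hypothetical and the sample value is simply midnight on 1 January 2021 (UTC).

```python
from datetime import datetime, timezone

def millis_to_utc(millis):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    # Integer division mirrors modificationdate/1000 in the SQL query.
    return datetime.fromtimestamp(millis // 1000, tz=timezone.utc)

# Sample value chosen for illustration: 2021-01-01 00:00:00 UTC in milliseconds.
print(millis_to_utc(1609459200000).isoformat())
```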
Once you have the problematic tile instance id (column: tileinstanceid), you can investigate the configuration of the tile by looking at the config column in the jivetileinstance table, for example:
SELECT tileinstanceid, config FROM jivetileinstance WHERE tileinstanceid = 2316;
Sample output:
The result may not seem easy to read, but it is plain JSON.
You can investigate it by beautifying it in your editor of choice or in any online tool (codebeautify.org is used in the following example):
In this example, the next step would be to verify that all the links in the configuration, for both the image and the thumbnail, are existing and reachable in the instance.
Looking at the config, you can also replicate the same tile configuration in a test page and see if you are able to identify the error without compromising the original page.
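If you prefer not to paste customer data into an online tool, the config can be beautified locally, and the links it contains can be listed for checking. The sketch below assumes the config is valid JSON; the sample config, its keys, and the function names are made up for illustration and do not reflect the real structure of any particular tile.

```python
import json

def pretty_config(raw):
    """Return the config JSON re-indented for easier reading."""
    return json.dumps(json.loads(raw), indent=2, sort_keys=True)

def collect_links(node):
    """Recursively collect string values that look like links or paths."""
    links = []
    if isinstance(node, dict):
        for value in node.values():
            links.extend(collect_links(value))
    elif isinstance(node, list):
        for item in node:
            links.extend(collect_links(item))
    elif isinstance(node, str) and node.startswith(("http://", "https://", "/")):
        links.append(node)
    return links

# Hypothetical config value, standing in for the "config" column content.
raw_config = '{"items": [{"image": "https://example.com/a.png", "thumbnail": "/images/thumb.png", "title": "Hello"}]}'

print(pretty_config(raw_config))
for link in collect_links(json.loads(raw_config)):
    print("check:", link)
```

Each printed link can then be opened in the browser to confirm that it exists and is reachable in the instance.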
Troubleshoot in the source code
In case you need to investigate the source code, these are the major places where you can find the tile source code:
- Implementation (data providers) .java files for tiles
- Config pages for tiles
- Soy templates for tiles
Example of useful code places for the Carousel tile:
Remember that in GitHub you can press “t” to activate the file finder.
You can then write the tile name (example, “carousel”) and find the relevant files:
For example, in this way you can find the Carousel main javascript.
If there is no useful information in the logs and no clear root cause in the source code, it may be a bug. In this case, proceed with the section Open JIRA ticket.
Open JIRA ticket
If you think it is a bug, you should be able to reproduce it. If you suspect a bug but still cannot reproduce it, make sure you are reproducing under exactly the same conditions:
- same tiles
- same page
- same page layout
- same place name
- same theme
- same underlying document(s) or people (depending on what is shown in the tile)
Check whether the issue depends on a single node.
If you cannot reproduce it, it is most likely not a bug.
Start from scratch and repeat every step. Ask the customer if there is anything special for that specific piece of content(s).
When you can reproduce the issue, create a Jira ticket either in JVCLD or JVHOPST.
Troubleshoot Corrupt Image Related Errors
If the customer is trying to uninstall and reinstall a tile, and seeing an "Unexpected Error" message, you should check the SBS Log (or Kibana for Cloud). This might be happening because of corrupt or orphaned images.
If you see the error "Could not load container with type: *** and ID ***** for tile page: *****", you will have the information below:
- Whether the space where the tile lives is a social group (type: 700) or some other kind of place.
- The groupID and tilepageid
If you see the error "The storage provider was unable to delete the image data, key was 'image-*****", then you will have the image ids of the affected images.
- Run a query (in MagicQuery for Cloud, or PSQL/Oracle SQL Developer for On-Prem) like the one below to get more information on the place / social group:
select * from jivegroup where groupid = '12459';
- We can similarly isolate the image ids we see to confirm that they are linked to the add-on and not a document within the group that threw the error.
select imageid, objectid, filename from jiveimage where imageid in (203386,203387,203388);
- The objectid will match the internaldocid for a document if these are related, so we can chain this query on the document tables:
select internaldocid, documentid, containerid from jivedocument where internaldocid in (select objectid from jiveimage where imageid in (203386,203387,203388));
- If we can trace these images back to the add-on with some certainty, we may be able to follow the steps within this article to clear out these image references before re-attempting the reinstall process.
Enable Debug Logging and Uninstall
Enabling debug logging can provide additional insight into errors in the logs. Note, however, that enabling debug logs on production can be dangerous due to the heavy logging throughput (which can lead to extreme latency or an outage). It is relatively safe if performed outside of peak hours or on the weekend.
Follow this article to set up logging level override to DEBUG.
You can consider adding DEBUG override only for the below classes, to narrow down to relevant log messages:
com.jivesoftware.community.extension.impl.ExtensionManagerImpl
com.jivesoftware.community.objecttype.impl.AbstractJiveObjectManager
com.jivesoftware.community.cloudalytics.impl.AnalyticsActivityEventFilter
com.jivesoftware.base.event.v2.NonBlockingEventDispatcherImpl
You can narrow down log messages by inserting markers before reproducing the error:
- Navigate to Admin Console > System > Logging Management > Log Viewer
- Press "Insert Mark"
- This will insert a marker line into the log, looking something like:
2021-08-31 13:21:38,179 [http-nio-127.0.0.1-9001-exec-8] [1:jiveadmin@trilogy.com:REGULAR] ERROR com.jivesoftware.base.log.JiveLogManagerImpl - --- Marker inserted by jiveadmin@trilogy.com at 01:21:38 PM 2021.08.31 ---
- Make a note of the above marker in a text file
- Go back and reproduce the error in Jive
- Come back to the Log Viewer and insert another marker, and make note of this new marker as well.
- Now when you look at log files (SBS Log for On-Prem/Hosted, Kibana for Cloud), you can search for the above markers and look at the errors between the markers.
Ask the customer to reproduce the error after you have set up DEBUG logging and markers.
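For On-Prem or Hosted logs that you have as a file, the lines between the two markers can be extracted with a short script. This is a rough sketch: it assumes the marker text looks like the sample shown above, the function name is hypothetical, and the log lines other than the markers are invented for illustration.

```python
MARKER_TEXT = "--- Marker inserted by"

def lines_between_markers(lines):
    """Return the log lines between the first and the second marker line."""
    positions = [i for i, line in enumerate(lines) if MARKER_TEXT in line]
    if len(positions) < 2:
        raise ValueError("expected at least two marker lines in the log")
    start, end = positions[0], positions[1]
    return lines[start + 1:end]

# Hypothetical log excerpt; only the marker format comes from this article.
sample_log = [
    "2021-08-31 13:20:00,000 INFO example.Startup - unrelated line",
    "2021-08-31 13:21:38,179 ERROR com.jivesoftware.base.log.JiveLogManagerImpl - --- Marker inserted by jiveadmin@trilogy.com at 01:21:38 PM 2021.08.31 ---",
    "2021-08-31 13:21:45,001 DEBUG example.Tile - first line of interest",
    "2021-08-31 13:21:45,050 ERROR example.Tile - second line of interest",
    "2021-08-31 13:22:10,500 ERROR com.jivesoftware.base.log.JiveLogManagerImpl - --- Marker inserted by jiveadmin@trilogy.com at 01:22:10 PM 2021.08.31 ---",
    "2021-08-31 13:23:00,000 INFO example.Other - unrelated line",
]

for line in lines_between_markers(sample_log):
    print(line)
```

To use it on a real file, read the file into a list of lines first and pass that list to the function.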
Check for Orphan Tiles
When the error is reproduced with DEBUG logging turned on, look for the resulting "Updating" log lines. Look for the last one before the error. Extract the externalID from it. Example below.
Get the Tile's ID with the below query (MagicQuery for Cloud, or PSQL/Oracle SQL Developer for On-Prem):
SELECT tileinstanceid, parentobjecttype, parentobjectid FROM jivetileinstance
WHERE externalid = '<externalID>'
Check the API endpoints:
https://<jivedomain>/api/core/v3/tiles/<tileinstanceid>
- This may return an internal server error, which lets you know there is a definite problem with the tile.
- If the parent object type is a "tilepage", then you can check:
https://<jivedomain>/api/core/v3/pages/<parentobjectid>
- Otherwise, you may have to check the endpoint that corresponds to the parentobjecttype.
- You can also check the parent of the parent or other "places".
- To check a "group" (type 700) place:
https://<jivedomain>/api/core/v3/places?filter=entityDescriptor(700,<groupid>)
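The endpoints above can be assembled from the query result with a small helper, which makes it easier to check several tiles in a row. This is only a sketch: the function name is hypothetical, and the domain and ids in the example are placeholder values.

```python
def tile_check_urls(jive_domain, tile_instance_id, parent_object_id=None, group_id=None):
    """Build the API URLs to open when checking a suspected orphan tile."""
    base = f"https://{jive_domain}/api/core/v3"
    urls = [f"{base}/tiles/{tile_instance_id}"]
    if parent_object_id is not None:
        # Only meaningful when the parent object type is a tile page.
        urls.append(f"{base}/pages/{parent_object_id}")
    if group_id is not None:
        # 700 is the type code for a group, as noted in this article.
        urls.append(f"{base}/places?filter=entityDescriptor(700,{group_id})")
    return urls

# Placeholder values, for illustration only.
for url in tile_check_urls("community.example.com", 2316, parent_object_id=1001, group_id=12459):
    print(url)
```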
Delete Orphan Tiles from DB
If we've confirmed a problematic tile, we can delete it.
- At this point, take a backup of the database. This is not a very risky operation, but the main concern is mistyping a DELETE query.
SELECT tileinstanceid FROM jivetileinstance WHERE externalid = '<externalID>'
- Save the result.
DELETE FROM jivetileinstance WHERE externalid = '<externalID>'
INSERT INTO jivetileinstgone (tileinstanceid, deletiondate) VALUES (<resulting ID from step 1>, <deletiondate>)
<deletiondate> is the current UNIX timestamp in milliseconds (take the current timestamp in seconds, for example from unixtimestamp.com, and multiply it by 1000).
Note: For On-Prem installations, the customer can perform the above DB operations after taking a backup. They can use their preferred tool like PSQL or Oracle SQL Developer for these operations. For Hosted or Cloud, there is a set process to be followed for modifying database tables.
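As an alternative to looking up the timestamp on a website, the deletion date value can be generated locally. A minimal sketch, with a hypothetical function name:

```python
import time

def current_millis():
    """Return the current Unix timestamp in milliseconds as an integer."""
    # time.time_ns() returns integer nanoseconds, so no float rounding is involved.
    return time.time_ns() // 1_000_000

deletion_date = current_millis()
print(deletion_date)
```

The printed number is a 13-digit integer and can be used as the deletion date value in the INSERT statement.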
Investigate the Cause and Recreate the Tile
When a tile has been unexpectedly removed from the page entirely, there is no mechanism within Jive for restoring the previous versions. See Can We Recover/Retrieve a Place or Content Item That Has Been Deleted? for more information on helping customers review their permissions to prevent these situations from occurring in the future.
To assist in the permissions review, you can help a customer investigate who was responsible for removing the Tile using the guidance within Determining Who Removed a Tile from a Group. This process is not guaranteed, as it relies on the modified timestamps, which only persist from the most recent "Save" of the affected Place. For this reason, take extra care while reviewing an issue of this nature to avoid overwriting the timestamps (for example, by saving the affected Place again).
Once you have provided the customer with all available information about who removed the tile, guide the customer to recreate their Tile.
Verify the behavior directly in the Document
When the issue with a Document Viewer Tile's embedded document HTML is only reproducible within certain web browsers, this can indicate that the handling of the markup between the browsers is different. As an isolation step, open the linked document directly and confirm whether or not the same misbehavior is present within the offending browser, as seen in Embedded HTML iFrame Content not loading correctly in Document Viewer Tile.
If the same behavior occurs directly within the document, this confirms that the Tile is not the source of the issues seen. Suggest that the customer contact the developer of the Markup directly to address the misbehavior. Troubleshooting custom HTML is outside of the Jive Support Scope of Work.
If the document functions correctly, this may indicate mishandling with the Document Viewer Tile. When possible, collect the HTML code used within the offending document and attempt to reproduce this within a clean test environment to ensure that custom theming or add-ons are not a factor. If the behavior persists and appears to be a defect, collect any Console Errors and Logging messages related to the behavior and Open a Jira Ticket.
Troubleshoot the News Stream
When the reported issue is impacting both the Tiles and the News Streams within the customer instance, this can be the result of a system-wide caching issue. It is suggested that you review the guidance within News Stream / Activity Stream - Troubleshooting Article to confirm this is not a known issue affecting the instance before continuing with your review of the Tile itself.
Regenerate Javascript
When an issue with Tiles is producing a number of Javascript Errors that cannot be directly traced to a single cause and is reproducible within multiple browsers, this may indicate that the Javascript on the instance has become stale or the cached data from installed plugins and themes are contributing to the issue.
In these situations, you can Regenerate the Javascript in Jive with approval from the customer. This process may result in a temporary increase in page load times due to clearing the Application Cache, so it is best performed during non-peak hours.
Clear Application Cache and Rolling Restart
If you encounter an issue with delays or poor performance when attempting to edit a Tile, this may indicate a need to Clear the Application Cache on the instance. This process may result in a temporary increase in page load times, so it is best performed during non-peak hours.
Note: If you previously Regenerated the Javascript due to JS Errors in the console, you have already cleared the Application Cache as part of this procedure, and you can instead simply perform the Rolling Restart.
Once cleared, you can follow the guidance in the articles below based on the instance type. Note that while rolling restarts do not entail any downtime, they can have a temporary impact on overall node performance:
Verify and Close Ticket
Once you have found the root cause of the issue on a test instance, and you have found a fix for it, apply that fix on the customer instance as well. If the fix is something that the customer can apply themselves, it may suffice to demonstrate the root cause and the fix to the customer over a video call.