Error Codes

Page history last edited by Aliya 2 weeks ago


Jobs


The following error codes may occur when your job is running.

 

J000

There was an unknown error with your job.  Please submit a ticket and tell us your job ID as well as your best description of what you were trying to do.  If you were trying to run a job in the sandbox environment, try running it in the live environment instead.

 

J100

80legs encountered run-time errors with your job.  Please submit a ticket if you feel you set up your code correctly.

 

J101

Your job crawled more than 10,000 pages, and the % of pages crawled where your code failed to load, or where there was a constructor or initialize error, was > 1%.

 

If you're using an 80app provided by 80legs, it's possible this error was caused by a temporary problem with our system.  In this case, we recommend trying to run your job again in a few hours.

 

J102

Your job crawled more than 10,000 pages, and the % of pages crawled where parseLinks() throws a security exception was > 1%.

 

J103

Your job crawled more than 10,000 pages, and the % of pages crawled where parseLinks() throws a general exception was > 25%.

 

J104

Your job crawled more than 10,000 pages, and the % of pages crawled where your constructor and/or initialize functions have a timeout, and 80legs successfully stopped the threads, was > 1%.

 

J105

Your job crawled more than 10,000 pages, and the % of pages crawled where parseLinks() and/or processDocument() functions have a timeout, and 80legs successfully stopped the threads, was > 10%.

 

J201

Your job analyzed more than 10,000 pages, and the % of pages analyzed where processDocument() throws a security exception was > 1%.

 

J202

Your job analyzed more than 10,000 pages, and the % of pages analyzed where processDocument() throws a general exception was > 25%.

 

J300

While your job was running, the number of times your constructor and/or initialize functions had timeouts where 80legs had to stop the JVM was > 10.

 

J400

While your job was running, the number of times parseLinks() and/or processDocument() had timeouts where 80legs had to stop the JVM was > 25.
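The threshold rules above can be sketched in a few lines of Java. This is only an illustration of the arithmetic for J102, J103 and J400, not the way 80legs computes these codes internally. The class name, method names and parameters are all hypothetical. Only the numbers (10,000 pages, 1%, 25%, 25 JVM stops) come from the descriptions above. Whether a job can report more than one code at a time is not stated, so returning a list is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the job threshold rules; not an 80legs API.
final class JobThresholdSketch {
    // Percentage rules only apply once a job has crawled MORE than this many pages.
    static final long MIN_PAGES = 10_000;

    // True when the job is past the page minimum and the failure rate is strictly above maxPercent.
    static boolean exceedsRate(long pages, long failures, double maxPercent) {
        if (pages <= MIN_PAGES) {
            return false;
        }
        return failures * 100.0 / pages > maxPercent;
    }

    // Returns the codes whose thresholds are exceeded by the given counters.
    static List<String> check(long pagesCrawled,
                              long parseSecurityExceptions,
                              long parseGeneralExceptions,
                              long jvmStopsInParseOrProcess) {
        List<String> codes = new ArrayList<>();
        if (exceedsRate(pagesCrawled, parseSecurityExceptions, 1.0)) {
            codes.add("J102"); // parseLinks() security exceptions > 1%
        }
        if (exceedsRate(pagesCrawled, parseGeneralExceptions, 25.0)) {
            codes.add("J103"); // parseLinks() general exceptions > 25%
        }
        if (jvmStopsInParseOrProcess > 25) {
            codes.add("J400"); // count-based rule, independent of page count
        }
        return codes;
    }
}
```

Note that every comparison is strict: a job sitting exactly at 1% or exactly at 25 JVM stops does not trip the rule in this sketch, matching the ">" wording above.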

 

J500

80legs could not find the code you wanted to run.  This is most likely a problem within 80legs.  Please submit a ticket if you feel you set up your code correctly.

 

J501

80legs could not find the seed list you wanted to use with your job.

 

J600

You chose to run keyword or regular expression matching and either didn't enter the expression data or provided an invalid expression.

 

 

Code Approval


The following error codes may occur when your code is being run through the approval process.

 

C000

This is a general error that should be encountered rarely.  Please submit a ticket if you get this error.

 

C101

Your JAR was signed.  Custom code should not be signed.

 

C102

You must use the latest version code when writing custom code.  From time to time, we may make changes to the WebAnalysis class.  A new version code will be provided with each change to the class, and custom code must use the newest version code.

 

C103

Your code was taking too long to complete.  Code must finish running within 10 seconds.

 

C104

The JAR file was not found.

 

C200

initialize() was called with a general error.  This error is a catch-all for any generic errors that occur with initialize().

 

C201

initialize() was called with IllegalArgumentException.  Your code has illegal arguments for initialize().

 

C202

initialize() was called with NoSuchMethodError.  Your code doesn't contain an implementation for initialize().

 

C300

parseLinks() was called with a general error.  This error is a catch-all for any generic errors that occur with parseLinks().

 

C301

parseLinks() was called with IllegalArgumentException.  Your code has illegal arguments for parseLinks().

 

C302

parseLinks() was called with NoSuchMethodError.  Your code doesn't contain an implementation for parseLinks().

 

C400

processDocument() was called with a general error.  This error is a catch-all for any generic errors that occur with processDocument().

 

C401

processDocument() was called with IllegalArgumentException.  Your code has illegal arguments for processDocument().

 

C402

processDocument() was called with NoSuchMethodError.  Your code doesn't contain an implementation for processDocument().

 

C500

getVersion() was called with a general error.  This error is a catch-all for any generic errors that occur with getVersion().

 

C501

getVersion() was called with IllegalArgumentException.  Your code has illegal arguments for getVersion().

 

C502

getVersion() was called with NoSuchMethodError.  Your code doesn't contain an implementation for getVersion().

 

C600

Your code cannot contain another JAR in it.
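Two of the approval errors, C101 (signed JAR) and C600 (JAR inside a JAR), can often be spotted before uploading by looking at the list of entry names in your JAR. The sketch below is a heuristic pre-flight check, not the check 80legs runs. It assumes that a signed JAR carries signature files (.SF, .DSA, .RSA or .EC) directly under META-INF/, which is how the standard jarsigner tool signs JARs. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical pre-flight check on JAR entry names; not an 80legs API.
final class JarPreflightSketch {
    // True for signature-related files placed directly in META-INF/.
    static boolean isSignatureEntry(String entryName) {
        String upper = entryName.toUpperCase(Locale.ROOT);
        if (!upper.startsWith("META-INF/")) {
            return false;
        }
        String rest = upper.substring("META-INF/".length());
        if (rest.contains("/")) {
            return false; // signature files are not nested in subdirectories
        }
        return rest.endsWith(".SF") || rest.endsWith(".DSA")
                || rest.endsWith(".RSA") || rest.endsWith(".EC");
    }

    // True for any entry that is itself a JAR file.
    static boolean isNestedJar(String entryName) {
        return entryName.toLowerCase(Locale.ROOT).endsWith(".jar");
    }

    // Returns the approval codes this entry listing would probably trigger.
    static List<String> likelyCodes(List<String> entryNames) {
        List<String> codes = new ArrayList<>();
        if (entryNames.stream().anyMatch(JarPreflightSketch::isSignatureEntry)) {
            codes.add("C101");
        }
        if (entryNames.stream().anyMatch(JarPreflightSketch::isNestedJar)) {
            codes.add("C600");
        }
        return codes;
    }
}
```

The entry names can be obtained by running `jar tf` on your JAR file, or by iterating over `java.util.jar.JarFile.entries()`.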

 

C700

The data used during your code approval could not be read.

 

 

Crawl Status


DNS_ERROR

DNS could not resolve the host name for this URL.

 

EXCEEDS_MAX_PAGE_SIZE

The page you were trying to crawl contains more data than your current subscription plan is allowed to download per page.  You can upgrade your plan or contact us to discuss available options.

 

HTTPS_SKIP

Our crawler can crawl most HTTPS-encrypted pages, but on rare occasions it cannot.

 

INVALID_URL

This URL is not formatted correctly and was not crawled.

 

MIME_TYPE_SKIP

The MIME type for this URL is not included in the MIME types you chose to crawl during your job.

 

NO_RESPONSE

This error typically means that the web server did not give any response to our request for the page (80legs tries multiple times to fetch each page).  The remote server may not be functioning correctly at that moment.

 

ROBOTS.TXT_ERROR

Our crawler obeys the robots.txt specification and will not crawl pages that are blocked by a domain's robots.txt directives.

 

Other Codes

You may receive a three-digit numeric code in your crawl status.  These codes correspond to the standard HTTP response status codes.
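When reading crawl status values programmatically, the numeric codes can be told apart from the named ones with a simple pattern check. The following is a minimal sketch, assuming the status arrives as a plain string. The class and method names are hypothetical and are not part of any 80legs library.

```java
import java.util.OptionalInt;

// Hypothetical helper for separating HTTP status codes from named crawl statuses.
final class CrawlStatusSketch {
    // Returns the HTTP status code if the crawl status is a three-digit number
    // in the standard 100-599 range, or an empty value for named statuses.
    static OptionalInt httpStatusOf(String crawlStatus) {
        if (crawlStatus != null && crawlStatus.matches("[1-5][0-9]{2}")) {
            return OptionalInt.of(Integer.parseInt(crawlStatus));
        }
        return OptionalInt.empty();
    }
}
```

For example, `httpStatusOf("404")` yields 404, while `httpStatusOf("DNS_ERROR")` yields an empty result.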

 

 

Process Status


The following codes are shown in your Crawled URLs results file for each page that was crawled.

 

GOOD

The page was analyzed with no problem.

 

NO_PROCESS

The page was not analyzed due to the analysis regular expression or a crawling error.  If you received a robots.txt error, it is likely that the site you tried to crawl prohibits crawling this URL according to its robots.txt file.

 

NO_PROCESS_MIME_OR_ANALYSIS_REGEX

The page was not analyzed due to the analysis MIME type or analysis regex.  The page was crawled successfully and was used in parseLinks().

 

PROCESS_TRUNCATED

The page was analyzed with no problem, but the results were over the size limit.  The results were truncated to 1024 bytes.
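To avoid PROCESS_TRUNCATED, it can help to check the size of a result before returning it. The sketch below shows what a 1024-byte cut looks like, assuming the result is a raw byte array and that the cut is a plain cut at the byte limit. How 80legs actually performs the truncation is not documented here, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration of the 1024-byte result limit; not an 80legs API.
final class TruncationSketch {
    static final int LIMIT_BYTES = 1024;

    // True when a result of this size would exceed the limit.
    static boolean wouldBeTruncated(byte[] result) {
        return result.length > LIMIT_BYTES;
    }

    // Returns the result unchanged if it fits, otherwise only its first 1024 bytes.
    static byte[] truncate(byte[] result) {
        return wouldBeTruncated(result) ? Arrays.copyOf(result, LIMIT_BYTES) : result;
    }
}
```

Because the limit is expressed in bytes, text with multi-byte characters reaches it sooner than its character count suggests.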

 

PROCESS_EMPTY_PAGE

The page was analyzed successfully, but the document contents were empty (we allow processing empty documents because users might pull interesting information from the headers, status, or just the fact that it was empty).

 

PROCESS_RETURN_NULL

Your processDocument() method returned null for the page listed.  A return value of null from processDocument() signals 80legs to not include that URL in the analyzed results.

 

PARSE_SECURITY_EXCEPTION

The parseLinks() method threw a SecurityException.  This is usually because the parseLinks() method attempted to do something that is not allowed, such as accessing the disk or making network requests.

 

PARSE_EXCEPTION

The parseLinks() method threw a general Exception.

 

PROCESS_SECURITY_EXCEPTION

The processDocument() method threw a SecurityException.  This is usually because the processDocument() method attempted to do something that is not allowed, such as accessing the disk or making network requests.

 

PROCESS_EXCEPTION

The processDocument() method threw a general Exception.

 

PROCESS_INTERNAL_ERROR

This means that an 80legs internal error occurred when processing your document.  Hopefully this never happens.  :)

 

COMPUTE_TIMEOUT_GOOD

Your parseLinks() and processDocument() combined took longer than 30 seconds to finish, and 80legs was able to stop your code.

 

COMPUTE_TIMEOUT_BAD

Your parseLinks() and processDocument() combined took longer than 30 seconds to finish, and 80legs was unable to stop your code, which forced it to stop the JVM.
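The difference between the two timeout statuses comes down to whether the running code responds when it is asked to stop. The sketch below illustrates that idea with standard java.util.concurrent classes. It is not how 80legs enforces the limit: the class name, the method, the time budget parameters and the mapping of outcomes to status names are all assumptions made for illustration, and the sketch only reports the "bad" case rather than stopping a JVM.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of a compute time budget; not an 80legs API.
final class ComputeTimeoutSketch {
    // Runs userCode with a time budget, then reports how it ended.
    static String run(Runnable userCode, long budgetMillis, long graceMillis) {
        // Daemon thread so that runaway code cannot keep this JVM alive.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        Future<?> future = pool.submit(userCode);
        try {
            future.get(budgetMillis, TimeUnit.MILLISECONDS);
            return "GOOD";
        } catch (TimeoutException e) {
            pool.shutdownNow(); // interrupts the worker thread
            boolean stopped = pool.awaitTermination(graceMillis, TimeUnit.MILLISECONDS);
            // Code that honors the interrupt stops cleanly; code that ignores it does not.
            return stopped ? "COMPUTE_TIMEOUT_GOOD" : "COMPUTE_TIMEOUT_BAD";
        } catch (ExecutionException e) {
            return "PROCESS_EXCEPTION"; // assumed mapping: user code threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "PROCESS_INTERNAL_ERROR"; // assumed mapping: the caller was interrupted
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In practice this means long-running loops in parseLinks() or processDocument() are better off checking `Thread.currentThread().isInterrupted()` periodically and not swallowing InterruptedException, so that a timeout can end as COMPUTE_TIMEOUT_GOOD rather than COMPUTE_TIMEOUT_BAD.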
