@MaSven MaSven commented Aug 15, 2025

Use the new Censys SDK for better stability across API updates. Because Censys now returns multiple endpoints per search hit, we iterate over every endpoint and create one result for each. This also changes how IPs are saved in the raw response: each endpoint carries exactly one IP, so the earlier problem no longer applies.

As described in #684, Censys no longer works without these changes.
Also closes #684

This was already tested.

Summary by CodeRabbit

  • Refactor

    • Censys integration switched to the official SDK for more reliable searches, typed responses, and improved pagination.
    • Authentication now uses a Token and Organization ID (field renamed).
    • Endpoint results surface clearer fields (IP, host, port, URL).
  • Chores

    • Added new dependencies required by the Censys SDK.
  • Documentation

    • Updated credential examples/provider docs for token/org-id usage and added a ZoomEye usage section.

ehsandeep and others added 2 commits June 20, 2025 21:48
…iscovery#689)

Bumps [golang.org/x/oauth2](https://github.com/golang/oauth2) from 0.18.0 to 0.27.0.
- [Commits](golang/oauth2@v0.18.0...v0.27.0)

---
updated-dependencies:
- dependency-name: golang.org/x/oauth2
  dependency-version: 0.27.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

coderabbitai bot commented Aug 15, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Replaced manual HTTP Censys integration with the official censys-sdk-go, switched authentication to token+organization ID, removed legacy Censys response models, updated key parsing/env var names, and added the SDK dependency in go.mod and docs.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Censys integration via SDK (sources/agent/censys/censys.go) | Replaced manual HTTP requests and custom response parsing with censys-sdk-go (GlobalData.Search). Uses typed endpoint models, NextPageToken pagination, context-backed calls, and SDK error handling; removed old URL constant. |
| Removed legacy response models (sources/agent/censys/response.go) | Deleted file and three exported response types (CensysResponse, CensysResponseResult, CensysResponseLinks). |
| Credential model and parsing (sources/keys.go, sources/provider.go) | Renamed Keys field CensysSecret → CensysOrgId. Provider now parses second credential segment into CensysOrgId. Env var names changed to CENSYS_API_TOKEN / CENSYS_ORGANIZATION_ID. |
| Docs and deps (README.md, go.mod) | README updated for token/org-id credential examples and provider docs; added dependency github.com/censys/censys-sdk-go v0.19.1 (and indirect decimal module) in go.mod. |

Sequence Diagram(s)

sequenceDiagram
  participant Agent
  participant Provider
  participant CensysSDK as Censys SDK
  participant CensysAPI as Censys API

  Agent->>Provider: GetKeys()
  Provider-->>Agent: CensysToken, CensysOrgId
  Agent->>CensysSDK: Init client (OrgID, Token)
  Agent->>CensysSDK: GlobalData.Search(query, pageToken)
  CensysSDK->>CensysAPI: Request search
  CensysAPI-->>CensysSDK: Search results (+NextPageToken)
  CensysSDK-->>Agent: Typed results (hits/endpoints)
  Agent->>Agent: Iterate endpoints → collect IP/Host/Port/URL/Raw
  Agent-->>CensysSDK: GlobalData.Search(nextPageToken) if NextPageToken present

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15 minutes

Assessment against linked issues

Objective Addressed Explanation
Update Censys auth to new key format (Org ID + Token) [#684]
Replace broken integration so queries function with updated Censys API [#684]
Remove/retire legacy response handling incompatible with new API [#684]

Out-of-scope changes

| Code Change | Explanation |
|---|---|
| Added ZoomEye documentation section (README.md) | README additions about ZoomEye are documentation unrelated to fixing Censys integration in issue #684. |

Suggested reviewers

  • DhiyaneshGeek
  • ehsandeep

Poem

I nibble code where old keys slept,
Swapped secret crumbs for Org ID kept.
SDK hops in, pages neatly swept,
Endpoints found where bytes had crept.
A rabbit cheers — the search is leapt! 🐇✨

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🔭 Outside diff range comments (1)
sources/agent/censys/censys.go (1)

35-52: Limit enforcement counts hits, not endpoints; can exceed user limit.

Each hit can yield multiple endpoints, but you only increment by hits. This can overshoot by a large margin and breaks expected Limit semantics.

Proposed refactor: track how many endpoints you’ve emitted, pass a “remaining” budget into query, and adjust PageSize accordingly.

   go func() {
     defer close(results)

-    var numberOfResults int
+    var emitted int
     nextCursor := ""
+    perPage := MaxPerPage
+    if query.Limit > 0 && query.Limit < perPage {
+      perPage = query.Limit
+    }
     for {
       censysRequest := &CensysRequest{
         Query:   query.Query,
-        PerPage: MaxPerPage,
+        PerPage: perPage,
         Cursor:  nextCursor,
       }
-      censysResponse := agent.query(session, censysRequest, results)
+      remaining := -1
+      if query.Limit > 0 {
+        remaining = query.Limit - emitted
+        if remaining <= 0 {
+          break
+        }
+      }
+      censysResponse, out := agent.query(session, censysRequest, results, remaining)
       if censysResponse == nil {
         break
       }
-      nextCursor = censysResponse.ResponseEnvelopeSearchQueryResponse.Result.NextPageToken
-      if nextCursor == "" || numberOfResults > query.Limit || len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits) == 0 {
+      if censysResponse.ResponseEnvelopeSearchQueryResponse == nil || censysResponse.ResponseEnvelopeSearchQueryResponse.Result == nil {
+        break
+      }
+      nextCursor = censysResponse.ResponseEnvelopeSearchQueryResponse.Result.NextPageToken
+      emitted += out
+      if nextCursor == "" || (query.Limit > 0 && emitted >= query.Limit) || len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits) == 0 {
         break
       }
-      numberOfResults += len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits)
+      if query.Limit > 0 {
+        remaining = query.Limit - emitted
+        if remaining <= 0 {
+          break
+        }
+        if remaining < MaxPerPage {
+          perPage = remaining
+        } else {
+          perPage = MaxPerPage
+        }
+      }
     }
   }()

And update the helper signature to return how many endpoints were emitted (see next comment).
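
The endpoint-level budget described above can be sketched as follows. This is only an illustration under stated assumptions: the `hit` and `endpoint` types and the `emitEndpoints` helper are hypothetical stand-ins, not the SDK's models or the code in this PR.

```go
package main

import "fmt"

// endpoint and hit are hypothetical stand-ins for the SDK's typed models.
type endpoint struct {
	IP   string
	Port int
}

type hit struct {
	Endpoints []endpoint
}

// emitEndpoints sends at most `remaining` endpoints to out and returns how
// many were sent. A negative remaining means "no limit".
func emitEndpoints(hits []hit, remaining int, out chan<- endpoint) int {
	emitted := 0
	for _, h := range hits {
		for _, e := range h.Endpoints {
			if remaining >= 0 && emitted >= remaining {
				return emitted
			}
			out <- e
			emitted++
		}
	}
	return emitted
}

func main() {
	hits := []hit{
		{Endpoints: []endpoint{{"192.0.2.1", 80}, {"192.0.2.1", 443}}},
		{Endpoints: []endpoint{{"192.0.2.2", 22}, {"192.0.2.2", 8080}}},
	}
	out := make(chan endpoint, 8) // buffered so the sketch needs no consumer goroutine
	n := emitEndpoints(hits, 3, out)
	close(out)
	fmt.Println("emitted:", n)
	for e := range out {
		fmt.Printf("%s:%d\n", e.IP, e.Port)
	}
}
```

The caller would add the returned count to its running total and compute the next `remaining` from it, so the limit is checked against emitted endpoints rather than hits.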

🧹 Nitpick comments (2)
sources/provider.go (1)

155-167: Support new env var names for Censys (backward- and forward-compatible).

Env vars still reference CENSYS_API_ID/SECRET, while code expects token+org ID. To reduce confusion and support new accounts, accept CENSYS_TOKEN and CENSYS_ORG_ID as an alternative pair.

Apply this minimal change to add support while keeping old names:

   appendIfAllExists := func(arr []string, env1 string, env2 string) []string {
     if val1, ok := os.LookupEnv(env1); ok {
       if val2, ok2 := os.LookupEnv(env2); ok2 {
         return append(arr, fmt.Sprintf("%s:%s", val1, val2))
       } else {
         gologger.Error().Msgf("%v env variable exists but %v does not", env1, env2)
       }
     }
     return arr
   }
   provider.Fofa = appendIfAllExists(provider.Fofa, "FOFA_EMAIL", "FOFA_KEY")
-  provider.Censys = appendIfAllExists(provider.Censys, "CENSYS_API_ID", "CENSYS_API_SECRET")
+  // Back-compat (old naming) and new naming supported
+  provider.Censys = appendIfAllExists(provider.Censys, "CENSYS_API_ID", "CENSYS_API_SECRET")
+  provider.Censys = appendIfAllExists(provider.Censys, "CENSYS_TOKEN", "CENSYS_ORG_ID")
   provider.Google = appendIfAllExists(provider.Google, "GOOGLE_API_KEY", "GOOGLE_API_CX")
sources/agent/censys/censys.go (1)

59-64: Optional: use a cancelable context (timeout/deadline) for API calls.

Relying on context.Background ties requests to process lifetime. Consider using a context with timeout derived from session config.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6a159bf and afbbfe9.

⛔ Files ignored due to path filters (2)
  • go.sum is excluded by !**/*.sum
  • sources/agent/censys/example.json is excluded by !**/*.json
📒 Files selected for processing (5)
  • go.mod (2 hunks)
  • sources/agent/censys/censys.go (3 hunks)
  • sources/agent/censys/response.go (0 hunks)
  • sources/keys.go (2 hunks)
  • sources/provider.go (1 hunks)
💤 Files with no reviewable changes (1)
  • sources/agent/censys/response.go
🧰 Additional context used
🧬 Code Graph Analysis (1)
sources/agent/censys/censys.go (5)
sources/keys.go (1)
  • Keys (3-22)
uncover.go (1)
  • New (56-110)
sources/result.go (1)
  • Result (8-17)
sources/agent.go (2)
  • Agent (8-11)
  • Query (3-6)
sources/session.go (1)
  • Session (38-43)
🔇 Additional comments (5)
go.mod (2)

67-67: LGTM on indirect decimal addition.

Likely pulled in by the SDK; no concerns.


6-6: censys-sdk-go v0.19.1 — Search API present & compatible

Checked the SDK tag v0.19.1 — it includes the Search API and the request/response shapes used in this repo and they match your usage.

  • Repo usage: sources/agent/censys/censys.go — call to GlobalData.Search with operations.V3GlobaldataSearchQueryRequest (passes components.SearchQueryInputBody with PageSize via censyssdkgo.Int64) and later reads resp.ResponseEnvelopeSearchQueryResponse.Result.NextPageToken / .Hits.
  • SDK (github.com/censys/censys-sdk-go@v0.19.1) files inspected:
    • models/operations/v3globaldatasearchquery.go — V3GlobaldataSearchQueryRequest / V3GlobaldataSearchQueryResponse
    • models/components/searchqueryinputbody.go — SearchQueryInputBody (Fields []string, PageSize *int64, PageToken *string, Query string)
    • models/components/responseenvelopesearchqueryresponse.go — ResponseEnvelopeSearchQueryResponse (Result *SearchQueryResponse)
    • globaldata.go — GlobalData.Search implementation (unmarshals into components.ResponseEnvelopeSearchQueryResponse)

Conclusion: pinning to v0.19.1 is compatible for GlobalData.Search and ResponseEnvelopeSearchQueryResponse — no change required.

sources/keys.go (2)

5-5: Rename to CensysOrgId is consistent with the new auth model.

Matches the SDK initialization that uses an Organization ID plus token.


25-25: No remaining CensysSecret references — Empty() change is correct

Searched the repo for "CensysSecret" (no matches). The Keys field was renamed and all usages reference the new fields:

  • sources/keys.go — defines CensysOrgId; Empty() uses it.
  • sources/provider.go — parses provider.Censys and assigns keys.CensysOrgId = parts[1] (env vars: CENSYS_API_ID / CENSYS_API_SECRET).
  • sources/agent/censys/censys.go — validates session.Keys.CensysOrgId and passes it to censyssdkgo.WithOrganizationID.

No stale references found; no changes required.

sources/provider.go (1)

59-60: Assignment to CensysOrgId is correct.

Parsing “token:orgId” into Keys works as intended.
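
For illustration, splitting a "token:orgId" credential could look like the sketch below. `parseCensysKey` is a hypothetical helper written for this example; the provider code in the PR may parse the string differently.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseCensysKey splits a "token:orgId" credential at the first colon.
// Hypothetical helper, not the repository's implementation.
func parseCensysKey(raw string) (token, orgID string, err error) {
	token, orgID, found := strings.Cut(raw, ":")
	if !found || token == "" || orgID == "" {
		return "", "", errors.New("censys key must have the form token:orgId")
	}
	return token, orgID, nil
}

func main() {
	token, orgID, err := parseCensysKey("example-token:example-org")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(token, orgID)
}
```

Rejecting an empty segment early gives a clearer error than letting the SDK fail later with a missing organization ID.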

Comment on lines 47 to 57
nextCursor = censysResponse.ResponseEnvelopeSearchQueryResponse.Result.NextPageToken
if nextCursor == "" || numberOfResults > query.Limit || len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits) == 0 {
break
}
-numberOfResults += len(censysResponse.Results.Hits)
+numberOfResults += len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits)
}

⚠️ Potential issue

Potential nil-pointer dereference when reading NextPageToken.

resp.ResponseEnvelopeSearchQueryResponse or Result can be nil on some responses; dereferencing without checks will panic.

Minimal safe-guard:

- nextCursor = censysResponse.ResponseEnvelopeSearchQueryResponse.Result.NextPageToken
- if nextCursor == "" || numberOfResults > query.Limit || len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits) == 0 {
+ var hasNext bool
+ if censysResponse.ResponseEnvelopeSearchQueryResponse != nil && censysResponse.ResponseEnvelopeSearchQueryResponse.Result != nil {
+   nextCursor = censysResponse.ResponseEnvelopeSearchQueryResponse.Result.NextPageToken
+   hasNext = nextCursor != "" && len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits) > 0
+ }
+ if !hasNext || numberOfResults > query.Limit {
   break
 }

Note: See the separate comment for enforcing the Limit correctly at the endpoint level.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In sources/agent/censys/censys.go around lines 47 to 52, the code dereferences
resp.ResponseEnvelopeSearchQueryResponse and its Result without nil checks which
can cause a panic; add guards that ensure
resp.ResponseEnvelopeSearchQueryResponse != nil and
resp.ResponseEnvelopeSearchQueryResponse.Result != nil before reading
NextPageToken or Hits, treat NextPageToken as empty string if absent and treat
Hits as zero-length when nil, and if either parent is nil break the loop (or
handle the error path) so the loop exits safely without panicking.
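
The guard described in this prompt can be sketched with a small accessor. The `response`, `envelope`, and `searchResult` types below are hypothetical stand-ins that only mirror the nested-pointer shape; they are not the SDK's types.

```go
package main

import "fmt"

// Hypothetical stand-ins mirroring the nested pointer shape of the response.
type searchResult struct {
	NextPageToken string
	Hits          []string
}

type envelope struct {
	Result *searchResult
}

type response struct {
	Envelope *envelope
}

// pageInfo returns the next page token and hit count. ok is false when any
// parent pointer is nil, so callers can stop the loop instead of panicking.
func pageInfo(resp *response) (next string, hits int, ok bool) {
	if resp == nil || resp.Envelope == nil || resp.Envelope.Result == nil {
		return "", 0, false
	}
	r := resp.Envelope.Result
	return r.NextPageToken, len(r.Hits), true // len of a nil slice is 0
}

func main() {
	full := &response{Envelope: &envelope{Result: &searchResult{
		NextPageToken: "page-2",
		Hits:          []string{"a", "b"},
	}}}
	fmt.Println(pageInfo(full))
	fmt.Println(pageInfo(&response{})) // missing envelope: no panic
}
```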

MaSven commented Aug 15, 2025

I dont have time for your AI shit sorry

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 44e88c3 and 4d7c334.

⛔ Files ignored due to path filters (3)
  • .github/workflows/provider-integration.yml is excluded by !**/*.yml
  • go.sum is excluded by !**/*.sum
  • sources/agent/censys/example.json is excluded by !**/*.json
📒 Files selected for processing (6)
  • README.md (3 hunks)
  • go.mod (2 hunks)
  • sources/agent/censys/censys.go (3 hunks)
  • sources/agent/censys/response.go (0 hunks)
  • sources/keys.go (2 hunks)
  • sources/provider.go (2 hunks)
💤 Files with no reviewable changes (1)
  • sources/agent/censys/response.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • sources/provider.go
  • go.mod
  • README.md
  • sources/agent/censys/censys.go
🔇 Additional comments (1)
sources/keys.go (1)

25-25: Empty() update looks correct

The all-empty semantics remain intact with the switch to CensysOrgId. No issues here.

@@ -2,7 +2,7 @@ package sources

 type Keys struct {
 	CensysToken string
-	CensysSecret string
+	CensysOrgId string

💡 Verification agent

🧩 Analysis chain

Prefer Go initialism casing: rename CensysOrgId to CensysOrgID

Go convention is to use “ID” (not “Id”). This small rename avoids future churn and matches the rest of the ecosystem.

Apply within this file:

-    CensysOrgId     string
+    CensysOrgID     string

Then verify/clean up references across the repo (field name, old identifiers, and env vars):


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Any old identifiers still present?
rg -n -C2 -P '\bCensysSecret\b|\bCENSYS_API_SECRET\b|\bCENSYS_API_ID\b'

# 2) If you adopt CensysOrgID, ensure no leftover "OrgId" usages remain:
rg -n -C2 -P '\bCensysOrgId\b'

# 3) Double-check new env names are the only ones used:
rg -n -C2 -P '\bCENSYS_API_TOKEN\b|\bCENSYS_ORGANIZATION_ID\b'

Length of output: 93


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== repo files containing 'keys.go' (top-level search) ==="
rg -n --hidden --glob '!node_modules' --glob '!dist' --files-with-matches 'keys.go' || true

echo
echo "=== Show sources/keys.go if it exists ==="
if [ -f "sources/keys.go" ]; then
  sed -n '1,240p' sources/keys.go || true
else
  echo "sources/keys.go not found"
fi

echo
echo "=== Search for identifiers and env vars related to Censys ==="
rg -n --hidden --glob '!node_modules' -S '\bCensysOrgId\b|\bCensysOrgID\b|\bCensysSecret\b|\bCENSYS_API_SECRET\b|\bCENSYS_API_ID\b|\bCENSYS_API_TOKEN\b|\bCENSYS_ORGANIZATION_ID\b' || true

echo
echo "=== Wider search for 'Censys' to find any related usages ==="
rg -n --hidden --glob '!node_modules' -S '\bCensys\b' || true

Length of output: 4300


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== files that reference CensysOrgId (rg) ==="
rg -n --hidden -S '\bCensysOrgId\b' || true

echo
echo "=== sources/provider.go ==="
if [ -f "sources/provider.go" ]; then
  sed -n '1,240p' sources/provider.go || true
else
  echo "sources/provider.go not found"
fi

echo
echo "=== sources/agent/censys/censys.go ==="
if [ -f "sources/agent/censys/censys.go" ]; then
  sed -n '1,240p' sources/agent/censys/censys.go || true
else
  echo "sources/agent/censys/censys.go not found"
fi

Length of output: 10924


Rename CensysOrgId → CensysOrgID (use Go initialism "ID")

Follow-up: I verified occurrences — this is a simple field rename but requires updating all references.

Places to change:

  • sources/keys.go
    • field: change CensysOrgId string → CensysOrgID string
    • update Empty() to use keys.CensysOrgID
  • sources/provider.go
    • in GetKeys(): keys.CensysOrgId = parts[1] → keys.CensysOrgID = parts[1]
  • sources/agent/censys/censys.go
    • Query() check: session.Keys.CensysOrgId → session.Keys.CensysOrgID
    • queryURL(): censyssdkgo.WithOrganizationID(session.Keys.CensysOrgId) → censyssdkgo.WithOrganizationID(session.Keys.CensysOrgID)

Diff to apply in sources/keys.go:

-    CensysOrgId     string
+    CensysOrgID     string

After applying, run:

  • rg -n --hidden -S '\bCensysOrgId\b' to ensure no references remain
  • go vet / go test or your CI checks to catch any missed references.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-CensysOrgId string
+CensysOrgID string
🤖 Prompt for AI Agents
In sources/keys.go around line 5, rename the struct field CensysOrgId to
CensysOrgID and update any uses (e.g., Empty() should reference
keys.CensysOrgID); also update all references across the codebase as listed:
sources/provider.go (GetKeys assignment), sources/agent/censys/censys.go
(Query() check and queryURL() call to censyssdkgo.WithOrganizationID), then run
rg -n --hidden -S '\bCensysOrgId\b' to confirm no occurrences remain and run go
vet / go test (or CI) to catch any missed references.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (5)
sources/keys.go (1)

5-5: Use Go initialism “ID”: rename CensysOrgId → CensysOrgID

Go convention prefers “ID” over “Id”. Rename the field and its usages for consistency and to avoid future churn.

Apply this diff here:

-	CensysOrgId     string
+	CensysOrgID     string

And adjust the Empty() check:

-	return keys.CensysOrgId == "" &&
+	return keys.CensysOrgID == "" &&

Run this to catch and update remaining references across the repo:

#!/bin/bash
set -euo pipefail
rg -n -C2 -P '\bCensysOrgId\b'

Also applies to: 25-25

sources/agent/censys/censys.go (4)

63-81: Respect PerPage and avoid sending empty PageToken

Use the request’s PerPage (bounded to [1..MaxPerPage]) and only set PageToken when non-empty.

 func (agent *Agent) queryURL(session *sources.Session, censysRequest *CensysRequest) (*operations.V3GlobaldataSearchQueryResponse, error) {
 	ctx := context.Background()

 	s := censyssdkgo.New(
-		censyssdkgo.WithOrganizationID(session.Keys.CensysOrgId),
+		censyssdkgo.WithOrganizationID(session.Keys.CensysOrgID),
 		censyssdkgo.WithSecurity(session.Keys.CensysToken),
 		censyssdkgo.WithClient(
 			session.Client.HTTPClient,
 		),
 	)
 
-	return s.GlobalData.Search(ctx, operations.V3GlobaldataSearchQueryRequest{
-		SearchQueryInputBody: components.SearchQueryInputBody{
-			PageSize:  censyssdkgo.Int64(MaxPerPage),
-			Query:     censysRequest.Query,
-			PageToken: &censysRequest.Cursor,
-		},
-	})
+	perPage := censysRequest.PerPage
+	if perPage <= 0 || perPage > MaxPerPage {
+		perPage = MaxPerPage
+	}
+	var pageToken *string
+	if censysRequest.Cursor != "" {
+		pageToken = &censysRequest.Cursor
+	}
+	return s.GlobalData.Search(ctx, operations.V3GlobaldataSearchQueryRequest{
+		SearchQueryInputBody: components.SearchQueryInputBody{
+			PageSize:  censyssdkgo.Int64(int64(perPage)),
+			Query:     censysRequest.Query,
+			PageToken: pageToken,
+		},
+	})
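
The two adjustments in this suggestion (bounding the page size and omitting an empty page token) can be isolated into small helpers. The names `clampPerPage` and `optionalToken` are hypothetical and exist only for this sketch.

```go
package main

import "fmt"

// clampPerPage returns requested when it lies in [1, max], otherwise max.
func clampPerPage(requested, max int) int {
	if requested <= 0 || requested > max {
		return max
	}
	return requested
}

// optionalToken returns nil for an empty cursor so the field can be left
// unset in the request, and a pointer to the cursor otherwise.
func optionalToken(cursor string) *string {
	if cursor == "" {
		return nil
	}
	return &cursor
}

func main() {
	const maxPerPage = 100 // assumed cap for this sketch
	fmt.Println(clampPerPage(0, maxPerPage), clampPerPage(25, maxPerPage), clampPerPage(500, maxPerPage))
	fmt.Println(optionalToken("") == nil, *optionalToken("abc"))
}
```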

26-27: Follow Go initialism: CensysOrgId → CensysOrgID (and update usages)

Minor naming nit; keeps the codebase idiomatic.

-	if session.Keys.CensysToken == "" || session.Keys.CensysOrgId == "" {
+	if session.Keys.CensysToken == "" || session.Keys.CensysOrgID == "" {
-		censyssdkgo.WithOrganizationID(session.Keys.CensysOrgId),
+		censyssdkgo.WithOrganizationID(session.Keys.CensysOrgID),

Also applies to: 67-68


93-116: Guard against nil SDK envelopes before dereferencing Result/Hits

Directly dereferencing ResponseEnvelopeSearchQueryResponse and Result can panic on sparse responses.

Apply this diff:

-	if result := resp.ResponseEnvelopeSearchQueryResponse.Result; result != nil {
-		for _, censysResult := range result.Hits {
+	if resp != nil && resp.ResponseEnvelopeSearchQueryResponse != nil && resp.ResponseEnvelopeSearchQueryResponse.Result != nil {
+		result := resp.ResponseEnvelopeSearchQueryResponse.Result
+		for _, censysResult := range result.Hits {
 			for _, host := range censysResult.WebpropertyV1.Resource.Endpoints {
 				result := sources.Result{Source: agent.Name()}
 				if host.IP != nil {
 					result.IP = *host.IP
 				}
 				if host.Hostname != nil {
 					result.Host = *host.Hostname
 				}
 				if host.Port != nil {
 					result.Port = *host.Port
 				}
 				if host.HTTP != nil && host.HTTP.URI != nil {
 					result.Url = *host.HTTP.URI
 				}
 				raw, _ := json.Marshal(host)
 				result.Raw = raw
 				results <- result
 			}
-
-		}
-	}
+		}
+	}

47-57: Fix inverted pagination condition; loop exits when it should continue

The loop currently breaks when a next cursor is present. That prevents pagination beyond the first page. Also, it dereferences nested fields without guarding against nil.

Apply this diff to make the loop continue while a next page exists, guard against nil, and respect limit with >=:

-			hasNextCursor := false
-			if censysResponse.ResponseEnvelopeSearchQueryResponse.Result != nil && censysResponse.ResponseEnvelopeSearchQueryResponse.Result.NextPageToken != "" {
-				hasNextCursor = true
-			}
-
-			if hasNextCursor || numberOfResults > query.Limit || len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits) == 0 {
-				break
-			}
-			nextCursor = censysResponse.ResponseEnvelopeSearchQueryResponse.Result.NextPageToken
-			numberOfResults += len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits)
+			// Defensive checks on nested fields
+			if censysResponse.ResponseEnvelopeSearchQueryResponse == nil ||
+				censysResponse.ResponseEnvelopeSearchQueryResponse.Result == nil {
+				break
+			}
+			res := censysResponse.ResponseEnvelopeSearchQueryResponse.Result
+			hitsLen := len(res.Hits)
+			nextToken := res.NextPageToken
+
+			// Update emitted counter based on page hits (approximation; see note below)
+			numberOfResults += hitsLen
+
+			// Break when:
+			// - no hits returned
+			// - limit reached/exceeded
+			// - no next page token (i.e., last page)
+			if hitsLen == 0 || (query.Limit > 0 && numberOfResults >= query.Limit) || nextToken == "" {
+				break
+			}
+			nextCursor = nextToken

Note: this still approximates the limit by page hits, not per-endpoint emissions. See separate comment for a precise endpoint-level limit.
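
To make the intended loop shape concrete, here is a self-contained sketch of token-based pagination that continues while a next token exists and stops at the limit. The `page` type, `fetchFunc`, and `newFakeFetcher` are hypothetical; the fake fetcher replaces the real API call.

```go
package main

import "fmt"

// page is a hypothetical stand-in for one page of search results.
type page struct {
	Hits      []string
	NextToken string
}

// fetchFunc returns the page for a token, or nil on error.
type fetchFunc func(token string) *page

// collect follows NextToken until the last page, an empty page, a failed
// fetch, or the limit is reached. A limit of 0 means "no limit".
func collect(fetch fetchFunc, limit int) []string {
	var out []string
	token := ""
	for {
		p := fetch(token)
		if p == nil || len(p.Hits) == 0 {
			break
		}
		for _, h := range p.Hits {
			if limit > 0 && len(out) >= limit {
				return out
			}
			out = append(out, h)
		}
		// Stop on the last page or once the limit is met; otherwise continue.
		if p.NextToken == "" || (limit > 0 && len(out) >= limit) {
			break
		}
		token = p.NextToken
	}
	return out
}

// newFakeFetcher serves three fixed pages keyed by token.
func newFakeFetcher() fetchFunc {
	pages := map[string]*page{
		"":   {Hits: []string{"a", "b"}, NextToken: "p2"},
		"p2": {Hits: []string{"c", "d"}, NextToken: "p3"},
		"p3": {Hits: []string{"e"}},
	}
	return func(token string) *page { return pages[token] }
}

func main() {
	fmt.Println(collect(newFakeFetcher(), 0))
	fmt.Println(collect(newFakeFetcher(), 3))
}
```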

🧹 Nitpick comments (3)
sources/agent/censys/censys.go (2)

39-42: Tie PerPage to the requested limit (bounded by MaxPerPage)

When the user asks for fewer than MaxPerPage results, it’s more efficient to ask the API for that many per page.

-			censysRequest := &CensysRequest{
-				Query:   query.Query,
-				PerPage: MaxPerPage,
-				Cursor:  nextCursor,
-			}
+			perPage := MaxPerPage
+			if query.Limit > 0 && query.Limit < perPage {
+				perPage = query.Limit
+			}
+			censysRequest := &CensysRequest{
+				Query:   query.Query,
+				PerPage: perPage,
+				Cursor:  nextCursor,
+			}

96-113: Consider enforcing the limit at the endpoint level (exact), not per “hit” (approximate)

You currently count result “hits” per page, but you emit one result per endpoint, which may exceed the requested limit. Optional but improves UX.

If you want, I can provide a patch to propagate “remaining” into query() and stop emitting once the exact limit is reached.

README.md (1)

138-139: Avoid gitleaks false positives in docs by tweaking placeholders

The sample values look like real secrets to scanners (colon-delimited token:org-id). Minor, but it can fail CI.

Two options:

  • Adjust placeholders to break secret patterns:
    • Use commas instead of colon in docs: CENSYS_API_TOKEN_1,CENSYS_ORGANIZATION_ID_1
    • Or wrap obvious placeholders: <CENSYS_API_TOKEN_1>:<CENSYS_ORGANIZATION_ID_1>
  • Alternatively, add a gitleaks allowlist for README examples.

Example tweak:

-censys:
-  - CENSYS_API_TOKEN_1:CENSYS_ORGANIZATION_ID_1
-  - CENSYS_API_TOKEN_2:CENSYS_ORGANIZATION_ID_2
+censys:
+  # format: <CENSYS_API_TOKEN>:<CENSYS_ORGANIZATION_ID>
+  - <CENSYS_API_TOKEN_1>:<CENSYS_ORGANIZATION_ID_1>
+  - <CENSYS_API_TOKEN_2>:<CENSYS_ORGANIZATION_ID_2>
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d7c334 and c96774f.

⛔ Files ignored due to path filters (3)
  • .github/workflows/provider-integration.yml is excluded by !**/*.yml
  • go.sum is excluded by !**/*.sum
  • sources/agent/censys/example.json is excluded by !**/*.json
📒 Files selected for processing (6)
  • README.md (3 hunks)
  • go.mod (2 hunks)
  • sources/agent/censys/censys.go (3 hunks)
  • sources/agent/censys/response.go (0 hunks)
  • sources/keys.go (2 hunks)
  • sources/provider.go (2 hunks)
💤 Files with no reviewable changes (1)
  • sources/agent/censys/response.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • sources/provider.go
  • go.mod
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-22T09:38:04.688Z
Learnt from: dogancanbakir
PR: projectdiscovery/uncover#674
File: sources/provider.go:82-84
Timestamp: 2025-05-22T09:38:04.688Z
Learning: The ZoomEye API has been updated and no longer supports the old format that used "token:host". The new implementation uses a single token without host specification, connecting to a fixed endpoint at https://api.zoomeye.ai/v2/search.

Applied to files:

  • README.md
🧬 Code Graph Analysis (1)
sources/agent/censys/censys.go (4)
sources/keys.go (1)
  • Keys (3-22)
sources/result.go (1)
  • Result (8-17)
sources/agent.go (2)
  • Agent (8-11)
  • Query (3-6)
sources/session.go (1)
  • Session (38-43)
🪛 Gitleaks (8.27.2)
README.md

138-138: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


139-139: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🔇 Additional comments (2)
README.md (2)

181-183: Docs: new Censys env vars look good

CENSYS_API_TOKEN and CENSYS_ORGANIZATION_ID align with the SDK-based auth and code changes.


198-198: Link update LGTM

Docs link to Censys “get started” is correct for the new platform.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
sources/agent/censys/censys.go (3)

35-57: Pagination logic bug (inverted condition) and missing limit enforcement at endpoint level.

  • The loop breaks when a next page exists (hasNextCursor == true), which prematurely stops after the first page in most cases.
  • Limit is tracked by counting hits, but results are emitted per endpoint; this can overshoot the requested limit by a large margin.
  • Multiple potential nil dereferences: ResponseEnvelopeSearchQueryResponse and Result are dereferenced without nil checks.

Proposed fix: enforce the limit based on emitted endpoints and correct the break conditions; also guard for nils.

 		var numberOfResults int
 		nextCursor := ""
 		for {
-			censysRequest := &CensysRequest{
-				Query:   query.Query,
-				PerPage: MaxPerPage,
-				Cursor:  nextCursor,
-			}
-			censysResponse := agent.query(session, censysRequest, results)
-			if censysResponse == nil {
-				break
-			}
-			hasNextCursor := false
-			if censysResponse.ResponseEnvelopeSearchQueryResponse.Result != nil && censysResponse.ResponseEnvelopeSearchQueryResponse.Result.NextPageToken != "" {
-				hasNextCursor = true
-			}
-
-			if hasNextCursor || numberOfResults > query.Limit || len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits) == 0 {
-				break
-			}
-			nextCursor = censysResponse.ResponseEnvelopeSearchQueryResponse.Result.NextPageToken
-			numberOfResults += len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits)
+			remaining := 0
+			if query.Limit > 0 {
+				remaining = query.Limit - numberOfResults
+				if remaining <= 0 {
+					break
+				}
+			}
+			perPage := MaxPerPage
+			if remaining > 0 && remaining < MaxPerPage {
+				perPage = remaining
+			}
+			censysRequest := &CensysRequest{
+				Query:   query.Query,
+				PerPage: perPage,
+				Cursor:  nextCursor,
+			}
+			censysResponse, emitted := agent.query(session, censysRequest, results, remaining)
+			numberOfResults += emitted
+			if censysResponse == nil || censysResponse.ResponseEnvelopeSearchQueryResponse == nil || censysResponse.ResponseEnvelopeSearchQueryResponse.Result == nil {
+				break
+			}
+			res := censysResponse.ResponseEnvelopeSearchQueryResponse.Result
+			nextCursor = res.NextPageToken
+			// Break when: limit reached, or no next page, or no hits.
+			if (query.Limit > 0 && numberOfResults >= query.Limit) || res.NextPageToken == "" || len(res.Hits) == 0 {
+				break
+			}
 		}
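
The corrected loop can be read as a general cursor-pagination pattern. The following is a minimal, self-contained sketch under stated assumptions: `page`, `fetchFunc`, `collect`, and `fakeFetch` are hypothetical stand-ins, not the Censys SDK types. It counts emitted endpoints rather than hits, and stops when the limit is reached, when there is no next page token, or when a page is empty.

```go
package main

import "fmt"

// page is a hypothetical stand-in for one search response page.
// Each hit carries zero or more endpoints.
type page struct {
	Hits          [][]string
	NextPageToken string
}

// fetchFunc is a hypothetical page fetcher keyed by cursor.
type fetchFunc func(cursor string) (*page, error)

// collect walks pages until the limit is reached, the next-page token
// is empty, or a page has no hits. limit <= 0 means "no limit".
func collect(fetch fetchFunc, limit int) ([]string, error) {
	var out []string
	cursor := ""
	for {
		p, err := fetch(cursor)
		if err != nil {
			return out, err
		}
		if p == nil || len(p.Hits) == 0 {
			return out, nil
		}
		for _, hit := range p.Hits {
			for _, ep := range hit {
				out = append(out, ep)
				// Enforce the limit per emitted endpoint, not per hit.
				if limit > 0 && len(out) >= limit {
					return out, nil
				}
			}
		}
		// Continue only while a next page exists.
		if p.NextPageToken == "" {
			return out, nil
		}
		cursor = p.NextPageToken
	}
}

// fakeFetch serves two in-memory pages for demonstration.
func fakeFetch(cursor string) (*page, error) {
	switch cursor {
	case "":
		return &page{Hits: [][]string{{"a:80", "a:443"}, {"b:80"}}, NextPageToken: "p2"}, nil
	case "p2":
		return &page{Hits: [][]string{{"c:80", "c:8080"}}}, nil
	}
	return nil, fmt.Errorf("unknown cursor %q", cursor)
}

func main() {
	all, _ := collect(fakeFetch, 0)
	fmt.Println(len(all)) // 5

	capped, _ := collect(fakeFetch, 4)
	fmt.Println(capped) // [a:80 a:443 b:80 c:80]
}
```

With the inverted condition from the original loop, the equivalent of `collect` would return after the first page whenever a second page existed.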

63-81: Avoid sending empty PageToken; respect per-page from request.

You already use censysRequest.PerPage (good). Only send PageToken when non-empty; some APIs treat an empty token pointer differently than nil.

 func (agent *Agent) queryURL(session *sources.Session, censysRequest *CensysRequest) (*operations.V3GlobaldataSearchQueryResponse, error) {
 	ctx := context.Background()
 
 	s := censyssdkgo.New(
 		censyssdkgo.WithOrganizationID(session.Keys.CensysOrgId),
 		censyssdkgo.WithSecurity(session.Keys.CensysToken),
 		censyssdkgo.WithClient(
 			session.Client.HTTPClient,
 		),
 	)
 
-	return s.GlobalData.Search(ctx, operations.V3GlobaldataSearchQueryRequest{
+	var pageToken *string
+	if censysRequest.Cursor != "" {
+		pageToken = &censysRequest.Cursor
+	}
+	return s.GlobalData.Search(ctx, operations.V3GlobaldataSearchQueryRequest{
 		SearchQueryInputBody: components.SearchQueryInputBody{
 			PageSize:  censyssdkgo.Int64(int64(censysRequest.PerPage)),
 			Query:     censysRequest.Query,
-			PageToken: &censysRequest.Cursor,
+			PageToken: pageToken,
 		},
 	})
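
To illustrate why a nil pointer and a pointer to an empty string can differ on the wire, here is a small sketch using Go's `encoding/json`. The `searchBody` struct and its `omitempty` tag are assumptions for illustration only; the SDK's actual request type and serialization may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchBody is a hypothetical request body, not the SDK's type.
type searchBody struct {
	Query     string  `json:"query"`
	PageToken *string `json:"page_token,omitempty"`
}

// optionalString returns nil for an empty string, otherwise a pointer
// to a copy of s.
func optionalString(s string) *string {
	if s == "" {
		return nil
	}
	return &s
}

// encode marshals a body either with the raw pointer (guard=false)
// or with the nil-for-empty guard applied (guard=true).
func encode(query, cursor string, guard bool) string {
	body := searchBody{Query: query, PageToken: &cursor}
	if guard {
		body.PageToken = optionalString(cursor)
	}
	b, _ := json.Marshal(body)
	return string(b)
}

func main() {
	fmt.Println(encode("nginx", "", false)) // {"query":"nginx","page_token":""}
	fmt.Println(encode("nginx", "", true))  // {"query":"nginx"}
}
```

With `omitempty`, only a nil pointer is dropped; a non-nil pointer to `""` is still serialized as an empty token.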

84-119: Propagate remaining limit to page processing; fix nil checks and variable shadowing in result emission.

  • Introduce a remaining parameter so endpoint emission can honor the global limit.
  • Add nil checks for resp.ResponseEnvelopeSearchQueryResponse before dereferencing.
  • Avoid shadowing variable name result to improve readability.
-func (agent *Agent) query(session *sources.Session, censysRequest *CensysRequest, results chan sources.Result) *operations.V3GlobaldataSearchQueryResponse {
+func (agent *Agent) query(session *sources.Session, censysRequest *CensysRequest, results chan sources.Result, remaining int) (*operations.V3GlobaldataSearchQueryResponse, int) {
 	// query certificates
 	resp, err := agent.queryURL(session, censysRequest)
 	if err != nil {
 		results <- sources.Result{Source: agent.Name(), Error: err}
 		// httputil.DrainResponseBody(resp)
-		return nil
+		return nil, 0
 	}
 
-	if result := resp.ResponseEnvelopeSearchQueryResponse.Result; result != nil {
-		for _, censysResult := range result.Hits {
+	emitted := 0
+	if resp == nil || resp.ResponseEnvelopeSearchQueryResponse == nil {
+		return resp, 0
+	}
+	if result := resp.ResponseEnvelopeSearchQueryResponse.Result; result != nil {
+		for _, censysResult := range result.Hits {
 
-			for _, host := range censysResult.WebpropertyV1.Resource.Endpoints {
-				result := sources.Result{Source: agent.Name()}
-				if host.IP != nil {
-					result.IP = *host.IP
+			for _, endpoint := range censysResult.WebpropertyV1.Resource.Endpoints {
+				out := sources.Result{Source: agent.Name()}
+				if endpoint.IP != nil {
+					out.IP = *endpoint.IP
 				}
-				if host.Hostname != nil {
-					result.Host = *host.Hostname
+				if endpoint.Hostname != nil {
+					out.Host = *endpoint.Hostname
 				}
-				if host.Port != nil {
-					result.Port = *host.Port
+				if endpoint.Port != nil {
+					out.Port = *endpoint.Port
 				}
-				if host.HTTP != nil && host.HTTP.URI != nil {
-					result.Url = *host.HTTP.URI
+				if endpoint.HTTP != nil && endpoint.HTTP.URI != nil {
+					out.Url = *endpoint.HTTP.URI
 				}
-				raw, _ := json.Marshal(host)
-				result.Raw = raw
-				results <- result
+				raw, _ := json.Marshal(endpoint) // best-effort raw
+				out.Raw = raw
+				results <- out
+				emitted++
+				if remaining > 0 && emitted >= remaining {
+					return resp, emitted
+				}
 			}
 
 		}
 	}
 
-	return resp
+	return resp, emitted
 }
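
The repeated nil checks on optional pointer fields can also be expressed with a small generic helper. This is a sketch only: `endpoint`, `result`, `deref`, and `emit` are hypothetical simplifications of the types in the diff above, and generics require Go 1.18 or later.

```go
package main

import "fmt"

// endpoint is a hypothetical stand-in for the SDK's endpoint type,
// which exposes optional fields as pointers.
type endpoint struct {
	IP       *string
	Hostname *string
	Port     *int
}

// result is a simplified stand-in for sources.Result.
type result struct {
	IP   string
	Host string
	Port int
}

// deref returns the pointed-to value, or the zero value when p is nil.
func deref[T any](p *T) T {
	if p == nil {
		var zero T
		return zero
	}
	return *p
}

// emit sends one result per endpoint and stops once `remaining`
// results have been sent. remaining <= 0 means "no cap".
// It returns the number of results emitted.
func emit(endpoints []endpoint, out chan<- result, remaining int) int {
	emitted := 0
	for _, ep := range endpoints {
		if remaining > 0 && emitted >= remaining {
			break
		}
		out <- result{IP: deref(ep.IP), Host: deref(ep.Hostname), Port: deref(ep.Port)}
		emitted++
	}
	return emitted
}

func main() {
	ip, port := "192.0.2.10", 443
	eps := []endpoint{{IP: &ip, Port: &port}, {IP: &ip}, {}}

	out := make(chan result, len(eps))
	n := emit(eps, out, 2)
	close(out)

	fmt.Println("emitted:", n)
	for r := range out {
		fmt.Printf("%+v\n", r)
	}
}
```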
🧹 Nitpick comments (4)
sources/agent/censys/censys.go (2)

26-29: Good credentials guard; minor wording nitpick.

The guard is correct. Consider making the error explicit about which Censys keys are required (token and organization ID) to ease troubleshooting.
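
One possible shape for a more explicit error, as a sketch: `validateCensysKeys` is a hypothetical helper, and the names in the message reuse the `CENSYS_API_TOKEN` and `CENSYS_ORGANIZATION_ID` environment variables documented in the README.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateCensysKeys reports exactly which credential is missing.
// It is a hypothetical helper, not part of the PR.
func validateCensysKeys(token, orgID string) error {
	var missing []string
	if strings.TrimSpace(token) == "" {
		missing = append(missing, "CENSYS_API_TOKEN")
	}
	if strings.TrimSpace(orgID) == "" {
		missing = append(missing, "CENSYS_ORGANIZATION_ID")
	}
	if len(missing) == 0 {
		return nil
	}
	return errors.New("censys: missing credentials: " + strings.Join(missing, ", "))
}

func main() {
	fmt.Println(validateCensysKeys("", "org-123"))
	fmt.Println(validateCensysKeys("", ""))
	fmt.Println(validateCensysKeys("token", "org-123"))
}
```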


4-11: Context usage: consider honoring caller timeouts/cancellation.

Using context.Background() ignores any timeout/cancel intent from upstream. Consider threading a context from session or using a context with timeout derived from session config.
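
A minimal sketch of deriving a bounded context from the caller's context instead of using `context.Background()` directly. `search`, `searchWithTimeout`, `runOnce`, and `isTimeout` are hypothetical names; where the timeout value would come from in the session configuration is an assumption left open here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// search is a hypothetical stand-in for an SDK call that honors ctx.
func search(ctx context.Context, work time.Duration) error {
	select {
	case <-time.After(work):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// searchWithTimeout derives a bounded context from the parent, so both
// the caller's cancellation and the local timeout are respected.
func searchWithTimeout(parent context.Context, timeout, work time.Duration) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()
	return search(ctx, work)
}

// runOnce is a demo wrapper taking durations in milliseconds.
func runOnce(timeoutMs, workMs int) error {
	return searchWithTimeout(
		context.Background(),
		time.Duration(timeoutMs)*time.Millisecond,
		time.Duration(workMs)*time.Millisecond,
	)
}

// isTimeout reports whether err is a context deadline error.
func isTimeout(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(isTimeout(runOnce(20, 500))) // true: the call outlives the timeout
	fmt.Println(runOnce(500, 5))             // <nil>: the call finishes in time
}
```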

README.md (2)

138-139: Gitleaks false positives on example keys; add allowlist or adjust examples.

Static analysis flags these as generic API keys even though they’re placeholders. Either:

  • Add a gitleaks allowlist regex for these examples, or
  • Change placeholders to clearly non-secret values (e.g., EXAMPLE_CENSYS_API_TOKEN:EXAMPLE_CENSYS_ORG_ID).

Example allowlist regex:

  • ^CENSYS_API_TOKEN_[0-9]+:CENSYS_ORGANIZATION_ID_[0-9]+$
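
Before adding the pattern to an allowlist, it may be worth checking which placeholder forms it actually matches. The sketch below only exercises the regex itself with Go's `regexp` package; how Gitleaks applies an allowlist pattern (to the secret, the match, or the whole line) depends on its configuration and is not modeled here. `isPlaceholder` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholderRe is the allowlist pattern suggested above.
var placeholderRe = regexp.MustCompile(`^CENSYS_API_TOKEN_[0-9]+:CENSYS_ORGANIZATION_ID_[0-9]+$`)

// isPlaceholder reports whether s matches the pattern in full.
func isPlaceholder(s string) bool {
	return placeholderRe.MatchString(s)
}

func main() {
	samples := []string{
		"CENSYS_API_TOKEN_2:CENSYS_ORGANIZATION_ID_2",
		"<CENSYS_API_TOKEN_2>:<CENSYS_ORGANIZATION_ID_2>",
		"  - CENSYS_API_TOKEN_2:CENSYS_ORGANIZATION_ID_2",
	}
	for _, s := range samples {
		fmt.Printf("%q -> %v\n", s, isPlaceholder(s))
	}
}
```

Because the pattern is anchored with `^` and `$`, it matches only the bare form. The angle-bracketed form and a full YAML list line do not match, so the pattern may need adjusting depending on what the scanner compares it against.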

198-198: Doc link update LGTM; consider adding a one-line note on the new Censys key format.

A short note like “Censys requires an API token and an organization ID on the new platform” would reduce confusion for users migrating from the old ID/SECRET pair.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c96774f and a6915e7.

⛔ Files ignored due to path filters (3)
  • .github/workflows/provider-integration.yml is excluded by !**/*.yml
  • go.sum is excluded by !**/*.sum
  • sources/agent/censys/example.json is excluded by !**/*.json
📒 Files selected for processing (6)
  • README.md (3 hunks)
  • go.mod (2 hunks)
  • sources/agent/censys/censys.go (3 hunks)
  • sources/agent/censys/response.go (0 hunks)
  • sources/keys.go (2 hunks)
  • sources/provider.go (2 hunks)
💤 Files with no reviewable changes (1)
  • sources/agent/censys/response.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • sources/provider.go
  • go.mod
  • sources/keys.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-22T09:38:04.688Z
Learnt from: dogancanbakir
PR: projectdiscovery/uncover#674
File: sources/provider.go:82-84
Timestamp: 2025-05-22T09:38:04.688Z
Learning: The ZoomEye API has been updated and no longer supports the old format that used "token:host". The new implementation uses a single token without host specification, connecting to a fixed endpoint at https://api.zoomeye.ai/v2/search.

Applied to files:

  • README.md
🧬 Code Graph Analysis (1)
sources/agent/censys/censys.go (4)
sources/keys.go (1)
  • Keys (3-22)
sources/result.go (1)
  • Result (8-17)
sources/agent.go (2)
  • Agent (8-11)
  • Query (3-6)
sources/session.go (1)
  • Session (38-43)
🪛 Gitleaks (8.27.2)
README.md

138-138: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


139-139: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🔇 Additional comments (1)
README.md (1)

181-183: Docs alignment LGTM.

Environment variable names match the new SDK-based auth (token + organization ID).

@ehsandeep ehsandeep changed the base branch from main to dev August 15, 2025 14:14
Member

@dogancanbakir dogancanbakir left a comment

Thanks for the PR. I left some comments.

hasNextCursor = true
}

if hasNextCursor || numberOfResults > query.Limit || len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits) == 0 {
Member

Did you mean !hasNextCursor?

func (agent *Agent) queryURL(session *sources.Session, censysRequest *CensysRequest) (*operations.V3GlobaldataSearchQueryResponse, error) {
ctx := context.Background()

s := censyssdkgo.New(
Member

We should create the client once and pass it along
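
One way to read this suggestion, as a sketch with hypothetical names (`client`, `newClient`, `queryPage`, `run`), not the SDK's API: build the client once per query run and pass it to the per-page helper, instead of constructing a new client for every page.

```go
package main

import "fmt"

// client is a hypothetical stand-in for the SDK client.
type client struct{ orgID string }

// built counts constructions, for demonstration only.
var built int

// newClient stands in for the (potentially costly) SDK constructor.
func newClient(orgID string) *client {
	built++
	return &client{orgID: orgID}
}

// queryPage receives the client rather than constructing one per call.
func queryPage(c *client, query, cursor string) string {
	return fmt.Sprintf("%s|%s|%s", c.orgID, query, cursor)
}

// run builds the client once and reuses it for every page.
func run(orgID, query string, cursors []string) []string {
	c := newClient(orgID)
	out := make([]string, 0, len(cursors))
	for _, cur := range cursors {
		out = append(out, queryPage(c, query, cur))
	}
	return out
}

func main() {
	pages := run("org-123", "nginx", []string{"", "p2", "p3"})
	fmt.Println(len(pages), "pages,", built, "client constructed")
}
```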

Development

Successfully merging this pull request may close these issues.

Censys no longer working
3 participants