Adapt censys to new platform search #694
base: dev
release v1.1.0
…iscovery#689) Bumps [golang.org/x/oauth2](https://github.com/golang/oauth2) from 0.18.0 to 0.27.0. - [Commits](golang/oauth2@v0.18.0...v0.27.0) --- updated-dependencies: - dependency-name: golang.org/x/oauth2 dependency-version: 0.27.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Important: Review skipped. Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI.
Walkthrough
Replaced manual HTTP Censys integration with the official censys-sdk-go, switched authentication to token+organization ID, removed legacy Censys response models, updated key parsing/env var names, and added the SDK dependency in go.mod and docs.
Sequence Diagram(s)
sequenceDiagram
participant Agent
participant Provider
participant CensysSDK as Censys SDK
participant CensysAPI as Censys API
Agent->>Provider: GetKeys()
Provider-->>Agent: CensysToken, CensysOrgId
Agent->>CensysSDK: Init client (OrgID, Token)
Agent->>CensysSDK: GlobalData.Search(query, pageToken)
CensysSDK->>CensysAPI: Request search
CensysAPI-->>CensysSDK: Search results (+NextPageToken)
CensysSDK-->>Agent: Typed results (hits/endpoints)
Agent->>Agent: Iterate endpoints → collect IP/Host/Port/URL/Raw
Agent-->>CensysSDK: GlobalData.Search(nextPageToken) if NextPageToken present
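The paging flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `page`, `searchFunc`, `collectAll`, and `fakeSearch` are hypothetical stand-ins, and the real SDK call and response types are not reproduced here.

```go
package main

import "fmt"

// page is a hypothetical stand-in for one page of search results.
type page struct {
	Endpoints     []string
	NextPageToken string
}

// searchFunc is a hypothetical stand-in for the SDK search call.
type searchFunc func(query, pageToken string) (page, error)

// collectAll follows NextPageToken until the API reports no further page.
func collectAll(search searchFunc, query string) ([]string, error) {
	var all []string
	token := ""
	for {
		p, err := search(query, token)
		if err != nil {
			return all, err
		}
		all = append(all, p.Endpoints...)
		if p.NextPageToken == "" {
			return all, nil
		}
		token = p.NextPageToken
	}
}

// fakeSearch serves two fixed pages so the sketch runs without network access.
func fakeSearch(_ string, token string) (page, error) {
	if token == "" {
		return page{Endpoints: []string{"1.1.1.1:443", "1.1.1.1:80"}, NextPageToken: "p2"}, nil
	}
	return page{Endpoints: []string{"8.8.8.8:53"}}, nil
}

func main() {
	all, err := collectAll(fakeSearch, "example")
	fmt.Println(len(all), err)
}
```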
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~15 minutes
Actionable comments posted: 3
🔭 Outside diff range comments (1)
sources/agent/censys/censys.go (1)
35-52: Limit enforcement counts hits, not endpoints; can exceed user limit.
Each hit can yield multiple endpoints, but you only increment by hits. This can overshoot by a large margin and breaks expected Limit semantics.
Proposed refactor: track how many endpoints you've emitted, pass a "remaining" budget into query, and adjust PageSize accordingly.
 go func() {
     defer close(results)
-    var numberOfResults int
+    var emitted int
     nextCursor := ""
+    perPage := MaxPerPage
+    if query.Limit > 0 && query.Limit < perPage {
+        perPage = query.Limit
+    }
     for {
         censysRequest := &CensysRequest{
             Query: query.Query,
-            PerPage: MaxPerPage,
+            PerPage: perPage,
             Cursor: nextCursor,
         }
-        censysResponse := agent.query(session, censysRequest, results)
+        remaining := -1
+        if query.Limit > 0 {
+            remaining = query.Limit - emitted
+            if remaining <= 0 {
+                break
+            }
+        }
+        censysResponse, out := agent.query(session, censysRequest, results, remaining)
         if censysResponse == nil {
             break
         }
-        nextCursor = censysResponse.ResponseEnvelopeSearchQueryResponse.Result.NextPageToken
-        if nextCursor == "" || numberOfResults > query.Limit || len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits) == 0 {
+        if censysResponse.ResponseEnvelopeSearchQueryResponse == nil || censysResponse.ResponseEnvelopeSearchQueryResponse.Result == nil {
+            break
+        }
+        nextCursor = censysResponse.ResponseEnvelopeSearchQueryResponse.Result.NextPageToken
+        emitted += out
+        if nextCursor == "" || (query.Limit > 0 && emitted >= query.Limit) || len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits) == 0 {
             break
         }
-        numberOfResults += len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits)
+        if query.Limit > 0 {
+            remaining = query.Limit - emitted
+            if remaining <= 0 {
+                break
+            }
+            if remaining < MaxPerPage {
+                perPage = remaining
+            } else {
+                perPage = MaxPerPage
+            }
+        }
     }
 }()
And update the helper signature to return how many endpoints were emitted (see next comment).
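To make the hits-versus-endpoints distinction concrete, here is a small self-contained sketch of an endpoint-level budget. It assumes the same convention as the diff above (a negative `remaining` means no limit); `hit` and `takeEndpoints` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// hit is a hypothetical stand-in for one search hit carrying several endpoints.
type hit struct{ Endpoints []string }

// takeEndpoints flattens hits into endpoints and stops once `remaining`
// endpoints have been taken. A negative remaining means "no limit".
func takeEndpoints(hits []hit, remaining int) []string {
	var out []string
	for _, h := range hits {
		for _, ep := range h.Endpoints {
			if remaining >= 0 && len(out) >= remaining {
				return out
			}
			out = append(out, ep)
		}
	}
	return out
}

func main() {
	hits := []hit{
		{Endpoints: []string{"a:80", "a:443", "a:8080"}},
		{Endpoints: []string{"b:80", "b:443", "b:8080"}},
	}
	// Two hits, six endpoints: counting hits alone would let all six through
	// for a limit of 4, while the endpoint budget stops at 4.
	fmt.Println(len(takeEndpoints(hits, 4)), len(takeEndpoints(hits, -1)))
}
```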
🧹 Nitpick comments (2)
sources/provider.go (1)
155-167: Support new env var names for Censys (backward- and forward-compatible).
Env vars still reference CENSYS_API_ID/SECRET, while code expects token+org ID. To reduce confusion and support new accounts, accept CENSYS_TOKEN and CENSYS_ORG_ID as an alternative pair.
Apply this minimal change to add support while keeping old names:
 appendIfAllExists := func(arr []string, env1 string, env2 string) []string {
     if val1, ok := os.LookupEnv(env1); ok {
         if val2, ok2 := os.LookupEnv(env2); ok2 {
             return append(arr, fmt.Sprintf("%s:%s", val1, val2))
         } else {
             gologger.Error().Msgf("%v env variable exists but %v does not", env1, env2)
         }
     }
     return arr
 }
 provider.Fofa = appendIfAllExists(provider.Fofa, "FOFA_EMAIL", "FOFA_KEY")
-provider.Censys = appendIfAllExists(provider.Censys, "CENSYS_API_ID", "CENSYS_API_SECRET")
+// Back-compat (old naming) and new naming supported
+provider.Censys = appendIfAllExists(provider.Censys, "CENSYS_API_ID", "CENSYS_API_SECRET")
+provider.Censys = appendIfAllExists(provider.Censys, "CENSYS_TOKEN", "CENSYS_ORG_ID")
 provider.Google = appendIfAllExists(provider.Google, "GOOGLE_API_KEY", "GOOGLE_API_CX")
sources/agent/censys/censys.go (1)
59-64: Optional: use a cancelable context (timeout/deadline) for API calls.
Relying on context.Background ties requests to process lifetime. Consider using a context with timeout derived from session config.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (2)
- go.sum is excluded by !**/*.sum
- sources/agent/censys/example.json is excluded by !**/*.json
📒 Files selected for processing (5)
- go.mod (2 hunks)
- sources/agent/censys/censys.go (3 hunks)
- sources/agent/censys/response.go (0 hunks)
- sources/keys.go (2 hunks)
- sources/provider.go (1 hunks)
💤 Files with no reviewable changes (1)
- sources/agent/censys/response.go
🧰 Additional context used
🧬 Code Graph Analysis (1)
sources/agent/censys/censys.go (5)
- sources/keys.go (1): Keys (3-22)
- uncover.go (1): New (56-110)
- sources/result.go (1): Result (8-17)
- sources/agent.go (2): Agent (8-11), Query (3-6)
- sources/session.go (1): Session (38-43)
🔇 Additional comments (5)
go.mod (2)
67-67: LGTM on indirect decimal addition.
Likely pulled in by the SDK; no concerns.
6-6: censys-sdk-go v0.19.1: Search API present and compatible
Checked the SDK tag v0.19.1. It includes the Search API, and the request/response shapes used in this repo match your usage.
- Repo usage: sources/agent/censys/censys.go — call to GlobalData.Search with operations.V3GlobaldataSearchQueryRequest (passes components.SearchQueryInputBody with PageSize via censyssdkgo.Int64) and later reads resp.ResponseEnvelopeSearchQueryResponse.Result.NextPageToken / .Hits.
- SDK (github.com/censys/censys-sdk-go@v0.19.1) files inspected:
- models/operations/v3globaldatasearchquery.go — V3GlobaldataSearchQueryRequest / V3GlobaldataSearchQueryResponse
- models/components/searchqueryinputbody.go — SearchQueryInputBody (Fields []string, PageSize *int64, PageToken *string, Query string)
- models/components/responseenvelopesearchqueryresponse.go — ResponseEnvelopeSearchQueryResponse (Result *SearchQueryResponse)
- globaldata.go — GlobalData.Search implementation (unmarshals into components.ResponseEnvelopeSearchQueryResponse)
Conclusion: pinning to v0.19.1 is compatible for GlobalData.Search and ResponseEnvelopeSearchQueryResponse — no change required.
sources/keys.go (2)
5-5: Rename to CensysOrgId is consistent with the new auth model.
Matches the SDK initialization that uses an Organization ID plus token.
25-25: No remaining CensysSecret references; Empty() change is correct
Searched the repo for "CensysSecret" (no matches). The Keys field was renamed and all usages reference the new fields:
- sources/keys.go — defines CensysOrgId; Empty() uses it.
- sources/provider.go — parses provider.Censys and assigns keys.CensysOrgId = parts[1] (env vars: CENSYS_API_ID / CENSYS_API_SECRET).
- sources/agent/censys/censys.go — validates session.Keys.CensysOrgId and passes it to censyssdkgo.WithOrganizationID.
No stale references found; no changes required.
sources/provider.go (1)
59-60: Assignment to CensysOrgId is correct.
Parsing “token:orgId” into Keys works as intended.
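For readers unfamiliar with the "token:orgId" key format mentioned here, a minimal parsing sketch follows. `parseCensysKey` is a hypothetical helper written for illustration; it is not the code in sources/provider.go, which this thread only describes as assigning `parts[1]` to the org ID.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCensysKey splits a "token:orgId" pair into its two halves.
// Hypothetical helper: it rejects input where either half is missing.
func parseCensysKey(raw string) (token, orgID string, ok bool) {
	parts := strings.SplitN(raw, ":", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", false
	}
	return parts[0], parts[1], true
}

func main() {
	token, orgID, ok := parseCensysKey("my-token:my-org")
	fmt.Println(token, orgID, ok)
}
```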
sources/agent/censys/censys.go
Outdated
         nextCursor = censysResponse.ResponseEnvelopeSearchQueryResponse.Result.NextPageToken
         if nextCursor == "" || numberOfResults > query.Limit || len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits) == 0 {
             break
         }
-        numberOfResults += len(censysResponse.Results.Hits)
+        numberOfResults += len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits)
     }
Potential nil-pointer dereference when reading NextPageToken.
resp.ResponseEnvelopeSearchQueryResponse or Result can be nil on some responses; dereferencing without checks will panic.
Minimal safe-guard:
- nextCursor = censysResponse.ResponseEnvelopeSearchQueryResponse.Result.NextPageToken
- if nextCursor == "" || numberOfResults > query.Limit || len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits) == 0 {
+ var hasNext bool
+ if censysResponse.ResponseEnvelopeSearchQueryResponse != nil && censysResponse.ResponseEnvelopeSearchQueryResponse.Result != nil {
+ nextCursor = censysResponse.ResponseEnvelopeSearchQueryResponse.Result.NextPageToken
+ hasNext = nextCursor != "" && len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits) > 0
+ }
+ if !hasNext || numberOfResults > query.Limit {
break
}
Note: See the separate comment for enforcing the Limit correctly at the endpoint level.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In sources/agent/censys/censys.go around lines 47 to 52, the code dereferences
resp.ResponseEnvelopeSearchQueryResponse and its Result without nil checks which
can cause a panic; add guards that ensure
resp.ResponseEnvelopeSearchQueryResponse != nil and
resp.ResponseEnvelopeSearchQueryResponse.Result != nil before reading
NextPageToken or Hits, treat NextPageToken as empty string if absent and treat
Hits as zero-length when nil, and if either parent is nil break the loop (or
handle the error path) so the loop exits safely without panicking.
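The nil-guard described above can be isolated into one accessor so the loop never dereferences a missing parent. The sketch below uses hypothetical local types (`response`, `envelope`, `result`) that only mirror the nesting of the SDK response; the SDK's own field names and types are not reproduced.

```go
package main

import "fmt"

// Hypothetical types mirroring the nesting response -> envelope -> result.
type result struct {
	Hits          []string
	NextPageToken string
}
type envelope struct{ Result *result }
type response struct{ Envelope *envelope }

// pageInfo returns the next token and hit count, or ok=false when any
// parent pointer is nil, so callers can break instead of panicking.
func pageInfo(resp *response) (next string, hits int, ok bool) {
	if resp == nil || resp.Envelope == nil || resp.Envelope.Result == nil {
		return "", 0, false
	}
	r := resp.Envelope.Result
	return r.NextPageToken, len(r.Hits), true
}

func main() {
	_, _, ok := pageInfo(&response{}) // envelope missing
	fmt.Println(ok)
	next, hits, ok := pageInfo(&response{Envelope: &envelope{Result: &result{Hits: []string{"h"}, NextPageToken: "p2"}}})
	fmt.Println(next, hits, ok)
}
```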
I don't have time for your AI shit, sorry.
Actionable comments posted: 1
📜 Review details
⛔ Files ignored due to path filters (3)
- .github/workflows/provider-integration.yml is excluded by !**/*.yml
- go.sum is excluded by !**/*.sum
- sources/agent/censys/example.json is excluded by !**/*.json
📒 Files selected for processing (6)
- README.md (3 hunks)
- go.mod (2 hunks)
- sources/agent/censys/censys.go (3 hunks)
- sources/agent/censys/response.go (0 hunks)
- sources/keys.go (2 hunks)
- sources/provider.go (2 hunks)
💤 Files with no reviewable changes (1)
- sources/agent/censys/response.go
🚧 Files skipped from review as they are similar to previous changes (4)
- sources/provider.go
- go.mod
- README.md
- sources/agent/censys/censys.go
🔇 Additional comments (1)
sources/keys.go (1)
25-25: Empty() update looks correct
The all-empty semantics remain intact with the switch to CensysOrgId. No issues here.
@@ -2,7 +2,7 @@ package sources

 type Keys struct {
     CensysToken string
-    CensysSecret string
+    CensysOrgId string
💡 Verification agent
🧩 Analysis chain
Prefer Go initialism casing: rename CensysOrgId to CensysOrgID
Go convention is to use “ID” (not “Id”). This small rename avoids future churn and matches the rest of the ecosystem.
Apply within this file:
- CensysOrgId string
+ CensysOrgID string
Then verify/clean up references across the repo (field name, old identifiers, and env vars):
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Any old identifiers still present?
rg -n -C2 -P '\bCensysSecret\b|\bCENSYS_API_SECRET\b|\bCENSYS_API_ID\b'
# 2) If you adopt CensysOrgID, ensure no leftover "OrgId" usages remain:
rg -n -C2 -P '\bCensysOrgId\b'
# 3) Double-check new env names are the only ones used:
rg -n -C2 -P '\bCENSYS_API_TOKEN\b|\bCENSYS_ORGANIZATION_ID\b'
Length of output: 93
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== repo files containing 'keys.go' (top-level search) ==="
rg -n --hidden --glob '!node_modules' --glob '!dist' --files-with-matches 'keys.go' || true
echo
echo "=== Show sources/keys.go if it exists ==="
if [ -f "sources/keys.go" ]; then
sed -n '1,240p' sources/keys.go || true
else
echo "sources/keys.go not found"
fi
echo
echo "=== Search for identifiers and env vars related to Censys ==="
rg -n --hidden --glob '!node_modules' -S '\bCensysOrgId\b|\bCensysOrgID\b|\bCensysSecret\b|\bCENSYS_API_SECRET\b|\bCENSYS_API_ID\b|\bCENSYS_API_TOKEN\b|\bCENSYS_ORGANIZATION_ID\b' || true
echo
echo "=== Wider search for 'Censys' to find any related usages ==="
rg -n --hidden --glob '!node_modules' -S '\bCensys\b' || true
Length of output: 4300
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== files that reference CensysOrgId (rg) ==="
rg -n --hidden -S '\bCensysOrgId\b' || true
echo
echo "=== sources/provider.go ==="
if [ -f "sources/provider.go" ]; then
sed -n '1,240p' sources/provider.go || true
else
echo "sources/provider.go not found"
fi
echo
echo "=== sources/agent/censys/censys.go ==="
if [ -f "sources/agent/censys/censys.go" ]; then
sed -n '1,240p' sources/agent/censys/censys.go || true
else
echo "sources/agent/censys/censys.go not found"
fi
Length of output: 10924
Rename CensysOrgId → CensysOrgID (use Go initialism "ID")
Follow-up: I verified occurrences — this is a simple field rename but requires updating all references.
Places to change:
- sources/keys.go
  - field: change CensysOrgId string → CensysOrgID string
  - update Empty() to use keys.CensysOrgID
- sources/provider.go
  - in GetKeys(): keys.CensysOrgId = parts[1] → keys.CensysOrgID = parts[1]
- sources/agent/censys/censys.go
  - Query() check: session.Keys.CensysOrgId → session.Keys.CensysOrgID
  - queryURL(): censyssdkgo.WithOrganizationID(session.Keys.CensysOrgId) → censyssdkgo.WithOrganizationID(session.Keys.CensysOrgID)
Diff to apply in sources/keys.go:
- CensysOrgId string
+ CensysOrgID string
After applying, run:
- rg -n --hidden -S '\bCensysOrgId\b' to ensure no references remain
- go vet / go test or your CI checks to catch any missed references.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- CensysOrgId string
+ CensysOrgID string
🤖 Prompt for AI Agents
In sources/keys.go around line 5, rename the struct field CensysOrgId to
CensysOrgID and update any uses (e.g., Empty() should reference
keys.CensysOrgID); also update all references across the codebase as listed:
sources/provider.go (GetKeys assignment), sources/agent/censys/censys.go
(Query() check and queryURL() call to censyssdkgo.WithOrganizationID), then run
rg -n --hidden -S '\bCensysOrgId\b' to confirm no occurrences remain and run go
vet / go test (or CI) to catch any missed references.
Use the new Censys SDK for better stability across API updates. Because Censys now returns multiple endpoints per search hit, we iterate over every endpoint and create a separate result for each. This also changes how IPs are saved in the raw response: the earlier problem no longer applies, since each endpoint carries exactly one IP.
Actionable comments posted: 0
♻️ Duplicate comments (5)
sources/keys.go (1)
5-5: Use Go initialism “ID”: rename CensysOrgId → CensysOrgID
Go convention prefers “ID” over “Id”. Rename the field and its usages for consistency and to avoid future churn.
Apply this diff here:
- CensysOrgId string
+ CensysOrgID string
And adjust the Empty() check:
- return keys.CensysOrgId == "" &&
+ return keys.CensysOrgID == "" &&
Run this to catch and update remaining references across the repo:
#!/bin/bash
set -euo pipefail
rg -n -C2 -P '\bCensysOrgId\b'
Also applies to: 25-25
sources/agent/censys/censys.go (4)
63-81: Respect PerPage and avoid sending empty PageToken
Use the request’s PerPage (bounded to [1..MaxPerPage]) and only set PageToken when non-empty.
 func (agent *Agent) queryURL(session *sources.Session, censysRequest *CensysRequest) (*operations.V3GlobaldataSearchQueryResponse, error) {
     ctx := context.Background()
     s := censyssdkgo.New(
-        censyssdkgo.WithOrganizationID(session.Keys.CensysOrgId),
+        censyssdkgo.WithOrganizationID(session.Keys.CensysOrgID),
         censyssdkgo.WithSecurity(session.Keys.CensysToken),
         censyssdkgo.WithClient(
             session.Client.HTTPClient,
         ),
     )
-    return s.GlobalData.Search(ctx, operations.V3GlobaldataSearchQueryRequest{
-        SearchQueryInputBody: components.SearchQueryInputBody{
-            PageSize: censyssdkgo.Int64(MaxPerPage),
-            Query: censysRequest.Query,
-            PageToken: &censysRequest.Cursor,
-        },
-    })
+    perPage := censysRequest.PerPage
+    if perPage <= 0 || perPage > MaxPerPage {
+        perPage = MaxPerPage
+    }
+    var pageToken *string
+    if censysRequest.Cursor != "" {
+        pageToken = &censysRequest.Cursor
+    }
+    return s.GlobalData.Search(ctx, operations.V3GlobaldataSearchQueryRequest{
+        SearchQueryInputBody: components.SearchQueryInputBody{
+            PageSize: censyssdkgo.Int64(int64(perPage)),
+            Query: censysRequest.Query,
+            PageToken: pageToken,
+        },
+    })
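The two small rules in this suggestion (clamp the page size, send no token on the first page) can be tried in isolation. The helpers below are hypothetical and independent of the SDK; they only show the bounds check and the nil-versus-empty pointer choice.

```go
package main

import "fmt"

// clampPageSize falls back to max when requested is outside [1..max].
func clampPageSize(requested, max int) int64 {
	if requested <= 0 || requested > max {
		return int64(max)
	}
	return int64(requested)
}

// optionalToken returns nil for an empty cursor so that no page token
// is sent on the first request; otherwise it returns a pointer to a copy.
func optionalToken(cursor string) *string {
	if cursor == "" {
		return nil
	}
	return &cursor
}

func main() {
	fmt.Println(clampPageSize(0, 100), clampPageSize(25, 100), clampPageSize(500, 100))
	fmt.Println(optionalToken("") == nil, *optionalToken("abc"))
}
```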
26-27: Follow Go initialism: CensysOrgId → CensysOrgID (and update usages)
Minor naming nit; keeps the codebase idiomatic.
- if session.Keys.CensysToken == "" || session.Keys.CensysOrgId == "" {
+ if session.Keys.CensysToken == "" || session.Keys.CensysOrgID == "" {
- censyssdkgo.WithOrganizationID(session.Keys.CensysOrgId),
+ censyssdkgo.WithOrganizationID(session.Keys.CensysOrgID),
Also applies to: 67-68
93-116: Guard against nil SDK envelopes before dereferencing Result/Hits
Directly dereferencing ResponseEnvelopeSearchQueryResponse and Result can panic on sparse responses.
Apply this diff:
- if result := resp.ResponseEnvelopeSearchQueryResponse.Result; result != nil {
-     for _, censysResult := range result.Hits {
+ if resp != nil && resp.ResponseEnvelopeSearchQueryResponse != nil && resp.ResponseEnvelopeSearchQueryResponse.Result != nil {
+     result := resp.ResponseEnvelopeSearchQueryResponse.Result
+     for _, censysResult := range result.Hits {
          for _, host := range censysResult.WebpropertyV1.Resource.Endpoints {
              result := sources.Result{Source: agent.Name()}
              if host.IP != nil {
                  result.IP = *host.IP
              }
              if host.Hostname != nil {
                  result.Host = *host.Hostname
              }
              if host.Port != nil {
                  result.Port = *host.Port
              }
              if host.HTTP != nil && host.HTTP.URI != nil {
                  result.Url = *host.HTTP.URI
              }
              raw, _ := json.Marshal(host)
              result.Raw = raw
              results <- result
          }
-
-     }
- }
+     }
+ }
47-57: Fix inverted pagination condition; loop exits when it should continue
The loop currently breaks when a next cursor is present. That prevents pagination beyond the first page. Also, it dereferences nested fields without guarding against nil.
Apply this diff to make the loop continue while a next page exists, guard against nil, and respect limit with >=:
- hasNextCursor := false
- if censysResponse.ResponseEnvelopeSearchQueryResponse.Result != nil && censysResponse.ResponseEnvelopeSearchQueryResponse.Result.NextPageToken != "" {
-     hasNextCursor = true
- }
-
- if hasNextCursor || numberOfResults > query.Limit || len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits) == 0 {
-     break
- }
- nextCursor = censysResponse.ResponseEnvelopeSearchQueryResponse.Result.NextPageToken
- numberOfResults += len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits)
+ // Defensive checks on nested fields
+ if censysResponse.ResponseEnvelopeSearchQueryResponse == nil ||
+     censysResponse.ResponseEnvelopeSearchQueryResponse.Result == nil {
+     break
+ }
+ res := censysResponse.ResponseEnvelopeSearchQueryResponse.Result
+ hitsLen := len(res.Hits)
+ nextToken := res.NextPageToken
+
+ // Update emitted counter based on page hits (approximation; see note below)
+ numberOfResults += hitsLen
+
+ // Break when:
+ // - no hits returned
+ // - limit reached/exceeded
+ // - no next page token (i.e., last page)
+ if hitsLen == 0 || (query.Limit > 0 && numberOfResults >= query.Limit) || nextToken == "" {
+     break
+ }
+ nextCursor = nextToken
Note: this still approximates the limit by page hits, not per-endpoint emissions. See separate comment for a precise endpoint-level limit.
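The corrected break condition is easier to check as a pure function. The sketch below is an illustration only: `shouldStop` and `shouldStopInverted` are hypothetical names, and the second function reproduces the inverted check described in the comment so the difference in behavior is visible.

```go
package main

import "fmt"

// shouldStop reports whether pagination should end after a page:
// no hits, limit reached (when a limit is set), or no next page token.
func shouldStop(hitsLen, emitted, limit int, nextToken string) bool {
	return hitsLen == 0 || (limit > 0 && emitted >= limit) || nextToken == ""
}

// shouldStopInverted mirrors the buggy check: it stops when a next
// token IS present, so it never advances past the first page.
func shouldStopInverted(hitsLen, emitted, limit int, nextToken string) bool {
	hasNext := nextToken != ""
	return hasNext || emitted > limit || hitsLen == 0
}

func main() {
	// First page of many: 100 hits, limit 1000, more pages available.
	fmt.Println(shouldStop(100, 100, 1000, "p2"))         // keeps going
	fmt.Println(shouldStopInverted(100, 100, 1000, "p2")) // stops too early
}
```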
🧹 Nitpick comments (3)
sources/agent/censys/censys.go (2)
39-42: Tie PerPage to the requested limit (bounded by MaxPerPage)
When the user asks for fewer than MaxPerPage results, it’s more efficient to ask the API for that many per page.
- censysRequest := &CensysRequest{
-     Query: query.Query,
-     PerPage: MaxPerPage,
-     Cursor: nextCursor,
- }
+ perPage := MaxPerPage
+ if query.Limit > 0 && query.Limit < perPage {
+     perPage = query.Limit
+ }
+ censysRequest := &CensysRequest{
+     Query: query.Query,
+     PerPage: perPage,
+     Cursor: nextCursor,
+ }
96-113: Consider enforcing the limit at the endpoint level (exact), not per “hit” (approximate)
You currently count result “hits” per page, but you emit one result per endpoint, which may exceed the requested limit. Optional but improves UX.
If you want, I can provide a patch to propagate “remaining” into query() and stop emitting once the exact limit is reached.
README.md (1)
138-139: Avoid gitleaks false positives in docs by tweaking placeholders
The sample values look like real secrets to scanners (colon-delimited token:org-id). Minor, but it can fail CI.
Two options:
- Adjust placeholders to break secret patterns:
  - Use commas instead of colon in docs: CENSYS_API_TOKEN_1,CENSYS_ORGANIZATION_ID_1
  - Or wrap obvious placeholders: <CENSYS_API_TOKEN_1>:<CENSYS_ORGANIZATION_ID_1>
- Alternatively, add a gitleaks allowlist for README examples.
Example tweak:
-censys:
-  - CENSYS_API_TOKEN_1:CENSYS_ORGANIZATION_ID_1
-  - CENSYS_API_TOKEN_2:CENSYS_ORGANIZATION_ID_2
+censys:
+  # format: <CENSYS_API_TOKEN>:<CENSYS_ORGANIZATION_ID>
+  - <CENSYS_API_TOKEN_1>:<CENSYS_ORGANIZATION_ID_1>
+  - <CENSYS_API_TOKEN_2>:<CENSYS_ORGANIZATION_ID_2>
📜 Review details
⛔ Files ignored due to path filters (3)
- .github/workflows/provider-integration.yml is excluded by !**/*.yml
- go.sum is excluded by !**/*.sum
- sources/agent/censys/example.json is excluded by !**/*.json
📒 Files selected for processing (6)
- README.md (3 hunks)
- go.mod (2 hunks)
- sources/agent/censys/censys.go (3 hunks)
- sources/agent/censys/response.go (0 hunks)
- sources/keys.go (2 hunks)
- sources/provider.go (2 hunks)
💤 Files with no reviewable changes (1)
- sources/agent/censys/response.go
🚧 Files skipped from review as they are similar to previous changes (2)
- sources/provider.go
- go.mod
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-22T09:38:04.688Z
Learnt from: dogancanbakir
PR: projectdiscovery/uncover#674
File: sources/provider.go:82-84
Timestamp: 2025-05-22T09:38:04.688Z
Learning: The ZoomEye API has been updated and no longer supports the old format that used "token:host". The new implementation uses a single token without host specification, connecting to a fixed endpoint at https://api.zoomeye.ai/v2/search.
Applied to files:
README.md
🧬 Code Graph Analysis (1)
sources/agent/censys/censys.go (4)
- sources/keys.go (1): Keys (3-22)
- sources/result.go (1): Result (8-17)
- sources/agent.go (2): Agent (8-11), Query (3-6)
- sources/session.go (1): Session (38-43)
🪛 Gitleaks (8.27.2)
README.md
138-138: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
139-139: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🔇 Additional comments (2)
README.md (2)
181-183: Docs: new Censys env vars look good
CENSYS_API_TOKEN and CENSYS_ORGANIZATION_ID align with the SDK-based auth and code changes.
198-198: Link update LGTM
Docs link to Censys “get started” is correct for the new platform.
Actionable comments posted: 0
♻️ Duplicate comments (3)
sources/agent/censys/censys.go (3)
35-57: Pagination logic bug (inverted condition) and missing limit enforcement at endpoint level.
- The loop breaks when a next page exists (hasNextCursor == true), which prematurely stops after the first page in most cases.
- Limit is tracked by counting hits, but results are emitted per endpoint; this can overshoot the requested limit by a large margin.
- Multiple potential nil dereferences: ResponseEnvelopeSearchQueryResponse and Result are dereferenced without nil checks.
Proposed fix: enforce the limit based on emitted endpoints and correct the break conditions; also guard for nils.
- var numberOfResults int
- nextCursor := ""
+ var numberOfResults int
+ nextCursor := ""
  for {
-     censysRequest := &CensysRequest{
-         Query: query.Query,
-         PerPage: MaxPerPage,
-         Cursor: nextCursor,
-     }
-     censysResponse := agent.query(session, censysRequest, results)
-     if censysResponse == nil {
-         break
-     }
-     hasNextCursor := false
-     if censysResponse.ResponseEnvelopeSearchQueryResponse.Result != nil && censysResponse.ResponseEnvelopeSearchQueryResponse.Result.NextPageToken != "" {
-         hasNextCursor = true
-     }
-
-     if hasNextCursor || numberOfResults > query.Limit || len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits) == 0 {
-         break
-     }
-     nextCursor = censysResponse.ResponseEnvelopeSearchQueryResponse.Result.NextPageToken
-     numberOfResults += len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits)
+     remaining := 0
+     if query.Limit > 0 {
+         remaining = query.Limit - numberOfResults
+         if remaining <= 0 {
+             break
+         }
+     }
+     perPage := MaxPerPage
+     if remaining > 0 && remaining < MaxPerPage {
+         perPage = remaining
+     }
+     censysRequest := &CensysRequest{
+         Query: query.Query,
+         PerPage: perPage,
+         Cursor: nextCursor,
+     }
+     censysResponse, emitted := agent.query(session, censysRequest, results, remaining)
+     numberOfResults += emitted
+     if censysResponse == nil || censysResponse.ResponseEnvelopeSearchQueryResponse == nil || censysResponse.ResponseEnvelopeSearchQueryResponse.Result == nil {
+         break
+     }
+     res := censysResponse.ResponseEnvelopeSearchQueryResponse.Result
+     nextCursor = res.NextPageToken
+     // Break when: limit reached, or no next page, or no hits.
+     if (query.Limit > 0 && numberOfResults >= query.Limit) || res.NextPageToken == "" || len(res.Hits) == 0 {
+         break
+     }
  }
63-81: Avoid sending empty PageToken; respect per-page from request.
You already use censysRequest.PerPage (good). Only send PageToken when non-empty; some APIs treat an empty token pointer differently than nil.
 func (agent *Agent) queryURL(session *sources.Session, censysRequest *CensysRequest) (*operations.V3GlobaldataSearchQueryResponse, error) {
     ctx := context.Background()
     s := censyssdkgo.New(
         censyssdkgo.WithOrganizationID(session.Keys.CensysOrgId),
         censyssdkgo.WithSecurity(session.Keys.CensysToken),
         censyssdkgo.WithClient(
             session.Client.HTTPClient,
         ),
     )
-    return s.GlobalData.Search(ctx, operations.V3GlobaldataSearchQueryRequest{
+    var pageToken *string
+    if censysRequest.Cursor != "" {
+        pageToken = &censysRequest.Cursor
+    }
+    return s.GlobalData.Search(ctx, operations.V3GlobaldataSearchQueryRequest{
         SearchQueryInputBody: components.SearchQueryInputBody{
             PageSize: censyssdkgo.Int64(int64(censysRequest.PerPage)),
             Query: censysRequest.Query,
-            PageToken: &censysRequest.Cursor,
+            PageToken: pageToken,
         },
     })
84-119: Propagate remaining limit to page processing; fix nil checks and variable shadowing in result emission.
- Introduce a remaining parameter so endpoint emission can honor the global limit.
- Add nil checks for resp.ResponseEnvelopeSearchQueryResponse before dereferencing.
- Avoid shadowing variable name result to improve readability.
-func (agent *Agent) query(session *sources.Session, censysRequest *CensysRequest, results chan sources.Result) *operations.V3GlobaldataSearchQueryResponse {
+func (agent *Agent) query(session *sources.Session, censysRequest *CensysRequest, results chan sources.Result, remaining int) (*operations.V3GlobaldataSearchQueryResponse, int) {
     // query certificates
     resp, err := agent.queryURL(session, censysRequest)
     if err != nil {
         results <- sources.Result{Source: agent.Name(), Error: err}
         // httputil.DrainResponseBody(resp)
-        return nil
+        return nil, 0
     }
-    if result := resp.ResponseEnvelopeSearchQueryResponse.Result; result != nil {
-        for _, censysResult := range result.Hits {
+    emitted := 0
+    if resp == nil || resp.ResponseEnvelopeSearchQueryResponse == nil {
+        return resp, 0
+    }
+    if result := resp.ResponseEnvelopeSearchQueryResponse.Result; result != nil {
+        for _, censysResult := range result.Hits {
-            for _, host := range censysResult.WebpropertyV1.Resource.Endpoints {
-                result := sources.Result{Source: agent.Name()}
-                if host.IP != nil {
-                    result.IP = *host.IP
+            for _, endpoint := range censysResult.WebpropertyV1.Resource.Endpoints {
+                out := sources.Result{Source: agent.Name()}
+                if endpoint.IP != nil {
+                    out.IP = *endpoint.IP
                 }
-                if host.Hostname != nil {
-                    result.Host = *host.Hostname
+                if endpoint.Hostname != nil {
+                    out.Host = *endpoint.Hostname
                 }
-                if host.Port != nil {
-                    result.Port = *host.Port
+                if endpoint.Port != nil {
+                    out.Port = *endpoint.Port
                 }
-                if host.HTTP != nil && host.HTTP.URI != nil {
-                    result.Url = *host.HTTP.URI
+                if endpoint.HTTP != nil && endpoint.HTTP.URI != nil {
+                    out.Url = *endpoint.HTTP.URI
                 }
-                raw, _ := json.Marshal(host)
-                result.Raw = raw
-                results <- result
+                raw, _ := json.Marshal(endpoint) // best-effort raw
+                out.Raw = raw
+                results <- out
+                emitted++
+                if remaining > 0 && emitted >= remaining {
+                    return resp, emitted
+                }
             }
         }
     }
-    return resp
+    return resp, emitted
 }
🧹 Nitpick comments (4)
sources/agent/censys/censys.go (2)
26-29: Good credentials guard; minor wording nitpick. The guard is correct. Consider making the error explicit about which Censys keys are required (token and organization ID) to ease troubleshooting.
4-11: Context usage: consider honoring caller timeouts/cancellation. Using context.Background() ignores any timeout/cancel intent from upstream. Consider threading a context from session or using a context with timeout derived from session config.
README.md (2)
138-139: Gitleaks false positives on example keys; add allowlist or adjust examples. Static analysis flags these as generic API keys even though they’re placeholders. Either:
- Add a gitleaks allowlist regex for these examples, or
- Change placeholders to clearly non-secret values (e.g., EXAMPLE_CENSYS_API_TOKEN:EXAMPLE_CENSYS_ORG_ID).
Example allowlist regex:
- ^CENSYS_API_TOKEN_[0-9]+:CENSYS_ORGANIZATION_ID_[0-9]+$
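Before adding the example regex to an allowlist, it can be checked locally against sample strings. The sketch below uses Go's standard regexp package (Gitleaks itself is written in Go). The sample strings are assumptions about what the README placeholders look like, not copies of the actual README lines, and `isPlaceholder` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// allowlistPattern is the example regex from the review comment above.
var allowlistPattern = regexp.MustCompile(`^CENSYS_API_TOKEN_[0-9]+:CENSYS_ORGANIZATION_ID_[0-9]+$`)

// isPlaceholder reports whether s has the example placeholder shape.
func isPlaceholder(s string) bool {
	return allowlistPattern.MatchString(s)
}

func main() {
	samples := []string{
		"CENSYS_API_TOKEN_1:CENSYS_ORGANIZATION_ID_1", // assumed placeholder shape
		"someRealLookingToken:org-12345",              // hypothetical non-placeholder
	}
	for _, s := range samples {
		fmt.Printf("%-45s allowlisted=%v\n", s, isPlaceholder(s))
	}
}
```

Because the pattern is anchored at both ends, it matches only a line consisting of the placeholder pair; if the README wraps the values in YAML list syntax, the regex would need adjusting.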
198-198: Doc link update LGTM; consider adding a one-line note on the new Censys key format. A short note like “Censys requires an API token and an organization ID on the new platform” would reduce confusion for users migrating from the old ID/SECRET pair.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
- .github/workflows/provider-integration.yml is excluded by !**/*.yml
- go.sum is excluded by !**/*.sum
- sources/agent/censys/example.json is excluded by !**/*.json
📒 Files selected for processing (6)
- README.md (3 hunks)
- go.mod (2 hunks)
- sources/agent/censys/censys.go (3 hunks)
- sources/agent/censys/response.go (0 hunks)
- sources/keys.go (2 hunks)
- sources/provider.go (2 hunks)
💤 Files with no reviewable changes (1)
- sources/agent/censys/response.go
🚧 Files skipped from review as they are similar to previous changes (3)
- sources/provider.go
- go.mod
- sources/keys.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-22T09:38:04.688Z
Learnt from: dogancanbakir
PR: projectdiscovery/uncover#674
File: sources/provider.go:82-84
Timestamp: 2025-05-22T09:38:04.688Z
Learning: The ZoomEye API has been updated and no longer supports the old format that used "token:host". The new implementation uses a single token without host specification, connecting to a fixed endpoint at https://api.zoomeye.ai/v2/search.
Applied to files:
README.md
🧬 Code Graph Analysis (1)
sources/agent/censys/censys.go (4)
- sources/keys.go (1): Keys (3-22)
- sources/result.go (1): Result (8-17)
- sources/agent.go (2): Agent (8-11), Query (3-6)
- sources/session.go (1): Session (38-43)
🪛 Gitleaks (8.27.2)
README.md
138-138: Detected a Generic API Key, potentially exposing access to various services and sensitive operations. (generic-api-key)
139-139: Detected a Generic API Key, potentially exposing access to various services and sensitive operations. (generic-api-key)
🔇 Additional comments (1)
README.md (1)
181-183: Docs alignment LGTM. Environment variable names match the new SDK-based auth (token + organization ID).
Thanks for the PR, left some comments.
		hasNextCursor = true
	}

	if hasNextCursor || numberOfResults > query.Limit || len(censysResponse.ResponseEnvelopeSearchQueryResponse.Result.Hits) == 0 {
!hasNextCursor, you meant?
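The reviewer's point is that the loop should stop when there is no next cursor, not when there is one. A minimal sketch of that termination logic, using hypothetical simplified types (`page`, `fetch`, `collect`, `demoPages`) rather than the SDK's response structs:

```go
package main

import "fmt"

// page is a hypothetical, simplified stand-in for one search response.
type page struct {
	hits      []string
	nextToken string // empty when there are no further pages
}

// fetch is a hypothetical pager over canned data keyed by page token.
func fetch(pages map[string]page, token string) page {
	return pages[token]
}

// collect walks pages until there is no next cursor, the limit is
// reached, or a page comes back empty.
func collect(pages map[string]page, limit int) []string {
	var out []string
	token := "" // the first page is requested with an empty token
	for {
		p := fetch(pages, token)
		for _, h := range p.hits {
			if len(out) == limit {
				break
			}
			out = append(out, h)
		}
		hasNextCursor := p.nextToken != ""
		if !hasNextCursor || len(out) >= limit || len(p.hits) == 0 {
			break
		}
		token = p.nextToken
	}
	return out
}

// demoPages returns two canned pages linked by the token "t1".
func demoPages() map[string]page {
	return map[string]page{
		"":   {hits: []string{"a", "b"}, nextToken: "t1"},
		"t1": {hits: []string{"c"}},
	}
}

func main() {
	fmt.Println(collect(demoPages(), 10))
	fmt.Println(collect(demoPages(), 2))
}
```

With the un-negated condition from the snippet under review, a loop like this would exit after the first page whenever more pages exist. The sketch also uses `>=` for the limit check where the PR uses `>`; that is a choice made here, not something the review requested.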
func (agent *Agent) queryURL(session *sources.Session, censysRequest *CensysRequest) (*operations.V3GlobaldataSearchQueryResponse, error) {
	ctx := context.Background()

	s := censyssdkgo.New(
We should create the client once and pass it along
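One way to read this suggestion: build the client at the start of the query and hand it to each per-page call, instead of constructing it inside queryURL on every request. The sketch below uses a hypothetical `apiClient` type in place of the SDK client; the real constructor and its options are deliberately not reproduced.

```go
package main

import "fmt"

// apiClient is a hypothetical stand-in for the SDK client.
type apiClient struct {
	orgID string
	token string
	calls int // number of requests issued through this client
}

// newAPIClient is a hypothetical constructor; the real SDK differs.
func newAPIClient(orgID, token string) *apiClient {
	return &apiClient{orgID: orgID, token: token}
}

// queryPage receives the shared client rather than building its own.
func queryPage(c *apiClient, pageToken string) string {
	c.calls++
	return fmt.Sprintf("page %q via org %s", pageToken, c.orgID)
}

// runQuery builds the client once and passes it to every page request.
func runQuery(orgID, token string, pageTokens []string) *apiClient {
	client := newAPIClient(orgID, token)
	for _, t := range pageTokens {
		_ = queryPage(client, t)
	}
	return client
}

func main() {
	c := runQuery("org-id", "token", []string{"", "t1", "t2"})
	fmt.Println("requests through one client:", c.calls)
}
```

Storing the client on the agent struct would be an alternative, but that requires care if the agent is shared across sessions with different keys; passing it as a parameter keeps the lifetime tied to a single query.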
Use the new Censys SDK for better stability across API updates. Because Censys now returns multiple endpoints per search hit, we iterate over every endpoint and create a separate result for each. This also changes how IPs are saved in the raw response: the earlier problem no longer applies, since there is one IP per endpoint.
As described in #684, Censys no longer works without these changes.
Also closes #684
This was already tested.
Summary by CodeRabbit
Refactor
Chores
Documentation