@murphyatwork murphyatwork commented Aug 28, 2025

Why I'm doing:

This PR resolves issue #60535 by adding the ability to query column size and compressed column size via _META_ scans. This provides users with estimates of data storage for individual columns.

What I'm doing:

  • Introduced column_size(col) and column_compressed_size(col) built-in functions.
  • Implemented BE-side meta collection for these new fields in SegmentMetaCollecter and OlapMetaReader.
    • column_size uses ColumnMetaPB.total_mem_footprint() as an uncompressed size proxy.
    • column_compressed_size calculates the sum of data page sizes by iterating through ordinal page indexes.
  • Extended FE rules (PushDownAggToMetaScanRule, RewriteSimpleAggToMetaScanRule) to support pushing down SUM(column_size(col)) and SUM(column_compressed_size(col)) to meta scans.
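
The compressed-size calculation described above (summing data page sizes found through the ordinal page index) can be pictured with a minimal sketch. This is not the code from this PR: `PagePointer` and `sum_data_page_sizes` are hypothetical names, and the sketch assumes that each ordinal index entry exposes the on-disk size of one data page.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for one ordinal index entry: where a data page
// starts in the segment file and how many bytes it occupies on disk.
struct PagePointer {
    uint64_t offset;
    uint32_t size;
};

// Sums the on-disk sizes of all data pages listed in an ordinal index.
// The result approximates the compressed footprint of a single column
// within one segment (assumption: index and dictionary pages are ignored).
uint64_t sum_data_page_sizes(const std::vector<PagePointer>& ordinal_index) {
    uint64_t total = 0;
    for (const PagePointer& page : ordinal_index) {
        // Guard against overflow of the running total.
        assert(total <= std::numeric_limits<uint64_t>::max() - page.size);
        total += page.size;
    }
    return total;
}
```

Under these assumptions, a column with three data pages of 4096, 1024, and 512 bytes would report 5632 bytes for that segment, and a table-level figure would be the sum of such values over all segments.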

Usage:

  • Uncompressed estimate: SELECT column_size(col) FROM t [_META_];
  • Compressed estimate: SELECT column_compressed_size(col) FROM t [_META_];
  • Both can be aggregated and pushed down, e.g., SELECT sum(column_size(col)) FROM t [_META_];

Fixes #60535

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport PR

Bugfix cherry-pick branch check:

  • I have checked the version labels for the target branches to which this PR will be auto-backported
    • 4.0
    • 3.5
    • 3.4
    • 3.3


Co-authored-by: huanmingwong <huanmingwong@gmail.com>

Co-authored-by: huanmingwong <huanmingwong@gmail.com>
@murphyatwork murphyatwork changed the title from "Resolve issue 60535" to "[Feature] introduce a function to obtain the column size" Aug 29, 2025
@murphyatwork murphyatwork marked this pull request as ready for review August 29, 2025 02:50
@murphyatwork murphyatwork requested review from a team as code owners August 29, 2025 02:50
cursoragent and others added 3 commits August 29, 2025 03:26
Co-authored-by: huanmingwong <huanmingwong@gmail.com>
Signed-off-by: Murphy <mofei@starrocks.com>
Signed-off-by: Murphy <mofei@starrocks.com>
@murphyatwork murphyatwork changed the title from "[Feature] introduce a function to obtain the column size" to "[Enhancement] introduce a function to obtain the column size" Aug 29, 2025
@github-actions github-actions bot added the 4.0 label Aug 29, 2025
Signed-off-by: Murphy <mofei@starrocks.com>
Signed-off-by: Murphy <mofei@starrocks.com>
@murphyatwork murphyatwork requested a review from a team as a code owner August 29, 2025 05:15
Signed-off-by: Murphy <mofei@starrocks.com>
@alvin-celerdata alvin-celerdata commented

@cursor review

kangkaisen previously approved these changes Aug 29, 2025
Signed-off-by: Murphy <mofei@starrocks.com>

github-actions bot commented Sep 2, 2025

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

github-actions bot commented Sep 2, 2025

[FE Incremental Coverage Report]

pass : 20 / 20 (100.00%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| 🔵 com/starrocks/sql/optimizer/rule/transformation/PushDownAggToMetaScanRule.java | 9 | 9 | 100.00% | [] |
| 🔵 com/starrocks/catalog/FunctionSet.java | 2 | 2 | 100.00% | [] |
| 🔵 com/starrocks/sql/optimizer/rule/transformation/RewriteSimpleAggToMetaScanRule.java | 9 | 9 | 100.00% | [] |

github-actions bot commented Sep 2, 2025

[BE Incremental Coverage Report]

pass : 51 / 51 (100.00%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| 🔵 be/src/storage/olap_meta_reader.cpp | 2 | 2 | 100.00% | [] |
| 🔵 be/src/storage/meta_reader.cpp | 41 | 41 | 100.00% | [] |
| 🔵 be/src/storage/rowset/column_reader.h | 1 | 1 | 100.00% | [] |
| 🔵 be/src/storage/rowset/column_reader.cpp | 7 | 7 | 100.00% | [] |

@murphyatwork murphyatwork merged commit 347b0b1 into main Sep 3, 2025
155 of 161 checks passed
@murphyatwork murphyatwork deleted the cursor/resolve-issue-60535-ce6a branch September 3, 2025 02:18

github-actions bot commented Sep 3, 2025

@Mergifyio backport branch-4.0

@github-actions github-actions bot removed the 4.0 label Sep 3, 2025

mergify bot commented Sep 3, 2025

backport branch-4.0

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Sep 3, 2025
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
(cherry picked from commit 347b0b1)
wanpengfei-git pushed a commit that referenced this pull request Sep 3, 2025
…#62481) (#62674)

Co-authored-by: Murphy <96611012+murphyatwork@users.noreply.github.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Development

Successfully merging this pull request may close these issues.

Introduce a meta scan function for inspecting table column size
7 participants