seaweedfs

Commit Graph

Author	SHA1	Message	Date
chrislu	c834e30a72	debug: add logging to SeaweedFileSystemStore.createFile() Critical diagnostic: Our FSDataOutputStream.getPos() override is NOT being called! Adding WARN logs to SeaweedFileSystemStore.createFile() to determine: 1. Is createFile() being called at all? 2. If yes, but FSDataOutputStream override not called, then streams are being returned WITHOUT going through SeaweedFileSystem.create/append 3. This would explain why our position tracking fix has no effect Hypothesis: SeaweedFileSystemStore.createFile() returns SeaweedHadoopOutputStream directly, and it gets wrapped by something else (not our custom FSDataOutputStream).	4 months ago
chrislu	6fe5c372ee	debug: change logs to WARN level to ensure visibility INFO logs from seaweed.hdfs package may be filtered. Changed all diagnostic logs to WARN level to match the 'PARQUET FILE WRITTEN' log which DOES appear in test output. This will definitively show: 1. Whether our code path is being used 2. Whether the getPos() override is being called 3. What position values are being returned	4 months ago
chrislu	c91175cb97	fix: make path variable final for anonymous inner class Java compilation error: - 'local variables referenced from an inner class must be final or effectively final' - The 'path' variable was being reassigned (path = qualify(path)) - This made it non-effectively-final Solution: - Create 'final Path finalPath = path' after qualification - Use finalPath in the anonymous FSDataOutputStream subclass - Applied to both create() and append() methods	4 months ago
chrislu	d6f9234cea	debug: add aggressive logging to FSDataOutputStream getPos() override This will help determine: 1. If the anonymous FSDataOutputStream subclass is being created 2. If the getPos() override is actually being called by Parquet 3. What position value is being returned If we see 'Creating FSDataOutputStream' but NOT 'getPos() override called', it means FSDataOutputStream is using a different mechanism for position tracking. If we don't see either log, it means the code path isn't being used at all.	4 months ago
chrislu	9e7ed48688	fix: Override FSDataOutputStream.getPos() to use SeaweedOutputStream position CRITICAL FIX for Parquet 78-byte EOF error! Root Cause Analysis: - Hadoop's FSDataOutputStream tracks position with an internal counter - It does NOT call SeaweedOutputStream.getPos() by default - When Parquet writes data and calls getPos() to record column chunk offsets, it gets FSDataOutputStream's counter, not SeaweedOutputStream's actual position - This creates a 78-byte mismatch between recorded offsets and actual file size - Result: EOFException when reading (tries to read beyond file end) The Fix: - Override getPos() in the anonymous FSDataOutputStream subclass - Delegate to SeaweedOutputStream.getPos() which returns 'position + buffer.position()' - This ensures Parquet gets the correct position when recording metadata - Column chunk offsets in footer will now match actual data positions This should fix the consistent 78-byte discrepancy we've been seeing across all Parquet file writes (regardless of file size: 684, 693, 1275 bytes, etc.)	4 months ago
chrislu	ac9fbeefac	refactor: remove emojis from logging and workflow messages Removed all emoji characters from: 1. SeaweedOutputStream.java - write() logs - close() logs - getPos() logs - flushWrittenBytesToServiceInternal() logs - writeCurrentBufferToService() logs 2. SeaweedWrite.java - Chunk write logs - Metadata write logs - Mismatch warnings 3. SeaweedHadoopOutputStream.java - Constructor logs 4. spark-integration-tests.yml workflow - Replaced checkmarks with 'OK' - Replaced X marks with 'FAILED' - Replaced error marks with 'ERROR' - Replaced warning marks with 'WARNING:' All functionality remains the same, just cleaner ASCII-only output.	4 months ago
chrislu	a3cf4eb843	debug: track stream lifecycle and total bytes written Added comprehensive logging to identify why Parquet files fail with 'EOFException: Still have: 78 bytes left'. Key additions: 1. SeaweedHadoopOutputStream constructor logging with 🔧 marker - Shows when output streams are created - Logs path, position, bufferSize, replication 2. totalBytesWritten counter in SeaweedOutputStream - Tracks cumulative bytes written via write() calls - Helps identify if Parquet wrote 762 bytes but only 684 reached chunks 3. Enhanced close() logging with 🔒 and ✅ markers - Shows totalBytesWritten vs position vs buffer.position() - If totalBytesWritten=762 but position=684, write submission failed - If buffer.position()=78 at close, buffer wasn't flushed Expected scenarios in next run: A) Stream never created → No 🔧 log for .parquet files B) Write failed → totalBytesWritten=762 but position=684 C) Buffer not flushed → buffer.position()=78 at close D) All correct → totalBytesWritten=position=684, but Parquet expects 762 This will pinpoint whether the issue is in: - Stream creation/lifecycle - Write submission - Buffer flushing - Or Parquet's internal state	4 months ago
chrislu	a5bccca443	debug: add critical diagnostics for EOFException (78 bytes missing) The persistent EOFException shows Parquet expects 78 more bytes than exist. This suggests a mismatch between what was written vs what's in chunks. Added logging to track: 1. Buffer state at close (position before flush) 2. Stream position when flushing metadata 3. Chunk count vs file size in attributes 4. Explicit fileSize setting from stream position Key hypothesis: - Parquet writes N bytes total (e.g., 762) - Stream.position tracks all writes - But only (N-78) bytes end up in chunks - This causes Parquet read to fail with 'Still have: 78 bytes left' If buffer.position() = 78 at close, the buffer wasn't flushed. If position != chunk total, write submission failed. If attr.fileSize != position, metadata is inconsistent. Next run will show which scenario is happening.	4 months ago
chrislu	966b053ed3	fix: use SNAPSHOT version to force Maven to use locally built JARs ROOT CAUSE: Maven was downloading seaweedfs-client:3.80 from Maven Central instead of using the locally built version in CI! Changes: - Changed all versions from 3.80 to 3.80.1-SNAPSHOT - other/java/client/pom.xml: 3.80 → 3.80.1-SNAPSHOT - other/java/hdfs2/pom.xml: property 3.80 → 3.80.1-SNAPSHOT - other/java/hdfs3/pom.xml: property 3.80 → 3.80.1-SNAPSHOT - test/java/spark/pom.xml: property 3.80 → 3.80.1-SNAPSHOT Maven behavior: - Release versions (3.80): Downloaded from remote repos if available - SNAPSHOT versions: Prefer local builds, can be updated This ensures the CI uses the locally built JARs with our debug logging! Also added unique [DEBUG-2024] markers to verify in logs.	4 months ago
chrislu	c86177e063	add comments	4 months ago
chrislu	a7f786ac92	NPE	4 months ago
chrislu	c96448f3a5	more flexible replication configuration	4 months ago
dependabot[bot]	c14e513964	chore(deps): bump org.apache.hadoop:hadoop-common from 3.2.4 to 3.4.0 in /other/java/hdfs3 (#7512) * chore(deps): bump org.apache.hadoop:hadoop-common in /other/java/hdfs3 Bumps org.apache.hadoop:hadoop-common from 3.2.4 to 3.4.0. --- updated-dependencies: - dependency-name: org.apache.hadoop:hadoop-common dependency-version: 3.4.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * add java client unit tests * Update dependency-reduced-pom.xml * add java integration tests * fix * fix buffer --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: chrislu <chris.lu@gmail.com>	4 months ago
orthoxerox	d8cc269294	feature: added ssl support for HCFS (#6699) (#6775)	11 months ago
chrislu	2caa0e3741	java 3.80	1 year ago
chrislu	915f9f5054	update java client to 3.71, also adjust the groupId	2 years ago
chrislu	83fe2bfc36	java 3.71	2 years ago
chrislu	cd01a2346a	Java 3.59 fix https://github.com/seaweedfs/seaweedfs/issues/5001	2 years ago
chrislu	710e88f713	Java: upgrade to 3.55	3 years ago
chrislu	ea2637734a	refactor filer proto chunk variable from mtime to modified_ts_ns	3 years ago
chrislu	707abd5b2d	3.30 java	4 years ago
dependabot[bot]	710d2a6f16	Bump hadoop-common from 3.2.3 to 3.2.4 in /other/java/hdfs3 (#3432) Bumps hadoop-common from 3.2.3 to 3.2.4. --- updated-dependencies: - dependency-name: org.apache.hadoop:hadoop-common dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	4 years ago
chrislu	d003bb0166	java 3.13	4 years ago
dependabot[bot]	2c0e2e11df	Bump hadoop-common from 3.1.4 to 3.2.3 in /other/java/hdfs3 Bumps hadoop-common from 3.1.4 to 3.2.3. --- updated-dependencies: - dependency-name: org.apache.hadoop:hadoop-common dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	4 years ago
dependabot[bot]	1236efb039	Bump hadoop-common from 3.1.1 to 3.1.4 in /other/java/hdfs3 Bumps hadoop-common from 3.1.1 to 3.1.4. --- updated-dependencies: - dependency-name: org.apache.hadoop:hadoop-common dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	4 years ago
chrislu	19555385f7	2.85	4 years ago
chrislu	5799a20f71	2.84	4 years ago
chrislu	5ea9715721	2.81 also sync java client version to SeaweedFS version	4 years ago
Chris Lu	6fb6480a3b	Java: 1.7.0 update org.apache.httpcomponents to 4.5.13 update grpc API to use cacheRemoteObjectToLocalCluster	4 years ago
Chris Lu	8cd7a0365b	1.6.9	5 years ago
Chris Lu	20ac710ceb	2.68	5 years ago
Chris Lu	e5fc35ed0c	change server address from string to a type	5 years ago
Chris Lu	e9128e75d0	Java: 1.6.7 Support Mounted Remote Storage	5 years ago
Chris Lu	2d85ffe7c5	java 1.6.6	5 years ago
Chris Lu	6c1c72b1f4	java client 1.6.5	5 years ago
Chris Lu	be25bc6766	Java client 1.6.4	5 years ago
Chris Lu	b5e10bf511	Java client 1.6.3	5 years ago
Chris Lu	c276117fef	Java: 1.6.2	5 years ago
Chris Lu	5b1def9080	Java: 1.6.1 refactoring API	5 years ago
Chris Lu	ad36c7b0d7	refactoring: only expose FilerClient class	5 years ago
Chris Lu	5138d3954f	Java 1.6.0 fix filerProxy mode	5 years ago
Chris Lu	9c1efdf11b	HCFS: 1.5.9	5 years ago
Chris Lu	8f3a51f2b8	Java: 1.5.8 additional fixes	5 years ago
Chris Lu	6a2a9b67e8	Java: 1.5.8	5 years ago
Chris Lu	87d1bfa862	Hadoop Compatible FS: 1.5.7	5 years ago
Chris Lu	6f4aab51f9	refactoring SeaweedInputStream	5 years ago
Chris Lu	043c2d7960	refactoring SeaweedOutputStream	5 years ago
Chris Lu	4d2855476c	Hadoop: add BufferedByteBufferReadableInputStream fix https://github.com/chrislusf/seaweedfs/issues/1645	5 years ago
Chris Lu	3857f9c840	Hadoop: switch to ByteBuffer fix https://github.com/chrislusf/seaweedfs/issues/1645	5 years ago
Chris Lu	a9efaa6385	HDFS: implement ByteBufferReadable fix https://github.com/chrislusf/seaweedfs/issues/1645	5 years ago
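
Commit 9e7ed48688 above describes a wrapper stream that reports its own byte counter instead of the wrapped stream's position, and a fix that overrides getPos() to delegate. The following is a minimal, stdlib-only sketch of that mechanism. BufferedSink, CountingWrapper, and PosDemo are hypothetical stand-ins invented for illustration; they are not the SeaweedFS or Hadoop classes, and the start offset of 100 is an arbitrary assumption chosen only to make the two counters diverge.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stand-in for the inner stream: position = flushed bytes + buffered bytes. */
class BufferedSink extends OutputStream {
    private long position;        // bytes already flushed
    private int buffered;         // bytes still waiting in the buffer
    private final int bufferSize;

    BufferedSink(long startPosition, int bufferSize) {
        this.position = startPosition;
        this.bufferSize = bufferSize;
    }

    @Override
    public void write(int b) {
        buffered++;
        if (buffered == bufferSize) {
            flush();
        }
    }

    @Override
    public void flush() {
        position += buffered;
        buffered = 0;
    }

    long getPos() {
        return position + buffered;
    }
}

/** Hypothetical stand-in for a wrapper that counts only the bytes written through it. */
class CountingWrapper extends OutputStream {
    protected final BufferedSink out;
    private long count;

    CountingWrapper(BufferedSink out) {
        this.out = out;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    long getPos() {
        return count;
    }
}

class PosDemo {
    /** Anonymous subclass whose getPos() delegates to the wrapped stream. */
    static CountingWrapper delegating(BufferedSink sink) {
        return new CountingWrapper(sink) {
            @Override
            long getPos() {
                return out.getPos();
            }
        };
    }

    public static void main(String[] args) throws IOException {
        // Assumption for illustration: the inner stream starts at offset 100.
        BufferedSink sink = new BufferedSink(100, 8);
        CountingWrapper plain = new CountingWrapper(sink);
        plain.write(new byte[5]);
        System.out.println(plain.getPos()); // 5: the wrapper's own counter
        System.out.println(sink.getPos());  // 105: the inner stream's position

        CountingWrapper fixed = delegating(new BufferedSink(100, 8));
        fixed.write(new byte[5]);
        System.out.println(fixed.getPos()); // 105: delegated to the inner stream
    }
}
```

A caller that records offsets through getPos() sees the same value the inner stream would report only in the delegating variant. Whether this is the actual cause of the 78-byte discrepancy is not established by this sketch; the later debug commits in the log suggest the investigation continued.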

118 Commits (d7d4d9709802f1fd4f4e92893abfe6d2ba007cf0)
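
Commit c91175cb97 in the log fixes a compile error caused by reassigning a local variable that an anonymous inner class then references. A small sketch of that pattern follows, using only the standard library. FinalPathDemo, qualify, and open are hypothetical names, and a String stands in for Hadoop's Path type.

```java
import java.util.function.Supplier;

class FinalPathDemo {
    /** Hypothetical stand-in for path qualification: make the path absolute. */
    static String qualify(String path) {
        return path.startsWith("/") ? path : "/" + path;
    }

    static Supplier<String> open(String path) {
        // Reassigning the parameter means 'path' is no longer effectively final,
        // so an anonymous class may not reference it directly.
        path = qualify(path);

        // Copy into a variable that is never reassigned and capture that instead.
        final String finalPath = path;

        return new Supplier<String>() {
            @Override
            public String get() {
                return "stream for " + finalPath;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(open("a/b").get()); // stream for /a/b
    }
}
```

Referencing path inside the anonymous class instead of finalPath would be rejected by the compiler with the "must be final or effectively final" error quoted in the commit message.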