SYSTEM LOG
GENAI TEST MATRIX
Tracking experiments, bottlenecks, render limits, and workflow findings across hybrid GenAI systems.
> ready
[ VIDEO PIPELINE ]
[✓]
Continuity Stress Test
Done
Test long continuous shots using first-frame to last-frame consistency. Check subject identity, wardrobe, scene integrity, camera logic, and drift over time. Community feedback: Several viewers noted that the sequence does not fully feel glued together as one continuous piece. Continuity reads better at normal playback speed than under closer scrutiny. Interpretation: Temporal consistency and shot-to-shot coherence are improving, but long-form continuity still breaks when evaluating the sequence as a unified scene. Action: Keep testing first-frame / last-frame workflows and reduce scene-to-scene visual drift.
[✓]
Multi-Output Test 1080p (5s-20s)
Done
Benchmark 1080p output at 5, 10, 15, and 20 seconds. Use the 5-second run to establish the short-output baseline for speed, stability, fidelity, and prompt reliability. Treat the 10-second run as the mid-length test and compare its quality and speed against the 5- and 15-second runs. For the native 15- and 20-second generations, track render time, visual degradation, motion drift, temporal consistency, artifact rate, and overall coherence against the shorter outputs.
[✓]
Multishot Sequencing
Done
Test scene-to-scene sequencing. Evaluate subject consistency, shot transitions, pacing, and editorial logic across multiple shots. Community feedback: Multiple comments suggested that individual shots can work on their own, but the full sequence feels layered or assembled rather than fully unified. Interpretation: The work currently performs better as editorial montage than strict narrative continuity. Action: Treat multishot pieces as mood-driven sequences unless stronger inter-shot controls are in place.
[✓]
Seed Repeatability
Done
Test whether the same seed produces the same animation, or closely matching motion, on repeat runs. Result: identical seeds do not guarantee deterministic motion across reruns or across different source images. Community feedback: Perspective and place consistency appear to drift from shot to shot. Interpretation: Even when individual frames are strong, repeatable structural consistency is not yet stable enough across a sequence. Action: Use seed control, tighter source framing, and reduced variation when continuity matters more than exploration.
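Workflow sketch: a minimal example of locking the seed for an image-to-video run with Hugging Face diffusers. The checkpoint, source still, and seed value are assumptions for illustration; re-seeding the generator is the main repeatability lever, not a guarantee of identical motion.

    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import load_image

    # Assumed open image-to-video checkpoint; swap in whatever model is under test.
    pipe = StableVideoDiffusionPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    image = load_image("first_frame.png")  # placeholder source still

    def animate(seed: int):
        # Re-seeding the generator identically narrows variation between runs,
        # but does not guarantee frame-identical motion.
        generator = torch.Generator(device="cuda").manual_seed(seed)
        return pipe(image, decode_chunk_size=8, generator=generator).frames[0]

    run_a = animate(1234)
    run_b = animate(1234)  # compare frame-by-frame to measure residual drift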
[~]
Object Motion
Testing
Test believable motion for inanimate subjects such as products, props, or packaging. Compare object-driven motion versus camera-driven motion. Community feedback: The speed of the machines in the background of the opening shot was specifically flagged as giving the effect away. Interpretation: Secondary object motion and background mechanics remain a visible realism breakpoint. Action: Audit prop and machine motion more aggressively, especially repeating background actions.
[ ]
3D Motion Test
Pending
Evaluate spatial believability, parallax, dimensional consistency, rotation logic, and camera movement around subjects. Community feedback: Viewers called out that real-world movement and environmental motion still feel synthetic in places, especially when physical timing and perspective need to hold together. Interpretation: Spatial believability is improving, but motion still exposes weaknesses faster than static composition. Action: Prioritize camera realism, grounded movement, and environmental timing over ambitious motion complexity.
[ ]
2D Motion Test
Pending
Evaluate stylized flat-motion workflows. Check frame stability, graphic consistency, and temporal smoothness.
[ IMAGE RELIABILITY ]
[✓]
Hand Accuracy
Done
Test finger count, hand structure, gesture fidelity, and failure cases involving complex hand poses.
[✓]
High-Res Source Check
Done
Verify whether 2K+ source images improve generation quality, detail retention, and consistency in downstream image-to-video workflows.
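Workflow sketch: a small pre-flight check (Pillow) that flags source stills below a roughly 2K long edge before they enter the image-to-video queue. The 2048 px threshold and file path are working assumptions, not a verified cutoff.

    from PIL import Image

    MIN_LONG_EDGE = 2048  # assumed working definition of "2K+" source material

    def is_high_res(path: str) -> bool:
        # Check the longer edge so both landscape and portrait sources qualify.
        with Image.open(path) as im:
            return max(im.size) >= MIN_LONG_EDGE

    print(is_high_res("source_frame.png"))  # placeholder path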
[✓]
Clothing Fidelity
Done
Test outfit consistency, garment structure, folds, material behavior, and detail retention across generations. Community feedback: Hair and makeup were described as averaging toward a generic middle rather than feeling specifically designed. Interpretation: Styling details are currently converging toward safe statistical outputs instead of highly intentional character design. Action: Push stronger styling references and custom control where wardrobe, grooming, and identity specificity matter.
[✓]
Animal Accuracy
Done
Test animal anatomy, realism, interaction with humans, and reliability in still and motion workflows.
[~]
Multi-Character Stability
Testing
Test multiple characters in one shot. Community feedback: No direct multi-character failure was singled out here, but broader comments about layered assembly and lack of cohesion suggest scene complexity still increases instability. Interpretation: As subject count and interaction complexity rise, the sequence becomes harder to unify convincingly. Action: Favor simpler subject staging until stronger consistency controls are in place.
[✓]
Skin Texture Realism
Done
Keep natural human blemishes, scars, stray hairs, etc.; do not smooth them away. Community feedback: Skin smoothing was one of the most immediate realism tells. Viewers noted that faces still look too perfect. Interpretation: Over-clean skin reduces believability and pushes faces toward synthetic beauty treatment. Action: Preserve pores, texture, minor blemishes, and natural facial irregularities instead of over-smoothing.
[ SYSTEM CAPABILITIES ]
[~]
Offline LLM
Testing
Evaluate local/offline language model options for prompting, planning, summarization, and workflow support.
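Workflow sketch: one way to run a local model for planning and summarization support via llama-cpp-python, assuming a GGUF checkpoint is already on disk. The model path, context size, and prompts are placeholders.

    from llama_cpp import Llama

    # Path and context window are assumptions; point at whatever GGUF model is being evaluated.
    llm = Llama(model_path="models/local-model.gguf", n_ctx=4096)

    out = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a shot-planning assistant."},
            {"role": "user", "content": "Summarize this storyboard into five shots: ..."},
        ],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])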
[✓]
Offline Image Generator
Done
Tested local image generation for storyboard creation, concept exploration, speed, and consistency. In practice, high-quality local models such as Flux2.dev are too computationally heavy for standard prosumer hardware. On an RTX 3090, generation times were extremely slow, with single images taking roughly 30 minutes. Reaching cloud-level performance and realism would require significantly more expensive hardware, making the local route less cost-effective than using mature cloud tools like Firefly, Midjourney, or Nano Banana for production-grade image generation.
[ ]
Depth Map Workflow
Pending
Test depth-map use for parallax, camera moves, pseudo-3D shots, and image-to-video enhancements. Community feedback: One of the strongest positive notes was that lighting consistency and spatial depth are improving. Interpretation: Depth and scene structure are becoming more convincing, even if full realism is not solved. Action: Keep testing depth-assisted workflows to strengthen dimensionality and reduce flat layered motion.
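Workflow sketch: extracting a depth map from a still with the transformers depth-estimation pipeline, to drive parallax or pseudo-3D camera moves downstream. The model choice (Intel/dpt-large) and file paths are assumptions; any monocular depth model slots in the same way.

    from PIL import Image
    from transformers import pipeline

    depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")  # assumed model

    result = depth_estimator(Image.open("source_frame.png"))  # placeholder path
    # result["depth"] is a grayscale PIL image usable as a displacement / parallax map.
    result["depth"].save("source_frame_depth.png")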
[~]
Audience Acceptance Threshold
Testing
Community feedback: Several comments suggested that trained creatives and VFX-aware viewers notice flaws quickly, while general audiences are more likely to accept the work at face value, especially in casual viewing contexts. Interpretation: Perceived success is audience-dependent. Mobile playback and everyday viewing conditions may be much more forgiving than frame-by-frame professional scrutiny. Action: Evaluate outputs against intended audience context before judging commercial viability.
[ TRAINING & CONTROL ]
[~]
LoRA Training Pipeline
Testing
Test LoRA workflow for character consistency, style locking, product identity, and reusable visual control. Community feedback: Styling, skin treatment, and visual identity still tend to collapse toward generic outputs. Interpretation: Base-model generation is not enough when you need more specific identity, texture, and repeatability. Action: Use LoRA or stronger custom control when character specificity and visual continuity matter.
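Workflow sketch: applying an already-trained LoRA in diffusers for character and style locking (the training step itself is not shown). The base checkpoint, LoRA path, scale, and trigger token are placeholders for whatever a training run actually produces.

    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",  # assumed base model
        torch_dtype=torch.float16,
    ).to("cuda")

    pipe.load_lora_weights("loras/character_v1")  # assumed local LoRA folder
    pipe.fuse_lora(lora_scale=0.8)                # bake in the LoRA at a chosen strength

    # "sks_character" stands in for whatever trigger token the LoRA was trained on.
    image = pipe("photo of sks_character in a workshop, soft key light").images[0]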
[~]
Custom Voice Input
Testing
Test voice-driven prompting and custom voice input workflows. Evaluate speed, clarity, and practical integration.
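Workflow sketch: turning a spoken prompt into text with openai-whisper before handing it to the generation pipeline. The model size and audio path are assumptions; any local or cloud speech-to-text model fits the same slot.

    import whisper

    model = whisper.load_model("base")             # assumed model size
    result = model.transcribe("voice_prompt.wav")  # placeholder recording

    prompt_text = result["text"].strip()
    print(prompt_text)  # use as the working prompt for the image/video run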
[✓]
Physical Contact Realism
Done
Test believable touch interactions between people, clothing, props, and surfaces. Watch for fusion, distortion, or contact errors. Community feedback: Viewers repeatedly described the output as struggling with how real objects and people move, especially when believable interaction and physical logic are required. Interpretation: Physical realism remains one of the most exposed weak points in motion generation. Action: Avoid complex contact-heavy actions unless motion has been tightly constrained or simplified.
[~]
Directed Motion Control
Testing
Community feedback: A repeated criticism was that generative movement still does not behave like observed real-world motion. Interpretation: Motion direction is still too generalized and statistically averaged in complex scenes. Action: Test simpler body actions, stronger motion references, and narrower prompt intent before escalating complexity.
[ CHANGE LOG ]
3D Motion Test | VIDEO PIPELINE → PENDING | April 12, 2026 6:05 am
Animal Accuracy | IMAGE RELIABILITY → DONE | April 12, 2026 6:05 am
Custom Voice Input | TRAINING & CONTROL → TESTING | April 12, 2026 6:05 am
Physical Contact Realism | TRAINING & CONTROL → DONE | April 12, 2026 6:05 am
Offline LLM | SYSTEM CAPABILITIES → TESTING | April 12, 2026 6:05 am
Seed Repeatability | VIDEO PIPELINE → DONE | April 12, 2026 6:04 am
Clothing Fidelity | IMAGE RELIABILITY → DONE | April 12, 2026 6:04 am
Multishot Sequencing | VIDEO PIPELINE → DONE | April 12, 2026 6:04 am
Multishot Sequencing | VIDEO PIPELINE → PENDING | April 12, 2026 6:04 am
Skin Texture Realism | IMAGE RELIABILITY → DONE | April 8, 2026 7:01 pm
Continuity Stress Test | VIDEO PIPELINE → DONE | April 8, 2026 7:01 pm
Offline Image Generator | SYSTEM CAPABILITIES → DONE | April 6, 2026 8:04 am
Directed Motion Control | TRAINING & CONTROL → TESTING | March 29, 2026 7:24 pm
Physical Contact Realism | TRAINING & CONTROL → TESTING | March 29, 2026 7:24 pm
LoRA Training Pipeline | TRAINING & CONTROL → TESTING | March 29, 2026 7:24 pm
3D Motion Test | VIDEO PIPELINE → TESTING | March 29, 2026 7:24 pm
Object Motion | VIDEO PIPELINE → TESTING | March 29, 2026 7:24 pm
Multishot Sequencing | VIDEO PIPELINE → TESTING | March 29, 2026 7:24 pm
Continuity Stress Test | VIDEO PIPELINE → TESTING | March 29, 2026 7:24 pm
Skin Texture Realism | IMAGE RELIABILITY → TESTING | March 29, 2026 7:22 pm
Clothing Fidelity | IMAGE RELIABILITY → TESTING | March 29, 2026 7:22 pm
Hand Accuracy | IMAGE RELIABILITY → DONE | March 28, 2026 2:24 pm
Hand Accuracy | IMAGE RELIABILITY → TESTING | March 28, 2026 2:24 pm
3D Motion Test | VIDEO PIPELINE → PENDING | March 27, 2026 7:46 pm
Clothing Fidelity | IMAGE RELIABILITY → DONE | March 27, 2026 7:45 pm
Seed Repeatability | VIDEO PIPELINE → TESTING | March 27, 2026 4:58 pm
Object Motion | VIDEO PIPELINE → PENDING | March 27, 2026 4:57 pm
