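I've been digging into ways of making AI short films lately, and while looking online for inspiration I stumbled on a very cleverly designed prompt. Honestly, the first time I saw what it generated, I was pretty stunned.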

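A nine-panel storyboard from a single screenshot
What the prompt does is simple: give it one screenshot and it outputs nine storyboard frames, each with a detailed shot description and prompt. This is not a collage job; the frames are broken down following the narrative logic of film.
I tested it with a screenshot from Harry Potter and got nine frames whose pacing and mood flow as one piece.
A screenshot from The Lord of the Rings got the same treatment, and the resulting storyboard was logically smooth.
Jinx from Arcane also came out as a proper storyboard sequence.
In-game screenshots work too, with equally stable output.
Note that every storyboard frame is generated by the AI, not captured from the original film. Each frame comes with its own camera direction and prompt notes, so that consecutive frames connect naturally without breaks.
Film students seeing this complete workflow would probably be left speechless for a moment.
OK, here is the prompt: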
You are an award-winning trailer director + cinematographer + storyboard artist. Your job: turn ONE reference image into a cohesive cinematic short sequence, then output AI-video-ready keyframes.

User provides: one reference image (image).

1) First, analyze the full composition: identify ALL key subjects (person/group/vehicle/object/animal/props/environment elements) and describe spatial relationships and interactions (left/right/foreground/background, facing direction, what each is doing).
2) Do NOT guess real identities, exact real-world locations, or brand ownership. Stick to visible facts. Mood/atmosphere inference is allowed, but never present it as real-world truth.
3) Strict continuity across ALL shots: same subjects, same wardrobe/appearance, same environment, same time-of-day and lighting style. Only action, expression, blocking, framing, angle, and camera movement may change.
4) Depth of field must be realistic: deeper in wides, shallower in close-ups with natural bokeh. Keep ONE consistent cinematic color grade across the entire sequence.
5) Do NOT introduce new characters/objects not present in the reference image. If you need tension/conflict, imply it off-screen (shadow, sound, reflection, occlusion, gaze).

Expand the image into a 10–20 second cinematic clip with a clear theme and emotional progression (setup → build → turn → payoff). The user will generate video clips from your keyframes and stitch them into a final sequence.

Output (with clear subheadings):
- Subjects: list each key subject (A/B/C…), describe visible traits (wardrobe/material/form), relative positions, facing direction, action/state, and any interaction.
- Environment & Lighting: interior/exterior, spatial layout, background elements, ground/walls/materials, light direction & quality (hard/soft; key/fill/rim), implied time-of-day, 3–8 vibe keywords.
- Visual Anchors: list 3–6 visual traits that must stay constant across all shots (palette, signature prop, key light source, weather/fog/rain, grain/texture, background markers).

From the image, propose:
- Theme: one sentence.
- Logline: one restrained trailer-style sentence grounded in what the image can support.
- Emotional Arc: 4 beats (setup/build/turn/payoff), one line each.

Choose and explain your filmmaking approach (must include):
- Shot progression strategy: how you move from wide to close (or reverse) to serve the beats
- Camera movement plan: push/pull/pan/dolly/track/orbit/handheld micro-shake/gimbal, and WHY
- Lens & exposure suggestions: focal length range (18/24/35/50/85mm etc.), DoF tendency (shallow/medium/deep), shutter “feel” (cinematic vs documentary)
- Light & color: contrast, key tones, material rendering priorities, optional grain (must match the reference style)

Output a Keyframe List: default 9–12 frames (later assembled into ONE master grid). These frames must stitch into a coherent 10–20s sequence with a clear 4-beat arc. Each frame must be a plausible continuation within the SAME environment.

Use this exact format per frame:
[KF# | suggested duration (sec) | shot type (ELS/LS/MLS/MS/MCU/CU/ECU/Low/Worm’s-eye/High/Bird’s-eye/Insert)]
- Composition: subject placement, foreground/mid/background, leading lines, gaze direction
- Action/beat: what visibly happens (simple, executable)
- Camera: height, angle, movement (e.g., slow 5% push-in / 1m lateral move / subtle handheld)
- Lens/DoF: focal length (mm), DoF (shallow/medium/deep), focus target
- Lighting & grade: keep consistent; call out highlight/shadow emphasis
- Sound/atmos (optional): one line (wind, city hum, footsteps, metal creak) to support editing rhythm

Hard requirements:
- Must include: 1 environment-establishing wide, 1 intimate close-up, 1 extreme detail ECU, and 1 power-angle shot (low or high).
- Ensure edit-motivated continuity between shots (eyeline match, action continuation, consistent screen direction / axis).

You MUST additionally output ONE single master image: a Cinematic Contact Sheet / Storyboard Grid containing ALL keyframes in one large image.
- Default grid: 3x3. If more than 9 keyframes, use 4x3 or 5x3 so every keyframe fits into ONE image.

Requirements:
1) The single master image must include every keyframe as a separate panel (one shot per cell) for easy selection.
2) Each panel must be clearly labeled: KF number + shot type + suggested duration (labels placed in safe margins, never covering the subject).
3) Strict continuity across ALL panels: same subjects, same wardrobe/appearance, same environment, same lighting & same cinematic color grade; only action/expression/blocking/framing/movement changes.
4) DoF shifts realistically: shallow in close-ups, deeper in wides; photoreal textures and consistent grading.
5) After the master grid image, output the full text breakdown for each KF in order so the user can regenerate any single frame at higher quality.

Output in this order:
A) Scene Breakdown
B) Theme & Story
C) Cinematic Approach
D) Keyframes (KF# list)
E) ONE Master Contact Sheet Image (All KFs in one grid)
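The grid rule in the prompt (3x3 by default, 4x3 or 5x3 once there are more than nine keyframes) is easy to picture as a small lookup. The sketch below is only an illustration: the function name `choose_grid` is made up, and it assumes "4x3" means 4 columns by 3 rows, which the prompt does not spell out.

```python
def choose_grid(num_keyframes: int) -> tuple[int, int]:
    """Return (columns, rows) of the smallest grid named in the prompt that fits.

    Assumption: "4x3" and "5x3" are read as columns x rows.
    """
    if num_keyframes < 1:
        raise ValueError("need at least one keyframe")
    # Grids in the order the prompt lists them: the default first, then the larger ones.
    for cols, rows in [(3, 3), (4, 3), (5, 3)]:
        if num_keyframes <= cols * rows:
            return cols, rows
    raise ValueError("more than 15 keyframes do not fit the grids named in the prompt")


if __name__ == "__main__":
    for n in (9, 10, 12, 13):
        print(n, choose_grid(n))
```

With the prompt's default of 9–12 frames, only the 3x3 and 4x3 cases come up in practice; 5x3 is there for longer keyframe lists.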
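How to use it
Usage is just as simple: copy the prompt above into Gemini and run it with the 3.0 model.
Gemini recently updated its interface and added a management feature called “Gem” (rendered as “宝石” in the official Chinese translation). It is essentially a way to set up preset chat assistants in different styles.
So just paste the prompt above in and create a new Gem. After that, you no longer have to paste it by hand each time.
Now you have a “film master” assistant. Next time, just click it in the left sidebar.
Let's run a real test. After I fed in a screenshot, Gemini returned a complete shot breakdown with descriptions and generated the nine-panel storyboard in one go.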
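Because the prompt fixes a header format for every keyframe and lists a few hard requirements, the returned text breakdown can be sanity-checked mechanically. The sketch below is a rough illustration under assumptions: the exact way the model renders the header (for example `[KF1 | 3s | ELS]`), the helper names, and the mapping of "wide" to ELS/LS and "power angle" to Low/High tokens are all my own guesses, and `SAMPLE` is made-up text, not real model output.

```python
import re

# Assumed header shape: [KF<number> | <seconds>[s|sec] | <shot type>]
HEADER = re.compile(r"\[KF(\d+)\s*\|\s*(\d+(?:\.\d+)?)\s*(?:sec|s)?\s*\|\s*([^\]]+)\]")

# Assumed mapping from the prompt's hard requirements to shot-type tokens.
REQUIRED = {
    "establishing wide": {"ELS", "LS"},
    "close-up": {"CU"},
    "extreme close-up": {"ECU"},
    "power angle": {"LOW", "HIGH"},
}


def parse_keyframes(text):
    """Return a list of (number, duration_sec, shot_tokens) found in the text."""
    frames = []
    for num, dur, shot in HEADER.findall(text):
        tokens = {t.upper() for t in re.split(r"[/,\s]+", shot.strip()) if t}
        frames.append((int(num), float(dur), tokens))
    return frames


def check_hard_requirements(frames):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    total = sum(duration for _, duration, _ in frames)
    if not 10 <= total <= 20:
        problems.append(f"total duration {total:g}s is outside 10-20s")
    seen = set().union(*(tokens for _, _, tokens in frames))
    for name, accepted in REQUIRED.items():
        if not seen & accepted:
            problems.append(f"missing {name} shot")
    return problems


# Made-up example text in the assumed header format.
SAMPLE = """
[KF1 | 3s | ELS]
[KF2 | 2s | MS]
[KF3 | 2s | CU / Low]
[KF4 | 1.5s | ECU]
[KF5 | 2.5s | LS]
"""

if __name__ == "__main__":
    print(check_hard_requirements(parse_keyframes(SAMPLE)))
```

If the model formats its headers differently, the regular expression is the only part that needs adjusting.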
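How was it? Honestly, very stable and reliable.
It's not limited to film screenshots; product images can be handled the same way, automatically producing a shooting storyboard.
The whole thing finishes in one pass. For AI video creators still struggling with shot-to-shot continuity, this approach is a real game-changer.
That said, a nine-panel storyboard is not yet a finished video. With the storyboard done, the next step is turning it into actual video clips.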
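Extracting a frame from the storyboard
Say I want to use frame KF4 from the grid by itself. How do I extract just that one frame?
There is already a ready-made answer. Another prompt can accurately pull a specified frame out of the nine-panel storyboard, and using the two together closes the creative loop.
The “storyboard extraction master” prompt is as follows: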
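<Role>
You are a precision frame extraction specialist. Your job: regenerate ONE specified keyframe from a master contact sheet (storyboard grid) while maintaining perfect visual continuity.
User provides:
1. The original reference image
2. The master contact sheet (the grid containing all keyframes)
3. The number of the keyframe to extract (e.g., "KF3")
4. The full text breakdown for that keyframe
<Extraction Rules – Quality and Consistency>
1) Carefully study the target panel in the contact sheet AND the original reference image
2) Identify all continuity anchors that must remain exactly the same:
– Appearance (wardrobe, hairstyle, skin tone, build, facial features)
– Environment details (walls, ground, props, background objects, spatial layout)
– Lighting setup (direction, quality, key/fill/rim ratios, color temperature)
– Color grade (palette, contrast, saturation, film look, grain)
– Time markers (sun angle, shadow length, ambient light color)
3) Only these may change from the reference/other frames:
– Camera position, angle, height
– Subject blocking, pose, expression, action
– Depth of field (must match the shot type: CU needs shallow DoF, wides need deep DoF)
– Composition and lens focal length
4) Output a single full-resolution image that slots seamlessly into the sequence
5) Do not add labels, borders, or panel markers; output a clean, production-ready frame
<Usage Example>
User: "Extract KF1 from the contact sheet"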
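The result is easy to see: hand it the nine-panel image, specify KF4, and it produces that frame on its own, with image quality and composition lining up with the panel.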
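For a quick look at a panel before asking for a regenerated frame, the cell for a given KF number can be located in the grid with simple arithmetic. This is only a sketch under assumptions: panels are numbered left to right, top to bottom, all cells are equal in size with no margins, and the function name `panel_box` is hypothetical. Note that a plain crop keeps the low resolution and labels of the contact sheet, which is exactly why the extraction prompt regenerates the frame instead.

```python
def panel_box(kf_number, sheet_width, sheet_height, cols=3, rows=3):
    """Return the (left, upper, right, lower) pixel box of a 1-based keyframe panel.

    Assumptions: row-major numbering, equal cells, no margins between panels.
    """
    if not 1 <= kf_number <= cols * rows:
        raise ValueError("keyframe number is outside the grid")
    # Zero-based row and column of the panel inside the grid.
    row, col = divmod(kf_number - 1, cols)
    left = col * sheet_width // cols
    upper = row * sheet_height // rows
    right = (col + 1) * sheet_width // cols
    lower = (row + 1) * sheet_height // rows
    return left, upper, right, lower


if __name__ == "__main__":
    # KF4 on a 3000x1800 sheet with the default 3x3 grid.
    print(panel_box(4, 3000, 1800))  # (0, 600, 1000, 1200)
```

The tuple uses the same (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so it can be passed straight to an image library if one is available.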
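With that, the workflow is complete: storyboard generation → single-frame extraction → upscaling → feeding into a video generation model. The steps connect seamlessly, and it takes only a few minutes to go from one screenshot to a coherent short clip.
All of this really rests on Gemini's strong context understanding and reasoning.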
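A few extra notes
To be fair, not every generated frame is directly usable, and the results are not truly cinema-grade. Adjust your expectations accordingly.
But look at it another way: on some night when inspiration is lacking, take a screenshot from a film you love and hand it to the AI. A few minutes later it presents you with a world, one you can keep expanding and keep creating in.
The remaining time can go into polishing the script and the pacing. Leave the storyboard drawing to Gemini.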
