WO2018177139A1 - Video summary generation method and apparatus, server and storage medium - Google Patents

Video summary generation method and apparatus, server and storage medium

Info

Publication number
WO2018177139A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
target
sub
frames
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2018/079246
Other languages
English (en)
Chinese (zh)
Inventor
曾佩玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Publication of WO2018177139A1 publication Critical patent/WO2018177139A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8549Creating video summaries, e.g. movie trailer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4884Data services, e.g. news ticker for displaying subtitles

Definitions

  • the embodiments of the present invention relate to the field of computer applications, and in particular, to a video summary generation method, apparatus, server, and storage medium.
  • A piece of text used to describe the content of a video is called a video summary.
  • The video summary has a significant impact on the number of page views, so how to create a better-performing video summary is a concern for video sites and video producers.
  • In the related art, the video summary is created manually: a member of staff writes a description of the video and, once the writing is complete, the description is displayed as the video summary on the corresponding website for users to browse.
  • A manually produced video summary can only describe the video itself, so the video summary seen by every user is the same; but different users have different preferences, and for the same video the effective information that different users want to obtain is not the same. Manually produced video summaries are therefore less targeted and cannot provide each user with effective information related to the video.
  • The embodiments of the invention provide a method, an apparatus, a server and a storage medium for generating a video summary, which are used to automatically generate different video summaries for different users, increase the number of video views, provide effective information for more users, and improve the efficiency of video summary generation.
  • An aspect of the embodiments of the present invention provides a video summary generating method, used in a server, the method including: dividing a target video into a plurality of video frames; determining, according to user characteristics, N target frames corresponding to the user from the plurality of video frames, where N is an integer greater than 1; extracting subtitles in the N target frames; and generating a target video summary according to the subtitles.
  • An aspect of an embodiment of the present invention provides a video summary generating apparatus, where the apparatus includes:
  • a segmentation module configured to divide the target video into a plurality of video frames
  • a first determining module configured to determine, according to user characteristics, N target frames corresponding to the user from the plurality of video frames, where N is an integer greater than 1;
  • An extracting module configured to extract subtitles in the N target frames
  • a generating module configured to generate a target video summary according to the subtitle.
  • An aspect of an embodiment of the present invention provides a server, where the server includes:
  • one or more processors; and
  • a memory, where the memory stores one or more programs, the one or more programs being configured to be executed by the one or more processors, the one or more programs including instructions for performing the following operations: dividing a target video into a plurality of video frames; determining, according to user characteristics, N target frames corresponding to the user from the plurality of video frames, where N is an integer greater than 1; extracting subtitles in the N target frames; and generating a target video summary according to the subtitles.
  • An aspect of the embodiments of the present invention provides a computer-readable storage medium, where the storage medium stores at least one instruction, at least one program, a code set, or a set of instructions, and the at least one instruction, the at least one program, the code set or the set of instructions is loaded and executed by a processor to implement the video summary generation method described above.
  • The embodiments of the present invention may divide the target video into a plurality of video frames, determine N target frames corresponding to the user according to the user characteristics, extract subtitles in the N target frames, and generate the user's target video summary according to the extracted subtitles. It can be seen that the solution can automatically generate a video summary and can display different video summaries to different users according to user characteristics, which is more targeted, can increase the number of video views, provide effective information for more users, and improve the efficiency of video summary generation.
  • FIG. 1 is a schematic diagram of an embodiment of a video summary generating system in an embodiment of the present invention
  • FIG. 2 is a flowchart of an embodiment of a video summary generating method in an embodiment of the present invention
  • FIG. 3 is a flowchart of another embodiment of a video summary generating method in an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of an embodiment of a video summary generating apparatus in an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of another embodiment of a video summary generating apparatus according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of another embodiment of a video summary generating apparatus according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of another embodiment of a video summary generating apparatus according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of another embodiment of a video summary generating apparatus according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of another embodiment of a video summary generating apparatus according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of another embodiment of a video summary generating apparatus according to an embodiment of the present invention.
  • The embodiments of the invention provide a method, an apparatus, a server and a storage medium for generating a video summary, which are used to automatically generate different video summaries for each user, increase the number of video views, provide effective information for more users, and improve the efficiency of video summary generation.
  • Referring to FIG. 1, an embodiment of a video summary generating system is provided, to which the video summary generation method, apparatus, server, and storage medium are applied.
  • the system may include a service system composed of at least one server 101, and a plurality of terminals 102.
  • the server 101 in the service system may store data for generating a video summary, and transmit the generated video summary to the terminal 102.
  • the terminal 102 can be configured to upload the target video data that needs to generate a video summary to the server 101, and display the video summary returned by the server 101.
  • the terminal 102 is not limited to the personal computer (PC, Personal Computer) shown in FIG. 1 , and may be another device capable of acquiring and displaying a video summary, such as a mobile phone or a tablet computer.
  • the user can upload the target video to the server 101 through the terminal 102.
  • The server 101 generates, for each user, a video summary corresponding to that user by using the video summary generating method in the embodiments of the present invention, and returns to the terminal 102 the video summary matching the user; the terminal 102 then presents the video summary returned by the server to the user.
  • a video frame is a single image of the smallest unit in an image animation.
  • A frame is a still picture; consecutive frames form a moving image, such as a television picture. Each frame is a still image, and displaying the frames continuously in rapid succession creates the illusion of motion.
  • Key frame: to represent the movement or change of any animation, at least two different key states, one before and one after, must be given; the changes and transitions between the intermediate states lying between the two key states can be completed automatically by the computer. In Flash, the frame representing such a key state is called a key frame.
  • Shot data refers to a piece of video data captured by the camera in one continuous take; it is the basic physical unit for structuring a video.
  • K-means clustering is a typical distance-based clustering algorithm. Distance is used as the evaluation index of similarity, that is, the closer the distance between two objects is, the greater the similarity is.
  • the algorithm considers clusters to be composed of objects that are close together, thus making compact and independent clusters the ultimate goal.
  • the principle of the algorithm is to input the number of clusters k and the database containing n data objects, and finally output the k clusters that meet the standard of the smallest variance.
  • the k clusters have the following characteristics: each cluster itself is as compact as possible, and each cluster is separated as much as possible.
  • The process is as follows: first, k objects are arbitrarily selected from the n data objects as the initial cluster centers; each of the remaining objects is then assigned, according to its similarity (distance) to these cluster centers, to the most similar cluster (represented by its cluster center), giving a set of new clusters; the cluster center of each new cluster (the mean of all objects in the cluster) is then recalculated; this process is repeated until the standard measure function begins to converge.
  • the mean square error is generally used as a standard measure function.
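  • As a minimal, self-contained sketch of the K-means procedure described above (the random data, parameter names and convergence test below are illustrative assumptions, not taken from the patent), the clustering loop can be written as follows:

```python
import numpy as np

def k_means(points, k, max_iter=100, tol=1e-6, seed=0):
    """Cluster an (n, d) array of objects into k clusters (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    # Arbitrarily select k objects as the initial cluster centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign every object to its most similar (closest) cluster center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each cluster center as the mean of the objects assigned to it.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.linalg.norm(new_centers - centers) < tol:  # measure function has converged
            break
        centers = new_centers
    return labels, centers

# Toy usage: 100 random 8-dimensional feature vectors grouped into 3 clusters.
labels, centers = k_means(np.random.rand(100, 8), k=3)
print(np.bincount(labels, minlength=3))
```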
  • video summary generation method, apparatus, server, and storage medium in the embodiments of the present invention are applicable to the video summary production mentioned above, and can also be applied to other video-related text introductions such as the creation of the text portion of the movie poster. This is not limited here.
  • an embodiment of the video summary generation method in the embodiment of the present invention includes:
  • the target video is first input to the video summary generating device, and the video summary generating device acquires the target video and divides the target video into a plurality of video frames.
  • The video summary generating apparatus may be located in the server 101 shown in FIG. 1.
  • the target video may be one or more video sequences, such as a movie, a few episodes of a TV series, or other videos, which are not limited herein.
  • After the video summary generating apparatus divides the target video into a plurality of video frames, it determines N target frames corresponding to the user according to the user characteristics, where the target frames are selected from the video frames of the target video; that is, the video summary generating apparatus selects, from the plurality of video frames, the N target frames corresponding to the user according to the user characteristics.
  • the number of the target frames N is an integer greater than 1, and the value of N can be set by the user or the system, which is not limited herein.
  • the video summary generating device After determining the N target frames corresponding to the user, the video summary generating device extracts the subtitles in the N target frames corresponding to the user.
  • A subtitle refers to non-image content, such as dialogue, presented in the form of text in TV dramas, films and other film and television works, and also refers to text added in the post-production of such works.
  • the subtitles may also include symbols, expressions, and the like, which are not limited herein.
  • the target video digest is generated based on the extracted subtitles.
  • the target video summary refers to a video summary of the target video for describing the content of the target video to the user. It should be understood that the target video summary generated from the subtitles should conform to the requirements of natural language and consist of one or more complete sentences.
  • steps 201-204 may be performed for each user.
  • The first step is to split the target video into several video frames; the video frames obtained are the same for different users, so the divided video frames can be reused. That is, if the video summary is generated for the first user, steps 201-204 need to be performed; if the video summary is generated for a subsequent user, the already-divided video frames can simply be read, and then steps 202-204 are performed.
  • The embodiments of the present invention may divide the target video into a plurality of video frames, determine N target frames corresponding to each user according to the user characteristics, extract subtitles in the N target frames, and generate the user's target video summary according to the extracted subtitles. It can be seen that the solution can automatically generate a video summary and can display different video summaries to different users according to user characteristics, which is more targeted, can increase the number of video views, provide effective information for more users, and improve the efficiency of video summary generation.
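  • Purely as an illustrative sketch of how steps 201-204 and the reuse of the already-divided frames could be wired together (every type, function and field name below is a hypothetical stand-in; the patent does not prescribe an implementation), consider:

```python
import re
from dataclasses import dataclass

@dataclass
class Frame:
    index: int
    subtitle: str

@dataclass
class User:
    keywords: set

_frame_cache = {}  # step 201 is done once per video; the divided frames are reused

def split_into_frames(video_id, subtitle_lines):
    # Stand-in for real video decoding: one Frame per subtitle line.
    return [Frame(i, s) for i, s in enumerate(subtitle_lines)]

def generate_summary(video_id, subtitle_lines, user, n):
    if video_id not in _frame_cache:                          # step 201 (cached)
        _frame_cache[video_id] = split_into_frames(video_id, subtitle_lines)
    frames = _frame_cache[video_id]
    # Step 202: choose up to N target frames matching the user's characteristics.
    targets = [f for f in frames
               if user.keywords & set(re.findall(r"[a-z']+", f.subtitle.lower()))][:n]
    # Step 203: extract the subtitles of the target frames.
    subtitles = [f.subtitle for f in targets]
    # Step 204: combine them into a per-user summary.
    return " ".join(subtitles)

print(generate_summary("episode_1",
                       ["Dad, I failed my English test.", "I want to keep a dog."],
                       User(keywords={"dog"}), n=1))
```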
  • The target video can be divided into video frames in a plurality of manners, and the manner of determining the target frames differs accordingly; the following embodiment of the present invention is described as an example.
  • Referring to FIG. 3, another embodiment of the video summary generation method in the embodiments of the present invention includes the following steps.
  • The target video is first input to the video summary generating apparatus; the video summary generating apparatus acquires the target video and divides the target video into a plurality of shot data, for example according to color-space distance or other parameters, which is not limited herein.
  • The video summary generating apparatus may be located in the server 101 shown in FIG. 1.
  • the target video may be one or more video sequences, such as a movie, a few episodes of a TV series, or other videos, which are not limited herein.
  • Each shot data is further divided into a plurality of sub-shot data.
  • This segmentation may be performed according to parameters such as the camera motion direction, or other parameters, which is not limited herein.
  • After dividing each shot data into a plurality of sub-shot data, the video summary generating apparatus also divides each sub-shot data into a plurality of video frames.
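  • The patent leaves the segmentation criterion open beyond "the distance of the color space"; one common way to realize it, shown here only as an assumed sketch, is to compare color histograms of neighboring frames and start a new shot wherever their distance exceeds a threshold:

```python
import numpy as np

def color_histogram(frame, bins=8):
    """L1-normalized per-channel color histogram of an (H, W, 3) uint8 frame."""
    hist = np.concatenate([
        np.histogram(frame[..., c], bins=bins, range=(0, 256))[0] for c in range(3)
    ]).astype(float)
    return hist / hist.sum()

def split_into_shots(frames, threshold=0.4):
    """Group consecutive frames into shots; cut wherever the histogram distance
    between neighboring frames exceeds `threshold` (an assumed tuning value)."""
    hists = [color_histogram(f) for f in frames]
    shots, current = [], [0]
    for i in range(1, len(frames)):
        if np.abs(hists[i] - hists[i - 1]).sum() > threshold:  # shot boundary
            shots.append(current)
            current = []
        current.append(i)
    shots.append(current)
    return shots  # list of lists of frame indices, one list per shot

# Toy usage: 20 random "frames" of size 32 x 32 x 3.
frames = [np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8) for _ in range(20)]
print(split_into_shots(frames))
```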
  • The L sub-shot data corresponding to the user are then determined according to the user features; that is, the video summary generating apparatus selects, according to the user features, the L sub-shot data corresponding to the user, where L is an integer equal to or greater than 1.
  • Specifically, the video summary generating apparatus may determine, among the sub-shot data corresponding to the target video, the target sub-shot data containing the tag information corresponding to the user, and determine, among the target sub-shot data, the sub-shot data ranked in the top L by preset sub-shot weight.
  • The sub-shot weights in the embodiments of the present invention may be determined after the video summary generating apparatus divides each sub-shot data into a plurality of video frames: reflecting the duration of the sub-shot, the number of video frames contained in the sub-shot is used as the value of the sub-shot's weight.
  • The sub-shot weight may also be determined according to the weights of the video frames contained in the sub-shot, or according to other parameters, which is not limited herein.
  • The tag information corresponding to the user in the embodiments of the present invention may be the name of an actor in the user's tags, the name of a director in the user's tags, the genre of film in the user's tags, or other information in the user's tags, which is not limited herein.
  • The video summary generating apparatus may also directly use the sub-shot data whose sub-shot weights rank in the top L as the L sub-shot data corresponding to the user.
  • The sub-shot data ranked in the top L by sub-shot weight may be determined as follows: the video summary generating apparatus sorts all the sub-shot data by sub-shot weight in descending order, selects from the sorted sub-shot data the sub-shot data ranked in the top L, and uses the selected L sub-shot data as the L sub-shot data corresponding to the user.
  • If the number M of target sub-shot data containing the tag information corresponding to the user is less than L, the remaining L-M sub-shot data are further selected according to sub-shot weight: the video summary generating apparatus selects all M target sub-shot data, sorts the not-yet-selected sub-shot data of the target video by sub-shot weight, and selects the remaining L-M sub-shot data from the sorted sub-shot data.
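  • A minimal sketch of this selection rule (the SubShot structure, the tag sets and the use of the frame count as the sub-shot weight are assumptions for illustration): sub-shots whose tags match the user's tag information are ranked by weight, and if fewer than L match, the remaining L-M slots are filled from the unmatched sub-shots, again by weight:

```python
from dataclasses import dataclass, field

@dataclass
class SubShot:
    sid: str
    frame_count: int                      # sub-shot weight = number of frames it contains
    tags: set = field(default_factory=set)

def select_sub_shots(sub_shots, user_tags, l):
    """Pick L sub-shots: tag-matching ones (the target sub-shots) first, by weight;
    if fewer than L match, fill the remaining slots from the rest, also by weight."""
    matched = [s for s in sub_shots if s.tags & user_tags]       # M target sub-shots
    rest = [s for s in sub_shots if not (s.tags & user_tags)]
    matched.sort(key=lambda s: s.frame_count, reverse=True)
    rest.sort(key=lambda s: s.frame_count, reverse=True)
    if len(matched) >= l:
        return matched[:l]                        # top L target sub-shots by weight
    return matched + rest[:l - len(matched)]      # pad the remaining L - M slots

# Toy usage: user tag "Haiqing", L = 3.
shots = [SubShot("a", 12, {"Haiqing"}), SubShot("b", 9, {"Haiqing"}),
         SubShot("c", 7), SubShot("d", 15)]
print([s.sid for s in select_sub_shots(shots, {"Haiqing"}, 3)])   # ['a', 'b', 'd']
```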
  • The video summary generating apparatus may determine the target sub-shot data according to the videos the user has watched, the videos the user has favorited, and the keywords the user has searched for, which is not limited herein.
  • the video summary generating device determines X target frames in each of the L sub-shot data according to the preset frame weight.
  • X is an integer equal to or greater than 1, and X multiplied by L equals N.
  • The frame weights are determined after the video summary generating apparatus divides the sub-shot data into several video frames, and may be determined as follows: for each sub-shot data, the video frames in the sub-shot data are divided into K classes by K-means clustering, the video frame closest to the cluster center in each class is determined as a key frame, and the frame weight of each key frame is determined according to frame parameters.
  • The frame parameters include the proportion of the frame occupied by faces, the direction of camera movement, the focal length of the camera, whether the camera is panning, or other parameters.
  • "Each sub-shot data" here may refer to all of the sub-shot data in the target video, or only to the L sub-shot data determined for the user, which is not limited herein.
  • The video summary generating apparatus may determine the key frames contained in each of the L sub-shot data and, for each sub-shot data, determine among its key frames the X video frames with the largest frame weights; these X video frames are the X target frames of that sub-shot data.
  • the video summary generating apparatus may determine the frame weight and the X target frames by other means, which are not limited herein.
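  • As an assumed illustration of this step (the frame features, K, X and the use of a precomputed face-area proportion as the frame weight are all placeholders), the frames of one sub-shot can be clustered into K groups, the frame nearest each cluster center taken as a key frame, and the X key frames with the largest frame weight returned:

```python
import numpy as np

def key_frame_indices(features, k, max_iter=50, seed=0):
    """Cluster per-frame feature vectors into k classes with K-means and return,
    for each class, the index of the frame closest to its cluster center."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(max_iter):
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([features[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j] for j in range(k)])
    keys = []
    for j in range(k):
        members = np.where(labels == j)[0]
        if len(members):
            keys.append(int(members[d[members, j].argmin()]))
    return keys

def top_x_target_frames(features, face_ratio, k=3, x=1):
    """Frame weight = proportion of the frame occupied by faces (assumed precomputed);
    return the x key frames of this sub-shot with the largest frame weight."""
    keys = key_frame_indices(features, k)
    return sorted(keys, key=lambda i: face_ratio[i], reverse=True)[:x]

# Toy usage: one sub-shot with 30 frames, 16-dim features, random face proportions.
feats, faces = np.random.rand(30, 16), np.random.rand(30)
print(top_x_target_frames(feats, faces, k=3, x=1))
```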
  • the video summary generating device After determining the target frame corresponding to the user, the video summary generating device extracts the subtitles in the N target frames corresponding to the user.
  • A subtitle refers to non-image content, such as dialogue, presented in the form of text in TV dramas, films and other film and television works, and also refers to text added in the post-production of such works.
  • the subtitles may also include symbols, expressions, and the like, which are not limited herein.
  • a preset length of the subtitle in the target frame is extracted.
  • The preset length is set by the user or by the video summary generating apparatus, and may be a limit on the number of characters, a limit on the number of sentences, or a limit on the number of paragraphs; for example, the preset length may be 30 words, 3 sentences, 1 paragraph, or another length, which is not limited herein.
  • Alternatively, subtitles of a certain length before and after the target frame are extracted.
  • Here, "before and after" refers to the order in which the subtitles appear relative to the target frame, and the length is a preset length similar to that described above, which is not repeated here.
  • the following description is given by way of example: for each target frame, the first three sentences and the last three sentences in the subtitles of the target frame are extracted. It should be understood that the above is only an example and does not constitute a limitation of the embodiments of the present invention.
  • the subtitles in the target frame may be extracted by other means, which is not limited herein.
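  • A small sketch of the "before and after" variant under an assumed subtitle representation (a list of (start_time, text) pairs, which the patent does not specify): take the subtitle line shown at the target frame's timestamp plus a preset number of lines before and after it:

```python
from bisect import bisect_right

def subtitles_around(track, frame_time, before=3, after=3):
    """`track` is a list of (start_seconds, text) tuples sorted by start time.
    Return the subtitle line shown at `frame_time` plus up to `before` lines
    preceding it and `after` lines following it."""
    starts = [t for t, _ in track]
    idx = max(bisect_right(starts, frame_time) - 1, 0)      # line shown at frame_time
    lo, hi = max(idx - before, 0), min(idx + after + 1, len(track))
    return [text for _, text in track[lo:hi]]

# Toy usage: take one line before and one line after the target frame at t = 9 s.
track = [(0, "Dad, I failed my English test."),
         (4, "How could you fail?"),
         (8, "Can you go to the parent meeting on Sunday?"),
         (12, "All right, I will go.")]
print(subtitles_around(track, frame_time=9, before=1, after=1))
```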
  • the target video digest is generated based on the extracted subtitles.
  • the target video summary refers to a video summary of the target video for describing the content of the target video to the user. It should be understood that the target video summary generated from the subtitles should conform to the requirements of natural language and consist of one or more complete sentences.
  • the video summary generating apparatus may generate the target video summary by:
  • the plurality of keywords in the subtitle are extracted, and the extracted plurality of keywords are combined to generate at least one sentence, and the composed one or more sentences constitute a target video summary corresponding to the user.
  • A keyword may be a word whose frequency of occurrence in the subtitles is greater than a preset value, a word whose part of speech is of a preset type, a word in the subtitles that matches a preset word, or a word determined by other means, which is not limited herein.
  • the sentence generated by the combination should satisfy the natural language requirement and should be a complete sentence.
  • the video summary generating device may also generate a target video summary corresponding to the user by other means, which is not limited herein.
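  • As a deliberately naive sketch of this step (the frequency threshold, stopword list and sentence template are assumptions; a real system would need proper natural-language generation to satisfy the "complete sentence" requirement), keywords can be taken as words whose frequency in the extracted subtitles exceeds a preset value and then slotted into a template sentence:

```python
import re
from collections import Counter

STOPWORDS = frozenset({"the", "a", "an", "i", "to", "my", "and", "it", "from"})

def extract_keywords(subtitle_lines, min_count=2):
    """Keywords = words whose frequency across the extracted subtitles is at least
    `min_count` (a preset value), ignoring a small assumed stopword list."""
    words = re.findall(r"[a-zA-Z']+", " ".join(subtitle_lines).lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, c in counts.most_common() if c >= min_count]

def compose_summary(keywords, max_keywords=5):
    """Very naive 'combination' into one sentence; real generation would need
    templates or a language model to produce grammatical, complete sentences."""
    if not keywords:
        return ""
    return "This episode revolves around " + ", ".join(keywords[:max_keywords]) + "."

lines = ["Dad, I failed my English test.",
         "Failing English and hiding it from Mom.",
         "Mom hired an English tutor."]
print(compose_summary(extract_keywords(lines)))   # e.g. "... english, mom."
```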
  • steps 301-307 may be performed for each user.
  • The first three steps split the target video into several video frames; the video frames obtained are the same for different users, so the divided video frames can be reused. That is, if the video summary is generated for the first user, steps 301-307 need to be performed; if the video summary is generated for a subsequent user, the already-divided video frames can simply be read, and then steps 304-307 are performed.
  • the video digest generating device may further update the video digest according to a preset rule.
  • The preset rule refers to a preset update rule. It may be a time period, that is, the video summary is updated periodically, such as once a week or once a month; it may be a trigger condition, such as updating the video summary each time a new episode of a TV series is released; or it may be another rule, which is not limited herein.
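  • A trivial sketch of such a trigger-style rule (the function and parameter names are assumed): regenerate the summaries once a preset number of new episodes has accumulated since the last update:

```python
def should_update(new_episodes_since_last_update: int, episodes_per_update: int = 2) -> bool:
    """Trigger-style preset rule: regenerate the per-user summaries once enough
    new episodes have been released since the last update."""
    return new_episodes_since_last_update >= episodes_per_update

# A time-period rule could be written the same way using timestamps instead.
print(should_update(2))   # True: two new episodes, so the video summaries are updated
```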
  • The embodiments of the present invention may divide the target video into a plurality of video frames, determine N target frames corresponding to the user according to the user characteristics, extract subtitles in the N target frames, and generate the user's target video summary according to the extracted subtitles. It can be seen that the solution can automatically generate a video summary and can display different video summaries to different users according to user characteristics, which is more targeted, can increase the number of video views, provide effective information for more users, and improve the efficiency of video summary generation.
  • the embodiment of the invention provides a method for dividing a target video into a plurality of video frames, which improves the achievability of the solution.
  • the embodiment of the present invention provides a plurality of manners for determining a target frame, and various manners of extracting subtitles and generating a digest, thereby improving the flexibility of the solution.
  • the embodiment of the present invention may update the video summary to further improve the timeliness of the video summary.
  • The system inputs the two videos of the first and second episodes of the TV series "Small Separation" (the target video). The video summary generating apparatus divides the two videos into 6 shot data according to color-space distance, then divides the 6 shot data into 24 sub-shot data according to the camera motion direction, and then divides the 24 sub-shot data into 100 video frames.
  • The video summary generating apparatus uses the number of video frames contained in each sub-shot data as the weight of that sub-shot data.
  • For each sub-shot data, the video summary generating apparatus divides the video frames in the sub-shot data into three classes by K-means clustering and determines the video frame closest to the cluster center in each class as a key frame; that is, each sub-shot data corresponds to three key frames. The frame weight of each key frame is then determined according to the proportion of the image corresponding to the key frame that is occupied by faces.
  • For user A, the video summary generating apparatus determines which sub-shot data of the target video contain Haiqing (the tag information corresponding to A); the result shows that there are 15 such sub-shot data (the target sub-shot data). The apparatus then selects, from the 15 target sub-shot data, the 3 sub-shot data with the largest sub-shot weights, recorded as a, b and c respectively, and selects in each of them the key frame with the largest frame weight, obtaining the target frames a1, b1 and c1 corresponding to A.
  • After determining the three target frames corresponding to A, the video summary generating apparatus extracts all the subtitles in the three target frames. The subtitles corresponding to a1 are: "Dad, I failed my English test", "Duoduo, how could you fail? Isn't your English always good?", "If Mom finds out she will definitely scold me. Can you go to the parent meeting on Sunday?", "All right, I will go to the parent meeting on Sunday."
  • The subtitle corresponding to b1 is: "You failed your English test and kept it from your mother; you really don't take your mother seriously."
  • The subtitles corresponding to c1 are: "Duoduo, how could you bring a dog home without my consent? We cannot keep a dog at home." "I have always wanted to keep a dog, and you promised me."
  • From the subtitles corresponding to a1, b1 and c1, the video summary generating apparatus extracts the keywords "Duoduo", "English grade", "fail", "Dad", "go to the parent meeting", "want to keep a dog", "without consent" and "kept it from Mom", and then combines these keywords to generate the sentences "Duoduo failed her English test and kept it from Mom, and Dad went to the parent meeting in her place. Duoduo wants to keep a dog." The above sentences are the video summary corresponding to A.
  • For user B, the sub-shot data determined in the same way are b, c and d. The video summary generating apparatus determines the key frames in b, c and d and then, according to the frame weights of the determined key frames, selects the key frame b1 with the largest frame weight from the three key frames contained in b, the key frame c1 with the largest frame weight from the three key frames contained in c, and the key frame d1 with the largest frame weight from the three key frames contained in d. b1, c1 and d1 are taken as the target frames corresponding to B.
  • After determining the three target frames corresponding to B, the video summary generating apparatus extracts all the subtitles in the three target frames. The subtitles corresponding to b1 and c1 are as described above, and the subtitle corresponding to d1 is: "Duoduo, Mom has hired an English tutor for you; you have to cooperate with the teacher to improve your English score."
  • From the subtitles corresponding to b1, c1 and d1, the video summary generating apparatus extracts the keywords "Duoduo", "English grade", "fail", "Dad", "go to the parent meeting", "Mom", "hired an English tutor" and "improve", and then combines these keywords to generate the sentences "Duoduo failed her English test, and Dad went to the parent meeting in Mom's place. Mom hired an English tutor to improve Duoduo's English score." The above sentences are the video summary corresponding to B.
  • The video summary generating apparatus presets an update rule: the video summary of a TV series is updated every time two new episodes are released. A week later, the TV series "Small Separation" releases two more episodes; the system inputs the third and fourth episodes, and the video summary generating apparatus updates the video summary corresponding to each user according to the newly input Episode 3 and Episode 4 videos.
  • an embodiment of the video digest generating apparatus in the embodiment of the present invention includes:
  • a segmentation module 401 configured to divide the target video into a plurality of video frames
  • the first determining module 402 is configured to determine, according to the user feature, N target frames corresponding to the user from the plurality of video frames, where N is an integer greater than one;
  • An extracting module 403, configured to extract subtitles in the N target frames
  • the generating module 404 is configured to generate a target video summary according to the subtitles extracted by the extracting module 403.
  • the embodiment of the present invention may divide the target video into a plurality of video frames, determine N target frames corresponding to the user according to the user characteristics, extract subtitles in the N target frames, and generate a target video summary of the user according to the extracted subtitles. It can be seen that the solution can automatically generate a video summary, and can display different video summaries to different users according to user characteristics, which is more targeted, can improve the video browsing amount, provide effective information for more users, and improve the video summary. The efficiency of the generation.
  • the generating module 404 includes:
  • a first extracting unit 4041 configured to extract a plurality of keywords in the subtitle
  • the generating unit 4042 is configured to combine a plurality of keywords to generate at least one sentence, and use at least one sentence as the target video summary.
  • the extracting module 403 may include:
  • a second extracting unit 4031 configured to extract, for each target frame of the N target frames, all the subtitles in the target frame
  • the third extracting unit 4032 is configured to extract a preset length of the subtitle in the target frame for each of the N target frames.
  • the embodiment of the invention provides an implementation manner for generating a video summary, which improves the achievability of the solution.
  • the embodiment of the present invention provides a plurality of ways of extracting subtitles in a target frame, which improves the flexibility of the solution.
  • the segmentation module 401 includes:
  • a first dividing unit 4011 configured to divide the target video into a plurality of lens data
  • a second dividing unit 4012 configured to divide each lens data into a plurality of sub-lens data
  • the third dividing unit 4013 is configured to divide each sub-lens data into a plurality of video frames.
  • the embodiment of the invention provides an implementation manner of splitting the target video, and improves the achievability of the solution.
  • the first determining module 402 includes:
  • the first determining unit 4021 is configured to determine L sub-shot data corresponding to the user from the plurality of video frames according to the user feature, where L is an integer equal to or greater than 1;
  • The second determining unit 4022 is configured to determine, according to preset frame weights, X target frames in each of the L sub-shot data, where X is an integer equal to or greater than 1, and X multiplied by L equals N.
  • the embodiment of the invention provides an implementation manner for determining a target frame, and improves the achievability of the solution.
  • the first determining unit 4021 includes:
  • the first determining sub-unit 40211 is configured to determine, in the plurality of sub-shot data corresponding to the target video, the target sub-shot data including the tag information corresponding to the user;
  • The second determining sub-unit 40212 is configured to determine, in the target sub-shot data, the sub-shot data ranked in the top L by preset sub-shot weight.
  • the video summary generating apparatus provides a method for determining L sub-shot data corresponding to each user, thereby improving the achievability of the solution.
  • the video summary generating apparatus further includes:
  • a classification module 405, configured to divide the video frames in the sub-shot data into K classes by K-means clustering for each sub-shot data;
  • a second determining module 406 configured to determine, in each type of video frame, a video frame that is closest to a cluster center as a key frame of the video frame;
  • a third determining module 407 configured to determine a frame weight of each key frame according to the frame parameter
  • the second determining unit 4022 includes:
  • the third determining sub-unit 40221 is configured to determine, for each of the L sub-shot data, the X target frames with the largest frame weight among the key frames included in the sub-shot data.
  • the embodiment of the invention provides a method for determining a target frame in L sub-shot data, which improves the achievability of the solution.
  • the video digest generating apparatus may further include:
  • An update module for updating a video summary according to a preset rule.
  • the video summary generating apparatus may further update the video summary according to the preset rule, thereby improving the flexibility of the solution.
  • FIG. 10 shows another embodiment of the video summary generating device in an embodiment of the present invention.
  • the video summary generating device 50 can include an input device 510, an output device 520, a processor 530, and a memory 540.
  • the output device in the embodiment of the present invention may be a display device.
  • Memory 540 can include read only memory and random access memory and provides instructions and data to processor 530. A portion of the memory 540 may also include a non-volatile random access memory (English name: Non-Volatile Random Access Memory, English abbreviation: NVRAM).
  • Memory 540 stores the following elements, executable modules or data structures, or subsets thereof, or their extended sets:
  • Operation instructions: include various operation instructions for implementing various operations.
  • Operating system: includes various system programs for implementing various basic services and handling hardware-based tasks.
  • By calling the operation instructions stored in the memory 540, the processor 530 is configured to perform the following operations: dividing a target video into a plurality of video frames; determining, according to user characteristics, N target frames corresponding to the user from the plurality of video frames, where N is an integer greater than 1; extracting subtitles in the N target frames; and generating a target video summary according to the subtitles.
  • Optionally, the processor 530 is further configured to: extract a preset length of the subtitles in the target frame; divide each sub-shot data into several video frames; determine L sub-shot data corresponding to the user, where L is an integer equal to or greater than 1; determine the target sub-shot data containing the tag information corresponding to the user; and divide the video frames in the sub-shot data into K classes by K-means clustering.
  • the processor 530 controls the operation of the video summary generating device 50.
  • the processor 530 may also be referred to as a central processing unit (English full name: Central Processing Unit: CPU).
  • the various components of the video summary generating device 50 are coupled together by a bus system 550.
  • the bus system 550 may include a power bus, a control bus, a status signal bus, and the like in addition to the data bus. However, for clarity of description, various buses are labeled as bus system 550 in the figure.
  • Processor 530 may be an integrated circuit chip with signal processing capabilities.
  • each step of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 530 or an instruction in a form of software.
  • The processor 530 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • The processor may implement or carry out the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention.
  • The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present invention may be directly implemented by the hardware decoding processor, or may be performed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
  • the storage medium is located in the memory 540, and the processor 530 reads the information in the memory 540 and performs the steps of the above method in combination with its hardware.
  • An embodiment of the present invention provides a computer readable storage medium, where the storage medium stores at least one instruction, at least one program, a code set, or a set of instructions, the at least one instruction, the at least one program, and the A code set or set of instructions is loaded and executed by the processor to implement a video summary generation method as described above.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • The division of units described is only a logical functional division; in actual implementation there may be other division manners. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • the medium includes instructions for causing a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present invention.
  • The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Television Signal Processing For Recording (AREA)
  • Studio Circuits (AREA)

Abstract

Disclosed are a method and an apparatus for generating a video summary, a server and a storage medium, which are used to automatically generate different video summaries for different users, increase the number of views of a video, provide effective information for more users, and improve the efficiency of video summary generation. The method comprises: segmenting a target video into a plurality of video frames; determining, according to a user characteristic, N corresponding target frames among the video frames, N being an integer greater than 1; extracting a subtitle in the N target frames; and generating a target video summary according to the subtitle.
PCT/CN2018/079246 2017-03-28 2018-03-16 Procédé et appareil de génération de résumé vidéo, serveur et support de stockage Ceased WO2018177139A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710192629.4A CN106888407B (zh) 2017-03-28 2017-03-28 一种视频摘要生成方法及装置
CN201710192629.4 2017-03-28

Publications (1)

Publication Number Publication Date
WO2018177139A1 true WO2018177139A1 (fr) 2018-10-04

Family

ID=59181973

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/079246 Ceased WO2018177139A1 (fr) 2017-03-28 2018-03-16 Procédé et appareil de génération de résumé vidéo, serveur et support de stockage

Country Status (2)

Country Link
CN (1) CN106888407B (fr)
WO (1) WO2018177139A1 (fr)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106888407B (zh) * 2017-03-28 2019-04-02 腾讯科技(深圳)有限公司 一种视频摘要生成方法及装置
CN109729425B (zh) * 2017-10-27 2021-05-18 优酷网络技术(北京)有限公司 一种关键片段的预测方法及系统
CN109756767B (zh) * 2017-11-06 2021-12-14 腾讯科技(深圳)有限公司 预览数据播放方法、装置及存储介质
CN108683924B (zh) * 2018-05-30 2021-12-28 北京奇艺世纪科技有限公司 一种视频处理的方法和装置
CN109151576A (zh) * 2018-06-20 2019-01-04 新华网股份有限公司 多媒体信息剪辑方法和系统
CN110753269B (zh) * 2018-07-24 2022-05-03 Tcl科技集团股份有限公司 视频摘要生成方法、智能终端及存储介质
CN110769279B (zh) * 2018-07-27 2023-04-07 北京京东尚科信息技术有限公司 视频处理方法和装置
CN110933488A (zh) * 2018-09-19 2020-03-27 传线网络科技(上海)有限公司 视频剪辑方法及装置
CN109413510B (zh) * 2018-10-19 2021-05-18 深圳市商汤科技有限公司 视频摘要生成方法和装置、电子设备、计算机存储介质
CN109348287B (zh) * 2018-10-22 2022-01-28 深圳市商汤科技有限公司 视频摘要生成方法、装置、存储介质和电子设备
CN111050191B (zh) * 2019-12-30 2021-02-02 腾讯科技(深圳)有限公司 一种视频生成方法、装置、计算机设备和存储介质
CN115190357B (zh) * 2022-07-05 2024-08-30 三星电子(中国)研发中心 一种视频摘要生成方法和装置
CN115334367B (zh) * 2022-07-11 2023-10-17 北京达佳互联信息技术有限公司 视频的摘要信息生成方法、装置、服务器以及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6751776B1 (en) * 1999-08-06 2004-06-15 Nec Corporation Method and apparatus for personalized multimedia summarization based upon user specified theme
CN101131850A (zh) * 2006-08-21 2008-02-27 索尼株式会社 节目提供方法及节目提供设备
CN103646094A (zh) * 2013-12-18 2014-03-19 上海紫竹数字创意港有限公司 实现视听类产品内容摘要自动提取生成的系统及方法
CN106528884A (zh) * 2016-12-15 2017-03-22 腾讯科技(深圳)有限公司 一种信息展示图片生成方法及装置
CN106888407A (zh) * 2017-03-28 2017-06-23 腾讯科技(深圳)有限公司 一种视频摘要生成方法及装置
CN106921891A (zh) * 2015-12-24 2017-07-04 北京奇虎科技有限公司 一种视频特征信息的展示方法和装置

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1616275A1 (fr) * 2003-04-14 2006-01-18 Koninklijke Philips Electronics N.V. Procede et appareil de production de resumes de videoclips par analyse du contenu
US8036263B2 (en) * 2005-12-23 2011-10-11 Qualcomm Incorporated Selecting key frames from video frames
CN101464893B (zh) * 2008-12-31 2010-09-08 清华大学 一种提取视频摘要的方法及装置
CN102184221B (zh) * 2011-05-06 2012-12-19 北京航空航天大学 一种基于用户偏好的实时视频摘要生成方法
CN104185089B (zh) * 2013-05-23 2018-02-16 三星电子(中国)研发中心 视频概要生成方法及服务器、客户端
EP2960812A1 (fr) * 2014-06-27 2015-12-30 Thomson Licensing Procédé et appareil de création d'un résumé vidéo

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6751776B1 (en) * 1999-08-06 2004-06-15 Nec Corporation Method and apparatus for personalized multimedia summarization based upon user specified theme
CN101131850A (zh) * 2006-08-21 2008-02-27 索尼株式会社 节目提供方法及节目提供设备
CN103646094A (zh) * 2013-12-18 2014-03-19 上海紫竹数字创意港有限公司 实现视听类产品内容摘要自动提取生成的系统及方法
CN106921891A (zh) * 2015-12-24 2017-07-04 北京奇虎科技有限公司 一种视频特征信息的展示方法和装置
CN106528884A (zh) * 2016-12-15 2017-03-22 腾讯科技(深圳)有限公司 一种信息展示图片生成方法及装置
CN106888407A (zh) * 2017-03-28 2017-06-23 腾讯科技(深圳)有限公司 一种视频摘要生成方法及装置

Also Published As

Publication number Publication date
CN106888407A (zh) 2017-06-23
CN106888407B (zh) 2019-04-02

Similar Documents

Publication Publication Date Title
WO2018177139A1 (fr) Procédé et appareil de génération de résumé vidéo, serveur et support de stockage
JP7201729B2 (ja) ビデオ再生ノードの位置決め方法、装置、デバイス、記憶媒体およびコンピュータプログラム
CN111143610B (zh) 一种内容推荐方法、装置、电子设备和存储介质
KR102276728B1 (ko) 멀티모달 콘텐츠 분석 시스템 및 그 방법
CN111611436B (zh) 一种标签数据处理方法、装置以及计算机可读存储介质
CN111798879B (zh) 用于生成视频的方法和装置
CN111274442B (zh) 确定视频标签的方法、服务器及存储介质
EP3813376A1 (fr) Système et procédé pour générer annotations vidéo localisées et contextuelles
CN103299324B (zh) 使用潜在子标记来学习用于视频注释的标记
US8364660B2 (en) Apparatus and software system for and method of performing a visual-relevance-rank subsequent search
US9031974B2 (en) Apparatus and software system for and method of performing a visual-relevance-rank subsequent search
KR101944469B1 (ko) 컴퓨터 실행 방법, 시스템 및 컴퓨터 판독 가능 매체
US20170124096A1 (en) System and method for multi-modal fusion based fault-tolerant video content recognition
WO2018108047A1 (fr) Procédé et dispositif de génération d'image d'affichage d'informations
CN106921891A (zh) 一种视频特征信息的展示方法和装置
CN103384883B (zh) 利用Top-K处理使语义丰富
Thomas et al. Perceptual video summarization—A new framework for video summarization
CN114845149B (zh) 视频片段的剪辑方法、视频推荐方法、装置、设备及介质
CN111263186A (zh) 视频生成、播放、搜索以及处理方法、装置和存储介质
CN110399505A (zh) 语义标签生成方法及设备、计算机存储介质
TWI725375B (zh) 資料搜尋方法及其資料搜尋系統
US20250119625A1 (en) Generating video insights based on machine-generated text representations of videos
CN116975363A (zh) 视频标签生成方法、装置、电子设备及存储介质
CN116051192A (zh) 处理数据的方法和装置
CN113297452B (zh) 多级检索方法、多级检索装置及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18776483

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18776483

Country of ref document: EP

Kind code of ref document: A1