Saliency-based methods for automated video cropping in sidewalk footage

Costa, Suayder Milhomem

Bibliographic details
Year of defense:	2025
Main author:	Costa, Suayder Milhomem
Advisor:	Not provided by the institution
Examining committee:	Not provided by the institution
Document type:	Master's dissertation
Access type:	Open access
Language:	English
Defense institution:	Biblioteca Digital de Teses e Dissertações da USP
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords:	Urban infrastructure (Infraestrutura urbana); Tactile paving (Pavimento tátil); Salience prediction (Predição de saliência); Video cropping (Recorte de vídeos)
Access link:	https://www.teses.usp.br/teses/disponiveis/45/45134/tde-13092025-192333/
Abstract:	The condition of urban infrastructure is an important factor in pedestrian safety and well-being. It matters even more for accessibility: people with reduced mobility, such as older adults and visually impaired individuals, are especially vulnerable to poorly maintained sidewalks. Areas surrounding hospitals are of particular concern, not only because of their high volume of pedestrian and vehicle traffic, but also because they serve people in fragile health who require safe and reliable access to medical services. Computational tools have already shown their potential for urban infrastructure analysis, including surface material classification and obstacle detection; however, most of these solutions require labeled data, which is costly and time-consuming to produce. To address this gap, two strategies for saliency prediction in videos are proposed, aiming to reduce the dependence on manual labeling and to support sidewalk analysis. Both strategies ultimately serve to train saliency predictors tailored to specific sidewalk features. The first strategy leverages human visual attention by converting user clicks into attention maps through post-processing; it is particularly effective for identifying general sidewalk obstacles, such as cracks and surface defects. The second strategy employs the \\acf model, enhanced with additional processing, to generate labeled video data more efficiently for specialized tactile features. This enables the training of saliency predictors that recognize key elements of tactile paving, including directional changes and broken tiles. An important advantage of this second strategy is its scalability: it can potentially be extended to detect a wider range of features in the urban environment.
These saliency models serve as the foundation for a proposed video cropping method that automatically crops each frame to the most relevant regions indicated by the predicted saliency maps. This approach identifies key areas within each frame and supports applications such as content-aware video retargeting, object-focused attention, and sidewalk condition analysis, by emphasizing defects and potential hazards. This research compiles our previous studies [costa2024videocropping; costa2024tactile; costa2025salience] and presents the following main contributions: (1) a click-based video annotation tool; (2) an annotated dataset of egocentric sidewalk videos tailored for saliency prediction; (3) two saliency prediction strategies for sidewalk video cropping; (4) training and evaluation of saliency models for sidewalk structure analysis; and (5) integration of these models into a video cropping framework. Experimental results indicate that the saliency models highlight relevant information in urban environments, achieving an AUC of 0.582 for human-based attention and 0.914 for tactile-based attention, and can thereby support assistive technologies for visually impaired individuals.
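The abstract describes the pipeline only at a high level. The sketch below illustrates one plausible reading of two of its steps, under assumptions that do not come from the thesis. First, click annotations become attention maps by placing an isotropic Gaussian (width `sigma`, a hypothetical parameter) at each click and normalizing. Second, a saliency map becomes a crop by choosing the fixed-size window with the largest total saliency, found with a summed-area table, and the window position is smoothed over time. The names `clicks_to_attention_map`, `best_crop_window`, and `crop_video`, and the smoothing factor `alpha`, are hypothetical illustrations and not the thesis code.

```python
import numpy as np


def clicks_to_attention_map(clicks, height, width, sigma=25.0):
    """Convert (x, y) click positions on one frame into a [0, 1] attention map.

    Assumption: each click becomes an isotropic Gaussian of width `sigma`
    pixels; the post-processing used in the thesis may differ.
    """
    ys, xs = np.mgrid[0:height, 0:width]  # row and column coordinate grids
    attention = np.zeros((height, width), dtype=np.float64)
    for x, y in clicks:
        # Add one Gaussian blob centred on each click.
        attention += np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma**2))
    peak = attention.max()
    if peak > 0:
        attention /= peak  # strongest region maps to 1
    return attention


def best_crop_window(saliency, crop_h, crop_w):
    """Return (top, left) of the crop_h x crop_w window holding the most saliency.

    Assumption: a fixed-size window is chosen by maximizing total saliency.
    """
    h, w = saliency.shape
    if not (1 <= crop_h <= h and 1 <= crop_w <= w):
        raise ValueError("crop window must fit inside the frame")
    # Summed-area table with a leading row and column of zeros.
    sat = np.zeros((h + 1, w + 1), dtype=np.float64)
    sat[1:, 1:] = saliency.cumsum(axis=0).cumsum(axis=1)
    # Total saliency of every candidate window, each obtained in O(1).
    mass = (sat[crop_h:, crop_w:] - sat[:-crop_h, crop_w:]
            - sat[crop_h:, :-crop_w] + sat[:-crop_h, :-crop_w])
    top, left = np.unravel_index(np.argmax(mass), mass.shape)
    return int(top), int(left)


def crop_video(frames, saliency_maps, crop_h, crop_w, alpha=0.8):
    """Crop each frame around its most salient window.

    The window position is smoothed over time with an exponential moving
    average (hypothetical `alpha`) so the crop does not jitter between frames.
    Each saliency map must share its frame's height and width.
    """
    crops, smoothed = [], None
    for frame, sal in zip(frames, saliency_maps):
        target = np.array(best_crop_window(sal, crop_h, crop_w), dtype=np.float64)
        smoothed = target if smoothed is None else alpha * smoothed + (1 - alpha) * target
        top, left = (int(round(v)) for v in smoothed)
        crops.append(frame[top:top + crop_h, left:left + crop_w])
    return crops
```

In the thesis, cropping is driven by the outputs of trained saliency predictors rather than by raw click maps; in this sketch any 2-D saliency array with the frame's height and width can be passed. Maximizing summed saliency is only one possible cropping criterion. Because the smoothed position is a convex combination of valid window positions, the crop always stays inside the frame.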

Item metadata

id	USP_4682912395459d6f02e888bad259efbe
oai_identifier_str	oai:teses.usp.br:tde-13092025-192333
network_acronym_str	USP
network_name_str	Biblioteca Digital de Teses e Dissertações da USP
dc.title.none.fl_str_mv	Saliency-based methods for automated video cropping in sidewalk footage / Métodos baseados em saliência para corte automatizado de vídeo em filmagens de calçadas
author	Costa, Suayder Milhomem
author_role	author
dc.contributor.none.fl_str_mv	Cesar Junior, Roberto Marcondes
dc.contributor.author.fl_str_mv	Costa, Suayder Milhomem
dc.subject.por.fl_str_mv	Infraestrutura urbana; Pavimento tátil; Predição de saliência; Recorte de vídeos; Salience prediction; Tactile paving; Urban infrastructure; Video cropping
publishDate	2025
dc.date.none.fl_str_mv	2025-07-18
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
dc.identifier.uri.fl_str_mv	https://www.teses.usp.br/teses/disponiveis/45/45134/tde-13092025-192333/
dc.language.iso.fl_str_mv	eng
dc.rights.driver.fl_str_mv	Content released for public access; info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv	virginia@if.usp.br; atendimento@aguia.usp.br