Saliency-based methods for automated video cropping in sidewalk footage
| Year of defense: | 2025 |
|---|---|
| Title (Portuguese): | Métodos baseados em saliência para corte automatizado de vídeo em filmagens de calçadas |
| Main author: | Costa, Suayder Milhomem |
| Advisor: | Cesar Junior, Roberto Marcondes |
| Defense committee: | |
| Document type: | Master's thesis (Dissertação) |
| Access type: | Open access |
| Language: | English |
| Defending institution: | Biblioteca Digital de Teses e Dissertações da USP (Universidade de São Paulo) |
| Graduate program: | Not informed by the institution |
| Department: | Not informed by the institution |
| Country: | Not informed by the institution |
| Keywords (Portuguese): | Infraestrutura urbana; Pavimento tátil; Predição de saliência; Recorte de vídeos |
| Keywords (English): | Urban infrastructure; Tactile paving; Salience prediction; Video cropping |
| Access link: | https://www.teses.usp.br/teses/disponiveis/45/45134/tde-13092025-192333/ |
Abstract: The condition of urban infrastructure is an important aspect of ensuring the safety and well-being of pedestrians. This is especially important when considering accessibility for individuals with mobility challenges, such as the elderly and the visually impaired, who are particularly vulnerable to poorly maintained sidewalks. Areas surrounding hospitals are of particular concern, not only due to the high volume of pedestrian and vehicle traffic, but also because they serve individuals in vulnerable health conditions who require safe and reliable access to medical services. In this context, many computational tools have already demonstrated their potential for urban infrastructure analysis, including surface material classification and obstacle detection; however, most solutions require labeled data, which is costly and time-consuming to produce. To address this gap, two strategies for saliency prediction in videos are proposed, aiming to reduce the dependence on manual labeling and contribute to sidewalk analysis. Both strategies ultimately serve to train saliency predictors tailored to specific sidewalk features. The first strategy leverages human visual attention by converting user clicks into attention maps through post-processing. This approach is particularly effective for identifying general sidewalk obstacles, such as cracks and surface defects. The second employs the \acf model, enhanced with additional processing, to generate labeled video data more efficiently for specialized tactile features. This enables the training of saliency predictors that can recognize key elements of tactile paving, including directional changes and broken tiles. An important advantage of this strategy is its scalability: it can potentially be extended to detect a wider range of features in the urban environment. These saliency models serve as the foundation for a proposed video cropping method designed to automatically trim frames and highlight the most relevant regions based on saliency map outputs. This approach enables the identification of key areas within each frame and supports applications such as content-aware video retargeting, object-focused attention, and sidewalk condition analysis by emphasizing defects and potential hazards. This research compiles our previous studies \citep{costa2024videocropping, costa2024tactile, costa2025salience}, presenting the following main contributions: (1) development of a click-based video annotation tool, (2) an annotated dataset of egocentric sidewalk videos tailored for saliency prediction, (3) implementation of two saliency prediction strategies for sidewalk video cropping, (4) training and evaluation of saliency models for sidewalk structure analysis, and (5) integration of these models into a video cropping framework. Experimental results demonstrate that the saliency models effectively highlight relevant information in urban environments, achieving an AUC of 0.582 for human-based attention and 0.914 for tactile-based attention, thereby enhancing assistive technologies for visually impaired individuals.
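The abstract mentions two mechanisms that are easy to picture in code: turning user clicks into attention maps through post-processing, and selecting a crop window from a saliency map. The sketch below is not taken from the thesis; it is a minimal illustration of those two ideas under common assumptions (impulses at click positions blurred with a Gaussian kernel, and a fixed-size crop chosen by maximizing total saliency with an integral image). All function names, the `sigma` value, and the fixed window size are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def clicks_to_attention_map(clicks, frame_shape, sigma=25.0):
    """Convert (x, y) user clicks into a normalized attention map:
    place an impulse at each click and blur with a Gaussian kernel,
    a common way of building fixation-style maps (assumed here)."""
    h, w = frame_shape
    attention = np.zeros((h, w), dtype=np.float32)
    for x, y in clicks:
        if 0 <= y < h and 0 <= x < w:
            attention[int(y), int(x)] += 1.0
    attention = gaussian_filter(attention, sigma=sigma)
    if attention.max() > 0:
        attention /= attention.max()  # scale to [0, 1]
    return attention


def saliency_crop_window(saliency, crop_size):
    """Slide a fixed-size window over a saliency map and return the
    (top, left) corner of the crop with the largest total saliency,
    using a 2-D integral image for constant-time box sums."""
    ch, cw = crop_size
    h, w = saliency.shape
    # integral[i, j] = sum of saliency[:i, :j]; extra zero row/column
    integral = np.zeros((h + 1, w + 1), dtype=np.float64)
    integral[1:, 1:] = np.cumsum(np.cumsum(saliency, axis=0), axis=1)
    best, best_pos = -1.0, (0, 0)
    for top in range(h - ch + 1):
        for left in range(w - cw + 1):
            total = (integral[top + ch, left + cw] - integral[top, left + cw]
                     - integral[top + ch, left] + integral[top, left])
            if total > best:
                best, best_pos = total, (top, left)
    return best_pos


# Example: three clicks on a 480x640 frame, then a 256x256 crop
clicks = [(120, 300), (130, 310), (500, 200)]
att = clicks_to_attention_map(clicks, frame_shape=(480, 640))
top, left = saliency_crop_window(att, crop_size=(256, 256))
print(f"best crop at top={top}, left={left}")
```

In the thesis, the saliency maps used for cropping come from trained predictors rather than raw clicks, and the cropping criterion may differ; the sketch only shows the general shape of such a pipeline.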