PDF and MS Office (X) files identified as "compound" files and "not processed"

  • 20 October 2021
  • 9 replies

Userlevel 3

Is it correct that KAV identifies PDF and MS Office X files (docx, xlsx) as “compound files”, which then fall under the maximum size restriction for scanning (i.e. they are logged as “not processed” due to the size limit)?

According to my own log files and someone on Reddit this is currently happening.


Userlevel 7
Badge +8

Hello @Timur Born,

The assumption is correct.
.docx and .xlsx files are ZIP archives. You can open them with 7-Zip, for example, and inspect the individual components.
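
The ZIP-container point can be checked without 7-Zip. The following is a minimal sketch in Python, not anything taken from Kaspersky: `make_fake_docx` and `list_container_parts` are hypothetical helper names, and the two member names only imitate the layout of a real .docx. It builds a tiny stand-in archive in memory and lists its parts with the standard `zipfile` module.

```python
import io
import zipfile


def make_fake_docx() -> bytes:
    """Build a tiny in-memory ZIP that imitates the layout of a .docx (illustrative only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<document/>")
    return buffer.getvalue()


def list_container_parts(data: bytes) -> list[str]:
    """Return the member names if the data is a ZIP container, otherwise an empty list."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return []
    with zipfile.ZipFile(buffer) as archive:
        return archive.namelist()


parts = list_container_parts(make_fake_docx())
print(parts)
```

To try it on a real document, read the file with `open(path, "rb").read()` and pass the bytes to `list_container_parts`.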

Userlevel 3

So I assume that the MS Office files are then scanned when opened in an application (and thus unzipped by the application)?

What about PDF files? Why are these identified as compound files, and are they scanned once opened in a PDF application?

Userlevel 7
Badge +8

Hi @Timur Born,

I don't know exactly what format is used for .pdf. It surely depends on how images, graphics or other content are embedded.

In any case, these files are analyzed when they are opened and unpacked.

Userlevel 3

PDF files are more or less PostScript files, close to plain text files. They are not compressed.

This is what the content of a PDF file looks like:

<?xpacket begin=' ' id='W5M0MpCehiHzreSzNTczkc9d' ?>
<x:xmpmeta xmlns:x='adobe:ns:meta/'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<rdf:Description rdf:about='' xmlns:xmp="http://ns.adobe.com/xap/1.0/"><xmp:Identifier><rdf:Bag><rdf:li>16853155</rdf:li></rdf:Bag></xmp:Identifier></rdf:Description>
<rdf:Description rdf:about='' xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"><xmpMM:VersionID>1234334</xmpMM:VersionID></rdf:Description>

Userlevel 7
Badge +8

If you scan a .pdf via command line, it will be listed as an archive. In my example 'data0000' and 'data0001' are included.


2021-10-20 16:14:51     Scan_Objects$6749                          starting   1%

2021-10-20 16:14:51     files\  skipped: not found
2021-10-20 16:14:51     Scan_Objects$6749                          running    1%

2021-10-20 16:14:51     C:\Users\admin\Desktop\testpdf.pdf      archive PDF
2021-10-20 16:14:51     C:\Users\admin\Desktop\testpdf.pdf//data0000    ok
2021-10-20 16:14:51     C:\Users\admin\Desktop\testpdf.pdf//data0001    ok
2021-10-20 16:14:51     C:\Users\admin\Desktop\testpdf.pdf      ok
2021-10-20 16:14:52     Scan_Objects$6749                          completed

Info: task 'ods' finished, last error code 0
Warning: 1 skipped with not found
;  --- Statistics ---
; Time Start:   2021-10-20 16:14:51
; Time Finish:  2021-10-20 16:14:52
; Processed objects:    3
; Total OK:     3
; Total detected:       0
; Suspicions:   0
; Total skipped:        0
; Password protected:   0
; Corrupted:    0
; Errors:       0
;  ------------------
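
One plausible reading of the 'data0000' and 'data0001' entries is that they correspond to stream objects embedded in the PDF. That mapping is an assumption, not documented Kaspersky behaviour. The sketch below (Python, with the hypothetical helper `summarize_pdf_streams`) only shows that a PDF can carry several embedded streams and that a stream may declare a `/FlateDecode` filter, meaning its payload is zlib-compressed. The sample bytes are a hand-built fragment, not a complete valid PDF.

```python
import re
import zlib


def summarize_pdf_streams(data: bytes) -> dict:
    """Count stream objects and how many of them declare a /FlateDecode filter."""
    # A stream object is a dictionary (<< ... >>) followed by stream ... endstream.
    pattern = re.compile(rb"<<(.*?)>>\s*stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
    streams = pattern.findall(data)
    flate = sum(1 for header, _body in streams if b"/FlateDecode" in header)
    return {"streams": len(streams), "flate": flate}


# Hand-built sample fragment: one plain stream and one zlib-compressed stream.
raw = b"BT /F1 12 Tf (Hello) Tj ET"
packed = zlib.compress(b"BT /F1 12 Tf (Compressed text) Tj ET")
sample = b"".join([
    b"%PDF-1.4\n",
    b"1 0 obj\n<< /Length %d >>\nstream\n" % len(raw),
    raw,
    b"\nendstream\nendobj\n",
    b"2 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(packed),
    packed,
    b"\nendstream\nendobj\n",
    b"%%EOF\n",
])

summary = summarize_pdf_streams(sample)
print(summary)
```

A regular expression is enough for this illustration; a real PDF parser would follow the cross-reference table and the `/Length` entries instead of pattern matching.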



Userlevel 3

They sure are identified as archives, but I am not convinced that they are scanned upon access. On the contrary, they are explicitly listed as *not* being scanned due to size in the log when being opened by a PDF application (or via Explorer.exe double-click).

Userlevel 3

Here is the log-file showing that neither PDF nor MSO (docx) files are processed when opened in their respective applications.


“Scan files in Microsoft Office Format” is enabled.

Userlevel 3

When the Word docx file is opened in Word, KAV also does not scan its contents as decompressed temporary files. The file access you see in this screenshot is AVP checking the file size and then deciding not to process the file because it is too large.

The same goes for PDF files, because, as with Word, the file is only decompressed and processed in memory, not written to disk.

Userlevel 3

I am also confused by the “Minimum file size” option. According to help:

“If this check box is cleared, Kaspersky Total Security provides access to compound files only after unpacking and scanning files, regardless of their size.”

This reads as if, with the “Minimum file size” option disabled, KAV should block access to compound files larger than the default 8 MB until they have been unpacked and scanned. But in my tests the large PDF and Office files are neither scanned nor blocked. They are simply opened unscanned, with a log entry stating that their size is too large.
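
To see which documents in a folder would exceed such a limit, a small script can list them. This is a minimal sketch under stated assumptions: the 8 MB figure comes from the post above and is taken here as 8 × 1024 × 1024 bytes, the extension set is only the file types discussed in this thread, and `find_oversized` is a hypothetical helper that has nothing to do with Kaspersky's own logic.

```python
from pathlib import Path
import tempfile

SIZE_LIMIT_BYTES = 8 * 1024 * 1024  # assumption: "8 MB" interpreted as 8 MiB
COMPOUND_EXTENSIONS = {".pdf", ".docx", ".xlsx"}  # assumption: types named in this thread


def find_oversized(root: Path, limit: int = SIZE_LIMIT_BYTES) -> list[Path]:
    """Return files under root with a listed extension whose size exceeds the limit."""
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file()
        and path.suffix.lower() in COMPOUND_EXTENSIONS
        and path.stat().st_size > limit
    )


# Demo with throwaway files and a tiny limit so that nothing large is written.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "small.pdf").write_bytes(b"x" * 10)
    (demo_root / "big.docx").write_bytes(b"x" * 100)
    (demo_root / "big.txt").write_bytes(b"x" * 100)
    demo_hits = [path.name for path in find_oversized(demo_root, limit=50)]

print(demo_hits)
```

For a real check, call `find_oversized(Path("C:/Users/admin/Desktop"))` or any other folder and compare the result with the “not processed” entries in the report.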