Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Remote Blog Storage (RBS) Best Practices in SharePoint 2010 - EPC Group
1.
2.
3.
4. Overview
Understanding Unstructured Data Storage
◦ SQL BLOB
◦ RBS
◦ FILESTREAM
RBS Structure and Mechanics
Planning BLOB Storage
Deployment and Migration
Support
EBS
Summary
5. Remote BLOB Storage is designed to delineate
structured (metadata) and unstructured (BLOB
data) data
Enables organizations to deploy more efficient
content storage models based on commodity
storage
◦ Does not address capacity – database is the sum of the
unstructured and structured data regardless of location
Provides an upgrade path for WIDE customers
Improves existing BLOB storage scenarios (EBS)
6. On average 20% of data is structured, 80% is
unstructured or semi-structured
7. Does not adhere to specific format or
sequence
Is not tied to rules and unpredictable
Examples:
◦ Text
◦ Video
◦ Audio
◦ Images
◦ Word, PowerPoint, code, etc.
8. Organized in semantic chunks (entities)
Tied to relationships and has attributes
Associated with a defined schema:
◦ All entities have the defined format
◦ Have a predefined length
9. BLOB Binary Large OBject
BLOB is the data stream associated with a file
◦ SharePoint file metadata and BLOBs are stored in
SQL databases
◦ BLOBs do not participate in query operations
◦ Sample BLOB operations: Get, Put, Read range, etc.
SharePoint is built around the file
◦ Document libraries, Record Centers
BLOBs generally represent 80% of total
content
10. Web Server
Database Server
Content
Database
SharePoint stores BLOBs and associated
metadata in the content database
11. Storage
◦ SQL storage is usually more expensive
SAN versus CAS stores
Performance
◦ Impacts load on SQL Server box
Policy requirements
◦ Expunge, BLOB immutability
12. Web Server
Database Server
Content
Database
BLOB BLOB BLOB
Store X Store Y Store Z
• Independent component that can be registered for a SharePoint farm
• Data store for all BLOBs added to the content databases where the Provider is set
Active
• Customers can select BLOB store providers
13. Unstructured Data Unstructured Data Unstructured Data
Dedicated BLOB Stores
SQL BLOB Integrated File +
Remote File Servers
Database
14. Binary large objects stored in data tables
(varbinary(MAX))
Traditional method of storing and retrieving
binary large objects with SharePoint Products
and Technologies
15. SQL Server 2008 Add-on
Remote Blob Storage (RBS) is a library API set
that is designed to move storage of large
binary data (BLOBs) from Microsoft SQL Server
to external storage solutions.
RBS gives applications the ability to use rich
relational capabilities of SQL Server for their
structured data along with capabilities of
dedicated storage solutions for their
unstructured data in a transactionally
consistent manner.
16. Leverages NTFS FS by storing varbinary(max)
BLOB data as files on the FS
Addresses performance by enabling more
memory access for query processing
Suited for scenarios where BLOB
(unstructured) data is 1 MB or larger and fast
read access is desired (i.e. RM scenarios)
17. Storage attribute on
Unstructured Data VARBINARY(MAX)
Unstructured data stored directly in
the file system (requires NTFS)
Dual Programming Model
Data Consistency
Size limit is the file system volume
size
Integrated Manageability
SQL Server Security Stack (Same as
Integrated Files &
Database SQL BLOB)
19. Unstructured data is Unstructured data is
stored in a Filegroup stored in a Filegroup in a
separate database or SQL
with the associated Server instance from the
Content Database with associated Content
the related structured Database with the related
data on the local SQL structured data
Server instance Does not supported
integrated
Supports integrated management, structured
management, i.e. and unstructured data is
backup and restore managed separately
Local FILESTREAM Remote FILESTREAM
20. FILESTREAM Provider is limited local storage
◦ DAS, NAS, SAN are considered remote storage
regardless of disk presentation
◦ Does not support compression, TDE, and other
SQL Server capabilities
◦ Special constraints and limitations apply to BCM
scenarios such as Database Mirroring and Log
Shipping (see FAQ)
3rd party ISV solutions require SQL Server Enterprise Edition
◦ NAS storage devices require 20ms TTFB
21. RBS is a downloadable component in the SQL
Server 2008 R2 Feature Pack
◦ Includes a set of libraries and interface specifications
Defines and exposes 3 views for interaction
◦ Application View
Interacts with SharePoint Web Front-end, Provider
Library, SQL DB
Implemented by SharePoint 2010 – Transparent to the user
◦ Administrator View
Windows PowerShell CmdLets – Call Stored Procedures and
Functions
Installation, configuration, provisioning, RBS Maintainer etc.
◦ Provider View
Defines an interface that should be implemented by each
BLOB store provider
22. RBS storage contains two main features: blob
storage and blob retrieval. Blobs are
immutable so edits are translated in the
backend into new blobs.
Blobs are only deleted by garbage
collection, which scans the content database
and blob library, deleting blobs (of a certain
age) that are no longer referenced.
Where to store items (RBS or inline) is
determined at the content database level on
the back-end, not the WFE
23. Business Logic
Application Database Server
Web Server
Return BLOB
Read BLOB
BLOB
Store
Response
User Request
Content
Db
Get BLOB Id
Config Db
24. Business Logic
Application Database Server
Web Server
Save BLOB Commit
Data BLOB
Return BLOB
Id
BLOB
Store
Response
User Request
Commit
Content
BLOB Id &
Db
Metadata
Config Db
25. mssql_resources.rbs_internal_ta RBS providers store metadata
bles in their own tables inside the
[internal rbs … content database.
Content data]
Database … …
RBS provider = []
AllDocStreams
Content RbsId
NULL [ id # ] RBS Blob
[Inline Content] NULL
26. DatabaseInformation
… …
RBS Enabled [1|0]
RBS Provider [
providername The provider is set at the
] database level.
AllSites
… RBS Collection …
ID
[ id # ] Each site has a separate
RBS Provider Logic collection (top-level storage
unit). This is the default
AllDocStreams location for new blobs, and
… RBS ID …
consulted when adding new
[ rbs id (bin blobs.
This is a lookup-only
data) ] operation. The location of a
RBS Provider Logic Blob blob in the AllDocStreams
table need not be in the
same collection as what the
AllSites table default
collection.
27. Select storage solutions designed to support
BLOB data storage
◦ Write Once Read Multiple (WORM) devices can pronounce
orphan occurrences
Prevents deletion of BLOB related SharePoint metadata row
◦ SAN devices support BLOB data replication, locally and
over wide area networks using bit mapping. Database
mirroring and/or Log Shipping can augment these
solutions, but add operation complexity.
RBS Maintainer is instrumental in detecting and
resolving orphans in the event of failover
◦ Provides Reference Scan (RS), Delete Propagation
(DP), and Orphan Cleanup (OC)
28. RBS can be used in conjunction with commodity
storage, i.e. DAS; however, collaborative scenarios
should design for IO (RAID 10)
◦ CAPEX and OPEX considerations are critical to realizing ROI
Ensure operationally a multi-tiered storage subsystem can be
maintained and supported
Ensure BCM plans ensure metadata and BLOB data can be kept
synchronized
◦ Consider RBS when the following is true:
BLOB data files are larger than 256KB on average
BLOB data files are at least 80KB and the DB server I/O is a
bottleneck
◦ A large number of small BLOBs can decrease performance
RBS provides maximum value in archiving and DAM
scenarios, particularly large files with infrequent access
29. Unstructured Data Unstructured Data Unstructured Data
Dedicated BLOB Stores File System
SQL BLOB
Remote File Servers File Server
Solution Dedicated BLOB Store File System/File Server SQL BLOB
Advantages Lower cost per GB at scale Lowest cost per GB Integrated management
Scalability & Expandability Streaming Performance Data-level consistency
Disadvantages Complex application Complex application Poor data streaming support
development & deployment development & deployment File size limitations
Separate data management Integration with structured Highest cost per GB
Enterprise-scales only data
Example EMC Centera Windows File Servers SQL Server VARBINARY(MAX)
Fujitsu Nearline
30. Ensure the provider is working within the
scope of provided API’s and support policies
Evaluate provider characteristics, features,
and capabilities
Map provider offering to scenario
31. Required
◦ Implementation of RBS provider interface
◦ Enable multiple provider instances
◦ Guarantee BLOB persistence
◦ Guarantee link-level consistency
Recommended
◦ Backup, HA and Disaster recovery capability
◦ Data de-duplication
◦ Expunge, Immutability of BLOBs
32. 1. Enable FILESTREAM
2. Provision BLOB Store
3. Download and install SQL Server Remote
BLOB Store on each database server
4. Download and install SQL Server Remote
BLOB Store on each front-end Web server
5. Enable RBS
33. Enable/Disable
◦ Enables/Disables usage $database = Get-SPContentDatabase -
of RBS with SharePoint
WebApplication http://<server>
$rbs =
GetProviderNames $database.RemoteBlobStorageSettings
$rbs.Installed()
◦ List of all registered $rbs.Enable()
Providers
◦ Registered Providers are
kept track of in Config DB
$rbs.SetActiveProviderName($rbs.GetProv
iderNames()[0])
SetActiveProvider $rbs
◦ One active provider/BLOB
store per Content DB
◦ Other BLOB stores can be
used for read operations
34. Use Windows PowerShell Migrate CmdLet
◦ Moves BLOBs from current location to the current
Active RBS Provider store.
◦ Performs a “deep copy” of BLOB, sequentially
◦ Live migration – does not require downtime
◦ Migration can be stopped and resumed
Migrate – can be used for upgrade from EBS
to RBS
35. Backup
Backup Start Both Backups Complete
Restore Start Both Restores are Complete
36. Provider SQL Server 2008 SQL Server 2008 R2
2008 Not Supported Not Supported
2008 R2 with Supported Supported
FILESTREAM
Provider
37. SPFarm scoped SPContentDatabase scoped
Exposed as a COM Exposed as a .NET interface
interface (unmanaged) (managed)
Does not provide a Provides a configurable
configurable maintainer maintainer
Supports single provider Supports multiple providers
Supports Object Model,
Supports Object Model PRIME, and *SQL Server
and PRIME backup/restore
backup/restore Introduced in SharePoint
Deprecated in SharePoint 2010
2010 ◦ Implemented by SQL Server,
◦ Roadmap based on RBS supported with SharePoint
EBS RBS
38. Deep copy the binary large objects from EBS
either inline or into RBS
Internalize EBS binary large objects, re-
externalize with RBS
39. Used by Work and PowerPoint Web
Applications
◦ Implemented per Web Application
40. Database Mirroring does not support
FILESTREAM
◦ Log Shipping should augment Database Mirroring
for BLOB protection
42. Successful binary large object
implementations require careful planning to
ensure expected ROI is realized
Binary large object externalization is
designed to reduce capital expenditures, it
does not resolve capacity or performance
constraints
Standardized API set allows choice of
providers
Editor's Notes
BLOBs and relational database data are very different entities. Relational data usually consists of text or numbers and tends to be small. In contrast, BLOB data is most often pictures in .jpg, .tiff, or .bmp format—such as product images on a Web site—which can be quite large.
There are three (3) approaches to storing unstructured data with SQL Server, RBS, SQL BLOB, and FILESTREAM.Remote BLOB Storage (RBS), SharePoint relies on a new layer in SQL Server to read or update BLOB data stored outside of the database on separate BLOB Stores (File System or dedicated BLOB Stored)SQL BLOB traditional BLOB storage with SharePoint, BLOB data is stored along side the structured metadata in the Content DatabaseFILESTREAM Storage: Improving SQL BLOBGeneral guidance can be summarized as:Storing records that are on average smaller than 256KB is optimized with traditional SQL BLOBStoring large BLOB data like large video files will benefit from FILESTREAM - FILESTREAM will provide better streaming performance on these filesStoring the TIFF images or large static files on remote BLOBs, you should consider to leveraging the new Remote BLOB API layer, i.e. RBS
varbinary is the binary data type designation for binary large objects stored in SharePoint 2010 Content Databases and refers to variable-length binary data. (MAX) refers to a value that max indicates that the maximum storage size is 2^31-1 bytes or otherwise 2GB.
RBS is a SQL Server 2008 add-on that uses auxiliary tables, stored procedures and a managed client library to provide its services. A reference to the blob (provided by the Blob Store) is stored in RBS auxiliary tables and an RBS Blob ID is generated.ISVs create RBS Provider Libraries to enable custom BLOB stores using the RBS API set. The externalization of binary large objects in SharePoint 2010 is often referred to as RBS.
FILESTREAM storage is implemented as a *varbinary(max) column in which the data is stored as BLOBs in the file system.Transact-SQL statements can insert, update, query, search, and back up FILESTREAM data. Win32 file system interfaces provide streaming access to the data.A FILESTREAM filegroup is a special filegroup that contains file system directories instead of the files themselves. These file system directories are called data containers. Data containers are the interface between Database Engine storage and file system storage.FILESTREAM uses the NT system cache for caching file data. This helps reduce any effect that FILESTREAM data might have on Database Engine performance. The SQL Server buffer pool is not used; therefore, this memory is available for query processing.Performance: The way the data is going to be used is a critical factor. If streaming access is needed, storing the data inside a SQL Server database may be slower than storing it externally in a location such as the NTFS file system. Using file system storage, the data is read from the file and passed to the client application (either directly or with additional buffering). When the BLOB is stored in a SQL Server database, the data must first be read into SQL Server’s memory (the buffer pool) and then passed back out through a client connection to the client application. Not only does this mean the data goes through an extra processing step, it also means that SQL Server’s memory is unnecessarily “polluted” with BLOB data, which can cause further performance problems for SQL Server operations.FILESTREAM is most appropriate when the following conditions are true:Objects that are being stored are, on average, larger than 1 MB.Fast read access is important.http://technet.microsoft.com/en-us/library/bb933993.aspxhttp://msdn.microsoft.com/en-us/library/cc949109(SQL.100).aspx*See slide
Storage AttributeA FILESTREAM BLOB is a SQL BLOB, varbinary(MAX) Dual Programming ModelAll the existing code you have on BLOBs can work for a filestream BLOB.We leverage the Windows file systems layer to enable the win32 rich streaming programming model with the same transactional semantics as T-SQL. You open can obtain a real win32 handle to the filestream BLOB and call win32 read/write APIs on it.This is main advantage over SQL BLOB. With your large BLOB data stored now on the filesystem, the streaming performance can be as good as the filesystem’s. You can leverage the file system streaming optimizations options that fit your app’s access patterns.Data ConsistencySQL Server is now on the I/O stack of the streaming that you do on the filestream unstructured data Integrated ManageabilityThe filestream BLOB is still a SQL BLOB; FTS, Indexing, Replication, Log shipping etc.. Works for filestream BLOBs.
BLOBs smaller than 256 kilobytes SQL BLOB suitable, and BLOBs larger than 1 megabyte (MB) are best stored outside the database. For those sized between 256 KB and 1 MB, the more efficient storage solution depends on the read vs. write ratio of the data, and the rate of “overwrite”. Storing BLOB data solely within the database (e.g., using the varbinary(max) data type) is limited to 2 gigabytes (GB) per BLOB and enforced through SharePoint 2010.
Write Once, Read Many (alternatively Write One, Read Multiple or WORM) refers to computer data storage systems, data storage devices, and data storage media that can be written to once, but read from multiple times. DVD/CDRLook through the application tables and find blobs that are no longer referenced by the application.GC time window to define when BLOBs are deleted from the blob store.blobs are "orphans" and can be caused due to aborted transactions, application misbehavior or other failures. Orphan blobs created before the GC time window are deleted from the blob store.Get more info here http://blogs.msdn.com/b/sqlrbs/archive/2008/08/08/rbs-garbage-collection-settings-and-rationale.aspx.
Enterprises have 3 options. Each option has advantages & disadvantages based on your scenario.What are the main tradeoffs? When I choose to store my Jpeg images or video files on Windows file system/file server configuration, I get great streaming performance. But my development cost is high because SharePoint needs to have the logic to keep the structured data in sync with my Jpegs/MP3 files stored outside the db.When I choose to store my Jpeg images or video files on a dedicated Blob store, I can scale and expand my storage space easily. The more I expand it the cheaper it gets to add to it. In this case streaming performance depends on your store and keeping structured and unstructured data in sync requires a complex implementation at the app level. When I choose to store my Jpeg images as SQL VARBINAY(MAX) BLOBs in SQL Database, my Jpeg files are constantly in sync with the metadata. I can also use all the SQL Server database management features on my Jpeg files. What’s the tradeoff in this case? SharePoint has to manage the file size limitation and poor streaming performance if my jpegs grow larger (a paper by Jim Grey points out the SQL BLOBs streaming performance is better than file system streaming perf for files smaller than 250KB, while when the unstructured data is around 1MB and larger the File system streaming performs much better than SQL BLOB Streaming).I’ve also seen large enterprises, designing more than one of these options in the same app and some central module decides in which option to store the new BLOB depending on the storage purpose (how large the BLOB is, how it will be used)Show of handsWho store some or all BLOB data in SQL Server BLOBs (Varbinary(max))?Who store some of the BLOB data on file system?Who store some of the BLOB data on remote dedicated store?
The Office Web Applications cache is used by Word and PowerPoint Web Applications to create a version of a document requested for viewing through the browser improving performance and reducing resource consumption on server machines by making cached versions of a document or presentation available in cases where there are multiple requests for the same document.The Office Web Applications cache occurs in two (2) distinct tiers, on the server file system and within a “specialized” site collection hosted on a per Web application basis. Document or presentation requests made through the Office Web Applications are served through both caches as the images are rendered for client consumption. Both cache locations are used by all site collections within a Web application where the Office Web Applications features activated.For example, when a client requests a document, the document is rendered through an AppServerHost.exe process and the document subsequently is cached to the server file system cache located on each server from which the content is propagated to the site collection cache which exists on a per Web application basis. Subsequent requests for the document are rendered from the site collection cache.
The maintenance tasks associated with Remote BLOB Storage (RBS) are mainly performed through the RBS Maintainer. The RBS Maintainer performs periodic garbage collection and other maintenance tasks for an RBS deployment.GCGarbage collection is how unreferenced or deleted data is removed from the remote BLOB store. Garbage collection in RBS is performed passively. References to BLOBs are counted by looking at the list of BLOB IDs stored by the application in its RBS table columns at the time of garbage collection.RSLook through the application tables and find blobs that are no longer referenced by the application. The list of registered RBS columns is used for this purpose. BlobIds must not be stored in any place other than the registered columns. The blobs that are no longer referenced by the application are marked to be deleted.DPBlobs marked for deletion are actually deleted from the blob store. There is a gap between when the blobs get marked for deletion and when they are actually deleted. This gap duration can be configured using the "garbage_collection_time_window" config item and defaults to 30 days.